Re: New Table Access Methods for Multi and Single Inserts

public inbox for [email protected]  
help / color / mirror / Atom feed

Re: New Table Access Methods for Multi and Single Inserts
30+ messages / 5 participants
[nested] [flat]

* Re: New Table Access Methods for Multi and Single Inserts
@ 2024-03-21 04:14 Bharath Rupireddy <[email protected]>
  2024-03-21 07:40 ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  0 siblings, 1 reply; 30+ messages in thread

From: Bharath Rupireddy @ 2024-03-21 04:14 UTC (permalink / raw)
  To: Masahiko Sawada <[email protected]>; +Cc: pgsql-hackers; Andres Freund <[email protected]>; Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; Jeff Davis <[email protected]>; Michael Paquier <[email protected]>; Matthias van de Meent <[email protected]>

On Tue, Mar 19, 2024 at 10:40 AM Masahiko Sawada <[email protected]> wrote:
>
> I've not reviewed the patches in depth yet, but run performance tests
> for CREATE MATERIALIZED VIEW. The test scenarios is:

Thanks for looking into this.

> Is there any reason why we copy a buffer-heap tuple to another
> buffer-heap tuple? Which results in that we increments the buffer
> refcount and register it to ResourceOwner for every tuples. I guess
> that the destination tuple slot is not necessarily a buffer-heap, and
> we could use VirtualTupleTableSlot instead. It would in turn require
> copying a heap tuple. I might be missing something but it improved the
> performance at least in my env. The change I made was:
>
> -       dstslot = table_slot_create(state->rel, NULL);
> +       //dstslot = table_slot_create(state->rel, NULL);
> +       dstslot = MakeTupleTableSlot(RelationGetDescr(state->rel),
> +                                    &TTSOpsVirtual);
> +
>
> And the execution times are:
> 1588.984 ms
> 1591.618 ms
> 1582.519 ms

Yes, usingVirtualTupleTableSlot helps improve the performance a lot.
Below are results from my testing. Note that CMV, RMV, CTAS stand for
CREATE MATERIALIZED VIEW, REFRESH MATERIALIZED VIEW, CREATE TABLE AS
respectively. These commands got faster by 62.54%, 68.87%, 74.31% or
2.67, 3.21, 3.89 times respectively. I've used the test case specified
at [1].

HEAD:
CMV:
Time: 6276.468 ms (00:06.276)
CTAS:
Time: 8141.632 ms (00:08.142)
RMV:
Time: 14747.139 ms (00:14.747)

PATCHED:
CMV:
Time: 2350.282 ms (00:02.350)
CTAS:
Time: 2091.427 ms (00:02.091)
RMV:
Time: 4590.180 ms (00:04.590)

I quickly looked at the description of what a "virtual" tuple is from
src/include/executor/tuptable.h [2]. IIUC, it is invented for
minimizing data copying, but it also says that it's the responsibility
of the generating plan node to be sure these resources are not
released for as long as the virtual tuple needs to be valid or is
materialized. While it says this, as far as this patch is concerned,
the virtual slot gets materialized when we copy the tuples from source
slot (can be any type of slot) to destination slot (which is virtual
slot). See ExecCopySlot->
tts_virtual_copyslot->tts_virtual_materialize. This way,
tts_virtual_copyslot ensures the tuples storage doesn't depend on
external memory because all the datums that aren't passed by value are
copied into the slot's memory context.

With the above understanding, it looks safe to use virtual slots for
the multi insert buffered slots. I'm not so sure if I'm missing
anything here.

[1]
cd $PWD/pg17/bin
rm -rf data logfile
./initdb -D data
./pg_ctl -D data -l logfile start

./psql -d postgres
\timing
drop table test cascade;
create unlogged table test (c int);
insert into test select generate_series(1, 10000000);
create materialized view test_mv as select * from test;
create table test_copy as select * from test;
insert into test select generate_series(1, 10000000);
refresh materialized view test_mv;

[2]
 * A "virtual" tuple is an optimization used to minimize physical data copying
 * in a nest of plan nodes.  Until materialized pass-by-reference Datums in
 * the slot point to storage that is not directly associated with the
 * TupleTableSlot; generally they will point to part of a tuple stored in a
 * lower plan node's output TupleTableSlot, or to a function result
 * constructed in a plan node's per-tuple econtext.  It is the responsibility
 * of the generating plan node to be sure these resources are not released for
 * as long as the virtual tuple needs to be valid or is materialized.  Note
 * also that a virtual tuple does not have any "system columns".

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

^ permalink  raw  reply  [nested|flat] 30+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
  2024-03-21 04:14 Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
@ 2024-03-21 07:40 ` Bharath Rupireddy <[email protected]>
  2024-03-23 00:17   ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  0 siblings, 1 reply; 30+ messages in thread

From: Bharath Rupireddy @ 2024-03-21 07:40 UTC (permalink / raw)
  To: Masahiko Sawada <[email protected]>; +Cc: pgsql-hackers; Andres Freund <[email protected]>; Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; Jeff Davis <[email protected]>; Michael Paquier <[email protected]>; Matthias van de Meent <[email protected]>

On Thu, Mar 21, 2024 at 9:44 AM Bharath Rupireddy
<[email protected]> wrote:
>
> Yes, usingVirtualTupleTableSlot helps improve the performance a lot.
> Below are results from my testing. Note that CMV, RMV, CTAS stand for
> CREATE MATERIALIZED VIEW, REFRESH MATERIALIZED VIEW, CREATE TABLE AS
> respectively. These commands got faster by 62.54%, 68.87%, 74.31% or
> 2.67, 3.21, 3.89 times respectively. I've used the test case specified
> at [1].
>
> HEAD:
> CMV:
> Time: 6276.468 ms (00:06.276)
> CTAS:
> Time: 8141.632 ms (00:08.142)
> RMV:
> Time: 14747.139 ms (00:14.747)
>
> PATCHED:
> CMV:
> Time: 2350.282 ms (00:02.350)
> CTAS:
> Time: 2091.427 ms (00:02.091)
> RMV:
> Time: 4590.180 ms (00:04.590)
>
> I quickly looked at the description of what a "virtual" tuple is from
> src/include/executor/tuptable.h [2]. IIUC, it is invented for
> minimizing data copying, but it also says that it's the responsibility
> of the generating plan node to be sure these resources are not
> released for as long as the virtual tuple needs to be valid or is
> materialized. While it says this, as far as this patch is concerned,
> the virtual slot gets materialized when we copy the tuples from source
> slot (can be any type of slot) to destination slot (which is virtual
> slot). See ExecCopySlot->
> tts_virtual_copyslot->tts_virtual_materialize. This way,
> tts_virtual_copyslot ensures the tuples storage doesn't depend on
> external memory because all the datums that aren't passed by value are
> copied into the slot's memory context.
>
> With the above understanding, it looks safe to use virtual slots for
> the multi insert buffered slots. I'm not so sure if I'm missing
> anything here.

I'm attaching the v13 patches using virtual tuple slots for buffered
tuples for multi inserts.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


Attachments:

  [application/octet-stream] v13-0001-New-table-AMs-for-single-and-multi-inserts.patch (16.4K, 2-v13-0001-New-table-AMs-for-single-and-multi-inserts.patch)
  download | inline diff:
From d69e7f0810d78b80a63a5dff65425daa5e0731c7 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Thu, 21 Mar 2024 07:02:43 +0000
Subject: [PATCH v13 1/4] New table AMs for single and multi inserts

---
 src/backend/access/heap/heapam.c         | 235 +++++++++++++++++++++++
 src/backend/access/heap/heapam_handler.c |   9 +
 src/include/access/heapam.h              |  49 +++++
 src/include/access/tableam.h             | 138 +++++++++++++
 4 files changed, 431 insertions(+)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 34bc60f625..fd2b3814dd 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -64,6 +64,7 @@
 #include "storage/standby.h"
 #include "utils/datum.h"
 #include "utils/inval.h"
+#include "utils/memutils.h"
 #include "utils/relcache.h"
 #include "utils/snapmgr.h"
 #include "utils/spccache.h"
@@ -2442,6 +2443,240 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 	pgstat_count_heap_insert(relation, ntuples);
 }
 
+/*
+ * Initialize state required for an insert a single tuple or multiple tuples
+ * into a heap.
+ */
+TableInsertState *
+heap_insert_begin(Relation rel, CommandId cid, int am_flags, int insert_flags)
+{
+	TableInsertState *tistate;
+
+	tistate = palloc0(sizeof(TableInsertState));
+	tistate->rel = rel;
+	tistate->cid = cid;
+	tistate->am_flags = am_flags;
+	tistate->insert_flags = insert_flags;
+
+	if ((am_flags & TABLEAM_MULTI_INSERTS) != 0 ||
+		(am_flags & TABLEAM_BULKWRITE_BUFFER_ACCESS_STRATEGY))
+		tistate->am_data = palloc0(sizeof(HeapInsertState));
+
+	if ((am_flags & TABLEAM_MULTI_INSERTS) != 0)
+	{
+		HeapMultiInsertState *mistate;
+
+		mistate = palloc0(sizeof(HeapMultiInsertState));
+		mistate->slots = palloc0(sizeof(TupleTableSlot *) * HEAP_MAX_BUFFERED_SLOTS);
+
+		mistate->context = AllocSetContextCreate(CurrentMemoryContext,
+												 "heap_multi_insert_v2 memory context",
+												 ALLOCSET_DEFAULT_SIZES);
+
+		((HeapInsertState *) tistate->am_data)->mistate = mistate;
+	}
+
+	if ((am_flags & TABLEAM_BULKWRITE_BUFFER_ACCESS_STRATEGY) != 0)
+		((HeapInsertState *) tistate->am_data)->bistate = GetBulkInsertState();
+
+	return tistate;
+}
+
+/*
+ * Insert a single tuple into a heap.
+ */
+void
+heap_insert_v2(TableInsertState * state, TupleTableSlot *slot)
+{
+	bool		shouldFree = true;
+	HeapTuple	tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);
+	BulkInsertState bistate = NULL;
+
+	Assert(state->am_data != NULL &&
+		   ((HeapInsertState *) state->am_data)->mistate == NULL);
+
+	/* Update tuple with table oid */
+	slot->tts_tableOid = RelationGetRelid(state->rel);
+	tuple->t_tableOid = slot->tts_tableOid;
+
+	if (state->am_data != NULL &&
+		((HeapInsertState *) state->am_data)->bistate != NULL)
+		bistate = ((HeapInsertState *) state->am_data)->bistate;
+
+	/* Perform insertion, and copy the resulting ItemPointer */
+	heap_insert(state->rel, tuple, state->cid, state->insert_flags,
+				bistate);
+	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
+
+	if (shouldFree)
+		pfree(tuple);
+}
+
+/*
+ * Create/return next free slot from multi-insert buffered slots array.
+ */
+TupleTableSlot *
+heap_multi_insert_next_free_slot(TableInsertState * state)
+{
+	TupleTableSlot *slot;
+	HeapMultiInsertState *mistate;
+
+	Assert(state->am_data != NULL &&
+		   ((HeapInsertState *) state->am_data)->mistate != NULL);
+
+	mistate = ((HeapInsertState *) state->am_data)->mistate;
+	slot = mistate->slots[mistate->cur_slots];
+
+	if (slot == NULL)
+	{
+		slot = MakeTupleTableSlot(RelationGetDescr(state->rel),
+								  &TTSOpsVirtual);
+		mistate->slots[mistate->cur_slots] = slot;
+	}
+	else
+		ExecClearTuple(slot);
+
+	return slot;
+}
+
+/*
+ * Store passed-in tuple into in-memory buffered slots. When full, insert
+ * multiple tuples from the buffers into heap.
+ */
+void
+heap_multi_insert_v2(TableInsertState * state, TupleTableSlot *slot)
+{
+	TupleTableSlot *dstslot;
+	HeapMultiInsertState *mistate;
+
+	Assert(state->am_data != NULL &&
+		   ((HeapInsertState *) state->am_data)->mistate != NULL);
+
+	mistate = ((HeapInsertState *) state->am_data)->mistate;
+	dstslot = mistate->slots[mistate->cur_slots];
+
+	if (dstslot == NULL)
+	{
+		/*
+		 * We use virtual tuple slots buffered slots for leveraging the
+		 * optimization it provides to minimize physical data copying. The
+		 * virtual slot gets materialized when we copy (via below
+		 * ExecCopySlot) the tuples from the source slot which can be of any
+		 * type. This way, it is ensured that the tuple storage doesn't depend
+		 * on external memory, because all the datums that aren't passed by
+		 * value are copied into the slot's memory context.
+		 */
+		dstslot = MakeTupleTableSlot(RelationGetDescr(state->rel),
+									 &TTSOpsVirtual);
+		mistate->slots[mistate->cur_slots] = dstslot;
+	}
+
+	/*
+	 * Caller may have got the slot using heap_multi_insert_next_free_slot,
+	 * filled it and passed. So, skip copying in such a case.
+	 */
+	if ((state->am_flags & TABLEAM_SKIP_MULTI_INSERTS_FLUSH) == 0)
+	{
+		ExecClearTuple(dstslot);
+		ExecCopySlot(dstslot, slot);
+	}
+	else
+		Assert(dstslot == slot);
+
+	mistate->cur_slots++;
+
+	/*
+	 * When passed-in slot is already materialized, memory allocated in slot's
+	 * memory context is a close approximation for us to track the required
+	 * space for the tuple in slot.
+	 *
+	 * For non-materialized slots, the flushing decision happens solely on the
+	 * number of tuples stored in the buffer.
+	 */
+	if (TTS_SHOULDFREE(slot))
+		mistate->cur_size += MemoryContextMemAllocated(slot->tts_mcxt, false);
+
+	if ((state->am_flags & TABLEAM_SKIP_MULTI_INSERTS_FLUSH) == 0 &&
+		(mistate->cur_slots >= HEAP_MAX_BUFFERED_SLOTS ||
+		 mistate->cur_size >= HEAP_MAX_BUFFERED_BYTES))
+		heap_multi_insert_flush(state);
+}
+
+/*
+ * Return pointer to multi-insert buffered slots array and number of currently
+ * occupied slots.
+ */
+TupleTableSlot **
+heap_multi_insert_slots(TableInsertState * state, int *num_slots)
+{
+	HeapMultiInsertState *mistate;
+
+	mistate = ((HeapInsertState *) state->am_data)->mistate;
+	*num_slots = mistate->cur_slots;
+
+	return mistate->slots;
+}
+
+/*
+ * Insert multiple tuples from in-memory buffered slots into heap.
+ */
+void
+heap_multi_insert_flush(TableInsertState * state)
+{
+	HeapMultiInsertState *mistate;
+	BulkInsertState bistate = NULL;
+	MemoryContext oldcontext;
+
+	mistate = ((HeapInsertState *) state->am_data)->mistate;
+
+	if (state->am_data != NULL &&
+		((HeapInsertState *) state->am_data)->bistate != NULL)
+		bistate = ((HeapInsertState *) state->am_data)->bistate;
+
+	oldcontext = MemoryContextSwitchTo(mistate->context);
+	heap_multi_insert(state->rel, mistate->slots, mistate->cur_slots,
+					  state->cid, state->insert_flags, bistate);
+	MemoryContextSwitchTo(oldcontext);
+	MemoryContextReset(mistate->context);
+
+	mistate->cur_slots = 0;
+	mistate->cur_size = 0;
+}
+
+/*
+ * Clean up state used to insert a single or multiple tuples into a heap.
+ */
+void
+heap_insert_end(TableInsertState * state)
+{
+	if (state->am_data != NULL &&
+		((HeapInsertState *) state->am_data)->mistate != NULL)
+	{
+		HeapMultiInsertState *mistate =
+			((HeapInsertState *) state->am_data)->mistate;
+
+		/* Insert remaining tuples from multi-insert buffers */
+		if (mistate->cur_slots > 0 || mistate->cur_size > 0)
+			heap_multi_insert_flush(state);
+
+		MemoryContextDelete(mistate->context);
+
+		for (int i = 0; i < HEAP_MAX_BUFFERED_SLOTS && mistate->slots[i] != NULL; i++)
+			ExecDropSingleTupleTableSlot(mistate->slots[i]);
+
+		pfree(mistate);
+		((HeapInsertState *) state->am_data)->mistate = NULL;
+	}
+
+	if (state->am_data != NULL &&
+		((HeapInsertState *) state->am_data)->bistate != NULL)
+		FreeBulkInsertState(((HeapInsertState *) state->am_data)->bistate);
+
+	pfree(state->am_data);
+	state->am_data = NULL;
+	pfree(state);
+}
+
 /*
  *	simple_heap_insert - insert a tuple
  *
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 680a50bf8b..84793f324e 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2562,6 +2562,15 @@ static const TableAmRoutine heapam_methods = {
 	.tuple_insert_speculative = heapam_tuple_insert_speculative,
 	.tuple_complete_speculative = heapam_tuple_complete_speculative,
 	.multi_insert = heap_multi_insert,
+
+	.tuple_insert_begin = heap_insert_begin,
+	.tuple_insert_v2 = heap_insert_v2,
+	.tuple_multi_insert_next_free_slot = heap_multi_insert_next_free_slot,
+	.tuple_multi_insert_v2 = heap_multi_insert_v2,
+	.tuple_multi_insert_slots = heap_multi_insert_slots,
+	.tuple_multi_insert_flush = heap_multi_insert_flush,
+	.tuple_insert_end = heap_insert_end,
+
 	.tuple_delete = heapam_tuple_delete,
 	.tuple_update = heapam_tuple_update,
 	.tuple_lock = heapam_tuple_lock,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4b133f6859..053be18110 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -225,6 +225,40 @@ htsv_get_valid_status(int status)
 	return (HTSV_Result) status;
 }
 
+/*
+ * Maximum number of slots that multi-insert buffers can hold.
+ *
+ * Caution: Don't make this too big, as we could end up with this many tuples
+ * stored in multi insert buffer. For instance, increasing this can cause
+ * quadratic growth in memory requirements during copies into partitioned
+ * tables with a large number of partitions.
+ */
+#define HEAP_MAX_BUFFERED_SLOTS		1000
+
+/* Maximum size of all tuples that multi-insert buffers can hold */
+#define HEAP_MAX_BUFFERED_BYTES		65535
+
+typedef struct HeapMultiInsertState
+{
+	/* Memory context to use for flushing multi-insert buffers */
+	MemoryContext context;
+
+	/* Array of buffered slots */
+	TupleTableSlot **slots;
+
+	/* Number of slots that multi-insert buffers currently hold */
+	int			cur_slots;
+
+	/* Size of all tuples that multi-insert buffers currently hold */
+	Size		cur_size;
+}			HeapMultiInsertState;
+
+typedef struct HeapInsertState
+{
+	struct BulkInsertStateData *bistate;
+	HeapMultiInsertState *mistate;
+}			HeapInsertState;
+
 /* ----------------
  *		function prototypes for heap access method
  *
@@ -275,6 +309,21 @@ extern void heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 extern void heap_multi_insert(Relation relation, struct TupleTableSlot **slots,
 							  int ntuples, CommandId cid, int options,
 							  BulkInsertState bistate);
+
+extern TableInsertState * heap_insert_begin(Relation rel,
+											CommandId cid,
+											int am_flags,
+											int insert_flags);
+extern void heap_insert_v2(TableInsertState * state,
+						   TupleTableSlot *slot);
+extern TupleTableSlot *heap_multi_insert_next_free_slot(TableInsertState * state);
+extern void heap_multi_insert_v2(TableInsertState * state,
+								 TupleTableSlot *slot);
+extern TupleTableSlot **heap_multi_insert_slots(TableInsertState * state,
+												int *num_slots);
+extern void heap_multi_insert_flush(TableInsertState * state);
+extern void heap_insert_end(TableInsertState * state);
+
 extern TM_Result heap_delete(Relation relation, ItemPointer tid,
 							 CommandId cid, Snapshot crosscheck, bool wait,
 							 struct TM_FailureData *tmfd, bool changingPart);
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 8249b37bbf..842cbdd16e 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -247,6 +247,43 @@ typedef struct TM_IndexDeleteOp
 	TM_IndexStatus *status;
 } TM_IndexDeleteOp;
 
+/* Use multi inserts, i.e. buffer multiple tuples and insert them at once */
+#define TABLEAM_MULTI_INSERTS 0x000001
+
+/* Use BAS_BULKWRITE buffer access strategy */
+#define TABLEAM_BULKWRITE_BUFFER_ACCESS_STRATEGY 0x000002
+
+/*
+ * Skip flushing buffered tuples automatically. Responsibility lies with the
+ * caller to flush the buffered tuples.
+ */
+#define TABLEAM_SKIP_MULTI_INSERTS_FLUSH 0x000004
+
+
+/* Holds table insert state. */
+typedef struct TableInsertState
+{
+	/* Table AM-agnostic data starts here */
+
+	Relation	rel;			/* Target relation */
+
+	/*
+	 * Command ID for this insertion. If required, change this for each pass
+	 * of insert functions.
+	 */
+	CommandId	cid;
+
+	/* Table AM options (TABLEAM_XXX macros) */
+	int			am_flags;
+
+	/* table_tuple_insert performance options (TABLE_INSERT_XXX macros) */
+	int			insert_flags;
+
+	/* Table AM specific data starts here */
+
+	void	   *am_data;
+}			TableInsertState;
+
 /* "options" flag bits for table_tuple_insert */
 /* TABLE_INSERT_SKIP_WAL was 0x0001; RelationNeedsWAL() now governs */
 #define TABLE_INSERT_SKIP_FSM		0x0002
@@ -522,6 +559,20 @@ typedef struct TableAmRoutine
 	void		(*multi_insert) (Relation rel, TupleTableSlot **slots, int nslots,
 								 CommandId cid, int options, struct BulkInsertStateData *bistate);
 
+	TableInsertState *(*tuple_insert_begin) (Relation rel,
+											 CommandId cid,
+											 int am_flags,
+											 int insert_flags);
+	void		(*tuple_insert_v2) (TableInsertState * state,
+									TupleTableSlot *slot);
+	void		(*tuple_multi_insert_v2) (TableInsertState * state,
+										  TupleTableSlot *slot);
+	TupleTableSlot *(*tuple_multi_insert_next_free_slot) (TableInsertState * state);
+	TupleTableSlot **(*tuple_multi_insert_slots) (TableInsertState * state,
+												  int *num_slots);
+	void		(*tuple_multi_insert_flush) (TableInsertState * state);
+	void		(*tuple_insert_end) (TableInsertState * state);
+
 	/* see table_tuple_delete() for reference about parameters */
 	TM_Result	(*tuple_delete) (Relation rel,
 								 ItemPointer tid,
@@ -1451,6 +1502,93 @@ table_multi_insert(Relation rel, TupleTableSlot **slots, int nslots,
 								  cid, options, bistate);
 }
 
+static inline TableInsertState *
+table_insert_begin(Relation rel, CommandId cid, int am_flags,
+				   int insert_flags)
+{
+	if (rel->rd_tableam && rel->rd_tableam->tuple_insert_begin)
+		return rel->rd_tableam->tuple_insert_begin(rel, cid, am_flags,
+												   insert_flags);
+	else
+	{
+		elog(ERROR, "table_insert_begin access method is not implemented for relation \"%s\"",
+			 RelationGetRelationName(rel));
+		return NULL;			/* keep compiler quiet */
+	}
+}
+
+static inline void
+table_tuple_insert_v2(TableInsertState * state, TupleTableSlot *slot)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_insert_v2)
+		state->rel->rd_tableam->tuple_insert_v2(state, slot);
+	else
+		elog(ERROR, "table_tuple_insert_v2 access method is not implemented for relation \"%s\"",
+			 RelationGetRelationName(state->rel));
+}
+
+static inline void
+table_multi_insert_v2(TableInsertState * state, TupleTableSlot *slot)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_multi_insert_v2)
+		state->rel->rd_tableam->tuple_multi_insert_v2(state, slot);
+	else
+		elog(ERROR, "table_multi_insert_v2 access method is not implemented for relation \"%s\"",
+			 RelationGetRelationName(state->rel));
+}
+
+static inline TupleTableSlot *
+table_multi_insert_next_free_slot(TableInsertState * state)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_multi_insert_next_free_slot)
+		return state->rel->rd_tableam->tuple_multi_insert_next_free_slot(state);
+	else
+	{
+		elog(ERROR, "table_multi_insert_next_free_slot access method is not implemented for relation \"%s\"",
+			 RelationGetRelationName(state->rel));
+		return NULL;			/* keep compiler quiet */
+	}
+}
+
+static inline TupleTableSlot **
+table_multi_insert_slots(TableInsertState * state, int *num_slots)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_multi_insert_slots)
+		return state->rel->rd_tableam->tuple_multi_insert_slots(state, num_slots);
+	else
+	{
+		elog(ERROR, "table_multi_insert_slots access method is not implemented for relation \"%s\"",
+			 RelationGetRelationName(state->rel));
+		return NULL;			/* keep compiler quiet */
+	}
+}
+
+static inline void
+table_multi_insert_flush(TableInsertState * state)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_multi_insert_flush)
+		state->rel->rd_tableam->tuple_multi_insert_flush(state);
+	else
+		elog(ERROR, "table_multi_insert_flush access method is not implemented for relation \"%s\"",
+			 RelationGetRelationName(state->rel));
+}
+
+static inline void
+table_insert_end(TableInsertState * state)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_insert_end)
+		state->rel->rd_tableam->tuple_insert_end(state);
+	else
+		elog(ERROR, "table_insert_end access method is not implemented for relation \"%s\"",
+			 RelationGetRelationName(state->rel));
+}
+
 /*
  * Delete a tuple.
  *
-- 
2.34.1



  [application/octet-stream] v13-0002-Optimize-CREATE-TABLE-AS-with-multi-inserts.patch (2.7K, 3-v13-0002-Optimize-CREATE-TABLE-AS-with-multi-inserts.patch)
  download | inline diff:
From c760ac19397d022b898305f261c03a2924a4cda5 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Thu, 21 Mar 2024 07:03:22 +0000
Subject: [PATCH v13 2/4] Optimize CREATE TABLE AS with multi inserts

---
 src/backend/commands/createas.c | 25 +++++++++----------------
 1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 62050f4dc5..7a4415c62f 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -53,9 +53,7 @@ typedef struct
 	/* These fields are filled by intorel_startup: */
 	Relation	rel;			/* relation to write to */
 	ObjectAddress reladdr;		/* address of rel, for ExecCreateTableAs */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableInsertState *ti_state; /* table insert state */
 } DR_intorel;
 
 /* utility functions for CTAS definition creation */
@@ -552,17 +550,19 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	 */
 	myState->rel = intoRelationDesc;
 	myState->reladdr = intoRelationAddr;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM;
 
 	/*
 	 * If WITH NO DATA is specified, there is no need to set up the state for
 	 * bulk inserts as there are no tuples to insert.
 	 */
 	if (!into->skipData)
-		myState->bistate = GetBulkInsertState();
+		myState->ti_state = table_insert_begin(intoRelationDesc,
+											   GetCurrentCommandId(true),
+											   TABLEAM_MULTI_INSERTS |
+											   TABLEAM_BULKWRITE_BUFFER_ACCESS_STRATEGY,
+											   TABLE_INSERT_SKIP_FSM);
 	else
-		myState->bistate = NULL;
+		myState->ti_state = NULL;
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -590,11 +590,7 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
 		 * would not be cheap either. This also doesn't allow accessing per-AM
 		 * data (say a tuple's xmin), but since we don't do that here...
 		 */
-		table_tuple_insert(myState->rel,
-						   slot,
-						   myState->output_cid,
-						   myState->ti_options,
-						   myState->bistate);
+		table_multi_insert_v2(myState->ti_state, slot);
 	}
 
 	/* We know this is a newly created relation, so there are no indexes */
@@ -612,10 +608,7 @@ intorel_shutdown(DestReceiver *self)
 	IntoClause *into = myState->into;
 
 	if (!into->skipData)
-	{
-		FreeBulkInsertState(myState->bistate);
-		table_finish_bulk_insert(myState->rel, myState->ti_options);
-	}
+		table_insert_end(myState->ti_state);
 
 	/* close rel, but keep lock until commit */
 	table_close(myState->rel, NoLock);
-- 
2.34.1



  [application/octet-stream] v13-0003-Optimize-REFRESH-MATERIALIZED-VIEW-with-multi-in.patch (2.9K, 4-v13-0003-Optimize-REFRESH-MATERIALIZED-VIEW-with-multi-in.patch)
  download | inline diff:
From 951b655bd74ecf216d53cea8b05343fc6729f6ca Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Thu, 21 Mar 2024 07:03:58 +0000
Subject: [PATCH v13 3/4] Optimize REFRESH MATERIALIZED VIEW with multi inserts

---
 src/backend/commands/matview.c | 34 ++++++++++++----------------------
 1 file changed, 12 insertions(+), 22 deletions(-)

diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 6d09b75556..cc9f8ad627 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -47,10 +47,7 @@ typedef struct
 	DestReceiver pub;			/* publicly-known function pointers */
 	Oid			transientoid;	/* OID of new heap into which to store */
 	/* These fields are filled by transientrel_startup: */
-	Relation	transientrel;	/* relation to write to */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableInsertState *ti_state; /* table insert state */
 } DR_transientrel;
 
 static int	matview_maintenance_depth = 0;
@@ -454,13 +451,13 @@ transientrel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 
 	transientrel = table_open(myState->transientoid, NoLock);
 
-	/*
-	 * Fill private fields of myState for use by later routines
-	 */
-	myState->transientrel = transientrel;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM | TABLE_INSERT_FROZEN;
-	myState->bistate = GetBulkInsertState();
+	/* Fill private fields of myState for use by later routines */
+	myState->ti_state = table_insert_begin(transientrel,
+										   GetCurrentCommandId(true),
+										   TABLEAM_MULTI_INSERTS |
+										   TABLEAM_BULKWRITE_BUFFER_ACCESS_STRATEGY,
+										   TABLE_INSERT_SKIP_FSM |
+										   TABLE_INSERT_FROZEN);
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -485,12 +482,7 @@ transientrel_receive(TupleTableSlot *slot, DestReceiver *self)
 	 * cheap either. This also doesn't allow accessing per-AM data (say a
 	 * tuple's xmin), but since we don't do that here...
 	 */
-
-	table_tuple_insert(myState->transientrel,
-					   slot,
-					   myState->output_cid,
-					   myState->ti_options,
-					   myState->bistate);
+	table_multi_insert_v2(myState->ti_state, slot);
 
 	/* We know this is a newly created relation, so there are no indexes */
 
@@ -504,14 +496,12 @@ static void
 transientrel_shutdown(DestReceiver *self)
 {
 	DR_transientrel *myState = (DR_transientrel *) self;
+	Relation	transientrel = myState->ti_state->rel;
 
-	FreeBulkInsertState(myState->bistate);
-
-	table_finish_bulk_insert(myState->transientrel, myState->ti_options);
+	table_insert_end(myState->ti_state);
 
 	/* close transientrel, but keep lock until commit */
-	table_close(myState->transientrel, NoLock);
-	myState->transientrel = NULL;
+	table_close(transientrel, NoLock);
 }
 
 /*
-- 
2.34.1



  [application/octet-stream] v13-0004-Use-new-multi-insert-table-AM-for-COPY-FROM.patch (6.3K, 5-v13-0004-Use-new-multi-insert-table-AM-for-COPY-FROM.patch)
  download | inline diff:
From c1c62afbbeeb7f3f9b386a26dfbb6179c8de4a82 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Thu, 21 Mar 2024 07:04:31 +0000
Subject: [PATCH v13 4/4] Use new multi insert table AM for COPY FROM

---
 src/backend/commands/copyfrom.c | 92 ++++++++++++++++++---------------
 1 file changed, 50 insertions(+), 42 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 8908a440e1..c2a81d4df1 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -74,10 +74,9 @@
 /* Stores multi-insert data related to a single relation in CopyFrom. */
 typedef struct CopyMultiInsertBuffer
 {
-	TupleTableSlot *slots[MAX_BUFFERED_TUPLES]; /* Array to store tuples */
+	TableInsertState *ti_state; /* Table insert state; NULL if foreign table */
+	TupleTableSlot **slots;		/* Array to store tuples */
 	ResultRelInfo *resultRelInfo;	/* ResultRelInfo for 'relid' */
-	BulkInsertState bistate;	/* BulkInsertState for this rel if plain
-								 * table; NULL if foreign table */
 	int			nused;			/* number of 'slots' containing tuples */
 	uint64		linenos[MAX_BUFFERED_TUPLES];	/* Line # of tuple in copy
 												 * stream */
@@ -220,14 +219,31 @@ limit_printout_length(const char *str)
  * ResultRelInfo.
  */
 static CopyMultiInsertBuffer *
-CopyMultiInsertBufferInit(ResultRelInfo *rri)
+CopyMultiInsertBufferInit(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri)
 {
 	CopyMultiInsertBuffer *buffer;
 
 	buffer = (CopyMultiInsertBuffer *) palloc(sizeof(CopyMultiInsertBuffer));
-	memset(buffer->slots, 0, sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
+
+	if (rri->ri_FdwRoutine == NULL)
+	{
+		int			num_slots;
+
+		buffer->ti_state = table_insert_begin(rri->ri_RelationDesc,
+											  miinfo->mycid,
+											  TABLEAM_MULTI_INSERTS |
+											  TABLEAM_BULKWRITE_BUFFER_ACCESS_STRATEGY |
+											  TABLEAM_SKIP_MULTI_INSERTS_FLUSH,
+											  miinfo->ti_options);
+		buffer->slots = table_multi_insert_slots(buffer->ti_state, &num_slots);
+	}
+	else
+	{
+		buffer->slots = palloc0(sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
+		buffer->ti_state = NULL;
+	}
+
 	buffer->resultRelInfo = rri;
-	buffer->bistate = (rri->ri_FdwRoutine == NULL) ? GetBulkInsertState() : NULL;
 	buffer->nused = 0;
 
 	return buffer;
@@ -242,7 +258,7 @@ CopyMultiInsertInfoSetupBuffer(CopyMultiInsertInfo *miinfo,
 {
 	CopyMultiInsertBuffer *buffer;
 
-	buffer = CopyMultiInsertBufferInit(rri);
+	buffer = CopyMultiInsertBufferInit(miinfo, rri);
 
 	/* Setup back-link so we can easily find this buffer again */
 	rri->ri_CopyMultiInsertBuffer = buffer;
@@ -319,8 +335,6 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 		int			batch_size = resultRelInfo->ri_BatchSize;
 		int			sent = 0;
 
-		Assert(buffer->bistate == NULL);
-
 		/* Ensure that the FDW supports batching and it's enabled */
 		Assert(resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert);
 		Assert(batch_size > 1);
@@ -392,13 +406,8 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	}
 	else
 	{
-		CommandId	mycid = miinfo->mycid;
-		int			ti_options = miinfo->ti_options;
 		bool		line_buf_valid = cstate->line_buf_valid;
 		uint64		save_cur_lineno = cstate->cur_lineno;
-		MemoryContext oldcontext;
-
-		Assert(buffer->bistate != NULL);
 
 		/*
 		 * Print error context information correctly, if one of the operations
@@ -406,18 +415,7 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 		 */
 		cstate->line_buf_valid = false;
 
-		/*
-		 * table_multi_insert may leak memory, so switch to short-lived memory
-		 * context before calling it.
-		 */
-		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-		table_multi_insert(resultRelInfo->ri_RelationDesc,
-						   slots,
-						   nused,
-						   mycid,
-						   ti_options,
-						   buffer->bistate);
-		MemoryContextSwitchTo(oldcontext);
+		table_multi_insert_flush(buffer->ti_state);
 
 		for (i = 0; i < nused; i++)
 		{
@@ -432,7 +430,7 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 				cstate->cur_lineno = buffer->linenos[i];
 				recheckIndexes =
 					ExecInsertIndexTuples(resultRelInfo,
-										  buffer->slots[i], estate, false,
+										  slots[i], estate, false,
 										  false, NULL, NIL, false);
 				ExecARInsertTriggers(estate, resultRelInfo,
 									 slots[i], recheckIndexes,
@@ -490,20 +488,15 @@ CopyMultiInsertBufferCleanup(CopyMultiInsertInfo *miinfo,
 	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
 
 	if (resultRelInfo->ri_FdwRoutine == NULL)
-	{
-		Assert(buffer->bistate != NULL);
-		FreeBulkInsertState(buffer->bistate);
-	}
+		table_insert_end(buffer->ti_state);
 	else
-		Assert(buffer->bistate == NULL);
-
-	/* Since we only create slots on demand, just drop the non-null ones. */
-	for (i = 0; i < MAX_BUFFERED_TUPLES && buffer->slots[i] != NULL; i++)
-		ExecDropSingleTupleTableSlot(buffer->slots[i]);
+	{
+		/* Since we only create slots on demand, just drop the non-null ones. */
+		for (i = 0; i < MAX_BUFFERED_TUPLES && buffer->slots[i] != NULL; i++)
+			ExecDropSingleTupleTableSlot(buffer->slots[i]);
 
-	if (resultRelInfo->ri_FdwRoutine == NULL)
-		table_finish_bulk_insert(resultRelInfo->ri_RelationDesc,
-								 miinfo->ti_options);
+		pfree(buffer->slots);
+	}
 
 	pfree(buffer);
 }
@@ -590,13 +583,25 @@ CopyMultiInsertInfoNextFreeSlot(CopyMultiInsertInfo *miinfo,
 {
 	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
 	int			nused = buffer->nused;
+	TupleTableSlot *slot;
 
 	Assert(buffer != NULL);
 	Assert(nused < MAX_BUFFERED_TUPLES);
 
-	if (buffer->slots[nused] == NULL)
-		buffer->slots[nused] = table_slot_create(rri->ri_RelationDesc, NULL);
-	return buffer->slots[nused];
+	if (rri->ri_FdwRoutine == NULL)
+		slot = table_multi_insert_next_free_slot(buffer->ti_state);
+	else
+	{
+		if (buffer->slots[nused] == NULL)
+		{
+			slot = table_slot_create(rri->ri_RelationDesc, NULL);
+			buffer->slots[nused] = slot;
+		}
+		else
+			slot = buffer->slots[nused];
+	}
+
+	return slot;
 }
 
 /*
@@ -612,6 +617,9 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
 	Assert(buffer != NULL);
 	Assert(slot == buffer->slots[buffer->nused]);
 
+	if (rri->ri_FdwRoutine == NULL)
+		table_multi_insert_v2(buffer->ti_state, slot);
+
 	/* Store the line number so we can properly report any errors later */
 	buffer->linenos[buffer->nused] = lineno;
 
-- 
2.34.1



^ permalink  raw  reply  [nested|flat] 30+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
  2024-03-21 04:14 Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-21 07:40 ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
@ 2024-03-23 00:17   ` Jeff Davis <[email protected]>
  2024-03-25 19:58     ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  0 siblings, 1 reply; 30+ messages in thread

From: Jeff Davis @ 2024-03-23 00:17 UTC (permalink / raw)
  To: Bharath Rupireddy <[email protected]>; Masahiko Sawada <[email protected]>; +Cc: pgsql-hackers; Andres Freund <[email protected]>; Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; Michael Paquier <[email protected]>; Matthias van de Meent <[email protected]>

On Thu, 2024-03-21 at 13:10 +0530, Bharath Rupireddy wrote:
> I'm attaching the v13 patches using virtual tuple slots for buffered
> tuples for multi inserts.

Comments:

* Do I understand correctly that CMV, RMV, and CTAS experience a
performance benefit, but COPY FROM does not? And is that because COPY
already used table_multi_insert, whereas CMV and RMV did not?

* In the COPY FROM code, it looks like it's deciding whether to flush
based on MAX_BUFFERED_TUPLES, but the slot array is allocated with
MAX_BUFFERED_SLOTS (they happen to be the same for heap, but perhaps
not for other AMs). The copy code shouldn't be using internal knowledge
of the multi-insert code; it should know somehow from the API when the
right time is to flush.

* How is the memory management expected to work? It looks like COPY
FROM is using the ExprContext when running the input functions, but we
really want to be using a memory context owned by the table AM, right?

* What's the point of the table_multi_insert_slots() "num_slots"
argument? The only caller simply discards it.

* table_tuple_insert_v2 isn't called anywhere, what's it for?

* the "v2" naming is inconsistent -- it seems you only added it in
places where there's a name conflict, which makes it hard to tell which
API methods go together. I'm not sure how widely table_multi_insert* is
used outside of core, so it's possible that we may even be able to just
change those APIs and the few extensions that call it can be updated.

* Memory usage tracking should be done in the AM by allocating
everything in a single context so it's easy to check the size. Don't
manually add up memory.

* I don't understand: "Caller may have got the slot using
heap_multi_insert_next_free_slot, filled it and passed. So, skip
copying in such a case." If the COPY FROM had a WHERE clause and
skipped a tuple after filling the slot, doesn't that mean the slot has
bogus data from the last tuple?

* We'd like this to work for insert-into-select (IIS) and logical
replication, too. Do you see any problem there, or is it just a matter
of code?

* Andres had some comments[1] that don't seem entirely addressed.
  - You are still allocating the AM-specific part of TableModifyState
as a separately-allocated chunk.
  - It's still called TableInsertState rather than TableModifyState as
he suggested. If you change that, you should also change to
table_modify_begin/end.
  - CID: I suppose Andres is considering the use case of "BEGIN; ...
ten thousand inserts ... COMMIT;". I don't think this problem is really
solvable (discussed below) but we should have some response/consensus
on that point.
  - He mentioned that we only need one new method implemented by the
AM. I don't know if one is enough, but 7 does seem excessive. I have
some simplification ideas below.

Overall:

If I understand correctly, there are two ways to use the API:

1. used by CTAS, MV:

  tistate = table_insert_begin(...);
  table_multi_insert_v2(tistate, tup1);
  ...
  table_multi_insert_v2(tistate, tupN);
  table_insert_end(tistate);

2. used by COPY ... FROM:

  tistate = table_insert_begin(..., SKIP_FLUSH);
  if (multi_insert_slot_array_is_full())
     table_multi_insert_flush(tistate);
  slot = table_insert_next_free_slot(tistate);
  ... fill slot with tup1
  table_multi_insert_v2(tistate, tup1);
  ...
  slot = table_insert_next_free_slot(tistate);
  ... fill slot with tupN
  table_multi_insert_v2(tistate, tupN);
  table_insert_end(tistate);

Those two uses need comments explaining what's going on. It appears the
SKIP_FLUSH flag is used to indicate which use the caller intends.

Use #2 is not enforced well by either the API or runtime checks. If the
caller neglects to check for a full buffer, it appears that it will
just overrun the slots array.

Also, for use #2, table_multi_insert_v2() doesn't do much other than
incrementing the memory used. The slot will never be NULL because it
was obtained with table_multi_insert_next_free_slot(), and the other
two branches don't happen when SKIP_FLUSH is true.

The real benefit to COPY of your new API is that the AM can manage
slots for itself, and how many tuples may be tracked (which might be a
lot higher for non-heap AMs).

I agree with Luc Vlaming's comment[2] that more should be left to the
table AM. Your patch tries too hard to work with the copyfrom.c slot
array, somehow sharing it with the table AM. That adds complexity to
the API and feels like a layering violation.

We also shouldn't mandate a slot array in the API. Each slot is 64
bytes -- a lot of overhead for small tuples. For a non-heap AM, it's
much better to store the tuple data in a big contiguous chunk with
minimal overhead.

Let's just have a simple API like:

  tmstate = table_modify_begin(...);
  table_modify_save_insert(tmstate, tup1);
  ...
  table_modify_save_insert(tmstate, tupN);
  table_modify_end(tmstate);

and leave it up to the AM to do all the buffering and flushing work (as
Luc Vlaming suggested[2]).

That leaves one problem, which is: how do we update the indexes and
call AR triggers while flushing? I think the best way is to just have a
callback in the TableModifyState that is called during flush. (I don't
think that would affect performance, but worth double-checking.)

We have to disable this whole multi-insert mechanism if there are
volatile BR/AR triggers, because those are supposed to see already-
inserted tuples. That's not a problem with your patch but it is a bit
unfortunate -- triggers can be costly already, but this increases the
penalty. There may be some theoretical ways to avoid this problem, like
reading tuples out of the unflushed buffer during a SELECT, which
sounds a little too clever (though perhaps not completely crazy if the
AM is in control of both?).

For potentially working with multi-updates/deletes, it might be as
simple as tracking the old TIDs along with the slots and having new
_save_update and _save_delete methods. I haven't thought deeply about
that, and I'm not sure we have a good example AM to work with, but it
seems plausible that we could make something useful here.

To batch multiple different INSERT statements within a transaction just
seems like a really hard problem. That could mean different CIDs, but
also different subtransaction IDs. Constraint violation errors will
happen at the time of flushing, which could be many commands later from
the one that actually violates the constraint. And what if someone
issues a SELECT in the middle of the transaction, how does it see the
already-inserted-but-not-flushed tuples? If that's not hard enough
already, then you would also need to extend low-level APIs to accept
arbitrary CIDs and subxact IDs when storing tuples during a flush. The
only way I could imagine solving all of these problems is declaring
somehow that your transaction won't do any of these complicated things,
and that you don't mind getting constraint violations at the wrong
time. So I recommend that you punt on this problem.

Regards,
	Jeff Davis

[1]
https://www.postgresql.org/message-id/20230603223824.o7iyochli2dwwi7k%40alap3.anarazel.de
[2]
https://www.postgresql.org/message-id/508af801-6356-d36b-1867-011ac6df8f55%40swarm64.com

^ permalink  raw  reply  [nested|flat] 30+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
  2024-03-21 04:14 Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-21 07:40 ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-23 00:17   ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
@ 2024-03-25 19:58     ` Bharath Rupireddy <[email protected]>
  2024-03-26 15:37       ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  0 siblings, 1 reply; 30+ messages in thread

From: Bharath Rupireddy @ 2024-03-25 19:58 UTC (permalink / raw)
  To: Jeff Davis <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; Michael Paquier <[email protected]>; Matthias van de Meent <[email protected]>

On Sat, Mar 23, 2024 at 5:47 AM Jeff Davis <[email protected]> wrote:
>
> Comments:

Thanks for looking into it.

> * Do I understand correctly that CMV, RMV, and CTAS experience a
> performance benefit, but COPY FROM does not? And is that because COPY
> already used table_multi_insert, whereas CMV and RMV did not?

Yes, that's right. COPY FROM is already optimized with multi inserts.

I now have a feeling that I need to simplify the patches. I'm thinking
of dropping the COPY FROM patch using the new multi insert API for the
following reasons:
1. We can now remove some of the new APIs (table_multi_insert_slots
and table_multi_insert_next_free_slot) that were just invented for
COPY FROM.
2. COPY FROM is already optimized with multi inserts, so no real gain
is expected with the new multi insert API.
3. As we are inching towards feature freeze, simplifying the patches
by having only the necessary things increases the probability of
getting this in.
4. The real benefit of this whole new multi insert API is seen if used
for the commands CMV, RMV, CTAS. These commands got faster by 62.54%,
68.87%, 74.31% or 2.67, 3.21, 3.89 times respectively.
5. This leaves with really simple APIs. No need for callback stuff for
dealing with indexes, triggers etc. as CMV, RMV, CTAS cannot have any
of them.

The new APIs are more extensible, memory management is taken care of
by AM, and with TableModifyState as the structure name and more
meaningful API names. The callback for triggers/indexes etc. aren't
taken care of as I'm now only focusing on CTAS, CMV, RMV
optimizations.

Please see the attached v14 patches.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


Attachments:

  [application/x-patch] v14-0001-Introduce-table-modify-access-methods.patch (12.5K, 2-v14-0001-Introduce-table-modify-access-methods.patch)
  download | inline diff:
From 2de89705c6b2d03020988db0cc8857a0bf19b38e Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Mon, 25 Mar 2024 07:09:25 +0000
Subject: [PATCH v14 1/3] Introduce table modify access methods

---
 src/backend/access/heap/heapam.c         | 163 +++++++++++++++++++++++
 src/backend/access/heap/heapam_handler.c |   6 +
 src/include/access/heapam.h              |  48 +++++++
 src/include/access/tableam.h             | 103 ++++++++++++++
 src/tools/pgindent/typedefs.list         |   4 +
 5 files changed, 324 insertions(+)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 34bc60f625..d1ef2464ef 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -64,6 +64,7 @@
 #include "storage/standby.h"
 #include "utils/datum.h"
 #include "utils/inval.h"
+#include "utils/memutils.h"
 #include "utils/relcache.h"
 #include "utils/snapmgr.h"
 #include "utils/spccache.h"
@@ -2442,6 +2443,168 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 	pgstat_count_heap_insert(relation, ntuples);
 }
 
+/*
+ * Initialize heap modify state.
+ */
+TableModifyState *
+heap_modify_begin(TableModifyKind kind, Relation rel, int flags)
+{
+	TableModifyState *state;
+	MemoryContext context;
+	MemoryContext oldcontext;
+
+	context = AllocSetContextCreate(CurrentMemoryContext,
+									"heap_modify memory context",
+									ALLOCSET_DEFAULT_SIZES);
+
+	oldcontext = MemoryContextSwitchTo(context);
+
+	state = palloc0(sizeof(TableModifyState));
+	state->kind = kind;
+	state->rel = rel;
+	state->flags = flags;
+	state->mctx = context;
+
+	if (kind == TM_KIND_INSERT)
+	{
+		HeapInsertState *istate;
+
+		istate = (HeapInsertState *) palloc0(sizeof(HeapInsertState));
+		istate->bistate = NULL;
+		istate->mistate = NULL;
+		state->data = istate;
+
+		if ((flags & TM_FLAG_MULTI_INSERTS) != 0)
+		{
+			HeapMultiInsertState *mistate;
+
+			mistate = (HeapMultiInsertState *) palloc0(sizeof(HeapMultiInsertState));
+			mistate->slots = (TupleTableSlot **) palloc0(sizeof(TupleTableSlot *) * HEAP_MAX_BUFFERED_SLOTS);
+			istate->mistate = mistate;
+		}
+
+		if ((flags & TM_FLAG_BAS_BULKWRITE) != 0)
+			istate->bistate = GetBulkInsertState();
+	}
+
+	MemoryContextSwitchTo(oldcontext);
+
+	return state;
+}
+
+/*
+ * Store passed-in tuple into in-memory buffered slots. When full, insert
+ * multiple tuples from the buffers into heap.
+ */
+void
+heap_modify_buffer_insert(TableModifyState *state, CommandId cid,
+						  int options, TupleTableSlot *slot)
+{
+	TupleTableSlot *dstslot;
+	HeapInsertState *istate;
+	HeapMultiInsertState *mistate;
+	MemoryContext oldcontext;
+
+	Assert(state->kind == TM_KIND_INSERT);
+	istate = (HeapInsertState *) state->data;
+	Assert(istate->mistate != NULL);
+	mistate = istate->mistate;
+	Assert(istate->bistate != NULL);
+
+	oldcontext = MemoryContextSwitchTo(state->mctx);
+
+	dstslot = mistate->slots[mistate->cur_slots];
+	if (dstslot == NULL)
+	{
+		/*
+		 * We use virtual tuple slots buffered slots for leveraging the
+		 * optimization it provides to minimize physical data copying. The
+		 * virtual slot gets materialized when we copy (via below
+		 * ExecCopySlot) the tuples from the source slot which can be of any
+		 * type. This way, it is ensured that the tuple storage doesn't depend
+		 * on external memory, because all the datums that aren't passed by
+		 * value are copied into the slot's memory context.
+		 */
+		dstslot = MakeTupleTableSlot(RelationGetDescr(state->rel),
+									 &TTSOpsVirtual);
+		mistate->slots[mistate->cur_slots] = dstslot;
+	}
+
+	ExecClearTuple(dstslot);
+	ExecCopySlot(dstslot, slot);
+
+	mistate->cur_slots++;
+
+	/*
+	 * Memory allocated for the whole tuple is in slot's memory context, so
+	 * use it keep track of the total space occupied by all buffered tuples.
+	 */
+	if (TTS_SHOULDFREE(dstslot))
+		mistate->cur_size += MemoryContextMemAllocated(dstslot->tts_mcxt, false);
+
+	if (mistate->cur_slots >= HEAP_MAX_BUFFERED_SLOTS ||
+		mistate->cur_size >= HEAP_MAX_BUFFERED_BYTES)
+		heap_modify_buffer_flush(state, cid, options);
+
+	MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * Insert multiple tuples from in-memory buffered slots into heap.
+ */
+void
+heap_modify_buffer_flush(TableModifyState *state, CommandId cid,
+						 int options)
+{
+	HeapInsertState *istate;
+	HeapMultiInsertState *mistate;
+	MemoryContext oldcontext;
+
+	Assert(state->kind == TM_KIND_INSERT);
+	istate = (HeapInsertState *) state->data;
+	Assert(istate->mistate != NULL);
+	mistate = istate->mistate;
+	Assert(istate->bistate != NULL);
+
+	oldcontext = MemoryContextSwitchTo(state->mctx);
+
+	heap_multi_insert(state->rel, mistate->slots, mistate->cur_slots,
+					  cid, options, istate->bistate);
+
+	mistate->cur_slots = 0;
+	mistate->cur_size = 0;
+
+	MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * Clean heap modify state.
+ */
+void
+heap_modify_end(TableModifyState *state)
+{
+	if (state->kind == TM_KIND_INSERT)
+	{
+		HeapInsertState *istate = (HeapInsertState *) state->data;
+
+		if (istate->mistate != NULL)
+		{
+			HeapMultiInsertState *mistate = istate->mistate;
+
+			Assert(mistate->cur_slots == 0 &&
+				   mistate->cur_size == 0);
+
+			for (int i = 0; i < HEAP_MAX_BUFFERED_SLOTS && mistate->slots[i] != NULL; i++)
+				ExecDropSingleTupleTableSlot(mistate->slots[i]);
+		}
+
+		if (istate->bistate != NULL)
+			FreeBulkInsertState(istate->bistate);
+	}
+
+	MemoryContextDelete(state->mctx);
+}
+
 /*
  *	simple_heap_insert - insert a tuple
  *
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 2b7c702642..4437425de9 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2564,6 +2564,12 @@ static const TableAmRoutine heapam_methods = {
 	.tuple_insert_speculative = heapam_tuple_insert_speculative,
 	.tuple_complete_speculative = heapam_tuple_complete_speculative,
 	.multi_insert = heap_multi_insert,
+
+	.tuple_modify_begin = heap_modify_begin,
+	.tuple_modify_buffer_insert = heap_modify_buffer_insert,
+	.tuple_modify_buffer_flush = heap_modify_buffer_flush,
+	.tuple_modify_end = heap_modify_end,
+
 	.tuple_delete = heapam_tuple_delete,
 	.tuple_update = heapam_tuple_update,
 	.tuple_lock = heapam_tuple_lock,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4b133f6859..2b526550df 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -225,6 +225,38 @@ htsv_get_valid_status(int status)
 	return (HTSV_Result) status;
 }
 
+/*
+ * Maximum number of slots that multi-insert buffers can hold.
+ *
+ * Caution: Don't make this too big, as we could end up with this many tuples
+ * stored in multi insert buffer. For instance, increasing this can cause
+ * quadratic growth in memory requirements during copies into partitioned
+ * tables with a large number of partitions.
+ */
+#define HEAP_MAX_BUFFERED_SLOTS		1000
+
+/* Maximum size of all tuples that multi-insert buffers can hold */
+#define HEAP_MAX_BUFFERED_BYTES		65535
+
+typedef struct HeapMultiInsertState
+{
+	/* Array of buffered slots */
+	TupleTableSlot **slots;
+
+	/* Number of buffered slots currently held */
+	int			cur_slots;
+
+	/* Approximate size of all tuples currently held in buffered slots */
+	Size		cur_size;
+} HeapMultiInsertState;
+
+typedef struct HeapInsertState
+{
+	struct BulkInsertStateData *bistate;
+	HeapMultiInsertState *mistate;
+} HeapInsertState;
+
+
 /* ----------------
  *		function prototypes for heap access method
  *
@@ -275,6 +307,22 @@ extern void heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 extern void heap_multi_insert(Relation relation, struct TupleTableSlot **slots,
 							  int ntuples, CommandId cid, int options,
 							  BulkInsertState bistate);
+
+extern TableModifyState *heap_modify_begin(TableModifyKind kind,
+										   Relation rel,
+										   int flags);
+
+extern void heap_modify_buffer_insert(TableModifyState *state,
+									  CommandId cid,
+									  int options,
+									  TupleTableSlot *slot);
+
+extern void heap_modify_buffer_flush(TableModifyState *state,
+									 CommandId cid,
+									 int options);
+
+extern void heap_modify_end(TableModifyState *state);
+
 extern TM_Result heap_delete(Relation relation, ItemPointer tid,
 							 CommandId cid, Snapshot crosscheck, bool wait,
 							 struct TM_FailureData *tmfd, bool changingPart);
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 65834caeb1..3fc6d93555 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -247,6 +247,33 @@ typedef struct TM_IndexDeleteOp
 	TM_IndexStatus *status;
 } TM_IndexDeleteOp;
 
+/* Holds table modify kind */
+typedef enum TableModifyKind
+{
+	TM_KIND_NONE,
+	TM_KIND_INSERT
+} TableModifyKind;
+
+/* Table modify flags */
+
+/* Use multi inserts, i.e. buffer multiple tuples and insert them at once */
+#define TM_FLAG_MULTI_INSERTS	0x000001
+
+/* Use BAS_BULKWRITE buffer access strategy */
+#define TM_FLAG_BAS_BULKWRITE	0x000002
+
+/* Holds table modify state */
+typedef struct TableModifyState
+{
+	TableModifyKind kind;
+	Relation	rel;
+	int			flags;
+	MemoryContext mctx;
+
+	/* Table AM specific data starts here */
+	void	   *data;
+} TableModifyState;
+
 /* "options" flag bits for table_tuple_insert */
 /* TABLE_INSERT_SKIP_WAL was 0x0001; RelationNeedsWAL() now governs */
 #define TABLE_INSERT_SKIP_FSM		0x0002
@@ -522,6 +549,21 @@ typedef struct TableAmRoutine
 	void		(*multi_insert) (Relation rel, TupleTableSlot **slots, int nslots,
 								 CommandId cid, int options, struct BulkInsertStateData *bistate);
 
+	TableModifyState *(*tuple_modify_begin) (TableModifyKind kind,
+											 Relation rel,
+											 int flags);
+
+	void		(*tuple_modify_buffer_insert) (TableModifyState *state,
+											   CommandId cid,
+											   int options,
+											   TupleTableSlot *slot);
+
+	void		(*tuple_modify_buffer_flush) (TableModifyState *state,
+											  CommandId cid,
+											  int options);
+
+	void		(*tuple_modify_end) (TableModifyState *state);
+
 	/* see table_tuple_delete() for reference about parameters */
 	TM_Result	(*tuple_delete) (Relation rel,
 								 ItemPointer tid,
@@ -1462,6 +1504,67 @@ table_multi_insert(Relation rel, TupleTableSlot **slots, int nslots,
 								  cid, options, bistate);
 }
 
+static inline TableModifyState *
+table_modify_begin(TableModifyKind kind, Relation rel, int flags)
+{
+	if (rel->rd_tableam && rel->rd_tableam->tuple_modify_begin)
+	{
+		return rel->rd_tableam->tuple_modify_begin(kind, rel, flags);
+	}
+	else
+	{
+		elog(ERROR, "table_modify_begin access method is not implemented for relation \"%s\"",
+			 RelationGetRelationName(rel));
+		return NULL;			/* keep compiler quiet */
+	}
+}
+
+static inline void
+table_modify_buffer_insert(TableModifyState *state, CommandId cid,
+						   int options, TupleTableSlot *slot)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_modify_buffer_insert)
+	{
+		state->rel->rd_tableam->tuple_modify_buffer_insert(state,
+														   cid,
+														   options,
+														   slot);
+	}
+	else
+		elog(ERROR, "table_modify_buffer_insert access method is not implemented for relation \"%s\"",
+			 RelationGetRelationName(state->rel));
+}
+
+static inline void
+table_modify_buffer_flush(TableModifyState *state, CommandId cid,
+						  int options)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_modify_buffer_flush)
+	{
+		state->rel->rd_tableam->tuple_modify_buffer_flush(state,
+														  cid,
+														  options);
+	}
+	else
+		elog(ERROR, "table_modify_buffer_flush access method is not implemented for relation \"%s\"",
+			 RelationGetRelationName(state->rel));
+}
+
+static inline void
+table_modify_end(TableModifyState *state)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_modify_end)
+	{
+		state->rel->rd_tableam->tuple_modify_end(state);
+	}
+	else
+		elog(ERROR, "table_modify_end access method is not implemented for relation \"%s\"",
+			 RelationGetRelationName(state->rel));
+}
+
 /*
  * Delete a tuple.
  *
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e2a0525dd4..8396ec4ff0 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1122,6 +1122,8 @@ HeadlineJsonState
 HeadlineParsedText
 HeadlineWordEntry
 HeapCheckContext
+HeapInsertState
+HeapMultiInsertState
 HeapPageFreeze
 HeapScanDesc
 HeapTuple
@@ -2808,6 +2810,8 @@ TableFuncScan
 TableFuncScanState
 TableInfo
 TableLikeClause
+TableModifyKind
+TableModifyState
 TableSampleClause
 TableScanDesc
 TableScanDescData
-- 
2.34.1



  [application/x-patch] v14-0002-Optimize-CREATE-TABLE-AS-with-multi-inserts.patch (2.4K, 3-v14-0002-Optimize-CREATE-TABLE-AS-with-multi-inserts.patch)
  download | inline diff:
From 83d4b28c44aa02b1a6ac414998128b14e7fb6193 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Mon, 25 Mar 2024 07:09:47 +0000
Subject: [PATCH v14 2/3] Optimize CREATE TABLE AS with multi inserts

---
 src/backend/commands/createas.c | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 62050f4dc5..e11af16523 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -55,7 +55,7 @@ typedef struct
 	ObjectAddress reladdr;		/* address of rel, for ExecCreateTableAs */
 	CommandId	output_cid;		/* cmin to insert in output tuples */
 	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableModifyState *mstate;	/* table insert state */
 } DR_intorel;
 
 /* utility functions for CTAS definition creation */
@@ -560,9 +560,12 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	 * bulk inserts as there are no tuples to insert.
 	 */
 	if (!into->skipData)
-		myState->bistate = GetBulkInsertState();
+		myState->mstate = table_modify_begin(TM_KIND_INSERT,
+											 intoRelationDesc,
+											 TM_FLAG_MULTI_INSERTS |
+											 TM_FLAG_BAS_BULKWRITE);
 	else
-		myState->bistate = NULL;
+		myState->mstate = NULL;
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -590,11 +593,10 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
 		 * would not be cheap either. This also doesn't allow accessing per-AM
 		 * data (say a tuple's xmin), but since we don't do that here...
 		 */
-		table_tuple_insert(myState->rel,
-						   slot,
-						   myState->output_cid,
-						   myState->ti_options,
-						   myState->bistate);
+		table_modify_buffer_insert(myState->mstate,
+								   myState->output_cid,
+								   myState->ti_options,
+								   slot);
 	}
 
 	/* We know this is a newly created relation, so there are no indexes */
@@ -613,8 +615,9 @@ intorel_shutdown(DestReceiver *self)
 
 	if (!into->skipData)
 	{
-		FreeBulkInsertState(myState->bistate);
-		table_finish_bulk_insert(myState->rel, myState->ti_options);
+		table_modify_buffer_flush(myState->mstate, myState->output_cid,
+								  myState->ti_options);
+		table_modify_end(myState->mstate);
 	}
 
 	/* close rel, but keep lock until commit */
-- 
2.34.1



  [application/x-patch] v14-0003-Optimize-REFRESH-MATERIALIZED-VIEW-with-multi-in.patch (2.5K, 4-v14-0003-Optimize-REFRESH-MATERIALIZED-VIEW-with-multi-in.patch)
  download | inline diff:
From 051b4015942475b84b204846b00941dc752995d3 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Mon, 25 Mar 2024 07:10:07 +0000
Subject: [PATCH v14 3/3] Optimize REFRESH MATERIALIZED VIEW with multi inserts

---
 src/backend/commands/matview.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 6d09b75556..cabc5bc80b 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -50,7 +50,7 @@ typedef struct
 	Relation	transientrel;	/* relation to write to */
 	CommandId	output_cid;		/* cmin to insert in output tuples */
 	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableModifyState *mstate;	/* table insert state */
 } DR_transientrel;
 
 static int	matview_maintenance_depth = 0;
@@ -460,7 +460,10 @@ transientrel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	myState->transientrel = transientrel;
 	myState->output_cid = GetCurrentCommandId(true);
 	myState->ti_options = TABLE_INSERT_SKIP_FSM | TABLE_INSERT_FROZEN;
-	myState->bistate = GetBulkInsertState();
+	myState->mstate = table_modify_begin(TM_KIND_INSERT,
+										 transientrel,
+										 TM_FLAG_MULTI_INSERTS |
+										 TM_FLAG_BAS_BULKWRITE);
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -485,12 +488,10 @@ transientrel_receive(TupleTableSlot *slot, DestReceiver *self)
 	 * cheap either. This also doesn't allow accessing per-AM data (say a
 	 * tuple's xmin), but since we don't do that here...
 	 */
-
-	table_tuple_insert(myState->transientrel,
-					   slot,
-					   myState->output_cid,
-					   myState->ti_options,
-					   myState->bistate);
+	table_modify_buffer_insert(myState->mstate,
+							   myState->output_cid,
+							   myState->ti_options,
+							   slot);
 
 	/* We know this is a newly created relation, so there are no indexes */
 
@@ -505,9 +506,9 @@ transientrel_shutdown(DestReceiver *self)
 {
 	DR_transientrel *myState = (DR_transientrel *) self;
 
-	FreeBulkInsertState(myState->bistate);
-
-	table_finish_bulk_insert(myState->transientrel, myState->ti_options);
+	table_modify_buffer_flush(myState->mstate, myState->output_cid,
+							  myState->ti_options);
+	table_modify_end(myState->mstate);
 
 	/* close transientrel, but keep lock until commit */
 	table_close(myState->transientrel, NoLock);
-- 
2.34.1



^ permalink  raw  reply  [nested|flat] 30+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
  2024-03-21 04:14 Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-21 07:40 ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-23 00:17   ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-25 19:58     ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
@ 2024-03-26 15:37       ` Jeff Davis <[email protected]>
  2024-03-26 19:49         ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  0 siblings, 1 reply; 30+ messages in thread

From: Jeff Davis @ 2024-03-26 15:37 UTC (permalink / raw)
  To: Bharath Rupireddy <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; Michael Paquier <[email protected]>; Matthias van de Meent <[email protected]>

On Tue, 2024-03-26 at 01:28 +0530, Bharath Rupireddy wrote:
> I'm thinking
> of dropping the COPY FROM patch using the new multi insert API for
> the
> following reasons: ...

I agree with all of this. We do want COPY ... FROM support, but there
are some details to work out and we don't want to make a big code
change at this point in the cycle.

> The new APIs are more extensible, memory management is taken care of
> by AM, and with TableModifyState as the structure name and more
> meaningful API names. The callback for triggers/indexes etc. aren't
> taken care of as I'm now only focusing on CTAS, CMV, RMV
> optimizations.
> 
> Please see the attached v14 patches.

* No need for a 'kind' field in TableModifyState. The state should be
aware of the kinds of changes that it has received and that may need to
be flushed later -- for now, only inserts, but possibly updates/deletes
in the future.

* If the AM doesn't support the bulk methods, fall back to retail
inserts instead of throwing an error.

* It seems like this API will eventually replace table_multi_insert and
table_finish_bulk_insert completely. Do those APIs have any advantage
remaining over the new one proposed here?

* Right now I don't any important use of the flush method. It seems
that could be accomplished in the finish method, and flush could just
be an internal detail when the memory is exhausted. If we find a use
for it later, we can always add it, but right now it seems unnecessary.

* We need to be careful about cases where the command can be successful
but the writes are not flushed. I don't tihnk that's a problem with the
current patch, but we will need to do something here when we expand to
INSERT INTO ... SELECT.

Andres, is this patch overall closer to what you had in mind in the
email here:

https://www.postgresql.org/message-id/[email protected]

?

Regards,
	Jeff Davis

^ permalink  raw  reply  [nested|flat] 30+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
  2024-03-21 04:14 Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-21 07:40 ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-23 00:17   ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-25 19:58     ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-26 15:37       ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
@ 2024-03-26 19:49         ` Bharath Rupireddy <[email protected]>
  2024-03-27 08:12           ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  0 siblings, 1 reply; 30+ messages in thread

From: Bharath Rupireddy @ 2024-03-26 19:49 UTC (permalink / raw)
  To: Jeff Davis <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; Michael Paquier <[email protected]>; Matthias van de Meent <[email protected]>

On Tue, Mar 26, 2024 at 9:07 PM Jeff Davis <[email protected]> wrote:
>
> On Tue, 2024-03-26 at 01:28 +0530, Bharath Rupireddy wrote:
> > I'm thinking
> > of dropping the COPY FROM patch using the new multi insert API for
> > the
> > following reasons: ...
>
> I agree with all of this. We do want COPY ... FROM support, but there
> are some details to work out and we don't want to make a big code
> change at this point in the cycle.

Right.

> > Please see the attached v14 patches.
>
> * No need for a 'kind' field in TableModifyState. The state should be
> aware of the kinds of changes that it has received and that may need to
> be flushed later -- for now, only inserts, but possibly updates/deletes
> in the future.

Removed 'kind' field with lazy initialization of required AM specific
modify (insert in this case) state. Since we don't have 'kind', I
chose the callback approach to cleanup the modify (insert in this
case) specific state at the end.

> * If the AM doesn't support the bulk methods, fall back to retail
> inserts instead of throwing an error.

For instance, CREATE MATERIALIZED VIEW foo_mv AS SELECT * FROM foo
USING bar_tam; doesn't work if bar_tam doesn't have the
table_tuple_insert implemented.

Similarly, with this new AM, the onus lies on the table AM
implementers to provide an implementation for these new AMs even if
they just do single inserts. But, I do agree that we must catch this
ahead during parse analysis itself, so I've added assertions in
GetTableAmRoutine().

> * It seems like this API will eventually replace table_multi_insert and
> table_finish_bulk_insert completely. Do those APIs have any advantage
> remaining over the new one proposed here?

table_multi_insert needs to be there for sure as COPY ... FROM uses
it. Not sure if we need to remove the optional callback
table_finish_bulk_insert though. Heap AM doesn't implement one, but
some other AM might. Having said that, with this new AM, whatever the
logic that used to be there in table_finish_bulk_insert previously,
table AM implementers will have to move them to table_modify_end.

FWIW, I can try writing a test table AM that uses this new AM but just
does single inserts, IOW, equivalent to table_tuple_insert().
Thoughts?

> * Right now I don't any important use of the flush method. It seems
> that could be accomplished in the finish method, and flush could just
> be an internal detail when the memory is exhausted. If we find a use
> for it later, we can always add it, but right now it seems unnecessary.

Firstly, we are not storing CommandId and options in TableModifyState,
because we expect CommandId to be changing (per Andres comment).
Secondly, we don't want to pass just the CommandId and options to
table_modify_end(). Thirdly, one just has to call the
table_modify_buffer_flush before the table_modify_end. Do you have any
other thoughts here?

> * We need to be careful about cases where the command can be successful
> but the writes are not flushed. I don't tihnk that's a problem with the
> current patch, but we will need to do something here when we expand to
> INSERT INTO ... SELECT.

You mean, writes are not flushed to the disk? Can you please elaborate
why it's different for INSERT INTO ... SELECT and not others? Can't
the new flush AM be helpful here to implement any flush related
things?

Please find the attached v15 patches with the above review comments addressed.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


Attachments:

  [application/octet-stream] v15-0001-Introduce-new-table-modify-access-methods.patch (12.7K, 2-v15-0001-Introduce-new-table-modify-access-methods.patch)
  download | inline diff:
From 99f5814ca8f561b09777dd0e7e06b2b0198751f3 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Tue, 26 Mar 2024 19:33:29 +0000
Subject: [PATCH v15 1/3] Introduce new table modify access methods

---
 src/backend/access/heap/heapam.c         | 178 ++++++++++++++++++++++-
 src/backend/access/heap/heapam_handler.c |   6 +
 src/backend/access/table/tableamapi.c    |   5 +
 src/include/access/heapam.h              |  45 ++++++
 src/include/access/tableam.h             |  67 +++++++++
 src/tools/pgindent/typedefs.list         |   3 +
 6 files changed, 303 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 9a8c8e3348..f0d5cf5b5a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -64,6 +64,7 @@
 #include "storage/standby.h"
 #include "utils/datum.h"
 #include "utils/inval.h"
+#include "utils/memutils.h"
 #include "utils/relcache.h"
 #include "utils/snapmgr.h"
 #include "utils/spccache.h"
@@ -107,7 +108,7 @@ static int	bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate);
 static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
 static HeapTuple ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required,
 										bool *copy);
-
+static void heap_modify_insert_end(TableModifyState *state);
 
 /*
  * Each tuple lock mode has a corresponding heavyweight lock, and one or two
@@ -2439,6 +2440,181 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 	pgstat_count_heap_insert(relation, ntuples);
 }
 
+/*
+ * Initialize heap modify state.
+ */
+TableModifyState *
+heap_modify_begin(Relation rel, int flags)
+{
+	TableModifyState *state;
+	MemoryContext context;
+	MemoryContext oldcontext;
+
+	context = AllocSetContextCreate(CurrentMemoryContext,
+									"heap_modify memory context",
+									ALLOCSET_DEFAULT_SIZES);
+
+	oldcontext = MemoryContextSwitchTo(context);
+	state = palloc0(sizeof(TableModifyState));
+	state->rel = rel;
+	state->flags = flags;
+	state->mctx = context;
+	state->end_cb = NULL;		/* To be installed lazily */
+	MemoryContextSwitchTo(oldcontext);
+
+	return state;
+}
+
+/*
+ * Store passed-in tuple into in-memory buffered slots. When full, insert
+ * multiple tuples from the buffers into heap.
+ */
+void
+heap_modify_buffer_insert(TableModifyState *state, CommandId cid,
+						  int options, TupleTableSlot *slot)
+{
+	TupleTableSlot *dstslot;
+	HeapInsertState *istate;
+	HeapMultiInsertState *mistate;
+	MemoryContext oldcontext;
+
+	oldcontext = MemoryContextSwitchTo(state->mctx);
+
+	/* First time through, initialize heap insert state */
+	if (state->data == NULL)
+	{
+		istate = (HeapInsertState *) palloc0(sizeof(HeapInsertState));
+		istate->bistate = NULL;
+		istate->mistate = NULL;
+		state->data = istate;
+
+		if ((state->flags & TM_FLAG_MULTI_INSERTS) != 0)
+		{
+			mistate = (HeapMultiInsertState *) palloc0(sizeof(HeapMultiInsertState));
+			mistate->slots = (TupleTableSlot **) palloc0(sizeof(TupleTableSlot *) * HEAP_MAX_BUFFERED_SLOTS);
+			istate->mistate = mistate;
+		}
+
+		if ((state->flags & TM_FLAG_BAS_BULKWRITE) != 0)
+			istate->bistate = GetBulkInsertState();
+
+		state->end_cb = heap_modify_insert_end;
+	}
+
+	istate = (HeapInsertState *) state->data;
+	Assert(istate->mistate != NULL);
+	mistate = istate->mistate;
+	Assert(istate->bistate != NULL);
+
+	dstslot = mistate->slots[mistate->cur_slots];
+	if (dstslot == NULL)
+	{
+		/*
+		 * We use virtual tuple slots buffered slots for leveraging the
+		 * optimization it provides to minimize physical data copying. The
+		 * virtual slot gets materialized when we copy (via below
+		 * ExecCopySlot) the tuples from the source slot which can be of any
+		 * type. This way, it is ensured that the tuple storage doesn't depend
+		 * on external memory, because all the datums that aren't passed by
+		 * value are copied into the slot's memory context.
+		 */
+		dstslot = MakeTupleTableSlot(RelationGetDescr(state->rel),
+									 &TTSOpsVirtual);
+		mistate->slots[mistate->cur_slots] = dstslot;
+	}
+
+	ExecClearTuple(dstslot);
+	ExecCopySlot(dstslot, slot);
+
+	mistate->cur_slots++;
+
+	/*
+	 * Memory allocated for the whole tuple is in slot's memory context, so
+	 * use it keep track of the total space occupied by all buffered tuples.
+	 */
+	if (TTS_SHOULDFREE(dstslot))
+		mistate->cur_size += MemoryContextMemAllocated(dstslot->tts_mcxt, false);
+
+	if (mistate->cur_slots >= HEAP_MAX_BUFFERED_SLOTS ||
+		mistate->cur_size >= HEAP_MAX_BUFFERED_BYTES)
+		heap_modify_buffer_flush(state, cid, options);
+
+	MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * Insert multiple tuples from in-memory buffered slots into heap.
+ */
+void
+heap_modify_buffer_flush(TableModifyState *state, CommandId cid,
+						 int options)
+{
+	HeapInsertState *istate;
+	HeapMultiInsertState *mistate;
+	MemoryContext oldcontext;
+
+	/* Quick exit if we haven't inserted anything yet */
+	if (state->data == NULL)
+		return;
+
+	istate = (HeapInsertState *) state->data;
+	Assert(istate->mistate != NULL);
+	mistate = istate->mistate;
+	Assert(istate->bistate != NULL);
+
+	oldcontext = MemoryContextSwitchTo(state->mctx);
+
+	heap_multi_insert(state->rel, mistate->slots, mistate->cur_slots,
+					  cid, options, istate->bistate);
+
+	mistate->cur_slots = 0;
+	mistate->cur_size = 0;
+
+	MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * Heap insert specific callback used for cleaning up the insert state and
+ * buffered slots.
+ */
+static void
+heap_modify_insert_end(TableModifyState *state)
+{
+	HeapInsertState *istate;
+
+	/* Quick exit if we haven't inserted anything yet */
+	if (state->data == NULL)
+		return;
+
+	istate = (HeapInsertState *) state->data;
+
+	if (istate->mistate != NULL)
+	{
+		HeapMultiInsertState *mistate = istate->mistate;
+
+		Assert(mistate->cur_slots == 0 &&
+			   mistate->cur_size == 0);
+
+		for (int i = 0; i < HEAP_MAX_BUFFERED_SLOTS && mistate->slots[i] != NULL; i++)
+			ExecDropSingleTupleTableSlot(mistate->slots[i]);
+	}
+
+	if (istate->bistate != NULL)
+		FreeBulkInsertState(istate->bistate);
+}
+
+/*
+ * Clean heap modify state.
+ */
+void
+heap_modify_end(TableModifyState *state)
+{
+	if (state->end_cb != NULL)
+		state->end_cb(state);
+
+	MemoryContextDelete(state->mctx);
+}
+
 /*
  *	simple_heap_insert - insert a tuple
  *
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 6abfe36dec..52ccf8377f 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2622,6 +2622,12 @@ static const TableAmRoutine heapam_methods = {
 	.tuple_insert_speculative = heapam_tuple_insert_speculative,
 	.tuple_complete_speculative = heapam_tuple_complete_speculative,
 	.multi_insert = heap_multi_insert,
+
+	.tuple_modify_begin = heap_modify_begin,
+	.tuple_modify_buffer_insert = heap_modify_buffer_insert,
+	.tuple_modify_buffer_flush = heap_modify_buffer_flush,
+	.tuple_modify_end = heap_modify_end,
+
 	.tuple_delete = heapam_tuple_delete,
 	.tuple_update = heapam_tuple_update,
 	.tuple_lock = heapam_tuple_lock,
diff --git a/src/backend/access/table/tableamapi.c b/src/backend/access/table/tableamapi.c
index ce637a5a5d..3104d19fd8 100644
--- a/src/backend/access/table/tableamapi.c
+++ b/src/backend/access/table/tableamapi.c
@@ -64,6 +64,11 @@ GetTableAmRoutine(Oid amhandler)
 
 	Assert(routine->tuple_insert != NULL);
 
+	Assert(routine->tuple_modify_begin != NULL);
+	Assert(routine->tuple_modify_buffer_insert != NULL);
+	Assert(routine->tuple_modify_buffer_flush != NULL);
+	Assert(routine->tuple_modify_insert_end != NULL);
+
 	/*
 	 * Could be made optional, but would require throwing error during
 	 * parse-analysis.
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index f112245373..26403342db 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -233,6 +233,36 @@ htsv_get_valid_status(int status)
 	return (HTSV_Result) status;
 }
 
+/*
+ * Maximum number of slots that multi-insert buffers can hold.
+ *
+ * Caution: Don't make this too big, as we could end up with this many tuples
+ * stored in multi insert buffer.
+ */
+#define HEAP_MAX_BUFFERED_SLOTS		1000
+
+/* Maximum size of all tuples that multi-insert buffers can hold */
+#define HEAP_MAX_BUFFERED_BYTES		65535
+
+typedef struct HeapMultiInsertState
+{
+	/* Array of buffered slots */
+	TupleTableSlot **slots;
+
+	/* Number of buffered slots currently held */
+	int			cur_slots;
+
+	/* Approximate size of all tuples currently held in buffered slots */
+	Size		cur_size;
+} HeapMultiInsertState;
+
+typedef struct HeapInsertState
+{
+	struct BulkInsertStateData *bistate;
+	HeapMultiInsertState *mistate;
+} HeapInsertState;
+
+
 /* ----------------
  *		function prototypes for heap access method
  *
@@ -283,6 +313,21 @@ extern void heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 extern void heap_multi_insert(Relation relation, struct TupleTableSlot **slots,
 							  int ntuples, CommandId cid, int options,
 							  BulkInsertState bistate);
+
+extern TableModifyState *heap_modify_begin(Relation rel,
+										   int flags);
+
+extern void heap_modify_buffer_insert(TableModifyState *state,
+									  CommandId cid,
+									  int options,
+									  TupleTableSlot *slot);
+
+extern void heap_modify_buffer_flush(TableModifyState *state,
+									 CommandId cid,
+									 int options);
+
+extern void heap_modify_end(TableModifyState *state);
+
 extern TM_Result heap_delete(Relation relation, ItemPointer tid,
 							 CommandId cid, Snapshot crosscheck, int options,
 							 struct TM_FailureData *tmfd, bool changingPart,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index fc0e702715..109cbb769b 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -247,6 +247,32 @@ typedef struct TM_IndexDeleteOp
 	TM_IndexStatus *status;
 } TM_IndexDeleteOp;
 
+/* Table modify flags */
+
+/* Use multi inserts, i.e. buffer multiple tuples and insert them at once */
+#define TM_FLAG_MULTI_INSERTS	0x000001
+
+/* Use BAS_BULKWRITE buffer access strategy */
+#define TM_FLAG_BAS_BULKWRITE	0x000002
+
+struct TableModifyState;
+
+/* Table AM specific callback that gets called in table_modify_end() */
+typedef void (*TableModifyEndCP) (struct TableModifyState *state);
+
+/* Holds table modify state */
+typedef struct TableModifyState
+{
+	Relation	rel;
+	int			flags;
+	MemoryContext mctx;
+
+	/* Table AM specific data starts here */
+	void	   *data;
+
+	TableModifyEndCP end_cb;
+} TableModifyState;
+
 /* "options" flag bits for table_tuple_insert */
 /* TABLE_INSERT_SKIP_WAL was 0x0001; RelationNeedsWAL() now governs */
 #define TABLE_INSERT_SKIP_FSM		0x0002
@@ -531,6 +557,20 @@ typedef struct TableAmRoutine
 	void		(*multi_insert) (Relation rel, TupleTableSlot **slots, int nslots,
 								 CommandId cid, int options, struct BulkInsertStateData *bistate);
 
+	TableModifyState *(*tuple_modify_begin) (Relation rel,
+											 int flags);
+
+	void		(*tuple_modify_buffer_insert) (TableModifyState *state,
+											   CommandId cid,
+											   int options,
+											   TupleTableSlot *slot);
+
+	void		(*tuple_modify_buffer_flush) (TableModifyState *state,
+											  CommandId cid,
+											  int options);
+
+	void		(*tuple_modify_end) (TableModifyState *state);
+
 	/* see table_tuple_delete() for reference about parameters */
 	TM_Result	(*tuple_delete) (Relation rel,
 								 ItemPointer tid,
@@ -1473,6 +1513,33 @@ table_multi_insert(Relation rel, TupleTableSlot **slots, int nslots,
 								  cid, options, bistate);
 }
 
+static inline TableModifyState *
+table_modify_begin(Relation rel, int flags)
+{
+	return rel->rd_tableam->tuple_modify_begin(rel, flags);
+}
+
+static inline void
+table_modify_buffer_insert(TableModifyState *state, CommandId cid,
+						   int options, TupleTableSlot *slot)
+{
+	state->rel->rd_tableam->tuple_modify_buffer_insert(state, cid,
+													   options, slot);
+}
+
+static inline void
+table_modify_buffer_flush(TableModifyState *state, CommandId cid,
+						  int options)
+{
+	state->rel->rd_tableam->tuple_modify_buffer_flush(state, cid, options);
+}
+
+static inline void
+table_modify_end(TableModifyState *state)
+{
+	state->rel->rd_tableam->tuple_modify_end(state);
+}
+
 /*
  * Delete a tuple (and optionally lock the last tuple version).
  *
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index cfa9d5aaea..8ce8aae955 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1122,6 +1122,8 @@ HeadlineJsonState
 HeadlineParsedText
 HeadlineWordEntry
 HeapCheckContext
+HeapInsertState
+HeapMultiInsertState
 HeapPageFreeze
 HeapScanDesc
 HeapTuple
@@ -2808,6 +2810,7 @@ TableFuncScan
 TableFuncScanState
 TableInfo
 TableLikeClause
+TableModifyState
 TableSampleClause
 TableScanDesc
 TableScanDescData
-- 
2.34.1



  [application/octet-stream] v15-0002-Optimize-CREATE-TABLE-AS-with-multi-inserts.patch (2.4K, 3-v15-0002-Optimize-CREATE-TABLE-AS-with-multi-inserts.patch)
  download | inline diff:
From 1fe2bce6be41f587be83a839fcb605f853e3192d Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Tue, 26 Mar 2024 19:33:51 +0000
Subject: [PATCH v15 2/3] Optimize CREATE TABLE AS with multi inserts

---
 src/backend/commands/createas.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 62050f4dc5..0201f81624 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -55,7 +55,7 @@ typedef struct
 	ObjectAddress reladdr;		/* address of rel, for ExecCreateTableAs */
 	CommandId	output_cid;		/* cmin to insert in output tuples */
 	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableModifyState *mstate;	/* table insert state */
 } DR_intorel;
 
 /* utility functions for CTAS definition creation */
@@ -560,9 +560,11 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	 * bulk inserts as there are no tuples to insert.
 	 */
 	if (!into->skipData)
-		myState->bistate = GetBulkInsertState();
+		myState->mstate = table_modify_begin(intoRelationDesc,
+											 TM_FLAG_MULTI_INSERTS |
+											 TM_FLAG_BAS_BULKWRITE);
 	else
-		myState->bistate = NULL;
+		myState->mstate = NULL;
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -590,11 +592,10 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
 		 * would not be cheap either. This also doesn't allow accessing per-AM
 		 * data (say a tuple's xmin), but since we don't do that here...
 		 */
-		table_tuple_insert(myState->rel,
-						   slot,
-						   myState->output_cid,
-						   myState->ti_options,
-						   myState->bistate);
+		table_modify_buffer_insert(myState->mstate,
+								   myState->output_cid,
+								   myState->ti_options,
+								   slot);
 	}
 
 	/* We know this is a newly created relation, so there are no indexes */
@@ -613,8 +614,9 @@ intorel_shutdown(DestReceiver *self)
 
 	if (!into->skipData)
 	{
-		FreeBulkInsertState(myState->bistate);
-		table_finish_bulk_insert(myState->rel, myState->ti_options);
+		table_modify_buffer_flush(myState->mstate, myState->output_cid,
+								  myState->ti_options);
+		table_modify_end(myState->mstate);
 	}
 
 	/* close rel, but keep lock until commit */
-- 
2.34.1



  [application/octet-stream] v15-0003-Optimize-REFRESH-MATERIALIZED-VIEW-with-multi-in.patch (2.4K, 4-v15-0003-Optimize-REFRESH-MATERIALIZED-VIEW-with-multi-in.patch)
  download | inline diff:
From a157ab25f43428681c5d9016e9555612617e4d9b Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Tue, 26 Mar 2024 19:34:29 +0000
Subject: [PATCH v15 3/3] Optimize REFRESH MATERIALIZED VIEW with multi inserts

---
 src/backend/commands/matview.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 6d09b75556..560a359de3 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -50,7 +50,7 @@ typedef struct
 	Relation	transientrel;	/* relation to write to */
 	CommandId	output_cid;		/* cmin to insert in output tuples */
 	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableModifyState *mstate;	/* table insert state */
 } DR_transientrel;
 
 static int	matview_maintenance_depth = 0;
@@ -460,7 +460,9 @@ transientrel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	myState->transientrel = transientrel;
 	myState->output_cid = GetCurrentCommandId(true);
 	myState->ti_options = TABLE_INSERT_SKIP_FSM | TABLE_INSERT_FROZEN;
-	myState->bistate = GetBulkInsertState();
+	myState->mstate = table_modify_begin(transientrel,
+										 TM_FLAG_MULTI_INSERTS |
+										 TM_FLAG_BAS_BULKWRITE);
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -485,12 +487,10 @@ transientrel_receive(TupleTableSlot *slot, DestReceiver *self)
 	 * cheap either. This also doesn't allow accessing per-AM data (say a
 	 * tuple's xmin), but since we don't do that here...
 	 */
-
-	table_tuple_insert(myState->transientrel,
-					   slot,
-					   myState->output_cid,
-					   myState->ti_options,
-					   myState->bistate);
+	table_modify_buffer_insert(myState->mstate,
+							   myState->output_cid,
+							   myState->ti_options,
+							   slot);
 
 	/* We know this is a newly created relation, so there are no indexes */
 
@@ -505,9 +505,9 @@ transientrel_shutdown(DestReceiver *self)
 {
 	DR_transientrel *myState = (DR_transientrel *) self;
 
-	FreeBulkInsertState(myState->bistate);
-
-	table_finish_bulk_insert(myState->transientrel, myState->ti_options);
+	table_modify_buffer_flush(myState->mstate, myState->output_cid,
+							  myState->ti_options);
+	table_modify_end(myState->mstate);
 
 	/* close transientrel, but keep lock until commit */
 	table_close(myState->transientrel, NoLock);
-- 
2.34.1



^ permalink  raw  reply  [nested|flat] 30+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
  2024-03-21 04:14 Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-21 07:40 ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-23 00:17   ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-25 19:58     ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-26 15:37       ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-26 19:49         ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
@ 2024-03-27 08:12           ` Jeff Davis <[email protected]>
  2024-03-31 15:48             ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  0 siblings, 1 reply; 30+ messages in thread

From: Jeff Davis @ 2024-03-27 08:12 UTC (permalink / raw)
  To: Bharath Rupireddy <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; Michael Paquier <[email protected]>; Matthias van de Meent <[email protected]>

On Wed, 2024-03-27 at 01:19 +0530, Bharath Rupireddy wrote:
> 
> Similarly, with this new AM, the onus lies on the table AM
> implementers to provide an implementation for these new AMs even if
> they just do single inserts.

Why not fall back to using the plain tuple_insert? Surely some table
AMs might be simple and limited, and we shouldn't break them just
because they don't implement the new APIs.

> 
> table_multi_insert needs to be there for sure as COPY ... FROM uses
> it.

After we have these new APIs fully in place and used by COPY, what will
happen to those other APIs? Will they be deprecated or will there be a
reason to keep them?

> FWIW, I can try writing a test table AM that uses this new AM but
> just
> does single inserts, IOW, equivalent to table_tuple_insert().
> Thoughts?

More table AMs to test against would be great, but I also know that can
be a lot of work.

> 
> Firstly, we are not storing CommandId and options in
> TableModifyState,
> because we expect CommandId to be changing (per Andres comment).

Trying to make this feature work across multiple commands poses a lot
of challenges: what happens when there are SELECTs and subtransactions
and non-deferrable constraints?

Regardless, if we care about multiple CIDs, they should be stored along
with the tuples, not supplied at the time of flushing.

> You mean, writes are not flushed to the disk? Can you please
> elaborate
> why it's different for INSERT INTO ... SELECT and not others? Can't
> the new flush AM be helpful here to implement any flush related
> things?

Not a major problem. We can discuss while working on IIS support.


I am concnerned that the flush callback is not a part of the API. We
will clearly need that to support index insertions for COPY/IIS, so as-
is the API feels incomplete. Thoughts?

Regards,
	Jeff Davis








^ permalink  raw  reply  [nested|flat] 30+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
  2024-03-21 04:14 Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-21 07:40 ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-23 00:17   ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-25 19:58     ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-26 15:37       ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-26 19:49         ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-27 08:12           ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
@ 2024-03-31 15:48             ` Bharath Rupireddy <[email protected]>
  2024-04-02 19:40               ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  0 siblings, 1 reply; 30+ messages in thread

From: Bharath Rupireddy @ 2024-03-31 15:48 UTC (permalink / raw)
  To: Jeff Davis <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; Michael Paquier <[email protected]>; Matthias van de Meent <[email protected]>

On Wed, Mar 27, 2024 at 1:42 PM Jeff Davis <[email protected]> wrote:
>
> On Wed, 2024-03-27 at 01:19 +0530, Bharath Rupireddy wrote:
> >
> > Similarly, with this new AM, the onus lies on the table AM
> > implementers to provide an implementation for these new AMs even if
> > they just do single inserts.
>
> Why not fall back to using the plain tuple_insert? Surely some table
> AMs might be simple and limited, and we shouldn't break them just
> because they don't implement the new APIs.

Hm. That might complicate table_modify_begin,
table_modify_buffer_insert and table_modify_end a bit. What do we put
in TableModifyState then? Do we create the bulk insert state
(BulkInsertStateData) outside? I think to give a better interface, can
we let TAM implementers support these new APIs in their own way? If
this sounds rather intrusive, we can just implement the fallback to
tuple_insert if these new API are not supported in the caller, for
example, do something like below in createas.c and matview.c.
Thoughts?

if (table_modify_buffer_insert() is defined)
   table_modify_buffer_insert(...);
else
{
  myState->bistate = GetBulkInsertState();
  table_tuple_insert(...);
}

> > table_multi_insert needs to be there for sure as COPY ... FROM uses
> > it.
>
> After we have these new APIs fully in place and used by COPY, what will
> happen to those other APIs? Will they be deprecated or will there be a
> reason to keep them?

Deprecated perhaps?

Please find the attached v16 patches for further review.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


Attachments:

  [application/x-patch] v16-0001-Introduce-new-table-modify-access-methods.patch (12.6K, 2-v16-0001-Introduce-new-table-modify-access-methods.patch)
  download | inline diff:
From 85410b429917cf388c4b58883ddc304118c73143 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Sun, 31 Mar 2024 15:34:16 +0000
Subject: [PATCH v16 1/2] Introduce new table modify access methods

---
 src/backend/access/heap/heapam.c         | 189 ++++++++++++++++++++++-
 src/backend/access/heap/heapam_handler.c |   5 +
 src/backend/access/table/tableamapi.c    |   4 +
 src/include/access/heapam.h              |  41 +++++
 src/include/access/tableam.h             |  58 +++++++
 src/tools/pgindent/typedefs.list         |   3 +
 6 files changed, 299 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index b661d9811e..69f8c597d8 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -64,6 +64,7 @@
 #include "storage/standby.h"
 #include "utils/datum.h"
 #include "utils/inval.h"
+#include "utils/memutils.h"
 #include "utils/relcache.h"
 #include "utils/snapmgr.h"
 #include "utils/spccache.h"
@@ -107,7 +108,8 @@ static int	bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate);
 static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
 static HeapTuple ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required,
 										bool *copy);
-
+static void heap_modify_buffer_flush(TableModifyState *state);
+static void heap_modify_insert_end(TableModifyState *state);
 
 /*
  * Each tuple lock mode has a corresponding heavyweight lock, and one or two
@@ -2441,6 +2443,191 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 	*insert_indexes = true;
 }
 
+/*
+ * Initialize heap modify state.
+ */
+TableModifyState *
+heap_modify_begin(Relation rel, int modify_flags, CommandId cid,
+				  int options)
+{
+	TableModifyState *state;
+	MemoryContext context;
+	MemoryContext oldcontext;
+
+	context = AllocSetContextCreate(CurrentMemoryContext,
+									"heap_modify memory context",
+									ALLOCSET_DEFAULT_SIZES);
+
+	oldcontext = MemoryContextSwitchTo(context);
+	state = palloc0(sizeof(TableModifyState));
+	state->rel = rel;
+	state->modify_flags = modify_flags;
+	state->mctx = context;
+	state->cid = cid;
+	state->options = options;
+	state->insert_indexes = false;
+	state->modify_end_cb = NULL;	/* To be installed lazily */
+	MemoryContextSwitchTo(oldcontext);
+
+	return state;
+}
+
+/*
+ * Store passed-in tuple into in-memory buffered slots. When full, insert
+ * multiple tuples from the buffers into heap.
+ */
+void
+heap_modify_buffer_insert(TableModifyState *state,
+						  TupleTableSlot *slot)
+{
+	TupleTableSlot *dstslot;
+	HeapInsertState *istate;
+	HeapMultiInsertState *mistate;
+	MemoryContext oldcontext;
+
+	oldcontext = MemoryContextSwitchTo(state->mctx);
+
+	/* First time through, initialize heap insert state */
+	if (state->data == NULL)
+	{
+		istate = (HeapInsertState *) palloc0(sizeof(HeapInsertState));
+		istate->bistate = NULL;
+		istate->mistate = NULL;
+		state->data = istate;
+
+		if ((state->modify_flags & TM_FLAG_MULTI_INSERTS) != 0)
+		{
+			mistate = (HeapMultiInsertState *) palloc0(sizeof(HeapMultiInsertState));
+			mistate->slots = (TupleTableSlot **) palloc0(sizeof(TupleTableSlot *) * HEAP_MAX_BUFFERED_SLOTS);
+			istate->mistate = mistate;
+		}
+
+		if ((state->modify_flags & TM_FLAG_BAS_BULKWRITE) != 0)
+			istate->bistate = GetBulkInsertState();
+
+		state->modify_end_cb = heap_modify_insert_end;
+	}
+
+	istate = (HeapInsertState *) state->data;
+	Assert(istate->mistate != NULL);
+	mistate = istate->mistate;
+	Assert(istate->bistate != NULL);
+
+	dstslot = mistate->slots[mistate->cur_slots];
+	if (dstslot == NULL)
+	{
+		/*
+		 * We use virtual tuple slots buffered slots for leveraging the
+		 * optimization it provides to minimize physical data copying. The
+		 * virtual slot gets materialized when we copy (via below
+		 * ExecCopySlot) the tuples from the source slot which can be of any
+		 * type. This way, it is ensured that the tuple storage doesn't depend
+		 * on external memory, because all the datums that aren't passed by
+		 * value are copied into the slot's memory context.
+		 */
+		dstslot = MakeTupleTableSlot(RelationGetDescr(state->rel),
+									 &TTSOpsVirtual);
+		mistate->slots[mistate->cur_slots] = dstslot;
+	}
+
+	ExecClearTuple(dstslot);
+	ExecCopySlot(dstslot, slot);
+
+	mistate->cur_slots++;
+
+	/*
+	 * Memory allocated for the whole tuple is in slot's memory context, so
+	 * use it keep track of the total space occupied by all buffered tuples.
+	 */
+	if (TTS_SHOULDFREE(dstslot))
+		mistate->cur_size += MemoryContextMemAllocated(dstslot->tts_mcxt, false);
+
+	if (mistate->cur_slots >= HEAP_MAX_BUFFERED_SLOTS ||
+		mistate->cur_size >= HEAP_MAX_BUFFERED_BYTES)
+		heap_modify_buffer_flush(state);
+
+	MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * Insert multiple tuples from in-memory buffered slots into heap.
+ */
+static void
+heap_modify_buffer_flush(TableModifyState *state)
+{
+	HeapInsertState *istate;
+	HeapMultiInsertState *mistate;
+	MemoryContext oldcontext;
+
+	/* Quick exit if we haven't inserted anything yet */
+	if (state->data == NULL)
+		return;
+
+	istate = (HeapInsertState *) state->data;
+	Assert(istate->mistate != NULL);
+	mistate = istate->mistate;
+	Assert(istate->bistate != NULL);
+
+	if (mistate->cur_slots == 0)
+		return;
+
+	oldcontext = MemoryContextSwitchTo(state->mctx);
+
+	heap_multi_insert(state->rel, mistate->slots, mistate->cur_slots,
+					  state->cid, state->options, istate->bistate,
+					  &state->insert_indexes);
+
+	mistate->cur_slots = 0;
+	mistate->cur_size = 0;
+
+	MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * Heap insert specific callback used for performing work at the end like
+ * flushing buffered tuples if any, cleaning up the insert state and buffered
+ * slots.
+ */
+static void
+heap_modify_insert_end(TableModifyState *state)
+{
+	HeapInsertState *istate;
+
+	/* Quick exit if we haven't inserted anything yet */
+	if (state->data == NULL)
+		return;
+
+	istate = (HeapInsertState *) state->data;
+
+	if (istate->mistate != NULL)
+	{
+		HeapMultiInsertState *mistate = istate->mistate;
+
+		heap_modify_buffer_flush(state);
+
+		Assert(mistate->cur_slots == 0 &&
+			   mistate->cur_size == 0);
+
+		for (int i = 0; i < HEAP_MAX_BUFFERED_SLOTS && mistate->slots[i] != NULL; i++)
+			ExecDropSingleTupleTableSlot(mistate->slots[i]);
+	}
+
+	if (istate->bistate != NULL)
+		FreeBulkInsertState(istate->bistate);
+}
+
+/*
+ * Clean heap modify state.
+ */
+void
+heap_modify_end(TableModifyState *state)
+{
+	if (state->modify_end_cb != NULL)
+		state->modify_end_cb(state);
+
+	MemoryContextDelete(state->mctx);
+}
+
 /*
  *	simple_heap_insert - insert a tuple
  *
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 41a4bb0981..3d38da635d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2649,6 +2649,11 @@ static const TableAmRoutine heapam_methods = {
 	.tuple_insert_speculative = heapam_tuple_insert_speculative,
 	.tuple_complete_speculative = heapam_tuple_complete_speculative,
 	.multi_insert = heap_multi_insert,
+
+	.tuple_modify_begin = heap_modify_begin,
+	.tuple_modify_buffer_insert = heap_modify_buffer_insert,
+	.tuple_modify_end = heap_modify_end,
+
 	.tuple_delete = heapam_tuple_delete,
 	.tuple_update = heapam_tuple_update,
 	.tuple_lock = heapam_tuple_lock,
diff --git a/src/backend/access/table/tableamapi.c b/src/backend/access/table/tableamapi.c
index d9e23ef317..80d923bbdc 100644
--- a/src/backend/access/table/tableamapi.c
+++ b/src/backend/access/table/tableamapi.c
@@ -66,6 +66,10 @@ GetTableAmRoutine(Oid amhandler)
 
 	Assert(routine->tuple_insert != NULL);
 
+	Assert(routine->tuple_modify_begin != NULL);
+	Assert(routine->tuple_modify_buffer_insert != NULL);
+	Assert(routine->tuple_modify_end != NULL);
+
 	/*
 	 * Could be made optional, but would require throwing error during
 	 * parse-analysis.
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 32a3fbce96..4adfc1fb35 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -233,6 +233,36 @@ htsv_get_valid_status(int status)
 	return (HTSV_Result) status;
 }
 
+/*
+ * Maximum number of slots that multi-insert buffers can hold.
+ *
+ * Caution: Don't make this too big, as we could end up with this many tuples
+ * stored in multi insert buffer.
+ */
+#define HEAP_MAX_BUFFERED_SLOTS		1000
+
+/* Maximum size of all tuples that multi-insert buffers can hold */
+#define HEAP_MAX_BUFFERED_BYTES		65535
+
+typedef struct HeapMultiInsertState
+{
+	/* Array of buffered slots */
+	TupleTableSlot **slots;
+
+	/* Number of buffered slots currently held */
+	int			cur_slots;
+
+	/* Approximate size of all tuples currently held in buffered slots */
+	Size		cur_size;
+} HeapMultiInsertState;
+
+typedef struct HeapInsertState
+{
+	struct BulkInsertStateData *bistate;
+	HeapMultiInsertState *mistate;
+} HeapInsertState;
+
+
 /* ----------------
  *		function prototypes for heap access method
  *
@@ -283,6 +313,17 @@ extern void heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 extern void heap_multi_insert(Relation relation, struct TupleTableSlot **slots,
 							  int ntuples, CommandId cid, int options,
 							  BulkInsertState bistate, bool *insert_indexes);
+
+extern TableModifyState *heap_modify_begin(Relation rel,
+										   int modify_flags,
+										   CommandId cid,
+										   int options);
+
+extern void heap_modify_buffer_insert(TableModifyState *state,
+									  TupleTableSlot *slot);
+
+extern void heap_modify_end(TableModifyState *state);
+
 extern TM_Result heap_delete(Relation relation, ItemPointer tid,
 							 CommandId cid, Snapshot crosscheck, int options,
 							 struct TM_FailureData *tmfd, bool changingPart,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index cf76fc29d4..de50f51078 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -248,6 +248,35 @@ typedef struct TM_IndexDeleteOp
 	TM_IndexStatus *status;
 } TM_IndexDeleteOp;
 
+/* Table modify flags */
+
+/* Use multi inserts, i.e. buffer multiple tuples and insert them at once */
+#define TM_FLAG_MULTI_INSERTS	0x000001
+
+/* Use BAS_BULKWRITE buffer access strategy */
+#define TM_FLAG_BAS_BULKWRITE	0x000002
+
+struct TableModifyState;
+
+/* Table AM specific callback that gets called in table_modify_end() */
+typedef void (*TableModifyEndCP) (struct TableModifyState *state);
+
+/* Holds table modify state */
+typedef struct TableModifyState
+{
+	Relation	rel;
+	int			modify_flags;
+	MemoryContext mctx;
+	CommandId	cid;
+	int			options;
+	bool		insert_indexes;
+
+	/* Table AM specific data starts here */
+	void	   *data;
+
+	TableModifyEndCP modify_end_cb;
+} TableModifyState;
+
 /* "options" flag bits for table_tuple_insert */
 /* TABLE_INSERT_SKIP_WAL was 0x0001; RelationNeedsWAL() now governs */
 #define TABLE_INSERT_SKIP_FSM		0x0002
@@ -534,6 +563,16 @@ typedef struct TableAmRoutine
 								 CommandId cid, int options, struct BulkInsertStateData *bistate,
 								 bool *insert_indexes);
 
+	TableModifyState *(*tuple_modify_begin) (Relation rel,
+											 int modify_flags,
+											 CommandId cid,
+											 int options);
+
+	void		(*tuple_modify_buffer_insert) (TableModifyState *state,
+											   TupleTableSlot *slot);
+
+	void		(*tuple_modify_end) (TableModifyState *state);
+
 	/* see table_tuple_delete() for reference about parameters */
 	TM_Result	(*tuple_delete) (Relation rel,
 								 ItemPointer tid,
@@ -1464,6 +1503,25 @@ table_multi_insert(Relation rel, TupleTableSlot **slots, int nslots,
 								  cid, options, bistate, insert_indexes);
 }
 
+static inline TableModifyState *
+table_modify_begin(Relation rel, int modify_flags, CommandId cid, int options)
+{
+	return rel->rd_tableam->tuple_modify_begin(rel, modify_flags,
+											   cid, options);
+}
+
+static inline void
+table_modify_buffer_insert(TableModifyState *state, TupleTableSlot *slot)
+{
+	state->rel->rd_tableam->tuple_modify_buffer_insert(state, slot);
+}
+
+static inline void
+table_modify_end(TableModifyState *state)
+{
+	state->rel->rd_tableam->tuple_modify_end(state);
+}
+
 /*
  * Delete a tuple (and optionally lock the last tuple version).
  *
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a8d7bed411..f77c322709 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1122,6 +1122,8 @@ HeadlineJsonState
 HeadlineParsedText
 HeadlineWordEntry
 HeapCheckContext
+HeapInsertState
+HeapMultiInsertState
 HeapPageFreeze
 HeapScanDesc
 HeapTuple
@@ -2809,6 +2811,7 @@ TableFuncScan
 TableFuncScanState
 TableInfo
 TableLikeClause
+TableModifyState
 TableSampleClause
 TableScanDesc
 TableScanDescData
-- 
2.34.1



  [application/x-patch] v16-0002-Optimize-CTAS-CMV-RMV-with-multi-inserts.patch (5.5K, 3-v16-0002-Optimize-CTAS-CMV-RMV-with-multi-inserts.patch)
  download | inline diff:
From 32050367825d5f8dbb1330cc4f8ef7818eb544ed Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Sun, 31 Mar 2024 15:34:57 +0000
Subject: [PATCH v16 2/2] Optimize CTAS, CMV, RMV with multi inserts

---
 src/backend/commands/createas.c | 27 +++++++++------------------
 src/backend/commands/matview.c  | 26 +++++++++-----------------
 2 files changed, 18 insertions(+), 35 deletions(-)

diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index afd3dace07..00c1271f93 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -53,9 +53,7 @@ typedef struct
 	/* These fields are filled by intorel_startup: */
 	Relation	rel;			/* relation to write to */
 	ObjectAddress reladdr;		/* address of rel, for ExecCreateTableAs */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableModifyState *mstate;	/* table insert state */
 } DR_intorel;
 
 /* utility functions for CTAS definition creation */
@@ -552,17 +550,19 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	 */
 	myState->rel = intoRelationDesc;
 	myState->reladdr = intoRelationAddr;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM;
 
 	/*
 	 * If WITH NO DATA is specified, there is no need to set up the state for
 	 * bulk inserts as there are no tuples to insert.
 	 */
 	if (!into->skipData)
-		myState->bistate = GetBulkInsertState();
+		myState->mstate = table_modify_begin(intoRelationDesc,
+											 TM_FLAG_MULTI_INSERTS |
+											 TM_FLAG_BAS_BULKWRITE,
+											 GetCurrentCommandId(true),
+											 TABLE_INSERT_SKIP_FSM);
 	else
-		myState->bistate = NULL;
+		myState->mstate = NULL;
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -578,7 +578,6 @@ static bool
 intorel_receive(TupleTableSlot *slot, DestReceiver *self)
 {
 	DR_intorel *myState = (DR_intorel *) self;
-	bool		insertIndexes;
 
 	/* Nothing to insert if WITH NO DATA is specified. */
 	if (!myState->into->skipData)
@@ -591,12 +590,7 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
 		 * would not be cheap either. This also doesn't allow accessing per-AM
 		 * data (say a tuple's xmin), but since we don't do that here...
 		 */
-		table_tuple_insert(myState->rel,
-						   slot,
-						   myState->output_cid,
-						   myState->ti_options,
-						   myState->bistate,
-						   &insertIndexes);
+		table_modify_buffer_insert(myState->mstate, slot);
 	}
 
 	/* We know this is a newly created relation, so there are no indexes */
@@ -614,10 +608,7 @@ intorel_shutdown(DestReceiver *self)
 	IntoClause *into = myState->into;
 
 	if (!into->skipData)
-	{
-		FreeBulkInsertState(myState->bistate);
-		table_finish_bulk_insert(myState->rel, myState->ti_options);
-	}
+		table_modify_end(myState->mstate);
 
 	/* close rel, but keep lock until commit */
 	table_close(myState->rel, NoLock);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 9ec13d0984..f03aa1cff3 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -48,9 +48,7 @@ typedef struct
 	Oid			transientoid;	/* OID of new heap into which to store */
 	/* These fields are filled by transientrel_startup: */
 	Relation	transientrel;	/* relation to write to */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableModifyState *mstate;	/* table insert state */
 } DR_transientrel;
 
 static int	matview_maintenance_depth = 0;
@@ -458,9 +456,12 @@ transientrel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	 * Fill private fields of myState for use by later routines
 	 */
 	myState->transientrel = transientrel;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM | TABLE_INSERT_FROZEN;
-	myState->bistate = GetBulkInsertState();
+	myState->mstate = table_modify_begin(transientrel,
+										 TM_FLAG_MULTI_INSERTS |
+										 TM_FLAG_BAS_BULKWRITE,
+										 GetCurrentCommandId(true),
+										 TABLE_INSERT_SKIP_FSM |
+										 TABLE_INSERT_FROZEN);
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -476,7 +477,6 @@ static bool
 transientrel_receive(TupleTableSlot *slot, DestReceiver *self)
 {
 	DR_transientrel *myState = (DR_transientrel *) self;
-	bool		insertIndexes;
 
 	/*
 	 * Note that the input slot might not be of the type of the target
@@ -486,13 +486,7 @@ transientrel_receive(TupleTableSlot *slot, DestReceiver *self)
 	 * cheap either. This also doesn't allow accessing per-AM data (say a
 	 * tuple's xmin), but since we don't do that here...
 	 */
-
-	table_tuple_insert(myState->transientrel,
-					   slot,
-					   myState->output_cid,
-					   myState->ti_options,
-					   myState->bistate,
-					   &insertIndexes);
+	table_modify_buffer_insert(myState->mstate, slot);
 
 	/* We know this is a newly created relation, so there are no indexes */
 
@@ -507,9 +501,7 @@ transientrel_shutdown(DestReceiver *self)
 {
 	DR_transientrel *myState = (DR_transientrel *) self;
 
-	FreeBulkInsertState(myState->bistate);
-
-	table_finish_bulk_insert(myState->transientrel, myState->ti_options);
+	table_modify_end(myState->mstate);
 
 	/* close transientrel, but keep lock until commit */
 	table_close(myState->transientrel, NoLock);
-- 
2.34.1



^ permalink  raw  reply  [nested|flat] 30+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
  2024-03-21 04:14 Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-21 07:40 ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-23 00:17   ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-25 19:58     ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-26 15:37       ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-26 19:49         ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-27 08:12           ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-31 15:48             ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
@ 2024-04-02 19:40               ` Jeff Davis <[email protected]>
  2024-04-03 09:02                 ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-04-24 12:49                 ` Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  0 siblings, 2 replies; 30+ messages in thread

From: Jeff Davis @ 2024-04-02 19:40 UTC (permalink / raw)
  To: Bharath Rupireddy <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; Michael Paquier <[email protected]>; Matthias van de Meent <[email protected]>; Alexander Korotkov <[email protected]>

On Sun, 2024-03-31 at 21:18 +0530, Bharath Rupireddy wrote:
> if (table_modify_buffer_insert() is defined)
>    table_modify_buffer_insert(...);
> else
> {
>   myState->bistate = GetBulkInsertState();
>   table_tuple_insert(...);
> }

We can't alloc/free the bulk insert state for every insert call. I see
two options:

* Each caller needs to support two code paths: if the buffered insert
APIs are defined, then use those; otherwise the caller needs to manage
the bulk insert state itself and call the plain insert API.

* Have default implementation for the new API methods, so that the
default for the begin method would allocate the bulk insert state, and
the default for the buffered insert method would be to call plain
insert using the bulk insert state.

I'd prefer the latter, at least in the long term. But I haven't really
thought through the details, so perhaps we'd need to use the former.

> > 
> > After we have these new APIs fully in place and used by COPY, what
> > will
> > happen to those other APIs? Will they be deprecated or will there
> > be a
> > reason to keep them?
> 
> Deprecated perhaps?

Including Alexander on this thread, because he's making changes to the
multi-insert API. We need some consensus on where we are going with
these APIs before we make more changes, and what incremental steps make
sense in v17.

Here's where I think this API should go:

1. Have table_modify_begin/end and table_modify_buffer_insert, like
those that are implemented in your patch.

2. Add some kind of flush callback that will be called either while the
tuples are being flushed or after the tuples are flushed (but before
they are freed by the AM). (Aside: do we need to call it while the
tuples are being flushed to get the right visibility semantics for
after-row triggers?)

3. Add table_modify_buffer_{update|delete} APIs.

4. Some kind of API tweaks to help manage memory when modifying
pertitioned tables, so that the buffering doesn't get out of control.
Perhaps just reporting memory usage and allowing the caller to force
flushes would be enough.

5. Use these new methods for CREATE/REFRESH MATERIALIZED VIEW. This is
fairly straightforward, I believe, and handled by your patch. Indexes
are (re)built afterward, and no triggers are possible.

6. Use these new methods for CREATE TABLE ... AS. This is fairly
straightforward, I believe, and handled by your patch. No indexes or
triggers are possible.

7. Use these new methods for COPY. We have to be careful to avoid
regressions for the heap method, because it's already managing its own
buffers. If the AM manages the buffering, then it may require
additional copying of slots, which could be a disadvantage. To solve
this, we may need some minor API tweaks to avoid copying when the
caller guarantees that the memory will not be freed to early, or
perhaps expose the AM's memory context to copyfrom.c. Another thing to
consider is that the buffering in copyfrom.c is also used for FDWs, so
that buffering code path needs to be preserved in copyfrom.c even if
not used for AMs.

8. Use these new methods for INSERT INTO ... SELECT. One potential
challenge here is that execution nodes are not always run to
completion, so we need to be sure that the flush isn't forgotten in
that case.

9. Use these new methods for DELETE, UPDATE, and MERGE. MERGE can use
the buffer_insert/update/delete APIs; we don't need a separate merge
method. This probably requires that the AM maintain 3 separate buffers
to distinguish different kinds of changes at flush time (obviously
these can be initialized lazily to avoid overhead when not being used).

10. Use these new methods for logical apply.

11. Deprecate the multi_insert API.

Thoughts on this plan? Does your patch make sense in v17 as a stepping
stone, or should we try to make all of these API changes together in
v18?

Also, a sample AM code would be a huge benefit here. Writing a real AM
is hard, but perhaps we can at least have an example one to demonstrate
how to use these APIs?

Regards,
	Jeff Davis

^ permalink  raw  reply  [nested|flat] 30+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
  2024-03-21 04:14 Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-21 07:40 ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-23 00:17   ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-25 19:58     ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-26 15:37       ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-26 19:49         ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-27 08:12           ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-31 15:48             ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-04-02 19:40               ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
@ 2024-04-03 09:02                 ` Bharath Rupireddy <[email protected]>
  2024-04-03 12:25                   ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  1 sibling, 1 reply; 30+ messages in thread

From: Bharath Rupireddy @ 2024-04-03 09:02 UTC (permalink / raw)
  To: Jeff Davis <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; Michael Paquier <[email protected]>; Matthias van de Meent <[email protected]>; Alexander Korotkov <[email protected]>

On Wed, Apr 3, 2024 at 1:10 AM Jeff Davis <[email protected]> wrote:
>
> On Sun, 2024-03-31 at 21:18 +0530, Bharath Rupireddy wrote:
> > if (table_modify_buffer_insert() is defined)
> >    table_modify_buffer_insert(...);
> > else
> > {
> >   myState->bistate = GetBulkInsertState();
> >   table_tuple_insert(...);
> > }
>
> We can't alloc/free the bulk insert state for every insert call. I see
> two options:
>
> * Each caller needs to support two code paths: if the buffered insert
> APIs are defined, then use those; otherwise the caller needs to manage
> the bulk insert state itself and call the plain insert API.
>
> * Have default implementation for the new API methods, so that the
> default for the begin method would allocate the bulk insert state, and
> the default for the buffered insert method would be to call plain
> insert using the bulk insert state.
>
> I'd prefer the latter, at least in the long term. But I haven't really
> thought through the details, so perhaps we'd need to use the former.

I too prefer the latter so that the caller doesn't have to have two
paths. The new API can just transparently fallback to single inserts.
I've implemented that in the attached v17 patch. I also tested the
default APIs manually, but I'll see if I can add some tests to it the
default API.

> > > After we have these new APIs fully in place and used by COPY, what
> > > will
> > > happen to those other APIs? Will they be deprecated or will there
> > > be a
> > > reason to keep them?
> >
> > Deprecated perhaps?
>
> Including Alexander on this thread, because he's making changes to the
> multi-insert API. We need some consensus on where we are going with
> these APIs before we make more changes, and what incremental steps make
> sense in v17.
>
> Here's where I think this API should go:
>
> 1. Have table_modify_begin/end and table_modify_buffer_insert, like
> those that are implemented in your patch.
>
> 2. Add some kind of flush callback that will be called either while the
> tuples are being flushed or after the tuples are flushed (but before
> they are freed by the AM). (Aside: do we need to call it while the
> tuples are being flushed to get the right visibility semantics for
> after-row triggers?)
>
> 3. Add table_modify_buffer_{update|delete} APIs.
>
> 4. Some kind of API tweaks to help manage memory when modifying
> pertitioned tables, so that the buffering doesn't get out of control.
> Perhaps just reporting memory usage and allowing the caller to force
> flushes would be enough.
>
> 5. Use these new methods for CREATE/REFRESH MATERIALIZED VIEW. This is
> fairly straightforward, I believe, and handled by your patch. Indexes
> are (re)built afterward, and no triggers are possible.
>
> 6. Use these new methods for CREATE TABLE ... AS. This is fairly
> straightforward, I believe, and handled by your patch. No indexes or
> triggers are possible.
>
> 7. Use these new methods for COPY. We have to be careful to avoid
> regressions for the heap method, because it's already managing its own
> buffers. If the AM manages the buffering, then it may require
> additional copying of slots, which could be a disadvantage. To solve
> this, we may need some minor API tweaks to avoid copying when the
> caller guarantees that the memory will not be freed to early, or
> perhaps expose the AM's memory context to copyfrom.c. Another thing to
> consider is that the buffering in copyfrom.c is also used for FDWs, so
> that buffering code path needs to be preserved in copyfrom.c even if
> not used for AMs.
>
> 8. Use these new methods for INSERT INTO ... SELECT. One potential
> challenge here is that execution nodes are not always run to
> completion, so we need to be sure that the flush isn't forgotten in
> that case.
>
> 9. Use these new methods for DELETE, UPDATE, and MERGE. MERGE can use
> the buffer_insert/update/delete APIs; we don't need a separate merge
> method. This probably requires that the AM maintain 3 separate buffers
> to distinguish different kinds of changes at flush time (obviously
> these can be initialized lazily to avoid overhead when not being used).
>
> 10. Use these new methods for logical apply.
>
> 11. Deprecate the multi_insert API.
>
> Thoughts on this plan? Does your patch make sense in v17 as a stepping
> stone, or should we try to make all of these API changes together in
> v18?

I'd like to see the new multi insert API (as proposed in the v17
patches) for PG17 if possible. The basic idea with these new APIs is
to let the AM implementers choose the right buffered insert strategy
(one can choose the AM specific slot type to buffer the tuples, choose
the AM specific memory and flushing decisions etc.). Another advantage
with these new multi insert API is that the CREATE MATERIALIZED VIEW,
REFRESH MATERIALIZED VIEW, CREATE TABLE AS commands for heap AM got
faster by 62.54%, 68.87%, 74.31% or 2.67, 3.21, 3.89 times
respectively. The performance improvement in REFRESH MATERIALIZED VIEW
can benefit customers running analytical workloads on postgres.

I'm fine if we gradually add more infrastructure to support COPY,
INSERT INTO SELECT, Logical Replication Apply, Table Rewrites in
future releases. I'm sure it requires a lot more thoughts and time.

> Also, a sample AM code would be a huge benefit here. Writing a real AM
> is hard, but perhaps we can at least have an example one to demonstrate
> how to use these APIs?

The heap AM implements this new API. Also, there's a default
implementation for the new API falling back on to single inserts.
Aren't these sufficient to help AM implementers to come up with their
own implementations?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


Attachments:

  [application/x-patch] v17-0001-Introduce-new-table-modify-access-methods.patch (17.6K, 2-v17-0001-Introduce-new-table-modify-access-methods.patch)
  download | inline diff:
From 4e349a0d877a48ff4068f776e65dcfec49e96356 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Wed, 3 Apr 2024 08:36:50 +0000
Subject: [PATCH v17 1/2] Introduce new table modify access methods

---
 src/backend/access/heap/heapam.c         | 189 ++++++++++++++++++++++-
 src/backend/access/heap/heapam_handler.c |   5 +
 src/backend/access/table/tableam.c       |  86 +++++++++++
 src/backend/access/table/tableamapi.c    |   8 +
 src/include/access/heapam.h              |  41 +++++
 src/include/access/tableam.h             | 106 +++++++++++++
 src/tools/pgindent/typedefs.list         |   3 +
 7 files changed, 437 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index b661d9811e..69f8c597d8 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -64,6 +64,7 @@
 #include "storage/standby.h"
 #include "utils/datum.h"
 #include "utils/inval.h"
+#include "utils/memutils.h"
 #include "utils/relcache.h"
 #include "utils/snapmgr.h"
 #include "utils/spccache.h"
@@ -107,7 +108,8 @@ static int	bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate);
 static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
 static HeapTuple ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required,
 										bool *copy);
-
+static void heap_modify_buffer_flush(TableModifyState *state);
+static void heap_modify_insert_end(TableModifyState *state);
 
 /*
  * Each tuple lock mode has a corresponding heavyweight lock, and one or two
@@ -2441,6 +2443,191 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 	*insert_indexes = true;
 }
 
+/*
+ * Initialize heap modify state.
+ */
+TableModifyState *
+heap_modify_begin(Relation rel, int modify_flags, CommandId cid,
+				  int options)
+{
+	TableModifyState *state;
+	MemoryContext context;
+	MemoryContext oldcontext;
+
+	context = AllocSetContextCreate(CurrentMemoryContext,
+									"heap_modify memory context",
+									ALLOCSET_DEFAULT_SIZES);
+
+	oldcontext = MemoryContextSwitchTo(context);
+	state = palloc0(sizeof(TableModifyState));
+	state->rel = rel;
+	state->modify_flags = modify_flags;
+	state->mctx = context;
+	state->cid = cid;
+	state->options = options;
+	state->insert_indexes = false;
+	state->modify_end_cb = NULL;	/* To be installed lazily */
+	MemoryContextSwitchTo(oldcontext);
+
+	return state;
+}
+
+/*
+ * Store passed-in tuple into in-memory buffered slots. When full, insert
+ * multiple tuples from the buffers into heap.
+ */
+void
+heap_modify_buffer_insert(TableModifyState *state,
+						  TupleTableSlot *slot)
+{
+	TupleTableSlot *dstslot;
+	HeapInsertState *istate;
+	HeapMultiInsertState *mistate;
+	MemoryContext oldcontext;
+
+	oldcontext = MemoryContextSwitchTo(state->mctx);
+
+	/* First time through, initialize heap insert state */
+	if (state->data == NULL)
+	{
+		istate = (HeapInsertState *) palloc0(sizeof(HeapInsertState));
+		istate->bistate = NULL;
+		istate->mistate = NULL;
+		state->data = istate;
+
+		if ((state->modify_flags & TM_FLAG_MULTI_INSERTS) != 0)
+		{
+			mistate = (HeapMultiInsertState *) palloc0(sizeof(HeapMultiInsertState));
+			mistate->slots = (TupleTableSlot **) palloc0(sizeof(TupleTableSlot *) * HEAP_MAX_BUFFERED_SLOTS);
+			istate->mistate = mistate;
+		}
+
+		if ((state->modify_flags & TM_FLAG_BAS_BULKWRITE) != 0)
+			istate->bistate = GetBulkInsertState();
+
+		state->modify_end_cb = heap_modify_insert_end;
+	}
+
+	istate = (HeapInsertState *) state->data;
+	Assert(istate->mistate != NULL);
+	mistate = istate->mistate;
+	Assert(istate->bistate != NULL);
+
+	dstslot = mistate->slots[mistate->cur_slots];
+	if (dstslot == NULL)
+	{
+		/*
+		 * We use virtual tuple slots buffered slots for leveraging the
+		 * optimization it provides to minimize physical data copying. The
+		 * virtual slot gets materialized when we copy (via below
+		 * ExecCopySlot) the tuples from the source slot which can be of any
+		 * type. This way, it is ensured that the tuple storage doesn't depend
+		 * on external memory, because all the datums that aren't passed by
+		 * value are copied into the slot's memory context.
+		 */
+		dstslot = MakeTupleTableSlot(RelationGetDescr(state->rel),
+									 &TTSOpsVirtual);
+		mistate->slots[mistate->cur_slots] = dstslot;
+	}
+
+	ExecClearTuple(dstslot);
+	ExecCopySlot(dstslot, slot);
+
+	mistate->cur_slots++;
+
+	/*
+	 * Memory allocated for the whole tuple is in slot's memory context, so
+	 * use it keep track of the total space occupied by all buffered tuples.
+	 */
+	if (TTS_SHOULDFREE(dstslot))
+		mistate->cur_size += MemoryContextMemAllocated(dstslot->tts_mcxt, false);
+
+	if (mistate->cur_slots >= HEAP_MAX_BUFFERED_SLOTS ||
+		mistate->cur_size >= HEAP_MAX_BUFFERED_BYTES)
+		heap_modify_buffer_flush(state);
+
+	MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * Insert multiple tuples from in-memory buffered slots into heap.
+ */
+static void
+heap_modify_buffer_flush(TableModifyState *state)
+{
+	HeapInsertState *istate;
+	HeapMultiInsertState *mistate;
+	MemoryContext oldcontext;
+
+	/* Quick exit if we haven't inserted anything yet */
+	if (state->data == NULL)
+		return;
+
+	istate = (HeapInsertState *) state->data;
+	Assert(istate->mistate != NULL);
+	mistate = istate->mistate;
+	Assert(istate->bistate != NULL);
+
+	if (mistate->cur_slots == 0)
+		return;
+
+	oldcontext = MemoryContextSwitchTo(state->mctx);
+
+	heap_multi_insert(state->rel, mistate->slots, mistate->cur_slots,
+					  state->cid, state->options, istate->bistate,
+					  &state->insert_indexes);
+
+	mistate->cur_slots = 0;
+	mistate->cur_size = 0;
+
+	MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * Heap insert specific callback used for performing work at the end like
+ * flushing buffered tuples if any, cleaning up the insert state and buffered
+ * slots.
+ */
+static void
+heap_modify_insert_end(TableModifyState *state)
+{
+	HeapInsertState *istate;
+
+	/* Quick exit if we haven't inserted anything yet */
+	if (state->data == NULL)
+		return;
+
+	istate = (HeapInsertState *) state->data;
+
+	if (istate->mistate != NULL)
+	{
+		HeapMultiInsertState *mistate = istate->mistate;
+
+		heap_modify_buffer_flush(state);
+
+		Assert(mistate->cur_slots == 0 &&
+			   mistate->cur_size == 0);
+
+		for (int i = 0; i < HEAP_MAX_BUFFERED_SLOTS && mistate->slots[i] != NULL; i++)
+			ExecDropSingleTupleTableSlot(mistate->slots[i]);
+	}
+
+	if (istate->bistate != NULL)
+		FreeBulkInsertState(istate->bistate);
+}
+
+/*
+ * Clean heap modify state.
+ */
+void
+heap_modify_end(TableModifyState *state)
+{
+	if (state->modify_end_cb != NULL)
+		state->modify_end_cb(state);
+
+	MemoryContextDelete(state->mctx);
+}
+
 /*
  *	simple_heap_insert - insert a tuple
  *
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index c86000d245..f3aa29851d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2638,6 +2638,11 @@ static const TableAmRoutine heapam_methods = {
 	.tuple_insert_speculative = heapam_tuple_insert_speculative,
 	.tuple_complete_speculative = heapam_tuple_complete_speculative,
 	.multi_insert = heap_multi_insert,
+
+	.tuple_modify_begin = heap_modify_begin,
+	.tuple_modify_buffer_insert = heap_modify_buffer_insert,
+	.tuple_modify_end = heap_modify_end,
+
 	.tuple_delete = heapam_tuple_delete,
 	.tuple_update = heapam_tuple_update,
 	.tuple_lock = heapam_tuple_lock,
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 805d222ceb..4c7b5433ec 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -21,6 +21,7 @@
 
 #include <math.h>
 
+#include "access/heapam.h"		/* just for BulkInsertState */
 #include "access/syncscan.h"
 #include "access/tableam.h"
 #include "access/xact.h"
@@ -29,6 +30,7 @@
 #include "storage/bufmgr.h"
 #include "storage/shmem.h"
 #include "storage/smgr.h"
+#include "utils/memutils.h"
 
 /*
  * Constants to control the behavior of block allocation to parallel workers
@@ -48,6 +50,7 @@
 char	   *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
 bool		synchronize_seqscans = true;
 
+static void default_table_modify_insert_end(TableModifyState *state);
 
 /* ----------------------------------------------------------------------------
  * Slot functions.
@@ -772,3 +775,86 @@ table_block_relation_estimate_size(Relation rel, int32 *attr_widths,
 	else
 		*allvisfrac = (double) relallvisible / curpages;
 }
+
+/*
+ * Initialize default table modify state.
+ */
+TableModifyState *
+default_table_modify_begin(Relation rel, int modify_flags, CommandId cid,
+						   int options)
+{
+	TableModifyState *state;
+	MemoryContext context;
+	MemoryContext oldcontext;
+
+	context = AllocSetContextCreate(CurrentMemoryContext,
+									"default_table_modify memory context",
+									ALLOCSET_DEFAULT_SIZES);
+
+	oldcontext = MemoryContextSwitchTo(context);
+	state = palloc0(sizeof(TableModifyState));
+	state->rel = rel;
+	state->modify_flags = modify_flags;
+	state->mctx = context;
+	state->cid = cid;
+	state->options = options;
+	state->insert_indexes = false;
+	state->modify_end_cb = NULL;	/* To be installed lazily */
+	MemoryContextSwitchTo(oldcontext);
+
+	return state;
+}
+
+/*
+ * Default table modify implementation for inserts.
+ */
+void
+default_table_modify_buffer_insert(TableModifyState *state,
+								   TupleTableSlot *slot)
+{
+	MemoryContext oldcontext;
+
+	oldcontext = MemoryContextSwitchTo(state->mctx);
+
+	/* First time through, initialize default table modify state */
+	if (state->data == NULL)
+	{
+		if ((state->modify_flags & TM_FLAG_BAS_BULKWRITE) != 0)
+			state->data = (BulkInsertState) GetBulkInsertState();
+
+		state->modify_end_cb = default_table_modify_insert_end;
+	}
+
+	/* Fallback to table AM single insert routine */
+	table_tuple_insert(state->rel,
+					   slot,
+					   state->cid,
+					   state->options,
+					   (BulkInsertState) state->data,
+					   &state->insert_indexes);
+
+	MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * Default table modify insert specific callback used for performing work at
+ * the end like cleaning up the bulk insert state.
+ */
+static void
+default_table_modify_insert_end(TableModifyState *state)
+{
+	if (state->data != NULL)
+		FreeBulkInsertState((BulkInsertState) state->data);
+}
+
+/*
+ * Clean default table modify state.
+ */
+void
+default_table_modify_end(TableModifyState *state)
+{
+	if (state->modify_end_cb != NULL)
+		state->modify_end_cb(state);
+
+	MemoryContextDelete(state->mctx);
+}
diff --git a/src/backend/access/table/tableamapi.c b/src/backend/access/table/tableamapi.c
index 55b8caeadf..9c095b93e7 100644
--- a/src/backend/access/table/tableamapi.c
+++ b/src/backend/access/table/tableamapi.c
@@ -95,6 +95,14 @@ GetTableAmRoutine(Oid amhandler)
 	Assert(routine->scan_sample_next_block != NULL);
 	Assert(routine->scan_sample_next_tuple != NULL);
 
+	/* optional, but either all of them are defined or none. */
+	Assert((routine->tuple_modify_begin == NULL &&
+			routine->tuple_modify_buffer_insert == NULL &&
+			routine->tuple_modify_end == NULL) ||
+		   (routine->tuple_modify_begin != NULL &&
+			routine->tuple_modify_buffer_insert != NULL &&
+			routine->tuple_modify_end != NULL));
+
 	return routine;
 }
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b632fe953c..b35ba5509b 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -236,6 +236,36 @@ htsv_get_valid_status(int status)
 	return (HTSV_Result) status;
 }
 
+/*
+ * Maximum number of slots that multi-insert buffers can hold.
+ *
+ * Caution: Don't make this too big, as we could end up with this many tuples
+ * stored in multi insert buffer.
+ */
+#define HEAP_MAX_BUFFERED_SLOTS		1000
+
+/* Maximum size of all tuples that multi-insert buffers can hold */
+#define HEAP_MAX_BUFFERED_BYTES		65535
+
+typedef struct HeapMultiInsertState
+{
+	/* Array of buffered slots */
+	TupleTableSlot **slots;
+
+	/* Number of buffered slots currently held */
+	int			cur_slots;
+
+	/* Approximate size of all tuples currently held in buffered slots */
+	Size		cur_size;
+} HeapMultiInsertState;
+
+typedef struct HeapInsertState
+{
+	struct BulkInsertStateData *bistate;
+	HeapMultiInsertState *mistate;
+} HeapInsertState;
+
+
 /* ----------------
  *		function prototypes for heap access method
  *
@@ -286,6 +316,17 @@ extern void heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 extern void heap_multi_insert(Relation relation, struct TupleTableSlot **slots,
 							  int ntuples, CommandId cid, int options,
 							  BulkInsertState bistate, bool *insert_indexes);
+
+extern TableModifyState *heap_modify_begin(Relation rel,
+										   int modify_flags,
+										   CommandId cid,
+										   int options);
+
+extern void heap_modify_buffer_insert(TableModifyState *state,
+									  TupleTableSlot *slot);
+
+extern void heap_modify_end(TableModifyState *state);
+
 extern TM_Result heap_delete(Relation relation, ItemPointer tid,
 							 CommandId cid, Snapshot crosscheck, int options,
 							 struct TM_FailureData *tmfd, bool changingPart,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 2c1a540155..71b823af66 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -248,6 +248,35 @@ typedef struct TM_IndexDeleteOp
 	TM_IndexStatus *status;
 } TM_IndexDeleteOp;
 
+/* Table modify flags */
+
+/* Use multi inserts, i.e. buffer multiple tuples and insert them at once */
+#define TM_FLAG_MULTI_INSERTS	0x000001
+
+/* Use BAS_BULKWRITE buffer access strategy */
+#define TM_FLAG_BAS_BULKWRITE	0x000002
+
+struct TableModifyState;
+
+/* Table AM specific callback that gets called in table_modify_end() */
+typedef void (*TableModifyEndCP) (struct TableModifyState *state);
+
+/* Holds table modify state */
+typedef struct TableModifyState
+{
+	Relation	rel;
+	int			modify_flags;
+	MemoryContext mctx;
+	CommandId	cid;
+	int			options;
+	bool		insert_indexes;
+
+	/* Table AM specific data starts here */
+	void	   *data;
+
+	TableModifyEndCP modify_end_cb;
+} TableModifyState;
+
 /* "options" flag bits for table_tuple_insert */
 /* TABLE_INSERT_SKIP_WAL was 0x0001; RelationNeedsWAL() now governs */
 #define TABLE_INSERT_SKIP_FSM		0x0002
@@ -584,6 +613,18 @@ typedef struct TableAmRoutine
 	void		(*finish_bulk_insert) (Relation rel, int options);
 
 
+	/* ------------------------------------------------------------------------
+	 * Table Modify related functions.
+	 * ------------------------------------------------------------------------
+	 */
+	TableModifyState *(*tuple_modify_begin) (Relation rel,
+											 int modify_flags,
+											 CommandId cid,
+											 int options);
+	void		(*tuple_modify_buffer_insert) (TableModifyState *state,
+											   TupleTableSlot *slot);
+	void		(*tuple_modify_end) (TableModifyState *state);
+
 	/* ------------------------------------------------------------------------
 	 * DDL related functionality.
 	 * ------------------------------------------------------------------------
@@ -1604,6 +1645,71 @@ table_finish_bulk_insert(Relation rel, int options)
 		rel->rd_tableam->finish_bulk_insert(rel, options);
 }
 
+/* ------------------------------------------------------------------------
+ * Table Modify related functions.
+ * ------------------------------------------------------------------------
+ */
+extern TableModifyState *default_table_modify_begin(Relation rel, int modify_flags,
+													CommandId cid, int options);
+extern void	default_table_modify_buffer_insert(TableModifyState *state,
+											   TupleTableSlot *slot);
+extern void default_table_modify_end(TableModifyState *state);
+
+static inline TableModifyState *
+table_modify_begin(Relation rel, int modify_flags, CommandId cid, int options)
+{
+	if (rel->rd_tableam &&
+		rel->rd_tableam->tuple_modify_begin != NULL)
+	{
+		return rel->rd_tableam->tuple_modify_begin(rel, modify_flags,
+												   cid, options);
+	}
+	else if (rel->rd_tableam &&
+			 rel->rd_tableam->tuple_modify_begin == NULL)
+	{
+		/* Fallback to a default implementation */
+		return default_table_modify_begin(rel, modify_flags,
+										  cid, options);
+	}
+	else
+		Assert(false);
+}
+
+static inline void
+table_modify_buffer_insert(TableModifyState *state, TupleTableSlot *slot)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_modify_buffer_insert != NULL)
+	{
+		state->rel->rd_tableam->tuple_modify_buffer_insert(state, slot);
+	}
+	else if (state->rel->rd_tableam &&
+			 state->rel->rd_tableam->tuple_modify_buffer_insert == NULL)
+	{
+		/* Fallback to a default implementation */
+		default_table_modify_buffer_insert(state, slot);
+	}
+	else
+		Assert(false);
+}
+
+static inline void
+table_modify_end(TableModifyState *state)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_modify_end != NULL)
+	{
+		state->rel->rd_tableam->tuple_modify_end(state);
+	}
+	else if (state->rel->rd_tableam &&
+			 state->rel->rd_tableam->tuple_modify_end == NULL)
+	{
+		/* Fallback to a default implementation */
+		default_table_modify_end(state);
+	}
+	else
+		Assert(false);
+}
 
 /* ------------------------------------------------------------------------
  * DDL related functionality.
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 2b01a3081e..edaa4d26f0 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1123,6 +1123,8 @@ HeadlineJsonState
 HeadlineParsedText
 HeadlineWordEntry
 HeapCheckContext
+HeapInsertState
+HeapMultiInsertState
 HeapPageFreeze
 HeapScanDesc
 HeapTuple
@@ -2814,6 +2816,7 @@ TableFuncScan
 TableFuncScanState
 TableInfo
 TableLikeClause
+TableModifyState
 TableSampleClause
 TableScanDesc
 TableScanDescData
-- 
2.34.1



  [application/x-patch] v17-0002-Optimize-CTAS-CMV-RMV-with-multi-inserts.patch (5.5K, 3-v17-0002-Optimize-CTAS-CMV-RMV-with-multi-inserts.patch)
  download | inline diff:
From 3560a49e67774f96fb2e712845c370a18f9c7a77 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Wed, 3 Apr 2024 08:37:21 +0000
Subject: [PATCH v17 2/2] Optimize CTAS, CMV, RMV with multi inserts

---
 src/backend/commands/createas.c | 27 +++++++++------------------
 src/backend/commands/matview.c  | 26 +++++++++-----------------
 2 files changed, 18 insertions(+), 35 deletions(-)

diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index afd3dace07..00c1271f93 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -53,9 +53,7 @@ typedef struct
 	/* These fields are filled by intorel_startup: */
 	Relation	rel;			/* relation to write to */
 	ObjectAddress reladdr;		/* address of rel, for ExecCreateTableAs */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableModifyState *mstate;	/* table insert state */
 } DR_intorel;
 
 /* utility functions for CTAS definition creation */
@@ -552,17 +550,19 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	 */
 	myState->rel = intoRelationDesc;
 	myState->reladdr = intoRelationAddr;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM;
 
 	/*
 	 * If WITH NO DATA is specified, there is no need to set up the state for
 	 * bulk inserts as there are no tuples to insert.
 	 */
 	if (!into->skipData)
-		myState->bistate = GetBulkInsertState();
+		myState->mstate = table_modify_begin(intoRelationDesc,
+											 TM_FLAG_MULTI_INSERTS |
+											 TM_FLAG_BAS_BULKWRITE,
+											 GetCurrentCommandId(true),
+											 TABLE_INSERT_SKIP_FSM);
 	else
-		myState->bistate = NULL;
+		myState->mstate = NULL;
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -578,7 +578,6 @@ static bool
 intorel_receive(TupleTableSlot *slot, DestReceiver *self)
 {
 	DR_intorel *myState = (DR_intorel *) self;
-	bool		insertIndexes;
 
 	/* Nothing to insert if WITH NO DATA is specified. */
 	if (!myState->into->skipData)
@@ -591,12 +590,7 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
 		 * would not be cheap either. This also doesn't allow accessing per-AM
 		 * data (say a tuple's xmin), but since we don't do that here...
 		 */
-		table_tuple_insert(myState->rel,
-						   slot,
-						   myState->output_cid,
-						   myState->ti_options,
-						   myState->bistate,
-						   &insertIndexes);
+		table_modify_buffer_insert(myState->mstate, slot);
 	}
 
 	/* We know this is a newly created relation, so there are no indexes */
@@ -614,10 +608,7 @@ intorel_shutdown(DestReceiver *self)
 	IntoClause *into = myState->into;
 
 	if (!into->skipData)
-	{
-		FreeBulkInsertState(myState->bistate);
-		table_finish_bulk_insert(myState->rel, myState->ti_options);
-	}
+		table_modify_end(myState->mstate);
 
 	/* close rel, but keep lock until commit */
 	table_close(myState->rel, NoLock);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 9ec13d0984..f03aa1cff3 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -48,9 +48,7 @@ typedef struct
 	Oid			transientoid;	/* OID of new heap into which to store */
 	/* These fields are filled by transientrel_startup: */
 	Relation	transientrel;	/* relation to write to */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableModifyState *mstate;	/* table insert state */
 } DR_transientrel;
 
 static int	matview_maintenance_depth = 0;
@@ -458,9 +456,12 @@ transientrel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	 * Fill private fields of myState for use by later routines
 	 */
 	myState->transientrel = transientrel;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM | TABLE_INSERT_FROZEN;
-	myState->bistate = GetBulkInsertState();
+	myState->mstate = table_modify_begin(transientrel,
+										 TM_FLAG_MULTI_INSERTS |
+										 TM_FLAG_BAS_BULKWRITE,
+										 GetCurrentCommandId(true),
+										 TABLE_INSERT_SKIP_FSM |
+										 TABLE_INSERT_FROZEN);
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -476,7 +477,6 @@ static bool
 transientrel_receive(TupleTableSlot *slot, DestReceiver *self)
 {
 	DR_transientrel *myState = (DR_transientrel *) self;
-	bool		insertIndexes;
 
 	/*
 	 * Note that the input slot might not be of the type of the target
@@ -486,13 +486,7 @@ transientrel_receive(TupleTableSlot *slot, DestReceiver *self)
 	 * cheap either. This also doesn't allow accessing per-AM data (say a
 	 * tuple's xmin), but since we don't do that here...
 	 */
-
-	table_tuple_insert(myState->transientrel,
-					   slot,
-					   myState->output_cid,
-					   myState->ti_options,
-					   myState->bistate,
-					   &insertIndexes);
+	table_modify_buffer_insert(myState->mstate, slot);
 
 	/* We know this is a newly created relation, so there are no indexes */
 
@@ -507,9 +501,7 @@ transientrel_shutdown(DestReceiver *self)
 {
 	DR_transientrel *myState = (DR_transientrel *) self;
 
-	FreeBulkInsertState(myState->bistate);
-
-	table_finish_bulk_insert(myState->transientrel, myState->ti_options);
+	table_modify_end(myState->mstate);
 
 	/* close transientrel, but keep lock until commit */
 	table_close(myState->transientrel, NoLock);
-- 
2.34.1



^ permalink  raw  reply  [nested|flat] 30+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
  2024-03-21 04:14 Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-21 07:40 ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-23 00:17   ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-25 19:58     ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-26 15:37       ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-26 19:49         ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-27 08:12           ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-31 15:48             ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-04-02 19:40               ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-04-03 09:02                 ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
@ 2024-04-03 12:25                   ` Bharath Rupireddy <[email protected]>
  0 siblings, 0 replies; 30+ messages in thread

From: Bharath Rupireddy @ 2024-04-03 12:25 UTC (permalink / raw)
  To: Jeff Davis <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; Michael Paquier <[email protected]>; Matthias van de Meent <[email protected]>; Alexander Korotkov <[email protected]>

On Wed, Apr 3, 2024 at 2:32 PM Bharath Rupireddy
<[email protected]> wrote:
>
> I too prefer the latter so that the caller doesn't have to have two
> paths. The new API can just transparently fallback to single inserts.
> I've implemented that in the attached v17 patch. I also tested the
> default APIs manually, but I'll see if I can add some tests to it the
> default API.

Fixed a compiler warning found via CF bot. Please find the attached
v18 patches. I'm sorry for the noise.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


Attachments:

  [application/x-patch] v18-0001-Introduce-new-table-modify-access-methods.patch (17.6K, 2-v18-0001-Introduce-new-table-modify-access-methods.patch)
  download | inline diff:
From ff1278b77e0d6ac6a49f0826602bd948e78c7a91 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Wed, 3 Apr 2024 11:56:14 +0000
Subject: [PATCH v18 1/2] Introduce new table modify access methods

---
 src/backend/access/heap/heapam.c         | 189 ++++++++++++++++++++++-
 src/backend/access/heap/heapam_handler.c |   5 +
 src/backend/access/table/tableam.c       |  86 +++++++++++
 src/backend/access/table/tableamapi.c    |   8 +
 src/include/access/heapam.h              |  41 +++++
 src/include/access/tableam.h             | 108 +++++++++++++
 src/tools/pgindent/typedefs.list         |   3 +
 7 files changed, 439 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index b661d9811e..69f8c597d8 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -64,6 +64,7 @@
 #include "storage/standby.h"
 #include "utils/datum.h"
 #include "utils/inval.h"
+#include "utils/memutils.h"
 #include "utils/relcache.h"
 #include "utils/snapmgr.h"
 #include "utils/spccache.h"
@@ -107,7 +108,8 @@ static int	bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate);
 static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
 static HeapTuple ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required,
 										bool *copy);
-
+static void heap_modify_buffer_flush(TableModifyState *state);
+static void heap_modify_insert_end(TableModifyState *state);
 
 /*
  * Each tuple lock mode has a corresponding heavyweight lock, and one or two
@@ -2441,6 +2443,191 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 	*insert_indexes = true;
 }
 
+/*
+ * Initialize heap modify state.
+ */
+TableModifyState *
+heap_modify_begin(Relation rel, int modify_flags, CommandId cid,
+				  int options)
+{
+	TableModifyState *state;
+	MemoryContext context;
+	MemoryContext oldcontext;
+
+	context = AllocSetContextCreate(CurrentMemoryContext,
+									"heap_modify memory context",
+									ALLOCSET_DEFAULT_SIZES);
+
+	oldcontext = MemoryContextSwitchTo(context);
+	state = palloc0(sizeof(TableModifyState));
+	state->rel = rel;
+	state->modify_flags = modify_flags;
+	state->mctx = context;
+	state->cid = cid;
+	state->options = options;
+	state->insert_indexes = false;
+	state->modify_end_cb = NULL;	/* To be installed lazily */
+	MemoryContextSwitchTo(oldcontext);
+
+	return state;
+}
+
+/*
+ * Store passed-in tuple into in-memory buffered slots. When full, insert
+ * multiple tuples from the buffers into heap.
+ */
+void
+heap_modify_buffer_insert(TableModifyState *state,
+						  TupleTableSlot *slot)
+{
+	TupleTableSlot *dstslot;
+	HeapInsertState *istate;
+	HeapMultiInsertState *mistate;
+	MemoryContext oldcontext;
+
+	oldcontext = MemoryContextSwitchTo(state->mctx);
+
+	/* First time through, initialize heap insert state */
+	if (state->data == NULL)
+	{
+		istate = (HeapInsertState *) palloc0(sizeof(HeapInsertState));
+		istate->bistate = NULL;
+		istate->mistate = NULL;
+		state->data = istate;
+
+		if ((state->modify_flags & TM_FLAG_MULTI_INSERTS) != 0)
+		{
+			mistate = (HeapMultiInsertState *) palloc0(sizeof(HeapMultiInsertState));
+			mistate->slots = (TupleTableSlot **) palloc0(sizeof(TupleTableSlot *) * HEAP_MAX_BUFFERED_SLOTS);
+			istate->mistate = mistate;
+		}
+
+		if ((state->modify_flags & TM_FLAG_BAS_BULKWRITE) != 0)
+			istate->bistate = GetBulkInsertState();
+
+		state->modify_end_cb = heap_modify_insert_end;
+	}
+
+	istate = (HeapInsertState *) state->data;
+	Assert(istate->mistate != NULL);
+	mistate = istate->mistate;
+	Assert(istate->bistate != NULL);
+
+	dstslot = mistate->slots[mistate->cur_slots];
+	if (dstslot == NULL)
+	{
+		/*
+		 * We use virtual tuple slots buffered slots for leveraging the
+		 * optimization it provides to minimize physical data copying. The
+		 * virtual slot gets materialized when we copy (via below
+		 * ExecCopySlot) the tuples from the source slot which can be of any
+		 * type. This way, it is ensured that the tuple storage doesn't depend
+		 * on external memory, because all the datums that aren't passed by
+		 * value are copied into the slot's memory context.
+		 */
+		dstslot = MakeTupleTableSlot(RelationGetDescr(state->rel),
+									 &TTSOpsVirtual);
+		mistate->slots[mistate->cur_slots] = dstslot;
+	}
+
+	ExecClearTuple(dstslot);
+	ExecCopySlot(dstslot, slot);
+
+	mistate->cur_slots++;
+
+	/*
+	 * Memory allocated for the whole tuple is in slot's memory context, so
+	 * use it keep track of the total space occupied by all buffered tuples.
+	 */
+	if (TTS_SHOULDFREE(dstslot))
+		mistate->cur_size += MemoryContextMemAllocated(dstslot->tts_mcxt, false);
+
+	if (mistate->cur_slots >= HEAP_MAX_BUFFERED_SLOTS ||
+		mistate->cur_size >= HEAP_MAX_BUFFERED_BYTES)
+		heap_modify_buffer_flush(state);
+
+	MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * Insert multiple tuples from in-memory buffered slots into heap.
+ */
+static void
+heap_modify_buffer_flush(TableModifyState *state)
+{
+	HeapInsertState *istate;
+	HeapMultiInsertState *mistate;
+	MemoryContext oldcontext;
+
+	/* Quick exit if we haven't inserted anything yet */
+	if (state->data == NULL)
+		return;
+
+	istate = (HeapInsertState *) state->data;
+	Assert(istate->mistate != NULL);
+	mistate = istate->mistate;
+	Assert(istate->bistate != NULL);
+
+	if (mistate->cur_slots == 0)
+		return;
+
+	oldcontext = MemoryContextSwitchTo(state->mctx);
+
+	heap_multi_insert(state->rel, mistate->slots, mistate->cur_slots,
+					  state->cid, state->options, istate->bistate,
+					  &state->insert_indexes);
+
+	mistate->cur_slots = 0;
+	mistate->cur_size = 0;
+
+	MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * Heap insert specific callback used for performing work at the end like
+ * flushing buffered tuples if any, cleaning up the insert state and buffered
+ * slots.
+ */
+static void
+heap_modify_insert_end(TableModifyState *state)
+{
+	HeapInsertState *istate;
+
+	/* Quick exit if we haven't inserted anything yet */
+	if (state->data == NULL)
+		return;
+
+	istate = (HeapInsertState *) state->data;
+
+	if (istate->mistate != NULL)
+	{
+		HeapMultiInsertState *mistate = istate->mistate;
+
+		heap_modify_buffer_flush(state);
+
+		Assert(mistate->cur_slots == 0 &&
+			   mistate->cur_size == 0);
+
+		for (int i = 0; i < HEAP_MAX_BUFFERED_SLOTS && mistate->slots[i] != NULL; i++)
+			ExecDropSingleTupleTableSlot(mistate->slots[i]);
+	}
+
+	if (istate->bistate != NULL)
+		FreeBulkInsertState(istate->bistate);
+}
+
+/*
+ * Clean heap modify state.
+ */
+void
+heap_modify_end(TableModifyState *state)
+{
+	if (state->modify_end_cb != NULL)
+		state->modify_end_cb(state);
+
+	MemoryContextDelete(state->mctx);
+}
+
 /*
  *	simple_heap_insert - insert a tuple
  *
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index c86000d245..f3aa29851d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2638,6 +2638,11 @@ static const TableAmRoutine heapam_methods = {
 	.tuple_insert_speculative = heapam_tuple_insert_speculative,
 	.tuple_complete_speculative = heapam_tuple_complete_speculative,
 	.multi_insert = heap_multi_insert,
+
+	.tuple_modify_begin = heap_modify_begin,
+	.tuple_modify_buffer_insert = heap_modify_buffer_insert,
+	.tuple_modify_end = heap_modify_end,
+
 	.tuple_delete = heapam_tuple_delete,
 	.tuple_update = heapam_tuple_update,
 	.tuple_lock = heapam_tuple_lock,
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 805d222ceb..4c7b5433ec 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -21,6 +21,7 @@
 
 #include <math.h>
 
+#include "access/heapam.h"		/* just for BulkInsertState */
 #include "access/syncscan.h"
 #include "access/tableam.h"
 #include "access/xact.h"
@@ -29,6 +30,7 @@
 #include "storage/bufmgr.h"
 #include "storage/shmem.h"
 #include "storage/smgr.h"
+#include "utils/memutils.h"
 
 /*
  * Constants to control the behavior of block allocation to parallel workers
@@ -48,6 +50,7 @@
 char	   *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
 bool		synchronize_seqscans = true;
 
+static void default_table_modify_insert_end(TableModifyState *state);
 
 /* ----------------------------------------------------------------------------
  * Slot functions.
@@ -772,3 +775,86 @@ table_block_relation_estimate_size(Relation rel, int32 *attr_widths,
 	else
 		*allvisfrac = (double) relallvisible / curpages;
 }
+
+/*
+ * Initialize default table modify state.
+ */
+TableModifyState *
+default_table_modify_begin(Relation rel, int modify_flags, CommandId cid,
+						   int options)
+{
+	TableModifyState *state;
+	MemoryContext context;
+	MemoryContext oldcontext;
+
+	context = AllocSetContextCreate(CurrentMemoryContext,
+									"default_table_modify memory context",
+									ALLOCSET_DEFAULT_SIZES);
+
+	oldcontext = MemoryContextSwitchTo(context);
+	state = palloc0(sizeof(TableModifyState));
+	state->rel = rel;
+	state->modify_flags = modify_flags;
+	state->mctx = context;
+	state->cid = cid;
+	state->options = options;
+	state->insert_indexes = false;
+	state->modify_end_cb = NULL;	/* To be installed lazily */
+	MemoryContextSwitchTo(oldcontext);
+
+	return state;
+}
+
+/*
+ * Default table modify implementation for inserts.
+ */
+void
+default_table_modify_buffer_insert(TableModifyState *state,
+								   TupleTableSlot *slot)
+{
+	MemoryContext oldcontext;
+
+	oldcontext = MemoryContextSwitchTo(state->mctx);
+
+	/* First time through, initialize default table modify state */
+	if (state->data == NULL)
+	{
+		if ((state->modify_flags & TM_FLAG_BAS_BULKWRITE) != 0)
+			state->data = (BulkInsertState) GetBulkInsertState();
+
+		state->modify_end_cb = default_table_modify_insert_end;
+	}
+
+	/* Fallback to table AM single insert routine */
+	table_tuple_insert(state->rel,
+					   slot,
+					   state->cid,
+					   state->options,
+					   (BulkInsertState) state->data,
+					   &state->insert_indexes);
+
+	MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * Default table modify insert specific callback used for performing work at
+ * the end like cleaning up the bulk insert state.
+ */
+static void
+default_table_modify_insert_end(TableModifyState *state)
+{
+	if (state->data != NULL)
+		FreeBulkInsertState((BulkInsertState) state->data);
+}
+
+/*
+ * Clean default table modify state.
+ */
+void
+default_table_modify_end(TableModifyState *state)
+{
+	if (state->modify_end_cb != NULL)
+		state->modify_end_cb(state);
+
+	MemoryContextDelete(state->mctx);
+}
diff --git a/src/backend/access/table/tableamapi.c b/src/backend/access/table/tableamapi.c
index 55b8caeadf..9c095b93e7 100644
--- a/src/backend/access/table/tableamapi.c
+++ b/src/backend/access/table/tableamapi.c
@@ -95,6 +95,14 @@ GetTableAmRoutine(Oid amhandler)
 	Assert(routine->scan_sample_next_block != NULL);
 	Assert(routine->scan_sample_next_tuple != NULL);
 
+	/* optional, but either all of them are defined or none. */
+	Assert((routine->tuple_modify_begin == NULL &&
+			routine->tuple_modify_buffer_insert == NULL &&
+			routine->tuple_modify_end == NULL) ||
+		   (routine->tuple_modify_begin != NULL &&
+			routine->tuple_modify_buffer_insert != NULL &&
+			routine->tuple_modify_end != NULL));
+
 	return routine;
 }
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b632fe953c..b35ba5509b 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -236,6 +236,36 @@ htsv_get_valid_status(int status)
 	return (HTSV_Result) status;
 }
 
+/*
+ * Maximum number of slots that multi-insert buffers can hold.
+ *
+ * Caution: Don't make this too big, as we could end up with this many tuples
+ * stored in multi insert buffer.
+ */
+#define HEAP_MAX_BUFFERED_SLOTS		1000
+
+/* Maximum size of all tuples that multi-insert buffers can hold */
+#define HEAP_MAX_BUFFERED_BYTES		65535
+
+typedef struct HeapMultiInsertState
+{
+	/* Array of buffered slots */
+	TupleTableSlot **slots;
+
+	/* Number of buffered slots currently held */
+	int			cur_slots;
+
+	/* Approximate size of all tuples currently held in buffered slots */
+	Size		cur_size;
+} HeapMultiInsertState;
+
+typedef struct HeapInsertState
+{
+	struct BulkInsertStateData *bistate;
+	HeapMultiInsertState *mistate;
+} HeapInsertState;
+
+
 /* ----------------
  *		function prototypes for heap access method
  *
@@ -286,6 +316,17 @@ extern void heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 extern void heap_multi_insert(Relation relation, struct TupleTableSlot **slots,
 							  int ntuples, CommandId cid, int options,
 							  BulkInsertState bistate, bool *insert_indexes);
+
+extern TableModifyState *heap_modify_begin(Relation rel,
+										   int modify_flags,
+										   CommandId cid,
+										   int options);
+
+extern void heap_modify_buffer_insert(TableModifyState *state,
+									  TupleTableSlot *slot);
+
+extern void heap_modify_end(TableModifyState *state);
+
 extern TM_Result heap_delete(Relation relation, ItemPointer tid,
 							 CommandId cid, Snapshot crosscheck, int options,
 							 struct TM_FailureData *tmfd, bool changingPart,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 2c1a540155..fef9202022 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -248,6 +248,35 @@ typedef struct TM_IndexDeleteOp
 	TM_IndexStatus *status;
 } TM_IndexDeleteOp;
 
+/* Table modify flags */
+
+/* Use multi inserts, i.e. buffer multiple tuples and insert them at once */
+#define TM_FLAG_MULTI_INSERTS	0x000001
+
+/* Use BAS_BULKWRITE buffer access strategy */
+#define TM_FLAG_BAS_BULKWRITE	0x000002
+
+struct TableModifyState;
+
+/* Table AM specific callback that gets called in table_modify_end() */
+typedef void (*TableModifyEndCP) (struct TableModifyState *state);
+
+/* Holds table modify state */
+typedef struct TableModifyState
+{
+	Relation	rel;
+	int			modify_flags;
+	MemoryContext mctx;
+	CommandId	cid;
+	int			options;
+	bool		insert_indexes;
+
+	/* Table AM specific data starts here */
+	void	   *data;
+
+	TableModifyEndCP modify_end_cb;
+} TableModifyState;
+
 /* "options" flag bits for table_tuple_insert */
 /* TABLE_INSERT_SKIP_WAL was 0x0001; RelationNeedsWAL() now governs */
 #define TABLE_INSERT_SKIP_FSM		0x0002
@@ -584,6 +613,18 @@ typedef struct TableAmRoutine
 	void		(*finish_bulk_insert) (Relation rel, int options);
 
 
+	/* ------------------------------------------------------------------------
+	 * Table Modify related functions.
+	 * ------------------------------------------------------------------------
+	 */
+	TableModifyState *(*tuple_modify_begin) (Relation rel,
+											 int modify_flags,
+											 CommandId cid,
+											 int options);
+	void		(*tuple_modify_buffer_insert) (TableModifyState *state,
+											   TupleTableSlot *slot);
+	void		(*tuple_modify_end) (TableModifyState *state);
+
 	/* ------------------------------------------------------------------------
 	 * DDL related functionality.
 	 * ------------------------------------------------------------------------
@@ -1604,6 +1645,73 @@ table_finish_bulk_insert(Relation rel, int options)
 		rel->rd_tableam->finish_bulk_insert(rel, options);
 }
 
+/* ------------------------------------------------------------------------
+ * Table Modify related functions.
+ * ------------------------------------------------------------------------
+ */
+extern TableModifyState *default_table_modify_begin(Relation rel, int modify_flags,
+													CommandId cid, int options);
+extern void	default_table_modify_buffer_insert(TableModifyState *state,
+											   TupleTableSlot *slot);
+extern void default_table_modify_end(TableModifyState *state);
+
+static inline TableModifyState *
+table_modify_begin(Relation rel, int modify_flags, CommandId cid, int options)
+{
+	if (rel->rd_tableam &&
+		rel->rd_tableam->tuple_modify_begin != NULL)
+	{
+		return rel->rd_tableam->tuple_modify_begin(rel, modify_flags,
+												   cid, options);
+	}
+	else if (rel->rd_tableam &&
+			 rel->rd_tableam->tuple_modify_begin == NULL)
+	{
+		/* Fallback to a default implementation */
+		return default_table_modify_begin(rel, modify_flags,
+										  cid, options);
+	}
+	else
+		Assert(false);
+
+	return NULL;
+}
+
+static inline void
+table_modify_buffer_insert(TableModifyState *state, TupleTableSlot *slot)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_modify_buffer_insert != NULL)
+	{
+		state->rel->rd_tableam->tuple_modify_buffer_insert(state, slot);
+	}
+	else if (state->rel->rd_tableam &&
+			 state->rel->rd_tableam->tuple_modify_buffer_insert == NULL)
+	{
+		/* Fallback to a default implementation */
+		default_table_modify_buffer_insert(state, slot);
+	}
+	else
+		Assert(false);
+}
+
+static inline void
+table_modify_end(TableModifyState *state)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_modify_end != NULL)
+	{
+		state->rel->rd_tableam->tuple_modify_end(state);
+	}
+	else if (state->rel->rd_tableam &&
+			 state->rel->rd_tableam->tuple_modify_end == NULL)
+	{
+		/* Fallback to a default implementation */
+		default_table_modify_end(state);
+	}
+	else
+		Assert(false);
+}
 
 /* ------------------------------------------------------------------------
  * DDL related functionality.
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 2b01a3081e..edaa4d26f0 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1123,6 +1123,8 @@ HeadlineJsonState
 HeadlineParsedText
 HeadlineWordEntry
 HeapCheckContext
+HeapInsertState
+HeapMultiInsertState
 HeapPageFreeze
 HeapScanDesc
 HeapTuple
@@ -2814,6 +2816,7 @@ TableFuncScan
 TableFuncScanState
 TableInfo
 TableLikeClause
+TableModifyState
 TableSampleClause
 TableScanDesc
 TableScanDescData
-- 
2.34.1



  [application/x-patch] v18-0002-Optimize-CTAS-CMV-RMV-with-multi-inserts.patch (5.5K, 3-v18-0002-Optimize-CTAS-CMV-RMV-with-multi-inserts.patch)
  download | inline diff:
From 7cbb0630ec2cfd278384676536577cf445c0a092 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Wed, 3 Apr 2024 11:56:31 +0000
Subject: [PATCH v18 2/2] Optimize CTAS, CMV, RMV with multi inserts

---
 src/backend/commands/createas.c | 27 +++++++++------------------
 src/backend/commands/matview.c  | 26 +++++++++-----------------
 2 files changed, 18 insertions(+), 35 deletions(-)

diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index afd3dace07..00c1271f93 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -53,9 +53,7 @@ typedef struct
 	/* These fields are filled by intorel_startup: */
 	Relation	rel;			/* relation to write to */
 	ObjectAddress reladdr;		/* address of rel, for ExecCreateTableAs */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableModifyState *mstate;	/* table insert state */
 } DR_intorel;
 
 /* utility functions for CTAS definition creation */
@@ -552,17 +550,19 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	 */
 	myState->rel = intoRelationDesc;
 	myState->reladdr = intoRelationAddr;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM;
 
 	/*
 	 * If WITH NO DATA is specified, there is no need to set up the state for
 	 * bulk inserts as there are no tuples to insert.
 	 */
 	if (!into->skipData)
-		myState->bistate = GetBulkInsertState();
+		myState->mstate = table_modify_begin(intoRelationDesc,
+											 TM_FLAG_MULTI_INSERTS |
+											 TM_FLAG_BAS_BULKWRITE,
+											 GetCurrentCommandId(true),
+											 TABLE_INSERT_SKIP_FSM);
 	else
-		myState->bistate = NULL;
+		myState->mstate = NULL;
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -578,7 +578,6 @@ static bool
 intorel_receive(TupleTableSlot *slot, DestReceiver *self)
 {
 	DR_intorel *myState = (DR_intorel *) self;
-	bool		insertIndexes;
 
 	/* Nothing to insert if WITH NO DATA is specified. */
 	if (!myState->into->skipData)
@@ -591,12 +590,7 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
 		 * would not be cheap either. This also doesn't allow accessing per-AM
 		 * data (say a tuple's xmin), but since we don't do that here...
 		 */
-		table_tuple_insert(myState->rel,
-						   slot,
-						   myState->output_cid,
-						   myState->ti_options,
-						   myState->bistate,
-						   &insertIndexes);
+		table_modify_buffer_insert(myState->mstate, slot);
 	}
 
 	/* We know this is a newly created relation, so there are no indexes */
@@ -614,10 +608,7 @@ intorel_shutdown(DestReceiver *self)
 	IntoClause *into = myState->into;
 
 	if (!into->skipData)
-	{
-		FreeBulkInsertState(myState->bistate);
-		table_finish_bulk_insert(myState->rel, myState->ti_options);
-	}
+		table_modify_end(myState->mstate);
 
 	/* close rel, but keep lock until commit */
 	table_close(myState->rel, NoLock);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 9ec13d0984..f03aa1cff3 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -48,9 +48,7 @@ typedef struct
 	Oid			transientoid;	/* OID of new heap into which to store */
 	/* These fields are filled by transientrel_startup: */
 	Relation	transientrel;	/* relation to write to */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableModifyState *mstate;	/* table insert state */
 } DR_transientrel;
 
 static int	matview_maintenance_depth = 0;
@@ -458,9 +456,12 @@ transientrel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	 * Fill private fields of myState for use by later routines
 	 */
 	myState->transientrel = transientrel;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM | TABLE_INSERT_FROZEN;
-	myState->bistate = GetBulkInsertState();
+	myState->mstate = table_modify_begin(transientrel,
+										 TM_FLAG_MULTI_INSERTS |
+										 TM_FLAG_BAS_BULKWRITE,
+										 GetCurrentCommandId(true),
+										 TABLE_INSERT_SKIP_FSM |
+										 TABLE_INSERT_FROZEN);
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -476,7 +477,6 @@ static bool
 transientrel_receive(TupleTableSlot *slot, DestReceiver *self)
 {
 	DR_transientrel *myState = (DR_transientrel *) self;
-	bool		insertIndexes;
 
 	/*
 	 * Note that the input slot might not be of the type of the target
@@ -486,13 +486,7 @@ transientrel_receive(TupleTableSlot *slot, DestReceiver *self)
 	 * cheap either. This also doesn't allow accessing per-AM data (say a
 	 * tuple's xmin), but since we don't do that here...
 	 */
-
-	table_tuple_insert(myState->transientrel,
-					   slot,
-					   myState->output_cid,
-					   myState->ti_options,
-					   myState->bistate,
-					   &insertIndexes);
+	table_modify_buffer_insert(myState->mstate, slot);
 
 	/* We know this is a newly created relation, so there are no indexes */
 
@@ -507,9 +501,7 @@ transientrel_shutdown(DestReceiver *self)
 {
 	DR_transientrel *myState = (DR_transientrel *) self;
 
-	FreeBulkInsertState(myState->bistate);
-
-	table_finish_bulk_insert(myState->transientrel, myState->ti_options);
+	table_modify_end(myState->mstate);
 
 	/* close transientrel, but keep lock until commit */
 	table_close(myState->transientrel, NoLock);
-- 
2.34.1



^ permalink  raw  reply  [nested|flat] 30+ messages in thread

* Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
  2024-03-21 04:14 Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-21 07:40 ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-23 00:17   ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-25 19:58     ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-26 15:37       ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-26 19:49         ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-27 08:12           ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-31 15:48             ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-04-02 19:40               ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
@ 2024-04-24 12:49                 ` Bharath Rupireddy <[email protected]>
  2024-04-24 16:07                   ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Pavel Stehule <[email protected]>
  2024-04-25 16:41                   ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  1 sibling, 2 replies; 30+ messages in thread

From: Bharath Rupireddy @ 2024-04-24 12:49 UTC (permalink / raw)
  To: Jeff Davis <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; Michael Paquier <[email protected]>; Matthias van de Meent <[email protected]>; Alexander Korotkov <[email protected]>

On Wed, Apr 3, 2024 at 1:10 AM Jeff Davis <[email protected]> wrote:
>
> Here's where I think this API should go:
>
> 1. Have table_modify_begin/end and table_modify_buffer_insert, like
> those that are implemented in your patch.

I added table_modify_begin, table_modify_buffer_insert,
table_modify_buffer_flush and table_modify_end. Table Access Method (AM)
authors now can define their own buffering strategy and flushing decisions
based on their tuple storage kinds and various other AM specific factors. I
also added a default implementation that falls back to single inserts when
no implementation is provided for these AM by AM authors. See the attached
v19-0001 patch.

> 2. Add some kind of flush callback that will be called either while the
> tuples are being flushed or after the tuples are flushed (but before
> they are freed by the AM). (Aside: do we need to call it while the
> tuples are being flushed to get the right visibility semantics for
> after-row triggers?)

I added a flush callback named TableModifyBufferFlushCallback; when
provided by callers invoked after tuples are flushed to disk from the
buffers but before the AM frees them up. Index insertions and AFTER ROW
INSERT triggers can be executed in this callback. See the v19-0001 patch
for how AM invokes the flush callback, and see either v19-0003 or v19-0004
or v19-0005 for how a caller can supply the callback and required context
to execute index insertions and AR triggers.

> 3. Add table_modify_buffer_{update|delete} APIs.
>
> 9. Use these new methods for DELETE, UPDATE, and MERGE. MERGE can use
> the buffer_insert/update/delete APIs; we don't need a separate merge
> method. This probably requires that the AM maintain 3 separate buffers
> to distinguish different kinds of changes at flush time (obviously
> these can be initialized lazily to avoid overhead when not being used).

I haven't thought about these things yet. I can only focus on them after
seeing how the attached patches go from here.

> 4. Some kind of API tweaks to help manage memory when modifying
> pertitioned tables, so that the buffering doesn't get out of control.
> Perhaps just reporting memory usage and allowing the caller to force
> flushes would be enough.

Heap implementation for thes new Table AMs uses a separate memory context
for all of the operations. Please have a look and let me know if we need
anything more.

> 5. Use these new methods for CREATE/REFRESH MATERIALIZED VIEW. This is
> fairly straightforward, I believe, and handled by your patch. Indexes
> are (re)built afterward, and no triggers are possible.
>
> 6. Use these new methods for CREATE TABLE ... AS. This is fairly
> straightforward, I believe, and handled by your patch. No indexes or
> triggers are possible.

I used multi inserts for all of these including TABLE REWRITE commands such
as ALTER TABLE. See the attached v19-0002 patch. Check the testing section
below for benefits.

FWIW, following are some of the TABLE REWRITE commands that can get
benefitted:

ALTER TABLE tbl ALTER c1 TYPE bigint;
ALTER TABLE itest13 ADD COLUMN c int GENERATED BY DEFAULT AS IDENTITY;
ALTER MATERIALIZED VIEW heapmv SET ACCESS METHOD heap2;
ALTER TABLE itest3 ALTER COLUMN a TYPE int;
ALTER TABLE gtest20 ALTER COLUMN b SET EXPRESSION AS (a * 3);
ALTER TABLE has_volatile ADD col4 int DEFAULT (random() * 10000)::int;
and so on.

> 7. Use these new methods for COPY. We have to be careful to avoid
> regressions for the heap method, because it's already managing its own
> buffers. If the AM manages the buffering, then it may require
> additional copying of slots, which could be a disadvantage. To solve
> this, we may need some minor API tweaks to avoid copying when the
> caller guarantees that the memory will not be freed to early, or
> perhaps expose the AM's memory context to copyfrom.c. Another thing to
> consider is that the buffering in copyfrom.c is also used for FDWs, so
> that buffering code path needs to be preserved in copyfrom.c even if
> not used for AMs.

I modified the COPY FROM code to use the new Table AMs, and performed some
tests which show no signs of regression. Check the testing section below
for more details. See the attached v19-0005 patch. With this,
table_multi_insert can be deprecated.

> 8. Use these new methods for INSERT INTO ... SELECT. One potential
> challenge here is that execution nodes are not always run to
> completion, so we need to be sure that the flush isn't forgotten in
> that case.

I did that in v19-0003. I did place the table_modify_end call in multiple
places including ExecEndModifyTable. I didn't find any issues with it.
Please have a look and let me know if we need the end call in more places.
Check the testing section below for benefits.

> 10. Use these new methods for logical apply.

I used multi inserts for Logical Replication apply. in v19-0004. Check the
testing section below for benefits.

FWIW, open-source pglogical does have multi insert support, check code
around
https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical_apply_heap.c#L960
.

> 11. Deprecate the multi_insert API.

I did remove both table_multi_insert and table_finish_bulk_insert in
v19-0006. Perhaps, removing them isn't a great idea, but adding a
deprecation WARNING/ERROR until some more PG releases might be worth
looking at.

> Thoughts on this plan? Does your patch make sense in v17 as a stepping
> stone, or should we try to make all of these API changes together in
> v18?

If the design, code and benefits that these new Table AMs bring to the
table look good, I hope to see it for PG 18.

> Also, a sample AM code would be a huge benefit here. Writing a real AM
> is hard, but perhaps we can at least have an example one to demonstrate
> how to use these APIs?

The attached patches already have implemented these new Table AMs for Heap.
I don't think we need a separate implementation to demonstrate. If others
feel so, I'm open to thoughts here.

Having said above, I'd like to reiterate the motivation behind the new
Table AMs for multi and single inserts.

1. A scan-like API with state being carried across is thought to be better
as suggested by Andres Freund -
https://www.postgresql.org/message-id/[email protected]
.
2. Allowing a Table AM to optimize operations across multiple inserts,
define its own buffering strategy and take its own flushing decisions based
on their tuple storage kinds and various other AM specific factors.
3. Improve performance of various SQL commands with multi inserts for Heap
AM.

The attached v19 patches might need some more detailed comments, some
documentation and some specific tests ensuring the multi inserts for Heap
are kicked-in for various commands. I'm open to thoughts here.

I did some testing to see how various commands benefit with multi inserts
using these new Table AM for heap. It's not only the improvement in
performance these commands see, but also the amount of WAL that gets
generated reduces greatly. After all, multi inserts optimize the insertions
by writing less WAL. IOW, writing WAL record per page if multiple rows fit
into a single data page as opposed to WAL record per row.

Test case 1: 100 million rows, 2 columns (int and float)

Command                        | HEAD (sec) | PATCHED (sec) | Faster by % |
Faster by X
------------------------------ | ---------- | ------------- | ----------- |
-----------
CREATE TABLE AS                | 121        | 77            | 36.3        |
1.57
CREATE MATERIALIZED VIEW       | 101        | 49            | 51.4        |
2.06
REFRESH MATERIALIZED VIEW      | 113        | 54            | 52.2        |
2.09
ALTER TABLE (TABLE REWRITE)    | 124        | 81            | 34.6        |
1.53
COPY FROM                      | 71         | 72            | 0           |
1
INSERT INTO ... SELECT         | 117        | 62            | 47          |
1.88
LOGICAL REPLICATION APPLY      | 393        | 306           | 22.1        |
1.28

Command                        | HEAD (WAL in GB) | PATCHED (WAL in GB) |
Reduced by % | Reduced by X
------------------------------ | ---------------- | ------------------- |
------------ | -----------
CREATE TABLE AS                | 6.8              | 2.4                 |
64.7         | 2.83
CREATE MATERIALIZED VIEW       | 7.2              | 2.3                 |
68           | 3.13
REFRESH MATERIALIZED VIEW      | 10               | 5.1                 |
49           | 1.96
ALTER TABLE (TABLE REWRITE)    | 8                | 3.2                 |
60           | 2.5
COPY FROM                      | 2.9              | 3                   | 0
           | 1
INSERT INTO ... SELECT         | 8                | 3                   |
62.5         | 2.66
LOGICAL REPLICATION APPLY      | 7.5              | 2.3                 |
69.3         | 3.26

Test case 2: 1 billion rows, 1 column (int)

Command                        | HEAD (sec) | PATCHED (sec) | Faster by % |
Faster by X
------------------------------ | ---------- | ------------- | ----------- |
-----------
CREATE TABLE AS                | 794        | 386           | 51.38       |
2.05
CREATE MATERIALIZED VIEW       | 1006       | 563           | 44.03       |
1.78
REFRESH MATERIALIZED VIEW      | 977        | 603           | 38.28       |
1.62
ALTER TABLE (TABLE REWRITE)    | 1189       | 714           | 39.94       |
1.66
COPY FROM                      | 321        | 330           | -0.02       |
0.97
INSERT INTO ... SELECT         | 1084       | 586           | 45.94       |
1.84
LOGICAL REPLICATION APPLY      | 3530       | 2982          | 15.52       |
1.18

Command                        | HEAD (WAL in GB) | PATCHED (WAL in GB) |
Reduced by % | Reduced by X
------------------------------ | ---------------- | ------------------- |
------------ | -----------
CREATE TABLE AS                | 60               | 12                  |
80           | 5
CREATE MATERIALIZED VIEW       | 60               | 12                  |
80           | 5
REFRESH MATERIALIZED VIEW      | 60               | 12                  |
80           | 5
ALTER TABLE (TABLE REWRITE)    | 123              | 31                  |
60           | 2.5
COPY FROM                      | 12               | 12                  | 0
           | 1
INSERT INTO ... SELECT         | 120              | 24                  |
80           | 5
LOGICAL REPLICATION APPLY      | 61               | 12                  |
80.32        | 5

Test setup:
./configure --prefix=$PWD/pg17/ --enable-tap-tests CFLAGS="-ggdb3 -O2" >
install.log && make -j 8 install > install.log 2>&1 &

wal_level=logical
max_wal_size = 256GB
checkpoint_timeout = 1h

Test system is EC2 instance of type c5.4xlarge:
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  16
  On-line CPU(s) list:   0-15
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz
    CPU family:          6
    Model:               85
    Thread(s) per core:  2
    Core(s) per socket:  8
    Socket(s):           1
    Stepping:            7
    BogoMIPS:            5999.99
Caches (sum of all):
  L1d:                   256 KiB (8 instances)
  L1i:                   256 KiB (8 instances)
  L2:                    8 MiB (8 instances)
  L3:                    35.8 MiB (1 instance)
NUMA:
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-15
RAM:
  MemTotal:       32036536 kB

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


Attachments:

  [application/x-patch] v19-0001-Introduce-new-Table-Access-Methods-for-single-an.patch (20.3K, 3-v19-0001-Introduce-new-Table-Access-Methods-for-single-an.patch)
  download | inline diff:
From 75666da998aaa8fbc60d62ad8c160a5c227065e6 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Tue, 23 Apr 2024 04:12:29 +0000
Subject: [PATCH v19 1/6] Introduce new Table Access Methods for single and
 multi inserts

---
 src/backend/access/heap/heapam.c         | 202 ++++++++++++++++++++++-
 src/backend/access/heap/heapam_handler.c |   6 +
 src/backend/access/table/tableam.c       |  95 +++++++++++
 src/backend/access/table/tableamapi.c    |  10 ++
 src/include/access/heapam.h              |  44 +++++
 src/include/access/tableam.h             | 146 ++++++++++++++++
 src/tools/pgindent/typedefs.list         |   3 +
 7 files changed, 505 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4a4cf76269..37c6ed232c 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -64,6 +64,7 @@
 #include "storage/standby.h"
 #include "utils/datum.h"
 #include "utils/inval.h"
+#include "utils/memutils.h"
 #include "utils/relcache.h"
 #include "utils/snapmgr.h"
 #include "utils/spccache.h"
@@ -112,7 +113,7 @@ static int	bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate);
 static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
 static HeapTuple ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required,
 										bool *copy);
-
+static void heap_modify_insert_end_callback(TableModifyState *state);
 
 /*
  * Each tuple lock mode has a corresponding heavyweight lock, and one or two
@@ -2608,6 +2609,205 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 	pgstat_count_heap_insert(relation, ntuples);
 }
 
+/*
+ * Initialize heap modify state.
+ */
+TableModifyState *
+heap_modify_begin(Relation rel, int modify_flags,
+				  CommandId cid, int options,
+				  TableModifyBufferFlushCallback modify_buffer_flush_callback,
+				  void *modify_buffer_flush_context)
+{
+	TableModifyState *state;
+	MemoryContext context;
+	MemoryContext oldcontext;
+
+	context = AllocSetContextCreate(TopTransactionContext,
+									"heap_modify memory context",
+									ALLOCSET_DEFAULT_SIZES);
+
+	oldcontext = MemoryContextSwitchTo(context);
+	state = palloc0(sizeof(TableModifyState));
+	state->rel = rel;
+	state->modify_flags = modify_flags;
+	state->mem_cxt = context;
+	state->cid = cid;
+	state->options = options;
+	state->modify_buffer_flush_callback = modify_buffer_flush_callback;
+	state->modify_buffer_flush_context = modify_buffer_flush_context;
+	state->modify_end_callback = NULL;	/* To be installed lazily */
+	MemoryContextSwitchTo(oldcontext);
+
+	return state;
+}
+
+/*
+ * Store passed-in tuple into in-memory buffered slots. When full, insert
+ * multiple tuples from the buffers into heap.
+ */
+void
+heap_modify_buffer_insert(TableModifyState *state,
+						  TupleTableSlot *slot)
+{
+	TupleTableSlot *dstslot;
+	HeapInsertState *istate;
+	HeapMultiInsertState *mistate;
+	MemoryContext oldcontext;
+
+	oldcontext = MemoryContextSwitchTo(state->mem_cxt);
+
+	/* First time through, initialize heap insert state */
+	if (state->data == NULL)
+	{
+		istate = (HeapInsertState *) palloc0(sizeof(HeapInsertState));
+		istate->bistate = NULL;
+		istate->mistate = NULL;
+		state->data = istate;
+
+		if ((state->modify_flags & TM_FLAG_MULTI_INSERTS) != 0)
+		{
+			mistate = (HeapMultiInsertState *) palloc0(sizeof(HeapMultiInsertState));
+			mistate->slots = (TupleTableSlot **) palloc0(sizeof(TupleTableSlot *) * HEAP_MAX_BUFFERED_SLOTS);
+			istate->mistate = mistate;
+			mistate->mem_cxt = AllocSetContextCreate(CurrentMemoryContext,
+													 "heap_multi_insert memory context",
+													 ALLOCSET_DEFAULT_SIZES);
+		}
+
+		if ((state->modify_flags & TM_FLAG_BAS_BULKWRITE) != 0)
+			istate->bistate = GetBulkInsertState();
+
+		state->modify_end_callback = heap_modify_insert_end_callback;
+	}
+
+	istate = (HeapInsertState *) state->data;
+	Assert(istate->mistate != NULL);
+	mistate = istate->mistate;
+
+	dstslot = mistate->slots[mistate->cur_slots];
+	if (dstslot == NULL)
+	{
+		/*
+		 * We use virtual tuple slots buffered slots for leveraging the
+		 * optimization it provides to minimize physical data copying. The
+		 * virtual slot gets materialized when we copy (via below
+		 * ExecCopySlot) the tuples from the source slot which can be of any
+		 * type. This way, it is ensured that the tuple storage doesn't depend
+		 * on external memory, because all the datums that aren't passed by
+		 * value are copied into the slot's memory context.
+		 */
+		dstslot = MakeTupleTableSlot(RelationGetDescr(state->rel),
+									 &TTSOpsVirtual);
+		mistate->slots[mistate->cur_slots] = dstslot;
+	}
+
+	ExecClearTuple(dstslot);
+	ExecCopySlot(dstslot, slot);
+
+	mistate->cur_slots++;
+
+	/*
+	 * Memory allocated for the whole tuple is in slot's memory context, so
+	 * use it keep track of the total space occupied by all buffered tuples.
+	 */
+	if (TTS_SHOULDFREE(dstslot))
+		mistate->cur_size += MemoryContextMemAllocated(dstslot->tts_mcxt, false);
+
+	if (mistate->cur_slots >= HEAP_MAX_BUFFERED_SLOTS ||
+		mistate->cur_size >= HEAP_MAX_BUFFERED_BYTES)
+		heap_modify_buffer_flush(state);
+
+	MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * Insert multiple tuples from in-memory buffered slots into heap.
+ */
+void
+heap_modify_buffer_flush(TableModifyState *state)
+{
+	HeapInsertState *istate;
+	HeapMultiInsertState *mistate;
+	MemoryContext oldcontext;
+
+	/* Quick exit if we haven't inserted anything yet */
+	if (state->data == NULL)
+		return;
+
+	istate = (HeapInsertState *) state->data;
+	Assert(istate->mistate != NULL);
+	mistate = istate->mistate;
+
+	if (mistate->cur_slots == 0)
+		return;
+
+	/*
+	 * heap_multi_insert may leak memory, so switch to short-lived memory
+	 * context before calling it.
+	 */
+	oldcontext = MemoryContextSwitchTo(mistate->mem_cxt);
+
+	heap_multi_insert(state->rel, mistate->slots, mistate->cur_slots,
+					  state->cid, state->options, istate->bistate);
+
+	MemoryContextSwitchTo(oldcontext);
+	MemoryContextReset(mistate->mem_cxt);
+
+	if (state->modify_buffer_flush_callback != NULL)
+		state->modify_buffer_flush_callback(state->modify_buffer_flush_context,
+											mistate->slots, mistate->cur_slots);
+
+	mistate->cur_slots = 0;
+	mistate->cur_size = 0;
+}
+
+/*
+ * Heap insert specific callback used for performing work at the end like
+ * flushing buffered tuples if any, cleaning up the insert state and buffered
+ * slots.
+ */
+static void
+heap_modify_insert_end_callback(TableModifyState *state)
+{
+	HeapInsertState *istate;
+
+	/* Quick exit if we haven't inserted anything yet */
+	if (state->data == NULL)
+		return;
+
+	istate = (HeapInsertState *) state->data;
+
+	if (istate->mistate != NULL)
+	{
+		HeapMultiInsertState *mistate = istate->mistate;
+
+		heap_modify_buffer_flush(state);
+
+		Assert(mistate->cur_slots == 0 &&
+			   mistate->cur_size == 0);
+
+		for (int i = 0; i < HEAP_MAX_BUFFERED_SLOTS && mistate->slots[i] != NULL; i++)
+			ExecDropSingleTupleTableSlot(mistate->slots[i]);
+
+		MemoryContextDelete(mistate->mem_cxt);
+	}
+
+	if (istate->bistate != NULL)
+		FreeBulkInsertState(istate->bistate);
+}
+
+/*
+ * Clean heap modify state.
+ */
+void
+heap_modify_end(TableModifyState *state)
+{
+	if (state->modify_end_callback != NULL)
+		state->modify_end_callback(state);
+
+	MemoryContextDelete(state->mem_cxt);
+}
+
 /*
  *	simple_heap_insert - insert a tuple
  *
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 6f8b1b7929..eda0c73a16 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2615,6 +2615,12 @@ static const TableAmRoutine heapam_methods = {
 	.tuple_insert_speculative = heapam_tuple_insert_speculative,
 	.tuple_complete_speculative = heapam_tuple_complete_speculative,
 	.multi_insert = heap_multi_insert,
+
+	.tuple_modify_begin = heap_modify_begin,
+	.tuple_modify_buffer_insert = heap_modify_buffer_insert,
+	.tuple_modify_buffer_flush = heap_modify_buffer_flush,
+	.tuple_modify_end = heap_modify_end,
+
 	.tuple_delete = heapam_tuple_delete,
 	.tuple_update = heapam_tuple_update,
 	.tuple_lock = heapam_tuple_lock,
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index e57a0b7ea3..0e4ce1aca6 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -21,6 +21,7 @@
 
 #include <math.h>
 
+#include "access/heapam.h"		/* just for BulkInsertState */
 #include "access/syncscan.h"
 #include "access/tableam.h"
 #include "access/xact.h"
@@ -29,6 +30,7 @@
 #include "storage/bufmgr.h"
 #include "storage/shmem.h"
 #include "storage/smgr.h"
+#include "utils/memutils.h"
 
 /*
  * Constants to control the behavior of block allocation to parallel workers
@@ -48,6 +50,7 @@
 char	   *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
 bool		synchronize_seqscans = true;
 
+static void default_table_modify_insert_end_callback(TableModifyState *state);
 
 /* ----------------------------------------------------------------------------
  * Slot functions.
@@ -756,3 +759,95 @@ table_block_relation_estimate_size(Relation rel, int32 *attr_widths,
 	else
 		*allvisfrac = (double) relallvisible / curpages;
 }
+
+/*
+ * Initialize default table modify state.
+ */
+TableModifyState *
+default_table_modify_begin(Relation rel, int modify_flags,
+						   CommandId cid, int options,
+						   TableModifyBufferFlushCallback modify_buffer_flush_callback,
+						   void *modify_buffer_flush_context)
+{
+	TableModifyState *state;
+	MemoryContext context;
+	MemoryContext oldcontext;
+
+	context = AllocSetContextCreate(CurrentMemoryContext,
+									"default_table_modify memory context",
+									ALLOCSET_DEFAULT_SIZES);
+
+	oldcontext = MemoryContextSwitchTo(context);
+	state = palloc0(sizeof(TableModifyState));
+	state->rel = rel;
+	state->modify_flags = modify_flags;
+	state->mem_cxt = context;
+	state->cid = cid;
+	state->options = options;
+	state->modify_end_callback = NULL;	/* To be installed lazily */
+	MemoryContextSwitchTo(oldcontext);
+
+	return state;
+}
+
+/*
+ * Default table modify implementation for inserts.
+ */
+void
+default_table_modify_buffer_insert(TableModifyState *state,
+								   TupleTableSlot *slot)
+{
+	MemoryContext oldcontext;
+
+	oldcontext = MemoryContextSwitchTo(state->mem_cxt);
+
+	/* First time through, initialize default table modify state */
+	if (state->data == NULL)
+	{
+		if ((state->modify_flags & TM_FLAG_BAS_BULKWRITE) != 0)
+			state->data = (BulkInsertState) GetBulkInsertState();
+
+		state->modify_end_callback = default_table_modify_insert_end_callback;
+	}
+
+	/* Fallback to table AM single insert routine */
+	table_tuple_insert(state->rel,
+					   slot,
+					   state->cid,
+					   state->options,
+					   (BulkInsertState) state->data);
+
+	MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * Default table modify implementation for flush.
+ */
+void
+default_table_modify_buffer_flush(TableModifyState *state)
+{
+	/* no-op */
+}
+
+/*
+ * Default table modify insert specific callback used for performing work at
+ * the end like cleaning up the bulk insert state.
+ */
+static void
+default_table_modify_insert_end_callback(TableModifyState *state)
+{
+	if (state->data != NULL)
+		FreeBulkInsertState((BulkInsertState) state->data);
+}
+
+/*
+ * Clean default table modify state.
+ */
+void
+default_table_modify_end(TableModifyState *state)
+{
+	if (state->modify_end_callback != NULL)
+		state->modify_end_callback(state);
+
+	MemoryContextDelete(state->mem_cxt);
+}
diff --git a/src/backend/access/table/tableamapi.c b/src/backend/access/table/tableamapi.c
index ce637a5a5d..96ac951af6 100644
--- a/src/backend/access/table/tableamapi.c
+++ b/src/backend/access/table/tableamapi.c
@@ -97,6 +97,16 @@ GetTableAmRoutine(Oid amhandler)
 	Assert(routine->scan_sample_next_block != NULL);
 	Assert(routine->scan_sample_next_tuple != NULL);
 
+	/* optional, but either all of them are defined or none. */
+	Assert((routine->tuple_modify_begin == NULL &&
+			routine->tuple_modify_buffer_insert == NULL &&
+			routine->tuple_modify_buffer_flush == NULL &&
+			routine->tuple_modify_end == NULL) ||
+		   (routine->tuple_modify_begin != NULL &&
+			routine->tuple_modify_buffer_insert != NULL &&
+			routine->tuple_modify_buffer_flush != NULL &&
+			routine->tuple_modify_end != NULL));
+
 	return routine;
 }
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index c47a5045ce..c10ebbb5ea 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -271,6 +271,38 @@ typedef enum
 	PRUNE_VACUUM_CLEANUP,		/* VACUUM 2nd heap pass */
 } PruneReason;
 
+/*
+ * Maximum number of slots that multi-insert buffers can hold.
+ *
+ * Caution: Don't make this too big, as we could end up with this many tuples
+ * stored in multi insert buffer.
+ */
+#define HEAP_MAX_BUFFERED_SLOTS		1000
+
+/* Maximum size of all tuples that multi-insert buffers can hold */
+#define HEAP_MAX_BUFFERED_BYTES		65535
+
+typedef struct HeapMultiInsertState
+{
+	/* Array of buffered slots */
+	TupleTableSlot **slots;
+
+	/* Number of buffered slots currently held */
+	int			cur_slots;
+
+	/* Approximate size of all tuples currently held in buffered slots */
+	Size		cur_size;
+
+	MemoryContext mem_cxt;
+} HeapMultiInsertState;
+
+typedef struct HeapInsertState
+{
+	struct BulkInsertStateData *bistate;
+	HeapMultiInsertState *mistate;
+} HeapInsertState;
+
+
 /* ----------------
  *		function prototypes for heap access method
  *
@@ -321,6 +353,18 @@ extern void heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 extern void heap_multi_insert(Relation relation, struct TupleTableSlot **slots,
 							  int ntuples, CommandId cid, int options,
 							  BulkInsertState bistate);
+
+extern TableModifyState *heap_modify_begin(Relation rel,
+										   int modify_flags,
+										   CommandId cid,
+										   int options,
+										   TableModifyBufferFlushCallback modify_buffer_flush_callback,
+										   void *modify_buffer_flush_context);
+extern void heap_modify_buffer_insert(TableModifyState *state,
+									  TupleTableSlot *slot);
+extern void heap_modify_buffer_flush(TableModifyState *state);
+extern void heap_modify_end(TableModifyState *state);
+
 extern TM_Result heap_delete(Relation relation, ItemPointer tid,
 							 CommandId cid, Snapshot crosscheck, bool wait,
 							 struct TM_FailureData *tmfd, bool changingPart);
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 8e583b45cd..ddb6e6f3a5 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -255,6 +255,43 @@ typedef struct TM_IndexDeleteOp
 	TM_IndexStatus *status;
 } TM_IndexDeleteOp;
 
+/* Table modify flags */
+
+/* Use multi inserts, i.e. buffer multiple tuples and insert them at once */
+#define TM_FLAG_MULTI_INSERTS	0x000001
+
+/* Use BAS_BULKWRITE buffer access strategy */
+#define TM_FLAG_BAS_BULKWRITE	0x000002
+
+struct TableModifyState;
+
+/* Callback invoked for each tuple that gets flushed to disk from buffer */
+typedef void (*TableModifyBufferFlushCallback) (void *context,
+												TupleTableSlot **slots,
+												int nslots);
+
+/* Table AM specific callback that gets called in table_modify_end() */
+typedef void (*TableModifyEndCallback) (struct TableModifyState *state);
+
+/* Holds table modify state */
+typedef struct TableModifyState
+{
+	Relation	rel;
+	int			modify_flags;
+	MemoryContext mem_cxt;
+	CommandId	cid;
+	int			options;
+
+	/* Flush callback and its context */
+	TableModifyBufferFlushCallback modify_buffer_flush_callback;
+	void	   *modify_buffer_flush_context;
+
+	/* Table AM specific data */
+	void	   *data;
+
+	TableModifyEndCallback modify_end_callback;
+} TableModifyState;
+
 /* "options" flag bits for table_tuple_insert */
 /* TABLE_INSERT_SKIP_WAL was 0x0001; RelationNeedsWAL() now governs */
 #define TABLE_INSERT_SKIP_FSM		0x0002
@@ -578,6 +615,21 @@ typedef struct TableAmRoutine
 	void		(*finish_bulk_insert) (Relation rel, int options);
 
 
+	/* ------------------------------------------------------------------------
+	 * Table Modify related functions.
+	 * ------------------------------------------------------------------------
+	 */
+	TableModifyState *(*tuple_modify_begin) (Relation rel,
+											 int modify_flags,
+											 CommandId cid,
+											 int options,
+											 TableModifyBufferFlushCallback modify_buffer_flush_callback,
+											 void *modify_buffer_flush_context);
+	void		(*tuple_modify_buffer_insert) (TableModifyState *state,
+											   TupleTableSlot *slot);
+	void		(*tuple_modify_buffer_flush) (TableModifyState *state);
+	void		(*tuple_modify_end) (TableModifyState *state);
+
 	/* ------------------------------------------------------------------------
 	 * DDL related functionality.
 	 * ------------------------------------------------------------------------
@@ -1609,6 +1661,100 @@ table_finish_bulk_insert(Relation rel, int options)
 		rel->rd_tableam->finish_bulk_insert(rel, options);
 }
 
+/* ------------------------------------------------------------------------
+ * Table Modify related functions.
+ * ------------------------------------------------------------------------
+ */
+extern TableModifyState *default_table_modify_begin(Relation rel, int modify_flags,
+													CommandId cid, int options,
+													TableModifyBufferFlushCallback modify_buffer_flush_callback,
+													void *modify_buffer_flush_context);
+extern void default_table_modify_buffer_insert(TableModifyState *state,
+											   TupleTableSlot *slot);
+extern void default_table_modify_buffer_flush(TableModifyState *state);
+extern void default_table_modify_end(TableModifyState *state);
+
+static inline TableModifyState *
+table_modify_begin(Relation rel, int modify_flags, CommandId cid, int options,
+				   TableModifyBufferFlushCallback modify_buffer_flush_callback,
+				   void *modify_buffer_flush_context)
+{
+	if (rel->rd_tableam &&
+		rel->rd_tableam->tuple_modify_begin != NULL)
+	{
+		return rel->rd_tableam->tuple_modify_begin(rel, modify_flags,
+												   cid, options,
+												   modify_buffer_flush_callback,
+												   modify_buffer_flush_context);
+	}
+	else if (rel->rd_tableam &&
+			 rel->rd_tableam->tuple_modify_begin == NULL)
+	{
+		/* Fallback to a default implementation */
+		return default_table_modify_begin(rel, modify_flags,
+										  cid, options,
+										  modify_buffer_flush_callback,
+										  modify_buffer_flush_context);
+	}
+	else
+		Assert(false);
+
+	return NULL;
+}
+
+static inline void
+table_modify_buffer_insert(TableModifyState *state, TupleTableSlot *slot)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_modify_buffer_insert != NULL)
+	{
+		state->rel->rd_tableam->tuple_modify_buffer_insert(state, slot);
+	}
+	else if (state->rel->rd_tableam &&
+			 state->rel->rd_tableam->tuple_modify_buffer_insert == NULL)
+	{
+		/* Fallback to a default implementation */
+		default_table_modify_buffer_insert(state, slot);
+	}
+	else
+		Assert(false);
+}
+
+static inline void
+table_modify_buffer_flush(TableModifyState *state)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_modify_buffer_flush != NULL)
+	{
+		state->rel->rd_tableam->tuple_modify_buffer_flush(state);
+	}
+	else if (state->rel->rd_tableam &&
+			 state->rel->rd_tableam->tuple_modify_buffer_flush == NULL)
+	{
+		/* Fallback to a default implementation */
+		default_table_modify_buffer_flush(state);
+	}
+	else
+		Assert(false);
+}
+
+static inline void
+table_modify_end(TableModifyState *state)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_modify_end != NULL)
+	{
+		state->rel->rd_tableam->tuple_modify_end(state);
+	}
+	else if (state->rel->rd_tableam &&
+			 state->rel->rd_tableam->tuple_modify_end == NULL)
+	{
+		/* Fallback to a default implementation */
+		default_table_modify_end(state);
+	}
+	else
+		Assert(false);
+}
 
 /* ------------------------------------------------------------------------
  * DDL related functionality.
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index d551ada325..ebde07bcde 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1130,6 +1130,8 @@ HeadlineJsonState
 HeadlineParsedText
 HeadlineWordEntry
 HeapCheckContext
+HeapInsertState
+HeapMultiInsertState
 HeapPageFreeze
 HeapScanDesc
 HeapTuple
@@ -2844,6 +2846,7 @@ TableFuncScanState
 TableFuncType
 TableInfo
 TableLikeClause
+TableModifyState
 TableSampleClause
 TableScanDesc
 TableScanDescData
-- 
2.34.1



  [application/x-patch] v19-0002-Optimize-CTAS-CMV-RMV-and-TABLE-REWRITES-with-mu.patch (7.0K, 4-v19-0002-Optimize-CTAS-CMV-RMV-and-TABLE-REWRITES-with-mu.patch)
  download | inline diff:
From 5a6dd7ac0cae831fbd8294710997dd484c089fcb Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Tue, 23 Apr 2024 04:12:58 +0000
Subject: [PATCH v19 2/6] Optimize CTAS, CMV, RMV and TABLE REWRITES with multi
 inserts

---
 src/backend/commands/createas.c  | 27 +++++++++++----------------
 src/backend/commands/matview.c   | 26 +++++++++++---------------
 src/backend/commands/tablecmds.c | 31 +++++++++++--------------------
 3 files changed, 33 insertions(+), 51 deletions(-)

diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 62050f4dc5..2d6fffbf07 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -53,9 +53,7 @@ typedef struct
 	/* These fields are filled by intorel_startup: */
 	Relation	rel;			/* relation to write to */
 	ObjectAddress reladdr;		/* address of rel, for ExecCreateTableAs */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableModifyState *mstate;	/* table insert state */
 } DR_intorel;
 
 /* utility functions for CTAS definition creation */
@@ -552,17 +550,21 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	 */
 	myState->rel = intoRelationDesc;
 	myState->reladdr = intoRelationAddr;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM;
 
 	/*
 	 * If WITH NO DATA is specified, there is no need to set up the state for
 	 * bulk inserts as there are no tuples to insert.
 	 */
 	if (!into->skipData)
-		myState->bistate = GetBulkInsertState();
+		myState->mstate = table_modify_begin(intoRelationDesc,
+											 TM_FLAG_MULTI_INSERTS |
+											 TM_FLAG_BAS_BULKWRITE,
+											 GetCurrentCommandId(true),
+											 TABLE_INSERT_SKIP_FSM,
+											 NULL,
+											 NULL);
 	else
-		myState->bistate = NULL;
+		myState->mstate = NULL;
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -590,11 +592,7 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
 		 * would not be cheap either. This also doesn't allow accessing per-AM
 		 * data (say a tuple's xmin), but since we don't do that here...
 		 */
-		table_tuple_insert(myState->rel,
-						   slot,
-						   myState->output_cid,
-						   myState->ti_options,
-						   myState->bistate);
+		table_modify_buffer_insert(myState->mstate, slot);
 	}
 
 	/* We know this is a newly created relation, so there are no indexes */
@@ -612,10 +610,7 @@ intorel_shutdown(DestReceiver *self)
 	IntoClause *into = myState->into;
 
 	if (!into->skipData)
-	{
-		FreeBulkInsertState(myState->bistate);
-		table_finish_bulk_insert(myState->rel, myState->ti_options);
-	}
+		table_modify_end(myState->mstate);
 
 	/* close rel, but keep lock until commit */
 	table_close(myState->rel, NoLock);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 6d09b75556..bb97e2fa5f 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -48,9 +48,7 @@ typedef struct
 	Oid			transientoid;	/* OID of new heap into which to store */
 	/* These fields are filled by transientrel_startup: */
 	Relation	transientrel;	/* relation to write to */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableModifyState *mstate;	/* table insert state */
 } DR_transientrel;
 
 static int	matview_maintenance_depth = 0;
@@ -458,9 +456,14 @@ transientrel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	 * Fill private fields of myState for use by later routines
 	 */
 	myState->transientrel = transientrel;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM | TABLE_INSERT_FROZEN;
-	myState->bistate = GetBulkInsertState();
+	myState->mstate = table_modify_begin(transientrel,
+										 TM_FLAG_MULTI_INSERTS |
+										 TM_FLAG_BAS_BULKWRITE,
+										 GetCurrentCommandId(true),
+										 TABLE_INSERT_SKIP_FSM |
+										 TABLE_INSERT_FROZEN,
+										 NULL,
+										 NULL);
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -485,12 +488,7 @@ transientrel_receive(TupleTableSlot *slot, DestReceiver *self)
 	 * cheap either. This also doesn't allow accessing per-AM data (say a
 	 * tuple's xmin), but since we don't do that here...
 	 */
-
-	table_tuple_insert(myState->transientrel,
-					   slot,
-					   myState->output_cid,
-					   myState->ti_options,
-					   myState->bistate);
+	table_modify_buffer_insert(myState->mstate, slot);
 
 	/* We know this is a newly created relation, so there are no indexes */
 
@@ -505,9 +503,7 @@ transientrel_shutdown(DestReceiver *self)
 {
 	DR_transientrel *myState = (DR_transientrel *) self;
 
-	FreeBulkInsertState(myState->bistate);
-
-	table_finish_bulk_insert(myState->transientrel, myState->ti_options);
+	table_modify_end(myState->mstate);
 
 	/* close transientrel, but keep lock until commit */
 	table_close(myState->transientrel, NoLock);
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 3556240c8e..0c984aa656 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6060,10 +6060,8 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 	int			i;
 	ListCell   *l;
 	EState	   *estate;
-	CommandId	mycid;
-	BulkInsertState bistate;
-	int			ti_options;
 	ExprState  *partqualstate = NULL;
+	TableModifyState *mstate = NULL;
 
 	/*
 	 * Open the relation(s).  We have surely already locked the existing
@@ -6082,18 +6080,15 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 	 * Prepare a BulkInsertState and options for table_tuple_insert.  The FSM
 	 * is empty, so don't bother using it.
 	 */
-	if (newrel)
+	if (newrel && mstate == NULL)
 	{
-		mycid = GetCurrentCommandId(true);
-		bistate = GetBulkInsertState();
-		ti_options = TABLE_INSERT_SKIP_FSM;
-	}
-	else
-	{
-		/* keep compiler quiet about using these uninitialized */
-		mycid = 0;
-		bistate = NULL;
-		ti_options = 0;
+		mstate = table_modify_begin(newrel,
+									TM_FLAG_MULTI_INSERTS |
+									TM_FLAG_BAS_BULKWRITE,
+									GetCurrentCommandId(true),
+									TABLE_INSERT_SKIP_FSM,
+									NULL,
+									NULL);
 	}
 
 	/*
@@ -6392,8 +6387,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 
 			/* Write the tuple out to the new relation */
 			if (newrel)
-				table_tuple_insert(newrel, insertslot, mycid,
-								   ti_options, bistate);
+				table_modify_buffer_insert(mstate, insertslot);
 
 			ResetExprContext(econtext);
 
@@ -6414,10 +6408,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 	table_close(oldrel, NoLock);
 	if (newrel)
 	{
-		FreeBulkInsertState(bistate);
-
-		table_finish_bulk_insert(newrel, ti_options);
-
+		table_modify_end(mstate);
 		table_close(newrel, NoLock);
 	}
 }
-- 
2.34.1



  [application/x-patch] v19-0003-Optimize-INSERT-INTO-.-SELECT-with-multi-inserts.patch (10.1K, 5-v19-0003-Optimize-INSERT-INTO-.-SELECT-with-multi-inserts.patch)
  download | inline diff:
From d3f0c64e85417e6fcf164656481ea80732b9bd87 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Tue, 23 Apr 2024 04:15:49 +0000
Subject: [PATCH v19 3/6] Optimize INSERT INTO ... SELECT with multi inserts

---
 contrib/test_decoding/expected/stream.out |   2 +-
 src/backend/executor/nodeModifyTable.c    | 177 +++++++++++++++++++---
 src/tools/pgindent/typedefs.list          |   1 +
 3 files changed, 161 insertions(+), 19 deletions(-)

diff --git a/contrib/test_decoding/expected/stream.out b/contrib/test_decoding/expected/stream.out
index 4ab2d47bf8..c19facb3c9 100644
--- a/contrib/test_decoding/expected/stream.out
+++ b/contrib/test_decoding/expected/stream.out
@@ -101,10 +101,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'incl
  streaming change for transaction
  streaming change for transaction
  streaming change for transaction
- streaming change for transaction
  closing a streamed block for transaction
  opening a streamed block for transaction
  streaming change for transaction
+ streaming change for transaction
  closing a streamed block for transaction
  committing streamed transaction
 (17 rows)
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index cee60d3659..434e3f8411 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -114,6 +114,19 @@ typedef struct UpdateContext
 	LockTupleMode lockmode;
 } UpdateContext;
 
+typedef struct InsertModifyBufferFlushContext
+{
+	ResultRelInfo *resultRelInfo;
+	EState	   *estate;
+	ModifyTableState *mtstate;
+} InsertModifyBufferFlushContext;
+
+static InsertModifyBufferFlushContext *insert_modify_buffer_flush_context = NULL;
+static TableModifyState *table_modify_state = NULL;
+
+static void InsertModifyBufferFlushCallback(void *context,
+											TupleTableSlot **slots,
+											int nslots);
 
 static void ExecBatchInsert(ModifyTableState *mtstate,
 							ResultRelInfo *resultRelInfo,
@@ -726,6 +739,61 @@ ExecGetUpdateNewTuple(ResultRelInfo *relinfo,
 	return ExecProject(newProj);
 }
 
+static void
+InsertModifyBufferFlushCallback(void *context, TupleTableSlot **slots, int nslots)
+{
+	InsertModifyBufferFlushContext *ctx = (InsertModifyBufferFlushContext *) context;
+	ResultRelInfo *resultRelInfo = ctx->resultRelInfo;
+	EState	   *estate = ctx->estate;
+	ModifyTableState *mtstate = ctx->mtstate;
+	int			i;
+
+	if (nslots <= 0)
+		return;
+
+	/* Quick exit if no indexes or no triggers */
+	if (!(resultRelInfo->ri_NumIndices > 0 ||
+		  (resultRelInfo->ri_TrigDesc != NULL &&
+		   (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+			resultRelInfo->ri_TrigDesc->trig_insert_new_table))))
+		return;
+
+	/* Caller must take care of opening and closing the indices */
+	for (i = 0; i < nslots; i++)
+	{
+		/*
+		 * If there are any indexes, update them for all the inserted tuples,
+		 * and run AFTER ROW INSERT triggers.
+		 */
+		if (resultRelInfo->ri_NumIndices > 0)
+		{
+			List	   *recheckIndexes;
+
+			recheckIndexes =
+				ExecInsertIndexTuples(resultRelInfo,
+									  slots[i], estate, false,
+									  false, NULL, NIL, false);
+			ExecARInsertTriggers(estate, resultRelInfo,
+								 slots[i], recheckIndexes,
+								 mtstate->mt_transition_capture);
+			list_free(recheckIndexes);
+		}
+
+		/*
+		 * There's no indexes, but see if we need to run AFTER ROW INSERT
+		 * triggers anyway.
+		 */
+		else if (resultRelInfo->ri_TrigDesc != NULL &&
+				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		{
+			ExecARInsertTriggers(estate, resultRelInfo,
+								 slots[i], NIL,
+								 mtstate->mt_transition_capture);
+		}
+	}
+}
+
 /* ----------------------------------------------------------------
  *		ExecInsert
  *
@@ -751,7 +819,8 @@ ExecInsert(ModifyTableContext *context,
 		   TupleTableSlot *slot,
 		   bool canSetTag,
 		   TupleTableSlot **inserted_tuple,
-		   ResultRelInfo **insert_destrel)
+		   ResultRelInfo **insert_destrel,
+		   bool canMultiInsert)
 {
 	ModifyTableState *mtstate = context->mtstate;
 	EState	   *estate = context->estate;
@@ -764,6 +833,7 @@ ExecInsert(ModifyTableContext *context,
 	OnConflictAction onconflict = node->onConflictAction;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
 	MemoryContext oldContext;
+	bool		ar_insert_triggers_executed = false;
 
 	/*
 	 * If the input result relation is a partitioned table, find the leaf
@@ -1126,17 +1196,53 @@ ExecInsert(ModifyTableContext *context,
 		}
 		else
 		{
-			/* insert the tuple normally */
-			table_tuple_insert(resultRelationDesc, slot,
-							   estate->es_output_cid,
-							   0, NULL);
+			if (canMultiInsert &&
+				proute == NULL &&
+				resultRelInfo->ri_WithCheckOptions == NIL &&
+				resultRelInfo->ri_projectReturning == NULL)
+			{
+				if (insert_modify_buffer_flush_context == NULL)
+				{
+					insert_modify_buffer_flush_context =
+						(InsertModifyBufferFlushContext *) palloc0(sizeof(InsertModifyBufferFlushContext));
+					insert_modify_buffer_flush_context->resultRelInfo = resultRelInfo;
+					insert_modify_buffer_flush_context->estate = estate;
+					insert_modify_buffer_flush_context->mtstate = mtstate;
+				}
 
-			/* insert index entries for tuple */
-			if (resultRelInfo->ri_NumIndices > 0)
-				recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
-													   slot, estate, false,
-													   false, NULL, NIL,
-													   false);
+				if (table_modify_state == NULL)
+				{
+					table_modify_state = table_modify_begin(resultRelInfo->ri_RelationDesc,
+															TM_FLAG_MULTI_INSERTS,
+															estate->es_output_cid,
+															0,
+															InsertModifyBufferFlushCallback,
+															insert_modify_buffer_flush_context);
+				}
+
+				table_modify_buffer_insert(table_modify_state, slot);
+				ar_insert_triggers_executed = true;
+			}
+			else
+			{
+				/* insert the tuple normally */
+				table_tuple_insert(resultRelationDesc, slot,
+								   estate->es_output_cid,
+								   0, NULL);
+
+				/* insert index entries for tuple */
+				if (resultRelInfo->ri_NumIndices > 0)
+					recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+														   slot, estate, false,
+														   false, NULL, NIL,
+														   false);
+
+				ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
+									 mtstate->mt_transition_capture);
+
+				list_free(recheckIndexes);
+				ar_insert_triggers_executed = true;
+			}
 		}
 	}
 
@@ -1170,10 +1276,12 @@ ExecInsert(ModifyTableContext *context,
 	}
 
 	/* AFTER ROW INSERT Triggers */
-	ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
-						 ar_insert_trig_tcs);
-
-	list_free(recheckIndexes);
+	if (!ar_insert_triggers_executed)
+	{
+		ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
+							 ar_insert_trig_tcs);
+		list_free(recheckIndexes);
+	}
 
 	/*
 	 * Check any WITH CHECK OPTION constraints from parent views.  We are
@@ -1869,7 +1977,7 @@ ExecCrossPartitionUpdate(ModifyTableContext *context,
 	/* Tuple routing starts from the root table. */
 	context->cpUpdateReturningSlot =
 		ExecInsert(context, mtstate->rootResultRelInfo, slot, canSetTag,
-				   inserted_tuple, insert_destrel);
+				   inserted_tuple, insert_destrel, false);
 
 	/*
 	 * Reset the transition state that may possibly have been written by
@@ -3364,7 +3472,7 @@ ExecMergeNotMatched(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 				mtstate->mt_merge_action = action;
 
 				rslot = ExecInsert(context, mtstate->rootResultRelInfo,
-								   newslot, canSetTag, NULL, NULL);
+								   newslot, canSetTag, NULL, NULL, false);
 				mtstate->mt_merge_inserted += 1;
 				break;
 			case CMD_NOTHING:
@@ -3749,6 +3857,10 @@ ExecModifyTable(PlanState *pstate)
 	HeapTupleData oldtupdata;
 	HeapTuple	oldtuple;
 	ItemPointer tupleid;
+	bool		canMultiInsert = false;
+
+	table_modify_state = NULL;
+	insert_modify_buffer_flush_context = NULL;
 
 	CHECK_FOR_INTERRUPTS();
 
@@ -3844,6 +3956,10 @@ ExecModifyTable(PlanState *pstate)
 		if (TupIsNull(context.planSlot))
 			break;
 
+		if (operation == CMD_INSERT &&
+			nodeTag(subplanstate) == T_SeqScanState)
+			canMultiInsert = true;
+
 		/*
 		 * When there are multiple result relations, each tuple contains a
 		 * junk column that gives the OID of the rel from which it came.
@@ -4057,7 +4173,7 @@ ExecModifyTable(PlanState *pstate)
 					ExecInitInsertProjection(node, resultRelInfo);
 				slot = ExecGetInsertNewTuple(resultRelInfo, context.planSlot);
 				slot = ExecInsert(&context, resultRelInfo, slot,
-								  node->canSetTag, NULL, NULL);
+								  node->canSetTag, NULL, NULL, canMultiInsert);
 				break;
 
 			case CMD_UPDATE:
@@ -4116,6 +4232,17 @@ ExecModifyTable(PlanState *pstate)
 			return slot;
 	}
 
+	if (table_modify_state != NULL)
+	{
+		Assert(operation == CMD_INSERT);
+
+		table_modify_end(table_modify_state);
+		table_modify_state = NULL;
+
+		pfree(insert_modify_buffer_flush_context);
+		insert_modify_buffer_flush_context = NULL;
+	}
+
 	/*
 	 * Insert remaining tuples for batch insert.
 	 */
@@ -4228,6 +4355,9 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	mtstate->mt_merge_updated = 0;
 	mtstate->mt_merge_deleted = 0;
 
+	table_modify_state = NULL;
+	insert_modify_buffer_flush_context = NULL;
+
 	/*----------
 	 * Resolve the target relation. This is the same as:
 	 *
@@ -4681,6 +4811,17 @@ ExecEndModifyTable(ModifyTableState *node)
 {
 	int			i;
 
+	if (table_modify_state != NULL)
+	{
+		Assert(node->operation == CMD_INSERT);
+
+		table_modify_end(table_modify_state);
+		table_modify_state = NULL;
+
+		pfree(insert_modify_buffer_flush_context);
+		insert_modify_buffer_flush_context = NULL;
+	}
+
 	/*
 	 * Allow any FDWs to shut down
 	 */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index ebde07bcde..11c4d99430 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1226,6 +1226,7 @@ InjectionPointEntry
 InjectionPointSharedState
 InlineCodeBlock
 InProgressIO
+InsertModifyBufferFlushContext
 InsertStmt
 Instrumentation
 Int128AggState
-- 
2.34.1



  [application/x-patch] v19-0004-Optimize-Logical-Replication-apply-with-multi-in.patch (19.9K, 6-v19-0004-Optimize-Logical-Replication-apply-with-multi-in.patch)
  download | inline diff:
From 4f4cf2f380a18c7a754b2fcd979af4617c6aff52 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Tue, 23 Apr 2024 04:16:07 +0000
Subject: [PATCH v19 4/6] Optimize Logical Replication apply with multi inserts

---
 src/backend/executor/execReplication.c   |  39 +++
 src/backend/replication/logical/proto.c  |  24 ++
 src/backend/replication/logical/worker.c | 357 ++++++++++++++++++++++-
 src/include/executor/executor.h          |   4 +
 src/include/replication/logicalproto.h   |   2 +
 src/tools/pgindent/typedefs.list         |   2 +
 6 files changed, 415 insertions(+), 13 deletions(-)

diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index d0a89cd577..fae1375537 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -544,6 +544,45 @@ ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
 	}
 }
 
+void
+ExecRelationMultiInsert(TableModifyState *MultiInsertState,
+						ResultRelInfo *resultRelInfo,
+						EState *estate, TupleTableSlot *slot)
+{
+	bool		skip_tuple = false;
+	Relation	rel = resultRelInfo->ri_RelationDesc;
+
+	/* For now we support only tables. */
+	Assert(rel->rd_rel->relkind == RELKIND_RELATION);
+
+	CheckCmdReplicaIdentity(rel, CMD_INSERT);
+
+	/* BEFORE ROW INSERT Triggers */
+	if (resultRelInfo->ri_TrigDesc &&
+		resultRelInfo->ri_TrigDesc->trig_insert_before_row)
+	{
+		if (!ExecBRInsertTriggers(estate, resultRelInfo, slot))
+			skip_tuple = true;	/* "do nothing" */
+	}
+
+	if (!skip_tuple)
+	{
+		/* Compute stored generated columns */
+		if (rel->rd_att->constr &&
+			rel->rd_att->constr->has_generated_stored)
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_INSERT);
+
+		/* Check the constraints of the tuple */
+		if (rel->rd_att->constr)
+			ExecConstraints(resultRelInfo, slot, estate);
+		if (rel->rd_rel->relispartition)
+			ExecPartitionCheck(resultRelInfo, slot, estate, true);
+
+		table_modify_buffer_insert(MultiInsertState, slot);
+	}
+}
+
 /*
  * Find the searchslot tuple and update it with data in the slot,
  * update the indexes, and execute any constraints and per-row triggers.
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index 95c09c9516..46d38aebd2 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -427,6 +427,30 @@ logicalrep_write_insert(StringInfo out, TransactionId xid, Relation rel,
 	logicalrep_write_tuple(out, rel, newslot, binary, columns);
 }
 
+LogicalRepRelId
+logicalrep_read_relid(StringInfo in)
+{
+	LogicalRepRelId relid;
+
+	/* read the relation id */
+	relid = pq_getmsgint(in, 4);
+
+	return relid;
+}
+
+void
+logicalrep_read_insert_v2(StringInfo in, LogicalRepTupleData *newtup)
+{
+	char		action;
+
+	action = pq_getmsgbyte(in);
+	if (action != 'N')
+		elog(ERROR, "expected new tuple but got %d",
+			 action);
+
+	logicalrep_read_tuple(in, newtup);
+}
+
 /*
  * Read INSERT from stream.
  *
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index b5a80fe3e8..3440883847 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -148,7 +148,6 @@
 #include <unistd.h>
 
 #include "access/table.h"
-#include "access/tableam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
 #include "catalog/indexing.h"
@@ -416,6 +415,30 @@ static inline void reset_apply_error_context_info(void);
 static TransApplyAction get_transaction_apply_action(TransactionId xid,
 													 ParallelApplyWorkerInfo **winfo);
 
+typedef enum LRMultiInsertReturnStatus
+{
+	LR_MULTI_INSERT_NONE = 0,
+	LR_MULTI_INSERT_REL_SKIPPED,
+	LR_MULTI_INSERT_DISALLOWED,
+	LR_MULTI_INSERT_DONE,
+} LRMultiInsertReturnStatus;
+
+static TableModifyState *MultiInsertState = NULL;
+static LogicalRepRelMapEntry *LastRel = NULL;
+static LogicalRepRelId LastMultiInsertRelId = InvalidOid;
+static ApplyExecutionData *LastEData = NULL;
+static TupleTableSlot *LastRemoteSlot = NULL;
+
+typedef struct LRModifyBufferFlushContext
+{
+	ResultRelInfo *resultRelInfo;
+	EState	   *estate;
+} LRModifyBufferFlushContext;
+
+static LRModifyBufferFlushContext *modify_buffer_flush_context = NULL;
+static void LRModifyBufferFlushCallback(void *context, TupleTableSlot **slots, int nslots);
+static void FinishMultiInserts(void);
+
 /*
  * Form the origin name for the subscription.
  *
@@ -1017,6 +1040,8 @@ apply_handle_commit(StringInfo s)
 {
 	LogicalRepCommitData commit_data;
 
+	FinishMultiInserts();
+
 	logicalrep_read_commit(s, &commit_data);
 
 	if (commit_data.commit_lsn != remote_final_lsn)
@@ -1043,6 +1068,8 @@ apply_handle_begin_prepare(StringInfo s)
 {
 	LogicalRepPreparedTxnData begin_data;
 
+	FinishMultiInserts();
+
 	/* Tablesync should never receive prepare. */
 	if (am_tablesync_worker())
 		ereport(ERROR,
@@ -1109,6 +1136,8 @@ apply_handle_prepare(StringInfo s)
 {
 	LogicalRepPreparedTxnData prepare_data;
 
+	FinishMultiInserts();
+
 	logicalrep_read_prepare(s, &prepare_data);
 
 	if (prepare_data.prepare_lsn != remote_final_lsn)
@@ -1171,6 +1200,8 @@ apply_handle_commit_prepared(StringInfo s)
 	LogicalRepCommitPreparedTxnData prepare_data;
 	char		gid[GIDSIZE];
 
+	FinishMultiInserts();
+
 	logicalrep_read_commit_prepared(s, &prepare_data);
 	set_apply_error_context_xact(prepare_data.xid, prepare_data.commit_lsn);
 
@@ -1220,6 +1251,8 @@ apply_handle_rollback_prepared(StringInfo s)
 	LogicalRepRollbackPreparedTxnData rollback_data;
 	char		gid[GIDSIZE];
 
+	FinishMultiInserts();
+
 	logicalrep_read_rollback_prepared(s, &rollback_data);
 	set_apply_error_context_xact(rollback_data.xid, rollback_data.rollback_end_lsn);
 
@@ -1277,6 +1310,8 @@ apply_handle_stream_prepare(StringInfo s)
 	/* Save the message before it is consumed. */
 	StringInfoData original_msg = *s;
 
+	FinishMultiInserts();
+
 	if (in_streamed_transaction)
 		ereport(ERROR,
 				(errcode(ERRCODE_PROTOCOL_VIOLATION),
@@ -1304,6 +1339,8 @@ apply_handle_stream_prepare(StringInfo s)
 			apply_spooled_messages(MyLogicalRepWorker->stream_fileset,
 								   prepare_data.xid, prepare_data.prepare_lsn);
 
+			FinishMultiInserts();
+
 			/* Mark the transaction as prepared. */
 			apply_handle_prepare_internal(&prepare_data);
 
@@ -1407,6 +1444,8 @@ apply_handle_stream_prepare(StringInfo s)
 static void
 apply_handle_origin(StringInfo s)
 {
+	FinishMultiInserts();
+
 	/*
 	 * ORIGIN message can only come inside streaming transaction or inside
 	 * remote transaction and before any actual writes.
@@ -1473,6 +1512,8 @@ apply_handle_stream_start(StringInfo s)
 	/* Save the message before it is consumed. */
 	StringInfoData original_msg = *s;
 
+	FinishMultiInserts();
+
 	if (in_streamed_transaction)
 		ereport(ERROR,
 				(errcode(ERRCODE_PROTOCOL_VIOLATION),
@@ -1628,6 +1669,8 @@ apply_handle_stream_stop(StringInfo s)
 	ParallelApplyWorkerInfo *winfo;
 	TransApplyAction apply_action;
 
+	FinishMultiInserts();
+
 	if (!in_streamed_transaction)
 		ereport(ERROR,
 				(errcode(ERRCODE_PROTOCOL_VIOLATION),
@@ -1821,6 +1864,8 @@ apply_handle_stream_abort(StringInfo s)
 	StringInfoData original_msg = *s;
 	bool		toplevel_xact;
 
+	FinishMultiInserts();
+
 	if (in_streamed_transaction)
 		ereport(ERROR,
 				(errcode(ERRCODE_PROTOCOL_VIOLATION),
@@ -2138,6 +2183,8 @@ apply_handle_stream_commit(StringInfo s)
 	/* Save the message before it is consumed. */
 	StringInfoData original_msg = *s;
 
+	FinishMultiInserts();
+
 	if (in_streamed_transaction)
 		ereport(ERROR,
 				(errcode(ERRCODE_PROTOCOL_VIOLATION),
@@ -2159,6 +2206,8 @@ apply_handle_stream_commit(StringInfo s)
 			apply_spooled_messages(MyLogicalRepWorker->stream_fileset, xid,
 								   commit_data.commit_lsn);
 
+			FinishMultiInserts();
+
 			apply_handle_commit_internal(&commit_data);
 
 			/* Unlink the files with serialized changes and subxact info. */
@@ -2302,6 +2351,8 @@ apply_handle_relation(StringInfo s)
 {
 	LogicalRepRelation *rel;
 
+	FinishMultiInserts();
+
 	if (handle_streamed_transaction(LOGICAL_REP_MSG_RELATION, s))
 		return;
 
@@ -2325,6 +2376,8 @@ apply_handle_type(StringInfo s)
 {
 	LogicalRepTyp typ;
 
+	FinishMultiInserts();
+
 	if (handle_streamed_transaction(LOGICAL_REP_MSG_TYPE, s))
 		return;
 
@@ -2363,16 +2416,132 @@ TargetPrivilegesCheck(Relation rel, AclMode mode)
 						RelationGetRelationName(rel))));
 }
 
-/*
- * Handle INSERT message.
- */
+static void
+FinishMultiInserts(void)
+{
+	LogicalRepMsgType saved_command;
+
+	if (MultiInsertState == NULL)
+		return;
+
+	Assert(OidIsValid(LastMultiInsertRelId));
+	Assert(LastEData != NULL);
+
+	/* Set relation for error callback */
+	apply_error_callback_arg.rel = LastRel;
+
+	/* Set current command for error callback */
+	saved_command = apply_error_callback_arg.command;
+	apply_error_callback_arg.command = LOGICAL_REP_MSG_INSERT;
+
+	ExecDropSingleTupleTableSlot(LastRemoteSlot);
+	LastRemoteSlot = NULL;
+
+	table_modify_end(MultiInsertState);
+	MultiInsertState = NULL;
+	LastMultiInsertRelId = InvalidOid;
+
+	pfree(modify_buffer_flush_context);
+	modify_buffer_flush_context = NULL;
+
+	ExecCloseIndices(LastEData->targetRelInfo);
+
+	finish_edata(LastEData);
+	LastEData = NULL;
+
+	/* Reset relation for error callback */
+	apply_error_callback_arg.rel = NULL;
+
+	/* Reset the current command */
+	apply_error_callback_arg.command = saved_command;
+
+	logicalrep_rel_close(LastRel, NoLock);
+	LastRel = NULL;
+
+	end_replication_step();
+}
 
 static void
-apply_handle_insert(StringInfo s)
+LRModifyBufferFlushCallback(void *context, TupleTableSlot **slots, int nslots)
+{
+	LRModifyBufferFlushContext *ctx = (LRModifyBufferFlushContext *) context;
+	ResultRelInfo *resultRelInfo = ctx->resultRelInfo;
+	EState	   *estate = ctx->estate;
+	int			i;
+	LogicalRepMsgType saved_command;
+
+	if (nslots <= 0)
+		return;
+
+	/* Quick exit if no indexes or no triggers */
+	if (!(resultRelInfo->ri_NumIndices > 0 ||
+		  (resultRelInfo->ri_TrigDesc != NULL &&
+		   (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+			resultRelInfo->ri_TrigDesc->trig_insert_new_table))))
+		return;
+
+	/* Set relation for error callback */
+	apply_error_callback_arg.rel = LastRel;
+
+	/* Set current command for error callback */
+	saved_command = apply_error_callback_arg.command;
+	apply_error_callback_arg.command = LOGICAL_REP_MSG_INSERT;
+
+	/* Caller must take care of opening and closing the indices */
+	for (i = 0; i < nslots; i++)
+	{
+		/*
+		 * If there are any indexes, update them for all the inserted tuples,
+		 * and run AFTER ROW INSERT triggers.
+		 */
+		if (resultRelInfo->ri_NumIndices > 0)
+		{
+			List	   *recheckIndexes;
+
+			recheckIndexes =
+				ExecInsertIndexTuples(resultRelInfo,
+									  slots[i], estate, false,
+									  false, NULL, NIL, false);
+
+			ExecARInsertTriggers(estate, resultRelInfo,
+								 slots[i], recheckIndexes,
+								 NULL);
+
+			list_free(recheckIndexes);
+		}
+
+		/*
+		 * There's no indexes, but see if we need to run AFTER ROW INSERT
+		 * triggers anyway.
+		 */
+		else if (resultRelInfo->ri_TrigDesc != NULL &&
+				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		{
+			ExecARInsertTriggers(estate, resultRelInfo,
+								 slots[i], NIL,
+								 NULL);
+		}
+
+		/*
+		 * XXX we should in theory pass a TransitionCaptureState object to the
+		 * above to capture transition tuples, but after statement triggers
+		 * don't actually get fired by replication yet anyway
+		 */
+	}
+
+	/* Reset relation for error callback */
+	apply_error_callback_arg.rel = NULL;
+
+	/* Reset the current command */
+	apply_error_callback_arg.command = saved_command;
+}
+
+static LRMultiInsertReturnStatus
+do_multi_inserts(StringInfo s, LogicalRepRelId *relid)
 {
 	LogicalRepRelMapEntry *rel;
 	LogicalRepTupleData newtup;
-	LogicalRepRelId relid;
 	UserContext ucxt;
 	ApplyExecutionData *edata;
 	EState	   *estate;
@@ -2380,17 +2549,143 @@ apply_handle_insert(StringInfo s)
 	MemoryContext oldctx;
 	bool		run_as_owner;
 
+	if (MultiInsertState == NULL)
+		begin_replication_step();
+
+	*relid = logicalrep_read_relid(s);
+
+	if (MultiInsertState != NULL &&
+		(LastMultiInsertRelId != InvalidOid &&
+		 *relid != InvalidOid &&
+		 LastMultiInsertRelId != *relid))
+		FinishMultiInserts();
+
+	if (MultiInsertState == NULL)
+		rel = logicalrep_rel_open(*relid, RowExclusiveLock);
+	else
+		rel = LastRel;
+
+	if (!should_apply_changes_for_rel(rel))
+	{
+		Assert(MultiInsertState == NULL);
+
+		/*
+		 * The relation can't become interesting in the middle of the
+		 * transaction so it's safe to unlock it.
+		 */
+		logicalrep_rel_close(rel, RowExclusiveLock);
+		end_replication_step();
+		return LR_MULTI_INSERT_REL_SKIPPED;
+	}
+
+	/* For a partitioned table, let's not do multi inserts. */
+	if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+	{
+		Assert(MultiInsertState == NULL);
+
+		/*
+		 * The relation can't become interesting in the middle of the
+		 * transaction so it's safe to unlock it.
+		 */
+		logicalrep_rel_close(rel, RowExclusiveLock);
+		end_replication_step();
+		return LR_MULTI_INSERT_DISALLOWED;
+	}
+
 	/*
-	 * Quick return if we are skipping data modification changes or handling
-	 * streamed transactions.
+	 * Make sure that any user-supplied code runs as the table owner, unless
+	 * the user has opted out of that behavior.
 	 */
-	if (is_skipping_changes() ||
-		handle_streamed_transaction(LOGICAL_REP_MSG_INSERT, s))
-		return;
+	run_as_owner = MySubscription->runasowner;
+	if (!run_as_owner)
+		SwitchToUntrustedUser(rel->localrel->rd_rel->relowner, &ucxt);
+
+	/* Set relation for error callback */
+	apply_error_callback_arg.rel = rel;
+
+	if (MultiInsertState == NULL)
+	{
+		oldctx = MemoryContextSwitchTo(TopTransactionContext);
+
+		/* Initialize the executor state. */
+		LastEData = edata = create_edata_for_relation(rel);
+		estate = edata->estate;
+
+		LastRemoteSlot = remoteslot = MakeTupleTableSlot(RelationGetDescr(rel->localrel),
+														 &TTSOpsVirtual);
+
+		modify_buffer_flush_context = (LRModifyBufferFlushContext *) palloc(sizeof(LRModifyBufferFlushContext));
+		modify_buffer_flush_context->resultRelInfo = edata->targetRelInfo;
+		modify_buffer_flush_context->estate = estate;
+
+		MultiInsertState = table_modify_begin(edata->targetRelInfo->ri_RelationDesc,
+											  TM_FLAG_MULTI_INSERTS |
+											  TM_FLAG_BAS_BULKWRITE,
+											  GetCurrentCommandId(true),
+											  0,
+											  LRModifyBufferFlushCallback,
+											  modify_buffer_flush_context);
+		LastRel = rel;
+		LastMultiInsertRelId = *relid;
+
+		/* We must open indexes here. */
+		ExecOpenIndices(edata->targetRelInfo, false);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+	else
+	{
+		CommandId	cid;
+
+		edata = LastEData;
+		estate = edata->estate;
+		ResetExprContext(GetPerTupleExprContext(estate));
+		ExecClearTuple(LastRemoteSlot);
+		remoteslot = LastRemoteSlot;
+		cid = GetCurrentCommandId(true);
+		MultiInsertState->cid = cid;
+		estate->es_output_cid = cid;
+	}
+
+	/* Process and store remote tuple in the slot */
+	logicalrep_read_insert_v2(s, &newtup);
+	oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+	slot_store_data(remoteslot, rel, &newtup);
+	slot_fill_defaults(rel, estate, remoteslot);
+	MemoryContextSwitchTo(oldctx);
+
+	TargetPrivilegesCheck(edata->targetRelInfo->ri_RelationDesc, ACL_INSERT);
+	ExecRelationMultiInsert(MultiInsertState, edata->targetRelInfo, estate, remoteslot);
+
+	/* Reset relation for error callback */
+	apply_error_callback_arg.rel = NULL;
+
+	if (!run_as_owner)
+		RestoreUserContext(&ucxt);
+
+	Assert(MultiInsertState != NULL);
+
+	CommandCounterIncrement();
+
+	return LR_MULTI_INSERT_DONE;
+}
+
+static bool
+do_single_inserts(StringInfo s, LogicalRepRelId relid)
+{
+	LogicalRepRelMapEntry *rel;
+	LogicalRepTupleData newtup;
+	UserContext ucxt;
+	ApplyExecutionData *edata;
+	EState	   *estate;
+	TupleTableSlot *remoteslot;
+	MemoryContext oldctx;
+	bool		run_as_owner;
+
+	Assert(relid != InvalidOid);
 
 	begin_replication_step();
 
-	relid = logicalrep_read_insert(s, &newtup);
 	rel = logicalrep_rel_open(relid, RowExclusiveLock);
 	if (!should_apply_changes_for_rel(rel))
 	{
@@ -2400,7 +2695,7 @@ apply_handle_insert(StringInfo s)
 		 */
 		logicalrep_rel_close(rel, RowExclusiveLock);
 		end_replication_step();
-		return;
+		return false;
 	}
 
 	/*
@@ -2422,6 +2717,7 @@ apply_handle_insert(StringInfo s)
 										&TTSOpsVirtual);
 
 	/* Process and store remote tuple in the slot */
+	logicalrep_read_insert_v2(s, &newtup);
 	oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
 	slot_store_data(remoteslot, rel, &newtup);
 	slot_fill_defaults(rel, estate, remoteslot);
@@ -2446,6 +2742,35 @@ apply_handle_insert(StringInfo s)
 	logicalrep_rel_close(rel, NoLock);
 
 	end_replication_step();
+
+	return true;
+}
+
+/*
+ * Handle INSERT message.
+ */
+static void
+apply_handle_insert(StringInfo s)
+{
+	LRMultiInsertReturnStatus mi_status;
+	LogicalRepRelId relid;
+
+	/*
+	 * Quick return if we are skipping data modification changes or handling
+	 * streamed transactions.
+	 */
+	if (is_skipping_changes() ||
+		handle_streamed_transaction(LOGICAL_REP_MSG_INSERT, s))
+		return;
+
+	mi_status = do_multi_inserts(s, &relid);
+	if (mi_status == LR_MULTI_INSERT_REL_SKIPPED ||
+		mi_status == LR_MULTI_INSERT_DONE)
+		return;
+
+	do_single_inserts(s, relid);
+
+	return;
 }
 
 /*
@@ -2532,6 +2857,8 @@ apply_handle_update(StringInfo s)
 	MemoryContext oldctx;
 	bool		run_as_owner;
 
+	FinishMultiInserts();
+
 	/*
 	 * Quick return if we are skipping data modification changes or handling
 	 * streamed transactions.
@@ -2713,6 +3040,8 @@ apply_handle_delete(StringInfo s)
 	MemoryContext oldctx;
 	bool		run_as_owner;
 
+	FinishMultiInserts();
+
 	/*
 	 * Quick return if we are skipping data modification changes or handling
 	 * streamed transactions.
@@ -3154,6 +3483,8 @@ apply_handle_truncate(StringInfo s)
 	ListCell   *lc;
 	LOCKMODE	lockmode = AccessExclusiveLock;
 
+	FinishMultiInserts();
+
 	/*
 	 * Quick return if we are skipping data modification changes or handling
 	 * streamed transactions.
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 9770752ea3..8f10ea977b 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -14,6 +14,7 @@
 #ifndef EXECUTOR_H
 #define EXECUTOR_H
 
+#include "access/tableam.h"
 #include "executor/execdesc.h"
 #include "fmgr.h"
 #include "nodes/lockoptions.h"
@@ -656,6 +657,9 @@ extern bool RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 extern void ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
 									 EState *estate, TupleTableSlot *slot);
+extern void ExecRelationMultiInsert(TableModifyState *MultiInsertState,
+									ResultRelInfo *resultRelInfo,
+									EState *estate, TupleTableSlot *slot);
 extern void ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
 									 EState *estate, EPQState *epqstate,
 									 TupleTableSlot *searchslot, TupleTableSlot *slot);
diff --git a/src/include/replication/logicalproto.h b/src/include/replication/logicalproto.h
index c409638a2e..3f3a7f0a31 100644
--- a/src/include/replication/logicalproto.h
+++ b/src/include/replication/logicalproto.h
@@ -226,6 +226,8 @@ extern void logicalrep_write_insert(StringInfo out, TransactionId xid,
 									Relation rel,
 									TupleTableSlot *newslot,
 									bool binary, Bitmapset *columns);
+extern LogicalRepRelId logicalrep_read_relid(StringInfo in);
+extern void logicalrep_read_insert_v2(StringInfo in, LogicalRepTupleData *newtup);
 extern LogicalRepRelId logicalrep_read_insert(StringInfo in, LogicalRepTupleData *newtup);
 extern void logicalrep_write_update(StringInfo out, TransactionId xid,
 									Relation rel,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 11c4d99430..70f23808e2 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1456,6 +1456,8 @@ LPTHREAD_START_ROUTINE
 LPTSTR
 LPVOID
 LPWSTR
+LRModifyBufferFlushContext
+LRMultiInsertReturnStatus
 LSEG
 LUID
 LVRelState
-- 
2.34.1



  [application/x-patch] v19-0005-Use-new-multi-insert-Table-AM-for-COPY-FROM.patch (13.1K, 7-v19-0005-Use-new-multi-insert-Table-AM-for-COPY-FROM.patch)
  download | inline diff:
From d83bf5bc0bfc5f45d85e38a876a7db94f12803da Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Tue, 23 Apr 2024 05:18:58 +0000
Subject: [PATCH v19 5/6] Use new multi insert Table AM for COPY FROM

---
 src/backend/commands/copyfrom.c          | 230 +++++++++++++++--------
 src/include/commands/copyfrom_internal.h |   4 +-
 src/tools/pgindent/typedefs.list         |   1 +
 3 files changed, 153 insertions(+), 82 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index ce4d62e707..8572c5b730 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -71,13 +71,21 @@
 /* Trim the list of buffers back down to this number after flushing */
 #define MAX_PARTITION_BUFFERS	32
 
+typedef struct CopyModifyBufferFlushContext
+{
+	CopyFromState cstate;
+	ResultRelInfo *resultRelInfo;
+	EState	   *estate;
+} CopyModifyBufferFlushContext;
+
 /* Stores multi-insert data related to a single relation in CopyFrom. */
 typedef struct CopyMultiInsertBuffer
 {
-	TupleTableSlot *slots[MAX_BUFFERED_TUPLES]; /* Array to store tuples */
+	TableModifyState *mstate;	/* Table insert state; NULL if foreign table */
+	TupleTableSlot **slots;		/* Array to store tuples */
 	ResultRelInfo *resultRelInfo;	/* ResultRelInfo for 'relid' */
-	BulkInsertState bistate;	/* BulkInsertState for this rel if plain
-								 * table; NULL if foreign table */
+	TupleTableSlot *multislot;
+	CopyModifyBufferFlushContext *modify_buffer_flush_context;
 	int			nused;			/* number of 'slots' containing tuples */
 	uint64		linenos[MAX_BUFFERED_TUPLES];	/* Line # of tuple in copy
 												 * stream */
@@ -99,6 +107,7 @@ typedef struct CopyMultiInsertInfo
 	int			ti_options;		/* table insert options */
 } CopyMultiInsertInfo;
 
+static void CopyModifyBufferFlushCallback(void *context, TupleTableSlot **slots, int nslots);
 
 /* non-export function prototypes */
 static void ClosePipeFromProgram(CopyFromState cstate);
@@ -218,14 +227,38 @@ CopyLimitPrintoutLength(const char *str)
  * ResultRelInfo.
  */
 static CopyMultiInsertBuffer *
-CopyMultiInsertBufferInit(ResultRelInfo *rri)
+CopyMultiInsertBufferInit(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
+						  CopyFromState cstate, EState *estate)
 {
 	CopyMultiInsertBuffer *buffer;
 
 	buffer = (CopyMultiInsertBuffer *) palloc(sizeof(CopyMultiInsertBuffer));
-	memset(buffer->slots, 0, sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
+
+	if (rri->ri_FdwRoutine == NULL)
+	{
+		buffer->modify_buffer_flush_context = (CopyModifyBufferFlushContext *) palloc(sizeof(CopyModifyBufferFlushContext));
+		buffer->modify_buffer_flush_context->cstate = cstate;
+		buffer->modify_buffer_flush_context->resultRelInfo = rri;
+		buffer->modify_buffer_flush_context->estate = estate;
+
+		buffer->mstate = table_modify_begin(rri->ri_RelationDesc,
+											TM_FLAG_MULTI_INSERTS |
+											TM_FLAG_BAS_BULKWRITE,
+											miinfo->mycid,
+											miinfo->ti_options,
+											CopyModifyBufferFlushCallback,
+											buffer->modify_buffer_flush_context);
+		buffer->slots = NULL;
+		buffer->multislot = NULL;
+	}
+	else
+	{
+		buffer->mstate = NULL;
+		buffer->slots = palloc0(sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
+		buffer->multislot = NULL;
+	}
+
 	buffer->resultRelInfo = rri;
-	buffer->bistate = (rri->ri_FdwRoutine == NULL) ? GetBulkInsertState() : NULL;
 	buffer->nused = 0;
 
 	return buffer;
@@ -236,11 +269,12 @@ CopyMultiInsertBufferInit(ResultRelInfo *rri)
  */
 static inline void
 CopyMultiInsertInfoSetupBuffer(CopyMultiInsertInfo *miinfo,
-							   ResultRelInfo *rri)
+							   ResultRelInfo *rri, CopyFromState cstate,
+							   EState *estate)
 {
 	CopyMultiInsertBuffer *buffer;
 
-	buffer = CopyMultiInsertBufferInit(rri);
+	buffer = CopyMultiInsertBufferInit(miinfo, rri, cstate, estate);
 
 	/* Setup back-link so we can easily find this buffer again */
 	rri->ri_CopyMultiInsertBuffer = buffer;
@@ -273,7 +307,7 @@ CopyMultiInsertInfoInit(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
 	 * tuples their way for the first time.
 	 */
 	if (rri->ri_RelationDesc->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
-		CopyMultiInsertInfoSetupBuffer(miinfo, rri);
+		CopyMultiInsertInfoSetupBuffer(miinfo, rri, cstate, estate);
 }
 
 /*
@@ -317,8 +351,6 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 		int			batch_size = resultRelInfo->ri_BatchSize;
 		int			sent = 0;
 
-		Assert(buffer->bistate == NULL);
-
 		/* Ensure that the FDW supports batching and it's enabled */
 		Assert(resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert);
 		Assert(batch_size > 1);
@@ -390,13 +422,8 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	}
 	else
 	{
-		CommandId	mycid = miinfo->mycid;
-		int			ti_options = miinfo->ti_options;
 		bool		line_buf_valid = cstate->line_buf_valid;
 		uint64		save_cur_lineno = cstate->cur_lineno;
-		MemoryContext oldcontext;
-
-		Assert(buffer->bistate != NULL);
 
 		/*
 		 * Print error context information correctly, if one of the operations
@@ -404,56 +431,12 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 		 */
 		cstate->line_buf_valid = false;
 
+		table_modify_buffer_flush(buffer->mstate);
+
 		/*
-		 * table_multi_insert may leak memory, so switch to short-lived memory
-		 * context before calling it.
+		 * Indexes are updated and AFTER ROW INSERT triggers (if any) are run
+		 * in the flush callback CopyModifyBufferFlushCallback.
 		 */
-		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-		table_multi_insert(resultRelInfo->ri_RelationDesc,
-						   slots,
-						   nused,
-						   mycid,
-						   ti_options,
-						   buffer->bistate);
-		MemoryContextSwitchTo(oldcontext);
-
-		for (i = 0; i < nused; i++)
-		{
-			/*
-			 * If there are any indexes, update them for all the inserted
-			 * tuples, and run AFTER ROW INSERT triggers.
-			 */
-			if (resultRelInfo->ri_NumIndices > 0)
-			{
-				List	   *recheckIndexes;
-
-				cstate->cur_lineno = buffer->linenos[i];
-				recheckIndexes =
-					ExecInsertIndexTuples(resultRelInfo,
-										  buffer->slots[i], estate, false,
-										  false, NULL, NIL, false);
-				ExecARInsertTriggers(estate, resultRelInfo,
-									 slots[i], recheckIndexes,
-									 cstate->transition_capture);
-				list_free(recheckIndexes);
-			}
-
-			/*
-			 * There's no indexes, but see if we need to run AFTER ROW INSERT
-			 * triggers anyway.
-			 */
-			else if (resultRelInfo->ri_TrigDesc != NULL &&
-					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
-			{
-				cstate->cur_lineno = buffer->linenos[i];
-				ExecARInsertTriggers(estate, resultRelInfo,
-									 slots[i], NIL,
-									 cstate->transition_capture);
-			}
-
-			ExecClearTuple(slots[i]);
-		}
 
 		/* Update the row counter and progress of the COPY command */
 		*processed += nused;
@@ -469,6 +452,64 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	buffer->nused = 0;
 }
 
+static void
+CopyModifyBufferFlushCallback(void *context, TupleTableSlot **slots, int nslots)
+{
+	CopyModifyBufferFlushContext *ctx = (CopyModifyBufferFlushContext *) context;
+	CopyFromState cstate = ctx->cstate;
+	ResultRelInfo *resultRelInfo = ctx->resultRelInfo;
+	EState	   *estate = ctx->estate;
+	CopyMultiInsertBuffer *buffer = resultRelInfo->ri_CopyMultiInsertBuffer;
+	int			i;
+
+	if (nslots <= 0)
+		return;
+
+	/* Quick exit if no indexes or no triggers */
+	if (!(resultRelInfo->ri_NumIndices > 0 ||
+		  (resultRelInfo->ri_TrigDesc != NULL &&
+		   (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+			resultRelInfo->ri_TrigDesc->trig_insert_new_table))))
+		return;
+
+	/* Caller must take care of opening and closing the indices */
+	for (i = 0; i < nslots; i++)
+	{
+		/*
+		 * If there are any indexes, update them for all the inserted tuples,
+		 * and run AFTER ROW INSERT triggers.
+		 */
+		if (resultRelInfo->ri_NumIndices > 0)
+		{
+			List	   *recheckIndexes;
+
+			cstate->cur_lineno = buffer->linenos[i];
+			recheckIndexes =
+				ExecInsertIndexTuples(resultRelInfo,
+									  slots[i], estate, false,
+									  false, NULL, NIL, false);
+			ExecARInsertTriggers(estate, resultRelInfo,
+								 slots[i], recheckIndexes,
+								 cstate->transition_capture);
+			list_free(recheckIndexes);
+		}
+
+		/*
+		 * There's no indexes, but see if we need to run AFTER ROW INSERT
+		 * triggers anyway.
+		 */
+		else if (resultRelInfo->ri_TrigDesc != NULL &&
+				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		{
+			cstate->cur_lineno = buffer->linenos[i];
+			ExecARInsertTriggers(estate, resultRelInfo,
+								 slots[i], NIL,
+								 cstate->transition_capture);
+		}
+	}
+}
+
 /*
  * Drop used slots and free member for this buffer.
  *
@@ -489,19 +530,18 @@ CopyMultiInsertBufferCleanup(CopyMultiInsertInfo *miinfo,
 
 	if (resultRelInfo->ri_FdwRoutine == NULL)
 	{
-		Assert(buffer->bistate != NULL);
-		FreeBulkInsertState(buffer->bistate);
+		table_modify_end(buffer->mstate);
+		ExecDropSingleTupleTableSlot(buffer->multislot);
+		pfree(buffer->modify_buffer_flush_context);
 	}
 	else
-		Assert(buffer->bistate == NULL);
-
-	/* Since we only create slots on demand, just drop the non-null ones. */
-	for (i = 0; i < MAX_BUFFERED_TUPLES && buffer->slots[i] != NULL; i++)
-		ExecDropSingleTupleTableSlot(buffer->slots[i]);
+	{
+		/* Since we only create slots on demand, just drop the non-null ones. */
+		for (i = 0; i < MAX_BUFFERED_TUPLES && buffer->slots[i] != NULL; i++)
+			ExecDropSingleTupleTableSlot(buffer->slots[i]);
 
-	if (resultRelInfo->ri_FdwRoutine == NULL)
-		table_finish_bulk_insert(resultRelInfo->ri_RelationDesc,
-								 miinfo->ti_options);
+		pfree(buffer->slots);
+	}
 
 	pfree(buffer);
 }
@@ -588,13 +628,32 @@ CopyMultiInsertInfoNextFreeSlot(CopyMultiInsertInfo *miinfo,
 {
 	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
 	int			nused = buffer->nused;
+	TupleTableSlot *slot;
 
 	Assert(buffer != NULL);
 	Assert(nused < MAX_BUFFERED_TUPLES);
 
-	if (buffer->slots[nused] == NULL)
-		buffer->slots[nused] = table_slot_create(rri->ri_RelationDesc, NULL);
-	return buffer->slots[nused];
+	if (rri->ri_FdwRoutine == NULL)
+	{
+		if (buffer->multislot == NULL)
+			buffer->multislot = MakeTupleTableSlot(RelationGetDescr(rri->ri_RelationDesc),
+												   &TTSOpsVirtual);
+
+		/* Caller must clear the slot */
+		slot = buffer->multislot;
+	}
+	else
+	{
+		if (buffer->slots[nused] == NULL)
+		{
+			slot = table_slot_create(rri->ri_RelationDesc, NULL);
+			buffer->slots[nused] = slot;
+		}
+		else
+			slot = buffer->slots[nused];
+	}
+
+	return slot;
 }
 
 /*
@@ -608,7 +667,17 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
 	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
 
 	Assert(buffer != NULL);
-	Assert(slot == buffer->slots[buffer->nused]);
+
+	if (rri->ri_FdwRoutine == NULL)
+	{
+		Assert(slot == buffer->multislot);
+		table_modify_buffer_insert(buffer->mstate, slot);
+	}
+
+#ifdef USE_ASSERT_CHECKING
+	if (rri->ri_FdwRoutine != NULL)
+		Assert(slot == buffer->slots[buffer->nused]);
+#endif
 
 	/* Store the line number so we can properly report any errors later */
 	buffer->linenos[buffer->nused] = lineno;
@@ -830,7 +899,7 @@ CopyFrom(CopyFromState cstate)
 	/*
 	 * It's generally more efficient to prepare a bunch of tuples for
 	 * insertion, and insert them in one
-	 * table_multi_insert()/ExecForeignBatchInsert() call, than call
+	 * table_modify_buffer_insert()/ExecForeignBatchInsert() call, than call
 	 * table_tuple_insert()/ExecForeignInsert() separately for every tuple.
 	 * However, there are a number of reasons why we might not be able to do
 	 * this.  These are explained below.
@@ -1080,7 +1149,8 @@ CopyFrom(CopyFromState cstate)
 				{
 					if (resultRelInfo->ri_CopyMultiInsertBuffer == NULL)
 						CopyMultiInsertInfoSetupBuffer(&multiInsertInfo,
-													   resultRelInfo);
+													   resultRelInfo, cstate,
+													   estate);
 				}
 				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
 						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index cad52fcc78..14addbc6f6 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -46,9 +46,9 @@ typedef enum EolType
 typedef enum CopyInsertMethod
 {
 	CIM_SINGLE,					/* use table_tuple_insert or ExecForeignInsert */
-	CIM_MULTI,					/* always use table_multi_insert or
+	CIM_MULTI,					/* always use table_modify_buffer_insert or
 								 * ExecForeignBatchInsert */
-	CIM_MULTI_CONDITIONAL,		/* use table_multi_insert or
+	CIM_MULTI_CONDITIONAL,		/* use table_modify_buffer_insert or
 								 * ExecForeignBatchInsert only if valid */
 } CopyInsertMethod;
 
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 70f23808e2..bd8c87be33 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -486,6 +486,7 @@ CopyHeaderChoice
 CopyInsertMethod
 CopyMethod
 CopyLogVerbosityChoice
+CopyModifyBufferFlushContext
 CopyMultiInsertBuffer
 CopyMultiInsertInfo
 CopyOnErrorChoice
-- 
2.34.1



  [application/x-patch] v19-0006-Remove-table_multi_insert-and-table_finish_bulk_.patch (6.0K, 8-v19-0006-Remove-table_multi_insert-and-table_finish_bulk_.patch)
  download | inline diff:
From 1b5d3c04f21e764756f89f0456d0f96e2b2350de Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Tue, 23 Apr 2024 05:25:21 +0000
Subject: [PATCH v19 6/6] Remove table_multi_insert and
 table_finish_bulk_insert

---
 src/backend/access/heap/heapam_handler.c |  1 -
 src/backend/access/table/tableamapi.c    |  1 -
 src/backend/commands/tablecmds.c         |  4 --
 src/include/access/tableam.h             | 56 +-----------------------
 4 files changed, 1 insertion(+), 61 deletions(-)

diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index eda0c73a16..fe9701773a 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2614,7 +2614,6 @@ static const TableAmRoutine heapam_methods = {
 	.tuple_insert = heapam_tuple_insert,
 	.tuple_insert_speculative = heapam_tuple_insert_speculative,
 	.tuple_complete_speculative = heapam_tuple_complete_speculative,
-	.multi_insert = heap_multi_insert,
 
 	.tuple_modify_begin = heap_modify_begin,
 	.tuple_modify_buffer_insert = heap_modify_buffer_insert,
diff --git a/src/backend/access/table/tableamapi.c b/src/backend/access/table/tableamapi.c
index 96ac951af6..0af8f1ac1f 100644
--- a/src/backend/access/table/tableamapi.c
+++ b/src/backend/access/table/tableamapi.c
@@ -71,7 +71,6 @@ GetTableAmRoutine(Oid amhandler)
 	Assert(routine->tuple_insert_speculative != NULL);
 	Assert(routine->tuple_complete_speculative != NULL);
 
-	Assert(routine->multi_insert != NULL);
 	Assert(routine->tuple_delete != NULL);
 	Assert(routine->tuple_update != NULL);
 	Assert(routine->tuple_lock != NULL);
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 0c984aa656..22bcb12abb 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -20969,8 +20969,6 @@ deleteSplitPartitionContext(SplitPartitionContext *pc, int ti_options)
 	ExecDropSingleTupleTableSlot(pc->dstslot);
 	FreeBulkInsertState(pc->bistate);
 
-	table_finish_bulk_insert(pc->partRel, ti_options);
-
 	pfree(pc);
 }
 
@@ -21453,8 +21451,6 @@ moveMergedTablesRows(Relation rel, List *mergingPartitionsList,
 
 	ExecDropSingleTupleTableSlot(dstslot);
 	FreeBulkInsertState(bistate);
-
-	table_finish_bulk_insert(newPartRel, ti_options);
 }
 
 /*
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index ddb6e6f3a5..82798fd641 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -563,10 +563,6 @@ typedef struct TableAmRoutine
 											   uint32 specToken,
 											   bool succeeded);
 
-	/* see table_multi_insert() for reference about parameters */
-	void		(*multi_insert) (Relation rel, TupleTableSlot **slots, int nslots,
-								 CommandId cid, int options, struct BulkInsertStateData *bistate);
-
 	/* see table_tuple_delete() for reference about parameters */
 	TM_Result	(*tuple_delete) (Relation rel,
 								 ItemPointer tid,
@@ -600,21 +596,6 @@ typedef struct TableAmRoutine
 							   uint8 flags,
 							   TM_FailureData *tmfd);
 
-	/*
-	 * Perform operations necessary to complete insertions made via
-	 * tuple_insert and multi_insert with a BulkInsertState specified. In-tree
-	 * access methods ceased to use this.
-	 *
-	 * Typically callers of tuple_insert and multi_insert will just pass all
-	 * the flags that apply to them, and each AM has to decide which of them
-	 * make sense for it, and then only take actions in finish_bulk_insert for
-	 * those flags, and ignore others.
-	 *
-	 * Optional callback.
-	 */
-	void		(*finish_bulk_insert) (Relation rel, int options);
-
-
 	/* ------------------------------------------------------------------------
 	 * Table Modify related functions.
 	 * ------------------------------------------------------------------------
@@ -1453,8 +1434,7 @@ table_index_delete_tuples(Relation rel, TM_IndexDeleteOp *delstate)
  * heap's TOAST table, too, if the tuple requires any out-of-line data.
  *
  * The BulkInsertState object (if any; bistate can be NULL for default
- * behavior) is also just passed through to RelationGetBufferForTuple. If
- * `bistate` is provided, table_finish_bulk_insert() needs to be called.
+ * behavior) is also just passed through to RelationGetBufferForTuple.
  *
  * On return the slot's tts_tid and tts_tableOid are updated to reflect the
  * insertion. But note that any toasting of fields within the slot is NOT
@@ -1501,28 +1481,6 @@ table_tuple_complete_speculative(Relation rel, TupleTableSlot *slot,
 												succeeded);
 }
 
-/*
- * Insert multiple tuples into a table.
- *
- * This is like table_tuple_insert(), but inserts multiple tuples in one
- * operation. That's often faster than calling table_tuple_insert() in a loop,
- * because e.g. the AM can reduce WAL logging and page locking overhead.
- *
- * Except for taking `nslots` tuples as input, and an array of TupleTableSlots
- * in `slots`, the parameters for table_multi_insert() are the same as for
- * table_tuple_insert().
- *
- * Note: this leaks memory into the current memory context. You can create a
- * temporary context before calling this, if that's a problem.
- */
-static inline void
-table_multi_insert(Relation rel, TupleTableSlot **slots, int nslots,
-				   CommandId cid, int options, struct BulkInsertStateData *bistate)
-{
-	rel->rd_tableam->multi_insert(rel, slots, nslots,
-								  cid, options, bistate);
-}
-
 /*
  * Delete a tuple.
  *
@@ -1649,18 +1607,6 @@ table_tuple_lock(Relation rel, ItemPointer tid, Snapshot snapshot,
 									   flags, tmfd);
 }
 
-/*
- * Perform operations necessary to complete insertions made via
- * tuple_insert and multi_insert with a BulkInsertState specified.
- */
-static inline void
-table_finish_bulk_insert(Relation rel, int options)
-{
-	/* optional callback */
-	if (rel->rd_tableam && rel->rd_tableam->finish_bulk_insert)
-		rel->rd_tableam->finish_bulk_insert(rel, options);
-}
-
 /* ------------------------------------------------------------------------
  * Table Modify related functions.
  * ------------------------------------------------------------------------
-- 
2.34.1



^ permalink  raw  reply  [nested|flat] 30+ messages in thread

* Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
  2024-03-21 04:14 Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-21 07:40 ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-23 00:17   ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-25 19:58     ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-26 15:37       ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-26 19:49         ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-27 08:12           ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-31 15:48             ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-04-02 19:40               ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-04-24 12:49                 ` Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
@ 2024-04-24 16:07                   ` Pavel Stehule <[email protected]>
  1 sibling, 0 replies; 30+ messages in thread

From: Pavel Stehule @ 2024-04-24 16:07 UTC (permalink / raw)
  To: Bharath Rupireddy <[email protected]>; +Cc: Jeff Davis <[email protected]>; Masahiko Sawada <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; Michael Paquier <[email protected]>; Matthias van de Meent <[email protected]>; Alexander Korotkov <[email protected]>

st 24. 4. 2024 v 14:50 odesílatel Bharath Rupireddy <
[email protected]> napsal:

> On Wed, Apr 3, 2024 at 1:10 AM Jeff Davis <[email protected]> wrote:
> >
> > Here's where I think this API should go:
> >
> > 1. Have table_modify_begin/end and table_modify_buffer_insert, like
> > those that are implemented in your patch.
>
> I added table_modify_begin, table_modify_buffer_insert,
> table_modify_buffer_flush and table_modify_end. Table Access Method (AM)
> authors now can define their own buffering strategy and flushing decisions
> based on their tuple storage kinds and various other AM specific factors. I
> also added a default implementation that falls back to single inserts when
> no implementation is provided for these AM by AM authors. See the attached
> v19-0001 patch.
>
> > 2. Add some kind of flush callback that will be called either while the
> > tuples are being flushed or after the tuples are flushed (but before
> > they are freed by the AM). (Aside: do we need to call it while the
> > tuples are being flushed to get the right visibility semantics for
> > after-row triggers?)
>
> I added a flush callback named TableModifyBufferFlushCallback; when
> provided by callers invoked after tuples are flushed to disk from the
> buffers but before the AM frees them up. Index insertions and AFTER ROW
> INSERT triggers can be executed in this callback. See the v19-0001 patch
> for how AM invokes the flush callback, and see either v19-0003 or v19-0004
> or v19-0005 for how a caller can supply the callback and required context
> to execute index insertions and AR triggers.
>
> > 3. Add table_modify_buffer_{update|delete} APIs.
> >
> > 9. Use these new methods for DELETE, UPDATE, and MERGE. MERGE can use
> > the buffer_insert/update/delete APIs; we don't need a separate merge
> > method. This probably requires that the AM maintain 3 separate buffers
> > to distinguish different kinds of changes at flush time (obviously
> > these can be initialized lazily to avoid overhead when not being used).
>
> I haven't thought about these things yet. I can only focus on them after
> seeing how the attached patches go from here.
>
> > 4. Some kind of API tweaks to help manage memory when modifying
> > pertitioned tables, so that the buffering doesn't get out of control.
> > Perhaps just reporting memory usage and allowing the caller to force
> > flushes would be enough.
>
> Heap implementation for thes new Table AMs uses a separate memory context
> for all of the operations. Please have a look and let me know if we need
> anything more.
>
> > 5. Use these new methods for CREATE/REFRESH MATERIALIZED VIEW. This is
> > fairly straightforward, I believe, and handled by your patch. Indexes
> > are (re)built afterward, and no triggers are possible.
> >
> > 6. Use these new methods for CREATE TABLE ... AS. This is fairly
> > straightforward, I believe, and handled by your patch. No indexes or
> > triggers are possible.
>
> I used multi inserts for all of these including TABLE REWRITE commands
> such as ALTER TABLE. See the attached v19-0002 patch. Check the testing
> section below for benefits.
>
> FWIW, following are some of the TABLE REWRITE commands that can get
> benefitted:
>
> ALTER TABLE tbl ALTER c1 TYPE bigint;
> ALTER TABLE itest13 ADD COLUMN c int GENERATED BY DEFAULT AS IDENTITY;
> ALTER MATERIALIZED VIEW heapmv SET ACCESS METHOD heap2;
> ALTER TABLE itest3 ALTER COLUMN a TYPE int;
> ALTER TABLE gtest20 ALTER COLUMN b SET EXPRESSION AS (a * 3);
> ALTER TABLE has_volatile ADD col4 int DEFAULT (random() * 10000)::int;
> and so on.
>
> > 7. Use these new methods for COPY. We have to be careful to avoid
> > regressions for the heap method, because it's already managing its own
> > buffers. If the AM manages the buffering, then it may require
> > additional copying of slots, which could be a disadvantage. To solve
> > this, we may need some minor API tweaks to avoid copying when the
> > caller guarantees that the memory will not be freed to early, or
> > perhaps expose the AM's memory context to copyfrom.c. Another thing to
> > consider is that the buffering in copyfrom.c is also used for FDWs, so
> > that buffering code path needs to be preserved in copyfrom.c even if
> > not used for AMs.
>
> I modified the COPY FROM code to use the new Table AMs, and performed some
> tests which show no signs of regression. Check the testing section below
> for more details. See the attached v19-0005 patch. With this,
> table_multi_insert can be deprecated.
>
> > 8. Use these new methods for INSERT INTO ... SELECT. One potential
> > challenge here is that execution nodes are not always run to
> > completion, so we need to be sure that the flush isn't forgotten in
> > that case.
>
> I did that in v19-0003. I did place the table_modify_end call in multiple
> places including ExecEndModifyTable. I didn't find any issues with it.
> Please have a look and let me know if we need the end call in more places.
> Check the testing section below for benefits.
>
> > 10. Use these new methods for logical apply.
>
> I used multi inserts for Logical Replication apply. in v19-0004. Check the
> testing section below for benefits.
>
> FWIW, open-source pglogical does have multi insert support, check code
> around
> https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical_apply_heap.c#L960
> .
>
> > 11. Deprecate the multi_insert API.
>
> I did remove both table_multi_insert and table_finish_bulk_insert in
> v19-0006. Perhaps, removing them isn't a great idea, but adding a
> deprecation WARNING/ERROR until some more PG releases might be worth
> looking at.
>
> > Thoughts on this plan? Does your patch make sense in v17 as a stepping
> > stone, or should we try to make all of these API changes together in
> > v18?
>
> If the design, code and benefits that these new Table AMs bring to the
> table look good, I hope to see it for PG 18.
>
> > Also, a sample AM code would be a huge benefit here. Writing a real AM
> > is hard, but perhaps we can at least have an example one to demonstrate
> > how to use these APIs?
>
> The attached patches already have implemented these new Table AMs for
> Heap. I don't think we need a separate implementation to demonstrate. If
> others feel so, I'm open to thoughts here.
>
> Having said above, I'd like to reiterate the motivation behind the new
> Table AMs for multi and single inserts.
>
> 1. A scan-like API with state being carried across is thought to be better
> as suggested by Andres Freund -
> https://www.postgresql.org/message-id/[email protected]
> .
> 2. Allowing a Table AM to optimize operations across multiple inserts,
> define its own buffering strategy and take its own flushing decisions based
> on their tuple storage kinds and various other AM specific factors.
> 3. Improve performance of various SQL commands with multi inserts for Heap
> AM.
>
> The attached v19 patches might need some more detailed comments, some
> documentation and some specific tests ensuring the multi inserts for Heap
> are kicked-in for various commands. I'm open to thoughts here.
>
> I did some testing to see how various commands benefit with multi inserts
> using these new Table AM for heap. It's not only the improvement in
> performance these commands see, but also the amount of WAL that gets
> generated reduces greatly. After all, multi inserts optimize the insertions
> by writing less WAL. IOW, writing WAL record per page if multiple rows fit
> into a single data page as opposed to WAL record per row.
>
> Test case 1: 100 million rows, 2 columns (int and float)
>
> Command                        | HEAD (sec) | PATCHED (sec) | Faster by %
> | Faster by X
> ------------------------------ | ---------- | ------------- | -----------
> | -----------
> CREATE TABLE AS                | 121        | 77            | 36.3
>  | 1.57
> CREATE MATERIALIZED VIEW       | 101        | 49            | 51.4
>  | 2.06
> REFRESH MATERIALIZED VIEW      | 113        | 54            | 52.2
>  | 2.09
> ALTER TABLE (TABLE REWRITE)    | 124        | 81            | 34.6
>  | 1.53
> COPY FROM                      | 71         | 72            | 0
> | 1
> INSERT INTO ... SELECT         | 117        | 62            | 47
>  | 1.88
> LOGICAL REPLICATION APPLY      | 393        | 306           | 22.1
>  | 1.28
>
> Command                        | HEAD (WAL in GB) | PATCHED (WAL in GB) |
> Reduced by % | Reduced by X
> ------------------------------ | ---------------- | ------------------- |
> ------------ | -----------
> CREATE TABLE AS                | 6.8              | 2.4                 |
> 64.7         | 2.83
> CREATE MATERIALIZED VIEW       | 7.2              | 2.3                 |
> 68           | 3.13
> REFRESH MATERIALIZED VIEW      | 10               | 5.1                 |
> 49           | 1.96
> ALTER TABLE (TABLE REWRITE)    | 8                | 3.2                 |
> 60           | 2.5
> COPY FROM                      | 2.9              | 3                   |
> 0            | 1
> INSERT INTO ... SELECT         | 8                | 3                   |
> 62.5         | 2.66
> LOGICAL REPLICATION APPLY      | 7.5              | 2.3                 |
> 69.3         | 3.26
>
> Test case 2: 1 billion rows, 1 column (int)
>
> Command                        | HEAD (sec) | PATCHED (sec) | Faster by %
> | Faster by X
> ------------------------------ | ---------- | ------------- | -----------
> | -----------
> CREATE TABLE AS                | 794        | 386           | 51.38
> | 2.05
> CREATE MATERIALIZED VIEW       | 1006       | 563           | 44.03
> | 1.78
> REFRESH MATERIALIZED VIEW      | 977        | 603           | 38.28
> | 1.62
> ALTER TABLE (TABLE REWRITE)    | 1189       | 714           | 39.94
> | 1.66
> COPY FROM                      | 321        | 330           | -0.02
> | 0.97
> INSERT INTO ... SELECT         | 1084       | 586           | 45.94
> | 1.84
> LOGICAL REPLICATION APPLY      | 3530       | 2982          | 15.52
> | 1.18
>
> Command                        | HEAD (WAL in GB) | PATCHED (WAL in GB) |
> Reduced by % | Reduced by X
> ------------------------------ | ---------------- | ------------------- |
> ------------ | -----------
> CREATE TABLE AS                | 60               | 12                  |
> 80           | 5
> CREATE MATERIALIZED VIEW       | 60               | 12                  |
> 80           | 5
> REFRESH MATERIALIZED VIEW      | 60               | 12                  |
> 80           | 5
> ALTER TABLE (TABLE REWRITE)    | 123              | 31                  |
> 60           | 2.5
> COPY FROM                      | 12               | 12                  |
> 0            | 1
> INSERT INTO ... SELECT         | 120              | 24                  |
> 80           | 5
> LOGICAL REPLICATION APPLY      | 61               | 12                  |
> 80.32        | 5
>

looks pretty impressive!

Pavel


>
> Test setup:
> ./configure --prefix=$PWD/pg17/ --enable-tap-tests CFLAGS="-ggdb3 -O2" >
> install.log && make -j 8 install > install.log 2>&1 &
>
> wal_level=logical
> max_wal_size = 256GB
> checkpoint_timeout = 1h
>
> Test system is EC2 instance of type c5.4xlarge:
> Architecture:            x86_64
>   CPU op-mode(s):        32-bit, 64-bit
>   Address sizes:         46 bits physical, 48 bits virtual
>   Byte Order:            Little Endian
> CPU(s):                  16
>   On-line CPU(s) list:   0-15
> Vendor ID:               GenuineIntel
>   Model name:            Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz
>     CPU family:          6
>     Model:               85
>     Thread(s) per core:  2
>     Core(s) per socket:  8
>     Socket(s):           1
>     Stepping:            7
>     BogoMIPS:            5999.99
> Caches (sum of all):
>   L1d:                   256 KiB (8 instances)
>   L1i:                   256 KiB (8 instances)
>   L2:                    8 MiB (8 instances)
>   L3:                    35.8 MiB (1 instance)
> NUMA:
>   NUMA node(s):          1
>   NUMA node0 CPU(s):     0-15
> RAM:
>   MemTotal:       32036536 kB
>
> --
> Bharath Rupireddy
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com
>


^ permalink  raw  reply  [nested|flat] 30+ messages in thread

* Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
  2024-03-21 04:14 Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-21 07:40 ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-23 00:17   ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-25 19:58     ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-26 15:37       ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-26 19:49         ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-27 08:12           ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-31 15:48             ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-04-02 19:40               ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-04-24 12:49                 ` Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
@ 2024-04-25 16:41                   ` Jeff Davis <[email protected]>
  2024-04-29 06:06                     ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  1 sibling, 1 reply; 30+ messages in thread

From: Jeff Davis @ 2024-04-25 16:41 UTC (permalink / raw)
  To: Bharath Rupireddy <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; Michael Paquier <[email protected]>; Matthias van de Meent <[email protected]>; Alexander Korotkov <[email protected]>

On Wed, 2024-04-24 at 18:19 +0530, Bharath Rupireddy wrote:
> I added a flush callback named TableModifyBufferFlushCallback; when
> provided by callers invoked after tuples are flushed to disk from the
> buffers but before the AM frees them up. Index insertions and AFTER
> ROW INSERT triggers can be executed in this callback. See the v19-
> 0001 patch for how AM invokes the flush callback, and see either v19-
> 0003 or v19-0004 or v19-0005 for how a caller can supply the callback
> and required context to execute index insertions and AR triggers.

The flush callback takes a pointer to an array of slot pointers, and I
don't think that's the right API. I think the callback should be called
on each slot individually.

We shouldn't assume that a table AM stores buffered inserts as an array
of slot pointers. A TupleTableSlot has a fair amount of memory overhead
(64 bytes), so most AMs wouldn't want to pay that overhead for every
tuple. COPY does, but that's because the number of buffered tuples is
fairly small.

> 
> 
> > 11. Deprecate the multi_insert API.
> 
> I did remove both table_multi_insert and table_finish_bulk_insert in
> v19-0006.

That's OK with me. Let's leave those functions out for now.

> 
> If the design, code and benefits that these new Table AMs bring to
> the table look good, I hope to see it for PG 18.

Sounds good.

Regards,
	Jeff Davis








^ permalink  raw  reply  [nested|flat] 30+ messages in thread

* Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
  2024-03-21 04:14 Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-21 07:40 ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-23 00:17   ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-25 19:58     ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-26 15:37       ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-26 19:49         ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-27 08:12           ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-31 15:48             ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-04-02 19:40               ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-04-24 12:49                 ` Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-04-25 16:41                   ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
@ 2024-04-29 06:06                     ` Bharath Rupireddy <[email protected]>
  2024-05-15 07:26                       ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  0 siblings, 1 reply; 30+ messages in thread

From: Bharath Rupireddy @ 2024-04-29 06:06 UTC (permalink / raw)
  To: Jeff Davis <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; Michael Paquier <[email protected]>; Matthias van de Meent <[email protected]>; Alexander Korotkov <[email protected]>

On Thu, Apr 25, 2024 at 10:11 PM Jeff Davis <[email protected]> wrote:
>
> On Wed, 2024-04-24 at 18:19 +0530, Bharath Rupireddy wrote:
> > I added a flush callback named TableModifyBufferFlushCallback; when
> > provided by callers invoked after tuples are flushed to disk from the
> > buffers but before the AM frees them up. Index insertions and AFTER
> > ROW INSERT triggers can be executed in this callback. See the v19-
> > 0001 patch for how AM invokes the flush callback, and see either v19-
> > 0003 or v19-0004 or v19-0005 for how a caller can supply the callback
> > and required context to execute index insertions and AR triggers.
>
> The flush callback takes a pointer to an array of slot pointers, and I
> don't think that's the right API. I think the callback should be called
> on each slot individually.
>
> We shouldn't assume that a table AM stores buffered inserts as an array
> of slot pointers. A TupleTableSlot has a fair amount of memory overhead
> (64 bytes), so most AMs wouldn't want to pay that overhead for every
> tuple. COPY does, but that's because the number of buffered tuples is
> fairly small.

I get your point. An AM can choose to implement the buffering strategy
by just storing the plain tuple rather than the tuple slots in which
case the flush callback with an array of tuple slots won't work.
Therefore, I now changed the flush callback to accept only a single
tuple slot.

> > > 11. Deprecate the multi_insert API.
> >
> > I did remove both table_multi_insert and table_finish_bulk_insert in
> > v19-0006.
>
> That's OK with me. Let's leave those functions out for now.

Okay. Dropped the 0006 patch for now.

Please see the attached v20 patch set.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


Attachments:

  [application/x-patch] v20-0001-Introduce-new-Table-Access-Methods-for-single-an.patch (20.5K, 2-v20-0001-Introduce-new-Table-Access-Methods-for-single-an.patch)
  download | inline diff:
From 06cd6e242cd2fa3514aa1c76596bc4b4ad330040 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Sat, 27 Apr 2024 15:40:27 +0000
Subject: [PATCH v20 1/5] Introduce new Table Access Methods for single and
 multi inserts

---
 src/backend/access/heap/heapam.c         | 205 ++++++++++++++++++++++-
 src/backend/access/heap/heapam_handler.c |   6 +
 src/backend/access/table/tableam.c       |  97 +++++++++++
 src/backend/access/table/tableamapi.c    |  10 ++
 src/include/access/heapam.h              |  44 +++++
 src/include/access/tableam.h             | 145 ++++++++++++++++
 src/tools/pgindent/typedefs.list         |   3 +
 7 files changed, 509 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4a4cf76269..fdc50c42df 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -64,6 +64,7 @@
 #include "storage/standby.h"
 #include "utils/datum.h"
 #include "utils/inval.h"
+#include "utils/memutils.h"
 #include "utils/relcache.h"
 #include "utils/snapmgr.h"
 #include "utils/spccache.h"
@@ -112,7 +113,7 @@ static int	bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate);
 static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
 static HeapTuple ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required,
 										bool *copy);
-
+static void heap_modify_insert_end_callback(TableModifyState *state);
 
 /*
  * Each tuple lock mode has a corresponding heavyweight lock, and one or two
@@ -2608,6 +2609,208 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 	pgstat_count_heap_insert(relation, ntuples);
 }
 
+/*
+ * Initialize heap modify state.
+ */
+TableModifyState *
+heap_modify_begin(Relation rel, int modify_flags,
+				  CommandId cid, int options,
+				  TableModifyBufferFlushCallback modify_buffer_flush_callback,
+				  void *modify_buffer_flush_context)
+{
+	TableModifyState *state;
+	MemoryContext context;
+	MemoryContext oldcontext;
+
+	context = AllocSetContextCreate(TopTransactionContext,
+									"heap_modify memory context",
+									ALLOCSET_DEFAULT_SIZES);
+
+	oldcontext = MemoryContextSwitchTo(context);
+	state = palloc0(sizeof(TableModifyState));
+	state->rel = rel;
+	state->modify_flags = modify_flags;
+	state->mem_cxt = context;
+	state->cid = cid;
+	state->options = options;
+	state->modify_buffer_flush_callback = modify_buffer_flush_callback;
+	state->modify_buffer_flush_context = modify_buffer_flush_context;
+	state->modify_end_callback = NULL;	/* To be installed lazily */
+	MemoryContextSwitchTo(oldcontext);
+
+	return state;
+}
+
+/*
+ * Store passed-in tuple into in-memory buffered slots. When full, insert
+ * multiple tuples from the buffers into heap.
+ */
+void
+heap_modify_buffer_insert(TableModifyState *state,
+						  TupleTableSlot *slot)
+{
+	TupleTableSlot *dstslot;
+	HeapInsertState *istate;
+	HeapMultiInsertState *mistate;
+	MemoryContext oldcontext;
+
+	oldcontext = MemoryContextSwitchTo(state->mem_cxt);
+
+	/* First time through, initialize heap insert state */
+	if (state->data == NULL)
+	{
+		istate = (HeapInsertState *) palloc0(sizeof(HeapInsertState));
+		istate->bistate = NULL;
+		istate->mistate = NULL;
+		state->data = istate;
+
+		if ((state->modify_flags & TM_FLAG_MULTI_INSERTS) != 0)
+		{
+			mistate = (HeapMultiInsertState *) palloc0(sizeof(HeapMultiInsertState));
+			mistate->slots = (TupleTableSlot **) palloc0(sizeof(TupleTableSlot *) * HEAP_MAX_BUFFERED_SLOTS);
+			istate->mistate = mistate;
+			mistate->mem_cxt = AllocSetContextCreate(CurrentMemoryContext,
+													 "heap_multi_insert memory context",
+													 ALLOCSET_DEFAULT_SIZES);
+		}
+
+		if ((state->modify_flags & TM_FLAG_BAS_BULKWRITE) != 0)
+			istate->bistate = GetBulkInsertState();
+
+		state->modify_end_callback = heap_modify_insert_end_callback;
+	}
+
+	istate = (HeapInsertState *) state->data;
+	Assert(istate->mistate != NULL);
+	mistate = istate->mistate;
+
+	dstslot = mistate->slots[mistate->cur_slots];
+	if (dstslot == NULL)
+	{
+		/*
+		 * We use virtual tuple slots buffered slots for leveraging the
+		 * optimization it provides to minimize physical data copying. The
+		 * virtual slot gets materialized when we copy (via below
+		 * ExecCopySlot) the tuples from the source slot which can be of any
+		 * type. This way, it is ensured that the tuple storage doesn't depend
+		 * on external memory, because all the datums that aren't passed by
+		 * value are copied into the slot's memory context.
+		 */
+		dstslot = MakeTupleTableSlot(RelationGetDescr(state->rel),
+									 &TTSOpsVirtual);
+		mistate->slots[mistate->cur_slots] = dstslot;
+	}
+
+	ExecClearTuple(dstslot);
+	ExecCopySlot(dstslot, slot);
+
+	mistate->cur_slots++;
+
+	/*
+	 * Memory allocated for the whole tuple is in slot's memory context, so
+	 * use it keep track of the total space occupied by all buffered tuples.
+	 */
+	if (TTS_SHOULDFREE(dstslot))
+		mistate->cur_size += MemoryContextMemAllocated(dstslot->tts_mcxt, false);
+
+	if (mistate->cur_slots >= HEAP_MAX_BUFFERED_SLOTS ||
+		mistate->cur_size >= HEAP_MAX_BUFFERED_BYTES)
+		heap_modify_buffer_flush(state);
+
+	MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * Insert multiple tuples from in-memory buffered slots into heap.
+ */
+void
+heap_modify_buffer_flush(TableModifyState *state)
+{
+	HeapInsertState *istate;
+	HeapMultiInsertState *mistate;
+	MemoryContext oldcontext;
+
+	/* Quick exit if we haven't inserted anything yet */
+	if (state->data == NULL)
+		return;
+
+	istate = (HeapInsertState *) state->data;
+	Assert(istate->mistate != NULL);
+	mistate = istate->mistate;
+
+	if (mistate->cur_slots == 0)
+		return;
+
+	/*
+	 * heap_multi_insert may leak memory, so switch to short-lived memory
+	 * context before calling it.
+	 */
+	oldcontext = MemoryContextSwitchTo(mistate->mem_cxt);
+
+	heap_multi_insert(state->rel, mistate->slots, mistate->cur_slots,
+					  state->cid, state->options, istate->bistate);
+
+	MemoryContextSwitchTo(oldcontext);
+	MemoryContextReset(mistate->mem_cxt);
+
+	if (state->modify_buffer_flush_callback != NULL)
+	{
+		for (int i = 0; i < mistate->cur_slots; i++)
+			state->modify_buffer_flush_callback(state->modify_buffer_flush_context,
+												mistate->slots[i]);
+	}
+
+	mistate->cur_slots = 0;
+	mistate->cur_size = 0;
+}
+
+/*
+ * Heap insert specific callback used for performing work at the end like
+ * flushing buffered tuples if any, cleaning up the insert state and buffered
+ * slots.
+ */
+static void
+heap_modify_insert_end_callback(TableModifyState *state)
+{
+	HeapInsertState *istate;
+
+	/* Quick exit if we haven't inserted anything yet */
+	if (state->data == NULL)
+		return;
+
+	istate = (HeapInsertState *) state->data;
+
+	if (istate->mistate != NULL)
+	{
+		HeapMultiInsertState *mistate = istate->mistate;
+
+		heap_modify_buffer_flush(state);
+
+		Assert(mistate->cur_slots == 0 &&
+			   mistate->cur_size == 0);
+
+		for (int i = 0; i < HEAP_MAX_BUFFERED_SLOTS && mistate->slots[i] != NULL; i++)
+			ExecDropSingleTupleTableSlot(mistate->slots[i]);
+
+		MemoryContextDelete(mistate->mem_cxt);
+	}
+
+	if (istate->bistate != NULL)
+		FreeBulkInsertState(istate->bistate);
+}
+
+/*
+ * Clean heap modify state.
+ */
+void
+heap_modify_end(TableModifyState *state)
+{
+	if (state->modify_end_callback != NULL)
+		state->modify_end_callback(state);
+
+	MemoryContextDelete(state->mem_cxt);
+}
+
 /*
  *	simple_heap_insert - insert a tuple
  *
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 6f8b1b7929..eda0c73a16 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2615,6 +2615,12 @@ static const TableAmRoutine heapam_methods = {
 	.tuple_insert_speculative = heapam_tuple_insert_speculative,
 	.tuple_complete_speculative = heapam_tuple_complete_speculative,
 	.multi_insert = heap_multi_insert,
+
+	.tuple_modify_begin = heap_modify_begin,
+	.tuple_modify_buffer_insert = heap_modify_buffer_insert,
+	.tuple_modify_buffer_flush = heap_modify_buffer_flush,
+	.tuple_modify_end = heap_modify_end,
+
 	.tuple_delete = heapam_tuple_delete,
 	.tuple_update = heapam_tuple_update,
 	.tuple_lock = heapam_tuple_lock,
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index e57a0b7ea3..35a3e43c59 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -21,6 +21,7 @@
 
 #include <math.h>
 
+#include "access/heapam.h"		/* just for BulkInsertState */
 #include "access/syncscan.h"
 #include "access/tableam.h"
 #include "access/xact.h"
@@ -29,6 +30,7 @@
 #include "storage/bufmgr.h"
 #include "storage/shmem.h"
 #include "storage/smgr.h"
+#include "utils/memutils.h"
 
 /*
  * Constants to control the behavior of block allocation to parallel workers
@@ -48,6 +50,7 @@
 char	   *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
 bool		synchronize_seqscans = true;
 
+static void default_table_modify_insert_end_callback(TableModifyState *state);
 
 /* ----------------------------------------------------------------------------
  * Slot functions.
@@ -756,3 +759,97 @@ table_block_relation_estimate_size(Relation rel, int32 *attr_widths,
 	else
 		*allvisfrac = (double) relallvisible / curpages;
 }
+
+/*
+ * Initialize default table modify state.
+ */
+TableModifyState *
+default_table_modify_begin(Relation rel, int modify_flags,
+						   CommandId cid, int options,
+						   TableModifyBufferFlushCallback modify_buffer_flush_callback,
+						   void *modify_buffer_flush_context)
+{
+	TableModifyState *state;
+	MemoryContext context;
+	MemoryContext oldcontext;
+
+	context = AllocSetContextCreate(CurrentMemoryContext,
+									"default_table_modify memory context",
+									ALLOCSET_DEFAULT_SIZES);
+
+	oldcontext = MemoryContextSwitchTo(context);
+	state = palloc0(sizeof(TableModifyState));
+	state->rel = rel;
+	state->modify_flags = modify_flags;
+	state->mem_cxt = context;
+	state->cid = cid;
+	state->options = options;
+	state->modify_buffer_flush_callback = modify_buffer_flush_callback;
+	state->modify_buffer_flush_context = modify_buffer_flush_context;
+	state->modify_end_callback = NULL;	/* To be installed lazily */
+	MemoryContextSwitchTo(oldcontext);
+
+	return state;
+}
+
+/*
+ * Default table modify implementation for inserts.
+ */
+void
+default_table_modify_buffer_insert(TableModifyState *state,
+								   TupleTableSlot *slot)
+{
+	MemoryContext oldcontext;
+
+	oldcontext = MemoryContextSwitchTo(state->mem_cxt);
+
+	/* First time through, initialize default table modify state */
+	if (state->data == NULL)
+	{
+		if ((state->modify_flags & TM_FLAG_BAS_BULKWRITE) != 0)
+			state->data = (BulkInsertState) GetBulkInsertState();
+
+		state->modify_end_callback = default_table_modify_insert_end_callback;
+	}
+
+	/* Fallback to table AM single insert routine */
+	table_tuple_insert(state->rel,
+					   slot,
+					   state->cid,
+					   state->options,
+					   (BulkInsertState) state->data);
+
+	MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * Default table modify implementation for flush.
+ */
+void
+default_table_modify_buffer_flush(TableModifyState *state)
+{
+	/* no-op */
+}
+
+/*
+ * Default table modify insert specific callback used for performing work at
+ * the end like cleaning up the bulk insert state.
+ */
+static void
+default_table_modify_insert_end_callback(TableModifyState *state)
+{
+	if (state->data != NULL)
+		FreeBulkInsertState((BulkInsertState) state->data);
+}
+
+/*
+ * Clean default table modify state.
+ */
+void
+default_table_modify_end(TableModifyState *state)
+{
+	if (state->modify_end_callback != NULL)
+		state->modify_end_callback(state);
+
+	MemoryContextDelete(state->mem_cxt);
+}
diff --git a/src/backend/access/table/tableamapi.c b/src/backend/access/table/tableamapi.c
index ce637a5a5d..96ac951af6 100644
--- a/src/backend/access/table/tableamapi.c
+++ b/src/backend/access/table/tableamapi.c
@@ -97,6 +97,16 @@ GetTableAmRoutine(Oid amhandler)
 	Assert(routine->scan_sample_next_block != NULL);
 	Assert(routine->scan_sample_next_tuple != NULL);
 
+	/* optional, but either all of them are defined or none. */
+	Assert((routine->tuple_modify_begin == NULL &&
+			routine->tuple_modify_buffer_insert == NULL &&
+			routine->tuple_modify_buffer_flush == NULL &&
+			routine->tuple_modify_end == NULL) ||
+		   (routine->tuple_modify_begin != NULL &&
+			routine->tuple_modify_buffer_insert != NULL &&
+			routine->tuple_modify_buffer_flush != NULL &&
+			routine->tuple_modify_end != NULL));
+
 	return routine;
 }
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index c47a5045ce..c10ebbb5ea 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -271,6 +271,38 @@ typedef enum
 	PRUNE_VACUUM_CLEANUP,		/* VACUUM 2nd heap pass */
 } PruneReason;
 
+/*
+ * Maximum number of slots that multi-insert buffers can hold.
+ *
+ * Caution: Don't make this too big, as we could end up with this many tuples
+ * stored in multi insert buffer.
+ */
+#define HEAP_MAX_BUFFERED_SLOTS		1000
+
+/* Maximum size of all tuples that multi-insert buffers can hold */
+#define HEAP_MAX_BUFFERED_BYTES		65535
+
+typedef struct HeapMultiInsertState
+{
+	/* Array of buffered slots */
+	TupleTableSlot **slots;
+
+	/* Number of buffered slots currently held */
+	int			cur_slots;
+
+	/* Approximate size of all tuples currently held in buffered slots */
+	Size		cur_size;
+
+	MemoryContext mem_cxt;
+} HeapMultiInsertState;
+
+typedef struct HeapInsertState
+{
+	struct BulkInsertStateData *bistate;
+	HeapMultiInsertState *mistate;
+} HeapInsertState;
+
+
 /* ----------------
  *		function prototypes for heap access method
  *
@@ -321,6 +353,18 @@ extern void heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 extern void heap_multi_insert(Relation relation, struct TupleTableSlot **slots,
 							  int ntuples, CommandId cid, int options,
 							  BulkInsertState bistate);
+
+extern TableModifyState *heap_modify_begin(Relation rel,
+										   int modify_flags,
+										   CommandId cid,
+										   int options,
+										   TableModifyBufferFlushCallback modify_buffer_flush_callback,
+										   void *modify_buffer_flush_context);
+extern void heap_modify_buffer_insert(TableModifyState *state,
+									  TupleTableSlot *slot);
+extern void heap_modify_buffer_flush(TableModifyState *state);
+extern void heap_modify_end(TableModifyState *state);
+
 extern TM_Result heap_delete(Relation relation, ItemPointer tid,
 							 CommandId cid, Snapshot crosscheck, bool wait,
 							 struct TM_FailureData *tmfd, bool changingPart);
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 8e583b45cd..2e96154d6e 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -255,6 +255,42 @@ typedef struct TM_IndexDeleteOp
 	TM_IndexStatus *status;
 } TM_IndexDeleteOp;
 
+/* Table modify flags */
+
+/* Use multi inserts, i.e. buffer multiple tuples and insert them at once */
+#define TM_FLAG_MULTI_INSERTS	0x000001
+
+/* Use BAS_BULKWRITE buffer access strategy */
+#define TM_FLAG_BAS_BULKWRITE	0x000002
+
+struct TableModifyState;
+
+/* Callback invoked for each tuple that gets flushed to disk from buffer */
+typedef void (*TableModifyBufferFlushCallback) (void *context,
+												TupleTableSlot *slot);
+
+/* Table AM specific callback that gets called in table_modify_end() */
+typedef void (*TableModifyEndCallback) (struct TableModifyState *state);
+
+/* Holds table modify state */
+typedef struct TableModifyState
+{
+	Relation	rel;
+	int			modify_flags;
+	MemoryContext mem_cxt;
+	CommandId	cid;
+	int			options;
+
+	/* Flush callback and its context */
+	TableModifyBufferFlushCallback modify_buffer_flush_callback;
+	void	   *modify_buffer_flush_context;
+
+	/* Table AM specific data */
+	void	   *data;
+
+	TableModifyEndCallback modify_end_callback;
+} TableModifyState;
+
 /* "options" flag bits for table_tuple_insert */
 /* TABLE_INSERT_SKIP_WAL was 0x0001; RelationNeedsWAL() now governs */
 #define TABLE_INSERT_SKIP_FSM		0x0002
@@ -578,6 +614,21 @@ typedef struct TableAmRoutine
 	void		(*finish_bulk_insert) (Relation rel, int options);
 
 
+	/* ------------------------------------------------------------------------
+	 * Table Modify related functions.
+	 * ------------------------------------------------------------------------
+	 */
+	TableModifyState *(*tuple_modify_begin) (Relation rel,
+											 int modify_flags,
+											 CommandId cid,
+											 int options,
+											 TableModifyBufferFlushCallback modify_buffer_flush_callback,
+											 void *modify_buffer_flush_context);
+	void		(*tuple_modify_buffer_insert) (TableModifyState *state,
+											   TupleTableSlot *slot);
+	void		(*tuple_modify_buffer_flush) (TableModifyState *state);
+	void		(*tuple_modify_end) (TableModifyState *state);
+
 	/* ------------------------------------------------------------------------
 	 * DDL related functionality.
 	 * ------------------------------------------------------------------------
@@ -1609,6 +1660,100 @@ table_finish_bulk_insert(Relation rel, int options)
 		rel->rd_tableam->finish_bulk_insert(rel, options);
 }
 
+/* ------------------------------------------------------------------------
+ * Table Modify related functions.
+ * ------------------------------------------------------------------------
+ */
+extern TableModifyState *default_table_modify_begin(Relation rel, int modify_flags,
+													CommandId cid, int options,
+													TableModifyBufferFlushCallback modify_buffer_flush_callback,
+													void *modify_buffer_flush_context);
+extern void default_table_modify_buffer_insert(TableModifyState *state,
+											   TupleTableSlot *slot);
+extern void default_table_modify_buffer_flush(TableModifyState *state);
+extern void default_table_modify_end(TableModifyState *state);
+
+static inline TableModifyState *
+table_modify_begin(Relation rel, int modify_flags, CommandId cid, int options,
+				   TableModifyBufferFlushCallback modify_buffer_flush_callback,
+				   void *modify_buffer_flush_context)
+{
+	if (rel->rd_tableam &&
+		rel->rd_tableam->tuple_modify_begin != NULL)
+	{
+		return rel->rd_tableam->tuple_modify_begin(rel, modify_flags,
+												   cid, options,
+												   modify_buffer_flush_callback,
+												   modify_buffer_flush_context);
+	}
+	else if (rel->rd_tableam &&
+			 rel->rd_tableam->tuple_modify_begin == NULL)
+	{
+		/* Fallback to a default implementation */
+		return default_table_modify_begin(rel, modify_flags,
+										  cid, options,
+										  modify_buffer_flush_callback,
+										  modify_buffer_flush_context);
+	}
+	else
+		Assert(false);
+
+	return NULL;
+}
+
+static inline void
+table_modify_buffer_insert(TableModifyState *state, TupleTableSlot *slot)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_modify_buffer_insert != NULL)
+	{
+		state->rel->rd_tableam->tuple_modify_buffer_insert(state, slot);
+	}
+	else if (state->rel->rd_tableam &&
+			 state->rel->rd_tableam->tuple_modify_buffer_insert == NULL)
+	{
+		/* Fallback to a default implementation */
+		default_table_modify_buffer_insert(state, slot);
+	}
+	else
+		Assert(false);
+}
+
+static inline void
+table_modify_buffer_flush(TableModifyState *state)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_modify_buffer_flush != NULL)
+	{
+		state->rel->rd_tableam->tuple_modify_buffer_flush(state);
+	}
+	else if (state->rel->rd_tableam &&
+			 state->rel->rd_tableam->tuple_modify_buffer_flush == NULL)
+	{
+		/* Fallback to a default implementation */
+		default_table_modify_buffer_flush(state);
+	}
+	else
+		Assert(false);
+}
+
+static inline void
+table_modify_end(TableModifyState *state)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_modify_end != NULL)
+	{
+		state->rel->rd_tableam->tuple_modify_end(state);
+	}
+	else if (state->rel->rd_tableam &&
+			 state->rel->rd_tableam->tuple_modify_end == NULL)
+	{
+		/* Fallback to a default implementation */
+		default_table_modify_end(state);
+	}
+	else
+		Assert(false);
+}
 
 /* ------------------------------------------------------------------------
  * DDL related functionality.
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e10ff28ee5..11744f2ccc 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1130,6 +1130,8 @@ HeadlineJsonState
 HeadlineParsedText
 HeadlineWordEntry
 HeapCheckContext
+HeapInsertState
+HeapMultiInsertState
 HeapPageFreeze
 HeapScanDesc
 HeapTuple
@@ -2843,6 +2845,7 @@ TableFuncScanState
 TableFuncType
 TableInfo
 TableLikeClause
+TableModifyState
 TableSampleClause
 TableScanDesc
 TableScanDescData
-- 
2.34.1



  [application/x-patch] v20-0002-Optimize-CTAS-CMV-RMV-and-TABLE-REWRITES-with-mu.patch (7.0K, 3-v20-0002-Optimize-CTAS-CMV-RMV-and-TABLE-REWRITES-with-mu.patch)
  download | inline diff:
From dc9b8b01d586b64d18628424fff9ed14b20b92d7 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Sat, 27 Apr 2024 15:41:12 +0000
Subject: [PATCH v20 2/5] Optimize CTAS, CMV, RMV and TABLE REWRITES with multi
 inserts

---
 src/backend/commands/createas.c  | 27 +++++++++++----------------
 src/backend/commands/matview.c   | 26 +++++++++++---------------
 src/backend/commands/tablecmds.c | 31 +++++++++++--------------------
 3 files changed, 33 insertions(+), 51 deletions(-)

diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 62050f4dc5..2d6fffbf07 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -53,9 +53,7 @@ typedef struct
 	/* These fields are filled by intorel_startup: */
 	Relation	rel;			/* relation to write to */
 	ObjectAddress reladdr;		/* address of rel, for ExecCreateTableAs */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableModifyState *mstate;	/* table insert state */
 } DR_intorel;
 
 /* utility functions for CTAS definition creation */
@@ -552,17 +550,21 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	 */
 	myState->rel = intoRelationDesc;
 	myState->reladdr = intoRelationAddr;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM;
 
 	/*
 	 * If WITH NO DATA is specified, there is no need to set up the state for
 	 * bulk inserts as there are no tuples to insert.
 	 */
 	if (!into->skipData)
-		myState->bistate = GetBulkInsertState();
+		myState->mstate = table_modify_begin(intoRelationDesc,
+											 TM_FLAG_MULTI_INSERTS |
+											 TM_FLAG_BAS_BULKWRITE,
+											 GetCurrentCommandId(true),
+											 TABLE_INSERT_SKIP_FSM,
+											 NULL,
+											 NULL);
 	else
-		myState->bistate = NULL;
+		myState->mstate = NULL;
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -590,11 +592,7 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
 		 * would not be cheap either. This also doesn't allow accessing per-AM
 		 * data (say a tuple's xmin), but since we don't do that here...
 		 */
-		table_tuple_insert(myState->rel,
-						   slot,
-						   myState->output_cid,
-						   myState->ti_options,
-						   myState->bistate);
+		table_modify_buffer_insert(myState->mstate, slot);
 	}
 
 	/* We know this is a newly created relation, so there are no indexes */
@@ -612,10 +610,7 @@ intorel_shutdown(DestReceiver *self)
 	IntoClause *into = myState->into;
 
 	if (!into->skipData)
-	{
-		FreeBulkInsertState(myState->bistate);
-		table_finish_bulk_insert(myState->rel, myState->ti_options);
-	}
+		table_modify_end(myState->mstate);
 
 	/* close rel, but keep lock until commit */
 	table_close(myState->rel, NoLock);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 6d09b75556..bb97e2fa5f 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -48,9 +48,7 @@ typedef struct
 	Oid			transientoid;	/* OID of new heap into which to store */
 	/* These fields are filled by transientrel_startup: */
 	Relation	transientrel;	/* relation to write to */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableModifyState *mstate;	/* table insert state */
 } DR_transientrel;
 
 static int	matview_maintenance_depth = 0;
@@ -458,9 +456,14 @@ transientrel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	 * Fill private fields of myState for use by later routines
 	 */
 	myState->transientrel = transientrel;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM | TABLE_INSERT_FROZEN;
-	myState->bistate = GetBulkInsertState();
+	myState->mstate = table_modify_begin(transientrel,
+										 TM_FLAG_MULTI_INSERTS |
+										 TM_FLAG_BAS_BULKWRITE,
+										 GetCurrentCommandId(true),
+										 TABLE_INSERT_SKIP_FSM |
+										 TABLE_INSERT_FROZEN,
+										 NULL,
+										 NULL);
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -485,12 +488,7 @@ transientrel_receive(TupleTableSlot *slot, DestReceiver *self)
 	 * cheap either. This also doesn't allow accessing per-AM data (say a
 	 * tuple's xmin), but since we don't do that here...
 	 */
-
-	table_tuple_insert(myState->transientrel,
-					   slot,
-					   myState->output_cid,
-					   myState->ti_options,
-					   myState->bistate);
+	table_modify_buffer_insert(myState->mstate, slot);
 
 	/* We know this is a newly created relation, so there are no indexes */
 
@@ -505,9 +503,7 @@ transientrel_shutdown(DestReceiver *self)
 {
 	DR_transientrel *myState = (DR_transientrel *) self;
 
-	FreeBulkInsertState(myState->bistate);
-
-	table_finish_bulk_insert(myState->transientrel, myState->ti_options);
+	table_modify_end(myState->mstate);
 
 	/* close transientrel, but keep lock until commit */
 	table_close(myState->transientrel, NoLock);
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 3556240c8e..0c984aa656 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6060,10 +6060,8 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 	int			i;
 	ListCell   *l;
 	EState	   *estate;
-	CommandId	mycid;
-	BulkInsertState bistate;
-	int			ti_options;
 	ExprState  *partqualstate = NULL;
+	TableModifyState *mstate = NULL;
 
 	/*
 	 * Open the relation(s).  We have surely already locked the existing
@@ -6082,18 +6080,15 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 	 * Prepare a BulkInsertState and options for table_tuple_insert.  The FSM
 	 * is empty, so don't bother using it.
 	 */
-	if (newrel)
+	if (newrel && mstate == NULL)
 	{
-		mycid = GetCurrentCommandId(true);
-		bistate = GetBulkInsertState();
-		ti_options = TABLE_INSERT_SKIP_FSM;
-	}
-	else
-	{
-		/* keep compiler quiet about using these uninitialized */
-		mycid = 0;
-		bistate = NULL;
-		ti_options = 0;
+		mstate = table_modify_begin(newrel,
+									TM_FLAG_MULTI_INSERTS |
+									TM_FLAG_BAS_BULKWRITE,
+									GetCurrentCommandId(true),
+									TABLE_INSERT_SKIP_FSM,
+									NULL,
+									NULL);
 	}
 
 	/*
@@ -6392,8 +6387,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 
 			/* Write the tuple out to the new relation */
 			if (newrel)
-				table_tuple_insert(newrel, insertslot, mycid,
-								   ti_options, bistate);
+				table_modify_buffer_insert(mstate, insertslot);
 
 			ResetExprContext(econtext);
 
@@ -6414,10 +6408,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 	table_close(oldrel, NoLock);
 	if (newrel)
 	{
-		FreeBulkInsertState(bistate);
-
-		table_finish_bulk_insert(newrel, ti_options);
-
+		table_modify_end(mstate);
 		table_close(newrel, NoLock);
 	}
 }
-- 
2.34.1



  [application/x-patch] v20-0003-Optimize-INSERT-INTO-.-SELECT-with-multi-inserts.patch (9.2K, 4-v20-0003-Optimize-INSERT-INTO-.-SELECT-with-multi-inserts.patch)
  download | inline diff:
From e60cb6e85ac4e319a3b619e24f66f6b44808fe70 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Sat, 27 Apr 2024 15:45:49 +0000
Subject: [PATCH v20 3/5] Optimize INSERT INTO ... SELECT with multi inserts

---
 src/backend/executor/nodeModifyTable.c | 170 ++++++++++++++++++++++---
 src/tools/pgindent/typedefs.list       |   1 +
 2 files changed, 153 insertions(+), 18 deletions(-)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index cee60d3659..cd044c9dee 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -114,6 +114,18 @@ typedef struct UpdateContext
 	LockTupleMode lockmode;
 } UpdateContext;
 
+typedef struct InsertModifyBufferFlushContext
+{
+	ResultRelInfo *resultRelInfo;
+	EState	   *estate;
+	ModifyTableState *mtstate;
+} InsertModifyBufferFlushContext;
+
+static InsertModifyBufferFlushContext *insert_modify_buffer_flush_context = NULL;
+static TableModifyState *table_modify_state = NULL;
+
+static void InsertModifyBufferFlushCallback(void *context,
+											TupleTableSlot *slot);
 
 static void ExecBatchInsert(ModifyTableState *mtstate,
 							ResultRelInfo *resultRelInfo,
@@ -726,6 +738,55 @@ ExecGetUpdateNewTuple(ResultRelInfo *relinfo,
 	return ExecProject(newProj);
 }
 
+static void
+InsertModifyBufferFlushCallback(void *context, TupleTableSlot *slot)
+{
+	InsertModifyBufferFlushContext *ctx = (InsertModifyBufferFlushContext *) context;
+	ResultRelInfo *resultRelInfo = ctx->resultRelInfo;
+	EState	   *estate = ctx->estate;
+	ModifyTableState *mtstate = ctx->mtstate;
+
+	/* Quick exit if no indexes or no triggers */
+	if (!(resultRelInfo->ri_NumIndices > 0 ||
+		  (resultRelInfo->ri_TrigDesc != NULL &&
+		   (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+			resultRelInfo->ri_TrigDesc->trig_insert_new_table))))
+		return;
+
+	/* Caller must take care of opening and closing the indices */
+
+	/*
+	 * If there are any indexes, update them for all the inserted tuples, and
+	 * run AFTER ROW INSERT triggers.
+	 */
+	if (resultRelInfo->ri_NumIndices > 0)
+	{
+		List	   *recheckIndexes;
+
+		recheckIndexes =
+			ExecInsertIndexTuples(resultRelInfo,
+								  slot, estate, false,
+								  false, NULL, NIL, false);
+		ExecARInsertTriggers(estate, resultRelInfo,
+							 slot, recheckIndexes,
+							 mtstate->mt_transition_capture);
+		list_free(recheckIndexes);
+	}
+
+	/*
+	 * There's no indexes, but see if we need to run AFTER ROW INSERT triggers
+	 * anyway.
+	 */
+	else if (resultRelInfo->ri_TrigDesc != NULL &&
+			 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+			  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+	{
+		ExecARInsertTriggers(estate, resultRelInfo,
+							 slot, NIL,
+							 mtstate->mt_transition_capture);
+	}
+}
+
 /* ----------------------------------------------------------------
  *		ExecInsert
  *
@@ -751,7 +812,8 @@ ExecInsert(ModifyTableContext *context,
 		   TupleTableSlot *slot,
 		   bool canSetTag,
 		   TupleTableSlot **inserted_tuple,
-		   ResultRelInfo **insert_destrel)
+		   ResultRelInfo **insert_destrel,
+		   bool canMultiInsert)
 {
 	ModifyTableState *mtstate = context->mtstate;
 	EState	   *estate = context->estate;
@@ -764,6 +826,7 @@ ExecInsert(ModifyTableContext *context,
 	OnConflictAction onconflict = node->onConflictAction;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
 	MemoryContext oldContext;
+	bool		ar_insert_triggers_executed = false;
 
 	/*
 	 * If the input result relation is a partitioned table, find the leaf
@@ -1126,17 +1189,53 @@ ExecInsert(ModifyTableContext *context,
 		}
 		else
 		{
-			/* insert the tuple normally */
-			table_tuple_insert(resultRelationDesc, slot,
-							   estate->es_output_cid,
-							   0, NULL);
+			if (canMultiInsert &&
+				proute == NULL &&
+				resultRelInfo->ri_WithCheckOptions == NIL &&
+				resultRelInfo->ri_projectReturning == NULL)
+			{
+				if (insert_modify_buffer_flush_context == NULL)
+				{
+					insert_modify_buffer_flush_context =
+						(InsertModifyBufferFlushContext *) palloc0(sizeof(InsertModifyBufferFlushContext));
+					insert_modify_buffer_flush_context->resultRelInfo = resultRelInfo;
+					insert_modify_buffer_flush_context->estate = estate;
+					insert_modify_buffer_flush_context->mtstate = mtstate;
+				}
 
-			/* insert index entries for tuple */
-			if (resultRelInfo->ri_NumIndices > 0)
-				recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
-													   slot, estate, false,
-													   false, NULL, NIL,
-													   false);
+				if (table_modify_state == NULL)
+				{
+					table_modify_state = table_modify_begin(resultRelInfo->ri_RelationDesc,
+															TM_FLAG_MULTI_INSERTS,
+															estate->es_output_cid,
+															0,
+															InsertModifyBufferFlushCallback,
+															insert_modify_buffer_flush_context);
+				}
+
+				table_modify_buffer_insert(table_modify_state, slot);
+				ar_insert_triggers_executed = true;
+			}
+			else
+			{
+				/* insert the tuple normally */
+				table_tuple_insert(resultRelationDesc, slot,
+								   estate->es_output_cid,
+								   0, NULL);
+
+				/* insert index entries for tuple */
+				if (resultRelInfo->ri_NumIndices > 0)
+					recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+														   slot, estate, false,
+														   false, NULL, NIL,
+														   false);
+
+				ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
+									 mtstate->mt_transition_capture);
+
+				list_free(recheckIndexes);
+				ar_insert_triggers_executed = true;
+			}
 		}
 	}
 
@@ -1170,10 +1269,12 @@ ExecInsert(ModifyTableContext *context,
 	}
 
 	/* AFTER ROW INSERT Triggers */
-	ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
-						 ar_insert_trig_tcs);
-
-	list_free(recheckIndexes);
+	if (!ar_insert_triggers_executed)
+	{
+		ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
+							 ar_insert_trig_tcs);
+		list_free(recheckIndexes);
+	}
 
 	/*
 	 * Check any WITH CHECK OPTION constraints from parent views.  We are
@@ -1869,7 +1970,7 @@ ExecCrossPartitionUpdate(ModifyTableContext *context,
 	/* Tuple routing starts from the root table. */
 	context->cpUpdateReturningSlot =
 		ExecInsert(context, mtstate->rootResultRelInfo, slot, canSetTag,
-				   inserted_tuple, insert_destrel);
+				   inserted_tuple, insert_destrel, false);
 
 	/*
 	 * Reset the transition state that may possibly have been written by
@@ -3364,7 +3465,7 @@ ExecMergeNotMatched(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 				mtstate->mt_merge_action = action;
 
 				rslot = ExecInsert(context, mtstate->rootResultRelInfo,
-								   newslot, canSetTag, NULL, NULL);
+								   newslot, canSetTag, NULL, NULL, false);
 				mtstate->mt_merge_inserted += 1;
 				break;
 			case CMD_NOTHING:
@@ -3749,6 +3850,10 @@ ExecModifyTable(PlanState *pstate)
 	HeapTupleData oldtupdata;
 	HeapTuple	oldtuple;
 	ItemPointer tupleid;
+	bool		canMultiInsert = false;
+
+	table_modify_state = NULL;
+	insert_modify_buffer_flush_context = NULL;
 
 	CHECK_FOR_INTERRUPTS();
 
@@ -3844,6 +3949,10 @@ ExecModifyTable(PlanState *pstate)
 		if (TupIsNull(context.planSlot))
 			break;
 
+		if (operation == CMD_INSERT &&
+			nodeTag(subplanstate) == T_SeqScanState)
+			canMultiInsert = true;
+
 		/*
 		 * When there are multiple result relations, each tuple contains a
 		 * junk column that gives the OID of the rel from which it came.
@@ -4057,7 +4166,7 @@ ExecModifyTable(PlanState *pstate)
 					ExecInitInsertProjection(node, resultRelInfo);
 				slot = ExecGetInsertNewTuple(resultRelInfo, context.planSlot);
 				slot = ExecInsert(&context, resultRelInfo, slot,
-								  node->canSetTag, NULL, NULL);
+								  node->canSetTag, NULL, NULL, canMultiInsert);
 				break;
 
 			case CMD_UPDATE:
@@ -4116,6 +4225,17 @@ ExecModifyTable(PlanState *pstate)
 			return slot;
 	}
 
+	if (table_modify_state != NULL)
+	{
+		Assert(operation == CMD_INSERT);
+
+		table_modify_end(table_modify_state);
+		table_modify_state = NULL;
+
+		pfree(insert_modify_buffer_flush_context);
+		insert_modify_buffer_flush_context = NULL;
+	}
+
 	/*
 	 * Insert remaining tuples for batch insert.
 	 */
@@ -4228,6 +4348,9 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	mtstate->mt_merge_updated = 0;
 	mtstate->mt_merge_deleted = 0;
 
+	table_modify_state = NULL;
+	insert_modify_buffer_flush_context = NULL;
+
 	/*----------
 	 * Resolve the target relation. This is the same as:
 	 *
@@ -4681,6 +4804,17 @@ ExecEndModifyTable(ModifyTableState *node)
 {
 	int			i;
 
+	if (table_modify_state != NULL)
+	{
+		Assert(node->operation == CMD_INSERT);
+
+		table_modify_end(table_modify_state);
+		table_modify_state = NULL;
+
+		pfree(insert_modify_buffer_flush_context);
+		insert_modify_buffer_flush_context = NULL;
+	}
+
 	/*
 	 * Allow any FDWs to shut down
 	 */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 11744f2ccc..57aabf51d8 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1226,6 +1226,7 @@ InjectionPointEntry
 InjectionPointSharedState
 InlineCodeBlock
 InProgressIO
+InsertModifyBufferFlushContext
 InsertStmt
 Instrumentation
 Int128AggState
-- 
2.34.1



  [application/x-patch] v20-0004-Optimize-Logical-Replication-apply-with-multi-in.patch (19.7K, 5-v20-0004-Optimize-Logical-Replication-apply-with-multi-in.patch)
  download | inline diff:
From 4040ada4886220bb979edab83f7f88d32679dcbd Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Sat, 27 Apr 2024 15:48:25 +0000
Subject: [PATCH v20 4/5] Optimize Logical Replication apply with multi inserts

---
 src/backend/executor/execReplication.c   |  39 +++
 src/backend/replication/logical/proto.c  |  24 ++
 src/backend/replication/logical/worker.c | 351 ++++++++++++++++++++++-
 src/include/executor/executor.h          |   4 +
 src/include/replication/logicalproto.h   |   2 +
 src/tools/pgindent/typedefs.list         |   2 +
 6 files changed, 409 insertions(+), 13 deletions(-)

diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index d0a89cd577..fae1375537 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -544,6 +544,45 @@ ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
 	}
 }
 
+void
+ExecRelationMultiInsert(TableModifyState *MultiInsertState,
+						ResultRelInfo *resultRelInfo,
+						EState *estate, TupleTableSlot *slot)
+{
+	bool		skip_tuple = false;
+	Relation	rel = resultRelInfo->ri_RelationDesc;
+
+	/* For now we support only tables. */
+	Assert(rel->rd_rel->relkind == RELKIND_RELATION);
+
+	CheckCmdReplicaIdentity(rel, CMD_INSERT);
+
+	/* BEFORE ROW INSERT Triggers */
+	if (resultRelInfo->ri_TrigDesc &&
+		resultRelInfo->ri_TrigDesc->trig_insert_before_row)
+	{
+		if (!ExecBRInsertTriggers(estate, resultRelInfo, slot))
+			skip_tuple = true;	/* "do nothing" */
+	}
+
+	if (!skip_tuple)
+	{
+		/* Compute stored generated columns */
+		if (rel->rd_att->constr &&
+			rel->rd_att->constr->has_generated_stored)
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_INSERT);
+
+		/* Check the constraints of the tuple */
+		if (rel->rd_att->constr)
+			ExecConstraints(resultRelInfo, slot, estate);
+		if (rel->rd_rel->relispartition)
+			ExecPartitionCheck(resultRelInfo, slot, estate, true);
+
+		table_modify_buffer_insert(MultiInsertState, slot);
+	}
+}
+
 /*
  * Find the searchslot tuple and update it with data in the slot,
  * update the indexes, and execute any constraints and per-row triggers.
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index 95c09c9516..46d38aebd2 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -427,6 +427,30 @@ logicalrep_write_insert(StringInfo out, TransactionId xid, Relation rel,
 	logicalrep_write_tuple(out, rel, newslot, binary, columns);
 }
 
+LogicalRepRelId
+logicalrep_read_relid(StringInfo in)
+{
+	LogicalRepRelId relid;
+
+	/* read the relation id */
+	relid = pq_getmsgint(in, 4);
+
+	return relid;
+}
+
+void
+logicalrep_read_insert_v2(StringInfo in, LogicalRepTupleData *newtup)
+{
+	char		action;
+
+	action = pq_getmsgbyte(in);
+	if (action != 'N')
+		elog(ERROR, "expected new tuple but got %d",
+			 action);
+
+	logicalrep_read_tuple(in, newtup);
+}
+
 /*
  * Read INSERT from stream.
  *
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index b5a80fe3e8..d62772f590 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -148,7 +148,6 @@
 #include <unistd.h>
 
 #include "access/table.h"
-#include "access/tableam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
 #include "catalog/indexing.h"
@@ -416,6 +415,30 @@ static inline void reset_apply_error_context_info(void);
 static TransApplyAction get_transaction_apply_action(TransactionId xid,
 													 ParallelApplyWorkerInfo **winfo);
 
+typedef enum LRMultiInsertReturnStatus
+{
+	LR_MULTI_INSERT_NONE = 0,
+	LR_MULTI_INSERT_REL_SKIPPED,
+	LR_MULTI_INSERT_DISALLOWED,
+	LR_MULTI_INSERT_DONE,
+} LRMultiInsertReturnStatus;
+
+static TableModifyState *MultiInsertState = NULL;
+static LogicalRepRelMapEntry *LastRel = NULL;
+static LogicalRepRelId LastMultiInsertRelId = InvalidOid;
+static ApplyExecutionData *LastEData = NULL;
+static TupleTableSlot *LastRemoteSlot = NULL;
+
+typedef struct LRModifyBufferFlushContext
+{
+	ResultRelInfo *resultRelInfo;
+	EState	   *estate;
+} LRModifyBufferFlushContext;
+
+static LRModifyBufferFlushContext *modify_buffer_flush_context = NULL;
+static void LRModifyBufferFlushCallback(void *context, TupleTableSlot *slot);
+static void FinishMultiInserts(void);
+
 /*
  * Form the origin name for the subscription.
  *
@@ -1017,6 +1040,8 @@ apply_handle_commit(StringInfo s)
 {
 	LogicalRepCommitData commit_data;
 
+	FinishMultiInserts();
+
 	logicalrep_read_commit(s, &commit_data);
 
 	if (commit_data.commit_lsn != remote_final_lsn)
@@ -1043,6 +1068,8 @@ apply_handle_begin_prepare(StringInfo s)
 {
 	LogicalRepPreparedTxnData begin_data;
 
+	FinishMultiInserts();
+
 	/* Tablesync should never receive prepare. */
 	if (am_tablesync_worker())
 		ereport(ERROR,
@@ -1109,6 +1136,8 @@ apply_handle_prepare(StringInfo s)
 {
 	LogicalRepPreparedTxnData prepare_data;
 
+	FinishMultiInserts();
+
 	logicalrep_read_prepare(s, &prepare_data);
 
 	if (prepare_data.prepare_lsn != remote_final_lsn)
@@ -1171,6 +1200,8 @@ apply_handle_commit_prepared(StringInfo s)
 	LogicalRepCommitPreparedTxnData prepare_data;
 	char		gid[GIDSIZE];
 
+	FinishMultiInserts();
+
 	logicalrep_read_commit_prepared(s, &prepare_data);
 	set_apply_error_context_xact(prepare_data.xid, prepare_data.commit_lsn);
 
@@ -1220,6 +1251,8 @@ apply_handle_rollback_prepared(StringInfo s)
 	LogicalRepRollbackPreparedTxnData rollback_data;
 	char		gid[GIDSIZE];
 
+	FinishMultiInserts();
+
 	logicalrep_read_rollback_prepared(s, &rollback_data);
 	set_apply_error_context_xact(rollback_data.xid, rollback_data.rollback_end_lsn);
 
@@ -1277,6 +1310,8 @@ apply_handle_stream_prepare(StringInfo s)
 	/* Save the message before it is consumed. */
 	StringInfoData original_msg = *s;
 
+	FinishMultiInserts();
+
 	if (in_streamed_transaction)
 		ereport(ERROR,
 				(errcode(ERRCODE_PROTOCOL_VIOLATION),
@@ -1304,6 +1339,8 @@ apply_handle_stream_prepare(StringInfo s)
 			apply_spooled_messages(MyLogicalRepWorker->stream_fileset,
 								   prepare_data.xid, prepare_data.prepare_lsn);
 
+			FinishMultiInserts();
+
 			/* Mark the transaction as prepared. */
 			apply_handle_prepare_internal(&prepare_data);
 
@@ -1407,6 +1444,8 @@ apply_handle_stream_prepare(StringInfo s)
 static void
 apply_handle_origin(StringInfo s)
 {
+	FinishMultiInserts();
+
 	/*
 	 * ORIGIN message can only come inside streaming transaction or inside
 	 * remote transaction and before any actual writes.
@@ -1473,6 +1512,8 @@ apply_handle_stream_start(StringInfo s)
 	/* Save the message before it is consumed. */
 	StringInfoData original_msg = *s;
 
+	FinishMultiInserts();
+
 	if (in_streamed_transaction)
 		ereport(ERROR,
 				(errcode(ERRCODE_PROTOCOL_VIOLATION),
@@ -1628,6 +1669,8 @@ apply_handle_stream_stop(StringInfo s)
 	ParallelApplyWorkerInfo *winfo;
 	TransApplyAction apply_action;
 
+	FinishMultiInserts();
+
 	if (!in_streamed_transaction)
 		ereport(ERROR,
 				(errcode(ERRCODE_PROTOCOL_VIOLATION),
@@ -1821,6 +1864,8 @@ apply_handle_stream_abort(StringInfo s)
 	StringInfoData original_msg = *s;
 	bool		toplevel_xact;
 
+	FinishMultiInserts();
+
 	if (in_streamed_transaction)
 		ereport(ERROR,
 				(errcode(ERRCODE_PROTOCOL_VIOLATION),
@@ -2138,6 +2183,8 @@ apply_handle_stream_commit(StringInfo s)
 	/* Save the message before it is consumed. */
 	StringInfoData original_msg = *s;
 
+	FinishMultiInserts();
+
 	if (in_streamed_transaction)
 		ereport(ERROR,
 				(errcode(ERRCODE_PROTOCOL_VIOLATION),
@@ -2159,6 +2206,8 @@ apply_handle_stream_commit(StringInfo s)
 			apply_spooled_messages(MyLogicalRepWorker->stream_fileset, xid,
 								   commit_data.commit_lsn);
 
+			FinishMultiInserts();
+
 			apply_handle_commit_internal(&commit_data);
 
 			/* Unlink the files with serialized changes and subxact info. */
@@ -2302,6 +2351,8 @@ apply_handle_relation(StringInfo s)
 {
 	LogicalRepRelation *rel;
 
+	FinishMultiInserts();
+
 	if (handle_streamed_transaction(LOGICAL_REP_MSG_RELATION, s))
 		return;
 
@@ -2325,6 +2376,8 @@ apply_handle_type(StringInfo s)
 {
 	LogicalRepTyp typ;
 
+	FinishMultiInserts();
+
 	if (handle_streamed_transaction(LOGICAL_REP_MSG_TYPE, s))
 		return;
 
@@ -2363,16 +2416,126 @@ TargetPrivilegesCheck(Relation rel, AclMode mode)
 						RelationGetRelationName(rel))));
 }
 
-/*
- * Handle INSERT message.
- */
+static void
+FinishMultiInserts(void)
+{
+	LogicalRepMsgType saved_command;
+
+	if (MultiInsertState == NULL)
+		return;
+
+	Assert(OidIsValid(LastMultiInsertRelId));
+	Assert(LastEData != NULL);
+
+	/* Set relation for error callback */
+	apply_error_callback_arg.rel = LastRel;
+
+	/* Set current command for error callback */
+	saved_command = apply_error_callback_arg.command;
+	apply_error_callback_arg.command = LOGICAL_REP_MSG_INSERT;
+
+	ExecDropSingleTupleTableSlot(LastRemoteSlot);
+	LastRemoteSlot = NULL;
+
+	table_modify_end(MultiInsertState);
+	MultiInsertState = NULL;
+	LastMultiInsertRelId = InvalidOid;
+
+	pfree(modify_buffer_flush_context);
+	modify_buffer_flush_context = NULL;
+
+	ExecCloseIndices(LastEData->targetRelInfo);
+
+	finish_edata(LastEData);
+	LastEData = NULL;
+
+	/* Reset relation for error callback */
+	apply_error_callback_arg.rel = NULL;
+
+	/* Reset the current command */
+	apply_error_callback_arg.command = saved_command;
+
+	logicalrep_rel_close(LastRel, NoLock);
+	LastRel = NULL;
+
+	end_replication_step();
+}
 
 static void
-apply_handle_insert(StringInfo s)
+LRModifyBufferFlushCallback(void *context, TupleTableSlot *slot)
+{
+	LRModifyBufferFlushContext *ctx = (LRModifyBufferFlushContext *) context;
+	ResultRelInfo *resultRelInfo = ctx->resultRelInfo;
+	EState	   *estate = ctx->estate;
+	LogicalRepMsgType saved_command;
+
+	/* Quick exit if no indexes or no triggers */
+	if (!(resultRelInfo->ri_NumIndices > 0 ||
+		  (resultRelInfo->ri_TrigDesc != NULL &&
+		   (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+			resultRelInfo->ri_TrigDesc->trig_insert_new_table))))
+		return;
+
+	/* Set relation for error callback */
+	apply_error_callback_arg.rel = LastRel;
+
+	/* Set current command for error callback */
+	saved_command = apply_error_callback_arg.command;
+	apply_error_callback_arg.command = LOGICAL_REP_MSG_INSERT;
+
+	/* Caller must take care of opening and closing the indices */
+
+	/*
+	 * If there are any indexes, update them for all the inserted tuples, and
+	 * run AFTER ROW INSERT triggers.
+	 */
+	if (resultRelInfo->ri_NumIndices > 0)
+	{
+		List	   *recheckIndexes;
+
+		recheckIndexes =
+			ExecInsertIndexTuples(resultRelInfo,
+								  slot, estate, false,
+								  false, NULL, NIL, false);
+
+		ExecARInsertTriggers(estate, resultRelInfo,
+							 slot, recheckIndexes,
+							 NULL);
+
+		list_free(recheckIndexes);
+	}
+
+	/*
+	 * There's no indexes, but see if we need to run AFTER ROW INSERT triggers
+	 * anyway.
+	 */
+	else if (resultRelInfo->ri_TrigDesc != NULL &&
+			 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+			  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+	{
+		ExecARInsertTriggers(estate, resultRelInfo,
+							 slot, NIL,
+							 NULL);
+	}
+
+	/*
+	 * XXX we should in theory pass a TransitionCaptureState object to the
+	 * above to capture transition tuples, but after statement triggers don't
+	 * actually get fired by replication yet anyway
+	 */
+
+	/* Reset relation for error callback */
+	apply_error_callback_arg.rel = NULL;
+
+	/* Reset the current command */
+	apply_error_callback_arg.command = saved_command;
+}
+
+static LRMultiInsertReturnStatus
+do_multi_inserts(StringInfo s, LogicalRepRelId *relid)
 {
 	LogicalRepRelMapEntry *rel;
 	LogicalRepTupleData newtup;
-	LogicalRepRelId relid;
 	UserContext ucxt;
 	ApplyExecutionData *edata;
 	EState	   *estate;
@@ -2380,17 +2543,143 @@ apply_handle_insert(StringInfo s)
 	MemoryContext oldctx;
 	bool		run_as_owner;
 
+	if (MultiInsertState == NULL)
+		begin_replication_step();
+
+	*relid = logicalrep_read_relid(s);
+
+	if (MultiInsertState != NULL &&
+		(LastMultiInsertRelId != InvalidOid &&
+		 *relid != InvalidOid &&
+		 LastMultiInsertRelId != *relid))
+		FinishMultiInserts();
+
+	if (MultiInsertState == NULL)
+		rel = logicalrep_rel_open(*relid, RowExclusiveLock);
+	else
+		rel = LastRel;
+
+	if (!should_apply_changes_for_rel(rel))
+	{
+		Assert(MultiInsertState == NULL);
+
+		/*
+		 * The relation can't become interesting in the middle of the
+		 * transaction so it's safe to unlock it.
+		 */
+		logicalrep_rel_close(rel, RowExclusiveLock);
+		end_replication_step();
+		return LR_MULTI_INSERT_REL_SKIPPED;
+	}
+
+	/* For a partitioned table, let's not do multi inserts. */
+	if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+	{
+		Assert(MultiInsertState == NULL);
+
+		/*
+		 * The relation can't become interesting in the middle of the
+		 * transaction so it's safe to unlock it.
+		 */
+		logicalrep_rel_close(rel, RowExclusiveLock);
+		end_replication_step();
+		return LR_MULTI_INSERT_DISALLOWED;
+	}
+
 	/*
-	 * Quick return if we are skipping data modification changes or handling
-	 * streamed transactions.
+	 * Make sure that any user-supplied code runs as the table owner, unless
+	 * the user has opted out of that behavior.
 	 */
-	if (is_skipping_changes() ||
-		handle_streamed_transaction(LOGICAL_REP_MSG_INSERT, s))
-		return;
+	run_as_owner = MySubscription->runasowner;
+	if (!run_as_owner)
+		SwitchToUntrustedUser(rel->localrel->rd_rel->relowner, &ucxt);
+
+	/* Set relation for error callback */
+	apply_error_callback_arg.rel = rel;
+
+	if (MultiInsertState == NULL)
+	{
+		oldctx = MemoryContextSwitchTo(TopTransactionContext);
+
+		/* Initialize the executor state. */
+		LastEData = edata = create_edata_for_relation(rel);
+		estate = edata->estate;
+
+		LastRemoteSlot = remoteslot = MakeTupleTableSlot(RelationGetDescr(rel->localrel),
+														 &TTSOpsVirtual);
+
+		modify_buffer_flush_context = (LRModifyBufferFlushContext *) palloc(sizeof(LRModifyBufferFlushContext));
+		modify_buffer_flush_context->resultRelInfo = edata->targetRelInfo;
+		modify_buffer_flush_context->estate = estate;
+
+		MultiInsertState = table_modify_begin(edata->targetRelInfo->ri_RelationDesc,
+											  TM_FLAG_MULTI_INSERTS |
+											  TM_FLAG_BAS_BULKWRITE,
+											  GetCurrentCommandId(true),
+											  0,
+											  LRModifyBufferFlushCallback,
+											  modify_buffer_flush_context);
+		LastRel = rel;
+		LastMultiInsertRelId = *relid;
+
+		/* We must open indexes here. */
+		ExecOpenIndices(edata->targetRelInfo, false);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+	else
+	{
+		CommandId	cid;
+
+		edata = LastEData;
+		estate = edata->estate;
+		ResetExprContext(GetPerTupleExprContext(estate));
+		ExecClearTuple(LastRemoteSlot);
+		remoteslot = LastRemoteSlot;
+		cid = GetCurrentCommandId(true);
+		MultiInsertState->cid = cid;
+		estate->es_output_cid = cid;
+	}
+
+	/* Process and store remote tuple in the slot */
+	logicalrep_read_insert_v2(s, &newtup);
+	oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+	slot_store_data(remoteslot, rel, &newtup);
+	slot_fill_defaults(rel, estate, remoteslot);
+	MemoryContextSwitchTo(oldctx);
+
+	TargetPrivilegesCheck(edata->targetRelInfo->ri_RelationDesc, ACL_INSERT);
+	ExecRelationMultiInsert(MultiInsertState, edata->targetRelInfo, estate, remoteslot);
+
+	/* Reset relation for error callback */
+	apply_error_callback_arg.rel = NULL;
+
+	if (!run_as_owner)
+		RestoreUserContext(&ucxt);
+
+	Assert(MultiInsertState != NULL);
+
+	CommandCounterIncrement();
+
+	return LR_MULTI_INSERT_DONE;
+}
+
+static bool
+do_single_inserts(StringInfo s, LogicalRepRelId relid)
+{
+	LogicalRepRelMapEntry *rel;
+	LogicalRepTupleData newtup;
+	UserContext ucxt;
+	ApplyExecutionData *edata;
+	EState	   *estate;
+	TupleTableSlot *remoteslot;
+	MemoryContext oldctx;
+	bool		run_as_owner;
+
+	Assert(relid != InvalidOid);
 
 	begin_replication_step();
 
-	relid = logicalrep_read_insert(s, &newtup);
 	rel = logicalrep_rel_open(relid, RowExclusiveLock);
 	if (!should_apply_changes_for_rel(rel))
 	{
@@ -2400,7 +2689,7 @@ apply_handle_insert(StringInfo s)
 		 */
 		logicalrep_rel_close(rel, RowExclusiveLock);
 		end_replication_step();
-		return;
+		return false;
 	}
 
 	/*
@@ -2422,6 +2711,7 @@ apply_handle_insert(StringInfo s)
 										&TTSOpsVirtual);
 
 	/* Process and store remote tuple in the slot */
+	logicalrep_read_insert_v2(s, &newtup);
 	oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
 	slot_store_data(remoteslot, rel, &newtup);
 	slot_fill_defaults(rel, estate, remoteslot);
@@ -2446,6 +2736,35 @@ apply_handle_insert(StringInfo s)
 	logicalrep_rel_close(rel, NoLock);
 
 	end_replication_step();
+
+	return true;
+}
+
+/*
+ * Handle INSERT message.
+ */
+static void
+apply_handle_insert(StringInfo s)
+{
+	LRMultiInsertReturnStatus mi_status;
+	LogicalRepRelId relid;
+
+	/*
+	 * Quick return if we are skipping data modification changes or handling
+	 * streamed transactions.
+	 */
+	if (is_skipping_changes() ||
+		handle_streamed_transaction(LOGICAL_REP_MSG_INSERT, s))
+		return;
+
+	mi_status = do_multi_inserts(s, &relid);
+	if (mi_status == LR_MULTI_INSERT_REL_SKIPPED ||
+		mi_status == LR_MULTI_INSERT_DONE)
+		return;
+
+	do_single_inserts(s, relid);
+
+	return;
 }
 
 /*
@@ -2532,6 +2851,8 @@ apply_handle_update(StringInfo s)
 	MemoryContext oldctx;
 	bool		run_as_owner;
 
+	FinishMultiInserts();
+
 	/*
 	 * Quick return if we are skipping data modification changes or handling
 	 * streamed transactions.
@@ -2713,6 +3034,8 @@ apply_handle_delete(StringInfo s)
 	MemoryContext oldctx;
 	bool		run_as_owner;
 
+	FinishMultiInserts();
+
 	/*
 	 * Quick return if we are skipping data modification changes or handling
 	 * streamed transactions.
@@ -3154,6 +3477,8 @@ apply_handle_truncate(StringInfo s)
 	ListCell   *lc;
 	LOCKMODE	lockmode = AccessExclusiveLock;
 
+	FinishMultiInserts();
+
 	/*
 	 * Quick return if we are skipping data modification changes or handling
 	 * streamed transactions.
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 9770752ea3..8f10ea977b 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -14,6 +14,7 @@
 #ifndef EXECUTOR_H
 #define EXECUTOR_H
 
+#include "access/tableam.h"
 #include "executor/execdesc.h"
 #include "fmgr.h"
 #include "nodes/lockoptions.h"
@@ -656,6 +657,9 @@ extern bool RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 extern void ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
 									 EState *estate, TupleTableSlot *slot);
+extern void ExecRelationMultiInsert(TableModifyState *MultiInsertState,
+									ResultRelInfo *resultRelInfo,
+									EState *estate, TupleTableSlot *slot);
 extern void ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
 									 EState *estate, EPQState *epqstate,
 									 TupleTableSlot *searchslot, TupleTableSlot *slot);
diff --git a/src/include/replication/logicalproto.h b/src/include/replication/logicalproto.h
index c409638a2e..3f3a7f0a31 100644
--- a/src/include/replication/logicalproto.h
+++ b/src/include/replication/logicalproto.h
@@ -226,6 +226,8 @@ extern void logicalrep_write_insert(StringInfo out, TransactionId xid,
 									Relation rel,
 									TupleTableSlot *newslot,
 									bool binary, Bitmapset *columns);
+extern LogicalRepRelId logicalrep_read_relid(StringInfo in);
+extern void logicalrep_read_insert_v2(StringInfo in, LogicalRepTupleData *newtup);
 extern LogicalRepRelId logicalrep_read_insert(StringInfo in, LogicalRepTupleData *newtup);
 extern void logicalrep_write_update(StringInfo out, TransactionId xid,
 									Relation rel,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 57aabf51d8..9582503bb4 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1454,6 +1454,8 @@ LPTHREAD_START_ROUTINE
 LPTSTR
 LPVOID
 LPWSTR
+LRModifyBufferFlushContext
+LRMultiInsertReturnStatus
 LSEG
 LUID
 LVRelState
-- 
2.34.1



  [application/x-patch] v20-0005-Use-new-multi-insert-Table-AM-for-COPY-FROM.patch (14.4K, 6-v20-0005-Use-new-multi-insert-Table-AM-for-COPY-FROM.patch)
  download | inline diff:
From 684d846c53dd47e8b1654d8a58e8cca0194bdf12 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Mon, 29 Apr 2024 05:35:58 +0000
Subject: [PATCH v20 5/5] Use new multi insert Table AM for COPY FROM

---
 contrib/test_decoding/expected/stream.out |   2 +-
 src/backend/commands/copyfrom.c           | 235 ++++++++++++++--------
 src/include/commands/copyfrom_internal.h  |   4 +-
 src/tools/pgindent/typedefs.list          |   1 +
 4 files changed, 160 insertions(+), 82 deletions(-)

diff --git a/contrib/test_decoding/expected/stream.out b/contrib/test_decoding/expected/stream.out
index 4ab2d47bf8..c19facb3c9 100644
--- a/contrib/test_decoding/expected/stream.out
+++ b/contrib/test_decoding/expected/stream.out
@@ -101,10 +101,10 @@ SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,NULL, 'incl
  streaming change for transaction
  streaming change for transaction
  streaming change for transaction
- streaming change for transaction
  closing a streamed block for transaction
  opening a streamed block for transaction
  streaming change for transaction
+ streaming change for transaction
  closing a streamed block for transaction
  committing streamed transaction
 (17 rows)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index ce4d62e707..403adfe481 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -71,14 +71,25 @@
 /* Trim the list of buffers back down to this number after flushing */
 #define MAX_PARTITION_BUFFERS	32
 
+typedef struct CopyModifyBufferFlushContext
+{
+	CopyFromState cstate;
+	ResultRelInfo *resultRelInfo;
+	EState	   *estate;
+} CopyModifyBufferFlushContext;
+
 /* Stores multi-insert data related to a single relation in CopyFrom. */
 typedef struct CopyMultiInsertBuffer
 {
-	TupleTableSlot *slots[MAX_BUFFERED_TUPLES]; /* Array to store tuples */
+	TableModifyState *mstate;	/* Table insert state; NULL if foreign table */
+	TupleTableSlot **slots;		/* Array to store tuples */
 	ResultRelInfo *resultRelInfo;	/* ResultRelInfo for 'relid' */
-	BulkInsertState bistate;	/* BulkInsertState for this rel if plain
-								 * table; NULL if foreign table */
+	TupleTableSlot *multislot;
+	CopyModifyBufferFlushContext *modify_buffer_flush_context;
 	int			nused;			/* number of 'slots' containing tuples */
+	int			currslotno;		/* Current buffered slot number that's being
+								 * flushed; Used to get correct cur_lineno for
+								 * errors while in flush callback. */
 	uint64		linenos[MAX_BUFFERED_TUPLES];	/* Line # of tuple in copy
 												 * stream */
 } CopyMultiInsertBuffer;
@@ -99,6 +110,7 @@ typedef struct CopyMultiInsertInfo
 	int			ti_options;		/* table insert options */
 } CopyMultiInsertInfo;
 
+static void CopyModifyBufferFlushCallback(void *context, TupleTableSlot *slot);
 
 /* non-export function prototypes */
 static void ClosePipeFromProgram(CopyFromState cstate);
@@ -218,14 +230,38 @@ CopyLimitPrintoutLength(const char *str)
  * ResultRelInfo.
  */
 static CopyMultiInsertBuffer *
-CopyMultiInsertBufferInit(ResultRelInfo *rri)
+CopyMultiInsertBufferInit(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
+						  CopyFromState cstate, EState *estate)
 {
 	CopyMultiInsertBuffer *buffer;
 
 	buffer = (CopyMultiInsertBuffer *) palloc(sizeof(CopyMultiInsertBuffer));
-	memset(buffer->slots, 0, sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
+
+	if (rri->ri_FdwRoutine == NULL)
+	{
+		buffer->modify_buffer_flush_context = (CopyModifyBufferFlushContext *) palloc(sizeof(CopyModifyBufferFlushContext));
+		buffer->modify_buffer_flush_context->cstate = cstate;
+		buffer->modify_buffer_flush_context->resultRelInfo = rri;
+		buffer->modify_buffer_flush_context->estate = estate;
+
+		buffer->mstate = table_modify_begin(rri->ri_RelationDesc,
+											TM_FLAG_MULTI_INSERTS |
+											TM_FLAG_BAS_BULKWRITE,
+											miinfo->mycid,
+											miinfo->ti_options,
+											CopyModifyBufferFlushCallback,
+											buffer->modify_buffer_flush_context);
+		buffer->slots = NULL;
+		buffer->multislot = NULL;
+	}
+	else
+	{
+		buffer->mstate = NULL;
+		buffer->slots = palloc0(sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
+		buffer->multislot = NULL;
+	}
+
 	buffer->resultRelInfo = rri;
-	buffer->bistate = (rri->ri_FdwRoutine == NULL) ? GetBulkInsertState() : NULL;
 	buffer->nused = 0;
 
 	return buffer;
@@ -236,11 +272,12 @@ CopyMultiInsertBufferInit(ResultRelInfo *rri)
  */
 static inline void
 CopyMultiInsertInfoSetupBuffer(CopyMultiInsertInfo *miinfo,
-							   ResultRelInfo *rri)
+							   ResultRelInfo *rri, CopyFromState cstate,
+							   EState *estate)
 {
 	CopyMultiInsertBuffer *buffer;
 
-	buffer = CopyMultiInsertBufferInit(rri);
+	buffer = CopyMultiInsertBufferInit(miinfo, rri, cstate, estate);
 
 	/* Setup back-link so we can easily find this buffer again */
 	rri->ri_CopyMultiInsertBuffer = buffer;
@@ -273,7 +310,7 @@ CopyMultiInsertInfoInit(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
 	 * tuples their way for the first time.
 	 */
 	if (rri->ri_RelationDesc->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
-		CopyMultiInsertInfoSetupBuffer(miinfo, rri);
+		CopyMultiInsertInfoSetupBuffer(miinfo, rri, cstate, estate);
 }
 
 /*
@@ -317,8 +354,6 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 		int			batch_size = resultRelInfo->ri_BatchSize;
 		int			sent = 0;
 
-		Assert(buffer->bistate == NULL);
-
 		/* Ensure that the FDW supports batching and it's enabled */
 		Assert(resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert);
 		Assert(batch_size > 1);
@@ -390,13 +425,8 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	}
 	else
 	{
-		CommandId	mycid = miinfo->mycid;
-		int			ti_options = miinfo->ti_options;
 		bool		line_buf_valid = cstate->line_buf_valid;
 		uint64		save_cur_lineno = cstate->cur_lineno;
-		MemoryContext oldcontext;
-
-		Assert(buffer->bistate != NULL);
 
 		/*
 		 * Print error context information correctly, if one of the operations
@@ -404,56 +434,18 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 		 */
 		cstate->line_buf_valid = false;
 
-		/*
-		 * table_multi_insert may leak memory, so switch to short-lived memory
-		 * context before calling it.
-		 */
-		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-		table_multi_insert(resultRelInfo->ri_RelationDesc,
-						   slots,
-						   nused,
-						   mycid,
-						   ti_options,
-						   buffer->bistate);
-		MemoryContextSwitchTo(oldcontext);
+		Assert(buffer->currslotno <= buffer->nused);
+		buffer->currslotno = 0;
 
-		for (i = 0; i < nused; i++)
-		{
-			/*
-			 * If there are any indexes, update them for all the inserted
-			 * tuples, and run AFTER ROW INSERT triggers.
-			 */
-			if (resultRelInfo->ri_NumIndices > 0)
-			{
-				List	   *recheckIndexes;
-
-				cstate->cur_lineno = buffer->linenos[i];
-				recheckIndexes =
-					ExecInsertIndexTuples(resultRelInfo,
-										  buffer->slots[i], estate, false,
-										  false, NULL, NIL, false);
-				ExecARInsertTriggers(estate, resultRelInfo,
-									 slots[i], recheckIndexes,
-									 cstate->transition_capture);
-				list_free(recheckIndexes);
-			}
+		table_modify_buffer_flush(buffer->mstate);
 
-			/*
-			 * There's no indexes, but see if we need to run AFTER ROW INSERT
-			 * triggers anyway.
-			 */
-			else if (resultRelInfo->ri_TrigDesc != NULL &&
-					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
-			{
-				cstate->cur_lineno = buffer->linenos[i];
-				ExecARInsertTriggers(estate, resultRelInfo,
-									 slots[i], NIL,
-									 cstate->transition_capture);
-			}
+		Assert(buffer->currslotno <= buffer->nused);
+		buffer->currslotno = 0;
 
-			ExecClearTuple(slots[i]);
-		}
+		/*
+		 * Indexes are updated and AFTER ROW INSERT triggers (if any) are run
+		 * in the flush callback CopyModifyBufferFlushCallback.
+		 */
 
 		/* Update the row counter and progress of the COPY command */
 		*processed += nused;
@@ -469,6 +461,60 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	buffer->nused = 0;
 }
 
+static void
+CopyModifyBufferFlushCallback(void *context, TupleTableSlot *slot)
+{
+	CopyModifyBufferFlushContext *ctx = (CopyModifyBufferFlushContext *) context;
+	CopyFromState cstate = ctx->cstate;
+	ResultRelInfo *resultRelInfo = ctx->resultRelInfo;
+	EState	   *estate = ctx->estate;
+	CopyMultiInsertBuffer *buffer = resultRelInfo->ri_CopyMultiInsertBuffer;
+
+	/* Quick exit if no indexes or no triggers */
+	if (!(resultRelInfo->ri_NumIndices > 0 ||
+		  (resultRelInfo->ri_TrigDesc != NULL &&
+		   (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+			resultRelInfo->ri_TrigDesc->trig_insert_new_table))))
+		return;
+
+	/* Caller must take care of opening and closing the indices */
+
+	/*
+	 * If there are any indexes, update them for all the inserted tuples, and
+	 * run AFTER ROW INSERT triggers.
+	 */
+	if (resultRelInfo->ri_NumIndices > 0)
+	{
+		List	   *recheckIndexes;
+
+		cstate->cur_lineno = buffer->linenos[buffer->currslotno++];
+		recheckIndexes =
+			ExecInsertIndexTuples(resultRelInfo,
+								  slot, estate, false,
+								  false, NULL, NIL, false);
+		ExecARInsertTriggers(estate, resultRelInfo,
+							 slot, recheckIndexes,
+							 cstate->transition_capture);
+		list_free(recheckIndexes);
+	}
+
+	/*
+	 * There's no indexes, but see if we need to run AFTER ROW INSERT triggers
+	 * anyway.
+	 */
+	else if (resultRelInfo->ri_TrigDesc != NULL &&
+			 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+			  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+	{
+		cstate->cur_lineno = buffer->linenos[buffer->currslotno++];
+		ExecARInsertTriggers(estate, resultRelInfo,
+							 slot, NIL,
+							 cstate->transition_capture);
+	}
+
+	Assert(buffer->currslotno <= buffer->nused);
+}
+
 /*
  * Drop used slots and free member for this buffer.
  *
@@ -489,19 +535,18 @@ CopyMultiInsertBufferCleanup(CopyMultiInsertInfo *miinfo,
 
 	if (resultRelInfo->ri_FdwRoutine == NULL)
 	{
-		Assert(buffer->bistate != NULL);
-		FreeBulkInsertState(buffer->bistate);
+		table_modify_end(buffer->mstate);
+		ExecDropSingleTupleTableSlot(buffer->multislot);
+		pfree(buffer->modify_buffer_flush_context);
 	}
 	else
-		Assert(buffer->bistate == NULL);
-
-	/* Since we only create slots on demand, just drop the non-null ones. */
-	for (i = 0; i < MAX_BUFFERED_TUPLES && buffer->slots[i] != NULL; i++)
-		ExecDropSingleTupleTableSlot(buffer->slots[i]);
+	{
+		/* Since we only create slots on demand, just drop the non-null ones. */
+		for (i = 0; i < MAX_BUFFERED_TUPLES && buffer->slots[i] != NULL; i++)
+			ExecDropSingleTupleTableSlot(buffer->slots[i]);
 
-	if (resultRelInfo->ri_FdwRoutine == NULL)
-		table_finish_bulk_insert(resultRelInfo->ri_RelationDesc,
-								 miinfo->ti_options);
+		pfree(buffer->slots);
+	}
 
 	pfree(buffer);
 }
@@ -588,13 +633,32 @@ CopyMultiInsertInfoNextFreeSlot(CopyMultiInsertInfo *miinfo,
 {
 	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
 	int			nused = buffer->nused;
+	TupleTableSlot *slot;
 
 	Assert(buffer != NULL);
 	Assert(nused < MAX_BUFFERED_TUPLES);
 
-	if (buffer->slots[nused] == NULL)
-		buffer->slots[nused] = table_slot_create(rri->ri_RelationDesc, NULL);
-	return buffer->slots[nused];
+	if (rri->ri_FdwRoutine == NULL)
+	{
+		if (buffer->multislot == NULL)
+			buffer->multislot = MakeTupleTableSlot(RelationGetDescr(rri->ri_RelationDesc),
+												   &TTSOpsVirtual);
+
+		/* Caller must clear the slot */
+		slot = buffer->multislot;
+	}
+	else
+	{
+		if (buffer->slots[nused] == NULL)
+		{
+			slot = table_slot_create(rri->ri_RelationDesc, NULL);
+			buffer->slots[nused] = slot;
+		}
+		else
+			slot = buffer->slots[nused];
+	}
+
+	return slot;
 }
 
 /*
@@ -608,7 +672,11 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
 	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
 
 	Assert(buffer != NULL);
-	Assert(slot == buffer->slots[buffer->nused]);
+
+#ifdef USE_ASSERT_CHECKING
+	if (rri->ri_FdwRoutine != NULL)
+		Assert(slot == buffer->slots[buffer->nused]);
+#endif
 
 	/* Store the line number so we can properly report any errors later */
 	buffer->linenos[buffer->nused] = lineno;
@@ -616,6 +684,14 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
 	/* Record this slot as being used */
 	buffer->nused++;
 
+	if (rri->ri_FdwRoutine == NULL)
+	{
+		Assert(slot == buffer->multislot);
+		buffer->currslotno = 0;
+
+		table_modify_buffer_insert(buffer->mstate, slot);
+	}
+
 	/* Update how many tuples are stored and their size */
 	miinfo->bufferedTuples++;
 	miinfo->bufferedBytes += tuplen;
@@ -830,7 +906,7 @@ CopyFrom(CopyFromState cstate)
 	/*
 	 * It's generally more efficient to prepare a bunch of tuples for
 	 * insertion, and insert them in one
-	 * table_multi_insert()/ExecForeignBatchInsert() call, than call
+	 * table_modify_buffer_insert()/ExecForeignBatchInsert() call, than call
 	 * table_tuple_insert()/ExecForeignInsert() separately for every tuple.
 	 * However, there are a number of reasons why we might not be able to do
 	 * this.  These are explained below.
@@ -1080,7 +1156,8 @@ CopyFrom(CopyFromState cstate)
 				{
 					if (resultRelInfo->ri_CopyMultiInsertBuffer == NULL)
 						CopyMultiInsertInfoSetupBuffer(&multiInsertInfo,
-													   resultRelInfo);
+													   resultRelInfo, cstate,
+													   estate);
 				}
 				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
 						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index cad52fcc78..14addbc6f6 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -46,9 +46,9 @@ typedef enum EolType
 typedef enum CopyInsertMethod
 {
 	CIM_SINGLE,					/* use table_tuple_insert or ExecForeignInsert */
-	CIM_MULTI,					/* always use table_multi_insert or
+	CIM_MULTI,					/* always use table_modify_buffer_insert or
 								 * ExecForeignBatchInsert */
-	CIM_MULTI_CONDITIONAL,		/* use table_multi_insert or
+	CIM_MULTI_CONDITIONAL,		/* use table_modify_buffer_insert or
 								 * ExecForeignBatchInsert only if valid */
 } CopyInsertMethod;
 
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9582503bb4..0f0ad30188 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -486,6 +486,7 @@ CopyHeaderChoice
 CopyInsertMethod
 CopyMethod
 CopyLogVerbosityChoice
+CopyModifyBufferFlushContext
 CopyMultiInsertBuffer
 CopyMultiInsertInfo
 CopyOnErrorChoice
-- 
2.34.1



^ permalink  raw  reply  [nested|flat] 30+ messages in thread

* Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
  2024-03-21 04:14 Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-21 07:40 ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-23 00:17   ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-25 19:58     ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-26 15:37       ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-26 19:49         ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-27 08:12           ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-31 15:48             ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-04-02 19:40               ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-04-24 12:49                 ` Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-04-25 16:41                   ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-04-29 06:06                     ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
@ 2024-05-15 07:26                       ` Bharath Rupireddy <[email protected]>
  2024-05-15 23:31                         ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  0 siblings, 1 reply; 30+ messages in thread

From: Bharath Rupireddy @ 2024-05-15 07:26 UTC (permalink / raw)
  To: Jeff Davis <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; Michael Paquier <[email protected]>; Matthias van de Meent <[email protected]>; Alexander Korotkov <[email protected]>

On Mon, Apr 29, 2024 at 11:36 AM Bharath Rupireddy
<[email protected]> wrote:
>
> Please see the attached v20 patch set.

It looks like with the use of the new multi insert table access method
(TAM) for COPY (v20-0005), pgbench regressed about 35% [1]. The reason
is that the memory-based flushing decision the new TAM takes [2]
differs from that of what the COPY does today with table_multi_insert.
The COPY with table_multi_insert, maintains exact size of the tuples
in CopyFromState after it does the line parsing. For instance, the
tuple size of a table with two integer columns is 8 (4+4) bytes here.
The new TAM relies on the memory occupied by the slot's memory context
which holds the actual tuple as a good approximation for the tuple
size. But, this memory context size also includes a tuple header, so
the size here is not just 8 (4+4) bytes but more. Because of this, the
buffers get flushed sooner than that of the existing COPY with
table_multi_insert AM causing regression in pgbench which uses COPY
extensively. The new TAM aren't designed to be able to receive tuple
sizes from the callers, even if we do that, the API doesn't look
generic.

Here are couple of ideas to get away with this:

1. Try to get the actual tuple sizes excluding header sizes for each
column in the new TAM.
2. Try not to use the new TAM for COPY in which case the
table_multi_insert stays forever.
3. Try passing a flag to tell the new TAM that the caller does the
flushing and no need for an internal flushing.

I haven't explored the idea (1) in depth yet. If we find a way to do
so, it looks to me that we are going backwards since we need to strip
off headers for each column of a row for all of the rows. I suspect
this would cost a bit more and may not solve the regression.

With an eventual goal to get rid of table_multi_insert, (3) may not be
a better choice.

(3) seems reasonable to implement and reduce the regression. I did so
in the attached v21 patches. A new flag TM_SKIP_INTERNAL_BUFFER_FLUSH
is introduced in v21 patch, when specified, no internal flushing is
done, the caller has to flush the buffered tuples using
table_modify_buffer_flush(). Check the test results [3] HEAD 2.948 s,
PATCHED 2.946 s.

v21 also adds code to maintain tuple size for virtual tuple slots.
This helps make better memory-based flushing decisions in the new TAM.

Thoughts?

[1]
HEAD:
done in 2.84 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 1.99 s, vacuum 0.21 s, primary keys 0.62 s).
done in 2.78 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 1.88 s, vacuum 0.21 s, primary keys 0.69 s).
done in 2.97 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 2.07 s, vacuum 0.21 s, primary keys 0.69 s).
done in 2.86 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 1.96 s, vacuum 0.21 s, primary keys 0.69 s).
done in 2.90 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 2.05 s, vacuum 0.21 s, primary keys 0.64 s).
done in 2.83 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 1.96 s, vacuum 0.21 s, primary keys 0.66 s).
done in 2.80 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 1.95 s, vacuum 0.20 s, primary keys 0.63 s).
done in 2.79 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 1.89 s, vacuum 0.21 s, primary keys 0.69 s).
done in 3.75 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 2.17 s, vacuum 0.32 s, primary keys 1.25 s).
done in 3.86 s (drop tables 0.00 s, create tables 0.08 s, client-side
generate 2.97 s, vacuum 0.21 s, primary keys 0.59 s).

AVG done in 2.948 s

v20 PATCHED:
done in 3.94 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 3.12 s, vacuum 0.19 s, primary keys 0.62 s).
done in 4.04 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 3.22 s, vacuum 0.20 s, primary keys 0.61 s).
done in 3.98 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 3.16 s, vacuum 0.20 s, primary keys 0.61 s).
done in 4.04 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 3.16 s, vacuum 0.20 s, primary keys 0.67 s).
done in 3.98 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 3.16 s, vacuum 0.20 s, primary keys 0.61 s).
done in 4.00 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 3.17 s, vacuum 0.20 s, primary keys 0.63 s).
done in 4.43 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 3.24 s, vacuum 0.21 s, primary keys 0.98 s).
done in 4.16 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 3.36 s, vacuum 0.20 s, primary keys 0.59 s).
done in 3.62 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 2.83 s, vacuum 0.20 s, primary keys 0.58 s).
done in 3.67 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 2.84 s, vacuum 0.21 s, primary keys 0.61 s).

AVG done in 3.986 s

[2]
+    /*
+     * Memory allocated for the whole tuple is in slot's memory context, so
+     * use it keep track of the total space occupied by all buffered tuples.
+     */
+    if (TTS_SHOULDFREE(slot))
+        mistate->cur_size += MemoryContextMemAllocated(slot->tts_mcxt, false);
+
+    if (mistate->cur_slots >= HEAP_MAX_BUFFERED_SLOTS ||
+        mistate->cur_size >= HEAP_MAX_BUFFERED_BYTES)
+        heap_modify_buffer_flush(state);

[3]
v21 PATCHED:
done in 2.92 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 2.12 s, vacuum 0.21 s, primary keys 0.59 s).
done in 2.89 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 2.07 s, vacuum 0.21 s, primary keys 0.61 s).
done in 2.89 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 2.05 s, vacuum 0.21 s, primary keys 0.62 s).
done in 2.90 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 2.07 s, vacuum 0.21 s, primary keys 0.62 s).
done in 2.80 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 2.00 s, vacuum 0.21 s, primary keys 0.59 s).
done in 2.84 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 2.04 s, vacuum 0.20 s, primary keys 0.60 s).
done in 2.84 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 2.03 s, vacuum 0.20 s, primary keys 0.59 s).
done in 2.85 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 2.04 s, vacuum 0.20 s, primary keys 0.60 s).
done in 3.48 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 2.44 s, vacuum 0.23 s, primary keys 0.80 s).
done in 3.05 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 2.28 s, vacuum 0.21 s, primary keys 0.55 s).

AVG done in 2.946 s

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


Attachments:

  [application/octet-stream] v21-0001-Introduce-new-Table-Access-Methods-for-single-an.patch (22.0K, 2-v21-0001-Introduce-new-Table-Access-Methods-for-single-an.patch)
  download | inline diff:
From 536616aaedab560084e69a3c389fd2bc2e1d489c Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Wed, 15 May 2024 05:42:38 +0000
Subject: [PATCH v21 1/5] Introduce new Table Access Methods for single and
 multi inserts

---
 src/backend/access/heap/heapam.c         | 206 ++++++++++++++++++++++-
 src/backend/access/heap/heapam_handler.c |   6 +
 src/backend/access/table/tableam.c       |  97 +++++++++++
 src/backend/access/table/tableamapi.c    |  10 ++
 src/backend/executor/execTuples.c        |   4 +
 src/include/access/heapam.h              |  44 +++++
 src/include/access/tableam.h             | 151 +++++++++++++++++
 src/include/executor/tuptable.h          |   6 +
 src/tools/pgindent/typedefs.list         |   3 +
 9 files changed, 526 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 4be0dee4de..b14b3379aa 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -64,6 +64,7 @@
 #include "storage/standby.h"
 #include "utils/datum.h"
 #include "utils/inval.h"
+#include "utils/memutils.h"
 #include "utils/relcache.h"
 #include "utils/snapmgr.h"
 #include "utils/spccache.h"
@@ -112,7 +113,7 @@ static int	bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate);
 static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
 static HeapTuple ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required,
 										bool *copy);
-
+static void heap_modify_insert_end_callback(TableModifyState *state);
 
 /*
  * Each tuple lock mode has a corresponding heavyweight lock, and one or two
@@ -2608,6 +2609,209 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 	pgstat_count_heap_insert(relation, ntuples);
 }
 
+/*
+ * Initialize heap modify state.
+ */
+TableModifyState *
+heap_modify_begin(Relation rel, int modify_flags,
+				  CommandId cid, int options,
+				  TableModifyBufferFlushCallback modify_buffer_flush_callback,
+				  void *modify_buffer_flush_context)
+{
+	TableModifyState *state;
+	MemoryContext context;
+	MemoryContext oldcontext;
+
+	context = AllocSetContextCreate(TopTransactionContext,
+									"heap_modify memory context",
+									ALLOCSET_DEFAULT_SIZES);
+
+	oldcontext = MemoryContextSwitchTo(context);
+	state = palloc0(sizeof(TableModifyState));
+	state->rel = rel;
+	state->modify_flags = modify_flags;
+	state->mem_cxt = context;
+	state->cid = cid;
+	state->options = options;
+	state->modify_buffer_flush_callback = modify_buffer_flush_callback;
+	state->modify_buffer_flush_context = modify_buffer_flush_context;
+	state->modify_end_callback = NULL;	/* To be installed lazily */
+	MemoryContextSwitchTo(oldcontext);
+
+	return state;
+}
+
+/*
+ * Store passed-in tuple into in-memory buffered slots. When full, insert
+ * multiple tuples from the buffers into heap.
+ */
+void
+heap_modify_buffer_insert(TableModifyState *state,
+						  TupleTableSlot *slot)
+{
+	TupleTableSlot *dstslot;
+	HeapInsertState *istate;
+	HeapMultiInsertState *mistate;
+	MemoryContext oldcontext;
+
+	oldcontext = MemoryContextSwitchTo(state->mem_cxt);
+
+	/* First time through, initialize heap insert state */
+	if (state->data == NULL)
+	{
+		istate = (HeapInsertState *) palloc0(sizeof(HeapInsertState));
+		istate->bistate = NULL;
+		istate->mistate = NULL;
+		state->data = istate;
+
+		if ((state->modify_flags & TM_FLAG_MULTI_INSERTS) != 0)
+		{
+			mistate = (HeapMultiInsertState *) palloc0(sizeof(HeapMultiInsertState));
+			mistate->slots = (TupleTableSlot **) palloc0(sizeof(TupleTableSlot *) * HEAP_MAX_BUFFERED_SLOTS);
+			istate->mistate = mistate;
+			mistate->mem_cxt = AllocSetContextCreate(CurrentMemoryContext,
+													 "heap_multi_insert memory context",
+													 ALLOCSET_DEFAULT_SIZES);
+		}
+
+		if ((state->modify_flags & TM_FLAG_BAS_BULKWRITE) != 0)
+			istate->bistate = GetBulkInsertState();
+
+		state->modify_end_callback = heap_modify_insert_end_callback;
+	}
+
+	istate = (HeapInsertState *) state->data;
+	Assert(istate->mistate != NULL);
+	mistate = istate->mistate;
+
+	dstslot = mistate->slots[mistate->cur_slots];
+	if (dstslot == NULL)
+	{
+		/*
+		 * We use virtual tuple slots buffered slots for leveraging the
+		 * optimization it provides to minimize physical data copying. The
+		 * virtual slot gets materialized when we copy (via below
+		 * ExecCopySlot) the tuples from the source slot which can be of any
+		 * type. This way, it is ensured that the tuple storage doesn't depend
+		 * on external memory, because all the datums that aren't passed by
+		 * value are copied into the slot's memory context.
+		 */
+		dstslot = MakeTupleTableSlot(RelationGetDescr(state->rel),
+									 &TTSOpsVirtual);
+		mistate->slots[mistate->cur_slots] = dstslot;
+	}
+
+	Assert(TTS_IS_VIRTUAL(dstslot));
+
+	/*
+	 * Note that the copy clears the previous destination slot contents, so
+	 * there's no need of explicit ExecClearTuple here.
+	 */
+	ExecCopySlot(dstslot, slot);
+
+	mistate->cur_slots++;
+	mistate->cur_size += ((VirtualTupleTableSlot *) dstslot)->sz;
+
+	if ((state->modify_flags & TM_SKIP_INTERNAL_BUFFER_FLUSH) == 0 &&
+		(mistate->cur_slots >= HEAP_MAX_BUFFERED_SLOTS ||
+		 mistate->cur_size >= HEAP_MAX_BUFFERED_BYTES))
+		heap_modify_buffer_flush(state);
+
+	MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * Insert multiple tuples from in-memory buffered slots into heap.
+ */
+void
+heap_modify_buffer_flush(TableModifyState *state)
+{
+	HeapInsertState *istate;
+	HeapMultiInsertState *mistate;
+	MemoryContext oldcontext;
+
+	/* Quick exit if we haven't inserted anything yet */
+	if (state->data == NULL)
+		return;
+
+	istate = (HeapInsertState *) state->data;
+	Assert(istate->mistate != NULL);
+	mistate = istate->mistate;
+
+	/* Quick exit if we have flushed already */
+	if (mistate->cur_slots == 0)
+		return;
+
+	/*
+	 * heap_multi_insert may leak memory, so switch to short-lived memory
+	 * context before calling it.
+	 */
+	oldcontext = MemoryContextSwitchTo(mistate->mem_cxt);
+
+	heap_multi_insert(state->rel, mistate->slots, mistate->cur_slots,
+					  state->cid, state->options, istate->bistate);
+
+	MemoryContextSwitchTo(oldcontext);
+	MemoryContextReset(mistate->mem_cxt);
+
+	if (state->modify_buffer_flush_callback != NULL)
+	{
+		for (int i = 0; i < mistate->cur_slots; i++)
+			state->modify_buffer_flush_callback(state->modify_buffer_flush_context,
+												mistate->slots[i]);
+	}
+
+	mistate->cur_slots = 0;
+	mistate->cur_size = 0;
+}
+
+/*
+ * Heap insert specific callback used for performing work at the end like
+ * flushing buffered tuples if any, cleaning up the insert state and buffered
+ * slots.
+ */
+static void
+heap_modify_insert_end_callback(TableModifyState *state)
+{
+	HeapInsertState *istate;
+
+	/* Quick exit if we haven't inserted anything yet */
+	if (state->data == NULL)
+		return;
+
+	istate = (HeapInsertState *) state->data;
+
+	if (istate->mistate != NULL)
+	{
+		HeapMultiInsertState *mistate = istate->mistate;
+
+		heap_modify_buffer_flush(state);
+
+		Assert(mistate->cur_slots == 0 &&
+			   mistate->cur_size == 0);
+
+		for (int i = 0; i < HEAP_MAX_BUFFERED_SLOTS && mistate->slots[i] != NULL; i++)
+			ExecDropSingleTupleTableSlot(mistate->slots[i]);
+
+		MemoryContextDelete(mistate->mem_cxt);
+	}
+
+	if (istate->bistate != NULL)
+		FreeBulkInsertState(istate->bistate);
+}
+
+/*
+ * Clean heap modify state.
+ */
+void
+heap_modify_end(TableModifyState *state)
+{
+	if (state->modify_end_callback != NULL)
+		state->modify_end_callback(state);
+
+	MemoryContextDelete(state->mem_cxt);
+}
+
 /*
  *	simple_heap_insert - insert a tuple
  *
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 6f8b1b7929..eda0c73a16 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2615,6 +2615,12 @@ static const TableAmRoutine heapam_methods = {
 	.tuple_insert_speculative = heapam_tuple_insert_speculative,
 	.tuple_complete_speculative = heapam_tuple_complete_speculative,
 	.multi_insert = heap_multi_insert,
+
+	.tuple_modify_begin = heap_modify_begin,
+	.tuple_modify_buffer_insert = heap_modify_buffer_insert,
+	.tuple_modify_buffer_flush = heap_modify_buffer_flush,
+	.tuple_modify_end = heap_modify_end,
+
 	.tuple_delete = heapam_tuple_delete,
 	.tuple_update = heapam_tuple_update,
 	.tuple_lock = heapam_tuple_lock,
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index e57a0b7ea3..35a3e43c59 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -21,6 +21,7 @@
 
 #include <math.h>
 
+#include "access/heapam.h"		/* just for BulkInsertState */
 #include "access/syncscan.h"
 #include "access/tableam.h"
 #include "access/xact.h"
@@ -29,6 +30,7 @@
 #include "storage/bufmgr.h"
 #include "storage/shmem.h"
 #include "storage/smgr.h"
+#include "utils/memutils.h"
 
 /*
  * Constants to control the behavior of block allocation to parallel workers
@@ -48,6 +50,7 @@
 char	   *default_table_access_method = DEFAULT_TABLE_ACCESS_METHOD;
 bool		synchronize_seqscans = true;
 
+static void default_table_modify_insert_end_callback(TableModifyState *state);
 
 /* ----------------------------------------------------------------------------
  * Slot functions.
@@ -756,3 +759,97 @@ table_block_relation_estimate_size(Relation rel, int32 *attr_widths,
 	else
 		*allvisfrac = (double) relallvisible / curpages;
 }
+
+/*
+ * Initialize default table modify state.
+ */
+TableModifyState *
+default_table_modify_begin(Relation rel, int modify_flags,
+						   CommandId cid, int options,
+						   TableModifyBufferFlushCallback modify_buffer_flush_callback,
+						   void *modify_buffer_flush_context)
+{
+	TableModifyState *state;
+	MemoryContext context;
+	MemoryContext oldcontext;
+
+	context = AllocSetContextCreate(CurrentMemoryContext,
+									"default_table_modify memory context",
+									ALLOCSET_DEFAULT_SIZES);
+
+	oldcontext = MemoryContextSwitchTo(context);
+	state = palloc0(sizeof(TableModifyState));
+	state->rel = rel;
+	state->modify_flags = modify_flags;
+	state->mem_cxt = context;
+	state->cid = cid;
+	state->options = options;
+	state->modify_buffer_flush_callback = modify_buffer_flush_callback;
+	state->modify_buffer_flush_context = modify_buffer_flush_context;
+	state->modify_end_callback = NULL;	/* To be installed lazily */
+	MemoryContextSwitchTo(oldcontext);
+
+	return state;
+}
+
+/*
+ * Default table modify implementation for inserts.
+ */
+void
+default_table_modify_buffer_insert(TableModifyState *state,
+								   TupleTableSlot *slot)
+{
+	MemoryContext oldcontext;
+
+	oldcontext = MemoryContextSwitchTo(state->mem_cxt);
+
+	/* First time through, initialize default table modify state */
+	if (state->data == NULL)
+	{
+		if ((state->modify_flags & TM_FLAG_BAS_BULKWRITE) != 0)
+			state->data = (BulkInsertState) GetBulkInsertState();
+
+		state->modify_end_callback = default_table_modify_insert_end_callback;
+	}
+
+	/* Fallback to table AM single insert routine */
+	table_tuple_insert(state->rel,
+					   slot,
+					   state->cid,
+					   state->options,
+					   (BulkInsertState) state->data);
+
+	MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * Default table modify implementation for flush.
+ */
+void
+default_table_modify_buffer_flush(TableModifyState *state)
+{
+	/* no-op */
+}
+
+/*
+ * Default table modify insert specific callback used for performing work at
+ * the end like cleaning up the bulk insert state.
+ */
+static void
+default_table_modify_insert_end_callback(TableModifyState *state)
+{
+	if (state->data != NULL)
+		FreeBulkInsertState((BulkInsertState) state->data);
+}
+
+/*
+ * Clean default table modify state.
+ */
+void
+default_table_modify_end(TableModifyState *state)
+{
+	if (state->modify_end_callback != NULL)
+		state->modify_end_callback(state);
+
+	MemoryContextDelete(state->mem_cxt);
+}
diff --git a/src/backend/access/table/tableamapi.c b/src/backend/access/table/tableamapi.c
index ce637a5a5d..96ac951af6 100644
--- a/src/backend/access/table/tableamapi.c
+++ b/src/backend/access/table/tableamapi.c
@@ -97,6 +97,16 @@ GetTableAmRoutine(Oid amhandler)
 	Assert(routine->scan_sample_next_block != NULL);
 	Assert(routine->scan_sample_next_tuple != NULL);
 
+	/* optional, but either all of them are defined or none. */
+	Assert((routine->tuple_modify_begin == NULL &&
+			routine->tuple_modify_buffer_insert == NULL &&
+			routine->tuple_modify_buffer_flush == NULL &&
+			routine->tuple_modify_end == NULL) ||
+		   (routine->tuple_modify_begin != NULL &&
+			routine->tuple_modify_buffer_insert != NULL &&
+			routine->tuple_modify_buffer_flush != NULL &&
+			routine->tuple_modify_end != NULL));
+
 	return routine;
 }
 
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index 00dc339615..e02228858a 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -97,6 +97,7 @@ const TupleTableSlotOps TTSOpsBufferHeapTuple;
 static void
 tts_virtual_init(TupleTableSlot *slot)
 {
+	((VirtualTupleTableSlot *) slot)->sz = 0;
 }
 
 static void
@@ -113,6 +114,7 @@ tts_virtual_clear(TupleTableSlot *slot)
 
 		pfree(vslot->data);
 		vslot->data = NULL;
+		vslot->sz = 0;
 
 		slot->tts_flags &= ~TTS_FLAG_SHOULDFREE;
 	}
@@ -212,6 +214,8 @@ tts_virtual_materialize(TupleTableSlot *slot)
 		}
 	}
 
+	vslot->sz = sz;
+
 	/* all data is byval */
 	if (sz == 0)
 		return;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index c47a5045ce..c10ebbb5ea 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -271,6 +271,38 @@ typedef enum
 	PRUNE_VACUUM_CLEANUP,		/* VACUUM 2nd heap pass */
 } PruneReason;
 
+/*
+ * Maximum number of slots that multi-insert buffers can hold.
+ *
+ * Caution: Don't make this too big, as we could end up with this many tuples
+ * stored in multi insert buffer.
+ */
+#define HEAP_MAX_BUFFERED_SLOTS		1000
+
+/* Maximum size of all tuples that multi-insert buffers can hold */
+#define HEAP_MAX_BUFFERED_BYTES		65535
+
+typedef struct HeapMultiInsertState
+{
+	/* Array of buffered slots */
+	TupleTableSlot **slots;
+
+	/* Number of buffered slots currently held */
+	int			cur_slots;
+
+	/* Approximate size of all tuples currently held in buffered slots */
+	Size		cur_size;
+
+	MemoryContext mem_cxt;
+} HeapMultiInsertState;
+
+typedef struct HeapInsertState
+{
+	struct BulkInsertStateData *bistate;
+	HeapMultiInsertState *mistate;
+} HeapInsertState;
+
+
 /* ----------------
  *		function prototypes for heap access method
  *
@@ -321,6 +353,18 @@ extern void heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 extern void heap_multi_insert(Relation relation, struct TupleTableSlot **slots,
 							  int ntuples, CommandId cid, int options,
 							  BulkInsertState bistate);
+
+extern TableModifyState *heap_modify_begin(Relation rel,
+										   int modify_flags,
+										   CommandId cid,
+										   int options,
+										   TableModifyBufferFlushCallback modify_buffer_flush_callback,
+										   void *modify_buffer_flush_context);
+extern void heap_modify_buffer_insert(TableModifyState *state,
+									  TupleTableSlot *slot);
+extern void heap_modify_buffer_flush(TableModifyState *state);
+extern void heap_modify_end(TableModifyState *state);
+
 extern TM_Result heap_delete(Relation relation, ItemPointer tid,
 							 CommandId cid, Snapshot crosscheck, bool wait,
 							 struct TM_FailureData *tmfd, bool changingPart);
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 8e583b45cd..d9b2f4e03a 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -255,6 +255,48 @@ typedef struct TM_IndexDeleteOp
 	TM_IndexStatus *status;
 } TM_IndexDeleteOp;
 
+/* Table modify flags */
+
+/* Use multi inserts, i.e. buffer multiple tuples and insert them at once */
+#define TM_FLAG_MULTI_INSERTS	0x000001
+
+/* Use BAS_BULKWRITE buffer access strategy */
+#define TM_FLAG_BAS_BULKWRITE	0x000002
+
+/*
+ * Skip internal flush of buffered tuples. Caller needs to flush via
+ * table_modify_buffer_flush().
+ */
+#define TM_SKIP_INTERNAL_BUFFER_FLUSH  0x000004
+
+struct TableModifyState;
+
+/* Callback invoked for each tuple that gets flushed to disk from buffer */
+typedef void (*TableModifyBufferFlushCallback) (void *context,
+												TupleTableSlot *slot);
+
+/* Table AM specific callback that gets called in table_modify_end() */
+typedef void (*TableModifyEndCallback) (struct TableModifyState *state);
+
+/* Holds table modify state */
+typedef struct TableModifyState
+{
+	Relation	rel;
+	int			modify_flags;
+	MemoryContext mem_cxt;
+	CommandId	cid;
+	int			options;
+
+	/* Flush callback and its context */
+	TableModifyBufferFlushCallback modify_buffer_flush_callback;
+	void	   *modify_buffer_flush_context;
+
+	/* Table AM specific data */
+	void	   *data;
+
+	TableModifyEndCallback modify_end_callback;
+} TableModifyState;
+
 /* "options" flag bits for table_tuple_insert */
 /* TABLE_INSERT_SKIP_WAL was 0x0001; RelationNeedsWAL() now governs */
 #define TABLE_INSERT_SKIP_FSM		0x0002
@@ -578,6 +620,21 @@ typedef struct TableAmRoutine
 	void		(*finish_bulk_insert) (Relation rel, int options);
 
 
+	/* ------------------------------------------------------------------------
+	 * Table Modify related functions.
+	 * ------------------------------------------------------------------------
+	 */
+	TableModifyState *(*tuple_modify_begin) (Relation rel,
+											 int modify_flags,
+											 CommandId cid,
+											 int options,
+											 TableModifyBufferFlushCallback modify_buffer_flush_callback,
+											 void *modify_buffer_flush_context);
+	void		(*tuple_modify_buffer_insert) (TableModifyState *state,
+											   TupleTableSlot *slot);
+	void		(*tuple_modify_buffer_flush) (TableModifyState *state);
+	void		(*tuple_modify_end) (TableModifyState *state);
+
 	/* ------------------------------------------------------------------------
 	 * DDL related functionality.
 	 * ------------------------------------------------------------------------
@@ -1609,6 +1666,100 @@ table_finish_bulk_insert(Relation rel, int options)
 		rel->rd_tableam->finish_bulk_insert(rel, options);
 }
 
+/* ------------------------------------------------------------------------
+ * Table Modify related functions.
+ * ------------------------------------------------------------------------
+ */
+extern TableModifyState *default_table_modify_begin(Relation rel, int modify_flags,
+													CommandId cid, int options,
+													TableModifyBufferFlushCallback modify_buffer_flush_callback,
+													void *modify_buffer_flush_context);
+extern void default_table_modify_buffer_insert(TableModifyState *state,
+											   TupleTableSlot *slot);
+extern void default_table_modify_buffer_flush(TableModifyState *state);
+extern void default_table_modify_end(TableModifyState *state);
+
+static inline TableModifyState *
+table_modify_begin(Relation rel, int modify_flags, CommandId cid, int options,
+				   TableModifyBufferFlushCallback modify_buffer_flush_callback,
+				   void *modify_buffer_flush_context)
+{
+	if (rel->rd_tableam &&
+		rel->rd_tableam->tuple_modify_begin != NULL)
+	{
+		return rel->rd_tableam->tuple_modify_begin(rel, modify_flags,
+												   cid, options,
+												   modify_buffer_flush_callback,
+												   modify_buffer_flush_context);
+	}
+	else if (rel->rd_tableam &&
+			 rel->rd_tableam->tuple_modify_begin == NULL)
+	{
+		/* Fallback to a default implementation */
+		return default_table_modify_begin(rel, modify_flags,
+										  cid, options,
+										  modify_buffer_flush_callback,
+										  modify_buffer_flush_context);
+	}
+	else
+		Assert(false);
+
+	return NULL;
+}
+
+static inline void
+table_modify_buffer_insert(TableModifyState *state, TupleTableSlot *slot)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_modify_buffer_insert != NULL)
+	{
+		state->rel->rd_tableam->tuple_modify_buffer_insert(state, slot);
+	}
+	else if (state->rel->rd_tableam &&
+			 state->rel->rd_tableam->tuple_modify_buffer_insert == NULL)
+	{
+		/* Fallback to a default implementation */
+		default_table_modify_buffer_insert(state, slot);
+	}
+	else
+		Assert(false);
+}
+
+static inline void
+table_modify_buffer_flush(TableModifyState *state)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_modify_buffer_flush != NULL)
+	{
+		state->rel->rd_tableam->tuple_modify_buffer_flush(state);
+	}
+	else if (state->rel->rd_tableam &&
+			 state->rel->rd_tableam->tuple_modify_buffer_flush == NULL)
+	{
+		/* Fallback to a default implementation */
+		default_table_modify_buffer_flush(state);
+	}
+	else
+		Assert(false);
+}
+
+static inline void
+table_modify_end(TableModifyState *state)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_modify_end != NULL)
+	{
+		state->rel->rd_tableam->tuple_modify_end(state);
+	}
+	else if (state->rel->rd_tableam &&
+			 state->rel->rd_tableam->tuple_modify_end == NULL)
+	{
+		/* Fallback to a default implementation */
+		default_table_modify_end(state);
+	}
+	else
+		Assert(false);
+}
 
 /* ------------------------------------------------------------------------
  * DDL related functionality.
diff --git a/src/include/executor/tuptable.h b/src/include/executor/tuptable.h
index b82655e7e5..6940921078 100644
--- a/src/include/executor/tuptable.h
+++ b/src/include/executor/tuptable.h
@@ -247,6 +247,12 @@ typedef struct VirtualTupleTableSlot
 
 	TupleTableSlot base;
 
+	/*
+	 * Total size of all attributes that this virtual slot holds. Computed and
+	 * set during slot materialization.
+	 */
+	Size		sz;
+
 	char	   *data;			/* data for materialized slots */
 } VirtualTupleTableSlot;
 
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 2b83c340fb..6bfec4476b 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1140,6 +1140,8 @@ HeadlineJsonState
 HeadlineParsedText
 HeadlineWordEntry
 HeapCheckContext
+HeapInsertState
+HeapMultiInsertState
 HeapPageFreeze
 HeapScanDesc
 HeapTuple
@@ -2860,6 +2862,7 @@ TableFuncScanState
 TableFuncType
 TableInfo
 TableLikeClause
+TableModifyState
 TableSampleClause
 TableScanDesc
 TableScanDescData
-- 
2.34.1



  [application/octet-stream] v21-0002-Optimize-CTAS-CMV-RMV-and-TABLE-REWRITES-with-mu.patch (7.0K, 3-v21-0002-Optimize-CTAS-CMV-RMV-and-TABLE-REWRITES-with-mu.patch)
  download | inline diff:
From bc554ec451461cc2dfbdefff04b171d1b2ce2fdc Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Wed, 15 May 2024 05:42:58 +0000
Subject: [PATCH v21 2/5] Optimize CTAS, CMV, RMV and TABLE REWRITES with multi
 inserts

---
 src/backend/commands/createas.c  | 27 +++++++++++----------------
 src/backend/commands/matview.c   | 26 +++++++++++---------------
 src/backend/commands/tablecmds.c | 31 +++++++++++--------------------
 3 files changed, 33 insertions(+), 51 deletions(-)

diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 62050f4dc5..2d6fffbf07 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -53,9 +53,7 @@ typedef struct
 	/* These fields are filled by intorel_startup: */
 	Relation	rel;			/* relation to write to */
 	ObjectAddress reladdr;		/* address of rel, for ExecCreateTableAs */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableModifyState *mstate;	/* table insert state */
 } DR_intorel;
 
 /* utility functions for CTAS definition creation */
@@ -552,17 +550,21 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	 */
 	myState->rel = intoRelationDesc;
 	myState->reladdr = intoRelationAddr;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM;
 
 	/*
 	 * If WITH NO DATA is specified, there is no need to set up the state for
 	 * bulk inserts as there are no tuples to insert.
 	 */
 	if (!into->skipData)
-		myState->bistate = GetBulkInsertState();
+		myState->mstate = table_modify_begin(intoRelationDesc,
+											 TM_FLAG_MULTI_INSERTS |
+											 TM_FLAG_BAS_BULKWRITE,
+											 GetCurrentCommandId(true),
+											 TABLE_INSERT_SKIP_FSM,
+											 NULL,
+											 NULL);
 	else
-		myState->bistate = NULL;
+		myState->mstate = NULL;
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -590,11 +592,7 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
 		 * would not be cheap either. This also doesn't allow accessing per-AM
 		 * data (say a tuple's xmin), but since we don't do that here...
 		 */
-		table_tuple_insert(myState->rel,
-						   slot,
-						   myState->output_cid,
-						   myState->ti_options,
-						   myState->bistate);
+		table_modify_buffer_insert(myState->mstate, slot);
 	}
 
 	/* We know this is a newly created relation, so there are no indexes */
@@ -612,10 +610,7 @@ intorel_shutdown(DestReceiver *self)
 	IntoClause *into = myState->into;
 
 	if (!into->skipData)
-	{
-		FreeBulkInsertState(myState->bistate);
-		table_finish_bulk_insert(myState->rel, myState->ti_options);
-	}
+		table_modify_end(myState->mstate);
 
 	/* close rel, but keep lock until commit */
 	table_close(myState->rel, NoLock);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 6d09b75556..bb97e2fa5f 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -48,9 +48,7 @@ typedef struct
 	Oid			transientoid;	/* OID of new heap into which to store */
 	/* These fields are filled by transientrel_startup: */
 	Relation	transientrel;	/* relation to write to */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableModifyState *mstate;	/* table insert state */
 } DR_transientrel;
 
 static int	matview_maintenance_depth = 0;
@@ -458,9 +456,14 @@ transientrel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	 * Fill private fields of myState for use by later routines
 	 */
 	myState->transientrel = transientrel;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM | TABLE_INSERT_FROZEN;
-	myState->bistate = GetBulkInsertState();
+	myState->mstate = table_modify_begin(transientrel,
+										 TM_FLAG_MULTI_INSERTS |
+										 TM_FLAG_BAS_BULKWRITE,
+										 GetCurrentCommandId(true),
+										 TABLE_INSERT_SKIP_FSM |
+										 TABLE_INSERT_FROZEN,
+										 NULL,
+										 NULL);
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -485,12 +488,7 @@ transientrel_receive(TupleTableSlot *slot, DestReceiver *self)
 	 * cheap either. This also doesn't allow accessing per-AM data (say a
 	 * tuple's xmin), but since we don't do that here...
 	 */
-
-	table_tuple_insert(myState->transientrel,
-					   slot,
-					   myState->output_cid,
-					   myState->ti_options,
-					   myState->bistate);
+	table_modify_buffer_insert(myState->mstate, slot);
 
 	/* We know this is a newly created relation, so there are no indexes */
 
@@ -505,9 +503,7 @@ transientrel_shutdown(DestReceiver *self)
 {
 	DR_transientrel *myState = (DR_transientrel *) self;
 
-	FreeBulkInsertState(myState->bistate);
-
-	table_finish_bulk_insert(myState->transientrel, myState->ti_options);
+	table_modify_end(myState->mstate);
 
 	/* close transientrel, but keep lock until commit */
 	table_close(myState->transientrel, NoLock);
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 79c9c03183..bf6449e957 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -5973,10 +5973,8 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 	int			i;
 	ListCell   *l;
 	EState	   *estate;
-	CommandId	mycid;
-	BulkInsertState bistate;
-	int			ti_options;
 	ExprState  *partqualstate = NULL;
+	TableModifyState *mstate = NULL;
 
 	/*
 	 * Open the relation(s).  We have surely already locked the existing
@@ -5995,18 +5993,15 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 	 * Prepare a BulkInsertState and options for table_tuple_insert.  The FSM
 	 * is empty, so don't bother using it.
 	 */
-	if (newrel)
+	if (newrel && mstate == NULL)
 	{
-		mycid = GetCurrentCommandId(true);
-		bistate = GetBulkInsertState();
-		ti_options = TABLE_INSERT_SKIP_FSM;
-	}
-	else
-	{
-		/* keep compiler quiet about using these uninitialized */
-		mycid = 0;
-		bistate = NULL;
-		ti_options = 0;
+		mstate = table_modify_begin(newrel,
+									TM_FLAG_MULTI_INSERTS |
+									TM_FLAG_BAS_BULKWRITE,
+									GetCurrentCommandId(true),
+									TABLE_INSERT_SKIP_FSM,
+									NULL,
+									NULL);
 	}
 
 	/*
@@ -6304,8 +6299,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 
 			/* Write the tuple out to the new relation */
 			if (newrel)
-				table_tuple_insert(newrel, insertslot, mycid,
-								   ti_options, bistate);
+				table_modify_buffer_insert(mstate, insertslot);
 
 			ResetExprContext(econtext);
 
@@ -6326,10 +6320,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 	table_close(oldrel, NoLock);
 	if (newrel)
 	{
-		FreeBulkInsertState(bistate);
-
-		table_finish_bulk_insert(newrel, ti_options);
-
+		table_modify_end(mstate);
 		table_close(newrel, NoLock);
 	}
 }
-- 
2.34.1



  [application/octet-stream] v21-0003-Optimize-INSERT-INTO-.-SELECT-with-multi-inserts.patch (9.2K, 4-v21-0003-Optimize-INSERT-INTO-.-SELECT-with-multi-inserts.patch)
  download | inline diff:
From 80196eed5a352770f7a1893ffa22d9801569bf16 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Wed, 15 May 2024 05:43:25 +0000
Subject: [PATCH v21 3/5] Optimize INSERT INTO ... SELECT with multi inserts

---
 src/backend/executor/nodeModifyTable.c | 170 ++++++++++++++++++++++---
 src/tools/pgindent/typedefs.list       |   1 +
 2 files changed, 153 insertions(+), 18 deletions(-)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index cee60d3659..cd044c9dee 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -114,6 +114,18 @@ typedef struct UpdateContext
 	LockTupleMode lockmode;
 } UpdateContext;
 
+typedef struct InsertModifyBufferFlushContext
+{
+	ResultRelInfo *resultRelInfo;
+	EState	   *estate;
+	ModifyTableState *mtstate;
+} InsertModifyBufferFlushContext;
+
+static InsertModifyBufferFlushContext *insert_modify_buffer_flush_context = NULL;
+static TableModifyState *table_modify_state = NULL;
+
+static void InsertModifyBufferFlushCallback(void *context,
+											TupleTableSlot *slot);
 
 static void ExecBatchInsert(ModifyTableState *mtstate,
 							ResultRelInfo *resultRelInfo,
@@ -726,6 +738,55 @@ ExecGetUpdateNewTuple(ResultRelInfo *relinfo,
 	return ExecProject(newProj);
 }
 
+static void
+InsertModifyBufferFlushCallback(void *context, TupleTableSlot *slot)
+{
+	InsertModifyBufferFlushContext *ctx = (InsertModifyBufferFlushContext *) context;
+	ResultRelInfo *resultRelInfo = ctx->resultRelInfo;
+	EState	   *estate = ctx->estate;
+	ModifyTableState *mtstate = ctx->mtstate;
+
+	/* Quick exit if no indexes or no triggers */
+	if (!(resultRelInfo->ri_NumIndices > 0 ||
+		  (resultRelInfo->ri_TrigDesc != NULL &&
+		   (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+			resultRelInfo->ri_TrigDesc->trig_insert_new_table))))
+		return;
+
+	/* Caller must take care of opening and closing the indices */
+
+	/*
+	 * If there are any indexes, update them for all the inserted tuples, and
+	 * run AFTER ROW INSERT triggers.
+	 */
+	if (resultRelInfo->ri_NumIndices > 0)
+	{
+		List	   *recheckIndexes;
+
+		recheckIndexes =
+			ExecInsertIndexTuples(resultRelInfo,
+								  slot, estate, false,
+								  false, NULL, NIL, false);
+		ExecARInsertTriggers(estate, resultRelInfo,
+							 slot, recheckIndexes,
+							 mtstate->mt_transition_capture);
+		list_free(recheckIndexes);
+	}
+
+	/*
+	 * There's no indexes, but see if we need to run AFTER ROW INSERT triggers
+	 * anyway.
+	 */
+	else if (resultRelInfo->ri_TrigDesc != NULL &&
+			 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+			  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+	{
+		ExecARInsertTriggers(estate, resultRelInfo,
+							 slot, NIL,
+							 mtstate->mt_transition_capture);
+	}
+}
+
 /* ----------------------------------------------------------------
  *		ExecInsert
  *
@@ -751,7 +812,8 @@ ExecInsert(ModifyTableContext *context,
 		   TupleTableSlot *slot,
 		   bool canSetTag,
 		   TupleTableSlot **inserted_tuple,
-		   ResultRelInfo **insert_destrel)
+		   ResultRelInfo **insert_destrel,
+		   bool canMultiInsert)
 {
 	ModifyTableState *mtstate = context->mtstate;
 	EState	   *estate = context->estate;
@@ -764,6 +826,7 @@ ExecInsert(ModifyTableContext *context,
 	OnConflictAction onconflict = node->onConflictAction;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
 	MemoryContext oldContext;
+	bool		ar_insert_triggers_executed = false;
 
 	/*
 	 * If the input result relation is a partitioned table, find the leaf
@@ -1126,17 +1189,53 @@ ExecInsert(ModifyTableContext *context,
 		}
 		else
 		{
-			/* insert the tuple normally */
-			table_tuple_insert(resultRelationDesc, slot,
-							   estate->es_output_cid,
-							   0, NULL);
+			if (canMultiInsert &&
+				proute == NULL &&
+				resultRelInfo->ri_WithCheckOptions == NIL &&
+				resultRelInfo->ri_projectReturning == NULL)
+			{
+				if (insert_modify_buffer_flush_context == NULL)
+				{
+					insert_modify_buffer_flush_context =
+						(InsertModifyBufferFlushContext *) palloc0(sizeof(InsertModifyBufferFlushContext));
+					insert_modify_buffer_flush_context->resultRelInfo = resultRelInfo;
+					insert_modify_buffer_flush_context->estate = estate;
+					insert_modify_buffer_flush_context->mtstate = mtstate;
+				}
 
-			/* insert index entries for tuple */
-			if (resultRelInfo->ri_NumIndices > 0)
-				recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
-													   slot, estate, false,
-													   false, NULL, NIL,
-													   false);
+				if (table_modify_state == NULL)
+				{
+					table_modify_state = table_modify_begin(resultRelInfo->ri_RelationDesc,
+															TM_FLAG_MULTI_INSERTS,
+															estate->es_output_cid,
+															0,
+															InsertModifyBufferFlushCallback,
+															insert_modify_buffer_flush_context);
+				}
+
+				table_modify_buffer_insert(table_modify_state, slot);
+				ar_insert_triggers_executed = true;
+			}
+			else
+			{
+				/* insert the tuple normally */
+				table_tuple_insert(resultRelationDesc, slot,
+								   estate->es_output_cid,
+								   0, NULL);
+
+				/* insert index entries for tuple */
+				if (resultRelInfo->ri_NumIndices > 0)
+					recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+														   slot, estate, false,
+														   false, NULL, NIL,
+														   false);
+
+				ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
+									 mtstate->mt_transition_capture);
+
+				list_free(recheckIndexes);
+				ar_insert_triggers_executed = true;
+			}
 		}
 	}
 
@@ -1170,10 +1269,12 @@ ExecInsert(ModifyTableContext *context,
 	}
 
 	/* AFTER ROW INSERT Triggers */
-	ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
-						 ar_insert_trig_tcs);
-
-	list_free(recheckIndexes);
+	if (!ar_insert_triggers_executed)
+	{
+		ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
+							 ar_insert_trig_tcs);
+		list_free(recheckIndexes);
+	}
 
 	/*
 	 * Check any WITH CHECK OPTION constraints from parent views.  We are
@@ -1869,7 +1970,7 @@ ExecCrossPartitionUpdate(ModifyTableContext *context,
 	/* Tuple routing starts from the root table. */
 	context->cpUpdateReturningSlot =
 		ExecInsert(context, mtstate->rootResultRelInfo, slot, canSetTag,
-				   inserted_tuple, insert_destrel);
+				   inserted_tuple, insert_destrel, false);
 
 	/*
 	 * Reset the transition state that may possibly have been written by
@@ -3364,7 +3465,7 @@ ExecMergeNotMatched(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 				mtstate->mt_merge_action = action;
 
 				rslot = ExecInsert(context, mtstate->rootResultRelInfo,
-								   newslot, canSetTag, NULL, NULL);
+								   newslot, canSetTag, NULL, NULL, false);
 				mtstate->mt_merge_inserted += 1;
 				break;
 			case CMD_NOTHING:
@@ -3749,6 +3850,10 @@ ExecModifyTable(PlanState *pstate)
 	HeapTupleData oldtupdata;
 	HeapTuple	oldtuple;
 	ItemPointer tupleid;
+	bool		canMultiInsert = false;
+
+	table_modify_state = NULL;
+	insert_modify_buffer_flush_context = NULL;
 
 	CHECK_FOR_INTERRUPTS();
 
@@ -3844,6 +3949,10 @@ ExecModifyTable(PlanState *pstate)
 		if (TupIsNull(context.planSlot))
 			break;
 
+		if (operation == CMD_INSERT &&
+			nodeTag(subplanstate) == T_SeqScanState)
+			canMultiInsert = true;
+
 		/*
 		 * When there are multiple result relations, each tuple contains a
 		 * junk column that gives the OID of the rel from which it came.
@@ -4057,7 +4166,7 @@ ExecModifyTable(PlanState *pstate)
 					ExecInitInsertProjection(node, resultRelInfo);
 				slot = ExecGetInsertNewTuple(resultRelInfo, context.planSlot);
 				slot = ExecInsert(&context, resultRelInfo, slot,
-								  node->canSetTag, NULL, NULL);
+								  node->canSetTag, NULL, NULL, canMultiInsert);
 				break;
 
 			case CMD_UPDATE:
@@ -4116,6 +4225,17 @@ ExecModifyTable(PlanState *pstate)
 			return slot;
 	}
 
+	if (table_modify_state != NULL)
+	{
+		Assert(operation == CMD_INSERT);
+
+		table_modify_end(table_modify_state);
+		table_modify_state = NULL;
+
+		pfree(insert_modify_buffer_flush_context);
+		insert_modify_buffer_flush_context = NULL;
+	}
+
 	/*
 	 * Insert remaining tuples for batch insert.
 	 */
@@ -4228,6 +4348,9 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	mtstate->mt_merge_updated = 0;
 	mtstate->mt_merge_deleted = 0;
 
+	table_modify_state = NULL;
+	insert_modify_buffer_flush_context = NULL;
+
 	/*----------
 	 * Resolve the target relation. This is the same as:
 	 *
@@ -4681,6 +4804,17 @@ ExecEndModifyTable(ModifyTableState *node)
 {
 	int			i;
 
+	if (table_modify_state != NULL)
+	{
+		Assert(node->operation == CMD_INSERT);
+
+		table_modify_end(table_modify_state);
+		table_modify_state = NULL;
+
+		pfree(insert_modify_buffer_flush_context);
+		insert_modify_buffer_flush_context = NULL;
+	}
+
 	/*
 	 * Allow any FDWs to shut down
 	 */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 6bfec4476b..5e3e900cb8 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1179,6 +1179,7 @@ ImportForeignSchema_function
 ImportQual
 InProgressEnt
 InProgressIO
+InsertModifyBufferFlushContext
 IncludeWal
 InclusionOpaque
 IncrementVarSublevelsUp_context
-- 
2.34.1



  [application/octet-stream] v21-0004-Optimize-Logical-Replication-apply-with-multi-in.patch (19.7K, 5-v21-0004-Optimize-Logical-Replication-apply-with-multi-in.patch)
  download | inline diff:
From 3ebe28045217778bb145ff12c91e593b940e4818 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Wed, 15 May 2024 05:43:48 +0000
Subject: [PATCH v21 4/5] Optimize Logical Replication apply with multi inserts

---
 src/backend/executor/execReplication.c   |  39 +++
 src/backend/replication/logical/proto.c  |  24 ++
 src/backend/replication/logical/worker.c | 351 ++++++++++++++++++++++-
 src/include/executor/executor.h          |   4 +
 src/include/replication/logicalproto.h   |   2 +
 src/tools/pgindent/typedefs.list         |   2 +
 6 files changed, 409 insertions(+), 13 deletions(-)

diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index d0a89cd577..fae1375537 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -544,6 +544,45 @@ ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
 	}
 }
 
+void
+ExecRelationMultiInsert(TableModifyState *MultiInsertState,
+						ResultRelInfo *resultRelInfo,
+						EState *estate, TupleTableSlot *slot)
+{
+	bool		skip_tuple = false;
+	Relation	rel = resultRelInfo->ri_RelationDesc;
+
+	/* For now we support only tables. */
+	Assert(rel->rd_rel->relkind == RELKIND_RELATION);
+
+	CheckCmdReplicaIdentity(rel, CMD_INSERT);
+
+	/* BEFORE ROW INSERT Triggers */
+	if (resultRelInfo->ri_TrigDesc &&
+		resultRelInfo->ri_TrigDesc->trig_insert_before_row)
+	{
+		if (!ExecBRInsertTriggers(estate, resultRelInfo, slot))
+			skip_tuple = true;	/* "do nothing" */
+	}
+
+	if (!skip_tuple)
+	{
+		/* Compute stored generated columns */
+		if (rel->rd_att->constr &&
+			rel->rd_att->constr->has_generated_stored)
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_INSERT);
+
+		/* Check the constraints of the tuple */
+		if (rel->rd_att->constr)
+			ExecConstraints(resultRelInfo, slot, estate);
+		if (rel->rd_rel->relispartition)
+			ExecPartitionCheck(resultRelInfo, slot, estate, true);
+
+		table_modify_buffer_insert(MultiInsertState, slot);
+	}
+}
+
 /*
  * Find the searchslot tuple and update it with data in the slot,
  * update the indexes, and execute any constraints and per-row triggers.
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index 95c09c9516..46d38aebd2 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -427,6 +427,30 @@ logicalrep_write_insert(StringInfo out, TransactionId xid, Relation rel,
 	logicalrep_write_tuple(out, rel, newslot, binary, columns);
 }
 
+LogicalRepRelId
+logicalrep_read_relid(StringInfo in)
+{
+	LogicalRepRelId relid;
+
+	/* read the relation id */
+	relid = pq_getmsgint(in, 4);
+
+	return relid;
+}
+
+void
+logicalrep_read_insert_v2(StringInfo in, LogicalRepTupleData *newtup)
+{
+	char		action;
+
+	action = pq_getmsgbyte(in);
+	if (action != 'N')
+		elog(ERROR, "expected new tuple but got %d",
+			 action);
+
+	logicalrep_read_tuple(in, newtup);
+}
+
 /*
  * Read INSERT from stream.
  *
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index b5a80fe3e8..d62772f590 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -148,7 +148,6 @@
 #include <unistd.h>
 
 #include "access/table.h"
-#include "access/tableam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
 #include "catalog/indexing.h"
@@ -416,6 +415,30 @@ static inline void reset_apply_error_context_info(void);
 static TransApplyAction get_transaction_apply_action(TransactionId xid,
 													 ParallelApplyWorkerInfo **winfo);
 
+typedef enum LRMultiInsertReturnStatus
+{
+	LR_MULTI_INSERT_NONE = 0,
+	LR_MULTI_INSERT_REL_SKIPPED,
+	LR_MULTI_INSERT_DISALLOWED,
+	LR_MULTI_INSERT_DONE,
+} LRMultiInsertReturnStatus;
+
+static TableModifyState *MultiInsertState = NULL;
+static LogicalRepRelMapEntry *LastRel = NULL;
+static LogicalRepRelId LastMultiInsertRelId = InvalidOid;
+static ApplyExecutionData *LastEData = NULL;
+static TupleTableSlot *LastRemoteSlot = NULL;
+
+typedef struct LRModifyBufferFlushContext
+{
+	ResultRelInfo *resultRelInfo;
+	EState	   *estate;
+} LRModifyBufferFlushContext;
+
+static LRModifyBufferFlushContext *modify_buffer_flush_context = NULL;
+static void LRModifyBufferFlushCallback(void *context, TupleTableSlot *slot);
+static void FinishMultiInserts(void);
+
 /*
  * Form the origin name for the subscription.
  *
@@ -1017,6 +1040,8 @@ apply_handle_commit(StringInfo s)
 {
 	LogicalRepCommitData commit_data;
 
+	FinishMultiInserts();
+
 	logicalrep_read_commit(s, &commit_data);
 
 	if (commit_data.commit_lsn != remote_final_lsn)
@@ -1043,6 +1068,8 @@ apply_handle_begin_prepare(StringInfo s)
 {
 	LogicalRepPreparedTxnData begin_data;
 
+	FinishMultiInserts();
+
 	/* Tablesync should never receive prepare. */
 	if (am_tablesync_worker())
 		ereport(ERROR,
@@ -1109,6 +1136,8 @@ apply_handle_prepare(StringInfo s)
 {
 	LogicalRepPreparedTxnData prepare_data;
 
+	FinishMultiInserts();
+
 	logicalrep_read_prepare(s, &prepare_data);
 
 	if (prepare_data.prepare_lsn != remote_final_lsn)
@@ -1171,6 +1200,8 @@ apply_handle_commit_prepared(StringInfo s)
 	LogicalRepCommitPreparedTxnData prepare_data;
 	char		gid[GIDSIZE];
 
+	FinishMultiInserts();
+
 	logicalrep_read_commit_prepared(s, &prepare_data);
 	set_apply_error_context_xact(prepare_data.xid, prepare_data.commit_lsn);
 
@@ -1220,6 +1251,8 @@ apply_handle_rollback_prepared(StringInfo s)
 	LogicalRepRollbackPreparedTxnData rollback_data;
 	char		gid[GIDSIZE];
 
+	FinishMultiInserts();
+
 	logicalrep_read_rollback_prepared(s, &rollback_data);
 	set_apply_error_context_xact(rollback_data.xid, rollback_data.rollback_end_lsn);
 
@@ -1277,6 +1310,8 @@ apply_handle_stream_prepare(StringInfo s)
 	/* Save the message before it is consumed. */
 	StringInfoData original_msg = *s;
 
+	FinishMultiInserts();
+
 	if (in_streamed_transaction)
 		ereport(ERROR,
 				(errcode(ERRCODE_PROTOCOL_VIOLATION),
@@ -1304,6 +1339,8 @@ apply_handle_stream_prepare(StringInfo s)
 			apply_spooled_messages(MyLogicalRepWorker->stream_fileset,
 								   prepare_data.xid, prepare_data.prepare_lsn);
 
+			FinishMultiInserts();
+
 			/* Mark the transaction as prepared. */
 			apply_handle_prepare_internal(&prepare_data);
 
@@ -1407,6 +1444,8 @@ apply_handle_stream_prepare(StringInfo s)
 static void
 apply_handle_origin(StringInfo s)
 {
+	FinishMultiInserts();
+
 	/*
 	 * ORIGIN message can only come inside streaming transaction or inside
 	 * remote transaction and before any actual writes.
@@ -1473,6 +1512,8 @@ apply_handle_stream_start(StringInfo s)
 	/* Save the message before it is consumed. */
 	StringInfoData original_msg = *s;
 
+	FinishMultiInserts();
+
 	if (in_streamed_transaction)
 		ereport(ERROR,
 				(errcode(ERRCODE_PROTOCOL_VIOLATION),
@@ -1628,6 +1669,8 @@ apply_handle_stream_stop(StringInfo s)
 	ParallelApplyWorkerInfo *winfo;
 	TransApplyAction apply_action;
 
+	FinishMultiInserts();
+
 	if (!in_streamed_transaction)
 		ereport(ERROR,
 				(errcode(ERRCODE_PROTOCOL_VIOLATION),
@@ -1821,6 +1864,8 @@ apply_handle_stream_abort(StringInfo s)
 	StringInfoData original_msg = *s;
 	bool		toplevel_xact;
 
+	FinishMultiInserts();
+
 	if (in_streamed_transaction)
 		ereport(ERROR,
 				(errcode(ERRCODE_PROTOCOL_VIOLATION),
@@ -2138,6 +2183,8 @@ apply_handle_stream_commit(StringInfo s)
 	/* Save the message before it is consumed. */
 	StringInfoData original_msg = *s;
 
+	FinishMultiInserts();
+
 	if (in_streamed_transaction)
 		ereport(ERROR,
 				(errcode(ERRCODE_PROTOCOL_VIOLATION),
@@ -2159,6 +2206,8 @@ apply_handle_stream_commit(StringInfo s)
 			apply_spooled_messages(MyLogicalRepWorker->stream_fileset, xid,
 								   commit_data.commit_lsn);
 
+			FinishMultiInserts();
+
 			apply_handle_commit_internal(&commit_data);
 
 			/* Unlink the files with serialized changes and subxact info. */
@@ -2302,6 +2351,8 @@ apply_handle_relation(StringInfo s)
 {
 	LogicalRepRelation *rel;
 
+	FinishMultiInserts();
+
 	if (handle_streamed_transaction(LOGICAL_REP_MSG_RELATION, s))
 		return;
 
@@ -2325,6 +2376,8 @@ apply_handle_type(StringInfo s)
 {
 	LogicalRepTyp typ;
 
+	FinishMultiInserts();
+
 	if (handle_streamed_transaction(LOGICAL_REP_MSG_TYPE, s))
 		return;
 
@@ -2363,16 +2416,126 @@ TargetPrivilegesCheck(Relation rel, AclMode mode)
 						RelationGetRelationName(rel))));
 }
 
-/*
- * Handle INSERT message.
- */
+static void
+FinishMultiInserts(void)
+{
+	LogicalRepMsgType saved_command;
+
+	if (MultiInsertState == NULL)
+		return;
+
+	Assert(OidIsValid(LastMultiInsertRelId));
+	Assert(LastEData != NULL);
+
+	/* Set relation for error callback */
+	apply_error_callback_arg.rel = LastRel;
+
+	/* Set current command for error callback */
+	saved_command = apply_error_callback_arg.command;
+	apply_error_callback_arg.command = LOGICAL_REP_MSG_INSERT;
+
+	ExecDropSingleTupleTableSlot(LastRemoteSlot);
+	LastRemoteSlot = NULL;
+
+	table_modify_end(MultiInsertState);
+	MultiInsertState = NULL;
+	LastMultiInsertRelId = InvalidOid;
+
+	pfree(modify_buffer_flush_context);
+	modify_buffer_flush_context = NULL;
+
+	ExecCloseIndices(LastEData->targetRelInfo);
+
+	finish_edata(LastEData);
+	LastEData = NULL;
+
+	/* Reset relation for error callback */
+	apply_error_callback_arg.rel = NULL;
+
+	/* Reset the current command */
+	apply_error_callback_arg.command = saved_command;
+
+	logicalrep_rel_close(LastRel, NoLock);
+	LastRel = NULL;
+
+	end_replication_step();
+}
 
 static void
-apply_handle_insert(StringInfo s)
+LRModifyBufferFlushCallback(void *context, TupleTableSlot *slot)
+{
+	LRModifyBufferFlushContext *ctx = (LRModifyBufferFlushContext *) context;
+	ResultRelInfo *resultRelInfo = ctx->resultRelInfo;
+	EState	   *estate = ctx->estate;
+	LogicalRepMsgType saved_command;
+
+	/* Quick exit if no indexes or no triggers */
+	if (!(resultRelInfo->ri_NumIndices > 0 ||
+		  (resultRelInfo->ri_TrigDesc != NULL &&
+		   (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+			resultRelInfo->ri_TrigDesc->trig_insert_new_table))))
+		return;
+
+	/* Set relation for error callback */
+	apply_error_callback_arg.rel = LastRel;
+
+	/* Set current command for error callback */
+	saved_command = apply_error_callback_arg.command;
+	apply_error_callback_arg.command = LOGICAL_REP_MSG_INSERT;
+
+	/* Caller must take care of opening and closing the indices */
+
+	/*
+	 * If there are any indexes, update them for all the inserted tuples, and
+	 * run AFTER ROW INSERT triggers.
+	 */
+	if (resultRelInfo->ri_NumIndices > 0)
+	{
+		List	   *recheckIndexes;
+
+		recheckIndexes =
+			ExecInsertIndexTuples(resultRelInfo,
+								  slot, estate, false,
+								  false, NULL, NIL, false);
+
+		ExecARInsertTriggers(estate, resultRelInfo,
+							 slot, recheckIndexes,
+							 NULL);
+
+		list_free(recheckIndexes);
+	}
+
+	/*
+	 * There's no indexes, but see if we need to run AFTER ROW INSERT triggers
+	 * anyway.
+	 */
+	else if (resultRelInfo->ri_TrigDesc != NULL &&
+			 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+			  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+	{
+		ExecARInsertTriggers(estate, resultRelInfo,
+							 slot, NIL,
+							 NULL);
+	}
+
+	/*
+	 * XXX we should in theory pass a TransitionCaptureState object to the
+	 * above to capture transition tuples, but after statement triggers don't
+	 * actually get fired by replication yet anyway
+	 */
+
+	/* Reset relation for error callback */
+	apply_error_callback_arg.rel = NULL;
+
+	/* Reset the current command */
+	apply_error_callback_arg.command = saved_command;
+}
+
+static LRMultiInsertReturnStatus
+do_multi_inserts(StringInfo s, LogicalRepRelId *relid)
 {
 	LogicalRepRelMapEntry *rel;
 	LogicalRepTupleData newtup;
-	LogicalRepRelId relid;
 	UserContext ucxt;
 	ApplyExecutionData *edata;
 	EState	   *estate;
@@ -2380,17 +2543,143 @@ apply_handle_insert(StringInfo s)
 	MemoryContext oldctx;
 	bool		run_as_owner;
 
+	if (MultiInsertState == NULL)
+		begin_replication_step();
+
+	*relid = logicalrep_read_relid(s);
+
+	if (MultiInsertState != NULL &&
+		(LastMultiInsertRelId != InvalidOid &&
+		 *relid != InvalidOid &&
+		 LastMultiInsertRelId != *relid))
+		FinishMultiInserts();
+
+	if (MultiInsertState == NULL)
+		rel = logicalrep_rel_open(*relid, RowExclusiveLock);
+	else
+		rel = LastRel;
+
+	if (!should_apply_changes_for_rel(rel))
+	{
+		Assert(MultiInsertState == NULL);
+
+		/*
+		 * The relation can't become interesting in the middle of the
+		 * transaction so it's safe to unlock it.
+		 */
+		logicalrep_rel_close(rel, RowExclusiveLock);
+		end_replication_step();
+		return LR_MULTI_INSERT_REL_SKIPPED;
+	}
+
+	/* For a partitioned table, let's not do multi inserts. */
+	if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+	{
+		Assert(MultiInsertState == NULL);
+
+		/*
+		 * The relation can't become interesting in the middle of the
+		 * transaction so it's safe to unlock it.
+		 */
+		logicalrep_rel_close(rel, RowExclusiveLock);
+		end_replication_step();
+		return LR_MULTI_INSERT_DISALLOWED;
+	}
+
 	/*
-	 * Quick return if we are skipping data modification changes or handling
-	 * streamed transactions.
+	 * Make sure that any user-supplied code runs as the table owner, unless
+	 * the user has opted out of that behavior.
 	 */
-	if (is_skipping_changes() ||
-		handle_streamed_transaction(LOGICAL_REP_MSG_INSERT, s))
-		return;
+	run_as_owner = MySubscription->runasowner;
+	if (!run_as_owner)
+		SwitchToUntrustedUser(rel->localrel->rd_rel->relowner, &ucxt);
+
+	/* Set relation for error callback */
+	apply_error_callback_arg.rel = rel;
+
+	if (MultiInsertState == NULL)
+	{
+		oldctx = MemoryContextSwitchTo(TopTransactionContext);
+
+		/* Initialize the executor state. */
+		LastEData = edata = create_edata_for_relation(rel);
+		estate = edata->estate;
+
+		LastRemoteSlot = remoteslot = MakeTupleTableSlot(RelationGetDescr(rel->localrel),
+														 &TTSOpsVirtual);
+
+		modify_buffer_flush_context = (LRModifyBufferFlushContext *) palloc(sizeof(LRModifyBufferFlushContext));
+		modify_buffer_flush_context->resultRelInfo = edata->targetRelInfo;
+		modify_buffer_flush_context->estate = estate;
+
+		MultiInsertState = table_modify_begin(edata->targetRelInfo->ri_RelationDesc,
+											  TM_FLAG_MULTI_INSERTS |
+											  TM_FLAG_BAS_BULKWRITE,
+											  GetCurrentCommandId(true),
+											  0,
+											  LRModifyBufferFlushCallback,
+											  modify_buffer_flush_context);
+		LastRel = rel;
+		LastMultiInsertRelId = *relid;
+
+		/* We must open indexes here. */
+		ExecOpenIndices(edata->targetRelInfo, false);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+	else
+	{
+		CommandId	cid;
+
+		edata = LastEData;
+		estate = edata->estate;
+		ResetExprContext(GetPerTupleExprContext(estate));
+		ExecClearTuple(LastRemoteSlot);
+		remoteslot = LastRemoteSlot;
+		cid = GetCurrentCommandId(true);
+		MultiInsertState->cid = cid;
+		estate->es_output_cid = cid;
+	}
+
+	/* Process and store remote tuple in the slot */
+	logicalrep_read_insert_v2(s, &newtup);
+	oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+	slot_store_data(remoteslot, rel, &newtup);
+	slot_fill_defaults(rel, estate, remoteslot);
+	MemoryContextSwitchTo(oldctx);
+
+	TargetPrivilegesCheck(edata->targetRelInfo->ri_RelationDesc, ACL_INSERT);
+	ExecRelationMultiInsert(MultiInsertState, edata->targetRelInfo, estate, remoteslot);
+
+	/* Reset relation for error callback */
+	apply_error_callback_arg.rel = NULL;
+
+	if (!run_as_owner)
+		RestoreUserContext(&ucxt);
+
+	Assert(MultiInsertState != NULL);
+
+	CommandCounterIncrement();
+
+	return LR_MULTI_INSERT_DONE;
+}
+
+static bool
+do_single_inserts(StringInfo s, LogicalRepRelId relid)
+{
+	LogicalRepRelMapEntry *rel;
+	LogicalRepTupleData newtup;
+	UserContext ucxt;
+	ApplyExecutionData *edata;
+	EState	   *estate;
+	TupleTableSlot *remoteslot;
+	MemoryContext oldctx;
+	bool		run_as_owner;
+
+	Assert(relid != InvalidOid);
 
 	begin_replication_step();
 
-	relid = logicalrep_read_insert(s, &newtup);
 	rel = logicalrep_rel_open(relid, RowExclusiveLock);
 	if (!should_apply_changes_for_rel(rel))
 	{
@@ -2400,7 +2689,7 @@ apply_handle_insert(StringInfo s)
 		 */
 		logicalrep_rel_close(rel, RowExclusiveLock);
 		end_replication_step();
-		return;
+		return false;
 	}
 
 	/*
@@ -2422,6 +2711,7 @@ apply_handle_insert(StringInfo s)
 										&TTSOpsVirtual);
 
 	/* Process and store remote tuple in the slot */
+	logicalrep_read_insert_v2(s, &newtup);
 	oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
 	slot_store_data(remoteslot, rel, &newtup);
 	slot_fill_defaults(rel, estate, remoteslot);
@@ -2446,6 +2736,35 @@ apply_handle_insert(StringInfo s)
 	logicalrep_rel_close(rel, NoLock);
 
 	end_replication_step();
+
+	return true;
+}
+
+/*
+ * Handle INSERT message.
+ */
+static void
+apply_handle_insert(StringInfo s)
+{
+	LRMultiInsertReturnStatus mi_status;
+	LogicalRepRelId relid;
+
+	/*
+	 * Quick return if we are skipping data modification changes or handling
+	 * streamed transactions.
+	 */
+	if (is_skipping_changes() ||
+		handle_streamed_transaction(LOGICAL_REP_MSG_INSERT, s))
+		return;
+
+	mi_status = do_multi_inserts(s, &relid);
+	if (mi_status == LR_MULTI_INSERT_REL_SKIPPED ||
+		mi_status == LR_MULTI_INSERT_DONE)
+		return;
+
+	do_single_inserts(s, relid);
+
+	return;
 }
 
 /*
@@ -2532,6 +2851,8 @@ apply_handle_update(StringInfo s)
 	MemoryContext oldctx;
 	bool		run_as_owner;
 
+	FinishMultiInserts();
+
 	/*
 	 * Quick return if we are skipping data modification changes or handling
 	 * streamed transactions.
@@ -2713,6 +3034,8 @@ apply_handle_delete(StringInfo s)
 	MemoryContext oldctx;
 	bool		run_as_owner;
 
+	FinishMultiInserts();
+
 	/*
 	 * Quick return if we are skipping data modification changes or handling
 	 * streamed transactions.
@@ -3154,6 +3477,8 @@ apply_handle_truncate(StringInfo s)
 	ListCell   *lc;
 	LOCKMODE	lockmode = AccessExclusiveLock;
 
+	FinishMultiInserts();
+
 	/*
 	 * Quick return if we are skipping data modification changes or handling
 	 * streamed transactions.
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 9770752ea3..8f10ea977b 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -14,6 +14,7 @@
 #ifndef EXECUTOR_H
 #define EXECUTOR_H
 
+#include "access/tableam.h"
 #include "executor/execdesc.h"
 #include "fmgr.h"
 #include "nodes/lockoptions.h"
@@ -656,6 +657,9 @@ extern bool RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 extern void ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
 									 EState *estate, TupleTableSlot *slot);
+extern void ExecRelationMultiInsert(TableModifyState *MultiInsertState,
+									ResultRelInfo *resultRelInfo,
+									EState *estate, TupleTableSlot *slot);
 extern void ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
 									 EState *estate, EPQState *epqstate,
 									 TupleTableSlot *searchslot, TupleTableSlot *slot);
diff --git a/src/include/replication/logicalproto.h b/src/include/replication/logicalproto.h
index c409638a2e..3f3a7f0a31 100644
--- a/src/include/replication/logicalproto.h
+++ b/src/include/replication/logicalproto.h
@@ -226,6 +226,8 @@ extern void logicalrep_write_insert(StringInfo out, TransactionId xid,
 									Relation rel,
 									TupleTableSlot *newslot,
 									bool binary, Bitmapset *columns);
+extern LogicalRepRelId logicalrep_read_relid(StringInfo in);
+extern void logicalrep_read_insert_v2(StringInfo in, LogicalRepTupleData *newtup);
 extern LogicalRepRelId logicalrep_read_insert(StringInfo in, LogicalRepTupleData *newtup);
 extern void logicalrep_write_update(StringInfo out, TransactionId xid,
 									Relation rel,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 5e3e900cb8..8463343325 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1468,6 +1468,8 @@ LPTHREAD_START_ROUTINE
 LPTSTR
 LPVOID
 LPWSTR
+LRModifyBufferFlushContext
+LRMultiInsertReturnStatus
 LSEG
 LUID
 LVRelState
-- 
2.34.1



  [application/octet-stream] v21-0005-Use-new-multi-insert-Table-AM-for-COPY-FROM.patch (13.7K, 6-v21-0005-Use-new-multi-insert-Table-AM-for-COPY-FROM.patch)
  download | inline diff:
From 568a0361b5b25c5a4012f21ec2eacff0e49e95a5 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Wed, 15 May 2024 05:56:33 +0000
Subject: [PATCH v21 5/5] Use new multi insert Table AM for COPY FROM

---
 src/backend/commands/copyfrom.c          | 236 +++++++++++++++--------
 src/include/commands/copyfrom_internal.h |   4 +-
 src/tools/pgindent/typedefs.list         |   1 +
 3 files changed, 160 insertions(+), 81 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index ce4d62e707..bf56dd23f7 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -71,14 +71,25 @@
 /* Trim the list of buffers back down to this number after flushing */
 #define MAX_PARTITION_BUFFERS	32
 
+typedef struct CopyModifyBufferFlushContext
+{
+	CopyFromState cstate;
+	ResultRelInfo *resultRelInfo;
+	EState	   *estate;
+} CopyModifyBufferFlushContext;
+
 /* Stores multi-insert data related to a single relation in CopyFrom. */
 typedef struct CopyMultiInsertBuffer
 {
-	TupleTableSlot *slots[MAX_BUFFERED_TUPLES]; /* Array to store tuples */
+	TableModifyState *mstate;	/* Table insert state; NULL if foreign table */
+	TupleTableSlot **slots;		/* Array to store tuples */
 	ResultRelInfo *resultRelInfo;	/* ResultRelInfo for 'relid' */
-	BulkInsertState bistate;	/* BulkInsertState for this rel if plain
-								 * table; NULL if foreign table */
+	TupleTableSlot *multislot;
+	CopyModifyBufferFlushContext *modify_buffer_flush_context;
 	int			nused;			/* number of 'slots' containing tuples */
+	int			currslotno;		/* Current buffered slot number that's being
+								 * flushed; Used to get correct cur_lineno for
+								 * errors while in flush callback. */
 	uint64		linenos[MAX_BUFFERED_TUPLES];	/* Line # of tuple in copy
 												 * stream */
 } CopyMultiInsertBuffer;
@@ -99,6 +110,7 @@ typedef struct CopyMultiInsertInfo
 	int			ti_options;		/* table insert options */
 } CopyMultiInsertInfo;
 
+static void CopyModifyBufferFlushCallback(void *context, TupleTableSlot *slot);
 
 /* non-export function prototypes */
 static void ClosePipeFromProgram(CopyFromState cstate);
@@ -218,14 +230,39 @@ CopyLimitPrintoutLength(const char *str)
  * ResultRelInfo.
  */
 static CopyMultiInsertBuffer *
-CopyMultiInsertBufferInit(ResultRelInfo *rri)
+CopyMultiInsertBufferInit(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
+						  CopyFromState cstate, EState *estate)
 {
 	CopyMultiInsertBuffer *buffer;
 
 	buffer = (CopyMultiInsertBuffer *) palloc(sizeof(CopyMultiInsertBuffer));
-	memset(buffer->slots, 0, sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
+
+	if (rri->ri_FdwRoutine == NULL)
+	{
+		buffer->modify_buffer_flush_context = (CopyModifyBufferFlushContext *) palloc(sizeof(CopyModifyBufferFlushContext));
+		buffer->modify_buffer_flush_context->cstate = cstate;
+		buffer->modify_buffer_flush_context->resultRelInfo = rri;
+		buffer->modify_buffer_flush_context->estate = estate;
+
+		buffer->mstate = table_modify_begin(rri->ri_RelationDesc,
+											TM_FLAG_MULTI_INSERTS |
+											TM_FLAG_BAS_BULKWRITE |
+											TM_SKIP_INTERNAL_BUFFER_FLUSH,
+											miinfo->mycid,
+											miinfo->ti_options,
+											CopyModifyBufferFlushCallback,
+											buffer->modify_buffer_flush_context);
+		buffer->slots = NULL;
+		buffer->multislot = NULL;
+	}
+	else
+	{
+		buffer->mstate = NULL;
+		buffer->slots = palloc0(sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
+		buffer->multislot = NULL;
+	}
+
 	buffer->resultRelInfo = rri;
-	buffer->bistate = (rri->ri_FdwRoutine == NULL) ? GetBulkInsertState() : NULL;
 	buffer->nused = 0;
 
 	return buffer;
@@ -236,11 +273,12 @@ CopyMultiInsertBufferInit(ResultRelInfo *rri)
  */
 static inline void
 CopyMultiInsertInfoSetupBuffer(CopyMultiInsertInfo *miinfo,
-							   ResultRelInfo *rri)
+							   ResultRelInfo *rri, CopyFromState cstate,
+							   EState *estate)
 {
 	CopyMultiInsertBuffer *buffer;
 
-	buffer = CopyMultiInsertBufferInit(rri);
+	buffer = CopyMultiInsertBufferInit(miinfo, rri, cstate, estate);
 
 	/* Setup back-link so we can easily find this buffer again */
 	rri->ri_CopyMultiInsertBuffer = buffer;
@@ -273,7 +311,7 @@ CopyMultiInsertInfoInit(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
 	 * tuples their way for the first time.
 	 */
 	if (rri->ri_RelationDesc->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
-		CopyMultiInsertInfoSetupBuffer(miinfo, rri);
+		CopyMultiInsertInfoSetupBuffer(miinfo, rri, cstate, estate);
 }
 
 /*
@@ -317,8 +355,6 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 		int			batch_size = resultRelInfo->ri_BatchSize;
 		int			sent = 0;
 
-		Assert(buffer->bistate == NULL);
-
 		/* Ensure that the FDW supports batching and it's enabled */
 		Assert(resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert);
 		Assert(batch_size > 1);
@@ -390,13 +426,8 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	}
 	else
 	{
-		CommandId	mycid = miinfo->mycid;
-		int			ti_options = miinfo->ti_options;
 		bool		line_buf_valid = cstate->line_buf_valid;
 		uint64		save_cur_lineno = cstate->cur_lineno;
-		MemoryContext oldcontext;
-
-		Assert(buffer->bistate != NULL);
 
 		/*
 		 * Print error context information correctly, if one of the operations
@@ -404,56 +435,18 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 		 */
 		cstate->line_buf_valid = false;
 
-		/*
-		 * table_multi_insert may leak memory, so switch to short-lived memory
-		 * context before calling it.
-		 */
-		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-		table_multi_insert(resultRelInfo->ri_RelationDesc,
-						   slots,
-						   nused,
-						   mycid,
-						   ti_options,
-						   buffer->bistate);
-		MemoryContextSwitchTo(oldcontext);
+		Assert(buffer->currslotno <= buffer->nused);
+		buffer->currslotno = 0;
 
-		for (i = 0; i < nused; i++)
-		{
-			/*
-			 * If there are any indexes, update them for all the inserted
-			 * tuples, and run AFTER ROW INSERT triggers.
-			 */
-			if (resultRelInfo->ri_NumIndices > 0)
-			{
-				List	   *recheckIndexes;
-
-				cstate->cur_lineno = buffer->linenos[i];
-				recheckIndexes =
-					ExecInsertIndexTuples(resultRelInfo,
-										  buffer->slots[i], estate, false,
-										  false, NULL, NIL, false);
-				ExecARInsertTriggers(estate, resultRelInfo,
-									 slots[i], recheckIndexes,
-									 cstate->transition_capture);
-				list_free(recheckIndexes);
-			}
+		table_modify_buffer_flush(buffer->mstate);
 
-			/*
-			 * There's no indexes, but see if we need to run AFTER ROW INSERT
-			 * triggers anyway.
-			 */
-			else if (resultRelInfo->ri_TrigDesc != NULL &&
-					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
-			{
-				cstate->cur_lineno = buffer->linenos[i];
-				ExecARInsertTriggers(estate, resultRelInfo,
-									 slots[i], NIL,
-									 cstate->transition_capture);
-			}
+		Assert(buffer->currslotno <= buffer->nused);
+		buffer->currslotno = 0;
 
-			ExecClearTuple(slots[i]);
-		}
+		/*
+		 * Indexes are updated and AFTER ROW INSERT triggers (if any) are run
+		 * in the flush callback CopyModifyBufferFlushCallback.
+		 */
 
 		/* Update the row counter and progress of the COPY command */
 		*processed += nused;
@@ -469,6 +462,60 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	buffer->nused = 0;
 }
 
+static void
+CopyModifyBufferFlushCallback(void *context, TupleTableSlot *slot)
+{
+	CopyModifyBufferFlushContext *ctx = (CopyModifyBufferFlushContext *) context;
+	CopyFromState cstate = ctx->cstate;
+	ResultRelInfo *resultRelInfo = ctx->resultRelInfo;
+	EState	   *estate = ctx->estate;
+	CopyMultiInsertBuffer *buffer = resultRelInfo->ri_CopyMultiInsertBuffer;
+
+	/* Quick exit if no indexes or no triggers */
+	if (!(resultRelInfo->ri_NumIndices > 0 ||
+		  (resultRelInfo->ri_TrigDesc != NULL &&
+		   (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+			resultRelInfo->ri_TrigDesc->trig_insert_new_table))))
+		return;
+
+	/* Caller must take care of opening and closing the indices */
+
+	/*
+	 * If there are any indexes, update them for all the inserted tuples, and
+	 * run AFTER ROW INSERT triggers.
+	 */
+	if (resultRelInfo->ri_NumIndices > 0)
+	{
+		List	   *recheckIndexes;
+
+		cstate->cur_lineno = buffer->linenos[buffer->currslotno++];
+		recheckIndexes =
+			ExecInsertIndexTuples(resultRelInfo,
+								  slot, estate, false,
+								  false, NULL, NIL, false);
+		ExecARInsertTriggers(estate, resultRelInfo,
+							 slot, recheckIndexes,
+							 cstate->transition_capture);
+		list_free(recheckIndexes);
+	}
+
+	/*
+	 * There's no indexes, but see if we need to run AFTER ROW INSERT triggers
+	 * anyway.
+	 */
+	else if (resultRelInfo->ri_TrigDesc != NULL &&
+			 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+			  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+	{
+		cstate->cur_lineno = buffer->linenos[buffer->currslotno++];
+		ExecARInsertTriggers(estate, resultRelInfo,
+							 slot, NIL,
+							 cstate->transition_capture);
+	}
+
+	Assert(buffer->currslotno <= buffer->nused);
+}
+
 /*
  * Drop used slots and free member for this buffer.
  *
@@ -489,19 +536,18 @@ CopyMultiInsertBufferCleanup(CopyMultiInsertInfo *miinfo,
 
 	if (resultRelInfo->ri_FdwRoutine == NULL)
 	{
-		Assert(buffer->bistate != NULL);
-		FreeBulkInsertState(buffer->bistate);
+		table_modify_end(buffer->mstate);
+		ExecDropSingleTupleTableSlot(buffer->multislot);
+		pfree(buffer->modify_buffer_flush_context);
 	}
 	else
-		Assert(buffer->bistate == NULL);
-
-	/* Since we only create slots on demand, just drop the non-null ones. */
-	for (i = 0; i < MAX_BUFFERED_TUPLES && buffer->slots[i] != NULL; i++)
-		ExecDropSingleTupleTableSlot(buffer->slots[i]);
+	{
+		/* Since we only create slots on demand, just drop the non-null ones. */
+		for (i = 0; i < MAX_BUFFERED_TUPLES && buffer->slots[i] != NULL; i++)
+			ExecDropSingleTupleTableSlot(buffer->slots[i]);
 
-	if (resultRelInfo->ri_FdwRoutine == NULL)
-		table_finish_bulk_insert(resultRelInfo->ri_RelationDesc,
-								 miinfo->ti_options);
+		pfree(buffer->slots);
+	}
 
 	pfree(buffer);
 }
@@ -588,13 +634,32 @@ CopyMultiInsertInfoNextFreeSlot(CopyMultiInsertInfo *miinfo,
 {
 	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
 	int			nused = buffer->nused;
+	TupleTableSlot *slot;
 
 	Assert(buffer != NULL);
 	Assert(nused < MAX_BUFFERED_TUPLES);
 
-	if (buffer->slots[nused] == NULL)
-		buffer->slots[nused] = table_slot_create(rri->ri_RelationDesc, NULL);
-	return buffer->slots[nused];
+	if (rri->ri_FdwRoutine == NULL)
+	{
+		if (buffer->multislot == NULL)
+			buffer->multislot = MakeTupleTableSlot(RelationGetDescr(rri->ri_RelationDesc),
+												   &TTSOpsVirtual);
+
+		/* Caller must clear the slot */
+		slot = buffer->multislot;
+	}
+	else
+	{
+		if (buffer->slots[nused] == NULL)
+		{
+			slot = table_slot_create(rri->ri_RelationDesc, NULL);
+			buffer->slots[nused] = slot;
+		}
+		else
+			slot = buffer->slots[nused];
+	}
+
+	return slot;
 }
 
 /*
@@ -608,7 +673,11 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
 	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
 
 	Assert(buffer != NULL);
-	Assert(slot == buffer->slots[buffer->nused]);
+
+#ifdef USE_ASSERT_CHECKING
+	if (rri->ri_FdwRoutine != NULL)
+		Assert(slot == buffer->slots[buffer->nused]);
+#endif
 
 	/* Store the line number so we can properly report any errors later */
 	buffer->linenos[buffer->nused] = lineno;
@@ -616,6 +685,14 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
 	/* Record this slot as being used */
 	buffer->nused++;
 
+	if (rri->ri_FdwRoutine == NULL)
+	{
+		Assert(slot == buffer->multislot);
+		buffer->currslotno = 0;
+
+		table_modify_buffer_insert(buffer->mstate, slot);
+	}
+
 	/* Update how many tuples are stored and their size */
 	miinfo->bufferedTuples++;
 	miinfo->bufferedBytes += tuplen;
@@ -830,7 +907,7 @@ CopyFrom(CopyFromState cstate)
 	/*
 	 * It's generally more efficient to prepare a bunch of tuples for
 	 * insertion, and insert them in one
-	 * table_multi_insert()/ExecForeignBatchInsert() call, than call
+	 * table_modify_buffer_insert()/ExecForeignBatchInsert() call, than call
 	 * table_tuple_insert()/ExecForeignInsert() separately for every tuple.
 	 * However, there are a number of reasons why we might not be able to do
 	 * this.  These are explained below.
@@ -1080,7 +1157,8 @@ CopyFrom(CopyFromState cstate)
 				{
 					if (resultRelInfo->ri_CopyMultiInsertBuffer == NULL)
 						CopyMultiInsertInfoSetupBuffer(&multiInsertInfo,
-													   resultRelInfo);
+													   resultRelInfo, cstate,
+													   estate);
 				}
 				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
 						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index cad52fcc78..14addbc6f6 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -46,9 +46,9 @@ typedef enum EolType
 typedef enum CopyInsertMethod
 {
 	CIM_SINGLE,					/* use table_tuple_insert or ExecForeignInsert */
-	CIM_MULTI,					/* always use table_multi_insert or
+	CIM_MULTI,					/* always use table_modify_buffer_insert or
 								 * ExecForeignBatchInsert */
-	CIM_MULTI_CONDITIONAL,		/* use table_multi_insert or
+	CIM_MULTI_CONDITIONAL,		/* use table_modify_buffer_insert or
 								 * ExecForeignBatchInsert only if valid */
 } CopyInsertMethod;
 
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8463343325..745019153d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -493,6 +493,7 @@ CopyHeaderChoice
 CopyInsertMethod
 CopyLogVerbosityChoice
 CopyMethod
+CopyModifyBufferFlushContext
 CopyMultiInsertBuffer
 CopyMultiInsertInfo
 CopyOnErrorChoice
-- 
2.34.1



^ permalink  raw  reply  [nested|flat] 30+ messages in thread

* Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
  2024-03-21 04:14 Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-21 07:40 ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-23 00:17   ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-25 19:58     ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-26 15:37       ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-26 19:49         ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-27 08:12           ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-31 15:48             ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-04-02 19:40               ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-04-24 12:49                 ` Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-04-25 16:41                   ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-04-29 06:06                     ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-05-15 07:26                       ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
@ 2024-05-15 23:31                         ` Jeff Davis <[email protected]>
  2024-05-16 19:00                           ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-06-05 07:12                           ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  0 siblings, 2 replies; 30+ messages in thread

From: Jeff Davis @ 2024-05-15 23:31 UTC (permalink / raw)
  To: Bharath Rupireddy <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; Michael Paquier <[email protected]>; Matthias van de Meent <[email protected]>; Alexander Korotkov <[email protected]>

On Wed, 2024-05-15 at 12:56 +0530, Bharath Rupireddy wrote:
> Because of this, the
> buffers get flushed sooner than that of the existing COPY with
> table_multi_insert AM causing regression in pgbench which uses COPY
> extensively.

The flushing behavior is entirely controlled by the table AM. The heap
can use the same flushing logic that it did before, which is to hold
1000 tuples.

I like that it's accounting for memory, too, but it doesn't need to be
overly restrictive. Why not just use work_mem? That should hold 1000
reasonably-sized tuples, plus overhead.

Even better would be if we could take into account partitioning. That
might be out of scope for your current work, but it would be very
useful. We could have a couple new GUCs like modify_table_buffer and
modify_table_buffer_per_partition or something like that.

> 1. Try to get the actual tuple sizes excluding header sizes for each
> column in the new TAM.

I don't see the point in arbitrarily excluding the header.

> v21 also adds code to maintain tuple size for virtual tuple slots.
> This helps make better memory-based flushing decisions in the new
> TAM.

That seems wrong. We shouldn't need to change the TupleTableSlot
structure for this patch.


Comments on v21:

* All callers specify TM_FLAG_MULTI_INSERTS. What's the purpose?

* The only caller that doesn't use TM_FLAG_BAS_BULKWRITE is
ExecInsert(). What's the disadvantage to using a bulk insert state
there?

* I'm a bit confused by TableModifyState->modify_end_callback. The AM
both sets the callback and calls the callback -- why can't the code
just go into the table_modify_end method?

* The code structure in table_modify_begin() (and related) is strange.
Can it be simplified or am I missing something?

* Why are table_modify_state and insert_modify_buffer_flush_context
globals? What if there are multiple modify nodes in a plan?

* Can you explain the design in logical rep?

Regards,
	Jeff Davis







^ permalink  raw  reply  [nested|flat] 30+ messages in thread

* Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
  2024-03-21 04:14 Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-21 07:40 ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-23 00:17   ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-25 19:58     ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-26 15:37       ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-26 19:49         ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-27 08:12           ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-31 15:48             ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-04-02 19:40               ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-04-24 12:49                 ` Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-04-25 16:41                   ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-04-29 06:06                     ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-05-15 07:26                       ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-05-15 23:31                         ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
@ 2024-05-16 19:00                           ` Jeff Davis <[email protected]>
  1 sibling, 0 replies; 30+ messages in thread

From: Jeff Davis @ 2024-05-16 19:00 UTC (permalink / raw)
  To: Bharath Rupireddy <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; Michael Paquier <[email protected]>; Matthias van de Meent <[email protected]>; Alexander Korotkov <[email protected]>

On Wed, 2024-05-15 at 16:31 -0700, Jeff Davis wrote:
> Even better would be if we could take into account partitioning. That
> might be out of scope for your current work, but it would be very
> useful. We could have a couple new GUCs like modify_table_buffer and
> modify_table_buffer_per_partition or something like that.

To expand on this point:

For heap, the insert buffer is only 1000 tuples, which doesn't take
much memory. But for an AM that does any significant reorganization of
the input data, the buffer may be much larger. For insert into a
partitioned table, that buffer could be multiplied across many
partitions, and start to be a real concern.

We might not need table AM API changes at all here beyond what v21
offers. The ModifyTableState includes the memory context, so that gives
the caller a way to know the memory consumption of a single partition's
buffer. And if it needs to free the resources, it can just call
modify_table_end(), and then _begin() again if more tuples hit that
partition.

So I believe what I'm asking for here is entirely orthogonal to the
current proposal.

However, it got me thinking that we might not want to use work_mem for
controlling the heap's buffer size. Each AM is going to have radically
different memory needs, and may have its own (extension) GUCs to
control that memory usage, so they won't honor work_mem. We could
either have a separate GUC for the heap if it makes sense, or we could
just hard-code a reasonable value.

Regards,
	Jeff Davis







^ permalink  raw  reply  [nested|flat] 30+ messages in thread

* Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
  2024-03-21 04:14 Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-21 07:40 ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-23 00:17   ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-25 19:58     ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-26 15:37       ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-26 19:49         ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-27 08:12           ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-31 15:48             ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-04-02 19:40               ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-04-24 12:49                 ` Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-04-25 16:41                   ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-04-29 06:06                     ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-05-15 07:26                       ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-05-15 23:31                         ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
@ 2024-06-05 07:12                           ` Bharath Rupireddy <[email protected]>
  2024-08-26 05:39                             ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  1 sibling, 1 reply; 30+ messages in thread

From: Bharath Rupireddy @ 2024-06-05 07:12 UTC (permalink / raw)
  To: Jeff Davis <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; Michael Paquier <[email protected]>; Matthias van de Meent <[email protected]>; Alexander Korotkov <[email protected]>

Hi,

On Thu, May 16, 2024 at 5:01 AM Jeff Davis <[email protected]> wrote:
>
> The flushing behavior is entirely controlled by the table AM. The heap
> can use the same flushing logic that it did before, which is to hold
> 1000 tuples.
>
> I like that it's accounting for memory, too, but it doesn't need to be
> overly restrictive. Why not just use work_mem? That should hold 1000
> reasonably-sized tuples, plus overhead.
>
> Even better would be if we could take into account partitioning. That
> might be out of scope for your current work, but it would be very
> useful. We could have a couple new GUCs like modify_table_buffer and
> modify_table_buffer_per_partition or something like that.

I disagree with inventing more GUCs. Instead, I'd vote for just
holding 1000 tuples in buffers for heap AM. This not only keeps the
code and new table AM simple, but also does not cause regression for
COPY. In my testing, 1000 tuples with 1 int and 1 float columns took
40000 bytes of memory (40 bytes each tuple), whereas with 1 int, 1
float and 1 text columns took 172000 bytes of memory (172 bytes each
tuple) bytes which IMO mustn't be a big problem. Thoughts?

> > 1. Try to get the actual tuple sizes excluding header sizes for each
> > column in the new TAM.
>
> I don't see the point in arbitrarily excluding the header.
>
> > v21 also adds code to maintain tuple size for virtual tuple slots.
> > This helps make better memory-based flushing decisions in the new
> > TAM.
>
> That seems wrong. We shouldn't need to change the TupleTableSlot
> structure for this patch.

I dropped these ideas as I went ahead with the above idea of just
holding 1000 tuples in buffers for heap AM.

> Comments on v21:
>
> * All callers specify TM_FLAG_MULTI_INSERTS. What's the purpose?

Previously, the multi insert state was initialized in modify_begin, so
it was then required to differentiate the code path. But, it's not
needed anymore with the lazy initialization of the multi insert state
moved to modify_buffer_insert. I removed it.

> * The only caller that doesn't use TM_FLAG_BAS_BULKWRITE is
> ExecInsert(). What's the disadvantage to using a bulk insert state
> there?

The subsequent read queries will not find the just-now-inserted tuples
in shared buffers as a separate ring buffer is used with bulk insert
access strategy. The multi inserts is nothing but buffering multiple
tuples plus inserting in bulk. So using the bulk insert strategy might
be worth it for INSERT INTO SELECTs too. Thoughts?

> * I'm a bit confused by TableModifyState->modify_end_callback. The AM
> both sets the callback and calls the callback -- why can't the code
> just go into the table_modify_end method?

I came up with modify_end_callback as per the discussion upthread to
use modify_begin, modify_end in future for UPDATE, DELETE and MERGE,
and not use any operation specific flags to clean the state
appropriately. The operation specific state cleaning logic can go to
the modify_end_callback implementation defined by the AM.

> * The code structure in table_modify_begin() (and related) is strange.
> Can it be simplified or am I missing something?

I previously defined these new table AMs as optional, check
GetTableAmRoutine(). And, there was a point upthread to provide
default/fallback implementation to help not fail insert operations on
tables without the new table AMs implemented. FWIW, the default
implementation was just doing the single inserts. The
table_modify_begin and friends need the logic to fallback making the
code there look different than other AMs. However, I now have a
feeling to drop the idea of having fallback implementation and let the
AMs deal with it. Although it might create some friction with various
non-core AM implementations, it keeps this patch simple which I would
vote for. Thoughts?

> * Why are table_modify_state and insert_modify_buffer_flush_context
> globals? What if there are multiple modify nodes in a plan?

Can you please provide the case that can generate multiple "modify
nodes" in a single plan? AFAICS, multiple "modify nodes" in a plan can
exist for both partitioned tables and tables that get created as part
of CTEs. I disabled multi inserts for both of these cases. The way I
disabled for CTEs looks pretty naive - I just did the following. Any
better suggestions here to deal with all such cases?

+        if (operation == CMD_INSERT &&
+            nodeTag(subplanstate) == T_SeqScanState)
+            canMultiInsert = true;

> * Can you explain the design in logical rep?

Multi inserts for logical replication work at the table level. In
other words, all tuple inserts related to a single table within a
transaction are buffered and written to the corresponding table when
necessary. Whenever inserts pertaining to another table arrive, the
buffered tuples related to the previous table are written to the table
before starting the buffering for the new table. Also, the tuples are
written to the table from the buffer when there arrives a non-INSERT
operation, for example, UPDATE/DELETE/TRUNCATE/COMMIT etc. FWIW,
pglogical has the similar multi inserts logic -
https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical_apply_heap.c#L879.

Please find the v22 patches with the above changes.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


Attachments:

  [application/x-patch] v22-0001-Introduce-new-Table-AM-for-multi-inserts.patch (15.2K, 2-v22-0001-Introduce-new-Table-AM-for-multi-inserts.patch)
  download | inline diff:
From ee6714e79bdcfe3d0e104f84caee8bd4be730ddb Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Tue, 4 Jun 2024 20:02:52 +0000
Subject: [PATCH v22 1/5] Introduce new Table AM for multi inserts

Until now, it's the COPY ... FROM command using multi inserts
(i.e. buffer some tuples and inserts them to table at once).
Various other commands can benefit from this multi insert
logic [Reusable].

Also, there's a need to have these multi insert AMs
(Access Methods) as scan-like API [Usability].

Also, there's a need allow various table AMs define their own
buffering and flushing strategy [Flexibility].

This commit introduces, new table AMs for multi inserts to help
achieve all of the above.

Upcoming commits will have these new table AMs being used for
various other commands.

Author: Bharath Rupireddy
Reviewed-by: Jeff Davis
Discussion: https://www.postgresql.org/message-id/[email protected]
Discussion: https://www.postgresql.org/message-id/CALj2ACVi9eTRYR%3Dgdca5wxtj3Kk_9q9qVccxsS1hngTGOCjPwQ%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/8633171cb034aafc260fdf37df04b6c779aa1e2f.camel%40j-davis.com
---
 src/backend/access/heap/heapam.c         | 197 ++++++++++++++++++++++-
 src/backend/access/heap/heapam_handler.c |   6 +
 src/backend/access/table/tableamapi.c    |   5 +
 src/include/access/heapam.h              |  38 +++++
 src/include/access/tableam.h             |  80 +++++++++
 src/tools/pgindent/typedefs.list         |   3 +
 6 files changed, 328 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 82bb9cb33b..aaf7a26389 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -64,6 +64,7 @@
 #include "storage/standby.h"
 #include "utils/datum.h"
 #include "utils/inval.h"
+#include "utils/memutils.h"
 #include "utils/relcache.h"
 #include "utils/snapmgr.h"
 #include "utils/spccache.h"
@@ -112,7 +113,7 @@ static int	bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate);
 static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
 static HeapTuple ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required,
 										bool *copy);
-
+static void heap_modify_insert_end_callback(TableModifyState *state);
 
 /*
  * Each tuple lock mode has a corresponding heavyweight lock, and one or two
@@ -2611,6 +2612,200 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 	pgstat_count_heap_insert(relation, ntuples);
 }
 
+/*
+ * Initialize heap modify state.
+ */
+TableModifyState *
+heap_modify_begin(Relation rel, int modify_flags,
+				  CommandId cid, int options,
+				  TableModifyBufferFlushCallback modify_buffer_flush_callback,
+				  void *modify_buffer_flush_context)
+{
+	TableModifyState *state;
+	MemoryContext context;
+	MemoryContext oldcontext;
+
+	context = AllocSetContextCreate(TopTransactionContext,
+									"heap_modify memory context",
+									ALLOCSET_DEFAULT_SIZES);
+
+	oldcontext = MemoryContextSwitchTo(context);
+	state = palloc0(sizeof(TableModifyState));
+	state->rel = rel;
+	state->modify_flags = modify_flags;
+	state->mem_cxt = context;
+	state->cid = cid;
+	state->options = options;
+	state->modify_buffer_flush_callback = modify_buffer_flush_callback;
+	state->modify_buffer_flush_context = modify_buffer_flush_context;
+	state->modify_end_callback = NULL;	/* To be installed lazily */
+	MemoryContextSwitchTo(oldcontext);
+
+	return state;
+}
+
+/*
+ * Store passed-in tuple into in-memory buffered slots. When full, insert
+ * multiple tuples from the buffers into heap.
+ */
+void
+heap_modify_buffer_insert(TableModifyState *state,
+						  TupleTableSlot *slot)
+{
+	TupleTableSlot *dstslot;
+	HeapInsertState *istate;
+	HeapMultiInsertState *mistate;
+	MemoryContext oldcontext;
+
+	oldcontext = MemoryContextSwitchTo(state->mem_cxt);
+
+	/* First time through, initialize heap insert state */
+	if (state->data == NULL)
+	{
+		istate = (HeapInsertState *) palloc0(sizeof(HeapInsertState));
+		istate->bistate = NULL;
+		istate->mistate = NULL;
+		state->data = istate;
+		mistate = (HeapMultiInsertState *) palloc0(sizeof(HeapMultiInsertState));
+		mistate->slots = (TupleTableSlot **) palloc0(sizeof(TupleTableSlot *) * HEAP_MAX_BUFFERED_SLOTS);
+		istate->mistate = mistate;
+		mistate->mem_cxt = AllocSetContextCreate(CurrentMemoryContext,
+												 "heap_multi_insert memory context",
+												 ALLOCSET_DEFAULT_SIZES);
+
+		if ((state->modify_flags & TM_FLAG_BAS_BULKWRITE) != 0)
+			istate->bistate = GetBulkInsertState();
+
+		state->modify_end_callback = heap_modify_insert_end_callback;
+	}
+
+	istate = (HeapInsertState *) state->data;
+	Assert(istate->mistate != NULL);
+	mistate = istate->mistate;
+
+	dstslot = mistate->slots[mistate->cur_slots];
+	if (dstslot == NULL)
+	{
+		/*
+		 * We use virtual tuple slots buffered slots for leveraging the
+		 * optimization it provides to minimize physical data copying. The
+		 * virtual slot gets materialized when we copy (via below
+		 * ExecCopySlot) the tuples from the source slot which can be of any
+		 * type. This way, it is ensured that the tuple storage doesn't depend
+		 * on external memory, because all the datums that aren't passed by
+		 * value are copied into the slot's memory context.
+		 */
+		dstslot = MakeTupleTableSlot(RelationGetDescr(state->rel),
+									 &TTSOpsVirtual);
+		mistate->slots[mistate->cur_slots] = dstslot;
+	}
+
+	Assert(TTS_IS_VIRTUAL(dstslot));
+
+	/*
+	 * Note that the copy clears the previous destination slot contents, so
+	 * there's no need of explicit ExecClearTuple here.
+	 */
+	ExecCopySlot(dstslot, slot);
+
+	mistate->cur_slots++;
+
+	if (mistate->cur_slots >= HEAP_MAX_BUFFERED_SLOTS)
+		heap_modify_buffer_flush(state);
+
+	MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * Insert multiple tuples from in-memory buffered slots into heap.
+ */
+void
+heap_modify_buffer_flush(TableModifyState *state)
+{
+	HeapInsertState *istate;
+	HeapMultiInsertState *mistate;
+	MemoryContext oldcontext;
+
+	/* Quick exit if we haven't inserted anything yet */
+	if (state->data == NULL)
+		return;
+
+	istate = (HeapInsertState *) state->data;
+	Assert(istate->mistate != NULL);
+	mistate = istate->mistate;
+
+	/* Quick exit if we have flushed already */
+	if (mistate->cur_slots == 0)
+		return;
+
+	/*
+	 * heap_multi_insert may leak memory, so switch to short-lived memory
+	 * context before calling it.
+	 */
+	oldcontext = MemoryContextSwitchTo(mistate->mem_cxt);
+
+	heap_multi_insert(state->rel, mistate->slots, mistate->cur_slots,
+					  state->cid, state->options, istate->bistate);
+
+	MemoryContextSwitchTo(oldcontext);
+	MemoryContextReset(mistate->mem_cxt);
+
+	if (state->modify_buffer_flush_callback != NULL)
+	{
+		for (int i = 0; i < mistate->cur_slots; i++)
+			state->modify_buffer_flush_callback(state->modify_buffer_flush_context,
+												mistate->slots[i]);
+	}
+
+	mistate->cur_slots = 0;
+}
+
+/*
+ * Heap insert specific callback used for performing work at the end like
+ * flushing buffered tuples if any, cleaning up the insert state and buffered
+ * slots.
+ */
+static void
+heap_modify_insert_end_callback(TableModifyState *state)
+{
+	HeapInsertState *istate;
+
+	/* Quick exit if we haven't inserted anything yet */
+	if (state->data == NULL)
+		return;
+
+	istate = (HeapInsertState *) state->data;
+
+	if (istate->mistate != NULL)
+	{
+		HeapMultiInsertState *mistate = istate->mistate;
+
+		heap_modify_buffer_flush(state);
+
+		Assert(mistate->cur_slots == 0);
+
+		for (int i = 0; i < HEAP_MAX_BUFFERED_SLOTS && mistate->slots[i] != NULL; i++)
+			ExecDropSingleTupleTableSlot(mistate->slots[i]);
+
+		MemoryContextDelete(mistate->mem_cxt);
+	}
+
+	if (istate->bistate != NULL)
+		FreeBulkInsertState(istate->bistate);
+}
+
+/*
+ * Clean heap modify state.
+ */
+void
+heap_modify_end(TableModifyState *state)
+{
+	if (state->modify_end_callback != NULL)
+		state->modify_end_callback(state);
+
+	MemoryContextDelete(state->mem_cxt);
+}
+
 /*
  *	simple_heap_insert - insert a tuple
  *
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 6f8b1b7929..eda0c73a16 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2615,6 +2615,12 @@ static const TableAmRoutine heapam_methods = {
 	.tuple_insert_speculative = heapam_tuple_insert_speculative,
 	.tuple_complete_speculative = heapam_tuple_complete_speculative,
 	.multi_insert = heap_multi_insert,
+
+	.tuple_modify_begin = heap_modify_begin,
+	.tuple_modify_buffer_insert = heap_modify_buffer_insert,
+	.tuple_modify_buffer_flush = heap_modify_buffer_flush,
+	.tuple_modify_end = heap_modify_end,
+
 	.tuple_delete = heapam_tuple_delete,
 	.tuple_update = heapam_tuple_update,
 	.tuple_lock = heapam_tuple_lock,
diff --git a/src/backend/access/table/tableamapi.c b/src/backend/access/table/tableamapi.c
index e9b598256f..772f29b1b5 100644
--- a/src/backend/access/table/tableamapi.c
+++ b/src/backend/access/table/tableamapi.c
@@ -97,6 +97,11 @@ GetTableAmRoutine(Oid amhandler)
 	Assert(routine->scan_sample_next_block != NULL);
 	Assert(routine->scan_sample_next_tuple != NULL);
 
+	Assert(routine->tuple_modify_begin != NULL);
+	Assert(routine->tuple_modify_buffer_insert != NULL);
+	Assert(routine->tuple_modify_buffer_flush != NULL);
+	Assert(routine->tuple_modify_end != NULL);
+
 	return routine;
 }
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index c47a5045ce..36ea3d5d2c 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -271,6 +271,32 @@ typedef enum
 	PRUNE_VACUUM_CLEANUP,		/* VACUUM 2nd heap pass */
 } PruneReason;
 
+/*
+ * Maximum number of slots that multi-insert buffers can hold.
+ *
+ * Caution: Don't make this too big, as we could end up with this many tuples
+ * stored in multi insert buffer.
+ */
+#define HEAP_MAX_BUFFERED_SLOTS		1000
+
+typedef struct HeapMultiInsertState
+{
+	/* Array of buffered slots */
+	TupleTableSlot **slots;
+
+	/* Number of buffered slots currently held */
+	int			cur_slots;
+
+	MemoryContext mem_cxt;
+} HeapMultiInsertState;
+
+typedef struct HeapInsertState
+{
+	struct BulkInsertStateData *bistate;
+	HeapMultiInsertState *mistate;
+} HeapInsertState;
+
+
 /* ----------------
  *		function prototypes for heap access method
  *
@@ -321,6 +347,18 @@ extern void heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 extern void heap_multi_insert(Relation relation, struct TupleTableSlot **slots,
 							  int ntuples, CommandId cid, int options,
 							  BulkInsertState bistate);
+
+extern TableModifyState *heap_modify_begin(Relation rel,
+										   int modify_flags,
+										   CommandId cid,
+										   int options,
+										   TableModifyBufferFlushCallback modify_buffer_flush_callback,
+										   void *modify_buffer_flush_context);
+extern void heap_modify_buffer_insert(TableModifyState *state,
+									  TupleTableSlot *slot);
+extern void heap_modify_buffer_flush(TableModifyState *state);
+extern void heap_modify_end(TableModifyState *state);
+
 extern TM_Result heap_delete(Relation relation, ItemPointer tid,
 							 CommandId cid, Snapshot crosscheck, bool wait,
 							 struct TM_FailureData *tmfd, bool changingPart);
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 8e583b45cd..9e9b9771de 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -255,6 +255,39 @@ typedef struct TM_IndexDeleteOp
 	TM_IndexStatus *status;
 } TM_IndexDeleteOp;
 
+/* Table modify flags */
+
+/* Use BAS_BULKWRITE buffer access strategy */
+#define TM_FLAG_BAS_BULKWRITE	0x000001
+
+struct TableModifyState;
+
+/* Callback invoked for each tuple that gets flushed to disk from buffer */
+typedef void (*TableModifyBufferFlushCallback) (void *context,
+												TupleTableSlot *slot);
+
+/* Table AM specific callback that gets called in table_modify_end() */
+typedef void (*TableModifyEndCallback) (struct TableModifyState *state);
+
+/* Holds table modify state */
+typedef struct TableModifyState
+{
+	Relation	rel;
+	int			modify_flags;
+	MemoryContext mem_cxt;
+	CommandId	cid;
+	int			options;
+
+	/* Flush callback and its context */
+	TableModifyBufferFlushCallback modify_buffer_flush_callback;
+	void	   *modify_buffer_flush_context;
+
+	/* Table AM specific data */
+	void	   *data;
+
+	TableModifyEndCallback modify_end_callback;
+} TableModifyState;
+
 /* "options" flag bits for table_tuple_insert */
 /* TABLE_INSERT_SKIP_WAL was 0x0001; RelationNeedsWAL() now governs */
 #define TABLE_INSERT_SKIP_FSM		0x0002
@@ -578,6 +611,21 @@ typedef struct TableAmRoutine
 	void		(*finish_bulk_insert) (Relation rel, int options);
 
 
+	/* ------------------------------------------------------------------------
+	 * Table Modify related functions.
+	 * ------------------------------------------------------------------------
+	 */
+	TableModifyState *(*tuple_modify_begin) (Relation rel,
+											 int modify_flags,
+											 CommandId cid,
+											 int options,
+											 TableModifyBufferFlushCallback modify_buffer_flush_callback,
+											 void *modify_buffer_flush_context);
+	void		(*tuple_modify_buffer_insert) (TableModifyState *state,
+											   TupleTableSlot *slot);
+	void		(*tuple_modify_buffer_flush) (TableModifyState *state);
+	void		(*tuple_modify_end) (TableModifyState *state);
+
 	/* ------------------------------------------------------------------------
 	 * DDL related functionality.
 	 * ------------------------------------------------------------------------
@@ -1609,6 +1657,38 @@ table_finish_bulk_insert(Relation rel, int options)
 		rel->rd_tableam->finish_bulk_insert(rel, options);
 }
 
+/* ------------------------------------------------------------------------
+ * Table Modify related functions.
+ * ------------------------------------------------------------------------
+ */
+static inline TableModifyState *
+table_modify_begin(Relation rel, int modify_flags, CommandId cid, int options,
+				   TableModifyBufferFlushCallback modify_buffer_flush_callback,
+				   void *modify_buffer_flush_context)
+{
+	return rel->rd_tableam->tuple_modify_begin(rel, modify_flags,
+											   cid, options,
+											   modify_buffer_flush_callback,
+											   modify_buffer_flush_context);
+}
+
+static inline void
+table_modify_buffer_insert(TableModifyState *state, TupleTableSlot *slot)
+{
+	state->rel->rd_tableam->tuple_modify_buffer_insert(state, slot);
+}
+
+static inline void
+table_modify_buffer_flush(TableModifyState *state)
+{
+	state->rel->rd_tableam->tuple_modify_buffer_flush(state);
+}
+
+static inline void
+table_modify_end(TableModifyState *state)
+{
+	state->rel->rd_tableam->tuple_modify_end(state);
+}
 
 /* ------------------------------------------------------------------------
  * DDL related functionality.
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index d427a1c16a..84baf9b78d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1141,6 +1141,8 @@ HeadlineJsonState
 HeadlineParsedText
 HeadlineWordEntry
 HeapCheckContext
+HeapInsertState
+HeapMultiInsertState
 HeapPageFreeze
 HeapScanDesc
 HeapTuple
@@ -2859,6 +2861,7 @@ TableFuncScanState
 TableFuncType
 TableInfo
 TableLikeClause
+TableModifyState
 TableSampleClause
 TableScanDesc
 TableScanDescData
-- 
2.34.1



  [application/x-patch] v22-0002-Optimize-various-SQL-commands-with-new-multi-ins.patch (7.4K, 3-v22-0002-Optimize-various-SQL-commands-with-new-multi-ins.patch)
  download | inline diff:
From e029a166d6bc7550d544f93817c9c1b6d4aa368e Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Tue, 4 Jun 2024 20:17:45 +0000
Subject: [PATCH v22 2/5] Optimize various SQL commands with new multi insert
 table AM

This commit optimizes the following commands for heap AM using new
multi insert table AM added by commit <<CHANGE_ME>>:
- CREATE TABLE AS
- CREATE MATERIALIZED VIEW
- REFRESH MATERIALIZED VIEW
- Table Rewrites

Author: Bharath Rupireddy
Reviewed-by: Jeff Davis
Discussion: https://www.postgresql.org/message-id/CALj2ACVi9eTRYR%3Dgdca5wxtj3Kk_9q9qVccxsS1hngTGOCjPwQ%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/8633171cb034aafc260fdf37df04b6c779aa1e2f.camel%40j-davis.com
---
 src/backend/commands/createas.c  | 26 ++++++++++----------------
 src/backend/commands/matview.c   | 25 ++++++++++---------------
 src/backend/commands/tablecmds.c | 30 ++++++++++--------------------
 3 files changed, 30 insertions(+), 51 deletions(-)

diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 62050f4dc5..794c735b78 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -53,9 +53,7 @@ typedef struct
 	/* These fields are filled by intorel_startup: */
 	Relation	rel;			/* relation to write to */
 	ObjectAddress reladdr;		/* address of rel, for ExecCreateTableAs */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableModifyState *mstate;	/* table insert state */
 } DR_intorel;
 
 /* utility functions for CTAS definition creation */
@@ -552,17 +550,20 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	 */
 	myState->rel = intoRelationDesc;
 	myState->reladdr = intoRelationAddr;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM;
 
 	/*
 	 * If WITH NO DATA is specified, there is no need to set up the state for
 	 * bulk inserts as there are no tuples to insert.
 	 */
 	if (!into->skipData)
-		myState->bistate = GetBulkInsertState();
+		myState->mstate = table_modify_begin(intoRelationDesc,
+											 TM_FLAG_BAS_BULKWRITE,
+											 GetCurrentCommandId(true),
+											 TABLE_INSERT_SKIP_FSM,
+											 NULL,
+											 NULL);
 	else
-		myState->bistate = NULL;
+		myState->mstate = NULL;
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -590,11 +591,7 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
 		 * would not be cheap either. This also doesn't allow accessing per-AM
 		 * data (say a tuple's xmin), but since we don't do that here...
 		 */
-		table_tuple_insert(myState->rel,
-						   slot,
-						   myState->output_cid,
-						   myState->ti_options,
-						   myState->bistate);
+		table_modify_buffer_insert(myState->mstate, slot);
 	}
 
 	/* We know this is a newly created relation, so there are no indexes */
@@ -612,10 +609,7 @@ intorel_shutdown(DestReceiver *self)
 	IntoClause *into = myState->into;
 
 	if (!into->skipData)
-	{
-		FreeBulkInsertState(myState->bistate);
-		table_finish_bulk_insert(myState->rel, myState->ti_options);
-	}
+		table_modify_end(myState->mstate);
 
 	/* close rel, but keep lock until commit */
 	table_close(myState->rel, NoLock);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 6d09b75556..c23861c505 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -48,9 +48,7 @@ typedef struct
 	Oid			transientoid;	/* OID of new heap into which to store */
 	/* These fields are filled by transientrel_startup: */
 	Relation	transientrel;	/* relation to write to */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableModifyState *mstate;	/* table insert state */
 } DR_transientrel;
 
 static int	matview_maintenance_depth = 0;
@@ -458,9 +456,13 @@ transientrel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	 * Fill private fields of myState for use by later routines
 	 */
 	myState->transientrel = transientrel;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM | TABLE_INSERT_FROZEN;
-	myState->bistate = GetBulkInsertState();
+	myState->mstate = table_modify_begin(transientrel,
+										 TM_FLAG_BAS_BULKWRITE,
+										 GetCurrentCommandId(true),
+										 TABLE_INSERT_SKIP_FSM |
+										 TABLE_INSERT_FROZEN,
+										 NULL,
+										 NULL);
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -485,12 +487,7 @@ transientrel_receive(TupleTableSlot *slot, DestReceiver *self)
 	 * cheap either. This also doesn't allow accessing per-AM data (say a
 	 * tuple's xmin), but since we don't do that here...
 	 */
-
-	table_tuple_insert(myState->transientrel,
-					   slot,
-					   myState->output_cid,
-					   myState->ti_options,
-					   myState->bistate);
+	table_modify_buffer_insert(myState->mstate, slot);
 
 	/* We know this is a newly created relation, so there are no indexes */
 
@@ -505,9 +502,7 @@ transientrel_shutdown(DestReceiver *self)
 {
 	DR_transientrel *myState = (DR_transientrel *) self;
 
-	FreeBulkInsertState(myState->bistate);
-
-	table_finish_bulk_insert(myState->transientrel, myState->ti_options);
+	table_modify_end(myState->mstate);
 
 	/* close transientrel, but keep lock until commit */
 	table_close(myState->transientrel, NoLock);
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 7b6c69b7a5..9b79f46d7a 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -5966,10 +5966,8 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 	int			i;
 	ListCell   *l;
 	EState	   *estate;
-	CommandId	mycid;
-	BulkInsertState bistate;
-	int			ti_options;
 	ExprState  *partqualstate = NULL;
+	TableModifyState *mstate = NULL;
 
 	/*
 	 * Open the relation(s).  We have surely already locked the existing
@@ -5988,18 +5986,14 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 	 * Prepare a BulkInsertState and options for table_tuple_insert.  The FSM
 	 * is empty, so don't bother using it.
 	 */
-	if (newrel)
+	if (newrel && mstate == NULL)
 	{
-		mycid = GetCurrentCommandId(true);
-		bistate = GetBulkInsertState();
-		ti_options = TABLE_INSERT_SKIP_FSM;
-	}
-	else
-	{
-		/* keep compiler quiet about using these uninitialized */
-		mycid = 0;
-		bistate = NULL;
-		ti_options = 0;
+		mstate = table_modify_begin(newrel,
+									TM_FLAG_BAS_BULKWRITE,
+									GetCurrentCommandId(true),
+									TABLE_INSERT_SKIP_FSM,
+									NULL,
+									NULL);
 	}
 
 	/*
@@ -6297,8 +6291,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 
 			/* Write the tuple out to the new relation */
 			if (newrel)
-				table_tuple_insert(newrel, insertslot, mycid,
-								   ti_options, bistate);
+				table_modify_buffer_insert(mstate, insertslot);
 
 			ResetExprContext(econtext);
 
@@ -6319,10 +6312,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 	table_close(oldrel, NoLock);
 	if (newrel)
 	{
-		FreeBulkInsertState(bistate);
-
-		table_finish_bulk_insert(newrel, ti_options);
-
+		table_modify_end(mstate);
 		table_close(newrel, NoLock);
 	}
 }
-- 
2.34.1



  [application/x-patch] v22-0003-Optimize-INSERT-INTO-SELECT-with-new-multi-inser.patch (9.6K, 4-v22-0003-Optimize-INSERT-INTO-SELECT-with-new-multi-inser.patch)
  download | inline diff:
From d30d0c31ceecac5a0f7d61b7e096225dcfdba3c9 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Tue, 4 Jun 2024 20:23:03 +0000
Subject: [PATCH v22 3/5] Optimize INSERT INTO SELECT with new multi insert
 table AM

This commit optimizes the INSERT INTO SELECT query for heap AM
using new multi insert table AM added by commit <<CHANGE_ME>>.

Author: Bharath Rupireddy
Reviewed-by: Jeff Davis
Discussion: https://www.postgresql.org/message-id/CALj2ACVi9eTRYR%3Dgdca5wxtj3Kk_9q9qVccxsS1hngTGOCjPwQ%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/8633171cb034aafc260fdf37df04b6c779aa1e2f.camel%40j-davis.com
---
 src/backend/executor/nodeModifyTable.c | 170 ++++++++++++++++++++++---
 src/tools/pgindent/typedefs.list       |   1 +
 2 files changed, 153 insertions(+), 18 deletions(-)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index cee60d3659..582c4a9842 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -114,6 +114,18 @@ typedef struct UpdateContext
 	LockTupleMode lockmode;
 } UpdateContext;
 
+typedef struct InsertModifyBufferFlushContext
+{
+	ResultRelInfo *resultRelInfo;
+	EState	   *estate;
+	ModifyTableState *mtstate;
+} InsertModifyBufferFlushContext;
+
+static InsertModifyBufferFlushContext *insert_modify_buffer_flush_context = NULL;
+static TableModifyState *table_modify_state = NULL;
+
+static void InsertModifyBufferFlushCallback(void *context,
+											TupleTableSlot *slot);
 
 static void ExecBatchInsert(ModifyTableState *mtstate,
 							ResultRelInfo *resultRelInfo,
@@ -726,6 +738,55 @@ ExecGetUpdateNewTuple(ResultRelInfo *relinfo,
 	return ExecProject(newProj);
 }
 
+static void
+InsertModifyBufferFlushCallback(void *context, TupleTableSlot *slot)
+{
+	InsertModifyBufferFlushContext *ctx = (InsertModifyBufferFlushContext *) context;
+	ResultRelInfo *resultRelInfo = ctx->resultRelInfo;
+	EState	   *estate = ctx->estate;
+	ModifyTableState *mtstate = ctx->mtstate;
+
+	/* Quick exit if no indexes or no triggers */
+	if (!(resultRelInfo->ri_NumIndices > 0 ||
+		  (resultRelInfo->ri_TrigDesc != NULL &&
+		   (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+			resultRelInfo->ri_TrigDesc->trig_insert_new_table))))
+		return;
+
+	/* Caller must take care of opening and closing the indices */
+
+	/*
+	 * If there are any indexes, update them for all the inserted tuples, and
+	 * run AFTER ROW INSERT triggers.
+	 */
+	if (resultRelInfo->ri_NumIndices > 0)
+	{
+		List	   *recheckIndexes;
+
+		recheckIndexes =
+			ExecInsertIndexTuples(resultRelInfo,
+								  slot, estate, false,
+								  false, NULL, NIL, false);
+		ExecARInsertTriggers(estate, resultRelInfo,
+							 slot, recheckIndexes,
+							 mtstate->mt_transition_capture);
+		list_free(recheckIndexes);
+	}
+
+	/*
+	 * There's no indexes, but see if we need to run AFTER ROW INSERT triggers
+	 * anyway.
+	 */
+	else if (resultRelInfo->ri_TrigDesc != NULL &&
+			 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+			  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+	{
+		ExecARInsertTriggers(estate, resultRelInfo,
+							 slot, NIL,
+							 mtstate->mt_transition_capture);
+	}
+}
+
 /* ----------------------------------------------------------------
  *		ExecInsert
  *
@@ -751,7 +812,8 @@ ExecInsert(ModifyTableContext *context,
 		   TupleTableSlot *slot,
 		   bool canSetTag,
 		   TupleTableSlot **inserted_tuple,
-		   ResultRelInfo **insert_destrel)
+		   ResultRelInfo **insert_destrel,
+		   bool canMultiInsert)
 {
 	ModifyTableState *mtstate = context->mtstate;
 	EState	   *estate = context->estate;
@@ -764,6 +826,7 @@ ExecInsert(ModifyTableContext *context,
 	OnConflictAction onconflict = node->onConflictAction;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
 	MemoryContext oldContext;
+	bool		ar_insert_triggers_executed = false;
 
 	/*
 	 * If the input result relation is a partitioned table, find the leaf
@@ -1126,17 +1189,53 @@ ExecInsert(ModifyTableContext *context,
 		}
 		else
 		{
-			/* insert the tuple normally */
-			table_tuple_insert(resultRelationDesc, slot,
-							   estate->es_output_cid,
-							   0, NULL);
+			if (canMultiInsert &&
+				proute == NULL &&
+				resultRelInfo->ri_WithCheckOptions == NIL &&
+				resultRelInfo->ri_projectReturning == NULL)
+			{
+				if (insert_modify_buffer_flush_context == NULL)
+				{
+					insert_modify_buffer_flush_context =
+						(InsertModifyBufferFlushContext *) palloc0(sizeof(InsertModifyBufferFlushContext));
+					insert_modify_buffer_flush_context->resultRelInfo = resultRelInfo;
+					insert_modify_buffer_flush_context->estate = estate;
+					insert_modify_buffer_flush_context->mtstate = mtstate;
+				}
 
-			/* insert index entries for tuple */
-			if (resultRelInfo->ri_NumIndices > 0)
-				recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
-													   slot, estate, false,
-													   false, NULL, NIL,
-													   false);
+				if (table_modify_state == NULL)
+				{
+					table_modify_state = table_modify_begin(resultRelInfo->ri_RelationDesc,
+															0,
+															estate->es_output_cid,
+															0,
+															InsertModifyBufferFlushCallback,
+															insert_modify_buffer_flush_context);
+				}
+
+				table_modify_buffer_insert(table_modify_state, slot);
+				ar_insert_triggers_executed = true;
+			}
+			else
+			{
+				/* insert the tuple normally */
+				table_tuple_insert(resultRelationDesc, slot,
+								   estate->es_output_cid,
+								   0, NULL);
+
+				/* insert index entries for tuple */
+				if (resultRelInfo->ri_NumIndices > 0)
+					recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+														   slot, estate, false,
+														   false, NULL, NIL,
+														   false);
+
+				ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
+									 mtstate->mt_transition_capture);
+
+				list_free(recheckIndexes);
+				ar_insert_triggers_executed = true;
+			}
 		}
 	}
 
@@ -1170,10 +1269,12 @@ ExecInsert(ModifyTableContext *context,
 	}
 
 	/* AFTER ROW INSERT Triggers */
-	ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
-						 ar_insert_trig_tcs);
-
-	list_free(recheckIndexes);
+	if (!ar_insert_triggers_executed)
+	{
+		ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
+							 ar_insert_trig_tcs);
+		list_free(recheckIndexes);
+	}
 
 	/*
 	 * Check any WITH CHECK OPTION constraints from parent views.  We are
@@ -1869,7 +1970,7 @@ ExecCrossPartitionUpdate(ModifyTableContext *context,
 	/* Tuple routing starts from the root table. */
 	context->cpUpdateReturningSlot =
 		ExecInsert(context, mtstate->rootResultRelInfo, slot, canSetTag,
-				   inserted_tuple, insert_destrel);
+				   inserted_tuple, insert_destrel, false);
 
 	/*
 	 * Reset the transition state that may possibly have been written by
@@ -3364,7 +3465,7 @@ ExecMergeNotMatched(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 				mtstate->mt_merge_action = action;
 
 				rslot = ExecInsert(context, mtstate->rootResultRelInfo,
-								   newslot, canSetTag, NULL, NULL);
+								   newslot, canSetTag, NULL, NULL, false);
 				mtstate->mt_merge_inserted += 1;
 				break;
 			case CMD_NOTHING:
@@ -3749,6 +3850,10 @@ ExecModifyTable(PlanState *pstate)
 	HeapTupleData oldtupdata;
 	HeapTuple	oldtuple;
 	ItemPointer tupleid;
+	bool		canMultiInsert = false;
+
+	table_modify_state = NULL;
+	insert_modify_buffer_flush_context = NULL;
 
 	CHECK_FOR_INTERRUPTS();
 
@@ -3844,6 +3949,10 @@ ExecModifyTable(PlanState *pstate)
 		if (TupIsNull(context.planSlot))
 			break;
 
+		if (operation == CMD_INSERT &&
+			nodeTag(subplanstate) == T_SeqScanState)
+			canMultiInsert = true;
+
 		/*
 		 * When there are multiple result relations, each tuple contains a
 		 * junk column that gives the OID of the rel from which it came.
@@ -4057,7 +4166,7 @@ ExecModifyTable(PlanState *pstate)
 					ExecInitInsertProjection(node, resultRelInfo);
 				slot = ExecGetInsertNewTuple(resultRelInfo, context.planSlot);
 				slot = ExecInsert(&context, resultRelInfo, slot,
-								  node->canSetTag, NULL, NULL);
+								  node->canSetTag, NULL, NULL, canMultiInsert);
 				break;
 
 			case CMD_UPDATE:
@@ -4116,6 +4225,17 @@ ExecModifyTable(PlanState *pstate)
 			return slot;
 	}
 
+	if (table_modify_state != NULL)
+	{
+		Assert(operation == CMD_INSERT);
+
+		table_modify_end(table_modify_state);
+		table_modify_state = NULL;
+
+		pfree(insert_modify_buffer_flush_context);
+		insert_modify_buffer_flush_context = NULL;
+	}
+
 	/*
 	 * Insert remaining tuples for batch insert.
 	 */
@@ -4228,6 +4348,9 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	mtstate->mt_merge_updated = 0;
 	mtstate->mt_merge_deleted = 0;
 
+	table_modify_state = NULL;
+	insert_modify_buffer_flush_context = NULL;
+
 	/*----------
 	 * Resolve the target relation. This is the same as:
 	 *
@@ -4681,6 +4804,17 @@ ExecEndModifyTable(ModifyTableState *node)
 {
 	int			i;
 
+	if (table_modify_state != NULL)
+	{
+		Assert(node->operation == CMD_INSERT);
+
+		table_modify_end(table_modify_state);
+		table_modify_state = NULL;
+
+		pfree(insert_modify_buffer_flush_context);
+		insert_modify_buffer_flush_context = NULL;
+	}
+
 	/*
 	 * Allow any FDWs to shut down
 	 */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 84baf9b78d..3353faa6cd 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1180,6 +1180,7 @@ ImportForeignSchema_function
 ImportQual
 InProgressEnt
 InProgressIO
+InsertModifyBufferFlushContext
 IncludeWal
 InclusionOpaque
 IncrementVarSublevelsUp_context
-- 
2.34.1



  [application/x-patch] v22-0004-Optimize-Logical-Replication-Apply-with-new-mult.patch (20.1K, 5-v22-0004-Optimize-Logical-Replication-Apply-with-new-mult.patch)
  download | inline diff:
From c7c2e8ebff060cbd36e508c7a0c28ba6696c427d Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Tue, 4 Jun 2024 20:24:21 +0000
Subject: [PATCH v22 4/5] Optimize Logical Replication Apply with new multi
 insert table AM

This commit optimizes the Logical Replication Apply for heap AM
using new multi insert table AM added by commit <<CHANGE_ME>>.

Author: Bharath Rupireddy
Reviewed-by: Jeff Davis
Discussion: https://www.postgresql.org/message-id/CALj2ACVi9eTRYR%3Dgdca5wxtj3Kk_9q9qVccxsS1hngTGOCjPwQ%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/8633171cb034aafc260fdf37df04b6c779aa1e2f.camel%40j-davis.com
---
 src/backend/executor/execReplication.c   |  39 +++
 src/backend/replication/logical/proto.c  |  24 ++
 src/backend/replication/logical/worker.c | 350 ++++++++++++++++++++++-
 src/include/executor/executor.h          |   4 +
 src/include/replication/logicalproto.h   |   2 +
 src/tools/pgindent/typedefs.list         |   2 +
 6 files changed, 408 insertions(+), 13 deletions(-)

diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index d0a89cd577..fae1375537 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -544,6 +544,45 @@ ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
 	}
 }
 
+void
+ExecRelationMultiInsert(TableModifyState *MultiInsertState,
+						ResultRelInfo *resultRelInfo,
+						EState *estate, TupleTableSlot *slot)
+{
+	bool		skip_tuple = false;
+	Relation	rel = resultRelInfo->ri_RelationDesc;
+
+	/* For now we support only tables. */
+	Assert(rel->rd_rel->relkind == RELKIND_RELATION);
+
+	CheckCmdReplicaIdentity(rel, CMD_INSERT);
+
+	/* BEFORE ROW INSERT Triggers */
+	if (resultRelInfo->ri_TrigDesc &&
+		resultRelInfo->ri_TrigDesc->trig_insert_before_row)
+	{
+		if (!ExecBRInsertTriggers(estate, resultRelInfo, slot))
+			skip_tuple = true;	/* "do nothing" */
+	}
+
+	if (!skip_tuple)
+	{
+		/* Compute stored generated columns */
+		if (rel->rd_att->constr &&
+			rel->rd_att->constr->has_generated_stored)
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_INSERT);
+
+		/* Check the constraints of the tuple */
+		if (rel->rd_att->constr)
+			ExecConstraints(resultRelInfo, slot, estate);
+		if (rel->rd_rel->relispartition)
+			ExecPartitionCheck(resultRelInfo, slot, estate, true);
+
+		table_modify_buffer_insert(MultiInsertState, slot);
+	}
+}
+
 /*
  * Find the searchslot tuple and update it with data in the slot,
  * update the indexes, and execute any constraints and per-row triggers.
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index 95c09c9516..46d38aebd2 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -427,6 +427,30 @@ logicalrep_write_insert(StringInfo out, TransactionId xid, Relation rel,
 	logicalrep_write_tuple(out, rel, newslot, binary, columns);
 }
 
+LogicalRepRelId
+logicalrep_read_relid(StringInfo in)
+{
+	LogicalRepRelId relid;
+
+	/* read the relation id */
+	relid = pq_getmsgint(in, 4);
+
+	return relid;
+}
+
+void
+logicalrep_read_insert_v2(StringInfo in, LogicalRepTupleData *newtup)
+{
+	char		action;
+
+	action = pq_getmsgbyte(in);
+	if (action != 'N')
+		elog(ERROR, "expected new tuple but got %d",
+			 action);
+
+	logicalrep_read_tuple(in, newtup);
+}
+
 /*
  * Read INSERT from stream.
  *
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index b5a80fe3e8..6d80f650ce 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -148,7 +148,6 @@
 #include <unistd.h>
 
 #include "access/table.h"
-#include "access/tableam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
 #include "catalog/indexing.h"
@@ -416,6 +415,30 @@ static inline void reset_apply_error_context_info(void);
 static TransApplyAction get_transaction_apply_action(TransactionId xid,
 													 ParallelApplyWorkerInfo **winfo);
 
+typedef enum LRMultiInsertReturnStatus
+{
+	LR_MULTI_INSERT_NONE = 0,
+	LR_MULTI_INSERT_REL_SKIPPED,
+	LR_MULTI_INSERT_DISALLOWED,
+	LR_MULTI_INSERT_DONE,
+} LRMultiInsertReturnStatus;
+
+static TableModifyState *MultiInsertState = NULL;
+static LogicalRepRelMapEntry *LastRel = NULL;
+static LogicalRepRelId LastMultiInsertRelId = InvalidOid;
+static ApplyExecutionData *LastEData = NULL;
+static TupleTableSlot *LastRemoteSlot = NULL;
+
+typedef struct LRModifyBufferFlushContext
+{
+	ResultRelInfo *resultRelInfo;
+	EState	   *estate;
+} LRModifyBufferFlushContext;
+
+static LRModifyBufferFlushContext *modify_buffer_flush_context = NULL;
+static void LRModifyBufferFlushCallback(void *context, TupleTableSlot *slot);
+static void FinishMultiInserts(void);
+
 /*
  * Form the origin name for the subscription.
  *
@@ -1017,6 +1040,8 @@ apply_handle_commit(StringInfo s)
 {
 	LogicalRepCommitData commit_data;
 
+	FinishMultiInserts();
+
 	logicalrep_read_commit(s, &commit_data);
 
 	if (commit_data.commit_lsn != remote_final_lsn)
@@ -1043,6 +1068,8 @@ apply_handle_begin_prepare(StringInfo s)
 {
 	LogicalRepPreparedTxnData begin_data;
 
+	FinishMultiInserts();
+
 	/* Tablesync should never receive prepare. */
 	if (am_tablesync_worker())
 		ereport(ERROR,
@@ -1109,6 +1136,8 @@ apply_handle_prepare(StringInfo s)
 {
 	LogicalRepPreparedTxnData prepare_data;
 
+	FinishMultiInserts();
+
 	logicalrep_read_prepare(s, &prepare_data);
 
 	if (prepare_data.prepare_lsn != remote_final_lsn)
@@ -1171,6 +1200,8 @@ apply_handle_commit_prepared(StringInfo s)
 	LogicalRepCommitPreparedTxnData prepare_data;
 	char		gid[GIDSIZE];
 
+	FinishMultiInserts();
+
 	logicalrep_read_commit_prepared(s, &prepare_data);
 	set_apply_error_context_xact(prepare_data.xid, prepare_data.commit_lsn);
 
@@ -1220,6 +1251,8 @@ apply_handle_rollback_prepared(StringInfo s)
 	LogicalRepRollbackPreparedTxnData rollback_data;
 	char		gid[GIDSIZE];
 
+	FinishMultiInserts();
+
 	logicalrep_read_rollback_prepared(s, &rollback_data);
 	set_apply_error_context_xact(rollback_data.xid, rollback_data.rollback_end_lsn);
 
@@ -1277,6 +1310,8 @@ apply_handle_stream_prepare(StringInfo s)
 	/* Save the message before it is consumed. */
 	StringInfoData original_msg = *s;
 
+	FinishMultiInserts();
+
 	if (in_streamed_transaction)
 		ereport(ERROR,
 				(errcode(ERRCODE_PROTOCOL_VIOLATION),
@@ -1304,6 +1339,8 @@ apply_handle_stream_prepare(StringInfo s)
 			apply_spooled_messages(MyLogicalRepWorker->stream_fileset,
 								   prepare_data.xid, prepare_data.prepare_lsn);
 
+			FinishMultiInserts();
+
 			/* Mark the transaction as prepared. */
 			apply_handle_prepare_internal(&prepare_data);
 
@@ -1407,6 +1444,8 @@ apply_handle_stream_prepare(StringInfo s)
 static void
 apply_handle_origin(StringInfo s)
 {
+	FinishMultiInserts();
+
 	/*
 	 * ORIGIN message can only come inside streaming transaction or inside
 	 * remote transaction and before any actual writes.
@@ -1473,6 +1512,8 @@ apply_handle_stream_start(StringInfo s)
 	/* Save the message before it is consumed. */
 	StringInfoData original_msg = *s;
 
+	FinishMultiInserts();
+
 	if (in_streamed_transaction)
 		ereport(ERROR,
 				(errcode(ERRCODE_PROTOCOL_VIOLATION),
@@ -1628,6 +1669,8 @@ apply_handle_stream_stop(StringInfo s)
 	ParallelApplyWorkerInfo *winfo;
 	TransApplyAction apply_action;
 
+	FinishMultiInserts();
+
 	if (!in_streamed_transaction)
 		ereport(ERROR,
 				(errcode(ERRCODE_PROTOCOL_VIOLATION),
@@ -1821,6 +1864,8 @@ apply_handle_stream_abort(StringInfo s)
 	StringInfoData original_msg = *s;
 	bool		toplevel_xact;
 
+	FinishMultiInserts();
+
 	if (in_streamed_transaction)
 		ereport(ERROR,
 				(errcode(ERRCODE_PROTOCOL_VIOLATION),
@@ -2138,6 +2183,8 @@ apply_handle_stream_commit(StringInfo s)
 	/* Save the message before it is consumed. */
 	StringInfoData original_msg = *s;
 
+	FinishMultiInserts();
+
 	if (in_streamed_transaction)
 		ereport(ERROR,
 				(errcode(ERRCODE_PROTOCOL_VIOLATION),
@@ -2159,6 +2206,8 @@ apply_handle_stream_commit(StringInfo s)
 			apply_spooled_messages(MyLogicalRepWorker->stream_fileset, xid,
 								   commit_data.commit_lsn);
 
+			FinishMultiInserts();
+
 			apply_handle_commit_internal(&commit_data);
 
 			/* Unlink the files with serialized changes and subxact info. */
@@ -2302,6 +2351,8 @@ apply_handle_relation(StringInfo s)
 {
 	LogicalRepRelation *rel;
 
+	FinishMultiInserts();
+
 	if (handle_streamed_transaction(LOGICAL_REP_MSG_RELATION, s))
 		return;
 
@@ -2325,6 +2376,8 @@ apply_handle_type(StringInfo s)
 {
 	LogicalRepTyp typ;
 
+	FinishMultiInserts();
+
 	if (handle_streamed_transaction(LOGICAL_REP_MSG_TYPE, s))
 		return;
 
@@ -2363,16 +2416,126 @@ TargetPrivilegesCheck(Relation rel, AclMode mode)
 						RelationGetRelationName(rel))));
 }
 
-/*
- * Handle INSERT message.
- */
+static void
+FinishMultiInserts(void)
+{
+	LogicalRepMsgType saved_command;
+
+	if (MultiInsertState == NULL)
+		return;
+
+	Assert(OidIsValid(LastMultiInsertRelId));
+	Assert(LastEData != NULL);
+
+	/* Set relation for error callback */
+	apply_error_callback_arg.rel = LastRel;
+
+	/* Set current command for error callback */
+	saved_command = apply_error_callback_arg.command;
+	apply_error_callback_arg.command = LOGICAL_REP_MSG_INSERT;
+
+	ExecDropSingleTupleTableSlot(LastRemoteSlot);
+	LastRemoteSlot = NULL;
+
+	table_modify_end(MultiInsertState);
+	MultiInsertState = NULL;
+	LastMultiInsertRelId = InvalidOid;
+
+	pfree(modify_buffer_flush_context);
+	modify_buffer_flush_context = NULL;
+
+	ExecCloseIndices(LastEData->targetRelInfo);
+
+	finish_edata(LastEData);
+	LastEData = NULL;
+
+	/* Reset relation for error callback */
+	apply_error_callback_arg.rel = NULL;
+
+	/* Reset the current command */
+	apply_error_callback_arg.command = saved_command;
+
+	logicalrep_rel_close(LastRel, NoLock);
+	LastRel = NULL;
+
+	end_replication_step();
+}
 
 static void
-apply_handle_insert(StringInfo s)
+LRModifyBufferFlushCallback(void *context, TupleTableSlot *slot)
+{
+	LRModifyBufferFlushContext *ctx = (LRModifyBufferFlushContext *) context;
+	ResultRelInfo *resultRelInfo = ctx->resultRelInfo;
+	EState	   *estate = ctx->estate;
+	LogicalRepMsgType saved_command;
+
+	/* Quick exit if no indexes or no triggers */
+	if (!(resultRelInfo->ri_NumIndices > 0 ||
+		  (resultRelInfo->ri_TrigDesc != NULL &&
+		   (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+			resultRelInfo->ri_TrigDesc->trig_insert_new_table))))
+		return;
+
+	/* Set relation for error callback */
+	apply_error_callback_arg.rel = LastRel;
+
+	/* Set current command for error callback */
+	saved_command = apply_error_callback_arg.command;
+	apply_error_callback_arg.command = LOGICAL_REP_MSG_INSERT;
+
+	/* Caller must take care of opening and closing the indices */
+
+	/*
+	 * If there are any indexes, update them for all the inserted tuples, and
+	 * run AFTER ROW INSERT triggers.
+	 */
+	if (resultRelInfo->ri_NumIndices > 0)
+	{
+		List	   *recheckIndexes;
+
+		recheckIndexes =
+			ExecInsertIndexTuples(resultRelInfo,
+								  slot, estate, false,
+								  false, NULL, NIL, false);
+
+		ExecARInsertTriggers(estate, resultRelInfo,
+							 slot, recheckIndexes,
+							 NULL);
+
+		list_free(recheckIndexes);
+	}
+
+	/*
+	 * There's no indexes, but see if we need to run AFTER ROW INSERT triggers
+	 * anyway.
+	 */
+	else if (resultRelInfo->ri_TrigDesc != NULL &&
+			 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+			  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+	{
+		ExecARInsertTriggers(estate, resultRelInfo,
+							 slot, NIL,
+							 NULL);
+	}
+
+	/*
+	 * XXX we should in theory pass a TransitionCaptureState object to the
+	 * above to capture transition tuples, but after statement triggers don't
+	 * actually get fired by replication yet anyway
+	 */
+
+	/* Reset relation for error callback */
+	apply_error_callback_arg.rel = NULL;
+
+	/* Reset the current command */
+	apply_error_callback_arg.command = saved_command;
+}
+
+static LRMultiInsertReturnStatus
+do_multi_inserts(StringInfo s, LogicalRepRelId *relid)
 {
 	LogicalRepRelMapEntry *rel;
 	LogicalRepTupleData newtup;
-	LogicalRepRelId relid;
 	UserContext ucxt;
 	ApplyExecutionData *edata;
 	EState	   *estate;
@@ -2380,17 +2543,142 @@ apply_handle_insert(StringInfo s)
 	MemoryContext oldctx;
 	bool		run_as_owner;
 
+	if (MultiInsertState == NULL)
+		begin_replication_step();
+
+	*relid = logicalrep_read_relid(s);
+
+	if (MultiInsertState != NULL &&
+		(LastMultiInsertRelId != InvalidOid &&
+		 *relid != InvalidOid &&
+		 LastMultiInsertRelId != *relid))
+		FinishMultiInserts();
+
+	if (MultiInsertState == NULL)
+		rel = logicalrep_rel_open(*relid, RowExclusiveLock);
+	else
+		rel = LastRel;
+
+	if (!should_apply_changes_for_rel(rel))
+	{
+		Assert(MultiInsertState == NULL);
+
+		/*
+		 * The relation can't become interesting in the middle of the
+		 * transaction so it's safe to unlock it.
+		 */
+		logicalrep_rel_close(rel, RowExclusiveLock);
+		end_replication_step();
+		return LR_MULTI_INSERT_REL_SKIPPED;
+	}
+
+	/* For a partitioned table, let's not do multi inserts. */
+	if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+	{
+		Assert(MultiInsertState == NULL);
+
+		/*
+		 * The relation can't become interesting in the middle of the
+		 * transaction so it's safe to unlock it.
+		 */
+		logicalrep_rel_close(rel, RowExclusiveLock);
+		end_replication_step();
+		return LR_MULTI_INSERT_DISALLOWED;
+	}
+
 	/*
-	 * Quick return if we are skipping data modification changes or handling
-	 * streamed transactions.
+	 * Make sure that any user-supplied code runs as the table owner, unless
+	 * the user has opted out of that behavior.
 	 */
-	if (is_skipping_changes() ||
-		handle_streamed_transaction(LOGICAL_REP_MSG_INSERT, s))
-		return;
+	run_as_owner = MySubscription->runasowner;
+	if (!run_as_owner)
+		SwitchToUntrustedUser(rel->localrel->rd_rel->relowner, &ucxt);
+
+	/* Set relation for error callback */
+	apply_error_callback_arg.rel = rel;
+
+	if (MultiInsertState == NULL)
+	{
+		oldctx = MemoryContextSwitchTo(TopTransactionContext);
+
+		/* Initialize the executor state. */
+		LastEData = edata = create_edata_for_relation(rel);
+		estate = edata->estate;
+
+		LastRemoteSlot = remoteslot = MakeTupleTableSlot(RelationGetDescr(rel->localrel),
+														 &TTSOpsVirtual);
+
+		modify_buffer_flush_context = (LRModifyBufferFlushContext *) palloc(sizeof(LRModifyBufferFlushContext));
+		modify_buffer_flush_context->resultRelInfo = edata->targetRelInfo;
+		modify_buffer_flush_context->estate = estate;
+
+		MultiInsertState = table_modify_begin(edata->targetRelInfo->ri_RelationDesc,
+											  TM_FLAG_BAS_BULKWRITE,
+											  GetCurrentCommandId(true),
+											  0,
+											  LRModifyBufferFlushCallback,
+											  modify_buffer_flush_context);
+		LastRel = rel;
+		LastMultiInsertRelId = *relid;
+
+		/* We must open indexes here. */
+		ExecOpenIndices(edata->targetRelInfo, false);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+	else
+	{
+		CommandId	cid;
+
+		edata = LastEData;
+		estate = edata->estate;
+		ResetExprContext(GetPerTupleExprContext(estate));
+		ExecClearTuple(LastRemoteSlot);
+		remoteslot = LastRemoteSlot;
+		cid = GetCurrentCommandId(true);
+		MultiInsertState->cid = cid;
+		estate->es_output_cid = cid;
+	}
+
+	/* Process and store remote tuple in the slot */
+	logicalrep_read_insert_v2(s, &newtup);
+	oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+	slot_store_data(remoteslot, rel, &newtup);
+	slot_fill_defaults(rel, estate, remoteslot);
+	MemoryContextSwitchTo(oldctx);
+
+	TargetPrivilegesCheck(edata->targetRelInfo->ri_RelationDesc, ACL_INSERT);
+	ExecRelationMultiInsert(MultiInsertState, edata->targetRelInfo, estate, remoteslot);
+
+	/* Reset relation for error callback */
+	apply_error_callback_arg.rel = NULL;
+
+	if (!run_as_owner)
+		RestoreUserContext(&ucxt);
+
+	Assert(MultiInsertState != NULL);
+
+	CommandCounterIncrement();
+
+	return LR_MULTI_INSERT_DONE;
+}
+
+static bool
+do_single_inserts(StringInfo s, LogicalRepRelId relid)
+{
+	LogicalRepRelMapEntry *rel;
+	LogicalRepTupleData newtup;
+	UserContext ucxt;
+	ApplyExecutionData *edata;
+	EState	   *estate;
+	TupleTableSlot *remoteslot;
+	MemoryContext oldctx;
+	bool		run_as_owner;
+
+	Assert(relid != InvalidOid);
 
 	begin_replication_step();
 
-	relid = logicalrep_read_insert(s, &newtup);
 	rel = logicalrep_rel_open(relid, RowExclusiveLock);
 	if (!should_apply_changes_for_rel(rel))
 	{
@@ -2400,7 +2688,7 @@ apply_handle_insert(StringInfo s)
 		 */
 		logicalrep_rel_close(rel, RowExclusiveLock);
 		end_replication_step();
-		return;
+		return false;
 	}
 
 	/*
@@ -2422,6 +2710,7 @@ apply_handle_insert(StringInfo s)
 										&TTSOpsVirtual);
 
 	/* Process and store remote tuple in the slot */
+	logicalrep_read_insert_v2(s, &newtup);
 	oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
 	slot_store_data(remoteslot, rel, &newtup);
 	slot_fill_defaults(rel, estate, remoteslot);
@@ -2446,6 +2735,35 @@ apply_handle_insert(StringInfo s)
 	logicalrep_rel_close(rel, NoLock);
 
 	end_replication_step();
+
+	return true;
+}
+
+/*
+ * Handle INSERT message.
+ */
+static void
+apply_handle_insert(StringInfo s)
+{
+	LRMultiInsertReturnStatus mi_status;
+	LogicalRepRelId relid;
+
+	/*
+	 * Quick return if we are skipping data modification changes or handling
+	 * streamed transactions.
+	 */
+	if (is_skipping_changes() ||
+		handle_streamed_transaction(LOGICAL_REP_MSG_INSERT, s))
+		return;
+
+	mi_status = do_multi_inserts(s, &relid);
+	if (mi_status == LR_MULTI_INSERT_REL_SKIPPED ||
+		mi_status == LR_MULTI_INSERT_DONE)
+		return;
+
+	do_single_inserts(s, relid);
+
+	return;
 }
 
 /*
@@ -2532,6 +2850,8 @@ apply_handle_update(StringInfo s)
 	MemoryContext oldctx;
 	bool		run_as_owner;
 
+	FinishMultiInserts();
+
 	/*
 	 * Quick return if we are skipping data modification changes or handling
 	 * streamed transactions.
@@ -2713,6 +3033,8 @@ apply_handle_delete(StringInfo s)
 	MemoryContext oldctx;
 	bool		run_as_owner;
 
+	FinishMultiInserts();
+
 	/*
 	 * Quick return if we are skipping data modification changes or handling
 	 * streamed transactions.
@@ -3154,6 +3476,8 @@ apply_handle_truncate(StringInfo s)
 	ListCell   *lc;
 	LOCKMODE	lockmode = AccessExclusiveLock;
 
+	FinishMultiInserts();
+
 	/*
 	 * Quick return if we are skipping data modification changes or handling
 	 * streamed transactions.
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 9770752ea3..8f10ea977b 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -14,6 +14,7 @@
 #ifndef EXECUTOR_H
 #define EXECUTOR_H
 
+#include "access/tableam.h"
 #include "executor/execdesc.h"
 #include "fmgr.h"
 #include "nodes/lockoptions.h"
@@ -656,6 +657,9 @@ extern bool RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 extern void ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
 									 EState *estate, TupleTableSlot *slot);
+extern void ExecRelationMultiInsert(TableModifyState *MultiInsertState,
+									ResultRelInfo *resultRelInfo,
+									EState *estate, TupleTableSlot *slot);
 extern void ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
 									 EState *estate, EPQState *epqstate,
 									 TupleTableSlot *searchslot, TupleTableSlot *slot);
diff --git a/src/include/replication/logicalproto.h b/src/include/replication/logicalproto.h
index c409638a2e..3f3a7f0a31 100644
--- a/src/include/replication/logicalproto.h
+++ b/src/include/replication/logicalproto.h
@@ -226,6 +226,8 @@ extern void logicalrep_write_insert(StringInfo out, TransactionId xid,
 									Relation rel,
 									TupleTableSlot *newslot,
 									bool binary, Bitmapset *columns);
+extern LogicalRepRelId logicalrep_read_relid(StringInfo in);
+extern void logicalrep_read_insert_v2(StringInfo in, LogicalRepTupleData *newtup);
 extern LogicalRepRelId logicalrep_read_insert(StringInfo in, LogicalRepTupleData *newtup);
 extern void logicalrep_write_update(StringInfo out, TransactionId xid,
 									Relation rel,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3353faa6cd..42f27dda70 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1470,6 +1470,8 @@ LPTHREAD_START_ROUTINE
 LPTSTR
 LPVOID
 LPWSTR
+LRModifyBufferFlushContext
+LRMultiInsertReturnStatus
 LSEG
 LUID
 LVRelState
-- 
2.34.1



  [application/x-patch] v22-0005-Use-new-multi-insert-table-AM-for-COPY.patch (14.0K, 6-v22-0005-Use-new-multi-insert-table-AM-for-COPY.patch)
  download | inline diff:
From 46462539749b00695ed54a57be89eda8c278fbec Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Tue, 4 Jun 2024 20:27:12 +0000
Subject: [PATCH v22 5/5] Use new multi insert table AM for COPY

This commit uses the new multi insert table AM added by commit
<<CHANGE_ME>> for COPY ... FROM command.

Author: Bharath Rupireddy
Reviewed-by: Jeff Davis
Discussion: https://www.postgresql.org/message-id/CALj2ACVi9eTRYR%3Dgdca5wxtj3Kk_9q9qVccxsS1hngTGOCjPwQ%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/8633171cb034aafc260fdf37df04b6c779aa1e2f.camel%40j-davis.com
---
 src/backend/commands/copyfrom.c          | 234 +++++++++++++++--------
 src/include/commands/copyfrom_internal.h |   4 +-
 src/tools/pgindent/typedefs.list         |   1 +
 3 files changed, 158 insertions(+), 81 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index ce4d62e707..82b6f7faa1 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -71,14 +71,25 @@
 /* Trim the list of buffers back down to this number after flushing */
 #define MAX_PARTITION_BUFFERS	32
 
+typedef struct CopyModifyBufferFlushContext
+{
+	CopyFromState cstate;
+	ResultRelInfo *resultRelInfo;
+	EState	   *estate;
+} CopyModifyBufferFlushContext;
+
 /* Stores multi-insert data related to a single relation in CopyFrom. */
 typedef struct CopyMultiInsertBuffer
 {
-	TupleTableSlot *slots[MAX_BUFFERED_TUPLES]; /* Array to store tuples */
+	TableModifyState *mstate;	/* Table insert state; NULL if foreign table */
+	TupleTableSlot **slots;		/* Array to store tuples */
 	ResultRelInfo *resultRelInfo;	/* ResultRelInfo for 'relid' */
-	BulkInsertState bistate;	/* BulkInsertState for this rel if plain
-								 * table; NULL if foreign table */
+	TupleTableSlot *multislot;
+	CopyModifyBufferFlushContext *modify_buffer_flush_context;
 	int			nused;			/* number of 'slots' containing tuples */
+	int			currslotno;		/* Current buffered slot number that's being
+								 * flushed; Used to get correct cur_lineno for
+								 * errors while in flush callback. */
 	uint64		linenos[MAX_BUFFERED_TUPLES];	/* Line # of tuple in copy
 												 * stream */
 } CopyMultiInsertBuffer;
@@ -99,6 +110,7 @@ typedef struct CopyMultiInsertInfo
 	int			ti_options;		/* table insert options */
 } CopyMultiInsertInfo;
 
+static void CopyModifyBufferFlushCallback(void *context, TupleTableSlot *slot);
 
 /* non-export function prototypes */
 static void ClosePipeFromProgram(CopyFromState cstate);
@@ -218,14 +230,37 @@ CopyLimitPrintoutLength(const char *str)
  * ResultRelInfo.
  */
 static CopyMultiInsertBuffer *
-CopyMultiInsertBufferInit(ResultRelInfo *rri)
+CopyMultiInsertBufferInit(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
+						  CopyFromState cstate, EState *estate)
 {
 	CopyMultiInsertBuffer *buffer;
 
 	buffer = (CopyMultiInsertBuffer *) palloc(sizeof(CopyMultiInsertBuffer));
-	memset(buffer->slots, 0, sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
+
+	if (rri->ri_FdwRoutine == NULL)
+	{
+		buffer->modify_buffer_flush_context = (CopyModifyBufferFlushContext *) palloc(sizeof(CopyModifyBufferFlushContext));
+		buffer->modify_buffer_flush_context->cstate = cstate;
+		buffer->modify_buffer_flush_context->resultRelInfo = rri;
+		buffer->modify_buffer_flush_context->estate = estate;
+
+		buffer->mstate = table_modify_begin(rri->ri_RelationDesc,
+											TM_FLAG_BAS_BULKWRITE,
+											miinfo->mycid,
+											miinfo->ti_options,
+											CopyModifyBufferFlushCallback,
+											buffer->modify_buffer_flush_context);
+		buffer->slots = NULL;
+		buffer->multislot = NULL;
+	}
+	else
+	{
+		buffer->mstate = NULL;
+		buffer->slots = palloc0(sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
+		buffer->multislot = NULL;
+	}
+
 	buffer->resultRelInfo = rri;
-	buffer->bistate = (rri->ri_FdwRoutine == NULL) ? GetBulkInsertState() : NULL;
 	buffer->nused = 0;
 
 	return buffer;
@@ -236,11 +271,12 @@ CopyMultiInsertBufferInit(ResultRelInfo *rri)
  */
 static inline void
 CopyMultiInsertInfoSetupBuffer(CopyMultiInsertInfo *miinfo,
-							   ResultRelInfo *rri)
+							   ResultRelInfo *rri, CopyFromState cstate,
+							   EState *estate)
 {
 	CopyMultiInsertBuffer *buffer;
 
-	buffer = CopyMultiInsertBufferInit(rri);
+	buffer = CopyMultiInsertBufferInit(miinfo, rri, cstate, estate);
 
 	/* Setup back-link so we can easily find this buffer again */
 	rri->ri_CopyMultiInsertBuffer = buffer;
@@ -273,7 +309,7 @@ CopyMultiInsertInfoInit(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
 	 * tuples their way for the first time.
 	 */
 	if (rri->ri_RelationDesc->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
-		CopyMultiInsertInfoSetupBuffer(miinfo, rri);
+		CopyMultiInsertInfoSetupBuffer(miinfo, rri, cstate, estate);
 }
 
 /*
@@ -317,8 +353,6 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 		int			batch_size = resultRelInfo->ri_BatchSize;
 		int			sent = 0;
 
-		Assert(buffer->bistate == NULL);
-
 		/* Ensure that the FDW supports batching and it's enabled */
 		Assert(resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert);
 		Assert(batch_size > 1);
@@ -390,13 +424,8 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	}
 	else
 	{
-		CommandId	mycid = miinfo->mycid;
-		int			ti_options = miinfo->ti_options;
 		bool		line_buf_valid = cstate->line_buf_valid;
 		uint64		save_cur_lineno = cstate->cur_lineno;
-		MemoryContext oldcontext;
-
-		Assert(buffer->bistate != NULL);
 
 		/*
 		 * Print error context information correctly, if one of the operations
@@ -404,56 +433,18 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 		 */
 		cstate->line_buf_valid = false;
 
-		/*
-		 * table_multi_insert may leak memory, so switch to short-lived memory
-		 * context before calling it.
-		 */
-		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-		table_multi_insert(resultRelInfo->ri_RelationDesc,
-						   slots,
-						   nused,
-						   mycid,
-						   ti_options,
-						   buffer->bistate);
-		MemoryContextSwitchTo(oldcontext);
+		Assert(buffer->currslotno <= buffer->nused);
+		buffer->currslotno = 0;
 
-		for (i = 0; i < nused; i++)
-		{
-			/*
-			 * If there are any indexes, update them for all the inserted
-			 * tuples, and run AFTER ROW INSERT triggers.
-			 */
-			if (resultRelInfo->ri_NumIndices > 0)
-			{
-				List	   *recheckIndexes;
-
-				cstate->cur_lineno = buffer->linenos[i];
-				recheckIndexes =
-					ExecInsertIndexTuples(resultRelInfo,
-										  buffer->slots[i], estate, false,
-										  false, NULL, NIL, false);
-				ExecARInsertTriggers(estate, resultRelInfo,
-									 slots[i], recheckIndexes,
-									 cstate->transition_capture);
-				list_free(recheckIndexes);
-			}
+		table_modify_buffer_flush(buffer->mstate);
 
-			/*
-			 * There's no indexes, but see if we need to run AFTER ROW INSERT
-			 * triggers anyway.
-			 */
-			else if (resultRelInfo->ri_TrigDesc != NULL &&
-					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
-			{
-				cstate->cur_lineno = buffer->linenos[i];
-				ExecARInsertTriggers(estate, resultRelInfo,
-									 slots[i], NIL,
-									 cstate->transition_capture);
-			}
+		Assert(buffer->currslotno <= buffer->nused);
+		buffer->currslotno = 0;
 
-			ExecClearTuple(slots[i]);
-		}
+		/*
+		 * Indexes are updated and AFTER ROW INSERT triggers (if any) are run
+		 * in the flush callback CopyModifyBufferFlushCallback.
+		 */
 
 		/* Update the row counter and progress of the COPY command */
 		*processed += nused;
@@ -469,6 +460,60 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	buffer->nused = 0;
 }
 
+static void
+CopyModifyBufferFlushCallback(void *context, TupleTableSlot *slot)
+{
+	CopyModifyBufferFlushContext *ctx = (CopyModifyBufferFlushContext *) context;
+	CopyFromState cstate = ctx->cstate;
+	ResultRelInfo *resultRelInfo = ctx->resultRelInfo;
+	EState	   *estate = ctx->estate;
+	CopyMultiInsertBuffer *buffer = resultRelInfo->ri_CopyMultiInsertBuffer;
+
+	/* Quick exit if no indexes or no triggers */
+	if (!(resultRelInfo->ri_NumIndices > 0 ||
+		  (resultRelInfo->ri_TrigDesc != NULL &&
+		   (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+			resultRelInfo->ri_TrigDesc->trig_insert_new_table))))
+		return;
+
+	/* Caller must take care of opening and closing the indices */
+
+	/*
+	 * If there are any indexes, update them for all the inserted tuples, and
+	 * run AFTER ROW INSERT triggers.
+	 */
+	if (resultRelInfo->ri_NumIndices > 0)
+	{
+		List	   *recheckIndexes;
+
+		cstate->cur_lineno = buffer->linenos[buffer->currslotno++];
+		recheckIndexes =
+			ExecInsertIndexTuples(resultRelInfo,
+								  slot, estate, false,
+								  false, NULL, NIL, false);
+		ExecARInsertTriggers(estate, resultRelInfo,
+							 slot, recheckIndexes,
+							 cstate->transition_capture);
+		list_free(recheckIndexes);
+	}
+
+	/*
+	 * There's no indexes, but see if we need to run AFTER ROW INSERT triggers
+	 * anyway.
+	 */
+	else if (resultRelInfo->ri_TrigDesc != NULL &&
+			 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+			  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+	{
+		cstate->cur_lineno = buffer->linenos[buffer->currslotno++];
+		ExecARInsertTriggers(estate, resultRelInfo,
+							 slot, NIL,
+							 cstate->transition_capture);
+	}
+
+	Assert(buffer->currslotno <= buffer->nused);
+}
+
 /*
  * Drop used slots and free member for this buffer.
  *
@@ -489,19 +534,18 @@ CopyMultiInsertBufferCleanup(CopyMultiInsertInfo *miinfo,
 
 	if (resultRelInfo->ri_FdwRoutine == NULL)
 	{
-		Assert(buffer->bistate != NULL);
-		FreeBulkInsertState(buffer->bistate);
+		table_modify_end(buffer->mstate);
+		ExecDropSingleTupleTableSlot(buffer->multislot);
+		pfree(buffer->modify_buffer_flush_context);
 	}
 	else
-		Assert(buffer->bistate == NULL);
-
-	/* Since we only create slots on demand, just drop the non-null ones. */
-	for (i = 0; i < MAX_BUFFERED_TUPLES && buffer->slots[i] != NULL; i++)
-		ExecDropSingleTupleTableSlot(buffer->slots[i]);
+	{
+		/* Since we only create slots on demand, just drop the non-null ones. */
+		for (i = 0; i < MAX_BUFFERED_TUPLES && buffer->slots[i] != NULL; i++)
+			ExecDropSingleTupleTableSlot(buffer->slots[i]);
 
-	if (resultRelInfo->ri_FdwRoutine == NULL)
-		table_finish_bulk_insert(resultRelInfo->ri_RelationDesc,
-								 miinfo->ti_options);
+		pfree(buffer->slots);
+	}
 
 	pfree(buffer);
 }
@@ -588,13 +632,32 @@ CopyMultiInsertInfoNextFreeSlot(CopyMultiInsertInfo *miinfo,
 {
 	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
 	int			nused = buffer->nused;
+	TupleTableSlot *slot;
 
 	Assert(buffer != NULL);
 	Assert(nused < MAX_BUFFERED_TUPLES);
 
-	if (buffer->slots[nused] == NULL)
-		buffer->slots[nused] = table_slot_create(rri->ri_RelationDesc, NULL);
-	return buffer->slots[nused];
+	if (rri->ri_FdwRoutine == NULL)
+	{
+		if (buffer->multislot == NULL)
+			buffer->multislot = MakeTupleTableSlot(RelationGetDescr(rri->ri_RelationDesc),
+												   &TTSOpsVirtual);
+
+		/* Caller must clear the slot */
+		slot = buffer->multislot;
+	}
+	else
+	{
+		if (buffer->slots[nused] == NULL)
+		{
+			slot = table_slot_create(rri->ri_RelationDesc, NULL);
+			buffer->slots[nused] = slot;
+		}
+		else
+			slot = buffer->slots[nused];
+	}
+
+	return slot;
 }
 
 /*
@@ -608,7 +671,11 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
 	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
 
 	Assert(buffer != NULL);
-	Assert(slot == buffer->slots[buffer->nused]);
+
+#ifdef USE_ASSERT_CHECKING
+	if (rri->ri_FdwRoutine != NULL)
+		Assert(slot == buffer->slots[buffer->nused]);
+#endif
 
 	/* Store the line number so we can properly report any errors later */
 	buffer->linenos[buffer->nused] = lineno;
@@ -616,6 +683,14 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
 	/* Record this slot as being used */
 	buffer->nused++;
 
+	if (rri->ri_FdwRoutine == NULL)
+	{
+		Assert(slot == buffer->multislot);
+		buffer->currslotno = 0;
+
+		table_modify_buffer_insert(buffer->mstate, slot);
+	}
+
 	/* Update how many tuples are stored and their size */
 	miinfo->bufferedTuples++;
 	miinfo->bufferedBytes += tuplen;
@@ -830,7 +905,7 @@ CopyFrom(CopyFromState cstate)
 	/*
 	 * It's generally more efficient to prepare a bunch of tuples for
 	 * insertion, and insert them in one
-	 * table_multi_insert()/ExecForeignBatchInsert() call, than call
+	 * table_modify_buffer_insert()/ExecForeignBatchInsert() call, than call
 	 * table_tuple_insert()/ExecForeignInsert() separately for every tuple.
 	 * However, there are a number of reasons why we might not be able to do
 	 * this.  These are explained below.
@@ -1080,7 +1155,8 @@ CopyFrom(CopyFromState cstate)
 				{
 					if (resultRelInfo->ri_CopyMultiInsertBuffer == NULL)
 						CopyMultiInsertInfoSetupBuffer(&multiInsertInfo,
-													   resultRelInfo);
+													   resultRelInfo, cstate,
+													   estate);
 				}
 				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
 						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index cad52fcc78..14addbc6f6 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -46,9 +46,9 @@ typedef enum EolType
 typedef enum CopyInsertMethod
 {
 	CIM_SINGLE,					/* use table_tuple_insert or ExecForeignInsert */
-	CIM_MULTI,					/* always use table_multi_insert or
+	CIM_MULTI,					/* always use table_modify_buffer_insert or
 								 * ExecForeignBatchInsert */
-	CIM_MULTI_CONDITIONAL,		/* use table_multi_insert or
+	CIM_MULTI_CONDITIONAL,		/* use table_modify_buffer_insert or
 								 * ExecForeignBatchInsert only if valid */
 } CopyInsertMethod;
 
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 42f27dda70..4740c6946e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -494,6 +494,7 @@ CopyHeaderChoice
 CopyInsertMethod
 CopyLogVerbosityChoice
 CopyMethod
+CopyModifyBufferFlushContext
 CopyMultiInsertBuffer
 CopyMultiInsertInfo
 CopyOnErrorChoice
-- 
2.34.1



^ permalink  raw  reply  [nested|flat] 30+ messages in thread

* Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
  2024-03-21 04:14 Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-21 07:40 ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-23 00:17   ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-25 19:58     ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-26 15:37       ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-26 19:49         ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-27 08:12           ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-31 15:48             ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-04-02 19:40               ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-04-24 12:49                 ` Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-04-25 16:41                   ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-04-29 06:06                     ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-05-15 07:26                       ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-05-15 23:31                         ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-06-05 07:12                           ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
@ 2024-08-26 05:39                             ` Bharath Rupireddy <[email protected]>
  2024-08-26 21:18                               ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-08-27 21:37                               ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  0 siblings, 2 replies; 30+ messages in thread

From: Bharath Rupireddy @ 2024-08-26 05:39 UTC (permalink / raw)
  To: Jeff Davis <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; Michael Paquier <[email protected]>; Matthias van de Meent <[email protected]>; Alexander Korotkov <[email protected]>

On Wed, Jun 5, 2024 at 12:42 PM Bharath Rupireddy
<[email protected]> wrote:
>
> Please find the v22 patches with the above changes.

Please find the v23 patches after rebasing 0005 and adapting 0004 for
9758174e2e.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


Attachments:

  [application/x-patch] v23-0005-Use-new-multi-insert-table-AM-for-COPY.patch (14.0K, 2-v23-0005-Use-new-multi-insert-table-AM-for-COPY.patch)
  download | inline diff:
From 511d0a6aa3851408b88a5d5cccb1a31af26aa089 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Mon, 26 Aug 2024 04:50:09 +0000
Subject: [PATCH v23 5/5] Use new multi insert table AM for COPY

This commit uses the new multi insert table AM added by commit
<<CHANGE_ME>> for COPY ... FROM command.

Author: Bharath Rupireddy
Reviewed-by: Jeff Davis
Discussion: https://www.postgresql.org/message-id/CALj2ACVi9eTRYR%3Dgdca5wxtj3Kk_9q9qVccxsS1hngTGOCjPwQ%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/8633171cb034aafc260fdf37df04b6c779aa1e2f.camel%40j-davis.com
---
 src/backend/commands/copyfrom.c          | 234 +++++++++++++++--------
 src/include/commands/copyfrom_internal.h |   4 +-
 src/tools/pgindent/typedefs.list         |   1 +
 3 files changed, 158 insertions(+), 81 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 2d3462913e..29e0e497c1 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -74,14 +74,25 @@
  */
 #define MAX_PARTITION_BUFFERS	32
 
+typedef struct CopyModifyBufferFlushContext
+{
+	CopyFromState cstate;
+	ResultRelInfo *resultRelInfo;
+	EState	   *estate;
+} CopyModifyBufferFlushContext;
+
 /* Stores multi-insert data related to a single relation in CopyFrom. */
 typedef struct CopyMultiInsertBuffer
 {
-	TupleTableSlot *slots[MAX_BUFFERED_TUPLES]; /* Array to store tuples */
+	TableModifyState *mstate;	/* Table insert state; NULL if foreign table */
+	TupleTableSlot **slots;		/* Array to store tuples */
 	ResultRelInfo *resultRelInfo;	/* ResultRelInfo for 'relid' */
-	BulkInsertState bistate;	/* BulkInsertState for this rel if plain
-								 * table; NULL if foreign table */
+	TupleTableSlot *multislot;
+	CopyModifyBufferFlushContext *modify_buffer_flush_context;
 	int			nused;			/* number of 'slots' containing tuples */
+	int			currslotno;		/* Current buffered slot number that's being
+								 * flushed; Used to get correct cur_lineno for
+								 * errors while in flush callback. */
 	uint64		linenos[MAX_BUFFERED_TUPLES];	/* Line # of tuple in copy
 												 * stream */
 } CopyMultiInsertBuffer;
@@ -102,6 +113,7 @@ typedef struct CopyMultiInsertInfo
 	int			ti_options;		/* table insert options */
 } CopyMultiInsertInfo;
 
+static void CopyModifyBufferFlushCallback(void *context, TupleTableSlot *slot);
 
 /* non-export function prototypes */
 static void ClosePipeFromProgram(CopyFromState cstate);
@@ -221,14 +233,37 @@ CopyLimitPrintoutLength(const char *str)
  * ResultRelInfo.
  */
 static CopyMultiInsertBuffer *
-CopyMultiInsertBufferInit(ResultRelInfo *rri)
+CopyMultiInsertBufferInit(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
+						  CopyFromState cstate, EState *estate)
 {
 	CopyMultiInsertBuffer *buffer;
 
 	buffer = (CopyMultiInsertBuffer *) palloc(sizeof(CopyMultiInsertBuffer));
-	memset(buffer->slots, 0, sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
+
+	if (rri->ri_FdwRoutine == NULL)
+	{
+		buffer->modify_buffer_flush_context = (CopyModifyBufferFlushContext *) palloc(sizeof(CopyModifyBufferFlushContext));
+		buffer->modify_buffer_flush_context->cstate = cstate;
+		buffer->modify_buffer_flush_context->resultRelInfo = rri;
+		buffer->modify_buffer_flush_context->estate = estate;
+
+		buffer->mstate = table_modify_begin(rri->ri_RelationDesc,
+											TM_FLAG_BAS_BULKWRITE,
+											miinfo->mycid,
+											miinfo->ti_options,
+											CopyModifyBufferFlushCallback,
+											buffer->modify_buffer_flush_context);
+		buffer->slots = NULL;
+		buffer->multislot = NULL;
+	}
+	else
+	{
+		buffer->mstate = NULL;
+		buffer->slots = palloc0(sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
+		buffer->multislot = NULL;
+	}
+
 	buffer->resultRelInfo = rri;
-	buffer->bistate = (rri->ri_FdwRoutine == NULL) ? GetBulkInsertState() : NULL;
 	buffer->nused = 0;
 
 	return buffer;
@@ -239,11 +274,12 @@ CopyMultiInsertBufferInit(ResultRelInfo *rri)
  */
 static inline void
 CopyMultiInsertInfoSetupBuffer(CopyMultiInsertInfo *miinfo,
-							   ResultRelInfo *rri)
+							   ResultRelInfo *rri, CopyFromState cstate,
+							   EState *estate)
 {
 	CopyMultiInsertBuffer *buffer;
 
-	buffer = CopyMultiInsertBufferInit(rri);
+	buffer = CopyMultiInsertBufferInit(miinfo, rri, cstate, estate);
 
 	/* Setup back-link so we can easily find this buffer again */
 	rri->ri_CopyMultiInsertBuffer = buffer;
@@ -276,7 +312,7 @@ CopyMultiInsertInfoInit(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
 	 * tuples their way for the first time.
 	 */
 	if (rri->ri_RelationDesc->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
-		CopyMultiInsertInfoSetupBuffer(miinfo, rri);
+		CopyMultiInsertInfoSetupBuffer(miinfo, rri, cstate, estate);
 }
 
 /*
@@ -320,8 +356,6 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 		int			batch_size = resultRelInfo->ri_BatchSize;
 		int			sent = 0;
 
-		Assert(buffer->bistate == NULL);
-
 		/* Ensure that the FDW supports batching and it's enabled */
 		Assert(resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert);
 		Assert(batch_size > 1);
@@ -393,13 +427,8 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	}
 	else
 	{
-		CommandId	mycid = miinfo->mycid;
-		int			ti_options = miinfo->ti_options;
 		bool		line_buf_valid = cstate->line_buf_valid;
 		uint64		save_cur_lineno = cstate->cur_lineno;
-		MemoryContext oldcontext;
-
-		Assert(buffer->bistate != NULL);
 
 		/*
 		 * Print error context information correctly, if one of the operations
@@ -407,56 +436,18 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 		 */
 		cstate->line_buf_valid = false;
 
-		/*
-		 * table_multi_insert may leak memory, so switch to short-lived memory
-		 * context before calling it.
-		 */
-		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-		table_multi_insert(resultRelInfo->ri_RelationDesc,
-						   slots,
-						   nused,
-						   mycid,
-						   ti_options,
-						   buffer->bistate);
-		MemoryContextSwitchTo(oldcontext);
+		Assert(buffer->currslotno <= buffer->nused);
+		buffer->currslotno = 0;
 
-		for (i = 0; i < nused; i++)
-		{
-			/*
-			 * If there are any indexes, update them for all the inserted
-			 * tuples, and run AFTER ROW INSERT triggers.
-			 */
-			if (resultRelInfo->ri_NumIndices > 0)
-			{
-				List	   *recheckIndexes;
-
-				cstate->cur_lineno = buffer->linenos[i];
-				recheckIndexes =
-					ExecInsertIndexTuples(resultRelInfo,
-										  buffer->slots[i], estate, false,
-										  false, NULL, NIL, false);
-				ExecARInsertTriggers(estate, resultRelInfo,
-									 slots[i], recheckIndexes,
-									 cstate->transition_capture);
-				list_free(recheckIndexes);
-			}
+		table_modify_buffer_flush(buffer->mstate);
 
-			/*
-			 * There's no indexes, but see if we need to run AFTER ROW INSERT
-			 * triggers anyway.
-			 */
-			else if (resultRelInfo->ri_TrigDesc != NULL &&
-					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
-			{
-				cstate->cur_lineno = buffer->linenos[i];
-				ExecARInsertTriggers(estate, resultRelInfo,
-									 slots[i], NIL,
-									 cstate->transition_capture);
-			}
+		Assert(buffer->currslotno <= buffer->nused);
+		buffer->currslotno = 0;
 
-			ExecClearTuple(slots[i]);
-		}
+		/*
+		 * Indexes are updated and AFTER ROW INSERT triggers (if any) are run
+		 * in the flush callback CopyModifyBufferFlushCallback.
+		 */
 
 		/* Update the row counter and progress of the COPY command */
 		*processed += nused;
@@ -472,6 +463,60 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	buffer->nused = 0;
 }
 
+static void
+CopyModifyBufferFlushCallback(void *context, TupleTableSlot *slot)
+{
+	CopyModifyBufferFlushContext *ctx = (CopyModifyBufferFlushContext *) context;
+	CopyFromState cstate = ctx->cstate;
+	ResultRelInfo *resultRelInfo = ctx->resultRelInfo;
+	EState	   *estate = ctx->estate;
+	CopyMultiInsertBuffer *buffer = resultRelInfo->ri_CopyMultiInsertBuffer;
+
+	/* Quick exit if no indexes or no triggers */
+	if (!(resultRelInfo->ri_NumIndices > 0 ||
+		  (resultRelInfo->ri_TrigDesc != NULL &&
+		   (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+			resultRelInfo->ri_TrigDesc->trig_insert_new_table))))
+		return;
+
+	/* Caller must take care of opening and closing the indices */
+
+	/*
+	 * If there are any indexes, update them for all the inserted tuples, and
+	 * run AFTER ROW INSERT triggers.
+	 */
+	if (resultRelInfo->ri_NumIndices > 0)
+	{
+		List	   *recheckIndexes;
+
+		cstate->cur_lineno = buffer->linenos[buffer->currslotno++];
+		recheckIndexes =
+			ExecInsertIndexTuples(resultRelInfo,
+								  slot, estate, false,
+								  false, NULL, NIL, false);
+		ExecARInsertTriggers(estate, resultRelInfo,
+							 slot, recheckIndexes,
+							 cstate->transition_capture);
+		list_free(recheckIndexes);
+	}
+
+	/*
+	 * There's no indexes, but see if we need to run AFTER ROW INSERT triggers
+	 * anyway.
+	 */
+	else if (resultRelInfo->ri_TrigDesc != NULL &&
+			 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+			  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+	{
+		cstate->cur_lineno = buffer->linenos[buffer->currslotno++];
+		ExecARInsertTriggers(estate, resultRelInfo,
+							 slot, NIL,
+							 cstate->transition_capture);
+	}
+
+	Assert(buffer->currslotno <= buffer->nused);
+}
+
 /*
  * Drop used slots and free member for this buffer.
  *
@@ -492,19 +537,18 @@ CopyMultiInsertBufferCleanup(CopyMultiInsertInfo *miinfo,
 
 	if (resultRelInfo->ri_FdwRoutine == NULL)
 	{
-		Assert(buffer->bistate != NULL);
-		FreeBulkInsertState(buffer->bistate);
+		table_modify_end(buffer->mstate);
+		ExecDropSingleTupleTableSlot(buffer->multislot);
+		pfree(buffer->modify_buffer_flush_context);
 	}
 	else
-		Assert(buffer->bistate == NULL);
-
-	/* Since we only create slots on demand, just drop the non-null ones. */
-	for (i = 0; i < MAX_BUFFERED_TUPLES && buffer->slots[i] != NULL; i++)
-		ExecDropSingleTupleTableSlot(buffer->slots[i]);
+	{
+		/* Since we only create slots on demand, just drop the non-null ones. */
+		for (i = 0; i < MAX_BUFFERED_TUPLES && buffer->slots[i] != NULL; i++)
+			ExecDropSingleTupleTableSlot(buffer->slots[i]);
 
-	if (resultRelInfo->ri_FdwRoutine == NULL)
-		table_finish_bulk_insert(resultRelInfo->ri_RelationDesc,
-								 miinfo->ti_options);
+		pfree(buffer->slots);
+	}
 
 	pfree(buffer);
 }
@@ -598,15 +642,34 @@ CopyMultiInsertInfoNextFreeSlot(CopyMultiInsertInfo *miinfo,
 {
 	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
 	int			nused;
+	TupleTableSlot *slot;
 
 	Assert(buffer != NULL);
 	Assert(buffer->nused < MAX_BUFFERED_TUPLES);
 
 	nused = buffer->nused;
 
-	if (buffer->slots[nused] == NULL)
-		buffer->slots[nused] = table_slot_create(rri->ri_RelationDesc, NULL);
-	return buffer->slots[nused];
+	if (rri->ri_FdwRoutine == NULL)
+	{
+		if (buffer->multislot == NULL)
+			buffer->multislot = MakeTupleTableSlot(RelationGetDescr(rri->ri_RelationDesc),
+												   &TTSOpsVirtual);
+
+		/* Caller must clear the slot */
+		slot = buffer->multislot;
+	}
+	else
+	{
+		if (buffer->slots[nused] == NULL)
+		{
+			slot = table_slot_create(rri->ri_RelationDesc, NULL);
+			buffer->slots[nused] = slot;
+		}
+		else
+			slot = buffer->slots[nused];
+	}
+
+	return slot;
 }
 
 /*
@@ -620,7 +683,11 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
 	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
 
 	Assert(buffer != NULL);
-	Assert(slot == buffer->slots[buffer->nused]);
+
+#ifdef USE_ASSERT_CHECKING
+	if (rri->ri_FdwRoutine != NULL)
+		Assert(slot == buffer->slots[buffer->nused]);
+#endif
 
 	/* Store the line number so we can properly report any errors later */
 	buffer->linenos[buffer->nused] = lineno;
@@ -628,6 +695,14 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
 	/* Record this slot as being used */
 	buffer->nused++;
 
+	if (rri->ri_FdwRoutine == NULL)
+	{
+		Assert(slot == buffer->multislot);
+		buffer->currslotno = 0;
+
+		table_modify_buffer_insert(buffer->mstate, slot);
+	}
+
 	/* Update how many tuples are stored and their size */
 	miinfo->bufferedTuples++;
 	miinfo->bufferedBytes += tuplen;
@@ -842,7 +917,7 @@ CopyFrom(CopyFromState cstate)
 	/*
 	 * It's generally more efficient to prepare a bunch of tuples for
 	 * insertion, and insert them in one
-	 * table_multi_insert()/ExecForeignBatchInsert() call, than call
+	 * table_modify_buffer_insert()/ExecForeignBatchInsert() call, than call
 	 * table_tuple_insert()/ExecForeignInsert() separately for every tuple.
 	 * However, there are a number of reasons why we might not be able to do
 	 * this.  These are explained below.
@@ -1092,7 +1167,8 @@ CopyFrom(CopyFromState cstate)
 				{
 					if (resultRelInfo->ri_CopyMultiInsertBuffer == NULL)
 						CopyMultiInsertInfoSetupBuffer(&multiInsertInfo,
-													   resultRelInfo);
+													   resultRelInfo, cstate,
+													   estate);
 				}
 				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
 						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index cad52fcc78..14addbc6f6 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -46,9 +46,9 @@ typedef enum EolType
 typedef enum CopyInsertMethod
 {
 	CIM_SINGLE,					/* use table_tuple_insert or ExecForeignInsert */
-	CIM_MULTI,					/* always use table_multi_insert or
+	CIM_MULTI,					/* always use table_modify_buffer_insert or
 								 * ExecForeignBatchInsert */
-	CIM_MULTI_CONDITIONAL,		/* use table_multi_insert or
+	CIM_MULTI_CONDITIONAL,		/* use table_modify_buffer_insert or
 								 * ExecForeignBatchInsert only if valid */
 } CopyInsertMethod;
 
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 07a61d086d..e882d4ab17 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -497,6 +497,7 @@ CopyHeaderChoice
 CopyInsertMethod
 CopyLogVerbosityChoice
 CopyMethod
+CopyModifyBufferFlushContext
 CopyMultiInsertBuffer
 CopyMultiInsertInfo
 CopyOnErrorChoice
-- 
2.43.0



  [application/x-patch] v23-0002-Optimize-various-SQL-commands-with-new-multi-ins.patch (7.4K, 3-v23-0002-Optimize-various-SQL-commands-with-new-multi-ins.patch)
  download | inline diff:
From beb1928bbbfaf6fb1466a58d7fa1deb8412e47af Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Mon, 26 Aug 2024 04:46:36 +0000
Subject: [PATCH v23 2/5] Optimize various SQL commands with new multi insert 
 table AM

This commit optimizes the following commands for heap AM using new
multi insert table AM added by commit <<CHANGE_ME>>:
- CREATE TABLE AS
- CREATE MATERIALIZED VIEW
- REFRESH MATERIALIZED VIEW
- ALTER TABLE flavours resulting in table rewrites

Author: Bharath Rupireddy
Reviewed-by: Jeff Davis
Discussion: https://www.postgresql.org/message-id/CALj2ACVi9eTRYR%3Dgdca5wxtj3Kk_9q9qVccxsS1hngTGOCjPwQ%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/8633171cb034aafc260fdf37df04b6c779aa1e2f.camel%40j-davis.com
---
 src/backend/commands/createas.c  | 26 ++++++++++----------------
 src/backend/commands/matview.c   | 25 ++++++++++---------------
 src/backend/commands/tablecmds.c | 30 ++++++++++--------------------
 3 files changed, 30 insertions(+), 51 deletions(-)

diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 0b629b1f79..8378597f36 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -53,9 +53,7 @@ typedef struct
 	/* These fields are filled by intorel_startup: */
 	Relation	rel;			/* relation to write to */
 	ObjectAddress reladdr;		/* address of rel, for ExecCreateTableAs */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableModifyState *mstate;	/* table insert state */
 } DR_intorel;
 
 /* utility functions for CTAS definition creation */
@@ -547,17 +545,20 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	 */
 	myState->rel = intoRelationDesc;
 	myState->reladdr = intoRelationAddr;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM;
 
 	/*
 	 * If WITH NO DATA is specified, there is no need to set up the state for
 	 * bulk inserts as there are no tuples to insert.
 	 */
 	if (!into->skipData)
-		myState->bistate = GetBulkInsertState();
+		myState->mstate = table_modify_begin(intoRelationDesc,
+											 TM_FLAG_BAS_BULKWRITE,
+											 GetCurrentCommandId(true),
+											 TABLE_INSERT_SKIP_FSM,
+											 NULL,
+											 NULL);
 	else
-		myState->bistate = NULL;
+		myState->mstate = NULL;
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -585,11 +586,7 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
 		 * would not be cheap either. This also doesn't allow accessing per-AM
 		 * data (say a tuple's xmin), but since we don't do that here...
 		 */
-		table_tuple_insert(myState->rel,
-						   slot,
-						   myState->output_cid,
-						   myState->ti_options,
-						   myState->bistate);
+		table_modify_buffer_insert(myState->mstate, slot);
 	}
 
 	/* We know this is a newly created relation, so there are no indexes */
@@ -607,10 +604,7 @@ intorel_shutdown(DestReceiver *self)
 	IntoClause *into = myState->into;
 
 	if (!into->skipData)
-	{
-		FreeBulkInsertState(myState->bistate);
-		table_finish_bulk_insert(myState->rel, myState->ti_options);
-	}
+		table_modify_end(myState->mstate);
 
 	/* close rel, but keep lock until commit */
 	table_close(myState->rel, NoLock);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 91f0fd6ea3..f036abbeb3 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -48,9 +48,7 @@ typedef struct
 	Oid			transientoid;	/* OID of new heap into which to store */
 	/* These fields are filled by transientrel_startup: */
 	Relation	transientrel;	/* relation to write to */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableModifyState *mstate;	/* table insert state */
 } DR_transientrel;
 
 static int	matview_maintenance_depth = 0;
@@ -491,9 +489,13 @@ transientrel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	 * Fill private fields of myState for use by later routines
 	 */
 	myState->transientrel = transientrel;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM | TABLE_INSERT_FROZEN;
-	myState->bistate = GetBulkInsertState();
+	myState->mstate = table_modify_begin(transientrel,
+										 TM_FLAG_BAS_BULKWRITE,
+										 GetCurrentCommandId(true),
+										 TABLE_INSERT_SKIP_FSM |
+										 TABLE_INSERT_FROZEN,
+										 NULL,
+										 NULL);
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -518,12 +520,7 @@ transientrel_receive(TupleTableSlot *slot, DestReceiver *self)
 	 * cheap either. This also doesn't allow accessing per-AM data (say a
 	 * tuple's xmin), but since we don't do that here...
 	 */
-
-	table_tuple_insert(myState->transientrel,
-					   slot,
-					   myState->output_cid,
-					   myState->ti_options,
-					   myState->bistate);
+	table_modify_buffer_insert(myState->mstate, slot);
 
 	/* We know this is a newly created relation, so there are no indexes */
 
@@ -538,9 +535,7 @@ transientrel_shutdown(DestReceiver *self)
 {
 	DR_transientrel *myState = (DR_transientrel *) self;
 
-	FreeBulkInsertState(myState->bistate);
-
-	table_finish_bulk_insert(myState->transientrel, myState->ti_options);
+	table_modify_end(myState->mstate);
 
 	/* close transientrel, but keep lock until commit */
 	table_close(myState->transientrel, NoLock);
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index dac39df83a..3ca5448a72 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -5954,10 +5954,8 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 	int			i;
 	ListCell   *l;
 	EState	   *estate;
-	CommandId	mycid;
-	BulkInsertState bistate;
-	int			ti_options;
 	ExprState  *partqualstate = NULL;
+	TableModifyState *mstate = NULL;
 
 	/*
 	 * Open the relation(s).  We have surely already locked the existing
@@ -5976,18 +5974,14 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 	 * Prepare a BulkInsertState and options for table_tuple_insert.  The FSM
 	 * is empty, so don't bother using it.
 	 */
-	if (newrel)
-	{
-		mycid = GetCurrentCommandId(true);
-		bistate = GetBulkInsertState();
-		ti_options = TABLE_INSERT_SKIP_FSM;
-	}
-	else
+	if (newrel && mstate == NULL)
 	{
-		/* keep compiler quiet about using these uninitialized */
-		mycid = 0;
-		bistate = NULL;
-		ti_options = 0;
+		mstate = table_modify_begin(newrel,
+									TM_FLAG_BAS_BULKWRITE,
+									GetCurrentCommandId(true),
+									TABLE_INSERT_SKIP_FSM,
+									NULL,
+									NULL);
 	}
 
 	/*
@@ -6285,8 +6279,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 
 			/* Write the tuple out to the new relation */
 			if (newrel)
-				table_tuple_insert(newrel, insertslot, mycid,
-								   ti_options, bistate);
+				table_modify_buffer_insert(mstate, insertslot);
 
 			ResetExprContext(econtext);
 
@@ -6307,10 +6300,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 	table_close(oldrel, NoLock);
 	if (newrel)
 	{
-		FreeBulkInsertState(bistate);
-
-		table_finish_bulk_insert(newrel, ti_options);
-
+		table_modify_end(mstate);
 		table_close(newrel, NoLock);
 	}
 }
-- 
2.43.0



  [application/x-patch] v23-0004-Optimize-Logical-Replication-Apply-with-new-mult.patch (22.3K, 4-v23-0004-Optimize-Logical-Replication-Apply-with-new-mult.patch)
  download | inline diff:
From a1579bcf58f74d1e3121b81c249c029e470297f0 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Mon, 26 Aug 2024 04:49:29 +0000
Subject: [PATCH v23 4/5] Optimize Logical Replication Apply with new multi
 insert table AM

This commit optimizes the Logical Replication Apply for heap AM
using new multi insert table AM added by commit <<CHANGE_ME>>.

Author: Bharath Rupireddy
Reviewed-by: Jeff Davis
Discussion: https://www.postgresql.org/message-id/CALj2ACVi9eTRYR%3Dgdca5wxtj3Kk_9q9qVccxsS1hngTGOCjPwQ%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/8633171cb034aafc260fdf37df04b6c779aa1e2f.camel%40j-davis.com
---
 src/backend/executor/execReplication.c   |  41 ++-
 src/backend/replication/logical/proto.c  |  24 ++
 src/backend/replication/logical/worker.c | 375 ++++++++++++++++++++++-
 src/include/executor/executor.h          |   4 +
 src/include/replication/conflict.h       |   6 +
 src/include/replication/logicalproto.h   |   2 +
 src/tools/pgindent/typedefs.list         |   2 +
 7 files changed, 440 insertions(+), 14 deletions(-)

diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 1086cbc962..bc1ba3e5a8 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -531,7 +531,7 @@ retry:
  * Check all the unique indexes in 'recheckIndexes' for conflict with the
  * tuple in 'remoteslot' and report if found.
  */
-static void
+void
 CheckAndReportConflict(ResultRelInfo *resultRelInfo, EState *estate,
 					   ConflictType type, List *recheckIndexes,
 					   TupleTableSlot *searchslot, TupleTableSlot *remoteslot)
@@ -646,6 +646,45 @@ ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
 	}
 }
 
+void
+ExecRelationMultiInsert(TableModifyState *MultiInsertState,
+						ResultRelInfo *resultRelInfo,
+						EState *estate, TupleTableSlot *slot)
+{
+	bool		skip_tuple = false;
+	Relation	rel = resultRelInfo->ri_RelationDesc;
+
+	/* For now we support only tables. */
+	Assert(rel->rd_rel->relkind == RELKIND_RELATION);
+
+	CheckCmdReplicaIdentity(rel, CMD_INSERT);
+
+	/* BEFORE ROW INSERT Triggers */
+	if (resultRelInfo->ri_TrigDesc &&
+		resultRelInfo->ri_TrigDesc->trig_insert_before_row)
+	{
+		if (!ExecBRInsertTriggers(estate, resultRelInfo, slot))
+			skip_tuple = true;	/* "do nothing" */
+	}
+
+	if (!skip_tuple)
+	{
+		/* Compute stored generated columns */
+		if (rel->rd_att->constr &&
+			rel->rd_att->constr->has_generated_stored)
+			ExecComputeStoredGenerated(resultRelInfo, estate, slot,
+									   CMD_INSERT);
+
+		/* Check the constraints of the tuple */
+		if (rel->rd_att->constr)
+			ExecConstraints(resultRelInfo, slot, estate);
+		if (rel->rd_rel->relispartition)
+			ExecPartitionCheck(resultRelInfo, slot, estate, true);
+
+		table_modify_buffer_insert(MultiInsertState, slot);
+	}
+}
+
 /*
  * Find the searchslot tuple and update it with data in the slot,
  * update the indexes, and execute any constraints and per-row triggers.
diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index 980f6e2741..0e7050dba8 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -427,6 +427,30 @@ logicalrep_write_insert(StringInfo out, TransactionId xid, Relation rel,
 	logicalrep_write_tuple(out, rel, newslot, binary, columns);
 }
 
+LogicalRepRelId
+logicalrep_read_relid(StringInfo in)
+{
+	LogicalRepRelId relid;
+
+	/* read the relation id */
+	relid = pq_getmsgint(in, 4);
+
+	return relid;
+}
+
+void
+logicalrep_read_insert_v2(StringInfo in, LogicalRepTupleData *newtup)
+{
+	char		action;
+
+	action = pq_getmsgbyte(in);
+	if (action != 'N')
+		elog(ERROR, "expected new tuple but got %d",
+			 action);
+
+	logicalrep_read_tuple(in, newtup);
+}
+
 /*
  * Read INSERT from stream.
  *
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 38c2895307..7873152c02 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -148,7 +148,6 @@
 #include <unistd.h>
 
 #include "access/table.h"
-#include "access/tableam.h"
 #include "access/twophase.h"
 #include "access/xact.h"
 #include "catalog/indexing.h"
@@ -414,6 +413,30 @@ static inline void reset_apply_error_context_info(void);
 static TransApplyAction get_transaction_apply_action(TransactionId xid,
 													 ParallelApplyWorkerInfo **winfo);
 
+typedef enum LRMultiInsertReturnStatus
+{
+	LR_MULTI_INSERT_NONE = 0,
+	LR_MULTI_INSERT_REL_SKIPPED,
+	LR_MULTI_INSERT_DISALLOWED,
+	LR_MULTI_INSERT_DONE,
+} LRMultiInsertReturnStatus;
+
+static TableModifyState *MultiInsertState = NULL;
+static LogicalRepRelMapEntry *LastRel = NULL;
+static LogicalRepRelId LastMultiInsertRelId = InvalidOid;
+static ApplyExecutionData *LastEData = NULL;
+static TupleTableSlot *LastRemoteSlot = NULL;
+
+typedef struct LRModifyBufferFlushContext
+{
+	ResultRelInfo *resultRelInfo;
+	EState	   *estate;
+} LRModifyBufferFlushContext;
+
+static LRModifyBufferFlushContext *modify_buffer_flush_context = NULL;
+static void LRModifyBufferFlushCallback(void *context, TupleTableSlot *slot);
+static void FinishMultiInserts(void);
+
 /*
  * Form the origin name for the subscription.
  *
@@ -1015,6 +1038,8 @@ apply_handle_commit(StringInfo s)
 {
 	LogicalRepCommitData commit_data;
 
+	FinishMultiInserts();
+
 	logicalrep_read_commit(s, &commit_data);
 
 	if (commit_data.commit_lsn != remote_final_lsn)
@@ -1041,6 +1066,8 @@ apply_handle_begin_prepare(StringInfo s)
 {
 	LogicalRepPreparedTxnData begin_data;
 
+	FinishMultiInserts();
+
 	/* Tablesync should never receive prepare. */
 	if (am_tablesync_worker())
 		ereport(ERROR,
@@ -1107,6 +1134,8 @@ apply_handle_prepare(StringInfo s)
 {
 	LogicalRepPreparedTxnData prepare_data;
 
+	FinishMultiInserts();
+
 	logicalrep_read_prepare(s, &prepare_data);
 
 	if (prepare_data.prepare_lsn != remote_final_lsn)
@@ -1179,6 +1208,8 @@ apply_handle_commit_prepared(StringInfo s)
 	LogicalRepCommitPreparedTxnData prepare_data;
 	char		gid[GIDSIZE];
 
+	FinishMultiInserts();
+
 	logicalrep_read_commit_prepared(s, &prepare_data);
 	set_apply_error_context_xact(prepare_data.xid, prepare_data.commit_lsn);
 
@@ -1228,6 +1259,8 @@ apply_handle_rollback_prepared(StringInfo s)
 	LogicalRepRollbackPreparedTxnData rollback_data;
 	char		gid[GIDSIZE];
 
+	FinishMultiInserts();
+
 	logicalrep_read_rollback_prepared(s, &rollback_data);
 	set_apply_error_context_xact(rollback_data.xid, rollback_data.rollback_end_lsn);
 
@@ -1290,6 +1323,8 @@ apply_handle_stream_prepare(StringInfo s)
 	/* Save the message before it is consumed. */
 	StringInfoData original_msg = *s;
 
+	FinishMultiInserts();
+
 	if (in_streamed_transaction)
 		ereport(ERROR,
 				(errcode(ERRCODE_PROTOCOL_VIOLATION),
@@ -1317,6 +1352,8 @@ apply_handle_stream_prepare(StringInfo s)
 			apply_spooled_messages(MyLogicalRepWorker->stream_fileset,
 								   prepare_data.xid, prepare_data.prepare_lsn);
 
+			FinishMultiInserts();
+
 			/* Mark the transaction as prepared. */
 			apply_handle_prepare_internal(&prepare_data);
 
@@ -1428,6 +1465,8 @@ apply_handle_stream_prepare(StringInfo s)
 static void
 apply_handle_origin(StringInfo s)
 {
+	FinishMultiInserts();
+
 	/*
 	 * ORIGIN message can only come inside streaming transaction or inside
 	 * remote transaction and before any actual writes.
@@ -1494,6 +1533,8 @@ apply_handle_stream_start(StringInfo s)
 	/* Save the message before it is consumed. */
 	StringInfoData original_msg = *s;
 
+	FinishMultiInserts();
+
 	if (in_streamed_transaction)
 		ereport(ERROR,
 				(errcode(ERRCODE_PROTOCOL_VIOLATION),
@@ -1649,6 +1690,8 @@ apply_handle_stream_stop(StringInfo s)
 	ParallelApplyWorkerInfo *winfo;
 	TransApplyAction apply_action;
 
+	FinishMultiInserts();
+
 	if (!in_streamed_transaction)
 		ereport(ERROR,
 				(errcode(ERRCODE_PROTOCOL_VIOLATION),
@@ -1842,6 +1885,8 @@ apply_handle_stream_abort(StringInfo s)
 	StringInfoData original_msg = *s;
 	bool		toplevel_xact;
 
+	FinishMultiInserts();
+
 	if (in_streamed_transaction)
 		ereport(ERROR,
 				(errcode(ERRCODE_PROTOCOL_VIOLATION),
@@ -2159,6 +2204,8 @@ apply_handle_stream_commit(StringInfo s)
 	/* Save the message before it is consumed. */
 	StringInfoData original_msg = *s;
 
+	FinishMultiInserts();
+
 	if (in_streamed_transaction)
 		ereport(ERROR,
 				(errcode(ERRCODE_PROTOCOL_VIOLATION),
@@ -2180,6 +2227,8 @@ apply_handle_stream_commit(StringInfo s)
 			apply_spooled_messages(MyLogicalRepWorker->stream_fileset, xid,
 								   commit_data.commit_lsn);
 
+			FinishMultiInserts();
+
 			apply_handle_commit_internal(&commit_data);
 
 			/* Unlink the files with serialized changes and subxact info. */
@@ -2323,6 +2372,8 @@ apply_handle_relation(StringInfo s)
 {
 	LogicalRepRelation *rel;
 
+	FinishMultiInserts();
+
 	if (handle_streamed_transaction(LOGICAL_REP_MSG_RELATION, s))
 		return;
 
@@ -2346,6 +2397,8 @@ apply_handle_type(StringInfo s)
 {
 	LogicalRepTyp typ;
 
+	FinishMultiInserts();
+
 	if (handle_streamed_transaction(LOGICAL_REP_MSG_TYPE, s))
 		return;
 
@@ -2384,16 +2437,150 @@ TargetPrivilegesCheck(Relation rel, AclMode mode)
 						RelationGetRelationName(rel))));
 }
 
-/*
- * Handle INSERT message.
- */
+static void
+FinishMultiInserts(void)
+{
+	LogicalRepMsgType saved_command;
+
+	if (MultiInsertState == NULL)
+		return;
+
+	Assert(OidIsValid(LastMultiInsertRelId));
+	Assert(LastEData != NULL);
+
+	/* Set relation for error callback */
+	apply_error_callback_arg.rel = LastRel;
+
+	/* Set current command for error callback */
+	saved_command = apply_error_callback_arg.command;
+	apply_error_callback_arg.command = LOGICAL_REP_MSG_INSERT;
+
+	ExecDropSingleTupleTableSlot(LastRemoteSlot);
+	LastRemoteSlot = NULL;
+
+	table_modify_end(MultiInsertState);
+	MultiInsertState = NULL;
+	LastMultiInsertRelId = InvalidOid;
+
+	pfree(modify_buffer_flush_context);
+	modify_buffer_flush_context = NULL;
+
+	ExecCloseIndices(LastEData->targetRelInfo);
+
+	finish_edata(LastEData);
+	LastEData = NULL;
+
+	/* Reset relation for error callback */
+	apply_error_callback_arg.rel = NULL;
+
+	/* Reset the current command */
+	apply_error_callback_arg.command = saved_command;
+
+	logicalrep_rel_close(LastRel, NoLock);
+	LastRel = NULL;
+
+	end_replication_step();
+}
 
 static void
-apply_handle_insert(StringInfo s)
+LRModifyBufferFlushCallback(void *context, TupleTableSlot *slot)
+{
+	LRModifyBufferFlushContext *ctx = (LRModifyBufferFlushContext *) context;
+	ResultRelInfo *resultRelInfo = ctx->resultRelInfo;
+	EState	   *estate = ctx->estate;
+	LogicalRepMsgType saved_command;
+
+	/* Quick exit if no indexes or no triggers */
+	if (!(resultRelInfo->ri_NumIndices > 0 ||
+		  (resultRelInfo->ri_TrigDesc != NULL &&
+		   (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+			resultRelInfo->ri_TrigDesc->trig_insert_new_table))))
+		return;
+
+	/* Set relation for error callback */
+	apply_error_callback_arg.rel = LastRel;
+
+	/* Set current command for error callback */
+	saved_command = apply_error_callback_arg.command;
+	apply_error_callback_arg.command = LOGICAL_REP_MSG_INSERT;
+
+	/* Caller must take care of opening and closing the indices */
+
+	/*
+	 * If there are any indexes, update them for all the inserted tuples, and
+	 * run AFTER ROW INSERT triggers.
+	 */
+	if (resultRelInfo->ri_NumIndices > 0)
+	{
+		List	   *recheckIndexes;
+		List	   *conflictindexes;
+		bool		conflict = false;
+
+		conflictindexes = resultRelInfo->ri_onConflictArbiterIndexes;
+		recheckIndexes =
+			ExecInsertIndexTuples(resultRelInfo,
+								  slot, estate, false,
+								  conflictindexes ? true : false,
+								  &conflict,
+								  conflictindexes, false);
+
+		/*
+		 * Checks the conflict indexes to fetch the conflicting local tuple
+		 * and reports the conflict. We perform this check here, instead of
+		 * performing an additional index scan before the actual insertion and
+		 * reporting the conflict if any conflicting tuples are found. This is
+		 * to avoid the overhead of executing the extra scan for each INSERT
+		 * operation, even when no conflict arises, which could introduce
+		 * significant overhead to replication, particularly in cases where
+		 * conflicts are rare.
+		 *
+		 * XXX OTOH, this could lead to clean-up effort for dead tuples added
+		 * in heap and index in case of conflicts. But as conflicts shouldn't
+		 * be a frequent thing so we preferred to save the performance
+		 * overhead of extra scan before each insertion.
+		 */
+		if (conflict)
+			CheckAndReportConflict(resultRelInfo, estate, CT_INSERT_EXISTS,
+								   recheckIndexes, NULL, slot);
+
+		ExecARInsertTriggers(estate, resultRelInfo,
+							 slot, recheckIndexes,
+							 NULL);
+
+		list_free(recheckIndexes);
+	}
+
+	/*
+	 * There's no indexes, but see if we need to run AFTER ROW INSERT triggers
+	 * anyway.
+	 */
+	else if (resultRelInfo->ri_TrigDesc != NULL &&
+			 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+			  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+	{
+		ExecARInsertTriggers(estate, resultRelInfo,
+							 slot, NIL,
+							 NULL);
+	}
+
+	/*
+	 * XXX we should in theory pass a TransitionCaptureState object to the
+	 * above to capture transition tuples, but after statement triggers don't
+	 * actually get fired by replication yet anyway
+	 */
+
+	/* Reset relation for error callback */
+	apply_error_callback_arg.rel = NULL;
+
+	/* Reset the current command */
+	apply_error_callback_arg.command = saved_command;
+}
+
+static LRMultiInsertReturnStatus
+do_multi_inserts(StringInfo s, LogicalRepRelId *relid)
 {
 	LogicalRepRelMapEntry *rel;
 	LogicalRepTupleData newtup;
-	LogicalRepRelId relid;
 	UserContext ucxt;
 	ApplyExecutionData *edata;
 	EState	   *estate;
@@ -2401,17 +2588,143 @@ apply_handle_insert(StringInfo s)
 	MemoryContext oldctx;
 	bool		run_as_owner;
 
+	if (MultiInsertState == NULL)
+		begin_replication_step();
+
+	*relid = logicalrep_read_relid(s);
+
+	if (MultiInsertState != NULL &&
+		(LastMultiInsertRelId != InvalidOid &&
+		 *relid != InvalidOid &&
+		 LastMultiInsertRelId != *relid))
+		FinishMultiInserts();
+
+	if (MultiInsertState == NULL)
+		rel = logicalrep_rel_open(*relid, RowExclusiveLock);
+	else
+		rel = LastRel;
+
+	if (!should_apply_changes_for_rel(rel))
+	{
+		Assert(MultiInsertState == NULL);
+
+		/*
+		 * The relation can't become interesting in the middle of the
+		 * transaction so it's safe to unlock it.
+		 */
+		logicalrep_rel_close(rel, RowExclusiveLock);
+		end_replication_step();
+		return LR_MULTI_INSERT_REL_SKIPPED;
+	}
+
+	/* For a partitioned table, let's not do multi inserts. */
+	if (rel->localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+	{
+		Assert(MultiInsertState == NULL);
+
+		/*
+		 * The relation can't become interesting in the middle of the
+		 * transaction so it's safe to unlock it.
+		 */
+		logicalrep_rel_close(rel, RowExclusiveLock);
+		end_replication_step();
+		return LR_MULTI_INSERT_DISALLOWED;
+	}
+
 	/*
-	 * Quick return if we are skipping data modification changes or handling
-	 * streamed transactions.
+	 * Make sure that any user-supplied code runs as the table owner, unless
+	 * the user has opted out of that behavior.
 	 */
-	if (is_skipping_changes() ||
-		handle_streamed_transaction(LOGICAL_REP_MSG_INSERT, s))
-		return;
+	run_as_owner = MySubscription->runasowner;
+	if (!run_as_owner)
+		SwitchToUntrustedUser(rel->localrel->rd_rel->relowner, &ucxt);
+
+	/* Set relation for error callback */
+	apply_error_callback_arg.rel = rel;
+
+	if (MultiInsertState == NULL)
+	{
+		oldctx = MemoryContextSwitchTo(TopTransactionContext);
+
+		/* Initialize the executor state. */
+		LastEData = edata = create_edata_for_relation(rel);
+		estate = edata->estate;
+
+		LastRemoteSlot = remoteslot = MakeTupleTableSlot(RelationGetDescr(rel->localrel),
+														 &TTSOpsVirtual);
+
+		modify_buffer_flush_context = (LRModifyBufferFlushContext *) palloc(sizeof(LRModifyBufferFlushContext));
+		modify_buffer_flush_context->resultRelInfo = edata->targetRelInfo;
+		modify_buffer_flush_context->estate = estate;
+
+		MultiInsertState = table_modify_begin(edata->targetRelInfo->ri_RelationDesc,
+											  TM_FLAG_BAS_BULKWRITE,
+											  GetCurrentCommandId(true),
+											  0,
+											  LRModifyBufferFlushCallback,
+											  modify_buffer_flush_context);
+		LastRel = rel;
+		LastMultiInsertRelId = *relid;
+
+		/* We must open indexes here. */
+		ExecOpenIndices(edata->targetRelInfo, true);
+		InitConflictIndexes(edata->targetRelInfo);
+
+		MemoryContextSwitchTo(oldctx);
+	}
+	else
+	{
+		CommandId	cid;
+
+		edata = LastEData;
+		estate = edata->estate;
+		ResetExprContext(GetPerTupleExprContext(estate));
+		ExecClearTuple(LastRemoteSlot);
+		remoteslot = LastRemoteSlot;
+		cid = GetCurrentCommandId(true);
+		MultiInsertState->cid = cid;
+		estate->es_output_cid = cid;
+	}
+
+	/* Process and store remote tuple in the slot */
+	logicalrep_read_insert_v2(s, &newtup);
+	oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+	slot_store_data(remoteslot, rel, &newtup);
+	slot_fill_defaults(rel, estate, remoteslot);
+	MemoryContextSwitchTo(oldctx);
+
+	TargetPrivilegesCheck(edata->targetRelInfo->ri_RelationDesc, ACL_INSERT);
+	ExecRelationMultiInsert(MultiInsertState, edata->targetRelInfo, estate, remoteslot);
+
+	/* Reset relation for error callback */
+	apply_error_callback_arg.rel = NULL;
+
+	if (!run_as_owner)
+		RestoreUserContext(&ucxt);
+
+	Assert(MultiInsertState != NULL);
+
+	CommandCounterIncrement();
+
+	return LR_MULTI_INSERT_DONE;
+}
+
+static bool
+do_single_inserts(StringInfo s, LogicalRepRelId relid)
+{
+	LogicalRepRelMapEntry *rel;
+	LogicalRepTupleData newtup;
+	UserContext ucxt;
+	ApplyExecutionData *edata;
+	EState	   *estate;
+	TupleTableSlot *remoteslot;
+	MemoryContext oldctx;
+	bool		run_as_owner;
+
+	Assert(relid != InvalidOid);
 
 	begin_replication_step();
 
-	relid = logicalrep_read_insert(s, &newtup);
 	rel = logicalrep_rel_open(relid, RowExclusiveLock);
 	if (!should_apply_changes_for_rel(rel))
 	{
@@ -2421,7 +2734,7 @@ apply_handle_insert(StringInfo s)
 		 */
 		logicalrep_rel_close(rel, RowExclusiveLock);
 		end_replication_step();
-		return;
+		return false;
 	}
 
 	/*
@@ -2443,6 +2756,7 @@ apply_handle_insert(StringInfo s)
 										&TTSOpsVirtual);
 
 	/* Process and store remote tuple in the slot */
+	logicalrep_read_insert_v2(s, &newtup);
 	oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
 	slot_store_data(remoteslot, rel, &newtup);
 	slot_fill_defaults(rel, estate, remoteslot);
@@ -2467,6 +2781,35 @@ apply_handle_insert(StringInfo s)
 	logicalrep_rel_close(rel, NoLock);
 
 	end_replication_step();
+
+	return true;
+}
+
+/*
+ * Handle INSERT message.
+ */
+static void
+apply_handle_insert(StringInfo s)
+{
+	LRMultiInsertReturnStatus mi_status;
+	LogicalRepRelId relid;
+
+	/*
+	 * Quick return if we are skipping data modification changes or handling
+	 * streamed transactions.
+	 */
+	if (is_skipping_changes() ||
+		handle_streamed_transaction(LOGICAL_REP_MSG_INSERT, s))
+		return;
+
+	mi_status = do_multi_inserts(s, &relid);
+	if (mi_status == LR_MULTI_INSERT_REL_SKIPPED ||
+		mi_status == LR_MULTI_INSERT_DONE)
+		return;
+
+	do_single_inserts(s, relid);
+
+	return;
 }
 
 /*
@@ -2554,6 +2897,8 @@ apply_handle_update(StringInfo s)
 	MemoryContext oldctx;
 	bool		run_as_owner;
 
+	FinishMultiInserts();
+
 	/*
 	 * Quick return if we are skipping data modification changes or handling
 	 * streamed transactions.
@@ -2761,6 +3106,8 @@ apply_handle_delete(StringInfo s)
 	MemoryContext oldctx;
 	bool		run_as_owner;
 
+	FinishMultiInserts();
+
 	/*
 	 * Quick return if we are skipping data modification changes or handling
 	 * streamed transactions.
@@ -3245,6 +3592,8 @@ apply_handle_truncate(StringInfo s)
 	ListCell   *lc;
 	LOCKMODE	lockmode = AccessExclusiveLock;
 
+	FinishMultiInserts();
+
 	/*
 	 * Quick return if we are skipping data modification changes or handling
 	 * streamed transactions.
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 69c3ebff00..17b2e42683 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -14,6 +14,7 @@
 #ifndef EXECUTOR_H
 #define EXECUTOR_H
 
+#include "access/tableam.h"
 #include "executor/execdesc.h"
 #include "fmgr.h"
 #include "nodes/lockoptions.h"
@@ -668,6 +669,9 @@ extern bool RelationFindReplTupleSeq(Relation rel, LockTupleMode lockmode,
 
 extern void ExecSimpleRelationInsert(ResultRelInfo *resultRelInfo,
 									 EState *estate, TupleTableSlot *slot);
+extern void ExecRelationMultiInsert(TableModifyState *MultiInsertState,
+									ResultRelInfo *resultRelInfo,
+									EState *estate, TupleTableSlot *slot);
 extern void ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
 									 EState *estate, EPQState *epqstate,
 									 TupleTableSlot *searchslot, TupleTableSlot *slot);
diff --git a/src/include/replication/conflict.h b/src/include/replication/conflict.h
index 02cb84da7e..3b7c910b03 100644
--- a/src/include/replication/conflict.h
+++ b/src/include/replication/conflict.h
@@ -55,4 +55,10 @@ extern void ReportApplyConflict(EState *estate, ResultRelInfo *relinfo,
 								RepOriginId localorigin, TimestampTz localts);
 extern void InitConflictIndexes(ResultRelInfo *relInfo);
 
+extern void CheckAndReportConflict(ResultRelInfo *resultRelInfo,
+								   EState *estate, ConflictType type,
+								   List *recheckIndexes,
+								   TupleTableSlot *searchslot,
+								   TupleTableSlot *remoteslot);
+
 #endif
diff --git a/src/include/replication/logicalproto.h b/src/include/replication/logicalproto.h
index c409638a2e..3f3a7f0a31 100644
--- a/src/include/replication/logicalproto.h
+++ b/src/include/replication/logicalproto.h
@@ -226,6 +226,8 @@ extern void logicalrep_write_insert(StringInfo out, TransactionId xid,
 									Relation rel,
 									TupleTableSlot *newslot,
 									bool binary, Bitmapset *columns);
+extern LogicalRepRelId logicalrep_read_relid(StringInfo in);
+extern void logicalrep_read_insert_v2(StringInfo in, LogicalRepTupleData *newtup);
 extern LogicalRepRelId logicalrep_read_insert(StringInfo in, LogicalRepTupleData *newtup);
 extern void logicalrep_write_update(StringInfo out, TransactionId xid,
 									Relation rel,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index de5699e078..07a61d086d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1475,6 +1475,8 @@ LPTHREAD_START_ROUTINE
 LPTSTR
 LPVOID
 LPWSTR
+LRModifyBufferFlushContext
+LRMultiInsertReturnStatus
 LSEG
 LUID
 LVRelState
-- 
2.43.0



  [application/x-patch] v23-0001-Introduce-new-Table-AM-for-multi-inserts.patch (15.2K, 5-v23-0001-Introduce-new-Table-AM-for-multi-inserts.patch)
  download | inline diff:
From 9846d4e9d845d25c9b57a054b42803019e41c0e5 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Mon, 26 Aug 2024 04:45:47 +0000
Subject: [PATCH v23 1/5] Introduce new Table AM for multi inserts

Until now, it's the COPY ... FROM command using multi inserts
(i.e. buffer some tuples and inserts them to table at once).
Various other commands can benefit from this multi insert
logic [Reusable].

Also, there's a need to have these multi insert AMs
(Access Methods) as scan-like API [Usability].

Also, there's a need allow various table AMs define their own
buffering and flushing strategy [Flexibility].

This commit introduces, new table AMs for multi inserts to help
achieve all of the above.

Upcoming commits will have these new table AMs being used for
various other commands.

Author: Bharath Rupireddy
Reviewed-by: Jeff Davis
Discussion: https://www.postgresql.org/message-id/[email protected]
Discussion: https://www.postgresql.org/message-id/CALj2ACVi9eTRYR%3Dgdca5wxtj3Kk_9q9qVccxsS1hngTGOCjPwQ%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/8633171cb034aafc260fdf37df04b6c779aa1e2f.camel%40j-davis.com
---
 src/backend/access/heap/heapam.c         | 197 ++++++++++++++++++++++-
 src/backend/access/heap/heapam_handler.c |   6 +
 src/backend/access/table/tableamapi.c    |   5 +
 src/include/access/heapam.h              |  38 +++++
 src/include/access/tableam.h             |  80 +++++++++
 src/tools/pgindent/typedefs.list         |   3 +
 6 files changed, 328 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 91b20147a0..86d60e476b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -65,6 +65,7 @@
 #include "utils/datum.h"
 #include "utils/injection_point.h"
 #include "utils/inval.h"
+#include "utils/memutils.h"
 #include "utils/relcache.h"
 #include "utils/snapmgr.h"
 #include "utils/spccache.h"
@@ -113,7 +114,7 @@ static int	bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate);
 static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
 static HeapTuple ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required,
 										bool *copy);
-
+static void heap_modify_insert_end_callback(TableModifyState *state);
 
 /*
  * Each tuple lock mode has a corresponding heavyweight lock, and one or two
@@ -2612,6 +2613,200 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 	pgstat_count_heap_insert(relation, ntuples);
 }
 
+/*
+ * Initialize heap modify state.
+ */
+TableModifyState *
+heap_modify_begin(Relation rel, int modify_flags,
+				  CommandId cid, int options,
+				  TableModifyBufferFlushCallback modify_buffer_flush_callback,
+				  void *modify_buffer_flush_context)
+{
+	TableModifyState *state;
+	MemoryContext context;
+	MemoryContext oldcontext;
+
+	context = AllocSetContextCreate(TopTransactionContext,
+									"heap_modify memory context",
+									ALLOCSET_DEFAULT_SIZES);
+
+	oldcontext = MemoryContextSwitchTo(context);
+	state = palloc0(sizeof(TableModifyState));
+	state->rel = rel;
+	state->modify_flags = modify_flags;
+	state->mem_cxt = context;
+	state->cid = cid;
+	state->options = options;
+	state->modify_buffer_flush_callback = modify_buffer_flush_callback;
+	state->modify_buffer_flush_context = modify_buffer_flush_context;
+	state->modify_end_callback = NULL;	/* To be installed lazily */
+	MemoryContextSwitchTo(oldcontext);
+
+	return state;
+}
+
+/*
+ * Store passed-in tuple into in-memory buffered slots. When full, insert
+ * multiple tuples from the buffers into heap.
+ */
+void
+heap_modify_buffer_insert(TableModifyState *state,
+						  TupleTableSlot *slot)
+{
+	TupleTableSlot *dstslot;
+	HeapInsertState *istate;
+	HeapMultiInsertState *mistate;
+	MemoryContext oldcontext;
+
+	oldcontext = MemoryContextSwitchTo(state->mem_cxt);
+
+	/* First time through, initialize heap insert state */
+	if (state->data == NULL)
+	{
+		istate = (HeapInsertState *) palloc0(sizeof(HeapInsertState));
+		istate->bistate = NULL;
+		istate->mistate = NULL;
+		state->data = istate;
+		mistate = (HeapMultiInsertState *) palloc0(sizeof(HeapMultiInsertState));
+		mistate->slots = (TupleTableSlot **) palloc0(sizeof(TupleTableSlot *) * HEAP_MAX_BUFFERED_SLOTS);
+		istate->mistate = mistate;
+		mistate->mem_cxt = AllocSetContextCreate(CurrentMemoryContext,
+												 "heap_multi_insert memory context",
+												 ALLOCSET_DEFAULT_SIZES);
+
+		if ((state->modify_flags & TM_FLAG_BAS_BULKWRITE) != 0)
+			istate->bistate = GetBulkInsertState();
+
+		state->modify_end_callback = heap_modify_insert_end_callback;
+	}
+
+	istate = (HeapInsertState *) state->data;
+	Assert(istate->mistate != NULL);
+	mistate = istate->mistate;
+
+	dstslot = mistate->slots[mistate->cur_slots];
+	if (dstslot == NULL)
+	{
+		/*
+		 * We use virtual tuple slots buffered slots for leveraging the
+		 * optimization it provides to minimize physical data copying. The
+		 * virtual slot gets materialized when we copy (via below
+		 * ExecCopySlot) the tuples from the source slot which can be of any
+		 * type. This way, it is ensured that the tuple storage doesn't depend
+		 * on external memory, because all the datums that aren't passed by
+		 * value are copied into the slot's memory context.
+		 */
+		dstslot = MakeTupleTableSlot(RelationGetDescr(state->rel),
+									 &TTSOpsVirtual);
+		mistate->slots[mistate->cur_slots] = dstslot;
+	}
+
+	Assert(TTS_IS_VIRTUAL(dstslot));
+
+	/*
+	 * Note that the copy clears the previous destination slot contents, so
+	 * there's no need of explicit ExecClearTuple here.
+	 */
+	ExecCopySlot(dstslot, slot);
+
+	mistate->cur_slots++;
+
+	if (mistate->cur_slots >= HEAP_MAX_BUFFERED_SLOTS)
+		heap_modify_buffer_flush(state);
+
+	MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * Insert multiple tuples from in-memory buffered slots into heap.
+ */
+void
+heap_modify_buffer_flush(TableModifyState *state)
+{
+	HeapInsertState *istate;
+	HeapMultiInsertState *mistate;
+	MemoryContext oldcontext;
+
+	/* Quick exit if we haven't inserted anything yet */
+	if (state->data == NULL)
+		return;
+
+	istate = (HeapInsertState *) state->data;
+	Assert(istate->mistate != NULL);
+	mistate = istate->mistate;
+
+	/* Quick exit if we have flushed already */
+	if (mistate->cur_slots == 0)
+		return;
+
+	/*
+	 * heap_multi_insert may leak memory, so switch to short-lived memory
+	 * context before calling it.
+	 */
+	oldcontext = MemoryContextSwitchTo(mistate->mem_cxt);
+
+	heap_multi_insert(state->rel, mistate->slots, mistate->cur_slots,
+					  state->cid, state->options, istate->bistate);
+
+	MemoryContextSwitchTo(oldcontext);
+	MemoryContextReset(mistate->mem_cxt);
+
+	if (state->modify_buffer_flush_callback != NULL)
+	{
+		for (int i = 0; i < mistate->cur_slots; i++)
+			state->modify_buffer_flush_callback(state->modify_buffer_flush_context,
+												mistate->slots[i]);
+	}
+
+	mistate->cur_slots = 0;
+}
+
+/*
+ * Heap insert specific callback used for performing work at the end like
+ * flushing buffered tuples if any, cleaning up the insert state and buffered
+ * slots.
+ */
+static void
+heap_modify_insert_end_callback(TableModifyState *state)
+{
+	HeapInsertState *istate;
+
+	/* Quick exit if we haven't inserted anything yet */
+	if (state->data == NULL)
+		return;
+
+	istate = (HeapInsertState *) state->data;
+
+	if (istate->mistate != NULL)
+	{
+		HeapMultiInsertState *mistate = istate->mistate;
+
+		heap_modify_buffer_flush(state);
+
+		Assert(mistate->cur_slots == 0);
+
+		for (int i = 0; i < HEAP_MAX_BUFFERED_SLOTS && mistate->slots[i] != NULL; i++)
+			ExecDropSingleTupleTableSlot(mistate->slots[i]);
+
+		MemoryContextDelete(mistate->mem_cxt);
+	}
+
+	if (istate->bistate != NULL)
+		FreeBulkInsertState(istate->bistate);
+}
+
+/*
+ * Clean heap modify state.
+ */
+void
+heap_modify_end(TableModifyState *state)
+{
+	if (state->modify_end_callback != NULL)
+		state->modify_end_callback(state);
+
+	MemoryContextDelete(state->mem_cxt);
+}
+
 /*
  *	simple_heap_insert - insert a tuple
  *
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 1c6da286d4..3cacfdf871 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2611,6 +2611,12 @@ static const TableAmRoutine heapam_methods = {
 	.tuple_insert_speculative = heapam_tuple_insert_speculative,
 	.tuple_complete_speculative = heapam_tuple_complete_speculative,
 	.multi_insert = heap_multi_insert,
+
+	.tuple_modify_begin = heap_modify_begin,
+	.tuple_modify_buffer_insert = heap_modify_buffer_insert,
+	.tuple_modify_buffer_flush = heap_modify_buffer_flush,
+	.tuple_modify_end = heap_modify_end,
+
 	.tuple_delete = heapam_tuple_delete,
 	.tuple_update = heapam_tuple_update,
 	.tuple_lock = heapam_tuple_lock,
diff --git a/src/backend/access/table/tableamapi.c b/src/backend/access/table/tableamapi.c
index e9b598256f..772f29b1b5 100644
--- a/src/backend/access/table/tableamapi.c
+++ b/src/backend/access/table/tableamapi.c
@@ -97,6 +97,11 @@ GetTableAmRoutine(Oid amhandler)
 	Assert(routine->scan_sample_next_block != NULL);
 	Assert(routine->scan_sample_next_tuple != NULL);
 
+	Assert(routine->tuple_modify_begin != NULL);
+	Assert(routine->tuple_modify_buffer_insert != NULL);
+	Assert(routine->tuple_modify_buffer_flush != NULL);
+	Assert(routine->tuple_modify_end != NULL);
+
 	return routine;
 }
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9e9aec88a6..8c44a7808d 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -271,6 +271,32 @@ typedef enum
 	PRUNE_VACUUM_CLEANUP,		/* VACUUM 2nd heap pass */
 } PruneReason;
 
+/*
+ * Maximum number of slots that multi-insert buffers can hold.
+ *
+ * Caution: Don't make this too big, as we could end up with this many tuples
+ * stored in multi insert buffer.
+ */
+#define HEAP_MAX_BUFFERED_SLOTS		1000
+
+typedef struct HeapMultiInsertState
+{
+	/* Array of buffered slots */
+	TupleTableSlot **slots;
+
+	/* Number of buffered slots currently held */
+	int			cur_slots;
+
+	MemoryContext mem_cxt;
+} HeapMultiInsertState;
+
+typedef struct HeapInsertState
+{
+	struct BulkInsertStateData *bistate;
+	HeapMultiInsertState *mistate;
+} HeapInsertState;
+
+
 /* ----------------
  *		function prototypes for heap access method
  *
@@ -321,6 +347,18 @@ extern void heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 extern void heap_multi_insert(Relation relation, struct TupleTableSlot **slots,
 							  int ntuples, CommandId cid, int options,
 							  BulkInsertState bistate);
+
+extern TableModifyState *heap_modify_begin(Relation rel,
+										   int modify_flags,
+										   CommandId cid,
+										   int options,
+										   TableModifyBufferFlushCallback modify_buffer_flush_callback,
+										   void *modify_buffer_flush_context);
+extern void heap_modify_buffer_insert(TableModifyState *state,
+									  TupleTableSlot *slot);
+extern void heap_modify_buffer_flush(TableModifyState *state);
+extern void heap_modify_end(TableModifyState *state);
+
 extern TM_Result heap_delete(Relation relation, ItemPointer tid,
 							 CommandId cid, Snapshot crosscheck, bool wait,
 							 struct TM_FailureData *tmfd, bool changingPart);
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index da661289c1..083d9ac820 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -255,6 +255,39 @@ typedef struct TM_IndexDeleteOp
 	TM_IndexStatus *status;
 } TM_IndexDeleteOp;
 
+/* Table modify flags */
+
+/* Use BAS_BULKWRITE buffer access strategy */
+#define TM_FLAG_BAS_BULKWRITE	0x000001
+
+struct TableModifyState;
+
+/* Callback invoked for each tuple that gets flushed to disk from buffer */
+typedef void (*TableModifyBufferFlushCallback) (void *context,
+												TupleTableSlot *slot);
+
+/* Table AM specific callback that gets called in table_modify_end() */
+typedef void (*TableModifyEndCallback) (struct TableModifyState *state);
+
+/* Holds table modify state */
+typedef struct TableModifyState
+{
+	Relation	rel;
+	int			modify_flags;
+	MemoryContext mem_cxt;
+	CommandId	cid;
+	int			options;
+
+	/* Flush callback and its context */
+	TableModifyBufferFlushCallback modify_buffer_flush_callback;
+	void	   *modify_buffer_flush_context;
+
+	/* Table AM specific data */
+	void	   *data;
+
+	TableModifyEndCallback modify_end_callback;
+} TableModifyState;
+
 /* "options" flag bits for table_tuple_insert */
 /* TABLE_INSERT_SKIP_WAL was 0x0001; RelationNeedsWAL() now governs */
 #define TABLE_INSERT_SKIP_FSM		0x0002
@@ -578,6 +611,21 @@ typedef struct TableAmRoutine
 	void		(*finish_bulk_insert) (Relation rel, int options);
 
 
+	/* ------------------------------------------------------------------------
+	 * Table Modify related functions.
+	 * ------------------------------------------------------------------------
+	 */
+	TableModifyState *(*tuple_modify_begin) (Relation rel,
+											 int modify_flags,
+											 CommandId cid,
+											 int options,
+											 TableModifyBufferFlushCallback modify_buffer_flush_callback,
+											 void *modify_buffer_flush_context);
+	void		(*tuple_modify_buffer_insert) (TableModifyState *state,
+											   TupleTableSlot *slot);
+	void		(*tuple_modify_buffer_flush) (TableModifyState *state);
+	void		(*tuple_modify_end) (TableModifyState *state);
+
 	/* ------------------------------------------------------------------------
 	 * DDL related functionality.
 	 * ------------------------------------------------------------------------
@@ -1599,6 +1647,38 @@ table_finish_bulk_insert(Relation rel, int options)
 		rel->rd_tableam->finish_bulk_insert(rel, options);
 }
 
+/* ------------------------------------------------------------------------
+ * Table Modify related functions.
+ * ------------------------------------------------------------------------
+ */
+static inline TableModifyState *
+table_modify_begin(Relation rel, int modify_flags, CommandId cid, int options,
+				   TableModifyBufferFlushCallback modify_buffer_flush_callback,
+				   void *modify_buffer_flush_context)
+{
+	return rel->rd_tableam->tuple_modify_begin(rel, modify_flags,
+											   cid, options,
+											   modify_buffer_flush_callback,
+											   modify_buffer_flush_context);
+}
+
+static inline void
+table_modify_buffer_insert(TableModifyState *state, TupleTableSlot *slot)
+{
+	state->rel->rd_tableam->tuple_modify_buffer_insert(state, slot);
+}
+
+static inline void
+table_modify_buffer_flush(TableModifyState *state)
+{
+	state->rel->rd_tableam->tuple_modify_buffer_flush(state);
+}
+
+static inline void
+table_modify_end(TableModifyState *state)
+{
+	state->rel->rd_tableam->tuple_modify_end(state);
+}
 
 /* ------------------------------------------------------------------------
  * DDL related functionality.
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9e951a9e6f..538132e6f4 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1145,6 +1145,8 @@ HeadlineJsonState
 HeadlineParsedText
 HeadlineWordEntry
 HeapCheckContext
+HeapInsertState
+HeapMultiInsertState
 HeapPageFreeze
 HeapScanDesc
 HeapTuple
@@ -2869,6 +2871,7 @@ TableFuncScanState
 TableFuncType
 TableInfo
 TableLikeClause
+TableModifyState
 TableSampleClause
 TableScanDesc
 TableScanDescData
-- 
2.43.0



  [application/x-patch] v23-0003-Optimize-INSERT-INTO-SELECT-with-new-multi-inser.patch (9.6K, 6-v23-0003-Optimize-INSERT-INTO-SELECT-with-new-multi-inser.patch)
  download | inline diff:
From 479a15eebfac29f6cb87ba9e603a6179fb850f9e Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Mon, 26 Aug 2024 04:47:46 +0000
Subject: [PATCH v23 3/5] Optimize INSERT INTO SELECT with new multi insert
 table AM

This commit optimizes the INSERT INTO SELECT query for heap AM
using new multi insert table AM added by commit <<CHANGE_ME>>.

Author: Bharath Rupireddy
Reviewed-by: Jeff Davis
Discussion: https://www.postgresql.org/message-id/CALj2ACVi9eTRYR%3Dgdca5wxtj3Kk_9q9qVccxsS1hngTGOCjPwQ%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/8633171cb034aafc260fdf37df04b6c779aa1e2f.camel%40j-davis.com
---
 src/backend/executor/nodeModifyTable.c | 170 ++++++++++++++++++++++---
 src/tools/pgindent/typedefs.list       |   1 +
 2 files changed, 153 insertions(+), 18 deletions(-)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 8bf4c80d4a..03dd372227 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -123,6 +123,18 @@ typedef struct UpdateContext
 	LockTupleMode lockmode;
 } UpdateContext;
 
+typedef struct InsertModifyBufferFlushContext
+{
+	ResultRelInfo *resultRelInfo;
+	EState	   *estate;
+	ModifyTableState *mtstate;
+} InsertModifyBufferFlushContext;
+
+static InsertModifyBufferFlushContext *insert_modify_buffer_flush_context = NULL;
+static TableModifyState *table_modify_state = NULL;
+
+static void InsertModifyBufferFlushCallback(void *context,
+											TupleTableSlot *slot);
 
 static void ExecBatchInsert(ModifyTableState *mtstate,
 							ResultRelInfo *resultRelInfo,
@@ -735,6 +747,55 @@ ExecGetUpdateNewTuple(ResultRelInfo *relinfo,
 	return ExecProject(newProj);
 }
 
+static void
+InsertModifyBufferFlushCallback(void *context, TupleTableSlot *slot)
+{
+	InsertModifyBufferFlushContext *ctx = (InsertModifyBufferFlushContext *) context;
+	ResultRelInfo *resultRelInfo = ctx->resultRelInfo;
+	EState	   *estate = ctx->estate;
+	ModifyTableState *mtstate = ctx->mtstate;
+
+	/* Quick exit if no indexes or no triggers */
+	if (!(resultRelInfo->ri_NumIndices > 0 ||
+		  (resultRelInfo->ri_TrigDesc != NULL &&
+		   (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+			resultRelInfo->ri_TrigDesc->trig_insert_new_table))))
+		return;
+
+	/* Caller must take care of opening and closing the indices */
+
+	/*
+	 * If there are any indexes, update them for all the inserted tuples, and
+	 * run AFTER ROW INSERT triggers.
+	 */
+	if (resultRelInfo->ri_NumIndices > 0)
+	{
+		List	   *recheckIndexes;
+
+		recheckIndexes =
+			ExecInsertIndexTuples(resultRelInfo,
+								  slot, estate, false,
+								  false, NULL, NIL, false);
+		ExecARInsertTriggers(estate, resultRelInfo,
+							 slot, recheckIndexes,
+							 mtstate->mt_transition_capture);
+		list_free(recheckIndexes);
+	}
+
+	/*
+	 * There's no indexes, but see if we need to run AFTER ROW INSERT triggers
+	 * anyway.
+	 */
+	else if (resultRelInfo->ri_TrigDesc != NULL &&
+			 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+			  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+	{
+		ExecARInsertTriggers(estate, resultRelInfo,
+							 slot, NIL,
+							 mtstate->mt_transition_capture);
+	}
+}
+
 /* ----------------------------------------------------------------
  *		ExecInsert
  *
@@ -760,7 +821,8 @@ ExecInsert(ModifyTableContext *context,
 		   TupleTableSlot *slot,
 		   bool canSetTag,
 		   TupleTableSlot **inserted_tuple,
-		   ResultRelInfo **insert_destrel)
+		   ResultRelInfo **insert_destrel,
+		   bool canMultiInsert)
 {
 	ModifyTableState *mtstate = context->mtstate;
 	EState	   *estate = context->estate;
@@ -773,6 +835,7 @@ ExecInsert(ModifyTableContext *context,
 	OnConflictAction onconflict = node->onConflictAction;
 	PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
 	MemoryContext oldContext;
+	bool		ar_insert_triggers_executed = false;
 
 	/*
 	 * If the input result relation is a partitioned table, find the leaf
@@ -1138,17 +1201,53 @@ ExecInsert(ModifyTableContext *context,
 		}
 		else
 		{
-			/* insert the tuple normally */
-			table_tuple_insert(resultRelationDesc, slot,
-							   estate->es_output_cid,
-							   0, NULL);
+			if (canMultiInsert &&
+				proute == NULL &&
+				resultRelInfo->ri_WithCheckOptions == NIL &&
+				resultRelInfo->ri_projectReturning == NULL)
+			{
+				if (insert_modify_buffer_flush_context == NULL)
+				{
+					insert_modify_buffer_flush_context =
+						(InsertModifyBufferFlushContext *) palloc0(sizeof(InsertModifyBufferFlushContext));
+					insert_modify_buffer_flush_context->resultRelInfo = resultRelInfo;
+					insert_modify_buffer_flush_context->estate = estate;
+					insert_modify_buffer_flush_context->mtstate = mtstate;
+				}
 
-			/* insert index entries for tuple */
-			if (resultRelInfo->ri_NumIndices > 0)
-				recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
-													   slot, estate, false,
-													   false, NULL, NIL,
-													   false);
+				if (table_modify_state == NULL)
+				{
+					table_modify_state = table_modify_begin(resultRelInfo->ri_RelationDesc,
+															0,
+															estate->es_output_cid,
+															0,
+															InsertModifyBufferFlushCallback,
+															insert_modify_buffer_flush_context);
+				}
+
+				table_modify_buffer_insert(table_modify_state, slot);
+				ar_insert_triggers_executed = true;
+			}
+			else
+			{
+				/* insert the tuple normally */
+				table_tuple_insert(resultRelationDesc, slot,
+								   estate->es_output_cid,
+								   0, NULL);
+
+				/* insert index entries for tuple */
+				if (resultRelInfo->ri_NumIndices > 0)
+					recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
+														   slot, estate, false,
+														   false, NULL, NIL,
+														   false);
+
+				ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
+									 mtstate->mt_transition_capture);
+
+				list_free(recheckIndexes);
+				ar_insert_triggers_executed = true;
+			}
 		}
 	}
 
@@ -1182,10 +1281,12 @@ ExecInsert(ModifyTableContext *context,
 	}
 
 	/* AFTER ROW INSERT Triggers */
-	ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
-						 ar_insert_trig_tcs);
-
-	list_free(recheckIndexes);
+	if (!ar_insert_triggers_executed)
+	{
+		ExecARInsertTriggers(estate, resultRelInfo, slot, recheckIndexes,
+							 ar_insert_trig_tcs);
+		list_free(recheckIndexes);
+	}
 
 	/*
 	 * Check any WITH CHECK OPTION constraints from parent views.  We are
@@ -1881,7 +1982,7 @@ ExecCrossPartitionUpdate(ModifyTableContext *context,
 	/* Tuple routing starts from the root table. */
 	context->cpUpdateReturningSlot =
 		ExecInsert(context, mtstate->rootResultRelInfo, slot, canSetTag,
-				   inserted_tuple, insert_destrel);
+				   inserted_tuple, insert_destrel, false);
 
 	/*
 	 * Reset the transition state that may possibly have been written by
@@ -3385,7 +3486,7 @@ ExecMergeNotMatched(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
 				mtstate->mt_merge_action = action;
 
 				rslot = ExecInsert(context, mtstate->rootResultRelInfo,
-								   newslot, canSetTag, NULL, NULL);
+								   newslot, canSetTag, NULL, NULL, false);
 				mtstate->mt_merge_inserted += 1;
 				break;
 			case CMD_NOTHING:
@@ -3770,6 +3871,10 @@ ExecModifyTable(PlanState *pstate)
 	HeapTupleData oldtupdata;
 	HeapTuple	oldtuple;
 	ItemPointer tupleid;
+	bool		canMultiInsert = false;
+
+	table_modify_state = NULL;
+	insert_modify_buffer_flush_context = NULL;
 
 	CHECK_FOR_INTERRUPTS();
 
@@ -3865,6 +3970,10 @@ ExecModifyTable(PlanState *pstate)
 		if (TupIsNull(context.planSlot))
 			break;
 
+		if (operation == CMD_INSERT &&
+			nodeTag(subplanstate) == T_SeqScanState)
+			canMultiInsert = true;
+
 		/*
 		 * When there are multiple result relations, each tuple contains a
 		 * junk column that gives the OID of the rel from which it came.
@@ -4078,7 +4187,7 @@ ExecModifyTable(PlanState *pstate)
 					ExecInitInsertProjection(node, resultRelInfo);
 				slot = ExecGetInsertNewTuple(resultRelInfo, context.planSlot);
 				slot = ExecInsert(&context, resultRelInfo, slot,
-								  node->canSetTag, NULL, NULL);
+								  node->canSetTag, NULL, NULL, canMultiInsert);
 				break;
 
 			case CMD_UPDATE:
@@ -4137,6 +4246,17 @@ ExecModifyTable(PlanState *pstate)
 			return slot;
 	}
 
+	if (table_modify_state != NULL)
+	{
+		Assert(operation == CMD_INSERT);
+
+		table_modify_end(table_modify_state);
+		table_modify_state = NULL;
+
+		pfree(insert_modify_buffer_flush_context);
+		insert_modify_buffer_flush_context = NULL;
+	}
+
 	/*
 	 * Insert remaining tuples for batch insert.
 	 */
@@ -4249,6 +4369,9 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	mtstate->mt_merge_updated = 0;
 	mtstate->mt_merge_deleted = 0;
 
+	table_modify_state = NULL;
+	insert_modify_buffer_flush_context = NULL;
+
 	/*----------
 	 * Resolve the target relation. This is the same as:
 	 *
@@ -4702,6 +4825,17 @@ ExecEndModifyTable(ModifyTableState *node)
 {
 	int			i;
 
+	if (table_modify_state != NULL)
+	{
+		Assert(node->operation == CMD_INSERT);
+
+		table_modify_end(table_modify_state);
+		table_modify_state = NULL;
+
+		pfree(insert_modify_buffer_flush_context);
+		insert_modify_buffer_flush_context = NULL;
+	}
+
 	/*
 	 * Allow any FDWs to shut down
 	 */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 538132e6f4..de5699e078 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1184,6 +1184,7 @@ ImportForeignSchema_function
 ImportQual
 InProgressEnt
 InProgressIO
+InsertModifyBufferFlushContext
 IncludeWal
 InclusionOpaque
 IncrementVarSublevelsUp_context
-- 
2.43.0



^ permalink  raw  reply  [nested|flat] 30+ messages in thread

* Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
  2024-03-21 04:14 Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-21 07:40 ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-23 00:17   ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-25 19:58     ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-26 15:37       ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-26 19:49         ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-27 08:12           ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-31 15:48             ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-04-02 19:40               ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-04-24 12:49                 ` Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-04-25 16:41                   ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-04-29 06:06                     ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-05-15 07:26                       ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-05-15 23:31                         ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-06-05 07:12                           ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-08-26 05:39                             ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
@ 2024-08-26 21:18                               ` Jeff Davis <[email protected]>
  2024-08-26 21:59                                 ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Matthias van de Meent <[email protected]>
  2024-08-27 21:43                                 ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  1 sibling, 2 replies; 30+ messages in thread

From: Jeff Davis @ 2024-08-26 21:18 UTC (permalink / raw)
  To: Bharath Rupireddy <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; Michael Paquier <[email protected]>; Matthias van de Meent <[email protected]>; Alexander Korotkov <[email protected]>

On Mon, 2024-08-26 at 11:09 +0530, Bharath Rupireddy wrote:
> On Wed, Jun 5, 2024 at 12:42 PM Bharath Rupireddy
> <[email protected]> wrote:
> > 
> > Please find the v22 patches with the above changes.
> 
> Please find the v23 patches after rebasing 0005 and adapting 0004 for
> 9758174e2e.

Thank you.

0001 API design:

* Remove TableModifyState.modify_end_callback.

* This patch means that we will either remove or deprecate
TableAmRoutine.multi_insert and finish_bulk_insert. Are there any
strong opinions about maintaining support for multi-insert, or should
we just remove it outright and force any new AMs to implement the new
APIs to maintain COPY performance?

* Why do we need a separate "modify_flags" and "options"? Can't we just
combine them into TABLE_MODIFY_* flags?

Alexander, you had some work in this area as well, such b1484a3f19. I
believe 0001 covers this use case in a different way: rather than
giving complete responsibility to the AM to insert into the indexes,
the caller provides a callback and the AM is responsible for calling it
at the time the tuples are flushed. Is that right?

The design has been out for a while, so unless others have suggestions,
I'm considering the major design points mostly settled and I will move
forward with something like 0001 (pending implementation issues).

Note: I believe this API will extend naturally to updates and deletes,
as well.

0001 implementation issues:

* We need default implementations for AMs that don't implement the new
APIs, so that the AM will still function even if it only defines the
single-tuple APIs. If we need to make use of the AM's multi_insert
method (I'm not sure we do), then the default methods would need to
handle that as well. (I thought a previous version had these default
implementations -- is there a reason they were removed?)

* I am confused about how the heap implementation manages state and
resets it. mistate->mem_cxt is initialized to a new memory context in
heap_modify_begin, and then re-initialized to another new memory
context in heap_modify_buffer_insert. Then the mistate->mem_cxt is also
used as a temp context for executing heap_multi_insert, and it gets
reset before calling the flush callback, which still needs the slots.

* Why materialize the slot at copyfrom.c:1308 if the slot is going to
be copied anyway (which also materializes it; see
tts_virtual_copyslot()) at heapam.c:2710?

* After correcting the memory issues, can you get updated performance
numbers for COPY?

Regards,
	Jeff Davis

^ permalink  raw  reply  [nested|flat] 30+ messages in thread

* Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
  2024-03-21 04:14 Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-21 07:40 ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-23 00:17   ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-25 19:58     ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-26 15:37       ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-26 19:49         ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-27 08:12           ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-31 15:48             ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-04-02 19:40               ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-04-24 12:49                 ` Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-04-25 16:41                   ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-04-29 06:06                     ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-05-15 07:26                       ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-05-15 23:31                         ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-06-05 07:12                           ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-08-26 05:39                             ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-08-26 21:18                               ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
@ 2024-08-26 21:59                                 ` Matthias van de Meent <[email protected]>
  2024-08-27 05:42                                   ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  1 sibling, 1 reply; 30+ messages in thread

From: Matthias van de Meent @ 2024-08-26 21:59 UTC (permalink / raw)
  To: Jeff Davis <[email protected]>; Bharath Rupireddy <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; Michael Paquier <[email protected]>; Alexander Korotkov <[email protected]>

On Mon, 26 Aug 2024 at 23:18, Jeff Davis <[email protected]> wrote:
>
> On Mon, 2024-08-26 at 11:09 +0530, Bharath Rupireddy wrote:
> > On Wed, Jun 5, 2024 at 12:42 PM Bharath Rupireddy
> > <[email protected]> wrote:
> > >
> > > Please find the v22 patches with the above changes.
> >
> > Please find the v23 patches after rebasing 0005 and adapting 0004 for
> > 9758174e2e.
>
>
> Thank you.
>
> 0001 API design:
>
> * Remove TableModifyState.modify_end_callback.
>
> * This patch means that we will either remove or deprecate
> TableAmRoutine.multi_insert and finish_bulk_insert. Are there any
> strong opinions about maintaining support for multi-insert, or should
> we just remove it outright and force any new AMs to implement the new
> APIs to maintain COPY performance?

I don't think there is a significant enough difference in the
capabilities and requirements between the two APIs as currently
designed that removal of the old API would mean a significant
difference in capabilities. Maybe we could supply an equivalent API
shim to help the transition, but I don't think we should keep the old
API around in the TableAM.

> * Why do we need a separate "modify_flags" and "options"? Can't we just
> combine them into TABLE_MODIFY_* flags?
>
>
> Alexander, you had some work in this area as well, such b1484a3f19. I
> believe 0001 covers this use case in a different way: rather than
> giving complete responsibility to the AM to insert into the indexes,
> the caller provides a callback and the AM is responsible for calling it
> at the time the tuples are flushed. Is that right?
>
> The design has been out for a while, so unless others have suggestions,
> I'm considering the major design points mostly settled and I will move
> forward with something like 0001 (pending implementation issues).

Sorry about this late feedback, but while I'm generally +1 on the idea
and primary design, I feel that it doesn't quite cover all the areas
I'd expected it to cover.

Specifically, I'm having trouble seeing how this could be used to
implement ```INSERT INTO ... SELECT ... RETURNING ctid``` as I see no
returning output path for the newly inserted tuples' data, which is
usually required for our execution nodes' output path. Is support for
RETURN-clauses planned for this API? In a previous iteration, the
flush operation was capable of returning a TTS, but that seems to have
been dropped, and I can't quite figure out why.

> Note: I believe this API will extend naturally to updates and deletes,
> as well.

I have the same concern about UPDATE ... RETURNING not fitting with
this callback-based design.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)






^ permalink  raw  reply  [nested|flat] 30+ messages in thread

* Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
  2024-03-21 04:14 Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-21 07:40 ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-23 00:17   ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-25 19:58     ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-26 15:37       ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-26 19:49         ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-27 08:12           ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-31 15:48             ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-04-02 19:40               ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-04-24 12:49                 ` Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-04-25 16:41                   ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-04-29 06:06                     ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-05-15 07:26                       ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-05-15 23:31                         ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-06-05 07:12                           ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-08-26 05:39                             ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-08-26 21:18                               ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-08-26 21:59                                 ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Matthias van de Meent <[email protected]>
@ 2024-08-27 05:42                                   ` Jeff Davis <[email protected]>
  0 siblings, 0 replies; 30+ messages in thread

From: Jeff Davis @ 2024-08-27 05:42 UTC (permalink / raw)
  To: Matthias van de Meent <[email protected]>; Bharath Rupireddy <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; Michael Paquier <[email protected]>; Alexander Korotkov <[email protected]>

On Mon, 2024-08-26 at 23:59 +0200, Matthias van de Meent wrote:
> Specifically, I'm having trouble seeing how this could be used to
> implement ```INSERT INTO ... SELECT ... RETURNING ctid``` as I see no
> returning output path for the newly inserted tuples' data, which is
> usually required for our execution nodes' output path. Is support for
> RETURN-clauses planned for this API? In a previous iteration, the
> flush operation was capable of returning a TTS, but that seems to
> have
> been dropped, and I can't quite figure out why.

I'm not sure where that was lost, but I suspect when we changed
flushing to use a callback. I didn't get to v23-0003 yet, but I think
you're right that the current flushing mechanism isn't right for
returning tuples. Thank you.

One solution: when the buffer is flushed, we can return an iterator
over the buffered tuples to the caller. The caller can then use the
iterator to insert into indexes, return a tuple to the executor, etc.,
and then release the iterator when done (freeing the buffer). That
control flow is less convenient for most callers, though, so perhaps
that should be optional?

Regards,
	Jeff Davis







^ permalink  raw  reply  [nested|flat] 30+ messages in thread

* Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
  2024-03-21 04:14 Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-21 07:40 ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-23 00:17   ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-25 19:58     ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-26 15:37       ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-26 19:49         ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-27 08:12           ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-31 15:48             ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-04-02 19:40               ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-04-24 12:49                 ` Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-04-25 16:41                   ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-04-29 06:06                     ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-05-15 07:26                       ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-05-15 23:31                         ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-06-05 07:12                           ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-08-26 05:39                             ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-08-26 21:18                               ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
@ 2024-08-27 21:43                                 ` Jeff Davis <[email protected]>
  2024-08-29 07:25                                   ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  1 sibling, 1 reply; 30+ messages in thread

From: Jeff Davis @ 2024-08-27 21:43 UTC (permalink / raw)
  To: Bharath Rupireddy <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; Michael Paquier <[email protected]>; Matthias van de Meent <[email protected]>; Alexander Korotkov <[email protected]>

On Mon, 2024-08-26 at 14:18 -0700, Jeff Davis wrote:
> 0001 implementation issues:
> 
> * We need default implementations for AMs that don't implement the
> new
> APIs, so that the AM will still function even if it only defines the
> single-tuple APIs. If we need to make use of the AM's multi_insert
> method (I'm not sure we do), then the default methods would need to
> handle that as well. (I thought a previous version had these default
> implementations -- is there a reason they were removed?)

On second thought, it would be easier to just have the caller check
whether the AM supports the multi-insert path; and if not, fall back to
the single-tuple path. The single-tuple path is needed anyway for cases
like before-row triggers.

Regards,
	Jeff Davis







^ permalink  raw  reply  [nested|flat] 30+ messages in thread

* Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
  2024-03-21 04:14 Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-21 07:40 ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-23 00:17   ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-25 19:58     ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-26 15:37       ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-26 19:49         ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-27 08:12           ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-31 15:48             ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-04-02 19:40               ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-04-24 12:49                 ` Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-04-25 16:41                   ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-04-29 06:06                     ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-05-15 07:26                       ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-05-15 23:31                         ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-06-05 07:12                           ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-08-26 05:39                             ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-08-26 21:18                               ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-08-27 21:43                                 ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
@ 2024-08-29 07:25                                   ` Bharath Rupireddy <[email protected]>
  2024-08-29 19:29                                     ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  0 siblings, 1 reply; 30+ messages in thread

From: Bharath Rupireddy @ 2024-08-29 07:25 UTC (permalink / raw)
  To: Jeff Davis <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; Michael Paquier <[email protected]>; Matthias van de Meent <[email protected]>; Alexander Korotkov <[email protected]>

On Wed, Aug 28, 2024 at 3:14 AM Jeff Davis <[email protected]> wrote:
>
> On Mon, 2024-08-26 at 14:18 -0700, Jeff Davis wrote:
> > 0001 implementation issues:
> >
> > * We need default implementations for AMs that don't implement the
> > new
> > APIs, so that the AM will still function even if it only defines the
> > single-tuple APIs. If we need to make use of the AM's multi_insert
> > method (I'm not sure we do), then the default methods would need to
> > handle that as well. (I thought a previous version had these default
> > implementations -- is there a reason they were removed?)
>
> On second thought, it would be easier to just have the caller check
> whether the AM supports the multi-insert path; and if not, fall back to
> the single-tuple path. The single-tuple path is needed anyway for cases
> like before-row triggers.

Up until v21, the default implementation existed, see
https://www.postgresql.org/message-id/CALj2ACX90L5Mb5Vv%3DjsvhOdZ8BVsfpZf-CdCGhtm2N%2BbGUCSjg%40mail....
I then removed it in v22 to keep the code simple.

IMO, every caller branching out in the code like if (rel->rd_tableam->
tuple_modify_buffer_insert != NULL) then multi insert; else single
insert; doesn't look good. IMO, the default implementation approach
keeps things simple which eventually can be removed in *near* future.
Thoughts?

One change in the default implementation I would do from that of v21
is to assign the default AMs in GetTableAmRoutine() itself to avoid if
.. else if .. else in the table_modify_XXX().

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com






^ permalink  raw  reply  [nested|flat] 30+ messages in thread

* Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
  2024-03-21 04:14 Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-21 07:40 ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-23 00:17   ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-25 19:58     ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-26 15:37       ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-26 19:49         ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-27 08:12           ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-31 15:48             ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-04-02 19:40               ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-04-24 12:49                 ` Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-04-25 16:41                   ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-04-29 06:06                     ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-05-15 07:26                       ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-05-15 23:31                         ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-06-05 07:12                           ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-08-26 05:39                             ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-08-26 21:18                               ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-08-27 21:43                                 ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-08-29 07:25                                   ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
@ 2024-08-29 19:29                                     ` Jeff Davis <[email protected]>
  2024-10-26 13:30                                       ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  0 siblings, 1 reply; 30+ messages in thread

From: Jeff Davis @ 2024-08-29 19:29 UTC (permalink / raw)
  To: Bharath Rupireddy <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; Michael Paquier <[email protected]>; Matthias van de Meent <[email protected]>; Alexander Korotkov <[email protected]>

On Thu, 2024-08-29 at 12:55 +0530, Bharath Rupireddy wrote:
> IMO, every caller branching out in the code like if (rel->rd_tableam-
> >
> tuple_modify_buffer_insert != NULL) then multi insert; else single
> insert; doesn't look good. IMO, the default implementation approach
> keeps things simple which eventually can be removed in *near* future.
> Thoughts?

I believe we need the branching in the caller anyway:

1. If there is a BEFORE row trigger with a volatile function, the
visibility rules[1] mean that the function should see changes from all
the rows inserted so far this command, which won't work if they are
still in the buffer.

2. Similarly, for an INSTEAD OF row trigger, the visibility rules say
that the function should see all previous rows inserted.

3. If there are volatile functions in the target list or WHERE clause,
the same visibility semantics apply.

4. If there's a "RETURNING ctid" clause, we need to either come up with
a way to return the tuples after flushing, or we need to use the
single-tuple path. (Similarly in the future when we support UPDATE ...
RETURNING, as Matthias pointed out.)

If we need two paths in each caller anyway, it seems cleaner to just
wrap the check for tuple_modify_buffer_insert in
table_modify_buffer_enabled().

We could perhaps use a one path and then force a batch size of one or
something, which is an alternative, but we have to be careful not to
introduce a regression (and it still requires a solution for #4).

Regards,
	Jeff Davis

[1] 
https://www.postgresql.org/docs/devel/trigger-datachanges.html







^ permalink  raw  reply  [nested|flat] 30+ messages in thread

* Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
  2024-03-21 04:14 Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-21 07:40 ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-23 00:17   ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-25 19:58     ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-26 15:37       ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-26 19:49         ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-27 08:12           ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-31 15:48             ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-04-02 19:40               ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-04-24 12:49                 ` Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-04-25 16:41                   ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-04-29 06:06                     ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-05-15 07:26                       ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-05-15 23:31                         ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-06-05 07:12                           ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-08-26 05:39                             ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-08-26 21:18                               ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-08-27 21:43                                 ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-08-29 07:25                                   ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-08-29 19:29                                     ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
@ 2024-10-26 13:30                                       ` Bharath Rupireddy <[email protected]>
  2024-10-28 08:43                                         ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jingtang Zhang <[email protected]>
  0 siblings, 1 reply; 30+ messages in thread

From: Bharath Rupireddy @ 2024-10-26 13:30 UTC (permalink / raw)
  To: Jeff Davis <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; Michael Paquier <[email protected]>; Matthias van de Meent <[email protected]>; Alexander Korotkov <[email protected]>

Hi,

Thanks for looking into this.

On Thu, Aug 29, 2024 at 12:29 PM Jeff Davis <[email protected]> wrote:
>
> I believe we need the branching in the caller anyway:
>
> 1. If there is a BEFORE row trigger with a volatile function, the
> visibility rules[1] mean that the function should see changes from all
> the rows inserted so far this command, which won't work if they are
> still in the buffer.
>
> 2. Similarly, for an INSTEAD OF row trigger, the visibility rules say
> that the function should see all previous rows inserted.
>
> 3. If there are volatile functions in the target list or WHERE clause,
> the same visibility semantics apply.
>
> 4. If there's a "RETURNING ctid" clause, we need to either come up with
> a way to return the tuples after flushing, or we need to use the
> single-tuple path. (Similarly in the future when we support UPDATE ...
> RETURNING, as Matthias pointed out.)
>
> If we need two paths in each caller anyway, it seems cleaner to just
> wrap the check for tuple_modify_buffer_insert in
> table_modify_buffer_enabled().
>
> We could perhaps use a one path and then force a batch size of one or
> something, which is an alternative, but we have to be careful not to
> introduce a regression (and it still requires a solution for #4).

I chose to branch in the caller e.g. if there's a volatile function
SELECT query of REFRESH MATERIALIZED VIEW, the caller goes
table_tuple_insert() path, else multi-insert path.

I am posting the new v24 patch set organized as follows: 0001
introducing the new table AM, 0002 optimizing CTAS, CMV and RMV, 0003
using the new table AM for COPY ... FROM. I, for now, discarded the
INSERT INTO ... SELECT and Logical Replication Apply patches, the idea
is to take the basic stuff forward.

I reworked structure names, members and function names, reworded
comments, addressed review comments in the v24 patches. Please have a
look.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


Attachments:

  [application/octet-stream] v24-0002-Optimize-CTAS-CMV-RMV-with-new-multi-inserts-tab.patch (11.1K, 2-v24-0002-Optimize-CTAS-CMV-RMV-with-new-multi-inserts-tab.patch)
  download | inline diff:
From 096df7a758b0b7cf00b99be9e4ffadf15ceef535 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Sat, 26 Oct 2024 12:35:51 +0000
Subject: [PATCH v24 2/4] Optimize CTAS/CMV/RMV with new multi-inserts table AM

This commit optimizes the following commands for heap AM using new
multi-inserts table AM added by commit <<CHANGE_ME>>:
- CREATE TABLE AS
- CREATE MATERIALIZED VIEW
- REFRESH MATERIALIZED VIEW

Testing shows that performance of CTAS, CMV, RMV is improved by
<<TO_FILL>> respectively on <<TO_FILL>> system.
f
Author: Bharath Rupireddy
Reviewed-by: Jeff Davis
Discussion: https://www.postgresql.org/message-id/[email protected]
Discussion: https://www.postgresql.org/message-id/CALj2ACVi9eTRYR%3Dgdca5wxtj3Kk_9q9qVccxsS1hngTGOCjPwQ%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/8633171cb034aafc260fdf37df04b6c779aa1e2f.camel%40j-davis.com
---
 src/backend/commands/createas.c |  60 ++++++++++++++----
 src/backend/commands/matview.c  | 106 +++++++++++++++++++++++++++++---
 src/include/commands/matview.h  |   3 +
 3 files changed, 147 insertions(+), 22 deletions(-)

diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 68ec122dbf..0affadf404 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -38,6 +38,7 @@
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "optimizer/optimizer.h"
 #include "rewrite/rewriteHandler.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
@@ -56,6 +57,12 @@ typedef struct
 	CommandId	output_cid;		/* cmin to insert in output tuples */
 	int			ti_options;		/* table_tuple_insert performance options */
 	BulkInsertState bistate;	/* bulk insert state */
+
+	/* Table modify state. NULL if multi-inserts isn't supported. */
+	TableModifyState *mstate;
+
+	/* True if SELECT query contains volatile functions */
+	bool		volatile_funcs;
 } DR_intorel;
 
 /* utility functions for CTAS definition creation */
@@ -313,6 +320,10 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
 		plan = pg_plan_query(query, pstate->p_sourcetext,
 							 CURSOR_OPT_PARALLEL_OK, params);
 
+		/* Check if the SELECT query has any volatile functions */
+		((DR_intorel *) dest)->volatile_funcs =
+			contain_volatile_functions_after_planning((Expr *) query);
+
 		/*
 		 * Use a snapshot with an updated command ID to ensure this query sees
 		 * results of any previously executed queries.  (This could only
@@ -548,16 +559,32 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	myState->rel = intoRelationDesc;
 	myState->reladdr = intoRelationAddr;
 	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM;
+	myState->ti_options = TABLE_INSERT_SKIP_FSM |
+		TABLE_INSERT_BAS_BULKWRITE;
+	myState->mstate = NULL;
+	myState->bistate = NULL;
 
 	/*
 	 * If WITH NO DATA is specified, there is no need to set up the state for
-	 * bulk inserts as there are no tuples to insert.
+	 * multi or bulk inserts as there are no tuples to insert.
 	 */
 	if (!into->skipData)
-		myState->bistate = GetBulkInsertState();
-	else
-		myState->bistate = NULL;
+	{
+		if (TableModifyIsMultiInsertsSupported(myState->rel,
+											   myState->volatile_funcs))
+		{
+			myState->mstate = table_modify_begin(myState->rel,
+												 myState->output_cid,
+												 myState->ti_options,
+												 NULL,	/* Multi-insert buffer
+														 * flush callback */
+												 NULL); /* Multi-insert buffer
+														 * flush callback
+														 * context */
+		}
+		else
+			myState->bistate = GetBulkInsertState();
+	}
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -585,11 +612,15 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
 		 * would not be cheap either. This also doesn't allow accessing per-AM
 		 * data (say a tuple's xmin), but since we don't do that here...
 		 */
-		table_tuple_insert(myState->rel,
-						   slot,
-						   myState->output_cid,
-						   myState->ti_options,
-						   myState->bistate);
+
+		if (myState->mstate != NULL)
+			table_modify_buffer_insert(myState->mstate, slot);
+		else
+			table_tuple_insert(myState->rel,
+							   slot,
+							   myState->output_cid,
+							   myState->ti_options,
+							   myState->bistate);
 	}
 
 	/* We know this is a newly created relation, so there are no indexes */
@@ -608,8 +639,13 @@ intorel_shutdown(DestReceiver *self)
 
 	if (!into->skipData)
 	{
-		FreeBulkInsertState(myState->bistate);
-		table_finish_bulk_insert(myState->rel, myState->ti_options);
+		if (myState->mstate != NULL)
+			table_modify_end(myState->mstate);
+		else
+		{
+			FreeBulkInsertState(myState->bistate);
+			table_finish_bulk_insert(myState->rel, myState->ti_options);
+		}
 	}
 
 	/* close rel, but keep lock until commit */
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 010097873d..fa495ec533 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -30,7 +30,9 @@
 #include "commands/tablespace.h"
 #include "executor/executor.h"
 #include "executor/spi.h"
+#include "foreign/fdwapi.h"
 #include "miscadmin.h"
+#include "optimizer/optimizer.h"
 #include "pgstat.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/lmgr.h"
@@ -51,6 +53,12 @@ typedef struct
 	CommandId	output_cid;		/* cmin to insert in output tuples */
 	int			ti_options;		/* table_tuple_insert performance options */
 	BulkInsertState bistate;	/* bulk insert state */
+
+	/* Table modify state. NULL if multi-inserts isn't supported. */
+	TableModifyState *mstate;
+
+	/* True if SELECT query contains volatile functions */
+	bool		volatile_funcs;
 } DR_transientrel;
 
 static int	matview_maintenance_depth = 0;
@@ -428,6 +436,12 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
 	/* Plan the query which will generate data for the refresh. */
 	plan = pg_plan_query(query, queryString, CURSOR_OPT_PARALLEL_OK, NULL);
 
+	/*
+	 * Check if the stored MATERIALIZED VIEW query has any volatile functions.
+	 */
+	((DR_transientrel *) dest)->volatile_funcs =
+		contain_volatile_functions_after_planning((Expr *) query);
+
 	/*
 	 * Use a snapshot with an updated command ID to ensure this query sees
 	 * results of any previously executed queries.  (This could only matter if
@@ -492,8 +506,26 @@ transientrel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	 */
 	myState->transientrel = transientrel;
 	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM | TABLE_INSERT_FROZEN;
-	myState->bistate = GetBulkInsertState();
+	myState->ti_options = TABLE_INSERT_SKIP_FSM |
+		TABLE_INSERT_FROZEN |
+		TABLE_INSERT_BAS_BULKWRITE;
+	myState->bistate = NULL;
+	myState->mstate = NULL;
+
+	/* Set up the state for multi or bulk inserts */
+	if (TableModifyIsMultiInsertsSupported(myState->transientrel,
+										   myState->volatile_funcs))
+	{
+		myState->mstate = table_modify_begin(myState->transientrel,
+											 myState->output_cid,
+											 myState->ti_options,
+											 NULL,	/* Multi-insert buffer
+													 * flush callback */
+											 NULL); /* Multi-insert buffer
+													 * flush callback context */
+	}
+	else
+		myState->bistate = GetBulkInsertState();
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -519,11 +551,14 @@ transientrel_receive(TupleTableSlot *slot, DestReceiver *self)
 	 * tuple's xmin), but since we don't do that here...
 	 */
 
-	table_tuple_insert(myState->transientrel,
-					   slot,
-					   myState->output_cid,
-					   myState->ti_options,
-					   myState->bistate);
+	if (myState->mstate != NULL)
+		table_modify_buffer_insert(myState->mstate, slot);
+	else
+		table_tuple_insert(myState->transientrel,
+						   slot,
+						   myState->output_cid,
+						   myState->ti_options,
+						   myState->bistate);
 
 	/* We know this is a newly created relation, so there are no indexes */
 
@@ -538,9 +573,13 @@ transientrel_shutdown(DestReceiver *self)
 {
 	DR_transientrel *myState = (DR_transientrel *) self;
 
-	FreeBulkInsertState(myState->bistate);
-
-	table_finish_bulk_insert(myState->transientrel, myState->ti_options);
+	if (myState->mstate != NULL)
+		table_modify_end(myState->mstate);
+	else
+	{
+		FreeBulkInsertState(myState->bistate);
+		table_finish_bulk_insert(myState->transientrel, myState->ti_options);
+	}
 
 	/* close transientrel, but keep lock until commit */
 	table_close(myState->transientrel, NoLock);
@@ -984,3 +1023,50 @@ CloseMatViewIncrementalMaintenance(void)
 	matview_maintenance_depth--;
 	Assert(matview_maintenance_depth >= 0);
 }
+
+/*
+ * Check if multi-inserts is supported.
+ *
+ * It's generally more efficient to prepare a bunch of tuples for insertion,
+ * and insert them in one multi-inserts call, than call
+ * table_tuple_insert() separately for every tuple. However, there are a
+ * number of reasons why we might not be able to do this. In general, can't
+ * support multi-inserts in the following cases:
+ *
+ * When there are any BEFORE/INSTEAD OF triggers on the table or any volatile
+ * functions/expressions in the SELECT query. Such triggers or volatile
+ * expressions might query the table we're inserting into and act differently
+ * if the tuples that have already been processed and prepared for insertion
+ * are not there.
+ *
+ * When inserting into partitioned table. For partitioned tables, we may still
+ * be able to perform multi-inserts. However, the possibility of this depends
+ * on which types of triggers exist on the partition. We must disable
+ * multi-inserts if the partition is a foreign table that can't use batching or
+ * it has any before row insert or insert instead triggers (same as we checked
+ * above for the parent table). We really can't know all these unless we start
+ * inserting tuples into the respective partitions. We can have an intermediate
+ * insert state to show the intent to do multi-inserts and later determine if
+ * we can use multi-inserts for the partition being inserted into.
+ *
+ * When inserting into foreign table. For foreign tables, we may still be able
+ * to do multi-inserts if the FDW supports batching.
+ */
+bool
+TableModifyIsMultiInsertsSupported(Relation rel, bool volatile_funcs)
+{
+	if (volatile_funcs)
+		return false;
+
+	/*
+	 * For CREATE TABLE AS, CREATE MATERIALIZED VIEW, REFRESH MATERIALIZED
+	 * VIEW, we really can't have triggers or can't create table as
+	 * partitioned or foreign. So, we will assert.
+	 */
+	Assert(rel->trigdesc == NULL);
+	Assert(rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE);
+	Assert(rel->rd_rel->relkind != RELKIND_FOREIGN_TABLE);
+
+	/* Can support multi-inserts */
+	return true;
+}
diff --git a/src/include/commands/matview.h b/src/include/commands/matview.h
index c8811e8fc7..28abd7b89b 100644
--- a/src/include/commands/matview.h
+++ b/src/include/commands/matview.h
@@ -33,4 +33,7 @@ extern DestReceiver *CreateTransientRelDestReceiver(Oid transientoid);
 
 extern bool MatViewIncrementalMaintenanceIsEnabled(void);
 
+extern bool TableModifyIsMultiInsertsSupported(Relation rel,
+											   bool volatile_funcs);
+
 #endif							/* MATVIEW_H */
-- 
2.40.1



  [application/octet-stream] v24-0001-Introduce-new-table-AM-for-multi-inserts.patch (15.4K, 3-v24-0001-Introduce-new-table-AM-for-multi-inserts.patch)
  download | inline diff:
From af21a26a83b4efdb5a73d1ba00e03d5138295ded Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Sat, 26 Oct 2024 12:35:08 +0000
Subject: [PATCH v24 1/4] Introduce new table AM for multi-inserts

Until now, it's the COPY ... FROM command using multi inserts
(i.e. buffer some tuples and inserts them to table at once).
Basic idea of multi-inserts is that less WAL and reduced buffer
locking. Multi-inserts is faster than calling heap_insert() in a
loop, because when multiple tuples can be inserted on a single
page, we can write just a single WAL record covering all of them,
and only need to lock/unlock the page once.

Various other commands can benefit from this multi-inserts logic
[Reusable].

Also, there's a need to have these multi-inserts AMs (Access
Methods) as scan-like API [Usability]. With this, various table
AMs can define their own buffering and flushing strategy
[Flexibility] based on the way they store the data in the
underlying storage (e.g. columnar).

Author: Bharath Rupireddy
Reviewed-by: Jeff Davis
Discussion: https://www.postgresql.org/message-id/[email protected]
Discussion: https://www.postgresql.org/message-id/CALj2ACVi9eTRYR%3Dgdca5wxtj3Kk_9q9qVccxsS1hngTGOCjPwQ%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/8633171cb034aafc260fdf37df04b6c779aa1e2f.camel%40j-davis.com
---
 src/backend/access/heap/heapam.c         | 209 ++++++++++++++++++++++-
 src/backend/access/heap/heapam_handler.c |   6 +
 src/backend/access/table/tableamapi.c    |   5 +
 src/include/access/heapam.h              |  38 +++++
 src/include/access/tableam.h             |  81 +++++++++
 src/tools/pgindent/typedefs.list         |   3 +
 6 files changed, 341 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 75ff9e7388..1426cd40c6 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -50,6 +50,7 @@
 #include "storage/procarray.h"
 #include "utils/datum.h"
 #include "utils/inval.h"
+#include "utils/memutils.h"
 #include "utils/spccache.h"
 
 
@@ -102,7 +103,7 @@ static int	bottomup_sort_and_shrink(TM_IndexDeleteOp *delstate);
 static XLogRecPtr log_heap_new_cid(Relation relation, HeapTuple tup);
 static HeapTuple ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool key_required,
 										bool *copy);
-
+static void heap_modify_insert_end(TableModifyState *state);
 
 /*
  * Each tuple lock mode has a corresponding heavyweight lock, and one or two
@@ -2603,6 +2604,212 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 	pgstat_count_heap_insert(relation, ntuples);
 }
 
+/*
+ * Initialize heap modify state.
+ */
+TableModifyState *
+heap_modify_begin(Relation rel,
+				  CommandId cid,
+				  int options,
+				  TableModifyBufferFlushCb buffer_flush_cb,
+				  void *buffer_flush_ctx)
+{
+	TableModifyState *state;
+	MemoryContext context;
+	MemoryContext oldcontext;
+
+	context = AllocSetContextCreate(TopTransactionContext,
+									"heap_modify memory context",
+									ALLOCSET_DEFAULT_SIZES);
+
+	oldcontext = MemoryContextSwitchTo(context);
+	state = palloc0(sizeof(TableModifyState));
+	state->rel = rel;
+	state->mem_ctx = context;
+	state->cid = cid;
+	state->options = options;
+	state->buffer_flush_cb = buffer_flush_cb;
+	state->buffer_flush_ctx = buffer_flush_ctx;
+	MemoryContextSwitchTo(oldcontext);
+
+	return state;
+}
+
+/*
+ * Store passed-in tuple into in-memory buffered slots. When full, insert
+ * multiple tuples from the buffers into heap.
+ */
+void
+heap_modify_buffer_insert(TableModifyState *state,
+						  TupleTableSlot *slot)
+{
+	TupleTableSlot *dstslot;
+	HeapInsertState *istate;
+	HeapMultiInsertState *mistate;
+	MemoryContext oldcontext;
+
+	oldcontext = MemoryContextSwitchTo(state->mem_ctx);
+
+	/* First time through, initialize heap insert state */
+	if (state->data == NULL)
+	{
+		istate = (HeapInsertState *) palloc0(sizeof(HeapInsertState));
+		istate->bistate = NULL;
+		istate->mistate = NULL;
+		state->data = istate;
+		mistate =
+			(HeapMultiInsertState *) palloc0(sizeof(HeapMultiInsertState));
+		mistate->slots =
+			(TupleTableSlot **) palloc0(sizeof(TupleTableSlot *) * HEAP_MAX_BUFFERED_SLOTS);
+		istate->mistate = mistate;
+
+		/*
+		 * heap_multi_insert() can leak memory. So switch to this memory
+		 * context before every heap_multi_insert() call and reset when
+		 * finished.
+		 */
+		mistate->mem_ctx = AllocSetContextCreate(CurrentMemoryContext,
+												 "heap_multi_insert memory context",
+												 ALLOCSET_DEFAULT_SIZES);
+
+		if ((state->options & TABLE_INSERT_BAS_BULKWRITE) != 0)
+			istate->bistate = GetBulkInsertState();
+	}
+
+	istate = (HeapInsertState *) state->data;
+	Assert(istate->mistate != NULL);
+	mistate = istate->mistate;
+	dstslot = mistate->slots[mistate->cur_slots];
+
+	if (dstslot == NULL)
+	{
+		/*
+		 * We use virtual tuple slots buffered slots for leveraging the
+		 * optimization it provides to minimize physical data copying. The
+		 * virtual slot gets materialized when we copy (via below
+		 * ExecCopySlot) the tuples from the source slot which can be of any
+		 * type. This way, it is ensured that the tuple storage doesn't depend
+		 * on external memory, because all the datums that aren't passed by
+		 * value are copied into the slot's memory context.
+		 */
+		dstslot = MakeTupleTableSlot(RelationGetDescr(state->rel),
+									 &TTSOpsVirtual);
+
+		mistate->slots[mistate->cur_slots] = dstslot;
+	}
+
+	Assert(TTS_IS_VIRTUAL(dstslot));
+
+	/*
+	 * Note that the copy clears the previous destination slot contents, so no
+	 * need to explicitly ExecClearTuple() here.
+	 */
+	ExecCopySlot(dstslot, slot);
+
+	mistate->cur_slots++;
+
+	if (mistate->cur_slots >= HEAP_MAX_BUFFERED_SLOTS)
+		heap_modify_buffer_flush(state);
+
+	MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * Insert multiple tuples from in-memory buffered slots into heap.
+ */
+void
+heap_modify_buffer_flush(TableModifyState *state)
+{
+	HeapInsertState *istate;
+	HeapMultiInsertState *mistate;
+	MemoryContext oldcontext;
+
+	/* Quick exit if we haven't inserted anything yet */
+	if (state->data == NULL)
+		return;
+
+	istate = (HeapInsertState *) state->data;
+	Assert(istate->mistate != NULL);
+	mistate = istate->mistate;
+
+	/* Quick exit if we have flushed already */
+	if (mistate->cur_slots == 0)
+		return;
+
+	/*
+	 * heap_multi_insert() can leak memory, so switch to short-lived memory
+	 * context before calling it.
+	 */
+	oldcontext = MemoryContextSwitchTo(mistate->mem_ctx);
+	heap_multi_insert(state->rel,
+					  mistate->slots,
+					  mistate->cur_slots,
+					  state->cid,
+					  state->options,
+					  istate->bistate);
+	MemoryContextSwitchTo(oldcontext);
+	MemoryContextReset(mistate->mem_ctx);
+
+	/*
+	 * Invoke caller-supplied buffer flush callback after inserting rows from
+	 * the buffers to heap.
+	 */
+	if (state->buffer_flush_cb != NULL)
+	{
+		for (int i = 0; i < mistate->cur_slots; i++)
+		{
+			state->buffer_flush_cb(state->buffer_flush_ctx,
+								   mistate->slots[i]);
+		}
+	}
+
+	mistate->cur_slots = 0;
+}
+
+/*
+ * Heap insert specific function used for performing work at the end like
+ * flushing remaining buffered tuples, cleaning up the insert state and tuple
+ * table slots used for buffered tuples etc.
+ */
+static void
+heap_modify_insert_end(TableModifyState *state)
+{
+	HeapInsertState *istate;
+
+	/* Quick exit if we haven't inserted anything yet */
+	if (state->data == NULL)
+		return;
+
+	istate = (HeapInsertState *) state->data;
+
+	if (istate->mistate != NULL)
+	{
+		HeapMultiInsertState *mistate = istate->mistate;
+
+		heap_modify_buffer_flush(state);
+
+		Assert(mistate->cur_slots == 0);
+
+		for (int i = 0; i < HEAP_MAX_BUFFERED_SLOTS && mistate->slots[i] != NULL; i++)
+			ExecDropSingleTupleTableSlot(mistate->slots[i]);
+
+		MemoryContextDelete(mistate->mem_ctx);
+	}
+
+	if (istate->bistate != NULL)
+		FreeBulkInsertState(istate->bistate);
+}
+
+/*
+ * Clean heap modify state.
+ */
+void
+heap_modify_end(TableModifyState *state)
+{
+	heap_modify_insert_end(state);
+	MemoryContextDelete(state->mem_ctx);
+}
+
 /*
  *	simple_heap_insert - insert a tuple
  *
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index a8d95e0f1c..d2ef6b4b78 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2644,6 +2644,12 @@ static const TableAmRoutine heapam_methods = {
 	.tuple_insert_speculative = heapam_tuple_insert_speculative,
 	.tuple_complete_speculative = heapam_tuple_complete_speculative,
 	.multi_insert = heap_multi_insert,
+
+	.tuple_modify_begin = heap_modify_begin,
+	.tuple_modify_buffer_insert = heap_modify_buffer_insert,
+	.tuple_modify_buffer_flush = heap_modify_buffer_flush,
+	.tuple_modify_end = heap_modify_end,
+
 	.tuple_delete = heapam_tuple_delete,
 	.tuple_update = heapam_tuple_update,
 	.tuple_lock = heapam_tuple_lock,
diff --git a/src/backend/access/table/tableamapi.c b/src/backend/access/table/tableamapi.c
index e9b598256f..772f29b1b5 100644
--- a/src/backend/access/table/tableamapi.c
+++ b/src/backend/access/table/tableamapi.c
@@ -97,6 +97,11 @@ GetTableAmRoutine(Oid amhandler)
 	Assert(routine->scan_sample_next_block != NULL);
 	Assert(routine->scan_sample_next_tuple != NULL);
 
+	Assert(routine->tuple_modify_begin != NULL);
+	Assert(routine->tuple_modify_buffer_insert != NULL);
+	Assert(routine->tuple_modify_buffer_flush != NULL);
+	Assert(routine->tuple_modify_end != NULL);
+
 	return routine;
 }
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b951466ced..b9404bb83d 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -272,6 +272,33 @@ typedef enum
 	PRUNE_VACUUM_CLEANUP,		/* VACUUM 2nd heap pass */
 } PruneReason;
 
+/*
+ * Maximum number of slots that multi-insert buffers can hold.
+ *
+ * Caution: Don't make this too big, as we could end up with this many tuples
+ * stored in multi insert buffer.
+ */
+#define HEAP_MAX_BUFFERED_SLOTS		1000
+
+typedef struct HeapMultiInsertState
+{
+	/* Array of buffered slots */
+	TupleTableSlot **slots;
+
+	/* Number of buffered slots currently held */
+	int			cur_slots;
+
+	/* Memory context for dealing with multi inserts */
+	MemoryContext mem_ctx;
+} HeapMultiInsertState;
+
+typedef struct HeapInsertState
+{
+	struct BulkInsertStateData *bistate;
+	HeapMultiInsertState *mistate;
+} HeapInsertState;
+
+
 /* ----------------
  *		function prototypes for heap access method
  *
@@ -322,6 +349,17 @@ extern void heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 extern void heap_multi_insert(Relation relation, struct TupleTableSlot **slots,
 							  int ntuples, CommandId cid, int options,
 							  BulkInsertState bistate);
+
+extern TableModifyState *heap_modify_begin(Relation rel,
+										   CommandId cid,
+										   int options,
+										   TableModifyBufferFlushCb buffer_flush_cb,
+										   void *buffer_flush_ctx);
+extern void heap_modify_buffer_insert(TableModifyState *state,
+									  TupleTableSlot *slot);
+extern void heap_modify_buffer_flush(TableModifyState *state);
+extern void heap_modify_end(TableModifyState *state);
+
 extern TM_Result heap_delete(Relation relation, ItemPointer tid,
 							 CommandId cid, Snapshot crosscheck, bool wait,
 							 struct TM_FailureData *tmfd, bool changingPart);
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index adb478a93c..57b71eef38 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -254,11 +254,42 @@ typedef struct TM_IndexDeleteOp
 	TM_IndexStatus *status;
 } TM_IndexDeleteOp;
 
+struct TableModifyState;
+
+/* Callback invoked upon flushing each buffered tuple */
+typedef void (*TableModifyBufferFlushCb) (void *context,
+										  TupleTableSlot *slot);
+
+/* Holds table modify state */
+typedef struct TableModifyState
+{
+	/* These fields are used for inserts for now */
+
+	Relation	rel;			/* Relation to insert to */
+	CommandId	cid;			/* Command ID for insert */
+	int			options;		/* TABLE_INSERT options */
+
+	/* Memory context for dealing with modify state variables */
+	MemoryContext mem_ctx;
+
+	/* Flush callback and its context used for multi inserts */
+	TableModifyBufferFlushCb buffer_flush_cb;
+	void	   *buffer_flush_ctx;
+
+	/* Table AM specific data */
+	void	   *data;
+} TableModifyState;
+
 /* "options" flag bits for table_tuple_insert */
 /* TABLE_INSERT_SKIP_WAL was 0x0001; RelationNeedsWAL() now governs */
 #define TABLE_INSERT_SKIP_FSM		0x0002
 #define TABLE_INSERT_FROZEN			0x0004
 #define TABLE_INSERT_NO_LOGICAL		0x0008
+/*
+ * Use BAS_BULKWRITE buffer access strategy. 0x0010 is for
+ * HEAP_INSERT_SPECULATIVE.
+ */
+#define TABLE_INSERT_BAS_BULKWRITE	0x0020
 
 /* flag bits for table_tuple_lock */
 /* Follow tuples whose update is in progress if lock modes don't conflict  */
@@ -577,6 +608,21 @@ typedef struct TableAmRoutine
 	void		(*finish_bulk_insert) (Relation rel, int options);
 
 
+	/* ------------------------------------------------------------------------
+	 * Table Modify related functions.
+	 * ------------------------------------------------------------------------
+	 */
+	TableModifyState *(*tuple_modify_begin) (Relation rel,
+											 CommandId cid,
+											 int options,
+											 TableModifyBufferFlushCb buffer_flush_cb,
+											 void *buffer_flush_ctx);
+	void		(*tuple_modify_buffer_insert) (TableModifyState *state,
+											   TupleTableSlot *slot);
+	void		(*tuple_modify_buffer_flush) (TableModifyState *state);
+	void		(*tuple_modify_end) (TableModifyState *state);
+
+
 	/* ------------------------------------------------------------------------
 	 * DDL related functionality.
 	 * ------------------------------------------------------------------------
@@ -1608,6 +1654,41 @@ table_finish_bulk_insert(Relation rel, int options)
 		rel->rd_tableam->finish_bulk_insert(rel, options);
 }
 
+/* ------------------------------------------------------------------------
+ * Table Modify related functions.
+ * ------------------------------------------------------------------------
+ */
+static inline TableModifyState *
+table_modify_begin(Relation rel,
+				   CommandId cid,
+				   int options,
+				   TableModifyBufferFlushCb buffer_flush_cb,
+				   void *buffer_flush_ctx)
+{
+	return rel->rd_tableam->tuple_modify_begin(rel,
+											   cid,
+											   options,
+											   buffer_flush_cb,
+											   buffer_flush_ctx);
+}
+
+static inline void
+table_modify_buffer_insert(TableModifyState *state, TupleTableSlot *slot)
+{
+	state->rel->rd_tableam->tuple_modify_buffer_insert(state, slot);
+}
+
+static inline void
+table_modify_buffer_flush(TableModifyState *state)
+{
+	state->rel->rd_tableam->tuple_modify_buffer_flush(state);
+}
+
+static inline void
+table_modify_end(TableModifyState *state)
+{
+	state->rel->rd_tableam->tuple_modify_end(state);
+}
 
 /* ------------------------------------------------------------------------
  * DDL related functionality.
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 171a7dd5d2..e7ddf29c16 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1147,6 +1147,8 @@ HeadlineJsonState
 HeadlineParsedText
 HeadlineWordEntry
 HeapCheckContext
+HeapInsertState
+HeapMultiInsertState
 HeapPageFreeze
 HeapScanDesc
 HeapTuple
@@ -2873,6 +2875,7 @@ TableFuncScanState
 TableFuncType
 TableInfo
 TableLikeClause
+TableModifyState
 TableSampleClause
 TableScanDesc
 TableScanDescData
-- 
2.40.1



  [application/octet-stream] v24-0003-Use-new-multi-inserts-table-AM-for-COPY-.-FROM.patch (14.3K, 4-v24-0003-Use-new-multi-inserts-table-AM-for-COPY-.-FROM.patch)
  download | inline diff:
From fd64f007092f4a71d61aa1f3347390da05c2460c Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Sat, 26 Oct 2024 12:37:14 +0000
Subject: [PATCH v24 3/4] Use new multi-inserts table AM for COPY ... FROM

This commit uses the new multi-inserts table AM added by commit
<<CHANGE_ME>> for COPY ... FROM command.

Author: Bharath Rupireddy
Reviewed-by: Jeff Davis
Discussion: https://www.postgresql.org/message-id/CALj2ACVi9eTRYR%3Dgdca5wxtj3Kk_9q9qVccxsS1hngTGOCjPwQ%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/8633171cb034aafc260fdf37df04b6c779aa1e2f.camel%40j-davis.com
---
 src/backend/commands/copyfrom.c          | 254 +++++++++++++++--------
 src/include/commands/copyfrom_internal.h |   4 +-
 src/tools/pgindent/typedefs.list         |   1 +
 3 files changed, 171 insertions(+), 88 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 07cbd5d22b..18fb609cbe 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -74,14 +74,27 @@
  */
 #define MAX_PARTITION_BUFFERS	32
 
+/* Context for multi-inserts buffer flush callback */
+typedef struct MultiInsertBufferFlushCtx
+{
+	CopyFromState cstate;
+	ResultRelInfo *resultRelInfo;
+	EState	   *estate;
+} MultiInsertBufferFlushCtx;
+
 /* Stores multi-insert data related to a single relation in CopyFrom. */
 typedef struct CopyMultiInsertBuffer
 {
-	TupleTableSlot *slots[MAX_BUFFERED_TUPLES]; /* Array to store tuples */
+	TableModifyState *mstate;	/* Table insert state; NULL if foreign table */
+	TupleTableSlot **slots;		/* Array to store tuples */
 	ResultRelInfo *resultRelInfo;	/* ResultRelInfo for 'relid' */
-	BulkInsertState bistate;	/* BulkInsertState for this rel if plain
-								 * table; NULL if foreign table */
+	TupleTableSlot *mislot;		/* Slot used for multi-inserts */
+	MultiInsertBufferFlushCtx *mibufferctx; /* Multi-inserts buffer flush
+											 * callback context */
 	int			nused;			/* number of 'slots' containing tuples */
+	int			currslotno;		/* Current buffered slot number that's being
+								 * flushed; Used to get correct cur_lineno for
+								 * errors while in flush callback. */
 	uint64		linenos[MAX_BUFFERED_TUPLES];	/* Line # of tuple in copy
 												 * stream */
 } CopyMultiInsertBuffer;
@@ -216,19 +229,96 @@ CopyLimitPrintoutLength(const char *str)
 	return res;
 }
 
+/*
+ * Implements for multi-inserts buffer flush callback
+ * i.e. TableModifyEndCallback.
+ *
+ * NB: Caller must take care of opening and closing the indices.
+ */
+static void
+MultiInsertBufferFlushCb(void *context, TupleTableSlot *slot)
+{
+	MultiInsertBufferFlushCtx *mibufferctx = (MultiInsertBufferFlushCtx *) context;
+	CopyFromState cstate = mibufferctx->cstate;
+	ResultRelInfo *resultRelInfo = mibufferctx->resultRelInfo;
+	EState	   *estate = mibufferctx->estate;
+	CopyMultiInsertBuffer *buffer = resultRelInfo->ri_CopyMultiInsertBuffer;
+
+	/*
+	 * If there are any indexes, update them for all the inserted tuples, and
+	 * run AFTER ROW INSERT triggers.
+	 */
+	if (resultRelInfo->ri_NumIndices > 0)
+	{
+		List	   *recheckIndexes;
+
+		cstate->cur_lineno = buffer->linenos[buffer->currslotno++];
+		recheckIndexes =
+			ExecInsertIndexTuples(resultRelInfo,
+								  slot, estate, false,
+								  false, NULL, NIL, false);
+
+		ExecARInsertTriggers(estate, resultRelInfo,
+							 slot, recheckIndexes,
+							 cstate->transition_capture);
+
+		list_free(recheckIndexes);
+	}
+
+	/*
+	 * There's no indexes, but see if we need to run AFTER ROW INSERT triggers
+	 * anyway.
+	 */
+	else if (resultRelInfo->ri_TrigDesc != NULL &&
+			 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+			  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+	{
+		cstate->cur_lineno = buffer->linenos[buffer->currslotno++];
+
+		ExecARInsertTriggers(estate, resultRelInfo,
+							 slot, NIL,
+							 cstate->transition_capture);
+	}
+
+	Assert(buffer->currslotno <= buffer->nused);
+}
+
 /*
  * Allocate memory and initialize a new CopyMultiInsertBuffer for this
  * ResultRelInfo.
  */
 static CopyMultiInsertBuffer *
-CopyMultiInsertBufferInit(ResultRelInfo *rri)
+CopyMultiInsertBufferInit(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
+						  CopyFromState cstate, EState *estate)
 {
 	CopyMultiInsertBuffer *buffer;
 
 	buffer = (CopyMultiInsertBuffer *) palloc(sizeof(CopyMultiInsertBuffer));
-	memset(buffer->slots, 0, sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
+
+	if (rri->ri_FdwRoutine == NULL)
+	{
+		buffer->mibufferctx =
+			(MultiInsertBufferFlushCtx *) palloc(sizeof(MultiInsertBufferFlushCtx));
+		buffer->mibufferctx->cstate = cstate;
+		buffer->mibufferctx->resultRelInfo = rri;
+		buffer->mibufferctx->estate = estate;
+
+		buffer->mstate = table_modify_begin(rri->ri_RelationDesc,
+											miinfo->mycid,
+											miinfo->ti_options,
+											MultiInsertBufferFlushCb,
+											buffer->mibufferctx);
+
+		buffer->slots = NULL;
+	}
+	else
+	{
+		buffer->mstate = NULL;
+		buffer->slots = palloc0(sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
+	}
+
+	buffer->mislot = NULL;
 	buffer->resultRelInfo = rri;
-	buffer->bistate = (rri->ri_FdwRoutine == NULL) ? GetBulkInsertState() : NULL;
 	buffer->nused = 0;
 
 	return buffer;
@@ -239,11 +329,12 @@ CopyMultiInsertBufferInit(ResultRelInfo *rri)
  */
 static inline void
 CopyMultiInsertInfoSetupBuffer(CopyMultiInsertInfo *miinfo,
-							   ResultRelInfo *rri)
+							   ResultRelInfo *rri, CopyFromState cstate,
+							   EState *estate)
 {
 	CopyMultiInsertBuffer *buffer;
 
-	buffer = CopyMultiInsertBufferInit(rri);
+	buffer = CopyMultiInsertBufferInit(miinfo, rri, cstate, estate);
 
 	/* Setup back-link so we can easily find this buffer again */
 	rri->ri_CopyMultiInsertBuffer = buffer;
@@ -276,7 +367,7 @@ CopyMultiInsertInfoInit(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
 	 * tuples their way for the first time.
 	 */
 	if (rri->ri_RelationDesc->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
-		CopyMultiInsertInfoSetupBuffer(miinfo, rri);
+		CopyMultiInsertInfoSetupBuffer(miinfo, rri, cstate, estate);
 }
 
 /*
@@ -320,8 +411,6 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 		int			batch_size = resultRelInfo->ri_BatchSize;
 		int			sent = 0;
 
-		Assert(buffer->bistate == NULL);
-
 		/* Ensure that the FDW supports batching and it's enabled */
 		Assert(resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert);
 		Assert(batch_size > 1);
@@ -393,13 +482,8 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	}
 	else
 	{
-		CommandId	mycid = miinfo->mycid;
-		int			ti_options = miinfo->ti_options;
 		bool		line_buf_valid = cstate->line_buf_valid;
 		uint64		save_cur_lineno = cstate->cur_lineno;
-		MemoryContext oldcontext;
-
-		Assert(buffer->bistate != NULL);
 
 		/*
 		 * Print error context information correctly, if one of the operations
@@ -407,56 +491,18 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 		 */
 		cstate->line_buf_valid = false;
 
-		/*
-		 * table_multi_insert may leak memory, so switch to short-lived memory
-		 * context before calling it.
-		 */
-		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-		table_multi_insert(resultRelInfo->ri_RelationDesc,
-						   slots,
-						   nused,
-						   mycid,
-						   ti_options,
-						   buffer->bistate);
-		MemoryContextSwitchTo(oldcontext);
+		Assert(buffer->currslotno <= buffer->nused);
+		buffer->currslotno = 0;
 
-		for (i = 0; i < nused; i++)
-		{
-			/*
-			 * If there are any indexes, update them for all the inserted
-			 * tuples, and run AFTER ROW INSERT triggers.
-			 */
-			if (resultRelInfo->ri_NumIndices > 0)
-			{
-				List	   *recheckIndexes;
-
-				cstate->cur_lineno = buffer->linenos[i];
-				recheckIndexes =
-					ExecInsertIndexTuples(resultRelInfo,
-										  buffer->slots[i], estate, false,
-										  false, NULL, NIL, false);
-				ExecARInsertTriggers(estate, resultRelInfo,
-									 slots[i], recheckIndexes,
-									 cstate->transition_capture);
-				list_free(recheckIndexes);
-			}
+		table_modify_buffer_flush(buffer->mstate);
 
-			/*
-			 * There's no indexes, but see if we need to run AFTER ROW INSERT
-			 * triggers anyway.
-			 */
-			else if (resultRelInfo->ri_TrigDesc != NULL &&
-					 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-					  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
-			{
-				cstate->cur_lineno = buffer->linenos[i];
-				ExecARInsertTriggers(estate, resultRelInfo,
-									 slots[i], NIL,
-									 cstate->transition_capture);
-			}
+		Assert(buffer->currslotno <= buffer->nused);
+		buffer->currslotno = 0;
 
-			ExecClearTuple(slots[i]);
-		}
+		/*
+		 * Indexes are updated and AFTER ROW INSERT triggers (if any) are run
+		 * in the flush callback CopyModifyBufferFlushCallback.
+		 */
 
 		/* Update the row counter and progress of the COPY command */
 		*processed += nused;
@@ -492,19 +538,18 @@ CopyMultiInsertBufferCleanup(CopyMultiInsertInfo *miinfo,
 
 	if (resultRelInfo->ri_FdwRoutine == NULL)
 	{
-		Assert(buffer->bistate != NULL);
-		FreeBulkInsertState(buffer->bistate);
+		table_modify_end(buffer->mstate);
+		ExecDropSingleTupleTableSlot(buffer->mislot);
+		pfree(buffer->mibufferctx);
 	}
 	else
-		Assert(buffer->bistate == NULL);
-
-	/* Since we only create slots on demand, just drop the non-null ones. */
-	for (i = 0; i < MAX_BUFFERED_TUPLES && buffer->slots[i] != NULL; i++)
-		ExecDropSingleTupleTableSlot(buffer->slots[i]);
+	{
+		/* Since we only create slots on demand, just drop the non-null ones. */
+		for (i = 0; i < MAX_BUFFERED_TUPLES && buffer->slots[i] != NULL; i++)
+			ExecDropSingleTupleTableSlot(buffer->slots[i]);
 
-	if (resultRelInfo->ri_FdwRoutine == NULL)
-		table_finish_bulk_insert(resultRelInfo->ri_RelationDesc,
-								 miinfo->ti_options);
+		pfree(buffer->slots);
+	}
 
 	pfree(buffer);
 }
@@ -598,15 +643,36 @@ CopyMultiInsertInfoNextFreeSlot(CopyMultiInsertInfo *miinfo,
 {
 	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
 	int			nused;
+	TupleTableSlot *slot;
 
 	Assert(buffer != NULL);
 	Assert(buffer->nused < MAX_BUFFERED_TUPLES);
 
 	nused = buffer->nused;
 
-	if (buffer->slots[nused] == NULL)
-		buffer->slots[nused] = table_slot_create(rri->ri_RelationDesc, NULL);
-	return buffer->slots[nused];
+	if (rri->ri_FdwRoutine == NULL)
+	{
+		if (buffer->mislot == NULL)
+		{
+			buffer->mislot = MakeTupleTableSlot(RelationGetDescr(rri->ri_RelationDesc),
+												&TTSOpsVirtual);
+		}
+
+		/* Caller must clear the slot */
+		slot = buffer->mislot;
+	}
+	else
+	{
+		if (buffer->slots[nused] == NULL)
+		{
+			slot = table_slot_create(rri->ri_RelationDesc, NULL);
+			buffer->slots[nused] = slot;
+		}
+		else
+			slot = buffer->slots[nused];
+	}
+
+	return slot;
 }
 
 /*
@@ -620,7 +686,11 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
 	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
 
 	Assert(buffer != NULL);
-	Assert(slot == buffer->slots[buffer->nused]);
+
+#ifdef USE_ASSERT_CHECKING
+	if (rri->ri_FdwRoutine != NULL)
+		Assert(slot == buffer->slots[buffer->nused]);
+#endif
 
 	/* Store the line number so we can properly report any errors later */
 	buffer->linenos[buffer->nused] = lineno;
@@ -628,6 +698,22 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
 	/* Record this slot as being used */
 	buffer->nused++;
 
+	if (rri->ri_FdwRoutine == NULL)
+	{
+		Assert(slot == buffer->mislot);
+		buffer->currslotno = 0;
+
+		table_modify_buffer_insert(buffer->mstate, slot);
+	}
+	else
+	{
+		/*
+		 * The slot previously might point into the per-tuple context. For
+		 * batching it needs to be longer lived.
+		 */
+		ExecMaterializeSlot(slot);
+	}
+
 	/* Update how many tuples are stored and their size */
 	miinfo->bufferedTuples++;
 	miinfo->bufferedBytes += tuplen;
@@ -841,7 +927,7 @@ CopyFrom(CopyFromState cstate)
 	/*
 	 * It's generally more efficient to prepare a bunch of tuples for
 	 * insertion, and insert them in one
-	 * table_multi_insert()/ExecForeignBatchInsert() call, than call
+	 * table_modify_buffer_insert()/ExecForeignBatchInsert() call, than call
 	 * table_tuple_insert()/ExecForeignInsert() separately for every tuple.
 	 * However, there are a number of reasons why we might not be able to do
 	 * this.  These are explained below.
@@ -925,7 +1011,8 @@ CopyFrom(CopyFromState cstate)
 			insertMethod = CIM_MULTI;
 
 		CopyMultiInsertInfoInit(&multiInsertInfo, resultRelInfo, cstate,
-								estate, mycid, ti_options);
+								estate, mycid,
+								ti_options | TABLE_INSERT_BAS_BULKWRITE);
 	}
 
 	/*
@@ -1094,7 +1181,8 @@ CopyFrom(CopyFromState cstate)
 				{
 					if (resultRelInfo->ri_CopyMultiInsertBuffer == NULL)
 						CopyMultiInsertInfoSetupBuffer(&multiInsertInfo,
-													   resultRelInfo);
+													   resultRelInfo, cstate,
+													   estate);
 				}
 				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
 						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
@@ -1224,12 +1312,6 @@ CopyFrom(CopyFromState cstate)
 				/* Store the slot in the multi-insert buffer, when enabled. */
 				if (insertMethod == CIM_MULTI || leafpart_use_multi_insert)
 				{
-					/*
-					 * The slot previously might point into the per-tuple
-					 * context. For batching it needs to be longer lived.
-					 */
-					ExecMaterializeSlot(myslot);
-
 					/* Add this tuple to the tuple buffer */
 					CopyMultiInsertInfoStore(&multiInsertInfo,
 											 resultRelInfo, myslot,
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index cad52fcc78..14addbc6f6 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -46,9 +46,9 @@ typedef enum EolType
 typedef enum CopyInsertMethod
 {
 	CIM_SINGLE,					/* use table_tuple_insert or ExecForeignInsert */
-	CIM_MULTI,					/* always use table_multi_insert or
+	CIM_MULTI,					/* always use table_modify_buffer_insert or
 								 * ExecForeignBatchInsert */
-	CIM_MULTI_CONDITIONAL,		/* use table_multi_insert or
+	CIM_MULTI_CONDITIONAL,		/* use table_modify_buffer_insert or
 								 * ExecForeignBatchInsert only if valid */
 } CopyInsertMethod;
 
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e7ddf29c16..bf21e43ce1 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1664,6 +1664,7 @@ MonotonicFunction
 MorphOpaque
 MsgType
 MultiAssignRef
+MultiInsertBufferFlushCtx
 MultiSortSupport
 MultiSortSupportData
 MultiXactId
-- 
2.40.1



^ permalink  raw  reply  [nested|flat] 30+ messages in thread

* Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
  2024-03-21 04:14 Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-21 07:40 ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-23 00:17   ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-25 19:58     ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-26 15:37       ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-26 19:49         ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-27 08:12           ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-31 15:48             ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-04-02 19:40               ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-04-24 12:49                 ` Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-04-25 16:41                   ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-04-29 06:06                     ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-05-15 07:26                       ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-05-15 23:31                         ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-06-05 07:12                           ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-08-26 05:39                             ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-08-26 21:18                               ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-08-27 21:43                                 ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-08-29 07:25                                   ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-08-29 19:29                                     ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-10-26 13:30                                       ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
@ 2024-10-28 08:43                                         ` Jingtang Zhang <[email protected]>
  2024-10-28 14:48                                           ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jingtang Zhang <[email protected]>
  0 siblings, 1 reply; 30+ messages in thread

From: Jingtang Zhang @ 2024-10-28 08:43 UTC (permalink / raw)
  To: Bharath Rupireddy <[email protected]>; +Cc: [email protected]; pgsql-hackers

Hi! Glad to see update in this thread.

Little question about v24 0002 patch: would it be better to move the
implementation of TableModifyIsMultiInsertsSupported to somewhere for table
AM
level? Seems it is a common function for future use, not a specific one for
matview.

---

Regards, Jingtang


Bharath Rupireddy <[email protected]> 于2024年10月26日周六
21:31写道：

> Hi,
>
> Thanks for looking into this.
>
> On Thu, Aug 29, 2024 at 12:29 PM Jeff Davis <[email protected]> wrote:
> >
> > I believe we need the branching in the caller anyway:
> >
> > 1. If there is a BEFORE row trigger with a volatile function, the
> > visibility rules[1] mean that the function should see changes from all
> > the rows inserted so far this command, which won't work if they are
> > still in the buffer.
> >
> > 2. Similarly, for an INSTEAD OF row trigger, the visibility rules say
> > that the function should see all previous rows inserted.
> >
> > 3. If there are volatile functions in the target list or WHERE clause,
> > the same visibility semantics apply.
> >
> > 4. If there's a "RETURNING ctid" clause, we need to either come up with
> > a way to return the tuples after flushing, or we need to use the
> > single-tuple path. (Similarly in the future when we support UPDATE ...
> > RETURNING, as Matthias pointed out.)
> >
> > If we need two paths in each caller anyway, it seems cleaner to just
> > wrap the check for tuple_modify_buffer_insert in
> > table_modify_buffer_enabled().
> >
> > We could perhaps use a one path and then force a batch size of one or
> > something, which is an alternative, but we have to be careful not to
> > introduce a regression (and it still requires a solution for #4).
>
> I chose to branch in the caller e.g. if there's a volatile function
> SELECT query of REFRESH MATERIALIZED VIEW, the caller goes
> table_tuple_insert() path, else multi-insert path.
>
> I am posting the new v24 patch set organized as follows: 0001
> introducing the new table AM, 0002 optimizing CTAS, CMV and RMV, 0003
> using the new table AM for COPY ... FROM. I, for now, discarded the
> INSERT INTO ... SELECT and Logical Replication Apply patches, the idea
> is to take the basic stuff forward.
>
> I reworked structure names, members and function names, reworded
> comments, addressed review comments in the v24 patches. Please have a
> look.
>
> --
> Bharath Rupireddy
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com
>


^ permalink  raw  reply  [nested|flat] 30+ messages in thread

* Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
  2024-03-21 04:14 Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-21 07:40 ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-23 00:17   ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-25 19:58     ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-26 15:37       ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-26 19:49         ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-27 08:12           ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-31 15:48             ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-04-02 19:40               ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-04-24 12:49                 ` Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-04-25 16:41                   ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-04-29 06:06                     ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-05-15 07:26                       ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-05-15 23:31                         ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-06-05 07:12                           ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-08-26 05:39                             ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-08-26 21:18                               ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-08-27 21:43                                 ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-08-29 07:25                                   ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-08-29 19:29                                     ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-10-26 13:30                                       ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-10-28 08:43                                         ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jingtang Zhang <[email protected]>
@ 2024-10-28 14:48                                           ` Jingtang Zhang <[email protected]>
  0 siblings, 0 replies; 30+ messages in thread

From: Jingtang Zhang @ 2024-10-28 14:48 UTC (permalink / raw)
  To: Bharath Rupireddy <[email protected]>; +Cc: [email protected]; pgsql-hackers; Andres Freund <[email protected]>; Masahiko Sawada <[email protected]>; Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; Michael Paquier <[email protected]>; Matthias van de Meent <[email protected]>; Alexander Korotkov <[email protected]>; Jeff Davis <[email protected]>

Oh, another comments for v24-0001 patch: we are in heam AM now, should we use
something like HEAP_INSERT_BAS_BULKWRITE instead of using table AM option,
just like other heap AM options do?

> +		if ((state->options & TABLE_INSERT_BAS_BULKWRITE) != 0)
> +			istate->bistate = GetBulkInsertState();


—

Regards, Jingtang







^ permalink  raw  reply  [nested|flat] 30+ messages in thread

* Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
  2024-03-21 04:14 Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-21 07:40 ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-23 00:17   ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-25 19:58     ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-26 15:37       ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-26 19:49         ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-03-27 08:12           ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-03-31 15:48             ` Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
  2024-04-02 19:40               ` Re: New Table Access Methods for Multi and Single Inserts Jeff Davis <[email protected]>
  2024-04-24 12:49                 ` Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-04-25 16:41                   ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-04-29 06:06                     ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-05-15 07:26                       ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-05-15 23:31                         ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
  2024-06-05 07:12                           ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
  2024-08-26 05:39                             ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
@ 2024-08-27 21:37                               ` Jeff Davis <[email protected]>
  1 sibling, 0 replies; 30+ messages in thread

From: Jeff Davis @ 2024-08-27 21:37 UTC (permalink / raw)
  To: Bharath Rupireddy <[email protected]>; +Cc: Masahiko Sawada <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; Michael Paquier <[email protected]>; Matthias van de Meent <[email protected]>; Alexander Korotkov <[email protected]>

On Mon, 2024-08-26 at 11:09 +0530, Bharath Rupireddy wrote:
> On Wed, Jun 5, 2024 at 12:42 PM Bharath Rupireddy
> <[email protected]> wrote:
> > 
> > Please find the v22 patches with the above changes.
> 
> Please find the v23 patches after rebasing 0005 and adapting 0004 for
> 9758174e2e.

In patches 0002-0004, they must avoid the multi insert path when there
are before-row triggers, instead-of-row triggers, or volatile functions
used (see copyfrom.c:917-1006).

Also, until we decide on the RETURNING clause, we should block the
multi-insert path for that, as well, or implement it by using the
callback to copy tuples into the caller's context.

In 0003, why do you need the global insert_modify_buffer_flush_context?

0004 is the only place that calls table_modify_buffer_flush(). Is that
really necessary, or is automatic flushing enough?

Regards,
	Jeff Davis







^ permalink  raw  reply  [nested|flat] 30+ messages in thread

end of thread, other threads:[~2024-10-28 14:48 UTC | newest]

Thread overview: 30+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2024-03-21 04:14 Re: New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
2024-03-21 07:40 ` Bharath Rupireddy <[email protected]>
2024-03-23 00:17   ` Jeff Davis <[email protected]>
2024-03-25 19:58     ` Bharath Rupireddy <[email protected]>
2024-03-26 15:37       ` Jeff Davis <[email protected]>
2024-03-26 19:49         ` Bharath Rupireddy <[email protected]>
2024-03-27 08:12           ` Jeff Davis <[email protected]>
2024-03-31 15:48             ` Bharath Rupireddy <[email protected]>
2024-04-02 19:40               ` Jeff Davis <[email protected]>
2024-04-03 09:02                 ` Bharath Rupireddy <[email protected]>
2024-04-03 12:25                   ` Bharath Rupireddy <[email protected]>
2024-04-24 12:49                 ` Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
2024-04-24 16:07                   ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Pavel Stehule <[email protected]>
2024-04-25 16:41                   ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
2024-04-29 06:06                     ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
2024-05-15 07:26                       ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
2024-05-15 23:31                         ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
2024-05-16 19:00                           ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
2024-06-05 07:12                           ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
2024-08-26 05:39                             ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
2024-08-26 21:18                               ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
2024-08-26 21:59                                 ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Matthias van de Meent <[email protected]>
2024-08-27 05:42                                   ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
2024-08-27 21:43                                 ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
2024-08-29 07:25                                   ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
2024-08-29 19:29                                     ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>
2024-10-26 13:30                                       ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Bharath Rupireddy <[email protected]>
2024-10-28 08:43                                         ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jingtang Zhang <[email protected]>
2024-10-28 14:48                                           ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jingtang Zhang <[email protected]>
2024-08-27 21:37                               ` Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM Jeff Davis <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox