New Table Access Methods for Multi and Single Inserts

public inbox for [email protected]  
help / color / mirror / Atom feed

New Table Access Methods for Multi and Single Inserts
43+ messages / 10 participants
[nested] [flat]

* New Table Access Methods for Multi and Single Inserts
@ 2020-12-08 12:57  Bharath Rupireddy <[email protected]>
  0 siblings, 2 replies; 43+ messages in thread

From: Bharath Rupireddy @ 2020-12-08 12:57 UTC (permalink / raw)
  To: pgsql-hackers; Andres Freund <[email protected]>; Luc Vlaming <[email protected]>

Hi,

Currently, for any component (such as COPY, CTAS[1], CREATE/REFRESH
Mat View[1], INSERT INTO SELECTs[2]) multi insert logic such as buffer
slots allocation, maintenance, decision to flush and clean up, need to
be implemented outside the table_multi_insert() API. The main problem
is that it fails to take into consideration the underlying storage
engine capabilities, for more details of this point refer to a
discussion in multi inserts in CTAS thread[1]. This also creates a lot
of duplicate code which is more error prone and not maintainable.

More importantly, in another thread [3] @Andres Freund suggested to
have table insert APIs in such a way that they look more like 'scan'
APIs i.e. insert_begin, insert, insert_end. The main advantages doing
this are(quoting from his statement in [3]) - "more importantly it'd
allow an AM to optimize operations across multiple inserts, which is
important for column stores."

I propose to introduce new table access methods for both multi and
single inserts based on the prototype suggested by Andres in [3]. Main
design goal of these new APIs is to give flexibility to tableam
developers in implementing multi insert logic dependent on the
underlying storage engine.

Below are the APIs. I suggest to have a look at
v1-0001-New-Table-AMs-for-Multi-and-Single-Inserts.patch for details
of the new data structure and the API functionality. Note that
temporarily I used XX_v2, we can change it later.

TableInsertState* table_insert_begin(initial_args);
void table_insert_v2(TableInsertState *state, TupleTableSlot *slot);
void table_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot);
void table_multi_insert_flush(TableInsertState *state);
void table_insert_end(TableInsertState *state);

I'm attaching a few patches(just to show that these APIs work, avoids
a lot of duplicate code and makes life easier). Better commenting can
be added later. If these APIs and patches look okay, we can even
consider replacing them in other places such as nodeModifyTable.c and
so on.

v1-0001-New-Table-AMs-for-Multi-and-Single-Inserts.patch --->
introduces new table access methods for multi and single inserts. Also
implements/rearranges the outside code for heap am into these new
APIs.
v1-0002-CTAS-and-REFRESH-Mat-View-With-New-Multi-Insert-Table-AM.patch
---> adds new multi insert table access methods to CREATE TABLE AS,
CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW.
v1-0003-ATRewriteTable-With-New-Single-Insert-Table-AM.patch ---> adds
new single insert table access method to ALTER TABLE rewrite table
code.
v1-0004-COPY-With-New-Multi-and-Single-Insert-Table-AM.patch ---> adds
new single and multi insert table access method to COPY code.

Thoughts?

Many thanks to Robert, Vignesh and Dilip for offlist discussion.

[1] - https://www.postgresql.org/message-id/4eee0730-f6ec-e72d-3477-561643f4b327%40swarm64.com
[2] - https://www.postgresql.org/message-id/20201124020020.GK24052%40telsasoft.com
[3] - https://www.postgresql.org/message-id/20200924024128.kyk3r5g7dnu3fxxx%40alap3.anarazel.de

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


Attachments:

  [application/octet-stream] v1-0001-New-Table-AMs-for-Multi-and-Single-Inserts.patch (20.4K, 2-v1-0001-New-Table-AMs-for-Multi-and-Single-Inserts.patch)
  download | inline diff:
From 8dd54a131d27651f36587a784e0eedadfdf4ee5a Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Tue, 8 Dec 2020 12:02:31 +0530
Subject: [PATCH v1] New Table AMs for Multi and Single Inserts

This patch introduces new table access methods for multi and
single inserts. Also implements/rearranges the outside code for
heap am into these new APIs.

Main design goal of these new APIs is to give flexibility to
tableam developers in implementing multi insert logic dependent on
the underlying storage engine. Currently, for all the underlying
storage engines, we follow the same multi insert logic such as when
and how to flush the buffered tuples, tuple size calculation, and
this logic doesn't take into account the underlying storage engine
capabilities.

We can also avoid duplicating multi insert code (for existing COPY,
and upcoming CTAS, CREATE/REFRESH MAT VIEW and INSERT SELECTs). We
can also move bulk insert state allocation and deallocation inside
these APIs.
---
 src/backend/access/heap/heapam.c         | 245 +++++++++++++++++++++++
 src/backend/access/heap/heapam_handler.c |   5 +
 src/backend/access/table/tableamapi.c    |   7 +
 src/backend/executor/execTuples.c        |  83 +++++++-
 src/include/access/heapam.h              |  11 +
 src/include/access/tableam.h             | 120 +++++++++++
 src/include/executor/tuptable.h          |   1 +
 7 files changed, 471 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a9583f3103..48c93ccce7 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -66,6 +66,7 @@
 #include "utils/datum.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
+#include "utils/memutils.h"
 #include "utils/relcache.h"
 #include "utils/snapmgr.h"
 #include "utils/spccache.h"
@@ -2371,6 +2372,250 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 	pgstat_count_heap_insert(relation, ntuples);
 }
 
+/*
+ * Allocate and initialize TableInsertState.
+ *
+ * If alloc_bistate is true, then bulk insert state is allocated, otherwise not
+ *
+ * For single inserts:
+ *  1) Specify alloc_bistate as false, then multi insert state is NULL.
+ *  2) mi_max_slots and mi_max_size are ignored, but good to specify negative
+ * 	   values.
+ *
+ * For multi inserts:
+ *  1) Specify is_multi as true, then multi insert state is allcoated.
+ *  2) Specify mi_max_slots > 0 i.e. the number of slots to buffer.
+ * 	   mi_max_slots <= 0 is invalid.
+ *  3) Specify mi_max_size > 0 i.e. the total tuple size (in bytes) the
+ * 	   buffered slots can hold until flush.
+ *  4) Flush the buffers either if all the mi_max_slots are filled or if the
+ * 	   total tuple size of the so far buffered slots is >= mi_max_size. If
+ *	   mi_max_size <= 0, then flush the buffers only when all the mi_max_slots
+ *	   are filled.
+ *
+ *  Other input parameters i.e. relation, command id, options are common for
+ *  both single and multi inserts.
+ */
+TableInsertState* heap_insert_begin(Relation rel, CommandId cid, int options,
+									bool alloc_bistate, bool is_multi,
+									int32 mi_max_slots, int64 mi_max_size)
+{
+	TableInsertState *state = NULL;
+
+	state = palloc0(sizeof(TableInsertState));
+
+	state->rel = rel;
+	state->cid = cid;
+	state->options = options;
+
+	if (alloc_bistate)
+		state->bistate = GetBulkInsertState();
+	else
+		state->bistate = NULL;
+
+	if (is_multi)
+	{
+		if (mi_max_slots > 0)
+		{
+			state->mistate = palloc0(sizeof(TableMultiInsertState));
+			state->mistate->slots =
+							palloc0(sizeof(TupleTableSlot *) * mi_max_slots);
+			state->mistate->max_slots = mi_max_slots;
+			state->mistate->max_size = mi_max_size;
+			state->mistate->cur_slots = 0;
+			state->mistate->cur_size = 0;
+			state->mistate->cur_tup_size = -1;
+			state->mistate->clear_slots = true;
+			state->mistate->flushed	= false;
+
+			/*
+			 * Create a temporary memory context so that we can reset once per
+			 * multi insert batch.
+			 */
+			state->mistate->context =
+							AllocSetContextCreate(CurrentMemoryContext,
+												  "heap_multi_insert",
+												  ALLOCSET_DEFAULT_SIZES);
+		}
+		else
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("invalid number of slots specified for heap multi inserts")));
+	}
+	else
+		state->mistate = NULL;
+
+	return state;
+}
+
+/* Insert a tuple from a slot into table AM routine. */
+void heap_insert_v2(TableInsertState *state, TupleTableSlot *slot)
+{
+	bool		shouldFree = true;
+	HeapTuple	tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);
+
+	/* Update the tuple with table oid */
+	slot->tts_tableOid = RelationGetRelid(state->rel);
+	tuple->t_tableOid = slot->tts_tableOid;
+
+	/* Perform the insertion, and copy the resulting ItemPointer */
+	heap_insert(state->rel, tuple, state->cid, state->options, state->bistate);
+	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
+
+	if (shouldFree)
+		pfree(tuple);
+}
+
+/*
+ * Buffer the input slots and insert the tuples from the buffered slots at a
+ * time into a table.
+ *
+ * Compute the size of the tuple only if mi_max_size i.e. the total tuple size
+ * (in bytes) the buffered slots can hold until flush is specified and the
+ * current tuple size i.e. cur_tup_size is not known.
+ *
+ * Flush the tuples from the buffered slots into the table:
+ *  1) either if all the slots are filled up
+ *  2) or if the mi_max_size is specified and the total tuple size of the
+ * 	   currently buffered slots are >= mi_max_size.
+ */
+void heap_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
+{
+	TupleTableSlot  *batchslot;
+
+	Assert(state->mistate != NULL);
+	Assert(state->mistate->slots != NULL);
+
+	if (state->mistate->slots[state->mistate->cur_slots] == NULL)
+		state->mistate->slots[state->mistate->cur_slots] =
+										table_slot_create(state->rel, NULL);
+
+	batchslot = state->mistate->slots[state->mistate->cur_slots];
+
+	ExecCopySlot(batchslot, slot);
+
+	/* Reset the flush state if previously set. */
+	if (state->mistate->flushed)
+		state->mistate->flushed = false;
+
+	/* Compute the tuple size only if asked to do so. */
+	if (state->mistate->max_size > 0 && state->mistate->cur_tup_size <= 0)
+	{
+		/* We are here when the tuple size is not known in the caller. */
+		Size sz;
+
+		/*
+		 * Calculate the tuple size after the original slot is copied. Because
+		 * the copied slot type and the tuple size may change.
+		 */
+		sz = GetTupleSize(batchslot, state->mistate->max_size);
+
+		state->mistate->cur_slots++;
+		state->mistate->cur_size += sz;
+
+	}
+	else if (state->mistate->max_size > 0 && state->mistate->cur_tup_size > 0)
+	{
+		/* Tuple size is known in the caller, just use and reset it. */
+		state->mistate->cur_slots++;
+		state->mistate->cur_size += state->mistate->cur_tup_size;
+		state->mistate->cur_tup_size = -1;
+	}
+
+	if (state->mistate->cur_slots >= state->mistate->max_slots ||
+		state->mistate->cur_size >= state->mistate->max_size)
+	{
+		heap_multi_insert_flush(state);
+	}
+}
+
+/*
+ * Flush the tuples from buffered slots if any.
+ *
+ * This function can be useful in cases, where one of the partition can not use
+ * multi inserts but others can and they have buffered few slots so far, which
+ * need to be flushed for visibility, before the partition that doesn't support
+ * can proceed with single inserts.
+ */
+void heap_multi_insert_flush(TableInsertState *state)
+{
+	Assert(state->mistate != NULL);
+
+	if (state->mistate->cur_slots > 0)
+	{
+		MemoryContext oldcontext;
+
+		oldcontext = MemoryContextSwitchTo(state->mistate->context);
+
+		heap_multi_insert(state->rel, state->mistate->slots,
+						state->mistate->cur_slots, state->cid,
+						state->options, state->bistate);
+
+		MemoryContextReset(state->mistate->context);
+		MemoryContextSwitchTo(oldcontext);
+
+		/*
+		 * Do not clear the slots always. Sometimes callers may want the slots
+		 * for index insertions or after row trigger executions in which case
+		 * they have to clear the tuples before using for the next insert
+		 * batch.
+		 */
+		if (state->mistate->clear_slots)
+		{
+			int i;
+
+			for (i = 0; i < state->mistate->cur_slots; i++)
+				ExecClearTuple(state->mistate->slots[i]);
+		}
+
+		state->mistate->cur_slots = 0;
+		state->mistate->cur_size = 0;
+		state->mistate->cur_tup_size = -1;
+		state->mistate->flushed = true;
+	}
+	else
+		state->mistate->flushed = false;
+}
+
+/*
+ * Clean up the TableInsertState.
+ *
+ * For multi inserts, ensure to flush all the remaining buffers with
+ * heap_multi_insert_flush before calling this function. Buffered slots are
+ * dropped, short-lived memory context is delted and mistate is freed up.
+ *
+ * Free up the bulk insert state using FreeBulkInsertState() if exists.
+ *
+ * Free up TableInsertState.
+ */
+void heap_insert_end(TableInsertState *state)
+{
+	if (state->mistate)
+	{
+		int i;
+
+		/*
+		 * Ensure to flush all the remaining buffers with
+		 * heap_multi_insert_flush before calling heap_insert_end.
+		 */
+		Assert(state->mistate->cur_slots == 0);
+
+		for (i = 0; i < state->mistate->max_slots && state->mistate->slots[i] != NULL; i++)
+			ExecDropSingleTupleTableSlot(state->mistate->slots[i]);
+
+		if (state->mistate->context)
+			MemoryContextDelete(state->mistate->context);
+
+		pfree(state->mistate->slots);
+		pfree(state->mistate);
+	}
+
+	if (state->bistate)
+		FreeBulkInsertState(state->bistate);
+
+	pfree(state);
+}
+
 /*
  *	simple_heap_insert - insert a tuple
  *
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 3eea215b85..eb3da12d9c 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2554,6 +2554,11 @@ static const TableAmRoutine heapam_methods = {
 	.tuple_insert_speculative = heapam_tuple_insert_speculative,
 	.tuple_complete_speculative = heapam_tuple_complete_speculative,
 	.multi_insert = heap_multi_insert,
+	.tuple_insert_begin = heap_insert_begin,
+	.tuple_insert_v2 = heap_insert_v2,
+	.multi_insert_v2 = heap_multi_insert_v2,
+	.multi_insert_flush = heap_multi_insert_flush,
+	.tuple_insert_end = heap_insert_end,
 	.tuple_delete = heapam_tuple_delete,
 	.tuple_update = heapam_tuple_update,
 	.tuple_lock = heapam_tuple_lock,
diff --git a/src/backend/access/table/tableamapi.c b/src/backend/access/table/tableamapi.c
index 58de0743ba..6bec0659e4 100644
--- a/src/backend/access/table/tableamapi.c
+++ b/src/backend/access/table/tableamapi.c
@@ -78,6 +78,13 @@ GetTableAmRoutine(Oid amhandler)
 	Assert(routine->tuple_complete_speculative != NULL);
 
 	Assert(routine->multi_insert != NULL);
+
+	Assert(routine->tuple_insert_begin != NULL);
+	Assert(routine->tuple_insert_v2 != NULL);
+	Assert(routine->multi_insert_v2 != NULL);
+	Assert(routine->multi_insert_flush != NULL);
+	Assert(routine->tuple_insert_end != NULL);
+
 	Assert(routine->tuple_delete != NULL);
 	Assert(routine->tuple_update != NULL);
 	Assert(routine->tuple_lock != NULL);
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index 4c90ac5236..f93b1a49a8 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -159,7 +159,11 @@ tts_virtual_materialize(TupleTableSlot *slot)
 	if (TTS_SHOULDFREE(slot))
 		return;
 
-	/* compute size of memory required */
+	/*
+	 * Compute size of memory required. This size calculation code is also
+	 * being used in GetTupleSize(), hence ensure to have the same changes or
+	 * fixes here and also there.
+	 */
 	for (int natt = 0; natt < desc->natts; natt++)
 	{
 		Form_pg_attribute att = TupleDescAttr(desc, natt);
@@ -1239,6 +1243,83 @@ ExecDropSingleTupleTableSlot(TupleTableSlot *slot)
 	pfree(slot);
 }
 
+/*
+ * GetTupleSize - Compute the tuple size given a table slot.
+ *
+ * For heap tuple, buffer tuple and minimal tuple slot types return the actual
+ * tuple size that exists. For virtual tuple, the size is calculated as the
+ * slot does not have the tuple size. If the computed size exceeds the given
+ * maxsize for the virtual tuple, this function exits, not investing time in
+ * further unnecessary calculation.
+ *
+ * Important Notes:
+ * 1) Size calculation code for virtual slots is being used from
+ * 	  tts_virtual_materialize(), hence ensure to have the same changes or fixes
+ * 	  here and also there.
+ * 2) Currently, GetTupleSize() handles the existing heap, buffer, minmal and
+ * 	  virtual slots. Ensure to add related code in case any new slot type is
+ *    introduced.
+ */
+inline Size
+GetTupleSize(TupleTableSlot *slot, Size maxsize)
+{
+	Size sz = 0;
+	HeapTuple tuple = NULL;
+
+	if (TTS_IS_HEAPTUPLE(slot))
+		tuple = ((HeapTupleTableSlot *) slot)->tuple;
+	else if(TTS_IS_BUFFERTUPLE(slot))
+		tuple = ((BufferHeapTupleTableSlot *) slot)->base.tuple;
+	else if(TTS_IS_MINIMALTUPLE(slot))
+		tuple = ((MinimalTupleTableSlot *) slot)->tuple;
+	else if(TTS_IS_VIRTUAL(slot))
+	{
+		/*
+		 * Size calculation code is being used from tts_virtual_materialize().
+		 * Ensure to have the same changes or fixes here and also in
+		 * tts_virtual_materialize().
+		 */
+		TupleDesc	desc = slot->tts_tupleDescriptor;
+
+		for (int natt = 0; natt < desc->natts; natt++)
+		{
+			Form_pg_attribute att = TupleDescAttr(desc, natt);
+			Datum		val;
+
+			if (att->attbyval)
+				sz += att->attlen;
+
+			if (slot->tts_isnull[natt])
+				continue;
+
+			val = slot->tts_values[natt];
+
+			if (att->attlen == -1 &&
+				VARATT_IS_EXTERNAL_EXPANDED(DatumGetPointer(val)))
+			{
+				sz = att_align_nominal(sz, att->attalign);
+				sz += EOH_get_flat_size(DatumGetEOHP(val));
+			}
+			else
+			{
+				sz = att_align_nominal(sz, att->attalign);
+				sz = att_addlength_datum(sz, att->attlen, val);
+			}
+
+			/*
+			 * We are not interested in proceeding further if the computed size
+			 * crosses maxsize limit that we are looking for.
+			 */
+			if (maxsize != 0 && sz >= maxsize)
+				break;
+		}
+	}
+
+	if (tuple != NULL && !TTS_IS_VIRTUAL(slot))
+		sz = tuple->t_len;
+
+	return sz;
+}
 
 /* ----------------------------------------------------------------
  *				  tuple table slot accessor functions
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 54b2eb7378..d938efbbc5 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -139,6 +139,17 @@ extern void heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 extern void heap_multi_insert(Relation relation, struct TupleTableSlot **slots,
 							  int ntuples, CommandId cid, int options,
 							  BulkInsertState bistate);
+
+extern TableInsertState* heap_insert_begin(Relation rel, CommandId cid,
+										   int options, bool alloc_bistate,
+										   bool is_multi, int32 mi_max_slots,
+										   int64 mi_max_size);
+extern void heap_insert_v2(TableInsertState *state, TupleTableSlot *slot);
+extern void heap_multi_insert_v2(TableInsertState *state,
+								 TupleTableSlot *slot);
+extern void heap_multi_insert_flush(TableInsertState *state);
+extern void heap_insert_end(TableInsertState *state);
+
 extern TM_Result heap_delete(Relation relation, ItemPointer tid,
 							 CommandId cid, Snapshot crosscheck, bool wait,
 							 struct TM_FailureData *tmfd, bool changingPart);
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 387eb34a61..60d4cd8c8b 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -128,6 +128,80 @@ typedef struct TM_FailureData
 	bool		traversed;
 } TM_FailureData;
 
+/* Holds the multi insert related information. */
+typedef struct TableMultiInsertState
+{
+	/* Switch to short-lived memory context before flushing. */
+	MemoryContext       context;
+
+	/* Array of buffered slots. */
+	TupleTableSlot      **slots;
+
+	/* Maximum number of slots that can be buffered. */
+	int32               max_slots;
+
+	/* Number of slots that are currently buffered. */
+	int32               cur_slots;
+
+	/*
+	 * Maximum size (in bytes) of all the tuples that a single batch of
+	 * buffered slots can hold. This parameter is optional.
+	 */
+	int64               max_size;
+
+	/*
+	 * Total tuple size (in bytes) of the slots that are currently buffered. If
+	 * max_size is specified, then flush the buffered slots when cur_size >=
+	 * max_size.
+	 */
+	int64               cur_size;
+
+	/*
+	 * Current tuple size (in bytes). Set this each time before calling
+	 * table_multi_insert_v2, if the tuple size is known (as in the case of
+	 * COPY where each tuple size known after parsing the input lines).
+	 * table_multi_insert_v2 will not calculate the tuple size again to add to
+	 * cur_size, it just uses this value. table_multi_insert_v2 will set it to
+	 * -1 after it uses. Default is -1.
+	 */
+	int64               cur_tup_size;
+
+	/*
+	 * Whether to clear the buffered slots after each flush? If the relation
+	 * has indexes or after row triggers, the buffered slots are required
+	 * outside table_multi_insert_v2(), in which case, clean them in the caller
+	 * using ExecClearTuple() outside the table_multi_insert_v2. If true,
+	 * which is default, table_multi_insert_v2() will clear the slots.
+	 *
+	 * It is good to set clear_slots (by looking at whether the table is having
+	 * any indexes or after row triggers) at the beginning of multi insert
+	 * operation. It is better to set it to false for the final flush before
+	 * ending the multi insert operation. This is to save ExecClearTuple cost
+	 * while flushing as the buffered slots will anyways be dropped in the end
+	 * operation.
+	 */
+	bool                clear_slots;
+
+	/*
+	 * Initially false, set to true by table_multi_insert whenever it flushes
+	 * the buffered slots. Caller can use this flag to insert into indexes or
+	 * execute after row triggers and so on if any.
+	 */
+	bool				flushed;
+} TableMultiInsertState;
+
+/* Holds the table insert state. */
+typedef struct TableInsertState
+{
+	Relation                rel;
+	/* Bulk insert state if requested, otherwise NULL. */
+	struct BulkInsertStateData     *bistate;
+	/* Mulit insert state if requested, otherwise NULL. */
+	struct TableMultiInsertState   *mistate;
+	CommandId               cid;
+	int                     options;
+}TableInsertState;
+
 /* "options" flag bits for table_tuple_insert */
 /* TABLE_INSERT_SKIP_WAL was 0x0001; RelationNeedsWAL() now governs */
 #define TABLE_INSERT_SKIP_FSM		0x0002
@@ -376,6 +450,19 @@ typedef struct TableAmRoutine
 	void		(*multi_insert) (Relation rel, TupleTableSlot **slots, int nslots,
 								 CommandId cid, int options, struct BulkInsertStateData *bistate);
 
+	TableInsertState* (*tuple_insert_begin) (Relation rel, CommandId cid,
+											 int options, bool alloc_bistate,
+											 bool is_multi, int32 mi_max_slots,
+											 int64 mi_max_size);
+
+	void (*tuple_insert_v2) (TableInsertState *state, TupleTableSlot *slot);
+
+	void (*multi_insert_v2) (TableInsertState *state, TupleTableSlot *slot);
+
+	void (*multi_insert_flush) (TableInsertState *state);
+
+	void (*tuple_insert_end) (TableInsertState *state);
+
 	/* see table_tuple_delete() for reference about parameters */
 	TM_Result	(*tuple_delete) (Relation rel,
 								 ItemPointer tid,
@@ -1237,6 +1324,39 @@ table_multi_insert(Relation rel, TupleTableSlot **slots, int nslots,
 								  cid, options, bistate);
 }
 
+static inline TableInsertState*
+table_insert_begin(Relation rel, CommandId cid, int options,
+				   bool alloc_bistate, bool is_multi, int32 mi_max_slots,
+				   int64 mi_max_size)
+{
+	return rel->rd_tableam->tuple_insert_begin(rel, cid, options, alloc_bistate,
+										is_multi, mi_max_slots, mi_max_size);
+}
+
+static inline void
+table_insert_v2(TableInsertState *state, TupleTableSlot *slot)
+{
+	state->rel->rd_tableam->tuple_insert_v2(state, slot);
+}
+
+static inline void
+table_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
+{
+	state->rel->rd_tableam->multi_insert_v2(state, slot);
+}
+
+static inline void
+table_multi_insert_flush(TableInsertState *state)
+{
+	state->rel->rd_tableam->multi_insert_flush(state);
+}
+
+static inline void
+table_insert_end(TableInsertState *state)
+{
+	state->rel->rd_tableam->tuple_insert_end(state);
+}
+
 /*
  * Delete a tuple.
  *
diff --git a/src/include/executor/tuptable.h b/src/include/executor/tuptable.h
index f7df70b5ab..d7c284d8e3 100644
--- a/src/include/executor/tuptable.h
+++ b/src/include/executor/tuptable.h
@@ -330,6 +330,7 @@ extern void slot_getmissingattrs(TupleTableSlot *slot, int startAttNum,
 								 int lastAttNum);
 extern void slot_getsomeattrs_int(TupleTableSlot *slot, int attnum);
 
+extern Size GetTupleSize(TupleTableSlot *slot, Size maxsize);
 
 #ifndef FRONTEND
 
-- 
2.25.1



  [application/octet-stream] v1-0003-ATRewriteTable-With-New-Single-Insert-Table-AM.patch (2.1K, 3-v1-0003-ATRewriteTable-With-New-Single-Insert-Table-AM.patch)
  download | inline diff:
From 72351f7dbf0353ec1fcd8bb14a1563806eb62218 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Tue, 8 Dec 2020 12:20:17 +0530
Subject: [PATCH v1] ATRewriteTable With New Single Insert Table AM

This patch adds new single insert table access method to ALTER TABLE
rewrite table code.
---
 src/backend/commands/tablecmds.c | 28 ++++++++++++----------------
 1 file changed, 12 insertions(+), 16 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 46f1637e77..80f013036e 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -5182,10 +5182,8 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 	int			i;
 	ListCell   *l;
 	EState	   *estate;
-	CommandId	mycid;
-	BulkInsertState bistate;
-	int			ti_options;
 	ExprState  *partqualstate = NULL;
+	TableInsertState *istate = NULL;
 
 	/*
 	 * Open the relation(s).  We have surely already locked the existing
@@ -5206,16 +5204,13 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 	 */
 	if (newrel)
 	{
-		mycid = GetCurrentCommandId(true);
-		bistate = GetBulkInsertState();
-		ti_options = TABLE_INSERT_SKIP_FSM;
-	}
-	else
-	{
-		/* keep compiler quiet about using these uninitialized */
-		mycid = 0;
-		bistate = NULL;
-		ti_options = 0;
+		istate = table_insert_begin(newrel,
+									GetCurrentCommandId(true),
+									TABLE_INSERT_SKIP_FSM,
+									true,
+									false,
+									-1,
+									-1);
 	}
 
 	/*
@@ -5510,8 +5505,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 
 			/* Write the tuple out to the new relation */
 			if (newrel)
-				table_tuple_insert(newrel, insertslot, mycid,
-								   ti_options, bistate);
+				table_insert_v2(istate, insertslot);
 
 			ResetExprContext(econtext);
 
@@ -5532,7 +5526,9 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 	table_close(oldrel, NoLock);
 	if (newrel)
 	{
-		FreeBulkInsertState(bistate);
+		int ti_options = istate->options;
+
+		table_insert_end(istate);
 
 		table_finish_bulk_insert(newrel, ti_options);
 
-- 
2.25.1



  [application/octet-stream] v1-0004-COPY-With-New-Multi-and-Single-Insert-Table-AM.patch (22.4K, 4-v1-0004-COPY-With-New-Multi-and-Single-Insert-Table-AM.patch)
  download | inline diff:
From 59bd7de19762241fa53eed6f64510f022345b14b Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Tue, 8 Dec 2020 13:01:32 +0530
Subject: [PATCH v1] COPY With New Multi and Single Insert Table AM

This patch adds new single and multi insert table access method to
COPY code.
---
 src/backend/commands/copyfrom.c | 483 +++++++++++---------------------
 1 file changed, 163 insertions(+), 320 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 1b14e9a6eb..8376af32f5 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -45,10 +45,10 @@
 #include "utils/snapmgr.h"
 
 /*
- * No more than this many tuples per CopyMultiInsertBuffer
+ * No more than this many tuples per multi insert buffer
  *
  * Caution: Don't make this too big, as we could end up with this many
- * CopyMultiInsertBuffer items stored in CopyMultiInsertInfo's
+ * multi insert buffer items stored in CopyMultiInsertInfo's
  * multiInsertBuffers list.  Increasing this can cause quadratic growth in
  * memory requirements during copies into partitioned tables with a large
  * number of partitions.
@@ -67,31 +67,11 @@
 /* Stores multi-insert data related to a single relation in CopyFrom. */
 typedef struct CopyMultiInsertBuffer
 {
-	TupleTableSlot *slots[MAX_BUFFERED_TUPLES]; /* Array to store tuples */
-	ResultRelInfo *resultRelInfo;	/* ResultRelInfo for 'relid' */
-	BulkInsertState bistate;	/* BulkInsertState for this rel */
-	int			nused;			/* number of 'slots' containing tuples */
-	uint64		linenos[MAX_BUFFERED_TUPLES];	/* Line # of tuple in copy
-												 * stream */
+	TableInsertState *istate;
+	/* Line # of tuple in copy stream. */
+	uint64		linenos[MAX_BUFFERED_TUPLES];
 } CopyMultiInsertBuffer;
 
-/*
- * Stores one or many CopyMultiInsertBuffers and details about the size and
- * number of tuples which are stored in them.  This allows multiple buffers to
- * exist at once when COPYing into a partitioned table.
- */
-typedef struct CopyMultiInsertInfo
-{
-	List	   *multiInsertBuffers; /* List of tracked CopyMultiInsertBuffers */
-	int			bufferedTuples; /* number of tuples buffered over all buffers */
-	int			bufferedBytes;	/* number of bytes from all buffered tuples */
-	CopyFromState	cstate;			/* Copy state for this CopyMultiInsertInfo */
-	EState	   *estate;			/* Executor state used for COPY */
-	CommandId	mycid;			/* Command Id used for COPY */
-	int			ti_options;		/* table insert options */
-} CopyMultiInsertInfo;
-
-
 /* non-export function prototypes */
 static char *limit_printout_length(const char *str);
 
@@ -204,227 +184,130 @@ limit_printout_length(const char *str)
 	return res;
 }
 
-/*
- * Allocate memory and initialize a new CopyMultiInsertBuffer for this
- * ResultRelInfo.
- */
-static CopyMultiInsertBuffer *
-CopyMultiInsertBufferInit(ResultRelInfo *rri)
+static void
+InitCopyMultiInsertBufferInfo(List **mirri, ResultRelInfo *rri,
+							  CommandId mycid, int ti_options)
 {
 	CopyMultiInsertBuffer *buffer;
+	TriggerDesc *trigdesc = rri->ri_TrigDesc;
 
-	buffer = (CopyMultiInsertBuffer *) palloc(sizeof(CopyMultiInsertBuffer));
-	memset(buffer->slots, 0, sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
-	buffer->resultRelInfo = rri;
-	buffer->bistate = GetBulkInsertState();
-	buffer->nused = 0;
-
-	return buffer;
-}
+	buffer = (CopyMultiInsertBuffer *) palloc0(sizeof(CopyMultiInsertBuffer));
 
-/*
- * Make a new buffer for this ResultRelInfo.
- */
-static inline void
-CopyMultiInsertInfoSetupBuffer(CopyMultiInsertInfo *miinfo,
-							   ResultRelInfo *rri)
-{
-	CopyMultiInsertBuffer *buffer;
+	buffer->istate = table_insert_begin(rri->ri_RelationDesc,
+										mycid,
+										ti_options,
+										true,
+										true,
+										MAX_BUFFERED_TUPLES,
+										MAX_BUFFERED_BYTES);
 
-	buffer = CopyMultiInsertBufferInit(rri);
+	if (rri->ri_NumIndices ||
+		(trigdesc && (trigdesc->trig_insert_after_row ||
+		trigdesc->trig_insert_new_table)))
+		buffer->istate->mistate->clear_slots = false;
 
-	/* Setup back-link so we can easily find this buffer again */
 	rri->ri_CopyMultiInsertBuffer = buffer;
-	/* Record that we're tracking this buffer */
-	miinfo->multiInsertBuffers = lappend(miinfo->multiInsertBuffers, buffer);
-}
-
-/*
- * Initialize an already allocated CopyMultiInsertInfo.
- *
- * If rri is a non-partitioned table then a CopyMultiInsertBuffer is set up
- * for that table.
- */
-static void
-CopyMultiInsertInfoInit(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
-						CopyFromState cstate, EState *estate, CommandId mycid,
-						int ti_options)
-{
-	miinfo->multiInsertBuffers = NIL;
-	miinfo->bufferedTuples = 0;
-	miinfo->bufferedBytes = 0;
-	miinfo->cstate = cstate;
-	miinfo->estate = estate;
-	miinfo->mycid = mycid;
-	miinfo->ti_options = ti_options;
 
-	/*
-	 * Only setup the buffer when not dealing with a partitioned table.
-	 * Buffers for partitioned tables will just be setup when we need to send
-	 * tuples their way for the first time.
-	 */
-	if (rri->ri_RelationDesc->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
-		CopyMultiInsertInfoSetupBuffer(miinfo, rri);
+	*mirri = lappend(*mirri, rri);
 }
 
-/*
- * Returns true if the buffers are full
- */
-static inline bool
-CopyMultiInsertInfoIsFull(CopyMultiInsertInfo *miinfo)
-{
-	if (miinfo->bufferedTuples >= MAX_BUFFERED_TUPLES ||
-		miinfo->bufferedBytes >= MAX_BUFFERED_BYTES)
-		return true;
-	return false;
-}
-
-/*
- * Returns true if we have no buffered tuples
- */
-static inline bool
-CopyMultiInsertInfoIsEmpty(CopyMultiInsertInfo *miinfo)
-{
-	return miinfo->bufferedTuples == 0;
-}
-
-/*
- * Write the tuples stored in 'buffer' out to the table.
- */
-static inline void
-CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
-						   CopyMultiInsertBuffer *buffer)
+static void
+HandleAfterRowEvents(ResultRelInfo *rri, EState *estate,
+					 CopyFromState cstate, int32 cur_slots)
 {
-	MemoryContext oldcontext;
-	int			i;
-	uint64		save_cur_lineno;
-	CopyFromState	cstate = miinfo->cstate;
-	EState	   *estate = miinfo->estate;
-	CommandId	mycid = miinfo->mycid;
-	int			ti_options = miinfo->ti_options;
+	int i;
+	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
+	TableInsertState *istate = buffer->istate;
+	uint64		save_cur_lineno = cstate->cur_lineno;
 	bool		line_buf_valid = cstate->line_buf_valid;
-	int			nused = buffer->nused;
-	ResultRelInfo *resultRelInfo = buffer->resultRelInfo;
-	TupleTableSlot **slots = buffer->slots;
 
-	/*
-	 * Print error context information correctly, if one of the operations
-	 * below fail.
-	 */
 	cstate->line_buf_valid = false;
-	save_cur_lineno = cstate->cur_lineno;
-
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
-
-	for (i = 0; i < nused; i++)
+	for (i = 0; i < cur_slots; i++)
 	{
 		/*
-		 * If there are any indexes, update them for all the inserted tuples,
-		 * and run AFTER ROW INSERT triggers.
-		 */
-		if (resultRelInfo->ri_NumIndices > 0)
+		* If there are any indexes, update them for all the inserted tuples,
+		* and run AFTER ROW INSERT triggers.
+		*/
+		if (rri->ri_NumIndices > 0)
 		{
-			List	   *recheckIndexes;
+			List       *recheckIndexes;
 
 			cstate->cur_lineno = buffer->linenos[i];
 			recheckIndexes =
-				ExecInsertIndexTuples(resultRelInfo,
-									  buffer->slots[i], estate, false, NULL,
-									  NIL);
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], recheckIndexes,
+					ExecInsertIndexTuples(rri,
+										  istate->mistate->slots[i], estate,
+										  false,
+										  NULL,
+										  NULL);
+
+			ExecARInsertTriggers(estate,
+								 rri,
+								 istate->mistate->slots[i],
+								 recheckIndexes,
 								 cstate->transition_capture);
+
 			list_free(recheckIndexes);
 		}
 
 		/*
-		 * There's no indexes, but see if we need to run AFTER ROW INSERT
-		 * triggers anyway.
-		 */
-		else if (resultRelInfo->ri_TrigDesc != NULL &&
-				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		* There's no indexes, but see if we need to run AFTER ROW INSERT
+		* triggers anyway.
+		*/
+		else if (rri->ri_TrigDesc != NULL &&
+				(rri->ri_TrigDesc->trig_insert_after_row ||
+				 rri->ri_TrigDesc->trig_insert_new_table))
 		{
 			cstate->cur_lineno = buffer->linenos[i];
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], NIL, cstate->transition_capture);
+			ExecARInsertTriggers(estate,
+								 rri,
+								 istate->mistate->slots[i],
+								 NULL,
+								 cstate->transition_capture);
 		}
 
-		ExecClearTuple(slots[i]);
+		ExecClearTuple(istate->mistate->slots[i]);
 	}
 
-	/* Mark that all slots are free */
-	buffer->nused = 0;
-
 	/* reset cur_lineno and line_buf_valid to what they were */
 	cstate->line_buf_valid = line_buf_valid;
 	cstate->cur_lineno = save_cur_lineno;
 }
 
-/*
- * Drop used slots and free member for this buffer.
- *
- * The buffer must be flushed before cleanup.
- */
-static inline void
-CopyMultiInsertBufferCleanup(CopyMultiInsertInfo *miinfo,
-							 CopyMultiInsertBuffer *buffer)
+static void
+CopyMultiInsertBufferTuple(ResultRelInfo *rri, TupleTableSlot *slot,
+					   CopyFromState cstate, EState *estate)
 {
-	int			i;
-
-	/* Ensure buffer was flushed */
-	Assert(buffer->nused == 0);
-
-	/* Remove back-link to ourself */
-	buffer->resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
-
-	FreeBulkInsertState(buffer->bistate);
+	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
+	TableInsertState *istate = buffer->istate;
+	int32 cur_slots = istate->mistate->cur_slots;
 
-	/* Since we only create slots on demand, just drop the non-null ones. */
-	for (i = 0; i < MAX_BUFFERED_TUPLES && buffer->slots[i] != NULL; i++)
-		ExecDropSingleTupleTableSlot(buffer->slots[i]);
+	buffer->linenos[istate->mistate->cur_slots] = cstate->cur_lineno;
+	istate->mistate->cur_tup_size = cstate->line_buf.len;
 
-	table_finish_bulk_insert(buffer->resultRelInfo->ri_RelationDesc,
-							 miinfo->ti_options);
+	table_multi_insert_v2(buffer->istate, slot);
 
-	pfree(buffer);
+	if (istate->mistate->flushed)
+		HandleAfterRowEvents(rri, estate, cstate, cur_slots);
 }
 
-/*
- * Write out all stored tuples in all buffers out to the tables.
- *
- * Once flushed we also trim the tracked buffers list down to size by removing
- * the buffers created earliest first.
- *
- * Callers should pass 'curr_rri' is the ResultRelInfo that's currently being
- * used.  When cleaning up old buffers we'll never remove the one for
- * 'curr_rri'.
- */
-static inline void
-CopyMultiInsertInfoFlush(CopyMultiInsertInfo *miinfo, ResultRelInfo *curr_rri)
+static void
+CopyMulitInsertFlushBuffers(List **mirri, ResultRelInfo *curr_rri,
+							CopyFromState cstate, EState *estate)
 {
 	ListCell   *lc;
 
-	foreach(lc, miinfo->multiInsertBuffers)
+	foreach(lc, *mirri)
 	{
-		CopyMultiInsertBuffer *buffer = (CopyMultiInsertBuffer *) lfirst(lc);
+		ResultRelInfo *rri = lfirst(lc);
+		CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
+		TableInsertState *istate = buffer->istate;
+		int32 cur_slots = istate->mistate->cur_slots;
 
-		CopyMultiInsertBufferFlush(miinfo, buffer);
-	}
+		table_multi_insert_flush(istate);
 
-	miinfo->bufferedTuples = 0;
-	miinfo->bufferedBytes = 0;
+		if (istate->mistate->flushed)
+			HandleAfterRowEvents(rri, estate, cstate, cur_slots);
+	}
 
 	/*
 	 * Trim the list of tracked buffers down if it exceeds the limit.  Here we
@@ -432,87 +315,62 @@ CopyMultiInsertInfoFlush(CopyMultiInsertInfo *miinfo, ResultRelInfo *curr_rri)
 	 * likely that these older ones will be needed than the ones that were
 	 * just created.
 	 */
-	while (list_length(miinfo->multiInsertBuffers) > MAX_PARTITION_BUFFERS)
+	while (list_length(*mirri) > MAX_PARTITION_BUFFERS)
 	{
+		ResultRelInfo *rri;
 		CopyMultiInsertBuffer *buffer;
+		TableInsertState *istate;
+		int ti_options;
 
-		buffer = (CopyMultiInsertBuffer *) linitial(miinfo->multiInsertBuffers);
+		rri = (ResultRelInfo *) linitial(*mirri);
 
 		/*
 		 * We never want to remove the buffer that's currently being used, so
 		 * if we happen to find that then move it to the end of the list.
 		 */
-		if (buffer->resultRelInfo == curr_rri)
+		if (rri == curr_rri)
 		{
-			miinfo->multiInsertBuffers = list_delete_first(miinfo->multiInsertBuffers);
-			miinfo->multiInsertBuffers = lappend(miinfo->multiInsertBuffers, buffer);
-			buffer = (CopyMultiInsertBuffer *) linitial(miinfo->multiInsertBuffers);
+			*mirri = list_delete_first(*mirri);
+			*mirri = lappend(*mirri, rri);
+			rri = (ResultRelInfo *) linitial(*mirri);
 		}
 
-		CopyMultiInsertBufferCleanup(miinfo, buffer);
-		miinfo->multiInsertBuffers = list_delete_first(miinfo->multiInsertBuffers);
-	}
-}
+		buffer = rri->ri_CopyMultiInsertBuffer;
+		istate = buffer->istate;
+		istate->mistate->clear_slots = true;
+		ti_options = istate->options;
 
-/*
- * Cleanup allocated buffers and free memory
- */
-static inline void
-CopyMultiInsertInfoCleanup(CopyMultiInsertInfo *miinfo)
-{
-	ListCell   *lc;
+		table_insert_end(istate);
 
-	foreach(lc, miinfo->multiInsertBuffers)
-		CopyMultiInsertBufferCleanup(miinfo, lfirst(lc));
+		table_finish_bulk_insert(rri->ri_RelationDesc, ti_options);
 
-	list_free(miinfo->multiInsertBuffers);
+		*mirri = list_delete_first(*mirri);
+	}
 }
 
-/*
- * Get the next TupleTableSlot that the next tuple should be stored in.
- *
- * Callers must ensure that the buffer is not full.
- *
- * Note: 'miinfo' is unused but has been included for consistency with the
- * other functions in this area.
- */
-static inline TupleTableSlot *
-CopyMultiInsertInfoNextFreeSlot(CopyMultiInsertInfo *miinfo,
-								ResultRelInfo *rri)
+static void
+CopyMulitInsertDropBuffers(List *mirri)
 {
-	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
-	int			nused = buffer->nused;
-
-	Assert(buffer != NULL);
-	Assert(nused < MAX_BUFFERED_TUPLES);
+	ListCell   *lc;
 
-	if (buffer->slots[nused] == NULL)
-		buffer->slots[nused] = table_slot_create(rri->ri_RelationDesc, NULL);
-	return buffer->slots[nused];
-}
+	foreach(lc, mirri)
+	{
+		int ti_options;
+		ResultRelInfo *rri = lfirst(lc);
+		CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
+		TableInsertState *istate = buffer->istate;
 
-/*
- * Record the previously reserved TupleTableSlot that was reserved by
- * CopyMultiInsertInfoNextFreeSlot as being consumed.
- */
-static inline void
-CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
-						 TupleTableSlot *slot, int tuplen, uint64 lineno)
-{
-	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
+		istate->mistate->clear_slots = true;
+		ti_options = istate->options;
 
-	Assert(buffer != NULL);
-	Assert(slot == buffer->slots[buffer->nused]);
+		table_insert_end(istate);
 
-	/* Store the line number so we can properly report any errors later */
-	buffer->linenos[buffer->nused] = lineno;
+		table_finish_bulk_insert(rri->ri_RelationDesc, ti_options);
 
-	/* Record this slot as being used */
-	buffer->nused++;
+		pfree(buffer);
+	}
 
-	/* Update how many tuples are stored and their size */
-	miinfo->bufferedTuples++;
-	miinfo->bufferedBytes += tuplen;
+	list_free(mirri);
 }
 
 /*
@@ -527,20 +385,20 @@ CopyFrom(CopyFromState cstate)
 	EState	   *estate = CreateExecutorState(); /* for ExecConstraints() */
 	ModifyTableState *mtstate;
 	ExprContext *econtext;
-	TupleTableSlot *singleslot = NULL;
+	TupleTableSlot *slot = NULL;
 	MemoryContext oldcontext = CurrentMemoryContext;
 
 	PartitionTupleRouting *proute = NULL;
 	ErrorContextCallback errcallback;
 	CommandId	mycid = GetCurrentCommandId(true);
 	int			ti_options = 0; /* start with default options for insert */
-	BulkInsertState bistate = NULL;
 	CopyInsertMethod insertMethod;
-	CopyMultiInsertInfo multiInsertInfo = {0};	/* pacify compiler */
 	uint64		processed = 0;
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
 	bool		leafpart_use_multi_insert = false;
+	List	   *multi_insert_rris = NULL;
+	TableInsertState *istate = NULL;
 
 	Assert(cstate->rel);
 	Assert(list_length(cstate->range_table) == 1);
@@ -723,7 +581,7 @@ CopyFrom(CopyFromState cstate)
 		 * For partitioned tables we can't support multi-inserts when there
 		 * are any statement level insert triggers. It might be possible to
 		 * allow partitioned tables with such triggers in the future, but for
-		 * now, CopyMultiInsertInfoFlush expects that any before row insert
+		 * now, CopyMulitInsertFlushBuffers expects that any before row insert
 		 * and statement level insert triggers are on the same relation.
 		 */
 		insertMethod = CIM_SINGLE;
@@ -771,22 +629,22 @@ CopyFrom(CopyFromState cstate)
 		else
 			insertMethod = CIM_MULTI;
 
-		CopyMultiInsertInfoInit(&multiInsertInfo, resultRelInfo, cstate,
-								estate, mycid, ti_options);
+		/*
+		* Only setup the buffer when not dealing with a partitioned table.
+		* Buffers for partitioned tables will just be setup when we need to
+		* send tuples their way for the first time.
+		*/
+		if (!proute)
+			InitCopyMultiInsertBufferInfo(&multi_insert_rris, resultRelInfo,
+										  mycid, ti_options);
 	}
 
 	/*
-	 * If not using batch mode (which allocates slots as needed) set up a
-	 * tuple slot too. When inserting into a partitioned table, we also need
-	 * one, even if we might batch insert, to read the tuple in the root
-	 * partition's form.
+	 * Set up a tuple slot to which the input data from copy stream is read
+	 * into and used for inserts into table.
 	 */
-	if (insertMethod == CIM_SINGLE || insertMethod == CIM_MULTI_CONDITIONAL)
-	{
-		singleslot = table_slot_create(resultRelInfo->ri_RelationDesc,
-									   &estate->es_tupleTable);
-		bistate = GetBulkInsertState();
-	}
+	slot = table_slot_create(resultRelInfo->ri_RelationDesc,
+									&estate->es_tupleTable);
 
 	has_before_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
 								  resultRelInfo->ri_TrigDesc->trig_insert_before_row);
@@ -824,19 +682,8 @@ CopyFrom(CopyFromState cstate)
 		ResetPerTupleExprContext(estate);
 
 		/* select slot to (initially) load row into */
-		if (insertMethod == CIM_SINGLE || proute)
-		{
-			myslot = singleslot;
-			Assert(myslot != NULL);
-		}
-		else
-		{
-			Assert(resultRelInfo == target_resultRelInfo);
-			Assert(insertMethod == CIM_MULTI);
-
-			myslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
-													 resultRelInfo);
-		}
+		myslot = slot;
+		Assert(myslot != NULL);
 
 		/*
 		 * Switch to per-tuple context before calling NextCopyFrom, which does
@@ -904,21 +751,22 @@ CopyFrom(CopyFromState cstate)
 				if (leafpart_use_multi_insert)
 				{
 					if (resultRelInfo->ri_CopyMultiInsertBuffer == NULL)
-						CopyMultiInsertInfoSetupBuffer(&multiInsertInfo,
-													   resultRelInfo);
+						InitCopyMultiInsertBufferInfo(&multi_insert_rris,
+													  resultRelInfo, mycid,
+													  ti_options);
 				}
-				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
-						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+				else if (insertMethod == CIM_MULTI_CONDITIONAL)
 				{
 					/*
 					 * Flush pending inserts if this partition can't use
 					 * batching, so rows are visible to triggers etc.
 					 */
-					CopyMultiInsertInfoFlush(&multiInsertInfo, resultRelInfo);
+					CopyMulitInsertFlushBuffers(&multi_insert_rris,
+												resultRelInfo, cstate, estate);
 				}
 
-				if (bistate != NULL)
-					ReleaseBulkInsertStatePin(bistate);
+				if (istate && istate->bistate)
+					ReleaseBulkInsertStatePin(istate->bistate);
 				prevResultRelInfo = resultRelInfo;
 			}
 
@@ -960,8 +808,8 @@ CopyFrom(CopyFromState cstate)
 				/* no other path available for partitioned table */
 				Assert(insertMethod == CIM_MULTI_CONDITIONAL);
 
-				batchslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
-															resultRelInfo);
+				batchslot = table_slot_create(resultRelInfo->ri_RelationDesc,
+											  &estate->es_tupleTable);
 
 				if (map != NULL)
 					myslot = execute_attr_map_slot(map->attrMap, myslot,
@@ -1033,24 +881,9 @@ CopyFrom(CopyFromState cstate)
 				/* Store the slot in the multi-insert buffer, when enabled. */
 				if (insertMethod == CIM_MULTI || leafpart_use_multi_insert)
 				{
-					/*
-					 * The slot previously might point into the per-tuple
-					 * context. For batching it needs to be longer lived.
-					 */
-					ExecMaterializeSlot(myslot);
-
 					/* Add this tuple to the tuple buffer */
-					CopyMultiInsertInfoStore(&multiInsertInfo,
-											 resultRelInfo, myslot,
-											 cstate->line_buf.len,
-											 cstate->cur_lineno);
-
-					/*
-					 * If enough inserts have queued up, then flush all
-					 * buffers out to their tables.
-					 */
-					if (CopyMultiInsertInfoIsFull(&multiInsertInfo))
-						CopyMultiInsertInfoFlush(&multiInsertInfo, resultRelInfo);
+					CopyMultiInsertBufferTuple(resultRelInfo, myslot, cstate,
+											   estate);
 				}
 				else
 				{
@@ -1076,9 +909,21 @@ CopyFrom(CopyFromState cstate)
 					}
 					else
 					{
+						if (!istate)
+						{
+							istate = table_insert_begin(resultRelInfo->ri_RelationDesc,
+														mycid,
+														ti_options,
+														true,
+														false,
+														-1,
+														-1);
+						}
+
+						istate->rel = resultRelInfo->ri_RelationDesc;
+
 						/* OK, store the tuple and create index entries for it */
-						table_tuple_insert(resultRelInfo->ri_RelationDesc,
-										   myslot, mycid, ti_options, bistate);
+						table_insert_v2(istate, myslot);
 
 						if (resultRelInfo->ri_NumIndices > 0)
 							recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
@@ -1108,16 +953,14 @@ CopyFrom(CopyFromState cstate)
 
 	/* Flush any remaining buffered tuples */
 	if (insertMethod != CIM_SINGLE)
-	{
-		if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
-			CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
-	}
+		CopyMulitInsertFlushBuffers(&multi_insert_rris, resultRelInfo,
+									cstate, estate);
 
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
 
-	if (bistate != NULL)
-		FreeBulkInsertState(bistate);
+	if (istate)
+		table_insert_end(istate);
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -1144,7 +987,7 @@ CopyFrom(CopyFromState cstate)
 
 	/* Tear down the multi-insert buffer data */
 	if (insertMethod != CIM_SINGLE)
-		CopyMultiInsertInfoCleanup(&multiInsertInfo);
+		CopyMulitInsertDropBuffers(multi_insert_rris);
 
 	/* Close all the partitioned tables, leaf partitions, and their indices */
 	if (proute)
-- 
2.25.1



  [application/octet-stream] v1-0002-CTAS-and-REFRESH-Mat-View-With-New-Multi-Insert-Table-AM.patch (6.5K, 5-v1-0002-CTAS-and-REFRESH-Mat-View-With-New-Multi-Insert-Table-AM.patch)
  download | inline diff:
From e777510f323c09839a6dd9253a327f5dd4172a8b Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Tue, 8 Dec 2020 12:14:22 +0530
Subject: [PATCH v1] CTAS and REFRESH Mat View With New Multi Insert Table AM

This patch adds new multi insert table access methods to
CREATE TABLE AS, CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED
VIEW.
---
 src/backend/commands/createas.c | 57 ++++++++++++++++++++-------------
 src/backend/commands/matview.c  | 43 ++++++++++++++-----------
 2 files changed, 59 insertions(+), 41 deletions(-)

diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 6bf6c5a310..4580cbae1d 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -58,9 +58,7 @@ typedef struct
 	/* These fields are filled by intorel_startup: */
 	Relation	rel;			/* relation to write to */
 	ObjectAddress reladdr;		/* address of rel, for ExecCreateTableAs */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableInsertState *istate;	/* insert state */
 } DR_intorel;
 
 /* utility functions for CTAS definition creation */
@@ -523,22 +521,28 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	if (is_matview && !into->skipData)
 		SetMatViewPopulatedState(intoRelationDesc, true);
 
-	/*
-	 * Fill private fields of myState for use by later routines
-	 */
-	myState->rel = intoRelationDesc;
-	myState->reladdr = intoRelationAddr;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM;
-
 	/*
 	 * If WITH NO DATA is specified, there is no need to set up the state for
-	 * bulk inserts as there are no tuples to insert.
+	 * bulk inserts and multi inserts as there are no tuples to insert.
 	 */
 	if (!into->skipData)
-		myState->bistate = GetBulkInsertState();
+	{
+		myState->istate = table_insert_begin(intoRelationDesc,
+											 GetCurrentCommandId(true),
+											 TABLE_INSERT_SKIP_FSM,
+											 true,
+											 true,
+											 1000, /* To change it to a macro */
+											 65535); /* To change it to a macro */
+	}
 	else
-		myState->bistate = NULL;
+		myState->istate = NULL;
+
+	/*
+	 * Fill private fields of myState for use by later routines
+	 */
+	myState->rel = intoRelationDesc;
+	myState->reladdr = intoRelationAddr;
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -566,11 +570,7 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
 		 * would not be cheap either. This also doesn't allow accessing per-AM
 		 * data (say a tuple's xmin), but since we don't do that here...
 		 */
-		table_tuple_insert(myState->rel,
-						   slot,
-						   myState->output_cid,
-						   myState->ti_options,
-						   myState->bistate);
+		table_multi_insert_v2(myState->istate, slot);
 	}
 
 	/* We know this is a newly created relation, so there are no indexes */
@@ -585,12 +585,23 @@ static void
 intorel_shutdown(DestReceiver *self)
 {
 	DR_intorel *myState = (DR_intorel *) self;
-	IntoClause *into = myState->into;
+	int			ti_options;
 
-	if (!into->skipData)
+	if (!myState->into->skipData)
 	{
-		FreeBulkInsertState(myState->bistate);
-		table_finish_bulk_insert(myState->rel, myState->ti_options);
+		ti_options = myState->istate->options;
+
+		/*
+		 * Do not let clearing buffered slots while flushing as they will be
+		 * anyways dropped by table_insert_end.
+		 */
+		myState->istate->mistate->clear_slots = false;
+
+		table_multi_insert_flush(myState->istate);
+
+		table_insert_end(myState->istate);
+
+		table_finish_bulk_insert(myState->rel, ti_options);
 	}
 
 	/* close rel, but keep lock until commit */
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index cfc63915f3..610c7ede78 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -52,10 +52,7 @@ typedef struct
 	DestReceiver pub;			/* publicly-known function pointers */
 	Oid			transientoid;	/* OID of new heap into which to store */
 	/* These fields are filled by transientrel_startup: */
-	Relation	transientrel;	/* relation to write to */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableInsertState *istate;	/* insert state */
 } DR_transientrel;
 
 static int	matview_maintenance_depth = 0;
@@ -466,10 +463,13 @@ transientrel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	/*
 	 * Fill private fields of myState for use by later routines
 	 */
-	myState->transientrel = transientrel;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM | TABLE_INSERT_FROZEN;
-	myState->bistate = GetBulkInsertState();
+	myState->istate = table_insert_begin(transientrel,
+										 GetCurrentCommandId(true),
+										 TABLE_INSERT_SKIP_FSM | TABLE_INSERT_FROZEN,
+										 true,
+										 true,
+										 1000, /* To change it to a macro */
+										 65535); /* To change it to a macro */
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -494,12 +494,7 @@ transientrel_receive(TupleTableSlot *slot, DestReceiver *self)
 	 * cheap either. This also doesn't allow accessing per-AM data (say a
 	 * tuple's xmin), but since we don't do that here...
 	 */
-
-	table_tuple_insert(myState->transientrel,
-					   slot,
-					   myState->output_cid,
-					   myState->ti_options,
-					   myState->bistate);
+	table_multi_insert_v2(myState->istate, slot);
 
 	/* We know this is a newly created relation, so there are no indexes */
 
@@ -513,14 +508,26 @@ static void
 transientrel_shutdown(DestReceiver *self)
 {
 	DR_transientrel *myState = (DR_transientrel *) self;
+	int			ti_options;
+	Relation	transientrel;
+
+	ti_options = myState->istate->options;
+	transientrel = myState->istate->rel;
+
+	/*
+	 * Do not let clearing buffered slots while flushing as they will be
+	 * anyways dropped by table_insert_end.
+	 */
+	myState->istate->mistate->clear_slots = false;
+
+	table_multi_insert_flush(myState->istate);
 
-	FreeBulkInsertState(myState->bistate);
+	table_insert_end(myState->istate);
 
-	table_finish_bulk_insert(myState->transientrel, myState->ti_options);
+	table_finish_bulk_insert(transientrel, ti_options);
 
 	/* close transientrel, but keep lock until commit */
-	table_close(myState->transientrel, NoLock);
-	myState->transientrel = NULL;
+	table_close(transientrel, NoLock);
 }
 
 /*
-- 
2.25.1



^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2020-12-11 13:47  Bharath Rupireddy <[email protected]>
  parent: Bharath Rupireddy <[email protected]>
  1 sibling, 0 replies; 43+ messages in thread

From: Bharath Rupireddy @ 2020-12-11 13:47 UTC (permalink / raw)
  To: pgsql-hackers; Andres Freund <[email protected]>; Luc Vlaming <[email protected]>

On Tue, Dec 8, 2020 at 6:27 PM Bharath Rupireddy <
[email protected]> wrote:
> Hi,
>
> Currently, for any component (such as COPY, CTAS[1], CREATE/REFRESH
> Mat View[1], INSERT INTO SELECTs[2]) multi insert logic such as buffer
> slots allocation, maintenance, decision to flush and clean up, need to
> be implemented outside the table_multi_insert() API. The main problem
> is that it fails to take into consideration the underlying storage
> engine capabilities, for more details of this point refer to a
> discussion in multi inserts in CTAS thread[1]. This also creates a lot
> of duplicate code which is more error prone and not maintainable.
>
> More importantly, in another thread [3] @Andres Freund suggested to
> have table insert APIs in such a way that they look more like 'scan'
> APIs i.e. insert_begin, insert, insert_end. The main advantages doing
> this are(quoting from his statement in [3]) - "more importantly it'd
> allow an AM to optimize operations across multiple inserts, which is
> important for column stores."
>
> I propose to introduce new table access methods for both multi and
> single inserts based on the prototype suggested by Andres in [3]. Main
> design goal of these new APIs is to give flexibility to tableam
> developers in implementing multi insert logic dependent on the
> underlying storage engine.
>
> Below are the APIs. I suggest to have a look at
> v1-0001-New-Table-AMs-for-Multi-and-Single-Inserts.patch for details
> of the new data structure and the API functionality. Note that
> temporarily I used XX_v2, we can change it later.
>
> TableInsertState* table_insert_begin(initial_args);
> void table_insert_v2(TableInsertState *state, TupleTableSlot *slot);
> void table_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot);
> void table_multi_insert_flush(TableInsertState *state);
> void table_insert_end(TableInsertState *state);
>
> I'm attaching a few patches(just to show that these APIs work, avoids
> a lot of duplicate code and makes life easier). Better commenting can
> be added later. If these APIs and patches look okay, we can even
> consider replacing them in other places such as nodeModifyTable.c and
> so on.
>
> v1-0001-New-Table-AMs-for-Multi-and-Single-Inserts.patch --->
> introduces new table access methods for multi and single inserts. Also
> implements/rearranges the outside code for heap am into these new
> APIs.
> v1-0002-CTAS-and-REFRESH-Mat-View-With-New-Multi-Insert-Table-AM.patch
> ---> adds new multi insert table access methods to CREATE TABLE AS,
> CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW.
> v1-0003-ATRewriteTable-With-New-Single-Insert-Table-AM.patch ---> adds
> new single insert table access method to ALTER TABLE rewrite table
> code.
> v1-0004-COPY-With-New-Multi-and-Single-Insert-Table-AM.patch ---> adds
> new single and multi insert table access method to COPY code.
>
> Thoughts?
>
> [1] -
https://www.postgresql.org/message-id/4eee0730-f6ec-e72d-3477-561643f4b327%40swarm64.com
> [2] -
https://www.postgresql.org/message-id/20201124020020.GK24052%40telsasoft.com
> [3] -
https://www.postgresql.org/message-id/20200924024128.kyk3r5g7dnu3fxxx%40alap3.anarazel.de

Added this to commitfest to get it reviewed further.

https://commitfest.postgresql.org/31/2871/

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2020-12-17 05:05  Justin Pryzby <[email protected]>
  parent: Bharath Rupireddy <[email protected]>
  1 sibling, 1 reply; 43+ messages in thread

From: Justin Pryzby @ 2020-12-17 05:05 UTC (permalink / raw)
  To: Bharath Rupireddy <[email protected]>; +Cc: pgsql-hackers; Andres Freund <[email protected]>; Luc Vlaming <[email protected]>; Paul Guo <[email protected]>

Typos:

+ *  1) Specify is_multi as true, then multi insert state is allcoated.
=> allocated
+ * dropped, short-lived memory context is delted and mistate is freed up.
=> deleted
+ * 2) Currently, GetTupleSize() handles the existing heap, buffer, minmal and
=> minimal
+       /* Mulit insert state if requested, otherwise NULL. */
=> multi
+ * Buffer the input slots and insert the tuples from the buffered slots at a
=> *one* at a time ?
+ * Compute the size of the tuple only if mi_max_size i.e. the total tuple size
=> I guess you mean max_size

This variable could use a better name:
+CopyMulitInsertFlushBuffers(List **mirri, ..
mirri is fine for a local variable like an element of a struture/array, or a
loop variable, but not for a function parameter which is an "List" of arbitrary
pointers.

I think this comment needs to be updated (again) for the removal of the Info
structure.
- * CopyMultiInsertBuffer items stored in CopyMultiInsertInfo's
+ * multi insert buffer items stored in CopyMultiInsertInfo's

I think the COPY patch should be 0002 (or maybe merged into 0001).
There's some superfluous whitespace (and other) changes there which make the
patch unnecessarily long.

You made the v2 insert interface a requirement for all table AMs.
Should it be optional, and fall back to simple inserts if not implemented ? 

For CTAS, I think we need to consider Paul's idea here.
https://www.postgresql.org/message-id/26C14A63-CCE5-4B46-975A-57C1784B3690%40vmware.com
Conceivably, tableam should support something like that for arbitrary AMs
("insert into a new table for which we have exclusive lock").  I think that AM
method should also be optional.  It should be possible to implement a minimal
AM without implementing every available optimization, which may not apply to
all AMs, anyway.

-- 
Justin

^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2020-12-17 11:05  Bharath Rupireddy <[email protected]>
  parent: Justin Pryzby <[email protected]>
  0 siblings, 1 reply; 43+ messages in thread

From: Bharath Rupireddy @ 2020-12-17 11:05 UTC (permalink / raw)
  To: Justin Pryzby <[email protected]>; +Cc: pgsql-hackers; Andres Freund <[email protected]>; Luc Vlaming <[email protected]>; Paul Guo <[email protected]>

Thanks a lot for taking a look at the patches.

On Thu, Dec 17, 2020 at 10:35 AM Justin Pryzby <[email protected]> wrote:
> Typos:
>
> + *  1) Specify is_multi as true, then multi insert state is allcoated.
> => allocated
> + * dropped, short-lived memory context is delted and mistate is freed up.
> => deleted
> + * 2) Currently, GetTupleSize() handles the existing heap, buffer, minmal and
> => minimal
> +       /* Mulit insert state if requested, otherwise NULL. */
> => multi
> + * Buffer the input slots and insert the tuples from the buffered slots at a
> => *one* at a time ?
> + * Compute the size of the tuple only if mi_max_size i.e. the total tuple size
> => I guess you mean max_size
>
> This variable could use a better name:
> +CopyMulitInsertFlushBuffers(List **mirri, ..
> mirri is fine for a local variable like an element of a struture/array, or a
> loop variable, but not for a function parameter which is an "List" of arbitrary
> pointers.
>
> I think this comment needs to be updated (again) for the removal of the Info
> structure.
> - * CopyMultiInsertBuffer items stored in CopyMultiInsertInfo's
> + * multi insert buffer items stored in CopyMultiInsertInfo's
>
> There's some superfluous whitespace (and other) changes there which make the
> patch unnecessarily long.

I will correct them and post the next version of the patch set. Before
that, I would like to have the discussion and thoughts on the APIs and
their usefulness.

> I think the COPY patch should be 0002 (or maybe merged into 0001).

I can make it as a 0002 patch.

> You made the v2 insert interface a requirement for all table AMs.
> Should it be optional, and fall back to simple inserts if not implemented ?

I tried to implement the APIs mentioned by Andreas here in [1]. I just
used v2 table am APIs in existing table_insert places to show that it
works. Having said that, if you notice, I moved the bulk insert
allocation and deallocation to the new APIs table_insert_begin() and
table_insert_end() respectively, which make them tableam specific.
Currently, the bulk insert state is outside and independent of
tableam. I think we should not make bulk insert state allocation and
deallocation tableam specific. Thoughts?

[1] - https://www.postgresql.org/message-id/20200924024128.kyk3r5g7dnu3fxxx%40alap3.anarazel.de

> For CTAS, I think we need to consider Paul's idea here.
> https://www.postgresql.org/message-id/26C14A63-CCE5-4B46-975A-57C1784B3690%40vmware.com

IMO, if we were to allow those raw insert APIs to perform parallel
inserts, then we would be reimplementing the existing table_insert or
table_mulit_insert API by having some sort of shared memory for
coordinating among workers and so on, may be in some other way. Yes,
we could avoid all the existing locking and shared buffers with those
raw insert APIs, I also feel that we can now do that with the existing
insert APIs for unlogged tables and bulk insert state. To me, the raw
insert APIs after implementing them for the parallel inserts, they
would look like the existing insert APIs for unlogged tables and with
bulk insert state. Thoughts?

Please have a look at [1] for detailed comment.

[1] https://www.postgresql.org/message-id/CALj2ACX0u%3DQvB7GHLEqeVYwvs2eQS7%3D-cEuem7ZaF%3Dp%2BqZ0ikA%40...

> Conceivably, tableam should support something like that for arbitrary AMs
> ("insert into a new table for which we have exclusive lock").  I think that AM
> method should also be optional.  It should be possible to implement a minimal
> AM without implementing every available optimization, which may not apply to
> all AMs, anyway.

I could not understand this point well. Maybe more thoughts help me here.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2020-12-17 20:44  Justin Pryzby <[email protected]>
  parent: Bharath Rupireddy <[email protected]>
  0 siblings, 1 reply; 43+ messages in thread

From: Justin Pryzby @ 2020-12-17 20:44 UTC (permalink / raw)
  To: Bharath Rupireddy <[email protected]>; +Cc: pgsql-hackers; Andres Freund <[email protected]>; Luc Vlaming <[email protected]>; Paul Guo <[email protected]>

On Thu, Dec 17, 2020 at 04:35:33PM +0530, Bharath Rupireddy wrote:
> > You made the v2 insert interface a requirement for all table AMs.
> > Should it be optional, and fall back to simple inserts if not implemented ?
> 
> I tried to implement the APIs mentioned by Andreas here in [1]. I just
> used v2 table am APIs in existing table_insert places to show that it
> works. Having said that, if you notice, I moved the bulk insert
> allocation and deallocation to the new APIs table_insert_begin() and
> table_insert_end() respectively, which make them tableam specific.

I mean I think it should be optional for a tableam to support the optimized
insert routines.  Here, you've made it a requirement.

+       Assert(routine->tuple_insert_begin != NULL);
+       Assert(routine->tuple_insert_v2 != NULL);
+       Assert(routine->multi_insert_v2 != NULL);
+       Assert(routine->multi_insert_flush != NULL);
+       Assert(routine->tuple_insert_end != NULL);

+static inline void
+table_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
+{
+       state->rel->rd_tableam->multi_insert_v2(state, slot);
+}

If multi_insert_v2 == NULL, I think table_multi_insert_v2() would just call
table_insert_v2(), and begin/flush/end would do nothing.  If
table_multi_insert_v2!=NULL, then you should assert that the other routines are
provided.

Are you thinking that TableInsertState would eventually have additional
attributes which would apply to other tableams, but not to heap ?  Is
heap_insert_begin() really specific to heap ?  It's allocating and populating a
structure based on its arguments, but those same arguments would be passed to
every other AM's insert_begin routine, too.  Do you need a more flexible data
structure, something that would also accomodate extensions?  I'm thinking of
reloptions as a loose analogy.

-- 
Justin

^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2020-12-18 02:09  Bharath Rupireddy <[email protected]>
  parent: Justin Pryzby <[email protected]>
  0 siblings, 1 reply; 43+ messages in thread

From: Bharath Rupireddy @ 2020-12-18 02:09 UTC (permalink / raw)
  To: Justin Pryzby <[email protected]>; +Cc: pgsql-hackers; Andres Freund <[email protected]>; Luc Vlaming <[email protected]>; Paul Guo <[email protected]>

On Fri, Dec 18, 2020 at 2:14 AM Justin Pryzby <[email protected]> wrote:
> On Thu, Dec 17, 2020 at 04:35:33PM +0530, Bharath Rupireddy wrote:
> > > You made the v2 insert interface a requirement for all table AMs.
> > > Should it be optional, and fall back to simple inserts if not implemented ?
> >
> > I tried to implement the APIs mentioned by Andreas here in [1]. I just
> > used v2 table am APIs in existing table_insert places to show that it
> > works. Having said that, if you notice, I moved the bulk insert
> > allocation and deallocation to the new APIs table_insert_begin() and
> > table_insert_end() respectively, which make them tableam specific.
>
> I mean I think it should be optional for a tableam to support the optimized
> insert routines.  Here, you've made it a requirement.
>
> +       Assert(routine->tuple_insert_begin != NULL);
> +       Assert(routine->tuple_insert_v2 != NULL);
> +       Assert(routine->multi_insert_v2 != NULL);
> +       Assert(routine->multi_insert_flush != NULL);
> +       Assert(routine->tuple_insert_end != NULL);

+1 to make them optional. I will change.

> +static inline void
> +table_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
> +{
> +       state->rel->rd_tableam->multi_insert_v2(state, slot);
> +}
>
> If multi_insert_v2 == NULL, I think table_multi_insert_v2() would just call
> table_insert_v2(), and begin/flush/end would do nothing.  If
> table_multi_insert_v2!=NULL, then you should assert that the other routines are
> provided.

What should happen if both multi_insert_v2 and insert_v2 are NULL?
Should we error out from table_insert_v2()?

> Are you thinking that TableInsertState would eventually have additional
> attributes which would apply to other tableams, but not to heap ?  Is
> heap_insert_begin() really specific to heap ?  It's allocating and populating a
> structure based on its arguments, but those same arguments would be passed to
> every other AM's insert_begin routine, too.  Do you need a more flexible data
> structure, something that would also accomodate extensions?  I'm thinking of
> reloptions as a loose analogy.

I could not think of other tableam attributes now. But +1 to have that
kind of flexible structure for TableInsertState. So, it can have
tableam type and attributes within the union for each type.

> I moved the bulk insert allocation and deallocation to the new APIs table_insert_begin()
> and table_insert_end() respectively, which make them tableam specific.
> Currently, the bulk insert state is outside and independent of
> tableam. I think we should not make bulk insert state allocation and
> deallocation tableam specific.

Any thoughts on the above point?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com





^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2020-12-18 17:54  Justin Pryzby <[email protected]>
  parent: Bharath Rupireddy <[email protected]>
  0 siblings, 2 replies; 43+ messages in thread

From: Justin Pryzby @ 2020-12-18 17:54 UTC (permalink / raw)
  To: Bharath Rupireddy <[email protected]>; +Cc: pgsql-hackers; Andres Freund <[email protected]>; Luc Vlaming <[email protected]>; Paul Guo <[email protected]>

On Fri, Dec 18, 2020 at 07:39:14AM +0530, Bharath Rupireddy wrote:
> On Fri, Dec 18, 2020 at 2:14 AM Justin Pryzby <[email protected]> wrote:
> > Are you thinking that TableInsertState would eventually have additional
> > attributes which would apply to other tableams, but not to heap ?  Is
> > heap_insert_begin() really specific to heap ?  It's allocating and populating a
> > structure based on its arguments, but those same arguments would be passed to
> > every other AM's insert_begin routine, too.  Do you need a more flexible data
> > structure, something that would also accomodate extensions?  I'm thinking of
> > reloptions as a loose analogy.
> 
> I could not think of other tableam attributes now. But +1 to have that
> kind of flexible structure for TableInsertState. So, it can have
> tableam type and attributes within the union for each type.

Right now you have heap_insert_begin(), and I asked if it was really
heap-specific.  Right now, it populates a struct based on a static list of
arguments, which are what heap uses.  

If you were to implement a burp_insert_begin(), how would it differ from
heap's?  With the current API, they'd (have to) be the same, which means either
that it should apply to all AMs (or have a "default" implementation that can be
overridden by an AM), or that this API assumes that other AMs will want to do
exactly what heap does, and fails to allow other AMs to implement optimizations
for bulk inserts as claimed.

I don't think using a "union" solves the problem, since it can only accommodate
core AMs, and not extensions, so I suggested something like reloptions, which
have a "namespace" prefix (and core has toast.*, like ALTER TABLE t SET
toast.autovacuum_enabled).

-- 
Justin

^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2020-12-21 07:42  Bharath Rupireddy <[email protected]>
  parent: Justin Pryzby <[email protected]>
  1 sibling, 0 replies; 43+ messages in thread

From: Bharath Rupireddy @ 2020-12-21 07:42 UTC (permalink / raw)
  To: Justin Pryzby <[email protected]>; +Cc: pgsql-hackers; Andres Freund <[email protected]>; Luc Vlaming <[email protected]>; Paul Guo <[email protected]>

On Fri, Dec 18, 2020 at 11:24 PM Justin Pryzby <[email protected]> wrote:
> On Fri, Dec 18, 2020 at 07:39:14AM +0530, Bharath Rupireddy wrote:
> > On Fri, Dec 18, 2020 at 2:14 AM Justin Pryzby <[email protected]> wrote:
> > > Are you thinking that TableInsertState would eventually have additional
> > > attributes which would apply to other tableams, but not to heap ?  Is
> > > heap_insert_begin() really specific to heap ?  It's allocating and populating a
> > > structure based on its arguments, but those same arguments would be passed to
> > > every other AM's insert_begin routine, too.  Do you need a more flexible data
> > > structure, something that would also accomodate extensions?  I'm thinking of
> > > reloptions as a loose analogy.
> >
> > I could not think of other tableam attributes now. But +1 to have that
> > kind of flexible structure for TableInsertState. So, it can have
> > tableam type and attributes within the union for each type.
>
> Right now you have heap_insert_begin(), and I asked if it was really
> heap-specific.  Right now, it populates a struct based on a static list of
> arguments, which are what heap uses.
>
> If you were to implement a burp_insert_begin(), how would it differ from
> heap's?  With the current API, they'd (have to) be the same, which means either
> that it should apply to all AMs (or have a "default" implementation that can be
> overridden by an AM), or that this API assumes that other AMs will want to do
> exactly what heap does, and fails to allow other AMs to implement optimizations
> for bulk inserts as claimed.
>
> I don't think using a "union" solves the problem, since it can only accommodate
> core AMs, and not extensions, so I suggested something like reloptions, which
> have a "namespace" prefix (and core has toast.*, like ALTER TABLE t SET
> toast.autovacuum_enabled).

IIUC, your suggestion is to make the heap options such as
alloc_bistate(bulk insert state is required or not), mi_max_slots
(number of maximum buffered slots/tuples) and mi_max_size (the maximum
tuple size of the buffered slots) as reloptions with some default
values in reloptions.c under RELOPT_KIND_HEAP category so that they
can be modified by users on a per table basis. And likewise other
tableam options can be added by the tableam developers. This way, the
APIs will become more generic. The tableam developers need to add
reloptions of their choice and use them in the new API
implementations.

Let me know if I am missing anything from what you have in your mind.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com





^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2020-12-21 07:47  Justin Pryzby <[email protected]>
  parent: Justin Pryzby <[email protected]>
  1 sibling, 1 reply; 43+ messages in thread

From: Justin Pryzby @ 2020-12-21 07:47 UTC (permalink / raw)
  To: Bharath Rupireddy <[email protected]>; +Cc: pgsql-hackers; Andres Freund <[email protected]>; Luc Vlaming <[email protected]>; Paul Guo <[email protected]>

On Fri, Dec 18, 2020 at 11:54:39AM -0600, Justin Pryzby wrote:
> On Fri, Dec 18, 2020 at 07:39:14AM +0530, Bharath Rupireddy wrote:
> > On Fri, Dec 18, 2020 at 2:14 AM Justin Pryzby <[email protected]> wrote:
> > > Are you thinking that TableInsertState would eventually have additional
> > > attributes which would apply to other tableams, but not to heap ?  Is
> > > heap_insert_begin() really specific to heap ?  It's allocating and populating a
> > > structure based on its arguments, but those same arguments would be passed to
> > > every other AM's insert_begin routine, too.  Do you need a more flexible data
> > > structure, something that would also accomodate extensions?  I'm thinking of
> > > reloptions as a loose analogy.
> > 
> > I could not think of other tableam attributes now. But +1 to have that
> > kind of flexible structure for TableInsertState. So, it can have
> > tableam type and attributes within the union for each type.
> 
> Right now you have heap_insert_begin(), and I asked if it was really
> heap-specific.  Right now, it populates a struct based on a static list of
> arguments, which are what heap uses.  
> 
> If you were to implement a burp_insert_begin(), how would it differ from
> heap's?  With the current API, they'd (have to) be the same, which means either
> that it should apply to all AMs (or have a "default" implementation that can be
> overridden by an AM), or that this API assumes that other AMs will want to do
> exactly what heap does, and fails to allow other AMs to implement optimizations
> for bulk inserts as claimed.
> 
> I don't think using a "union" solves the problem, since it can only accommodate
> core AMs, and not extensions, so I suggested something like reloptions, which
> have a "namespace" prefix (and core has toast.*, like ALTER TABLE t SET
> toast.autovacuum_enabled).

I think you'd want to handle things like:

 - a compressed AM wants to specify a threshold for a tuple's *compressed* size
   (maybe in addition to the uncompressed size);
 - a "columnar" AM wants to specify a threshold size for a column, rather
   than for each tuple;

I'm not proposing to handle those specific parameters, but rather pointing out
that your implementation doesn't allow handling AM-specific considerations,
which I think was the goal.

The TableInsertState structure would need to store those, and then the AM's
multi_insert_v2 routine would need to make use of them.

It feels a bit like we'd introduce the idea of an "AM option", except that it
wouldn't be user-facing (or maybe some of them would be?).  Maybe I've
misunderstood though, so other opinions are welcome.

-- 
Justin





^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2020-12-24 00:18  Bharath Rupireddy <[email protected]>
  parent: Justin Pryzby <[email protected]>
  0 siblings, 1 reply; 43+ messages in thread

From: Bharath Rupireddy @ 2020-12-24 00:18 UTC (permalink / raw)
  To: Justin Pryzby <[email protected]>; +Cc: pgsql-hackers; Andres Freund <[email protected]>; Luc Vlaming <[email protected]>; Paul Guo <[email protected]>

On Mon, Dec 21, 2020 at 1:17 PM Justin Pryzby <[email protected]> wrote:
> On Fri, Dec 18, 2020 at 11:54:39AM -0600, Justin Pryzby wrote:
> > On Fri, Dec 18, 2020 at 07:39:14AM +0530, Bharath Rupireddy wrote:
> > > On Fri, Dec 18, 2020 at 2:14 AM Justin Pryzby <[email protected]> wrote:
> > > > Are you thinking that TableInsertState would eventually have additional
> > > > attributes which would apply to other tableams, but not to heap ?  Is
> > > > heap_insert_begin() really specific to heap ?  It's allocating and populating a
> > > > structure based on its arguments, but those same arguments would be passed to
> > > > every other AM's insert_begin routine, too.  Do you need a more flexible data
> > > > structure, something that would also accomodate extensions?  I'm thinking of
> > > > reloptions as a loose analogy.
> > >
> > > I could not think of other tableam attributes now. But +1 to have that
> > > kind of flexible structure for TableInsertState. So, it can have
> > > tableam type and attributes within the union for each type.
> >
> > Right now you have heap_insert_begin(), and I asked if it was really
> > heap-specific.  Right now, it populates a struct based on a static list of
> > arguments, which are what heap uses.
> >
> > If you were to implement a burp_insert_begin(), how would it differ from
> > heap's?  With the current API, they'd (have to) be the same, which means either
> > that it should apply to all AMs (or have a "default" implementation that can be
> > overridden by an AM), or that this API assumes that other AMs will want to do
> > exactly what heap does, and fails to allow other AMs to implement optimizations
> > for bulk inserts as claimed.
> >
> > I don't think using a "union" solves the problem, since it can only accommodate
> > core AMs, and not extensions, so I suggested something like reloptions, which
> > have a "namespace" prefix (and core has toast.*, like ALTER TABLE t SET
> > toast.autovacuum_enabled).
>
> I think you'd want to handle things like:
>
>  - a compressed AM wants to specify a threshold for a tuple's *compressed* size
>    (maybe in addition to the uncompressed size);
>  - a "columnar" AM wants to specify a threshold size for a column, rather
>    than for each tuple;
>
> I'm not proposing to handle those specific parameters, but rather pointing out
> that your implementation doesn't allow handling AM-specific considerations,
> which I think was the goal.
>
> The TableInsertState structure would need to store those, and then the AM's
> multi_insert_v2 routine would need to make use of them.
>
> It feels a bit like we'd introduce the idea of an "AM option", except that it
> wouldn't be user-facing (or maybe some of them would be?).  Maybe I've
> misunderstood though, so other opinions are welcome.

Attaching a v2 patch for the new table AMs.

This patch has following changes:

1) Made the TableInsertState structure generic by having a void
pointer for multi insert state and defined the heap specific multi
insert state information in heapam.h. This way each AM can have it's
own multi insert state structure and dereference the void pointer
using that structure inside the respective AM implementations.
2) Earlier in the v1 patch, the bulk insert state
allocation/deallocation was moved to AM level, but I see that there's
nothing specific in doing so and I think it should be independent of
AM. So I'm doing that in table_insert_begin() and table_insert_end().
Because of this, I had to move the BulkInsert function declarations
from heapam.h to tableam.h
3) Corrected the typos and tried to adjust indentation of the code.

Note that I have not yet made the multi_insert_v2 API optional as
suggested earlier. I will think more on this and update.

I'm not posting the updated 0002 to 0004 patches, I plan to do so
after a couple of reviews happen on the design of the APIs in 0001.

Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


Attachments:

  [application/x-patch] v2-0001-New-Table-AMs-for-Multi-and-Single-Inserts.patch (20.2K, 2-v2-0001-New-Table-AMs-for-Multi-and-Single-Inserts.patch)
  download | inline diff:
From cab7baa6f5c0229816e09a887c0468a1ca4edccb Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Thu, 24 Dec 2020 05:18:13 +0530
Subject: [PATCH v2] New Table AMs for Multi and Single Inserts

This patch introduces new table access methods for multi and
single inserts. Also implements/rearranges the outside code for
heap am into these new APIs.

Main design goal of these new APIs is to give flexibility to
tableam developers in implementing multi insert logic dependent on
the underlying storage engine. Currently, for all the underlying
storage engines, we follow the same multi insert logic such as when
and how to flush the buffered tuples, tuple size calculation, and
this logic doesn't take into account the underlying storage engine
capabilities.

We can also avoid duplicating multi insert code (for existing COPY,
and upcoming CTAS, CREATE/REFRESH MAT VIEW and INSERT SELECTs). We
can also move bulk insert state allocation and deallocation inside
these APIs
---
 src/backend/access/heap/heapam.c         | 206 +++++++++++++++++++++++
 src/backend/access/heap/heapam_handler.c |   5 +
 src/backend/access/table/tableamapi.c    |   7 +
 src/backend/executor/execTuples.c        |  83 ++++++++-
 src/include/access/heapam.h              |  53 +++++-
 src/include/access/tableam.h             |  93 ++++++++++
 src/include/executor/tuptable.h          |   1 +
 7 files changed, 442 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a9583f3103..baa0f3032e 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -66,6 +66,7 @@
 #include "utils/datum.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
+#include "utils/memutils.h"
 #include "utils/relcache.h"
 #include "utils/snapmgr.h"
 #include "utils/spccache.h"
@@ -2371,6 +2372,211 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 	pgstat_count_heap_insert(relation, ntuples);
 }
 
+/*
+ * heap_insert_begin - allocate and initialize TableInsertState
+ *
+ * For single inserts:
+ *  1) Specify is_multi as false, then multi insert state will be NULL.
+ *
+ * For multi inserts:
+ *  1) Specify is_multi as true, then multi insert state will be allocated and
+ * 	   initialized.
+ *
+ *  Other input parameters i.e. relation, command id, options are common for
+ *  both single and multi inserts.
+ */
+TableInsertState*
+heap_insert_begin(Relation rel, CommandId cid, int options, bool is_multi)
+{
+	TableInsertState *state;
+
+	state = palloc0(sizeof(TableInsertState));
+	state->rel = rel;
+	state->cid = cid;
+	state->options = options;
+	/* Below parameters are not used for single inserts. */
+	state->mistate = NULL;
+	state->clear_mi_slots = false;
+	state->flushed = false;
+
+	if (is_multi)
+	{
+		HeapMultiInsertState *mistate;
+
+		mistate = palloc0(sizeof(HeapMultiInsertState));
+		mistate->slots =
+				palloc0(sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
+		mistate->max_slots = MAX_BUFFERED_TUPLES;
+		mistate->max_size = MAX_BUFFERED_BYTES;
+		mistate->cur_slots = 0;
+		mistate->cur_size = 0;
+		/*
+		 * Create a temporary memory context so that we can reset once per
+		 * multi insert batch.
+		 */
+		mistate->context = AllocSetContextCreate(CurrentMemoryContext,
+												 "heap_multi_insert",
+												 ALLOCSET_DEFAULT_SIZES);
+		state->mistate = mistate;
+		state->clear_mi_slots = true;
+		state->flushed	= false;
+	}
+
+	return state;
+}
+
+/*
+ * heap_insert_v2 - insert single tuple into a heap
+ *
+ * Insert tuple from the slot into table. This is like heap_insert(). The only
+ * difference is that the parameters are inside table insert state structure.
+ */
+void
+heap_insert_v2(TableInsertState *state, TupleTableSlot *slot)
+{
+	bool		shouldFree = true;
+	HeapTuple	tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);
+
+	/* Update the tuple with table oid */
+	slot->tts_tableOid = RelationGetRelid(state->rel);
+	tuple->t_tableOid = slot->tts_tableOid;
+
+	/* Perform the insertion, and copy the resulting ItemPointer */
+	heap_insert(state->rel, tuple, state->cid, state->options, state->bistate);
+	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
+
+	if (shouldFree)
+		pfree(tuple);
+}
+
+/*
+ * heap_multi_insert_v2 - insert multiple tuples into a heap
+ *
+ * Compute the size of the tuple, store it into the buffered slots and insert
+ * the tuples(flush) from the buffered slots one at a time into the table.
+ *
+ * Flush can happen:
+ *  1) either if all the slots are filled up
+ *  2) or if the total tuple size of the currently buffered slots are >=
+ *     max_size.
+ */
+void
+heap_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
+{
+	TupleTableSlot  *batchslot;
+	HeapMultiInsertState *mistate = (HeapMultiInsertState *)state->mistate;
+	Size sz;
+
+	Assert(mistate && mistate->slots);
+
+	if (mistate->slots[mistate->cur_slots] == NULL)
+		mistate->slots[mistate->cur_slots] =
+									table_slot_create(state->rel, NULL);
+
+	batchslot = mistate->slots[mistate->cur_slots];
+
+	ExecCopySlot(batchslot, slot);
+
+	/* Reset the flush state if previously set. */
+	if (state->flushed)
+		state->flushed = false;
+
+	/*
+	 * Calculate the tuple size after the original slot is copied, because the
+	 * copied slot type and the tuple size may change.
+	 */
+	sz = GetTupleSize(batchslot, mistate->max_size);
+
+	Assert(sz > 0);
+
+	mistate->cur_slots++;
+	mistate->cur_size += sz;
+
+	if (mistate->cur_slots >= mistate->max_slots ||
+		mistate->cur_size >= mistate->max_size)
+		heap_multi_insert_flush(state);
+}
+
+/*
+ * heap_multi_insert_flush - flush the tuples from buffered slots if any
+ *
+ * Flush the buffered tuples, indicate the caller that the flushing happened
+ * and clear the slots if they are not required outside. Reset the parameters.
+ */
+void
+heap_multi_insert_flush(TableInsertState *state)
+{
+	HeapMultiInsertState *mistate = (HeapMultiInsertState *)state->mistate;
+	MemoryContext oldcontext;
+
+	Assert(mistate && mistate->slots && mistate->cur_slots >= 0 &&
+		   mistate->context);
+
+	if (mistate->cur_slots == 0)
+	{
+		state->flushed = false;
+		return;
+	}
+
+	oldcontext = MemoryContextSwitchTo(mistate->context);
+
+	heap_multi_insert(state->rel, mistate->slots, mistate->cur_slots,
+					  state->cid, state->options, state->bistate);
+
+	MemoryContextReset(mistate->context);
+	MemoryContextSwitchTo(oldcontext);
+
+	/*
+	 * Do not clear the slots always. Sometimes callers may want the slots for
+	 * index insertions or after row trigger executions in which case they have
+	 * to clear the tuples before using for the next insert batch.
+	 */
+	if (state->clear_mi_slots)
+	{
+		int i;
+
+		for (i = 0; i < mistate->cur_slots; i++)
+			ExecClearTuple(mistate->slots[i]);
+	}
+
+	mistate->cur_slots = 0;
+	mistate->cur_size = 0;
+	state->flushed = true;
+}
+
+/*
+ * heap_insert_end - clean up the TableInsertState
+ *
+ * For multi inserts, ensure to flush all the remaining buffers with
+ * heap_multi_insert_flush before calling this function. Buffered slots are
+ * dropped, short-lived memory context is deleted and mistate is freed up.
+ *
+ * And finally free up TableInsertState.
+ */
+void
+heap_insert_end(TableInsertState *state)
+{
+	if (state->mistate)
+	{
+		HeapMultiInsertState *mistate = (HeapMultiInsertState *)state->mistate;
+		int i;
+
+		/* Ensure that the buffers have been flushed before. */
+		Assert(mistate->slots && mistate->cur_slots == 0 &&
+			   mistate->context);
+
+		for (i = 0; i < mistate->max_slots && mistate->slots[i] != NULL; i++)
+			ExecDropSingleTupleTableSlot(mistate->slots[i]);
+
+		MemoryContextDelete(mistate->context);
+
+		pfree(mistate->slots);
+		pfree(mistate);
+	}
+
+	pfree(state);
+}
+
 /*
  *	simple_heap_insert - insert a tuple
  *
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 3eea215b85..eb3da12d9c 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2554,6 +2554,11 @@ static const TableAmRoutine heapam_methods = {
 	.tuple_insert_speculative = heapam_tuple_insert_speculative,
 	.tuple_complete_speculative = heapam_tuple_complete_speculative,
 	.multi_insert = heap_multi_insert,
+	.tuple_insert_begin = heap_insert_begin,
+	.tuple_insert_v2 = heap_insert_v2,
+	.multi_insert_v2 = heap_multi_insert_v2,
+	.multi_insert_flush = heap_multi_insert_flush,
+	.tuple_insert_end = heap_insert_end,
 	.tuple_delete = heapam_tuple_delete,
 	.tuple_update = heapam_tuple_update,
 	.tuple_lock = heapam_tuple_lock,
diff --git a/src/backend/access/table/tableamapi.c b/src/backend/access/table/tableamapi.c
index 58de0743ba..6bec0659e4 100644
--- a/src/backend/access/table/tableamapi.c
+++ b/src/backend/access/table/tableamapi.c
@@ -78,6 +78,13 @@ GetTableAmRoutine(Oid amhandler)
 	Assert(routine->tuple_complete_speculative != NULL);
 
 	Assert(routine->multi_insert != NULL);
+
+	Assert(routine->tuple_insert_begin != NULL);
+	Assert(routine->tuple_insert_v2 != NULL);
+	Assert(routine->multi_insert_v2 != NULL);
+	Assert(routine->multi_insert_flush != NULL);
+	Assert(routine->tuple_insert_end != NULL);
+
 	Assert(routine->tuple_delete != NULL);
 	Assert(routine->tuple_update != NULL);
 	Assert(routine->tuple_lock != NULL);
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index 4c90ac5236..fa6f494ab6 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -159,7 +159,11 @@ tts_virtual_materialize(TupleTableSlot *slot)
 	if (TTS_SHOULDFREE(slot))
 		return;
 
-	/* compute size of memory required */
+	/*
+	 * Compute size of memory required. This size calculation code is also used
+	 * in GetTupleSize(), hence ensure to have the same changes or fixes here
+	 * and also there.
+	 */
 	for (int natt = 0; natt < desc->natts; natt++)
 	{
 		Form_pg_attribute att = TupleDescAttr(desc, natt);
@@ -1239,6 +1243,83 @@ ExecDropSingleTupleTableSlot(TupleTableSlot *slot)
 	pfree(slot);
 }
 
+/*
+ * GetTupleSize - Compute the tuple size given a table slot.
+ *
+ * For heap tuple, buffer tuple and minimal tuple slot types return the actual
+ * tuple size that exists. For virtual tuple, the size is calculated as the
+ * slot does not have the tuple size. If the computed size exceeds the given
+ * maxsize for the virtual tuple, this function exits, not investing time in
+ * further unnecessary calculation.
+ *
+ * Important Notes:
+ * 1) Size calculation code for virtual slots is being used from
+ * 	  tts_virtual_materialize(), hence ensure to have the same changes or fixes
+ * 	  here and also there.
+ * 2) Currently, GetTupleSize() handles the existing heap, buffer, minimal and
+ * 	  virtual slots. Ensure to add related code in case any new slot type is
+ *    introduced.
+ */
+inline Size
+GetTupleSize(TupleTableSlot *slot, Size maxsize)
+{
+	Size sz = 0;
+	HeapTuple tuple = NULL;
+
+	if (TTS_IS_HEAPTUPLE(slot))
+		tuple = ((HeapTupleTableSlot *) slot)->tuple;
+	else if(TTS_IS_BUFFERTUPLE(slot))
+		tuple = ((BufferHeapTupleTableSlot *) slot)->base.tuple;
+	else if(TTS_IS_MINIMALTUPLE(slot))
+		tuple = ((MinimalTupleTableSlot *) slot)->tuple;
+	else if(TTS_IS_VIRTUAL(slot))
+	{
+		/*
+		 * Size calculation code being used here is from
+		 * tts_virtual_materialize(), ensure to have the same changes or fixes
+		 * here and also there.
+		 */
+		TupleDesc	desc = slot->tts_tupleDescriptor;
+
+		for (int natt = 0; natt < desc->natts; natt++)
+		{
+			Form_pg_attribute att = TupleDescAttr(desc, natt);
+			Datum		val;
+
+			if (att->attbyval)
+				sz += att->attlen;
+
+			if (slot->tts_isnull[natt])
+				continue;
+
+			val = slot->tts_values[natt];
+
+			if (att->attlen == -1 &&
+				VARATT_IS_EXTERNAL_EXPANDED(DatumGetPointer(val)))
+			{
+				sz = att_align_nominal(sz, att->attalign);
+				sz += EOH_get_flat_size(DatumGetEOHP(val));
+			}
+			else
+			{
+				sz = att_align_nominal(sz, att->attalign);
+				sz = att_addlength_datum(sz, att->attlen, val);
+			}
+
+			/*
+			 * We are not interested in proceeding further if the computed size
+			 * crosses maxsize limit that we are looking for.
+			 */
+			if (maxsize != 0 && sz >= maxsize)
+				break;
+		}
+	}
+
+	if (tuple != NULL && !TTS_IS_VIRTUAL(slot))
+		sz = tuple->t_len;
+
+	return sz;
+}
 
 /* ----------------------------------------------------------------
  *				  tuple table slot accessor functions
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 54b2eb7378..c981b4758d 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -36,11 +36,26 @@
 #define HEAP_INSERT_NO_LOGICAL	TABLE_INSERT_NO_LOGICAL
 #define HEAP_INSERT_SPECULATIVE 0x0010
 
-typedef struct BulkInsertStateData *BulkInsertState;
 struct TupleTableSlot;
 
 #define MaxLockTupleMode	LockTupleExclusive
 
+/*
+ * No more than this many tuples per single multi insert batch
+ *
+ * Caution: Don't make this too big, as we could end up with this many tuples
+ * stored in multi insert buffer. Increasing this can cause quadratic growth in
+ * memory requirements during copies into partitioned tables with a large
+ * number of partitions.
+ */
+#define MAX_BUFFERED_TUPLES		1000
+
+/*
+ * Flush multi insert buffers if there are >= this many bytes, as counted by
+ * the size of the tuples buffered.
+ */
+#define MAX_BUFFERED_BYTES		65535
+
 /*
  * Descriptor for heap table scans.
  */
@@ -93,6 +108,29 @@ typedef enum
 	HEAPTUPLE_DELETE_IN_PROGRESS	/* deleting xact is still in progress */
 } HTSV_Result;
 
+/* Holds the multi insert state for heap access method. */
+typedef struct HeapMultiInsertState
+{
+	/* Switch to short-lived memory context before flushing. */
+	MemoryContext       context;
+	/* Array of buffered slots. */
+	TupleTableSlot      **slots;
+	/* Maximum number of slots that can be buffered. */
+	int32               max_slots;
+	/* Number of slots that are currently buffered. */
+	int32               cur_slots;
+	/*
+	 * Maximum size (in bytes) of all the tuples that a single batch of
+	 * buffered slots can hold.
+	 */
+	int64               max_size;
+	/*
+	 * Total tuple size (in bytes) of the slots that are currently buffered.
+	 * Flush the buffered slots when cur_size >= max_size.
+	 */
+	int64               cur_size;
+} HeapMultiInsertState;
+
 /* ----------------
  *		function prototypes for heap access method
  *
@@ -130,15 +168,20 @@ extern bool heap_hot_search_buffer(ItemPointer tid, Relation relation,
 
 extern void heap_get_latest_tid(TableScanDesc scan, ItemPointer tid);
 
-extern BulkInsertState GetBulkInsertState(void);
-extern void FreeBulkInsertState(BulkInsertState);
-extern void ReleaseBulkInsertStatePin(BulkInsertState bistate);
-
 extern void heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 						int options, BulkInsertState bistate);
 extern void heap_multi_insert(Relation relation, struct TupleTableSlot **slots,
 							  int ntuples, CommandId cid, int options,
 							  BulkInsertState bistate);
+
+extern TableInsertState* heap_insert_begin(Relation rel, CommandId cid,
+										   int options, bool is_multi);
+extern void heap_insert_v2(TableInsertState *state, TupleTableSlot *slot);
+extern void heap_multi_insert_v2(TableInsertState *state,
+								 TupleTableSlot *slot);
+extern void heap_multi_insert_flush(TableInsertState *state);
+extern void heap_insert_end(TableInsertState *state);
+
 extern TM_Result heap_delete(Relation relation, ItemPointer tid,
 							 CommandId cid, Snapshot crosscheck, bool wait,
 							 struct TM_FailureData *tmfd, bool changingPart);
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 387eb34a61..f3205a520d 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -128,6 +128,38 @@ typedef struct TM_FailureData
 	bool		traversed;
 } TM_FailureData;
 
+/* Holds the table insert state. */
+typedef struct TableInsertState
+{
+	Relation	rel;
+	/* Bulk insert state if requested, otherwise NULL. */
+	struct BulkInsertStateData     *bistate;
+	CommandId	cid;
+	int		options;
+	/* Multi insert state if requested, otherwise NULL. */
+	void	*mistate;
+	/*
+	 * Valid only for multi inserts that is when mistate is non NULL.
+	 * Whether to clear the buffered slots after each flush? If the relation
+	 * has indexes or after row triggers, the buffered slots are required
+	 * outside multi insert AM, in which case, clean them in the caller using
+	 * ExecClearTuple() outside the multi insert AM. If true, which is default,
+	 * multi insert AM will clear the slots.
+	 *
+	 * It is good to set this flag by looking at whether the table is having
+	 * any indexes or after row triggers at the beginning of multi insert
+	 * operation, precisely after calling begin insert AM.
+	 */
+	bool	clear_mi_slots;
+	/*
+	 * Valid only for multi inserts that is when mistate is non NULL.
+	 * Initially false, set to true by multi insert AM whenever it flushes the
+	 * buffered slots. Caller can use this flag to insert into indexes or
+	 * execute after row triggers and so on if any.
+	 */
+	bool	flushed;
+}TableInsertState;
+
 /* "options" flag bits for table_tuple_insert */
 /* TABLE_INSERT_SKIP_WAL was 0x0001; RelationNeedsWAL() now governs */
 #define TABLE_INSERT_SKIP_FSM		0x0002
@@ -376,6 +408,17 @@ typedef struct TableAmRoutine
 	void		(*multi_insert) (Relation rel, TupleTableSlot **slots, int nslots,
 								 CommandId cid, int options, struct BulkInsertStateData *bistate);
 
+	TableInsertState* (*tuple_insert_begin) (Relation rel, CommandId cid,
+											 int options, bool is_multi);
+
+	void (*tuple_insert_v2) (TableInsertState *state, TupleTableSlot *slot);
+
+	void (*multi_insert_v2) (TableInsertState *state, TupleTableSlot *slot);
+
+	void (*multi_insert_flush) (TableInsertState *state);
+
+	void (*tuple_insert_end) (TableInsertState *state);
+
 	/* see table_tuple_delete() for reference about parameters */
 	TM_Result	(*tuple_delete) (Relation rel,
 								 ItemPointer tid,
@@ -723,6 +766,8 @@ typedef struct TableAmRoutine
 } TableAmRoutine;
 
 
+typedef struct BulkInsertStateData *BulkInsertState;
+
 /* ----------------------------------------------------------------------------
  * Slot functions.
  * ----------------------------------------------------------------------------
@@ -741,6 +786,10 @@ extern const TupleTableSlotOps *table_slot_callbacks(Relation rel);
  */
 extern TupleTableSlot *table_slot_create(Relation rel, List **reglist);
 
+/* Bulk insert state functions. */
+extern BulkInsertState GetBulkInsertState(void);
+extern void FreeBulkInsertState(BulkInsertState);
+extern void ReleaseBulkInsertStatePin(BulkInsertState bistate);
 
 /* ----------------------------------------------------------------------------
  * Table scan functions.
@@ -1237,6 +1286,50 @@ table_multi_insert(Relation rel, TupleTableSlot **slots, int nslots,
 								  cid, options, bistate);
 }
 
+static inline TableInsertState*
+table_insert_begin(Relation rel, CommandId cid, int options,
+				   bool alloc_bistate, bool is_multi)
+{
+	TableInsertState *state = rel->rd_tableam->tuple_insert_begin(rel, cid,
+										options, is_multi);
+
+	/* Allocate bulk insert state here, since it's AM independent. */
+	if (alloc_bistate)
+		state->bistate = GetBulkInsertState();
+	else
+		state->bistate = NULL;
+
+	return state;
+}
+
+static inline void
+table_insert_v2(TableInsertState *state, TupleTableSlot *slot)
+{
+	state->rel->rd_tableam->tuple_insert_v2(state, slot);
+}
+
+static inline void
+table_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
+{
+	state->rel->rd_tableam->multi_insert_v2(state, slot);
+}
+
+static inline void
+table_multi_insert_flush(TableInsertState *state)
+{
+	state->rel->rd_tableam->multi_insert_flush(state);
+}
+
+static inline void
+table_insert_end(TableInsertState *state)
+{
+	/* Deallocate bulk insert state here, since it's AM independent. */
+	if (state->bistate)
+		FreeBulkInsertState(state->bistate);
+
+	state->rel->rd_tableam->tuple_insert_end(state);
+}
+
 /*
  * Delete a tuple.
  *
diff --git a/src/include/executor/tuptable.h b/src/include/executor/tuptable.h
index f7df70b5ab..d7c284d8e3 100644
--- a/src/include/executor/tuptable.h
+++ b/src/include/executor/tuptable.h
@@ -330,6 +330,7 @@ extern void slot_getmissingattrs(TupleTableSlot *slot, int startAttNum,
 								 int lastAttNum);
 extern void slot_getsomeattrs_int(TupleTableSlot *slot, int attnum);
 
+extern Size GetTupleSize(TupleTableSlot *slot, Size maxsize);
 
 #ifndef FRONTEND
 
-- 
2.25.1



^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2020-12-25 02:40  Justin Pryzby <[email protected]>
  parent: Bharath Rupireddy <[email protected]>
  0 siblings, 1 reply; 43+ messages in thread

From: Justin Pryzby @ 2020-12-25 02:40 UTC (permalink / raw)
  To: Bharath Rupireddy <[email protected]>; +Cc: pgsql-hackers; Andres Freund <[email protected]>; Luc Vlaming <[email protected]>; Paul Guo <[email protected]>; Jeff Davis <[email protected]>

On Thu, Dec 24, 2020 at 05:48:42AM +0530, Bharath Rupireddy wrote:
> I'm not posting the updated 0002 to 0004 patches, I plan to do so
> after a couple of reviews happen on the design of the APIs in 0001.
> 
> Thoughts?

Are you familiar with this work ?

https://commitfest.postgresql.org/31/2717/
Reloptions for table access methods

It seems like that can be relevant for your patch, and I think some of what
your patch needs might be provided by AM opts.  

It's difficult to generalize AMs when we have only one, but your use-case might
be a concrete example which would help to answer some questions on the other
thread.

@Jeff: https://commitfest.postgresql.org/31/2871/

-- 
Justin





^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2020-12-28 12:48  Bharath Rupireddy <[email protected]>
  parent: Justin Pryzby <[email protected]>
  0 siblings, 1 reply; 43+ messages in thread

From: Bharath Rupireddy @ 2020-12-28 12:48 UTC (permalink / raw)
  To: Justin Pryzby <[email protected]>; +Cc: pgsql-hackers; Andres Freund <[email protected]>; Luc Vlaming <[email protected]>; Paul Guo <[email protected]>; Jeff Davis <[email protected]>

On Fri, Dec 25, 2020 at 8:10 AM Justin Pryzby <[email protected]> wrote:
> On Thu, Dec 24, 2020 at 05:48:42AM +0530, Bharath Rupireddy wrote:
> > I'm not posting the updated 0002 to 0004 patches, I plan to do so
> > after a couple of reviews happen on the design of the APIs in 0001.
> >
> > Thoughts?
>
> Are you familiar with this work ?
>
> https://commitfest.postgresql.org/31/2717/
> Reloptions for table access methods
>
> It seems like that can be relevant for your patch, and I think some of what
> your patch needs might be provided by AM opts.
>
> It's difficult to generalize AMs when we have only one, but your use-case might
> be a concrete example which would help to answer some questions on the other
> thread.
>
> @Jeff: https://commitfest.postgresql.org/31/2871/

Note that I have not gone through the entire thread at [1]. On some
initial study, that patch is proposing to allow different table AMs to
have custom rel options.

In the v2 patch that I sent upthread [2] for new table AMs has heap AM
multi insert code moved inside the new heap AM implementation and I
don't see any need of having rel options. In case, any other AMs want
to have the control for their multi insert API implementation via rel
options, I think the proposal at [1] can be useful.

IIUC, there's no dependency or anything as such for the new table AM
patch with the rel options thread [1]. If I'm right, can this new
table AM patch [2] be reviewed further?

Thoughts?

[1] - https://commitfest.postgresql.org/31/2717/
[2] - https://www.postgresql.org/message-id/CALj2ACWMnZZCu%3DG0PJkEeYYicKeuJ-X%3DSU19i6vQ1%2B%3DuXz8u0Q%40...

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2021-01-04 07:59  Luc Vlaming <[email protected]>
  parent: Bharath Rupireddy <[email protected]>
  0 siblings, 2 replies; 43+ messages in thread

From: Luc Vlaming @ 2021-01-04 07:59 UTC (permalink / raw)
  To: Bharath Rupireddy <[email protected]>; Justin Pryzby <[email protected]>; +Cc: pgsql-hackers; Andres Freund <[email protected]>; Paul Guo <[email protected]>; Jeff Davis <[email protected]>

On 28-12-2020 13:48, Bharath Rupireddy wrote:
> On Fri, Dec 25, 2020 at 8:10 AM Justin Pryzby <[email protected]> wrote:
>> On Thu, Dec 24, 2020 at 05:48:42AM +0530, Bharath Rupireddy wrote:
>>> I'm not posting the updated 0002 to 0004 patches, I plan to do so
>>> after a couple of reviews happen on the design of the APIs in 0001.
>>>
>>> Thoughts?
>>
>> Are you familiar with this work ?
>>
>> https://commitfest.postgresql.org/31/2717/
>> Reloptions for table access methods
>>
>> It seems like that can be relevant for your patch, and I think some of what
>> your patch needs might be provided by AM opts.
>>
>> It's difficult to generalize AMs when we have only one, but your use-case might
>> be a concrete example which would help to answer some questions on the other
>> thread.
>>
>> @Jeff: https://commitfest.postgresql.org/31/2871/
> 
> Note that I have not gone through the entire thread at [1]. On some
> initial study, that patch is proposing to allow different table AMs to
> have custom rel options.
> 
> In the v2 patch that I sent upthread [2] for new table AMs has heap AM
> multi insert code moved inside the new heap AM implementation and I
> don't see any need of having rel options. In case, any other AMs want
> to have the control for their multi insert API implementation via rel
> options, I think the proposal at [1] can be useful.
> 
> 
> Thoughts?
> 
> [1] - https://commitfest.postgresql.org/31/2717/
> [2] - https://www.postgresql.org/message-id/CALj2ACWMnZZCu%3DG0PJkEeYYicKeuJ-X%3DSU19i6vQ1%2B%3DuXz8u0Q%40...
> 
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
> 
Hi,

 > IIUC, there's no dependency or anything as such for the new table AM
 > patch with the rel options thread [1]. If I'm right, can this new
 > table AM patch [2] be reviewed further?

To me this seems good enough. Reason is that I anticipate that there 
would not necessarily be per-table options for now but rather global 
options, if any. Moreover, if we want to make these kind of tradeoffs 
user-controllable I would argue this should be done in a different 
patch-set either way. Reason is that there are parameters in heap 
already that are computed / hardcoded as well (see e.g. 
RelationAddExtraBlocks).

===

As to the patches themselves:

I think the API is a huge step forward! I assume that we want to have a 
single-insert API like heap_insert_v2 so that we can encode the 
knowledge that there will just be a single insert coming and likely a 
commit afterwards?

Reason I'm asking is that I quite liked the heap_insert_begin parameter 
is_multi, which could even be turned into a "expected_rowcount" of the 
amount of rows expected to be commited in the transaction (e.g. single, 
several, thousands/stream).
If we were to make the API based on expected rowcounts, the whole 
heap_insert_v2, heap_insert and heap_multi_insert could be turned into a 
single function heap_insert, as the knowledge about buffering of the 
slots is then already stored in the TableInsertState, creating an API like:

// expectedRows: -1 = streaming, otherwise expected rowcount.
TableInsertState* heap_insert_begin(Relation rel, CommandId cid, int 
options, int expectedRows);
heap_insert(TableInsertState *state, TupleTableSlot *slot);

Do you think that's a good idea?

Two smaller things I'm wondering:
- the clear_mi_slots; why is this not in the HeapMultiInsertState? the 
slots themselves are declared there? also, the boolean themselves is 
somewhat problematic I think because it would only work if you specified 
is_multi=true which would depend on the actual tableam to implement this 
then in a way that copy/ctas/etc can also use the slot properly, which I 
think would severely limit their freedom to store the slots more 
efficiently? Also, why do we want to do ExecClearTuple() anyway? Isn't 
it good enough that the next call to ExecCopySlot will effectively clear 
it out?
- flushed -> why is this a stored boolean? isn't this indirectly encoded 
by cur_slots/cur_size == 0?

For patches 02-04 I quickly skimmed through them as I assume we first 
want the API agreed upon. Generally they look nice and like a big step 
forward. What I'm just wondering about is the usage of the 
implementation details like mistate->slots[X]. It makes a lot of sense 
to do so but also makes for a difficult compromise, because now the 
tableam has to guarantee a copy of the slot, and hopefully even one in a 
somewhat efficient form.

Kind regards,
Luc

^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2021-01-05 10:06  Bharath Rupireddy <[email protected]>
  parent: Luc Vlaming <[email protected]>
  1 sibling, 1 reply; 43+ messages in thread

From: Bharath Rupireddy @ 2021-01-05 10:06 UTC (permalink / raw)
  To: Luc Vlaming <[email protected]>; +Cc: Justin Pryzby <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Paul Guo <[email protected]>; Jeff Davis <[email protected]>

On Mon, Jan 4, 2021 at 1:29 PM Luc Vlaming <[email protected]> wrote:
>  > table AM patch [2] be reviewed further?
> As to the patches themselves:
>
> I think the API is a huge step forward! I assume that we want to have a
> single-insert API like heap_insert_v2 so that we can encode the
> knowledge that there will just be a single insert coming and likely a
> commit afterwards?
>
> Reason I'm asking is that I quite liked the heap_insert_begin parameter
> is_multi, which could even be turned into a "expected_rowcount" of the
> amount of rows expected to be commited in the transaction (e.g. single,
> several, thousands/stream).
> If we were to make the API based on expected rowcounts, the whole
> heap_insert_v2, heap_insert and heap_multi_insert could be turned into a
> single function heap_insert, as the knowledge about buffering of the
> slots is then already stored in the TableInsertState, creating an API
like:
>
> // expectedRows: -1 = streaming, otherwise expected rowcount.
> TableInsertState* heap_insert_begin(Relation rel, CommandId cid, int
> options, int expectedRows);
> heap_insert(TableInsertState *state, TupleTableSlot *slot);
>
> Do you think that's a good idea?

IIUC, your suggestion is to use expectedRows and move the multi insert
implementation heap_multi_insert_v2 to heap_insert_v2. If that's correct,
so heap_insert_v2 will look something like this:

heap_insert_v2()
{
    if (single_insert)
      //do single insertion work, the code in existing heap_insert_v2 comes
here
   else
      //do multi insertion work, the code in existing heap_multi_insert_v2
comes here
}

I don't see any problem in combining single and multi insert APIs into one.
Having said that, will the APIs be cleaner then? Isn't it going to be
confusing if a single heap_insert_v2 API does both the works? With the
existing separate APIs, for single insertion, the sequence of the API can
be like begin, insert_v2, end and for multi inserts it's like begin,
multi_insert_v2, flush, end. I prefer to have a separate multi insert API
so that it will make the code look readable.

Thoughts?

> Two smaller things I'm wondering:
> - the clear_mi_slots; why is this not in the HeapMultiInsertState? the
> slots themselves are declared there?

Firstly, we need to have the buffered slots sometimes(please have a look at
the comments in TableInsertState structure) outside the multi_insert API.
And we need to have cleared the previously flushed slots before we start
buffering in heap_multi_insert_v2(). I can remove the clear_mi_slots flag
altogether and do as follows: I will not set mistate->cur_slots to 0 in
heap_multi_insert_flush after the flush, I will only set state->flushed to
true. In heap_multi_insert_v2,

void
heap_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
{
    TupleTableSlot  *batchslot;
    HeapMultiInsertState *mistate = (HeapMultiInsertState *)state->mistate;
    Size sz;

    Assert(mistate && mistate->slots);

*   /* if the slots are flushed previously then clear them off before using
them again. */    if (state->flushed)    {        int i;        for (i = 0;
i < mistate->cur_slots; i++)            ExecClearTuple(mistate->slots[i]);
      mistate->cur_slots = 0;        state->flushed = false    }*

    if (mistate->slots[mistate->cur_slots] == NULL)
        mistate->slots[mistate->cur_slots] =
                                    table_slot_create(state->rel, NULL);

    batchslot = mistate->slots[mistate->cur_slots];

    ExecCopySlot(batchslot, slot);

Thoughts?

> Also, why do we want to do ExecClearTuple() anyway? Isn't
> it good enough that the next call to ExecCopySlot will effectively clear
> it out?

For virtual, heap, minimal tuple slots, yes ExecCopySlot slot clears the
slot before copying. But, for buffer heap slots, the
tts_buffer_heap_copyslot does not always clear the destination slot, see
below. If we fall into else condition, we might get some issues. And also
note that, once the slot is cleared in ExecClearTuple, it will not be
cleared again in ExecCopySlot because TTS_SHOULDFREE(slot) will be false.
That is why, let's have ExecClearTuple as is.

    /*
     * If the source slot is of a different kind, or is a buffer slot that
has
     * been materialized / is virtual, make a new copy of the tuple.
Otherwise
     * make a new reference to the in-buffer tuple.
     */
    if (dstslot->tts_ops != srcslot->tts_ops ||
        TTS_SHOULDFREE(srcslot) ||
        !bsrcslot->base.tuple)
    {
        MemoryContext oldContext;

        ExecClearTuple(dstslot);
    }
    else
    {
        Assert(BufferIsValid(bsrcslot->buffer));

        tts_buffer_heap_store_tuple(dstslot, bsrcslot->base.tuple,
                                    bsrcslot->buffer, false);

> - flushed -> why is this a stored boolean? isn't this indirectly encoded
> by cur_slots/cur_size == 0?

Note that cur_slots is in HeapMultiInsertState and outside of the new APIs
i.e. in TableInsertState, mistate is a void pointer, and we can't really
access the cur_slots. I mean, we can access but we need to be dereferencing
using the tableam kind. Instead of  doing all of that, to keep the API
cleaner, I chose to have a boolean in the TableInsertState which we can see
and use outside of the new APIs. Hope that's fine.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2021-01-05 21:28  Jeff Davis <[email protected]>
  parent: Luc Vlaming <[email protected]>
  1 sibling, 1 reply; 43+ messages in thread

From: Jeff Davis @ 2021-01-05 21:28 UTC (permalink / raw)
  To: Luc Vlaming <[email protected]>; Bharath Rupireddy <[email protected]>; Justin Pryzby <[email protected]>; +Cc: pgsql-hackers; Andres Freund <[email protected]>; Paul Guo <[email protected]>

On Mon, 2021-01-04 at 08:59 +0100, Luc Vlaming wrote:
> Reason I'm asking is that I quite liked the heap_insert_begin
> parameter 
> is_multi, which could even be turned into a "expected_rowcount" of
> the 
> amount of rows expected to be commited in the transaction (e.g.
> single, 
> several, thousands/stream).

Do you mean "written by the statement" instead of "committed in the
transaction"? It doesn't look like the TableInsertState state will
survive across statement boundaries.

Though that is an important question to consider. If the premise is
that a given custom AM may be much more efficient at bulk inserts than
retail inserts (which is reasonable), then it makes sense to handle the
case of a transaction with many single-tuple inserts. But keeping
insert state across statement boundaries also raises a few potential
problems.

Regards,
	Jeff Davis

^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2021-01-06 07:00  Luc Vlaming <[email protected]>
  parent: Jeff Davis <[email protected]>
  0 siblings, 0 replies; 43+ messages in thread

From: Luc Vlaming @ 2021-01-06 07:00 UTC (permalink / raw)
  To: Jeff Davis <[email protected]>; Bharath Rupireddy <[email protected]>; Justin Pryzby <[email protected]>; +Cc: pgsql-hackers; Andres Freund <[email protected]>; Paul Guo <[email protected]>

On 05-01-2021 22:28, Jeff Davis wrote:
> On Mon, 2021-01-04 at 08:59 +0100, Luc Vlaming wrote:
>> Reason I'm asking is that I quite liked the heap_insert_begin
>> parameter
>> is_multi, which could even be turned into a "expected_rowcount" of
>> the
>> amount of rows expected to be commited in the transaction (e.g.
>> single,
>> several, thousands/stream).
> 
> Do you mean "written by the statement" instead of "committed in the
> transaction"? It doesn't look like the TableInsertState state will
> survive across statement boundaries.
> 
> Though that is an important question to consider. If the premise is
> that a given custom AM may be much more efficient at bulk inserts than
> retail inserts (which is reasonable), then it makes sense to handle the
> case of a transaction with many single-tuple inserts. But keeping
> insert state across statement boundaries also raises a few potential
> problems.
> 
> Regards,
> 	Jeff Davis
> 
> 

I did actually mean until the end of the transaction. I know this is 
currently not possible with the current design but I think it would be 
cool to start going that way (even if slightly). Creating some more 
freedom on how a tableam optimizes inserts, when one syncs to disk, etc 
would be good imo. It would allow one to create e.g. a tableam that 
would not have as a high overhead when doing single statement inserts.

Kind regards,
Luc





^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2021-01-06 07:26  Luc Vlaming <[email protected]>
  parent: Bharath Rupireddy <[email protected]>
  0 siblings, 1 reply; 43+ messages in thread

From: Luc Vlaming @ 2021-01-06 07:26 UTC (permalink / raw)
  To: Bharath Rupireddy <[email protected]>; +Cc: Justin Pryzby <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Paul Guo <[email protected]>; Jeff Davis <[email protected]>

On 05-01-2021 11:06, Bharath Rupireddy wrote:
> On Mon, Jan 4, 2021 at 1:29 PM Luc Vlaming <[email protected] 
> <mailto:[email protected]>> wrote:
>  >  > table AM patch [2] be reviewed further?
>  > As to the patches themselves:
>  >
>  > I think the API is a huge step forward! I assume that we want to have a
>  > single-insert API like heap_insert_v2 so that we can encode the
>  > knowledge that there will just be a single insert coming and likely a
>  > commit afterwards?
>  >
>  > Reason I'm asking is that I quite liked the heap_insert_begin parameter
>  > is_multi, which could even be turned into a "expected_rowcount" of the
>  > amount of rows expected to be commited in the transaction (e.g. single,
>  > several, thousands/stream).
>  > If we were to make the API based on expected rowcounts, the whole
>  > heap_insert_v2, heap_insert and heap_multi_insert could be turned into a
>  > single function heap_insert, as the knowledge about buffering of the
>  > slots is then already stored in the TableInsertState, creating an API 
> like:
>  >
>  > // expectedRows: -1 = streaming, otherwise expected rowcount.
>  > TableInsertState* heap_insert_begin(Relation rel, CommandId cid, int
>  > options, int expectedRows);
>  > heap_insert(TableInsertState *state, TupleTableSlot *slot);
>  >
>  > Do you think that's a good idea?
> 
> IIUC, your suggestion is to use expectedRows and move the multi insert 
> implementation heap_multi_insert_v2 to heap_insert_v2. If that's 
> correct, so heap_insert_v2 will look something like this:
> 
> heap_insert_v2()
> {
>      if (single_insert)
>        //do single insertion work, the code in existing heap_insert_v2 
> comes here
>     else
>        //do multi insertion work, the code in existing 
> heap_multi_insert_v2 comes here
> }
> 
> I don't see any problem in combining single and multi insert APIs into 
> one. Having said that, will the APIs be cleaner then? Isn't it going to 
> be confusing if a single heap_insert_v2 API does both the works? With 
> the existing separate APIs, for single insertion, the sequence of the 
> API can be like begin, insert_v2, end and for multi inserts it's like 
> begin, multi_insert_v2, flush, end. I prefer to have a separate multi 
> insert API so that it will make the code look readable.
> 
> Thoughts?

The main reason for me for wanting a single API is that I would like the 
decision of using single or multi inserts to move to inside the tableam.
For e.g. a heap insert we might want to put the threshold at e.g. 100 
rows so that the overhead of buffering the tuples is actually 
compensated. For other tableam this logic might also be quite different, 
and I think therefore that it shouldn't be e.g. COPY or CTAS deciding 
whether or not multi inserts should be used. Because otherwise the thing 
we'll get is that there will be tableams that will ignore this flag and 
do their own thing anyway. I'd rather have an API that gives all 
necessary information to the tableam and then make the tableam do "the 
right thing".

Another reason I'm suggesting this API is that I would expect that the 
begin is called in a different place in the code for the (multiple) 
inserts than the actual insert statement.
To me conceptually the begin and end are like e.g. the executor begin 
and end: you prepare the inserts with the knowledge you have at that 
point. I assumed (wrongly?) that during the start of the statement one 
knows best how many rows are coming; and then the actual insertion of 
the row doesn't have to deal anymore with multi/single inserts, choosing 
when to buffer or not, because that information has already been given 
during the initial phase. One of the reasons this is appealing to me is 
that e.g. in [1] there was discussion on when to switch to a multi 
insert state, and imo this should be up to the tableam.

> 
>  > Two smaller things I'm wondering:
>  > - the clear_mi_slots; why is this not in the HeapMultiInsertState? the
>  > slots themselves are declared there?
> 
> Firstly, we need to have the buffered slots sometimes(please have a look 
> at the comments in TableInsertState structure) outside the multi_insert 
> API. And we need to have cleared the previously flushed slots before we 
> start buffering in heap_multi_insert_v2(). I can remove the 
> clear_mi_slots flag altogether and do as follows: I will not set 
> mistate->cur_slots to 0 in heap_multi_insert_flush after the flush, I 
> will only set state->flushed to true. In heap_multi_insert_v2,
> 
> void
> heap_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
> {
>      TupleTableSlot  *batchslot;
>      HeapMultiInsertState *mistate = (HeapMultiInsertState *)state->mistate;
>      Size sz;
> 
>      Assert(mistate && mistate->slots);
> 
> *  /* if the slots are flushed previously then clear them off before 
> using them again. */
>      if (state->flushed)
>      {
>          int i;
> 
>          for (i = 0; i < mistate->cur_slots; i++)
>              ExecClearTuple(mistate->slots[i]);
> 
>          mistate->cur_slots = 0;
>          state->flushed = false
>      }*
> 
>      if (mistate->slots[mistate->cur_slots] == NULL)
>          mistate->slots[mistate->cur_slots] =
>                                      table_slot_create(state->rel, NULL);
> 
>      batchslot = mistate->slots[mistate->cur_slots];
> 
>      ExecCopySlot(batchslot, slot);
> 
> Thoughts?

 From what I can see you can just keep the v2-0001 patch and:
- remove the flushed variable alltogether. mistate->cur_slots == 0 
encodes this already and the variable is never actually checked on.
- call ExecClearTuple just before ExecCopySlot()

Which would make the code something like:

void
heap_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
{
	TupleTableSlot  *batchslot;
	HeapMultiInsertState *mistate = (HeapMultiInsertState *)state->mistate;
	Size sz;

	Assert(mistate && mistate->slots);

	if (mistate->slots[mistate->cur_slots] == NULL)
		mistate->slots[mistate->cur_slots] =
									table_slot_create(state->rel, NULL);

	batchslot = mistate->slots[mistate->cur_slots];

	ExecClearTuple(batchslot);
	ExecCopySlot(batchslot, slot);

	/*
	 * Calculate the tuple size after the original slot is copied, because the
	 * copied slot type and the tuple size may change.
	 */
	sz = GetTupleSize(batchslot, mistate->max_size);

	Assert(sz > 0);

	mistate->cur_slots++;
	mistate->cur_size += sz;

	if (mistate->cur_slots >= mistate->max_slots ||
		mistate->cur_size >= mistate->max_size)
		heap_multi_insert_flush(state);
}

void
heap_multi_insert_flush(TableInsertState *state)
{
	HeapMultiInsertState *mistate = (HeapMultiInsertState *)state->mistate;
	MemoryContext oldcontext;

	Assert(mistate && mistate->slots && mistate->cur_slots >= 0 &&
		   mistate->context);

	if (mistate->cur_slots == 0)
		return;

	oldcontext = MemoryContextSwitchTo(mistate->context);

	heap_multi_insert(state->rel, mistate->slots, mistate->cur_slots,
					  state->cid, state->options, state->bistate);

	MemoryContextReset(mistate->context);
	MemoryContextSwitchTo(oldcontext);

	/*
	 * Do not clear the slots always. Sometimes callers may want the slots for
	 * index insertions or after row trigger executions in which case they have
	 * to clear the tuples before using for the next insert batch.
	 */
	if (state->clear_mi_slots)
	{
		int i;

		for (i = 0; i < mistate->cur_slots; i++)
			ExecClearTuple(mistate->slots[i]);
	}

	mistate->cur_slots = 0;
	mistate->cur_size = 0;
}


> 
>  > Also, why do we want to do ExecClearTuple() anyway? Isn't
>  > it good enough that the next call to ExecCopySlot will effectively clear
>  > it out?
> 
> For virtual, heap, minimal tuple slots, yes ExecCopySlot slot clears the 
> slot before copying. But, for buffer heap slots, the 
> tts_buffer_heap_copyslot does not always clear the destination slot, see 
> below. If we fall into else condition, we might get some issues. And 
> also note that, once the slot is cleared in ExecClearTuple, it will not 
> be cleared again in ExecCopySlot because TTS_SHOULDFREE(slot) will be 
> false. That is why, let's have ExecClearTuple as is.
> 
I had no idea the buffer heap slot doesn't unconditionally clear out the 
slot :( So yes lets call it unconditionally ourselves. See also 
suggestion above.

>      /*
>       * If the source slot is of a different kind, or is a buffer slot 
> that has
>       * been materialized / is virtual, make a new copy of the tuple. 
> Otherwise
>       * make a new reference to the in-buffer tuple.
>       */
>      if (dstslot->tts_ops != srcslot->tts_ops ||
>          TTS_SHOULDFREE(srcslot) ||
>          !bsrcslot->base.tuple)
>      {
>          MemoryContext oldContext;
> 
>          ExecClearTuple(dstslot);
>      }
>      else
>      {
>          Assert(BufferIsValid(bsrcslot->buffer));
> 
>          tts_buffer_heap_store_tuple(dstslot, bsrcslot->base.tuple,
>                                      bsrcslot->buffer, false);
> 
>  > - flushed -> why is this a stored boolean? isn't this indirectly encoded
>  > by cur_slots/cur_size == 0?
> 
> Note that cur_slots is in HeapMultiInsertState and outside of the new 
> APIs i.e. in TableInsertState, mistate is a void pointer, and we can't 
> really access the cur_slots. I mean, we can access but we need to be 
> dereferencing using the tableam kind. Instead of  doing all of that, to 
> keep the API cleaner, I chose to have a boolean in the TableInsertState 
> which we can see and use outside of the new APIs. Hope that's fine.
> 
So you mean the flushed variable is actually there to tell the user of 
the API that they are supposed to call flush before end? Why can't the 
end call flush itself then? I guess I completely misunderstood the 
purpose of table_multi_insert_flush being public. I had assumed it is 
there to from the usage site indicate that now would be a good time to 
flush, e.g. because of a statement ending or something. I had not 
understood this is a requirement that its always required to do 
table_multi_insert_flush + table_insert_end.
IMHO I would hide this from the callee, given that you would only really 
call flush yourself when you immediately after would call end, or are 
there other cases where one would be required to explicitly call flush?

> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com <http://www.enterprisedb.com;

Kind regards,
Luc





^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2021-01-06 13:06  Bharath Rupireddy <[email protected]>
  parent: Luc Vlaming <[email protected]>
  0 siblings, 3 replies; 43+ messages in thread

From: Bharath Rupireddy @ 2021-01-06 13:06 UTC (permalink / raw)
  To: Luc Vlaming <[email protected]>; +Cc: Justin Pryzby <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Paul Guo <[email protected]>; Jeff Davis <[email protected]>

On Wed, Jan 6, 2021 at 12:56 PM Luc Vlaming <[email protected]> wrote:
> The main reason for me for wanting a single API is that I would like the
> decision of using single or multi inserts to move to inside the tableam.
> For e.g. a heap insert we might want to put the threshold at e.g. 100
> rows so that the overhead of buffering the tuples is actually
> compensated. For other tableam this logic might also be quite different,
> and I think therefore that it shouldn't be e.g. COPY or CTAS deciding
> whether or not multi inserts should be used. Because otherwise the thing
> we'll get is that there will be tableams that will ignore this flag and
> do their own thing anyway. I'd rather have an API that gives all
> necessary information to the tableam and then make the tableam do "the
> right thing".
>
> Another reason I'm suggesting this API is that I would expect that the
> begin is called in a different place in the code for the (multiple)
> inserts than the actual insert statement.
> To me conceptually the begin and end are like e.g. the executor begin
> and end: you prepare the inserts with the knowledge you have at that
> point. I assumed (wrongly?) that during the start of the statement one
> knows best how many rows are coming; and then the actual insertion of
> the row doesn't have to deal anymore with multi/single inserts, choosing
> when to buffer or not, because that information has already been given
> during the initial phase. One of the reasons this is appealing to me is
> that e.g. in [1] there was discussion on when to switch to a multi
> insert state, and imo this should be up to the tableam.

Agree that whether to go with the multi or single inserts should be
completely left to tableam implementation, we, as callers of those API
just need to inform whether we expect single or multiple rows, and it
should be left to tableam implementation whether to actually go with
buffering or single inserts. ISTM that it's an elegant way of making
the API generic and abstracting everything from the callers. What I
wonder is how can we know in advance the expected row count that we
need to pass in to heap_insert_begin()? IIUC, we can not estimate the
upcoming rows in COPY, Insert Into Select, or Refresh Mat View or some
other insert queries?  Of course, we can look at the planner's
estimated row count for the selects in COPY, Insert Into Select or
Refresh Mat View after the planning, but to me that's not something we
can depend on and pass in the row count to the insert APIs.

When we don't know the expected row count, why can't we(as callers of
the APIs) tell the APIs something like, "I'm intending to perform
multi inserts, so if possible and if you have a mechanism to buffer
the slots, do it, otherwise insert the tuples one by one, or else do
whatever you want to do with the tuples I give it you". So, in case of
COPY we can ask the API for multi inserts and call heap_insert_begin()
and heap_insert_v2().

Given the above explanation, I still feel bool is_multi would suffice.

Thoughts?

On dynamically, switching from single to multi inserts, this can be
done by heap_insert_v2 itself. The way I think it's possible is that,
say we have some threshold row count 1000(can be a macro)  after
inserting those many tuples, heap_insert_v2 can switch to buffering
mode.

Thoughts?

> Which would make the code something like:
>
> void
> heap_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
> {
>         TupleTableSlot  *batchslot;
>         HeapMultiInsertState *mistate = (HeapMultiInsertState *)state->mistate;
>         Size sz;
>
>         Assert(mistate && mistate->slots);
>
>         if (mistate->slots[mistate->cur_slots] == NULL)
>                 mistate->slots[mistate->cur_slots] =
>                                                                         table_slot_create(state->rel, NULL);
>
>         batchslot = mistate->slots[mistate->cur_slots];
>
>         ExecClearTuple(batchslot);
>         ExecCopySlot(batchslot, slot);
>
>         /*
>          * Calculate the tuple size after the original slot is copied, because the
>          * copied slot type and the tuple size may change.
>          */
>         sz = GetTupleSize(batchslot, mistate->max_size);
>
>         Assert(sz > 0);
>
>         mistate->cur_slots++;
>         mistate->cur_size += sz;
>
>         if (mistate->cur_slots >= mistate->max_slots ||
>                 mistate->cur_size >= mistate->max_size)
>                 heap_multi_insert_flush(state);
> }

I think clearing tuples before copying the slot as you suggested may
work without the need of clear_slots flag.

>
> >  > Also, why do we want to do ExecClearTuple() anyway? Isn't
> >  > it good enough that the next call to ExecCopySlot will effectively clear
> >  > it out?
> >
> > For virtual, heap, minimal tuple slots, yes ExecCopySlot slot clears the
> > slot before copying. But, for buffer heap slots, the
> > tts_buffer_heap_copyslot does not always clear the destination slot, see
> > below. If we fall into else condition, we might get some issues. And
> > also note that, once the slot is cleared in ExecClearTuple, it will not
> > be cleared again in ExecCopySlot because TTS_SHOULDFREE(slot) will be
> > false. That is why, let's have ExecClearTuple as is.
> >
> I had no idea the buffer heap slot doesn't unconditionally clear out the
> slot :( So yes lets call it unconditionally ourselves. See also
> suggestion above.

Yeah, we will clear the tuple slot before copy to be on the safer side.

> >      /*
> >       * If the source slot is of a different kind, or is a buffer slot
> > that has
> >       * been materialized / is virtual, make a new copy of the tuple.
> > Otherwise
> >       * make a new reference to the in-buffer tuple.
> >       */
> >      if (dstslot->tts_ops != srcslot->tts_ops ||
> >          TTS_SHOULDFREE(srcslot) ||
> >          !bsrcslot->base.tuple)
> >      {
> >          MemoryContext oldContext;
> >
> >          ExecClearTuple(dstslot);
> >      }
> >      else
> >      {
> >          Assert(BufferIsValid(bsrcslot->buffer));
> >
> >          tts_buffer_heap_store_tuple(dstslot, bsrcslot->base.tuple,
> >                                      bsrcslot->buffer, false);
> >
> >  > - flushed -> why is this a stored boolean? isn't this indirectly encoded
> >  > by cur_slots/cur_size == 0?
> >
> > Note that cur_slots is in HeapMultiInsertState and outside of the new
> > APIs i.e. in TableInsertState, mistate is a void pointer, and we can't
> > really access the cur_slots. I mean, we can access but we need to be
> > dereferencing using the tableam kind. Instead of  doing all of that, to
> > keep the API cleaner, I chose to have a boolean in the TableInsertState
> > which we can see and use outside of the new APIs. Hope that's fine.
> >
> So you mean the flushed variable is actually there to tell the user of
> the API that they are supposed to call flush before end? Why can't the
> end call flush itself then? I guess I completely misunderstood the
> purpose of table_multi_insert_flush being public. I had assumed it is
> there to from the usage site indicate that now would be a good time to
> flush, e.g. because of a statement ending or something. I had not
> understood this is a requirement that its always required to do
> table_multi_insert_flush + table_insert_end.
> IMHO I would hide this from the callee, given that you would only really
> call flush yourself when you immediately after would call end, or are
> there other cases where one would be required to explicitly call flush?

We need to know outside the multi_insert API whether the buffered
slots in case of multi inserts are flushed. Reason is that if we have
indexes or after row triggers, currently we call ExecInsertIndexTuples
or ExecARInsertTriggers on the buffered slots outside the API in a
loop after the flush.

If we agree on removing heap_multi_insert_v2 API and embed that logic
inside heap_insert_v2, then we can do this - pass the required
information and the functions ExecInsertIndexTuples and
ExecARInsertTriggers as callbacks so that, whether or not
heap_insert_v2 choses single or multi inserts, it can callback these
functions with the required information passed after the flush. We can
add the callback and required information into TableInsertState. But,
I'm not quite sure, we would make ExecInsertIndexTuples and
ExecARInsertTriggers. And in

If we don't want to go with callback way, then at least we need to
know whether or not heap_insert_v2 has chosen multi inserts, if yes,
the buffered slots array, and the number of current buffered slots,
whether they are flushed or not in the TableInsertState. Then,
eventually, we might need all the HeapMultiInsertState info in the
TableInsertState.

Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2021-01-12 08:03  Luc Vlaming <[email protected]>
  parent: Bharath Rupireddy <[email protected]>
  2 siblings, 0 replies; 43+ messages in thread

From: Luc Vlaming @ 2021-01-12 08:03 UTC (permalink / raw)
  To: Bharath Rupireddy <[email protected]>; +Cc: Justin Pryzby <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Paul Guo <[email protected]>; Jeff Davis <[email protected]>

On 06-01-2021 14:06, Bharath Rupireddy wrote:
> On Wed, Jan 6, 2021 at 12:56 PM Luc Vlaming <[email protected]> wrote:
>> The main reason for me for wanting a single API is that I would like the
>> decision of using single or multi inserts to move to inside the tableam.
>> For e.g. a heap insert we might want to put the threshold at e.g. 100
>> rows so that the overhead of buffering the tuples is actually
>> compensated. For other tableam this logic might also be quite different,
>> and I think therefore that it shouldn't be e.g. COPY or CTAS deciding
>> whether or not multi inserts should be used. Because otherwise the thing
>> we'll get is that there will be tableams that will ignore this flag and
>> do their own thing anyway. I'd rather have an API that gives all
>> necessary information to the tableam and then make the tableam do "the
>> right thing".
>>
>> Another reason I'm suggesting this API is that I would expect that the
>> begin is called in a different place in the code for the (multiple)
>> inserts than the actual insert statement.
>> To me conceptually the begin and end are like e.g. the executor begin
>> and end: you prepare the inserts with the knowledge you have at that
>> point. I assumed (wrongly?) that during the start of the statement one
>> knows best how many rows are coming; and then the actual insertion of
>> the row doesn't have to deal anymore with multi/single inserts, choosing
>> when to buffer or not, because that information has already been given
>> during the initial phase. One of the reasons this is appealing to me is
>> that e.g. in [1] there was discussion on when to switch to a multi
>> insert state, and imo this should be up to the tableam.
> 
> Agree that whether to go with the multi or single inserts should be
> completely left to tableam implementation, we, as callers of those API
> just need to inform whether we expect single or multiple rows, and it
> should be left to tableam implementation whether to actually go with
> buffering or single inserts. ISTM that it's an elegant way of making
> the API generic and abstracting everything from the callers. What I
> wonder is how can we know in advance the expected row count that we
> need to pass in to heap_insert_begin()? IIUC, we can not estimate the
> upcoming rows in COPY, Insert Into Select, or Refresh Mat View or some
> other insert queries?  Of course, we can look at the planner's
> estimated row count for the selects in COPY, Insert Into Select or
> Refresh Mat View after the planning, but to me that's not something we
> can depend on and pass in the row count to the insert APIs.
> 
> When we don't know the expected row count, why can't we(as callers of
> the APIs) tell the APIs something like, "I'm intending to perform
> multi inserts, so if possible and if you have a mechanism to buffer
> the slots, do it, otherwise insert the tuples one by one, or else do
> whatever you want to do with the tuples I give it you". So, in case of
> COPY we can ask the API for multi inserts and call heap_insert_begin()
> and heap_insert_v2().
> 

I thought that when it is available (because of planning) it would be 
nice to pass it in. If you don't know you could pass in a 1 for doing 
single inserts, and e.g. -1 or max-int for streaming. The reason I 
proposed it is so that tableam's have as much knowledge as posisble to 
do the right thing. is_multi does also work of course but is just 
somewhat less informative.

What to me seemed somewhat counterintuitive is that with the proposed 
API it is possible to say is_multi=true and then still call 
heap_insert_v2 to do a single insert.

> Given the above explanation, I still feel bool is_multi would suffice.
> 
> Thoughts?
> 
> On dynamically, switching from single to multi inserts, this can be
> done by heap_insert_v2 itself. The way I think it's possible is that,
> say we have some threshold row count 1000(can be a macro)  after
> inserting those many tuples, heap_insert_v2 can switch to buffering
> mode.

For that I thought it'd be good to use the expected row count, but yeah 
dynamically switching also works and might work better if the expected 
row counts are usually off.

> 
> Thoughts?
> 
>> Which would make the code something like:
>>
>> void
>> heap_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
>> {
>>          TupleTableSlot  *batchslot;
>>          HeapMultiInsertState *mistate = (HeapMultiInsertState *)state->mistate;
>>          Size sz;
>>
>>          Assert(mistate && mistate->slots);
>>
>>          if (mistate->slots[mistate->cur_slots] == NULL)
>>                  mistate->slots[mistate->cur_slots] =
>>                                                                          table_slot_create(state->rel, NULL);
>>
>>          batchslot = mistate->slots[mistate->cur_slots];
>>
>>          ExecClearTuple(batchslot);
>>          ExecCopySlot(batchslot, slot);
>>
>>          /*
>>           * Calculate the tuple size after the original slot is copied, because the
>>           * copied slot type and the tuple size may change.
>>           */
>>          sz = GetTupleSize(batchslot, mistate->max_size);
>>
>>          Assert(sz > 0);
>>
>>          mistate->cur_slots++;
>>          mistate->cur_size += sz;
>>
>>          if (mistate->cur_slots >= mistate->max_slots ||
>>                  mistate->cur_size >= mistate->max_size)
>>                  heap_multi_insert_flush(state);
>> }
> 
> I think clearing tuples before copying the slot as you suggested may
> work without the need of clear_slots flag.

ok, cool :)

> 
>>
>>>   > Also, why do we want to do ExecClearTuple() anyway? Isn't
>>>   > it good enough that the next call to ExecCopySlot will effectively clear
>>>   > it out?
>>>
>>> For virtual, heap, minimal tuple slots, yes ExecCopySlot slot clears the
>>> slot before copying. But, for buffer heap slots, the
>>> tts_buffer_heap_copyslot does not always clear the destination slot, see
>>> below. If we fall into else condition, we might get some issues. And
>>> also note that, once the slot is cleared in ExecClearTuple, it will not
>>> be cleared again in ExecCopySlot because TTS_SHOULDFREE(slot) will be
>>> false. That is why, let's have ExecClearTuple as is.
>>>
>> I had no idea the buffer heap slot doesn't unconditionally clear out the
>> slot :( So yes lets call it unconditionally ourselves. See also
>> suggestion above.
> 
> Yeah, we will clear the tuple slot before copy to be on the safer side.
> 

ok

>>>       /*
>>>        * If the source slot is of a different kind, or is a buffer slot
>>> that has
>>>        * been materialized / is virtual, make a new copy of the tuple.
>>> Otherwise
>>>        * make a new reference to the in-buffer tuple.
>>>        */
>>>       if (dstslot->tts_ops != srcslot->tts_ops ||
>>>           TTS_SHOULDFREE(srcslot) ||
>>>           !bsrcslot->base.tuple)
>>>       {
>>>           MemoryContext oldContext;
>>>
>>>           ExecClearTuple(dstslot);
>>>       }
>>>       else
>>>       {
>>>           Assert(BufferIsValid(bsrcslot->buffer));
>>>
>>>           tts_buffer_heap_store_tuple(dstslot, bsrcslot->base.tuple,
>>>                                       bsrcslot->buffer, false);
>>>
>>>   > - flushed -> why is this a stored boolean? isn't this indirectly encoded
>>>   > by cur_slots/cur_size == 0?
>>>
>>> Note that cur_slots is in HeapMultiInsertState and outside of the new
>>> APIs i.e. in TableInsertState, mistate is a void pointer, and we can't
>>> really access the cur_slots. I mean, we can access but we need to be
>>> dereferencing using the tableam kind. Instead of  doing all of that, to
>>> keep the API cleaner, I chose to have a boolean in the TableInsertState
>>> which we can see and use outside of the new APIs. Hope that's fine.
>>>
>> So you mean the flushed variable is actually there to tell the user of
>> the API that they are supposed to call flush before end? Why can't the
>> end call flush itself then? I guess I completely misunderstood the
>> purpose of table_multi_insert_flush being public. I had assumed it is
>> there to from the usage site indicate that now would be a good time to
>> flush, e.g. because of a statement ending or something. I had not
>> understood this is a requirement that its always required to do
>> table_multi_insert_flush + table_insert_end.
>> IMHO I would hide this from the callee, given that you would only really
>> call flush yourself when you immediately after would call end, or are
>> there other cases where one would be required to explicitly call flush?
> 
> We need to know outside the multi_insert API whether the buffered
> slots in case of multi inserts are flushed. Reason is that if we have
> indexes or after row triggers, currently we call ExecInsertIndexTuples
> or ExecARInsertTriggers on the buffered slots outside the API in a
> loop after the flush.
> 
> If we agree on removing heap_multi_insert_v2 API and embed that logic
> inside heap_insert_v2, then we can do this - pass the required
> information and the functions ExecInsertIndexTuples and
> ExecARInsertTriggers as callbacks so that, whether or not
> heap_insert_v2 choses single or multi inserts, it can callback these
> functions with the required information passed after the flush. We can
> add the callback and required information into TableInsertState. But,
> I'm not quite sure, we would make ExecInsertIndexTuples and
> ExecARInsertTriggers. And in
> 
> If we don't want to go with callback way, then at least we need to
> know whether or not heap_insert_v2 has chosen multi inserts, if yes,
> the buffered slots array, and the number of current buffered slots,
> whether they are flushed or not in the TableInsertState. Then,
> eventually, we might need all the HeapMultiInsertState info in the
> TableInsertState.
> 

To me the callback API seems cleaner, that on heap_insert_begin we can 
pass in a callback that is called on every flushed slot, or only on 
multi-insert flushes. Is there a reason it would only be done for 
multi-insert flushes or can it be generic?

> Thoughts?
> 
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
> 

Hi,

Replied inline.

Kind regards,
Luc





^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2021-01-16 23:04  Jeff Davis <[email protected]>
  parent: Bharath Rupireddy <[email protected]>
  2 siblings, 1 reply; 43+ messages in thread

From: Jeff Davis @ 2021-01-16 23:04 UTC (permalink / raw)
  To: Bharath Rupireddy <[email protected]>; Luc Vlaming <[email protected]>; +Cc: Justin Pryzby <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Paul Guo <[email protected]>


> If we agree on removing heap_multi_insert_v2 API and embed that logic
> inside heap_insert_v2, then we can do this - pass the required
> information and the functions ExecInsertIndexTuples and
> ExecARInsertTriggers as callbacks so that, whether or not
> heap_insert_v2 choses single or multi inserts, it can callback these
> functions with the required information passed after the flush. We
> can
> add the callback and required information into TableInsertState. But,
> I'm not quite sure, we would make ExecInsertIndexTuples and
> ExecARInsertTriggers.

How should the API interact with INSERT INTO ... SELECT? Right now it
doesn't appear to be integrated at all, but that seems like a fairly
important path for bulk inserts.

Regards,
	Jeff Davis







^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2021-01-18 07:58  Luc Vlaming <[email protected]>
  parent: Jeff Davis <[email protected]>
  0 siblings, 1 reply; 43+ messages in thread

From: Luc Vlaming @ 2021-01-18 07:58 UTC (permalink / raw)
  To: Jeff Davis <[email protected]>; Bharath Rupireddy <[email protected]>; +Cc: Justin Pryzby <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Paul Guo <[email protected]>

On 17-01-2021 00:04, Jeff Davis wrote:
> 
>> If we agree on removing heap_multi_insert_v2 API and embed that logic
>> inside heap_insert_v2, then we can do this - pass the required
>> information and the functions ExecInsertIndexTuples and
>> ExecARInsertTriggers as callbacks so that, whether or not
>> heap_insert_v2 choses single or multi inserts, it can callback these
>> functions with the required information passed after the flush. We
>> can
>> add the callback and required information into TableInsertState. But,
>> I'm not quite sure, we would make ExecInsertIndexTuples and
>> ExecARInsertTriggers.
> 
> How should the API interact with INSERT INTO ... SELECT? Right now it
> doesn't appear to be integrated at all, but that seems like a fairly
> important path for bulk inserts.
> 
> Regards,
> 	Jeff Davis
> 
> 

Hi,

You mean how it could because of that the table modification API uses 
the table_tuple_insert_speculative ? Just wondering if you think if it 
generally cannot work or would like to see that path / more paths 
integrated in to the patch.

Kind regards,
Luc





^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2021-01-19 17:33  Jeff Davis <[email protected]>
  parent: Luc Vlaming <[email protected]>
  0 siblings, 0 replies; 43+ messages in thread

From: Jeff Davis @ 2021-01-19 17:33 UTC (permalink / raw)
  To: Luc Vlaming <[email protected]>; Bharath Rupireddy <[email protected]>; +Cc: Justin Pryzby <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Paul Guo <[email protected]>

On Mon, 2021-01-18 at 08:58 +0100, Luc Vlaming wrote:
> You mean how it could because of that the table modification API
> uses 
> the table_tuple_insert_speculative ? Just wondering if you think if
> it 
> generally cannot work or would like to see that path / more paths 
> integrated in to the patch.

I think the patch should support INSERT INTO ... SELECT, and it will be
easier to tell if we have the right API when that's integrated.

Regards,
	Jeff Davis







^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2021-02-17 07:16  Bharath Rupireddy <[email protected]>
  parent: Bharath Rupireddy <[email protected]>
  2 siblings, 1 reply; 43+ messages in thread

From: Bharath Rupireddy @ 2021-02-17 07:16 UTC (permalink / raw)
  To: Luc Vlaming <[email protected]>; +Cc: Justin Pryzby <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Paul Guo <[email protected]>; Jeff Davis <[email protected]>

Hi,

I addressed the following review comments and attaching v3 patch set.

1) ExecClearTuple happens before ExecCopySlot in heap_multi_insert_v2
and this allowed us to remove clear_mi_slots flag from
TableInsertState.
2) I retained the flushed variable inside TableInsertState so that the
callers can know whether the buffered slots have been flushed. If yes,
the callers can execute after insert row triggers or perform index
insertions. This is easier than passing the after insert row triggers
info and index info to new multi insert table am and let it do. This
way the functionalities can be kept separate i.e. multi insert ams do
only buffering, decisions on when to flush, insertions and the callers
will execute triggers or index insertions. And also none of the
existing table ams are performing these operations within them, so
this is inline with the current design of the table ams.
3) I have kept the single and multi insert API separate. The previous
suggestion was to have only a single insert API and let the callers
provide initially whether they want multi or single inserts. One
problem with that approach is that we have to allow table ams to
execute the after row triggers or index insertions. That is something
I personally don't like.

0001 - new table ams implementation
0002 - the new multi table ams used in CREATE TABLE AS and REFRESH
MATERIALIZED VIEW
0003 - the new multi table ams used in COPY

Please review the v3 patch set further.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


Attachments:

  [application/x-patch] v3-0001-New-Table-AMs-for-Multi-and-Single-Inserts.patch (19.4K, 2-v3-0001-New-Table-AMs-for-Multi-and-Single-Inserts.patch)
  download | inline diff:
From 49060fdc2c2a2e6caf1a489fcd16cafd0e1e20a3 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Wed, 17 Feb 2021 11:06:35 +0530
Subject: [PATCH v3] New Table AMs for Multi and Single Inserts

This patch introduces new table access methods for multi and
single inserts. Also implements/rearranges the outside code for
heap am into these new APIs.

Main design goal of these new APIs is to give flexibility to
tableam developers in implementing multi insert logic dependent on
the underlying storage engine. Currently, for all the underlying
storage engines, we follow the same multi insert logic such as when
and how to flush the buffered tuples, tuple size calculation, and
this logic doesn't take into account the underlying storage engine
capabilities.

We can also avoid duplicating multi insert code (for existing COPY,
and upcoming CTAS, CREATE/REFRESH MAT VIEW and INSERT SELECTs). We
can also move bulk insert state allocation and deallocation inside
these APIs.
---
 src/backend/access/heap/heapam.c         | 212 +++++++++++++++++++++++
 src/backend/access/heap/heapam_handler.c |   5 +
 src/backend/access/table/tableamapi.c    |   7 +
 src/backend/executor/execTuples.c        |  83 ++++++++-
 src/include/access/heapam.h              |  49 +++++-
 src/include/access/tableam.h             |  87 ++++++++++
 src/include/executor/tuptable.h          |   1 +
 7 files changed, 438 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 9926e2bd54..789228aafb 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -67,6 +67,7 @@
 #include "utils/datum.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
+#include "utils/memutils.h"
 #include "utils/relcache.h"
 #include "utils/snapmgr.h"
 #include "utils/spccache.h"
@@ -2522,6 +2523,217 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 	pgstat_count_heap_insert(relation, ntuples);
 }
 
+/*
+ * heap_insert_begin - allocate and initialize TableInsertState
+ *
+ * For single inserts:
+ *  1) Specify is_multi as false, then multi insert state will be NULL.
+ *
+ * For multi inserts:
+ *  1) Specify is_multi as true, then multi insert state will be allocated and
+ * 	   initialized.
+ *
+ *  Other input parameters i.e. relation, command id, options are common for
+ *  both single and multi inserts.
+ */
+TableInsertState*
+heap_insert_begin(Relation rel, CommandId cid, int options, bool is_multi)
+{
+	TableInsertState *state;
+
+	state = palloc(sizeof(TableInsertState));
+	state->rel = rel;
+	state->cid = cid;
+	state->options = options;
+	/* Below parameters are not used for single inserts. */
+	state->mi_slots = NULL;
+	state->mistate = NULL;
+	state->mi_cur_slots = 0;
+	state->flushed = false;
+
+	if (is_multi)
+	{
+		HeapMultiInsertState *mistate;
+
+		mistate = palloc(sizeof(HeapMultiInsertState));
+		state->mi_slots =
+				palloc0(sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
+		mistate->max_slots = MAX_BUFFERED_TUPLES;
+		mistate->max_size = MAX_BUFFERED_BYTES;
+		mistate->cur_size = 0;
+		/*
+		 * Create a temporary memory context so that we can reset once per
+		 * multi insert batch.
+		 */
+		mistate->context = AllocSetContextCreate(CurrentMemoryContext,
+												 "heap_multi_insert",
+												 ALLOCSET_DEFAULT_SIZES);
+		state->mistate = mistate;
+	}
+
+	return state;
+}
+
+/*
+ * heap_insert_v2 - insert single tuple into a heap
+ *
+ * Insert tuple from slot into table. This is like heap_insert(), the only
+ * difference is that the parameters for insertion are inside table insert
+ * state structure.
+ */
+void
+heap_insert_v2(TableInsertState *state, TupleTableSlot *slot)
+{
+	bool		shouldFree = true;
+	HeapTuple	tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);
+
+	Assert(state);
+
+	/* Update tuple with table oid */
+	slot->tts_tableOid = RelationGetRelid(state->rel);
+	tuple->t_tableOid = slot->tts_tableOid;
+
+	/* Perform insertion, and copy the resulting ItemPointer */
+	heap_insert(state->rel, tuple, state->cid, state->options, state->bistate);
+	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
+
+	if (shouldFree)
+		pfree(tuple);
+}
+
+/*
+ * heap_multi_insert_v2 - insert multiple tuples into a heap
+ *
+ * Compute size of tuple. See if the buffered slots can hold the tuple. If yes,
+ * store it in the buffers, otherwise flush i.e. insert the so far buffered
+ * tuples into heap.
+ *
+ * Flush can happen:
+ *  1) either if all the buffered slots are filled up
+ *  2) or if total tuple size of the currently buffered slots are >= max_size
+ */
+void
+heap_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
+{
+	TupleTableSlot  *batchslot;
+	HeapMultiInsertState *mistate;
+	Size sz;
+
+	Assert(state);
+
+	mistate = (HeapMultiInsertState *)state->mistate;
+
+	Assert(mistate && state->mi_slots);
+
+	/* Reset flush state if previously set. */
+	if (state->flushed)
+	{
+		state->mi_cur_slots = 0;
+		state->flushed = false;
+	}
+
+	Assert(state->mi_cur_slots < mistate->max_slots);
+
+	if (state->mi_slots[state->mi_cur_slots] == NULL)
+		state->mi_slots[state->mi_cur_slots] =
+									table_slot_create(state->rel, NULL);
+
+	batchslot = state->mi_slots[state->mi_cur_slots];
+
+	ExecClearTuple(batchslot);
+	ExecCopySlot(batchslot, slot);
+
+	/*
+	 * Calculate tuple size after original slot is copied, because the copied
+	 * slot type and tuple size may change.
+	 */
+	sz = GetTupleSize(batchslot, mistate->max_size);
+
+	Assert(sz > 0);
+
+	state->mi_cur_slots++;
+	mistate->cur_size += sz;
+
+	if (state->mi_cur_slots >= mistate->max_slots ||
+		mistate->cur_size >= mistate->max_size)
+		heap_multi_insert_flush(state);
+}
+
+/*
+ * heap_multi_insert_flush - flush buffered tuples, if any, into a heap
+ *
+ * Flush the buffered tuples, indicate caller that flushing happened and reset
+ * parameters.
+ */
+void
+heap_multi_insert_flush(TableInsertState *state)
+{
+	HeapMultiInsertState *mistate;
+	MemoryContext oldcontext;
+
+	Assert(state);
+
+	mistate = (HeapMultiInsertState *)state->mistate;
+
+	Assert(mistate && state->mi_slots && state->mi_cur_slots >= 0 &&
+		   mistate->context);
+
+	if (state->flushed)
+	{
+		state->mi_cur_slots = 0;
+		state->flushed = false;
+		return;
+	}
+
+	oldcontext = MemoryContextSwitchTo(mistate->context);
+	heap_multi_insert(state->rel, state->mi_slots, state->mi_cur_slots,
+					  state->cid, state->options, state->bistate);
+	MemoryContextReset(mistate->context);
+	MemoryContextSwitchTo(oldcontext);
+
+	state->flushed = true;
+	mistate->cur_size = 0;
+}
+
+/*
+ * heap_insert_end - clean up TableInsertState
+ *
+ * For multi inserts, ensure to flush all the remaining buffers with
+ * heap_multi_insert_flush before calling this function.
+ *
+ * In this function, buffered slots are dropped, short-lived memory context is
+ * deleted, mistate and TableInsertState are freed up.
+ */
+void
+heap_insert_end(TableInsertState *state)
+{
+	HeapMultiInsertState *mistate;
+	int i;
+
+	Assert(state);
+
+	mistate = (HeapMultiInsertState *)state->mistate;
+
+	if (!mistate)
+	{
+		pfree(state);
+		return;
+	}
+
+	Assert(state->mi_slots &&  mistate->context);
+
+	/* Ensure that the buffers have been flushed before. */
+	Assert(state->mi_cur_slots == 0 || state->flushed);
+
+	for (i = 0; i < mistate->max_slots && state->mi_slots[i] != NULL; i++)
+		ExecDropSingleTupleTableSlot(state->mi_slots[i]);
+
+	MemoryContextDelete(mistate->context);
+	pfree(mistate);
+	pfree(state->mi_slots);
+	pfree(state);
+}
+
 /*
  *	simple_heap_insert - insert a tuple
  *
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 4a70e20a14..4249b661af 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2555,6 +2555,11 @@ static const TableAmRoutine heapam_methods = {
 	.tuple_insert_speculative = heapam_tuple_insert_speculative,
 	.tuple_complete_speculative = heapam_tuple_complete_speculative,
 	.multi_insert = heap_multi_insert,
+	.tuple_insert_begin = heap_insert_begin,
+	.tuple_insert_v2 = heap_insert_v2,
+	.multi_insert_v2 = heap_multi_insert_v2,
+	.multi_insert_flush = heap_multi_insert_flush,
+	.tuple_insert_end = heap_insert_end,
 	.tuple_delete = heapam_tuple_delete,
 	.tuple_update = heapam_tuple_update,
 	.tuple_lock = heapam_tuple_lock,
diff --git a/src/backend/access/table/tableamapi.c b/src/backend/access/table/tableamapi.c
index 325ecdc122..95f1f9b6a0 100644
--- a/src/backend/access/table/tableamapi.c
+++ b/src/backend/access/table/tableamapi.c
@@ -78,6 +78,13 @@ GetTableAmRoutine(Oid amhandler)
 	Assert(routine->tuple_complete_speculative != NULL);
 
 	Assert(routine->multi_insert != NULL);
+
+	Assert(routine->tuple_insert_begin != NULL);
+	Assert(routine->tuple_insert_v2 != NULL);
+	Assert(routine->multi_insert_v2 != NULL);
+	Assert(routine->multi_insert_flush != NULL);
+	Assert(routine->tuple_insert_end != NULL);
+
 	Assert(routine->tuple_delete != NULL);
 	Assert(routine->tuple_update != NULL);
 	Assert(routine->tuple_lock != NULL);
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index 73c35df9c9..79ae22455a 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -159,7 +159,11 @@ tts_virtual_materialize(TupleTableSlot *slot)
 	if (TTS_SHOULDFREE(slot))
 		return;
 
-	/* compute size of memory required */
+	/*
+	 * Compute size of memory required. This size calculation code is also used
+	 * in GetTupleSize(), hence ensure to have the same changes or fixes here
+	 * and also there.
+	 */
 	for (int natt = 0; natt < desc->natts; natt++)
 	{
 		Form_pg_attribute att = TupleDescAttr(desc, natt);
@@ -1239,6 +1243,83 @@ ExecDropSingleTupleTableSlot(TupleTableSlot *slot)
 	pfree(slot);
 }
 
+/*
+ * GetTupleSize - Compute the tuple size given a table slot.
+ *
+ * For heap tuple, buffer tuple and minimal tuple slot types return the actual
+ * tuple size that exists. For virtual tuple, the size is calculated as the
+ * slot does not have the tuple size. If the computed size exceeds the given
+ * maxsize for the virtual tuple, this function exits, not investing time in
+ * further unnecessary calculation.
+ *
+ * Important Notes:
+ * 1) Size calculation code for virtual slots is being used from
+ * 	  tts_virtual_materialize(), hence ensure to have the same changes or fixes
+ * 	  here and also there.
+ * 2) Currently, GetTupleSize() handles the existing heap, buffer, minimal and
+ * 	  virtual slots. Ensure to add related code in case any new slot type is
+ *    introduced.
+ */
+inline Size
+GetTupleSize(TupleTableSlot *slot, Size maxsize)
+{
+	Size sz = 0;
+	HeapTuple tuple = NULL;
+
+	if (TTS_IS_HEAPTUPLE(slot))
+		tuple = ((HeapTupleTableSlot *) slot)->tuple;
+	else if(TTS_IS_BUFFERTUPLE(slot))
+		tuple = ((BufferHeapTupleTableSlot *) slot)->base.tuple;
+	else if(TTS_IS_MINIMALTUPLE(slot))
+		tuple = ((MinimalTupleTableSlot *) slot)->tuple;
+	else if(TTS_IS_VIRTUAL(slot))
+	{
+		/*
+		 * Size calculation code being used here is from
+		 * tts_virtual_materialize(), ensure to have the same changes or fixes
+		 * here and also there.
+		 */
+		TupleDesc	desc = slot->tts_tupleDescriptor;
+
+		for (int natt = 0; natt < desc->natts; natt++)
+		{
+			Form_pg_attribute att = TupleDescAttr(desc, natt);
+			Datum		val;
+
+			if (att->attbyval)
+				sz += att->attlen;
+
+			if (slot->tts_isnull[natt])
+				continue;
+
+			val = slot->tts_values[natt];
+
+			if (att->attlen == -1 &&
+				VARATT_IS_EXTERNAL_EXPANDED(DatumGetPointer(val)))
+			{
+				sz = att_align_nominal(sz, att->attalign);
+				sz += EOH_get_flat_size(DatumGetEOHP(val));
+			}
+			else
+			{
+				sz = att_align_nominal(sz, att->attalign);
+				sz = att_addlength_datum(sz, att->attlen, val);
+			}
+
+			/*
+			 * We are not interested in proceeding further if the computed size
+			 * crosses maxsize limit that we are looking for.
+			 */
+			if (maxsize != 0 && sz >= maxsize)
+				break;
+		}
+	}
+
+	if (tuple != NULL && !TTS_IS_VIRTUAL(slot))
+		sz = tuple->t_len;
+
+	return sz;
+}
 
 /* ----------------------------------------------------------------
  *				  tuple table slot accessor functions
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 60e5cd3109..c98cffbeac 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -36,11 +36,26 @@
 #define HEAP_INSERT_NO_LOGICAL	TABLE_INSERT_NO_LOGICAL
 #define HEAP_INSERT_SPECULATIVE 0x0010
 
-typedef struct BulkInsertStateData *BulkInsertState;
 struct TupleTableSlot;
 
 #define MaxLockTupleMode	LockTupleExclusive
 
+/*
+ * No more than this many tuples per single multi insert batch
+ *
+ * Caution: Don't make this too big, as we could end up with this many tuples
+ * stored in multi insert buffer. Increasing this can cause quadratic growth in
+ * memory requirements during copies into partitioned tables with a large
+ * number of partitions.
+ */
+#define MAX_BUFFERED_TUPLES		1000
+
+/*
+ * Flush multi insert buffers if there are >= this many bytes, as counted by
+ * the size of the tuples buffered.
+ */
+#define MAX_BUFFERED_BYTES		65535
+
 /*
  * Descriptor for heap table scans.
  */
@@ -93,6 +108,25 @@ typedef enum
 	HEAPTUPLE_DELETE_IN_PROGRESS	/* deleting xact is still in progress */
 } HTSV_Result;
 
+/* Holds multi insert state for heap access method.*/
+typedef struct HeapMultiInsertState
+{
+	/* Switch to short-lived memory context before flushing. */
+	MemoryContext       context;
+	/* Maximum number of slots that can be buffered. */
+	int32               max_slots;
+	/*
+	 * Maximum size (in bytes) of all the tuples that a single batch of
+	 * buffered slots can hold.
+	 */
+	int64               max_size;
+	/*
+	 * Total tuple size (in bytes) of the slots that are currently buffered.
+	 * Flush the buffered slots when cur_size >= max_size.
+	 */
+	int64               cur_size;
+} HeapMultiInsertState;
+
 /* ----------------
  *		function prototypes for heap access method
  *
@@ -130,15 +164,20 @@ extern bool heap_hot_search_buffer(ItemPointer tid, Relation relation,
 
 extern void heap_get_latest_tid(TableScanDesc scan, ItemPointer tid);
 
-extern BulkInsertState GetBulkInsertState(void);
-extern void FreeBulkInsertState(BulkInsertState);
-extern void ReleaseBulkInsertStatePin(BulkInsertState bistate);
-
 extern void heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 						int options, BulkInsertState bistate);
 extern void heap_multi_insert(Relation relation, struct TupleTableSlot **slots,
 							  int ntuples, CommandId cid, int options,
 							  BulkInsertState bistate);
+
+extern TableInsertState* heap_insert_begin(Relation rel, CommandId cid,
+										   int options, bool is_multi);
+extern void heap_insert_v2(TableInsertState *state, TupleTableSlot *slot);
+extern void heap_multi_insert_v2(TableInsertState *state,
+								 TupleTableSlot *slot);
+extern void heap_multi_insert_flush(TableInsertState *state);
+extern void heap_insert_end(TableInsertState *state);
+
 extern TM_Result heap_delete(Relation relation, ItemPointer tid,
 							 CommandId cid, Snapshot crosscheck, bool wait,
 							 struct TM_FailureData *tmfd, bool changingPart);
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 33bffb6815..5fb00149ff 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -228,6 +228,32 @@ typedef struct TM_IndexDeleteOp
 	TM_IndexStatus *status;
 } TM_IndexDeleteOp;
 
+/* Holds table insert state. */
+typedef struct TableInsertState
+{
+	Relation	rel;
+	/* Bulk insert state if requested, otherwise NULL. */
+	struct BulkInsertStateData	*bistate;
+	CommandId	cid;
+	int	options;
+	/* Below members are only used for multi inserts. */
+	/* Array of buffered slots. */
+	TupleTableSlot	**mi_slots;
+	/* Number of slots that are currently buffered. */
+	int32	mi_cur_slots;
+	/*
+	 * Access method specific information such as parameters that are needed
+	 * for buffering and flushing decisions can go here.
+	 */
+	void	*mistate;
+	/*
+	 * This parameter indicates whether or not the buffered slots have been
+	 * flushed to a table. Used by callers of multi insert API for inserting
+	 * into indexes or executing after row triggers, if any.
+	 */
+	bool	flushed;
+}TableInsertState;
+
 /* "options" flag bits for table_tuple_insert */
 /* TABLE_INSERT_SKIP_WAL was 0x0001; RelationNeedsWAL() now governs */
 #define TABLE_INSERT_SKIP_FSM		0x0002
@@ -475,6 +501,17 @@ typedef struct TableAmRoutine
 	void		(*multi_insert) (Relation rel, TupleTableSlot **slots, int nslots,
 								 CommandId cid, int options, struct BulkInsertStateData *bistate);
 
+	TableInsertState* (*tuple_insert_begin) (Relation rel, CommandId cid,
+											 int options, bool is_multi);
+
+	void (*tuple_insert_v2) (TableInsertState *state, TupleTableSlot *slot);
+
+	void (*multi_insert_v2) (TableInsertState *state, TupleTableSlot *slot);
+
+	void (*multi_insert_flush) (TableInsertState *state);
+
+	void (*tuple_insert_end) (TableInsertState *state);
+
 	/* see table_tuple_delete() for reference about parameters */
 	TM_Result	(*tuple_delete) (Relation rel,
 								 ItemPointer tid,
@@ -822,6 +859,8 @@ typedef struct TableAmRoutine
 } TableAmRoutine;
 
 
+typedef struct BulkInsertStateData *BulkInsertState;
+
 /* ----------------------------------------------------------------------------
  * Slot functions.
  * ----------------------------------------------------------------------------
@@ -840,6 +879,10 @@ extern const TupleTableSlotOps *table_slot_callbacks(Relation rel);
  */
 extern TupleTableSlot *table_slot_create(Relation rel, List **reglist);
 
+/* Bulk insert state functions. */
+extern BulkInsertState GetBulkInsertState(void);
+extern void FreeBulkInsertState(BulkInsertState);
+extern void ReleaseBulkInsertStatePin(BulkInsertState bistate);
 
 /* ----------------------------------------------------------------------------
  * Table scan functions.
@@ -1343,6 +1386,50 @@ table_multi_insert(Relation rel, TupleTableSlot **slots, int nslots,
 								  cid, options, bistate);
 }
 
+static inline TableInsertState*
+table_insert_begin(Relation rel, CommandId cid, int options,
+				   bool alloc_bistate, bool is_multi)
+{
+	TableInsertState *state = rel->rd_tableam->tuple_insert_begin(rel, cid,
+										options, is_multi);
+
+	/* Allocate bulk insert state here, since it's AM independent. */
+	if (alloc_bistate)
+		state->bistate = GetBulkInsertState();
+	else
+		state->bistate = NULL;
+
+	return state;
+}
+
+static inline void
+table_insert_v2(TableInsertState *state, TupleTableSlot *slot)
+{
+	state->rel->rd_tableam->tuple_insert_v2(state, slot);
+}
+
+static inline void
+table_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
+{
+	state->rel->rd_tableam->multi_insert_v2(state, slot);
+}
+
+static inline void
+table_multi_insert_flush(TableInsertState *state)
+{
+	state->rel->rd_tableam->multi_insert_flush(state);
+}
+
+static inline void
+table_insert_end(TableInsertState *state)
+{
+	/* Deallocate bulk insert state here, since it's AM independent. */
+	if (state->bistate)
+		FreeBulkInsertState(state->bistate);
+
+	state->rel->rd_tableam->tuple_insert_end(state);
+}
+
 /*
  * Delete a tuple.
  *
diff --git a/src/include/executor/tuptable.h b/src/include/executor/tuptable.h
index 679e57fbdd..1f59614183 100644
--- a/src/include/executor/tuptable.h
+++ b/src/include/executor/tuptable.h
@@ -330,6 +330,7 @@ extern void slot_getmissingattrs(TupleTableSlot *slot, int startAttNum,
 								 int lastAttNum);
 extern void slot_getsomeattrs_int(TupleTableSlot *slot, int attnum);
 
+extern Size GetTupleSize(TupleTableSlot *slot, Size maxsize);
 
 #ifndef FRONTEND
 
-- 
2.25.1



  [application/x-patch] v3-0002-CTAS-and-REFRESH-Mat-View-With-New-Multi-Insert-T.patch (6.0K, 3-v3-0002-CTAS-and-REFRESH-Mat-View-With-New-Multi-Insert-T.patch)
  download | inline diff:
From 321c64cb070e5e0b083634e057f267c80717b2f1 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Wed, 17 Feb 2021 11:00:53 +0530
Subject: [PATCH v3] CTAS and REFRESH Mat View With New Multi Insert Table AM

This patch adds new multi insert table access methods to
CREATE TABLE AS, CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED
VIEW.
---
 src/backend/commands/createas.c | 49 +++++++++++++++++----------------
 src/backend/commands/matview.c  | 35 ++++++++++++-----------
 2 files changed, 43 insertions(+), 41 deletions(-)

diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index dce882012e..36ad0ef698 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -58,9 +58,7 @@ typedef struct
 	/* These fields are filled by intorel_startup: */
 	Relation	rel;			/* relation to write to */
 	ObjectAddress reladdr;		/* address of rel, for ExecCreateTableAs */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableInsertState *istate;	/* insert state */
 } DR_intorel;
 
 /* utility functions for CTAS definition creation */
@@ -546,22 +544,26 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	if (is_matview && !into->skipData)
 		SetMatViewPopulatedState(intoRelationDesc, true);
 
-	/*
-	 * Fill private fields of myState for use by later routines
-	 */
-	myState->rel = intoRelationDesc;
-	myState->reladdr = intoRelationAddr;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM;
-
 	/*
 	 * If WITH NO DATA is specified, there is no need to set up the state for
-	 * bulk inserts as there are no tuples to insert.
+	 * bulk inserts and multi inserts as there are no tuples to insert.
 	 */
 	if (!into->skipData)
-		myState->bistate = GetBulkInsertState();
+	{
+		myState->istate = table_insert_begin(intoRelationDesc,
+											 GetCurrentCommandId(true),
+											 TABLE_INSERT_SKIP_FSM,
+											 true,
+											 true);
+	}
 	else
-		myState->bistate = NULL;
+		myState->istate = NULL;
+
+	/*
+	 * Fill private fields of myState for use by later routines
+	 */
+	myState->rel = intoRelationDesc;
+	myState->reladdr = intoRelationAddr;
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -589,11 +591,7 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
 		 * would not be cheap either. This also doesn't allow accessing per-AM
 		 * data (say a tuple's xmin), but since we don't do that here...
 		 */
-		table_tuple_insert(myState->rel,
-						   slot,
-						   myState->output_cid,
-						   myState->ti_options,
-						   myState->bistate);
+		table_multi_insert_v2(myState->istate, slot);
 	}
 
 	/* We know this is a newly created relation, so there are no indexes */
@@ -608,12 +606,17 @@ static void
 intorel_shutdown(DestReceiver *self)
 {
 	DR_intorel *myState = (DR_intorel *) self;
-	IntoClause *into = myState->into;
+	int ti_options;
 
-	if (!into->skipData)
+	if (!myState->into->skipData)
 	{
-		FreeBulkInsertState(myState->bistate);
-		table_finish_bulk_insert(myState->rel, myState->ti_options);
+		ti_options = myState->istate->options;
+
+		table_multi_insert_flush(myState->istate);
+
+		table_insert_end(myState->istate);
+
+		table_finish_bulk_insert(myState->rel, ti_options);
 	}
 
 	/* close rel, but keep lock until commit */
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index c5c25ce11d..9c6b5f8525 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -52,10 +52,7 @@ typedef struct
 	DestReceiver pub;			/* publicly-known function pointers */
 	Oid			transientoid;	/* OID of new heap into which to store */
 	/* These fields are filled by transientrel_startup: */
-	Relation	transientrel;	/* relation to write to */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableInsertState *istate;	/* insert state */
 } DR_transientrel;
 
 static int	matview_maintenance_depth = 0;
@@ -466,10 +463,11 @@ transientrel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	/*
 	 * Fill private fields of myState for use by later routines
 	 */
-	myState->transientrel = transientrel;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM | TABLE_INSERT_FROZEN;
-	myState->bistate = GetBulkInsertState();
+	myState->istate = table_insert_begin(transientrel,
+										 GetCurrentCommandId(true),
+										 TABLE_INSERT_SKIP_FSM | TABLE_INSERT_FROZEN,
+										 true,
+										 true);
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -494,12 +492,7 @@ transientrel_receive(TupleTableSlot *slot, DestReceiver *self)
 	 * cheap either. This also doesn't allow accessing per-AM data (say a
 	 * tuple's xmin), but since we don't do that here...
 	 */
-
-	table_tuple_insert(myState->transientrel,
-					   slot,
-					   myState->output_cid,
-					   myState->ti_options,
-					   myState->bistate);
+	table_multi_insert_v2(myState->istate, slot);
 
 	/* We know this is a newly created relation, so there are no indexes */
 
@@ -513,14 +506,20 @@ static void
 transientrel_shutdown(DestReceiver *self)
 {
 	DR_transientrel *myState = (DR_transientrel *) self;
+	int ti_options;
+	Relation transientrel;
+
+	ti_options = myState->istate->options;
+	transientrel = myState->istate->rel;
+
+	table_multi_insert_flush(myState->istate);
 
-	FreeBulkInsertState(myState->bistate);
+	table_insert_end(myState->istate);
 
-	table_finish_bulk_insert(myState->transientrel, myState->ti_options);
+	table_finish_bulk_insert(transientrel, ti_options);
 
 	/* close transientrel, but keep lock until commit */
-	table_close(myState->transientrel, NoLock);
-	myState->transientrel = NULL;
+	table_close(transientrel, NoLock);
 }
 
 /*
-- 
2.25.1



  [application/x-patch] v3-0003-COPY-With-New-Multi-and-Single-Insert-Table-AM.patch (22.2K, 4-v3-0003-COPY-With-New-Multi-and-Single-Insert-Table-AM.patch)
  download | inline diff:
From 74d658da3f24b8aae0e538260c1bb536e263a702 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Wed, 17 Feb 2021 11:56:30 +0530
Subject: [PATCH v3] COPY With New Multi and Single Insert Table AM

This patch adds new single and multi insert table access method to
COPY code.
---
 src/backend/commands/copyfrom.c | 474 ++++++++++----------------------
 1 file changed, 150 insertions(+), 324 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 796ca7b3f7..26174a351b 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -46,54 +46,17 @@
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
-/*
- * No more than this many tuples per CopyMultiInsertBuffer
- *
- * Caution: Don't make this too big, as we could end up with this many
- * CopyMultiInsertBuffer items stored in CopyMultiInsertInfo's
- * multiInsertBuffers list.  Increasing this can cause quadratic growth in
- * memory requirements during copies into partitioned tables with a large
- * number of partitions.
- */
-#define MAX_BUFFERED_TUPLES		1000
-
-/*
- * Flush buffers if there are >= this many bytes, as counted by the input
- * size, of tuples stored.
- */
-#define MAX_BUFFERED_BYTES		65535
-
 /* Trim the list of buffers back down to this number after flushing */
 #define MAX_PARTITION_BUFFERS	32
 
 /* Stores multi-insert data related to a single relation in CopyFrom. */
 typedef struct CopyMultiInsertBuffer
 {
-	TupleTableSlot *slots[MAX_BUFFERED_TUPLES]; /* Array to store tuples */
-	ResultRelInfo *resultRelInfo;	/* ResultRelInfo for 'relid' */
-	BulkInsertState bistate;	/* BulkInsertState for this rel */
-	int			nused;			/* number of 'slots' containing tuples */
-	uint64		linenos[MAX_BUFFERED_TUPLES];	/* Line # of tuple in copy
-												 * stream */
+	TableInsertState *istate;
+	/* Line # of tuple in copy stream. */
+	uint64		linenos[MAX_BUFFERED_TUPLES];
 } CopyMultiInsertBuffer;
 
-/*
- * Stores one or many CopyMultiInsertBuffers and details about the size and
- * number of tuples which are stored in them.  This allows multiple buffers to
- * exist at once when COPYing into a partitioned table.
- */
-typedef struct CopyMultiInsertInfo
-{
-	List	   *multiInsertBuffers; /* List of tracked CopyMultiInsertBuffers */
-	int			bufferedTuples; /* number of tuples buffered over all buffers */
-	int			bufferedBytes;	/* number of bytes from all buffered tuples */
-	CopyFromState	cstate;			/* Copy state for this CopyMultiInsertInfo */
-	EState	   *estate;			/* Executor state used for COPY */
-	CommandId	mycid;			/* Command Id used for COPY */
-	int			ti_options;		/* table insert options */
-} CopyMultiInsertInfo;
-
-
 /* non-export function prototypes */
 static char *limit_printout_length(const char *str);
 
@@ -210,143 +173,61 @@ limit_printout_length(const char *str)
  * Allocate memory and initialize a new CopyMultiInsertBuffer for this
  * ResultRelInfo.
  */
-static CopyMultiInsertBuffer *
-CopyMultiInsertBufferInit(ResultRelInfo *rri)
-{
-	CopyMultiInsertBuffer *buffer;
-
-	buffer = (CopyMultiInsertBuffer *) palloc(sizeof(CopyMultiInsertBuffer));
-	memset(buffer->slots, 0, sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
-	buffer->resultRelInfo = rri;
-	buffer->bistate = GetBulkInsertState();
-	buffer->nused = 0;
-
-	return buffer;
-}
-
-/*
- * Make a new buffer for this ResultRelInfo.
- */
-static inline void
-CopyMultiInsertInfoSetupBuffer(CopyMultiInsertInfo *miinfo,
-							   ResultRelInfo *rri)
+static void
+InitCopyMultiInsertBufferInfo(List **mirri, ResultRelInfo *rri,
+							  CommandId mycid, int ti_options)
 {
 	CopyMultiInsertBuffer *buffer;
 
-	buffer = CopyMultiInsertBufferInit(rri);
+	buffer = (CopyMultiInsertBuffer *) palloc0(sizeof(CopyMultiInsertBuffer));
+	buffer->istate = table_insert_begin(rri->ri_RelationDesc, mycid,
+										ti_options, true, true);
 
 	/* Setup back-link so we can easily find this buffer again */
 	rri->ri_CopyMultiInsertBuffer = buffer;
-	/* Record that we're tracking this buffer */
-	miinfo->multiInsertBuffers = lappend(miinfo->multiInsertBuffers, buffer);
+	*mirri = lappend(*mirri, rri);
 }
 
 /*
- * Initialize an already allocated CopyMultiInsertInfo.
- *
- * If rri is a non-partitioned table then a CopyMultiInsertBuffer is set up
- * for that table.
+ * Run AFTER ROW INSERT triggers or insert into indexes, if any, after buffered
+ * tuples are flushed to table.
  */
 static void
-CopyMultiInsertInfoInit(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
-						CopyFromState cstate, EState *estate, CommandId mycid,
-						int ti_options)
+HandleAfterRowEvents(ResultRelInfo *rri, EState *estate,
+					 CopyFromState cstate, int32 cur_slots)
 {
-	miinfo->multiInsertBuffers = NIL;
-	miinfo->bufferedTuples = 0;
-	miinfo->bufferedBytes = 0;
-	miinfo->cstate = cstate;
-	miinfo->estate = estate;
-	miinfo->mycid = mycid;
-	miinfo->ti_options = ti_options;
-
-	/*
-	 * Only setup the buffer when not dealing with a partitioned table.
-	 * Buffers for partitioned tables will just be setup when we need to send
-	 * tuples their way for the first time.
-	 */
-	if (rri->ri_RelationDesc->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
-		CopyMultiInsertInfoSetupBuffer(miinfo, rri);
-}
-
-/*
- * Returns true if the buffers are full
- */
-static inline bool
-CopyMultiInsertInfoIsFull(CopyMultiInsertInfo *miinfo)
-{
-	if (miinfo->bufferedTuples >= MAX_BUFFERED_TUPLES ||
-		miinfo->bufferedBytes >= MAX_BUFFERED_BYTES)
-		return true;
-	return false;
-}
-
-/*
- * Returns true if we have no buffered tuples
- */
-static inline bool
-CopyMultiInsertInfoIsEmpty(CopyMultiInsertInfo *miinfo)
-{
-	return miinfo->bufferedTuples == 0;
-}
-
-/*
- * Write the tuples stored in 'buffer' out to the table.
- */
-static inline void
-CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
-						   CopyMultiInsertBuffer *buffer)
-{
-	MemoryContext oldcontext;
-	int			i;
-	uint64		save_cur_lineno;
-	CopyFromState	cstate = miinfo->cstate;
-	EState	   *estate = miinfo->estate;
-	CommandId	mycid = miinfo->mycid;
-	int			ti_options = miinfo->ti_options;
-	bool		line_buf_valid = cstate->line_buf_valid;
-	int			nused = buffer->nused;
-	ResultRelInfo *resultRelInfo = buffer->resultRelInfo;
-	TupleTableSlot **slots = buffer->slots;
+	int i;
+	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
+	TableInsertState *istate = buffer->istate;
+	uint64	save_cur_lineno = cstate->cur_lineno;
+	bool	line_buf_valid = cstate->line_buf_valid;
 
-	/*
-	 * Print error context information correctly, if one of the operations
-	 * below fail.
-	 */
 	cstate->line_buf_valid = false;
-	save_cur_lineno = cstate->cur_lineno;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
-
-	for (i = 0; i < nused; i++)
+	for (i = 0; i < cur_slots; i++)
 	{
 		/*
 		 * If there are any indexes, update them for all the inserted tuples,
 		 * and run AFTER ROW INSERT triggers.
 		 */
-		if (resultRelInfo->ri_NumIndices > 0)
+		if (rri->ri_NumIndices > 0)
 		{
-			List	   *recheckIndexes;
+			List       *recheckIndexes;
 
 			cstate->cur_lineno = buffer->linenos[i];
-			recheckIndexes =
-				ExecInsertIndexTuples(resultRelInfo,
-									  buffer->slots[i], estate, false, false,
-									  NULL, NIL);
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], recheckIndexes,
-								 cstate->transition_capture);
+			recheckIndexes = ExecInsertIndexTuples(rri,
+													istate->mi_slots[i], estate,
+													false,
+													false,
+													NULL,
+													NULL);
+
+			ExecARInsertTriggers(estate,
+								rri,
+								istate->mi_slots[i],
+								recheckIndexes,
+								cstate->transition_capture);
+
 			list_free(recheckIndexes);
 		}
 
@@ -354,79 +235,69 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 		 * There's no indexes, but see if we need to run AFTER ROW INSERT
 		 * triggers anyway.
 		 */
-		else if (resultRelInfo->ri_TrigDesc != NULL &&
-				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		else if (rri->ri_TrigDesc != NULL &&
+				 (rri->ri_TrigDesc->trig_insert_after_row ||
+				  rri->ri_TrigDesc->trig_insert_new_table))
 		{
 			cstate->cur_lineno = buffer->linenos[i];
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], NIL, cstate->transition_capture);
+			ExecARInsertTriggers(estate,
+								 rri,
+								 istate->mi_slots[i],
+								 NULL,
+								 cstate->transition_capture);
 		}
-
-		ExecClearTuple(slots[i]);
 	}
 
-	/* Mark that all slots are free */
-	buffer->nused = 0;
-
 	/* reset cur_lineno and line_buf_valid to what they were */
 	cstate->line_buf_valid = line_buf_valid;
 	cstate->cur_lineno = save_cur_lineno;
-}
+ }
 
 /*
- * Drop used slots and free member for this buffer.
- *
- * The buffer must be flushed before cleanup.
+ * Store tuple from the incoming slot into buffered slots.
  */
-static inline void
-CopyMultiInsertBufferCleanup(CopyMultiInsertInfo *miinfo,
-							 CopyMultiInsertBuffer *buffer)
+static void
+CopyMultiInsertBufferTuple(ResultRelInfo *rri, TupleTableSlot *slot,
+						   CopyFromState cstate, EState *estate)
 {
-	int			i;
-
-	/* Ensure buffer was flushed */
-	Assert(buffer->nused == 0);
-
-	/* Remove back-link to ourself */
-	buffer->resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
-
-	FreeBulkInsertState(buffer->bistate);
+	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
+	TableInsertState *istate = buffer->istate;
 
-	/* Since we only create slots on demand, just drop the non-null ones. */
-	for (i = 0; i < MAX_BUFFERED_TUPLES && buffer->slots[i] != NULL; i++)
-		ExecDropSingleTupleTableSlot(buffer->slots[i]);
+	table_multi_insert_v2(buffer->istate, slot);
 
-	table_finish_bulk_insert(buffer->resultRelInfo->ri_RelationDesc,
-							 miinfo->ti_options);
+	buffer->linenos[istate->mi_cur_slots - 1] = cstate->cur_lineno;
 
-	pfree(buffer);
+	if (istate->flushed)
+		HandleAfterRowEvents(rri, estate, cstate, istate->mi_cur_slots);
 }
 
 /*
- * Write out all stored tuples in all buffers out to the tables.
- *
- * Once flushed we also trim the tracked buffers list down to size by removing
- * the buffers created earliest first.
- *
- * Callers should pass 'curr_rri' is the ResultRelInfo that's currently being
- * used.  When cleaning up old buffers we'll never remove the one for
- * 'curr_rri'.
+ * Flush tuples into table from the buffered slots.
  */
-static inline void
-CopyMultiInsertInfoFlush(CopyMultiInsertInfo *miinfo, ResultRelInfo *curr_rri)
-{
+static void
+CopyMulitInsertFlushBuffers(List **mirri, ResultRelInfo *curr_rri,
+							CopyFromState cstate, EState *estate)
+ {
 	ListCell   *lc;
+	ResultRelInfo *rri;
+	CopyMultiInsertBuffer *buffer;
+	TableInsertState *istate;
 
-	foreach(lc, miinfo->multiInsertBuffers)
+	foreach(lc, *mirri)
 	{
-		CopyMultiInsertBuffer *buffer = (CopyMultiInsertBuffer *) lfirst(lc);
+		rri = lfirst(lc);
+		buffer = rri->ri_CopyMultiInsertBuffer;
+		istate = buffer->istate;
+
+		table_multi_insert_flush(istate);
 
-		CopyMultiInsertBufferFlush(miinfo, buffer);
+		if (istate->flushed)
+			HandleAfterRowEvents(rri, estate, cstate, istate->mi_cur_slots);
 	}
 
-	miinfo->bufferedTuples = 0;
-	miinfo->bufferedBytes = 0;
+	rri = NULL;
+	buffer = NULL;
+	istate = NULL;
 
 	/*
 	 * Trim the list of tracked buffers down if it exceeds the limit.  Here we
@@ -434,87 +305,59 @@ CopyMultiInsertInfoFlush(CopyMultiInsertInfo *miinfo, ResultRelInfo *curr_rri)
 	 * likely that these older ones will be needed than the ones that were
 	 * just created.
 	 */
-	while (list_length(miinfo->multiInsertBuffers) > MAX_PARTITION_BUFFERS)
+	while (list_length(*mirri) > MAX_PARTITION_BUFFERS)
 	{
-		CopyMultiInsertBuffer *buffer;
+		int ti_options;
 
-		buffer = (CopyMultiInsertBuffer *) linitial(miinfo->multiInsertBuffers);
+		rri = (ResultRelInfo *) linitial(*mirri);
 
 		/*
 		 * We never want to remove the buffer that's currently being used, so
 		 * if we happen to find that then move it to the end of the list.
 		 */
-		if (buffer->resultRelInfo == curr_rri)
+		if (rri == curr_rri)
 		{
-			miinfo->multiInsertBuffers = list_delete_first(miinfo->multiInsertBuffers);
-			miinfo->multiInsertBuffers = lappend(miinfo->multiInsertBuffers, buffer);
-			buffer = (CopyMultiInsertBuffer *) linitial(miinfo->multiInsertBuffers);
+			*mirri = list_delete_first(*mirri);
+			*mirri = lappend(*mirri, rri);
+			rri = (ResultRelInfo *) linitial(*mirri);
 		}
 
-		CopyMultiInsertBufferCleanup(miinfo, buffer);
-		miinfo->multiInsertBuffers = list_delete_first(miinfo->multiInsertBuffers);
-	}
-}
+		buffer = rri->ri_CopyMultiInsertBuffer;
+		istate = buffer->istate;
+		ti_options = istate->options;
 
-/*
- * Cleanup allocated buffers and free memory
- */
-static inline void
-CopyMultiInsertInfoCleanup(CopyMultiInsertInfo *miinfo)
-{
-	ListCell   *lc;
-
-	foreach(lc, miinfo->multiInsertBuffers)
-		CopyMultiInsertBufferCleanup(miinfo, lfirst(lc));
+		table_insert_end(istate);
+		table_finish_bulk_insert(rri->ri_RelationDesc, ti_options);
 
-	list_free(miinfo->multiInsertBuffers);
+		*mirri = list_delete_first(*mirri);
+	}
 }
 
 /*
- * Get the next TupleTableSlot that the next tuple should be stored in.
- *
- * Callers must ensure that the buffer is not full.
- *
- * Note: 'miinfo' is unused but has been included for consistency with the
- * other functions in this area.
+ * Drop the buffered slots.
  */
-static inline TupleTableSlot *
-CopyMultiInsertInfoNextFreeSlot(CopyMultiInsertInfo *miinfo,
-								ResultRelInfo *rri)
+static void
+CopyMulitInsertDropBuffers(List *mirri)
 {
-	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
-	int			nused = buffer->nused;
-
-	Assert(buffer != NULL);
-	Assert(nused < MAX_BUFFERED_TUPLES);
+	ListCell   *lc;
 
-	if (buffer->slots[nused] == NULL)
-		buffer->slots[nused] = table_slot_create(rri->ri_RelationDesc, NULL);
-	return buffer->slots[nused];
-}
+	foreach(lc, mirri)
+	{
+		int ti_options;
+		ResultRelInfo *rri = lfirst(lc);
+		CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
+		TableInsertState *istate = buffer->istate;
 
-/*
- * Record the previously reserved TupleTableSlot that was reserved by
- * CopyMultiInsertInfoNextFreeSlot as being consumed.
- */
-static inline void
-CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
-						 TupleTableSlot *slot, int tuplen, uint64 lineno)
-{
-	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
+		ti_options = istate->options;
 
-	Assert(buffer != NULL);
-	Assert(slot == buffer->slots[buffer->nused]);
+		table_insert_end(istate);
 
-	/* Store the line number so we can properly report any errors later */
-	buffer->linenos[buffer->nused] = lineno;
+		table_finish_bulk_insert(rri->ri_RelationDesc, ti_options);
 
-	/* Record this slot as being used */
-	buffer->nused++;
+		pfree(buffer);
+	}
 
-	/* Update how many tuples are stored and their size */
-	miinfo->bufferedTuples++;
-	miinfo->bufferedBytes += tuplen;
+	list_free(mirri);
 }
 
 /*
@@ -529,20 +372,20 @@ CopyFrom(CopyFromState cstate)
 	EState	   *estate = CreateExecutorState(); /* for ExecConstraints() */
 	ModifyTableState *mtstate;
 	ExprContext *econtext;
-	TupleTableSlot *singleslot = NULL;
+	TupleTableSlot *slot = NULL;
 	MemoryContext oldcontext = CurrentMemoryContext;
 
 	PartitionTupleRouting *proute = NULL;
 	ErrorContextCallback errcallback;
 	CommandId	mycid = GetCurrentCommandId(true);
 	int			ti_options = 0; /* start with default options for insert */
-	BulkInsertState bistate = NULL;
 	CopyInsertMethod insertMethod;
-	CopyMultiInsertInfo multiInsertInfo = {0};	/* pacify compiler */
 	uint64		processed = 0;
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
 	bool		leafpart_use_multi_insert = false;
+	List	   *multi_insert_rris = NULL;
+	TableInsertState *istate = NULL;
 
 	Assert(cstate->rel);
 	Assert(list_length(cstate->range_table) == 1);
@@ -726,7 +569,7 @@ CopyFrom(CopyFromState cstate)
 		 * For partitioned tables we can't support multi-inserts when there
 		 * are any statement level insert triggers. It might be possible to
 		 * allow partitioned tables with such triggers in the future, but for
-		 * now, CopyMultiInsertInfoFlush expects that any before row insert
+		 * now, CopyMulitInsertFlushBuffers expects that any before row insert
 		 * and statement level insert triggers are on the same relation.
 		 */
 		insertMethod = CIM_SINGLE;
@@ -774,22 +617,22 @@ CopyFrom(CopyFromState cstate)
 		else
 			insertMethod = CIM_MULTI;
 
-		CopyMultiInsertInfoInit(&multiInsertInfo, resultRelInfo, cstate,
-								estate, mycid, ti_options);
+		/*
+		* Only setup the buffer when not dealing with a partitioned table.
+		* Buffers for partitioned tables will just be setup when we need to
+		* send tuples their way for the first time.
+		*/
+		if (!proute)
+			InitCopyMultiInsertBufferInfo(&multi_insert_rris, resultRelInfo,
+										  mycid, ti_options);
 	}
 
 	/*
-	 * If not using batch mode (which allocates slots as needed) set up a
-	 * tuple slot too. When inserting into a partitioned table, we also need
-	 * one, even if we might batch insert, to read the tuple in the root
-	 * partition's form.
+	 * Set up a tuple slot to which the input data from copy stream is read
+	 * into and used for inserts into table.
 	 */
-	if (insertMethod == CIM_SINGLE || insertMethod == CIM_MULTI_CONDITIONAL)
-	{
-		singleslot = table_slot_create(resultRelInfo->ri_RelationDesc,
-									   &estate->es_tupleTable);
-		bistate = GetBulkInsertState();
-	}
+	slot = table_slot_create(resultRelInfo->ri_RelationDesc,
+							 &estate->es_tupleTable);
 
 	has_before_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
 								  resultRelInfo->ri_TrigDesc->trig_insert_before_row);
@@ -827,19 +670,8 @@ CopyFrom(CopyFromState cstate)
 		ResetPerTupleExprContext(estate);
 
 		/* select slot to (initially) load row into */
-		if (insertMethod == CIM_SINGLE || proute)
-		{
-			myslot = singleslot;
-			Assert(myslot != NULL);
-		}
-		else
-		{
-			Assert(resultRelInfo == target_resultRelInfo);
-			Assert(insertMethod == CIM_MULTI);
-
-			myslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
-													 resultRelInfo);
-		}
+		myslot = slot;
+		Assert(myslot != NULL);
 
 		/*
 		 * Switch to per-tuple context before calling NextCopyFrom, which does
@@ -907,21 +739,22 @@ CopyFrom(CopyFromState cstate)
 				if (leafpart_use_multi_insert)
 				{
 					if (resultRelInfo->ri_CopyMultiInsertBuffer == NULL)
-						CopyMultiInsertInfoSetupBuffer(&multiInsertInfo,
-													   resultRelInfo);
+						InitCopyMultiInsertBufferInfo(&multi_insert_rris,
+													  resultRelInfo, mycid,
+													  ti_options);
 				}
-				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
-						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+				else if (insertMethod == CIM_MULTI_CONDITIONAL)
 				{
 					/*
 					 * Flush pending inserts if this partition can't use
 					 * batching, so rows are visible to triggers etc.
 					 */
-					CopyMultiInsertInfoFlush(&multiInsertInfo, resultRelInfo);
+					CopyMulitInsertFlushBuffers(&multi_insert_rris,
+												resultRelInfo, cstate, estate);
 				}
 
-				if (bistate != NULL)
-					ReleaseBulkInsertStatePin(bistate);
+				if (istate && istate->bistate)
+					ReleaseBulkInsertStatePin(istate->bistate);
 				prevResultRelInfo = resultRelInfo;
 			}
 
@@ -963,8 +796,8 @@ CopyFrom(CopyFromState cstate)
 				/* no other path available for partitioned table */
 				Assert(insertMethod == CIM_MULTI_CONDITIONAL);
 
-				batchslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
-															resultRelInfo);
+				batchslot = table_slot_create(resultRelInfo->ri_RelationDesc,
+											  &estate->es_tupleTable);
 
 				if (map != NULL)
 					myslot = execute_attr_map_slot(map->attrMap, myslot,
@@ -1036,24 +869,9 @@ CopyFrom(CopyFromState cstate)
 				/* Store the slot in the multi-insert buffer, when enabled. */
 				if (insertMethod == CIM_MULTI || leafpart_use_multi_insert)
 				{
-					/*
-					 * The slot previously might point into the per-tuple
-					 * context. For batching it needs to be longer lived.
-					 */
-					ExecMaterializeSlot(myslot);
-
 					/* Add this tuple to the tuple buffer */
-					CopyMultiInsertInfoStore(&multiInsertInfo,
-											 resultRelInfo, myslot,
-											 cstate->line_buf.len,
-											 cstate->cur_lineno);
-
-					/*
-					 * If enough inserts have queued up, then flush all
-					 * buffers out to their tables.
-					 */
-					if (CopyMultiInsertInfoIsFull(&multiInsertInfo))
-						CopyMultiInsertInfoFlush(&multiInsertInfo, resultRelInfo);
+					CopyMultiInsertBufferTuple(resultRelInfo, myslot, cstate,
+											   estate);
 				}
 				else
 				{
@@ -1079,9 +897,19 @@ CopyFrom(CopyFromState cstate)
 					}
 					else
 					{
+						if (!istate)
+						{
+							istate = table_insert_begin(resultRelInfo->ri_RelationDesc,
+														mycid,
+														ti_options,
+														true,
+														false);
+						}
+
+						istate->rel = resultRelInfo->ri_RelationDesc;
+
 						/* OK, store the tuple and create index entries for it */
-						table_tuple_insert(resultRelInfo->ri_RelationDesc,
-										   myslot, mycid, ti_options, bistate);
+						table_insert_v2(istate, myslot);
 
 						if (resultRelInfo->ri_NumIndices > 0)
 							recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
@@ -1113,16 +941,14 @@ CopyFrom(CopyFromState cstate)
 
 	/* Flush any remaining buffered tuples */
 	if (insertMethod != CIM_SINGLE)
-	{
-		if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
-			CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
-	}
+		CopyMulitInsertFlushBuffers(&multi_insert_rris, resultRelInfo,
+									cstate, estate);
 
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
 
-	if (bistate != NULL)
-		FreeBulkInsertState(bistate);
+	if (istate)
+		table_insert_end(istate);
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -1149,7 +975,7 @@ CopyFrom(CopyFromState cstate)
 
 	/* Tear down the multi-insert buffer data */
 	if (insertMethod != CIM_SINGLE)
-		CopyMultiInsertInfoCleanup(&multiInsertInfo);
+		CopyMulitInsertDropBuffers(multi_insert_rris);
 
 	/* Close all the partitioned tables, leaf partitions, and their indices */
 	if (proute)
-- 
2.25.1



^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2021-02-20 05:45  Bharath Rupireddy <[email protected]>
  parent: Bharath Rupireddy <[email protected]>
  0 siblings, 2 replies; 43+ messages in thread

From: Bharath Rupireddy @ 2021-02-20 05:45 UTC (permalink / raw)
  To: Luc Vlaming <[email protected]>; +Cc: Justin Pryzby <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Paul Guo <[email protected]>; Jeff Davis <[email protected]>

On Wed, Feb 17, 2021 at 12:46 PM Bharath Rupireddy
<[email protected]> wrote:
> Hi,
>
> I addressed the following review comments and attaching v3 patch set.
>
> 1) ExecClearTuple happens before ExecCopySlot in heap_multi_insert_v2
> and this allowed us to remove clear_mi_slots flag from
> TableInsertState.
> 2) I retained the flushed variable inside TableInsertState so that the
> callers can know whether the buffered slots have been flushed. If yes,
> the callers can execute after insert row triggers or perform index
> insertions. This is easier than passing the after insert row triggers
> info and index info to new multi insert table am and let it do. This
> way the functionalities can be kept separate i.e. multi insert ams do
> only buffering, decisions on when to flush, insertions and the callers
> will execute triggers or index insertions. And also none of the
> existing table ams are performing these operations within them, so
> this is inline with the current design of the table ams.
> 3) I have kept the single and multi insert API separate. The previous
> suggestion was to have only a single insert API and let the callers
> provide initially whether they want multi or single inserts. One
> problem with that approach is that we have to allow table ams to
> execute the after row triggers or index insertions. That is something
> I personally don't like.
>
> 0001 - new table ams implementation
> 0002 - the new multi table ams used in CREATE TABLE AS and REFRESH
> MATERIALIZED VIEW
> 0003 - the new multi table ams used in COPY
>
> Please review the v3 patch set further.

Below is the performance gain measured for CREATE TABLE AS with the
new multi insert am propsed in this thread:

case 1 - 2 integer(of 4 bytes each) columns, 3 varchar(8), tuple size
59 bytes, 100mn tuples
on master - 185sec
on master with multi inserts - 121sec, gain - 1.52X

case 2 - 2 bigint(of 8 bytes each) columns, 3 name(of 64 bytes each)
columns, 1 varchar(8), tuple size 241 bytes, 100mn tuples
on master - 367sec
on master with multi inserts - 291sec, gain - 1.26X

case 3 - 2 integer(of 4 bytes each) columns, tuple size 32 bytes, 100mn tuples
on master - 130sec
on master with multi inserts - 105sec, gain - 1.23X

case 4 - 2 bigint(of 8 bytes each) columns, 16 name(of 64 bytes each)
columns, tuple size 1064 bytes, 10mn tuples
on master - 120sec
on master with multi inserts - 115sec, gain - 1.04X

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com





^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2021-02-20 07:25  Zhihong Yu <[email protected]>
  parent: Bharath Rupireddy <[email protected]>
  1 sibling, 1 reply; 43+ messages in thread

From: Zhihong Yu @ 2021-02-20 07:25 UTC (permalink / raw)
  To: Bharath Rupireddy <[email protected]>; +Cc: Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Paul Guo <[email protected]>; Jeff Davis <[email protected]>

Hi,
bq. case 3 - 2 integer(of 4 bytes each) columns, tuple size 32 bytes

Is there some other column(s) per row apart from the integer columns ?
Since the 2 integer columns only occupy 8 bytes. I wonder where the other
32-8=24 bytes come from.

Thanks

On Fri, Feb 19, 2021 at 9:45 PM Bharath Rupireddy <
[email protected]> wrote:

> On Wed, Feb 17, 2021 at 12:46 PM Bharath Rupireddy
> <[email protected]> wrote:
> > Hi,
> >
> > I addressed the following review comments and attaching v3 patch set.
> >
> > 1) ExecClearTuple happens before ExecCopySlot in heap_multi_insert_v2
> > and this allowed us to remove clear_mi_slots flag from
> > TableInsertState.
> > 2) I retained the flushed variable inside TableInsertState so that the
> > callers can know whether the buffered slots have been flushed. If yes,
> > the callers can execute after insert row triggers or perform index
> > insertions. This is easier than passing the after insert row triggers
> > info and index info to new multi insert table am and let it do. This
> > way the functionalities can be kept separate i.e. multi insert ams do
> > only buffering, decisions on when to flush, insertions and the callers
> > will execute triggers or index insertions. And also none of the
> > existing table ams are performing these operations within them, so
> > this is inline with the current design of the table ams.
> > 3) I have kept the single and multi insert API separate. The previous
> > suggestion was to have only a single insert API and let the callers
> > provide initially whether they want multi or single inserts. One
> > problem with that approach is that we have to allow table ams to
> > execute the after row triggers or index insertions. That is something
> > I personally don't like.
> >
> > 0001 - new table ams implementation
> > 0002 - the new multi table ams used in CREATE TABLE AS and REFRESH
> > MATERIALIZED VIEW
> > 0003 - the new multi table ams used in COPY
> >
> > Please review the v3 patch set further.
>
> Below is the performance gain measured for CREATE TABLE AS with the
> new multi insert am propsed in this thread:
>
> case 1 - 2 integer(of 4 bytes each) columns, 3 varchar(8), tuple size
> 59 bytes, 100mn tuples
> on master - 185sec
> on master with multi inserts - 121sec, gain - 1.52X
>
> case 2 - 2 bigint(of 8 bytes each) columns, 3 name(of 64 bytes each)
> columns, 1 varchar(8), tuple size 241 bytes, 100mn tuples
> on master - 367sec
> on master with multi inserts - 291sec, gain - 1.26X
>
> case 3 - 2 integer(of 4 bytes each) columns, tuple size 32 bytes, 100mn
> tuples
> on master - 130sec
> on master with multi inserts - 105sec, gain - 1.23X
>
> case 4 - 2 bigint(of 8 bytes each) columns, 16 name(of 64 bytes each)
> columns, tuple size 1064 bytes, 10mn tuples
> on master - 120sec
> on master with multi inserts - 115sec, gain - 1.04X
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
>
>
>


^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2021-02-20 07:50  Bharath Rupireddy <[email protected]>
  parent: Zhihong Yu <[email protected]>
  0 siblings, 0 replies; 43+ messages in thread

From: Bharath Rupireddy @ 2021-02-20 07:50 UTC (permalink / raw)
  To: Zhihong Yu <[email protected]>; +Cc: Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Paul Guo <[email protected]>; Jeff Davis <[email protected]>

On Sat, Feb 20, 2021 at 12:53 PM Zhihong Yu <[email protected]> wrote:
>
> Hi,
> bq. case 3 - 2 integer(of 4 bytes each) columns, tuple size 32 bytes
>
> Is there some other column(s) per row apart from the integer columns ? Since the 2 integer columns only occupy 8 bytes. I wonder where the other 32-8=24 bytes come from.

There are no other columns in the test case. Those 24 bytes are for
tuple header(23bytes) and 1 byte for other bookkeeping info. See
"Table Row Layout" from
https://www.postgresql.org/docs/devel/storage-page-layout.html.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com





^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2021-03-08 13:07  Dilip Kumar <[email protected]>
  parent: Bharath Rupireddy <[email protected]>
  1 sibling, 1 reply; 43+ messages in thread

From: Dilip Kumar @ 2021-03-08 13:07 UTC (permalink / raw)
  To: Bharath Rupireddy <[email protected]>; +Cc: Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Paul Guo <[email protected]>; Jeff Davis <[email protected]>

On Sat, Feb 20, 2021 at 11:15 AM Bharath Rupireddy
<[email protected]> wrote:

> > Please review the v3 patch set further.
>
> Below is the performance gain measured for CREATE TABLE AS with the
> new multi insert am propsed in this thread:
>
> case 1 - 2 integer(of 4 bytes each) columns, 3 varchar(8), tuple size
> 59 bytes, 100mn tuples
> on master - 185sec
> on master with multi inserts - 121sec, gain - 1.52X
>
> case 2 - 2 bigint(of 8 bytes each) columns, 3 name(of 64 bytes each)
> columns, 1 varchar(8), tuple size 241 bytes, 100mn tuples
> on master - 367sec
> on master with multi inserts - 291sec, gain - 1.26X
>
> case 3 - 2 integer(of 4 bytes each) columns, tuple size 32 bytes, 100mn tuples
> on master - 130sec
> on master with multi inserts - 105sec, gain - 1.23X
>
> case 4 - 2 bigint(of 8 bytes each) columns, 16 name(of 64 bytes each)
> columns, tuple size 1064 bytes, 10mn tuples
> on master - 120sec
> on master with multi inserts - 115sec, gain - 1.04X

Performance numbers look good, especially with the smaller tuple size.
I was looking into the patch and I have a question.

+static inline void
+table_insert_v2(TableInsertState *state, TupleTableSlot *slot)
+{
+ state->rel->rd_tableam->tuple_insert_v2(state, slot);
+}
+
+static inline void
+table_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
+{
+ state->rel->rd_tableam->multi_insert_v2(state, slot);
+}

Why do we need to invent a new version table_insert_v2?  And also why
it is named table_insert* instead of table_tuple_insert*?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com





^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2021-03-09 08:15  Bharath Rupireddy <[email protected]>
  parent: Dilip Kumar <[email protected]>
  0 siblings, 1 reply; 43+ messages in thread

From: Bharath Rupireddy @ 2021-03-09 08:15 UTC (permalink / raw)
  To: Dilip Kumar <[email protected]>; +Cc: Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Paul Guo <[email protected]>; Jeff Davis <[email protected]>

On Mon, Mar 8, 2021 at 6:37 PM Dilip Kumar <[email protected]> wrote:
>
> On Sat, Feb 20, 2021 at 11:15 AM Bharath Rupireddy
> <[email protected]> wrote:
>
> > > Please review the v3 patch set further.
> >
> > Below is the performance gain measured for CREATE TABLE AS with the
> > new multi insert am propsed in this thread:
> >
> > case 1 - 2 integer(of 4 bytes each) columns, 3 varchar(8), tuple size
> > 59 bytes, 100mn tuples
> > on master - 185sec
> > on master with multi inserts - 121sec, gain - 1.52X
> >
> > case 2 - 2 bigint(of 8 bytes each) columns, 3 name(of 64 bytes each)
> > columns, 1 varchar(8), tuple size 241 bytes, 100mn tuples
> > on master - 367sec
> > on master with multi inserts - 291sec, gain - 1.26X
> >
> > case 3 - 2 integer(of 4 bytes each) columns, tuple size 32 bytes, 100mn tuples
> > on master - 130sec
> > on master with multi inserts - 105sec, gain - 1.23X
> >
> > case 4 - 2 bigint(of 8 bytes each) columns, 16 name(of 64 bytes each)
> > columns, tuple size 1064 bytes, 10mn tuples
> > on master - 120sec
> > on master with multi inserts - 115sec, gain - 1.04X
>
> Performance numbers look good, especially with the smaller tuple size.

Thanks.

> I was looking into the patch and I have a question.
>
> +static inline void
> +table_insert_v2(TableInsertState *state, TupleTableSlot *slot)
> +{
> + state->rel->rd_tableam->tuple_insert_v2(state, slot);
> +}
> +
> +static inline void
> +table_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
> +{
> + state->rel->rd_tableam->multi_insert_v2(state, slot);
> +}
>
> Why do we need to invent a new version table_insert_v2?  And also why
> it is named table_insert* instead of table_tuple_insert*?

New version, because we changed the input parameters, now passing the
params via TableInsertState but existing table_tuple_insert doesn't do
that. If okay, I can change table_insert_v2  to table_tuple_insert_v2?
Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com





^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2021-03-10 04:51  Bharath Rupireddy <[email protected]>
  parent: Bharath Rupireddy <[email protected]>
  0 siblings, 1 reply; 43+ messages in thread

From: Bharath Rupireddy @ 2021-03-10 04:51 UTC (permalink / raw)
  To: Dilip Kumar <[email protected]>; +Cc: Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Paul Guo <[email protected]>; Jeff Davis <[email protected]>

On Tue, Mar 9, 2021 at 1:45 PM Bharath Rupireddy
<[email protected]> wrote:
> On Mon, Mar 8, 2021 at 6:37 PM Dilip Kumar <[email protected]> wrote:
>>
> > Why do we need to invent a new version table_insert_v2?  And also why
> > it is named table_insert* instead of table_tuple_insert*?
>
> New version, because we changed the input parameters, now passing the
> params via TableInsertState but existing table_tuple_insert doesn't do
> that. If okay, I can change table_insert_v2  to table_tuple_insert_v2?
> Thoughts?

Changed table_insert_v2 to table_tuple_insert_v2. And also, rebased
the patches on to the latest master.

Attaching the v4 patch set. Please review it further.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


Attachments:

  [application/x-patch] v4-0001-New-Table-AMs-for-Multi-and-Single-Inserts.patch (19.4K, 2-v4-0001-New-Table-AMs-for-Multi-and-Single-Inserts.patch)
  download | inline diff:
From 6518212583e24b017375512701d9fefa6de20e42 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Wed, 10 Mar 2021 09:53:48 +0530
Subject: [PATCH v4 1/3] New Table AMs for Multi and Single Inserts

This patch introduces new table access methods for multi and
single inserts. Also implements/rearranges the outside code for
heap am into these new APIs.

Main design goal of these new APIs is to give flexibility to
tableam developers in implementing multi insert logic dependent on
the underlying storage engine. Currently, for all the underlying
storage engines, we follow the same multi insert logic such as when
and how to flush the buffered tuples, tuple size calculation, and
this logic doesn't take into account the underlying storage engine
capabilities.

We can also avoid duplicating multi insert code (for existing COPY,
and upcoming CTAS, CREATE/REFRESH MAT VIEW and INSERT SELECTs). We
can also move bulk insert state allocation and deallocation inside
these APIs.
---
 src/backend/access/heap/heapam.c         | 212 +++++++++++++++++++++++
 src/backend/access/heap/heapam_handler.c |   5 +
 src/backend/access/table/tableamapi.c    |   7 +
 src/backend/executor/execTuples.c        |  83 ++++++++-
 src/include/access/heapam.h              |  49 +++++-
 src/include/access/tableam.h             |  87 ++++++++++
 src/include/executor/tuptable.h          |   1 +
 7 files changed, 438 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 3b435c107d..d8bfe17f22 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -67,6 +67,7 @@
 #include "utils/datum.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
+#include "utils/memutils.h"
 #include "utils/relcache.h"
 #include "utils/snapmgr.h"
 #include "utils/spccache.h"
@@ -2669,6 +2670,217 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 	pgstat_count_heap_insert(relation, ntuples);
 }
 
+/*
+ * heap_insert_begin - allocate and initialize TableInsertState
+ *
+ * For single inserts:
+ *  1) Specify is_multi as false, then multi insert state will be NULL.
+ *
+ * For multi inserts:
+ *  1) Specify is_multi as true, then multi insert state will be allocated and
+ * 	   initialized.
+ *
+ *  Other input parameters i.e. relation, command id, options are common for
+ *  both single and multi inserts.
+ */
+TableInsertState*
+heap_insert_begin(Relation rel, CommandId cid, int options, bool is_multi)
+{
+	TableInsertState *state;
+
+	state = palloc(sizeof(TableInsertState));
+	state->rel = rel;
+	state->cid = cid;
+	state->options = options;
+	/* Below parameters are not used for single inserts. */
+	state->mi_slots = NULL;
+	state->mistate = NULL;
+	state->mi_cur_slots = 0;
+	state->flushed = false;
+
+	if (is_multi)
+	{
+		HeapMultiInsertState *mistate;
+
+		mistate = palloc(sizeof(HeapMultiInsertState));
+		state->mi_slots =
+				palloc0(sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
+		mistate->max_slots = MAX_BUFFERED_TUPLES;
+		mistate->max_size = MAX_BUFFERED_BYTES;
+		mistate->cur_size = 0;
+		/*
+		 * Create a temporary memory context so that we can reset once per
+		 * multi insert batch.
+		 */
+		mistate->context = AllocSetContextCreate(CurrentMemoryContext,
+												 "heap_multi_insert",
+												 ALLOCSET_DEFAULT_SIZES);
+		state->mistate = mistate;
+	}
+
+	return state;
+}
+
+/*
+ * heap_insert_v2 - insert single tuple into a heap
+ *
+ * Insert tuple from slot into table. This is like heap_insert(), the only
+ * difference is that the parameters for insertion are inside table insert
+ * state structure.
+ */
+void
+heap_insert_v2(TableInsertState *state, TupleTableSlot *slot)
+{
+	bool		shouldFree = true;
+	HeapTuple	tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);
+
+	Assert(state);
+
+	/* Update tuple with table oid */
+	slot->tts_tableOid = RelationGetRelid(state->rel);
+	tuple->t_tableOid = slot->tts_tableOid;
+
+	/* Perform insertion, and copy the resulting ItemPointer */
+	heap_insert(state->rel, tuple, state->cid, state->options, state->bistate);
+	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
+
+	if (shouldFree)
+		pfree(tuple);
+}
+
+/*
+ * heap_multi_insert_v2 - insert multiple tuples into a heap
+ *
+ * Compute size of tuple. See if the buffered slots can hold the tuple. If yes,
+ * store it in the buffers, otherwise flush i.e. insert the so far buffered
+ * tuples into heap.
+ *
+ * Flush can happen:
+ *  1) either if all the buffered slots are filled up
+ *  2) or if total tuple size of the currently buffered slots are >= max_size
+ */
+void
+heap_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
+{
+	TupleTableSlot  *batchslot;
+	HeapMultiInsertState *mistate;
+	Size sz;
+
+	Assert(state);
+
+	mistate = (HeapMultiInsertState *)state->mistate;
+
+	Assert(mistate && state->mi_slots);
+
+	/* Reset flush state if previously set. */
+	if (state->flushed)
+	{
+		state->mi_cur_slots = 0;
+		state->flushed = false;
+	}
+
+	Assert(state->mi_cur_slots < mistate->max_slots);
+
+	if (state->mi_slots[state->mi_cur_slots] == NULL)
+		state->mi_slots[state->mi_cur_slots] =
+									table_slot_create(state->rel, NULL);
+
+	batchslot = state->mi_slots[state->mi_cur_slots];
+
+	ExecClearTuple(batchslot);
+	ExecCopySlot(batchslot, slot);
+
+	/*
+	 * Calculate tuple size after original slot is copied, because the copied
+	 * slot type and tuple size may change.
+	 */
+	sz = GetTupleSize(batchslot, mistate->max_size);
+
+	Assert(sz > 0);
+
+	state->mi_cur_slots++;
+	mistate->cur_size += sz;
+
+	if (state->mi_cur_slots >= mistate->max_slots ||
+		mistate->cur_size >= mistate->max_size)
+		heap_multi_insert_flush(state);
+}
+
+/*
+ * heap_multi_insert_flush - flush buffered tuples, if any, into a heap
+ *
+ * Flush the buffered tuples, indicate caller that flushing happened and reset
+ * parameters.
+ */
+void
+heap_multi_insert_flush(TableInsertState *state)
+{
+	HeapMultiInsertState *mistate;
+	MemoryContext oldcontext;
+
+	Assert(state);
+
+	mistate = (HeapMultiInsertState *)state->mistate;
+
+	Assert(mistate && state->mi_slots && state->mi_cur_slots >= 0 &&
+		   mistate->context);
+
+	if (state->flushed)
+	{
+		state->mi_cur_slots = 0;
+		state->flushed = false;
+		return;
+	}
+
+	oldcontext = MemoryContextSwitchTo(mistate->context);
+	heap_multi_insert(state->rel, state->mi_slots, state->mi_cur_slots,
+					  state->cid, state->options, state->bistate);
+	MemoryContextReset(mistate->context);
+	MemoryContextSwitchTo(oldcontext);
+
+	state->flushed = true;
+	mistate->cur_size = 0;
+}
+
+/*
+ * heap_insert_end - clean up TableInsertState
+ *
+ * For multi inserts, ensure to flush all the remaining buffers with
+ * heap_multi_insert_flush before calling this function.
+ *
+ * In this function, buffered slots are dropped, short-lived memory context is
+ * deleted, mistate and TableInsertState are freed up.
+ */
+void
+heap_insert_end(TableInsertState *state)
+{
+	HeapMultiInsertState *mistate;
+	int i;
+
+	Assert(state);
+
+	mistate = (HeapMultiInsertState *)state->mistate;
+
+	if (!mistate)
+	{
+		pfree(state);
+		return;
+	}
+
+	Assert(state->mi_slots &&  mistate->context);
+
+	/* Ensure that the buffers have been flushed before. */
+	Assert(state->mi_cur_slots == 0 || state->flushed);
+
+	for (i = 0; i < mistate->max_slots && state->mi_slots[i] != NULL; i++)
+		ExecDropSingleTupleTableSlot(state->mi_slots[i]);
+
+	MemoryContextDelete(mistate->context);
+	pfree(mistate);
+	pfree(state->mi_slots);
+	pfree(state);
+}
+
 /*
  *	simple_heap_insert - insert a tuple
  *
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bd5faf0c1f..655de8e6b7 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2558,6 +2558,11 @@ static const TableAmRoutine heapam_methods = {
 	.tuple_insert_speculative = heapam_tuple_insert_speculative,
 	.tuple_complete_speculative = heapam_tuple_complete_speculative,
 	.multi_insert = heap_multi_insert,
+	.tuple_insert_begin = heap_insert_begin,
+	.tuple_insert_v2 = heap_insert_v2,
+	.multi_insert_v2 = heap_multi_insert_v2,
+	.multi_insert_flush = heap_multi_insert_flush,
+	.tuple_insert_end = heap_insert_end,
 	.tuple_delete = heapam_tuple_delete,
 	.tuple_update = heapam_tuple_update,
 	.tuple_lock = heapam_tuple_lock,
diff --git a/src/backend/access/table/tableamapi.c b/src/backend/access/table/tableamapi.c
index 325ecdc122..95f1f9b6a0 100644
--- a/src/backend/access/table/tableamapi.c
+++ b/src/backend/access/table/tableamapi.c
@@ -78,6 +78,13 @@ GetTableAmRoutine(Oid amhandler)
 	Assert(routine->tuple_complete_speculative != NULL);
 
 	Assert(routine->multi_insert != NULL);
+
+	Assert(routine->tuple_insert_begin != NULL);
+	Assert(routine->tuple_insert_v2 != NULL);
+	Assert(routine->multi_insert_v2 != NULL);
+	Assert(routine->multi_insert_flush != NULL);
+	Assert(routine->tuple_insert_end != NULL);
+
 	Assert(routine->tuple_delete != NULL);
 	Assert(routine->tuple_update != NULL);
 	Assert(routine->tuple_lock != NULL);
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index 73c35df9c9..79ae22455a 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -159,7 +159,11 @@ tts_virtual_materialize(TupleTableSlot *slot)
 	if (TTS_SHOULDFREE(slot))
 		return;
 
-	/* compute size of memory required */
+	/*
+	 * Compute size of memory required. This size calculation code is also used
+	 * in GetTupleSize(), hence ensure to have the same changes or fixes here
+	 * and also there.
+	 */
 	for (int natt = 0; natt < desc->natts; natt++)
 	{
 		Form_pg_attribute att = TupleDescAttr(desc, natt);
@@ -1239,6 +1243,83 @@ ExecDropSingleTupleTableSlot(TupleTableSlot *slot)
 	pfree(slot);
 }
 
+/*
+ * GetTupleSize - Compute the tuple size given a table slot.
+ *
+ * For heap tuple, buffer tuple and minimal tuple slot types return the actual
+ * tuple size that exists. For virtual tuple, the size is calculated as the
+ * slot does not have the tuple size. If the computed size exceeds the given
+ * maxsize for the virtual tuple, this function exits, not investing time in
+ * further unnecessary calculation.
+ *
+ * Important Notes:
+ * 1) Size calculation code for virtual slots is being used from
+ * 	  tts_virtual_materialize(), hence ensure to have the same changes or fixes
+ * 	  here and also there.
+ * 2) Currently, GetTupleSize() handles the existing heap, buffer, minimal and
+ * 	  virtual slots. Ensure to add related code in case any new slot type is
+ *    introduced.
+ */
+inline Size
+GetTupleSize(TupleTableSlot *slot, Size maxsize)
+{
+	Size sz = 0;
+	HeapTuple tuple = NULL;
+
+	if (TTS_IS_HEAPTUPLE(slot))
+		tuple = ((HeapTupleTableSlot *) slot)->tuple;
+	else if(TTS_IS_BUFFERTUPLE(slot))
+		tuple = ((BufferHeapTupleTableSlot *) slot)->base.tuple;
+	else if(TTS_IS_MINIMALTUPLE(slot))
+		tuple = ((MinimalTupleTableSlot *) slot)->tuple;
+	else if(TTS_IS_VIRTUAL(slot))
+	{
+		/*
+		 * Size calculation code being used here is from
+		 * tts_virtual_materialize(), ensure to have the same changes or fixes
+		 * here and also there.
+		 */
+		TupleDesc	desc = slot->tts_tupleDescriptor;
+
+		for (int natt = 0; natt < desc->natts; natt++)
+		{
+			Form_pg_attribute att = TupleDescAttr(desc, natt);
+			Datum		val;
+
+			if (att->attbyval)
+				sz += att->attlen;
+
+			if (slot->tts_isnull[natt])
+				continue;
+
+			val = slot->tts_values[natt];
+
+			if (att->attlen == -1 &&
+				VARATT_IS_EXTERNAL_EXPANDED(DatumGetPointer(val)))
+			{
+				sz = att_align_nominal(sz, att->attalign);
+				sz += EOH_get_flat_size(DatumGetEOHP(val));
+			}
+			else
+			{
+				sz = att_align_nominal(sz, att->attalign);
+				sz = att_addlength_datum(sz, att->attlen, val);
+			}
+
+			/*
+			 * We are not interested in proceeding further if the computed size
+			 * crosses maxsize limit that we are looking for.
+			 */
+			if (maxsize != 0 && sz >= maxsize)
+				break;
+		}
+	}
+
+	if (tuple != NULL && !TTS_IS_VIRTUAL(slot))
+		sz = tuple->t_len;
+
+	return sz;
+}
 
 /* ----------------------------------------------------------------
  *				  tuple table slot accessor functions
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index bc0936bc2d..da74ab072d 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -36,11 +36,26 @@
 #define HEAP_INSERT_NO_LOGICAL	TABLE_INSERT_NO_LOGICAL
 #define HEAP_INSERT_SPECULATIVE 0x0010
 
-typedef struct BulkInsertStateData *BulkInsertState;
 struct TupleTableSlot;
 
 #define MaxLockTupleMode	LockTupleExclusive
 
+/*
+ * No more than this many tuples per single multi insert batch
+ *
+ * Caution: Don't make this too big, as we could end up with this many tuples
+ * stored in multi insert buffer. Increasing this can cause quadratic growth in
+ * memory requirements during copies into partitioned tables with a large
+ * number of partitions.
+ */
+#define MAX_BUFFERED_TUPLES		1000
+
+/*
+ * Flush multi insert buffers if there are >= this many bytes, as counted by
+ * the size of the tuples buffered.
+ */
+#define MAX_BUFFERED_BYTES		65535
+
 /*
  * Descriptor for heap table scans.
  */
@@ -93,6 +108,25 @@ typedef enum
 	HEAPTUPLE_DELETE_IN_PROGRESS	/* deleting xact is still in progress */
 } HTSV_Result;
 
+/* Holds multi insert state for heap access method.*/
+typedef struct HeapMultiInsertState
+{
+	/* Switch to short-lived memory context before flushing. */
+	MemoryContext       context;
+	/* Maximum number of slots that can be buffered. */
+	int32               max_slots;
+	/*
+	 * Maximum size (in bytes) of all the tuples that a single batch of
+	 * buffered slots can hold.
+	 */
+	int64               max_size;
+	/*
+	 * Total tuple size (in bytes) of the slots that are currently buffered.
+	 * Flush the buffered slots when cur_size >= max_size.
+	 */
+	int64               cur_size;
+} HeapMultiInsertState;
+
 /* ----------------
  *		function prototypes for heap access method
  *
@@ -134,15 +168,20 @@ extern bool heap_hot_search_buffer(ItemPointer tid, Relation relation,
 
 extern void heap_get_latest_tid(TableScanDesc scan, ItemPointer tid);
 
-extern BulkInsertState GetBulkInsertState(void);
-extern void FreeBulkInsertState(BulkInsertState);
-extern void ReleaseBulkInsertStatePin(BulkInsertState bistate);
-
 extern void heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 						int options, BulkInsertState bistate);
 extern void heap_multi_insert(Relation relation, struct TupleTableSlot **slots,
 							  int ntuples, CommandId cid, int options,
 							  BulkInsertState bistate);
+
+extern TableInsertState* heap_insert_begin(Relation rel, CommandId cid,
+										   int options, bool is_multi);
+extern void heap_insert_v2(TableInsertState *state, TupleTableSlot *slot);
+extern void heap_multi_insert_v2(TableInsertState *state,
+								 TupleTableSlot *slot);
+extern void heap_multi_insert_flush(TableInsertState *state);
+extern void heap_insert_end(TableInsertState *state);
+
 extern TM_Result heap_delete(Relation relation, ItemPointer tid,
 							 CommandId cid, Snapshot crosscheck, bool wait,
 							 struct TM_FailureData *tmfd, bool changingPart);
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 414b6b4d57..2a1470a7b6 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -229,6 +229,32 @@ typedef struct TM_IndexDeleteOp
 	TM_IndexStatus *status;
 } TM_IndexDeleteOp;
 
+/* Holds table insert state. */
+typedef struct TableInsertState
+{
+	Relation	rel;
+	/* Bulk insert state if requested, otherwise NULL. */
+	struct BulkInsertStateData	*bistate;
+	CommandId	cid;
+	int	options;
+	/* Below members are only used for multi inserts. */
+	/* Array of buffered slots. */
+	TupleTableSlot	**mi_slots;
+	/* Number of slots that are currently buffered. */
+	int32	mi_cur_slots;
+	/*
+	 * Access method specific information such as parameters that are needed
+	 * for buffering and flushing decisions can go here.
+	 */
+	void	*mistate;
+	/*
+	 * This parameter indicates whether or not the buffered slots have been
+	 * flushed to a table. Used by callers of multi insert API for inserting
+	 * into indexes or executing after row triggers, if any.
+	 */
+	bool	flushed;
+}TableInsertState;
+
 /* "options" flag bits for table_tuple_insert */
 /* TABLE_INSERT_SKIP_WAL was 0x0001; RelationNeedsWAL() now governs */
 #define TABLE_INSERT_SKIP_FSM		0x0002
@@ -504,6 +530,17 @@ typedef struct TableAmRoutine
 	void		(*multi_insert) (Relation rel, TupleTableSlot **slots, int nslots,
 								 CommandId cid, int options, struct BulkInsertStateData *bistate);
 
+	TableInsertState* (*tuple_insert_begin) (Relation rel, CommandId cid,
+											 int options, bool is_multi);
+
+	void (*tuple_insert_v2) (TableInsertState *state, TupleTableSlot *slot);
+
+	void (*multi_insert_v2) (TableInsertState *state, TupleTableSlot *slot);
+
+	void (*multi_insert_flush) (TableInsertState *state);
+
+	void (*tuple_insert_end) (TableInsertState *state);
+
 	/* see table_tuple_delete() for reference about parameters */
 	TM_Result	(*tuple_delete) (Relation rel,
 								 ItemPointer tid,
@@ -851,6 +888,8 @@ typedef struct TableAmRoutine
 } TableAmRoutine;
 
 
+typedef struct BulkInsertStateData *BulkInsertState;
+
 /* ----------------------------------------------------------------------------
  * Slot functions.
  * ----------------------------------------------------------------------------
@@ -869,6 +908,10 @@ extern const TupleTableSlotOps *table_slot_callbacks(Relation rel);
  */
 extern TupleTableSlot *table_slot_create(Relation rel, List **reglist);
 
+/* Bulk insert state functions. */
+extern BulkInsertState GetBulkInsertState(void);
+extern void FreeBulkInsertState(BulkInsertState);
+extern void ReleaseBulkInsertStatePin(BulkInsertState bistate);
 
 /* ----------------------------------------------------------------------------
  * Table scan functions.
@@ -1430,6 +1473,50 @@ table_multi_insert(Relation rel, TupleTableSlot **slots, int nslots,
 								  cid, options, bistate);
 }
 
+static inline TableInsertState*
+table_insert_begin(Relation rel, CommandId cid, int options,
+				   bool alloc_bistate, bool is_multi)
+{
+	TableInsertState *state = rel->rd_tableam->tuple_insert_begin(rel, cid,
+										options, is_multi);
+
+	/* Allocate bulk insert state here, since it's AM independent. */
+	if (alloc_bistate)
+		state->bistate = GetBulkInsertState();
+	else
+		state->bistate = NULL;
+
+	return state;
+}
+
+static inline void
+table_tuple_insert_v2(TableInsertState *state, TupleTableSlot *slot)
+{
+	state->rel->rd_tableam->tuple_insert_v2(state, slot);
+}
+
+static inline void
+table_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
+{
+	state->rel->rd_tableam->multi_insert_v2(state, slot);
+}
+
+static inline void
+table_multi_insert_flush(TableInsertState *state)
+{
+	state->rel->rd_tableam->multi_insert_flush(state);
+}
+
+static inline void
+table_insert_end(TableInsertState *state)
+{
+	/* Deallocate bulk insert state here, since it's AM independent. */
+	if (state->bistate)
+		FreeBulkInsertState(state->bistate);
+
+	state->rel->rd_tableam->tuple_insert_end(state);
+}
+
 /*
  * Delete a tuple.
  *
diff --git a/src/include/executor/tuptable.h b/src/include/executor/tuptable.h
index 679e57fbdd..1f59614183 100644
--- a/src/include/executor/tuptable.h
+++ b/src/include/executor/tuptable.h
@@ -330,6 +330,7 @@ extern void slot_getmissingattrs(TupleTableSlot *slot, int startAttNum,
 								 int lastAttNum);
 extern void slot_getsomeattrs_int(TupleTableSlot *slot, int attnum);
 
+extern Size GetTupleSize(TupleTableSlot *slot, Size maxsize);
 
 #ifndef FRONTEND
 
-- 
2.25.1



  [application/x-patch] v4-0002-CTAS-and-REFRESH-Mat-View-With-New-Multi-Insert-T.patch (6.0K, 3-v4-0002-CTAS-and-REFRESH-Mat-View-With-New-Multi-Insert-T.patch)
  download | inline diff:
From d9de92281d7b5c44a6a15994a9a11052149c9981 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Wed, 10 Mar 2021 09:54:59 +0530
Subject: [PATCH v4 2/3] CTAS and REFRESH Mat View With New Multi Insert Table AM

This patch adds new multi insert table access methods to
CREATE TABLE AS, CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED
VIEW.
---
 src/backend/commands/createas.c | 49 +++++++++++++++++----------------
 src/backend/commands/matview.c  | 35 ++++++++++++-----------
 2 files changed, 43 insertions(+), 41 deletions(-)

diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index dce882012e..36ad0ef698 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -58,9 +58,7 @@ typedef struct
 	/* These fields are filled by intorel_startup: */
 	Relation	rel;			/* relation to write to */
 	ObjectAddress reladdr;		/* address of rel, for ExecCreateTableAs */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableInsertState *istate;	/* insert state */
 } DR_intorel;
 
 /* utility functions for CTAS definition creation */
@@ -546,22 +544,26 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	if (is_matview && !into->skipData)
 		SetMatViewPopulatedState(intoRelationDesc, true);
 
-	/*
-	 * Fill private fields of myState for use by later routines
-	 */
-	myState->rel = intoRelationDesc;
-	myState->reladdr = intoRelationAddr;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM;
-
 	/*
 	 * If WITH NO DATA is specified, there is no need to set up the state for
-	 * bulk inserts as there are no tuples to insert.
+	 * bulk inserts and multi inserts as there are no tuples to insert.
 	 */
 	if (!into->skipData)
-		myState->bistate = GetBulkInsertState();
+	{
+		myState->istate = table_insert_begin(intoRelationDesc,
+											 GetCurrentCommandId(true),
+											 TABLE_INSERT_SKIP_FSM,
+											 true,
+											 true);
+	}
 	else
-		myState->bistate = NULL;
+		myState->istate = NULL;
+
+	/*
+	 * Fill private fields of myState for use by later routines
+	 */
+	myState->rel = intoRelationDesc;
+	myState->reladdr = intoRelationAddr;
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -589,11 +591,7 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
 		 * would not be cheap either. This also doesn't allow accessing per-AM
 		 * data (say a tuple's xmin), but since we don't do that here...
 		 */
-		table_tuple_insert(myState->rel,
-						   slot,
-						   myState->output_cid,
-						   myState->ti_options,
-						   myState->bistate);
+		table_multi_insert_v2(myState->istate, slot);
 	}
 
 	/* We know this is a newly created relation, so there are no indexes */
@@ -608,12 +606,17 @@ static void
 intorel_shutdown(DestReceiver *self)
 {
 	DR_intorel *myState = (DR_intorel *) self;
-	IntoClause *into = myState->into;
+	int ti_options;
 
-	if (!into->skipData)
+	if (!myState->into->skipData)
 	{
-		FreeBulkInsertState(myState->bistate);
-		table_finish_bulk_insert(myState->rel, myState->ti_options);
+		ti_options = myState->istate->options;
+
+		table_multi_insert_flush(myState->istate);
+
+		table_insert_end(myState->istate);
+
+		table_finish_bulk_insert(myState->rel, ti_options);
 	}
 
 	/* close rel, but keep lock until commit */
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index c5c25ce11d..9c6b5f8525 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -52,10 +52,7 @@ typedef struct
 	DestReceiver pub;			/* publicly-known function pointers */
 	Oid			transientoid;	/* OID of new heap into which to store */
 	/* These fields are filled by transientrel_startup: */
-	Relation	transientrel;	/* relation to write to */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableInsertState *istate;	/* insert state */
 } DR_transientrel;
 
 static int	matview_maintenance_depth = 0;
@@ -466,10 +463,11 @@ transientrel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	/*
 	 * Fill private fields of myState for use by later routines
 	 */
-	myState->transientrel = transientrel;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM | TABLE_INSERT_FROZEN;
-	myState->bistate = GetBulkInsertState();
+	myState->istate = table_insert_begin(transientrel,
+										 GetCurrentCommandId(true),
+										 TABLE_INSERT_SKIP_FSM | TABLE_INSERT_FROZEN,
+										 true,
+										 true);
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -494,12 +492,7 @@ transientrel_receive(TupleTableSlot *slot, DestReceiver *self)
 	 * cheap either. This also doesn't allow accessing per-AM data (say a
 	 * tuple's xmin), but since we don't do that here...
 	 */
-
-	table_tuple_insert(myState->transientrel,
-					   slot,
-					   myState->output_cid,
-					   myState->ti_options,
-					   myState->bistate);
+	table_multi_insert_v2(myState->istate, slot);
 
 	/* We know this is a newly created relation, so there are no indexes */
 
@@ -513,14 +506,20 @@ static void
 transientrel_shutdown(DestReceiver *self)
 {
 	DR_transientrel *myState = (DR_transientrel *) self;
+	int ti_options;
+	Relation transientrel;
+
+	ti_options = myState->istate->options;
+	transientrel = myState->istate->rel;
+
+	table_multi_insert_flush(myState->istate);
 
-	FreeBulkInsertState(myState->bistate);
+	table_insert_end(myState->istate);
 
-	table_finish_bulk_insert(myState->transientrel, myState->ti_options);
+	table_finish_bulk_insert(transientrel, ti_options);
 
 	/* close transientrel, but keep lock until commit */
-	table_close(myState->transientrel, NoLock);
-	myState->transientrel = NULL;
+	table_close(transientrel, NoLock);
 }
 
 /*
-- 
2.25.1



  [application/x-patch] v4-0003-COPY-With-New-Multi-and-Single-Insert-Table-AM.patch (22.2K, 4-v4-0003-COPY-With-New-Multi-and-Single-Insert-Table-AM.patch)
  download | inline diff:
From f1f77ff21a36ad039688a53a16cad48633ecd921 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Wed, 10 Mar 2021 09:59:32 +0530
Subject: [PATCH v4 3/3] COPY With New Multi and Single Insert Table AM

This patch adds new single and multi insert table access method to
COPY code.
---
 src/backend/commands/copyfrom.c | 474 ++++++++++----------------------
 1 file changed, 150 insertions(+), 324 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 2ed696d429..b2f57f2b1f 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -46,54 +46,17 @@
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
-/*
- * No more than this many tuples per CopyMultiInsertBuffer
- *
- * Caution: Don't make this too big, as we could end up with this many
- * CopyMultiInsertBuffer items stored in CopyMultiInsertInfo's
- * multiInsertBuffers list.  Increasing this can cause quadratic growth in
- * memory requirements during copies into partitioned tables with a large
- * number of partitions.
- */
-#define MAX_BUFFERED_TUPLES		1000
-
-/*
- * Flush buffers if there are >= this many bytes, as counted by the input
- * size, of tuples stored.
- */
-#define MAX_BUFFERED_BYTES		65535
-
 /* Trim the list of buffers back down to this number after flushing */
 #define MAX_PARTITION_BUFFERS	32
 
 /* Stores multi-insert data related to a single relation in CopyFrom. */
 typedef struct CopyMultiInsertBuffer
 {
-	TupleTableSlot *slots[MAX_BUFFERED_TUPLES]; /* Array to store tuples */
-	ResultRelInfo *resultRelInfo;	/* ResultRelInfo for 'relid' */
-	BulkInsertState bistate;	/* BulkInsertState for this rel */
-	int			nused;			/* number of 'slots' containing tuples */
-	uint64		linenos[MAX_BUFFERED_TUPLES];	/* Line # of tuple in copy
-												 * stream */
+	TableInsertState *istate;
+	/* Line # of tuple in copy stream. */
+	uint64		linenos[MAX_BUFFERED_TUPLES];
 } CopyMultiInsertBuffer;
 
-/*
- * Stores one or many CopyMultiInsertBuffers and details about the size and
- * number of tuples which are stored in them.  This allows multiple buffers to
- * exist at once when COPYing into a partitioned table.
- */
-typedef struct CopyMultiInsertInfo
-{
-	List	   *multiInsertBuffers; /* List of tracked CopyMultiInsertBuffers */
-	int			bufferedTuples; /* number of tuples buffered over all buffers */
-	int			bufferedBytes;	/* number of bytes from all buffered tuples */
-	CopyFromState	cstate;			/* Copy state for this CopyMultiInsertInfo */
-	EState	   *estate;			/* Executor state used for COPY */
-	CommandId	mycid;			/* Command Id used for COPY */
-	int			ti_options;		/* table insert options */
-} CopyMultiInsertInfo;
-
-
 /* non-export function prototypes */
 static char *limit_printout_length(const char *str);
 
@@ -210,143 +173,61 @@ limit_printout_length(const char *str)
  * Allocate memory and initialize a new CopyMultiInsertBuffer for this
  * ResultRelInfo.
  */
-static CopyMultiInsertBuffer *
-CopyMultiInsertBufferInit(ResultRelInfo *rri)
-{
-	CopyMultiInsertBuffer *buffer;
-
-	buffer = (CopyMultiInsertBuffer *) palloc(sizeof(CopyMultiInsertBuffer));
-	memset(buffer->slots, 0, sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
-	buffer->resultRelInfo = rri;
-	buffer->bistate = GetBulkInsertState();
-	buffer->nused = 0;
-
-	return buffer;
-}
-
-/*
- * Make a new buffer for this ResultRelInfo.
- */
-static inline void
-CopyMultiInsertInfoSetupBuffer(CopyMultiInsertInfo *miinfo,
-							   ResultRelInfo *rri)
+static void
+InitCopyMultiInsertBufferInfo(List **mirri, ResultRelInfo *rri,
+							  CommandId mycid, int ti_options)
 {
 	CopyMultiInsertBuffer *buffer;
 
-	buffer = CopyMultiInsertBufferInit(rri);
+	buffer = (CopyMultiInsertBuffer *) palloc0(sizeof(CopyMultiInsertBuffer));
+	buffer->istate = table_insert_begin(rri->ri_RelationDesc, mycid,
+										ti_options, true, true);
 
 	/* Setup back-link so we can easily find this buffer again */
 	rri->ri_CopyMultiInsertBuffer = buffer;
-	/* Record that we're tracking this buffer */
-	miinfo->multiInsertBuffers = lappend(miinfo->multiInsertBuffers, buffer);
+	*mirri = lappend(*mirri, rri);
 }
 
 /*
- * Initialize an already allocated CopyMultiInsertInfo.
- *
- * If rri is a non-partitioned table then a CopyMultiInsertBuffer is set up
- * for that table.
+ * Run AFTER ROW INSERT triggers or insert into indexes, if any, after buffered
+ * tuples are flushed to table.
  */
 static void
-CopyMultiInsertInfoInit(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
-						CopyFromState cstate, EState *estate, CommandId mycid,
-						int ti_options)
+HandleAfterRowEvents(ResultRelInfo *rri, EState *estate,
+					 CopyFromState cstate, int32 cur_slots)
 {
-	miinfo->multiInsertBuffers = NIL;
-	miinfo->bufferedTuples = 0;
-	miinfo->bufferedBytes = 0;
-	miinfo->cstate = cstate;
-	miinfo->estate = estate;
-	miinfo->mycid = mycid;
-	miinfo->ti_options = ti_options;
-
-	/*
-	 * Only setup the buffer when not dealing with a partitioned table.
-	 * Buffers for partitioned tables will just be setup when we need to send
-	 * tuples their way for the first time.
-	 */
-	if (rri->ri_RelationDesc->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
-		CopyMultiInsertInfoSetupBuffer(miinfo, rri);
-}
-
-/*
- * Returns true if the buffers are full
- */
-static inline bool
-CopyMultiInsertInfoIsFull(CopyMultiInsertInfo *miinfo)
-{
-	if (miinfo->bufferedTuples >= MAX_BUFFERED_TUPLES ||
-		miinfo->bufferedBytes >= MAX_BUFFERED_BYTES)
-		return true;
-	return false;
-}
-
-/*
- * Returns true if we have no buffered tuples
- */
-static inline bool
-CopyMultiInsertInfoIsEmpty(CopyMultiInsertInfo *miinfo)
-{
-	return miinfo->bufferedTuples == 0;
-}
-
-/*
- * Write the tuples stored in 'buffer' out to the table.
- */
-static inline void
-CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
-						   CopyMultiInsertBuffer *buffer)
-{
-	MemoryContext oldcontext;
-	int			i;
-	uint64		save_cur_lineno;
-	CopyFromState	cstate = miinfo->cstate;
-	EState	   *estate = miinfo->estate;
-	CommandId	mycid = miinfo->mycid;
-	int			ti_options = miinfo->ti_options;
-	bool		line_buf_valid = cstate->line_buf_valid;
-	int			nused = buffer->nused;
-	ResultRelInfo *resultRelInfo = buffer->resultRelInfo;
-	TupleTableSlot **slots = buffer->slots;
+	int i;
+	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
+	TableInsertState *istate = buffer->istate;
+	uint64	save_cur_lineno = cstate->cur_lineno;
+	bool	line_buf_valid = cstate->line_buf_valid;
 
-	/*
-	 * Print error context information correctly, if one of the operations
-	 * below fail.
-	 */
 	cstate->line_buf_valid = false;
-	save_cur_lineno = cstate->cur_lineno;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
-
-	for (i = 0; i < nused; i++)
+	for (i = 0; i < cur_slots; i++)
 	{
 		/*
 		 * If there are any indexes, update them for all the inserted tuples,
 		 * and run AFTER ROW INSERT triggers.
 		 */
-		if (resultRelInfo->ri_NumIndices > 0)
+		if (rri->ri_NumIndices > 0)
 		{
-			List	   *recheckIndexes;
+			List       *recheckIndexes;
 
 			cstate->cur_lineno = buffer->linenos[i];
-			recheckIndexes =
-				ExecInsertIndexTuples(resultRelInfo,
-									  buffer->slots[i], estate, false, false,
-									  NULL, NIL);
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], recheckIndexes,
-								 cstate->transition_capture);
+			recheckIndexes = ExecInsertIndexTuples(rri,
+													istate->mi_slots[i], estate,
+													false,
+													false,
+													NULL,
+													NULL);
+
+			ExecARInsertTriggers(estate,
+								rri,
+								istate->mi_slots[i],
+								recheckIndexes,
+								cstate->transition_capture);
+
 			list_free(recheckIndexes);
 		}
 
@@ -354,79 +235,69 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 		 * There's no indexes, but see if we need to run AFTER ROW INSERT
 		 * triggers anyway.
 		 */
-		else if (resultRelInfo->ri_TrigDesc != NULL &&
-				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		else if (rri->ri_TrigDesc != NULL &&
+				 (rri->ri_TrigDesc->trig_insert_after_row ||
+				  rri->ri_TrigDesc->trig_insert_new_table))
 		{
 			cstate->cur_lineno = buffer->linenos[i];
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], NIL, cstate->transition_capture);
+			ExecARInsertTriggers(estate,
+								 rri,
+								 istate->mi_slots[i],
+								 NULL,
+								 cstate->transition_capture);
 		}
-
-		ExecClearTuple(slots[i]);
 	}
 
-	/* Mark that all slots are free */
-	buffer->nused = 0;
-
 	/* reset cur_lineno and line_buf_valid to what they were */
 	cstate->line_buf_valid = line_buf_valid;
 	cstate->cur_lineno = save_cur_lineno;
-}
+ }
 
 /*
- * Drop used slots and free member for this buffer.
- *
- * The buffer must be flushed before cleanup.
+ * Store tuple from the incoming slot into buffered slots.
  */
-static inline void
-CopyMultiInsertBufferCleanup(CopyMultiInsertInfo *miinfo,
-							 CopyMultiInsertBuffer *buffer)
+static void
+CopyMultiInsertBufferTuple(ResultRelInfo *rri, TupleTableSlot *slot,
+						   CopyFromState cstate, EState *estate)
 {
-	int			i;
-
-	/* Ensure buffer was flushed */
-	Assert(buffer->nused == 0);
-
-	/* Remove back-link to ourself */
-	buffer->resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
-
-	FreeBulkInsertState(buffer->bistate);
+	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
+	TableInsertState *istate = buffer->istate;
 
-	/* Since we only create slots on demand, just drop the non-null ones. */
-	for (i = 0; i < MAX_BUFFERED_TUPLES && buffer->slots[i] != NULL; i++)
-		ExecDropSingleTupleTableSlot(buffer->slots[i]);
+	table_multi_insert_v2(buffer->istate, slot);
 
-	table_finish_bulk_insert(buffer->resultRelInfo->ri_RelationDesc,
-							 miinfo->ti_options);
+	buffer->linenos[istate->mi_cur_slots - 1] = cstate->cur_lineno;
 
-	pfree(buffer);
+	if (istate->flushed)
+		HandleAfterRowEvents(rri, estate, cstate, istate->mi_cur_slots);
 }
 
 /*
- * Write out all stored tuples in all buffers out to the tables.
- *
- * Once flushed we also trim the tracked buffers list down to size by removing
- * the buffers created earliest first.
- *
- * Callers should pass 'curr_rri' is the ResultRelInfo that's currently being
- * used.  When cleaning up old buffers we'll never remove the one for
- * 'curr_rri'.
+ * Flush tuples into table from the buffered slots.
  */
-static inline void
-CopyMultiInsertInfoFlush(CopyMultiInsertInfo *miinfo, ResultRelInfo *curr_rri)
-{
+static void
+CopyMulitInsertFlushBuffers(List **mirri, ResultRelInfo *curr_rri,
+							CopyFromState cstate, EState *estate)
+ {
 	ListCell   *lc;
+	ResultRelInfo *rri;
+	CopyMultiInsertBuffer *buffer;
+	TableInsertState *istate;
 
-	foreach(lc, miinfo->multiInsertBuffers)
+	foreach(lc, *mirri)
 	{
-		CopyMultiInsertBuffer *buffer = (CopyMultiInsertBuffer *) lfirst(lc);
+		rri = lfirst(lc);
+		buffer = rri->ri_CopyMultiInsertBuffer;
+		istate = buffer->istate;
+
+		table_multi_insert_flush(istate);
 
-		CopyMultiInsertBufferFlush(miinfo, buffer);
+		if (istate->flushed)
+			HandleAfterRowEvents(rri, estate, cstate, istate->mi_cur_slots);
 	}
 
-	miinfo->bufferedTuples = 0;
-	miinfo->bufferedBytes = 0;
+	rri = NULL;
+	buffer = NULL;
+	istate = NULL;
 
 	/*
 	 * Trim the list of tracked buffers down if it exceeds the limit.  Here we
@@ -434,87 +305,59 @@ CopyMultiInsertInfoFlush(CopyMultiInsertInfo *miinfo, ResultRelInfo *curr_rri)
 	 * likely that these older ones will be needed than the ones that were
 	 * just created.
 	 */
-	while (list_length(miinfo->multiInsertBuffers) > MAX_PARTITION_BUFFERS)
+	while (list_length(*mirri) > MAX_PARTITION_BUFFERS)
 	{
-		CopyMultiInsertBuffer *buffer;
+		int ti_options;
 
-		buffer = (CopyMultiInsertBuffer *) linitial(miinfo->multiInsertBuffers);
+		rri = (ResultRelInfo *) linitial(*mirri);
 
 		/*
 		 * We never want to remove the buffer that's currently being used, so
 		 * if we happen to find that then move it to the end of the list.
 		 */
-		if (buffer->resultRelInfo == curr_rri)
+		if (rri == curr_rri)
 		{
-			miinfo->multiInsertBuffers = list_delete_first(miinfo->multiInsertBuffers);
-			miinfo->multiInsertBuffers = lappend(miinfo->multiInsertBuffers, buffer);
-			buffer = (CopyMultiInsertBuffer *) linitial(miinfo->multiInsertBuffers);
+			*mirri = list_delete_first(*mirri);
+			*mirri = lappend(*mirri, rri);
+			rri = (ResultRelInfo *) linitial(*mirri);
 		}
 
-		CopyMultiInsertBufferCleanup(miinfo, buffer);
-		miinfo->multiInsertBuffers = list_delete_first(miinfo->multiInsertBuffers);
-	}
-}
+		buffer = rri->ri_CopyMultiInsertBuffer;
+		istate = buffer->istate;
+		ti_options = istate->options;
 
-/*
- * Cleanup allocated buffers and free memory
- */
-static inline void
-CopyMultiInsertInfoCleanup(CopyMultiInsertInfo *miinfo)
-{
-	ListCell   *lc;
-
-	foreach(lc, miinfo->multiInsertBuffers)
-		CopyMultiInsertBufferCleanup(miinfo, lfirst(lc));
+		table_insert_end(istate);
+		table_finish_bulk_insert(rri->ri_RelationDesc, ti_options);
 
-	list_free(miinfo->multiInsertBuffers);
+		*mirri = list_delete_first(*mirri);
+	}
 }
 
 /*
- * Get the next TupleTableSlot that the next tuple should be stored in.
- *
- * Callers must ensure that the buffer is not full.
- *
- * Note: 'miinfo' is unused but has been included for consistency with the
- * other functions in this area.
+ * Drop the buffered slots.
  */
-static inline TupleTableSlot *
-CopyMultiInsertInfoNextFreeSlot(CopyMultiInsertInfo *miinfo,
-								ResultRelInfo *rri)
+static void
+CopyMulitInsertDropBuffers(List *mirri)
 {
-	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
-	int			nused = buffer->nused;
-
-	Assert(buffer != NULL);
-	Assert(nused < MAX_BUFFERED_TUPLES);
+	ListCell   *lc;
 
-	if (buffer->slots[nused] == NULL)
-		buffer->slots[nused] = table_slot_create(rri->ri_RelationDesc, NULL);
-	return buffer->slots[nused];
-}
+	foreach(lc, mirri)
+	{
+		int ti_options;
+		ResultRelInfo *rri = lfirst(lc);
+		CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
+		TableInsertState *istate = buffer->istate;
 
-/*
- * Record the previously reserved TupleTableSlot that was reserved by
- * CopyMultiInsertInfoNextFreeSlot as being consumed.
- */
-static inline void
-CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
-						 TupleTableSlot *slot, int tuplen, uint64 lineno)
-{
-	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
+		ti_options = istate->options;
 
-	Assert(buffer != NULL);
-	Assert(slot == buffer->slots[buffer->nused]);
+		table_insert_end(istate);
 
-	/* Store the line number so we can properly report any errors later */
-	buffer->linenos[buffer->nused] = lineno;
+		table_finish_bulk_insert(rri->ri_RelationDesc, ti_options);
 
-	/* Record this slot as being used */
-	buffer->nused++;
+		pfree(buffer);
+	}
 
-	/* Update how many tuples are stored and their size */
-	miinfo->bufferedTuples++;
-	miinfo->bufferedBytes += tuplen;
+	list_free(mirri);
 }
 
 /*
@@ -529,21 +372,21 @@ CopyFrom(CopyFromState cstate)
 	EState	   *estate = CreateExecutorState(); /* for ExecConstraints() */
 	ModifyTableState *mtstate;
 	ExprContext *econtext;
-	TupleTableSlot *singleslot = NULL;
+	TupleTableSlot *slot = NULL;
 	MemoryContext oldcontext = CurrentMemoryContext;
 
 	PartitionTupleRouting *proute = NULL;
 	ErrorContextCallback errcallback;
 	CommandId	mycid = GetCurrentCommandId(true);
 	int			ti_options = 0; /* start with default options for insert */
-	BulkInsertState bistate = NULL;
 	CopyInsertMethod insertMethod;
-	CopyMultiInsertInfo multiInsertInfo = {0};	/* pacify compiler */
 	int64		processed = 0;
 	int64		excluded = 0;
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
 	bool		leafpart_use_multi_insert = false;
+	List       *multi_insert_rris = NULL;
+	TableInsertState *istate = NULL;
 
 	Assert(cstate->rel);
 	Assert(list_length(cstate->range_table) == 1);
@@ -727,7 +570,7 @@ CopyFrom(CopyFromState cstate)
 		 * For partitioned tables we can't support multi-inserts when there
 		 * are any statement level insert triggers. It might be possible to
 		 * allow partitioned tables with such triggers in the future, but for
-		 * now, CopyMultiInsertInfoFlush expects that any before row insert
+		 * now, CopyMulitInsertFlushBuffers expects that any before row insert
 		 * and statement level insert triggers are on the same relation.
 		 */
 		insertMethod = CIM_SINGLE;
@@ -775,22 +618,22 @@ CopyFrom(CopyFromState cstate)
 		else
 			insertMethod = CIM_MULTI;
 
-		CopyMultiInsertInfoInit(&multiInsertInfo, resultRelInfo, cstate,
-								estate, mycid, ti_options);
+		/*
+		* Only setup the buffer when not dealing with a partitioned table.
+		* Buffers for partitioned tables will just be setup when we need to
+		* send tuples their way for the first time.
+		*/
+		if (!proute)
+			InitCopyMultiInsertBufferInfo(&multi_insert_rris, resultRelInfo,
+										  mycid, ti_options);
 	}
 
 	/*
-	 * If not using batch mode (which allocates slots as needed) set up a
-	 * tuple slot too. When inserting into a partitioned table, we also need
-	 * one, even if we might batch insert, to read the tuple in the root
-	 * partition's form.
+	 * Set up a tuple slot to which the input data from copy stream is read
+	 * into and used for inserts into table.
 	 */
-	if (insertMethod == CIM_SINGLE || insertMethod == CIM_MULTI_CONDITIONAL)
-	{
-		singleslot = table_slot_create(resultRelInfo->ri_RelationDesc,
-									   &estate->es_tupleTable);
-		bistate = GetBulkInsertState();
-	}
+	slot = table_slot_create(resultRelInfo->ri_RelationDesc,
+							 &estate->es_tupleTable);
 
 	has_before_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
 								  resultRelInfo->ri_TrigDesc->trig_insert_before_row);
@@ -828,19 +671,8 @@ CopyFrom(CopyFromState cstate)
 		ResetPerTupleExprContext(estate);
 
 		/* select slot to (initially) load row into */
-		if (insertMethod == CIM_SINGLE || proute)
-		{
-			myslot = singleslot;
-			Assert(myslot != NULL);
-		}
-		else
-		{
-			Assert(resultRelInfo == target_resultRelInfo);
-			Assert(insertMethod == CIM_MULTI);
-
-			myslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
-													 resultRelInfo);
-		}
+		myslot = slot;
+		Assert(myslot != NULL);
 
 		/*
 		 * Switch to per-tuple context before calling NextCopyFrom, which does
@@ -916,21 +748,22 @@ CopyFrom(CopyFromState cstate)
 				if (leafpart_use_multi_insert)
 				{
 					if (resultRelInfo->ri_CopyMultiInsertBuffer == NULL)
-						CopyMultiInsertInfoSetupBuffer(&multiInsertInfo,
-													   resultRelInfo);
+						InitCopyMultiInsertBufferInfo(&multi_insert_rris,
+													  resultRelInfo, mycid,
+													  ti_options);
 				}
-				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
-						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+				else if (insertMethod == CIM_MULTI_CONDITIONAL)
 				{
 					/*
 					 * Flush pending inserts if this partition can't use
 					 * batching, so rows are visible to triggers etc.
 					 */
-					CopyMultiInsertInfoFlush(&multiInsertInfo, resultRelInfo);
+					CopyMulitInsertFlushBuffers(&multi_insert_rris,
+												resultRelInfo, cstate, estate);
 				}
 
-				if (bistate != NULL)
-					ReleaseBulkInsertStatePin(bistate);
+				if (istate && istate->bistate)
+					ReleaseBulkInsertStatePin(istate->bistate);
 				prevResultRelInfo = resultRelInfo;
 			}
 
@@ -972,8 +805,8 @@ CopyFrom(CopyFromState cstate)
 				/* no other path available for partitioned table */
 				Assert(insertMethod == CIM_MULTI_CONDITIONAL);
 
-				batchslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
-															resultRelInfo);
+				batchslot = table_slot_create(resultRelInfo->ri_RelationDesc,
+											  &estate->es_tupleTable);
 
 				if (map != NULL)
 					myslot = execute_attr_map_slot(map->attrMap, myslot,
@@ -1045,24 +878,9 @@ CopyFrom(CopyFromState cstate)
 				/* Store the slot in the multi-insert buffer, when enabled. */
 				if (insertMethod == CIM_MULTI || leafpart_use_multi_insert)
 				{
-					/*
-					 * The slot previously might point into the per-tuple
-					 * context. For batching it needs to be longer lived.
-					 */
-					ExecMaterializeSlot(myslot);
-
 					/* Add this tuple to the tuple buffer */
-					CopyMultiInsertInfoStore(&multiInsertInfo,
-											 resultRelInfo, myslot,
-											 cstate->line_buf.len,
-											 cstate->cur_lineno);
-
-					/*
-					 * If enough inserts have queued up, then flush all
-					 * buffers out to their tables.
-					 */
-					if (CopyMultiInsertInfoIsFull(&multiInsertInfo))
-						CopyMultiInsertInfoFlush(&multiInsertInfo, resultRelInfo);
+					CopyMultiInsertBufferTuple(resultRelInfo, myslot, cstate,
+											   estate);
 				}
 				else
 				{
@@ -1088,9 +906,19 @@ CopyFrom(CopyFromState cstate)
 					}
 					else
 					{
+						if (!istate)
+						{
+							istate = table_insert_begin(resultRelInfo->ri_RelationDesc,
+														mycid,
+														ti_options,
+														true,
+														false);
+						}
+
+						istate->rel = resultRelInfo->ri_RelationDesc;
+
 						/* OK, store the tuple and create index entries for it */
-						table_tuple_insert(resultRelInfo->ri_RelationDesc,
-										   myslot, mycid, ti_options, bistate);
+						table_tuple_insert_v2(istate, myslot);
 
 						if (resultRelInfo->ri_NumIndices > 0)
 							recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
@@ -1123,16 +951,14 @@ CopyFrom(CopyFromState cstate)
 
 	/* Flush any remaining buffered tuples */
 	if (insertMethod != CIM_SINGLE)
-	{
-		if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
-			CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
-	}
+		CopyMulitInsertFlushBuffers(&multi_insert_rris, resultRelInfo,
+									cstate, estate);
 
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
 
-	if (bistate != NULL)
-		FreeBulkInsertState(bistate);
+	if (istate)
+		table_insert_end(istate);
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -1152,7 +978,7 @@ CopyFrom(CopyFromState cstate)
 
 	/* Tear down the multi-insert buffer data */
 	if (insertMethod != CIM_SINGLE)
-		CopyMultiInsertInfoCleanup(&multiInsertInfo);
+		CopyMulitInsertDropBuffers(multi_insert_rris);
 
 	/* Close all the partitioned tables, leaf partitions, and their indices */
 	if (proute)
-- 
2.25.1



^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2021-04-05 04:19  Bharath Rupireddy <[email protected]>
  parent: Bharath Rupireddy <[email protected]>
  0 siblings, 1 reply; 43+ messages in thread

From: Bharath Rupireddy @ 2021-04-05 04:19 UTC (permalink / raw)
  To: Dilip Kumar <[email protected]>; +Cc: Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Paul Guo <[email protected]>; Jeff Davis <[email protected]>

On Wed, Mar 10, 2021 at 10:21 AM Bharath Rupireddy
<[email protected]> wrote:
> Attaching the v4 patch set. Please review it further.

Attaching v5 patch set after rebasing onto the latest master. Please
review it further.


With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


Attachments:

  [application/x-patch] v5-0001-New-Table-AMs-for-Multi-and-Single-Inserts.patch (19.4K, 2-v5-0001-New-Table-AMs-for-Multi-and-Single-Inserts.patch)
  download | inline diff:
From 6518212583e24b017375512701d9fefa6de20e42 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Wed, 10 Mar 2021 09:53:48 +0530
Subject: [PATCH v5 1/3] New Table AMs for Multi and Single Inserts

This patch introduces new table access methods for multi and
single inserts. Also implements/rearranges the outside code for
heap am into these new APIs.

Main design goal of these new APIs is to give flexibility to
tableam developers in implementing multi insert logic dependent on
the underlying storage engine. Currently, for all the underlying
storage engines, we follow the same multi insert logic such as when
and how to flush the buffered tuples, tuple size calculation, and
this logic doesn't take into account the underlying storage engine
capabilities.

We can also avoid duplicating multi insert code (for existing COPY,
and upcoming CTAS, CREATE/REFRESH MAT VIEW and INSERT SELECTs). We
can also move bulk insert state allocation and deallocation inside
these APIs.
---
 src/backend/access/heap/heapam.c         | 212 +++++++++++++++++++++++
 src/backend/access/heap/heapam_handler.c |   5 +
 src/backend/access/table/tableamapi.c    |   7 +
 src/backend/executor/execTuples.c        |  83 ++++++++-
 src/include/access/heapam.h              |  49 +++++-
 src/include/access/tableam.h             |  87 ++++++++++
 src/include/executor/tuptable.h          |   1 +
 7 files changed, 438 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 3b435c107d..d8bfe17f22 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -67,6 +67,7 @@
 #include "utils/datum.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
+#include "utils/memutils.h"
 #include "utils/relcache.h"
 #include "utils/snapmgr.h"
 #include "utils/spccache.h"
@@ -2669,6 +2670,217 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 	pgstat_count_heap_insert(relation, ntuples);
 }
 
+/*
+ * heap_insert_begin - allocate and initialize TableInsertState
+ *
+ * For single inserts:
+ *  1) Specify is_multi as false, then multi insert state will be NULL.
+ *
+ * For multi inserts:
+ *  1) Specify is_multi as true, then multi insert state will be allocated and
+ * 	   initialized.
+ *
+ *  Other input parameters i.e. relation, command id, options are common for
+ *  both single and multi inserts.
+ */
+TableInsertState*
+heap_insert_begin(Relation rel, CommandId cid, int options, bool is_multi)
+{
+	TableInsertState *state;
+
+	state = palloc(sizeof(TableInsertState));
+	state->rel = rel;
+	state->cid = cid;
+	state->options = options;
+	/* Below parameters are not used for single inserts. */
+	state->mi_slots = NULL;
+	state->mistate = NULL;
+	state->mi_cur_slots = 0;
+	state->flushed = false;
+
+	if (is_multi)
+	{
+		HeapMultiInsertState *mistate;
+
+		mistate = palloc(sizeof(HeapMultiInsertState));
+		state->mi_slots =
+				palloc0(sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
+		mistate->max_slots = MAX_BUFFERED_TUPLES;
+		mistate->max_size = MAX_BUFFERED_BYTES;
+		mistate->cur_size = 0;
+		/*
+		 * Create a temporary memory context so that we can reset once per
+		 * multi insert batch.
+		 */
+		mistate->context = AllocSetContextCreate(CurrentMemoryContext,
+												 "heap_multi_insert",
+												 ALLOCSET_DEFAULT_SIZES);
+		state->mistate = mistate;
+	}
+
+	return state;
+}
+
+/*
+ * heap_insert_v2 - insert single tuple into a heap
+ *
+ * Insert tuple from slot into table. This is like heap_insert(), the only
+ * difference is that the parameters for insertion are inside table insert
+ * state structure.
+ */
+void
+heap_insert_v2(TableInsertState *state, TupleTableSlot *slot)
+{
+	bool		shouldFree = true;
+	HeapTuple	tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);
+
+	Assert(state);
+
+	/* Update tuple with table oid */
+	slot->tts_tableOid = RelationGetRelid(state->rel);
+	tuple->t_tableOid = slot->tts_tableOid;
+
+	/* Perform insertion, and copy the resulting ItemPointer */
+	heap_insert(state->rel, tuple, state->cid, state->options, state->bistate);
+	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
+
+	if (shouldFree)
+		pfree(tuple);
+}
+
+/*
+ * heap_multi_insert_v2 - insert multiple tuples into a heap
+ *
+ * Compute size of tuple. See if the buffered slots can hold the tuple. If yes,
+ * store it in the buffers, otherwise flush i.e. insert the so far buffered
+ * tuples into heap.
+ *
+ * Flush can happen:
+ *  1) either if all the buffered slots are filled up
+ *  2) or if total tuple size of the currently buffered slots are >= max_size
+ */
+void
+heap_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
+{
+	TupleTableSlot  *batchslot;
+	HeapMultiInsertState *mistate;
+	Size sz;
+
+	Assert(state);
+
+	mistate = (HeapMultiInsertState *)state->mistate;
+
+	Assert(mistate && state->mi_slots);
+
+	/* Reset flush state if previously set. */
+	if (state->flushed)
+	{
+		state->mi_cur_slots = 0;
+		state->flushed = false;
+	}
+
+	Assert(state->mi_cur_slots < mistate->max_slots);
+
+	if (state->mi_slots[state->mi_cur_slots] == NULL)
+		state->mi_slots[state->mi_cur_slots] =
+									table_slot_create(state->rel, NULL);
+
+	batchslot = state->mi_slots[state->mi_cur_slots];
+
+	ExecClearTuple(batchslot);
+	ExecCopySlot(batchslot, slot);
+
+	/*
+	 * Calculate tuple size after original slot is copied, because the copied
+	 * slot type and tuple size may change.
+	 */
+	sz = GetTupleSize(batchslot, mistate->max_size);
+
+	Assert(sz > 0);
+
+	state->mi_cur_slots++;
+	mistate->cur_size += sz;
+
+	if (state->mi_cur_slots >= mistate->max_slots ||
+		mistate->cur_size >= mistate->max_size)
+		heap_multi_insert_flush(state);
+}
+
+/*
+ * heap_multi_insert_flush - flush buffered tuples, if any, into a heap
+ *
+ * Flush the buffered tuples, indicate caller that flushing happened and reset
+ * parameters.
+ */
+void
+heap_multi_insert_flush(TableInsertState *state)
+{
+	HeapMultiInsertState *mistate;
+	MemoryContext oldcontext;
+
+	Assert(state);
+
+	mistate = (HeapMultiInsertState *)state->mistate;
+
+	Assert(mistate && state->mi_slots && state->mi_cur_slots >= 0 &&
+		   mistate->context);
+
+	if (state->flushed)
+	{
+		state->mi_cur_slots = 0;
+		state->flushed = false;
+		return;
+	}
+
+	oldcontext = MemoryContextSwitchTo(mistate->context);
+	heap_multi_insert(state->rel, state->mi_slots, state->mi_cur_slots,
+					  state->cid, state->options, state->bistate);
+	MemoryContextReset(mistate->context);
+	MemoryContextSwitchTo(oldcontext);
+
+	state->flushed = true;
+	mistate->cur_size = 0;
+}
+
+/*
+ * heap_insert_end - clean up TableInsertState
+ *
+ * For multi inserts, ensure to flush all the remaining buffers with
+ * heap_multi_insert_flush before calling this function.
+ *
+ * In this function, buffered slots are dropped, short-lived memory context is
+ * deleted, mistate and TableInsertState are freed up.
+ */
+void
+heap_insert_end(TableInsertState *state)
+{
+	HeapMultiInsertState *mistate;
+	int i;
+
+	Assert(state);
+
+	mistate = (HeapMultiInsertState *)state->mistate;
+
+	if (!mistate)
+	{
+		pfree(state);
+		return;
+	}
+
+	Assert(state->mi_slots &&  mistate->context);
+
+	/* Ensure that the buffers have been flushed before. */
+	Assert(state->mi_cur_slots == 0 || state->flushed);
+
+	for (i = 0; i < mistate->max_slots && state->mi_slots[i] != NULL; i++)
+		ExecDropSingleTupleTableSlot(state->mi_slots[i]);
+
+	MemoryContextDelete(mistate->context);
+	pfree(mistate);
+	pfree(state->mi_slots);
+	pfree(state);
+}
+
 /*
  *	simple_heap_insert - insert a tuple
  *
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bd5faf0c1f..655de8e6b7 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2558,6 +2558,11 @@ static const TableAmRoutine heapam_methods = {
 	.tuple_insert_speculative = heapam_tuple_insert_speculative,
 	.tuple_complete_speculative = heapam_tuple_complete_speculative,
 	.multi_insert = heap_multi_insert,
+	.tuple_insert_begin = heap_insert_begin,
+	.tuple_insert_v2 = heap_insert_v2,
+	.multi_insert_v2 = heap_multi_insert_v2,
+	.multi_insert_flush = heap_multi_insert_flush,
+	.tuple_insert_end = heap_insert_end,
 	.tuple_delete = heapam_tuple_delete,
 	.tuple_update = heapam_tuple_update,
 	.tuple_lock = heapam_tuple_lock,
diff --git a/src/backend/access/table/tableamapi.c b/src/backend/access/table/tableamapi.c
index 325ecdc122..95f1f9b6a0 100644
--- a/src/backend/access/table/tableamapi.c
+++ b/src/backend/access/table/tableamapi.c
@@ -78,6 +78,13 @@ GetTableAmRoutine(Oid amhandler)
 	Assert(routine->tuple_complete_speculative != NULL);
 
 	Assert(routine->multi_insert != NULL);
+
+	Assert(routine->tuple_insert_begin != NULL);
+	Assert(routine->tuple_insert_v2 != NULL);
+	Assert(routine->multi_insert_v2 != NULL);
+	Assert(routine->multi_insert_flush != NULL);
+	Assert(routine->tuple_insert_end != NULL);
+
 	Assert(routine->tuple_delete != NULL);
 	Assert(routine->tuple_update != NULL);
 	Assert(routine->tuple_lock != NULL);
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index 73c35df9c9..79ae22455a 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -159,7 +159,11 @@ tts_virtual_materialize(TupleTableSlot *slot)
 	if (TTS_SHOULDFREE(slot))
 		return;
 
-	/* compute size of memory required */
+	/*
+	 * Compute size of memory required. This size calculation code is also used
+	 * in GetTupleSize(), hence ensure to have the same changes or fixes here
+	 * and also there.
+	 */
 	for (int natt = 0; natt < desc->natts; natt++)
 	{
 		Form_pg_attribute att = TupleDescAttr(desc, natt);
@@ -1239,6 +1243,83 @@ ExecDropSingleTupleTableSlot(TupleTableSlot *slot)
 	pfree(slot);
 }
 
+/*
+ * GetTupleSize - Compute the tuple size given a table slot.
+ *
+ * For heap tuple, buffer tuple and minimal tuple slot types return the actual
+ * tuple size that exists. For virtual tuple, the size is calculated as the
+ * slot does not have the tuple size. If the computed size exceeds the given
+ * maxsize for the virtual tuple, this function exits, not investing time in
+ * further unnecessary calculation.
+ *
+ * Important Notes:
+ * 1) Size calculation code for virtual slots is being used from
+ * 	  tts_virtual_materialize(), hence ensure to have the same changes or fixes
+ * 	  here and also there.
+ * 2) Currently, GetTupleSize() handles the existing heap, buffer, minimal and
+ * 	  virtual slots. Ensure to add related code in case any new slot type is
+ *    introduced.
+ */
+inline Size
+GetTupleSize(TupleTableSlot *slot, Size maxsize)
+{
+	Size sz = 0;
+	HeapTuple tuple = NULL;
+
+	if (TTS_IS_HEAPTUPLE(slot))
+		tuple = ((HeapTupleTableSlot *) slot)->tuple;
+	else if(TTS_IS_BUFFERTUPLE(slot))
+		tuple = ((BufferHeapTupleTableSlot *) slot)->base.tuple;
+	else if(TTS_IS_MINIMALTUPLE(slot))
+		tuple = ((MinimalTupleTableSlot *) slot)->tuple;
+	else if(TTS_IS_VIRTUAL(slot))
+	{
+		/*
+		 * Size calculation code being used here is from
+		 * tts_virtual_materialize(), ensure to have the same changes or fixes
+		 * here and also there.
+		 */
+		TupleDesc	desc = slot->tts_tupleDescriptor;
+
+		for (int natt = 0; natt < desc->natts; natt++)
+		{
+			Form_pg_attribute att = TupleDescAttr(desc, natt);
+			Datum		val;
+
+			if (att->attbyval)
+				sz += att->attlen;
+
+			if (slot->tts_isnull[natt])
+				continue;
+
+			val = slot->tts_values[natt];
+
+			if (att->attlen == -1 &&
+				VARATT_IS_EXTERNAL_EXPANDED(DatumGetPointer(val)))
+			{
+				sz = att_align_nominal(sz, att->attalign);
+				sz += EOH_get_flat_size(DatumGetEOHP(val));
+			}
+			else
+			{
+				sz = att_align_nominal(sz, att->attalign);
+				sz = att_addlength_datum(sz, att->attlen, val);
+			}
+
+			/*
+			 * We are not interested in proceeding further if the computed size
+			 * crosses maxsize limit that we are looking for.
+			 */
+			if (maxsize != 0 && sz >= maxsize)
+				break;
+		}
+	}
+
+	if (tuple != NULL && !TTS_IS_VIRTUAL(slot))
+		sz = tuple->t_len;
+
+	return sz;
+}
 
 /* ----------------------------------------------------------------
  *				  tuple table slot accessor functions
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index bc0936bc2d..da74ab072d 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -36,11 +36,26 @@
 #define HEAP_INSERT_NO_LOGICAL	TABLE_INSERT_NO_LOGICAL
 #define HEAP_INSERT_SPECULATIVE 0x0010
 
-typedef struct BulkInsertStateData *BulkInsertState;
 struct TupleTableSlot;
 
 #define MaxLockTupleMode	LockTupleExclusive
 
+/*
+ * No more than this many tuples per single multi insert batch
+ *
+ * Caution: Don't make this too big, as we could end up with this many tuples
+ * stored in multi insert buffer. Increasing this can cause quadratic growth in
+ * memory requirements during copies into partitioned tables with a large
+ * number of partitions.
+ */
+#define MAX_BUFFERED_TUPLES		1000
+
+/*
+ * Flush multi insert buffers if there are >= this many bytes, as counted by
+ * the size of the tuples buffered.
+ */
+#define MAX_BUFFERED_BYTES		65535
+
 /*
  * Descriptor for heap table scans.
  */
@@ -93,6 +108,25 @@ typedef enum
 	HEAPTUPLE_DELETE_IN_PROGRESS	/* deleting xact is still in progress */
 } HTSV_Result;
 
+/* Holds multi insert state for heap access method.*/
+typedef struct HeapMultiInsertState
+{
+	/* Switch to short-lived memory context before flushing. */
+	MemoryContext       context;
+	/* Maximum number of slots that can be buffered. */
+	int32               max_slots;
+	/*
+	 * Maximum size (in bytes) of all the tuples that a single batch of
+	 * buffered slots can hold.
+	 */
+	int64               max_size;
+	/*
+	 * Total tuple size (in bytes) of the slots that are currently buffered.
+	 * Flush the buffered slots when cur_size >= max_size.
+	 */
+	int64               cur_size;
+} HeapMultiInsertState;
+
 /* ----------------
  *		function prototypes for heap access method
  *
@@ -134,15 +168,20 @@ extern bool heap_hot_search_buffer(ItemPointer tid, Relation relation,
 
 extern void heap_get_latest_tid(TableScanDesc scan, ItemPointer tid);
 
-extern BulkInsertState GetBulkInsertState(void);
-extern void FreeBulkInsertState(BulkInsertState);
-extern void ReleaseBulkInsertStatePin(BulkInsertState bistate);
-
 extern void heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 						int options, BulkInsertState bistate);
 extern void heap_multi_insert(Relation relation, struct TupleTableSlot **slots,
 							  int ntuples, CommandId cid, int options,
 							  BulkInsertState bistate);
+
+extern TableInsertState* heap_insert_begin(Relation rel, CommandId cid,
+										   int options, bool is_multi);
+extern void heap_insert_v2(TableInsertState *state, TupleTableSlot *slot);
+extern void heap_multi_insert_v2(TableInsertState *state,
+								 TupleTableSlot *slot);
+extern void heap_multi_insert_flush(TableInsertState *state);
+extern void heap_insert_end(TableInsertState *state);
+
 extern TM_Result heap_delete(Relation relation, ItemPointer tid,
 							 CommandId cid, Snapshot crosscheck, bool wait,
 							 struct TM_FailureData *tmfd, bool changingPart);
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 414b6b4d57..2a1470a7b6 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -229,6 +229,32 @@ typedef struct TM_IndexDeleteOp
 	TM_IndexStatus *status;
 } TM_IndexDeleteOp;
 
+/* Holds table insert state. */
+typedef struct TableInsertState
+{
+	Relation	rel;
+	/* Bulk insert state if requested, otherwise NULL. */
+	struct BulkInsertStateData	*bistate;
+	CommandId	cid;
+	int	options;
+	/* Below members are only used for multi inserts. */
+	/* Array of buffered slots. */
+	TupleTableSlot	**mi_slots;
+	/* Number of slots that are currently buffered. */
+	int32	mi_cur_slots;
+	/*
+	 * Access method specific information such as parameters that are needed
+	 * for buffering and flushing decisions can go here.
+	 */
+	void	*mistate;
+	/*
+	 * This parameter indicates whether or not the buffered slots have been
+	 * flushed to a table. Used by callers of multi insert API for inserting
+	 * into indexes or executing after row triggers, if any.
+	 */
+	bool	flushed;
+}TableInsertState;
+
 /* "options" flag bits for table_tuple_insert */
 /* TABLE_INSERT_SKIP_WAL was 0x0001; RelationNeedsWAL() now governs */
 #define TABLE_INSERT_SKIP_FSM		0x0002
@@ -504,6 +530,17 @@ typedef struct TableAmRoutine
 	void		(*multi_insert) (Relation rel, TupleTableSlot **slots, int nslots,
 								 CommandId cid, int options, struct BulkInsertStateData *bistate);
 
+	TableInsertState* (*tuple_insert_begin) (Relation rel, CommandId cid,
+											 int options, bool is_multi);
+
+	void (*tuple_insert_v2) (TableInsertState *state, TupleTableSlot *slot);
+
+	void (*multi_insert_v2) (TableInsertState *state, TupleTableSlot *slot);
+
+	void (*multi_insert_flush) (TableInsertState *state);
+
+	void (*tuple_insert_end) (TableInsertState *state);
+
 	/* see table_tuple_delete() for reference about parameters */
 	TM_Result	(*tuple_delete) (Relation rel,
 								 ItemPointer tid,
@@ -851,6 +888,8 @@ typedef struct TableAmRoutine
 } TableAmRoutine;
 
 
+typedef struct BulkInsertStateData *BulkInsertState;
+
 /* ----------------------------------------------------------------------------
  * Slot functions.
  * ----------------------------------------------------------------------------
@@ -869,6 +908,10 @@ extern const TupleTableSlotOps *table_slot_callbacks(Relation rel);
  */
 extern TupleTableSlot *table_slot_create(Relation rel, List **reglist);
 
+/* Bulk insert state functions. */
+extern BulkInsertState GetBulkInsertState(void);
+extern void FreeBulkInsertState(BulkInsertState);
+extern void ReleaseBulkInsertStatePin(BulkInsertState bistate);
 
 /* ----------------------------------------------------------------------------
  * Table scan functions.
@@ -1430,6 +1473,50 @@ table_multi_insert(Relation rel, TupleTableSlot **slots, int nslots,
 								  cid, options, bistate);
 }
 
+static inline TableInsertState*
+table_insert_begin(Relation rel, CommandId cid, int options,
+				   bool alloc_bistate, bool is_multi)
+{
+	TableInsertState *state = rel->rd_tableam->tuple_insert_begin(rel, cid,
+										options, is_multi);
+
+	/* Allocate bulk insert state here, since it's AM independent. */
+	if (alloc_bistate)
+		state->bistate = GetBulkInsertState();
+	else
+		state->bistate = NULL;
+
+	return state;
+}
+
+static inline void
+table_tuple_insert_v2(TableInsertState *state, TupleTableSlot *slot)
+{
+	state->rel->rd_tableam->tuple_insert_v2(state, slot);
+}
+
+static inline void
+table_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
+{
+	state->rel->rd_tableam->multi_insert_v2(state, slot);
+}
+
+static inline void
+table_multi_insert_flush(TableInsertState *state)
+{
+	state->rel->rd_tableam->multi_insert_flush(state);
+}
+
+static inline void
+table_insert_end(TableInsertState *state)
+{
+	/* Deallocate bulk insert state here, since it's AM independent. */
+	if (state->bistate)
+		FreeBulkInsertState(state->bistate);
+
+	state->rel->rd_tableam->tuple_insert_end(state);
+}
+
 /*
  * Delete a tuple.
  *
diff --git a/src/include/executor/tuptable.h b/src/include/executor/tuptable.h
index 679e57fbdd..1f59614183 100644
--- a/src/include/executor/tuptable.h
+++ b/src/include/executor/tuptable.h
@@ -330,6 +330,7 @@ extern void slot_getmissingattrs(TupleTableSlot *slot, int startAttNum,
 								 int lastAttNum);
 extern void slot_getsomeattrs_int(TupleTableSlot *slot, int attnum);
 
+extern Size GetTupleSize(TupleTableSlot *slot, Size maxsize);
 
 #ifndef FRONTEND
 
-- 
2.25.1



  [application/x-patch] v5-0002-CTAS-and-REFRESH-Mat-View-With-New-Multi-Insert-T.patch (6.0K, 3-v5-0002-CTAS-and-REFRESH-Mat-View-With-New-Multi-Insert-T.patch)
  download | inline diff:
From d9de92281d7b5c44a6a15994a9a11052149c9981 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Wed, 10 Mar 2021 09:54:59 +0530
Subject: [PATCH v5 2/3] CTAS and REFRESH Mat View With New Multi Insert Table AM

This patch adds new multi insert table access methods to
CREATE TABLE AS, CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED
VIEW.
---
 src/backend/commands/createas.c | 49 +++++++++++++++++----------------
 src/backend/commands/matview.c  | 35 ++++++++++++-----------
 2 files changed, 43 insertions(+), 41 deletions(-)

diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index dce882012e..36ad0ef698 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -58,9 +58,7 @@ typedef struct
 	/* These fields are filled by intorel_startup: */
 	Relation	rel;			/* relation to write to */
 	ObjectAddress reladdr;		/* address of rel, for ExecCreateTableAs */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableInsertState *istate;	/* insert state */
 } DR_intorel;
 
 /* utility functions for CTAS definition creation */
@@ -546,22 +544,26 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	if (is_matview && !into->skipData)
 		SetMatViewPopulatedState(intoRelationDesc, true);
 
-	/*
-	 * Fill private fields of myState for use by later routines
-	 */
-	myState->rel = intoRelationDesc;
-	myState->reladdr = intoRelationAddr;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM;
-
 	/*
 	 * If WITH NO DATA is specified, there is no need to set up the state for
-	 * bulk inserts as there are no tuples to insert.
+	 * bulk inserts and multi inserts as there are no tuples to insert.
 	 */
 	if (!into->skipData)
-		myState->bistate = GetBulkInsertState();
+	{
+		myState->istate = table_insert_begin(intoRelationDesc,
+											 GetCurrentCommandId(true),
+											 TABLE_INSERT_SKIP_FSM,
+											 true,
+											 true);
+	}
 	else
-		myState->bistate = NULL;
+		myState->istate = NULL;
+
+	/*
+	 * Fill private fields of myState for use by later routines
+	 */
+	myState->rel = intoRelationDesc;
+	myState->reladdr = intoRelationAddr;
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -589,11 +591,7 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
 		 * would not be cheap either. This also doesn't allow accessing per-AM
 		 * data (say a tuple's xmin), but since we don't do that here...
 		 */
-		table_tuple_insert(myState->rel,
-						   slot,
-						   myState->output_cid,
-						   myState->ti_options,
-						   myState->bistate);
+		table_multi_insert_v2(myState->istate, slot);
 	}
 
 	/* We know this is a newly created relation, so there are no indexes */
@@ -608,12 +606,17 @@ static void
 intorel_shutdown(DestReceiver *self)
 {
 	DR_intorel *myState = (DR_intorel *) self;
-	IntoClause *into = myState->into;
+	int ti_options;
 
-	if (!into->skipData)
+	if (!myState->into->skipData)
 	{
-		FreeBulkInsertState(myState->bistate);
-		table_finish_bulk_insert(myState->rel, myState->ti_options);
+		ti_options = myState->istate->options;
+
+		table_multi_insert_flush(myState->istate);
+
+		table_insert_end(myState->istate);
+
+		table_finish_bulk_insert(myState->rel, ti_options);
 	}
 
 	/* close rel, but keep lock until commit */
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index c5c25ce11d..9c6b5f8525 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -52,10 +52,7 @@ typedef struct
 	DestReceiver pub;			/* publicly-known function pointers */
 	Oid			transientoid;	/* OID of new heap into which to store */
 	/* These fields are filled by transientrel_startup: */
-	Relation	transientrel;	/* relation to write to */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableInsertState *istate;	/* insert state */
 } DR_transientrel;
 
 static int	matview_maintenance_depth = 0;
@@ -466,10 +463,11 @@ transientrel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	/*
 	 * Fill private fields of myState for use by later routines
 	 */
-	myState->transientrel = transientrel;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM | TABLE_INSERT_FROZEN;
-	myState->bistate = GetBulkInsertState();
+	myState->istate = table_insert_begin(transientrel,
+										 GetCurrentCommandId(true),
+										 TABLE_INSERT_SKIP_FSM | TABLE_INSERT_FROZEN,
+										 true,
+										 true);
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -494,12 +492,7 @@ transientrel_receive(TupleTableSlot *slot, DestReceiver *self)
 	 * cheap either. This also doesn't allow accessing per-AM data (say a
 	 * tuple's xmin), but since we don't do that here...
 	 */
-
-	table_tuple_insert(myState->transientrel,
-					   slot,
-					   myState->output_cid,
-					   myState->ti_options,
-					   myState->bistate);
+	table_multi_insert_v2(myState->istate, slot);
 
 	/* We know this is a newly created relation, so there are no indexes */
 
@@ -513,14 +506,20 @@ static void
 transientrel_shutdown(DestReceiver *self)
 {
 	DR_transientrel *myState = (DR_transientrel *) self;
+	int ti_options;
+	Relation transientrel;
+
+	ti_options = myState->istate->options;
+	transientrel = myState->istate->rel;
+
+	table_multi_insert_flush(myState->istate);
 
-	FreeBulkInsertState(myState->bistate);
+	table_insert_end(myState->istate);
 
-	table_finish_bulk_insert(myState->transientrel, myState->ti_options);
+	table_finish_bulk_insert(transientrel, ti_options);
 
 	/* close transientrel, but keep lock until commit */
-	table_close(myState->transientrel, NoLock);
-	myState->transientrel = NULL;
+	table_close(transientrel, NoLock);
 }
 
 /*
-- 
2.25.1



  [application/x-patch] v5-0003-COPY-With-New-Multi-and-Single-Insert-Table-AM.patch (22.2K, 4-v5-0003-COPY-With-New-Multi-and-Single-Insert-Table-AM.patch)
  download | inline diff:
From 848f448d7e41203a20781b96d596ef46573c17ce Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Mon, 5 Apr 2021 09:36:52 +0530
Subject: [PATCH v5] COPY With New Multi and Single Insert Table AM

This patch adds new single and multi insert table access method to
COPY code.
---
 src/backend/commands/copyfrom.c | 468 ++++++++++----------------------
 1 file changed, 146 insertions(+), 322 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index be2e3d7354..aa7fbc1fb1 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -53,54 +53,17 @@
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
-/*
- * No more than this many tuples per CopyMultiInsertBuffer
- *
- * Caution: Don't make this too big, as we could end up with this many
- * CopyMultiInsertBuffer items stored in CopyMultiInsertInfo's
- * multiInsertBuffers list.  Increasing this can cause quadratic growth in
- * memory requirements during copies into partitioned tables with a large
- * number of partitions.
- */
-#define MAX_BUFFERED_TUPLES		1000
-
-/*
- * Flush buffers if there are >= this many bytes, as counted by the input
- * size, of tuples stored.
- */
-#define MAX_BUFFERED_BYTES		65535
-
 /* Trim the list of buffers back down to this number after flushing */
 #define MAX_PARTITION_BUFFERS	32
 
 /* Stores multi-insert data related to a single relation in CopyFrom. */
 typedef struct CopyMultiInsertBuffer
 {
-	TupleTableSlot *slots[MAX_BUFFERED_TUPLES]; /* Array to store tuples */
-	ResultRelInfo *resultRelInfo;	/* ResultRelInfo for 'relid' */
-	BulkInsertState bistate;	/* BulkInsertState for this rel */
-	int			nused;			/* number of 'slots' containing tuples */
-	uint64		linenos[MAX_BUFFERED_TUPLES];	/* Line # of tuple in copy
-												 * stream */
+	TableInsertState *istate;
+	/* Line # of tuple in copy stream. */
+	uint64          linenos[MAX_BUFFERED_TUPLES];
 } CopyMultiInsertBuffer;
 
-/*
- * Stores one or many CopyMultiInsertBuffers and details about the size and
- * number of tuples which are stored in them.  This allows multiple buffers to
- * exist at once when COPYing into a partitioned table.
- */
-typedef struct CopyMultiInsertInfo
-{
-	List	   *multiInsertBuffers; /* List of tracked CopyMultiInsertBuffers */
-	int			bufferedTuples; /* number of tuples buffered over all buffers */
-	int			bufferedBytes;	/* number of bytes from all buffered tuples */
-	CopyFromState cstate;		/* Copy state for this CopyMultiInsertInfo */
-	EState	   *estate;			/* Executor state used for COPY */
-	CommandId	mycid;			/* Command Id used for COPY */
-	int			ti_options;		/* table insert options */
-} CopyMultiInsertInfo;
-
-
 /* non-export function prototypes */
 static char *limit_printout_length(const char *str);
 
@@ -207,108 +170,33 @@ limit_printout_length(const char *str)
 	return res;
 }
 
-/*
- * Allocate memory and initialize a new CopyMultiInsertBuffer for this
- * ResultRelInfo.
- */
-static CopyMultiInsertBuffer *
-CopyMultiInsertBufferInit(ResultRelInfo *rri)
-{
-	CopyMultiInsertBuffer *buffer;
-
-	buffer = (CopyMultiInsertBuffer *) palloc(sizeof(CopyMultiInsertBuffer));
-	memset(buffer->slots, 0, sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
-	buffer->resultRelInfo = rri;
-	buffer->bistate = GetBulkInsertState();
-	buffer->nused = 0;
-
-	return buffer;
-}
-
-/*
- * Make a new buffer for this ResultRelInfo.
- */
-static inline void
-CopyMultiInsertInfoSetupBuffer(CopyMultiInsertInfo *miinfo,
-							   ResultRelInfo *rri)
+static void
+InitCopyMultiInsertBufferInfo(List **mirri, ResultRelInfo *rri,
+							   CommandId mycid, int ti_options)
 {
 	CopyMultiInsertBuffer *buffer;
 
-	buffer = CopyMultiInsertBufferInit(rri);
-
+	buffer = (CopyMultiInsertBuffer *) palloc0(sizeof(CopyMultiInsertBuffer));
+	buffer->istate = table_insert_begin(rri->ri_RelationDesc, mycid,
+										ti_options, true, true);
 	/* Setup back-link so we can easily find this buffer again */
 	rri->ri_CopyMultiInsertBuffer = buffer;
-	/* Record that we're tracking this buffer */
-	miinfo->multiInsertBuffers = lappend(miinfo->multiInsertBuffers, buffer);
+	*mirri = lappend(*mirri, rri);
 }
 
 /*
- * Initialize an already allocated CopyMultiInsertInfo.
- *
- * If rri is a non-partitioned table then a CopyMultiInsertBuffer is set up
- * for that table.
+ * Run AFTER ROW INSERT triggers or insert into indexes, if any, after buffered
+ * tuples are flushed to table.
  */
 static void
-CopyMultiInsertInfoInit(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
-						CopyFromState cstate, EState *estate, CommandId mycid,
-						int ti_options)
-{
-	miinfo->multiInsertBuffers = NIL;
-	miinfo->bufferedTuples = 0;
-	miinfo->bufferedBytes = 0;
-	miinfo->cstate = cstate;
-	miinfo->estate = estate;
-	miinfo->mycid = mycid;
-	miinfo->ti_options = ti_options;
-
-	/*
-	 * Only setup the buffer when not dealing with a partitioned table.
-	 * Buffers for partitioned tables will just be setup when we need to send
-	 * tuples their way for the first time.
-	 */
-	if (rri->ri_RelationDesc->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
-		CopyMultiInsertInfoSetupBuffer(miinfo, rri);
-}
-
-/*
- * Returns true if the buffers are full
- */
-static inline bool
-CopyMultiInsertInfoIsFull(CopyMultiInsertInfo *miinfo)
-{
-	if (miinfo->bufferedTuples >= MAX_BUFFERED_TUPLES ||
-		miinfo->bufferedBytes >= MAX_BUFFERED_BYTES)
-		return true;
-	return false;
-}
-
-/*
- * Returns true if we have no buffered tuples
- */
-static inline bool
-CopyMultiInsertInfoIsEmpty(CopyMultiInsertInfo *miinfo)
+HandleAfterRowEvents(ResultRelInfo *rri, EState *estate, CopyFromState cstate,
+					 int32 cur_slots)
 {
-	return miinfo->bufferedTuples == 0;
-}
-
-/*
- * Write the tuples stored in 'buffer' out to the table.
- */
-static inline void
-CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
-						   CopyMultiInsertBuffer *buffer)
-{
-	MemoryContext oldcontext;
-	int			i;
-	uint64		save_cur_lineno;
-	CopyFromState cstate = miinfo->cstate;
-	EState	   *estate = miinfo->estate;
-	CommandId	mycid = miinfo->mycid;
-	int			ti_options = miinfo->ti_options;
-	bool		line_buf_valid = cstate->line_buf_valid;
-	int			nused = buffer->nused;
-	ResultRelInfo *resultRelInfo = buffer->resultRelInfo;
-	TupleTableSlot **slots = buffer->slots;
+	int i;
+	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
+	TableInsertState *istate = buffer->istate;
+	uint64  save_cur_lineno;
+	bool    line_buf_valid = cstate->line_buf_valid;
 
 	/*
 	 * Print error context information correctly, if one of the operations
@@ -317,36 +205,27 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	cstate->line_buf_valid = false;
 	save_cur_lineno = cstate->cur_lineno;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
-
-	for (i = 0; i < nused; i++)
+	for (i = 0; i < cur_slots; i++)
 	{
 		/*
 		 * If there are any indexes, update them for all the inserted tuples,
 		 * and run AFTER ROW INSERT triggers.
 		 */
-		if (resultRelInfo->ri_NumIndices > 0)
+		if (rri->ri_NumIndices > 0)
 		{
 			List	   *recheckIndexes;
 
 			cstate->cur_lineno = buffer->linenos[i];
-			recheckIndexes =
-				ExecInsertIndexTuples(resultRelInfo,
-									  buffer->slots[i], estate, false, false,
-									  NULL, NIL);
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], recheckIndexes,
+
+			recheckIndexes = ExecInsertIndexTuples(rri,
+												   istate->mi_slots[i], estate,
+												   false,
+												   false,
+												   NULL,
+												   NULL);
+
+			ExecARInsertTriggers(estate, rri,
+								 istate->mi_slots[i], recheckIndexes,
 								 cstate->transition_capture);
 			list_free(recheckIndexes);
 		}
@@ -355,79 +234,69 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 		 * There's no indexes, but see if we need to run AFTER ROW INSERT
 		 * triggers anyway.
 		 */
-		else if (resultRelInfo->ri_TrigDesc != NULL &&
-				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		else if (rri->ri_TrigDesc != NULL &&
+				 (rri->ri_TrigDesc->trig_insert_after_row ||
+				  rri->ri_TrigDesc->trig_insert_new_table))
 		{
 			cstate->cur_lineno = buffer->linenos[i];
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], NIL, cstate->transition_capture);
+			ExecARInsertTriggers(estate,
+								 rri,
+								 istate->mi_slots[i],
+								 NULL,
+								 cstate->transition_capture);
 		}
-
-		ExecClearTuple(slots[i]);
 	}
 
-	/* Mark that all slots are free */
-	buffer->nused = 0;
-
 	/* reset cur_lineno and line_buf_valid to what they were */
 	cstate->line_buf_valid = line_buf_valid;
 	cstate->cur_lineno = save_cur_lineno;
-}
+ }
 
 /*
- * Drop used slots and free member for this buffer.
- *
- * The buffer must be flushed before cleanup.
+ * Store tuple from the incoming slot into buffered slots.
  */
-static inline void
-CopyMultiInsertBufferCleanup(CopyMultiInsertInfo *miinfo,
-							 CopyMultiInsertBuffer *buffer)
+static void
+CopyMultiInsertBufferTuple(ResultRelInfo *rri, TupleTableSlot *slot,
+						   CopyFromState cstate, EState *estate)
 {
-	int			i;
-
-	/* Ensure buffer was flushed */
-	Assert(buffer->nused == 0);
-
-	/* Remove back-link to ourself */
-	buffer->resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
-
-	FreeBulkInsertState(buffer->bistate);
+	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
+	TableInsertState *istate = buffer->istate;
 
-	/* Since we only create slots on demand, just drop the non-null ones. */
-	for (i = 0; i < MAX_BUFFERED_TUPLES && buffer->slots[i] != NULL; i++)
-		ExecDropSingleTupleTableSlot(buffer->slots[i]);
+	table_multi_insert_v2(buffer->istate, slot);
 
-	table_finish_bulk_insert(buffer->resultRelInfo->ri_RelationDesc,
-							 miinfo->ti_options);
+	buffer->linenos[istate->mi_cur_slots - 1] = cstate->cur_lineno;
 
-	pfree(buffer);
+	if (istate->flushed)
+		HandleAfterRowEvents(rri, estate, cstate, istate->mi_cur_slots);
 }
 
 /*
- * Write out all stored tuples in all buffers out to the tables.
- *
- * Once flushed we also trim the tracked buffers list down to size by removing
- * the buffers created earliest first.
- *
- * Callers should pass 'curr_rri' is the ResultRelInfo that's currently being
- * used.  When cleaning up old buffers we'll never remove the one for
- * 'curr_rri'.
+ * Flush tuples into table from the buffered slots.
  */
-static inline void
-CopyMultiInsertInfoFlush(CopyMultiInsertInfo *miinfo, ResultRelInfo *curr_rri)
-{
+static void
+CopyMulitInsertFlushBuffers(List **mirri, ResultRelInfo *curr_rri,
+							CopyFromState cstate, EState *estate)
+ {
 	ListCell   *lc;
+	ResultRelInfo *rri;
+	CopyMultiInsertBuffer *buffer;
+	TableInsertState *istate;
 
-	foreach(lc, miinfo->multiInsertBuffers)
+	foreach(lc, *mirri)
 	{
-		CopyMultiInsertBuffer *buffer = (CopyMultiInsertBuffer *) lfirst(lc);
+		rri = lfirst(lc);
+		buffer = rri->ri_CopyMultiInsertBuffer;
+		istate = buffer->istate;
+
+		table_multi_insert_flush(istate);
 
-		CopyMultiInsertBufferFlush(miinfo, buffer);
+		if (istate->flushed)
+			HandleAfterRowEvents(rri, estate, cstate, istate->mi_cur_slots);
 	}
 
-	miinfo->bufferedTuples = 0;
-	miinfo->bufferedBytes = 0;
+	rri = NULL;
+	buffer = NULL;
+	istate = NULL;
 
 	/*
 	 * Trim the list of tracked buffers down if it exceeds the limit.  Here we
@@ -435,87 +304,59 @@ CopyMultiInsertInfoFlush(CopyMultiInsertInfo *miinfo, ResultRelInfo *curr_rri)
 	 * likely that these older ones will be needed than the ones that were
 	 * just created.
 	 */
-	while (list_length(miinfo->multiInsertBuffers) > MAX_PARTITION_BUFFERS)
+	while (list_length(*mirri) > MAX_PARTITION_BUFFERS)
 	{
-		CopyMultiInsertBuffer *buffer;
+		int ti_options;
 
-		buffer = (CopyMultiInsertBuffer *) linitial(miinfo->multiInsertBuffers);
+		rri = (ResultRelInfo *) linitial(*mirri);
 
 		/*
 		 * We never want to remove the buffer that's currently being used, so
 		 * if we happen to find that then move it to the end of the list.
 		 */
-		if (buffer->resultRelInfo == curr_rri)
+		if (rri == curr_rri)
 		{
-			miinfo->multiInsertBuffers = list_delete_first(miinfo->multiInsertBuffers);
-			miinfo->multiInsertBuffers = lappend(miinfo->multiInsertBuffers, buffer);
-			buffer = (CopyMultiInsertBuffer *) linitial(miinfo->multiInsertBuffers);
+			*mirri = list_delete_first(*mirri);
+			*mirri = lappend(*mirri, rri);
+			rri = (ResultRelInfo *) linitial(*mirri);
 		}
 
-		CopyMultiInsertBufferCleanup(miinfo, buffer);
-		miinfo->multiInsertBuffers = list_delete_first(miinfo->multiInsertBuffers);
-	}
-}
+		buffer = rri->ri_CopyMultiInsertBuffer;
+		istate = buffer->istate;
+		ti_options = istate->options;
 
-/*
- * Cleanup allocated buffers and free memory
- */
-static inline void
-CopyMultiInsertInfoCleanup(CopyMultiInsertInfo *miinfo)
-{
-	ListCell   *lc;
-
-	foreach(lc, miinfo->multiInsertBuffers)
-		CopyMultiInsertBufferCleanup(miinfo, lfirst(lc));
+		table_insert_end(istate);
+		table_finish_bulk_insert(rri->ri_RelationDesc, ti_options);
 
-	list_free(miinfo->multiInsertBuffers);
+		*mirri = list_delete_first(*mirri);
+	}
 }
 
 /*
- * Get the next TupleTableSlot that the next tuple should be stored in.
- *
- * Callers must ensure that the buffer is not full.
- *
- * Note: 'miinfo' is unused but has been included for consistency with the
- * other functions in this area.
+ * Drop the buffered slots.
  */
-static inline TupleTableSlot *
-CopyMultiInsertInfoNextFreeSlot(CopyMultiInsertInfo *miinfo,
-								ResultRelInfo *rri)
+static void
+CopyMulitInsertDropBuffers(List *mirri)
 {
-	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
-	int			nused = buffer->nused;
-
-	Assert(buffer != NULL);
-	Assert(nused < MAX_BUFFERED_TUPLES);
+	ListCell   *lc;
 
-	if (buffer->slots[nused] == NULL)
-		buffer->slots[nused] = table_slot_create(rri->ri_RelationDesc, NULL);
-	return buffer->slots[nused];
-}
+	foreach(lc, mirri)
+	{
+		int ti_options;
+		ResultRelInfo *rri = lfirst(lc);
+		CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
+		TableInsertState *istate = buffer->istate;
 
-/*
- * Record the previously reserved TupleTableSlot that was reserved by
- * CopyMultiInsertInfoNextFreeSlot as being consumed.
- */
-static inline void
-CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
-						 TupleTableSlot *slot, int tuplen, uint64 lineno)
-{
-	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
+		ti_options = istate->options;
 
-	Assert(buffer != NULL);
-	Assert(slot == buffer->slots[buffer->nused]);
+		table_insert_end(istate);
 
-	/* Store the line number so we can properly report any errors later */
-	buffer->linenos[buffer->nused] = lineno;
+		table_finish_bulk_insert(rri->ri_RelationDesc, ti_options);
 
-	/* Record this slot as being used */
-	buffer->nused++;
+		pfree(buffer);
+	}
 
-	/* Update how many tuples are stored and their size */
-	miinfo->bufferedTuples++;
-	miinfo->bufferedBytes += tuplen;
+	list_free(mirri);
 }
 
 /*
@@ -530,21 +371,21 @@ CopyFrom(CopyFromState cstate)
 	EState	   *estate = CreateExecutorState(); /* for ExecConstraints() */
 	ModifyTableState *mtstate;
 	ExprContext *econtext;
-	TupleTableSlot *singleslot = NULL;
+	TupleTableSlot *slot = NULL;
 	MemoryContext oldcontext = CurrentMemoryContext;
 
 	PartitionTupleRouting *proute = NULL;
 	ErrorContextCallback errcallback;
 	CommandId	mycid = GetCurrentCommandId(true);
 	int			ti_options = 0; /* start with default options for insert */
-	BulkInsertState bistate = NULL;
 	CopyInsertMethod insertMethod;
-	CopyMultiInsertInfo multiInsertInfo = {0};	/* pacify compiler */
 	int64		processed = 0;
 	int64		excluded = 0;
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
 	bool		leafpart_use_multi_insert = false;
+	List       *multi_insert_rris = NULL;
+	TableInsertState *istate = NULL;
 
 	Assert(cstate->rel);
 	Assert(list_length(cstate->range_table) == 1);
@@ -729,7 +570,7 @@ CopyFrom(CopyFromState cstate)
 		 * For partitioned tables we can't support multi-inserts when there
 		 * are any statement level insert triggers. It might be possible to
 		 * allow partitioned tables with such triggers in the future, but for
-		 * now, CopyMultiInsertInfoFlush expects that any before row insert
+		 * now, CopyMulitInsertFlushBuffers expects that any before row insert
 		 * and statement level insert triggers are on the same relation.
 		 */
 		insertMethod = CIM_SINGLE;
@@ -777,22 +618,22 @@ CopyFrom(CopyFromState cstate)
 		else
 			insertMethod = CIM_MULTI;
 
-		CopyMultiInsertInfoInit(&multiInsertInfo, resultRelInfo, cstate,
-								estate, mycid, ti_options);
+		/*
+		* Only setup the buffer when not dealing with a partitioned table.
+		* Buffers for partitioned tables will just be setup when we need to
+		* send tuples their way for the first time.
+		*/
+		if (!proute)
+			InitCopyMultiInsertBufferInfo(&multi_insert_rris, resultRelInfo,
+										  mycid, ti_options);
 	}
 
 	/*
-	 * If not using batch mode (which allocates slots as needed) set up a
-	 * tuple slot too. When inserting into a partitioned table, we also need
-	 * one, even if we might batch insert, to read the tuple in the root
-	 * partition's form.
+	 * Set up a tuple slot to which the input data from copy stream is read
+	 * into and used for inserts into table.
 	 */
-	if (insertMethod == CIM_SINGLE || insertMethod == CIM_MULTI_CONDITIONAL)
-	{
-		singleslot = table_slot_create(resultRelInfo->ri_RelationDesc,
-									   &estate->es_tupleTable);
-		bistate = GetBulkInsertState();
-	}
+	slot = table_slot_create(resultRelInfo->ri_RelationDesc,
+							 &estate->es_tupleTable);
 
 	has_before_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
 								  resultRelInfo->ri_TrigDesc->trig_insert_before_row);
@@ -830,19 +671,8 @@ CopyFrom(CopyFromState cstate)
 		ResetPerTupleExprContext(estate);
 
 		/* select slot to (initially) load row into */
-		if (insertMethod == CIM_SINGLE || proute)
-		{
-			myslot = singleslot;
-			Assert(myslot != NULL);
-		}
-		else
-		{
-			Assert(resultRelInfo == target_resultRelInfo);
-			Assert(insertMethod == CIM_MULTI);
-
-			myslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
-													 resultRelInfo);
-		}
+		myslot = slot;
+		Assert(myslot != NULL);
 
 		/*
 		 * Switch to per-tuple context before calling NextCopyFrom, which does
@@ -918,21 +748,22 @@ CopyFrom(CopyFromState cstate)
 				if (leafpart_use_multi_insert)
 				{
 					if (resultRelInfo->ri_CopyMultiInsertBuffer == NULL)
-						CopyMultiInsertInfoSetupBuffer(&multiInsertInfo,
-													   resultRelInfo);
+						InitCopyMultiInsertBufferInfo(&multi_insert_rris,
+													  resultRelInfo, mycid,
+													  ti_options);
 				}
-				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
-						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+				else if (insertMethod == CIM_MULTI_CONDITIONAL)
 				{
 					/*
 					 * Flush pending inserts if this partition can't use
 					 * batching, so rows are visible to triggers etc.
 					 */
-					CopyMultiInsertInfoFlush(&multiInsertInfo, resultRelInfo);
+					CopyMulitInsertFlushBuffers(&multi_insert_rris,
+												resultRelInfo, cstate, estate);
 				}
 
-				if (bistate != NULL)
-					ReleaseBulkInsertStatePin(bistate);
+				if (istate && istate->bistate)
+					ReleaseBulkInsertStatePin(istate->bistate);
 				prevResultRelInfo = resultRelInfo;
 			}
 
@@ -974,8 +805,8 @@ CopyFrom(CopyFromState cstate)
 				/* no other path available for partitioned table */
 				Assert(insertMethod == CIM_MULTI_CONDITIONAL);
 
-				batchslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
-															resultRelInfo);
+				batchslot = table_slot_create(resultRelInfo->ri_RelationDesc,
+											  &estate->es_tupleTable);
 
 				if (map != NULL)
 					myslot = execute_attr_map_slot(map->attrMap, myslot,
@@ -1047,24 +878,9 @@ CopyFrom(CopyFromState cstate)
 				/* Store the slot in the multi-insert buffer, when enabled. */
 				if (insertMethod == CIM_MULTI || leafpart_use_multi_insert)
 				{
-					/*
-					 * The slot previously might point into the per-tuple
-					 * context. For batching it needs to be longer lived.
-					 */
-					ExecMaterializeSlot(myslot);
-
 					/* Add this tuple to the tuple buffer */
-					CopyMultiInsertInfoStore(&multiInsertInfo,
-											 resultRelInfo, myslot,
-											 cstate->line_buf.len,
-											 cstate->cur_lineno);
-
-					/*
-					 * If enough inserts have queued up, then flush all
-					 * buffers out to their tables.
-					 */
-					if (CopyMultiInsertInfoIsFull(&multiInsertInfo))
-						CopyMultiInsertInfoFlush(&multiInsertInfo, resultRelInfo);
+					CopyMultiInsertBufferTuple(resultRelInfo, myslot, cstate,
+											   estate);
 				}
 				else
 				{
@@ -1090,9 +906,19 @@ CopyFrom(CopyFromState cstate)
 					}
 					else
 					{
+						if (!istate)
+						{
+							istate = table_insert_begin(resultRelInfo->ri_RelationDesc,
+														mycid,
+														ti_options,
+														true,
+														false);
+						}
+
+						istate->rel = resultRelInfo->ri_RelationDesc;
+
 						/* OK, store the tuple and create index entries for it */
-						table_tuple_insert(resultRelInfo->ri_RelationDesc,
-										   myslot, mycid, ti_options, bistate);
+						table_tuple_insert_v2(istate, myslot);
 
 						if (resultRelInfo->ri_NumIndices > 0)
 							recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
@@ -1125,16 +951,14 @@ CopyFrom(CopyFromState cstate)
 
 	/* Flush any remaining buffered tuples */
 	if (insertMethod != CIM_SINGLE)
-	{
-		if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
-			CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
-	}
+		CopyMulitInsertFlushBuffers(&multi_insert_rris, resultRelInfo,
+									cstate, estate);
 
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
 
-	if (bistate != NULL)
-		FreeBulkInsertState(bistate);
+	if (istate)
+		table_insert_end(istate);
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -1154,7 +978,7 @@ CopyFrom(CopyFromState cstate)
 
 	/* Tear down the multi-insert buffer data */
 	if (insertMethod != CIM_SINGLE)
-		CopyMultiInsertInfoCleanup(&multiInsertInfo);
+		CopyMulitInsertDropBuffers(multi_insert_rris);
 
 	/* Close all the partitioned tables, leaf partitions, and their indices */
 	if (proute)
-- 
2.25.1



^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2021-04-19 04:51  Bharath Rupireddy <[email protected]>
  parent: Bharath Rupireddy <[email protected]>
  0 siblings, 2 replies; 43+ messages in thread

From: Bharath Rupireddy @ 2021-04-19 04:51 UTC (permalink / raw)
  To: Dilip Kumar <[email protected]>; +Cc: Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Paul Guo <[email protected]>; Jeff Davis <[email protected]>

On Mon, Apr 5, 2021 at 9:49 AM Bharath Rupireddy
<[email protected]> wrote:
>
> On Wed, Mar 10, 2021 at 10:21 AM Bharath Rupireddy
> <[email protected]> wrote:
> > Attaching the v4 patch set. Please review it further.
>
> Attaching v5 patch set after rebasing onto the latest master.

Another rebase due to conflicts in 0003. Attaching v6 for review.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


Attachments:

  [application/x-patch] v6-0001-New-Table-AMs-for-Multi-and-Single-Inserts.patch (19.4K, 2-v6-0001-New-Table-AMs-for-Multi-and-Single-Inserts.patch)
  download | inline diff:
From 6518212583e24b017375512701d9fefa6de20e42 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Wed, 10 Mar 2021 09:53:48 +0530
Subject: [PATCH v6 1/3] New Table AMs for Multi and Single Inserts

This patch introduces new table access methods for multi and
single inserts. Also implements/rearranges the outside code for
heap am into these new APIs.

Main design goal of these new APIs is to give flexibility to
tableam developers in implementing multi insert logic dependent on
the underlying storage engine. Currently, for all the underlying
storage engines, we follow the same multi insert logic such as when
and how to flush the buffered tuples, tuple size calculation, and
this logic doesn't take into account the underlying storage engine
capabilities.

We can also avoid duplicating multi insert code (for existing COPY,
and upcoming CTAS, CREATE/REFRESH MAT VIEW and INSERT SELECTs). We
can also move bulk insert state allocation and deallocation inside
these APIs.
---
 src/backend/access/heap/heapam.c         | 212 +++++++++++++++++++++++
 src/backend/access/heap/heapam_handler.c |   5 +
 src/backend/access/table/tableamapi.c    |   7 +
 src/backend/executor/execTuples.c        |  83 ++++++++-
 src/include/access/heapam.h              |  49 +++++-
 src/include/access/tableam.h             |  87 ++++++++++
 src/include/executor/tuptable.h          |   1 +
 7 files changed, 438 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 3b435c107d..d8bfe17f22 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -67,6 +67,7 @@
 #include "utils/datum.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
+#include "utils/memutils.h"
 #include "utils/relcache.h"
 #include "utils/snapmgr.h"
 #include "utils/spccache.h"
@@ -2669,6 +2670,217 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 	pgstat_count_heap_insert(relation, ntuples);
 }
 
+/*
+ * heap_insert_begin - allocate and initialize TableInsertState
+ *
+ * For single inserts:
+ *  1) Specify is_multi as false, then multi insert state will be NULL.
+ *
+ * For multi inserts:
+ *  1) Specify is_multi as true, then multi insert state will be allocated and
+ * 	   initialized.
+ *
+ *  Other input parameters i.e. relation, command id, options are common for
+ *  both single and multi inserts.
+ */
+TableInsertState*
+heap_insert_begin(Relation rel, CommandId cid, int options, bool is_multi)
+{
+	TableInsertState *state;
+
+	state = palloc(sizeof(TableInsertState));
+	state->rel = rel;
+	state->cid = cid;
+	state->options = options;
+	/* Below parameters are not used for single inserts. */
+	state->mi_slots = NULL;
+	state->mistate = NULL;
+	state->mi_cur_slots = 0;
+	state->flushed = false;
+
+	if (is_multi)
+	{
+		HeapMultiInsertState *mistate;
+
+		mistate = palloc(sizeof(HeapMultiInsertState));
+		state->mi_slots =
+				palloc0(sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
+		mistate->max_slots = MAX_BUFFERED_TUPLES;
+		mistate->max_size = MAX_BUFFERED_BYTES;
+		mistate->cur_size = 0;
+		/*
+		 * Create a temporary memory context so that we can reset once per
+		 * multi insert batch.
+		 */
+		mistate->context = AllocSetContextCreate(CurrentMemoryContext,
+												 "heap_multi_insert",
+												 ALLOCSET_DEFAULT_SIZES);
+		state->mistate = mistate;
+	}
+
+	return state;
+}
+
+/*
+ * heap_insert_v2 - insert single tuple into a heap
+ *
+ * Insert tuple from slot into table. This is like heap_insert(), the only
+ * difference is that the parameters for insertion are inside table insert
+ * state structure.
+ */
+void
+heap_insert_v2(TableInsertState *state, TupleTableSlot *slot)
+{
+	bool		shouldFree = true;
+	HeapTuple	tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);
+
+	Assert(state);
+
+	/* Update tuple with table oid */
+	slot->tts_tableOid = RelationGetRelid(state->rel);
+	tuple->t_tableOid = slot->tts_tableOid;
+
+	/* Perform insertion, and copy the resulting ItemPointer */
+	heap_insert(state->rel, tuple, state->cid, state->options, state->bistate);
+	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
+
+	if (shouldFree)
+		pfree(tuple);
+}
+
+/*
+ * heap_multi_insert_v2 - insert multiple tuples into a heap
+ *
+ * Compute size of tuple. See if the buffered slots can hold the tuple. If yes,
+ * store it in the buffers, otherwise flush i.e. insert the so far buffered
+ * tuples into heap.
+ *
+ * Flush can happen:
+ *  1) either if all the buffered slots are filled up
+ *  2) or if total tuple size of the currently buffered slots are >= max_size
+ */
+void
+heap_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
+{
+	TupleTableSlot  *batchslot;
+	HeapMultiInsertState *mistate;
+	Size sz;
+
+	Assert(state);
+
+	mistate = (HeapMultiInsertState *)state->mistate;
+
+	Assert(mistate && state->mi_slots);
+
+	/* Reset flush state if previously set. */
+	if (state->flushed)
+	{
+		state->mi_cur_slots = 0;
+		state->flushed = false;
+	}
+
+	Assert(state->mi_cur_slots < mistate->max_slots);
+
+	if (state->mi_slots[state->mi_cur_slots] == NULL)
+		state->mi_slots[state->mi_cur_slots] =
+									table_slot_create(state->rel, NULL);
+
+	batchslot = state->mi_slots[state->mi_cur_slots];
+
+	ExecClearTuple(batchslot);
+	ExecCopySlot(batchslot, slot);
+
+	/*
+	 * Calculate tuple size after original slot is copied, because the copied
+	 * slot type and tuple size may change.
+	 */
+	sz = GetTupleSize(batchslot, mistate->max_size);
+
+	Assert(sz > 0);
+
+	state->mi_cur_slots++;
+	mistate->cur_size += sz;
+
+	if (state->mi_cur_slots >= mistate->max_slots ||
+		mistate->cur_size >= mistate->max_size)
+		heap_multi_insert_flush(state);
+}
+
+/*
+ * heap_multi_insert_flush - flush buffered tuples, if any, into a heap
+ *
+ * Flush the buffered tuples, indicate caller that flushing happened and reset
+ * parameters.
+ */
+void
+heap_multi_insert_flush(TableInsertState *state)
+{
+	HeapMultiInsertState *mistate;
+	MemoryContext oldcontext;
+
+	Assert(state);
+
+	mistate = (HeapMultiInsertState *)state->mistate;
+
+	Assert(mistate && state->mi_slots && state->mi_cur_slots >= 0 &&
+		   mistate->context);
+
+	if (state->flushed)
+	{
+		state->mi_cur_slots = 0;
+		state->flushed = false;
+		return;
+	}
+
+	oldcontext = MemoryContextSwitchTo(mistate->context);
+	heap_multi_insert(state->rel, state->mi_slots, state->mi_cur_slots,
+					  state->cid, state->options, state->bistate);
+	MemoryContextReset(mistate->context);
+	MemoryContextSwitchTo(oldcontext);
+
+	state->flushed = true;
+	mistate->cur_size = 0;
+}
+
+/*
+ * heap_insert_end - clean up TableInsertState
+ *
+ * For multi inserts, ensure to flush all the remaining buffers with
+ * heap_multi_insert_flush before calling this function.
+ *
+ * In this function, buffered slots are dropped, short-lived memory context is
+ * deleted, mistate and TableInsertState are freed up.
+ */
+void
+heap_insert_end(TableInsertState *state)
+{
+	HeapMultiInsertState *mistate;
+	int i;
+
+	Assert(state);
+
+	mistate = (HeapMultiInsertState *)state->mistate;
+
+	if (!mistate)
+	{
+		pfree(state);
+		return;
+	}
+
+	Assert(state->mi_slots &&  mistate->context);
+
+	/* Ensure that the buffers have been flushed before. */
+	Assert(state->mi_cur_slots == 0 || state->flushed);
+
+	for (i = 0; i < mistate->max_slots && state->mi_slots[i] != NULL; i++)
+		ExecDropSingleTupleTableSlot(state->mi_slots[i]);
+
+	MemoryContextDelete(mistate->context);
+	pfree(mistate);
+	pfree(state->mi_slots);
+	pfree(state);
+}
+
 /*
  *	simple_heap_insert - insert a tuple
  *
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index bd5faf0c1f..655de8e6b7 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2558,6 +2558,11 @@ static const TableAmRoutine heapam_methods = {
 	.tuple_insert_speculative = heapam_tuple_insert_speculative,
 	.tuple_complete_speculative = heapam_tuple_complete_speculative,
 	.multi_insert = heap_multi_insert,
+	.tuple_insert_begin = heap_insert_begin,
+	.tuple_insert_v2 = heap_insert_v2,
+	.multi_insert_v2 = heap_multi_insert_v2,
+	.multi_insert_flush = heap_multi_insert_flush,
+	.tuple_insert_end = heap_insert_end,
 	.tuple_delete = heapam_tuple_delete,
 	.tuple_update = heapam_tuple_update,
 	.tuple_lock = heapam_tuple_lock,
diff --git a/src/backend/access/table/tableamapi.c b/src/backend/access/table/tableamapi.c
index 325ecdc122..95f1f9b6a0 100644
--- a/src/backend/access/table/tableamapi.c
+++ b/src/backend/access/table/tableamapi.c
@@ -78,6 +78,13 @@ GetTableAmRoutine(Oid amhandler)
 	Assert(routine->tuple_complete_speculative != NULL);
 
 	Assert(routine->multi_insert != NULL);
+
+	Assert(routine->tuple_insert_begin != NULL);
+	Assert(routine->tuple_insert_v2 != NULL);
+	Assert(routine->multi_insert_v2 != NULL);
+	Assert(routine->multi_insert_flush != NULL);
+	Assert(routine->tuple_insert_end != NULL);
+
 	Assert(routine->tuple_delete != NULL);
 	Assert(routine->tuple_update != NULL);
 	Assert(routine->tuple_lock != NULL);
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index 73c35df9c9..79ae22455a 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -159,7 +159,11 @@ tts_virtual_materialize(TupleTableSlot *slot)
 	if (TTS_SHOULDFREE(slot))
 		return;
 
-	/* compute size of memory required */
+	/*
+	 * Compute size of memory required. This size calculation code is also used
+	 * in GetTupleSize(), hence ensure to have the same changes or fixes here
+	 * and also there.
+	 */
 	for (int natt = 0; natt < desc->natts; natt++)
 	{
 		Form_pg_attribute att = TupleDescAttr(desc, natt);
@@ -1239,6 +1243,83 @@ ExecDropSingleTupleTableSlot(TupleTableSlot *slot)
 	pfree(slot);
 }
 
+/*
+ * GetTupleSize - Compute the tuple size given a table slot.
+ *
+ * For heap tuple, buffer tuple and minimal tuple slot types return the actual
+ * tuple size that exists. For virtual tuple, the size is calculated as the
+ * slot does not have the tuple size. If the computed size exceeds the given
+ * maxsize for the virtual tuple, this function exits, not investing time in
+ * further unnecessary calculation.
+ *
+ * Important Notes:
+ * 1) Size calculation code for virtual slots is being used from
+ * 	  tts_virtual_materialize(), hence ensure to have the same changes or fixes
+ * 	  here and also there.
+ * 2) Currently, GetTupleSize() handles the existing heap, buffer, minimal and
+ * 	  virtual slots. Ensure to add related code in case any new slot type is
+ *    introduced.
+ */
+inline Size
+GetTupleSize(TupleTableSlot *slot, Size maxsize)
+{
+	Size sz = 0;
+	HeapTuple tuple = NULL;
+
+	if (TTS_IS_HEAPTUPLE(slot))
+		tuple = ((HeapTupleTableSlot *) slot)->tuple;
+	else if(TTS_IS_BUFFERTUPLE(slot))
+		tuple = ((BufferHeapTupleTableSlot *) slot)->base.tuple;
+	else if(TTS_IS_MINIMALTUPLE(slot))
+		tuple = ((MinimalTupleTableSlot *) slot)->tuple;
+	else if(TTS_IS_VIRTUAL(slot))
+	{
+		/*
+		 * Size calculation code being used here is from
+		 * tts_virtual_materialize(), ensure to have the same changes or fixes
+		 * here and also there.
+		 */
+		TupleDesc	desc = slot->tts_tupleDescriptor;
+
+		for (int natt = 0; natt < desc->natts; natt++)
+		{
+			Form_pg_attribute att = TupleDescAttr(desc, natt);
+			Datum		val;
+
+			if (att->attbyval)
+				sz += att->attlen;
+
+			if (slot->tts_isnull[natt])
+				continue;
+
+			val = slot->tts_values[natt];
+
+			if (att->attlen == -1 &&
+				VARATT_IS_EXTERNAL_EXPANDED(DatumGetPointer(val)))
+			{
+				sz = att_align_nominal(sz, att->attalign);
+				sz += EOH_get_flat_size(DatumGetEOHP(val));
+			}
+			else
+			{
+				sz = att_align_nominal(sz, att->attalign);
+				sz = att_addlength_datum(sz, att->attlen, val);
+			}
+
+			/*
+			 * We are not interested in proceeding further if the computed size
+			 * crosses maxsize limit that we are looking for.
+			 */
+			if (maxsize != 0 && sz >= maxsize)
+				break;
+		}
+	}
+
+	if (tuple != NULL && !TTS_IS_VIRTUAL(slot))
+		sz = tuple->t_len;
+
+	return sz;
+}
 
 /* ----------------------------------------------------------------
  *				  tuple table slot accessor functions
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index bc0936bc2d..da74ab072d 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -36,11 +36,26 @@
 #define HEAP_INSERT_NO_LOGICAL	TABLE_INSERT_NO_LOGICAL
 #define HEAP_INSERT_SPECULATIVE 0x0010
 
-typedef struct BulkInsertStateData *BulkInsertState;
 struct TupleTableSlot;
 
 #define MaxLockTupleMode	LockTupleExclusive
 
+/*
+ * No more than this many tuples per single multi insert batch
+ *
+ * Caution: Don't make this too big, as we could end up with this many tuples
+ * stored in multi insert buffer. Increasing this can cause quadratic growth in
+ * memory requirements during copies into partitioned tables with a large
+ * number of partitions.
+ */
+#define MAX_BUFFERED_TUPLES		1000
+
+/*
+ * Flush multi insert buffers if there are >= this many bytes, as counted by
+ * the size of the tuples buffered.
+ */
+#define MAX_BUFFERED_BYTES		65535
+
 /*
  * Descriptor for heap table scans.
  */
@@ -93,6 +108,25 @@ typedef enum
 	HEAPTUPLE_DELETE_IN_PROGRESS	/* deleting xact is still in progress */
 } HTSV_Result;
 
+/* Holds multi insert state for heap access method.*/
+typedef struct HeapMultiInsertState
+{
+	/* Switch to short-lived memory context before flushing. */
+	MemoryContext       context;
+	/* Maximum number of slots that can be buffered. */
+	int32               max_slots;
+	/*
+	 * Maximum size (in bytes) of all the tuples that a single batch of
+	 * buffered slots can hold.
+	 */
+	int64               max_size;
+	/*
+	 * Total tuple size (in bytes) of the slots that are currently buffered.
+	 * Flush the buffered slots when cur_size >= max_size.
+	 */
+	int64               cur_size;
+} HeapMultiInsertState;
+
 /* ----------------
  *		function prototypes for heap access method
  *
@@ -134,15 +168,20 @@ extern bool heap_hot_search_buffer(ItemPointer tid, Relation relation,
 
 extern void heap_get_latest_tid(TableScanDesc scan, ItemPointer tid);
 
-extern BulkInsertState GetBulkInsertState(void);
-extern void FreeBulkInsertState(BulkInsertState);
-extern void ReleaseBulkInsertStatePin(BulkInsertState bistate);
-
 extern void heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 						int options, BulkInsertState bistate);
 extern void heap_multi_insert(Relation relation, struct TupleTableSlot **slots,
 							  int ntuples, CommandId cid, int options,
 							  BulkInsertState bistate);
+
+extern TableInsertState* heap_insert_begin(Relation rel, CommandId cid,
+										   int options, bool is_multi);
+extern void heap_insert_v2(TableInsertState *state, TupleTableSlot *slot);
+extern void heap_multi_insert_v2(TableInsertState *state,
+								 TupleTableSlot *slot);
+extern void heap_multi_insert_flush(TableInsertState *state);
+extern void heap_insert_end(TableInsertState *state);
+
 extern TM_Result heap_delete(Relation relation, ItemPointer tid,
 							 CommandId cid, Snapshot crosscheck, bool wait,
 							 struct TM_FailureData *tmfd, bool changingPart);
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 414b6b4d57..2a1470a7b6 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -229,6 +229,32 @@ typedef struct TM_IndexDeleteOp
 	TM_IndexStatus *status;
 } TM_IndexDeleteOp;
 
+/* Holds table insert state. */
+typedef struct TableInsertState
+{
+	Relation	rel;
+	/* Bulk insert state if requested, otherwise NULL. */
+	struct BulkInsertStateData	*bistate;
+	CommandId	cid;
+	int	options;
+	/* Below members are only used for multi inserts. */
+	/* Array of buffered slots. */
+	TupleTableSlot	**mi_slots;
+	/* Number of slots that are currently buffered. */
+	int32	mi_cur_slots;
+	/*
+	 * Access method specific information such as parameters that are needed
+	 * for buffering and flushing decisions can go here.
+	 */
+	void	*mistate;
+	/*
+	 * This parameter indicates whether or not the buffered slots have been
+	 * flushed to a table. Used by callers of multi insert API for inserting
+	 * into indexes or executing after row triggers, if any.
+	 */
+	bool	flushed;
+}TableInsertState;
+
 /* "options" flag bits for table_tuple_insert */
 /* TABLE_INSERT_SKIP_WAL was 0x0001; RelationNeedsWAL() now governs */
 #define TABLE_INSERT_SKIP_FSM		0x0002
@@ -504,6 +530,17 @@ typedef struct TableAmRoutine
 	void		(*multi_insert) (Relation rel, TupleTableSlot **slots, int nslots,
 								 CommandId cid, int options, struct BulkInsertStateData *bistate);
 
+	TableInsertState* (*tuple_insert_begin) (Relation rel, CommandId cid,
+											 int options, bool is_multi);
+
+	void (*tuple_insert_v2) (TableInsertState *state, TupleTableSlot *slot);
+
+	void (*multi_insert_v2) (TableInsertState *state, TupleTableSlot *slot);
+
+	void (*multi_insert_flush) (TableInsertState *state);
+
+	void (*tuple_insert_end) (TableInsertState *state);
+
 	/* see table_tuple_delete() for reference about parameters */
 	TM_Result	(*tuple_delete) (Relation rel,
 								 ItemPointer tid,
@@ -851,6 +888,8 @@ typedef struct TableAmRoutine
 } TableAmRoutine;
 
 
+typedef struct BulkInsertStateData *BulkInsertState;
+
 /* ----------------------------------------------------------------------------
  * Slot functions.
  * ----------------------------------------------------------------------------
@@ -869,6 +908,10 @@ extern const TupleTableSlotOps *table_slot_callbacks(Relation rel);
  */
 extern TupleTableSlot *table_slot_create(Relation rel, List **reglist);
 
+/* Bulk insert state functions. */
+extern BulkInsertState GetBulkInsertState(void);
+extern void FreeBulkInsertState(BulkInsertState);
+extern void ReleaseBulkInsertStatePin(BulkInsertState bistate);
 
 /* ----------------------------------------------------------------------------
  * Table scan functions.
@@ -1430,6 +1473,50 @@ table_multi_insert(Relation rel, TupleTableSlot **slots, int nslots,
 								  cid, options, bistate);
 }
 
+static inline TableInsertState*
+table_insert_begin(Relation rel, CommandId cid, int options,
+				   bool alloc_bistate, bool is_multi)
+{
+	TableInsertState *state = rel->rd_tableam->tuple_insert_begin(rel, cid,
+										options, is_multi);
+
+	/* Allocate bulk insert state here, since it's AM independent. */
+	if (alloc_bistate)
+		state->bistate = GetBulkInsertState();
+	else
+		state->bistate = NULL;
+
+	return state;
+}
+
+static inline void
+table_tuple_insert_v2(TableInsertState *state, TupleTableSlot *slot)
+{
+	state->rel->rd_tableam->tuple_insert_v2(state, slot);
+}
+
+static inline void
+table_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
+{
+	state->rel->rd_tableam->multi_insert_v2(state, slot);
+}
+
+static inline void
+table_multi_insert_flush(TableInsertState *state)
+{
+	state->rel->rd_tableam->multi_insert_flush(state);
+}
+
+static inline void
+table_insert_end(TableInsertState *state)
+{
+	/* Deallocate bulk insert state here, since it's AM independent. */
+	if (state->bistate)
+		FreeBulkInsertState(state->bistate);
+
+	state->rel->rd_tableam->tuple_insert_end(state);
+}
+
 /*
  * Delete a tuple.
  *
diff --git a/src/include/executor/tuptable.h b/src/include/executor/tuptable.h
index 679e57fbdd..1f59614183 100644
--- a/src/include/executor/tuptable.h
+++ b/src/include/executor/tuptable.h
@@ -330,6 +330,7 @@ extern void slot_getmissingattrs(TupleTableSlot *slot, int startAttNum,
 								 int lastAttNum);
 extern void slot_getsomeattrs_int(TupleTableSlot *slot, int attnum);
 
+extern Size GetTupleSize(TupleTableSlot *slot, Size maxsize);
 
 #ifndef FRONTEND
 
-- 
2.25.1



  [application/x-patch] v6-0002-CTAS-and-REFRESH-Mat-View-With-New-Multi-Insert-T.patch (6.0K, 3-v6-0002-CTAS-and-REFRESH-Mat-View-With-New-Multi-Insert-T.patch)
  download | inline diff:
From d9de92281d7b5c44a6a15994a9a11052149c9981 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Wed, 10 Mar 2021 09:54:59 +0530
Subject: [PATCH v6 2/3] CTAS and REFRESH Mat View With New Multi Insert Table AM

This patch adds new multi insert table access methods to
CREATE TABLE AS, CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED
VIEW.
---
 src/backend/commands/createas.c | 49 +++++++++++++++++----------------
 src/backend/commands/matview.c  | 35 ++++++++++++-----------
 2 files changed, 43 insertions(+), 41 deletions(-)

diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index dce882012e..36ad0ef698 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -58,9 +58,7 @@ typedef struct
 	/* These fields are filled by intorel_startup: */
 	Relation	rel;			/* relation to write to */
 	ObjectAddress reladdr;		/* address of rel, for ExecCreateTableAs */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableInsertState *istate;	/* insert state */
 } DR_intorel;
 
 /* utility functions for CTAS definition creation */
@@ -546,22 +544,26 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	if (is_matview && !into->skipData)
 		SetMatViewPopulatedState(intoRelationDesc, true);
 
-	/*
-	 * Fill private fields of myState for use by later routines
-	 */
-	myState->rel = intoRelationDesc;
-	myState->reladdr = intoRelationAddr;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM;
-
 	/*
 	 * If WITH NO DATA is specified, there is no need to set up the state for
-	 * bulk inserts as there are no tuples to insert.
+	 * bulk inserts and multi inserts as there are no tuples to insert.
 	 */
 	if (!into->skipData)
-		myState->bistate = GetBulkInsertState();
+	{
+		myState->istate = table_insert_begin(intoRelationDesc,
+											 GetCurrentCommandId(true),
+											 TABLE_INSERT_SKIP_FSM,
+											 true,
+											 true);
+	}
 	else
-		myState->bistate = NULL;
+		myState->istate = NULL;
+
+	/*
+	 * Fill private fields of myState for use by later routines
+	 */
+	myState->rel = intoRelationDesc;
+	myState->reladdr = intoRelationAddr;
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -589,11 +591,7 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
 		 * would not be cheap either. This also doesn't allow accessing per-AM
 		 * data (say a tuple's xmin), but since we don't do that here...
 		 */
-		table_tuple_insert(myState->rel,
-						   slot,
-						   myState->output_cid,
-						   myState->ti_options,
-						   myState->bistate);
+		table_multi_insert_v2(myState->istate, slot);
 	}
 
 	/* We know this is a newly created relation, so there are no indexes */
@@ -608,12 +606,17 @@ static void
 intorel_shutdown(DestReceiver *self)
 {
 	DR_intorel *myState = (DR_intorel *) self;
-	IntoClause *into = myState->into;
+	int ti_options;
 
-	if (!into->skipData)
+	if (!myState->into->skipData)
 	{
-		FreeBulkInsertState(myState->bistate);
-		table_finish_bulk_insert(myState->rel, myState->ti_options);
+		ti_options = myState->istate->options;
+
+		table_multi_insert_flush(myState->istate);
+
+		table_insert_end(myState->istate);
+
+		table_finish_bulk_insert(myState->rel, ti_options);
 	}
 
 	/* close rel, but keep lock until commit */
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index c5c25ce11d..9c6b5f8525 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -52,10 +52,7 @@ typedef struct
 	DestReceiver pub;			/* publicly-known function pointers */
 	Oid			transientoid;	/* OID of new heap into which to store */
 	/* These fields are filled by transientrel_startup: */
-	Relation	transientrel;	/* relation to write to */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableInsertState *istate;	/* insert state */
 } DR_transientrel;
 
 static int	matview_maintenance_depth = 0;
@@ -466,10 +463,11 @@ transientrel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	/*
 	 * Fill private fields of myState for use by later routines
 	 */
-	myState->transientrel = transientrel;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM | TABLE_INSERT_FROZEN;
-	myState->bistate = GetBulkInsertState();
+	myState->istate = table_insert_begin(transientrel,
+										 GetCurrentCommandId(true),
+										 TABLE_INSERT_SKIP_FSM | TABLE_INSERT_FROZEN,
+										 true,
+										 true);
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -494,12 +492,7 @@ transientrel_receive(TupleTableSlot *slot, DestReceiver *self)
 	 * cheap either. This also doesn't allow accessing per-AM data (say a
 	 * tuple's xmin), but since we don't do that here...
 	 */
-
-	table_tuple_insert(myState->transientrel,
-					   slot,
-					   myState->output_cid,
-					   myState->ti_options,
-					   myState->bistate);
+	table_multi_insert_v2(myState->istate, slot);
 
 	/* We know this is a newly created relation, so there are no indexes */
 
@@ -513,14 +506,20 @@ static void
 transientrel_shutdown(DestReceiver *self)
 {
 	DR_transientrel *myState = (DR_transientrel *) self;
+	int ti_options;
+	Relation transientrel;
+
+	ti_options = myState->istate->options;
+	transientrel = myState->istate->rel;
+
+	table_multi_insert_flush(myState->istate);
 
-	FreeBulkInsertState(myState->bistate);
+	table_insert_end(myState->istate);
 
-	table_finish_bulk_insert(myState->transientrel, myState->ti_options);
+	table_finish_bulk_insert(transientrel, ti_options);
 
 	/* close transientrel, but keep lock until commit */
-	table_close(myState->transientrel, NoLock);
-	myState->transientrel = NULL;
+	table_close(transientrel, NoLock);
 }
 
 /*
-- 
2.25.1



  [application/x-patch] v6-0003-COPY-With-New-Multi-and-Single-Insert-Table-AM.patch (22.2K, 4-v6-0003-COPY-With-New-Multi-and-Single-Insert-Table-AM.patch)
  download | inline diff:
From 26740527f650f6edb70e580d46a4b86124da74e5 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Mon, 19 Apr 2021 10:02:31 +0530
Subject: [PATCH v6 3/3] COPY With New Multi and Single Insert Table AM

This patch adds new single and multi insert table access method to
COPY code.
---
 src/backend/commands/copyfrom.c | 464 ++++++++++----------------------
 1 file changed, 144 insertions(+), 320 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 40a54ad0bd..0117413943 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -53,54 +53,17 @@
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
-/*
- * No more than this many tuples per CopyMultiInsertBuffer
- *
- * Caution: Don't make this too big, as we could end up with this many
- * CopyMultiInsertBuffer items stored in CopyMultiInsertInfo's
- * multiInsertBuffers list.  Increasing this can cause quadratic growth in
- * memory requirements during copies into partitioned tables with a large
- * number of partitions.
- */
-#define MAX_BUFFERED_TUPLES		1000
-
-/*
- * Flush buffers if there are >= this many bytes, as counted by the input
- * size, of tuples stored.
- */
-#define MAX_BUFFERED_BYTES		65535
-
 /* Trim the list of buffers back down to this number after flushing */
 #define MAX_PARTITION_BUFFERS	32
 
 /* Stores multi-insert data related to a single relation in CopyFrom. */
 typedef struct CopyMultiInsertBuffer
 {
-	TupleTableSlot *slots[MAX_BUFFERED_TUPLES]; /* Array to store tuples */
-	ResultRelInfo *resultRelInfo;	/* ResultRelInfo for 'relid' */
-	BulkInsertState bistate;	/* BulkInsertState for this rel */
-	int			nused;			/* number of 'slots' containing tuples */
-	uint64		linenos[MAX_BUFFERED_TUPLES];	/* Line # of tuple in copy
-												 * stream */
+	TableInsertState *istate;
+	/* Line # of tuple in copy stream. */
+	uint64          linenos[MAX_BUFFERED_TUPLES];
 } CopyMultiInsertBuffer;
 
-/*
- * Stores one or many CopyMultiInsertBuffers and details about the size and
- * number of tuples which are stored in them.  This allows multiple buffers to
- * exist at once when COPYing into a partitioned table.
- */
-typedef struct CopyMultiInsertInfo
-{
-	List	   *multiInsertBuffers; /* List of tracked CopyMultiInsertBuffers */
-	int			bufferedTuples; /* number of tuples buffered over all buffers */
-	int			bufferedBytes;	/* number of bytes from all buffered tuples */
-	CopyFromState cstate;		/* Copy state for this CopyMultiInsertInfo */
-	EState	   *estate;			/* Executor state used for COPY */
-	CommandId	mycid;			/* Command Id used for COPY */
-	int			ti_options;		/* table insert options */
-} CopyMultiInsertInfo;
-
-
 /* non-export function prototypes */
 static char *limit_printout_length(const char *str);
 
@@ -207,108 +170,33 @@ limit_printout_length(const char *str)
 	return res;
 }
 
-/*
- * Allocate memory and initialize a new CopyMultiInsertBuffer for this
- * ResultRelInfo.
- */
-static CopyMultiInsertBuffer *
-CopyMultiInsertBufferInit(ResultRelInfo *rri)
-{
-	CopyMultiInsertBuffer *buffer;
-
-	buffer = (CopyMultiInsertBuffer *) palloc(sizeof(CopyMultiInsertBuffer));
-	memset(buffer->slots, 0, sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
-	buffer->resultRelInfo = rri;
-	buffer->bistate = GetBulkInsertState();
-	buffer->nused = 0;
-
-	return buffer;
-}
-
-/*
- * Make a new buffer for this ResultRelInfo.
- */
-static inline void
-CopyMultiInsertInfoSetupBuffer(CopyMultiInsertInfo *miinfo,
-							   ResultRelInfo *rri)
+static void
+InitCopyMultiInsertBufferInfo(List **mirri, ResultRelInfo *rri,
+							   CommandId mycid, int ti_options)
 {
 	CopyMultiInsertBuffer *buffer;
 
-	buffer = CopyMultiInsertBufferInit(rri);
-
+	buffer = (CopyMultiInsertBuffer *) palloc0(sizeof(CopyMultiInsertBuffer));
+	buffer->istate = table_insert_begin(rri->ri_RelationDesc, mycid,
+										ti_options, true, true);
 	/* Setup back-link so we can easily find this buffer again */
 	rri->ri_CopyMultiInsertBuffer = buffer;
-	/* Record that we're tracking this buffer */
-	miinfo->multiInsertBuffers = lappend(miinfo->multiInsertBuffers, buffer);
+	*mirri = lappend(*mirri, rri);
 }
 
 /*
- * Initialize an already allocated CopyMultiInsertInfo.
- *
- * If rri is a non-partitioned table then a CopyMultiInsertBuffer is set up
- * for that table.
+ * Run AFTER ROW INSERT triggers or insert into indexes, if any, after buffered
+ * tuples are flushed to table.
  */
 static void
-CopyMultiInsertInfoInit(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
-						CopyFromState cstate, EState *estate, CommandId mycid,
-						int ti_options)
+HandleAfterRowEvents(ResultRelInfo *rri, EState *estate, CopyFromState cstate,
+					 int32 cur_slots)
 {
-	miinfo->multiInsertBuffers = NIL;
-	miinfo->bufferedTuples = 0;
-	miinfo->bufferedBytes = 0;
-	miinfo->cstate = cstate;
-	miinfo->estate = estate;
-	miinfo->mycid = mycid;
-	miinfo->ti_options = ti_options;
-
-	/*
-	 * Only setup the buffer when not dealing with a partitioned table.
-	 * Buffers for partitioned tables will just be setup when we need to send
-	 * tuples their way for the first time.
-	 */
-	if (rri->ri_RelationDesc->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
-		CopyMultiInsertInfoSetupBuffer(miinfo, rri);
-}
-
-/*
- * Returns true if the buffers are full
- */
-static inline bool
-CopyMultiInsertInfoIsFull(CopyMultiInsertInfo *miinfo)
-{
-	if (miinfo->bufferedTuples >= MAX_BUFFERED_TUPLES ||
-		miinfo->bufferedBytes >= MAX_BUFFERED_BYTES)
-		return true;
-	return false;
-}
-
-/*
- * Returns true if we have no buffered tuples
- */
-static inline bool
-CopyMultiInsertInfoIsEmpty(CopyMultiInsertInfo *miinfo)
-{
-	return miinfo->bufferedTuples == 0;
-}
-
-/*
- * Write the tuples stored in 'buffer' out to the table.
- */
-static inline void
-CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
-						   CopyMultiInsertBuffer *buffer)
-{
-	MemoryContext oldcontext;
-	int			i;
-	uint64		save_cur_lineno;
-	CopyFromState cstate = miinfo->cstate;
-	EState	   *estate = miinfo->estate;
-	CommandId	mycid = miinfo->mycid;
-	int			ti_options = miinfo->ti_options;
-	bool		line_buf_valid = cstate->line_buf_valid;
-	int			nused = buffer->nused;
-	ResultRelInfo *resultRelInfo = buffer->resultRelInfo;
-	TupleTableSlot **slots = buffer->slots;
+	int i;
+	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
+	TableInsertState *istate = buffer->istate;
+	uint64  save_cur_lineno;
+	bool    line_buf_valid = cstate->line_buf_valid;
 
 	/*
 	 * Print error context information correctly, if one of the operations
@@ -317,36 +205,27 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	cstate->line_buf_valid = false;
 	save_cur_lineno = cstate->cur_lineno;
 
-	/*
-	 * table_multi_insert may leak memory, so switch to short-lived memory
-	 * context before calling it.
-	 */
-	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	table_multi_insert(resultRelInfo->ri_RelationDesc,
-					   slots,
-					   nused,
-					   mycid,
-					   ti_options,
-					   buffer->bistate);
-	MemoryContextSwitchTo(oldcontext);
-
-	for (i = 0; i < nused; i++)
+	for (i = 0; i < cur_slots; i++)
 	{
 		/*
 		 * If there are any indexes, update them for all the inserted tuples,
 		 * and run AFTER ROW INSERT triggers.
 		 */
-		if (resultRelInfo->ri_NumIndices > 0)
+		if (rri->ri_NumIndices > 0)
 		{
 			List	   *recheckIndexes;
 
 			cstate->cur_lineno = buffer->linenos[i];
-			recheckIndexes =
-				ExecInsertIndexTuples(resultRelInfo,
-									  buffer->slots[i], estate, false, false,
-									  NULL, NIL);
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], recheckIndexes,
+
+			recheckIndexes = ExecInsertIndexTuples(rri,
+												   istate->mi_slots[i], estate,
+												   false,
+												   false,
+												   NULL,
+												   NULL);
+
+			ExecARInsertTriggers(estate, rri,
+								 istate->mi_slots[i], recheckIndexes,
 								 cstate->transition_capture);
 			list_free(recheckIndexes);
 		}
@@ -355,79 +234,69 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 		 * There's no indexes, but see if we need to run AFTER ROW INSERT
 		 * triggers anyway.
 		 */
-		else if (resultRelInfo->ri_TrigDesc != NULL &&
-				 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
-				  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+		else if (rri->ri_TrigDesc != NULL &&
+				 (rri->ri_TrigDesc->trig_insert_after_row ||
+				 rri->ri_TrigDesc->trig_insert_new_table))
 		{
 			cstate->cur_lineno = buffer->linenos[i];
-			ExecARInsertTriggers(estate, resultRelInfo,
-								 slots[i], NIL, cstate->transition_capture);
+			ExecARInsertTriggers(estate,
+								 rri,
+								 istate->mi_slots[i],
+								 NULL,
+								 cstate->transition_capture);
 		}
-
-		ExecClearTuple(slots[i]);
 	}
 
-	/* Mark that all slots are free */
-	buffer->nused = 0;
-
 	/* reset cur_lineno and line_buf_valid to what they were */
 	cstate->line_buf_valid = line_buf_valid;
 	cstate->cur_lineno = save_cur_lineno;
 }
 
 /*
- * Drop used slots and free member for this buffer.
- *
- * The buffer must be flushed before cleanup.
+ * Store tuple from the incoming slot into buffered slots.
  */
-static inline void
-CopyMultiInsertBufferCleanup(CopyMultiInsertInfo *miinfo,
-							 CopyMultiInsertBuffer *buffer)
+static void
+CopyMultiInsertBufferTuple(ResultRelInfo *rri, TupleTableSlot *slot,
+						   CopyFromState cstate, EState *estate)
 {
-	int			i;
-
-	/* Ensure buffer was flushed */
-	Assert(buffer->nused == 0);
-
-	/* Remove back-link to ourself */
-	buffer->resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
-
-	FreeBulkInsertState(buffer->bistate);
+	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
+	TableInsertState *istate = buffer->istate;
 
-	/* Since we only create slots on demand, just drop the non-null ones. */
-	for (i = 0; i < MAX_BUFFERED_TUPLES && buffer->slots[i] != NULL; i++)
-		ExecDropSingleTupleTableSlot(buffer->slots[i]);
+	table_multi_insert_v2(buffer->istate, slot);
 
-	table_finish_bulk_insert(buffer->resultRelInfo->ri_RelationDesc,
-							 miinfo->ti_options);
+	buffer->linenos[istate->mi_cur_slots - 1] = cstate->cur_lineno;
 
-	pfree(buffer);
+	if (istate->flushed)
+		HandleAfterRowEvents(rri, estate, cstate, istate->mi_cur_slots);
 }
 
 /*
- * Write out all stored tuples in all buffers out to the tables.
- *
- * Once flushed we also trim the tracked buffers list down to size by removing
- * the buffers created earliest first.
- *
- * Callers should pass 'curr_rri' as the ResultRelInfo that's currently being
- * used.  When cleaning up old buffers we'll never remove the one for
- * 'curr_rri'.
+ * Flush tuples into table from the buffered slots.
  */
-static inline void
-CopyMultiInsertInfoFlush(CopyMultiInsertInfo *miinfo, ResultRelInfo *curr_rri)
+static void
+CopyMulitInsertFlushBuffers(List **mirri, ResultRelInfo *curr_rri,
+							CopyFromState cstate, EState *estate)
 {
 	ListCell   *lc;
+	ResultRelInfo *rri;
+	CopyMultiInsertBuffer *buffer;
+	TableInsertState *istate;
 
-	foreach(lc, miinfo->multiInsertBuffers)
+	foreach(lc, *mirri)
 	{
-		CopyMultiInsertBuffer *buffer = (CopyMultiInsertBuffer *) lfirst(lc);
+		rri = lfirst(lc);
+		buffer = rri->ri_CopyMultiInsertBuffer;
+		istate = buffer->istate;
+
+		table_multi_insert_flush(istate);
 
-		CopyMultiInsertBufferFlush(miinfo, buffer);
+		if (istate->flushed)
+			HandleAfterRowEvents(rri, estate, cstate, istate->mi_cur_slots);
 	}
 
-	miinfo->bufferedTuples = 0;
-	miinfo->bufferedBytes = 0;
+	rri = NULL;
+	buffer = NULL;
+	istate = NULL;
 
 	/*
 	 * Trim the list of tracked buffers down if it exceeds the limit.  Here we
@@ -435,87 +304,59 @@ CopyMultiInsertInfoFlush(CopyMultiInsertInfo *miinfo, ResultRelInfo *curr_rri)
 	 * likely that these older ones will be needed than the ones that were
 	 * just created.
 	 */
-	while (list_length(miinfo->multiInsertBuffers) > MAX_PARTITION_BUFFERS)
+	while (list_length(*mirri) > MAX_PARTITION_BUFFERS)
 	{
-		CopyMultiInsertBuffer *buffer;
+		int ti_options;
 
-		buffer = (CopyMultiInsertBuffer *) linitial(miinfo->multiInsertBuffers);
+		rri = (ResultRelInfo *) linitial(*mirri);
 
 		/*
 		 * We never want to remove the buffer that's currently being used, so
 		 * if we happen to find that then move it to the end of the list.
 		 */
-		if (buffer->resultRelInfo == curr_rri)
+		if (rri == curr_rri)
 		{
-			miinfo->multiInsertBuffers = list_delete_first(miinfo->multiInsertBuffers);
-			miinfo->multiInsertBuffers = lappend(miinfo->multiInsertBuffers, buffer);
-			buffer = (CopyMultiInsertBuffer *) linitial(miinfo->multiInsertBuffers);
+			*mirri = list_delete_first(*mirri);
+			*mirri = lappend(*mirri, rri);
+			rri = (ResultRelInfo *) linitial(*mirri);
 		}
 
-		CopyMultiInsertBufferCleanup(miinfo, buffer);
-		miinfo->multiInsertBuffers = list_delete_first(miinfo->multiInsertBuffers);
-	}
-}
-
-/*
- * Cleanup allocated buffers and free memory
- */
-static inline void
-CopyMultiInsertInfoCleanup(CopyMultiInsertInfo *miinfo)
-{
-	ListCell   *lc;
+		buffer = rri->ri_CopyMultiInsertBuffer;
+		istate = buffer->istate;
+		ti_options = istate->options;
 
-	foreach(lc, miinfo->multiInsertBuffers)
-		CopyMultiInsertBufferCleanup(miinfo, lfirst(lc));
+		table_insert_end(istate);
+		table_finish_bulk_insert(rri->ri_RelationDesc, ti_options);
 
-	list_free(miinfo->multiInsertBuffers);
+		*mirri = list_delete_first(*mirri);
+	}
 }
 
 /*
- * Get the next TupleTableSlot that the next tuple should be stored in.
- *
- * Callers must ensure that the buffer is not full.
- *
- * Note: 'miinfo' is unused but has been included for consistency with the
- * other functions in this area.
+ * Drop the buffered slots.
  */
-static inline TupleTableSlot *
-CopyMultiInsertInfoNextFreeSlot(CopyMultiInsertInfo *miinfo,
-								ResultRelInfo *rri)
+static void
+CopyMulitInsertDropBuffers(List *mirri)
 {
-	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
-	int			nused = buffer->nused;
-
-	Assert(buffer != NULL);
-	Assert(nused < MAX_BUFFERED_TUPLES);
+	ListCell   *lc;
 
-	if (buffer->slots[nused] == NULL)
-		buffer->slots[nused] = table_slot_create(rri->ri_RelationDesc, NULL);
-	return buffer->slots[nused];
-}
+	foreach(lc, mirri)
+	{
+		int ti_options;
+		ResultRelInfo *rri = lfirst(lc);
+		CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
+		TableInsertState *istate = buffer->istate;
 
-/*
- * Record the previously reserved TupleTableSlot that was reserved by
- * CopyMultiInsertInfoNextFreeSlot as being consumed.
- */
-static inline void
-CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
-						 TupleTableSlot *slot, int tuplen, uint64 lineno)
-{
-	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
+		ti_options = istate->options;
 
-	Assert(buffer != NULL);
-	Assert(slot == buffer->slots[buffer->nused]);
+		table_insert_end(istate);
 
-	/* Store the line number so we can properly report any errors later */
-	buffer->linenos[buffer->nused] = lineno;
+		table_finish_bulk_insert(rri->ri_RelationDesc, ti_options);
 
-	/* Record this slot as being used */
-	buffer->nused++;
+		pfree(buffer);
+	}
 
-	/* Update how many tuples are stored and their size */
-	miinfo->bufferedTuples++;
-	miinfo->bufferedBytes += tuplen;
+	list_free(mirri);
 }
 
 /*
@@ -530,21 +371,21 @@ CopyFrom(CopyFromState cstate)
 	EState	   *estate = CreateExecutorState(); /* for ExecConstraints() */
 	ModifyTableState *mtstate;
 	ExprContext *econtext;
-	TupleTableSlot *singleslot = NULL;
+	TupleTableSlot *slot = NULL;
 	MemoryContext oldcontext = CurrentMemoryContext;
 
 	PartitionTupleRouting *proute = NULL;
 	ErrorContextCallback errcallback;
 	CommandId	mycid = GetCurrentCommandId(true);
 	int			ti_options = 0; /* start with default options for insert */
-	BulkInsertState bistate = NULL;
 	CopyInsertMethod insertMethod;
-	CopyMultiInsertInfo multiInsertInfo = {0};	/* pacify compiler */
 	int64		processed = 0;
 	int64		excluded = 0;
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
 	bool		leafpart_use_multi_insert = false;
+	List       *multi_insert_rris = NULL;
+	TableInsertState *istate = NULL;
 
 	Assert(cstate->rel);
 	Assert(list_length(cstate->range_table) == 1);
@@ -729,7 +570,7 @@ CopyFrom(CopyFromState cstate)
 		 * For partitioned tables we can't support multi-inserts when there
 		 * are any statement level insert triggers. It might be possible to
 		 * allow partitioned tables with such triggers in the future, but for
-		 * now, CopyMultiInsertInfoFlush expects that any before row insert
+		 * now, CopyMulitInsertFlushBuffers expects that any before row insert
 		 * and statement level insert triggers are on the same relation.
 		 */
 		insertMethod = CIM_SINGLE;
@@ -777,22 +618,22 @@ CopyFrom(CopyFromState cstate)
 		else
 			insertMethod = CIM_MULTI;
 
-		CopyMultiInsertInfoInit(&multiInsertInfo, resultRelInfo, cstate,
-								estate, mycid, ti_options);
+		/*
+		* Only setup the buffer when not dealing with a partitioned table.
+		* Buffers for partitioned tables will just be setup when we need to
+		* send tuples their way for the first time.
+		*/
+		if (!proute)
+			InitCopyMultiInsertBufferInfo(&multi_insert_rris, resultRelInfo,
+										  mycid, ti_options);
 	}
 
 	/*
-	 * If not using batch mode (which allocates slots as needed) set up a
-	 * tuple slot too. When inserting into a partitioned table, we also need
-	 * one, even if we might batch insert, to read the tuple in the root
-	 * partition's form.
+	 * Set up a tuple slot to which the input data from copy stream is read
+	 * into and used for inserts into table.
 	 */
-	if (insertMethod == CIM_SINGLE || insertMethod == CIM_MULTI_CONDITIONAL)
-	{
-		singleslot = table_slot_create(resultRelInfo->ri_RelationDesc,
-									   &estate->es_tupleTable);
-		bistate = GetBulkInsertState();
-	}
+	slot = table_slot_create(resultRelInfo->ri_RelationDesc,
+							 &estate->es_tupleTable);
 
 	has_before_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
 								  resultRelInfo->ri_TrigDesc->trig_insert_before_row);
@@ -830,19 +671,8 @@ CopyFrom(CopyFromState cstate)
 		ResetPerTupleExprContext(estate);
 
 		/* select slot to (initially) load row into */
-		if (insertMethod == CIM_SINGLE || proute)
-		{
-			myslot = singleslot;
-			Assert(myslot != NULL);
-		}
-		else
-		{
-			Assert(resultRelInfo == target_resultRelInfo);
-			Assert(insertMethod == CIM_MULTI);
-
-			myslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
-													 resultRelInfo);
-		}
+		myslot = slot;
+		Assert(myslot != NULL);
 
 		/*
 		 * Switch to per-tuple context before calling NextCopyFrom, which does
@@ -918,21 +748,22 @@ CopyFrom(CopyFromState cstate)
 				if (leafpart_use_multi_insert)
 				{
 					if (resultRelInfo->ri_CopyMultiInsertBuffer == NULL)
-						CopyMultiInsertInfoSetupBuffer(&multiInsertInfo,
-													   resultRelInfo);
+						InitCopyMultiInsertBufferInfo(&multi_insert_rris,
+													  resultRelInfo, mycid,
+													  ti_options);
 				}
-				else if (insertMethod == CIM_MULTI_CONDITIONAL &&
-						 !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
+				else if (insertMethod == CIM_MULTI_CONDITIONAL)
 				{
 					/*
 					 * Flush pending inserts if this partition can't use
 					 * batching, so rows are visible to triggers etc.
 					 */
-					CopyMultiInsertInfoFlush(&multiInsertInfo, resultRelInfo);
+					CopyMulitInsertFlushBuffers(&multi_insert_rris,
+												resultRelInfo, cstate, estate);
 				}
 
-				if (bistate != NULL)
-					ReleaseBulkInsertStatePin(bistate);
+				if (istate && istate->bistate)
+					ReleaseBulkInsertStatePin(istate->bistate);
 				prevResultRelInfo = resultRelInfo;
 			}
 
@@ -974,8 +805,8 @@ CopyFrom(CopyFromState cstate)
 				/* no other path available for partitioned table */
 				Assert(insertMethod == CIM_MULTI_CONDITIONAL);
 
-				batchslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
-															resultRelInfo);
+				batchslot = table_slot_create(resultRelInfo->ri_RelationDesc,
+											  &estate->es_tupleTable);
 
 				if (map != NULL)
 					myslot = execute_attr_map_slot(map->attrMap, myslot,
@@ -1047,24 +878,9 @@ CopyFrom(CopyFromState cstate)
 				/* Store the slot in the multi-insert buffer, when enabled. */
 				if (insertMethod == CIM_MULTI || leafpart_use_multi_insert)
 				{
-					/*
-					 * The slot previously might point into the per-tuple
-					 * context. For batching it needs to be longer lived.
-					 */
-					ExecMaterializeSlot(myslot);
-
 					/* Add this tuple to the tuple buffer */
-					CopyMultiInsertInfoStore(&multiInsertInfo,
-											 resultRelInfo, myslot,
-											 cstate->line_buf.len,
-											 cstate->cur_lineno);
-
-					/*
-					 * If enough inserts have queued up, then flush all
-					 * buffers out to their tables.
-					 */
-					if (CopyMultiInsertInfoIsFull(&multiInsertInfo))
-						CopyMultiInsertInfoFlush(&multiInsertInfo, resultRelInfo);
+					CopyMultiInsertBufferTuple(resultRelInfo, myslot, cstate,
+											   estate);
 				}
 				else
 				{
@@ -1090,9 +906,19 @@ CopyFrom(CopyFromState cstate)
 					}
 					else
 					{
+						if (!istate)
+						{
+							istate = table_insert_begin(resultRelInfo->ri_RelationDesc,
+														mycid,
+														ti_options,
+														true,
+														false);
+						}
+
+						istate->rel = resultRelInfo->ri_RelationDesc;
+
 						/* OK, store the tuple and create index entries for it */
-						table_tuple_insert(resultRelInfo->ri_RelationDesc,
-										   myslot, mycid, ti_options, bistate);
+						table_tuple_insert_v2(istate, myslot);
 
 						if (resultRelInfo->ri_NumIndices > 0)
 							recheckIndexes = ExecInsertIndexTuples(resultRelInfo,
@@ -1125,16 +951,14 @@ CopyFrom(CopyFromState cstate)
 
 	/* Flush any remaining buffered tuples */
 	if (insertMethod != CIM_SINGLE)
-	{
-		if (!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
-			CopyMultiInsertInfoFlush(&multiInsertInfo, NULL);
-	}
+		CopyMulitInsertFlushBuffers(&multi_insert_rris, resultRelInfo,
+									cstate, estate);
 
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
 
-	if (bistate != NULL)
-		FreeBulkInsertState(bistate);
+	if (istate)
+		table_insert_end(istate);
 
 	MemoryContextSwitchTo(oldcontext);
 
@@ -1154,7 +978,7 @@ CopyFrom(CopyFromState cstate)
 
 	/* Tear down the multi-insert buffer data */
 	if (insertMethod != CIM_SINGLE)
-		CopyMultiInsertInfoCleanup(&multiInsertInfo);
+		CopyMulitInsertDropBuffers(multi_insert_rris);
 
 	/* Close all the partitioned tables, leaf partitions, and their indices */
 	if (proute)
-- 
2.25.1



^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2022-10-12 05:30  Michael Paquier <[email protected]>
  parent: Bharath Rupireddy <[email protected]>
  1 sibling, 1 reply; 43+ messages in thread

From: Michael Paquier @ 2022-10-12 05:30 UTC (permalink / raw)
  To: Matthias van de Meent <[email protected]>; +Cc: Bharath Rupireddy <[email protected]>; Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Paul Guo <[email protected]>; Jeff Davis <[email protected]>

On Mon, Mar 07, 2022 at 05:09:23PM +0100, Matthias van de Meent wrote:
> That's for the AM-internal flushing; yes. I was thinking about the AM
> api for flushing that's used when finalizing the batched insert; i.e.
> table_multi_insert_flush.
> 
> Currently it assumes that all buffered tuples will be flushed after
> one call (which is correct for heap), but putting those unflushed
> tuples all at once back in memory might not be desirable or possible
> (for e.g. columnar); so we might need to call table_multi_insert_flush
> until there's no more buffered tuples.

This thread has been idle for 6 months now, so I have marked it as
returned with feedback as of what looks like a lack of activity.  I
have looked at what's been proposed, and I am not really sure if the
direction taken is correct, though there may be a potential gain in
consolidating the multi-insert path within the table AM set of
callbacks.
--
Michael


Attachments:

  [application/pgp-signature] signature.asc (833B, 2-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2022-10-12 05:35  Bharath Rupireddy <[email protected]>
  parent: Michael Paquier <[email protected]>
  0 siblings, 0 replies; 43+ messages in thread

From: Bharath Rupireddy @ 2022-10-12 05:35 UTC (permalink / raw)
  To: Michael Paquier <[email protected]>; +Cc: Matthias van de Meent <[email protected]>; Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; pgsql-hackers; Andres Freund <[email protected]>; Paul Guo <[email protected]>; Jeff Davis <[email protected]>

On Wed, Oct 12, 2022 at 11:01 AM Michael Paquier <[email protected]> wrote:
>
> This thread has been idle for 6 months now, so I have marked it as
> returned with feedback as of what looks like a lack of activity.  I
> have looked at what's been proposed, and I am not really sure if the
> direction taken is correct, though there may be a potential gain in
> consolidating the multi-insert path within the table AM set of
> callbacks.

Thanks. Unfortunately, I'm not finding enough cycles to work on this
feature. I'm happy to help if others have any further thoughts and
take it from here.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2023-06-03 22:38  Andres Freund <[email protected]>
  parent: Bharath Rupireddy <[email protected]>
  1 sibling, 2 replies; 43+ messages in thread

From: Andres Freund @ 2023-06-03 22:38 UTC (permalink / raw)
  To: Bharath Rupireddy <[email protected]>; +Cc: Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; pgsql-hackers; Paul Guo <[email protected]>; Jeff Davis <[email protected]>; Michael Paquier <[email protected]>; Matthias van de Meent <[email protected]>

Hi,

This patch was referenced in a discussion at pgcon, so I thought I'd give it a
look, even though Bharat said that he won't have time to drive it forward...


On 2021-04-19 10:21:36 +0530, Bharath Rupireddy wrote:
> diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
> index bd5faf0c1f..655de8e6b7 100644
> --- a/src/backend/access/heap/heapam_handler.c
> +++ b/src/backend/access/heap/heapam_handler.c
> @@ -2558,6 +2558,11 @@ static const TableAmRoutine heapam_methods = {
>  	.tuple_insert_speculative = heapam_tuple_insert_speculative,
>  	.tuple_complete_speculative = heapam_tuple_complete_speculative,
>  	.multi_insert = heap_multi_insert,
> +	.tuple_insert_begin = heap_insert_begin,
> +	.tuple_insert_v2 = heap_insert_v2,
> +	.multi_insert_v2 = heap_multi_insert_v2,
> +	.multi_insert_flush = heap_multi_insert_flush,
> +	.tuple_insert_end = heap_insert_end,
>  	.tuple_delete = heapam_tuple_delete,
>  	.tuple_update = heapam_tuple_update,
>  	.tuple_lock = heapam_tuple_lock,

I don't think we should have multiple callback for the insertion APIs in
tableam.h. I think it'd be good to continue supporting the old table_*()
functions, but supporting multiple insert APIs in each AM doesn't make much
sense to me.


> +/*
> + * GetTupleSize - Compute the tuple size given a table slot.
> + *
> + * For heap tuple, buffer tuple and minimal tuple slot types return the actual
> + * tuple size that exists. For virtual tuple, the size is calculated as the
> + * slot does not have the tuple size. If the computed size exceeds the given
> + * maxsize for the virtual tuple, this function exits, not investing time in
> + * further unnecessary calculation.
> + *
> + * Important Notes:
> + * 1) Size calculation code for virtual slots is being used from
> + * 	  tts_virtual_materialize(), hence ensure to have the same changes or fixes
> + * 	  here and also there.
> + * 2) Currently, GetTupleSize() handles the existing heap, buffer, minimal and
> + * 	  virtual slots. Ensure to add related code in case any new slot type is
> + *    introduced.
> + */
> +inline Size
> +GetTupleSize(TupleTableSlot *slot, Size maxsize)
> +{
> +	Size sz = 0;
> +	HeapTuple tuple = NULL;
> +
> +	if (TTS_IS_HEAPTUPLE(slot))
> +		tuple = ((HeapTupleTableSlot *) slot)->tuple;
> +	else if(TTS_IS_BUFFERTUPLE(slot))
> +		tuple = ((BufferHeapTupleTableSlot *) slot)->base.tuple;
> +	else if(TTS_IS_MINIMALTUPLE(slot))
> +		tuple = ((MinimalTupleTableSlot *) slot)->tuple;
> +	else if(TTS_IS_VIRTUAL(slot))

I think this embeds too much knowledge of the set of slot types in core
code. I don't see why it's needed either?


> diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
> index 414b6b4d57..2a1470a7b6 100644
> --- a/src/include/access/tableam.h
> +++ b/src/include/access/tableam.h
> @@ -229,6 +229,32 @@ typedef struct TM_IndexDeleteOp
>  	TM_IndexStatus *status;
>  } TM_IndexDeleteOp;
>
> +/* Holds table insert state. */
> +typedef struct TableInsertState

I suspect we should design it to be usable for updates and deletes in the
future, and thus name it TableModifyState.



> +{
> +	Relation	rel;
> +	/* Bulk insert state if requested, otherwise NULL. */
> +	struct BulkInsertStateData	*bistate;
> +	CommandId	cid;

Hm - I'm not sure it's a good idea to force the cid to be the same for all
inserts done via one TableInsertState.



> +	int	options;
> +	/* Below members are only used for multi inserts. */
> +	/* Array of buffered slots. */
> +	TupleTableSlot	**mi_slots;
> +	/* Number of slots that are currently buffered. */
> +	int32	mi_cur_slots;

> +	/*
> +	 * Access method specific information such as parameters that are needed
> +	 * for buffering and flushing decisions can go here.
> +	 */
> +	void	*mistate;

I think we should instead have a generic TableModifyState, which each AM then
embeds into an AM specific AM state. Forcing two very related structs to be
allocated separately doesn't seem wise in this case.



> @@ -1430,6 +1473,50 @@ table_multi_insert(Relation rel, TupleTableSlot **slots, int nslots,
>  								  cid, options, bistate);
>  }
>
> +static inline TableInsertState*
> +table_insert_begin(Relation rel, CommandId cid, int options,
> +				   bool alloc_bistate, bool is_multi)

Why have alloc_bistate and options?


> +static inline void
> +table_insert_end(TableInsertState *state)
> +{
> +	/* Deallocate bulk insert state here, since it's AM independent. */
> +	if (state->bistate)
> +		FreeBulkInsertState(state->bistate);
> +
> +	state->rel->rd_tableam->tuple_insert_end(state);
> +}

Seems like the order in here should be swapped?


Greetings,

Andres Freund






^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2023-06-05 02:30  Bharath Rupireddy <[email protected]>
  parent: Andres Freund <[email protected]>
  1 sibling, 0 replies; 43+ messages in thread

From: Bharath Rupireddy @ 2023-06-05 02:30 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; pgsql-hackers; Paul Guo <[email protected]>; Jeff Davis <[email protected]>; Michael Paquier <[email protected]>; Matthias van de Meent <[email protected]>

On Sun, Jun 4, 2023 at 4:08 AM Andres Freund <[email protected]> wrote:
>
> Hi,
>
> This patch was referenced in a discussion at pgcon, so I thought I'd give it a
> look, even though Bharat said that he won't have time to drive it forward...

Thanks. I'm glad to know that the feature was discussed at PGCon.

If there's an interest, I'm happy to spend time again on it.

I'll look into the review comments and respond soon.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com






^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2023-08-01 16:30  Bharath Rupireddy <[email protected]>
  parent: Andres Freund <[email protected]>
  1 sibling, 1 reply; 43+ messages in thread

From: Bharath Rupireddy @ 2023-08-01 16:30 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; pgsql-hackers; Paul Guo <[email protected]>; Jeff Davis <[email protected]>; Michael Paquier <[email protected]>; Matthias van de Meent <[email protected]>

On Sun, Jun 4, 2023 at 4:08 AM Andres Freund <[email protected]> wrote:
>
> Hi,
>
> This patch was referenced in a discussion at pgcon, so I thought I'd give it a
> look, even though Bharat said that he won't have time to drive it forward...

Thanks. Finally, I  started to spend time on this. Just curious - may
I know the discussion in/for which this patch is referenced? What was
the motive? Is it captured somewhere?

> On 2021-04-19 10:21:36 +0530, Bharath Rupireddy wrote:
> > +     .tuple_insert_begin = heap_insert_begin,
> > +     .tuple_insert_v2 = heap_insert_v2,
> > +     .multi_insert_v2 = heap_multi_insert_v2,
> > +     .multi_insert_flush = heap_multi_insert_flush,
> > +     .tuple_insert_end = heap_insert_end,
>
> I don't think we should have multiple callback for the insertion APIs in
> tableam.h. I think it'd be good to continue supporting the old table_*()
> functions, but supporting multiple insert APIs in each AM doesn't make much
> sense to me.

I named these new functions XXX_v2 for compatibility reasons. Because,
it's quite possible for external modules to use existing
table_tuple_insert, table_multi_insert functions. If we were to change
the existing insert tableams, all the external modules using them
would have to change their code, is that okay?

> > +/*
> > + * GetTupleSize - Compute the tuple size given a table slot.
> > +inline Size
>
> I think this embeds too much knowledge of the set of slot types in core
> code. I don't see why it's needed either?

The heapam multi-insert implementation needs to know the tuple size
from the slot to decide whether or not to flush the tuples from the
buffers. I couldn't find a direct way then to know the tuple size from
the slot, so added that helper function. With a better understanding
now, I think we can rely on the memory allocated for TupleTableSlot's
tts_mcxt. While this works for the materialized slots passed in to the
insert functions, for non-materialized slots the flushing decision can
be solely on the number of tuples stored in the buffers. Another way
is to add a get_tuple_size callback to TupleTableSlotOps and let the
tuple slot providers give us the tuple size.

> > diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
> > index 414b6b4d57..2a1470a7b6 100644
> > --- a/src/include/access/tableam.h
> > +++ b/src/include/access/tableam.h
> > @@ -229,6 +229,32 @@ typedef struct TM_IndexDeleteOp
> >       TM_IndexStatus *status;
> >  } TM_IndexDeleteOp;
> >
> > +/* Holds table insert state. */
> > +typedef struct TableInsertState
>
> I suspect we should design it to be usable for updates and deletes in the
> future, and thus name it TableModifyState.

There are different parameters that insert/update/delete would want to
pass across in the state. So, having Table{Insert/Update/Delete}State
may be a better idea than having the unneeded variables lying around
or having a union and state_type as INSERT/UPDATE/DELETE, no? Do you
have a different thought here?

> I think we should instead have a generic TableModifyState, which each AM then
> embeds into an AM specific AM state. Forcing two very related structs to be
> allocated separately doesn't seem wise in this case.

The v7 patches have largely changed the way these options and
parameters are passed, please have a look.

> > +{
> > +     Relation        rel;
> > +     /* Bulk insert state if requested, otherwise NULL. */
> > +     struct BulkInsertStateData      *bistate;
> > +     CommandId       cid;
>
> Hm - I'm not sure it's a good idea to force the cid to be the same for all
> inserts done via one TableInsertState.

If required, someone can always pass a new CID before every
tuple_insert_v2/tuple_multi_insert_v2 call via TableInsertState. Isn't
it sufficient?

> > @@ -1430,6 +1473,50 @@ table_multi_insert(Relation rel, TupleTableSlot **slots, int nslots,
> >                                                                 cid, options, bistate);
> >  }
> >
> > +static inline TableInsertState*
> > +table_insert_begin(Relation rel, CommandId cid, int options,
> > +                                bool alloc_bistate, bool is_multi)
>
> Why have alloc_bistate and options?

"alloc_bistate" is for the caller to specify if they need a bulk
insert state or not. "options" is for the caller to specify if they
need table_tuple_insert performance options such as
TABLE_INSERT_SKIP_FSM, TABLE_INSERT_FROZEN, TABLE_INSERT_NO_LOGICAL.
The v7 patches have changed the way these options and parameters are
passed, please have a look.

> > +static inline void
> > +table_insert_end(TableInsertState *state)
> > +{
> > +     /* Deallocate bulk insert state here, since it's AM independent. */
> > +     if (state->bistate)
> > +             FreeBulkInsertState(state->bistate);
> > +
> > +     state->rel->rd_tableam->tuple_insert_end(state);
> > +}
>
> Seems like the order in here should be swapped?

Right. It looks like BulkInsertState is for heapam, it really doesn't
have to be in table_XXX functions, hence it all the way down to
heap_insert_XXX functions.

I'm attaching the v7 patch set with the above review comments
addressed. My initial idea behind these new insert APIs was the
ability to re-use the multi insert code in COPY for CTAS and REFRESH
MATERIALIZED VIEW. I'm open to more thoughts here.

The v7 patches have largely changed the way state structure (heapam
specific things are moved all the way down to heapam.c) is defined,
the parameters are passed, and simplified the multi insert logic a
lot.

0001 - introduces new single and multi insert table AM and heapam
implementation of the new AM.
0002 - optimizes CREATE TABLE AS to use the new multi inserts table AM
making it faster by 2.13X or 53%.
0003 - optimizes REFRESH MATERIALIZED VIEW to use the new multi
inserts table AM making it faster by 1.52X or 34%.
0004 - uses the new multi inserts table AM for COPY FROM - I'm yet to
spend time on this, I'll share the patch when ready.

Thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


Attachments:

  [application/octet-stream] v7-0001-New-table-AMs-for-single-and-multi-inserts.patch (13.7K, 2-v7-0001-New-table-AMs-for-single-and-multi-inserts.patch)
  download | inline diff:
From 6803736e5695ab0ef06d263e9ba260db02d3b80c Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Tue, 1 Aug 2023 09:38:47 +0000
Subject: [PATCH v7] New table AMs for single and multi inserts

---
 src/backend/access/heap/heapam.c         | 180 +++++++++++++++++++++++
 src/backend/access/heap/heapam_handler.c |   6 +
 src/include/access/heapam.h              |  45 ++++++
 src/include/access/tableam.h             | 107 ++++++++++++++
 4 files changed, 338 insertions(+)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 7ed72abe59..ba4347026a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -68,6 +68,7 @@
 #include "utils/datum.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
+#include "utils/memutils.h"
 #include "utils/relcache.h"
 #include "utils/snapmgr.h"
 #include "utils/spccache.h"
@@ -75,6 +76,7 @@
 
 static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
 									 TransactionId xid, CommandId cid, int options);
+static void heap_multi_insert_flush(TableInsertState *state);
 static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 								  Buffer newbuf, HeapTuple oldtup,
 								  HeapTuple newtup, HeapTuple old_key_tuple,
@@ -2443,6 +2445,184 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 	pgstat_count_heap_insert(relation, ntuples);
 }
 
+/*
+ * Initialize state required for an insert a single tuple or multiple tuples
+ * into a heap.
+ */
+TableInsertState *
+heap_insert_begin(Relation rel, CommandId cid, int table_am_flags,
+				  int table_insert_flags)
+{
+	TableInsertState *tistate;
+
+	tistate = (TableInsertState *) palloc0(sizeof(TableInsertState));
+	tistate->rel = rel;
+	tistate->cid = cid;
+	tistate->table_am_flags = table_am_flags;
+	tistate->table_insert_flags = table_insert_flags;
+
+	if ((table_am_flags & TABLEAM_USE_MULTI_INSERTS) != 0 ||
+		(table_am_flags & TABLEAM_USE_BULKWRITE_BUFFER_ACCESS_STRATEGY))
+	{
+		tistate->table_am_data =
+			(HeapInsertState *) palloc0(sizeof(HeapInsertState));
+	}
+
+	if ((table_am_flags & TABLEAM_USE_MULTI_INSERTS) != 0)
+	{
+		((HeapInsertState *) tistate->table_am_data)->mistate =
+			(HeapMultiInsertState *) palloc0(sizeof(HeapMultiInsertState));
+
+		((HeapInsertState *) tistate->table_am_data)->mistate->slots =
+				palloc0(sizeof(TupleTableSlot *) * HEAP_MAX_BUFFERED_SLOTS);
+
+		((HeapInsertState *) tistate->table_am_data)->mistate->context =
+				AllocSetContextCreate(CurrentMemoryContext,
+									  "heap_multi_insert_v2 memory context",
+									  ALLOCSET_DEFAULT_SIZES);
+	}
+
+	if ((table_am_flags & TABLEAM_USE_BULKWRITE_BUFFER_ACCESS_STRATEGY) != 0)
+		((HeapInsertState *) tistate->table_am_data)->bistate = GetBulkInsertState();
+
+	return tistate;
+}
+
+/*
+ * Insert a single tuple into a heap.
+ */
+void
+heap_insert_v2(TableInsertState *state, TupleTableSlot *slot)
+{
+	bool		shouldFree = true;
+	HeapTuple	tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);
+	BulkInsertState bistate = NULL;
+
+	/* Update tuple with table oid */
+	slot->tts_tableOid = RelationGetRelid(state->rel);
+	tuple->t_tableOid = slot->tts_tableOid;
+
+	if (state->table_am_data != NULL &&
+		((HeapInsertState *) state->table_am_data)->bistate != NULL)
+	{
+		bistate = ((HeapInsertState *) state->table_am_data)->bistate;
+	}
+
+	/* Perform insertion, and copy the resulting ItemPointer */
+	heap_insert(state->rel, tuple, state->cid, state->table_insert_flags,
+				bistate);
+	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
+
+	if (shouldFree)
+		pfree(tuple);
+}
+
+/*
+ * Store passed-in tuple into in-memory buffered slots. When full, insert
+ * multiple tuples from the buffers into heap.
+ */
+void
+heap_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
+{
+	TupleTableSlot  *dstslot;
+	HeapMultiInsertState *mistate;
+
+	Assert(state->table_am_data != NULL &&
+		   ((HeapInsertState *) state->table_am_data)->mistate != NULL);
+
+	mistate = ((HeapInsertState *) state->table_am_data)->mistate;
+	dstslot = mistate->slots[mistate->cur_slots];
+
+	if (dstslot == NULL)
+	{
+		dstslot = table_slot_create(state->rel, NULL);
+		mistate->slots[mistate->cur_slots] = dstslot;
+	}
+
+	ExecClearTuple(dstslot);
+	ExecCopySlot(dstslot, slot);
+	mistate->cur_slots++;
+
+	/*
+	 * When passed-in slot is already materialized, memory allocated in slot's
+	 * memory context is a close approximation for us to track the required
+	 * space for the tuple in slot.
+	 *
+	 * For non-materialized slots, the flushing decision happens solely on the
+	 * number of tuples stored in the buffer.
+	 */
+	if (TTS_SHOULDFREE(slot))
+		mistate->cur_size += MemoryContextMemAllocated(slot->tts_mcxt, false);
+
+	if (mistate->cur_slots >= HEAP_MAX_BUFFERED_SLOTS ||
+		mistate->cur_size >= HEAP_MAX_BUFFERED_BYTES)
+		heap_multi_insert_flush(state);
+}
+
+/*
+ * Clean up state used to insert a single or multiple tuples into a heap.
+ */
+void
+heap_insert_end(TableInsertState *state)
+{
+	if (state->table_am_data != NULL &&
+		((HeapInsertState *) state->table_am_data)->mistate != NULL)
+	{
+		HeapMultiInsertState *mistate =
+			((HeapInsertState *) state->table_am_data)->mistate;
+
+		/* Insert remaining tuples from multi-insert buffers */
+		if (mistate->cur_slots > 0 || mistate->cur_size > 0)
+			heap_multi_insert_flush(state);
+
+		MemoryContextDelete(mistate->context);
+
+		for (int i = 0; i < HEAP_MAX_BUFFERED_SLOTS && mistate->slots[i] != NULL; i++)
+			ExecDropSingleTupleTableSlot(mistate->slots[i]);
+
+		pfree(mistate);
+		((HeapInsertState *) state->table_am_data)->mistate = NULL;
+	}
+
+	if (state->table_am_data != NULL &&
+		((HeapInsertState *) state->table_am_data)->bistate != NULL)
+	{
+		FreeBulkInsertState(((HeapInsertState *) state->table_am_data)->bistate);
+	}
+
+	pfree(state->table_am_data);
+	state->table_am_data = NULL;
+	pfree(state);
+}
+
+/*
+ * Insert multiple tuples from in-memory buffered slots into heap.
+ */
+static void
+heap_multi_insert_flush(TableInsertState *state)
+{
+	HeapMultiInsertState *mistate;
+	BulkInsertState bistate = NULL;
+	MemoryContext oldcontext;
+
+	mistate = ((HeapInsertState *) state->table_am_data)->mistate;
+
+	if (state->table_am_data != NULL &&
+		((HeapInsertState *) state->table_am_data)->bistate != NULL)
+	{
+		bistate = ((HeapInsertState *) state->table_am_data)->bistate;
+	}
+
+	oldcontext = MemoryContextSwitchTo(mistate->context);
+	heap_multi_insert(state->rel, mistate->slots, mistate->cur_slots,
+					  state->cid, state->table_insert_flags, bistate);
+	MemoryContextSwitchTo(oldcontext);
+	MemoryContextReset(mistate->context);
+
+	mistate->cur_slots = 0;
+	mistate->cur_size = 0;
+}
+
 /*
  *	simple_heap_insert - insert a tuple
  *
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 5a17112c91..6f144d88dd 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2568,6 +2568,12 @@ static const TableAmRoutine heapam_methods = {
 	.tuple_insert_speculative = heapam_tuple_insert_speculative,
 	.tuple_complete_speculative = heapam_tuple_complete_speculative,
 	.multi_insert = heap_multi_insert,
+
+	.tuple_insert_begin = heap_insert_begin,
+	.tuple_insert_v2 = heap_insert_v2,
+	.tuple_multi_insert_v2 = heap_multi_insert_v2,
+	.tuple_insert_end = heap_insert_end,
+
 	.tuple_delete = heapam_tuple_delete,
 	.tuple_update = heapam_tuple_update,
 	.tuple_lock = heapam_tuple_lock,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index faf5026519..a1ea26cbd6 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -191,6 +191,40 @@ typedef struct HeapPageFreeze
 
 } HeapPageFreeze;
 
+/*
+ * Maximum number of slots that multi-insert buffers can hold.
+ *
+ * Caution: Don't make this too big, as we could end up with this many tuples
+ * stored in multi insert buffer. For instance, increasing this can cause
+ * quadratic growth in memory requirements during copies into partitioned
+ * tables with a large number of partitions.
+ */
+#define HEAP_MAX_BUFFERED_SLOTS		1000
+
+/* Maximum size of all tuples that multi-insert buffers can hold */
+#define HEAP_MAX_BUFFERED_BYTES		65535
+
+typedef struct HeapMultiInsertState
+{
+	/* Memory context to use for flushing multi-insert buffers */
+	MemoryContext	context;
+
+	/* Array of buffered slots */
+	TupleTableSlot	**slots;
+
+	/* Number of slots that multi-insert buffers currently hold */
+	int		cur_slots;
+
+	/* Size of all tuples that multi-insert buffers currently hold */
+	Size	cur_size;
+} HeapMultiInsertState;
+
+typedef struct HeapInsertState
+{
+	struct BulkInsertStateData	*bistate;
+	HeapMultiInsertState	*mistate;
+} HeapInsertState;
+
 /* ----------------
  *		function prototypes for heap access method
  *
@@ -241,6 +275,17 @@ extern void heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 extern void heap_multi_insert(Relation relation, struct TupleTableSlot **slots,
 							  int ntuples, CommandId cid, int options,
 							  BulkInsertState bistate);
+
+extern TableInsertState* heap_insert_begin(Relation rel,
+										   CommandId cid,
+										   int table_am_flags,
+										   int table_insert_flags);
+extern void heap_insert_v2(TableInsertState *state,
+						   TupleTableSlot *slot);
+extern void heap_multi_insert_v2(TableInsertState *state,
+								 TupleTableSlot *slot);
+extern void heap_insert_end(TableInsertState *state);
+
 extern TM_Result heap_delete(Relation relation, ItemPointer tid,
 							 CommandId cid, Snapshot crosscheck, bool wait,
 							 struct TM_FailureData *tmfd, bool changingPart);
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 230bc39cc0..5ea3eeee8a 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -247,6 +247,35 @@ typedef struct TM_IndexDeleteOp
 	TM_IndexStatus *status;
 } TM_IndexDeleteOp;
 
+/* Use multi (buffer multiple tuples and insert them at once) inserts */
+#define TABLEAM_USE_MULTI_INSERTS 0x000001
+
+/* Use BAS_BULKWRITE buffer access strategy */
+#define TABLEAM_USE_BULKWRITE_BUFFER_ACCESS_STRATEGY 0x000002
+
+
+/* Holds table insert state. */
+typedef struct TableInsertState
+{
+	/* Table AM-agnostic data starts here */
+	Relation	rel;	/* Target relation */
+
+	/*
+	 * Command ID for this insertion. If required, change this for each pass of
+	 * insert functions.
+	 */
+	CommandId	cid;
+
+	/* Table AM options (TABLEAM_XXX macros) */
+	int	table_am_flags;
+
+	/* table_tuple_insert performance options (TABLE_INSERT_XXX macros) */
+	int		table_insert_flags;
+
+	/* Table AM specific data starts here */
+	void	*table_am_data;
+} TableInsertState;
+
 /* "options" flag bits for table_tuple_insert */
 /* TABLE_INSERT_SKIP_WAL was 0x0001; RelationNeedsWAL() now governs */
 #define TABLE_INSERT_SKIP_FSM		0x0002
@@ -522,6 +551,19 @@ typedef struct TableAmRoutine
 	void		(*multi_insert) (Relation rel, TupleTableSlot **slots, int nslots,
 								 CommandId cid, int options, struct BulkInsertStateData *bistate);
 
+	TableInsertState *(*tuple_insert_begin) (Relation rel,
+											 CommandId cid,
+											 int table_am_flags,
+											 int table_insert_flags);
+
+	void (*tuple_insert_v2) (TableInsertState *state,
+							 TupleTableSlot *slot);
+
+	void (*tuple_multi_insert_v2) (TableInsertState *state,
+								   TupleTableSlot *slot);
+
+	void (*tuple_insert_end) (TableInsertState *state);
+
 	/* see table_tuple_delete() for reference about parameters */
 	TM_Result	(*tuple_delete) (Relation rel,
 								 ItemPointer tid,
@@ -1456,6 +1498,71 @@ table_multi_insert(Relation rel, TupleTableSlot **slots, int nslots,
 								  cid, options, bistate);
 }
 
+static inline TableInsertState *
+table_insert_begin(Relation rel, CommandId cid, int table_am_flags,
+				   int table_insert_flags)
+{
+	/* XXX: Really it doesn't have to be an optional callback */
+	if (rel->rd_tableam && rel->rd_tableam->tuple_insert_begin)
+	{
+		return rel->rd_tableam->tuple_insert_begin(rel, cid, table_am_flags,
+												   table_insert_flags);
+	}
+	else
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("table_insert_begin access method is not implemented for relation \"%s\"",
+						RelationGetRelationName(rel)));
+}
+
+static inline void
+table_tuple_insert_v2(TableInsertState *state, TupleTableSlot *slot)
+{
+	/* XXX: Really it doesn't have to be an optional callback */
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_insert_begin)
+	{
+		return state->rel->rd_tableam->tuple_insert_v2(state, slot);
+	}
+	else
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("table_tuple_insert_v2 access method is not implemented for relation \"%s\"",
+						RelationGetRelationName(state->rel)));
+}
+
+static inline void
+table_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
+{
+	/* XXX: Really it doesn't have to be an optional callback */
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_insert_begin)
+	{
+		return state->rel->rd_tableam->tuple_multi_insert_v2(state, slot);
+	}
+	else
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("table_multi_insert_v2 access method is not implemented for relation \"%s\"",
+						RelationGetRelationName(state->rel)));
+}
+
+static inline void
+table_insert_end(TableInsertState *state)
+{
+	/* XXX: Really it doesn't have to be an optional callback */
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_insert_begin)
+	{
+		return state->rel->rd_tableam->tuple_insert_end(state);
+	}
+	else
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("table_insert_end access method is not implemented for relation \"%s\"",
+						RelationGetRelationName(state->rel)));
+}
+
 /*
  * Delete a tuple.
  *
-- 
2.34.1



  [application/octet-stream] v7-0002-Optimize-CTAS-with-multi-inserts.patch (2.8K, 3-v7-0002-Optimize-CTAS-with-multi-inserts.patch)
  download | inline diff:
From d999b3e11272f44b8cbe75fb4d5d4c518ab43324 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Tue, 1 Aug 2023 09:37:22 +0000
Subject: [PATCH v7] Optimize CTAS with multi inserts

---
 src/backend/commands/createas.c | 30 ++++++++++++++----------------
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index e91920ca14..ac30906288 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -58,9 +58,7 @@ typedef struct
 	/* These fields are filled by intorel_startup: */
 	Relation	rel;			/* relation to write to */
 	ObjectAddress reladdr;		/* address of rel, for ExecCreateTableAs */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableInsertState *ti_state;	/*  table insert state */
 } DR_intorel;
 
 /* utility functions for CTAS definition creation */
@@ -557,17 +555,24 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	 */
 	myState->rel = intoRelationDesc;
 	myState->reladdr = intoRelationAddr;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM;
 
 	/*
 	 * If WITH NO DATA is specified, there is no need to set up the state for
 	 * bulk inserts as there are no tuples to insert.
 	 */
 	if (!into->skipData)
-		myState->bistate = GetBulkInsertState();
+	{
+		int	table_am_flags = TABLEAM_USE_MULTI_INSERTS |
+			TABLEAM_USE_BULKWRITE_BUFFER_ACCESS_STRATEGY;
+		int	table_insert_flags = TABLE_INSERT_SKIP_FSM;
+
+		myState->ti_state = table_insert_begin(intoRelationDesc,
+											   GetCurrentCommandId(true),
+											   table_am_flags,
+											   table_insert_flags);
+	}
 	else
-		myState->bistate = NULL;
+		myState->ti_state = NULL;
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -595,11 +600,7 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
 		 * would not be cheap either. This also doesn't allow accessing per-AM
 		 * data (say a tuple's xmin), but since we don't do that here...
 		 */
-		table_tuple_insert(myState->rel,
-						   slot,
-						   myState->output_cid,
-						   myState->ti_options,
-						   myState->bistate);
+		table_multi_insert_v2(myState->ti_state, slot);
 	}
 
 	/* We know this is a newly created relation, so there are no indexes */
@@ -617,10 +618,7 @@ intorel_shutdown(DestReceiver *self)
 	IntoClause *into = myState->into;
 
 	if (!into->skipData)
-	{
-		FreeBulkInsertState(myState->bistate);
-		table_finish_bulk_insert(myState->rel, myState->ti_options);
-	}
+		table_insert_end(myState->ti_state);
 
 	/* close rel, but keep lock until commit */
 	table_close(myState->rel, NoLock);
-- 
2.34.1



  [application/octet-stream] v7-0003-Optimize-RMV-with-multi-inserts.patch (3.1K, 4-v7-0003-Optimize-RMV-with-multi-inserts.patch)
  download | inline diff:
From ceb8f43687dd0baf86fac3a096d98134e06bcec8 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Tue, 1 Aug 2023 09:50:20 +0000
Subject: [PATCH v7] Optimize RMV with multi inserts

---
 src/backend/commands/matview.c | 36 +++++++++++++---------------------
 1 file changed, 14 insertions(+), 22 deletions(-)

diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index ac2e74fa3f..c7ab2d1f44 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -52,10 +52,7 @@ typedef struct
 	DestReceiver pub;			/* publicly-known function pointers */
 	Oid			transientoid;	/* OID of new heap into which to store */
 	/* These fields are filled by transientrel_startup: */
-	Relation	transientrel;	/* relation to write to */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableInsertState *ti_state;	/*  table insert state */
 } DR_transientrel;
 
 static int	matview_maintenance_depth = 0;
@@ -454,16 +451,18 @@ transientrel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 {
 	DR_transientrel *myState = (DR_transientrel *) self;
 	Relation	transientrel;
+	int	table_am_flags = TABLEAM_USE_MULTI_INSERTS |
+			TABLEAM_USE_BULKWRITE_BUFFER_ACCESS_STRATEGY;
+	int	table_insert_flags = TABLE_INSERT_SKIP_FSM |
+			TABLE_INSERT_FROZEN;
 
 	transientrel = table_open(myState->transientoid, NoLock);
 
-	/*
-	 * Fill private fields of myState for use by later routines
-	 */
-	myState->transientrel = transientrel;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM | TABLE_INSERT_FROZEN;
-	myState->bistate = GetBulkInsertState();
+	/* Fill private fields of myState for use by later routines */
+	myState->ti_state = table_insert_begin(transientrel,
+										   GetCurrentCommandId(true),
+										   table_am_flags,
+										   table_insert_flags);
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -488,12 +487,7 @@ transientrel_receive(TupleTableSlot *slot, DestReceiver *self)
 	 * cheap either. This also doesn't allow accessing per-AM data (say a
 	 * tuple's xmin), but since we don't do that here...
 	 */
-
-	table_tuple_insert(myState->transientrel,
-					   slot,
-					   myState->output_cid,
-					   myState->ti_options,
-					   myState->bistate);
+	table_multi_insert_v2(myState->ti_state, slot);
 
 	/* We know this is a newly created relation, so there are no indexes */
 
@@ -507,14 +501,12 @@ static void
 transientrel_shutdown(DestReceiver *self)
 {
 	DR_transientrel *myState = (DR_transientrel *) self;
+	Relation transientrel = myState->ti_state->rel;
 
-	FreeBulkInsertState(myState->bistate);
-
-	table_finish_bulk_insert(myState->transientrel, myState->ti_options);
+	table_insert_end(myState->ti_state);
 
 	/* close transientrel, but keep lock until commit */
-	table_close(myState->transientrel, NoLock);
-	myState->transientrel = NULL;
+	table_close(transientrel, NoLock);
 }
 
 /*
-- 
2.34.1



^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2023-08-01 17:02  Jacob Champion <[email protected]>
  parent: Bharath Rupireddy <[email protected]>
  0 siblings, 1 reply; 43+ messages in thread

From: Jacob Champion @ 2023-08-01 17:02 UTC (permalink / raw)
  To: Bharath Rupireddy <[email protected]>; +Cc: Andres Freund <[email protected]>; Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; pgsql-hackers; Paul Guo <[email protected]>; Jeff Davis <[email protected]>; Michael Paquier <[email protected]>; Matthias van de Meent <[email protected]>

On Tue, Aug 1, 2023 at 9:31 AM Bharath Rupireddy
<[email protected]> wrote:
> Thanks. Finally, I  started to spend time on this. Just curious - may
> I know the discussion in/for which this patch is referenced? What was
> the motive? Is it captured somewhere?

It may not have been the only place, but we at least touched on it
during the unconference:

    https://wiki.postgresql.org/wiki/PgCon_2023_Developer_Unconference#Table_AMs

We discussed two related-but-separate ideas:
1) bulk/batch operations and
2) maintenance of TAM state across multiple related operations.

--Jacob






^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2024-01-17 17:27  Bharath Rupireddy <[email protected]>
  parent: Jacob Champion <[email protected]>
  0 siblings, 1 reply; 43+ messages in thread

From: Bharath Rupireddy @ 2024-01-17 17:27 UTC (permalink / raw)
  To: Jacob Champion <[email protected]>; +Cc: Andres Freund <[email protected]>; Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; pgsql-hackers; Paul Guo <[email protected]>; Jeff Davis <[email protected]>; Michael Paquier <[email protected]>; Matthias van de Meent <[email protected]>

On Tue, Aug 1, 2023 at 10:32 PM Jacob Champion <[email protected]> wrote:
>
> On Tue, Aug 1, 2023 at 9:31 AM Bharath Rupireddy
> <[email protected]> wrote:
> > Thanks. Finally, I  started to spend time on this. Just curious - may
> > I know the discussion in/for which this patch is referenced? What was
> > the motive? Is it captured somewhere?
>
> It may not have been the only place, but we at least touched on it
> during the unconference:
>
>     https://wiki.postgresql.org/wiki/PgCon_2023_Developer_Unconference#Table_AMs
>
> We discussed two related-but-separate ideas:
> 1) bulk/batch operations and
> 2) maintenance of TAM state across multiple related operations.

Thank you. I'm attaching v8 patch-set here which includes use of new
insert TAMs for COPY FROM. With this, postgres not only will have the
new TAM for inserts, but they also can make the following commands
faster - CREATE TABLE AS, SELECT INTO, CREATE MATERIALIZED VIEW,
REFRESH MATERIALIZED VIEW and COPY FROM. I'll perform some testing in
the coming days and post the results here, until then I appreciate any
feedback on the patches.

I've also added this proposal to CF -
https://commitfest.postgresql.org/47/4777/.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


Attachments:

  [application/x-patch] v8-0001-New-TAMs-for-inserts.patch (16.2K, 2-v8-0001-New-TAMs-for-inserts.patch)
  download | inline diff:
From cbdf2935be360017c0d62479e879630d4fec8766 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Wed, 17 Jan 2024 16:44:19 +0000
Subject: [PATCH v8] New TAMs for inserts

---
 src/backend/access/heap/heapam.c         | 224 +++++++++++++++++++++++
 src/backend/access/heap/heapam_handler.c |   9 +
 src/include/access/heapam.h              |  49 +++++
 src/include/access/tableam.h             | 143 +++++++++++++++
 4 files changed, 425 insertions(+)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 707460a536..7df305380e 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -68,6 +68,7 @@
 #include "utils/datum.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
+#include "utils/memutils.h"
 #include "utils/relcache.h"
 #include "utils/snapmgr.h"
 #include "utils/spccache.h"
@@ -2446,6 +2447,229 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 	pgstat_count_heap_insert(relation, ntuples);
 }
 
+/*
+ * Initialize state required for an insert a single tuple or multiple tuples
+ * into a heap.
+ */
+TableInsertState *
+heap_insert_begin(Relation rel, CommandId cid, int am_flags, int insert_flags)
+{
+	TableInsertState *tistate;
+
+	tistate = palloc0(sizeof(TableInsertState));
+	tistate->rel = rel;
+	tistate->cid = cid;
+	tistate->am_flags = am_flags;
+	tistate->insert_flags = insert_flags;
+
+	if ((am_flags & TABLEAM_MULTI_INSERTS) != 0 ||
+		(am_flags & TABLEAM_BULKWRITE_BUFFER_ACCESS_STRATEGY))
+		tistate->am_data = palloc0(sizeof(HeapInsertState));
+
+	if ((am_flags & TABLEAM_MULTI_INSERTS) != 0)
+	{
+		HeapMultiInsertState *mistate;
+
+		mistate = palloc0(sizeof(HeapMultiInsertState));
+		mistate->slots = palloc0(sizeof(TupleTableSlot *) * HEAP_MAX_BUFFERED_SLOTS);
+
+		mistate->context = AllocSetContextCreate(CurrentMemoryContext,
+												 "heap_multi_insert_v2 memory context",
+												 ALLOCSET_DEFAULT_SIZES);
+
+		((HeapInsertState *) tistate->am_data)->mistate = mistate;
+	}
+
+	if ((am_flags & TABLEAM_BULKWRITE_BUFFER_ACCESS_STRATEGY) != 0)
+		((HeapInsertState *) tistate->am_data)->bistate = GetBulkInsertState();
+
+	return tistate;
+}
+
+/*
+ * Insert a single tuple into a heap.
+ */
+void
+heap_insert_v2(TableInsertState * state, TupleTableSlot *slot)
+{
+	bool		shouldFree = true;
+	HeapTuple	tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);
+	BulkInsertState bistate = NULL;
+
+	Assert(state->am_data != NULL &&
+		   ((HeapInsertState *) state->am_data)->mistate == NULL);
+
+	/* Update tuple with table oid */
+	slot->tts_tableOid = RelationGetRelid(state->rel);
+	tuple->t_tableOid = slot->tts_tableOid;
+
+	if (state->am_data != NULL &&
+		((HeapInsertState *) state->am_data)->bistate != NULL)
+		bistate = ((HeapInsertState *) state->am_data)->bistate;
+
+	/* Perform insertion, and copy the resulting ItemPointer */
+	heap_insert(state->rel, tuple, state->cid, state->insert_flags,
+				bistate);
+	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
+
+	if (shouldFree)
+		pfree(tuple);
+}
+
+/*
+ * Create/return next free slot from multi-insert buffered slots array.
+ */
+TupleTableSlot *
+heap_multi_insert_next_free_slot(TableInsertState * state)
+{
+	TupleTableSlot *slot;
+	HeapMultiInsertState *mistate;
+
+	Assert(state->am_data != NULL &&
+		   ((HeapInsertState *) state->am_data)->mistate != NULL);
+
+	mistate = ((HeapInsertState *) state->am_data)->mistate;
+	slot = mistate->slots[mistate->cur_slots];
+
+	if (slot == NULL)
+	{
+		slot = table_slot_create(state->rel, NULL);
+		mistate->slots[mistate->cur_slots] = slot;
+	}
+	else
+		ExecClearTuple(slot);
+
+	return slot;
+}
+
+/*
+ * Store passed-in tuple into in-memory buffered slots. When full, insert
+ * multiple tuples from the buffers into heap.
+ */
+void
+heap_multi_insert_v2(TableInsertState * state, TupleTableSlot *slot)
+{
+	TupleTableSlot *dstslot;
+	HeapMultiInsertState *mistate;
+
+	Assert(state->am_data != NULL &&
+		   ((HeapInsertState *) state->am_data)->mistate != NULL);
+
+	mistate = ((HeapInsertState *) state->am_data)->mistate;
+	dstslot = mistate->slots[mistate->cur_slots];
+
+	if (dstslot == NULL)
+	{
+		dstslot = table_slot_create(state->rel, NULL);
+		mistate->slots[mistate->cur_slots] = dstslot;
+	}
+
+	/*
+	 * Caller may have got the slot using heap_multi_insert_next_free_slot,
+	 * filled it and passed. So, skip copying in such a case.
+	 */
+	if ((state->am_flags & TABLEAM_SKIP_MULTI_INSERTS_FLUSH) == 0)
+	{
+		ExecClearTuple(dstslot);
+		ExecCopySlot(dstslot, slot);
+	}
+	else
+		Assert(dstslot == slot);
+
+	mistate->cur_slots++;
+
+	/*
+	 * When passed-in slot is already materialized, memory allocated in slot's
+	 * memory context is a close approximation for us to track the required
+	 * space for the tuple in slot.
+	 *
+	 * For non-materialized slots, the flushing decision happens solely on the
+	 * number of tuples stored in the buffer.
+	 */
+	if (TTS_SHOULDFREE(slot))
+		mistate->cur_size += MemoryContextMemAllocated(slot->tts_mcxt, false);
+
+	if ((state->am_flags & TABLEAM_SKIP_MULTI_INSERTS_FLUSH) == 0 &&
+		(mistate->cur_slots >= HEAP_MAX_BUFFERED_SLOTS ||
+		 mistate->cur_size >= HEAP_MAX_BUFFERED_BYTES))
+		heap_multi_insert_flush(state);
+}
+
+/*
+ * Return pointer to multi-insert buffered slots array and number of currently
+ * occupied slots.
+ */
+TupleTableSlot **
+heap_multi_insert_slots(TableInsertState * state, int *num_slots)
+{
+	HeapMultiInsertState *mistate;
+
+	mistate = ((HeapInsertState *) state->am_data)->mistate;
+	*num_slots = mistate->cur_slots;
+
+	return mistate->slots;
+}
+
+/*
+ * Insert multiple tuples from in-memory buffered slots into heap.
+ */
+void
+heap_multi_insert_flush(TableInsertState * state)
+{
+	HeapMultiInsertState *mistate;
+	BulkInsertState bistate = NULL;
+	MemoryContext oldcontext;
+
+	mistate = ((HeapInsertState *) state->am_data)->mistate;
+
+	if (state->am_data != NULL &&
+		((HeapInsertState *) state->am_data)->bistate != NULL)
+		bistate = ((HeapInsertState *) state->am_data)->bistate;
+
+	oldcontext = MemoryContextSwitchTo(mistate->context);
+	heap_multi_insert(state->rel, mistate->slots, mistate->cur_slots,
+					  state->cid, state->insert_flags, bistate);
+	MemoryContextSwitchTo(oldcontext);
+	MemoryContextReset(mistate->context);
+
+	mistate->cur_slots = 0;
+	mistate->cur_size = 0;
+}
+
+/*
+ * Clean up state used to insert a single or multiple tuples into a heap.
+ */
+void
+heap_insert_end(TableInsertState * state)
+{
+	if (state->am_data != NULL &&
+		((HeapInsertState *) state->am_data)->mistate != NULL)
+	{
+		HeapMultiInsertState *mistate =
+			((HeapInsertState *) state->am_data)->mistate;
+
+		/* Insert remaining tuples from multi-insert buffers */
+		if (mistate->cur_slots > 0 || mistate->cur_size > 0)
+			heap_multi_insert_flush(state);
+
+		MemoryContextDelete(mistate->context);
+
+		for (int i = 0; i < HEAP_MAX_BUFFERED_SLOTS && mistate->slots[i] != NULL; i++)
+			ExecDropSingleTupleTableSlot(mistate->slots[i]);
+
+		pfree(mistate);
+		((HeapInsertState *) state->am_data)->mistate = NULL;
+	}
+
+	if (state->am_data != NULL &&
+		((HeapInsertState *) state->am_data)->bistate != NULL)
+		FreeBulkInsertState(((HeapInsertState *) state->am_data)->bistate);
+
+	pfree(state->am_data);
+	state->am_data = NULL;
+	pfree(state);
+}
+
 /*
  *	simple_heap_insert - insert a tuple
  *
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index d15a02b2be..795177812d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2564,6 +2564,15 @@ static const TableAmRoutine heapam_methods = {
 	.tuple_insert_speculative = heapam_tuple_insert_speculative,
 	.tuple_complete_speculative = heapam_tuple_complete_speculative,
 	.multi_insert = heap_multi_insert,
+
+	.tuple_insert_begin = heap_insert_begin,
+	.tuple_insert_v2 = heap_insert_v2,
+	.tuple_multi_insert_next_free_slot = heap_multi_insert_next_free_slot,
+	.tuple_multi_insert_v2 = heap_multi_insert_v2,
+	.tuple_multi_insert_slots = heap_multi_insert_slots,
+	.tuple_multi_insert_flush = heap_multi_insert_flush,
+	.tuple_insert_end = heap_insert_end,
+
 	.tuple_delete = heapam_tuple_delete,
 	.tuple_update = heapam_tuple_update,
 	.tuple_lock = heapam_tuple_lock,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 932ec0d6f2..46dba5245c 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -225,6 +225,40 @@ htsv_get_valid_status(int status)
 	return (HTSV_Result) status;
 }
 
+/*
+ * Maximum number of slots that multi-insert buffers can hold.
+ *
+ * Caution: Don't make this too big, as we could end up with this many tuples
+ * stored in multi insert buffer. For instance, increasing this can cause
+ * quadratic growth in memory requirements during copies into partitioned
+ * tables with a large number of partitions.
+ */
+#define HEAP_MAX_BUFFERED_SLOTS		1000
+
+/* Maximum size of all tuples that multi-insert buffers can hold */
+#define HEAP_MAX_BUFFERED_BYTES		65535
+
+typedef struct HeapMultiInsertState
+{
+	/* Memory context to use for flushing multi-insert buffers */
+	MemoryContext context;
+
+	/* Array of buffered slots */
+	TupleTableSlot **slots;
+
+	/* Number of slots that multi-insert buffers currently hold */
+	int			cur_slots;
+
+	/* Size of all tuples that multi-insert buffers currently hold */
+	Size		cur_size;
+}			HeapMultiInsertState;
+
+typedef struct HeapInsertState
+{
+	struct BulkInsertStateData *bistate;
+	HeapMultiInsertState *mistate;
+}			HeapInsertState;
+
 /* ----------------
  *		function prototypes for heap access method
  *
@@ -275,6 +309,21 @@ extern void heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 extern void heap_multi_insert(Relation relation, struct TupleTableSlot **slots,
 							  int ntuples, CommandId cid, int options,
 							  BulkInsertState bistate);
+
+extern TableInsertState * heap_insert_begin(Relation rel,
+											CommandId cid,
+											int am_flags,
+											int insert_flags);
+extern void heap_insert_v2(TableInsertState * state,
+						   TupleTableSlot *slot);
+extern TupleTableSlot *heap_multi_insert_next_free_slot(TableInsertState * state);
+extern void heap_multi_insert_v2(TableInsertState * state,
+								 TupleTableSlot *slot);
+extern TupleTableSlot **heap_multi_insert_slots(TableInsertState * state,
+												int *num_slots);
+extern void heap_multi_insert_flush(TableInsertState * state);
+extern void heap_insert_end(TableInsertState * state);
+
 extern TM_Result heap_delete(Relation relation, ItemPointer tid,
 							 CommandId cid, Snapshot crosscheck, bool wait,
 							 struct TM_FailureData *tmfd, bool changingPart);
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 5f8474871d..8fcaf6fe5a 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -247,6 +247,43 @@ typedef struct TM_IndexDeleteOp
 	TM_IndexStatus *status;
 } TM_IndexDeleteOp;
 
+/* Use multi inserts, i.e. buffer multiple tuples and insert them at once */
+#define TABLEAM_MULTI_INSERTS 0x000001
+
+/* Use BAS_BULKWRITE buffer access strategy */
+#define TABLEAM_BULKWRITE_BUFFER_ACCESS_STRATEGY 0x000002
+
+/*
+ * Skip flushing buffered tuples automatically. Responsibility lies with the
+ * caller to flush the buffered tuples.
+ */
+#define TABLEAM_SKIP_MULTI_INSERTS_FLUSH 0x000004
+
+
+/* Holds table insert state. */
+typedef struct TableInsertState
+{
+	/* Table AM-agnostic data starts here */
+
+	Relation	rel;			/* Target relation */
+
+	/*
+	 * Command ID for this insertion. If required, change this for each pass
+	 * of insert functions.
+	 */
+	CommandId	cid;
+
+	/* Table AM options (TABLEAM_XXX macros) */
+	int			am_flags;
+
+	/* table_tuple_insert performance options (TABLE_INSERT_XXX macros) */
+	int			insert_flags;
+
+	/* Table AM specific data starts here */
+
+	void	   *am_data;
+}			TableInsertState;
+
 /* "options" flag bits for table_tuple_insert */
 /* TABLE_INSERT_SKIP_WAL was 0x0001; RelationNeedsWAL() now governs */
 #define TABLE_INSERT_SKIP_FSM		0x0002
@@ -522,6 +559,20 @@ typedef struct TableAmRoutine
 	void		(*multi_insert) (Relation rel, TupleTableSlot **slots, int nslots,
 								 CommandId cid, int options, struct BulkInsertStateData *bistate);
 
+	TableInsertState *(*tuple_insert_begin) (Relation rel,
+											 CommandId cid,
+											 int am_flags,
+											 int insert_flags);
+	void		(*tuple_insert_v2) (TableInsertState * state,
+									TupleTableSlot *slot);
+	void		(*tuple_multi_insert_v2) (TableInsertState * state,
+										  TupleTableSlot *slot);
+	TupleTableSlot *(*tuple_multi_insert_next_free_slot) (TableInsertState * state);
+	TupleTableSlot **(*tuple_multi_insert_slots) (TableInsertState * state,
+												  int *num_slots);
+	void		(*tuple_multi_insert_flush) (TableInsertState * state);
+	void		(*tuple_insert_end) (TableInsertState * state);
+
 	/* see table_tuple_delete() for reference about parameters */
 	TM_Result	(*tuple_delete) (Relation rel,
 								 ItemPointer tid,
@@ -1456,6 +1507,98 @@ table_multi_insert(Relation rel, TupleTableSlot **slots, int nslots,
 								  cid, options, bistate);
 }
 
+static inline TableInsertState *
+table_insert_begin(Relation rel, CommandId cid, int am_flags,
+				   int insert_flags)
+{
+	if (rel->rd_tableam && rel->rd_tableam->tuple_insert_begin)
+		return rel->rd_tableam->tuple_insert_begin(rel, cid, am_flags,
+												   insert_flags);
+	else
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("table_insert_begin access method is not implemented for relation \"%s\"",
+					   RelationGetRelationName(rel)));
+}
+
+static inline void
+table_tuple_insert_v2(TableInsertState * state, TupleTableSlot *slot)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_insert_v2)
+		state->rel->rd_tableam->tuple_insert_v2(state, slot);
+	else
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("table_tuple_insert_v2 access method is not implemented for relation \"%s\"",
+					   RelationGetRelationName(state->rel)));
+}
+
+static inline void
+table_multi_insert_v2(TableInsertState * state, TupleTableSlot *slot)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_multi_insert_v2)
+		state->rel->rd_tableam->tuple_multi_insert_v2(state, slot);
+	else
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("table_multi_insert_v2 access method is not implemented for relation \"%s\"",
+					   RelationGetRelationName(state->rel)));
+}
+
+static inline TupleTableSlot *
+table_multi_insert_next_free_slot(TableInsertState * state)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_multi_insert_next_free_slot)
+		return state->rel->rd_tableam->tuple_multi_insert_next_free_slot(state);
+	else
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("table_multi_insert_next_free_slot access method is not implemented for relation \"%s\"",
+					   RelationGetRelationName(state->rel)));
+}
+
+static inline TupleTableSlot **
+table_multi_insert_slots(TableInsertState * state, int *num_slots)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_multi_insert_slots)
+		return state->rel->rd_tableam->tuple_multi_insert_slots(state, num_slots);
+	else
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("table_multi_insert_slots access method is not implemented for relation \"%s\"",
+					   RelationGetRelationName(state->rel)));
+}
+
+static inline void
+table_multi_insert_flush(TableInsertState * state)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_multi_insert_flush)
+		state->rel->rd_tableam->tuple_multi_insert_flush(state);
+	else
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("table_multi_insert_flush access method is not implemented for relation \"%s\"",
+					   RelationGetRelationName(state->rel)));
+}
+
+static inline void
+table_insert_end(TableInsertState * state)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_insert_end)
+		state->rel->rd_tableam->tuple_insert_end(state);
+	else
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("table_insert_end access method is not implemented for relation \"%s\"",
+					   RelationGetRelationName(state->rel)));
+}
+
 /*
  * Delete a tuple.
  *
-- 
2.34.1



  [application/x-patch] v8-0002-Optimize-CTAS-with-multi-inserts.patch (2.7K, 3-v8-0002-Optimize-CTAS-with-multi-inserts.patch)
  download | inline diff:
From 4835495e675bb178ecb67d84e6b00de15751ce8b Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Wed, 17 Jan 2024 15:23:38 +0000
Subject: [PATCH v8] Optimize CTAS with multi inserts

---
 src/backend/commands/createas.c | 25 +++++++++----------------
 1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 16a2fe65e6..3a02ea9578 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -58,9 +58,7 @@ typedef struct
 	/* These fields are filled by intorel_startup: */
 	Relation	rel;			/* relation to write to */
 	ObjectAddress reladdr;		/* address of rel, for ExecCreateTableAs */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableInsertState *ti_state; /* table insert state */
 } DR_intorel;
 
 /* utility functions for CTAS definition creation */
@@ -557,17 +555,19 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	 */
 	myState->rel = intoRelationDesc;
 	myState->reladdr = intoRelationAddr;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM;
 
 	/*
 	 * If WITH NO DATA is specified, there is no need to set up the state for
 	 * bulk inserts as there are no tuples to insert.
 	 */
 	if (!into->skipData)
-		myState->bistate = GetBulkInsertState();
+		myState->ti_state = table_insert_begin(intoRelationDesc,
+											   GetCurrentCommandId(true),
+											   TABLEAM_MULTI_INSERTS |
+											   TABLEAM_BULKWRITE_BUFFER_ACCESS_STRATEGY,
+											   TABLE_INSERT_SKIP_FSM);
 	else
-		myState->bistate = NULL;
+		myState->ti_state = NULL;
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -595,11 +595,7 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
 		 * would not be cheap either. This also doesn't allow accessing per-AM
 		 * data (say a tuple's xmin), but since we don't do that here...
 		 */
-		table_tuple_insert(myState->rel,
-						   slot,
-						   myState->output_cid,
-						   myState->ti_options,
-						   myState->bistate);
+		table_multi_insert_v2(myState->ti_state, slot);
 	}
 
 	/* We know this is a newly created relation, so there are no indexes */
@@ -617,10 +613,7 @@ intorel_shutdown(DestReceiver *self)
 	IntoClause *into = myState->into;
 
 	if (!into->skipData)
-	{
-		FreeBulkInsertState(myState->bistate);
-		table_finish_bulk_insert(myState->rel, myState->ti_options);
-	}
+		table_insert_end(myState->ti_state);
 
 	/* close rel, but keep lock until commit */
 	table_close(myState->rel, NoLock);
-- 
2.34.1



  [application/x-patch] v8-0003-Optimize-RMV-with-multi-inserts.patch (2.9K, 4-v8-0003-Optimize-RMV-with-multi-inserts.patch)
  download | inline diff:
From d5fd779aa51c624662eefee8349f2d3f6517c3c5 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Wed, 17 Jan 2024 15:27:37 +0000
Subject: [PATCH v8] Optimize RMV with multi inserts

---
 src/backend/commands/matview.c | 34 ++++++++++++----------------------
 1 file changed, 12 insertions(+), 22 deletions(-)

diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 1dcfbe879b..f84c79f5f0 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -52,10 +52,7 @@ typedef struct
 	DestReceiver pub;			/* publicly-known function pointers */
 	Oid			transientoid;	/* OID of new heap into which to store */
 	/* These fields are filled by transientrel_startup: */
-	Relation	transientrel;	/* relation to write to */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableInsertState *ti_state; /* table insert state */
 } DR_transientrel;
 
 static int	matview_maintenance_depth = 0;
@@ -457,13 +454,13 @@ transientrel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 
 	transientrel = table_open(myState->transientoid, NoLock);
 
-	/*
-	 * Fill private fields of myState for use by later routines
-	 */
-	myState->transientrel = transientrel;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM | TABLE_INSERT_FROZEN;
-	myState->bistate = GetBulkInsertState();
+	/* Fill private fields of myState for use by later routines */
+	myState->ti_state = table_insert_begin(transientrel,
+										   GetCurrentCommandId(true),
+										   TABLEAM_MULTI_INSERTS |
+										   TABLEAM_BULKWRITE_BUFFER_ACCESS_STRATEGY,
+										   TABLE_INSERT_SKIP_FSM |
+										   TABLE_INSERT_FROZEN);
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -488,12 +485,7 @@ transientrel_receive(TupleTableSlot *slot, DestReceiver *self)
 	 * cheap either. This also doesn't allow accessing per-AM data (say a
 	 * tuple's xmin), but since we don't do that here...
 	 */
-
-	table_tuple_insert(myState->transientrel,
-					   slot,
-					   myState->output_cid,
-					   myState->ti_options,
-					   myState->bistate);
+	table_multi_insert_v2(myState->ti_state, slot);
 
 	/* We know this is a newly created relation, so there are no indexes */
 
@@ -507,14 +499,12 @@ static void
 transientrel_shutdown(DestReceiver *self)
 {
 	DR_transientrel *myState = (DR_transientrel *) self;
+	Relation	transientrel = myState->ti_state->rel;
 
-	FreeBulkInsertState(myState->bistate);
-
-	table_finish_bulk_insert(myState->transientrel, myState->ti_options);
+	table_insert_end(myState->ti_state);
 
 	/* close transientrel, but keep lock until commit */
-	table_close(myState->transientrel, NoLock);
-	myState->transientrel = NULL;
+	table_close(transientrel, NoLock);
 }
 
 /*
-- 
2.34.1



  [application/x-patch] v8-0004-Use-new-multi-insert-TAM-for-COPY-FROM.patch (6.3K, 5-v8-0004-Use-new-multi-insert-TAM-for-COPY-FROM.patch)
  download | inline diff:
From 24062422b0f213f188bad844b2191923ff258807 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Wed, 17 Jan 2024 16:49:52 +0000
Subject: [PATCH v8] Use new multi insert TAM for COPY FROM

---
 src/backend/commands/copyfrom.c | 92 ++++++++++++++++++---------------
 1 file changed, 50 insertions(+), 42 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 4058b08134..a6c703a99e 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -77,10 +77,9 @@
 /* Stores multi-insert data related to a single relation in CopyFrom. */
 typedef struct CopyMultiInsertBuffer
 {
-	TupleTableSlot *slots[MAX_BUFFERED_TUPLES]; /* Array to store tuples */
+	TableInsertState *ti_state; /* Table insert state; NULL if foreign table */
+	TupleTableSlot **slots;		/* Array to store tuples */
 	ResultRelInfo *resultRelInfo;	/* ResultRelInfo for 'relid' */
-	BulkInsertState bistate;	/* BulkInsertState for this rel if plain
-								 * table; NULL if foreign table */
 	int			nused;			/* number of 'slots' containing tuples */
 	uint64		linenos[MAX_BUFFERED_TUPLES];	/* Line # of tuple in copy
 												 * stream */
@@ -223,14 +222,31 @@ limit_printout_length(const char *str)
  * ResultRelInfo.
  */
 static CopyMultiInsertBuffer *
-CopyMultiInsertBufferInit(ResultRelInfo *rri)
+CopyMultiInsertBufferInit(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri)
 {
 	CopyMultiInsertBuffer *buffer;
 
 	buffer = (CopyMultiInsertBuffer *) palloc(sizeof(CopyMultiInsertBuffer));
-	memset(buffer->slots, 0, sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
+
+	if (rri->ri_FdwRoutine == NULL)
+	{
+		int			num_slots;
+
+		buffer->ti_state = table_insert_begin(rri->ri_RelationDesc,
+											  miinfo->mycid,
+											  TABLEAM_MULTI_INSERTS |
+											  TABLEAM_BULKWRITE_BUFFER_ACCESS_STRATEGY |
+											  TABLEAM_SKIP_MULTI_INSERTS_FLUSH,
+											  miinfo->ti_options);
+		buffer->slots = table_multi_insert_slots(buffer->ti_state, &num_slots);
+	}
+	else
+	{
+		buffer->slots = palloc0(sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
+		buffer->ti_state = NULL;
+	}
+
 	buffer->resultRelInfo = rri;
-	buffer->bistate = (rri->ri_FdwRoutine == NULL) ? GetBulkInsertState() : NULL;
 	buffer->nused = 0;
 
 	return buffer;
@@ -245,7 +261,7 @@ CopyMultiInsertInfoSetupBuffer(CopyMultiInsertInfo *miinfo,
 {
 	CopyMultiInsertBuffer *buffer;
 
-	buffer = CopyMultiInsertBufferInit(rri);
+	buffer = CopyMultiInsertBufferInit(miinfo, rri);
 
 	/* Setup back-link so we can easily find this buffer again */
 	rri->ri_CopyMultiInsertBuffer = buffer;
@@ -322,8 +338,6 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 		int			batch_size = resultRelInfo->ri_BatchSize;
 		int			sent = 0;
 
-		Assert(buffer->bistate == NULL);
-
 		/* Ensure that the FDW supports batching and it's enabled */
 		Assert(resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert);
 		Assert(batch_size > 1);
@@ -395,13 +409,8 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	}
 	else
 	{
-		CommandId	mycid = miinfo->mycid;
-		int			ti_options = miinfo->ti_options;
 		bool		line_buf_valid = cstate->line_buf_valid;
 		uint64		save_cur_lineno = cstate->cur_lineno;
-		MemoryContext oldcontext;
-
-		Assert(buffer->bistate != NULL);
 
 		/*
 		 * Print error context information correctly, if one of the operations
@@ -409,18 +418,7 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 		 */
 		cstate->line_buf_valid = false;
 
-		/*
-		 * table_multi_insert may leak memory, so switch to short-lived memory
-		 * context before calling it.
-		 */
-		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-		table_multi_insert(resultRelInfo->ri_RelationDesc,
-						   slots,
-						   nused,
-						   mycid,
-						   ti_options,
-						   buffer->bistate);
-		MemoryContextSwitchTo(oldcontext);
+		table_multi_insert_flush(buffer->ti_state);
 
 		for (i = 0; i < nused; i++)
 		{
@@ -435,7 +433,7 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 				cstate->cur_lineno = buffer->linenos[i];
 				recheckIndexes =
 					ExecInsertIndexTuples(resultRelInfo,
-										  buffer->slots[i], estate, false,
+										  slots[i], estate, false,
 										  false, NULL, NIL, false);
 				ExecARInsertTriggers(estate, resultRelInfo,
 									 slots[i], recheckIndexes,
@@ -493,20 +491,15 @@ CopyMultiInsertBufferCleanup(CopyMultiInsertInfo *miinfo,
 	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
 
 	if (resultRelInfo->ri_FdwRoutine == NULL)
-	{
-		Assert(buffer->bistate != NULL);
-		FreeBulkInsertState(buffer->bistate);
-	}
+		table_insert_end(buffer->ti_state);
 	else
-		Assert(buffer->bistate == NULL);
-
-	/* Since we only create slots on demand, just drop the non-null ones. */
-	for (i = 0; i < MAX_BUFFERED_TUPLES && buffer->slots[i] != NULL; i++)
-		ExecDropSingleTupleTableSlot(buffer->slots[i]);
+	{
+		/* Since we only create slots on demand, just drop the non-null ones. */
+		for (i = 0; i < MAX_BUFFERED_TUPLES && buffer->slots[i] != NULL; i++)
+			ExecDropSingleTupleTableSlot(buffer->slots[i]);
 
-	if (resultRelInfo->ri_FdwRoutine == NULL)
-		table_finish_bulk_insert(resultRelInfo->ri_RelationDesc,
-								 miinfo->ti_options);
+		pfree(buffer->slots);
+	}
 
 	pfree(buffer);
 }
@@ -593,13 +586,25 @@ CopyMultiInsertInfoNextFreeSlot(CopyMultiInsertInfo *miinfo,
 {
 	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
 	int			nused = buffer->nused;
+	TupleTableSlot *slot;
 
 	Assert(buffer != NULL);
 	Assert(nused < MAX_BUFFERED_TUPLES);
 
-	if (buffer->slots[nused] == NULL)
-		buffer->slots[nused] = table_slot_create(rri->ri_RelationDesc, NULL);
-	return buffer->slots[nused];
+	if (rri->ri_FdwRoutine == NULL)
+		slot = table_multi_insert_next_free_slot(buffer->ti_state);
+	else
+	{
+		if (buffer->slots[nused] == NULL)
+		{
+			slot = table_slot_create(rri->ri_RelationDesc, NULL);
+			buffer->slots[nused] = slot;
+		}
+		else
+			slot = buffer->slots[nused];
+	}
+
+	return slot;
 }
 
 /*
@@ -615,6 +620,9 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
 	Assert(buffer != NULL);
 	Assert(slot == buffer->slots[buffer->nused]);
 
+	if (rri->ri_FdwRoutine == NULL)
+		table_multi_insert_v2(buffer->ti_state, slot);
+
 	/* Store the line number so we can properly report any errors later */
 	buffer->linenos[buffer->nused] = lineno;
 
-- 
2.34.1



^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2024-01-29 07:27  Bharath Rupireddy <[email protected]>
  parent: Bharath Rupireddy <[email protected]>
  0 siblings, 1 reply; 43+ messages in thread

From: Bharath Rupireddy @ 2024-01-29 07:27 UTC (permalink / raw)
  To: pgsql-hackers; +Cc: Andres Freund <[email protected]>; Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; Jeff Davis <[email protected]>; Michael Paquier <[email protected]>; Matthias van de Meent <[email protected]>

On Wed, Jan 17, 2024 at 10:57 PM Bharath Rupireddy
<[email protected]> wrote:
>
> Thank you. I'm attaching v8 patch-set here which includes use of new
> insert TAMs for COPY FROM. With this, postgres not only will have the
> new TAM for inserts, but they also can make the following commands
> faster - CREATE TABLE AS, SELECT INTO, CREATE MATERIALIZED VIEW,
> REFRESH MATERIALIZED VIEW and COPY FROM. I'll perform some testing in
> the coming days and post the results here, until then I appreciate any
> feedback on the patches.
>
> I've also added this proposal to CF -
> https://commitfest.postgresql.org/47/4777/.

Some of the tests related to Incremental Sort added by a recent commit
0452b461bc4 in aggregates.sql are failing when the multi inserts
feature is used for CTAS (like done in 0002 patch). I'm not so sure if
it's because of the reduction in the CTAS execution times. Execution
time for table 'btg' created with CREATE TABLE AS added by commit
0452b461bc4 with single inserts is 25.3 msec, with multi inserts is
17.7 msec. This means that the multi inserts are about 1.43 times or
30.04% faster  than the single inserts. Couple of ways to make these
tests pick Incremental Sort as expected - 1) CLUSTER btg USING abc; or
2) increase the number of rows in table btg to 100K from 10K. FWIW, if
I reduce the number of rows in the table from 10K to 1K, the
Incremental Sort won't get picked on HEAD with CTAS using single
inserts. Hence, I chose option (2) to fix the issue.

Please find the attached v9 patch set.

[1]
 -- Engage incremental sort
 explain (COSTS OFF) SELECT x,y FROM btg GROUP BY x,y,z,w;
-                   QUERY PLAN
--------------------------------------------------
+          QUERY PLAN
+------------------------------
  Group
    Group Key: x, y, z, w
-   ->  Incremental Sort
+   ->  Sort
          Sort Key: x, y, z, w
-         Presorted Key: x, y
-         ->  Index Scan using btg_x_y_idx on btg
-(6 rows)
+         ->  Seq Scan on btg
+(5 rows)

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


Attachments:

  [application/x-patch] v9-0001-New-TAMs-for-inserts.patch (16.2K, 2-v9-0001-New-TAMs-for-inserts.patch)
  download | inline diff:
From a84107e498ffddc56ef4fbb207d6ba6e82717901 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Mon, 29 Jan 2024 05:39:55 +0000
Subject: [PATCH v9 1/4] New TAMs for inserts

---
 src/backend/access/heap/heapam.c         | 224 +++++++++++++++++++++++
 src/backend/access/heap/heapam_handler.c |   9 +
 src/include/access/heapam.h              |  49 +++++
 src/include/access/tableam.h             | 143 +++++++++++++++
 4 files changed, 425 insertions(+)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 707460a536..7df305380e 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -68,6 +68,7 @@
 #include "utils/datum.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
+#include "utils/memutils.h"
 #include "utils/relcache.h"
 #include "utils/snapmgr.h"
 #include "utils/spccache.h"
@@ -2446,6 +2447,229 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 	pgstat_count_heap_insert(relation, ntuples);
 }
 
+/*
+ * Initialize state required for an insert a single tuple or multiple tuples
+ * into a heap.
+ */
+TableInsertState *
+heap_insert_begin(Relation rel, CommandId cid, int am_flags, int insert_flags)
+{
+	TableInsertState *tistate;
+
+	tistate = palloc0(sizeof(TableInsertState));
+	tistate->rel = rel;
+	tistate->cid = cid;
+	tistate->am_flags = am_flags;
+	tistate->insert_flags = insert_flags;
+
+	if ((am_flags & TABLEAM_MULTI_INSERTS) != 0 ||
+		(am_flags & TABLEAM_BULKWRITE_BUFFER_ACCESS_STRATEGY))
+		tistate->am_data = palloc0(sizeof(HeapInsertState));
+
+	if ((am_flags & TABLEAM_MULTI_INSERTS) != 0)
+	{
+		HeapMultiInsertState *mistate;
+
+		mistate = palloc0(sizeof(HeapMultiInsertState));
+		mistate->slots = palloc0(sizeof(TupleTableSlot *) * HEAP_MAX_BUFFERED_SLOTS);
+
+		mistate->context = AllocSetContextCreate(CurrentMemoryContext,
+												 "heap_multi_insert_v2 memory context",
+												 ALLOCSET_DEFAULT_SIZES);
+
+		((HeapInsertState *) tistate->am_data)->mistate = mistate;
+	}
+
+	if ((am_flags & TABLEAM_BULKWRITE_BUFFER_ACCESS_STRATEGY) != 0)
+		((HeapInsertState *) tistate->am_data)->bistate = GetBulkInsertState();
+
+	return tistate;
+}
+
+/*
+ * Insert a single tuple into a heap.
+ */
+void
+heap_insert_v2(TableInsertState * state, TupleTableSlot *slot)
+{
+	bool		shouldFree = true;
+	HeapTuple	tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);
+	BulkInsertState bistate = NULL;
+
+	Assert(state->am_data != NULL &&
+		   ((HeapInsertState *) state->am_data)->mistate == NULL);
+
+	/* Update tuple with table oid */
+	slot->tts_tableOid = RelationGetRelid(state->rel);
+	tuple->t_tableOid = slot->tts_tableOid;
+
+	if (state->am_data != NULL &&
+		((HeapInsertState *) state->am_data)->bistate != NULL)
+		bistate = ((HeapInsertState *) state->am_data)->bistate;
+
+	/* Perform insertion, and copy the resulting ItemPointer */
+	heap_insert(state->rel, tuple, state->cid, state->insert_flags,
+				bistate);
+	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
+
+	if (shouldFree)
+		pfree(tuple);
+}
+
+/*
+ * Create/return next free slot from multi-insert buffered slots array.
+ */
+TupleTableSlot *
+heap_multi_insert_next_free_slot(TableInsertState * state)
+{
+	TupleTableSlot *slot;
+	HeapMultiInsertState *mistate;
+
+	Assert(state->am_data != NULL &&
+		   ((HeapInsertState *) state->am_data)->mistate != NULL);
+
+	mistate = ((HeapInsertState *) state->am_data)->mistate;
+	slot = mistate->slots[mistate->cur_slots];
+
+	if (slot == NULL)
+	{
+		slot = table_slot_create(state->rel, NULL);
+		mistate->slots[mistate->cur_slots] = slot;
+	}
+	else
+		ExecClearTuple(slot);
+
+	return slot;
+}
+
+/*
+ * Store passed-in tuple into in-memory buffered slots. When full, insert
+ * multiple tuples from the buffers into heap.
+ */
+void
+heap_multi_insert_v2(TableInsertState * state, TupleTableSlot *slot)
+{
+	TupleTableSlot *dstslot;
+	HeapMultiInsertState *mistate;
+
+	Assert(state->am_data != NULL &&
+		   ((HeapInsertState *) state->am_data)->mistate != NULL);
+
+	mistate = ((HeapInsertState *) state->am_data)->mistate;
+	dstslot = mistate->slots[mistate->cur_slots];
+
+	if (dstslot == NULL)
+	{
+		dstslot = table_slot_create(state->rel, NULL);
+		mistate->slots[mistate->cur_slots] = dstslot;
+	}
+
+	/*
+	 * Caller may have got the slot using heap_multi_insert_next_free_slot,
+	 * filled it and passed. So, skip copying in such a case.
+	 */
+	if ((state->am_flags & TABLEAM_SKIP_MULTI_INSERTS_FLUSH) == 0)
+	{
+		ExecClearTuple(dstslot);
+		ExecCopySlot(dstslot, slot);
+	}
+	else
+		Assert(dstslot == slot);
+
+	mistate->cur_slots++;
+
+	/*
+	 * When passed-in slot is already materialized, memory allocated in slot's
+	 * memory context is a close approximation for us to track the required
+	 * space for the tuple in slot.
+	 *
+	 * For non-materialized slots, the flushing decision happens solely on the
+	 * number of tuples stored in the buffer.
+	 */
+	if (TTS_SHOULDFREE(slot))
+		mistate->cur_size += MemoryContextMemAllocated(slot->tts_mcxt, false);
+
+	if ((state->am_flags & TABLEAM_SKIP_MULTI_INSERTS_FLUSH) == 0 &&
+		(mistate->cur_slots >= HEAP_MAX_BUFFERED_SLOTS ||
+		 mistate->cur_size >= HEAP_MAX_BUFFERED_BYTES))
+		heap_multi_insert_flush(state);
+}
+
+/*
+ * Return pointer to multi-insert buffered slots array and number of currently
+ * occupied slots.
+ */
+TupleTableSlot **
+heap_multi_insert_slots(TableInsertState * state, int *num_slots)
+{
+	HeapMultiInsertState *mistate;
+
+	mistate = ((HeapInsertState *) state->am_data)->mistate;
+	*num_slots = mistate->cur_slots;
+
+	return mistate->slots;
+}
+
+/*
+ * Insert multiple tuples from in-memory buffered slots into heap.
+ */
+void
+heap_multi_insert_flush(TableInsertState * state)
+{
+	HeapMultiInsertState *mistate;
+	BulkInsertState bistate = NULL;
+	MemoryContext oldcontext;
+
+	mistate = ((HeapInsertState *) state->am_data)->mistate;
+
+	if (state->am_data != NULL &&
+		((HeapInsertState *) state->am_data)->bistate != NULL)
+		bistate = ((HeapInsertState *) state->am_data)->bistate;
+
+	oldcontext = MemoryContextSwitchTo(mistate->context);
+	heap_multi_insert(state->rel, mistate->slots, mistate->cur_slots,
+					  state->cid, state->insert_flags, bistate);
+	MemoryContextSwitchTo(oldcontext);
+	MemoryContextReset(mistate->context);
+
+	mistate->cur_slots = 0;
+	mistate->cur_size = 0;
+}
+
+/*
+ * Clean up state used to insert a single or multiple tuples into a heap.
+ */
+void
+heap_insert_end(TableInsertState * state)
+{
+	if (state->am_data != NULL &&
+		((HeapInsertState *) state->am_data)->mistate != NULL)
+	{
+		HeapMultiInsertState *mistate =
+			((HeapInsertState *) state->am_data)->mistate;
+
+		/* Insert remaining tuples from multi-insert buffers */
+		if (mistate->cur_slots > 0 || mistate->cur_size > 0)
+			heap_multi_insert_flush(state);
+
+		MemoryContextDelete(mistate->context);
+
+		for (int i = 0; i < HEAP_MAX_BUFFERED_SLOTS && mistate->slots[i] != NULL; i++)
+			ExecDropSingleTupleTableSlot(mistate->slots[i]);
+
+		pfree(mistate);
+		((HeapInsertState *) state->am_data)->mistate = NULL;
+	}
+
+	if (state->am_data != NULL &&
+		((HeapInsertState *) state->am_data)->bistate != NULL)
+		FreeBulkInsertState(((HeapInsertState *) state->am_data)->bistate);
+
+	pfree(state->am_data);
+	state->am_data = NULL;
+	pfree(state);
+}
+
 /*
  *	simple_heap_insert - insert a tuple
  *
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index d15a02b2be..795177812d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2564,6 +2564,15 @@ static const TableAmRoutine heapam_methods = {
 	.tuple_insert_speculative = heapam_tuple_insert_speculative,
 	.tuple_complete_speculative = heapam_tuple_complete_speculative,
 	.multi_insert = heap_multi_insert,
+
+	.tuple_insert_begin = heap_insert_begin,
+	.tuple_insert_v2 = heap_insert_v2,
+	.tuple_multi_insert_next_free_slot = heap_multi_insert_next_free_slot,
+	.tuple_multi_insert_v2 = heap_multi_insert_v2,
+	.tuple_multi_insert_slots = heap_multi_insert_slots,
+	.tuple_multi_insert_flush = heap_multi_insert_flush,
+	.tuple_insert_end = heap_insert_end,
+
 	.tuple_delete = heapam_tuple_delete,
 	.tuple_update = heapam_tuple_update,
 	.tuple_lock = heapam_tuple_lock,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4b133f6859..053be18110 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -225,6 +225,40 @@ htsv_get_valid_status(int status)
 	return (HTSV_Result) status;
 }
 
+/*
+ * Maximum number of slots that multi-insert buffers can hold.
+ *
+ * Caution: Don't make this too big, as we could end up with this many tuples
+ * stored in multi insert buffer. For instance, increasing this can cause
+ * quadratic growth in memory requirements during copies into partitioned
+ * tables with a large number of partitions.
+ */
+#define HEAP_MAX_BUFFERED_SLOTS		1000
+
+/* Maximum size of all tuples that multi-insert buffers can hold */
+#define HEAP_MAX_BUFFERED_BYTES		65535
+
+typedef struct HeapMultiInsertState
+{
+	/* Memory context to use for flushing multi-insert buffers */
+	MemoryContext context;
+
+	/* Array of buffered slots */
+	TupleTableSlot **slots;
+
+	/* Number of slots that multi-insert buffers currently hold */
+	int			cur_slots;
+
+	/* Size of all tuples that multi-insert buffers currently hold */
+	Size		cur_size;
+}			HeapMultiInsertState;
+
+typedef struct HeapInsertState
+{
+	struct BulkInsertStateData *bistate;
+	HeapMultiInsertState *mistate;
+}			HeapInsertState;
+
 /* ----------------
  *		function prototypes for heap access method
  *
@@ -275,6 +309,21 @@ extern void heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 extern void heap_multi_insert(Relation relation, struct TupleTableSlot **slots,
 							  int ntuples, CommandId cid, int options,
 							  BulkInsertState bistate);
+
+extern TableInsertState * heap_insert_begin(Relation rel,
+											CommandId cid,
+											int am_flags,
+											int insert_flags);
+extern void heap_insert_v2(TableInsertState * state,
+						   TupleTableSlot *slot);
+extern TupleTableSlot *heap_multi_insert_next_free_slot(TableInsertState * state);
+extern void heap_multi_insert_v2(TableInsertState * state,
+								 TupleTableSlot *slot);
+extern TupleTableSlot **heap_multi_insert_slots(TableInsertState * state,
+												int *num_slots);
+extern void heap_multi_insert_flush(TableInsertState * state);
+extern void heap_insert_end(TableInsertState * state);
+
 extern TM_Result heap_delete(Relation relation, ItemPointer tid,
 							 CommandId cid, Snapshot crosscheck, bool wait,
 							 struct TM_FailureData *tmfd, bool changingPart);
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 5f8474871d..8fcaf6fe5a 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -247,6 +247,43 @@ typedef struct TM_IndexDeleteOp
 	TM_IndexStatus *status;
 } TM_IndexDeleteOp;
 
+/* Use multi inserts, i.e. buffer multiple tuples and insert them at once */
+#define TABLEAM_MULTI_INSERTS 0x000001
+
+/* Use BAS_BULKWRITE buffer access strategy */
+#define TABLEAM_BULKWRITE_BUFFER_ACCESS_STRATEGY 0x000002
+
+/*
+ * Skip flushing buffered tuples automatically. Responsibility lies with the
+ * caller to flush the buffered tuples.
+ */
+#define TABLEAM_SKIP_MULTI_INSERTS_FLUSH 0x000004
+
+
+/* Holds table insert state. */
+typedef struct TableInsertState
+{
+	/* Table AM-agnostic data starts here */
+
+	Relation	rel;			/* Target relation */
+
+	/*
+	 * Command ID for this insertion. If required, change this for each pass
+	 * of insert functions.
+	 */
+	CommandId	cid;
+
+	/* Table AM options (TABLEAM_XXX macros) */
+	int			am_flags;
+
+	/* table_tuple_insert performance options (TABLE_INSERT_XXX macros) */
+	int			insert_flags;
+
+	/* Table AM specific data starts here */
+
+	void	   *am_data;
+}			TableInsertState;
+
 /* "options" flag bits for table_tuple_insert */
 /* TABLE_INSERT_SKIP_WAL was 0x0001; RelationNeedsWAL() now governs */
 #define TABLE_INSERT_SKIP_FSM		0x0002
@@ -522,6 +559,20 @@ typedef struct TableAmRoutine
 	void		(*multi_insert) (Relation rel, TupleTableSlot **slots, int nslots,
 								 CommandId cid, int options, struct BulkInsertStateData *bistate);
 
+	TableInsertState *(*tuple_insert_begin) (Relation rel,
+											 CommandId cid,
+											 int am_flags,
+											 int insert_flags);
+	void		(*tuple_insert_v2) (TableInsertState * state,
+									TupleTableSlot *slot);
+	void		(*tuple_multi_insert_v2) (TableInsertState * state,
+										  TupleTableSlot *slot);
+	TupleTableSlot *(*tuple_multi_insert_next_free_slot) (TableInsertState * state);
+	TupleTableSlot **(*tuple_multi_insert_slots) (TableInsertState * state,
+												  int *num_slots);
+	void		(*tuple_multi_insert_flush) (TableInsertState * state);
+	void		(*tuple_insert_end) (TableInsertState * state);
+
 	/* see table_tuple_delete() for reference about parameters */
 	TM_Result	(*tuple_delete) (Relation rel,
 								 ItemPointer tid,
@@ -1456,6 +1507,98 @@ table_multi_insert(Relation rel, TupleTableSlot **slots, int nslots,
 								  cid, options, bistate);
 }
 
+static inline TableInsertState *
+table_insert_begin(Relation rel, CommandId cid, int am_flags,
+				   int insert_flags)
+{
+	if (rel->rd_tableam && rel->rd_tableam->tuple_insert_begin)
+		return rel->rd_tableam->tuple_insert_begin(rel, cid, am_flags,
+												   insert_flags);
+	else
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("table_insert_begin access method is not implemented for relation \"%s\"",
+					   RelationGetRelationName(rel)));
+}
+
+static inline void
+table_tuple_insert_v2(TableInsertState * state, TupleTableSlot *slot)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_insert_v2)
+		state->rel->rd_tableam->tuple_insert_v2(state, slot);
+	else
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("table_tuple_insert_v2 access method is not implemented for relation \"%s\"",
+					   RelationGetRelationName(state->rel)));
+}
+
+static inline void
+table_multi_insert_v2(TableInsertState * state, TupleTableSlot *slot)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_multi_insert_v2)
+		state->rel->rd_tableam->tuple_multi_insert_v2(state, slot);
+	else
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("table_multi_insert_v2 access method is not implemented for relation \"%s\"",
+					   RelationGetRelationName(state->rel)));
+}
+
+static inline TupleTableSlot *
+table_multi_insert_next_free_slot(TableInsertState * state)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_multi_insert_next_free_slot)
+		return state->rel->rd_tableam->tuple_multi_insert_next_free_slot(state);
+	else
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("table_multi_insert_next_free_slot access method is not implemented for relation \"%s\"",
+					   RelationGetRelationName(state->rel)));
+}
+
+static inline TupleTableSlot **
+table_multi_insert_slots(TableInsertState * state, int *num_slots)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_multi_insert_slots)
+		return state->rel->rd_tableam->tuple_multi_insert_slots(state, num_slots);
+	else
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("table_multi_insert_slots access method is not implemented for relation \"%s\"",
+					   RelationGetRelationName(state->rel)));
+}
+
+static inline void
+table_multi_insert_flush(TableInsertState * state)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_multi_insert_flush)
+		state->rel->rd_tableam->tuple_multi_insert_flush(state);
+	else
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("table_multi_insert_flush access method is not implemented for relation \"%s\"",
+					   RelationGetRelationName(state->rel)));
+}
+
+static inline void
+table_insert_end(TableInsertState * state)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_insert_end)
+		state->rel->rd_tableam->tuple_insert_end(state);
+	else
+		ereport(ERROR,
+				errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				errmsg("table_insert_end access method is not implemented for relation \"%s\"",
+					   RelationGetRelationName(state->rel)));
+}
+
 /*
  * Delete a tuple.
  *
-- 
2.34.1



  [application/x-patch] v9-0003-Optimize-RMV-with-multi-inserts.patch (2.9K, 3-v9-0003-Optimize-RMV-with-multi-inserts.patch)
  download | inline diff:
From 1c3eea3d0ac69f590ca641d0efaeaa0585a7a850 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Mon, 29 Jan 2024 05:58:33 +0000
Subject: [PATCH v9 3/4] Optimize RMV with multi inserts

---
 src/backend/commands/matview.c | 34 ++++++++++++----------------------
 1 file changed, 12 insertions(+), 22 deletions(-)

diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 1dcfbe879b..f84c79f5f0 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -52,10 +52,7 @@ typedef struct
 	DestReceiver pub;			/* publicly-known function pointers */
 	Oid			transientoid;	/* OID of new heap into which to store */
 	/* These fields are filled by transientrel_startup: */
-	Relation	transientrel;	/* relation to write to */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableInsertState *ti_state; /* table insert state */
 } DR_transientrel;
 
 static int	matview_maintenance_depth = 0;
@@ -457,13 +454,13 @@ transientrel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 
 	transientrel = table_open(myState->transientoid, NoLock);
 
-	/*
-	 * Fill private fields of myState for use by later routines
-	 */
-	myState->transientrel = transientrel;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM | TABLE_INSERT_FROZEN;
-	myState->bistate = GetBulkInsertState();
+	/* Fill private fields of myState for use by later routines */
+	myState->ti_state = table_insert_begin(transientrel,
+										   GetCurrentCommandId(true),
+										   TABLEAM_MULTI_INSERTS |
+										   TABLEAM_BULKWRITE_BUFFER_ACCESS_STRATEGY,
+										   TABLE_INSERT_SKIP_FSM |
+										   TABLE_INSERT_FROZEN);
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -488,12 +485,7 @@ transientrel_receive(TupleTableSlot *slot, DestReceiver *self)
 	 * cheap either. This also doesn't allow accessing per-AM data (say a
 	 * tuple's xmin), but since we don't do that here...
 	 */
-
-	table_tuple_insert(myState->transientrel,
-					   slot,
-					   myState->output_cid,
-					   myState->ti_options,
-					   myState->bistate);
+	table_multi_insert_v2(myState->ti_state, slot);
 
 	/* We know this is a newly created relation, so there are no indexes */
 
@@ -507,14 +499,12 @@ static void
 transientrel_shutdown(DestReceiver *self)
 {
 	DR_transientrel *myState = (DR_transientrel *) self;
+	Relation	transientrel = myState->ti_state->rel;
 
-	FreeBulkInsertState(myState->bistate);
-
-	table_finish_bulk_insert(myState->transientrel, myState->ti_options);
+	table_insert_end(myState->ti_state);
 
 	/* close transientrel, but keep lock until commit */
-	table_close(myState->transientrel, NoLock);
-	myState->transientrel = NULL;
+	table_close(transientrel, NoLock);
 }
 
 /*
-- 
2.34.1



  [application/x-patch] v9-0002-Optimize-CTAS-with-multi-inserts.patch (3.8K, 4-v9-0002-Optimize-CTAS-with-multi-inserts.patch)
  download | inline diff:
From 891047c4b20aab2c6d25187b45b775ee9d71fb48 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Mon, 29 Jan 2024 05:57:59 +0000
Subject: [PATCH v9 2/4] Optimize CTAS with multi inserts

---
 src/backend/commands/createas.c          | 25 +++++++++---------------
 src/test/regress/expected/aggregates.out |  2 +-
 src/test/regress/sql/aggregates.sql      |  2 +-
 3 files changed, 11 insertions(+), 18 deletions(-)

diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 16a2fe65e6..3a02ea9578 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -58,9 +58,7 @@ typedef struct
 	/* These fields are filled by intorel_startup: */
 	Relation	rel;			/* relation to write to */
 	ObjectAddress reladdr;		/* address of rel, for ExecCreateTableAs */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableInsertState *ti_state; /* table insert state */
 } DR_intorel;
 
 /* utility functions for CTAS definition creation */
@@ -557,17 +555,19 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	 */
 	myState->rel = intoRelationDesc;
 	myState->reladdr = intoRelationAddr;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM;
 
 	/*
 	 * If WITH NO DATA is specified, there is no need to set up the state for
 	 * bulk inserts as there are no tuples to insert.
 	 */
 	if (!into->skipData)
-		myState->bistate = GetBulkInsertState();
+		myState->ti_state = table_insert_begin(intoRelationDesc,
+											   GetCurrentCommandId(true),
+											   TABLEAM_MULTI_INSERTS |
+											   TABLEAM_BULKWRITE_BUFFER_ACCESS_STRATEGY,
+											   TABLE_INSERT_SKIP_FSM);
 	else
-		myState->bistate = NULL;
+		myState->ti_state = NULL;
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -595,11 +595,7 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
 		 * would not be cheap either. This also doesn't allow accessing per-AM
 		 * data (say a tuple's xmin), but since we don't do that here...
 		 */
-		table_tuple_insert(myState->rel,
-						   slot,
-						   myState->output_cid,
-						   myState->ti_options,
-						   myState->bistate);
+		table_multi_insert_v2(myState->ti_state, slot);
 	}
 
 	/* We know this is a newly created relation, so there are no indexes */
@@ -617,10 +613,7 @@ intorel_shutdown(DestReceiver *self)
 	IntoClause *into = myState->into;
 
 	if (!into->skipData)
-	{
-		FreeBulkInsertState(myState->bistate);
-		table_finish_bulk_insert(myState->rel, myState->ti_options);
-	}
+		table_insert_end(myState->ti_state);
 
 	/* close rel, but keep lock until commit */
 	table_close(myState->rel, NoLock);
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 7a73c19314..2889fd315d 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2734,7 +2734,7 @@ CREATE TABLE btg AS SELECT
   i % 100 AS y,
   'abc' || i % 10 AS z,
   i AS w
-FROM generate_series(1,10000) AS i;
+FROM generate_series(1,100000) AS i;
 CREATE INDEX btg_x_y_idx ON btg(x,y);
 ANALYZE btg;
 -- GROUP BY optimization by reorder columns by frequency
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 916dbf908f..99f890bb85 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1187,7 +1187,7 @@ CREATE TABLE btg AS SELECT
   i % 100 AS y,
   'abc' || i % 10 AS z,
   i AS w
-FROM generate_series(1,10000) AS i;
+FROM generate_series(1,100000) AS i;
 CREATE INDEX btg_x_y_idx ON btg(x,y);
 ANALYZE btg;
 
-- 
2.34.1



  [application/x-patch] v9-0004-Use-new-multi-insert-TAM-for-COPY-FROM.patch (6.3K, 5-v9-0004-Use-new-multi-insert-TAM-for-COPY-FROM.patch)
  download | inline diff:
From 538c515617a320007d8b76fb48efd75242641428 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Mon, 29 Jan 2024 05:59:12 +0000
Subject: [PATCH v9 4/4] Use new multi insert TAM for COPY FROM

---
 src/backend/commands/copyfrom.c | 92 ++++++++++++++++++---------------
 1 file changed, 50 insertions(+), 42 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 1fe70b9133..8abf33aa97 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -77,10 +77,9 @@
 /* Stores multi-insert data related to a single relation in CopyFrom. */
 typedef struct CopyMultiInsertBuffer
 {
-	TupleTableSlot *slots[MAX_BUFFERED_TUPLES]; /* Array to store tuples */
+	TableInsertState *ti_state; /* Table insert state; NULL if foreign table */
+	TupleTableSlot **slots;		/* Array to store tuples */
 	ResultRelInfo *resultRelInfo;	/* ResultRelInfo for 'relid' */
-	BulkInsertState bistate;	/* BulkInsertState for this rel if plain
-								 * table; NULL if foreign table */
 	int			nused;			/* number of 'slots' containing tuples */
 	uint64		linenos[MAX_BUFFERED_TUPLES];	/* Line # of tuple in copy
 												 * stream */
@@ -223,14 +222,31 @@ limit_printout_length(const char *str)
  * ResultRelInfo.
  */
 static CopyMultiInsertBuffer *
-CopyMultiInsertBufferInit(ResultRelInfo *rri)
+CopyMultiInsertBufferInit(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri)
 {
 	CopyMultiInsertBuffer *buffer;
 
 	buffer = (CopyMultiInsertBuffer *) palloc(sizeof(CopyMultiInsertBuffer));
-	memset(buffer->slots, 0, sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
+
+	if (rri->ri_FdwRoutine == NULL)
+	{
+		int			num_slots;
+
+		buffer->ti_state = table_insert_begin(rri->ri_RelationDesc,
+											  miinfo->mycid,
+											  TABLEAM_MULTI_INSERTS |
+											  TABLEAM_BULKWRITE_BUFFER_ACCESS_STRATEGY |
+											  TABLEAM_SKIP_MULTI_INSERTS_FLUSH,
+											  miinfo->ti_options);
+		buffer->slots = table_multi_insert_slots(buffer->ti_state, &num_slots);
+	}
+	else
+	{
+		buffer->slots = palloc0(sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
+		buffer->ti_state = NULL;
+	}
+
 	buffer->resultRelInfo = rri;
-	buffer->bistate = (rri->ri_FdwRoutine == NULL) ? GetBulkInsertState() : NULL;
 	buffer->nused = 0;
 
 	return buffer;
@@ -245,7 +261,7 @@ CopyMultiInsertInfoSetupBuffer(CopyMultiInsertInfo *miinfo,
 {
 	CopyMultiInsertBuffer *buffer;
 
-	buffer = CopyMultiInsertBufferInit(rri);
+	buffer = CopyMultiInsertBufferInit(miinfo, rri);
 
 	/* Setup back-link so we can easily find this buffer again */
 	rri->ri_CopyMultiInsertBuffer = buffer;
@@ -322,8 +338,6 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 		int			batch_size = resultRelInfo->ri_BatchSize;
 		int			sent = 0;
 
-		Assert(buffer->bistate == NULL);
-
 		/* Ensure that the FDW supports batching and it's enabled */
 		Assert(resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert);
 		Assert(batch_size > 1);
@@ -395,13 +409,8 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	}
 	else
 	{
-		CommandId	mycid = miinfo->mycid;
-		int			ti_options = miinfo->ti_options;
 		bool		line_buf_valid = cstate->line_buf_valid;
 		uint64		save_cur_lineno = cstate->cur_lineno;
-		MemoryContext oldcontext;
-
-		Assert(buffer->bistate != NULL);
 
 		/*
 		 * Print error context information correctly, if one of the operations
@@ -409,18 +418,7 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 		 */
 		cstate->line_buf_valid = false;
 
-		/*
-		 * table_multi_insert may leak memory, so switch to short-lived memory
-		 * context before calling it.
-		 */
-		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-		table_multi_insert(resultRelInfo->ri_RelationDesc,
-						   slots,
-						   nused,
-						   mycid,
-						   ti_options,
-						   buffer->bistate);
-		MemoryContextSwitchTo(oldcontext);
+		table_multi_insert_flush(buffer->ti_state);
 
 		for (i = 0; i < nused; i++)
 		{
@@ -435,7 +433,7 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 				cstate->cur_lineno = buffer->linenos[i];
 				recheckIndexes =
 					ExecInsertIndexTuples(resultRelInfo,
-										  buffer->slots[i], estate, false,
+										  slots[i], estate, false,
 										  false, NULL, NIL, false);
 				ExecARInsertTriggers(estate, resultRelInfo,
 									 slots[i], recheckIndexes,
@@ -493,20 +491,15 @@ CopyMultiInsertBufferCleanup(CopyMultiInsertInfo *miinfo,
 	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
 
 	if (resultRelInfo->ri_FdwRoutine == NULL)
-	{
-		Assert(buffer->bistate != NULL);
-		FreeBulkInsertState(buffer->bistate);
-	}
+		table_insert_end(buffer->ti_state);
 	else
-		Assert(buffer->bistate == NULL);
-
-	/* Since we only create slots on demand, just drop the non-null ones. */
-	for (i = 0; i < MAX_BUFFERED_TUPLES && buffer->slots[i] != NULL; i++)
-		ExecDropSingleTupleTableSlot(buffer->slots[i]);
+	{
+		/* Since we only create slots on demand, just drop the non-null ones. */
+		for (i = 0; i < MAX_BUFFERED_TUPLES && buffer->slots[i] != NULL; i++)
+			ExecDropSingleTupleTableSlot(buffer->slots[i]);
 
-	if (resultRelInfo->ri_FdwRoutine == NULL)
-		table_finish_bulk_insert(resultRelInfo->ri_RelationDesc,
-								 miinfo->ti_options);
+		pfree(buffer->slots);
+	}
 
 	pfree(buffer);
 }
@@ -593,13 +586,25 @@ CopyMultiInsertInfoNextFreeSlot(CopyMultiInsertInfo *miinfo,
 {
 	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
 	int			nused = buffer->nused;
+	TupleTableSlot *slot;
 
 	Assert(buffer != NULL);
 	Assert(nused < MAX_BUFFERED_TUPLES);
 
-	if (buffer->slots[nused] == NULL)
-		buffer->slots[nused] = table_slot_create(rri->ri_RelationDesc, NULL);
-	return buffer->slots[nused];
+	if (rri->ri_FdwRoutine == NULL)
+		slot = table_multi_insert_next_free_slot(buffer->ti_state);
+	else
+	{
+		if (buffer->slots[nused] == NULL)
+		{
+			slot = table_slot_create(rri->ri_RelationDesc, NULL);
+			buffer->slots[nused] = slot;
+		}
+		else
+			slot = buffer->slots[nused];
+	}
+
+	return slot;
 }
 
 /*
@@ -615,6 +620,9 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
 	Assert(buffer != NULL);
 	Assert(slot == buffer->slots[buffer->nused]);
 
+	if (rri->ri_FdwRoutine == NULL)
+		table_multi_insert_v2(buffer->ti_state, slot);
+
 	/* Store the line number so we can properly report any errors later */
 	buffer->linenos[buffer->nused] = lineno;
 
-- 
2.34.1



^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2024-01-29 11:46  Bharath Rupireddy <[email protected]>
  parent: Bharath Rupireddy <[email protected]>
  0 siblings, 1 reply; 43+ messages in thread

From: Bharath Rupireddy @ 2024-01-29 11:46 UTC (permalink / raw)
  To: pgsql-hackers; +Cc: Andres Freund <[email protected]>; Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; Jeff Davis <[email protected]>; Michael Paquier <[email protected]>; Matthias van de Meent <[email protected]>

On Mon, Jan 29, 2024 at 12:57 PM Bharath Rupireddy
<[email protected]> wrote:
>
> On Wed, Jan 17, 2024 at 10:57 PM Bharath Rupireddy
> <[email protected]> wrote:
> >
> > Thank you. I'm attaching v8 patch-set here which includes use of new
> > insert TAMs for COPY FROM. With this, postgres not only will have the
> > new TAM for inserts, but they also can make the following commands
> > faster - CREATE TABLE AS, SELECT INTO, CREATE MATERIALIZED VIEW,
> > REFRESH MATERIALIZED VIEW and COPY FROM. I'll perform some testing in
> > the coming days and post the results here, until then I appreciate any
> > feedback on the patches.
> >
> > I've also added this proposal to CF -
> > https://commitfest.postgresql.org/47/4777/.
>
> Some of the tests related to Incremental Sort added by a recent commit
> 0452b461bc4 in aggregates.sql are failing when the multi inserts
> feature is used for CTAS (like done in 0002 patch). I'm not so sure if
> it's because of the reduction in the CTAS execution times. Execution
> time for table 'btg' created with CREATE TABLE AS added by commit
> 0452b461bc4 with single inserts is 25.3 msec, with multi inserts is
> 17.7 msec. This means that the multi inserts are about 1.43 times or
> 30.04% faster  than the single inserts. Couple of ways to make these
> tests pick Incremental Sort as expected - 1) CLUSTER btg USING abc; or
> 2) increase the number of rows in table btg to 100K from 10K. FWIW, if
> I reduce the number of rows in the table from 10K to 1K, the
> Incremental Sort won't get picked on HEAD with CTAS using single
> inserts. Hence, I chose option (2) to fix the issue.
>
> Please find the attached v9 patch set.
>
> [1]
>  -- Engage incremental sort
>  explain (COSTS OFF) SELECT x,y FROM btg GROUP BY x,y,z,w;
> -                   QUERY PLAN
> --------------------------------------------------
> +          QUERY PLAN
> +------------------------------
>   Group
>     Group Key: x, y, z, w
> -   ->  Incremental Sort
> +   ->  Sort
>           Sort Key: x, y, z, w
> -         Presorted Key: x, y
> -         ->  Index Scan using btg_x_y_idx on btg
> -(6 rows)
> +         ->  Seq Scan on btg
> +(5 rows)

CF bot machine with Windows isn't happy with the compilation [1], so
fixed those warnings and attached v10 patch set.

[1]
[07:35:25.458] [632/2212] Compiling C object
src/backend/postgres_lib.a.p/commands_copyfrom.c.obj
[07:35:25.458] c:\cirrus\src\include\access\tableam.h(1574) : warning
C4715: 'table_multi_insert_slots': not all control paths return a
value
[07:35:25.458] c:\cirrus\src\include\access\tableam.h(1522) : warning
C4715: 'table_insert_begin': not all control paths return a value
[07:35:25.680] c:\cirrus\src\include\access\tableam.h(1561) : warning
C4715: 'table_multi_insert_next_free_slot': not all control paths
return a value
[07:35:25.680] [633/2212] Compiling C object
src/backend/postgres_lib.a.p/commands_createas.c.obj
[07:35:25.680] c:\cirrus\src\include\access\tableam.h(1522) : warning
C4715: 'table_insert_begin': not all control paths return a value
[07:35:26.310] [646/2212] Compiling C object
src/backend/postgres_lib.a.p/commands_matview.c.obj
[07:35:26.310] c:\cirrus\src\include\access\tableam.h(1522) : warning
C4715: 'table_insert_begin': not all control paths return a value

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


Attachments:

  [application/octet-stream] v10-0001-New-TAMs-for-inserts.patch (15.9K, 2-v10-0001-New-TAMs-for-inserts.patch)
  download | inline diff:
From 3c892cf5c2df949efac1ec5dc8fc390b868fe400 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Mon, 29 Jan 2024 10:59:41 +0000
Subject: [PATCH v10 1/4] New TAMs for inserts

---
 src/backend/access/heap/heapam.c         | 224 +++++++++++++++++++++++
 src/backend/access/heap/heapam_handler.c |   9 +
 src/include/access/heapam.h              |  49 +++++
 src/include/access/tableam.h             | 138 ++++++++++++++
 4 files changed, 420 insertions(+)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 707460a536..7df305380e 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -68,6 +68,7 @@
 #include "utils/datum.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
+#include "utils/memutils.h"
 #include "utils/relcache.h"
 #include "utils/snapmgr.h"
 #include "utils/spccache.h"
@@ -2446,6 +2447,229 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 	pgstat_count_heap_insert(relation, ntuples);
 }
 
+/*
+ * Initialize state required for an insert a single tuple or multiple tuples
+ * into a heap.
+ */
+TableInsertState *
+heap_insert_begin(Relation rel, CommandId cid, int am_flags, int insert_flags)
+{
+	TableInsertState *tistate;
+
+	tistate = palloc0(sizeof(TableInsertState));
+	tistate->rel = rel;
+	tistate->cid = cid;
+	tistate->am_flags = am_flags;
+	tistate->insert_flags = insert_flags;
+
+	if ((am_flags & TABLEAM_MULTI_INSERTS) != 0 ||
+		(am_flags & TABLEAM_BULKWRITE_BUFFER_ACCESS_STRATEGY))
+		tistate->am_data = palloc0(sizeof(HeapInsertState));
+
+	if ((am_flags & TABLEAM_MULTI_INSERTS) != 0)
+	{
+		HeapMultiInsertState *mistate;
+
+		mistate = palloc0(sizeof(HeapMultiInsertState));
+		mistate->slots = palloc0(sizeof(TupleTableSlot *) * HEAP_MAX_BUFFERED_SLOTS);
+
+		mistate->context = AllocSetContextCreate(CurrentMemoryContext,
+												 "heap_multi_insert_v2 memory context",
+												 ALLOCSET_DEFAULT_SIZES);
+
+		((HeapInsertState *) tistate->am_data)->mistate = mistate;
+	}
+
+	if ((am_flags & TABLEAM_BULKWRITE_BUFFER_ACCESS_STRATEGY) != 0)
+		((HeapInsertState *) tistate->am_data)->bistate = GetBulkInsertState();
+
+	return tistate;
+}
+
+/*
+ * Insert a single tuple into a heap.
+ */
+void
+heap_insert_v2(TableInsertState * state, TupleTableSlot *slot)
+{
+	bool		shouldFree = true;
+	HeapTuple	tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);
+	BulkInsertState bistate = NULL;
+
+	Assert(state->am_data != NULL &&
+		   ((HeapInsertState *) state->am_data)->mistate == NULL);
+
+	/* Update tuple with table oid */
+	slot->tts_tableOid = RelationGetRelid(state->rel);
+	tuple->t_tableOid = slot->tts_tableOid;
+
+	if (state->am_data != NULL &&
+		((HeapInsertState *) state->am_data)->bistate != NULL)
+		bistate = ((HeapInsertState *) state->am_data)->bistate;
+
+	/* Perform insertion, and copy the resulting ItemPointer */
+	heap_insert(state->rel, tuple, state->cid, state->insert_flags,
+				bistate);
+	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
+
+	if (shouldFree)
+		pfree(tuple);
+}
+
+/*
+ * Create/return next free slot from multi-insert buffered slots array.
+ */
+TupleTableSlot *
+heap_multi_insert_next_free_slot(TableInsertState * state)
+{
+	TupleTableSlot *slot;
+	HeapMultiInsertState *mistate;
+
+	Assert(state->am_data != NULL &&
+		   ((HeapInsertState *) state->am_data)->mistate != NULL);
+
+	mistate = ((HeapInsertState *) state->am_data)->mistate;
+	slot = mistate->slots[mistate->cur_slots];
+
+	if (slot == NULL)
+	{
+		slot = table_slot_create(state->rel, NULL);
+		mistate->slots[mistate->cur_slots] = slot;
+	}
+	else
+		ExecClearTuple(slot);
+
+	return slot;
+}
+
+/*
+ * Store passed-in tuple into in-memory buffered slots. When full, insert
+ * multiple tuples from the buffers into heap.
+ */
+void
+heap_multi_insert_v2(TableInsertState * state, TupleTableSlot *slot)
+{
+	TupleTableSlot *dstslot;
+	HeapMultiInsertState *mistate;
+
+	Assert(state->am_data != NULL &&
+		   ((HeapInsertState *) state->am_data)->mistate != NULL);
+
+	mistate = ((HeapInsertState *) state->am_data)->mistate;
+	dstslot = mistate->slots[mistate->cur_slots];
+
+	if (dstslot == NULL)
+	{
+		dstslot = table_slot_create(state->rel, NULL);
+		mistate->slots[mistate->cur_slots] = dstslot;
+	}
+
+	/*
+	 * Caller may have got the slot using heap_multi_insert_next_free_slot,
+	 * filled it and passed. So, skip copying in such a case.
+	 */
+	if ((state->am_flags & TABLEAM_SKIP_MULTI_INSERTS_FLUSH) == 0)
+	{
+		ExecClearTuple(dstslot);
+		ExecCopySlot(dstslot, slot);
+	}
+	else
+		Assert(dstslot == slot);
+
+	mistate->cur_slots++;
+
+	/*
+	 * When passed-in slot is already materialized, memory allocated in slot's
+	 * memory context is a close approximation for us to track the required
+	 * space for the tuple in slot.
+	 *
+	 * For non-materialized slots, the flushing decision happens solely on the
+	 * number of tuples stored in the buffer.
+	 */
+	if (TTS_SHOULDFREE(slot))
+		mistate->cur_size += MemoryContextMemAllocated(slot->tts_mcxt, false);
+
+	if ((state->am_flags & TABLEAM_SKIP_MULTI_INSERTS_FLUSH) == 0 &&
+		(mistate->cur_slots >= HEAP_MAX_BUFFERED_SLOTS ||
+		 mistate->cur_size >= HEAP_MAX_BUFFERED_BYTES))
+		heap_multi_insert_flush(state);
+}
+
+/*
+ * Return pointer to multi-insert buffered slots array and number of currently
+ * occupied slots.
+ */
+TupleTableSlot **
+heap_multi_insert_slots(TableInsertState * state, int *num_slots)
+{
+	HeapMultiInsertState *mistate;
+
+	mistate = ((HeapInsertState *) state->am_data)->mistate;
+	*num_slots = mistate->cur_slots;
+
+	return mistate->slots;
+}
+
+/*
+ * Insert multiple tuples from in-memory buffered slots into heap.
+ */
+void
+heap_multi_insert_flush(TableInsertState * state)
+{
+	HeapMultiInsertState *mistate;
+	BulkInsertState bistate = NULL;
+	MemoryContext oldcontext;
+
+	mistate = ((HeapInsertState *) state->am_data)->mistate;
+
+	if (state->am_data != NULL &&
+		((HeapInsertState *) state->am_data)->bistate != NULL)
+		bistate = ((HeapInsertState *) state->am_data)->bistate;
+
+	oldcontext = MemoryContextSwitchTo(mistate->context);
+	heap_multi_insert(state->rel, mistate->slots, mistate->cur_slots,
+					  state->cid, state->insert_flags, bistate);
+	MemoryContextSwitchTo(oldcontext);
+	MemoryContextReset(mistate->context);
+
+	mistate->cur_slots = 0;
+	mistate->cur_size = 0;
+}
+
+/*
+ * Clean up state used to insert a single or multiple tuples into a heap.
+ */
+void
+heap_insert_end(TableInsertState * state)
+{
+	if (state->am_data != NULL &&
+		((HeapInsertState *) state->am_data)->mistate != NULL)
+	{
+		HeapMultiInsertState *mistate =
+			((HeapInsertState *) state->am_data)->mistate;
+
+		/* Insert remaining tuples from multi-insert buffers */
+		if (mistate->cur_slots > 0 || mistate->cur_size > 0)
+			heap_multi_insert_flush(state);
+
+		MemoryContextDelete(mistate->context);
+
+		for (int i = 0; i < HEAP_MAX_BUFFERED_SLOTS && mistate->slots[i] != NULL; i++)
+			ExecDropSingleTupleTableSlot(mistate->slots[i]);
+
+		pfree(mistate);
+		((HeapInsertState *) state->am_data)->mistate = NULL;
+	}
+
+	if (state->am_data != NULL &&
+		((HeapInsertState *) state->am_data)->bistate != NULL)
+		FreeBulkInsertState(((HeapInsertState *) state->am_data)->bistate);
+
+	pfree(state->am_data);
+	state->am_data = NULL;
+	pfree(state);
+}
+
 /*
  *	simple_heap_insert - insert a tuple
  *
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index d15a02b2be..795177812d 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2564,6 +2564,15 @@ static const TableAmRoutine heapam_methods = {
 	.tuple_insert_speculative = heapam_tuple_insert_speculative,
 	.tuple_complete_speculative = heapam_tuple_complete_speculative,
 	.multi_insert = heap_multi_insert,
+
+	.tuple_insert_begin = heap_insert_begin,
+	.tuple_insert_v2 = heap_insert_v2,
+	.tuple_multi_insert_next_free_slot = heap_multi_insert_next_free_slot,
+	.tuple_multi_insert_v2 = heap_multi_insert_v2,
+	.tuple_multi_insert_slots = heap_multi_insert_slots,
+	.tuple_multi_insert_flush = heap_multi_insert_flush,
+	.tuple_insert_end = heap_insert_end,
+
 	.tuple_delete = heapam_tuple_delete,
 	.tuple_update = heapam_tuple_update,
 	.tuple_lock = heapam_tuple_lock,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4b133f6859..053be18110 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -225,6 +225,40 @@ htsv_get_valid_status(int status)
 	return (HTSV_Result) status;
 }
 
+/*
+ * Maximum number of slots that multi-insert buffers can hold.
+ *
+ * Caution: Don't make this too big, as we could end up with this many tuples
+ * stored in multi insert buffer. For instance, increasing this can cause
+ * quadratic growth in memory requirements during copies into partitioned
+ * tables with a large number of partitions.
+ */
+#define HEAP_MAX_BUFFERED_SLOTS		1000
+
+/* Maximum size of all tuples that multi-insert buffers can hold */
+#define HEAP_MAX_BUFFERED_BYTES		65535
+
+typedef struct HeapMultiInsertState
+{
+	/* Memory context to use for flushing multi-insert buffers */
+	MemoryContext context;
+
+	/* Array of buffered slots */
+	TupleTableSlot **slots;
+
+	/* Number of slots that multi-insert buffers currently hold */
+	int			cur_slots;
+
+	/* Size of all tuples that multi-insert buffers currently hold */
+	Size		cur_size;
+}			HeapMultiInsertState;
+
+typedef struct HeapInsertState
+{
+	struct BulkInsertStateData *bistate;
+	HeapMultiInsertState *mistate;
+}			HeapInsertState;
+
 /* ----------------
  *		function prototypes for heap access method
  *
@@ -275,6 +309,21 @@ extern void heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 extern void heap_multi_insert(Relation relation, struct TupleTableSlot **slots,
 							  int ntuples, CommandId cid, int options,
 							  BulkInsertState bistate);
+
+extern TableInsertState * heap_insert_begin(Relation rel,
+											CommandId cid,
+											int am_flags,
+											int insert_flags);
+extern void heap_insert_v2(TableInsertState * state,
+						   TupleTableSlot *slot);
+extern TupleTableSlot *heap_multi_insert_next_free_slot(TableInsertState * state);
+extern void heap_multi_insert_v2(TableInsertState * state,
+								 TupleTableSlot *slot);
+extern TupleTableSlot **heap_multi_insert_slots(TableInsertState * state,
+												int *num_slots);
+extern void heap_multi_insert_flush(TableInsertState * state);
+extern void heap_insert_end(TableInsertState * state);
+
 extern TM_Result heap_delete(Relation relation, ItemPointer tid,
 							 CommandId cid, Snapshot crosscheck, bool wait,
 							 struct TM_FailureData *tmfd, bool changingPart);
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 5f8474871d..834de15b9b 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -247,6 +247,43 @@ typedef struct TM_IndexDeleteOp
 	TM_IndexStatus *status;
 } TM_IndexDeleteOp;
 
+/* Use multi inserts, i.e. buffer multiple tuples and insert them at once */
+#define TABLEAM_MULTI_INSERTS 0x000001
+
+/* Use BAS_BULKWRITE buffer access strategy */
+#define TABLEAM_BULKWRITE_BUFFER_ACCESS_STRATEGY 0x000002
+
+/*
+ * Skip flushing buffered tuples automatically. Responsibility lies with the
+ * caller to flush the buffered tuples.
+ */
+#define TABLEAM_SKIP_MULTI_INSERTS_FLUSH 0x000004
+
+
+/* Holds table insert state. */
+typedef struct TableInsertState
+{
+	/* Table AM-agnostic data starts here */
+
+	Relation	rel;			/* Target relation */
+
+	/*
+	 * Command ID for this insertion. If required, change this for each pass
+	 * of insert functions.
+	 */
+	CommandId	cid;
+
+	/* Table AM options (TABLEAM_XXX macros) */
+	int			am_flags;
+
+	/* table_tuple_insert performance options (TABLE_INSERT_XXX macros) */
+	int			insert_flags;
+
+	/* Table AM specific data starts here */
+
+	void	   *am_data;
+}			TableInsertState;
+
 /* "options" flag bits for table_tuple_insert */
 /* TABLE_INSERT_SKIP_WAL was 0x0001; RelationNeedsWAL() now governs */
 #define TABLE_INSERT_SKIP_FSM		0x0002
@@ -522,6 +559,20 @@ typedef struct TableAmRoutine
 	void		(*multi_insert) (Relation rel, TupleTableSlot **slots, int nslots,
 								 CommandId cid, int options, struct BulkInsertStateData *bistate);
 
+	TableInsertState *(*tuple_insert_begin) (Relation rel,
+											 CommandId cid,
+											 int am_flags,
+											 int insert_flags);
+	void		(*tuple_insert_v2) (TableInsertState * state,
+									TupleTableSlot *slot);
+	void		(*tuple_multi_insert_v2) (TableInsertState * state,
+										  TupleTableSlot *slot);
+	TupleTableSlot *(*tuple_multi_insert_next_free_slot) (TableInsertState * state);
+	TupleTableSlot **(*tuple_multi_insert_slots) (TableInsertState * state,
+												  int *num_slots);
+	void		(*tuple_multi_insert_flush) (TableInsertState * state);
+	void		(*tuple_insert_end) (TableInsertState * state);
+
 	/* see table_tuple_delete() for reference about parameters */
 	TM_Result	(*tuple_delete) (Relation rel,
 								 ItemPointer tid,
@@ -1456,6 +1507,93 @@ table_multi_insert(Relation rel, TupleTableSlot **slots, int nslots,
 								  cid, options, bistate);
 }
 
+static inline TableInsertState *
+table_insert_begin(Relation rel, CommandId cid, int am_flags,
+				   int insert_flags)
+{
+	if (rel->rd_tableam && rel->rd_tableam->tuple_insert_begin)
+		return rel->rd_tableam->tuple_insert_begin(rel, cid, am_flags,
+												   insert_flags);
+	else
+	{
+		elog(ERROR, "table_insert_begin access method is not implemented for relation \"%s\"",
+			 RelationGetRelationName(rel));
+		return NULL;			/* keep compiler quiet */
+	}
+}
+
+static inline void
+table_tuple_insert_v2(TableInsertState * state, TupleTableSlot *slot)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_insert_v2)
+		state->rel->rd_tableam->tuple_insert_v2(state, slot);
+	else
+		elog(ERROR, "table_tuple_insert_v2 access method is not implemented for relation \"%s\"",
+			 RelationGetRelationName(state->rel));
+}
+
+static inline void
+table_multi_insert_v2(TableInsertState * state, TupleTableSlot *slot)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_multi_insert_v2)
+		state->rel->rd_tableam->tuple_multi_insert_v2(state, slot);
+	else
+		elog(ERROR, "table_multi_insert_v2 access method is not implemented for relation \"%s\"",
+			 RelationGetRelationName(state->rel));
+}
+
+static inline TupleTableSlot *
+table_multi_insert_next_free_slot(TableInsertState * state)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_multi_insert_next_free_slot)
+		return state->rel->rd_tableam->tuple_multi_insert_next_free_slot(state);
+	else
+	{
+		elog(ERROR, "table_multi_insert_next_free_slot access method is not implemented for relation \"%s\"",
+			 RelationGetRelationName(state->rel));
+		return NULL;			/* keep compiler quiet */
+	}
+}
+
+static inline TupleTableSlot **
+table_multi_insert_slots(TableInsertState * state, int *num_slots)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_multi_insert_slots)
+		return state->rel->rd_tableam->tuple_multi_insert_slots(state, num_slots);
+	else
+	{
+		elog(ERROR, "table_multi_insert_slots access method is not implemented for relation \"%s\"",
+			 RelationGetRelationName(state->rel));
+		return NULL;			/* keep compiler quiet */
+	}
+}
+
+static inline void
+table_multi_insert_flush(TableInsertState * state)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_multi_insert_flush)
+		state->rel->rd_tableam->tuple_multi_insert_flush(state);
+	else
+		elog(ERROR, "table_multi_insert_flush access method is not implemented for relation \"%s\"",
+			 RelationGetRelationName(state->rel));
+}
+
+static inline void
+table_insert_end(TableInsertState * state)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_insert_end)
+		state->rel->rd_tableam->tuple_insert_end(state);
+	else
+		elog(ERROR, "table_insert_end access method is not implemented for relation \"%s\"",
+			 RelationGetRelationName(state->rel));
+}
+
 /*
  * Delete a tuple.
  *
-- 
2.34.1



  [application/octet-stream] v10-0002-Optimize-CTAS-with-multi-inserts.patch (3.8K, 3-v10-0002-Optimize-CTAS-with-multi-inserts.patch)
  download | inline diff:
From ab21bc9db0b6a033db3d6d00f72c5b1abf445240 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Mon, 29 Jan 2024 11:01:56 +0000
Subject: [PATCH v10 2/4] Optimize CTAS with multi inserts

---
 src/backend/commands/createas.c          | 25 +++++++++---------------
 src/test/regress/expected/aggregates.out |  2 +-
 src/test/regress/sql/aggregates.sql      |  2 +-
 3 files changed, 11 insertions(+), 18 deletions(-)

diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 16a2fe65e6..3a02ea9578 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -58,9 +58,7 @@ typedef struct
 	/* These fields are filled by intorel_startup: */
 	Relation	rel;			/* relation to write to */
 	ObjectAddress reladdr;		/* address of rel, for ExecCreateTableAs */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableInsertState *ti_state; /* table insert state */
 } DR_intorel;
 
 /* utility functions for CTAS definition creation */
@@ -557,17 +555,19 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	 */
 	myState->rel = intoRelationDesc;
 	myState->reladdr = intoRelationAddr;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM;
 
 	/*
 	 * If WITH NO DATA is specified, there is no need to set up the state for
 	 * bulk inserts as there are no tuples to insert.
 	 */
 	if (!into->skipData)
-		myState->bistate = GetBulkInsertState();
+		myState->ti_state = table_insert_begin(intoRelationDesc,
+											   GetCurrentCommandId(true),
+											   TABLEAM_MULTI_INSERTS |
+											   TABLEAM_BULKWRITE_BUFFER_ACCESS_STRATEGY,
+											   TABLE_INSERT_SKIP_FSM);
 	else
-		myState->bistate = NULL;
+		myState->ti_state = NULL;
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -595,11 +595,7 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
 		 * would not be cheap either. This also doesn't allow accessing per-AM
 		 * data (say a tuple's xmin), but since we don't do that here...
 		 */
-		table_tuple_insert(myState->rel,
-						   slot,
-						   myState->output_cid,
-						   myState->ti_options,
-						   myState->bistate);
+		table_multi_insert_v2(myState->ti_state, slot);
 	}
 
 	/* We know this is a newly created relation, so there are no indexes */
@@ -617,10 +613,7 @@ intorel_shutdown(DestReceiver *self)
 	IntoClause *into = myState->into;
 
 	if (!into->skipData)
-	{
-		FreeBulkInsertState(myState->bistate);
-		table_finish_bulk_insert(myState->rel, myState->ti_options);
-	}
+		table_insert_end(myState->ti_state);
 
 	/* close rel, but keep lock until commit */
 	table_close(myState->rel, NoLock);
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 7a73c19314..2889fd315d 100644
--- a/src/test/regress/expected/aggregates.out
+++ b/src/test/regress/expected/aggregates.out
@@ -2734,7 +2734,7 @@ CREATE TABLE btg AS SELECT
   i % 100 AS y,
   'abc' || i % 10 AS z,
   i AS w
-FROM generate_series(1,10000) AS i;
+FROM generate_series(1,100000) AS i;
 CREATE INDEX btg_x_y_idx ON btg(x,y);
 ANALYZE btg;
 -- GROUP BY optimization by reorder columns by frequency
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 916dbf908f..99f890bb85 100644
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -1187,7 +1187,7 @@ CREATE TABLE btg AS SELECT
   i % 100 AS y,
   'abc' || i % 10 AS z,
   i AS w
-FROM generate_series(1,10000) AS i;
+FROM generate_series(1,100000) AS i;
 CREATE INDEX btg_x_y_idx ON btg(x,y);
 ANALYZE btg;
 
-- 
2.34.1



  [application/octet-stream] v10-0003-Optimize-RMV-with-multi-inserts.patch (2.9K, 4-v10-0003-Optimize-RMV-with-multi-inserts.patch)
  download | inline diff:
From 621aa97a1708ba178f5ebf6aca869788a2cf1b56 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Mon, 29 Jan 2024 11:02:19 +0000
Subject: [PATCH v10 3/4] Optimize RMV with multi inserts

---
 src/backend/commands/matview.c | 34 ++++++++++++----------------------
 1 file changed, 12 insertions(+), 22 deletions(-)

diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 1dcfbe879b..f84c79f5f0 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -52,10 +52,7 @@ typedef struct
 	DestReceiver pub;			/* publicly-known function pointers */
 	Oid			transientoid;	/* OID of new heap into which to store */
 	/* These fields are filled by transientrel_startup: */
-	Relation	transientrel;	/* relation to write to */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableInsertState *ti_state; /* table insert state */
 } DR_transientrel;
 
 static int	matview_maintenance_depth = 0;
@@ -457,13 +454,13 @@ transientrel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 
 	transientrel = table_open(myState->transientoid, NoLock);
 
-	/*
-	 * Fill private fields of myState for use by later routines
-	 */
-	myState->transientrel = transientrel;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM | TABLE_INSERT_FROZEN;
-	myState->bistate = GetBulkInsertState();
+	/* Fill private fields of myState for use by later routines */
+	myState->ti_state = table_insert_begin(transientrel,
+										   GetCurrentCommandId(true),
+										   TABLEAM_MULTI_INSERTS |
+										   TABLEAM_BULKWRITE_BUFFER_ACCESS_STRATEGY,
+										   TABLE_INSERT_SKIP_FSM |
+										   TABLE_INSERT_FROZEN);
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -488,12 +485,7 @@ transientrel_receive(TupleTableSlot *slot, DestReceiver *self)
 	 * cheap either. This also doesn't allow accessing per-AM data (say a
 	 * tuple's xmin), but since we don't do that here...
 	 */
-
-	table_tuple_insert(myState->transientrel,
-					   slot,
-					   myState->output_cid,
-					   myState->ti_options,
-					   myState->bistate);
+	table_multi_insert_v2(myState->ti_state, slot);
 
 	/* We know this is a newly created relation, so there are no indexes */
 
@@ -507,14 +499,12 @@ static void
 transientrel_shutdown(DestReceiver *self)
 {
 	DR_transientrel *myState = (DR_transientrel *) self;
+	Relation	transientrel = myState->ti_state->rel;
 
-	FreeBulkInsertState(myState->bistate);
-
-	table_finish_bulk_insert(myState->transientrel, myState->ti_options);
+	table_insert_end(myState->ti_state);
 
 	/* close transientrel, but keep lock until commit */
-	table_close(myState->transientrel, NoLock);
-	myState->transientrel = NULL;
+	table_close(transientrel, NoLock);
 }
 
 /*
-- 
2.34.1



  [application/octet-stream] v10-0004-Use-new-multi-insert-TAM-for-COPY-FROM.patch (6.3K, 5-v10-0004-Use-new-multi-insert-TAM-for-COPY-FROM.patch)
  download | inline diff:
From 169131f28e09c41b0b100f953b54dd16b2e3185a Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Mon, 29 Jan 2024 11:02:37 +0000
Subject: [PATCH v10 4/4] Use new multi insert TAM for COPY FROM

---
 src/backend/commands/copyfrom.c | 92 ++++++++++++++++++---------------
 1 file changed, 50 insertions(+), 42 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 1fe70b9133..8abf33aa97 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -77,10 +77,9 @@
 /* Stores multi-insert data related to a single relation in CopyFrom. */
 typedef struct CopyMultiInsertBuffer
 {
-	TupleTableSlot *slots[MAX_BUFFERED_TUPLES]; /* Array to store tuples */
+	TableInsertState *ti_state; /* Table insert state; NULL if foreign table */
+	TupleTableSlot **slots;		/* Array to store tuples */
 	ResultRelInfo *resultRelInfo;	/* ResultRelInfo for 'relid' */
-	BulkInsertState bistate;	/* BulkInsertState for this rel if plain
-								 * table; NULL if foreign table */
 	int			nused;			/* number of 'slots' containing tuples */
 	uint64		linenos[MAX_BUFFERED_TUPLES];	/* Line # of tuple in copy
 												 * stream */
@@ -223,14 +222,31 @@ limit_printout_length(const char *str)
  * ResultRelInfo.
  */
 static CopyMultiInsertBuffer *
-CopyMultiInsertBufferInit(ResultRelInfo *rri)
+CopyMultiInsertBufferInit(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri)
 {
 	CopyMultiInsertBuffer *buffer;
 
 	buffer = (CopyMultiInsertBuffer *) palloc(sizeof(CopyMultiInsertBuffer));
-	memset(buffer->slots, 0, sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
+
+	if (rri->ri_FdwRoutine == NULL)
+	{
+		int			num_slots;
+
+		buffer->ti_state = table_insert_begin(rri->ri_RelationDesc,
+											  miinfo->mycid,
+											  TABLEAM_MULTI_INSERTS |
+											  TABLEAM_BULKWRITE_BUFFER_ACCESS_STRATEGY |
+											  TABLEAM_SKIP_MULTI_INSERTS_FLUSH,
+											  miinfo->ti_options);
+		buffer->slots = table_multi_insert_slots(buffer->ti_state, &num_slots);
+	}
+	else
+	{
+		buffer->slots = palloc0(sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
+		buffer->ti_state = NULL;
+	}
+
 	buffer->resultRelInfo = rri;
-	buffer->bistate = (rri->ri_FdwRoutine == NULL) ? GetBulkInsertState() : NULL;
 	buffer->nused = 0;
 
 	return buffer;
@@ -245,7 +261,7 @@ CopyMultiInsertInfoSetupBuffer(CopyMultiInsertInfo *miinfo,
 {
 	CopyMultiInsertBuffer *buffer;
 
-	buffer = CopyMultiInsertBufferInit(rri);
+	buffer = CopyMultiInsertBufferInit(miinfo, rri);
 
 	/* Setup back-link so we can easily find this buffer again */
 	rri->ri_CopyMultiInsertBuffer = buffer;
@@ -322,8 +338,6 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 		int			batch_size = resultRelInfo->ri_BatchSize;
 		int			sent = 0;
 
-		Assert(buffer->bistate == NULL);
-
 		/* Ensure that the FDW supports batching and it's enabled */
 		Assert(resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert);
 		Assert(batch_size > 1);
@@ -395,13 +409,8 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	}
 	else
 	{
-		CommandId	mycid = miinfo->mycid;
-		int			ti_options = miinfo->ti_options;
 		bool		line_buf_valid = cstate->line_buf_valid;
 		uint64		save_cur_lineno = cstate->cur_lineno;
-		MemoryContext oldcontext;
-
-		Assert(buffer->bistate != NULL);
 
 		/*
 		 * Print error context information correctly, if one of the operations
@@ -409,18 +418,7 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 		 */
 		cstate->line_buf_valid = false;
 
-		/*
-		 * table_multi_insert may leak memory, so switch to short-lived memory
-		 * context before calling it.
-		 */
-		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-		table_multi_insert(resultRelInfo->ri_RelationDesc,
-						   slots,
-						   nused,
-						   mycid,
-						   ti_options,
-						   buffer->bistate);
-		MemoryContextSwitchTo(oldcontext);
+		table_multi_insert_flush(buffer->ti_state);
 
 		for (i = 0; i < nused; i++)
 		{
@@ -435,7 +433,7 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 				cstate->cur_lineno = buffer->linenos[i];
 				recheckIndexes =
 					ExecInsertIndexTuples(resultRelInfo,
-										  buffer->slots[i], estate, false,
+										  slots[i], estate, false,
 										  false, NULL, NIL, false);
 				ExecARInsertTriggers(estate, resultRelInfo,
 									 slots[i], recheckIndexes,
@@ -493,20 +491,15 @@ CopyMultiInsertBufferCleanup(CopyMultiInsertInfo *miinfo,
 	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
 
 	if (resultRelInfo->ri_FdwRoutine == NULL)
-	{
-		Assert(buffer->bistate != NULL);
-		FreeBulkInsertState(buffer->bistate);
-	}
+		table_insert_end(buffer->ti_state);
 	else
-		Assert(buffer->bistate == NULL);
-
-	/* Since we only create slots on demand, just drop the non-null ones. */
-	for (i = 0; i < MAX_BUFFERED_TUPLES && buffer->slots[i] != NULL; i++)
-		ExecDropSingleTupleTableSlot(buffer->slots[i]);
+	{
+		/* Since we only create slots on demand, just drop the non-null ones. */
+		for (i = 0; i < MAX_BUFFERED_TUPLES && buffer->slots[i] != NULL; i++)
+			ExecDropSingleTupleTableSlot(buffer->slots[i]);
 
-	if (resultRelInfo->ri_FdwRoutine == NULL)
-		table_finish_bulk_insert(resultRelInfo->ri_RelationDesc,
-								 miinfo->ti_options);
+		pfree(buffer->slots);
+	}
 
 	pfree(buffer);
 }
@@ -593,13 +586,25 @@ CopyMultiInsertInfoNextFreeSlot(CopyMultiInsertInfo *miinfo,
 {
 	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
 	int			nused = buffer->nused;
+	TupleTableSlot *slot;
 
 	Assert(buffer != NULL);
 	Assert(nused < MAX_BUFFERED_TUPLES);
 
-	if (buffer->slots[nused] == NULL)
-		buffer->slots[nused] = table_slot_create(rri->ri_RelationDesc, NULL);
-	return buffer->slots[nused];
+	if (rri->ri_FdwRoutine == NULL)
+		slot = table_multi_insert_next_free_slot(buffer->ti_state);
+	else
+	{
+		if (buffer->slots[nused] == NULL)
+		{
+			slot = table_slot_create(rri->ri_RelationDesc, NULL);
+			buffer->slots[nused] = slot;
+		}
+		else
+			slot = buffer->slots[nused];
+	}
+
+	return slot;
 }
 
 /*
@@ -615,6 +620,9 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
 	Assert(buffer != NULL);
 	Assert(slot == buffer->slots[buffer->nused]);
 
+	if (rri->ri_FdwRoutine == NULL)
+		table_multi_insert_v2(buffer->ti_state, slot);
+
 	/* Store the line number so we can properly report any errors later */
 	buffer->linenos[buffer->nused] = lineno;
 
-- 
2.34.1



^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2024-03-02 06:32  Bharath Rupireddy <[email protected]>
  parent: Bharath Rupireddy <[email protected]>
  0 siblings, 1 reply; 43+ messages in thread

From: Bharath Rupireddy @ 2024-03-02 06:32 UTC (permalink / raw)
  To: pgsql-hackers; +Cc: Andres Freund <[email protected]>; Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; Jeff Davis <[email protected]>; Michael Paquier <[email protected]>; Matthias van de Meent <[email protected]>

On Mon, Jan 29, 2024 at 5:16 PM Bharath Rupireddy
<[email protected]> wrote:
>
> > Please find the attached v9 patch set.

I've had to rebase the patches due to commit 874d817, please find the
attached v11 patch set.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


Attachments:

  [application/x-patch] v11-0001-New-TAMs-for-inserts.patch (15.9K, 2-v11-0001-New-TAMs-for-inserts.patch)
  download | inline diff:
From c338f541e01850fa4bb423e09acce618be9e21ba Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Fri, 1 Mar 2024 14:23:07 +0000
Subject: [PATCH v11 1/4] New TAMs for inserts

---
 src/backend/access/heap/heapam.c         | 224 +++++++++++++++++++++++
 src/backend/access/heap/heapam_handler.c |   9 +
 src/include/access/heapam.h              |  49 +++++
 src/include/access/tableam.h             | 138 ++++++++++++++
 4 files changed, 420 insertions(+)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 707460a536..7df305380e 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -68,6 +68,7 @@
 #include "utils/datum.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
+#include "utils/memutils.h"
 #include "utils/relcache.h"
 #include "utils/snapmgr.h"
 #include "utils/spccache.h"
@@ -2446,6 +2447,229 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 	pgstat_count_heap_insert(relation, ntuples);
 }
 
+/*
+ * Initialize state required for an insert a single tuple or multiple tuples
+ * into a heap.
+ */
+TableInsertState *
+heap_insert_begin(Relation rel, CommandId cid, int am_flags, int insert_flags)
+{
+	TableInsertState *tistate;
+
+	tistate = palloc0(sizeof(TableInsertState));
+	tistate->rel = rel;
+	tistate->cid = cid;
+	tistate->am_flags = am_flags;
+	tistate->insert_flags = insert_flags;
+
+	if ((am_flags & TABLEAM_MULTI_INSERTS) != 0 ||
+		(am_flags & TABLEAM_BULKWRITE_BUFFER_ACCESS_STRATEGY))
+		tistate->am_data = palloc0(sizeof(HeapInsertState));
+
+	if ((am_flags & TABLEAM_MULTI_INSERTS) != 0)
+	{
+		HeapMultiInsertState *mistate;
+
+		mistate = palloc0(sizeof(HeapMultiInsertState));
+		mistate->slots = palloc0(sizeof(TupleTableSlot *) * HEAP_MAX_BUFFERED_SLOTS);
+
+		mistate->context = AllocSetContextCreate(CurrentMemoryContext,
+												 "heap_multi_insert_v2 memory context",
+												 ALLOCSET_DEFAULT_SIZES);
+
+		((HeapInsertState *) tistate->am_data)->mistate = mistate;
+	}
+
+	if ((am_flags & TABLEAM_BULKWRITE_BUFFER_ACCESS_STRATEGY) != 0)
+		((HeapInsertState *) tistate->am_data)->bistate = GetBulkInsertState();
+
+	return tistate;
+}
+
+/*
+ * Insert a single tuple into a heap.
+ */
+void
+heap_insert_v2(TableInsertState * state, TupleTableSlot *slot)
+{
+	bool		shouldFree = true;
+	HeapTuple	tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);
+	BulkInsertState bistate = NULL;
+
+	Assert(state->am_data != NULL &&
+		   ((HeapInsertState *) state->am_data)->mistate == NULL);
+
+	/* Update tuple with table oid */
+	slot->tts_tableOid = RelationGetRelid(state->rel);
+	tuple->t_tableOid = slot->tts_tableOid;
+
+	if (state->am_data != NULL &&
+		((HeapInsertState *) state->am_data)->bistate != NULL)
+		bistate = ((HeapInsertState *) state->am_data)->bistate;
+
+	/* Perform insertion, and copy the resulting ItemPointer */
+	heap_insert(state->rel, tuple, state->cid, state->insert_flags,
+				bistate);
+	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
+
+	if (shouldFree)
+		pfree(tuple);
+}
+
+/*
+ * Create/return next free slot from multi-insert buffered slots array.
+ */
+TupleTableSlot *
+heap_multi_insert_next_free_slot(TableInsertState * state)
+{
+	TupleTableSlot *slot;
+	HeapMultiInsertState *mistate;
+
+	Assert(state->am_data != NULL &&
+		   ((HeapInsertState *) state->am_data)->mistate != NULL);
+
+	mistate = ((HeapInsertState *) state->am_data)->mistate;
+	slot = mistate->slots[mistate->cur_slots];
+
+	if (slot == NULL)
+	{
+		slot = table_slot_create(state->rel, NULL);
+		mistate->slots[mistate->cur_slots] = slot;
+	}
+	else
+		ExecClearTuple(slot);
+
+	return slot;
+}
+
+/*
+ * Store passed-in tuple into in-memory buffered slots. When full, insert
+ * multiple tuples from the buffers into heap.
+ */
+void
+heap_multi_insert_v2(TableInsertState * state, TupleTableSlot *slot)
+{
+	TupleTableSlot *dstslot;
+	HeapMultiInsertState *mistate;
+
+	Assert(state->am_data != NULL &&
+		   ((HeapInsertState *) state->am_data)->mistate != NULL);
+
+	mistate = ((HeapInsertState *) state->am_data)->mistate;
+	dstslot = mistate->slots[mistate->cur_slots];
+
+	if (dstslot == NULL)
+	{
+		dstslot = table_slot_create(state->rel, NULL);
+		mistate->slots[mistate->cur_slots] = dstslot;
+	}
+
+	/*
+	 * Caller may have got the slot using heap_multi_insert_next_free_slot,
+	 * filled it and passed. So, skip copying in such a case.
+	 */
+	if ((state->am_flags & TABLEAM_SKIP_MULTI_INSERTS_FLUSH) == 0)
+	{
+		ExecClearTuple(dstslot);
+		ExecCopySlot(dstslot, slot);
+	}
+	else
+		Assert(dstslot == slot);
+
+	mistate->cur_slots++;
+
+	/*
+	 * When passed-in slot is already materialized, memory allocated in slot's
+	 * memory context is a close approximation for us to track the required
+	 * space for the tuple in slot.
+	 *
+	 * For non-materialized slots, the flushing decision happens solely on the
+	 * number of tuples stored in the buffer.
+	 */
+	if (TTS_SHOULDFREE(slot))
+		mistate->cur_size += MemoryContextMemAllocated(slot->tts_mcxt, false);
+
+	if ((state->am_flags & TABLEAM_SKIP_MULTI_INSERTS_FLUSH) == 0 &&
+		(mistate->cur_slots >= HEAP_MAX_BUFFERED_SLOTS ||
+		 mistate->cur_size >= HEAP_MAX_BUFFERED_BYTES))
+		heap_multi_insert_flush(state);
+}
+
+/*
+ * Return pointer to multi-insert buffered slots array and number of currently
+ * occupied slots.
+ */
+TupleTableSlot **
+heap_multi_insert_slots(TableInsertState * state, int *num_slots)
+{
+	HeapMultiInsertState *mistate;
+
+	mistate = ((HeapInsertState *) state->am_data)->mistate;
+	*num_slots = mistate->cur_slots;
+
+	return mistate->slots;
+}
+
+/*
+ * Insert multiple tuples from in-memory buffered slots into heap.
+ */
+void
+heap_multi_insert_flush(TableInsertState * state)
+{
+	HeapMultiInsertState *mistate;
+	BulkInsertState bistate = NULL;
+	MemoryContext oldcontext;
+
+	mistate = ((HeapInsertState *) state->am_data)->mistate;
+
+	if (state->am_data != NULL &&
+		((HeapInsertState *) state->am_data)->bistate != NULL)
+		bistate = ((HeapInsertState *) state->am_data)->bistate;
+
+	oldcontext = MemoryContextSwitchTo(mistate->context);
+	heap_multi_insert(state->rel, mistate->slots, mistate->cur_slots,
+					  state->cid, state->insert_flags, bistate);
+	MemoryContextSwitchTo(oldcontext);
+	MemoryContextReset(mistate->context);
+
+	mistate->cur_slots = 0;
+	mistate->cur_size = 0;
+}
+
+/*
+ * Clean up state used to insert a single or multiple tuples into a heap.
+ */
+void
+heap_insert_end(TableInsertState * state)
+{
+	if (state->am_data != NULL &&
+		((HeapInsertState *) state->am_data)->mistate != NULL)
+	{
+		HeapMultiInsertState *mistate =
+			((HeapInsertState *) state->am_data)->mistate;
+
+		/* Insert remaining tuples from multi-insert buffers */
+		if (mistate->cur_slots > 0 || mistate->cur_size > 0)
+			heap_multi_insert_flush(state);
+
+		MemoryContextDelete(mistate->context);
+
+		for (int i = 0; i < HEAP_MAX_BUFFERED_SLOTS && mistate->slots[i] != NULL; i++)
+			ExecDropSingleTupleTableSlot(mistate->slots[i]);
+
+		pfree(mistate);
+		((HeapInsertState *) state->am_data)->mistate = NULL;
+	}
+
+	if (state->am_data != NULL &&
+		((HeapInsertState *) state->am_data)->bistate != NULL)
+		FreeBulkInsertState(((HeapInsertState *) state->am_data)->bistate);
+
+	pfree(state->am_data);
+	state->am_data = NULL;
+	pfree(state);
+}
+
 /*
  *	simple_heap_insert - insert a tuple
  *
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 680a50bf8b..84793f324e 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2562,6 +2562,15 @@ static const TableAmRoutine heapam_methods = {
 	.tuple_insert_speculative = heapam_tuple_insert_speculative,
 	.tuple_complete_speculative = heapam_tuple_complete_speculative,
 	.multi_insert = heap_multi_insert,
+
+	.tuple_insert_begin = heap_insert_begin,
+	.tuple_insert_v2 = heap_insert_v2,
+	.tuple_multi_insert_next_free_slot = heap_multi_insert_next_free_slot,
+	.tuple_multi_insert_v2 = heap_multi_insert_v2,
+	.tuple_multi_insert_slots = heap_multi_insert_slots,
+	.tuple_multi_insert_flush = heap_multi_insert_flush,
+	.tuple_insert_end = heap_insert_end,
+
 	.tuple_delete = heapam_tuple_delete,
 	.tuple_update = heapam_tuple_update,
 	.tuple_lock = heapam_tuple_lock,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4b133f6859..053be18110 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -225,6 +225,40 @@ htsv_get_valid_status(int status)
 	return (HTSV_Result) status;
 }
 
+/*
+ * Maximum number of slots that multi-insert buffers can hold.
+ *
+ * Caution: Don't make this too big, as we could end up with this many tuples
+ * stored in multi insert buffer. For instance, increasing this can cause
+ * quadratic growth in memory requirements during copies into partitioned
+ * tables with a large number of partitions.
+ */
+#define HEAP_MAX_BUFFERED_SLOTS		1000
+
+/* Maximum size of all tuples that multi-insert buffers can hold */
+#define HEAP_MAX_BUFFERED_BYTES		65535
+
+typedef struct HeapMultiInsertState
+{
+	/* Memory context to use for flushing multi-insert buffers */
+	MemoryContext context;
+
+	/* Array of buffered slots */
+	TupleTableSlot **slots;
+
+	/* Number of slots that multi-insert buffers currently hold */
+	int			cur_slots;
+
+	/* Size of all tuples that multi-insert buffers currently hold */
+	Size		cur_size;
+}			HeapMultiInsertState;
+
+typedef struct HeapInsertState
+{
+	struct BulkInsertStateData *bistate;
+	HeapMultiInsertState *mistate;
+}			HeapInsertState;
+
 /* ----------------
  *		function prototypes for heap access method
  *
@@ -275,6 +309,21 @@ extern void heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 extern void heap_multi_insert(Relation relation, struct TupleTableSlot **slots,
 							  int ntuples, CommandId cid, int options,
 							  BulkInsertState bistate);
+
+extern TableInsertState * heap_insert_begin(Relation rel,
+											CommandId cid,
+											int am_flags,
+											int insert_flags);
+extern void heap_insert_v2(TableInsertState * state,
+						   TupleTableSlot *slot);
+extern TupleTableSlot *heap_multi_insert_next_free_slot(TableInsertState * state);
+extern void heap_multi_insert_v2(TableInsertState * state,
+								 TupleTableSlot *slot);
+extern TupleTableSlot **heap_multi_insert_slots(TableInsertState * state,
+												int *num_slots);
+extern void heap_multi_insert_flush(TableInsertState * state);
+extern void heap_insert_end(TableInsertState * state);
+
 extern TM_Result heap_delete(Relation relation, ItemPointer tid,
 							 CommandId cid, Snapshot crosscheck, bool wait,
 							 struct TM_FailureData *tmfd, bool changingPart);
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 5f8474871d..834de15b9b 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -247,6 +247,43 @@ typedef struct TM_IndexDeleteOp
 	TM_IndexStatus *status;
 } TM_IndexDeleteOp;
 
+/* Use multi inserts, i.e. buffer multiple tuples and insert them at once */
+#define TABLEAM_MULTI_INSERTS 0x000001
+
+/* Use BAS_BULKWRITE buffer access strategy */
+#define TABLEAM_BULKWRITE_BUFFER_ACCESS_STRATEGY 0x000002
+
+/*
+ * Skip flushing buffered tuples automatically. Responsibility lies with the
+ * caller to flush the buffered tuples.
+ */
+#define TABLEAM_SKIP_MULTI_INSERTS_FLUSH 0x000004
+
+
+/* Holds table insert state. */
+typedef struct TableInsertState
+{
+	/* Table AM-agnostic data starts here */
+
+	Relation	rel;			/* Target relation */
+
+	/*
+	 * Command ID for this insertion. If required, change this for each pass
+	 * of insert functions.
+	 */
+	CommandId	cid;
+
+	/* Table AM options (TABLEAM_XXX macros) */
+	int			am_flags;
+
+	/* table_tuple_insert performance options (TABLE_INSERT_XXX macros) */
+	int			insert_flags;
+
+	/* Table AM specific data starts here */
+
+	void	   *am_data;
+}			TableInsertState;
+
 /* "options" flag bits for table_tuple_insert */
 /* TABLE_INSERT_SKIP_WAL was 0x0001; RelationNeedsWAL() now governs */
 #define TABLE_INSERT_SKIP_FSM		0x0002
@@ -522,6 +559,20 @@ typedef struct TableAmRoutine
 	void		(*multi_insert) (Relation rel, TupleTableSlot **slots, int nslots,
 								 CommandId cid, int options, struct BulkInsertStateData *bistate);
 
+	TableInsertState *(*tuple_insert_begin) (Relation rel,
+											 CommandId cid,
+											 int am_flags,
+											 int insert_flags);
+	void		(*tuple_insert_v2) (TableInsertState * state,
+									TupleTableSlot *slot);
+	void		(*tuple_multi_insert_v2) (TableInsertState * state,
+										  TupleTableSlot *slot);
+	TupleTableSlot *(*tuple_multi_insert_next_free_slot) (TableInsertState * state);
+	TupleTableSlot **(*tuple_multi_insert_slots) (TableInsertState * state,
+												  int *num_slots);
+	void		(*tuple_multi_insert_flush) (TableInsertState * state);
+	void		(*tuple_insert_end) (TableInsertState * state);
+
 	/* see table_tuple_delete() for reference about parameters */
 	TM_Result	(*tuple_delete) (Relation rel,
 								 ItemPointer tid,
@@ -1456,6 +1507,93 @@ table_multi_insert(Relation rel, TupleTableSlot **slots, int nslots,
 								  cid, options, bistate);
 }
 
+static inline TableInsertState *
+table_insert_begin(Relation rel, CommandId cid, int am_flags,
+				   int insert_flags)
+{
+	if (rel->rd_tableam && rel->rd_tableam->tuple_insert_begin)
+		return rel->rd_tableam->tuple_insert_begin(rel, cid, am_flags,
+												   insert_flags);
+	else
+	{
+		elog(ERROR, "table_insert_begin access method is not implemented for relation \"%s\"",
+			 RelationGetRelationName(rel));
+		return NULL;			/* keep compiler quiet */
+	}
+}
+
+static inline void
+table_tuple_insert_v2(TableInsertState * state, TupleTableSlot *slot)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_insert_v2)
+		state->rel->rd_tableam->tuple_insert_v2(state, slot);
+	else
+		elog(ERROR, "table_tuple_insert_v2 access method is not implemented for relation \"%s\"",
+			 RelationGetRelationName(state->rel));
+}
+
+static inline void
+table_multi_insert_v2(TableInsertState * state, TupleTableSlot *slot)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_multi_insert_v2)
+		state->rel->rd_tableam->tuple_multi_insert_v2(state, slot);
+	else
+		elog(ERROR, "table_multi_insert_v2 access method is not implemented for relation \"%s\"",
+			 RelationGetRelationName(state->rel));
+}
+
+static inline TupleTableSlot *
+table_multi_insert_next_free_slot(TableInsertState * state)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_multi_insert_next_free_slot)
+		return state->rel->rd_tableam->tuple_multi_insert_next_free_slot(state);
+	else
+	{
+		elog(ERROR, "table_multi_insert_next_free_slot access method is not implemented for relation \"%s\"",
+			 RelationGetRelationName(state->rel));
+		return NULL;			/* keep compiler quiet */
+	}
+}
+
+static inline TupleTableSlot **
+table_multi_insert_slots(TableInsertState * state, int *num_slots)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_multi_insert_slots)
+		return state->rel->rd_tableam->tuple_multi_insert_slots(state, num_slots);
+	else
+	{
+		elog(ERROR, "table_multi_insert_slots access method is not implemented for relation \"%s\"",
+			 RelationGetRelationName(state->rel));
+		return NULL;			/* keep compiler quiet */
+	}
+}
+
+static inline void
+table_multi_insert_flush(TableInsertState * state)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_multi_insert_flush)
+		state->rel->rd_tableam->tuple_multi_insert_flush(state);
+	else
+		elog(ERROR, "table_multi_insert_flush access method is not implemented for relation \"%s\"",
+			 RelationGetRelationName(state->rel));
+}
+
+static inline void
+table_insert_end(TableInsertState * state)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_insert_end)
+		state->rel->rd_tableam->tuple_insert_end(state);
+	else
+		elog(ERROR, "table_insert_end access method is not implemented for relation \"%s\"",
+			 RelationGetRelationName(state->rel));
+}
+
 /*
  * Delete a tuple.
  *
-- 
2.34.1



  [application/x-patch] v11-0002-Optimize-CTAS-with-multi-inserts.patch (2.7K, 3-v11-0002-Optimize-CTAS-with-multi-inserts.patch)
  download | inline diff:
From 3ee91fcf1ba848ee07993af51fba3b6a8a2714b5 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Fri, 1 Mar 2024 14:27:08 +0000
Subject: [PATCH v11 2/4] Optimize CTAS with multi inserts

---
 src/backend/commands/createas.c | 25 +++++++++----------------
 1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 16a2fe65e6..3a02ea9578 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -58,9 +58,7 @@ typedef struct
 	/* These fields are filled by intorel_startup: */
 	Relation	rel;			/* relation to write to */
 	ObjectAddress reladdr;		/* address of rel, for ExecCreateTableAs */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableInsertState *ti_state; /* table insert state */
 } DR_intorel;
 
 /* utility functions for CTAS definition creation */
@@ -557,17 +555,19 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	 */
 	myState->rel = intoRelationDesc;
 	myState->reladdr = intoRelationAddr;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM;
 
 	/*
 	 * If WITH NO DATA is specified, there is no need to set up the state for
 	 * bulk inserts as there are no tuples to insert.
 	 */
 	if (!into->skipData)
-		myState->bistate = GetBulkInsertState();
+		myState->ti_state = table_insert_begin(intoRelationDesc,
+											   GetCurrentCommandId(true),
+											   TABLEAM_MULTI_INSERTS |
+											   TABLEAM_BULKWRITE_BUFFER_ACCESS_STRATEGY,
+											   TABLE_INSERT_SKIP_FSM);
 	else
-		myState->bistate = NULL;
+		myState->ti_state = NULL;
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -595,11 +595,7 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
 		 * would not be cheap either. This also doesn't allow accessing per-AM
 		 * data (say a tuple's xmin), but since we don't do that here...
 		 */
-		table_tuple_insert(myState->rel,
-						   slot,
-						   myState->output_cid,
-						   myState->ti_options,
-						   myState->bistate);
+		table_multi_insert_v2(myState->ti_state, slot);
 	}
 
 	/* We know this is a newly created relation, so there are no indexes */
@@ -617,10 +613,7 @@ intorel_shutdown(DestReceiver *self)
 	IntoClause *into = myState->into;
 
 	if (!into->skipData)
-	{
-		FreeBulkInsertState(myState->bistate);
-		table_finish_bulk_insert(myState->rel, myState->ti_options);
-	}
+		table_insert_end(myState->ti_state);
 
 	/* close rel, but keep lock until commit */
 	table_close(myState->rel, NoLock);
-- 
2.34.1



  [application/x-patch] v11-0003-Optimize-RMV-with-multi-inserts.patch (2.9K, 4-v11-0003-Optimize-RMV-with-multi-inserts.patch)
  download | inline diff:
From 623cbb495fb58a07de5de2884c5cd19059abbb9b Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Fri, 1 Mar 2024 14:27:40 +0000
Subject: [PATCH v11 3/4] Optimize RMV with multi inserts

---
 src/backend/commands/matview.c | 34 ++++++++++++----------------------
 1 file changed, 12 insertions(+), 22 deletions(-)

diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 59920ced83..6a8834e84b 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -52,10 +52,7 @@ typedef struct
 	DestReceiver pub;			/* publicly-known function pointers */
 	Oid			transientoid;	/* OID of new heap into which to store */
 	/* These fields are filled by transientrel_startup: */
-	Relation	transientrel;	/* relation to write to */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableInsertState *ti_state; /* table insert state */
 } DR_transientrel;
 
 static int	matview_maintenance_depth = 0;
@@ -457,13 +454,13 @@ transientrel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 
 	transientrel = table_open(myState->transientoid, NoLock);
 
-	/*
-	 * Fill private fields of myState for use by later routines
-	 */
-	myState->transientrel = transientrel;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM | TABLE_INSERT_FROZEN;
-	myState->bistate = GetBulkInsertState();
+	/* Fill private fields of myState for use by later routines */
+	myState->ti_state = table_insert_begin(transientrel,
+										   GetCurrentCommandId(true),
+										   TABLEAM_MULTI_INSERTS |
+										   TABLEAM_BULKWRITE_BUFFER_ACCESS_STRATEGY,
+										   TABLE_INSERT_SKIP_FSM |
+										   TABLE_INSERT_FROZEN);
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -488,12 +485,7 @@ transientrel_receive(TupleTableSlot *slot, DestReceiver *self)
 	 * cheap either. This also doesn't allow accessing per-AM data (say a
 	 * tuple's xmin), but since we don't do that here...
 	 */
-
-	table_tuple_insert(myState->transientrel,
-					   slot,
-					   myState->output_cid,
-					   myState->ti_options,
-					   myState->bistate);
+	table_multi_insert_v2(myState->ti_state, slot);
 
 	/* We know this is a newly created relation, so there are no indexes */
 
@@ -507,14 +499,12 @@ static void
 transientrel_shutdown(DestReceiver *self)
 {
 	DR_transientrel *myState = (DR_transientrel *) self;
+	Relation	transientrel = myState->ti_state->rel;
 
-	FreeBulkInsertState(myState->bistate);
-
-	table_finish_bulk_insert(myState->transientrel, myState->ti_options);
+	table_insert_end(myState->ti_state);
 
 	/* close transientrel, but keep lock until commit */
-	table_close(myState->transientrel, NoLock);
-	myState->transientrel = NULL;
+	table_close(transientrel, NoLock);
 }
 
 /*
-- 
2.34.1



  [application/x-patch] v11-0004-Use-new-multi-insert-TAM-for-COPY-FROM.patch (6.3K, 5-v11-0004-Use-new-multi-insert-TAM-for-COPY-FROM.patch)
  download | inline diff:
From aa0965b2887cd07f5fe3373dcc2509659a9b2fa2 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Fri, 1 Mar 2024 14:28:58 +0000
Subject: [PATCH v11 4/4] Use new multi insert TAM for COPY FROM

---
 src/backend/commands/copyfrom.c | 92 ++++++++++++++++++---------------
 1 file changed, 50 insertions(+), 42 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index c3bc897028..a18cf718cb 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -77,10 +77,9 @@
 /* Stores multi-insert data related to a single relation in CopyFrom. */
 typedef struct CopyMultiInsertBuffer
 {
-	TupleTableSlot *slots[MAX_BUFFERED_TUPLES]; /* Array to store tuples */
+	TableInsertState *ti_state; /* Table insert state; NULL if foreign table */
+	TupleTableSlot **slots;		/* Array to store tuples */
 	ResultRelInfo *resultRelInfo;	/* ResultRelInfo for 'relid' */
-	BulkInsertState bistate;	/* BulkInsertState for this rel if plain
-								 * table; NULL if foreign table */
 	int			nused;			/* number of 'slots' containing tuples */
 	uint64		linenos[MAX_BUFFERED_TUPLES];	/* Line # of tuple in copy
 												 * stream */
@@ -223,14 +222,31 @@ limit_printout_length(const char *str)
  * ResultRelInfo.
  */
 static CopyMultiInsertBuffer *
-CopyMultiInsertBufferInit(ResultRelInfo *rri)
+CopyMultiInsertBufferInit(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri)
 {
 	CopyMultiInsertBuffer *buffer;
 
 	buffer = (CopyMultiInsertBuffer *) palloc(sizeof(CopyMultiInsertBuffer));
-	memset(buffer->slots, 0, sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
+
+	if (rri->ri_FdwRoutine == NULL)
+	{
+		int			num_slots;
+
+		buffer->ti_state = table_insert_begin(rri->ri_RelationDesc,
+											  miinfo->mycid,
+											  TABLEAM_MULTI_INSERTS |
+											  TABLEAM_BULKWRITE_BUFFER_ACCESS_STRATEGY |
+											  TABLEAM_SKIP_MULTI_INSERTS_FLUSH,
+											  miinfo->ti_options);
+		buffer->slots = table_multi_insert_slots(buffer->ti_state, &num_slots);
+	}
+	else
+	{
+		buffer->slots = palloc0(sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
+		buffer->ti_state = NULL;
+	}
+
 	buffer->resultRelInfo = rri;
-	buffer->bistate = (rri->ri_FdwRoutine == NULL) ? GetBulkInsertState() : NULL;
 	buffer->nused = 0;
 
 	return buffer;
@@ -245,7 +261,7 @@ CopyMultiInsertInfoSetupBuffer(CopyMultiInsertInfo *miinfo,
 {
 	CopyMultiInsertBuffer *buffer;
 
-	buffer = CopyMultiInsertBufferInit(rri);
+	buffer = CopyMultiInsertBufferInit(miinfo, rri);
 
 	/* Setup back-link so we can easily find this buffer again */
 	rri->ri_CopyMultiInsertBuffer = buffer;
@@ -322,8 +338,6 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 		int			batch_size = resultRelInfo->ri_BatchSize;
 		int			sent = 0;
 
-		Assert(buffer->bistate == NULL);
-
 		/* Ensure that the FDW supports batching and it's enabled */
 		Assert(resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert);
 		Assert(batch_size > 1);
@@ -395,13 +409,8 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	}
 	else
 	{
-		CommandId	mycid = miinfo->mycid;
-		int			ti_options = miinfo->ti_options;
 		bool		line_buf_valid = cstate->line_buf_valid;
 		uint64		save_cur_lineno = cstate->cur_lineno;
-		MemoryContext oldcontext;
-
-		Assert(buffer->bistate != NULL);
 
 		/*
 		 * Print error context information correctly, if one of the operations
@@ -409,18 +418,7 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 		 */
 		cstate->line_buf_valid = false;
 
-		/*
-		 * table_multi_insert may leak memory, so switch to short-lived memory
-		 * context before calling it.
-		 */
-		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-		table_multi_insert(resultRelInfo->ri_RelationDesc,
-						   slots,
-						   nused,
-						   mycid,
-						   ti_options,
-						   buffer->bistate);
-		MemoryContextSwitchTo(oldcontext);
+		table_multi_insert_flush(buffer->ti_state);
 
 		for (i = 0; i < nused; i++)
 		{
@@ -435,7 +433,7 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 				cstate->cur_lineno = buffer->linenos[i];
 				recheckIndexes =
 					ExecInsertIndexTuples(resultRelInfo,
-										  buffer->slots[i], estate, false,
+										  slots[i], estate, false,
 										  false, NULL, NIL, false);
 				ExecARInsertTriggers(estate, resultRelInfo,
 									 slots[i], recheckIndexes,
@@ -493,20 +491,15 @@ CopyMultiInsertBufferCleanup(CopyMultiInsertInfo *miinfo,
 	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
 
 	if (resultRelInfo->ri_FdwRoutine == NULL)
-	{
-		Assert(buffer->bistate != NULL);
-		FreeBulkInsertState(buffer->bistate);
-	}
+		table_insert_end(buffer->ti_state);
 	else
-		Assert(buffer->bistate == NULL);
-
-	/* Since we only create slots on demand, just drop the non-null ones. */
-	for (i = 0; i < MAX_BUFFERED_TUPLES && buffer->slots[i] != NULL; i++)
-		ExecDropSingleTupleTableSlot(buffer->slots[i]);
+	{
+		/* Since we only create slots on demand, just drop the non-null ones. */
+		for (i = 0; i < MAX_BUFFERED_TUPLES && buffer->slots[i] != NULL; i++)
+			ExecDropSingleTupleTableSlot(buffer->slots[i]);
 
-	if (resultRelInfo->ri_FdwRoutine == NULL)
-		table_finish_bulk_insert(resultRelInfo->ri_RelationDesc,
-								 miinfo->ti_options);
+		pfree(buffer->slots);
+	}
 
 	pfree(buffer);
 }
@@ -593,13 +586,25 @@ CopyMultiInsertInfoNextFreeSlot(CopyMultiInsertInfo *miinfo,
 {
 	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
 	int			nused = buffer->nused;
+	TupleTableSlot *slot;
 
 	Assert(buffer != NULL);
 	Assert(nused < MAX_BUFFERED_TUPLES);
 
-	if (buffer->slots[nused] == NULL)
-		buffer->slots[nused] = table_slot_create(rri->ri_RelationDesc, NULL);
-	return buffer->slots[nused];
+	if (rri->ri_FdwRoutine == NULL)
+		slot = table_multi_insert_next_free_slot(buffer->ti_state);
+	else
+	{
+		if (buffer->slots[nused] == NULL)
+		{
+			slot = table_slot_create(rri->ri_RelationDesc, NULL);
+			buffer->slots[nused] = slot;
+		}
+		else
+			slot = buffer->slots[nused];
+	}
+
+	return slot;
 }
 
 /*
@@ -615,6 +620,9 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
 	Assert(buffer != NULL);
 	Assert(slot == buffer->slots[buffer->nused]);
 
+	if (rri->ri_FdwRoutine == NULL)
+		table_multi_insert_v2(buffer->ti_state, slot);
+
 	/* Store the line number so we can properly report any errors later */
 	buffer->linenos[buffer->nused] = lineno;
 
-- 
2.34.1



^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2024-03-08 10:36  Bharath Rupireddy <[email protected]>
  parent: Bharath Rupireddy <[email protected]>
  0 siblings, 1 reply; 43+ messages in thread

From: Bharath Rupireddy @ 2024-03-08 10:36 UTC (permalink / raw)
  To: pgsql-hackers; +Cc: Andres Freund <[email protected]>; Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; Jeff Davis <[email protected]>; Michael Paquier <[email protected]>; Matthias van de Meent <[email protected]>

On Sat, Mar 2, 2024 at 12:02 PM Bharath Rupireddy
<[email protected]> wrote:
>
> On Mon, Jan 29, 2024 at 5:16 PM Bharath Rupireddy
> <[email protected]> wrote:
> >
> > > Please find the attached v9 patch set.
>
> I've had to rebase the patches due to commit 874d817, please find the
> attached v11 patch set.

Rebase needed. Please see the v12 patch set.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


Attachments:

  [application/x-patch] v12-0001-New-TAMs-for-inserts.patch (15.9K, 2-v12-0001-New-TAMs-for-inserts.patch)
  download | inline diff:
From 8a3552e65e62afc40db99fbd7bf4f98990d45390 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Fri, 8 Mar 2024 10:11:17 +0000
Subject: [PATCH v12 1/4] New TAMs for inserts

---
 src/backend/access/heap/heapam.c         | 224 +++++++++++++++++++++++
 src/backend/access/heap/heapam_handler.c |   9 +
 src/include/access/heapam.h              |  49 +++++
 src/include/access/tableam.h             | 138 ++++++++++++++
 4 files changed, 420 insertions(+)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 34bc60f625..497940d74a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -64,6 +64,7 @@
 #include "storage/standby.h"
 #include "utils/datum.h"
 #include "utils/inval.h"
+#include "utils/memutils.h"
 #include "utils/relcache.h"
 #include "utils/snapmgr.h"
 #include "utils/spccache.h"
@@ -2442,6 +2443,229 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 	pgstat_count_heap_insert(relation, ntuples);
 }
 
+/*
+ * Initialize state required for an insert a single tuple or multiple tuples
+ * into a heap.
+ */
+TableInsertState *
+heap_insert_begin(Relation rel, CommandId cid, int am_flags, int insert_flags)
+{
+	TableInsertState *tistate;
+
+	tistate = palloc0(sizeof(TableInsertState));
+	tistate->rel = rel;
+	tistate->cid = cid;
+	tistate->am_flags = am_flags;
+	tistate->insert_flags = insert_flags;
+
+	if ((am_flags & TABLEAM_MULTI_INSERTS) != 0 ||
+		(am_flags & TABLEAM_BULKWRITE_BUFFER_ACCESS_STRATEGY))
+		tistate->am_data = palloc0(sizeof(HeapInsertState));
+
+	if ((am_flags & TABLEAM_MULTI_INSERTS) != 0)
+	{
+		HeapMultiInsertState *mistate;
+
+		mistate = palloc0(sizeof(HeapMultiInsertState));
+		mistate->slots = palloc0(sizeof(TupleTableSlot *) * HEAP_MAX_BUFFERED_SLOTS);
+
+		mistate->context = AllocSetContextCreate(CurrentMemoryContext,
+												 "heap_multi_insert_v2 memory context",
+												 ALLOCSET_DEFAULT_SIZES);
+
+		((HeapInsertState *) tistate->am_data)->mistate = mistate;
+	}
+
+	if ((am_flags & TABLEAM_BULKWRITE_BUFFER_ACCESS_STRATEGY) != 0)
+		((HeapInsertState *) tistate->am_data)->bistate = GetBulkInsertState();
+
+	return tistate;
+}
+
+/*
+ * Insert a single tuple into a heap.
+ */
+void
+heap_insert_v2(TableInsertState * state, TupleTableSlot *slot)
+{
+	bool		shouldFree = true;
+	HeapTuple	tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);
+	BulkInsertState bistate = NULL;
+
+	Assert(state->am_data != NULL &&
+		   ((HeapInsertState *) state->am_data)->mistate == NULL);
+
+	/* Update tuple with table oid */
+	slot->tts_tableOid = RelationGetRelid(state->rel);
+	tuple->t_tableOid = slot->tts_tableOid;
+
+	if (state->am_data != NULL &&
+		((HeapInsertState *) state->am_data)->bistate != NULL)
+		bistate = ((HeapInsertState *) state->am_data)->bistate;
+
+	/* Perform insertion, and copy the resulting ItemPointer */
+	heap_insert(state->rel, tuple, state->cid, state->insert_flags,
+				bistate);
+	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
+
+	if (shouldFree)
+		pfree(tuple);
+}
+
+/*
+ * Create/return next free slot from multi-insert buffered slots array.
+ */
+TupleTableSlot *
+heap_multi_insert_next_free_slot(TableInsertState * state)
+{
+	TupleTableSlot *slot;
+	HeapMultiInsertState *mistate;
+
+	Assert(state->am_data != NULL &&
+		   ((HeapInsertState *) state->am_data)->mistate != NULL);
+
+	mistate = ((HeapInsertState *) state->am_data)->mistate;
+	slot = mistate->slots[mistate->cur_slots];
+
+	if (slot == NULL)
+	{
+		slot = table_slot_create(state->rel, NULL);
+		mistate->slots[mistate->cur_slots] = slot;
+	}
+	else
+		ExecClearTuple(slot);
+
+	return slot;
+}
+
+/*
+ * Store passed-in tuple into in-memory buffered slots. When full, insert
+ * multiple tuples from the buffers into heap.
+ */
+void
+heap_multi_insert_v2(TableInsertState * state, TupleTableSlot *slot)
+{
+	TupleTableSlot *dstslot;
+	HeapMultiInsertState *mistate;
+
+	Assert(state->am_data != NULL &&
+		   ((HeapInsertState *) state->am_data)->mistate != NULL);
+
+	mistate = ((HeapInsertState *) state->am_data)->mistate;
+	dstslot = mistate->slots[mistate->cur_slots];
+
+	if (dstslot == NULL)
+	{
+		dstslot = table_slot_create(state->rel, NULL);
+		mistate->slots[mistate->cur_slots] = dstslot;
+	}
+
+	/*
+	 * Caller may have got the slot using heap_multi_insert_next_free_slot,
+	 * filled it and passed. So, skip copying in such a case.
+	 */
+	if ((state->am_flags & TABLEAM_SKIP_MULTI_INSERTS_FLUSH) == 0)
+	{
+		ExecClearTuple(dstslot);
+		ExecCopySlot(dstslot, slot);
+	}
+	else
+		Assert(dstslot == slot);
+
+	mistate->cur_slots++;
+
+	/*
+	 * When passed-in slot is already materialized, memory allocated in slot's
+	 * memory context is a close approximation for us to track the required
+	 * space for the tuple in slot.
+	 *
+	 * For non-materialized slots, the flushing decision happens solely on the
+	 * number of tuples stored in the buffer.
+	 */
+	if (TTS_SHOULDFREE(slot))
+		mistate->cur_size += MemoryContextMemAllocated(slot->tts_mcxt, false);
+
+	if ((state->am_flags & TABLEAM_SKIP_MULTI_INSERTS_FLUSH) == 0 &&
+		(mistate->cur_slots >= HEAP_MAX_BUFFERED_SLOTS ||
+		 mistate->cur_size >= HEAP_MAX_BUFFERED_BYTES))
+		heap_multi_insert_flush(state);
+}
+
+/*
+ * Return pointer to multi-insert buffered slots array and number of currently
+ * occupied slots.
+ */
+TupleTableSlot **
+heap_multi_insert_slots(TableInsertState * state, int *num_slots)
+{
+	HeapMultiInsertState *mistate;
+
+	mistate = ((HeapInsertState *) state->am_data)->mistate;
+	*num_slots = mistate->cur_slots;
+
+	return mistate->slots;
+}
+
+/*
+ * Insert multiple tuples from in-memory buffered slots into heap.
+ */
+void
+heap_multi_insert_flush(TableInsertState * state)
+{
+	HeapMultiInsertState *mistate;
+	BulkInsertState bistate = NULL;
+	MemoryContext oldcontext;
+
+	mistate = ((HeapInsertState *) state->am_data)->mistate;
+
+	if (state->am_data != NULL &&
+		((HeapInsertState *) state->am_data)->bistate != NULL)
+		bistate = ((HeapInsertState *) state->am_data)->bistate;
+
+	oldcontext = MemoryContextSwitchTo(mistate->context);
+	heap_multi_insert(state->rel, mistate->slots, mistate->cur_slots,
+					  state->cid, state->insert_flags, bistate);
+	MemoryContextSwitchTo(oldcontext);
+	MemoryContextReset(mistate->context);
+
+	mistate->cur_slots = 0;
+	mistate->cur_size = 0;
+}
+
+/*
+ * Clean up state used to insert a single or multiple tuples into a heap.
+ */
+void
+heap_insert_end(TableInsertState * state)
+{
+	if (state->am_data != NULL &&
+		((HeapInsertState *) state->am_data)->mistate != NULL)
+	{
+		HeapMultiInsertState *mistate =
+			((HeapInsertState *) state->am_data)->mistate;
+
+		/* Insert remaining tuples from multi-insert buffers */
+		if (mistate->cur_slots > 0 || mistate->cur_size > 0)
+			heap_multi_insert_flush(state);
+
+		MemoryContextDelete(mistate->context);
+
+		for (int i = 0; i < HEAP_MAX_BUFFERED_SLOTS && mistate->slots[i] != NULL; i++)
+			ExecDropSingleTupleTableSlot(mistate->slots[i]);
+
+		pfree(mistate);
+		((HeapInsertState *) state->am_data)->mistate = NULL;
+	}
+
+	if (state->am_data != NULL &&
+		((HeapInsertState *) state->am_data)->bistate != NULL)
+		FreeBulkInsertState(((HeapInsertState *) state->am_data)->bistate);
+
+	pfree(state->am_data);
+	state->am_data = NULL;
+	pfree(state);
+}
+
 /*
  *	simple_heap_insert - insert a tuple
  *
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 680a50bf8b..84793f324e 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2562,6 +2562,15 @@ static const TableAmRoutine heapam_methods = {
 	.tuple_insert_speculative = heapam_tuple_insert_speculative,
 	.tuple_complete_speculative = heapam_tuple_complete_speculative,
 	.multi_insert = heap_multi_insert,
+
+	.tuple_insert_begin = heap_insert_begin,
+	.tuple_insert_v2 = heap_insert_v2,
+	.tuple_multi_insert_next_free_slot = heap_multi_insert_next_free_slot,
+	.tuple_multi_insert_v2 = heap_multi_insert_v2,
+	.tuple_multi_insert_slots = heap_multi_insert_slots,
+	.tuple_multi_insert_flush = heap_multi_insert_flush,
+	.tuple_insert_end = heap_insert_end,
+
 	.tuple_delete = heapam_tuple_delete,
 	.tuple_update = heapam_tuple_update,
 	.tuple_lock = heapam_tuple_lock,
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4b133f6859..053be18110 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -225,6 +225,40 @@ htsv_get_valid_status(int status)
 	return (HTSV_Result) status;
 }
 
+/*
+ * Maximum number of slots that multi-insert buffers can hold.
+ *
+ * Caution: Don't make this too big, as we could end up with this many tuples
+ * stored in multi insert buffer. For instance, increasing this can cause
+ * quadratic growth in memory requirements during copies into partitioned
+ * tables with a large number of partitions.
+ */
+#define HEAP_MAX_BUFFERED_SLOTS		1000
+
+/* Maximum size of all tuples that multi-insert buffers can hold */
+#define HEAP_MAX_BUFFERED_BYTES		65535
+
+typedef struct HeapMultiInsertState
+{
+	/* Memory context to use for flushing multi-insert buffers */
+	MemoryContext context;
+
+	/* Array of buffered slots */
+	TupleTableSlot **slots;
+
+	/* Number of slots that multi-insert buffers currently hold */
+	int			cur_slots;
+
+	/* Size of all tuples that multi-insert buffers currently hold */
+	Size		cur_size;
+}			HeapMultiInsertState;
+
+typedef struct HeapInsertState
+{
+	struct BulkInsertStateData *bistate;
+	HeapMultiInsertState *mistate;
+}			HeapInsertState;
+
 /* ----------------
  *		function prototypes for heap access method
  *
@@ -275,6 +309,21 @@ extern void heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 extern void heap_multi_insert(Relation relation, struct TupleTableSlot **slots,
 							  int ntuples, CommandId cid, int options,
 							  BulkInsertState bistate);
+
+extern TableInsertState * heap_insert_begin(Relation rel,
+											CommandId cid,
+											int am_flags,
+											int insert_flags);
+extern void heap_insert_v2(TableInsertState * state,
+						   TupleTableSlot *slot);
+extern TupleTableSlot *heap_multi_insert_next_free_slot(TableInsertState * state);
+extern void heap_multi_insert_v2(TableInsertState * state,
+								 TupleTableSlot *slot);
+extern TupleTableSlot **heap_multi_insert_slots(TableInsertState * state,
+												int *num_slots);
+extern void heap_multi_insert_flush(TableInsertState * state);
+extern void heap_insert_end(TableInsertState * state);
+
 extern TM_Result heap_delete(Relation relation, ItemPointer tid,
 							 CommandId cid, Snapshot crosscheck, bool wait,
 							 struct TM_FailureData *tmfd, bool changingPart);
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 5f8474871d..834de15b9b 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -247,6 +247,43 @@ typedef struct TM_IndexDeleteOp
 	TM_IndexStatus *status;
 } TM_IndexDeleteOp;
 
+/* Use multi inserts, i.e. buffer multiple tuples and insert them at once */
+#define TABLEAM_MULTI_INSERTS 0x000001
+
+/* Use BAS_BULKWRITE buffer access strategy */
+#define TABLEAM_BULKWRITE_BUFFER_ACCESS_STRATEGY 0x000002
+
+/*
+ * Skip flushing buffered tuples automatically. Responsibility lies with the
+ * caller to flush the buffered tuples.
+ */
+#define TABLEAM_SKIP_MULTI_INSERTS_FLUSH 0x000004
+
+
+/* Holds table insert state. */
+typedef struct TableInsertState
+{
+	/* Table AM-agnostic data starts here */
+
+	Relation	rel;			/* Target relation */
+
+	/*
+	 * Command ID for this insertion. If required, change this for each pass
+	 * of insert functions.
+	 */
+	CommandId	cid;
+
+	/* Table AM options (TABLEAM_XXX macros) */
+	int			am_flags;
+
+	/* table_tuple_insert performance options (TABLE_INSERT_XXX macros) */
+	int			insert_flags;
+
+	/* Table AM specific data starts here */
+
+	void	   *am_data;
+}			TableInsertState;
+
 /* "options" flag bits for table_tuple_insert */
 /* TABLE_INSERT_SKIP_WAL was 0x0001; RelationNeedsWAL() now governs */
 #define TABLE_INSERT_SKIP_FSM		0x0002
@@ -522,6 +559,20 @@ typedef struct TableAmRoutine
 	void		(*multi_insert) (Relation rel, TupleTableSlot **slots, int nslots,
 								 CommandId cid, int options, struct BulkInsertStateData *bistate);
 
+	TableInsertState *(*tuple_insert_begin) (Relation rel,
+											 CommandId cid,
+											 int am_flags,
+											 int insert_flags);
+	void		(*tuple_insert_v2) (TableInsertState * state,
+									TupleTableSlot *slot);
+	void		(*tuple_multi_insert_v2) (TableInsertState * state,
+										  TupleTableSlot *slot);
+	TupleTableSlot *(*tuple_multi_insert_next_free_slot) (TableInsertState * state);
+	TupleTableSlot **(*tuple_multi_insert_slots) (TableInsertState * state,
+												  int *num_slots);
+	void		(*tuple_multi_insert_flush) (TableInsertState * state);
+	void		(*tuple_insert_end) (TableInsertState * state);
+
 	/* see table_tuple_delete() for reference about parameters */
 	TM_Result	(*tuple_delete) (Relation rel,
 								 ItemPointer tid,
@@ -1456,6 +1507,93 @@ table_multi_insert(Relation rel, TupleTableSlot **slots, int nslots,
 								  cid, options, bistate);
 }
 
+static inline TableInsertState *
+table_insert_begin(Relation rel, CommandId cid, int am_flags,
+				   int insert_flags)
+{
+	if (rel->rd_tableam && rel->rd_tableam->tuple_insert_begin)
+		return rel->rd_tableam->tuple_insert_begin(rel, cid, am_flags,
+												   insert_flags);
+	else
+	{
+		elog(ERROR, "table_insert_begin access method is not implemented for relation \"%s\"",
+			 RelationGetRelationName(rel));
+		return NULL;			/* keep compiler quiet */
+	}
+}
+
+static inline void
+table_tuple_insert_v2(TableInsertState * state, TupleTableSlot *slot)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_insert_v2)
+		state->rel->rd_tableam->tuple_insert_v2(state, slot);
+	else
+		elog(ERROR, "table_tuple_insert_v2 access method is not implemented for relation \"%s\"",
+			 RelationGetRelationName(state->rel));
+}
+
+static inline void
+table_multi_insert_v2(TableInsertState * state, TupleTableSlot *slot)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_multi_insert_v2)
+		state->rel->rd_tableam->tuple_multi_insert_v2(state, slot);
+	else
+		elog(ERROR, "table_multi_insert_v2 access method is not implemented for relation \"%s\"",
+			 RelationGetRelationName(state->rel));
+}
+
+static inline TupleTableSlot *
+table_multi_insert_next_free_slot(TableInsertState * state)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_multi_insert_next_free_slot)
+		return state->rel->rd_tableam->tuple_multi_insert_next_free_slot(state);
+	else
+	{
+		elog(ERROR, "table_multi_insert_next_free_slot access method is not implemented for relation \"%s\"",
+			 RelationGetRelationName(state->rel));
+		return NULL;			/* keep compiler quiet */
+	}
+}
+
+static inline TupleTableSlot **
+table_multi_insert_slots(TableInsertState * state, int *num_slots)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_multi_insert_slots)
+		return state->rel->rd_tableam->tuple_multi_insert_slots(state, num_slots);
+	else
+	{
+		elog(ERROR, "table_multi_insert_slots access method is not implemented for relation \"%s\"",
+			 RelationGetRelationName(state->rel));
+		return NULL;			/* keep compiler quiet */
+	}
+}
+
+static inline void
+table_multi_insert_flush(TableInsertState * state)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_multi_insert_flush)
+		state->rel->rd_tableam->tuple_multi_insert_flush(state);
+	else
+		elog(ERROR, "table_multi_insert_flush access method is not implemented for relation \"%s\"",
+			 RelationGetRelationName(state->rel));
+}
+
+static inline void
+table_insert_end(TableInsertState * state)
+{
+	if (state->rel->rd_tableam &&
+		state->rel->rd_tableam->tuple_insert_end)
+		state->rel->rd_tableam->tuple_insert_end(state);
+	else
+		elog(ERROR, "table_insert_end access method is not implemented for relation \"%s\"",
+			 RelationGetRelationName(state->rel));
+}
+
 /*
  * Delete a tuple.
  *
-- 
2.34.1



  [application/x-patch] v12-0002-Optimize-CTAS-with-multi-inserts.patch (2.7K, 3-v12-0002-Optimize-CTAS-with-multi-inserts.patch)
  download | inline diff:
From fd891115178bc33df87844417e35a724b359af96 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Fri, 8 Mar 2024 10:11:41 +0000
Subject: [PATCH v12 2/4] Optimize CTAS with multi inserts

---
 src/backend/commands/createas.c | 25 +++++++++----------------
 1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 62050f4dc5..7a4415c62f 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -53,9 +53,7 @@ typedef struct
 	/* These fields are filled by intorel_startup: */
 	Relation	rel;			/* relation to write to */
 	ObjectAddress reladdr;		/* address of rel, for ExecCreateTableAs */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableInsertState *ti_state; /* table insert state */
 } DR_intorel;
 
 /* utility functions for CTAS definition creation */
@@ -552,17 +550,19 @@ intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 	 */
 	myState->rel = intoRelationDesc;
 	myState->reladdr = intoRelationAddr;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM;
 
 	/*
 	 * If WITH NO DATA is specified, there is no need to set up the state for
 	 * bulk inserts as there are no tuples to insert.
 	 */
 	if (!into->skipData)
-		myState->bistate = GetBulkInsertState();
+		myState->ti_state = table_insert_begin(intoRelationDesc,
+											   GetCurrentCommandId(true),
+											   TABLEAM_MULTI_INSERTS |
+											   TABLEAM_BULKWRITE_BUFFER_ACCESS_STRATEGY,
+											   TABLE_INSERT_SKIP_FSM);
 	else
-		myState->bistate = NULL;
+		myState->ti_state = NULL;
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -590,11 +590,7 @@ intorel_receive(TupleTableSlot *slot, DestReceiver *self)
 		 * would not be cheap either. This also doesn't allow accessing per-AM
 		 * data (say a tuple's xmin), but since we don't do that here...
 		 */
-		table_tuple_insert(myState->rel,
-						   slot,
-						   myState->output_cid,
-						   myState->ti_options,
-						   myState->bistate);
+		table_multi_insert_v2(myState->ti_state, slot);
 	}
 
 	/* We know this is a newly created relation, so there are no indexes */
@@ -612,10 +608,7 @@ intorel_shutdown(DestReceiver *self)
 	IntoClause *into = myState->into;
 
 	if (!into->skipData)
-	{
-		FreeBulkInsertState(myState->bistate);
-		table_finish_bulk_insert(myState->rel, myState->ti_options);
-	}
+		table_insert_end(myState->ti_state);
 
 	/* close rel, but keep lock until commit */
 	table_close(myState->rel, NoLock);
-- 
2.34.1



  [application/x-patch] v12-0003-Optimize-RMV-with-multi-inserts.patch (2.9K, 4-v12-0003-Optimize-RMV-with-multi-inserts.patch)
  download | inline diff:
From 44caa58dc21e8e4634d214c074a88986b2311b41 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Fri, 8 Mar 2024 10:12:02 +0000
Subject: [PATCH v12 3/4] Optimize RMV with multi inserts

---
 src/backend/commands/matview.c | 34 ++++++++++++----------------------
 1 file changed, 12 insertions(+), 22 deletions(-)

diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 03373462f0..889a9a21f8 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -47,10 +47,7 @@ typedef struct
 	DestReceiver pub;			/* publicly-known function pointers */
 	Oid			transientoid;	/* OID of new heap into which to store */
 	/* These fields are filled by transientrel_startup: */
-	Relation	transientrel;	/* relation to write to */
-	CommandId	output_cid;		/* cmin to insert in output tuples */
-	int			ti_options;		/* table_tuple_insert performance options */
-	BulkInsertState bistate;	/* bulk insert state */
+	TableInsertState *ti_state; /* table insert state */
 } DR_transientrel;
 
 static int	matview_maintenance_depth = 0;
@@ -453,13 +450,13 @@ transientrel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
 
 	transientrel = table_open(myState->transientoid, NoLock);
 
-	/*
-	 * Fill private fields of myState for use by later routines
-	 */
-	myState->transientrel = transientrel;
-	myState->output_cid = GetCurrentCommandId(true);
-	myState->ti_options = TABLE_INSERT_SKIP_FSM | TABLE_INSERT_FROZEN;
-	myState->bistate = GetBulkInsertState();
+	/* Fill private fields of myState for use by later routines */
+	myState->ti_state = table_insert_begin(transientrel,
+										   GetCurrentCommandId(true),
+										   TABLEAM_MULTI_INSERTS |
+										   TABLEAM_BULKWRITE_BUFFER_ACCESS_STRATEGY,
+										   TABLE_INSERT_SKIP_FSM |
+										   TABLE_INSERT_FROZEN);
 
 	/*
 	 * Valid smgr_targblock implies something already wrote to the relation.
@@ -484,12 +481,7 @@ transientrel_receive(TupleTableSlot *slot, DestReceiver *self)
 	 * cheap either. This also doesn't allow accessing per-AM data (say a
 	 * tuple's xmin), but since we don't do that here...
 	 */
-
-	table_tuple_insert(myState->transientrel,
-					   slot,
-					   myState->output_cid,
-					   myState->ti_options,
-					   myState->bistate);
+	table_multi_insert_v2(myState->ti_state, slot);
 
 	/* We know this is a newly created relation, so there are no indexes */
 
@@ -503,14 +495,12 @@ static void
 transientrel_shutdown(DestReceiver *self)
 {
 	DR_transientrel *myState = (DR_transientrel *) self;
+	Relation	transientrel = myState->ti_state->rel;
 
-	FreeBulkInsertState(myState->bistate);
-
-	table_finish_bulk_insert(myState->transientrel, myState->ti_options);
+	table_insert_end(myState->ti_state);
 
 	/* close transientrel, but keep lock until commit */
-	table_close(myState->transientrel, NoLock);
-	myState->transientrel = NULL;
+	table_close(transientrel, NoLock);
 }
 
 /*
-- 
2.34.1



  [application/x-patch] v12-0004-Use-new-multi-insert-TAM-for-COPY-FROM.patch (6.3K, 5-v12-0004-Use-new-multi-insert-TAM-for-COPY-FROM.patch)
  download | inline diff:
From d53ee9b1b31b0e68858e673a618905d7bfdcf4de Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <[email protected]>
Date: Fri, 8 Mar 2024 10:12:32 +0000
Subject: [PATCH v12 4/4] Use new multi insert TAM for COPY FROM

---
 src/backend/commands/copyfrom.c | 92 ++++++++++++++++++---------------
 1 file changed, 50 insertions(+), 42 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 8908a440e1..c2a81d4df1 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -74,10 +74,9 @@
 /* Stores multi-insert data related to a single relation in CopyFrom. */
 typedef struct CopyMultiInsertBuffer
 {
-	TupleTableSlot *slots[MAX_BUFFERED_TUPLES]; /* Array to store tuples */
+	TableInsertState *ti_state; /* Table insert state; NULL if foreign table */
+	TupleTableSlot **slots;		/* Array to store tuples */
 	ResultRelInfo *resultRelInfo;	/* ResultRelInfo for 'relid' */
-	BulkInsertState bistate;	/* BulkInsertState for this rel if plain
-								 * table; NULL if foreign table */
 	int			nused;			/* number of 'slots' containing tuples */
 	uint64		linenos[MAX_BUFFERED_TUPLES];	/* Line # of tuple in copy
 												 * stream */
@@ -220,14 +219,31 @@ limit_printout_length(const char *str)
  * ResultRelInfo.
  */
 static CopyMultiInsertBuffer *
-CopyMultiInsertBufferInit(ResultRelInfo *rri)
+CopyMultiInsertBufferInit(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri)
 {
 	CopyMultiInsertBuffer *buffer;
 
 	buffer = (CopyMultiInsertBuffer *) palloc(sizeof(CopyMultiInsertBuffer));
-	memset(buffer->slots, 0, sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
+
+	if (rri->ri_FdwRoutine == NULL)
+	{
+		int			num_slots;
+
+		buffer->ti_state = table_insert_begin(rri->ri_RelationDesc,
+											  miinfo->mycid,
+											  TABLEAM_MULTI_INSERTS |
+											  TABLEAM_BULKWRITE_BUFFER_ACCESS_STRATEGY |
+											  TABLEAM_SKIP_MULTI_INSERTS_FLUSH,
+											  miinfo->ti_options);
+		buffer->slots = table_multi_insert_slots(buffer->ti_state, &num_slots);
+	}
+	else
+	{
+		buffer->slots = palloc0(sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
+		buffer->ti_state = NULL;
+	}
+
 	buffer->resultRelInfo = rri;
-	buffer->bistate = (rri->ri_FdwRoutine == NULL) ? GetBulkInsertState() : NULL;
 	buffer->nused = 0;
 
 	return buffer;
@@ -242,7 +258,7 @@ CopyMultiInsertInfoSetupBuffer(CopyMultiInsertInfo *miinfo,
 {
 	CopyMultiInsertBuffer *buffer;
 
-	buffer = CopyMultiInsertBufferInit(rri);
+	buffer = CopyMultiInsertBufferInit(miinfo, rri);
 
 	/* Setup back-link so we can easily find this buffer again */
 	rri->ri_CopyMultiInsertBuffer = buffer;
@@ -319,8 +335,6 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 		int			batch_size = resultRelInfo->ri_BatchSize;
 		int			sent = 0;
 
-		Assert(buffer->bistate == NULL);
-
 		/* Ensure that the FDW supports batching and it's enabled */
 		Assert(resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert);
 		Assert(batch_size > 1);
@@ -392,13 +406,8 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 	}
 	else
 	{
-		CommandId	mycid = miinfo->mycid;
-		int			ti_options = miinfo->ti_options;
 		bool		line_buf_valid = cstate->line_buf_valid;
 		uint64		save_cur_lineno = cstate->cur_lineno;
-		MemoryContext oldcontext;
-
-		Assert(buffer->bistate != NULL);
 
 		/*
 		 * Print error context information correctly, if one of the operations
@@ -406,18 +415,7 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 		 */
 		cstate->line_buf_valid = false;
 
-		/*
-		 * table_multi_insert may leak memory, so switch to short-lived memory
-		 * context before calling it.
-		 */
-		oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-		table_multi_insert(resultRelInfo->ri_RelationDesc,
-						   slots,
-						   nused,
-						   mycid,
-						   ti_options,
-						   buffer->bistate);
-		MemoryContextSwitchTo(oldcontext);
+		table_multi_insert_flush(buffer->ti_state);
 
 		for (i = 0; i < nused; i++)
 		{
@@ -432,7 +430,7 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 				cstate->cur_lineno = buffer->linenos[i];
 				recheckIndexes =
 					ExecInsertIndexTuples(resultRelInfo,
-										  buffer->slots[i], estate, false,
+										  slots[i], estate, false,
 										  false, NULL, NIL, false);
 				ExecARInsertTriggers(estate, resultRelInfo,
 									 slots[i], recheckIndexes,
@@ -490,20 +488,15 @@ CopyMultiInsertBufferCleanup(CopyMultiInsertInfo *miinfo,
 	resultRelInfo->ri_CopyMultiInsertBuffer = NULL;
 
 	if (resultRelInfo->ri_FdwRoutine == NULL)
-	{
-		Assert(buffer->bistate != NULL);
-		FreeBulkInsertState(buffer->bistate);
-	}
+		table_insert_end(buffer->ti_state);
 	else
-		Assert(buffer->bistate == NULL);
-
-	/* Since we only create slots on demand, just drop the non-null ones. */
-	for (i = 0; i < MAX_BUFFERED_TUPLES && buffer->slots[i] != NULL; i++)
-		ExecDropSingleTupleTableSlot(buffer->slots[i]);
+	{
+		/* Since we only create slots on demand, just drop the non-null ones. */
+		for (i = 0; i < MAX_BUFFERED_TUPLES && buffer->slots[i] != NULL; i++)
+			ExecDropSingleTupleTableSlot(buffer->slots[i]);
 
-	if (resultRelInfo->ri_FdwRoutine == NULL)
-		table_finish_bulk_insert(resultRelInfo->ri_RelationDesc,
-								 miinfo->ti_options);
+		pfree(buffer->slots);
+	}
 
 	pfree(buffer);
 }
@@ -590,13 +583,25 @@ CopyMultiInsertInfoNextFreeSlot(CopyMultiInsertInfo *miinfo,
 {
 	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
 	int			nused = buffer->nused;
+	TupleTableSlot *slot;
 
 	Assert(buffer != NULL);
 	Assert(nused < MAX_BUFFERED_TUPLES);
 
-	if (buffer->slots[nused] == NULL)
-		buffer->slots[nused] = table_slot_create(rri->ri_RelationDesc, NULL);
-	return buffer->slots[nused];
+	if (rri->ri_FdwRoutine == NULL)
+		slot = table_multi_insert_next_free_slot(buffer->ti_state);
+	else
+	{
+		if (buffer->slots[nused] == NULL)
+		{
+			slot = table_slot_create(rri->ri_RelationDesc, NULL);
+			buffer->slots[nused] = slot;
+		}
+		else
+			slot = buffer->slots[nused];
+	}
+
+	return slot;
 }
 
 /*
@@ -612,6 +617,9 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
 	Assert(buffer != NULL);
 	Assert(slot == buffer->slots[buffer->nused]);
 
+	if (rri->ri_FdwRoutine == NULL)
+		table_multi_insert_v2(buffer->ti_state, slot);
+
 	/* Store the line number so we can properly report any errors later */
 	buffer->linenos[buffer->nused] = lineno;
 
-- 
2.34.1



^ permalink  raw  reply  [nested|flat] 43+ messages in thread

* Re: New Table Access Methods for Multi and Single Inserts
@ 2024-03-19 05:09  Masahiko Sawada <[email protected]>
  parent: Bharath Rupireddy <[email protected]>
  0 siblings, 0 replies; 43+ messages in thread

From: Masahiko Sawada @ 2024-03-19 05:09 UTC (permalink / raw)
  To: Bharath Rupireddy <[email protected]>; +Cc: pgsql-hackers; Andres Freund <[email protected]>; Dilip Kumar <[email protected]>; Luc Vlaming <[email protected]>; Justin Pryzby <[email protected]>; Jeff Davis <[email protected]>; Michael Paquier <[email protected]>; Matthias van de Meent <[email protected]>

Hi,

On Fri, Mar 8, 2024 at 7:37 PM Bharath Rupireddy
<[email protected]> wrote:
>
> On Sat, Mar 2, 2024 at 12:02 PM Bharath Rupireddy
> <[email protected]> wrote:
> >
> > On Mon, Jan 29, 2024 at 5:16 PM Bharath Rupireddy
> > <[email protected]> wrote:
> > >
> > > > Please find the attached v9 patch set.
> >
> > I've had to rebase the patches due to commit 874d817, please find the
> > attached v11 patch set.
>
> Rebase needed. Please see the v12 patch set.
>

I've not reviewed the patches in depth yet, but run performance tests
for CREATE MATERIALIZED VIEW. The test scenarios is:

-- setup
create unlogged table test (c int);
insert into test select generate_series(1, 10000000);

-- run
create materialized view test_mv as select * from test;

Here are the results:

* HEAD
3775.221 ms
3744.039 ms
3723.228 ms

* v12 patch
6289.972 ms
5880.674 ms
7663.509 ms

I can see performance regressions and the perf report says that CPU
spent most time on extending the ResourceOwner's array while copying
the buffer-heap tuple:

- 52.26% 0.18% postgres postgres [.] intorel_receive
    52.08% intorel_receive
        table_multi_insert_v2 (inlined)
        - heap_multi_insert_v2
            - 51.53% ExecCopySlot (inlined)
                tts_buffer_heap_copyslot
                tts_buffer_heap_store_tuple (inlined)
             - IncrBufferRefCount
                 - ResourceOwnerEnlarge
                     ResourceOwnerAddToHash (inlined)

Is there any reason why we copy a buffer-heap tuple to another
buffer-heap tuple? Which results in that we increments the buffer
refcount and register it to ResourceOwner for every tuples. I guess
that the destination tuple slot is not necessarily a buffer-heap, and
we could use VirtualTupleTableSlot instead. It would in turn require
copying a heap tuple. I might be missing something but it improved the
performance at least in my env. The change I made was:

-       dstslot = table_slot_create(state->rel, NULL);
+       //dstslot = table_slot_create(state->rel, NULL);
+       dstslot = MakeTupleTableSlot(RelationGetDescr(state->rel),
+                                    &TTSOpsVirtual);
+

And the execution times are:
1588.984 ms
1591.618 ms
1582.519 ms

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com






^ permalink  raw  reply  [nested|flat] 43+ messages in thread

end of thread, other threads:[~2024-03-19 05:09 UTC | newest]

Thread overview: 43+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2020-12-08 12:57 New Table Access Methods for Multi and Single Inserts Bharath Rupireddy <[email protected]>
2020-12-11 13:47 ` Bharath Rupireddy <[email protected]>
2020-12-17 05:05 ` Justin Pryzby <[email protected]>
2020-12-17 11:05   ` Bharath Rupireddy <[email protected]>
2020-12-17 20:44     ` Justin Pryzby <[email protected]>
2020-12-18 02:09       ` Bharath Rupireddy <[email protected]>
2020-12-18 17:54         ` Justin Pryzby <[email protected]>
2020-12-21 07:42           ` Bharath Rupireddy <[email protected]>
2020-12-21 07:47           ` Justin Pryzby <[email protected]>
2020-12-24 00:18             ` Bharath Rupireddy <[email protected]>
2020-12-25 02:40               ` Justin Pryzby <[email protected]>
2020-12-28 12:48                 ` Bharath Rupireddy <[email protected]>
2021-01-04 07:59                   ` Luc Vlaming <[email protected]>
2021-01-05 10:06                     ` Bharath Rupireddy <[email protected]>
2021-01-06 07:26                       ` Luc Vlaming <[email protected]>
2021-01-06 13:06                         ` Bharath Rupireddy <[email protected]>
2021-01-12 08:03                           ` Luc Vlaming <[email protected]>
2021-01-16 23:04                           ` Jeff Davis <[email protected]>
2021-01-18 07:58                             ` Luc Vlaming <[email protected]>
2021-01-19 17:33                               ` Jeff Davis <[email protected]>
2021-02-17 07:16                           ` Bharath Rupireddy <[email protected]>
2021-02-20 05:45                             ` Bharath Rupireddy <[email protected]>
2021-02-20 07:25                               ` Zhihong Yu <[email protected]>
2021-02-20 07:50                                 ` Bharath Rupireddy <[email protected]>
2021-03-08 13:07                               ` Dilip Kumar <[email protected]>
2021-03-09 08:15                                 ` Bharath Rupireddy <[email protected]>
2021-03-10 04:51                                   ` Bharath Rupireddy <[email protected]>
2021-04-05 04:19                                     ` Bharath Rupireddy <[email protected]>
2021-04-19 04:51                                       ` Bharath Rupireddy <[email protected]>
2022-10-12 05:30                                         ` Michael Paquier <[email protected]>
2022-10-12 05:35                                           ` Bharath Rupireddy <[email protected]>
2023-06-03 22:38                                         ` Andres Freund <[email protected]>
2023-06-05 02:30                                           ` Bharath Rupireddy <[email protected]>
2023-08-01 16:30                                           ` Bharath Rupireddy <[email protected]>
2023-08-01 17:02                                             ` Jacob Champion <[email protected]>
2024-01-17 17:27                                               ` Bharath Rupireddy <[email protected]>
2024-01-29 07:27                                                 ` Bharath Rupireddy <[email protected]>
2024-01-29 11:46                                                   ` Bharath Rupireddy <[email protected]>
2024-03-02 06:32                                                     ` Bharath Rupireddy <[email protected]>
2024-03-08 10:36                                                       ` Bharath Rupireddy <[email protected]>
2024-03-19 05:09                                                         ` Masahiko Sawada <[email protected]>
2021-01-05 21:28                     ` Jeff Davis <[email protected]>
2021-01-06 07:00                       ` Luc Vlaming <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox