MIME-Version: 1.0
References: <CALj2ACVi9eTRYR=gdca5wxtj3Kk_9q9qVccxsS1hngTGOCjPwQ@mail.gmail.com>
 <20201217050522.GU30237@telsasoft.com> <CALj2ACVgT1iocd5nQ+rEmqt3xcCONkR037qbc8PiojdR39Ag=w@mail.gmail.com>
 <20201217204442.GX30237@telsasoft.com> <CALj2ACW3BC5kgdffZ2LD_CT2wQoXVc29kGB74SVWnGZ=UFqcAQ@mail.gmail.com>
 <20201218175439.GA30237@telsasoft.com> <20201221074725.GF30237@telsasoft.com>
 <CALj2ACWMnZZCu=G0PJkEeYYicKeuJ-X=SU19i6vQ1+=uXz8u0Q@mail.gmail.com>
 <20201225023958.GW30237@telsasoft.com> <CALj2ACVDtYYRYD2SC+X2ALOUkhnUcgC7RLxiEYVWW2HxxrfRww@mail.gmail.com>
 <96eaa813-4ad6-e80a-04a4-cc8082d356dd@swarm64.com> <CALj2ACVsiAZMsP8p5MPg6SSEtoMFFaiAa6j2AFtEQJDhfbgs3Q@mail.gmail.com>
 <508af801-6356-d36b-1867-011ac6df8f55@swarm64.com> <CALj2ACUmL3+xLFtVbdcNpo_=ubdi=_nsp6MNq__xWwL=NGkdgA@mail.gmail.com>
 <CALj2ACXoKTQuz8FKJxgB_=Jr_2_ZCy7gDteBrUa_5pd7Ov_1Tg@mail.gmail.com>
 <CALj2ACVbcYCvTw_jMnuGjLMBiQug7YAL3ezJFM9QMdewoJZLcw@mail.gmail.com>
 <CAFiTN-tBRvtRNgeW5mmdwXoDNubhUrYqzLF58O_+JLo_cFb_7Q@mail.gmail.com>
 <CALj2ACWZ95GhLTXc0dw1_Nu05xs130HPCnWX4tfkmp0CtBCMJg@mail.gmail.com>
 <CALj2ACX0fvSAxiGB8_yDsXo7JHSaDNUSHwH9OEpiMgnrynaP+g@mail.gmail.com>
 <CALj2ACXBTFKOTi0_ni0Ef+DHWy8pH=SX6Q2tyG-8WmmsfxatNQ@mail.gmail.com> <CALj2ACXdrOmB6Na9amHWZHKvRT3Z0nwTRsCwoMT-npOBtmXLXg@mail.gmail.com>
In-Reply-To: <CALj2ACXdrOmB6Na9amHWZHKvRT3Z0nwTRsCwoMT-npOBtmXLXg@mail.gmail.com>
From: Matthias van de Meent <boekewurm+postgres@gmail.com>
Date: Fri, 4 Mar 2022 15:37:32 +0100
Message-ID: <CAEze2Wj2+s64uw-g3fkNJgLfmT9mzJd10z4sB6NogYO_DgPKwQ@mail.gmail.com>
Subject: Re: New Table Access Methods for Multi and Single Inserts
To: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Cc: Dilip Kumar <dilipbalaut@gmail.com>, Luc Vlaming <luc@swarm64.com>, 
	Justin Pryzby <pryzby@telsasoft.com>, 
	PostgreSQL-development <pgsql-hackers@postgresql.org>, Andres Freund <andres@anarazel.de>, 
	Paul Guo <guopa@vmware.com>, Jeff Davis <pgsql@j-davis.com>
Content-Type: text/plain; charset="UTF-8"
Archived-At: <https://www.postgresql.org/message-id/CAEze2Wj2%2Bs64uw-g3fkNJgLfmT9mzJd10z4sB6NogYO_DgPKwQ%40mail.gmail.com>
Precedence: bulk

On Mon, 19 Apr 2021 at 06:52, Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Mon, Apr 5, 2021 at 9:49 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Wed, Mar 10, 2021 at 10:21 AM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > > Attaching the v4 patch set. Please review it further.
> >
> > Attaching v5 patch set after rebasing onto the latest master.
>
> Another rebase due to conflicts in 0003. Attaching v6 for review.

I recently touched the topic of multi_insert, and I remembered this
patch. I had to dig a bit to find it, but as it's still open I've
added some comments:

> diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
> +#define MAX_BUFFERED_TUPLES        1000
> +#define MAX_BUFFERED_BYTES        65535

It looks like these values were copied over from copyfrom.c, but are
now expressed in the context of the heapam.
As these values are now heap-specific (as opposed to the
TableAM-independent COPY infrastructure), shouldn't we instead
optimize for heap page insertions? That is, I suggest using a multiple
(1 or more) of MaxHeapTuplesPerPage for MAX_BUFFERED_TUPLES, and the
same or a similar multiple of BLCKSZ for MAX_BUFFERED_BYTES.
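
To illustrate, a standalone sketch (not taken from the patch) of limits
derived from page geometry. The stand-in constants assume the default
8kB block size and 8-byte MAXALIGN; in the tree the real values come
from pg_config.h, bufpage.h and htup_details.h. HEAP_MI_BUFFERED_PAGES
and heap_mi_buffer_full() are hypothetical names used only here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Stand-ins for PostgreSQL definitions, assuming default build options
 * (8kB blocks, 8-byte MAXALIGN). Not the real headers.
 */
#define BLCKSZ                  8192
#define SIZE_OF_PAGE_HEADER     24  /* SizeOfPageHeaderData */
#define ALIGNED_TUPLE_HEADER    24  /* MAXALIGN(SizeofHeapTupleHeader) */
#define SIZE_OF_ITEM_ID         4   /* sizeof(ItemIdData) */

#define MaxHeapTuplesPerPage \
    ((int) ((BLCKSZ - SIZE_OF_PAGE_HEADER) / \
            (ALIGNED_TUPLE_HEADER + SIZE_OF_ITEM_ID)))

/* Hypothetical knob: how many heap pages' worth of tuples to buffer. */
#define HEAP_MI_BUFFERED_PAGES  4

/* Both limits scale with the same page multiple. */
#define MAX_BUFFERED_TUPLES (HEAP_MI_BUFFERED_PAGES * MaxHeapTuplesPerPage)
#define MAX_BUFFERED_BYTES  (HEAP_MI_BUFFERED_PAGES * BLCKSZ)

/* Sanity check of the stand-in arithmetic under the assumptions above. */
static_assert(MaxHeapTuplesPerPage == 291,
              "stand-in constants assume 8kB pages");

/* Returns nonzero once either limit is reached and a flush is due. */
static int
heap_mi_buffer_full(int ntuples, size_t nbytes)
{
    return ntuples >= MAX_BUFFERED_TUPLES || nbytes >= MAX_BUFFERED_BYTES;
}
```

With a multiple of 4 this buffers at most 1164 tuples or 32768 bytes,
whichever limit is hit first.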

> TableInsertState->flushed
> TableInsertState->mi_slots

I don't quite like the current storage-and-feedback mechanism for
flushed tuples. The current assumptions in this mechanism seem to be
that
1.) access methods always want to flush all available tuples at once,
2.) access methods want to maintain the TupleTableSlots for all
inserted tuples that have not yet had all triggers handled, and
3.) we need access to the not-yet-flushed TupleTableSlots.

I don't think that is a correct set of assumptions. Only flushed
tuples need to be visible to the tableam-agnostic code, and tableams
should be allowed to choose which tuples to flush at which point, as
long as all tuples have been flushed after a final
multi_insert_flush.

Examples:
A heap-based access method might want to bin-pack tuples and write out
full pages only, and thus keep some tuples in its buffers because they
didn't fill a page; it would then have flushed only a subset of the
currently buffered tuples.
A columnstore-based access method might not want to maintain the
TupleTableSlots of the original tuples, but instead use temporary
columnar storage: TupleTableSlots are quite large when working with
huge amounts of tuples, and keeping lots of tuple data in memory is
expensive.
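
A toy sketch of the first example, assuming a made-up page size and a
buffer that tracks only tuple sizes. It fills pages greedily in arrival
order (simpler than real bin-packing) and reports only the tuples that
landed on completed pages; ToyBuffer, TOY_PAGE_BYTES and
toy_flush_full_pages() are hypothetical names, not part of any patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_BYTES  100     /* hypothetical usable bytes per page */
#define TOY_MAX_TUPLES  64

typedef struct ToyBuffer
{
    size_t      sizes[TOY_MAX_TUPLES];  /* sizes of buffered tuples */
    int         ntuples;                /* number of buffered tuples */
} ToyBuffer;

/*
 * Flush the tuples that make up completely filled pages and keep the
 * rest buffered. With "final" set, everything is flushed. Returns the
 * number of tuples flushed, taken from the front of the buffer.
 */
static int
toy_flush_full_pages(ToyBuffer *buf, int final)
{
    int         flushed = 0;    /* tuples on pages known to be full */
    size_t      used = 0;       /* bytes used on the page being filled */

    for (int i = 0; i < buf->ntuples; i++)
    {
        assert(buf->sizes[i] <= TOY_PAGE_BYTES);
        if (used + buf->sizes[i] > TOY_PAGE_BYTES)
        {
            /* Tuple i does not fit, so the page before it is full. */
            flushed = i;
            used = 0;
        }
        used += buf->sizes[i];
    }

    if (final)
        flushed = buf->ntuples;

    /* Keep the unflushed tail at the front of the buffer. */
    memmove(buf->sizes, buf->sizes + flushed,
            (size_t) (buf->ntuples - flushed) * sizeof(size_t));
    buf->ntuples -= flushed;
    return flushed;
}
```

Here a non-final flush can legitimately return fewer tuples than are
buffered, which the current ->flushed / ->mi_slots interface cannot
express.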

As such, I think that this should be replaced with a
TableInsertState->mi_flushed_slots + TableInsertState->mi_flushed_len,
managed by the tableAM, in which only the flushed tuples are visible
to the AM-agnostic code. This is not much different from how the
implementation currently works, except that ->mi_slots no longer
exposes unflushed tuples, and that ->flushed is replaced by an integer
count of the flushed tuples.
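
A minimal standalone sketch of what that could look like from the
AM-agnostic side. Only mi_flushed_slots and mi_flushed_len come from
the proposal above; the TupleTableSlot stand-in, am_private, the
callback type and both functions are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for PostgreSQL's TupleTableSlot. */
typedef struct TupleTableSlot
{
    int         tuple_id;
} TupleTableSlot;

typedef struct TableInsertState
{
    /* Set by the table AM: tuples flushed by the most recent call. */
    TupleTableSlot **mi_flushed_slots;
    int         mi_flushed_len;

    /* Hypothetical: AM-owned buffering, opaque to the caller. */
    void       *am_private;
} TableInsertState;

/* Per-tuple work done after a flush (triggers, index inserts, ...). */
typedef void (*flushed_tuple_cb) (TupleTableSlot *slot, void *arg);

/*
 * AM-agnostic handler: sees only the flushed tuples, never the AM's
 * internal buffer. Returns the number of tuples handled.
 */
static int
process_flushed_tuples(TableInsertState *state, flushed_tuple_cb cb,
                       void *arg)
{
    int         handled = state->mi_flushed_len;

    assert(handled == 0 || state->mi_flushed_slots != NULL);
    for (int i = 0; i < handled; i++)
        cb(state->mi_flushed_slots[i], arg);

    state->mi_flushed_len = 0;  /* mark the batch as consumed */
    return handled;
}

/* Example callback: counts the tuples it is handed. */
static void
count_slot(TupleTableSlot *slot, void *arg)
{
    (void) slot;
    (*(int *) arg)++;
}
```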

A further improvement (in my opinion) would be the change from a
single multi_insert_flush() to a signalling-based multi_insert_flush:
It is not unreasonable for e.g. a columnstore to buffer tens of
thousands of inserts, but doing so in TupleTableSlots would result in
high memory usage. Allowing for batched retrieval of flushed tuples
would reduce that memory usage, which is why multiple calls to
multi_insert_flush() could be useful. To handle this gracefully, we'd
probably add TableInsertState->mi_flush_remaining, where each insert
adds one to mi_flush_remaining, and each time mi_flushed_slots has
been handled, the handler code decreases mi_flush_remaining by
mi_flushed_len.
Once we're done inserting into the table, we keep calling
multi_insert_flush until no more tuples are being flushed (and error
out if we're still waiting for flushes but no new flushed tuples are
returned).
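
As a rough standalone sketch of that protocol: the toy AM below hands
back at most TOY_FLUSH_BATCH tuples per flush call, and the caller
drains it until mi_flush_remaining reaches zero. The field names follow
the proposal above; the function-pointer placement, the toy_* functions,
the am_* fields and the -1 error return are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for PostgreSQL's TupleTableSlot. */
typedef struct TupleTableSlot
{
    int         tuple_id;
} TupleTableSlot;

#define TOY_FLUSH_BATCH 4       /* hypothetical: slots returned per flush */

typedef struct TableInsertState TableInsertState;
struct TableInsertState
{
    TupleTableSlot **mi_flushed_slots;  /* batch from the last flush */
    int         mi_flushed_len;         /* entries in that batch */
    int         mi_flush_remaining;     /* inserted, not yet handled */
    void        (*multi_insert_flush) (TableInsertState *state);

    /* Toy AM-private state: pending count plus one batch of slots. */
    int         am_pending;
    TupleTableSlot am_slots[TOY_FLUSH_BATCH];
    TupleTableSlot *am_slot_ptrs[TOY_FLUSH_BATCH];
};

/* Each insert adds one to mi_flush_remaining. */
static void
toy_multi_insert(TableInsertState *state)
{
    state->am_pending++;
    state->mi_flush_remaining++;
}

/* Toy AM flush: expose at most one batch of flushed tuples per call. */
static void
toy_multi_insert_flush(TableInsertState *state)
{
    int         n = state->am_pending < TOY_FLUSH_BATCH ?
        state->am_pending : TOY_FLUSH_BATCH;

    for (int i = 0; i < n; i++)
    {
        state->am_slots[i].tuple_id = i;
        state->am_slot_ptrs[i] = &state->am_slots[i];
    }
    state->am_pending -= n;
    state->mi_flushed_slots = state->am_slot_ptrs;
    state->mi_flushed_len = n;
}

/*
 * Drain loop run once inserting is done. Returns the number of tuples
 * handled, or -1 if tuples remain but the AM flushed nothing.
 */
static int
finish_multi_insert(TableInsertState *state)
{
    int         handled = 0;

    while (state->mi_flush_remaining > 0)
    {
        state->multi_insert_flush(state);
        if (state->mi_flushed_len == 0)
            return -1;          /* waiting for flushes, none returned */

        assert(state->mi_flushed_len <= state->mi_flush_remaining);
        /* After-insert work on mi_flushed_slots would happen here. */
        handled += state->mi_flushed_len;
        state->mi_flush_remaining -= state->mi_flushed_len;
        state->mi_flushed_len = 0;
    }
    return handled;
}
```

The caller never needs more than one batch of TupleTableSlots alive at
a time, regardless of how many tuples the AM has buffered internally.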

- Matthias