public inbox for [email protected]  
From: Jeff Davis <[email protected]>
To: Bharath Rupireddy <[email protected]>
Cc: Masahiko Sawada <[email protected]>
Cc: PostgreSQL-development <[email protected]>
Cc: Andres Freund <[email protected]>
Cc: Dilip Kumar <[email protected]>
Cc: Luc Vlaming <[email protected]>
Cc: Justin Pryzby <[email protected]>
Cc: Michael Paquier <[email protected]>
Cc: Matthias van de Meent <[email protected]>
Cc: Alexander Korotkov <[email protected]>
Subject: Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date: Wed, 15 May 2024 16:31:42 -0700
Message-ID: <[email protected]>
In-Reply-To: <CALj2ACUz5+_YNEa4ZY-XG960_oXefM50MjD71VgSCAVDkF3bzQ@mail.gmail.com>
References: <CALj2ACXdrOmB6Na9amHWZHKvRT3Z0nwTRsCwoMT-npOBtmXLXg@mail.gmail.com>
	<[email protected]>
	<CALj2ACX5UMWVFdrRNUE0KDrg54WV1cumBXwcETXhrPc1ibKAQA@mail.gmail.com>
	<CAAWbhmj5Pio3nOUakObzLGCSS9dwFfgsNVDhwTGzXNwZc00uCQ@mail.gmail.com>
	<CALj2ACVHC=c6eC9SRxhcTUrnXvNDNkEBgedi2WkVJYRb=0sWYw@mail.gmail.com>
	<CALj2ACVE2h=LnFnpr3rh+6SZzdwzW5EZOYG2Z0t=p28Fn75eag@mail.gmail.com>
	<CALj2ACWT0Rz8oybWBm5W4CeS0DvFkwaw-pEvGArhDLyPbZnW_g@mail.gmail.com>
	<CALj2ACWxO3HPtpYZb765LZk-uKVuAvZPO1HDeZ8=mzMgVPgaww@mail.gmail.com>
	<CALj2ACXJA4QQ_6zAHez0Uy-9t-ebmpox2y1QBja+mF4QP+h8WQ@mail.gmail.com>
	<CAD21AoD97mhzF8cqsd2v1jg9z8xfvAJrPx6Wvi+Ev0Hmu96LJA@mail.gmail.com>
	<CALj2ACUcv5pZoB0=gRrz54M9+YT9JCmo6FYyo5WUS6wnS+em=A@mail.gmail.com>
	<CALj2ACWm77YofBMs9x3Zmp3ctNAhcS4TvPVuXKdfwCr22FqOHg@mail.gmail.com>
	<[email protected]>
	<CALj2ACWqVzhxDuWNTWAH-LuADvsyX0r-wpwgeJ+Q1FnAKjY5Yw@mail.gmail.com>
	<[email protected]>
	<CALj2ACU70HZm+0QRJdkGA5RdJUo4zPYnV2hzkiV-wH5QS2PAEQ@mail.gmail.com>
	<[email protected]>
	<CALj2ACVMV=gMROte2=0LBFnSCRvzL4D9WK6oQ9ZHr4Qj2S8xWA@mail.gmail.com>
	<[email protected]>
	<CALj2ACX9vVYHYkX8e6w058EuAs8JL5EsnzadTxGhpiE_Ep_ByA@mail.gmail.com>
	<[email protected]>
	<CALj2ACWTrx1zxWvq8Uj2rEwCsDgQHeJ53WdvzZUw3kW+_VPG6A@mail.gmail.com>
	<CALj2ACUz5+_YNEa4ZY-XG960_oXefM50MjD71VgSCAVDkF3bzQ@mail.gmail.com>

On Wed, 2024-05-15 at 12:56 +0530, Bharath Rupireddy wrote:
> Because of this, the
> buffers get flushed sooner than with the existing COPY path using
> table_multi_insert, causing a regression in pgbench, which uses COPY
> extensively.

The flushing behavior is entirely controlled by the table AM. The heap
can use the same flushing logic it used before, which is to buffer up
to 1000 tuples before flushing.
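A minimal sketch of that count-based policy, assuming hypothetical names (SketchInsertBuffer, SKETCH_MAX_BUFFERED_TUPLES, sketch_flush) rather than the heap's or the patch's actual code: the AM keeps a private counter and flushes when it reaches 1000 tuples, so nothing outside the AM needs to know the policy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cap mirroring the 1000-tuple limit described above. */
#define SKETCH_MAX_BUFFERED_TUPLES 1000

typedef struct SketchInsertBuffer
{
    int  ntuples;   /* tuples currently buffered */
    long nflushes;  /* how many times the buffer was flushed */
} SketchInsertBuffer;

/* Stand-in for writing the buffered tuples out in one batch. */
static void
sketch_flush(SketchInsertBuffer *buf)
{
    if (buf->ntuples == 0)
        return;
    buf->ntuples = 0;
    buf->nflushes++;
}

/* Buffer one tuple; flush as soon as the count-based cap is reached. */
static void
sketch_buffer_tuple(SketchInsertBuffer *buf)
{
    assert(buf->ntuples < SKETCH_MAX_BUFFERED_TUPLES);
    buf->ntuples++;
    if (buf->ntuples == SKETCH_MAX_BUFFERED_TUPLES)
        sketch_flush(buf);
}
```

Buffering 2500 tuples this way triggers two flushes and leaves 500 tuples for the final flush at end of statement.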

I like that it's accounting for memory, too, but it doesn't need to be
overly restrictive. Why not just use work_mem? That should be enough to
hold 1000 reasonably sized tuples, plus overhead.
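Combining the tuple cap with a work_mem-based memory limit could look like the following sketch. It assumes a hypothetical stand-in variable for work_mem (which PostgreSQL expresses in kilobytes, default 4MB) and counts each tuple's full length, header included; the struct and function names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_BUFFERED_TUPLES 1000

/* Hypothetical stand-in for the work_mem GUC, which is in kilobytes. */
static int sketch_work_mem_kb = 4096;   /* 4MB, the default */

typedef struct SketchMemBuffer
{
    int    ntuples;   /* tuples currently buffered */
    size_t nbytes;    /* full tuple lengths, headers included */
} SketchMemBuffer;

/* Add a tuple; return true if the buffer should now be flushed. */
static bool
sketch_add_and_check(SketchMemBuffer *buf, size_t tuple_len)
{
    size_t limit = (size_t) sketch_work_mem_kb * 1024;

    assert(tuple_len > 0);
    buf->ntuples++;
    buf->nbytes += tuple_len;

    /* Flush on whichever limit is hit first: tuple count or memory. */
    return buf->ntuples >= SKETCH_MAX_BUFFERED_TUPLES || buf->nbytes >= limit;
}
```

With small tuples the 1000-tuple cap decides; with 8kB tuples the 4MB budget is reached after 512 tuples.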

Even better would be to take partitioning into account. That might be
out of scope for your current work, but it would be very useful. We
could have a couple of new GUCs, like modify_table_buffer and
modify_table_buffer_per_partition, or something similar.
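A rough sketch of partition-aware buffering, treating modify_table_buffer and modify_table_buffer_per_partition as hypothetical byte limits (they are only proposed names, not existing GUCs) and using a fixed, illustrative number of partitions: a partition is flushed when it hits its own cap, and everything is flushed once the overall budget is used.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NPARTS 4   /* illustrative partition count */

/* Hypothetical GUCs, in bytes; names follow the proposal above. */
static size_t modify_table_buffer = 32 * 1024;
static size_t modify_table_buffer_per_partition = 16 * 1024;

typedef struct SketchPartBuffers
{
    size_t part_bytes[SKETCH_NPARTS]; /* bytes buffered per partition */
    size_t total_bytes;               /* sum over all partitions */
    int    flushes;                   /* partition flushes performed */
} SketchPartBuffers;

static void
sketch_flush_partition(SketchPartBuffers *s, int part)
{
    if (s->part_bytes[part] == 0)
        return;
    s->total_bytes -= s->part_bytes[part];
    s->part_bytes[part] = 0;
    s->flushes++;
}

/* Route a tuple of 'len' bytes to partition 'part' and apply both caps. */
static void
sketch_route_tuple(SketchPartBuffers *s, int part, size_t len)
{
    assert(part >= 0 && part < SKETCH_NPARTS);
    s->part_bytes[part] += len;
    s->total_bytes += len;

    /* Per-partition cap: one hot partition cannot hog the budget. */
    if (s->part_bytes[part] >= modify_table_buffer_per_partition)
        sketch_flush_partition(s, part);

    /* Global cap: flush every partition once the overall budget is used. */
    if (s->total_bytes >= modify_table_buffer)
        for (int i = 0; i < SKETCH_NPARTS; i++)
            sketch_flush_partition(s, i);
}
```

Flushing everything on the global cap is the simplest choice; flushing only the largest partition would be an alternative worth measuring.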

> 1. Try to get the actual tuple sizes excluding header sizes for each
> column in the new TAM.

I don't see the point in arbitrarily excluding the header.

> v21 also adds code to maintain tuple size for virtual tuple slots.
> This helps make better memory-based flushing decisions in the new
> TAM.

That seems wrong. We shouldn't need to change the TupleTableSlot
structure for this patch.


Comments on v21:

* All callers specify TM_FLAG_MULTI_INSERTS. What's the purpose of the
flag, then?

* The only caller that doesn't use TM_FLAG_BAS_BULKWRITE is
ExecInsert(). What's the disadvantage to using a bulk insert state
there?

* I'm a bit confused by TableModifyState->modify_end_callback. The AM
both sets the callback and calls the callback -- why can't the code
just go into the table_modify_end method?

* The code structure in table_modify_begin() (and related functions) is
strange. Can it be simplified, or am I missing something?

* Why are table_modify_state and insert_modify_buffer_flush_context
globals? What if there are multiple modify nodes in a plan?

* Can you explain the design in logical replication?
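To illustrate the callback and globals questions, here is a sketch with hypothetical names (SketchTableAm, SketchModifyState, sketch_table_modify_end), not the v21 API: the generic wrapper dispatches straight to the AM's end method, which does the final flush itself, and each plan node embeds its own modify state, so two modify nodes in one plan cannot clobber each other.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical modify state; one instance per plan node, never global. */
typedef struct SketchModifyState
{
    int  buffered;        /* tuples waiting to be written */
    int *rows_processed;  /* points into the owning node, not a global */
} SketchModifyState;

/* Hypothetical slice of a table AM routine table. */
typedef struct SketchTableAm
{
    void (*modify_end) (SketchModifyState *state);
} SketchTableAm;

/* Heap-like end method: the final flush is done right here. */
static void
sketch_heap_modify_end(SketchModifyState *state)
{
    *state->rows_processed += state->buffered;
    state->buffered = 0;
}

static const SketchTableAm sketch_heap_am = {
    .modify_end = sketch_heap_modify_end,
};

/* Each plan node embeds its own state instead of sharing globals. */
typedef struct SketchModifyNode
{
    int               rows_processed;
    SketchModifyState state;
} SketchModifyNode;

static void
sketch_node_begin(SketchModifyNode *node)
{
    node->rows_processed = 0;
    node->state.buffered = 0;
    node->state.rows_processed = &node->rows_processed;
}

static void
sketch_node_insert(SketchModifyNode *node)
{
    node->state.buffered++;
}

/* Generic wrapper: dispatch to the AM method; no callback stored in state. */
static void
sketch_table_modify_end(const SketchTableAm *am, SketchModifyState *state)
{
    assert(am->modify_end != NULL);
    am->modify_end(state);
}
```

Because the end-of-statement work lives in the AM method, there is no separate callback for the AM to both install and invoke.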

Regards,
	Jeff Davis