public inbox for [email protected]
help / color / mirror / Atom feedFrom: Jeff Davis <[email protected]>
To: Bharath Rupireddy <[email protected]>
Cc: Masahiko Sawada <[email protected]>
Cc: PostgreSQL-development <[email protected]>
Cc: Andres Freund <[email protected]>
Cc: Dilip Kumar <[email protected]>
Cc: Luc Vlaming <[email protected]>
Cc: Justin Pryzby <[email protected]>
Cc: Michael Paquier <[email protected]>
Cc: Matthias van de Meent <[email protected]>
Cc: Alexander Korotkov <[email protected]>
Subject: Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date: Mon, 26 Aug 2024 14:18:21 -0700
Message-ID: <[email protected]> (raw)
In-Reply-To: <CALj2ACXnCvcYvaz4aDH_ezUBJanz_Boi8W=76fOwrxiwSnUFOQ@mail.gmail.com>
References: <CALj2ACXdrOmB6Na9amHWZHKvRT3Z0nwTRsCwoMT-npOBtmXLXg@mail.gmail.com>
<[email protected]>
<CALj2ACX5UMWVFdrRNUE0KDrg54WV1cumBXwcETXhrPc1ibKAQA@mail.gmail.com>
<CAAWbhmj5Pio3nOUakObzLGCSS9dwFfgsNVDhwTGzXNwZc00uCQ@mail.gmail.com>
<CALj2ACVHC=c6eC9SRxhcTUrnXvNDNkEBgedi2WkVJYRb=0sWYw@mail.gmail.com>
<CALj2ACVE2h=LnFnpr3rh+6SZzdwzW5EZOYG2Z0t=p28Fn75eag@mail.gmail.com>
<CALj2ACWT0Rz8oybWBm5W4CeS0DvFkwaw-pEvGArhDLyPbZnW_g@mail.gmail.com>
<CALj2ACWxO3HPtpYZb765LZk-uKVuAvZPO1HDeZ8=mzMgVPgaww@mail.gmail.com>
<CALj2ACXJA4QQ_6zAHez0Uy-9t-ebmpox2y1QBja+mF4QP+h8WQ@mail.gmail.com>
<CAD21AoD97mhzF8cqsd2v1jg9z8xfvAJrPx6Wvi+Ev0Hmu96LJA@mail.gmail.com>
<CALj2ACUcv5pZoB0=gRrz54M9+YT9JCmo6FYyo5WUS6wnS+em=A@mail.gmail.com>
<CALj2ACWm77YofBMs9x3Zmp3ctNAhcS4TvPVuXKdfwCr22FqOHg@mail.gmail.com>
<[email protected]>
<CALj2ACWqVzhxDuWNTWAH-LuADvsyX0r-wpwgeJ+Q1FnAKjY5Yw@mail.gmail.com>
<[email protected]>
<CALj2ACU70HZm+0QRJdkGA5RdJUo4zPYnV2hzkiV-wH5QS2PAEQ@mail.gmail.com>
<[email protected]>
<CALj2ACVMV=gMROte2=0LBFnSCRvzL4D9WK6oQ9ZHr4Qj2S8xWA@mail.gmail.com>
<[email protected]>
<CALj2ACX9vVYHYkX8e6w058EuAs8JL5EsnzadTxGhpiE_Ep_ByA@mail.gmail.com>
<[email protected]>
<CALj2ACWTrx1zxWvq8Uj2rEwCsDgQHeJ53WdvzZUw3kW+_VPG6A@mail.gmail.com>
<CALj2ACUz5+_YNEa4ZY-XG960_oXefM50MjD71VgSCAVDkF3bzQ@mail.gmail.com>
<[email protected]>
<CALj2ACX90L5Mb5Vv=jsvhOdZ8BVsfpZf-CdCGhtm2N+bGUCSjg@mail.gmail.com>
<CALj2ACXnCvcYvaz4aDH_ezUBJanz_Boi8W=76fOwrxiwSnUFOQ@mail.gmail.com>
On Mon, 2024-08-26 at 11:09 +0530, Bharath Rupireddy wrote:
> On Wed, Jun 5, 2024 at 12:42 PM Bharath Rupireddy
> <[email protected]> wrote:
> >
> > Please find the v22 patches with the above changes.
>
> Please find the v23 patches after rebasing 0005 and adapting 0004 for
> 9758174e2e.
Thank you.
0001 API design:
* Remove TableModifyState.modify_end_callback.
* This patch means that we will either remove or deprecate
TableAmRoutine.multi_insert and finish_bulk_insert. Are there any
strong opinions about maintaining support for multi-insert, or should
we just remove it outright and force any new AMs to implement the new
APIs to maintain COPY performance?
* Why do we need a separate "modify_flags" and "options"? Can't we just
combine them into TABLE_MODIFY_* flags?
Alexander, you had some work in this area as well, such b1484a3f19. I
believe 0001 covers this use case in a different way: rather than
giving complete responsibility to the AM to insert into the indexes,
the caller provides a callback and the AM is responsible for calling it
at the time the tuples are flushed. Is that right?
The design has been out for a while, so unless others have suggestions,
I'm considering the major design points mostly settled and I will move
forward with something like 0001 (pending implementation issues).
Note: I believe this API will extend naturally to updates and deletes,
as well.
0001 implementation issues:
* We need default implementations for AMs that don't implement the new
APIs, so that the AM will still function even if it only defines the
single-tuple APIs. If we need to make use of the AM's multi_insert
method (I'm not sure we do), then the default methods would need to
handle that as well. (I thought a previous version had these default
implementations -- is there a reason they were removed?)
* I am confused about how the heap implementation manages state and
resets it. mistate->mem_cxt is initialized to a new memory context in
heap_modify_begin, and then re-initialized to another new memory
context in heap_modify_buffer_insert. Then the mistate->mem_cxt is also
used as a temp context for executing heap_multi_insert, and it gets
reset before calling the flush callback, which still needs the slots.
* Why materialize the slot at copyfrom.c:1308 if the slot is going to
be copied anyway (which also materializes it; see
tts_virtual_copyslot()) at heapam.c:2710?
* After correcting the memory issues, can you get updated performance
numbers for COPY?
Regards,
Jeff Davis
view thread (30+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Subject: Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
In-Reply-To: <[email protected]>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox