Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1knilY-00076t-3E for pgsql-hackers@arkaria.postgresql.org; Fri, 11 Dec 2020 13:47:24 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.92) (envelope-from ) id 1knilW-0001vo-W0 for pgsql-hackers@arkaria.postgresql.org; Fri, 11 Dec 2020 13:47:22 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1knilW-0001vg-L4 for pgsql-hackers@lists.postgresql.org; Fri, 11 Dec 2020 13:47:22 +0000 Received: from mail-pj1-x1033.google.com ([2607:f8b0:4864:20::1033]) by magus.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.92) (envelope-from ) id 1knilT-0006vp-TW for pgsql-hackers@postgresql.org; Fri, 11 Dec 2020 13:47:22 +0000 Received: by mail-pj1-x1033.google.com with SMTP id iq13so2220501pjb.3 for ; Fri, 11 Dec 2020 05:47:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to; bh=DTUXRC0lqSmN2WtPbj4qz5ZbnXZ4fVmQeb/vAjER8b8=; b=LlwJN2JjX8HtGNvKmqMCaFEamsmVwiITiLRdbkfxfkSMsGfVwjK6RCuDqc9go8BrRb qb4uSLWb07ntrg4QBejK1Yhz0Zqa1a9qQ+enkFOuRv1SfyuHbfxMXvyiaSxFnDr5ryB2 UmI2zek/Q8pS07SckaDUHIj/GtXPktl3Xx4vBXwckL8WtJZwtGQWFIfCu5MnbjPNRhe9 fvBydKtGsf5MPeXduXdBpr4W9Kgv+swzK8q8eyvgkO22wg+Z35vUQ3zrj7byinUIQjvg KiLbzdYtwd2vN0FspnPjuLA7k6nxV6VHRjjWfAAeXXw4I7lv+dUIhbBNQBk/CWYuS1ga cNBg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to; bh=DTUXRC0lqSmN2WtPbj4qz5ZbnXZ4fVmQeb/vAjER8b8=; b=g4nCtwuhhMbOLnJd0vFeefq6gQ7rm5KA3F/bBfre6kV7ZilqlGBr7rwZxfd4v1dYTd ZNT8JmWjKjkbWT6715mEpYP0LI47SOIv04qdm/Z706MT4rg90TZxtL5UMnpzw3it6DaI yjVYaiE1FfDz3HcXHwCu0eRli3wz3bltTQ49ZHz7N9udV8w5Qx+l5uPxe2VsE5XrIfF6 EhIc798pINGewoRWwUBCSxU+GLY+OLzK5y/TLMHDsj5szm5fPTuTvZLGcAdR78VlaiFZ iIQYlrOiKsYtlEvYYZSIqurkCA3HrGIOEkEVpntwETEakUOkZFgomi+PeRvQKn+8dLdc ka2Q== X-Gm-Message-State: AOAM532wskgXb5icM/qZFz8DW9XtHa9gd+g4GiI4oA9MoaBtVrPGlG0s TUXWr9ptIbDpBSdwKzNADV90Q+e+MQY9DkvyozcksEL4 X-Google-Smtp-Source: ABdhPJwd4lm9IEU/w+hNKdeDCMz0T9/Buw6tyy/0YJXPbJMsgfJGrrwsVn4TPIwqA/tY9sx8cqB3tYyUq+4lgrNDCpk= X-Received: by 2002:a17:90a:970b:: with SMTP id x11mr13174025pjo.16.1607694437171; Fri, 11 Dec 2020 05:47:17 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Bharath Rupireddy Date: Fri, 11 Dec 2020 19:17:05 +0530 Message-ID: Subject: Re: New Table Access Methods for Multi and Single Inserts To: PostgreSQL-development , Andres Freund , Luc Vlaming Content-Type: multipart/alternative; boundary="000000000000d5c53205b63087b5" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Precedence: bulk --000000000000d5c53205b63087b5 Content-Type: text/plain; charset="UTF-8" On Tue, Dec 8, 2020 at 6:27 PM Bharath Rupireddy < bharath.rupireddyforpostgres@gmail.com> wrote: > Hi, > > Currently, for any component (such as COPY, CTAS[1], CREATE/REFRESH > Mat View[1], INSERT INTO SELECTs[2]) multi insert logic such as buffer > slots allocation, maintenance, decision to flush and clean up, need to > be implemented outside the table_multi_insert() API. The main problem > is that it fails to take into consideration the underlying storage > engine capabilities, for more details of this point refer to a > discussion in multi inserts in CTAS thread[1]. This also creates a lot > of duplicate code which is more error prone and not maintainable. > > More importantly, in another thread [3] @Andres Freund suggested to > have table insert APIs in such a way that they look more like 'scan' > APIs i.e. insert_begin, insert, insert_end. The main advantages doing > this are(quoting from his statement in [3]) - "more importantly it'd > allow an AM to optimize operations across multiple inserts, which is > important for column stores." > > I propose to introduce new table access methods for both multi and > single inserts based on the prototype suggested by Andres in [3]. Main > design goal of these new APIs is to give flexibility to tableam > developers in implementing multi insert logic dependent on the > underlying storage engine. > > Below are the APIs. I suggest to have a look at > v1-0001-New-Table-AMs-for-Multi-and-Single-Inserts.patch for details > of the new data structure and the API functionality. Note that > temporarily I used XX_v2, we can change it later. > > TableInsertState* table_insert_begin(initial_args); > void table_insert_v2(TableInsertState *state, TupleTableSlot *slot); > void table_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot); > void table_multi_insert_flush(TableInsertState *state); > void table_insert_end(TableInsertState *state); > > I'm attaching a few patches(just to show that these APIs work, avoids > a lot of duplicate code and makes life easier). Better commenting can > be added later. If these APIs and patches look okay, we can even > consider replacing them in other places such as nodeModifyTable.c and > so on. > > v1-0001-New-Table-AMs-for-Multi-and-Single-Inserts.patch ---> > introduces new table access methods for multi and single inserts. Also > implements/rearranges the outside code for heap am into these new > APIs. > v1-0002-CTAS-and-REFRESH-Mat-View-With-New-Multi-Insert-Table-AM.patch > ---> adds new multi insert table access methods to CREATE TABLE AS, > CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW. > v1-0003-ATRewriteTable-With-New-Single-Insert-Table-AM.patch ---> adds > new single insert table access method to ALTER TABLE rewrite table > code. > v1-0004-COPY-With-New-Multi-and-Single-Insert-Table-AM.patch ---> adds > new single and multi insert table access method to COPY code. > > Thoughts? > > [1] - https://www.postgresql.org/message-id/4eee0730-f6ec-e72d-3477-561643f4b327%40swarm64.com > [2] - https://www.postgresql.org/message-id/20201124020020.GK24052%40telsasoft.com > [3] - https://www.postgresql.org/message-id/20200924024128.kyk3r5g7dnu3fxxx%40alap3.anarazel.de Added this to commitfest to get it reviewed further. https://commitfest.postgresql.org/31/2871/ With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com --000000000000d5c53205b63087b5 Content-Type: text/html; charset="UTF-8"
On Tue, Dec 8, 2020 at 6:27 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> Hi,
>
> Currently, for any component (such as COPY, CTAS[1], CREATE/REFRESH
> Mat View[1], INSERT INTO SELECTs[2]) multi insert logic such as buffer
> slots allocation, maintenance, decision to flush and clean up, need to
> be implemented outside the table_multi_insert() API. The main problem
> is that it fails to take into consideration the underlying storage
> engine capabilities, for more details of this point refer to a
> discussion in multi inserts in CTAS thread[1]. This also creates a lot
> of duplicate code which is more error prone and not maintainable.
>
> More importantly, in another thread [3] @Andres Freund suggested to
> have table insert APIs in such a way that they look more like 'scan'
> APIs i.e. insert_begin, insert, insert_end. The main advantages doing
> this are(quoting from his statement in [3]) - "more importantly it'd
> allow an AM to optimize operations across multiple inserts, which is
> important for column stores."
>
> I propose to introduce new table access methods for both multi and
> single inserts based on the prototype suggested by Andres in [3]. Main
> design goal of these new APIs is to give flexibility to tableam
> developers in implementing multi insert logic dependent on the
> underlying storage engine.
>
> Below are the APIs. I suggest to have a look at
> v1-0001-New-Table-AMs-for-Multi-and-Single-Inserts.patch for details
> of the new data structure and the API functionality. Note that
> temporarily I used XX_v2, we can change it later.
>
> TableInsertState* table_insert_begin(initial_args);
> void table_insert_v2(TableInsertState *state, TupleTableSlot *slot);
> void table_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot);
> void table_multi_insert_flush(TableInsertState *state);
> void table_insert_end(TableInsertState *state);
>
> I'm attaching a few patches(just to show that these APIs work, avoids
> a lot of duplicate code and makes life easier). Better commenting can
> be added later. If these APIs and patches look okay, we can even
> consider replacing them in other places such as nodeModifyTable.c and
> so on.
>
> v1-0001-New-Table-AMs-for-Multi-and-Single-Inserts.patch --->
> introduces new table access methods for multi and single inserts. Also
> implements/rearranges the outside code for heap am into these new
> APIs.
> v1-0002-CTAS-and-REFRESH-Mat-View-With-New-Multi-Insert-Table-AM.patch
> ---> adds new multi insert table access methods to CREATE TABLE AS,
> CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW.
> v1-0003-ATRewriteTable-With-New-Single-Insert-Table-AM.patch ---> adds
> new single insert table access method to ALTER TABLE rewrite table
> code.
> v1-0004-COPY-With-New-Multi-and-Single-Insert-Table-AM.patch ---> adds
> new single and multi insert table access method to COPY code.
>
> Thoughts?
>
> [1] - https://www.postgresql.org/message-id/4eee0730-f6ec-e72d-3477-561643f4b327%40swarm64.com
> [2] - https://www.postgresql.org/message-id/20201124020020.GK24052%40telsasoft.com
> [3] - https://www.postgresql.org/message-id/20200924024128.kyk3r5g7dnu3fxxx%40alap3.anarazel.de

Added this to commitfest to get it reviewed further.

https://commitfest.postgresql.org/31/2871/

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
--000000000000d5c53205b63087b5--