Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rzf9y-005iHs-UA for pgsql-hackers@arkaria.postgresql.org; Wed, 24 Apr 2024 16:07:51 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1rzf9v-00G21P-Pa for pgsql-hackers@arkaria.postgresql.org; Wed, 24 Apr 2024 16:07:47 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rzf9v-00G21H-Cp for pgsql-hackers@lists.postgresql.org; Wed, 24 Apr 2024 16:07:47 +0000 Received: from mail-yw1-x1135.google.com ([2607:f8b0:4864:20::1135]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1rzf9r-002jNM-VJ for pgsql-hackers@postgresql.org; Wed, 24 Apr 2024 16:07:46 +0000 Received: by mail-yw1-x1135.google.com with SMTP id 00721157ae682-615019cd427so57274527b3.3 for ; Wed, 24 Apr 2024 09:07:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1713974862; x=1714579662; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=EthlymLyFZh1XJRg888fsO2Z4xm8N5acBkQLCbzI1RM=; b=FmK+zZDjL7jSV5EOj9zqx2BhJ8npr9gZAn22qaG3xv82YrKFnBI7RZpaFID698uvT/ 7Cdq9ARjrKqa3mtGE6p2dSGvzBXqJ8fhPCBPtmP6iTs6QKOhkIQdHUMwD6HAp9Ij2pj1 XLZ58wPAtFFGpH4VFVkxWZHiy7GxsfyNlBO9gJ+kUgWXI+7ewosXPCQ+qjwknCUEWGlc XXO0vLoe7n/rwbev9GLkIl36CwEmGC7TjdaEuM4ZqNgaGnWQHMAZtt8F4/rmw4QqORzi vegazfslB68X1/ZIYlEODYkRifid2j81aQ7uUpW6X3xuoWSfzfFuvU5bFFe1FIhMTp8t cw8w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1713974862; x=1714579662; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=EthlymLyFZh1XJRg888fsO2Z4xm8N5acBkQLCbzI1RM=; b=QyTPOzitzSoufuZbdpy6B8KFLWBGvnOtl16/ik90gmh+w7tDn8FiJLA3fg3UQ3wLTL DRAHk2Hz23RXHatuwRFflnR9U9cE52etd600RxhpusF1e7Rc8OvDR7aGmkdcyq1dqmxL QzJsLfBW+bZyehhWD4h1Zoktzg2wySlN+1ea08Hfo7EthVSGsrpO8gMidsH6NQp9MBTP YxawA3YCnnoxDYMVRfJ9jNhHfzAb+Ky8ERiY/hmBxq19Kxyz7VYQX2BE3mF7p9R2irgP gItQj6VHTSINftKb199PR/dnUYzG3QV6p6+l53VFtQ/a6lUAoajRMzya+Or1B0kgLpEq XyGw== X-Forwarded-Encrypted: i=1; AJvYcCX0kkxmox00on8ra08CH706UfamY3b9W3BjAA94IGwHPDwCsCCBUvycawx73l9lRzjphXDaSr0AGB0Z06ePu4mfLc/8PQAtIWafZGcD X-Gm-Message-State: AOJu0YzPldY4JC1weFhpHQxqQjo0nr66apIB9JdiBbj4823xeKHiMhxG ugmdU2xMcH6IvoZGBio4iVRa5PYoD0nCIKbd2GHQQ7ldu11XqDDqjOjgr2DAiJJg1wvAvAH9o96 C2VN6Bz725wJllYzOzNrRgMMmzAo= X-Google-Smtp-Source: AGHT+IGca3KXjDqQpC6GA6mWUf4tcMcX/SDLfOIvR+gOcI2Rwy7W4tzNSXj9T5Uoqp1qhmLakjjI9w3cPLF8afXRKPE= X-Received: by 2002:a05:690c:6785:b0:615:27b6:762f with SMTP id ib5-20020a05690c678500b0061527b6762fmr2478756ywb.49.1713974861996; Wed, 24 Apr 2024 09:07:41 -0700 (PDT) MIME-Version: 1.0 References: <20230603223824.o7iyochli2dwwi7k@alap3.anarazel.de> <6be6f58815dc0844fbe058edf56b4e735a6efc1c.camel@j-davis.com> <2280bf7241119bb88cbe0fe5eb36490cbd04c0c0.camel@j-davis.com> <12f1a2d8dd3b6305c0354f1c701f44b7be5e54eb.camel@j-davis.com> <8633171cb034aafc260fdf37df04b6c779aa1e2f.camel@j-davis.com> In-Reply-To: From: Pavel Stehule Date: Wed, 24 Apr 2024 18:07:03 +0200 Message-ID: Subject: Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM To: Bharath Rupireddy Cc: Jeff Davis , Masahiko Sawada , PostgreSQL-development , Andres Freund , Dilip Kumar , Luc Vlaming , Justin Pryzby , Michael Paquier , Matthias van de Meent , Alexander Korotkov Content-Type: multipart/alternative; boundary="000000000000cdb2b40616d9de6a" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000cdb2b40616d9de6a Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable st 24. 4. 2024 v 14:50 odes=C3=ADlatel Bharath Rupireddy < bharath.rupireddyforpostgres@gmail.com> napsal: > On Wed, Apr 3, 2024 at 1:10=E2=80=AFAM Jeff Davis wro= te: > > > > Here's where I think this API should go: > > > > 1. Have table_modify_begin/end and table_modify_buffer_insert, like > > those that are implemented in your patch. > > I added table_modify_begin, table_modify_buffer_insert, > table_modify_buffer_flush and table_modify_end. Table Access Method (AM) > authors now can define their own buffering strategy and flushing decision= s > based on their tuple storage kinds and various other AM specific factors.= I > also added a default implementation that falls back to single inserts whe= n > no implementation is provided for these AM by AM authors. See the attache= d > v19-0001 patch. > > > 2. Add some kind of flush callback that will be called either while the > > tuples are being flushed or after the tuples are flushed (but before > > they are freed by the AM). (Aside: do we need to call it while the > > tuples are being flushed to get the right visibility semantics for > > after-row triggers?) > > I added a flush callback named TableModifyBufferFlushCallback; when > provided by callers invoked after tuples are flushed to disk from the > buffers but before the AM frees them up. Index insertions and AFTER ROW > INSERT triggers can be executed in this callback. See the v19-0001 patch > for how AM invokes the flush callback, and see either v19-0003 or v19-000= 4 > or v19-0005 for how a caller can supply the callback and required context > to execute index insertions and AR triggers. > > > 3. Add table_modify_buffer_{update|delete} APIs. > > > > 9. Use these new methods for DELETE, UPDATE, and MERGE. MERGE can use > > the buffer_insert/update/delete APIs; we don't need a separate merge > > method. This probably requires that the AM maintain 3 separate buffers > > to distinguish different kinds of changes at flush time (obviously > > these can be initialized lazily to avoid overhead when not being used). > > I haven't thought about these things yet. I can only focus on them after > seeing how the attached patches go from here. > > > 4. Some kind of API tweaks to help manage memory when modifying > > pertitioned tables, so that the buffering doesn't get out of control. > > Perhaps just reporting memory usage and allowing the caller to force > > flushes would be enough. > > Heap implementation for thes new Table AMs uses a separate memory context > for all of the operations. Please have a look and let me know if we need > anything more. > > > 5. Use these new methods for CREATE/REFRESH MATERIALIZED VIEW. This is > > fairly straightforward, I believe, and handled by your patch. Indexes > > are (re)built afterward, and no triggers are possible. > > > > 6. Use these new methods for CREATE TABLE ... AS. This is fairly > > straightforward, I believe, and handled by your patch. No indexes or > > triggers are possible. > > I used multi inserts for all of these including TABLE REWRITE commands > such as ALTER TABLE. See the attached v19-0002 patch. Check the testing > section below for benefits. > > FWIW, following are some of the TABLE REWRITE commands that can get > benefitted: > > ALTER TABLE tbl ALTER c1 TYPE bigint; > ALTER TABLE itest13 ADD COLUMN c int GENERATED BY DEFAULT AS IDENTITY; > ALTER MATERIALIZED VIEW heapmv SET ACCESS METHOD heap2; > ALTER TABLE itest3 ALTER COLUMN a TYPE int; > ALTER TABLE gtest20 ALTER COLUMN b SET EXPRESSION AS (a * 3); > ALTER TABLE has_volatile ADD col4 int DEFAULT (random() * 10000)::int; > and so on. > > > 7. Use these new methods for COPY. We have to be careful to avoid > > regressions for the heap method, because it's already managing its own > > buffers. If the AM manages the buffering, then it may require > > additional copying of slots, which could be a disadvantage. To solve > > this, we may need some minor API tweaks to avoid copying when the > > caller guarantees that the memory will not be freed to early, or > > perhaps expose the AM's memory context to copyfrom.c. Another thing to > > consider is that the buffering in copyfrom.c is also used for FDWs, so > > that buffering code path needs to be preserved in copyfrom.c even if > > not used for AMs. > > I modified the COPY FROM code to use the new Table AMs, and performed som= e > tests which show no signs of regression. Check the testing section below > for more details. See the attached v19-0005 patch. With this, > table_multi_insert can be deprecated. > > > 8. Use these new methods for INSERT INTO ... SELECT. One potential > > challenge here is that execution nodes are not always run to > > completion, so we need to be sure that the flush isn't forgotten in > > that case. > > I did that in v19-0003. I did place the table_modify_end call in multiple > places including ExecEndModifyTable. I didn't find any issues with it. > Please have a look and let me know if we need the end call in more places= . > Check the testing section below for benefits. > > > 10. Use these new methods for logical apply. > > I used multi inserts for Logical Replication apply. in v19-0004. Check th= e > testing section below for benefits. > > FWIW, open-source pglogical does have multi insert support, check code > around > https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical_app= ly_heap.c#L960 > . > > > 11. Deprecate the multi_insert API. > > I did remove both table_multi_insert and table_finish_bulk_insert in > v19-0006. Perhaps, removing them isn't a great idea, but adding a > deprecation WARNING/ERROR until some more PG releases might be worth > looking at. > > > Thoughts on this plan? Does your patch make sense in v17 as a stepping > > stone, or should we try to make all of these API changes together in > > v18? > > If the design, code and benefits that these new Table AMs bring to the > table look good, I hope to see it for PG 18. > > > Also, a sample AM code would be a huge benefit here. Writing a real AM > > is hard, but perhaps we can at least have an example one to demonstrate > > how to use these APIs? > > The attached patches already have implemented these new Table AMs for > Heap. I don't think we need a separate implementation to demonstrate. If > others feel so, I'm open to thoughts here. > > Having said above, I'd like to reiterate the motivation behind the new > Table AMs for multi and single inserts. > > 1. A scan-like API with state being carried across is thought to be bette= r > as suggested by Andres Freund - > https://www.postgresql.org/message-id/20200924024128.kyk3r5g7dnu3fxxx@ala= p3.anarazel.de > . > 2. Allowing a Table AM to optimize operations across multiple inserts, > define its own buffering strategy and take its own flushing decisions bas= ed > on their tuple storage kinds and various other AM specific factors. > 3. Improve performance of various SQL commands with multi inserts for Hea= p > AM. > > The attached v19 patches might need some more detailed comments, some > documentation and some specific tests ensuring the multi inserts for Heap > are kicked-in for various commands. I'm open to thoughts here. > > I did some testing to see how various commands benefit with multi inserts > using these new Table AM for heap. It's not only the improvement in > performance these commands see, but also the amount of WAL that gets > generated reduces greatly. After all, multi inserts optimize the insertio= ns > by writing less WAL. IOW, writing WAL record per page if multiple rows fi= t > into a single data page as opposed to WAL record per row. > > Test case 1: 100 million rows, 2 columns (int and float) > > Command | HEAD (sec) | PATCHED (sec) | Faster by % > | Faster by X > ------------------------------ | ---------- | ------------- | ----------- > | ----------- > CREATE TABLE AS | 121 | 77 | 36.3 > | 1.57 > CREATE MATERIALIZED VIEW | 101 | 49 | 51.4 > | 2.06 > REFRESH MATERIALIZED VIEW | 113 | 54 | 52.2 > | 2.09 > ALTER TABLE (TABLE REWRITE) | 124 | 81 | 34.6 > | 1.53 > COPY FROM | 71 | 72 | 0 > | 1 > INSERT INTO ... SELECT | 117 | 62 | 47 > | 1.88 > LOGICAL REPLICATION APPLY | 393 | 306 | 22.1 > | 1.28 > > Command | HEAD (WAL in GB) | PATCHED (WAL in GB) | > Reduced by % | Reduced by X > ------------------------------ | ---------------- | ------------------- | > ------------ | ----------- > CREATE TABLE AS | 6.8 | 2.4 | > 64.7 | 2.83 > CREATE MATERIALIZED VIEW | 7.2 | 2.3 | > 68 | 3.13 > REFRESH MATERIALIZED VIEW | 10 | 5.1 | > 49 | 1.96 > ALTER TABLE (TABLE REWRITE) | 8 | 3.2 | > 60 | 2.5 > COPY FROM | 2.9 | 3 | > 0 | 1 > INSERT INTO ... SELECT | 8 | 3 | > 62.5 | 2.66 > LOGICAL REPLICATION APPLY | 7.5 | 2.3 | > 69.3 | 3.26 > > Test case 2: 1 billion rows, 1 column (int) > > Command | HEAD (sec) | PATCHED (sec) | Faster by % > | Faster by X > ------------------------------ | ---------- | ------------- | ----------- > | ----------- > CREATE TABLE AS | 794 | 386 | 51.38 > | 2.05 > CREATE MATERIALIZED VIEW | 1006 | 563 | 44.03 > | 1.78 > REFRESH MATERIALIZED VIEW | 977 | 603 | 38.28 > | 1.62 > ALTER TABLE (TABLE REWRITE) | 1189 | 714 | 39.94 > | 1.66 > COPY FROM | 321 | 330 | -0.02 > | 0.97 > INSERT INTO ... SELECT | 1084 | 586 | 45.94 > | 1.84 > LOGICAL REPLICATION APPLY | 3530 | 2982 | 15.52 > | 1.18 > > Command | HEAD (WAL in GB) | PATCHED (WAL in GB) | > Reduced by % | Reduced by X > ------------------------------ | ---------------- | ------------------- | > ------------ | ----------- > CREATE TABLE AS | 60 | 12 | > 80 | 5 > CREATE MATERIALIZED VIEW | 60 | 12 | > 80 | 5 > REFRESH MATERIALIZED VIEW | 60 | 12 | > 80 | 5 > ALTER TABLE (TABLE REWRITE) | 123 | 31 | > 60 | 2.5 > COPY FROM | 12 | 12 | > 0 | 1 > INSERT INTO ... SELECT | 120 | 24 | > 80 | 5 > LOGICAL REPLICATION APPLY | 61 | 12 | > 80.32 | 5 > looks pretty impressive! Pavel > > Test setup: > ./configure --prefix=3D$PWD/pg17/ --enable-tap-tests CFLAGS=3D"-ggdb3 -O2= " > > install.log && make -j 8 install > install.log 2>&1 & > > wal_level=3Dlogical > max_wal_size =3D 256GB > checkpoint_timeout =3D 1h > > Test system is EC2 instance of type c5.4xlarge: > Architecture: x86_64 > CPU op-mode(s): 32-bit, 64-bit > Address sizes: 46 bits physical, 48 bits virtual > Byte Order: Little Endian > CPU(s): 16 > On-line CPU(s) list: 0-15 > Vendor ID: GenuineIntel > Model name: Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz > CPU family: 6 > Model: 85 > Thread(s) per core: 2 > Core(s) per socket: 8 > Socket(s): 1 > Stepping: 7 > BogoMIPS: 5999.99 > Caches (sum of all): > L1d: 256 KiB (8 instances) > L1i: 256 KiB (8 instances) > L2: 8 MiB (8 instances) > L3: 35.8 MiB (1 instance) > NUMA: > NUMA node(s): 1 > NUMA node0 CPU(s): 0-15 > RAM: > MemTotal: 32036536 kB > > -- > Bharath Rupireddy > PostgreSQL Contributors Team > RDS Open Source Databases > Amazon Web Services: https://aws.amazon.com > --000000000000cdb2b40616d9de6a Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


=
st 24. 4. 2024 v=C2=A014:50 odes=C3= =ADlatel Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> napsal:
= On Wed, Apr 3, 2024 at 1:10=E2=80=AFAM Jeff Davis <pgsql@j-davis.com> wrote:
><= br>> Here's where I think this API should go:
>
> 1. Hav= e table_modify_begin/end and table_modify_buffer_insert, like
> those= that are implemented in your patch.

I added table_modify_begin, tab= le_modify_buffer_insert, table_modify_buffer_flush and table_modify_end. Ta= ble Access Method (AM) authors now can define their own buffering strategy = and flushing decisions based on their tuple storage kinds and various other= AM specific factors. I also added a default implementation that falls back= to single inserts when no implementation is provided for these AM by AM au= thors. See the attached v19-0001 patch.

> 2. Add some kind of flu= sh callback that will be called either while the
> tuples are being f= lushed or after the tuples are flushed (but before
> they are freed b= y the AM). (Aside: do we need to call it while the
> tuples are being= flushed to get the right visibility semantics for
> after-row trigge= rs?)

I added a flush callback named TableModifyBufferFlushCallback; = when provided by callers invoked after tuples are flushed to disk from the = buffers but before the AM frees them up. Index insertions and AFTER ROW INS= ERT triggers can be executed in this callback. See the v19-0001 patch for h= ow AM invokes the flush callback, and see either v19-0003 or v19-0004 or v1= 9-0005 for how a caller can supply the callback and required context to exe= cute index insertions and AR triggers.

> 3. Add table_modify_buff= er_{update|delete} APIs.
>
> 9. Use these new methods for DELET= E, UPDATE, and MERGE. MERGE can use
> the buffer_insert/update/delete= APIs; we don't need a separate merge
> method. This probably req= uires that the AM maintain 3 separate buffers
> to distinguish differ= ent kinds of changes at flush time (obviously
> these can be initiali= zed lazily to avoid overhead when not being used).

I haven't tho= ught about these things yet. I can only focus on them after seeing how the = attached patches go from here.

> 4. Some kind of API tweaks to he= lp manage memory when modifying
> pertitioned tables, so that the buf= fering doesn't get out of control.
> Perhaps just reporting memor= y usage and allowing the caller to force
> flushes would be enough.
Heap implementation for thes new Table AMs uses a separate memory con= text for all of the operations. Please have a look and let me know if we ne= ed anything more.

> 5. Use these new methods for CREATE/REFRESH M= ATERIALIZED VIEW. This is
> fairly straightforward, I believe, and ha= ndled by your patch. Indexes
> are (re)built afterward, and no trigge= rs are possible.
>
> 6. Use these new methods for CREATE TABLE = ... AS. This is fairly
> straightforward, I believe, and handled by y= our patch. No indexes or
> triggers are possible.

I used multi= inserts for all of these including TABLE REWRITE commands such as ALTER TA= BLE. See the attached v19-0002 patch. Check the testing section below for b= enefits.

FWIW, following are some of the TABLE REWRITE commands that= can get benefitted:

ALTER TABLE tbl ALTER c1 TYPE bigint;
ALTER = TABLE itest13 ADD COLUMN c int GENERATED BY DEFAULT AS IDENTITY;
ALTER M= ATERIALIZED VIEW heapmv SET ACCESS METHOD heap2;
ALTER TABLE itest3 ALTE= R COLUMN a TYPE int;
ALTER TABLE gtest20 ALTER COLUMN b SET EXPRESSION A= S (a * 3);
ALTER TABLE has_volatile ADD col4 int DEFAULT (random() * 100= 00)::int;
and so on.

> 7. Use these new methods for COPY. We h= ave to be careful to avoid
> regressions for the heap method, because= it's already managing its own
> buffers. If the AM manages the b= uffering, then it may require
> additional copying of slots, which co= uld be a disadvantage. To solve
> this, we may need some minor API tw= eaks to avoid copying when the
> caller guarantees that the memory wi= ll not be freed to early, or
> perhaps expose the AM's memory con= text to copyfrom.c. Another thing to
> consider is that the buffering= in copyfrom.c is also used for FDWs, so
> that buffering code path n= eeds to be preserved in copyfrom.c even if
> not used for AMs.
I modified the COPY FROM code to use the new Table AMs, and performed some= tests which show no signs of regression. Check the testing section below f= or more details. See the attached v19-0005 patch. With this, table_multi_in= sert can be deprecated.

> 8. Use these new methods for INSERT INT= O ... SELECT. One potential
> challenge here is that execution nodes = are not always run to
> completion, so we need to be sure that the fl= ush isn't forgotten in
> that case.

I did that in v19-0003= . I did place the table_modify_end call in multiple places including ExecEn= dModifyTable. I didn't find any issues with it. Please have a look and = let me know if we need the end call in more places. Check the testing secti= on below for benefits.

> 10. Use these new methods for logical ap= ply.

I used multi inserts for Logical Replication apply. in v19-0004= . Check the testing section below for benefits.

FWIW, ope= n-source pglogical does have multi insert support, check code around https://github.com/2ndQuadrant/pglogic= al/blob/REL2_x_STABLE/pglogical_apply_heap.c#L960.

> 1= 1. Deprecate the multi_insert API.

I did remove both table_multi_ins= ert and table_finish_bulk_insert in v19-0006. Perhaps, removing them isn= 9;t a great idea, but adding a deprecation WARNING/ERROR until some more PG= releases might be worth looking at.

> Thoughts on this plan? Doe= s your patch make sense in v17 as a stepping
> stone, or should we tr= y to make all of these API changes together in
> v18?

If the d= esign, code and benefits that these new Table AMs bring to the table look g= ood, I hope to see it for PG 18.

> Also, a sample AM code would b= e a huge benefit here. Writing a real AM
> is hard, but perhaps we ca= n at least have an example one to demonstrate
> how to use these APIs= ?

The attached patches already have implemented these new Table AMs = for Heap. I don't think we need a separate implementation to demonstrat= e. If others feel so, I'm open to thoughts here.

Having said abo= ve, I'd like to reiterate the motivation behind the new Table AMs for m= ulti and single inserts.

1. A scan-like API with state being carried= across is thought to be better as suggested by Andres Freund - https://www.postgresql.org/message-id/20200= 924024128.kyk3r5g7dnu3fxxx@alap3.anarazel.de.
2. Allowing a Table AM= to optimize operations across multiple inserts, define its own buffering s= trategy and take its own flushing decisions based on their tuple storage ki= nds and various other AM specific factors.
3. Improve performance of var= ious SQL commands with multi inserts for Heap AM.

The attached v19 p= atches might need some more detailed comments, some documentation and some = specific tests ensuring the multi inserts for Heap are kicked-in for variou= s commands. I'm open to thoughts here.

I did some testing to see= how various commands benefit with multi inserts using these new Table AM f= or heap. It's not only the improvement in performance these commands se= e, but also the amount of WAL that gets generated reduces greatly. After al= l, multi inserts optimize the insertions by writing less WAL. IOW, writing = WAL record per page if multiple rows fit into a single data page as opposed= to WAL record per row.

Test case 1: 100 million rows, 2 columns (in= t and float)

Command =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0| HEAD (sec) | PATCHED (sec) | Faster by % | Faster by X
------------= ------------------ | ---------- | ------------- | ----------- | -----------=
CREATE TABLE AS =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= | 121 =C2=A0 =C2=A0 =C2=A0 =C2=A0| 77 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0| 36.3 =C2=A0 =C2=A0 =C2=A0 =C2=A0| 1.57
CREATE MATERIALIZED VIEW = =C2=A0 =C2=A0 =C2=A0 | 101 =C2=A0 =C2=A0 =C2=A0 =C2=A0| 49 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0| 51.4 =C2=A0 =C2=A0 =C2=A0 =C2=A0| 2.06
REFR= ESH MATERIALIZED VIEW =C2=A0 =C2=A0 =C2=A0| 113 =C2=A0 =C2=A0 =C2=A0 =C2=A0= | 54 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0| 52.2 =C2=A0 =C2=A0 =C2=A0 = =C2=A0| 2.09
ALTER TABLE (TABLE REWRITE) =C2=A0 =C2=A0| 124 =C2=A0 =C2= =A0 =C2=A0 =C2=A0| 81 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0| 34.6 =C2= =A0 =C2=A0 =C2=A0 =C2=A0| 1.53
COPY FROM =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0| 71 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 | 72 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0| 0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 | 1
INSERT INTO ... SELECT =C2=A0 =C2=A0 =C2=A0 =C2=A0 | 1= 17 =C2=A0 =C2=A0 =C2=A0 =C2=A0| 62 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0| 47 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0| 1.88
LOGICAL REPLICATION APP= LY =C2=A0 =C2=A0 =C2=A0| 393 =C2=A0 =C2=A0 =C2=A0 =C2=A0| 306 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 | 22.1 =C2=A0 =C2=A0 =C2=A0 =C2=A0| 1.28

Comma= nd =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0| HEAD (WAL in GB) | PATCHED (WAL in GB) | Reduced by % | Redu= ced by X
------------------------------ | ---------------- | -----------= -------- | ------------ | -----------
CREATE TABLE AS =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0| 6.8 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0| 2.4 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 | 64.7 =C2=A0 =C2=A0 =C2=A0 =C2=A0 | 2.83
CREATE MATERIALIZED VIEW = =C2=A0 =C2=A0 =C2=A0 | 7.2 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= | 2.3 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 | 68 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 | 3.13
REFRESH MATERIALIZED VIEW =C2=A0 =C2= =A0 =C2=A0| 10 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 | 5.1 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 | 49 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 | 1.96
ALTER TABLE (TABLE REWRITE) =C2=A0 =C2=A0| 8 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0| 3.2 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 | 60 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 | 2.5
COPY FROM =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0| 2.9 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0| 3 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 | = 0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0| 1
INSERT INTO ... SELECT = =C2=A0 =C2=A0 =C2=A0 =C2=A0 | 8 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0| 3 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 | 62.5 =C2=A0 =C2=A0 =C2=A0 =C2=A0 | 2.66
LOGICAL REPLICATION APP= LY =C2=A0 =C2=A0 =C2=A0| 7.5 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0| 2.3 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 | 69.3 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 | 3.26

Test case 2: 1 billion rows, = 1 column (int)

Command =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0| HEAD (sec) | PATCHED (sec) | Faster by % | Faster by X
------------= ------------------ | ---------- | ------------- | ----------- | -----------=
CREATE TABLE AS =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= | 794 =C2=A0 =C2=A0 =C2=A0 =C2=A0| 386 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 |= 51.38 =C2=A0 =C2=A0 =C2=A0 | 2.05
CREATE MATERIALIZED VIEW =C2=A0 =C2= =A0 =C2=A0 | 1006 =C2=A0 =C2=A0 =C2=A0 | 563 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 | 44.03 =C2=A0 =C2=A0 =C2=A0 | 1.78
REFRESH MATERIALIZED VIEW =C2= =A0 =C2=A0 =C2=A0| 977 =C2=A0 =C2=A0 =C2=A0 =C2=A0| 603 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 | 38.28 =C2=A0 =C2=A0 =C2=A0 | 1.62
ALTER TABLE (TABLE= REWRITE) =C2=A0 =C2=A0| 1189 =C2=A0 =C2=A0 =C2=A0 | 714 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 | 39.94 =C2=A0 =C2=A0 =C2=A0 | 1.66
COPY FROM =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0| 321 = =C2=A0 =C2=A0 =C2=A0 =C2=A0| 330 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 | -0.02= =C2=A0 =C2=A0 =C2=A0 | 0.97
INSERT INTO ... SELECT =C2=A0 =C2=A0 =C2=A0= =C2=A0 | 1084 =C2=A0 =C2=A0 =C2=A0 | 586 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 | 45.94 =C2=A0 =C2=A0 =C2=A0 | 1.84
LOGICAL REPLICATION APPLY =C2=A0= =C2=A0 =C2=A0| 3530 =C2=A0 =C2=A0 =C2=A0 | 2982 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0| 15.52 =C2=A0 =C2=A0 =C2=A0 | 1.18

Command =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0| HEAD= (WAL in GB) | PATCHED (WAL in GB) | Reduced by % | Reduced by X
-------= ----------------------- | ---------------- | ------------------- | --------= ---- | -----------
CREATE TABLE AS =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0| 60 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 |= 12 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0| 80 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 | 5
CREATE MATERIALIZED VIEW =C2=A0 =C2= =A0 =C2=A0 | 60 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 | 12 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0| 80 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 | 5
REFRESH MATERIALIZED VIEW =C2=A0 =C2=A0 =C2= =A0| 60 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 | 12 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0| 80 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 | 5
ALTER TABLE (TABLE REWRITE) =C2=A0 =C2=A0| 123 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0| 31 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0| 60 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 |= 2.5
COPY FROM =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0| 12 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 |= 12 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0| 0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0| 1
INSERT INTO ... SELECT =C2=A0 = =C2=A0 =C2=A0 =C2=A0 | 120 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= | 24 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0| 80 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 | 5
LOGICAL REPLICATION APPLY =C2=A0 =C2= =A0 =C2=A0| 61 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 | 12 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0| 80.32 =C2=A0 =C2= =A0 =C2=A0 =C2=A0| 5

loo= ks pretty impressive!

Pavel
=C2=A0

Test setup:
./configure --pre= fix=3D$PWD/pg17/ --enable-tap-tests CFLAGS=3D"-ggdb3 -O2" > in= stall.log && make -j 8 install > install.log 2>&1 &
wal_level=3Dlogical
max_wal_size =3D 256GB
checkpoint_timeout = =3D 1h


Test system is EC2 instance of type c5.4xlarge:
Architecture: =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0x86_64=
=C2=A0 CPU op-mode(s): =C2=A0 =C2=A0 =C2=A0 =C2=A032-bit, 64-bit
=C2= =A0 Address sizes: =C2=A0 =C2=A0 =C2=A0 =C2=A0 46 bits physical, 48 bits vi= rtual
=C2=A0 Byte Order: =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Little= Endian
CPU(s): =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A016
=C2=A0 On-line CPU(s) list: =C2=A0 0-15
Vendor ID: =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 GenuineIntel
=C2=A0 Model name= : =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Intel(R) Xeon(R) Platinum 8275CL= CPU @ 3.00GHz
=C2=A0 =C2=A0 CPU family: =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A06
=C2=A0 =C2=A0 Model: =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 85
=C2=A0 =C2=A0 Thread(s) per core: =C2=A02
=C2=A0 =C2=A0 Core(s= ) per socket: =C2=A08
=C2=A0 =C2=A0 Socket(s): =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 1
=C2=A0 =C2=A0 Stepping: =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A07
=C2=A0 =C2=A0 BogoMIPS: =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A05999.99

Caches (sum of a= ll): =C2=A0 =C2=A0
=C2=A0 L1d: =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 256 KiB (8 instances)
=C2=A0 L1i: =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 256 KiB (8 instances)
= =C2=A0 L2: =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A08 MiB (8 instances)
=C2=A0 L3: =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A035.8 MiB (1 instance)
NUMA: =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0
=C2=A0 NUMA = node(s): =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A01
=C2=A0 NUMA node0 CPU(s): = =C2=A0 =C2=A0 0-15

RAM:
=C2=A0 MemTo= tal: =C2=A0 =C2=A0 =C2=A0 32036536 kB

--
Bharat= h Rupireddy
PostgreSQL Contributors Team
RDS Open Source DatabasesAmazon Web Services: = https://aws.amazon.com
--000000000000cdb2b40616d9de6a--