From: Masahiko Sawada <[email protected]>
To: Hayato Kuroda (Fujitsu) <[email protected]>
Cc: Amit Kapila <[email protected]>
Cc: Jan Wieck <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: Initial COPY of Logical Replication is too slow
Date: Mon, 30 Mar 2026 17:29:01 -0700
Message-ID: <CAD21AoB_WyfWHcqfanUH1aGtMRwuqvQULOan=2skYtZW7aGNGg@mail.gmail.com>
In-Reply-To: <OS9PR01MB121494C802D79DAAEA1B1D073F556A@OS9PR01MB12149.jpnprd01.prod.outlook.com>
References: <CAB-JLwbBFNuASyEnZWP0Tck9uNkthBZqi6WoXNevUT6+mV8XmA@mail.gmail.com>
<CAD21AoA6i2ui8FMZeuU_KxX4t-fM8G==zTW2Dp6-goujttrpew@mail.gmail.com>
<CAB-JLwZpp=7c9_r0beWWJxRh2BS_2Vvth8UDv7H57DBeaqggVg@mail.gmail.com>
<CAD21AoDT3sL2COprsRumM9zEpL1Bk5VWboK4V2mRnjGua8xfeA@mail.gmail.com>
<CAD21AoDQM62GOtaTzD_CVMSsFhv6o9c0Au1dSM1QuxeKFkWAKw@mail.gmail.com>
<CAD21AoCz7HjEr3oeb=haK31YHxHZLcvD_wx_a-+xLPKywq++3A@mail.gmail.com>
<TY4PR01MB16907733B75A99117F013AFCA947FA@TY4PR01MB16907.jpnprd01.prod.outlook.com>
<CAD21AoA9YgiY1rVKMPZwB00WU_G4UfzoawY=7hyd7hpvBPcK6w@mail.gmail.com>
<CAA4eK1KoSi60dtakJzn0MxNnHF1Yf4indSAffTjJxQG_31jsgQ@mail.gmail.com>
<CAD21AoB4B3MOxJ7-v9YLjV5fTOtaLRUhX3jN3kqhEi7D7-uY4A@mail.gmail.com>
<[email protected]>
<CAD21AoCmHpKrNg9D3mcOA973CZ5N_dBLxb8pERpSxEeRLSQxpA@mail.gmail.com>
<CAD21AoAEVyxwn_bMWHvcU-Gcz3aUVjAtMbdgfoJ8MZNiLLEh0g@mail.gmail.com>
<CAA4eK1Jkouj=w+PHzMB6v890ES3QOLf=cUTvZmGFr-WMQW2OnA@mail.gmail.com>
<CAD21AoB4_n7+s=uM9apX1JVtvGvgM8ismAx_uMxvDmUXfQULsw@mail.gmail.com>
<CAD21AoBJcxRcaWQot302diaxoDcsnezRhnZa7p8UrPh5AGNeHQ@mail.gmail.com>
<OS9PR01MB121494C802D79DAAEA1B1D073F556A@OS9PR01MB12149.jpnprd01.prod.outlook.com>
On Thu, Mar 26, 2026 at 1:35 AM Hayato Kuroda (Fujitsu)
<[email protected]> wrote:
>
> Dear Sawada-san,
> (Sending again because the previous mail was blocked by some rules)
>
> I ran the performance tests independently for the 0001 patch. Overall, the performance
> looked very good: the new function's execution time was roughly O(1) with respect to
> the total number of tables. It seems good enough.
>
> Source code:
> ----------------
> HEAD (4287c50f) + v4-0001 patch.
>
> Setup:
> ---------
> A database cluster was set up with shared_buffers=100GB. Tables were defined in the
> public schema, and the same number of tables were defined in schema sch1.
> The total numbers of tables were {50, 500, 5000, 50000, 500000}.
> A publication included the schema sch1 and each public table individually.
>
> The attached script sets up the same environment. Its suffix was changed to .txt to pass the rules.
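For readers who cannot open the attachment, a setup along these lines might look like the following. This is only a hypothetical sketch, not the attached script: the table names (t_1, t_2, ...), the column layout, and the 25 + 25 split (50 tables in total) are assumptions. FOR TABLES IN SCHEMA requires PostgreSQL 15 or later and superuser privileges.

```sql
-- Hypothetical setup sketch: table names, columns, and counts are assumptions.
CREATE SCHEMA IF NOT EXISTS sch1;

-- Every table in sch1 (including ones created later) is published via the schema.
CREATE PUBLICATION pub FOR TABLES IN SCHEMA sch1;

DO $$
DECLARE
    n_per_schema int := 25;  -- 25 in sch1 + 25 in public = 50 tables in total
BEGIN
    FOR i IN 1..n_per_schema LOOP
        EXECUTE format('CREATE TABLE sch1.%I (id int, val text)', 't_' || i);
        EXECUTE format('CREATE TABLE public.%I (id int, val text)', 't_' || i);
        -- Public tables are added to the publication one by one.
        EXECUTE format('ALTER PUBLICATION pub ADD TABLE public.%I', 't_' || i);
    END LOOP;
END
$$;
```

For the larger cases, creating hundreds of thousands of tables in a single DO block will likely exhaust the lock table, so raising max_locks_per_transaction or splitting the work across several transactions would probably be needed.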
>
> Workload Run:
> --------------------
> I ran two SQL queries and measured their execution time with the \timing meta-command.
> Each case emulates what the tablesync worker would do.
>
> Case 1: old SQL
> ```
> SELECT DISTINCT
> (CASE WHEN (array_length(gpt.attrs, 1) = c.relnatts)
> THEN NULL ELSE gpt.attrs END)
> FROM pg_publication p,
> LATERAL pg_get_publication_tables(p.pubname) gpt,
> pg_class c
> WHERE gpt.relid = 17885 AND c.oid = gpt.relid
> AND p.pubname IN ( 'pub' );
> ```
>
> Case 2: new SQL
> ```
> SELECT DISTINCT
> (CASE WHEN (array_length(gpt.attrs, 1) = c.relnatts)
> THEN NULL ELSE gpt.attrs END)
> FROM pg_publication p,
> LATERAL pg_get_publication_tables(p.pubname, 16535) gpt,
> pg_class c
> WHERE c.oid = gpt.relid
> AND p.pubname IN ( 'pub' );
> ```
>
> Result Observations:
> ---------------
> The attached bar graph shows the results. A logarithmic scale is used for the execution
> time (y-axis) so that both small- and large-scale cases are visible. With the old SQL,
> the execution time became approximately 10x longer from 500 to 5000 tables and from
> 5000 to 50000 tables. In contrast, the execution time of the new SQL stays almost
> constant regardless of the number of tables.
>
> Detailed Result:
> --------------
> Each cell is the median of 10 runs.
>
> Total tables   Old SQL [ms]   New SQL [ms]
>           50           5.77           4.19
>          500          15.75           4.28
>         5000         120.39           4.22
>        50000        1741.89           4.60
>       500000       73287.16           4.95

Thank you for doing the performance tests! These observations match the
results of my local performance tests.
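To put numbers on the scaling, the growth factor between consecutive table counts can be computed from the reported medians alone. The query below is a small sketch that uses only the values in the results table and runs on any PostgreSQL:

```sql
-- Growth factor of each query's median time relative to the previous table count.
WITH results(total_tables, old_ms, new_ms) AS (
    VALUES (50,      5.77,     4.19),
           (500,     15.75,    4.28),
           (5000,    120.39,   4.22),
           (50000,   1741.89,  4.60),
           (500000,  73287.16, 4.95)
)
SELECT total_tables,
       round(old_ms / lag(old_ms) OVER w, 2) AS old_growth,  -- NULL for the first row
       round(new_ms / lag(new_ms) OVER w, 2) AS new_growth
FROM results
WINDOW w AS (ORDER BY total_tables)
ORDER BY total_tables;
```

With these medians the old query grows by 2.73x, 7.64x, 14.47x and 42.07x at successive steps, while the new query's factors stay between 0.99x and 1.09x.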

BTW, the new is_table_publishable_in_publication() could be useful in
other places too where we check whether a particular table is published
by a publication, for example get_rel_sync_entry(). That would be a
separate patch, though.

Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com