public inbox for [email protected]  
help / color / mirror / Atom feed
From: Masahiko Sawada <[email protected]>
To: Marcos Pegoraro <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Subject: Re: Initial COPY of Logical Replication is too slow
Date: Fri, 19 Dec 2025 17:58:59 -0800
Message-ID: <CAD21AoA6i2ui8FMZeuU_KxX4t-fM8G==zTW2Dp6-goujttrpew@mail.gmail.com> (raw)
In-Reply-To: <CAB-JLwbBFNuASyEnZWP0Tck9uNkthBZqi6WoXNevUT6+mV8XmA@mail.gmail.com>
References: <CAB-JLwbBFNuASyEnZWP0Tck9uNkthBZqi6WoXNevUT6+mV8XmA@mail.gmail.com>

Hi,

On Sat, Dec 6, 2025 at 4:19 AM Marcos Pegoraro <[email protected]> wrote:
>
> Subscriber needs to ask publisher about tables and fields to COPY and it uses pg_get_publication_tables for that, and it is too slow when the number of tables is high because on every table it's subscribed it has to run this select.

Yeah, if we pass a publication that a lot of tables belong to to
pg_get_publication_tables(), it could take a long time to return as it
needs to construct many entries.

> We can get the same result with a join on pg_publication_rel.

You changed the query not to use pg_get_publication_tables():

-                                                "SELECT DISTINCT"
-                                                "  (CASE WHEN
(array_length(gpt.attrs, 1) = c.relnatts)"
-                                                "   THEN NULL ELSE
gpt.attrs END)"
-                                                "  FROM pg_publication p,"
-                                                "  LATERAL
pg_get_publication_tables(p.pubname) gpt,"
-                                                "  pg_class c"
-                                                " WHERE gpt.relid =
%u AND c.oid = gpt.relid"
-                                                "   AND p.pubname IN ( %s )",
-                                                lrel->remoteid,
-                                                pub_names->data);
+                       "SELECT CASE WHEN cardinality(r.prattrs) <>
relnatts THEN "
+                                               "r.prattrs END FROM pg_class c "
+                                               "LEFT JOIN LATERAL
(SELECT DISTINCT prattrs FROM "
+                                               "pg_publication_rel r
INNER JOIN pg_publication p "
+                                               "ON p.oid = r.prpubid
WHERE c.oid = r.prrelid AND "
+                                               "pubname in ( %s )) r
ON TRUE WHERE c.oid = %u",
+                                                pub_names->data,
+                                                lrel->remoteid);

Simply replacing pg_get_publication_tables() with joining on
pg_publication_rel doesn't work since pg_get_publication_tables()
cares for several cases, for example where the specified columns are
generated columns and the specified table is a partitioned table etc.
Therefore the patch doesn't pass the regression tests.

I think it would make more sense to introduce a dedicated SQL function
that takes the reloid as well as the list of publications and returns
the relation's the column list and row filter expression while
filtering unnecessary rows inside the function.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com





view thread (47+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: Initial COPY of Logical Replication is too slow
  In-Reply-To: <CAD21AoA6i2ui8FMZeuU_KxX4t-fM8G==zTW2Dp6-goujttrpew@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox