MIME-Version: 1.0
References: 
 <CAEoWx2mMorbMwjKbT4YCsjDyL3r9Mp+z0bbK57VZ+OkJTgJQVQ@mail.gmail.com>
 <CAA4eK1+UL6wVDNzkpHjA7RVLD_8AkrP2tu+RvQ2h5AUjyEe+-Q@mail.gmail.com>
 <5ABD7727-CD22-4112-A186-0E788EE78109@gmail.com>
 <CAFiTN-ucvk8JOiLvjii6VXar6nYJvCQDgzp8_4v55yweUmzdzw@mail.gmail.com>
 <23A24BFF-18A7-4FE9-AAFA-13E1AA207DD0@gmail.com>
 <CAA4eK1KzjxO-qWjWSox6e6AWH4FVU5ZPEgeZ+na=eyov7umutg@mail.gmail.com>
 <D4A3A4D7-2F51-4F0D-9FDE-95E9C318148E@gmail.com>
In-Reply-To: <D4A3A4D7-2F51-4F0D-9FDE-95E9C318148E@gmail.com>
From: GRANT ZHOU <grantzhou@gmail.com>
Date: Wed, 17 Dec 2025 01:32:39 -0800
Message-ID: 
 <CA+FXcm8z5JO0ft9ABcnFtpGEc7_O41_2pHCP3PJZ6_biVh+Pzg@mail.gmail.com>
Subject: Re: Improve logical replication usability when tables lack primary
 keys
To: Chao Li <li.evan.chao@gmail.com>
Cc: Amit Kapila <amit.kapila16@gmail.com>,
 Dilip Kumar <dilipbalaut@gmail.com>,
	Postgres hackers <pgsql-hackers@lists.postgresql.org>
Content-Type: multipart/alternative; boundary="000000000000a0c69e06462288b3"
Archived-At: 
 <https://www.postgresql.org/message-id/CA%2BFXcm8z5JO0ft9ABcnFtpGEc7_O41_2pHCP3PJZ6_biVh%2BPzg%40mail.gmail.com>
Precedence: bulk

--000000000000a0c69e06462288b3
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Tue, Dec 16, 2025 at 4:59 PM Chao Li <li.evan.chao@gmail.com> wrote:

> > On Dec 15, 2025, at 13:48, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > So, without patch, there is no way we can silently replicate the
> > UPDATE/DELETE. Ideally, users should alter the tables and make RI as
> > FULL in such cases if they don't have PK for such tables. Falling back
> > to FULL for DEFAULT when the table doesn't have PK based on GUC has a
> > downside that it will increase WAL volume by a large amount.
>
> I agree that this downside exists, but it is an inherent cost that users
> must accept if they choose to replicate all tables, including those without
> a primary key. In practice, users who opt into such a configuration are
> typically aware of the WAL overhead and make that trade-off consciously.
>
> > I don't know what is a good way to give to users who don't want to do
> > the required setup but if we really want to provide something, it is
> > better to allow such a thing via the publication option instead.
>
> Using a publication-level option could also work. One complication,
> however, is that a table can belong to multiple publications. For example,
> if table_a belongs to both pub_a and pub_b, and only pub_a is configured
> with fallback_to_full while pub_b keeps the default behavior
> (fallback_to_none), then the effective behavior for table_a would need to
> remain fallback_to_none, meaning that UPDATE/DELETE would still not be
> allowed if table_a has not a primary key.
>
> > I think it would be good to do such an enhancement if we have more
> > community support and some other users also appreciate such a feature.
> > Otherwise, adding something which is specific to a particular user
> > sounds like a recipe of maintenance burden especially when we already
> > provide a way to achieve the same thing as is required by the user.
>
> Let me elaborate on that point.
>
> My company has a very large user base in China, with over 100K deployments
> across multiple industries. However, there is currently a significant gap
> between this large user population and direct participation in the PG
> community. I joined the company in July this year as a full-time
> contributor to the PG community, and one of my responsibilities is to help
> bridge this gap and bring real-world user feedback into community
> discussions.
>
> As I mentioned in my earlier email, this requirement comes from
> large-scale deployments. The database owners in these environments have
> operational models that may not always align with what we consider the
> ideal or fully optimized setup, but they are the result of years of
> accumulated practice and operational experience. For these users, the
> proposed feature would significantly simplify their day-to-day operations
> and reduce operational friction.
>

+1 on the importance of addressing these large-scale operational realities.

Beyond the scale issue, I believe there is a noticeable inconsistency
between the documentation's promise of automation and the actual behavior
of Replica Identity.

1. The "Practical Gap" of Schema Automation
According to the documentation for FOR TABLES IN SCHEMA [1], the feature
matches "all tables in the specified list of schemas, including tables
created in the future". This explicitly promises an unattended, automated
workflow for new tables.

However, this promise is immediately broken by the default Replica Identity
rules:
1) New tables are created with REPLICA IDENTITY DEFAULT [2].
2) For tables without a primary key, DEFAULT identity "cannot support
UPDATE or DELETE operations" and "attempting such operations will result in
an error on the publisher" [3].

This creates a logical trap: the system automatically adds the new table to
the publication (as promised), but if the table lacks a primary key, the
first UPDATE or DELETE on it fails because the table was created with a
replica identity that cannot support those operations.
This forces manual intervention (ALTER TABLE) in what is supposed to be an
automated workflow.
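
To make the rule from [3] concrete, here is a minimal Python sketch of it.
This is only an illustrative model, not PostgreSQL code: the Table class,
the identity_index_valid field, and can_publish_update_delete() are
hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from enum import Enum


class ReplicaIdentity(Enum):
    DEFAULT = "default"
    FULL = "full"
    NOTHING = "nothing"
    USING_INDEX = "using index"


@dataclass
class Table:
    # Hypothetical model of the table properties that matter here.
    name: str
    has_primary_key: bool
    # A newly created table starts out with REPLICA IDENTITY DEFAULT.
    replica_identity: ReplicaIdentity = ReplicaIdentity.DEFAULT
    # Only meaningful for USING_INDEX (False models a dropped index).
    identity_index_valid: bool = False


def can_publish_update_delete(table: Table) -> bool:
    """Return True if UPDATE/DELETE on a published table would be accepted."""
    if table.replica_identity is ReplicaIdentity.FULL:
        return True
    if table.replica_identity is ReplicaIdentity.DEFAULT:
        return table.has_primary_key
    if table.replica_identity is ReplicaIdentity.USING_INDEX:
        return table.identity_index_valid
    return False  # NOTHING never supports UPDATE/DELETE


# A new table without a primary key, picked up by FOR TABLES IN SCHEMA.
new_table = Table(name="orders_log", has_primary_key=False)
print(can_publish_update_delete(new_table))  # False

# The manual step that is required today.
new_table.replica_identity = ReplicaIdentity.FULL
print(can_publish_update_delete(new_table))  # True
```

The first call models the failure described above; the second models the
state after the manual ALTER TABLE step.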

2. Regarding the Solution
I support Amit's suggestion of a publication option. It avoids the risks of
a global GUC while letting users explicitly opt in to the trade-off
(accepting higher WAL volume), so that the automation provided by FOR
TABLES IN SCHEMA is functionally complete.
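
One detail such an option would have to settle is the multi-publication
case raised earlier in this thread. Below is a minimal Python sketch of the
conservative rule described there: the fallback applies only if every
publication containing the table opts in. The values fallback_to_full and
fallback_to_none are the names used in the quoted example, not existing
PostgreSQL options, and effective_fallback() is a hypothetical helper.

```python
from typing import Mapping

FALLBACK_TO_FULL = "fallback_to_full"  # hypothetical option value
FALLBACK_TO_NONE = "fallback_to_none"  # hypothetical default value


def effective_fallback(publication_options: Mapping[str, str]) -> str:
    """Resolve the fallback for one table.

    publication_options maps each publication that contains the table to
    the option value configured on that publication.
    """
    opted_in = [opt == FALLBACK_TO_FULL for opt in publication_options.values()]
    # Conservative rule: a single publication left at the default wins.
    if opted_in and all(opted_in):
        return FALLBACK_TO_FULL
    return FALLBACK_TO_NONE


# table_a is in pub_a (opted in) and pub_b (default).
print(effective_fallback({"pub_a": FALLBACK_TO_FULL,
                          "pub_b": FALLBACK_TO_NONE}))  # fallback_to_none
```

Whether the most restrictive publication should win is itself a design
choice; the sketch only encodes the behavior described in the quoted text.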

[1]
https://www.postgresql.org/docs/current/sql-createpublication.html#SQL-CREATEPUBLICATION-PARAMS-FOR-TABLES-IN-SCHEMA
[2]
https://www.postgresql.org/docs/18/sql-altertable.html#SQL-ALTERTABLE-REPLICA-IDENTITY
[3]
https://www.postgresql.org/docs/18/logical-replication-publication.html#LOGICAL-REPLICATION-PUBLICATION-REPLICA-IDENTITY


--
Grant Zhou at Highgo Software

--000000000000a0c69e06462288b3--