Content-Type: text/plain;
	charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3864.400.21\))
Subject: Re: Improve logical replication usability when tables lack primary
 keys
From: Chao Li <li.evan.chao@gmail.com>
In-Reply-To: 
 <CAJpy0uDFWoGZQJ-zPudcs-+BvXaDk5i2LKiRj5Q+bQcNyG=_PA@mail.gmail.com>
Date: Tue, 14 Apr 2026 11:57:48 +0800
Cc: Amit Kapila <amit.kapila16@gmail.com>,
 Euler Taveira <euler@eulerto.com>,
 GRANT ZHOU <grantzhou@gmail.com>,
 "houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>,
 Dilip Kumar <dilipbalaut@gmail.com>,
 Postgres hackers <pgsql-hackers@lists.postgresql.org>
Content-Transfer-Encoding: quoted-printable
Message-Id: <58FDF77E-17BB-42B8-B42A-0A339ACA0938@gmail.com>
References: 
 <CAEoWx2mMorbMwjKbT4YCsjDyL3r9Mp+z0bbK57VZ+OkJTgJQVQ@mail.gmail.com>
 <CAA4eK1+UL6wVDNzkpHjA7RVLD_8AkrP2tu+RvQ2h5AUjyEe+-Q@mail.gmail.com>
 <5ABD7727-CD22-4112-A186-0E788EE78109@gmail.com>
 <CAFiTN-ucvk8JOiLvjii6VXar6nYJvCQDgzp8_4v55yweUmzdzw@mail.gmail.com>
 <23A24BFF-18A7-4FE9-AAFA-13E1AA207DD0@gmail.com>
 <CAA4eK1KzjxO-qWjWSox6e6AWH4FVU5ZPEgeZ+na=eyov7umutg@mail.gmail.com>
 <D4A3A4D7-2F51-4F0D-9FDE-95E9C318148E@gmail.com>
 <TY4PR01MB16907E1C68EA6EBADCE54ABBE94ABA@TY4PR01MB16907.jpnprd01.prod.outlook.com>
 <a9da608f-24be-4213-a712-8592852d37f1@app.fastmail.com>
 <CA+FXcm-YZXJpc6E7XEDTv9Yaic=U7Dwnjj4znxJ4gCxUZMcXww@mail.gmail.com>
 <19dac243-1f46-4720-bdec-bf0f851d03b9@app.fastmail.com>
 <875BBCC0-CF08-4136-8E9E-F03DF75C3A11@gmail.com>
 <CAA4eK1KMOTYGSQdoPBZsJ6T-QEw5eYHG0padQXTViu7LeeFWMw@mail.gmail.com>
 <75C23B7E-8923-4CD1-83A1-F2E6E9B0D5D3@gmail.com>
 <CE70CD5B-8E54-4FEF-AB42-7C604940398D@gmail.com>
 <CAJpy0uDFWoGZQJ-zPudcs-+BvXaDk5i2LKiRj5Q+bQcNyG=_PA@mail.gmail.com>
To: shveta malik <shveta.malik@gmail.com>
Archived-At: 
 <https://www.postgresql.org/message-id/58FDF77E-17BB-42B8-B42A-0A339ACA0938%40gmail.com>
Precedence: bulk


> On Apr 8, 2026, at 19:43, shveta malik <shveta.malik@gmail.com> wrote:
>=20
> On Wed, Feb 4, 2026 at 12:51=E2=80=AFPM Chao Li =
<li.evan.chao@gmail.com> wrote:
>>=20
>>=20
>>=20
>>> On Dec 30, 2025, at 16:07, Chao Li <li.evan.chao@gmail.com> wrote:
>>>=20
>>>=20
>>>=20
>>>> On Dec 22, 2025, at 19:48, Amit Kapila <amit.kapila16@gmail.com> =
wrote:
>>>>=20
>>>> On Fri, Dec 19, 2025 at 1:39=E2=80=AFPM Chao Li =
<li.evan.chao@gmail.com> wrote:
>>>>>=20
>>>>>> On Dec 18, 2025, at 22:49, Euler Taveira <euler@eulerto.com> =
wrote:
>>>>>>=20
>>>>>> On Wed, Dec 17, 2025, at 6:43 PM, GRANT ZHOU wrote:
>>>>>>> On Wed, Dec 17, 2025 at 12:50=E2=80=AFPM Euler Taveira =
<euler@eulerto.com> wrote:
>>>>>>>> Each table needs to say what's its row identifier. The user =
created a table
>>>>>>>> without primary key. Well, create a primary key. There are =
dozens of thousands
>>>>>>>> of objects. Use a script.
>>>>>>> However, I=E2=80=99d like to share a user perspective regarding =
the "use a
>>>>>>> script" approach. The main value of `FOR TABLES IN SCHEMA` is
>>>>>>> *in-database automation*. If users still need to maintain =
external
>>>>>>> scripts to monitor and `ALTER` new tables to prevent replication
>>>>>>> errors, it significantly diminishes the value of that =
automation.
>>>>>>>=20
>>>>>>=20
>>>>>> As I tried to explain in the previous email, the problem with FOR =
ALL TABLES
>>>>>> and FOR TABLES IN SCHEMA syntax is that the is no catalog =
information about the
>>>>>> relations; the list of relations is collected at runtime.
>>>>>>=20
>>>>>> When I suggested "use a script" I was referring to fix the =
logical replication
>>>>>> setup regarding the lack of primary key. There is no need to have =
an automation
>>>>>> outside the database, use an event trigger. If your lazy user =
doesn't create
>>>>>> the primary key, assign REPLICA IDENTITY FULL. Something like
>>>>>>=20
>>>>>> -- This example is far from being a complete solution for fixing =
the lack of
>>>>>> -- primary key in a logical replication scenario.
>>>>>> -- ALTER TABLE should be supported too
>>>>>> CREATE OR REPLACE FUNCTION event_trigger_for_replica_identity()
>>>>>>     RETURNS event_trigger LANGUAGE plpgsql AS $$
>>>>>> DECLARE
>>>>>> obj record;
>>>>>> rec record;
>>>>>> ricnt integer :=3D 0;
>>>>>> BEGIN
>>>>>> FOR obj IN SELECT * FROM pg_event_trigger_ddl_commands()
>>>>>> LOOP
>>>>>>     IF obj.command_tag =3D 'CREATE TABLE' THEN
>>>>>>         SELECT COUNT(*) INTO ricnt FROM pg_index WHERE indrelid =3D=
 obj.objid AND indisprimary;
>>>>>>         RAISE NOTICE 'ricnt: %', ricnt;
>>>>>>         IF ricnt =3D 0 THEN
>>>>>>             EXECUTE 'ALTER TABLE ' || obj.object_identity || ' =
REPLICA IDENTITY FULL';
>>>>>>         END IF;
>>>>>>     END IF;
>>>>>> END LOOP;
>>>>>> END;
>>>>>> $$;
>>>>>>=20
>>>>>> CREATE EVENT TRIGGER event_trigger_for_replica_identity
>>>>>> ON ddl_command_end
>>>>>> EXECUTE FUNCTION event_trigger_for_replica_identity();
>>>>>>=20
>>>>>> CREATE TABLE event_trigger_test_1 (a int);
>>>>>> \d+ event_trigger_test_1
>>>>>> CREATE TABLE event_trigger_test_2 (a int primary key);
>>>>>> \d+ event_trigger_test_2
>>>>>> CREATE TABLE event_trigger_test_3 (a int, b text not null, =
primary key(b));
>>>>>> \d+ event_trigger_test_3
>>>>>> --ALTER TABLE event_trigger_test_3 DROP CONSTRAINT =
event_trigger_test_3_pkey;
>>>>>> --\d+ event_trigger_test_3
>>>>>>=20
>>>>>> DROP EVENT TRIGGER event_trigger_for_replica_identity;
>>>>>> DROP FUNCTION event_trigger_for_replica_identity;
>>>>>> DROP TABLE event_trigger_test_1, event_trigger_test_2, =
event_trigger_test_3;
>>>>>>=20
>>>>>> =
8<------------------------------------------------------------------------=
----8<
>>>>>>=20
>>>>>>> Additionally, tables without Primary Keys are valid SQL and =
extremely
>>>>>>> common in enterprise environments (e.g., audit logs, data =
warehousing).
>>>>>>> In large-scale deployments, enforcing PKs on every single table =
isn't
>>>>>>> always practical.
>>>>>>>=20
>>>>>>=20
>>>>>> I'm not saying users shouldn't create tables without a primary =
key. I'm arguing
>>>>>> that this decision should take into account what adjustments need =
to be made to
>>>>>> use these tables in logical replication.
>>>>>>=20
>>>>>>>=20
>>>>>>> I think the goal of this proposal is not to change the =
underlying table
>>>>>>> property design, but rather to seek a mechanism (like a =
Publication
>>>>>>> option) to ensure this automation functions safely without =
external
>>>>>>> intervention. It is simply about allowing the database to handle =
these
>>>>>>> valid, common scenarios gracefully when automation is enabled.
>>>>>>>=20
>>>>>>=20
>>>>>> You didn't get it. You already have one property to handle it and =
you are
>>>>>> proposing to add a second property to handle it.
>>>>>>=20
>>>>>> I think you are pursuing the wrong solution. IMO we need a =
solution to enforce
>>>>>> that the logical replication contract is valid. If you create or =
modify a table
>>>>>> that is part of a publication, there is no validation that that =
table complies
>>>>>> with the publication properties (update and delete properties =
should require an
>>>>>> appropriate replica identity). We should close the gaps in both =
publication and
>>>>>> table.
>>>>>>=20
>>>>=20
>>>> If we want, we can ensure that any table added to that specific
>>>> publication (that has an option replica_identy=3D'full') would
>>>> automatically override the default to FULL, if PK is not available.
>>>> This information can be cached to avoid overhead.
>>>>=20
>>>>>=20
>>>>> If I summarize Euler=E2=80=99s position in short words: discipline =
over convenience. I actually strongly agree with that. In PG we =
generally prefer explicit over implicit behavior, and predictability =
over magic.
>>>>>=20
>>>>=20
>>>> You haven't told why we can't consider a custom event trigger as
>>>> suggested by Euler for customers who are not willing to change the =
RI
>>>> default explicitly for each table. I think it is worth considering
>>>> providing a custom solution outside core-postgres for your =
customers
>>>> for this specific case.
>>>=20
>>> Thanks for raising this. Let me clarify why we don=E2=80=99t =
consider a custom event trigger a satisfactory solution in practice, =
even though it is technically possible.
>>>=20
>>> I discussed this with our field teams, and some customers have =
indeed experimented with event-trigger-based solutions before. However, =
they generally don=E2=80=99t prefer them for this use case.
>>>=20
>>> First, the required logic is non-trivial and fragile. The trigger =
would need to track table creation, primary key creation and removal, =
and distinguish between cases where REPLICA IDENTITY FULL was set =
implicitly versus explicitly by the user. Handling all these cases =
correctly makes the solution feel like a workaround rather than a robust =
enforcement mechanism.
>>>=20
>>> Second, event triggers introduce operational risk. They need to be =
installed, monitored, and maintained separately from the core system. If =
a trigger is accidentally dropped, disabled, or modified, the behavior =
silently changes, which is particularly risky for replication semantics.
>>>=20
>>> Third, customers place much higher trust in core PostgreSQL behavior =
than in custom scripts layered on top. Issues caused by core behavior =
are seen as something that can be understood, worked around, or fixed by =
upgrading, whereas failures caused by custom triggers are harder to =
diagnose and are often attributed to the overall solution quality.
>>>=20
>>> For these reasons, while event triggers can work as a stopgap, our =
customers strongly prefer a solution where the replication contract is =
enforced by core PostgreSQL rather than external mechanisms.
>>>=20
>>>>=20
>>>>> Based on the discussion so far, I think we share the following =
design goals:
>>>>>=20
>>>>> 1) Keep replica identity as a table property.
>>>>> 2) Avoid silent runtime failures when FOR TABLES IN SCHEMA pulls =
in tables without primary keys.
>>>>> 3) Avoid global or implicit behavior changes.
>>>>> 4) Preserve explicit opt-in for higher WAL cost.
>>>>> 5) Keep the logical replication contract explicit and enforceable.
>>>>>=20
>>>>> I=E2=80=99ve been thinking about whether adding a new replica =
identity could meet these goals.
>>>>>=20
>>>>> Today we have four replica identities: DEFAULT (PK, fallback to =
NONE), INDEX, FULL, and NONE.
>>>>>=20
>>>>> What if we introduce a new replica identity, tentatively called =
=E2=80=9CFORCE=E2=80=9D: PK with fallback to FULL. (Let=E2=80=99s keep =
our focus on the design, not argue the name for now.)
>>>>>=20
>>>>> With this approach:
>>>>>=20
>>>>> 1) Replica identity remains a table property.
>>>>> 2) Publication membership is still evaluated at runtime, so FOR =
TABLES IN SCHEMA is not special-cased.
>>>>> 3) No new GUCs are required.
>>>>> 4) The user must explicitly opt in by setting the replica =
identity. Once FORCE is chosen, adding or dropping a primary key later =
does not silently break UPDATE/DELETE replication.
>>>>>=20
>>>>> 5) The logical replication contract remains explicit; the table =
declares that it is safe for UPDATE/DELETE replication even without a =
PK, at the cost of higher WAL volume.
>>>>>=20
>>>>> This feels like a small, explicit extension of the existing RI =
semantics. Notably, REPLICA IDENTITY DEFAULT already has conditional =
behavior (PK fallback
>>>>> to NONE), so conditional RI behavior is not new, this would just =
make a different fallback explicit and user-chosen.
>>>>>=20
>>>>> After that, we could consider a database-level =
default_replica_identity setting, applied at table creation time, for =
environments that want this behavior consistently. But that would only =
make sense if we first agree on the table-level mechanism.
>>>>>=20
>>>>=20
>>>> I don't much like the database-level option as it expects a new
>>>> default to be introduced. I think the internal working will almost =
be
>>>> same as the option at publication-level.
>>>=20
>>> That=E2=80=99s fair. I agree that a database-level option wouldn=E2=80=
=99t be fundamentally different from a publication-level solution and =
would likely share most of the same internal mechanics.
>>>=20
>>> At this point nothing is decided yet; we=E2=80=99re still exploring =
different approaches and trying to understand the trade-offs.
>>>=20
>>> I have a question to better understand how a publication-level =
approach would behave in edge cases.
>>>=20
>>> Since replica identity is defined on tables and a table can belong =
to multiple publications, how should UPDATE/DELETE be handled if the =
same table is added to two publications with different expectations?
>>>=20
>>> For example, suppose a table without a PK is added to:
>>> - pub_a, which does not require FULL (or effectively falls back to =
NONE)
>>> - pub_b, which requires FULL for UPDATE/DELETE
>>>=20
>>> In this case, should UPDATE/DELETE on the table be allowed at all, =
and if so, based on which publication=E2=80=99s semantics? What do you =
think?
>>>=20
>>> Best regards,
>>> --
>>> Chao Li (Evan)
>>> HighGo Software Co., Ltd.
>>> https://www.highgo.com/
>>=20
>>=20
>> Hi Amit,
>>=20
>> Following your suggestion, I implemented a PoC that adds a new =
publication parameter (tentatively named fallbackfull) to make the =
DEFAULT =E2=86=92 FULL fallback behavior per-publication. I=E2=80=99m =
not attached to the parameter name =E2=80=94 if we decide to go with the =
publication approach, I=E2=80=99m happy to adjust naming based on =
feedback.
>>=20
>> After playing with this implementation for a couple of days, I ran =
into a few concerns:
>>=20
>> 1. Protocol extension required
>>=20
>> If the DEFAULT =E2=86=92 FULL fallback is triggered, the subscriber =
needs to know whether the corresponding publication has fallbackfull =
enabled in order to decide how to apply UPDATE/DELETE. That means we=E2=80=
=99d need to extend the logical replication protocol, e.g., by adding a =
new field to the RELATION message to carry the fallbackfull flag.
>=20
> If the DEFAULT =E2=86=92 FULL fallback is triggered, can we send =
'full'  to
> the subscriber? I think subscriber need not to know if FULl was set by
> user or is FORCED and the protocol need not be extended.
>=20
>>=20
>> 2. Impact on decoding plugins
>>=20
>> Decoding plugins would need to understand this new flag. In my PoC, I =
updated pgoutput, but there may be third-party plugins that would also =
need changes. That feels like a compatibility risk.
>=20
> If we make the change described in pt1, I think the protocol and
> subscriber sides will not need to change, but pgoutput would still
> need to be updated when sending the RELATION metadata. Specifically,
> instead of using the table=E2=80=99s configured replica identity =
directly, it
> should use the effective replica identity.
>=20
> pq_sendbyte(out, rel->rd_rel->relreplident -->effect _RI);
>=20
> So IIUC, third-party plugins may also need such change.
>=20
>>=20
>> 3. Potential data-integrity issues
>>=20
>> This is the most concerning part to me.
>>=20
>> Consider a table t1 with REPLICA IDENTITY DEFAULT and no primary key, =
included in publication p1. By design, UPDATE/DELETE on t1 are not =
allowed.
>>=20
>> However, a user could work around this by creating a dummy =
publication, adding t1 to it, and setting fallbackfull =3D true on that =
publication. This would effectively enable UPDATE/DELETE on t1.
>>=20
>> Later, if the owner of p1 decides to enable fallbackfull on p1 to =
replicate t1, the subscriber of p1 may already be out of sync due to the =
earlier updates/deletes performed via the dummy publication. At that =
point, subsequent UPDATE/DELETE replication may fail or behave =
incorrectly.
>>=20
>> =46rom this perspective, allowing fallbackfull at the publication =
level seems to open the door to cross-publication interference and data =
divergence.
>>=20
>> Given these concerns, I=E2=80=99m leaning toward keeping fallbackfull =
as a per-table option rather than a per-publication one. Curious to hear =
your thoughts.
>>=20
>=20
> I see your point. One possible approach could be that when any
> publication is altered to enable the fallback option, the effective
> replica identity for the affected table is promoted to FULL across all
> publications.
>=20
> i.e. for t1:
> If RI_DEFAULT + no PK + pub->fallbackfull=3Dtrue;
> effective_replica_identity for t1 =3D FULL (across all pubs)
>=20
> But then this brought us to the point that if we are deciding
> effective_replica_identity for a table irrespective of publication,
> then why not to make it table property?
>=20

This is true, but going back to the motivation for this feature, users =
do not want just another table-level property, because they can already =
do ALTER TABLE t REPLICA IDENTITY FULL today. A table-level property =
would therefore not help much. What users want from this feature is a =
mechanism that allows new tables to fall back from DEFAULT to FULL =
automatically.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/