public inbox for [email protected]  
help / color / mirror / Atom feed
From: Chao Li <[email protected]>
To: Euler Taveira <[email protected]>
Cc: GRANT ZHOU <[email protected]>
Cc: [email protected] <[email protected]>
Cc: Amit Kapila <[email protected]>
Cc: Dilip Kumar <[email protected]>
Cc: Postgres hackers <[email protected]>
Subject: Re: Improve logical replication usability when tables lack primary keys
Date: Fri, 19 Dec 2025 16:08:39 +0800
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <CAEoWx2mMorbMwjKbT4YCsjDyL3r9Mp+z0bbK57VZ+OkJTgJQVQ@mail.gmail.com>
	<CAA4eK1+UL6wVDNzkpHjA7RVLD_8AkrP2tu+RvQ2h5AUjyEe+-Q@mail.gmail.com>
	<[email protected]>
	<CAFiTN-ucvk8JOiLvjii6VXar6nYJvCQDgzp8_4v55yweUmzdzw@mail.gmail.com>
	<[email protected]>
	<CAA4eK1KzjxO-qWjWSox6e6AWH4FVU5ZPEgeZ+na=eyov7umutg@mail.gmail.com>
	<[email protected]>
	<TY4PR01MB16907E1C68EA6EBADCE54ABBE94ABA@TY4PR01MB16907.jpnprd01.prod.outlook.com>
	<[email protected]>
	<CA+FXcm-YZXJpc6E7XEDTv9Yaic=U7Dwnjj4znxJ4gCxUZMcXww@mail.gmail.com>
	<[email protected]>



> On Dec 18, 2025, at 22:49, Euler Taveira <[email protected]> wrote:
> 
> On Wed, Dec 17, 2025, at 6:43 PM, GRANT ZHOU wrote:
>> On Wed, Dec 17, 2025 at 12:50 PM Euler Taveira <[email protected]> wrote:
>>> Each table needs to say what's its row identifier. The user created a table
>>> without primary key. Well, create a primary key. There are dozens of thousands
>>> of objects. Use a script.
>> However, I’d like to share a user perspective regarding the "use a 
>> script" approach. The main value of `FOR TABLES IN SCHEMA` is 
>> *in-database automation*. If users still need to maintain external 
>> scripts to monitor and `ALTER` new tables to prevent replication 
>> errors, it significantly diminishes the value of that automation.
>> 
> 
> As I tried to explain in the previous email, the problem with FOR ALL TABLES
> and FOR TABLES IN SCHEMA syntax is that the is no catalog information about the
> relations; the list of relations is collected at runtime.
> 
> When I suggested "use a script" I was referring to fix the logical replication
> setup regarding the lack of primary key. There is no need to have an automation
> outside the database, use an event trigger. If your lazy user doesn't create
> the primary key, assign REPLICA IDENTITY FULL. Something like
> 
> -- This example is far from being a complete solution for fixing the lack of
> -- primary key in a logical replication scenario.
> -- ALTER TABLE should be supported too
> CREATE OR REPLACE FUNCTION event_trigger_for_replica_identity()
>        RETURNS event_trigger LANGUAGE plpgsql AS $$
> DECLARE
>    obj record;
>    rec record;
>    ricnt integer := 0;
> BEGIN
>    FOR obj IN SELECT * FROM pg_event_trigger_ddl_commands()
>    LOOP
>        IF obj.command_tag = 'CREATE TABLE' THEN
>            SELECT COUNT(*) INTO ricnt FROM pg_index WHERE indrelid = obj.objid AND indisprimary;
>            RAISE NOTICE 'ricnt: %', ricnt;
>            IF ricnt = 0 THEN
>                EXECUTE 'ALTER TABLE ' || obj.object_identity || ' REPLICA IDENTITY FULL';
>            END IF;
>        END IF;
>    END LOOP;
> END;
> $$;
> 
> CREATE EVENT TRIGGER event_trigger_for_replica_identity
>    ON ddl_command_end
>    EXECUTE FUNCTION event_trigger_for_replica_identity();
> 
> CREATE TABLE event_trigger_test_1 (a int);
> \d+ event_trigger_test_1
> CREATE TABLE event_trigger_test_2 (a int primary key);
> \d+ event_trigger_test_2
> CREATE TABLE event_trigger_test_3 (a int, b text not null, primary key(b));
> \d+ event_trigger_test_3
> --ALTER TABLE event_trigger_test_3 DROP CONSTRAINT event_trigger_test_3_pkey;
> --\d+ event_trigger_test_3
> 
> DROP EVENT TRIGGER event_trigger_for_replica_identity;
> DROP FUNCTION event_trigger_for_replica_identity;
> DROP TABLE event_trigger_test_1, event_trigger_test_2, event_trigger_test_3;
> 
> 8<----------------------------------------------------------------------------8<
> 
>> Additionally, tables without Primary Keys are valid SQL and extremely 
>> common in enterprise environments (e.g., audit logs, data warehousing). 
>> In large-scale deployments, enforcing PKs on every single table isn't 
>> always practical. 
>> 
> 
> I'm not saying users shouldn't create tables without a primary key. I'm arguing
> that this decision should take into account what adjustments need to be made to
> use these tables in logical replication.
> 
>> 
>> I think the goal of this proposal is not to change the underlying table 
>> property design, but rather to seek a mechanism (like a Publication 
>> option) to ensure this automation functions safely without external 
>> intervention. It is simply about allowing the database to handle these 
>> valid, common scenarios gracefully when automation is enabled.
>> 
> 
> You didn't get it. You already have one property to handle it and you are
> proposing to add a second property to handle it.
> 
> I think you are pursuing the wrong solution. IMO we need a solution to enforce
> that the logical replication contract is valid. If you create or modify a table
> that is part of a publication, there is no validation that that table complies
> with the publication properties (update and delete properties should require an
> appropriate replica identity). We should close the gaps in both publication and
> table.
> 

If I summarize Euler’s position in short words: discipline over convenience. I actually strongly agree with that. In PG we generally prefer explicit over implicit behavior, and predictability over magic.

Based on the discussion so far, I think we share the following design goals:

1) Keep replica identity as a table property.
2) Avoid silent runtime failures when FOR TABLES IN SCHEMA pulls in tables without primary keys.
3) Avoid global or implicit behavior changes.
4) Preserve explicit opt-in for higher WAL cost.
5) Keep the logical replication contract explicit and enforceable.

I’ve been thinking about whether adding a new replica identity could meet these goals.

Today we have four replica identities: DEFAULT (PK, fallback to NONE), INDEX, FULL, and NONE.

What if we introduce a new replica identity, tentatively called “FORCE”: PK with fallback to FULL. (Let’s keep our focus on the design, not argue the name for now.)

With this approach:

1) Replica identity remains a table property.
2) Publication membership is still evaluated at runtime, so FOR TABLES IN SCHEMA is not special-cased.
3) No new GUCs are required.
4) The user must explicitly opt in by setting the replica identity. Once FORCE is chosen, adding or dropping a primary key later does not silently break UPDATE/DELETE replication.
5) The logical replication contract remains explicit; the table declares that it is safe for UPDATE/DELETE replication even without a PK, at the cost of higher WAL volume.

This feels like a small, explicit extension of the existing RI semantics. Notably, REPLICA IDENTITY DEFAULT already has conditional behavior (PK fallback
to NONE), so conditional RI behavior is not new, this would just make a different fallback explicit and user-chosen.

After that, we could consider a database-level default_replica_identity setting, applied at table creation time, for environments that want this behavior consistently. But that would only make sense if we first agree on the table-level mechanism.

I’m interested in whether this direction aligns better with the goals above.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/









view thread (26+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Improve logical replication usability when tables lack primary keys
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox