Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wCUko-0022Vy-0S for pgsql-hackers@arkaria.postgresql.org; Tue, 14 Apr 2026 03:47:58 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1wCUkm-009gH4-1V for pgsql-hackers@arkaria.postgresql.org; Tue, 14 Apr 2026 03:47:57 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wCUkm-009gGv-0A for pgsql-hackers@lists.postgresql.org; Tue, 14 Apr 2026 03:47:57 +0000 Received: from mail-pj1-x102a.google.com ([2607:f8b0:4864:20::102a]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1wCUkk-00000000uUt-1G44 for pgsql-hackers@lists.postgresql.org; Tue, 14 Apr 2026 03:47:56 +0000 Received: by mail-pj1-x102a.google.com with SMTP id 98e67ed59e1d1-35fb0bb27e7so1236802a91.1 for ; Mon, 13 Apr 2026 20:47:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1776138474; x=1776743274; darn=lists.postgresql.org; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=JMJOEg/ayGhaMcLHgYGzQ0zdpke1KKiPjIsnut2z1ws=; b=owlnD1oHwppXFrExaMt/gvwBcsnSaPWXq132lteurcfbawvAR4je8Cj04PhL0OAGMJ bm6M4tU6PY6ogiw0GQGzp24FA0wbEG4JbKwDuGSMQe4K594eE5ily11eyl67zukzLjTm H+eNpkI3cMUrm6Q0pIeZcnWJrLnSqrgfjRynNn2vmbplhB1PnjMi9IqBYd1POhWMjDx8 Y/7uP9ymVc8polAwr54WPFwqBDS8mu2p3S5WNmtRyXMAV0ZUaxfALE+mzVMyrS4nPod7 zYyBFoINE19QINuOT1cEW0Tn5q656T8ZSf6dYXxLLdr87U8IYvZuIx4mAUDcsd+mEEt8 6jEQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1776138474; x=1776743274; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:x-gm-gg:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=JMJOEg/ayGhaMcLHgYGzQ0zdpke1KKiPjIsnut2z1ws=; b=HTDKntSwO6zJp6adJfDFBrzmfo+iA8Plf46gDFVMIgPkkoe3p7R/k2a73p1O0wYQOn B1kLJaj8V+YL33wTfl6fbTT84nWBGZZaeIoVfFRT0xI0Wo+qA7QGc6LdgcTo6IMd5qsH 6JozQBZrtCS76woLFfX49lgp1oh1TlgUWhykvubnrQFgdnJN55L08me9Xd7KaSq3rOt8 KRSWFdaCwZplYujzVfPIr2i8VL5wrMyGt+NPzLZC+0kU37Vc1xQFnoW3S+2tFC6BdFHe c0fVVSj9I7nMp+PTHh694qhYiGwt9tfHK1/3qT+be+Smfd8LRHQ4y7XwErpRK+Pus+q3 XZwg== X-Forwarded-Encrypted: i=1; AFNElJ+2I8Gpi4FVjSw/9ZwoHOxuZV1/Oaz0E16VZdYx/EsHzW7pa7Vxzb8r7atZqVU38xUVe5WRt4MjK/iXI9TC@lists.postgresql.org X-Gm-Message-State: AOJu0Yxc2dpESG/uNj/QLuDmnIglDmjDjs0IE54FOQhrwpWG+JGWpJLv ZzVFLKA5gbHiLJ+5YMKVhSgr86TsDY+G2wHG9RkmO1JwESTHaZhcYV3J X-Gm-Gg: AeBDieuFq+QoW6PTMdp87n8hmzUxvFSmx8eDba/NupP2GPr2WVfjO6VKEifUrMvKcvg BpvnaQDlBrhNHIM01GfNZ6bXhAVxV1DZi3RXlrv0RKha7sHoLRjqKKLBVi+Zt5mkj62oIaRIGDQ hO3X96C4wcVYRtNkk8OTLKcydh47ksWkKqpFVDvBOen0Q1FNyEvl2piu2YS2qZ1HGQ6OdN8WPO4 AGYHQDISwPpbUxJWJNYzcHc4pw7P8dWci2N3AgvXzaJHOc+RlxUxbu7JkslrJM8OzpV+lcyi/s6 yWnrqvj22e3FYyCM+tEpM+RZywPWaJqZp1NgqDBpJGtTOZkmJIj6dxqfoEi3srxAb+06HwFl/bA Bp4EUWoD29YnuHrGDDdUJMzl9+6lzNoUsryblP+UJhPYaGH2e9G/fLuKE+nUP8xXgpvySa3S4uR R1URy/e9RBrBsXw/B+J2f5E9Ch/ESrzmo= X-Received: by 2002:a17:90b:4a04:b0:35e:3afb:a3ff with SMTP id 98e67ed59e1d1-35e42541708mr15901640a91.3.1776138473744; Mon, 13 Apr 2026 20:47:53 -0700 (PDT) Received: from smtpclient.apple ([45.32.121.103]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-35fc6cc689esm608172a91.0.2026.04.13.20.47.50 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 13 Apr 2026 20:47:53 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3864.400.21\)) Subject: Re: Improve logical replication usability when tables lack primary keys From: Chao Li In-Reply-To: Date: Tue, 14 Apr 2026 11:47:12 +0800 Cc: Euler Taveira , GRANT ZHOU , "houzj.fnst@fujitsu.com" , Amit Kapila , Dilip Kumar , Postgres hackers Content-Transfer-Encoding: quoted-printable Message-Id: <6BD39478-8F32-48BD-9595-678E5B4BEE8C@gmail.com> References: <5ABD7727-CD22-4112-A186-0E788EE78109@gmail.com> <23A24BFF-18A7-4FE9-AAFA-13E1AA207DD0@gmail.com> <19dac243-1f46-4720-bdec-bf0f851d03b9@app.fastmail.com> <875BBCC0-CF08-4136-8E9E-F03DF75C3A11@gmail.com> To: shveta malik X-Mailer: Apple Mail (2.3864.400.21) List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk > On Apr 8, 2026, at 18:22, shveta malik wrote: >=20 > On Fri, Dec 19, 2025 at 1:39=E2=80=AFPM Chao Li = wrote: >>=20 >>=20 >>=20 >>> On Dec 18, 2025, at 22:49, Euler Taveira wrote: >>>=20 >>> On Wed, Dec 17, 2025, at 6:43 PM, GRANT ZHOU wrote: >>>> On Wed, Dec 17, 2025 at 12:50=E2=80=AFPM Euler Taveira = wrote: >>>>> Each table needs to say what's its row identifier. The user = created a table >>>>> without primary key. Well, create a primary key. There are dozens = of thousands >>>>> of objects. Use a script. >>>> However, I=E2=80=99d like to share a user perspective regarding the = "use a >>>> script" approach. The main value of `FOR TABLES IN SCHEMA` is >>>> *in-database automation*. If users still need to maintain external >>>> scripts to monitor and `ALTER` new tables to prevent replication >>>> errors, it significantly diminishes the value of that automation. >>>>=20 >>>=20 >>> As I tried to explain in the previous email, the problem with FOR = ALL TABLES >>> and FOR TABLES IN SCHEMA syntax is that the is no catalog = information about the >>> relations; the list of relations is collected at runtime. >>>=20 >>> When I suggested "use a script" I was referring to fix the logical = replication >>> setup regarding the lack of primary key. There is no need to have an = automation >>> outside the database, use an event trigger. If your lazy user = doesn't create >>> the primary key, assign REPLICA IDENTITY FULL. Something like >>>=20 >>> -- This example is far from being a complete solution for fixing the = lack of >>> -- primary key in a logical replication scenario. >>> -- ALTER TABLE should be supported too >>> CREATE OR REPLACE FUNCTION event_trigger_for_replica_identity() >>> RETURNS event_trigger LANGUAGE plpgsql AS $$ >>> DECLARE >>> obj record; >>> rec record; >>> ricnt integer :=3D 0; >>> BEGIN >>> FOR obj IN SELECT * FROM pg_event_trigger_ddl_commands() >>> LOOP >>> IF obj.command_tag =3D 'CREATE TABLE' THEN >>> SELECT COUNT(*) INTO ricnt FROM pg_index WHERE indrelid =3D = obj.objid AND indisprimary; >>> RAISE NOTICE 'ricnt: %', ricnt; >>> IF ricnt =3D 0 THEN >>> EXECUTE 'ALTER TABLE ' || obj.object_identity || ' = REPLICA IDENTITY FULL'; >>> END IF; >>> END IF; >>> END LOOP; >>> END; >>> $$; >>>=20 >>> CREATE EVENT TRIGGER event_trigger_for_replica_identity >>> ON ddl_command_end >>> EXECUTE FUNCTION event_trigger_for_replica_identity(); >>>=20 >>> CREATE TABLE event_trigger_test_1 (a int); >>> \d+ event_trigger_test_1 >>> CREATE TABLE event_trigger_test_2 (a int primary key); >>> \d+ event_trigger_test_2 >>> CREATE TABLE event_trigger_test_3 (a int, b text not null, primary = key(b)); >>> \d+ event_trigger_test_3 >>> --ALTER TABLE event_trigger_test_3 DROP CONSTRAINT = event_trigger_test_3_pkey; >>> --\d+ event_trigger_test_3 >>>=20 >>> DROP EVENT TRIGGER event_trigger_for_replica_identity; >>> DROP FUNCTION event_trigger_for_replica_identity; >>> DROP TABLE event_trigger_test_1, event_trigger_test_2, = event_trigger_test_3; >>>=20 >>> = 8<------------------------------------------------------------------------= ----8< >>>=20 >>>> Additionally, tables without Primary Keys are valid SQL and = extremely >>>> common in enterprise environments (e.g., audit logs, data = warehousing). >>>> In large-scale deployments, enforcing PKs on every single table = isn't >>>> always practical. >>>>=20 >>>=20 >>> I'm not saying users shouldn't create tables without a primary key. = I'm arguing >>> that this decision should take into account what adjustments need to = be made to >>> use these tables in logical replication. >>>=20 >>>>=20 >>>> I think the goal of this proposal is not to change the underlying = table >>>> property design, but rather to seek a mechanism (like a Publication >>>> option) to ensure this automation functions safely without external >>>> intervention. It is simply about allowing the database to handle = these >>>> valid, common scenarios gracefully when automation is enabled. >>>>=20 >>>=20 >>> You didn't get it. You already have one property to handle it and = you are >>> proposing to add a second property to handle it. >>>=20 >>> I think you are pursuing the wrong solution. IMO we need a solution = to enforce >>> that the logical replication contract is valid. If you create or = modify a table >>> that is part of a publication, there is no validation that that = table complies >>> with the publication properties (update and delete properties should = require an >>> appropriate replica identity). We should close the gaps in both = publication and >>> table. >>>=20 >>=20 >> If I summarize Euler=E2=80=99s position in short words: discipline = over convenience. I actually strongly agree with that. In PG we = generally prefer explicit over implicit behavior, and predictability = over magic. >>=20 >> Based on the discussion so far, I think we share the following design = goals: >>=20 >> 1) Keep replica identity as a table property. >> 2) Avoid silent runtime failures when FOR TABLES IN SCHEMA pulls in = tables without primary keys. >> 3) Avoid global or implicit behavior changes. >> 4) Preserve explicit opt-in for higher WAL cost. >> 5) Keep the logical replication contract explicit and enforceable. >>=20 >> I=E2=80=99ve been thinking about whether adding a new replica = identity could meet these goals. >>=20 >> Today we have four replica identities: DEFAULT (PK, fallback to = NONE), INDEX, FULL, and NONE. >>=20 >> What if we introduce a new replica identity, tentatively called = =E2=80=9CFORCE=E2=80=9D: PK with fallback to FULL. (Let=E2=80=99s keep = our focus on the design, not argue the name for now.) >>=20 >> With this approach: >>=20 >> 1) Replica identity remains a table property. >> 2) Publication membership is still evaluated at runtime, so FOR = TABLES IN SCHEMA is not special-cased. >> 3) No new GUCs are required. >> 4) The user must explicitly opt in by setting the replica identity. = Once FORCE is chosen, adding or dropping a primary key later does not = silently break UPDATE/DELETE replication. >> 5) The logical replication contract remains explicit; the table = declares that it is safe for UPDATE/DELETE replication even without a = PK, at the cost of higher WAL volume. >>=20 >> This feels like a small, explicit extension of the existing RI = semantics. Notably, REPLICA IDENTITY DEFAULT already has conditional = behavior (PK fallback >> to NONE), so conditional RI behavior is not new, this would just make = a different fallback explicit and user-chosen. >=20 >=20 Hi Shveta, Thanks for your feedback. > Even though I like the idea of keeping replica identity as a table > property, this raises a question about the need for a new FORCE > option. With this approach, the user must explicitly opt REPLICA > IDENTITY FORCE for tables without a primary key (pt 4 in your design). > But if the user is already expected to take an explicit action, they > could simply set REPLICA IDENTITY FULL themselves at table creation > time or via ALTER TABLE. This is the tricky part. The need for this feature comes from some = complicated operational models in practice. Table creation is often = handled by an application team, which focuses on data logic and usually = does not care about replication details. Data replication, on the other = hand, is maintained by a different team, which does not care what data = the tables contain, but only needs to ensure that all data is replicated = correctly. In that kind of setup, we cannot expect REPLICA IDENTITY to = be configured properly at table creation time. Such operational models also imply that the database operations team is = usually the central database team, with superuser privileges, while an = application typically has privileges only on the tables it owns. The = central team can set configurations at the database or cluster level. In = most cases, what they want is to replicate all tables by default, except = for a small number of tables that are explicitly excluded. Actually, this feature request is coming from the replication team. = Today, they have to manually identify newly created tables, usually by = running some script, and then configure REPLICA IDENTITY for them. That = takes effort and is error-prone. So what they want is a DEFAULT-to-FULL = fallback when a table has no primary key. That=E2=80=99s why my initial proposal was to add a GUC to control the = fallback behavior, which would be convenient for the central database = team. > =46rom what I understood, the original > requirement was to ensure that tables without a primary key do not > silently fail replication, and that this is handled automatically > without manual intervention. In contrast, the FORCE approach still > requires users to explicitly configure replica identity for each > table, which seems to reintroduce the same operational burden. >=20 > For example, instead of: > ALTER TABLE t1 REPLICA IDENTITY FORCE; > the user could simply do: > ALTER TABLE t1 REPLICA IDENTITY FULL >=20 > By the time, the user is explicitly configuring replica identity, they > would already need to understand the replication semantics, so it's > not clear what additional benefit FORCE provides over FULL. Am I > missing something here? >=20 > thanks > Shveta -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/