Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wPGfI-000bgo-1v for pgsql-hackers@arkaria.postgresql.org; Tue, 19 May 2026 09:23:04 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1wPGfG-0043R0-1o for pgsql-hackers@arkaria.postgresql.org; Tue, 19 May 2026 09:23:03 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wPGfG-0043Qs-0b for pgsql-hackers@lists.postgresql.org; Tue, 19 May 2026 09:23:03 +0000 Received: from mail-pj1-x1030.google.com ([2607:f8b0:4864:20::1030]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1wPGfE-00000000LNP-4A7v for pgsql-hackers@lists.postgresql.org; Tue, 19 May 2026 09:23:02 +0000 Received: by mail-pj1-x1030.google.com with SMTP id 98e67ed59e1d1-368f25ff4c4so1590317a91.2 for ; Tue, 19 May 2026 02:23:00 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1779182578; cv=none; d=google.com; s=arc-20240605; b=AWORIOVm2ziiv6UHCMB8zJX3TQ1QW+GBgjSD6wZVhRjtbD/bCVDy17zOuMyM4gBgU1 rsXMk/s/e/APIiBvkRjgg3f/YvjfKOAP/bhPIWeERj1YESh5reVkeKuMqJS7QPaDL0Ob c9r3mCdGJFUn5sv6mGTSa0Q9d4ofYhgtPqBSqCEQtPD7gLXSqPXoEpcbcWKowcgD36mc v/3ZPUSMSEHMWRPd8U0FSqGsI4HViJSTc2xFS8Fav/qssQspHg2kDjC9fe0T8RMUoUy7 u/79Dcx7usPk0lpWVTBhMe8dm9lvfesoll34ct4TPk4l+ZB5c/FaBQ+56DrHw/Sg9IzK Uv1Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=flZ3sJWwjRyF97UmfZAuQLwMzVeKGMpLTJZAiUaP9po=; fh=QotKQ8x1WM9taE72VAQbb1TK0dq8Po+S+5YOXT36cno=; b=JlKqW3mnEzrxQUiQqNSP6Hb4r+twjEKXmLuObDYtfgP4vzlk0owVufaSHbNHag17IW jAc1LG9LZGvtGF967RvKcNt6GL8ajIht+cq3yG58ma5+J2KBD6Z7XvkSAx+0X9qoQZYB wxyYWUmmMsivBhyrykbnH8TkM8uIWLaEamhSK/Y98KTOl4QB2cr8rev7v3SEPoG4FIU7 MCxFibBn2jMtQOZpSfWNIA6PbeH8LjPh69tEw+60hpOrnYNV2jeINfe7jXB00knF4aov VTTnZ4VGlrLgWR7/Zciq5exewsxs+AgeUckUOZFRBqWJ43NZKvhlM/axeVdcXesP8bu8 iz4Q==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1779182578; x=1779787378; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=flZ3sJWwjRyF97UmfZAuQLwMzVeKGMpLTJZAiUaP9po=; b=AvBAWRckgXYI3FMbdNpzbVNvNxCd42amg7nOnGcZY9gtlHAlq5X+kPgklJ4kPa1w22 nYAybrG8PO36oaqHV7MFnPs39dbcGMLhvoTXl1i/5N1bTDsUku/8kpiSw/2KdivwHyTO EAm4qE5AWsIsgy6DM7Ycv8EBXAKWejn4Ib8bl3hHVLzPst33uwYppTpgNd7+bWpGaF4N XbvXu8XGxqhotsp/bVzxry+2cgx2Ctff/+IBlj2bHtN0LaSZSWXcJ//CBKHxm9uHIVmn EzYffS3medBOOSWcGa7BTzGm5FSMpTqZr5NTI5imbJHvrNSe2wJmPTENM96ndttaOhAT VBfg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1779182578; x=1779787378; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=flZ3sJWwjRyF97UmfZAuQLwMzVeKGMpLTJZAiUaP9po=; b=HVyFt5FHy4OUkY4/yb5T8O5BoHkWK8v66fYCd/xQ2JU5MNdALeuhRGdJFF8UrAMbNs fmfOV6mRzyj6un5180HvZDmh7jWh5zINsYNN7UYSjlsG9+8BQR62+kIJTLAwfbEUj0HH /9Y5zcONqvLTzFyPn5BiWfqkyvtCn8PrlI+63O6kl9OSaZzxiKh4qY2j8WrQEPvrMy7X eG0msGAnG40cGx0BGyJjmdl9dZmCVOf0Q+pXIArg40fAHsYWKOfvV7RMXcnR/vMYPyqV 1+duKgPhhF9vHFEJ3Bw5iVzCunDFwylNAZywcyh6Yi+fqVwTzOJj95SutlLxAmAbT//O CEzQ== X-Gm-Message-State: AOJu0YyE08PBLi8R3n65SXltmhN4zWy0Fi4oT1BJZVmgGmFikKahbdlK /AWhZcSWVEwe2Iawd1vafQVTqxRTPkI7HI0P5oJ6rQZcjL994fcHIeSN4piL/ApTpA3pSnbXvqP KlcjB/8Z5wLxjgCKe8l/DS+b8AspcLF4= X-Gm-Gg: Acq92OHXEa/0YV3E3OJco32bRF5y7Q2L+StEPzZHFSlgvETI0t8WAdoFRlVj6V0yJCK hHUdIbZ/Jdr+OMjBrKEYYENFn6jMBmZ3SIaDvtH3C8Ulsju7zI47EH2W6RwY5BIdx3glpmABkC2 eGtFQZdBMsVXv2qF4IA/v9LK2DJbG12qBGpvgw3IOHeHLosEm6a1Ku1CgPlrLBpxZldsGjQRuxG o5i8Ke9NtPtgoAKXDN+7aaqPgKDg/L31t/6YGrQtRO3IlNjEJX+7/p/DziJjr3vYTp4a9WrB3Y5 TgMuPKiZvZujkUhLg8Chq5uKKUl02PknF/DFQx1AHYSs8MLosEr0mo921WwSEBCy X-Received: by 2002:a17:90b:1810:b0:366:4782:1375 with SMTP id 98e67ed59e1d1-36951ca5e48mr19385439a91.22.1779182578039; Tue, 19 May 2026 02:22:58 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: shveta malik Date: Tue, 19 May 2026 14:52:46 +0530 X-Gm-Features: AVHnY4Kjm58wK5aD7mfNGQPJ0p2fjcbvOk0eDqotfjmXgDJFNRTP657aLpGUAlY Message-ID: Subject: Re: Improve conflict detection when replication origins are reused To: Nisha Moond Cc: PostgreSQL Hackers , shveta malik Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Thu, May 14, 2026 at 8:35=E2=80=AFAM Nisha Moond wrote: > > Hi hackers, > > While reviewing the issue reported at [1] and the proposed solutions > at [2], I noticed a related problem: false negative conflict detection > when a 'ReplOriginId' gets reused. > > In logical replication, conflict detection relies on the tuple=E2=80=99s > replication origin ('roident'). The problem is that if a subscription > is dropped and a new subscription later reuses the same origin ID, the > apply worker may incorrectly treat incoming changes as =E2=80=9Cits own= =E2=80=9D > changes and skip conflict detection. > > A simple example: > 1. Create subscription sub1 with 'roident =3D 1' > 2. Replicate some rows into table 't1' > 3. Drop 'sub1' > 4. Create another subscription 'sub2' > 5. `sub2` reuses 'roident =3D 1' > 6. New updates arrive for rows previously written by 'sub1' > At this point, conflict detection sees: > tuple_origin =3D=3D current_origin > > and incorrectly assumes the row was written by the current > subscription instance, so no 'update_origin_differ' conflict is > raised. > > This may look harmless in this simple setup, but it becomes > problematic if the new subscription is connected to a different > publisher, because real conflicts can then be silently missed. > > I explored two possible approaches to solve this: > > Approach 1. Zero out old origin IDs in commit_ts data when dropping a > subscription > ---------------------- > - When a subscription is dropped and its replication origin becomes > free, scan all 'commit_ts' SLRU entries and replace that old origin ID > with 'InvalidRepOriginId (0)'. > - So rows previously written by the old subscription would no longer > appear to belong to any active replication origin. > - A new subscription reusing the same 'roident' will always conflict > with origin '0'. > > Pros: > - Fixes the stale-origin problem completely and may also help solve > the tablesync-origin issue discussed in [1] > - No additional checks needed during conflict detection > > Cons: > - Requires scanning the entire 'commit_ts' SLRU during DROP > SUBSCRIPTION, so it can become very expensive on large systems > - Not crash-safe currently(patch): > - if the server crashes midway, some entries may still contain the > old origin ID > - after restart, reused origins can again lead to missed conflicts > - Making this fully crash-safe would likely require WAL logging or > recovery-time reprocessing. > > Approach 2. Store replication origin creation time > ---------------------- > - Add a creation timestamp for each replication origin > - During conflict check: > if tuple_origin !=3D current_origin > -> existing behavior > if tuple_origin =3D=3D current_origin > -> compare tuple commit timestamp with origin creation time > if tuple_commit_ts <=3D origin_creation_time > -> treat as an origin reuse case and raise conflict > > Pros: > ------- > - No additional processing during DROP SUBSCRIPTION > - Lightweight runtime check (just one timestamp comparison) > - Naturally crash-safe since origin creation is WAL-logged already > > Cons: > - Requires a catalog schema change > - The <=3D comparison can produce false-positive conflicts for rows > committed at the exact same microsecond as origin creation > - May require additional handling for upgraded origins > > IMO, the second approach currently looks more practical because it > avoids the heavy SLRU scan and crash-recovery complexity. > I find Approach 2 the most practical. I explored other ideas but none seem completely reliable or worth the effort to justify this use-case. A few ideas I considered are: 1) We could modify replorigin_create to exhaust the full range of IDs sequentially before reusing them. But this is not a reliable solution. It would make the bug much harder to hit, but a busy system could still eventually exhaust the 2-byte limit of 65K IDs, after which the problem may reappear. 2) Using LSN Matching instead of timestamp. To completely eliminate the edge case where a timestamp results in a false-positive case, we could track the origin_creation_lsn and compare it against the tuple's commit LSN. IIUC, it would require extending commit_ts to include 8-byte of commit-lsn which might not be a good idea. So this idea may also not be desirable unless there is an existing way to extract commit-lsn (which I am not aware of) without extending the commit-ts structure? thanks Shveta