MIME-Version: 1.0
References: 
 <CABdArM5k7wCxq+L2r9-mLOoPXe_iHgdeRMXc31oxvzSj=xPjFQ@mail.gmail.com>
In-Reply-To: 
 <CABdArM5k7wCxq+L2r9-mLOoPXe_iHgdeRMXc31oxvzSj=xPjFQ@mail.gmail.com>
From: shveta malik <shveta.malik@gmail.com>
Date: Tue, 19 May 2026 14:52:46 +0530
Message-ID: 
 <CAJpy0uD-pjcDfd2z-E-LF_newxj47zYLktDsQX_H+JTCzEk4cw@mail.gmail.com>
Subject: Re: Improve conflict detection when replication origins are reused
To: Nisha Moond <nisha.moond412@gmail.com>
Cc: PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>,
	shveta malik <shvetamalik@gmail.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: 
 <https://www.postgresql.org/message-id/CAJpy0uD-pjcDfd2z-E-LF_newxj47zYLktDsQX_H%2BJTCzEk4cw%40mail.gmail.com>
Precedence: bulk

On Thu, May 14, 2026 at 8:35=E2=80=AFAM Nisha Moond <nisha.moond412@gmail.c=
om> wrote:
>
> Hi hackers,
>
> While reviewing the issue reported at [1] and the proposed solutions
> at [2], I noticed a related problem: false negative conflict detection
> when a 'ReplOriginId' gets reused.
>
> In logical replication, conflict detection relies on the tuple=E2=80=99s
> replication origin ('roident'). The problem is that if a subscription
> is dropped and a new subscription later reuses the same origin ID, the
> apply worker may incorrectly treat incoming changes as =E2=80=9Cits own=
=E2=80=9D
> changes and skip conflict detection.
>
> A simple example:
>   1. Create subscription sub1 with 'roident =3D 1'
>   2. Replicate some rows into table 't1'
>   3. Drop 'sub1'
>   4. Create another subscription 'sub2'
>   5. `sub2` reuses 'roident =3D 1'
>   6. New updates arrive for rows previously written by 'sub1'
>   At this point, conflict detection sees:
>       tuple_origin =3D=3D current_origin
>
> and incorrectly assumes the row was written by the current
> subscription instance, so no 'update_origin_differ' conflict is
> raised.
>
> This may look harmless in this simple setup, but it becomes
> problematic if the new subscription is connected to a different
> publisher, because real conflicts can then be silently missed.
>
> I explored two possible approaches to solve this:
>
> Approach 1. Zero out old origin IDs in commit_ts data when dropping a
> subscription
> ----------------------
>  - When a subscription is dropped and its replication origin becomes
> free, scan all 'commit_ts' SLRU entries and replace that old origin ID
> with 'InvalidRepOriginId (0)'.
>  - So rows previously written by the old subscription would no longer
> appear to belong to any active replication origin.
>  - A new subscription reusing the same 'roident' will always conflict
> with origin '0'.
>
> Pros:
>  - Fixes the stale-origin problem completely and may also help solve
> the tablesync-origin issue discussed in [1]
>  - No additional checks needed during conflict detection
>
> Cons:
>  - Requires scanning the entire 'commit_ts' SLRU during DROP
> SUBSCRIPTION, so it can become very expensive on large systems
>  - Not crash-safe currently(patch):
>     - if the server crashes midway, some entries may still contain the
> old origin ID
>     - after restart, reused origins can again lead to missed conflicts
>  - Making this fully crash-safe would likely require WAL logging or
> recovery-time reprocessing.
>
> Approach 2. Store replication origin creation time
> ----------------------
>  - Add a creation timestamp for each replication origin
>  - During conflict check:
>     if tuple_origin !=3D current_origin
>         -> existing behavior
>     if tuple_origin =3D=3D current_origin
>         -> compare tuple commit timestamp with origin creation time
>         if tuple_commit_ts <=3D origin_creation_time
>             -> treat as an origin reuse case and raise conflict
>
> Pros:
> -------
>  - No additional processing during DROP SUBSCRIPTION
>  - Lightweight runtime check (just one timestamp comparison)
>  - Naturally crash-safe since origin creation is WAL-logged already
>
> Cons:
>  - Requires a catalog schema change
>  - The <=3D comparison can produce false-positive conflicts for rows
> committed at the exact same microsecond as origin creation
>  -  May require additional handling for upgraded origins
>
> IMO, the second approach currently looks more practical because it
> avoids the heavy SLRU scan and crash-recovery complexity.
>

I find Approach 2 the most practical. I explored other ideas but none
seem completely reliable or worth the effort to justify this use-case.
A few ideas I considered are:

1) We could modify replorigin_create to exhaust the full range of IDs
sequentially before reusing them. But this is not a reliable solution.
It would make the bug much harder to hit, but a busy system could
still eventually exhaust the 2-byte limit of 65K IDs, after which the
problem may reappear.

2) Using LSN Matching instead of timestamp. To completely eliminate
the edge case where a timestamp results in a false-positive case, we
could track the origin_creation_lsn and compare it against the tuple's
commit LSN. IIUC, it would require extending commit_ts to include
8-byte of commit-lsn which might not be a good idea. So this idea may
also not be desirable unless there is an existing way to extract
commit-lsn (which I am not aware of) without extending the commit-ts
structure?

thanks
Shveta