From: Zhijie Hou (Fujitsu) <[email protected]>
To: Zhijie Hou (Fujitsu) <[email protected]>
Cc: Amit Kapila <[email protected]>
Cc: vignesh C <[email protected]>
Cc: [email protected] <[email protected]>
Cc: [email protected] <[email protected]>
Cc: Masahiko Sawada <[email protected]>
Cc: Dilip Kumar <[email protected]>
Subject: RE: BUG #19360: Bug Report: Logical Replication initial sync fails with "conflict=update_origin_differs" PG12 toPG18
Date: Tue, 7 Apr 2026 08:13:38 +0000
Message-ID: <TYRPR01MB14195A04472A71EB78F35E42B945AA@TYRPR01MB14195.jpnprd01.prod.outlook.com>
In-Reply-To: <TYRPR01MB1419594753654605E54773C16945EA@TYRPR01MB14195.jpnprd01.prod.outlook.com>
References: <[email protected]>
	<CALDaNm3Y6Y4Mub6QC8fZKnNy5jZspELQYCoQF_FL2Zwzweu=og@mail.gmail.com>
	<CAA4eK1LxGXR7jOAKh0B8N362S-Q3b6GhBxxcV_HxUaicEPq5Cg@mail.gmail.com>
	<CAD21AoDUKQHyy07gTrwsxHTwXAURYnzUYAsf6PxHHv2x1UdFog@mail.gmail.com>
	<CAFiTN-vcp7mVT7=rvTpf1uqEQ+rxzDoHd+eJu7u541X9ivG9zQ@mail.gmail.com>
	<CAD21AoCRReJHoLkQBMZztjO7C3Cste9w-PS_SG7VtBW1c3cR9w@mail.gmail.com>
	<TYRPR01MB1419594753654605E54773C16945EA@TYRPR01MB14195.jpnprd01.prod.outlook.com>

On Friday, April 3, 2026 3:24 PM Zhijie Hou (Fujitsu) <[email protected]> wrote:
> On Saturday, January 10, 2026 8:57 AM Masahiko Sawada
> <[email protected]> wrote:
> >
> > On Thu, Jan 8, 2026 at 8:46 PM Dilip Kumar <[email protected]>
> wrote:
> > >
> > > On Fri, Jan 9, 2026 at 4:17 AM Masahiko Sawada
> > <[email protected]> wrote:
> > > > Can we somehow
> > > > share the apply worker's origin with tablesync workers so that
> > > > they can refer to the same origin ID? Or can we invent special
> > > > origin IDs (e.g., > 0x00FF) that are the same as the normal origin
> > > > ID except for being ignored by the conflict detection system?
> > >
> > > How will this distinguish between the initial sync is done from the
> > > publisher node we are getting the update vs the initial sync is done
> > > from some other node?  Can we always ignore conflict checking for
> > > initial synced data or do we just want to ignore if the  initial
> > > sync is done from the same node?
> >
> > I imagined the former idea; always ignore conflict checking, so we
> > don't need to distinguish them. IOW we treat the changes via the
> > initial tablesync as if the changes made by the normal backend process
> > (who doesn't use replication origin) while using the replication
> > tracking ability of the replication origin.
> 
> I think for changes made by backend process without setting up the origin, the
> apply worker still treat that as a conflict change when applying the remote
> changes as that's necessary to local vs. remote updates.
> 
> I personally prefer to let the tablesync worker share the apply worker's origin
> ID while keeping a separate origin for progress tracking. Currently, the worker
> first calls replorigin_session_setup() and then stores the origin ID in
> replorigin_xact_state. The natural implementation is for the tablesync worker
> to still set up its own origin for tracking, but assign the apply worker's origin ID
> to the global state. This gives us per‑tablesync progress tracking while
> ensuring that changes from both workers appear to come from the same
> origin.
> 

After further analysis, I think the approach I mentioned earlier is unsafe. When
replaying the commit record during recovery, if only the main apply origin ID is
present, we cannot recover the progress status for each tablesync origin. The
idea of using a special origin ID for all tablesync origins suffers from the
same problem, i.e., progress cannot be recovered when replaying commit WAL
records.
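
To make the recovery problem concrete, here is a toy model, not PostgreSQL
code. It assumes that a commit record carries exactly one origin ID plus the
remote LSN, and that replay advances only the origin the record names. All
names (ToyCommitRecord, toy_progress, toy_replay_commit) are made up for
illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t OriginId;      /* same width as PostgreSQL's RepOriginId */
typedef uint64_t Lsn;

#define MAX_ORIGINS 16          /* arbitrary bound for this toy model */

/* Toy stand-in for a commit WAL record: one origin ID plus the remote LSN. */
typedef struct ToyCommitRecord
{
    OriginId    origin_id;
    Lsn         origin_lsn;
} ToyCommitRecord;

/* Toy stand-in for per-origin replication progress, indexed by origin ID. */
static Lsn  toy_progress[MAX_ORIGINS];

/*
 * Replay one commit record.  Only the origin named in the record can be
 * advanced; any other origin's progress is left untouched.
 */
static void
toy_replay_commit(const ToyCommitRecord *rec)
{
    assert(rec->origin_id < MAX_ORIGINS);

    if (rec->origin_lsn > toy_progress[rec->origin_id])
        toy_progress[rec->origin_id] = rec->origin_lsn;
}
```

If a tablesync commit is logged with the apply worker's origin ID, replaying
it moves the apply origin's entry forward, and the tablesync origin's own
entry stays where it was.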

I have been trying to find a way to fix this issue within the proposed
approaches, but I haven't been able to come up with a better solution for now.

One attempt was to continue WAL-logging the tablesync's own origin ID, but only
store the main origin ID in the commit timestamp module. However, this also has
a problem during recovery: replay cannot identify which main origin corresponds
to a given tablesync origin recorded in the commit WAL record. (One might think
we could store this top-level relationship in the catalog, but since catalogs
are not accessible during recovery, that approach would not work.) Consequently,
we cannot restore the same origin ID in the commit timestamp module during
recovery as was present during normal commit.

The remaining idea seems worth considering if we cannot find an easier
solution: store the origin ID in pg_subscription_rel and teach the apply worker
to skip reporting origin_differs when the origin of the update matches the one
stored in pg_subscription_rel. There was a concern about performance, but since
we could cache those tablesync origins in a local hash table and consult it
during conflict detection, the performance impact might not be significant.
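
A minimal standalone sketch of such a cache follows. It is not a patch and
uses no PostgreSQL APIs; the names (SyncOriginCache, cache_add,
cache_contains, should_report_origin_differs) and the fixed table size are
hypothetical. As a simplification it keeps one set of tablesync origin IDs per
worker instead of one entry per relation, and it assumes the set would be
filled from pg_subscription_rel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t OriginId;              /* same width as RepOriginId */

#define INVALID_ORIGIN  ((OriginId) 0)  /* mirrors InvalidRepOriginId */
#define CACHE_SLOTS     64              /* power of two; hypothetical size */

/* Hypothetical per-worker set of tablesync origin IDs; 0 marks a free slot. */
typedef struct SyncOriginCache
{
    OriginId    slots[CACHE_SLOTS];
    int         nused;
} SyncOriginCache;

static unsigned
cache_hash(OriginId id)
{
    return ((unsigned) id * 40503u) & (CACHE_SLOTS - 1);
}

static void
cache_init(SyncOriginCache *c)
{
    memset(c, 0, sizeof(*c));
}

/* Open addressing with linear probing; one slot always stays free. */
static bool
cache_add(SyncOriginCache *c, OriginId id)
{
    unsigned    i = cache_hash(id);

    assert(id != INVALID_ORIGIN);

    while (c->slots[i] != INVALID_ORIGIN)
    {
        if (c->slots[i] == id)
            return true;        /* already cached */
        i = (i + 1) & (CACHE_SLOTS - 1);
    }

    if (c->nused >= CACHE_SLOTS - 1)
        return false;           /* full; caller must fall back */

    c->slots[i] = id;
    c->nused++;
    return true;
}

static bool
cache_contains(const SyncOriginCache *c, OriginId id)
{
    unsigned    i = cache_hash(id);

    if (id == INVALID_ORIGIN)
        return false;

    while (c->slots[i] != INVALID_ORIGIN)
    {
        if (c->slots[i] == id)
            return true;
        i = (i + 1) & (CACHE_SLOTS - 1);
    }
    return false;
}

/*
 * Report origin_differs only when the existing tuple's origin is neither
 * the apply worker's origin nor one of the cached tablesync origins.
 */
static bool
should_report_origin_differs(const SyncOriginCache *c,
                             OriginId tuple_origin,
                             OriginId apply_origin)
{
    if (tuple_origin == apply_origin)
        return false;
    return !cache_contains(c, tuple_origin);
}
```

With this shape, a tuple written by a local backend (origin 0) is still
reported as a conflict, while a tuple written by a cached tablesync origin is
not.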

That said, I may have missed some points. I will continue to think about this
and try to update the patch later.

Best Regards,
Hou zj
