Re: BUG #19360: Bug Report: Logical Replication initial sync fails with "conflict=update_origin_differs" PG12 toPG18
4+ messages / 3 participants
* Re: BUG #19360: Bug Report: Logical Replication initial sync fails with "conflict=update_origin_differs" PG12 toPG18
From: Dilip Kumar @ 2026-01-09 04:46 UTC
To: Masahiko Sawada <[email protected]>; +Cc: Amit Kapila <[email protected]>; vignesh C <[email protected]>; [email protected]; [email protected]

On Fri, Jan 9, 2026 at 4:17 AM Masahiko Sawada <[email protected]> wrote:
>
> On Mon, Dec 29, 2025 at 10:55 PM Amit Kapila <[email protected]> wrote:
> >
> > On Mon, Dec 29, 2025 at 4:26 PM vignesh C <[email protected]> wrote:
> > >
> > > On Mon, 22 Dec 2025 at 19:00, PG Bug reporting form
> > > <[email protected]> wrote:
> > > >
> > >
> > > This can occur in the following scenario: commit timestamp tracking is
> > > enabled on the subscriber; the same table exists on both publisher and
> > > subscriber; a publication is created on the publisher with initial
> > > data; and a subscription is created on the subscriber with origin =
> > > none. During the initial table synchronization, the row is inserted
> > > using a tablesync replication origin, which is dropped once
> > > synchronization completes. If the row is updated on the publisher
> > > after the initial sync, the apply worker attempts to update a row that
> > > was inserted using a different replication origin(tablesync origin),
> > > resulting in an origin mismatch.
> > >
> > > The conflict is logged and logical replication continues normally. No
> > > crash occurs, and the log entry is informational rather than
> > > indicative of a failure.
> > >
> >
> > I agree with this analysis.
> >
> > > These messages can be safely ignored for now.
> > >
> > > We are currently evaluating possible improvements to handle this
> > > scenario more gracefully and to avoid reporting these conflicts in the
> > > future.
> > >
> >
> > One idea to safely ignore these LOGs is we could modify the state
> > management in the catalog pg_subscription_rel to store originID. When
> > a tablesync worker completes, instead of just deleting the origin and
> > setting the relation state to ready, it could record the origin_id it
> > used into pg_subscription_rel. When the apply worker encounters an
> > origin mismatch, it checks pg_subscription_rel for that specific
> > table. If the "old" origin ID matches the one recorded during the sync
> > phase, the worker knows the row is "ours" and suppresses the log. Now,
> > as the origin ID could be reused, we could additionally store local
> > timestamp along with originId in pg_subscription_rel. Then, we can
> > suppress the log if: row_origin_id == srsuboriginid AND
> > row_commit_time <= srsubsynctime.
>
> It sounds very costly. IIUC we would need these checks for every first
> update to tuples loaded via initial table sync. Can we somehow share
> the apply worker's origin with tablesync workers so that they can
> refer to the same origin ID? Or can we invent special origin IDs
> (e.g., > 0x00FF) that are the same as the normal origin ID except for
> being ignored by the conflict detection system?

How will this distinguish between the initial sync is done from the
publisher node we are getting the update vs the initial sync is done
from some other node? Can we always ignore conflict checking for
initial synced data or do we just want to ignore if the initial sync
is done from the same node?

-- 
Regards,
Dilip Kumar
Google
* Re: BUG #19360: Bug Report: Logical Replication initial sync fails with "conflict=update_origin_differs" PG12 toPG18
From: Masahiko Sawada @ 2026-01-10 00:56 UTC
To: Dilip Kumar <[email protected]>; +Cc: Amit Kapila <[email protected]>; vignesh C <[email protected]>; [email protected]; [email protected]

On Thu, Jan 8, 2026 at 8:46 PM Dilip Kumar <[email protected]> wrote:
>
> On Fri, Jan 9, 2026 at 4:17 AM Masahiko Sawada <[email protected]> wrote:
> >
> > On Mon, Dec 29, 2025 at 10:55 PM Amit Kapila <[email protected]> wrote:
> > >
> > > On Mon, Dec 29, 2025 at 4:26 PM vignesh C <[email protected]> wrote:
> > > >
> > > > On Mon, 22 Dec 2025 at 19:00, PG Bug reporting form
> > > > <[email protected]> wrote:
> > > > >
> > > >
> > > > This can occur in the following scenario: commit timestamp tracking is
> > > > enabled on the subscriber; the same table exists on both publisher and
> > > > subscriber; a publication is created on the publisher with initial
> > > > data; and a subscription is created on the subscriber with origin =
> > > > none. During the initial table synchronization, the row is inserted
> > > > using a tablesync replication origin, which is dropped once
> > > > synchronization completes. If the row is updated on the publisher
> > > > after the initial sync, the apply worker attempts to update a row that
> > > > was inserted using a different replication origin(tablesync origin),
> > > > resulting in an origin mismatch.
> > > >
> > > > The conflict is logged and logical replication continues normally. No
> > > > crash occurs, and the log entry is informational rather than
> > > > indicative of a failure.
> > > >
> > >
> > > I agree with this analysis.
> > >
> > > > These messages can be safely ignored for now.
> > > >
> > > > We are currently evaluating possible improvements to handle this
> > > > scenario more gracefully and to avoid reporting these conflicts in the
> > > > future.
> > > >
> > >
> > > One idea to safely ignore these LOGs is we could modify the state
> > > management in the catalog pg_subscription_rel to store originID. When
> > > a tablesync worker completes, instead of just deleting the origin and
> > > setting the relation state to ready, it could record the origin_id it
> > > used into pg_subscription_rel. When the apply worker encounters an
> > > origin mismatch, it checks pg_subscription_rel for that specific
> > > table. If the "old" origin ID matches the one recorded during the sync
> > > phase, the worker knows the row is "ours" and suppresses the log. Now,
> > > as the origin ID could be reused, we could additionally store local
> > > timestamp along with originId in pg_subscription_rel. Then, we can
> > > suppress the log if: row_origin_id == srsuboriginid AND
> > > row_commit_time <= srsubsynctime.
> >
> > It sounds very costly. IIUC we would need these checks for every first
> > update to tuples loaded via initial table sync. Can we somehow share
> > the apply worker's origin with tablesync workers so that they can
> > refer to the same origin ID? Or can we invent special origin IDs
> > (e.g., > 0x00FF) that are the same as the normal origin ID except for
> > being ignored by the conflict detection system?
>
> How will this distinguish between the initial sync is done from the
> publisher node we are getting the update vs the initial sync is done
> from some other node? Can we always ignore conflict checking for
> initial synced data or do we just want to ignore if the initial sync
> is done from the same node?

I imagined the former idea; always ignore conflict checking, so we
don't need to distinguish them. IOW we treat the changes via the
initial tablesync as if the changes made by the normal backend process
(who doesn't use replication origin) while using the replication
tracking ability of the replication origin.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
* RE: BUG #19360: Bug Report: Logical Replication initial sync fails with "conflict=update_origin_differs" PG12 toPG18
From: Zhijie Hou (Fujitsu) @ 2026-04-03 07:24 UTC
To: Masahiko Sawada <[email protected]>; Dilip Kumar <[email protected]>; +Cc: Amit Kapila <[email protected]>; vignesh C <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>

On Saturday, January 10, 2026 8:57 AM Masahiko Sawada <[email protected]> wrote:
>
> On Thu, Jan 8, 2026 at 8:46 PM Dilip Kumar <[email protected]> wrote:
> >
> > On Fri, Jan 9, 2026 at 4:17 AM Masahiko Sawada <[email protected]> wrote:
> > > Can we somehow
> > > share the apply worker's origin with tablesync workers so that they
> > > can refer to the same origin ID? Or can we invent special origin IDs
> > > (e.g., > 0x00FF) that are the same as the normal origin ID except
> > > for being ignored by the conflict detection system?
> >
> > How will this distinguish between the initial sync is done from the
> > publisher node we are getting the update vs the initial sync is done
> > from some other node? Can we always ignore conflict checking for
> > initial synced data or do we just want to ignore if the initial sync
> > is done from the same node?
>
> I imagined the former idea; always ignore conflict checking, so we don't need
> to distinguish them. IOW we treat the changes via the initial tablesync as if
> the changes made by the normal backend process (who doesn't use
> replication origin) while using the replication tracking ability of the replication
> origin.

I think for changes made by backend process without setting up the origin, the
apply worker still treat that as a conflict change when applying the remote
changes as that's necessary to local vs. remote updates.

I personally prefer to let the tablesync worker share the apply worker's origin
ID while keeping a separate origin for progress tracking. Currently, the worker
first calls replorigin_session_setup() and then stores the origin ID in
replorigin_xact_state. The natural implementation is for the tablesync worker
to still set up its own origin for tracking, but assign the apply worker's origin ID
to the global state. This gives us per‑tablesync progress tracking while
ensuring that changes from both workers appear to come from the same
origin.

A small patch demonstrating this approach is attached.

Best Regards,
Hou zj

Attachments:
[application/octet-stream] v1-0001-write-tablesync-changes-with-the-subscription-ori.patch (4.7K, 2-v1-0001-write-tablesync-changes-with-the-subscription-ori.patch)
inline diff:

From 3047c7473df8f3be43859c1bec74c99fba80ecf0 Mon Sep 17 00:00:00 2001
From: Zhijie Hou <[email protected]>
Date: Fri, 3 Apr 2026 13:44:40 +0800
Subject: [PATCH v1] write tablesync changes with the subscription origin ID

During initial table synchronization, tablesync workers were writing tuples with
the per-table tablesync origin identity. Later, when the leader apply worker
processed UPDATE/DELETE for those tuples, conflict checks could see an origin
mismatch and report benign update_origin_differs or delete_origin_differs noise
for changes from the same subscription.

Fix this by keeping the tablesync origin for per-table sync progress tracking
and resume but stamp tuple writes with the subscription-level apply origin
identity. This ensures conflict detection sees tablesync and apply changes as
coming from the same subscription.
---
 src/backend/replication/logical/tablesync.c | 27 ++++++++++++++++++---
 src/test/subscription/t/029_on_error.pl     |  9 +++++++
 2 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index f49a4852ecb..4aa39341e8e 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -1226,6 +1226,7 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
 	AclResult	aclresult;
 	WalRcvExecResult *res;
 	char		originname[NAMEDATALEN];
+	char		applyoriginname[NAMEDATALEN];
 	ReplOriginId originid;
 	UserContext ucxt;
 	bool		must_use_password;
@@ -1285,12 +1286,17 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
 		   MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC ||
 		   MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY);
 
-	/* Assign the origin tracking record name. */
+	/* Assign the origin tracking record names. */
 	ReplicationOriginNameForLogicalRep(MySubscription->oid,
 									   MyLogicalRepWorker->relid,
 									   originname,
 									   sizeof(originname));
 
+	ReplicationOriginNameForLogicalRep(MySubscription->oid,
+									   InvalidOid,
+									   applyoriginname,
+									   sizeof(applyoriginname));
+
 	if (MyLogicalRepWorker->relstate == SUBREL_STATE_DATASYNC)
 	{
 		/*
@@ -1320,7 +1326,15 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
 		 */
 		originid = replorigin_by_name(originname, false);
 		replorigin_session_setup(originid, 0);
-		replorigin_xact_state.origin = originid;
+
+		/*
+		 * Tablesync keeps its own origin for replication progress, but writes
+		 * must be tagged with the subscription-level apply origin so conflict
+		 * detection sees tablesync and apply changes as coming from the same
+		 * subscription.
+		 */
+		replorigin_xact_state.origin = replorigin_by_name(applyoriginname, false);
+
 		*origin_startpos = replorigin_session_get_progress(false);
 
 		CommitTransactionCommand();
@@ -1407,7 +1421,14 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
 	UnlockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
 
 	replorigin_session_setup(originid, 0);
-	replorigin_xact_state.origin = originid;
+
+	/*
+	 * Tablesync keeps its own origin for replication progress, but writes
+	 * must be tagged with the subscription-level apply origin so conflict
+	 * detection sees tablesync and apply changes as coming from the same
+	 * subscription.
+	 */
+	replorigin_xact_state.origin = replorigin_by_name(applyoriginname, false);
 
 	/*
 	 * If the user did not opt to run as the owner of the subscription
diff --git a/src/test/subscription/t/029_on_error.pl b/src/test/subscription/t/029_on_error.pl
index 7d68759b6cd..85d3478f44f 100644
--- a/src/test/subscription/t/029_on_error.pl
+++ b/src/test/subscription/t/029_on_error.pl
@@ -146,6 +146,8 @@ COMMIT;
 test_skip_lsn($node_publisher, $node_subscriber,
 	"(2, NULL)", "2", "test skipping transaction");
 
+my $log_location = -s $node_subscriber->logfile;
+
 # Test for PREPARE and COMMIT PREPARED. Update the data and PREPARE the
 # transaction, raising an error on the subscriber due to violation of the
 # unique constraint on tbl. Then skip the transaction.
@@ -160,6 +162,13 @@ COMMIT PREPARED 'gtx';
 test_skip_lsn($node_publisher, $node_subscriber,
 	"(3, NULL)", "3", "test skipping prepare and commit prepared ");
 
+# Check that no update_origin_differs conflicts are raised
+my $logfile = slurp_file($node_subscriber->logfile(), $log_location);
+unlike(
+	$logfile,
+	qr/conflict detected on relation "public.tbl": conflict=update_origin_differs.*/,
+	'modifying the row copied by tablesync should not cause update_origin_differs conflict');
+
 # Test for STREAM COMMIT. Insert enough rows to tbl to exceed the 64kB
 # limit, also raising an error on the subscriber during applying spooled
 # changes for the same reason. Then skip the transaction.
-- 
2.53.0.windows.2
* RE: BUG #19360: Bug Report: Logical Replication initial sync fails with "conflict=update_origin_differs" PG12 toPG18
From: Zhijie Hou (Fujitsu) @ 2026-04-07 08:13 UTC
To: Zhijie Hou (Fujitsu) <[email protected]>; +Cc: Amit Kapila <[email protected]>; vignesh C <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>; Masahiko Sawada <[email protected]>; Dilip Kumar <[email protected]>

On Friday, April 3, 2026 3:24 PM Zhijie Hou (Fujitsu) <[email protected]> wrote:
> On Saturday, January 10, 2026 8:57 AM Masahiko Sawada <[email protected]> wrote:
> >
> > On Thu, Jan 8, 2026 at 8:46 PM Dilip Kumar <[email protected]> wrote:
> > >
> > > On Fri, Jan 9, 2026 at 4:17 AM Masahiko Sawada <[email protected]> wrote:
> > > > Can we somehow
> > > > share the apply worker's origin with tablesync workers so that
> > > > they can refer to the same origin ID? Or can we invent special
> > > > origin IDs (e.g., > 0x00FF) that are the same as the normal origin
> > > > ID except for being ignored by the conflict detection system?
> > >
> > > How will this distinguish between the initial sync is done from the
> > > publisher node we are getting the update vs the initial sync is done
> > > from some other node? Can we always ignore conflict checking for
> > > initial synced data or do we just want to ignore if the initial
> > > sync is done from the same node?
> >
> > I imagined the former idea; always ignore conflict checking, so we
> > don't need to distinguish them. IOW we treat the changes via the
> > initial tablesync as if the changes made by the normal backend process
> > (who doesn't use replication origin) while using the replication
> > tracking ability of the replication origin.
>
> I think for changes made by backend process without setting up the origin, the
> apply worker still treat that as a conflict change when applying the remote
> changes as that's necessary to local vs. remote updates.
>
> I personally prefer to let the tablesync worker share the apply worker's origin
> ID while keeping a separate origin for progress tracking. Currently, the worker
> first calls replorigin_session_setup() and then stores the origin ID in
> replorigin_xact_state. The natural implementation is for the tablesync worker
> to still set up its own origin for tracking, but assign the apply worker's origin ID
> to the global state. This gives us per‑tablesync progress tracking while
> ensuring that changes from both workers appear to come from the same
> origin.
>

After further analysis, I think the approach I mentioned earlier is unsafe.
When replaying the commit record during recovery, if only the main apply origin
ID is present, we cannot recover the progress status for each tablesync origin.

The idea of using a special origin ID for all tablesync origins suffers from
the same problem, e.g., progress cannot be recovered when replaying commit WAL
records.

I have been trying to find a way to fix this issue within the proposed
approaches, but I haven't been able to come up with a better solution for now.

One attempt was to continue WAL‑logging the tablesync's own origin ID, but only
store the main origin ID in the commit timestamp module. However, this also has
a problem during recovery: it cannot identify which main origin corresponds to
a given tablesync origin recorded in the commit WAL record. (One might think we
could store this top‑level relationship in the catalog, but since catalogs are
not accessible during recovery, that approach would not work.) Consequently, we
cannot restore the same origin ID in the commit timestamp module during
recovery as was present during normal commit.

The remaining idea: storing the origin ID in pg_subscription_rel and teaching
the apply worker to skip reporting origin_differs if the origin of the update
matches the one stored in pg_subscription_rel, seems worth considering, if we
cannot find an easier solution. There was a concern about performance, but
since we could cache those tablesync origins in a local hash table and consult
it during conflict detection, the performance impact might not be significant.

That said, I may have missed some points. I will continue to think about this
and try to update the patch later.

Best Regards,
Hou zj
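The caching idea in the last message (keep the recorded tablesync origins in a local hash table and consult it during conflict detection) might look roughly like the following. This is a minimal sketch under several assumptions: the names, the fixed capacity and the open-addressing scheme are all hypothetical, origin ID 0 is assumed to mean "no origin", PostgreSQL's own hash table facilities are not used, and the origin-reuse timestamp check discussed earlier in the thread is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CACHE_SLOTS 64              /* hypothetical capacity, power of two */
#define CACHE_MASK  (CACHE_SLOTS - 1)
#define NO_ORIGIN   0               /* assumed "no origin" ID; marks empty slots */

/* Open-addressing set of tablesync origin IDs known to this worker. */
typedef struct OriginCache
{
    uint16_t slots[CACHE_SLOTS];
    int      used;
} OriginCache;

static inline void
origin_cache_init(OriginCache *c)
{
    memset(c, 0, sizeof(*c));
}

static inline unsigned
origin_cache_slot(uint16_t id)
{
    return (id * 2654435761u) & CACHE_MASK;     /* multiplicative hash */
}

/* Add an origin ID; returns false for NO_ORIGIN or when the table is full. */
static inline bool
origin_cache_add(OriginCache *c, uint16_t id)
{
    if (id == NO_ORIGIN)
        return false;
    for (unsigned n = 0, i = origin_cache_slot(id); n < CACHE_SLOTS;
         n++, i = (i + 1) & CACHE_MASK)
    {
        if (c->slots[i] == id)
            return true;                        /* already cached */
        if (c->slots[i] == NO_ORIGIN)
        {
            if (c->used >= CACHE_SLOTS - 1)
                return false;                   /* keep one slot empty */
            c->slots[i] = id;
            c->used++;
            return true;
        }
    }
    return false;
}

static inline bool
origin_cache_contains(const OriginCache *c, uint16_t id)
{
    if (id == NO_ORIGIN)
        return false;
    for (unsigned n = 0, i = origin_cache_slot(id); n < CACHE_SLOTS;
         n++, i = (i + 1) & CACHE_MASK)
    {
        if (c->slots[i] == id)
            return true;
        if (c->slots[i] == NO_ORIGIN)
            return false;                       /* hit an empty slot: absent */
    }
    return false;
}

/* Report a mismatch only if the row's origin is not a cached tablesync origin. */
static inline bool
should_report_origin_differs(const OriginCache *c,
                             uint16_t row_origin, uint16_t apply_origin)
{
    if (row_origin == apply_origin)
        return false;
    return !origin_cache_contains(c, row_origin);
}
```

In this sketch the lookup is a bounded probe over a small local array, so the cost per conflict check does not involve a catalog access; how such a cache would be populated and invalidated is not addressed here.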
end of thread

Thread overview: 4+ messages
2026-01-09 04:46 Re: BUG #19360: Bug Report: Logical Replication initial sync fails with "conflict=update_origin_differs" PG12 toPG18 Dilip Kumar <[email protected]>
2026-01-10 00:56 ` Masahiko Sawada <[email protected]>
2026-04-03 07:24   ` Zhijie Hou (Fujitsu) <[email protected]>
2026-04-07 08:13     ` Zhijie Hou (Fujitsu) <[email protected]>