public inbox for [email protected]  
Fix stats reporting delays in logical parallel apply worker
7+ messages / 3 participants

* Fix stats reporting delays in logical parallel apply worker
@ 2026-04-17 03:01  Zhijie Hou (Fujitsu) <[email protected]>
  0 siblings, 2 replies; 7+ messages in thread

From: Zhijie Hou (Fujitsu) @ 2026-04-17 03:01 UTC (permalink / raw)
  To: PostgreSQL Hackers <[email protected]>

Hi,

When implementing another feature, I noticed that parallel apply workers
currently do not report statistics while idle in their main loop. This can cause
stats from the last processed transaction to be arbitrarily delayed, especially
when there are long gaps between streamed transactions.

The issue is demonstrated in 0002, where a TAP test fails when attempting to
collect stats from a parallel apply worker that has no subsequent transaction to
trigger a stats report.

0001 fixes this issue by forcing a stats report when the worker is idle in the
main loop, matching the behavior already present in LogicalRepApplyLoop() for
regular logical apply workers.

Best Regards,
Hou zj


Attachments:

  [application/octet-stream] v1-0002-Test-the-stats-report-in-parallel-apply-worker.patch (1.3K, 2-v1-0002-Test-the-stats-report-in-parallel-apply-worker.patch)
From 0fa996424feeb17032bf44e2c5d5bbb44d65228b Mon Sep 17 00:00:00 2001
From: Zhijie Hou <[email protected]>
Date: Wed, 15 Apr 2026 16:59:53 +0800
Subject: [PATCH v1 2/2] Test the stats report in parallel apply worker

---
 src/test/subscription/t/026_stats.pl | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/src/test/subscription/t/026_stats.pl b/src/test/subscription/t/026_stats.pl
index 5d457060a02..178b3b71e49 100644
--- a/src/test/subscription/t/026_stats.pl
+++ b/src/test/subscription/t/026_stats.pl
@@ -136,6 +136,15 @@ sub create_sub_pub_w_errors
 	# Truncate test table so that apply worker can continue.
 	$node_subscriber->safe_psql($db, qq(TRUNCATE $table_name));
 
+	# Force the publisher to stream the changes to the subscriber immediately so
+	# that the delete_missing conflict can be tested in the parallel apply
+	# worker.
+	$node_publisher->append_conf('postgresql.conf',
+		"debug_logical_replication_streaming = immediate");
+	$node_publisher->reload;
+
+	$node_subscriber->safe_psql($db, qq(ALTER SUBSCRIPTION $sub_name SET (streaming = parallel)));
+
 	# Delete data from the test table on the publisher. This delete operation
 	# should be skipped on the subscriber since the table is already empty.
 	$node_publisher->safe_psql($db, qq(DELETE FROM $table_name;));
-- 
2.53.0.windows.2



  [application/octet-stream] v1-0001-Fix-stats-reporting-delays-in-parallel-apply-work.patch (1.4K, 3-v1-0001-Fix-stats-reporting-delays-in-parallel-apply-work.patch)
From d6df6255fdc49e26e65637826358de2a7ce37bc1 Mon Sep 17 00:00:00 2001
From: Zhijie Hou <[email protected]>
Date: Wed, 15 Apr 2026 17:30:03 +0800
Subject: [PATCH v1 1/2] Fix stats reporting delays in parallel apply worker

Parallel apply workers currently do not flush statistics while idle in their
main loop. This can cause stats from the last processed transaction to be
arbitrarily delayed, especially when there are long gaps between streamed
transactions.

Fix this by forcing a stats report when the worker is idle in the main loop,
ensuring timely visibility of accumulated statistics.
---
 src/backend/replication/logical/applyparallelworker.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/src/backend/replication/logical/applyparallelworker.c b/src/backend/replication/logical/applyparallelworker.c
index 798e8d85b3e..aaa21a5a94a 100644
--- a/src/backend/replication/logical/applyparallelworker.c
+++ b/src/backend/replication/logical/applyparallelworker.c
@@ -815,6 +815,15 @@ LogicalParallelApplyLoop(shm_mq_handle *mqh)
 
 				if (rc & WL_LATCH_SET)
 					ResetLatch(MyLatch);
+
+				/*
+				 * Force stats reporting to avoid long delays. There can be long
+				 * idle gaps before the leader assigns the next transaction, and
+				 * the only opportunity to report stats during such gaps is
+				 * here.
+				 */
+				if ((rc & WL_TIMEOUT) && !IsTransactionState())
+					pgstat_report_stat(true);
 			}
 		}
 		else
-- 
2.53.0.windows.2




* RE: Fix stats reporting delays in logical parallel apply worker
@ 2026-04-17 03:35  Zhijie Hou (Fujitsu) <[email protected]>
  parent: Zhijie Hou (Fujitsu) <[email protected]>
  1 sibling, 1 reply; 7+ messages in thread

From: Zhijie Hou (Fujitsu) @ 2026-04-17 03:35 UTC (permalink / raw)
  To: Zhijie Hou (Fujitsu) <[email protected]>; PostgreSQL Hackers <[email protected]>; +Cc: Amit Kapila <[email protected]>

On Friday, April 17, 2026 11:01 AM Zhijie Hou (Fujitsu) <[email protected]> wrote:
> Hi,
> 
> When implementing another feature, I noticed that parallel apply workers
> currently do not report statistics while idle in their main loop. This can cause
> stats from the last processed transaction to be arbitrarily delayed, especially
> when there are long gaps between streamed transactions.
> 
> The issue is demonstrated in 0002, where a TAP test fails when attempting to
> collect stats from a parallel apply worker that has no subsequent transaction
> to trigger a stats report.
> 
> 0001 fixes this issue by forcing a stats report when the worker is idle in the
> main loop, matching the behavior already present in LogicalRepApplyLoop()
> for regular logical apply workers.

Regarding 0002, I realized that the streaming option is now set to 'parallel' by
default, so we can avoid adjusting the option again. The test needs to be adjusted
to increase the worker limit so that a parallel worker can start. Here are the
updated patches.

Best Regards,
Hou zj


Attachments:

  [application/octet-stream] v2-0001-Fix-stats-reporting-delays-in-parallel-apply-work.patch (1.4K, 2-v2-0001-Fix-stats-reporting-delays-in-parallel-apply-work.patch)
From d6df6255fdc49e26e65637826358de2a7ce37bc1 Mon Sep 17 00:00:00 2001
From: Zhijie Hou <[email protected]>
Date: Wed, 15 Apr 2026 17:30:03 +0800
Subject: [PATCH v2 1/2] Fix stats reporting delays in parallel apply worker

Parallel apply workers currently do not report statistics while idle in their
main loop. This can cause stats from the last processed transaction to be
arbitrarily delayed, especially when there are long gaps between streamed
transactions.

Fix this by forcing a stats report when the worker is idle in the main loop,
ensuring timely visibility of accumulated statistics.
---
 src/backend/replication/logical/applyparallelworker.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/src/backend/replication/logical/applyparallelworker.c b/src/backend/replication/logical/applyparallelworker.c
index 798e8d85b3e..aaa21a5a94a 100644
--- a/src/backend/replication/logical/applyparallelworker.c
+++ b/src/backend/replication/logical/applyparallelworker.c
@@ -815,6 +815,15 @@ LogicalParallelApplyLoop(shm_mq_handle *mqh)
 
 				if (rc & WL_LATCH_SET)
 					ResetLatch(MyLatch);
+
+				/*
+				 * Force stats reporting to avoid long delays. There can be long
+				 * idle gaps before the leader assigns the next transaction, and
+				 * the only opportunity to report stats during such gaps is
+				 * here.
+				 */
+				if ((rc & WL_TIMEOUT) && !IsTransactionState())
+					pgstat_report_stat(true);
 			}
 		}
 		else
-- 
2.53.0.windows.2



  [application/octet-stream] v2-0002-Test-the-stats-report-in-parallel-apply-worker.patch (2.0K, 3-v2-0002-Test-the-stats-report-in-parallel-apply-worker.patch)
From bc74d2584adbb601aa6a0fd5f023190f7bfaad27 Mon Sep 17 00:00:00 2001
From: Zhijie Hou <[email protected]>
Date: Wed, 15 Apr 2026 16:59:53 +0800
Subject: [PATCH v2 2/2] Test the stats report in parallel apply worker

---
 src/test/subscription/t/026_stats.pl | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/src/test/subscription/t/026_stats.pl b/src/test/subscription/t/026_stats.pl
index 5d457060a02..cfd8d9b9d00 100644
--- a/src/test/subscription/t/026_stats.pl
+++ b/src/test/subscription/t/026_stats.pl
@@ -16,6 +16,9 @@ $node_publisher->start;
 # Create subscriber node.
 my $node_subscriber = PostgreSQL::Test::Cluster->new('subscriber');
 $node_subscriber->init;
+
+# Increase the worker limit to allow a parallel apply worker for this test.
+$node_subscriber->append_conf('postgresql.conf', "max_logical_replication_workers = 5");
 $node_subscriber->start;
 
 
@@ -136,6 +139,13 @@ sub create_sub_pub_w_errors
 	# Truncate test table so that apply worker can continue.
 	$node_subscriber->safe_psql($db, qq(TRUNCATE $table_name));
 
+	# Force the publisher to stream the changes to the subscriber immediately so
+	# that the delete_missing conflict can be tested in the parallel apply
+	# worker.
+	$node_publisher->append_conf('postgresql.conf',
+		"debug_logical_replication_streaming = immediate");
+	$node_publisher->reload;
+
 	# Delete data from the test table on the publisher. This delete operation
 	# should be skipped on the subscriber since the table is already empty.
 	$node_publisher->safe_psql($db, qq(DELETE FROM $table_name;));
@@ -151,6 +161,12 @@ sub create_sub_pub_w_errors
 	  or die
 	  qq(Timed out while waiting for delete_missing conflict for subscription '$sub_name');
 
+	# Reset debug_logical_replication_streaming to allow subsequent tests to
+	# verify non-streaming behavior.
+	$node_publisher->append_conf('postgresql.conf',
+		"debug_logical_replication_streaming = buffered");
+	$node_publisher->reload;
+
 	return ($pub_name, $sub_name);
 }
 
-- 
2.53.0.windows.2




* Re: Fix stats reporting delays in logical parallel apply worker
@ 2026-04-17 07:19  Amit Kapila <[email protected]>
  parent: Zhijie Hou (Fujitsu) <[email protected]>
  1 sibling, 1 reply; 7+ messages in thread

From: Amit Kapila @ 2026-04-17 07:19 UTC (permalink / raw)
  To: Zhijie Hou (Fujitsu) <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>

On Fri, Apr 17, 2026 at 8:31 AM Zhijie Hou (Fujitsu)
<[email protected]> wrote:
>
> When implementing another feature, I noticed that parallel apply workers
> currently do not report statistics while idle in their main loop. This can cause
> stats from the last processed transaction to be arbitrarily delayed, especially
> when there are long gaps between streamed transactions.
>
> The issue is demonstrated in 0002, where a TAP test fails when attempting to
> collect stats from a parallel apply worker that has no subsequent transaction to
> trigger a stats report.
>
> 0001 fixes this issue by forcing a stats report when the worker is idle in the
> main loop, matching the behavior already present in LogicalRepApplyLoop() for
> regular logical apply workers.
>

LGTM. We should backpatch this change.

-- 
With Regards,
Amit Kapila.






* Re: Fix stats reporting delays in logical parallel apply worker
@ 2026-04-17 07:40  Chao Li <[email protected]>
  parent: Zhijie Hou (Fujitsu) <[email protected]>
  0 siblings, 1 reply; 7+ messages in thread

From: Chao Li @ 2026-04-17 07:40 UTC (permalink / raw)
  To: Zhijie Hou (Fujitsu) <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>; Amit Kapila <[email protected]>



> On Apr 17, 2026, at 11:35, Zhijie Hou (Fujitsu) <[email protected]> wrote:
> 
> On Friday, April 17, 2026 11:01 AM Zhijie Hou (Fujitsu) <[email protected]> wrote:
>> Hi,
>> 
>> When implementing another feature, I noticed that parallel apply workers
>> currently do not report statistics while idle in their main loop. This can cause
>> stats from the last processed transaction to be arbitrarily delayed, especially
>> when there are long gaps between streamed transactions.
>> 
>> The issue is demonstrated in 0002, where a TAP test fails when attempting to
>> collect stats from a parallel apply worker that has no subsequent transaction
>> to trigger a stats report.
>> 
>> 0001 fixes this issue by forcing a stats report when the worker is idle in the
>> main loop, matching the behavior already present in LogicalRepApplyLoop()
>> for regular logical apply workers.
> 
> Regarding 0002, I realized that the streaming option is now set to 'parallel' by
> default, so we can avoid adjusting the option again. The test needs to be adjusted
> to increase the worker limit so that a parallel worker can start. Here are the
> updated patches.
> 
> Best Regards,
> Hou zj
> <v2-0001-Fix-stats-reporting-delays-in-parallel-apply-work.patch><v2-0002-Test-the-stats-report-in-parallel-apply-worker.patch>

I think WaitLatch will never return WL_LATCH_SET and WL_TIMEOUT together, so we
can do "else if ((rc & WL_TIMEOUT) && !IsTransactionState())", so that upon
WL_LATCH_SET, it skips the WL_TIMEOUT check, which could be slightly more
efficient.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/










* RE: Fix stats reporting delays in logical parallel apply worker
@ 2026-04-17 09:20  Zhijie Hou (Fujitsu) <[email protected]>
  parent: Chao Li <[email protected]>
  0 siblings, 1 reply; 7+ messages in thread

From: Zhijie Hou (Fujitsu) @ 2026-04-17 09:20 UTC (permalink / raw)
  To: Chao Li <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>; Amit Kapila <[email protected]>

On Friday, April 17, 2026 3:41 PM Chao Li <[email protected]> wrote:
> 
> I think WaitLatch will never return WL_LATCH_SET and WL_TIMEOUT
> together, so we can do "else if ((rc & WL_TIMEOUT)
> && !IsTransactionState())", so that upon WL_LATCH_SET, it skips the
> WL_TIMEOUT check, which could be slightly more efficient.

I'm not sure we should assume that WaitLatch will set only one flag at a time.
Even if that assumption holds for this specific case, handling bit flags this
way looks a bit odd. AFAICS, we don't use this style elsewhere in the code.
Currently, users of WL_TIMEOUT (in basebackup_throttle.c, walreceiver.c,
worker.c) all use if ... if logic.
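
To make the difference between the two styles concrete, here is a small standalone sketch. It is not PostgreSQL code: the flag values are placeholders, `handle_if_if()` and `handle_else_if()` are hypothetical names, and the `!IsTransactionState()` condition is left out for brevity. The two forms agree whenever only one bit is set and differ only if both bits ever arrive together.

```c
#include <assert.h>

/* Placeholder flag values for illustration; not taken from PostgreSQL headers. */
#define WL_LATCH_SET (1 << 0)
#define WL_TIMEOUT   (1 << 3)

/* What the handler did, reported as a bitmask. */
#define DID_RESET  (1 << 0)
#define DID_REPORT (1 << 1)

/* Independent tests: each bit is handled regardless of the other. */
static int
handle_if_if(int rc)
{
	int			done = 0;

	/* this sketch models only the two flags discussed here */
	assert((rc & ~(WL_LATCH_SET | WL_TIMEOUT)) == 0);

	if (rc & WL_LATCH_SET)
		done |= DID_RESET;
	if (rc & WL_TIMEOUT)
		done |= DID_REPORT;
	return done;
}

/* Chained tests: the timeout branch is skipped whenever the latch bit is set. */
static int
handle_else_if(int rc)
{
	int			done = 0;

	assert((rc & ~(WL_LATCH_SET | WL_TIMEOUT)) == 0);

	if (rc & WL_LATCH_SET)
		done |= DID_RESET;
	else if (rc & WL_TIMEOUT)
		done |= DID_REPORT;
	return done;
}
```

The if ... if form therefore does not depend on any assumption about which flag combinations the wait primitive can return.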

Best Regards,
Hou zj



* Re: Fix stats reporting delays in logical parallel apply worker
@ 2026-04-17 09:30  Chao Li <[email protected]>
  parent: Zhijie Hou (Fujitsu) <[email protected]>
  0 siblings, 0 replies; 7+ messages in thread

From: Chao Li @ 2026-04-17 09:30 UTC (permalink / raw)
  To: Zhijie Hou (Fujitsu) <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>; Amit Kapila <[email protected]>



> On Apr 17, 2026, at 17:20, Zhijie Hou (Fujitsu) <[email protected]> wrote:
> 
> On Friday, April 17, 2026 3:41 PM Chao Li <[email protected]> wrote:
>> 
>> I think WaitLatch will never return WL_LATCH_SET and WL_TIMEOUT
>> together, so we can do "else if ((rc & WL_TIMEOUT)
>> && !IsTransactionState())", so that upon WL_LATCH_SET, it skips the
>> WL_TIMEOUT check, which could be slightly more efficient.
> 
> I'm not sure we should assume that WaitLatch will set only one flag at a time.
> Even if that assumption holds for this specific case, handling bit flags this
> way looks a bit odd. AFAICS, we don't use this style elsewhere in the code.
> Currently, users of WL_TIMEOUT (in basebackup_throttle.c, walreceiver.c,
> worker.c) all use if ... if logic.
> 
> Best Regards,
> Hou zj

WL_TIMEOUT is not a real event. If we look at the code of WaitLatch:
```
   if (WaitEventSetWait(LatchWaitSet,
                   (wakeEvents & WL_TIMEOUT) ? timeout : -1,
                   &event, 1,
                   wait_event_info) == 0)
      return WL_TIMEOUT;
   else
      return event.events;
```
So WL_TIMEOUT is never OR'ed together with other events.
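
The same control flow can be exercised in isolation with a standalone sketch. It is not PostgreSQL code: `SimEvent`, `sim_wait_event_set_wait()`, and `sim_wait_latch()` are hypothetical stand-ins, and the flag values are placeholders. The stub returns the number of events that occurred, which is the convention the excerpt above relies on.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder flag values for illustration; not taken from PostgreSQL headers. */
#define WL_LATCH_SET (1 << 0)
#define WL_TIMEOUT   (1 << 3)

/* Hypothetical, reduced stand-in for the occurred-event record. */
typedef struct SimEvent
{
	int			events;
} SimEvent;

/*
 * Hypothetical stand-in for WaitEventSetWait(): returns the number of events
 * that occurred (0 or 1 here) and fills *event when one did.
 */
static int
sim_wait_event_set_wait(int latch_is_set, SimEvent *event)
{
	assert(event != NULL);

	if (latch_is_set)
	{
		event->events = WL_LATCH_SET;
		return 1;
	}
	return 0;					/* nothing happened before the timeout */
}

/* Same shape as the WaitLatch() excerpt above. */
static int
sim_wait_latch(int latch_is_set)
{
	SimEvent	event = {0};

	if (sim_wait_event_set_wait(latch_is_set, &event) == 0)
		return WL_TIMEOUT;
	else
		return event.events;
}
```

In this model the timeout result comes from a separate return path, so it cannot appear in the same value as an event bit.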

Anyway, that’s not a big concern.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/










* Re: Fix stats reporting delays in logical parallel apply worker
@ 2026-04-20 08:23  Amit Kapila <[email protected]>
  parent: Amit Kapila <[email protected]>
  0 siblings, 0 replies; 7+ messages in thread

From: Amit Kapila @ 2026-04-20 08:23 UTC (permalink / raw)
  To: Zhijie Hou (Fujitsu) <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>

On Fri, Apr 17, 2026 at 12:49 PM Amit Kapila <[email protected]> wrote:
>
> On Fri, Apr 17, 2026 at 8:31 AM Zhijie Hou (Fujitsu)
> <[email protected]> wrote:
> >
> > When implementing another feature, I noticed that parallel apply workers
> > currently do not report statistics while idle in their main loop. This can cause
> > stats from the last processed transaction to be arbitrarily delayed, especially
> > when there are long gaps between streamed transactions.
> >
> > The issue is demonstrated in 0002, where a TAP test fails when attempting to
> > collect stats from a parallel apply worker that has no subsequent transaction to
> > trigger a stats report.
> >
> > 0001 fixes this issue by forcing a stats report when the worker is idle in the
> > main loop, matching the behavior already present in LogicalRepApplyLoop() for
> > regular logical apply workers.
> >
>
> LGTM. We should backpatch this change.
>

Pushed now.

-- 
With Regards,
Amit Kapila.








Thread overview: 7+ messages
2026-04-17 03:01 Fix stats reporting delays in logical parallel apply worker Zhijie Hou (Fujitsu) <[email protected]>
2026-04-17 03:35 ` Zhijie Hou (Fujitsu) <[email protected]>
2026-04-17 07:40   ` Chao Li <[email protected]>
2026-04-17 09:20     ` Zhijie Hou (Fujitsu) <[email protected]>
2026-04-17 09:30       ` Chao Li <[email protected]>
2026-04-17 07:19 ` Amit Kapila <[email protected]>
2026-04-20 08:23   ` Amit Kapila <[email protected]>
