public inbox for [email protected]  
help / color / mirror / Atom feed
BUG #18158: Assert in pgstat_report_stat() fails when a backend shutting down with stats pending
12+ messages / 4 participants
[nested] [flat]

* BUG #18158: Assert in pgstat_report_stat() fails when a backend shutting down with stats pending
@ 2023-10-16 09:00  PG Bug reporting form <[email protected]>
  0 siblings, 1 reply; 12+ messages in thread

From: PG Bug reporting form @ 2023-10-16 09:00 UTC (permalink / raw)
  To: [email protected]; +Cc: [email protected]

The following bug has been logged on the website:

Bug reference:      18158
Logged by:          Alexander Lakhin
Email address:      [email protected]
PostgreSQL version: 16.0
Operating system:   Ubuntu 22.04
Description:        

When running check-world in a loop on a rather slow device, I observed
an interesting case. All tests passed successfully, but in the system
journal I found a message about postgres crash during the run.
I reproduced the issue and found in src/test/recovery/tmp_check/log/
012_subtransactions_primary.log:
TRAP: failed Assert("!pgStatLocal.shmem->is_shutdown"), File: "pgstat.c",
Line: 616, PID: 23770

My analysis shows that this Assert failed in a backend (walsender, in my
case) when it has pending stats entries on a server shutdown.
In absence of such entries pgstat_report_stats() exits before the Assert:
	/* Don't expend a clock check if nothing to do */
	if (dlist_is_empty(&pgStatPending) &&
		!have_iostats &&
		!have_slrustats &&
		!pgstat_have_pending_wal())
	{
		Assert(pending_since == 0);
		return 0;
	}
	/*
	 * There should never be stats to report once stats are shut down. Can't
	 * assert that before the checks above, as there is an unconditional
	 * pgstat_report_stat() call in pgstat_shutdown_hook() - which at least
	 * the process that ran pgstat_before_server_shutdown() will still call.
	 */
	Assert(!pgStatLocal.shmem->is_shutdown);

With the following modification in pgstat_flush_pending_entries():
+if (nowait && (rand() % 10 == 0))
+        did_flush = false;
+else
+{
         did_flush = kind_info->flush_pending_cb(entry_ref, nowait);
+}

the issue reproduced easily:
make -s check -C src/test/recovery/ PROVE_TESTS="t/012_subtransactions.pl"
grep TRAP -r src/test/recovery/tmp_check/log

# +++ tap check in src/test/recovery +++
t/012_subtransactions.pl .. ok    
All tests successful.
Files=1, Tests=12,  3 wallclock secs ( 0.01 usr  0.00 sys +  0.19 cusr  0.27
csys =  0.47 CPU)
Result: PASS
src/test/recovery/tmp_check/log/012_subtransactions_primary.log:
TRAP: failed Assert("!pgStatLocal.shmem->is_shutdown"), File: "pgstat.c",
Line: 616, PID: 2410126

The stack trace from the log:
postgres: primary: walsender law [local] streaming
0/3019C78(ExceptionalCondition+0xbb)[0x5616c7b58b89]
postgres: primary: walsender law [local] streaming
0/3019C78(pgstat_report_stat+0xef)[0x5616c7991ff1]
postgres: primary: walsender law [local] streaming
0/3019C78(+0x6a9e57)[0x5616c7991e57]
postgres: primary: walsender law [local] streaming
0/3019C78(shmem_exit+0xa6)[0x5616c792d852]
postgres: primary: walsender law [local] streaming
0/3019C78(+0x6456f6)[0x5616c792d6f6]
postgres: primary: walsender law [local] streaming
0/3019C78(proc_exit+0x19)[0x5616c792d63a]
postgres: primary: walsender law [local] streaming
0/3019C78(+0x607cec)[0x5616c78efcec]
postgres: primary: walsender law [local] streaming
0/3019C78(+0x606e47)[0x5616c78eee47]
postgres: primary: walsender law [local] streaming
0/3019C78(+0x6043c8)[0x5616c78ec3c8]
postgres: primary: walsender law [local] streaming
0/3019C78(exec_replication_command+0x4cc)[0x5616c78edfa5]
postgres: primary: walsender law [local] streaming
0/3019C78(PostgresMain+0x7df)[0x5616c796d54b]
...



^ permalink  raw  reply  [nested|flat] 12+ messages in thread

* Re: BUG #18158: Assert in pgstat_report_stat() fails when a backend shutting down with stats pending
@ 2026-05-12 20:00  Alexander Lakhin <[email protected]>
  parent: PG Bug reporting form <[email protected]>
  0 siblings, 1 reply; 12+ messages in thread

From: Alexander Lakhin @ 2026-05-12 20:00 UTC (permalink / raw)
  To: [email protected]

Hello hackers,

16.10.2023 12:00, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference:      18158
> Logged by:          Alexander Lakhin
> Email address:[email protected]
> PostgreSQL version: 16.0
> Operating system:   Ubuntu 22.04
> Description:
> ...
> With the following modification in pgstat_flush_pending_entries():
> +if (nowait && (rand() % 10 == 0))
> +        did_flush = false;
> +else
> +{
>           did_flush = kind_info->flush_pending_cb(entry_ref, nowait);
> +}
>
> the issue reproduced easily:
> make -s check -C src/test/recovery/ PROVE_TESTS="t/012_subtransactions.pl"
> grep TRAP -r src/test/recovery/tmp_check/log
>
> # +++ tap check in src/test/recovery +++
> t/012_subtransactions.pl .. ok
> All tests successful.
> Files=1, Tests=12,  3 wallclock secs ( 0.01 usr  0.00 sys +  0.19 cusr  0.27
> csys =  0.47 CPU)
> Result: PASS
> src/test/recovery/tmp_check/log/012_subtransactions_primary.log:
> TRAP: failed Assert("!pgStatLocal.shmem->is_shutdown"), File: "pgstat.c",
> Line: 616, PID: 2410126

It looks like skink produced this failure yesterday [1]:
241/316 recovery - postgresql:recovery/027_stream_regress ERROR           4374.08s   exit status 1
...
stderr:
#   Failed test 'check contents of pg_stat_statements on regression database'
#   at /home/bf/bf-build/skink/REL_17_STABLE/pgsql/src/test/recovery/t/027_stream_regress.pl line 177.
#          got: 'CREATE|f
# SELECT|t'
#     expected: 'CREATE|t
# DELETE|t
# INSERT|t
# SELECT|t
# UPDATE|t'
# Looks like you failed 1 test of 9.

  pgsql.build/testrun/recovery/027_stream_regress/log/027_stream_regress_primary.log

2026-05-11 14:09:34.397 CEST [4064132][walsender][40/0:0] LOG: released physical replication slot "standby_1"
TRAP: failed Assert("!pgStatLocal.shmem->is_shutdown"), File: "../pgsql/src/backend/utils/activity/pgstat.c", Line: 612, 
PID: 4064132
postgres: primary: walsender bf [local] streaming 0/15B4FC98(ExceptionalCondition+0x5f) [0x45ae268]
postgres: primary: walsender bf [local] streaming 0/15B4FC98(pgstat_report_stat+0x14d) [0x4492fe2]
postgres: primary: walsender bf [local] streaming 0/15B4FC98(+0x49315c) [0x449315c]
postgres: primary: walsender bf [local] streaming 0/15B4FC98(shmem_exit+0x78) [0x4448f00]
postgres: primary: walsender bf [local] streaming 0/15B4FC98(+0x449020) [0x4449020]
postgres: primary: walsender bf [local] streaming 0/15B4FC98(proc_exit+0x22) [0x44490c1]
postgres: primary: walsender bf [local] streaming 0/15B4FC98(+0x3f5dbd) [0x43f5dbd]
...
2026-05-11 14:09:34.629 CEST [4063470][postmaster][:0] LOG:  server process (PID 4064132) was terminated by signal 6: 
Aborted
2026-05-11 14:09:34.629 CEST [4063470][postmaster][:0] DETAIL: Failed process was running: START_REPLICATION SLOT 
"standby_1" 0/3000000 TIMELINE 1
2026-05-11 14:09:34.631 CEST [4063470][postmaster][:0] LOG: terminating any other active server processes

And perhaps (I can't find the full log now) one year ago [2]:
227/305 postgresql:recovery / recovery/027_stream_regress ERROR           3364.50s   exit status 1

[01:19:39.781](0.108s) not ok 9 - check contents of pg_stat_statements on regression database
[01:19:39.781](0.000s) #   Failed test 'check contents of pg_stat_statements on regression database'
#   at /home/bf/bf-build/skink/REL_17_STABLE/pgsql/src/test/recovery/t/027_stream_regress.pl line 173.
[01:19:39.781](0.000s) #          got: 'CREATE|f
# SELECT|t'
#     expected: 'CREATE|t
# DELETE|t
# INSERT|t
# SELECT|t
# UPDATE|t'

I've reproduced such failures with the above modification applied, just
running:
for i in {1..20}; do PROVE_TESTS="t/027*" make -s check -C src/test/recovery/ || break; done

Reproduced on REL_17_STABLE and REL_16_STABLE (starting from dd8008e8e,
which updated 027_stream_regress.pl).

Not reproduced on REL_18_STABLE after 87a6690cc (coming from [3]).

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2026-05-11%2010%3A25%3A22
[2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-03-18%2023%3A39%3A44
[3] https://www.postgresql.org/message-id/yegnemsijlrhocsfgs7gs7irnczjgkom6fmk2a5u2b66pbvzwi%40ph2h3wppc...

Best regards,
Alexander

^ permalink  raw  reply  [nested|flat] 12+ messages in thread

* Re: BUG #18158: Assert in pgstat_report_stat() fails when a backend shutting down with stats pending
@ 2026-05-13 00:46  Xuneng Zhou <[email protected]>
  parent: Alexander Lakhin <[email protected]>
  0 siblings, 1 reply; 12+ messages in thread

From: Xuneng Zhou @ 2026-05-13 00:46 UTC (permalink / raw)
  To: Alexander Lakhin <[email protected]>; +Cc: [email protected]

Hi Alexander,

On Wed, May 13, 2026 at 4:00 AM Alexander Lakhin <[email protected]> wrote:
>
> Hello hackers,
>
> 16.10.2023 12:00, PG Bug reporting form wrote:
>
> The following bug has been logged on the website:
>
> Bug reference:      18158
> Logged by:          Alexander Lakhin
> Email address:      [email protected]
> PostgreSQL version: 16.0
> Operating system:   Ubuntu 22.04
> Description:
> ...
> With the following modification in pgstat_flush_pending_entries():
> +if (nowait && (rand() % 10 == 0))
> +        did_flush = false;
> +else
> +{
>          did_flush = kind_info->flush_pending_cb(entry_ref, nowait);
> +}
>
> the issue reproduced easily:
> make -s check -C src/test/recovery/ PROVE_TESTS="t/012_subtransactions.pl"
> grep TRAP -r src/test/recovery/tmp_check/log
>
> # +++ tap check in src/test/recovery +++
> t/012_subtransactions.pl .. ok
> All tests successful.
> Files=1, Tests=12,  3 wallclock secs ( 0.01 usr  0.00 sys +  0.19 cusr  0.27
> csys =  0.47 CPU)
> Result: PASS
> src/test/recovery/tmp_check/log/012_subtransactions_primary.log:
> TRAP: failed Assert("!pgStatLocal.shmem->is_shutdown"), File: "pgstat.c",
> Line: 616, PID: 2410126
>
>
> It looks like skink produced this failure yesterday [1]:
> 241/316 recovery - postgresql:recovery/027_stream_regress                               ERROR           4374.08s   exit status 1
> ...
> stderr:
> #   Failed test 'check contents of pg_stat_statements on regression database'
> #   at /home/bf/bf-build/skink/REL_17_STABLE/pgsql/src/test/recovery/t/027_stream_regress.pl line 177.
> #          got: 'CREATE|f
> # SELECT|t'
> #     expected: 'CREATE|t
> # DELETE|t
> # INSERT|t
> # SELECT|t
> # UPDATE|t'
> # Looks like you failed 1 test of 9.
>
>  pgsql.build/testrun/recovery/027_stream_regress/log/027_stream_regress_primary.log
>
> 2026-05-11 14:09:34.397 CEST [4064132][walsender][40/0:0] LOG:  released physical replication slot "standby_1"
> TRAP: failed Assert("!pgStatLocal.shmem->is_shutdown"), File: "../pgsql/src/backend/utils/activity/pgstat.c", Line: 612, PID: 4064132
> postgres: primary: walsender bf [local] streaming 0/15B4FC98(ExceptionalCondition+0x5f) [0x45ae268]
> postgres: primary: walsender bf [local] streaming 0/15B4FC98(pgstat_report_stat+0x14d) [0x4492fe2]
> postgres: primary: walsender bf [local] streaming 0/15B4FC98(+0x49315c) [0x449315c]
> postgres: primary: walsender bf [local] streaming 0/15B4FC98(shmem_exit+0x78) [0x4448f00]
> postgres: primary: walsender bf [local] streaming 0/15B4FC98(+0x449020) [0x4449020]
> postgres: primary: walsender bf [local] streaming 0/15B4FC98(proc_exit+0x22) [0x44490c1]
> postgres: primary: walsender bf [local] streaming 0/15B4FC98(+0x3f5dbd) [0x43f5dbd]
> ...
> 2026-05-11 14:09:34.629 CEST [4063470][postmaster][:0] LOG:  server process (PID 4064132) was terminated by signal 6: Aborted
> 2026-05-11 14:09:34.629 CEST [4063470][postmaster][:0] DETAIL:  Failed process was running: START_REPLICATION SLOT "standby_1" 0/3000000 TIMELINE 1
> 2026-05-11 14:09:34.631 CEST [4063470][postmaster][:0] LOG:  terminating any other active server processes
>
> And perhaps (I can't find the full log now) one year ago [2]:
> 227/305 postgresql:recovery / recovery/027_stream_regress                               ERROR           3364.50s   exit status 1
>
> [01:19:39.781](0.108s) not ok 9 - check contents of pg_stat_statements on regression database
> [01:19:39.781](0.000s) #   Failed test 'check contents of pg_stat_statements on regression database'
> #   at /home/bf/bf-build/skink/REL_17_STABLE/pgsql/src/test/recovery/t/027_stream_regress.pl line 173.
> [01:19:39.781](0.000s) #          got: 'CREATE|f
> # SELECT|t'
> #     expected: 'CREATE|t
> # DELETE|t
> # INSERT|t
> # SELECT|t
> # UPDATE|t'
>
> I've reproduced such failures with the above modification applied, just
> running:
> for i in {1..20}; do PROVE_TESTS="t/027*" make -s check -C src/test/recovery/ || break; done
>
> Reproduced on REL_17_STABLE and REL_16_STABLE (starting from dd8008e8e,
> which updated 027_stream_regress.pl).
>
> Not reproduced on REL_18_STABLE after 87a6690cc (coming from [3]).
>
> [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2026-05-11%2010%3A25%3A22
> [2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-03-18%2023%3A39%3A44
> [3] https://www.postgresql.org/message-id/yegnemsijlrhocsfgs7gs7irnczjgkom6fmk2a5u2b66pbvzwi%40ph2h3wppc...
>
> Best regards,
> Alexander

Thanks for reporting this. It appears like backpatching an equivalent
of 87a6690cc69 to REL_17/REL_16 would resolve this issue.

--
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.





^ permalink  raw  reply  [nested|flat] 12+ messages in thread

* Re: BUG #18158: Assert in pgstat_report_stat() fails when a backend shutting down with stats pending
@ 2026-05-13 06:55  Michael Paquier <[email protected]>
  parent: Xuneng Zhou <[email protected]>
  0 siblings, 1 reply; 12+ messages in thread

From: Michael Paquier @ 2026-05-13 06:55 UTC (permalink / raw)
  To: Xuneng Zhou <[email protected]>; +Cc: Alexander Lakhin <[email protected]>; [email protected]

On Wed, May 13, 2026 at 08:46:25AM +0800, Xuneng Zhou wrote:
> Thanks for reporting this. It appears like backpatching an equivalent
> of 87a6690cc69 to REL_17/REL_16 would resolve this issue.

FWIW, I fear such a change in the stable branches based on its
invasiveness, particularly because the area of the code dealing with
shutdown sequences has been reworked a lot lately as far as I recall.

I am seriously wondering if we should just let it go and remove the
assertion on v15~v17.  It's not perfect, of course, but I value much
more this assertion in terms of any future work that can be done with
pgstats, not in terms of what we could detect for bug fixes.
--
Michael


Attachments:

  [application/pgp-signature] signature.asc (833B, 2-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 12+ messages in thread

* Re: BUG #18158: Assert in pgstat_report_stat() fails when a backend shutting down with stats pending
@ 2026-05-14 07:19  Xuneng Zhou <[email protected]>
  parent: Michael Paquier <[email protected]>
  0 siblings, 1 reply; 12+ messages in thread

From: Xuneng Zhou @ 2026-05-14 07:19 UTC (permalink / raw)
  To: Michael Paquier <[email protected]>; +Cc: Alexander Lakhin <[email protected]>; [email protected]

On Wed, May 13, 2026 at 2:55 PM Michael Paquier <[email protected]> wrote:
>
> On Wed, May 13, 2026 at 08:46:25AM +0800, Xuneng Zhou wrote:
> > Thanks for reporting this. It appears like backpatching an equivalent
> > of 87a6690cc69 to REL_17/REL_16 would resolve this issue.
>
> FWIW, I fear such a change in the stable branches based on its
> invasiveness, particularly because the area of the code dealing with
> shutdown sequences has been reworked a lot lately as far as I recall.
>
> I am seriously wondering if we should just let it go and remove the
> assertion on v15~v17.  It's not perfect, of course, but I value much
> more this assertion in terms of any future work that can be done with
> pgstats, not in terms of what we could detect for bug fixes.

I agree that applying the change wholesale could be too invasive. If
we decide not to address the ordering issue, removing this assertion
seems reasonable, since the non-assert path does not actually provide
the required guarantee.

-- 
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.






^ permalink  raw  reply  [nested|flat] 12+ messages in thread

* Re: BUG #18158: Assert in pgstat_report_stat() fails when a backend shutting down with stats pending
@ 2026-06-05 02:14  Michael Paquier <[email protected]>
  parent: Xuneng Zhou <[email protected]>
  0 siblings, 1 reply; 12+ messages in thread

From: Michael Paquier @ 2026-06-05 02:14 UTC (permalink / raw)
  To: Xuneng Zhou <[email protected]>; +Cc: Alexander Lakhin <[email protected]>; [email protected]

On Thu, May 14, 2026 at 03:19:34PM +0800, Xuneng Zhou wrote:
> I agree that applying the change wholesale could be too invasive. If
> we decide not to address the ordering issue, removing this assertion
> seems reasonable, since the non-assert path does not actually provide
> the required guarantee.

Attached is a patch for 15~17 that is able to achieve this goal:
- Lift the assertion on shutdown for WAL senders, but keep it
everywhere else.
- Secondary trick in pgstat_shutdown_hook to cope with the fact that
some stats make still be around at shutdown for a WAL sender.

I cannot reproduce the failure re-using the trick sent by Alexander at
the top of the thread.

Thoughts or objections?
--
Michael

From 62bae7f32ea095e0130af762bdf6ef39a7c5d388 Mon Sep 17 00:00:00 2001
From: Michael Paquier <[email protected]>
Date: Fri, 5 Jun 2026 10:42:47 +0900
Subject: [PATCH] Lift shutdown assertion in pgstats for WAL senders

Backpatch-through: 15-17
---
 src/backend/utils/activity/pgstat.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index 33025cf4aad3..b7cf6fa88ff4 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -97,6 +97,7 @@
 #include "lib/dshash.h"
 #include "pgstat.h"
 #include "port/atomics.h"
+#include "replication/walsender.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
 #include "storage/lwlock.h"
@@ -516,8 +517,12 @@ pgstat_shutdown_hook(int code, Datum arg)
 
 	pgstat_report_stat(true);
 
-	/* there shouldn't be any pending changes left */
-	Assert(dlist_is_empty(&pgStatPending));
+	/*
+	 * There shouldn't be any pending changes left, unless this is a WAL
+	 * sender that would shut down after the checkpointer has flushed the
+	 * stats.
+	 */
+	Assert(dlist_is_empty(&pgStatPending) || am_walsender);
 	dlist_init(&pgStatPending);
 
 	pgstat_detach_shmem();
@@ -608,7 +613,13 @@ pgstat_report_stat(bool force)
 	 * assert that before the checks above, as there is an unconditional
 	 * pgstat_report_stat() call in pgstat_shutdown_hook() - which at least
 	 * the process that ran pgstat_before_server_shutdown() will still call.
+	 *
+	 * WAL senders would be shut down after the checkpointer and may still
+	 * have stats.  Skip them.
 	 */
+	if (pgStatLocal.shmem->is_shutdown && am_walsender)
+		return 0;
+
 	Assert(!pgStatLocal.shmem->is_shutdown);
 
 	if (force)
-- 
2.54.0



Attachments:

  [text/plain] 0001-Lift-shutdown-assertion-in-pgstats-for-WAL-senders.patch (1.7K, 2-0001-Lift-shutdown-assertion-in-pgstats-for-WAL-senders.patch)
  download | inline diff:
From 62bae7f32ea095e0130af762bdf6ef39a7c5d388 Mon Sep 17 00:00:00 2001
From: Michael Paquier <[email protected]>
Date: Fri, 5 Jun 2026 10:42:47 +0900
Subject: [PATCH] Lift shutdown assertion in pgstats for WAL senders

Backpatch-through: 15-17
---
 src/backend/utils/activity/pgstat.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index 33025cf4aad3..b7cf6fa88ff4 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -97,6 +97,7 @@
 #include "lib/dshash.h"
 #include "pgstat.h"
 #include "port/atomics.h"
+#include "replication/walsender.h"
 #include "storage/fd.h"
 #include "storage/ipc.h"
 #include "storage/lwlock.h"
@@ -516,8 +517,12 @@ pgstat_shutdown_hook(int code, Datum arg)
 
 	pgstat_report_stat(true);
 
-	/* there shouldn't be any pending changes left */
-	Assert(dlist_is_empty(&pgStatPending));
+	/*
+	 * There shouldn't be any pending changes left, unless this is a WAL
+	 * sender that would shut down after the checkpointer has flushed the
+	 * stats.
+	 */
+	Assert(dlist_is_empty(&pgStatPending) || am_walsender);
 	dlist_init(&pgStatPending);
 
 	pgstat_detach_shmem();
@@ -608,7 +613,13 @@ pgstat_report_stat(bool force)
 	 * assert that before the checks above, as there is an unconditional
 	 * pgstat_report_stat() call in pgstat_shutdown_hook() - which at least
 	 * the process that ran pgstat_before_server_shutdown() will still call.
+	 *
+	 * WAL senders would be shut down after the checkpointer and may still
+	 * have stats.  Skip them.
 	 */
+	if (pgStatLocal.shmem->is_shutdown && am_walsender)
+		return 0;
+
 	Assert(!pgStatLocal.shmem->is_shutdown);
 
 	if (force)
-- 
2.54.0



  [application/pgp-signature] signature.asc (833B, 3-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 12+ messages in thread

* Re: BUG #18158: Assert in pgstat_report_stat() fails when a backend shutting down with stats pending
@ 2026-06-05 05:00  Alexander Lakhin <[email protected]>
  parent: Michael Paquier <[email protected]>
  0 siblings, 2 replies; 12+ messages in thread

From: Alexander Lakhin @ 2026-06-05 05:00 UTC (permalink / raw)
  To: Michael Paquier <[email protected]>; Xuneng Zhou <[email protected]>; +Cc: [email protected]

Hello Michael,

05.06.2026 05:14, Michael Paquier wrote:
> On Thu, May 14, 2026 at 03:19:34PM +0800, Xuneng Zhou wrote:
>> I agree that applying the change wholesale could be too invasive. If
>> we decide not to address the ordering issue, removing this assertion
>> seems reasonable, since the non-assert path does not actually provide
>> the required guarantee.
> Attached is a patch for 15~17 that is able to achieve this goal:
> - Lift the assertion on shutdown for WAL senders, but keep it
> everywhere else.
> - Secondary trick in pgstat_shutdown_hook to cope with the fact that
> some stats make still be around at shutdown for a WAL sender.
>
> I cannot reproduce the failure re-using the trick sent by Alexander at
> the top of the thread.

Please try the following:
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -107,2 +107,3 @@

+#include "common/pg_prng.h"

@@ -1212,3 +1213,8 @@ pgstat_flush_pending_entries(bool nowait)
          /* flush the stats, if possible */
+if (nowait && (pg_prng_double(&pg_global_prng_state) < 0.1))
+        did_flush = false;
+else
+{
          did_flush = kind_info->flush_pending_cb(entry_ref, nowait);
+}

As far as I can see, that rand() gives the same sequence on macOS each run,
while on Linux I see different sequences.

The patch works for me test-wise.

Thank you for the fix!

Best regards,
Alexander





^ permalink  raw  reply  [nested|flat] 12+ messages in thread

* Re: BUG #18158: Assert in pgstat_report_stat() fails when a backend shutting down with stats pending
@ 2026-06-05 05:03  Michael Paquier <[email protected]>
  parent: Alexander Lakhin <[email protected]>
  1 sibling, 0 replies; 12+ messages in thread

From: Michael Paquier @ 2026-06-05 05:03 UTC (permalink / raw)
  To: Alexander Lakhin <[email protected]>; +Cc: Xuneng Zhou <[email protected]>; [email protected]

On Fri, Jun 05, 2026 at 08:00:00AM +0300, Alexander Lakhin wrote:
> As far as I can see, that rand() gives the same sequence on macOS each run,
> while on Linux I see different sequences.

Aye.  I have used that before sending the patch, reproducing the
failure before-patch, getting things to work after-patch.
--
Michael


Attachments:

  [application/pgp-signature] signature.asc (833B, 2-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 12+ messages in thread

* Re: BUG #18158: Assert in pgstat_report_stat() fails when a backend shutting down with stats pending
@ 2026-06-06 00:13  Michael Paquier <[email protected]>
  parent: Alexander Lakhin <[email protected]>
  1 sibling, 1 reply; 12+ messages in thread

From: Michael Paquier @ 2026-06-06 00:13 UTC (permalink / raw)
  To: Alexander Lakhin <[email protected]>; +Cc: Xuneng Zhou <[email protected]>; [email protected]

On Fri, Jun 05, 2026 at 08:00:00AM +0300, Alexander Lakhin wrote:
> Thank you for the fix!

Applied as of 4801610f7c66 in the v15~17 range.
--
Michael


Attachments:

  [application/pgp-signature] signature.asc (833B, 2-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 12+ messages in thread

* Re: BUG #18158: Assert in pgstat_report_stat() fails when a backend shutting down with stats pending
@ 2026-06-06 12:45  Xuneng Zhou <[email protected]>
  parent: Michael Paquier <[email protected]>
  0 siblings, 1 reply; 12+ messages in thread

From: Xuneng Zhou @ 2026-06-06 12:45 UTC (permalink / raw)
  To: Michael Paquier <[email protected]>; +Cc: Alexander Lakhin <[email protected]>; [email protected]

Hi Michael,

On Sat, Jun 6, 2026 at 8:13 AM Michael Paquier <[email protected]> wrote:
>
> On Fri, Jun 05, 2026 at 08:00:00AM +0300, Alexander Lakhin wrote:
> > Thank you for the fix!
>
> Applied as of 4801610f7c66 in the v15~17 range.

Sorry for being late. One nit for the patch -- would it be better to
use AmWalSenderProcess() for identification to save the include?
+#include "replication/walsender.h"

--
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.






^ permalink  raw  reply  [nested|flat] 12+ messages in thread

* Re: BUG #18158: Assert in pgstat_report_stat() fails when a backend shutting down with stats pending
@ 2026-06-08 05:48  Michael Paquier <[email protected]>
  parent: Xuneng Zhou <[email protected]>
  0 siblings, 1 reply; 12+ messages in thread

From: Michael Paquier @ 2026-06-08 05:48 UTC (permalink / raw)
  To: Xuneng Zhou <[email protected]>; +Cc: Alexander Lakhin <[email protected]>; [email protected]

On Sat, Jun 06, 2026 at 08:45:26PM +0800, Xuneng Zhou wrote:
> Sorry for being late. One nit for the patch -- would it be better to
> use AmWalSenderProcess() for identification to save the include?
> +#include "replication/walsender.h"

In the startup sequence, MyBackendType is updated to B_WAL_SENDER a
couple of lines after am_walsender.  One or the other don't really
matter; they're both right.
--
Michael


Attachments:

  [application/pgp-signature] signature.asc (833B, 2-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 12+ messages in thread

* Re: BUG #18158: Assert in pgstat_report_stat() fails when a backend shutting down with stats pending
@ 2026-06-08 08:43  Xuneng Zhou <[email protected]>
  parent: Michael Paquier <[email protected]>
  0 siblings, 0 replies; 12+ messages in thread

From: Xuneng Zhou @ 2026-06-08 08:43 UTC (permalink / raw)
  To: Michael Paquier <[email protected]>; +Cc: Alexander Lakhin <[email protected]>; [email protected]

On Mon, Jun 8, 2026 at 1:49 PM Michael Paquier <[email protected]> wrote:
>
> On Sat, Jun 06, 2026 at 08:45:26PM +0800, Xuneng Zhou wrote:
> > Sorry for being late. One nit for the patch -- would it be better to
> > use AmWalSenderProcess() for identification to save the include?
> > +#include "replication/walsender.h"
>
> In the startup sequence, MyBackendType is updated to B_WAL_SENDER a
> couple of lines after am_walsender.  One or the other don't really
> matter; they're both right.

Yeah, both of them work well.  To use AmWalSenderProcess(), we would
need to include src/include/miscadmin.h, whereas using am_walsender
requires including “replication/walsender.h”. I’m just wondering
whether using AmWalSenderProcess() would result in less coupling
between pgstat and replication.


--
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.






^ permalink  raw  reply  [nested|flat] 12+ messages in thread


end of thread, other threads:[~2026-06-08 08:43 UTC | newest]

Thread overview: 12+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2023-10-16 09:00 BUG #18158: Assert in pgstat_report_stat() fails when a backend shutting down with stats pending PG Bug reporting form <[email protected]>
2026-05-12 20:00 ` Alexander Lakhin <[email protected]>
2026-05-13 00:46   ` Xuneng Zhou <[email protected]>
2026-05-13 06:55     ` Michael Paquier <[email protected]>
2026-05-14 07:19       ` Xuneng Zhou <[email protected]>
2026-06-05 02:14         ` Michael Paquier <[email protected]>
2026-06-05 05:00           ` Alexander Lakhin <[email protected]>
2026-06-05 05:03             ` Michael Paquier <[email protected]>
2026-06-06 00:13             ` Michael Paquier <[email protected]>
2026-06-06 12:45               ` Xuneng Zhou <[email protected]>
2026-06-08 05:48                 ` Michael Paquier <[email protected]>
2026-06-08 08:43                   ` Xuneng Zhou <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox