public inbox for [email protected]
help / color / mirror / Atom feedAdd wait events for server logging destination writes
9+ messages / 5 participants
[nested] [flat]
* Add wait events for server logging destination writes
@ 2026-05-31 08:50 신성준 <[email protected]>
0 siblings, 3 replies; 9+ messages in thread
From: 신성준 @ 2026-05-31 08:50 UTC (permalink / raw)
To: [email protected]; +Cc: Kirk Wolak <[email protected]>; Andrey Borodin <[email protected]>; Andreas Karlsson <[email protected]>; Nikolay Samokhvalov <[email protected]>
Hi hackers,
The write(2) calls that flush server log output aren't covered by wait
events. When a backend logs something, the writes go out in:
- write_pipe_chunks(): write(2) to the syslogger pipe
- write_console(): write(2) to stderr (WriteConsoleW() on Windows)
If one of those blocks -- syslogger pipe full, slow console, slow log
device -- pg_stat_activity just shows wait_event = NULL until it
returns. Since NULL usually reads as "on CPU", a backend stuck writing
logs looks like it's doing work, so logging-related stalls are easy to
miss.
Attached is a short series that adds two WaitEventIO events and reports
them around those writes:
IO / SysloggerWrite - write(2) to the syslogger pipe
IO / StderrWrite - write(2) to stderr, and WriteConsoleW()
0001 adds the events and covers the write(2) paths. 0002 does the
Windows WriteConsoleW() path, split out since it's platform-specific.
It only wraps the leaf write call and uses the existing
pgstat_report_wait_start()/end() helpers, so it stays allocation-free
and safe to call from inside the error-reporting path.
I did a quick before/after to make sure the events show up: 8 backends
each emitting large RAISE LOG lines, sampling wait_event from
pg_stat_activity every 50 ms for 20 s.
- logging_collector = on (syslogger pipe):
master: NULL 100.0% (2184/2184)
patched: IO/SysloggerWrite 99.1% (2204/2224), NULL 0.9%
- logging_collector = off (stderr):
master: NULL 100.0% (2144/2144)
patched: IO/StderrWrite 90.7% (1952/2152), NULL 9.3%
On master that wait time is just invisible; with the patch it lands on
the new events. I can send the scripts and raw samples if anyone wants
to reproduce it.
Applies on current master. A couple of things I'm unsure about and
would appreciate input on: whether the event names fit the surrounding
conventions, and whether splitting the Windows path into its own patch
is the right call.
Thanks,
Seongjun Shin
Attachments:
[application/octet-stream] v1-0001-Add-wait-events-for-server-logging-destination-wr.patch (3.6K, 2-v1-0001-Add-wait-events-for-server-logging-destination-wr.patch)
download | inline diff:
From 9c5907b11e5f20f9424757f474f8b9cb8c3b4266 Mon Sep 17 00:00:00 2001
From: Seongjun Shin <[email protected]>
Date: Fri, 29 May 2026 14:45:23 +0900
Subject: [PATCH v1 1/2] Add wait events for server logging destination writes
When a backend writes to the syslogger pipe in write_pipe_chunks() or
to stderr in write_console(), the underlying write(2) can block once
the pipe buffer fills up or the output device is slow. These blocking
syscalls were not instrumented, so pg_stat_activity reported
wait_event IS NULL during that time. Many monitoring tools interpret
NULL as on-CPU work, which made heavy-logging stalls hard to
attribute.
Add two new WaitEventIO events and report them around the relevant
write(2) calls:
IO / SysloggerWrite - write(2) to the syslogger pipe inside
write_pipe_chunks().
IO / StderrWrite - write(2) to stderr inside write_console().
The instrumentation is limited to the leaf write call. It uses only
the existing pgstat_report_wait_start()/end() inline helpers, which
are allocation-free and safe to call before MyProc is set up, so this
remains safe to invoke from within error reporting paths.
---
src/backend/utils/activity/wait_event_names.txt | 2 ++
src/backend/utils/error/elog.c | 6 ++++++
2 files changed, 8 insertions(+)
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 5537a2d2530..ce33807c3fe 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -253,6 +253,8 @@ SLRU_WRITE "Waiting for a write of an SLRU page."
SNAPBUILD_READ "Waiting for a read of a serialized historical catalog snapshot."
SNAPBUILD_SYNC "Waiting for a serialized historical catalog snapshot to reach durable storage."
SNAPBUILD_WRITE "Waiting for a write of a serialized historical catalog snapshot."
+STDERR_WRITE "Waiting for a write to the server's standard error stream."
+SYSLOGGER_WRITE "Waiting for a write to the syslogger pipe."
TIMELINE_HISTORY_FILE_SYNC "Waiting for a timeline history file received via streaming replication to reach durable storage."
TIMELINE_HISTORY_FILE_WRITE "Waiting for a write of a timeline history file received via streaming replication."
TIMELINE_HISTORY_READ "Waiting for a read of a timeline history file."
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index aa530d3685e..fa38d6c6df8 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -2662,7 +2662,9 @@ write_console(const char *line, int len)
* We ignore any error from write() here. We have no useful way to report
* it ... certainly whining on stderr isn't likely to be productive.
*/
+ pgstat_report_wait_start(WAIT_EVENT_STDERR_WRITE);
rc = write(fileno(stderr), line, len);
+ pgstat_report_wait_end();
(void) rc;
}
@@ -3503,7 +3505,9 @@ write_pipe_chunks(char *data, int len, int dest)
/* no need to set PIPE_PROTO_IS_LAST yet */
p.proto.len = PIPE_MAX_PAYLOAD;
memcpy(p.proto.data, data, PIPE_MAX_PAYLOAD);
+ pgstat_report_wait_start(WAIT_EVENT_SYSLOGGER_WRITE);
rc = write(fd, &p, PIPE_HEADER_SIZE + PIPE_MAX_PAYLOAD);
+ pgstat_report_wait_end();
(void) rc;
data += PIPE_MAX_PAYLOAD;
len -= PIPE_MAX_PAYLOAD;
@@ -3513,7 +3517,9 @@ write_pipe_chunks(char *data, int len, int dest)
p.proto.flags |= PIPE_PROTO_IS_LAST;
p.proto.len = len;
memcpy(p.proto.data, data, len);
+ pgstat_report_wait_start(WAIT_EVENT_SYSLOGGER_WRITE);
rc = write(fd, &p, PIPE_HEADER_SIZE + len);
+ pgstat_report_wait_end();
(void) rc;
}
--
2.50.1 (Apple Git-155)
[application/octet-stream] v1-0002-Report-StderrWrite-wait-event-around-WriteConsole.patch (1.3K, 3-v1-0002-Report-StderrWrite-wait-event-around-WriteConsole.patch)
download | inline diff:
From 4d67d15c56da804b541426f5583f98c8b77cb19f Mon Sep 17 00:00:00 2001
From: Seongjun Shin <[email protected]>
Date: Fri, 29 May 2026 14:52:11 +0900
Subject: [PATCH v1 2/2] Report StderrWrite wait event around WriteConsoleW()
on Windows
On Windows, write_console() emits log messages to the console with
WriteConsoleW() rather than write(2). Like the write(2) path, this
call can block on a slow console, leaving wait_event IS NULL in
pg_stat_activity.
Wrap WriteConsoleW() with the StderrWrite wait event introduced in
the previous patch, so the Windows console path is instrumented
consistently with the stderr write(2) path.
---
src/backend/utils/error/elog.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index fa38d6c6df8..d19ec6a31e8 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -2634,11 +2634,14 @@ write_console(const char *line, int len)
DWORD written;
stdHandle = GetStdHandle(STD_ERROR_HANDLE);
+ pgstat_report_wait_start(WAIT_EVENT_STDERR_WRITE);
if (WriteConsoleW(stdHandle, utf16, utf16len, &written, NULL))
{
+ pgstat_report_wait_end();
pfree(utf16);
return;
}
+ pgstat_report_wait_end();
/*
* In case WriteConsoleW() failed, fall back to writing the
--
2.50.1 (Apple Git-155)
^ permalink raw reply [nested|flat] 9+ messages in thread
* Re: Add wait events for server logging destination writes
@ 2026-05-31 10:42 신성준 <[email protected]>
parent: 신성준 <[email protected]>
2 siblings, 1 reply; 9+ messages in thread
From: 신성준 @ 2026-05-31 10:42 UTC (permalink / raw)
To: [email protected]; +Cc: Kirk Wolak <[email protected]>; Andrey Borodin <[email protected]>; Andreas Karlsson <[email protected]>; Nikolay Samokhvalov <[email protected]>
Hi,
cfbot caught a build failure on v1, in the SanityCheck task on Linux
and Windows: elog.c uses pgstat_report_wait_start()/end() and the
WAIT_EVENT_* constants but didn't include utils/wait_event.h. It only
built here because of an accidental transitive include on my machine;
on the CI images the declarations weren't visible.
v2 fixes that by adding the missing #include "utils/wait_event.h" to
elog.c, folded into 0001 so that patch builds on its own. No other
changes; the wait events and the reported write paths are the same as
in v1.
v2-0001 adds the two events and covers the write(2) paths.
v2-0002 covers the Windows WriteConsoleW() path, split out as before.
Applies cleanly on current master; full build passes locally.
Thanks,
Seongjun Shin
2026년 5월 31일 (일) 오후 5:50, 신성준 <[email protected]>님이 작성:
>
> Hi hackers,
>
> The write(2) calls that flush server log output aren't covered by wait
> events. When a backend logs something, the writes go out in:
>
> - write_pipe_chunks(): write(2) to the syslogger pipe
> - write_console(): write(2) to stderr (WriteConsoleW() on Windows)
>
> If one of those blocks -- syslogger pipe full, slow console, slow log
> device -- pg_stat_activity just shows wait_event = NULL until it
> returns. Since NULL usually reads as "on CPU", a backend stuck writing
> logs looks like it's doing work, so logging-related stalls are easy to
> miss.
>
> Attached is a short series that adds two WaitEventIO events and reports
> them around those writes:
>
> IO / SysloggerWrite - write(2) to the syslogger pipe
> IO / StderrWrite - write(2) to stderr, and WriteConsoleW()
>
> 0001 adds the events and covers the write(2) paths. 0002 does the
> Windows WriteConsoleW() path, split out since it's platform-specific.
>
> It only wraps the leaf write call and uses the existing
> pgstat_report_wait_start()/end() helpers, so it stays allocation-free
> and safe to call from inside the error-reporting path.
>
> I did a quick before/after to make sure the events show up: 8 backends
> each emitting large RAISE LOG lines, sampling wait_event from
> pg_stat_activity every 50 ms for 20 s.
>
> - logging_collector = on (syslogger pipe):
> master: NULL 100.0% (2184/2184)
> patched: IO/SysloggerWrite 99.1% (2204/2224), NULL 0.9%
>
> - logging_collector = off (stderr):
> master: NULL 100.0% (2144/2144)
> patched: IO/StderrWrite 90.7% (1952/2152), NULL 9.3%
>
> On master that wait time is just invisible; with the patch it lands on
> the new events. I can send the scripts and raw samples if anyone wants
> to reproduce it.
>
> Applies on current master. A couple of things I'm unsure about and
> would appreciate input on: whether the event names fit the surrounding
> conventions, and whether splitting the Windows path into its own patch
> is the right call.
>
> Thanks,
> Seongjun Shin
Attachments:
[application/octet-stream] v2-0001-Add-wait-events-for-server-logging-destination-wr.patch (3.8K, 2-v2-0001-Add-wait-events-for-server-logging-destination-wr.patch)
download | inline diff:
From e800c1de7de40a6de42d23742e899223567935ef Mon Sep 17 00:00:00 2001
From: Seongjun Shin <[email protected]>
Date: Fri, 29 May 2026 14:45:23 +0900
Subject: [PATCH v2 1/2] Add wait events for server logging destination writes
When a backend writes to the syslogger pipe in write_pipe_chunks() or
to stderr in write_console(), the underlying write(2) can block once
the pipe buffer fills up or the output device is slow. These blocking
syscalls were not instrumented, so pg_stat_activity reported
wait_event IS NULL during that time. Many monitoring tools interpret
NULL as on-CPU work, which made heavy-logging stalls hard to
attribute.
Add two new WaitEventIO events and report them around the relevant
write(2) calls:
IO / SysloggerWrite - write(2) to the syslogger pipe inside
write_pipe_chunks().
IO / StderrWrite - write(2) to stderr inside write_console().
The instrumentation is limited to the leaf write call. It uses only
the existing pgstat_report_wait_start()/end() inline helpers, which
are allocation-free and safe to call before MyProc is set up, so this
remains safe to invoke from within error reporting paths.
---
src/backend/utils/activity/wait_event_names.txt | 2 ++
src/backend/utils/error/elog.c | 7 +++++++
2 files changed, 9 insertions(+)
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 5537a2d2530..ce33807c3fe 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -253,6 +253,8 @@ SLRU_WRITE "Waiting for a write of an SLRU page."
SNAPBUILD_READ "Waiting for a read of a serialized historical catalog snapshot."
SNAPBUILD_SYNC "Waiting for a serialized historical catalog snapshot to reach durable storage."
SNAPBUILD_WRITE "Waiting for a write of a serialized historical catalog snapshot."
+STDERR_WRITE "Waiting for a write to the server's standard error stream."
+SYSLOGGER_WRITE "Waiting for a write to the syslogger pipe."
TIMELINE_HISTORY_FILE_SYNC "Waiting for a timeline history file received via streaming replication to reach durable storage."
TIMELINE_HISTORY_FILE_WRITE "Waiting for a write of a timeline history file received via streaming replication."
TIMELINE_HISTORY_READ "Waiting for a read of a timeline history file."
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index aa530d3685e..d4c41fa1263 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -84,6 +84,7 @@
#include "utils/memutils.h"
#include "utils/ps_status.h"
#include "utils/varlena.h"
+#include "utils/wait_event.h"
/* In this module, access gettext() via err_gettext() */
@@ -2662,7 +2663,9 @@ write_console(const char *line, int len)
* We ignore any error from write() here. We have no useful way to report
* it ... certainly whining on stderr isn't likely to be productive.
*/
+ pgstat_report_wait_start(WAIT_EVENT_STDERR_WRITE);
rc = write(fileno(stderr), line, len);
+ pgstat_report_wait_end();
(void) rc;
}
@@ -3503,7 +3506,9 @@ write_pipe_chunks(char *data, int len, int dest)
/* no need to set PIPE_PROTO_IS_LAST yet */
p.proto.len = PIPE_MAX_PAYLOAD;
memcpy(p.proto.data, data, PIPE_MAX_PAYLOAD);
+ pgstat_report_wait_start(WAIT_EVENT_SYSLOGGER_WRITE);
rc = write(fd, &p, PIPE_HEADER_SIZE + PIPE_MAX_PAYLOAD);
+ pgstat_report_wait_end();
(void) rc;
data += PIPE_MAX_PAYLOAD;
len -= PIPE_MAX_PAYLOAD;
@@ -3513,7 +3518,9 @@ write_pipe_chunks(char *data, int len, int dest)
p.proto.flags |= PIPE_PROTO_IS_LAST;
p.proto.len = len;
memcpy(p.proto.data, data, len);
+ pgstat_report_wait_start(WAIT_EVENT_SYSLOGGER_WRITE);
rc = write(fd, &p, PIPE_HEADER_SIZE + len);
+ pgstat_report_wait_end();
(void) rc;
}
--
2.50.1 (Apple Git-155)
[application/octet-stream] v2-0002-Report-StderrWrite-wait-event-around-WriteConsole.patch (1.3K, 3-v2-0002-Report-StderrWrite-wait-event-around-WriteConsole.patch)
download | inline diff:
From 821352a11872392547c0ac6790878ef02d0284b4 Mon Sep 17 00:00:00 2001
From: Seongjun Shin <[email protected]>
Date: Fri, 29 May 2026 14:52:11 +0900
Subject: [PATCH v2 2/2] Report StderrWrite wait event around WriteConsoleW()
on Windows
On Windows, write_console() emits log messages to the console with
WriteConsoleW() rather than write(2). Like the write(2) path, this
call can block on a slow console, leaving wait_event IS NULL in
pg_stat_activity.
Wrap WriteConsoleW() with the StderrWrite wait event introduced in
the previous patch, so the Windows console path is instrumented
consistently with the stderr write(2) path.
---
src/backend/utils/error/elog.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index d4c41fa1263..9ab75b99f5c 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -2635,11 +2635,14 @@ write_console(const char *line, int len)
DWORD written;
stdHandle = GetStdHandle(STD_ERROR_HANDLE);
+ pgstat_report_wait_start(WAIT_EVENT_STDERR_WRITE);
if (WriteConsoleW(stdHandle, utf16, utf16len, &written, NULL))
{
+ pgstat_report_wait_end();
pfree(utf16);
return;
}
+ pgstat_report_wait_end();
/*
* In case WriteConsoleW() failed, fall back to writing the
--
2.50.1 (Apple Git-155)
^ permalink raw reply [nested|flat] 9+ messages in thread
* Re: Add wait events for server logging destination writes
@ 2026-05-31 16:53 Kirk Wolak <[email protected]>
parent: 신성준 <[email protected]>
2 siblings, 0 replies; 9+ messages in thread
From: Kirk Wolak @ 2026-05-31 16:53 UTC (permalink / raw)
To: 신성준 <[email protected]>; +Cc: [email protected]; Andrey Borodin <[email protected]>; Andreas Karlsson <[email protected]>; Nikolay Samokhvalov <[email protected]>
On Sun, May 31, 2026 at 4:50 AM 신성준 <[email protected]> wrote:
> Hi hackers,
>
> The write(2) calls that flush server log output aren't covered by wait
> events. When a backend logs something, the writes go out in:
>
> - write_pipe_chunks(): write(2) to the syslogger pipe
> - write_console(): write(2) to stderr (WriteConsoleW() on Windows)
>
> If one of those blocks -- syslogger pipe full, slow console, slow log
> device -- pg_stat_activity just shows wait_event = NULL until it
> returns. Since NULL usually reads as "on CPU", a backend stuck writing
> logs looks like it's doing work, so logging-related stalls are easy to
> miss.
>
> Attached is a short series that adds two WaitEventIO events and reports
> them around those writes:
>
> IO / SysloggerWrite - write(2) to the syslogger pipe
> IO / StderrWrite - write(2) to stderr, and WriteConsoleW()
>
> 0001 adds the events and covers the write(2) paths. 0002 does the
> Windows WriteConsoleW() path, split out since it's platform-specific.
>
> It only wraps the leaf write call and uses the existing
> pgstat_report_wait_start()/end() helpers, so it stays allocation-free
> and safe to call from inside the error-reporting path.
>
> I did a quick before/after to make sure the events show up: 8 backends
> each emitting large RAISE LOG lines, sampling wait_event from
> pg_stat_activity every 50 ms for 20 s.
>
> - logging_collector = on (syslogger pipe):
> master: NULL 100.0% (2184/2184)
> patched: IO/SysloggerWrite 99.1% (2204/2224), NULL 0.9%
>
> - logging_collector = off (stderr):
> master: NULL 100.0% (2144/2144)
> patched: IO/StderrWrite 90.7% (1952/2152), NULL 9.3%
>
> On master that wait time is just invisible; with the patch it lands on
> the new events. I can send the scripts and raw samples if anyone wants
> to reproduce it.
>
+1
Nice. We have too many waits that are registered as CPU.
>
>
^ permalink raw reply [nested|flat] 9+ messages in thread
* Re: Add wait events for server logging destination writes
@ 2026-06-01 04:37 Michael Paquier <[email protected]>
parent: 신성준 <[email protected]>
0 siblings, 0 replies; 9+ messages in thread
From: Michael Paquier @ 2026-06-01 04:37 UTC (permalink / raw)
To: 신성준 <[email protected]>; +Cc: [email protected]; Kirk Wolak <[email protected]>; Andrey Borodin <[email protected]>; Andreas Karlsson <[email protected]>; Nikolay Samokhvalov <[email protected]>
On Sun, May 31, 2026 at 07:42:41PM +0900, 신성준 wrote:
> cfbot caught a build failure on v1, in the SanityCheck task on Linux
> and Windows: elog.c uses pgstat_report_wait_start()/end() and the
> WAIT_EVENT_* constants but didn't include utils/wait_event.h. It only
> built here because of an accidental transitive include on my machine;
> on the CI images the declarations weren't visible.
>
> v2 fixes that by adding the missing #include "utils/wait_event.h" to
> elog.c, folded into 0001 so that patch builds on its own. No other
> changes; the wait events and the reported write paths are the same as
> in v1.
>
> v2-0001 adds the two events and covers the write(2) paths.
> v2-0002 covers the Windows WriteConsoleW() path, split out as before.
>
> Applies cleanly on current master; full build passes locally.
Hmm. Usually we split the event numbers so as there is one for each
code path, but here we are just dealing with the same routine that
sends chunks. Using the same numbers seem fine by me.
If others have any thoughts or comments, feel free.
--
Michael
Attachments:
[application/pgp-signature] signature.asc (833B, 2-signature.asc)
download
^ permalink raw reply [nested|flat] 9+ messages in thread
* Re: Add wait events for server logging destination writes
@ 2026-06-01 05:49 Kyotaro Horiguchi <[email protected]>
parent: 신성준 <[email protected]>
2 siblings, 1 reply; 9+ messages in thread
From: Kyotaro Horiguchi @ 2026-06-01 05:49 UTC (permalink / raw)
To: [email protected]; +Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
Hello.
At Sun, 31 May 2026 17:50:08 +0900, 신성준 <[email protected]> wrote in
> Attached is a short series that adds two WaitEventIO events and reports
> them around those writes:
>
> IO / SysloggerWrite - write(2) to the syslogger pipe
> IO / StderrWrite - write(2) to stderr, and WriteConsoleW()
>
> 0001 adds the events and covers the write(2) paths. 0002 does the
> Windows WriteConsoleW() path, split out since it's platform-specific.
Should we also consider instrumenting ReportEventW()/ReportEventA()?
They seem to be another Windows-specific logging output path.
Also, if the intention is to cover all places where logging output can
block, I wonder whether the syslog() calls should be covered as
well. If they are intentionally excluded, perhaps a short comment
explaining the rationale would be useful.
Regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
^ permalink raw reply [nested|flat] 9+ messages in thread
* Re: Add wait events for server logging destination writes
@ 2026-06-06 16:25 신성준 <[email protected]>
parent: Kyotaro Horiguchi <[email protected]>
0 siblings, 1 reply; 9+ messages in thread
From: 신성준 @ 2026-06-06 16:25 UTC (permalink / raw)
To: [email protected]; +Cc: [email protected]; Kyotaro Horiguchi <[email protected]>; [email protected]; [email protected]; [email protected]
Hi,
Thanks Kirk, glad it's useful.
Kyotaro Horiguchi wrote:
> Should we also consider instrumenting ReportEventW()/ReportEventA()?
> They seem to be another Windows-specific logging output path.
>
> Also, if the intention is to cover all places where logging output
> can block, I wonder whether the syslog() calls should be covered as
> well.
Good points -- both are blocking output paths and there's no real
reason to leave them out, so v3 instruments them rather than excluding
them with a comment. The intent is exactly to cover the places where
logging output can block, so this makes the series consistent.
v3 adds two more WaitEventIO events:
IO / SyslogWrite - syslog(3) in write_syslog()
IO / EventlogWrite - ReportEventW()/ReportEventA() in write_eventlog()
Same approach as before: the wait is reported only around the leaf
call, using the existing pgstat_report_wait_start()/end() helpers, so
it stays allocation-free and safe on the error-reporting path, and the
series still touches just elog.c and wait_event_names.txt.
This also matches Michael's point on v2 -- each event covers a routine
rather than a single call site, so SyslogWrite wraps the syslog(3)
calls in write_syslog() and EventlogWrite wraps both ReportEvent
variants in write_eventlog(), the same way SysloggerWrite already
covers the two writes in write_pipe_chunks().
As before, 0001 is the portable part and 0002 is the Windows part
(WriteConsoleW plus the new EventlogWrite).
One caveat: EventlogWrite is Windows-only, so I couldn't get a runtime
before/after for it here -- I've only confirmed it builds (cfbot's
Windows task should cover that). The other events still show up in the
sampling I posted earlier. If someone on Windows can exercise the
event-log path I'd appreciate a confirmation.
Applies cleanly on current master; full build passes locally on both
Autoconf and Meson, with no new warnings.
Thanks,
Seongjun Shin
2026년 6월 1일 (월) 오후 2:49, Kyotaro Horiguchi <[email protected]>님이 작성:
>
> Hello.
>
> At Sun, 31 May 2026 17:50:08 +0900, 신성준 <[email protected]> wrote in
> > Attached is a short series that adds two WaitEventIO events and reports
> > them around those writes:
> >
> > IO / SysloggerWrite - write(2) to the syslogger pipe
> > IO / StderrWrite - write(2) to stderr, and WriteConsoleW()
> >
> > 0001 adds the events and covers the write(2) paths. 0002 does the
> > Windows WriteConsoleW() path, split out since it's platform-specific.
>
> Should we also consider instrumenting ReportEventW()/ReportEventA()?
> They seem to be another Windows-specific logging output path.
>
> Also, if the intention is to cover all places where logging output can
> block, I wonder whether the syslog() calls should be covered as
> well. If they are intentionally excluded, perhaps a short comment
> explaining the rationale would be useful.
>
> Regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
Attachments:
[application/octet-stream] v3-0001-Add-wait-events-for-server-logging-destination-wr.patch (4.6K, 2-v3-0001-Add-wait-events-for-server-logging-destination-wr.patch)
download | inline diff:
From f04f12afc75e692516d3d5042ec93b4833dbfe61 Mon Sep 17 00:00:00 2001
From: Seongjun Shin <[email protected]>
Date: Fri, 29 May 2026 14:45:23 +0900
Subject: [PATCH v3 1/2] Add wait events for server logging destination writes
When a backend writes server log output, the underlying call can
block: write(2) to the syslogger pipe or to stderr once the pipe
buffer fills up or the output device is slow, and syslog(3) when the
system logger is slow. These blocking calls were not instrumented, so
pg_stat_activity reported wait_event IS NULL during that time. Many
monitoring tools interpret NULL as on-CPU work, which made
heavy-logging stalls hard to attribute.
Add three new WaitEventIO events and report them around the relevant
calls:
IO / SysloggerWrite - write(2) to the syslogger pipe inside
write_pipe_chunks().
IO / StderrWrite - write(2) to stderr inside write_console().
IO / SyslogWrite - syslog(3) inside write_syslog().
The instrumentation is limited to the leaf write/syslog call. It uses
only the existing pgstat_report_wait_start()/end() inline helpers,
which are allocation-free and safe to call before MyProc is set up, so
this remains safe to invoke from within error reporting paths.
---
src/backend/utils/activity/wait_event_names.txt | 3 +++
src/backend/utils/error/elog.c | 11 +++++++++++
2 files changed, 14 insertions(+)
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 5537a2d2530..fa528df03f1 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -253,6 +253,9 @@ SLRU_WRITE "Waiting for a write of an SLRU page."
SNAPBUILD_READ "Waiting for a read of a serialized historical catalog snapshot."
SNAPBUILD_SYNC "Waiting for a serialized historical catalog snapshot to reach durable storage."
SNAPBUILD_WRITE "Waiting for a write of a serialized historical catalog snapshot."
+STDERR_WRITE "Waiting for a write to the server's standard error stream."
+SYSLOGGER_WRITE "Waiting for a write to the syslogger pipe."
+SYSLOG_WRITE "Waiting for a write to the system logger (syslog)."
TIMELINE_HISTORY_FILE_SYNC "Waiting for a timeline history file received via streaming replication to reach durable storage."
TIMELINE_HISTORY_FILE_WRITE "Waiting for a write of a timeline history file received via streaming replication."
TIMELINE_HISTORY_READ "Waiting for a read of a timeline history file."
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index aa530d3685e..22d2cf8e79f 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -84,6 +84,7 @@
#include "utils/memutils.h"
#include "utils/ps_status.h"
#include "utils/varlena.h"
+#include "utils/wait_event.h"
/* In this module, access gettext() via err_gettext() */
@@ -2466,10 +2467,12 @@ write_syslog(int level, const char *line)
chunk_nr++;
+ pgstat_report_wait_start(WAIT_EVENT_SYSLOG_WRITE);
if (syslog_sequence_numbers)
syslog(level, "[%lu-%d] %s", seq, chunk_nr, buf);
else
syslog(level, "[%d] %s", chunk_nr, buf);
+ pgstat_report_wait_end();
line += buflen;
len -= buflen;
@@ -2478,10 +2481,12 @@ write_syslog(int level, const char *line)
else
{
/* message short enough */
+ pgstat_report_wait_start(WAIT_EVENT_SYSLOG_WRITE);
if (syslog_sequence_numbers)
syslog(level, "[%lu] %s", seq, line);
else
syslog(level, "%s", line);
+ pgstat_report_wait_end();
}
}
#endif /* HAVE_SYSLOG */
@@ -2662,7 +2667,9 @@ write_console(const char *line, int len)
* We ignore any error from write() here. We have no useful way to report
* it ... certainly whining on stderr isn't likely to be productive.
*/
+ pgstat_report_wait_start(WAIT_EVENT_STDERR_WRITE);
rc = write(fileno(stderr), line, len);
+ pgstat_report_wait_end();
(void) rc;
}
@@ -3503,7 +3510,9 @@ write_pipe_chunks(char *data, int len, int dest)
/* no need to set PIPE_PROTO_IS_LAST yet */
p.proto.len = PIPE_MAX_PAYLOAD;
memcpy(p.proto.data, data, PIPE_MAX_PAYLOAD);
+ pgstat_report_wait_start(WAIT_EVENT_SYSLOGGER_WRITE);
rc = write(fd, &p, PIPE_HEADER_SIZE + PIPE_MAX_PAYLOAD);
+ pgstat_report_wait_end();
(void) rc;
data += PIPE_MAX_PAYLOAD;
len -= PIPE_MAX_PAYLOAD;
@@ -3513,7 +3522,9 @@ write_pipe_chunks(char *data, int len, int dest)
p.proto.flags |= PIPE_PROTO_IS_LAST;
p.proto.len = len;
memcpy(p.proto.data, data, len);
+ pgstat_report_wait_start(WAIT_EVENT_SYSLOGGER_WRITE);
rc = write(fd, &p, PIPE_HEADER_SIZE + len);
+ pgstat_report_wait_end();
(void) rc;
}
--
2.50.1 (Apple Git-155)
[application/octet-stream] v3-0002-Add-wait-events-for-Windows-specific-logging-outp.patch (3.5K, 3-v3-0002-Add-wait-events-for-Windows-specific-logging-outp.patch)
download | inline diff:
From 414d91c9ad35754a6dbdeb7d459157aa53ca4f6a Mon Sep 17 00:00:00 2001
From: Seongjun Shin <[email protected]>
Date: Fri, 29 May 2026 14:52:11 +0900
Subject: [PATCH v3 2/2] Add wait events for Windows-specific logging output
paths
On Windows, log output goes through paths other than write(2) that can
also block: write_console() emits to the console with WriteConsoleW(),
and write_eventlog() writes to the Windows event log with
ReportEventW()/ReportEventA(). Like the write(2) path, these calls can
block and previously left wait_event IS NULL in pg_stat_activity.
Wrap WriteConsoleW() with the StderrWrite event introduced in the
previous patch, and add a new EventlogWrite event for the event log
path:
IO / EventlogWrite - ReportEventW()/ReportEventA() inside
write_eventlog().
This instruments the Windows logging paths consistently with the
write(2) and syslog(3) paths. It is split out from the previous patch
since it is platform-specific.
---
src/backend/utils/activity/wait_event_names.txt | 1 +
src/backend/utils/error/elog.c | 7 +++++++
2 files changed, 8 insertions(+)
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index fa528df03f1..59ec023dbf1 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -223,6 +223,7 @@ DATA_FILE_TRUNCATE "Waiting for a relation data file to be truncated."
DATA_FILE_WRITE "Waiting for a write to a relation data file."
DSM_ALLOCATE "Waiting for a dynamic shared memory segment to be allocated."
DSM_FILL_ZERO_WRITE "Waiting to fill a dynamic shared memory backing file with zeroes."
+EVENTLOG_WRITE "Waiting for a write to the Windows event log."
LOCK_FILE_ADDTODATADIR_READ "Waiting for a read while adding a line to the data directory lock file."
LOCK_FILE_ADDTODATADIR_SYNC "Waiting for data to reach durable storage while adding a line to the data directory lock file."
LOCK_FILE_ADDTODATADIR_WRITE "Waiting for a write while adding a line to the data directory lock file."
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 22d2cf8e79f..0679cabd245 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -2574,6 +2574,7 @@ write_eventlog(int level, const char *line, int len)
utf16 = pgwin32_message_to_UTF16(line, len, NULL);
if (utf16)
{
+ pgstat_report_wait_start(WAIT_EVENT_EVENTLOG_WRITE);
ReportEventW(evtHandle,
eventlevel,
0,
@@ -2583,12 +2584,14 @@ write_eventlog(int level, const char *line, int len)
0,
(LPCWSTR *) &utf16,
NULL);
+ pgstat_report_wait_end();
/* XXX Try ReportEventA() when ReportEventW() fails? */
pfree(utf16);
return;
}
}
+ pgstat_report_wait_start(WAIT_EVENT_EVENTLOG_WRITE);
ReportEventA(evtHandle,
eventlevel,
0,
@@ -2598,6 +2601,7 @@ write_eventlog(int level, const char *line, int len)
0,
&line,
NULL);
+ pgstat_report_wait_end();
}
#endif /* WIN32 */
@@ -2639,11 +2643,14 @@ write_console(const char *line, int len)
DWORD written;
stdHandle = GetStdHandle(STD_ERROR_HANDLE);
+ pgstat_report_wait_start(WAIT_EVENT_STDERR_WRITE);
if (WriteConsoleW(stdHandle, utf16, utf16len, &written, NULL))
{
+ pgstat_report_wait_end();
pfree(utf16);
return;
}
+ pgstat_report_wait_end();
/*
* In case WriteConsoleW() failed, fall back to writing the
--
2.50.1 (Apple Git-155)
^ permalink raw reply [nested|flat] 9+ messages in thread
* Re: Add wait events for server logging destination writes
@ 2026-06-06 16:52 신성준 <[email protected]>
parent: 신성준 <[email protected]>
0 siblings, 1 reply; 9+ messages in thread
From: 신성준 @ 2026-06-06 16:52 UTC (permalink / raw)
To: [email protected]; +Cc: [email protected]; Kyotaro Horiguchi <[email protected]>; [email protected]; [email protected]; [email protected]
Hi,
cfbot flagged v3 as needing a rebase -- it stopped applying after the
recent changes to elog.c (the switch to a const WCHAR pointer in
write_eventlog(), pgindent, etc.) and the new COPY pipe wait events in
wait_event_names.txt.
v4 is the same change rebased over current master, no functional
difference from v3. The only real conflict was in write_eventlog(),
where the EventlogWrite wrapping now sits on top of the const
utf16_const pointer; everything else merged cleanly.
Still applies as two patches:
v4-0001 - portable part (SysloggerWrite, StderrWrite, SyslogWrite)
v4-0002 - Windows part (WriteConsoleW plus EventlogWrite)
Builds clean on current master with both Autoconf and Meson, no new
warnings.
Thanks,
Seongjun Shin
2026년 6월 7일 (일) 오전 1:25, 신성준 <[email protected]>님이 작성:
>
> Hi,
>
> Thanks Kirk, glad it's useful.
>
> Kyotaro Horiguchi wrote:
> > Should we also consider instrumenting ReportEventW()/ReportEventA()?
> > They seem to be another Windows-specific logging output path.
> >
> > Also, if the intention is to cover all places where logging output
> > can block, I wonder whether the syslog() calls should be covered as
> > well.
>
> Good points -- both are blocking output paths and there's no real
> reason to leave them out, so v3 instruments them rather than excluding
> them with a comment. The intent is exactly to cover the places where
> logging output can block, so this makes the series consistent.
>
> v3 adds two more WaitEventIO events:
>
> IO / SyslogWrite - syslog(3) in write_syslog()
> IO / EventlogWrite - ReportEventW()/ReportEventA() in write_eventlog()
>
> Same approach as before: the wait is reported only around the leaf
> call, using the existing pgstat_report_wait_start()/end() helpers, so
> it stays allocation-free and safe on the error-reporting path, and the
> series still touches just elog.c and wait_event_names.txt.
>
> This also matches Michael's point on v2 -- each event covers a routine
> rather than a single call site, so SyslogWrite wraps the syslog(3)
> calls in write_syslog() and EventlogWrite wraps both ReportEvent
> variants in write_eventlog(), the same way SysloggerWrite already
> covers the two writes in write_pipe_chunks().
>
> As before, 0001 is the portable part and 0002 is the Windows part
> (WriteConsoleW plus the new EventlogWrite).
>
> One caveat: EventlogWrite is Windows-only, so I couldn't get a runtime
> before/after for it here -- I've only confirmed it builds (cfbot's
> Windows task should cover that). The other events still show up in the
> sampling I posted earlier. If someone on Windows can exercise the
> event-log path I'd appreciate a confirmation.
>
> Applies cleanly on current master; full build passes locally on both
> Autoconf and Meson, with no new warnings.
>
> Thanks,
> Seongjun Shin
>
> 2026년 6월 1일 (월) 오후 2:49, Kyotaro Horiguchi <[email protected]>님이 작성:
> >
> > Hello.
> >
> > At Sun, 31 May 2026 17:50:08 +0900, 신성준 <[email protected]> wrote in
> > > Attached is a short series that adds two WaitEventIO events and reports
> > > them around those writes:
> > >
> > > IO / SysloggerWrite - write(2) to the syslogger pipe
> > > IO / StderrWrite - write(2) to stderr, and WriteConsoleW()
> > >
> > > 0001 adds the events and covers the write(2) paths. 0002 does the
> > > Windows WriteConsoleW() path, split out since it's platform-specific.
> >
> > Should we also consider instrumenting ReportEventW()/ReportEventA()?
> > They seem to be another Windows-specific logging output path.
> >
> > Also, if the intention is to cover all places where logging output can
> > block, I wonder whether the syslog() calls should be covered as
> > well. If they are intentionally excluded, perhaps a short comment
> > explaining the rationale would be useful.
> >
> > Regards.
> >
> > --
> > Kyotaro Horiguchi
> > NTT Open Source Software Center
Attachments:
[application/octet-stream] v4-0001-Add-wait-events-for-server-logging-destination-wr.patch (4.6K, 2-v4-0001-Add-wait-events-for-server-logging-destination-wr.patch)
download | inline diff:
From cf7542da3afb01f0eb61031a28c7c8d44f3dc04d Mon Sep 17 00:00:00 2001
From: Seongjun Shin <[email protected]>
Date: Fri, 29 May 2026 14:45:23 +0900
Subject: [PATCH v4 1/2] Add wait events for server logging destination writes
When a backend writes server log output, the underlying call can
block: write(2) to the syslogger pipe or to stderr once the pipe
buffer fills up or the output device is slow, and syslog(3) when the
system logger is slow. These blocking calls were not instrumented, so
pg_stat_activity reported wait_event IS NULL during that time. Many
monitoring tools interpret NULL as on-CPU work, which made
heavy-logging stalls hard to attribute.
Add three new WaitEventIO events and report them around the relevant
calls:
IO / SysloggerWrite - write(2) to the syslogger pipe inside
write_pipe_chunks().
IO / StderrWrite - write(2) to stderr inside write_console().
IO / SyslogWrite - syslog(3) inside write_syslog().
The instrumentation is limited to the leaf write/syslog call. It uses
only the existing pgstat_report_wait_start()/end() inline helpers,
which are allocation-free and safe to call before MyProc is set up, so
this remains safe to invoke from within error reporting paths.
---
src/backend/utils/activity/wait_event_names.txt | 3 +++
src/backend/utils/error/elog.c | 11 +++++++++++
2 files changed, 14 insertions(+)
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 560659f9568..d4a0f6361ad 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -258,6 +258,9 @@ SLRU_WRITE "Waiting for a write of an SLRU page."
SNAPBUILD_READ "Waiting for a read of a serialized historical catalog snapshot."
SNAPBUILD_SYNC "Waiting for a serialized historical catalog snapshot to reach durable storage."
SNAPBUILD_WRITE "Waiting for a write of a serialized historical catalog snapshot."
+STDERR_WRITE "Waiting for a write to the server's standard error stream."
+SYSLOGGER_WRITE "Waiting for a write to the syslogger pipe."
+SYSLOG_WRITE "Waiting for a write to the system logger (syslog)."
TIMELINE_HISTORY_FILE_SYNC "Waiting for a timeline history file received via streaming replication to reach durable storage."
TIMELINE_HISTORY_FILE_WRITE "Waiting for a write of a timeline history file received via streaming replication."
TIMELINE_HISTORY_READ "Waiting for a read of a timeline history file."
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 50c53b571a0..8d659b096f4 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -89,6 +89,7 @@
#include "utils/pg_locale.h"
#include "utils/ps_status.h"
#include "utils/varlena.h"
+#include "utils/wait_event.h"
/* In this module, access gettext() via err_gettext() */
@@ -2890,10 +2891,12 @@ write_syslog(int level, const char *line)
chunk_nr++;
+ pgstat_report_wait_start(WAIT_EVENT_SYSLOG_WRITE);
if (syslog_sequence_numbers)
syslog(level, "[%lu-%d] %s", seq, chunk_nr, buf);
else
syslog(level, "[%d] %s", chunk_nr, buf);
+ pgstat_report_wait_end();
line += buflen;
len -= buflen;
@@ -2902,10 +2905,12 @@ write_syslog(int level, const char *line)
else
{
/* message short enough */
+ pgstat_report_wait_start(WAIT_EVENT_SYSLOG_WRITE);
if (syslog_sequence_numbers)
syslog(level, "[%lu] %s", seq, line);
else
syslog(level, "%s", line);
+ pgstat_report_wait_end();
}
}
#endif /* HAVE_SYSLOG */
@@ -3090,7 +3095,9 @@ write_console(const char *line, int len)
* We ignore any error from write() here. We have no useful way to report
* it ... certainly whining on stderr isn't likely to be productive.
*/
+ pgstat_report_wait_start(WAIT_EVENT_STDERR_WRITE);
rc = write(fileno(stderr), line, len);
+ pgstat_report_wait_end();
(void) rc;
}
@@ -3937,7 +3944,9 @@ write_pipe_chunks(char *data, int len, int dest)
/* no need to set PIPE_PROTO_IS_LAST yet */
p.proto.len = PIPE_MAX_PAYLOAD;
memcpy(p.proto.data, data, PIPE_MAX_PAYLOAD);
+ pgstat_report_wait_start(WAIT_EVENT_SYSLOGGER_WRITE);
rc = write(fd, &p, PIPE_HEADER_SIZE + PIPE_MAX_PAYLOAD);
+ pgstat_report_wait_end();
(void) rc;
data += PIPE_MAX_PAYLOAD;
len -= PIPE_MAX_PAYLOAD;
@@ -3947,7 +3956,9 @@ write_pipe_chunks(char *data, int len, int dest)
p.proto.flags |= PIPE_PROTO_IS_LAST;
p.proto.len = len;
memcpy(p.proto.data, data, len);
+ pgstat_report_wait_start(WAIT_EVENT_SYSLOGGER_WRITE);
rc = write(fd, &p, PIPE_HEADER_SIZE + len);
+ pgstat_report_wait_end();
(void) rc;
}
--
2.50.1 (Apple Git-155)
[application/octet-stream] v4-0002-Add-wait-events-for-Windows-specific-logging-outp.patch (3.5K, 3-v4-0002-Add-wait-events-for-Windows-specific-logging-outp.patch)
download | inline diff:
From fed7ad64e8113132e86d045917bb18b69c843aa6 Mon Sep 17 00:00:00 2001
From: Seongjun Shin <[email protected]>
Date: Fri, 29 May 2026 14:52:11 +0900
Subject: [PATCH v4 2/2] Add wait events for Windows-specific logging output
paths
On Windows, log output goes through paths other than write(2) that can
also block: write_console() emits to the console with WriteConsoleW(),
and write_eventlog() writes to the Windows event log with
ReportEventW()/ReportEventA(). Like the write(2) path, these calls can
block and previously left wait_event IS NULL in pg_stat_activity.
Wrap WriteConsoleW() with the StderrWrite event introduced in the
previous patch, and add a new EventlogWrite event for the event log
path:
IO / EventlogWrite - ReportEventW()/ReportEventA() inside
write_eventlog().
This instruments the Windows logging paths consistently with the
write(2) and syslog(3) paths. It is split out from the previous patch
since it is platform-specific.
---
src/backend/utils/activity/wait_event_names.txt | 1 +
src/backend/utils/error/elog.c | 7 +++++++
2 files changed, 8 insertions(+)
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index d4a0f6361ad..bcfe9049734 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -228,6 +228,7 @@ DATA_FILE_TRUNCATE "Waiting for a relation data file to be truncated."
DATA_FILE_WRITE "Waiting for a write to a relation data file."
DSM_ALLOCATE "Waiting for a dynamic shared memory segment to be allocated."
DSM_FILL_ZERO_WRITE "Waiting to fill a dynamic shared memory backing file with zeroes."
+EVENTLOG_WRITE "Waiting for a write to the Windows event log."
LOCK_FILE_ADDTODATADIR_READ "Waiting for a read while adding a line to the data directory lock file."
LOCK_FILE_ADDTODATADIR_SYNC "Waiting for data to reach durable storage while adding a line to the data directory lock file."
LOCK_FILE_ADDTODATADIR_WRITE "Waiting for a write while adding a line to the data directory lock file."
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 8d659b096f4..8b9a2de5b9c 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -3002,6 +3002,7 @@ write_eventlog(int level, const char *line, int len)
{
const WCHAR *utf16_const = utf16;
+ pgstat_report_wait_start(WAIT_EVENT_EVENTLOG_WRITE);
ReportEventW(evtHandle,
eventlevel,
0,
@@ -3011,12 +3012,14 @@ write_eventlog(int level, const char *line, int len)
0,
&utf16_const,
NULL);
+ pgstat_report_wait_end();
/* XXX Try ReportEventA() when ReportEventW() fails? */
pfree(utf16);
return;
}
}
+ pgstat_report_wait_start(WAIT_EVENT_EVENTLOG_WRITE);
ReportEventA(evtHandle,
eventlevel,
0,
@@ -3026,6 +3029,7 @@ write_eventlog(int level, const char *line, int len)
0,
&line,
NULL);
+ pgstat_report_wait_end();
}
#endif /* WIN32 */
@@ -3067,11 +3071,14 @@ write_console(const char *line, int len)
DWORD written;
stdHandle = GetStdHandle(STD_ERROR_HANDLE);
+ pgstat_report_wait_start(WAIT_EVENT_STDERR_WRITE);
if (WriteConsoleW(stdHandle, utf16, utf16len, &written, NULL))
{
+ pgstat_report_wait_end();
pfree(utf16);
return;
}
+ pgstat_report_wait_end();
/*
* In case WriteConsoleW() failed, fall back to writing the
--
2.50.1 (Apple Git-155)
^ permalink raw reply [nested|flat] 9+ messages in thread
* Re: Add wait events for server logging destination writes
@ 2026-06-14 18:48 Andrey Borodin <[email protected]>
parent: 신성준 <[email protected]>
0 siblings, 1 reply; 9+ messages in thread
From: Andrey Borodin @ 2026-06-14 18:48 UTC (permalink / raw)
To: 신성준 <[email protected]>; +Cc: pgsql-hackers mailing list <[email protected]>; Kirk Wolak <[email protected]>; Kyotaro Horiguchi <[email protected]>; Andreas Karlsson <[email protected]>; [email protected]
> On 6 Jun 2026, at 21:52, 신성준 <[email protected]> wrote:
>
Hi Seongjun,
Naming is a hard thing. Anyone who can tell syslog from syslogger
without checking man is probably a computer themselves.
+1: a blocking logging write shows as wait_event IS NULL today, which
most tools read as on-CPU. This is very real for us. In our managed
Postgres the root filesystem that holds the logs sits on slow
network-backed HDD, so when the syslogger pipe backs up the backends
stall on the log write and the whole node looks like a CPU overload.
This patch would have made that directly visible.
Worth a line in the commit message: the call that actually blocks on a
slow log device is write_syslogger_file() in the syslogger, which has no
shared memory and isn't in pg_stat_activity. It surfaces on backends as
the pipe fills and they block on the pipe write (SysloggerWrite), so
the backend side is the right, visible layer.
One thing worth checking: wait events are single-slot by design (no
nesting - discussed back in 2016 [0]), and pgstat_report_wait_end()
just writes 0. So if any ereport() path runs while a wait event is
already set, instrumenting the logging write here clobbers that outer
event. Probably rare, but I'm not sure it never happens.
EventlogWrite necessarily shows up in the event list on all platforms
(the generator has no per-platform gating), but the description already
says "Windows event log", so that seems fine...
Thanks for working on this!
Best regards, Andrey Borodin.
[0] https://www.postgresql.org/message-id/CANP8%2BjKsS6SDo011AUWrLdBcBMv0KJha69t7eFGqEtqx9FVfag%40mail.g...
^ permalink raw reply [nested|flat] 9+ messages in thread
* Re: Add wait events for server logging destination writes
@ 2026-06-16 16:10 신성준 <[email protected]>
parent: Andrey Borodin <[email protected]>
0 siblings, 0 replies; 9+ messages in thread
From: 신성준 @ 2026-06-16 16:10 UTC (permalink / raw)
To: Andrey Borodin <[email protected]>; +Cc: pgsql-hackers mailing list <[email protected]>; Kirk Wolak <[email protected]>; Kyotaro Horiguchi <[email protected]>; Andreas Karlsson <[email protected]>; [email protected]
Hi Andrey,
> +1: a blocking logging write shows as wait_event IS NULL today, which most
> tools read as on-CPU. This is very real for us. In our managed Postgres the
> root filesystem that holds the logs sits on slow network-backed HDD, so when
> the syslogger pipe backs up the backends stall on the log write and the whole
> node looks like a CPU overload. This patch would have made that directly
> visible.
Thanks for the review, and for the production example. That node-looks-like-a-
CPU-overload case is exactly what this is meant to make visible.
> Worth a line in the commit message: the call that actually blocks on a slow
> log device is write_syslogger_file() in the syslogger, which has no shared
> memory and isn't in pg_stat_activity. It surfaces on backends as the pipe
> fills and they block on the pipe write (SysloggerWrite), so the backend side
> is the right, visible layer.
Agreed. In v5 I'll add that to the commit message, and a short code comment to
the same effect: the syslogger does the file write in write_syslogger_file()
and isn't in pg_stat_activity, so the backend-side SysloggerWrite is where the
stall becomes visible.
> Naming is a hard thing. Anyone who can tell syslog from syslogger without
> checking man is probably a computer themselves.
Fair. I kept each name aligned with the routine it wraps, and the descriptions
spell out the difference, so I'd lean toward leaving them unless others object.
> One thing worth checking: wait events are single-slot by design (no nesting -
> discussed back in 2016 [0]), and pgstat_report_wait_end() just writes 0. So
> if any ereport() path runs while a wait event is already set, instrumenting
> the logging write here clobbers that outer event. Probably rare, but I'm not
> sure it never happens.
Right about the mechanism. pgstat_report_wait_end() unconditionally writes 0,
so a log message emitted while an outer wait event is set would briefly mask
that event until the outer code reports its own end.
Two things, though. First, this is a pre-existing property of the subsystem,
not something the patch introduces: every pgstat_report_wait_start()/end() pair
behaves this way, and as the 2016 thread you linked concluded, the single
unsynchronized write is what keeps the mechanism cheap enough to stay on by
default, so nesting was deliberately left out. The patch follows that
convention rather than diverging from it. Second, the wrapped regions are tight
and log output is normally emitted outside them, so the overlap window should
be rare in practice, though I agree it isn't provably never.
So I'd prefer not to add save/restore here, since that would diverge from what
the rest of the tree relies on. I'll add a brief comment at these sites noting
the single-slot behavior. If reviewers would rather have a guard, a minimal
option that stays within the single-write model is to report the logging event
only when the slot is currently 0, which preserves the outer event at the cost
of not showing the logging write while nested. Happy to go that way if that's
the consensus.
> EventlogWrite necessarily shows up in the event list on all platforms (the
> generator has no per-platform gating), but the description already says
> "Windows event log", so that seems fine...
Agreed, I'll leave it as-is.
I'll hold v5 until Nikolay's Linux and Windows testing comments land too, so I
can fold everything into one revision.
Thanks again,
Seongjun
2026년 6월 15일 (월) 오전 3:49, Andrey Borodin <[email protected]>님이 작성:
>
>
>
> > On 6 Jun 2026, at 21:52, 신성준 <[email protected]> wrote:
> >
>
> Hi Seongjun,
>
> Naming is a hard thing. Anyone who can tell syslog from syslogger
> without checking man is probably a computer themselves.
>
> +1: a blocking logging write shows as wait_event IS NULL today, which
> most tools read as on-CPU. This is very real for us. In our managed
> Postgres the root filesystem that holds the logs sits on slow
> network-backed HDD, so when the syslogger pipe backs up the backends
> stall on the log write and the whole node looks like a CPU overload.
> This patch would have made that directly visible.
>
> Worth a line in the commit message: the call that actually blocks on a
> slow log device is write_syslogger_file() in the syslogger, which has no
> shared memory and isn't in pg_stat_activity. It surfaces on backends as
> the pipe fills and they block on the pipe write (SysloggerWrite), so
> the backend side is the right, visible layer.
>
> One thing worth checking: wait events are single-slot by design (no
> nesting - discussed back in 2016 [0]), and pgstat_report_wait_end()
> just writes 0. So if any ereport() path runs while a wait event is
> already set, instrumenting the logging write here clobbers that outer
> event. Probably rare, but I'm not sure it never happens.
>
> EventlogWrite necessarily shows up in the event list on all platforms
> (the generator has no per-platform gating), but the description already
> says "Windows event log", so that seems fine...
>
> Thanks for working on this!
>
>
> Best regards, Andrey Borodin.
>
> [0] https://www.postgresql.org/message-id/CANP8%2BjKsS6SDo011AUWrLdBcBMv0KJha69t7eFGqEtqx9FVfag%40mail.g...
>
>
^ permalink raw reply [nested|flat] 9+ messages in thread
end of thread, other threads:[~2026-06-16 16:10 UTC | newest]
Thread overview: 9+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-05-31 08:50 Add wait events for server logging destination writes 신성준 <[email protected]>
2026-05-31 10:42 ` 신성준 <[email protected]>
2026-06-01 04:37 ` Michael Paquier <[email protected]>
2026-05-31 16:53 ` Kirk Wolak <[email protected]>
2026-06-01 05:49 ` Kyotaro Horiguchi <[email protected]>
2026-06-06 16:25 ` 신성준 <[email protected]>
2026-06-06 16:52 ` 신성준 <[email protected]>
2026-06-14 18:48 ` Andrey Borodin <[email protected]>
2026-06-16 16:10 ` 신성준 <[email protected]>
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox