From: 신성준 <[email protected]>
To: [email protected]
Cc: Kirk Wolak <[email protected]>
Cc: Andrey Borodin <[email protected]>
Cc: Andreas Karlsson <[email protected]>
Cc: Nikolay Samokhvalov <[email protected]>
Subject: Add wait events for server logging destination writes
Date: Sun, 31 May 2026 17:50:08 +0900
Message-ID: <CACdN0M78U+GvpqA7oey-GA7fFSYM636aDp6H9FVvCztv9zXxSA@mail.gmail.com>
Hi hackers,
The write(2) calls that flush server log output aren't covered by wait
events. When a backend logs something, the writes are issued from:
- write_pipe_chunks(): write(2) to the syslogger pipe
- write_console(): write(2) to stderr (WriteConsoleW() on Windows)
If one of those blocks -- syslogger pipe full, slow console, slow log
device -- pg_stat_activity just shows wait_event = NULL until it
returns. Since NULL usually reads as "on CPU", a backend stuck writing
logs looks like it's doing work, so logging-related stalls are easy to
miss.
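A minimal standalone sketch of the blocking condition described above, using an ordinary pipe(2) with no reader rather than the real syslogger pipe. It switches the write end to O_NONBLOCK so the "would block" state shows up as EAGAIN instead of hanging; in a blocking backend the same write(2) would simply not return until the reader drains the pipe. The helper name fill_until_would_block() is hypothetical and not part of PostgreSQL.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: put fd into non-blocking mode and write fixed-size
 * chunks until the kernel refuses more data.  Returns the number of bytes
 * the pipe accepted, or -1 on an unexpected error.
 */
static long
fill_until_would_block(int fd)
{
    char    chunk[512];     /* small enough to be written atomically */
    long    total = 0;
    int     flags = fcntl(fd, F_GETFL);

    assert(fd >= 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    memset(chunk, 'x', sizeof(chunk));
    for (;;)
    {
        ssize_t rc = write(fd, chunk, sizeof(chunk));

        if (rc > 0)
        {
            total += rc;
            continue;
        }
        if (rc < 0 && errno == EINTR)
            continue;       /* interrupted, just retry */
        if (rc < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return total;   /* pipe is full: a blocking write would stall here */
        return -1;
    }
}
```

Calling it on the write end of a fresh pipe returns the pipe's usable capacity on that platform; any further write of the same size fails with EAGAIN until something reads from the other end.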
Attached is a short series that adds two WaitEventIO events and reports
them around those writes:
  IO / SysloggerWrite - write(2) to the syslogger pipe
  IO / StderrWrite    - write(2) to stderr, and WriteConsoleW()
0001 adds the events and covers the write(2) paths. 0002 does the
Windows WriteConsoleW() path, split out since it's platform-specific.
The series only wraps the leaf write calls and uses the existing
pgstat_report_wait_start()/end() helpers, so the instrumentation stays
allocation-free and safe to call from inside the error-reporting path.
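The wrapping pattern can be sketched outside the server as follows. This is an illustration only: demo_wait_start(), demo_wait_end(), the demo_* variables and the DEMO_WAIT_EVENT_* values are hypothetical stand-ins for the real helpers and event IDs. The stand-ins are plain stores to a variable, which is the property that keeps this kind of instrumentation allocation-free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical event IDs; the values are arbitrary, not PostgreSQL's. */
#define DEMO_WAIT_EVENT_NONE         0u
#define DEMO_WAIT_EVENT_STDERR_WRITE 0x0A000001u

/* What a monitoring session would sample while the write is in progress. */
static volatile uint32_t demo_wait_event = DEMO_WAIT_EVENT_NONE;

/* Kept only so the sketch can be inspected after the write returns. */
static uint32_t demo_last_reported = DEMO_WAIT_EVENT_NONE;

static inline void
demo_wait_start(uint32_t event)
{
    demo_wait_event = event;        /* a plain store: no allocation, no locks */
    demo_last_reported = event;
}

static inline void
demo_wait_end(void)
{
    demo_wait_event = DEMO_WAIT_EVENT_NONE;
}

/*
 * Wrap only the leaf write call, so the event is visible exactly while the
 * syscall may block and is cleared again as soon as it returns.
 */
static ssize_t
instrumented_write(int fd, const void *buf, size_t len)
{
    ssize_t rc;

    assert(buf != NULL || len == 0);
    demo_wait_start(DEMO_WAIT_EVENT_STDERR_WRITE);
    rc = write(fd, buf, len);
    demo_wait_end();
    return rc;
}
```

Keeping the start/end pair tight around write(2) means time spent formatting the message is still reported as no wait, and only the potentially blocking syscall is attributed to the event.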
I did a quick before/after to make sure the events show up: 8 backends
each emitting large RAISE LOG lines, sampling wait_event from
pg_stat_activity every 50 ms for 20 s.
- logging_collector = on (syslogger pipe):
    master:  NULL 100.0% (2184/2184)
    patched: IO/SysloggerWrite 99.1% (2204/2224), NULL 0.9%
- logging_collector = off (stderr):
    master:  NULL 100.0% (2144/2144)
    patched: IO/StderrWrite 90.7% (1952/2152), NULL 9.3%
On master that wait time is just invisible; with the patch it lands on
the new events. I can send the scripts and raw samples if anyone wants
to reproduce it.
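The percentages quoted in the results are hits divided by total samples for each run. A small helper reproduces them from the counts given in parentheses; the name sample_share_pct() is hypothetical and this is not one of the measurement scripts.

```c
#include <assert.h>

/*
 * Share of pg_stat_activity samples that showed a given wait_event value,
 * as a percentage of all samples taken in that run.
 */
static double
sample_share_pct(long hits, long total)
{
    assert(total > 0 && hits >= 0 && hits <= total);
    return 100.0 * (double) hits / (double) total;
}
```

For the patched runs, sample_share_pct(2204, 2224) and sample_share_pct(1952, 2152) round to 99.1 and 90.7, and the remaining 20 and 200 samples account for the 0.9% and 9.3% NULL shares.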
Applies on current master. A couple of things I'm unsure about and
would appreciate input on: whether the event names fit the surrounding
conventions, and whether splitting the Windows path into its own patch
is the right call.
Thanks,
Seongjun Shin
Attachments:
[application/octet-stream] v1-0001-Add-wait-events-for-server-logging-destination-wr.patch (3.6K, 2-v1-0001-Add-wait-events-for-server-logging-destination-wr.patch)
From 9c5907b11e5f20f9424757f474f8b9cb8c3b4266 Mon Sep 17 00:00:00 2001
From: Seongjun Shin <[email protected]>
Date: Fri, 29 May 2026 14:45:23 +0900
Subject: [PATCH v1 1/2] Add wait events for server logging destination writes
When a backend writes to the syslogger pipe in write_pipe_chunks() or
to stderr in write_console(), the underlying write(2) can block once
the pipe buffer fills up or the output device is slow. These blocking
syscalls were not instrumented, so pg_stat_activity reported
wait_event IS NULL during that time. Many monitoring tools interpret
NULL as on-CPU work, which made heavy-logging stalls hard to
attribute.
Add two new WaitEventIO events and report them around the relevant
write(2) calls:
IO / SysloggerWrite - write(2) to the syslogger pipe inside
write_pipe_chunks().
IO / StderrWrite - write(2) to stderr inside write_console().
The instrumentation is limited to the leaf write call. It uses only
the existing pgstat_report_wait_start()/end() inline helpers, which
are allocation-free and safe to call before MyProc is set up, so this
remains safe to invoke from within error reporting paths.
---
src/backend/utils/activity/wait_event_names.txt | 2 ++
src/backend/utils/error/elog.c | 6 ++++++
2 files changed, 8 insertions(+)
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 5537a2d2530..ce33807c3fe 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -253,6 +253,8 @@ SLRU_WRITE "Waiting for a write of an SLRU page."
SNAPBUILD_READ "Waiting for a read of a serialized historical catalog snapshot."
SNAPBUILD_SYNC "Waiting for a serialized historical catalog snapshot to reach durable storage."
SNAPBUILD_WRITE "Waiting for a write of a serialized historical catalog snapshot."
+STDERR_WRITE "Waiting for a write to the server's standard error stream."
+SYSLOGGER_WRITE "Waiting for a write to the syslogger pipe."
TIMELINE_HISTORY_FILE_SYNC "Waiting for a timeline history file received via streaming replication to reach durable storage."
TIMELINE_HISTORY_FILE_WRITE "Waiting for a write of a timeline history file received via streaming replication."
TIMELINE_HISTORY_READ "Waiting for a read of a timeline history file."
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index aa530d3685e..fa38d6c6df8 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -2662,7 +2662,9 @@ write_console(const char *line, int len)
* We ignore any error from write() here. We have no useful way to report
* it ... certainly whining on stderr isn't likely to be productive.
*/
+ pgstat_report_wait_start(WAIT_EVENT_STDERR_WRITE);
rc = write(fileno(stderr), line, len);
+ pgstat_report_wait_end();
(void) rc;
}
@@ -3503,7 +3505,9 @@ write_pipe_chunks(char *data, int len, int dest)
/* no need to set PIPE_PROTO_IS_LAST yet */
p.proto.len = PIPE_MAX_PAYLOAD;
memcpy(p.proto.data, data, PIPE_MAX_PAYLOAD);
+ pgstat_report_wait_start(WAIT_EVENT_SYSLOGGER_WRITE);
rc = write(fd, &p, PIPE_HEADER_SIZE + PIPE_MAX_PAYLOAD);
+ pgstat_report_wait_end();
(void) rc;
data += PIPE_MAX_PAYLOAD;
len -= PIPE_MAX_PAYLOAD;
@@ -3513,7 +3517,9 @@ write_pipe_chunks(char *data, int len, int dest)
p.proto.flags |= PIPE_PROTO_IS_LAST;
p.proto.len = len;
memcpy(p.proto.data, data, len);
+ pgstat_report_wait_start(WAIT_EVENT_SYSLOGGER_WRITE);
rc = write(fd, &p, PIPE_HEADER_SIZE + len);
+ pgstat_report_wait_end();
(void) rc;
}
--
2.50.1 (Apple Git-155)
[application/octet-stream] v1-0002-Report-StderrWrite-wait-event-around-WriteConsole.patch (1.3K, 3-v1-0002-Report-StderrWrite-wait-event-around-WriteConsole.patch)
From 4d67d15c56da804b541426f5583f98c8b77cb19f Mon Sep 17 00:00:00 2001
From: Seongjun Shin <[email protected]>
Date: Fri, 29 May 2026 14:52:11 +0900
Subject: [PATCH v1 2/2] Report StderrWrite wait event around WriteConsoleW()
on Windows
On Windows, write_console() emits log messages to the console with
WriteConsoleW() rather than write(2). Like the write(2) path, this
call can block on a slow console, leaving wait_event IS NULL in
pg_stat_activity.
Wrap WriteConsoleW() with the StderrWrite wait event introduced in
the previous patch, so the Windows console path is instrumented
consistently with the stderr write(2) path.
---
src/backend/utils/error/elog.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index fa38d6c6df8..d19ec6a31e8 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -2634,11 +2634,14 @@ write_console(const char *line, int len)
DWORD written;
stdHandle = GetStdHandle(STD_ERROR_HANDLE);
+ pgstat_report_wait_start(WAIT_EVENT_STDERR_WRITE);
if (WriteConsoleW(stdHandle, utf16, utf16len, &written, NULL))
{
+ pgstat_report_wait_end();
pfree(utf16);
return;
}
+ pgstat_report_wait_end();
/*
* In case WriteConsoleW() failed, fall back to writing the
--
2.50.1 (Apple Git-155)