public inbox for [email protected]  
help / color / mirror / Atom feed
From: 신성준 <[email protected]>
To: [email protected]
Cc: Kirk Wolak <[email protected]>
Cc: Andrey Borodin <[email protected]>
Cc: Andreas Karlsson <[email protected]>
Cc: Nikolay Samokhvalov <[email protected]>
Subject: Add wait events for server logging destination writes
Date: Sun, 31 May 2026 17:50:08 +0900
Message-ID: <CACdN0M78U+GvpqA7oey-GA7fFSYM636aDp6H9FVvCztv9zXxSA@mail.gmail.com> (raw)

Hi hackers,

The write(2) calls that flush server log output aren't covered by wait
events. When a backend logs something, the writes go out in:

  - write_pipe_chunks(): write(2) to the syslogger pipe
  - write_console(): write(2) to stderr (WriteConsoleW() on Windows)

If one of those blocks -- syslogger pipe full, slow console, slow log
device -- pg_stat_activity just shows wait_event = NULL until it
returns. Since NULL usually reads as "on CPU", a backend stuck writing
logs looks like it's doing work, so logging-related stalls are easy to
miss.

Attached is a short series that adds two WaitEventIO events and reports
them around those writes:

  IO / SysloggerWrite - write(2) to the syslogger pipe
  IO / StderrWrite - write(2) to stderr, and WriteConsoleW()

0001 adds the events and covers the write(2) paths. 0002 does the
Windows WriteConsoleW() path, split out since it's platform-specific.

It only wraps the leaf write call and uses the existing
pgstat_report_wait_start()/end() helpers, so it stays allocation-free
and safe to call from inside the error-reporting path.

I did a quick before/after to make sure the events show up: 8 backends
each emitting large RAISE LOG lines, sampling wait_event from
pg_stat_activity every 50 ms for 20 s.

  - logging_collector = on (syslogger pipe):
    master:  NULL                100.0%  (2184/2184)
    patched: IO/SysloggerWrite     99.1%  (2204/2224), NULL 0.9%

  - logging_collector = off (stderr):
    master:  NULL                100.0%  (2144/2144)
    patched: IO/StderrWrite        90.7%  (1952/2152), NULL 9.3%

On master that wait time is just invisible; with the patch it lands on
the new events. I can send the scripts and raw samples if anyone wants
to reproduce it.

Applies on current master. A couple of things I'm unsure about and
would appreciate input on: whether the event names fit the surrounding
conventions, and whether splitting the Windows path into its own patch
is the right call.

Thanks,
Seongjun Shin


Attachments:

  [application/octet-stream] v1-0001-Add-wait-events-for-server-logging-destination-wr.patch (3.6K, 2-v1-0001-Add-wait-events-for-server-logging-destination-wr.patch)
  download | inline diff:
From 9c5907b11e5f20f9424757f474f8b9cb8c3b4266 Mon Sep 17 00:00:00 2001
From: Seongjun Shin <[email protected]>
Date: Fri, 29 May 2026 14:45:23 +0900
Subject: [PATCH v1 1/2] Add wait events for server logging destination writes

When a backend writes to the syslogger pipe in write_pipe_chunks() or
to stderr in write_console(), the underlying write(2) can block once
the pipe buffer fills up or the output device is slow.  These blocking
syscalls were not instrumented, so pg_stat_activity reported
wait_event IS NULL during that time.  Many monitoring tools interpret
NULL as on-CPU work, which made heavy-logging stalls hard to
attribute.

Add two new WaitEventIO events and report them around the relevant
write(2) calls:

  IO / SysloggerWrite - write(2) to the syslogger pipe inside
                        write_pipe_chunks().
  IO / StderrWrite    - write(2) to stderr inside write_console().

The instrumentation is limited to the leaf write call.  It uses only
the existing pgstat_report_wait_start()/end() inline helpers, which
are allocation-free and safe to call before MyProc is set up, so this
remains safe to invoke from within error reporting paths.
---
 src/backend/utils/activity/wait_event_names.txt | 2 ++
 src/backend/utils/error/elog.c                  | 6 ++++++
 2 files changed, 8 insertions(+)

diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 5537a2d2530..ce33807c3fe 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -253,6 +253,8 @@ SLRU_WRITE	"Waiting for a write of an SLRU page."
 SNAPBUILD_READ	"Waiting for a read of a serialized historical catalog snapshot."
 SNAPBUILD_SYNC	"Waiting for a serialized historical catalog snapshot to reach durable storage."
 SNAPBUILD_WRITE	"Waiting for a write of a serialized historical catalog snapshot."
+STDERR_WRITE	"Waiting for a write to the server's standard error stream."
+SYSLOGGER_WRITE	"Waiting for a write to the syslogger pipe."
 TIMELINE_HISTORY_FILE_SYNC	"Waiting for a timeline history file received via streaming replication to reach durable storage."
 TIMELINE_HISTORY_FILE_WRITE	"Waiting for a write of a timeline history file received via streaming replication."
 TIMELINE_HISTORY_READ	"Waiting for a read of a timeline history file."
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index aa530d3685e..fa38d6c6df8 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -2662,7 +2662,9 @@ write_console(const char *line, int len)
 	 * We ignore any error from write() here.  We have no useful way to report
 	 * it ... certainly whining on stderr isn't likely to be productive.
 	 */
+	pgstat_report_wait_start(WAIT_EVENT_STDERR_WRITE);
 	rc = write(fileno(stderr), line, len);
+	pgstat_report_wait_end();
 	(void) rc;
 }
 
@@ -3503,7 +3505,9 @@ write_pipe_chunks(char *data, int len, int dest)
 		/* no need to set PIPE_PROTO_IS_LAST yet */
 		p.proto.len = PIPE_MAX_PAYLOAD;
 		memcpy(p.proto.data, data, PIPE_MAX_PAYLOAD);
+		pgstat_report_wait_start(WAIT_EVENT_SYSLOGGER_WRITE);
 		rc = write(fd, &p, PIPE_HEADER_SIZE + PIPE_MAX_PAYLOAD);
+		pgstat_report_wait_end();
 		(void) rc;
 		data += PIPE_MAX_PAYLOAD;
 		len -= PIPE_MAX_PAYLOAD;
@@ -3513,7 +3517,9 @@ write_pipe_chunks(char *data, int len, int dest)
 	p.proto.flags |= PIPE_PROTO_IS_LAST;
 	p.proto.len = len;
 	memcpy(p.proto.data, data, len);
+	pgstat_report_wait_start(WAIT_EVENT_SYSLOGGER_WRITE);
 	rc = write(fd, &p, PIPE_HEADER_SIZE + len);
+	pgstat_report_wait_end();
 	(void) rc;
 }
 
-- 
2.50.1 (Apple Git-155)



  [application/octet-stream] v1-0002-Report-StderrWrite-wait-event-around-WriteConsole.patch (1.3K, 3-v1-0002-Report-StderrWrite-wait-event-around-WriteConsole.patch)
  download | inline diff:
From 4d67d15c56da804b541426f5583f98c8b77cb19f Mon Sep 17 00:00:00 2001
From: Seongjun Shin <[email protected]>
Date: Fri, 29 May 2026 14:52:11 +0900
Subject: [PATCH v1 2/2] Report StderrWrite wait event around WriteConsoleW()
 on Windows

On Windows, write_console() emits log messages to the console with
WriteConsoleW() rather than write(2).  Like the write(2) path, this
call can block on a slow console, leaving wait_event IS NULL in
pg_stat_activity.

Wrap WriteConsoleW() with the StderrWrite wait event introduced in
the previous patch, so the Windows console path is instrumented
consistently with the stderr write(2) path.
---
 src/backend/utils/error/elog.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index fa38d6c6df8..d19ec6a31e8 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -2634,11 +2634,14 @@ write_console(const char *line, int len)
 			DWORD		written;
 
 			stdHandle = GetStdHandle(STD_ERROR_HANDLE);
+			pgstat_report_wait_start(WAIT_EVENT_STDERR_WRITE);
 			if (WriteConsoleW(stdHandle, utf16, utf16len, &written, NULL))
 			{
+				pgstat_report_wait_end();
 				pfree(utf16);
 				return;
 			}
+			pgstat_report_wait_end();
 
 			/*
 			 * In case WriteConsoleW() failed, fall back to writing the
-- 
2.50.1 (Apple Git-155)



view thread (9+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Add wait events for server logging destination writes
  In-Reply-To: <CACdN0M78U+GvpqA7oey-GA7fFSYM636aDp6H9FVvCztv9zXxSA@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox