public inbox for [email protected]  
help / color / mirror / Atom feed
From: Ayush Tiwari <[email protected]>
To: Heikki Linnakangas <[email protected]>
Cc: [email protected]
Cc: [email protected] <[email protected]>
Subject: Re: [PATCH] postmaster: fix stale PM_STARTUP comment
Date: Fri, 17 Apr 2026 19:17:18 +0530
Message-ID: <CAJTYsWX=EkwTvA15=T8dE1c4iVoapTT5=nkrByd2h-18NU8hZA@mail.gmail.com> (raw)
In-Reply-To: <CAJTYsWVw3h1hTDyUdaDb8MmmXF0qQKzV3YkQUwC1DhvcApFvTw@mail.gmail.com>
References: <CAJTYsWVoD3V9yhhqSae1_wqcnTdpFY-hDT7dPm5005ZFsL_bpA@mail.gmail.com>
	<[email protected]>
	<CAJTYsWVw3h1hTDyUdaDb8MmmXF0qQKzV3YkQUwC1DhvcApFvTw@mail.gmail.com>

Hi,

On Wed, 15 Apr 2026 at 23:31, Ayush Tiwari <[email protected]>
wrote:

> Hi
>
> On Wed, 15 Apr 2026 at 22:21, Heikki Linnakangas <[email protected]> wrote:
>
>> On 15/04/2026 16:57, Ayush Tiwari wrote:
>> > Hi,
>> >
>> > The comment above the PM_STARTUP startup-process-failure case still says
>> > that there are no other processes running yet, so the postmaster can
>> just
>> > exit.
>> >
>> > That no longer matches the current startup flow: PM_STARTUP may already
>> > have auxiliary processes running by that point. The attached patch
>> updates
>> > that comment to describe the current behavior.
>>
>> Hmm, shouldn't the postmaster kill and wait for the auxiliary processes
>> to exit first in that case? ISTM we need code changes here, not just
>> comments.
>>
>> - Heikki
>>
>>
> Yes, I agree, code change is required here.
>
> The proper thing is to
> route this through the existing crash-handling path so the postmaster
> SIGQUITs the aux children and waits for them to exit before terminating.
>
> I think the minimal change is:
>
>   1. Replace the ExitPostmaster(1) shortcut in the PM_STARTUP
>      startup-failure case with HandleChildCrash(), which calls
>      TerminateChildren(SIGQUIT) and transitions through the state
>      machine.  Set StartupStatus = STARTUP_CRASHED so the state
>      machine does not try to reinitialize.
>
>   2. Let HandleFatalError() handle PM_STARTUP by transitioning to
>      PM_WAIT_BACKENDS, instead of the current Assert(false).
>
>
The minimal fix turned out to be smaller than I first described, the
existing paragraph immediately below the ExitPostmaster(1) block already
handles !EXIT_STATUS_0 with StartupStatus != STARTUP_SIGNALED correctly
(sets STARTUP_CRASHED and HandleChildCrash). So, likely fix would be:

1. Deleting the PM_STARTUP ExitPostmaster(1) shortcut, and letting
execution fall through to the next stanza.

2. Replacing the Assert(false) for PM_STARTUP in HandleFatalError() with a
fall-through to UpdatePMState(PM_WAIT_BACKENDS).

Verification that I did for patch:

On a fresh initdb'd cluster, I zeroed out the first WAL segment to force
the startup process to FATAL at StartupXLOG, then ran PG in foreground
under strace.

Before (master):
  LOG:  startup process (PID N) exited with exit code 1
  LOG:  aborting startup due to startup process failure
  LOG:  database system is shut down

  strace of the postmaster PID shows 0 kill() calls to children before
  exit_group(1). Checkpointer, bgwriter and io workers were running at
  the time of the failure and were orphaned.


After (patched):
  LOG:  startup process (PID N) exited with exit code 1
  LOG:  terminating any other active server processes
<state transitions PM_STARTUP -> PM_WAIT_BACKENDS -> PM_WAIT_DEAD_END
   -> PM_NO_CHILDREN>
  LOG:  shutting down due to startup process failure
  LOG:  database system is shut down

  strace shows 8 SIGQUIT deliveries (4 children, each signaled by PID
  and by process-group) before the postmaster's own exit_group(1).

I've attached a patch, please review and let me know your thoughts.

Regards,
Ayush


Attachments:

  [application/octet-stream] 0001-postmaster-drain-aux-processes-on-startup-process-fa.patch (3.2K, 3-0001-postmaster-drain-aux-processes-on-startup-process-fa.patch)
  download | inline diff:
From 6267a416cc773cc88fe17322908961128764c254 Mon Sep 17 00:00:00 2001
From: Ayush Tiwari <[email protected]>
Date: Fri, 17 Apr 2026 18:49:34 +0530
Subject: [PATCH] postmaster: drain aux processes on startup-process failure
 during PM_STARTUP

When the startup process exits with a non-zero status during PM_STARTUP,
the postmaster called ExitPostmaster(1) immediately. But by the time
PM_STARTUP is active, checkpointer, bgwriter, io workers, and
BgWorkerStart_PostmasterStart background workers may already be running.
Exiting immediately orphaned them.

Route this path through the existing crash-handling machinery: fall
through to the following stanza which sets StartupStatus = STARTUP_CRASHED
and calls HandleChildCrash(), causing HandleFatalError() to SIGQUIT the
aux children and transition to PM_WAIT_BACKENDS. The state machine then
drains through PM_WAIT_DEAD_END to PM_NO_CHILDREN, where the existing
STARTUP_CRASHED check logs 'shutting down due to startup process failure'
and calls ExitPostmaster(1).

Also replace the Assert(false) for PM_STARTUP in HandleFatalError() with
a transition to PM_WAIT_BACKENDS. The assert was a latent bug: any aux
process crash during PM_STARTUP (not just startup-process failure) would
reach it via HandleChildCrash -> HandleFatalError.
---
 src/backend/postmaster/postmaster.c | 29 +++++------------------------
 1 file changed, 5 insertions(+), 24 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index b6fd332f196..01df0f634e3 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2305,26 +2305,10 @@ process_pm_child_exit(void)
 			}
 
 			/*
-			 * Unexpected exit of startup process (including FATAL exit)
-			 * during PM_STARTUP is treated as catastrophic. There are no
-			 * other processes running yet, so we can just exit.
-			 */
-			if (pmState == PM_STARTUP &&
-				StartupStatus != STARTUP_SIGNALED &&
-				!EXIT_STATUS_0(exitstatus))
-			{
-				LogChildExit(LOG, _("startup process"),
-							 pid, exitstatus);
-				ereport(LOG,
-						(errmsg("aborting startup due to startup process failure")));
-				ExitPostmaster(1);
-			}
-
-			/*
-			 * After PM_STARTUP, any unexpected exit (including FATAL exit) of
-			 * the startup process is catastrophic, so kill other children,
-			 * and set StartupStatus so we don't try to reinitialize after
-			 * they're gone.  Exception: if StartupStatus is STARTUP_SIGNALED,
+			 * Any unexpected exit (including FATAL exit) of the startup
+			 * process is catastrophic, so kill other children, and set
+			 * StartupStatus so we don't try to reinitialize after they're
+			 * gone.  Exception: if StartupStatus is STARTUP_SIGNALED,
 			 * then we previously sent the startup process a SIGQUIT; so
 			 * that's probably the reason it died, and we do want to try to
 			 * restart in that case.
@@ -2780,12 +2764,9 @@ HandleFatalError(QuitSignalReason reason, bool consider_sigabrt)
 			/* shouldn't have any children */
 			Assert(false);
 			break;
-		case PM_STARTUP:
-			/* should have been handled in process_pm_child_exit */
-			Assert(false);
-			break;
 
 			/* wait for children to die */
+		case PM_STARTUP:
 		case PM_RECOVERY:
 		case PM_HOT_STANDBY:
 		case PM_RUN:
-- 
2.34.1



view thread (8+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: [PATCH] postmaster: fix stale PM_STARTUP comment
  In-Reply-To: <CAJTYsWX=EkwTvA15=T8dE1c4iVoapTT5=nkrByd2h-18NU8hZA@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox