Hi,

On Wed, 15 Apr 2026 at 23:31, Ayush Tiwari <ayushtiwari.slg01@gmail.com> wrote:
Hi

On Wed, 15 Apr 2026 at 22:21, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 15/04/2026 16:57, Ayush Tiwari wrote:
> Hi,
>
> The comment above the PM_STARTUP startup-process-failure case still says
> that there are no other processes running yet, so the postmaster can just
> exit.
>
> That no longer matches the current startup flow: PM_STARTUP may already
> have auxiliary processes running by that point. The attached patch updates
> that comment to describe the current behavior.

Hmm, shouldn't the postmaster kill and wait for the auxiliary processes
to exit first in that case? ISTM we need code changes here, not just
comments.

- Heikki


Yes, I agree, a code change is required here.

The proper thing is to route this through the existing crash-handling
path, so that the postmaster SIGQUITs the aux children and waits for
them to exit before terminating.

I think the minimal change is:

  1. Replace the ExitPostmaster(1) shortcut in the PM_STARTUP
     startup-failure case with HandleChildCrash(), which calls
     TerminateChildren(SIGQUIT) and transitions through the state
     machine.  Set StartupStatus = STARTUP_CRASHED so the state
     machine does not try to reinitialize.

  2. Let HandleFatalError() handle PM_STARTUP by transitioning to
     PM_WAIT_BACKENDS, instead of the current Assert(false).


The minimal fix turned out to be smaller than I first described: the existing stanza immediately below the ExitPostmaster(1) block already handles !EXIT_STATUS_0 with StartupStatus != STARTUP_SIGNALED correctly (it sets STARTUP_CRASHED and calls HandleChildCrash()). So the likely fix would be:

1. Deleting the PM_STARTUP ExitPostmaster(1) shortcut, and letting execution fall through to the next stanza.

2. Replacing the Assert(false) for PM_STARTUP in HandleFatalError() with a fall-through to UpdatePMState(PM_WAIT_BACKENDS). 
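
To make the intended flow easier to follow, here is a toy model of the two changes. It is only a sketch and not postmaster.c code: the Toy* types and toy_* functions are made up for illustration, the enum values merely borrow the real state names, and the model assumes that every aux child is waited for in PM_WAIT_BACKENDS (the real state machine distinguishes more process types).

```c
#include <assert.h>

/* Toy model only: state names mirror postmaster.c, everything else is hypothetical. */
typedef enum { PM_STARTUP, PM_WAIT_BACKENDS, PM_WAIT_DEAD_END, PM_NO_CHILDREN } ToyPMState;
typedef enum { STARTUP_RUNNING, STARTUP_CRASHED } ToyStartupStatus;

typedef struct
{
    ToyPMState       pmState;
    ToyStartupStatus startupStatus;
    int              live_children;  /* aux processes still alive */
    int              sigquits_sent;  /* children told to quit */
    int              exit_code;      /* -1 while the postmaster is still running */
} ToyPostmaster;

ToyPostmaster
toy_init(int aux_children)
{
    ToyPostmaster pm = {PM_STARTUP, STARTUP_RUNNING, aux_children, 0, -1};

    return pm;
}

/* Advance the state machine; exit only once no children remain. */
void
toy_advance(ToyPostmaster *pm)
{
    if (pm->pmState == PM_WAIT_BACKENDS && pm->live_children == 0)
        pm->pmState = PM_WAIT_DEAD_END;
    if (pm->pmState == PM_WAIT_DEAD_END && pm->live_children == 0)
        pm->pmState = PM_NO_CHILDREN;
    if (pm->pmState == PM_NO_CHILDREN && pm->startupStatus == STARTUP_CRASHED)
        pm->exit_code = 1;      /* stands in for ExitPostmaster(1) */
}

/* Change 2: PM_STARTUP moves to PM_WAIT_BACKENDS instead of Assert(false). */
void
toy_handle_fatal_error(ToyPostmaster *pm)
{
    pm->sigquits_sent += pm->live_children;
    if (pm->pmState == PM_STARTUP)
        pm->pmState = PM_WAIT_BACKENDS;
    toy_advance(pm);
}

/* Change 1: no immediate exit; mark the crash and take the common path. */
void
toy_startup_failed(ToyPostmaster *pm)
{
    assert(pm->pmState == PM_STARTUP);
    pm->startupStatus = STARTUP_CRASHED;
    toy_handle_fatal_error(pm);
}

/* Called once per reaped child. */
void
toy_child_exited(ToyPostmaster *pm)
{
    assert(pm->live_children > 0);
    pm->live_children--;
    toy_advance(pm);
}
```

In this model, calling toy_startup_failed() with four live children leaves exit_code at -1, and it only becomes 1 after toy_child_exited() has been called four times. That is the ordering the patch is meant to guarantee.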

Verification I did for the patch:

On a freshly initdb'd cluster, I zeroed out the first WAL segment to force the startup process to fail with FATAL in StartupXLOG(), then ran the postmaster in the foreground under strace.

Before (master):
  LOG:  startup process (PID N) exited with exit code 1
  LOG:  aborting startup due to startup process failure
  LOG:  database system is shut down

  strace of the postmaster PID shows no kill() calls to children before
  exit_group(1). The checkpointer, bgwriter and I/O workers were running
  at the time of the failure and were orphaned.


After (patched):
  LOG:  startup process (PID N) exited with exit code 1
  LOG:  terminating any other active server processes
  <state transitions PM_STARTUP -> PM_WAIT_BACKENDS -> PM_WAIT_DEAD_END
   -> PM_NO_CHILDREN>
  LOG:  shutting down due to startup process failure
  LOG:  database system is shut down

  strace shows 8 SIGQUIT deliveries (4 children, each signaled by PID
  and by process-group) before the postmaster's own exit_group(1).
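
For anyone wondering why 4 children show up as 8 signals, the following standalone sketch reproduces the pattern outside PostgreSQL. It is an illustration under assumptions, not postmaster code: quit_and_reap_children() is a hypothetical helper, the children just sit in pause(), and each child is put in its own process group so that it can be signaled both by PID and by process group.

```c
#include <assert.h>
#include <signal.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_TOY_CHILDREN 16

/*
 * Illustration only (hypothetical helper): fork idle children, send each one
 * SIGQUIT by PID and by process group, then wait for all of them.
 * Returns the number of kill() calls issued, or -1 on bad input.
 * *killed_by_sigquit is set to the number of children that died from SIGQUIT.
 */
int
quit_and_reap_children(int nchildren, int *killed_by_sigquit)
{
    pid_t   pids[MAX_TOY_CHILDREN];
    int     kill_calls = 0;
    void    (*old_handler) (int);

    *killed_by_sigquit = 0;
    if (nchildren < 0 || nchildren > MAX_TOY_CHILDREN)
        return -1;

    /* Make sure the children inherit the default SIGQUIT action. */
    old_handler = signal(SIGQUIT, SIG_DFL);

    for (int i = 0; i < nchildren; i++)
    {
        pid_t   pid = fork();

        if (pid < 0)
        {
            nchildren = i;      /* only handle the children we did create */
            break;
        }
        if (pid == 0)
        {
            struct rlimit no_core = {0, 0};

            setrlimit(RLIMIT_CORE, &no_core);   /* SIGQUIT would dump core */
            setpgid(0, 0);                      /* become a group leader */
            for (;;)
                pause();
        }
        setpgid(pid, pid);      /* also set it here to close the race */
        pids[i] = pid;
    }
    signal(SIGQUIT, old_handler);

    /* Two kill() calls per child: one by PID, one by process group. */
    for (int i = 0; i < nchildren; i++)
    {
        kill(pids[i], SIGQUIT);
        kill(-pids[i], SIGQUIT);
        kill_calls += 2;
    }

    /* Do not return (or exit) until every child has been reaped. */
    for (int i = 0; i < nchildren; i++)
    {
        int     status;

        if (waitpid(pids[i], &status, 0) == pids[i] &&
            WIFSIGNALED(status) && WTERMSIG(status) == SIGQUIT)
            (*killed_by_sigquit)++;
    }
    return kill_calls;
}
```

Called with nchildren = 4, the helper issues 8 kill() calls and reaps 4 children terminated by SIGQUIT, which matches the shape of the strace output above.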

I've attached a patch; please review and let me know your thoughts.

Regards,
Ayush