Re: Change checkpoint‑record‑missing PANIC to FATAL


From: Michael Paquier <[email protected]>
To: Nitin Jadhav <[email protected]>
Cc: Pg Hackers <[email protected]>
Subject: Re: Change checkpoint‑record‑missing PANIC to FATAL
Date: Thu, 5 Feb 2026 09:40:58 +0900
Message-ID: <[email protected]> (raw)
In-Reply-To: <CAMm1aWb47v9Bx40P1_6YpRxxKi9XSwjAV_bLbFxx66Rg8o3+=g@mail.gmail.com>
References: <CAMm1aWZ9Tv=Wrx52_2Ppw+6ULf_twRZuQm=ZWLA_a-kXWykHkQ@mail.gmail.com>
	<[email protected]>
	<CAMm1aWb47v9Bx40P1_6YpRxxKi9XSwjAV_bLbFxx66Rg8o3+=g@mail.gmail.com>

On Mon, Dec 29, 2025 at 08:39:08PM +0530, Nitin Jadhav wrote:
> Apologies for the delay.
> At a high level, the recovery startup cases we want to test fall into
> two main buckets:
> (1) with a backup_label file and (2) without a backup_label file.

For clarity's sake, we are talking about lowering this one in
xlogrecovery.c, which relates to the code path where there is no
backup_label file:
ereport(PANIC,
        errmsg("could not locate a valid checkpoint record at %X/%08X",
               LSN_FORMAT_ARGS(CheckPointLoc)));

> From these two situations, we can cover the following scenarios:
> 1) Primary crash recovery without a backup_label – Delete the WAL
> segment containing the checkpoint record and try starting the server.

Yeah, let's add a test for that.  It would be enough to remove the
segment that includes the checkpoint record.  There should be no need
to be fancy with injection points like the other test case from
15f68cebdcec.
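
As a rough sketch of how a test could work out which segment to remove:
the file name follows from the checkpoint LSN and the timeline.  The
helper name wal_segment_name() and the fixed 16MB segment size below are
assumptions made for illustration, not code from the tree.  A real TAP
test would more likely ask the server with pg_walfile_name() before
stopping it.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption: default 16MB WAL segment size (it can be changed at initdb). */
#define WAL_SEGMENT_SIZE UINT64_C(0x1000000)

/*
 * Hypothetical helper: write into "buf" the 24-character name of the WAL
 * segment that holds "lsn" on timeline "tli".  The layout mirrors what
 * XLogFileName() produces: timeline ID, then the high and low parts of the
 * segment number, each as eight hexadecimal digits.
 */
void
wal_segment_name(char *buf, size_t buflen, uint32_t tli, uint64_t lsn)
{
    /* Number of segments covered by one 4GB "xlogid" range. */
    uint64_t segs_per_xlogid = UINT64_C(0x100000000) / WAL_SEGMENT_SIZE;
    /* Segment number containing the LSN. */
    uint64_t segno = lsn / WAL_SEGMENT_SIZE;

    snprintf(buf, buflen, "%08X%08X%08X",
             (unsigned int) tli,
             (unsigned int) (segno / segs_per_xlogid),
             (unsigned int) (segno % segs_per_xlogid));
}
```

With these assumptions, a checkpoint at 0/3000028 on timeline 1 lives in
segment 000000010000000000000003, which is the file the test would
remove from pg_wal before attempting the restart.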

> 2) Primary crash recovery with a backup_label – Take a base backup
> (which creates the backup_label), remove the checkpoint WAL segment,
> and start the server with that backup directory.

Okay.  I don't mind something here, for the two FATAL cases in the
code path where the backup_label exists:
- REDO record missing with checkpoint record found.  This is similar
to 15f68cebdcec.
- Checkpoint record missing.
Both should be cross-checked with the FATAL errors generated in the
server logs.

> 3) Standby crash recovery – Stop the standby, delete the checkpoint
> WAL segment, and start it again to see how standby recovery behaves.

In this case, we need to have a restore_command set anyway, no?  That
would mean that we should never fail.  I don't recall that we currently
have a test for that, where we could look at the server logs to check
that a segment has been retrieved because the segment that includes
the checkpoint record is missing.

> 4) PITR / archive‑recovery – Remove the checkpoint WAL segment and
> start the server with a valid restore_command so it enters archive
> recovery.

Same as 3) to me: standby mode cannot be activated without a
restore_command, and the recovery GUC checks are done in accordance with
the signal files before we attempt to read the initial checkpoint
record.

> Tests (2) and (4) are fairly similar, so we can merge them if they
> turn out to be redundant.
> These are the scenarios I have in mind so far. Please let me know if
> you think anything else should be added.

For the sake of the change from the PANIC to FATAL mentioned at the
top of this message, (1) would be enough.

The two cases of (2) I'm mentioning would be nice bonuses.  I would
recommend double-checking first whether some of the existing tests
already trigger these errors.  Perhaps we don't need to add anything
except a check in some node's logs for the error string patterns
wanted.
--
Michael

