Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w7s1v-005mKi-10 for pgsql-hackers@arkaria.postgresql.org; Wed, 01 Apr 2026 09:38:31 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w7s1s-00GMof-2i for pgsql-hackers@arkaria.postgresql.org; Wed, 01 Apr 2026 09:38:29 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w7s1s-00GMoX-1T for pgsql-hackers@lists.postgresql.org; Wed, 01 Apr 2026 09:38:28 +0000 Received: from lahtoruutu.iki.fi ([2a0b:5c81:1c1::37]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.98.2) (envelope-from ) id 1w7s1q-000000027HB-1R5c for pgsql-hackers@lists.postgresql.org; Wed, 01 Apr 2026 09:38:28 +0000 Received: from [10.0.2.15] (unknown [130.41.208.1]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange x25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: hlinnaka) by lahtoruutu.iki.fi (Postfix) with ESMTPSA id 4fm0KN5ssZz49PsK; Wed, 01 Apr 2026 12:38:16 +0300 (EEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=iki.fi; s=lahtoruutu; t=1775036297; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=fuj5Y6zbA6afuYcoQ4dUofmW9p75HN0TawQkJ39Tm54=; b=WtNAr+o2Otw45waUbdhPEfGiagBTdg4rD2TwrxXhTHetEWTiPHgeHAf+kQlZZAfDyZ/dzV GMqfCpz4jcMI2V2dtl5mz9O+XgrKv8onAcvsZrzufrwakhWZO39jPGEmZQmK/JYymE969w X7OJ41LhSA5fJ3veBQlOTZvbpOzdkvVsvv/ZS1iqupTucaQmY1D4otsnrhJr/RIzRkA94g ntXU1l4TiMxZAy5WxlXIC82agFOEdfSugxZAAXzdlFxM0Fxme8A0ib5/l0H30C5+4ceMsn L8d58hp1BgR5BwT+mP0DQbgiaOmI1uwlE9X0FV9BKJCiIS/U3ka5hByv5fSV8w== ARC-Seal: i=1; a=rsa-sha256; d=iki.fi; s=lahtoruutu; cv=none; t=1775036297; b=gkSyd7mCphp9vZaoZquRFvDB5GEx5cQz5F3vvQUCxZ5Ikmq6N3b56fjHLgelWzjsJoUnZy 2M2msd/uxgbOlYTJghoytyHDHCyMktunBg13r/y1G3JTW7H8bRvqQAnkXEhudVIaTiCbvP HWxqzmCSTnV/dNj0Lrd48NOco1tgjvBN5VTUGYRMJBx4x7IjgLvBUQNu7BeDN8Zl6tGIEE nIwHluwoV/mdsUiDLbnLKUh8MVplzd8S8Kl8AM/WEsxCLx5FomAfio4xn1GIxifYRYPPO2 9Gm1Zbh9G2ZYnZInR/CM/pkUJV+5ufqNH/xzeQu88YvpYUdDSNogkyo9NApR+A== ARC-Authentication-Results: i=1; ORIGINATING; auth=pass smtp.auth=hlinnaka smtp.mailfrom=hlinnaka@iki.fi ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=iki.fi; s=lahtoruutu; t=1775036297; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=fuj5Y6zbA6afuYcoQ4dUofmW9p75HN0TawQkJ39Tm54=; b=rJdfoiSK9NARln7qjrw3C4FpwI0BciJ5Gjjk0+ncWMkML77qI1uHvg64I3Q9Uzi3eJW5gk rR+lvSym5kj4fid/1Wk9GIwzEhvBE0bQxezXtgzy4OyVHA4jy1vtWl/TeA9unFmlzQoLlh zC1wIcrmd1YiliVE2t6uG3Dji7cn1F9vOs1r4vCRbKsEbEpNBpY8I6jjlUDLEC15696EHf AI2DGALpdg+wcj9ircV43WRDgupXq2pR8d4+LoWtzhSaZH6LbNwMz7kCR/0E5EAFXiZqu2 +ZRbEzioEpVi8LcKafQWmR0L8LeNY/2Pb1Sa6Mxx8VR3vVv3AqIqkcVdORaniQ== Message-ID: <038d97bc-fbe4-4d99-b7a5-e468ef361123@iki.fi> Date: Wed, 1 Apr 2026 12:38:15 +0300 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH] Fix minRecoveryPoint not advanced past checkpoint in CreateRestartPoint To: Adam Lee , pgsql-hackers@lists.postgresql.org Cc: Michael Paquier References: Content-Language: en-US From: Heikki Linnakangas In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On 01/04/2026 11:53, Adam Lee wrote: > Hi hackers, > > I ran into this while working on recovery pre-check logic that relies on > pg_controldata to verify whether replay has reached a specific restore point. > > Reproducer: > > ``` > -- on primary: > CHECKPOINT; > SELECT pg_create_restore_point('test_rp'); > > -- recover with: > -- recovery_target_name = 'test_rp' > -- recovery_target_action = 'shutdown' > > -- after recovery shuts down: > pg_controldata shows minRecoveryPoint 104 bytes behind > pg_create_restore_point's return value (104 bytes = one > RESTORE_POINT WAL record). > ``` > > My RCA: > > When recovery_target_action=shutdown triggers, the checkpointer performs a > shutdown restartpoint via CreateRestartPoint(). If a CHECKPOINT record was > replayed shortly before the recovery target, CreateRestartPoint advances > minRecoveryPoint to the end of that CHECKPOINT record. > > However, any no-op records replayed after the CHECKPOINT (such as > RESTORE_POINT) do not dirty pages, so the lazy minRecoveryPoint update that > normally happens during page flushes never fires for them. As a result, > minRecoveryPoint in pg_control ends up behind the actual replay position. Hmm, what exactly does minRecoveryPoint mean? The current behavior is correct in the sense that if you restarted recovery, you could still stop the recovery at the earlier LSN that's the minRecoveryPoint in the control file, and the system would be consistent. I agree it feels pretty weird though, it would seem natural to advance minRecoveryPoint to the last replayed record on a restartpoint. > My Fix: > > The attached patch fixes this by reading the current replay position from > shared memory after advancing minRecoveryPoint to the checkpoint end, and > advancing further if replay has progressed past it. This is safe because > CheckPointGuts() has already flushed all dirty buffers and the startup process > has exited, so replayEndRecPtr is stable and all pages are on disk. We have this comment earlier in CreateRestartPoint(): > * We don't explicitly advance minRecoveryPoint when we do create a > * restartpoint. It's assumed that flushing the buffers will do that as a > * side-effect. That assumption is not quite right, then. Perhaps we should simply call UpdateMinRecoveryPoint()? That would cause the control file to be flushed twice though, so it's a little inefficient, but maybe that's fine. If we go with your patch, does it make this existing logic below obsolete? > if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY) > { > if (ControlFile->minRecoveryPoint < lastCheckPointEndPtr) > { > ControlFile->minRecoveryPoint = lastCheckPointEndPtr; > ControlFile->minRecoveryPointTLI = lastCheckPoint.ThisTimeLineID; > > /* update local copy */ > LocalMinRecoveryPoint = ControlFile->minRecoveryPoint; > LocalMinRecoveryPointTLI = ControlFile->minRecoveryPointTLI; > } > if (flags & CHECKPOINT_IS_SHUTDOWN) > ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY; > } - Heikki