public inbox for [email protected]  
help / color / mirror / Atom feed
From: David Steele <[email protected]>
To: Michael Paquier <[email protected]>
Cc: Pg Hackers <[email protected]>
Subject: Re: Return pg_control from pg_backup_stop().
Date: Wed, 02 Oct 2024 09:03:27 +0000 (UTC)
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>

On 10/2/24 10:11, Michael Paquier wrote:
> On Fri, May 17, 2024 at 12:46:49PM +1000, David Steele wrote:
> 
>> The basic idea is to harden recovery by returning a copy of pg_control from
>> pg_backup_stop() that has a flag set to prevent recovery if the backup_label
>> file is missing. Instead of backup software copying pg_control from PGDATA,
>> it stores an updated version that is returned from pg_backup_stop().
> 
> Hmm, okay.  There is also a slight impact for BASE_BACKUP, requiring
> basebackup_progress_wait_wal_archive() and do_pg_backup_stop() to be
> called earlier when sending the main data directory which is the last
> one in the list of tablespaces.  As far as I can see, this does not
> change the logic because do_pg_backup_stop() does not touch the
> control file, but it seems to me that we'd rather keep these two
> calls as they are now, and send the control file once we are out of
> the for loop that processes all the tablespaces.  That seems slightly
> cleaner to me, still I agree that both are the same things.

Sending pg_control later results in even more code churn because of how 
tar files are terminated. I've updated it that way in v2 so you can see 
what I mean. I don't have a strong preference, though, so if you prefer 
the implementation in v2 then that's fine with me.

> Anyway, finishing do_pg_backup_stop() and then sending the control
> file is just a consequence of the implementation choice driven by the
> output required for the SQL function so as this is stored in the
> backup state to get it down to pg_backup_stop() in xlogfuncs.c, so we
> could also take one step back and forget about the SQL function,
> setting only the new flag when sending a BASE_BACKUP, or just not use
> the backupState to store this data as the exercise involves forcing
> one boolean and recalculate a CRC32.

I can definitely see us making other updates to pg_control so I would 
rather keep this logic centralized, even though it is not too 
complicated at this point. Still, even 8 lines of code (as it is now) 
seems better not to duplicate.

>> * We don't need to worry about backup software seeing a torn copy of
>> pg_control, since Postgres can safely read it out of memory and provide a
>> valid copy via pg_backup_stop(). This solves torn reads without needing to
>> write pg_control via a temp file, which may affect performance on a standby.
> 
> We're talking about a 8kB file which has a size of 512B
> (PG_CONTROL_MAX_SAFE_SIZE) to avoid such issues.  So I'm not sure to
> see your point here?

Even at 512B it is possible to see tears in pg_control and they happen 
in the build farm right now. In fact, this thread [1] trying to fix the 
problem was what got me thinking about alternate solutions to preventing 
tears in pg_control. Thomas' proposed fixes have not been committed to 
my knowledge so the problem remains, but would be fixed by this commit.

> There is a large comment block in do_pg_backup_stop() around
> backup_stopped_in_recovery.  Perhaps it should be improved based on
> this patch.

I added a sentence to this comment block in v2.

> The main concern that I have over this patch is: who is actually going
> to use this extension of the SQL stop function?

Primarily existing backup software, I would imagine. The idea is that it 
would give them feature parity with pg_basebackup, rather than the new 
protections being exclusive to pg_basebackup.

> Perhaps existing
> backup solutions are good enough risk vs reward is not worth it?  

I'm not sure I see the risk here. Saving out pg_control is optional so 
no changes to current software is required. Of course they miss the 
benefit of the protection against tears and missing backup_label, but 
that is a choice.

Also, no matter what current backup solutions do, they cannot prevent a 
user from removing backup_label after restore. This patch prevents 
invalid recovery when that happens.

> The
> label_file and the tablespace map are text, this is a series of bytes
> that has no visibility for the end-user unless checked on the
> client-side.  This adds a new step where backups would need to copy
> the control file to the data folder.

Again, optional, but if I was able to manage these saves using the psql 
interface in the TAP tests then I'd say it would be pretty easy for 
anyone with a normal connection to Postgres. Also, we require users to 
treat tabelspace_map and backup_label as binary so not too big a change 
here.

[1] 
https://www.postgresql.org/message-id/CA%2BhUKG%2Bjig%2BQdBETj_ab%2B%2BVWSoJjbwT3sLNCnk0wFsY_6tRqoA%...
From f9ebd108a9c5dc55fa485dc8c60eb11d2c16715c Mon Sep 17 00:00:00 2001
From: David Steele <[email protected]>
Date: Wed, 2 Oct 2024 08:59:12 +0000
Subject: Return pg_control from pg_backup_stop().

Harden recovery by returning a copy of pg_control from pg_backup_stop() that has
a flag set to prevent recovery if the backup_label file is missing. Instead of
backup software copying pg_control from PGDATA, it stores an updated version
that is returned from pg_backup_stop(). This is better for the following
reasons:

* The user can no longer remove backup_label and get what looks like a
successful recovery (while almost certainly causing corruption). If backup_label
is removed the cluster will not start. The user may try pg_resetwal, but that
tool makes it pretty clear that corruption will result from its use.

* We don't need to worry about backup software seeing a torn copy of pg_control,
since Postgres can safely read it out of memory and provide a valid copy via
pg_backup_stop(). This solves torn reads without needing to write pg_control via
a temp file, which may affect performance on a standby.

* For backup from standby, we no longer need to instruct the backup software to
copy pg_control last. In fact the backup software should not copy pg_control from
PGDATA at all.

These changes have no impact on current backup software and they are free to use
the pg_control available from pg_stop_backup() or continue to use pg_control from
PGDATA. Of course they will miss the benefits of getting a consistent copy of
pg_control and the backup_label checking, but will be no worse off than before.

Control and catalog version bumps are required.
---
 doc/src/sgml/backup.sgml                    | 18 +++++-
 doc/src/sgml/func.sgml                      | 10 ++-
 src/backend/access/transam/xlog.c           | 23 ++++++-
 src/backend/access/transam/xlogfuncs.c      | 17 ++++--
 src/backend/access/transam/xlogrecovery.c   | 10 ++-
 src/backend/backup/basebackup.c             | 34 +++++------
 src/backend/catalog/system_functions.sql    |  2 +-
 src/backend/utils/misc/pg_controldata.c     |  7 ++-
 src/bin/pg_controldata/pg_controldata.c     |  2 +
 src/bin/pg_resetwal/pg_resetwal.c           |  1 +
 src/bin/pg_rewind/pg_rewind.c               |  1 +
 src/include/access/xlogbackup.h             |  8 +++
 src/include/catalog/pg_control.h            |  4 ++
 src/include/catalog/pg_proc.dat             | 10 +--
 src/test/recovery/t/002_archiving.pl        | 20 ++++++
 src/test/recovery/t/042_low_level_backup.pl | 67 ++++++++++++++++++++-
 16 files changed, 192 insertions(+), 42 deletions(-)

diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index e4e4c56cf1..2fcf181121 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -1021,9 +1021,12 @@ SELECT * FROM pg_backup_stop(wait_for_archive => true);
      values. The second of these fields should be written to a file named
      <filename>backup_label</filename> in the root directory of the backup. The
      third field should be written to a file named
-     <filename>tablespace_map</filename> unless the field is empty. These files are
+     <filename>tablespace_map</filename> unless the field is empty. The fourth
+     field should be written into a file named
+     <filename>global/pg_control</filename> (overwriting the existing file when
+     present). These files are
      vital to the backup working and must be written byte for byte without
-     modification, which may require opening the file in binary mode.
+     modification, which will require opening the file in binary mode.
     </para>
    </listitem>
    <listitem>
@@ -1095,7 +1098,16 @@ SELECT * FROM pg_backup_stop(wait_for_archive => true);
    </para>
 
    <para>
-    You should, however, omit from the backup the files within the
+    You should exclude <filename>global/pg_control</filename> from your backup
+    and put the contents of the <parameter>controlfile</parameter> column
+    returned from <function>pg_backup_stop</function> in your backup at
+    <filename>global/pg_control</filename>. This version of pg_control contains
+    safeguards against recovery without backup_label present and is guaranteed
+    not to be torn.
+   </para>
+
+   <para>
+    You should also omit from the backup the files within the
     cluster's <filename>pg_wal/</filename> subdirectory.  This
     slight adjustment is worthwhile because it reduces the risk
     of mistakes when restoring.  This is easy to arrange if
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index d6acdd3059..a2622014ae 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -27899,6 +27899,11 @@ acl      | {postgres=arwdDxtm/postgres,foo=r/postgres}
        <entry><type>boolean</type></entry>
       </row>
 
+      <row>
+       <entry><structfield>backup_label_required</structfield></entry>
+       <entry><type>boolean</type></entry>
+      </row>
+
      </tbody>
     </tgroup>
    </table>
@@ -28576,8 +28581,9 @@ LOG:  Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
         The result of the function is a single record.
         The <parameter>lsn</parameter> column holds the backup's ending
         write-ahead log location (which again can be ignored).  The second
-        column returns the contents of the backup label file, and the third
-        column returns the contents of the tablespace map file.  These must be
+        column returns the contents of the backup label file, the third column
+        returns the contents of the tablespace map file, and the fourth column
+        returns the contents of pg_control.  These must be
         stored as part of the backup and are required as part of the restore
         process.
        </para>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 9102c8d772..d28f155700 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9110,7 +9110,9 @@ get_backup_status(void)
  * wait for WAL segments to be archived.
  *
  * "state" is filled with the information necessary to restore from this
- * backup with its stop LSN (stoppoint), its timeline ID (stoptli), etc.
+ * backup with its stop LSN (stoppoint), its timeline ID (stoptli), etc. This
+ * includes a copy of pg_control with safeguards against it being used without
+ * backup_label.
  *
  * It is the responsibility of the caller of this function to verify the
  * permissions of the calling user!
@@ -9127,11 +9129,30 @@ do_pg_backup_stop(BackupState *state, bool waitforarchive)
 	int			seconds_before_warning;
 	int			waits = 0;
 	bool		reported_waiting = false;
+	ControlFileData *controlFileCopy = (ControlFileData *)state->controlFile;
 
 	Assert(state != NULL);
 
 	backup_stopped_in_recovery = RecoveryInProgress();
 
+	/*
+	 * Create a copy of control data and update it to require a backup label
+	 * for recovery. Also recalculate the CRC.
+	 */
+	memset(
+		state->controlFile + sizeof(ControlFileData), 0,
+		PG_CONTROL_FILE_SIZE - sizeof(ControlFileData));
+
+	LWLockAcquire(ControlFileLock, LW_SHARED);
+	memcpy(controlFileCopy, ControlFile, sizeof(ControlFileData));
+	LWLockRelease(ControlFileLock);
+
+	controlFileCopy->backupLabelRequired = true;
+
+	INIT_CRC32C(controlFileCopy->crc);
+	COMP_CRC32C(controlFileCopy->crc, controlFileCopy, offsetof(ControlFileData, crc));
+	FIN_CRC32C(controlFileCopy->crc);
+
 	/*
 	 * During recovery, we don't need to check WAL level. Because, if WAL
 	 * level is not sufficient, it's impossible to get here during recovery.
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 3e3d2bb618..fc2401d859 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -115,9 +115,11 @@ pg_backup_start(PG_FUNCTION_ARGS)
  *
  * The backup_label contains the user-supplied label string (typically this
  * would be used to tell where the backup dump will be stored), the starting
- * time, starting WAL location for the dump and so on.  It is the caller's
- * responsibility to write the backup_label and tablespace_map files in the
- * data folder that will be restored from this backup.
+ * time, starting WAL location for the dump and so on.  The pg_control file
+ * contains represents a consistent copy of pg_control that also has a safeguard
+ * against being used without backup_label.  It is the caller's responsibility
+ * to write the backup_label, pg_control, and tablespace_map files in the data
+ * folder that will be restored from this backup.
  *
  * Permission checking for this function is managed through the normal
  * GRANT system.
@@ -125,12 +127,13 @@ pg_backup_start(PG_FUNCTION_ARGS)
 Datum
 pg_backup_stop(PG_FUNCTION_ARGS)
 {
-#define PG_BACKUP_STOP_V2_COLS 3
+#define PG_BACKUP_STOP_V2_COLS 4
 	TupleDesc	tupdesc;
 	Datum		values[PG_BACKUP_STOP_V2_COLS] = {0};
 	bool		nulls[PG_BACKUP_STOP_V2_COLS] = {0};
 	bool		waitforarchive = PG_GETARG_BOOL(0);
 	char	   *backup_label;
+	bytea	   *pg_control_bytea;
 	SessionBackupState status = get_backup_status();
 
 	/* Initialize attributes information in the tuple descriptor */
@@ -152,9 +155,15 @@ pg_backup_stop(PG_FUNCTION_ARGS)
 	/* Build the contents of backup_label */
 	backup_label = build_backup_content(backup_state, false);
 
+	/* Build the contents of pg_control */
+	pg_control_bytea = (bytea *) palloc(PG_CONTROL_FILE_SIZE + VARHDRSZ);
+	SET_VARSIZE(pg_control_bytea, PG_CONTROL_FILE_SIZE + VARHDRSZ);
+	memcpy(VARDATA(pg_control_bytea), backup_state->controlFile, PG_CONTROL_FILE_SIZE);
+
 	values[0] = LSNGetDatum(backup_state->stoppoint);
 	values[1] = CStringGetTextDatum(backup_label);
 	values[2] = CStringGetTextDatum(tablespace_map->data);
+	values[3] = PointerGetDatum(pg_control_bytea);
 
 	/* Deallocate backup-related variables */
 	pfree(backup_label);
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 320b14add1..193b7e2045 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -704,7 +704,14 @@ InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdown_ptr,
 	}
 	else
 	{
-		/* No backup_label file has been found if we are here. */
+		/*
+		 * No backup_label file has been found if we are here. Error if the
+		 * control file requires backup_label.
+		 */
+		if (ControlFile->backupLabelRequired)
+			ereport(FATAL,
+					(errmsg("could not find backup_label required for recovery"),
+					 errhint("backup_label must be present for recovery to succeed")));
 
 		/*
 		 * If tablespace_map file is present without backup_label file, there
@@ -977,6 +984,7 @@ InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdown_ptr,
 		{
 			ControlFile->backupStartPoint = checkPoint.redo;
 			ControlFile->backupEndRequired = backupEndRequired;
+			ControlFile->backupLabelRequired = false;
 
 			if (backupFromStandby)
 			{
diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c
index 14e5ba72e9..d0d4e74cb0 100644
--- a/src/backend/backup/basebackup.c
+++ b/src/backend/backup/basebackup.c
@@ -325,7 +325,6 @@ perform_base_backup(basebackup_options *opt, bbsink *sink,
 
 			if (ti->path == NULL)
 			{
-				struct stat statbuf;
 				bool		sendtblspclinks = true;
 				char	   *backup_label;
 
@@ -348,16 +347,6 @@ perform_base_backup(basebackup_options *opt, bbsink *sink,
 				/* Then the bulk of the files... */
 				sendDir(sink, ".", 1, false, state.tablespaces,
 						sendtblspclinks, &manifest, InvalidOid, ib);
-
-				/* ... and pg_control after everything else. */
-				if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
-					ereport(ERROR,
-							(errcode_for_file_access(),
-							 errmsg("could not stat file \"%s\": %m",
-									XLOG_CONTROL_FILE)));
-				sendFile(sink, XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf,
-						 false, InvalidOid, InvalidOid,
-						 InvalidRelFileNumber, 0, &manifest, 0, NULL, 0);
 			}
 			else
 			{
@@ -374,7 +363,7 @@ perform_base_backup(basebackup_options *opt, bbsink *sink,
 			 * include the xlog files below and stop afterwards. This is safe
 			 * since the main data directory is always sent *last*.
 			 */
-			if (opt->includewal && ti->path == NULL)
+			if (ti->path == NULL)
 			{
 				Assert(lnext(state.tablespaces, lc) == NULL);
 			}
@@ -394,6 +383,11 @@ perform_base_backup(basebackup_options *opt, bbsink *sink,
 		basebackup_progress_wait_wal_archive(&state);
 		do_pg_backup_stop(backup_state, !opt->nowait);
 
+		/* Send pg_control */
+		sendFileWithContent(sink, XLOG_CONTROL_FILE,
+							(char *)backup_state->controlFile,
+							PG_CONTROL_FILE_SIZE, &manifest);
+
 		endptr = backup_state->stoppoint;
 		endtli = backup_state->stoptli;
 
@@ -632,16 +626,16 @@ perform_base_backup(basebackup_options *opt, bbsink *sink,
 			StatusFilePath(pathbuf, fname, ".done");
 			sendFileWithContent(sink, pathbuf, "", -1, &manifest);
 		}
+	}
 
-		/* Properly terminate the tar file. */
-		StaticAssertStmt(2 * TAR_BLOCK_SIZE <= BLCKSZ,
-						 "BLCKSZ too small for 2 tar blocks");
-		memset(sink->bbs_buffer, 0, 2 * TAR_BLOCK_SIZE);
-		bbsink_archive_contents(sink, 2 * TAR_BLOCK_SIZE);
+	/* Properly terminate the tar file. */
+	StaticAssertStmt(2 * TAR_BLOCK_SIZE <= BLCKSZ,
+						"BLCKSZ too small for 2 tar blocks");
+	memset(sink->bbs_buffer, 0, 2 * TAR_BLOCK_SIZE);
+	bbsink_archive_contents(sink, 2 * TAR_BLOCK_SIZE);
 
-		/* OK, that's the end of the archive. */
-		bbsink_end_archive(sink);
-	}
+	/* OK, that's the end of the archive. */
+	bbsink_end_archive(sink);
 
 	AddWALInfoToBackupManifest(&manifest, state.startptr, state.starttli,
 							   endptr, endtli);
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index b0d0de051e..758f90b2c3 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -390,7 +390,7 @@ CREATE OR REPLACE FUNCTION
 
 CREATE OR REPLACE FUNCTION pg_backup_stop (
         wait_for_archive boolean DEFAULT true, OUT lsn pg_lsn,
-        OUT labelfile text, OUT spcmapfile text)
+        OUT labelfile text, OUT spcmapfile text, OUT controlfile bytea)
   RETURNS record STRICT VOLATILE LANGUAGE internal as 'pg_backup_stop'
   PARALLEL RESTRICTED;
 
diff --git a/src/backend/utils/misc/pg_controldata.c b/src/backend/utils/misc/pg_controldata.c
index 98c932dc7b..9eaf3f8b9f 100644
--- a/src/backend/utils/misc/pg_controldata.c
+++ b/src/backend/utils/misc/pg_controldata.c
@@ -162,8 +162,8 @@ pg_control_checkpoint(PG_FUNCTION_ARGS)
 Datum
 pg_control_recovery(PG_FUNCTION_ARGS)
 {
-	Datum		values[5];
-	bool		nulls[5];
+	Datum		values[6];
+	bool		nulls[6];
 	TupleDesc	tupdesc;
 	HeapTuple	htup;
 	ControlFileData *ControlFile;
@@ -195,6 +195,9 @@ pg_control_recovery(PG_FUNCTION_ARGS)
 	values[4] = BoolGetDatum(ControlFile->backupEndRequired);
 	nulls[4] = false;
 
+	values[5] = BoolGetDatum(ControlFile->backupLabelRequired);
+	nulls[5] = false;
+
 	htup = heap_form_tuple(tupdesc, values, nulls);
 
 	PG_RETURN_DATUM(HeapTupleGetDatum(htup));
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 93a05d80ca..8cc29904c6 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -283,6 +283,8 @@ main(int argc, char *argv[])
 		   LSN_FORMAT_ARGS(ControlFile->backupEndPoint));
 	printf(_("End-of-backup record required:        %s\n"),
 		   ControlFile->backupEndRequired ? _("yes") : _("no"));
+	printf(_("Backup label required:                %s\n"),
+		   ControlFile->backupLabelRequired ? _("yes") : _("no"));
 	printf(_("wal_level setting:                    %s\n"),
 		   wal_level_str(ControlFile->wal_level));
 	printf(_("wal_log_hints setting:                %s\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index e9dcb5a6d8..7056752cff 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -875,6 +875,7 @@ RewriteControlFile(void)
 	ControlFile.backupStartPoint = 0;
 	ControlFile.backupEndPoint = 0;
 	ControlFile.backupEndRequired = false;
+	ControlFile.backupLabelRequired = false;
 
 	/*
 	 * Force the defaults for max_* settings. The values don't really matter
diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c
index 960916a1e8..79a44b70ee 100644
--- a/src/bin/pg_rewind/pg_rewind.c
+++ b/src/bin/pg_rewind/pg_rewind.c
@@ -722,6 +722,7 @@ perform_rewind(filemap_t *filemap, rewind_source *source,
 	ControlFile_new.minRecoveryPoint = endrec;
 	ControlFile_new.minRecoveryPointTLI = endtli;
 	ControlFile_new.state = DB_IN_ARCHIVE_RECOVERY;
+	ControlFile_new.backupLabelRequired = true;
 	if (!dry_run)
 		update_controlfile(datadir_target, &ControlFile_new, do_sync);
 }
diff --git a/src/include/access/xlogbackup.h b/src/include/access/xlogbackup.h
index c30d4a5991..93a0e5c5cc 100644
--- a/src/include/access/xlogbackup.h
+++ b/src/include/access/xlogbackup.h
@@ -15,6 +15,7 @@
 #define XLOG_BACKUP_H
 
 #include "access/xlogdefs.h"
+#include "catalog/pg_control.h"
 #include "pgtime.h"
 
 /* Structure to hold backup state. */
@@ -35,6 +36,13 @@ typedef struct BackupState
 	XLogRecPtr	stoppoint;		/* backup stop WAL location */
 	TimeLineID	stoptli;		/* backup stop TLI */
 	pg_time_t	stoptime;		/* backup stop time */
+
+	/*
+	 * After pg_backup_stop() returns this field will contain a copy of
+	 * pg_control that should be stored with the backup. backupLabelRequired
+	 * has been set so backup_label will be required for recovery to start.
+	 */
+	uint8_t controlFile[PG_CONTROL_FILE_SIZE];
 } BackupState;
 
 extern char *build_backup_content(BackupState *state,
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index e80ff8e414..b471a9b02e 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -164,12 +164,16 @@ typedef struct ControlFileData
 	 * If backupEndRequired is true, we know for sure that we're restoring
 	 * from a backup, and must see a backup-end record before we can safely
 	 * start up.
+	 *
+	 * If backupLabelRequired is true, then a backup_label file must be
+	 * present in order for recovery to succeed.
 	 */
 	XLogRecPtr	minRecoveryPoint;
 	TimeLineID	minRecoveryPointTLI;
 	XLogRecPtr	backupStartPoint;
 	XLogRecPtr	backupEndPoint;
 	bool		backupEndRequired;
+	bool		backupLabelRequired;
 
 	/*
 	 * Parameter settings that determine if the WAL can be used for archival
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 77f54a79e6..ea93241eed 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6569,8 +6569,8 @@
 { oid => '2739', descr => 'finish taking an online backup',
   proname => 'pg_backup_stop', provolatile => 'v', proparallel => 'r',
   prorettype => 'record', proargtypes => 'bool',
-  proallargtypes => '{bool,pg_lsn,text,text}', proargmodes => '{i,o,o,o}',
-  proargnames => '{wait_for_archive,lsn,labelfile,spcmapfile}',
+  proallargtypes => '{bool,pg_lsn,text,text,bytea}', proargmodes => '{i,o,o,o,o}',
+  proargnames => '{wait_for_archive,lsn,labelfile,spcmapfile,controlfile}',
   prosrc => 'pg_backup_stop' },
 { oid => '3436', descr => 'promote standby server',
   proname => 'pg_promote', provolatile => 'v', prorettype => 'bool',
@@ -12119,9 +12119,9 @@
 { oid => '3443',
   descr => 'pg_controldata recovery state information as a function',
   proname => 'pg_control_recovery', provolatile => 'v', prorettype => 'record',
-  proargtypes => '', proallargtypes => '{pg_lsn,int4,pg_lsn,pg_lsn,bool}',
-  proargmodes => '{o,o,o,o,o}',
-  proargnames => '{min_recovery_end_lsn,min_recovery_end_timeline,backup_start_lsn,backup_end_lsn,end_of_backup_record_required}',
+  proargtypes => '', proallargtypes => '{pg_lsn,int4,pg_lsn,pg_lsn,bool,bool}',
+  proargmodes => '{o,o,o,o,o,o}',
+  proargnames => '{min_recovery_end_lsn,min_recovery_end_timeline,backup_start_lsn,backup_end_lsn,end_of_backup_record_required,backup_label_required}',
   prosrc => 'pg_control_recovery' },
 
 { oid => '3444',
diff --git a/src/test/recovery/t/002_archiving.pl b/src/test/recovery/t/002_archiving.pl
index bc447330e1..462f1dcf0d 100644
--- a/src/test/recovery/t/002_archiving.pl
+++ b/src/test/recovery/t/002_archiving.pl
@@ -41,6 +41,26 @@ $node_standby->append_conf(
 archive_cleanup_command = 'echo archive_cleanup_done > $archive_cleanup_command_file'
 recovery_end_command = 'echo recovery_ended_done > $recovery_end_command_file'
 ));
+
+# Rename backup_label to verify that recovery will not start without it
+rename("$data_dir/backup_label", "$data_dir/backup_label.tmp")
+  or BAIL_OUT "could not move $data_dir/backup_label";
+
+my $res = run_log(
+	[
+		'pg_ctl', '-D', $node_standby->data_dir, '-l',
+		$node_standby->logfile, 'start'
+	]);
+ok(!$res, 'invalid recovery startup fails');
+
+my $logfile = slurp_file($node_standby->logfile());
+ok($logfile =~ qr/could not find backup_label required for recovery/,
+	'could not find backup_label required for recovery');
+
+# Restore backup_label so recovery proceeds normally
+rename("$data_dir/backup_label.tmp", "$data_dir/backup_label")
+  or BAIL_OUT "could not move $data_dir/backup_label";
+
 $node_standby->start;
 
 # Create some content on primary
diff --git a/src/test/recovery/t/042_low_level_backup.pl b/src/test/recovery/t/042_low_level_backup.pl
index 61d23187e0..bd3a99960f 100644
--- a/src/test/recovery/t/042_low_level_backup.pl
+++ b/src/test/recovery/t/042_low_level_backup.pl
@@ -13,6 +13,42 @@ use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
 use Test::More;
 
+# Decode hex to binary
+sub decode_hex
+{
+	my ($encoded) = @_;
+	my $decoded;
+
+	$encoded =~ s/^\s+|\s+$//g;
+
+	for (my $idx = 0; $idx < length($encoded); $idx += 2)
+	{
+		$decoded .= pack('C', hex(substr($encoded, $idx, 2)));
+	}
+
+	return $decoded;
+}
+
+# Get backup_label/pg_control from pg_stop_backup()
+sub stop_backup_result
+{
+	my ($psql) = @_;
+
+	my $encoded = $psql->query_safe(
+		"select encode(labelfile::bytea, 'hex') || ',' || " .
+		"       encode(controlfile, 'hex')" .
+		"  from pg_backup_stop()");
+
+	my @result;
+
+    foreach my $column (split(',', $encoded))
+	{
+		push(@result, decode_hex($column));
+	}
+
+	return @result;
+}
+
 # Start primary node with archiving.
 my $node_primary = PostgreSQL::Test::Cluster->new('primary');
 $node_primary->init(has_archiving => 1, allows_streaming => 1);
@@ -80,8 +116,7 @@ my $stop_segment_name = $node_primary->safe_psql('postgres',
 	'SELECT pg_walfile_name(pg_current_wal_lsn())');
 
 # Stop backup and get backup_label, the last segment is archived.
-my $backup_label =
-  $psql->query_safe("select labelfile from pg_backup_stop()");
+(my $backup_label, my $pg_control) = stop_backup_result($psql);
 
 $psql->quit;
 
@@ -118,10 +153,36 @@ ok( $node_replica->log_contains(
 $node_replica->teardown_node;
 $node_replica->clean_node;
 
+# Save only pg_control into the backup to demonstrate that pg_control returned
+# from pg_stop_backup() will only perform recovery when backup_label is present.
+open(my $fh, ">", "$backup_dir/global/pg_control")
+  or die "could not open pg_control";
+binmode($fh);
+syswrite($fh, $pg_control);
+close($fh);
+
+$node_replica = PostgreSQL::Test::Cluster->new('replica_fail2');
+$node_replica->init_from_backup($node_primary, $backup_name,
+	has_restoring => 1);
+
+my $res = run_log(
+	[
+		'pg_ctl', '-D', $node_replica->data_dir, '-l',
+		$node_replica->logfile, 'start'
+	]);
+ok(!$res, 'invalid recovery startup fails');
+
+my $logfile = slurp_file($node_replica->logfile());
+ok($logfile =~ qr/could not find backup_label required for recovery/,
+	'could not find backup_label required for recovery');
+
+$node_replica->teardown_node;
+$node_replica->clean_node;
+
 # Save backup_label into the backup directory and recover using the primary's
 # archive.  This time recovery will succeed and the canary table will be
 # present.
-open my $fh, ">>", "$backup_dir/backup_label"
+open $fh, ">>", "$backup_dir/backup_label"
   or die "could not open backup_label";
 # Binary mode is required for Windows, as the backup_label parsing is not
 # able to cope with CRLFs.
-- 
2.34.1



Attachments:

  [text/plain] pgcontrol-from-backupstop-v2.patch (23.8K, 2-pgcontrol-from-backupstop-v2.patch)
  download | inline diff:
From f9ebd108a9c5dc55fa485dc8c60eb11d2c16715c Mon Sep 17 00:00:00 2001
From: David Steele <[email protected]>
Date: Wed, 2 Oct 2024 08:59:12 +0000
Subject: Return pg_control from pg_backup_stop().

Harden recovery by returning a copy of pg_control from pg_backup_stop() that has
a flag set to prevent recovery if the backup_label file is missing. Instead of
backup software copying pg_control from PGDATA, it stores an updated version
that is returned from pg_backup_stop(). This is better for the following
reasons:

* The user can no longer remove backup_label and get what looks like a
successful recovery (while almost certainly causing corruption). If backup_label
is removed the cluster will not start. The user may try pg_resetwal, but that
tool makes it pretty clear that corruption will result from its use.

* We don't need to worry about backup software seeing a torn copy of pg_control,
since Postgres can safely read it out of memory and provide a valid copy via
pg_backup_stop(). This solves torn reads without needing to write pg_control via
a temp file, which may affect performance on a standby.

* For backup from standby, we no longer need to instruct the backup software to
copy pg_control last. In fact the backup software should not copy pg_control from
PGDATA at all.

These changes have no impact on current backup software and they are free to use
the pg_control available from pg_stop_backup() or continue to use pg_control from
PGDATA. Of course they will miss the benefits of getting a consistent copy of
pg_control and the backup_label checking, but will be no worse off than before.

Control and catalog version bumps are required.
---
 doc/src/sgml/backup.sgml                    | 18 +++++-
 doc/src/sgml/func.sgml                      | 10 ++-
 src/backend/access/transam/xlog.c           | 23 ++++++-
 src/backend/access/transam/xlogfuncs.c      | 17 ++++--
 src/backend/access/transam/xlogrecovery.c   | 10 ++-
 src/backend/backup/basebackup.c             | 34 +++++------
 src/backend/catalog/system_functions.sql    |  2 +-
 src/backend/utils/misc/pg_controldata.c     |  7 ++-
 src/bin/pg_controldata/pg_controldata.c     |  2 +
 src/bin/pg_resetwal/pg_resetwal.c           |  1 +
 src/bin/pg_rewind/pg_rewind.c               |  1 +
 src/include/access/xlogbackup.h             |  8 +++
 src/include/catalog/pg_control.h            |  4 ++
 src/include/catalog/pg_proc.dat             | 10 +--
 src/test/recovery/t/002_archiving.pl        | 20 ++++++
 src/test/recovery/t/042_low_level_backup.pl | 67 ++++++++++++++++++++-
 16 files changed, 192 insertions(+), 42 deletions(-)

diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index e4e4c56cf1..2fcf181121 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -1021,9 +1021,12 @@ SELECT * FROM pg_backup_stop(wait_for_archive => true);
      values. The second of these fields should be written to a file named
      <filename>backup_label</filename> in the root directory of the backup. The
      third field should be written to a file named
-     <filename>tablespace_map</filename> unless the field is empty. These files are
+     <filename>tablespace_map</filename> unless the field is empty. The fourth
+     field should be written into a file named
+     <filename>global/pg_control</filename> (overwriting the existing file when
+     present). These files are
      vital to the backup working and must be written byte for byte without
-     modification, which may require opening the file in binary mode.
+     modification, which will require opening the file in binary mode.
     </para>
    </listitem>
    <listitem>
@@ -1095,7 +1098,16 @@ SELECT * FROM pg_backup_stop(wait_for_archive => true);
    </para>
 
    <para>
-    You should, however, omit from the backup the files within the
+    You should exclude <filename>global/pg_control</filename> from your backup
+    and put the contents of the <parameter>controlfile</parameter> column
+    returned from <function>pg_backup_stop</function> in your backup at
+    <filename>global/pg_control</filename>. This version of pg_control contains
+    safeguards against recovery without backup_label present and is guaranteed
+    not to be torn.
+   </para>
+
+   <para>
+    You should also omit from the backup the files within the
     cluster's <filename>pg_wal/</filename> subdirectory.  This
     slight adjustment is worthwhile because it reduces the risk
     of mistakes when restoring.  This is easy to arrange if
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index d6acdd3059..a2622014ae 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -27899,6 +27899,11 @@ acl      | {postgres=arwdDxtm/postgres,foo=r/postgres}
        <entry><type>boolean</type></entry>
       </row>
 
+      <row>
+       <entry><structfield>backup_label_required</structfield></entry>
+       <entry><type>boolean</type></entry>
+      </row>
+
      </tbody>
     </tgroup>
    </table>
@@ -28576,8 +28581,9 @@ LOG:  Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
         The result of the function is a single record.
         The <parameter>lsn</parameter> column holds the backup's ending
         write-ahead log location (which again can be ignored).  The second
-        column returns the contents of the backup label file, and the third
-        column returns the contents of the tablespace map file.  These must be
+        column returns the contents of the backup label file, the third column
+        returns the contents of the tablespace map file, and the fourth column
+        returns the contents of pg_control.  These must be
         stored as part of the backup and are required as part of the restore
         process.
        </para>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 9102c8d772..d28f155700 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9110,7 +9110,9 @@ get_backup_status(void)
  * wait for WAL segments to be archived.
  *
  * "state" is filled with the information necessary to restore from this
- * backup with its stop LSN (stoppoint), its timeline ID (stoptli), etc.
+ * backup with its stop LSN (stoppoint), its timeline ID (stoptli), etc. This
+ * includes a copy of pg_control with safeguards against it being used without
+ * backup_label.
  *
  * It is the responsibility of the caller of this function to verify the
  * permissions of the calling user!
@@ -9127,11 +9129,30 @@ do_pg_backup_stop(BackupState *state, bool waitforarchive)
 	int			seconds_before_warning;
 	int			waits = 0;
 	bool		reported_waiting = false;
+	ControlFileData *controlFileCopy = (ControlFileData *)state->controlFile;
 
 	Assert(state != NULL);
 
 	backup_stopped_in_recovery = RecoveryInProgress();
 
+	/*
+	 * Create a copy of control data and update it to require a backup label
+	 * for recovery. Also recalculate the CRC.
+	 */
+	memset(
+		state->controlFile + sizeof(ControlFileData), 0,
+		PG_CONTROL_FILE_SIZE - sizeof(ControlFileData));
+
+	LWLockAcquire(ControlFileLock, LW_SHARED);
+	memcpy(controlFileCopy, ControlFile, sizeof(ControlFileData));
+	LWLockRelease(ControlFileLock);
+
+	controlFileCopy->backupLabelRequired = true;
+
+	INIT_CRC32C(controlFileCopy->crc);
+	COMP_CRC32C(controlFileCopy->crc, controlFileCopy, offsetof(ControlFileData, crc));
+	FIN_CRC32C(controlFileCopy->crc);
+
 	/*
 	 * During recovery, we don't need to check WAL level. Because, if WAL
 	 * level is not sufficient, it's impossible to get here during recovery.
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 3e3d2bb618..fc2401d859 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -115,9 +115,11 @@ pg_backup_start(PG_FUNCTION_ARGS)
  *
  * The backup_label contains the user-supplied label string (typically this
  * would be used to tell where the backup dump will be stored), the starting
- * time, starting WAL location for the dump and so on.  It is the caller's
- * responsibility to write the backup_label and tablespace_map files in the
- * data folder that will be restored from this backup.
+ * time, starting WAL location for the dump and so on.  The pg_control file
+ * contains represents a consistent copy of pg_control that also has a safeguard
+ * against being used without backup_label.  It is the caller's responsibility
+ * to write the backup_label, pg_control, and tablespace_map files in the data
+ * folder that will be restored from this backup.
  *
  * Permission checking for this function is managed through the normal
  * GRANT system.
@@ -125,12 +127,13 @@ pg_backup_start(PG_FUNCTION_ARGS)
 Datum
 pg_backup_stop(PG_FUNCTION_ARGS)
 {
-#define PG_BACKUP_STOP_V2_COLS 3
+#define PG_BACKUP_STOP_V2_COLS 4
 	TupleDesc	tupdesc;
 	Datum		values[PG_BACKUP_STOP_V2_COLS] = {0};
 	bool		nulls[PG_BACKUP_STOP_V2_COLS] = {0};
 	bool		waitforarchive = PG_GETARG_BOOL(0);
 	char	   *backup_label;
+	bytea	   *pg_control_bytea;
 	SessionBackupState status = get_backup_status();
 
 	/* Initialize attributes information in the tuple descriptor */
@@ -152,9 +155,15 @@ pg_backup_stop(PG_FUNCTION_ARGS)
 	/* Build the contents of backup_label */
 	backup_label = build_backup_content(backup_state, false);
 
+	/* Build the contents of pg_control */
+	pg_control_bytea = (bytea *) palloc(PG_CONTROL_FILE_SIZE + VARHDRSZ);
+	SET_VARSIZE(pg_control_bytea, PG_CONTROL_FILE_SIZE + VARHDRSZ);
+	memcpy(VARDATA(pg_control_bytea), backup_state->controlFile, PG_CONTROL_FILE_SIZE);
+
 	values[0] = LSNGetDatum(backup_state->stoppoint);
 	values[1] = CStringGetTextDatum(backup_label);
 	values[2] = CStringGetTextDatum(tablespace_map->data);
+	values[3] = PointerGetDatum(pg_control_bytea);
 
 	/* Deallocate backup-related variables */
 	pfree(backup_label);
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 320b14add1..193b7e2045 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -704,7 +704,14 @@ InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdown_ptr,
 	}
 	else
 	{
-		/* No backup_label file has been found if we are here. */
+		/*
+		 * No backup_label file has been found if we are here. Error if the
+		 * control file requires backup_label.
+		 */
+		if (ControlFile->backupLabelRequired)
+			ereport(FATAL,
+					(errmsg("could not find backup_label required for recovery"),
+					 errhint("backup_label must be present for recovery to succeed")));
 
 		/*
 		 * If tablespace_map file is present without backup_label file, there
@@ -977,6 +984,7 @@ InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdown_ptr,
 		{
 			ControlFile->backupStartPoint = checkPoint.redo;
 			ControlFile->backupEndRequired = backupEndRequired;
+			ControlFile->backupLabelRequired = false;
 
 			if (backupFromStandby)
 			{
diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c
index 14e5ba72e9..d0d4e74cb0 100644
--- a/src/backend/backup/basebackup.c
+++ b/src/backend/backup/basebackup.c
@@ -325,7 +325,6 @@ perform_base_backup(basebackup_options *opt, bbsink *sink,
 
 			if (ti->path == NULL)
 			{
-				struct stat statbuf;
 				bool		sendtblspclinks = true;
 				char	   *backup_label;
 
@@ -348,16 +347,6 @@ perform_base_backup(basebackup_options *opt, bbsink *sink,
 				/* Then the bulk of the files... */
 				sendDir(sink, ".", 1, false, state.tablespaces,
 						sendtblspclinks, &manifest, InvalidOid, ib);
-
-				/* ... and pg_control after everything else. */
-				if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0)
-					ereport(ERROR,
-							(errcode_for_file_access(),
-							 errmsg("could not stat file \"%s\": %m",
-									XLOG_CONTROL_FILE)));
-				sendFile(sink, XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf,
-						 false, InvalidOid, InvalidOid,
-						 InvalidRelFileNumber, 0, &manifest, 0, NULL, 0);
 			}
 			else
 			{
@@ -374,7 +363,7 @@ perform_base_backup(basebackup_options *opt, bbsink *sink,
 			 * include the xlog files below and stop afterwards. This is safe
 			 * since the main data directory is always sent *last*.
 			 */
-			if (opt->includewal && ti->path == NULL)
+			if (ti->path == NULL)
 			{
 				Assert(lnext(state.tablespaces, lc) == NULL);
 			}
@@ -394,6 +383,11 @@ perform_base_backup(basebackup_options *opt, bbsink *sink,
 		basebackup_progress_wait_wal_archive(&state);
 		do_pg_backup_stop(backup_state, !opt->nowait);
 
+		/* Send pg_control */
+		sendFileWithContent(sink, XLOG_CONTROL_FILE,
+							(char *)backup_state->controlFile,
+							PG_CONTROL_FILE_SIZE, &manifest);
+
 		endptr = backup_state->stoppoint;
 		endtli = backup_state->stoptli;
 
@@ -632,16 +626,16 @@ perform_base_backup(basebackup_options *opt, bbsink *sink,
 			StatusFilePath(pathbuf, fname, ".done");
 			sendFileWithContent(sink, pathbuf, "", -1, &manifest);
 		}
+	}
 
-		/* Properly terminate the tar file. */
-		StaticAssertStmt(2 * TAR_BLOCK_SIZE <= BLCKSZ,
-						 "BLCKSZ too small for 2 tar blocks");
-		memset(sink->bbs_buffer, 0, 2 * TAR_BLOCK_SIZE);
-		bbsink_archive_contents(sink, 2 * TAR_BLOCK_SIZE);
+	/* Properly terminate the tar file. */
+	StaticAssertStmt(2 * TAR_BLOCK_SIZE <= BLCKSZ,
+						"BLCKSZ too small for 2 tar blocks");
+	memset(sink->bbs_buffer, 0, 2 * TAR_BLOCK_SIZE);
+	bbsink_archive_contents(sink, 2 * TAR_BLOCK_SIZE);
 
-		/* OK, that's the end of the archive. */
-		bbsink_end_archive(sink);
-	}
+	/* OK, that's the end of the archive. */
+	bbsink_end_archive(sink);
 
 	AddWALInfoToBackupManifest(&manifest, state.startptr, state.starttli,
 							   endptr, endtli);
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index b0d0de051e..758f90b2c3 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -390,7 +390,7 @@ CREATE OR REPLACE FUNCTION
 
 CREATE OR REPLACE FUNCTION pg_backup_stop (
         wait_for_archive boolean DEFAULT true, OUT lsn pg_lsn,
-        OUT labelfile text, OUT spcmapfile text)
+        OUT labelfile text, OUT spcmapfile text, OUT controlfile bytea)
   RETURNS record STRICT VOLATILE LANGUAGE internal as 'pg_backup_stop'
   PARALLEL RESTRICTED;
 
diff --git a/src/backend/utils/misc/pg_controldata.c b/src/backend/utils/misc/pg_controldata.c
index 98c932dc7b..9eaf3f8b9f 100644
--- a/src/backend/utils/misc/pg_controldata.c
+++ b/src/backend/utils/misc/pg_controldata.c
@@ -162,8 +162,8 @@ pg_control_checkpoint(PG_FUNCTION_ARGS)
 Datum
 pg_control_recovery(PG_FUNCTION_ARGS)
 {
-	Datum		values[5];
-	bool		nulls[5];
+	Datum		values[6];
+	bool		nulls[6];
 	TupleDesc	tupdesc;
 	HeapTuple	htup;
 	ControlFileData *ControlFile;
@@ -195,6 +195,9 @@ pg_control_recovery(PG_FUNCTION_ARGS)
 	values[4] = BoolGetDatum(ControlFile->backupEndRequired);
 	nulls[4] = false;
 
+	values[5] = BoolGetDatum(ControlFile->backupLabelRequired);
+	nulls[5] = false;
+
 	htup = heap_form_tuple(tupdesc, values, nulls);
 
 	PG_RETURN_DATUM(HeapTupleGetDatum(htup));
diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c
index 93a05d80ca..8cc29904c6 100644
--- a/src/bin/pg_controldata/pg_controldata.c
+++ b/src/bin/pg_controldata/pg_controldata.c
@@ -283,6 +283,8 @@ main(int argc, char *argv[])
 		   LSN_FORMAT_ARGS(ControlFile->backupEndPoint));
 	printf(_("End-of-backup record required:        %s\n"),
 		   ControlFile->backupEndRequired ? _("yes") : _("no"));
+	printf(_("Backup label required:                %s\n"),
+		   ControlFile->backupLabelRequired ? _("yes") : _("no"));
 	printf(_("wal_level setting:                    %s\n"),
 		   wal_level_str(ControlFile->wal_level));
 	printf(_("wal_log_hints setting:                %s\n"),
diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
index e9dcb5a6d8..7056752cff 100644
--- a/src/bin/pg_resetwal/pg_resetwal.c
+++ b/src/bin/pg_resetwal/pg_resetwal.c
@@ -875,6 +875,7 @@ RewriteControlFile(void)
 	ControlFile.backupStartPoint = 0;
 	ControlFile.backupEndPoint = 0;
 	ControlFile.backupEndRequired = false;
+	ControlFile.backupLabelRequired = false;
 
 	/*
 	 * Force the defaults for max_* settings. The values don't really matter
diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c
index 960916a1e8..79a44b70ee 100644
--- a/src/bin/pg_rewind/pg_rewind.c
+++ b/src/bin/pg_rewind/pg_rewind.c
@@ -722,6 +722,7 @@ perform_rewind(filemap_t *filemap, rewind_source *source,
 	ControlFile_new.minRecoveryPoint = endrec;
 	ControlFile_new.minRecoveryPointTLI = endtli;
 	ControlFile_new.state = DB_IN_ARCHIVE_RECOVERY;
+	ControlFile_new.backupLabelRequired = true;
 	if (!dry_run)
 		update_controlfile(datadir_target, &ControlFile_new, do_sync);
 }
diff --git a/src/include/access/xlogbackup.h b/src/include/access/xlogbackup.h
index c30d4a5991..93a0e5c5cc 100644
--- a/src/include/access/xlogbackup.h
+++ b/src/include/access/xlogbackup.h
@@ -15,6 +15,7 @@
 #define XLOG_BACKUP_H
 
 #include "access/xlogdefs.h"
+#include "catalog/pg_control.h"
 #include "pgtime.h"
 
 /* Structure to hold backup state. */
@@ -35,6 +36,13 @@ typedef struct BackupState
 	XLogRecPtr	stoppoint;		/* backup stop WAL location */
 	TimeLineID	stoptli;		/* backup stop TLI */
 	pg_time_t	stoptime;		/* backup stop time */
+
+	/*
+	 * After pg_backup_stop() returns this field will contain a copy of
+	 * pg_control that should be stored with the backup. backupLabelRequired
+	 * has been set so backup_label will be required for recovery to start.
+	 */
+	uint8_t controlFile[PG_CONTROL_FILE_SIZE];
 } BackupState;
 
 extern char *build_backup_content(BackupState *state,
diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h
index e80ff8e414..b471a9b02e 100644
--- a/src/include/catalog/pg_control.h
+++ b/src/include/catalog/pg_control.h
@@ -164,12 +164,16 @@ typedef struct ControlFileData
 	 * If backupEndRequired is true, we know for sure that we're restoring
 	 * from a backup, and must see a backup-end record before we can safely
 	 * start up.
+	 *
+	 * If backupLabelRequired is true, then a backup_label file must be
+	 * present in order for recovery to succeed.
 	 */
 	XLogRecPtr	minRecoveryPoint;
 	TimeLineID	minRecoveryPointTLI;
 	XLogRecPtr	backupStartPoint;
 	XLogRecPtr	backupEndPoint;
 	bool		backupEndRequired;
+	bool		backupLabelRequired;
 
 	/*
 	 * Parameter settings that determine if the WAL can be used for archival
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 77f54a79e6..ea93241eed 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6569,8 +6569,8 @@
 { oid => '2739', descr => 'finish taking an online backup',
   proname => 'pg_backup_stop', provolatile => 'v', proparallel => 'r',
   prorettype => 'record', proargtypes => 'bool',
-  proallargtypes => '{bool,pg_lsn,text,text}', proargmodes => '{i,o,o,o}',
-  proargnames => '{wait_for_archive,lsn,labelfile,spcmapfile}',
+  proallargtypes => '{bool,pg_lsn,text,text,bytea}', proargmodes => '{i,o,o,o,o}',
+  proargnames => '{wait_for_archive,lsn,labelfile,spcmapfile,controlfile}',
   prosrc => 'pg_backup_stop' },
 { oid => '3436', descr => 'promote standby server',
   proname => 'pg_promote', provolatile => 'v', prorettype => 'bool',
@@ -12119,9 +12119,9 @@
 { oid => '3443',
   descr => 'pg_controldata recovery state information as a function',
   proname => 'pg_control_recovery', provolatile => 'v', prorettype => 'record',
-  proargtypes => '', proallargtypes => '{pg_lsn,int4,pg_lsn,pg_lsn,bool}',
-  proargmodes => '{o,o,o,o,o}',
-  proargnames => '{min_recovery_end_lsn,min_recovery_end_timeline,backup_start_lsn,backup_end_lsn,end_of_backup_record_required}',
+  proargtypes => '', proallargtypes => '{pg_lsn,int4,pg_lsn,pg_lsn,bool,bool}',
+  proargmodes => '{o,o,o,o,o,o}',
+  proargnames => '{min_recovery_end_lsn,min_recovery_end_timeline,backup_start_lsn,backup_end_lsn,end_of_backup_record_required,backup_label_required}',
   prosrc => 'pg_control_recovery' },
 
 { oid => '3444',
diff --git a/src/test/recovery/t/002_archiving.pl b/src/test/recovery/t/002_archiving.pl
index bc447330e1..462f1dcf0d 100644
--- a/src/test/recovery/t/002_archiving.pl
+++ b/src/test/recovery/t/002_archiving.pl
@@ -41,6 +41,26 @@ $node_standby->append_conf(
 archive_cleanup_command = 'echo archive_cleanup_done > $archive_cleanup_command_file'
 recovery_end_command = 'echo recovery_ended_done > $recovery_end_command_file'
 ));
+
+# Rename backup_label to verify that recovery will not start without it
+rename("$data_dir/backup_label", "$data_dir/backup_label.tmp")
+  or BAIL_OUT "could not move $data_dir/backup_label";
+
+my $res = run_log(
+	[
+		'pg_ctl', '-D', $node_standby->data_dir, '-l',
+		$node_standby->logfile, 'start'
+	]);
+ok(!$res, 'invalid recovery startup fails');
+
+my $logfile = slurp_file($node_standby->logfile());
+ok($logfile =~ qr/could not find backup_label required for recovery/,
+	'could not find backup_label required for recovery');
+
+# Restore backup_label so recovery proceeds normally
+rename("$data_dir/backup_label.tmp", "$data_dir/backup_label")
+  or BAIL_OUT "could not move $data_dir/backup_label";
+
 $node_standby->start;
 
 # Create some content on primary
diff --git a/src/test/recovery/t/042_low_level_backup.pl b/src/test/recovery/t/042_low_level_backup.pl
index 61d23187e0..bd3a99960f 100644
--- a/src/test/recovery/t/042_low_level_backup.pl
+++ b/src/test/recovery/t/042_low_level_backup.pl
@@ -13,6 +13,42 @@ use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
 use Test::More;
 
+# Decode hex to binary
+sub decode_hex
+{
+	my ($encoded) = @_;
+	my $decoded;
+
+	$encoded =~ s/^\s+|\s+$//g;
+
+	for (my $idx = 0; $idx < length($encoded); $idx += 2)
+	{
+		$decoded .= pack('C', hex(substr($encoded, $idx, 2)));
+	}
+
+	return $decoded;
+}
+
+# Get backup_label/pg_control from pg_stop_backup()
+sub stop_backup_result
+{
+	my ($psql) = @_;
+
+	my $encoded = $psql->query_safe(
+		"select encode(labelfile::bytea, 'hex') || ',' || " .
+		"       encode(controlfile, 'hex')" .
+		"  from pg_backup_stop()");
+
+	my @result;
+
+    foreach my $column (split(',', $encoded))
+	{
+		push(@result, decode_hex($column));
+	}
+
+	return @result;
+}
+
 # Start primary node with archiving.
 my $node_primary = PostgreSQL::Test::Cluster->new('primary');
 $node_primary->init(has_archiving => 1, allows_streaming => 1);
@@ -80,8 +116,7 @@ my $stop_segment_name = $node_primary->safe_psql('postgres',
 	'SELECT pg_walfile_name(pg_current_wal_lsn())');
 
 # Stop backup and get backup_label, the last segment is archived.
-my $backup_label =
-  $psql->query_safe("select labelfile from pg_backup_stop()");
+(my $backup_label, my $pg_control) = stop_backup_result($psql);
 
 $psql->quit;
 
@@ -118,10 +153,36 @@ ok( $node_replica->log_contains(
 $node_replica->teardown_node;
 $node_replica->clean_node;
 
+# Save only pg_control into the backup to demonstrate that pg_control returned
+# from pg_stop_backup() will only perform recovery when backup_label is present.
+open(my $fh, ">", "$backup_dir/global/pg_control")
+  or die "could not open pg_control";
+binmode($fh);
+syswrite($fh, $pg_control);
+close($fh);
+
+$node_replica = PostgreSQL::Test::Cluster->new('replica_fail2');
+$node_replica->init_from_backup($node_primary, $backup_name,
+	has_restoring => 1);
+
+my $res = run_log(
+	[
+		'pg_ctl', '-D', $node_replica->data_dir, '-l',
+		$node_replica->logfile, 'start'
+	]);
+ok(!$res, 'invalid recovery startup fails');
+
+my $logfile = slurp_file($node_replica->logfile());
+ok($logfile =~ qr/could not find backup_label required for recovery/,
+	'could not find backup_label required for recovery');
+
+$node_replica->teardown_node;
+$node_replica->clean_node;
+
 # Save backup_label into the backup directory and recover using the primary's
 # archive.  This time recovery will succeed and the canary table will be
 # present.
-open my $fh, ">>", "$backup_dir/backup_label"
+open $fh, ">>", "$backup_dir/backup_label"
   or die "could not open backup_label";
 # Binary mode is required for Windows, as the backup_label parsing is not
 # able to cope with CRLFs.
-- 
2.34.1



view thread (2+ messages)

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected]
  Subject: Re: Return pg_control from pg_backup_stop().
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox