public inbox for [email protected]help / color / mirror / Atom feed
Re: Return pg_control from pg_backup_stop(). 3+ messages / 2 participants [nested] [flat]
* Re: Return pg_control from pg_backup_stop(). @ 2024-10-03 04:45 Michael Paquier <[email protected]> 2024-10-03 09:11 ` Re: Return pg_control from pg_backup_stop(). David Steele <[email protected]> 0 siblings, 1 reply; 3+ messages in thread From: Michael Paquier @ 2024-10-03 04:45 UTC (permalink / raw) To: David Steele <[email protected]>; +Cc: pgsql-hackers On Wed, Oct 02, 2024 at 09:03:27AM +0000, David Steele wrote: > On 10/2/24 10:11, Michael Paquier wrote: >> Hmm, okay. There is also a slight impact for BASE_BACKUP, requiring >> basebackup_progress_wait_wal_archive() and do_pg_backup_stop() to be >> called earlier when sending the main data directory which is the last >> one in the list of tablespaces. As far as I can see, this does not >> change the logic because do_pg_backup_stop() does not touch the >> control file, but it seems to me that we'd rather keep these two >> calls as they are now, and send the control file once we are out of >> the for loop that processes all the tablespaces. That seems slightly >> cleaner to me, still I agree that both are the same things. > > Sending pg_control later results in even more code churn because of how tar > files are terminated. I've updated it that way in v2 so you can see what I > mean. I don't have a strong preference, though, so if you prefer the > implementation in v2 then that's fine with me. It does not make much of a difference, indeed. >> Anyway, finishing do_pg_backup_stop() and then sending the control >> file is just a consequence of the implementation choice driven by the >> output required for the SQL function so as this is stored in the >> backup state to get it down to pg_backup_stop() in xlogfuncs.c, so we >> could also take one step back and forget about the SQL function, >> setting only the new flag when sending a BASE_BACKUP, or just not use >> the backupState to store this data as the exercise involves forcing >> one boolean and recalculate a CRC32. > > I can definitely see us making other updates to pg_control so I would rather > keep this logic centralized, even though it is not too complicated at this > point. Still, even 8 lines of code (as it is now) seems better not to > duplicate. I was wondering if the field update should be hidden behind a macro that uses an offsetof() on ControlFileData, with the name of the field and a pointer to the value to update to. If you include the CRC32 calculation in that, that makes for less chunks of code when updating one field of the control file. The full CRC calculation could also be hidden inside a macro, as there are a couple of places where we do the same things, like pg_rewind.c, etc. >> We're talking about a 8kB file which has a size of 512B >> (PG_CONTROL_MAX_SAFE_SIZE) to avoid such issues. So I'm not sure to >> see your point here? > > Even at 512B it is possible to see tears in pg_control and they happen in > the build farm right now. In fact, this thread [1] trying to fix the problem > was what got me thinking about alternate solutions to preventing tears in > pg_control. Thomas' proposed fixes have not been committed to my knowledge > so the problem remains, but would be fixed by this commit. Ah, right. That rings a bell. Thomas has done some work with c558e6fd92ff and 63a582222c6b. And we're still not taking the ControlFileLock while copying it over.. It looks like we should do it separately, and backpatch. That's not something for this thread to worry about. >> Perhaps existing >> backup solutions are good enough risk vs reward is not worth it? > > I'm not sure I see the risk here. Saving out pg_control is optional so no > changes to current software is required. Of course they miss the benefit of > the protection against tears and missing backup_label, but that is a choice. > > Again, optional, but if I was able to manage these saves using the psql > interface in the TAP tests then I'd say it would be pretty easy for anyone > with a normal connection to Postgres. Also, we require users to treat > tabelspace_map and backup_label as binary so not too big a change here. Maintenance cost for a limited user impact overall. With incremental backups being a thing in v18 only available through the replication protocol, the SQL functions have less advantages these days. My point would be to see this thread as a two-step process: 1) Update the flag in the control file when sending it across in replication stream. 2) Do the SQL function thing with the bytea for the control file, if necessary. 1) is something that has more value than 2), IMO, because there is no need for a manual step when a backup is taken by the replication protocol. Well, custom backup solutions that rely on the replication protocol to copy the data would need to make sure that they have a backup_label, but that's something they should do anyway and what this patch wants to protect users from. The SQL part is optional IMO. It can be done, but it has less impact overall and makes backups more complicated by requiring the manual copy of the control file. -- Michael Attachments: [application/pgp-signature] signature.asc (833B, 2-signature.asc) download ^ permalink raw reply [nested|flat] 3+ messages in thread
* Re: Return pg_control from pg_backup_stop(). 2024-10-03 04:45 Re: Return pg_control from pg_backup_stop(). Michael Paquier <[email protected]> @ 2024-10-03 09:11 ` David Steele <[email protected]> 2024-11-20 22:44 ` Re: Return pg_control from pg_backup_stop(). David Steele <[email protected]> 0 siblings, 1 reply; 3+ messages in thread From: David Steele @ 2024-10-03 09:11 UTC (permalink / raw) To: Michael Paquier <[email protected]>; +Cc: pgsql-hackers On 10/3/24 07:45, Michael Paquier wrote: > On Wed, Oct 02, 2024 at 09:03:27AM +0000, David Steele wrote: >> On 10/2/24 10:11, Michael Paquier wrote: > >> I can definitely see us making other updates to pg_control so I would rather >> keep this logic centralized, even though it is not too complicated at this >> point. Still, even 8 lines of code (as it is now) seems better not to >> duplicate. > > I was wondering if the field update should be hidden behind a macro > that uses an offsetof() on ControlFileData, with the name of the field > and a pointer to the value to update to. If you include the CRC32 > calculation in that, that makes for less chunks of code when updating > one field of the control file. > > The full CRC calculation could also be hidden inside a macro, as there > are a couple of places where we do the same things, like pg_rewind.c, > etc. This seems to be a different case than pg_rewind, especially since we need a ControlFileLock. I think that is a bit much to do in a macro, so I split the functionality out into a function instead. This simplifies the logic in basebackup.c but has little impact elsewhere. >>> We're talking about a 8kB file which has a size of 512B >>> (PG_CONTROL_MAX_SAFE_SIZE) to avoid such issues. So I'm not sure to >>> see your point here? >> >> Even at 512B it is possible to see tears in pg_control and they happen in >> the build farm right now. In fact, this thread [1] trying to fix the problem >> was what got me thinking about alternate solutions to preventing tears in >> pg_control. Thomas' proposed fixes have not been committed to my knowledge >> so the problem remains, but would be fixed by this commit. > > Ah, right. That rings a bell. Thomas has done some work with > c558e6fd92ff and 63a582222c6b. And we're still not taking the > ControlFileLock while copying it over.. It looks like we should do it > separately, and backpatch. That's not something for this thread to > worry about. I'd be happy to adapt patch 01 to be back-patched (without the new flag) if we decide it is a good idea. Just locking and making a copy of pg_control is easy enough, but if we accept the backup_control_file() function for new versions then we could keep that for the back patch to reduce churn between versions. >>> Perhaps existing >>> backup solutions are good enough risk vs reward is not worth it? >> >> I'm not sure I see the risk here. Saving out pg_control is optional so no >> changes to current software is required. Of course they miss the benefit of >> the protection against tears and missing backup_label, but that is a choice. >> >> Again, optional, but if I was able to manage these saves using the psql >> interface in the TAP tests then I'd say it would be pretty easy for anyone >> with a normal connection to Postgres. Also, we require users to treat >> tabelspace_map and backup_label as binary so not too big a change here. > > Maintenance cost for a limited user impact overall. With incremental > backups being a thing in v18 only available through the replication > protocol, the SQL functions have less advantages these days. My point > would be to see this thread as a two-step process: > 1) Update the flag in the control file when sending it across in > replication stream. > 2) Do the SQL function thing with the bytea for the control file, if > necessary. OK, I have split the patch into two parts along these lines. > 1) is something that has more value than 2), IMO, because there is no > need for a manual step when a backup is taken by the replication > protocol. Well, custom backup solutions that rely on the replication > protocol to copy the data would need to make sure that they have a > backup_label, but that's something they should do anyway and what this > patch wants to protect users from. The SQL part is optional IMO. It > can be done, but it has less impact overall and makes backups more > complicated by requiring the manual copy of the control file. I don't think having incremental backup in pg_basebackup means alternate backup solutions are going away or that we should deprecate the SQL interface. If nothing else, third-party solutions need a way to get an untorn copy of pg_control and in general I think the new flag will be universally useful. Regards, -David From 77df2c779b57dde90f81d5244b55e4d0d644137d Mon Sep 17 00:00:00 2001 From: David Steele <[email protected]> Date: Thu, 3 Oct 2024 08:41:45 +0000 Subject: Add pg_control flag to prevent recovery without backup_label. Harden recovery by adding a flag to pg_control to indicate that backup_label is required. This prevents the user from deleting backup_label resulting in an inconsistent recovery. Another advantage is that the copy of pg_control used by pg_basebackup is guaranteed not to be torn. This functionality is limited to pg_basebackup (or any software comfortable with modifying pg_control). Control and catalog version bumps are required. --- doc/src/sgml/func.sgml | 5 +++++ src/backend/access/transam/xlog.c | 23 +++++++++++++++++++++++ src/backend/access/transam/xlogrecovery.c | 10 +++++++++- src/backend/backup/basebackup.c | 15 ++++++--------- src/backend/utils/misc/pg_controldata.c | 7 +++++-- src/bin/pg_controldata/pg_controldata.c | 2 ++ src/bin/pg_resetwal/pg_resetwal.c | 1 + src/bin/pg_rewind/pg_rewind.c | 1 + src/include/access/xlog.h | 1 + src/include/catalog/pg_control.h | 4 ++++ src/include/catalog/pg_proc.dat | 6 +++--- src/test/recovery/t/002_archiving.pl | 20 ++++++++++++++++++++ 12 files changed, 80 insertions(+), 15 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 7b4fbb5047..786269b920 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -27899,6 +27899,11 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres} <entry><type>boolean</type></entry> </row> + <row> + <entry><structfield>backup_label_required</structfield></entry> + <entry><type>boolean</type></entry> + </row> + </tbody> </tgroup> </table> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 9102c8d772..79065b55ef 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -9412,6 +9412,29 @@ do_pg_abort_backup(int code, Datum arg) } } +/* + * Create a consistent copy of control data to be used for backup and update it + * to require a backup label for recovery. Also recalculate the CRC. + */ +void +backup_control_file(uint8_t *controlFile) +{ + ControlFileData *controlData = ((ControlFileData *)controlFile); + + memset(controlFile + sizeof(ControlFileData), 0, + PG_CONTROL_FILE_SIZE - sizeof(ControlFileData)); + + LWLockAcquire(ControlFileLock, LW_SHARED); + memcpy(controlFile, ControlFile, sizeof(ControlFileData)); + LWLockRelease(ControlFileLock); + + controlData->backupLabelRequired = true; + + INIT_CRC32C(controlData->crc); + COMP_CRC32C(controlData->crc, controlFile, offsetof(ControlFileData, crc)); + FIN_CRC32C(controlData->crc); +} + /* * Register a handler that will warn about unterminated backups at end of * session, unless this has already been done. diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c index 320b14add1..193b7e2045 100644 --- a/src/backend/access/transam/xlogrecovery.c +++ b/src/backend/access/transam/xlogrecovery.c @@ -704,7 +704,14 @@ InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdown_ptr, } else { - /* No backup_label file has been found if we are here. */ + /* + * No backup_label file has been found if we are here. Error if the + * control file requires backup_label. + */ + if (ControlFile->backupLabelRequired) + ereport(FATAL, + (errmsg("could not find backup_label required for recovery"), + errhint("backup_label must be present for recovery to succeed"))); /* * If tablespace_map file is present without backup_label file, there @@ -977,6 +984,7 @@ InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdown_ptr, { ControlFile->backupStartPoint = checkPoint.redo; ControlFile->backupEndRequired = backupEndRequired; + ControlFile->backupLabelRequired = false; if (backupFromStandby) { diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c index 14e5ba72e9..ce7fd76e48 100644 --- a/src/backend/backup/basebackup.c +++ b/src/backend/backup/basebackup.c @@ -23,6 +23,7 @@ #include "backup/basebackup_incremental.h" #include "backup/basebackup_sink.h" #include "backup/basebackup_target.h" +#include "catalog/pg_control.h" #include "catalog/pg_tablespace_d.h" #include "commands/defrem.h" #include "common/compression.h" @@ -325,9 +326,9 @@ perform_base_backup(basebackup_options *opt, bbsink *sink, if (ti->path == NULL) { - struct stat statbuf; bool sendtblspclinks = true; char *backup_label; + uint8_t controlFile[PG_CONTROL_FILE_SIZE]; bbsink_begin_archive(sink, "base.tar"); @@ -350,14 +351,10 @@ perform_base_backup(basebackup_options *opt, bbsink *sink, sendtblspclinks, &manifest, InvalidOid, ib); /* ... and pg_control after everything else. */ - if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0) - ereport(ERROR, - (errcode_for_file_access(), - errmsg("could not stat file \"%s\": %m", - XLOG_CONTROL_FILE))); - sendFile(sink, XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf, - false, InvalidOid, InvalidOid, - InvalidRelFileNumber, 0, &manifest, 0, NULL, 0); + backup_control_file(controlFile); + sendFileWithContent(sink, XLOG_CONTROL_FILE, + (char *)controlFile, PG_CONTROL_FILE_SIZE, + &manifest); } else { diff --git a/src/backend/utils/misc/pg_controldata.c b/src/backend/utils/misc/pg_controldata.c index 98c932dc7b..9eaf3f8b9f 100644 --- a/src/backend/utils/misc/pg_controldata.c +++ b/src/backend/utils/misc/pg_controldata.c @@ -162,8 +162,8 @@ pg_control_checkpoint(PG_FUNCTION_ARGS) Datum pg_control_recovery(PG_FUNCTION_ARGS) { - Datum values[5]; - bool nulls[5]; + Datum values[6]; + bool nulls[6]; TupleDesc tupdesc; HeapTuple htup; ControlFileData *ControlFile; @@ -195,6 +195,9 @@ pg_control_recovery(PG_FUNCTION_ARGS) values[4] = BoolGetDatum(ControlFile->backupEndRequired); nulls[4] = false; + values[5] = BoolGetDatum(ControlFile->backupLabelRequired); + nulls[5] = false; + htup = heap_form_tuple(tupdesc, values, nulls); PG_RETURN_DATUM(HeapTupleGetDatum(htup)); diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c index 93a05d80ca..8cc29904c6 100644 --- a/src/bin/pg_controldata/pg_controldata.c +++ b/src/bin/pg_controldata/pg_controldata.c @@ -283,6 +283,8 @@ main(int argc, char *argv[]) LSN_FORMAT_ARGS(ControlFile->backupEndPoint)); printf(_("End-of-backup record required: %s\n"), ControlFile->backupEndRequired ? _("yes") : _("no")); + printf(_("Backup label required: %s\n"), + ControlFile->backupLabelRequired ? _("yes") : _("no")); printf(_("wal_level setting: %s\n"), wal_level_str(ControlFile->wal_level)); printf(_("wal_log_hints setting: %s\n"), diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c index e9dcb5a6d8..7056752cff 100644 --- a/src/bin/pg_resetwal/pg_resetwal.c +++ b/src/bin/pg_resetwal/pg_resetwal.c @@ -875,6 +875,7 @@ RewriteControlFile(void) ControlFile.backupStartPoint = 0; ControlFile.backupEndPoint = 0; ControlFile.backupEndRequired = false; + ControlFile.backupLabelRequired = false; /* * Force the defaults for max_* settings. The values don't really matter diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c index 960916a1e8..79a44b70ee 100644 --- a/src/bin/pg_rewind/pg_rewind.c +++ b/src/bin/pg_rewind/pg_rewind.c @@ -722,6 +722,7 @@ perform_rewind(filemap_t *filemap, rewind_source *source, ControlFile_new.minRecoveryPoint = endrec; ControlFile_new.minRecoveryPointTLI = endtli; ControlFile_new.state = DB_IN_ARCHIVE_RECOVERY; + ControlFile_new.backupLabelRequired = true; if (!dry_run) update_controlfile(datadir_target, &ControlFile_new, do_sync); } diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h index 34ad46c067..9d5d8ed43c 100644 --- a/src/include/access/xlog.h +++ b/src/include/access/xlog.h @@ -295,6 +295,7 @@ extern void do_pg_backup_start(const char *backupidstr, bool fast, StringInfo tblspcmapfile); extern void do_pg_backup_stop(BackupState *state, bool waitforarchive); extern void do_pg_abort_backup(int code, Datum arg); +extern void backup_control_file(uint8_t *controlFile); extern void register_persistent_abort_backup_handler(void); extern SessionBackupState get_backup_status(void); diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h index e80ff8e414..b471a9b02e 100644 --- a/src/include/catalog/pg_control.h +++ b/src/include/catalog/pg_control.h @@ -164,12 +164,16 @@ typedef struct ControlFileData * If backupEndRequired is true, we know for sure that we're restoring * from a backup, and must see a backup-end record before we can safely * start up. + * + * If backupLabelRequired is true, then a backup_label file must be + * present in order for recovery to succeed. */ XLogRecPtr minRecoveryPoint; TimeLineID minRecoveryPointTLI; XLogRecPtr backupStartPoint; XLogRecPtr backupEndPoint; bool backupEndRequired; + bool backupLabelRequired; /* * Parameter settings that determine if the WAL can be used for archival diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 77f54a79e6..7cc5421ea1 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -12119,9 +12119,9 @@ { oid => '3443', descr => 'pg_controldata recovery state information as a function', proname => 'pg_control_recovery', provolatile => 'v', prorettype => 'record', - proargtypes => '', proallargtypes => '{pg_lsn,int4,pg_lsn,pg_lsn,bool}', - proargmodes => '{o,o,o,o,o}', - proargnames => '{min_recovery_end_lsn,min_recovery_end_timeline,backup_start_lsn,backup_end_lsn,end_of_backup_record_required}', + proargtypes => '', proallargtypes => '{pg_lsn,int4,pg_lsn,pg_lsn,bool,bool}', + proargmodes => '{o,o,o,o,o,o}', + proargnames => '{min_recovery_end_lsn,min_recovery_end_timeline,backup_start_lsn,backup_end_lsn,end_of_backup_record_required,backup_label_required}', prosrc => 'pg_control_recovery' }, { oid => '3444', diff --git a/src/test/recovery/t/002_archiving.pl b/src/test/recovery/t/002_archiving.pl index bc447330e1..462f1dcf0d 100644 --- a/src/test/recovery/t/002_archiving.pl +++ b/src/test/recovery/t/002_archiving.pl @@ -41,6 +41,26 @@ $node_standby->append_conf( archive_cleanup_command = 'echo archive_cleanup_done > $archive_cleanup_command_file' recovery_end_command = 'echo recovery_ended_done > $recovery_end_command_file' )); + +# Rename backup_label to verify that recovery will not start without it +rename("$data_dir/backup_label", "$data_dir/backup_label.tmp") + or BAIL_OUT "could not move $data_dir/backup_label"; + +my $res = run_log( + [ + 'pg_ctl', '-D', $node_standby->data_dir, '-l', + $node_standby->logfile, 'start' + ]); +ok(!$res, 'invalid recovery startup fails'); + +my $logfile = slurp_file($node_standby->logfile()); +ok($logfile =~ qr/could not find backup_label required for recovery/, + 'could not find backup_label required for recovery'); + +# Restore backup_label so recovery proceeds normally +rename("$data_dir/backup_label.tmp", "$data_dir/backup_label") + or BAIL_OUT "could not move $data_dir/backup_label"; + $node_standby->start; # Create some content on primary -- 2.34.1 From dba5a2a57e5472682a9be6d4badf1e8127768610 Mon Sep 17 00:00:00 2001 From: David Steele <[email protected]> Date: Thu, 3 Oct 2024 08:41:45 +0000 Subject: Return pg_control from pg_backup_stop(). Harden recovery by returning a copy of pg_control from pg_backup_stop() that has a flag set to prevent recovery if the backup_label file is missing. Instead of backup software copying pg_control from PGDATA, it stores an updated version that is returned from pg_backup_stop(). This is better for the following reasons: * The user can no longer remove backup_label and get what looks like a successful recovery (while almost certainly causing corruption). If backup_label is removed the cluster will not start. The user may try pg_resetwal, but that tool makes it pretty clear that corruption will result from its use. * We don't need to worry about backup software seeing a torn copy of pg_control, since Postgres can safely read it out of memory and provide a valid copy via pg_backup_stop(). This solves torn reads without needing to write pg_control via a temp file, which may affect performance on a standby. * For backup from standby, we no longer need to instruct the backup software to copy pg_control last. In fact the backup software should not copy pg_control from PGDATA at all. These changes have no impact on current backup software and they are free to use the pg_control available from pg_stop_backup() or continue to use pg_control from PGDATA. Of course they will miss the benefits of getting a consistent copy of pg_control and the backup_label checking, but will be no worse off than before. Catalog version bump is required. --- doc/src/sgml/backup.sgml | 18 +++++- doc/src/sgml/func.sgml | 5 +- src/backend/access/transam/xlogfuncs.c | 17 ++++-- src/backend/catalog/system_functions.sql | 2 +- src/include/catalog/pg_proc.dat | 4 +- src/test/recovery/t/042_low_level_backup.pl | 67 ++++++++++++++++++++- 6 files changed, 98 insertions(+), 15 deletions(-) diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml index e4e4c56cf1..2fcf181121 100644 --- a/doc/src/sgml/backup.sgml +++ b/doc/src/sgml/backup.sgml @@ -1021,9 +1021,12 @@ SELECT * FROM pg_backup_stop(wait_for_archive => true); values. The second of these fields should be written to a file named <filename>backup_label</filename> in the root directory of the backup. The third field should be written to a file named - <filename>tablespace_map</filename> unless the field is empty. These files are + <filename>tablespace_map</filename> unless the field is empty. The fourth + field should be written into a file named + <filename>global/pg_control</filename> (overwriting the existing file when + present). These files are vital to the backup working and must be written byte for byte without - modification, which may require opening the file in binary mode. + modification, which will require opening the file in binary mode. </para> </listitem> <listitem> @@ -1095,7 +1098,16 @@ SELECT * FROM pg_backup_stop(wait_for_archive => true); </para> <para> - You should, however, omit from the backup the files within the + You should exclude <filename>global/pg_control</filename> from your backup + and put the contents of the <parameter>controlfile</parameter> column + returned from <function>pg_backup_stop</function> in your backup at + <filename>global/pg_control</filename>. This version of pg_control contains + safeguards against recovery without backup_label present and is guaranteed + not to be torn. + </para> + + <para> + You should also omit from the backup the files within the cluster's <filename>pg_wal/</filename> subdirectory. This slight adjustment is worthwhile because it reduces the risk of mistakes when restoring. This is easy to arrange if diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 786269b920..09204db7cd 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -28581,8 +28581,9 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560 The result of the function is a single record. The <parameter>lsn</parameter> column holds the backup's ending write-ahead log location (which again can be ignored). The second - column returns the contents of the backup label file, and the third - column returns the contents of the tablespace map file. These must be + column returns the contents of the backup label file, the third column + returns the contents of the tablespace map file, and the fourth column + returns the contents of pg_control. These must be stored as part of the backup and are required as part of the restore process. </para> diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c index 3e3d2bb618..a7d7c1bcbe 100644 --- a/src/backend/access/transam/xlogfuncs.c +++ b/src/backend/access/transam/xlogfuncs.c @@ -115,9 +115,11 @@ pg_backup_start(PG_FUNCTION_ARGS) * * The backup_label contains the user-supplied label string (typically this * would be used to tell where the backup dump will be stored), the starting - * time, starting WAL location for the dump and so on. It is the caller's - * responsibility to write the backup_label and tablespace_map files in the - * data folder that will be restored from this backup. + * time, starting WAL location for the dump and so on. The pg_control file + * contains a consistent copy of pg_control that also has a safeguard against + * being used without backup_label. It is the caller's responsibility to write + * the backup_label, pg_control, and tablespace_map files in the data folder + * that will be restored from this backup. * * Permission checking for this function is managed through the normal * GRANT system. @@ -125,12 +127,13 @@ pg_backup_start(PG_FUNCTION_ARGS) Datum pg_backup_stop(PG_FUNCTION_ARGS) { -#define PG_BACKUP_STOP_V2_COLS 3 +#define PG_BACKUP_STOP_V2_COLS 4 TupleDesc tupdesc; Datum values[PG_BACKUP_STOP_V2_COLS] = {0}; bool nulls[PG_BACKUP_STOP_V2_COLS] = {0}; bool waitforarchive = PG_GETARG_BOOL(0); char *backup_label; + bytea *pg_control_bytea; SessionBackupState status = get_backup_status(); /* Initialize attributes information in the tuple descriptor */ @@ -152,9 +155,15 @@ pg_backup_stop(PG_FUNCTION_ARGS) /* Build the contents of backup_label */ backup_label = build_backup_content(backup_state, false); + /* Build the contents of pg_control */ + pg_control_bytea = (bytea *) palloc(PG_CONTROL_FILE_SIZE + VARHDRSZ); + SET_VARSIZE(pg_control_bytea, PG_CONTROL_FILE_SIZE + VARHDRSZ); + backup_control_file((uint8_t *)VARDATA(pg_control_bytea)); + values[0] = LSNGetDatum(backup_state->stoppoint); values[1] = CStringGetTextDatum(backup_label); values[2] = CStringGetTextDatum(tablespace_map->data); + values[3] = PointerGetDatum(pg_control_bytea); /* Deallocate backup-related variables */ pfree(backup_label); diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql index b0d0de051e..758f90b2c3 100644 --- a/src/backend/catalog/system_functions.sql +++ b/src/backend/catalog/system_functions.sql @@ -390,7 +390,7 @@ CREATE OR REPLACE FUNCTION CREATE OR REPLACE FUNCTION pg_backup_stop ( wait_for_archive boolean DEFAULT true, OUT lsn pg_lsn, - OUT labelfile text, OUT spcmapfile text) + OUT labelfile text, OUT spcmapfile text, OUT controlfile bytea) RETURNS record STRICT VOLATILE LANGUAGE internal as 'pg_backup_stop' PARALLEL RESTRICTED; diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 7cc5421ea1..ea93241eed 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -6569,8 +6569,8 @@ { oid => '2739', descr => 'finish taking an online backup', proname => 'pg_backup_stop', provolatile => 'v', proparallel => 'r', prorettype => 'record', proargtypes => 'bool', - proallargtypes => '{bool,pg_lsn,text,text}', proargmodes => '{i,o,o,o}', - proargnames => '{wait_for_archive,lsn,labelfile,spcmapfile}', + proallargtypes => '{bool,pg_lsn,text,text,bytea}', proargmodes => '{i,o,o,o,o}', + proargnames => '{wait_for_archive,lsn,labelfile,spcmapfile,controlfile}', prosrc => 'pg_backup_stop' }, { oid => '3436', descr => 'promote standby server', proname => 'pg_promote', provolatile => 'v', prorettype => 'bool', diff --git a/src/test/recovery/t/042_low_level_backup.pl b/src/test/recovery/t/042_low_level_backup.pl index 61d23187e0..bd3a99960f 100644 --- a/src/test/recovery/t/042_low_level_backup.pl +++ b/src/test/recovery/t/042_low_level_backup.pl @@ -13,6 +13,42 @@ use PostgreSQL::Test::Cluster; use PostgreSQL::Test::Utils; use Test::More; +# Decode hex to binary +sub decode_hex +{ + my ($encoded) = @_; + my $decoded; + + $encoded =~ s/^\s+|\s+$//g; + + for (my $idx = 0; $idx < length($encoded); $idx += 2) + { + $decoded .= pack('C', hex(substr($encoded, $idx, 2))); + } + + return $decoded; +} + +# Get backup_label/pg_control from pg_stop_backup() +sub stop_backup_result +{ + my ($psql) = @_; + + my $encoded = $psql->query_safe( + "select encode(labelfile::bytea, 'hex') || ',' || " . + " encode(controlfile, 'hex')" . + " from pg_backup_stop()"); + + my @result; + + foreach my $column (split(',', $encoded)) + { + push(@result, decode_hex($column)); + } + + return @result; +} + # Start primary node with archiving. my $node_primary = PostgreSQL::Test::Cluster->new('primary'); $node_primary->init(has_archiving => 1, allows_streaming => 1); @@ -80,8 +116,7 @@ my $stop_segment_name = $node_primary->safe_psql('postgres', 'SELECT pg_walfile_name(pg_current_wal_lsn())'); # Stop backup and get backup_label, the last segment is archived. -my $backup_label = - $psql->query_safe("select labelfile from pg_backup_stop()"); +(my $backup_label, my $pg_control) = stop_backup_result($psql); $psql->quit; @@ -118,10 +153,36 @@ ok( $node_replica->log_contains( $node_replica->teardown_node; $node_replica->clean_node; +# Save only pg_control into the backup to demonstrate that pg_control returned +# from pg_stop_backup() will only perform recovery when backup_label is present. +open(my $fh, ">", "$backup_dir/global/pg_control") + or die "could not open pg_control"; +binmode($fh); +syswrite($fh, $pg_control); +close($fh); + +$node_replica = PostgreSQL::Test::Cluster->new('replica_fail2'); +$node_replica->init_from_backup($node_primary, $backup_name, + has_restoring => 1); + +my $res = run_log( + [ + 'pg_ctl', '-D', $node_replica->data_dir, '-l', + $node_replica->logfile, 'start' + ]); +ok(!$res, 'invalid recovery startup fails'); + +my $logfile = slurp_file($node_replica->logfile()); +ok($logfile =~ qr/could not find backup_label required for recovery/, + 'could not find backup_label required for recovery'); + +$node_replica->teardown_node; +$node_replica->clean_node; + # Save backup_label into the backup directory and recover using the primary's # archive. This time recovery will succeed and the canary table will be # present. -open my $fh, ">>", "$backup_dir/backup_label" +open $fh, ">>", "$backup_dir/backup_label" or die "could not open backup_label"; # Binary mode is required for Windows, as the backup_label parsing is not # able to cope with CRLFs. -- 2.34.1 Attachments: [text/plain] pgcontrol-flag-v3-01-basebackup.patch (11.4K, 2-pgcontrol-flag-v3-01-basebackup.patch) download | inline diff: From 77df2c779b57dde90f81d5244b55e4d0d644137d Mon Sep 17 00:00:00 2001 From: David Steele <[email protected]> Date: Thu, 3 Oct 2024 08:41:45 +0000 Subject: Add pg_control flag to prevent recovery without backup_label. Harden recovery by adding a flag to pg_control to indicate that backup_label is required. This prevents the user from deleting backup_label resulting in an inconsistent recovery. Another advantage is that the copy of pg_control used by pg_basebackup is guaranteed not to be torn. This functionality is limited to pg_basebackup (or any software comfortable with modifying pg_control). Control and catalog version bumps are required. --- doc/src/sgml/func.sgml | 5 +++++ src/backend/access/transam/xlog.c | 23 +++++++++++++++++++++++ src/backend/access/transam/xlogrecovery.c | 10 +++++++++- src/backend/backup/basebackup.c | 15 ++++++--------- src/backend/utils/misc/pg_controldata.c | 7 +++++-- src/bin/pg_controldata/pg_controldata.c | 2 ++ src/bin/pg_resetwal/pg_resetwal.c | 1 + src/bin/pg_rewind/pg_rewind.c | 1 + src/include/access/xlog.h | 1 + src/include/catalog/pg_control.h | 4 ++++ src/include/catalog/pg_proc.dat | 6 +++--- src/test/recovery/t/002_archiving.pl | 20 ++++++++++++++++++++ 12 files changed, 80 insertions(+), 15 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 7b4fbb5047..786269b920 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -27899,6 +27899,11 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres} <entry><type>boolean</type></entry> </row> + <row> + <entry><structfield>backup_label_required</structfield></entry> + <entry><type>boolean</type></entry> + </row> + </tbody> </tgroup> </table> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 9102c8d772..79065b55ef 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -9412,6 +9412,29 @@ do_pg_abort_backup(int code, Datum arg) } } +/* + * Create a consistent copy of control data to be used for backup and update it + * to require a backup label for recovery. Also recalculate the CRC. + */ +void +backup_control_file(uint8_t *controlFile) +{ + ControlFileData *controlData = ((ControlFileData *)controlFile); + + memset(controlFile + sizeof(ControlFileData), 0, + PG_CONTROL_FILE_SIZE - sizeof(ControlFileData)); + + LWLockAcquire(ControlFileLock, LW_SHARED); + memcpy(controlFile, ControlFile, sizeof(ControlFileData)); + LWLockRelease(ControlFileLock); + + controlData->backupLabelRequired = true; + + INIT_CRC32C(controlData->crc); + COMP_CRC32C(controlData->crc, controlFile, offsetof(ControlFileData, crc)); + FIN_CRC32C(controlData->crc); +} + /* * Register a handler that will warn about unterminated backups at end of * session, unless this has already been done. diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c index 320b14add1..193b7e2045 100644 --- a/src/backend/access/transam/xlogrecovery.c +++ b/src/backend/access/transam/xlogrecovery.c @@ -704,7 +704,14 @@ InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdown_ptr, } else { - /* No backup_label file has been found if we are here. */ + /* + * No backup_label file has been found if we are here. Error if the + * control file requires backup_label. + */ + if (ControlFile->backupLabelRequired) + ereport(FATAL, + (errmsg("could not find backup_label required for recovery"), + errhint("backup_label must be present for recovery to succeed"))); /* * If tablespace_map file is present without backup_label file, there @@ -977,6 +984,7 @@ InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdown_ptr, { ControlFile->backupStartPoint = checkPoint.redo; ControlFile->backupEndRequired = backupEndRequired; + ControlFile->backupLabelRequired = false; if (backupFromStandby) { diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c index 14e5ba72e9..ce7fd76e48 100644 --- a/src/backend/backup/basebackup.c +++ b/src/backend/backup/basebackup.c @@ -23,6 +23,7 @@ #include "backup/basebackup_incremental.h" #include "backup/basebackup_sink.h" #include "backup/basebackup_target.h" +#include "catalog/pg_control.h" #include "catalog/pg_tablespace_d.h" #include "commands/defrem.h" #include "common/compression.h" @@ -325,9 +326,9 @@ perform_base_backup(basebackup_options *opt, bbsink *sink, if (ti->path == NULL) { - struct stat statbuf; bool sendtblspclinks = true; char *backup_label; + uint8_t controlFile[PG_CONTROL_FILE_SIZE]; bbsink_begin_archive(sink, "base.tar"); @@ -350,14 +351,10 @@ perform_base_backup(basebackup_options *opt, bbsink *sink, sendtblspclinks, &manifest, InvalidOid, ib); /* ... and pg_control after everything else. */ - if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0) - ereport(ERROR, - (errcode_for_file_access(), - errmsg("could not stat file \"%s\": %m", - XLOG_CONTROL_FILE))); - sendFile(sink, XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf, - false, InvalidOid, InvalidOid, - InvalidRelFileNumber, 0, &manifest, 0, NULL, 0); + backup_control_file(controlFile); + sendFileWithContent(sink, XLOG_CONTROL_FILE, + (char *)controlFile, PG_CONTROL_FILE_SIZE, + &manifest); } else { diff --git a/src/backend/utils/misc/pg_controldata.c b/src/backend/utils/misc/pg_controldata.c index 98c932dc7b..9eaf3f8b9f 100644 --- a/src/backend/utils/misc/pg_controldata.c +++ b/src/backend/utils/misc/pg_controldata.c @@ -162,8 +162,8 @@ pg_control_checkpoint(PG_FUNCTION_ARGS) Datum pg_control_recovery(PG_FUNCTION_ARGS) { - Datum values[5]; - bool nulls[5]; + Datum values[6]; + bool nulls[6]; TupleDesc tupdesc; HeapTuple htup; ControlFileData *ControlFile; @@ -195,6 +195,9 @@ pg_control_recovery(PG_FUNCTION_ARGS) values[4] = BoolGetDatum(ControlFile->backupEndRequired); nulls[4] = false; + values[5] = BoolGetDatum(ControlFile->backupLabelRequired); + nulls[5] = false; + htup = heap_form_tuple(tupdesc, values, nulls); PG_RETURN_DATUM(HeapTupleGetDatum(htup)); diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c index 93a05d80ca..8cc29904c6 100644 --- a/src/bin/pg_controldata/pg_controldata.c +++ b/src/bin/pg_controldata/pg_controldata.c @@ -283,6 +283,8 @@ main(int argc, char *argv[]) LSN_FORMAT_ARGS(ControlFile->backupEndPoint)); printf(_("End-of-backup record required: %s\n"), ControlFile->backupEndRequired ? _("yes") : _("no")); + printf(_("Backup label required: %s\n"), + ControlFile->backupLabelRequired ? _("yes") : _("no")); printf(_("wal_level setting: %s\n"), wal_level_str(ControlFile->wal_level)); printf(_("wal_log_hints setting: %s\n"), diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c index e9dcb5a6d8..7056752cff 100644 --- a/src/bin/pg_resetwal/pg_resetwal.c +++ b/src/bin/pg_resetwal/pg_resetwal.c @@ -875,6 +875,7 @@ RewriteControlFile(void) ControlFile.backupStartPoint = 0; ControlFile.backupEndPoint = 0; ControlFile.backupEndRequired = false; + ControlFile.backupLabelRequired = false; /* * Force the defaults for max_* settings. The values don't really matter diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c index 960916a1e8..79a44b70ee 100644 --- a/src/bin/pg_rewind/pg_rewind.c +++ b/src/bin/pg_rewind/pg_rewind.c @@ -722,6 +722,7 @@ perform_rewind(filemap_t *filemap, rewind_source *source, ControlFile_new.minRecoveryPoint = endrec; ControlFile_new.minRecoveryPointTLI = endtli; ControlFile_new.state = DB_IN_ARCHIVE_RECOVERY; + ControlFile_new.backupLabelRequired = true; if (!dry_run) update_controlfile(datadir_target, &ControlFile_new, do_sync); } diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h index 34ad46c067..9d5d8ed43c 100644 --- a/src/include/access/xlog.h +++ b/src/include/access/xlog.h @@ -295,6 +295,7 @@ extern void do_pg_backup_start(const char *backupidstr, bool fast, StringInfo tblspcmapfile); extern void do_pg_backup_stop(BackupState *state, bool waitforarchive); extern void do_pg_abort_backup(int code, Datum arg); +extern void backup_control_file(uint8_t *controlFile); extern void register_persistent_abort_backup_handler(void); extern SessionBackupState get_backup_status(void); diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h index e80ff8e414..b471a9b02e 100644 --- a/src/include/catalog/pg_control.h +++ b/src/include/catalog/pg_control.h @@ -164,12 +164,16 @@ typedef struct ControlFileData * If backupEndRequired is true, we know for sure that we're restoring * from a backup, and must see a backup-end record before we can safely * start up. + * + * If backupLabelRequired is true, then a backup_label file must be + * present in order for recovery to succeed. */ XLogRecPtr minRecoveryPoint; TimeLineID minRecoveryPointTLI; XLogRecPtr backupStartPoint; XLogRecPtr backupEndPoint; bool backupEndRequired; + bool backupLabelRequired; /* * Parameter settings that determine if the WAL can be used for archival diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 77f54a79e6..7cc5421ea1 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -12119,9 +12119,9 @@ { oid => '3443', descr => 'pg_controldata recovery state information as a function', proname => 'pg_control_recovery', provolatile => 'v', prorettype => 'record', - proargtypes => '', proallargtypes => '{pg_lsn,int4,pg_lsn,pg_lsn,bool}', - proargmodes => '{o,o,o,o,o}', - proargnames => '{min_recovery_end_lsn,min_recovery_end_timeline,backup_start_lsn,backup_end_lsn,end_of_backup_record_required}', + proargtypes => '', proallargtypes => '{pg_lsn,int4,pg_lsn,pg_lsn,bool,bool}', + proargmodes => '{o,o,o,o,o,o}', + proargnames => '{min_recovery_end_lsn,min_recovery_end_timeline,backup_start_lsn,backup_end_lsn,end_of_backup_record_required,backup_label_required}', prosrc => 'pg_control_recovery' }, { oid => '3444', diff --git a/src/test/recovery/t/002_archiving.pl b/src/test/recovery/t/002_archiving.pl index bc447330e1..462f1dcf0d 100644 --- a/src/test/recovery/t/002_archiving.pl +++ b/src/test/recovery/t/002_archiving.pl @@ -41,6 +41,26 @@ $node_standby->append_conf( archive_cleanup_command = 'echo archive_cleanup_done > $archive_cleanup_command_file' recovery_end_command = 'echo recovery_ended_done > $recovery_end_command_file' )); + +# Rename backup_label to verify that recovery will not start without it +rename("$data_dir/backup_label", "$data_dir/backup_label.tmp") + or BAIL_OUT "could not move $data_dir/backup_label"; + +my $res = run_log( + [ + 'pg_ctl', '-D', $node_standby->data_dir, '-l', + $node_standby->logfile, 'start' + ]); +ok(!$res, 'invalid recovery startup fails'); + +my $logfile = slurp_file($node_standby->logfile()); +ok($logfile =~ qr/could not find backup_label required for recovery/, + 'could not find backup_label required for recovery'); + +# Restore backup_label so recovery proceeds normally +rename("$data_dir/backup_label.tmp", "$data_dir/backup_label") + or BAIL_OUT "could not move $data_dir/backup_label"; + $node_standby->start; # Create some content on primary -- 2.34.1 [text/plain] pgcontrol-flag-v3-02-sql.patch (11.3K, 3-pgcontrol-flag-v3-02-sql.patch) download | inline diff: From dba5a2a57e5472682a9be6d4badf1e8127768610 Mon Sep 17 00:00:00 2001 From: David Steele <[email protected]> Date: Thu, 3 Oct 2024 08:41:45 +0000 Subject: Return pg_control from pg_backup_stop(). Harden recovery by returning a copy of pg_control from pg_backup_stop() that has a flag set to prevent recovery if the backup_label file is missing. Instead of backup software copying pg_control from PGDATA, it stores an updated version that is returned from pg_backup_stop(). This is better for the following reasons: * The user can no longer remove backup_label and get what looks like a successful recovery (while almost certainly causing corruption). If backup_label is removed the cluster will not start. The user may try pg_resetwal, but that tool makes it pretty clear that corruption will result from its use. * We don't need to worry about backup software seeing a torn copy of pg_control, since Postgres can safely read it out of memory and provide a valid copy via pg_backup_stop(). This solves torn reads without needing to write pg_control via a temp file, which may affect performance on a standby. * For backup from standby, we no longer need to instruct the backup software to copy pg_control last. In fact the backup software should not copy pg_control from PGDATA at all. These changes have no impact on current backup software and they are free to use the pg_control available from pg_stop_backup() or continue to use pg_control from PGDATA. Of course they will miss the benefits of getting a consistent copy of pg_control and the backup_label checking, but will be no worse off than before. Catalog version bump is required. --- doc/src/sgml/backup.sgml | 18 +++++- doc/src/sgml/func.sgml | 5 +- src/backend/access/transam/xlogfuncs.c | 17 ++++-- src/backend/catalog/system_functions.sql | 2 +- src/include/catalog/pg_proc.dat | 4 +- src/test/recovery/t/042_low_level_backup.pl | 67 ++++++++++++++++++++- 6 files changed, 98 insertions(+), 15 deletions(-) diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml index e4e4c56cf1..2fcf181121 100644 --- a/doc/src/sgml/backup.sgml +++ b/doc/src/sgml/backup.sgml @@ -1021,9 +1021,12 @@ SELECT * FROM pg_backup_stop(wait_for_archive => true); values. The second of these fields should be written to a file named <filename>backup_label</filename> in the root directory of the backup. The third field should be written to a file named - <filename>tablespace_map</filename> unless the field is empty. These files are + <filename>tablespace_map</filename> unless the field is empty. The fourth + field should be written into a file named + <filename>global/pg_control</filename> (overwriting the existing file when + present). These files are vital to the backup working and must be written byte for byte without - modification, which may require opening the file in binary mode. + modification, which will require opening the file in binary mode. </para> </listitem> <listitem> @@ -1095,7 +1098,16 @@ SELECT * FROM pg_backup_stop(wait_for_archive => true); </para> <para> - You should, however, omit from the backup the files within the + You should exclude <filename>global/pg_control</filename> from your backup + and put the contents of the <parameter>controlfile</parameter> column + returned from <function>pg_backup_stop</function> in your backup at + <filename>global/pg_control</filename>. This version of pg_control contains + safeguards against recovery without backup_label present and is guaranteed + not to be torn. + </para> + + <para> + You should also omit from the backup the files within the cluster's <filename>pg_wal/</filename> subdirectory. This slight adjustment is worthwhile because it reduces the risk of mistakes when restoring. This is easy to arrange if diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 786269b920..09204db7cd 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -28581,8 +28581,9 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560 The result of the function is a single record. The <parameter>lsn</parameter> column holds the backup's ending write-ahead log location (which again can be ignored). The second - column returns the contents of the backup label file, and the third - column returns the contents of the tablespace map file. These must be + column returns the contents of the backup label file, the third column + returns the contents of the tablespace map file, and the fourth column + returns the contents of pg_control. These must be stored as part of the backup and are required as part of the restore process. </para> diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c index 3e3d2bb618..a7d7c1bcbe 100644 --- a/src/backend/access/transam/xlogfuncs.c +++ b/src/backend/access/transam/xlogfuncs.c @@ -115,9 +115,11 @@ pg_backup_start(PG_FUNCTION_ARGS) * * The backup_label contains the user-supplied label string (typically this * would be used to tell where the backup dump will be stored), the starting - * time, starting WAL location for the dump and so on. It is the caller's - * responsibility to write the backup_label and tablespace_map files in the - * data folder that will be restored from this backup. + * time, starting WAL location for the dump and so on. The pg_control file + * contains a consistent copy of pg_control that also has a safeguard against + * being used without backup_label. It is the caller's responsibility to write + * the backup_label, pg_control, and tablespace_map files in the data folder + * that will be restored from this backup. * * Permission checking for this function is managed through the normal * GRANT system. @@ -125,12 +127,13 @@ pg_backup_start(PG_FUNCTION_ARGS) Datum pg_backup_stop(PG_FUNCTION_ARGS) { -#define PG_BACKUP_STOP_V2_COLS 3 +#define PG_BACKUP_STOP_V2_COLS 4 TupleDesc tupdesc; Datum values[PG_BACKUP_STOP_V2_COLS] = {0}; bool nulls[PG_BACKUP_STOP_V2_COLS] = {0}; bool waitforarchive = PG_GETARG_BOOL(0); char *backup_label; + bytea *pg_control_bytea; SessionBackupState status = get_backup_status(); /* Initialize attributes information in the tuple descriptor */ @@ -152,9 +155,15 @@ pg_backup_stop(PG_FUNCTION_ARGS) /* Build the contents of backup_label */ backup_label = build_backup_content(backup_state, false); + /* Build the contents of pg_control */ + pg_control_bytea = (bytea *) palloc(PG_CONTROL_FILE_SIZE + VARHDRSZ); + SET_VARSIZE(pg_control_bytea, PG_CONTROL_FILE_SIZE + VARHDRSZ); + backup_control_file((uint8_t *)VARDATA(pg_control_bytea)); + values[0] = LSNGetDatum(backup_state->stoppoint); values[1] = CStringGetTextDatum(backup_label); values[2] = CStringGetTextDatum(tablespace_map->data); + values[3] = PointerGetDatum(pg_control_bytea); /* Deallocate backup-related variables */ pfree(backup_label); diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql index b0d0de051e..758f90b2c3 100644 --- a/src/backend/catalog/system_functions.sql +++ b/src/backend/catalog/system_functions.sql @@ -390,7 +390,7 @@ CREATE OR REPLACE FUNCTION CREATE OR REPLACE FUNCTION pg_backup_stop ( wait_for_archive boolean DEFAULT true, OUT lsn pg_lsn, - OUT labelfile text, OUT spcmapfile text) + OUT labelfile text, OUT spcmapfile text, OUT controlfile bytea) RETURNS record STRICT VOLATILE LANGUAGE internal as 'pg_backup_stop' PARALLEL RESTRICTED; diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 7cc5421ea1..ea93241eed 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -6569,8 +6569,8 @@ { oid => '2739', descr => 'finish taking an online backup', proname => 'pg_backup_stop', provolatile => 'v', proparallel => 'r', prorettype => 'record', proargtypes => 'bool', - proallargtypes => '{bool,pg_lsn,text,text}', proargmodes => '{i,o,o,o}', - proargnames => '{wait_for_archive,lsn,labelfile,spcmapfile}', + proallargtypes => '{bool,pg_lsn,text,text,bytea}', proargmodes => '{i,o,o,o,o}', + proargnames => '{wait_for_archive,lsn,labelfile,spcmapfile,controlfile}', prosrc => 'pg_backup_stop' }, { oid => '3436', descr => 'promote standby server', proname => 'pg_promote', provolatile => 'v', prorettype => 'bool', diff --git a/src/test/recovery/t/042_low_level_backup.pl b/src/test/recovery/t/042_low_level_backup.pl index 61d23187e0..bd3a99960f 100644 --- a/src/test/recovery/t/042_low_level_backup.pl +++ b/src/test/recovery/t/042_low_level_backup.pl @@ -13,6 +13,42 @@ use PostgreSQL::Test::Cluster; use PostgreSQL::Test::Utils; use Test::More; +# Decode hex to binary +sub decode_hex +{ + my ($encoded) = @_; + my $decoded; + + $encoded =~ s/^\s+|\s+$//g; + + for (my $idx = 0; $idx < length($encoded); $idx += 2) + { + $decoded .= pack('C', hex(substr($encoded, $idx, 2))); + } + + return $decoded; +} + +# Get backup_label/pg_control from pg_stop_backup() +sub stop_backup_result +{ + my ($psql) = @_; + + my $encoded = $psql->query_safe( + "select encode(labelfile::bytea, 'hex') || ',' || " . + " encode(controlfile, 'hex')" . + " from pg_backup_stop()"); + + my @result; + + foreach my $column (split(',', $encoded)) + { + push(@result, decode_hex($column)); + } + + return @result; +} + # Start primary node with archiving. my $node_primary = PostgreSQL::Test::Cluster->new('primary'); $node_primary->init(has_archiving => 1, allows_streaming => 1); @@ -80,8 +116,7 @@ my $stop_segment_name = $node_primary->safe_psql('postgres', 'SELECT pg_walfile_name(pg_current_wal_lsn())'); # Stop backup and get backup_label, the last segment is archived. -my $backup_label = - $psql->query_safe("select labelfile from pg_backup_stop()"); +(my $backup_label, my $pg_control) = stop_backup_result($psql); $psql->quit; @@ -118,10 +153,36 @@ ok( $node_replica->log_contains( $node_replica->teardown_node; $node_replica->clean_node; +# Save only pg_control into the backup to demonstrate that pg_control returned +# from pg_stop_backup() will only perform recovery when backup_label is present. +open(my $fh, ">", "$backup_dir/global/pg_control") + or die "could not open pg_control"; +binmode($fh); +syswrite($fh, $pg_control); +close($fh); + +$node_replica = PostgreSQL::Test::Cluster->new('replica_fail2'); +$node_replica->init_from_backup($node_primary, $backup_name, + has_restoring => 1); + +my $res = run_log( + [ + 'pg_ctl', '-D', $node_replica->data_dir, '-l', + $node_replica->logfile, 'start' + ]); +ok(!$res, 'invalid recovery startup fails'); + +my $logfile = slurp_file($node_replica->logfile()); +ok($logfile =~ qr/could not find backup_label required for recovery/, + 'could not find backup_label required for recovery'); + +$node_replica->teardown_node; +$node_replica->clean_node; + # Save backup_label into the backup directory and recover using the primary's # archive. This time recovery will succeed and the canary table will be # present. -open my $fh, ">>", "$backup_dir/backup_label" +open $fh, ">>", "$backup_dir/backup_label" or die "could not open backup_label"; # Binary mode is required for Windows, as the backup_label parsing is not # able to cope with CRLFs. -- 2.34.1 ^ permalink raw reply [nested|flat] 3+ messages in thread
* Re: Return pg_control from pg_backup_stop(). 2024-10-03 04:45 Re: Return pg_control from pg_backup_stop(). Michael Paquier <[email protected]> 2024-10-03 09:11 ` Re: Return pg_control from pg_backup_stop(). David Steele <[email protected]> @ 2024-11-20 22:44 ` David Steele <[email protected]> 0 siblings, 0 replies; 3+ messages in thread From: David Steele @ 2024-11-20 22:44 UTC (permalink / raw) To: Michael Paquier <[email protected]>; +Cc: pgsql-hackers On 10/3/24 05:11, David Steele wrote: > On 10/3/24 07:45, Michael Paquier wrote: > >> 1) is something that has more value than 2), IMO, because there is no >> need for a manual step when a backup is taken by the replication >> protocol. Well, custom backup solutions that rely on the replication >> protocol to copy the data would need to make sure that they have a >> backup_label, but that's something they should do anyway and what this >> patch wants to protect users from. The SQL part is optional IMO. It >> can be done, but it has less impact overall and makes backups more >> complicated by requiring the manual copy of the control file. > > I don't think having incremental backup in pg_basebackup means alternate > backup solutions are going away or that we should deprecate the SQL > interface. If nothing else, third-party solutions need a way to get an > untorn copy of pg_control and in general I think the new flag will be > universally useful. I updated this patch to fix an issue with -fsanitize=alignment. I'm not entirely happy with copying twice but not sure of another way to do it. As far as I can see VARDATA() will not return aligned data on 64-bit architectures. Regards, -David From 659a1d9b6b1528e139741dc69146848ca15a07b9 Mon Sep 17 00:00:00 2001 From: David Steele <[email protected]> Date: Wed, 20 Nov 2024 22:33:36 +0000 Subject: Return pg_control from pg_backup_stop(). Harden recovery by returning a copy of pg_control from pg_backup_stop() that has a flag set to prevent recovery if the backup_label file is missing. Instead of backup software copying pg_control from PGDATA, it stores an updated version that is returned from pg_backup_stop(). This is better for the following reasons: * The user can no longer remove backup_label and get what looks like a successful recovery (while almost certainly causing corruption). If backup_label is removed the cluster will not start. The user may try pg_resetwal, but that tool makes it pretty clear that corruption will result from its use. * We don't need to worry about backup software seeing a torn copy of pg_control, since Postgres can safely read it out of memory and provide a valid copy via pg_backup_stop(). This solves torn reads without needing to write pg_control via a temp file, which may affect performance on a standby. * For backup from standby, we no longer need to instruct the backup software to copy pg_control last. In fact the backup software should not copy pg_control from PGDATA at all. These changes have no impact on current backup software and they are free to use the pg_control available from pg_stop_backup() or continue to use pg_control from PGDATA. Of course they will miss the benefits of getting a consistent copy of pg_control and the backup_label checking, but will be no worse off than before. Catalog version bump is required. --- doc/src/sgml/backup.sgml | 18 +++++- doc/src/sgml/func.sgml | 5 +- src/backend/access/transam/xlogfuncs.c | 20 ++++-- src/backend/catalog/system_functions.sql | 2 +- src/include/catalog/pg_proc.dat | 4 +- src/test/recovery/t/042_low_level_backup.pl | 67 ++++++++++++++++++++- 6 files changed, 101 insertions(+), 15 deletions(-) diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml index e4e4c56cf14..2fcf1811213 100644 --- a/doc/src/sgml/backup.sgml +++ b/doc/src/sgml/backup.sgml @@ -1021,9 +1021,12 @@ SELECT * FROM pg_backup_stop(wait_for_archive => true); values. The second of these fields should be written to a file named <filename>backup_label</filename> in the root directory of the backup. The third field should be written to a file named - <filename>tablespace_map</filename> unless the field is empty. These files are + <filename>tablespace_map</filename> unless the field is empty. The fourth + field should be written into a file named + <filename>global/pg_control</filename> (overwriting the existing file when + present). These files are vital to the backup working and must be written byte for byte without - modification, which may require opening the file in binary mode. + modification, which will require opening the file in binary mode. </para> </listitem> <listitem> @@ -1095,7 +1098,16 @@ SELECT * FROM pg_backup_stop(wait_for_archive => true); </para> <para> - You should, however, omit from the backup the files within the + You should exclude <filename>global/pg_control</filename> from your backup + and put the contents of the <parameter>controlfile</parameter> column + returned from <function>pg_backup_stop</function> in your backup at + <filename>global/pg_control</filename>. This version of pg_control contains + safeguards against recovery without backup_label present and is guaranteed + not to be torn. + </para> + + <para> + You should also omit from the backup the files within the cluster's <filename>pg_wal/</filename> subdirectory. This slight adjustment is worthwhile because it reduces the risk of mistakes when restoring. This is easy to arrange if diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index fa37a0469ae..b3d8c73295b 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -28600,8 +28600,9 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560 The result of the function is a single record. The <parameter>lsn</parameter> column holds the backup's ending write-ahead log location (which again can be ignored). The second - column returns the contents of the backup label file, and the third - column returns the contents of the tablespace map file. These must be + column returns the contents of the backup label file, the third column + returns the contents of the tablespace map file, and the fourth column + returns the contents of pg_control. These must be stored as part of the backup and are required as part of the restore process. </para> diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c index b0c6d7c6875..7547bb6e28e 100644 --- a/src/backend/access/transam/xlogfuncs.c +++ b/src/backend/access/transam/xlogfuncs.c @@ -113,9 +113,11 @@ pg_backup_start(PG_FUNCTION_ARGS) * * The backup_label contains the user-supplied label string (typically this * would be used to tell where the backup dump will be stored), the starting - * time, starting WAL location for the dump and so on. It is the caller's - * responsibility to write the backup_label and tablespace_map files in the - * data folder that will be restored from this backup. + * time, starting WAL location for the dump and so on. The pg_control file + * contains a consistent copy of pg_control that also has a safeguard against + * being used without backup_label. It is the caller's responsibility to write + * the backup_label, pg_control, and tablespace_map files in the data folder + * that will be restored from this backup. * * Permission checking for this function is managed through the normal * GRANT system. @@ -123,12 +125,14 @@ pg_backup_start(PG_FUNCTION_ARGS) Datum pg_backup_stop(PG_FUNCTION_ARGS) { -#define PG_BACKUP_STOP_V2_COLS 3 +#define PG_BACKUP_STOP_V2_COLS 4 TupleDesc tupdesc; Datum values[PG_BACKUP_STOP_V2_COLS] = {0}; bool nulls[PG_BACKUP_STOP_V2_COLS] = {0}; bool waitforarchive = PG_GETARG_BOOL(0); char *backup_label; + bytea *pg_control_bytea; + uint8_t pg_control[PG_CONTROL_FILE_SIZE]; SessionBackupState status = get_backup_status(); /* Initialize attributes information in the tuple descriptor */ @@ -150,9 +154,17 @@ pg_backup_stop(PG_FUNCTION_ARGS) /* Build the contents of backup_label */ backup_label = build_backup_content(backup_state, false); + /* Build the contents of pg_control */ + backup_control_file(pg_control); + + pg_control_bytea = (bytea *) palloc(PG_CONTROL_FILE_SIZE + VARHDRSZ); + SET_VARSIZE(pg_control_bytea, PG_CONTROL_FILE_SIZE + VARHDRSZ); + memcpy(VARDATA(pg_control_bytea), pg_control, PG_CONTROL_FILE_SIZE); + values[0] = LSNGetDatum(backup_state->stoppoint); values[1] = CStringGetTextDatum(backup_label); values[2] = CStringGetTextDatum(tablespace_map->data); + values[3] = PointerGetDatum(pg_control_bytea); /* Deallocate backup-related variables */ pfree(backup_label); diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql index c51dfca802f..6a29b9b3cf3 100644 --- a/src/backend/catalog/system_functions.sql +++ b/src/backend/catalog/system_functions.sql @@ -390,7 +390,7 @@ CREATE OR REPLACE FUNCTION CREATE OR REPLACE FUNCTION pg_backup_stop ( wait_for_archive boolean DEFAULT true, OUT lsn pg_lsn, - OUT labelfile text, OUT spcmapfile text) + OUT labelfile text, OUT spcmapfile text, OUT controlfile bytea) RETURNS record STRICT VOLATILE LANGUAGE internal as 'pg_backup_stop' PARALLEL RESTRICTED; diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 8d9d584735e..6f65cc7c777 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -6589,8 +6589,8 @@ { oid => '2739', descr => 'finish taking an online backup', proname => 'pg_backup_stop', provolatile => 'v', proparallel => 'r', prorettype => 'record', proargtypes => 'bool', - proallargtypes => '{bool,pg_lsn,text,text}', proargmodes => '{i,o,o,o}', - proargnames => '{wait_for_archive,lsn,labelfile,spcmapfile}', + proallargtypes => '{bool,pg_lsn,text,text,bytea}', proargmodes => '{i,o,o,o,o}', + proargnames => '{wait_for_archive,lsn,labelfile,spcmapfile,controlfile}', prosrc => 'pg_backup_stop' }, { oid => '3436', descr => 'promote standby server', proname => 'pg_promote', provolatile => 'v', prorettype => 'bool', diff --git a/src/test/recovery/t/042_low_level_backup.pl b/src/test/recovery/t/042_low_level_backup.pl index 61d23187e0f..bd3a99960f5 100644 --- a/src/test/recovery/t/042_low_level_backup.pl +++ b/src/test/recovery/t/042_low_level_backup.pl @@ -13,6 +13,42 @@ use PostgreSQL::Test::Cluster; use PostgreSQL::Test::Utils; use Test::More; +# Decode hex to binary +sub decode_hex +{ + my ($encoded) = @_; + my $decoded; + + $encoded =~ s/^\s+|\s+$//g; + + for (my $idx = 0; $idx < length($encoded); $idx += 2) + { + $decoded .= pack('C', hex(substr($encoded, $idx, 2))); + } + + return $decoded; +} + +# Get backup_label/pg_control from pg_stop_backup() +sub stop_backup_result +{ + my ($psql) = @_; + + my $encoded = $psql->query_safe( + "select encode(labelfile::bytea, 'hex') || ',' || " . + " encode(controlfile, 'hex')" . + " from pg_backup_stop()"); + + my @result; + + foreach my $column (split(',', $encoded)) + { + push(@result, decode_hex($column)); + } + + return @result; +} + # Start primary node with archiving. my $node_primary = PostgreSQL::Test::Cluster->new('primary'); $node_primary->init(has_archiving => 1, allows_streaming => 1); @@ -80,8 +116,7 @@ my $stop_segment_name = $node_primary->safe_psql('postgres', 'SELECT pg_walfile_name(pg_current_wal_lsn())'); # Stop backup and get backup_label, the last segment is archived. -my $backup_label = - $psql->query_safe("select labelfile from pg_backup_stop()"); +(my $backup_label, my $pg_control) = stop_backup_result($psql); $psql->quit; @@ -118,10 +153,36 @@ ok( $node_replica->log_contains( $node_replica->teardown_node; $node_replica->clean_node; +# Save only pg_control into the backup to demonstrate that pg_control returned +# from pg_stop_backup() will only perform recovery when backup_label is present. +open(my $fh, ">", "$backup_dir/global/pg_control") + or die "could not open pg_control"; +binmode($fh); +syswrite($fh, $pg_control); +close($fh); + +$node_replica = PostgreSQL::Test::Cluster->new('replica_fail2'); +$node_replica->init_from_backup($node_primary, $backup_name, + has_restoring => 1); + +my $res = run_log( + [ + 'pg_ctl', '-D', $node_replica->data_dir, '-l', + $node_replica->logfile, 'start' + ]); +ok(!$res, 'invalid recovery startup fails'); + +my $logfile = slurp_file($node_replica->logfile()); +ok($logfile =~ qr/could not find backup_label required for recovery/, + 'could not find backup_label required for recovery'); + +$node_replica->teardown_node; +$node_replica->clean_node; + # Save backup_label into the backup directory and recover using the primary's # archive. This time recovery will succeed and the canary table will be # present. -open my $fh, ">>", "$backup_dir/backup_label" +open $fh, ">>", "$backup_dir/backup_label" or die "could not open backup_label"; # Binary mode is required for Windows, as the backup_label parsing is not # able to cope with CRLFs. -- 2.34.1 From 005365fd1f87a03b4e4de8485be0ed6af9127acc Mon Sep 17 00:00:00 2001 From: David Steele <[email protected]> Date: Wed, 20 Nov 2024 22:33:36 +0000 Subject: Add pg_control flag to prevent recovery without backup_label. Harden recovery by adding a flag to pg_control to indicate that backup_label is required. This prevents the user from deleting backup_label resulting in an inconsistent recovery. Another advantage is that the copy of pg_control used by pg_basebackup is guaranteed not to be torn. This functionality is limited to pg_basebackup (or any software comfortable with modifying pg_control). Control and catalog version bumps are required. --- doc/src/sgml/func.sgml | 5 +++++ src/backend/access/transam/xlog.c | 23 +++++++++++++++++++++++ src/backend/access/transam/xlogrecovery.c | 10 +++++++++- src/backend/backup/basebackup.c | 15 ++++++--------- src/backend/utils/misc/pg_controldata.c | 7 +++++-- src/bin/pg_controldata/pg_controldata.c | 2 ++ src/bin/pg_resetwal/pg_resetwal.c | 1 + src/bin/pg_rewind/pg_rewind.c | 1 + src/include/access/xlog.h | 1 + src/include/catalog/pg_control.h | 4 ++++ src/include/catalog/pg_proc.dat | 6 +++--- src/test/recovery/t/002_archiving.pl | 20 ++++++++++++++++++++ 12 files changed, 80 insertions(+), 15 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 72f223a0414..fa37a0469ae 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -27918,6 +27918,11 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres} <entry><type>boolean</type></entry> </row> + <row> + <entry><structfield>backup_label_required</structfield></entry> + <entry><type>boolean</type></entry> + </row> + </tbody> </tgroup> </table> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 6f58412bcab..2a2c80a6f74 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -9413,6 +9413,29 @@ do_pg_abort_backup(int code, Datum arg) } } +/* + * Create a consistent copy of control data to be used for backup and update it + * to require a backup label for recovery. Also recalculate the CRC. + */ +void +backup_control_file(uint8_t *controlFile) +{ + ControlFileData *controlData = ((ControlFileData *)controlFile); + + memset(controlFile + sizeof(ControlFileData), 0, + PG_CONTROL_FILE_SIZE - sizeof(ControlFileData)); + + LWLockAcquire(ControlFileLock, LW_SHARED); + memcpy(controlFile, ControlFile, sizeof(ControlFileData)); + LWLockRelease(ControlFileLock); + + controlData->backupLabelRequired = true; + + INIT_CRC32C(controlData->crc); + COMP_CRC32C(controlData->crc, controlFile, offsetof(ControlFileData, crc)); + FIN_CRC32C(controlData->crc); +} + /* * Register a handler that will warn about unterminated backups at end of * session, unless this has already been done. diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c index 05c738d6614..f641cb1a8ad 100644 --- a/src/backend/access/transam/xlogrecovery.c +++ b/src/backend/access/transam/xlogrecovery.c @@ -703,7 +703,14 @@ InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdown_ptr, } else { - /* No backup_label file has been found if we are here. */ + /* + * No backup_label file has been found if we are here. Error if the + * control file requires backup_label. + */ + if (ControlFile->backupLabelRequired) + ereport(FATAL, + (errmsg("could not find backup_label required for recovery"), + errhint("backup_label must be present for recovery to succeed"))); /* * If tablespace_map file is present without backup_label file, there @@ -976,6 +983,7 @@ InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdown_ptr, { ControlFile->backupStartPoint = checkPoint.redo; ControlFile->backupEndRequired = backupEndRequired; + ControlFile->backupLabelRequired = false; if (backupFromStandby) { diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c index e2ed9081d1c..b673bc10bc3 100644 --- a/src/backend/backup/basebackup.c +++ b/src/backend/backup/basebackup.c @@ -23,6 +23,7 @@ #include "backup/basebackup_incremental.h" #include "backup/basebackup_sink.h" #include "backup/basebackup_target.h" +#include "catalog/pg_control.h" #include "catalog/pg_tablespace_d.h" #include "commands/defrem.h" #include "common/compression.h" @@ -326,9 +327,9 @@ perform_base_backup(basebackup_options *opt, bbsink *sink, if (ti->path == NULL) { - struct stat statbuf; bool sendtblspclinks = true; char *backup_label; + uint8_t controlFile[PG_CONTROL_FILE_SIZE]; bbsink_begin_archive(sink, "base.tar"); @@ -351,14 +352,10 @@ perform_base_backup(basebackup_options *opt, bbsink *sink, sendtblspclinks, &manifest, InvalidOid, ib); /* ... and pg_control after everything else. */ - if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0) - ereport(ERROR, - (errcode_for_file_access(), - errmsg("could not stat file \"%s\": %m", - XLOG_CONTROL_FILE))); - sendFile(sink, XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf, - false, InvalidOid, InvalidOid, - InvalidRelFileNumber, 0, &manifest, 0, NULL, 0); + backup_control_file(controlFile); + sendFileWithContent(sink, XLOG_CONTROL_FILE, + (char *)controlFile, PG_CONTROL_FILE_SIZE, + &manifest); } else { diff --git a/src/backend/utils/misc/pg_controldata.c b/src/backend/utils/misc/pg_controldata.c index 98c932dc7bd..9eaf3f8b9f6 100644 --- a/src/backend/utils/misc/pg_controldata.c +++ b/src/backend/utils/misc/pg_controldata.c @@ -162,8 +162,8 @@ pg_control_checkpoint(PG_FUNCTION_ARGS) Datum pg_control_recovery(PG_FUNCTION_ARGS) { - Datum values[5]; - bool nulls[5]; + Datum values[6]; + bool nulls[6]; TupleDesc tupdesc; HeapTuple htup; ControlFileData *ControlFile; @@ -195,6 +195,9 @@ pg_control_recovery(PG_FUNCTION_ARGS) values[4] = BoolGetDatum(ControlFile->backupEndRequired); nulls[4] = false; + values[5] = BoolGetDatum(ControlFile->backupLabelRequired); + nulls[5] = false; + htup = heap_form_tuple(tupdesc, values, nulls); PG_RETURN_DATUM(HeapTupleGetDatum(htup)); diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c index 93a05d80ca7..8cc29904c6b 100644 --- a/src/bin/pg_controldata/pg_controldata.c +++ b/src/bin/pg_controldata/pg_controldata.c @@ -283,6 +283,8 @@ main(int argc, char *argv[]) LSN_FORMAT_ARGS(ControlFile->backupEndPoint)); printf(_("End-of-backup record required: %s\n"), ControlFile->backupEndRequired ? _("yes") : _("no")); + printf(_("Backup label required: %s\n"), + ControlFile->backupLabelRequired ? _("yes") : _("no")); printf(_("wal_level setting: %s\n"), wal_level_str(ControlFile->wal_level)); printf(_("wal_log_hints setting: %s\n"), diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c index e9dcb5a6d89..7056752cff0 100644 --- a/src/bin/pg_resetwal/pg_resetwal.c +++ b/src/bin/pg_resetwal/pg_resetwal.c @@ -875,6 +875,7 @@ RewriteControlFile(void) ControlFile.backupStartPoint = 0; ControlFile.backupEndPoint = 0; ControlFile.backupEndRequired = false; + ControlFile.backupLabelRequired = false; /* * Force the defaults for max_* settings. The values don't really matter diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c index c4fe4e37040..b9faa51661e 100644 --- a/src/bin/pg_rewind/pg_rewind.c +++ b/src/bin/pg_rewind/pg_rewind.c @@ -725,6 +725,7 @@ perform_rewind(filemap_t *filemap, rewind_source *source, ControlFile_new.minRecoveryPoint = endrec; ControlFile_new.minRecoveryPointTLI = endtli; ControlFile_new.state = DB_IN_ARCHIVE_RECOVERY; + ControlFile_new.backupLabelRequired = true; if (!dry_run) update_controlfile(datadir_target, &ControlFile_new, do_sync); } diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h index 34ad46c067b..9d5d8ed43c2 100644 --- a/src/include/access/xlog.h +++ b/src/include/access/xlog.h @@ -295,6 +295,7 @@ extern void do_pg_backup_start(const char *backupidstr, bool fast, StringInfo tblspcmapfile); extern void do_pg_backup_stop(BackupState *state, bool waitforarchive); extern void do_pg_abort_backup(int code, Datum arg); +extern void backup_control_file(uint8_t *controlFile); extern void register_persistent_abort_backup_handler(void); extern SessionBackupState get_backup_status(void); diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h index e80ff8e4140..b471a9b02e7 100644 --- a/src/include/catalog/pg_control.h +++ b/src/include/catalog/pg_control.h @@ -164,12 +164,16 @@ typedef struct ControlFileData * If backupEndRequired is true, we know for sure that we're restoring * from a backup, and must see a backup-end record before we can safely * start up. + * + * If backupLabelRequired is true, then a backup_label file must be + * present in order for recovery to succeed. */ XLogRecPtr minRecoveryPoint; TimeLineID minRecoveryPointTLI; XLogRecPtr backupStartPoint; XLogRecPtr backupEndPoint; bool backupEndRequired; + bool backupLabelRequired; /* * Parameter settings that determine if the WAL can be used for archival diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index cbbe8acd382..8d9d584735e 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -12139,9 +12139,9 @@ { oid => '3443', descr => 'pg_controldata recovery state information as a function', proname => 'pg_control_recovery', provolatile => 'v', prorettype => 'record', - proargtypes => '', proallargtypes => '{pg_lsn,int4,pg_lsn,pg_lsn,bool}', - proargmodes => '{o,o,o,o,o}', - proargnames => '{min_recovery_end_lsn,min_recovery_end_timeline,backup_start_lsn,backup_end_lsn,end_of_backup_record_required}', + proargtypes => '', proallargtypes => '{pg_lsn,int4,pg_lsn,pg_lsn,bool,bool}', + proargmodes => '{o,o,o,o,o,o}', + proargnames => '{min_recovery_end_lsn,min_recovery_end_timeline,backup_start_lsn,backup_end_lsn,end_of_backup_record_required,backup_label_required}', prosrc => 'pg_control_recovery' }, { oid => '3444', diff --git a/src/test/recovery/t/002_archiving.pl b/src/test/recovery/t/002_archiving.pl index bc447330e15..462f1dcf0d6 100644 --- a/src/test/recovery/t/002_archiving.pl +++ b/src/test/recovery/t/002_archiving.pl @@ -41,6 +41,26 @@ $node_standby->append_conf( archive_cleanup_command = 'echo archive_cleanup_done > $archive_cleanup_command_file' recovery_end_command = 'echo recovery_ended_done > $recovery_end_command_file' )); + +# Rename backup_label to verify that recovery will not start without it +rename("$data_dir/backup_label", "$data_dir/backup_label.tmp") + or BAIL_OUT "could not move $data_dir/backup_label"; + +my $res = run_log( + [ + 'pg_ctl', '-D', $node_standby->data_dir, '-l', + $node_standby->logfile, 'start' + ]); +ok(!$res, 'invalid recovery startup fails'); + +my $logfile = slurp_file($node_standby->logfile()); +ok($logfile =~ qr/could not find backup_label required for recovery/, + 'could not find backup_label required for recovery'); + +# Restore backup_label so recovery proceeds normally +rename("$data_dir/backup_label.tmp", "$data_dir/backup_label") + or BAIL_OUT "could not move $data_dir/backup_label"; + $node_standby->start; # Create some content on primary -- 2.34.1 Attachments: [text/plain] pgcontrol-flag-v4-02-sql.patch (11.4K, 2-pgcontrol-flag-v4-02-sql.patch) download | inline diff: From 659a1d9b6b1528e139741dc69146848ca15a07b9 Mon Sep 17 00:00:00 2001 From: David Steele <[email protected]> Date: Wed, 20 Nov 2024 22:33:36 +0000 Subject: Return pg_control from pg_backup_stop(). Harden recovery by returning a copy of pg_control from pg_backup_stop() that has a flag set to prevent recovery if the backup_label file is missing. Instead of backup software copying pg_control from PGDATA, it stores an updated version that is returned from pg_backup_stop(). This is better for the following reasons: * The user can no longer remove backup_label and get what looks like a successful recovery (while almost certainly causing corruption). If backup_label is removed the cluster will not start. The user may try pg_resetwal, but that tool makes it pretty clear that corruption will result from its use. * We don't need to worry about backup software seeing a torn copy of pg_control, since Postgres can safely read it out of memory and provide a valid copy via pg_backup_stop(). This solves torn reads without needing to write pg_control via a temp file, which may affect performance on a standby. * For backup from standby, we no longer need to instruct the backup software to copy pg_control last. In fact the backup software should not copy pg_control from PGDATA at all. These changes have no impact on current backup software and they are free to use the pg_control available from pg_stop_backup() or continue to use pg_control from PGDATA. Of course they will miss the benefits of getting a consistent copy of pg_control and the backup_label checking, but will be no worse off than before. Catalog version bump is required. --- doc/src/sgml/backup.sgml | 18 +++++- doc/src/sgml/func.sgml | 5 +- src/backend/access/transam/xlogfuncs.c | 20 ++++-- src/backend/catalog/system_functions.sql | 2 +- src/include/catalog/pg_proc.dat | 4 +- src/test/recovery/t/042_low_level_backup.pl | 67 ++++++++++++++++++++- 6 files changed, 101 insertions(+), 15 deletions(-) diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml index e4e4c56cf14..2fcf1811213 100644 --- a/doc/src/sgml/backup.sgml +++ b/doc/src/sgml/backup.sgml @@ -1021,9 +1021,12 @@ SELECT * FROM pg_backup_stop(wait_for_archive => true); values. The second of these fields should be written to a file named <filename>backup_label</filename> in the root directory of the backup. The third field should be written to a file named - <filename>tablespace_map</filename> unless the field is empty. These files are + <filename>tablespace_map</filename> unless the field is empty. The fourth + field should be written into a file named + <filename>global/pg_control</filename> (overwriting the existing file when + present). These files are vital to the backup working and must be written byte for byte without - modification, which may require opening the file in binary mode. + modification, which will require opening the file in binary mode. </para> </listitem> <listitem> @@ -1095,7 +1098,16 @@ SELECT * FROM pg_backup_stop(wait_for_archive => true); </para> <para> - You should, however, omit from the backup the files within the + You should exclude <filename>global/pg_control</filename> from your backup + and put the contents of the <parameter>controlfile</parameter> column + returned from <function>pg_backup_stop</function> in your backup at + <filename>global/pg_control</filename>. This version of pg_control contains + safeguards against recovery without backup_label present and is guaranteed + not to be torn. + </para> + + <para> + You should also omit from the backup the files within the cluster's <filename>pg_wal/</filename> subdirectory. This slight adjustment is worthwhile because it reduces the risk of mistakes when restoring. This is easy to arrange if diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index fa37a0469ae..b3d8c73295b 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -28600,8 +28600,9 @@ LOG: Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560 The result of the function is a single record. The <parameter>lsn</parameter> column holds the backup's ending write-ahead log location (which again can be ignored). The second - column returns the contents of the backup label file, and the third - column returns the contents of the tablespace map file. These must be + column returns the contents of the backup label file, the third column + returns the contents of the tablespace map file, and the fourth column + returns the contents of pg_control. These must be stored as part of the backup and are required as part of the restore process. </para> diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c index b0c6d7c6875..7547bb6e28e 100644 --- a/src/backend/access/transam/xlogfuncs.c +++ b/src/backend/access/transam/xlogfuncs.c @@ -113,9 +113,11 @@ pg_backup_start(PG_FUNCTION_ARGS) * * The backup_label contains the user-supplied label string (typically this * would be used to tell where the backup dump will be stored), the starting - * time, starting WAL location for the dump and so on. It is the caller's - * responsibility to write the backup_label and tablespace_map files in the - * data folder that will be restored from this backup. + * time, starting WAL location for the dump and so on. The pg_control file + * contains a consistent copy of pg_control that also has a safeguard against + * being used without backup_label. It is the caller's responsibility to write + * the backup_label, pg_control, and tablespace_map files in the data folder + * that will be restored from this backup. * * Permission checking for this function is managed through the normal * GRANT system. @@ -123,12 +125,14 @@ pg_backup_start(PG_FUNCTION_ARGS) Datum pg_backup_stop(PG_FUNCTION_ARGS) { -#define PG_BACKUP_STOP_V2_COLS 3 +#define PG_BACKUP_STOP_V2_COLS 4 TupleDesc tupdesc; Datum values[PG_BACKUP_STOP_V2_COLS] = {0}; bool nulls[PG_BACKUP_STOP_V2_COLS] = {0}; bool waitforarchive = PG_GETARG_BOOL(0); char *backup_label; + bytea *pg_control_bytea; + uint8_t pg_control[PG_CONTROL_FILE_SIZE]; SessionBackupState status = get_backup_status(); /* Initialize attributes information in the tuple descriptor */ @@ -150,9 +154,17 @@ pg_backup_stop(PG_FUNCTION_ARGS) /* Build the contents of backup_label */ backup_label = build_backup_content(backup_state, false); + /* Build the contents of pg_control */ + backup_control_file(pg_control); + + pg_control_bytea = (bytea *) palloc(PG_CONTROL_FILE_SIZE + VARHDRSZ); + SET_VARSIZE(pg_control_bytea, PG_CONTROL_FILE_SIZE + VARHDRSZ); + memcpy(VARDATA(pg_control_bytea), pg_control, PG_CONTROL_FILE_SIZE); + values[0] = LSNGetDatum(backup_state->stoppoint); values[1] = CStringGetTextDatum(backup_label); values[2] = CStringGetTextDatum(tablespace_map->data); + values[3] = PointerGetDatum(pg_control_bytea); /* Deallocate backup-related variables */ pfree(backup_label); diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql index c51dfca802f..6a29b9b3cf3 100644 --- a/src/backend/catalog/system_functions.sql +++ b/src/backend/catalog/system_functions.sql @@ -390,7 +390,7 @@ CREATE OR REPLACE FUNCTION CREATE OR REPLACE FUNCTION pg_backup_stop ( wait_for_archive boolean DEFAULT true, OUT lsn pg_lsn, - OUT labelfile text, OUT spcmapfile text) + OUT labelfile text, OUT spcmapfile text, OUT controlfile bytea) RETURNS record STRICT VOLATILE LANGUAGE internal as 'pg_backup_stop' PARALLEL RESTRICTED; diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 8d9d584735e..6f65cc7c777 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -6589,8 +6589,8 @@ { oid => '2739', descr => 'finish taking an online backup', proname => 'pg_backup_stop', provolatile => 'v', proparallel => 'r', prorettype => 'record', proargtypes => 'bool', - proallargtypes => '{bool,pg_lsn,text,text}', proargmodes => '{i,o,o,o}', - proargnames => '{wait_for_archive,lsn,labelfile,spcmapfile}', + proallargtypes => '{bool,pg_lsn,text,text,bytea}', proargmodes => '{i,o,o,o,o}', + proargnames => '{wait_for_archive,lsn,labelfile,spcmapfile,controlfile}', prosrc => 'pg_backup_stop' }, { oid => '3436', descr => 'promote standby server', proname => 'pg_promote', provolatile => 'v', prorettype => 'bool', diff --git a/src/test/recovery/t/042_low_level_backup.pl b/src/test/recovery/t/042_low_level_backup.pl index 61d23187e0f..bd3a99960f5 100644 --- a/src/test/recovery/t/042_low_level_backup.pl +++ b/src/test/recovery/t/042_low_level_backup.pl @@ -13,6 +13,42 @@ use PostgreSQL::Test::Cluster; use PostgreSQL::Test::Utils; use Test::More; +# Decode hex to binary +sub decode_hex +{ + my ($encoded) = @_; + my $decoded; + + $encoded =~ s/^\s+|\s+$//g; + + for (my $idx = 0; $idx < length($encoded); $idx += 2) + { + $decoded .= pack('C', hex(substr($encoded, $idx, 2))); + } + + return $decoded; +} + +# Get backup_label/pg_control from pg_stop_backup() +sub stop_backup_result +{ + my ($psql) = @_; + + my $encoded = $psql->query_safe( + "select encode(labelfile::bytea, 'hex') || ',' || " . + " encode(controlfile, 'hex')" . + " from pg_backup_stop()"); + + my @result; + + foreach my $column (split(',', $encoded)) + { + push(@result, decode_hex($column)); + } + + return @result; +} + # Start primary node with archiving. my $node_primary = PostgreSQL::Test::Cluster->new('primary'); $node_primary->init(has_archiving => 1, allows_streaming => 1); @@ -80,8 +116,7 @@ my $stop_segment_name = $node_primary->safe_psql('postgres', 'SELECT pg_walfile_name(pg_current_wal_lsn())'); # Stop backup and get backup_label, the last segment is archived. -my $backup_label = - $psql->query_safe("select labelfile from pg_backup_stop()"); +(my $backup_label, my $pg_control) = stop_backup_result($psql); $psql->quit; @@ -118,10 +153,36 @@ ok( $node_replica->log_contains( $node_replica->teardown_node; $node_replica->clean_node; +# Save only pg_control into the backup to demonstrate that pg_control returned +# from pg_stop_backup() will only perform recovery when backup_label is present. +open(my $fh, ">", "$backup_dir/global/pg_control") + or die "could not open pg_control"; +binmode($fh); +syswrite($fh, $pg_control); +close($fh); + +$node_replica = PostgreSQL::Test::Cluster->new('replica_fail2'); +$node_replica->init_from_backup($node_primary, $backup_name, + has_restoring => 1); + +my $res = run_log( + [ + 'pg_ctl', '-D', $node_replica->data_dir, '-l', + $node_replica->logfile, 'start' + ]); +ok(!$res, 'invalid recovery startup fails'); + +my $logfile = slurp_file($node_replica->logfile()); +ok($logfile =~ qr/could not find backup_label required for recovery/, + 'could not find backup_label required for recovery'); + +$node_replica->teardown_node; +$node_replica->clean_node; + # Save backup_label into the backup directory and recover using the primary's # archive. This time recovery will succeed and the canary table will be # present. -open my $fh, ">>", "$backup_dir/backup_label" +open $fh, ">>", "$backup_dir/backup_label" or die "could not open backup_label"; # Binary mode is required for Windows, as the backup_label parsing is not # able to cope with CRLFs. -- 2.34.1 [text/plain] pgcontrol-flag-v4-01-basebackup.patch (11.4K, 3-pgcontrol-flag-v4-01-basebackup.patch) download | inline diff: From 005365fd1f87a03b4e4de8485be0ed6af9127acc Mon Sep 17 00:00:00 2001 From: David Steele <[email protected]> Date: Wed, 20 Nov 2024 22:33:36 +0000 Subject: Add pg_control flag to prevent recovery without backup_label. Harden recovery by adding a flag to pg_control to indicate that backup_label is required. This prevents the user from deleting backup_label resulting in an inconsistent recovery. Another advantage is that the copy of pg_control used by pg_basebackup is guaranteed not to be torn. This functionality is limited to pg_basebackup (or any software comfortable with modifying pg_control). Control and catalog version bumps are required. --- doc/src/sgml/func.sgml | 5 +++++ src/backend/access/transam/xlog.c | 23 +++++++++++++++++++++++ src/backend/access/transam/xlogrecovery.c | 10 +++++++++- src/backend/backup/basebackup.c | 15 ++++++--------- src/backend/utils/misc/pg_controldata.c | 7 +++++-- src/bin/pg_controldata/pg_controldata.c | 2 ++ src/bin/pg_resetwal/pg_resetwal.c | 1 + src/bin/pg_rewind/pg_rewind.c | 1 + src/include/access/xlog.h | 1 + src/include/catalog/pg_control.h | 4 ++++ src/include/catalog/pg_proc.dat | 6 +++--- src/test/recovery/t/002_archiving.pl | 20 ++++++++++++++++++++ 12 files changed, 80 insertions(+), 15 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 72f223a0414..fa37a0469ae 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -27918,6 +27918,11 @@ acl | {postgres=arwdDxtm/postgres,foo=r/postgres} <entry><type>boolean</type></entry> </row> + <row> + <entry><structfield>backup_label_required</structfield></entry> + <entry><type>boolean</type></entry> + </row> + </tbody> </tgroup> </table> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 6f58412bcab..2a2c80a6f74 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -9413,6 +9413,29 @@ do_pg_abort_backup(int code, Datum arg) } } +/* + * Create a consistent copy of control data to be used for backup and update it + * to require a backup label for recovery. Also recalculate the CRC. + */ +void +backup_control_file(uint8_t *controlFile) +{ + ControlFileData *controlData = ((ControlFileData *)controlFile); + + memset(controlFile + sizeof(ControlFileData), 0, + PG_CONTROL_FILE_SIZE - sizeof(ControlFileData)); + + LWLockAcquire(ControlFileLock, LW_SHARED); + memcpy(controlFile, ControlFile, sizeof(ControlFileData)); + LWLockRelease(ControlFileLock); + + controlData->backupLabelRequired = true; + + INIT_CRC32C(controlData->crc); + COMP_CRC32C(controlData->crc, controlFile, offsetof(ControlFileData, crc)); + FIN_CRC32C(controlData->crc); +} + /* * Register a handler that will warn about unterminated backups at end of * session, unless this has already been done. diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c index 05c738d6614..f641cb1a8ad 100644 --- a/src/backend/access/transam/xlogrecovery.c +++ b/src/backend/access/transam/xlogrecovery.c @@ -703,7 +703,14 @@ InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdown_ptr, } else { - /* No backup_label file has been found if we are here. */ + /* + * No backup_label file has been found if we are here. Error if the + * control file requires backup_label. + */ + if (ControlFile->backupLabelRequired) + ereport(FATAL, + (errmsg("could not find backup_label required for recovery"), + errhint("backup_label must be present for recovery to succeed"))); /* * If tablespace_map file is present without backup_label file, there @@ -976,6 +983,7 @@ InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdown_ptr, { ControlFile->backupStartPoint = checkPoint.redo; ControlFile->backupEndRequired = backupEndRequired; + ControlFile->backupLabelRequired = false; if (backupFromStandby) { diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c index e2ed9081d1c..b673bc10bc3 100644 --- a/src/backend/backup/basebackup.c +++ b/src/backend/backup/basebackup.c @@ -23,6 +23,7 @@ #include "backup/basebackup_incremental.h" #include "backup/basebackup_sink.h" #include "backup/basebackup_target.h" +#include "catalog/pg_control.h" #include "catalog/pg_tablespace_d.h" #include "commands/defrem.h" #include "common/compression.h" @@ -326,9 +327,9 @@ perform_base_backup(basebackup_options *opt, bbsink *sink, if (ti->path == NULL) { - struct stat statbuf; bool sendtblspclinks = true; char *backup_label; + uint8_t controlFile[PG_CONTROL_FILE_SIZE]; bbsink_begin_archive(sink, "base.tar"); @@ -351,14 +352,10 @@ perform_base_backup(basebackup_options *opt, bbsink *sink, sendtblspclinks, &manifest, InvalidOid, ib); /* ... and pg_control after everything else. */ - if (lstat(XLOG_CONTROL_FILE, &statbuf) != 0) - ereport(ERROR, - (errcode_for_file_access(), - errmsg("could not stat file \"%s\": %m", - XLOG_CONTROL_FILE))); - sendFile(sink, XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf, - false, InvalidOid, InvalidOid, - InvalidRelFileNumber, 0, &manifest, 0, NULL, 0); + backup_control_file(controlFile); + sendFileWithContent(sink, XLOG_CONTROL_FILE, + (char *)controlFile, PG_CONTROL_FILE_SIZE, + &manifest); } else { diff --git a/src/backend/utils/misc/pg_controldata.c b/src/backend/utils/misc/pg_controldata.c index 98c932dc7bd..9eaf3f8b9f6 100644 --- a/src/backend/utils/misc/pg_controldata.c +++ b/src/backend/utils/misc/pg_controldata.c @@ -162,8 +162,8 @@ pg_control_checkpoint(PG_FUNCTION_ARGS) Datum pg_control_recovery(PG_FUNCTION_ARGS) { - Datum values[5]; - bool nulls[5]; + Datum values[6]; + bool nulls[6]; TupleDesc tupdesc; HeapTuple htup; ControlFileData *ControlFile; @@ -195,6 +195,9 @@ pg_control_recovery(PG_FUNCTION_ARGS) values[4] = BoolGetDatum(ControlFile->backupEndRequired); nulls[4] = false; + values[5] = BoolGetDatum(ControlFile->backupLabelRequired); + nulls[5] = false; + htup = heap_form_tuple(tupdesc, values, nulls); PG_RETURN_DATUM(HeapTupleGetDatum(htup)); diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c index 93a05d80ca7..8cc29904c6b 100644 --- a/src/bin/pg_controldata/pg_controldata.c +++ b/src/bin/pg_controldata/pg_controldata.c @@ -283,6 +283,8 @@ main(int argc, char *argv[]) LSN_FORMAT_ARGS(ControlFile->backupEndPoint)); printf(_("End-of-backup record required: %s\n"), ControlFile->backupEndRequired ? _("yes") : _("no")); + printf(_("Backup label required: %s\n"), + ControlFile->backupLabelRequired ? _("yes") : _("no")); printf(_("wal_level setting: %s\n"), wal_level_str(ControlFile->wal_level)); printf(_("wal_log_hints setting: %s\n"), diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c index e9dcb5a6d89..7056752cff0 100644 --- a/src/bin/pg_resetwal/pg_resetwal.c +++ b/src/bin/pg_resetwal/pg_resetwal.c @@ -875,6 +875,7 @@ RewriteControlFile(void) ControlFile.backupStartPoint = 0; ControlFile.backupEndPoint = 0; ControlFile.backupEndRequired = false; + ControlFile.backupLabelRequired = false; /* * Force the defaults for max_* settings. The values don't really matter diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c index c4fe4e37040..b9faa51661e 100644 --- a/src/bin/pg_rewind/pg_rewind.c +++ b/src/bin/pg_rewind/pg_rewind.c @@ -725,6 +725,7 @@ perform_rewind(filemap_t *filemap, rewind_source *source, ControlFile_new.minRecoveryPoint = endrec; ControlFile_new.minRecoveryPointTLI = endtli; ControlFile_new.state = DB_IN_ARCHIVE_RECOVERY; + ControlFile_new.backupLabelRequired = true; if (!dry_run) update_controlfile(datadir_target, &ControlFile_new, do_sync); } diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h index 34ad46c067b..9d5d8ed43c2 100644 --- a/src/include/access/xlog.h +++ b/src/include/access/xlog.h @@ -295,6 +295,7 @@ extern void do_pg_backup_start(const char *backupidstr, bool fast, StringInfo tblspcmapfile); extern void do_pg_backup_stop(BackupState *state, bool waitforarchive); extern void do_pg_abort_backup(int code, Datum arg); +extern void backup_control_file(uint8_t *controlFile); extern void register_persistent_abort_backup_handler(void); extern SessionBackupState get_backup_status(void); diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h index e80ff8e4140..b471a9b02e7 100644 --- a/src/include/catalog/pg_control.h +++ b/src/include/catalog/pg_control.h @@ -164,12 +164,16 @@ typedef struct ControlFileData * If backupEndRequired is true, we know for sure that we're restoring * from a backup, and must see a backup-end record before we can safely * start up. + * + * If backupLabelRequired is true, then a backup_label file must be + * present in order for recovery to succeed. */ XLogRecPtr minRecoveryPoint; TimeLineID minRecoveryPointTLI; XLogRecPtr backupStartPoint; XLogRecPtr backupEndPoint; bool backupEndRequired; + bool backupLabelRequired; /* * Parameter settings that determine if the WAL can be used for archival diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index cbbe8acd382..8d9d584735e 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -12139,9 +12139,9 @@ { oid => '3443', descr => 'pg_controldata recovery state information as a function', proname => 'pg_control_recovery', provolatile => 'v', prorettype => 'record', - proargtypes => '', proallargtypes => '{pg_lsn,int4,pg_lsn,pg_lsn,bool}', - proargmodes => '{o,o,o,o,o}', - proargnames => '{min_recovery_end_lsn,min_recovery_end_timeline,backup_start_lsn,backup_end_lsn,end_of_backup_record_required}', + proargtypes => '', proallargtypes => '{pg_lsn,int4,pg_lsn,pg_lsn,bool,bool}', + proargmodes => '{o,o,o,o,o,o}', + proargnames => '{min_recovery_end_lsn,min_recovery_end_timeline,backup_start_lsn,backup_end_lsn,end_of_backup_record_required,backup_label_required}', prosrc => 'pg_control_recovery' }, { oid => '3444', diff --git a/src/test/recovery/t/002_archiving.pl b/src/test/recovery/t/002_archiving.pl index bc447330e15..462f1dcf0d6 100644 --- a/src/test/recovery/t/002_archiving.pl +++ b/src/test/recovery/t/002_archiving.pl @@ -41,6 +41,26 @@ $node_standby->append_conf( archive_cleanup_command = 'echo archive_cleanup_done > $archive_cleanup_command_file' recovery_end_command = 'echo recovery_ended_done > $recovery_end_command_file' )); + +# Rename backup_label to verify that recovery will not start without it +rename("$data_dir/backup_label", "$data_dir/backup_label.tmp") + or BAIL_OUT "could not move $data_dir/backup_label"; + +my $res = run_log( + [ + 'pg_ctl', '-D', $node_standby->data_dir, '-l', + $node_standby->logfile, 'start' + ]); +ok(!$res, 'invalid recovery startup fails'); + +my $logfile = slurp_file($node_standby->logfile()); +ok($logfile =~ qr/could not find backup_label required for recovery/, + 'could not find backup_label required for recovery'); + +# Restore backup_label so recovery proceeds normally +rename("$data_dir/backup_label.tmp", "$data_dir/backup_label") + or BAIL_OUT "could not move $data_dir/backup_label"; + $node_standby->start; # Create some content on primary -- 2.34.1 ^ permalink raw reply [nested|flat] 3+ messages in thread
end of thread, other threads:[~2024-11-20 22:44 UTC | newest] Thread overview: 3+ messages (download: mbox mbox.gz follow: Atom feed) -- links below jump to the message on this page -- 2024-10-03 04:45 Re: Return pg_control from pg_backup_stop(). Michael Paquier <[email protected]> 2024-10-03 09:11 ` David Steele <[email protected]> 2024-11-20 22:44 ` David Steele <[email protected]>
This inbox is served by agora; see mirroring instructions for how to clone and mirror all data and code used for this inbox