public inbox for [email protected]help / color / mirror / Atom feed
Re: pg_waldump: support decoding of WAL inside tarfile 8+ messages / 3 participants [nested] [flat]
* Re: pg_waldump: support decoding of WAL inside tarfile @ 2026-01-26 17:22 Robert Haas <[email protected]> 0 siblings, 1 reply; 8+ messages in thread From: Robert Haas @ 2026-01-26 17:22 UTC (permalink / raw) To: Amul Sul <[email protected]>; +Cc: Chao Li <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]> On Fri, Jan 23, 2026 at 7:27 AM Amul Sul <[email protected]> wrote: > Another option I previously considered was adding the filtration logic > inside the archive streamer itself. However, since the very first read > is required to calculate the WAL segment size, the filter check cannot > be performed immediately. However, we could send a signal to the > archive streamer via privateInfo (e.g., a read_any_wal or > skip_wal_check boolean flag) to disable the filtration check until the > size is calculated. But that approach isn't very elegant; if the first > WAL page we read belongs to a segment we actually want to skip, we > would still have to run the filter check and handle the skip/removal > logic outside of the streamer (i.e., inside init_archive_reader()). > This would result in performing the same filtration check in two > different places. I mean, I don't really buy this logic. If the information added to privateInfo is "here's the LSN before which you can remove stuff," and it starts out initialized to 0/0, then the read of the first WAL page causes no problem at all, because nothing is before 0/0. After it gets updated to some non-zero value, the next call to astreamer_waldump_content() can handle discarding any data we don't need. IMHO, the best argument for keeping things are is that in some cases, that decision might result in a bit of delay in discarding old data, but I don't think that really matters. I think all that we care about is the peak memory utilization of an operation, and I don't think that an explicit signaling system should increase that at all. That said, I'm certainly willing to consider other ideas about how this can work. However, I feel strongly that the logic needs to be not only correct, but clear and well-explained. Setting cur_wal to NULL to make the astreamer skip without adequate comments doesn't meet that standard. Maybe with some better comments it's all right, but frankly I'm a bit skeptical. Right now, you're using whether or not cur_wal is NULL as a signal to skip data or not skip data. How is that better than passing down the LSN and TLI that you want to read next and letting the astreamer figure out what to do itself? It's a signaling mechanism either way, but it seems a lot easier to figure out whether we always keep the LSN and TLI updated properly than to figure out whether cur_wal is always NULL at exactly the right times. -- Robert Haas EDB: http://www.enterprisedb.com ^ permalink raw reply [nested|flat] 8+ messages in thread
* Re: pg_waldump: support decoding of WAL inside tarfile @ 2026-01-27 12:06 Amul Sul <[email protected]> parent: Robert Haas <[email protected]> 0 siblings, 2 replies; 8+ messages in thread From: Amul Sul @ 2026-01-27 12:06 UTC (permalink / raw) To: Robert Haas <[email protected]>; +Cc: Chao Li <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]> On Mon, Jan 26, 2026 at 10:52 PM Robert Haas <[email protected]> wrote: > > On Fri, Jan 23, 2026 at 7:27 AM Amul Sul <[email protected]> wrote: > > Another option I previously considered was adding the filtration logic > > inside the archive streamer itself. However, since the very first read > > is required to calculate the WAL segment size, the filter check cannot > > be performed immediately. However, we could send a signal to the > > archive streamer via privateInfo (e.g., a read_any_wal or > > skip_wal_check boolean flag) to disable the filtration check until the > > size is calculated. But that approach isn't very elegant; if the first > > WAL page we read belongs to a segment we actually want to skip, we > > would still have to run the filter check and handle the skip/removal > > logic outside of the streamer (i.e., inside init_archive_reader()). > > This would result in performing the same filtration check in two > > different places. > > I mean, I don't really buy this logic. If the information added to > privateInfo is "here's the LSN before which you can remove stuff," and > it starts out initialized to 0/0, then the read of the first WAL page > causes no problem at all, because nothing is before 0/0. After it gets > updated to some non-zero value, the next call to > astreamer_waldump_content() can handle discarding any data we don't > need. > > IMHO, the best argument for keeping things are is that in some cases, > that decision might result in a bit of delay in discarding old data, > but I don't think that really matters. I think all that we care about > is the peak memory utilization of an operation, and I don't think that > an explicit signaling system should increase that at all. > > That said, I'm certainly willing to consider other ideas about how > this can work. However, I feel strongly that the logic needs to be not > only correct, but clear and well-explained. Setting cur_wal to NULL to > make the astreamer skip without adequate comments doesn't meet that > standard. Maybe with some better comments it's all right, but frankly > I'm a bit skeptical. Right now, you're using whether or not cur_wal is > NULL as a signal to skip data or not skip data. How is that better > than passing down the LSN and TLI that you want to read next and > letting the astreamer figure out what to do itself? It's a signaling > mechanism either way, but it seems a lot easier to figure out whether > we always keep the LSN and TLI updated properly than to figure out > whether cur_wal is always NULL at exactly the right times. > I am not sure how to calculate the LSN of the very first page it reads at this point, since we don't have the segment size yet. Furthermore, the previous implementation using the segment number as the hash key was incorrect, as extracting the segment number and TLI is impossible without knowing the segment size, IIUC. In the attached version, I am using the WAL segment name as the hash key, which is much more straightforward. I have rewritten read_archive_wal_page(), and it looks much cleaner than before. The logic to discard irrelevant WAL files is still within get_archive_wal_entry. I added an explanation for setting cur_wal to NULL, which is now handled in the separate function I mentioned previously. Kindly have a look at the attached version; let me know if you are still not happy with the current approach for filtering/discarding irrelevant WAL segments. It isn't much different from the previous version, but I have tried to keep it in a separate routine for better code readability, with comments to make it easier to understand. I also added a comment for ArchivedWALFile. I have included your patch (removing WalSegSz) to the attached patch set as 0001; all my subsequent patches are now bumped by one. Regards, Amul Attachments: [application/octet-stream] v11-0001-Remove-file-level-global-WalSegSz.patch (4.6K, 2-v11-0001-Remove-file-level-global-WalSegSz.patch) download | inline diff: From 7a06dc03d198fd6776d5afba44d1ea16ee715699 Mon Sep 17 00:00:00 2001 From: Robert Haas <[email protected]> Date: Fri, 16 Jan 2026 08:35:14 -0500 Subject: [PATCH v11 1/9] Remove file-level global WalSegSz. It's better style to pass the value around to just the places that need it. This makes it easier to determine whether the value is always properly initialized before use. --- src/bin/pg_waldump/pg_waldump.c | 37 +++++++++++++++++---------------- 1 file changed, 19 insertions(+), 18 deletions(-) diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c index aae655966ef..f3446385d6a 100644 --- a/src/bin/pg_waldump/pg_waldump.c +++ b/src/bin/pg_waldump/pg_waldump.c @@ -39,7 +39,6 @@ static const char *progname; -static int WalSegSz; static volatile sig_atomic_t time_to_stop = false; static const RelFileLocator emptyRelFileLocator = {0, 0, 0}; @@ -203,11 +202,11 @@ open_file_in_directory(const char *directory, const char *fname) /* * Try to find fname in the given directory. Returns true if it is found, * false otherwise. If fname is NULL, search the complete directory for any - * file with a valid WAL file name. If file is successfully opened, set the - * wal segment size. + * file with a valid WAL file name. If file is successfully opened, set + * *WaSegSz to the WAL segment size. */ static bool -search_directory(const char *directory, const char *fname) +search_directory(const char *directory, const char *fname, int *WalSegSz) { int fd = -1; DIR *xldir; @@ -249,17 +248,17 @@ search_directory(const char *directory, const char *fname) { XLogLongPageHeader longhdr = (XLogLongPageHeader) buf.data; - WalSegSz = longhdr->xlp_seg_size; - - if (!IsValidWalSegSize(WalSegSz)) + if (!IsValidWalSegSize(longhdr->xlp_seg_size)) { pg_log_error(ngettext("invalid WAL segment size in WAL file \"%s\" (%d byte)", "invalid WAL segment size in WAL file \"%s\" (%d bytes)", - WalSegSz), - fname, WalSegSz); + longhdr->xlp_seg_size), + fname, longhdr->xlp_seg_size); pg_log_error_detail("The WAL segment size must be a power of two between 1 MB and 1 GB."); exit(1); } + + *WalSegSz = longhdr->xlp_seg_size; } else if (r < 0) pg_fatal("could not read file \"%s\": %m", @@ -286,21 +285,22 @@ search_directory(const char *directory, const char *fname) * XLOGDIR / * $PGDATA / XLOGDIR / * - * The valid target directory is returned. + * The valid target directory is returned, and *WalSegSz is set to the + * size of the WAL segment found in that directory. */ static char * -identify_target_directory(char *directory, char *fname) +identify_target_directory(char *directory, char *fname, int *WalSegSz) { char fpath[MAXPGPATH]; if (directory != NULL) { - if (search_directory(directory, fname)) + if (search_directory(directory, fname, WalSegSz)) return pg_strdup(directory); /* directory / XLOGDIR */ snprintf(fpath, MAXPGPATH, "%s/%s", directory, XLOGDIR); - if (search_directory(fpath, fname)) + if (search_directory(fpath, fname, WalSegSz)) return pg_strdup(fpath); } else @@ -308,10 +308,10 @@ identify_target_directory(char *directory, char *fname) const char *datadir; /* current directory */ - if (search_directory(".", fname)) + if (search_directory(".", fname, WalSegSz)) return pg_strdup("."); /* XLOGDIR */ - if (search_directory(XLOGDIR, fname)) + if (search_directory(XLOGDIR, fname, WalSegSz)) return pg_strdup(XLOGDIR); datadir = getenv("PGDATA"); @@ -319,7 +319,7 @@ identify_target_directory(char *directory, char *fname) if (datadir != NULL) { snprintf(fpath, MAXPGPATH, "%s/%s", datadir, XLOGDIR); - if (search_directory(fpath, fname)) + if (search_directory(fpath, fname, WalSegSz)) return pg_strdup(fpath); } } @@ -801,6 +801,7 @@ main(int argc, char **argv) XLogRecPtr first_record; char *waldir = NULL; char *errormsg; + int WalSegSz; static struct option long_options[] = { {"bkp-details", no_argument, NULL, 'b'}, @@ -1127,7 +1128,7 @@ main(int argc, char **argv) pg_fatal("could not open directory \"%s\": %m", waldir); } - waldir = identify_target_directory(waldir, fname); + waldir = identify_target_directory(waldir, fname, &WalSegSz); fd = open_file_in_directory(waldir, fname); if (fd < 0) pg_fatal("could not open file \"%s\"", fname); @@ -1189,7 +1190,7 @@ main(int argc, char **argv) } } else - waldir = identify_target_directory(waldir, NULL); + waldir = identify_target_directory(waldir, NULL, &WalSegSz); /* we don't know what to print */ if (!XLogRecPtrIsValid(private.startptr)) -- 2.47.1 [application/octet-stream] v11-0002-Refactor-pg_waldump-Move-some-declarations-to-ne.patch (2.2K, 3-v11-0002-Refactor-pg_waldump-Move-some-declarations-to-ne.patch) download | inline diff: From 1e6d7f2ba4550eaedd6eb0fdcf309db2637a6a38 Mon Sep 17 00:00:00 2001 From: Amul Sul <[email protected]> Date: Thu, 22 Jan 2026 10:28:32 +0530 Subject: [PATCH v11 2/9] Refactor: pg_waldump: Move some declarations to new pg_waldump.h This change prepares for a second source file in this directory to support reading WAL from tar files. Common structures, declarations, and functions are being exported through this include file so they can be used in both files. --- src/bin/pg_waldump/pg_waldump.c | 9 +-------- src/bin/pg_waldump/pg_waldump.h | 25 +++++++++++++++++++++++++ 2 files changed, 26 insertions(+), 8 deletions(-) create mode 100644 src/bin/pg_waldump/pg_waldump.h diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c index f3446385d6a..4b7411a6498 100644 --- a/src/bin/pg_waldump/pg_waldump.c +++ b/src/bin/pg_waldump/pg_waldump.c @@ -29,6 +29,7 @@ #include "common/logging.h" #include "common/relpath.h" #include "getopt_long.h" +#include "pg_waldump.h" #include "rmgrdesc.h" #include "storage/bufpage.h" @@ -43,14 +44,6 @@ static volatile sig_atomic_t time_to_stop = false; static const RelFileLocator emptyRelFileLocator = {0, 0, 0}; -typedef struct XLogDumpPrivate -{ - TimeLineID timeline; - XLogRecPtr startptr; - XLogRecPtr endptr; - bool endptr_reached; -} XLogDumpPrivate; - typedef struct XLogDumpConfig { /* display options */ diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h new file mode 100644 index 00000000000..b88543856e5 --- /dev/null +++ b/src/bin/pg_waldump/pg_waldump.h @@ -0,0 +1,25 @@ +/*------------------------------------------------------------------------- + * + * pg_waldump.h - decode and display WAL + * + * Copyright (c) 2026, PostgreSQL Global Development Group + * + * IDENTIFICATION + * src/bin/pg_waldump/pg_waldump.h + *------------------------------------------------------------------------- + */ +#ifndef PG_WALDUMP_H +#define PG_WALDUMP_H + +#include "access/xlogdefs.h" + +/* Contains the necessary information to drive WAL decoding */ +typedef struct XLogDumpPrivate +{ + TimeLineID timeline; + XLogRecPtr startptr; + XLogRecPtr endptr; + bool endptr_reached; +} XLogDumpPrivate; + +#endif /* end of PG_WALDUMP_H */ -- 2.47.1 [application/octet-stream] v11-0003-Refactor-pg_waldump-Separate-logic-used-to-calcu.patch (2.4K, 4-v11-0003-Refactor-pg_waldump-Separate-logic-used-to-calcu.patch) download | inline diff: From 1ead3c07617cdf6357ea05296c5e81229c631564 Mon Sep 17 00:00:00 2001 From: Amul Sul <[email protected]> Date: Thu, 22 Jan 2026 10:38:16 +0530 Subject: [PATCH v11 3/9] Refactor: pg_waldump: Separate logic used to calculate the required read size. This refactoring prepares the codebase for an upcoming patch that will support reading WAL from tar files. The logic for calculating the required read size has been updated to handle both normal WAL files and WAL files located inside a tar archive. --- src/bin/pg_waldump/pg_waldump.c | 43 +++++++++++++++++++++++---------- 1 file changed, 30 insertions(+), 13 deletions(-) diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c index 4b7411a6498..958a71a01cf 100644 --- a/src/bin/pg_waldump/pg_waldump.c +++ b/src/bin/pg_waldump/pg_waldump.c @@ -326,6 +326,32 @@ identify_target_directory(char *directory, char *fname, int *WalSegSz) return NULL; /* not reached */ } +/* + * Returns the size in bytes of the data to be read. Returns -1 if the end + * point has already been reached. + */ +static inline int +required_read_len(XLogDumpPrivate *private, XLogRecPtr targetPagePtr, + int reqLen) +{ + int count = XLOG_BLCKSZ; + + if (XLogRecPtrIsValid(private->endptr)) + { + if (targetPagePtr + XLOG_BLCKSZ <= private->endptr) + count = XLOG_BLCKSZ; + else if (targetPagePtr + reqLen <= private->endptr) + count = private->endptr - targetPagePtr; + else + { + private->endptr_reached = true; + return -1; + } + } + + return count; +} + /* pg_waldump's XLogReaderRoutine->segment_open callback */ static void WALDumpOpenSegment(XLogReaderState *state, XLogSegNo nextSegNo, @@ -383,21 +409,12 @@ WALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen, XLogRecPtr targetPtr, char *readBuff) { XLogDumpPrivate *private = state->private_data; - int count = XLOG_BLCKSZ; + int count = required_read_len(private, targetPagePtr, reqLen); WALReadError errinfo; - if (XLogRecPtrIsValid(private->endptr)) - { - if (targetPagePtr + XLOG_BLCKSZ <= private->endptr) - count = XLOG_BLCKSZ; - else if (targetPagePtr + reqLen <= private->endptr) - count = private->endptr - targetPagePtr; - else - { - private->endptr_reached = true; - return -1; - } - } + /* Bail out if the count to be read is not valid */ + if (count < 0) + return -1; if (!WALRead(state, readBuff, targetPagePtr, count, private->timeline, &errinfo)) -- 2.47.1 [application/octet-stream] v11-0004-Refactor-pg_waldump-Restructure-TAP-tests.patch (5.6K, 5-v11-0004-Refactor-pg_waldump-Restructure-TAP-tests.patch) download | inline diff: From 1230820381263aaf109abb142e1e613ca2bd0cc9 Mon Sep 17 00:00:00 2001 From: Amul Sul <[email protected]> Date: Thu, 22 Jan 2026 11:06:05 +0530 Subject: [PATCH v11 4/9] Refactor: pg_waldump: Restructure TAP tests. Restructured tests that do not have a WAL file argument to run within a loop, facilitating their re-execution for decoding WAL from tar archives. == NOTE == This is not intended to be committed separately. It can be merged with the next patch, which is the main patch implementing this feature. --- src/bin/pg_waldump/t/001_basic.pl | 123 ++++++++++++++++-------------- 1 file changed, 67 insertions(+), 56 deletions(-) diff --git a/src/bin/pg_waldump/t/001_basic.pl b/src/bin/pg_waldump/t/001_basic.pl index 5db5d20136f..3288fadcf48 100644 --- a/src/bin/pg_waldump/t/001_basic.pl +++ b/src/bin/pg_waldump/t/001_basic.pl @@ -198,28 +198,6 @@ command_like( ], qr/./, 'runs with start and end segment specified'); -command_fails_like( - [ 'pg_waldump', '--path' => $node->data_dir ], - qr/error: no start WAL location given/, - 'path option requires start location'); -command_like( - [ - 'pg_waldump', - '--path' => $node->data_dir, - '--start' => $start_lsn, - '--end' => $end_lsn, - ], - qr/./, - 'runs with path option and start and end locations'); -command_fails_like( - [ - 'pg_waldump', - '--path' => $node->data_dir, - '--start' => $start_lsn, - ], - qr/error: error in WAL record at/, - 'falling off the end of the WAL results in an error'); - command_like( [ 'pg_waldump', '--quiet', @@ -227,15 +205,6 @@ command_like( ], qr/^$/, 'no output with --quiet option'); -command_fails_like( - [ - 'pg_waldump', '--quiet', - '--path' => $node->data_dir, - '--start' => $start_lsn - ], - qr/error: error in WAL record at/, - 'errors are shown with --quiet'); - # Test for: Display a message that we're skipping data if `from` # wasn't a pointer to the start of a record. @@ -272,7 +241,6 @@ sub test_pg_waldump my $result = IPC::Run::run [ 'pg_waldump', - '--path' => $node->data_dir, '--start' => $start_lsn, '--end' => $end_lsn, @opts @@ -288,38 +256,81 @@ sub test_pg_waldump my @lines; -@lines = test_pg_waldump; -is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines'); +my @scenarios = ( + { + 'path' => $node->data_dir + }); -@lines = test_pg_waldump('--limit' => 6); -is(@lines, 6, 'limit option observed'); +for my $scenario (@scenarios) +{ + my $path = $scenario->{'path'}; -@lines = test_pg_waldump('--fullpage'); -is(grep(!/^rmgr:.*\bFPW\b/, @lines), 0, 'all output lines are FPW'); + SKIP: + { + command_fails_like( + [ 'pg_waldump', '--path' => $path ], + qr/error: no start WAL location given/, + 'path option requires start location'); + command_like( + [ + 'pg_waldump', + '--path' => $path, + '--start' => $start_lsn, + '--end' => $end_lsn, + ], + qr/./, + 'runs with path option and start and end locations'); + command_fails_like( + [ + 'pg_waldump', + '--path' => $path, + '--start' => $start_lsn, + ], + qr/error: error in WAL record at/, + 'falling off the end of the WAL results in an error'); -@lines = test_pg_waldump('--stats'); -like($lines[0], qr/WAL statistics/, "statistics on stdout"); -is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output'); + command_fails_like( + [ + 'pg_waldump', '--quiet', + '--path' => $path, + '--start' => $start_lsn + ], + qr/error: error in WAL record at/, + 'errors are shown with --quiet'); -@lines = test_pg_waldump('--stats=record'); -like($lines[0], qr/WAL statistics/, "statistics on stdout"); -is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output'); + @lines = test_pg_waldump('--path' => $path); + is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines'); -@lines = test_pg_waldump('--rmgr' => 'Btree'); -is(grep(!/^rmgr: Btree/, @lines), 0, 'only Btree lines'); + @lines = test_pg_waldump('--path' => $path, '--limit' => 6); + is(@lines, 6, 'limit option observed'); -@lines = test_pg_waldump('--fork' => 'init'); -is(grep(!/fork init/, @lines), 0, 'only init fork lines'); + @lines = test_pg_waldump('--path' => $path, '--fullpage'); + is(grep(!/^rmgr:.*\bFPW\b/, @lines), 0, 'all output lines are FPW'); -@lines = test_pg_waldump( - '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_t1_oid"); -is(grep(!/rel $default_ts_oid\/$postgres_db_oid\/$rel_t1_oid/, @lines), - 0, 'only lines for selected relation'); + @lines = test_pg_waldump('--path' => $path, '--stats'); + like($lines[0], qr/WAL statistics/, "statistics on stdout"); + is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output'); -@lines = test_pg_waldump( - '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_i1a_oid", - '--block' => 1); -is(grep(!/\bblk 1\b/, @lines), 0, 'only lines for selected block'); + @lines = test_pg_waldump('--path' => $path, '--stats=record'); + like($lines[0], qr/WAL statistics/, "statistics on stdout"); + is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output'); + @lines = test_pg_waldump('--path' => $path, '--rmgr' => 'Btree'); + is(grep(!/^rmgr: Btree/, @lines), 0, 'only Btree lines'); + + @lines = test_pg_waldump('--path' => $path, '--fork' => 'init'); + is(grep(!/fork init/, @lines), 0, 'only init fork lines'); + + @lines = test_pg_waldump('--path' => $path, + '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_t1_oid"); + is(grep(!/rel $default_ts_oid\/$postgres_db_oid\/$rel_t1_oid/, @lines), + 0, 'only lines for selected relation'); + + @lines = test_pg_waldump('--path' => $path, + '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_i1a_oid", + '--block' => 1); + is(grep(!/\bblk 1\b/, @lines), 0, 'only lines for selected block'); + } +} done_testing(); -- 2.47.1 [application/octet-stream] v11-0005-pg_waldump-Add-support-for-archived-WAL-decoding.patch (39.3K, 6-v11-0005-pg_waldump-Add-support-for-archived-WAL-decoding.patch) download | inline diff: From 60a31941765d31420e62b77dcc607973823521e7 Mon Sep 17 00:00:00 2001 From: Amul Sul <[email protected]> Date: Tue, 27 Jan 2026 15:44:09 +0530 Subject: [PATCH v11 5/9] pg_waldump: Add support for archived WAL decoding. pg_waldump can now accept the path to a tar archive containing WAL files and decode them. This feature was added primarily for pg_verifybackup, which previously disabled WAL parsing for tar-formatted backups. Note that this patch requires that the WAL files within the archive be in sequential order; an error will be reported otherwise. The next patch is planned to remove this restriction. --- doc/src/sgml/ref/pg_waldump.sgml | 8 +- src/bin/pg_waldump/Makefile | 7 +- src/bin/pg_waldump/archive_waldump.c | 615 +++++++++++++++++++++++++++ src/bin/pg_waldump/meson.build | 4 +- src/bin/pg_waldump/pg_waldump.c | 235 +++++++--- src/bin/pg_waldump/pg_waldump.h | 44 ++ src/bin/pg_waldump/t/001_basic.pl | 84 +++- src/tools/pgindent/typedefs.list | 3 + 8 files changed, 925 insertions(+), 75 deletions(-) create mode 100644 src/bin/pg_waldump/archive_waldump.c diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml index ce23add5577..d004bb0f67e 100644 --- a/doc/src/sgml/ref/pg_waldump.sgml +++ b/doc/src/sgml/ref/pg_waldump.sgml @@ -141,13 +141,17 @@ PostgreSQL documentation <term><option>--path=<replaceable>path</replaceable></option></term> <listitem> <para> - Specifies a directory to search for WAL segment files or a - directory with a <literal>pg_wal</literal> subdirectory that + Specifies a tar archive or a directory to search for WAL segment files + or a directory with a <literal>pg_wal</literal> subdirectory that contains such files. The default is to search in the current directory, the <literal>pg_wal</literal> subdirectory of the current directory, and the <literal>pg_wal</literal> subdirectory of <envar>PGDATA</envar>. </para> + <para> + If a tar archive is provided, its WAL segment files must be in + sequential order; otherwise, an error will be reported. + </para> </listitem> </varlistentry> diff --git a/src/bin/pg_waldump/Makefile b/src/bin/pg_waldump/Makefile index 4c1ee649501..aabb87566a2 100644 --- a/src/bin/pg_waldump/Makefile +++ b/src/bin/pg_waldump/Makefile @@ -3,6 +3,9 @@ PGFILEDESC = "pg_waldump - decode and display WAL" PGAPPICON=win32 +# make these available to TAP test scripts +export TAR + subdir = src/bin/pg_waldump top_builddir = ../../.. include $(top_builddir)/src/Makefile.global @@ -10,13 +13,15 @@ include $(top_builddir)/src/Makefile.global OBJS = \ $(RMGRDESCOBJS) \ $(WIN32RES) \ + archive_waldump.o \ compat.o \ pg_waldump.o \ rmgrdesc.o \ xlogreader.o \ xlogstats.o -override CPPFLAGS := -DFRONTEND $(CPPFLAGS) +override CPPFLAGS := -DFRONTEND -I$(libpq_srcdir) $(CPPFLAGS) +LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils RMGRDESCSOURCES = $(sort $(notdir $(wildcard $(top_srcdir)/src/backend/access/rmgrdesc/*desc*.c))) RMGRDESCOBJS = $(patsubst %.c,%.o,$(RMGRDESCSOURCES)) diff --git a/src/bin/pg_waldump/archive_waldump.c b/src/bin/pg_waldump/archive_waldump.c new file mode 100644 index 00000000000..0b8352ff6e8 --- /dev/null +++ b/src/bin/pg_waldump/archive_waldump.c @@ -0,0 +1,615 @@ +/*------------------------------------------------------------------------- + * + * archive_waldump.c + * A generic facility for reading WAL data from tar archives via archive + * streamer. + * + * Portions Copyright (c) 2026, PostgreSQL Global Development Group + * + * IDENTIFICATION + * src/bin/pg_waldump/archive_waldump.c + * + *------------------------------------------------------------------------- + */ + +#include "postgres_fe.h" + +#include <unistd.h> + +#include "access/xlog_internal.h" +#include "common/hashfn.h" +#include "common/logging.h" +#include "fe_utils/simple_list.h" +#include "pg_waldump.h" + +/* + * How many bytes should we try to read from a file at once? + */ +#define READ_CHUNK_SIZE (128 * 1024) + +/* + * Hash entry representing a WAL segment retrieved from the archive. + * + * While WAL segments are typically read sequentially, individual entries + * maintain their own buffers for the following reasons: + * + * 1. Boundary Handling: The archive streamer provides a continuous byte + * stream. A single streaming chunk may contain the end of one WAL segment + * and the start of the next. Separate buffers allow us to easily + * partition and track these bytes by their respective segments. + * + * 2. Out-of-Order Support: Dedicated buffers simplify logic if segments + * are ever archived or retrieved out of sequence. + * + * To minimize the memory footprint, entries and their associated buffers are + * freed immediately once consumed. Since pg_waldump does not request the same + * bytes twice, a segment is discarded as soon as it moves past it. + */ +typedef struct ArchivedWALFile +{ + uint32 status; /* hash status */ + const char *fname; /* hash key: WAL segment name */ + + StringInfo buf; + + int total_read; /* total size of a WAL read from archive */ +} ArchivedWALFile; + +static uint32 hash_string_pointer(const char *s); +#define SH_PREFIX ArchivedWAL +#define SH_ELEMENT_TYPE ArchivedWALFile +#define SH_KEY_TYPE const char * +#define SH_KEY fname +#define SH_HASH_KEY(tb, key) hash_string_pointer(key) +#define SH_EQUAL(tb, a, b) (strcmp(a, b) == 0) +#define SH_SCOPE static inline +#define SH_RAW_ALLOCATOR pg_malloc0 +#define SH_DECLARE +#define SH_DEFINE +#include "lib/simplehash.h" + +typedef struct astreamer_waldump +{ + astreamer base; + XLogDumpPrivate *privateInfo; +} astreamer_waldump; + +static ArchivedWALFile *get_archive_wal_entry(const char *fname, + XLogDumpPrivate *privateInfo, + int WalSegSz); +static int read_archive_file(XLogDumpPrivate *privateInfo, Size count); + +static astreamer *astreamer_waldump_new(XLogDumpPrivate *privateInfo); +static void astreamer_waldump_content(astreamer *streamer, + astreamer_member *member, + const char *data, int len, + astreamer_archive_context context); +static void astreamer_waldump_finalize(astreamer *streamer); +static void astreamer_waldump_free(astreamer *streamer); + +static bool member_is_wal_file(astreamer_waldump *mystreamer, + astreamer_member *member, + char **fname); + +static const astreamer_ops astreamer_waldump_ops = { + .content = astreamer_waldump_content, + .finalize = astreamer_waldump_finalize, + .free = astreamer_waldump_free +}; + +/* + * Returns true if the given file is a tar archive and outputs its compression + * algorithm. + */ +bool +is_archive_file(const char *fname, pg_compress_algorithm *compression) +{ + int fname_len = strlen(fname); + pg_compress_algorithm compress_algo; + + /* Now, check the compression type of the tar */ + if (fname_len > 4 && + strcmp(fname + fname_len - 4, ".tar") == 0) + compress_algo = PG_COMPRESSION_NONE; + else if (fname_len > 4 && + strcmp(fname + fname_len - 4, ".tgz") == 0) + compress_algo = PG_COMPRESSION_GZIP; + else if (fname_len > 7 && + strcmp(fname + fname_len - 7, ".tar.gz") == 0) + compress_algo = PG_COMPRESSION_GZIP; + else if (fname_len > 8 && + strcmp(fname + fname_len - 8, ".tar.lz4") == 0) + compress_algo = PG_COMPRESSION_LZ4; + else if (fname_len > 8 && + strcmp(fname + fname_len - 8, ".tar.zst") == 0) + compress_algo = PG_COMPRESSION_ZSTD; + else + return false; + + *compression = compress_algo; + + return true; +} + +/* + * Initializes the tar archive reader, creates a hash table for WAL entries, + * checks for existing valid WAL segments in the archive file and retrieves the + * segment size, and sets up filters for relevant entries. + */ +void +init_archive_reader(XLogDumpPrivate *privateInfo, const char *waldir, + int *WalSegSz, pg_compress_algorithm compression) +{ + int fd; + astreamer *streamer; + ArchivedWALFile *entry = NULL; + XLogLongPageHeader longhdr; + + /* Open tar archive and store its file descriptor */ + fd = open_file_in_directory(waldir, privateInfo->archive_name); + + if (fd < 0) + pg_fatal("could not open file \"%s\"", privateInfo->archive_name); + + privateInfo->archive_fd = fd; + + streamer = astreamer_waldump_new(privateInfo); + + /* Before that we must parse the tar archive. */ + streamer = astreamer_tar_parser_new(streamer); + + /* Before that we must decompress, if archive is compressed. */ + if (compression == PG_COMPRESSION_GZIP) + streamer = astreamer_gzip_decompressor_new(streamer); + else if (compression == PG_COMPRESSION_LZ4) + streamer = astreamer_lz4_decompressor_new(streamer); + else if (compression == PG_COMPRESSION_ZSTD) + streamer = astreamer_zstd_decompressor_new(streamer); + + privateInfo->archive_streamer = streamer; + + /* Hash table storing WAL entries read from the archive */ + privateInfo->archive_wal_htab = ArchivedWAL_create(16, NULL); + + /* + * Verify that the archive contains valid WAL files and fetch WAL segment + * size + */ + while (entry == NULL || entry->buf->len < XLOG_BLCKSZ) + { + if (read_archive_file(privateInfo, XLOG_BLCKSZ) == 0) + pg_fatal("could not find WAL in \"%s\" archive", + privateInfo->archive_name); + + entry = privateInfo->cur_file; + } + + /* Set WalSegSz if WAL data is successfully read */ + longhdr = (XLogLongPageHeader) entry->buf->data; + + if (!IsValidWalSegSize(longhdr->xlp_seg_size)) + { + pg_log_error(ngettext("invalid WAL segment size in WAL file from archive \"%s\" (%d byte)", + "invalid WAL segment size in WAL file from archive \"%s\" (%d bytes)", + longhdr->xlp_seg_size), + privateInfo->archive_name, longhdr->xlp_seg_size); + pg_log_error_detail("The WAL segment size must be a power of two between 1 MB and 1 GB."); + exit(1); + } + + *WalSegSz = longhdr->xlp_seg_size; + + /* + * With the WAL segment size available, we can now initialize the + * dependent start and end segment numbers. + */ + Assert(!XLogRecPtrIsInvalid(privateInfo->startptr)); + XLByteToSeg(privateInfo->startptr, privateInfo->startSegNo, *WalSegSz); + + if (XLogRecPtrIsInvalid(privateInfo->endptr)) + privateInfo->endSegNo = UINT64_MAX; + else + XLByteToSeg(privateInfo->endptr, privateInfo->endSegNo, *WalSegSz); +} + +/* + * Release the archive streamer chain and close the archive file. + */ +void +free_archive_reader(XLogDumpPrivate *privateInfo) +{ + /* + * NB: Normally, astreamer_finalize() is called before astreamer_free() to + * flush any remaining buffered data or to ensure the end of the tar + * archive is reached. However, when decoding a WAL file, once we hit the + * end LSN, any remaining WAL data in the buffer or the tar archive's + * unreached end can be safely ignored. + */ + astreamer_free(privateInfo->archive_streamer); + + /* Close the file. */ + if (close(privateInfo->archive_fd) != 0) + pg_log_error("could not close file \"%s\": %m", + privateInfo->archive_name); +} + +/* + * Copies WAL data from astreamer to readBuff; if unavailable, fetches more + * from the tar archive via astreamer. + */ +int +read_archive_wal_page(XLogDumpPrivate *privateInfo, XLogRecPtr targetPagePtr, + Size count, char *readBuff, int WalSegSz) +{ + char *p = readBuff; + Size nbytes = count; + XLogRecPtr recptr = targetPagePtr; + XLogSegNo segno; + char fname[MAXFNAMELEN]; + ArchivedWALFile *entry; + + /* Identify the segment and locate its entry in the archive hash */ + XLByteToSeg(targetPagePtr, segno, WalSegSz); + XLogFileName(fname, privateInfo->timeline, segno, WalSegSz); + entry = get_archive_wal_entry(fname, privateInfo, WalSegSz); + + while (nbytes > 0) + { + char *buf = entry->buf->data; + int bufLen = entry->buf->len; + XLogRecPtr endPtr; + XLogRecPtr startPtr; + + /* Calculate the LSN range currently residing in the buffer */ + XLogSegNoOffsetToRecPtr(segno, entry->total_read, WalSegSz, endPtr); + startPtr = endPtr - bufLen; + + /* + * Copy the requested WAL record if it exists in the buffer. + */ + if (bufLen > 0 && startPtr <= recptr && recptr < endPtr) + { + int copyBytes; + int offset = recptr - startPtr; + + /* + * Given startPtr <= recptr < endPtr and a total buffer size + * 'bufLen', the offset (recptr - startPtr) will always be less + * than 'bufLen'. + */ + Assert(offset < bufLen); + + copyBytes = Min(nbytes, bufLen - offset); + memcpy(p, buf + offset, copyBytes); + + /* Update state for read */ + recptr += copyBytes; + nbytes -= copyBytes; + p += copyBytes; + } + else + { + /* + * pg_waldump may request to re-read the currently active page, + * but never a page older than the current one. Therefore, any + * fully consumed WAL data preceding the current page can be + * safely discarded. + */ + resetStringInfo(entry->buf); + + /* + * Push back the partial page data for the current page to the + * buffer, ensuring it remains full page available for re-reading + * if requested. + */ + if (p > readBuff) + { + Assert((count - nbytes) > 0); + appendBinaryStringInfo(entry->buf, readBuff, count - nbytes); + } + + /* + * Now, fetch more data; raise an error if it's not the current + * segment being read by the archive streamer or if reading of the + * archived file has finished. + */ + if (privateInfo->cur_file != entry || + read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0) + pg_fatal("could not read file \"%s\" from archive \"%s\": read %lld of %lld", + fname, privateInfo->archive_name, + (long long int) count - nbytes, + (long long int) nbytes); + } + } + + /* + * Should have either have successfully read all the requested bytes or + * reported a failure before this point. + */ + Assert(nbytes == 0); + + /* + * NB: We return the fixed value provided as input. Although we could + * return a boolean since we either successfully read the WAL page or + * raise an error, but the caller expects this value to be returned. The + * routine that reads WAL pages from the physical WAL file follows the + * same convention. + */ + return count; +} + +/* + * Clears the buffer of a WAL entry that is being ignored. This frees up memory + * and prevents the accumulation of irrelevant WAL data. Additionally, + * conditionally setting cur_file within privateinfo to NULL ensures the + * archive streamer skips unnecessary copy operations + */ +void +free_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo) +{ + ArchivedWALFile *entry; + + entry = ArchivedWAL_lookup(privateInfo->archive_wal_htab, fname); + + if (entry == NULL) + return; + + /* Destroy the buffer */ + destroyStringInfo(entry->buf); + entry->buf = NULL; + + /* Set cur_file to NULL if it matches the entry being ignored */ + if (privateInfo->cur_file == entry) + privateInfo->cur_file = NULL; + + ArchivedWAL_delete_item(privateInfo->archive_wal_htab, entry); +} + +/* + * Returns the archived WAL entry from the hash table if it exists. Otherwise, + * it invokes the routine to read the archived file, which then populates the + * entry in the hash table if that WAL exists in the archive. + */ +static ArchivedWALFile * +get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo, + int WalSegSz) +{ + ArchivedWALFile *entry = NULL; + XLogSegNo segno; + TimeLineID timeline; + + /* Search hash table */ + entry = ArchivedWAL_lookup(privateInfo->archive_wal_htab, fname); + + if (entry != NULL) + return entry; + + /* Get segment number for faster comparison than string-based names */ + XLogFromFileName(fname, &timeline, &segno, WalSegSz); + + /* Needed WAL yet to be decoded from archive, do the same */ + while (1) + { + XLogSegNo curSegNo; + TimeLineID curSegTimeline; + + /* Fetch more data */ + if (read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0) + break; /* archive file ended */ + + /* + * Either, here for the first time, or the archived streamer is + * reading a non-WAL file or an irrelevant WAL file. + */ + if (privateInfo->cur_file == NULL) + continue; + + entry = privateInfo->cur_file; + + /* + * Ignore if the timeline is different or the current segment is not + * the desired one. + */ + XLogFromFileName(entry->fname, &curSegTimeline, &curSegNo, WalSegSz); + if (privateInfo->timeline != curSegTimeline || + privateInfo->startSegNo > curSegNo || + privateInfo->endSegNo < curSegNo || + segno > curSegNo) + { + free_archive_wal_entry(entry->fname, privateInfo); + continue; + } + + /* Found the required entry */ + if (curSegNo == segno) + return entry; + + /* WAL segments must be archived in order */ + pg_log_error("WAL files are not archived in sequential order"); + pg_log_error_detail("Expecting segment number " UINT64_FORMAT " but found " UINT64_FORMAT ".", + segno, curSegNo); + exit(1); + } + + /* Requested WAL segment not found */ + pg_fatal("could not find file \"%s\" in archive", fname); +} + +/* + * Reads the archive file and passes it to the archive streamer for + * decompression. + */ +static int +read_archive_file(XLogDumpPrivate *privateInfo, Size count) +{ + int rc; + char *buffer; + + buffer = pg_malloc(READ_CHUNK_SIZE * sizeof(uint8)); + + rc = read(privateInfo->archive_fd, buffer, count); + if (rc < 0) + pg_fatal("could not read file \"%s\": %m", + privateInfo->archive_name); + + /* + * Decompress (if required), and then parse the previously read contents + * of the tar file. + */ + if (rc > 0) + astreamer_content(privateInfo->archive_streamer, NULL, + buffer, rc, ASTREAMER_UNKNOWN); + pg_free(buffer); + + return rc; +} + +/* + * Create an astreamer that can read WAL from a tar file. + */ +static astreamer * +astreamer_waldump_new(XLogDumpPrivate *privateInfo) +{ + astreamer_waldump *streamer; + + streamer = palloc0(sizeof(astreamer_waldump)); + *((const astreamer_ops **) &streamer->base.bbs_ops) = + &astreamer_waldump_ops; + + streamer->privateInfo = privateInfo; + + return &streamer->base; +} + +/* + * Main entry point of the archive streamer for reading WAL data from a tar + * file. If a member is identified as a valid WAL file, a hash entry is created + * for it, and its contents are copied into that entry's buffer, making them + * accessible to the decoding routine. + */ +static void +astreamer_waldump_content(astreamer *streamer, astreamer_member *member, + const char *data, int len, + astreamer_archive_context context) +{ + astreamer_waldump *mystreamer = (astreamer_waldump *) streamer; + XLogDumpPrivate *privateInfo = mystreamer->privateInfo; + + Assert(context != ASTREAMER_UNKNOWN); + + switch (context) + { + case ASTREAMER_MEMBER_HEADER: + { + char *fname = NULL; + ArchivedWALFile *entry; + bool found; + + pg_log_debug("reading \"%s\"", member->pathname); + + if (!member_is_wal_file(mystreamer, member, &fname)) + break; + + entry = ArchivedWAL_insert(privateInfo->archive_wal_htab, + fname, &found); + + /* + * Shouldn't happen, but if it does, simply ignore the + * duplicate WAL file. + */ + if (found) + { + pg_log_warning("ignoring duplicate WAL file found in archive: \"%s\"", + member->pathname); + break; + } + + entry->buf = makeStringInfo(); + privateInfo->cur_file = entry; + } + break; + + case ASTREAMER_MEMBER_CONTENTS: + if (privateInfo->cur_file) + { + appendBinaryStringInfo(privateInfo->cur_file->buf, data, len); + privateInfo->cur_file->total_read += len; + } + break; + + case ASTREAMER_MEMBER_TRAILER: + privateInfo->cur_file = NULL; + break; + + case ASTREAMER_ARCHIVE_TRAILER: + break; + + default: + /* Shouldn't happen. */ + pg_fatal("unexpected state while parsing tar file"); + } +} + +/* + * End-of-stream processing for an astreamer_waldump stream. + */ +static void +astreamer_waldump_finalize(astreamer *streamer) +{ + Assert(streamer->bbs_next == NULL); +} + +/* + * Free memory associated with a astreamer_waldump stream. + */ +static void +astreamer_waldump_free(astreamer *streamer) +{ + Assert(streamer->bbs_next == NULL); + pfree(streamer); +} + +/* + * Returns true if the archive member name matches the WAL naming format. If + * successful, it also outputs the WAL segment name. + */ +static bool +member_is_wal_file(astreamer_waldump *mystreamer, astreamer_member *member, + char **fname) +{ + int pathlen; + char *filename; + + /* We are only interested in normal files. */ + if (member->is_directory || member->is_link) + return false; + + pathlen = strlen(member->pathname); + if (pathlen < XLOG_FNAME_LEN) + return false; + + /* WAL files from the top-level or pg_wal directory will be decoded */ + if (pathlen > XLOG_FNAME_LEN && + strncmp(member->pathname, XLOGDIR, strlen(XLOGDIR)) != 0) + return false; + + /* WAL file could be with full path */ + filename = member->pathname + (pathlen - XLOG_FNAME_LEN); + if (!IsXLogFileName(filename)) + return false; + + *fname = pnstrdup(filename, XLOG_FNAME_LEN); + + return true; +} + +/* + * Helper function for filemap hash table. + */ +static uint32 +hash_string_pointer(const char *s) +{ + unsigned char *ss = (unsigned char *) s; + + return hash_bytes(ss, strlen(s)); +} diff --git a/src/bin/pg_waldump/meson.build b/src/bin/pg_waldump/meson.build index 633a9874bb5..a93bd947ca2 100644 --- a/src/bin/pg_waldump/meson.build +++ b/src/bin/pg_waldump/meson.build @@ -1,6 +1,7 @@ # Copyright (c) 2022-2026, PostgreSQL Global Development Group pg_waldump_sources = files( + 'archive_waldump.c', 'compat.c', 'pg_waldump.c', 'rmgrdesc.c', @@ -18,7 +19,7 @@ endif pg_waldump = executable('pg_waldump', pg_waldump_sources, - dependencies: [frontend_code, lz4, zstd], + dependencies: [frontend_code, lz4, zstd, libpq], c_args: ['-DFRONTEND'], # needed for xlogreader et al kwargs: default_bin_args, ) @@ -29,6 +30,7 @@ tests += { 'sd': meson.current_source_dir(), 'bd': meson.current_build_dir(), 'tap': { + 'env': {'TAR': tar.found() ? tar.full_path() : ''}, 'tests': [ 't/001_basic.pl', 't/002_save_fullpage.pl', diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c index 958a71a01cf..3cd279caded 100644 --- a/src/bin/pg_waldump/pg_waldump.c +++ b/src/bin/pg_waldump/pg_waldump.c @@ -176,7 +176,7 @@ split_path(const char *path, char **dir, char **fname) * * return a read only fd */ -static int +int open_file_in_directory(const char *directory, const char *fname) { int fd = -1; @@ -440,6 +440,63 @@ WALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen, return count; } +/* + * pg_waldump's XLogReaderRoutine->segment_open callback to support dumping WAL + * files from tar archives. + */ +static void +TarWALDumpOpenSegment(XLogReaderState *state, XLogSegNo nextSegNo, + TimeLineID *tli_p) +{ + /* No action needed */ +} + +/* + * pg_waldump's XLogReaderRoutine->segment_close callback. + */ +static void +TarWALDumpCloseSegment(XLogReaderState *state) +{ + /* No action needed */ +} + +/* + * pg_waldump's XLogReaderRoutine->page_read callback to support dumping WAL + * files from tar archives. + */ +static int +TarWALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen, + XLogRecPtr targetPtr, char *readBuff) +{ + XLogDumpPrivate *private = state->private_data; + int count = required_read_len(private, targetPagePtr, reqLen); + int WalSegSz = state->segcxt.ws_segsize; + XLogSegNo nextSegNo; + + /* Bail out if the count to be read is not valid */ + if (count < 0) + return -1; + + /* + * If the target page is in a different segment, free the buffer space + * occupied by the previous segment data. Since pg_waldump never requests + * the same WAL bytes twice, moving to a new segment implies the previous + * buffer's data and that segment will not be needed again. + */ + nextSegNo = state->seg.ws_segno; + if (!XLByteInSeg(targetPagePtr, nextSegNo, WalSegSz)) + { + char fname[MAXFNAMELEN]; + + XLogFileName(fname, state->seg.ws_tli, nextSegNo, WalSegSz); + free_archive_wal_entry(fname, private); + } + + /* Read the WAL page from the archive streamer */ + return read_archive_wal_page(private, targetPagePtr, count, readBuff, + WalSegSz); +} + /* * Boolean to return whether the given WAL record matches a specific relation * and optionally block. @@ -777,8 +834,8 @@ usage(void) printf(_(" -F, --fork=FORK only show records that modify blocks in fork FORK;\n" " valid names are main, fsm, vm, init\n")); printf(_(" -n, --limit=N number of records to display\n")); - printf(_(" -p, --path=PATH directory in which to find WAL segment files or a\n" - " directory with a ./pg_wal that contains such files\n" + printf(_(" -p, --path=PATH tar archive or a directory in which to find WAL segment files or\n" + " a directory with a ./pg_wal that contains such files\n" " (default: current directory, ./pg_wal, $PGDATA/pg_wal)\n")); printf(_(" -q, --quiet do not print any output, except for errors\n")); printf(_(" -r, --rmgr=RMGR only show records generated by resource manager RMGR;\n" @@ -810,8 +867,11 @@ main(int argc, char **argv) XLogRecord *record; XLogRecPtr first_record; char *waldir = NULL; + char *walpath = NULL; char *errormsg; int WalSegSz; + bool is_archive = false; + pg_compress_algorithm compression; static struct option long_options[] = { {"bkp-details", no_argument, NULL, 'b'}, @@ -943,7 +1003,7 @@ main(int argc, char **argv) } break; case 'p': - waldir = pg_strdup(optarg); + walpath = pg_strdup(optarg); break; case 'q': config.quiet = true; @@ -1107,10 +1167,20 @@ main(int argc, char **argv) goto bad_argument; } - if (waldir != NULL) + if (walpath != NULL) { + /* validate path points to tar archive */ + if (is_archive_file(walpath, &compression)) + { + char *fname = NULL; + + split_path(walpath, &waldir, &fname); + + private.archive_name = fname; + is_archive = true; + } /* validate path points to directory */ - if (!verify_directory(waldir)) + else if (!verify_directory(walpath)) { pg_log_error("could not open directory \"%s\": %m", waldir); goto bad_argument; @@ -1128,6 +1198,17 @@ main(int argc, char **argv) int fd; XLogSegNo segno; + /* + * If a tar archive is passed using the --path option, all other + * arguments become unnecessary. + */ + if (is_archive) + { + pg_log_error("unnecessary command-line arguments specified with tar archive (first is \"%s\")", + argv[optind]); + goto bad_argument; + } + split_path(argv[optind], &directory, &fname); if (waldir == NULL && directory != NULL) @@ -1138,69 +1219,77 @@ main(int argc, char **argv) pg_fatal("could not open directory \"%s\": %m", waldir); } - waldir = identify_target_directory(waldir, fname, &WalSegSz); - fd = open_file_in_directory(waldir, fname); - if (fd < 0) - pg_fatal("could not open file \"%s\"", fname); - close(fd); - - /* parse position from file */ - XLogFromFileName(fname, &private.timeline, &segno, WalSegSz); - - if (!XLogRecPtrIsValid(private.startptr)) - XLogSegNoOffsetToRecPtr(segno, 0, WalSegSz, private.startptr); - else if (!XLByteInSeg(private.startptr, segno, WalSegSz)) + if (fname != NULL && is_archive_file(fname, &compression)) { - pg_log_error("start WAL location %X/%08X is not inside file \"%s\"", - LSN_FORMAT_ARGS(private.startptr), - fname); - goto bad_argument; + private.archive_name = fname; + is_archive = true; } - - /* no second file specified, set end position */ - if (!(optind + 1 < argc) && !XLogRecPtrIsValid(private.endptr)) - XLogSegNoOffsetToRecPtr(segno + 1, 0, WalSegSz, private.endptr); - - /* parse ENDSEG if passed */ - if (optind + 1 < argc) + else { - XLogSegNo endsegno; - - /* ignore directory, already have that */ - split_path(argv[optind + 1], &directory, &fname); - + waldir = identify_target_directory(waldir, fname, &WalSegSz); fd = open_file_in_directory(waldir, fname); if (fd < 0) pg_fatal("could not open file \"%s\"", fname); close(fd); /* parse position from file */ - XLogFromFileName(fname, &private.timeline, &endsegno, WalSegSz); + XLogFromFileName(fname, &private.timeline, &segno, WalSegSz); - if (endsegno < segno) - pg_fatal("ENDSEG %s is before STARTSEG %s", - argv[optind + 1], argv[optind]); + if (!XLogRecPtrIsValid(private.startptr)) + XLogSegNoOffsetToRecPtr(segno, 0, WalSegSz, private.startptr); + else if (!XLByteInSeg(private.startptr, segno, WalSegSz)) + { + pg_log_error("start WAL location %X/%08X is not inside file \"%s\"", + LSN_FORMAT_ARGS(private.startptr), + fname); + goto bad_argument; + } - if (!XLogRecPtrIsValid(private.endptr)) - XLogSegNoOffsetToRecPtr(endsegno + 1, 0, WalSegSz, - private.endptr); + /* no second file specified, set end position */ + if (!(optind + 1 < argc) && !XLogRecPtrIsValid(private.endptr)) + XLogSegNoOffsetToRecPtr(segno + 1, 0, WalSegSz, private.endptr); - /* set segno to endsegno for check of --end */ - segno = endsegno; - } + /* parse ENDSEG if passed */ + if (optind + 1 < argc) + { + XLogSegNo endsegno; + /* ignore directory, already have that */ + split_path(argv[optind + 1], &directory, &fname); - if (!XLByteInSeg(private.endptr, segno, WalSegSz) && - private.endptr != (segno + 1) * WalSegSz) - { - pg_log_error("end WAL location %X/%08X is not inside file \"%s\"", - LSN_FORMAT_ARGS(private.endptr), - argv[argc - 1]); - goto bad_argument; + fd = open_file_in_directory(waldir, fname); + if (fd < 0) + pg_fatal("could not open file \"%s\"", fname); + close(fd); + + /* parse position from file */ + XLogFromFileName(fname, &private.timeline, &endsegno, WalSegSz); + + if (endsegno < segno) + pg_fatal("ENDSEG %s is before STARTSEG %s", + argv[optind + 1], argv[optind]); + + if (!XLogRecPtrIsValid(private.endptr)) + XLogSegNoOffsetToRecPtr(endsegno + 1, 0, WalSegSz, + private.endptr); + + /* set segno to endsegno for check of --end */ + segno = endsegno; + } + + + if (!XLByteInSeg(private.endptr, segno, WalSegSz) && + private.endptr != (segno + 1) * WalSegSz) + { + pg_log_error("end WAL location %X/%08X is not inside file \"%s\"", + LSN_FORMAT_ARGS(private.endptr), + argv[argc - 1]); + goto bad_argument; + } } } - else - waldir = identify_target_directory(waldir, NULL, &WalSegSz); + else if (!is_archive) + waldir = identify_target_directory(walpath, NULL, &WalSegSz); /* we don't know what to print */ if (!XLogRecPtrIsValid(private.startptr)) @@ -1212,12 +1301,37 @@ main(int argc, char **argv) /* done with argument parsing, do the actual work */ /* we have everything we need, start reading */ - xlogreader_state = - XLogReaderAllocate(WalSegSz, waldir, - XL_ROUTINE(.page_read = WALDumpReadPage, - .segment_open = WALDumpOpenSegment, - .segment_close = WALDumpCloseSegment), - &private); + if (is_archive) + { + /* + * A NULL WAL directory indicates that the archive file is located in + * the current working directory of the pg_waldump execution + */ + if (waldir == NULL) + waldir = pg_strdup("."); + + /* Set up for reading tar file */ + init_archive_reader(&private, waldir, &WalSegSz, compression); + + /* Routine to decode WAL files in tar archive */ + xlogreader_state = + XLogReaderAllocate(WalSegSz, waldir, + XL_ROUTINE(.page_read = TarWALDumpReadPage, + .segment_open = TarWALDumpOpenSegment, + .segment_close = TarWALDumpCloseSegment), + &private); + } + else + { + /* Routine to decode WAL files */ + xlogreader_state = + XLogReaderAllocate(WalSegSz, waldir, + XL_ROUTINE(.page_read = WALDumpReadPage, + .segment_open = WALDumpOpenSegment, + .segment_close = WALDumpCloseSegment), + &private); + } + if (!xlogreader_state) pg_fatal("out of memory while allocating a WAL reading processor"); @@ -1326,6 +1440,9 @@ main(int argc, char **argv) XLogReaderFree(xlogreader_state); + if (is_archive) + free_archive_reader(&private); + return EXIT_SUCCESS; bad_argument: diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h index b88543856e5..e98a20152da 100644 --- a/src/bin/pg_waldump/pg_waldump.h +++ b/src/bin/pg_waldump/pg_waldump.h @@ -12,6 +12,11 @@ #define PG_WALDUMP_H #include "access/xlogdefs.h" +#include "fe_utils/astreamer.h" + +/* Forward declaration */ +struct ArchivedWALFile; +struct ArchivedWAL_hash; /* Contains the necessary information to drive WAL decoding */ typedef struct XLogDumpPrivate @@ -20,6 +25,45 @@ typedef struct XLogDumpPrivate XLogRecPtr startptr; XLogRecPtr endptr; bool endptr_reached; + + /* Fields required to read WAL from archive */ + char *archive_name; /* Tar archive name */ + int archive_fd; /* File descriptor for the open tar file */ + + astreamer *archive_streamer; + + /* What the archive streamer is currently reading */ + struct ArchivedWALFile *cur_file; + + /* + * Hash table of all WAL files that the archive stream has read, including + * the one currently in progress. + */ + struct ArchivedWAL_hash *archive_wal_htab; + + /* + * Although these values can be easily derived from startptr and endptr, + * doing so repeatedly for each archived member would be inefficient, as + * it would involve recalculating and filtering out irrelevant WAL + * segments. + */ + XLogSegNo startSegNo; + XLogSegNo endSegNo; } XLogDumpPrivate; +extern int open_file_in_directory(const char *directory, const char *fname); + +extern bool is_archive_file(const char *fname, + pg_compress_algorithm *compression); +extern void init_archive_reader(XLogDumpPrivate *privateInfo, + const char *waldir, int *WalSegSz, + pg_compress_algorithm compression); +extern void free_archive_reader(XLogDumpPrivate *privateInfo); +extern int read_archive_wal_page(XLogDumpPrivate *privateInfo, + XLogRecPtr targetPagePtr, + Size count, char *readBuff, + int WalSegSz); +extern void free_archive_wal_entry(const char *fname, + XLogDumpPrivate *privateInfo); + #endif /* end of PG_WALDUMP_H */ diff --git a/src/bin/pg_waldump/t/001_basic.pl b/src/bin/pg_waldump/t/001_basic.pl index 3288fadcf48..13567fbdba1 100644 --- a/src/bin/pg_waldump/t/001_basic.pl +++ b/src/bin/pg_waldump/t/001_basic.pl @@ -3,10 +3,13 @@ use strict; use warnings FATAL => 'all'; +use Cwd; use PostgreSQL::Test::Cluster; use PostgreSQL::Test::Utils; use Test::More; +my $tar = $ENV{TAR}; + program_help_ok('pg_waldump'); program_version_ok('pg_waldump'); program_options_handling_ok('pg_waldump'); @@ -235,7 +238,7 @@ command_like( sub test_pg_waldump { local $Test::Builder::Level = $Test::Builder::Level + 1; - my @opts = @_; + my ($path, @opts) = @_; my ($stdout, $stderr); @@ -243,6 +246,7 @@ sub test_pg_waldump 'pg_waldump', '--start' => $start_lsn, '--end' => $end_lsn, + '--path' => $path, @opts ], '>' => \$stdout, @@ -254,11 +258,50 @@ sub test_pg_waldump return @lines; } -my @lines; +# Create a tar archive, sorting the file order +sub generate_archive +{ + my ($archive, $directory, $compression_flags) = @_; + + my @files; + opendir my $dh, $directory or die "opendir: $!"; + while (my $entry = readdir $dh) { + # Skip '.' and '..' + next if $entry eq '.' || $entry eq '..'; + push @files, $entry; + } + closedir $dh; + + @files = sort @files; + + # move into the WAL directory before archiving files + my $cwd = getcwd; + chdir($directory) || die "chdir: $!"; + command_ok([$tar, $compression_flags, $archive, @files]); + chdir($cwd) || die "chdir: $!"; +} + +my $tmp_dir = PostgreSQL::Test::Utils::tempdir_short(); my @scenarios = ( { - 'path' => $node->data_dir + 'path' => $node->data_dir, + 'is_archive' => 0, + 'enabled' => 1 + }, + { + 'path' => "$tmp_dir/pg_wal.tar", + 'compression_method' => 'none', + 'compression_flags' => '-cf', + 'is_archive' => 1, + 'enabled' => 1 + }, + { + 'path' => "$tmp_dir/pg_wal.tar.gz", + 'compression_method' => 'gzip', + 'compression_flags' => '-czf', + 'is_archive' => 1, + 'enabled' => check_pg_config("#define HAVE_LIBZ 1") }); for my $scenario (@scenarios) @@ -267,6 +310,19 @@ for my $scenario (@scenarios) SKIP: { + skip "tar command is not available", 3 + if !defined $tar; + skip "$scenario->{'compression_method'} compression not supported by this build", 3 + if !$scenario->{'enabled'} && $scenario->{'is_archive'}; + + # create pg_wal archive + if ($scenario->{'is_archive'}) + { + generate_archive($path, + $node->data_dir . '/pg_wal', + $scenario->{'compression_flags'}); + } + command_fails_like( [ 'pg_waldump', '--path' => $path ], qr/error: no start WAL location given/, @@ -298,38 +354,42 @@ for my $scenario (@scenarios) qr/error: error in WAL record at/, 'errors are shown with --quiet'); - @lines = test_pg_waldump('--path' => $path); + my @lines; + @lines = test_pg_waldump($path); is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines'); - @lines = test_pg_waldump('--path' => $path, '--limit' => 6); + @lines = test_pg_waldump($path, '--limit' => 6); is(@lines, 6, 'limit option observed'); - @lines = test_pg_waldump('--path' => $path, '--fullpage'); + @lines = test_pg_waldump($path, '--fullpage'); is(grep(!/^rmgr:.*\bFPW\b/, @lines), 0, 'all output lines are FPW'); - @lines = test_pg_waldump('--path' => $path, '--stats'); + @lines = test_pg_waldump($path, '--stats'); like($lines[0], qr/WAL statistics/, "statistics on stdout"); is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output'); - @lines = test_pg_waldump('--path' => $path, '--stats=record'); + @lines = test_pg_waldump($path, '--stats=record'); like($lines[0], qr/WAL statistics/, "statistics on stdout"); is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output'); - @lines = test_pg_waldump('--path' => $path, '--rmgr' => 'Btree'); + @lines = test_pg_waldump($path, '--rmgr' => 'Btree'); is(grep(!/^rmgr: Btree/, @lines), 0, 'only Btree lines'); - @lines = test_pg_waldump('--path' => $path, '--fork' => 'init'); + @lines = test_pg_waldump($path, '--fork' => 'init'); is(grep(!/fork init/, @lines), 0, 'only init fork lines'); - @lines = test_pg_waldump('--path' => $path, + @lines = test_pg_waldump($path, '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_t1_oid"); is(grep(!/rel $default_ts_oid\/$postgres_db_oid\/$rel_t1_oid/, @lines), 0, 'only lines for selected relation'); - @lines = test_pg_waldump('--path' => $path, + @lines = test_pg_waldump($path, '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_i1a_oid", '--block' => 1); is(grep(!/\bblk 1\b/, @lines), 0, 'only lines for selected block'); + + # Cleanup. + unlink $path if $scenario->{'is_archive'}; } } diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index ddbe4c64971..c322001e076 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -144,6 +144,8 @@ ArchiveOpts ArchiveShutdownCB ArchiveStartupCB ArchiveStreamState +ArchivedWALFile +ArchivedWAL_hash ArchiverOutput ArchiverStage ArrayAnalyzeExtraData @@ -3504,6 +3506,7 @@ astreamer_recovery_injector astreamer_tar_archiver astreamer_tar_parser astreamer_verify +astreamer_waldump astreamer_zstd_frame auth_password_hook_typ autovac_table -- 2.47.1 [application/octet-stream] v11-0006-pg_waldump-Remove-the-restriction-on-the-order-o.patch (13.4K, 7-v11-0006-pg_waldump-Remove-the-restriction-on-the-order-o.patch) download | inline diff: From 1baf6ee367d5e73222a7f3be7afdce0b273d586e Mon Sep 17 00:00:00 2001 From: Amul Sul <[email protected]> Date: Tue, 27 Jan 2026 15:38:34 +0530 Subject: [PATCH v11 6/9] pg_waldump: Remove the restriction on the order of archived WAL files. With previous patch, pg_waldump would stop decoding if WAL files were not in the required sequence. With this patch, decoding will now continue. Any WAL file that is out of order will be written to a temporary location, from which it will be read later. Once a temporary file has been read, it will be removed. --- doc/src/sgml/ref/pg_waldump.sgml | 8 +- src/bin/pg_waldump/archive_waldump.c | 173 ++++++++++++++++++++++++--- src/bin/pg_waldump/pg_waldump.c | 35 +++++- src/bin/pg_waldump/pg_waldump.h | 3 + src/bin/pg_waldump/t/001_basic.pl | 3 +- 5 files changed, 200 insertions(+), 22 deletions(-) diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml index d004bb0f67e..27adf77755c 100644 --- a/doc/src/sgml/ref/pg_waldump.sgml +++ b/doc/src/sgml/ref/pg_waldump.sgml @@ -149,8 +149,12 @@ PostgreSQL documentation of <envar>PGDATA</envar>. </para> <para> - If a tar archive is provided, its WAL segment files must be in - sequential order; otherwise, an error will be reported. + If a tar archive is provided and its WAL segment files are not in + sequential order, those files will be written to a temporary directory + named starting with <filename>waldump_tmp</filename>. This directory will be + created inside the directory specified by the <envar>TMPDIR</envar> + environment variable if it is set; otherwise, it will be created within + the same directory as the tar archive. </para> </listitem> </varlistentry> diff --git a/src/bin/pg_waldump/archive_waldump.c b/src/bin/pg_waldump/archive_waldump.c index 0b8352ff6e8..b524fdad37f 100644 --- a/src/bin/pg_waldump/archive_waldump.c +++ b/src/bin/pg_waldump/archive_waldump.c @@ -17,6 +17,7 @@ #include <unistd.h> #include "access/xlog_internal.h" +#include "common/file_perm.h" #include "common/hashfn.h" #include "common/logging.h" #include "fe_utils/simple_list.h" @@ -27,6 +28,9 @@ */ #define READ_CHUNK_SIZE (128 * 1024) +/* Temporary exported WAL file directory */ +char *TmpWalSegDir = NULL; + /* * Hash entry representing a WAL segment retrieved from the archive. * @@ -51,6 +55,7 @@ typedef struct ArchivedWALFile const char *fname; /* hash key: WAL segment name */ StringInfo buf; + bool tmpfile_exists; /* spill file exists? */ int total_read; /* total size of a WAL read from archive */ } ArchivedWALFile; @@ -78,6 +83,11 @@ static ArchivedWALFile *get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo, int WalSegSz); static int read_archive_file(XLogDumpPrivate *privateInfo, Size count); +static void setup_tmpwal_dir(const char *waldir); +static void cleanup_tmpwal_dir_atexit(void); + +static FILE *prepare_tmp_write(const char *fname); +static void perform_tmp_write(const char *fname, StringInfo buf, FILE *file); static astreamer *astreamer_waldump_new(XLogDumpPrivate *privateInfo); static void astreamer_waldump_content(astreamer *streamer, @@ -134,7 +144,9 @@ is_archive_file(const char *fname, pg_compress_algorithm *compression) /* * Initializes the tar archive reader, creates a hash table for WAL entries, * checks for existing valid WAL segments in the archive file and retrieves the - * segment size, and sets up filters for relevant entries. + * segment size, and sets up filters for relevant entries. It also configures a + * temporary directory for out-of-order WAL data and registers an exit callback + * to clean up temporary files. */ void init_archive_reader(XLogDumpPrivate *privateInfo, const char *waldir, @@ -210,6 +222,13 @@ init_archive_reader(XLogDumpPrivate *privateInfo, const char *waldir, privateInfo->endSegNo = UINT64_MAX; else XLByteToSeg(privateInfo->endptr, privateInfo->endSegNo, *WalSegSz); + + /* + * Setup temporary directory to store WAL segments and set up an exit + * callback to remove it upon completion. + */ + setup_tmpwal_dir(waldir); + atexit(cleanup_tmpwal_dir_atexit); } /* @@ -358,6 +377,17 @@ free_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo) destroyStringInfo(entry->buf); entry->buf = NULL; + /* Remove temporary file if any */ + if (entry->tmpfile_exists) + { + char fpath[MAXPGPATH]; + + snprintf(fpath, MAXPGPATH, "%s/%s", TmpWalSegDir, fname); + + if (unlink(fpath) == 0) + pg_log_debug("removed file \"%s\"", fpath); + } + /* Set cur_file to NULL if it matches the entry being ignored */ if (privateInfo->cur_file == entry) privateInfo->cur_file = NULL; @@ -369,6 +399,9 @@ free_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo) * Returns the archived WAL entry from the hash table if it exists. Otherwise, * it invokes the routine to read the archived file, which then populates the * entry in the hash table if that WAL exists in the archive. + * If the archive streamer happens to be reading a + * WAL from archive file that is not currently needed, that WAL data is written + * to a temporary file. */ static ArchivedWALFile * get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo, @@ -377,6 +410,7 @@ get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo, ArchivedWALFile *entry = NULL; XLogSegNo segno; TimeLineID timeline; + FILE *write_fp = NULL; /* Search hash table */ entry = ArchivedWAL_lookup(privateInfo->archive_wal_htab, fname); @@ -393,19 +427,29 @@ get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo, XLogSegNo curSegNo; TimeLineID curSegTimeline; + /* + * The WAL file entry currently being processed may change during + * archive streamer execution. Therefore, maintain a local variable to + * reference the previous entry, ensuring that any remaining data in + * its buffer is successfully flushed to the temporary file before + * switching to the next WAL entry. + */ + entry = privateInfo->cur_file; + /* Fetch more data */ - if (read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0) - break; /* archive file ended */ + if (entry == NULL || entry->buf->len == 0) + { + if (read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0) + break; /* archive file ended */ + } /* * Either, here for the first time, or the archived streamer is * reading a non-WAL file or an irrelevant WAL file. */ - if (privateInfo->cur_file == NULL) + if (entry == NULL) continue; - entry = privateInfo->cur_file; - /* * Ignore if the timeline is different or the current segment is not * the desired one. @@ -413,8 +457,7 @@ get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo, XLogFromFileName(entry->fname, &curSegTimeline, &curSegNo, WalSegSz); if (privateInfo->timeline != curSegTimeline || privateInfo->startSegNo > curSegNo || - privateInfo->endSegNo < curSegNo || - segno > curSegNo) + privateInfo->endSegNo < curSegNo) { free_archive_wal_entry(entry->fname, privateInfo); continue; @@ -424,11 +467,32 @@ get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo, if (curSegNo == segno) return entry; - /* WAL segments must be archived in order */ - pg_log_error("WAL files are not archived in sequential order"); - pg_log_error_detail("Expecting segment number " UINT64_FORMAT " but found " UINT64_FORMAT ".", - segno, curSegNo); - exit(1); + /* + * Archive streamer is currently reading a file that isn't the one + * asked for, but it's required in the future. It should be written to + * a temporary location for retrieval when needed. + */ + + /* Create a temporary file if one does not already exist */ + if (!entry->tmpfile_exists) + { + write_fp = prepare_tmp_write(entry->fname); + entry->tmpfile_exists = true; + } + + /* Flush data from the buffer to the file */ + perform_tmp_write(entry->fname, entry->buf, write_fp); + resetStringInfo(entry->buf); + + /* + * The change in the current segment entry indicates that the reading + * of this file has ended. + */ + if (entry != privateInfo->cur_file && write_fp != NULL) + { + fclose(write_fp); + write_fp = NULL; + } } /* Requested WAL segment not found */ @@ -465,7 +529,88 @@ read_archive_file(XLogDumpPrivate *privateInfo, Size count) } /* - * Create an astreamer that can read WAL from a tar file. + * Set up a temporary directory to temporarily store WAL segments. + */ +static void +setup_tmpwal_dir(const char *waldir) +{ + char *template; + + /* + * Use the directory specified by the TMPDIR environment variable. If it's + * not set, use the provided WAL directory to extract WAL file + * temporarily. + */ + template = psprintf("%s/waldump_tmp-XXXXXX", + getenv("TMPDIR") ? getenv("TMPDIR") : waldir); + TmpWalSegDir = mkdtemp(template); + + if (TmpWalSegDir == NULL) + pg_fatal("could not create directory \"%s\": %m", template); + + canonicalize_path(TmpWalSegDir); + + pg_log_debug("created directory \"%s\"", TmpWalSegDir); +} + +/* + * Remove temporary directory at exit, if any. + */ +static void +cleanup_tmpwal_dir_atexit(void) +{ + rmtree(TmpWalSegDir, true); +} + +/* + * Create an empty placeholder file and return its handle. + */ +static FILE * +prepare_tmp_write(const char *fname) +{ + char fpath[MAXPGPATH]; + FILE *file; + + snprintf(fpath, MAXPGPATH, "%s/%s", TmpWalSegDir, fname); + + /* Create an empty placeholder */ + file = fopen(fpath, PG_BINARY_W); + if (file == NULL) + pg_fatal("could not create file \"%s\": %m", fpath); + +#ifndef WIN32 + if (chmod(fpath, pg_file_create_mode)) + pg_fatal("could not set permissions on file \"%s\": %m", + fpath); +#endif + + pg_log_debug("temporarily exporting file \"%s\"", fpath); + + return file; +} + +/* + * Write buffer data to the given file handle. + */ +static void +perform_tmp_write(const char *fname, StringInfo buf, FILE *file) +{ + Assert(file); + + errno = 0; + if (buf->len > 0 && fwrite(buf->data, buf->len, 1, file) != 1) + { + /* + * If write didn't set errno, assume problem is no disk space + */ + if (errno == 0) + errno = ENOSPC; + pg_fatal("could not write to file \"%s\": %m", fname); + } +} + +/* + * Create an astreamer that can read WAL from tar file. */ static astreamer * astreamer_waldump_new(XLogDumpPrivate *privateInfo) diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c index 3cd279caded..6056a9c14e8 100644 --- a/src/bin/pg_waldump/pg_waldump.c +++ b/src/bin/pg_waldump/pg_waldump.c @@ -478,21 +478,46 @@ TarWALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen, return -1; /* - * If the target page is in a different segment, free the buffer space - * occupied by the previous segment data. Since pg_waldump never requests - * the same WAL bytes twice, moving to a new segment implies the previous - * buffer's data and that segment will not be needed again. + * If the target page is in a different segment, free the buffer and/or + * temporary file disk space occupied by the previous segment's data. + * Since pg_waldump never requests the same WAL bytes twice, moving to a + * new segment implies the previous buffer's data and that segment will + * not be needed again. + * + * Afterward, check for the next required WAL segment's physical existence + * in the temporary directory first before invoking the archive streamer. */ nextSegNo = state->seg.ws_segno; if (!XLByteInSeg(targetPagePtr, nextSegNo, WalSegSz)) { char fname[MAXFNAMELEN]; + if (state->seg.ws_file >= 0) + { + close(state->seg.ws_file); + state->seg.ws_file = -1; + } + XLogFileName(fname, state->seg.ws_tli, nextSegNo, WalSegSz); free_archive_wal_entry(fname, private); + + XLByteToSeg(targetPagePtr, nextSegNo, WalSegSz); + state->seg.ws_tli = private->timeline; + state->seg.ws_segno = nextSegNo; + + /* + * If the next segment exists, open it and continue reading from there + */ + XLogFileName(fname, state->seg.ws_tli, nextSegNo, WalSegSz); + state->seg.ws_file = open_file_in_directory(TmpWalSegDir, fname); } - /* Read the WAL page from the archive streamer */ + /* Continue reading from the open WAL segment, if any */ + if (state->seg.ws_file >= 0) + return WALDumpReadPage(state, targetPagePtr, count, targetPtr, + readBuff); + + /* Otherwise, read the WAL page from the archive streamer */ return read_archive_wal_page(private, targetPagePtr, count, readBuff, WalSegSz); } diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h index e98a20152da..13187802009 100644 --- a/src/bin/pg_waldump/pg_waldump.h +++ b/src/bin/pg_waldump/pg_waldump.h @@ -18,6 +18,9 @@ struct ArchivedWALFile; struct ArchivedWAL_hash; +/* Temporary directory */ +extern char *TmpWalSegDir; + /* Contains the necessary information to drive WAL decoding */ typedef struct XLogDumpPrivate { diff --git a/src/bin/pg_waldump/t/001_basic.pl b/src/bin/pg_waldump/t/001_basic.pl index 13567fbdba1..68b0cdd29e5 100644 --- a/src/bin/pg_waldump/t/001_basic.pl +++ b/src/bin/pg_waldump/t/001_basic.pl @@ -7,6 +7,7 @@ use Cwd; use PostgreSQL::Test::Cluster; use PostgreSQL::Test::Utils; use Test::More; +use List::Util qw(shuffle); my $tar = $ENV{TAR}; @@ -272,7 +273,7 @@ sub generate_archive } closedir $dh; - @files = sort @files; + @files = shuffle @files; # move into the WAL directory before archiving files my $cwd = getcwd; -- 2.47.1 [application/octet-stream] v11-0007-pg_verifybackup-Delay-default-WAL-directory-prep.patch (1.7K, 8-v11-0007-pg_verifybackup-Delay-default-WAL-directory-prep.patch) download | inline diff: From a7d540d3dfde29630ed86103175c6293def31128 Mon Sep 17 00:00:00 2001 From: Amul Sul <[email protected]> Date: Wed, 16 Jul 2025 14:47:43 +0530 Subject: [PATCH v11 7/9] pg_verifybackup: Delay default WAL directory preparation. We are not sure whether to parse WAL from a directory or an archive until the backup format is known. Therefore, we delay preparing the default WAL directory until the point of parsing. This delay is harmless, as the WAL directory is not used elsewhere. --- src/bin/pg_verifybackup/pg_verifybackup.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c index f9f2d457f2f..ab01c4d003a 100644 --- a/src/bin/pg_verifybackup/pg_verifybackup.c +++ b/src/bin/pg_verifybackup/pg_verifybackup.c @@ -285,10 +285,6 @@ main(int argc, char **argv) manifest_path = psprintf("%s/backup_manifest", context.backup_directory); - /* By default, look for the WAL in the backup directory, too. */ - if (wal_directory == NULL) - wal_directory = psprintf("%s/pg_wal", context.backup_directory); - /* * Try to read the manifest. We treat any errors encountered while parsing * the manifest as fatal; there doesn't seem to be much point in trying to @@ -368,6 +364,10 @@ main(int argc, char **argv) if (context.format == 'p' && !context.skip_checksums) verify_backup_checksums(&context); + /* By default, look for the WAL in the backup directory, too. */ + if (wal_directory == NULL) + wal_directory = psprintf("%s/pg_wal", context.backup_directory); + /* * Try to parse the required ranges of WAL records, unless we were told * not to do so. -- 2.47.1 [application/octet-stream] v11-0008-pg_verifybackup-Rename-the-wal-directory-switch-.patch (5.9K, 9-v11-0008-pg_verifybackup-Rename-the-wal-directory-switch-.patch) download | inline diff: From 298182611c3e355cdb85ce48b725a1677096c2c7 Mon Sep 17 00:00:00 2001 From: Amul Sul <[email protected]> Date: Tue, 25 Nov 2025 17:32:14 +0530 Subject: [PATCH v11 8/9] pg_verifybackup: Rename the wal-directory switch to wal-path With previous patches to pg_waldump can now decode WAL directly from tar files. This means you'll be able to specify a tar archive path instead of a traditional WAL directory. To keep things consistent and more versatile, we should also generalize the input switch for pg_verifybackup. It should accept either a directory or a tar file path that contains WALs. This change will also aligning it with the existing manifest-path switch naming. == NOTE == The corresponding PO files require updating due to this change. --- doc/src/sgml/ref/pg_verifybackup.sgml | 2 +- src/bin/pg_verifybackup/pg_verifybackup.c | 22 +++++++++++----------- src/bin/pg_verifybackup/t/007_wal.pl | 4 ++-- 3 files changed, 14 insertions(+), 14 deletions(-) diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml index 61c12975e4a..e9b8bfd51b1 100644 --- a/doc/src/sgml/ref/pg_verifybackup.sgml +++ b/doc/src/sgml/ref/pg_verifybackup.sgml @@ -261,7 +261,7 @@ PostgreSQL documentation <varlistentry> <term><option>-w <replaceable class="parameter">path</replaceable></option></term> - <term><option>--wal-directory=<replaceable class="parameter">path</replaceable></option></term> + <term><option>--wal-path=<replaceable class="parameter">path</replaceable></option></term> <listitem> <para> Try to parse WAL files stored in the specified directory, rather than diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c index ab01c4d003a..3103d36f1b9 100644 --- a/src/bin/pg_verifybackup/pg_verifybackup.c +++ b/src/bin/pg_verifybackup/pg_verifybackup.c @@ -93,7 +93,7 @@ static void verify_file_checksum(verifier_context *context, uint8 *buffer); static void parse_required_wal(verifier_context *context, char *pg_waldump_path, - char *wal_directory); + char *wal_path); static astreamer *create_archive_verifier(verifier_context *context, char *archive_name, Oid tblspc_oid, @@ -126,7 +126,7 @@ main(int argc, char **argv) {"progress", no_argument, NULL, 'P'}, {"quiet", no_argument, NULL, 'q'}, {"skip-checksums", no_argument, NULL, 's'}, - {"wal-directory", required_argument, NULL, 'w'}, + {"wal-path", required_argument, NULL, 'w'}, {NULL, 0, NULL, 0} }; @@ -135,7 +135,7 @@ main(int argc, char **argv) char *manifest_path = NULL; bool no_parse_wal = false; bool quiet = false; - char *wal_directory = NULL; + char *wal_path = NULL; char *pg_waldump_path = NULL; DIR *dir; @@ -221,8 +221,8 @@ main(int argc, char **argv) context.skip_checksums = true; break; case 'w': - wal_directory = pstrdup(optarg); - canonicalize_path(wal_directory); + wal_path = pstrdup(optarg); + canonicalize_path(wal_path); break; default: /* getopt_long already emitted a complaint */ @@ -365,15 +365,15 @@ main(int argc, char **argv) verify_backup_checksums(&context); /* By default, look for the WAL in the backup directory, too. */ - if (wal_directory == NULL) - wal_directory = psprintf("%s/pg_wal", context.backup_directory); + if (wal_path == NULL) + wal_path = psprintf("%s/pg_wal", context.backup_directory); /* * Try to parse the required ranges of WAL records, unless we were told * not to do so. */ if (!no_parse_wal) - parse_required_wal(&context, pg_waldump_path, wal_directory); + parse_required_wal(&context, pg_waldump_path, wal_path); /* * If everything looks OK, tell the user this, unless we were asked to @@ -1198,7 +1198,7 @@ verify_file_checksum(verifier_context *context, manifest_file *m, */ static void parse_required_wal(verifier_context *context, char *pg_waldump_path, - char *wal_directory) + char *wal_path) { manifest_data *manifest = context->manifest; manifest_wal_range *this_wal_range = manifest->first_wal_range; @@ -1208,7 +1208,7 @@ parse_required_wal(verifier_context *context, char *pg_waldump_path, char *pg_waldump_cmd; pg_waldump_cmd = psprintf("\"%s\" --quiet --path=\"%s\" --timeline=%u --start=%X/%08X --end=%X/%08X\n", - pg_waldump_path, wal_directory, this_wal_range->tli, + pg_waldump_path, wal_path, this_wal_range->tli, LSN_FORMAT_ARGS(this_wal_range->start_lsn), LSN_FORMAT_ARGS(this_wal_range->end_lsn)); fflush(NULL); @@ -1376,7 +1376,7 @@ usage(void) printf(_(" -P, --progress show progress information\n")); printf(_(" -q, --quiet do not print any output, except for errors\n")); printf(_(" -s, --skip-checksums skip checksum verification\n")); - printf(_(" -w, --wal-directory=PATH use specified path for WAL files\n")); + printf(_(" -w, --wal-path=PATH use specified path for WAL files\n")); printf(_(" -V, --version output version information, then exit\n")); printf(_(" -?, --help show this help, then exit\n")); printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT); diff --git a/src/bin/pg_verifybackup/t/007_wal.pl b/src/bin/pg_verifybackup/t/007_wal.pl index 79087a1f6be..8ad2234453d 100644 --- a/src/bin/pg_verifybackup/t/007_wal.pl +++ b/src/bin/pg_verifybackup/t/007_wal.pl @@ -42,10 +42,10 @@ command_ok([ 'pg_verifybackup', '--no-parse-wal', $backup_path ], command_ok( [ 'pg_verifybackup', - '--wal-directory' => $relocated_pg_wal, + '--wal-path' => $relocated_pg_wal, $backup_path ], - '--wal-directory can be used to specify WAL directory'); + '--wal-path can be used to specify WAL directory'); # Move directory back to original location. rename($relocated_pg_wal, $original_pg_wal) || die "rename pg_wal back: $!"; -- 2.47.1 [application/octet-stream] v11-0009-pg_verifybackup-enabled-WAL-parsing-for-tar-form.patch (9.9K, 10-v11-0009-pg_verifybackup-enabled-WAL-parsing-for-tar-form.patch) download | inline diff: From fc8da7124b665d32849e6eec401728ee58748788 Mon Sep 17 00:00:00 2001 From: Amul Sul <[email protected]> Date: Tue, 25 Nov 2025 17:34:26 +0530 Subject: [PATCH v11 9/9] pg_verifybackup: enabled WAL parsing for tar-format backup Now that pg_waldump supports decoding from tar archives, we should leverage this functionality to remove the previous restriction on WAL parsing for tar-backed formats. --- doc/src/sgml/ref/pg_verifybackup.sgml | 5 +- src/bin/pg_verifybackup/pg_verifybackup.c | 66 +++++++++++++------ src/bin/pg_verifybackup/t/002_algorithm.pl | 4 -- src/bin/pg_verifybackup/t/003_corruption.pl | 4 +- src/bin/pg_verifybackup/t/008_untar.pl | 5 +- src/bin/pg_verifybackup/t/010_client_untar.pl | 5 +- 6 files changed, 50 insertions(+), 39 deletions(-) diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml index e9b8bfd51b1..16b50b5a4df 100644 --- a/doc/src/sgml/ref/pg_verifybackup.sgml +++ b/doc/src/sgml/ref/pg_verifybackup.sgml @@ -36,10 +36,7 @@ PostgreSQL documentation <literal>backup_manifest</literal> generated by the server at the time of the backup. The backup may be stored either in the "plain" or the "tar" format; this includes tar-format backups compressed with any algorithm - supported by <application>pg_basebackup</application>. However, at present, - <literal>WAL</literal> verification is supported only for plain-format - backups. Therefore, if the backup is stored in tar-format, the - <literal>-n, --no-parse-wal</literal> option should be used. + supported by <application>pg_basebackup</application>. </para> <para> diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c index 3103d36f1b9..cc492728ae8 100644 --- a/src/bin/pg_verifybackup/pg_verifybackup.c +++ b/src/bin/pg_verifybackup/pg_verifybackup.c @@ -74,7 +74,9 @@ pg_noreturn static void report_manifest_error(JsonManifestParseContext *context, const char *fmt,...) pg_attribute_printf(2, 3); -static void verify_tar_backup(verifier_context *context, DIR *dir); +static void verify_tar_backup(verifier_context *context, DIR *dir, + char **base_archive_path, + char **wal_archive_path); static void verify_plain_backup_directory(verifier_context *context, char *relpath, char *fullpath, DIR *dir); @@ -83,7 +85,9 @@ static void verify_plain_backup_file(verifier_context *context, char *relpath, static void verify_control_file(const char *controlpath, uint64 manifest_system_identifier); static void precheck_tar_backup_file(verifier_context *context, char *relpath, - char *fullpath, SimplePtrList *tarfiles); + char *fullpath, SimplePtrList *tarfiles, + char **base_archive_path, + char **wal_archive_path); static void verify_tar_file(verifier_context *context, char *relpath, char *fullpath, astreamer *streamer); static void report_extra_backup_files(verifier_context *context); @@ -136,6 +140,8 @@ main(int argc, char **argv) bool no_parse_wal = false; bool quiet = false; char *wal_path = NULL; + char *base_archive_path = NULL; + char *wal_archive_path = NULL; char *pg_waldump_path = NULL; DIR *dir; @@ -327,17 +333,6 @@ main(int argc, char **argv) pfree(path); } - /* - * XXX: In the future, we should consider enhancing pg_waldump to read WAL - * files from an archive. - */ - if (!no_parse_wal && context.format == 't') - { - pg_log_error("pg_waldump cannot read tar files"); - pg_log_error_hint("You must use -n/--no-parse-wal when verifying a tar-format backup."); - exit(1); - } - /* * Perform the appropriate type of verification appropriate based on the * backup format. This will close 'dir'. @@ -346,7 +341,7 @@ main(int argc, char **argv) verify_plain_backup_directory(&context, NULL, context.backup_directory, dir); else - verify_tar_backup(&context, dir); + verify_tar_backup(&context, dir, &base_archive_path, &wal_archive_path); /* * The "matched" flag should now be set on every entry in the hash table. @@ -364,9 +359,28 @@ main(int argc, char **argv) if (context.format == 'p' && !context.skip_checksums) verify_backup_checksums(&context); - /* By default, look for the WAL in the backup directory, too. */ + /* + * By default, WAL files are expected to be found in the backup directory + * for plain-format backups. In the case of tar-format backups, if a + * separate WAL archive is not found, the WAL files are most likely + * included within the main data directory archive. + */ if (wal_path == NULL) - wal_path = psprintf("%s/pg_wal", context.backup_directory); + { + if (context.format == 'p') + wal_path = psprintf("%s/pg_wal", context.backup_directory); + else if (wal_archive_path) + wal_path = wal_archive_path; + else if (base_archive_path) + wal_path = base_archive_path; + else + { + pg_log_error("WAL archive not found"); + pg_log_error_hint("Specify the correct path using the option -w/--wal-path. " + "Or you must use -n/--no-parse-wal when verifying a tar-format backup."); + exit(1); + } + } /* * Try to parse the required ranges of WAL records, unless we were told @@ -787,7 +801,8 @@ verify_control_file(const char *controlpath, uint64 manifest_system_identifier) * close when we're done with it. */ static void -verify_tar_backup(verifier_context *context, DIR *dir) +verify_tar_backup(verifier_context *context, DIR *dir, char **base_archive_path, + char **wal_archive_path) { struct dirent *dirent; SimplePtrList tarfiles = {NULL, NULL}; @@ -816,7 +831,8 @@ verify_tar_backup(verifier_context *context, DIR *dir) char *fullpath; fullpath = psprintf("%s/%s", context->backup_directory, filename); - precheck_tar_backup_file(context, filename, fullpath, &tarfiles); + precheck_tar_backup_file(context, filename, fullpath, &tarfiles, + base_archive_path, wal_archive_path); pfree(fullpath); } } @@ -875,11 +891,13 @@ verify_tar_backup(verifier_context *context, DIR *dir) * * The arguments to this function are mostly the same as the * verify_plain_backup_file. The additional argument outputs a list of valid - * tar files. + * tar files, along with the full paths to the main archive and the WAL + * directory archive. */ static void precheck_tar_backup_file(verifier_context *context, char *relpath, - char *fullpath, SimplePtrList *tarfiles) + char *fullpath, SimplePtrList *tarfiles, + char **base_archive_path, char **wal_archive_path) { struct stat sb; Oid tblspc_oid = InvalidOid; @@ -918,9 +936,17 @@ precheck_tar_backup_file(verifier_context *context, char *relpath, * extension such as .gz, .lz4, or .zst. */ if (strncmp("base", relpath, 4) == 0) + { suffix = relpath + 4; + + *base_archive_path = pstrdup(fullpath); + } else if (strncmp("pg_wal", relpath, 6) == 0) + { suffix = relpath + 6; + + *wal_archive_path = pstrdup(fullpath); + } else { /* Expected a <tablespaceoid>.tar file here. */ diff --git a/src/bin/pg_verifybackup/t/002_algorithm.pl b/src/bin/pg_verifybackup/t/002_algorithm.pl index 0556191ec9d..edc515d5904 100644 --- a/src/bin/pg_verifybackup/t/002_algorithm.pl +++ b/src/bin/pg_verifybackup/t/002_algorithm.pl @@ -30,10 +30,6 @@ sub test_checksums { # Add switch to get a tar-format backup push @backup, ('--format' => 'tar'); - - # Add switch to skip WAL verification, which is not yet supported for - # tar-format backups - push @verify, ('--no-parse-wal'); } # A backup with a bogus algorithm should fail. diff --git a/src/bin/pg_verifybackup/t/003_corruption.pl b/src/bin/pg_verifybackup/t/003_corruption.pl index b1d65b8aa0f..882d75d9dc2 100644 --- a/src/bin/pg_verifybackup/t/003_corruption.pl +++ b/src/bin/pg_verifybackup/t/003_corruption.pl @@ -193,10 +193,8 @@ for my $scenario (@scenario) command_ok([ $tar, '-cf' => "$tar_backup_path/base.tar", '.' ]); chdir($cwd) || die "chdir: $!"; - # Now check that the backup no longer verifies. We must use -n - # here, because pg_waldump can't yet read WAL from a tarfile. command_fails_like( - [ 'pg_verifybackup', '--no-parse-wal', $tar_backup_path ], + [ 'pg_verifybackup', $tar_backup_path ], $scenario->{'fails_like'}, "corrupt backup fails verification: $name"); diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl index ae67ae85a31..161c08c190d 100644 --- a/src/bin/pg_verifybackup/t/008_untar.pl +++ b/src/bin/pg_verifybackup/t/008_untar.pl @@ -47,7 +47,6 @@ my $tsoid = $primary->safe_psql( SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1')); my $backup_path = $primary->backup_dir . '/server-backup'; -my $extract_path = $primary->backup_dir . '/extracted-backup'; my @test_configuration = ( { @@ -123,14 +122,12 @@ for my $tc (@test_configuration) # Verify tar backup. $primary->command_ok( [ - 'pg_verifybackup', '--no-parse-wal', - '--exit-on-error', $backup_path, + 'pg_verifybackup', '--exit-on-error', $backup_path, ], "verify backup, compression $method"); # Cleanup. rmtree($backup_path); - rmtree($extract_path); } } diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl index 1ac7b5db75a..9670fbe4fda 100644 --- a/src/bin/pg_verifybackup/t/010_client_untar.pl +++ b/src/bin/pg_verifybackup/t/010_client_untar.pl @@ -32,7 +32,6 @@ print $jf $junk_data; close $jf; my $backup_path = $primary->backup_dir . '/client-backup'; -my $extract_path = $primary->backup_dir . '/extracted-backup'; my @test_configuration = ( { @@ -137,13 +136,11 @@ for my $tc (@test_configuration) # Verify tar backup. $primary->command_ok( [ - 'pg_verifybackup', '--no-parse-wal', - '--exit-on-error', $backup_path, + 'pg_verifybackup', '--exit-on-error', $backup_path, ], "verify backup, compression $method"); # Cleanup. - rmtree($extract_path); rmtree($backup_path); } } -- 2.47.1 ^ permalink raw reply [nested|flat] 8+ messages in thread
* Re: pg_waldump: support decoding of WAL inside tarfile @ 2026-01-27 21:11 Robert Haas <[email protected]> parent: Amul Sul <[email protected]> 1 sibling, 1 reply; 8+ messages in thread From: Robert Haas @ 2026-01-27 21:11 UTC (permalink / raw) To: Amul Sul <[email protected]>; +Cc: Chao Li <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]> On Tue, Jan 27, 2026 at 7:07 AM Amul Sul <[email protected]> wrote: > In the attached version, I am using the WAL segment name as the hash > key, which is much more straightforward. I have rewritten > read_archive_wal_page(), and it looks much cleaner than before. The > logic to discard irrelevant WAL files is still within > get_archive_wal_entry. I added an explanation for setting cur_wal to > NULL, which is now handled in the separate function I mentioned > previously. > > Kindly have a look at the attached version; let me know if you are > still not happy with the current approach for filtering/discarding > irrelevant WAL segments. It isn't much different from the previous > version, but I have tried to keep it in a separate routine for better > code readability, with comments to make it easier to understand. I > also added a comment for ArchivedWALFile. I feel like the division of labor between get_archive_wal_entry() and read_archive_wal_page() is odd. I noticed this in the last version, too, and it still seems to be the case. get_archive_wal_entry() first calls ArchivedWAL_lookup(). If that finds an entry, it just returns. If it doesn't, it loops until an entry for the requested file shows up and then returns it. Then control returns to read_archive_wal_page() which loops some more until we have all the data we need for the requested file. But it seems odd to me to have two separate loops here. I think that the first loop is going to call read_archive_file() until we find the beginning of the file that we care about and then the second one is going to call read_archive_file() some more until we have read enough of it to satisfy the request. It feels odd to me to do it that way, as if we told somebody to first wait until 9 o'clock and then wait another 30 minutes, instead of just telling them to wait until 9:30. I realize it's not quite the same thing, because apart from calling read_archive_file(), the two loops do different things, but I still think it looks odd. + /* + * Ignore if the timeline is different or the current segment is not + * the desired one. + */ + XLogFromFileName(entry->fname, &curSegTimeline, &curSegNo, WalSegSz); + if (privateInfo->timeline != curSegTimeline || + privateInfo->startSegNo > curSegNo || + privateInfo->endSegNo < curSegNo || + segno > curSegNo) + { + free_archive_wal_entry(entry->fname, privateInfo); + continue; + } The comment doesn't match the code. If it did, the test would be (privateInfo->timeline != curSegTimeline || segno != curSegno). But instead the segno test is > rather than !=, and the checks against startSegNo and endSegNo aren't explained at all. I think I understand why the segno test uses > rather than !=, but it's the point of the comment to explain things like that, rather than leaving the reader to guess. And I don't know why we also need to test startSegNo and endSegNo. I also wonder what the point is of doing XLogFromFileName() on the fname provided by the caller and then again on entry->fname. Couldn't you just compare the strings? Again, the division of labor is really odd here. It's the job of astreamer_waldump_content() to skip things that aren't WAL files at all, but it's the job of get_archive_wal_entry() to skip things that are WAL files but not the one we want. I disagree with putting those checks in completely separate parts of the code. +static bool +member_is_wal_file(astreamer_waldump *mystreamer, astreamer_member *member, + char **fname) +{ + int pathlen; + char *filename; + + /* We are only interested in normal files. */ + if (member->is_directory || member->is_link) + return false; + + pathlen = strlen(member->pathname); + if (pathlen < XLOG_FNAME_LEN) + return false; + + /* WAL files from the top-level or pg_wal directory will be decoded */ + if (pathlen > XLOG_FNAME_LEN && + strncmp(member->pathname, XLOGDIR, strlen(XLOGDIR)) != 0) + return false; + + /* WAL file could be with full path */ + filename = member->pathname + (pathlen - XLOG_FNAME_LEN); + if (!IsXLogFileName(filename)) + return false; + + *fname = pnstrdup(filename, XLOG_FNAME_LEN); + + return true; +} I don't think this code is fully correct. Note that pg_verifybackup's member_verify_header() prepends "./" and then applies canonicalize_path(). I think the same considerations apply here. Consider: [rhaas pgdata]$ (cd pg_wal; tar -cf ../pg_wal1.tar 00*) [rhaas pgdata]$ (cd pg_wal; tar -cf ../pg_wal2.tar .) [rhaas pgdata]$ tar tf pg_wal1.tar 00000001000000000000002C 00000001000000000000002D 00000001000000000000002E 00000001000000000000002F 000000010000000000000030 000000010000000000000031 000000010000000000000032 000000010000000000000033 [rhaas pgdata]$ tar tf pg_wal2.tar ./ ./00000001000000000000002C ./00000001000000000000002D ./00000001000000000000002E ./archive_status/ ./000000010000000000000031 ./000000010000000000000030 ./summaries/ ./00000001000000000000002F ./000000010000000000000032 ./000000010000000000000033 -- Robert Haas EDB: http://www.enterprisedb.com ^ permalink raw reply [nested|flat] 8+ messages in thread
* Re: pg_waldump: support decoding of WAL inside tarfile @ 2026-01-28 03:01 Euler Taveira <[email protected]> parent: Amul Sul <[email protected]> 1 sibling, 2 replies; 8+ messages in thread From: Euler Taveira @ 2026-01-28 03:01 UTC (permalink / raw) To: Amul Sul <[email protected]>; Robert Haas <[email protected]>; +Cc: Chao Li <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]> On Tue, Jan 27, 2026, at 9:06 AM, Amul Sul wrote: > > I have included your patch (removing WalSegSz) to the attached patch > set as 0001; all my subsequent patches are now bumped by one. > I started looking at this patch. This is not a complete review. 0002. LGTM. 0003. If this is a refactor, you should move the unlikely logic (if necessary) to 0005. 0004. You mentioned in the commit message that this patch should be merged into 0005. Reviewing it is a bit hard if you just looked at the patch. However, it is clear if you use 'git show -w $HASH0004'. I would expect that this patch contains +my $tar = $ENV{TAR}; + ... + skip "tar command is not available", 3 + if !defined $tar; because you add the SKIP. As a refactor I expect it to work independently. 0005. + pg_log_error("unnecessary command-line arguments specified with tar archive (first is \"%s\")", + argv[optind]); Which command-line arguments? This message isn't clear. How about using "option %s cannot be used together with tar archive". - if (waldir != NULL) + if (walpath != NULL) { + /* validate path points to tar archive */ + if (is_archive_file(walpath, &compression)) + { + char *fname = NULL; + + split_path(walpath, &waldir, &fname); + + private.archive_name = fname; + is_archive = true; + } /* validate path points to directory */ - if (!verify_directory(waldir)) + else if (!verify_directory(walpath)) Maybe I'm confused about the fact that the function name is is_archive_file() and there is a variable called is_archive. Couldn't you test private.archive_name != NULL instead of using is_archive variable? - if (!verify_directory(waldir)) + else if (!verify_directory(walpath)) { pg_log_error("could not open directory \"%s\": %m", waldir); goto bad_argument; This is not a problem of this patch but I just want to point out that it should be pg_fatal() just like similar code path a few lines below. + /* set segno to endsegno for check of --end */ + segno = endsegno; + } + + + if (!XLByteInSeg(private.endptr, segno, WalSegSz) && + private.endptr != (segno + 1) * WalSegSz) 2 blank lines. Remove one. + * archive_waldump.c + * A generic facility for reading WAL data from tar archives via archive + * streamer. The other tools (pg_basebackup and pg_verifybackup) that also use astreamer API named this similar file as astreamer_SOMETHING.c. It seems a good idea to follow the same pattern, no? Maybe astreamer_tar_archive.c or astreamer_archive.c. +/* Hash entry structure for holding WAL segment data read from the archive */ +typedef struct ArchivedWALEntry +{ + uint32 status; /* hash status */ + XLogSegNo segno; /* hash key: WAL segment number */ + TimeLineID timeline; /* timeline of this wal file */ + + StringInfoData buf; + bool tmpseg_exists; /* spill file exists? */ + + int total_read; /* total read of archived WAL segment */ +} ArchivedWALEntry; s/wal file/WAL file/ What buf is for? It is the only member that doesn't have a description. I think total_read gives the impression that you've read the file and that's the number of bytes it got. That's not true; it represents the accumulative length. Maybe read_len is a good candidate. +bool +is_archive_file(const char *fname, pg_compress_algorithm *compression) +{ + int fname_len = strlen(fname); + pg_compress_algorithm compress_algo; + Why do you need an extra variable here? I think compress_algo is better than compression. Besides that, it is a good opportunity to move this function to a common file (common/compression.c?) and use it in other places like CreateBackupStreamer() -- pg_basebackup. + /* Before that we must parse the tar archive. */ + streamer = astreamer_tar_parser_new(streamer); + + /* Before that we must decompress, if archive is compressed. */ + if (compression == PG_COMPRESSION_GZIP) It reads better if you suppress 'Before that we must'. (I think you want to say 'After' instead of 'Before'.) + /* Hash table storing WAL entries read from the archive */ + ArchivedWAL_HTAB = ArchivedWAL_create(16, NULL); Any reason for 16? Comment says nothing about it. + if (read_archive_file(privateInfo, XLOG_BLCKSZ) == 0) + pg_fatal("could not find WAL in \"%s\" archive", + privateInfo->archive_name); Could we have a better message here? The read_archive_file() is already testing the error cases and you are testing the EOF case. I would use 'archive \"%s\"' or even 'archive file \"%s\". + /* Needed WAL yet to be decoded from archive, do the same */ + while (1) + { + entry = privateInfo->cur_wal; This comment is not clear. Could you rephrase it? + /* + * XXX: If the segment being read not the requested one, the data must + * be buffered, as we currently lack the mechanism to write it to a + * temporary file. This is a known limitation that will be fixed in the + * next patch, as the buffer could grow up to the full WAL segment + * size. + */ + if (segno > entry->segno) + continue; + + /* WAL segments must be archived in order */ + pg_log_error("WAL files are not archived in sequential order"); + pg_log_error_detail("Expecting segment number " UINT64_FORMAT " but found " UINT64_FORMAT ".", + segno, entry->segno); Can it enforce a specific order? tar follows an arbitrary order in which the files is returned by the filesystem. You've been debating a solution to buffer the WAL contents using memory or spilled files. If it always create the tar in an alphabetical order, you can reduce the scope of this patch. (Didn't look what challenges are expected to use a sorted list to generate the tar file.) - dependencies: [frontend_code, lz4, zstd], + dependencies: [frontend_code, lz4, zstd, libpq], Put libpq after frontend_code. + + /* Fields required to read WAL from archive */ + char *archive_name; /* Tar archive name */ + int archive_fd; /* File descriptor for the open tar file */ + + astreamer *archive_streamer; + + /* What the archive streamer is currently reading */ + struct ArchivedWALEntry *cur_wal; + + /* + * Although these values can be easily derived from startptr and endptr, + * doing so repeatedly for each archived member would be inefficient, as + * it would involve recalculating and filtering out irrelevant WAL + * segments. + */ + XLogSegNo startSegNo; + XLogSegNo endSegNo; It is a matter of style but consistency is good. Per current style snake case is preferred instead of CamelCase for these last members. - @lines = test_pg_waldump('--path' => $path); + my @lines; + @lines = test_pg_waldump($path); my @lines = test_pg_waldump($path); +/* Forward declaration */ +struct ArchivedWALFile; Why don't you move ArchivedWALFile to pg_waldump.h? pg_waldump.h is included into archive_waldump.c. 0006. As I said before I would avoid increasing the size of this patch if an ordered tar file is viable. Didn't look in deep this patch but I have a few suggestions: * I don't like tmpfile_exists as a member name. A suggestion is 'spilled'. +#ifndef WIN32 + if (chmod(fpath, pg_file_create_mode)) + pg_fatal("could not set permissions on file \"%s\": %m", + fpath); +#endif s/on/of/ My suggestion is to use the same sentence in initdb. could not change permissions of \"%s\": %m + pg_log_debug("temporarily exporting file \"%s\"", fpath); spilling to temporary file \"%s\" 0007. LGTM. 0008. I'm concerned about this patch. It is breaking backward compatibility if you are using a long option (--wal-directory). Your proposal is a generic word that represents both cases (file and directory). I agree. However, I wouldn't remove --wal-directory from the tool. Instead, I would keep it with the same short option ('w') but add a sentence saying this long option is deprecated and will be removed in the future or even remove any traces of this long option from the help and documentation but silently accept the old long option. I prefer the latter because it is not a required argument so a deprecation warning is not necessary IMO. 0009. LGTM. -- Euler Taveira EDB https://www.enterprisedb.com/ ^ permalink raw reply [nested|flat] 8+ messages in thread
* Re: pg_waldump: support decoding of WAL inside tarfile @ 2026-01-28 13:03 Robert Haas <[email protected]> parent: Euler Taveira <[email protected]> 1 sibling, 0 replies; 8+ messages in thread From: Robert Haas @ 2026-01-28 13:03 UTC (permalink / raw) To: Euler Taveira <[email protected]>; +Cc: Amul Sul <[email protected]>; Chao Li <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]> On Tue, Jan 27, 2026 at 10:02 PM Euler Taveira <[email protected]> wrote: > + * archive_waldump.c > + * A generic facility for reading WAL data from tar archives via archive > + * streamer. > > The other tools (pg_basebackup and pg_verifybackup) that also use astreamer API > named this similar file as astreamer_SOMETHING.c. It seems a good idea to > follow the same pattern, no? Maybe astreamer_tar_archive.c or > astreamer_archive.c. There shouldn't be anything specific to tar files in here, and astreamer_archive would be meaningless, since the "a" in "astreamer" stands for archive. What this file is is an archive streamer specific to pg_waldump, hence the name. > Can it enforce a specific order? tar follows an arbitrary order in which the > files is returned by the filesystem. You've been debating a solution to buffer > the WAL contents using memory or spilled files. If it always create the tar in > an alphabetical order, you can reduce the scope of this patch. (Didn't look > what challenges are expected to use a sorted list to generate the tar file.) It's posible to create a tar file in a specific order by specifying command-line arguments to tar in the order you want the tar file to be built. But I think the real thing here is that this limitation is lifted by the following patch. Whether it's worth splitting it apart into two patches this way is debatable. As I have pointed out in my previous reviews, the split hasn't been done very cleanly. -- Robert Haas EDB: http://www.enterprisedb.com ^ permalink raw reply [nested|flat] 8+ messages in thread
* Re: pg_waldump: support decoding of WAL inside tarfile @ 2026-02-04 13:09 Amul Sul <[email protected]> parent: Robert Haas <[email protected]> 0 siblings, 0 replies; 8+ messages in thread From: Amul Sul @ 2026-02-04 13:09 UTC (permalink / raw) To: Robert Haas <[email protected]>; +Cc: Chao Li <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]> On Wed, Jan 28, 2026 at 2:41 AM Robert Haas <[email protected]> wrote: > > On Tue, Jan 27, 2026 at 7:07 AM Amul Sul <[email protected]> wrote: > > In the attached version, I am using the WAL segment name as the hash > > key, which is much more straightforward. I have rewritten > > read_archive_wal_page(), and it looks much cleaner than before. The > > logic to discard irrelevant WAL files is still within > > get_archive_wal_entry. I added an explanation for setting cur_wal to > > NULL, which is now handled in the separate function I mentioned > > previously. > > > > Kindly have a look at the attached version; let me know if you are > > still not happy with the current approach for filtering/discarding > > irrelevant WAL segments. It isn't much different from the previous > > version, but I have tried to keep it in a separate routine for better > > code readability, with comments to make it easier to understand. I > > also added a comment for ArchivedWALFile. > > I feel like the division of labor between get_archive_wal_entry() and > read_archive_wal_page() is odd. I noticed this in the last version, > too, and it still seems to be the case. get_archive_wal_entry() first > calls ArchivedWAL_lookup(). If that finds an entry, it just returns. > If it doesn't, it loops until an entry for the requested file shows up > and then returns it. Then control returns to read_archive_wal_page() > which loops some more until we have all the data we need for the > requested file. But it seems odd to me to have two separate loops > here. I think that the first loop is going to call read_archive_file() > until we find the beginning of the file that we care about and then > the second one is going to call read_archive_file() some more until we > have read enough of it to satisfy the request. It feels odd to me to > do it that way, as if we told somebody to first wait until 9 o'clock > and then wait another 30 minutes, instead of just telling them to wait > until 9:30. I realize it's not quite the same thing, because apart > from calling read_archive_file(), the two loops do different things, > but I still think it looks odd. > > + /* > + * Ignore if the timeline is different or the current segment is not > + * the desired one. > + */ > + XLogFromFileName(entry->fname, &curSegTimeline, &curSegNo, WalSegSz); > + if (privateInfo->timeline != curSegTimeline || > + privateInfo->startSegNo > curSegNo || > + privateInfo->endSegNo < curSegNo || > + segno > curSegNo) > + { > + free_archive_wal_entry(entry->fname, privateInfo); > + continue; > + } > > The comment doesn't match the code. If it did, the test would be > (privateInfo->timeline != curSegTimeline || segno != curSegno). But > instead the segno test is > rather than !=, and the checks against > startSegNo and endSegNo aren't explained at all. I think I understand > why the segno test uses > rather than !=, but it's the point of the > comment to explain things like that, rather than leaving the reader to > guess. And I don't know why we also need to test startSegNo and > endSegNo. > > I also wonder what the point is of doing XLogFromFileName() on the > fname provided by the caller and then again on entry->fname. Couldn't > you just compare the strings? > > Again, the division of labor is really odd here. It's the job of > astreamer_waldump_content() to skip things that aren't WAL files at > all, but it's the job of get_archive_wal_entry() to skip things that > are WAL files but not the one we want. I disagree with putting those > checks in completely separate parts of the code. > Keeping the timeline and segment start-end range checks inside the archive streamer creates a circular dependency that cannot be resolved without a 'dirty hack'. We must read the first available WAL file page to determine the wal_segment_size before it can calculate the target segment range. Moving the checks inside the streamer would make it impossible to process that initial file, as the necessary filtering parameters -- would still be unknown which would need to be skipped for the first read somehow. What if later we realized that the first WAL file which was allowed to be streamed by skipping that check is irrelevant and doesn't fall under the start-end segment range? > +static bool > +member_is_wal_file(astreamer_waldump *mystreamer, astreamer_member *member, > + char **fname) > +{ > + int pathlen; > + char *filename; > + > + /* We are only interested in normal files. */ > + if (member->is_directory || member->is_link) > + return false; > + > + pathlen = strlen(member->pathname); > + if (pathlen < XLOG_FNAME_LEN) > + return false; > + > + /* WAL files from the top-level or pg_wal directory will be decoded */ > + if (pathlen > XLOG_FNAME_LEN && > + strncmp(member->pathname, XLOGDIR, strlen(XLOGDIR)) != 0) > + return false; > + > + /* WAL file could be with full path */ > + filename = member->pathname + (pathlen - XLOG_FNAME_LEN); > + if (!IsXLogFileName(filename)) > + return false; > + > + *fname = pnstrdup(filename, XLOG_FNAME_LEN); > + > + return true; > +} > > I don't think this code is fully correct. Note that pg_verifybackup's > member_verify_header() prepends "./" and then applies > canonicalize_path(). I think the same considerations apply here. > Consider: > Understood, will fix that. Regards, Amul ^ permalink raw reply [nested|flat] 8+ messages in thread
* Re: pg_waldump: support decoding of WAL inside tarfile @ 2026-02-04 13:52 Amul Sul <[email protected]> parent: Euler Taveira <[email protected]> 1 sibling, 1 reply; 8+ messages in thread From: Amul Sul @ 2026-02-04 13:52 UTC (permalink / raw) To: Euler Taveira <[email protected]>; +Cc: Robert Haas <[email protected]>; Chao Li <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]> On Wed, Jan 28, 2026 at 8:32 AM Euler Taveira <[email protected]> wrote: > > On Tue, Jan 27, 2026, at 9:06 AM, Amul Sul wrote: > > Hi Euler, Thanks for looking into this, and apologies for the delayed response due to some unavoidable circumstances. The updated version of the patch could take a bit more time, as I am trying to fix and improve the code based on the concerns Robert raised in his previous post. > > I have included your patch (removing WalSegSz) to the attached patch > > set as 0001; all my subsequent patches are now bumped by one. > > > 0003. If this is a refactor, you should move the unlikely logic (if necessary) > to 0005. > Yes, that can be done. I have kept them separate for now to make the review easier. I can merge them once the patch is ready for the committer stage, unless the committer prefers to keep them as separate commit. > 0004. You mentioned in the commit message that this patch should be merged into > 0005. Reviewing it is a bit hard if you just looked at the patch. However, it > is clear if you use 'git show -w $HASH0004'. I would expect that this patch > contains > > +my $tar = $ENV{TAR}; > + > > ... > > + skip "tar command is not available", 3 > + if !defined $tar; > > because you add the SKIP. As a refactor I expect it to work independently. > Okay, will do it in the next updated version. > 0005. > > + pg_log_error("unnecessary command-line arguments specified with tar archive (first is \"%s\")", > + argv[optind]); > > Which command-line arguments? This message isn't clear. How about using "option > %s cannot be used together with tar archive". > In the error message, we simply print only the first one -- we do have similar error messages that use "command-line arguments". > - if (waldir != NULL) > + if (walpath != NULL) > { > + /* validate path points to tar archive */ > + if (is_archive_file(walpath, &compression)) > + { > + char *fname = NULL; > + > + split_path(walpath, &waldir, &fname); > + > + private.archive_name = fname; > + is_archive = true; > + } > /* validate path points to directory */ > - if (!verify_directory(waldir)) > + else if (!verify_directory(walpath)) > > Maybe I'm confused about the fact that the function name is is_archive_file() > and there is a variable called is_archive. Couldn't you test > private.archive_name != NULL instead of using is_archive variable? > Yes, can do that. I hope that doesn't confuse anyone. > - if (!verify_directory(waldir)) > + else if (!verify_directory(walpath)) > { > pg_log_error("could not open directory \"%s\": %m", waldir); > goto bad_argument; > > This is not a problem of this patch but I just want to point out that it should > be pg_fatal() just like similar code path a few lines below. > > + /* set segno to endsegno for check of --end */ > + segno = endsegno; > + } > + > + > + if (!XLByteInSeg(private.endptr, segno, WalSegSz) && > + private.endptr != (segno + 1) * WalSegSz) > > 2 blank lines. Remove one. > That's pre-existing; I'm not sure if I should change this. > + * archive_waldump.c > + * A generic facility for reading WAL data from tar archives via archive > + * streamer. > > The other tools (pg_basebackup and pg_verifybackup) that also use astreamer API > named this similar file as astreamer_SOMETHING.c. It seems a good idea to > follow the same pattern, no? Maybe astreamer_tar_archive.c or > astreamer_archive.c. > Robert responded to this -- thanks, Robert. > +/* Hash entry structure for holding WAL segment data read from the archive */ > +typedef struct ArchivedWALEntry > +{ > + uint32 status; /* hash status */ > + XLogSegNo segno; /* hash key: WAL segment number */ > + TimeLineID timeline; /* timeline of this wal file */ > + > + StringInfoData buf; > + bool tmpseg_exists; /* spill file exists? */ > + > + int total_read; /* total read of archived WAL segment */ > +} ArchivedWALEntry; > > s/wal file/WAL file/ > Okay. > What buf is for? It is the only member that doesn't have a description. I think > total_read gives the impression that you've read the file and that's the number > of bytes it got. That's not true; it represents the accumulative length. Maybe > read_len is a good candidate. > Okay. > +bool > +is_archive_file(const char *fname, pg_compress_algorithm *compression) > +{ > + int fname_len = strlen(fname); > + pg_compress_algorithm compress_algo; > + > > Why do you need an extra variable here? I think compress_algo is better than > compression. Well, I follow the practice of updating the output variable at the end to avoid any garbage assignment once all operations are successful. If there are strong objections to this, I am happy to change it. > Besides that, it is a good opportunity to move this function to a > common file (common/compression.c?) and use it in other places like > CreateBackupStreamer() -- pg_basebackup. > Yes, it would be a good idea to make it common, let me think about this. > + /* :. */ > + streamer = astreamer_tar_parser_new(streamer); > + > + /* Before that we must decompress, if archive is compressed. */ > + if (compression == PG_COMPRESSION_GZIP) > > It reads better if you suppress 'Before that we must'. (I think you want to say > 'After' instead of 'Before'.) > I meant to say 'Before'. This is the archive streamer stack, so the latter one is executed before it. > + /* Hash table storing WAL entries read from the archive */ > + ArchivedWAL_HTAB = ArchivedWAL_create(16, NULL); > > Any reason for 16? Comment says nothing about it. > It's an arbitrary choice; I will update the comment. > + if (read_archive_file(privateInfo, XLOG_BLCKSZ) == 0) > + pg_fatal("could not find WAL in \"%s\" archive", > + privateInfo->archive_name); > > Could we have a better message here? The read_archive_file() is already testing > the error cases and you are testing the EOF case. I would use 'archive \"%s\"' > or even 'archive file \"%s\". > Okay. > + /* Needed WAL yet to be decoded from archive, do the same */ > + while (1) > + { > + entry = privateInfo->cur_wal; > > This comment is not clear. Could you rephrase it? > Okay. > - dependencies: [frontend_code, lz4, zstd], > + dependencies: [frontend_code, lz4, zstd, libpq], > > Put libpq after frontend_code. > Okay. > + > + /* Fields required to read WAL from archive */ > + char *archive_name; /* Tar archive name */ > + int archive_fd; /* File descriptor for the open tar file */ > + > + astreamer *archive_streamer; > + > + /* What the archive streamer is currently reading */ > + struct ArchivedWALEntry *cur_wal; > + > + /* > + * Although these values can be easily derived from startptr and endptr, > + * doing so repeatedly for each archived member would be inefficient, as > + * it would involve recalculating and filtering out irrelevant WAL > + * segments. > + */ > + XLogSegNo startSegNo; > + XLogSegNo endSegNo; > > It is a matter of style but consistency is good. Per current style snake case > is preferred instead of CamelCase for these last members. > Okay. > - @lines = test_pg_waldump('--path' => $path); > + my @lines; > + @lines = test_pg_waldump($path); > > my @lines = test_pg_waldump($path); > Okay. > +/* Forward declaration */ > +struct ArchivedWALFile; > > Why don't you move ArchivedWALFile to pg_waldump.h? pg_waldump.h is included > into archive_waldump.c. > The full structure is needed in archive_waldump.c only, so I added a forward declaration to declare the pointer inside XLogDumpPrivate. I believe we follow such practices elsewhere in the codebase as well. > 0006. As I said before I would avoid increasing the size of this patch if an > ordered tar file is viable. Didn't look in deep this patch but I have a few > suggestions: > > * I don't like tmpfile_exists as a member name. A suggestion is 'spilled'. > Okay. > +#ifndef WIN32 > + if (chmod(fpath, pg_file_create_mode)) > + pg_fatal("could not set permissions on file \"%s\": %m", > + fpath); > +#endif > > s/on/of/ > My suggestion is to use the same sentence in initdb. > I think 'on' seems to be much more relevant than 'of,' and we do have 'could not set permissions on' error messages in other places as wel > could not change permissions of \"%s\": %m > > + pg_log_debug("temporarily exporting file \"%s\"", fpath); > > spilling to temporary file \"%s\" > Okay. > 0008. I'm concerned about this patch. It is breaking backward compatibility if > you are using a long option (--wal-directory). Your proposal is a generic word > that represents both cases (file and directory). I agree. However, I wouldn't > remove --wal-directory from the tool. Instead, I would keep it with the same > short option ('w') but add a sentence saying this long option is deprecated and > will be removed in the future or even remove any traces of this long option > from the help and documentation but silently accept the old long option. I > prefer the latter because it is not a required argument so a deprecation > warning is not necessary IMO. > Yeah, that was discussed with Robert offline and we believe that it is better to make it more generalized; since we can now use the same option to accept both wal-directory and wal-archived. pg_waldump has much more generic options for the same, such as -- path=PATH. Regards, Amul ^ permalink raw reply [nested|flat] 8+ messages in thread
* Re: pg_waldump: support decoding of WAL inside tarfile @ 2026-02-07 13:04 Robert Haas <[email protected]> parent: Amul Sul <[email protected]> 0 siblings, 0 replies; 8+ messages in thread From: Robert Haas @ 2026-02-07 13:04 UTC (permalink / raw) To: Amul Sul <[email protected]>; +Cc: Euler Taveira <[email protected]>; Chao Li <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]> On Wed, Feb 4, 2026 at 8:53 AM Amul Sul <[email protected]> wrote: > > 0008. I'm concerned about this patch. It is breaking backward compatibility if > > you are using a long option (--wal-directory). Your proposal is a generic word > > that represents both cases (file and directory). I agree. However, I wouldn't > > remove --wal-directory from the tool. Instead, I would keep it with the same > > short option ('w') but add a sentence saying this long option is deprecated and > > will be removed in the future or even remove any traces of this long option > > from the help and documentation but silently accept the old long option. I > > prefer the latter because it is not a required argument so a deprecation > > warning is not necessary IMO. > > Yeah, that was discussed with Robert offline and we believe that it is > better to make it more generalized; since we can now use the same > option to accept both wal-directory and wal-archived. pg_waldump has > much more generic options for the same, such as -- path=PATH. Of course, the fact that we discussed it doesn't mean that the issue is completely settled. However, I don't think there would be general support from other people on the project for the idea of getting rid of the long option entirely, or even just the documentation for it. We have long options for almost all short options these days, and I agree with that as a general practice. Sometimes we have ONLY a long option, but we very rarely have ONLY a short option, which is good because sometimes we have tools with too many options for it to be viable to give everything a short option, and the long options really help to make things more self-documenting. What I would consider a more viable option is to not do the rename and leave this as --wal-directory even though the argument could really be a directory or a file. That would avoid the backward compatibility break that is troubling Euler. And, you could argue that a tar file is enough like a directory that it won't really cause much confusion. Personally, I favor renaming it. I think the number of people using pg_waldump is fairly small, and the number of those people who are using the long form of the option is very small. Hence, I don't believe think the rename will inconvenience many users, and I think it will improve clarity for future users. But, Euler (or someone else) might take the opposite viewpoint. -- Robert Haas EDB: http://www.enterprisedb.com ^ permalink raw reply [nested|flat] 8+ messages in thread
end of thread, other threads:[~2026-02-07 13:04 UTC | newest] Thread overview: 8+ messages (download: mbox mbox.gz follow: Atom feed) -- links below jump to the message on this page -- 2026-01-26 17:22 Re: pg_waldump: support decoding of WAL inside tarfile Robert Haas <[email protected]> 2026-01-27 12:06 ` Amul Sul <[email protected]> 2026-01-27 21:11 ` Robert Haas <[email protected]> 2026-02-04 13:09 ` Amul Sul <[email protected]> 2026-01-28 03:01 ` Euler Taveira <[email protected]> 2026-01-28 13:03 ` Robert Haas <[email protected]> 2026-02-04 13:52 ` Amul Sul <[email protected]> 2026-02-07 13:04 ` Robert Haas <[email protected]>
This inbox is served by agora; see mirroring instructions for how to clone and mirror all data and code used for this inbox