public inbox for [email protected]
Re: pg_waldump: support decoding of WAL inside tarfile
29+ messages / 7 participants
* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-02-10 09:36 Amul Sul <[email protected]>
From: Amul Sul @ 2026-02-10 09:36 UTC
To: Robert Haas <[email protected]>; +Cc: Chao Li <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>
On Wed, Feb 4, 2026 at 6:39 PM Amul Sul <[email protected]> wrote:
>
> On Wed, Jan 28, 2026 at 2:41 AM Robert Haas <[email protected]> wrote:
> >
> > On Tue, Jan 27, 2026 at 7:07 AM Amul Sul <[email protected]> wrote:
> > > In the attached version, I am using the WAL segment name as the hash
> > > key, which is much more straightforward. I have rewritten
> > > read_archive_wal_page(), and it looks much cleaner than before. The
> > > logic to discard irrelevant WAL files is still within
> > > get_archive_wal_entry. I added an explanation for setting cur_wal to
> > > NULL, which is now handled in the separate function I mentioned
> > > previously.
> > >
> > > Kindly have a look at the attached version; let me know if you are
> > > still not happy with the current approach for filtering/discarding
> > > irrelevant WAL segments. It isn't much different from the previous
> > > version, but I have tried to keep it in a separate routine for better
> > > code readability, with comments to make it easier to understand. I
> > > also added a comment for ArchivedWALFile.
> >
> > I feel like the division of labor between get_archive_wal_entry() and
> > read_archive_wal_page() is odd. I noticed this in the last version,
> > too, and it still seems to be the case. get_archive_wal_entry() first
> > calls ArchivedWAL_lookup(). If that finds an entry, it just returns.
> > If it doesn't, it loops until an entry for the requested file shows up
> > and then returns it. Then control returns to read_archive_wal_page()
> > which loops some more until we have all the data we need for the
> > requested file. But it seems odd to me to have two separate loops
> > here. I think that the first loop is going to call read_archive_file()
> > until we find the beginning of the file that we care about and then
> > the second one is going to call read_archive_file() some more until we
> > have read enough of it to satisfy the request. It feels odd to me to
> > do it that way, as if we told somebody to first wait until 9 o'clock
> > and then wait another 30 minutes, instead of just telling them to wait
> > until 9:30. I realize it's not quite the same thing, because apart
> > from calling read_archive_file(), the two loops do different things,
> > but I still think it looks odd.
> >
> > + /*
> > + * Ignore if the timeline is different or the current segment is not
> > + * the desired one.
> > + */
> > + XLogFromFileName(entry->fname, &curSegTimeline, &curSegNo, WalSegSz);
> > + if (privateInfo->timeline != curSegTimeline ||
> > + privateInfo->startSegNo > curSegNo ||
> > + privateInfo->endSegNo < curSegNo ||
> > + segno > curSegNo)
> > + {
> > + free_archive_wal_entry(entry->fname, privateInfo);
> > + continue;
> > + }
> >
> > The comment doesn't match the code. If it did, the test would be
> > (privateInfo->timeline != curSegTimeline || segno != curSegNo). But
> > instead the segno test is > rather than !=, and the checks against
> > startSegNo and endSegNo aren't explained at all. I think I understand
> > why the segno test uses > rather than !=, but it's the point of the
> > comment to explain things like that, rather than leaving the reader to
> > guess. And I don't know why we also need to test startSegNo and
> > endSegNo.
> >
> > I also wonder what the point is of doing XLogFromFileName() on the
> > fname provided by the caller and then again on entry->fname. Couldn't
> > you just compare the strings?
> >
> > Again, the division of labor is really odd here. It's the job of
> > astreamer_waldump_content() to skip things that aren't WAL files at
> > all, but it's the job of get_archive_wal_entry() to skip things that
> > are WAL files but not the one we want. I disagree with putting those
> > checks in completely separate parts of the code.
> >
>
> Keeping the timeline and start/end segment range checks inside the
> archive streamer creates a circular dependency that cannot be resolved
> without a 'dirty hack'. We must read the first page of the first
> available WAL file to determine wal_segment_size before we can
> calculate the target segment range. Moving the checks into the
> streamer would make it impossible to process that initial file,
> because the filtering parameters would still be unknown, so the checks
> would somehow have to be skipped for the first read. And what if we
> later realize that the first WAL file, which was streamed only because
> the check was skipped, is irrelevant and falls outside the start/end
> segment range?
>
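Robert's point about the two loops ("wait until 9 o'clock, then another 30 minutes" versus "wait until 9:30") can be illustrated with a small standalone sketch. Everything in it is hypothetical and not taken from the patch: `Chunk` stands in for a piece of member data delivered by the archive streamer, `ChunkStream` for the tar archive, and `read_until_available()` for a merged version of get_archive_wal_entry() and read_archive_wal_page(). It keeps pulling chunks until the wanted segment's buffer holds enough bytes, skipping chunks that belong to other segments, so there is a single loop with a single exit condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one chunk of member data from the archive streamer. */
typedef struct Chunk
{
    const char *fname;      /* tar member (WAL segment) the bytes belong to */
    const char *data;
    size_t      len;
} Chunk;

/* Hypothetical stand-in for the tar archive: a fixed sequence of chunks. */
typedef struct ChunkStream
{
    const Chunk *chunks;
    size_t      nchunks;
    size_t      next;
} ChunkStream;

/*
 * One loop instead of two: pull chunks until the buffer for segment 'want'
 * holds at least 'need' bytes. Chunks for other segments are skipped, and
 * bytes that would overflow 'buf' are dropped in this simplified sketch.
 * Returns false if the stream ends before enough data has arrived.
 */
static bool
read_until_available(ChunkStream *s, const char *want,
                     char *buf, size_t bufsize, size_t *buffered, size_t need)
{
    assert(need <= bufsize);

    while (*buffered < need)
    {
        const Chunk *c;
        size_t      n;

        if (s->next >= s->nchunks)
            return false;       /* archive exhausted */
        c = &s->chunks[s->next++];

        if (strcmp(c->fname, want) != 0)
            continue;           /* some other WAL segment */

        n = c->len;
        if (n > bufsize - *buffered)
            n = bufsize - *buffered;
        memcpy(buf + *buffered, c->data, n);
        *buffered += n;
    }
    return true;
}
```

The sketch deliberately drops the per-segment hash table that v12 keeps for out-of-order support; it only shows how "find the segment" and "read enough of it" collapse into one loop condition.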
Please have a look at the attached version, specifically patch 0005.
I have moved the WAL file filtering check from get_archive_wal_entry()
into astreamer_waldump_content(). The check is skipped during the
initial read in init_archive_reader(), which instead performs it
explicitly once it has determined the WAL segment size and the
start/end segments.
To access the WAL segment size inside astreamer_waldump_content(), I
have moved the WAL segment size variable into the XLogDumpPrivate
structure in a separate patch, 0004.
Regards,
Amul
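The v12 arrangement described above can be modeled as one relevance predicate that the streamer applies to every WAL member, with a bypass while the range is still unknown. This is a hedged sketch: `FilterState` and `wal_member_is_relevant()` are hypothetical names, and treating `end_segno == 0` as "no upper bound" is an assumption of the sketch, not necessarily what the patch does. It does follow the patch's READ_ANY_WAL convention that a zero `start_segno` means "accept any WAL file", which presumably relies on segment 0 never appearing in practice (initdb starts WAL at segment 1).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TimeLineID;
typedef uint64_t XLogSegNo;

/* Hypothetical subset of XLogDumpPrivate used by the filter. */
typedef struct FilterState
{
    TimeLineID timeline;     /* timeline requested by the user */
    XLogSegNo  start_segno;  /* 0 until the segment size is known */
    XLogSegNo  end_segno;    /* 0 means "no upper bound" (sketch assumption) */
} FilterState;

/*
 * Decide whether a WAL member found in the archive should be buffered.
 * While start_segno is still 0 (the READ_ANY_WAL case), everything is
 * accepted so that the first page can be read to learn the segment size;
 * the caller must then re-check that first file once the range is set.
 */
static bool
wal_member_is_relevant(const FilterState *st, TimeLineID tli, XLogSegNo segno)
{
    if (st->start_segno == 0)
        return true;

    assert(st->end_segno == 0 || st->end_segno >= st->start_segno);

    if (tli != st->timeline)
        return false;
    if (segno < st->start_segno)
        return false;
    if (st->end_segno != 0 && segno > st->end_segno)
        return false;
    return true;
}
```

Keeping the bypass inside the predicate means the streamer and init_archive_reader() share one definition of "relevant"; the only special case left is the explicit re-check of the first file after the range has been computed.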
Attachments:
[application/x-patch] v12-0001-Refactor-pg_waldump-Move-some-declarations-to-ne.patch (2.2K, 2-v12-0001-Refactor-pg_waldump-Move-some-declarations-to-ne.patch)
From 0731db48bb8d154aa72d2c956dec95a8127ae07d Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Thu, 22 Jan 2026 10:28:32 +0530
Subject: [PATCH v12 1/9] Refactor: pg_waldump: Move some declarations to new
pg_waldump.h
This change prepares for a second source file in this directory to
support reading WAL from tar files. Common structures, declarations,
and functions are being exported through this include file so
they can be used in both files.
---
src/bin/pg_waldump/pg_waldump.c | 9 +--------
src/bin/pg_waldump/pg_waldump.h | 25 +++++++++++++++++++++++++
2 files changed, 26 insertions(+), 8 deletions(-)
create mode 100644 src/bin/pg_waldump/pg_waldump.h
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index f3446385d6a..4b7411a6498 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -29,6 +29,7 @@
#include "common/logging.h"
#include "common/relpath.h"
#include "getopt_long.h"
+#include "pg_waldump.h"
#include "rmgrdesc.h"
#include "storage/bufpage.h"
@@ -43,14 +44,6 @@ static volatile sig_atomic_t time_to_stop = false;
static const RelFileLocator emptyRelFileLocator = {0, 0, 0};
-typedef struct XLogDumpPrivate
-{
- TimeLineID timeline;
- XLogRecPtr startptr;
- XLogRecPtr endptr;
- bool endptr_reached;
-} XLogDumpPrivate;
-
typedef struct XLogDumpConfig
{
/* display options */
diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h
new file mode 100644
index 00000000000..b88543856e5
--- /dev/null
+++ b/src/bin/pg_waldump/pg_waldump.h
@@ -0,0 +1,25 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_waldump.h - decode and display WAL
+ *
+ * Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_waldump/pg_waldump.h
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_WALDUMP_H
+#define PG_WALDUMP_H
+
+#include "access/xlogdefs.h"
+
+/* Contains the necessary information to drive WAL decoding */
+typedef struct XLogDumpPrivate
+{
+ TimeLineID timeline;
+ XLogRecPtr startptr;
+ XLogRecPtr endptr;
+ bool endptr_reached;
+} XLogDumpPrivate;
+
+#endif /* end of PG_WALDUMP_H */
--
2.47.1
[application/x-patch] v12-0002-Refactor-pg_waldump-Separate-logic-used-to-calcu.patch (2.4K, 3-v12-0002-Refactor-pg_waldump-Separate-logic-used-to-calcu.patch)
From 09f887f7f6b4c142b527dc6410bc781884646681 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Thu, 22 Jan 2026 10:38:16 +0530
Subject: [PATCH v12 2/9] Refactor: pg_waldump: Separate logic used to
calculate the required read size.
This refactoring prepares the codebase for an upcoming patch that will
support reading WAL from tar files. The logic for calculating the
required read size is moved into a separate function,
required_read_len(), so that it can be shared by the code paths for
regular WAL files and WAL files located inside a tar archive.
---
src/bin/pg_waldump/pg_waldump.c | 43 +++++++++++++++++++++++----------
1 file changed, 30 insertions(+), 13 deletions(-)
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 4b7411a6498..958a71a01cf 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -326,6 +326,32 @@ identify_target_directory(char *directory, char *fname, int *WalSegSz)
return NULL; /* not reached */
}
+/*
+ * Returns the size in bytes of the data to be read. Returns -1 if the end
+ * point has already been reached.
+ */
+static inline int
+required_read_len(XLogDumpPrivate *private, XLogRecPtr targetPagePtr,
+ int reqLen)
+{
+ int count = XLOG_BLCKSZ;
+
+ if (XLogRecPtrIsValid(private->endptr))
+ {
+ if (targetPagePtr + XLOG_BLCKSZ <= private->endptr)
+ count = XLOG_BLCKSZ;
+ else if (targetPagePtr + reqLen <= private->endptr)
+ count = private->endptr - targetPagePtr;
+ else
+ {
+ private->endptr_reached = true;
+ return -1;
+ }
+ }
+
+ return count;
+}
+
/* pg_waldump's XLogReaderRoutine->segment_open callback */
static void
WALDumpOpenSegment(XLogReaderState *state, XLogSegNo nextSegNo,
@@ -383,21 +409,12 @@ WALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
XLogRecPtr targetPtr, char *readBuff)
{
XLogDumpPrivate *private = state->private_data;
- int count = XLOG_BLCKSZ;
+ int count = required_read_len(private, targetPagePtr, reqLen);
WALReadError errinfo;
- if (XLogRecPtrIsValid(private->endptr))
- {
- if (targetPagePtr + XLOG_BLCKSZ <= private->endptr)
- count = XLOG_BLCKSZ;
- else if (targetPagePtr + reqLen <= private->endptr)
- count = private->endptr - targetPagePtr;
- else
- {
- private->endptr_reached = true;
- return -1;
- }
- }
+ /* Bail out if the count to be read is not valid */
+ if (count < 0)
+ return -1;
if (!WALRead(state, readBuff, targetPagePtr, count, private->timeline,
&errinfo))
--
2.47.1
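The behaviour factored out into required_read_len() by patch 0002 can be checked in isolation. The following is a standalone copy of its decision logic with PostgreSQL types replaced by hypothetical minimal stand-ins (`DumpPrivate`, a plain `uint64_t` XLogRecPtr, and an 8 kB XLOG_BLCKSZ, which is the default but configurable at build time). The added assert on `reqLen` is a sketch-only precondition. It is meant to illustrate the three possible outcomes, not to replace the patch's function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XLOG_BLCKSZ 8192                    /* default page size; configurable at build time */
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

typedef uint64_t XLogRecPtr;

/* Hypothetical minimal stand-in for XLogDumpPrivate. */
typedef struct DumpPrivate
{
    XLogRecPtr endptr;          /* InvalidXLogRecPtr means "no --end given" */
    bool       endptr_reached;
} DumpPrivate;

/* Mirrors the decision logic of required_read_len() in patch 0002. */
static int
required_read_len(DumpPrivate *priv, XLogRecPtr targetPagePtr, int reqLen)
{
    int count = XLOG_BLCKSZ;

    assert(reqLen >= 0 && reqLen <= XLOG_BLCKSZ);

    if (priv->endptr != InvalidXLogRecPtr)
    {
        if (targetPagePtr + XLOG_BLCKSZ <= priv->endptr)
            count = XLOG_BLCKSZ;                            /* whole page precedes the end */
        else if (targetPagePtr + reqLen <= priv->endptr)
            count = (int) (priv->endptr - targetPagePtr);   /* partial page */
        else
        {
            priv->endptr_reached = true;                    /* request crosses the end */
            return -1;
        }
    }
    return count;
}
```

With no end point a full page is requested; when the end point falls inside the page but after the requested bytes, only the bytes up to the end point are read; otherwise the caller gets -1 and `endptr_reached` is set, which is why WALDumpReadPage() can simply bail out on a negative count.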
[application/x-patch] v12-0003-Refactor-pg_waldump-Restructure-TAP-tests.patch (5.6K, 4-v12-0003-Refactor-pg_waldump-Restructure-TAP-tests.patch)
From 05d2c4218d9c3496878f342c9c32ff5148d3d6a4 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Thu, 22 Jan 2026 11:06:05 +0530
Subject: [PATCH v12 3/9] Refactor: pg_waldump: Restructure TAP tests.
Restructure the tests that do not take a WAL file argument so that
they run within a loop, allowing them to be re-executed when decoding
WAL from tar archives.
== NOTE ==
This is not intended to be committed separately. It can be merged
with the next patch, which is the main patch implementing this
feature.
---
src/bin/pg_waldump/t/001_basic.pl | 123 ++++++++++++++++--------------
1 file changed, 67 insertions(+), 56 deletions(-)
diff --git a/src/bin/pg_waldump/t/001_basic.pl b/src/bin/pg_waldump/t/001_basic.pl
index 5db5d20136f..3288fadcf48 100644
--- a/src/bin/pg_waldump/t/001_basic.pl
+++ b/src/bin/pg_waldump/t/001_basic.pl
@@ -198,28 +198,6 @@ command_like(
],
qr/./,
'runs with start and end segment specified');
-command_fails_like(
- [ 'pg_waldump', '--path' => $node->data_dir ],
- qr/error: no start WAL location given/,
- 'path option requires start location');
-command_like(
- [
- 'pg_waldump',
- '--path' => $node->data_dir,
- '--start' => $start_lsn,
- '--end' => $end_lsn,
- ],
- qr/./,
- 'runs with path option and start and end locations');
-command_fails_like(
- [
- 'pg_waldump',
- '--path' => $node->data_dir,
- '--start' => $start_lsn,
- ],
- qr/error: error in WAL record at/,
- 'falling off the end of the WAL results in an error');
-
command_like(
[
'pg_waldump', '--quiet',
@@ -227,15 +205,6 @@ command_like(
],
qr/^$/,
'no output with --quiet option');
-command_fails_like(
- [
- 'pg_waldump', '--quiet',
- '--path' => $node->data_dir,
- '--start' => $start_lsn
- ],
- qr/error: error in WAL record at/,
- 'errors are shown with --quiet');
-
# Test for: Display a message that we're skipping data if `from`
# wasn't a pointer to the start of a record.
@@ -272,7 +241,6 @@ sub test_pg_waldump
my $result = IPC::Run::run [
'pg_waldump',
- '--path' => $node->data_dir,
'--start' => $start_lsn,
'--end' => $end_lsn,
@opts
@@ -288,38 +256,81 @@ sub test_pg_waldump
my @lines;
-@lines = test_pg_waldump;
-is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
+my @scenarios = (
+ {
+ 'path' => $node->data_dir
+ });
-@lines = test_pg_waldump('--limit' => 6);
-is(@lines, 6, 'limit option observed');
+for my $scenario (@scenarios)
+{
+ my $path = $scenario->{'path'};
-@lines = test_pg_waldump('--fullpage');
-is(grep(!/^rmgr:.*\bFPW\b/, @lines), 0, 'all output lines are FPW');
+ SKIP:
+ {
+ command_fails_like(
+ [ 'pg_waldump', '--path' => $path ],
+ qr/error: no start WAL location given/,
+ 'path option requires start location');
+ command_like(
+ [
+ 'pg_waldump',
+ '--path' => $path,
+ '--start' => $start_lsn,
+ '--end' => $end_lsn,
+ ],
+ qr/./,
+ 'runs with path option and start and end locations');
+ command_fails_like(
+ [
+ 'pg_waldump',
+ '--path' => $path,
+ '--start' => $start_lsn,
+ ],
+ qr/error: error in WAL record at/,
+ 'falling off the end of the WAL results in an error');
-@lines = test_pg_waldump('--stats');
-like($lines[0], qr/WAL statistics/, "statistics on stdout");
-is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+ command_fails_like(
+ [
+ 'pg_waldump', '--quiet',
+ '--path' => $path,
+ '--start' => $start_lsn
+ ],
+ qr/error: error in WAL record at/,
+ 'errors are shown with --quiet');
-@lines = test_pg_waldump('--stats=record');
-like($lines[0], qr/WAL statistics/, "statistics on stdout");
-is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+ @lines = test_pg_waldump('--path' => $path);
+ is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
-@lines = test_pg_waldump('--rmgr' => 'Btree');
-is(grep(!/^rmgr: Btree/, @lines), 0, 'only Btree lines');
+ @lines = test_pg_waldump('--path' => $path, '--limit' => 6);
+ is(@lines, 6, 'limit option observed');
-@lines = test_pg_waldump('--fork' => 'init');
-is(grep(!/fork init/, @lines), 0, 'only init fork lines');
+ @lines = test_pg_waldump('--path' => $path, '--fullpage');
+ is(grep(!/^rmgr:.*\bFPW\b/, @lines), 0, 'all output lines are FPW');
-@lines = test_pg_waldump(
- '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_t1_oid");
-is(grep(!/rel $default_ts_oid\/$postgres_db_oid\/$rel_t1_oid/, @lines),
- 0, 'only lines for selected relation');
+ @lines = test_pg_waldump('--path' => $path, '--stats');
+ like($lines[0], qr/WAL statistics/, "statistics on stdout");
+ is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
-@lines = test_pg_waldump(
- '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_i1a_oid",
- '--block' => 1);
-is(grep(!/\bblk 1\b/, @lines), 0, 'only lines for selected block');
+ @lines = test_pg_waldump('--path' => $path, '--stats=record');
+ like($lines[0], qr/WAL statistics/, "statistics on stdout");
+ is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+ @lines = test_pg_waldump('--path' => $path, '--rmgr' => 'Btree');
+ is(grep(!/^rmgr: Btree/, @lines), 0, 'only Btree lines');
+
+ @lines = test_pg_waldump('--path' => $path, '--fork' => 'init');
+ is(grep(!/fork init/, @lines), 0, 'only init fork lines');
+
+ @lines = test_pg_waldump('--path' => $path,
+ '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_t1_oid");
+ is(grep(!/rel $default_ts_oid\/$postgres_db_oid\/$rel_t1_oid/, @lines),
+ 0, 'only lines for selected relation');
+
+ @lines = test_pg_waldump('--path' => $path,
+ '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_i1a_oid",
+ '--block' => 1);
+ is(grep(!/\bblk 1\b/, @lines), 0, 'only lines for selected block');
+ }
+}
done_testing();
--
2.47.1
[application/x-patch] v12-0004-Refactor-pg_waldump-Move-WAL-segment-size-to-XLo.patch (5.1K, 5-v12-0004-Refactor-pg_waldump-Move-WAL-segment-size-to-XLo.patch)
From c00a6728cde927f4bb092a95694d1fcd2c17205f Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Wed, 4 Feb 2026 15:31:51 +0530
Subject: [PATCH v12 4/9] Refactor: pg_waldump: Move WAL segment size to
XLogDumpPrivate.
Relocate the WAL segment size variable to the XLogDumpPrivate
structure and rename it to segsize for consistency. This change is
required to make the segment size accessible to the archive streamer
code, where passing it as a function argument is not feasible.
---
src/bin/pg_waldump/pg_waldump.c | 26 +++++++++++++-------------
src/bin/pg_waldump/pg_waldump.h | 1 +
2 files changed, 14 insertions(+), 13 deletions(-)
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 958a71a01cf..5d31b15dbd8 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -811,7 +811,6 @@ main(int argc, char **argv)
XLogRecPtr first_record;
char *waldir = NULL;
char *errormsg;
- int WalSegSz;
static struct option long_options[] = {
{"bkp-details", no_argument, NULL, 'b'},
@@ -865,6 +864,7 @@ main(int argc, char **argv)
memset(&stats, 0, sizeof(XLogStats));
private.timeline = 1;
+ private.segsize = 0;
private.startptr = InvalidXLogRecPtr;
private.endptr = InvalidXLogRecPtr;
private.endptr_reached = false;
@@ -1138,18 +1138,18 @@ main(int argc, char **argv)
pg_fatal("could not open directory \"%s\": %m", waldir);
}
- waldir = identify_target_directory(waldir, fname, &WalSegSz);
+ waldir = identify_target_directory(waldir, fname, &private.segsize);
fd = open_file_in_directory(waldir, fname);
if (fd < 0)
pg_fatal("could not open file \"%s\"", fname);
close(fd);
/* parse position from file */
- XLogFromFileName(fname, &private.timeline, &segno, WalSegSz);
+ XLogFromFileName(fname, &private.timeline, &segno, private.segsize);
if (!XLogRecPtrIsValid(private.startptr))
- XLogSegNoOffsetToRecPtr(segno, 0, WalSegSz, private.startptr);
- else if (!XLByteInSeg(private.startptr, segno, WalSegSz))
+ XLogSegNoOffsetToRecPtr(segno, 0, private.segsize, private.startptr);
+ else if (!XLByteInSeg(private.startptr, segno, private.segsize))
{
pg_log_error("start WAL location %X/%08X is not inside file \"%s\"",
LSN_FORMAT_ARGS(private.startptr),
@@ -1159,7 +1159,7 @@ main(int argc, char **argv)
/* no second file specified, set end position */
if (!(optind + 1 < argc) && !XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(segno + 1, 0, WalSegSz, private.endptr);
+ XLogSegNoOffsetToRecPtr(segno + 1, 0, private.segsize, private.endptr);
/* parse ENDSEG if passed */
if (optind + 1 < argc)
@@ -1175,14 +1175,14 @@ main(int argc, char **argv)
close(fd);
/* parse position from file */
- XLogFromFileName(fname, &private.timeline, &endsegno, WalSegSz);
+ XLogFromFileName(fname, &private.timeline, &endsegno, private.segsize);
if (endsegno < segno)
pg_fatal("ENDSEG %s is before STARTSEG %s",
argv[optind + 1], argv[optind]);
if (!XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(endsegno + 1, 0, WalSegSz,
+ XLogSegNoOffsetToRecPtr(endsegno + 1, 0, private.segsize,
private.endptr);
/* set segno to endsegno for check of --end */
@@ -1190,8 +1190,8 @@ main(int argc, char **argv)
}
- if (!XLByteInSeg(private.endptr, segno, WalSegSz) &&
- private.endptr != (segno + 1) * WalSegSz)
+ if (!XLByteInSeg(private.endptr, segno, private.segsize) &&
+ private.endptr != (segno + 1) * private.segsize)
{
pg_log_error("end WAL location %X/%08X is not inside file \"%s\"",
LSN_FORMAT_ARGS(private.endptr),
@@ -1200,7 +1200,7 @@ main(int argc, char **argv)
}
}
else
- waldir = identify_target_directory(waldir, NULL, &WalSegSz);
+ waldir = identify_target_directory(waldir, NULL, &private.segsize);
/* we don't know what to print */
if (!XLogRecPtrIsValid(private.startptr))
@@ -1213,7 +1213,7 @@ main(int argc, char **argv)
/* we have everything we need, start reading */
xlogreader_state =
- XLogReaderAllocate(WalSegSz, waldir,
+ XLogReaderAllocate(private.segsize, waldir,
XL_ROUTINE(.page_read = WALDumpReadPage,
.segment_open = WALDumpOpenSegment,
.segment_close = WALDumpCloseSegment),
@@ -1234,7 +1234,7 @@ main(int argc, char **argv)
* a segment (e.g. we were used in file mode).
*/
if (first_record != private.startptr &&
- XLogSegmentOffset(private.startptr, WalSegSz) != 0)
+ XLogSegmentOffset(private.startptr, private.segsize) != 0)
pg_log_info(ngettext("first record is after %X/%08X, at %X/%08X, skipping over %u byte",
"first record is after %X/%08X, at %X/%08X, skipping over %u bytes",
(first_record - private.startptr)),
diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h
index b88543856e5..4f1b2ab668b 100644
--- a/src/bin/pg_waldump/pg_waldump.h
+++ b/src/bin/pg_waldump/pg_waldump.h
@@ -17,6 +17,7 @@
typedef struct XLogDumpPrivate
{
TimeLineID timeline;
+ int segsize;
XLogRecPtr startptr;
XLogRecPtr endptr;
bool endptr_reached;
--
2.47.1
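Patch 0004 exists because every mapping between an LSN, a segment number and a WAL file name depends on the segment size. The sketch below reimplements the arithmetic behind XLByteToSeg() and XLogFileName() as hypothetical helpers (`segments_per_xlogid()`, `lsn_to_segno()`, `wal_file_name()`) so it can run outside the source tree. It illustrates why the archive reader has to learn the size from the first page before it can compute start/end segment numbers: the same file name corresponds to different segment numbers under different segment sizes. For a fixed timeline and segment size, though, the name is a one-to-one encoding of the segment number, which is what makes Robert's suggestion of comparing file names directly, rather than parsing both with XLogFromFileName(), workable.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAXFNAMELEN 64

typedef uint64_t XLogRecPtr;
typedef uint64_t XLogSegNo;
typedef uint32_t TimeLineID;

/* Segments per 4 GB "xlogid", like XLogSegmentsPerXLogId(). */
static uint64_t
segments_per_xlogid(int segsize)
{
    assert(segsize > 0);
    return UINT64_C(0x100000000) / (uint64_t) segsize;
}

/* Like XLByteToSeg(): the segment that contains the given LSN. */
static XLogSegNo
lsn_to_segno(XLogRecPtr lsn, int segsize)
{
    return lsn / (uint64_t) segsize;
}

/* Like XLogFileName(): 24 hex digits for timeline, xlogid and segment. */
static void
wal_file_name(char *fname, TimeLineID tli, XLogSegNo segno, int segsize)
{
    uint64_t per_id = segments_per_xlogid(segsize);

    snprintf(fname, MAXFNAMELEN, "%08" PRIX32 "%08" PRIX32 "%08" PRIX32,
             tli, (uint32_t) (segno / per_id), (uint32_t) (segno % per_id));
}
```

For example, with 16 MB segments LSN 0/3000028 falls in segment 3, whose file on timeline 1 is 000000010000000000000003. LSN 1/00000000 maps to file 000000010000000100000000 with both 16 MB and 1 GB segments, even though that is segment 256 in the first case and segment 4 in the second.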
[application/x-patch] v12-0005-pg_waldump-Add-support-for-archived-WAL-decoding.patch (42.3K, 6-v12-0005-pg_waldump-Add-support-for-archived-WAL-decoding.patch)
From d1bb58eb796b8d1d25eb836c21dd874cb46f6359 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Tue, 10 Feb 2026 11:42:36 +0530
Subject: [PATCH v12 5/9] pg_waldump: Add support for archived WAL decoding.
pg_waldump can now accept the path to a tar archive containing WAL
files and decode them. This feature was added primarily for
pg_verifybackup, which previously disabled WAL parsing for
tar-formatted backups.
Note that this patch requires that the WAL files within the archive be
in sequential order; an error will be reported otherwise. The next
patch is planned to remove this restriction.
---
doc/src/sgml/ref/pg_waldump.sgml | 8 +-
src/bin/pg_waldump/Makefile | 7 +-
src/bin/pg_waldump/archive_waldump.c | 669 +++++++++++++++++++++++++++
src/bin/pg_waldump/meson.build | 4 +-
src/bin/pg_waldump/pg_waldump.c | 242 +++++++---
src/bin/pg_waldump/pg_waldump.h | 45 ++
src/bin/pg_waldump/t/001_basic.pl | 83 +++-
src/tools/pgindent/typedefs.list | 3 +
8 files changed, 986 insertions(+), 75 deletions(-)
create mode 100644 src/bin/pg_waldump/archive_waldump.c
diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index ce23add5577..d004bb0f67e 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -141,13 +141,17 @@ PostgreSQL documentation
<term><option>--path=<replaceable>path</replaceable></option></term>
<listitem>
<para>
- Specifies a directory to search for WAL segment files or a
- directory with a <literal>pg_wal</literal> subdirectory that
+ Specifies a tar archive or a directory to search for WAL segment files
+ or a directory with a <literal>pg_wal</literal> subdirectory that
contains such files. The default is to search in the current
directory, the <literal>pg_wal</literal> subdirectory of the
current directory, and the <literal>pg_wal</literal> subdirectory
of <envar>PGDATA</envar>.
</para>
+ <para>
+ If a tar archive is provided, its WAL segment files must be in
+ sequential order; otherwise, an error will be reported.
+ </para>
</listitem>
</varlistentry>
diff --git a/src/bin/pg_waldump/Makefile b/src/bin/pg_waldump/Makefile
index 4c1ee649501..aabb87566a2 100644
--- a/src/bin/pg_waldump/Makefile
+++ b/src/bin/pg_waldump/Makefile
@@ -3,6 +3,9 @@
PGFILEDESC = "pg_waldump - decode and display WAL"
PGAPPICON=win32
+# make these available to TAP test scripts
+export TAR
+
subdir = src/bin/pg_waldump
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
@@ -10,13 +13,15 @@ include $(top_builddir)/src/Makefile.global
OBJS = \
$(RMGRDESCOBJS) \
$(WIN32RES) \
+ archive_waldump.o \
compat.o \
pg_waldump.o \
rmgrdesc.o \
xlogreader.o \
xlogstats.o
-override CPPFLAGS := -DFRONTEND $(CPPFLAGS)
+override CPPFLAGS := -DFRONTEND -I$(libpq_srcdir) $(CPPFLAGS)
+LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils
RMGRDESCSOURCES = $(sort $(notdir $(wildcard $(top_srcdir)/src/backend/access/rmgrdesc/*desc*.c)))
RMGRDESCOBJS = $(patsubst %.c,%.o,$(RMGRDESCSOURCES))
diff --git a/src/bin/pg_waldump/archive_waldump.c b/src/bin/pg_waldump/archive_waldump.c
new file mode 100644
index 00000000000..27a5a5c6d5d
--- /dev/null
+++ b/src/bin/pg_waldump/archive_waldump.c
@@ -0,0 +1,669 @@
+/*-------------------------------------------------------------------------
+ *
+ * archive_waldump.c
+ * A generic facility for reading WAL data from tar archives via archive
+ * streamer.
+ *
+ * Portions Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_waldump/archive_waldump.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <unistd.h>
+
+#include "access/xlog_internal.h"
+#include "common/hashfn.h"
+#include "common/logging.h"
+#include "fe_utils/simple_list.h"
+#include "pg_waldump.h"
+
+/*
+ * How many bytes should we try to read from a file at once?
+ */
+#define READ_CHUNK_SIZE (128 * 1024)
+
+/*
+ * Check if the start segment number is zero; this indicates a request to read
+ * any WAL file.
+ */
+#define READ_ANY_WAL(privateInfo) ((privateInfo)->start_segno == 0)
+
+/*
+ * Hash entry representing a WAL segment retrieved from the archive.
+ *
+ * While WAL segments are typically read sequentially, individual entries
+ * maintain their own buffers for the following reasons:
+ *
+ * 1. Boundary Handling: The archive streamer provides a continuous byte
+ * stream. A single streaming chunk may contain the end of one WAL segment
+ * and the start of the next. Separate buffers allow us to easily
+ * partition and track these bytes by their respective segments.
+ *
+ * 2. Out-of-Order Support: Dedicated buffers simplify logic if segments
+ * are ever archived or retrieved out of sequence.
+ *
+ * To minimize the memory footprint, entries and their associated buffers are
+ * freed immediately once consumed. Since pg_waldump does not request the same
+ * bytes twice, a segment is discarded as soon as pg_waldump moves past it.
+ */
+typedef struct ArchivedWALFile
+{
+ uint32 status; /* hash status */
+ const char *fname; /* hash key: WAL segment name */
+
+ StringInfo buf; /* holds WAL bytes read from archive */
+
+ int read_len; /* total bytes of this segment read from archive */
+} ArchivedWALFile;
+
+static uint32 hash_string_pointer(const char *s);
+#define SH_PREFIX ArchivedWAL
+#define SH_ELEMENT_TYPE ArchivedWALFile
+#define SH_KEY_TYPE const char *
+#define SH_KEY fname
+#define SH_HASH_KEY(tb, key) hash_string_pointer(key)
+#define SH_EQUAL(tb, a, b) (strcmp(a, b) == 0)
+#define SH_SCOPE static inline
+#define SH_RAW_ALLOCATOR pg_malloc0
+#define SH_DECLARE
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+typedef struct astreamer_waldump
+{
+ astreamer base;
+ XLogDumpPrivate *privateInfo;
+} astreamer_waldump;
+
+static ArchivedWALFile *get_archive_wal_entry(const char *fname,
+ XLogDumpPrivate *privateInfo,
+ int WalSegSz);
+static int read_archive_file(XLogDumpPrivate *privateInfo, Size count);
+
+static astreamer *astreamer_waldump_new(XLogDumpPrivate *privateInfo);
+static void astreamer_waldump_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_waldump_finalize(astreamer *streamer);
+static void astreamer_waldump_free(astreamer *streamer);
+
+static bool member_is_wal_file(astreamer_waldump *mystreamer,
+ astreamer_member *member,
+ char **fname);
+
+static const astreamer_ops astreamer_waldump_ops = {
+ .content = astreamer_waldump_content,
+ .finalize = astreamer_waldump_finalize,
+ .free = astreamer_waldump_free
+};
+
+/*
+ * Returns true if the given file is a tar archive and outputs its compression
+ * algorithm.
+ */
+bool
+is_archive_file(const char *fname, pg_compress_algorithm *compression)
+{
+ int fname_len = strlen(fname);
+
+ /* Determine the compression type from the file name suffix */
+ if (fname_len > 4 &&
+ strcmp(fname + fname_len - 4, ".tar") == 0)
+ *compression = PG_COMPRESSION_NONE;
+ else if (fname_len > 4 &&
+ strcmp(fname + fname_len - 4, ".tgz") == 0)
+ *compression = PG_COMPRESSION_GZIP;
+ else if (fname_len > 7 &&
+ strcmp(fname + fname_len - 7, ".tar.gz") == 0)
+ *compression = PG_COMPRESSION_GZIP;
+ else if (fname_len > 8 &&
+ strcmp(fname + fname_len - 8, ".tar.lz4") == 0)
+ *compression = PG_COMPRESSION_LZ4;
+ else if (fname_len > 8 &&
+ strcmp(fname + fname_len - 8, ".tar.zst") == 0)
+ *compression = PG_COMPRESSION_ZSTD;
+ else
+ return false;
+
+ return true;
+}
+
+/*
+ * Initializes the tar archive reader, creates a hash table for WAL entries,
+ * checks for existing valid WAL segments in the archive file and retrieves the
+ * segment size, and sets up filters for relevant entries.
+ */
+void
+init_archive_reader(XLogDumpPrivate *privateInfo, const char *waldir,
+ int *WalSegSz, pg_compress_algorithm compression)
+{
+ int fd;
+ astreamer *streamer;
+ ArchivedWALFile *entry = NULL;
+ XLogLongPageHeader longhdr;
+ XLogSegNo segno;
+ TimeLineID timeline;
+
+ /* Open tar archive and store its file descriptor */
+ fd = open_file_in_directory(waldir, privateInfo->archive_name);
+
+ if (fd < 0)
+ pg_fatal("could not open file \"%s\"", privateInfo->archive_name);
+
+ privateInfo->archive_fd = fd;
+
+ streamer = astreamer_waldump_new(privateInfo);
+
+ /* Before that we must parse the tar archive. */
+ streamer = astreamer_tar_parser_new(streamer);
+
+ /* Before that we must decompress, if archive is compressed. */
+ if (compression == PG_COMPRESSION_GZIP)
+ streamer = astreamer_gzip_decompressor_new(streamer);
+ else if (compression == PG_COMPRESSION_LZ4)
+ streamer = astreamer_lz4_decompressor_new(streamer);
+ else if (compression == PG_COMPRESSION_ZSTD)
+ streamer = astreamer_zstd_decompressor_new(streamer);
+
+ privateInfo->archive_streamer = streamer;
+
+ /*
+ * Hash table storing WAL entries read from the archive with an arbitrary
+ * initial size
+ */
+ privateInfo->archive_wal_htab = ArchivedWAL_create(8, NULL);
+
+ /*
+ * Verify that the archive contains valid WAL files and fetch WAL segment
+ * size
+ */
+ while (entry == NULL || entry->buf->len < XLOG_BLCKSZ)
+ {
+ if (read_archive_file(privateInfo, XLOG_BLCKSZ) == 0)
+ pg_fatal("could not find WAL in archive \"%s\"",
+ privateInfo->archive_name);
+
+ entry = privateInfo->cur_file;
+ }
+
+ /* Set WalSegSz if WAL data is successfully read */
+ longhdr = (XLogLongPageHeader) entry->buf->data;
+
+ if (!IsValidWalSegSize(longhdr->xlp_seg_size))
+ {
+ pg_log_error(ngettext("invalid WAL segment size in WAL file from archive \"%s\" (%d byte)",
+ "invalid WAL segment size in WAL file from archive \"%s\" (%d bytes)",
+ longhdr->xlp_seg_size),
+ privateInfo->archive_name, longhdr->xlp_seg_size);
+ pg_log_error_detail("The WAL segment size must be a power of two between 1 MB and 1 GB.");
+ exit(1);
+ }
+
+ *WalSegSz = longhdr->xlp_seg_size;
+
+ /*
+ * With the WAL segment size available, we can now initialize the
+ * dependent start and end segment numbers.
+ */
+ Assert(XLogRecPtrIsValid(privateInfo->startptr));
+ XLByteToSeg(privateInfo->startptr, privateInfo->start_segno, *WalSegSz);
+
+ if (XLogRecPtrIsValid(privateInfo->endptr))
+ XLByteToSeg(privateInfo->endptr, privateInfo->end_segno, *WalSegSz);
+
+ /*
+ * This WAL file was fetched before the filtering parameters (start_segno
+ * and end_segno) were initialized. Perform the relevance check against
+ * the requested timeline and segment range now; if the file falls outside
+ * them, remove it from the hash table. Subsequent WAL files will be
+ * filtered automatically by the archive streamer using the updated
+ * start_segno and end_segno values.
+ */
+ XLogFromFileName(entry->fname, &timeline, &segno, privateInfo->segsize);
+ if (privateInfo->timeline != timeline ||
+ privateInfo->start_segno > segno ||
+ privateInfo->end_segno < segno)
+ free_archive_wal_entry(entry->fname, privateInfo);
+}
+
+/*
+ * Release the archive streamer chain and close the archive file.
+ */
+void
+free_archive_reader(XLogDumpPrivate *privateInfo)
+{
+ /*
+ * NB: Normally, astreamer_finalize() is called before astreamer_free() to
+ * flush any remaining buffered data and to ensure that the end of the tar
+ * archive is reached. However, when decoding WAL, once the end LSN is
+ * reached, any WAL data still buffered and any unread remainder of the
+ * tar archive can be safely ignored.
+ */
+ astreamer_free(privateInfo->archive_streamer);
+
+ /* Close the file. */
+ if (close(privateInfo->archive_fd) != 0)
+ pg_log_error("could not close file \"%s\": %m",
+ privateInfo->archive_name);
+}
+
+/*
+ * Copy the requested WAL page from the archive streamer's buffer into
+ * readBuff, fetching more data from the tar archive via the streamer when
+ * the buffer does not yet contain it.
+ */
+int
+read_archive_wal_page(XLogDumpPrivate *privateInfo, XLogRecPtr targetPagePtr,
+ Size count, char *readBuff, int WalSegSz)
+{
+ char *p = readBuff;
+ Size nbytes = count;
+ XLogRecPtr recptr = targetPagePtr;
+ XLogSegNo segno;
+ char fname[MAXFNAMELEN];
+ ArchivedWALFile *entry;
+
+ /* Identify the segment and locate its entry in the archive hash */
+ XLByteToSeg(targetPagePtr, segno, WalSegSz);
+ XLogFileName(fname, privateInfo->timeline, segno, WalSegSz);
+ entry = get_archive_wal_entry(fname, privateInfo, WalSegSz);
+
+ while (nbytes > 0)
+ {
+ char *buf = entry->buf->data;
+ int bufLen = entry->buf->len;
+ XLogRecPtr endPtr;
+ XLogRecPtr startPtr;
+
+ /* Calculate the LSN range currently residing in the buffer */
+ XLogSegNoOffsetToRecPtr(segno, entry->read_len, WalSegSz, endPtr);
+ startPtr = endPtr - bufLen;
+
+ /*
+ * Copy the requested WAL bytes if they are present in the buffer.
+ */
+ if (bufLen > 0 && startPtr <= recptr && recptr < endPtr)
+ {
+ int copyBytes;
+ int offset = recptr - startPtr;
+
+ /*
+ * Given startPtr <= recptr < endPtr and a total buffer size
+ * 'bufLen', the offset (recptr - startPtr) will always be less
+ * than 'bufLen'.
+ */
+ Assert(offset < bufLen);
+
+ copyBytes = Min(nbytes, bufLen - offset);
+ memcpy(p, buf + offset, copyBytes);
+
+ /* Update state for read */
+ recptr += copyBytes;
+ nbytes -= copyBytes;
+ p += copyBytes;
+ }
+ else
+ {
+ /*
+ * Before starting the actual decoding loop, pg_waldump tries to
+ * locate the first valid record from the user-specified start
+ * position, which might not be the start of a WAL record and
+ * could fall in the middle of a record that spans multiple pages.
+ * Consequently, the valid start position the decoder is looking
+ * for could be far away from that initial position.
+ *
+ * This may involve reading across multiple pages, and this
+ * pre-reading fetches data in multiple rounds from the archive
+ * streamer; normally, we would throw away existing buffer
+ * contents to fetch the next set of data, but that existing data
+ * might be needed once the main loop starts. Because previously
+ * read data cannot be re-read by the archive streamer, we delay
+ * resetting the buffer until the main decoding loop is entered.
+ *
+ * Once pg_waldump has entered the main loop, it may re-read the
+ * currently active page, but never an older one; therefore, any
+ * fully consumed WAL data preceding the current page can then be
+ * safely discarded.
+ */
+ if (privateInfo->decoding_started)
+ {
+ resetStringInfo(entry->buf);
+
+ /*
+ * Push the partial data already copied for the current page back
+ * into the buffer, so that the full page remains available if it
+ * is requested again.
+ */
+ if (p > readBuff)
+ {
+ Assert((count - nbytes) > 0);
+ appendBinaryStringInfo(entry->buf, readBuff, count - nbytes);
+ }
+ }
+
+ /*
+ * Now fetch more data. Raise an error if the archive streamer is
+ * no longer reading this segment, or if the end of the archive has
+ * been reached.
+ */
+ if (privateInfo->cur_file != entry ||
+ read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
+ pg_fatal("could not read file \"%s\" from archive \"%s\": read %lld of %lld",
+ fname, privateInfo->archive_name,
+ (long long int) (count - nbytes),
+ (long long int) count);
+ }
+ }
+
+ /*
+ * We should have either successfully read all the requested bytes or
+ * reported a failure before this point.
+ */
+ Assert(nbytes == 0);
+
+ /*
+ * NB: We return the count provided as input. Since we either read the
+ * whole WAL page or raise an error, a boolean would suffice, but the
+ * caller expects the number of bytes read. The routine that reads WAL
+ * pages from physical WAL files follows the same convention.
+ */
+ return count;
+}
+
+/*
+ * Free the buffer of a WAL entry that is no longer needed and remove the
+ * entry from the hash table. This releases memory and prevents irrelevant
+ * WAL data from accumulating. If the entry is the one the archive streamer
+ * is currently reading, cur_file in privateInfo is also set to NULL so that
+ * the streamer skips copying the rest of that file.
+ */
+void
+free_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo)
+{
+ ArchivedWALFile *entry;
+
+ entry = ArchivedWAL_lookup(privateInfo->archive_wal_htab, fname);
+
+ if (entry == NULL)
+ return;
+
+ /* Destroy the buffer */
+ destroyStringInfo(entry->buf);
+ entry->buf = NULL;
+
+ /* Set cur_file to NULL if it matches the entry being ignored */
+ if (privateInfo->cur_file == entry)
+ privateInfo->cur_file = NULL;
+
+ ArchivedWAL_delete_item(privateInfo->archive_wal_htab, entry);
+}
+
+/*
+ * Returns the archived WAL entry from the hash table if it exists. Otherwise,
+ * it invokes the routine to read the archived file, which then populates the
+ * entry in the hash table if that WAL exists in the archive.
+ */
+static ArchivedWALFile *
+get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo,
+ int WalSegSz)
+{
+ ArchivedWALFile *entry = NULL;
+
+ /* Search hash table */
+ entry = ArchivedWAL_lookup(privateInfo->archive_wal_htab, fname);
+
+ if (entry != NULL)
+ return entry;
+
+ /*
+ * The requested WAL entry has not been read from the archive yet; invoke
+ * the archive streamer to read it.
+ */
+ while (1)
+ {
+ /* Fetch more data */
+ if (read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
+ break; /* archive file ended */
+
+ /*
+ * The archive streamer is reading a non-WAL file or an irrelevant
+ * WAL file.
+ */
+ if (privateInfo->cur_file == NULL)
+ continue;
+
+ entry = privateInfo->cur_file;
+
+ /* Found the required entry */
+ if (strcmp(fname, entry->fname) == 0)
+ return entry;
+
+ /* WAL segments must be archived in order */
+ pg_log_error("WAL files are not archived in sequential order");
+ pg_log_error_detail("Expecting segment \"%s\" but found \"%s\".",
+ fname, entry->fname);
+ exit(1);
+ }
+
+ /* Requested WAL segment not found */
+ pg_fatal("could not find WAL \"%s\" in archive \"%s\"",
+ fname, privateInfo->archive_name);
+}
+
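+/*
+ * Read up to count bytes from the archive file and pass them to the archive
+ * streamer for decompression (if required) and parsing. Returns the number
+ * of bytes read; 0 indicates the end of the archive file.
+ */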
+static int
+read_archive_file(XLogDumpPrivate *privateInfo, Size count)
+{
+ int rc;
+ char *buffer;
+
+ buffer = pg_malloc(count * sizeof(uint8));
+
+ rc = read(privateInfo->archive_fd, buffer, count);
+ if (rc < 0)
+ pg_fatal("could not read file \"%s\": %m",
+ privateInfo->archive_name);
+
+ /*
+ * Decompress (if required), and then parse the previously read contents
+ * of the tar file.
+ */
+ if (rc > 0)
+ astreamer_content(privateInfo->archive_streamer, NULL,
+ buffer, rc, ASTREAMER_UNKNOWN);
+ pg_free(buffer);
+
+ return rc;
+}
+
+/*
+ * Create an astreamer that can read WAL from a tar file.
+ */
+static astreamer *
+astreamer_waldump_new(XLogDumpPrivate *privateInfo)
+{
+ astreamer_waldump *streamer;
+
+ streamer = palloc0(sizeof(astreamer_waldump));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_waldump_ops;
+
+ streamer->privateInfo = privateInfo;
+
+ return &streamer->base;
+}
+
+/*
+ * Main entry point of the archive streamer for reading WAL data from a tar
+ * file. If a member is identified as a valid WAL file, a hash entry is created
+ * for it, and its contents are copied into that entry's buffer, making them
+ * accessible to the decoding routine.
+ */
+static void
+astreamer_waldump_content(astreamer *streamer, astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
+{
+ astreamer_waldump *mystreamer = (astreamer_waldump *) streamer;
+ XLogDumpPrivate *privateInfo = mystreamer->privateInfo;
+
+ Assert(context != ASTREAMER_UNKNOWN);
+
+ switch (context)
+ {
+ case ASTREAMER_MEMBER_HEADER:
+ {
+ char *fname = NULL;
+ ArchivedWALFile *entry;
+ bool found;
+
+ pg_log_debug("reading \"%s\"", member->pathname);
+
+ if (!member_is_wal_file(mystreamer, member, &fname))
+ break;
+
+ /*
+ * Skip the remaining checks if any WAL file may be read, which
+ * is the case during the initial verification.
+ */
+ if (!READ_ANY_WAL(privateInfo))
+ {
+ XLogSegNo segno;
+ TimeLineID timeline;
+
+ /*
+ * Skip the segment if its timeline does not match, or if it
+ * falls outside the caller-specified range.
+ */
+ XLogFromFileName(fname, &timeline, &segno, privateInfo->segsize);
+ if (privateInfo->timeline != timeline ||
+ privateInfo->start_segno > segno ||
+ privateInfo->end_segno < segno)
+ {
+ pg_free(fname);
+ break;
+ }
+ }
+
+ entry = ArchivedWAL_insert(privateInfo->archive_wal_htab,
+ fname, &found);
+
+ /*
+ * Shouldn't happen, but if it does, simply ignore the
+ * duplicate WAL file.
+ */
+ if (found)
+ {
+ pg_log_warning("ignoring duplicate WAL \"%s\" found in archive \"%s\"",
+ member->pathname, privateInfo->archive_name);
+ pg_free(fname);
+ break;
+ }
+
+ entry->buf = makeStringInfo();
+ entry->read_len = 0;
+ privateInfo->cur_file = entry;
+ }
+ break;
+
+ case ASTREAMER_MEMBER_CONTENTS:
+ if (privateInfo->cur_file)
+ {
+ appendBinaryStringInfo(privateInfo->cur_file->buf, data, len);
+ privateInfo->cur_file->read_len += len;
+ }
+ break;
+
+ case ASTREAMER_MEMBER_TRAILER:
+ privateInfo->cur_file = NULL;
+ break;
+
+ case ASTREAMER_ARCHIVE_TRAILER:
+ break;
+
+ default:
+ /* Shouldn't happen. */
+ pg_fatal("unexpected state while parsing tar file");
+ }
+}
+
+/*
+ * End-of-stream processing for an astreamer_waldump stream.
+ */
+static void
+astreamer_waldump_finalize(astreamer *streamer)
+{
+ Assert(streamer->bbs_next == NULL);
+}
+
+/*
+ * Free memory associated with an astreamer_waldump stream.
+ */
+static void
+astreamer_waldump_free(astreamer *streamer)
+{
+ Assert(streamer->bbs_next == NULL);
+ pfree(streamer);
+}
+
+/*
+ * Returns true if the archive member name matches the WAL naming format. If
+ * successful, it also outputs the WAL segment name.
+ */
+static bool
+member_is_wal_file(astreamer_waldump *mystreamer, astreamer_member *member,
+ char **fname)
+{
+ int pathlen;
+ char pathname[MAXPGPATH];
+ char *filename;
+
+ /* We are only interested in normal files. */
+ if (member->is_directory || member->is_link)
+ return false;
+
+ if (strlen(member->pathname) < XLOG_FNAME_LEN)
+ return false;
+
+ /*
+ * For a correct comparison, we must remove any '.' or '..' components
+ * from the member pathname. Similar to member_verify_header(), we prepend
+ * './' to the path so that canonicalize_path() can properly resolve and
+ * strip these references from the tar member name.
+ */
+ snprintf(pathname, MAXPGPATH, "./%s", member->pathname);
+ canonicalize_path(pathname);
+ pathlen = strlen(pathname);
+
+ /* WAL files from the top-level or pg_wal directory will be decoded */
+ if (pathlen > XLOG_FNAME_LEN &&
+ strncmp(pathname, XLOGDIR, strlen(XLOGDIR)) != 0)
+ return false;
+
+ /* The WAL file name may be preceded by a directory path */
+ filename = pathname + (pathlen - XLOG_FNAME_LEN);
+ if (!IsXLogFileName(filename))
+ return false;
+
+ *fname = pnstrdup(filename, XLOG_FNAME_LEN);
+
+ return true;
+}
+
+/*
+ * Helper function for the archived WAL hash table.
+ */
+static uint32
+hash_string_pointer(const char *s)
+{
+ unsigned char *ss = (unsigned char *) s;
+
+ return hash_bytes(ss, strlen(s));
+}
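A minimal, compilable sketch of the buffer-window arithmetic that read_archive_wal_page() relies on, under simplified assumptions: LSNs are plain uint64_t values, and an entry's buffer holds the most recent buf_len bytes out of the read_len bytes streamed so far for that segment, which is what the endPtr/startPtr computation in the patch implies. The struct wal_window and the helper copy_from_window() are hypothetical illustrations, not part of the patch. The sketch shows why, once the range check passes, the offset into the buffer is always smaller than the buffer length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an ArchivedWALFile entry. */
typedef struct wal_window
{
	uint64_t	seg_start;		/* LSN of the first byte of the segment */
	const char *buf;			/* the most recent buf_len bytes streamed */
	size_t		buf_len;
	size_t		read_len;		/* total bytes of the segment streamed so far */
} wal_window;

/*
 * Copy up to nbytes starting at recptr from the window into dst. Returns
 * the number of bytes copied; 0 means recptr lies outside the buffered
 * range, so the caller would have to stream more data (or fail).
 */
static size_t
copy_from_window(const wal_window *w, uint64_t recptr, char *dst,
				 size_t nbytes)
{
	uint64_t	end_ptr = w->seg_start + w->read_len;
	uint64_t	start_ptr = end_ptr - w->buf_len;
	size_t		offset;
	size_t		avail;
	size_t		n;

	if (w->buf_len == 0 || recptr < start_ptr || recptr >= end_ptr)
		return 0;

	/* start_ptr <= recptr < end_ptr, hence offset < buf_len */
	offset = (size_t) (recptr - start_ptr);
	assert(offset < w->buf_len);	/* mirrors the Assert in the patch */

	avail = w->buf_len - offset;
	n = nbytes < avail ? nbytes : avail;
	memcpy(dst, w->buf + offset, n);
	return n;
}
```

For example, if 10 bytes of a segment starting at LSN 1000 have been streamed but only the last 4 are still buffered, a request at LSN 1007 copies 3 bytes, while a request at LSN 1005 finds nothing because that data has already been discarded.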
diff --git a/src/bin/pg_waldump/meson.build b/src/bin/pg_waldump/meson.build
index 633a9874bb5..5296f21b82c 100644
--- a/src/bin/pg_waldump/meson.build
+++ b/src/bin/pg_waldump/meson.build
@@ -1,6 +1,7 @@
# Copyright (c) 2022-2026, PostgreSQL Global Development Group
pg_waldump_sources = files(
+ 'archive_waldump.c',
'compat.c',
'pg_waldump.c',
'rmgrdesc.c',
@@ -18,7 +19,7 @@ endif
pg_waldump = executable('pg_waldump',
pg_waldump_sources,
- dependencies: [frontend_code, lz4, zstd],
+ dependencies: [frontend_code, libpq, lz4, zstd],
c_args: ['-DFRONTEND'], # needed for xlogreader et al
kwargs: default_bin_args,
)
@@ -29,6 +30,7 @@ tests += {
'sd': meson.current_source_dir(),
'bd': meson.current_build_dir(),
'tap': {
+ 'env': {'TAR': tar.found() ? tar.full_path() : ''},
'tests': [
't/001_basic.pl',
't/002_save_fullpage.pl',
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 5d31b15dbd8..6d04462d039 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -176,7 +176,7 @@ split_path(const char *path, char **dir, char **fname)
*
* return a read only fd
*/
-static int
+int
open_file_in_directory(const char *directory, const char *fname)
{
int fd = -1;
@@ -440,6 +440,67 @@ WALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
return count;
}
+/*
+ * pg_waldump's XLogReaderRoutine->segment_open callback to support dumping WAL
+ * files from tar archives.
+ */
+static void
+TarWALDumpOpenSegment(XLogReaderState *state, XLogSegNo nextSegNo,
+ TimeLineID *tli_p)
+{
+ /* No action needed */
+}
+
+/*
+ * pg_waldump's XLogReaderRoutine->segment_close callback.
+ */
+static void
+TarWALDumpCloseSegment(XLogReaderState *state)
+{
+ /* No action needed */
+}
+
+/*
+ * pg_waldump's XLogReaderRoutine->page_read callback to support dumping WAL
+ * files from tar archives.
+ */
+static int
+TarWALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
+ XLogRecPtr targetPtr, char *readBuff)
+{
+ XLogDumpPrivate *private = state->private_data;
+ int count = required_read_len(private, targetPagePtr, reqLen);
+ int WalSegSz = state->segcxt.ws_segsize;
+ XLogSegNo nextSegNo;
+
+ /* Bail out if the count to be read is not valid */
+ if (count < 0)
+ return -1;
+
+ /*
+ * If the target page is in a different segment, free the buffer space
+ * occupied by the previous segment's data. Since pg_waldump never goes
+ * back to an earlier segment, moving to a new segment implies that the
+ * previous segment will not be needed again.
+ */
+ nextSegNo = state->seg.ws_segno;
+ if (!XLByteInSeg(targetPagePtr, nextSegNo, WalSegSz))
+ {
+ char fname[MAXFNAMELEN];
+
+ XLogFileName(fname, state->seg.ws_tli, nextSegNo, WalSegSz);
+ free_archive_wal_entry(fname, private);
+
+ XLByteToSeg(targetPagePtr, nextSegNo, WalSegSz);
+ state->seg.ws_tli = private->timeline;
+ state->seg.ws_segno = nextSegNo;
+ }
+
+ /* Read the WAL page from the archive streamer */
+ return read_archive_wal_page(private, targetPagePtr, count, readBuff,
+ WalSegSz);
+}
+
/*
* Boolean to return whether the given WAL record matches a specific relation
* and optionally block.
@@ -777,8 +838,8 @@ usage(void)
printf(_(" -F, --fork=FORK only show records that modify blocks in fork FORK;\n"
" valid names are main, fsm, vm, init\n"));
printf(_(" -n, --limit=N number of records to display\n"));
- printf(_(" -p, --path=PATH directory in which to find WAL segment files or a\n"
- " directory with a ./pg_wal that contains such files\n"
+ printf(_(" -p, --path=PATH tar archive or a directory in which to find WAL segment files or\n"
+ " a directory with a ./pg_wal that contains such files\n"
" (default: current directory, ./pg_wal, $PGDATA/pg_wal)\n"));
printf(_(" -q, --quiet do not print any output, except for errors\n"));
printf(_(" -r, --rmgr=RMGR only show records generated by resource manager RMGR;\n"
@@ -810,7 +871,9 @@ main(int argc, char **argv)
XLogRecord *record;
XLogRecPtr first_record;
char *waldir = NULL;
+ char *walpath = NULL;
char *errormsg;
+ pg_compress_algorithm compression;
static struct option long_options[] = {
{"bkp-details", no_argument, NULL, 'b'},
@@ -868,6 +931,10 @@ main(int argc, char **argv)
private.startptr = InvalidXLogRecPtr;
private.endptr = InvalidXLogRecPtr;
private.endptr_reached = false;
+ private.decoding_started = false;
+ private.archive_name = NULL;
+ private.start_segno = 0;
+ private.end_segno = UINT64_MAX;
config.quiet = false;
config.bkp_details = false;
@@ -943,7 +1010,7 @@ main(int argc, char **argv)
}
break;
case 'p':
- waldir = pg_strdup(optarg);
+ walpath = pg_strdup(optarg);
break;
case 'q':
config.quiet = true;
@@ -1107,10 +1174,19 @@ main(int argc, char **argv)
goto bad_argument;
}
- if (waldir != NULL)
+ if (walpath != NULL)
{
+ /* validate path points to tar archive */
+ if (is_archive_file(walpath, &compression))
+ {
+ char *fname = NULL;
+
+ split_path(walpath, &waldir, &fname);
+
+ private.archive_name = fname;
+ }
/* validate path points to directory */
- if (!verify_directory(waldir))
+ else if (!verify_directory(walpath))
{
- pg_log_error("could not open directory \"%s\": %m", waldir);
+ pg_log_error("could not open directory \"%s\": %m", walpath);
goto bad_argument;
@@ -1128,6 +1204,17 @@ main(int argc, char **argv)
int fd;
XLogSegNo segno;
+ /*
+ * If a tar archive is passed using the --path option, all other
+ * arguments become unnecessary.
+ */
+ if (private.archive_name)
+ {
+ pg_log_error("unnecessary command-line arguments specified with tar archive (first is \"%s\")",
+ argv[optind]);
+ goto bad_argument;
+ }
+
split_path(argv[optind], &directory, &fname);
if (waldir == NULL && directory != NULL)
@@ -1138,69 +1225,76 @@ main(int argc, char **argv)
pg_fatal("could not open directory \"%s\": %m", waldir);
}
- waldir = identify_target_directory(waldir, fname, &private.segsize);
- fd = open_file_in_directory(waldir, fname);
- if (fd < 0)
- pg_fatal("could not open file \"%s\"", fname);
- close(fd);
-
- /* parse position from file */
- XLogFromFileName(fname, &private.timeline, &segno, private.segsize);
-
- if (!XLogRecPtrIsValid(private.startptr))
- XLogSegNoOffsetToRecPtr(segno, 0, private.segsize, private.startptr);
- else if (!XLByteInSeg(private.startptr, segno, private.segsize))
+ if (fname != NULL && is_archive_file(fname, &compression))
{
- pg_log_error("start WAL location %X/%08X is not inside file \"%s\"",
- LSN_FORMAT_ARGS(private.startptr),
- fname);
- goto bad_argument;
+ private.archive_name = fname;
}
-
- /* no second file specified, set end position */
- if (!(optind + 1 < argc) && !XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(segno + 1, 0, private.segsize, private.endptr);
-
- /* parse ENDSEG if passed */
- if (optind + 1 < argc)
+ else
{
- XLogSegNo endsegno;
-
- /* ignore directory, already have that */
- split_path(argv[optind + 1], &directory, &fname);
-
+ waldir = identify_target_directory(waldir, fname, &private.segsize);
fd = open_file_in_directory(waldir, fname);
if (fd < 0)
pg_fatal("could not open file \"%s\"", fname);
close(fd);
/* parse position from file */
- XLogFromFileName(fname, &private.timeline, &endsegno, private.segsize);
+ XLogFromFileName(fname, &private.timeline, &segno, private.segsize);
- if (endsegno < segno)
- pg_fatal("ENDSEG %s is before STARTSEG %s",
- argv[optind + 1], argv[optind]);
+ if (!XLogRecPtrIsValid(private.startptr))
+ XLogSegNoOffsetToRecPtr(segno, 0, private.segsize, private.startptr);
+ else if (!XLByteInSeg(private.startptr, segno, private.segsize))
+ {
+ pg_log_error("start WAL location %X/%08X is not inside file \"%s\"",
+ LSN_FORMAT_ARGS(private.startptr),
+ fname);
+ goto bad_argument;
+ }
- if (!XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(endsegno + 1, 0, private.segsize,
- private.endptr);
+ /* no second file specified, set end position */
+ if (!(optind + 1 < argc) && !XLogRecPtrIsValid(private.endptr))
+ XLogSegNoOffsetToRecPtr(segno + 1, 0, private.segsize, private.endptr);
- /* set segno to endsegno for check of --end */
- segno = endsegno;
- }
+ /* parse ENDSEG if passed */
+ if (optind + 1 < argc)
+ {
+ XLogSegNo endsegno;
+ /* ignore directory, already have that */
+ split_path(argv[optind + 1], &directory, &fname);
- if (!XLByteInSeg(private.endptr, segno, private.segsize) &&
- private.endptr != (segno + 1) * private.segsize)
- {
- pg_log_error("end WAL location %X/%08X is not inside file \"%s\"",
- LSN_FORMAT_ARGS(private.endptr),
- argv[argc - 1]);
- goto bad_argument;
+ fd = open_file_in_directory(waldir, fname);
+ if (fd < 0)
+ pg_fatal("could not open file \"%s\"", fname);
+ close(fd);
+
+ /* parse position from file */
+ XLogFromFileName(fname, &private.timeline, &endsegno, private.segsize);
+
+ if (endsegno < segno)
+ pg_fatal("ENDSEG %s is before STARTSEG %s",
+ argv[optind + 1], argv[optind]);
+
+ if (!XLogRecPtrIsValid(private.endptr))
+ XLogSegNoOffsetToRecPtr(endsegno + 1, 0, private.segsize,
+ private.endptr);
+
+ /* set segno to endsegno for check of --end */
+ segno = endsegno;
+ }
+
+ if (!XLByteInSeg(private.endptr, segno, private.segsize) &&
+ private.endptr != (segno + 1) * private.segsize)
+ {
+ pg_log_error("end WAL location %X/%08X is not inside file \"%s\"",
+ LSN_FORMAT_ARGS(private.endptr),
+ argv[argc - 1]);
+ goto bad_argument;
+ }
}
}
- else
- waldir = identify_target_directory(waldir, NULL, &private.segsize);
+ else if (!private.archive_name)
+ waldir = identify_target_directory(walpath, NULL, &private.segsize);
/* we don't know what to print */
if (!XLogRecPtrIsValid(private.startptr))
@@ -1212,12 +1306,36 @@ main(int argc, char **argv)
/* done with argument parsing, do the actual work */
/* we have everything we need, start reading */
- xlogreader_state =
- XLogReaderAllocate(private.segsize, waldir,
- XL_ROUTINE(.page_read = WALDumpReadPage,
- .segment_open = WALDumpOpenSegment,
- .segment_close = WALDumpCloseSegment),
- &private);
+ if (private.archive_name)
+ {
+ /*
+ * A NULL WAL directory indicates that the archive file is located in
+ * the current working directory.
+ */
+ if (waldir == NULL)
+ waldir = pg_strdup(".");
+
+ /* Set up for reading tar file */
+ init_archive_reader(&private, waldir, &private.segsize, compression);
+
+ /* Routine to decode WAL files in tar archive */
+ xlogreader_state =
+ XLogReaderAllocate(private.segsize, waldir,
+ XL_ROUTINE(.page_read = TarWALDumpReadPage,
+ .segment_open = TarWALDumpOpenSegment,
+ .segment_close = TarWALDumpCloseSegment),
+ &private);
+ }
+ else
+ {
+ xlogreader_state =
+ XLogReaderAllocate(private.segsize, waldir,
+ XL_ROUTINE(.page_read = WALDumpReadPage,
+ .segment_open = WALDumpOpenSegment,
+ .segment_close = WALDumpCloseSegment),
+ &private);
+ }
+
if (!xlogreader_state)
pg_fatal("out of memory while allocating a WAL reading processor");
@@ -1245,6 +1363,9 @@ main(int argc, char **argv)
if (config.stats == true && !config.quiet)
stats.startptr = first_record;
+ /* Flag indicating that the decoding loop has been entered */
+ private.decoding_started = true;
+
for (;;)
{
if (time_to_stop)
@@ -1326,6 +1447,9 @@ main(int argc, char **argv)
XLogReaderFree(xlogreader_state);
+ if (private.archive_name)
+ free_archive_reader(&private);
+
return EXIT_SUCCESS;
bad_argument:
diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h
index 4f1b2ab668b..02da2c43b08 100644
--- a/src/bin/pg_waldump/pg_waldump.h
+++ b/src/bin/pg_waldump/pg_waldump.h
@@ -12,6 +12,11 @@
#define PG_WALDUMP_H
#include "access/xlogdefs.h"
+#include "fe_utils/astreamer.h"
+
+/* Forward declarations */
+struct ArchivedWALFile;
+struct ArchivedWAL_hash;
/* Contains the necessary information to drive WAL decoding */
typedef struct XLogDumpPrivate
@@ -21,6 +26,46 @@ typedef struct XLogDumpPrivate
XLogRecPtr startptr;
XLogRecPtr endptr;
bool endptr_reached;
+ bool decoding_started;
+
+ /* Fields required to read WAL from archive */
+ char *archive_name; /* Tar archive name */
+ int archive_fd; /* File descriptor for the open tar file */
+
+ astreamer *archive_streamer;
+
+ /* What the archive streamer is currently reading */
+ struct ArchivedWALFile *cur_file;
+
+ /*
+ * Hash table of all WAL files that the archive streamer has read,
+ * including the one currently in progress.
+ */
+ struct ArchivedWAL_hash *archive_wal_htab;
+
+ /*
+ * These values could be derived from startptr and endptr, but
+ * recomputing them for every archive member while filtering out
+ * irrelevant WAL segments would be wasteful, so they are cached here.
+ */
+ XLogSegNo start_segno;
+ XLogSegNo end_segno;
} XLogDumpPrivate;
+extern int open_file_in_directory(const char *directory, const char *fname);
+
+extern bool is_archive_file(const char *fname,
+ pg_compress_algorithm *compression);
+extern void init_archive_reader(XLogDumpPrivate *privateInfo,
+ const char *waldir, int *WalSegSz,
+ pg_compress_algorithm compression);
+extern void free_archive_reader(XLogDumpPrivate *privateInfo);
+extern int read_archive_wal_page(XLogDumpPrivate *privateInfo,
+ XLogRecPtr targetPagePtr,
+ Size count, char *readBuff,
+ int WalSegSz);
+extern void free_archive_wal_entry(const char *fname,
+ XLogDumpPrivate *privateInfo);
+
#endif /* end of PG_WALDUMP_H */
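The start_segno and end_segno fields cached in XLogDumpPrivate let the archive streamer reject irrelevant members cheaply. A minimal sketch of that arithmetic, assuming that XLByteToSeg() amounts to integer division of the LSN by the segment size (as defined in access/xlog_internal.h) and that a start_segno of zero means "accept any WAL file", as the READ_ANY_WAL() check in this patch describes. The typedefs and the functions lsn_to_segno() and segment_is_relevant() are hypothetical stand-ins, not the patch's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t lsn_t;			/* stand-in for XLogRecPtr */
typedef uint64_t segno_t;		/* stand-in for XLogSegNo */

/* Same arithmetic as XLByteToSeg(): the segment that contains lsn. */
static segno_t
lsn_to_segno(lsn_t lsn, uint32_t seg_size)
{
	assert(seg_size > 0);
	return lsn / seg_size;
}

/*
 * Sketch of the member filter: start_segno == 0 means "accept any WAL"
 * (the initial verification pass); otherwise the timeline must match and
 * the segment must lie in the inclusive range [start_segno, end_segno].
 */
static bool
segment_is_relevant(uint32_t want_tli, segno_t start_segno, segno_t end_segno,
					uint32_t tli, segno_t segno)
{
	if (start_segno == 0)
		return true;
	return tli == want_tli && start_segno <= segno && segno <= end_segno;
}
```

With 16 MB segments, a start LSN of 0x1000028 maps to segment 1; when no end LSN is given, end_segno stays at UINT64_MAX (as initialized in main()), so only the lower bound and the timeline filter anything.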
diff --git a/src/bin/pg_waldump/t/001_basic.pl b/src/bin/pg_waldump/t/001_basic.pl
index 3288fadcf48..cae543c8990 100644
--- a/src/bin/pg_waldump/t/001_basic.pl
+++ b/src/bin/pg_waldump/t/001_basic.pl
@@ -3,10 +3,13 @@
use strict;
use warnings FATAL => 'all';
+use Cwd;
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Test::More;
+my $tar = $ENV{TAR};
+
program_help_ok('pg_waldump');
program_version_ok('pg_waldump');
program_options_handling_ok('pg_waldump');
@@ -235,7 +238,7 @@ command_like(
sub test_pg_waldump
{
local $Test::Builder::Level = $Test::Builder::Level + 1;
- my @opts = @_;
+ my ($path, @opts) = @_;
my ($stdout, $stderr);
@@ -243,6 +246,7 @@ sub test_pg_waldump
'pg_waldump',
'--start' => $start_lsn,
'--end' => $end_lsn,
+ '--path' => $path,
@opts
],
'>' => \$stdout,
@@ -254,11 +258,50 @@ sub test_pg_waldump
return @lines;
}
-my @lines;
+# Create a tar archive of the files in $directory, adding them in sorted order
+sub generate_archive
+{
+ my ($archive, $directory, $compression_flags) = @_;
+
+ my @files;
+ opendir my $dh, $directory or die "opendir: $!";
+ while (my $entry = readdir $dh)
+ {
+ # Skip '.' and '..'
+ next if $entry eq '.' || $entry eq '..';
+ push @files, $entry;
+ }
+ closedir $dh;
+
+ @files = sort @files;
+
+ # move into the WAL directory before archiving files
+ my $cwd = getcwd;
+ chdir($directory) || die "chdir: $!";
+ command_ok([$tar, $compression_flags, $archive, @files]);
+ chdir($cwd) || die "chdir: $!";
+}
+
+my $tmp_dir = PostgreSQL::Test::Utils::tempdir_short();
my @scenarios = (
{
- 'path' => $node->data_dir
+ 'path' => $node->data_dir,
+ 'is_archive' => 0,
+ 'enabled' => 1
+ },
+ {
+ 'path' => "$tmp_dir/pg_wal.tar",
+ 'compression_method' => 'none',
+ 'compression_flags' => '-cf',
+ 'is_archive' => 1,
+ 'enabled' => 1
+ },
+ {
+ 'path' => "$tmp_dir/pg_wal.tar.gz",
+ 'compression_method' => 'gzip',
+ 'compression_flags' => '-czf',
+ 'is_archive' => 1,
+ 'enabled' => check_pg_config("#define HAVE_LIBZ 1")
});
for my $scenario (@scenarios)
@@ -267,6 +310,19 @@ for my $scenario (@scenarios)
SKIP:
{
+ skip "tar command is not available", 3
+ if $scenario->{'is_archive'} && (!defined $tar || $tar eq '');
+ skip "$scenario->{'compression_method'} compression not supported by this build", 3
+ if !$scenario->{'enabled'} && $scenario->{'is_archive'};
+
+ # create pg_wal archive
+ if ($scenario->{'is_archive'})
+ {
+ generate_archive($path,
+ $node->data_dir . '/pg_wal',
+ $scenario->{'compression_flags'});
+ }
+
command_fails_like(
[ 'pg_waldump', '--path' => $path ],
qr/error: no start WAL location given/,
@@ -298,38 +354,41 @@ for my $scenario (@scenarios)
qr/error: error in WAL record at/,
'errors are shown with --quiet');
- @lines = test_pg_waldump('--path' => $path);
+ my @lines = test_pg_waldump($path);
is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
- @lines = test_pg_waldump('--path' => $path, '--limit' => 6);
+ @lines = test_pg_waldump($path, '--limit' => 6);
is(@lines, 6, 'limit option observed');
- @lines = test_pg_waldump('--path' => $path, '--fullpage');
+ @lines = test_pg_waldump($path, '--fullpage');
is(grep(!/^rmgr:.*\bFPW\b/, @lines), 0, 'all output lines are FPW');
- @lines = test_pg_waldump('--path' => $path, '--stats');
+ @lines = test_pg_waldump($path, '--stats');
like($lines[0], qr/WAL statistics/, "statistics on stdout");
is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
- @lines = test_pg_waldump('--path' => $path, '--stats=record');
+ @lines = test_pg_waldump($path, '--stats=record');
like($lines[0], qr/WAL statistics/, "statistics on stdout");
is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
- @lines = test_pg_waldump('--path' => $path, '--rmgr' => 'Btree');
+ @lines = test_pg_waldump($path, '--rmgr' => 'Btree');
is(grep(!/^rmgr: Btree/, @lines), 0, 'only Btree lines');
- @lines = test_pg_waldump('--path' => $path, '--fork' => 'init');
+ @lines = test_pg_waldump($path, '--fork' => 'init');
is(grep(!/fork init/, @lines), 0, 'only init fork lines');
- @lines = test_pg_waldump('--path' => $path,
+ @lines = test_pg_waldump($path,
'--relation' => "$default_ts_oid/$postgres_db_oid/$rel_t1_oid");
is(grep(!/rel $default_ts_oid\/$postgres_db_oid\/$rel_t1_oid/, @lines),
0, 'only lines for selected relation');
- @lines = test_pg_waldump('--path' => $path,
+ @lines = test_pg_waldump($path,
'--relation' => "$default_ts_oid/$postgres_db_oid/$rel_i1a_oid",
'--block' => 1);
is(grep(!/\bblk 1\b/, @lines), 0, 'only lines for selected block');
+
+ # Cleanup.
+ unlink $path if $scenario->{'is_archive'};
}
}
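The test's generate_archive() sorts the file names before calling tar because, with this patch alone, get_archive_wal_entry() errors out when segments appear out of order. WAL file names are 24 upper-case hex digits: 8 for the timeline, then the segment number split into two 8-digit halves, so within one timeline lexical order matches segment order. A minimal sketch of parsing such names and locating the first out-of-order one, assuming the standard naming layout; parse_wal_name() and first_out_of_order() are hypothetical helpers, and PostgreSQL's own IsXLogFileName() and XLogFromFileName() remain the authoritative checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define WAL_FNAME_LEN 24

/*
 * Parse a WAL file name into its timeline and segment number, combining the
 * two 8-digit halves the way XLogFromFileName() does. Returns false for
 * names that are not WAL segment files.
 */
static bool
parse_wal_name(const char *name, uint32_t seg_size,
			   uint32_t *tli, uint64_t *segno)
{
	unsigned int t, log, seg;

	assert(seg_size > 0);
	if (strlen(name) != WAL_FNAME_LEN ||
		strspn(name, "0123456789ABCDEF") != WAL_FNAME_LEN)
		return false;
	if (sscanf(name, "%8X%8X%8X", &t, &log, &seg) != 3)
		return false;

	*tli = t;
	/* segments per 4 GB "xlogid" = 0x100000000 / segment size */
	*segno = (uint64_t) log * (UINT64_C(0x100000000) / seg_size) + seg;
	return true;
}

/*
 * Return the index of the first WAL file name that does not sort after the
 * previous WAL file name by (timeline, segment number), or -1 if all WAL
 * names are in increasing order. Non-WAL names are skipped.
 */
static int
first_out_of_order(const char *const *names, int n, uint32_t seg_size)
{
	bool		have_prev = false;
	uint32_t	prev_tli = 0;
	uint64_t	prev_segno = 0;

	for (int i = 0; i < n; i++)
	{
		uint32_t	tli;
		uint64_t	segno;

		if (!parse_wal_name(names[i], seg_size, &tli, &segno))
			continue;
		if (have_prev &&
			(tli < prev_tli || (tli == prev_tli && segno <= prev_segno)))
			return i;
		have_prev = true;
		prev_tli = tli;
		prev_segno = segno;
	}
	return -1;
}
```

With 16 MB segments, 000000010000000100000000 parses to timeline 1, segment 256, so it correctly sorts after 000000010000000000000002.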
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9f5ee8fd482..2cd87de84ee 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -144,6 +144,8 @@ ArchiveOpts
ArchiveShutdownCB
ArchiveStartupCB
ArchiveStreamState
+ArchivedWALFile
+ArchivedWAL_hash
ArchiverOutput
ArchiverStage
ArrayAnalyzeExtraData
@@ -3506,6 +3508,7 @@ astreamer_recovery_injector
astreamer_tar_archiver
astreamer_tar_parser
astreamer_verify
+astreamer_waldump
astreamer_zstd_frame
auth_password_hook_typ
autovac_table
--
2.47.1
[application/x-patch] v12-0006-pg_waldump-Remove-the-restriction-on-the-order-o.patch (12.8K, 7-v12-0006-pg_waldump-Remove-the-restriction-on-the-order-o.patch)
download | inline diff:
From 77e29d3162afec46d9ed9f3d592bdbeee6347b37 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Tue, 27 Jan 2026 15:38:34 +0530
Subject: [PATCH v12 6/9] pg_waldump: Remove the restriction on the order of
archived WAL files.
With the previous patch, pg_waldump would stop decoding if WAL files were
not in the required sequence. With this patch, decoding continues
instead: any WAL file that is out of order is written to a temporary
location, from which it is read later. Once a temporary file has been
read, it is removed.
---
doc/src/sgml/ref/pg_waldump.sgml | 8 +-
src/bin/pg_waldump/archive_waldump.c | 171 +++++++++++++++++++++++++--
src/bin/pg_waldump/pg_waldump.c | 31 ++++-
src/bin/pg_waldump/pg_waldump.h | 3 +
src/bin/pg_waldump/t/001_basic.pl | 3 +-
5 files changed, 196 insertions(+), 20 deletions(-)
diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index d004bb0f67e..27adf77755c 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -149,8 +149,12 @@ PostgreSQL documentation
of <envar>PGDATA</envar>.
</para>
<para>
- If a tar archive is provided, its WAL segment files must be in
- sequential order; otherwise, an error will be reported.
+ If a tar archive is provided and its WAL segment files are not in
+ sequential order, the out-of-order files are written to a temporary
+ directory whose name starts with <filename>waldump_tmp</filename>. This
+ directory is created inside the directory specified by the
+ <envar>TMPDIR</envar> environment variable if it is set; otherwise, it is
+ created in the same directory as the tar archive.
</para>
</listitem>
</varlistentry>
diff --git a/src/bin/pg_waldump/archive_waldump.c b/src/bin/pg_waldump/archive_waldump.c
index 27a5a5c6d5d..b1353088c4a 100644
--- a/src/bin/pg_waldump/archive_waldump.c
+++ b/src/bin/pg_waldump/archive_waldump.c
@@ -17,6 +17,7 @@
#include <unistd.h>
#include "access/xlog_internal.h"
+#include "common/file_perm.h"
#include "common/hashfn.h"
#include "common/logging.h"
#include "fe_utils/simple_list.h"
@@ -27,6 +28,9 @@
*/
#define READ_CHUNK_SIZE (128 * 1024)
+/* Temporary exported WAL file directory */
+char *TmpWalSegDir = NULL;
+
/*
* Check if the start segment number is zero; this indicates a request to read
* any WAL file.
@@ -57,6 +61,8 @@ typedef struct ArchivedWALFile
const char *fname; /* hash key: WAL segment name */
StringInfo buf; /* holds WAL bytes read from archive */
+ bool spilled; /* true if the WAL data was spilled to a
+ * temporary file */
int read_len; /* total bytes of a WAL read from archive */
} ArchivedWALFile;
@@ -84,6 +90,11 @@ static ArchivedWALFile *get_archive_wal_entry(const char *fname,
XLogDumpPrivate *privateInfo,
int WalSegSz);
static int read_archive_file(XLogDumpPrivate *privateInfo, Size count);
+static void setup_tmpwal_dir(const char *waldir);
+static void cleanup_tmpwal_dir_atexit(void);
+
+static FILE *prepare_tmp_write(const char *fname);
+static void perform_tmp_write(const char *fname, StringInfo buf, FILE *file);
static astreamer *astreamer_waldump_new(XLogDumpPrivate *privateInfo);
static void astreamer_waldump_content(astreamer *streamer,
@@ -137,7 +148,9 @@ is_archive_file(const char *fname, pg_compress_algorithm *compression)
/*
* Initializes the tar archive reader, creates a hash table for WAL entries,
* checks for existing valid WAL segments in the archive file and retrieves the
- * segment size, and sets up filters for relevant entries.
+ * segment size, and sets up filters for relevant entries. It also configures a
+ * temporary directory for out-of-order WAL data and registers an exit callback
+ * to clean up temporary files.
*/
void
init_archive_reader(XLogDumpPrivate *privateInfo, const char *waldir,
@@ -230,6 +243,13 @@ init_archive_reader(XLogDumpPrivate *privateInfo, const char *waldir,
privateInfo->start_segno > segno ||
privateInfo->end_segno < segno)
free_archive_wal_entry(entry->fname, privateInfo);
+
+ /*
+ * Set up a temporary directory to store WAL segments and register an
+ * exit callback to remove it upon completion.
+ */
+ setup_tmpwal_dir(waldir);
+ atexit(cleanup_tmpwal_dir_atexit);
}
/*
@@ -396,6 +416,17 @@ free_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo)
destroyStringInfo(entry->buf);
entry->buf = NULL;
+ /* Remove temporary file if any */
+ if (entry->spilled)
+ {
+ char fpath[MAXPGPATH];
+
+ snprintf(fpath, MAXPGPATH, "%s/%s", TmpWalSegDir, fname);
+
+ if (unlink(fpath) == 0)
+ pg_log_debug("removed file \"%s\"", fpath);
+ }
+
/* Set cur_file to NULL if it matches the entry being ignored */
if (privateInfo->cur_file == entry)
privateInfo->cur_file = NULL;
@@ -407,12 +438,16 @@ free_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo)
* Returns the archived WAL entry from the hash table if it exists. Otherwise,
* it invokes the routine to read the archived file, which then populates the
* entry in the hash table if that WAL exists in the archive.
+ * If the archive streamer happens to be reading a WAL file from the archive
+ * that is not currently needed, that WAL data is written to a temporary
+ * file.
*/
static ArchivedWALFile *
get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo,
int WalSegSz)
{
ArchivedWALFile *entry = NULL;
+ FILE *write_fp = NULL;
/* Search hash table */
entry = ArchivedWAL_lookup(privateInfo->archive_wal_htab, fname);
@@ -426,28 +461,59 @@ get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo,
*/
while (1)
{
+ /*
+ * The WAL file entry currently being processed may change during
+ * archive streamer execution. Therefore, maintain a local variable to
+ * reference the previous entry, ensuring that any remaining data in
+ * its buffer is successfully flushed to the temporary file before
+ * switching to the next WAL entry.
+ */
+ entry = privateInfo->cur_file;
+
/* Fetch more data */
- if (read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
- break; /* archive file ended */
+ if (entry == NULL || entry->buf->len == 0)
+ {
+ if (read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
+ break; /* archive file ended */
+ }
/*
* Archived streamer is reading a non-WAL file or an irrelevant WAL
* file.
*/
- if (privateInfo->cur_file == NULL)
+ if (entry == NULL)
continue;
- entry = privateInfo->cur_file;
-
/* Found the required entry */
if (strcmp(fname, entry->fname) == 0)
return entry;
- /* WAL segments must be archived in order */
- pg_log_error("WAL files are not archived in sequential order");
- pg_log_error_detail("Expecting segment \"%s\" but found \"%s\".",
- fname, entry->fname);
- exit(1);
+ /*
+ * The archive streamer is currently reading a file that isn't the one
+ * requested, but it will be needed later, so write it to a temporary
+ * location from which it can be retrieved when needed.
+ */
+
+ /* Create a temporary file if one does not already exist */
+ if (!entry->spilled)
+ {
+ write_fp = prepare_tmp_write(entry->fname);
+ entry->spilled = true;
+ }
+
+ /* Flush data from the buffer to the file */
+ perform_tmp_write(entry->fname, entry->buf, write_fp);
+ resetStringInfo(entry->buf);
+
+ /*
+ * The change in the current segment entry indicates that the reading
+ * of this file has ended.
+ */
+ if (entry != privateInfo->cur_file && write_fp != NULL)
+ {
+ fclose(write_fp);
+ write_fp = NULL;
+ }
}
/* Requested WAL segment not found */
@@ -485,7 +551,88 @@ read_archive_file(XLogDumpPrivate *privateInfo, Size count)
}
/*
- * Create an astreamer that can read WAL from a tar file.
+ * Set up a temporary directory to temporarily store WAL segments.
+ */
+static void
+setup_tmpwal_dir(const char *waldir)
+{
+ char *template;
+
+ /*
+ * Use the directory specified by the TMPDIR environment variable. If it's
+ * not set, use the provided WAL directory to hold the temporarily
+ * extracted WAL files.
+ */
+ template = psprintf("%s/waldump_tmp-XXXXXX",
+ getenv("TMPDIR") ? getenv("TMPDIR") : waldir);
+ TmpWalSegDir = mkdtemp(template);
+
+ if (TmpWalSegDir == NULL)
+ pg_fatal("could not create directory \"%s\": %m", template);
+
+ canonicalize_path(TmpWalSegDir);
+
+ pg_log_debug("created directory \"%s\"", TmpWalSegDir);
+}
+
+/*
+ * Remove temporary directory at exit, if any.
+ */
+static void
+cleanup_tmpwal_dir_atexit(void)
+{
+ rmtree(TmpWalSegDir, true);
+}
+
+/*
+ * Create an empty placeholder file and return its handle.
+ */
+static FILE *
+prepare_tmp_write(const char *fname)
+{
+ char fpath[MAXPGPATH];
+ FILE *file;
+
+ snprintf(fpath, MAXPGPATH, "%s/%s", TmpWalSegDir, fname);
+
+ /* Create an empty placeholder */
+ file = fopen(fpath, PG_BINARY_W);
+ if (file == NULL)
+ pg_fatal("could not create file \"%s\": %m", fpath);
+
+#ifndef WIN32
+ if (chmod(fpath, pg_file_create_mode))
+ pg_fatal("could not set permissions on file \"%s\": %m",
+ fpath);
+#endif
+
+ pg_log_debug("spilling to temporary file \"%s\"", fpath);
+
+ return file;
+}
+
+/*
+ * Write buffer data to the given file handle.
+ */
+static void
+perform_tmp_write(const char *fname, StringInfo buf, FILE *file)
+{
+ Assert(file);
+
+ errno = 0;
+ if (buf->len > 0 && fwrite(buf->data, buf->len, 1, file) != 1)
+ {
+ /*
+ * If write didn't set errno, assume problem is no disk space
+ */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_fatal("could not write to file \"%s\": %m", fname);
+ }
+}
+
+/*
+ * Create an astreamer that can read WAL from tar file.
*/
static astreamer *
astreamer_waldump_new(XLogDumpPrivate *privateInfo)
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 6d04462d039..faf300af2be 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -478,25 +478,46 @@ TarWALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
return -1;
/*
- * If the target page is in a different segment, free the buffer space
- * occupied by the previous segment data. Since pg_waldump never requests
- * the same WAL bytes twice, moving to a new segment implies the previous
- * buffer's data and that segment will not be needed again.
+ * If the target page is in a different segment, free the buffer and/or
+ * temporary file disk space occupied by the previous segment's data.
+ * Since pg_waldump never requests the same WAL bytes twice, moving to a
+ * new segment implies the previous buffer's data and that segment will
+ * not be needed again.
+ *
+ * Afterward, check for the next required WAL segment's physical existence
+ * in the temporary directory first before invoking the archive streamer.
*/
nextSegNo = state->seg.ws_segno;
if (!XLByteInSeg(targetPagePtr, nextSegNo, WalSegSz))
{
char fname[MAXFNAMELEN];
+ if (state->seg.ws_file >= 0)
+ {
+ close(state->seg.ws_file);
+ state->seg.ws_file = -1;
+ }
+
XLogFileName(fname, state->seg.ws_tli, nextSegNo, WalSegSz);
free_archive_wal_entry(fname, private);
XLByteToSeg(targetPagePtr, nextSegNo, WalSegSz);
state->seg.ws_tli = private->timeline;
state->seg.ws_segno = nextSegNo;
+
+ /*
+ * If the next segment was spilled to the temporary directory, open it
+ */
+ XLogFileName(fname, state->seg.ws_tli, nextSegNo, WalSegSz);
+ state->seg.ws_file = open_file_in_directory(TmpWalSegDir, fname);
}
- /* Read the WAL page from the archive streamer */
+ /* Continue reading from the open WAL segment, if any */
+ if (state->seg.ws_file >= 0)
+ return WALDumpReadPage(state, targetPagePtr, count, targetPtr,
+ readBuff);
+
+ /* Otherwise, read the WAL page from the archive streamer */
return read_archive_wal_page(private, targetPagePtr, count, readBuff,
WalSegSz);
}
diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h
index 02da2c43b08..476f74e2846 100644
--- a/src/bin/pg_waldump/pg_waldump.h
+++ b/src/bin/pg_waldump/pg_waldump.h
@@ -18,6 +18,9 @@
struct ArchivedWALFile;
struct ArchivedWAL_hash;
+/* Temporary directory */
+extern char *TmpWalSegDir;
+
/* Contains the necessary information to drive WAL decoding */
typedef struct XLogDumpPrivate
{
diff --git a/src/bin/pg_waldump/t/001_basic.pl b/src/bin/pg_waldump/t/001_basic.pl
index cae543c8990..55a21c71208 100644
--- a/src/bin/pg_waldump/t/001_basic.pl
+++ b/src/bin/pg_waldump/t/001_basic.pl
@@ -7,6 +7,7 @@ use Cwd;
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Test::More;
+use List::Util qw(shuffle);
my $tar = $ENV{TAR};
@@ -272,7 +273,7 @@ sub generate_archive
}
closedir $dh;
- @files = sort @files;
+ @files = shuffle @files;
# move into the WAL directory before archiving files
my $cwd = getcwd;
--
2.47.1
[application/x-patch] v12-0007-pg_verifybackup-Delay-default-WAL-directory-prep.patch (1.7K, 8-v12-0007-pg_verifybackup-Delay-default-WAL-directory-prep.patch)
From 8cb70851571e268a6be7763c845d79b9a8b50cc0 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Wed, 16 Jul 2025 14:47:43 +0530
Subject: [PATCH v12 7/9] pg_verifybackup: Delay default WAL directory
preparation.
We are not sure whether to parse WAL from a directory or an archive
until the backup format is known. Therefore, we delay preparing the
default WAL directory until the point of parsing. This delay is
harmless, as the WAL directory is not used elsewhere.
---
src/bin/pg_verifybackup/pg_verifybackup.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index f9f2d457f2f..ab01c4d003a 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -285,10 +285,6 @@ main(int argc, char **argv)
manifest_path = psprintf("%s/backup_manifest",
context.backup_directory);
- /* By default, look for the WAL in the backup directory, too. */
- if (wal_directory == NULL)
- wal_directory = psprintf("%s/pg_wal", context.backup_directory);
-
/*
* Try to read the manifest. We treat any errors encountered while parsing
* the manifest as fatal; there doesn't seem to be much point in trying to
@@ -368,6 +364,10 @@ main(int argc, char **argv)
if (context.format == 'p' && !context.skip_checksums)
verify_backup_checksums(&context);
+ /* By default, look for the WAL in the backup directory, too. */
+ if (wal_directory == NULL)
+ wal_directory = psprintf("%s/pg_wal", context.backup_directory);
+
/*
* Try to parse the required ranges of WAL records, unless we were told
* not to do so.
--
2.47.1
[application/x-patch] v12-0008-pg_verifybackup-Rename-the-wal-directory-switch-.patch (5.9K, 9-v12-0008-pg_verifybackup-Rename-the-wal-directory-switch-.patch)
From c2b0309deeb401d1cfd0ed6140d0a1d37bdd7e27 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Tue, 25 Nov 2025 17:32:14 +0530
Subject: [PATCH v12 8/9] pg_verifybackup: Rename the wal-directory switch to
wal-path
With the previous patches, pg_waldump can now decode WAL directly from
tar files. This means you can specify a tar archive path instead of a
traditional WAL directory.
To keep things consistent and more versatile, we should also
generalize the input switch for pg_verifybackup so that it accepts
either a directory or a tar file path that contains WAL files. This
change also aligns it with the existing manifest-path switch naming.
== NOTE ==
The corresponding PO files require updating due to this change.
---
doc/src/sgml/ref/pg_verifybackup.sgml | 2 +-
src/bin/pg_verifybackup/pg_verifybackup.c | 22 +++++++++++-----------
src/bin/pg_verifybackup/t/007_wal.pl | 4 ++--
3 files changed, 14 insertions(+), 14 deletions(-)
diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml
index 61c12975e4a..e9b8bfd51b1 100644
--- a/doc/src/sgml/ref/pg_verifybackup.sgml
+++ b/doc/src/sgml/ref/pg_verifybackup.sgml
@@ -261,7 +261,7 @@ PostgreSQL documentation
<varlistentry>
<term><option>-w <replaceable class="parameter">path</replaceable></option></term>
- <term><option>--wal-directory=<replaceable class="parameter">path</replaceable></option></term>
+ <term><option>--wal-path=<replaceable class="parameter">path</replaceable></option></term>
<listitem>
<para>
Try to parse WAL files stored in the specified directory, rather than
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index ab01c4d003a..3103d36f1b9 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -93,7 +93,7 @@ static void verify_file_checksum(verifier_context *context,
uint8 *buffer);
static void parse_required_wal(verifier_context *context,
char *pg_waldump_path,
- char *wal_directory);
+ char *wal_path);
static astreamer *create_archive_verifier(verifier_context *context,
char *archive_name,
Oid tblspc_oid,
@@ -126,7 +126,7 @@ main(int argc, char **argv)
{"progress", no_argument, NULL, 'P'},
{"quiet", no_argument, NULL, 'q'},
{"skip-checksums", no_argument, NULL, 's'},
- {"wal-directory", required_argument, NULL, 'w'},
+ {"wal-path", required_argument, NULL, 'w'},
{NULL, 0, NULL, 0}
};
@@ -135,7 +135,7 @@ main(int argc, char **argv)
char *manifest_path = NULL;
bool no_parse_wal = false;
bool quiet = false;
- char *wal_directory = NULL;
+ char *wal_path = NULL;
char *pg_waldump_path = NULL;
DIR *dir;
@@ -221,8 +221,8 @@ main(int argc, char **argv)
context.skip_checksums = true;
break;
case 'w':
- wal_directory = pstrdup(optarg);
- canonicalize_path(wal_directory);
+ wal_path = pstrdup(optarg);
+ canonicalize_path(wal_path);
break;
default:
/* getopt_long already emitted a complaint */
@@ -365,15 +365,15 @@ main(int argc, char **argv)
verify_backup_checksums(&context);
/* By default, look for the WAL in the backup directory, too. */
- if (wal_directory == NULL)
- wal_directory = psprintf("%s/pg_wal", context.backup_directory);
+ if (wal_path == NULL)
+ wal_path = psprintf("%s/pg_wal", context.backup_directory);
/*
* Try to parse the required ranges of WAL records, unless we were told
* not to do so.
*/
if (!no_parse_wal)
- parse_required_wal(&context, pg_waldump_path, wal_directory);
+ parse_required_wal(&context, pg_waldump_path, wal_path);
/*
* If everything looks OK, tell the user this, unless we were asked to
@@ -1198,7 +1198,7 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
*/
static void
parse_required_wal(verifier_context *context, char *pg_waldump_path,
- char *wal_directory)
+ char *wal_path)
{
manifest_data *manifest = context->manifest;
manifest_wal_range *this_wal_range = manifest->first_wal_range;
@@ -1208,7 +1208,7 @@ parse_required_wal(verifier_context *context, char *pg_waldump_path,
char *pg_waldump_cmd;
pg_waldump_cmd = psprintf("\"%s\" --quiet --path=\"%s\" --timeline=%u --start=%X/%08X --end=%X/%08X\n",
- pg_waldump_path, wal_directory, this_wal_range->tli,
+ pg_waldump_path, wal_path, this_wal_range->tli,
LSN_FORMAT_ARGS(this_wal_range->start_lsn),
LSN_FORMAT_ARGS(this_wal_range->end_lsn));
fflush(NULL);
@@ -1376,7 +1376,7 @@ usage(void)
printf(_(" -P, --progress show progress information\n"));
printf(_(" -q, --quiet do not print any output, except for errors\n"));
printf(_(" -s, --skip-checksums skip checksum verification\n"));
- printf(_(" -w, --wal-directory=PATH use specified path for WAL files\n"));
+ printf(_(" -w, --wal-path=PATH use specified path for WAL files\n"));
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -?, --help show this help, then exit\n"));
printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
diff --git a/src/bin/pg_verifybackup/t/007_wal.pl b/src/bin/pg_verifybackup/t/007_wal.pl
index 79087a1f6be..8ad2234453d 100644
--- a/src/bin/pg_verifybackup/t/007_wal.pl
+++ b/src/bin/pg_verifybackup/t/007_wal.pl
@@ -42,10 +42,10 @@ command_ok([ 'pg_verifybackup', '--no-parse-wal', $backup_path ],
command_ok(
[
'pg_verifybackup',
- '--wal-directory' => $relocated_pg_wal,
+ '--wal-path' => $relocated_pg_wal,
$backup_path
],
- '--wal-directory can be used to specify WAL directory');
+ '--wal-path can be used to specify WAL directory');
# Move directory back to original location.
rename($relocated_pg_wal, $original_pg_wal) || die "rename pg_wal back: $!";
--
2.47.1
[application/x-patch] v12-0009-pg_verifybackup-enabled-WAL-parsing-for-tar-form.patch (9.9K, 10-v12-0009-pg_verifybackup-enabled-WAL-parsing-for-tar-form.patch)
From 8abeadee490f1d7d751d71c3a1490986cbe26f36 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Tue, 25 Nov 2025 17:34:26 +0530
Subject: [PATCH v12 9/9] pg_verifybackup: enabled WAL parsing for tar-format
backup
Now that pg_waldump supports decoding from tar archives, we should
leverage this functionality to remove the previous restriction on WAL
parsing for tar-format backups.
---
doc/src/sgml/ref/pg_verifybackup.sgml | 5 +-
src/bin/pg_verifybackup/pg_verifybackup.c | 66 +++++++++++++------
src/bin/pg_verifybackup/t/002_algorithm.pl | 4 --
src/bin/pg_verifybackup/t/003_corruption.pl | 4 +-
src/bin/pg_verifybackup/t/008_untar.pl | 5 +-
src/bin/pg_verifybackup/t/010_client_untar.pl | 5 +-
6 files changed, 50 insertions(+), 39 deletions(-)
diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml
index e9b8bfd51b1..16b50b5a4df 100644
--- a/doc/src/sgml/ref/pg_verifybackup.sgml
+++ b/doc/src/sgml/ref/pg_verifybackup.sgml
@@ -36,10 +36,7 @@ PostgreSQL documentation
<literal>backup_manifest</literal> generated by the server at the time
of the backup. The backup may be stored either in the "plain" or the "tar"
format; this includes tar-format backups compressed with any algorithm
- supported by <application>pg_basebackup</application>. However, at present,
- <literal>WAL</literal> verification is supported only for plain-format
- backups. Therefore, if the backup is stored in tar-format, the
- <literal>-n, --no-parse-wal</literal> option should be used.
+ supported by <application>pg_basebackup</application>.
</para>
<para>
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 3103d36f1b9..cc492728ae8 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -74,7 +74,9 @@ pg_noreturn static void report_manifest_error(JsonManifestParseContext *context,
const char *fmt,...)
pg_attribute_printf(2, 3);
-static void verify_tar_backup(verifier_context *context, DIR *dir);
+static void verify_tar_backup(verifier_context *context, DIR *dir,
+ char **base_archive_path,
+ char **wal_archive_path);
static void verify_plain_backup_directory(verifier_context *context,
char *relpath, char *fullpath,
DIR *dir);
@@ -83,7 +85,9 @@ static void verify_plain_backup_file(verifier_context *context, char *relpath,
static void verify_control_file(const char *controlpath,
uint64 manifest_system_identifier);
static void precheck_tar_backup_file(verifier_context *context, char *relpath,
- char *fullpath, SimplePtrList *tarfiles);
+ char *fullpath, SimplePtrList *tarfiles,
+ char **base_archive_path,
+ char **wal_archive_path);
static void verify_tar_file(verifier_context *context, char *relpath,
char *fullpath, astreamer *streamer);
static void report_extra_backup_files(verifier_context *context);
@@ -136,6 +140,8 @@ main(int argc, char **argv)
bool no_parse_wal = false;
bool quiet = false;
char *wal_path = NULL;
+ char *base_archive_path = NULL;
+ char *wal_archive_path = NULL;
char *pg_waldump_path = NULL;
DIR *dir;
@@ -327,17 +333,6 @@ main(int argc, char **argv)
pfree(path);
}
- /*
- * XXX: In the future, we should consider enhancing pg_waldump to read WAL
- * files from an archive.
- */
- if (!no_parse_wal && context.format == 't')
- {
- pg_log_error("pg_waldump cannot read tar files");
- pg_log_error_hint("You must use -n/--no-parse-wal when verifying a tar-format backup.");
- exit(1);
- }
-
/*
* Perform the appropriate type of verification appropriate based on the
* backup format. This will close 'dir'.
@@ -346,7 +341,7 @@ main(int argc, char **argv)
verify_plain_backup_directory(&context, NULL, context.backup_directory,
dir);
else
- verify_tar_backup(&context, dir);
+ verify_tar_backup(&context, dir, &base_archive_path, &wal_archive_path);
/*
* The "matched" flag should now be set on every entry in the hash table.
@@ -364,9 +359,28 @@ main(int argc, char **argv)
if (context.format == 'p' && !context.skip_checksums)
verify_backup_checksums(&context);
- /* By default, look for the WAL in the backup directory, too. */
+ /*
+ * By default, WAL files are expected to be found in the backup directory
+ * for plain-format backups. In the case of tar-format backups, if a
+ * separate WAL archive is not found, the WAL files are most likely
+ * included within the main data directory archive.
+ */
if (wal_path == NULL)
- wal_path = psprintf("%s/pg_wal", context.backup_directory);
+ {
+ if (context.format == 'p')
+ wal_path = psprintf("%s/pg_wal", context.backup_directory);
+ else if (wal_archive_path)
+ wal_path = wal_archive_path;
+ else if (base_archive_path)
+ wal_path = base_archive_path;
+ else
+ {
+ pg_log_error("WAL archive not found");
+ pg_log_error_hint("Specify the correct path using the -w/--wal-path option, "
+ "or use -n/--no-parse-wal to skip WAL parsing for a tar-format backup.");
+ exit(1);
+ }
+ }
/*
* Try to parse the required ranges of WAL records, unless we were told
@@ -787,7 +801,8 @@ verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
* close when we're done with it.
*/
static void
-verify_tar_backup(verifier_context *context, DIR *dir)
+verify_tar_backup(verifier_context *context, DIR *dir, char **base_archive_path,
+ char **wal_archive_path)
{
struct dirent *dirent;
SimplePtrList tarfiles = {NULL, NULL};
@@ -816,7 +831,8 @@ verify_tar_backup(verifier_context *context, DIR *dir)
char *fullpath;
fullpath = psprintf("%s/%s", context->backup_directory, filename);
- precheck_tar_backup_file(context, filename, fullpath, &tarfiles);
+ precheck_tar_backup_file(context, filename, fullpath, &tarfiles,
+ base_archive_path, wal_archive_path);
pfree(fullpath);
}
}
@@ -875,11 +891,13 @@ verify_tar_backup(verifier_context *context, DIR *dir)
*
* The arguments to this function are mostly the same as the
* verify_plain_backup_file. The additional argument outputs a list of valid
- * tar files.
+ * tar files, along with the full paths to the main archive and the WAL
+ * directory archive.
*/
static void
precheck_tar_backup_file(verifier_context *context, char *relpath,
- char *fullpath, SimplePtrList *tarfiles)
+ char *fullpath, SimplePtrList *tarfiles,
+ char **base_archive_path, char **wal_archive_path)
{
struct stat sb;
Oid tblspc_oid = InvalidOid;
@@ -918,9 +936,17 @@ precheck_tar_backup_file(verifier_context *context, char *relpath,
* extension such as .gz, .lz4, or .zst.
*/
if (strncmp("base", relpath, 4) == 0)
+ {
suffix = relpath + 4;
+
+ *base_archive_path = pstrdup(fullpath);
+ }
else if (strncmp("pg_wal", relpath, 6) == 0)
+ {
suffix = relpath + 6;
+
+ *wal_archive_path = pstrdup(fullpath);
+ }
else
{
/* Expected a <tablespaceoid>.tar file here. */
diff --git a/src/bin/pg_verifybackup/t/002_algorithm.pl b/src/bin/pg_verifybackup/t/002_algorithm.pl
index 0556191ec9d..edc515d5904 100644
--- a/src/bin/pg_verifybackup/t/002_algorithm.pl
+++ b/src/bin/pg_verifybackup/t/002_algorithm.pl
@@ -30,10 +30,6 @@ sub test_checksums
{
# Add switch to get a tar-format backup
push @backup, ('--format' => 'tar');
-
- # Add switch to skip WAL verification, which is not yet supported for
- # tar-format backups
- push @verify, ('--no-parse-wal');
}
# A backup with a bogus algorithm should fail.
diff --git a/src/bin/pg_verifybackup/t/003_corruption.pl b/src/bin/pg_verifybackup/t/003_corruption.pl
index b1d65b8aa0f..882d75d9dc2 100644
--- a/src/bin/pg_verifybackup/t/003_corruption.pl
+++ b/src/bin/pg_verifybackup/t/003_corruption.pl
@@ -193,10 +193,8 @@ for my $scenario (@scenario)
command_ok([ $tar, '-cf' => "$tar_backup_path/base.tar", '.' ]);
chdir($cwd) || die "chdir: $!";
- # Now check that the backup no longer verifies. We must use -n
- # here, because pg_waldump can't yet read WAL from a tarfile.
command_fails_like(
- [ 'pg_verifybackup', '--no-parse-wal', $tar_backup_path ],
+ [ 'pg_verifybackup', $tar_backup_path ],
$scenario->{'fails_like'},
"corrupt backup fails verification: $name");
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
index ae67ae85a31..161c08c190d 100644
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -47,7 +47,6 @@ my $tsoid = $primary->safe_psql(
SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1'));
my $backup_path = $primary->backup_dir . '/server-backup';
-my $extract_path = $primary->backup_dir . '/extracted-backup';
my @test_configuration = (
{
@@ -123,14 +122,12 @@ for my $tc (@test_configuration)
# Verify tar backup.
$primary->command_ok(
[
- 'pg_verifybackup', '--no-parse-wal',
- '--exit-on-error', $backup_path,
+ 'pg_verifybackup', '--exit-on-error', $backup_path,
],
"verify backup, compression $method");
# Cleanup.
rmtree($backup_path);
- rmtree($extract_path);
}
}
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
index 1ac7b5db75a..9670fbe4fda 100644
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -32,7 +32,6 @@ print $jf $junk_data;
close $jf;
my $backup_path = $primary->backup_dir . '/client-backup';
-my $extract_path = $primary->backup_dir . '/extracted-backup';
my @test_configuration = (
{
@@ -137,13 +136,11 @@ for my $tc (@test_configuration)
# Verify tar backup.
$primary->command_ok(
[
- 'pg_verifybackup', '--no-parse-wal',
- '--exit-on-error', $backup_path,
+ 'pg_verifybackup', '--exit-on-error', $backup_path,
],
"verify backup, compression $method");
# Cleanup.
- rmtree($extract_path);
rmtree($backup_path);
}
}
--
2.47.1
* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-02-18 06:58 Amul Sul <[email protected]>
parent: Amul Sul <[email protected]>
From: Amul Sul @ 2026-02-18 06:58 UTC
To: Robert Haas <[email protected]>; +Cc: Chao Li <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>
On Tue, Feb 10, 2026 at 3:06 PM Amul Sul <[email protected]> wrote:
>
> On Wed, Feb 4, 2026 at 6:39 PM Amul Sul <[email protected]> wrote:
> >
> > On Wed, Jan 28, 2026 at 2:41 AM Robert Haas <[email protected]> wrote:
> > >
> > > On Tue, Jan 27, 2026 at 7:07 AM Amul Sul <[email protected]> wrote:
> > > > In the attached version, I am using the WAL segment name as the hash
> > > > key, which is much more straightforward. I have rewritten
> > > > read_archive_wal_page(), and it looks much cleaner than before. The
> > > > logic to discard irrelevant WAL files is still within
> > > > get_archive_wal_entry. I added an explanation for setting cur_wal to
> > > > NULL, which is now handled in the separate function I mentioned
> > > > previously.
> > > >
> > > > Kindly have a look at the attached version; let me know if you are
> > > > still not happy with the current approach for filtering/discarding
> > > > irrelevant WAL segments. It isn't much different from the previous
> > > > version, but I have tried to keep it in a separate routine for better
> > > > code readability, with comments to make it easier to understand. I
> > > > also added a comment for ArchivedWALFile.
> > >
> > > I feel like the division of labor between get_archive_wal_entry() and
> > > read_archive_wal_page() is odd. I noticed this in the last version,
> > > too, and it still seems to be the case. get_archive_wal_entry() first
> > > calls ArchivedWAL_lookup(). If that finds an entry, it just returns.
> > > If it doesn't, it loops until an entry for the requested file shows up
> > > and then returns it. Then control returns to read_archive_wal_page()
> > > which loops some more until we have all the data we need for the
> > > requested file. But it seems odd to me to have two separate loops
> > > here. I think that the first loop is going to call read_archive_file()
> > > until we find the beginning of the file that we care about and then
> > > the second one is going to call read_archive_file() some more until we
> > > have read enough of it to satisfy the request. It feels odd to me to
> > > do it that way, as if we told somebody to first wait until 9 o'clock
> > > and then wait another 30 minutes, instead of just telling them to wait
> > > until 9:30. I realize it's not quite the same thing, because apart
> > > from calling read_archive_file(), the two loops do different things,
> > > but I still think it looks odd.
> > >
> > > + /*
> > > + * Ignore if the timeline is different or the current segment is not
> > > + * the desired one.
> > > + */
> > > + XLogFromFileName(entry->fname, &curSegTimeline, &curSegNo, WalSegSz);
> > > + if (privateInfo->timeline != curSegTimeline ||
> > > + privateInfo->startSegNo > curSegNo ||
> > > + privateInfo->endSegNo < curSegNo ||
> > > + segno > curSegNo)
> > > + {
> > > + free_archive_wal_entry(entry->fname, privateInfo);
> > > + continue;
> > > + }
> > >
> > > The comment doesn't match the code. If it did, the test would be
> > > (privateInfo->timeline != curSegTimeline || segno != curSegNo). But
> > > instead the segno test is > rather than !=, and the checks against
> > > startSegNo and endSegNo aren't explained at all. I think I understand
> > > why the segno test uses > rather than !=, but it's the point of the
> > > comment to explain things like that, rather than leaving the reader to
> > > guess. And I don't know why we also need to test startSegNo and
> > > endSegNo.
> > >
> > > I also wonder what the point is of doing XLogFromFileName() on the
> > > fname provided by the caller and then again on entry->fname. Couldn't
> > > you just compare the strings?
> > >
> > > Again, the division of labor is really odd here. It's the job of
> > > astreamer_waldump_content() to skip things that aren't WAL files at
> > > all, but it's the job of get_archive_wal_entry() to skip things that
> > > are WAL files but not the one we want. I disagree with putting those
> > > checks in completely separate parts of the code.
> > >
> >
> > Keeping the timeline and segment start-end range checks inside the
> > archive streamer creates a circular dependency that cannot be resolved
> > without a 'dirty hack'. We must read the first page of the first available
> > WAL file to determine wal_segment_size before we can calculate the target
> > segment range. Moving the checks inside the streamer would make it
> > impossible to process that initial file, because the filtering
> > parameters would still be unknown, so the check would somehow have to be
> > skipped for the first read. And what if we later realized that the first
> > WAL file, which was let through by skipping that check, is
> > irrelevant and doesn't fall within the start-end segment range?
> >
>
> Please have a look at the attached version, specifically patch 0005.
> I have moved the WAL file filtration check from get_archive_wal_entry()
> into astreamer_waldump_content(). This check will be skipped during
> the initial read in init_archive_reader(), which instead performs it
> explicitly once it determines the WAL segment size and the start/end
> segments.
>
> To access the WAL segment size inside astreamer_waldump_content(), I
> have moved the WAL segment size variable into the XLogDumpPrivate
> structure in the separate 0004 patch.
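The filtering rule described above (skip WAL files outside the target range, except during the initial read that discovers the segment size) can be sketched as a standalone predicate. This is a hedged illustration, not code from the patch: WalFilter and wal_segment_is_relevant() are hypothetical names, the types are local stand-ins for the PostgreSQL ones, and start_segno == 0 is assumed to mean "accept any WAL file", following the READ_ANY_WAL() convention used in 0006.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TimeLineID;
typedef uint64_t XLogSegNo;

/*
 * Hypothetical filter state; the patch keeps equivalent fields in
 * XLogDumpPrivate.
 */
typedef struct WalFilter
{
	TimeLineID	timeline;		/* timeline we want to decode */
	XLogSegNo	start_segno;	/* 0 means "accept any WAL file" */
	XLogSegNo	end_segno;		/* last segment needed (inclusive) */
} WalFilter;

/*
 * Decide whether a WAL segment found in the archive should be buffered.
 *
 * While start_segno is still 0 (the segment size, and hence the target
 * range, is not known yet), any WAL file is accepted so that its long page
 * header can be read.  Afterwards, only segments on the requested timeline
 * that fall inside [start_segno, end_segno] are kept.
 */
static bool
wal_segment_is_relevant(const WalFilter *filter, TimeLineID tli,
						XLogSegNo segno)
{
	if (filter->start_segno == 0)
		return true;
	if (tli != filter->timeline)
		return false;
	return segno >= filter->start_segno && segno <= filter->end_segno;
}
```

Keeping the whole decision in one function, with each branch explained next to it, is one way to keep the comment and the condition from drifting apart. The quoted v12 code also discarded segments older than the one currently being requested; that check is left out of this sketch.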
Attached is an updated version including the aforesaid changes. It
includes a new refactoring patch (0001) that moves the logic for
identifying tar archives and their compression types from
pg_basebackup and pg_verifybackup into a separate, reusable function,
per a suggestion from Euler [1]. Additionally, I have added a test
for the contrecord decoding to the main patch (now 0006).
[1] http://postgr.es/m/[email protected]
Regards,
Amul
Attachments:
[application/x-patch] v13-0001-Refactor-Move-tar-archive-parsing-into-a-common-.patch (6.7K, 2-v13-0001-Refactor-Move-tar-archive-parsing-into-a-common-.patch)
From 6e4331bcbb98160365fe02502ae0a5cd2aea1726 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Tue, 17 Feb 2026 14:51:11 +0530
Subject: [PATCH v13 01/10] Refactor: Move tar archive parsing into a common
location.
pg_basebackup and pg_verifybackup both require logic to identify tar
files and determine their compression types. Similar functionality
will be needed for pg_waldump when it gets the capability to decode
WAL files from tar archives. Moving this logic to a common location
allows for reuse and prevents code duplication.
---
src/bin/pg_basebackup/pg_basebackup.c | 36 +++++++----------------
src/bin/pg_verifybackup/pg_verifybackup.c | 12 +-------
src/common/compression.c | 30 +++++++++++++++++++
src/include/common/compression.h | 2 ++
4 files changed, 44 insertions(+), 36 deletions(-)
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 1e3a8203f77..8911b8b921d 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1070,12 +1070,9 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
astreamer *manifest_inject_streamer = NULL;
bool inject_manifest;
bool is_tar,
- is_tar_gz,
- is_tar_lz4,
- is_tar_zstd,
is_compressed_tar;
+ pg_compress_algorithm compressed_tar_algorithm;
bool must_parse_archive;
- int archive_name_len = strlen(archive_name);
/*
* Normally, we emit the backup manifest as a separate file, but when
@@ -1084,24 +1081,13 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
*/
inject_manifest = (format == 't' && strcmp(basedir, "-") == 0 && manifest);
- /* Is this a tar archive? */
- is_tar = (archive_name_len > 4 &&
- strcmp(archive_name + archive_name_len - 4, ".tar") == 0);
-
- /* Is this a .tar.gz archive? */
- is_tar_gz = (archive_name_len > 7 &&
- strcmp(archive_name + archive_name_len - 7, ".tar.gz") == 0);
-
- /* Is this a .tar.lz4 archive? */
- is_tar_lz4 = (archive_name_len > 8 &&
- strcmp(archive_name + archive_name_len - 8, ".tar.lz4") == 0);
-
- /* Is this a .tar.zst archive? */
- is_tar_zstd = (archive_name_len > 8 &&
- strcmp(archive_name + archive_name_len - 8, ".tar.zst") == 0);
+ /* Check whether it is a tar archive and determine its compression type */
+ is_tar = parse_tar_compress_algorithm(archive_name,
+ &compressed_tar_algorithm);
/* Is this any kind of compressed tar? */
- is_compressed_tar = is_tar_gz || is_tar_lz4 || is_tar_zstd;
+ is_compressed_tar = (is_tar &&
+ compressed_tar_algorithm != PG_COMPRESSION_NONE);
/*
* Injecting the manifest into a compressed tar file would be possible if
@@ -1128,7 +1114,7 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
(spclocation == NULL && writerecoveryconf));
/* At present, we only know how to parse tar archives. */
- if (must_parse_archive && !is_tar && !is_compressed_tar)
+ if (must_parse_archive && !is_tar)
{
pg_log_error("cannot parse archive \"%s\"", archive_name);
pg_log_error_detail("Only tar archives can be parsed.");
@@ -1263,13 +1249,13 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
* If the user has requested a server compressed archive along with
* archive extraction at client then we need to decompress it.
*/
- if (format == 'p')
+ if (format == 'p' && is_compressed_tar)
{
- if (is_tar_gz)
+ if (compressed_tar_algorithm == PG_COMPRESSION_GZIP)
streamer = astreamer_gzip_decompressor_new(streamer);
- else if (is_tar_lz4)
+ else if (compressed_tar_algorithm == PG_COMPRESSION_LZ4)
streamer = astreamer_lz4_decompressor_new(streamer);
- else if (is_tar_zstd)
+ else if (compressed_tar_algorithm == PG_COMPRESSION_ZSTD)
streamer = astreamer_zstd_decompressor_new(streamer);
}
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index f9f2d457f2f..5ddc4c33feb 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -941,17 +941,7 @@ precheck_tar_backup_file(verifier_context *context, char *relpath,
}
/* Now, check the compression type of the tar */
- if (strcmp(suffix, ".tar") == 0)
- compress_algorithm = PG_COMPRESSION_NONE;
- else if (strcmp(suffix, ".tgz") == 0)
- compress_algorithm = PG_COMPRESSION_GZIP;
- else if (strcmp(suffix, ".tar.gz") == 0)
- compress_algorithm = PG_COMPRESSION_GZIP;
- else if (strcmp(suffix, ".tar.lz4") == 0)
- compress_algorithm = PG_COMPRESSION_LZ4;
- else if (strcmp(suffix, ".tar.zst") == 0)
- compress_algorithm = PG_COMPRESSION_ZSTD;
- else
+ if (!parse_tar_compress_algorithm(suffix, &compress_algorithm))
{
report_backup_error(context,
"file \"%s\" is not expected in a tar format backup",
diff --git a/src/common/compression.c b/src/common/compression.c
index 92cd4ec7a0d..f117e21237f 100644
--- a/src/common/compression.c
+++ b/src/common/compression.c
@@ -41,6 +41,36 @@ static int expect_integer_value(char *keyword, char *value,
static bool expect_boolean_value(char *keyword, char *value,
pg_compress_specification *result);
+/*
+ * Look up a compression algorithm by tar archive file extension. Returns true
+ * and sets *algorithm if the extension is recognized. Otherwise returns false.
+ */
+bool
+parse_tar_compress_algorithm(char *fname, pg_compress_algorithm *algorithm)
+{
+ int fname_len = strlen(fname);
+
+ if (fname_len >= 4 &&
+ strcmp(fname + fname_len - 4, ".tar") == 0)
+ *algorithm = PG_COMPRESSION_NONE;
+ else if (fname_len >= 4 &&
+ strcmp(fname + fname_len - 4, ".tgz") == 0)
+ *algorithm = PG_COMPRESSION_GZIP;
+ else if (fname_len >= 7 &&
+ strcmp(fname + fname_len - 7, ".tar.gz") == 0)
+ *algorithm = PG_COMPRESSION_GZIP;
+ else if (fname_len >= 8 &&
+ strcmp(fname + fname_len - 8, ".tar.lz4") == 0)
+ *algorithm = PG_COMPRESSION_LZ4;
+ else if (fname_len >= 8 &&
+ strcmp(fname + fname_len - 8, ".tar.zst") == 0)
+ *algorithm = PG_COMPRESSION_ZSTD;
+ else
+ return false;
+
+ return true;
+}
+
/*
* Look up a compression algorithm by name. Returns true and sets *algorithm
* if the name is recognized. Otherwise returns false.
diff --git a/src/include/common/compression.h b/src/include/common/compression.h
index 6c745b90066..50f21656b88 100644
--- a/src/include/common/compression.h
+++ b/src/include/common/compression.h
@@ -41,6 +41,8 @@ typedef struct pg_compress_specification
extern void parse_compress_options(const char *option, char **algorithm,
char **detail);
+extern bool parse_tar_compress_algorithm(char *fname,
+ pg_compress_algorithm *algorithm);
extern bool parse_compress_algorithm(char *name, pg_compress_algorithm *algorithm);
extern const char *get_compress_algorithm_name(pg_compress_algorithm algorithm);
--
2.47.1
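The suffix checks in parse_tar_compress_algorithm() could also be expressed as a lookup table, which keeps each extension and its algorithm on a single line. The sketch below is a hedged, standalone alternative, not what the patch does: compress_algorithm and tar_suffix_to_algorithm() are hypothetical stand-ins for pg_compress_algorithm and the real function, and the recognized suffixes are copied from the patch.

```c
#include <stdbool.h>
#include <string.h>

/* Local stand-in for pg_compress_algorithm from common/compression.h. */
typedef enum
{
	COMPRESSION_NONE,
	COMPRESSION_GZIP,
	COMPRESSION_LZ4,
	COMPRESSION_ZSTD
} compress_algorithm;

/* Recognized tar suffixes, mirroring the list in the patch. */
static const struct
{
	const char *suffix;
	compress_algorithm algorithm;
}			tar_suffixes[] = {
	{".tar", COMPRESSION_NONE},
	{".tgz", COMPRESSION_GZIP},
	{".tar.gz", COMPRESSION_GZIP},
	{".tar.lz4", COMPRESSION_LZ4},
	{".tar.zst", COMPRESSION_ZSTD},
};

/*
 * Table-driven variant: returns true and sets *algorithm if fname ends in a
 * recognized tar suffix, otherwise returns false.
 */
static bool
tar_suffix_to_algorithm(const char *fname, compress_algorithm *algorithm)
{
	size_t		fname_len = strlen(fname);

	for (size_t i = 0; i < sizeof(tar_suffixes) / sizeof(tar_suffixes[0]); i++)
	{
		size_t		suffix_len = strlen(tar_suffixes[i].suffix);

		if (fname_len >= suffix_len &&
			strcmp(fname + fname_len - suffix_len, tar_suffixes[i].suffix) == 0)
		{
			*algorithm = tar_suffixes[i].algorithm;
			return true;
		}
	}
	return false;
}
```

With either shape, pg_basebackup's is_compressed_tar reduces to a successful lookup whose result is not the "none" algorithm, as in the patch.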
[application/x-patch] v13-0002-Refactor-pg_waldump-Move-some-declarations-to-ne.patch (2.2K, 3-v13-0002-Refactor-pg_waldump-Move-some-declarations-to-ne.patch)
From 17fe3b9a8b649f0146fe237f8ed2960246f26907 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Thu, 22 Jan 2026 10:28:32 +0530
Subject: [PATCH v13 02/10] Refactor: pg_waldump: Move some declarations to new
pg_waldump.h
This change prepares for a second source file in this directory to
support reading WAL from tar files. Common structures, declarations,
and functions are being exported through this include file so
they can be used in both files.
---
src/bin/pg_waldump/pg_waldump.c | 9 +--------
src/bin/pg_waldump/pg_waldump.h | 25 +++++++++++++++++++++++++
2 files changed, 26 insertions(+), 8 deletions(-)
create mode 100644 src/bin/pg_waldump/pg_waldump.h
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index f3446385d6a..4b7411a6498 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -29,6 +29,7 @@
#include "common/logging.h"
#include "common/relpath.h"
#include "getopt_long.h"
+#include "pg_waldump.h"
#include "rmgrdesc.h"
#include "storage/bufpage.h"
@@ -43,14 +44,6 @@ static volatile sig_atomic_t time_to_stop = false;
static const RelFileLocator emptyRelFileLocator = {0, 0, 0};
-typedef struct XLogDumpPrivate
-{
- TimeLineID timeline;
- XLogRecPtr startptr;
- XLogRecPtr endptr;
- bool endptr_reached;
-} XLogDumpPrivate;
-
typedef struct XLogDumpConfig
{
/* display options */
diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h
new file mode 100644
index 00000000000..b88543856e5
--- /dev/null
+++ b/src/bin/pg_waldump/pg_waldump.h
@@ -0,0 +1,25 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_waldump.h - decode and display WAL
+ *
+ * Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_waldump/pg_waldump.h
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_WALDUMP_H
+#define PG_WALDUMP_H
+
+#include "access/xlogdefs.h"
+
+/* Contains the necessary information to drive WAL decoding */
+typedef struct XLogDumpPrivate
+{
+ TimeLineID timeline;
+ XLogRecPtr startptr;
+ XLogRecPtr endptr;
+ bool endptr_reached;
+} XLogDumpPrivate;
+
+#endif /* end of PG_WALDUMP_H */
--
2.47.1
[application/x-patch] v13-0003-Refactor-pg_waldump-Separate-logic-used-to-calcu.patch (2.4K, 4-v13-0003-Refactor-pg_waldump-Separate-logic-used-to-calcu.patch)
From 0184347f5582b5973b80056d03eb6cacc85725c2 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Thu, 22 Jan 2026 10:38:16 +0530
Subject: [PATCH v13 03/10] Refactor: pg_waldump: Separate logic used to
calculate the required read size.
This refactoring prepares the codebase for an upcoming patch that will
support reading WAL from tar files. The logic for calculating the
required read size is moved into a separate function so that it can be
shared by the code that reads plain WAL files and the code that reads
WAL files located inside a tar archive.
---
src/bin/pg_waldump/pg_waldump.c | 43 +++++++++++++++++++++++----------
1 file changed, 30 insertions(+), 13 deletions(-)
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 4b7411a6498..958a71a01cf 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -326,6 +326,32 @@ identify_target_directory(char *directory, char *fname, int *WalSegSz)
return NULL; /* not reached */
}
+/*
+ * Returns the size in bytes of the data to be read. Returns -1 if the end
+ * point has already been reached.
+ */
+static inline int
+required_read_len(XLogDumpPrivate *private, XLogRecPtr targetPagePtr,
+ int reqLen)
+{
+ int count = XLOG_BLCKSZ;
+
+ if (XLogRecPtrIsValid(private->endptr))
+ {
+ if (targetPagePtr + XLOG_BLCKSZ <= private->endptr)
+ count = XLOG_BLCKSZ;
+ else if (targetPagePtr + reqLen <= private->endptr)
+ count = private->endptr - targetPagePtr;
+ else
+ {
+ private->endptr_reached = true;
+ return -1;
+ }
+ }
+
+ return count;
+}
+
/* pg_waldump's XLogReaderRoutine->segment_open callback */
static void
WALDumpOpenSegment(XLogReaderState *state, XLogSegNo nextSegNo,
@@ -383,21 +409,12 @@ WALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
XLogRecPtr targetPtr, char *readBuff)
{
XLogDumpPrivate *private = state->private_data;
- int count = XLOG_BLCKSZ;
+ int count = required_read_len(private, targetPagePtr, reqLen);
WALReadError errinfo;
- if (XLogRecPtrIsValid(private->endptr))
- {
- if (targetPagePtr + XLOG_BLCKSZ <= private->endptr)
- count = XLOG_BLCKSZ;
- else if (targetPagePtr + reqLen <= private->endptr)
- count = private->endptr - targetPagePtr;
- else
- {
- private->endptr_reached = true;
- return -1;
- }
- }
+ /* Bail out if the end point has already been reached */
+ if (count < 0)
+ return -1;
if (!WALRead(state, readBuff, targetPagePtr, count, private->timeline,
&errinfo))
--
2.47.1
[application/x-patch] v13-0004-Refactor-pg_waldump-Restructure-TAP-tests.patch (6.6K, 5-v13-0004-Refactor-pg_waldump-Restructure-TAP-tests.patch)
From b5298076adc315659a6cde3ae8f6883ee025f3d6 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Wed, 18 Feb 2026 11:07:57 +0530
Subject: [PATCH v13 04/10] Refactor: pg_waldump: Restructure TAP tests.
Restructure the tests that do not take a WAL file argument so that they
run within a loop, allowing them to be re-executed when decoding WAL
from tar archives.
== NOTE ==
This is not intended to be committed separately. It can be merged
with the next patch, which is the main patch implementing this
feature.
---
src/bin/pg_waldump/t/001_basic.pl | 140 +++++++++++++++++-------------
1 file changed, 79 insertions(+), 61 deletions(-)
diff --git a/src/bin/pg_waldump/t/001_basic.pl b/src/bin/pg_waldump/t/001_basic.pl
index 5db5d20136f..f12ba52cbfc 100644
--- a/src/bin/pg_waldump/t/001_basic.pl
+++ b/src/bin/pg_waldump/t/001_basic.pl
@@ -198,28 +198,6 @@ command_like(
],
qr/./,
'runs with start and end segment specified');
-command_fails_like(
- [ 'pg_waldump', '--path' => $node->data_dir ],
- qr/error: no start WAL location given/,
- 'path option requires start location');
-command_like(
- [
- 'pg_waldump',
- '--path' => $node->data_dir,
- '--start' => $start_lsn,
- '--end' => $end_lsn,
- ],
- qr/./,
- 'runs with path option and start and end locations');
-command_fails_like(
- [
- 'pg_waldump',
- '--path' => $node->data_dir,
- '--start' => $start_lsn,
- ],
- qr/error: error in WAL record at/,
- 'falling off the end of the WAL results in an error');
-
command_like(
[
'pg_waldump', '--quiet',
@@ -227,22 +205,16 @@ command_like(
],
qr/^$/,
'no output with --quiet option');
-command_fails_like(
- [
- 'pg_waldump', '--quiet',
- '--path' => $node->data_dir,
- '--start' => $start_lsn
- ],
- qr/error: error in WAL record at/,
- 'errors are shown with --quiet');
-
# Test for: Display a message that we're skipping data if `from`
# wasn't a pointer to the start of a record.
+sub test_pg_waldump_skip_bytes
{
+ my ($path, $startlsn, $endlsn) = @_;
+
# Construct a new LSN that is one byte past the original
# start_lsn.
- my ($part1, $part2) = split qr{/}, $start_lsn;
+ my ($part1, $part2) = split qr{/}, $startlsn;
my $lsn2 = hex $part2;
$lsn2++;
my $new_start = sprintf("%s/%X", $part1, $lsn2);
@@ -252,7 +224,8 @@ command_fails_like(
my $result = IPC::Run::run [
'pg_waldump',
'--start' => $new_start,
- $node->data_dir . '/pg_wal/' . $start_walfile
+ '--end' => $endlsn,
+ '--path' => $path,
],
'>' => \$stdout,
'2>' => \$stderr;
@@ -266,15 +239,15 @@ command_fails_like(
sub test_pg_waldump
{
local $Test::Builder::Level = $Test::Builder::Level + 1;
- my @opts = @_;
+ my ($path, $startlsn, $endlsn, @opts) = @_;
my ($stdout, $stderr);
my $result = IPC::Run::run [
'pg_waldump',
- '--path' => $node->data_dir,
- '--start' => $start_lsn,
- '--end' => $end_lsn,
+ '--start' => $startlsn,
+ '--end' => $endlsn,
+ '--path' => $path,
@opts
],
'>' => \$stdout,
@@ -288,38 +261,83 @@ sub test_pg_waldump
my @lines;
-@lines = test_pg_waldump;
-is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
+my @scenarios = (
+ {
+ 'path' => $node->data_dir
+ });
-@lines = test_pg_waldump('--limit' => 6);
-is(@lines, 6, 'limit option observed');
+for my $scenario (@scenarios)
+{
+ my $path = $scenario->{'path'};
-@lines = test_pg_waldump('--fullpage');
-is(grep(!/^rmgr:.*\bFPW\b/, @lines), 0, 'all output lines are FPW');
+ SKIP:
+ {
+ command_fails_like(
+ [ 'pg_waldump', '--path' => $path ],
+ qr/error: no start WAL location given/,
+ 'path option requires start location');
+ command_like(
+ [
+ 'pg_waldump',
+ '--path' => $path,
+ '--start' => $start_lsn,
+ '--end' => $end_lsn,
+ ],
+ qr/./,
+ 'runs with path option and start and end locations');
+ command_fails_like(
+ [
+ 'pg_waldump',
+ '--path' => $path,
+ '--start' => $start_lsn,
+ ],
+ qr/error: error in WAL record at/,
+ 'falling off the end of the WAL results in an error');
-@lines = test_pg_waldump('--stats');
-like($lines[0], qr/WAL statistics/, "statistics on stdout");
-is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+ command_fails_like(
+ [
+ 'pg_waldump', '--quiet',
+ '--path' => $path,
+ '--start' => $start_lsn
+ ],
+ qr/error: error in WAL record at/,
+ 'errors are shown with --quiet');
-@lines = test_pg_waldump('--stats=record');
-like($lines[0], qr/WAL statistics/, "statistics on stdout");
-is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+ test_pg_waldump_skip_bytes($path, $start_lsn, $end_lsn);
-@lines = test_pg_waldump('--rmgr' => 'Btree');
-is(grep(!/^rmgr: Btree/, @lines), 0, 'only Btree lines');
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn);
+ is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
-@lines = test_pg_waldump('--fork' => 'init');
-is(grep(!/fork init/, @lines), 0, 'only init fork lines');
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--limit' => 6);
+ is(@lines, 6, 'limit option observed');
-@lines = test_pg_waldump(
- '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_t1_oid");
-is(grep(!/rel $default_ts_oid\/$postgres_db_oid\/$rel_t1_oid/, @lines),
- 0, 'only lines for selected relation');
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--fullpage');
+ is(grep(!/^rmgr:.*\bFPW\b/, @lines), 0, 'all output lines are FPW');
-@lines = test_pg_waldump(
- '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_i1a_oid",
- '--block' => 1);
-is(grep(!/\bblk 1\b/, @lines), 0, 'only lines for selected block');
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--stats');
+ like($lines[0], qr/WAL statistics/, "statistics on stdout");
+ is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--stats=record');
+ like($lines[0], qr/WAL statistics/, "statistics on stdout");
+ is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--rmgr' => 'Btree');
+ is(grep(!/^rmgr: Btree/, @lines), 0, 'only Btree lines');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--fork' => 'init');
+ is(grep(!/fork init/, @lines), 0, 'only init fork lines');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn,
+ '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_t1_oid");
+ is(grep(!/rel $default_ts_oid\/$postgres_db_oid\/$rel_t1_oid/, @lines),
+ 0, 'only lines for selected relation');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn,
+ '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_i1a_oid",
+ '--block' => 1);
+ is(grep(!/\bblk 1\b/, @lines), 0, 'only lines for selected block');
+ }
+}
done_testing();
--
2.47.1
[application/x-patch] v13-0005-Refactor-pg_waldump-Move-WAL-segment-size-to-XLo.patch (5.1K, 6-v13-0005-Refactor-pg_waldump-Move-WAL-segment-size-to-XLo.patch)
From 6b38a9b7bd81a15ba8647f51d53156b8a9dd405c Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Wed, 4 Feb 2026 15:31:51 +0530
Subject: [PATCH v13 05/10] Refactor: pg_waldump: Move WAL segment size to
XLogDumpPrivate.
Relocate the WAL segment size variable to the XLogDumpPrivate
structure and rename it to segsize for consistency. This change is
required to make the segment size accessible to the archive streamer
code, where passing it as a function argument is not feasible.
---
src/bin/pg_waldump/pg_waldump.c | 26 +++++++++++++-------------
src/bin/pg_waldump/pg_waldump.h | 1 +
2 files changed, 14 insertions(+), 13 deletions(-)
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 958a71a01cf..5d31b15dbd8 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -811,7 +811,6 @@ main(int argc, char **argv)
XLogRecPtr first_record;
char *waldir = NULL;
char *errormsg;
- int WalSegSz;
static struct option long_options[] = {
{"bkp-details", no_argument, NULL, 'b'},
@@ -865,6 +864,7 @@ main(int argc, char **argv)
memset(&stats, 0, sizeof(XLogStats));
private.timeline = 1;
+ private.segsize = 0;
private.startptr = InvalidXLogRecPtr;
private.endptr = InvalidXLogRecPtr;
private.endptr_reached = false;
@@ -1138,18 +1138,18 @@ main(int argc, char **argv)
pg_fatal("could not open directory \"%s\": %m", waldir);
}
- waldir = identify_target_directory(waldir, fname, &WalSegSz);
+ waldir = identify_target_directory(waldir, fname, &private.segsize);
fd = open_file_in_directory(waldir, fname);
if (fd < 0)
pg_fatal("could not open file \"%s\"", fname);
close(fd);
/* parse position from file */
- XLogFromFileName(fname, &private.timeline, &segno, WalSegSz);
+ XLogFromFileName(fname, &private.timeline, &segno, private.segsize);
if (!XLogRecPtrIsValid(private.startptr))
- XLogSegNoOffsetToRecPtr(segno, 0, WalSegSz, private.startptr);
- else if (!XLByteInSeg(private.startptr, segno, WalSegSz))
+ XLogSegNoOffsetToRecPtr(segno, 0, private.segsize, private.startptr);
+ else if (!XLByteInSeg(private.startptr, segno, private.segsize))
{
pg_log_error("start WAL location %X/%08X is not inside file \"%s\"",
LSN_FORMAT_ARGS(private.startptr),
@@ -1159,7 +1159,7 @@ main(int argc, char **argv)
/* no second file specified, set end position */
if (!(optind + 1 < argc) && !XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(segno + 1, 0, WalSegSz, private.endptr);
+ XLogSegNoOffsetToRecPtr(segno + 1, 0, private.segsize, private.endptr);
/* parse ENDSEG if passed */
if (optind + 1 < argc)
@@ -1175,14 +1175,14 @@ main(int argc, char **argv)
close(fd);
/* parse position from file */
- XLogFromFileName(fname, &private.timeline, &endsegno, WalSegSz);
+ XLogFromFileName(fname, &private.timeline, &endsegno, private.segsize);
if (endsegno < segno)
pg_fatal("ENDSEG %s is before STARTSEG %s",
argv[optind + 1], argv[optind]);
if (!XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(endsegno + 1, 0, WalSegSz,
+ XLogSegNoOffsetToRecPtr(endsegno + 1, 0, private.segsize,
private.endptr);
/* set segno to endsegno for check of --end */
@@ -1190,8 +1190,8 @@ main(int argc, char **argv)
}
- if (!XLByteInSeg(private.endptr, segno, WalSegSz) &&
- private.endptr != (segno + 1) * WalSegSz)
+ if (!XLByteInSeg(private.endptr, segno, private.segsize) &&
+ private.endptr != (segno + 1) * private.segsize)
{
pg_log_error("end WAL location %X/%08X is not inside file \"%s\"",
LSN_FORMAT_ARGS(private.endptr),
@@ -1200,7 +1200,7 @@ main(int argc, char **argv)
}
}
else
- waldir = identify_target_directory(waldir, NULL, &WalSegSz);
+ waldir = identify_target_directory(waldir, NULL, &private.segsize);
/* we don't know what to print */
if (!XLogRecPtrIsValid(private.startptr))
@@ -1213,7 +1213,7 @@ main(int argc, char **argv)
/* we have everything we need, start reading */
xlogreader_state =
- XLogReaderAllocate(WalSegSz, waldir,
+ XLogReaderAllocate(private.segsize, waldir,
XL_ROUTINE(.page_read = WALDumpReadPage,
.segment_open = WALDumpOpenSegment,
.segment_close = WALDumpCloseSegment),
@@ -1234,7 +1234,7 @@ main(int argc, char **argv)
* a segment (e.g. we were used in file mode).
*/
if (first_record != private.startptr &&
- XLogSegmentOffset(private.startptr, WalSegSz) != 0)
+ XLogSegmentOffset(private.startptr, private.segsize) != 0)
pg_log_info(ngettext("first record is after %X/%08X, at %X/%08X, skipping over %u byte",
"first record is after %X/%08X, at %X/%08X, skipping over %u bytes",
(first_record - private.startptr)),
diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h
index b88543856e5..4f1b2ab668b 100644
--- a/src/bin/pg_waldump/pg_waldump.h
+++ b/src/bin/pg_waldump/pg_waldump.h
@@ -17,6 +17,7 @@
typedef struct XLogDumpPrivate
{
TimeLineID timeline;
+ int segsize;
XLogRecPtr startptr;
XLogRecPtr endptr;
bool endptr_reached;
--
2.47.1
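Making segsize available in XLogDumpPrivate matters because a WAL segment number cannot be derived from a file name without the segment size. The name encodes a timeline, a 4 GB "log" id, and a segment index within that id, and the number of segments per id is 0x100000000 / segsize. The following is a simplified, standalone restatement of what XLogFromFileName() computes; segno_from_filename() is a hypothetical name, and error handling is reduced to a return code.

```c
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TimeLineID;
typedef uint64_t XLogSegNo;

/*
 * A WAL file name is three 8-digit hex fields: timeline, log id, and segment
 * within that log id.  The segments-per-log-id factor depends on segsize, so
 * the absolute segment number cannot be computed until segsize is known.
 * Returns 0 on success, -1 if the name does not parse.
 */
static int
segno_from_filename(const char *fname, int segsize,
					TimeLineID *tli, XLogSegNo *segno)
{
	unsigned int t,
				log,
				seg;

	if (sscanf(fname, "%8X%8X%8X", &t, &log, &seg) != 3)
		return -1;
	*tli = t;
	*segno = (uint64_t) log * (UINT64_C(0x100000000) / (uint64_t) segsize) + seg;
	return 0;
}
```

For 000000010000000100000002, this yields segment 258 with 16 MB segments but segment 6 with 1 GB segments, which is why the start and end segment range can only be computed after the first WAL page has been read from the archive.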
[application/x-patch] v13-0006-pg_waldump-Add-support-for-archived-WAL-decoding.patch (41.7K, 7-v13-0006-pg_waldump-Add-support-for-archived-WAL-decoding.patch)
From fa0872a5c7714c1958780a95ad0091f0d0c5f94d Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Tue, 10 Feb 2026 11:42:36 +0530
Subject: [PATCH v13 06/10] pg_waldump: Add support for archived WAL decoding.
pg_waldump can now accept the path to a tar archive containing WAL
files and decode them. This feature was added primarily for
pg_verifybackup, which previously disabled WAL parsing for
tar-formatted backups.
Note that this patch requires that the WAL files within the archive be
in sequential order; an error will be reported otherwise. The next
patch is intended to remove this restriction.
---
doc/src/sgml/ref/pg_waldump.sgml | 8 +-
src/bin/pg_waldump/Makefile | 7 +-
src/bin/pg_waldump/archive_waldump.c | 638 +++++++++++++++++++++++++++
src/bin/pg_waldump/meson.build | 4 +-
src/bin/pg_waldump/pg_waldump.c | 255 ++++++++---
src/bin/pg_waldump/pg_waldump.h | 43 ++
src/bin/pg_waldump/t/001_basic.pl | 105 ++++-
src/tools/pgindent/typedefs.list | 3 +
8 files changed, 997 insertions(+), 66 deletions(-)
create mode 100644 src/bin/pg_waldump/archive_waldump.c
diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index ce23add5577..d004bb0f67e 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -141,13 +141,17 @@ PostgreSQL documentation
<term><option>--path=<replaceable>path</replaceable></option></term>
<listitem>
<para>
- Specifies a directory to search for WAL segment files or a
- directory with a <literal>pg_wal</literal> subdirectory that
+ Specifies a tar archive containing WAL segment files, a directory to search
+ for WAL segment files, or a directory with a <literal>pg_wal</literal> subdirectory that
contains such files. The default is to search in the current
directory, the <literal>pg_wal</literal> subdirectory of the
current directory, and the <literal>pg_wal</literal> subdirectory
of <envar>PGDATA</envar>.
</para>
+ <para>
+ If a tar archive is provided, its WAL segment files must be in
+ sequential order; otherwise, an error will be reported.
+ </para>
</listitem>
</varlistentry>
diff --git a/src/bin/pg_waldump/Makefile b/src/bin/pg_waldump/Makefile
index 4c1ee649501..aabb87566a2 100644
--- a/src/bin/pg_waldump/Makefile
+++ b/src/bin/pg_waldump/Makefile
@@ -3,6 +3,9 @@
PGFILEDESC = "pg_waldump - decode and display WAL"
PGAPPICON=win32
+# make these available to TAP test scripts
+export TAR
+
subdir = src/bin/pg_waldump
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
@@ -10,13 +13,15 @@ include $(top_builddir)/src/Makefile.global
OBJS = \
$(RMGRDESCOBJS) \
$(WIN32RES) \
+ archive_waldump.o \
compat.o \
pg_waldump.o \
rmgrdesc.o \
xlogreader.o \
xlogstats.o
-override CPPFLAGS := -DFRONTEND $(CPPFLAGS)
+override CPPFLAGS := -DFRONTEND -I$(libpq_srcdir) $(CPPFLAGS)
+LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils
RMGRDESCSOURCES = $(sort $(notdir $(wildcard $(top_srcdir)/src/backend/access/rmgrdesc/*desc*.c)))
RMGRDESCOBJS = $(patsubst %.c,%.o,$(RMGRDESCSOURCES))
diff --git a/src/bin/pg_waldump/archive_waldump.c b/src/bin/pg_waldump/archive_waldump.c
new file mode 100644
index 00000000000..ecc022a81a2
--- /dev/null
+++ b/src/bin/pg_waldump/archive_waldump.c
@@ -0,0 +1,638 @@
+/*-------------------------------------------------------------------------
+ *
+ * archive_waldump.c
+ * A generic facility for reading WAL data from tar archives via the
+ * archive streamer.
+ *
+ * Portions Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_waldump/archive_waldump.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <unistd.h>
+
+#include "access/xlog_internal.h"
+#include "common/hashfn.h"
+#include "common/logging.h"
+#include "fe_utils/simple_list.h"
+#include "pg_waldump.h"
+
+/*
+ * How many bytes should we try to read from a file at once?
+ */
+#define READ_CHUNK_SIZE (128 * 1024)
+
+/*
+ * Check if the start segment number is zero; this indicates a request to read
+ * any WAL file.
+ */
+#define READ_ANY_WAL(privateInfo) ((privateInfo)->start_segno == 0)
+
+/*
+ * Hash entry representing a WAL segment retrieved from the archive.
+ *
+ * While WAL segments are typically read sequentially, individual entries
+ * maintain their own buffers for the following reasons:
+ *
+ * 1. Boundary Handling: The archive streamer provides a continuous byte
+ * stream. A single streaming chunk may contain the end of one WAL segment
+ * and the start of the next. Separate buffers allow us to easily
+ * partition and track these bytes by their respective segments.
+ *
+ * 2. Out-of-Order Support: Dedicated buffers simplify logic if segments
+ * are ever archived or retrieved out of sequence.
+ *
+ * To minimize the memory footprint, entries and their associated buffers are
+ * freed immediately once consumed. Since pg_waldump does not request the same
+ * bytes twice, a segment is discarded as soon as pg_waldump moves past it.
+ */
+typedef struct ArchivedWALFile
+{
+ uint32 status; /* hash status */
+ const char *fname; /* hash key: WAL segment name */
+
+ StringInfo buf; /* holds WAL bytes read from archive */
+
+ int read_len; /* total bytes of this WAL segment read from the archive */
+} ArchivedWALFile;
+
+static uint32 hash_string_pointer(const char *s);
+#define SH_PREFIX ArchivedWAL
+#define SH_ELEMENT_TYPE ArchivedWALFile
+#define SH_KEY_TYPE const char *
+#define SH_KEY fname
+#define SH_HASH_KEY(tb, key) hash_string_pointer(key)
+#define SH_EQUAL(tb, a, b) (strcmp(a, b) == 0)
+#define SH_SCOPE static inline
+#define SH_RAW_ALLOCATOR pg_malloc0
+#define SH_DECLARE
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+typedef struct astreamer_waldump
+{
+ astreamer base;
+ XLogDumpPrivate *privateInfo;
+} astreamer_waldump;
+
+static ArchivedWALFile *get_archive_wal_entry(const char *fname,
+ XLogDumpPrivate *privateInfo,
+ int WalSegSz);
+static int read_archive_file(XLogDumpPrivate *privateInfo, Size count);
+
+static astreamer *astreamer_waldump_new(XLogDumpPrivate *privateInfo);
+static void astreamer_waldump_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_waldump_finalize(astreamer *streamer);
+static void astreamer_waldump_free(astreamer *streamer);
+
+static bool member_is_wal_file(astreamer_waldump *mystreamer,
+ astreamer_member *member,
+ char **fname);
+
+static const astreamer_ops astreamer_waldump_ops = {
+ .content = astreamer_waldump_content,
+ .finalize = astreamer_waldump_finalize,
+ .free = astreamer_waldump_free
+};
+
+/*
+ * Initializes the tar archive reader, creates a hash table for WAL entries,
+ * verifies that the archive contains a valid WAL segment, retrieves the WAL
+ * segment size from it, and sets up filters for relevant entries.
+ */
+void
+init_archive_reader(XLogDumpPrivate *privateInfo, const char *waldir,
+ int *WalSegSz, pg_compress_algorithm compression)
+{
+ int fd;
+ astreamer *streamer;
+ ArchivedWALFile *entry = NULL;
+ XLogLongPageHeader longhdr;
+ XLogSegNo segno;
+ TimeLineID timeline;
+
+ /* Open tar archive and store its file descriptor */
+ fd = open_file_in_directory(waldir, privateInfo->archive_name);
+
+ if (fd < 0)
+ pg_fatal("could not open file \"%s\"", privateInfo->archive_name);
+
+ privateInfo->archive_fd = fd;
+
+ streamer = astreamer_waldump_new(privateInfo);
+
+ /* Before that we must parse the tar archive. */
+ streamer = astreamer_tar_parser_new(streamer);
+
+ /* Before that we must decompress, if archive is compressed. */
+ if (compression == PG_COMPRESSION_GZIP)
+ streamer = astreamer_gzip_decompressor_new(streamer);
+ else if (compression == PG_COMPRESSION_LZ4)
+ streamer = astreamer_lz4_decompressor_new(streamer);
+ else if (compression == PG_COMPRESSION_ZSTD)
+ streamer = astreamer_zstd_decompressor_new(streamer);
+
+ privateInfo->archive_streamer = streamer;
+
+ /*
+ * Hash table storing WAL entries read from the archive, with an arbitrary
+ * initial size.
+ */
+ privateInfo->archive_wal_htab = ArchivedWAL_create(8, NULL);
+
+ /*
+ * Verify that the archive contains valid WAL files and fetch the WAL
+ * segment size.
+ */
+ while (entry == NULL || entry->buf->len < XLOG_BLCKSZ)
+ {
+ if (read_archive_file(privateInfo, XLOG_BLCKSZ) == 0)
+ pg_fatal("could not find WAL in archive \"%s\"",
+ privateInfo->archive_name);
+
+ entry = privateInfo->cur_file;
+ }
+
+ /* Set WalSegSz if WAL data is successfully read */
+ longhdr = (XLogLongPageHeader) entry->buf->data;
+
+ if (!IsValidWalSegSize(longhdr->xlp_seg_size))
+ {
+ pg_log_error(ngettext("invalid WAL segment size in WAL file from archive \"%s\" (%d byte)",
+ "invalid WAL segment size in WAL file from archive \"%s\" (%d bytes)",
+ longhdr->xlp_seg_size),
+ privateInfo->archive_name, longhdr->xlp_seg_size);
+ pg_log_error_detail("The WAL segment size must be a power of two between 1 MB and 1 GB.");
+ exit(1);
+ }
+
+ *WalSegSz = longhdr->xlp_seg_size;
+
+ /*
+ * With the WAL segment size available, we can now initialize the
+ * dependent start and end segment numbers.
+ */
+ Assert(XLogRecPtrIsValid(privateInfo->startptr));
+ XLByteToSeg(privateInfo->startptr, privateInfo->start_segno, *WalSegSz);
+
+ if (XLogRecPtrIsValid(privateInfo->endptr))
+ XLByteToSeg(privateInfo->endptr, privateInfo->end_segno, *WalSegSz);
+
+ /*
+ * This WAL segment was fetched before the filtering parameters
+ * (start_segno and end_segno) were fully initialized. Perform the
+ * relevance check against the user-provided range now; if the segment
+ * falls outside this range, remove it from the hash table. Subsequent
+ * segments will be filtered automatically by the archive streamer using
+ * the updated start_segno and end_segno values.
+ */
+ XLogFromFileName(entry->fname, &timeline, &segno, *WalSegSz);
+ if (privateInfo->timeline != timeline ||
+ privateInfo->start_segno > segno ||
+ privateInfo->end_segno < segno)
+ free_archive_wal_entry(entry->fname, privateInfo);
+}
+
+/*
+ * Release the archive streamer chain and close the archive file.
+ */
+void
+free_archive_reader(XLogDumpPrivate *privateInfo)
+{
+ /*
+ * NB: Normally, astreamer_finalize() is called before astreamer_free() to
+ * flush any remaining buffered data or to ensure the end of the tar
+ * archive is reached. However, when decoding a WAL file, once we hit the
+ * end LSN, any remaining WAL data in the buffer or the tar archive's
+ * unreached end can be safely ignored.
+ */
+ astreamer_free(privateInfo->archive_streamer);
+
+ /* Close the file. */
+ if (close(privateInfo->archive_fd) != 0)
+ pg_log_error("could not close file \"%s\": %m",
+ privateInfo->archive_name);
+}
+
+/*
+ * Copies 'count' bytes of WAL starting at targetPagePtr into readBuff. If the
+ * data is not buffered yet, fetches more from the tar archive via astreamer.
+ */
+int
+read_archive_wal_page(XLogDumpPrivate *privateInfo, XLogRecPtr targetPagePtr,
+ Size count, char *readBuff, int WalSegSz)
+{
+ char *p = readBuff;
+ Size nbytes = count;
+ XLogRecPtr recptr = targetPagePtr;
+ XLogSegNo segno;
+ char fname[MAXFNAMELEN];
+ ArchivedWALFile *entry;
+
+ /* Identify the segment and locate its entry in the archive hash */
+ XLByteToSeg(targetPagePtr, segno, WalSegSz);
+ XLogFileName(fname, privateInfo->timeline, segno, WalSegSz);
+ entry = get_archive_wal_entry(fname, privateInfo, WalSegSz);
+
+ while (nbytes > 0)
+ {
+ char *buf = entry->buf->data;
+ int bufLen = entry->buf->len;
+ XLogRecPtr endPtr;
+ XLogRecPtr startPtr;
+
+ /* Calculate the LSN range currently residing in the buffer */
+ XLogSegNoOffsetToRecPtr(segno, entry->read_len, WalSegSz, endPtr);
+ startPtr = endPtr - bufLen;
+
+ /*
+ * Copy the requested WAL data if it is present in the buffer.
+ */
+ if (bufLen > 0 && startPtr <= recptr && recptr < endPtr)
+ {
+ int copyBytes;
+ int offset = recptr - startPtr;
+
+ /*
+ * Given startPtr <= recptr < endPtr and a total buffer size
+ * 'bufLen', the offset (recptr - startPtr) will always be less
+ * than 'bufLen'.
+ */
+ Assert(offset < bufLen);
+
+ copyBytes = Min(nbytes, bufLen - offset);
+ memcpy(p, buf + offset, copyBytes);
+
+ /* Update state for read */
+ recptr += copyBytes;
+ nbytes -= copyBytes;
+ p += copyBytes;
+ }
+ else
+ {
+ /*
+ * Before starting the actual decoding loop, pg_waldump tries to
+ * locate the first valid record from the user-specified start
+ * position, which might not be the start of a WAL record and
+ * could fall in the middle of a record that spans multiple pages.
+ * Consequently, the valid start position the decoder is looking
+ * for could be far away from that initial position.
+ *
+ * This may involve reading across multiple pages, and this
+ * pre-reading fetches data in multiple rounds from the archive
+ * streamer; normally, we would throw away existing buffer
+ * contents to fetch the next set of data, but that existing data
+ * might be needed once the main loop starts. Because previously
+ * read data cannot be re-read by the archive streamer, we delay
+ * resetting the buffer until the main decoding loop is entered.
+ *
+ * Once pg_waldump has entered the main loop, it may re-read the
+ * currently active page, but never an older one; therefore, any
+ * fully consumed WAL data preceding the current page can then be
+ * safely discarded.
+ */
+ if (privateInfo->decoding_started)
+ {
+ resetStringInfo(entry->buf);
+
+ /*
+ * Push the already-copied part of the current page back
+ * into the buffer, so that the full page remains available
+ * for re-reading if requested.
+ */
+ if (p > readBuff)
+ {
+ Assert((count - nbytes) > 0);
+ appendBinaryStringInfo(entry->buf, readBuff, count - nbytes);
+ }
+ }
+
+ /*
+ * Now, fetch more data; raise an error if it's not the current
+ * segment being read by the archive streamer or if reading of the
+ * archived file has finished.
+ */
+ if (privateInfo->cur_file != entry ||
+ read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
+ pg_fatal("could not read file \"%s\" from archive \"%s\": read %lld of %lld",
+ fname, privateInfo->archive_name,
+ (long long int) (count - nbytes),
+ (long long int) count);
+ }
+ }
+
+ /*
+ * We should have either successfully read all the requested bytes or
+ * reported a failure before this point.
+ */
+ Assert(nbytes == 0);
+
+ /*
+ * NB: We return the fixed value provided as input. Although we could
+ * return a boolean, since we either successfully read the WAL page or
+ * raise an error, the caller expects this value to be returned. The
+ * routine that reads WAL pages from the physical WAL file follows the
+ * same convention.
+ */
+ return count;
+}
+
+/*
+ * Frees the buffer of a WAL entry that is no longer needed and removes the
+ * entry from the hash table. This frees up memory and prevents the
+ * accumulation of irrelevant WAL data. Additionally, if cur_file within
+ * privateInfo points to this entry, it is set to NULL so that the archive
+ * streamer skips unnecessary copy operations.
+void
+free_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo)
+{
+ ArchivedWALFile *entry;
+
+ entry = ArchivedWAL_lookup(privateInfo->archive_wal_htab, fname);
+
+ if (entry == NULL)
+ return;
+
+ /* Destroy the buffer */
+ destroyStringInfo(entry->buf);
+ entry->buf = NULL;
+
+ /* Set cur_file to NULL if it matches the entry being ignored */
+ if (privateInfo->cur_file == entry)
+ privateInfo->cur_file = NULL;
+
+ ArchivedWAL_delete_item(privateInfo->archive_wal_htab, entry);
+}
+
+/*
+ * Returns the archived WAL entry from the hash table if it exists. Otherwise,
+ * it invokes the routine to read the archived file, which then populates the
+ * entry in the hash table if that WAL exists in the archive.
+ */
+static ArchivedWALFile *
+get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo,
+ int WalSegSz)
+{
+ ArchivedWALFile *entry = NULL;
+
+ /* Search hash table */
+ entry = ArchivedWAL_lookup(privateInfo->archive_wal_htab, fname);
+
+ if (entry != NULL)
+ return entry;
+
+ /*
+ * The requested WAL entry has not been read from the archive yet; invoke
+ * the archive streamer to read it.
+ */
+ while (1)
+ {
+ /* Fetch more data */
+ if (read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
+ break; /* archive file ended */
+
+ /*
+ * The archive streamer is reading a non-WAL file or an irrelevant
+ * WAL file.
+ */
+ if (privateInfo->cur_file == NULL)
+ continue;
+
+ entry = privateInfo->cur_file;
+
+ /* Found the required entry */
+ if (strcmp(fname, entry->fname) == 0)
+ return entry;
+
+ /* WAL segments must be archived in order */
+ pg_log_error("WAL files are not archived in sequential order");
+ pg_log_error_detail("Expecting segment \"%s\" but found \"%s\".",
+ fname, entry->fname);
+ exit(1);
+ }
+
+ /* Requested WAL segment not found */
+ pg_fatal("could not find WAL \"%s\" in archive \"%s\"",
+ fname, privateInfo->archive_name);
+}
+
+/*
+ * Reads up to 'count' bytes from the archive file and passes them to the
+ * archive streamer for decompression (if required) and tar parsing.
+ */
+static int
+read_archive_file(XLogDumpPrivate *privateInfo, Size count)
+{
+ int rc;
+ char *buffer;
+
+ buffer = pg_malloc(count * sizeof(uint8));
+
+ rc = read(privateInfo->archive_fd, buffer, count);
+ if (rc < 0)
+ pg_fatal("could not read file \"%s\": %m",
+ privateInfo->archive_name);
+
+ /*
+ * Decompress (if required), and then parse the previously read contents
+ * of the tar file.
+ */
+ if (rc > 0)
+ astreamer_content(privateInfo->archive_streamer, NULL,
+ buffer, rc, ASTREAMER_UNKNOWN);
+ pg_free(buffer);
+
+ return rc;
+}
+
+/*
+ * Create an astreamer that can read WAL from a tar file.
+ */
+static astreamer *
+astreamer_waldump_new(XLogDumpPrivate *privateInfo)
+{
+ astreamer_waldump *streamer;
+
+ streamer = palloc0(sizeof(astreamer_waldump));
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_waldump_ops;
+
+ streamer->privateInfo = privateInfo;
+
+ return &streamer->base;
+}
+
+/*
+ * Main entry point of the archive streamer for reading WAL data from a tar
+ * file. If a member is identified as a valid WAL file, a hash entry is created
+ * for it, and its contents are copied into that entry's buffer, making them
+ * accessible to the decoding routine.
+ */
+static void
+astreamer_waldump_content(astreamer *streamer, astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
+{
+ astreamer_waldump *mystreamer = (astreamer_waldump *) streamer;
+ XLogDumpPrivate *privateInfo = mystreamer->privateInfo;
+
+ Assert(context != ASTREAMER_UNKNOWN);
+
+ switch (context)
+ {
+ case ASTREAMER_MEMBER_HEADER:
+ {
+ char *fname = NULL;
+ ArchivedWALFile *entry;
+ bool found;
+
+ pg_log_debug("reading \"%s\"", member->pathname);
+
+ if (!member_is_wal_file(mystreamer, member, &fname))
+ break;
+
+ /*
+ * Further checks are skipped if any WAL file may be read,
+ * as is the case during initial verification.
+ */
+ if (!READ_ANY_WAL(privateInfo))
+ {
+ XLogSegNo segno;
+ TimeLineID timeline;
+
+ /*
+ * Skip the segment if the timeline does not match or if it
+ * falls outside the caller-specified range.
+ */
+ XLogFromFileName(fname, &timeline, &segno, privateInfo->segsize);
+ if (privateInfo->timeline != timeline ||
+ privateInfo->start_segno > segno ||
+ privateInfo->end_segno < segno)
+ {
+ pg_free(fname);
+ break;
+ }
+ }
+
+ entry = ArchivedWAL_insert(privateInfo->archive_wal_htab,
+ fname, &found);
+
+ /*
+ * Shouldn't happen, but if it does, simply ignore the
+ * duplicate WAL file.
+ */
+ if (found)
+ {
+ pg_log_warning("ignoring duplicate WAL \"%s\" found in archive \"%s\"",
+ member->pathname, privateInfo->archive_name);
+ break;
+ }
+
+ entry->buf = makeStringInfo();
+ entry->read_len = 0;
+ privateInfo->cur_file = entry;
+ }
+ break;
+
+ case ASTREAMER_MEMBER_CONTENTS:
+ if (privateInfo->cur_file)
+ {
+ appendBinaryStringInfo(privateInfo->cur_file->buf, data, len);
+ privateInfo->cur_file->read_len += len;
+ }
+ break;
+
+ case ASTREAMER_MEMBER_TRAILER:
+ privateInfo->cur_file = NULL;
+ break;
+
+ case ASTREAMER_ARCHIVE_TRAILER:
+ break;
+
+ default:
+ /* Shouldn't happen. */
+ pg_fatal("unexpected state while parsing tar file");
+ }
+}
+
+/*
+ * End-of-stream processing for an astreamer_waldump stream.
+ */
+static void
+astreamer_waldump_finalize(astreamer *streamer)
+{
+ Assert(streamer->bbs_next == NULL);
+}
+
+/*
+ * Free memory associated with an astreamer_waldump stream.
+ */
+static void
+astreamer_waldump_free(astreamer *streamer)
+{
+ Assert(streamer->bbs_next == NULL);
+ pfree(streamer);
+}
+
+/*
+ * Returns true if the archive member name matches the WAL naming format. If
+ * successful, it also outputs the WAL segment name.
+ */
+static bool
+member_is_wal_file(astreamer_waldump *mystreamer, astreamer_member *member,
+ char **fname)
+{
+ int pathlen;
+ char pathname[MAXPGPATH];
+ char *filename;
+
+ /* We are only interested in normal files. */
+ if (member->is_directory || member->is_link)
+ return false;
+
+ if (strlen(member->pathname) < XLOG_FNAME_LEN)
+ return false;
+
+ /*
+ * For a correct comparison, we must remove any '.' or '..' components
+ * from the member pathname. Similar to member_verify_header(), we prepend
+ * './' to the path so that canonicalize_path() can properly resolve and
+ * strip these references from the tar member name.
+ */
+ snprintf(pathname, MAXPGPATH, "./%s", member->pathname);
+ canonicalize_path(pathname);
+ pathlen = strlen(pathname);
+
+ /* Only WAL files at the top level or directly under pg_wal are decoded */
+ if (pathlen != XLOG_FNAME_LEN &&
+ (pathlen != (int) strlen(XLOGDIR "/") + XLOG_FNAME_LEN ||
+ strncmp(pathname, XLOGDIR "/", strlen(XLOGDIR "/")) != 0))
+ return false;
+
+ /* Skip any leading directory component to get the file name */
+ filename = pathname + (pathlen - XLOG_FNAME_LEN);
+ if (!IsXLogFileName(filename))
+ return false;
+
+ *fname = pnstrdup(filename, XLOG_FNAME_LEN);
+
+ return true;
+}
+
+/*
+ * Helper function for the ArchivedWAL hash table.
+ */
+static uint32
+hash_string_pointer(const char *s)
+{
+ unsigned char *ss = (unsigned char *) s;
+
+ return hash_bytes(ss, strlen(s));
+}
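The LSN-window arithmetic in read_archive_wal_page() above can be checked in isolation. Below is a minimal standalone sketch, not code from the patch: `wal_window` and `wal_window_copy()` are hypothetical names, a plain `uint64_t` stands in for XLogRecPtr, and an entry is modeled as holding only the most recent `buf_len` of the `read_len` bytes streamed so far for segment `segno`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for XLogRecPtr */
typedef uint64_t lsn_t;

/* Hypothetical model of one buffered ArchivedWALFile entry */
typedef struct
{
	uint64_t	segno;		/* segment this entry belongs to */
	const char *buf;		/* last buf_len bytes streamed for the segment */
	size_t		buf_len;
	size_t		read_len;	/* total bytes of the segment streamed so far */
} wal_window;

/*
 * Copy up to nbytes of WAL starting at recptr from the entry's buffer.
 * Returns the number of bytes copied; 0 means recptr lies outside the
 * buffered LSN range [start_ptr, end_ptr).
 */
static size_t
wal_window_copy(const wal_window *w, uint64_t seg_size, lsn_t recptr,
				char *dst, size_t nbytes)
{
	/* End of buffered data, as XLogSegNoOffsetToRecPtr(segno, read_len) */
	lsn_t		end_ptr = w->segno * seg_size + w->read_len;
	lsn_t		start_ptr = end_ptr - w->buf_len;
	size_t		offset;
	size_t		n;

	if (w->buf_len == 0 || recptr < start_ptr || recptr >= end_ptr)
		return 0;

	/* start_ptr <= recptr < end_ptr, so offset < buf_len */
	offset = (size_t) (recptr - start_ptr);
	n = nbytes < w->buf_len - offset ? nbytes : w->buf_len - offset;
	memcpy(dst, w->buf + offset, n);
	return n;
}
```

For example, with 16-byte segments, segment 2 starts at LSN 32; after 10 bytes have been streamed with only the last 4 still buffered, the buffer covers [38, 42). A request at 39 copies 3 bytes, while a request at 37 copies nothing, which corresponds to the branch in the patch that fetches more data or reports an error.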
diff --git a/src/bin/pg_waldump/meson.build b/src/bin/pg_waldump/meson.build
index 633a9874bb5..5296f21b82c 100644
--- a/src/bin/pg_waldump/meson.build
+++ b/src/bin/pg_waldump/meson.build
@@ -1,6 +1,7 @@
# Copyright (c) 2022-2026, PostgreSQL Global Development Group
pg_waldump_sources = files(
+ 'archive_waldump.c',
'compat.c',
'pg_waldump.c',
'rmgrdesc.c',
@@ -18,7 +19,7 @@ endif
pg_waldump = executable('pg_waldump',
pg_waldump_sources,
- dependencies: [frontend_code, lz4, zstd],
+ dependencies: [frontend_code, libpq, lz4, zstd],
c_args: ['-DFRONTEND'], # needed for xlogreader et al
kwargs: default_bin_args,
)
@@ -29,6 +30,7 @@ tests += {
'sd': meson.current_source_dir(),
'bd': meson.current_build_dir(),
'tap': {
+ 'env': {'TAR': tar.found() ? tar.full_path() : ''},
'tests': [
't/001_basic.pl',
't/002_save_fullpage.pl',
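Returning to member_is_wal_file() in archive_waldump.c: the rule stated in its comment (decode only WAL files at the top level of the archive or directly inside pg_wal) can be sketched on its own. This is a hypothetical standalone helper, assuming the member path has already been canonicalized (no leading "./"); `WAL_FNAME_LEN` and `WAL_DIR` restate the values of XLOG_FNAME_LEN and XLOGDIR, and the name test is the same strspn() check that IsXLogFileName() performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define WAL_FNAME_LEN 24		/* value of XLOG_FNAME_LEN */
#define WAL_DIR "pg_wal"		/* value of XLOGDIR */

/* Same test as IsXLogFileName(): exactly 24 uppercase hex digits */
static bool
is_wal_file_name(const char *fname)
{
	return strlen(fname) == WAL_FNAME_LEN &&
		strspn(fname, "0123456789ABCDEF") == WAL_FNAME_LEN;
}

/*
 * Hypothetical helper: accept a canonicalized member path only when it is a
 * WAL file name at the archive's top level or directly inside pg_wal/.
 * On success, *fname points at the file-name part of path.
 */
static bool
wal_member_name(const char *path, const char **fname)
{
	size_t		len = strlen(path);
	size_t		dirlen = strlen(WAL_DIR "/");

	if (len == WAL_FNAME_LEN)
		*fname = path;
	else if (len == dirlen + WAL_FNAME_LEN &&
			 strncmp(path, WAL_DIR "/", dirlen) == 0)
		*fname = path + dirlen;
	else
		return false;

	return is_wal_file_name(*fname);
}
```

Requiring an exact path length for both accepted forms means names such as `pg_walx/...`, `pg_wal/archive_status/...`, or `.partial` files are rejected without needing a separate prefix check.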
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 5d31b15dbd8..90fc13f3609 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -176,7 +176,7 @@ split_path(const char *path, char **dir, char **fname)
*
* return a read only fd
*/
-static int
+int
open_file_in_directory(const char *directory, const char *fname)
{
int fd = -1;
@@ -440,6 +440,80 @@ WALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
return count;
}
+/*
+ * pg_waldump's XLogReaderRoutine->segment_open callback to support dumping WAL
+ * files from tar archives.
+ */
+static void
+TarWALDumpOpenSegment(XLogReaderState *state, XLogSegNo nextSegNo,
+ TimeLineID *tli_p)
+{
+ /* No action needed */
+}
+
+/*
+ * pg_waldump's XLogReaderRoutine->segment_close callback.
+ */
+static void
+TarWALDumpCloseSegment(XLogReaderState *state)
+{
+ /* No action needed */
+}
+
+/*
+ * pg_waldump's XLogReaderRoutine->page_read callback to support dumping WAL
+ * files from tar archives.
+ */
+static int
+TarWALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
+ XLogRecPtr targetPtr, char *readBuff)
+{
+ XLogDumpPrivate *private = state->private_data;
+ int count = required_read_len(private, targetPagePtr, reqLen);
+ int WalSegSz = state->segcxt.ws_segsize;
+ XLogSegNo curSegNo;
+
+ /* Bail out if the count to be read is not valid */
+ if (count < 0)
+ return -1;
+
+ /*
+ * If the target page is in a different segment, free the buffer space
+ * occupied by the previous segment data. Since pg_waldump never requests
+ * the same WAL bytes twice, moving to a new segment implies the previous
+ * buffer's data and that segment will not be needed again.
+ */
+ curSegNo = state->seg.ws_segno;
+ if (!XLByteInSeg(targetPagePtr, curSegNo, WalSegSz))
+ {
+ char fname[MAXFNAMELEN];
+ XLogSegNo nextSegNo;
+
+ /*
+ * Calculate the next WAL segment to be decoded from the given page
+ * pointer.
+ */
+ XLByteToSeg(targetPagePtr, nextSegNo, WalSegSz);
+ state->seg.ws_tli = private->timeline;
+ state->seg.ws_segno = nextSegNo;
+
+ /*
+ * If in pre-reading mode (prior to actual decoding), do not delete any
+ * entries that might be requested again once the decoding loop starts.
+ * For more details, see the comments in read_archive_wal_page().
+ */
+ if (private->decoding_started && curSegNo < nextSegNo)
+ {
+ XLogFileName(fname, state->seg.ws_tli, curSegNo, WalSegSz);
+ free_archive_wal_entry(fname, private);
+ }
+ }
+
+ /* Read the WAL page from the archive streamer */
+ return read_archive_wal_page(private, targetPagePtr, count, readBuff,
+ WalSegSz);
+}
+
/*
* Boolean to return whether the given WAL record matches a specific relation
* and optionally block.
@@ -777,8 +851,8 @@ usage(void)
printf(_(" -F, --fork=FORK only show records that modify blocks in fork FORK;\n"
" valid names are main, fsm, vm, init\n"));
printf(_(" -n, --limit=N number of records to display\n"));
- printf(_(" -p, --path=PATH directory in which to find WAL segment files or a\n"
- " directory with a ./pg_wal that contains such files\n"
+ printf(_(" -p, --path=PATH tar archive or a directory in which to find WAL segment files or\n"
+ " a directory with a ./pg_wal that contains such files\n"
" (default: current directory, ./pg_wal, $PGDATA/pg_wal)\n"));
printf(_(" -q, --quiet do not print any output, except for errors\n"));
printf(_(" -r, --rmgr=RMGR only show records generated by resource manager RMGR;\n"
@@ -810,7 +884,9 @@ main(int argc, char **argv)
XLogRecord *record;
XLogRecPtr first_record;
char *waldir = NULL;
+ char *walpath = NULL;
char *errormsg;
+ pg_compress_algorithm compression;
static struct option long_options[] = {
{"bkp-details", no_argument, NULL, 'b'},
@@ -868,6 +944,10 @@ main(int argc, char **argv)
private.startptr = InvalidXLogRecPtr;
private.endptr = InvalidXLogRecPtr;
private.endptr_reached = false;
+ private.decoding_started = false;
+ private.archive_name = NULL;
+ private.start_segno = 0;
+ private.end_segno = UINT64_MAX;
config.quiet = false;
config.bkp_details = false;
@@ -943,7 +1023,7 @@ main(int argc, char **argv)
}
break;
case 'p':
- waldir = pg_strdup(optarg);
+ walpath = pg_strdup(optarg);
break;
case 'q':
config.quiet = true;
@@ -1107,10 +1187,19 @@ main(int argc, char **argv)
goto bad_argument;
}
- if (waldir != NULL)
+ if (walpath != NULL)
{
+ /* validate path points to tar archive */
+ if (parse_tar_compress_algorithm(walpath, &compression))
+ {
+ char *fname = NULL;
+
+ split_path(walpath, &waldir, &fname);
+
+ private.archive_name = fname;
+ }
/* validate path points to directory */
- if (!verify_directory(waldir))
+ else if (!verify_directory(walpath))
{
- pg_log_error("could not open directory \"%s\": %m", waldir);
+ pg_log_error("could not open directory \"%s\": %m", walpath);
goto bad_argument;
@@ -1128,6 +1217,17 @@ main(int argc, char **argv)
int fd;
XLogSegNo segno;
+ /*
+ * If a tar archive is passed using the --path option, all other
+ * arguments become unnecessary.
+ */
+ if (private.archive_name)
+ {
+ pg_log_error("unnecessary command-line arguments specified with tar archive (first is \"%s\")",
+ argv[optind]);
+ goto bad_argument;
+ }
+
split_path(argv[optind], &directory, &fname);
if (waldir == NULL && directory != NULL)
@@ -1138,69 +1238,76 @@ main(int argc, char **argv)
pg_fatal("could not open directory \"%s\": %m", waldir);
}
- waldir = identify_target_directory(waldir, fname, &private.segsize);
- fd = open_file_in_directory(waldir, fname);
- if (fd < 0)
- pg_fatal("could not open file \"%s\"", fname);
- close(fd);
-
- /* parse position from file */
- XLogFromFileName(fname, &private.timeline, &segno, private.segsize);
-
- if (!XLogRecPtrIsValid(private.startptr))
- XLogSegNoOffsetToRecPtr(segno, 0, private.segsize, private.startptr);
- else if (!XLByteInSeg(private.startptr, segno, private.segsize))
+ if (fname != NULL && parse_tar_compress_algorithm(fname, &compression))
{
- pg_log_error("start WAL location %X/%08X is not inside file \"%s\"",
- LSN_FORMAT_ARGS(private.startptr),
- fname);
- goto bad_argument;
+ private.archive_name = fname;
}
-
- /* no second file specified, set end position */
- if (!(optind + 1 < argc) && !XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(segno + 1, 0, private.segsize, private.endptr);
-
- /* parse ENDSEG if passed */
- if (optind + 1 < argc)
+ else
{
- XLogSegNo endsegno;
-
- /* ignore directory, already have that */
- split_path(argv[optind + 1], &directory, &fname);
-
+ waldir = identify_target_directory(waldir, fname, &private.segsize);
fd = open_file_in_directory(waldir, fname);
if (fd < 0)
pg_fatal("could not open file \"%s\"", fname);
close(fd);
/* parse position from file */
- XLogFromFileName(fname, &private.timeline, &endsegno, private.segsize);
+ XLogFromFileName(fname, &private.timeline, &segno, private.segsize);
- if (endsegno < segno)
- pg_fatal("ENDSEG %s is before STARTSEG %s",
- argv[optind + 1], argv[optind]);
+ if (!XLogRecPtrIsValid(private.startptr))
+ XLogSegNoOffsetToRecPtr(segno, 0, private.segsize, private.startptr);
+ else if (!XLByteInSeg(private.startptr, segno, private.segsize))
+ {
+ pg_log_error("start WAL location %X/%08X is not inside file \"%s\"",
+ LSN_FORMAT_ARGS(private.startptr),
+ fname);
+ goto bad_argument;
+ }
- if (!XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(endsegno + 1, 0, private.segsize,
- private.endptr);
+ /* no second file specified, set end position */
+ if (!(optind + 1 < argc) && !XLogRecPtrIsValid(private.endptr))
+ XLogSegNoOffsetToRecPtr(segno + 1, 0, private.segsize, private.endptr);
- /* set segno to endsegno for check of --end */
- segno = endsegno;
- }
+ /* parse ENDSEG if passed */
+ if (optind + 1 < argc)
+ {
+ XLogSegNo endsegno;
+ /* ignore directory, already have that */
+ split_path(argv[optind + 1], &directory, &fname);
- if (!XLByteInSeg(private.endptr, segno, private.segsize) &&
- private.endptr != (segno + 1) * private.segsize)
- {
- pg_log_error("end WAL location %X/%08X is not inside file \"%s\"",
- LSN_FORMAT_ARGS(private.endptr),
- argv[argc - 1]);
- goto bad_argument;
+ fd = open_file_in_directory(waldir, fname);
+ if (fd < 0)
+ pg_fatal("could not open file \"%s\"", fname);
+ close(fd);
+
+ /* parse position from file */
+ XLogFromFileName(fname, &private.timeline, &endsegno, private.segsize);
+
+ if (endsegno < segno)
+ pg_fatal("ENDSEG %s is before STARTSEG %s",
+ argv[optind + 1], argv[optind]);
+
+ if (!XLogRecPtrIsValid(private.endptr))
+ XLogSegNoOffsetToRecPtr(endsegno + 1, 0, private.segsize,
+ private.endptr);
+
+ /* set segno to endsegno for check of --end */
+ segno = endsegno;
+ }
+
+ if (!XLByteInSeg(private.endptr, segno, private.segsize) &&
+ private.endptr != (segno + 1) * private.segsize)
+ {
+ pg_log_error("end WAL location %X/%08X is not inside file \"%s\"",
+ LSN_FORMAT_ARGS(private.endptr),
+ argv[argc - 1]);
+ goto bad_argument;
+ }
}
}
- else
- waldir = identify_target_directory(waldir, NULL, &private.segsize);
+ else if (!private.archive_name)
+ waldir = identify_target_directory(walpath, NULL, &private.segsize);
/* we don't know what to print */
if (!XLogRecPtrIsValid(private.startptr))
@@ -1212,12 +1319,36 @@ main(int argc, char **argv)
/* done with argument parsing, do the actual work */
/* we have everything we need, start reading */
- xlogreader_state =
- XLogReaderAllocate(private.segsize, waldir,
- XL_ROUTINE(.page_read = WALDumpReadPage,
- .segment_open = WALDumpOpenSegment,
- .segment_close = WALDumpCloseSegment),
- &private);
+ if (private.archive_name)
+ {
+ /*
+ * A NULL WAL directory indicates that the archive file is located in
+ * the current working directory.
+ */
+ if (waldir == NULL)
+ waldir = pg_strdup(".");
+
+ /* Set up for reading tar file */
+ init_archive_reader(&private, waldir, &private.segsize, compression);
+
+ /* Routine to decode WAL files in tar archive */
+ xlogreader_state =
+ XLogReaderAllocate(private.segsize, waldir,
+ XL_ROUTINE(.page_read = TarWALDumpReadPage,
+ .segment_open = TarWALDumpOpenSegment,
+ .segment_close = TarWALDumpCloseSegment),
+ &private);
+ }
+ else
+ {
+ xlogreader_state =
+ XLogReaderAllocate(private.segsize, waldir,
+ XL_ROUTINE(.page_read = WALDumpReadPage,
+ .segment_open = WALDumpOpenSegment,
+ .segment_close = WALDumpCloseSegment),
+ &private);
+ }
+
if (!xlogreader_state)
pg_fatal("out of memory while allocating a WAL reading processor");
@@ -1245,6 +1376,9 @@ main(int argc, char **argv)
if (config.stats == true && !config.quiet)
stats.startptr = first_record;
+ /* Flag indicating that the decoding loop has been entered */
+ private.decoding_started = true;
+
for (;;)
{
if (time_to_stop)
@@ -1326,6 +1460,9 @@ main(int argc, char **argv)
XLogReaderFree(xlogreader_state);
+ if (private.archive_name)
+ free_archive_reader(&private);
+
return EXIT_SUCCESS;
bad_argument:
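The segment-switch handling in TarWALDumpReadPage() above reduces to a small predicate. A sketch under stated assumptions: `may_free_previous_segment()` is a hypothetical helper, and `byte_to_seg()`/`byte_in_seg()` restate the integer division that XLByteToSeg() and XLByteInSeg() perform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same arithmetic as XLByteToSeg() */
static uint64_t
byte_to_seg(uint64_t lsn, uint64_t seg_size)
{
	return lsn / seg_size;
}

/* Same arithmetic as XLByteInSeg() */
static bool
byte_in_seg(uint64_t lsn, uint64_t segno, uint64_t seg_size)
{
	return lsn / seg_size == segno;
}

/*
 * Hypothetical helper: may the entry for cur_segno be freed when the
 * reader asks for the page at target_page_ptr?
 */
static bool
may_free_previous_segment(uint64_t target_page_ptr, uint64_t cur_segno,
						  uint64_t seg_size, bool decoding_started)
{
	uint64_t	next_segno;

	/* Still in the same segment: nothing to free */
	if (byte_in_seg(target_page_ptr, cur_segno, seg_size))
		return false;

	next_segno = byte_to_seg(target_page_ptr, seg_size);

	/* Keep pre-read segments, and never free on a backward move */
	return decoding_started && cur_segno < next_segno;
}
```

Returning false before decoding_started is set is what keeps pre-read segments available for the main loop, and the `cur_segno < next_segno` test means a backward move never discards anything.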
diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h
index 4f1b2ab668b..90fe96840e2 100644
--- a/src/bin/pg_waldump/pg_waldump.h
+++ b/src/bin/pg_waldump/pg_waldump.h
@@ -12,6 +12,11 @@
#define PG_WALDUMP_H
#include "access/xlogdefs.h"
+#include "fe_utils/astreamer.h"
+
+/* Forward declarations */
+struct ArchivedWALFile;
+struct ArchivedWAL_hash;
/* Contains the necessary information to drive WAL decoding */
typedef struct XLogDumpPrivate
@@ -21,6 +26,44 @@ typedef struct XLogDumpPrivate
XLogRecPtr startptr;
XLogRecPtr endptr;
bool endptr_reached;
+ bool decoding_started;
+
+ /* Fields required to read WAL from archive */
+ char *archive_name; /* Tar archive name */
+ int archive_fd; /* File descriptor for the open tar file */
+
+ astreamer *archive_streamer;
+
+ /* What the archive streamer is currently reading */
+ struct ArchivedWALFile *cur_file;
+
+ /*
+ * Hash table of the WAL files read by the archive streamer that have not
+ * been discarded yet, including the one currently in progress.
+ */
+ struct ArchivedWAL_hash *archive_wal_htab;
+
+ /*
+ * Although these values can be easily derived from startptr and endptr,
+ * doing so repeatedly for each archived member would be inefficient, as
+ * it would involve recalculating and filtering out irrelevant WAL
+ * segments.
+ */
+ XLogSegNo start_segno;
+ XLogSegNo end_segno;
} XLogDumpPrivate;
+extern int open_file_in_directory(const char *directory, const char *fname);
+
+extern void init_archive_reader(XLogDumpPrivate *privateInfo,
+ const char *waldir, int *WalSegSz,
+ pg_compress_algorithm compression);
+extern void free_archive_reader(XLogDumpPrivate *privateInfo);
+extern int read_archive_wal_page(XLogDumpPrivate *privateInfo,
+ XLogRecPtr targetPagePtr,
+ Size count, char *readBuff,
+ int WalSegSz);
+extern void free_archive_wal_entry(const char *fname,
+ XLogDumpPrivate *privateInfo);
+
#endif /* end of PG_WALDUMP_H */
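The start_segno/end_segno filter applied to each archive member amounts to parsing the segment name and doing a range check. A minimal sketch, assuming the name is three 8-hex-digit fields (timeline, "log", "seg") as XLogFromFileName() parses them; `parse_wal_fname()` and `segment_is_relevant()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Parse a WAL segment file name the way XLogFromFileName() does: timeline,
 * then "log" and "seg" parts, 8 hex digits each.
 */
static bool
parse_wal_fname(const char *fname, uint64_t seg_size,
				uint32_t *tli, uint64_t *segno)
{
	unsigned int t,
				log_id,
				seg_id;

	if (sscanf(fname, "%08X%08X%08X", &t, &log_id, &seg_id) != 3)
		return false;

	*tli = t;
	/* 0x100000000 / seg_size segments per "log" value */
	*segno = (uint64_t) log_id * (UINT64_C(0x100000000) / seg_size) + seg_id;
	return true;
}

/* Hypothetical filter: same timeline and start_segno <= segno <= end_segno */
static bool
segment_is_relevant(const char *fname, uint64_t seg_size, uint32_t want_tli,
					uint64_t start_segno, uint64_t end_segno)
{
	uint32_t	tli;
	uint64_t	segno;

	if (!parse_wal_fname(fname, seg_size, &tli, &segno))
		return false;

	return tli == want_tli && segno >= start_segno && segno <= end_segno;
}
```

With 16 MB segments there are 256 segments per "log" value, so 000000010000000100000002 is segment 258. A member on a different timeline, or outside [start_segno, end_segno], is rejected before any of its bytes are buffered.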
diff --git a/src/bin/pg_waldump/t/001_basic.pl b/src/bin/pg_waldump/t/001_basic.pl
index f12ba52cbfc..9ab7457e9e2 100644
--- a/src/bin/pg_waldump/t/001_basic.pl
+++ b/src/bin/pg_waldump/t/001_basic.pl
@@ -3,10 +3,13 @@
use strict;
use warnings FATAL => 'all';
+use Cwd;
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Test::More;
+my $tar = $ENV{TAR};
+
program_help_ok('pg_waldump');
program_version_ok('pg_waldump');
program_options_handling_ok('pg_waldump');
@@ -162,6 +165,42 @@ CREATE TABLESPACE ts1 LOCATION '$tblspc_path';
DROP TABLESPACE ts1;
});
+# Test: Decode a continuation record (contrecord) that spans multiple WAL
+# segments.
+#
+# First, consume all remaining room in the current WAL segment, leaving
+# only enough space for the start of a largish record.
+$node->safe_psql(
+ 'postgres', q{
+DO $$
+DECLARE
+ wal_segsize int := setting::int FROM pg_settings WHERE name = 'wal_segment_size';
+ remain int;
+ iters int := 0;
+BEGIN
+ LOOP
+ INSERT into t1(b)
+ select repeat(encode(sha256(g::text::bytea), 'hex'), (random() * 15 + 1)::int)
+ from generate_series(1, 10) g;
+
+ remain := wal_segsize - (pg_current_wal_insert_lsn() - '0/0') % wal_segsize;
+ IF remain < 2 * setting::int from pg_settings where name = 'block_size' THEN
+ RAISE log 'exiting after % iterations, % bytes to end of WAL segment', iters, remain;
+ EXIT;
+ END IF;
+ iters := iters + 1;
+ END LOOP;
+END
+$$;
+});
+
+my $contrecord_lsn = $node->safe_psql('postgres',
+ 'SELECT pg_current_wal_insert_lsn()');
+# Emit a large message whose record continues into the next WAL segment
+$node->safe_psql('postgres',
+ qq{SELECT pg_logical_emit_message(true, 'test 026', repeat('xyzxz', 123456))}
+);
+
my ($end_lsn, $end_walfile) = split /\|/,
$node->safe_psql('postgres',
q{SELECT pg_current_wal_insert_lsn(), pg_walfile_name(pg_current_wal_insert_lsn())}
@@ -259,11 +298,50 @@ sub test_pg_waldump
return @lines;
}
-my @lines;
+# Create a tar archive of the given directory, with members in sorted order
+sub generate_archive
+{
+ my ($archive, $directory, $compression_flags) = @_;
+
+ my @files;
+ opendir my $dh, $directory or die "opendir: $!";
+ while (my $entry = readdir $dh) {
+ # Skip '.' and '..'
+ next if $entry eq '.' || $entry eq '..';
+ push @files, $entry;
+ }
+ closedir $dh;
+
+ @files = sort @files;
+
+ # move into the WAL directory before archiving files
+ my $cwd = getcwd;
+ chdir($directory) || die "chdir: $!";
+ command_ok([$tar, $compression_flags, $archive, @files]);
+ chdir($cwd) || die "chdir: $!";
+}
+
+my $tmp_dir = PostgreSQL::Test::Utils::tempdir_short();
my @scenarios = (
{
- 'path' => $node->data_dir
+ 'path' => $node->data_dir,
+ 'is_archive' => 0,
+ 'enabled' => 1
+ },
+ {
+ 'path' => "$tmp_dir/pg_wal.tar",
+ 'compression_method' => 'none',
+ 'compression_flags' => '-cf',
+ 'is_archive' => 1,
+ 'enabled' => 1
+ },
+ {
+ 'path' => "$tmp_dir/pg_wal.tar.gz",
+ 'compression_method' => 'gzip',
+ 'compression_flags' => '-czf',
+ 'is_archive' => 1,
+ 'enabled' => check_pg_config("#define HAVE_LIBZ 1")
});
for my $scenario (@scenarios)
@@ -272,6 +350,19 @@ for my $scenario (@scenarios)
SKIP:
{
+ skip "tar command is not available", 3
+ if !defined $tar && $scenario->{'is_archive'};
+ skip "$scenario->{'compression_method'} compression not supported by this build", 3
+ if !$scenario->{'enabled'} && $scenario->{'is_archive'};
+
+ # create pg_wal archive
+ if ($scenario->{'is_archive'})
+ {
+ generate_archive($path,
+ $node->data_dir . '/pg_wal',
+ $scenario->{'compression_flags'});
+ }
+
command_fails_like(
[ 'pg_waldump', '--path' => $path ],
qr/error: no start WAL location given/,
@@ -305,9 +396,14 @@ for my $scenario (@scenarios)
test_pg_waldump_skip_bytes($path, $start_lsn, $end_lsn);
- @lines = test_pg_waldump($path, $start_lsn, $end_lsn);
+ my @lines = test_pg_waldump($path, $start_lsn, $end_lsn);
is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
+ @lines = test_pg_waldump($path, $contrecord_lsn, $end_lsn);
+ is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines (contrecord)');
+
+ test_pg_waldump_skip_bytes($path, $contrecord_lsn, $end_lsn);
+
@lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--limit' => 6);
is(@lines, 6, 'limit option observed');
@@ -337,6 +433,9 @@ for my $scenario (@scenarios)
'--relation' => "$default_ts_oid/$postgres_db_oid/$rel_i1a_oid",
'--block' => 1);
is(grep(!/\bblk 1\b/, @lines), 0, 'only lines for selected block');
+
+ # Cleanup.
+ unlink $path if $scenario->{'is_archive'};
}
}
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 241945734ec..18ab2e848b6 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -145,6 +145,8 @@ ArchiveOpts
ArchiveShutdownCB
ArchiveStartupCB
ArchiveStreamState
+ArchivedWALFile
+ArchivedWAL_hash
ArchiverOutput
ArchiverStage
ArrayAnalyzeExtraData
@@ -3511,6 +3513,7 @@ astreamer_recovery_injector
astreamer_tar_archiver
astreamer_tar_parser
astreamer_verify
+astreamer_waldump
astreamer_zstd_frame
auth_password_hook_typ
autovac_table
--
2.47.1
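The PL/pgSQL loop in the 001_basic.pl hunk above keeps inserting rows until fewer than two blocks remain before the end of the current WAL segment, so the following large pg_logical_emit_message() record has to continue into the next segment. A small standalone C sketch of that arithmetic follows; the helper names and the lsn_t alias are hypothetical, and page and record header overhead is deliberately ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* An LSN is a 64-bit byte position in the WAL stream (hypothetical alias). */
typedef uint64_t lsn_t;

/* Segment number containing the given LSN. */
static uint64_t
segno_of(lsn_t lsn, uint64_t seg_size)
{
	return lsn / seg_size;
}

/*
 * Bytes between the LSN and the end of its segment; same formula as
 * "wal_segsize - (lsn - '0/0') % wal_segsize" in the test.  At an exact
 * segment boundary this yields a full segment, not zero.
 */
static uint64_t
bytes_to_segment_end(lsn_t lsn, uint64_t seg_size)
{
	return seg_size - lsn % seg_size;
}

/*
 * Would a record of rec_len bytes starting at lsn continue into the next
 * segment?  Page and record headers are ignored; they only add bytes, so
 * they can only make a crossing more likely.
 */
static bool
record_crosses_segment(lsn_t lsn, uint64_t rec_len, uint64_t seg_size)
{
	return rec_len > bytes_to_segment_end(lsn, seg_size);
}
```

With a 16 MB segment and 8000 bytes left, the roughly 600 kB message payload (repeat('xyzxz', 123456)) cannot fit, which is what forces the contrecord the test decodes.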
[application/x-patch] v13-0007-pg_waldump-Remove-the-restriction-on-the-order-o.patch (13.1K, 8-v13-0007-pg_waldump-Remove-the-restriction-on-the-order-o.patch)
From c36ba0507caaf04b4377cc83c4e63be187943367 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Tue, 27 Jan 2026 15:38:34 +0530
Subject: [PATCH v13 07/10] pg_waldump: Remove the restriction on the order of
archived WAL files.
With the previous patch, pg_waldump would stop decoding if the WAL files
in the archive were not in the required sequence. With this patch,
decoding continues instead: any WAL file that is encountered out of
order is written to a temporary location and read from there when it is
needed. Once a temporary file has been read, it is removed.
---
doc/src/sgml/ref/pg_waldump.sgml | 8 +-
src/bin/pg_waldump/archive_waldump.c | 171 +++++++++++++++++++++++++--
src/bin/pg_waldump/pg_waldump.c | 32 ++++-
src/bin/pg_waldump/pg_waldump.h | 3 +
src/bin/pg_waldump/t/001_basic.pl | 3 +-
5 files changed, 197 insertions(+), 20 deletions(-)
diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index d004bb0f67e..27adf77755c 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -149,8 +149,12 @@ PostgreSQL documentation
of <envar>PGDATA</envar>.
</para>
<para>
- If a tar archive is provided, its WAL segment files must be in
- sequential order; otherwise, an error will be reported.
+ If a tar archive is provided and its WAL segment files are not in
+ sequential order, the out-of-order files are written to a temporary
+ directory whose name starts with <filename>waldump_tmp</filename>. This
+ directory is created inside the directory specified by the
+ <envar>TMPDIR</envar> environment variable if it is set; otherwise, it is
+ created in the same directory as the tar archive.
</para>
</listitem>
</varlistentry>
diff --git a/src/bin/pg_waldump/archive_waldump.c b/src/bin/pg_waldump/archive_waldump.c
index ecc022a81a2..84fae87492e 100644
--- a/src/bin/pg_waldump/archive_waldump.c
+++ b/src/bin/pg_waldump/archive_waldump.c
@@ -17,6 +17,7 @@
#include <unistd.h>
#include "access/xlog_internal.h"
+#include "common/file_perm.h"
#include "common/hashfn.h"
#include "common/logging.h"
#include "fe_utils/simple_list.h"
@@ -27,6 +28,9 @@
*/
#define READ_CHUNK_SIZE (128 * 1024)
+/* Directory holding temporarily spilled (out-of-order) WAL files */
+char *TmpWalSegDir = NULL;
+
/*
* Check if the start segment number is zero; this indicates a request to read
* any WAL file.
@@ -57,6 +61,8 @@ typedef struct ArchivedWALFile
const char *fname; /* hash key: WAL segment name */
StringInfo buf; /* holds WAL bytes read from archive */
+ bool spilled; /* true if the WAL data was spilled to a
+ * temporary file */
int read_len; /* total bytes of a WAL read from archive */
} ArchivedWALFile;
@@ -84,6 +90,11 @@ static ArchivedWALFile *get_archive_wal_entry(const char *fname,
XLogDumpPrivate *privateInfo,
int WalSegSz);
static int read_archive_file(XLogDumpPrivate *privateInfo, Size count);
+static void setup_tmpwal_dir(const char *waldir);
+static void cleanup_tmpwal_dir_atexit(void);
+
+static FILE *prepare_tmp_write(const char *fname);
+static void perform_tmp_write(const char *fname, StringInfo buf, FILE *file);
static astreamer *astreamer_waldump_new(XLogDumpPrivate *privateInfo);
static void astreamer_waldump_content(astreamer *streamer,
@@ -106,7 +117,9 @@ static const astreamer_ops astreamer_waldump_ops = {
/*
* Initializes the tar archive reader, creates a hash table for WAL entries,
* checks for existing valid WAL segments in the archive file and retrieves the
- * segment size, and sets up filters for relevant entries.
+ * segment size, and sets up filters for relevant entries. It also configures a
+ * temporary directory for out-of-order WAL data and registers an exit callback
+ * to clean up temporary files.
*/
void
init_archive_reader(XLogDumpPrivate *privateInfo, const char *waldir,
@@ -199,6 +212,13 @@ init_archive_reader(XLogDumpPrivate *privateInfo, const char *waldir,
privateInfo->start_segno > segno ||
privateInfo->end_segno < segno)
free_archive_wal_entry(entry->fname, privateInfo);
+
+ /*
+ * Set up a temporary directory to store WAL segments and register an exit
+ * callback to remove it upon completion.
+ */
+ setup_tmpwal_dir(waldir);
+ atexit(cleanup_tmpwal_dir_atexit);
}
/*
@@ -365,6 +385,17 @@ free_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo)
destroyStringInfo(entry->buf);
entry->buf = NULL;
+ /* Remove temporary file if any */
+ if (entry->spilled)
+ {
+ char fpath[MAXPGPATH];
+
+ snprintf(fpath, MAXPGPATH, "%s/%s", TmpWalSegDir, fname);
+
+ if (unlink(fpath) == 0)
+ pg_log_debug("removed file \"%s\"", fpath);
+ }
+
/* Set cur_file to NULL if it matches the entry being ignored */
if (privateInfo->cur_file == entry)
privateInfo->cur_file = NULL;
@@ -376,12 +407,16 @@ free_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo)
* Returns the archived WAL entry from the hash table if it exists. Otherwise,
* it invokes the routine to read the archived file, which then populates the
* entry in the hash table if that WAL exists in the archive.
+ * If the archive streamer is reading a WAL file that is needed later but not
+ * yet, its data is written to a temporary file so it can be read back when
+ * requested.
*/
static ArchivedWALFile *
get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo,
int WalSegSz)
{
ArchivedWALFile *entry = NULL;
+ FILE *write_fp = NULL;
/* Search hash table */
entry = ArchivedWAL_lookup(privateInfo->archive_wal_htab, fname);
@@ -395,28 +430,59 @@ get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo,
*/
while (1)
{
+ /*
+ * The WAL file entry currently being processed may change during
+ * archive streamer execution. Therefore, maintain a local variable to
+ * reference the previous entry, ensuring that any remaining data in
+ * its buffer is successfully flushed to the temporary file before
+ * switching to the next WAL entry.
+ */
+ entry = privateInfo->cur_file;
+
/* Fetch more data */
- if (read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
- break; /* archive file ended */
+ if (entry == NULL || entry->buf->len == 0)
+ {
+ if (read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
+ break; /* archive file ended */
+ }
/*
* Archived streamer is reading a non-WAL file or an irrelevant WAL
* file.
*/
- if (privateInfo->cur_file == NULL)
+ if (entry == NULL)
continue;
- entry = privateInfo->cur_file;
-
/* Found the required entry */
if (strcmp(fname, entry->fname) == 0)
return entry;
- /* WAL segments must be archived in order */
- pg_log_error("WAL files are not archived in sequential order");
- pg_log_error_detail("Expecting segment \"%s\" but found \"%s\".",
- fname, entry->fname);
- exit(1);
+ /*
+ * Archive streamer is currently reading a file that isn't the one
+ * asked for, but it's required in the future. It should be written to
+ * a temporary location for retrieval when needed.
+ */
+
+ /* Create a temporary file if one does not already exist */
+ if (!entry->spilled)
+ {
+ write_fp = prepare_tmp_write(entry->fname);
+ entry->spilled = true;
+ }
+
+ /* Flush data from the buffer to the file */
+ perform_tmp_write(entry->fname, entry->buf, write_fp);
+ resetStringInfo(entry->buf);
+
+ /*
+ * The change in the current segment entry indicates that the reading
+ * of this file has ended.
+ */
+ if (entry != privateInfo->cur_file && write_fp != NULL)
+ {
+ fclose(write_fp);
+ write_fp = NULL;
+ }
}
/* Requested WAL segment not found */
@@ -454,7 +520,88 @@ read_archive_file(XLogDumpPrivate *privateInfo, Size count)
}
/*
- * Create an astreamer that can read WAL from a tar file.
+ * Set up a temporary directory for storing out-of-order WAL segments.
+ */
+static void
+setup_tmpwal_dir(const char *waldir)
+{
+ char *template;
+
+ /*
+ * Use the directory specified by the TMPDIR environment variable. If it's
+ * not set, use the provided WAL directory to hold the extracted WAL files
+ * temporarily.
+ */
+ template = psprintf("%s/waldump_tmp-XXXXXX",
+ getenv("TMPDIR") ? getenv("TMPDIR") : waldir);
+ TmpWalSegDir = mkdtemp(template);
+
+ if (TmpWalSegDir == NULL)
+ pg_fatal("could not create directory \"%s\": %m", template);
+
+ canonicalize_path(TmpWalSegDir);
+
+ pg_log_debug("created directory \"%s\"", TmpWalSegDir);
+}
+
+/*
+ * Remove temporary directory at exit, if any.
+ */
+static void
+cleanup_tmpwal_dir_atexit(void)
+{
+ rmtree(TmpWalSegDir, true);
+}
+
+/*
+ * Create an empty placeholder file and return its handle.
+ */
+static FILE *
+prepare_tmp_write(const char *fname)
+{
+ char fpath[MAXPGPATH];
+ FILE *file;
+
+ snprintf(fpath, MAXPGPATH, "%s/%s", TmpWalSegDir, fname);
+
+ /* Create an empty placeholder */
+ file = fopen(fpath, PG_BINARY_W);
+ if (file == NULL)
+ pg_fatal("could not create file \"%s\": %m", fpath);
+
+#ifndef WIN32
+ if (chmod(fpath, pg_file_create_mode))
+ pg_fatal("could not set permissions on file \"%s\": %m",
+ fpath);
+#endif
+
+ pg_log_debug("spilling to temporary file \"%s\"", fpath);
+
+ return file;
+}
+
+/*
+ * Write buffer data to the given file handle.
+ */
+static void
+perform_tmp_write(const char *fname, StringInfo buf, FILE *file)
+{
+ Assert(file);
+
+ errno = 0;
+ if (buf->len > 0 && fwrite(buf->data, buf->len, 1, file) != 1)
+ {
+ /*
+ * If write didn't set errno, assume problem is no disk space
+ */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_fatal("could not write to file \"%s\": %m", fname);
+ }
+}
+
+/*
+ * Create an astreamer that can read WAL from a tar file.
*/
static astreamer *
astreamer_waldump_new(XLogDumpPrivate *privateInfo)
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 90fc13f3609..114969217d8 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -478,10 +478,14 @@ TarWALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
return -1;
/*
- * If the target page is in a different segment, free the buffer space
- * occupied by the previous segment data. Since pg_waldump never requests
- * the same WAL bytes twice, moving to a new segment implies the previous
- * buffer's data and that segment will not be needed again.
+ * If the target page is in a different segment, free the buffer and/or
+ * temporary file disk space occupied by the previous segment's data.
+ * Since pg_waldump never requests the same WAL bytes twice, moving to a
+ * new segment implies the previous buffer's data and that segment will
+ * not be needed again.
+ *
+ * Afterward, check whether the next required WAL segment already exists
+ * in the temporary directory before invoking the archive streamer.
*/
curSegNo = state->seg.ws_segno;
if (!XLByteInSeg(targetPagePtr, curSegNo, WalSegSz))
@@ -497,6 +501,13 @@ TarWALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
state->seg.ws_tli = private->timeline;
state->seg.ws_segno = nextSegNo;
+ /* Close the WAL segment file if it is currently open */
+ if (state->seg.ws_file >= 0)
+ {
+ close(state->seg.ws_file);
+ state->seg.ws_file = -1;
+ }
+
/*
* If in pre-reading mode (prior to actual decoding), do not delete any
* entries that might be requested again once the decoding loop starts.
@@ -507,9 +518,20 @@ TarWALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
XLogFileName(fname, state->seg.ws_tli, curSegNo, WalSegSz);
free_archive_wal_entry(fname, private);
}
+
+ /*
+ * If the next segment was spilled to a temporary file, open it for reading
+ */
+ XLogFileName(fname, state->seg.ws_tli, nextSegNo, WalSegSz);
+ state->seg.ws_file = open_file_in_directory(TmpWalSegDir, fname);
}
- /* Read the WAL page from the archive streamer */
+ /* Continue reading from the open WAL segment, if any */
+ if (state->seg.ws_file >= 0)
+ return WALDumpReadPage(state, targetPagePtr, count, targetPtr,
+ readBuff);
+
+ /* Otherwise, read the WAL page from the archive streamer */
return read_archive_wal_page(private, targetPagePtr, count, readBuff,
WalSegSz);
}
diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h
index 90fe96840e2..75ed0f37538 100644
--- a/src/bin/pg_waldump/pg_waldump.h
+++ b/src/bin/pg_waldump/pg_waldump.h
@@ -18,6 +18,9 @@
struct ArchivedWALFile;
struct ArchivedWAL_hash;
+/* Temporary directory */
+extern char *TmpWalSegDir;
+
/* Contains the necessary information to drive WAL decoding */
typedef struct XLogDumpPrivate
{
diff --git a/src/bin/pg_waldump/t/001_basic.pl b/src/bin/pg_waldump/t/001_basic.pl
index 9ab7457e9e2..9854c939007 100644
--- a/src/bin/pg_waldump/t/001_basic.pl
+++ b/src/bin/pg_waldump/t/001_basic.pl
@@ -7,6 +7,7 @@ use Cwd;
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Test::More;
+use List::Util qw(shuffle);
my $tar = $ENV{TAR};
@@ -312,7 +313,7 @@ sub generate_archive
}
closedir $dh;
- @files = sort @files;
+ @files = shuffle @files;
# move into the WAL directory before archiving files
my $cwd = getcwd;
--
2.47.1
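The spill-and-reload strategy described in the 0007 commit message can be modeled in isolation. The following C sketch is a simplified model, not the patch's code: ArchiveModel, fetch_segment() and the Source enum are hypothetical names, an array of segment numbers stands in for the tar member order, and the spilled[] flags stand in for the files written under the waldump_tmp-XXXXXX directory. It assumes, as the patch comments state, that pg_waldump requests segments in increasing order and never asks for the same bytes twice.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_SEGS 64				/* model limit: segment numbers < MAX_SEGS */

typedef struct ArchiveModel
{
	const int  *members;		/* segment numbers, in tar member order */
	size_t		nmembers;
	size_t		pos;			/* next member the streamer will read */
	int			end_seg;		/* last segment pg_waldump will request */
	bool		spilled[MAX_SEGS];	/* stand-in for waldump_tmp-* files */
	int			nspilled;		/* how many segments were ever spilled */
} ArchiveModel;

typedef enum
{
	FROM_STREAM,				/* found while streaming the archive */
	FROM_SPILL,					/* read back from a temporary file */
	NOT_FOUND					/* archive ended without the segment */
} Source;

/*
 * Return where segment "segno" is read from.  Callers request segments in
 * increasing order, as pg_waldump does.
 */
static Source
fetch_segment(ArchiveModel *am, int segno)
{
	if (segno < 0 || segno >= MAX_SEGS)
		return NOT_FOUND;

	/* Spilled earlier: read it back, then delete it (like unlink()). */
	if (am->spilled[segno])
	{
		am->spilled[segno] = false;
		return FROM_SPILL;
	}

	/* Otherwise, keep streaming until the requested member shows up. */
	while (am->pos < am->nmembers)
	{
		int			seen = am->members[am->pos++];

		if (seen == segno)
			return FROM_STREAM;

		/* Needed later: spill it.  Outside the range: just discard it. */
		if (seen > segno && seen <= am->end_seg && seen < MAX_SEGS)
		{
			am->spilled[seen] = true;
			am->nspilled++;
		}
	}
	return NOT_FOUND;
}
```

For a shuffled archive such as {5, 3, 4, 2} with segments 2 through 5 requested, segments 5, 3 and 4 are spilled while the streamer looks for 2, and are then served from the spill area; a sorted archive never spills anything. This is the situation the 0007 test change (shuffle instead of sort) is meant to exercise.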
[application/x-patch] v13-0008-pg_verifybackup-Delay-default-WAL-directory-prep.patch (1.7K, 9-v13-0008-pg_verifybackup-Delay-default-WAL-directory-prep.patch)
From 15ab0e423bffb5985bea622aa7c243cbd3316c6b Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Wed, 16 Jul 2025 14:47:43 +0530
Subject: [PATCH v13 08/10] pg_verifybackup: Delay default WAL directory
preparation.
Whether WAL should be parsed from a directory or from an archive is not
known until the backup format has been determined. Therefore, delay
preparing the default WAL directory until WAL is about to be parsed.
This delay is harmless, as the WAL directory is not used elsewhere.
---
src/bin/pg_verifybackup/pg_verifybackup.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 5ddc4c33feb..04cca3bc0f5 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -285,10 +285,6 @@ main(int argc, char **argv)
manifest_path = psprintf("%s/backup_manifest",
context.backup_directory);
- /* By default, look for the WAL in the backup directory, too. */
- if (wal_directory == NULL)
- wal_directory = psprintf("%s/pg_wal", context.backup_directory);
-
/*
* Try to read the manifest. We treat any errors encountered while parsing
* the manifest as fatal; there doesn't seem to be much point in trying to
@@ -368,6 +364,10 @@ main(int argc, char **argv)
if (context.format == 'p' && !context.skip_checksums)
verify_backup_checksums(&context);
+ /* By default, look for the WAL in the backup directory, too. */
+ if (wal_directory == NULL)
+ wal_directory = psprintf("%s/pg_wal", context.backup_directory);
+
/*
* Try to parse the required ranges of WAL records, unless we were told
* not to do so.
--
2.47.1
[application/x-patch] v13-0009-pg_verifybackup-Rename-the-wal-directory-switch-.patch (5.9K, 10-v13-0009-pg_verifybackup-Rename-the-wal-directory-switch-.patch)
From c9d1a4072da9bfa132e2e99238ddbb7e318e933b Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Tue, 25 Nov 2025 17:32:14 +0530
Subject: [PATCH v13 09/10] pg_verifybackup: Rename the wal-directory switch to
wal-path
With the previous patches, pg_waldump can now decode WAL directly from
tar files. This means a tar archive path can be specified instead of a
traditional WAL directory.
To keep things consistent and more versatile, we should also
generalize the input switch for pg_verifybackup. It should accept
either a directory or a tar file path that contains WAL files. This
change also aligns it with the existing manifest-path switch naming.
== NOTE ==
The corresponding PO files require updating due to this change.
---
doc/src/sgml/ref/pg_verifybackup.sgml | 2 +-
src/bin/pg_verifybackup/pg_verifybackup.c | 22 +++++++++++-----------
src/bin/pg_verifybackup/t/007_wal.pl | 4 ++--
3 files changed, 14 insertions(+), 14 deletions(-)
diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml
index 61c12975e4a..e9b8bfd51b1 100644
--- a/doc/src/sgml/ref/pg_verifybackup.sgml
+++ b/doc/src/sgml/ref/pg_verifybackup.sgml
@@ -261,7 +261,7 @@ PostgreSQL documentation
<varlistentry>
<term><option>-w <replaceable class="parameter">path</replaceable></option></term>
- <term><option>--wal-directory=<replaceable class="parameter">path</replaceable></option></term>
+ <term><option>--wal-path=<replaceable class="parameter">path</replaceable></option></term>
<listitem>
<para>
Try to parse WAL files stored in the specified directory, rather than
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 04cca3bc0f5..e149ca96050 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -93,7 +93,7 @@ static void verify_file_checksum(verifier_context *context,
uint8 *buffer);
static void parse_required_wal(verifier_context *context,
char *pg_waldump_path,
- char *wal_directory);
+ char *wal_path);
static astreamer *create_archive_verifier(verifier_context *context,
char *archive_name,
Oid tblspc_oid,
@@ -126,7 +126,7 @@ main(int argc, char **argv)
{"progress", no_argument, NULL, 'P'},
{"quiet", no_argument, NULL, 'q'},
{"skip-checksums", no_argument, NULL, 's'},
- {"wal-directory", required_argument, NULL, 'w'},
+ {"wal-path", required_argument, NULL, 'w'},
{NULL, 0, NULL, 0}
};
@@ -135,7 +135,7 @@ main(int argc, char **argv)
char *manifest_path = NULL;
bool no_parse_wal = false;
bool quiet = false;
- char *wal_directory = NULL;
+ char *wal_path = NULL;
char *pg_waldump_path = NULL;
DIR *dir;
@@ -221,8 +221,8 @@ main(int argc, char **argv)
context.skip_checksums = true;
break;
case 'w':
- wal_directory = pstrdup(optarg);
- canonicalize_path(wal_directory);
+ wal_path = pstrdup(optarg);
+ canonicalize_path(wal_path);
break;
default:
/* getopt_long already emitted a complaint */
@@ -365,15 +365,15 @@ main(int argc, char **argv)
verify_backup_checksums(&context);
/* By default, look for the WAL in the backup directory, too. */
- if (wal_directory == NULL)
- wal_directory = psprintf("%s/pg_wal", context.backup_directory);
+ if (wal_path == NULL)
+ wal_path = psprintf("%s/pg_wal", context.backup_directory);
/*
* Try to parse the required ranges of WAL records, unless we were told
* not to do so.
*/
if (!no_parse_wal)
- parse_required_wal(&context, pg_waldump_path, wal_directory);
+ parse_required_wal(&context, pg_waldump_path, wal_path);
/*
* If everything looks OK, tell the user this, unless we were asked to
@@ -1188,7 +1188,7 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
*/
static void
parse_required_wal(verifier_context *context, char *pg_waldump_path,
- char *wal_directory)
+ char *wal_path)
{
manifest_data *manifest = context->manifest;
manifest_wal_range *this_wal_range = manifest->first_wal_range;
@@ -1198,7 +1198,7 @@ parse_required_wal(verifier_context *context, char *pg_waldump_path,
char *pg_waldump_cmd;
pg_waldump_cmd = psprintf("\"%s\" --quiet --path=\"%s\" --timeline=%u --start=%X/%08X --end=%X/%08X\n",
- pg_waldump_path, wal_directory, this_wal_range->tli,
+ pg_waldump_path, wal_path, this_wal_range->tli,
LSN_FORMAT_ARGS(this_wal_range->start_lsn),
LSN_FORMAT_ARGS(this_wal_range->end_lsn));
fflush(NULL);
@@ -1366,7 +1366,7 @@ usage(void)
printf(_(" -P, --progress show progress information\n"));
printf(_(" -q, --quiet do not print any output, except for errors\n"));
printf(_(" -s, --skip-checksums skip checksum verification\n"));
- printf(_(" -w, --wal-directory=PATH use specified path for WAL files\n"));
+ printf(_(" -w, --wal-path=PATH use specified path for WAL files\n"));
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -?, --help show this help, then exit\n"));
printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
diff --git a/src/bin/pg_verifybackup/t/007_wal.pl b/src/bin/pg_verifybackup/t/007_wal.pl
index 79087a1f6be..8ad2234453d 100644
--- a/src/bin/pg_verifybackup/t/007_wal.pl
+++ b/src/bin/pg_verifybackup/t/007_wal.pl
@@ -42,10 +42,10 @@ command_ok([ 'pg_verifybackup', '--no-parse-wal', $backup_path ],
command_ok(
[
'pg_verifybackup',
- '--wal-directory' => $relocated_pg_wal,
+ '--wal-path' => $relocated_pg_wal,
$backup_path
],
- '--wal-directory can be used to specify WAL directory');
+ '--wal-path can be used to specify WAL directory');
# Move directory back to original location.
rename($relocated_pg_wal, $original_pg_wal) || die "rename pg_wal back: $!";
--
2.47.1
[application/x-patch] v13-0010-pg_verifybackup-enabled-WAL-parsing-for-tar-form.patch (9.9K, 11-v13-0010-pg_verifybackup-enabled-WAL-parsing-for-tar-form.patch)
From 2083da7ca8f5964606a8ec02d58b95ce4e58002e Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Tue, 25 Nov 2025 17:34:26 +0530
Subject: [PATCH v13 10/10] pg_verifybackup: enabled WAL parsing for tar-format
backup
Now that pg_waldump supports decoding WAL from tar archives, use this
functionality to remove the previous restriction on WAL parsing for
tar-format backups.
---
doc/src/sgml/ref/pg_verifybackup.sgml | 5 +-
src/bin/pg_verifybackup/pg_verifybackup.c | 66 +++++++++++++------
src/bin/pg_verifybackup/t/002_algorithm.pl | 4 --
src/bin/pg_verifybackup/t/003_corruption.pl | 4 +-
src/bin/pg_verifybackup/t/008_untar.pl | 5 +-
src/bin/pg_verifybackup/t/010_client_untar.pl | 5 +-
6 files changed, 50 insertions(+), 39 deletions(-)
diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml
index e9b8bfd51b1..16b50b5a4df 100644
--- a/doc/src/sgml/ref/pg_verifybackup.sgml
+++ b/doc/src/sgml/ref/pg_verifybackup.sgml
@@ -36,10 +36,7 @@ PostgreSQL documentation
<literal>backup_manifest</literal> generated by the server at the time
of the backup. The backup may be stored either in the "plain" or the "tar"
format; this includes tar-format backups compressed with any algorithm
- supported by <application>pg_basebackup</application>. However, at present,
- <literal>WAL</literal> verification is supported only for plain-format
- backups. Therefore, if the backup is stored in tar-format, the
- <literal>-n, --no-parse-wal</literal> option should be used.
+ supported by <application>pg_basebackup</application>.
</para>
<para>
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index e149ca96050..8dee0043bab 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -74,7 +74,9 @@ pg_noreturn static void report_manifest_error(JsonManifestParseContext *context,
const char *fmt,...)
pg_attribute_printf(2, 3);
-static void verify_tar_backup(verifier_context *context, DIR *dir);
+static void verify_tar_backup(verifier_context *context, DIR *dir,
+ char **base_archive_path,
+ char **wal_archive_path);
static void verify_plain_backup_directory(verifier_context *context,
char *relpath, char *fullpath,
DIR *dir);
@@ -83,7 +85,9 @@ static void verify_plain_backup_file(verifier_context *context, char *relpath,
static void verify_control_file(const char *controlpath,
uint64 manifest_system_identifier);
static void precheck_tar_backup_file(verifier_context *context, char *relpath,
- char *fullpath, SimplePtrList *tarfiles);
+ char *fullpath, SimplePtrList *tarfiles,
+ char **base_archive_path,
+ char **wal_archive_path);
static void verify_tar_file(verifier_context *context, char *relpath,
char *fullpath, astreamer *streamer);
static void report_extra_backup_files(verifier_context *context);
@@ -136,6 +140,8 @@ main(int argc, char **argv)
bool no_parse_wal = false;
bool quiet = false;
char *wal_path = NULL;
+ char *base_archive_path = NULL;
+ char *wal_archive_path = NULL;
char *pg_waldump_path = NULL;
DIR *dir;
@@ -327,17 +333,6 @@ main(int argc, char **argv)
pfree(path);
}
- /*
- * XXX: In the future, we should consider enhancing pg_waldump to read WAL
- * files from an archive.
- */
- if (!no_parse_wal && context.format == 't')
- {
- pg_log_error("pg_waldump cannot read tar files");
- pg_log_error_hint("You must use -n/--no-parse-wal when verifying a tar-format backup.");
- exit(1);
- }
-
/*
* Perform the appropriate type of verification appropriate based on the
* backup format. This will close 'dir'.
@@ -346,7 +341,7 @@ main(int argc, char **argv)
verify_plain_backup_directory(&context, NULL, context.backup_directory,
dir);
else
- verify_tar_backup(&context, dir);
+ verify_tar_backup(&context, dir, &base_archive_path, &wal_archive_path);
/*
* The "matched" flag should now be set on every entry in the hash table.
@@ -364,9 +359,28 @@ main(int argc, char **argv)
if (context.format == 'p' && !context.skip_checksums)
verify_backup_checksums(&context);
- /* By default, look for the WAL in the backup directory, too. */
+ /*
+ * By default, WAL files are expected to be found in the backup directory
+ * for plain-format backups. In the case of tar-format backups, if a
+ * separate WAL archive is not found, the WAL files are most likely
+ * included within the main data directory archive.
+ */
if (wal_path == NULL)
- wal_path = psprintf("%s/pg_wal", context.backup_directory);
+ {
+ if (context.format == 'p')
+ wal_path = psprintf("%s/pg_wal", context.backup_directory);
+ else if (wal_archive_path)
+ wal_path = wal_archive_path;
+ else if (base_archive_path)
+ wal_path = base_archive_path;
+ else
+ {
+ pg_log_error("WAL archive not found");
+ pg_log_error_hint("Specify the correct path using the -w/--wal-path option, "
+ "or use -n/--no-parse-wal to skip WAL parsing.");
+ exit(1);
+ }
+ }
/*
* Try to parse the required ranges of WAL records, unless we were told
@@ -787,7 +801,8 @@ verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
* close when we're done with it.
*/
static void
-verify_tar_backup(verifier_context *context, DIR *dir)
+verify_tar_backup(verifier_context *context, DIR *dir, char **base_archive_path,
+ char **wal_archive_path)
{
struct dirent *dirent;
SimplePtrList tarfiles = {NULL, NULL};
@@ -816,7 +831,8 @@ verify_tar_backup(verifier_context *context, DIR *dir)
char *fullpath;
fullpath = psprintf("%s/%s", context->backup_directory, filename);
- precheck_tar_backup_file(context, filename, fullpath, &tarfiles);
+ precheck_tar_backup_file(context, filename, fullpath, &tarfiles,
+ base_archive_path, wal_archive_path);
pfree(fullpath);
}
}
@@ -875,11 +891,13 @@ verify_tar_backup(verifier_context *context, DIR *dir)
*
* The arguments to this function are mostly the same as the
* verify_plain_backup_file. The additional argument outputs a list of valid
- * tar files.
+ * tar files, along with the full paths to the main archive and the WAL
+ * directory archive.
*/
static void
precheck_tar_backup_file(verifier_context *context, char *relpath,
- char *fullpath, SimplePtrList *tarfiles)
+ char *fullpath, SimplePtrList *tarfiles,
+ char **base_archive_path, char **wal_archive_path)
{
struct stat sb;
Oid tblspc_oid = InvalidOid;
@@ -918,9 +936,17 @@ precheck_tar_backup_file(verifier_context *context, char *relpath,
* extension such as .gz, .lz4, or .zst.
*/
if (strncmp("base", relpath, 4) == 0)
+ {
suffix = relpath + 4;
+
+ *base_archive_path = pstrdup(fullpath);
+ }
else if (strncmp("pg_wal", relpath, 6) == 0)
+ {
suffix = relpath + 6;
+
+ *wal_archive_path = pstrdup(fullpath);
+ }
else
{
/* Expected a <tablespaceoid>.tar file here. */
diff --git a/src/bin/pg_verifybackup/t/002_algorithm.pl b/src/bin/pg_verifybackup/t/002_algorithm.pl
index 0556191ec9d..edc515d5904 100644
--- a/src/bin/pg_verifybackup/t/002_algorithm.pl
+++ b/src/bin/pg_verifybackup/t/002_algorithm.pl
@@ -30,10 +30,6 @@ sub test_checksums
{
# Add switch to get a tar-format backup
push @backup, ('--format' => 'tar');
-
- # Add switch to skip WAL verification, which is not yet supported for
- # tar-format backups
- push @verify, ('--no-parse-wal');
}
# A backup with a bogus algorithm should fail.
diff --git a/src/bin/pg_verifybackup/t/003_corruption.pl b/src/bin/pg_verifybackup/t/003_corruption.pl
index b1d65b8aa0f..882d75d9dc2 100644
--- a/src/bin/pg_verifybackup/t/003_corruption.pl
+++ b/src/bin/pg_verifybackup/t/003_corruption.pl
@@ -193,10 +193,8 @@ for my $scenario (@scenario)
command_ok([ $tar, '-cf' => "$tar_backup_path/base.tar", '.' ]);
chdir($cwd) || die "chdir: $!";
- # Now check that the backup no longer verifies. We must use -n
- # here, because pg_waldump can't yet read WAL from a tarfile.
command_fails_like(
- [ 'pg_verifybackup', '--no-parse-wal', $tar_backup_path ],
+ [ 'pg_verifybackup', $tar_backup_path ],
$scenario->{'fails_like'},
"corrupt backup fails verification: $name");
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
index ae67ae85a31..161c08c190d 100644
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -47,7 +47,6 @@ my $tsoid = $primary->safe_psql(
SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1'));
my $backup_path = $primary->backup_dir . '/server-backup';
-my $extract_path = $primary->backup_dir . '/extracted-backup';
my @test_configuration = (
{
@@ -123,14 +122,12 @@ for my $tc (@test_configuration)
# Verify tar backup.
$primary->command_ok(
[
- 'pg_verifybackup', '--no-parse-wal',
- '--exit-on-error', $backup_path,
+ 'pg_verifybackup', '--exit-on-error', $backup_path,
],
"verify backup, compression $method");
# Cleanup.
rmtree($backup_path);
- rmtree($extract_path);
}
}
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
index 1ac7b5db75a..9670fbe4fda 100644
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -32,7 +32,6 @@ print $jf $junk_data;
close $jf;
my $backup_path = $primary->backup_dir . '/client-backup';
-my $extract_path = $primary->backup_dir . '/extracted-backup';
my @test_configuration = (
{
@@ -137,13 +136,11 @@ for my $tc (@test_configuration)
# Verify tar backup.
$primary->command_ok(
[
- 'pg_verifybackup', '--no-parse-wal',
- '--exit-on-error', $backup_path,
+ 'pg_verifybackup', '--exit-on-error', $backup_path,
],
"verify backup, compression $method");
# Cleanup.
- rmtree($extract_path);
rmtree($backup_path);
}
}
--
2.47.1
* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-03-02 13:00 Amul Sul <[email protected]>
parent: Amul Sul <[email protected]>
From: Amul Sul @ 2026-03-02 13:00 UTC (permalink / raw)
To: Robert Haas <[email protected]>; +Cc: Chao Li <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>
On Wed, Feb 18, 2026 at 12:28 PM Amul Sul <[email protected]> wrote:
>
> On Tue, Feb 10, 2026 at 3:06 PM Amul Sul <[email protected]> wrote:
> >
> > On Wed, Feb 4, 2026 at 6:39 PM Amul Sul <[email protected]> wrote:
> > >
> > > On Wed, Jan 28, 2026 at 2:41 AM Robert Haas <[email protected]> wrote:
> > > >
> > > > On Tue, Jan 27, 2026 at 7:07 AM Amul Sul <[email protected]> wrote:
> > > > > In the attached version, I am using the WAL segment name as the hash
> > > > > key, which is much more straightforward. I have rewritten
> > > > > read_archive_wal_page(), and it looks much cleaner than before. The
> > > > > logic to discard irrelevant WAL files is still within
> > > > > get_archive_wal_entry. I added an explanation for setting cur_wal to
> > > > > NULL, which is now handled in the separate function I mentioned
> > > > > previously.
> > > > >
> > > > > Kindly have a look at the attached version; let me know if you are
> > > > > still not happy with the current approach for filtering/discarding
> > > > > irrelevant WAL segments. It isn't much different from the previous
> > > > > version, but I have tried to keep it in a separate routine for better
> > > > > code readability, with comments to make it easier to understand. I
> > > > > also added a comment for ArchivedWALFile.
> > > >
> > > > I feel like the division of labor between get_archive_wal_entry() and
> > > > read_archive_wal_page() is odd. I noticed this in the last version,
> > > > too, and it still seems to be the case. get_archive_wal_entry() first
> > > > calls ArchivedWAL_lookup(). If that finds an entry, it just returns.
> > > > If it doesn't, it loops until an entry for the requested file shows up
> > > > and then returns it. Then control returns to read_archive_wal_page()
> > > > which loops some more until we have all the data we need for the
> > > > requested file. But it seems odd to me to have two separate loops
> > > > here. I think that the first loop is going to call read_archive_file()
> > > > until we find the beginning of the file that we care about and then
> > > > the second one is going to call read_archive_file() some more until we
> > > > have read enough of it to satisfy the request. It feels odd to me to
> > > > do it that way, as if we told somebody to first wait until 9 o'clock
> > > > and then wait another 30 minutes, instead of just telling them to wait
> > > > until 9:30. I realize it's not quite the same thing, because apart
> > > > from calling read_archive_file(), the two loops do different things,
> > > > but I still think it looks odd.
> > > >
> > > > + /*
> > > > + * Ignore if the timeline is different or the current segment is not
> > > > + * the desired one.
> > > > + */
> > > > + XLogFromFileName(entry->fname, &curSegTimeline, &curSegNo, WalSegSz);
> > > > + if (privateInfo->timeline != curSegTimeline ||
> > > > + privateInfo->startSegNo > curSegNo ||
> > > > + privateInfo->endSegNo < curSegNo ||
> > > > + segno > curSegNo)
> > > > + {
> > > > + free_archive_wal_entry(entry->fname, privateInfo);
> > > > + continue;
> > > > + }
> > > >
> > > > The comment doesn't match the code. If it did, the test would be
> > > > (privateInfo->timeline != curSegTimeline || segno != curSegno). But
> > > > instead the segno test is > rather than !=, and the checks against
> > > > startSegNo and endSegNo aren't explained at all. I think I understand
> > > > why the segno test uses > rather than !=, but it's the point of the
> > > > comment to explain things like that, rather than leaving the reader to
> > > > guess. And I don't know why we also need to test startSegNo and
> > > > endSegNo.
> > > >
> > > > I also wonder what the point is of doing XLogFromFileName() on the
> > > > fname provided by the caller and then again on entry->fname. Couldn't
> > > > you just compare the strings?
> > > >
> > > > Again, the division of labor is really odd here. It's the job of
> > > > astreamer_waldump_content() to skip things that aren't WAL files at
> > > > all, but it's the job of get_archive_wal_entry() to skip things that
> > > > are WAL files but not the one we want. I disagree with putting those
> > > > checks in completely separate parts of the code.
> > > >
> > >
> > > Keeping the timeline and segment start-end range checks inside the
> > > archive streamer creates a circular dependency that cannot be resolved
> > > without a 'dirty hack'. We must read the first available WAL file page
> > > to determine the wal_segment_size before we can calculate the target
> > > segment range. Moving the checks inside the streamer would make it
> > > impossible to process that initial file, because the necessary filtering
> > > parameters would still be unknown, so the check would somehow have to be
> > > skipped for the first read. And what if we later realized that the first
> > > WAL file, which was let through by skipping that check, is irrelevant
> > > and doesn't fall within the start-end segment range?
> > >
> >
> > Please have a look at the attached version, specifically patch 0005.
> > I have moved the WAL file filtering check from get_archive_wal_entry()
> > into astreamer_waldump_content(). This check will be skipped during
> > the initial read in init_archive_reader(), which instead performs it
> > explicitly once it determines the WAL segment size and the start/end
> > segments.
> >
> > To access the WAL segment size inside astreamer_waldump_content(), I
> > have moved the WAL segment size variable into the XLogDumpPrivate
> > structure in the separate 0004 patch.
>
> Attached is an updated version including the aforesaid changes. It
> includes a new refactoring patch (0001) that moves the logic for
> identifying tar archives and their compression types from
> pg_basebackup and pg_verifybackup into a separate, reusable function,
> per a suggestion from Euler [1]. Additionally, I have added a test
> for the contrecord decoding to the main patch (now 0006).
>
> [1] http://postgr.es/m/[email protected]
>
Rebased against the latest master, fixed typos in code comments, and
replaced palloc0 with palloc0_object.
Regards,
Amul
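To make the point about the filter condition concrete, here is a minimal standalone sketch of how the timeline/segment check discussed above could be written with each condition explained in a comment. It is an illustration only, not code from the patch: `wal_entry_is_relevant()` and `parse_wal_file_name()` are hypothetical names, the typedefs are local stand-ins for PostgreSQL's `TimeLineID` and `XLogSegNo`, and the reading of the `>` test (a segment older than the requested one will not be asked for again, while a newer one will be needed later) is an assumption drawn from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Local stand-ins for PostgreSQL's TimeLineID and XLogSegNo typedefs. */
typedef uint32_t TimeLineID;
typedef uint64_t XLogSegNo;

/*
 * Parse a 24-hex-digit WAL file name (timeline, xlogid, segment; 8 digits
 * each) using the same arithmetic as XLogFromFileName().  Returns false if
 * the name does not have that shape.
 */
static bool
parse_wal_file_name(const char *fname, int segsize,
                    TimeLineID *tli, XLogSegNo *segno)
{
    unsigned int t, log, seg;

    assert(segsize > 0);
    if (strlen(fname) != 24 || strspn(fname, "0123456789ABCDEF") != 24)
        return false;
    if (sscanf(fname, "%8X%8X%8X", &t, &log, &seg) != 3)
        return false;

    *tli = t;
    /* Segments per 4GB xlogid, as XLogSegmentsPerXLogId() computes. */
    *segno = (XLogSegNo) log * (UINT64_C(0x100000000) / segsize) + seg;
    return true;
}

/*
 * Hypothetical filter: true if the archived WAL file may still be needed,
 * false if it can be discarded.
 */
static bool
wal_entry_is_relevant(const char *fname, int segsize, TimeLineID timeline,
                      XLogSegNo startSegNo, XLogSegNo endSegNo,
                      XLogSegNo wantSegNo)
{
    TimeLineID tli;
    XLogSegNo  segno;

    if (!parse_wal_file_name(fname, segsize, &tli, &segno))
        return false;           /* not a WAL segment name at all */

    /* Segments from another timeline are never decoded. */
    if (tli != timeline)
        return false;

    /* Outside the requested [start, end] segment range: never needed. */
    if (segno < startSegNo || segno > endSegNo)
        return false;

    /*
     * Older than the segment currently wanted: decoding only moves forward,
     * so it will not be requested again.  A newer segment is kept because
     * it will be requested later, which is why this is "<" and not "!=".
     */
    if (segno < wantSegNo)
        return false;

    return true;
}
```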
Attachments:
[application/octet-stream] v14-0001-Refactor-Move-tar-archive-parsing-into-a-common-.patch (6.7K, 2-v14-0001-Refactor-Move-tar-archive-parsing-into-a-common-.patch)
From 54fd70f2b5df10e6df575b4f85eaecb8a3c1ff94 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Tue, 17 Feb 2026 14:51:11 +0530
Subject: [PATCH v14 01/11] Refactor: Move tar archive parsing into a common
location.
pg_basebackup and pg_verifybackup both require logic to identify tar
files and determine their compression types. Similar functionality
will be needed for pg_waldump when it gets the capability to decode
WAL files from tar archives. Moving this logic to a common location
allows for reuse and prevents code duplication.
---
src/bin/pg_basebackup/pg_basebackup.c | 36 +++++++----------------
src/bin/pg_verifybackup/pg_verifybackup.c | 12 +-------
src/common/compression.c | 30 +++++++++++++++++++
src/include/common/compression.h | 2 ++
4 files changed, 44 insertions(+), 36 deletions(-)
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index fa169a8d642..c1a4672aa6f 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1070,12 +1070,9 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
astreamer *manifest_inject_streamer = NULL;
bool inject_manifest;
bool is_tar,
- is_tar_gz,
- is_tar_lz4,
- is_tar_zstd,
is_compressed_tar;
+ pg_compress_algorithm compressed_tar_algorithm;
bool must_parse_archive;
- int archive_name_len = strlen(archive_name);
/*
* Normally, we emit the backup manifest as a separate file, but when
@@ -1084,24 +1081,13 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
*/
inject_manifest = (format == 't' && strcmp(basedir, "-") == 0 && manifest);
- /* Is this a tar archive? */
- is_tar = (archive_name_len > 4 &&
- strcmp(archive_name + archive_name_len - 4, ".tar") == 0);
-
- /* Is this a .tar.gz archive? */
- is_tar_gz = (archive_name_len > 7 &&
- strcmp(archive_name + archive_name_len - 7, ".tar.gz") == 0);
-
- /* Is this a .tar.lz4 archive? */
- is_tar_lz4 = (archive_name_len > 8 &&
- strcmp(archive_name + archive_name_len - 8, ".tar.lz4") == 0);
-
- /* Is this a .tar.zst archive? */
- is_tar_zstd = (archive_name_len > 8 &&
- strcmp(archive_name + archive_name_len - 8, ".tar.zst") == 0);
+ /* Check whether it is a tar archive and its compression type */
+ is_tar = parse_tar_compress_algorithm(archive_name,
+ &compressed_tar_algorithm);
/* Is this any kind of compressed tar? */
- is_compressed_tar = is_tar_gz || is_tar_lz4 || is_tar_zstd;
+ is_compressed_tar = (is_tar &&
+ compressed_tar_algorithm != PG_COMPRESSION_NONE);
/*
* Injecting the manifest into a compressed tar file would be possible if
@@ -1128,7 +1114,7 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
(spclocation == NULL && writerecoveryconf));
/* At present, we only know how to parse tar archives. */
- if (must_parse_archive && !is_tar && !is_compressed_tar)
+ if (must_parse_archive && !is_tar)
{
pg_log_error("cannot parse archive \"%s\"", archive_name);
pg_log_error_detail("Only tar archives can be parsed.");
@@ -1263,13 +1249,13 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
* If the user has requested a server compressed archive along with
* archive extraction at client then we need to decompress it.
*/
- if (format == 'p')
+ if (format == 'p' && is_compressed_tar)
{
- if (is_tar_gz)
+ if (compressed_tar_algorithm == PG_COMPRESSION_GZIP)
streamer = astreamer_gzip_decompressor_new(streamer);
- else if (is_tar_lz4)
+ else if (compressed_tar_algorithm == PG_COMPRESSION_LZ4)
streamer = astreamer_lz4_decompressor_new(streamer);
- else if (is_tar_zstd)
+ else if (compressed_tar_algorithm == PG_COMPRESSION_ZSTD)
streamer = astreamer_zstd_decompressor_new(streamer);
}
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index cbc9447384f..31f606c45b1 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -941,17 +941,7 @@ precheck_tar_backup_file(verifier_context *context, char *relpath,
}
/* Now, check the compression type of the tar */
- if (strcmp(suffix, ".tar") == 0)
- compress_algorithm = PG_COMPRESSION_NONE;
- else if (strcmp(suffix, ".tgz") == 0)
- compress_algorithm = PG_COMPRESSION_GZIP;
- else if (strcmp(suffix, ".tar.gz") == 0)
- compress_algorithm = PG_COMPRESSION_GZIP;
- else if (strcmp(suffix, ".tar.lz4") == 0)
- compress_algorithm = PG_COMPRESSION_LZ4;
- else if (strcmp(suffix, ".tar.zst") == 0)
- compress_algorithm = PG_COMPRESSION_ZSTD;
- else
+ if (!parse_tar_compress_algorithm(suffix, &compress_algorithm))
{
report_backup_error(context,
"file \"%s\" is not expected in a tar format backup",
diff --git a/src/common/compression.c b/src/common/compression.c
index 92cd4ec7a0d..f117e21237f 100644
--- a/src/common/compression.c
+++ b/src/common/compression.c
@@ -41,6 +41,36 @@ static int expect_integer_value(char *keyword, char *value,
static bool expect_boolean_value(char *keyword, char *value,
pg_compress_specification *result);
+/*
+ * Look up a compression algorithm by archive file extension. Returns true and
+ * sets *algorithm if the name is recognized. Otherwise returns false.
+ */
+bool
+parse_tar_compress_algorithm(char *fname, pg_compress_algorithm *algorithm)
+{
+ int fname_len = strlen(fname);
+
+ if (fname_len >= 4 &&
+ strcmp(fname + fname_len - 4, ".tar") == 0)
+ *algorithm = PG_COMPRESSION_NONE;
+ else if (fname_len >= 4 &&
+ strcmp(fname + fname_len - 4, ".tgz") == 0)
+ *algorithm = PG_COMPRESSION_GZIP;
+ else if (fname_len >= 7 &&
+ strcmp(fname + fname_len - 7, ".tar.gz") == 0)
+ *algorithm = PG_COMPRESSION_GZIP;
+ else if (fname_len >= 8 &&
+ strcmp(fname + fname_len - 8, ".tar.lz4") == 0)
+ *algorithm = PG_COMPRESSION_LZ4;
+ else if (fname_len >= 8 &&
+ strcmp(fname + fname_len - 8, ".tar.zst") == 0)
+ *algorithm = PG_COMPRESSION_ZSTD;
+ else
+ return false;
+
+ return true;
+}
+
/*
* Look up a compression algorithm by name. Returns true and sets *algorithm
* if the name is recognized. Otherwise returns false.
diff --git a/src/include/common/compression.h b/src/include/common/compression.h
index 6c745b90066..50f21656b88 100644
--- a/src/include/common/compression.h
+++ b/src/include/common/compression.h
@@ -41,6 +41,8 @@ typedef struct pg_compress_specification
extern void parse_compress_options(const char *option, char **algorithm,
char **detail);
+extern bool parse_tar_compress_algorithm(char *fname,
+ pg_compress_algorithm *algorithm);
extern bool parse_compress_algorithm(char *name, pg_compress_algorithm *algorithm);
extern const char *get_compress_algorithm_name(pg_compress_algorithm algorithm);
--
2.47.1
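For comparison, the suffix matching in parse_tar_compress_algorithm() can also be expressed as a lookup table. This is a standalone sketch, not the patch's code: `compress_algorithm` and its values are local stand-ins for pg_compress_algorithm, and `tar_compress_algorithm_from_name()` is a hypothetical name. It keeps the patch's `>=` length test, so a bare `.tar` is accepted, whereas the checks removed from pg_basebackup used `>`; it takes a `const char *` because the name is never modified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Local stand-in for pg_compress_algorithm from common/compression.h. */
typedef enum
{
    COMPRESS_NONE,
    COMPRESS_GZIP,
    COMPRESS_LZ4,
    COMPRESS_ZSTD
} compress_algorithm;

/* Recognized suffixes; none is a tail of another, so order does not matter. */
static const struct
{
    const char *suffix;
    compress_algorithm algorithm;
} tar_suffixes[] = {
    {".tar", COMPRESS_NONE},
    {".tgz", COMPRESS_GZIP},
    {".tar.gz", COMPRESS_GZIP},
    {".tar.lz4", COMPRESS_LZ4},
    {".tar.zst", COMPRESS_ZSTD},
};

/* True if name ends with suffix (a name equal to the suffix also matches). */
static bool
has_suffix(const char *name, const char *suffix)
{
    size_t name_len = strlen(name);
    size_t suffix_len = strlen(suffix);

    return name_len >= suffix_len &&
        strcmp(name + name_len - suffix_len, suffix) == 0;
}

/* Returns true and sets *algorithm if fname has a known tar suffix. */
static bool
tar_compress_algorithm_from_name(const char *fname,
                                 compress_algorithm *algorithm)
{
    assert(fname != NULL && algorithm != NULL);

    for (size_t i = 0; i < sizeof(tar_suffixes) / sizeof(tar_suffixes[0]); i++)
    {
        if (has_suffix(fname, tar_suffixes[i].suffix))
        {
            *algorithm = tar_suffixes[i].algorithm;
            return true;
        }
    }
    return false;
}
```

A table keeps the suffix list in one place, which may be convenient if more formats are added; the if/else chain in the patch is equally correct.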
[application/octet-stream] v14-0002-Refactor-pg_waldump-Move-some-declarations-to-ne.patch (2.2K, 3-v14-0002-Refactor-pg_waldump-Move-some-declarations-to-ne.patch)
From 14706302872c7e35934345fe75e1f24a5857ad16 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Thu, 22 Jan 2026 10:28:32 +0530
Subject: [PATCH v14 02/11] Refactor: pg_waldump: Move some declarations to new
pg_waldump.h
This change prepares for a second source file in this directory to
support reading WAL from tar files. Common structures, declarations,
and functions are being exported through this include file so
they can be used in both files.
---
src/bin/pg_waldump/pg_waldump.c | 9 +--------
src/bin/pg_waldump/pg_waldump.h | 25 +++++++++++++++++++++++++
2 files changed, 26 insertions(+), 8 deletions(-)
create mode 100644 src/bin/pg_waldump/pg_waldump.h
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index f3446385d6a..4b7411a6498 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -29,6 +29,7 @@
#include "common/logging.h"
#include "common/relpath.h"
#include "getopt_long.h"
+#include "pg_waldump.h"
#include "rmgrdesc.h"
#include "storage/bufpage.h"
@@ -43,14 +44,6 @@ static volatile sig_atomic_t time_to_stop = false;
static const RelFileLocator emptyRelFileLocator = {0, 0, 0};
-typedef struct XLogDumpPrivate
-{
- TimeLineID timeline;
- XLogRecPtr startptr;
- XLogRecPtr endptr;
- bool endptr_reached;
-} XLogDumpPrivate;
-
typedef struct XLogDumpConfig
{
/* display options */
diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h
new file mode 100644
index 00000000000..64a9109229e
--- /dev/null
+++ b/src/bin/pg_waldump/pg_waldump.h
@@ -0,0 +1,25 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_waldump.h - decode and display WAL
+ *
+ * Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_waldump/pg_waldump.h
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_WALDUMP_H
+#define PG_WALDUMP_H
+
+#include "access/xlogdefs.h"
+
+/* Contains the necessary information to drive WAL decoding */
+typedef struct XLogDumpPrivate
+{
+ TimeLineID timeline;
+ XLogRecPtr startptr;
+ XLogRecPtr endptr;
+ bool endptr_reached;
+} XLogDumpPrivate;
+
+#endif /* PG_WALDUMP_H */
--
2.47.1
[application/octet-stream] v14-0003-Refactor-pg_waldump-Separate-logic-used-to-calcu.patch (2.4K, 4-v14-0003-Refactor-pg_waldump-Separate-logic-used-to-calcu.patch)
From e62670767a8164ca8c0a289aad05f24c3e84f8cc Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Thu, 22 Jan 2026 10:38:16 +0530
Subject: [PATCH v14 03/11] Refactor: pg_waldump: Separate logic used to
calculate the required read size.
This refactoring prepares the codebase for an upcoming patch that will
support reading WAL from tar files. The logic for calculating the
required read size has been updated to handle both normal WAL files
and WAL files located inside a tar archive.
---
src/bin/pg_waldump/pg_waldump.c | 43 +++++++++++++++++++++++----------
1 file changed, 30 insertions(+), 13 deletions(-)
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 4b7411a6498..958a71a01cf 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -326,6 +326,32 @@ identify_target_directory(char *directory, char *fname, int *WalSegSz)
return NULL; /* not reached */
}
+/*
+ * Returns the size in bytes of the data to be read. Returns -1 if the end
+ * point has already been reached.
+ */
+static inline int
+required_read_len(XLogDumpPrivate *private, XLogRecPtr targetPagePtr,
+ int reqLen)
+{
+ int count = XLOG_BLCKSZ;
+
+ if (XLogRecPtrIsValid(private->endptr))
+ {
+ if (targetPagePtr + XLOG_BLCKSZ <= private->endptr)
+ count = XLOG_BLCKSZ;
+ else if (targetPagePtr + reqLen <= private->endptr)
+ count = private->endptr - targetPagePtr;
+ else
+ {
+ private->endptr_reached = true;
+ return -1;
+ }
+ }
+
+ return count;
+}
+
/* pg_waldump's XLogReaderRoutine->segment_open callback */
static void
WALDumpOpenSegment(XLogReaderState *state, XLogSegNo nextSegNo,
@@ -383,21 +409,12 @@ WALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
XLogRecPtr targetPtr, char *readBuff)
{
XLogDumpPrivate *private = state->private_data;
- int count = XLOG_BLCKSZ;
+ int count = required_read_len(private, targetPagePtr, reqLen);
WALReadError errinfo;
- if (XLogRecPtrIsValid(private->endptr))
- {
- if (targetPagePtr + XLOG_BLCKSZ <= private->endptr)
- count = XLOG_BLCKSZ;
- else if (targetPagePtr + reqLen <= private->endptr)
- count = private->endptr - targetPagePtr;
- else
- {
- private->endptr_reached = true;
- return -1;
- }
- }
+ /* Bail out if the count to be read is not valid */
+ if (count < 0)
+ return -1;
if (!WALRead(state, readBuff, targetPagePtr, count, private->timeline,
&errinfo))
--
2.47.1
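The extracted helper is easy to exercise on its own. Below is a minimal standalone sketch that mirrors required_read_len() from the patch above, assuming the default 8 kB XLOG_BLCKSZ. `DumpPrivateSketch` is a trimmed, hypothetical stand-in for XLogDumpPrivate that keeps only the two fields the helper touches, and a zero LSN plays the role of InvalidXLogRecPtr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XLOG_BLCKSZ 8192                    /* default WAL page size */
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

typedef uint64_t XLogRecPtr;

/* Hypothetical stand-in for XLogDumpPrivate: only what the helper uses. */
typedef struct DumpPrivateSketch
{
    XLogRecPtr endptr;          /* InvalidXLogRecPtr means no end limit */
    bool       endptr_reached;
} DumpPrivateSketch;

/* Bytes to read for the page at targetPagePtr, or -1 if past endptr. */
static int
sketch_required_read_len(DumpPrivateSketch *priv, XLogRecPtr targetPagePtr,
                         int reqLen)
{
    assert(reqLen > 0 && reqLen <= XLOG_BLCKSZ);

    if (priv->endptr == InvalidXLogRecPtr)
        return XLOG_BLCKSZ;     /* no end limit: read a whole page */
    if (targetPagePtr + XLOG_BLCKSZ <= priv->endptr)
        return XLOG_BLCKSZ;     /* the whole page lies before endptr */
    if (targetPagePtr + reqLen <= priv->endptr)
        return (int) (priv->endptr - targetPagePtr);    /* partial page */

    /* The caller needs bytes beyond endptr: remember that and stop. */
    priv->endptr_reached = true;
    return -1;
}
```

Returning -1 and setting endptr_reached in one place means each page-read callback only has to check for a negative count, as WALDumpReadPage() does in the patch.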
[application/octet-stream] v14-0004-Refactor-pg_waldump-Restructure-TAP-tests.patch (6.6K, 5-v14-0004-Refactor-pg_waldump-Restructure-TAP-tests.patch)
From be1fbe441570c0aef766eed410eb3465f2450b53 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Wed, 18 Feb 2026 11:07:57 +0530
Subject: [PATCH v14 04/11] Refactor: pg_waldump: Restructure TAP tests.
Restructure the tests that do not take a WAL file argument so that they
run within a loop, allowing them to be re-executed when decoding WAL
from tar archives.
== NOTE ==
This is not intended to be committed separately. It can be merged
with the next patch, which is the main patch implementing this
feature.
---
src/bin/pg_waldump/t/001_basic.pl | 140 +++++++++++++++++-------------
1 file changed, 79 insertions(+), 61 deletions(-)
diff --git a/src/bin/pg_waldump/t/001_basic.pl b/src/bin/pg_waldump/t/001_basic.pl
index 5db5d20136f..f12ba52cbfc 100644
--- a/src/bin/pg_waldump/t/001_basic.pl
+++ b/src/bin/pg_waldump/t/001_basic.pl
@@ -198,28 +198,6 @@ command_like(
],
qr/./,
'runs with start and end segment specified');
-command_fails_like(
- [ 'pg_waldump', '--path' => $node->data_dir ],
- qr/error: no start WAL location given/,
- 'path option requires start location');
-command_like(
- [
- 'pg_waldump',
- '--path' => $node->data_dir,
- '--start' => $start_lsn,
- '--end' => $end_lsn,
- ],
- qr/./,
- 'runs with path option and start and end locations');
-command_fails_like(
- [
- 'pg_waldump',
- '--path' => $node->data_dir,
- '--start' => $start_lsn,
- ],
- qr/error: error in WAL record at/,
- 'falling off the end of the WAL results in an error');
-
command_like(
[
'pg_waldump', '--quiet',
@@ -227,22 +205,16 @@ command_like(
],
qr/^$/,
'no output with --quiet option');
-command_fails_like(
- [
- 'pg_waldump', '--quiet',
- '--path' => $node->data_dir,
- '--start' => $start_lsn
- ],
- qr/error: error in WAL record at/,
- 'errors are shown with --quiet');
-
# Test for: Display a message that we're skipping data if `from`
# wasn't a pointer to the start of a record.
+sub test_pg_waldump_skip_bytes
{
+ my ($path, $startlsn, $endlsn) = @_;
+
# Construct a new LSN that is one byte past the original
# start_lsn.
- my ($part1, $part2) = split qr{/}, $start_lsn;
+ my ($part1, $part2) = split qr{/}, $startlsn;
my $lsn2 = hex $part2;
$lsn2++;
my $new_start = sprintf("%s/%X", $part1, $lsn2);
@@ -252,7 +224,8 @@ command_fails_like(
my $result = IPC::Run::run [
'pg_waldump',
'--start' => $new_start,
- $node->data_dir . '/pg_wal/' . $start_walfile
+ '--end' => $endlsn,
+ '--path' => $path,
],
'>' => \$stdout,
'2>' => \$stderr;
@@ -266,15 +239,15 @@ command_fails_like(
sub test_pg_waldump
{
local $Test::Builder::Level = $Test::Builder::Level + 1;
- my @opts = @_;
+ my ($path, $startlsn, $endlsn, @opts) = @_;
my ($stdout, $stderr);
my $result = IPC::Run::run [
'pg_waldump',
- '--path' => $node->data_dir,
- '--start' => $start_lsn,
- '--end' => $end_lsn,
+ '--start' => $startlsn,
+ '--end' => $endlsn,
+ '--path' => $path,
@opts
],
'>' => \$stdout,
@@ -288,38 +261,83 @@ sub test_pg_waldump
my @lines;
-@lines = test_pg_waldump;
-is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
+my @scenarios = (
+ {
+ 'path' => $node->data_dir
+ });
-@lines = test_pg_waldump('--limit' => 6);
-is(@lines, 6, 'limit option observed');
+for my $scenario (@scenarios)
+{
+ my $path = $scenario->{'path'};
-@lines = test_pg_waldump('--fullpage');
-is(grep(!/^rmgr:.*\bFPW\b/, @lines), 0, 'all output lines are FPW');
+ SKIP:
+ {
+ command_fails_like(
+ [ 'pg_waldump', '--path' => $path ],
+ qr/error: no start WAL location given/,
+ 'path option requires start location');
+ command_like(
+ [
+ 'pg_waldump',
+ '--path' => $path,
+ '--start' => $start_lsn,
+ '--end' => $end_lsn,
+ ],
+ qr/./,
+ 'runs with path option and start and end locations');
+ command_fails_like(
+ [
+ 'pg_waldump',
+ '--path' => $path,
+ '--start' => $start_lsn,
+ ],
+ qr/error: error in WAL record at/,
+ 'falling off the end of the WAL results in an error');
-@lines = test_pg_waldump('--stats');
-like($lines[0], qr/WAL statistics/, "statistics on stdout");
-is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+ command_fails_like(
+ [
+ 'pg_waldump', '--quiet',
+ '--path' => $path,
+ '--start' => $start_lsn
+ ],
+ qr/error: error in WAL record at/,
+ 'errors are shown with --quiet');
-@lines = test_pg_waldump('--stats=record');
-like($lines[0], qr/WAL statistics/, "statistics on stdout");
-is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+ test_pg_waldump_skip_bytes($path, $start_lsn, $end_lsn);
-@lines = test_pg_waldump('--rmgr' => 'Btree');
-is(grep(!/^rmgr: Btree/, @lines), 0, 'only Btree lines');
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn);
+ is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
-@lines = test_pg_waldump('--fork' => 'init');
-is(grep(!/fork init/, @lines), 0, 'only init fork lines');
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--limit' => 6);
+ is(@lines, 6, 'limit option observed');
-@lines = test_pg_waldump(
- '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_t1_oid");
-is(grep(!/rel $default_ts_oid\/$postgres_db_oid\/$rel_t1_oid/, @lines),
- 0, 'only lines for selected relation');
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--fullpage');
+ is(grep(!/^rmgr:.*\bFPW\b/, @lines), 0, 'all output lines are FPW');
-@lines = test_pg_waldump(
- '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_i1a_oid",
- '--block' => 1);
-is(grep(!/\bblk 1\b/, @lines), 0, 'only lines for selected block');
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--stats');
+ like($lines[0], qr/WAL statistics/, "statistics on stdout");
+ is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--stats=record');
+ like($lines[0], qr/WAL statistics/, "statistics on stdout");
+ is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--rmgr' => 'Btree');
+ is(grep(!/^rmgr: Btree/, @lines), 0, 'only Btree lines');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--fork' => 'init');
+ is(grep(!/fork init/, @lines), 0, 'only init fork lines');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn,
+ '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_t1_oid");
+ is(grep(!/rel $default_ts_oid\/$postgres_db_oid\/$rel_t1_oid/, @lines),
+ 0, 'only lines for selected relation');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn,
+ '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_i1a_oid",
+ '--block' => 1);
+ is(grep(!/\bblk 1\b/, @lines), 0, 'only lines for selected block');
+ }
+}
done_testing();
--
2.47.1
[application/octet-stream] v14-0005-Refactor-pg_waldump-Move-WAL-segment-size-to-XLo.patch (5.1K, 6-v14-0005-Refactor-pg_waldump-Move-WAL-segment-size-to-XLo.patch)
From 8bb8dc6afe753f885520429613966f8cedc2b477 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Wed, 4 Feb 2026 15:31:51 +0530
Subject: [PATCH v14 05/11] Refactor: pg_waldump: Move WAL segment size to
XLogDumpPrivate.
Relocate the WAL segment size variable to the XLogDumpPrivate
structure and rename it to segsize for consistency. This change is
required to make the segment size accessible to the archive streamer
code, where passing it as a function argument is not feasible.
---
src/bin/pg_waldump/pg_waldump.c | 26 +++++++++++++-------------
src/bin/pg_waldump/pg_waldump.h | 1 +
2 files changed, 14 insertions(+), 13 deletions(-)
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 958a71a01cf..5d31b15dbd8 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -811,7 +811,6 @@ main(int argc, char **argv)
XLogRecPtr first_record;
char *waldir = NULL;
char *errormsg;
- int WalSegSz;
static struct option long_options[] = {
{"bkp-details", no_argument, NULL, 'b'},
@@ -865,6 +864,7 @@ main(int argc, char **argv)
memset(&stats, 0, sizeof(XLogStats));
private.timeline = 1;
+ private.segsize = 0;
private.startptr = InvalidXLogRecPtr;
private.endptr = InvalidXLogRecPtr;
private.endptr_reached = false;
@@ -1138,18 +1138,18 @@ main(int argc, char **argv)
pg_fatal("could not open directory \"%s\": %m", waldir);
}
- waldir = identify_target_directory(waldir, fname, &WalSegSz);
+ waldir = identify_target_directory(waldir, fname, &private.segsize);
fd = open_file_in_directory(waldir, fname);
if (fd < 0)
pg_fatal("could not open file \"%s\"", fname);
close(fd);
/* parse position from file */
- XLogFromFileName(fname, &private.timeline, &segno, WalSegSz);
+ XLogFromFileName(fname, &private.timeline, &segno, private.segsize);
if (!XLogRecPtrIsValid(private.startptr))
- XLogSegNoOffsetToRecPtr(segno, 0, WalSegSz, private.startptr);
- else if (!XLByteInSeg(private.startptr, segno, WalSegSz))
+ XLogSegNoOffsetToRecPtr(segno, 0, private.segsize, private.startptr);
+ else if (!XLByteInSeg(private.startptr, segno, private.segsize))
{
pg_log_error("start WAL location %X/%08X is not inside file \"%s\"",
LSN_FORMAT_ARGS(private.startptr),
@@ -1159,7 +1159,7 @@ main(int argc, char **argv)
/* no second file specified, set end position */
if (!(optind + 1 < argc) && !XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(segno + 1, 0, WalSegSz, private.endptr);
+ XLogSegNoOffsetToRecPtr(segno + 1, 0, private.segsize, private.endptr);
/* parse ENDSEG if passed */
if (optind + 1 < argc)
@@ -1175,14 +1175,14 @@ main(int argc, char **argv)
close(fd);
/* parse position from file */
- XLogFromFileName(fname, &private.timeline, &endsegno, WalSegSz);
+ XLogFromFileName(fname, &private.timeline, &endsegno, private.segsize);
if (endsegno < segno)
pg_fatal("ENDSEG %s is before STARTSEG %s",
argv[optind + 1], argv[optind]);
if (!XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(endsegno + 1, 0, WalSegSz,
+ XLogSegNoOffsetToRecPtr(endsegno + 1, 0, private.segsize,
private.endptr);
/* set segno to endsegno for check of --end */
@@ -1190,8 +1190,8 @@ main(int argc, char **argv)
}
- if (!XLByteInSeg(private.endptr, segno, WalSegSz) &&
- private.endptr != (segno + 1) * WalSegSz)
+ if (!XLByteInSeg(private.endptr, segno, private.segsize) &&
+ private.endptr != (segno + 1) * private.segsize)
{
pg_log_error("end WAL location %X/%08X is not inside file \"%s\"",
LSN_FORMAT_ARGS(private.endptr),
@@ -1200,7 +1200,7 @@ main(int argc, char **argv)
}
}
else
- waldir = identify_target_directory(waldir, NULL, &WalSegSz);
+ waldir = identify_target_directory(waldir, NULL, &private.segsize);
/* we don't know what to print */
if (!XLogRecPtrIsValid(private.startptr))
@@ -1213,7 +1213,7 @@ main(int argc, char **argv)
/* we have everything we need, start reading */
xlogreader_state =
- XLogReaderAllocate(WalSegSz, waldir,
+ XLogReaderAllocate(private.segsize, waldir,
XL_ROUTINE(.page_read = WALDumpReadPage,
.segment_open = WALDumpOpenSegment,
.segment_close = WALDumpCloseSegment),
@@ -1234,7 +1234,7 @@ main(int argc, char **argv)
* a segment (e.g. we were used in file mode).
*/
if (first_record != private.startptr &&
- XLogSegmentOffset(private.startptr, WalSegSz) != 0)
+ XLogSegmentOffset(private.startptr, private.segsize) != 0)
pg_log_info(ngettext("first record is after %X/%08X, at %X/%08X, skipping over %u byte",
"first record is after %X/%08X, at %X/%08X, skipping over %u bytes",
(first_record - private.startptr)),
diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h
index 64a9109229e..013b051506f 100644
--- a/src/bin/pg_waldump/pg_waldump.h
+++ b/src/bin/pg_waldump/pg_waldump.h
@@ -17,6 +17,7 @@
typedef struct XLogDumpPrivate
{
TimeLineID timeline;
+ int segsize;
XLogRecPtr startptr;
XLogRecPtr endptr;
bool endptr_reached;
--
2.47.1
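With the segment size stored in XLogDumpPrivate, any code holding the struct can do segment arithmetic without an extra argument. The sketch below reproduces the `--end` range check from main() against a trimmed, hypothetical stand-in struct. It assumes the two macros reduce to plain integer arithmetic (XLogSegNoOffsetToRecPtr as `segno * segsize + offset`, XLByteInSeg as `ptr / segsize == segno`); the assert reflects that segsize starts at 0 in the patch and is only usable once it has been determined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;
typedef uint64_t XLogSegNo;

/* Hypothetical stand-in for XLogDumpPrivate with the new segsize field. */
typedef struct DumpPrivateSketch
{
    int        segsize;         /* WAL segment size in bytes; 0 = unknown */
    XLogRecPtr startptr;
    XLogRecPtr endptr;
} DumpPrivateSketch;

/* Same arithmetic as XLogSegNoOffsetToRecPtr(). */
static XLogRecPtr
seg_offset_to_ptr(XLogSegNo segno, uint32_t offset, int segsize)
{
    return segno * (uint64_t) segsize + offset;
}

/* Same arithmetic as XLByteInSeg(). */
static bool
byte_in_seg(XLogRecPtr ptr, XLogSegNo segno, int segsize)
{
    return ptr / (uint64_t) segsize == segno;
}

/*
 * The --end check from main(): endptr must lie inside segment segno or be
 * exactly the first byte of the following segment.
 */
static bool
end_ptr_ok(const DumpPrivateSketch *priv, XLogSegNo segno)
{
    assert(priv->segsize > 0);  /* must be known before any arithmetic */
    return byte_in_seg(priv->endptr, segno, priv->segsize) ||
        priv->endptr == (segno + 1) * (uint64_t) priv->segsize;
}
```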
[application/octet-stream] v14-0006-pg_waldump-Add-support-for-archived-WAL-decoding.patch (41.7K, 7-v14-0006-pg_waldump-Add-support-for-archived-WAL-decoding.patch)
From 4322e9804f9bce7f9fb30872c5d64736e91c653b Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Tue, 10 Feb 2026 11:42:36 +0530
Subject: [PATCH v14 06/11] pg_waldump: Add support for archived WAL decoding.
pg_waldump can now accept the path to a tar archive containing WAL
files and decode them. This feature was added primarily for
pg_verifybackup, which previously disabled WAL parsing for
tar-formatted backups.
Note that this patch requires that the WAL files within the archive be
in sequential order; an error will be reported otherwise. The next
patch is planned to remove this restriction.
---
doc/src/sgml/ref/pg_waldump.sgml | 8 +-
src/bin/pg_waldump/Makefile | 7 +-
src/bin/pg_waldump/archive_waldump.c | 639 +++++++++++++++++++++++++++
src/bin/pg_waldump/meson.build | 4 +-
src/bin/pg_waldump/pg_waldump.c | 255 ++++++++---
src/bin/pg_waldump/pg_waldump.h | 43 ++
src/bin/pg_waldump/t/001_basic.pl | 105 ++++-
src/tools/pgindent/typedefs.list | 3 +
8 files changed, 998 insertions(+), 66 deletions(-)
create mode 100644 src/bin/pg_waldump/archive_waldump.c
diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index d1715ff5124..15fb8d13199 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -141,13 +141,17 @@ PostgreSQL documentation
<term><option>--path=<replaceable>path</replaceable></option></term>
<listitem>
<para>
- Specifies a directory to search for WAL segment files or a
- directory with a <literal>pg_wal</literal> subdirectory that
+ Specifies a tar archive or a directory to search for WAL segment files
+ or a directory with a <literal>pg_wal</literal> subdirectory that
contains such files. The default is to search in the current
directory, the <literal>pg_wal</literal> subdirectory of the
current directory, and the <literal>pg_wal</literal> subdirectory
of <envar>PGDATA</envar>.
</para>
+ <para>
+ If a tar archive is provided, its WAL segment files must be in
+ sequential order; otherwise, an error will be reported.
+ </para>
</listitem>
</varlistentry>
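The sequential-order requirement can be made concrete with WAL file names. The following is a rough sketch, not part of the patch, that checks whether a list of names is in the order the archive reader expects. It assumes the standard 24-character layout, in which the second and third 8-digit hex fields are segno / (0x100000000 / segment size) and segno % (0x100000000 / segment size). `parse_wal_name()` and `wal_names_sequential()` are hypothetical helpers. The patch enforces the rule lazily in get_archive_wal_entry() rather than with a pre-pass, so this only approximates its behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: decode a 24-character WAL file name of the form
 * TTTTTTTTXXXXXXXXYYYYYYYY (timeline, segno / segs_per_id, segno % segs_per_id).
 */
static bool
parse_wal_name(const char *name, uint64_t segsize,
			   uint32_t *tli, uint64_t *segno)
{
	unsigned int t, hi, lo;
	uint64_t	segs_per_id;

	/* WAL segment sizes are powers of two */
	assert(segsize > 0 && (segsize & (segsize - 1)) == 0);
	segs_per_id = UINT64_C(0x100000000) / segsize;

	/* Same shape test as IsXLogFileName(): 24 uppercase hex digits */
	if (strlen(name) != 24 || strspn(name, "0123456789ABCDEF") != 24)
		return false;
	if (sscanf(name, "%8X%8X%8X", &t, &hi, &lo) != 3)
		return false;

	*tli = t;
	*segno = (uint64_t) hi * segs_per_id + lo;
	return true;
}

/*
 * Hypothetical check: true if all names share one timeline and each segment
 * number is exactly one more than the previous one.
 */
static bool
wal_names_sequential(const char *names[], size_t n, uint64_t segsize)
{
	uint32_t	prev_tli = 0, tli;
	uint64_t	prev_segno = 0, segno;

	for (size_t i = 0; i < n; i++)
	{
		if (!parse_wal_name(names[i], segsize, &tli, &segno))
			return false;
		if (i > 0 && (tli != prev_tli || segno != prev_segno + 1))
			return false;
		prev_tli = tli;
		prev_segno = segno;
	}
	return true;
}
```

With 16 MB segments there are 256 segments per "xlogid", so 0000000100000000000000FF is followed by 000000010000000100000000. A gap or a timeline switch between consecutive members breaks the order.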
diff --git a/src/bin/pg_waldump/Makefile b/src/bin/pg_waldump/Makefile
index 4c1ee649501..aabb87566a2 100644
--- a/src/bin/pg_waldump/Makefile
+++ b/src/bin/pg_waldump/Makefile
@@ -3,6 +3,9 @@
PGFILEDESC = "pg_waldump - decode and display WAL"
PGAPPICON=win32
+# make these available to TAP test scripts
+export TAR
+
subdir = src/bin/pg_waldump
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
@@ -10,13 +13,15 @@ include $(top_builddir)/src/Makefile.global
OBJS = \
$(RMGRDESCOBJS) \
$(WIN32RES) \
+ archive_waldump.o \
compat.o \
pg_waldump.o \
rmgrdesc.o \
xlogreader.o \
xlogstats.o
-override CPPFLAGS := -DFRONTEND $(CPPFLAGS)
+override CPPFLAGS := -DFRONTEND -I$(libpq_srcdir) $(CPPFLAGS)
+LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils
RMGRDESCSOURCES = $(sort $(notdir $(wildcard $(top_srcdir)/src/backend/access/rmgrdesc/*desc*.c)))
RMGRDESCOBJS = $(patsubst %.c,%.o,$(RMGRDESCSOURCES))
diff --git a/src/bin/pg_waldump/archive_waldump.c b/src/bin/pg_waldump/archive_waldump.c
new file mode 100644
index 00000000000..17d27ffa520
--- /dev/null
+++ b/src/bin/pg_waldump/archive_waldump.c
@@ -0,0 +1,639 @@
+/*-------------------------------------------------------------------------
+ *
+ * archive_waldump.c
+ * A generic facility for reading WAL data from tar archives via archive
+ * streamer.
+ *
+ * Portions Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_waldump/archive_waldump.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <unistd.h>
+
+#include "access/xlog_internal.h"
+#include "common/hashfn.h"
+#include "common/logging.h"
+#include "fe_utils/simple_list.h"
+#include "pg_waldump.h"
+
+/*
+ * How many bytes should we try to read from a file at once?
+ */
+#define READ_CHUNK_SIZE (128 * 1024)
+
+/*
+ * Check if the start segment number is zero; this indicates a request to read
+ * any WAL file.
+ */
+#define READ_ANY_WAL(privateInfo) ((privateInfo)->start_segno == 0)
+
+/*
+ * Hash entry representing a WAL segment retrieved from the archive.
+ *
+ * While WAL segments are typically read sequentially, individual entries
+ * maintain their own buffers for the following reasons:
+ *
+ * 1. Boundary Handling: The archive streamer provides a continuous byte
+ * stream. A single streaming chunk may contain the end of one WAL segment
+ * and the start of the next. Separate buffers allow us to easily
+ * partition and track these bytes by their respective segments.
+ *
+ * 2. Out-of-Order Support: Dedicated buffers simplify logic if segments
+ * are ever archived or retrieved out of sequence.
+ *
+ * To minimize the memory footprint, entries and their associated buffers are
+ * freed as soon as they are consumed. Since pg_waldump never requests the
+ * same bytes twice, a segment is discarded once decoding has moved past it.
+ */
+typedef struct ArchivedWALFile
+{
+ uint32 status; /* hash status */
+ const char *fname; /* hash key: WAL segment name */
+
+ StringInfo buf; /* holds WAL bytes read from archive */
+
+ int read_len; /* total bytes of this segment read so far */
+} ArchivedWALFile;
+
+static uint32 hash_string_pointer(const char *s);
+#define SH_PREFIX ArchivedWAL
+#define SH_ELEMENT_TYPE ArchivedWALFile
+#define SH_KEY_TYPE const char *
+#define SH_KEY fname
+#define SH_HASH_KEY(tb, key) hash_string_pointer(key)
+#define SH_EQUAL(tb, a, b) (strcmp(a, b) == 0)
+#define SH_SCOPE static inline
+#define SH_RAW_ALLOCATOR pg_malloc0
+#define SH_DECLARE
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+typedef struct astreamer_waldump
+{
+ astreamer base;
+ XLogDumpPrivate *privateInfo;
+} astreamer_waldump;
+
+static ArchivedWALFile *get_archive_wal_entry(const char *fname,
+ XLogDumpPrivate *privateInfo,
+ int WalSegSz);
+static int read_archive_file(XLogDumpPrivate *privateInfo, Size count);
+
+static astreamer *astreamer_waldump_new(XLogDumpPrivate *privateInfo);
+static void astreamer_waldump_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_waldump_finalize(astreamer *streamer);
+static void astreamer_waldump_free(astreamer *streamer);
+
+static bool member_is_wal_file(astreamer_waldump *mystreamer,
+ astreamer_member *member,
+ char **fname);
+
+static const astreamer_ops astreamer_waldump_ops = {
+ .content = astreamer_waldump_content,
+ .finalize = astreamer_waldump_finalize,
+ .free = astreamer_waldump_free
+};
+
+/*
+ * Initializes the tar archive reader, creates a hash table for WAL entries,
+ * checks for existing valid WAL segments in the archive file and retrieves the
+ * segment size, and sets up filters for relevant entries.
+ */
+void
+init_archive_reader(XLogDumpPrivate *privateInfo, const char *waldir,
+ int *WalSegSz, pg_compress_algorithm compression)
+{
+ int fd;
+ astreamer *streamer;
+ ArchivedWALFile *entry = NULL;
+ XLogLongPageHeader longhdr;
+ XLogSegNo segno;
+ TimeLineID timeline;
+
+ /* Open tar archive and store its file descriptor */
+ fd = open_file_in_directory(waldir, privateInfo->archive_name);
+
+ if (fd < 0)
+ pg_fatal("could not open file \"%s\"", privateInfo->archive_name);
+
+ privateInfo->archive_fd = fd;
+
+ streamer = astreamer_waldump_new(privateInfo);
+
+ /* Before that we must parse the tar archive. */
+ streamer = astreamer_tar_parser_new(streamer);
+
+ /* Before that we must decompress, if archive is compressed. */
+ if (compression == PG_COMPRESSION_GZIP)
+ streamer = astreamer_gzip_decompressor_new(streamer);
+ else if (compression == PG_COMPRESSION_LZ4)
+ streamer = astreamer_lz4_decompressor_new(streamer);
+ else if (compression == PG_COMPRESSION_ZSTD)
+ streamer = astreamer_zstd_decompressor_new(streamer);
+
+ privateInfo->archive_streamer = streamer;
+
+ /*
+ * Hash table storing WAL entries read from the archive with an arbitrary
+ * initial size.
+ */
+ privateInfo->archive_wal_htab = ArchivedWAL_create(8, NULL);
+
+ /*
+ * Verify that the archive contains valid WAL files and fetch WAL segment
+ * size.
+ */
+ while (entry == NULL || entry->buf->len < XLOG_BLCKSZ)
+ {
+ if (read_archive_file(privateInfo, XLOG_BLCKSZ) == 0)
+ pg_fatal("could not find WAL in archive \"%s\"",
+ privateInfo->archive_name);
+
+ entry = privateInfo->cur_file;
+ }
+
+ /* Set WalSegSz if WAL data is successfully read */
+ longhdr = (XLogLongPageHeader) entry->buf->data;
+
+ if (!IsValidWalSegSize(longhdr->xlp_seg_size))
+ {
+ pg_log_error(ngettext("invalid WAL segment size in WAL file from archive \"%s\" (%d byte)",
+ "invalid WAL segment size in WAL file from archive \"%s\" (%d bytes)",
+ longhdr->xlp_seg_size),
+ privateInfo->archive_name, longhdr->xlp_seg_size);
+ pg_log_error_detail("The WAL segment size must be a power of two between 1 MB and 1 GB.");
+ exit(1);
+ }
+
+ *WalSegSz = longhdr->xlp_seg_size;
+
+ /*
+ * With the WAL segment size available, we can now initialize the
+ * dependent start and end segment numbers.
+ */
+ Assert(!XLogRecPtrIsInvalid(privateInfo->startptr));
+ XLByteToSeg(privateInfo->startptr, privateInfo->start_segno, *WalSegSz);
+
+ if (!XLogRecPtrIsInvalid(privateInfo->endptr))
+ XLByteToSeg(privateInfo->endptr, privateInfo->end_segno, *WalSegSz);
+
+ /*
+ * This WAL segment was fetched before the filtering parameters
+ * (start_segno and end_segno) were fully initialized. Perform the
+ * relevance check against the user-provided range now; if the segment
+ * falls outside this range, remove it from the hash table. Subsequent
+ * segments will be filtered automatically by the archive streamer using
+ * the updated start_segno and end_segno values.
+ */
+ XLogFromFileName(entry->fname, &timeline, &segno, privateInfo->segsize);
+ if (privateInfo->timeline != timeline ||
+ privateInfo->start_segno > segno ||
+ privateInfo->end_segno < segno)
+ free_archive_wal_entry(entry->fname, privateInfo);
+}
+
+/*
+ * Release the archive streamer chain and close the archive file.
+ */
+void
+free_archive_reader(XLogDumpPrivate *privateInfo)
+{
+ /*
+ * NB: Normally, astreamer_finalize() is called before astreamer_free() to
+ * flush any remaining buffered data or to ensure the end of the tar
+ * archive is reached. However, when decoding a WAL file, once we hit the
+ * end LSN, any remaining WAL data in the buffer or the tar archive's
+ * unreached end can be safely ignored.
+ */
+ astreamer_free(privateInfo->archive_streamer);
+
+ /* Close the file. */
+ if (close(privateInfo->archive_fd) != 0)
+ pg_log_error("could not close file \"%s\": %m",
+ privateInfo->archive_name);
+}
+
+/*
+ * Copies the requested WAL bytes from the entry's buffer into readBuff; if
+ * they are not buffered yet, reads more of the tar archive via the astreamer.
+ */
+int
+read_archive_wal_page(XLogDumpPrivate *privateInfo, XLogRecPtr targetPagePtr,
+ Size count, char *readBuff, int WalSegSz)
+{
+ char *p = readBuff;
+ Size nbytes = count;
+ XLogRecPtr recptr = targetPagePtr;
+ XLogSegNo segno;
+ char fname[MAXFNAMELEN];
+ ArchivedWALFile *entry;
+
+ /* Identify the segment and locate its entry in the archive hash */
+ XLByteToSeg(targetPagePtr, segno, WalSegSz);
+ XLogFileName(fname, privateInfo->timeline, segno, WalSegSz);
+ entry = get_archive_wal_entry(fname, privateInfo, WalSegSz);
+
+ while (nbytes > 0)
+ {
+ char *buf = entry->buf->data;
+ int bufLen = entry->buf->len;
+ XLogRecPtr endPtr;
+ XLogRecPtr startPtr;
+
+ /* Calculate the LSN range currently residing in the buffer */
+ XLogSegNoOffsetToRecPtr(segno, entry->read_len, WalSegSz, endPtr);
+ startPtr = endPtr - bufLen;
+
+ /*
+ * Copy the requested WAL data if it is present in the buffer.
+ */
+ if (bufLen > 0 && startPtr <= recptr && recptr < endPtr)
+ {
+ int copyBytes;
+ int offset = recptr - startPtr;
+
+ /*
+ * Given startPtr <= recptr < endPtr and a total buffer size
+ * 'bufLen', the offset (recptr - startPtr) will always be less
+ * than 'bufLen'.
+ */
+ Assert(offset < bufLen);
+
+ copyBytes = Min(nbytes, bufLen - offset);
+ memcpy(p, buf + offset, copyBytes);
+
+ /* Update state for read */
+ recptr += copyBytes;
+ nbytes -= copyBytes;
+ p += copyBytes;
+ }
+ else
+ {
+ /*
+ * Before starting the actual decoding loop, pg_waldump tries to
+ * locate the first valid record from the user-specified start
+ * position, which might not be the start of a WAL record and
+ * could fall in the middle of a record that spans multiple pages.
+ * Consequently, the valid start position the decoder is looking
+ * for could be far away from that initial position.
+ *
+ * This may involve reading across multiple pages, and this
+ * pre-reading fetches data in multiple rounds from the archive
+ * streamer; normally, we would throw away existing buffer
+ * contents to fetch the next set of data, but that existing data
+ * might be needed once the main loop starts. Because previously
+ * read data cannot be re-read by the archive streamer, we delay
+ * resetting the buffer until the main decoding loop is entered.
+ *
+ * Once pg_waldump has entered the main loop, it may re-read the
+ * currently active page, but never an older one; therefore, any
+ * fully consumed WAL data preceding the current page can then be
+ * safely discarded.
+ */
+ if (privateInfo->decoding_started)
+ {
+ resetStringInfo(entry->buf);
+
+ /*
+ * Push the partial data already copied for the current page
+ * back into the buffer, so that the full page remains
+ * available for re-reading if requested.
+ */
+ if (p > readBuff)
+ {
+ Assert((count - nbytes) > 0);
+ appendBinaryStringInfo(entry->buf, readBuff, count - nbytes);
+ }
+ }
+
+ /*
+ * Now, fetch more data; raise an error if the archive streamer is
+ * no longer reading this segment or if the end of the archive
+ * has been reached.
+ */
+ if (privateInfo->cur_file != entry ||
+ read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
+ pg_fatal("could not read file \"%s\" from archive \"%s\": read %lld of %lld",
+ fname, privateInfo->archive_name,
+ (long long int) (count - nbytes),
+ (long long int) count);
+ }
+ }
+
+ /*
+ * We should have either successfully read all the requested bytes or
+ * reported a failure before this point.
+ */
+ Assert(nbytes == 0);
+
+ /*
+ * NB: We return the count that was passed in. We could return a
+ * boolean instead, since we either read the whole WAL page or raise
+ * an error, but the caller expects the number of bytes read. The
+ * routine that reads WAL pages from the physical WAL file follows the
+ * same convention.
+ */
+ return count;
+}
+
+/*
+ * Frees a WAL entry and its buffer once it is consumed or being ignored. This
+ * releases memory and prevents irrelevant WAL data from accumulating. If the
+ * entry is the one currently being read, cur_file in privateInfo is also set
+ * to NULL so that the archive streamer skips copying the rest of that member.
+ */
+void
+free_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo)
+{
+ ArchivedWALFile *entry;
+
+ entry = ArchivedWAL_lookup(privateInfo->archive_wal_htab, fname);
+
+ if (entry == NULL)
+ return;
+
+ /* Destroy the buffer */
+ destroyStringInfo(entry->buf);
+ entry->buf = NULL;
+
+ /* Set cur_file to NULL if it matches the entry being ignored */
+ if (privateInfo->cur_file == entry)
+ privateInfo->cur_file = NULL;
+
+ ArchivedWAL_delete_item(privateInfo->archive_wal_htab, entry);
+}
+
+/*
+ * Returns the archived WAL entry from the hash table if it exists. Otherwise,
+ * it invokes the routine to read the archived file, which then populates the
+ * entry in the hash table if that WAL exists in the archive.
+ */
+static ArchivedWALFile *
+get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo,
+ int WalSegSz)
+{
+ ArchivedWALFile *entry = NULL;
+
+ /* Search hash table */
+ entry = ArchivedWAL_lookup(privateInfo->archive_wal_htab, fname);
+
+ if (entry != NULL)
+ return entry;
+
+ /*
+ * The requested WAL entry has not been read from the archive yet; invoke
+ * the archive streamer to read it.
+ */
+ while (1)
+ {
+ /* Fetch more data */
+ if (read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
+ break; /* archive file ended */
+
+ /*
+ * The archive streamer is reading a non-WAL file or an irrelevant
+ * WAL file.
+ */
+ if (privateInfo->cur_file == NULL)
+ continue;
+
+ entry = privateInfo->cur_file;
+
+ /* Found the required entry */
+ if (strcmp(fname, entry->fname) == 0)
+ return entry;
+
+ /* WAL segments must be archived in order */
+ pg_log_error("WAL files are not archived in sequential order");
+ pg_log_error_detail("Expecting segment \"%s\" but found \"%s\".",
+ fname, entry->fname);
+ exit(1);
+ }
+
+ /* Requested WAL segment not found */
+ pg_fatal("could not find WAL \"%s\" in archive \"%s\"",
+ fname, privateInfo->archive_name);
+}
+
+/*
+ * Reads up to 'count' bytes of the archive file and passes them to the
+ * archive streamer. Returns the number of bytes read; 0 means end of file.
+ */
+static int
+read_archive_file(XLogDumpPrivate *privateInfo, Size count)
+{
+ int rc;
+ char *buffer;
+
+ buffer = pg_malloc(count * sizeof(uint8));
+
+ rc = read(privateInfo->archive_fd, buffer, count);
+ if (rc < 0)
+ pg_fatal("could not read file \"%s\": %m",
+ privateInfo->archive_name);
+
+ /*
+ * Decompress (if required), and then parse the previously read contents
+ * of the tar file.
+ */
+ if (rc > 0)
+ astreamer_content(privateInfo->archive_streamer, NULL,
+ buffer, rc, ASTREAMER_UNKNOWN);
+ pg_free(buffer);
+
+ return rc;
+}
+
+/*
+ * Create an astreamer that can read WAL from a tar file.
+ */
+static astreamer *
+astreamer_waldump_new(XLogDumpPrivate *privateInfo)
+{
+ astreamer_waldump *streamer;
+
+ streamer = palloc0_object(astreamer_waldump);
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_waldump_ops;
+
+ streamer->privateInfo = privateInfo;
+
+ return &streamer->base;
+}
+
+/*
+ * Main entry point of the archive streamer for reading WAL data from a tar
+ * file. If a member is identified as a valid WAL file, a hash entry is created
+ * for it, and its contents are copied into that entry's buffer, making them
+ * accessible to the decoding routine.
+ */
+static void
+astreamer_waldump_content(astreamer *streamer, astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
+{
+ astreamer_waldump *mystreamer = (astreamer_waldump *) streamer;
+ XLogDumpPrivate *privateInfo = mystreamer->privateInfo;
+
+ Assert(context != ASTREAMER_UNKNOWN);
+
+ switch (context)
+ {
+ case ASTREAMER_MEMBER_HEADER:
+ {
+ char *fname = NULL;
+ ArchivedWALFile *entry;
+ bool found;
+
+ pg_log_debug("reading \"%s\"", member->pathname);
+
+ if (!member_is_wal_file(mystreamer, member, &fname))
+ break;
+
+ /*
+ * Further checks are skipped if any WAL file can be read.
+ * This typically occurs during initial verification.
+ */
+ if (!READ_ANY_WAL(privateInfo))
+ {
+ XLogSegNo segno;
+ TimeLineID timeline;
+
+ /*
+ * Skip the segment if its timeline does not match or if
+ * it falls outside the caller-specified range.
+ */
+ XLogFromFileName(fname, &timeline, &segno, privateInfo->segsize);
+ if (privateInfo->timeline != timeline ||
+ privateInfo->start_segno > segno ||
+ privateInfo->end_segno < segno)
+ {
+ free(fname);
+ break;
+ }
+ }
+
+ entry = ArchivedWAL_insert(privateInfo->archive_wal_htab,
+ fname, &found);
+
+ /*
+ * Shouldn't happen, but if it does, simply ignore the
+ * duplicate WAL file.
+ */
+ if (found)
+ {
+ pg_log_warning("ignoring duplicate WAL \"%s\" found in archive \"%s\"",
+ member->pathname, privateInfo->archive_name);
+ break;
+ }
+
+ /* Start with an empty buffer; nothing has been read yet */
+ entry->buf = makeStringInfo();
+ entry->read_len = 0;
+ privateInfo->cur_file = entry;
+ }
+ break;
+
+ case ASTREAMER_MEMBER_CONTENTS:
+ if (privateInfo->cur_file)
+ {
+ appendBinaryStringInfo(privateInfo->cur_file->buf, data, len);
+ privateInfo->cur_file->read_len += len;
+ }
+ break;
+
+ case ASTREAMER_MEMBER_TRAILER:
+ privateInfo->cur_file = NULL;
+ break;
+
+ case ASTREAMER_ARCHIVE_TRAILER:
+ break;
+
+ default:
+ /* Shouldn't happen. */
+ pg_fatal("unexpected state while parsing tar file");
+ }
+}
+
+/*
+ * End-of-stream processing for an astreamer_waldump stream.
+ */
+static void
+astreamer_waldump_finalize(astreamer *streamer)
+{
+ Assert(streamer->bbs_next == NULL);
+}
+
+/*
+ * Free memory associated with an astreamer_waldump stream.
+ */
+static void
+astreamer_waldump_free(astreamer *streamer)
+{
+ Assert(streamer->bbs_next == NULL);
+ pfree(streamer);
+}
+
+/*
+ * Returns true if the archive member name matches the WAL naming format. If
+ * successful, it also outputs the WAL segment name.
+ */
+static bool
+member_is_wal_file(astreamer_waldump *mystreamer, astreamer_member *member,
+ char **fname)
+{
+ int pathlen;
+ char pathname[MAXPGPATH];
+ char *filename;
+
+ /* We are only interested in normal files. */
+ if (member->is_directory || member->is_link)
+ return false;
+
+ /*
+ * For a correct comparison, we must remove any '.' or '..' components
+ * from the member pathname. Similar to member_verify_header(), we prepend
+ * './' to the path so that canonicalize_path() can properly resolve and
+ * strip these references from the tar member name.
+ */
+ snprintf(pathname, MAXPGPATH, "./%s", member->pathname);
+ canonicalize_path(pathname);
+ pathlen = strlen(pathname);
+
+ if (pathlen < XLOG_FNAME_LEN)
+ return false; /* too short to hold a WAL file name */
+
+ /* WAL files from the top-level or pg_wal directory will be decoded */
+ if (pathlen > XLOG_FNAME_LEN &&
+ strncmp(pathname, XLOGDIR, strlen(XLOGDIR)) != 0)
+ return false;
+
+ /* WAL file could be with full path */
+ filename = pathname + (pathlen - XLOG_FNAME_LEN);
+ if (!IsXLogFileName(filename))
+ return false;
+
+ *fname = pnstrdup(filename, XLOG_FNAME_LEN);
+
+ return true;
+}
+
+/*
+ * Helper function for the ArchivedWAL hash table.
+ */
+static uint32
+hash_string_pointer(const char *s)
+{
+ unsigned char *ss = (unsigned char *) s;
+
+ return hash_bytes(ss, strlen(s));
+}
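The buffer bookkeeping in read_archive_wal_page() is easier to follow with numbers. It assumes an entry's buffer always holds the most recently read bytes of its segment. Under that assumption, the buffer covers the half-open LSN range [segment start + read_len - buf_len, segment start + read_len). The sketch below restates that arithmetic with plain integers. `copy_from_window()` and `lsn_t` are hypothetical names, not PostgreSQL APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t lsn_t;			/* hypothetical stand-in for XLogRecPtr */

/*
 * Hypothetical restatement of the window arithmetic in read_archive_wal_page().
 * The buffer holds the last buf_len of the read_len bytes read so far from
 * segment segno, so it covers LSNs [end - buf_len, end). Copies up to nbytes
 * starting at recptr into dst and returns how many bytes were copied; 0 means
 * recptr is not buffered and more data must be fetched.
 */
static size_t
copy_from_window(uint64_t segno, uint64_t segsize,
				 const char *buf, size_t buf_len, size_t read_len,
				 lsn_t recptr, char *dst, size_t nbytes)
{
	lsn_t		end = segno * segsize + read_len;	/* like XLogSegNoOffsetToRecPtr */
	lsn_t		start = end - buf_len;
	size_t		offset;
	size_t		n;

	if (buf_len == 0 || recptr < start || recptr >= end)
		return 0;

	/* start <= recptr < end, so the offset always falls inside the buffer */
	offset = (size_t) (recptr - start);
	assert(offset < buf_len);

	n = nbytes < buf_len - offset ? nbytes : buf_len - offset;
	memcpy(dst, buf + offset, n);
	return n;
}
```

As an example, take a deliberately tiny segment size of 1024, segment 2, read_len 100 and a 16-byte buffer. The window is then [2132, 2148). A request at LSN 2140 copies the last 8 buffered bytes, and the caller would have to fetch more data for the remainder.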
diff --git a/src/bin/pg_waldump/meson.build b/src/bin/pg_waldump/meson.build
index 633a9874bb5..5296f21b82c 100644
--- a/src/bin/pg_waldump/meson.build
+++ b/src/bin/pg_waldump/meson.build
@@ -1,6 +1,7 @@
# Copyright (c) 2022-2026, PostgreSQL Global Development Group
pg_waldump_sources = files(
+ 'archive_waldump.c',
'compat.c',
'pg_waldump.c',
'rmgrdesc.c',
@@ -18,7 +19,7 @@ endif
pg_waldump = executable('pg_waldump',
pg_waldump_sources,
- dependencies: [frontend_code, lz4, zstd],
+ dependencies: [frontend_code, libpq, lz4, zstd],
c_args: ['-DFRONTEND'], # needed for xlogreader et al
kwargs: default_bin_args,
)
@@ -29,6 +30,7 @@ tests += {
'sd': meson.current_source_dir(),
'bd': meson.current_build_dir(),
'tap': {
+ 'env': {'TAR': tar.found() ? tar.full_path() : ''},
'tests': [
't/001_basic.pl',
't/002_save_fullpage.pl',
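member_is_wal_file() in archive_waldump.c decides which tar members are treated as WAL. Below is a simplified sketch of the same name filter. It assumes the path has already been canonicalized, so it contains no '.' or '..' components, and it uses local stand-ins for XLOG_FNAME_LEN and XLOGDIR. The length is re-checked on the canonicalized path because a member such as "some_long_directory_name/../x" passes a length check on its raw name but canonicalizes to "x". `looks_like_wal_member()` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define WAL_FNAME_LEN 24		/* assumed equal to XLOG_FNAME_LEN */
#define WAL_DIR "pg_wal"		/* assumed equal to XLOGDIR */

/*
 * Hypothetical re-implementation of the name filter, applied to an already
 * canonicalized path. On success, copies the 24-character segment name into
 * out (which must hold WAL_FNAME_LEN + 1 bytes).
 */
static bool
looks_like_wal_member(const char *path, char out[WAL_FNAME_LEN + 1])
{
	size_t		len;
	const char *name;

	assert(path != NULL && out != NULL);
	len = strlen(path);

	/* Too short to end in a WAL file name at all */
	if (len < WAL_FNAME_LEN)
		return false;

	/* Only top-level members or members under pg_wal qualify */
	if (len > WAL_FNAME_LEN && strncmp(path, WAL_DIR, strlen(WAL_DIR)) != 0)
		return false;

	/* The last 24 characters must all be uppercase hex digits */
	name = path + (len - WAL_FNAME_LEN);
	if (strspn(name, "0123456789ABCDEF") != WAL_FNAME_LEN)
		return false;

	memcpy(out, name, WAL_FNAME_LEN);
	out[WAL_FNAME_LEN] = '\0';
	return true;
}
```

Members such as "pg_wal/000000010000000000000001.partial" or anything under "base/" are rejected. Both "000000010000000000000001" and "pg_wal/000000010000000000000002" are accepted.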
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 5d31b15dbd8..90fc13f3609 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -176,7 +176,7 @@ split_path(const char *path, char **dir, char **fname)
*
* return a read only fd
*/
-static int
+int
open_file_in_directory(const char *directory, const char *fname)
{
int fd = -1;
@@ -440,6 +440,80 @@ WALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
return count;
}
+/*
+ * pg_waldump's XLogReaderRoutine->segment_open callback to support dumping WAL
+ * files from tar archives.
+ */
+static void
+TarWALDumpOpenSegment(XLogReaderState *state, XLogSegNo nextSegNo,
+ TimeLineID *tli_p)
+{
+ /* No action needed */
+}
+
+/*
+ * pg_waldump's XLogReaderRoutine->segment_close callback.
+ */
+static void
+TarWALDumpCloseSegment(XLogReaderState *state)
+{
+ /* No action needed */
+}
+
+/*
+ * pg_waldump's XLogReaderRoutine->page_read callback to support dumping WAL
+ * files from tar archives.
+ */
+static int
+TarWALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
+ XLogRecPtr targetPtr, char *readBuff)
+{
+ XLogDumpPrivate *private = state->private_data;
+ int count = required_read_len(private, targetPagePtr, reqLen);
+ int WalSegSz = state->segcxt.ws_segsize;
+ XLogSegNo curSegNo;
+
+ /* Bail out if the count to be read is not valid */
+ if (count < 0)
+ return -1;
+
+ /*
+ * If the target page is in a different segment, free the buffer space
+ * occupied by the previous segment's data. Since pg_waldump never requests
+ * the same WAL bytes twice, moving to a new segment implies that the
+ * previous segment and its buffered data will not be needed again.
+ */
+ curSegNo = state->seg.ws_segno;
+ if (!XLByteInSeg(targetPagePtr, curSegNo, WalSegSz))
+ {
+ char fname[MAXFNAMELEN];
+ XLogSegNo nextSegNo;
+
+ /*
+ * Calculate the next WAL segment to be decoded from the given page
+ * pointer.
+ */
+ XLByteToSeg(targetPagePtr, nextSegNo, WalSegSz);
+ state->seg.ws_tli = private->timeline;
+ state->seg.ws_segno = nextSegNo;
+
+ /*
+ * If in pre-reading mode (prior to actual decoding), do not delete any
+ * entries that might be requested again once the decoding loop starts.
+ * For more details, see the comments in read_archive_wal_page().
+ */
+ if (private->decoding_started && curSegNo < nextSegNo)
+ {
+ XLogFileName(fname, state->seg.ws_tli, curSegNo, WalSegSz);
+ free_archive_wal_entry(fname, private);
+ }
+ }
+
+ /* Read the WAL page from the archive streamer */
+ return read_archive_wal_page(private, targetPagePtr, count, readBuff,
+ WalSegSz);
+}
+
/*
* Boolean to return whether the given WAL record matches a specific relation
* and optionally block.
@@ -777,8 +851,8 @@ usage(void)
printf(_(" -F, --fork=FORK only show records that modify blocks in fork FORK;\n"
" valid names are main, fsm, vm, init\n"));
printf(_(" -n, --limit=N number of records to display\n"));
- printf(_(" -p, --path=PATH directory in which to find WAL segment files or a\n"
- " directory with a ./pg_wal that contains such files\n"
+ printf(_(" -p, --path=PATH tar archive or a directory in which to find WAL segment files or\n"
+ " a directory with a ./pg_wal that contains such files\n"
" (default: current directory, ./pg_wal, $PGDATA/pg_wal)\n"));
printf(_(" -q, --quiet do not print any output, except for errors\n"));
printf(_(" -r, --rmgr=RMGR only show records generated by resource manager RMGR;\n"
@@ -810,7 +884,9 @@ main(int argc, char **argv)
XLogRecord *record;
XLogRecPtr first_record;
char *waldir = NULL;
+ char *walpath = NULL;
char *errormsg;
+ pg_compress_algorithm compression;
static struct option long_options[] = {
{"bkp-details", no_argument, NULL, 'b'},
@@ -868,6 +944,10 @@ main(int argc, char **argv)
private.startptr = InvalidXLogRecPtr;
private.endptr = InvalidXLogRecPtr;
private.endptr_reached = false;
+ private.decoding_started = false;
+ private.archive_name = NULL;
+ private.start_segno = 0;
+ private.end_segno = UINT64_MAX;
config.quiet = false;
config.bkp_details = false;
@@ -943,7 +1023,7 @@ main(int argc, char **argv)
}
break;
case 'p':
- waldir = pg_strdup(optarg);
+ walpath = pg_strdup(optarg);
break;
case 'q':
config.quiet = true;
@@ -1107,10 +1187,19 @@ main(int argc, char **argv)
goto bad_argument;
}
- if (waldir != NULL)
+ if (walpath != NULL)
{
+ /* validate path points to tar archive */
+ if (parse_tar_compress_algorithm(walpath, &compression))
+ {
+ char *fname = NULL;
+
+ split_path(walpath, &waldir, &fname);
+
+ private.archive_name = fname;
+ }
/* validate path points to directory */
- if (!verify_directory(waldir))
+ else if (!verify_directory(walpath))
{
- pg_log_error("could not open directory \"%s\": %m", waldir);
+ pg_log_error("could not open directory \"%s\": %m", walpath);
goto bad_argument;
@@ -1128,6 +1217,17 @@ main(int argc, char **argv)
int fd;
XLogSegNo segno;
+ /*
+ * If a tar archive is passed using the --path option, all other
+ * arguments become unnecessary.
+ */
+ if (private.archive_name)
+ {
+ pg_log_error("unnecessary command-line arguments specified with tar archive (first is \"%s\")",
+ argv[optind]);
+ goto bad_argument;
+ }
+
split_path(argv[optind], &directory, &fname);
if (waldir == NULL && directory != NULL)
@@ -1138,69 +1238,76 @@ main(int argc, char **argv)
pg_fatal("could not open directory \"%s\": %m", waldir);
}
- waldir = identify_target_directory(waldir, fname, &private.segsize);
- fd = open_file_in_directory(waldir, fname);
- if (fd < 0)
- pg_fatal("could not open file \"%s\"", fname);
- close(fd);
-
- /* parse position from file */
- XLogFromFileName(fname, &private.timeline, &segno, private.segsize);
-
- if (!XLogRecPtrIsValid(private.startptr))
- XLogSegNoOffsetToRecPtr(segno, 0, private.segsize, private.startptr);
- else if (!XLByteInSeg(private.startptr, segno, private.segsize))
+ if (fname != NULL && parse_tar_compress_algorithm(fname, &compression))
{
- pg_log_error("start WAL location %X/%08X is not inside file \"%s\"",
- LSN_FORMAT_ARGS(private.startptr),
- fname);
- goto bad_argument;
+ private.archive_name = fname;
}
-
- /* no second file specified, set end position */
- if (!(optind + 1 < argc) && !XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(segno + 1, 0, private.segsize, private.endptr);
-
- /* parse ENDSEG if passed */
- if (optind + 1 < argc)
+ else
{
- XLogSegNo endsegno;
-
- /* ignore directory, already have that */
- split_path(argv[optind + 1], &directory, &fname);
-
+ waldir = identify_target_directory(waldir, fname, &private.segsize);
fd = open_file_in_directory(waldir, fname);
if (fd < 0)
pg_fatal("could not open file \"%s\"", fname);
close(fd);
/* parse position from file */
- XLogFromFileName(fname, &private.timeline, &endsegno, private.segsize);
+ XLogFromFileName(fname, &private.timeline, &segno, private.segsize);
- if (endsegno < segno)
- pg_fatal("ENDSEG %s is before STARTSEG %s",
- argv[optind + 1], argv[optind]);
+ if (!XLogRecPtrIsValid(private.startptr))
+ XLogSegNoOffsetToRecPtr(segno, 0, private.segsize, private.startptr);
+ else if (!XLByteInSeg(private.startptr, segno, private.segsize))
+ {
+ pg_log_error("start WAL location %X/%08X is not inside file \"%s\"",
+ LSN_FORMAT_ARGS(private.startptr),
+ fname);
+ goto bad_argument;
+ }
- if (!XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(endsegno + 1, 0, private.segsize,
- private.endptr);
+ /* no second file specified, set end position */
+ if (!(optind + 1 < argc) && !XLogRecPtrIsValid(private.endptr))
+ XLogSegNoOffsetToRecPtr(segno + 1, 0, private.segsize, private.endptr);
- /* set segno to endsegno for check of --end */
- segno = endsegno;
- }
+ /* parse ENDSEG if passed */
+ if (optind + 1 < argc)
+ {
+ XLogSegNo endsegno;
+ /* ignore directory, already have that */
+ split_path(argv[optind + 1], &directory, &fname);
- if (!XLByteInSeg(private.endptr, segno, private.segsize) &&
- private.endptr != (segno + 1) * private.segsize)
- {
- pg_log_error("end WAL location %X/%08X is not inside file \"%s\"",
- LSN_FORMAT_ARGS(private.endptr),
- argv[argc - 1]);
- goto bad_argument;
+ fd = open_file_in_directory(waldir, fname);
+ if (fd < 0)
+ pg_fatal("could not open file \"%s\"", fname);
+ close(fd);
+
+ /* parse position from file */
+ XLogFromFileName(fname, &private.timeline, &endsegno, private.segsize);
+
+ if (endsegno < segno)
+ pg_fatal("ENDSEG %s is before STARTSEG %s",
+ argv[optind + 1], argv[optind]);
+
+ if (!XLogRecPtrIsValid(private.endptr))
+ XLogSegNoOffsetToRecPtr(endsegno + 1, 0, private.segsize,
+ private.endptr);
+
+ /* set segno to endsegno for check of --end */
+ segno = endsegno;
+ }
+
+
+ if (!XLByteInSeg(private.endptr, segno, private.segsize) &&
+ private.endptr != (segno + 1) * private.segsize)
+ {
+ pg_log_error("end WAL location %X/%08X is not inside file \"%s\"",
+ LSN_FORMAT_ARGS(private.endptr),
+ argv[argc - 1]);
+ goto bad_argument;
+ }
}
}
- else
- waldir = identify_target_directory(waldir, NULL, &private.segsize);
+ else if (!private.archive_name)
+ waldir = identify_target_directory(walpath, NULL, &private.segsize);
/* we don't know what to print */
if (!XLogRecPtrIsValid(private.startptr))
@@ -1212,12 +1319,36 @@ main(int argc, char **argv)
/* done with argument parsing, do the actual work */
/* we have everything we need, start reading */
- xlogreader_state =
- XLogReaderAllocate(private.segsize, waldir,
- XL_ROUTINE(.page_read = WALDumpReadPage,
- .segment_open = WALDumpOpenSegment,
- .segment_close = WALDumpCloseSegment),
- &private);
+ if (private.archive_name)
+ {
+ /*
+ * A NULL WAL directory indicates that the archive file is located in
+ * the current working directory of the pg_waldump process.
+ */
+ if (waldir == NULL)
+ waldir = pg_strdup(".");
+
+ /* Set up for reading tar file */
+ init_archive_reader(&private, waldir, &private.segsize, compression);
+
+ /* Routine to decode WAL files in tar archive */
+ xlogreader_state =
+ XLogReaderAllocate(private.segsize, waldir,
+ XL_ROUTINE(.page_read = TarWALDumpReadPage,
+ .segment_open = TarWALDumpOpenSegment,
+ .segment_close = TarWALDumpCloseSegment),
+ &private);
+ }
+ else
+ {
+ xlogreader_state =
+ XLogReaderAllocate(private.segsize, waldir,
+ XL_ROUTINE(.page_read = WALDumpReadPage,
+ .segment_open = WALDumpOpenSegment,
+ .segment_close = WALDumpCloseSegment),
+ &private);
+ }
+
if (!xlogreader_state)
pg_fatal("out of memory while allocating a WAL reading processor");
@@ -1245,6 +1376,9 @@ main(int argc, char **argv)
if (config.stats == true && !config.quiet)
stats.startptr = first_record;
+ /* Flag indicating that the decoding loop has been entered */
+ private.decoding_started = true;
+
for (;;)
{
if (time_to_stop)
@@ -1326,6 +1460,9 @@ main(int argc, char **argv)
XLogReaderFree(xlogreader_state);
+ if (private.archive_name)
+ free_archive_reader(&private);
+
return EXIT_SUCCESS;
bad_argument:
diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h
index 013b051506f..54d54a8a718 100644
--- a/src/bin/pg_waldump/pg_waldump.h
+++ b/src/bin/pg_waldump/pg_waldump.h
@@ -12,6 +12,11 @@
#define PG_WALDUMP_H
#include "access/xlogdefs.h"
+#include "fe_utils/astreamer.h"
+
+/* Forward declaration */
+struct ArchivedWALFile;
+struct ArchivedWAL_hash;
/* Contains the necessary information to drive WAL decoding */
typedef struct XLogDumpPrivate
@@ -21,6 +26,44 @@ typedef struct XLogDumpPrivate
XLogRecPtr startptr;
XLogRecPtr endptr;
bool endptr_reached;
+ bool decoding_started;
+
+ /* Fields required to read WAL from archive */
+ char *archive_name; /* Tar archive name */
+ int archive_fd; /* File descriptor for the open tar file */
+
+ astreamer *archive_streamer;
+
+ /* What the archive streamer is currently reading */
+ struct ArchivedWALFile *cur_file;
+
+ /*
+ * Hash table of all WAL files that the archive stream has read, including
+ * the one currently in progress.
+ */
+ struct ArchivedWAL_hash *archive_wal_htab;
+
+ /*
+ * Although these values can be easily derived from startptr and endptr,
+ * doing so repeatedly for each archived member would be inefficient, as
+ * it would involve recalculating and filtering out irrelevant WAL
+ * segments.
+ */
+ XLogSegNo start_segno;
+ XLogSegNo end_segno;
} XLogDumpPrivate;
+extern int open_file_in_directory(const char *directory, const char *fname);
+
+extern void init_archive_reader(XLogDumpPrivate *privateInfo,
+ const char *waldir, int *WalSegSz,
+ pg_compress_algorithm compression);
+extern void free_archive_reader(XLogDumpPrivate *privateInfo);
+extern int read_archive_wal_page(XLogDumpPrivate *privateInfo,
+ XLogRecPtr targetPagePtr,
+ Size count, char *readBuff,
+ int WalSegSz);
+extern void free_archive_wal_entry(const char *fname,
+ XLogDumpPrivate *privateInfo);
+
#endif /* PG_WALDUMP_H */
diff --git a/src/bin/pg_waldump/t/001_basic.pl b/src/bin/pg_waldump/t/001_basic.pl
index f12ba52cbfc..9ab7457e9e2 100644
--- a/src/bin/pg_waldump/t/001_basic.pl
+++ b/src/bin/pg_waldump/t/001_basic.pl
@@ -3,10 +3,13 @@
use strict;
use warnings FATAL => 'all';
+use Cwd;
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Test::More;
+my $tar = $ENV{TAR};
+
program_help_ok('pg_waldump');
program_version_ok('pg_waldump');
program_options_handling_ok('pg_waldump');
@@ -162,6 +165,42 @@ CREATE TABLESPACE ts1 LOCATION '$tblspc_path';
DROP TABLESPACE ts1;
});
+# Test: Decode a continuation record (contrecord) that spans multiple WAL
+# segments.
+#
+# Now consume all remaining room in the current WAL segment, leaving
+# space enough only for the start of a largish record.
+$node->safe_psql(
+ 'postgres', q{
+DO $$
+DECLARE
+ wal_segsize int := setting::int FROM pg_settings WHERE name = 'wal_segment_size';
+ remain int;
+ iters int := 0;
+BEGIN
+ LOOP
+ INSERT into t1(b)
+ select repeat(encode(sha256(g::text::bytea), 'hex'), (random() * 15 + 1)::int)
+ from generate_series(1, 10) g;
+
+ remain := wal_segsize - (pg_current_wal_insert_lsn() - '0/0') % wal_segsize;
+ IF remain < 2 * setting::int from pg_settings where name = 'block_size' THEN
+ RAISE log 'exiting after % iterations, % bytes to end of WAL segment', iters, remain;
+ EXIT;
+ END IF;
+ iters := iters + 1;
+ END LOOP;
+END
+$$;
+});
+
+my $contrecord_lsn = $node->safe_psql('postgres',
+ 'SELECT pg_current_wal_insert_lsn()');
+# Generate a record that continues into the next WAL segment (contrecord)
+$node->safe_psql('postgres',
+ qq{SELECT pg_logical_emit_message(true, 'test 026', repeat('xyzxz', 123456))}
+);
+
my ($end_lsn, $end_walfile) = split /\|/,
$node->safe_psql('postgres',
q{SELECT pg_current_wal_insert_lsn(), pg_walfile_name(pg_current_wal_insert_lsn())}
@@ -259,11 +298,50 @@ sub test_pg_waldump
return @lines;
}
-my @lines;
+# Create a tar archive with its files in sorted order
+sub generate_archive
+{
+ my ($archive, $directory, $compression_flags) = @_;
+
+ my @files;
+ opendir my $dh, $directory or die "opendir: $!";
+ while (my $entry = readdir $dh) {
+ # Skip '.' and '..'
+ next if $entry eq '.' || $entry eq '..';
+ push @files, $entry;
+ }
+ closedir $dh;
+
+ @files = sort @files;
+
+ # move into the WAL directory before archiving files
+ my $cwd = getcwd;
+ chdir($directory) || die "chdir: $!";
+ command_ok([$tar, $compression_flags, $archive, @files]);
+ chdir($cwd) || die "chdir: $!";
+}
+
+my $tmp_dir = PostgreSQL::Test::Utils::tempdir_short();
my @scenarios = (
{
- 'path' => $node->data_dir
+ 'path' => $node->data_dir,
+ 'is_archive' => 0,
+ 'enabled' => 1
+ },
+ {
+ 'path' => "$tmp_dir/pg_wal.tar",
+ 'compression_method' => 'none',
+ 'compression_flags' => '-cf',
+ 'is_archive' => 1,
+ 'enabled' => 1
+ },
+ {
+ 'path' => "$tmp_dir/pg_wal.tar.gz",
+ 'compression_method' => 'gzip',
+ 'compression_flags' => '-czf',
+ 'is_archive' => 1,
+ 'enabled' => check_pg_config("#define HAVE_LIBZ 1")
});
for my $scenario (@scenarios)
@@ -272,6 +350,19 @@ for my $scenario (@scenarios)
SKIP:
{
+ skip "tar command is not available", 3
+ if !defined $tar;
+ skip "$scenario->{'compression_method'} compression not supported by this build", 3
+ if !$scenario->{'enabled'} && $scenario->{'is_archive'};
+
+ # create pg_wal archive
+ if ($scenario->{'is_archive'})
+ {
+ generate_archive($path,
+ $node->data_dir . '/pg_wal',
+ $scenario->{'compression_flags'});
+ }
+
command_fails_like(
[ 'pg_waldump', '--path' => $path ],
qr/error: no start WAL location given/,
@@ -305,9 +396,14 @@ for my $scenario (@scenarios)
test_pg_waldump_skip_bytes($path, $start_lsn, $end_lsn);
- @lines = test_pg_waldump($path, $start_lsn, $end_lsn);
+ my @lines = test_pg_waldump($path, $start_lsn, $end_lsn);
is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
+ @lines = test_pg_waldump($path, $contrecord_lsn, $end_lsn);
+ is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
+
+ test_pg_waldump_skip_bytes($path, $contrecord_lsn, $end_lsn);
+
@lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--limit' => 6);
is(@lines, 6, 'limit option observed');
@@ -337,6 +433,9 @@ for my $scenario (@scenarios)
'--relation' => "$default_ts_oid/$postgres_db_oid/$rel_i1a_oid",
'--block' => 1);
is(grep(!/\bblk 1\b/, @lines), 0, 'only lines for selected block');
+
+ # Cleanup.
+ unlink $path if $scenario->{'is_archive'};
}
}
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 77e3c04144e..595ad7d5c5a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -145,6 +145,8 @@ ArchiveOpts
ArchiveShutdownCB
ArchiveStartupCB
ArchiveStreamState
+ArchivedWALFile
+ArchivedWAL_hash
ArchiverOutput
ArchiverStage
ArrayAnalyzeExtraData
@@ -3513,6 +3515,7 @@ astreamer_recovery_injector
astreamer_tar_archiver
astreamer_tar_parser
astreamer_verify
+astreamer_waldump
astreamer_zstd_frame
auth_password_hook_typ
autovac_table
--
2.47.1
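The ENDSEG handling in the hunk above comes down to simple segment arithmetic. The following is a minimal standalone sketch, not pg_waldump code. Under hypothetical `sketch_*` names, it re-implements what PostgreSQL's `XLogFromFileName()`, `XLogSegNoOffsetToRecPtr()` and `XLByteInSeg()` compute. It assumes a well-formed 24-hex-digit segment name and a power-of-two segment size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t lsn_t;     /* stand-in for XLogRecPtr */
typedef uint64_t segno_t;   /* stand-in for XLogSegNo */

/*
 * Parse "TTTTTTTTLLLLLLLLSSSSSSSS" into a timeline and an absolute segment
 * number, as XLogFromFileName() does. Returns false on a malformed name.
 */
static bool
sketch_parse_segname(const char *fname, uint32_t segsize,
                     uint32_t *tli, segno_t *segno)
{
    unsigned int t, log, seg;
    segno_t segs_per_id;

    assert(segsize > 0);
    if (sscanf(fname, "%8X%8X%8X", &t, &log, &seg) != 3)
        return false;

    /* number of segments per 4 GB "xlogid", like XLogSegmentsPerXLogId() */
    segs_per_id = UINT64_C(0x100000000) / segsize;
    *tli = t;
    *segno = (segno_t) log * segs_per_id + seg;
    return true;
}

/* LSN of byte 'offset' within segment 'segno' (XLogSegNoOffsetToRecPtr). */
static lsn_t
sketch_segno_to_lsn(segno_t segno, uint32_t offset, uint32_t segsize)
{
    return segno * segsize + offset;
}

/* True if 'lsn' falls inside segment 'segno' (XLByteInSeg). */
static bool
sketch_lsn_in_seg(lsn_t lsn, segno_t segno, uint32_t segsize)
{
    return lsn / segsize == segno;
}

/*
 * The --end sanity check: the end pointer must lie inside the last segment
 * or sit exactly on that segment's end boundary.
 */
static bool
sketch_end_ok(lsn_t endptr, segno_t last_segno, uint32_t segsize)
{
    return sketch_lsn_in_seg(endptr, last_segno, segsize) ||
        endptr == (last_segno + 1) * segsize;
}
```

With a 16 MB segment size, `000000010000000000000003` maps to segno 3. The default end pointer is therefore the start of segment 4, `0/04000000`. The `--end` check accepts that value because it sits exactly on the segment boundary.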
[application/octet-stream] v14-0007-pg_waldump-Remove-the-restriction-on-the-order-o.patch (13.1K, 8-v14-0007-pg_waldump-Remove-the-restriction-on-the-order-o.patch)
From d3000a494b5d416d01e48def64a3e54e6b523dab Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Tue, 27 Jan 2026 15:38:34 +0530
Subject: [PATCH v14 07/11] pg_waldump: Remove the restriction on the order of
archived WAL files.
With the previous patch, pg_waldump would stop decoding if WAL files were
not in the required sequence. With this patch, decoding continues
instead: any WAL file that is out of order is written to a temporary
location, from which it is read later. Once a temporary file has been
read, it is removed.
---
doc/src/sgml/ref/pg_waldump.sgml | 8 +-
src/bin/pg_waldump/archive_waldump.c | 171 +++++++++++++++++++++++++--
src/bin/pg_waldump/pg_waldump.c | 32 ++++-
src/bin/pg_waldump/pg_waldump.h | 3 +
src/bin/pg_waldump/t/001_basic.pl | 3 +-
5 files changed, 197 insertions(+), 20 deletions(-)
diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index 15fb8d13199..b36323dde92 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -149,8 +149,12 @@ PostgreSQL documentation
of <envar>PGDATA</envar>.
</para>
<para>
- If a tar archive is provided, its WAL segment files must be in
- sequential order; otherwise, an error will be reported.
+ If a tar archive is provided and its WAL segment files are not in
+ sequential order, those files will be written to a temporary directory
+ whose name begins with <filename>waldump_tmp</filename>. This directory is
+ created inside the directory specified by the <envar>TMPDIR</envar>
+ environment variable if it is set; otherwise, it is created within
+ the same directory as the tar archive.
</para>
</listitem>
</varlistentry>
diff --git a/src/bin/pg_waldump/archive_waldump.c b/src/bin/pg_waldump/archive_waldump.c
index 17d27ffa520..c5a4485b5b1 100644
--- a/src/bin/pg_waldump/archive_waldump.c
+++ b/src/bin/pg_waldump/archive_waldump.c
@@ -17,6 +17,7 @@
#include <unistd.h>
#include "access/xlog_internal.h"
+#include "common/file_perm.h"
#include "common/hashfn.h"
#include "common/logging.h"
#include "fe_utils/simple_list.h"
@@ -27,6 +28,9 @@
*/
#define READ_CHUNK_SIZE (128 * 1024)
+/* Temporary exported WAL file directory */
+char *TmpWalSegDir = NULL;
+
/*
* Check if the start segment number is zero; this indicates a request to read
* any WAL file.
@@ -57,6 +61,8 @@ typedef struct ArchivedWALFile
const char *fname; /* hash key: WAL segment name */
StringInfo buf; /* holds WAL bytes read from archive */
+ bool spilled; /* true if the WAL data was spilled to a
+ * temporary file */
int read_len; /* total bytes of a WAL read from archive */
} ArchivedWALFile;
@@ -84,6 +90,11 @@ static ArchivedWALFile *get_archive_wal_entry(const char *fname,
XLogDumpPrivate *privateInfo,
int WalSegSz);
static int read_archive_file(XLogDumpPrivate *privateInfo, Size count);
+static void setup_tmpwal_dir(const char *waldir);
+static void cleanup_tmpwal_dir_atexit(void);
+
+static FILE *prepare_tmp_write(const char *fname);
+static void perform_tmp_write(const char *fname, StringInfo buf, FILE *file);
static astreamer *astreamer_waldump_new(XLogDumpPrivate *privateInfo);
static void astreamer_waldump_content(astreamer *streamer,
@@ -106,7 +117,9 @@ static const astreamer_ops astreamer_waldump_ops = {
/*
* Initializes the tar archive reader, creates a hash table for WAL entries,
* checks for existing valid WAL segments in the archive file and retrieves the
- * segment size, and sets up filters for relevant entries.
+ * segment size, and sets up filters for relevant entries. It also configures a
+ * temporary directory for out-of-order WAL data and registers an exit callback
+ * to clean up temporary files.
*/
void
init_archive_reader(XLogDumpPrivate *privateInfo, const char *waldir,
@@ -199,6 +212,13 @@ init_archive_reader(XLogDumpPrivate *privateInfo, const char *waldir,
privateInfo->start_segno > segno ||
privateInfo->end_segno < segno)
free_archive_wal_entry(entry->fname, privateInfo);
+
+ /*
+ * Setup temporary directory to store WAL segments and set up an exit
+ * callback to remove it upon completion.
+ */
+ setup_tmpwal_dir(waldir);
+ atexit(cleanup_tmpwal_dir_atexit);
}
/*
@@ -365,6 +385,17 @@ free_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo)
destroyStringInfo(entry->buf);
entry->buf = NULL;
+ /* Remove temporary file if any */
+ if (entry->spilled)
+ {
+ char fpath[MAXPGPATH];
+
+ snprintf(fpath, MAXPGPATH, "%s/%s", TmpWalSegDir, fname);
+
+ if (unlink(fpath) == 0)
+ pg_log_debug("removed file \"%s\"", fpath);
+ }
+
/* Set cur_file to NULL if it matches the entry being ignored */
if (privateInfo->cur_file == entry)
privateInfo->cur_file = NULL;
@@ -376,12 +407,16 @@ free_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo)
* Returns the archived WAL entry from the hash table if it exists. Otherwise,
* it invokes the routine to read the archived file, which then populates the
* entry in the hash table if that WAL exists in the archive.
+ * If the archive streamer happens to be reading a WAL file from the archive
+ * that is not currently needed, that WAL data is written to a temporary
+ * file.
*/
static ArchivedWALFile *
get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo,
int WalSegSz)
{
ArchivedWALFile *entry = NULL;
+ FILE *write_fp = NULL;
/* Search hash table */
entry = ArchivedWAL_lookup(privateInfo->archive_wal_htab, fname);
@@ -395,28 +430,59 @@ get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo,
*/
while (1)
{
+ /*
+ * The WAL file entry currently being processed may change during
+ * archive streamer execution. Therefore, maintain a local variable to
+ * reference the previous entry, ensuring that any remaining data in
+ * its buffer is successfully flushed to the temporary file before
+ * switching to the next WAL entry.
+ */
+ entry = privateInfo->cur_file;
+
/* Fetch more data */
- if (read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
- break; /* archive file ended */
+ if (entry == NULL || entry->buf->len == 0)
+ {
+ if (read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
+ break; /* archive file ended */
+ }
/*
* Archived streamer is reading a non-WAL file or an irrelevant WAL
* file.
*/
- if (privateInfo->cur_file == NULL)
+ if (entry == NULL)
continue;
- entry = privateInfo->cur_file;
-
/* Found the required entry */
if (strcmp(fname, entry->fname) == 0)
return entry;
- /* WAL segments must be archived in order */
- pg_log_error("WAL files are not archived in sequential order");
- pg_log_error_detail("Expecting segment \"%s\" but found \"%s\".",
- fname, entry->fname);
- exit(1);
+ /*
+ * Archive streamer is currently reading a file that isn't the one
+ * asked for, but it's required in the future. It should be written to
+ * a temporary location for retrieval when needed.
+ */
+
+ /* Create a temporary file if one does not already exist */
+ if (!entry->spilled)
+ {
+ write_fp = prepare_tmp_write(entry->fname);
+ entry->spilled = true;
+ }
+
+ /* Flush data from the buffer to the file */
+ perform_tmp_write(entry->fname, entry->buf, write_fp);
+ resetStringInfo(entry->buf);
+
+ /*
+ * If the streamer has moved on to a different entry, reading of this
+ * file has ended, so close its temporary file.
+ */
+ if (entry != privateInfo->cur_file && write_fp != NULL)
+ {
+ fclose(write_fp);
+ write_fp = NULL;
+ }
}
/* Requested WAL segment not found */
@@ -454,7 +520,88 @@ read_archive_file(XLogDumpPrivate *privateInfo, Size count)
}
/*
- * Create an astreamer that can read WAL from a tar file.
+ * Set up a temporary directory to temporarily store WAL segments.
+ */
+static void
+setup_tmpwal_dir(const char *waldir)
+{
+ char *template;
+
+ /*
+ * Use the directory specified by the TMPDIR environment variable. If it's
+ * not set, use the provided WAL directory to hold the temporarily
+ * extracted WAL files.
+ */
+ template = psprintf("%s/waldump_tmp-XXXXXX",
+ getenv("TMPDIR") ? getenv("TMPDIR") : waldir);
+ TmpWalSegDir = mkdtemp(template);
+
+ if (TmpWalSegDir == NULL)
+ pg_fatal("could not create directory \"%s\": %m", template);
+
+ canonicalize_path(TmpWalSegDir);
+
+ pg_log_debug("created directory \"%s\"", TmpWalSegDir);
+}
+
+/*
+ * Remove temporary directory at exit, if any.
+ */
+static void
+cleanup_tmpwal_dir_atexit(void)
+{
+ rmtree(TmpWalSegDir, true);
+}
+
+/*
+ * Create an empty placeholder file and return its handle.
+ */
+static FILE *
+prepare_tmp_write(const char *fname)
+{
+ char fpath[MAXPGPATH];
+ FILE *file;
+
+ snprintf(fpath, MAXPGPATH, "%s/%s", TmpWalSegDir, fname);
+
+ /* Create an empty placeholder */
+ file = fopen(fpath, PG_BINARY_W);
+ if (file == NULL)
+ pg_fatal("could not create file \"%s\": %m", fpath);
+
+#ifndef WIN32
+ if (chmod(fpath, pg_file_create_mode))
+ pg_fatal("could not set permissions on file \"%s\": %m",
+ fpath);
+#endif
+
+ pg_log_debug("spilling to temporary file \"%s\"", fpath);
+
+ return file;
+}
+
+/*
+ * Write buffer data to the given file handle.
+ */
+static void
+perform_tmp_write(const char *fname, StringInfo buf, FILE *file)
+{
+ Assert(file);
+
+ errno = 0;
+ if (buf->len > 0 && fwrite(buf->data, buf->len, 1, file) != 1)
+ {
+ /*
+ * If write didn't set errno, assume problem is no disk space
+ */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_fatal("could not write to file \"%s\": %m", fname);
+ }
+}
+
+/*
+ * Create an astreamer that can read WAL from tar file.
*/
static astreamer *
astreamer_waldump_new(XLogDumpPrivate *privateInfo)
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 90fc13f3609..114969217d8 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -478,10 +478,14 @@ TarWALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
return -1;
/*
- * If the target page is in a different segment, free the buffer space
- * occupied by the previous segment data. Since pg_waldump never requests
- * the same WAL bytes twice, moving to a new segment implies the previous
- * buffer's data and that segment will not be needed again.
+ * If the target page is in a different segment, free the buffer and/or
+ * temporary file disk space occupied by the previous segment's data.
+ * Since pg_waldump never requests the same WAL bytes twice, moving to a
+ * new segment implies the previous buffer's data and that segment will
+ * not be needed again.
+ *
+ * Afterward, check for the next required WAL segment's physical existence
+ * in the temporary directory first before invoking the archive streamer.
*/
curSegNo = state->seg.ws_segno;
if (!XLByteInSeg(targetPagePtr, curSegNo, WalSegSz))
@@ -497,6 +501,13 @@ TarWALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
state->seg.ws_tli = private->timeline;
state->seg.ws_segno = nextSegNo;
+ /* Close the WAL segment file if it is currently open */
+ if (state->seg.ws_file >= 0)
+ {
+ close(state->seg.ws_file);
+ state->seg.ws_file = -1;
+ }
+
/*
* If in pre-reading mode (prior to actual decoding), do not delete any
* entries that might be requested again once the decoding loop starts.
@@ -507,9 +518,20 @@ TarWALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
XLogFileName(fname, state->seg.ws_tli, curSegNo, WalSegSz);
free_archive_wal_entry(fname, private);
}
+
+ /*
+ * If the next segment was spilled to the temporary directory, read from it.
+ */
+ XLogFileName(fname, state->seg.ws_tli, nextSegNo, WalSegSz);
+ state->seg.ws_file = open_file_in_directory(TmpWalSegDir, fname);
}
- /* Read the WAL page from the archive streamer */
+ /* Continue reading from the open WAL segment, if any */
+ if (state->seg.ws_file >= 0)
+ return WALDumpReadPage(state, targetPagePtr, count, targetPtr,
+ readBuff);
+
+ /* Otherwise, read the WAL page from the archive streamer */
return read_archive_wal_page(private, targetPagePtr, count, readBuff,
WalSegSz);
}
diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h
index 54d54a8a718..6c242b7fcbc 100644
--- a/src/bin/pg_waldump/pg_waldump.h
+++ b/src/bin/pg_waldump/pg_waldump.h
@@ -18,6 +18,9 @@
struct ArchivedWALFile;
struct ArchivedWAL_hash;
+/* Temporary directory */
+extern char *TmpWalSegDir;
+
/* Contains the necessary information to drive WAL decoding */
typedef struct XLogDumpPrivate
{
diff --git a/src/bin/pg_waldump/t/001_basic.pl b/src/bin/pg_waldump/t/001_basic.pl
index 9ab7457e9e2..9854c939007 100644
--- a/src/bin/pg_waldump/t/001_basic.pl
+++ b/src/bin/pg_waldump/t/001_basic.pl
@@ -7,6 +7,7 @@ use Cwd;
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Test::More;
+use List::Util qw(shuffle);
my $tar = $ENV{TAR};
@@ -312,7 +313,7 @@ sub generate_archive
}
closedir $dh;
- @files = sort @files;
+ @files = shuffle @files;
# move into the WAL directory before archiving files
my $cwd = getcwd;
--
2.47.1
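Patch 0007 spills out-of-order segments to disk. The small POSIX sketch below illustrates that idea. The helper names are hypothetical, and the code is not taken from archive_waldump.c. It uses `$TMPDIR` if set and otherwise falls back to a given directory. Inside that base it creates a `waldump_tmp-XXXXXX` directory with `mkdtemp()` and appends the buffered bytes of an out-of-order segment to a file named after it. The file is removed once its data has been consumed. As a simplification, each flush reopens the file in append mode, whereas the patch keeps one `FILE *` open until the streamer moves to another entry.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define SKETCH_MAXPATH 1024

/*
 * Create "<base>/waldump_tmp-XXXXXX", where <base> is $TMPDIR if set and
 * 'fallback_dir' otherwise. The created path is written into 'out'.
 * Returns 0 on success, -1 on failure.
 */
static int
sketch_make_tmpdir(const char *fallback_dir, char *out, size_t outlen)
{
    const char *base = getenv("TMPDIR");

    if (base == NULL || base[0] == '\0')
        base = fallback_dir;
    if (snprintf(out, outlen, "%s/waldump_tmp-XXXXXX", base) >= (int) outlen)
        return -1;
    return mkdtemp(out) != NULL ? 0 : -1;   /* replaces the XXXXXX in place */
}

/* Append 'len' bytes of buffered WAL data to "<dir>/<segname>". */
static int
sketch_spill(const char *dir, const char *segname, const char *data, size_t len)
{
    char path[SKETCH_MAXPATH];
    FILE *fp;
    size_t written;

    assert(dir != NULL && segname != NULL);
    snprintf(path, sizeof(path), "%s/%s", dir, segname);
    fp = fopen(path, "ab");
    if (fp == NULL)
        return -1;
    errno = 0;
    written = (len > 0) ? fwrite(data, len, 1, fp) : 1;
    if (fclose(fp) != 0 || written != 1)
    {
        if (errno == 0)
            errno = ENOSPC;     /* as in the patch: assume the disk is full */
        return -1;
    }
    return 0;
}

/* Read back up to 'cap' bytes of a spilled segment, then delete the file. */
static long
sketch_consume(const char *dir, const char *segname, char *buf, size_t cap)
{
    char path[SKETCH_MAXPATH];
    FILE *fp;
    size_t n;

    snprintf(path, sizeof(path), "%s/%s", dir, segname);
    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;              /* not spilled (or already consumed) */
    n = fread(buf, 1, cap, fp);
    fclose(fp);
    unlink(path);
    return (long) n;
}
```

Reading a second time returns -1 because the file has already been removed. This mirrors the series' rule that pg_waldump never requests the same WAL bytes twice.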
[application/octet-stream] v14-0008-pg_verifybackup-Delay-default-WAL-directory-prep.patch (1.7K, 9-v14-0008-pg_verifybackup-Delay-default-WAL-directory-prep.patch)
From 3ecf640004c7aaca0430101a2d88e3d010e07440 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Wed, 16 Jul 2025 14:47:43 +0530
Subject: [PATCH v14 08/11] pg_verifybackup: Delay default WAL directory
preparation.
We cannot tell whether to parse WAL from a directory or an archive
until the backup format is known. Therefore, we delay preparing the
default WAL directory until the point of parsing. This delay is
harmless, as the WAL directory is not used elsewhere.
---
src/bin/pg_verifybackup/pg_verifybackup.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 31f606c45b1..8cc204719ee 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -285,10 +285,6 @@ main(int argc, char **argv)
manifest_path = psprintf("%s/backup_manifest",
context.backup_directory);
- /* By default, look for the WAL in the backup directory, too. */
- if (wal_directory == NULL)
- wal_directory = psprintf("%s/pg_wal", context.backup_directory);
-
/*
* Try to read the manifest. We treat any errors encountered while parsing
* the manifest as fatal; there doesn't seem to be much point in trying to
@@ -368,6 +364,10 @@ main(int argc, char **argv)
if (context.format == 'p' && !context.skip_checksums)
verify_backup_checksums(&context);
+ /* By default, look for the WAL in the backup directory, too. */
+ if (wal_directory == NULL)
+ wal_directory = psprintf("%s/pg_wal", context.backup_directory);
+
/*
* Try to parse the required ranges of WAL records, unless we were told
* not to do so.
--
2.47.1
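Delaying the default matters because patch 0010 in this series makes the choice depend on the backup format. The sketch below summarizes that selection order. `sketch_default_wal_path()` is a hypothetical helper, and the archive paths in the usage are example values. In pg_verifybackup itself, the two archive paths are discovered while scanning the tar backup.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Choose the default WAL location when -w/--wal-path was not given:
 *   plain backup -> "<backup_dir>/pg_wal"
 *   tar backup   -> the separate WAL archive if one was found,
 *                   else the base archive, else NULL.
 * A NULL result corresponds to the "WAL archive not found" error.
 * 'buf' receives the formatted path for plain-format backups.
 */
static const char *
sketch_default_wal_path(char format, const char *backup_dir,
                        const char *wal_archive, const char *base_archive,
                        char *buf, size_t buflen)
{
    assert(format == 'p' || format == 't');
    if (format == 'p')
    {
        snprintf(buf, buflen, "%s/pg_wal", backup_dir);
        return buf;
    }
    if (wal_archive != NULL)
        return wal_archive;     /* e.g. a separate pg_wal.tar */
    return base_archive;        /* WAL may live inside base.tar; may be NULL */
}
```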
[application/octet-stream] v14-0009-pg_verifybackup-Rename-the-wal-directory-switch-.patch (5.9K, 10-v14-0009-pg_verifybackup-Rename-the-wal-directory-switch-.patch)
From 9ab39e96cfecdfb0c3ec1630cc5b4718fa3986de Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Tue, 25 Nov 2025 17:32:14 +0530
Subject: [PATCH v14 09/11] pg_verifybackup: Rename the wal-directory switch to
wal-path
With the previous patches, pg_waldump can now decode WAL directly from
tar files. This means a tar archive path can be specified instead of a
traditional WAL directory.
To keep things consistent and more versatile, we should also
generalize the input switch for pg_verifybackup. It should accept
either a directory or the path of a tar file that contains WAL. This
change also aligns it with the existing manifest-path switch naming.
== NOTE ==
The corresponding PO files require updating due to this change.
---
doc/src/sgml/ref/pg_verifybackup.sgml | 2 +-
src/bin/pg_verifybackup/pg_verifybackup.c | 22 +++++++++++-----------
src/bin/pg_verifybackup/t/007_wal.pl | 4 ++--
3 files changed, 14 insertions(+), 14 deletions(-)
diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml
index 61c12975e4a..e9b8bfd51b1 100644
--- a/doc/src/sgml/ref/pg_verifybackup.sgml
+++ b/doc/src/sgml/ref/pg_verifybackup.sgml
@@ -261,7 +261,7 @@ PostgreSQL documentation
<varlistentry>
<term><option>-w <replaceable class="parameter">path</replaceable></option></term>
- <term><option>--wal-directory=<replaceable class="parameter">path</replaceable></option></term>
+ <term><option>--wal-path=<replaceable class="parameter">path</replaceable></option></term>
<listitem>
<para>
Try to parse WAL files stored in the specified directory, rather than
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 8cc204719ee..34520546bc3 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -93,7 +93,7 @@ static void verify_file_checksum(verifier_context *context,
uint8 *buffer);
static void parse_required_wal(verifier_context *context,
char *pg_waldump_path,
- char *wal_directory);
+ char *wal_path);
static astreamer *create_archive_verifier(verifier_context *context,
char *archive_name,
Oid tblspc_oid,
@@ -126,7 +126,7 @@ main(int argc, char **argv)
{"progress", no_argument, NULL, 'P'},
{"quiet", no_argument, NULL, 'q'},
{"skip-checksums", no_argument, NULL, 's'},
- {"wal-directory", required_argument, NULL, 'w'},
+ {"wal-path", required_argument, NULL, 'w'},
{NULL, 0, NULL, 0}
};
@@ -135,7 +135,7 @@ main(int argc, char **argv)
char *manifest_path = NULL;
bool no_parse_wal = false;
bool quiet = false;
- char *wal_directory = NULL;
+ char *wal_path = NULL;
char *pg_waldump_path = NULL;
DIR *dir;
@@ -221,8 +221,8 @@ main(int argc, char **argv)
context.skip_checksums = true;
break;
case 'w':
- wal_directory = pstrdup(optarg);
- canonicalize_path(wal_directory);
+ wal_path = pstrdup(optarg);
+ canonicalize_path(wal_path);
break;
default:
/* getopt_long already emitted a complaint */
@@ -365,15 +365,15 @@ main(int argc, char **argv)
verify_backup_checksums(&context);
/* By default, look for the WAL in the backup directory, too. */
- if (wal_directory == NULL)
- wal_directory = psprintf("%s/pg_wal", context.backup_directory);
+ if (wal_path == NULL)
+ wal_path = psprintf("%s/pg_wal", context.backup_directory);
/*
* Try to parse the required ranges of WAL records, unless we were told
* not to do so.
*/
if (!no_parse_wal)
- parse_required_wal(&context, pg_waldump_path, wal_directory);
+ parse_required_wal(&context, pg_waldump_path, wal_path);
/*
* If everything looks OK, tell the user this, unless we were asked to
@@ -1188,7 +1188,7 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
*/
static void
parse_required_wal(verifier_context *context, char *pg_waldump_path,
- char *wal_directory)
+ char *wal_path)
{
manifest_data *manifest = context->manifest;
manifest_wal_range *this_wal_range = manifest->first_wal_range;
@@ -1198,7 +1198,7 @@ parse_required_wal(verifier_context *context, char *pg_waldump_path,
char *pg_waldump_cmd;
pg_waldump_cmd = psprintf("\"%s\" --quiet --path=\"%s\" --timeline=%u --start=%X/%08X --end=%X/%08X\n",
- pg_waldump_path, wal_directory, this_wal_range->tli,
+ pg_waldump_path, wal_path, this_wal_range->tli,
LSN_FORMAT_ARGS(this_wal_range->start_lsn),
LSN_FORMAT_ARGS(this_wal_range->end_lsn));
fflush(NULL);
@@ -1366,7 +1366,7 @@ usage(void)
printf(_(" -P, --progress show progress information\n"));
printf(_(" -q, --quiet do not print any output, except for errors\n"));
printf(_(" -s, --skip-checksums skip checksum verification\n"));
- printf(_(" -w, --wal-directory=PATH use specified path for WAL files\n"));
+ printf(_(" -w, --wal-path=PATH use specified path for WAL files\n"));
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -?, --help show this help, then exit\n"));
printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
diff --git a/src/bin/pg_verifybackup/t/007_wal.pl b/src/bin/pg_verifybackup/t/007_wal.pl
index 79087a1f6be..8ad2234453d 100644
--- a/src/bin/pg_verifybackup/t/007_wal.pl
+++ b/src/bin/pg_verifybackup/t/007_wal.pl
@@ -42,10 +42,10 @@ command_ok([ 'pg_verifybackup', '--no-parse-wal', $backup_path ],
command_ok(
[
'pg_verifybackup',
- '--wal-directory' => $relocated_pg_wal,
+ '--wal-path' => $relocated_pg_wal,
$backup_path
],
- '--wal-directory can be used to specify WAL directory');
+ '--wal-path can be used to specify WAL directory');
# Move directory back to original location.
rename($relocated_pg_wal, $original_pg_wal) || die "rename pg_wal back: $!";
--
2.47.1
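After patch 0009, `parse_required_wal()` passes `wal_path`, which may now be a tar file, to pg_waldump's `--path`. The command it builds prints each LSN in `%X/%08X` form, with `LSN_FORMAT_ARGS()` splitting the 64-bit value into its high and low 32 bits. The sketch below shows that formatting. `sketch_build_waldump_cmd()` is a hypothetical helper: the real code uses `psprintf()` and appends a trailing newline, whereas this version writes into a caller-supplied buffer.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a pg_waldump invocation similar to the one in parse_required_wal().
 * Each LSN is printed as "<high 32 bits>/<low 32 bits, 8 hex digits>".
 * Returns the snprintf() result (length needed, excluding the NUL).
 */
static int
sketch_build_waldump_cmd(char *out, size_t outlen, const char *waldump,
                         const char *wal_path, uint32_t tli,
                         uint64_t start_lsn, uint64_t end_lsn)
{
    assert(out != NULL && outlen > 0);
    return snprintf(out, outlen,
                    "\"%s\" --quiet --path=\"%s\" --timeline=%" PRIu32
                    " --start=%" PRIX32 "/%08" PRIX32
                    " --end=%" PRIX32 "/%08" PRIX32,
                    waldump, wal_path, tli,
                    (uint32_t) (start_lsn >> 32), (uint32_t) start_lsn,
                    (uint32_t) (end_lsn >> 32), (uint32_t) end_lsn);
}
```

For example, start LSN `0x100000028` becomes `1/00000028`.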
[application/octet-stream] v14-0010-pg_verifybackup-Enabled-WAL-parsing-for-tar-form.patch (9.9K, 11-v14-0010-pg_verifybackup-Enabled-WAL-parsing-for-tar-form.patch)
From 960405ac3bcaaf514019b0344da9f3e9fbae0e19 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Tue, 25 Nov 2025 17:34:26 +0530
Subject: [PATCH v14 10/11] pg_verifybackup: Enabled WAL parsing for tar-format
backup
Now that pg_waldump supports decoding from tar archives, we should
leverage this functionality to remove the previous restriction on WAL
parsing for tar-format backups.
---
doc/src/sgml/ref/pg_verifybackup.sgml | 5 +-
src/bin/pg_verifybackup/pg_verifybackup.c | 66 +++++++++++++------
src/bin/pg_verifybackup/t/002_algorithm.pl | 4 --
src/bin/pg_verifybackup/t/003_corruption.pl | 4 +-
src/bin/pg_verifybackup/t/008_untar.pl | 5 +-
src/bin/pg_verifybackup/t/010_client_untar.pl | 5 +-
6 files changed, 50 insertions(+), 39 deletions(-)
diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml
index e9b8bfd51b1..16b50b5a4df 100644
--- a/doc/src/sgml/ref/pg_verifybackup.sgml
+++ b/doc/src/sgml/ref/pg_verifybackup.sgml
@@ -36,10 +36,7 @@ PostgreSQL documentation
<literal>backup_manifest</literal> generated by the server at the time
of the backup. The backup may be stored either in the "plain" or the "tar"
format; this includes tar-format backups compressed with any algorithm
- supported by <application>pg_basebackup</application>. However, at present,
- <literal>WAL</literal> verification is supported only for plain-format
- backups. Therefore, if the backup is stored in tar-format, the
- <literal>-n, --no-parse-wal</literal> option should be used.
+ supported by <application>pg_basebackup</application>.
</para>
<para>
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 34520546bc3..935ab8fafa8 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -74,7 +74,9 @@ pg_noreturn static void report_manifest_error(JsonManifestParseContext *context,
const char *fmt,...)
pg_attribute_printf(2, 3);
-static void verify_tar_backup(verifier_context *context, DIR *dir);
+static void verify_tar_backup(verifier_context *context, DIR *dir,
+ char **base_archive_path,
+ char **wal_archive_path);
static void verify_plain_backup_directory(verifier_context *context,
char *relpath, char *fullpath,
DIR *dir);
@@ -83,7 +85,9 @@ static void verify_plain_backup_file(verifier_context *context, char *relpath,
static void verify_control_file(const char *controlpath,
uint64 manifest_system_identifier);
static void precheck_tar_backup_file(verifier_context *context, char *relpath,
- char *fullpath, SimplePtrList *tarfiles);
+ char *fullpath, SimplePtrList *tarfiles,
+ char **base_archive_path,
+ char **wal_archive_path);
static void verify_tar_file(verifier_context *context, char *relpath,
char *fullpath, astreamer *streamer);
static void report_extra_backup_files(verifier_context *context);
@@ -136,6 +140,8 @@ main(int argc, char **argv)
bool no_parse_wal = false;
bool quiet = false;
char *wal_path = NULL;
+ char *base_archive_path = NULL;
+ char *wal_archive_path = NULL;
char *pg_waldump_path = NULL;
DIR *dir;
@@ -327,17 +333,6 @@ main(int argc, char **argv)
pfree(path);
}
- /*
- * XXX: In the future, we should consider enhancing pg_waldump to read WAL
- * files from an archive.
- */
- if (!no_parse_wal && context.format == 't')
- {
- pg_log_error("pg_waldump cannot read tar files");
- pg_log_error_hint("You must use -n/--no-parse-wal when verifying a tar-format backup.");
- exit(1);
- }
-
/*
* Perform the appropriate type of verification appropriate based on the
* backup format. This will close 'dir'.
@@ -346,7 +341,7 @@ main(int argc, char **argv)
verify_plain_backup_directory(&context, NULL, context.backup_directory,
dir);
else
- verify_tar_backup(&context, dir);
+ verify_tar_backup(&context, dir, &base_archive_path, &wal_archive_path);
/*
* The "matched" flag should now be set on every entry in the hash table.
@@ -364,9 +359,28 @@ main(int argc, char **argv)
if (context.format == 'p' && !context.skip_checksums)
verify_backup_checksums(&context);
- /* By default, look for the WAL in the backup directory, too. */
+ /*
+ * By default, WAL files are expected to be found in the backup directory
+ * for plain-format backups. In the case of tar-format backups, if a
+ * separate WAL archive is not found, the WAL files are most likely
+ * included within the main data directory archive.
+ */
if (wal_path == NULL)
- wal_path = psprintf("%s/pg_wal", context.backup_directory);
+ {
+ if (context.format == 'p')
+ wal_path = psprintf("%s/pg_wal", context.backup_directory);
+ else if (wal_archive_path)
+ wal_path = wal_archive_path;
+ else if (base_archive_path)
+ wal_path = base_archive_path;
+ else
+ {
+ pg_log_error("WAL archive not found");
+ pg_log_error_hint("Specify the correct path using the option -w/--wal-path, "
+ "or use -n/--no-parse-wal to skip WAL parsing when verifying a tar-format backup.");
+ exit(1);
+ }
+ }
/*
* Try to parse the required ranges of WAL records, unless we were told
@@ -787,7 +801,8 @@ verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
* close when we're done with it.
*/
static void
-verify_tar_backup(verifier_context *context, DIR *dir)
+verify_tar_backup(verifier_context *context, DIR *dir, char **base_archive_path,
+ char **wal_archive_path)
{
struct dirent *dirent;
SimplePtrList tarfiles = {NULL, NULL};
@@ -816,7 +831,8 @@ verify_tar_backup(verifier_context *context, DIR *dir)
char *fullpath;
fullpath = psprintf("%s/%s", context->backup_directory, filename);
- precheck_tar_backup_file(context, filename, fullpath, &tarfiles);
+ precheck_tar_backup_file(context, filename, fullpath, &tarfiles,
+ base_archive_path, wal_archive_path);
pfree(fullpath);
}
}
@@ -875,11 +891,13 @@ verify_tar_backup(verifier_context *context, DIR *dir)
*
* The arguments to this function are mostly the same as the
* verify_plain_backup_file. The additional argument outputs a list of valid
- * tar files.
+ * tar files, along with the full paths to the main archive and the WAL
+ * directory archive.
*/
static void
precheck_tar_backup_file(verifier_context *context, char *relpath,
- char *fullpath, SimplePtrList *tarfiles)
+ char *fullpath, SimplePtrList *tarfiles,
+ char **base_archive_path, char **wal_archive_path)
{
struct stat sb;
Oid tblspc_oid = InvalidOid;
@@ -918,9 +936,17 @@ precheck_tar_backup_file(verifier_context *context, char *relpath,
* extension such as .gz, .lz4, or .zst.
*/
if (strncmp("base", relpath, 4) == 0)
+ {
suffix = relpath + 4;
+
+ *base_archive_path = pstrdup(fullpath);
+ }
else if (strncmp("pg_wal", relpath, 6) == 0)
+ {
suffix = relpath + 6;
+
+ *wal_archive_path = pstrdup(fullpath);
+ }
else
{
/* Expected a <tablespaceoid>.tar file here. */
diff --git a/src/bin/pg_verifybackup/t/002_algorithm.pl b/src/bin/pg_verifybackup/t/002_algorithm.pl
index 0556191ec9d..edc515d5904 100644
--- a/src/bin/pg_verifybackup/t/002_algorithm.pl
+++ b/src/bin/pg_verifybackup/t/002_algorithm.pl
@@ -30,10 +30,6 @@ sub test_checksums
{
# Add switch to get a tar-format backup
push @backup, ('--format' => 'tar');
-
- # Add switch to skip WAL verification, which is not yet supported for
- # tar-format backups
- push @verify, ('--no-parse-wal');
}
# A backup with a bogus algorithm should fail.
diff --git a/src/bin/pg_verifybackup/t/003_corruption.pl b/src/bin/pg_verifybackup/t/003_corruption.pl
index b1d65b8aa0f..882d75d9dc2 100644
--- a/src/bin/pg_verifybackup/t/003_corruption.pl
+++ b/src/bin/pg_verifybackup/t/003_corruption.pl
@@ -193,10 +193,8 @@ for my $scenario (@scenario)
command_ok([ $tar, '-cf' => "$tar_backup_path/base.tar", '.' ]);
chdir($cwd) || die "chdir: $!";
- # Now check that the backup no longer verifies. We must use -n
- # here, because pg_waldump can't yet read WAL from a tarfile.
command_fails_like(
- [ 'pg_verifybackup', '--no-parse-wal', $tar_backup_path ],
+ [ 'pg_verifybackup', $tar_backup_path ],
$scenario->{'fails_like'},
"corrupt backup fails verification: $name");
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
index ae67ae85a31..161c08c190d 100644
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -47,7 +47,6 @@ my $tsoid = $primary->safe_psql(
SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1'));
my $backup_path = $primary->backup_dir . '/server-backup';
-my $extract_path = $primary->backup_dir . '/extracted-backup';
my @test_configuration = (
{
@@ -123,14 +122,12 @@ for my $tc (@test_configuration)
# Verify tar backup.
$primary->command_ok(
[
- 'pg_verifybackup', '--no-parse-wal',
- '--exit-on-error', $backup_path,
+ 'pg_verifybackup', '--exit-on-error', $backup_path,
],
"verify backup, compression $method");
# Cleanup.
rmtree($backup_path);
- rmtree($extract_path);
}
}
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
index 1ac7b5db75a..9670fbe4fda 100644
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -32,7 +32,6 @@ print $jf $junk_data;
close $jf;
my $backup_path = $primary->backup_dir . '/client-backup';
-my $extract_path = $primary->backup_dir . '/extracted-backup';
my @test_configuration = (
{
@@ -137,13 +136,11 @@ for my $tc (@test_configuration)
# Verify tar backup.
$primary->command_ok(
[
- 'pg_verifybackup', '--no-parse-wal',
- '--exit-on-error', $backup_path,
+ 'pg_verifybackup', '--exit-on-error', $backup_path,
],
"verify backup, compression $method");
# Cleanup.
- rmtree($extract_path);
rmtree($backup_path);
}
}
--
2.47.1
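For illustration, the default WAL-location rule that the main() hunk above introduces can be reduced to a small standalone C function. This is a hedged sketch, not pg_verifybackup code. The name `default_wal_path()` is hypothetical. It uses malloc()/snprintf() in place of psprintf()/pstrdup(), and it returns NULL where the patch reports "WAL archive not found" and exits.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch of the default WAL-path choice made in main():
 *  - plain format ('p'): <backup_dir>/pg_wal
 *  - tar format ('t'):   the pg_wal archive if one was found, else the
 *                        base archive (WAL is most likely inside it),
 *                        else NULL so the caller can report an error.
 * The returned string is malloc'd; the caller frees it.
 */
static char *
default_wal_path(char format, const char *backup_dir,
                 const char *wal_archive_path, const char *base_archive_path)
{
    const char *src;
    char       *result;

    assert(format == 'p' || format == 't');

    if (format == 'p')
    {
        size_t      len = strlen(backup_dir) + strlen("/pg_wal") + 1;

        result = malloc(len);
        if (result != NULL)
            snprintf(result, len, "%s/pg_wal", backup_dir);
        return result;
    }

    if (wal_archive_path != NULL)
        src = wal_archive_path;     /* separate pg_wal.tar[.gz|...] */
    else if (base_archive_path != NULL)
        src = base_archive_path;    /* WAL included in base.tar[...] */
    else
        return NULL;                /* caller reports "WAL archive not found" */

    result = malloc(strlen(src) + 1);
    if (result != NULL)
        strcpy(result, src);
    return result;
}
```

Preferring the separate pg_wal archive over the base archive follows the comment in the patch: when pg_basebackup writes a separate WAL archive, the WAL lives there; otherwise it is most likely inside the main data directory archive.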
* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-03-04 00:37 Andrew Dunstan <[email protected]>
parent: Amul Sul <[email protected]>
From: Andrew Dunstan @ 2026-03-04 00:37 UTC
To: Amul Sul <[email protected]>; Robert Haas <[email protected]>; +Cc: Chao Li <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>
On 2026-03-02 Mo 8:00 AM, Amul Sul wrote:
> On Wed, Feb 18, 2026 at 12:28 PM Amul Sul <[email protected]> wrote:
>> On Tue, Feb 10, 2026 at 3:06 PM Amul Sul <[email protected]> wrote:
>>> On Wed, Feb 4, 2026 at 6:39 PM Amul Sul <[email protected]> wrote:
>>>> On Wed, Jan 28, 2026 at 2:41 AM Robert Haas <[email protected]> wrote:
>>>>> On Tue, Jan 27, 2026 at 7:07 AM Amul Sul <[email protected]> wrote:
>>>>>> In the attached version, I am using the WAL segment name as the hash
>>>>>> key, which is much more straightforward. I have rewritten
>>>>>> read_archive_wal_page(), and it looks much cleaner than before. The
>>>>>> logic to discard irrelevant WAL files is still within
>>>>>> get_archive_wal_entry. I added an explanation for setting cur_wal to
>>>>>> NULL, which is now handled in the separate function I mentioned
>>>>>> previously.
>>>>>>
>>>>>> Kindly have a look at the attached version; let me know if you are
>>>>>> still not happy with the current approach for filtering/discarding
>>>>>> irrelevant WAL segments. It isn't much different from the previous
>>>>>> version, but I have tried to keep it in a separate routine for better
>>>>>> code readability, with comments to make it easier to understand. I
>>>>>> also added a comment for ArchivedWALFile.
>>>>> I feel like the division of labor between get_archive_wal_entry() and
>>>>> read_archive_wal_page() is odd. I noticed this in the last version,
>>>>> too, and it still seems to be the case. get_archive_wal_entry() first
>>>>> calls ArchivedWAL_lookup(). If that finds an entry, it just returns.
>>>>> If it doesn't, it loops until an entry for the requested file shows up
>>>>> and then returns it. Then control returns to read_archive_wal_page()
>>>>> which loops some more until we have all the data we need for the
>>>>> requested file. But it seems odd to me to have two separate loops
>>>>> here. I think that the first loop is going to call read_archive_file()
>>>>> until we find the beginning of the file that we care about and then
>>>>> the second one is going to call read_archive_file() some more until we
>>>>> have read enough of it to satisfy the request. It feels odd to me to
>>>>> do it that way, as if we told somebody to first wait until 9 o'clock
>>>>> and then wait another 30 minutes, instead of just telling them to wait
>>>>> until 9:30. I realize it's not quite the same thing, because apart
>>>>> from calling read_archive_file(), the two loops do different things,
>>>>> but I still think it looks odd.
>>>>>
>>>>> + /*
>>>>> + * Ignore if the timeline is different or the current segment is not
>>>>> + * the desired one.
>>>>> + */
>>>>> + XLogFromFileName(entry->fname, &curSegTimeline, &curSegNo, WalSegSz);
>>>>> + if (privateInfo->timeline != curSegTimeline ||
>>>>> + privateInfo->startSegNo > curSegNo ||
>>>>> + privateInfo->endSegNo < curSegNo ||
>>>>> + segno > curSegNo)
>>>>> + {
>>>>> + free_archive_wal_entry(entry->fname, privateInfo);
>>>>> + continue;
>>>>> + }
>>>>>
>>>>> The comment doesn't match the code. If it did, the test would be
>>>>> (privateInfo->timeline != curSegTimeline || segno != curSegno). But
>>>>> instead the segno test is > rather than !=, and the checks against
>>>>> startSegNo and endSegNo aren't explained at all. I think I understand
>>>>> why the segno test uses > rather than !=, but it's the point of the
>>>>> comment to explain things like that, rather than leaving the reader to
>>>>> guess. And I don't know why we also need to test startSegNo and
>>>>> endSegNo.
>>>>>
>>>>> I also wonder what the point is of doing XLogFromFileName() on the
>>>>> fname provided by the caller and then again on entry->fname. Couldn't
>>>>> you just compare the strings?
>>>>>
>>>>> Again, the division of labor is really odd here. It's the job of
>>>>> astreamer_waldump_content() to skip things that aren't WAL files at
>>>>> all, but it's the job of get_archive_wal_entry() to skip things that
>>>>> are WAL files but not the one we want. I disagree with putting those
>>>>> checks in completely separate parts of the code.
>>>>>
>>>> Keeping the timeline and segment start-end range checks inside the
>>>> archive streamer creates a circular dependency that cannot be resolved
>>>> without a 'dirty hack'. We must read the first available WAL file page
>>>> to determine the wal_segment_size before we can calculate the target
>>>> segment range. Moving the checks inside the streamer would make it
>>>> impossible to process that initial file, because the necessary
>>>> filtering parameters would still be unknown, so the check would somehow
>>>> have to be skipped for the first read. And what if we later realized
>>>> that the first WAL file, which was allowed through by skipping that
>>>> check, is irrelevant and doesn't fall within the start-end segment range?
>>>>
>>> Please have a look at the attached version, specifically patch 0005.
>>> In astreamer_waldump_content(), I have moved the WAL file filtration
>>> check from get_archive_wal_entry(). This check will be skipped during
>>> the initial read in init_archive_reader(), which instead performs it
>>> explicitly once it determines the WAL segment size and the start/end
>>> segments.
>>>
>>> To access the WAL segment size inside astreamer_waldump_content(), I
>>> have moved the WAL segment size variable into the XLogDumpPrivate
>>> structure in the separate 0004 patch.
>> Attached is an updated version including the aforesaid changes. It
>> includes a new refactoring patch (0001) that moves the logic for
>> identifying tar archives and their compression types from
>> pg_basebackup and pg_verifybackup into a separate, reusable function,
>> per a suggestion from Euler [1]. Additionally, I have added a test
>> for the contrecord decoding to the main patch (now 0006).
>>
>> [1] http://postgr.es/m/[email protected]
>>
> Rebased against the latest master, fixed typos in code comments, and
> replaced palloc0 with palloc0_object.
>
Hi Amul.
I think this looks in pretty good shape.
Attached are patches for a few things I think could be fixed. They are
mostly self-explanatory. The TAP test fix is the only sane way I could
come up with of stopping the skip code you had from reporting a wildly
inaccurate number of skipped tests. The sane way to do this from a
Test::More perspective is a subtest, but unfortunately meson does not
like subtest output, which is why we don't use subtests elsewhere. So the
only way I could come up with was to split the archive tests out into a
separate test file. Of course, we might just say we don't care about the
misreport, in which case we could live with things as they are.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
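The segment filter debated in the quoted review can be sketched in isolation. This is a hypothetical C sketch, not the patch's code:

- `parse_wal_name()` mirrors the arithmetic of PostgreSQL's XLogFromFileName(): segno = log * (2^32 / segment size) + seg.
- `lsn_range_to_segnos()` assumes the end LSN is exclusive.
- `segment_is_wanted()` keeps a segment only if it is on the wanted timeline, lies inside the start/end range, and is not older than the segment currently requested.

All three helpers take the WAL segment size as an argument. This reflects the point made in the thread: the size has to be learned from the first WAL page before the range can be computed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Valid WAL segment sizes are powers of two between 1 MB and 1 GB. */
static bool
valid_seg_size(uint32_t segsz)
{
    return segsz >= (UINT32_C(1) << 20) && segsz <= (UINT32_C(1) << 30) &&
        (segsz & (segsz - 1)) == 0;
}

/* Hypothetical: split "TTTTTTTTLLLLLLLLSSSSSSSS" into timeline and segno. */
static bool
parse_wal_name(const char *fname, uint32_t segsz,
               uint32_t *tli, uint64_t *segno)
{
    unsigned int t, log, seg;

    assert(valid_seg_size(segsz));
    if (strlen(fname) != 24 ||
        strspn(fname, "0123456789ABCDEF") != 24 ||
        sscanf(fname, "%8X%8X%8X", &t, &log, &seg) != 3)
        return false;           /* not a WAL segment name (e.g. .history) */
    *tli = t;
    *segno = (uint64_t) log * (UINT64_C(0x100000000) / segsz) + seg;
    return true;
}

/* Hypothetical: segments covering [start_lsn, end_lsn), end exclusive. */
static void
lsn_range_to_segnos(uint64_t start_lsn, uint64_t end_lsn, uint32_t segsz,
                    uint64_t *start_segno, uint64_t *end_segno)
{
    assert(end_lsn > start_lsn);
    *start_segno = start_lsn / segsz;
    *end_segno = (end_lsn - 1) / segsz;
}

/* Hypothetical filter: keep only segments the reader may still need. */
static bool
segment_is_wanted(uint32_t tli, uint64_t segno, uint32_t want_tli,
                  uint64_t start_segno, uint64_t end_segno,
                  uint64_t requested_segno)
{
    if (tli != want_tli)
        return false;               /* another timeline */
    if (segno < start_segno || segno > end_segno)
        return false;               /* outside the requested LSN range */
    return segno >= requested_segno;    /* older ones were already consumed */
}
```

Because WAL file names are fixed-width upper-case hex, two names from the same timeline and segment size compare equal with strcmp() exactly when their segment numbers are equal, and they sort in segment order. This bears on the review question of whether the strings could simply be compared.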
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Andrew Dunstan <[email protected]>
Date: Tue, 3 Mar 2026 00:00:00 +0000
Subject: [PATCH] Add pg_verifybackup test for tar-format WAL verification
The new tar-format WAL verification in pg_verifybackup had no test
coverage for the case where pg_basebackup produces a separate
pg_wal.tar (--format=tar --wal-method=stream). Add a test that takes
a tar-format backup and verifies it.
---
src/bin/pg_verifybackup/t/007_wal.pl | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/src/bin/pg_verifybackup/t/007_wal.pl b/src/bin/pg_verifybackup/t/007_wal.pl
index 8ad2234453d..0e0377bfacc 100644
--- a/src/bin/pg_verifybackup/t/007_wal.pl
+++ b/src/bin/pg_verifybackup/t/007_wal.pl
@@ -90,4 +90,20 @@ command_ok(
[ 'pg_verifybackup', $backup_path2 ],
'valid base backup with timeline > 1');
+# Test WAL verification for a tar-format backup with a separate pg_wal.tar,
+# as produced by pg_basebackup --format=tar --wal-method=stream.
+my $backup_path3 = $primary->backup_dir . '/test_tar_wal';
+$primary->command_ok(
+ [
+ 'pg_basebackup',
+ '--pgdata' => $backup_path3,
+ '--no-sync',
+ '--format' => 'tar',
+ '--checkpoint' => 'fast'
+ ],
+ "tar backup with separate pg_wal.tar");
+command_ok(
+ [ 'pg_verifybackup', $backup_path3 ],
+ 'WAL verification succeeds with separate pg_wal.tar');
+
done_testing();
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Andrew Dunstan <[email protected]>
Date: Tue, 3 Mar 2026 00:00:00 +0000
Subject: [PATCH] Split pg_waldump TAP tests into directory and archive files
The original 001_basic.pl mixed directory and tar archive tests in a
single SKIP loop with a hardcoded skip count of 3, but each scenario
actually runs ~19 assertions. When tar is unavailable the skip count
was wrong, and the directory scenario was also wrongly guarded by the
tar-availability check.
Move all archive-related tests (tar, tar.gz) into a new
003_archive.pl that uses plan skip_all when tar is unavailable,
cleanly skipping the entire file. 001_basic.pl retains only
directory-based tests with no SKIP blocks needed.
---
src/bin/pg_waldump/meson.build | 1 +
src/bin/pg_waldump/t/001_basic.pl | 221 ++++++++++-----------------
src/bin/pg_waldump/t/003_archive.pl | 320 +++++++++++++++++++++++++++++++++++
3 files changed, 396 insertions(+), 146 deletions(-)
create mode 100644 src/bin/pg_waldump/t/003_archive.pl
diff --git a/src/bin/pg_waldump/meson.build b/src/bin/pg_waldump/meson.build
index 5296f21b82c..d2b4bd0c048 100644
--- a/src/bin/pg_waldump/meson.build
+++ b/src/bin/pg_waldump/meson.build
@@ -34,6 +34,7 @@ tests += {
'tests': [
't/001_basic.pl',
't/002_save_fullpage.pl',
+ 't/003_archive.pl',
],
},
}
diff --git a/src/bin/pg_waldump/t/001_basic.pl b/src/bin/pg_waldump/t/001_basic.pl
index 9854c939007..282c9a37221 100644
--- a/src/bin/pg_waldump/t/001_basic.pl
+++ b/src/bin/pg_waldump/t/001_basic.pl
@@ -3,13 +3,9 @@
use strict;
use warnings FATAL => 'all';
-use Cwd;
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Test::More;
-use List::Util qw(shuffle);
-
-my $tar = $ENV{TAR};
program_help_ok('pg_waldump');
program_version_ok('pg_waldump');
@@ -195,8 +191,8 @@ END
$$;
});
-my $contrecord_lsn = $node->safe_psql('postgres',
- 'SELECT pg_current_wal_insert_lsn()');
+my $contrecord_lsn =
+ $node->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn()');
# Generate contrecord record
$node->safe_psql('postgres',
qq{SELECT pg_logical_emit_message(true, 'test 026', repeat('xyzxz', 123456))}
@@ -299,145 +295,78 @@ sub test_pg_waldump
return @lines;
}
-# Create a tar archive, sorting the file order
-sub generate_archive
-{
- my ($archive, $directory, $compression_flags) = @_;
-
- my @files;
- opendir my $dh, $directory or die "opendir: $!";
- while (my $entry = readdir $dh) {
- # Skip '.' and '..'
- next if $entry eq '.' || $entry eq '..';
- push @files, $entry;
- }
- closedir $dh;
-
- @files = shuffle @files;
-
- # move into the WAL directory before archiving files
- my $cwd = getcwd;
- chdir($directory) || die "chdir: $!";
- command_ok([$tar, $compression_flags, $archive, @files]);
- chdir($cwd) || die "chdir: $!";
-}
-
-my $tmp_dir = PostgreSQL::Test::Utils::tempdir_short();
-
-my @scenarios = (
- {
- 'path' => $node->data_dir,
- 'is_archive' => 0,
- 'enabled' => 1
- },
- {
- 'path' => "$tmp_dir/pg_wal.tar",
- 'compression_method' => 'none',
- 'compression_flags' => '-cf',
- 'is_archive' => 1,
- 'enabled' => 1
- },
- {
- 'path' => "$tmp_dir/pg_wal.tar.gz",
- 'compression_method' => 'gzip',
- 'compression_flags' => '-czf',
- 'is_archive' => 1,
- 'enabled' => check_pg_config("#define HAVE_LIBZ 1")
- });
-
-for my $scenario (@scenarios)
-{
- my $path = $scenario->{'path'};
-
- SKIP:
- {
- skip "tar command is not available", 3
- if !defined $tar;
- skip "$scenario->{'compression_method'} compression not supported by this build", 3
- if !$scenario->{'enabled'} && $scenario->{'is_archive'};
-
- # create pg_wal archive
- if ($scenario->{'is_archive'})
- {
- generate_archive($path,
- $node->data_dir . '/pg_wal',
- $scenario->{'compression_flags'});
- }
-
- command_fails_like(
- [ 'pg_waldump', '--path' => $path ],
- qr/error: no start WAL location given/,
- 'path option requires start location');
- command_like(
- [
- 'pg_waldump',
- '--path' => $path,
- '--start' => $start_lsn,
- '--end' => $end_lsn,
- ],
- qr/./,
- 'runs with path option and start and end locations');
- command_fails_like(
- [
- 'pg_waldump',
- '--path' => $path,
- '--start' => $start_lsn,
- ],
- qr/error: error in WAL record at/,
- 'falling off the end of the WAL results in an error');
-
- command_fails_like(
- [
- 'pg_waldump', '--quiet',
- '--path' => $path,
- '--start' => $start_lsn
- ],
- qr/error: error in WAL record at/,
- 'errors are shown with --quiet');
-
- test_pg_waldump_skip_bytes($path, $start_lsn, $end_lsn);
-
- my @lines = test_pg_waldump($path, $start_lsn, $end_lsn);
- is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
-
- @lines = test_pg_waldump($path, $contrecord_lsn, $end_lsn);
- is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
-
- test_pg_waldump_skip_bytes($path, $contrecord_lsn, $end_lsn);
-
- @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--limit' => 6);
- is(@lines, 6, 'limit option observed');
-
- @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--fullpage');
- is(grep(!/^rmgr:.*\bFPW\b/, @lines), 0, 'all output lines are FPW');
-
- @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--stats');
- like($lines[0], qr/WAL statistics/, "statistics on stdout");
- is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
-
- @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--stats=record');
- like($lines[0], qr/WAL statistics/, "statistics on stdout");
- is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
-
- @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--rmgr' => 'Btree');
- is(grep(!/^rmgr: Btree/, @lines), 0, 'only Btree lines');
-
- @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--fork' => 'init');
- is(grep(!/fork init/, @lines), 0, 'only init fork lines');
-
- @lines = test_pg_waldump($path, $start_lsn, $end_lsn,
- '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_t1_oid");
- is(grep(!/rel $default_ts_oid\/$postgres_db_oid\/$rel_t1_oid/, @lines),
- 0, 'only lines for selected relation');
-
- @lines = test_pg_waldump($path, $start_lsn, $end_lsn,
- '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_i1a_oid",
- '--block' => 1);
- is(grep(!/\bblk 1\b/, @lines), 0, 'only lines for selected block');
-
- # Cleanup.
- unlink $path if $scenario->{'is_archive'};
- }
-}
+my $path = $node->data_dir;
+
+command_fails_like(
+ [ 'pg_waldump', '--path' => $path ],
+ qr/error: no start WAL location given/,
+ 'path option requires start location');
+command_like(
+ [
+ 'pg_waldump',
+ '--path' => $path,
+ '--start' => $start_lsn,
+ '--end' => $end_lsn,
+ ],
+ qr/./,
+ 'runs with path option and start and end locations');
+command_fails_like(
+ [
+ 'pg_waldump',
+ '--path' => $path,
+ '--start' => $start_lsn,
+ ],
+ qr/error: error in WAL record at/,
+ 'falling off the end of the WAL results in an error');
+
+command_fails_like(
+ [
+ 'pg_waldump', '--quiet',
+ '--path' => $path,
+ '--start' => $start_lsn
+ ],
+ qr/error: error in WAL record at/,
+ 'errors are shown with --quiet');
+
+test_pg_waldump_skip_bytes($path, $start_lsn, $end_lsn);
+
+my @lines = test_pg_waldump($path, $start_lsn, $end_lsn);
+is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
+
+@lines = test_pg_waldump($path, $contrecord_lsn, $end_lsn);
+is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
+
+test_pg_waldump_skip_bytes($path, $contrecord_lsn, $end_lsn);
+
+@lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--limit' => 6);
+is(@lines, 6, 'limit option observed');
+
+@lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--fullpage');
+is(grep(!/^rmgr:.*\bFPW\b/, @lines), 0, 'all output lines are FPW');
+
+@lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--stats');
+like($lines[0], qr/WAL statistics/, "statistics on stdout");
+is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+
+@lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--stats=record');
+like($lines[0], qr/WAL statistics/, "statistics on stdout");
+is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+
+@lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--rmgr' => 'Btree');
+is(grep(!/^rmgr: Btree/, @lines), 0, 'only Btree lines');
+
+@lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--fork' => 'init');
+is(grep(!/fork init/, @lines), 0, 'only init fork lines');
+
+@lines = test_pg_waldump($path, $start_lsn, $end_lsn,
+ '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_t1_oid");
+is(grep(!/rel $default_ts_oid\/$postgres_db_oid\/$rel_t1_oid/, @lines),
+ 0, 'only lines for selected relation');
+
+@lines = test_pg_waldump(
+ $path, $start_lsn, $end_lsn,
+ '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_i1a_oid",
+ '--block' => 1);
+is(grep(!/\bblk 1\b/, @lines), 0, 'only lines for selected block');
done_testing();
new file mode 100644
index 00000000000..c615713efd4
--- /dev/null
+++ b/src/bin/pg_waldump/t/003_archive.pl
@@ -0,0 +1,320 @@
+
+# Copyright (c) 2021-2026, PostgreSQL Global Development Group
+
+# Test pg_waldump's ability to read WAL from tar archives.
+
+use strict;
+use warnings FATAL => 'all';
+use Cwd;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+use List::Util qw(shuffle);
+
+my $tar = $ENV{TAR};
+
+if (!defined $tar)
+{
+ plan skip_all => 'tar command is not available';
+}
+
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init;
+$node->append_conf(
+ 'postgresql.conf', q{
+autovacuum = off
+checkpoint_timeout = 1h
+
+# for standbydesc
+archive_mode=on
+archive_command=''
+
+# for XLOG_HEAP_TRUNCATE
+wal_level=logical
+});
+$node->start;
+
+my ($start_lsn, $start_walfile) = split /\|/,
+ $node->safe_psql('postgres',
+ q{SELECT pg_current_wal_insert_lsn(), pg_walfile_name(pg_current_wal_insert_lsn())}
+ );
+
+$node->safe_psql(
+ 'postgres', q{
+-- heap, btree, hash, sequence
+CREATE TABLE t1 (a int GENERATED ALWAYS AS IDENTITY, b text);
+CREATE INDEX i1a ON t1 USING btree (a);
+CREATE INDEX i1b ON t1 USING hash (b);
+INSERT INTO t1 VALUES (default, 'one'), (default, 'two');
+DELETE FROM t1 WHERE b = 'one';
+TRUNCATE t1;
+
+-- abort
+START TRANSACTION;
+INSERT INTO t1 VALUES (default, 'three');
+ROLLBACK;
+
+-- unlogged/init fork
+CREATE UNLOGGED TABLE t2 (x int);
+CREATE INDEX i2 ON t2 USING btree (x);
+INSERT INTO t2 SELECT generate_series(1, 10);
+
+-- gin
+CREATE TABLE gin_idx_tbl (id bigserial PRIMARY KEY, data jsonb);
+CREATE INDEX gin_idx ON gin_idx_tbl USING gin (data);
+INSERT INTO gin_idx_tbl
+ WITH random_json AS (
+ SELECT json_object_agg(key, trunc(random() * 10)) as json_data
+ FROM unnest(array['a', 'b', 'c']) as u(key))
+ SELECT generate_series(1,500), json_data FROM random_json;
+
+-- gist, spgist
+CREATE TABLE gist_idx_tbl (p point);
+CREATE INDEX gist_idx ON gist_idx_tbl USING gist (p);
+CREATE INDEX spgist_idx ON gist_idx_tbl USING spgist (p);
+INSERT INTO gist_idx_tbl (p) VALUES (point '(1, 1)'), (point '(3, 2)'), (point '(6, 3)');
+
+-- brin
+CREATE TABLE brin_idx_tbl (col1 int, col2 text, col3 text );
+CREATE INDEX brin_idx ON brin_idx_tbl USING brin (col1, col2, col3) WITH (autosummarize=on);
+INSERT INTO brin_idx_tbl SELECT generate_series(1, 10000), 'dummy', 'dummy';
+UPDATE brin_idx_tbl SET col2 = 'updated' WHERE col1 BETWEEN 1 AND 5000;
+SELECT brin_summarize_range('brin_idx', 0);
+SELECT brin_desummarize_range('brin_idx', 0);
+
+VACUUM;
+
+-- logical message
+SELECT pg_logical_emit_message(true, 'foo', 'bar');
+
+-- relmap
+VACUUM FULL pg_authid;
+
+-- database
+CREATE DATABASE d1;
+DROP DATABASE d1;
+});
+
+my $tblspc_path = PostgreSQL::Test::Utils::tempdir_short();
+
+$node->safe_psql(
+ 'postgres', qq{
+CREATE TABLESPACE ts1 LOCATION '$tblspc_path';
+DROP TABLESPACE ts1;
+});
+
+# Consume all remaining room in the current WAL segment, leaving space enough
+# only for the start of a largish record, to test contrecord decoding.
+$node->safe_psql(
+ 'postgres', q{
+DO $$
+DECLARE
+ wal_segsize int := setting::int FROM pg_settings WHERE name = 'wal_segment_size';
+ remain int;
+ iters int := 0;
+BEGIN
+ LOOP
+ INSERT into t1(b)
+ select repeat(encode(sha256(g::text::bytea), 'hex'), (random() * 15 + 1)::int)
+ from generate_series(1, 10) g;
+
+ remain := wal_segsize - (pg_current_wal_insert_lsn() - '0/0') % wal_segsize;
+ IF remain < 2 * setting::int from pg_settings where name = 'block_size' THEN
+ RAISE log 'exiting after % iterations, % bytes to end of WAL segment', iters, remain;
+ EXIT;
+ END IF;
+ iters := iters + 1;
+ END LOOP;
+END
+$$;
+});
+
+my $contrecord_lsn =
+ $node->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn()');
+$node->safe_psql('postgres',
+ qq{SELECT pg_logical_emit_message(true, 'test 026', repeat('xyzxz', 123456))}
+);
+
+my ($end_lsn, $end_walfile) = split /\|/,
+ $node->safe_psql('postgres',
+ q{SELECT pg_current_wal_insert_lsn(), pg_walfile_name(pg_current_wal_insert_lsn())}
+ );
+
+$node->stop;
+
+
+sub test_pg_waldump_skip_bytes
+{
+ my ($path, $startlsn, $endlsn) = @_;
+
+ my ($part1, $part2) = split qr{/}, $startlsn;
+ my $lsn2 = hex $part2;
+ $lsn2++;
+ my $new_start = sprintf("%s/%X", $part1, $lsn2);
+
+ my ($stdout, $stderr);
+
+ my $result = IPC::Run::run [
+ 'pg_waldump',
+ '--start' => $new_start,
+ '--end' => $endlsn,
+ '--path' => $path,
+ ],
+ '>' => \$stdout,
+ '2>' => \$stderr;
+ ok($result, "runs with start segment and start LSN specified");
+ like($stderr, qr/first record is after/, 'info message printed');
+}
+
+sub test_pg_waldump
+{
+ local $Test::Builder::Level = $Test::Builder::Level + 1;
+ my ($path, $startlsn, $endlsn, @opts) = @_;
+
+ my ($stdout, $stderr);
+
+ my $result = IPC::Run::run [
+ 'pg_waldump',
+ '--start' => $startlsn,
+ '--end' => $endlsn,
+ '--path' => $path,
+ @opts
+ ],
+ '>' => \$stdout,
+ '2>' => \$stderr;
+ ok($result, "pg_waldump @opts: runs ok");
+ is($stderr, '', "pg_waldump @opts: no stderr");
+ my @lines = split /\n/, $stdout;
+ ok(@lines > 0, "pg_waldump @opts: some lines are output");
+ return @lines;
+}
+
+sub generate_archive
+{
+ my ($archive, $directory, $compression_flags) = @_;
+
+ my @files;
+ opendir my $dh, $directory or die "opendir: $!";
+ while (my $entry = readdir $dh)
+ {
+ next if $entry eq '.' || $entry eq '..';
+ push @files, $entry;
+ }
+ closedir $dh;
+
+ @files = shuffle @files;
+
+ my $cwd = getcwd;
+ chdir($directory) || die "chdir: $!";
+ command_ok([ $tar, $compression_flags, $archive, @files ],
+ "create archive $archive");
+ chdir($cwd) || die "chdir: $!";
+}
+
+
+my $tmp_dir = PostgreSQL::Test::Utils::tempdir_short();
+
+my @scenarios = (
+ {
+ 'path' => "$tmp_dir/pg_wal.tar",
+ 'compression_method' => 'none',
+ 'compression_flags' => '-cf',
+ 'enabled' => 1,
+ },
+ {
+ 'path' => "$tmp_dir/pg_wal.tar.gz",
+ 'compression_method' => 'gzip',
+ 'compression_flags' => '-czf',
+ 'enabled' => check_pg_config("#define HAVE_LIBZ 1"),
+ });
+
+for my $scenario (@scenarios)
+{
+ my $path = $scenario->{'path'};
+ my $method = $scenario->{'compression_method'};
+
+ SKIP:
+ {
+ skip "$method compression not supported by this build", 1
+ if !$scenario->{'enabled'};
+
+ generate_archive(
+ $path,
+ $node->data_dir . '/pg_wal',
+ $scenario->{'compression_flags'});
+
+ command_fails_like(
+ [ 'pg_waldump', '--path' => $path ],
+ qr/error: no start WAL location given/,
+ "$method: path option requires start location");
+ command_like(
+ [
+ 'pg_waldump',
+ '--path' => $path,
+ '--start' => $start_lsn,
+ '--end' => $end_lsn,
+ ],
+ qr/./,
+ "$method: runs with path option and start and end locations");
+ command_fails_like(
+ [
+ 'pg_waldump',
+ '--path' => $path,
+ '--start' => $start_lsn,
+ ],
+ qr/error: error in WAL record at/,
+ "$method: falling off the end of the WAL results in an error");
+
+ command_fails_like(
+ [
+ 'pg_waldump', '--quiet',
+ '--path' => $path,
+ '--start' => $start_lsn
+ ],
+ qr/error: error in WAL record at/,
+ "$method: errors are shown with --quiet");
+
+ test_pg_waldump_skip_bytes($path, $start_lsn, $end_lsn);
+
+ my @lines = test_pg_waldump($path, $start_lsn, $end_lsn);
+ is(grep(!/^rmgr: \w/, @lines),
+ 0, "$method: all output lines are rmgr lines");
+
+ @lines = test_pg_waldump($path, $contrecord_lsn, $end_lsn);
+ is(grep(!/^rmgr: \w/, @lines),
+ 0, "$method: contrecord - all output lines are rmgr lines");
+
+ test_pg_waldump_skip_bytes($path, $contrecord_lsn, $end_lsn);
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--limit' => 6);
+ is(@lines, 6, "$method: limit option observed");
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--fullpage');
+ is(grep(!/^rmgr:.*\bFPW\b/, @lines),
+ 0, "$method: all output lines are FPW");
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--stats');
+ like($lines[0], qr/WAL statistics/, "$method: statistics on stdout");
+ is(grep(/^rmgr:/, @lines), 0, "$method: no rmgr lines output");
+
+ @lines =
+ test_pg_waldump($path, $start_lsn, $end_lsn, '--stats=record');
+ like($lines[0], qr/WAL statistics/,
+ "$method: stats=record on stdout");
+ is(grep(/^rmgr:/, @lines),
+ 0, "$method: no rmgr lines with stats=record");
+
+ @lines =
+ test_pg_waldump($path, $start_lsn, $end_lsn, '--rmgr' => 'Btree');
+ is(grep(!/^rmgr: Btree/, @lines), 0, "$method: only Btree lines");
+
+ @lines =
+ test_pg_waldump($path, $start_lsn, $end_lsn, '--fork' => 'init');
+ is(grep(!/fork init/, @lines), 0, "$method: only init fork lines");
+
+ # Cleanup.
+ unlink $path;
+ }
+}
+
+done_testing();
Attachments:
[text/plain] cf5955-tar-wal-test.patch.no-cfbot (1.4K, 2-cf5955-tar-wal-test.patch.no-cfbot)
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Andrew Dunstan <[email protected]>
Date: Tue, 3 Mar 2026 00:00:00 +0000
Subject: [PATCH] Add pg_verifybackup test for tar-format WAL verification
The new tar-format WAL verification in pg_verifybackup had no test
coverage for the case where pg_basebackup produces a separate
pg_wal.tar (--format=tar --wal-method=stream). Add a test that takes
a tar-format backup and verifies it.
---
src/bin/pg_verifybackup/t/007_wal.pl | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/src/bin/pg_verifybackup/t/007_wal.pl b/src/bin/pg_verifybackup/t/007_wal.pl
index 8ad2234453d..0e0377bfacc 100644
--- a/src/bin/pg_verifybackup/t/007_wal.pl
+++ b/src/bin/pg_verifybackup/t/007_wal.pl
@@ -90,4 +90,20 @@ command_ok(
[ 'pg_verifybackup', $backup_path2 ],
'valid base backup with timeline > 1');
+# Test WAL verification for a tar-format backup with a separate pg_wal.tar,
+# as produced by pg_basebackup --format=tar (stream is the default --wal-method).
+my $backup_path3 = $primary->backup_dir . '/test_tar_wal';
+$primary->command_ok(
+ [
+ 'pg_basebackup',
+ '--pgdata' => $backup_path3,
+ '--no-sync',
+ '--format' => 'tar',
+ '--checkpoint' => 'fast'
+ ],
+ "tar backup with separate pg_wal.tar");
+command_ok(
+ [ 'pg_verifybackup', $backup_path3 ],
+ 'WAL verification succeeds with separate pg_wal.tar');
+
done_testing();
[text/plain] cf5955-tap-test-fix.patch.no-cfbot (17.5K, 3-cf5955-tap-test-fix.patch.no-cfbot)
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Andrew Dunstan <[email protected]>
Date: Tue, 3 Mar 2026 00:00:00 +0000
Subject: [PATCH] Split pg_waldump TAP tests into directory and archive files
The original 001_basic.pl mixed directory and tar archive tests in a
single SKIP loop with a hardcoded skip count of 3, although each scenario
actually runs far more assertions than that (every test helper call emits
several). When tar is unavailable the skip count was wrong, and the
directory scenario was also wrongly guarded by the tar-availability check.
Move all archive-related tests (tar, tar.gz) into a new
003_archive.pl that uses plan skip_all when tar is unavailable,
cleanly skipping the entire file. 001_basic.pl retains only
directory-based tests with no SKIP blocks needed.
---
src/bin/pg_waldump/meson.build | 1 +
src/bin/pg_waldump/t/001_basic.pl | 221 ++++++++++-----------------
src/bin/pg_waldump/t/003_archive.pl | 320 +++++++++++++++++++++++++++++++++++
3 files changed, 396 insertions(+), 146 deletions(-)
create mode 100644 src/bin/pg_waldump/t/003_archive.pl
diff --git a/src/bin/pg_waldump/meson.build b/src/bin/pg_waldump/meson.build
index 5296f21b82c..d2b4bd0c048 100644
--- a/src/bin/pg_waldump/meson.build
+++ b/src/bin/pg_waldump/meson.build
@@ -34,6 +34,7 @@ tests += {
'tests': [
't/001_basic.pl',
't/002_save_fullpage.pl',
+ 't/003_archive.pl',
],
},
}
diff --git a/src/bin/pg_waldump/t/001_basic.pl b/src/bin/pg_waldump/t/001_basic.pl
index 9854c939007..282c9a37221 100644
--- a/src/bin/pg_waldump/t/001_basic.pl
+++ b/src/bin/pg_waldump/t/001_basic.pl
@@ -3,13 +3,9 @@
use strict;
use warnings FATAL => 'all';
-use Cwd;
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Test::More;
-use List::Util qw(shuffle);
-
-my $tar = $ENV{TAR};
program_help_ok('pg_waldump');
program_version_ok('pg_waldump');
@@ -195,8 +191,8 @@ END
$$;
});
-my $contrecord_lsn = $node->safe_psql('postgres',
- 'SELECT pg_current_wal_insert_lsn()');
+my $contrecord_lsn =
+ $node->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn()');
# Generate contrecord record
$node->safe_psql('postgres',
qq{SELECT pg_logical_emit_message(true, 'test 026', repeat('xyzxz', 123456))}
@@ -299,145 +295,78 @@ sub test_pg_waldump
return @lines;
}
-# Create a tar archive, sorting the file order
-sub generate_archive
-{
- my ($archive, $directory, $compression_flags) = @_;
-
- my @files;
- opendir my $dh, $directory or die "opendir: $!";
- while (my $entry = readdir $dh) {
- # Skip '.' and '..'
- next if $entry eq '.' || $entry eq '..';
- push @files, $entry;
- }
- closedir $dh;
-
- @files = shuffle @files;
-
- # move into the WAL directory before archiving files
- my $cwd = getcwd;
- chdir($directory) || die "chdir: $!";
- command_ok([$tar, $compression_flags, $archive, @files]);
- chdir($cwd) || die "chdir: $!";
-}
-
-my $tmp_dir = PostgreSQL::Test::Utils::tempdir_short();
-
-my @scenarios = (
- {
- 'path' => $node->data_dir,
- 'is_archive' => 0,
- 'enabled' => 1
- },
- {
- 'path' => "$tmp_dir/pg_wal.tar",
- 'compression_method' => 'none',
- 'compression_flags' => '-cf',
- 'is_archive' => 1,
- 'enabled' => 1
- },
- {
- 'path' => "$tmp_dir/pg_wal.tar.gz",
- 'compression_method' => 'gzip',
- 'compression_flags' => '-czf',
- 'is_archive' => 1,
- 'enabled' => check_pg_config("#define HAVE_LIBZ 1")
- });
-
-for my $scenario (@scenarios)
-{
- my $path = $scenario->{'path'};
-
- SKIP:
- {
- skip "tar command is not available", 3
- if !defined $tar;
- skip "$scenario->{'compression_method'} compression not supported by this build", 3
- if !$scenario->{'enabled'} && $scenario->{'is_archive'};
-
- # create pg_wal archive
- if ($scenario->{'is_archive'})
- {
- generate_archive($path,
- $node->data_dir . '/pg_wal',
- $scenario->{'compression_flags'});
- }
-
- command_fails_like(
- [ 'pg_waldump', '--path' => $path ],
- qr/error: no start WAL location given/,
- 'path option requires start location');
- command_like(
- [
- 'pg_waldump',
- '--path' => $path,
- '--start' => $start_lsn,
- '--end' => $end_lsn,
- ],
- qr/./,
- 'runs with path option and start and end locations');
- command_fails_like(
- [
- 'pg_waldump',
- '--path' => $path,
- '--start' => $start_lsn,
- ],
- qr/error: error in WAL record at/,
- 'falling off the end of the WAL results in an error');
-
- command_fails_like(
- [
- 'pg_waldump', '--quiet',
- '--path' => $path,
- '--start' => $start_lsn
- ],
- qr/error: error in WAL record at/,
- 'errors are shown with --quiet');
-
- test_pg_waldump_skip_bytes($path, $start_lsn, $end_lsn);
-
- my @lines = test_pg_waldump($path, $start_lsn, $end_lsn);
- is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
-
- @lines = test_pg_waldump($path, $contrecord_lsn, $end_lsn);
- is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
-
- test_pg_waldump_skip_bytes($path, $contrecord_lsn, $end_lsn);
-
- @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--limit' => 6);
- is(@lines, 6, 'limit option observed');
-
- @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--fullpage');
- is(grep(!/^rmgr:.*\bFPW\b/, @lines), 0, 'all output lines are FPW');
-
- @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--stats');
- like($lines[0], qr/WAL statistics/, "statistics on stdout");
- is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
-
- @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--stats=record');
- like($lines[0], qr/WAL statistics/, "statistics on stdout");
- is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
-
- @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--rmgr' => 'Btree');
- is(grep(!/^rmgr: Btree/, @lines), 0, 'only Btree lines');
-
- @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--fork' => 'init');
- is(grep(!/fork init/, @lines), 0, 'only init fork lines');
-
- @lines = test_pg_waldump($path, $start_lsn, $end_lsn,
- '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_t1_oid");
- is(grep(!/rel $default_ts_oid\/$postgres_db_oid\/$rel_t1_oid/, @lines),
- 0, 'only lines for selected relation');
-
- @lines = test_pg_waldump($path, $start_lsn, $end_lsn,
- '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_i1a_oid",
- '--block' => 1);
- is(grep(!/\bblk 1\b/, @lines), 0, 'only lines for selected block');
-
- # Cleanup.
- unlink $path if $scenario->{'is_archive'};
- }
-}
+my $path = $node->data_dir;
+
+command_fails_like(
+ [ 'pg_waldump', '--path' => $path ],
+ qr/error: no start WAL location given/,
+ 'path option requires start location');
+command_like(
+ [
+ 'pg_waldump',
+ '--path' => $path,
+ '--start' => $start_lsn,
+ '--end' => $end_lsn,
+ ],
+ qr/./,
+ 'runs with path option and start and end locations');
+command_fails_like(
+ [
+ 'pg_waldump',
+ '--path' => $path,
+ '--start' => $start_lsn,
+ ],
+ qr/error: error in WAL record at/,
+ 'falling off the end of the WAL results in an error');
+
+command_fails_like(
+ [
+ 'pg_waldump', '--quiet',
+ '--path' => $path,
+ '--start' => $start_lsn
+ ],
+ qr/error: error in WAL record at/,
+ 'errors are shown with --quiet');
+
+test_pg_waldump_skip_bytes($path, $start_lsn, $end_lsn);
+
+my @lines = test_pg_waldump($path, $start_lsn, $end_lsn);
+is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
+
+@lines = test_pg_waldump($path, $contrecord_lsn, $end_lsn);
+is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
+
+test_pg_waldump_skip_bytes($path, $contrecord_lsn, $end_lsn);
+
+@lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--limit' => 6);
+is(@lines, 6, 'limit option observed');
+
+@lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--fullpage');
+is(grep(!/^rmgr:.*\bFPW\b/, @lines), 0, 'all output lines are FPW');
+
+@lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--stats');
+like($lines[0], qr/WAL statistics/, "statistics on stdout");
+is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+
+@lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--stats=record');
+like($lines[0], qr/WAL statistics/, "statistics on stdout");
+is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+
+@lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--rmgr' => 'Btree');
+is(grep(!/^rmgr: Btree/, @lines), 0, 'only Btree lines');
+
+@lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--fork' => 'init');
+is(grep(!/fork init/, @lines), 0, 'only init fork lines');
+
+@lines = test_pg_waldump($path, $start_lsn, $end_lsn,
+ '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_t1_oid");
+is(grep(!/rel $default_ts_oid\/$postgres_db_oid\/$rel_t1_oid/, @lines),
+ 0, 'only lines for selected relation');
+
+@lines = test_pg_waldump(
+ $path, $start_lsn, $end_lsn,
+ '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_i1a_oid",
+ '--block' => 1);
+is(grep(!/\bblk 1\b/, @lines), 0, 'only lines for selected block');
done_testing();
new file mode 100644
index 00000000000..c615713efd4
--- /dev/null
+++ b/src/bin/pg_waldump/t/003_archive.pl
@@ -0,0 +1,320 @@
+
+# Copyright (c) 2021-2026, PostgreSQL Global Development Group
+
+# Test pg_waldump's ability to read WAL from tar archives.
+
+use strict;
+use warnings FATAL => 'all';
+use Cwd;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+use List::Util qw(shuffle);
+
+my $tar = $ENV{TAR};
+
+if (!defined $tar)
+{
+ plan skip_all => 'tar command is not available';
+}
+
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init;
+$node->append_conf(
+ 'postgresql.conf', q{
+autovacuum = off
+checkpoint_timeout = 1h
+
+# for standbydesc
+archive_mode=on
+archive_command=''
+
+# for XLOG_HEAP_TRUNCATE
+wal_level=logical
+});
+$node->start;
+
+my ($start_lsn, $start_walfile) = split /\|/,
+ $node->safe_psql('postgres',
+ q{SELECT pg_current_wal_insert_lsn(), pg_walfile_name(pg_current_wal_insert_lsn())}
+ );
+
+$node->safe_psql(
+ 'postgres', q{
+-- heap, btree, hash, sequence
+CREATE TABLE t1 (a int GENERATED ALWAYS AS IDENTITY, b text);
+CREATE INDEX i1a ON t1 USING btree (a);
+CREATE INDEX i1b ON t1 USING hash (b);
+INSERT INTO t1 VALUES (default, 'one'), (default, 'two');
+DELETE FROM t1 WHERE b = 'one';
+TRUNCATE t1;
+
+-- abort
+START TRANSACTION;
+INSERT INTO t1 VALUES (default, 'three');
+ROLLBACK;
+
+-- unlogged/init fork
+CREATE UNLOGGED TABLE t2 (x int);
+CREATE INDEX i2 ON t2 USING btree (x);
+INSERT INTO t2 SELECT generate_series(1, 10);
+
+-- gin
+CREATE TABLE gin_idx_tbl (id bigserial PRIMARY KEY, data jsonb);
+CREATE INDEX gin_idx ON gin_idx_tbl USING gin (data);
+INSERT INTO gin_idx_tbl
+ WITH random_json AS (
+ SELECT json_object_agg(key, trunc(random() * 10)) as json_data
+ FROM unnest(array['a', 'b', 'c']) as u(key))
+ SELECT generate_series(1,500), json_data FROM random_json;
+
+-- gist, spgist
+CREATE TABLE gist_idx_tbl (p point);
+CREATE INDEX gist_idx ON gist_idx_tbl USING gist (p);
+CREATE INDEX spgist_idx ON gist_idx_tbl USING spgist (p);
+INSERT INTO gist_idx_tbl (p) VALUES (point '(1, 1)'), (point '(3, 2)'), (point '(6, 3)');
+
+-- brin
+CREATE TABLE brin_idx_tbl (col1 int, col2 text, col3 text );
+CREATE INDEX brin_idx ON brin_idx_tbl USING brin (col1, col2, col3) WITH (autosummarize=on);
+INSERT INTO brin_idx_tbl SELECT generate_series(1, 10000), 'dummy', 'dummy';
+UPDATE brin_idx_tbl SET col2 = 'updated' WHERE col1 BETWEEN 1 AND 5000;
+SELECT brin_summarize_range('brin_idx', 0);
+SELECT brin_desummarize_range('brin_idx', 0);
+
+VACUUM;
+
+-- logical message
+SELECT pg_logical_emit_message(true, 'foo', 'bar');
+
+-- relmap
+VACUUM FULL pg_authid;
+
+-- database
+CREATE DATABASE d1;
+DROP DATABASE d1;
+});
+
+my $tblspc_path = PostgreSQL::Test::Utils::tempdir_short();
+
+$node->safe_psql(
+ 'postgres', qq{
+CREATE TABLESPACE ts1 LOCATION '$tblspc_path';
+DROP TABLESPACE ts1;
+});
+
+# Consume all remaining room in the current WAL segment, leaving space enough
+# only for the start of a largish record, to test contrecord decoding.
+$node->safe_psql(
+ 'postgres', q{
+DO $$
+DECLARE
+ wal_segsize int := setting::int FROM pg_settings WHERE name = 'wal_segment_size';
+ remain int;
+ iters int := 0;
+BEGIN
+ LOOP
+ INSERT into t1(b)
+ select repeat(encode(sha256(g::text::bytea), 'hex'), (random() * 15 + 1)::int)
+ from generate_series(1, 10) g;
+
+ remain := wal_segsize - (pg_current_wal_insert_lsn() - '0/0') % wal_segsize;
+ IF remain < 2 * setting::int from pg_settings where name = 'block_size' THEN
+ RAISE log 'exiting after % iterations, % bytes to end of WAL segment', iters, remain;
+ EXIT;
+ END IF;
+ iters := iters + 1;
+ END LOOP;
+END
+$$;
+});
+
+my $contrecord_lsn =
+ $node->safe_psql('postgres', 'SELECT pg_current_wal_insert_lsn()');
+$node->safe_psql('postgres',
+ qq{SELECT pg_logical_emit_message(true, 'test 026', repeat('xyzxz', 123456))}
+);
+
+my ($end_lsn, $end_walfile) = split /\|/,
+ $node->safe_psql('postgres',
+ q{SELECT pg_current_wal_insert_lsn(), pg_walfile_name(pg_current_wal_insert_lsn())}
+ );
+
+$node->stop;
+
+
+sub test_pg_waldump_skip_bytes
+{
+ my ($path, $startlsn, $endlsn) = @_;
+
+ my ($part1, $part2) = split qr{/}, $startlsn;
+ my $lsn2 = hex $part2;
+ $lsn2++;
+ my $new_start = sprintf("%s/%X", $part1, $lsn2);
+
+ my ($stdout, $stderr);
+
+ my $result = IPC::Run::run [
+ 'pg_waldump',
+ '--start' => $new_start,
+ '--end' => $endlsn,
+ '--path' => $path,
+ ],
+ '>' => \$stdout,
+ '2>' => \$stderr;
+ ok($result, "runs with start segment and start LSN specified");
+ like($stderr, qr/first record is after/, 'info message printed');
+}
+
+sub test_pg_waldump
+{
+ local $Test::Builder::Level = $Test::Builder::Level + 1;
+ my ($path, $startlsn, $endlsn, @opts) = @_;
+
+ my ($stdout, $stderr);
+
+ my $result = IPC::Run::run [
+ 'pg_waldump',
+ '--start' => $startlsn,
+ '--end' => $endlsn,
+ '--path' => $path,
+ @opts
+ ],
+ '>' => \$stdout,
+ '2>' => \$stderr;
+ ok($result, "pg_waldump @opts: runs ok");
+ is($stderr, '', "pg_waldump @opts: no stderr");
+ my @lines = split /\n/, $stdout;
+ ok(@lines > 0, "pg_waldump @opts: some lines are output");
+ return @lines;
+}
+
+sub generate_archive
+{
+ my ($archive, $directory, $compression_flags) = @_;
+
+ my @files;
+ opendir my $dh, $directory or die "opendir: $!";
+ while (my $entry = readdir $dh)
+ {
+ next if $entry eq '.' || $entry eq '..';
+ push @files, $entry;
+ }
+ closedir $dh;
+
+ @files = shuffle @files;
+
+ my $cwd = getcwd;
+ chdir($directory) || die "chdir: $!";
+ command_ok([ $tar, $compression_flags, $archive, @files ],
+ "create archive $archive");
+ chdir($cwd) || die "chdir: $!";
+}
+
+
+my $tmp_dir = PostgreSQL::Test::Utils::tempdir_short();
+
+my @scenarios = (
+ {
+ 'path' => "$tmp_dir/pg_wal.tar",
+ 'compression_method' => 'none',
+ 'compression_flags' => '-cf',
+ 'enabled' => 1,
+ },
+ {
+ 'path' => "$tmp_dir/pg_wal.tar.gz",
+ 'compression_method' => 'gzip',
+ 'compression_flags' => '-czf',
+ 'enabled' => check_pg_config("#define HAVE_LIBZ 1"),
+ });
+
+for my $scenario (@scenarios)
+{
+ my $path = $scenario->{'path'};
+ my $method = $scenario->{'compression_method'};
+
+ SKIP:
+ {
+ skip "$method compression not supported by this build", 1
+ if !$scenario->{'enabled'};
+
+ generate_archive(
+ $path,
+ $node->data_dir . '/pg_wal',
+ $scenario->{'compression_flags'});
+
+ command_fails_like(
+ [ 'pg_waldump', '--path' => $path ],
+ qr/error: no start WAL location given/,
+ "$method: path option requires start location");
+ command_like(
+ [
+ 'pg_waldump',
+ '--path' => $path,
+ '--start' => $start_lsn,
+ '--end' => $end_lsn,
+ ],
+ qr/./,
+ "$method: runs with path option and start and end locations");
+ command_fails_like(
+ [
+ 'pg_waldump',
+ '--path' => $path,
+ '--start' => $start_lsn,
+ ],
+ qr/error: error in WAL record at/,
+ "$method: falling off the end of the WAL results in an error");
+
+ command_fails_like(
+ [
+ 'pg_waldump', '--quiet',
+ '--path' => $path,
+ '--start' => $start_lsn
+ ],
+ qr/error: error in WAL record at/,
+ "$method: errors are shown with --quiet");
+
+ test_pg_waldump_skip_bytes($path, $start_lsn, $end_lsn);
+
+ my @lines = test_pg_waldump($path, $start_lsn, $end_lsn);
+ is(grep(!/^rmgr: \w/, @lines),
+ 0, "$method: all output lines are rmgr lines");
+
+ @lines = test_pg_waldump($path, $contrecord_lsn, $end_lsn);
+ is(grep(!/^rmgr: \w/, @lines),
+ 0, "$method: contrecord - all output lines are rmgr lines");
+
+ test_pg_waldump_skip_bytes($path, $contrecord_lsn, $end_lsn);
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--limit' => 6);
+ is(@lines, 6, "$method: limit option observed");
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--fullpage');
+ is(grep(!/^rmgr:.*\bFPW\b/, @lines),
+ 0, "$method: all output lines are FPW");
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--stats');
+ like($lines[0], qr/WAL statistics/, "$method: statistics on stdout");
+ is(grep(/^rmgr:/, @lines), 0, "$method: no rmgr lines output");
+
+ @lines =
+ test_pg_waldump($path, $start_lsn, $end_lsn, '--stats=record');
+ like($lines[0], qr/WAL statistics/,
+ "$method: stats=record on stdout");
+ is(grep(/^rmgr:/, @lines),
+ 0, "$method: no rmgr lines with stats=record");
+
+ @lines =
+ test_pg_waldump($path, $start_lsn, $end_lsn, '--rmgr' => 'Btree');
+ is(grep(!/^rmgr: Btree/, @lines), 0, "$method: only Btree lines");
+
+ @lines =
+ test_pg_waldump($path, $start_lsn, $end_lsn, '--fork' => 'init');
+ is(grep(!/fork init/, @lines), 0, "$method: only init fork lines");
+
+ # Cleanup.
+ unlink $path;
+ }
+}
+
+done_testing();
[text/plain] cf5955-docs.patch.no-cfbot (2.4K, 4-cf5955-docs.patch.no-cfbot)
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Andrew Dunstan <[email protected]>
Date: Tue, 3 Mar 2026 00:00:00 +0000
Subject: [PATCH] Fix documentation for pg_waldump tar archive support
Two documentation issues with the tar archive reading feature:
- pg_waldump.sgml: When reading WAL from a tar archive with
out-of-order segments, pg_waldump spills to temporary files. TMPDIR
controls where those files are created, but this was not documented
in the Environment section.
- pg_verifybackup.sgml: The --wal-path option description still only
said "directory" even though it now also accepts tar archives.
---
doc/src/sgml/ref/pg_verifybackup.sgml | 7 ++++---
doc/src/sgml/ref/pg_waldump.sgml | 11 +++++++++++
2 files changed, 15 insertions(+), 3 deletions(-)
diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml
index 16b50b5a4df..1695cfe91c8 100644
--- a/doc/src/sgml/ref/pg_verifybackup.sgml
+++ b/doc/src/sgml/ref/pg_verifybackup.sgml
@@ -261,9 +261,10 @@ PostgreSQL documentation
<term><option>--wal-path=<replaceable class="parameter">path</replaceable></option></term>
<listitem>
<para>
- Try to parse WAL files stored in the specified directory, rather than
- in <literal>pg_wal</literal>. This may be useful if the backup is
- stored in a separate location from the WAL archive.
+ Try to parse WAL files stored in the specified directory or tar
+ archive, rather than in <literal>pg_wal</literal>. This may be
+ useful if the backup is stored in a separate location from the WAL
+ archive.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index b36323dde92..9bbb4bd5772 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -391,6 +391,17 @@ PostgreSQL documentation
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><envar>TMPDIR</envar></term>
+ <listitem>
+ <para>
+ Directory in which to create temporary files when reading WAL from a
+ tar archive with out-of-order segment files. If not set, the temporary
+ directory is created within the same directory as the tar archive.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</refsect1>
[text/plain] cf5955-fixes.patch.no-cfbot (3.6K, 5-cf5955-fixes.patch.no-cfbot)
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Andrew Dunstan <[email protected]>
Date: Tue, 3 Mar 2026 00:00:00 +0000
Subject: [PATCH] Fix bugs in pg_waldump tar archive support
Fix several bugs introduced by the pg_waldump archive WAL reading
feature:
- pg_waldump.c: The error path for verify_directory() printed waldir
(which is NULL when --path is used) instead of walpath.
- archive_waldump.c: The error message for short reads had an operator
precedence bug: (long long int) count - nbytes cast only count, not
the subtraction result. Also reported nbytes (the requested amount)
instead of count (the total file size) for the "of" portion.
- archive_waldump.c: The "ignoring duplicate WAL" code path leaked
fname (allocated via pnstrdup/palloc). Also changed the existing
free(fname) to pfree(fname) for consistency.
- pg_verifybackup.c: The rename from --wal-directory to --wal-path
didn't preserve the old spelling as a backward-compatible alias.
- pg_verifybackup.c: Fix double space before "Or" in --wal-path
error hint message.
---
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 935ab8fafa8..b0b764913cf 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -131,6 +131,7 @@ main(int argc, char **argv)
{"quiet", no_argument, NULL, 'q'},
{"skip-checksums", no_argument, NULL, 's'},
{"wal-path", required_argument, NULL, 'w'},
+ {"wal-directory", required_argument, NULL, 'w'},
{NULL, 0, NULL, 0}
};
@@ -376,7 +377,7 @@ main(int argc, char **argv)
else
{
pg_log_error("WAL archive not found");
- pg_log_error_hint("Specify the correct path using the option -w/--wal-path.  "
+ pg_log_error_hint("Specify the correct path using the option -w/--wal-path. "
"Or you must use -n/--no-parse-wal when verifying a tar-format backup.");
exit(1);
}
diff --git a/src/bin/pg_waldump/archive_waldump.c b/src/bin/pg_waldump/archive_waldump.c
index c5a4485b5b1..1479efe61f5 100644
--- a/src/bin/pg_waldump/archive_waldump.c
+++ b/src/bin/pg_waldump/archive_waldump.c
@@ -344,8 +344,8 @@ read_archive_wal_page(XLogDumpPrivate *privateInfo, XLogRecPtr targetPagePtr,
read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
pg_fatal("could not read file \"%s\" from archive \"%s\": read %lld of %lld",
fname, privateInfo->archive_name,
- (long long int) count - nbytes,
- (long long int) nbytes);
+ (long long int) (count - nbytes),
+ (long long int) count);
}
}
@@ -664,7 +664,7 @@ astreamer_waldump_content(astreamer *streamer, astreamer_member *member,
privateInfo->start_segno > segno ||
privateInfo->end_segno < segno)
{
- free(fname);
+ pfree(fname);
break;
}
}
@@ -680,6 +680,7 @@ astreamer_waldump_content(astreamer *streamer, astreamer_member *member,
{
pg_log_warning("ignoring duplicate WAL \"%s\" found in archive \"%s\"",
member->pathname, privateInfo->archive_name);
+ pfree(fname);
break;
}
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 114969217d8..4b438b53ead 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -1223,7 +1223,7 @@ main(int argc, char **argv)
/* validate path points to directory */
else if (!verify_directory(walpath))
{
- pg_log_error("could not open directory \"%s\": %m", waldir);
+ pg_log_error("could not open directory \"%s\": %m", walpath);
goto bad_argument;
}
}
* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-03-04 12:52 Amul Sul <[email protected]>
parent: Andrew Dunstan <[email protected]>
0 siblings, 1 reply; 29+ messages in thread
From: Amul Sul @ 2026-03-04 12:52 UTC
To: Andrew Dunstan <[email protected]>; +Cc: Robert Haas <[email protected]>; Chao Li <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>
On Wed, Mar 4, 2026 at 6:07 AM Andrew Dunstan <[email protected]> wrote:
>
>
> On 2026-03-02 Mo 8:00 AM, Amul Sul wrote:
> > On Wed, Feb 18, 2026 at 12:28 PM Amul Sul <[email protected]> wrote:
> >> On Tue, Feb 10, 2026 at 3:06 PM Amul Sul <[email protected]> wrote:
> >>> On Wed, Feb 4, 2026 at 6:39 PM Amul Sul <[email protected]> wrote:
> >>>> On Wed, Jan 28, 2026 at 2:41 AM Robert Haas <[email protected]> wrote:
> >>>>> On Tue, Jan 27, 2026 at 7:07 AM Amul Sul <[email protected]> wrote:
> >>>>>> In the attached version, I am using the WAL segment name as the hash
> >>>>>> key, which is much more straightforward. I have rewritten
> >>>>>> read_archive_wal_page(), and it looks much cleaner than before. The
> >>>>>> logic to discard irrelevant WAL files is still within
> >>>>>> get_archive_wal_entry. I added an explanation for setting cur_wal to
> >>>>>> NULL, which is now handled in the separate function I mentioned
> >>>>>> previously.
> >>>>>>
> >>>>>> Kindly have a look at the attached version; let me know if you are
> >>>>>> still not happy with the current approach for filtering/discarding
> >>>>>> irrelevant WAL segments. It isn't much different from the previous
> >>>>>> version, but I have tried to keep it in a separate routine for better
> >>>>>> code readability, with comments to make it easier to understand. I
> >>>>>> also added a comment for ArchivedWALFile.
> >>>>> I feel like the division of labor between get_archive_wal_entry() and
> >>>>> read_archive_wal_page() is odd. I noticed this in the last version,
> >>>>> too, and it still seems to be the case. get_archive_wal_entry() first
> >>>>> calls ArchivedWAL_lookup(). If that finds an entry, it just returns.
> >>>>> If it doesn't, it loops until an entry for the requested file shows up
> >>>>> and then returns it. Then control returns to read_archive_wal_page()
> >>>>> which loops some more until we have all the data we need for the
> >>>>> requested file. But it seems odd to me to have two separate loops
> >>>>> here. I think that the first loop is going to call read_archive_file()
> >>>>> until we find the beginning of the file that we care about and then
> >>>>> the second one is going to call read_archive_file() some more until we
> >>>>> have read enough of it to satisfy the request. It feels odd to me to
> >>>>> do it that way, as if we told somebody to first wait until 9 o'clock
> >>>>> and then wait another 30 minutes, instead of just telling them to wait
> >>>>> until 9:30. I realize it's not quite the same thing, because apart
> >>>>> from calling read_archive_file(), the two loops do different things,
> >>>>> but I still think it looks odd.
> >>>>>
> >>>>> + /*
> >>>>> + * Ignore if the timeline is different or the current segment is not
> >>>>> + * the desired one.
> >>>>> + */
> >>>>> + XLogFromFileName(entry->fname, &curSegTimeline, &curSegNo, WalSegSz);
> >>>>> + if (privateInfo->timeline != curSegTimeline ||
> >>>>> + privateInfo->startSegNo > curSegNo ||
> >>>>> + privateInfo->endSegNo < curSegNo ||
> >>>>> + segno > curSegNo)
> >>>>> + {
> >>>>> + free_archive_wal_entry(entry->fname, privateInfo);
> >>>>> + continue;
> >>>>> + }
> >>>>>
> >>>>> The comment doesn't match the code. If it did, the test would be
> >>>>> (privateInfo->timeline != curSegTimeline || segno != curSegno). But
> >>>>> instead the segno test is > rather than !=, and the checks against
> >>>>> startSegNo and endSegNo aren't explained at all. I think I understand
> >>>>> why the segno test uses > rather than !=, but it's the point of the
> >>>>> comment to explain things like that, rather than leaving the reader to
> >>>>> guess. And I don't know why we also need to test startSegNo and
> >>>>> endSegNo.
> >>>>>
> >>>>> I also wonder what the point is of doing XLogFromFileName() on the
> >>>>> fname provided by the caller and then again on entry->fname. Couldn't
> >>>>> you just compare the strings?
> >>>>>
> >>>>> Again, the division of labor is really odd here. It's the job of
> >>>>> astreamer_waldump_content() to skip things that aren't WAL files at
> >>>>> all, but it's the job of get_archive_wal_entry() to skip things that
> >>>>> are WAL files but not the one we want. I disagree with putting those
> >>>>> checks in completely separate parts of the code.
> >>>>>
> >>>> Keeping the timeline and segment start-end range checks inside the
> >>>> archive streamer creates a circular dependency that cannot be resolved
> >>>> without a 'dirty hack'. We must read the first available WAL file page
> >>>> to determine the wal_segment_size before it can calculate the target
> >>>> segment range. Moving the checks inside the streamer would make it
> >>>> impossible to process that initial file, as the necessary filtering
> >>>> parameters -- would still be unknown which would need to be skipped
> >>>> for the first read somehow. What if later we realized that the first
> >>>> WAL file which was allowed to be streamed by skipping that check is
> >>>> irrelevant and doesn't fall under the start-end segment range?
> >>>>
> >>> Please have a look at the attached version, specifically patch 0005.
> >>> In astreamer_waldump_content(), I have moved the WAL file filtration
> >>> check from get_archive_wal_entry(). This check will be skipped during
> >>> the initial read in init_archive_reader(), which instead performs it
> >>> explicitly once it determines the WAL segment size and the start/end
> >>> segments.
> >>>
> >>> To access the WAL segment size inside astreamer_waldump_content(), I
> >>> have moved the WAL segment size variable into the XLogDumpPrivate
> >>> structure in the separate 0004 patch.
> >> Attached is an updated version including the aforesaid changes. It
> >> includes a new refactoring patch (0001) that moves the logic for
> >> identifying tar archives and their compression types from
> >> pg_basebackup and pg_verifybackup into a separate-reusable function,
> >> per a suggestion from Euler [1]. Additionally, I have added a test
> >> for the contrecord decoding to the main patch (now 0006).
> >>
> >> [1] http://postgr.es/m/[email protected]
> >>
> > Rebased against the latest master, fixed typos in code comments, and
> > replaced palloc0 with palloc0_object.
> >
>
> Hi Amul.
>
>
> I think this looks in pretty good shape.
>

Thank you very much for looking at the patch.

> Attached are patches for a few things I think could be fixed. They are
> mostly self-explanatory. The TAP test fix is the only sane way I could
> come up with stopping the skip code you had from reporting a wildly
> inaccurate number of tests skipped. The sane way to do this from a
> Test::More perspective is a subtest, but unfortunately meson does not
> like subtest output, which is why we don't use it elsewhere, so the only
> way I could come up with was to split this out into a separate test. Of
> course, we might just say we don't care about the misreport, in which
> case we could just live with things as they are.
>

I agree that the reported skip count was incorrect, and I have
corrected it in the attached patch. I haven't applied your TAP test
patch yet because I wanted to check with you first: as it stood, it
duplicated tests that are already present in 001_basic.pl. To avoid
that duplication, I have instead added a loop that runs the tests for
both plain-directory and tar-archive WAL inputs, similar to the
approach pg_verifybackup uses for its compression-type tests (e.g.,
008_untar.pl, 010_client_untar.pl). I have no objection to a separate
test file if you feel the duplication is acceptable, but I think a
loop within 001_basic.pl is a bit tidier. Let me know your thoughts.

I have applied all your other patches, but skipped the changes to
pg_verifybackup.c from cf5955-fixes.patch.no-cfbot, as they seem
unrelated (or perhaps I have misunderstood them).

Regards,
Amul
Attachments:
[application/x-patch] v15-0001-Refactor-Move-tar-archive-parsing-into-a-common-.patch (6.7K, 2-v15-0001-Refactor-Move-tar-archive-parsing-into-a-common-.patch)
From 54fd70f2b5df10e6df575b4f85eaecb8a3c1ff94 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Tue, 17 Feb 2026 14:51:11 +0530
Subject: [PATCH v15 01/11] Refactor: Move tar archive parsing into a common
location.
pg_basebackup and pg_verifybackup both require logic to identify tar
files and determine their compression types. Similar functionality
will be needed for pg_waldump when it gets the capability to decode
WAL files from tar archives. Moving this logic to a common location
allows for reuse and prevents code duplication.
---
src/bin/pg_basebackup/pg_basebackup.c | 36 +++++++----------------
src/bin/pg_verifybackup/pg_verifybackup.c | 12 +-------
src/common/compression.c | 30 +++++++++++++++++++
src/include/common/compression.h | 2 ++
4 files changed, 44 insertions(+), 36 deletions(-)
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index fa169a8d642..c1a4672aa6f 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1070,12 +1070,9 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
astreamer *manifest_inject_streamer = NULL;
bool inject_manifest;
bool is_tar,
- is_tar_gz,
- is_tar_lz4,
- is_tar_zstd,
is_compressed_tar;
+ pg_compress_algorithm compressed_tar_algorithm;
bool must_parse_archive;
- int archive_name_len = strlen(archive_name);
/*
* Normally, we emit the backup manifest as a separate file, but when
@@ -1084,24 +1081,13 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
*/
inject_manifest = (format == 't' && strcmp(basedir, "-") == 0 && manifest);
- /* Is this a tar archive? */
- is_tar = (archive_name_len > 4 &&
- strcmp(archive_name + archive_name_len - 4, ".tar") == 0);
-
- /* Is this a .tar.gz archive? */
- is_tar_gz = (archive_name_len > 7 &&
- strcmp(archive_name + archive_name_len - 7, ".tar.gz") == 0);
-
- /* Is this a .tar.lz4 archive? */
- is_tar_lz4 = (archive_name_len > 8 &&
- strcmp(archive_name + archive_name_len - 8, ".tar.lz4") == 0);
-
- /* Is this a .tar.zst archive? */
- is_tar_zstd = (archive_name_len > 8 &&
- strcmp(archive_name + archive_name_len - 8, ".tar.zst") == 0);
+ /* Check whether it is a tar archive and its compression type */
+ is_tar = parse_tar_compress_algorithm(archive_name,
+ &compressed_tar_algorithm);
/* Is this any kind of compressed tar? */
- is_compressed_tar = is_tar_gz || is_tar_lz4 || is_tar_zstd;
+ is_compressed_tar = (is_tar &&
+ compressed_tar_algorithm != PG_COMPRESSION_NONE);
/*
* Injecting the manifest into a compressed tar file would be possible if
@@ -1128,7 +1114,7 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
(spclocation == NULL && writerecoveryconf));
/* At present, we only know how to parse tar archives. */
- if (must_parse_archive && !is_tar && !is_compressed_tar)
+ if (must_parse_archive && !is_tar)
{
pg_log_error("cannot parse archive \"%s\"", archive_name);
pg_log_error_detail("Only tar archives can be parsed.");
@@ -1263,13 +1249,13 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
* If the user has requested a server compressed archive along with
* archive extraction at client then we need to decompress it.
*/
- if (format == 'p')
+ if (format == 'p' && is_compressed_tar)
{
- if (is_tar_gz)
+ if (compressed_tar_algorithm == PG_COMPRESSION_GZIP)
streamer = astreamer_gzip_decompressor_new(streamer);
- else if (is_tar_lz4)
+ else if (compressed_tar_algorithm == PG_COMPRESSION_LZ4)
streamer = astreamer_lz4_decompressor_new(streamer);
- else if (is_tar_zstd)
+ else if (compressed_tar_algorithm == PG_COMPRESSION_ZSTD)
streamer = astreamer_zstd_decompressor_new(streamer);
}
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index cbc9447384f..31f606c45b1 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -941,17 +941,7 @@ precheck_tar_backup_file(verifier_context *context, char *relpath,
}
/* Now, check the compression type of the tar */
- if (strcmp(suffix, ".tar") == 0)
- compress_algorithm = PG_COMPRESSION_NONE;
- else if (strcmp(suffix, ".tgz") == 0)
- compress_algorithm = PG_COMPRESSION_GZIP;
- else if (strcmp(suffix, ".tar.gz") == 0)
- compress_algorithm = PG_COMPRESSION_GZIP;
- else if (strcmp(suffix, ".tar.lz4") == 0)
- compress_algorithm = PG_COMPRESSION_LZ4;
- else if (strcmp(suffix, ".tar.zst") == 0)
- compress_algorithm = PG_COMPRESSION_ZSTD;
- else
+ if (!parse_tar_compress_algorithm(suffix, &compress_algorithm))
{
report_backup_error(context,
"file \"%s\" is not expected in a tar format backup",
diff --git a/src/common/compression.c b/src/common/compression.c
index 92cd4ec7a0d..f117e21237f 100644
--- a/src/common/compression.c
+++ b/src/common/compression.c
@@ -41,6 +41,36 @@ static int expect_integer_value(char *keyword, char *value,
static bool expect_boolean_value(char *keyword, char *value,
pg_compress_specification *result);
+/*
+ * Look up a compression algorithm by archive file extension. Returns true and
+ * sets *algorithm if the name is recognized. Otherwise returns false.
+ */
+bool
+parse_tar_compress_algorithm(char *fname, pg_compress_algorithm *algorithm)
+{
+ int fname_len = strlen(fname);
+
+ if (fname_len >= 4 &&
+ strcmp(fname + fname_len - 4, ".tar") == 0)
+ *algorithm = PG_COMPRESSION_NONE;
+ else if (fname_len >= 4 &&
+ strcmp(fname + fname_len - 4, ".tgz") == 0)
+ *algorithm = PG_COMPRESSION_GZIP;
+ else if (fname_len >= 7 &&
+ strcmp(fname + fname_len - 7, ".tar.gz") == 0)
+ *algorithm = PG_COMPRESSION_GZIP;
+ else if (fname_len >= 8 &&
+ strcmp(fname + fname_len - 8, ".tar.lz4") == 0)
+ *algorithm = PG_COMPRESSION_LZ4;
+ else if (fname_len >= 8 &&
+ strcmp(fname + fname_len - 8, ".tar.zst") == 0)
+ *algorithm = PG_COMPRESSION_ZSTD;
+ else
+ return false;
+
+ return true;
+}
+
/*
* Look up a compression algorithm by name. Returns true and sets *algorithm
* if the name is recognized. Otherwise returns false.
diff --git a/src/include/common/compression.h b/src/include/common/compression.h
index 6c745b90066..50f21656b88 100644
--- a/src/include/common/compression.h
+++ b/src/include/common/compression.h
@@ -41,6 +41,8 @@ typedef struct pg_compress_specification
extern void parse_compress_options(const char *option, char **algorithm,
char **detail);
+extern bool parse_tar_compress_algorithm(char *fname,
+ pg_compress_algorithm *algorithm);
extern bool parse_compress_algorithm(char *name, pg_compress_algorithm *algorithm);
extern const char *get_compress_algorithm_name(pg_compress_algorithm algorithm);
--
2.47.1
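As an illustration of what parse_tar_compress_algorithm() in patch 0001 does, the sketch below reproduces its suffix matching as a standalone C function. The demo_* names and the demo_compress_algorithm enum are hypothetical stand-ins for pg_compress_algorithm, not PostgreSQL APIs. Because the length checks use >= rather than >, a string that consists only of the suffix (for example ".tar") is also recognized; that matters for pg_verifybackup, which passes only the file's suffix (its old code compared that suffix with strcmp against ".tar", ".tgz", and so on).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for pg_compress_algorithm; not the PostgreSQL enum. */
typedef enum
{
	DEMO_COMPRESSION_NONE,
	DEMO_COMPRESSION_GZIP,
	DEMO_COMPRESSION_LZ4,
	DEMO_COMPRESSION_ZSTD
} demo_compress_algorithm;

/* True if fname ends with suffix; a string equal to suffix also matches. */
static bool
demo_has_suffix(const char *fname, const char *suffix)
{
	size_t		fname_len = strlen(fname);
	size_t		suffix_len = strlen(suffix);

	return fname_len >= suffix_len &&
		strcmp(fname + fname_len - suffix_len, suffix) == 0;
}

/*
 * Map a tar archive name (or a bare suffix) to its compression type, checking
 * the suffixes in the same order as parse_tar_compress_algorithm().
 */
static bool
demo_parse_tar_compress_algorithm(const char *fname,
								  demo_compress_algorithm *algorithm)
{
	assert(fname != NULL && algorithm != NULL);

	if (demo_has_suffix(fname, ".tar"))
		*algorithm = DEMO_COMPRESSION_NONE;
	else if (demo_has_suffix(fname, ".tgz") ||
			 demo_has_suffix(fname, ".tar.gz"))
		*algorithm = DEMO_COMPRESSION_GZIP;
	else if (demo_has_suffix(fname, ".tar.lz4"))
		*algorithm = DEMO_COMPRESSION_LZ4;
	else if (demo_has_suffix(fname, ".tar.zst"))
		*algorithm = DEMO_COMPRESSION_ZSTD;
	else
		return false;			/* not a recognized tar archive name */

	return true;
}
```

For example, demo_parse_tar_compress_algorithm("base.tar.gz", &alg) returns true with alg set to DEMO_COMPRESSION_GZIP, while "base.zip" is rejected.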
[application/x-patch] v15-0002-Refactor-pg_waldump-Move-some-declarations-to-ne.patch (2.2K, 3-v15-0002-Refactor-pg_waldump-Move-some-declarations-to-ne.patch)
From 14706302872c7e35934345fe75e1f24a5857ad16 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Thu, 22 Jan 2026 10:28:32 +0530
Subject: [PATCH v15 02/11] Refactor: pg_waldump: Move some declarations to new
pg_waldump.h
This change prepares for a second source file in this directory to
support reading WAL from tar files. Common structures, declarations,
and functions are being exported through this include file so
they can be used in both files.
---
src/bin/pg_waldump/pg_waldump.c | 9 +--------
src/bin/pg_waldump/pg_waldump.h | 25 +++++++++++++++++++++++++
2 files changed, 26 insertions(+), 8 deletions(-)
create mode 100644 src/bin/pg_waldump/pg_waldump.h
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index f3446385d6a..4b7411a6498 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -29,6 +29,7 @@
#include "common/logging.h"
#include "common/relpath.h"
#include "getopt_long.h"
+#include "pg_waldump.h"
#include "rmgrdesc.h"
#include "storage/bufpage.h"
@@ -43,14 +44,6 @@ static volatile sig_atomic_t time_to_stop = false;
static const RelFileLocator emptyRelFileLocator = {0, 0, 0};
-typedef struct XLogDumpPrivate
-{
- TimeLineID timeline;
- XLogRecPtr startptr;
- XLogRecPtr endptr;
- bool endptr_reached;
-} XLogDumpPrivate;
-
typedef struct XLogDumpConfig
{
/* display options */
diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h
new file mode 100644
index 00000000000..64a9109229e
--- /dev/null
+++ b/src/bin/pg_waldump/pg_waldump.h
@@ -0,0 +1,25 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_waldump.h - decode and display WAL
+ *
+ * Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_waldump/pg_waldump.h
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_WALDUMP_H
+#define PG_WALDUMP_H
+
+#include "access/xlogdefs.h"
+
+/* Contains the necessary information to drive WAL decoding */
+typedef struct XLogDumpPrivate
+{
+ TimeLineID timeline;
+ XLogRecPtr startptr;
+ XLogRecPtr endptr;
+ bool endptr_reached;
+} XLogDumpPrivate;
+
+#endif /* PG_WALDUMP_H */
--
2.47.1
[application/x-patch] v15-0003-Refactor-pg_waldump-Separate-logic-used-to-calcu.patch (2.4K, 4-v15-0003-Refactor-pg_waldump-Separate-logic-used-to-calcu.patch)
From e62670767a8164ca8c0a289aad05f24c3e84f8cc Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Thu, 22 Jan 2026 10:38:16 +0530
Subject: [PATCH v15 03/11] Refactor: pg_waldump: Separate logic used to
calculate the required read size.
This refactoring prepares the codebase for an upcoming patch that will
support reading WAL from tar files. The logic for calculating the
required read size is moved out of WALDumpReadPage() into a separate
function, required_read_len(), so that it can be shared by the code
paths that read plain WAL files and WAL files located inside a tar
archive.
---
src/bin/pg_waldump/pg_waldump.c | 43 +++++++++++++++++++++++----------
1 file changed, 30 insertions(+), 13 deletions(-)
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 4b7411a6498..958a71a01cf 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -326,6 +326,32 @@ identify_target_directory(char *directory, char *fname, int *WalSegSz)
return NULL; /* not reached */
}
+/*
+ * Returns the size in bytes of the data to be read. Returns -1 if the end
+ * point has already been reached.
+ */
+static inline int
+required_read_len(XLogDumpPrivate *private, XLogRecPtr targetPagePtr,
+ int reqLen)
+{
+ int count = XLOG_BLCKSZ;
+
+ if (XLogRecPtrIsValid(private->endptr))
+ {
+ if (targetPagePtr + XLOG_BLCKSZ <= private->endptr)
+ count = XLOG_BLCKSZ;
+ else if (targetPagePtr + reqLen <= private->endptr)
+ count = private->endptr - targetPagePtr;
+ else
+ {
+ private->endptr_reached = true;
+ return -1;
+ }
+ }
+
+ return count;
+}
+
/* pg_waldump's XLogReaderRoutine->segment_open callback */
static void
WALDumpOpenSegment(XLogReaderState *state, XLogSegNo nextSegNo,
@@ -383,21 +409,12 @@ WALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
XLogRecPtr targetPtr, char *readBuff)
{
XLogDumpPrivate *private = state->private_data;
- int count = XLOG_BLCKSZ;
+ int count = required_read_len(private, targetPagePtr, reqLen);
WALReadError errinfo;
- if (XLogRecPtrIsValid(private->endptr))
- {
- if (targetPagePtr + XLOG_BLCKSZ <= private->endptr)
- count = XLOG_BLCKSZ;
- else if (targetPagePtr + reqLen <= private->endptr)
- count = private->endptr - targetPagePtr;
- else
- {
- private->endptr_reached = true;
- return -1;
- }
- }
+ /* Bail out if the count to be read is not valid */
+ if (count < 0)
+ return -1;
if (!WALRead(state, readBuff, targetPagePtr, count, private->timeline,
&errinfo))
--
2.47.1
[application/x-patch] v15-0004-Refactor-pg_waldump-Restructure-TAP-tests.patch (6.6K, 5-v15-0004-Refactor-pg_waldump-Restructure-TAP-tests.patch)
download | inline diff:
From be1fbe441570c0aef766eed410eb3465f2450b53 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Wed, 18 Feb 2026 11:07:57 +0530
Subject: [PATCH v15 04/11] Refactor: pg_waldump: Restructure TAP tests.
Restructure the tests that do not take a WAL file argument so that
they run inside a loop, allowing them to be re-executed when decoding
WAL from tar archives.
== NOTE ==
This is not intended to be committed separately. It can be merged
with the next patch, which is the main patch implementing this
feature.
---
src/bin/pg_waldump/t/001_basic.pl | 140 +++++++++++++++++-------------
1 file changed, 79 insertions(+), 61 deletions(-)
diff --git a/src/bin/pg_waldump/t/001_basic.pl b/src/bin/pg_waldump/t/001_basic.pl
index 5db5d20136f..f12ba52cbfc 100644
--- a/src/bin/pg_waldump/t/001_basic.pl
+++ b/src/bin/pg_waldump/t/001_basic.pl
@@ -198,28 +198,6 @@ command_like(
],
qr/./,
'runs with start and end segment specified');
-command_fails_like(
- [ 'pg_waldump', '--path' => $node->data_dir ],
- qr/error: no start WAL location given/,
- 'path option requires start location');
-command_like(
- [
- 'pg_waldump',
- '--path' => $node->data_dir,
- '--start' => $start_lsn,
- '--end' => $end_lsn,
- ],
- qr/./,
- 'runs with path option and start and end locations');
-command_fails_like(
- [
- 'pg_waldump',
- '--path' => $node->data_dir,
- '--start' => $start_lsn,
- ],
- qr/error: error in WAL record at/,
- 'falling off the end of the WAL results in an error');
-
command_like(
[
'pg_waldump', '--quiet',
@@ -227,22 +205,16 @@ command_like(
],
qr/^$/,
'no output with --quiet option');
-command_fails_like(
- [
- 'pg_waldump', '--quiet',
- '--path' => $node->data_dir,
- '--start' => $start_lsn
- ],
- qr/error: error in WAL record at/,
- 'errors are shown with --quiet');
-
# Test for: Display a message that we're skipping data if `from`
# wasn't a pointer to the start of a record.
+sub test_pg_waldump_skip_bytes
{
+ my ($path, $startlsn, $endlsn) = @_;
+
# Construct a new LSN that is one byte past the original
# start_lsn.
- my ($part1, $part2) = split qr{/}, $start_lsn;
+ my ($part1, $part2) = split qr{/}, $startlsn;
my $lsn2 = hex $part2;
$lsn2++;
my $new_start = sprintf("%s/%X", $part1, $lsn2);
@@ -252,7 +224,8 @@ command_fails_like(
my $result = IPC::Run::run [
'pg_waldump',
'--start' => $new_start,
- $node->data_dir . '/pg_wal/' . $start_walfile
+ '--end' => $endlsn,
+ '--path' => $path,
],
'>' => \$stdout,
'2>' => \$stderr;
@@ -266,15 +239,15 @@ command_fails_like(
sub test_pg_waldump
{
local $Test::Builder::Level = $Test::Builder::Level + 1;
- my @opts = @_;
+ my ($path, $startlsn, $endlsn, @opts) = @_;
my ($stdout, $stderr);
my $result = IPC::Run::run [
'pg_waldump',
- '--path' => $node->data_dir,
- '--start' => $start_lsn,
- '--end' => $end_lsn,
+ '--start' => $startlsn,
+ '--end' => $endlsn,
+ '--path' => $path,
@opts
],
'>' => \$stdout,
@@ -288,38 +261,83 @@ sub test_pg_waldump
my @lines;
-@lines = test_pg_waldump;
-is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
+my @scenarios = (
+ {
+ 'path' => $node->data_dir
+ });
-@lines = test_pg_waldump('--limit' => 6);
-is(@lines, 6, 'limit option observed');
+for my $scenario (@scenarios)
+{
+ my $path = $scenario->{'path'};
-@lines = test_pg_waldump('--fullpage');
-is(grep(!/^rmgr:.*\bFPW\b/, @lines), 0, 'all output lines are FPW');
+ SKIP:
+ {
+ command_fails_like(
+ [ 'pg_waldump', '--path' => $path ],
+ qr/error: no start WAL location given/,
+ 'path option requires start location');
+ command_like(
+ [
+ 'pg_waldump',
+ '--path' => $path,
+ '--start' => $start_lsn,
+ '--end' => $end_lsn,
+ ],
+ qr/./,
+ 'runs with path option and start and end locations');
+ command_fails_like(
+ [
+ 'pg_waldump',
+ '--path' => $path,
+ '--start' => $start_lsn,
+ ],
+ qr/error: error in WAL record at/,
+ 'falling off the end of the WAL results in an error');
-@lines = test_pg_waldump('--stats');
-like($lines[0], qr/WAL statistics/, "statistics on stdout");
-is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+ command_fails_like(
+ [
+ 'pg_waldump', '--quiet',
+ '--path' => $path,
+ '--start' => $start_lsn
+ ],
+ qr/error: error in WAL record at/,
+ 'errors are shown with --quiet');
-@lines = test_pg_waldump('--stats=record');
-like($lines[0], qr/WAL statistics/, "statistics on stdout");
-is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+ test_pg_waldump_skip_bytes($path, $start_lsn, $end_lsn);
-@lines = test_pg_waldump('--rmgr' => 'Btree');
-is(grep(!/^rmgr: Btree/, @lines), 0, 'only Btree lines');
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn);
+ is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
-@lines = test_pg_waldump('--fork' => 'init');
-is(grep(!/fork init/, @lines), 0, 'only init fork lines');
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--limit' => 6);
+ is(@lines, 6, 'limit option observed');
-@lines = test_pg_waldump(
- '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_t1_oid");
-is(grep(!/rel $default_ts_oid\/$postgres_db_oid\/$rel_t1_oid/, @lines),
- 0, 'only lines for selected relation');
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--fullpage');
+ is(grep(!/^rmgr:.*\bFPW\b/, @lines), 0, 'all output lines are FPW');
-@lines = test_pg_waldump(
- '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_i1a_oid",
- '--block' => 1);
-is(grep(!/\bblk 1\b/, @lines), 0, 'only lines for selected block');
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--stats');
+ like($lines[0], qr/WAL statistics/, "statistics on stdout");
+ is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--stats=record');
+ like($lines[0], qr/WAL statistics/, "statistics on stdout");
+ is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--rmgr' => 'Btree');
+ is(grep(!/^rmgr: Btree/, @lines), 0, 'only Btree lines');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--fork' => 'init');
+ is(grep(!/fork init/, @lines), 0, 'only init fork lines');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn,
+ '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_t1_oid");
+ is(grep(!/rel $default_ts_oid\/$postgres_db_oid\/$rel_t1_oid/, @lines),
+ 0, 'only lines for selected relation');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn,
+ '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_i1a_oid",
+ '--block' => 1);
+ is(grep(!/\bblk 1\b/, @lines), 0, 'only lines for selected block');
+ }
+}
done_testing();
--
2.47.1
[application/x-patch] v15-0005-Refactor-pg_waldump-Move-WAL-segment-size-to-XLo.patch (5.1K, 6-v15-0005-Refactor-pg_waldump-Move-WAL-segment-size-to-XLo.patch)
From 8bb8dc6afe753f885520429613966f8cedc2b477 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Wed, 4 Feb 2026 15:31:51 +0530
Subject: [PATCH v15 05/11] Refactor: pg_waldump: Move WAL segment size to
XLogDumpPrivate.
Relocate the WAL segment size variable to the XLogDumpPrivate
structure and rename it to segsize for consistency. This change is
required to make the segment size accessible to the archive streamer
code, where passing it as a function argument is not feasible.
---
src/bin/pg_waldump/pg_waldump.c | 26 +++++++++++++-------------
src/bin/pg_waldump/pg_waldump.h | 1 +
2 files changed, 14 insertions(+), 13 deletions(-)
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 958a71a01cf..5d31b15dbd8 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -811,7 +811,6 @@ main(int argc, char **argv)
XLogRecPtr first_record;
char *waldir = NULL;
char *errormsg;
- int WalSegSz;
static struct option long_options[] = {
{"bkp-details", no_argument, NULL, 'b'},
@@ -865,6 +864,7 @@ main(int argc, char **argv)
memset(&stats, 0, sizeof(XLogStats));
private.timeline = 1;
+ private.segsize = 0;
private.startptr = InvalidXLogRecPtr;
private.endptr = InvalidXLogRecPtr;
private.endptr_reached = false;
@@ -1138,18 +1138,18 @@ main(int argc, char **argv)
pg_fatal("could not open directory \"%s\": %m", waldir);
}
- waldir = identify_target_directory(waldir, fname, &WalSegSz);
+ waldir = identify_target_directory(waldir, fname, &private.segsize);
fd = open_file_in_directory(waldir, fname);
if (fd < 0)
pg_fatal("could not open file \"%s\"", fname);
close(fd);
/* parse position from file */
- XLogFromFileName(fname, &private.timeline, &segno, WalSegSz);
+ XLogFromFileName(fname, &private.timeline, &segno, private.segsize);
if (!XLogRecPtrIsValid(private.startptr))
- XLogSegNoOffsetToRecPtr(segno, 0, WalSegSz, private.startptr);
- else if (!XLByteInSeg(private.startptr, segno, WalSegSz))
+ XLogSegNoOffsetToRecPtr(segno, 0, private.segsize, private.startptr);
+ else if (!XLByteInSeg(private.startptr, segno, private.segsize))
{
pg_log_error("start WAL location %X/%08X is not inside file \"%s\"",
LSN_FORMAT_ARGS(private.startptr),
@@ -1159,7 +1159,7 @@ main(int argc, char **argv)
/* no second file specified, set end position */
if (!(optind + 1 < argc) && !XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(segno + 1, 0, WalSegSz, private.endptr);
+ XLogSegNoOffsetToRecPtr(segno + 1, 0, private.segsize, private.endptr);
/* parse ENDSEG if passed */
if (optind + 1 < argc)
@@ -1175,14 +1175,14 @@ main(int argc, char **argv)
close(fd);
/* parse position from file */
- XLogFromFileName(fname, &private.timeline, &endsegno, WalSegSz);
+ XLogFromFileName(fname, &private.timeline, &endsegno, private.segsize);
if (endsegno < segno)
pg_fatal("ENDSEG %s is before STARTSEG %s",
argv[optind + 1], argv[optind]);
if (!XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(endsegno + 1, 0, WalSegSz,
+ XLogSegNoOffsetToRecPtr(endsegno + 1, 0, private.segsize,
private.endptr);
/* set segno to endsegno for check of --end */
@@ -1190,8 +1190,8 @@ main(int argc, char **argv)
}
- if (!XLByteInSeg(private.endptr, segno, WalSegSz) &&
- private.endptr != (segno + 1) * WalSegSz)
+ if (!XLByteInSeg(private.endptr, segno, private.segsize) &&
+ private.endptr != (segno + 1) * private.segsize)
{
pg_log_error("end WAL location %X/%08X is not inside file \"%s\"",
LSN_FORMAT_ARGS(private.endptr),
@@ -1200,7 +1200,7 @@ main(int argc, char **argv)
}
}
else
- waldir = identify_target_directory(waldir, NULL, &WalSegSz);
+ waldir = identify_target_directory(waldir, NULL, &private.segsize);
/* we don't know what to print */
if (!XLogRecPtrIsValid(private.startptr))
@@ -1213,7 +1213,7 @@ main(int argc, char **argv)
/* we have everything we need, start reading */
xlogreader_state =
- XLogReaderAllocate(WalSegSz, waldir,
+ XLogReaderAllocate(private.segsize, waldir,
XL_ROUTINE(.page_read = WALDumpReadPage,
.segment_open = WALDumpOpenSegment,
.segment_close = WALDumpCloseSegment),
@@ -1234,7 +1234,7 @@ main(int argc, char **argv)
* a segment (e.g. we were used in file mode).
*/
if (first_record != private.startptr &&
- XLogSegmentOffset(private.startptr, WalSegSz) != 0)
+ XLogSegmentOffset(private.startptr, private.segsize) != 0)
pg_log_info(ngettext("first record is after %X/%08X, at %X/%08X, skipping over %u byte",
"first record is after %X/%08X, at %X/%08X, skipping over %u bytes",
(first_record - private.startptr)),
diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h
index 64a9109229e..013b051506f 100644
--- a/src/bin/pg_waldump/pg_waldump.h
+++ b/src/bin/pg_waldump/pg_waldump.h
@@ -17,6 +17,7 @@
typedef struct XLogDumpPrivate
{
TimeLineID timeline;
+ int segsize;
XLogRecPtr startptr;
XLogRecPtr endptr;
bool endptr_reached;
--
2.47.1
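Patch 0005 moves the segment size into XLogDumpPrivate because turning a WAL file name into a segment number depends on it: the same name maps to different segment numbers for 16 MB and 1 GB segments. The sketch below, which uses hypothetical demo_* helpers rather than PostgreSQL functions, mirrors the shape test of IsXLogFileName() and the arithmetic of XLogFromFileName(), and then applies the timeline and start/end segment filter discussed upthread. It assumes segsize is a valid WAL segment size (a power of two between 1 MB and 1 GB) and leaves out the extra segno > curSegNo condition from the quoted code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DEMO_WAL_FNAME_LEN 24	/* 8 hex digits each: TLI, log id, segment */

/*
 * Parse a WAL file name into its timeline and segment number.  Returns
 * false if the name is not 24 upper-case hex digits.
 */
static bool
demo_parse_wal_name(const char *fname, int segsize,
					uint32_t *tli, uint64_t *segno)
{
	unsigned int t,
				logid,
				seg;

	assert(segsize > 0);

	if (strlen(fname) != DEMO_WAL_FNAME_LEN ||
		strspn(fname, "0123456789ABCDEF") != DEMO_WAL_FNAME_LEN)
		return false;

	if (sscanf(fname, "%8X%8X%8X", &t, &logid, &seg) != 3)
		return false;

	*tli = t;
	/* Each 4 GiB "log id" holds 0x100000000 / segsize segments. */
	*segno = (uint64_t) logid * (UINT64_C(0x100000000) / (uint64_t) segsize)
		+ seg;
	return true;
}

/*
 * Keep a WAL file only if it is on the requested timeline and its segment
 * number lies within [start_segno, end_segno].
 */
static bool
demo_wal_file_wanted(const char *fname, int segsize, uint32_t timeline,
					 uint64_t start_segno, uint64_t end_segno)
{
	uint32_t	tli;
	uint64_t	segno;

	if (!demo_parse_wal_name(fname, segsize, &tli, &segno))
		return false;			/* not a WAL segment file name */

	return tli == timeline && segno >= start_segno && segno <= end_segno;
}
```

With 16 MB segments there are 256 segments per log id, so 000000010000000100000002 is segment 258; with 1 GB segments the same name is segment 6.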
[application/x-patch] v15-0006-pg_waldump-Add-support-for-archived-WAL-decoding.patch (41.8K, 7-v15-0006-pg_waldump-Add-support-for-archived-WAL-decoding.patch)
From c8ff1a06931bc3690e27c03197f825ce8ca29a27 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Tue, 10 Feb 2026 11:42:36 +0530
Subject: [PATCH v15 06/11] pg_waldump: Add support for archived WAL decoding.
pg_waldump can now accept the path to a tar archive containing WAL
files and decode them. This feature was added primarily for
pg_verifybackup, which previously disabled WAL parsing for
tar-formatted backups.
Note that this patch requires that the WAL files within the archive be
in sequential order; an error will be reported otherwise. The next
patch is planned to remove this restriction.
---
doc/src/sgml/ref/pg_waldump.sgml | 8 +-
src/bin/pg_waldump/Makefile | 7 +-
src/bin/pg_waldump/archive_waldump.c | 639 +++++++++++++++++++++++++++
src/bin/pg_waldump/meson.build | 4 +-
src/bin/pg_waldump/pg_waldump.c | 257 ++++++++---
src/bin/pg_waldump/pg_waldump.h | 43 ++
src/bin/pg_waldump/t/001_basic.pl | 105 ++++-
src/tools/pgindent/typedefs.list | 3 +
8 files changed, 999 insertions(+), 67 deletions(-)
create mode 100644 src/bin/pg_waldump/archive_waldump.c
diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index d1715ff5124..15fb8d13199 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -141,13 +141,17 @@ PostgreSQL documentation
<term><option>--path=<replaceable>path</replaceable></option></term>
<listitem>
<para>
- Specifies a directory to search for WAL segment files or a
- directory with a <literal>pg_wal</literal> subdirectory that
+ Specifies a tar archive or a directory to search for WAL segment files
+ or a directory with a <literal>pg_wal</literal> subdirectory that
contains such files. The default is to search in the current
directory, the <literal>pg_wal</literal> subdirectory of the
current directory, and the <literal>pg_wal</literal> subdirectory
of <envar>PGDATA</envar>.
</para>
+ <para>
+ If a tar archive is provided, its WAL segment files must be in
+ sequential order; otherwise, an error will be reported.
+ </para>
</listitem>
</varlistentry>
diff --git a/src/bin/pg_waldump/Makefile b/src/bin/pg_waldump/Makefile
index 4c1ee649501..aabb87566a2 100644
--- a/src/bin/pg_waldump/Makefile
+++ b/src/bin/pg_waldump/Makefile
@@ -3,6 +3,9 @@
PGFILEDESC = "pg_waldump - decode and display WAL"
PGAPPICON=win32
+# make these available to TAP test scripts
+export TAR
+
subdir = src/bin/pg_waldump
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
@@ -10,13 +13,15 @@ include $(top_builddir)/src/Makefile.global
OBJS = \
$(RMGRDESCOBJS) \
$(WIN32RES) \
+ archive_waldump.o \
compat.o \
pg_waldump.o \
rmgrdesc.o \
xlogreader.o \
xlogstats.o
-override CPPFLAGS := -DFRONTEND $(CPPFLAGS)
+override CPPFLAGS := -DFRONTEND -I$(libpq_srcdir) $(CPPFLAGS)
+LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils
RMGRDESCSOURCES = $(sort $(notdir $(wildcard $(top_srcdir)/src/backend/access/rmgrdesc/*desc*.c)))
RMGRDESCOBJS = $(patsubst %.c,%.o,$(RMGRDESCSOURCES))
diff --git a/src/bin/pg_waldump/archive_waldump.c b/src/bin/pg_waldump/archive_waldump.c
new file mode 100644
index 00000000000..4a95b47b4da
--- /dev/null
+++ b/src/bin/pg_waldump/archive_waldump.c
@@ -0,0 +1,639 @@
+/*-------------------------------------------------------------------------
+ *
+ * archive_waldump.c
+ * A generic facility for reading WAL data from tar archives via the archive
+ * streamer.
+ *
+ * Portions Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_waldump/archive_waldump.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <unistd.h>
+
+#include "access/xlog_internal.h"
+#include "common/hashfn.h"
+#include "common/logging.h"
+#include "fe_utils/simple_list.h"
+#include "pg_waldump.h"
+
+/*
+ * How many bytes should we try to read from a file at once?
+ */
+#define READ_CHUNK_SIZE (128 * 1024)
+
+/*
+ * Check if the start segment number is zero; this indicates a request to read
+ * any WAL file.
+ */
+#define READ_ANY_WAL(privateInfo) ((privateInfo)->start_segno == 0)
+
+/*
+ * Hash entry representing a WAL segment retrieved from the archive.
+ *
+ * While WAL segments are typically read sequentially, individual entries
+ * maintain their own buffers for the following reasons:
+ *
+ * 1. Boundary Handling: The archive streamer provides a continuous byte
+ * stream. A single streaming chunk may contain the end of one WAL segment
+ * and the start of the next. Separate buffers allow us to easily
+ * partition and track these bytes by their respective segments.
+ *
+ * 2. Out-of-Order Support: Dedicated buffers simplify logic if segments
+ * are ever archived or retrieved out of sequence.
+ *
+ * To minimize the memory footprint, entries and their associated buffers are
+ * freed immediately once consumed. Since pg_waldump does not request the same
+ * bytes twice, an entry is discarded once pg_waldump moves past its segment.
+ */
+typedef struct ArchivedWALFile
+{
+ uint32 status; /* hash status */
+ const char *fname; /* hash key: WAL segment name */
+
+ StringInfo buf; /* holds WAL bytes read from archive */
+
+ int read_len; /* total bytes of this segment read so far */
+} ArchivedWALFile;
+
+static uint32 hash_string_pointer(const char *s);
+#define SH_PREFIX ArchivedWAL
+#define SH_ELEMENT_TYPE ArchivedWALFile
+#define SH_KEY_TYPE const char *
+#define SH_KEY fname
+#define SH_HASH_KEY(tb, key) hash_string_pointer(key)
+#define SH_EQUAL(tb, a, b) (strcmp(a, b) == 0)
+#define SH_SCOPE static inline
+#define SH_RAW_ALLOCATOR pg_malloc0
+#define SH_DECLARE
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+typedef struct astreamer_waldump
+{
+ astreamer base;
+ XLogDumpPrivate *privateInfo;
+} astreamer_waldump;
+
+static ArchivedWALFile *get_archive_wal_entry(const char *fname,
+ XLogDumpPrivate *privateInfo,
+ int WalSegSz);
+static int read_archive_file(XLogDumpPrivate *privateInfo, Size count);
+
+static astreamer *astreamer_waldump_new(XLogDumpPrivate *privateInfo);
+static void astreamer_waldump_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_waldump_finalize(astreamer *streamer);
+static void astreamer_waldump_free(astreamer *streamer);
+
+static bool member_is_wal_file(astreamer_waldump *mystreamer,
+ astreamer_member *member,
+ char **fname);
+
+static const astreamer_ops astreamer_waldump_ops = {
+ .content = astreamer_waldump_content,
+ .finalize = astreamer_waldump_finalize,
+ .free = astreamer_waldump_free
+};
+
+/*
+ * Initializes the tar archive reader and the hash table for WAL entries,
+ * verifies that the archive contains a valid WAL segment, retrieves the WAL
+ * segment size from it, and sets up the segment range used for filtering.
+ */
+void
+init_archive_reader(XLogDumpPrivate *privateInfo, const char *waldir,
+ int *WalSegSz, pg_compress_algorithm compression)
+{
+ int fd;
+ astreamer *streamer;
+ ArchivedWALFile *entry = NULL;
+ XLogLongPageHeader longhdr;
+ XLogSegNo segno;
+ TimeLineID timeline;
+
+ /* Open tar archive and store its file descriptor */
+ fd = open_file_in_directory(waldir, privateInfo->archive_name);
+
+ if (fd < 0)
+ pg_fatal("could not open file \"%s\"", privateInfo->archive_name);
+
+ privateInfo->archive_fd = fd;
+
+ streamer = astreamer_waldump_new(privateInfo);
+
+ /* Before that we must parse the tar archive. */
+ streamer = astreamer_tar_parser_new(streamer);
+
+ /* Before that we must decompress, if archive is compressed. */
+ if (compression == PG_COMPRESSION_GZIP)
+ streamer = astreamer_gzip_decompressor_new(streamer);
+ else if (compression == PG_COMPRESSION_LZ4)
+ streamer = astreamer_lz4_decompressor_new(streamer);
+ else if (compression == PG_COMPRESSION_ZSTD)
+ streamer = astreamer_zstd_decompressor_new(streamer);
+
+ privateInfo->archive_streamer = streamer;
+
+ /*
+ * Hash table storing WAL entries read from the archive, with an arbitrary
+ * initial size.
+ */
+ privateInfo->archive_wal_htab = ArchivedWAL_create(8, NULL);
+
+ /*
+ * Verify that the archive contains valid WAL files and fetch the WAL
+ * segment size.
+ */
+ while (entry == NULL || entry->buf->len < XLOG_BLCKSZ)
+ {
+ if (read_archive_file(privateInfo, XLOG_BLCKSZ) == 0)
+ pg_fatal("could not find WAL in archive \"%s\"",
+ privateInfo->archive_name);
+
+ entry = privateInfo->cur_file;
+ }
+
+ /* Set WalSegSz if WAL data is successfully read */
+ longhdr = (XLogLongPageHeader) entry->buf->data;
+
+ if (!IsValidWalSegSize(longhdr->xlp_seg_size))
+ {
+ pg_log_error(ngettext("invalid WAL segment size in WAL file from archive \"%s\" (%d byte)",
+ "invalid WAL segment size in WAL file from archive \"%s\" (%d bytes)",
+ longhdr->xlp_seg_size),
+ privateInfo->archive_name, longhdr->xlp_seg_size);
+ pg_log_error_detail("The WAL segment size must be a power of two between 1 MB and 1 GB.");
+ exit(1);
+ }
+
+ *WalSegSz = longhdr->xlp_seg_size;
+
+ /*
+ * With the WAL segment size available, we can now initialize the
+ * dependent start and end segment numbers.
+ */
+ Assert(!XLogRecPtrIsInvalid(privateInfo->startptr));
+ XLByteToSeg(privateInfo->startptr, privateInfo->start_segno, *WalSegSz);
+
+ if (!XLogRecPtrIsInvalid(privateInfo->endptr))
+ XLByteToSeg(privateInfo->endptr, privateInfo->end_segno, *WalSegSz);
+
+ /*
+ * This WAL segment was fetched before the filtering parameters
+ * (start_segno and end_segno) were fully initialized. Perform the
+ * relevance check against the requested timeline and range now; if the
+ * segment falls outside them, remove it from the hash table. Subsequent
+ * segments will be filtered automatically by the archive streamer using
+ * the updated start_segno and end_segno values.
+ */
+ XLogFromFileName(entry->fname, &timeline, &segno, privateInfo->segsize);
+ if (privateInfo->timeline != timeline ||
+ privateInfo->start_segno > segno ||
+ privateInfo->end_segno < segno)
+ free_archive_wal_entry(entry->fname, privateInfo);
+}
+
+/*
+ * Release the archive streamer chain and close the archive file.
+ */
+void
+free_archive_reader(XLogDumpPrivate *privateInfo)
+{
+ /*
+ * NB: Normally, astreamer_finalize() is called before astreamer_free() to
+ * flush any remaining buffered data or to ensure the end of the tar
+ * archive is reached. However, when decoding a WAL file, once we hit the
+ * end LSN, any remaining WAL data in the buffer or the tar archive's
+ * unreached end can be safely ignored.
+ */
+ astreamer_free(privateInfo->archive_streamer);
+
+ /* Close the file. */
+ if (close(privateInfo->archive_fd) != 0)
+ pg_log_error("could not close file \"%s\": %m",
+ privateInfo->archive_name);
+}
+
+/*
+ * Copies 'count' bytes of WAL starting at targetPagePtr into readBuff. If they
+ * are not yet buffered, fetches more data from the tar archive via the
+ * archive streamer.
+ */
+int
+read_archive_wal_page(XLogDumpPrivate *privateInfo, XLogRecPtr targetPagePtr,
+ Size count, char *readBuff, int WalSegSz)
+{
+ char *p = readBuff;
+ Size nbytes = count;
+ XLogRecPtr recptr = targetPagePtr;
+ XLogSegNo segno;
+ char fname[MAXFNAMELEN];
+ ArchivedWALFile *entry;
+
+ /* Identify the segment and locate its entry in the archive hash */
+ XLByteToSeg(targetPagePtr, segno, WalSegSz);
+ XLogFileName(fname, privateInfo->timeline, segno, WalSegSz);
+ entry = get_archive_wal_entry(fname, privateInfo, WalSegSz);
+
+ while (nbytes > 0)
+ {
+ char *buf = entry->buf->data;
+ int bufLen = entry->buf->len;
+ XLogRecPtr endPtr;
+ XLogRecPtr startPtr;
+
+ /* Calculate the LSN range currently residing in the buffer */
+ XLogSegNoOffsetToRecPtr(segno, entry->read_len, WalSegSz, endPtr);
+ startPtr = endPtr - bufLen;
+
+ /*
+ * Copy the requested WAL bytes if they are present in the buffer.
+ */
+ if (bufLen > 0 && startPtr <= recptr && recptr < endPtr)
+ {
+ int copyBytes;
+ int offset = recptr - startPtr;
+
+ /*
+ * Given startPtr <= recptr < endPtr and a total buffer size
+ * 'bufLen', the offset (recptr - startPtr) will always be less
+ * than 'bufLen'.
+ */
+ Assert(offset < bufLen);
+
+ copyBytes = Min(nbytes, bufLen - offset);
+ memcpy(p, buf + offset, copyBytes);
+
+ /* Update state for read */
+ recptr += copyBytes;
+ nbytes -= copyBytes;
+ p += copyBytes;
+ }
+ else
+ {
+ /*
+ * Before starting the actual decoding loop, pg_waldump tries to
+ * locate the first valid record from the user-specified start
+ * position, which might not be the start of a WAL record and
+ * could fall in the middle of a record that spans multiple pages.
+ * Consequently, the valid start position the decoder is looking
+ * for could be far away from that initial position.
+ *
+ * This may involve reading across multiple pages, and this
+ * pre-reading fetches data in multiple rounds from the archive
+ * streamer; normally, we would throw away existing buffer
+ * contents to fetch the next set of data, but that existing data
+ * might be needed once the main loop starts. Because previously
+ * read data cannot be re-read by the archive streamer, we delay
+ * resetting the buffer until the main decoding loop is entered.
+ *
+ * Once pg_waldump has entered the main loop, it may re-read the
+ * currently active page, but never an older one; therefore, any
+ * fully consumed WAL data preceding the current page can then be
+ * safely discarded.
+ */
+ if (privateInfo->decoding_started)
+ {
+ resetStringInfo(entry->buf);
+
+ /*
+ * Push the part of the current page copied so far back into
+ * the buffer, so that the full page remains available for
+ * re-reading if requested.
+ */
+ if (p > readBuff)
+ {
+ Assert((count - nbytes) > 0);
+ appendBinaryStringInfo(entry->buf, readBuff, count - nbytes);
+ }
+ }
+
+ /*
+ * Now fetch more data. Raise an error if this entry is not the
+ * segment currently being read by the archive streamer, or if
+ * the end of the archive file has been reached.
+ */
+ if (privateInfo->cur_file != entry ||
+ read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
+ pg_fatal("could not read file \"%s\" from archive \"%s\": read %lld of %lld",
+ fname, privateInfo->archive_name,
+ (long long int) (count - nbytes),
+ (long long int) count);
+ }
+ }
+
+ /*
+ * We should have either successfully read all the requested bytes or
+ * reported a failure before this point.
+ */
+ Assert(nbytes == 0);
+
+ /*
+ * NB: We return the requested byte count unchanged. Since we either read
+ * the whole WAL page or raise an error, a boolean would suffice, but the
+ * caller expects the byte count. The routine that reads WAL pages from a
+ * physical WAL file follows the same convention.
+ */
+ return count;
+}
+
+/*
+ * Frees the buffer of a WAL entry that is no longer needed and removes the
+ * entry from the hash table, preventing the accumulation of irrelevant WAL
+ * data. If the entry is the one the archive streamer is currently filling,
+ * cur_file in privateInfo is also set to NULL so that the streamer skips
+ * unnecessary copy operations.
+ */
+void
+free_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo)
+{
+ ArchivedWALFile *entry;
+
+ entry = ArchivedWAL_lookup(privateInfo->archive_wal_htab, fname);
+
+ if (entry == NULL)
+ return;
+
+ /* Destroy the buffer */
+ destroyStringInfo(entry->buf);
+ entry->buf = NULL;
+
+ /* Set cur_file to NULL if it matches the entry being ignored */
+ if (privateInfo->cur_file == entry)
+ privateInfo->cur_file = NULL;
+
+ ArchivedWAL_delete_item(privateInfo->archive_wal_htab, entry);
+}
+
+/*
+ * Returns the archived WAL entry from the hash table if it exists. Otherwise,
+ * it invokes the routine to read the archived file, which then populates the
+ * entry in the hash table if that WAL exists in the archive.
+ */
+static ArchivedWALFile *
+get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo,
+ int WalSegSz)
+{
+ ArchivedWALFile *entry = NULL;
+
+ /* Search hash table */
+ entry = ArchivedWAL_lookup(privateInfo->archive_wal_htab, fname);
+
+ if (entry != NULL)
+ return entry;
+
+ /*
+ * The requested WAL entry has not been read from the archive yet; invoke
+ * the archive streamer to read it.
+ */
+ while (1)
+ {
+ /* Fetch more data */
+ if (read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
+ break; /* archive file ended */
+
+ /*
+ * The archive streamer is reading a non-WAL file or an irrelevant
+ * WAL file.
+ */
+ if (privateInfo->cur_file == NULL)
+ continue;
+
+ entry = privateInfo->cur_file;
+
+ /* Found the required entry */
+ if (strcmp(fname, entry->fname) == 0)
+ return entry;
+
+ /* WAL segments must be archived in order */
+ pg_log_error("WAL files are not archived in sequential order");
+ pg_log_error_detail("Expecting segment \"%s\" but found \"%s\".",
+ fname, entry->fname);
+ exit(1);
+ }
+
+ /* Requested WAL segment not found */
+ pg_fatal("could not find WAL \"%s\" in archive \"%s\"",
+ fname, privateInfo->archive_name);
+}
+
+/*
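+ * Reads up to 'count' bytes from the archive file and passes them to the
+ * archive streamer for decompression (if required) and tar parsing. Returns
+ * the number of bytes read, or 0 at end of file.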
+ */
+static int
+read_archive_file(XLogDumpPrivate *privateInfo, Size count)
+{
+ int rc;
+ char *buffer;
+
+ buffer = pg_malloc(count * sizeof(uint8));
+
+ rc = read(privateInfo->archive_fd, buffer, count);
+ if (rc < 0)
+ pg_fatal("could not read file \"%s\": %m",
+ privateInfo->archive_name);
+
+ /*
+ * Decompress (if required), and then parse the previously read contents
+ * of the tar file.
+ */
+ if (rc > 0)
+ astreamer_content(privateInfo->archive_streamer, NULL,
+ buffer, rc, ASTREAMER_UNKNOWN);
+ pg_free(buffer);
+
+ return rc;
+}
+
+/*
+ * Create an astreamer that can read WAL from a tar file.
+ */
+static astreamer *
+astreamer_waldump_new(XLogDumpPrivate *privateInfo)
+{
+ astreamer_waldump *streamer;
+
+ streamer = palloc0_object(astreamer_waldump);
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_waldump_ops;
+
+ streamer->privateInfo = privateInfo;
+
+ return &streamer->base;
+}
+
+/*
+ * Main entry point of the archive streamer for reading WAL data from a tar
+ * file. If a member is identified as a valid WAL file, a hash entry is created
+ * for it, and its contents are copied into that entry's buffer, making them
+ * accessible to the decoding routine.
+ */
+static void
+astreamer_waldump_content(astreamer *streamer, astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
+{
+ astreamer_waldump *mystreamer = (astreamer_waldump *) streamer;
+ XLogDumpPrivate *privateInfo = mystreamer->privateInfo;
+
+ Assert(context != ASTREAMER_UNKNOWN);
+
+ switch (context)
+ {
+ case ASTREAMER_MEMBER_HEADER:
+ {
+ char *fname = NULL;
+ ArchivedWALFile *entry;
+ bool found;
+
+ pg_log_debug("reading \"%s\"", member->pathname);
+
+ if (!member_is_wal_file(mystreamer, member, &fname))
+ break;
+
+ /*
+ * Further checks are skipped if any WAL file can be read.
+ * This typically occurs during initial verification.
+ */
+ if (!READ_ANY_WAL(privateInfo))
+ {
+ XLogSegNo segno;
+ TimeLineID timeline;
+
+ /*
+ * Skip the segment if the timeline does not match or if it
+ * falls outside the caller-specified range.
+ */
+ XLogFromFileName(fname, &timeline, &segno, privateInfo->segsize);
+ if (privateInfo->timeline != timeline ||
+ privateInfo->start_segno > segno ||
+ privateInfo->end_segno < segno)
+ {
+ pfree(fname);
+ break;
+ }
+ }
+
+ entry = ArchivedWAL_insert(privateInfo->archive_wal_htab,
+ fname, &found);
+
+ /*
+ * Shouldn't happen, but if it does, simply ignore the
+ * duplicate WAL file.
+ */
+ if (found)
+ {
+ pg_log_warning("ignoring duplicate WAL \"%s\" found in archive \"%s\"",
+ member->pathname, privateInfo->archive_name);
+ pfree(fname);
+ break;
+ }
+
+ entry->buf = makeStringInfo();
+ entry->read_len = 0;
+ privateInfo->cur_file = entry;
+ }
+ break;
+
+ case ASTREAMER_MEMBER_CONTENTS:
+ if (privateInfo->cur_file)
+ {
+ appendBinaryStringInfo(privateInfo->cur_file->buf, data, len);
+ privateInfo->cur_file->read_len += len;
+ }
+ break;
+
+ case ASTREAMER_MEMBER_TRAILER:
+ privateInfo->cur_file = NULL;
+ break;
+
+ case ASTREAMER_ARCHIVE_TRAILER:
+ break;
+
+ default:
+ /* Shouldn't happen. */
+ pg_fatal("unexpected state while parsing tar file");
+ }
+}
+
+/*
+ * End-of-stream processing for an astreamer_waldump stream.
+ */
+static void
+astreamer_waldump_finalize(astreamer *streamer)
+{
+ Assert(streamer->bbs_next == NULL);
+}
+
+/*
+ * Free memory associated with an astreamer_waldump stream.
+ */
+static void
+astreamer_waldump_free(astreamer *streamer)
+{
+ Assert(streamer->bbs_next == NULL);
+ pfree(streamer);
+}
+
+/*
+ * Returns true if the archive member name matches the WAL naming format. If
+ * successful, it also outputs the WAL segment name.
+ */
+static bool
+member_is_wal_file(astreamer_waldump *mystreamer, astreamer_member *member,
+ char **fname)
+{
+ int pathlen;
+ char pathname[MAXPGPATH];
+ char *filename;
+
+ /* We are only interested in normal files. */
+ if (member->is_directory || member->is_link)
+ return false;
+
+ if (strlen(member->pathname) < XLOG_FNAME_LEN)
+ return false;
+
+ /*
+ * For a correct comparison, we must remove any '.' or '..' components
+ * from the member pathname. Similar to member_verify_header(), we prepend
+ * './' to the path so that canonicalize_path() can properly resolve and
+ * strip these references from the tar member name.
+ */
+ snprintf(pathname, MAXPGPATH, "./%s", member->pathname);
+ canonicalize_path(pathname);
+ pathlen = strlen(pathname);
+
+ /* WAL files from the top-level or pg_wal directory will be decoded */
+ if (pathlen > XLOG_FNAME_LEN &&
+ strncmp(pathname, XLOGDIR, strlen(XLOGDIR)) != 0)
+ return false;
+
+ /* The WAL file name may be preceded by a directory path */
+ filename = pathname + (pathlen - XLOG_FNAME_LEN);
+ if (!IsXLogFileName(filename))
+ return false;
+
+ *fname = pnstrdup(filename, XLOG_FNAME_LEN);
+
+ return true;
+}
+
+/*
+ * Helper function for the archived WAL hash table.
+ */
+static uint32
+hash_string_pointer(const char *s)
+{
+ unsigned char *ss = (unsigned char *) s;
+
+ return hash_bytes(ss, strlen(s));
+}
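The buffered-window arithmetic in read_archive_wal_page() can be illustrated outside PostgreSQL. The sketch below is a simplified, standalone model and not the patch's code: `lsn_t`, `seg_buffer` and `copy_from_window()` are hypothetical names, and where the patch fetches more data (or raises an error) when the requested LSN is not buffered, the sketch simply returns 0. It assumes, as the patch does, that the buffer holds the most recently streamed `len` bytes of segment `segno`, so it covers LSNs from `segno * segsize + read_len - len` up to, but not including, `segno * segsize + read_len`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t lsn_t;         /* hypothetical stand-in for XLogRecPtr */

/* Hypothetical per-segment buffer, loosely modelled on ArchivedWALFile. */
typedef struct
{
    const char *data;           /* most recently streamed bytes */
    size_t      len;            /* number of bytes held in 'data' */
    size_t      read_len;       /* total bytes of the segment streamed so far */
} seg_buffer;

/*
 * Copy the part of the request [*recptr, *recptr + *nbytes) that lies in the
 * buffer, advancing *recptr, *nbytes and *dst. Returns the number of bytes
 * copied, or 0 if *recptr is not currently buffered.
 */
static size_t
copy_from_window(const seg_buffer *b, uint64_t segno, uint64_t segsize,
                 lsn_t *recptr, size_t *nbytes, char **dst)
{
    /* End of the buffered data, computed like XLogSegNoOffsetToRecPtr() */
    lsn_t       endPtr = segno * segsize + b->read_len;
    lsn_t       startPtr = endPtr - b->len;
    size_t      offset;
    size_t      n;

    if (b->len == 0 || *recptr < startPtr || *recptr >= endPtr)
        return 0;

    /* startPtr <= *recptr < endPtr, so the offset is inside the buffer */
    offset = (size_t) (*recptr - startPtr);
    assert(offset < b->len);

    n = *nbytes < b->len - offset ? *nbytes : b->len - offset;
    memcpy(*dst, b->data + offset, n);

    *recptr += n;
    *nbytes -= n;
    *dst += n;
    return n;
}
```

For example, with a toy 16-byte segment size, segment 2 starts at LSN 32. If 10 bytes have been streamed and the last 6 are buffered, the window is LSNs 36 to 41, so a 10-byte request at LSN 38 copies 4 bytes and leaves 6 bytes that must come from the next chunk.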
diff --git a/src/bin/pg_waldump/meson.build b/src/bin/pg_waldump/meson.build
index 633a9874bb5..5296f21b82c 100644
--- a/src/bin/pg_waldump/meson.build
+++ b/src/bin/pg_waldump/meson.build
@@ -1,6 +1,7 @@
# Copyright (c) 2022-2026, PostgreSQL Global Development Group
pg_waldump_sources = files(
+ 'archive_waldump.c',
'compat.c',
'pg_waldump.c',
'rmgrdesc.c',
@@ -18,7 +19,7 @@ endif
pg_waldump = executable('pg_waldump',
pg_waldump_sources,
- dependencies: [frontend_code, lz4, zstd],
+ dependencies: [frontend_code, libpq, lz4, zstd],
c_args: ['-DFRONTEND'], # needed for xlogreader et al
kwargs: default_bin_args,
)
@@ -29,6 +30,7 @@ tests += {
'sd': meson.current_source_dir(),
'bd': meson.current_build_dir(),
'tap': {
+ 'env': {'TAR': tar.found() ? tar.full_path() : ''},
'tests': [
't/001_basic.pl',
't/002_save_fullpage.pl',
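The member-name test in member_is_wal_file() accepts a tar member only if its last 24 characters form a WAL segment name and, when the path is longer than that, it starts with pg_wal. A simplified standalone sketch follows. It assumes the path has already been canonicalized (the patch does this with canonicalize_path(), which the sketch omits), and `is_wal_file_name()`, `wal_member_name()`, `WAL_FNAME_LEN` and `WAL_DIR` are hypothetical names standing in for IsXLogFileName(), XLOG_FNAME_LEN and XLOGDIR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define WAL_FNAME_LEN 24        /* hypothetical stand-in for XLOG_FNAME_LEN */
#define WAL_DIR "pg_wal"        /* hypothetical stand-in for XLOGDIR */

/* Same test as IsXLogFileName(): exactly 24 upper-case hex digits */
static bool
is_wal_file_name(const char *s)
{
    return strlen(s) == WAL_FNAME_LEN &&
        strspn(s, "0123456789ABCDEF") == WAL_FNAME_LEN;
}

/*
 * Return a pointer to the segment name inside 'path', or NULL if the member
 * should be ignored. 'path' is assumed to be canonicalized already.
 */
static const char *
wal_member_name(const char *path)
{
    size_t      pathlen = strlen(path);
    const char *name;

    if (pathlen < WAL_FNAME_LEN)
        return NULL;

    /* Anything longer than a bare segment name must start with pg_wal */
    if (pathlen > WAL_FNAME_LEN &&
        strncmp(path, WAL_DIR, strlen(WAL_DIR)) != 0)
        return NULL;

    /* The candidate segment name is the trailing 24 characters */
    name = path + (pathlen - WAL_FNAME_LEN);
    return is_wal_file_name(name) ? name : NULL;
}
```

Like the patch, the sketch only checks the pg_wal prefix, so names such as `000000010000000000000001.partial` or files under `pg_wal/archive_status` are rejected by the hex-digit test rather than by the directory test.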
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 5d31b15dbd8..a18c56a7322 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -176,7 +176,7 @@ split_path(const char *path, char **dir, char **fname)
*
* return a read only fd
*/
-static int
+int
open_file_in_directory(const char *directory, const char *fname)
{
int fd = -1;
@@ -440,6 +440,80 @@ WALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
return count;
}
+/*
+ * pg_waldump's XLogReaderRoutine->segment_open callback to support dumping WAL
+ * files from tar archives.
+ */
+static void
+TarWALDumpOpenSegment(XLogReaderState *state, XLogSegNo nextSegNo,
+ TimeLineID *tli_p)
+{
+ /* No action needed */
+}
+
+/*
+ * pg_waldump's XLogReaderRoutine->segment_close callback.
+ */
+static void
+TarWALDumpCloseSegment(XLogReaderState *state)
+{
+ /* No action needed */
+}
+
+/*
+ * pg_waldump's XLogReaderRoutine->page_read callback to support dumping WAL
+ * files from tar archives.
+ */
+static int
+TarWALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
+ XLogRecPtr targetPtr, char *readBuff)
+{
+ XLogDumpPrivate *private = state->private_data;
+ int count = required_read_len(private, targetPagePtr, reqLen);
+ int WalSegSz = state->segcxt.ws_segsize;
+ XLogSegNo curSegNo;
+
+ /* Bail out if the count to be read is not valid */
+ if (count < 0)
+ return -1;
+
+ /*
+ * If the target page is in a different segment, free the buffer space
+ * occupied by the previous segment's data. Since pg_waldump never requests
+ * the same WAL bytes twice, moving to a new segment implies that the
+ * previous segment and its buffered data will not be needed again.
+ */
+ curSegNo = state->seg.ws_segno;
+ if (!XLByteInSeg(targetPagePtr, curSegNo, WalSegSz))
+ {
+ char fname[MAXFNAMELEN];
+ XLogSegNo nextSegNo;
+
+ /*
+ * Calculate the next WAL segment to be decoded from the given page
+ * pointer.
+ */
+ XLByteToSeg(targetPagePtr, nextSegNo, WalSegSz);
+ state->seg.ws_tli = private->timeline;
+ state->seg.ws_segno = nextSegNo;
+
+ /*
+ * If in pre-reading mode (prior to actual decoding), do not delete any
+ * entries that might be requested again once the decoding loop starts.
+ * For more details, see the comments in read_archive_wal_page().
+ */
+ if (private->decoding_started && curSegNo < nextSegNo)
+ {
+ XLogFileName(fname, state->seg.ws_tli, curSegNo, WalSegSz);
+ free_archive_wal_entry(fname, private);
+ }
+ }
+
+ /* Read the WAL page from the archive streamer */
+ return read_archive_wal_page(private, targetPagePtr, count, readBuff,
+ WalSegSz);
+}
+
/*
* Boolean to return whether the given WAL record matches a specific relation
* and optionally block.
@@ -777,8 +851,8 @@ usage(void)
printf(_(" -F, --fork=FORK only show records that modify blocks in fork FORK;\n"
" valid names are main, fsm, vm, init\n"));
printf(_(" -n, --limit=N number of records to display\n"));
- printf(_(" -p, --path=PATH directory in which to find WAL segment files or a\n"
- " directory with a ./pg_wal that contains such files\n"
+ printf(_(" -p, --path=PATH tar archive or a directory in which to find WAL segment files or\n"
+ " a directory with a ./pg_wal that contains such files\n"
" (default: current directory, ./pg_wal, $PGDATA/pg_wal)\n"));
printf(_(" -q, --quiet do not print any output, except for errors\n"));
printf(_(" -r, --rmgr=RMGR only show records generated by resource manager RMGR;\n"
@@ -810,7 +884,9 @@ main(int argc, char **argv)
XLogRecord *record;
XLogRecPtr first_record;
char *waldir = NULL;
+ char *walpath = NULL;
char *errormsg;
+ pg_compress_algorithm compression;
static struct option long_options[] = {
{"bkp-details", no_argument, NULL, 'b'},
@@ -868,6 +944,10 @@ main(int argc, char **argv)
private.startptr = InvalidXLogRecPtr;
private.endptr = InvalidXLogRecPtr;
private.endptr_reached = false;
+ private.decoding_started = false;
+ private.archive_name = NULL;
+ private.start_segno = 0;
+ private.end_segno = UINT64_MAX;
config.quiet = false;
config.bkp_details = false;
@@ -943,7 +1023,7 @@ main(int argc, char **argv)
}
break;
case 'p':
- waldir = pg_strdup(optarg);
+ walpath = pg_strdup(optarg);
break;
case 'q':
config.quiet = true;
@@ -1107,12 +1187,21 @@ main(int argc, char **argv)
goto bad_argument;
}
- if (waldir != NULL)
+ if (walpath != NULL)
{
+ /* validate path points to tar archive */
+ if (parse_tar_compress_algorithm(walpath, &compression))
+ {
+ char *fname = NULL;
+
+ split_path(walpath, &waldir, &fname);
+
+ private.archive_name = fname;
+ }
/* validate path points to directory */
- if (!verify_directory(waldir))
+ else if (!verify_directory(walpath))
{
- pg_log_error("could not open directory \"%s\": %m", waldir);
+ pg_log_error("could not open directory \"%s\": %m", walpath);
goto bad_argument;
}
}
@@ -1128,6 +1217,17 @@ main(int argc, char **argv)
int fd;
XLogSegNo segno;
+ /*
+ * If a tar archive is passed using the --path option, positional
+ * STARTSEG/ENDSEG arguments are not accepted.
+ */
+ if (private.archive_name)
+ {
+ pg_log_error("unnecessary command-line arguments specified with tar archive (first is \"%s\")",
+ argv[optind]);
+ goto bad_argument;
+ }
+
split_path(argv[optind], &directory, &fname);
if (waldir == NULL && directory != NULL)
@@ -1138,69 +1238,76 @@ main(int argc, char **argv)
pg_fatal("could not open directory \"%s\": %m", waldir);
}
- waldir = identify_target_directory(waldir, fname, &private.segsize);
- fd = open_file_in_directory(waldir, fname);
- if (fd < 0)
- pg_fatal("could not open file \"%s\"", fname);
- close(fd);
-
- /* parse position from file */
- XLogFromFileName(fname, &private.timeline, &segno, private.segsize);
-
- if (!XLogRecPtrIsValid(private.startptr))
- XLogSegNoOffsetToRecPtr(segno, 0, private.segsize, private.startptr);
- else if (!XLByteInSeg(private.startptr, segno, private.segsize))
+ if (fname != NULL && parse_tar_compress_algorithm(fname, &compression))
{
- pg_log_error("start WAL location %X/%08X is not inside file \"%s\"",
- LSN_FORMAT_ARGS(private.startptr),
- fname);
- goto bad_argument;
+ private.archive_name = fname;
}
-
- /* no second file specified, set end position */
- if (!(optind + 1 < argc) && !XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(segno + 1, 0, private.segsize, private.endptr);
-
- /* parse ENDSEG if passed */
- if (optind + 1 < argc)
+ else
{
- XLogSegNo endsegno;
-
- /* ignore directory, already have that */
- split_path(argv[optind + 1], &directory, &fname);
-
+ waldir = identify_target_directory(waldir, fname, &private.segsize);
fd = open_file_in_directory(waldir, fname);
if (fd < 0)
pg_fatal("could not open file \"%s\"", fname);
close(fd);
/* parse position from file */
- XLogFromFileName(fname, &private.timeline, &endsegno, private.segsize);
+ XLogFromFileName(fname, &private.timeline, &segno, private.segsize);
- if (endsegno < segno)
- pg_fatal("ENDSEG %s is before STARTSEG %s",
- argv[optind + 1], argv[optind]);
+ if (!XLogRecPtrIsValid(private.startptr))
+ XLogSegNoOffsetToRecPtr(segno, 0, private.segsize, private.startptr);
+ else if (!XLByteInSeg(private.startptr, segno, private.segsize))
+ {
+ pg_log_error("start WAL location %X/%08X is not inside file \"%s\"",
+ LSN_FORMAT_ARGS(private.startptr),
+ fname);
+ goto bad_argument;
+ }
- if (!XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(endsegno + 1, 0, private.segsize,
- private.endptr);
+ /* no second file specified, set end position */
+ if (!(optind + 1 < argc) && !XLogRecPtrIsValid(private.endptr))
+ XLogSegNoOffsetToRecPtr(segno + 1, 0, private.segsize, private.endptr);
- /* set segno to endsegno for check of --end */
- segno = endsegno;
- }
+ /* parse ENDSEG if passed */
+ if (optind + 1 < argc)
+ {
+ XLogSegNo endsegno;
+ /* ignore directory, already have that */
+ split_path(argv[optind + 1], &directory, &fname);
- if (!XLByteInSeg(private.endptr, segno, private.segsize) &&
- private.endptr != (segno + 1) * private.segsize)
- {
- pg_log_error("end WAL location %X/%08X is not inside file \"%s\"",
- LSN_FORMAT_ARGS(private.endptr),
- argv[argc - 1]);
- goto bad_argument;
+ fd = open_file_in_directory(waldir, fname);
+ if (fd < 0)
+ pg_fatal("could not open file \"%s\"", fname);
+ close(fd);
+
+ /* parse position from file */
+ XLogFromFileName(fname, &private.timeline, &endsegno, private.segsize);
+
+ if (endsegno < segno)
+ pg_fatal("ENDSEG %s is before STARTSEG %s",
+ argv[optind + 1], argv[optind]);
+
+ if (!XLogRecPtrIsValid(private.endptr))
+ XLogSegNoOffsetToRecPtr(endsegno + 1, 0, private.segsize,
+ private.endptr);
+
+ /* set segno to endsegno for check of --end */
+ segno = endsegno;
+ }
+
+
+ if (!XLByteInSeg(private.endptr, segno, private.segsize) &&
+ private.endptr != (segno + 1) * private.segsize)
+ {
+ pg_log_error("end WAL location %X/%08X is not inside file \"%s\"",
+ LSN_FORMAT_ARGS(private.endptr),
+ argv[argc - 1]);
+ goto bad_argument;
+ }
}
}
- else
- waldir = identify_target_directory(waldir, NULL, &private.segsize);
+ else if (!private.archive_name)
+ waldir = identify_target_directory(walpath, NULL, &private.segsize);
/* we don't know what to print */
if (!XLogRecPtrIsValid(private.startptr))
@@ -1212,12 +1319,36 @@ main(int argc, char **argv)
/* done with argument parsing, do the actual work */
/* we have everything we need, start reading */
- xlogreader_state =
- XLogReaderAllocate(private.segsize, waldir,
- XL_ROUTINE(.page_read = WALDumpReadPage,
- .segment_open = WALDumpOpenSegment,
- .segment_close = WALDumpCloseSegment),
- &private);
+ if (private.archive_name)
+ {
+ /*
+ * A NULL WAL directory indicates that the archive file is located in
+ * the current working directory.
+ */
+ if (waldir == NULL)
+ waldir = pg_strdup(".");
+
+ /* Set up for reading tar file */
+ init_archive_reader(&private, waldir, &private.segsize, compression);
+
+ /* Routine to decode WAL files in tar archive */
+ xlogreader_state =
+ XLogReaderAllocate(private.segsize, waldir,
+ XL_ROUTINE(.page_read = TarWALDumpReadPage,
+ .segment_open = TarWALDumpOpenSegment,
+ .segment_close = TarWALDumpCloseSegment),
+ &private);
+ }
+ else
+ {
+ xlogreader_state =
+ XLogReaderAllocate(private.segsize, waldir,
+ XL_ROUTINE(.page_read = WALDumpReadPage,
+ .segment_open = WALDumpOpenSegment,
+ .segment_close = WALDumpCloseSegment),
+ &private);
+ }
+
if (!xlogreader_state)
pg_fatal("out of memory while allocating a WAL reading processor");
@@ -1245,6 +1376,9 @@ main(int argc, char **argv)
if (config.stats == true && !config.quiet)
stats.startptr = first_record;
+ /* Flag indicating that the decoding loop has been entered */
+ private.decoding_started = true;
+
for (;;)
{
if (time_to_stop)
@@ -1326,6 +1460,9 @@ main(int argc, char **argv)
XLogReaderFree(xlogreader_state);
+ if (private.archive_name)
+ free_archive_reader(&private);
+
return EXIT_SUCCESS;
bad_argument:
diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h
index 013b051506f..54d54a8a718 100644
--- a/src/bin/pg_waldump/pg_waldump.h
+++ b/src/bin/pg_waldump/pg_waldump.h
@@ -12,6 +12,11 @@
#define PG_WALDUMP_H
#include "access/xlogdefs.h"
+#include "fe_utils/astreamer.h"
+
+/* Forward declarations */
+struct ArchivedWALFile;
+struct ArchivedWAL_hash;
/* Contains the necessary information to drive WAL decoding */
typedef struct XLogDumpPrivate
@@ -21,6 +26,44 @@ typedef struct XLogDumpPrivate
XLogRecPtr startptr;
XLogRecPtr endptr;
bool endptr_reached;
+ bool decoding_started;
+
+ /* Fields required to read WAL from archive */
+ char *archive_name; /* Tar archive name */
+ int archive_fd; /* File descriptor for the open tar file */
+
+ astreamer *archive_streamer;
+
+ /* What the archive streamer is currently reading */
+ struct ArchivedWALFile *cur_file;
+
+ /*
+ * Hash table of all WAL files that the archive stream has read, including
+ * the one currently in progress.
+ */
+ struct ArchivedWAL_hash *archive_wal_htab;
+
+ /*
+ * Although these values can be easily derived from startptr and endptr,
+ * doing so repeatedly for each archived member would be inefficient, as
+ * it would involve recalculating and filtering out irrelevant WAL
+ * segments.
+ */
+ XLogSegNo start_segno;
+ XLogSegNo end_segno;
} XLogDumpPrivate;
+extern int open_file_in_directory(const char *directory, const char *fname);
+
+extern void init_archive_reader(XLogDumpPrivate *privateInfo,
+ const char *waldir, int *WalSegSz,
+ pg_compress_algorithm compression);
+extern void free_archive_reader(XLogDumpPrivate *privateInfo);
+extern int read_archive_wal_page(XLogDumpPrivate *privateInfo,
+ XLogRecPtr targetPagePtr,
+ Size count, char *readBuff,
+ int WalSegSz);
+extern void free_archive_wal_entry(const char *fname,
+ XLogDumpPrivate *privateInfo);
+
#endif /* PG_WALDUMP_H */
diff --git a/src/bin/pg_waldump/t/001_basic.pl b/src/bin/pg_waldump/t/001_basic.pl
index f12ba52cbfc..6f8ce319841 100644
--- a/src/bin/pg_waldump/t/001_basic.pl
+++ b/src/bin/pg_waldump/t/001_basic.pl
@@ -3,10 +3,13 @@
use strict;
use warnings FATAL => 'all';
+use Cwd;
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Test::More;
+my $tar = $ENV{TAR};
+
program_help_ok('pg_waldump');
program_version_ok('pg_waldump');
program_options_handling_ok('pg_waldump');
@@ -162,6 +165,42 @@ CREATE TABLESPACE ts1 LOCATION '$tblspc_path';
DROP TABLESPACE ts1;
});
+# Test: Decode a continuation record (contrecord) that spans multiple WAL
+# segments.
+#
+# First consume nearly all remaining room in the current WAL segment, leaving
+# only enough space for the start of a largish record.
+$node->safe_psql(
+ 'postgres', q{
+DO $$
+DECLARE
+ wal_segsize int := setting::int FROM pg_settings WHERE name = 'wal_segment_size';
+ remain int;
+ iters int := 0;
+BEGIN
+ LOOP
+ INSERT into t1(b)
+ select repeat(encode(sha256(g::text::bytea), 'hex'), (random() * 15 + 1)::int)
+ from generate_series(1, 10) g;
+
+ remain := wal_segsize - (pg_current_wal_insert_lsn() - '0/0') % wal_segsize;
+ IF remain < 2 * setting::int from pg_settings where name = 'block_size' THEN
+ RAISE log 'exiting after % iterations, % bytes to end of WAL segment', iters, remain;
+ EXIT;
+ END IF;
+ iters := iters + 1;
+ END LOOP;
+END
+$$;
+});
+
+my $contrecord_lsn = $node->safe_psql('postgres',
+ 'SELECT pg_current_wal_insert_lsn()');
+# Generate a record that continues into the next segment (contrecord)
+$node->safe_psql('postgres',
+ qq{SELECT pg_logical_emit_message(true, 'test 026', repeat('xyzxz', 123456))}
+);
+
my ($end_lsn, $end_walfile) = split /\|/,
$node->safe_psql('postgres',
q{SELECT pg_current_wal_insert_lsn(), pg_walfile_name(pg_current_wal_insert_lsn())}
@@ -259,11 +298,50 @@ sub test_pg_waldump
return @lines;
}
-my @lines;
+# Create a tar archive of the files in $directory, in sorted order
+sub generate_archive
+{
+ my ($archive, $directory, $compression_flags) = @_;
+
+ my @files;
+ opendir my $dh, $directory or die "opendir: $!";
+ while (my $entry = readdir $dh) {
+ # Skip '.' and '..'
+ next if $entry eq '.' || $entry eq '..';
+ push @files, $entry;
+ }
+ closedir $dh;
+
+ @files = sort @files;
+
+ # move into the WAL directory before archiving files
+ my $cwd = getcwd;
+ chdir($directory) || die "chdir: $!";
+ command_ok([$tar, $compression_flags, $archive, @files]);
+ chdir($cwd) || die "chdir: $!";
+}
+
+my $tmp_dir = PostgreSQL::Test::Utils::tempdir_short();
my @scenarios = (
{
- 'path' => $node->data_dir
+ 'path' => $node->data_dir,
+ 'is_archive' => 0,
+ 'enabled' => 1
+ },
+ {
+ 'path' => "$tmp_dir/pg_wal.tar",
+ 'compression_method' => 'none',
+ 'compression_flags' => '-cf',
+ 'is_archive' => 1,
+ 'enabled' => 1
+ },
+ {
+ 'path' => "$tmp_dir/pg_wal.tar.gz",
+ 'compression_method' => 'gzip',
+ 'compression_flags' => '-czf',
+ 'is_archive' => 1,
+ 'enabled' => check_pg_config("#define HAVE_LIBZ 1")
});
for my $scenario (@scenarios)
@@ -272,6 +350,19 @@ for my $scenario (@scenarios)
SKIP:
{
+ skip "tar command is not available", 56
+ if !defined $tar && $scenario->{'is_archive'};
+ skip "$scenario->{'compression_method'} compression not supported by this build", 56
+ if !$scenario->{'enabled'} && $scenario->{'is_archive'};
+
+ # create pg_wal archive
+ if ($scenario->{'is_archive'})
+ {
+ generate_archive($path,
+ $node->data_dir . '/pg_wal',
+ $scenario->{'compression_flags'});
+ }
+
command_fails_like(
[ 'pg_waldump', '--path' => $path ],
qr/error: no start WAL location given/,
@@ -305,9 +396,14 @@ for my $scenario (@scenarios)
test_pg_waldump_skip_bytes($path, $start_lsn, $end_lsn);
- @lines = test_pg_waldump($path, $start_lsn, $end_lsn);
+ my @lines = test_pg_waldump($path, $start_lsn, $end_lsn);
is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
+ @lines = test_pg_waldump($path, $contrecord_lsn, $end_lsn);
+ is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
+
+ test_pg_waldump_skip_bytes($path, $contrecord_lsn, $end_lsn);
+
@lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--limit' => 6);
is(@lines, 6, 'limit option observed');
@@ -337,6 +433,9 @@ for my $scenario (@scenarios)
'--relation' => "$default_ts_oid/$postgres_db_oid/$rel_i1a_oid",
'--block' => 1);
is(grep(!/\bblk 1\b/, @lines), 0, 'only lines for selected block');
+
+ # Cleanup.
+ unlink $path if $scenario->{'is_archive'};
}
}
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 77e3c04144e..595ad7d5c5a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -145,6 +145,8 @@ ArchiveOpts
ArchiveShutdownCB
ArchiveStartupCB
ArchiveStreamState
+ArchivedWALFile
+ArchivedWAL_hash
ArchiverOutput
ArchiverStage
ArrayAnalyzeExtraData
@@ -3513,6 +3515,7 @@ astreamer_recovery_injector
astreamer_tar_archiver
astreamer_tar_parser
astreamer_verify
+astreamer_waldump
astreamer_zstd_frame
auth_password_hook_typ
autovac_table
--
2.47.1
[application/x-patch] v15-0007-pg_waldump-Remove-the-restriction-on-the-order-o.patch (13.8K, 8-v15-0007-pg_waldump-Remove-the-restriction-on-the-order-o.patch)
From c5b0a92f4808816108bdff02e5c137280749d01c Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Tue, 27 Jan 2026 15:38:34 +0530
Subject: [PATCH v15 07/11] pg_waldump: Remove the restriction on the order of
archived WAL files.
With the previous patch, pg_waldump stops decoding if the WAL files in a
tar archive are not in sequential order. With this patch, decoding
continues instead: any WAL file encountered out of order is written to a
temporary location and read from there later. Once a temporary file has
been read, it is removed.
---
doc/src/sgml/ref/pg_waldump.sgml | 19 ++-
src/bin/pg_waldump/archive_waldump.c | 172 +++++++++++++++++++++++++--
src/bin/pg_waldump/pg_waldump.c | 32 ++++-
src/bin/pg_waldump/pg_waldump.h | 3 +
src/bin/pg_waldump/t/001_basic.pl | 3 +-
5 files changed, 209 insertions(+), 20 deletions(-)
diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index 15fb8d13199..9bbb4bd5772 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -149,8 +149,12 @@ PostgreSQL documentation
of <envar>PGDATA</envar>.
</para>
<para>
- If a tar archive is provided, its WAL segment files must be in
- sequential order; otherwise, an error will be reported.
+ If a tar archive is provided and its WAL segment files are not in
+ sequential order, the out-of-order files are written to a temporary
+ directory whose name begins with <filename>waldump_tmp</filename>. This
+ directory is created inside the directory specified by the
+ <envar>TMPDIR</envar> environment variable if it is set; otherwise, it is
+ created within the same directory as the tar archive.
</para>
</listitem>
</varlistentry>
@@ -387,6 +391,17 @@ PostgreSQL documentation
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><envar>TMPDIR</envar></term>
+ <listitem>
+ <para>
+ Directory in which to create temporary files when reading WAL from a
+ tar archive with out-of-order segment files. If not set, the temporary
+ directory is created within the same directory as the tar archive.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</refsect1>
diff --git a/src/bin/pg_waldump/archive_waldump.c b/src/bin/pg_waldump/archive_waldump.c
index 4a95b47b4da..1479efe61f5 100644
--- a/src/bin/pg_waldump/archive_waldump.c
+++ b/src/bin/pg_waldump/archive_waldump.c
@@ -17,6 +17,7 @@
#include <unistd.h>
#include "access/xlog_internal.h"
+#include "common/file_perm.h"
#include "common/hashfn.h"
#include "common/logging.h"
#include "fe_utils/simple_list.h"
@@ -27,6 +28,9 @@
*/
#define READ_CHUNK_SIZE (128 * 1024)
+/* Directory holding WAL segments spilled from the archive out of order */
+char *TmpWalSegDir = NULL;
+
/*
* Check if the start segment number is zero; this indicates a request to read
* any WAL file.
@@ -57,6 +61,8 @@ typedef struct ArchivedWALFile
const char *fname; /* hash key: WAL segment name */
StringInfo buf; /* holds WAL bytes read from archive */
+ bool spilled; /* true if the WAL data was spilled to a
+ * temporary file */
int read_len; /* total bytes of a WAL read from archive */
} ArchivedWALFile;
@@ -84,6 +90,11 @@ static ArchivedWALFile *get_archive_wal_entry(const char *fname,
XLogDumpPrivate *privateInfo,
int WalSegSz);
static int read_archive_file(XLogDumpPrivate *privateInfo, Size count);
+static void setup_tmpwal_dir(const char *waldir);
+static void cleanup_tmpwal_dir_atexit(void);
+
+static FILE *prepare_tmp_write(const char *fname);
+static void perform_tmp_write(const char *fname, StringInfo buf, FILE *file);
static astreamer *astreamer_waldump_new(XLogDumpPrivate *privateInfo);
static void astreamer_waldump_content(astreamer *streamer,
@@ -106,7 +117,9 @@ static const astreamer_ops astreamer_waldump_ops = {
/*
* Initializes the tar archive reader, creates a hash table for WAL entries,
* checks for existing valid WAL segments in the archive file and retrieves the
- * segment size, and sets up filters for relevant entries.
+ * segment size, and sets up filters for relevant entries. It also configures a
+ * temporary directory for out-of-order WAL data and registers an exit callback
+ * to clean up temporary files.
*/
void
init_archive_reader(XLogDumpPrivate *privateInfo, const char *waldir,
@@ -199,6 +212,13 @@ init_archive_reader(XLogDumpPrivate *privateInfo, const char *waldir,
privateInfo->start_segno > segno ||
privateInfo->end_segno < segno)
free_archive_wal_entry(entry->fname, privateInfo);
+
+ /*
+ * Set up a temporary directory to store out-of-order WAL segments, and
+ * register an exit callback to remove it upon completion.
+ */
+ setup_tmpwal_dir(waldir);
+ atexit(cleanup_tmpwal_dir_atexit);
}
/*
@@ -365,6 +385,17 @@ free_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo)
destroyStringInfo(entry->buf);
entry->buf = NULL;
+ /* Remove temporary file if any */
+ if (entry->spilled)
+ {
+ char fpath[MAXPGPATH];
+
+ snprintf(fpath, MAXPGPATH, "%s/%s", TmpWalSegDir, fname);
+
+ if (unlink(fpath) == 0)
+ pg_log_debug("removed file \"%s\"", fpath);
+ }
+
/* Set cur_file to NULL if it matches the entry being ignored */
if (privateInfo->cur_file == entry)
privateInfo->cur_file = NULL;
@@ -376,12 +407,16 @@ free_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo)
* Returns the archived WAL entry from the hash table if it exists. Otherwise,
* it invokes the routine to read the archived file, which then populates the
* entry in the hash table if that WAL exists in the archive.
+ * If the archive streamer happens to be reading a WAL file from the archive
+ * that is not currently needed, that WAL data is written to a temporary
+ * file.
*/
static ArchivedWALFile *
get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo,
int WalSegSz)
{
ArchivedWALFile *entry = NULL;
+ FILE *write_fp = NULL;
/* Search hash table */
entry = ArchivedWAL_lookup(privateInfo->archive_wal_htab, fname);
@@ -395,28 +430,59 @@ get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo,
*/
while (1)
{
+ /*
+ * The WAL file entry currently being processed may change during
+ * archive streamer execution. Therefore, maintain a local variable to
+ * reference the previous entry, ensuring that any remaining data in
+ * its buffer is successfully flushed to the temporary file before
+ * switching to the next WAL entry.
+ */
+ entry = privateInfo->cur_file;
+
/* Fetch more data */
- if (read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
- break; /* archive file ended */
+ if (entry == NULL || entry->buf->len == 0)
+ {
+ if (read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
+ break; /* archive file ended */
+ }
/*
* Archived streamer is reading a non-WAL file or an irrelevant WAL
* file.
*/
- if (privateInfo->cur_file == NULL)
+ if (entry == NULL)
continue;
- entry = privateInfo->cur_file;
-
/* Found the required entry */
if (strcmp(fname, entry->fname) == 0)
return entry;
- /* WAL segments must be archived in order */
- pg_log_error("WAL files are not archived in sequential order");
- pg_log_error_detail("Expecting segment \"%s\" but found \"%s\".",
- fname, entry->fname);
- exit(1);
+ /*
+ * The archive streamer is currently reading a file other than the one
+ * requested, but that file will be needed later. Write it to a
+ * temporary location so it can be retrieved when needed.
+ */
+
+ /* Create a temporary file if one does not already exist */
+ if (!entry->spilled)
+ {
+ write_fp = prepare_tmp_write(entry->fname);
+ entry->spilled = true;
+ }
+
+ /* Flush data from the buffer to the file */
+ perform_tmp_write(entry->fname, entry->buf, write_fp);
+ resetStringInfo(entry->buf);
+
+ /*
+ * If the streamer has moved on to a different entry, this file has
+ * been read completely, so close it.
+ */
+ if (entry != privateInfo->cur_file && write_fp != NULL)
+ {
+ fclose(write_fp);
+ write_fp = NULL;
+ }
}
/* Requested WAL segment not found */
@@ -454,7 +520,88 @@ read_archive_file(XLogDumpPrivate *privateInfo, Size count)
}
/*
- * Create an astreamer that can read WAL from a tar file.
+ * Set up a temporary directory to temporarily store WAL segments.
+ */
+static void
+setup_tmpwal_dir(const char *waldir)
+{
+ char *template;
+
+ /*
+ * Use the directory specified by the TMPDIR environment variable. If it
+ * is not set, create the temporary directory inside the provided WAL
+ * directory.
+ */
+ template = psprintf("%s/waldump_tmp-XXXXXX",
+ getenv("TMPDIR") ? getenv("TMPDIR") : waldir);
+ TmpWalSegDir = mkdtemp(template);
+
+ if (TmpWalSegDir == NULL)
+ pg_fatal("could not create directory \"%s\": %m", template);
+
+ canonicalize_path(TmpWalSegDir);
+
+ pg_log_debug("created directory \"%s\"", TmpWalSegDir);
+}
+
+/*
+ * Remove temporary directory at exit, if any.
+ */
+static void
+cleanup_tmpwal_dir_atexit(void)
+{
+ rmtree(TmpWalSegDir, true);
+}
+
+/*
+ * Create an empty placeholder file and return its handle.
+ */
+static FILE *
+prepare_tmp_write(const char *fname)
+{
+ char fpath[MAXPGPATH];
+ FILE *file;
+
+ snprintf(fpath, MAXPGPATH, "%s/%s", TmpWalSegDir, fname);
+
+ /* Create an empty placeholder */
+ file = fopen(fpath, PG_BINARY_W);
+ if (file == NULL)
+ pg_fatal("could not create file \"%s\": %m", fpath);
+
+#ifndef WIN32
+ if (chmod(fpath, pg_file_create_mode))
+ pg_fatal("could not set permissions on file \"%s\": %m",
+ fpath);
+#endif
+
+ pg_log_debug("spilling to temporary file \"%s\"", fpath);
+
+ return file;
+}
+
+/*
+ * Write buffer data to the given file handle.
+ */
+static void
+perform_tmp_write(const char *fname, StringInfo buf, FILE *file)
+{
+ Assert(file);
+
+ errno = 0;
+ if (buf->len > 0 && fwrite(buf->data, buf->len, 1, file) != 1)
+ {
+ /*
+ * If write didn't set errno, assume problem is no disk space
+ */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_fatal("could not write to file \"%s\": %m", fname);
+ }
+}
+
+/*
+ * Create an astreamer that can read WAL from a tar file.
*/
static astreamer *
astreamer_waldump_new(XLogDumpPrivate *privateInfo)
@@ -538,6 +685,7 @@ astreamer_waldump_content(astreamer *streamer, astreamer_member *member,
}
entry->buf = makeStringInfo();
+ entry->spilled = false;
entry->read_len = 0;
privateInfo->cur_file = entry;
}
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index a18c56a7322..4b438b53ead 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -478,10 +478,14 @@ TarWALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
return -1;
/*
- * If the target page is in a different segment, free the buffer space
- * occupied by the previous segment data. Since pg_waldump never requests
- * the same WAL bytes twice, moving to a new segment implies the previous
- * buffer's data and that segment will not be needed again.
+ * If the target page is in a different segment, free the buffer and/or
+ * temporary file disk space occupied by the previous segment's data.
+ * Since pg_waldump never requests the same WAL bytes twice, moving to a
+ * new segment implies the previous buffer's data and that segment will
+ * not be needed again.
+ *
+ * Afterward, check for the next required WAL segment's physical existence
+ * in the temporary directory first before invoking the archive streamer.
*/
curSegNo = state->seg.ws_segno;
if (!XLByteInSeg(targetPagePtr, curSegNo, WalSegSz))
@@ -497,6 +501,13 @@ TarWALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
state->seg.ws_tli = private->timeline;
state->seg.ws_segno = nextSegNo;
+ /* Close the WAL segment file if it is currently open */
+ if (state->seg.ws_file >= 0)
+ {
+ close(state->seg.ws_file);
+ state->seg.ws_file = -1;
+ }
+
/*
* If in pre-reading mode (prior to actual decoding), do not delete any
* entries that might be requested again once the decoding loop starts.
@@ -507,9 +518,20 @@ TarWALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
XLogFileName(fname, state->seg.ws_tli, curSegNo, WalSegSz);
free_archive_wal_entry(fname, private);
}
+
+ /*
+ * If the next segment was spilled to the temporary directory, open it
+ * so that reading continues from that file.
+ */
+ XLogFileName(fname, state->seg.ws_tli, nextSegNo, WalSegSz);
+ state->seg.ws_file = open_file_in_directory(TmpWalSegDir, fname);
}
- /* Read the WAL page from the archive streamer */
+ /* Continue reading from the open WAL segment, if any */
+ if (state->seg.ws_file >= 0)
+ return WALDumpReadPage(state, targetPagePtr, count, targetPtr,
+ readBuff);
+
+ /* Otherwise, read the WAL page from the archive streamer */
return read_archive_wal_page(private, targetPagePtr, count, readBuff,
WalSegSz);
}
diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h
index 54d54a8a718..6c242b7fcbc 100644
--- a/src/bin/pg_waldump/pg_waldump.h
+++ b/src/bin/pg_waldump/pg_waldump.h
@@ -18,6 +18,9 @@
struct ArchivedWALFile;
struct ArchivedWAL_hash;
+/* Temporary directory */
+extern char *TmpWalSegDir;
+
/* Contains the necessary information to drive WAL decoding */
typedef struct XLogDumpPrivate
{
diff --git a/src/bin/pg_waldump/t/001_basic.pl b/src/bin/pg_waldump/t/001_basic.pl
index 6f8ce319841..6960bd46ba4 100644
--- a/src/bin/pg_waldump/t/001_basic.pl
+++ b/src/bin/pg_waldump/t/001_basic.pl
@@ -7,6 +7,7 @@ use Cwd;
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Test::More;
+use List::Util qw(shuffle);
my $tar = $ENV{TAR};
@@ -312,7 +313,7 @@ sub generate_archive
}
closedir $dh;
- @files = sort @files;
+ @files = shuffle @files;
# move into the WAL directory before archiving files
my $cwd = getcwd;
--
2.47.1
[application/x-patch] v15-0008-pg_verifybackup-Delay-default-WAL-directory-prep.patch (1.7K, 9-v15-0008-pg_verifybackup-Delay-default-WAL-directory-prep.patch)
From 42c939b5d33b160d86ff01c21d61e5a68170b415 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Wed, 16 Jul 2025 14:47:43 +0530
Subject: [PATCH v15 08/11] pg_verifybackup: Delay default WAL directory
preparation.
Whether WAL should be parsed from a directory or from an archive is not
known until the backup format has been determined. Therefore, delay
preparing the default WAL directory until the point of parsing. This
delay is harmless, as the WAL directory is not used elsewhere.
---
src/bin/pg_verifybackup/pg_verifybackup.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 31f606c45b1..8cc204719ee 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -285,10 +285,6 @@ main(int argc, char **argv)
manifest_path = psprintf("%s/backup_manifest",
context.backup_directory);
- /* By default, look for the WAL in the backup directory, too. */
- if (wal_directory == NULL)
- wal_directory = psprintf("%s/pg_wal", context.backup_directory);
-
/*
* Try to read the manifest. We treat any errors encountered while parsing
* the manifest as fatal; there doesn't seem to be much point in trying to
@@ -368,6 +364,10 @@ main(int argc, char **argv)
if (context.format == 'p' && !context.skip_checksums)
verify_backup_checksums(&context);
+ /* By default, look for the WAL in the backup directory, too. */
+ if (wal_directory == NULL)
+ wal_directory = psprintf("%s/pg_wal", context.backup_directory);
+
/*
* Try to parse the required ranges of WAL records, unless we were told
* not to do so.
--
2.47.1
[application/x-patch] v15-0009-pg_verifybackup-Rename-the-wal-directory-switch-.patch (5.9K, 10-v15-0009-pg_verifybackup-Rename-the-wal-directory-switch-.patch)
From dfa159e43527c0705b3b1b14303775c6b55b80f7 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Tue, 25 Nov 2025 17:32:14 +0530
Subject: [PATCH v15 09/11] pg_verifybackup: Rename the wal-directory switch to
wal-path
With the previous patches, pg_waldump can now decode WAL directly from
tar files. This means a tar archive path can be specified instead of a
traditional WAL directory.
To keep things consistent and more versatile, generalize the input
switch for pg_verifybackup as well: it should accept either a directory
or a tar file path that contains WAL files. This change also aligns it
with the naming of the existing manifest-path switch.
== NOTE ==
The corresponding PO files require updating due to this change.
---
doc/src/sgml/ref/pg_verifybackup.sgml | 2 +-
src/bin/pg_verifybackup/pg_verifybackup.c | 22 +++++++++++-----------
src/bin/pg_verifybackup/t/007_wal.pl | 4 ++--
3 files changed, 14 insertions(+), 14 deletions(-)
diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml
index 61c12975e4a..e9b8bfd51b1 100644
--- a/doc/src/sgml/ref/pg_verifybackup.sgml
+++ b/doc/src/sgml/ref/pg_verifybackup.sgml
@@ -261,7 +261,7 @@ PostgreSQL documentation
<varlistentry>
<term><option>-w <replaceable class="parameter">path</replaceable></option></term>
- <term><option>--wal-directory=<replaceable class="parameter">path</replaceable></option></term>
+ <term><option>--wal-path=<replaceable class="parameter">path</replaceable></option></term>
<listitem>
<para>
Try to parse WAL files stored in the specified directory, rather than
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 8cc204719ee..34520546bc3 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -93,7 +93,7 @@ static void verify_file_checksum(verifier_context *context,
uint8 *buffer);
static void parse_required_wal(verifier_context *context,
char *pg_waldump_path,
- char *wal_directory);
+ char *wal_path);
static astreamer *create_archive_verifier(verifier_context *context,
char *archive_name,
Oid tblspc_oid,
@@ -126,7 +126,7 @@ main(int argc, char **argv)
{"progress", no_argument, NULL, 'P'},
{"quiet", no_argument, NULL, 'q'},
{"skip-checksums", no_argument, NULL, 's'},
- {"wal-directory", required_argument, NULL, 'w'},
+ {"wal-path", required_argument, NULL, 'w'},
{NULL, 0, NULL, 0}
};
@@ -135,7 +135,7 @@ main(int argc, char **argv)
char *manifest_path = NULL;
bool no_parse_wal = false;
bool quiet = false;
- char *wal_directory = NULL;
+ char *wal_path = NULL;
char *pg_waldump_path = NULL;
DIR *dir;
@@ -221,8 +221,8 @@ main(int argc, char **argv)
context.skip_checksums = true;
break;
case 'w':
- wal_directory = pstrdup(optarg);
- canonicalize_path(wal_directory);
+ wal_path = pstrdup(optarg);
+ canonicalize_path(wal_path);
break;
default:
/* getopt_long already emitted a complaint */
@@ -365,15 +365,15 @@ main(int argc, char **argv)
verify_backup_checksums(&context);
/* By default, look for the WAL in the backup directory, too. */
- if (wal_directory == NULL)
- wal_directory = psprintf("%s/pg_wal", context.backup_directory);
+ if (wal_path == NULL)
+ wal_path = psprintf("%s/pg_wal", context.backup_directory);
/*
* Try to parse the required ranges of WAL records, unless we were told
* not to do so.
*/
if (!no_parse_wal)
- parse_required_wal(&context, pg_waldump_path, wal_directory);
+ parse_required_wal(&context, pg_waldump_path, wal_path);
/*
* If everything looks OK, tell the user this, unless we were asked to
@@ -1188,7 +1188,7 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
*/
static void
parse_required_wal(verifier_context *context, char *pg_waldump_path,
- char *wal_directory)
+ char *wal_path)
{
manifest_data *manifest = context->manifest;
manifest_wal_range *this_wal_range = manifest->first_wal_range;
@@ -1198,7 +1198,7 @@ parse_required_wal(verifier_context *context, char *pg_waldump_path,
char *pg_waldump_cmd;
pg_waldump_cmd = psprintf("\"%s\" --quiet --path=\"%s\" --timeline=%u --start=%X/%08X --end=%X/%08X\n",
- pg_waldump_path, wal_directory, this_wal_range->tli,
+ pg_waldump_path, wal_path, this_wal_range->tli,
LSN_FORMAT_ARGS(this_wal_range->start_lsn),
LSN_FORMAT_ARGS(this_wal_range->end_lsn));
fflush(NULL);
@@ -1366,7 +1366,7 @@ usage(void)
printf(_(" -P, --progress show progress information\n"));
printf(_(" -q, --quiet do not print any output, except for errors\n"));
printf(_(" -s, --skip-checksums skip checksum verification\n"));
- printf(_(" -w, --wal-directory=PATH use specified path for WAL files\n"));
+ printf(_(" -w, --wal-path=PATH use specified path for WAL files\n"));
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -?, --help show this help, then exit\n"));
printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
diff --git a/src/bin/pg_verifybackup/t/007_wal.pl b/src/bin/pg_verifybackup/t/007_wal.pl
index 79087a1f6be..8ad2234453d 100644
--- a/src/bin/pg_verifybackup/t/007_wal.pl
+++ b/src/bin/pg_verifybackup/t/007_wal.pl
@@ -42,10 +42,10 @@ command_ok([ 'pg_verifybackup', '--no-parse-wal', $backup_path ],
command_ok(
[
'pg_verifybackup',
- '--wal-directory' => $relocated_pg_wal,
+ '--wal-path' => $relocated_pg_wal,
$backup_path
],
- '--wal-directory can be used to specify WAL directory');
+ '--wal-path can be used to specify WAL directory');
# Move directory back to original location.
rename($relocated_pg_wal, $original_pg_wal) || die "rename pg_wal back: $!";
--
2.47.1
[application/x-patch] v15-0010-pg_verifybackup-Enabled-WAL-parsing-for-tar-form.patch (11.5K, 11-v15-0010-pg_verifybackup-Enabled-WAL-parsing-for-tar-form.patch)
From 72cd114f16823c3faee5adebdb1605833c835743 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Tue, 25 Nov 2025 17:34:26 +0530
Subject: [PATCH v15 10/11] pg_verifybackup: Enabled WAL parsing for tar-format
backup
Now that pg_waldump supports decoding from tar archives, leverage this
functionality to remove the previous restriction on WAL parsing for
tar-format backups.
---
doc/src/sgml/ref/pg_verifybackup.sgml | 12 ++--
src/bin/pg_verifybackup/pg_verifybackup.c | 66 +++++++++++++------
src/bin/pg_verifybackup/t/002_algorithm.pl | 4 --
src/bin/pg_verifybackup/t/003_corruption.pl | 4 +-
src/bin/pg_verifybackup/t/007_wal.pl | 16 +++++
src/bin/pg_verifybackup/t/008_untar.pl | 5 +-
src/bin/pg_verifybackup/t/010_client_untar.pl | 5 +-
7 files changed, 70 insertions(+), 42 deletions(-)
diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml
index e9b8bfd51b1..1695cfe91c8 100644
--- a/doc/src/sgml/ref/pg_verifybackup.sgml
+++ b/doc/src/sgml/ref/pg_verifybackup.sgml
@@ -36,10 +36,7 @@ PostgreSQL documentation
<literal>backup_manifest</literal> generated by the server at the time
of the backup. The backup may be stored either in the "plain" or the "tar"
format; this includes tar-format backups compressed with any algorithm
- supported by <application>pg_basebackup</application>. However, at present,
- <literal>WAL</literal> verification is supported only for plain-format
- backups. Therefore, if the backup is stored in tar-format, the
- <literal>-n, --no-parse-wal</literal> option should be used.
+ supported by <application>pg_basebackup</application>.
</para>
<para>
@@ -264,9 +261,10 @@ PostgreSQL documentation
<term><option>--wal-path=<replaceable class="parameter">path</replaceable></option></term>
<listitem>
<para>
- Try to parse WAL files stored in the specified directory, rather than
- in <literal>pg_wal</literal>. This may be useful if the backup is
- stored in a separate location from the WAL archive.
+ Try to parse WAL files stored in the specified directory or tar
+ archive, rather than in <literal>pg_wal</literal>. This may be
+ useful if the backup is stored in a separate location from the WAL
+ archive.
</para>
</listitem>
</varlistentry>
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 34520546bc3..935ab8fafa8 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -74,7 +74,9 @@ pg_noreturn static void report_manifest_error(JsonManifestParseContext *context,
const char *fmt,...)
pg_attribute_printf(2, 3);
-static void verify_tar_backup(verifier_context *context, DIR *dir);
+static void verify_tar_backup(verifier_context *context, DIR *dir,
+ char **base_archive_path,
+ char **wal_archive_path);
static void verify_plain_backup_directory(verifier_context *context,
char *relpath, char *fullpath,
DIR *dir);
@@ -83,7 +85,9 @@ static void verify_plain_backup_file(verifier_context *context, char *relpath,
static void verify_control_file(const char *controlpath,
uint64 manifest_system_identifier);
static void precheck_tar_backup_file(verifier_context *context, char *relpath,
- char *fullpath, SimplePtrList *tarfiles);
+ char *fullpath, SimplePtrList *tarfiles,
+ char **base_archive_path,
+ char **wal_archive_path);
static void verify_tar_file(verifier_context *context, char *relpath,
char *fullpath, astreamer *streamer);
static void report_extra_backup_files(verifier_context *context);
@@ -136,6 +140,8 @@ main(int argc, char **argv)
bool no_parse_wal = false;
bool quiet = false;
char *wal_path = NULL;
+ char *base_archive_path = NULL;
+ char *wal_archive_path = NULL;
char *pg_waldump_path = NULL;
DIR *dir;
@@ -327,17 +333,6 @@ main(int argc, char **argv)
pfree(path);
}
- /*
- * XXX: In the future, we should consider enhancing pg_waldump to read WAL
- * files from an archive.
- */
- if (!no_parse_wal && context.format == 't')
- {
- pg_log_error("pg_waldump cannot read tar files");
- pg_log_error_hint("You must use -n/--no-parse-wal when verifying a tar-format backup.");
- exit(1);
- }
-
/*
* Perform the appropriate type of verification appropriate based on the
* backup format. This will close 'dir'.
@@ -346,7 +341,7 @@ main(int argc, char **argv)
verify_plain_backup_directory(&context, NULL, context.backup_directory,
dir);
else
- verify_tar_backup(&context, dir);
+ verify_tar_backup(&context, dir, &base_archive_path, &wal_archive_path);
/*
* The "matched" flag should now be set on every entry in the hash table.
@@ -364,9 +359,28 @@ main(int argc, char **argv)
if (context.format == 'p' && !context.skip_checksums)
verify_backup_checksums(&context);
- /* By default, look for the WAL in the backup directory, too. */
+ /*
+ * By default, WAL files are expected to be found in the backup directory
+ * for plain-format backups. In the case of tar-format backups, if a
+ * separate WAL archive is not found, the WAL files are most likely
+ * included within the main data directory archive.
+ */
if (wal_path == NULL)
- wal_path = psprintf("%s/pg_wal", context.backup_directory);
+ {
+ if (context.format == 'p')
+ wal_path = psprintf("%s/pg_wal", context.backup_directory);
+ else if (wal_archive_path)
+ wal_path = wal_archive_path;
+ else if (base_archive_path)
+ wal_path = base_archive_path;
+ else
+ {
+ pg_log_error("WAL archive not found");
+ pg_log_error_hint("Specify the correct path using the option -w/--wal-path. "
+ "Or you must use -n/--no-parse-wal when verifying a tar-format backup.");
+ exit(1);
+ }
+ }
/*
* Try to parse the required ranges of WAL records, unless we were told
@@ -787,7 +801,8 @@ verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
* close when we're done with it.
*/
static void
-verify_tar_backup(verifier_context *context, DIR *dir)
+verify_tar_backup(verifier_context *context, DIR *dir, char **base_archive_path,
+ char **wal_archive_path)
{
struct dirent *dirent;
SimplePtrList tarfiles = {NULL, NULL};
@@ -816,7 +831,8 @@ verify_tar_backup(verifier_context *context, DIR *dir)
char *fullpath;
fullpath = psprintf("%s/%s", context->backup_directory, filename);
- precheck_tar_backup_file(context, filename, fullpath, &tarfiles);
+ precheck_tar_backup_file(context, filename, fullpath, &tarfiles,
+ base_archive_path, wal_archive_path);
pfree(fullpath);
}
}
@@ -875,11 +891,13 @@ verify_tar_backup(verifier_context *context, DIR *dir)
*
* The arguments to this function are mostly the same as the
* verify_plain_backup_file. The additional argument outputs a list of valid
- * tar files.
+ * tar files, along with the full paths to the main archive and the WAL
+ * directory archive.
*/
static void
precheck_tar_backup_file(verifier_context *context, char *relpath,
- char *fullpath, SimplePtrList *tarfiles)
+ char *fullpath, SimplePtrList *tarfiles,
+ char **base_archive_path, char **wal_archive_path)
{
struct stat sb;
Oid tblspc_oid = InvalidOid;
@@ -918,9 +936,17 @@ precheck_tar_backup_file(verifier_context *context, char *relpath,
* extension such as .gz, .lz4, or .zst.
*/
if (strncmp("base", relpath, 4) == 0)
+ {
suffix = relpath + 4;
+
+ *base_archive_path = pstrdup(fullpath);
+ }
else if (strncmp("pg_wal", relpath, 6) == 0)
+ {
suffix = relpath + 6;
+
+ *wal_archive_path = pstrdup(fullpath);
+ }
else
{
/* Expected a <tablespaceoid>.tar file here. */
diff --git a/src/bin/pg_verifybackup/t/002_algorithm.pl b/src/bin/pg_verifybackup/t/002_algorithm.pl
index 0556191ec9d..edc515d5904 100644
--- a/src/bin/pg_verifybackup/t/002_algorithm.pl
+++ b/src/bin/pg_verifybackup/t/002_algorithm.pl
@@ -30,10 +30,6 @@ sub test_checksums
{
# Add switch to get a tar-format backup
push @backup, ('--format' => 'tar');
-
- # Add switch to skip WAL verification, which is not yet supported for
- # tar-format backups
- push @verify, ('--no-parse-wal');
}
# A backup with a bogus algorithm should fail.
diff --git a/src/bin/pg_verifybackup/t/003_corruption.pl b/src/bin/pg_verifybackup/t/003_corruption.pl
index b1d65b8aa0f..882d75d9dc2 100644
--- a/src/bin/pg_verifybackup/t/003_corruption.pl
+++ b/src/bin/pg_verifybackup/t/003_corruption.pl
@@ -193,10 +193,8 @@ for my $scenario (@scenario)
command_ok([ $tar, '-cf' => "$tar_backup_path/base.tar", '.' ]);
chdir($cwd) || die "chdir: $!";
- # Now check that the backup no longer verifies. We must use -n
- # here, because pg_waldump can't yet read WAL from a tarfile.
command_fails_like(
- [ 'pg_verifybackup', '--no-parse-wal', $tar_backup_path ],
+ [ 'pg_verifybackup', $tar_backup_path ],
$scenario->{'fails_like'},
"corrupt backup fails verification: $name");
diff --git a/src/bin/pg_verifybackup/t/007_wal.pl b/src/bin/pg_verifybackup/t/007_wal.pl
index 8ad2234453d..0e0377bfacc 100644
--- a/src/bin/pg_verifybackup/t/007_wal.pl
+++ b/src/bin/pg_verifybackup/t/007_wal.pl
@@ -90,4 +90,20 @@ command_ok(
[ 'pg_verifybackup', $backup_path2 ],
'valid base backup with timeline > 1');
+# Test WAL verification for a tar-format backup with a separate pg_wal.tar,
+# as produced by pg_basebackup --format=tar --wal-method=stream.
+my $backup_path3 = $primary->backup_dir . '/test_tar_wal';
+$primary->command_ok(
+ [
+ 'pg_basebackup',
+ '--pgdata' => $backup_path3,
+ '--no-sync',
+ '--format' => 'tar',
+ '--checkpoint' => 'fast'
+ ],
+ "tar backup with separate pg_wal.tar");
+command_ok(
+ [ 'pg_verifybackup', $backup_path3 ],
+ 'WAL verification succeeds with separate pg_wal.tar');
+
done_testing();
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
index ae67ae85a31..161c08c190d 100644
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -47,7 +47,6 @@ my $tsoid = $primary->safe_psql(
SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1'));
my $backup_path = $primary->backup_dir . '/server-backup';
-my $extract_path = $primary->backup_dir . '/extracted-backup';
my @test_configuration = (
{
@@ -123,14 +122,12 @@ for my $tc (@test_configuration)
# Verify tar backup.
$primary->command_ok(
[
- 'pg_verifybackup', '--no-parse-wal',
- '--exit-on-error', $backup_path,
+ 'pg_verifybackup', '--exit-on-error', $backup_path,
],
"verify backup, compression $method");
# Cleanup.
rmtree($backup_path);
- rmtree($extract_path);
}
}
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
index 1ac7b5db75a..9670fbe4fda 100644
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -32,7 +32,6 @@ print $jf $junk_data;
close $jf;
my $backup_path = $primary->backup_dir . '/client-backup';
-my $extract_path = $primary->backup_dir . '/extracted-backup';
my @test_configuration = (
{
@@ -137,13 +136,11 @@ for my $tc (@test_configuration)
# Verify tar backup.
$primary->command_ok(
[
- 'pg_verifybackup', '--no-parse-wal',
- '--exit-on-error', $backup_path,
+ 'pg_verifybackup', '--exit-on-error', $backup_path,
],
"verify backup, compression $method");
# Cleanup.
- rmtree($extract_path);
rmtree($backup_path);
}
}
--
2.47.1
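The default WAL-location fallback in the pg_verifybackup changes above can be modelled on its own. This is a minimal sketch, not code from the patch. classify_tar_member() and choose_wal_path() are hypothetical names, and borrowed string pointers stand in for the pstrdup()'d paths. The order mirrors main(): <backup>/pg_wal for plain format, then a separate pg_wal.tar*, then base.tar*, otherwise an error.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of precheck_tar_backup_file(): remember the path of
 * the base archive and of the separate WAL archive, keyed on the prefix.
 */
static void
classify_tar_member(const char *relpath, const char *fullpath,
                    const char **base_archive_path,
                    const char **wal_archive_path)
{
    if (strncmp("base", relpath, 4) == 0)
        *base_archive_path = fullpath;
    else if (strncmp("pg_wal", relpath, 6) == 0)
        *wal_archive_path = fullpath;
    /* anything else is expected to be a <tablespaceoid>.tar file */
}

/*
 * Hypothetical model of the default wal_path selection in main().
 * Returns NULL when a tar-format backup has no archive that could hold WAL;
 * the real code reports "WAL archive not found" in that case.
 */
static const char *
choose_wal_path(char format, const char *plain_wal_dir,
                const char *wal_archive_path, const char *base_archive_path)
{
    assert(format == 'p' || format == 't');

    if (format == 'p')
        return plain_wal_dir;       /* <backup_directory>/pg_wal */
    if (wal_archive_path != NULL)
        return wal_archive_path;    /* separate pg_wal.tar[.gz|.lz4|.zst] */
    if (base_archive_path != NULL)
        return base_archive_path;   /* WAL most likely inside base.tar */
    return NULL;
}
```

With both base.tar.gz and pg_wal.tar present the sketch picks pg_wal.tar; with only base.tar.gz it falls back to the base archive, on the assumption (as in the patch comment) that WAL was included there.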
* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-03-04 21:50 Andrew Dunstan <[email protected]>
parent: Amul Sul <[email protected]>
From: Andrew Dunstan @ 2026-03-04 21:50 UTC
To: Amul Sul <[email protected]>; +Cc: Robert Haas <[email protected]>; Chao Li <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>
On 2026-03-04 We 7:52 AM, Amul Sul wrote:
> On Wed, Mar 4, 2026 at 6:07 AM Andrew Dunstan <[email protected]> wrote:
>>
>> On 2026-03-02 Mo 8:00 AM, Amul Sul wrote:
>>> On Wed, Feb 18, 2026 at 12:28 PM Amul Sul <[email protected]> wrote:
>>>> On Tue, Feb 10, 2026 at 3:06 PM Amul Sul <[email protected]> wrote:
>>>>> On Wed, Feb 4, 2026 at 6:39 PM Amul Sul <[email protected]> wrote:
>>>>>> On Wed, Jan 28, 2026 at 2:41 AM Robert Haas <[email protected]> wrote:
>>>>>>> On Tue, Jan 27, 2026 at 7:07 AM Amul Sul <[email protected]> wrote:
>>>>>>>> In the attached version, I am using the WAL segment name as the hash
>>>>>>>> key, which is much more straightforward. I have rewritten
>>>>>>>> read_archive_wal_page(), and it looks much cleaner than before. The
>>>>>>>> logic to discard irrelevant WAL files is still within
>>>>>>>> get_archive_wal_entry. I added an explanation for setting cur_wal to
>>>>>>>> NULL, which is now handled in the separate function I mentioned
>>>>>>>> previously.
>>>>>>>>
>>>>>>>> Kindly have a look at the attached version; let me know if you are
>>>>>>>> still not happy with the current approach for filtering/discarding
>>>>>>>> irrelevant WAL segments. It isn't much different from the previous
>>>>>>>> version, but I have tried to keep it in a separate routine for better
>>>>>>>> code readability, with comments to make it easier to understand. I
>>>>>>>> also added a comment for ArchivedWALFile.
>>>>>>> I feel like the division of labor between get_archive_wal_entry() and
>>>>>>> read_archive_wal_page() is odd. I noticed this in the last version,
>>>>>>> too, and it still seems to be the case. get_archive_wal_entry() first
>>>>>>> calls ArchivedWAL_lookup(). If that finds an entry, it just returns.
>>>>>>> If it doesn't, it loops until an entry for the requested file shows up
>>>>>>> and then returns it. Then control returns to read_archive_wal_page()
>>>>>>> which loops some more until we have all the data we need for the
>>>>>>> requested file. But it seems odd to me to have two separate loops
>>>>>>> here. I think that the first loop is going to call read_archive_file()
>>>>>>> until we find the beginning of the file that we care about and then
>>>>>>> the second one is going to call read_archive_file() some more until we
>>>>>>> have read enough of it to satisfy the request. It feels odd to me to
>>>>>>> do it that way, as if we told somebody to first wait until 9 o'clock
>>>>>>> and then wait another 30 minutes, instead of just telling them to wait
>>>>>>> until 9:30. I realize it's not quite the same thing, because apart
>>>>>>> from calling read_archive_file(), the two loops do different things,
>>>>>>> but I still think it looks odd.
>>>>>>>
>>>>>>> + /*
>>>>>>> + * Ignore if the timeline is different or the current segment is not
>>>>>>> + * the desired one.
>>>>>>> + */
>>>>>>> + XLogFromFileName(entry->fname, &curSegTimeline, &curSegNo, WalSegSz);
>>>>>>> + if (privateInfo->timeline != curSegTimeline ||
>>>>>>> + privateInfo->startSegNo > curSegNo ||
>>>>>>> + privateInfo->endSegNo < curSegNo ||
>>>>>>> + segno > curSegNo)
>>>>>>> + {
>>>>>>> + free_archive_wal_entry(entry->fname, privateInfo);
>>>>>>> + continue;
>>>>>>> + }
>>>>>>>
>>>>>>> The comment doesn't match the code. If it did, the test would be
>>>>>>> (privateInfo->timeline != curSegTimeline || segno != curSegno). But
>>>>>>> instead the segno test is > rather than !=, and the checks against
>>>>>>> startSegNo and endSegNo aren't explained at all. I think I understand
>>>>>>> why the segno test uses > rather than !=, but it's the point of the
>>>>>>> comment to explain things like that, rather than leaving the reader to
>>>>>>> guess. And I don't know why we also need to test startSegNo and
>>>>>>> endSegNo.
>>>>>>>
>>>>>>> I also wonder what the point is of doing XLogFromFileName() on the
>>>>>>> fname provided by the caller and then again on entry->fname. Couldn't
>>>>>>> you just compare the strings?
>>>>>>>
>>>>>>> Again, the division of labor is really odd here. It's the job of
>>>>>>> astreamer_waldump_content() to skip things that aren't WAL files at
>>>>>>> all, but it's the job of get_archive_wal_entry() to skip things that
>>>>>>> are WAL files but not the one we want. I disagree with putting those
>>>>>>> checks in completely separate parts of the code.
>>>>>>>
>>>>>> Keeping the timeline and segment start-end range checks inside the
>>>>>> archive streamer creates a circular dependency that cannot be resolved
>>>>>> without a 'dirty hack'. We must read the first available WAL file page
>>>>>> to determine the wal_segment_size before it can calculate the target
>>>>>> segment range. Moving the checks inside the streamer would make it
>>>>>> impossible to process that initial file, as the necessary filtering
>>>>>> parameters -- would still be unknown which would need to be skipped
>>>>>> for the first read somehow. What if later we realized that the first
>>>>>> WAL file which was allowed to be streamed by skipping that check is
>>>>>> irrelevant and doesn't fall under the start-end segment range?
>>>>>>
>>>>> Please have a look at the attached version, specifically patch 0005.
>>>>> In astreamer_waldump_content(), I have moved the WAL file filtration
>>>>> check from get_archive_wal_entry(). This check will be skipped during
>>>>> the initial read in init_archive_reader(), which instead performs it
>>>>> explicitly once it determines the WAL segment size and the start/end
>>>>> segments.
>>>>>
>>>>> To access the WAL segment size inside astreamer_waldump_content(), I
>>>>> have moved the WAL segment size variable into the XLogDumpPrivate
>>>>> structure in the separate 0004 patch.
>>>> Attached is an updated version including the aforesaid changes. It
>>>> includes a new refactoring patch (0001) that moves the logic for
>>>> identifying tar archives and their compression types from
>>>> pg_basebackup and pg_verifybackup into a separate-reusable function,
>>>> per a suggestion from Euler [1]. Additionally, I have added a test
>>>> for the contrecord decoding to the main patch (now 0006).
>>>>
>>>> [1] http://postgr.es/m/[email protected]
>>>>
>>> Rebased against the latest master, fixed typos in code comments, and
>>> replaced palloc0 with palloc0_object.
>>>
>> Hi Amul.
>>
>>
>> I think this looks in pretty good shape.
>>
> Thank you very much for looking at the patch.
>
>> Attached are patches for a few things I think could be fixed. They are
>> mostly self-explanatory. The TAP test fix is the only sane way I could
>> come up with stopping the skip code you had from reporting a wildly
>> inaccurate number of tests skipped. The sane way to do this from a
>> Test::More perspective is a subtest, but unfortunately meson does not
>> like subtest output, which is why we don't use it elsewhere, so the only
>> way I could come up with was to split this out into a separate test. Of
>> course, we might just say we don't care about the misreport, in which
>> case we could just live with things as they are.
>>
> I agree that the reported skip number was incorrect, and I have
> corrected it in the attached patch. I haven't applied your patch for
> the TAP test improvements yet because I wanted to double-check it
> first with you; the patch as it stood created duplicate tests already
> present in 001_basic.pl. To avoid this duplication, I have added a
> loop that performs tests for both plain and tar WAL directory inputs,
> similar to the approach used in pg_verifybackup for different
> compression type tests (e.g., 008_untar.pl, 010_client_untar.pl). I
> don't have any objection to doing so if you feel the duplication is
> acceptable, but I feel that using a loop for the tests in 001_basic.pl
> is a bit tidier. Let me know your thoughts.
I will take a look.
>
> I have applied all your other patches but skipped the changes to
> pg_verifybackup.c from cf5955-fixes.patch.no-cfbot, as they seem
> unrelated or perhaps I have misunderstood them.
<brown-paper-bag> That's what I get for using a poorly written tool.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-03-09 12:26 Amul Sul <[email protected]>
parent: Andrew Dunstan <[email protected]>
From: Amul Sul @ 2026-03-09 12:26 UTC
To: Andrew Dunstan <[email protected]>; +Cc: Robert Haas <[email protected]>; Chao Li <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>
On Sat, Mar 7, 2026 at 3:51 AM Andrew Dunstan <[email protected]> wrote:
>
>
> On 2026-03-04 We 4:50 PM, Andrew Dunstan wrote:
> >
> >
> > I will take a look.
> >
>
> I'm ok, with doing it this way. It's just a bit fragile - if we add a
> test the number will be wrong. But maybe it's not worth worrying about.
>
> Everything else looks fairly good. The attached fixes a few relatively
> minor issues in v15. The main one is that it stops allocating/freeing a
> buffer every time we call read_archive_file() and instead adds a
> reusable buffer. It also adds back wal-directory as an undocumented
> alias of wal-path, to avoid breaking legacy scripts unnecessarily, and
> adds constness to the fname argument of pg_tar_compress_algorithm, as
> well as fixing some indentation and grammar issues.
>
> All in all I think we're in good shape.
Thanks for the review. I have incorporated your suggested changes,
with one exception: I skipped the buffer reallocation code in
read_archive_file(). Since we only ever issue reads of two sizes --
XLOG_BLCKSZ and READ_CHUNK_SIZE (128 KB, defined in
archive_waldump.c) -- dynamic reallocation seems unnecessary. Instead,
I moved the allocation to init_archive_reader(), which now allocates
a single READ_CHUNK_SIZE buffer up front. I also added an assertion in
read_archive_file() to ensure that no read request exceeds this
allocated capacity.
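A minimal sketch of the allocate-once approach described above. It assumes a simplified reader that copies from an in-memory byte array instead of pulling data through a decompressing archive streamer. SketchArchiveReader, sketch_init_reader(), sketch_read_chunk() and sketch_free_reader() are hypothetical names. XLOG_BLCKSZ is taken at its default of 8 kB, and READ_CHUNK_SIZE at the 128 KB stated in the message.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed sizes: default XLOG_BLCKSZ and the 128 KB READ_CHUNK_SIZE. */
#define SKETCH_XLOG_BLCKSZ      8192
#define SKETCH_READ_CHUNK_SIZE  (128 * 1024)

typedef struct SketchArchiveReader
{
    const char *src;            /* stand-in for the archive byte stream */
    size_t      src_len;
    size_t      src_pos;
    char       *buf;            /* allocated once, reused by every read */
    size_t      buf_cap;
} SketchArchiveReader;

/* Allocate the read buffer once, at the largest size any caller asks for. */
static int
sketch_init_reader(SketchArchiveReader *r, const char *src, size_t len)
{
    r->src = src;
    r->src_len = len;
    r->src_pos = 0;
    r->buf_cap = SKETCH_READ_CHUNK_SIZE;
    r->buf = malloc(r->buf_cap);
    return r->buf != NULL;
}

/* Copy up to 'count' bytes into the shared buffer; returns bytes copied. */
static size_t
sketch_read_chunk(SketchArchiveReader *r, size_t count)
{
    size_t      avail = r->src_len - r->src_pos;
    size_t      n = (count < avail) ? count : avail;

    /* Requests are only XLOG_BLCKSZ or READ_CHUNK_SIZE, so no realloc. */
    assert(count <= r->buf_cap);

    memcpy(r->buf, r->src + r->src_pos, n);
    r->src_pos += n;
    return n;
}

static void
sketch_free_reader(SketchArchiveReader *r)
{
    free(r->buf);
    r->buf = NULL;
}
```

The design relies on XLOG_BLCKSZ never exceeding READ_CHUNK_SIZE. The assertion documents that invariant rather than silently growing the buffer.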
Kindly have a look at the attached version and let me know your thoughts.
Regards,
Amul
Attachments:
[application/x-patch] v16-0001-Refactor-Move-tar-archive-parsing-into-a-common-.patch (6.7K, 2-v16-0001-Refactor-Move-tar-archive-parsing-into-a-common-.patch)
From 5f5be6940651f89ea843a6ee98eeab0087fab8fa Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Tue, 17 Feb 2026 14:51:11 +0530
Subject: [PATCH v16 01/11] Refactor: Move tar archive parsing into a common
location.
pg_basebackup and pg_verifybackup both require logic to identify tar
files and determine their compression types. Similar functionality
will be needed for pg_waldump when it gets the capability to decode
WAL files from tar archives. Moving this logic to a common location
allows for reuse and prevents code duplication.
---
src/bin/pg_basebackup/pg_basebackup.c | 36 +++++++----------------
src/bin/pg_verifybackup/pg_verifybackup.c | 12 +-------
src/common/compression.c | 30 +++++++++++++++++++
src/include/common/compression.h | 2 ++
4 files changed, 44 insertions(+), 36 deletions(-)
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index fa169a8d642..c1a4672aa6f 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1070,12 +1070,9 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
astreamer *manifest_inject_streamer = NULL;
bool inject_manifest;
bool is_tar,
- is_tar_gz,
- is_tar_lz4,
- is_tar_zstd,
is_compressed_tar;
+ pg_compress_algorithm compressed_tar_algorithm;
bool must_parse_archive;
- int archive_name_len = strlen(archive_name);
/*
* Normally, we emit the backup manifest as a separate file, but when
@@ -1084,24 +1081,13 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
*/
inject_manifest = (format == 't' && strcmp(basedir, "-") == 0 && manifest);
- /* Is this a tar archive? */
- is_tar = (archive_name_len > 4 &&
- strcmp(archive_name + archive_name_len - 4, ".tar") == 0);
-
- /* Is this a .tar.gz archive? */
- is_tar_gz = (archive_name_len > 7 &&
- strcmp(archive_name + archive_name_len - 7, ".tar.gz") == 0);
-
- /* Is this a .tar.lz4 archive? */
- is_tar_lz4 = (archive_name_len > 8 &&
- strcmp(archive_name + archive_name_len - 8, ".tar.lz4") == 0);
-
- /* Is this a .tar.zst archive? */
- is_tar_zstd = (archive_name_len > 8 &&
- strcmp(archive_name + archive_name_len - 8, ".tar.zst") == 0);
+ /* Check whether it is a tar archive and its compression type */
+ is_tar = parse_tar_compress_algorithm(archive_name,
+ &compressed_tar_algorithm);
/* Is this any kind of compressed tar? */
- is_compressed_tar = is_tar_gz || is_tar_lz4 || is_tar_zstd;
+ is_compressed_tar = (is_tar &&
+ compressed_tar_algorithm != PG_COMPRESSION_NONE);
/*
* Injecting the manifest into a compressed tar file would be possible if
@@ -1128,7 +1114,7 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
(spclocation == NULL && writerecoveryconf));
/* At present, we only know how to parse tar archives. */
- if (must_parse_archive && !is_tar && !is_compressed_tar)
+ if (must_parse_archive && !is_tar)
{
pg_log_error("cannot parse archive \"%s\"", archive_name);
pg_log_error_detail("Only tar archives can be parsed.");
@@ -1263,13 +1249,13 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
* If the user has requested a server compressed archive along with
* archive extraction at client then we need to decompress it.
*/
- if (format == 'p')
+ if (format == 'p' && is_compressed_tar)
{
- if (is_tar_gz)
+ if (compressed_tar_algorithm == PG_COMPRESSION_GZIP)
streamer = astreamer_gzip_decompressor_new(streamer);
- else if (is_tar_lz4)
+ else if (compressed_tar_algorithm == PG_COMPRESSION_LZ4)
streamer = astreamer_lz4_decompressor_new(streamer);
- else if (is_tar_zstd)
+ else if (compressed_tar_algorithm == PG_COMPRESSION_ZSTD)
streamer = astreamer_zstd_decompressor_new(streamer);
}
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index cbc9447384f..31f606c45b1 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -941,17 +941,7 @@ precheck_tar_backup_file(verifier_context *context, char *relpath,
}
/* Now, check the compression type of the tar */
- if (strcmp(suffix, ".tar") == 0)
- compress_algorithm = PG_COMPRESSION_NONE;
- else if (strcmp(suffix, ".tgz") == 0)
- compress_algorithm = PG_COMPRESSION_GZIP;
- else if (strcmp(suffix, ".tar.gz") == 0)
- compress_algorithm = PG_COMPRESSION_GZIP;
- else if (strcmp(suffix, ".tar.lz4") == 0)
- compress_algorithm = PG_COMPRESSION_LZ4;
- else if (strcmp(suffix, ".tar.zst") == 0)
- compress_algorithm = PG_COMPRESSION_ZSTD;
- else
+ if (!parse_tar_compress_algorithm(suffix, &compress_algorithm))
{
report_backup_error(context,
"file \"%s\" is not expected in a tar format backup",
diff --git a/src/common/compression.c b/src/common/compression.c
index 92cd4ec7a0d..fb27501d297 100644
--- a/src/common/compression.c
+++ b/src/common/compression.c
@@ -41,6 +41,36 @@ static int expect_integer_value(char *keyword, char *value,
static bool expect_boolean_value(char *keyword, char *value,
pg_compress_specification *result);
+/*
+ * Look up a compression algorithm by tar archive file name suffix. Returns
+ * true and sets *algorithm if the suffix is recognized; otherwise false.
+ */
+bool
+parse_tar_compress_algorithm(const char *fname, pg_compress_algorithm *algorithm)
+{
+ int fname_len = strlen(fname);
+
+ if (fname_len >= 4 &&
+ strcmp(fname + fname_len - 4, ".tar") == 0)
+ *algorithm = PG_COMPRESSION_NONE;
+ else if (fname_len >= 4 &&
+ strcmp(fname + fname_len - 4, ".tgz") == 0)
+ *algorithm = PG_COMPRESSION_GZIP;
+ else if (fname_len >= 7 &&
+ strcmp(fname + fname_len - 7, ".tar.gz") == 0)
+ *algorithm = PG_COMPRESSION_GZIP;
+ else if (fname_len >= 8 &&
+ strcmp(fname + fname_len - 8, ".tar.lz4") == 0)
+ *algorithm = PG_COMPRESSION_LZ4;
+ else if (fname_len >= 8 &&
+ strcmp(fname + fname_len - 8, ".tar.zst") == 0)
+ *algorithm = PG_COMPRESSION_ZSTD;
+ else
+ return false;
+
+ return true;
+}
+
/*
* Look up a compression algorithm by name. Returns true and sets *algorithm
* if the name is recognized. Otherwise returns false.
diff --git a/src/include/common/compression.h b/src/include/common/compression.h
index 6c745b90066..f99c747cdd3 100644
--- a/src/include/common/compression.h
+++ b/src/include/common/compression.h
@@ -41,6 +41,8 @@ typedef struct pg_compress_specification
extern void parse_compress_options(const char *option, char **algorithm,
char **detail);
+extern bool parse_tar_compress_algorithm(const char *fname,
+ pg_compress_algorithm *algorithm);
extern bool parse_compress_algorithm(char *name, pg_compress_algorithm *algorithm);
extern const char *get_compress_algorithm_name(pg_compress_algorithm algorithm);
--
2.47.1
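parse_tar_compress_algorithm() above is a chain of length-guarded suffix comparisons. As a rough sketch only, the same mapping can be written table-driven; tar_compression, has_suffix() and tar_compression_from_name() are hypothetical names standing in for pg_compress_algorithm and the real function, not PostgreSQL APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for pg_compress_algorithm. */
typedef enum
{
	TAR_NONE,
	TAR_GZIP,
	TAR_LZ4,
	TAR_ZSTD
} tar_compression;

/* True if name ends with suffix; the length check guards short names. */
static bool
has_suffix(const char *name, const char *suffix)
{
	size_t		name_len = strlen(name);
	size_t		suffix_len = strlen(suffix);

	return name_len >= suffix_len &&
		strcmp(name + name_len - suffix_len, suffix) == 0;
}

/* Table-driven variant of the if/else chain in the patch (sketch). */
static bool
tar_compression_from_name(const char *name, tar_compression *alg)
{
	static const struct
	{
		const char *suffix;
		tar_compression alg;
	}			map[] = {
		{".tar", TAR_NONE},
		{".tgz", TAR_GZIP},
		{".tar.gz", TAR_GZIP},
		{".tar.lz4", TAR_LZ4},
		{".tar.zst", TAR_ZSTD},
	};

	assert(name != NULL && alg != NULL);

	for (size_t i = 0; i < sizeof(map) / sizeof(map[0]); i++)
	{
		if (has_suffix(name, map[i].suffix))
		{
			*alg = map[i].alg;
			return true;
		}
	}
	return false;
}
```

For example, tar_compression_from_name("base.tar.zst", &alg) returns true with alg set to TAR_ZSTD, while "base.tar.bz2" is rejected. No suffix in the table is a tail of another, so the table order does not matter.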
[application/x-patch] v16-0002-Refactor-pg_waldump-Move-some-declarations-to-ne.patch (2.2K, 3-v16-0002-Refactor-pg_waldump-Move-some-declarations-to-ne.patch)
From a9a044df26e1ed14fe9eeabe5a2479e5456fa7ff Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Thu, 22 Jan 2026 10:28:32 +0530
Subject: [PATCH v16 02/11] Refactor: pg_waldump: Move some declarations to new
pg_waldump.h
This change prepares for adding a second source file in this directory
to support reading WAL from tar files. Common structures, declarations,
and functions are exported through this header file so that both
source files can use them.
---
src/bin/pg_waldump/pg_waldump.c | 9 +--------
src/bin/pg_waldump/pg_waldump.h | 25 +++++++++++++++++++++++++
2 files changed, 26 insertions(+), 8 deletions(-)
create mode 100644 src/bin/pg_waldump/pg_waldump.h
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index f3446385d6a..4b7411a6498 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -29,6 +29,7 @@
#include "common/logging.h"
#include "common/relpath.h"
#include "getopt_long.h"
+#include "pg_waldump.h"
#include "rmgrdesc.h"
#include "storage/bufpage.h"
@@ -43,14 +44,6 @@ static volatile sig_atomic_t time_to_stop = false;
static const RelFileLocator emptyRelFileLocator = {0, 0, 0};
-typedef struct XLogDumpPrivate
-{
- TimeLineID timeline;
- XLogRecPtr startptr;
- XLogRecPtr endptr;
- bool endptr_reached;
-} XLogDumpPrivate;
-
typedef struct XLogDumpConfig
{
/* display options */
diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h
new file mode 100644
index 00000000000..64a9109229e
--- /dev/null
+++ b/src/bin/pg_waldump/pg_waldump.h
@@ -0,0 +1,25 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_waldump.h - decode and display WAL
+ *
+ * Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_waldump/pg_waldump.h
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_WALDUMP_H
+#define PG_WALDUMP_H
+
+#include "access/xlogdefs.h"
+
+/* Contains the necessary information to drive WAL decoding */
+typedef struct XLogDumpPrivate
+{
+ TimeLineID timeline;
+ XLogRecPtr startptr;
+ XLogRecPtr endptr;
+ bool endptr_reached;
+} XLogDumpPrivate;
+
+#endif /* PG_WALDUMP_H */
--
2.47.1
[application/x-patch] v16-0003-Refactor-pg_waldump-Separate-logic-used-to-calcu.patch (2.4K, 4-v16-0003-Refactor-pg_waldump-Separate-logic-used-to-calcu.patch)
From 6dbd37969b3972b35ce7542cf893cfd2c38ec137 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Thu, 22 Jan 2026 10:38:16 +0530
Subject: [PATCH v16 03/11] Refactor: pg_waldump: Separate logic used to
calculate the required read size.
This refactoring prepares for an upcoming patch that adds support for
reading WAL from tar files. The logic that calculates the required read
size is moved into a separate function, required_read_len(), so that it
can be shared by the code reading regular WAL files and the code reading
WAL files from a tar archive.
---
src/bin/pg_waldump/pg_waldump.c | 43 +++++++++++++++++++++++----------
1 file changed, 30 insertions(+), 13 deletions(-)
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 4b7411a6498..958a71a01cf 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -326,6 +326,32 @@ identify_target_directory(char *directory, char *fname, int *WalSegSz)
return NULL; /* not reached */
}
+/*
+ * Returns the number of bytes to read from the page at targetPagePtr. Returns
+ * -1, and sets endptr_reached, if the requested data extends past endptr.
+ */
+static inline int
+required_read_len(XLogDumpPrivate *private, XLogRecPtr targetPagePtr,
+ int reqLen)
+{
+ int count = XLOG_BLCKSZ;
+
+ if (XLogRecPtrIsValid(private->endptr))
+ {
+ if (targetPagePtr + XLOG_BLCKSZ <= private->endptr)
+ count = XLOG_BLCKSZ;
+ else if (targetPagePtr + reqLen <= private->endptr)
+ count = private->endptr - targetPagePtr;
+ else
+ {
+ private->endptr_reached = true;
+ return -1;
+ }
+ }
+
+ return count;
+}
+
/* pg_waldump's XLogReaderRoutine->segment_open callback */
static void
WALDumpOpenSegment(XLogReaderState *state, XLogSegNo nextSegNo,
@@ -383,21 +409,12 @@ WALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
XLogRecPtr targetPtr, char *readBuff)
{
XLogDumpPrivate *private = state->private_data;
- int count = XLOG_BLCKSZ;
+ int count = required_read_len(private, targetPagePtr, reqLen);
WALReadError errinfo;
- if (XLogRecPtrIsValid(private->endptr))
- {
- if (targetPagePtr + XLOG_BLCKSZ <= private->endptr)
- count = XLOG_BLCKSZ;
- else if (targetPagePtr + reqLen <= private->endptr)
- count = private->endptr - targetPagePtr;
- else
- {
- private->endptr_reached = true;
- return -1;
- }
- }
+ /* Bail out if the end point has been reached */
+ if (count < 0)
+ return -1;
if (!WALRead(state, readBuff, targetPagePtr, count, private->timeline,
&errinfo))
--
2.47.1
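required_read_len() decides how much of the page at targetPagePtr may be read, given an optional end LSN. Below is a standalone sketch of the same decision, assuming the default XLOG_BLCKSZ of 8192 and using hypothetical names (lsn_t, read_state, sketch_required_read_len) in place of XLogRecPtr and XLogDumpPrivate. As in PostgreSQL, an LSN of 0 means "invalid", i.e. no end point was given.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BLCKSZ 8192		/* assumed XLOG_BLCKSZ (default build) */

typedef uint64_t lsn_t;			/* hypothetical stand-in for XLogRecPtr */

/* Hypothetical subset of XLogDumpPrivate. */
typedef struct
{
	lsn_t		endptr;			/* 0 means "no end point" */
	bool		endptr_reached;
} read_state;

/*
 * Mirrors required_read_len(): read a full page if it lies entirely before
 * endptr, a truncated page if at least reqLen bytes fit, else signal -1.
 */
static int
sketch_required_read_len(read_state *st, lsn_t page, int reqLen)
{
	assert(reqLen > 0 && reqLen <= SKETCH_BLCKSZ);

	if (st->endptr == 0)
		return SKETCH_BLCKSZ;
	if (page + SKETCH_BLCKSZ <= st->endptr)
		return SKETCH_BLCKSZ;
	if (page + (lsn_t) reqLen <= st->endptr)
		return (int) (st->endptr - page);

	st->endptr_reached = true;
	return -1;
}
```

With endptr 100 bytes into the page, a request needing 24 bytes yields a short read of 100 bytes, while a request needing 200 bytes returns -1 and sets endptr_reached, which is exactly the case WALDumpReadPage() bails out on.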
[application/x-patch] v16-0004-Refactor-pg_waldump-Restructure-TAP-tests.patch (6.6K, 5-v16-0004-Refactor-pg_waldump-Restructure-TAP-tests.patch)
From 805bb1a6dac9f26e5d145fa7eb84d8cb3478ec85 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Wed, 18 Feb 2026 11:07:57 +0530
Subject: [PATCH v16 04/11] Refactor: pg_waldump: Restructure TAP tests.
Restructure the tests that do not take a WAL file argument so that they
run within a loop, allowing them to be re-executed when decoding WAL
from tar archives.
== NOTE ==
This is not intended to be committed separately. It can be merged
with the main patch implementing this feature.
---
src/bin/pg_waldump/t/001_basic.pl | 140 +++++++++++++++++-------------
1 file changed, 79 insertions(+), 61 deletions(-)
diff --git a/src/bin/pg_waldump/t/001_basic.pl b/src/bin/pg_waldump/t/001_basic.pl
index 5db5d20136f..f12ba52cbfc 100644
--- a/src/bin/pg_waldump/t/001_basic.pl
+++ b/src/bin/pg_waldump/t/001_basic.pl
@@ -198,28 +198,6 @@ command_like(
],
qr/./,
'runs with start and end segment specified');
-command_fails_like(
- [ 'pg_waldump', '--path' => $node->data_dir ],
- qr/error: no start WAL location given/,
- 'path option requires start location');
-command_like(
- [
- 'pg_waldump',
- '--path' => $node->data_dir,
- '--start' => $start_lsn,
- '--end' => $end_lsn,
- ],
- qr/./,
- 'runs with path option and start and end locations');
-command_fails_like(
- [
- 'pg_waldump',
- '--path' => $node->data_dir,
- '--start' => $start_lsn,
- ],
- qr/error: error in WAL record at/,
- 'falling off the end of the WAL results in an error');
-
command_like(
[
'pg_waldump', '--quiet',
@@ -227,22 +205,16 @@ command_like(
],
qr/^$/,
'no output with --quiet option');
-command_fails_like(
- [
- 'pg_waldump', '--quiet',
- '--path' => $node->data_dir,
- '--start' => $start_lsn
- ],
- qr/error: error in WAL record at/,
- 'errors are shown with --quiet');
-
# Test for: Display a message that we're skipping data if `from`
# wasn't a pointer to the start of a record.
+sub test_pg_waldump_skip_bytes
{
+ my ($path, $startlsn, $endlsn) = @_;
+
# Construct a new LSN that is one byte past the original
# start_lsn.
- my ($part1, $part2) = split qr{/}, $start_lsn;
+ my ($part1, $part2) = split qr{/}, $startlsn;
my $lsn2 = hex $part2;
$lsn2++;
my $new_start = sprintf("%s/%X", $part1, $lsn2);
@@ -252,7 +224,8 @@ command_fails_like(
my $result = IPC::Run::run [
'pg_waldump',
'--start' => $new_start,
- $node->data_dir . '/pg_wal/' . $start_walfile
+ '--end' => $endlsn,
+ '--path' => $path,
],
'>' => \$stdout,
'2>' => \$stderr;
@@ -266,15 +239,15 @@ command_fails_like(
sub test_pg_waldump
{
local $Test::Builder::Level = $Test::Builder::Level + 1;
- my @opts = @_;
+ my ($path, $startlsn, $endlsn, @opts) = @_;
my ($stdout, $stderr);
my $result = IPC::Run::run [
'pg_waldump',
- '--path' => $node->data_dir,
- '--start' => $start_lsn,
- '--end' => $end_lsn,
+ '--start' => $startlsn,
+ '--end' => $endlsn,
+ '--path' => $path,
@opts
],
'>' => \$stdout,
@@ -288,38 +261,83 @@ sub test_pg_waldump
my @lines;
-@lines = test_pg_waldump;
-is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
+my @scenarios = (
+ {
+ 'path' => $node->data_dir
+ });
-@lines = test_pg_waldump('--limit' => 6);
-is(@lines, 6, 'limit option observed');
+for my $scenario (@scenarios)
+{
+ my $path = $scenario->{'path'};
-@lines = test_pg_waldump('--fullpage');
-is(grep(!/^rmgr:.*\bFPW\b/, @lines), 0, 'all output lines are FPW');
+ SKIP:
+ {
+ command_fails_like(
+ [ 'pg_waldump', '--path' => $path ],
+ qr/error: no start WAL location given/,
+ 'path option requires start location');
+ command_like(
+ [
+ 'pg_waldump',
+ '--path' => $path,
+ '--start' => $start_lsn,
+ '--end' => $end_lsn,
+ ],
+ qr/./,
+ 'runs with path option and start and end locations');
+ command_fails_like(
+ [
+ 'pg_waldump',
+ '--path' => $path,
+ '--start' => $start_lsn,
+ ],
+ qr/error: error in WAL record at/,
+ 'falling off the end of the WAL results in an error');
-@lines = test_pg_waldump('--stats');
-like($lines[0], qr/WAL statistics/, "statistics on stdout");
-is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+ command_fails_like(
+ [
+ 'pg_waldump', '--quiet',
+ '--path' => $path,
+ '--start' => $start_lsn
+ ],
+ qr/error: error in WAL record at/,
+ 'errors are shown with --quiet');
-@lines = test_pg_waldump('--stats=record');
-like($lines[0], qr/WAL statistics/, "statistics on stdout");
-is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+ test_pg_waldump_skip_bytes($path, $start_lsn, $end_lsn);
-@lines = test_pg_waldump('--rmgr' => 'Btree');
-is(grep(!/^rmgr: Btree/, @lines), 0, 'only Btree lines');
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn);
+ is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
-@lines = test_pg_waldump('--fork' => 'init');
-is(grep(!/fork init/, @lines), 0, 'only init fork lines');
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--limit' => 6);
+ is(@lines, 6, 'limit option observed');
-@lines = test_pg_waldump(
- '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_t1_oid");
-is(grep(!/rel $default_ts_oid\/$postgres_db_oid\/$rel_t1_oid/, @lines),
- 0, 'only lines for selected relation');
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--fullpage');
+ is(grep(!/^rmgr:.*\bFPW\b/, @lines), 0, 'all output lines are FPW');
-@lines = test_pg_waldump(
- '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_i1a_oid",
- '--block' => 1);
-is(grep(!/\bblk 1\b/, @lines), 0, 'only lines for selected block');
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--stats');
+ like($lines[0], qr/WAL statistics/, "statistics on stdout");
+ is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--stats=record');
+ like($lines[0], qr/WAL statistics/, "statistics on stdout");
+ is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--rmgr' => 'Btree');
+ is(grep(!/^rmgr: Btree/, @lines), 0, 'only Btree lines');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--fork' => 'init');
+ is(grep(!/fork init/, @lines), 0, 'only init fork lines');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn,
+ '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_t1_oid");
+ is(grep(!/rel $default_ts_oid\/$postgres_db_oid\/$rel_t1_oid/, @lines),
+ 0, 'only lines for selected relation');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn,
+ '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_i1a_oid",
+ '--block' => 1);
+ is(grep(!/\bblk 1\b/, @lines), 0, 'only lines for selected block');
+ }
+}
done_testing();
--
2.47.1
[application/x-patch] v16-0005-Refactor-pg_waldump-Move-WAL-segment-size-to-XLo.patch (5.1K, 6-v16-0005-Refactor-pg_waldump-Move-WAL-segment-size-to-XLo.patch)
From f611a208cea878c44c2f983877c949b051443602 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Wed, 4 Feb 2026 15:31:51 +0530
Subject: [PATCH v16 05/11] Refactor: pg_waldump: Move WAL segment size to
XLogDumpPrivate.
Relocate the WAL segment size variable to the XLogDumpPrivate
structure and rename it to segsize for consistency. This change is
required to make the segment size accessible to the archive streamer
code, where passing it as a function argument is not feasible.
---
src/bin/pg_waldump/pg_waldump.c | 26 +++++++++++++-------------
src/bin/pg_waldump/pg_waldump.h | 1 +
2 files changed, 14 insertions(+), 13 deletions(-)
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 958a71a01cf..5d31b15dbd8 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -811,7 +811,6 @@ main(int argc, char **argv)
XLogRecPtr first_record;
char *waldir = NULL;
char *errormsg;
- int WalSegSz;
static struct option long_options[] = {
{"bkp-details", no_argument, NULL, 'b'},
@@ -865,6 +864,7 @@ main(int argc, char **argv)
memset(&stats, 0, sizeof(XLogStats));
private.timeline = 1;
+ private.segsize = 0;
private.startptr = InvalidXLogRecPtr;
private.endptr = InvalidXLogRecPtr;
private.endptr_reached = false;
@@ -1138,18 +1138,18 @@ main(int argc, char **argv)
pg_fatal("could not open directory \"%s\": %m", waldir);
}
- waldir = identify_target_directory(waldir, fname, &WalSegSz);
+ waldir = identify_target_directory(waldir, fname, &private.segsize);
fd = open_file_in_directory(waldir, fname);
if (fd < 0)
pg_fatal("could not open file \"%s\"", fname);
close(fd);
/* parse position from file */
- XLogFromFileName(fname, &private.timeline, &segno, WalSegSz);
+ XLogFromFileName(fname, &private.timeline, &segno, private.segsize);
if (!XLogRecPtrIsValid(private.startptr))
- XLogSegNoOffsetToRecPtr(segno, 0, WalSegSz, private.startptr);
- else if (!XLByteInSeg(private.startptr, segno, WalSegSz))
+ XLogSegNoOffsetToRecPtr(segno, 0, private.segsize, private.startptr);
+ else if (!XLByteInSeg(private.startptr, segno, private.segsize))
{
pg_log_error("start WAL location %X/%08X is not inside file \"%s\"",
LSN_FORMAT_ARGS(private.startptr),
@@ -1159,7 +1159,7 @@ main(int argc, char **argv)
/* no second file specified, set end position */
if (!(optind + 1 < argc) && !XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(segno + 1, 0, WalSegSz, private.endptr);
+ XLogSegNoOffsetToRecPtr(segno + 1, 0, private.segsize, private.endptr);
/* parse ENDSEG if passed */
if (optind + 1 < argc)
@@ -1175,14 +1175,14 @@ main(int argc, char **argv)
close(fd);
/* parse position from file */
- XLogFromFileName(fname, &private.timeline, &endsegno, WalSegSz);
+ XLogFromFileName(fname, &private.timeline, &endsegno, private.segsize);
if (endsegno < segno)
pg_fatal("ENDSEG %s is before STARTSEG %s",
argv[optind + 1], argv[optind]);
if (!XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(endsegno + 1, 0, WalSegSz,
+ XLogSegNoOffsetToRecPtr(endsegno + 1, 0, private.segsize,
private.endptr);
/* set segno to endsegno for check of --end */
@@ -1190,8 +1190,8 @@ main(int argc, char **argv)
}
- if (!XLByteInSeg(private.endptr, segno, WalSegSz) &&
- private.endptr != (segno + 1) * WalSegSz)
+ if (!XLByteInSeg(private.endptr, segno, private.segsize) &&
+ private.endptr != (segno + 1) * private.segsize)
{
pg_log_error("end WAL location %X/%08X is not inside file \"%s\"",
LSN_FORMAT_ARGS(private.endptr),
@@ -1200,7 +1200,7 @@ main(int argc, char **argv)
}
}
else
- waldir = identify_target_directory(waldir, NULL, &WalSegSz);
+ waldir = identify_target_directory(waldir, NULL, &private.segsize);
/* we don't know what to print */
if (!XLogRecPtrIsValid(private.startptr))
@@ -1213,7 +1213,7 @@ main(int argc, char **argv)
/* we have everything we need, start reading */
xlogreader_state =
- XLogReaderAllocate(WalSegSz, waldir,
+ XLogReaderAllocate(private.segsize, waldir,
XL_ROUTINE(.page_read = WALDumpReadPage,
.segment_open = WALDumpOpenSegment,
.segment_close = WALDumpCloseSegment),
@@ -1234,7 +1234,7 @@ main(int argc, char **argv)
* a segment (e.g. we were used in file mode).
*/
if (first_record != private.startptr &&
- XLogSegmentOffset(private.startptr, WalSegSz) != 0)
+ XLogSegmentOffset(private.startptr, private.segsize) != 0)
pg_log_info(ngettext("first record is after %X/%08X, at %X/%08X, skipping over %u byte",
"first record is after %X/%08X, at %X/%08X, skipping over %u bytes",
(first_record - private.startptr)),
diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h
index 64a9109229e..013b051506f 100644
--- a/src/bin/pg_waldump/pg_waldump.h
+++ b/src/bin/pg_waldump/pg_waldump.h
@@ -17,6 +17,7 @@
typedef struct XLogDumpPrivate
{
TimeLineID timeline;
+ int segsize;
XLogRecPtr startptr;
XLogRecPtr endptr;
bool endptr_reached;
--
2.47.1
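Patch 0005 threads private.segsize through the usual WAL segment arithmetic. For readers less familiar with these macros, the sketch below re-derives what XLByteToSeg(), XLogSegNoOffsetToRecPtr(), XLByteInSeg() and XLogFileName() compute, plus the --end check done in main(). All function names here are hypothetical; the formulas follow access/xlog_internal.h, but this is an illustration only, not a replacement for the macros.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Segments per 4 GB "xlogid", as XLogSegmentsPerXLogId() computes. */
static uint64_t
segs_per_xlogid(int segsize)
{
	return UINT64_C(0x100000000) / (uint64_t) segsize;
}

/* XLByteToSeg(): number of the segment containing byte position ptr. */
static uint64_t
byte_to_seg(uint64_t ptr, int segsize)
{
	return ptr / (uint64_t) segsize;
}

/* XLogSegNoOffsetToRecPtr(): LSN of a byte offset within segment segno. */
static uint64_t
seg_offset_to_ptr(uint64_t segno, uint64_t offset, int segsize)
{
	return segno * (uint64_t) segsize + offset;
}

/*
 * The --end check in main(): endptr must lie inside segno (XLByteInSeg) or
 * sit exactly on the boundary where the next segment starts.
 */
static bool
end_ptr_in_file(uint64_t endptr, uint64_t segno, int segsize)
{
	return byte_to_seg(endptr, segsize) == segno ||
		endptr == seg_offset_to_ptr(segno + 1, 0, segsize);
}

/* XLogFileName(): timeline, then segno split into two 8-digit hex parts. */
static void
wal_file_name(char *buf, size_t len, uint32_t tli, uint64_t segno,
			  int segsize)
{
	assert(len >= 25);
	assert(segsize > 0 && (segsize & (segsize - 1)) == 0);
	snprintf(buf, len, "%08" PRIX32 "%08" PRIX32 "%08" PRIX32, tli,
			 (uint32_t) (segno / segs_per_xlogid(segsize)),
			 (uint32_t) (segno % segs_per_xlogid(segsize)));
	assert(strlen(buf) == 24);
}
```

With 16 MB segments, LSN 0/01000028 falls in segment 1 (file 000000010000000000000001 on timeline 1), and an --end of exactly 0/02000000 is still accepted for that file because it is the segment's end boundary.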
[application/x-patch] v16-0006-pg_waldump-Add-support-for-archived-WAL-decoding.patch (42.5K, 7-v16-0006-pg_waldump-Add-support-for-archived-WAL-decoding.patch)
From fcf25ba0adf9ed1976f1b205c11a08866defb498 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Tue, 10 Feb 2026 11:42:36 +0530
Subject: [PATCH v16 06/11] pg_waldump: Add support for archived WAL decoding.
pg_waldump can now accept the path to a tar archive containing WAL
files and decode them. This feature was added primarily for
pg_verifybackup, which previously disabled WAL parsing for
tar-formatted backups.
Note that this patch requires that the WAL files within the archive be
in sequential order; an error is reported otherwise. The next patch in
this series is intended to remove this restriction.
---
doc/src/sgml/ref/pg_waldump.sgml | 8 +-
src/bin/pg_waldump/Makefile | 7 +-
src/bin/pg_waldump/archive_waldump.c | 653 +++++++++++++++++++++++++++
src/bin/pg_waldump/meson.build | 4 +-
src/bin/pg_waldump/pg_waldump.c | 257 ++++++++---
src/bin/pg_waldump/pg_waldump.h | 45 ++
src/bin/pg_waldump/t/001_basic.pl | 105 ++++-
src/tools/pgindent/typedefs.list | 3 +
8 files changed, 1015 insertions(+), 67 deletions(-)
create mode 100644 src/bin/pg_waldump/archive_waldump.c
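The core of read_archive_wal_page() in the patch below is mapping an LSN onto the bytes currently held in a segment's buffer: the buffered range ends at the segment start plus read_len and begins buf->len bytes earlier. A simplified, self-contained sketch of that lookup follows; wal_window and copy_from_window() are hypothetical names, and the real code additionally resets the buffer and pulls more data through the archive streamer when the requested LSN is not buffered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical, simplified model of one ArchivedWALFile entry: data holds
 * the most recent buf_len bytes of the segment, and read_len is the total
 * number of bytes of this segment pulled from the archive so far.
 */
typedef struct
{
	const char *data;
	size_t		buf_len;
	uint64_t	read_len;
} wal_window;

/*
 * Copy up to want bytes starting at LSN recptr out of the window. Returns
 * the number of bytes copied; 0 means recptr is not buffered.
 */
static size_t
copy_from_window(const wal_window *w, uint64_t seg_start, uint64_t recptr,
				 char *dst, size_t want)
{
	uint64_t	end_ptr = seg_start + w->read_len;
	uint64_t	start_ptr = end_ptr - w->buf_len;
	size_t		offset;
	size_t		n;

	if (w->buf_len == 0 || recptr < start_ptr || recptr >= end_ptr)
		return 0;

	/* start_ptr <= recptr < end_ptr implies offset < buf_len */
	offset = (size_t) (recptr - start_ptr);
	assert(offset < w->buf_len);

	n = want < w->buf_len - offset ? want : w->buf_len - offset;
	memcpy(dst, w->data + offset, n);
	return n;
}
```

In the patch, a zero result corresponds to the else branch, which either fails (the entry is not the segment currently being streamed, or the archive is exhausted) or reads another chunk from the archive and retries; a short copy is continued on the next loop iteration.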
diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index d1715ff5124..15fb8d13199 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -141,13 +141,17 @@ PostgreSQL documentation
<term><option>--path=<replaceable>path</replaceable></option></term>
<listitem>
<para>
- Specifies a directory to search for WAL segment files or a
- directory with a <literal>pg_wal</literal> subdirectory that
+ Specifies a tar archive, a directory to search for WAL segment files,
+ or a directory with a <literal>pg_wal</literal> subdirectory that
contains such files. The default is to search in the current
directory, the <literal>pg_wal</literal> subdirectory of the
current directory, and the <literal>pg_wal</literal> subdirectory
of <envar>PGDATA</envar>.
</para>
+ <para>
+ If a tar archive is provided, its WAL segment files must be in
+ sequential order; otherwise, an error will be reported.
+ </para>
</listitem>
</varlistentry>
diff --git a/src/bin/pg_waldump/Makefile b/src/bin/pg_waldump/Makefile
index 4c1ee649501..aabb87566a2 100644
--- a/src/bin/pg_waldump/Makefile
+++ b/src/bin/pg_waldump/Makefile
@@ -3,6 +3,9 @@
PGFILEDESC = "pg_waldump - decode and display WAL"
PGAPPICON=win32
+# make these available to TAP test scripts
+export TAR
+
subdir = src/bin/pg_waldump
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
@@ -10,13 +13,15 @@ include $(top_builddir)/src/Makefile.global
OBJS = \
$(RMGRDESCOBJS) \
$(WIN32RES) \
+ archive_waldump.o \
compat.o \
pg_waldump.o \
rmgrdesc.o \
xlogreader.o \
xlogstats.o
-override CPPFLAGS := -DFRONTEND $(CPPFLAGS)
+override CPPFLAGS := -DFRONTEND -I$(libpq_srcdir) $(CPPFLAGS)
+LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils
RMGRDESCSOURCES = $(sort $(notdir $(wildcard $(top_srcdir)/src/backend/access/rmgrdesc/*desc*.c)))
RMGRDESCOBJS = $(patsubst %.c,%.o,$(RMGRDESCSOURCES))
diff --git a/src/bin/pg_waldump/archive_waldump.c b/src/bin/pg_waldump/archive_waldump.c
new file mode 100644
index 00000000000..0936ffc0a75
--- /dev/null
+++ b/src/bin/pg_waldump/archive_waldump.c
@@ -0,0 +1,653 @@
+/*-------------------------------------------------------------------------
+ *
+ * archive_waldump.c
+ * A generic facility for reading WAL data from tar archives via the
+ * archive streamer.
+ *
+ * Portions Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_waldump/archive_waldump.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <unistd.h>
+
+#include "access/xlog_internal.h"
+#include "common/hashfn.h"
+#include "common/logging.h"
+#include "fe_utils/simple_list.h"
+#include "pg_waldump.h"
+
+/*
+ * How many bytes should we try to read from a file at once?
+ */
+#define READ_CHUNK_SIZE (128 * 1024)
+
+/*
+ * Check if the start segment number is zero; this indicates a request to read
+ * any WAL file.
+ */
+#define READ_ANY_WAL(privateInfo) ((privateInfo)->start_segno == 0)
+
+/*
+ * Hash entry representing a WAL segment retrieved from the archive.
+ *
+ * While WAL segments are typically read sequentially, individual entries
+ * maintain their own buffers for the following reasons:
+ *
+ * 1. Boundary Handling: The archive streamer provides a continuous byte
+ * stream. A single streaming chunk may contain the end of one WAL segment
+ * and the start of the next. Separate buffers allow us to easily
+ * partition and track these bytes by their respective segments.
+ *
+ * 2. Out-of-Order Support: Dedicated buffers simplify logic if segments
+ * are ever archived or retrieved out of sequence.
+ *
+ * To minimize the memory footprint, entries and their associated buffers are
+ * freed immediately once consumed. Since pg_waldump does not request the same
+ * bytes twice, a segment is discarded as soon as it moves past it.
+ */
+typedef struct ArchivedWALFile
+{
+ uint32 status; /* hash status */
+ const char *fname; /* hash key: WAL segment name */
+
+ StringInfo buf; /* holds WAL bytes read from archive */
+
+ int read_len; /* total bytes of this segment read from archive */
+} ArchivedWALFile;
+
+static uint32 hash_string_pointer(const char *s);
+#define SH_PREFIX ArchivedWAL
+#define SH_ELEMENT_TYPE ArchivedWALFile
+#define SH_KEY_TYPE const char *
+#define SH_KEY fname
+#define SH_HASH_KEY(tb, key) hash_string_pointer(key)
+#define SH_EQUAL(tb, a, b) (strcmp(a, b) == 0)
+#define SH_SCOPE static inline
+#define SH_RAW_ALLOCATOR pg_malloc0
+#define SH_DECLARE
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+typedef struct astreamer_waldump
+{
+ astreamer base;
+ XLogDumpPrivate *privateInfo;
+} astreamer_waldump;
+
+static ArchivedWALFile *get_archive_wal_entry(const char *fname,
+ XLogDumpPrivate *privateInfo,
+ int WalSegSz);
+static int read_archive_file(XLogDumpPrivate *privateInfo, Size count);
+
+static astreamer *astreamer_waldump_new(XLogDumpPrivate *privateInfo);
+static void astreamer_waldump_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_waldump_finalize(astreamer *streamer);
+static void astreamer_waldump_free(astreamer *streamer);
+
+static bool member_is_wal_file(astreamer_waldump *mystreamer,
+ astreamer_member *member,
+ char **fname);
+
+static const astreamer_ops astreamer_waldump_ops = {
+ .content = astreamer_waldump_content,
+ .finalize = astreamer_waldump_finalize,
+ .free = astreamer_waldump_free
+};
+
+/*
+ * Initializes the tar archive reader, creates a hash table for WAL entries,
+ * checks for existing valid WAL segments in the archive file and retrieves the
+ * segment size, and sets up filters for relevant entries.
+ */
+void
+init_archive_reader(XLogDumpPrivate *privateInfo, const char *waldir,
+ int *WalSegSz, pg_compress_algorithm compression)
+{
+ int fd;
+ astreamer *streamer;
+ ArchivedWALFile *entry = NULL;
+ XLogLongPageHeader longhdr;
+ XLogSegNo segno;
+ TimeLineID timeline;
+
+ /* Open tar archive and store its file descriptor */
+ fd = open_file_in_directory(waldir, privateInfo->archive_name);
+
+ if (fd < 0)
+ pg_fatal("could not open file \"%s\"", privateInfo->archive_name);
+
+ privateInfo->archive_fd = fd;
+
+ streamer = astreamer_waldump_new(privateInfo);
+
+ /* Before that we must parse the tar archive. */
+ streamer = astreamer_tar_parser_new(streamer);
+
+ /* Before that we must decompress, if the archive is compressed. */
+ if (compression == PG_COMPRESSION_GZIP)
+ streamer = astreamer_gzip_decompressor_new(streamer);
+ else if (compression == PG_COMPRESSION_LZ4)
+ streamer = astreamer_lz4_decompressor_new(streamer);
+ else if (compression == PG_COMPRESSION_ZSTD)
+ streamer = astreamer_zstd_decompressor_new(streamer);
+
+ privateInfo->archive_streamer = streamer;
+
+ /*
+ * Allocate a buffer for reading the archive file to facilitate content
+ * decoding; read requests must not exceed the allocated buffer size.
+ */
+ privateInfo->archive_read_buf = pg_malloc(READ_CHUNK_SIZE);
+ privateInfo->archive_read_buf_size = READ_CHUNK_SIZE;
+
+ /*
+ * Create a hash table, with an arbitrary initial size, to store the WAL
+ * entries read from the archive.
+ */
+ privateInfo->archive_wal_htab = ArchivedWAL_create(8, NULL);
+
+ /*
+ * Verify that the archive contains valid WAL files and fetch the WAL
+ * segment size.
+ */
+ while (entry == NULL || entry->buf->len < XLOG_BLCKSZ)
+ {
+ if (read_archive_file(privateInfo, XLOG_BLCKSZ) == 0)
+ pg_fatal("could not find WAL in archive \"%s\"",
+ privateInfo->archive_name);
+
+ entry = privateInfo->cur_file;
+ }
+
+ /* Set WalSegSz if WAL data is successfully read */
+ longhdr = (XLogLongPageHeader) entry->buf->data;
+
+ if (!IsValidWalSegSize(longhdr->xlp_seg_size))
+ {
+ pg_log_error(ngettext("invalid WAL segment size in WAL file from archive \"%s\" (%d byte)",
+ "invalid WAL segment size in WAL file from archive \"%s\" (%d bytes)",
+ longhdr->xlp_seg_size),
+ privateInfo->archive_name, longhdr->xlp_seg_size);
+ pg_log_error_detail("The WAL segment size must be a power of two between 1 MB and 1 GB.");
+ exit(1);
+ }
+
+ *WalSegSz = longhdr->xlp_seg_size;
+
+ /*
+ * With the WAL segment size available, we can now initialize the
+ * dependent start and end segment numbers.
+ */
+ Assert(XLogRecPtrIsValid(privateInfo->startptr));
+ XLByteToSeg(privateInfo->startptr, privateInfo->start_segno, *WalSegSz);
+
+ if (XLogRecPtrIsValid(privateInfo->endptr))
+ XLByteToSeg(privateInfo->endptr, privateInfo->end_segno, *WalSegSz);
+
+ /*
+ * This WAL segment was fetched before the filtering parameters
+ * (start_segno and end_segno) were fully initialized. Perform the
+ * relevance check against the user-provided range now; if the WAL falls
+ * outside this range, remove it from the hash table. Subsequent WAL will
+ * be filtered automatically by the archive streamer using the updated
+ * start_segno and end_segno values.
+ */
+ XLogFromFileName(entry->fname, &timeline, &segno, *WalSegSz);
+ if (privateInfo->timeline != timeline ||
+ privateInfo->start_segno > segno ||
+ privateInfo->end_segno < segno)
+ free_archive_wal_entry(entry->fname, privateInfo);
+}
+
+/*
+ * Release the archive streamer chain and close the archive file.
+ */
+void
+free_archive_reader(XLogDumpPrivate *privateInfo)
+{
+ /*
+ * NB: Normally, astreamer_finalize() is called before astreamer_free() to
+ * flush any remaining buffered data or to ensure the end of the tar
+ * archive is reached. However, when decoding a WAL file, once we hit the
+ * end LSN, any remaining WAL data in the buffer or the tar archive's
+ * unreached end can be safely ignored.
+ */
+ astreamer_free(privateInfo->archive_streamer);
+
+ /* Free the reusable read buffer. */
+ if (privateInfo->archive_read_buf != NULL)
+ {
+ pg_free(privateInfo->archive_read_buf);
+ privateInfo->archive_read_buf = NULL;
+ }
+
+ /* Close the file. */
+ if (close(privateInfo->archive_fd) != 0)
+ pg_log_error("could not close file \"%s\": %m",
+ privateInfo->archive_name);
+}
+
+/*
+ * Copies count bytes of WAL starting at targetPagePtr into readBuff,
+ * fetching more data from the tar archive via astreamer when needed.
+ */
+int
+read_archive_wal_page(XLogDumpPrivate *privateInfo, XLogRecPtr targetPagePtr,
+ Size count, char *readBuff, int WalSegSz)
+{
+ char *p = readBuff;
+ Size nbytes = count;
+ XLogRecPtr recptr = targetPagePtr;
+ XLogSegNo segno;
+ char fname[MAXFNAMELEN];
+ ArchivedWALFile *entry;
+
+ /* Identify the segment and locate its entry in the archive hash */
+ XLByteToSeg(targetPagePtr, segno, WalSegSz);
+ XLogFileName(fname, privateInfo->timeline, segno, WalSegSz);
+ entry = get_archive_wal_entry(fname, privateInfo, WalSegSz);
+
+ while (nbytes > 0)
+ {
+ char *buf = entry->buf->data;
+ int bufLen = entry->buf->len;
+ XLogRecPtr endPtr;
+ XLogRecPtr startPtr;
+
+ /* Calculate the LSN range currently residing in the buffer */
+ XLogSegNoOffsetToRecPtr(segno, entry->read_len, WalSegSz, endPtr);
+ startPtr = endPtr - bufLen;
+
+ /*
+ * Copy the requested WAL data if it is present in the buffer.
+ */
+ if (bufLen > 0 && startPtr <= recptr && recptr < endPtr)
+ {
+ int copyBytes;
+ int offset = recptr - startPtr;
+
+ /*
+ * Given startPtr <= recptr < endPtr and a total buffer size
+ * 'bufLen', the offset (recptr - startPtr) will always be less
+ * than 'bufLen'.
+ */
+ Assert(offset < bufLen);
+
+ copyBytes = Min(nbytes, bufLen - offset);
+ memcpy(p, buf + offset, copyBytes);
+
+ /* Update state for read */
+ recptr += copyBytes;
+ nbytes -= copyBytes;
+ p += copyBytes;
+ }
+ else
+ {
+ /*
+ * Before starting the actual decoding loop, pg_waldump tries to
+ * locate the first valid record from the user-specified start
+ * position, which might not be the start of a WAL record and
+ * could fall in the middle of a record that spans multiple pages.
+ * Consequently, the valid start position the decoder is looking
+ * for could be far away from that initial position.
+ *
+ * This may involve reading across multiple pages, and this
+ * pre-reading fetches data in multiple rounds from the archive
+ * streamer; normally, we would throw away existing buffer
+ * contents to fetch the next set of data, but that existing data
+ * might be needed once the main loop starts. Because previously
+ * read data cannot be re-read by the archive streamer, we delay
+ * resetting the buffer until the main decoding loop is entered.
+ *
+ * Once pg_waldump has entered the main loop, it may re-read the
+ * currently active page, but never an older one; therefore, any
+ * fully consumed WAL data preceding the current page can then be
+ * safely discarded.
+ */
+ if (privateInfo->decoding_started)
+ {
+ resetStringInfo(entry->buf);
+
+ /*
+ * Push the partial data already copied for the current page back
+ * into the buffer, so that the full page remains available for
+ * re-reading if requested.
+ */
+ if (p > readBuff)
+ {
+ Assert((count - nbytes) > 0);
+ appendBinaryStringInfo(entry->buf, readBuff, count - nbytes);
+ }
+ }
+
+ /*
+ * Now, fetch more data; raise an error if it's not the current
+ * segment being read by the archive streamer or if reading of the
+ * archived file has finished.
+ */
+ if (privateInfo->cur_file != entry ||
+ read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
+ pg_fatal("could not read file \"%s\" from archive \"%s\": read %lld of %lld",
+ fname, privateInfo->archive_name,
+ (long long int) (count - nbytes),
+ (long long int) count);
+ }
+ }
+
+ /*
+ * Should have successfully read all the requested bytes or reported a
+ * failure before this point.
+ */
+ Assert(nbytes == 0);
+
+ /*
+ * NB: We return the fixed value provided as input. We could return a
+ * boolean since we either successfully read the WAL page or raise an
+ * error, but the caller expects this value to be returned. The routine
+ * that reads WAL pages from the physical WAL file follows the same
+ * convention.
+ */
+ return count;
+}
+
+/*
+ * Clears the buffer of a WAL entry that is being ignored. This frees up memory
+ * and prevents the accumulation of irrelevant WAL data. Additionally,
+ * conditionally setting cur_file within privateInfo to NULL ensures the
+ * archive streamer skips unnecessary copy operations.
+ */
+void
+free_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo)
+{
+ ArchivedWALFile *entry;
+
+ entry = ArchivedWAL_lookup(privateInfo->archive_wal_htab, fname);
+
+ if (entry == NULL)
+ return;
+
+ /* Destroy the buffer */
+ destroyStringInfo(entry->buf);
+ entry->buf = NULL;
+
+ /* Set cur_file to NULL if it matches the entry being ignored */
+ if (privateInfo->cur_file == entry)
+ privateInfo->cur_file = NULL;
+
+ ArchivedWAL_delete_item(privateInfo->archive_wal_htab, entry);
+}
+
+/*
+ * Returns the archived WAL entry from the hash table if it exists. Otherwise,
+ * it invokes the routine to read the archived file, which then populates the
+ * entry in the hash table if that WAL exists in the archive.
+ */
+static ArchivedWALFile *
+get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo,
+ int WalSegSz)
+{
+ ArchivedWALFile *entry = NULL;
+
+ /* Search hash table */
+ entry = ArchivedWAL_lookup(privateInfo->archive_wal_htab, fname);
+
+ if (entry != NULL)
+ return entry;
+
+ /*
+ * The requested WAL entry has not been read from the archive yet; invoke
+ * the archive streamer to read it.
+ */
+ while (1)
+ {
+ /* Fetch more data */
+ if (read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
+ break; /* archive file ended */
+
+ /*
+ * The archive streamer is reading a non-WAL file or an irrelevant
+ * WAL file.
+ */
+ if (privateInfo->cur_file == NULL)
+ continue;
+
+ entry = privateInfo->cur_file;
+
+ /* Found the required entry */
+ if (strcmp(fname, entry->fname) == 0)
+ return entry;
+
+ /* WAL segments must be archived in order */
+ pg_log_error("WAL files are not archived in sequential order");
+ pg_log_error_detail("Expecting segment \"%s\" but found \"%s\".",
+ fname, entry->fname);
+ exit(1);
+ }
+
+ /* Requested WAL segment not found */
+ pg_fatal("could not find WAL \"%s\" in archive \"%s\"",
+ fname, privateInfo->archive_name);
+}
+
+/*
+ * Reads up to 'count' bytes from the archive file and passes them to the
+ * archive streamer for decompression (if required) and tar parsing.
+ */
+static int
+read_archive_file(XLogDumpPrivate *privateInfo, Size count)
+{
+ int rc;
+
+ /* The read request must not exceed the allocated buffer size. */
+ Assert(privateInfo->archive_read_buf_size >= count);
+
+ rc = read(privateInfo->archive_fd, privateInfo->archive_read_buf, count);
+ if (rc < 0)
+ pg_fatal("could not read file \"%s\": %m",
+ privateInfo->archive_name);
+
+ /*
+ * Decompress (if required), and then parse the previously read contents
+ * of the tar file.
+ */
+ if (rc > 0)
+ astreamer_content(privateInfo->archive_streamer, NULL,
+ privateInfo->archive_read_buf, rc,
+ ASTREAMER_UNKNOWN);
+
+ return rc;
+}
+
+/*
+ * Create an astreamer that can read WAL from a tar file.
+ */
+static astreamer *
+astreamer_waldump_new(XLogDumpPrivate *privateInfo)
+{
+ astreamer_waldump *streamer;
+
+ streamer = palloc0_object(astreamer_waldump);
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_waldump_ops;
+
+ streamer->privateInfo = privateInfo;
+
+ return &streamer->base;
+}
+
+/*
+ * Main entry point of the archive streamer for reading WAL data from a tar
+ * file. If a member is identified as a valid WAL file, a hash entry is created
+ * for it, and its contents are copied into that entry's buffer, making them
+ * accessible to the decoding routine.
+ */
+static void
+astreamer_waldump_content(astreamer *streamer, astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
+{
+ astreamer_waldump *mystreamer = (astreamer_waldump *) streamer;
+ XLogDumpPrivate *privateInfo = mystreamer->privateInfo;
+
+ Assert(context != ASTREAMER_UNKNOWN);
+
+ switch (context)
+ {
+ case ASTREAMER_MEMBER_HEADER:
+ {
+ char *fname = NULL;
+ ArchivedWALFile *entry;
+ bool found;
+
+ pg_log_debug("reading \"%s\"", member->pathname);
+
+ if (!member_is_wal_file(mystreamer, member, &fname))
+ break;
+
+ /*
+ * Further checks are skipped if any WAL file can be read.
+ * This typically occurs during initial verification.
+ */
+ if (!READ_ANY_WAL(privateInfo))
+ {
+ XLogSegNo segno;
+ TimeLineID timeline;
+
+ /*
+ * Skip the segment if the timeline does not match or if
+ * it falls outside the caller-specified range.
+ */
+ XLogFromFileName(fname, &timeline, &segno, privateInfo->segsize);
+ if (privateInfo->timeline != timeline ||
+ privateInfo->start_segno > segno ||
+ privateInfo->end_segno < segno)
+ {
+ pfree(fname);
+ break;
+ }
+ }
+
+ entry = ArchivedWAL_insert(privateInfo->archive_wal_htab,
+ fname, &found);
+
+ /*
+ * Shouldn't happen, but if it does, simply ignore the
+ * duplicate WAL file.
+ */
+ if (found)
+ {
+ pg_log_warning("ignoring duplicate WAL \"%s\" found in archive \"%s\"",
+ member->pathname, privateInfo->archive_name);
+ pfree(fname);
+ break;
+ }
+
+ entry->buf = makeStringInfo();
+ entry->read_len = 0;
+ privateInfo->cur_file = entry;
+ }
+ break;
+
+ case ASTREAMER_MEMBER_CONTENTS:
+ if (privateInfo->cur_file)
+ {
+ appendBinaryStringInfo(privateInfo->cur_file->buf, data, len);
+ privateInfo->cur_file->read_len += len;
+ }
+ break;
+
+ case ASTREAMER_MEMBER_TRAILER:
+ privateInfo->cur_file = NULL;
+ break;
+
+ case ASTREAMER_ARCHIVE_TRAILER:
+ break;
+
+ default:
+ /* Shouldn't happen. */
+ pg_fatal("unexpected state while parsing tar file");
+ }
+}
+
+/*
+ * End-of-stream processing for an astreamer_waldump stream.
+ */
+static void
+astreamer_waldump_finalize(astreamer *streamer)
+{
+ Assert(streamer->bbs_next == NULL);
+}
+
+/*
+ * Free memory associated with an astreamer_waldump stream.
+ */
+static void
+astreamer_waldump_free(astreamer *streamer)
+{
+ Assert(streamer->bbs_next == NULL);
+ pfree(streamer);
+}
+
+/*
+ * Returns true if the archive member name matches the WAL naming format. If
+ * successful, it also outputs the WAL segment name.
+ */
+static bool
+member_is_wal_file(astreamer_waldump *mystreamer, astreamer_member *member,
+ char **fname)
+{
+ int pathlen;
+ char pathname[MAXPGPATH];
+ char *filename;
+
+ /* We are only interested in normal files. */
+ if (member->is_directory || member->is_link)
+ return false;
+
+ if (strlen(member->pathname) < XLOG_FNAME_LEN)
+ return false;
+
+ /*
+ * For a correct comparison, we must remove any '.' or '..' components
+ * from the member pathname. Similar to member_verify_header(), we prepend
+ * './' to the path so that canonicalize_path() can properly resolve and
+ * strip these references from the tar member name.
+ */
+ snprintf(pathname, MAXPGPATH, "./%s", member->pathname);
+ canonicalize_path(pathname);
+ pathlen = strlen(pathname);
+
+ /* WAL files from the top-level or pg_wal directory will be decoded */
+ if (pathlen > XLOG_FNAME_LEN &&
+ strncmp(pathname, XLOGDIR, strlen(XLOGDIR)) != 0)
+ return false;
+
+ /* The WAL file name may be preceded by a directory path */
+ filename = pathname + (pathlen - XLOG_FNAME_LEN);
+ if (!IsXLogFileName(filename))
+ return false;
+
+ *fname = pnstrdup(filename, XLOG_FNAME_LEN);
+
+ return true;
+}
+
+/*
+ * Hash function for the archived-WAL hash table (keyed by segment name).
+ */
+static uint32
+hash_string_pointer(const char *s)
+{
+ unsigned char *ss = (unsigned char *) s;
+
+ return hash_bytes(ss, strlen(s));
+}
diff --git a/src/bin/pg_waldump/meson.build b/src/bin/pg_waldump/meson.build
index 633a9874bb5..5296f21b82c 100644
--- a/src/bin/pg_waldump/meson.build
+++ b/src/bin/pg_waldump/meson.build
@@ -1,6 +1,7 @@
# Copyright (c) 2022-2026, PostgreSQL Global Development Group
pg_waldump_sources = files(
+ 'archive_waldump.c',
'compat.c',
'pg_waldump.c',
'rmgrdesc.c',
@@ -18,7 +19,7 @@ endif
pg_waldump = executable('pg_waldump',
pg_waldump_sources,
- dependencies: [frontend_code, lz4, zstd],
+ dependencies: [frontend_code, libpq, lz4, zstd],
c_args: ['-DFRONTEND'], # needed for xlogreader et al
kwargs: default_bin_args,
)
@@ -29,6 +30,7 @@ tests += {
'sd': meson.current_source_dir(),
'bd': meson.current_build_dir(),
'tap': {
+ 'env': {'TAR': tar.found() ? tar.full_path() : ''},
'tests': [
't/001_basic.pl',
't/002_save_fullpage.pl',
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 5d31b15dbd8..f0b8116ff14 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -176,7 +176,7 @@ split_path(const char *path, char **dir, char **fname)
*
* return a read only fd
*/
-static int
+int
open_file_in_directory(const char *directory, const char *fname)
{
int fd = -1;
@@ -440,6 +440,81 @@ WALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
return count;
}
+/*
+ * pg_waldump's XLogReaderRoutine->segment_open callback to support dumping WAL
+ * files from tar archives.
+ */
+static void
+TarWALDumpOpenSegment(XLogReaderState *state, XLogSegNo nextSegNo,
+ TimeLineID *tli_p)
+{
+ /* No action needed */
+}
+
+/*
+ * pg_waldump's XLogReaderRoutine->segment_close callback.
+ */
+static void
+TarWALDumpCloseSegment(XLogReaderState *state)
+{
+ /* No action needed */
+}
+
+/*
+ * pg_waldump's XLogReaderRoutine->page_read callback to support dumping WAL
+ * files from tar archives.
+ */
+static int
+TarWALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
+ XLogRecPtr targetPtr, char *readBuff)
+{
+ XLogDumpPrivate *private = state->private_data;
+ int count = required_read_len(private, targetPagePtr, reqLen);
+ int WalSegSz = state->segcxt.ws_segsize;
+ XLogSegNo curSegNo;
+
+ /* Bail out if the count to be read is not valid */
+ if (count < 0)
+ return -1;
+
+ /*
+ * If the target page is in a different segment, free the buffer space
+ * occupied by the previous segment's data. Since pg_waldump reads WAL
+ * sequentially and never returns to an earlier segment, moving to a new
+ * segment implies that the previous segment will not be needed again.
+ */
+ curSegNo = state->seg.ws_segno;
+ if (!XLByteInSeg(targetPagePtr, curSegNo, WalSegSz))
+ {
+ char fname[MAXFNAMELEN];
+ XLogSegNo nextSegNo;
+
+ /*
+ * Calculate the next WAL segment to be decoded from the given page
+ * pointer.
+ */
+ XLByteToSeg(targetPagePtr, nextSegNo, WalSegSz);
+ state->seg.ws_tli = private->timeline;
+ state->seg.ws_segno = nextSegNo;
+
+ /*
+ * If in pre-reading mode (prior to actual decoding), do not delete
+ * any entries that might be requested again once the decoding loop
+ * starts. For more details, see the comments in
+ * read_archive_wal_page().
+ */
+ if (private->decoding_started && curSegNo < nextSegNo)
+ {
+ XLogFileName(fname, state->seg.ws_tli, curSegNo, WalSegSz);
+ free_archive_wal_entry(fname, private);
+ }
+ }
+
+ /* Read the WAL page from the archive streamer */
+ return read_archive_wal_page(private, targetPagePtr, count, readBuff,
+ WalSegSz);
+}
+
/*
* Boolean to return whether the given WAL record matches a specific relation
* and optionally block.
@@ -777,8 +852,8 @@ usage(void)
printf(_(" -F, --fork=FORK only show records that modify blocks in fork FORK;\n"
" valid names are main, fsm, vm, init\n"));
printf(_(" -n, --limit=N number of records to display\n"));
- printf(_(" -p, --path=PATH directory in which to find WAL segment files or a\n"
- " directory with a ./pg_wal that contains such files\n"
+ printf(_(" -p, --path=PATH tar archive or a directory in which to find WAL segment files or\n"
+ " a directory with a ./pg_wal that contains such files\n"
" (default: current directory, ./pg_wal, $PGDATA/pg_wal)\n"));
printf(_(" -q, --quiet do not print any output, except for errors\n"));
printf(_(" -r, --rmgr=RMGR only show records generated by resource manager RMGR;\n"
@@ -810,7 +885,9 @@ main(int argc, char **argv)
XLogRecord *record;
XLogRecPtr first_record;
char *waldir = NULL;
+ char *walpath = NULL;
char *errormsg;
+ pg_compress_algorithm compression = PG_COMPRESSION_NONE;
static struct option long_options[] = {
{"bkp-details", no_argument, NULL, 'b'},
@@ -868,6 +945,10 @@ main(int argc, char **argv)
private.startptr = InvalidXLogRecPtr;
private.endptr = InvalidXLogRecPtr;
private.endptr_reached = false;
+ private.decoding_started = false;
+ private.archive_name = NULL;
+ private.start_segno = 0;
+ private.end_segno = UINT64_MAX;
config.quiet = false;
config.bkp_details = false;
@@ -943,7 +1024,7 @@ main(int argc, char **argv)
}
break;
case 'p':
- waldir = pg_strdup(optarg);
+ walpath = pg_strdup(optarg);
break;
case 'q':
config.quiet = true;
@@ -1107,12 +1188,21 @@ main(int argc, char **argv)
goto bad_argument;
}
- if (waldir != NULL)
+ if (walpath != NULL)
{
+ /* validate path points to tar archive */
+ if (parse_tar_compress_algorithm(walpath, &compression))
+ {
+ char *fname = NULL;
+
+ split_path(walpath, &waldir, &fname);
+
+ private.archive_name = fname;
+ }
/* validate path points to directory */
- if (!verify_directory(waldir))
+ else if (!verify_directory(walpath))
{
- pg_log_error("could not open directory \"%s\": %m", waldir);
+ pg_log_error("could not open directory \"%s\": %m", walpath);
goto bad_argument;
}
}
@@ -1128,6 +1218,17 @@ main(int argc, char **argv)
int fd;
XLogSegNo segno;
+ /*
+ * If a tar archive is passed using the --path option, all other
+ * arguments become unnecessary.
+ */
+ if (private.archive_name)
+ {
+ pg_log_error("unnecessary command-line arguments specified with tar archive (first is \"%s\")",
+ argv[optind]);
+ goto bad_argument;
+ }
+
split_path(argv[optind], &directory, &fname);
if (waldir == NULL && directory != NULL)
@@ -1138,69 +1239,75 @@ main(int argc, char **argv)
pg_fatal("could not open directory \"%s\": %m", waldir);
}
- waldir = identify_target_directory(waldir, fname, &private.segsize);
- fd = open_file_in_directory(waldir, fname);
- if (fd < 0)
- pg_fatal("could not open file \"%s\"", fname);
- close(fd);
-
- /* parse position from file */
- XLogFromFileName(fname, &private.timeline, &segno, private.segsize);
-
- if (!XLogRecPtrIsValid(private.startptr))
- XLogSegNoOffsetToRecPtr(segno, 0, private.segsize, private.startptr);
- else if (!XLByteInSeg(private.startptr, segno, private.segsize))
+ if (fname != NULL && parse_tar_compress_algorithm(fname, &compression))
{
- pg_log_error("start WAL location %X/%08X is not inside file \"%s\"",
- LSN_FORMAT_ARGS(private.startptr),
- fname);
- goto bad_argument;
+ private.archive_name = fname;
}
-
- /* no second file specified, set end position */
- if (!(optind + 1 < argc) && !XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(segno + 1, 0, private.segsize, private.endptr);
-
- /* parse ENDSEG if passed */
- if (optind + 1 < argc)
+ else
{
- XLogSegNo endsegno;
-
- /* ignore directory, already have that */
- split_path(argv[optind + 1], &directory, &fname);
-
+ waldir = identify_target_directory(waldir, fname, &private.segsize);
fd = open_file_in_directory(waldir, fname);
if (fd < 0)
pg_fatal("could not open file \"%s\"", fname);
close(fd);
/* parse position from file */
- XLogFromFileName(fname, &private.timeline, &endsegno, private.segsize);
+ XLogFromFileName(fname, &private.timeline, &segno, private.segsize);
- if (endsegno < segno)
- pg_fatal("ENDSEG %s is before STARTSEG %s",
- argv[optind + 1], argv[optind]);
+ if (!XLogRecPtrIsValid(private.startptr))
+ XLogSegNoOffsetToRecPtr(segno, 0, private.segsize, private.startptr);
+ else if (!XLByteInSeg(private.startptr, segno, private.segsize))
+ {
+ pg_log_error("start WAL location %X/%08X is not inside file \"%s\"",
+ LSN_FORMAT_ARGS(private.startptr),
+ fname);
+ goto bad_argument;
+ }
- if (!XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(endsegno + 1, 0, private.segsize,
- private.endptr);
+ /* no second file specified, set end position */
+ if (!(optind + 1 < argc) && !XLogRecPtrIsValid(private.endptr))
+ XLogSegNoOffsetToRecPtr(segno + 1, 0, private.segsize, private.endptr);
- /* set segno to endsegno for check of --end */
- segno = endsegno;
- }
+ /* parse ENDSEG if passed */
+ if (optind + 1 < argc)
+ {
+ XLogSegNo endsegno;
+ /* ignore directory, already have that */
+ split_path(argv[optind + 1], &directory, &fname);
- if (!XLByteInSeg(private.endptr, segno, private.segsize) &&
- private.endptr != (segno + 1) * private.segsize)
- {
- pg_log_error("end WAL location %X/%08X is not inside file \"%s\"",
- LSN_FORMAT_ARGS(private.endptr),
- argv[argc - 1]);
- goto bad_argument;
+ fd = open_file_in_directory(waldir, fname);
+ if (fd < 0)
+ pg_fatal("could not open file \"%s\"", fname);
+ close(fd);
+
+ /* parse position from file */
+ XLogFromFileName(fname, &private.timeline, &endsegno, private.segsize);
+
+ if (endsegno < segno)
+ pg_fatal("ENDSEG %s is before STARTSEG %s",
+ argv[optind + 1], argv[optind]);
+
+ if (!XLogRecPtrIsValid(private.endptr))
+ XLogSegNoOffsetToRecPtr(endsegno + 1, 0, private.segsize,
+ private.endptr);
+
+ /* set segno to endsegno for check of --end */
+ segno = endsegno;
+ }
+
+ if (!XLByteInSeg(private.endptr, segno, private.segsize) &&
+ private.endptr != (segno + 1) * private.segsize)
+ {
+ pg_log_error("end WAL location %X/%08X is not inside file \"%s\"",
+ LSN_FORMAT_ARGS(private.endptr),
+ argv[argc - 1]);
+ goto bad_argument;
+ }
}
}
- else
- waldir = identify_target_directory(waldir, NULL, &private.segsize);
+ else if (!private.archive_name)
+ waldir = identify_target_directory(walpath, NULL, &private.segsize);
/* we don't know what to print */
if (!XLogRecPtrIsValid(private.startptr))
@@ -1212,12 +1319,36 @@ main(int argc, char **argv)
/* done with argument parsing, do the actual work */
/* we have everything we need, start reading */
- xlogreader_state =
- XLogReaderAllocate(private.segsize, waldir,
- XL_ROUTINE(.page_read = WALDumpReadPage,
- .segment_open = WALDumpOpenSegment,
- .segment_close = WALDumpCloseSegment),
- &private);
+ if (private.archive_name)
+ {
+ /*
+ * A NULL WAL directory indicates that the archive file is located in
+ * pg_waldump's current working directory.
+ */
+ if (waldir == NULL)
+ waldir = pg_strdup(".");
+
+ /* Set up for reading tar file */
+ init_archive_reader(&private, waldir, &private.segsize, compression);
+
+ /* Routine to decode WAL files in tar archive */
+ xlogreader_state =
+ XLogReaderAllocate(private.segsize, waldir,
+ XL_ROUTINE(.page_read = TarWALDumpReadPage,
+ .segment_open = TarWALDumpOpenSegment,
+ .segment_close = TarWALDumpCloseSegment),
+ &private);
+ }
+ else
+ {
+ xlogreader_state =
+ XLogReaderAllocate(private.segsize, waldir,
+ XL_ROUTINE(.page_read = WALDumpReadPage,
+ .segment_open = WALDumpOpenSegment,
+ .segment_close = WALDumpCloseSegment),
+ &private);
+ }
+
if (!xlogreader_state)
pg_fatal("out of memory while allocating a WAL reading processor");
@@ -1245,6 +1376,9 @@ main(int argc, char **argv)
if (config.stats == true && !config.quiet)
stats.startptr = first_record;
+ /* Flag indicating that the decoding loop has been entered */
+ private.decoding_started = true;
+
for (;;)
{
if (time_to_stop)
@@ -1326,6 +1460,9 @@ main(int argc, char **argv)
XLogReaderFree(xlogreader_state);
+ if (private.archive_name)
+ free_archive_reader(&private);
+
return EXIT_SUCCESS;
bad_argument:
diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h
index 013b051506f..62054bc74c0 100644
--- a/src/bin/pg_waldump/pg_waldump.h
+++ b/src/bin/pg_waldump/pg_waldump.h
@@ -12,6 +12,11 @@
#define PG_WALDUMP_H
#include "access/xlogdefs.h"
+#include "fe_utils/astreamer.h"
+
+/* Forward declarations */
+struct ArchivedWALFile;
+struct ArchivedWAL_hash;
/* Contains the necessary information to drive WAL decoding */
typedef struct XLogDumpPrivate
@@ -21,6 +26,46 @@ typedef struct XLogDumpPrivate
XLogRecPtr startptr;
XLogRecPtr endptr;
bool endptr_reached;
+ bool decoding_started;
+
+ /* Fields required to read WAL from archive */
+ char *archive_name; /* Tar archive name */
+ int archive_fd; /* File descriptor for the open tar file */
+
+ astreamer *archive_streamer;
+ char *archive_read_buf; /* Reusable read buffer for archive I/O */
+ Size archive_read_buf_size;
+
+ /* What the archive streamer is currently reading */
+ struct ArchivedWALFile *cur_file;
+
+ /*
+ * Hash table of all WAL files that the archive streamer has read, including
+ * the one currently in progress.
+ */
+ struct ArchivedWAL_hash *archive_wal_htab;
+
+ /*
+ * Although these values can be easily derived from startptr and endptr,
+ * doing so repeatedly for each archived member would be inefficient, as
+ * it would involve recalculating and filtering out irrelevant WAL
+ * segments.
+ */
+ XLogSegNo start_segno;
+ XLogSegNo end_segno;
} XLogDumpPrivate;
+extern int open_file_in_directory(const char *directory, const char *fname);
+
+extern void init_archive_reader(XLogDumpPrivate *privateInfo,
+ const char *waldir, int *WalSegSz,
+ pg_compress_algorithm compression);
+extern void free_archive_reader(XLogDumpPrivate *privateInfo);
+extern int read_archive_wal_page(XLogDumpPrivate *privateInfo,
+ XLogRecPtr targetPagePtr,
+ Size count, char *readBuff,
+ int WalSegSz);
+extern void free_archive_wal_entry(const char *fname,
+ XLogDumpPrivate *privateInfo);
+
#endif /* PG_WALDUMP_H */
diff --git a/src/bin/pg_waldump/t/001_basic.pl b/src/bin/pg_waldump/t/001_basic.pl
index f12ba52cbfc..6f8ce319841 100644
--- a/src/bin/pg_waldump/t/001_basic.pl
+++ b/src/bin/pg_waldump/t/001_basic.pl
@@ -3,10 +3,13 @@
use strict;
use warnings FATAL => 'all';
+use Cwd;
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Test::More;
+my $tar = $ENV{TAR};
+
program_help_ok('pg_waldump');
program_version_ok('pg_waldump');
program_options_handling_ok('pg_waldump');
@@ -162,6 +165,42 @@ CREATE TABLESPACE ts1 LOCATION '$tblspc_path';
DROP TABLESPACE ts1;
});
+# Test: Decode a continuation record (contrecord) that spans multiple WAL
+# segments.
+#
+# Consume all remaining room in the current WAL segment, leaving only
+# enough space for the start of a largish record.
+$node->safe_psql(
+ 'postgres', q{
+DO $$
+DECLARE
+ wal_segsize int := setting::int FROM pg_settings WHERE name = 'wal_segment_size';
+ remain int;
+ iters int := 0;
+BEGIN
+ LOOP
+ INSERT into t1(b)
+ select repeat(encode(sha256(g::text::bytea), 'hex'), (random() * 15 + 1)::int)
+ from generate_series(1, 10) g;
+
+ remain := wal_segsize - (pg_current_wal_insert_lsn() - '0/0') % wal_segsize;
+ IF remain < 2 * setting::int from pg_settings where name = 'block_size' THEN
+ RAISE log 'exiting after % iterations, % bytes to end of WAL segment', iters, remain;
+ EXIT;
+ END IF;
+ iters := iters + 1;
+ END LOOP;
+END
+$$;
+});
+
+my $contrecord_lsn = $node->safe_psql('postgres',
+ 'SELECT pg_current_wal_insert_lsn()');
+# Emit a large record that continues into the next WAL segment
+$node->safe_psql('postgres',
+ qq{SELECT pg_logical_emit_message(true, 'test 026', repeat('xyzxz', 123456))}
+);
+
my ($end_lsn, $end_walfile) = split /\|/,
$node->safe_psql('postgres',
q{SELECT pg_current_wal_insert_lsn(), pg_walfile_name(pg_current_wal_insert_lsn())}
@@ -259,11 +298,50 @@ sub test_pg_waldump
return @lines;
}
-my @lines;
+# Create a tar archive of the given directory, adding files in sorted order
+sub generate_archive
+{
+ my ($archive, $directory, $compression_flags) = @_;
+
+ my @files;
+ opendir my $dh, $directory or die "opendir: $!";
+ while (my $entry = readdir $dh) {
+ # Skip '.' and '..'
+ next if $entry eq '.' || $entry eq '..';
+ push @files, $entry;
+ }
+ closedir $dh;
+
+ @files = sort @files;
+
+ # move into the WAL directory before archiving files
+ my $cwd = getcwd;
+ chdir($directory) || die "chdir: $!";
+ command_ok([$tar, $compression_flags, $archive, @files]);
+ chdir($cwd) || die "chdir: $!";
+}
+
+my $tmp_dir = PostgreSQL::Test::Utils::tempdir_short();
my @scenarios = (
{
- 'path' => $node->data_dir
+ 'path' => $node->data_dir,
+ 'is_archive' => 0,
+ 'enabled' => 1
+ },
+ {
+ 'path' => "$tmp_dir/pg_wal.tar",
+ 'compression_method' => 'none',
+ 'compression_flags' => '-cf',
+ 'is_archive' => 1,
+ 'enabled' => 1
+ },
+ {
+ 'path' => "$tmp_dir/pg_wal.tar.gz",
+ 'compression_method' => 'gzip',
+ 'compression_flags' => '-czf',
+ 'is_archive' => 1,
+ 'enabled' => check_pg_config("#define HAVE_LIBZ 1")
});
for my $scenario (@scenarios)
@@ -272,6 +350,19 @@ for my $scenario (@scenarios)
SKIP:
{
+ skip "tar command is not available", 56
+ if !defined $tar && $scenario->{'is_archive'};
+ skip "$scenario->{'compression_method'} compression not supported by this build", 56
+ if !$scenario->{'enabled'} && $scenario->{'is_archive'};
+
+ # create pg_wal archive
+ if ($scenario->{'is_archive'})
+ {
+ generate_archive($path,
+ $node->data_dir . '/pg_wal',
+ $scenario->{'compression_flags'});
+ }
+
command_fails_like(
[ 'pg_waldump', '--path' => $path ],
qr/error: no start WAL location given/,
@@ -305,9 +396,14 @@ for my $scenario (@scenarios)
test_pg_waldump_skip_bytes($path, $start_lsn, $end_lsn);
- @lines = test_pg_waldump($path, $start_lsn, $end_lsn);
+ my @lines = test_pg_waldump($path, $start_lsn, $end_lsn);
is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
+ @lines = test_pg_waldump($path, $contrecord_lsn, $end_lsn);
+ is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
+
+ test_pg_waldump_skip_bytes($path, $contrecord_lsn, $end_lsn);
+
@lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--limit' => 6);
is(@lines, 6, 'limit option observed');
@@ -337,6 +433,9 @@ for my $scenario (@scenarios)
'--relation' => "$default_ts_oid/$postgres_db_oid/$rel_i1a_oid",
'--block' => 1);
is(grep(!/\bblk 1\b/, @lines), 0, 'only lines for selected block');
+
+ # Cleanup.
+ unlink $path if $scenario->{'is_archive'};
}
}
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3250564d4ff..d849293e6fa 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -145,6 +145,8 @@ ArchiveOpts
ArchiveShutdownCB
ArchiveStartupCB
ArchiveStreamState
+ArchivedWALFile
+ArchivedWAL_hash
ArchiverOutput
ArchiverStage
ArrayAnalyzeExtraData
@@ -3514,6 +3516,7 @@ astreamer_recovery_injector
astreamer_tar_archiver
astreamer_tar_parser
astreamer_verify
+astreamer_waldump
astreamer_zstd_frame
auth_password_hook_typ
autovac_table
--
2.47.1
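To make the buffer arithmetic in read_archive_wal_page() easier to review, here is a minimal, self-contained sketch of the window check it performs. It assumes a plain 64-bit integer for the LSN and models the entry's buffer as the last buf_len bytes streamed for the segment. The names lsn_t and copy_from_window() are hypothetical and not part of the patch. The end of the window is computed the way XLogSegNoOffsetToRecPtr() does it (segno * segment size + offset), and the start is that end minus the buffer length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for PostgreSQL's XLogRecPtr (a 64-bit LSN). */
typedef uint64_t lsn_t;

/*
 * Copy up to 'nbytes' bytes starting at LSN 'recptr' out of a buffer that
 * holds the most recent 'buf_len' bytes of segment 'segno', of which
 * 'read_len' bytes have been streamed so far.  Returns the number of bytes
 * copied, or 0 if 'recptr' is outside the buffered window (the caller would
 * then have to pull more data from the archive).
 */
static size_t
copy_from_window(const char *buf, size_t buf_len, uint64_t segno,
                 size_t read_len, size_t seg_size,
                 lsn_t recptr, char *dst, size_t nbytes)
{
    /* Same arithmetic as XLogSegNoOffsetToRecPtr(): segno * size + offset */
    lsn_t end_ptr = segno * seg_size + read_len;
    lsn_t start_ptr = end_ptr - buf_len;
    size_t offset;
    size_t n;

    if (buf_len == 0 || recptr < start_ptr || recptr >= end_ptr)
        return 0;

    /* start_ptr <= recptr < end_ptr guarantees offset < buf_len */
    offset = (size_t) (recptr - start_ptr);
    assert(offset < buf_len);

    /* Never copy past the end of what is currently buffered */
    n = nbytes < buf_len - offset ? nbytes : buf_len - offset;
    memcpy(dst, buf + offset, n);
    return n;
}
```

For example, with a 16-byte segment size, segno 2 and 12 bytes streamed, an 8-byte buffer covers LSNs 36..43. A request at LSN 38 for 10 bytes copies only the 6 bytes up to the end of the window, and the caller then has to fetch more data, as the loop in read_archive_wal_page() does.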
[application/x-patch] v16-0007-pg_waldump-Remove-the-restriction-on-the-order-o.patch (13.8K, 8-v16-0007-pg_waldump-Remove-the-restriction-on-the-order-o.patch)
From bf3e02016deca04eb7948dae32d2c83688e0f4a9 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Tue, 27 Jan 2026 15:38:34 +0530
Subject: [PATCH v16 07/11] pg_waldump: Remove the restriction on the order of
archived WAL files.
With the previous patch, pg_waldump would stop decoding if WAL files in
the archive were not in sequential order. With this patch, decoding
continues instead: any WAL file that is read out of order is written to
a temporary location and read back from there when it is needed. Once
a temporary file has been read, it is removed.
---
doc/src/sgml/ref/pg_waldump.sgml | 19 ++-
src/bin/pg_waldump/archive_waldump.c | 172 +++++++++++++++++++++++++--
src/bin/pg_waldump/pg_waldump.c | 32 ++++-
src/bin/pg_waldump/pg_waldump.h | 3 +
src/bin/pg_waldump/t/001_basic.pl | 3 +-
5 files changed, 209 insertions(+), 20 deletions(-)
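The out-of-order handling described in the commit message can be modeled in a few lines. The sketch below is an in-memory simplification under stated assumptions: segments are small integers, "spilling" only sets a flag instead of writing a temporary file, and the timeline/range filtering done in astreamer_waldump_content() is ignored. ArchiveModel, fetch_segment() and MAX_SEGS are hypothetical names, not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SEGS 16

/*
 * Hypothetical in-memory model of the archive: 'order' lists segment numbers
 * in the order the tar members appear; 'spilled' marks segments that were set
 * aside (in the patch, written to a temporary file).
 */
typedef struct
{
    const int  *order;
    int         n_members;
    int         next_member;        /* next member the streamer would yield */
    bool        spilled[MAX_SEGS];
    int         spill_count;        /* total segments spilled so far */
} ArchiveModel;

/*
 * Obtain segment 'want'.  Returns 1 if it was read back from a spill,
 * 0 if it came straight from the archive stream, or -1 if it is not in
 * the archive at all.
 */
static int
fetch_segment(ArchiveModel *m, int want)
{
    assert(want >= 0 && want < MAX_SEGS);

    /* Previously spilled: read it back, then drop the temporary copy */
    if (m->spilled[want])
    {
        m->spilled[want] = false;
        return 1;
    }

    /* Otherwise keep streaming, spilling every member not needed yet */
    while (m->next_member < m->n_members)
    {
        int seg = m->order[m->next_member++];

        assert(seg >= 0 && seg < MAX_SEGS);
        if (seg == want)
            return 0;
        m->spilled[seg] = true;
        m->spill_count++;
    }
    return -1;
}
```

With the archive order {3, 1, 2}, requesting segments 1, 2 and 3 in turn spills segment 3 while searching for 1, streams 1 and 2 directly, and then serves 3 from its spill. A request for a segment that never appears returns -1, analogous to the "could not find WAL" error.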
diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index 15fb8d13199..9bbb4bd5772 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -149,8 +149,12 @@ PostgreSQL documentation
of <envar>PGDATA</envar>.
</para>
<para>
- If a tar archive is provided, its WAL segment files must be in
- sequential order; otherwise, an error will be reported.
+ If a tar archive is provided and its WAL segment files are not in
+ sequential order, the out-of-order files are written to a temporary
+ directory whose name starts with <filename>waldump_tmp</filename>. This
+ directory is created inside the directory specified by the
+ <envar>TMPDIR</envar> environment variable if it is set; otherwise, it
+ is created in the same directory as the tar archive.
</para>
</listitem>
</varlistentry>
@@ -387,6 +391,17 @@ PostgreSQL documentation
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><envar>TMPDIR</envar></term>
+ <listitem>
+ <para>
+ Directory in which to create temporary files when reading WAL from a
+ tar archive with out-of-order segment files. If not set, the temporary
+ directory is created within the same directory as the tar archive.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</refsect1>
diff --git a/src/bin/pg_waldump/archive_waldump.c b/src/bin/pg_waldump/archive_waldump.c
index 0936ffc0a75..547a5154cb6 100644
--- a/src/bin/pg_waldump/archive_waldump.c
+++ b/src/bin/pg_waldump/archive_waldump.c
@@ -17,6 +17,7 @@
#include <unistd.h>
#include "access/xlog_internal.h"
+#include "common/file_perm.h"
#include "common/hashfn.h"
#include "common/logging.h"
#include "fe_utils/simple_list.h"
@@ -27,6 +28,9 @@
*/
#define READ_CHUNK_SIZE (128 * 1024)
+/* Directory for temporary copies of out-of-order (spilled) WAL segments */
+char *TmpWalSegDir = NULL;
+
/*
* Check if the start segment number is zero; this indicates a request to read
* any WAL file.
@@ -57,6 +61,8 @@ typedef struct ArchivedWALFile
const char *fname; /* hash key: WAL segment name */
StringInfo buf; /* holds WAL bytes read from archive */
+ bool spilled; /* true if the WAL data was spilled to a
+ * temporary file */
int read_len; /* total bytes of a WAL read from archive */
} ArchivedWALFile;
@@ -84,6 +90,11 @@ static ArchivedWALFile *get_archive_wal_entry(const char *fname,
XLogDumpPrivate *privateInfo,
int WalSegSz);
static int read_archive_file(XLogDumpPrivate *privateInfo, Size count);
+static void setup_tmpwal_dir(const char *waldir);
+static void cleanup_tmpwal_dir_atexit(void);
+
+static FILE *prepare_tmp_write(const char *fname);
+static void perform_tmp_write(const char *fname, StringInfo buf, FILE *file);
static astreamer *astreamer_waldump_new(XLogDumpPrivate *privateInfo);
static void astreamer_waldump_content(astreamer *streamer,
@@ -106,7 +117,9 @@ static const astreamer_ops astreamer_waldump_ops = {
/*
* Initializes the tar archive reader, creates a hash table for WAL entries,
* checks for existing valid WAL segments in the archive file and retrieves the
- * segment size, and sets up filters for relevant entries.
+ * segment size, and sets up filters for relevant entries. It also configures a
+ * temporary directory for out-of-order WAL data and registers an exit callback
+ * to clean up temporary files.
*/
void
init_archive_reader(XLogDumpPrivate *privateInfo, const char *waldir,
@@ -206,6 +219,13 @@ init_archive_reader(XLogDumpPrivate *privateInfo, const char *waldir,
privateInfo->start_segno > segno ||
privateInfo->end_segno < segno)
free_archive_wal_entry(entry->fname, privateInfo);
+
+ /*
+ * Set up a temporary directory to store WAL segments and register an exit
+ * callback to remove it upon completion.
+ */
+ setup_tmpwal_dir(waldir);
+ atexit(cleanup_tmpwal_dir_atexit);
}
/*
@@ -379,6 +399,17 @@ free_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo)
destroyStringInfo(entry->buf);
entry->buf = NULL;
+ /* Remove temporary file if any */
+ if (entry->spilled)
+ {
+ char fpath[MAXPGPATH];
+
+ snprintf(fpath, MAXPGPATH, "%s/%s", TmpWalSegDir, fname);
+
+ if (unlink(fpath) == 0)
+ pg_log_debug("removed file \"%s\"", fpath);
+ }
+
/* Set cur_file to NULL if it matches the entry being ignored */
if (privateInfo->cur_file == entry)
privateInfo->cur_file = NULL;
@@ -390,12 +421,16 @@ free_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo)
* Returns the archived WAL entry from the hash table if it exists. Otherwise,
* it invokes the routine to read the archived file, which then populates the
* entry in the hash table if that WAL exists in the archive.
+ * If the archive streamer happens to be reading a WAL segment from the
+ * archive that is not currently needed, that segment's data is written to a
+ * temporary file.
*/
static ArchivedWALFile *
get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo,
int WalSegSz)
{
ArchivedWALFile *entry = NULL;
+ FILE *write_fp = NULL;
/* Search hash table */
entry = ArchivedWAL_lookup(privateInfo->archive_wal_htab, fname);
@@ -409,28 +444,59 @@ get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo,
*/
while (1)
{
+ /*
+ * The WAL file entry currently being processed may change during
+ * archive streamer execution. Therefore, maintain a local variable to
+ * reference the previous entry, ensuring that any remaining data in
+ * its buffer is successfully flushed to the temporary file before
+ * switching to the next WAL entry.
+ */
+ entry = privateInfo->cur_file;
+
/* Fetch more data */
- if (read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
- break; /* archive file ended */
+ if (entry == NULL || entry->buf->len == 0)
+ {
+ if (read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
+ break; /* archive file ended */
+ }
/*
* Archived streamer is reading a non-WAL file or an irrelevant WAL
* file.
*/
- if (privateInfo->cur_file == NULL)
+ if (entry == NULL)
continue;
- entry = privateInfo->cur_file;
-
/* Found the required entry */
if (strcmp(fname, entry->fname) == 0)
return entry;
- /* WAL segments must be archived in order */
- pg_log_error("WAL files are not archived in sequential order");
- pg_log_error_detail("Expecting segment \"%s\" but found \"%s\".",
- fname, entry->fname);
- exit(1);
+ /*
+ * The archive streamer is currently reading a file that isn't the one
+ * requested, but it will be needed later, so write it to a temporary
+ * location from which it can be retrieved when needed.
+ */
+
+ /* Create a temporary file if one does not already exist */
+ if (!entry->spilled)
+ {
+ write_fp = prepare_tmp_write(entry->fname);
+ entry->spilled = true;
+ }
+
+ /* Flush data from the buffer to the file */
+ perform_tmp_write(entry->fname, entry->buf, write_fp);
+ resetStringInfo(entry->buf);
+
+ /*
+ * A change in the current segment entry indicates that reading of
+ * this file has ended.
+ */
+ if (entry != privateInfo->cur_file && write_fp != NULL)
+ {
+ fclose(write_fp);
+ write_fp = NULL;
+ }
}
/* Requested WAL segment not found */
@@ -468,7 +534,88 @@ read_archive_file(XLogDumpPrivate *privateInfo, Size count)
}
/*
- * Create an astreamer that can read WAL from a tar file.
+ * Set up a temporary directory for storing spilled WAL segments.
+ */
+static void
+setup_tmpwal_dir(const char *waldir)
+{
+ char *template;
+
+ /*
+ * Use the directory specified by the TMPDIR environment variable. If it's
+ * not set, fall back to the provided WAL directory for temporarily
+ * extracted WAL files.
+ */
+ template = psprintf("%s/waldump_tmp-XXXXXX",
+ getenv("TMPDIR") ? getenv("TMPDIR") : waldir);
+ TmpWalSegDir = mkdtemp(template);
+
+ if (TmpWalSegDir == NULL)
+ pg_fatal("could not create directory \"%s\": %m", template);
+
+ canonicalize_path(TmpWalSegDir);
+
+ pg_log_debug("created directory \"%s\"", TmpWalSegDir);
+}
+
+/*
+ * Remove temporary directory at exit, if any.
+ */
+static void
+cleanup_tmpwal_dir_atexit(void)
+{
+ rmtree(TmpWalSegDir, true);
+}
+
+/*
+ * Create an empty placeholder file and return its handle.
+ */
+static FILE *
+prepare_tmp_write(const char *fname)
+{
+ char fpath[MAXPGPATH];
+ FILE *file;
+
+ snprintf(fpath, MAXPGPATH, "%s/%s", TmpWalSegDir, fname);
+
+ /* Create an empty placeholder */
+ file = fopen(fpath, PG_BINARY_W);
+ if (file == NULL)
+ pg_fatal("could not create file \"%s\": %m", fpath);
+
+#ifndef WIN32
+ if (chmod(fpath, pg_file_create_mode))
+ pg_fatal("could not set permissions on file \"%s\": %m",
+ fpath);
+#endif
+
+ pg_log_debug("spilling to temporary file \"%s\"", fpath);
+
+ return file;
+}
+
+/*
+ * Write buffer data to the given file handle.
+ */
+static void
+perform_tmp_write(const char *fname, StringInfo buf, FILE *file)
+{
+ Assert(file);
+
+ errno = 0;
+ if (buf->len > 0 && fwrite(buf->data, buf->len, 1, file) != 1)
+ {
+ /*
+ * If write didn't set errno, assume problem is no disk space
+ */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_fatal("could not write to file \"%s\": %m", fname);
+ }
+}
+
+/*
+ * Create an astreamer that can read WAL from a tar file.
*/
static astreamer *
astreamer_waldump_new(XLogDumpPrivate *privateInfo)
@@ -552,6 +699,7 @@ astreamer_waldump_content(astreamer *streamer, astreamer_member *member,
}
entry->buf = makeStringInfo();
+ entry->spilled = false;
entry->read_len = 0;
privateInfo->cur_file = entry;
}
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index f0b8116ff14..e970b007883 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -478,10 +478,14 @@ TarWALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
return -1;
/*
- * If the target page is in a different segment, free the buffer space
- * occupied by the previous segment data. Since pg_waldump never requests
- * the same WAL bytes twice, moving to a new segment implies the previous
- * buffer's data and that segment will not be needed again.
+ * If the target page is in a different segment, free the buffer and/or
+ * temporary file disk space occupied by the previous segment's data.
+ * Since pg_waldump never requests the same WAL bytes twice, moving to a
+ * new segment implies the previous buffer's data and that segment will
+ * not be needed again.
+ *
+ * Afterward, check whether the next required WAL segment already exists
+ * in the temporary directory before invoking the archive streamer.
*/
curSegNo = state->seg.ws_segno;
if (!XLByteInSeg(targetPagePtr, curSegNo, WalSegSz))
@@ -497,6 +501,13 @@ TarWALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
state->seg.ws_tli = private->timeline;
state->seg.ws_segno = nextSegNo;
+ /* Close the WAL segment file if it is currently open */
+ if (state->seg.ws_file >= 0)
+ {
+ close(state->seg.ws_file);
+ state->seg.ws_file = -1;
+ }
+
/*
* If in pre-reading mode (prior to actual decoding), do not delete
* any entries that might be requested again once the decoding loop
@@ -508,9 +519,20 @@ TarWALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
XLogFileName(fname, state->seg.ws_tli, curSegNo, WalSegSz);
free_archive_wal_entry(fname, private);
}
+
+ /*
+ * If the next segment was spilled to a temporary file, open it
+ */
+ XLogFileName(fname, state->seg.ws_tli, nextSegNo, WalSegSz);
+ state->seg.ws_file = open_file_in_directory(TmpWalSegDir, fname);
}
- /* Read the WAL page from the archive streamer */
+ /* Continue reading from the open WAL segment, if any */
+ if (state->seg.ws_file >= 0)
+ return WALDumpReadPage(state, targetPagePtr, count, targetPtr,
+ readBuff);
+
+ /* Otherwise, read the WAL page from the archive streamer */
return read_archive_wal_page(private, targetPagePtr, count, readBuff,
WalSegSz);
}
diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h
index 62054bc74c0..1097390d575 100644
--- a/src/bin/pg_waldump/pg_waldump.h
+++ b/src/bin/pg_waldump/pg_waldump.h
@@ -18,6 +18,9 @@
struct ArchivedWALFile;
struct ArchivedWAL_hash;
+/* Temporary directory */
+extern char *TmpWalSegDir;
+
/* Contains the necessary information to drive WAL decoding */
typedef struct XLogDumpPrivate
{
diff --git a/src/bin/pg_waldump/t/001_basic.pl b/src/bin/pg_waldump/t/001_basic.pl
index 6f8ce319841..6960bd46ba4 100644
--- a/src/bin/pg_waldump/t/001_basic.pl
+++ b/src/bin/pg_waldump/t/001_basic.pl
@@ -7,6 +7,7 @@ use Cwd;
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Test::More;
+use List::Util qw(shuffle);
my $tar = $ENV{TAR};
@@ -312,7 +313,7 @@ sub generate_archive
}
closedir $dh;
- @files = sort @files;
+ @files = shuffle @files;
# move into the WAL directory before archiving files
my $cwd = getcwd;
--
2.47.1
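The spilling scheme in the pg_waldump patch above can be illustrated with a small standalone sketch. It models the tar archive as an in-memory array of members and ignores partial reads, compression, and the astreamer machinery; `Member`, `make_spill_dir()` and `fetch_segment()` are hypothetical names, not functions from the patch. The scratch directory follows the same TMPDIR-or-fallback rule as `setup_tmpwal_dir()`.

```c
#define _POSIX_C_SOURCE 200809L	/* for mkdtemp() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for one tar member holding a WAL segment. */
typedef struct
{
	const char *name;			/* WAL segment file name */
	const char *data;			/* segment contents (tiny, for illustration) */
} Member;

/*
 * Create a scratch directory under $TMPDIR, or under fallback_dir when
 * TMPDIR is unset.  Returns a malloc'd path, or NULL on failure.
 */
static char *
make_spill_dir(const char *fallback_dir)
{
	const char *base = getenv("TMPDIR") ? getenv("TMPDIR") : fallback_dir;
	size_t		len = strlen(base) + sizeof("/waldump_tmp-XXXXXX");
	char	   *tmpl = malloc(len);

	if (tmpl == NULL)
		return NULL;
	snprintf(tmpl, len, "%s/waldump_tmp-XXXXXX", base);
	if (mkdtemp(tmpl) == NULL)
	{
		free(tmpl);
		return NULL;
	}
	return tmpl;
}

/*
 * Copy the contents of segment "want" into buf (NUL-terminated).  A spilled
 * copy is used if one exists; otherwise members are consumed from the
 * stream, and every member that is not the requested one is spilled.
 * Returns 0 on success, -1 if the segment is missing or a spill fails.
 */
static int
fetch_segment(const Member *stream, size_t nmembers, size_t *pos,
			  const char *spill_dir, const char *want,
			  char *buf, size_t bufsize)
{
	char		path[1024];
	FILE	   *fp;

	assert(bufsize > 0);

	/* Was this segment spilled by an earlier call? */
	snprintf(path, sizeof(path), "%s/%s", spill_dir, want);
	fp = fopen(path, "rb");
	if (fp != NULL)
	{
		size_t		n = fread(buf, 1, bufsize - 1, fp);

		buf[n] = '\0';
		fclose(fp);
		unlink(path);			/* each segment is requested only once */
		return 0;
	}

	/* Otherwise keep streaming the archive until the segment shows up. */
	while (*pos < nmembers)
	{
		const Member *m = &stream[(*pos)++];

		if (strcmp(m->name, want) == 0)
		{
			snprintf(buf, bufsize, "%s", m->data);
			return 0;
		}

		/* Out-of-order segment: spill it so a later call can read it. */
		snprintf(path, sizeof(path), "%s/%s", spill_dir, m->name);
		fp = fopen(path, "wb");
		if (fp == NULL)
			return -1;
		fputs(m->data, fp);
		fclose(fp);
	}
	return -1;
}
```

The shuffle added to 001_basic.pl exercises this kind of path: whenever a later segment precedes an earlier one in the archive, it is written to the spill directory and read back from there once it is requested.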
[application/x-patch] v16-0008-pg_verifybackup-Delay-default-WAL-directory-prep.patch (1.7K, 9-v16-0008-pg_verifybackup-Delay-default-WAL-directory-prep.patch)
From fae022329eaf4c7a067dc96374017fcd2453c812 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Wed, 16 Jul 2025 14:47:43 +0530
Subject: [PATCH v16 08/11] pg_verifybackup: Delay default WAL directory
preparation.
We are not sure whether to parse WAL from a directory or an archive
until the backup format is known. Therefore, we delay preparing the
default WAL directory until the point of parsing. This delay is
harmless, as the WAL directory is not used elsewhere.
---
src/bin/pg_verifybackup/pg_verifybackup.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 31f606c45b1..8cc204719ee 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -285,10 +285,6 @@ main(int argc, char **argv)
manifest_path = psprintf("%s/backup_manifest",
context.backup_directory);
- /* By default, look for the WAL in the backup directory, too. */
- if (wal_directory == NULL)
- wal_directory = psprintf("%s/pg_wal", context.backup_directory);
-
/*
* Try to read the manifest. We treat any errors encountered while parsing
* the manifest as fatal; there doesn't seem to be much point in trying to
@@ -368,6 +364,10 @@ main(int argc, char **argv)
if (context.format == 'p' && !context.skip_checksums)
verify_backup_checksums(&context);
+ /* By default, look for the WAL in the backup directory, too. */
+ if (wal_directory == NULL)
+ wal_directory = psprintf("%s/pg_wal", context.backup_directory);
+
/*
* Try to parse the required ranges of WAL records, unless we were told
* not to do so.
--
2.47.1
[application/x-patch] v16-0009-pg_verifybackup-Rename-the-wal-directory-switch-.patch (5.9K, 10-v16-0009-pg_verifybackup-Rename-the-wal-directory-switch-.patch)
From 2a0ca4af4197e13f106cbf3bfa35600db2db3ff9 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Tue, 25 Nov 2025 17:32:14 +0530
Subject: [PATCH v16 09/11] pg_verifybackup: Rename the wal-directory switch to
wal-path
With the previous patches, pg_waldump can now decode WAL directly from
tar files, so a tar archive path can be specified instead of a
traditional WAL directory.
To keep things consistent, generalize the corresponding input switch
of pg_verifybackup as well: it should accept either a directory or a
tar file that contains WAL files. This also aligns its name with the
existing --manifest-path switch. The old --wal-directory spelling is
kept as a deprecated alias.
== NOTE ==
The corresponding PO files require updating due to this change.
---
doc/src/sgml/ref/pg_verifybackup.sgml | 2 +-
src/bin/pg_verifybackup/pg_verifybackup.c | 23 ++++++++++++-----------
src/bin/pg_verifybackup/t/007_wal.pl | 4 ++--
3 files changed, 15 insertions(+), 14 deletions(-)
diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml
index 61c12975e4a..e9b8bfd51b1 100644
--- a/doc/src/sgml/ref/pg_verifybackup.sgml
+++ b/doc/src/sgml/ref/pg_verifybackup.sgml
@@ -261,7 +261,7 @@ PostgreSQL documentation
<varlistentry>
<term><option>-w <replaceable class="parameter">path</replaceable></option></term>
- <term><option>--wal-directory=<replaceable class="parameter">path</replaceable></option></term>
+ <term><option>--wal-path=<replaceable class="parameter">path</replaceable></option></term>
<listitem>
<para>
Try to parse WAL files stored in the specified directory, rather than
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 8cc204719ee..682c365431f 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -93,7 +93,7 @@ static void verify_file_checksum(verifier_context *context,
uint8 *buffer);
static void parse_required_wal(verifier_context *context,
char *pg_waldump_path,
- char *wal_directory);
+ char *wal_path);
static astreamer *create_archive_verifier(verifier_context *context,
char *archive_name,
Oid tblspc_oid,
@@ -126,7 +126,8 @@ main(int argc, char **argv)
{"progress", no_argument, NULL, 'P'},
{"quiet", no_argument, NULL, 'q'},
{"skip-checksums", no_argument, NULL, 's'},
- {"wal-directory", required_argument, NULL, 'w'},
+ {"wal-path", required_argument, NULL, 'w'},
+ {"wal-directory", required_argument, NULL, 'w'}, /* deprecated */
{NULL, 0, NULL, 0}
};
@@ -135,7 +136,7 @@ main(int argc, char **argv)
char *manifest_path = NULL;
bool no_parse_wal = false;
bool quiet = false;
- char *wal_directory = NULL;
+ char *wal_path = NULL;
char *pg_waldump_path = NULL;
DIR *dir;
@@ -221,8 +222,8 @@ main(int argc, char **argv)
context.skip_checksums = true;
break;
case 'w':
- wal_directory = pstrdup(optarg);
- canonicalize_path(wal_directory);
+ wal_path = pstrdup(optarg);
+ canonicalize_path(wal_path);
break;
default:
/* getopt_long already emitted a complaint */
@@ -365,15 +366,15 @@ main(int argc, char **argv)
verify_backup_checksums(&context);
/* By default, look for the WAL in the backup directory, too. */
- if (wal_directory == NULL)
- wal_directory = psprintf("%s/pg_wal", context.backup_directory);
+ if (wal_path == NULL)
+ wal_path = psprintf("%s/pg_wal", context.backup_directory);
/*
* Try to parse the required ranges of WAL records, unless we were told
* not to do so.
*/
if (!no_parse_wal)
- parse_required_wal(&context, pg_waldump_path, wal_directory);
+ parse_required_wal(&context, pg_waldump_path, wal_path);
/*
* If everything looks OK, tell the user this, unless we were asked to
@@ -1188,7 +1189,7 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
*/
static void
parse_required_wal(verifier_context *context, char *pg_waldump_path,
- char *wal_directory)
+ char *wal_path)
{
manifest_data *manifest = context->manifest;
manifest_wal_range *this_wal_range = manifest->first_wal_range;
@@ -1198,7 +1199,7 @@ parse_required_wal(verifier_context *context, char *pg_waldump_path,
char *pg_waldump_cmd;
pg_waldump_cmd = psprintf("\"%s\" --quiet --path=\"%s\" --timeline=%u --start=%X/%08X --end=%X/%08X\n",
- pg_waldump_path, wal_directory, this_wal_range->tli,
+ pg_waldump_path, wal_path, this_wal_range->tli,
LSN_FORMAT_ARGS(this_wal_range->start_lsn),
LSN_FORMAT_ARGS(this_wal_range->end_lsn));
fflush(NULL);
@@ -1366,7 +1367,7 @@ usage(void)
printf(_(" -P, --progress show progress information\n"));
printf(_(" -q, --quiet do not print any output, except for errors\n"));
printf(_(" -s, --skip-checksums skip checksum verification\n"));
- printf(_(" -w, --wal-directory=PATH use specified path for WAL files\n"));
+ printf(_(" -w, --wal-path=PATH use specified path for WAL files\n"));
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -?, --help show this help, then exit\n"));
printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
diff --git a/src/bin/pg_verifybackup/t/007_wal.pl b/src/bin/pg_verifybackup/t/007_wal.pl
index 79087a1f6be..8ad2234453d 100644
--- a/src/bin/pg_verifybackup/t/007_wal.pl
+++ b/src/bin/pg_verifybackup/t/007_wal.pl
@@ -42,10 +42,10 @@ command_ok([ 'pg_verifybackup', '--no-parse-wal', $backup_path ],
command_ok(
[
'pg_verifybackup',
- '--wal-directory' => $relocated_pg_wal,
+ '--wal-path' => $relocated_pg_wal,
$backup_path
],
- '--wal-directory can be used to specify WAL directory');
+ '--wal-path can be used to specify WAL directory');
# Move directory back to original location.
rename($relocated_pg_wal, $original_pg_wal) || die "rename pg_wal back: $!";
--
2.47.1
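The backward-compatible option handling in this patch relies on `getopt_long()` accepting two long-option entries that map to the same short-option value. A minimal sketch, using a hypothetical `parse_wal_path()` helper rather than pg_verifybackup's real `main()`:

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the value given for -w/--wal-path, accepting the old
 * --wal-directory spelling as a deprecated alias.  Returns NULL if the
 * option was not supplied or an unrecognized option was seen.
 */
static const char *
parse_wal_path(int argc, char **argv)
{
	static const struct option long_options[] = {
		{"wal-path", required_argument, NULL, 'w'},
		{"wal-directory", required_argument, NULL, 'w'},	/* deprecated */
		{NULL, 0, NULL, 0}
	};
	const char *wal_path = NULL;
	int			c;

	assert(argc >= 1 && argv != NULL);
	optind = 1;					/* restart scanning for each call */
	opterr = 0;					/* keep the sketch quiet on bad input */
	while ((c = getopt_long(argc, argv, "w:", long_options, NULL)) != -1)
	{
		if (c == 'w')
			wal_path = optarg;	/* both spellings land here */
		else
			return NULL;
	}
	return wal_path;
}
```

Because both table entries return `'w'`, the rest of the option-processing code does not need to know which spelling the user typed.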
[application/x-patch] v16-0010-pg_verifybackup-Enabled-WAL-parsing-for-tar-form.patch (11.5K, 11-v16-0010-pg_verifybackup-Enabled-WAL-parsing-for-tar-form.patch)
From ddf838dcfb1376d0fe76c0325df78f4037a948b3 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Tue, 25 Nov 2025 17:34:26 +0530
Subject: [PATCH v16 10/11] pg_verifybackup: Enable WAL parsing for tar-format
 backups
Now that pg_waldump supports decoding from tar archives, we should
leverage this functionality to remove the previous restriction on WAL
parsing for tar-format backups.
---
doc/src/sgml/ref/pg_verifybackup.sgml | 12 ++--
src/bin/pg_verifybackup/pg_verifybackup.c | 66 +++++++++++++------
src/bin/pg_verifybackup/t/002_algorithm.pl | 4 --
src/bin/pg_verifybackup/t/003_corruption.pl | 4 +-
src/bin/pg_verifybackup/t/007_wal.pl | 16 +++++
src/bin/pg_verifybackup/t/008_untar.pl | 5 +-
src/bin/pg_verifybackup/t/010_client_untar.pl | 5 +-
7 files changed, 70 insertions(+), 42 deletions(-)
diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml
index e9b8bfd51b1..1695cfe91c8 100644
--- a/doc/src/sgml/ref/pg_verifybackup.sgml
+++ b/doc/src/sgml/ref/pg_verifybackup.sgml
@@ -36,10 +36,7 @@ PostgreSQL documentation
<literal>backup_manifest</literal> generated by the server at the time
of the backup. The backup may be stored either in the "plain" or the "tar"
format; this includes tar-format backups compressed with any algorithm
- supported by <application>pg_basebackup</application>. However, at present,
- <literal>WAL</literal> verification is supported only for plain-format
- backups. Therefore, if the backup is stored in tar-format, the
- <literal>-n, --no-parse-wal</literal> option should be used.
+ supported by <application>pg_basebackup</application>.
</para>
<para>
@@ -264,9 +261,10 @@ PostgreSQL documentation
<term><option>--wal-path=<replaceable class="parameter">path</replaceable></option></term>
<listitem>
<para>
- Try to parse WAL files stored in the specified directory, rather than
- in <literal>pg_wal</literal>. This may be useful if the backup is
- stored in a separate location from the WAL archive.
+ Try to parse WAL files stored in the specified directory or tar
+ archive, rather than in <literal>pg_wal</literal>. This may be
+ useful if the backup is stored in a separate location from the WAL
+ archive.
</para>
</listitem>
</varlistentry>
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 682c365431f..db79dd39103 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -74,7 +74,9 @@ pg_noreturn static void report_manifest_error(JsonManifestParseContext *context,
const char *fmt,...)
pg_attribute_printf(2, 3);
-static void verify_tar_backup(verifier_context *context, DIR *dir);
+static void verify_tar_backup(verifier_context *context, DIR *dir,
+ char **base_archive_path,
+ char **wal_archive_path);
static void verify_plain_backup_directory(verifier_context *context,
char *relpath, char *fullpath,
DIR *dir);
@@ -83,7 +85,9 @@ static void verify_plain_backup_file(verifier_context *context, char *relpath,
static void verify_control_file(const char *controlpath,
uint64 manifest_system_identifier);
static void precheck_tar_backup_file(verifier_context *context, char *relpath,
- char *fullpath, SimplePtrList *tarfiles);
+ char *fullpath, SimplePtrList *tarfiles,
+ char **base_archive_path,
+ char **wal_archive_path);
static void verify_tar_file(verifier_context *context, char *relpath,
char *fullpath, astreamer *streamer);
static void report_extra_backup_files(verifier_context *context);
@@ -137,6 +141,8 @@ main(int argc, char **argv)
bool no_parse_wal = false;
bool quiet = false;
char *wal_path = NULL;
+ char *base_archive_path = NULL;
+ char *wal_archive_path = NULL;
char *pg_waldump_path = NULL;
DIR *dir;
@@ -328,17 +334,6 @@ main(int argc, char **argv)
pfree(path);
}
- /*
- * XXX: In the future, we should consider enhancing pg_waldump to read WAL
- * files from an archive.
- */
- if (!no_parse_wal && context.format == 't')
- {
- pg_log_error("pg_waldump cannot read tar files");
- pg_log_error_hint("You must use -n/--no-parse-wal when verifying a tar-format backup.");
- exit(1);
- }
-
/*
* Perform the appropriate type of verification appropriate based on the
* backup format. This will close 'dir'.
@@ -347,7 +342,7 @@ main(int argc, char **argv)
verify_plain_backup_directory(&context, NULL, context.backup_directory,
dir);
else
- verify_tar_backup(&context, dir);
+ verify_tar_backup(&context, dir, &base_archive_path, &wal_archive_path);
/*
* The "matched" flag should now be set on every entry in the hash table.
@@ -365,9 +360,28 @@ main(int argc, char **argv)
if (context.format == 'p' && !context.skip_checksums)
verify_backup_checksums(&context);
- /* By default, look for the WAL in the backup directory, too. */
+ /*
+ * By default, WAL files are expected to be found in the backup directory
+ * for plain-format backups. In the case of tar-format backups, if a
+ * separate WAL archive is not found, the WAL files are most likely
+ * included within the main data directory archive.
+ */
if (wal_path == NULL)
- wal_path = psprintf("%s/pg_wal", context.backup_directory);
+ {
+ if (context.format == 'p')
+ wal_path = psprintf("%s/pg_wal", context.backup_directory);
+ else if (wal_archive_path)
+ wal_path = wal_archive_path;
+ else if (base_archive_path)
+ wal_path = base_archive_path;
+ else
+ {
+ pg_log_error("WAL archive not found");
+ pg_log_error_hint("Specify the correct path using -w/--wal-path, "
+ "or use -n/--no-parse-wal to skip WAL verification.");
+ exit(1);
+ }
+ }
/*
* Try to parse the required ranges of WAL records, unless we were told
@@ -788,7 +802,8 @@ verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
* close when we're done with it.
*/
static void
-verify_tar_backup(verifier_context *context, DIR *dir)
+verify_tar_backup(verifier_context *context, DIR *dir, char **base_archive_path,
+ char **wal_archive_path)
{
struct dirent *dirent;
SimplePtrList tarfiles = {NULL, NULL};
@@ -817,7 +832,8 @@ verify_tar_backup(verifier_context *context, DIR *dir)
char *fullpath;
fullpath = psprintf("%s/%s", context->backup_directory, filename);
- precheck_tar_backup_file(context, filename, fullpath, &tarfiles);
+ precheck_tar_backup_file(context, filename, fullpath, &tarfiles,
+ base_archive_path, wal_archive_path);
pfree(fullpath);
}
}
@@ -876,11 +892,13 @@ verify_tar_backup(verifier_context *context, DIR *dir)
*
* The arguments to this function are mostly the same as the
* verify_plain_backup_file. The additional argument outputs a list of valid
- * tar files.
+ * tar files, along with the full paths to the main archive and the WAL
+ * directory archive.
*/
static void
precheck_tar_backup_file(verifier_context *context, char *relpath,
- char *fullpath, SimplePtrList *tarfiles)
+ char *fullpath, SimplePtrList *tarfiles,
+ char **base_archive_path, char **wal_archive_path)
{
struct stat sb;
Oid tblspc_oid = InvalidOid;
@@ -919,9 +937,17 @@ precheck_tar_backup_file(verifier_context *context, char *relpath,
* extension such as .gz, .lz4, or .zst.
*/
if (strncmp("base", relpath, 4) == 0)
+ {
suffix = relpath + 4;
+
+ *base_archive_path = pstrdup(fullpath);
+ }
else if (strncmp("pg_wal", relpath, 6) == 0)
+ {
suffix = relpath + 6;
+
+ *wal_archive_path = pstrdup(fullpath);
+ }
else
{
/* Expected a <tablespaceoid>.tar file here. */
diff --git a/src/bin/pg_verifybackup/t/002_algorithm.pl b/src/bin/pg_verifybackup/t/002_algorithm.pl
index 0556191ec9d..edc515d5904 100644
--- a/src/bin/pg_verifybackup/t/002_algorithm.pl
+++ b/src/bin/pg_verifybackup/t/002_algorithm.pl
@@ -30,10 +30,6 @@ sub test_checksums
{
# Add switch to get a tar-format backup
push @backup, ('--format' => 'tar');
-
- # Add switch to skip WAL verification, which is not yet supported for
- # tar-format backups
- push @verify, ('--no-parse-wal');
}
# A backup with a bogus algorithm should fail.
diff --git a/src/bin/pg_verifybackup/t/003_corruption.pl b/src/bin/pg_verifybackup/t/003_corruption.pl
index b1d65b8aa0f..882d75d9dc2 100644
--- a/src/bin/pg_verifybackup/t/003_corruption.pl
+++ b/src/bin/pg_verifybackup/t/003_corruption.pl
@@ -193,10 +193,8 @@ for my $scenario (@scenario)
command_ok([ $tar, '-cf' => "$tar_backup_path/base.tar", '.' ]);
chdir($cwd) || die "chdir: $!";
- # Now check that the backup no longer verifies. We must use -n
- # here, because pg_waldump can't yet read WAL from a tarfile.
command_fails_like(
- [ 'pg_verifybackup', '--no-parse-wal', $tar_backup_path ],
+ [ 'pg_verifybackup', $tar_backup_path ],
$scenario->{'fails_like'},
"corrupt backup fails verification: $name");
diff --git a/src/bin/pg_verifybackup/t/007_wal.pl b/src/bin/pg_verifybackup/t/007_wal.pl
index 8ad2234453d..0e0377bfacc 100644
--- a/src/bin/pg_verifybackup/t/007_wal.pl
+++ b/src/bin/pg_verifybackup/t/007_wal.pl
@@ -90,4 +90,20 @@ command_ok(
[ 'pg_verifybackup', $backup_path2 ],
'valid base backup with timeline > 1');
+# Test WAL verification for a tar-format backup with a separate pg_wal.tar,
+# as produced by pg_basebackup --format=tar --wal-method=stream.
+my $backup_path3 = $primary->backup_dir . '/test_tar_wal';
+$primary->command_ok(
+ [
+ 'pg_basebackup',
+ '--pgdata' => $backup_path3,
+ '--no-sync',
+ '--format' => 'tar',
+ '--checkpoint' => 'fast'
+ ],
+ "tar backup with separate pg_wal.tar");
+command_ok(
+ [ 'pg_verifybackup', $backup_path3 ],
+ 'WAL verification succeeds with separate pg_wal.tar');
+
done_testing();
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
index ae67ae85a31..161c08c190d 100644
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -47,7 +47,6 @@ my $tsoid = $primary->safe_psql(
SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1'));
my $backup_path = $primary->backup_dir . '/server-backup';
-my $extract_path = $primary->backup_dir . '/extracted-backup';
my @test_configuration = (
{
@@ -123,14 +122,12 @@ for my $tc (@test_configuration)
# Verify tar backup.
$primary->command_ok(
[
- 'pg_verifybackup', '--no-parse-wal',
- '--exit-on-error', $backup_path,
+ 'pg_verifybackup', '--exit-on-error', $backup_path,
],
"verify backup, compression $method");
# Cleanup.
rmtree($backup_path);
- rmtree($extract_path);
}
}
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
index 1ac7b5db75a..9670fbe4fda 100644
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -32,7 +32,6 @@ print $jf $junk_data;
close $jf;
my $backup_path = $primary->backup_dir . '/client-backup';
-my $extract_path = $primary->backup_dir . '/extracted-backup';
my @test_configuration = (
{
@@ -137,13 +136,11 @@ for my $tc (@test_configuration)
# Verify tar backup.
$primary->command_ok(
[
- 'pg_verifybackup', '--no-parse-wal',
- '--exit-on-error', $backup_path,
+ 'pg_verifybackup', '--exit-on-error', $backup_path,
],
"verify backup, compression $method");
# Cleanup.
- rmtree($extract_path);
rmtree($backup_path);
}
}
--
2.47.1
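The default-path rules added to `main()` in this patch can be condensed into a standalone sketch. `BackupInfo`, `note_archive()` and `default_wal_path()` are hypothetical names; the real code threads `base_archive_path` and `wal_archive_path` through `verify_tar_backup()` and `precheck_tar_backup_file()` instead of using a struct.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical summary of what the tar precheck discovered. */
typedef struct
{
	char		format;			/* 'p' = plain, 't' = tar */
	const char *backup_dir;
	const char *base_archive;	/* e.g. ".../base.tar.gz", or NULL */
	const char *wal_archive;	/* e.g. ".../pg_wal.tar", or NULL */
} BackupInfo;

/* Classify an archive by its name prefix, as the precheck does. */
static void
note_archive(BackupInfo *info, const char *relpath, const char *fullpath)
{
	if (strncmp("base", relpath, 4) == 0)
		info->base_archive = fullpath;
	else if (strncmp("pg_wal", relpath, 6) == 0)
		info->wal_archive = fullpath;
	/* anything else would be a <tablespaceoid>.tar file */
}

/*
 * Pick the default WAL location.  Returns 0 and fills "out" on success,
 * or -1 when the user must pass -w/--wal-path or -n/--no-parse-wal.
 */
static int
default_wal_path(const BackupInfo *info, char *out, size_t outlen)
{
	assert(outlen > 0);
	if (info->format == 'p')
		snprintf(out, outlen, "%s/pg_wal", info->backup_dir);
	else if (info->wal_archive != NULL)
		snprintf(out, outlen, "%s", info->wal_archive);
	else if (info->base_archive != NULL)
		snprintf(out, outlen, "%s", info->base_archive);
	else
		return -1;
	return 0;
}
```

Preferring pg_wal.tar over base.tar follows the order in the patch: a separate WAL archive is used when present, and only otherwise is the WAL assumed to live inside the base archive.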
* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-03-18 11:45 Amul Sul <[email protected]>
parent: Amul Sul <[email protected]>
0 siblings, 1 reply; 29+ messages in thread
From: Amul Sul @ 2026-03-18 11:45 UTC (permalink / raw)
To: Andrew Dunstan <[email protected]>; +Cc: Robert Haas <[email protected]>; Chao Li <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>
On Wed, Mar 11, 2026 at 10:38 PM Andrew Dunstan <[email protected]> wrote:
>
>
> On 2026-03-09 Mo 8:26 AM, Amul Sul wrote:
>
> On Sat, Mar 7, 2026 at 3:51 AM Andrew Dunstan <[email protected]> wrote:
>
> On 2026-03-04 We 4:50 PM, Andrew Dunstan wrote:
>
> On 2026-03-04 We 7:52 AM, Amul Sul wrote:
>
> On Wed, Mar 4, 2026 at 6:07 AM Andrew Dunstan<[email protected]> wrote:
>
> On 2026-03-02 Mo 8:00 AM, Amul Sul wrote:
>
> On Wed, Feb 18, 2026 at 12:28 PM Amul Sul<[email protected]> wrote:
>
> On Tue, Feb 10, 2026 at 3:06 PM Amul Sul<[email protected]> wrote:
>
> On Wed, Feb 4, 2026 at 6:39 PM Amul Sul<[email protected]> wrote:
>
> On Wed, Jan 28, 2026 at 2:41 AM Robert Haas<[email protected]> wrote:
>
> On Tue, Jan 27, 2026 at 7:07 AM Amul Sul<[email protected]> wrote:
>
> In the attached version, I am using the WAL segment name as the hash
> key, which is much more straightforward. I have rewritten
> read_archive_wal_page(), and it looks much cleaner than before. The
> logic to discard irrelevant WAL files is still within
> get_archive_wal_entry. I added an explanation for setting cur_wal to
> NULL, which is now handled in the separate function I mentioned
> previously.
>
> Kindly have a look at the attached version; let me know if you are
> still not happy with the current approach for filtering/discarding
> irrelevant WAL segments. It isn't much different from the previous
> version, but I have tried to keep it in a separate routine for better
> code readability, with comments to make it easier to understand. I
> also added a comment for ArchivedWALFile.
>
> I feel like the division of labor between get_archive_wal_entry() and
> read_archive_wal_page() is odd. I noticed this in the last version,
> too, and it still seems to be the case. get_archive_wal_entry() first
> calls ArchivedWAL_lookup(). If that finds an entry, it just returns.
> If it doesn't, it loops until an entry for the requested file shows up
> and then returns it. Then control returns to read_archive_wal_page()
> which loops some more until we have all the data we need for the
> requested file. But it seems odd to me to have two separate loops
> here. I think that the first loop is going to call read_archive_file()
> until we find the beginning of the file that we care about and then
> the second one is going to call read_archive_file() some more until we
> have read enough of it to satisfy the request. It feels odd to me to
> do it that way, as if we told somebody to first wait until 9 o'clock
> and then wait another 30 minutes, instead of just telling them to wait
> until 9:30. I realize it's not quite the same thing, because apart
> from calling read_archive_file(), the two loops do different things,
> but I still think it looks odd.
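The single-wait idea above can be sketched in isolation. Here the archive reader is modeled as a precomputed list of chunks, each tagged with the tar member it belongs to; `Chunk` and `read_until()` are hypothetical and omit spilling, buffering, and the astreamer callbacks. The point is only that one loop with a combined exit condition replaces the "find the file, then read enough of it" pair of loops.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical chunk event produced by an archive reader. */
typedef struct
{
	const char *member;			/* tar member this chunk belongs to */
	size_t		len;			/* bytes in this chunk */
} Chunk;

/*
 * Keep pulling chunks until the wanted member has delivered at least
 * "need" bytes.  Chunks of other members are skipped here (real code
 * would spill or discard them).  Returns the number of chunks consumed,
 * or -1 if the stream ran out first.
 */
static int
read_until(const Chunk *chunks, size_t nchunks, const char *want, size_t need)
{
	size_t		have = 0;

	assert(want != NULL && need > 0);
	for (size_t i = 0; i < nchunks; i++)
	{
		if (strcmp(chunks[i].member, want) == 0)
			have += chunks[i].len;
		if (have >= need)		/* one exit condition, one loop */
			return (int) (i + 1);
	}
	return -1;
}
```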
>
> + /*
> + * Ignore if the timeline is different or the current segment is not
> + * the desired one.
> + */
> + XLogFromFileName(entry->fname, &curSegTimeline, &curSegNo, WalSegSz);
> + if (privateInfo->timeline != curSegTimeline ||
> + privateInfo->startSegNo > curSegNo ||
> + privateInfo->endSegNo < curSegNo ||
> + segno > curSegNo)
> + {
> + free_archive_wal_entry(entry->fname, privateInfo);
> + continue;
> + }
>
> The comment doesn't match the code. If it did, the test would be
> (privateInfo->timeline != curSegTimeline || segno != curSegNo). But
> instead the segno test is > rather than !=, and the checks against
> startSegNo and endSegNo aren't explained at all. I think I understand
> why the segno test uses > rather than !=, but it's the point of the
> comment to explain things like that, rather than leaving the reader to
> guess. And I don't know why we also need to test startSegNo and
> endSegNo.
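Below is one possible reading of the quoted filter, written as a standalone predicate with a comment on each test. The function name and parameters are hypothetical, the typedefs are local stand-ins for the PostgreSQL ones, and the explanations in the comments are an interpretation rather than something the patch states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TimeLineID;    /* local stand-ins for PostgreSQL typedefs */
typedef uint64_t XLogSegNo;

/*
 * Hypothetical restatement of the quoted check: returns true when a
 * segment found in the archive can be discarded.
 */
static bool
segment_is_irrelevant(TimeLineID want_tli, XLogSegNo start_segno,
                      XLogSegNo end_segno, XLogSegNo wanted_segno,
                      TimeLineID seg_tli, XLogSegNo seg_segno)
{
    /* A segment from another timeline is never decoded. */
    if (seg_tli != want_tli)
        return true;

    /* Outside the user's requested range: never needed at all. */
    if (seg_segno < start_segno || seg_segno > end_segno)
        return true;

    /*
     * Older than the segment requested right now: reading only moves
     * forward, so it will not be asked for again.  A newer segment is kept
     * (">" in the original rather than "!="), since the archive may store
     * segments out of order and it will be needed later.  When
     * wanted_segno >= start_segno, this test also covers the lower bound
     * checked above.
     */
    if (seg_segno < wanted_segno)
        return true;

    return false;
}
```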
>
> I also wonder what the point is of doing XLogFromFileName() on the
> fname provided by the caller and then again on entry->fname. Couldn't
> you just compare the strings?
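Comparing the strings works because a WAL file name is the timeline and segment number written in fixed-width uppercase hex. The helper below re-implements the layout of the XLogFileName() macro from access/xlog_internal.h for illustration only; it is not the real macro, and wal_file_name() is a hypothetical name. For a fixed segment size, equal names imply an equal (timeline, segno) pair, and within one timeline lexicographic order matches segment order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "TTTTTTTTLLLLLLLLSSSSSSSS" (24 hex digits): timeline, then the
 * segment number split into "log id" and segment-within-log-id, the same
 * layout XLogFileName() produces.  buf must hold at least 25 bytes.
 */
static void
wal_file_name(char *buf, size_t buflen, uint32_t tli, uint64_t segno,
              uint32_t segsize)
{
    uint64_t    segs_per_id = UINT64_C(0x100000000) / segsize;

    snprintf(buf, buflen, "%08X%08X%08X", tli,
             (uint32_t) (segno / segs_per_id),
             (uint32_t) (segno % segs_per_id));
}
```

With 16 MB segments, segment 1 on timeline 1 is 000000010000000000000001, and segment 255 sorts before segment 256 (000000010000000100000000).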
>
> Again, the division of labor is really odd here. It's the job of
> astreamer_waldump_content() to skip things that aren't WAL files at
> all, but it's the job of get_archive_wal_entry() to skip things that
> are WAL files but not the one we want. I disagree with putting those
> checks in completely separate parts of the code.
>
> Keeping the timeline and segment start-end range checks inside the
> archive streamer creates a circular dependency that cannot be resolved
> without a 'dirty hack'. We must read the first available WAL file page
> to determine wal_segment_size before we can calculate the target
> segment range. Moving the checks inside the streamer would make it
> impossible to process that initial file, because the filtering
> parameters would still be unknown, so the check would somehow have to
> be skipped for the first read. And what if we later realized that the
> first WAL file, streamed only because that check was skipped, is
> irrelevant and doesn't fall within the start-end segment range?
>
> Please have a look at the attached version, specifically patch 0005.
> I have moved the WAL file filtering check from get_archive_wal_entry()
> into astreamer_waldump_content(). This check is skipped during the
> initial read in init_archive_reader(), which instead performs it
> explicitly once it has determined the WAL segment size and the
> start/end segments.
>
> To access the WAL segment size inside astreamer_waldump_content(), I
> have moved the WAL segment size variable into the XLogDumpPrivate
> structure in the separate 0004 patch.
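The dependency Amul describes comes down to arithmetic. A segment number is the LSN divided by the segment size, so the start and end segments cannot be computed until xlp_seg_size has been read from the first page's long header. Below is a standalone sketch. valid_wal_seg_size() and lsn_to_segno() are hypothetical helpers: the first mirrors the "power of two between 1 MB and 1 GB" rule from the patch's error detail, and the second does the division XLByteToSeg() performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WAL_SEG_MIN_SIZE (1024 * 1024)          /* 1 MB */
#define WAL_SEG_MAX_SIZE (1024 * 1024 * 1024)   /* 1 GB */

/* A valid segment size is a power of two within [1 MB, 1 GB]. */
static bool
valid_wal_seg_size(int64_t size)
{
    return size >= WAL_SEG_MIN_SIZE && size <= WAL_SEG_MAX_SIZE &&
        (size & (size - 1)) == 0;
}

/* Segment number containing an LSN; only meaningful once segsize is known. */
static uint64_t
lsn_to_segno(uint64_t lsn, uint32_t segsize)
{
    return lsn / segsize;
}
```

The same LSN, 0x2000000, falls in segment 2 with 16 MB segments but in segment 32 with 1 MB segments. That is why no range filtering can happen before the size is known.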
>
> Attached is an updated version including the aforesaid changes. It
> includes a new refactoring patch (0001) that moves the logic for
> identifying tar archives and their compression types from
> pg_basebackup and pg_verifybackup into a separate, reusable function,
> per a suggestion from Euler [1]. Additionally, I have added a test
> for the contrecord decoding to the main patch (now 0006).
>
> [1] http://postgr.es/m/[email protected]
>
> Rebased against the latest master, fixed typos in code comments, and
> replaced palloc0 with palloc0_object.
>
> Hi Amul.
>
>
> I think this looks in pretty good shape.
>
> Thank you very much for looking at the patch.
>
> Attached are patches for a few things I think could be fixed. They are
> mostly self-explanatory. The TAP test fix is the only sane way I could
> come up with to stop the skip code you had from reporting a wildly
> inaccurate number of skipped tests. The sane way to do this from a
> Test::More perspective is a subtest, but unfortunately meson does not
> like subtest output, which is why we don't use it elsewhere, so the only
> way I could come up with was to split this out into a separate test. Of
> course, we might just say we don't care about the misreport, in which
> case we could just live with things as they are.
>
> I agree that the reported skip number was incorrect, and I have
> corrected it in the attached patch. I haven't applied your patch for
> the TAP test improvements yet because I wanted to double-check it
> first with you; as it stood, the patch duplicated tests already
> present in 001_basic.pl. To avoid this duplication, I have added a
> loop that performs tests for both plain and tar WAL directory inputs,
> similar to the approach used in pg_verifybackup for different
> compression type tests (e.g., 008_untar.pl, 010_client_untar.pl). I
> don't have any objection to doing so if you feel the duplication is
> acceptable, but I feel that using a loop for the tests in 001_basic.pl
> is a bit tidier. Let me know your thoughts.
>
> I will take a look.
>
> I'm OK with doing it this way. It's just a bit fragile: if we add a
> test, the number will be wrong. But maybe it's not worth worrying about.
>
> Everything else looks fairly good. The attached fixes a few relatively
> minor issues in v15. The main one is that it stops allocating/freeing a
> buffer every time we call read_archive_file() and instead adds a
> reusable buffer. It also adds back wal-directory as an undocumented
> alias of wal-path, to avoid breaking legacy scripts unnecessarily, and
> adds constness to the fname argument of pg_tar_compress_algorithm, as
> well as fixing some indentation and grammar issues.
>
> All in all I think we're in good shape.
>
> Thanks for the review. I have incorporated your suggested changes,
> with one exception: I have skipped the buffer reallocation code in
> read_archive_file(). Since we only handle two specific read sizes --
> XLOG_BLCKSZ and READ_CHUNK_SIZE (128 KB, as defined in
> archive_waldump.c) -- dynamic reallocation seems unnecessary. Instead,
> I moved the allocation to init_archive_reader(), which now initializes
> a buffer at READ_CHUNK_SIZE. I also added an assertion in
> read_archive_file() to ensure that no read request exceeds this
> allocated capacity.
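Here is a sketch of the allocate-once approach. ReadBuf, read_buf_init() and read_into_buf() are hypothetical names, and memcpy() stands in for reading from the archive file descriptor. The buffer is sized once at READ_CHUNK_SIZE, and an assertion checks every request against that capacity instead of growing the buffer on demand.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define READ_CHUNK_SIZE (128 * 1024)    /* value quoted from archive_waldump.c */

typedef struct ReadBuf
{
    char       *data;
    size_t      size;
} ReadBuf;

/* Allocate the buffer once, up front; returns 0 on allocation failure. */
static int
read_buf_init(ReadBuf *rb)
{
    rb->data = malloc(READ_CHUNK_SIZE);
    rb->size = rb->data ? READ_CHUNK_SIZE : 0;
    return rb->data != NULL;
}

/*
 * Copy "count" bytes into the fixed buffer.  Callers only ever ask for
 * XLOG_BLCKSZ or READ_CHUNK_SIZE, so exceeding the capacity is a bug,
 * not a reason to reallocate.
 */
static size_t
read_into_buf(ReadBuf *rb, const char *src, size_t count)
{
    assert(count <= rb->size);
    memcpy(rb->data, src, count);
    return count;
}
```

Like PostgreSQL's Assert(), which is a no-op unless assertions are enabled at build time, assert() compiles away under NDEBUG. It therefore documents the contract rather than enforcing it in production builds.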
>
> Kindly have a look at the attached version and let me know your thoughts.
>
>
> Looks pretty good. I have squashed them into three patches I think are committable. Also attached is a diff showing what's changed - mainly this:
>
> - --follow + tar archive rejected (pg_waldump.c): new validation prevents a confusing pg_fatal when combining --follow with a tar archive
> - error messages split (archive_waldump.c): the single "could not read file" error is now two distinct messages, "WAL segment is too short" (truncated file) vs "unexpected end of archive" (archive EOF); this fixes an issue raised in review
> - hash table cleanup (archive_waldump.c): free_archive_reader now iterates over and frees all remaining hash entries and destroys the table
>
The final squashed version looks good to me, thank you. But, I would
like to propose splitting the 0001 patch into two separate commits: a
preparatory refactoring of the pg_waldump code and a standalone commit
that moves the tar archive detection and compression logic to a common
location, as the latter is an independent improvement to the existing
codebase. Additionally, since the test file refactoring was only kept
separate to facilitate the review and has already been reviewed, I
suggest merging those changes into the main feature patch, i.e., 0002.
All other elements should remain in a single preparatory refactoring
patch for pg_waldump.
Attached is the version that includes the proposed split. No
additional changes to 0002 and 0003 patches.
Regards,
Amul
Attachments:
[application/x-patch] v18-0001-Move-tar-detection-and-compression-logic-to-comm.patch (7.0K, 2-v18-0001-Move-tar-detection-and-compression-logic-to-comm.patch)
From 93b0818ce1b44619a37b9c5624eb0c7792a30edd Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Tue, 17 Feb 2026 14:51:11 +0530
Subject: [PATCH v18 1/4] Move tar detection and compression logic to common.
Consolidate tar archive identification and compression-type detection
logic into a shared location. Currently used by pg_basebackup and
pg_verifybackup, this functionality is also required for upcoming
pg_waldump enhancements.
This change promotes code reuse and simplifies maintenance across
frontend tools.
Author: Amul Sul <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Jakub Wartak <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
discussion: https://postgr.es/m/CAAJ_b94bqdWN3h2J-PzzzQ2Npbwct5ZQHggn_QoYGhC2rn-=WQ@mail.gmail.com
---
src/bin/pg_basebackup/pg_basebackup.c | 36 +++++++----------------
src/bin/pg_verifybackup/pg_verifybackup.c | 12 +-------
src/common/compression.c | 30 +++++++++++++++++++
src/include/common/compression.h | 2 ++
4 files changed, 44 insertions(+), 36 deletions(-)
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index fa169a8d642..c1a4672aa6f 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1070,12 +1070,9 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
astreamer *manifest_inject_streamer = NULL;
bool inject_manifest;
bool is_tar,
- is_tar_gz,
- is_tar_lz4,
- is_tar_zstd,
is_compressed_tar;
+ pg_compress_algorithm compressed_tar_algorithm;
bool must_parse_archive;
- int archive_name_len = strlen(archive_name);
/*
* Normally, we emit the backup manifest as a separate file, but when
@@ -1084,24 +1081,13 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
*/
inject_manifest = (format == 't' && strcmp(basedir, "-") == 0 && manifest);
- /* Is this a tar archive? */
- is_tar = (archive_name_len > 4 &&
- strcmp(archive_name + archive_name_len - 4, ".tar") == 0);
-
- /* Is this a .tar.gz archive? */
- is_tar_gz = (archive_name_len > 7 &&
- strcmp(archive_name + archive_name_len - 7, ".tar.gz") == 0);
-
- /* Is this a .tar.lz4 archive? */
- is_tar_lz4 = (archive_name_len > 8 &&
- strcmp(archive_name + archive_name_len - 8, ".tar.lz4") == 0);
-
- /* Is this a .tar.zst archive? */
- is_tar_zstd = (archive_name_len > 8 &&
- strcmp(archive_name + archive_name_len - 8, ".tar.zst") == 0);
+ /* Check whether it is a tar archive and its compression type */
+ is_tar = parse_tar_compress_algorithm(archive_name,
+ &compressed_tar_algorithm);
/* Is this any kind of compressed tar? */
- is_compressed_tar = is_tar_gz || is_tar_lz4 || is_tar_zstd;
+ is_compressed_tar = (is_tar &&
+ compressed_tar_algorithm != PG_COMPRESSION_NONE);
/*
* Injecting the manifest into a compressed tar file would be possible if
@@ -1128,7 +1114,7 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
(spclocation == NULL && writerecoveryconf));
/* At present, we only know how to parse tar archives. */
- if (must_parse_archive && !is_tar && !is_compressed_tar)
+ if (must_parse_archive && !is_tar)
{
pg_log_error("cannot parse archive \"%s\"", archive_name);
pg_log_error_detail("Only tar archives can be parsed.");
@@ -1263,13 +1249,13 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
* If the user has requested a server compressed archive along with
* archive extraction at client then we need to decompress it.
*/
- if (format == 'p')
+ if (format == 'p' && is_compressed_tar)
{
- if (is_tar_gz)
+ if (compressed_tar_algorithm == PG_COMPRESSION_GZIP)
streamer = astreamer_gzip_decompressor_new(streamer);
- else if (is_tar_lz4)
+ else if (compressed_tar_algorithm == PG_COMPRESSION_LZ4)
streamer = astreamer_lz4_decompressor_new(streamer);
- else if (is_tar_zstd)
+ else if (compressed_tar_algorithm == PG_COMPRESSION_ZSTD)
streamer = astreamer_zstd_decompressor_new(streamer);
}
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index cbc9447384f..31f606c45b1 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -941,17 +941,7 @@ precheck_tar_backup_file(verifier_context *context, char *relpath,
}
/* Now, check the compression type of the tar */
- if (strcmp(suffix, ".tar") == 0)
- compress_algorithm = PG_COMPRESSION_NONE;
- else if (strcmp(suffix, ".tgz") == 0)
- compress_algorithm = PG_COMPRESSION_GZIP;
- else if (strcmp(suffix, ".tar.gz") == 0)
- compress_algorithm = PG_COMPRESSION_GZIP;
- else if (strcmp(suffix, ".tar.lz4") == 0)
- compress_algorithm = PG_COMPRESSION_LZ4;
- else if (strcmp(suffix, ".tar.zst") == 0)
- compress_algorithm = PG_COMPRESSION_ZSTD;
- else
+ if (!parse_tar_compress_algorithm(suffix, &compress_algorithm))
{
report_backup_error(context,
"file \"%s\" is not expected in a tar format backup",
diff --git a/src/common/compression.c b/src/common/compression.c
index 92cd4ec7a0d..fb27501d297 100644
--- a/src/common/compression.c
+++ b/src/common/compression.c
@@ -41,6 +41,36 @@ static int expect_integer_value(char *keyword, char *value,
static bool expect_boolean_value(char *keyword, char *value,
pg_compress_specification *result);
+/*
+ * Look up a compression algorithm by archive file extension. Returns true and
+ * sets *algorithm if the name is recognized. Otherwise returns false.
+ */
+bool
+parse_tar_compress_algorithm(const char *fname, pg_compress_algorithm *algorithm)
+{
+ int fname_len = strlen(fname);
+
+ if (fname_len >= 4 &&
+ strcmp(fname + fname_len - 4, ".tar") == 0)
+ *algorithm = PG_COMPRESSION_NONE;
+ else if (fname_len >= 4 &&
+ strcmp(fname + fname_len - 4, ".tgz") == 0)
+ *algorithm = PG_COMPRESSION_GZIP;
+ else if (fname_len >= 7 &&
+ strcmp(fname + fname_len - 7, ".tar.gz") == 0)
+ *algorithm = PG_COMPRESSION_GZIP;
+ else if (fname_len >= 8 &&
+ strcmp(fname + fname_len - 8, ".tar.lz4") == 0)
+ *algorithm = PG_COMPRESSION_LZ4;
+ else if (fname_len >= 8 &&
+ strcmp(fname + fname_len - 8, ".tar.zst") == 0)
+ *algorithm = PG_COMPRESSION_ZSTD;
+ else
+ return false;
+
+ return true;
+}
+
/*
* Look up a compression algorithm by name. Returns true and sets *algorithm
* if the name is recognized. Otherwise returns false.
diff --git a/src/include/common/compression.h b/src/include/common/compression.h
index 6c745b90066..f99c747cdd3 100644
--- a/src/include/common/compression.h
+++ b/src/include/common/compression.h
@@ -41,6 +41,8 @@ typedef struct pg_compress_specification
extern void parse_compress_options(const char *option, char **algorithm,
char **detail);
+extern bool parse_tar_compress_algorithm(const char *fname,
+ pg_compress_algorithm *algorithm);
extern bool parse_compress_algorithm(char *name, pg_compress_algorithm *algorithm);
extern const char *get_compress_algorithm_name(pg_compress_algorithm algorithm);
--
2.47.1
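The suffix mapping in parse_tar_compress_algorithm() can be tested in isolation. The sketch below is a table-driven restatement, not the patch's code. tar_algorithm_from_name() and has_suffix() are hypothetical names, and the enum is a local stand-in for pg_compress_algorithm, so the code compiles without PostgreSQL headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Local stand-in for pg_compress_algorithm from common/compression.h */
typedef enum
{
    PG_COMPRESSION_NONE,
    PG_COMPRESSION_GZIP,
    PG_COMPRESSION_LZ4,
    PG_COMPRESSION_ZSTD
} pg_compress_algorithm;

/* True if s ends with suffix (an exact match of the whole name counts). */
static bool
has_suffix(const char *s, const char *suffix)
{
    size_t      slen = strlen(s);
    size_t      xlen = strlen(suffix);

    return slen >= xlen && strcmp(s + slen - xlen, suffix) == 0;
}

/* Same suffix-to-algorithm mapping as parse_tar_compress_algorithm() */
static bool
tar_algorithm_from_name(const char *fname, pg_compress_algorithm *algorithm)
{
    static const struct
    {
        const char *suffix;
        pg_compress_algorithm alg;
    }           map[] = {
        {".tar", PG_COMPRESSION_NONE},
        {".tgz", PG_COMPRESSION_GZIP},
        {".tar.gz", PG_COMPRESSION_GZIP},
        {".tar.lz4", PG_COMPRESSION_LZ4},
        {".tar.zst", PG_COMPRESSION_ZSTD},
    };

    for (size_t i = 0; i < sizeof(map) / sizeof(map[0]); i++)
    {
        if (has_suffix(fname, map[i].suffix))
        {
            *algorithm = map[i].alg;
            return true;
        }
    }
    return false;
}
```

The diff implies two behavioral changes for pg_basebackup. First, it now also recognizes ".tgz". Second, its length checks changed from "> 4" to ">= 4", so a bare ".tar" name is accepted, whereas the old code required a non-empty base name.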
[application/x-patch] v18-0002-pg_waldump-Preparatory-refactoring-for-tar-archi.patch (8.3K, 3-v18-0002-pg_waldump-Preparatory-refactoring-for-tar-archi.patch)
From 3790325ce1316d745e453081dc381f05b68ad036 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Thu, 22 Jan 2026 10:28:32 +0530
Subject: [PATCH v18 2/4] pg_waldump: Preparatory refactoring for tar archive
WAL decoding.
Several refactoring steps in preparation for adding tar archive WAL
decoding support to pg_waldump:
- Move XLogDumpPrivate and related declarations into a new pg_waldump.h
header, allowing a second source file to share them.
- Factor out required_read_len() so the read-size calculation can be
reused for both regular WAL files and tar-archived WAL.
- Move the WAL segment size variable into XLogDumpPrivate and rename it
to segsize, making it accessible to the archive streamer code.
Author: Amul Sul <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Jakub Wartak <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
discussion: https://postgr.es/m/CAAJ_b94bqdWN3h2J-PzzzQ2Npbwct5ZQHggn_QoYGhC2rn-=WQ@mail.gmail.com
---
src/bin/pg_waldump/pg_waldump.c | 78 +++++++++++++++++++--------------
src/bin/pg_waldump/pg_waldump.h | 26 +++++++++++
2 files changed, 70 insertions(+), 34 deletions(-)
create mode 100644 src/bin/pg_waldump/pg_waldump.h
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index f3446385d6a..5d31b15dbd8 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -29,6 +29,7 @@
#include "common/logging.h"
#include "common/relpath.h"
#include "getopt_long.h"
+#include "pg_waldump.h"
#include "rmgrdesc.h"
#include "storage/bufpage.h"
@@ -43,14 +44,6 @@ static volatile sig_atomic_t time_to_stop = false;
static const RelFileLocator emptyRelFileLocator = {0, 0, 0};
-typedef struct XLogDumpPrivate
-{
- TimeLineID timeline;
- XLogRecPtr startptr;
- XLogRecPtr endptr;
- bool endptr_reached;
-} XLogDumpPrivate;
-
typedef struct XLogDumpConfig
{
/* display options */
@@ -333,6 +326,32 @@ identify_target_directory(char *directory, char *fname, int *WalSegSz)
return NULL; /* not reached */
}
+/*
+ * Returns the size in bytes of the data to be read. Returns -1 if the end
+ * point has already been reached.
+ */
+static inline int
+required_read_len(XLogDumpPrivate *private, XLogRecPtr targetPagePtr,
+ int reqLen)
+{
+ int count = XLOG_BLCKSZ;
+
+ if (XLogRecPtrIsValid(private->endptr))
+ {
+ if (targetPagePtr + XLOG_BLCKSZ <= private->endptr)
+ count = XLOG_BLCKSZ;
+ else if (targetPagePtr + reqLen <= private->endptr)
+ count = private->endptr - targetPagePtr;
+ else
+ {
+ private->endptr_reached = true;
+ return -1;
+ }
+ }
+
+ return count;
+}
+
/* pg_waldump's XLogReaderRoutine->segment_open callback */
static void
WALDumpOpenSegment(XLogReaderState *state, XLogSegNo nextSegNo,
@@ -390,21 +409,12 @@ WALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
XLogRecPtr targetPtr, char *readBuff)
{
XLogDumpPrivate *private = state->private_data;
- int count = XLOG_BLCKSZ;
+ int count = required_read_len(private, targetPagePtr, reqLen);
WALReadError errinfo;
- if (XLogRecPtrIsValid(private->endptr))
- {
- if (targetPagePtr + XLOG_BLCKSZ <= private->endptr)
- count = XLOG_BLCKSZ;
- else if (targetPagePtr + reqLen <= private->endptr)
- count = private->endptr - targetPagePtr;
- else
- {
- private->endptr_reached = true;
- return -1;
- }
- }
+ /* Bail out if the count to be read is not valid */
+ if (count < 0)
+ return -1;
if (!WALRead(state, readBuff, targetPagePtr, count, private->timeline,
&errinfo))
@@ -801,7 +811,6 @@ main(int argc, char **argv)
XLogRecPtr first_record;
char *waldir = NULL;
char *errormsg;
- int WalSegSz;
static struct option long_options[] = {
{"bkp-details", no_argument, NULL, 'b'},
@@ -855,6 +864,7 @@ main(int argc, char **argv)
memset(&stats, 0, sizeof(XLogStats));
private.timeline = 1;
+ private.segsize = 0;
private.startptr = InvalidXLogRecPtr;
private.endptr = InvalidXLogRecPtr;
private.endptr_reached = false;
@@ -1128,18 +1138,18 @@ main(int argc, char **argv)
pg_fatal("could not open directory \"%s\": %m", waldir);
}
- waldir = identify_target_directory(waldir, fname, &WalSegSz);
+ waldir = identify_target_directory(waldir, fname, &private.segsize);
fd = open_file_in_directory(waldir, fname);
if (fd < 0)
pg_fatal("could not open file \"%s\"", fname);
close(fd);
/* parse position from file */
- XLogFromFileName(fname, &private.timeline, &segno, WalSegSz);
+ XLogFromFileName(fname, &private.timeline, &segno, private.segsize);
if (!XLogRecPtrIsValid(private.startptr))
- XLogSegNoOffsetToRecPtr(segno, 0, WalSegSz, private.startptr);
- else if (!XLByteInSeg(private.startptr, segno, WalSegSz))
+ XLogSegNoOffsetToRecPtr(segno, 0, private.segsize, private.startptr);
+ else if (!XLByteInSeg(private.startptr, segno, private.segsize))
{
pg_log_error("start WAL location %X/%08X is not inside file \"%s\"",
LSN_FORMAT_ARGS(private.startptr),
@@ -1149,7 +1159,7 @@ main(int argc, char **argv)
/* no second file specified, set end position */
if (!(optind + 1 < argc) && !XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(segno + 1, 0, WalSegSz, private.endptr);
+ XLogSegNoOffsetToRecPtr(segno + 1, 0, private.segsize, private.endptr);
/* parse ENDSEG if passed */
if (optind + 1 < argc)
@@ -1165,14 +1175,14 @@ main(int argc, char **argv)
close(fd);
/* parse position from file */
- XLogFromFileName(fname, &private.timeline, &endsegno, WalSegSz);
+ XLogFromFileName(fname, &private.timeline, &endsegno, private.segsize);
if (endsegno < segno)
pg_fatal("ENDSEG %s is before STARTSEG %s",
argv[optind + 1], argv[optind]);
if (!XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(endsegno + 1, 0, WalSegSz,
+ XLogSegNoOffsetToRecPtr(endsegno + 1, 0, private.segsize,
private.endptr);
/* set segno to endsegno for check of --end */
@@ -1180,8 +1190,8 @@ main(int argc, char **argv)
}
- if (!XLByteInSeg(private.endptr, segno, WalSegSz) &&
- private.endptr != (segno + 1) * WalSegSz)
+ if (!XLByteInSeg(private.endptr, segno, private.segsize) &&
+ private.endptr != (segno + 1) * private.segsize)
{
pg_log_error("end WAL location %X/%08X is not inside file \"%s\"",
LSN_FORMAT_ARGS(private.endptr),
@@ -1190,7 +1200,7 @@ main(int argc, char **argv)
}
}
else
- waldir = identify_target_directory(waldir, NULL, &WalSegSz);
+ waldir = identify_target_directory(waldir, NULL, &private.segsize);
/* we don't know what to print */
if (!XLogRecPtrIsValid(private.startptr))
@@ -1203,7 +1213,7 @@ main(int argc, char **argv)
/* we have everything we need, start reading */
xlogreader_state =
- XLogReaderAllocate(WalSegSz, waldir,
+ XLogReaderAllocate(private.segsize, waldir,
XL_ROUTINE(.page_read = WALDumpReadPage,
.segment_open = WALDumpOpenSegment,
.segment_close = WALDumpCloseSegment),
@@ -1224,7 +1234,7 @@ main(int argc, char **argv)
* a segment (e.g. we were used in file mode).
*/
if (first_record != private.startptr &&
- XLogSegmentOffset(private.startptr, WalSegSz) != 0)
+ XLogSegmentOffset(private.startptr, private.segsize) != 0)
pg_log_info(ngettext("first record is after %X/%08X, at %X/%08X, skipping over %u byte",
"first record is after %X/%08X, at %X/%08X, skipping over %u bytes",
(first_record - private.startptr)),
diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h
new file mode 100644
index 00000000000..013b051506f
--- /dev/null
+++ b/src/bin/pg_waldump/pg_waldump.h
@@ -0,0 +1,26 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_waldump.h - decode and display WAL
+ *
+ * Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_waldump/pg_waldump.h
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_WALDUMP_H
+#define PG_WALDUMP_H
+
+#include "access/xlogdefs.h"
+
+/* Contains the necessary information to drive WAL decoding */
+typedef struct XLogDumpPrivate
+{
+ TimeLineID timeline;
+ int segsize;
+ XLogRecPtr startptr;
+ XLogRecPtr endptr;
+ bool endptr_reached;
+} XLogDumpPrivate;
+
+#endif /* PG_WALDUMP_H */
--
2.47.1
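Below is a worked example of the required_read_len() arithmetic from 0002, using a trimmed standalone copy. DumpState and read_len() are hypothetical stand-ins for XLogDumpPrivate and the patch's function, and XLOG_BLCKSZ is assumed to be the default 8192.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XLOG_BLCKSZ 8192                /* assumed default WAL page size */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Trimmed stand-in for XLogDumpPrivate */
typedef struct DumpState
{
    XLogRecPtr  endptr;
    bool        endptr_reached;
} DumpState;

/* Bytes to read for the page at targetPagePtr, or -1 once past endptr */
static int
read_len(DumpState *st, XLogRecPtr targetPagePtr, int reqLen)
{
    if (st->endptr == InvalidXLogRecPtr)
        return XLOG_BLCKSZ;             /* no end bound: read a full page */
    if (targetPagePtr + XLOG_BLCKSZ <= st->endptr)
        return XLOG_BLCKSZ;             /* whole page lies before endptr */
    if (targetPagePtr + reqLen <= st->endptr)
        return (int) (st->endptr - targetPagePtr);  /* partial last page */
    st->endptr_reached = true;          /* caller needs bytes past endptr */
    return -1;
}
```

With endptr at 0x10064, the page at 0x8000 is read in full. The page at 0x10000 yields 100 bytes when the reader needs no more than that. A request for 200 bytes from that page sets endptr_reached and returns -1.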
[application/x-patch] v18-0003-pg_waldump-Add-support-for-reading-WAL-from-tar-.patch (54.4K, 4-v18-0003-pg_waldump-Add-support-for-reading-WAL-from-tar-.patch)
From 94816d7767e18691d25820490e1a079ba90813cd Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Wed, 18 Feb 2026 11:07:57 +0530
Subject: [PATCH v18 3/4] pg_waldump: Add support for reading WAL from tar
archives
pg_waldump can now accept the path to a tar archive (optionally
compressed with gzip, lz4, or zstd) containing WAL files and decode
them. This was added primarily for pg_verifybackup, which previously
had to skip WAL parsing for tar-format backups.
The implementation uses the existing archive streamer infrastructure
with a hash table to track WAL segments read from the archive. If WAL
files within the archive are not in sequential order, out-of-order
segments are written to a temporary directory (created via mkdtemp under
$TMPDIR or the archive's directory) and read back when needed. An
atexit callback ensures the temporary directory is cleaned up.
The --follow option is not supported when reading from a tar archive.
Author: Amul Sul <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Jakub Wartak <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
discussion: https://postgr.es/m/CAAJ_b94bqdWN3h2J-PzzzQ2Npbwct5ZQHggn_QoYGhC2rn-=WQ@mail.gmail.com
---
doc/src/sgml/ref/pg_waldump.sgml | 23 +-
src/bin/pg_waldump/Makefile | 7 +-
src/bin/pg_waldump/archive_waldump.c | 823 +++++++++++++++++++++++++++
src/bin/pg_waldump/meson.build | 4 +-
src/bin/pg_waldump/pg_waldump.c | 286 ++++++++--
src/bin/pg_waldump/pg_waldump.h | 48 ++
src/bin/pg_waldump/t/001_basic.pl | 242 ++++++--
src/tools/pgindent/typedefs.list | 4 +
8 files changed, 1311 insertions(+), 126 deletions(-)
create mode 100644 src/bin/pg_waldump/archive_waldump.c
diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index d1715ff5124..9bbb4bd5772 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -141,13 +141,21 @@ PostgreSQL documentation
<term><option>--path=<replaceable>path</replaceable></option></term>
<listitem>
<para>
- Specifies a directory to search for WAL segment files or a
- directory with a <literal>pg_wal</literal> subdirectory that
+ Specifies a tar archive or a directory to search for WAL segment files
+ or a directory with a <literal>pg_wal</literal> subdirectory that
contains such files. The default is to search in the current
directory, the <literal>pg_wal</literal> subdirectory of the
current directory, and the <literal>pg_wal</literal> subdirectory
of <envar>PGDATA</envar>.
</para>
+ <para>
+ If a tar archive is provided and its WAL segment files are not in
+ sequential order, those files will be written to a temporary directory
+ named starting with <filename>waldump_tmp</filename>. This directory will be
+ created inside the directory specified by the <envar>TMPDIR</envar>
+ environment variable if it is set; otherwise, it will be created within
+ the same directory as the tar archive.
+ </para>
</listitem>
</varlistentry>
@@ -383,6 +391,17 @@ PostgreSQL documentation
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><envar>TMPDIR</envar></term>
+ <listitem>
+ <para>
+ Directory in which to create temporary files when reading WAL from a
+ tar archive with out-of-order segment files. If not set, the temporary
+ directory is created within the same directory as the tar archive.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</refsect1>
diff --git a/src/bin/pg_waldump/Makefile b/src/bin/pg_waldump/Makefile
index 4c1ee649501..aabb87566a2 100644
--- a/src/bin/pg_waldump/Makefile
+++ b/src/bin/pg_waldump/Makefile
@@ -3,6 +3,9 @@
PGFILEDESC = "pg_waldump - decode and display WAL"
PGAPPICON=win32
+# make these available to TAP test scripts
+export TAR
+
subdir = src/bin/pg_waldump
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
@@ -10,13 +13,15 @@ include $(top_builddir)/src/Makefile.global
OBJS = \
$(RMGRDESCOBJS) \
$(WIN32RES) \
+ archive_waldump.o \
compat.o \
pg_waldump.o \
rmgrdesc.o \
xlogreader.o \
xlogstats.o
-override CPPFLAGS := -DFRONTEND $(CPPFLAGS)
+override CPPFLAGS := -DFRONTEND -I$(libpq_srcdir) $(CPPFLAGS)
+LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils
RMGRDESCSOURCES = $(sort $(notdir $(wildcard $(top_srcdir)/src/backend/access/rmgrdesc/*desc*.c)))
RMGRDESCOBJS = $(patsubst %.c,%.o,$(RMGRDESCSOURCES))
diff --git a/src/bin/pg_waldump/archive_waldump.c b/src/bin/pg_waldump/archive_waldump.c
new file mode 100644
index 00000000000..c93e02ece8b
--- /dev/null
+++ b/src/bin/pg_waldump/archive_waldump.c
@@ -0,0 +1,823 @@
+/*-------------------------------------------------------------------------
+ *
+ * archive_waldump.c
+ * A generic facility for reading WAL data from tar archives via archive
+ * streamer.
+ *
+ * Portions Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_waldump/archive_waldump.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <unistd.h>
+
+#include "access/xlog_internal.h"
+#include "common/file_perm.h"
+#include "common/hashfn.h"
+#include "common/logging.h"
+#include "fe_utils/simple_list.h"
+#include "pg_waldump.h"
+
+/*
+ * How many bytes should we try to read from a file at once?
+ */
+#define READ_CHUNK_SIZE (128 * 1024)
+
+/* Temporary exported WAL file directory */
+char *TmpWalSegDir = NULL;
+
+/*
+ * Check if the start segment number is zero; this indicates a request to read
+ * any WAL file.
+ */
+#define READ_ANY_WAL(privateInfo) ((privateInfo)->start_segno == 0)
+
+/*
+ * Hash entry representing a WAL segment retrieved from the archive.
+ *
+ * While WAL segments are typically read sequentially, individual entries
+ * maintain their own buffers for the following reasons:
+ *
+ * 1. Boundary Handling: The archive streamer provides a continuous byte
+ * stream. A single streaming chunk may contain the end of one WAL segment
+ * and the start of the next. Separate buffers allow us to easily
+ * partition and track these bytes by their respective segments.
+ *
+ * 2. Out-of-Order Support: Dedicated buffers simplify logic if segments
+ * are ever archived or retrieved out of sequence.
+ *
+ * To minimize the memory footprint, entries and their associated buffers are
+ * freed immediately once consumed. Since pg_waldump does not request the same
+ * bytes twice, a segment is discarded as soon as pg_waldump moves past it.
+ */
+typedef struct ArchivedWALFile
+{
+ uint32 status; /* hash status */
+ const char *fname; /* hash key: WAL segment name */
+
+ StringInfo buf; /* holds WAL bytes read from archive */
+ bool spilled; /* true if the WAL data was spilled to a
+ * temporary file */
+
+ int read_len; /* total bytes of this WAL segment read from the archive */
+} ArchivedWALFile;
+
+static uint32 hash_string_pointer(const char *s);
+#define SH_PREFIX ArchivedWAL
+#define SH_ELEMENT_TYPE ArchivedWALFile
+#define SH_KEY_TYPE const char *
+#define SH_KEY fname
+#define SH_HASH_KEY(tb, key) hash_string_pointer(key)
+#define SH_EQUAL(tb, a, b) (strcmp(a, b) == 0)
+#define SH_SCOPE static inline
+#define SH_RAW_ALLOCATOR pg_malloc0
+#define SH_DECLARE
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+typedef struct astreamer_waldump
+{
+ astreamer base;
+ XLogDumpPrivate *privateInfo;
+} astreamer_waldump;
+
+static ArchivedWALFile *get_archive_wal_entry(const char *fname,
+ XLogDumpPrivate *privateInfo,
+ int WalSegSz);
+static int read_archive_file(XLogDumpPrivate *privateInfo, Size count);
+static void setup_tmpwal_dir(const char *waldir);
+static void cleanup_tmpwal_dir_atexit(void);
+
+static FILE *prepare_tmp_write(const char *fname);
+static void perform_tmp_write(const char *fname, StringInfo buf, FILE *file);
+
+static astreamer *astreamer_waldump_new(XLogDumpPrivate *privateInfo);
+static void astreamer_waldump_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_waldump_finalize(astreamer *streamer);
+static void astreamer_waldump_free(astreamer *streamer);
+
+static bool member_is_wal_file(astreamer_waldump *mystreamer,
+ astreamer_member *member,
+ char **fname);
+
+static const astreamer_ops astreamer_waldump_ops = {
+ .content = astreamer_waldump_content,
+ .finalize = astreamer_waldump_finalize,
+ .free = astreamer_waldump_free
+};
+
+/*
+ * Initializes the tar archive reader, creates a hash table for WAL entries,
+ * checks for existing valid WAL segments in the archive file and retrieves the
+ * segment size, and sets up filters for relevant entries. It also configures a
+ * temporary directory for out-of-order WAL data and registers an exit callback
+ * to clean up temporary files.
+ */
+void
+init_archive_reader(XLogDumpPrivate *privateInfo, const char *waldir,
+ int *WalSegSz, pg_compress_algorithm compression)
+{
+ int fd;
+ astreamer *streamer;
+ ArchivedWALFile *entry = NULL;
+ XLogLongPageHeader longhdr;
+ XLogSegNo segno;
+ TimeLineID timeline;
+
+ /* Open tar archive and store its file descriptor */
+ fd = open_file_in_directory(waldir, privateInfo->archive_name);
+
+ if (fd < 0)
+ pg_fatal("could not open file \"%s\"", privateInfo->archive_name);
+
+ privateInfo->archive_fd = fd;
+
+ streamer = astreamer_waldump_new(privateInfo);
+
+ /* We must first parse the tar archive. */
+ streamer = astreamer_tar_parser_new(streamer);
+
+ /* If the archive is compressed, decompress before parsing. */
+ if (compression == PG_COMPRESSION_GZIP)
+ streamer = astreamer_gzip_decompressor_new(streamer);
+ else if (compression == PG_COMPRESSION_LZ4)
+ streamer = astreamer_lz4_decompressor_new(streamer);
+ else if (compression == PG_COMPRESSION_ZSTD)
+ streamer = astreamer_zstd_decompressor_new(streamer);
+
+ privateInfo->archive_streamer = streamer;
+
+ /*
+ * Allocate a buffer for reading the archive file to facilitate content
+ * decoding; read requests must not exceed the allocated buffer size.
+ */
+ privateInfo->archive_read_buf = pg_malloc(READ_CHUNK_SIZE);
+ privateInfo->archive_read_buf_size = READ_CHUNK_SIZE;
+
+ /*
+ * Create the hash table storing WAL entries read from the archive, with an
+ * arbitrary initial size.
+ */
+ privateInfo->archive_wal_htab = ArchivedWAL_create(8, NULL);
+
+ /*
+ * Verify that the archive contains a valid WAL file: read until at least
+ * one full page of it is buffered, then fetch the WAL segment size.
+ */
+ while (entry == NULL || entry->buf->len < XLOG_BLCKSZ)
+ {
+ if (read_archive_file(privateInfo, XLOG_BLCKSZ) == 0)
+ pg_fatal("could not find WAL in archive \"%s\"",
+ privateInfo->archive_name);
+
+ entry = privateInfo->cur_file;
+ }
+
+ /* Set WalSegSz if WAL data is successfully read */
+ longhdr = (XLogLongPageHeader) entry->buf->data;
+
+ if (!IsValidWalSegSize(longhdr->xlp_seg_size))
+ {
+ pg_log_error(ngettext("invalid WAL segment size in WAL file from archive \"%s\" (%d byte)",
+ "invalid WAL segment size in WAL file from archive \"%s\" (%d bytes)",
+ longhdr->xlp_seg_size),
+ privateInfo->archive_name, longhdr->xlp_seg_size);
+ pg_log_error_detail("The WAL segment size must be a power of two between 1 MB and 1 GB.");
+ exit(1);
+ }
+
+ *WalSegSz = longhdr->xlp_seg_size;
+
+ /*
+ * With the WAL segment size available, we can now initialize the
+ * dependent start and end segment numbers.
+ */
+ Assert(XLogRecPtrIsValid(privateInfo->startptr));
+ XLByteToSeg(privateInfo->startptr, privateInfo->start_segno, *WalSegSz);
+
+ if (XLogRecPtrIsValid(privateInfo->endptr))
+ XLByteToSeg(privateInfo->endptr, privateInfo->end_segno, *WalSegSz);
+
+ /*
+ * This WAL file was fetched before the filtering parameters (start_segno
+ * and end_segno) were fully initialized. Perform the relevance check
+ * against the user-provided range now; if the file falls outside this
+ * range, remove it from the hash table. Subsequent WAL files will be
+ * filtered automatically by the archive streamer using the updated
+ * start_segno and end_segno values.
+ */
+ XLogFromFileName(entry->fname, &timeline, &segno, *WalSegSz);
+ if (privateInfo->timeline != timeline ||
+ privateInfo->start_segno > segno ||
+ privateInfo->end_segno < segno)
+ free_archive_wal_entry(entry->fname, privateInfo);
+
+ /*
+ * Setup temporary directory to store WAL segments and set up an exit
+ * callback to remove it upon completion.
+ */
+ setup_tmpwal_dir(waldir);
+ atexit(cleanup_tmpwal_dir_atexit);
+}
+
+/*
+ * Release the archive streamer chain and close the archive file.
+ */
+void
+free_archive_reader(XLogDumpPrivate *privateInfo)
+{
+ /*
+ * NB: Normally, astreamer_finalize() is called before astreamer_free() to
+ * flush any remaining buffered data or to ensure the end of the tar
+ * archive is reached. However, when decoding a WAL file, once we hit the
+ * end LSN, any remaining WAL data in the buffer or the tar archive's
+ * unreached end can be safely ignored.
+ */
+ astreamer_free(privateInfo->archive_streamer);
+
+ /* Free any remaining hash table entries and their buffers. */
+ if (privateInfo->archive_wal_htab != NULL)
+ {
+ ArchivedWAL_iterator iter;
+ ArchivedWALFile *entry;
+
+ ArchivedWAL_start_iterate(privateInfo->archive_wal_htab, &iter);
+ while ((entry = ArchivedWAL_iterate(privateInfo->archive_wal_htab,
+ &iter)) != NULL)
+ {
+ if (entry->buf != NULL)
+ destroyStringInfo(entry->buf);
+ }
+ ArchivedWAL_destroy(privateInfo->archive_wal_htab);
+ privateInfo->archive_wal_htab = NULL;
+ }
+
+ /* Free the reusable read buffer. */
+ if (privateInfo->archive_read_buf != NULL)
+ {
+ pg_free(privateInfo->archive_read_buf);
+ privateInfo->archive_read_buf = NULL;
+ }
+
+ /* Close the file. */
+ if (close(privateInfo->archive_fd) != 0)
+ pg_log_error("could not close file \"%s\": %m",
+ privateInfo->archive_name);
+}
+
+/*
+ * Copies 'count' bytes of WAL starting at targetPagePtr into readBuff,
+ * fetching more data from the tar archive via the archive streamer as needed.
+ */
+int
+read_archive_wal_page(XLogDumpPrivate *privateInfo, XLogRecPtr targetPagePtr,
+ Size count, char *readBuff, int WalSegSz)
+{
+ char *p = readBuff;
+ Size nbytes = count;
+ XLogRecPtr recptr = targetPagePtr;
+ XLogSegNo segno;
+ char fname[MAXFNAMELEN];
+ ArchivedWALFile *entry;
+
+ /* Identify the segment and locate its entry in the archive hash */
+ XLByteToSeg(targetPagePtr, segno, WalSegSz);
+ XLogFileName(fname, privateInfo->timeline, segno, WalSegSz);
+ entry = get_archive_wal_entry(fname, privateInfo, WalSegSz);
+
+ while (nbytes > 0)
+ {
+ char *buf = entry->buf->data;
+ int bufLen = entry->buf->len;
+ XLogRecPtr endPtr;
+ XLogRecPtr startPtr;
+
+ /* Calculate the LSN range currently residing in the buffer */
+ XLogSegNoOffsetToRecPtr(segno, entry->read_len, WalSegSz, endPtr);
+ startPtr = endPtr - bufLen;
+
+ /*
+ * Copy the requested WAL data if it is present in the buffer.
+ */
+ if (bufLen > 0 && startPtr <= recptr && recptr < endPtr)
+ {
+ int copyBytes;
+ int offset = recptr - startPtr;
+
+ /*
+ * Given startPtr <= recptr < endPtr and a total buffer size
+ * 'bufLen', the offset (recptr - startPtr) will always be less
+ * than 'bufLen'.
+ */
+ Assert(offset < bufLen);
+
+ copyBytes = Min(nbytes, bufLen - offset);
+ memcpy(p, buf + offset, copyBytes);
+
+ /* Update state for read */
+ recptr += copyBytes;
+ nbytes -= copyBytes;
+ p += copyBytes;
+ }
+ else
+ {
+ /*
+ * Before starting the actual decoding loop, pg_waldump tries to
+ * locate the first valid record from the user-specified start
+ * position, which might not be the start of a WAL record and
+ * could fall in the middle of a record that spans multiple pages.
+ * Consequently, the valid start position the decoder is looking
+ * for could be far away from that initial position.
+ *
+ * This may involve reading across multiple pages, and this
+ * pre-reading fetches data in multiple rounds from the archive
+ * streamer; normally, we would throw away existing buffer
+ * contents to fetch the next set of data, but that existing data
+ * might be needed once the main loop starts. Because previously
+ * read data cannot be re-read by the archive streamer, we delay
+ * resetting the buffer until the main decoding loop is entered.
+ *
+ * Once pg_waldump has entered the main loop, it may re-read the
+ * currently active page, but never an older one; therefore, any
+ * fully consumed WAL data preceding the current page can then be
+ * safely discarded.
+ */
+ if (privateInfo->decoding_started)
+ {
+ resetStringInfo(entry->buf);
+
+ /*
+ * Push back the partial data already copied for the current page,
+ * so that the full page remains available in the buffer for
+ * re-reading if requested.
+ */
+ if (p > readBuff)
+ {
+ Assert((count - nbytes) > 0);
+ appendBinaryStringInfo(entry->buf, readBuff, count - nbytes);
+ }
+ }
+
+ /*
+ * Now, fetch more data. Raise an error if the archive
+ * streamer has moved past our segment (meaning the WAL file
+ * in the archive is shorter than expected) or if reading the
+ * archive reached EOF.
+ */
+ if (privateInfo->cur_file != entry)
+ pg_fatal("WAL segment \"%s\" in archive \"%s\" is too short: read %lld of %lld bytes",
+ fname, privateInfo->archive_name,
+ (long long int) (count - nbytes),
+ (long long int) count);
+ if (read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
+ pg_fatal("unexpected end of archive \"%s\" while reading \"%s\": read %lld of %lld bytes",
+ privateInfo->archive_name, fname,
+ (long long int) (count - nbytes),
+ (long long int) count);
+ }
+ }
+
+ /*
+ * Should have successfully read all the requested bytes or reported a
+ * failure before this point.
+ */
+ Assert(nbytes == 0);
+
+ /*
+ * NB: We return the fixed value provided as input. We could return a
+ * boolean since we either successfully read the WAL page or raise an
+ * error, but the caller expects this value to be returned. The routine
+ * that reads WAL pages from the physical WAL file follows the same
+ * convention.
+ */
+ return count;
+}
+
+/*
+ * Releases a WAL entry that is no longer needed: frees its buffer, removes
+ * its temporary file (if any), and deletes it from the hash table. If the
+ * archive streamer is still reading this file, cur_file is reset to NULL so
+ * that the streamer skips copying the rest of its data.
+ */
+void
+free_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo)
+{
+ ArchivedWALFile *entry;
+
+ entry = ArchivedWAL_lookup(privateInfo->archive_wal_htab, fname);
+
+ if (entry == NULL)
+ return;
+
+ /* Destroy the buffer */
+ destroyStringInfo(entry->buf);
+ entry->buf = NULL;
+
+ /* Remove temporary file if any */
+ if (entry->spilled)
+ {
+ char fpath[MAXPGPATH];
+
+ snprintf(fpath, MAXPGPATH, "%s/%s", TmpWalSegDir, fname);
+
+ if (unlink(fpath) == 0)
+ pg_log_debug("removed file \"%s\"", fpath);
+ }
+
+ /* Set cur_file to NULL if it matches the entry being ignored */
+ if (privateInfo->cur_file == entry)
+ privateInfo->cur_file = NULL;
+
+ ArchivedWAL_delete_item(privateInfo->archive_wal_htab, entry);
+}
+
+/*
+ * Returns the archived WAL entry from the hash table if it exists.
+ * Otherwise, it invokes the routine to read the archive file, which then
+ * populates the entry in the hash table if that WAL exists in the archive.
+ * If the archive streamer happens to be reading a WAL file from the archive
+ * that is not currently needed, that WAL data is written to a temporary
+ * file.
+ */
+static ArchivedWALFile *
+get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo,
+ int WalSegSz)
+{
+ ArchivedWALFile *entry = NULL;
+ FILE *write_fp = NULL;
+
+ /* Search hash table */
+ entry = ArchivedWAL_lookup(privateInfo->archive_wal_htab, fname);
+
+ if (entry != NULL)
+ return entry;
+
+ /*
+ * The requested WAL entry has not been read from the archive yet; invoke
+ * the archive streamer to read it.
+ */
+ while (1)
+ {
+ /*
+ * The WAL file entry currently being processed may change during
+ * archive streamer execution. Therefore, maintain a local variable to
+ * reference the previous entry, ensuring that any remaining data in
+ * its buffer is successfully flushed to the temporary file before
+ * switching to the next WAL entry.
+ */
+ entry = privateInfo->cur_file;
+
+ /* Fetch more data */
+ if (entry == NULL || entry->buf->len == 0)
+ {
+ if (read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
+ break; /* archive file ended */
+ }
+
+ /*
+ * The archive streamer is reading a non-WAL file or an irrelevant WAL
+ * file.
+ */
+ if (entry == NULL)
+ continue;
+
+ /* Found the required entry */
+ if (strcmp(fname, entry->fname) == 0)
+ return entry;
+
+ /*
+ * The archive streamer is currently reading a file other than the one
+ * requested, but that file will be needed later. Write it to a temporary
+ * location so that it can be retrieved when needed.
+ */
+
+ /* Create a temporary file if one does not already exist */
+ if (!entry->spilled)
+ {
+ write_fp = prepare_tmp_write(entry->fname);
+ entry->spilled = true;
+ }
+
+ /* Flush data from the buffer to the file */
+ perform_tmp_write(entry->fname, entry->buf, write_fp);
+ resetStringInfo(entry->buf);
+
+ /*
+ * If cur_file has moved to a different entry, the streamer has finished
+ * reading this file, so close its temporary file.
+ */
+ if (entry != privateInfo->cur_file && write_fp != NULL)
+ {
+ fclose(write_fp);
+ write_fp = NULL;
+ }
+ }
+
+ /* Requested WAL segment not found */
+ pg_fatal("could not find WAL \"%s\" in archive \"%s\"",
+ fname, privateInfo->archive_name);
+}
+
+/*
+ * Reads up to 'count' bytes from the archive and feeds them to the archive
+ * streamer for decompression and parsing; returns the bytes read (0 at EOF).
+ */
+static int
+read_archive_file(XLogDumpPrivate *privateInfo, Size count)
+{
+ int rc;
+
+ /* The read request must not exceed the allocated buffer size. */
+ Assert(privateInfo->archive_read_buf_size >= count);
+
+ rc = read(privateInfo->archive_fd, privateInfo->archive_read_buf, count);
+ if (rc < 0)
+ pg_fatal("could not read file \"%s\": %m",
+ privateInfo->archive_name);
+
+ /*
+ * Decompress (if required), and then parse the previously read contents
+ * of the tar file.
+ */
+ if (rc > 0)
+ astreamer_content(privateInfo->archive_streamer, NULL,
+ privateInfo->archive_read_buf, rc,
+ ASTREAMER_UNKNOWN);
+
+ return rc;
+}
+
+/*
+ * Set up a temporary directory for storing spilled WAL segments.
+ */
+static void
+setup_tmpwal_dir(const char *waldir)
+{
+ char *template;
+
+ /*
+ * Use the directory specified by the TMPDIR environment variable. If it's
+ * not set, use the provided WAL directory to store extracted WAL files
+ * temporarily.
+ */
+ template = psprintf("%s/waldump_tmp-XXXXXX",
+ getenv("TMPDIR") ? getenv("TMPDIR") : waldir);
+ TmpWalSegDir = mkdtemp(template);
+
+ if (TmpWalSegDir == NULL)
+ pg_fatal("could not create directory \"%s\": %m", template);
+
+ canonicalize_path(TmpWalSegDir);
+
+ pg_log_debug("created directory \"%s\"", TmpWalSegDir);
+}
+
+/*
+ * Remove temporary directory at exit, if any.
+ */
+static void
+cleanup_tmpwal_dir_atexit(void)
+{
+ rmtree(TmpWalSegDir, true);
+}
+
+/*
+ * Create an empty placeholder file and return its handle.
+ */
+static FILE *
+prepare_tmp_write(const char *fname)
+{
+ char fpath[MAXPGPATH];
+ FILE *file;
+
+ snprintf(fpath, MAXPGPATH, "%s/%s", TmpWalSegDir, fname);
+
+ /* Create an empty placeholder */
+ file = fopen(fpath, PG_BINARY_W);
+ if (file == NULL)
+ pg_fatal("could not create file \"%s\": %m", fpath);
+
+#ifndef WIN32
+ if (chmod(fpath, pg_file_create_mode))
+ pg_fatal("could not set permissions on file \"%s\": %m",
+ fpath);
+#endif
+
+ pg_log_debug("spilling to temporary file \"%s\"", fpath);
+
+ return file;
+}
+
+/*
+ * Write buffer data to the given file handle.
+ */
+static void
+perform_tmp_write(const char *fname, StringInfo buf, FILE *file)
+{
+ Assert(file);
+
+ errno = 0;
+ if (buf->len > 0 && fwrite(buf->data, buf->len, 1, file) != 1)
+ {
+ /*
+ * If write didn't set errno, assume problem is no disk space
+ */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_fatal("could not write to file \"%s\": %m", fname);
+ }
+}
+
+/*
+ * Create an astreamer that can read WAL from a tar file.
+ */
+static astreamer *
+astreamer_waldump_new(XLogDumpPrivate *privateInfo)
+{
+ astreamer_waldump *streamer;
+
+ streamer = palloc0_object(astreamer_waldump);
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_waldump_ops;
+
+ streamer->privateInfo = privateInfo;
+
+ return &streamer->base;
+}
+
+/*
+ * Main entry point of the archive streamer for reading WAL data from a tar
+ * file. If a member is identified as a valid WAL file, a hash entry is created
+ * for it, and its contents are copied into that entry's buffer, making them
+ * accessible to the decoding routine.
+ */
+static void
+astreamer_waldump_content(astreamer *streamer, astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
+{
+ astreamer_waldump *mystreamer = (astreamer_waldump *) streamer;
+ XLogDumpPrivate *privateInfo = mystreamer->privateInfo;
+
+ Assert(context != ASTREAMER_UNKNOWN);
+
+ switch (context)
+ {
+ case ASTREAMER_MEMBER_HEADER:
+ {
+ char *fname = NULL;
+ ArchivedWALFile *entry;
+ bool found;
+
+ pg_log_debug("reading \"%s\"", member->pathname);
+
+ if (!member_is_wal_file(mystreamer, member, &fname))
+ break;
+
+ /*
+ * Further checks are skipped if any WAL file can be read.
+ * This typically occurs during initial verification.
+ */
+ if (!READ_ANY_WAL(privateInfo))
+ {
+ XLogSegNo segno;
+ TimeLineID timeline;
+
+ /*
+ * Skip the segment if its timeline does not match or if it
+ * falls outside the caller-specified range.
+ */
+ XLogFromFileName(fname, &timeline, &segno, privateInfo->segsize);
+ if (privateInfo->timeline != timeline ||
+ privateInfo->start_segno > segno ||
+ privateInfo->end_segno < segno)
+ {
+ pfree(fname);
+ break;
+ }
+ }
+
+ entry = ArchivedWAL_insert(privateInfo->archive_wal_htab,
+ fname, &found);
+
+ /*
+ * Shouldn't happen, but if it does, simply ignore the
+ * duplicate WAL file.
+ */
+ if (found)
+ {
+ pg_log_warning("ignoring duplicate WAL \"%s\" found in archive \"%s\"",
+ member->pathname, privateInfo->archive_name);
+ pfree(fname);
+ break;
+ }
+
+ entry->buf = makeStringInfo();
+ entry->spilled = false;
+ entry->read_len = 0;
+ privateInfo->cur_file = entry;
+ }
+ break;
+
+ case ASTREAMER_MEMBER_CONTENTS:
+ if (privateInfo->cur_file)
+ {
+ appendBinaryStringInfo(privateInfo->cur_file->buf, data, len);
+ privateInfo->cur_file->read_len += len;
+ }
+ break;
+
+ case ASTREAMER_MEMBER_TRAILER:
+ privateInfo->cur_file = NULL;
+ break;
+
+ case ASTREAMER_ARCHIVE_TRAILER:
+ break;
+
+ default:
+ /* Shouldn't happen. */
+ pg_fatal("unexpected state while parsing tar file");
+ }
+}
+
+/*
+ * End-of-stream processing for an astreamer_waldump stream.
+ */
+static void
+astreamer_waldump_finalize(astreamer *streamer)
+{
+ Assert(streamer->bbs_next == NULL);
+}
+
+/*
+ * Free memory associated with an astreamer_waldump stream.
+ */
+static void
+astreamer_waldump_free(astreamer *streamer)
+{
+ Assert(streamer->bbs_next == NULL);
+ pfree(streamer);
+}
+
+/*
+ * Returns true if the archive member name matches the WAL naming format. If
+ * successful, it also outputs the WAL segment name.
+ */
+static bool
+member_is_wal_file(astreamer_waldump *mystreamer, astreamer_member *member,
+ char **fname)
+{
+ int pathlen;
+ char pathname[MAXPGPATH];
+ char *filename;
+
+ /* We are only interested in normal files. */
+ if (member->is_directory || member->is_link)
+ return false;
+
+ if (strlen(member->pathname) < XLOG_FNAME_LEN)
+ return false;
+
+ /*
+ * For a correct comparison, we must remove any '.' or '..' components
+ * from the member pathname. Similar to member_verify_header(), we prepend
+ * './' to the path so that canonicalize_path() can properly resolve and
+ * strip these references from the tar member name.
+ */
+ snprintf(pathname, MAXPGPATH, "./%s", member->pathname);
+ canonicalize_path(pathname);
+ pathlen = strlen(pathname);
+
+ /* WAL files from the top-level or pg_wal directory will be decoded */
+ if (pathlen > XLOG_FNAME_LEN &&
+ strncmp(pathname, XLOGDIR, strlen(XLOGDIR)) != 0)
+ return false;
+
+ /* The WAL file name may be preceded by a directory path */
+ filename = pathname + (pathlen - XLOG_FNAME_LEN);
+ if (!IsXLogFileName(filename))
+ return false;
+
+ *fname = pnstrdup(filename, XLOG_FNAME_LEN);
+
+ return true;
+}
+
+/*
+ * Helper function for the ArchivedWAL hash table.
+ */
+static uint32
+hash_string_pointer(const char *s)
+{
+ unsigned char *ss = (unsigned char *) s;
+
+ return hash_bytes(ss, strlen(s));
+}
diff --git a/src/bin/pg_waldump/meson.build b/src/bin/pg_waldump/meson.build
index 633a9874bb5..5296f21b82c 100644
--- a/src/bin/pg_waldump/meson.build
+++ b/src/bin/pg_waldump/meson.build
@@ -1,6 +1,7 @@
# Copyright (c) 2022-2026, PostgreSQL Global Development Group
pg_waldump_sources = files(
+ 'archive_waldump.c',
'compat.c',
'pg_waldump.c',
'rmgrdesc.c',
@@ -18,7 +19,7 @@ endif
pg_waldump = executable('pg_waldump',
pg_waldump_sources,
- dependencies: [frontend_code, lz4, zstd],
+ dependencies: [frontend_code, libpq, lz4, zstd],
c_args: ['-DFRONTEND'], # needed for xlogreader et al
kwargs: default_bin_args,
)
@@ -29,6 +30,7 @@ tests += {
'sd': meson.current_source_dir(),
'bd': meson.current_build_dir(),
'tap': {
+ 'env': {'TAR': tar.found() ? tar.full_path() : ''},
'tests': [
't/001_basic.pl',
't/002_save_fullpage.pl',
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 5d31b15dbd8..b13cedaa3e7 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -176,7 +176,7 @@ split_path(const char *path, char **dir, char **fname)
*
* return a read only fd
*/
-static int
+int
open_file_in_directory(const char *directory, const char *fname)
{
int fd = -1;
@@ -440,6 +440,103 @@ WALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
return count;
}
+/*
+ * pg_waldump's XLogReaderRoutine->segment_open callback to support dumping WAL
+ * files from tar archives.
+ */
+static void
+TarWALDumpOpenSegment(XLogReaderState *state, XLogSegNo nextSegNo,
+ TimeLineID *tli_p)
+{
+ /* No action needed */
+}
+
+/*
+ * pg_waldump's XLogReaderRoutine->segment_close callback.
+ */
+static void
+TarWALDumpCloseSegment(XLogReaderState *state)
+{
+ /* No action needed */
+}
+
+/*
+ * pg_waldump's XLogReaderRoutine->page_read callback to support dumping WAL
+ * files from tar archives.
+ */
+static int
+TarWALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
+ XLogRecPtr targetPtr, char *readBuff)
+{
+ XLogDumpPrivate *private = state->private_data;
+ int count = required_read_len(private, targetPagePtr, reqLen);
+ int WalSegSz = state->segcxt.ws_segsize;
+ XLogSegNo curSegNo;
+
+ /* Bail out if the count to be read is not valid */
+ if (count < 0)
+ return -1;
+
+ /*
+ * If the target page is in a different segment, free the buffer and/or
+ * temporary file disk space occupied by the previous segment's data.
+ * Since pg_waldump never requests the same WAL bytes twice, moving to a
+ * new segment implies the previous buffer's data and that segment will
+ * not be needed again.
+ *
+ * Afterward, check whether the next required WAL segment already exists
+ * in the temporary directory before invoking the archive streamer.
+ */
+ curSegNo = state->seg.ws_segno;
+ if (!XLByteInSeg(targetPagePtr, curSegNo, WalSegSz))
+ {
+ char fname[MAXFNAMELEN];
+ XLogSegNo nextSegNo;
+
+ /*
+ * Calculate the next WAL segment to be decoded from the given page
+ * pointer
+ */
+ XLByteToSeg(targetPagePtr, nextSegNo, WalSegSz);
+ state->seg.ws_tli = private->timeline;
+ state->seg.ws_segno = nextSegNo;
+
+ /* Close the WAL segment file if it is currently open */
+ if (state->seg.ws_file >= 0)
+ {
+ close(state->seg.ws_file);
+ state->seg.ws_file = -1;
+ }
+
+ /*
+ * If in pre-reading mode (prior to actual decoding), do not delete
+ * any entries that might be requested again once the decoding loop
+ * starts. For more details, see the comments in
+ * read_archive_wal_page().
+ */
+ if (private->decoding_started && curSegNo < nextSegNo)
+ {
+ XLogFileName(fname, state->seg.ws_tli, curSegNo, WalSegSz);
+ free_archive_wal_entry(fname, private);
+ }
+
+ /*
+ * If the next segment has been spilled to TmpWalSegDir, open it
+ */
+ XLogFileName(fname, state->seg.ws_tli, nextSegNo, WalSegSz);
+ state->seg.ws_file = open_file_in_directory(TmpWalSegDir, fname);
+ }
+
+ /* Continue reading from the open WAL segment, if any */
+ if (state->seg.ws_file >= 0)
+ return WALDumpReadPage(state, targetPagePtr, count, targetPtr,
+ readBuff);
+
+ /* Otherwise, read the WAL page from the archive streamer */
+ return read_archive_wal_page(private, targetPagePtr, count, readBuff,
+ WalSegSz);
+}
+
/*
* Boolean to return whether the given WAL record matches a specific relation
* and optionally block.
@@ -777,8 +874,8 @@ usage(void)
printf(_(" -F, --fork=FORK only show records that modify blocks in fork FORK;\n"
" valid names are main, fsm, vm, init\n"));
printf(_(" -n, --limit=N number of records to display\n"));
- printf(_(" -p, --path=PATH directory in which to find WAL segment files or a\n"
- " directory with a ./pg_wal that contains such files\n"
+ printf(_(" -p, --path=PATH a tar archive or a directory in which to find WAL segment files or\n"
+ " a directory with a ./pg_wal that contains such files\n"
" (default: current directory, ./pg_wal, $PGDATA/pg_wal)\n"));
printf(_(" -q, --quiet do not print any output, except for errors\n"));
printf(_(" -r, --rmgr=RMGR only show records generated by resource manager RMGR;\n"
@@ -810,7 +907,9 @@ main(int argc, char **argv)
XLogRecord *record;
XLogRecPtr first_record;
char *waldir = NULL;
+ char *walpath = NULL;
char *errormsg;
+ pg_compress_algorithm compression = PG_COMPRESSION_NONE;
static struct option long_options[] = {
{"bkp-details", no_argument, NULL, 'b'},
@@ -868,6 +967,10 @@ main(int argc, char **argv)
private.startptr = InvalidXLogRecPtr;
private.endptr = InvalidXLogRecPtr;
private.endptr_reached = false;
+ private.decoding_started = false;
+ private.archive_name = NULL;
+ private.start_segno = 0;
+ private.end_segno = UINT64_MAX;
config.quiet = false;
config.bkp_details = false;
@@ -943,7 +1046,7 @@ main(int argc, char **argv)
}
break;
case 'p':
- waldir = pg_strdup(optarg);
+ walpath = pg_strdup(optarg);
break;
case 'q':
config.quiet = true;
@@ -1107,12 +1210,21 @@ main(int argc, char **argv)
goto bad_argument;
}
- if (waldir != NULL)
+ if (walpath != NULL)
{
+ waldir = walpath; /* split_path() overrides this for a tar archive */
+ if (parse_tar_compress_algorithm(walpath, &compression))
+ {
+ char *fname = NULL;
+
+ split_path(walpath, &waldir, &fname);
+
+ private.archive_name = fname;
+ }
/* validate path points to directory */
- if (!verify_directory(waldir))
+ else if (!verify_directory(walpath))
{
- pg_log_error("could not open directory \"%s\": %m", waldir);
+ pg_log_error("could not open directory \"%s\": %m", walpath);
goto bad_argument;
}
}
@@ -1128,6 +1240,17 @@ main(int argc, char **argv)
int fd;
XLogSegNo segno;
+ /*
+ * If a tar archive is passed using the --path option, all other
+ * arguments become unnecessary.
+ */
+ if (private.archive_name)
+ {
+ pg_log_error("unnecessary command-line arguments specified with tar archive (first is \"%s\")",
+ argv[optind]);
+ goto bad_argument;
+ }
+
split_path(argv[optind], &directory, &fname);
if (waldir == NULL && directory != NULL)
@@ -1138,69 +1261,75 @@ main(int argc, char **argv)
pg_fatal("could not open directory \"%s\": %m", waldir);
}
- waldir = identify_target_directory(waldir, fname, &private.segsize);
- fd = open_file_in_directory(waldir, fname);
- if (fd < 0)
- pg_fatal("could not open file \"%s\"", fname);
- close(fd);
-
- /* parse position from file */
- XLogFromFileName(fname, &private.timeline, &segno, private.segsize);
-
- if (!XLogRecPtrIsValid(private.startptr))
- XLogSegNoOffsetToRecPtr(segno, 0, private.segsize, private.startptr);
- else if (!XLByteInSeg(private.startptr, segno, private.segsize))
+ if (fname != NULL && parse_tar_compress_algorithm(fname, &compression))
{
- pg_log_error("start WAL location %X/%08X is not inside file \"%s\"",
- LSN_FORMAT_ARGS(private.startptr),
- fname);
- goto bad_argument;
+ private.archive_name = fname;
}
-
- /* no second file specified, set end position */
- if (!(optind + 1 < argc) && !XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(segno + 1, 0, private.segsize, private.endptr);
-
- /* parse ENDSEG if passed */
- if (optind + 1 < argc)
+ else
{
- XLogSegNo endsegno;
-
- /* ignore directory, already have that */
- split_path(argv[optind + 1], &directory, &fname);
-
+ waldir = identify_target_directory(waldir, fname, &private.segsize);
fd = open_file_in_directory(waldir, fname);
if (fd < 0)
pg_fatal("could not open file \"%s\"", fname);
close(fd);
/* parse position from file */
- XLogFromFileName(fname, &private.timeline, &endsegno, private.segsize);
+ XLogFromFileName(fname, &private.timeline, &segno, private.segsize);
- if (endsegno < segno)
- pg_fatal("ENDSEG %s is before STARTSEG %s",
- argv[optind + 1], argv[optind]);
+ if (!XLogRecPtrIsValid(private.startptr))
+ XLogSegNoOffsetToRecPtr(segno, 0, private.segsize, private.startptr);
+ else if (!XLByteInSeg(private.startptr, segno, private.segsize))
+ {
+ pg_log_error("start WAL location %X/%08X is not inside file \"%s\"",
+ LSN_FORMAT_ARGS(private.startptr),
+ fname);
+ goto bad_argument;
+ }
- if (!XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(endsegno + 1, 0, private.segsize,
- private.endptr);
+ /* no second file specified, set end position */
+ if (!(optind + 1 < argc) && !XLogRecPtrIsValid(private.endptr))
+ XLogSegNoOffsetToRecPtr(segno + 1, 0, private.segsize, private.endptr);
- /* set segno to endsegno for check of --end */
- segno = endsegno;
- }
+ /* parse ENDSEG if passed */
+ if (optind + 1 < argc)
+ {
+ XLogSegNo endsegno;
+ /* ignore directory, already have that */
+ split_path(argv[optind + 1], &directory, &fname);
- if (!XLByteInSeg(private.endptr, segno, private.segsize) &&
- private.endptr != (segno + 1) * private.segsize)
- {
- pg_log_error("end WAL location %X/%08X is not inside file \"%s\"",
- LSN_FORMAT_ARGS(private.endptr),
- argv[argc - 1]);
- goto bad_argument;
+ fd = open_file_in_directory(waldir, fname);
+ if (fd < 0)
+ pg_fatal("could not open file \"%s\"", fname);
+ close(fd);
+
+ /* parse position from file */
+ XLogFromFileName(fname, &private.timeline, &endsegno, private.segsize);
+
+ if (endsegno < segno)
+ pg_fatal("ENDSEG %s is before STARTSEG %s",
+ argv[optind + 1], argv[optind]);
+
+ if (!XLogRecPtrIsValid(private.endptr))
+ XLogSegNoOffsetToRecPtr(endsegno + 1, 0, private.segsize,
+ private.endptr);
+
+ /* set segno to endsegno for check of --end */
+ segno = endsegno;
+ }
+
+ if (!XLByteInSeg(private.endptr, segno, private.segsize) &&
+ private.endptr != (segno + 1) * private.segsize)
+ {
+ pg_log_error("end WAL location %X/%08X is not inside file \"%s\"",
+ LSN_FORMAT_ARGS(private.endptr),
+ argv[argc - 1]);
+ goto bad_argument;
+ }
}
}
- else
- waldir = identify_target_directory(waldir, NULL, &private.segsize);
+ else if (!private.archive_name)
+ waldir = identify_target_directory(walpath, NULL, &private.segsize);
/* we don't know what to print */
if (!XLogRecPtrIsValid(private.startptr))
@@ -1209,15 +1338,46 @@ main(int argc, char **argv)
goto bad_argument;
}
+ /* --follow is not supported with tar archives */
+ if (config.follow && private.archive_name)
+ {
+ pg_log_error("--follow is not supported when reading from a tar archive");
+ goto bad_argument;
+ }
+
/* done with argument parsing, do the actual work */
/* we have everything we need, start reading */
- xlogreader_state =
- XLogReaderAllocate(private.segsize, waldir,
- XL_ROUTINE(.page_read = WALDumpReadPage,
- .segment_open = WALDumpOpenSegment,
- .segment_close = WALDumpCloseSegment),
- &private);
+ if (private.archive_name)
+ {
+ /*
+ * A NULL WAL directory indicates that the archive file is located in
+ * the current working directory.
+ */
+ if (waldir == NULL)
+ waldir = pg_strdup(".");
+
+ /* Set up for reading tar file */
+ init_archive_reader(&private, waldir, &private.segsize, compression);
+
+ /* Routine to decode WAL files in tar archive */
+ xlogreader_state =
+ XLogReaderAllocate(private.segsize, waldir,
+ XL_ROUTINE(.page_read = TarWALDumpReadPage,
+ .segment_open = TarWALDumpOpenSegment,
+ .segment_close = TarWALDumpCloseSegment),
+ &private);
+ }
+ else
+ {
+ xlogreader_state =
+ XLogReaderAllocate(private.segsize, waldir,
+ XL_ROUTINE(.page_read = WALDumpReadPage,
+ .segment_open = WALDumpOpenSegment,
+ .segment_close = WALDumpCloseSegment),
+ &private);
+ }
+
if (!xlogreader_state)
pg_fatal("out of memory while allocating a WAL reading processor");
@@ -1245,6 +1405,9 @@ main(int argc, char **argv)
if (config.stats == true && !config.quiet)
stats.startptr = first_record;
+ /* Flag indicating that the decoding loop has been entered */
+ private.decoding_started = true;
+
for (;;)
{
if (time_to_stop)
@@ -1326,6 +1489,9 @@ main(int argc, char **argv)
XLogReaderFree(xlogreader_state);
+ if (private.archive_name)
+ free_archive_reader(&private);
+
return EXIT_SUCCESS;
bad_argument:
diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h
index 013b051506f..1097390d575 100644
--- a/src/bin/pg_waldump/pg_waldump.h
+++ b/src/bin/pg_waldump/pg_waldump.h
@@ -12,6 +12,14 @@
#define PG_WALDUMP_H
#include "access/xlogdefs.h"
+#include "fe_utils/astreamer.h"
+
+/* Forward declaration */
+struct ArchivedWALFile;
+struct ArchivedWAL_hash;
+
+/* Temporary directory */
+extern char *TmpWalSegDir;
/* Contains the necessary information to drive WAL decoding */
typedef struct XLogDumpPrivate
@@ -21,6 +29,46 @@ typedef struct XLogDumpPrivate
XLogRecPtr startptr;
XLogRecPtr endptr;
bool endptr_reached;
+ bool decoding_started;
+
+ /* Fields required to read WAL from archive */
+ char *archive_name; /* Tar archive name */
+ int archive_fd; /* File descriptor for the open tar file */
+
+ astreamer *archive_streamer;
+ char *archive_read_buf; /* Reusable read buffer for archive I/O */
+ Size archive_read_buf_size;
+
+ /* What the archive streamer is currently reading */
+ struct ArchivedWALFile *cur_file;
+
+ /*
+ * Hash table of all WAL files that the archive stream has read, including
+ * the one currently in progress.
+ */
+ struct ArchivedWAL_hash *archive_wal_htab;
+
+ /*
+ * Although these values can be easily derived from startptr and endptr,
+ * doing so repeatedly for each archived member would be inefficient, as
+ * it would involve recalculating and filtering out irrelevant WAL
+ * segments.
+ */
+ XLogSegNo start_segno;
+ XLogSegNo end_segno;
} XLogDumpPrivate;
+extern int open_file_in_directory(const char *directory, const char *fname);
+
+extern void init_archive_reader(XLogDumpPrivate *privateInfo,
+ const char *waldir, int *WalSegSz,
+ pg_compress_algorithm compression);
+extern void free_archive_reader(XLogDumpPrivate *privateInfo);
+extern int read_archive_wal_page(XLogDumpPrivate *privateInfo,
+ XLogRecPtr targetPagePtr,
+ Size count, char *readBuff,
+ int WalSegSz);
+extern void free_archive_wal_entry(const char *fname,
+ XLogDumpPrivate *privateInfo);
+
#endif /* PG_WALDUMP_H */
diff --git a/src/bin/pg_waldump/t/001_basic.pl b/src/bin/pg_waldump/t/001_basic.pl
index 5db5d20136f..6960bd46ba4 100644
--- a/src/bin/pg_waldump/t/001_basic.pl
+++ b/src/bin/pg_waldump/t/001_basic.pl
@@ -3,9 +3,13 @@
use strict;
use warnings FATAL => 'all';
+use Cwd;
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Test::More;
+use List::Util qw(shuffle);
+
+my $tar = $ENV{TAR};
program_help_ok('pg_waldump');
program_version_ok('pg_waldump');
@@ -162,6 +166,42 @@ CREATE TABLESPACE ts1 LOCATION '$tblspc_path';
DROP TABLESPACE ts1;
});
+# Test: Decode a continuation record (contrecord) that spans multiple WAL
+# segments.
+#
+# Now consume all remaining room in the current WAL segment, leaving
+# space enough only for the start of a largish record.
+$node->safe_psql(
+ 'postgres', q{
+DO $$
+DECLARE
+ wal_segsize int := setting::int FROM pg_settings WHERE name = 'wal_segment_size';
+ remain int;
+ iters int := 0;
+BEGIN
+ LOOP
+ INSERT into t1(b)
+ select repeat(encode(sha256(g::text::bytea), 'hex'), (random() * 15 + 1)::int)
+ from generate_series(1, 10) g;
+
+ remain := wal_segsize - (pg_current_wal_insert_lsn() - '0/0') % wal_segsize;
+ IF remain < 2 * setting::int from pg_settings where name = 'block_size' THEN
+ RAISE log 'exiting after % iterations, % bytes to end of WAL segment', iters, remain;
+ EXIT;
+ END IF;
+ iters := iters + 1;
+ END LOOP;
+END
+$$;
+});
+
+my $contrecord_lsn = $node->safe_psql('postgres',
+ 'SELECT pg_current_wal_insert_lsn()');
+# Emit a large record that must continue into the next WAL segment (a contrecord)
+$node->safe_psql('postgres',
+ qq{SELECT pg_logical_emit_message(true, 'test 026', repeat('xyzxz', 123456))}
+);
+
my ($end_lsn, $end_walfile) = split /\|/,
$node->safe_psql('postgres',
q{SELECT pg_current_wal_insert_lsn(), pg_walfile_name(pg_current_wal_insert_lsn())}
@@ -198,28 +238,6 @@ command_like(
],
qr/./,
'runs with start and end segment specified');
-command_fails_like(
- [ 'pg_waldump', '--path' => $node->data_dir ],
- qr/error: no start WAL location given/,
- 'path option requires start location');
-command_like(
- [
- 'pg_waldump',
- '--path' => $node->data_dir,
- '--start' => $start_lsn,
- '--end' => $end_lsn,
- ],
- qr/./,
- 'runs with path option and start and end locations');
-command_fails_like(
- [
- 'pg_waldump',
- '--path' => $node->data_dir,
- '--start' => $start_lsn,
- ],
- qr/error: error in WAL record at/,
- 'falling off the end of the WAL results in an error');
-
command_like(
[
'pg_waldump', '--quiet',
@@ -227,22 +245,16 @@ command_like(
],
qr/^$/,
'no output with --quiet option');
-command_fails_like(
- [
- 'pg_waldump', '--quiet',
- '--path' => $node->data_dir,
- '--start' => $start_lsn
- ],
- qr/error: error in WAL record at/,
- 'errors are shown with --quiet');
-
# Test for: Display a message that we're skipping data if `from`
# wasn't a pointer to the start of a record.
+sub test_pg_waldump_skip_bytes
{
+ my ($path, $startlsn, $endlsn) = @_;
+
# Construct a new LSN that is one byte past the original
# start_lsn.
- my ($part1, $part2) = split qr{/}, $start_lsn;
+ my ($part1, $part2) = split qr{/}, $startlsn;
my $lsn2 = hex $part2;
$lsn2++;
my $new_start = sprintf("%s/%X", $part1, $lsn2);
@@ -252,7 +264,8 @@ command_fails_like(
my $result = IPC::Run::run [
'pg_waldump',
'--start' => $new_start,
- $node->data_dir . '/pg_wal/' . $start_walfile
+ '--end' => $endlsn,
+ '--path' => $path,
],
'>' => \$stdout,
'2>' => \$stderr;
@@ -266,15 +279,15 @@ command_fails_like(
sub test_pg_waldump
{
local $Test::Builder::Level = $Test::Builder::Level + 1;
- my @opts = @_;
+ my ($path, $startlsn, $endlsn, @opts) = @_;
my ($stdout, $stderr);
my $result = IPC::Run::run [
'pg_waldump',
- '--path' => $node->data_dir,
- '--start' => $start_lsn,
- '--end' => $end_lsn,
+ '--start' => $startlsn,
+ '--end' => $endlsn,
+ '--path' => $path,
@opts
],
'>' => \$stdout,
@@ -286,40 +299,145 @@ sub test_pg_waldump
return @lines;
}
-my @lines;
+# Create a tar archive, shuffling the file order
+sub generate_archive
+{
+ my ($archive, $directory, $compression_flags) = @_;
-@lines = test_pg_waldump;
-is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
+ my @files;
+ opendir my $dh, $directory or die "opendir: $!";
+ while (my $entry = readdir $dh) {
+ # Skip '.' and '..'
+ next if $entry eq '.' || $entry eq '..';
+ push @files, $entry;
+ }
+ closedir $dh;
-@lines = test_pg_waldump('--limit' => 6);
-is(@lines, 6, 'limit option observed');
+ @files = shuffle @files;
-@lines = test_pg_waldump('--fullpage');
-is(grep(!/^rmgr:.*\bFPW\b/, @lines), 0, 'all output lines are FPW');
+ # move into the WAL directory before archiving files
+ my $cwd = getcwd;
+ chdir($directory) || die "chdir: $!";
+ command_ok([$tar, $compression_flags, $archive, @files]);
+ chdir($cwd) || die "chdir: $!";
+}
-@lines = test_pg_waldump('--stats');
-like($lines[0], qr/WAL statistics/, "statistics on stdout");
-is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+my $tmp_dir = PostgreSQL::Test::Utils::tempdir_short();
-@lines = test_pg_waldump('--stats=record');
-like($lines[0], qr/WAL statistics/, "statistics on stdout");
-is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+my @scenarios = (
+ {
+ 'path' => $node->data_dir,
+ 'is_archive' => 0,
+ 'enabled' => 1
+ },
+ {
+ 'path' => "$tmp_dir/pg_wal.tar",
+ 'compression_method' => 'none',
+ 'compression_flags' => '-cf',
+ 'is_archive' => 1,
+ 'enabled' => 1
+ },
+ {
+ 'path' => "$tmp_dir/pg_wal.tar.gz",
+ 'compression_method' => 'gzip',
+ 'compression_flags' => '-czf',
+ 'is_archive' => 1,
+ 'enabled' => check_pg_config("#define HAVE_LIBZ 1")
+ });
-@lines = test_pg_waldump('--rmgr' => 'Btree');
-is(grep(!/^rmgr: Btree/, @lines), 0, 'only Btree lines');
+for my $scenario (@scenarios)
+{
+ my $path = $scenario->{'path'};
-@lines = test_pg_waldump('--fork' => 'init');
-is(grep(!/fork init/, @lines), 0, 'only init fork lines');
+ SKIP:
+ {
+ skip "tar command is not available", 56
+ if !defined $tar && $scenario->{'is_archive'};
+ skip "$scenario->{'compression_method'} compression not supported by this build", 56
+ if !$scenario->{'enabled'} && $scenario->{'is_archive'};
-@lines = test_pg_waldump(
- '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_t1_oid");
-is(grep(!/rel $default_ts_oid\/$postgres_db_oid\/$rel_t1_oid/, @lines),
- 0, 'only lines for selected relation');
+ # create pg_wal archive
+ if ($scenario->{'is_archive'})
+ {
+ generate_archive($path,
+ $node->data_dir . '/pg_wal',
+ $scenario->{'compression_flags'});
+ }
-@lines = test_pg_waldump(
- '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_i1a_oid",
- '--block' => 1);
-is(grep(!/\bblk 1\b/, @lines), 0, 'only lines for selected block');
+ command_fails_like(
+ [ 'pg_waldump', '--path' => $path ],
+ qr/error: no start WAL location given/,
+ 'path option requires start location');
+ command_like(
+ [
+ 'pg_waldump',
+ '--path' => $path,
+ '--start' => $start_lsn,
+ '--end' => $end_lsn,
+ ],
+ qr/./,
+ 'runs with path option and start and end locations');
+ command_fails_like(
+ [
+ 'pg_waldump',
+ '--path' => $path,
+ '--start' => $start_lsn,
+ ],
+ qr/error: error in WAL record at/,
+ 'falling off the end of the WAL results in an error');
+ command_fails_like(
+ [
+ 'pg_waldump', '--quiet',
+ '--path' => $path,
+ '--start' => $start_lsn
+ ],
+ qr/error: error in WAL record at/,
+ 'errors are shown with --quiet');
+
+ test_pg_waldump_skip_bytes($path, $start_lsn, $end_lsn);
+
+ my @lines = test_pg_waldump($path, $start_lsn, $end_lsn);
+ is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
+
+ @lines = test_pg_waldump($path, $contrecord_lsn, $end_lsn);
+ is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
+
+ test_pg_waldump_skip_bytes($path, $contrecord_lsn, $end_lsn);
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--limit' => 6);
+ is(@lines, 6, 'limit option observed');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--fullpage');
+ is(grep(!/^rmgr:.*\bFPW\b/, @lines), 0, 'all output lines are FPW');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--stats');
+ like($lines[0], qr/WAL statistics/, "statistics on stdout");
+ is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--stats=record');
+ like($lines[0], qr/WAL statistics/, "statistics on stdout");
+ is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--rmgr' => 'Btree');
+ is(grep(!/^rmgr: Btree/, @lines), 0, 'only Btree lines');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--fork' => 'init');
+ is(grep(!/fork init/, @lines), 0, 'only init fork lines');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn,
+ '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_t1_oid");
+ is(grep(!/rel $default_ts_oid\/$postgres_db_oid\/$rel_t1_oid/, @lines),
+ 0, 'only lines for selected relation');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn,
+ '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_i1a_oid",
+ '--block' => 1);
+ is(grep(!/\bblk 1\b/, @lines), 0, 'only lines for selected block');
+
+ # Cleanup.
+ unlink $path if $scenario->{'is_archive'};
+ }
+}
done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 52f8603a7be..4961c3024af 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -147,6 +147,9 @@ ArchiveOpts
ArchiveShutdownCB
ArchiveStartupCB
ArchiveStreamState
+ArchivedWALFile
+ArchivedWAL_hash
+ArchivedWAL_iterator
ArchiverOutput
ArchiverStage
ArrayAnalyzeExtraData
@@ -3540,6 +3543,7 @@ astreamer_recovery_injector
astreamer_tar_archiver
astreamer_tar_parser
astreamer_verify
+astreamer_waldump
astreamer_zstd_frame
auth_password_hook_typ
autovac_table
--
2.47.1
[application/x-patch] v18-0004-pg_verifybackup-Enable-WAL-parsing-for-tar-forma.patch (16.0K, 5-v18-0004-pg_verifybackup-Enable-WAL-parsing-for-tar-forma.patch)
download | inline diff:
From b3e7949959cca0d3c5258c63c172702794f8051e Mon Sep 17 00:00:00 2001
From: Andrew Dunstan <[email protected]>
Date: Wed, 11 Mar 2026 11:26:36 -0400
Subject: [PATCH v18 4/4] pg_verifybackup: Enable WAL parsing for tar-format
backups
Now that pg_waldump supports reading WAL from tar archives, remove the
restriction that forced --no-parse-wal for tar-format backups.
pg_verifybackup now automatically locates the WAL archive: it looks for
a separate pg_wal.tar first, then falls back to the main base.tar. A
new --wal-path option (replacing the old --wal-directory, which is kept
as a silent alias) accepts either a directory or a tar archive path.
The default WAL directory preparation is deferred until the backup
format is known, since tar-format backups resolve the WAL path
differently from plain-format ones.
Author: Amul Sul <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Jakub Wartak <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
discussion: https://postgr.es/m/CAAJ_b94bqdWN3h2J-PzzzQ2Npbwct5ZQHggn_QoYGhC2rn-=WQ@mail.gmail.com
---
doc/src/sgml/ref/pg_verifybackup.sgml | 14 ++-
src/bin/pg_verifybackup/pg_verifybackup.c | 89 ++++++++++++-------
src/bin/pg_verifybackup/t/002_algorithm.pl | 4 -
src/bin/pg_verifybackup/t/003_corruption.pl | 4 +-
src/bin/pg_verifybackup/t/007_wal.pl | 20 ++++-
src/bin/pg_verifybackup/t/008_untar.pl | 5 +-
src/bin/pg_verifybackup/t/010_client_untar.pl | 5 +-
7 files changed, 85 insertions(+), 56 deletions(-)
diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml
index 61c12975e4a..1695cfe91c8 100644
--- a/doc/src/sgml/ref/pg_verifybackup.sgml
+++ b/doc/src/sgml/ref/pg_verifybackup.sgml
@@ -36,10 +36,7 @@ PostgreSQL documentation
<literal>backup_manifest</literal> generated by the server at the time
of the backup. The backup may be stored either in the "plain" or the "tar"
format; this includes tar-format backups compressed with any algorithm
- supported by <application>pg_basebackup</application>. However, at present,
- <literal>WAL</literal> verification is supported only for plain-format
- backups. Therefore, if the backup is stored in tar-format, the
- <literal>-n, --no-parse-wal</literal> option should be used.
+ supported by <application>pg_basebackup</application>.
</para>
<para>
@@ -261,12 +258,13 @@ PostgreSQL documentation
<varlistentry>
<term><option>-w <replaceable class="parameter">path</replaceable></option></term>
- <term><option>--wal-directory=<replaceable class="parameter">path</replaceable></option></term>
+ <term><option>--wal-path=<replaceable class="parameter">path</replaceable></option></term>
<listitem>
<para>
- Try to parse WAL files stored in the specified directory, rather than
- in <literal>pg_wal</literal>. This may be useful if the backup is
- stored in a separate location from the WAL archive.
+ Try to parse WAL files stored in the specified directory or tar
+ archive, rather than in <literal>pg_wal</literal>. This may be
+ useful if the backup is stored in a separate location from the WAL
+ archive.
</para>
</listitem>
</varlistentry>
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 31f606c45b1..db79dd39103 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -74,7 +74,9 @@ pg_noreturn static void report_manifest_error(JsonManifestParseContext *context,
const char *fmt,...)
pg_attribute_printf(2, 3);
-static void verify_tar_backup(verifier_context *context, DIR *dir);
+static void verify_tar_backup(verifier_context *context, DIR *dir,
+ char **base_archive_path,
+ char **wal_archive_path);
static void verify_plain_backup_directory(verifier_context *context,
char *relpath, char *fullpath,
DIR *dir);
@@ -83,7 +85,9 @@ static void verify_plain_backup_file(verifier_context *context, char *relpath,
static void verify_control_file(const char *controlpath,
uint64 manifest_system_identifier);
static void precheck_tar_backup_file(verifier_context *context, char *relpath,
- char *fullpath, SimplePtrList *tarfiles);
+ char *fullpath, SimplePtrList *tarfiles,
+ char **base_archive_path,
+ char **wal_archive_path);
static void verify_tar_file(verifier_context *context, char *relpath,
char *fullpath, astreamer *streamer);
static void report_extra_backup_files(verifier_context *context);
@@ -93,7 +97,7 @@ static void verify_file_checksum(verifier_context *context,
uint8 *buffer);
static void parse_required_wal(verifier_context *context,
char *pg_waldump_path,
- char *wal_directory);
+ char *wal_path);
static astreamer *create_archive_verifier(verifier_context *context,
char *archive_name,
Oid tblspc_oid,
@@ -126,7 +130,8 @@ main(int argc, char **argv)
{"progress", no_argument, NULL, 'P'},
{"quiet", no_argument, NULL, 'q'},
{"skip-checksums", no_argument, NULL, 's'},
- {"wal-directory", required_argument, NULL, 'w'},
+ {"wal-path", required_argument, NULL, 'w'},
+ {"wal-directory", required_argument, NULL, 'w'}, /* deprecated */
{NULL, 0, NULL, 0}
};
@@ -135,7 +140,9 @@ main(int argc, char **argv)
char *manifest_path = NULL;
bool no_parse_wal = false;
bool quiet = false;
- char *wal_directory = NULL;
+ char *wal_path = NULL;
+ char *base_archive_path = NULL;
+ char *wal_archive_path = NULL;
char *pg_waldump_path = NULL;
DIR *dir;
@@ -221,8 +228,8 @@ main(int argc, char **argv)
context.skip_checksums = true;
break;
case 'w':
- wal_directory = pstrdup(optarg);
- canonicalize_path(wal_directory);
+ wal_path = pstrdup(optarg);
+ canonicalize_path(wal_path);
break;
default:
/* getopt_long already emitted a complaint */
@@ -285,10 +292,6 @@ main(int argc, char **argv)
manifest_path = psprintf("%s/backup_manifest",
context.backup_directory);
- /* By default, look for the WAL in the backup directory, too. */
- if (wal_directory == NULL)
- wal_directory = psprintf("%s/pg_wal", context.backup_directory);
-
/*
* Try to read the manifest. We treat any errors encountered while parsing
* the manifest as fatal; there doesn't seem to be much point in trying to
@@ -331,17 +334,6 @@ main(int argc, char **argv)
pfree(path);
}
- /*
- * XXX: In the future, we should consider enhancing pg_waldump to read WAL
- * files from an archive.
- */
- if (!no_parse_wal && context.format == 't')
- {
- pg_log_error("pg_waldump cannot read tar files");
- pg_log_error_hint("You must use -n/--no-parse-wal when verifying a tar-format backup.");
- exit(1);
- }
-
/*
* Perform the appropriate type of verification appropriate based on the
* backup format. This will close 'dir'.
@@ -350,7 +342,7 @@ main(int argc, char **argv)
verify_plain_backup_directory(&context, NULL, context.backup_directory,
dir);
else
- verify_tar_backup(&context, dir);
+ verify_tar_backup(&context, dir, &base_archive_path, &wal_archive_path);
/*
* The "matched" flag should now be set on every entry in the hash table.
@@ -368,12 +360,35 @@ main(int argc, char **argv)
if (context.format == 'p' && !context.skip_checksums)
verify_backup_checksums(&context);
+ /*
+ * By default, WAL files are expected to be found in the backup directory
+ * for plain-format backups. In the case of tar-format backups, if a
+ * separate WAL archive is not found, the WAL files are most likely
+ * included within the main data directory archive.
+ */
+ if (wal_path == NULL)
+ {
+ if (context.format == 'p')
+ wal_path = psprintf("%s/pg_wal", context.backup_directory);
+ else if (wal_archive_path)
+ wal_path = wal_archive_path;
+ else if (base_archive_path)
+ wal_path = base_archive_path;
+ else
+ {
+ pg_log_error("WAL archive not found");
+ pg_log_error_hint("Specify the correct path using the option -w/--wal-path, "
+ "or use -n/--no-parse-wal to skip WAL parsing.");
+ exit(1);
+ }
+ }
+
/*
* Try to parse the required ranges of WAL records, unless we were told
* not to do so.
*/
if (!no_parse_wal)
- parse_required_wal(&context, pg_waldump_path, wal_directory);
+ parse_required_wal(&context, pg_waldump_path, wal_path);
/*
* If everything looks OK, tell the user this, unless we were asked to
@@ -787,7 +802,8 @@ verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
* close when we're done with it.
*/
static void
-verify_tar_backup(verifier_context *context, DIR *dir)
+verify_tar_backup(verifier_context *context, DIR *dir, char **base_archive_path,
+ char **wal_archive_path)
{
struct dirent *dirent;
SimplePtrList tarfiles = {NULL, NULL};
@@ -816,7 +832,8 @@ verify_tar_backup(verifier_context *context, DIR *dir)
char *fullpath;
fullpath = psprintf("%s/%s", context->backup_directory, filename);
- precheck_tar_backup_file(context, filename, fullpath, &tarfiles);
+ precheck_tar_backup_file(context, filename, fullpath, &tarfiles,
+ base_archive_path, wal_archive_path);
pfree(fullpath);
}
}
@@ -875,11 +892,13 @@ verify_tar_backup(verifier_context *context, DIR *dir)
*
* The arguments to this function are mostly the same as the
* verify_plain_backup_file. The additional argument outputs a list of valid
- * tar files.
+ * tar files, along with the full paths to the main archive and the WAL
+ * directory archive.
*/
static void
precheck_tar_backup_file(verifier_context *context, char *relpath,
- char *fullpath, SimplePtrList *tarfiles)
+ char *fullpath, SimplePtrList *tarfiles,
+ char **base_archive_path, char **wal_archive_path)
{
struct stat sb;
Oid tblspc_oid = InvalidOid;
@@ -918,9 +937,17 @@ precheck_tar_backup_file(verifier_context *context, char *relpath,
* extension such as .gz, .lz4, or .zst.
*/
if (strncmp("base", relpath, 4) == 0)
+ {
suffix = relpath + 4;
+
+ *base_archive_path = pstrdup(fullpath);
+ }
else if (strncmp("pg_wal", relpath, 6) == 0)
+ {
suffix = relpath + 6;
+
+ *wal_archive_path = pstrdup(fullpath);
+ }
else
{
/* Expected a <tablespaceoid>.tar file here. */
@@ -1188,7 +1215,7 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
*/
static void
parse_required_wal(verifier_context *context, char *pg_waldump_path,
- char *wal_directory)
+ char *wal_path)
{
manifest_data *manifest = context->manifest;
manifest_wal_range *this_wal_range = manifest->first_wal_range;
@@ -1198,7 +1225,7 @@ parse_required_wal(verifier_context *context, char *pg_waldump_path,
char *pg_waldump_cmd;
pg_waldump_cmd = psprintf("\"%s\" --quiet --path=\"%s\" --timeline=%u --start=%X/%08X --end=%X/%08X\n",
- pg_waldump_path, wal_directory, this_wal_range->tli,
+ pg_waldump_path, wal_path, this_wal_range->tli,
LSN_FORMAT_ARGS(this_wal_range->start_lsn),
LSN_FORMAT_ARGS(this_wal_range->end_lsn));
fflush(NULL);
@@ -1366,7 +1393,7 @@ usage(void)
printf(_(" -P, --progress show progress information\n"));
printf(_(" -q, --quiet do not print any output, except for errors\n"));
printf(_(" -s, --skip-checksums skip checksum verification\n"));
- printf(_(" -w, --wal-directory=PATH use specified path for WAL files\n"));
+ printf(_(" -w, --wal-path=PATH use specified path for WAL files\n"));
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -?, --help show this help, then exit\n"));
printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
diff --git a/src/bin/pg_verifybackup/t/002_algorithm.pl b/src/bin/pg_verifybackup/t/002_algorithm.pl
index 0556191ec9d..edc515d5904 100644
--- a/src/bin/pg_verifybackup/t/002_algorithm.pl
+++ b/src/bin/pg_verifybackup/t/002_algorithm.pl
@@ -30,10 +30,6 @@ sub test_checksums
{
# Add switch to get a tar-format backup
push @backup, ('--format' => 'tar');
-
- # Add switch to skip WAL verification, which is not yet supported for
- # tar-format backups
- push @verify, ('--no-parse-wal');
}
# A backup with a bogus algorithm should fail.
diff --git a/src/bin/pg_verifybackup/t/003_corruption.pl b/src/bin/pg_verifybackup/t/003_corruption.pl
index b1d65b8aa0f..882d75d9dc2 100644
--- a/src/bin/pg_verifybackup/t/003_corruption.pl
+++ b/src/bin/pg_verifybackup/t/003_corruption.pl
@@ -193,10 +193,8 @@ for my $scenario (@scenario)
command_ok([ $tar, '-cf' => "$tar_backup_path/base.tar", '.' ]);
chdir($cwd) || die "chdir: $!";
- # Now check that the backup no longer verifies. We must use -n
- # here, because pg_waldump can't yet read WAL from a tarfile.
command_fails_like(
- [ 'pg_verifybackup', '--no-parse-wal', $tar_backup_path ],
+ [ 'pg_verifybackup', $tar_backup_path ],
$scenario->{'fails_like'},
"corrupt backup fails verification: $name");
diff --git a/src/bin/pg_verifybackup/t/007_wal.pl b/src/bin/pg_verifybackup/t/007_wal.pl
index 79087a1f6be..0e0377bfacc 100644
--- a/src/bin/pg_verifybackup/t/007_wal.pl
+++ b/src/bin/pg_verifybackup/t/007_wal.pl
@@ -42,10 +42,10 @@ command_ok([ 'pg_verifybackup', '--no-parse-wal', $backup_path ],
command_ok(
[
'pg_verifybackup',
- '--wal-directory' => $relocated_pg_wal,
+ '--wal-path' => $relocated_pg_wal,
$backup_path
],
- '--wal-directory can be used to specify WAL directory');
+ '--wal-path can be used to specify WAL directory');
# Move directory back to original location.
rename($relocated_pg_wal, $original_pg_wal) || die "rename pg_wal back: $!";
@@ -90,4 +90,20 @@ command_ok(
[ 'pg_verifybackup', $backup_path2 ],
'valid base backup with timeline > 1');
+# Test WAL verification for a tar-format backup with a separate pg_wal.tar,
+# as produced by pg_basebackup --format=tar --wal-method=stream.
+my $backup_path3 = $primary->backup_dir . '/test_tar_wal';
+$primary->command_ok(
+ [
+ 'pg_basebackup',
+ '--pgdata' => $backup_path3,
+ '--no-sync',
+ '--format' => 'tar',
+ '--checkpoint' => 'fast'
+ ],
+ "tar backup with separate pg_wal.tar");
+command_ok(
+ [ 'pg_verifybackup', $backup_path3 ],
+ 'WAL verification succeeds with separate pg_wal.tar');
+
done_testing();
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
index ae67ae85a31..161c08c190d 100644
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -47,7 +47,6 @@ my $tsoid = $primary->safe_psql(
SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1'));
my $backup_path = $primary->backup_dir . '/server-backup';
-my $extract_path = $primary->backup_dir . '/extracted-backup';
my @test_configuration = (
{
@@ -123,14 +122,12 @@ for my $tc (@test_configuration)
# Verify tar backup.
$primary->command_ok(
[
- 'pg_verifybackup', '--no-parse-wal',
- '--exit-on-error', $backup_path,
+ 'pg_verifybackup', '--exit-on-error', $backup_path,
],
"verify backup, compression $method");
# Cleanup.
rmtree($backup_path);
- rmtree($extract_path);
}
}
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
index 1ac7b5db75a..9670fbe4fda 100644
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -32,7 +32,6 @@ print $jf $junk_data;
close $jf;
my $backup_path = $primary->backup_dir . '/client-backup';
-my $extract_path = $primary->backup_dir . '/extracted-backup';
my @test_configuration = (
{
@@ -137,13 +136,11 @@ for my $tc (@test_configuration)
# Verify tar backup.
$primary->command_ok(
[
- 'pg_verifybackup', '--no-parse-wal',
- '--exit-on-error', $backup_path,
+ 'pg_verifybackup', '--exit-on-error', $backup_path,
],
"verify backup, compression $method");
# Cleanup.
- rmtree($extract_path);
rmtree($backup_path);
}
}
--
2.47.1
^ permalink raw reply [nested|flat] 29+ messages in thread
* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-03-18 15:16 Amul Sul <[email protected]>
parent: Amul Sul <[email protected]>
From: Amul Sul @ 2026-03-18 15:16 UTC
To: Andrew Dunstan <[email protected]>; +Cc: Robert Haas <[email protected]>; Chao Li <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>
On Wed, Mar 18, 2026 at 5:15 PM Amul Sul <[email protected]> wrote:
>
> On Wed, Mar 11, 2026 at 10:38 PM Andrew Dunstan <[email protected]> wrote:
> > [...]
> > Looks pretty good. I have squashed them into three patches I think are committable. Also attached is a diff showing what's changed - mainly this:
> >
> > . --follow + tar archive rejected (pg_waldump.c) — new validation prevents a confusing pg_fatal when combining --follow with a tar archive
> > . error messages split (archive_waldump.c) — the single "could not read file" error is now two distinct messages: "WAL segment is too short" (truncated file) vs "unexpected end of archive" (archive EOF) - Fixes an issue raised in review
> > . hash table cleanup (archive_waldump.c) — free_archive_reader now iterates and frees all remaining hash entries and destroys the table
> >
>
> The final squashed version looks good to me, thank you. But, I would
> like to propose splitting the 0001 patch into two separate commits: a
> preparatory refactoring of the pg_waldump code and a standalone commit
> that moves the tar archive detection and compression logic to a common
> location, as the latter is an independent improvement to the existing
> codebase. Additionally, since the test file refactoring was only kept
> separate to facilitate the review and has already been reviewed, I
> suggest merging those changes into the main feature patch i.e. 0002.
> All other elements should remain in a single preparatory refactoring
> patch for pg_waldump.
>
> Attached is the version that includes the proposed split. No
> additional changes to 0002 and 0003 patches.
>
Added the two missing 'Reviewed-by' lines to the credit section of the
commit message and did a minor optimization in get_archive_wal_entry.
Regards,
Amul
Attachments:
[application/octet-stream] v19-0001-Move-tar-detection-and-compression-logic-to-comm.patch (7.1K, 2-v19-0001-Move-tar-detection-and-compression-logic-to-comm.patch)
download | inline diff:
From 2b3fec35c1070e187ee71ee7fdaa76bef09e076f Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Tue, 17 Feb 2026 14:51:11 +0530
Subject: [PATCH v19 1/4] Move tar detection and compression logic to common.
Consolidate tar archive identification and compression-type detection
logic into a shared location. Currently used by pg_basebackup and
pg_verifybackup, this functionality is also required for upcoming
pg_waldump enhancements.
This change promotes code reuse and simplifies maintenance across
frontend tools.
Author: Amul Sul <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Jakub Wartak <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Euler Taveira <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
discussion: https://postgr.es/m/CAAJ_b94bqdWN3h2J-PzzzQ2Npbwct5ZQHggn_QoYGhC2rn-=WQ@mail.gmail.com
---
src/bin/pg_basebackup/pg_basebackup.c | 36 +++++++----------------
src/bin/pg_verifybackup/pg_verifybackup.c | 12 +-------
src/common/compression.c | 30 +++++++++++++++++++
src/include/common/compression.h | 2 ++
4 files changed, 44 insertions(+), 36 deletions(-)
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index fa169a8d642..c1a4672aa6f 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1070,12 +1070,9 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
astreamer *manifest_inject_streamer = NULL;
bool inject_manifest;
bool is_tar,
- is_tar_gz,
- is_tar_lz4,
- is_tar_zstd,
is_compressed_tar;
+ pg_compress_algorithm compressed_tar_algorithm;
bool must_parse_archive;
- int archive_name_len = strlen(archive_name);
/*
* Normally, we emit the backup manifest as a separate file, but when
@@ -1084,24 +1081,13 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
*/
inject_manifest = (format == 't' && strcmp(basedir, "-") == 0 && manifest);
- /* Is this a tar archive? */
- is_tar = (archive_name_len > 4 &&
- strcmp(archive_name + archive_name_len - 4, ".tar") == 0);
-
- /* Is this a .tar.gz archive? */
- is_tar_gz = (archive_name_len > 7 &&
- strcmp(archive_name + archive_name_len - 7, ".tar.gz") == 0);
-
- /* Is this a .tar.lz4 archive? */
- is_tar_lz4 = (archive_name_len > 8 &&
- strcmp(archive_name + archive_name_len - 8, ".tar.lz4") == 0);
-
- /* Is this a .tar.zst archive? */
- is_tar_zstd = (archive_name_len > 8 &&
- strcmp(archive_name + archive_name_len - 8, ".tar.zst") == 0);
+ /* Check whether it is a tar archive and its compression type */
+ is_tar = parse_tar_compress_algorithm(archive_name,
+ &compressed_tar_algorithm);
/* Is this any kind of compressed tar? */
- is_compressed_tar = is_tar_gz || is_tar_lz4 || is_tar_zstd;
+ is_compressed_tar = (is_tar &&
+ compressed_tar_algorithm != PG_COMPRESSION_NONE);
/*
* Injecting the manifest into a compressed tar file would be possible if
@@ -1128,7 +1114,7 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
(spclocation == NULL && writerecoveryconf));
/* At present, we only know how to parse tar archives. */
- if (must_parse_archive && !is_tar && !is_compressed_tar)
+ if (must_parse_archive && !is_tar)
{
pg_log_error("cannot parse archive \"%s\"", archive_name);
pg_log_error_detail("Only tar archives can be parsed.");
@@ -1263,13 +1249,13 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
* If the user has requested a server compressed archive along with
* archive extraction at client then we need to decompress it.
*/
- if (format == 'p')
+ if (format == 'p' && is_compressed_tar)
{
- if (is_tar_gz)
+ if (compressed_tar_algorithm == PG_COMPRESSION_GZIP)
streamer = astreamer_gzip_decompressor_new(streamer);
- else if (is_tar_lz4)
+ else if (compressed_tar_algorithm == PG_COMPRESSION_LZ4)
streamer = astreamer_lz4_decompressor_new(streamer);
- else if (is_tar_zstd)
+ else if (compressed_tar_algorithm == PG_COMPRESSION_ZSTD)
streamer = astreamer_zstd_decompressor_new(streamer);
}
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index cbc9447384f..31f606c45b1 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -941,17 +941,7 @@ precheck_tar_backup_file(verifier_context *context, char *relpath,
}
/* Now, check the compression type of the tar */
- if (strcmp(suffix, ".tar") == 0)
- compress_algorithm = PG_COMPRESSION_NONE;
- else if (strcmp(suffix, ".tgz") == 0)
- compress_algorithm = PG_COMPRESSION_GZIP;
- else if (strcmp(suffix, ".tar.gz") == 0)
- compress_algorithm = PG_COMPRESSION_GZIP;
- else if (strcmp(suffix, ".tar.lz4") == 0)
- compress_algorithm = PG_COMPRESSION_LZ4;
- else if (strcmp(suffix, ".tar.zst") == 0)
- compress_algorithm = PG_COMPRESSION_ZSTD;
- else
+ if (!parse_tar_compress_algorithm(suffix, &compress_algorithm))
{
report_backup_error(context,
"file \"%s\" is not expected in a tar format backup",
diff --git a/src/common/compression.c b/src/common/compression.c
index 92cd4ec7a0d..fb27501d297 100644
--- a/src/common/compression.c
+++ b/src/common/compression.c
@@ -41,6 +41,36 @@ static int expect_integer_value(char *keyword, char *value,
static bool expect_boolean_value(char *keyword, char *value,
pg_compress_specification *result);
+/*
+ * Look up a compression algorithm by tar archive file extension. Returns true
+ * and sets *algorithm if the extension is recognized. Otherwise returns false.
+ */
+bool
+parse_tar_compress_algorithm(const char *fname, pg_compress_algorithm *algorithm)
+{
+ int fname_len = strlen(fname);
+
+ if (fname_len >= 4 &&
+ strcmp(fname + fname_len - 4, ".tar") == 0)
+ *algorithm = PG_COMPRESSION_NONE;
+ else if (fname_len >= 4 &&
+ strcmp(fname + fname_len - 4, ".tgz") == 0)
+ *algorithm = PG_COMPRESSION_GZIP;
+ else if (fname_len >= 7 &&
+ strcmp(fname + fname_len - 7, ".tar.gz") == 0)
+ *algorithm = PG_COMPRESSION_GZIP;
+ else if (fname_len >= 8 &&
+ strcmp(fname + fname_len - 8, ".tar.lz4") == 0)
+ *algorithm = PG_COMPRESSION_LZ4;
+ else if (fname_len >= 8 &&
+ strcmp(fname + fname_len - 8, ".tar.zst") == 0)
+ *algorithm = PG_COMPRESSION_ZSTD;
+ else
+ return false;
+
+ return true;
+}
+
/*
* Look up a compression algorithm by name. Returns true and sets *algorithm
* if the name is recognized. Otherwise returns false.
diff --git a/src/include/common/compression.h b/src/include/common/compression.h
index 6c745b90066..f99c747cdd3 100644
--- a/src/include/common/compression.h
+++ b/src/include/common/compression.h
@@ -41,6 +41,8 @@ typedef struct pg_compress_specification
extern void parse_compress_options(const char *option, char **algorithm,
char **detail);
+extern bool parse_tar_compress_algorithm(const char *fname,
+ pg_compress_algorithm *algorithm);
extern bool parse_compress_algorithm(char *name, pg_compress_algorithm *algorithm);
extern const char *get_compress_algorithm_name(pg_compress_algorithm algorithm);
--
2.47.1
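For readers following the refactoring, here is a small standalone sketch of the same suffix matching written in table-driven form. It is not part of the patch: the enum and function names (`sketch_compress_algorithm`, `sketch_parse_tar_suffix`) are hypothetical stand-ins for `pg_compress_algorithm` and `parse_tar_compress_algorithm()`, and it assumes only the five extensions that the patch recognizes.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for PostgreSQL's pg_compress_algorithm enum. */
typedef enum
{
    SKETCH_COMPRESSION_NONE,
    SKETCH_COMPRESSION_GZIP,
    SKETCH_COMPRESSION_LZ4,
    SKETCH_COMPRESSION_ZSTD
} sketch_compress_algorithm;

/* Suffix table mirroring the checks in parse_tar_compress_algorithm(). */
static const struct
{
    const char *suffix;
    sketch_compress_algorithm algorithm;
} tar_suffixes[] = {
    {".tar", SKETCH_COMPRESSION_NONE},
    {".tgz", SKETCH_COMPRESSION_GZIP},
    {".tar.gz", SKETCH_COMPRESSION_GZIP},
    {".tar.lz4", SKETCH_COMPRESSION_LZ4},
    {".tar.zst", SKETCH_COMPRESSION_ZSTD},
};

/*
 * Returns true and sets *algorithm if fname ends in one of the suffixes
 * above; returns false otherwise.
 */
static bool
sketch_parse_tar_suffix(const char *fname, sketch_compress_algorithm *algorithm)
{
    size_t fname_len;

    assert(fname != NULL && algorithm != NULL);
    fname_len = strlen(fname);

    for (size_t i = 0; i < sizeof(tar_suffixes) / sizeof(tar_suffixes[0]); i++)
    {
        size_t suffix_len = strlen(tar_suffixes[i].suffix);

        /* compare only the trailing suffix_len bytes of fname */
        if (fname_len >= suffix_len &&
            strcmp(fname + fname_len - suffix_len, tar_suffixes[i].suffix) == 0)
        {
            *algorithm = tar_suffixes[i].algorithm;
            return true;
        }
    }
    return false;
}
```

Since no listed extension is itself a trailing part of another, the order of the table (or of the patch's if/else chain) does not affect the result.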
[application/octet-stream] v19-0002-pg_waldump-Preparatory-refactoring-for-tar-archi.patch (8.4K, 3-v19-0002-pg_waldump-Preparatory-refactoring-for-tar-archi.patch)
From 1de7233f3785e202662cd00b5b7fd2b750e24fea Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Thu, 22 Jan 2026 10:28:32 +0530
Subject: [PATCH v19 2/4] pg_waldump: Preparatory refactoring for tar archive
WAL decoding.
Several refactoring steps in preparation for adding tar archive WAL
decoding support to pg_waldump:
- Move XLogDumpPrivate and related declarations into a new pg_waldump.h
header, allowing a second source file to share them.
- Factor out required_read_len() so the read-size calculation can be
reused for both regular WAL files and tar-archived WAL.
- Move the WAL segment size variable into XLogDumpPrivate and rename it
to segsize, making it accessible to the archive streamer code.
Author: Amul Sul <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Jakub Wartak <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Euler Taveira <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Discussion: https://postgr.es/m/CAAJ_b94bqdWN3h2J-PzzzQ2Npbwct5ZQHggn_QoYGhC2rn-=WQ@mail.gmail.com
---
src/bin/pg_waldump/pg_waldump.c | 78 +++++++++++++++++++--------------
src/bin/pg_waldump/pg_waldump.h | 26 +++++++++++
2 files changed, 70 insertions(+), 34 deletions(-)
create mode 100644 src/bin/pg_waldump/pg_waldump.h
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index f3446385d6a..5d31b15dbd8 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -29,6 +29,7 @@
#include "common/logging.h"
#include "common/relpath.h"
#include "getopt_long.h"
+#include "pg_waldump.h"
#include "rmgrdesc.h"
#include "storage/bufpage.h"
@@ -43,14 +44,6 @@ static volatile sig_atomic_t time_to_stop = false;
static const RelFileLocator emptyRelFileLocator = {0, 0, 0};
-typedef struct XLogDumpPrivate
-{
- TimeLineID timeline;
- XLogRecPtr startptr;
- XLogRecPtr endptr;
- bool endptr_reached;
-} XLogDumpPrivate;
-
typedef struct XLogDumpConfig
{
/* display options */
@@ -333,6 +326,32 @@ identify_target_directory(char *directory, char *fname, int *WalSegSz)
return NULL; /* not reached */
}
+/*
+ * Returns the number of bytes to read from the page at targetPagePtr. Returns
+ * -1, and sets endptr_reached, if the end point has already been reached.
+ */
+static inline int
+required_read_len(XLogDumpPrivate *private, XLogRecPtr targetPagePtr,
+ int reqLen)
+{
+ int count = XLOG_BLCKSZ;
+
+ if (XLogRecPtrIsValid(private->endptr))
+ {
+ if (targetPagePtr + XLOG_BLCKSZ <= private->endptr)
+ count = XLOG_BLCKSZ;
+ else if (targetPagePtr + reqLen <= private->endptr)
+ count = private->endptr - targetPagePtr;
+ else
+ {
+ private->endptr_reached = true;
+ return -1;
+ }
+ }
+
+ return count;
+}
+
/* pg_waldump's XLogReaderRoutine->segment_open callback */
static void
WALDumpOpenSegment(XLogReaderState *state, XLogSegNo nextSegNo,
@@ -390,21 +409,12 @@ WALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
XLogRecPtr targetPtr, char *readBuff)
{
XLogDumpPrivate *private = state->private_data;
- int count = XLOG_BLCKSZ;
+ int count = required_read_len(private, targetPagePtr, reqLen);
WALReadError errinfo;
- if (XLogRecPtrIsValid(private->endptr))
- {
- if (targetPagePtr + XLOG_BLCKSZ <= private->endptr)
- count = XLOG_BLCKSZ;
- else if (targetPagePtr + reqLen <= private->endptr)
- count = private->endptr - targetPagePtr;
- else
- {
- private->endptr_reached = true;
- return -1;
- }
- }
+ /* Bail out if the count to be read is not valid */
+ if (count < 0)
+ return -1;
if (!WALRead(state, readBuff, targetPagePtr, count, private->timeline,
&errinfo))
@@ -801,7 +811,6 @@ main(int argc, char **argv)
XLogRecPtr first_record;
char *waldir = NULL;
char *errormsg;
- int WalSegSz;
static struct option long_options[] = {
{"bkp-details", no_argument, NULL, 'b'},
@@ -855,6 +864,7 @@ main(int argc, char **argv)
memset(&stats, 0, sizeof(XLogStats));
private.timeline = 1;
+ private.segsize = 0;
private.startptr = InvalidXLogRecPtr;
private.endptr = InvalidXLogRecPtr;
private.endptr_reached = false;
@@ -1128,18 +1138,18 @@ main(int argc, char **argv)
pg_fatal("could not open directory \"%s\": %m", waldir);
}
- waldir = identify_target_directory(waldir, fname, &WalSegSz);
+ waldir = identify_target_directory(waldir, fname, &private.segsize);
fd = open_file_in_directory(waldir, fname);
if (fd < 0)
pg_fatal("could not open file \"%s\"", fname);
close(fd);
/* parse position from file */
- XLogFromFileName(fname, &private.timeline, &segno, WalSegSz);
+ XLogFromFileName(fname, &private.timeline, &segno, private.segsize);
if (!XLogRecPtrIsValid(private.startptr))
- XLogSegNoOffsetToRecPtr(segno, 0, WalSegSz, private.startptr);
- else if (!XLByteInSeg(private.startptr, segno, WalSegSz))
+ XLogSegNoOffsetToRecPtr(segno, 0, private.segsize, private.startptr);
+ else if (!XLByteInSeg(private.startptr, segno, private.segsize))
{
pg_log_error("start WAL location %X/%08X is not inside file \"%s\"",
LSN_FORMAT_ARGS(private.startptr),
@@ -1149,7 +1159,7 @@ main(int argc, char **argv)
/* no second file specified, set end position */
if (!(optind + 1 < argc) && !XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(segno + 1, 0, WalSegSz, private.endptr);
+ XLogSegNoOffsetToRecPtr(segno + 1, 0, private.segsize, private.endptr);
/* parse ENDSEG if passed */
if (optind + 1 < argc)
@@ -1165,14 +1175,14 @@ main(int argc, char **argv)
close(fd);
/* parse position from file */
- XLogFromFileName(fname, &private.timeline, &endsegno, WalSegSz);
+ XLogFromFileName(fname, &private.timeline, &endsegno, private.segsize);
if (endsegno < segno)
pg_fatal("ENDSEG %s is before STARTSEG %s",
argv[optind + 1], argv[optind]);
if (!XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(endsegno + 1, 0, WalSegSz,
+ XLogSegNoOffsetToRecPtr(endsegno + 1, 0, private.segsize,
private.endptr);
/* set segno to endsegno for check of --end */
@@ -1180,8 +1190,8 @@ main(int argc, char **argv)
}
- if (!XLByteInSeg(private.endptr, segno, WalSegSz) &&
- private.endptr != (segno + 1) * WalSegSz)
+ if (!XLByteInSeg(private.endptr, segno, private.segsize) &&
+ private.endptr != (segno + 1) * private.segsize)
{
pg_log_error("end WAL location %X/%08X is not inside file \"%s\"",
LSN_FORMAT_ARGS(private.endptr),
@@ -1190,7 +1200,7 @@ main(int argc, char **argv)
}
}
else
- waldir = identify_target_directory(waldir, NULL, &WalSegSz);
+ waldir = identify_target_directory(waldir, NULL, &private.segsize);
/* we don't know what to print */
if (!XLogRecPtrIsValid(private.startptr))
@@ -1203,7 +1213,7 @@ main(int argc, char **argv)
/* we have everything we need, start reading */
xlogreader_state =
- XLogReaderAllocate(WalSegSz, waldir,
+ XLogReaderAllocate(private.segsize, waldir,
XL_ROUTINE(.page_read = WALDumpReadPage,
.segment_open = WALDumpOpenSegment,
.segment_close = WALDumpCloseSegment),
@@ -1224,7 +1234,7 @@ main(int argc, char **argv)
* a segment (e.g. we were used in file mode).
*/
if (first_record != private.startptr &&
- XLogSegmentOffset(private.startptr, WalSegSz) != 0)
+ XLogSegmentOffset(private.startptr, private.segsize) != 0)
pg_log_info(ngettext("first record is after %X/%08X, at %X/%08X, skipping over %u byte",
"first record is after %X/%08X, at %X/%08X, skipping over %u bytes",
(first_record - private.startptr)),
diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h
new file mode 100644
index 00000000000..013b051506f
--- /dev/null
+++ b/src/bin/pg_waldump/pg_waldump.h
@@ -0,0 +1,26 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_waldump.h - decode and display WAL
+ *
+ * Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_waldump/pg_waldump.h
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_WALDUMP_H
+#define PG_WALDUMP_H
+
+#include "access/xlogdefs.h"
+
+/* Contains the necessary information to drive WAL decoding */
+typedef struct XLogDumpPrivate
+{
+ TimeLineID timeline;
+ int segsize;
+ XLogRecPtr startptr;
+ XLogRecPtr endptr;
+ bool endptr_reached;
+} XLogDumpPrivate;
+
+#endif /* PG_WALDUMP_H */
--
2.47.1
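The clamping rule that required_read_len() factors out can be exercised in isolation. The sketch below assumes the default 8 kB XLOG_BLCKSZ and uses hypothetical names (`sketch_dump_state`, `sketch_required_read_len`) with a plain uint64_t in place of XLogRecPtr; it illustrates the logic and is not code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed defaults: 8 kB WAL pages; 0 stands for an invalid LSN. */
#define SKETCH_XLOG_BLCKSZ 8192
#define SKETCH_INVALID_LSN ((uint64_t) 0)

typedef struct
{
    uint64_t endptr;            /* stop LSN, or SKETCH_INVALID_LSN for none */
    bool endptr_reached;        /* set once a read would cross endptr */
} sketch_dump_state;

/*
 * Number of bytes to read from the page starting at page_ptr, or -1 once
 * the caller needs bytes beyond endptr.
 */
static int
sketch_required_read_len(sketch_dump_state *st, uint64_t page_ptr, int req_len)
{
    assert(req_len > 0 && req_len <= SKETCH_XLOG_BLCKSZ);

    if (st->endptr == SKETCH_INVALID_LSN)
        return SKETCH_XLOG_BLCKSZ;      /* no end point: read whole page */
    if (page_ptr + SKETCH_XLOG_BLCKSZ <= st->endptr)
        return SKETCH_XLOG_BLCKSZ;      /* page ends at or before endptr */
    if (page_ptr + (uint64_t) req_len <= st->endptr)
        return (int) (st->endptr - page_ptr);   /* partial last page */

    /* the requested bytes extend past endptr */
    st->endptr_reached = true;
    return -1;
}
```

On the last page, the reader gets the bytes up to endptr rather than just req_len, so any record that ends before endptr can still be decoded.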
[application/octet-stream] v19-0003-pg_waldump-Add-support-for-reading-WAL-from-tar-.patch (54.7K, 4-v19-0003-pg_waldump-Add-support-for-reading-WAL-from-tar-.patch)
From 0de254f057d17e772b3dccebb18b49ca7baa7e85 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Wed, 18 Feb 2026 11:07:57 +0530
Subject: [PATCH v19 3/4] pg_waldump: Add support for reading WAL from tar
archives
pg_waldump can now accept the path to a tar archive (optionally
compressed with gzip, lz4, or zstd) containing WAL files and decode
them. This was added primarily for pg_verifybackup, which previously
had to skip WAL parsing for tar-format backups.
The implementation uses the existing archive streamer infrastructure
with a hash table to track WAL segments read from the archive. If WAL
files within the archive are not in sequential order, out-of-order
segments are written to a temporary directory (created via mkdtemp under
$TMPDIR or the archive's directory) and read back when needed. An
atexit callback ensures the temporary directory is cleaned up.
The --follow option is not supported when reading from a tar archive.
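Because the hash table is keyed by WAL segment file name, each page request is first mapped from an LSN to a name, as read_archive_wal_page() does with XLByteToSeg() and XLogFileName(). A standalone sketch of that mapping, assuming the usual 24-hex-digit naming scheme (timeline, then the segment number split by how many segments fit in 4 GB); `sketch_wal_file_name` is a hypothetical name, not a function from the patch:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write into out the WAL segment file name that holds
 * lsn on timeline tli, for a segment size of segsize bytes.
 */
static void
sketch_wal_file_name(char *out, size_t outlen, uint32_t tli, uint64_t lsn,
                     uint32_t segsize)
{
    uint64_t segno;
    uint64_t segs_per_xlogid;

    /* 24 hex digits plus the terminating NUL */
    assert(outlen >= 25);
    /* segment sizes are powers of two between 1 MB and 1 GB */
    assert(segsize >= 1024 * 1024 && segsize <= 1024 * 1024 * 1024);
    assert((segsize & (segsize - 1)) == 0);

    segno = lsn / segsize;
    /* number of segments in one 4 GB "xlogid" */
    segs_per_xlogid = UINT64_C(0x100000000) / segsize;

    snprintf(out, outlen, "%08" PRIX32 "%08" PRIX32 "%08" PRIX32, tli,
             (uint32_t) (segno / segs_per_xlogid),
             (uint32_t) (segno % segs_per_xlogid));
    assert(strlen(out) == 24);
}
```

With 16 MB segments, LSN 0/1000000 on timeline 1 maps to 000000010000000000000001, the name under which that segment's entry would be looked up.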
Author: Amul Sul <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Jakub Wartak <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Euler Taveira <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Discussion: https://postgr.es/m/CAAJ_b94bqdWN3h2J-PzzzQ2Npbwct5ZQHggn_QoYGhC2rn-=WQ@mail.gmail.com
---
doc/src/sgml/ref/pg_waldump.sgml | 23 +-
src/bin/pg_waldump/Makefile | 7 +-
src/bin/pg_waldump/archive_waldump.c | 827 +++++++++++++++++++++++++++
src/bin/pg_waldump/meson.build | 4 +-
src/bin/pg_waldump/pg_waldump.c | 286 +++++++--
src/bin/pg_waldump/pg_waldump.h | 48 ++
src/bin/pg_waldump/t/001_basic.pl | 242 ++++++--
src/tools/pgindent/typedefs.list | 4 +
8 files changed, 1315 insertions(+), 126 deletions(-)
create mode 100644 src/bin/pg_waldump/archive_waldump.c
diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index d1715ff5124..9bbb4bd5772 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -141,13 +141,21 @@ PostgreSQL documentation
<term><option>--path=<replaceable>path</replaceable></option></term>
<listitem>
<para>
- Specifies a directory to search for WAL segment files or a
- directory with a <literal>pg_wal</literal> subdirectory that
+ Specifies a tar archive or directory to search for WAL segment files, or a
+ directory with a <literal>pg_wal</literal> subdirectory that
contains such files. The default is to search in the current
directory, the <literal>pg_wal</literal> subdirectory of the
current directory, and the <literal>pg_wal</literal> subdirectory
of <envar>PGDATA</envar>.
</para>
+ <para>
+ If a tar archive is provided and its WAL segment files are not in
+ sequential order, those files will be written to a temporary directory
+ whose name starts with <filename>waldump_tmp</filename>. This directory will be
+ created inside the directory specified by the <envar>TMPDIR</envar>
+ environment variable if it is set; otherwise, it will be created within
+ the same directory as the tar archive.
+ </para>
</listitem>
</varlistentry>
@@ -383,6 +391,17 @@ PostgreSQL documentation
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><envar>TMPDIR</envar></term>
+ <listitem>
+ <para>
+ Directory in which to create temporary files when reading WAL from a
+ tar archive with out-of-order segment files. If not set, the temporary
+ directory is created within the same directory as the tar archive.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</refsect1>
diff --git a/src/bin/pg_waldump/Makefile b/src/bin/pg_waldump/Makefile
index 4c1ee649501..aabb87566a2 100644
--- a/src/bin/pg_waldump/Makefile
+++ b/src/bin/pg_waldump/Makefile
@@ -3,6 +3,9 @@
PGFILEDESC = "pg_waldump - decode and display WAL"
PGAPPICON=win32
+# make this available to TAP test scripts
+export TAR
+
subdir = src/bin/pg_waldump
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
@@ -10,13 +13,15 @@ include $(top_builddir)/src/Makefile.global
OBJS = \
$(RMGRDESCOBJS) \
$(WIN32RES) \
+ archive_waldump.o \
compat.o \
pg_waldump.o \
rmgrdesc.o \
xlogreader.o \
xlogstats.o
-override CPPFLAGS := -DFRONTEND $(CPPFLAGS)
+override CPPFLAGS := -DFRONTEND -I$(libpq_srcdir) $(CPPFLAGS)
+LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils
RMGRDESCSOURCES = $(sort $(notdir $(wildcard $(top_srcdir)/src/backend/access/rmgrdesc/*desc*.c)))
RMGRDESCOBJS = $(patsubst %.c,%.o,$(RMGRDESCSOURCES))
diff --git a/src/bin/pg_waldump/archive_waldump.c b/src/bin/pg_waldump/archive_waldump.c
new file mode 100644
index 00000000000..f36de991dc6
--- /dev/null
+++ b/src/bin/pg_waldump/archive_waldump.c
@@ -0,0 +1,827 @@
+/*-------------------------------------------------------------------------
+ *
+ * archive_waldump.c
+ * A generic facility for reading WAL data from tar archives via the
+ * archive streamer.
+ *
+ * Portions Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_waldump/archive_waldump.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <unistd.h>
+
+#include "access/xlog_internal.h"
+#include "common/file_perm.h"
+#include "common/hashfn.h"
+#include "common/logging.h"
+#include "fe_utils/simple_list.h"
+#include "pg_waldump.h"
+
+/*
+ * How many bytes should we try to read from a file at once?
+ */
+#define READ_CHUNK_SIZE (128 * 1024)
+
+/* Directory holding WAL segments spilled to temporary files */
+char *TmpWalSegDir = NULL;
+
+/*
+ * Check if the start segment number is zero; this indicates a request to read
+ * any WAL file.
+ */
+#define READ_ANY_WAL(privateInfo) ((privateInfo)->start_segno == 0)
+
+/*
+ * Hash entry representing a WAL segment retrieved from the archive.
+ *
+ * While WAL segments are typically read sequentially, individual entries
+ * maintain their own buffers for the following reasons:
+ *
+ * 1. Boundary Handling: The archive streamer provides a continuous byte
+ * stream. A single streaming chunk may contain the end of one WAL segment
+ * and the start of the next. Separate buffers allow us to easily
+ * partition and track these bytes by their respective segments.
+ *
+ * 2. Out-of-Order Support: Dedicated buffers simplify logic if segments
+ * are ever archived or retrieved out of sequence.
+ *
+ * To minimize the memory footprint, entries and their associated buffers are
+ * freed immediately once consumed. Since pg_waldump does not request the same
+ * bytes twice, a segment is discarded as soon as pg_waldump moves past it.
+ */
+typedef struct ArchivedWALFile
+{
+ uint32 status; /* hash status */
+ const char *fname; /* hash key: WAL segment name */
+
+ StringInfo buf; /* holds WAL bytes read from archive */
+ bool spilled; /* true if the WAL data was spilled to a
+ * temporary file */
+
+ int read_len; /* total bytes of this WAL segment read so far */
+} ArchivedWALFile;
+
+static uint32 hash_string_pointer(const char *s);
+#define SH_PREFIX ArchivedWAL
+#define SH_ELEMENT_TYPE ArchivedWALFile
+#define SH_KEY_TYPE const char *
+#define SH_KEY fname
+#define SH_HASH_KEY(tb, key) hash_string_pointer(key)
+#define SH_EQUAL(tb, a, b) (strcmp(a, b) == 0)
+#define SH_SCOPE static inline
+#define SH_RAW_ALLOCATOR pg_malloc0
+#define SH_DECLARE
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+typedef struct astreamer_waldump
+{
+ astreamer base;
+ XLogDumpPrivate *privateInfo;
+} astreamer_waldump;
+
+static ArchivedWALFile *get_archive_wal_entry(const char *fname,
+ XLogDumpPrivate *privateInfo,
+ int WalSegSz);
+static int read_archive_file(XLogDumpPrivate *privateInfo, Size count);
+static void setup_tmpwal_dir(const char *waldir);
+static void cleanup_tmpwal_dir_atexit(void);
+
+static FILE *prepare_tmp_write(const char *fname);
+static void perform_tmp_write(const char *fname, StringInfo buf, FILE *file);
+
+static astreamer *astreamer_waldump_new(XLogDumpPrivate *privateInfo);
+static void astreamer_waldump_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_waldump_finalize(astreamer *streamer);
+static void astreamer_waldump_free(astreamer *streamer);
+
+static bool member_is_wal_file(astreamer_waldump *mystreamer,
+ astreamer_member *member,
+ char **fname);
+
+static const astreamer_ops astreamer_waldump_ops = {
+ .content = astreamer_waldump_content,
+ .finalize = astreamer_waldump_finalize,
+ .free = astreamer_waldump_free
+};
+
+/*
+ * Initializes the tar archive reader and a hash table for WAL entries, checks
+ * that the archive contains a valid WAL segment and retrieves the segment size
+ * from it, and sets up the segment range used to filter relevant entries. It
+ * also configures a temporary directory for out-of-order WAL data and
+ * registers an exit callback to clean up temporary files.
+ */
+void
+init_archive_reader(XLogDumpPrivate *privateInfo, const char *waldir,
+ int *WalSegSz, pg_compress_algorithm compression)
+{
+ int fd;
+ astreamer *streamer;
+ ArchivedWALFile *entry = NULL;
+ XLogLongPageHeader longhdr;
+ XLogSegNo segno;
+ TimeLineID timeline;
+
+ /* Open tar archive and store its file descriptor */
+ fd = open_file_in_directory(waldir, privateInfo->archive_name);
+
+ if (fd < 0)
+ pg_fatal("could not open file \"%s\"", privateInfo->archive_name);
+
+ privateInfo->archive_fd = fd;
+
+ streamer = astreamer_waldump_new(privateInfo);
+
+ /* We must first parse the tar archive. */
+ streamer = astreamer_tar_parser_new(streamer);
+
+ /* If the archive is compressed, decompress before parsing. */
+ if (compression == PG_COMPRESSION_GZIP)
+ streamer = astreamer_gzip_decompressor_new(streamer);
+ else if (compression == PG_COMPRESSION_LZ4)
+ streamer = astreamer_lz4_decompressor_new(streamer);
+ else if (compression == PG_COMPRESSION_ZSTD)
+ streamer = astreamer_zstd_decompressor_new(streamer);
+
+ privateInfo->archive_streamer = streamer;
+
+ /*
+ * Allocate a buffer for reading the archive file to facilitate content
+ * decoding; read requests must not exceed the allocated buffer size.
+ */
+ privateInfo->archive_read_buf = pg_malloc(READ_CHUNK_SIZE);
+ privateInfo->archive_read_buf_size = READ_CHUNK_SIZE;
+
+ /*
+ * Create the hash table that stores WAL entries read from the archive,
+ * using an arbitrary initial size.
+ */
+ privateInfo->archive_wal_htab = ArchivedWAL_create(8, NULL);
+
+ /*
+ * Verify that the archive contains valid WAL files and fetch WAL segment
+ * size
+ */
+ while (entry == NULL || entry->buf->len < XLOG_BLCKSZ)
+ {
+ if (read_archive_file(privateInfo, XLOG_BLCKSZ) == 0)
+ pg_fatal("could not find WAL in archive \"%s\"",
+ privateInfo->archive_name);
+
+ entry = privateInfo->cur_file;
+ }
+
+ /* Set WalSegSz if WAL data is successfully read */
+ longhdr = (XLogLongPageHeader) entry->buf->data;
+
+ if (!IsValidWalSegSize(longhdr->xlp_seg_size))
+ {
+ pg_log_error(ngettext("invalid WAL segment size in WAL file from archive \"%s\" (%d byte)",
+ "invalid WAL segment size in WAL file from archive \"%s\" (%d bytes)",
+ longhdr->xlp_seg_size),
+ privateInfo->archive_name, longhdr->xlp_seg_size);
+ pg_log_error_detail("The WAL segment size must be a power of two between 1 MB and 1 GB.");
+ exit(1);
+ }
+
+ *WalSegSz = longhdr->xlp_seg_size;
+
+ /*
+ * With the WAL segment size available, we can now initialize the
+ * dependent start and end segment numbers.
+ */
+ Assert(XLogRecPtrIsValid(privateInfo->startptr));
+ XLByteToSeg(privateInfo->startptr, privateInfo->start_segno, *WalSegSz);
+
+ if (XLogRecPtrIsValid(privateInfo->endptr))
+ XLByteToSeg(privateInfo->endptr, privateInfo->end_segno, *WalSegSz);
+
+ /*
+ * This WAL segment was fetched before the filtering parameters
+ * (start_segno and end_segno) were fully initialized. Perform the
+ * relevance check against the user-provided range now; if the WAL falls
+ * outside this range, remove it from the hash table. Subsequent WAL will
+ * be filtered automatically by the archive streamer using the updated
+ * start_segno and end_segno values.
+ */
+ XLogFromFileName(entry->fname, &timeline, &segno, *WalSegSz);
+ if (privateInfo->timeline != timeline ||
+ privateInfo->start_segno > segno ||
+ privateInfo->end_segno < segno)
+ free_archive_wal_entry(entry->fname, privateInfo);
+
+ /*
+ * Set up a temporary directory to store WAL segments, and register an exit
+ * callback to remove it upon completion.
+ */
+ setup_tmpwal_dir(waldir);
+ atexit(cleanup_tmpwal_dir_atexit);
+}
+
+/*
+ * Release the archive streamer chain and close the archive file.
+ */
+void
+free_archive_reader(XLogDumpPrivate *privateInfo)
+{
+ /*
+ * NB: Normally, astreamer_finalize() is called before astreamer_free() to
+ * flush any remaining buffered data or to ensure the end of the tar
+ * archive is reached. However, when decoding a WAL file, once we hit the
+ * end LSN, any remaining WAL data in the buffer or the tar archive's
+ * unreached end can be safely ignored.
+ */
+ astreamer_free(privateInfo->archive_streamer);
+
+ /* Free any remaining hash table entries and their buffers. */
+ if (privateInfo->archive_wal_htab != NULL)
+ {
+ ArchivedWAL_iterator iter;
+ ArchivedWALFile *entry;
+
+ ArchivedWAL_start_iterate(privateInfo->archive_wal_htab, &iter);
+ while ((entry = ArchivedWAL_iterate(privateInfo->archive_wal_htab,
+ &iter)) != NULL)
+ {
+ if (entry->buf != NULL)
+ destroyStringInfo(entry->buf);
+ }
+ ArchivedWAL_destroy(privateInfo->archive_wal_htab);
+ privateInfo->archive_wal_htab = NULL;
+ }
+
+ /* Free the reusable read buffer. */
+ if (privateInfo->archive_read_buf != NULL)
+ {
+ pg_free(privateInfo->archive_read_buf);
+ privateInfo->archive_read_buf = NULL;
+ }
+
+ /* Close the file. */
+ if (close(privateInfo->archive_fd) != 0)
+ pg_log_error("could not close file \"%s\": %m",
+ privateInfo->archive_name);
+}
+
+/*
+ * Copies count bytes of WAL starting at targetPagePtr into readBuff, fetching
+ * more data from the tar archive via the archive streamer as needed.
+ */
+int
+read_archive_wal_page(XLogDumpPrivate *privateInfo, XLogRecPtr targetPagePtr,
+ Size count, char *readBuff, int WalSegSz)
+{
+ char *p = readBuff;
+ Size nbytes = count;
+ XLogRecPtr recptr = targetPagePtr;
+ XLogSegNo segno;
+ char fname[MAXFNAMELEN];
+ ArchivedWALFile *entry;
+
+ /* Identify the segment and locate its entry in the archive hash */
+ XLByteToSeg(targetPagePtr, segno, WalSegSz);
+ XLogFileName(fname, privateInfo->timeline, segno, WalSegSz);
+ entry = get_archive_wal_entry(fname, privateInfo, WalSegSz);
+
+ while (nbytes > 0)
+ {
+ char *buf = entry->buf->data;
+ int bufLen = entry->buf->len;
+ XLogRecPtr endPtr;
+ XLogRecPtr startPtr;
+
+ /* Calculate the LSN range currently residing in the buffer */
+ XLogSegNoOffsetToRecPtr(segno, entry->read_len, WalSegSz, endPtr);
+ startPtr = endPtr - bufLen;
+
+ /*
+ * Copy the requested WAL data if it is present in the buffer.
+ */
+ if (bufLen > 0 && startPtr <= recptr && recptr < endPtr)
+ {
+ int copyBytes;
+ int offset = recptr - startPtr;
+
+ /*
+ * Given startPtr <= recptr < endPtr and a total buffer size
+ * 'bufLen', the offset (recptr - startPtr) will always be less
+ * than 'bufLen'.
+ */
+ Assert(offset < bufLen);
+
+ copyBytes = Min(nbytes, bufLen - offset);
+ memcpy(p, buf + offset, copyBytes);
+
+ /* Update state for read */
+ recptr += copyBytes;
+ nbytes -= copyBytes;
+ p += copyBytes;
+ }
+ else
+ {
+ /*
+ * Before starting the actual decoding loop, pg_waldump tries to
+ * locate the first valid record from the user-specified start
+ * position, which might not be the start of a WAL record and
+ * could fall in the middle of a record that spans multiple pages.
+ * Consequently, the valid start position the decoder is looking
+ * for could be far away from that initial position.
+ *
+ * This may involve reading across multiple pages, and this
+ * pre-reading fetches data in multiple rounds from the archive
+ * streamer; normally, we would throw away existing buffer
+ * contents to fetch the next set of data, but that existing data
+ * might be needed once the main loop starts. Because previously
+ * read data cannot be re-read by the archive streamer, we delay
+ * resetting the buffer until the main decoding loop is entered.
+ *
+ * Once pg_waldump has entered the main loop, it may re-read the
+ * currently active page, but never an older one; therefore, any
+ * fully consumed WAL data preceding the current page can then be
+ * safely discarded.
+ */
+ if (privateInfo->decoding_started)
+ {
+ resetStringInfo(entry->buf);
+
+ /*
+ * Push back the partial page data for the current page to the
+ * buffer, ensuring the full page remains available for
+ * re-reading if requested.
+ */
+ if (p > readBuff)
+ {
+ Assert((count - nbytes) > 0);
+ appendBinaryStringInfo(entry->buf, readBuff, count - nbytes);
+ }
+ }
+
+ /*
+ * Now, fetch more data. Raise an error if the archive streamer
+ * has moved past our segment (meaning the WAL file in the archive
+ * is shorter than expected) or if reading the archive reached
+ * EOF.
+ */
+ if (privateInfo->cur_file != entry)
+ pg_fatal("WAL segment \"%s\" in archive \"%s\" is too short: read %lld of %lld bytes",
+ fname, privateInfo->archive_name,
+ (long long int) (count - nbytes),
+ (long long int) count);
+ if (read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
+ pg_fatal("unexpected end of archive \"%s\" while reading \"%s\": read %lld of %lld bytes",
+ privateInfo->archive_name, fname,
+ (long long int) (count - nbytes),
+ (long long int) count);
+ }
+ }
+
+ /*
+ * Should have successfully read all the requested bytes or reported a
+ * failure before this point.
+ */
+ Assert(nbytes == 0);
+
+ /*
+ * NB: We return the fixed value provided as input. We could return a
+ * boolean since we either successfully read the WAL page or raise an
+ * error, but the caller expects this value to be returned. The routine
+ * that reads WAL pages from the physical WAL file follows the same
+ * convention.
+ */
+ return count;
+}
+
+/*
+ * Frees a WAL entry that is no longer needed, along with its buffer and any
+ * spilled temporary file, so that irrelevant WAL data does not pile up.
+ * If the entry is the one the archive streamer is currently filling, cur_file
+ * is set to NULL so that the streamer skips further copying for it.
+ */
+void
+free_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo)
+{
+ ArchivedWALFile *entry;
+
+ entry = ArchivedWAL_lookup(privateInfo->archive_wal_htab, fname);
+
+ if (entry == NULL)
+ return;
+
+ /* Destroy the buffer */
+ destroyStringInfo(entry->buf);
+ entry->buf = NULL;
+
+ /* Remove temporary file if any */
+ if (entry->spilled)
+ {
+ char fpath[MAXPGPATH];
+
+ snprintf(fpath, MAXPGPATH, "%s/%s", TmpWalSegDir, fname);
+
+ if (unlink(fpath) == 0)
+ pg_log_debug("removed file \"%s\"", fpath);
+ }
+
+ /* Set cur_file to NULL if it matches the entry being ignored */
+ if (privateInfo->cur_file == entry)
+ privateInfo->cur_file = NULL;
+
+ ArchivedWAL_delete_item(privateInfo->archive_wal_htab, entry);
+}
+
+/*
+ * Returns the archived WAL entry from the hash table if it exists. Otherwise,
+ * it invokes the routine to read the archive file, which then populates the
+ * entry in the hash table if that WAL segment exists in the archive.
+ * If the archive streamer happens to be reading a WAL segment that is not
+ * currently needed, that WAL data is written to a temporary file so it can
+ * be read back later.
+ */
+static ArchivedWALFile *
+get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo,
+ int WalSegSz)
+{
+ ArchivedWALFile *entry = NULL;
+ FILE *write_fp = NULL;
+
+ /*
+ * Search the hash table first. If the entry is found, return it.
+ * Otherwise, the requested WAL entry hasn't been read from the archive
+ * yet; invoke the archive streamer to fetch it.
+ */
+ while (1)
+ {
+ /*
+ * Search hash table.
+ *
+ * We perform the search inside the loop because a single iteration of
+ * the archive reader may decompress and extract multiple files into
+ * the hash table. One of these newly added files could be the one we
+ * are seeking.
+ */
+ entry = ArchivedWAL_lookup(privateInfo->archive_wal_htab, fname);
+
+ if (entry != NULL)
+ return entry;
+
+ /*
+ * The WAL file entry currently being processed may change during
+ * archive streamer execution. Therefore, maintain a local variable to
+ * reference the previous entry, ensuring that any remaining data in
+ * its buffer is successfully flushed to the temporary file before
+ * switching to the next WAL entry.
+ */
+ entry = privateInfo->cur_file;
+
+ /* Fetch more data */
+ if (entry == NULL || entry->buf->len == 0)
+ {
+ if (read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
+ break; /* archive file ended */
+ }
+
+ /*
+ * The archive streamer is reading a non-WAL file or an irrelevant WAL
+ * file.
+ */
+ if (entry == NULL)
+ continue;
+
+ /*
+ * The archive streamer is currently reading a file other than the one
+ * requested, but it will be needed later, so write it to a temporary
+ * file from which it can be read back when needed.
+ */
+
+ /* Create a temporary file if one does not already exist */
+ if (!entry->spilled)
+ {
+ write_fp = prepare_tmp_write(entry->fname);
+ entry->spilled = true;
+ }
+
+ /* Flush data from the buffer to the file */
+ perform_tmp_write(entry->fname, entry->buf, write_fp);
+ resetStringInfo(entry->buf);
+
+ /*
+ * A change of the current entry means the streamer has finished reading
+ * this file, so close its temporary file.
+ */
+ if (entry != privateInfo->cur_file && write_fp != NULL)
+ {
+ fclose(write_fp);
+ write_fp = NULL;
+ }
+ }
+
+ /* Requested WAL segment not found */
+ pg_fatal("could not find WAL \"%s\" in archive \"%s\"",
+ fname, privateInfo->archive_name);
+}
+
+/*
+ * Reads up to 'count' bytes from the archive file and feeds them to the
+ * archive streamer. Returns the number of bytes read (0 at end of file).
+ */
+static int
+read_archive_file(XLogDumpPrivate *privateInfo, Size count)
+{
+ int rc;
+
+ /* The read request must not exceed the allocated buffer size. */
+ Assert(privateInfo->archive_read_buf_size >= count);
+
+ rc = read(privateInfo->archive_fd, privateInfo->archive_read_buf, count);
+ if (rc < 0)
+ pg_fatal("could not read file \"%s\": %m",
+ privateInfo->archive_name);
+
+ /*
+ * Decompress (if required), and then parse the previously read contents
+ * of the tar file.
+ */
+ if (rc > 0)
+ astreamer_content(privateInfo->archive_streamer, NULL,
+ privateInfo->archive_read_buf, rc,
+ ASTREAMER_UNKNOWN);
+
+ return rc;
+}
+
+/*
+ * Set up a temporary directory for storing spilled WAL segments.
+ */
+static void
+setup_tmpwal_dir(const char *waldir)
+{
+ char *template;
+
+ /*
+ * Create the directory under the one named by the TMPDIR environment
+ * variable, if set; otherwise, create it under the provided WAL
+ * directory.
+ */
+ template = psprintf("%s/waldump_tmp-XXXXXX",
+ getenv("TMPDIR") ? getenv("TMPDIR") : waldir);
+ TmpWalSegDir = mkdtemp(template);
+
+ if (TmpWalSegDir == NULL)
+ pg_fatal("could not create directory \"%s\": %m", template);
+
+ canonicalize_path(TmpWalSegDir);
+
+ pg_log_debug("created directory \"%s\"", TmpWalSegDir);
+}
+
+/*
+ * Remove temporary directory at exit, if any.
+ */
+static void
+cleanup_tmpwal_dir_atexit(void)
+{
+ rmtree(TmpWalSegDir, true);
+}
+
+/*
+ * Create an empty placeholder file and return its handle.
+ */
+static FILE *
+prepare_tmp_write(const char *fname)
+{
+ char fpath[MAXPGPATH];
+ FILE *file;
+
+ snprintf(fpath, MAXPGPATH, "%s/%s", TmpWalSegDir, fname);
+
+ /* Create an empty placeholder */
+ file = fopen(fpath, PG_BINARY_W);
+ if (file == NULL)
+ pg_fatal("could not create file \"%s\": %m", fpath);
+
+#ifndef WIN32
+ if (chmod(fpath, pg_file_create_mode))
+ pg_fatal("could not set permissions on file \"%s\": %m",
+ fpath);
+#endif
+
+ pg_log_debug("spilling to temporary file \"%s\"", fpath);
+
+ return file;
+}
+
+/*
+ * Write buffer data to the given file handle.
+ */
+static void
+perform_tmp_write(const char *fname, StringInfo buf, FILE *file)
+{
+ Assert(file);
+
+ errno = 0;
+ if (buf->len > 0 && fwrite(buf->data, buf->len, 1, file) != 1)
+ {
+ /*
+ * If write didn't set errno, assume problem is no disk space
+ */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_fatal("could not write to file \"%s\": %m", fname);
+ }
+}
+
+/*
+ * Create an astreamer that can read WAL from a tar file.
+ */
+static astreamer *
+astreamer_waldump_new(XLogDumpPrivate *privateInfo)
+{
+ astreamer_waldump *streamer;
+
+ streamer = palloc0_object(astreamer_waldump);
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_waldump_ops;
+
+ streamer->privateInfo = privateInfo;
+
+ return &streamer->base;
+}
+
+/*
+ * Main entry point of the archive streamer for reading WAL data from a tar
+ * file. If a member is identified as a valid WAL file, a hash entry is created
+ * for it, and its contents are copied into that entry's buffer, making them
+ * accessible to the decoding routine.
+ */
+static void
+astreamer_waldump_content(astreamer *streamer, astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
+{
+ astreamer_waldump *mystreamer = (astreamer_waldump *) streamer;
+ XLogDumpPrivate *privateInfo = mystreamer->privateInfo;
+
+ Assert(context != ASTREAMER_UNKNOWN);
+
+ switch (context)
+ {
+ case ASTREAMER_MEMBER_HEADER:
+ {
+ char *fname = NULL;
+ ArchivedWALFile *entry;
+ bool found;
+
+ pg_log_debug("reading \"%s\"", member->pathname);
+
+ if (!member_is_wal_file(mystreamer, member, &fname))
+ break;
+
+ /*
+ * Further checks are skipped if any WAL file can be read.
+ * This typically occurs during initial verification.
+ */
+ if (!READ_ANY_WAL(privateInfo))
+ {
+ XLogSegNo segno;
+ TimeLineID timeline;
+
+ /*
+ * Skip the segment if its timeline does not match or if it
+ * falls outside the caller-specified range.
+ */
+ XLogFromFileName(fname, &timeline, &segno, privateInfo->segsize);
+ if (privateInfo->timeline != timeline ||
+ privateInfo->start_segno > segno ||
+ privateInfo->end_segno < segno)
+ {
+ pfree(fname);
+ break;
+ }
+ }
+
+ entry = ArchivedWAL_insert(privateInfo->archive_wal_htab,
+ fname, &found);
+
+ /*
+ * Shouldn't happen, but if it does, simply ignore the
+ * duplicate WAL file.
+ */
+ if (found)
+ {
+ pg_log_warning("ignoring duplicate WAL \"%s\" found in archive \"%s\"",
+ member->pathname, privateInfo->archive_name);
+ pfree(fname);
+ break;
+ }
+
+ entry->buf = makeStringInfo();
+ entry->spilled = false;
+ entry->read_len = 0;
+ privateInfo->cur_file = entry;
+ }
+ break;
+
+ case ASTREAMER_MEMBER_CONTENTS:
+ if (privateInfo->cur_file)
+ {
+ appendBinaryStringInfo(privateInfo->cur_file->buf, data, len);
+ privateInfo->cur_file->read_len += len;
+ }
+ break;
+
+ case ASTREAMER_MEMBER_TRAILER:
+ privateInfo->cur_file = NULL;
+ break;
+
+ case ASTREAMER_ARCHIVE_TRAILER:
+ break;
+
+ default:
+ /* Shouldn't happen. */
+ pg_fatal("unexpected state while parsing tar file");
+ }
+}
+
+/*
+ * End-of-stream processing for an astreamer_waldump stream.
+ */
+static void
+astreamer_waldump_finalize(astreamer *streamer)
+{
+ Assert(streamer->bbs_next == NULL);
+}
+
+/*
+ * Free memory associated with an astreamer_waldump stream.
+ */
+static void
+astreamer_waldump_free(astreamer *streamer)
+{
+ Assert(streamer->bbs_next == NULL);
+ pfree(streamer);
+}
+
+/*
+ * Returns true if the archive member name matches the WAL naming format. If
+ * so, *fname is set to a newly allocated copy of the WAL segment name.
+ */
+static bool
+member_is_wal_file(astreamer_waldump *mystreamer, astreamer_member *member,
+ char **fname)
+{
+ int pathlen;
+ char pathname[MAXPGPATH];
+ char *filename;
+
+ /* We are only interested in normal files. */
+ if (member->is_directory || member->is_link)
+ return false;
+
+ if (strlen(member->pathname) < XLOG_FNAME_LEN)
+ return false;
+
+ /*
+ * For a correct comparison, we must remove any '.' or '..' components
+ * from the member pathname. Similar to member_verify_header(), we prepend
+ * './' to the path so that canonicalize_path() can properly resolve and
+ * strip these references from the tar member name.
+ */
+ snprintf(pathname, MAXPGPATH, "./%s", member->pathname);
+ canonicalize_path(pathname);
+ pathlen = strlen(pathname);
+
+ /* WAL files from the top-level or pg_wal directory will be decoded */
+ if (pathlen > XLOG_FNAME_LEN &&
+ strncmp(pathname, XLOGDIR, strlen(XLOGDIR)) != 0)
+ return false;
+
+ /* Strip any directory prefix, keeping the last XLOG_FNAME_LEN chars */
+ filename = pathname + (pathlen - XLOG_FNAME_LEN);
+ if (!IsXLogFileName(filename))
+ return false;
+
+ *fname = pnstrdup(filename, XLOG_FNAME_LEN);
+
+ return true;
+}
+
+/*
+ * Hash function for the archived WAL hash table, keyed by segment name.
+ */
+static uint32
+hash_string_pointer(const char *s)
+{
+ unsigned char *ss = (unsigned char *) s;
+
+ return hash_bytes(ss, strlen(s));
+}
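To make the member-name check in member_is_wal_file() easier to review, here is a standalone sketch of the same idea. It assumes the path has already been canonicalized (so any leading "./" is gone) and hard-codes the 24-character name length and the "pg_wal/" prefix instead of using XLOG_FNAME_LEN and XLOGDIR; the helper names are hypothetical. It is deliberately a bit stricter than the patch: it accepts only a bare top-level name or exactly "pg_wal/<name>", whereas the strncmp() prefix test above would also accept something like "pg_wal/sub/<name>".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for XLOG_FNAME_LEN and XLOGDIR */
#define WAL_FNAME_LEN 24
#define WAL_DIR_PREFIX "pg_wal/"

/* True if s is exactly 24 uppercase hex digits, like IsXLogFileName() */
static bool
is_wal_file_name(const char *s)
{
    return strlen(s) == WAL_FNAME_LEN &&
        strspn(s, "0123456789ABCDEF") == WAL_FNAME_LEN;
}

/*
 * Accept a canonicalized tar member path only if it is a bare WAL file
 * name or exactly "pg_wal/<WAL file name>".
 */
static bool
member_path_is_wal(const char *path)
{
    size_t      prefix_len = strlen(WAL_DIR_PREFIX);

    assert(path != NULL);

    if (is_wal_file_name(path))
        return true;            /* top-level member */

    return strlen(path) == prefix_len + WAL_FNAME_LEN &&
        strncmp(path, WAL_DIR_PREFIX, prefix_len) == 0 &&
        is_wal_file_name(path + prefix_len);
}
```

With this sketch, "pg_wal/000000010000000000000001" is accepted, while "pg_wal/sub/000000010000000000000001" and "000000010000000000000001.partial" are not.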
diff --git a/src/bin/pg_waldump/meson.build b/src/bin/pg_waldump/meson.build
index 633a9874bb5..5296f21b82c 100644
--- a/src/bin/pg_waldump/meson.build
+++ b/src/bin/pg_waldump/meson.build
@@ -1,6 +1,7 @@
# Copyright (c) 2022-2026, PostgreSQL Global Development Group
pg_waldump_sources = files(
+ 'archive_waldump.c',
'compat.c',
'pg_waldump.c',
'rmgrdesc.c',
@@ -18,7 +19,7 @@ endif
pg_waldump = executable('pg_waldump',
pg_waldump_sources,
- dependencies: [frontend_code, lz4, zstd],
+ dependencies: [frontend_code, libpq, lz4, zstd],
c_args: ['-DFRONTEND'], # needed for xlogreader et al
kwargs: default_bin_args,
)
@@ -29,6 +30,7 @@ tests += {
'sd': meson.current_source_dir(),
'bd': meson.current_build_dir(),
'tap': {
+ 'env': {'TAR': tar.found() ? tar.full_path() : ''},
'tests': [
't/001_basic.pl',
't/002_save_fullpage.pl',
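Back in archive_waldump.c, perform_tmp_write() uses the usual PostgreSQL idiom of clearing errno before fwrite() and assuming ENOSPC if a failed write leaves it unset. A self-contained sketch of that idiom, returning the error instead of calling pg_fatal() (the function name is hypothetical):

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Write len bytes from data to fp in a single fwrite() call.  Returns 0 on
 * success or an errno value on failure.  If fwrite() fails without setting
 * errno, report ENOSPC, the same assumption perform_tmp_write() makes.
 */
static int
write_or_report_enospc(FILE *fp, const char *data, size_t len)
{
    assert(fp != NULL);

    if (len == 0)
        return 0;               /* nothing to write, like buf->len == 0 */

    errno = 0;
    if (fwrite(data, len, 1, fp) != 1)
        return errno != 0 ? errno : ENOSPC;
    return 0;
}
```

Writing the whole buffer as one item (size len, count 1) makes a short write show up as a return value other than 1, which is what the patch checks.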
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 5d31b15dbd8..b13cedaa3e7 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -176,7 +176,7 @@ split_path(const char *path, char **dir, char **fname)
*
* return a read only fd
*/
-static int
+int
open_file_in_directory(const char *directory, const char *fname)
{
int fd = -1;
@@ -440,6 +440,103 @@ WALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
return count;
}
+/*
+ * pg_waldump's XLogReaderRoutine->segment_open callback to support dumping WAL
+ * files from tar archives.
+ */
+static void
+TarWALDumpOpenSegment(XLogReaderState *state, XLogSegNo nextSegNo,
+ TimeLineID *tli_p)
+{
+ /* No action needed */
+}
+
+/*
+ * pg_waldump's XLogReaderRoutine->segment_close callback.
+ */
+static void
+TarWALDumpCloseSegment(XLogReaderState *state)
+{
+ /* No action needed */
+}
+
+/*
+ * pg_waldump's XLogReaderRoutine->page_read callback to support dumping WAL
+ * files from tar archives.
+ */
+static int
+TarWALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
+ XLogRecPtr targetPtr, char *readBuff)
+{
+ XLogDumpPrivate *private = state->private_data;
+ int count = required_read_len(private, targetPagePtr, reqLen);
+ int WalSegSz = state->segcxt.ws_segsize;
+ XLogSegNo curSegNo;
+
+ /* Bail out if the count to be read is not valid */
+ if (count < 0)
+ return -1;
+
+ /*
+ * If the target page is in a different segment, release the memory buffer
+ * and/or temporary file occupied by the previous segment's data. Once
+ * decoding has started, pg_waldump never requests the same WAL bytes
+ * twice, so moving on to a new segment implies that the previous segment
+ * will not be needed again.
+ *
+ * Afterward, check whether the next required WAL segment has already been
+ * spilled to the temporary directory, and only invoke the archive streamer
+ * if it has not.
+ curSegNo = state->seg.ws_segno;
+ if (!XLByteInSeg(targetPagePtr, curSegNo, WalSegSz))
+ {
+ char fname[MAXFNAMELEN];
+ XLogSegNo nextSegNo;
+
+ /*
+ * Calculate the next WAL segment to be decoded from the given page
+ * pointer
+ */
+ XLByteToSeg(targetPagePtr, nextSegNo, WalSegSz);
+ state->seg.ws_tli = private->timeline;
+ state->seg.ws_segno = nextSegNo;
+
+ /* Close the WAL segment file if it is currently open */
+ if (state->seg.ws_file >= 0)
+ {
+ close(state->seg.ws_file);
+ state->seg.ws_file = -1;
+ }
+
+ /*
+ * If in pre-reading mode (prior to actual decoding), do not delete
+ * any entries that might be requested again once the decoding loop
+ * starts. For more details, see the comments in
+ * read_archive_wal_page().
+ */
+ if (private->decoding_started && curSegNo < nextSegNo)
+ {
+ XLogFileName(fname, state->seg.ws_tli, curSegNo, WalSegSz);
+ free_archive_wal_entry(fname, private);
+ }
+
+ /*
+ * If the next segment was spilled to the temporary directory, open it
+ */
+ XLogFileName(fname, state->seg.ws_tli, nextSegNo, WalSegSz);
+ state->seg.ws_file = open_file_in_directory(TmpWalSegDir, fname);
+ }
+
+ /* Continue reading from the open WAL segment, if any */
+ if (state->seg.ws_file >= 0)
+ return WALDumpReadPage(state, targetPagePtr, count, targetPtr,
+ readBuff);
+
+ /* Otherwise, read the WAL page from the archive streamer */
+ return read_archive_wal_page(private, targetPagePtr, count, readBuff,
+ WalSegSz);
+}
+
/*
* Boolean to return whether the given WAL record matches a specific relation
* and optionally block.
@@ -777,8 +874,8 @@ usage(void)
printf(_(" -F, --fork=FORK only show records that modify blocks in fork FORK;\n"
" valid names are main, fsm, vm, init\n"));
printf(_(" -n, --limit=N number of records to display\n"));
- printf(_(" -p, --path=PATH directory in which to find WAL segment files or a\n"
- " directory with a ./pg_wal that contains such files\n"
+ printf(_(" -p, --path=PATH a tar archive or a directory in which to find WAL segment files or\n"
+ " a directory with a ./pg_wal that contains such files\n"
" (default: current directory, ./pg_wal, $PGDATA/pg_wal)\n"));
printf(_(" -q, --quiet do not print any output, except for errors\n"));
printf(_(" -r, --rmgr=RMGR only show records generated by resource manager RMGR;\n"
@@ -810,7 +907,9 @@ main(int argc, char **argv)
XLogRecord *record;
XLogRecPtr first_record;
char *waldir = NULL;
+ char *walpath = NULL;
char *errormsg;
+ pg_compress_algorithm compression = PG_COMPRESSION_NONE;
static struct option long_options[] = {
{"bkp-details", no_argument, NULL, 'b'},
@@ -868,6 +967,10 @@ main(int argc, char **argv)
private.startptr = InvalidXLogRecPtr;
private.endptr = InvalidXLogRecPtr;
private.endptr_reached = false;
+ private.decoding_started = false;
+ private.archive_name = NULL;
+ private.start_segno = 0;
+ private.end_segno = UINT64_MAX;
config.quiet = false;
config.bkp_details = false;
@@ -943,7 +1046,7 @@ main(int argc, char **argv)
}
break;
case 'p':
- waldir = pg_strdup(optarg);
+ walpath = pg_strdup(optarg);
break;
case 'q':
config.quiet = true;
@@ -1107,12 +1210,21 @@ main(int argc, char **argv)
goto bad_argument;
}
- if (waldir != NULL)
+ if (walpath != NULL)
{
+ /* validate path points to tar archive */
+ if (parse_tar_compress_algorithm(walpath, &compression))
+ {
+ char *fname = NULL;
+
+ split_path(walpath, &waldir, &fname);
+
+ private.archive_name = fname;
+ }
/* validate path points to directory */
- if (!verify_directory(waldir))
+ else if (!verify_directory(walpath))
{
- pg_log_error("could not open directory \"%s\": %m", waldir);
+ pg_log_error("could not open directory \"%s\": %m", walpath);
goto bad_argument;
}
}
@@ -1128,6 +1240,17 @@ main(int argc, char **argv)
int fd;
XLogSegNo segno;
+ /*
+ * If a tar archive is passed using the --path option, all other
+ * arguments become unnecessary.
+ */
+ if (private.archive_name)
+ {
+ pg_log_error("unnecessary command-line arguments specified with tar archive (first is \"%s\")",
+ argv[optind]);
+ goto bad_argument;
+ }
+
split_path(argv[optind], &directory, &fname);
if (waldir == NULL && directory != NULL)
@@ -1138,69 +1261,75 @@ main(int argc, char **argv)
pg_fatal("could not open directory \"%s\": %m", waldir);
}
- waldir = identify_target_directory(waldir, fname, &private.segsize);
- fd = open_file_in_directory(waldir, fname);
- if (fd < 0)
- pg_fatal("could not open file \"%s\"", fname);
- close(fd);
-
- /* parse position from file */
- XLogFromFileName(fname, &private.timeline, &segno, private.segsize);
-
- if (!XLogRecPtrIsValid(private.startptr))
- XLogSegNoOffsetToRecPtr(segno, 0, private.segsize, private.startptr);
- else if (!XLByteInSeg(private.startptr, segno, private.segsize))
+ if (fname != NULL && parse_tar_compress_algorithm(fname, &compression))
{
- pg_log_error("start WAL location %X/%08X is not inside file \"%s\"",
- LSN_FORMAT_ARGS(private.startptr),
- fname);
- goto bad_argument;
+ private.archive_name = fname;
}
-
- /* no second file specified, set end position */
- if (!(optind + 1 < argc) && !XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(segno + 1, 0, private.segsize, private.endptr);
-
- /* parse ENDSEG if passed */
- if (optind + 1 < argc)
+ else
{
- XLogSegNo endsegno;
-
- /* ignore directory, already have that */
- split_path(argv[optind + 1], &directory, &fname);
-
+ waldir = identify_target_directory(waldir, fname, &private.segsize);
fd = open_file_in_directory(waldir, fname);
if (fd < 0)
pg_fatal("could not open file \"%s\"", fname);
close(fd);
/* parse position from file */
- XLogFromFileName(fname, &private.timeline, &endsegno, private.segsize);
+ XLogFromFileName(fname, &private.timeline, &segno, private.segsize);
- if (endsegno < segno)
- pg_fatal("ENDSEG %s is before STARTSEG %s",
- argv[optind + 1], argv[optind]);
+ if (!XLogRecPtrIsValid(private.startptr))
+ XLogSegNoOffsetToRecPtr(segno, 0, private.segsize, private.startptr);
+ else if (!XLByteInSeg(private.startptr, segno, private.segsize))
+ {
+ pg_log_error("start WAL location %X/%08X is not inside file \"%s\"",
+ LSN_FORMAT_ARGS(private.startptr),
+ fname);
+ goto bad_argument;
+ }
- if (!XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(endsegno + 1, 0, private.segsize,
- private.endptr);
+ /* no second file specified, set end position */
+ if (!(optind + 1 < argc) && !XLogRecPtrIsValid(private.endptr))
+ XLogSegNoOffsetToRecPtr(segno + 1, 0, private.segsize, private.endptr);
- /* set segno to endsegno for check of --end */
- segno = endsegno;
- }
+ /* parse ENDSEG if passed */
+ if (optind + 1 < argc)
+ {
+ XLogSegNo endsegno;
+ /* ignore directory, already have that */
+ split_path(argv[optind + 1], &directory, &fname);
- if (!XLByteInSeg(private.endptr, segno, private.segsize) &&
- private.endptr != (segno + 1) * private.segsize)
- {
- pg_log_error("end WAL location %X/%08X is not inside file \"%s\"",
- LSN_FORMAT_ARGS(private.endptr),
- argv[argc - 1]);
- goto bad_argument;
+ fd = open_file_in_directory(waldir, fname);
+ if (fd < 0)
+ pg_fatal("could not open file \"%s\"", fname);
+ close(fd);
+
+ /* parse position from file */
+ XLogFromFileName(fname, &private.timeline, &endsegno, private.segsize);
+
+ if (endsegno < segno)
+ pg_fatal("ENDSEG %s is before STARTSEG %s",
+ argv[optind + 1], argv[optind]);
+
+ if (!XLogRecPtrIsValid(private.endptr))
+ XLogSegNoOffsetToRecPtr(endsegno + 1, 0, private.segsize,
+ private.endptr);
+
+ /* set segno to endsegno for check of --end */
+ segno = endsegno;
+ }
+
+ if (!XLByteInSeg(private.endptr, segno, private.segsize) &&
+ private.endptr != (segno + 1) * private.segsize)
+ {
+ pg_log_error("end WAL location %X/%08X is not inside file \"%s\"",
+ LSN_FORMAT_ARGS(private.endptr),
+ argv[argc - 1]);
+ goto bad_argument;
+ }
}
}
- else
- waldir = identify_target_directory(waldir, NULL, &private.segsize);
+ else if (!private.archive_name)
+ waldir = identify_target_directory(walpath, NULL, &private.segsize);
/* we don't know what to print */
if (!XLogRecPtrIsValid(private.startptr))
@@ -1209,15 +1338,46 @@ main(int argc, char **argv)
goto bad_argument;
}
+ /* --follow is not supported with tar archives */
+ if (config.follow && private.archive_name)
+ {
+ pg_log_error("--follow is not supported when reading from a tar archive");
+ goto bad_argument;
+ }
+
/* done with argument parsing, do the actual work */
/* we have everything we need, start reading */
- xlogreader_state =
- XLogReaderAllocate(private.segsize, waldir,
- XL_ROUTINE(.page_read = WALDumpReadPage,
- .segment_open = WALDumpOpenSegment,
- .segment_close = WALDumpCloseSegment),
- &private);
+ if (private.archive_name)
+ {
+ /*
+ * A NULL WAL directory indicates that the archive file is located in
+ * the current working directory.
+ */
+ if (waldir == NULL)
+ waldir = pg_strdup(".");
+
+ /* Set up for reading tar file */
+ init_archive_reader(&private, waldir, &private.segsize, compression);
+
+ /* Routine to decode WAL files in tar archive */
+ xlogreader_state =
+ XLogReaderAllocate(private.segsize, waldir,
+ XL_ROUTINE(.page_read = TarWALDumpReadPage,
+ .segment_open = TarWALDumpOpenSegment,
+ .segment_close = TarWALDumpCloseSegment),
+ &private);
+ }
+ else
+ {
+ xlogreader_state =
+ XLogReaderAllocate(private.segsize, waldir,
+ XL_ROUTINE(.page_read = WALDumpReadPage,
+ .segment_open = WALDumpOpenSegment,
+ .segment_close = WALDumpCloseSegment),
+ &private);
+ }
+
if (!xlogreader_state)
pg_fatal("out of memory while allocating a WAL reading processor");
@@ -1245,6 +1405,9 @@ main(int argc, char **argv)
if (config.stats == true && !config.quiet)
stats.startptr = first_record;
+ /* Flag indicating that the decoding loop has been entered */
+ private.decoding_started = true;
+
for (;;)
{
if (time_to_stop)
@@ -1326,6 +1489,9 @@ main(int argc, char **argv)
XLogReaderFree(xlogreader_state);
+ if (private.archive_name)
+ free_archive_reader(&private);
+
return EXIT_SUCCESS;
bad_argument:
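For reference, the segment-switch logic in TarWALDumpReadPage() relies on XLByteInSeg(), XLByteToSeg() and XLogFileName(). The sketch below spells out the arithmetic those macros perform, as I understand xlog_internal.h; the type and function names are hypothetical, and the patch itself should of course keep using the macros.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for XLogRecPtr and XLogSegNo */
typedef uint64_t wal_lsn;
typedef uint64_t wal_segno;

/* Segment that contains byte position lsn (cf. XLByteToSeg) */
static wal_segno
lsn_to_segno(wal_lsn lsn, uint32_t seg_size)
{
    return lsn / seg_size;
}

/* Whether lsn lies inside segment segno (cf. XLByteInSeg) */
static bool
lsn_in_segment(wal_lsn lsn, wal_segno segno, uint32_t seg_size)
{
    return lsn_to_segno(lsn, seg_size) == segno;
}

/*
 * Build the 24-character segment file name from the timeline and the high
 * and low parts of the segment number (cf. XLogFileName).
 */
static void
segment_file_name(char *buf, size_t buflen, uint32_t tli, wal_segno segno,
                  uint32_t seg_size)
{
    uint64_t    segs_per_id = UINT64_C(0x100000000) / seg_size;

    assert(buflen > 24);        /* 24 hex digits plus terminating NUL */
    snprintf(buf, buflen, "%08" PRIX32 "%08" PRIX32 "%08" PRIX32,
             tli, (uint32_t) (segno / segs_per_id),
             (uint32_t) (segno % segs_per_id));
}
```

With 16MB segments, LSN 0/3000028 lies in segment 3, so on timeline 1 the reader would look for 000000010000000000000003 in TmpWalSegDir before falling back to the archive streamer.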
diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h
index 013b051506f..1097390d575 100644
--- a/src/bin/pg_waldump/pg_waldump.h
+++ b/src/bin/pg_waldump/pg_waldump.h
@@ -12,6 +12,14 @@
#define PG_WALDUMP_H
#include "access/xlogdefs.h"
+#include "fe_utils/astreamer.h"
+
+/* Forward declarations */
+struct ArchivedWALFile;
+struct ArchivedWAL_hash;
+
+/* Temporary directory holding spilled WAL segments */
+extern char *TmpWalSegDir;
/* Contains the necessary information to drive WAL decoding */
typedef struct XLogDumpPrivate
@@ -21,6 +29,46 @@ typedef struct XLogDumpPrivate
XLogRecPtr startptr;
XLogRecPtr endptr;
bool endptr_reached;
+ bool decoding_started;
+
+ /* Fields required to read WAL from archive */
+ char *archive_name; /* Tar archive name */
+ int archive_fd; /* File descriptor for the open tar file */
+
+ astreamer *archive_streamer;
+ char *archive_read_buf; /* Reusable read buffer for archive I/O */
+ Size archive_read_buf_size;
+
+ /* WAL file the archive streamer is currently reading, or NULL */
+ struct ArchivedWALFile *cur_file;
+
+ /*
+ * Hash table of all WAL files that the archive streamer has read, including
+ * the one currently in progress.
+ */
+ struct ArchivedWAL_hash *archive_wal_htab;
+
+ /*
+ * These values could be derived from startptr and endptr, but doing so
+ * for every archive member would be wasteful, so they are computed once
+ * and cached here for use when filtering out irrelevant WAL segments
+ * (see astreamer_waldump_content()).
+ */
+ XLogSegNo start_segno;
+ XLogSegNo end_segno;
} XLogDumpPrivate;
+extern int open_file_in_directory(const char *directory, const char *fname);
+
+extern void init_archive_reader(XLogDumpPrivate *privateInfo,
+ const char *waldir, int *WalSegSz,
+ pg_compress_algorithm compression);
+extern void free_archive_reader(XLogDumpPrivate *privateInfo);
+extern int read_archive_wal_page(XLogDumpPrivate *privateInfo,
+ XLogRecPtr targetPagePtr,
+ Size count, char *readBuff,
+ int WalSegSz);
+extern void free_archive_wal_entry(const char *fname,
+ XLogDumpPrivate *privateInfo);
+
#endif /* PG_WALDUMP_H */
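The cached start_segno/end_segno bounds are what astreamer_waldump_content() checks for each WAL member. A minimal sketch of that filter, with hypothetical names: the file name is parsed as three 8-digit hex fields (timeline, high and low part of the segment number), as XLogFromFileName() does, and a segment is kept only if its timeline matches and start_segno <= segno <= end_segno. Leaving end_segno at UINT64_MAX, as main() initializes it, admits every later segment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical subset of XLogDumpPrivate used for filtering */
typedef struct WalFilter
{
    uint32_t    timeline;
    uint64_t    start_segno;
    uint64_t    end_segno;      /* UINT64_MAX when there is no upper bound */
} WalFilter;

/* Parse a WAL file name into timeline and segment number */
static bool
parse_wal_name(const char *fname, uint32_t seg_size,
               uint32_t *tli, uint64_t *segno)
{
    unsigned int t, log, seg;

    if (sscanf(fname, "%8X%8X%8X", &t, &log, &seg) != 3)
        return false;
    *tli = t;
    *segno = (uint64_t) log * (UINT64_C(0x100000000) / seg_size) + seg;
    return true;
}

/* Keep a segment only if the timeline matches and it is within range */
static bool
segment_is_wanted(const WalFilter *f, const char *fname, uint32_t seg_size)
{
    uint32_t    tli;
    uint64_t    segno;

    assert(f != NULL && fname != NULL);
    if (!parse_wal_name(fname, seg_size, &tli, &segno))
        return false;
    return tli == f->timeline &&
        segno >= f->start_segno &&
        segno <= f->end_segno;
}
```

For example, with start_segno = 3 and end_segno = 5 on timeline 1, segment 000000010000000000000003 is kept, while 000000010000000000000006 and 000000020000000000000004 are skipped.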
diff --git a/src/bin/pg_waldump/t/001_basic.pl b/src/bin/pg_waldump/t/001_basic.pl
index 5db5d20136f..6960bd46ba4 100644
--- a/src/bin/pg_waldump/t/001_basic.pl
+++ b/src/bin/pg_waldump/t/001_basic.pl
@@ -3,9 +3,13 @@
use strict;
use warnings FATAL => 'all';
+use Cwd;
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Test::More;
+use List::Util qw(shuffle);
+
+my $tar = $ENV{TAR};
program_help_ok('pg_waldump');
program_version_ok('pg_waldump');
@@ -162,6 +166,42 @@ CREATE TABLESPACE ts1 LOCATION '$tblspc_path';
DROP TABLESPACE ts1;
});
+# Test: Decode a continuation record (contrecord) that spans multiple WAL
+# segments.
+#
+# First, consume almost all remaining room in the current WAL segment,
+# leaving only enough space for the start of a largish record.
+$node->safe_psql(
+ 'postgres', q{
+DO $$
+DECLARE
+ wal_segsize int := setting::int FROM pg_settings WHERE name = 'wal_segment_size';
+ remain int;
+ iters int := 0;
+BEGIN
+ LOOP
+ INSERT into t1(b)
+ select repeat(encode(sha256(g::text::bytea), 'hex'), (random() * 15 + 1)::int)
+ from generate_series(1, 10) g;
+
+ remain := wal_segsize - (pg_current_wal_insert_lsn() - '0/0') % wal_segsize;
+ IF remain < 2 * setting::int from pg_settings where name = 'block_size' THEN
+ RAISE log 'exiting after % iterations, % bytes to end of WAL segment', iters, remain;
+ EXIT;
+ END IF;
+ iters := iters + 1;
+ END LOOP;
+END
+$$;
+});
+
+my $contrecord_lsn = $node->safe_psql('postgres',
+ 'SELECT pg_current_wal_insert_lsn()');
+# Emit a large record that must continue into the next WAL segment
+$node->safe_psql('postgres',
+ qq{SELECT pg_logical_emit_message(true, 'test 026', repeat('xyzxz', 123456))}
+);
+
my ($end_lsn, $end_walfile) = split /\|/,
$node->safe_psql('postgres',
q{SELECT pg_current_wal_insert_lsn(), pg_walfile_name(pg_current_wal_insert_lsn())}
@@ -198,28 +238,6 @@ command_like(
],
qr/./,
'runs with start and end segment specified');
-command_fails_like(
- [ 'pg_waldump', '--path' => $node->data_dir ],
- qr/error: no start WAL location given/,
- 'path option requires start location');
-command_like(
- [
- 'pg_waldump',
- '--path' => $node->data_dir,
- '--start' => $start_lsn,
- '--end' => $end_lsn,
- ],
- qr/./,
- 'runs with path option and start and end locations');
-command_fails_like(
- [
- 'pg_waldump',
- '--path' => $node->data_dir,
- '--start' => $start_lsn,
- ],
- qr/error: error in WAL record at/,
- 'falling off the end of the WAL results in an error');
-
command_like(
[
'pg_waldump', '--quiet',
@@ -227,22 +245,16 @@ command_like(
],
qr/^$/,
'no output with --quiet option');
-command_fails_like(
- [
- 'pg_waldump', '--quiet',
- '--path' => $node->data_dir,
- '--start' => $start_lsn
- ],
- qr/error: error in WAL record at/,
- 'errors are shown with --quiet');
-
# Test for: Display a message that we're skipping data if `from`
# wasn't a pointer to the start of a record.
+sub test_pg_waldump_skip_bytes
{
+ my ($path, $startlsn, $endlsn) = @_;
+
# Construct a new LSN that is one byte past the original
# start_lsn.
- my ($part1, $part2) = split qr{/}, $start_lsn;
+ my ($part1, $part2) = split qr{/}, $startlsn;
my $lsn2 = hex $part2;
$lsn2++;
my $new_start = sprintf("%s/%X", $part1, $lsn2);
@@ -252,7 +264,8 @@ command_fails_like(
my $result = IPC::Run::run [
'pg_waldump',
'--start' => $new_start,
- $node->data_dir . '/pg_wal/' . $start_walfile
+ '--end' => $endlsn,
+ '--path' => $path,
],
'>' => \$stdout,
'2>' => \$stderr;
@@ -266,15 +279,15 @@ command_fails_like(
sub test_pg_waldump
{
local $Test::Builder::Level = $Test::Builder::Level + 1;
- my @opts = @_;
+ my ($path, $startlsn, $endlsn, @opts) = @_;
my ($stdout, $stderr);
my $result = IPC::Run::run [
'pg_waldump',
- '--path' => $node->data_dir,
- '--start' => $start_lsn,
- '--end' => $end_lsn,
+ '--start' => $startlsn,
+ '--end' => $endlsn,
+ '--path' => $path,
@opts
],
'>' => \$stdout,
@@ -286,40 +299,145 @@ sub test_pg_waldump
return @lines;
}
-my @lines;
+# Create a tar archive with the files in random order
+sub generate_archive
+{
+ my ($archive, $directory, $compression_flags) = @_;
-@lines = test_pg_waldump;
-is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
+ my @files;
+ opendir my $dh, $directory or die "opendir: $!";
+ while (my $entry = readdir $dh) {
+ # Skip '.' and '..'
+ next if $entry eq '.' || $entry eq '..';
+ push @files, $entry;
+ }
+ closedir $dh;
-@lines = test_pg_waldump('--limit' => 6);
-is(@lines, 6, 'limit option observed');
+ @files = shuffle @files;
-@lines = test_pg_waldump('--fullpage');
-is(grep(!/^rmgr:.*\bFPW\b/, @lines), 0, 'all output lines are FPW');
+ # move into the WAL directory before archiving files
+ my $cwd = getcwd;
+ chdir($directory) || die "chdir: $!";
+ command_ok([$tar, $compression_flags, $archive, @files]);
+ chdir($cwd) || die "chdir: $!";
+}
-@lines = test_pg_waldump('--stats');
-like($lines[0], qr/WAL statistics/, "statistics on stdout");
-is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+my $tmp_dir = PostgreSQL::Test::Utils::tempdir_short();
-@lines = test_pg_waldump('--stats=record');
-like($lines[0], qr/WAL statistics/, "statistics on stdout");
-is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+my @scenarios = (
+ {
+ 'path' => $node->data_dir,
+ 'is_archive' => 0,
+ 'enabled' => 1
+ },
+ {
+ 'path' => "$tmp_dir/pg_wal.tar",
+ 'compression_method' => 'none',
+ 'compression_flags' => '-cf',
+ 'is_archive' => 1,
+ 'enabled' => 1
+ },
+ {
+ 'path' => "$tmp_dir/pg_wal.tar.gz",
+ 'compression_method' => 'gzip',
+ 'compression_flags' => '-czf',
+ 'is_archive' => 1,
+ 'enabled' => check_pg_config("#define HAVE_LIBZ 1")
+ });
-@lines = test_pg_waldump('--rmgr' => 'Btree');
-is(grep(!/^rmgr: Btree/, @lines), 0, 'only Btree lines');
+for my $scenario (@scenarios)
+{
+ my $path = $scenario->{'path'};
-@lines = test_pg_waldump('--fork' => 'init');
-is(grep(!/fork init/, @lines), 0, 'only init fork lines');
+ SKIP:
+ {
+ skip "tar command is not available", 56
+ if (!defined $tar || $tar eq '') && $scenario->{'is_archive'};
+ skip "$scenario->{'compression_method'} compression not supported by this build", 56
+ if !$scenario->{'enabled'} && $scenario->{'is_archive'};
-@lines = test_pg_waldump(
- '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_t1_oid");
-is(grep(!/rel $default_ts_oid\/$postgres_db_oid\/$rel_t1_oid/, @lines),
- 0, 'only lines for selected relation');
+ # create pg_wal archive
+ if ($scenario->{'is_archive'})
+ {
+ generate_archive($path,
+ $node->data_dir . '/pg_wal',
+ $scenario->{'compression_flags'});
+ }
-@lines = test_pg_waldump(
- '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_i1a_oid",
- '--block' => 1);
-is(grep(!/\bblk 1\b/, @lines), 0, 'only lines for selected block');
+ command_fails_like(
+ [ 'pg_waldump', '--path' => $path ],
+ qr/error: no start WAL location given/,
+ 'path option requires start location');
+ command_like(
+ [
+ 'pg_waldump',
+ '--path' => $path,
+ '--start' => $start_lsn,
+ '--end' => $end_lsn,
+ ],
+ qr/./,
+ 'runs with path option and start and end locations');
+ command_fails_like(
+ [
+ 'pg_waldump',
+ '--path' => $path,
+ '--start' => $start_lsn,
+ ],
+ qr/error: error in WAL record at/,
+ 'falling off the end of the WAL results in an error');
+ command_fails_like(
+ [
+ 'pg_waldump', '--quiet',
+ '--path' => $path,
+ '--start' => $start_lsn
+ ],
+ qr/error: error in WAL record at/,
+ 'errors are shown with --quiet');
+
+ test_pg_waldump_skip_bytes($path, $start_lsn, $end_lsn);
+
+ my @lines = test_pg_waldump($path, $start_lsn, $end_lsn);
+ is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
+
+ @lines = test_pg_waldump($path, $contrecord_lsn, $end_lsn);
+ is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
+
+ test_pg_waldump_skip_bytes($path, $contrecord_lsn, $end_lsn);
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--limit' => 6);
+ is(@lines, 6, 'limit option observed');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--fullpage');
+ is(grep(!/^rmgr:.*\bFPW\b/, @lines), 0, 'all output lines are FPW');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--stats');
+ like($lines[0], qr/WAL statistics/, "statistics on stdout");
+ is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--stats=record');
+ like($lines[0], qr/WAL statistics/, "statistics on stdout");
+ is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--rmgr' => 'Btree');
+ is(grep(!/^rmgr: Btree/, @lines), 0, 'only Btree lines');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--fork' => 'init');
+ is(grep(!/fork init/, @lines), 0, 'only init fork lines');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn,
+ '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_t1_oid");
+ is(grep(!/rel $default_ts_oid\/$postgres_db_oid\/$rel_t1_oid/, @lines),
+ 0, 'only lines for selected relation');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn,
+ '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_i1a_oid",
+ '--block' => 1);
+ is(grep(!/\bblk 1\b/, @lines), 0, 'only lines for selected block');
+
+ # Cleanup.
+ unlink $path if $scenario->{'is_archive'};
+ }
+}
done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 52f8603a7be..4961c3024af 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -147,6 +147,9 @@ ArchiveOpts
ArchiveShutdownCB
ArchiveStartupCB
ArchiveStreamState
+ArchivedWALFile
+ArchivedWAL_hash
+ArchivedWAL_iterator
ArchiverOutput
ArchiverStage
ArrayAnalyzeExtraData
@@ -3540,6 +3543,7 @@ astreamer_recovery_injector
astreamer_tar_archiver
astreamer_tar_parser
astreamer_verify
+astreamer_waldump
astreamer_zstd_frame
auth_password_hook_typ
autovac_table
--
2.47.1
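The new ArchivedWALFile / ArchivedWAL_hash typedefs belong to the archive reader's table of buffered WAL segments. Per the earlier discussion in this thread, that table is keyed by WAL segment file name. The sketch below is an illustration only. It shows how that 24-hex-digit name follows from a timeline ID, an LSN and the segment size, using the same layout as PostgreSQL's XLogFileName() macro. The function wal_segment_name() and the WAL_FNAME_LEN constant are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define WAL_FNAME_LEN 25        /* 24 hex digits plus terminating NUL */

/*
 * Hypothetical helper: format the name of the WAL segment containing 'lsn',
 * laid out like XLogFileName(): timeline, then the segment number split
 * into a "log id" part and a segment-within-log-id part.
 */
static void
wal_segment_name(char *buf, size_t buflen, uint32_t tli, uint64_t lsn,
                 uint32_t segsize)
{
    uint64_t    segno;
    uint64_t    segs_per_xlogid;

    /* WAL segment sizes are powers of two (1MB to 1GB in PostgreSQL). */
    assert(segsize != 0 && (segsize & (segsize - 1)) == 0);

    segno = lsn / segsize;                              /* like XLByteToSeg() */
    segs_per_xlogid = UINT64_C(0x100000000) / segsize;  /* like XLogSegmentsPerXLogId() */

    snprintf(buf, buflen, "%08" PRIX32 "%08" PRIX32 "%08" PRIX32, tli,
             (uint32_t) (segno / segs_per_xlogid),
             (uint32_t) (segno % segs_per_xlogid));
}
```

For example, with 16MB segments the LSN 0/3000028 on timeline 1 falls in segment 000000010000000000000003.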
[application/octet-stream] v19-0004-pg_verifybackup-Enable-WAL-parsing-for-tar-forma.patch (16.1K, 5-v19-0004-pg_verifybackup-Enable-WAL-parsing-for-tar-forma.patch)
From 382c0c70d12309f1fc71001c7705b46621d74332 Mon Sep 17 00:00:00 2001
From: Andrew Dunstan <[email protected]>
Date: Wed, 11 Mar 2026 11:26:36 -0400
Subject: [PATCH v19 4/4] pg_verifybackup: Enable WAL parsing for tar-format
backups
Now that pg_waldump supports reading WAL from tar archives, remove the
restriction that forced --no-parse-wal for tar-format backups.
pg_verifybackup now automatically locates the WAL archive: it looks for
a separate pg_wal.tar first, then falls back to the main base.tar. A
new --wal-path option (replacing the old --wal-directory, which is kept
as a silent alias) accepts either a directory or a tar archive path.
The default WAL directory preparation is deferred until the backup
format is known, since tar-format backups resolve the WAL path
differently from plain-format ones.
Author: Amul Sul <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Jakub Wartak <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Euler Taveira <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
discussion: https://postgr.es/m/CAAJ_b94bqdWN3h2J-PzzzQ2Npbwct5ZQHggn_QoYGhC2rn-=WQ@mail.gmail.com
---
doc/src/sgml/ref/pg_verifybackup.sgml | 14 ++-
src/bin/pg_verifybackup/pg_verifybackup.c | 89 ++++++++++++-------
src/bin/pg_verifybackup/t/002_algorithm.pl | 4 -
src/bin/pg_verifybackup/t/003_corruption.pl | 4 +-
src/bin/pg_verifybackup/t/007_wal.pl | 20 ++++-
src/bin/pg_verifybackup/t/008_untar.pl | 5 +-
src/bin/pg_verifybackup/t/010_client_untar.pl | 5 +-
7 files changed, 85 insertions(+), 56 deletions(-)
diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml
index 61c12975e4a..1695cfe91c8 100644
--- a/doc/src/sgml/ref/pg_verifybackup.sgml
+++ b/doc/src/sgml/ref/pg_verifybackup.sgml
@@ -36,10 +36,7 @@ PostgreSQL documentation
<literal>backup_manifest</literal> generated by the server at the time
of the backup. The backup may be stored either in the "plain" or the "tar"
format; this includes tar-format backups compressed with any algorithm
- supported by <application>pg_basebackup</application>. However, at present,
- <literal>WAL</literal> verification is supported only for plain-format
- backups. Therefore, if the backup is stored in tar-format, the
- <literal>-n, --no-parse-wal</literal> option should be used.
+ supported by <application>pg_basebackup</application>.
</para>
<para>
@@ -261,12 +258,13 @@ PostgreSQL documentation
<varlistentry>
<term><option>-w <replaceable class="parameter">path</replaceable></option></term>
- <term><option>--wal-directory=<replaceable class="parameter">path</replaceable></option></term>
+ <term><option>--wal-path=<replaceable class="parameter">path</replaceable></option></term>
<listitem>
<para>
- Try to parse WAL files stored in the specified directory, rather than
- in <literal>pg_wal</literal>. This may be useful if the backup is
- stored in a separate location from the WAL archive.
+ Try to parse WAL files stored in the specified directory or tar
+ archive, rather than in <literal>pg_wal</literal>. This may be
+ useful if the backup is stored in a separate location from the WAL
+ archive.
</para>
</listitem>
</varlistentry>
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 31f606c45b1..db79dd39103 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -74,7 +74,9 @@ pg_noreturn static void report_manifest_error(JsonManifestParseContext *context,
const char *fmt,...)
pg_attribute_printf(2, 3);
-static void verify_tar_backup(verifier_context *context, DIR *dir);
+static void verify_tar_backup(verifier_context *context, DIR *dir,
+ char **base_archive_path,
+ char **wal_archive_path);
static void verify_plain_backup_directory(verifier_context *context,
char *relpath, char *fullpath,
DIR *dir);
@@ -83,7 +85,9 @@ static void verify_plain_backup_file(verifier_context *context, char *relpath,
static void verify_control_file(const char *controlpath,
uint64 manifest_system_identifier);
static void precheck_tar_backup_file(verifier_context *context, char *relpath,
- char *fullpath, SimplePtrList *tarfiles);
+ char *fullpath, SimplePtrList *tarfiles,
+ char **base_archive_path,
+ char **wal_archive_path);
static void verify_tar_file(verifier_context *context, char *relpath,
char *fullpath, astreamer *streamer);
static void report_extra_backup_files(verifier_context *context);
@@ -93,7 +97,7 @@ static void verify_file_checksum(verifier_context *context,
uint8 *buffer);
static void parse_required_wal(verifier_context *context,
char *pg_waldump_path,
- char *wal_directory);
+ char *wal_path);
static astreamer *create_archive_verifier(verifier_context *context,
char *archive_name,
Oid tblspc_oid,
@@ -126,7 +130,8 @@ main(int argc, char **argv)
{"progress", no_argument, NULL, 'P'},
{"quiet", no_argument, NULL, 'q'},
{"skip-checksums", no_argument, NULL, 's'},
- {"wal-directory", required_argument, NULL, 'w'},
+ {"wal-path", required_argument, NULL, 'w'},
+ {"wal-directory", required_argument, NULL, 'w'}, /* deprecated */
{NULL, 0, NULL, 0}
};
@@ -135,7 +140,9 @@ main(int argc, char **argv)
char *manifest_path = NULL;
bool no_parse_wal = false;
bool quiet = false;
- char *wal_directory = NULL;
+ char *wal_path = NULL;
+ char *base_archive_path = NULL;
+ char *wal_archive_path = NULL;
char *pg_waldump_path = NULL;
DIR *dir;
@@ -221,8 +228,8 @@ main(int argc, char **argv)
context.skip_checksums = true;
break;
case 'w':
- wal_directory = pstrdup(optarg);
- canonicalize_path(wal_directory);
+ wal_path = pstrdup(optarg);
+ canonicalize_path(wal_path);
break;
default:
/* getopt_long already emitted a complaint */
@@ -285,10 +292,6 @@ main(int argc, char **argv)
manifest_path = psprintf("%s/backup_manifest",
context.backup_directory);
- /* By default, look for the WAL in the backup directory, too. */
- if (wal_directory == NULL)
- wal_directory = psprintf("%s/pg_wal", context.backup_directory);
-
/*
* Try to read the manifest. We treat any errors encountered while parsing
* the manifest as fatal; there doesn't seem to be much point in trying to
@@ -331,17 +334,6 @@ main(int argc, char **argv)
pfree(path);
}
- /*
- * XXX: In the future, we should consider enhancing pg_waldump to read WAL
- * files from an archive.
- */
- if (!no_parse_wal && context.format == 't')
- {
- pg_log_error("pg_waldump cannot read tar files");
- pg_log_error_hint("You must use -n/--no-parse-wal when verifying a tar-format backup.");
- exit(1);
- }
-
/*
* Perform the appropriate type of verification appropriate based on the
* backup format. This will close 'dir'.
@@ -350,7 +342,7 @@ main(int argc, char **argv)
verify_plain_backup_directory(&context, NULL, context.backup_directory,
dir);
else
- verify_tar_backup(&context, dir);
+ verify_tar_backup(&context, dir, &base_archive_path, &wal_archive_path);
/*
* The "matched" flag should now be set on every entry in the hash table.
@@ -368,12 +360,35 @@ main(int argc, char **argv)
if (context.format == 'p' && !context.skip_checksums)
verify_backup_checksums(&context);
+ /*
+ * By default, WAL files are expected to be found in the backup directory
+ * for plain-format backups. In the case of tar-format backups, if a
+ * separate WAL archive is not found, the WAL files are most likely
+ * included within the main data directory archive.
+ */
+ if (wal_path == NULL)
+ {
+ if (context.format == 'p')
+ wal_path = psprintf("%s/pg_wal", context.backup_directory);
+ else if (wal_archive_path)
+ wal_path = wal_archive_path;
+ else if (base_archive_path)
+ wal_path = base_archive_path;
+ else
+ {
+ pg_log_error("WAL archive not found");
+ pg_log_error_hint("Specify the correct path using the option -w/--wal-path. "
+ "Or you must use -n/--no-parse-wal when verifying a tar-format backup.");
+ exit(1);
+ }
+ }
+
/*
* Try to parse the required ranges of WAL records, unless we were told
* not to do so.
*/
if (!no_parse_wal)
- parse_required_wal(&context, pg_waldump_path, wal_directory);
+ parse_required_wal(&context, pg_waldump_path, wal_path);
/*
* If everything looks OK, tell the user this, unless we were asked to
@@ -787,7 +802,8 @@ verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
* close when we're done with it.
*/
static void
-verify_tar_backup(verifier_context *context, DIR *dir)
+verify_tar_backup(verifier_context *context, DIR *dir, char **base_archive_path,
+ char **wal_archive_path)
{
struct dirent *dirent;
SimplePtrList tarfiles = {NULL, NULL};
@@ -816,7 +832,8 @@ verify_tar_backup(verifier_context *context, DIR *dir)
char *fullpath;
fullpath = psprintf("%s/%s", context->backup_directory, filename);
- precheck_tar_backup_file(context, filename, fullpath, &tarfiles);
+ precheck_tar_backup_file(context, filename, fullpath, &tarfiles,
+ base_archive_path, wal_archive_path);
pfree(fullpath);
}
}
@@ -875,11 +892,13 @@ verify_tar_backup(verifier_context *context, DIR *dir)
*
* The arguments to this function are mostly the same as the
* verify_plain_backup_file. The additional argument outputs a list of valid
- * tar files.
+ * tar files, along with the full paths to the main archive and the WAL
+ * directory archive.
*/
static void
precheck_tar_backup_file(verifier_context *context, char *relpath,
- char *fullpath, SimplePtrList *tarfiles)
+ char *fullpath, SimplePtrList *tarfiles,
+ char **base_archive_path, char **wal_archive_path)
{
struct stat sb;
Oid tblspc_oid = InvalidOid;
@@ -918,9 +937,17 @@ precheck_tar_backup_file(verifier_context *context, char *relpath,
* extension such as .gz, .lz4, or .zst.
*/
if (strncmp("base", relpath, 4) == 0)
+ {
suffix = relpath + 4;
+
+ *base_archive_path = pstrdup(fullpath);
+ }
else if (strncmp("pg_wal", relpath, 6) == 0)
+ {
suffix = relpath + 6;
+
+ *wal_archive_path = pstrdup(fullpath);
+ }
else
{
/* Expected a <tablespaceoid>.tar file here. */
@@ -1188,7 +1215,7 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
*/
static void
parse_required_wal(verifier_context *context, char *pg_waldump_path,
- char *wal_directory)
+ char *wal_path)
{
manifest_data *manifest = context->manifest;
manifest_wal_range *this_wal_range = manifest->first_wal_range;
@@ -1198,7 +1225,7 @@ parse_required_wal(verifier_context *context, char *pg_waldump_path,
char *pg_waldump_cmd;
pg_waldump_cmd = psprintf("\"%s\" --quiet --path=\"%s\" --timeline=%u --start=%X/%08X --end=%X/%08X\n",
- pg_waldump_path, wal_directory, this_wal_range->tli,
+ pg_waldump_path, wal_path, this_wal_range->tli,
LSN_FORMAT_ARGS(this_wal_range->start_lsn),
LSN_FORMAT_ARGS(this_wal_range->end_lsn));
fflush(NULL);
@@ -1366,7 +1393,7 @@ usage(void)
printf(_(" -P, --progress show progress information\n"));
printf(_(" -q, --quiet do not print any output, except for errors\n"));
printf(_(" -s, --skip-checksums skip checksum verification\n"));
- printf(_(" -w, --wal-directory=PATH use specified path for WAL files\n"));
+ printf(_(" -w, --wal-path=PATH use specified path for WAL files\n"));
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -?, --help show this help, then exit\n"));
printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
diff --git a/src/bin/pg_verifybackup/t/002_algorithm.pl b/src/bin/pg_verifybackup/t/002_algorithm.pl
index 0556191ec9d..edc515d5904 100644
--- a/src/bin/pg_verifybackup/t/002_algorithm.pl
+++ b/src/bin/pg_verifybackup/t/002_algorithm.pl
@@ -30,10 +30,6 @@ sub test_checksums
{
# Add switch to get a tar-format backup
push @backup, ('--format' => 'tar');
-
- # Add switch to skip WAL verification, which is not yet supported for
- # tar-format backups
- push @verify, ('--no-parse-wal');
}
# A backup with a bogus algorithm should fail.
diff --git a/src/bin/pg_verifybackup/t/003_corruption.pl b/src/bin/pg_verifybackup/t/003_corruption.pl
index b1d65b8aa0f..882d75d9dc2 100644
--- a/src/bin/pg_verifybackup/t/003_corruption.pl
+++ b/src/bin/pg_verifybackup/t/003_corruption.pl
@@ -193,10 +193,8 @@ for my $scenario (@scenario)
command_ok([ $tar, '-cf' => "$tar_backup_path/base.tar", '.' ]);
chdir($cwd) || die "chdir: $!";
- # Now check that the backup no longer verifies. We must use -n
- # here, because pg_waldump can't yet read WAL from a tarfile.
command_fails_like(
- [ 'pg_verifybackup', '--no-parse-wal', $tar_backup_path ],
+ [ 'pg_verifybackup', $tar_backup_path ],
$scenario->{'fails_like'},
"corrupt backup fails verification: $name");
diff --git a/src/bin/pg_verifybackup/t/007_wal.pl b/src/bin/pg_verifybackup/t/007_wal.pl
index 79087a1f6be..0e0377bfacc 100644
--- a/src/bin/pg_verifybackup/t/007_wal.pl
+++ b/src/bin/pg_verifybackup/t/007_wal.pl
@@ -42,10 +42,10 @@ command_ok([ 'pg_verifybackup', '--no-parse-wal', $backup_path ],
command_ok(
[
'pg_verifybackup',
- '--wal-directory' => $relocated_pg_wal,
+ '--wal-path' => $relocated_pg_wal,
$backup_path
],
- '--wal-directory can be used to specify WAL directory');
+ '--wal-path can be used to specify WAL directory');
# Move directory back to original location.
rename($relocated_pg_wal, $original_pg_wal) || die "rename pg_wal back: $!";
@@ -90,4 +90,20 @@ command_ok(
[ 'pg_verifybackup', $backup_path2 ],
'valid base backup with timeline > 1');
+# Test WAL verification for a tar-format backup with a separate pg_wal.tar,
+# as produced by pg_basebackup --format=tar --wal-method=stream.
+my $backup_path3 = $primary->backup_dir . '/test_tar_wal';
+$primary->command_ok(
+ [
+ 'pg_basebackup',
+ '--pgdata' => $backup_path3,
+ '--no-sync',
+ '--format' => 'tar',
+ '--checkpoint' => 'fast'
+ ],
+ "tar backup with separate pg_wal.tar");
+command_ok(
+ [ 'pg_verifybackup', $backup_path3 ],
+ 'WAL verification succeeds with separate pg_wal.tar');
+
done_testing();
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
index ae67ae85a31..161c08c190d 100644
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -47,7 +47,6 @@ my $tsoid = $primary->safe_psql(
SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1'));
my $backup_path = $primary->backup_dir . '/server-backup';
-my $extract_path = $primary->backup_dir . '/extracted-backup';
my @test_configuration = (
{
@@ -123,14 +122,12 @@ for my $tc (@test_configuration)
# Verify tar backup.
$primary->command_ok(
[
- 'pg_verifybackup', '--no-parse-wal',
- '--exit-on-error', $backup_path,
+ 'pg_verifybackup', '--exit-on-error', $backup_path,
],
"verify backup, compression $method");
# Cleanup.
rmtree($backup_path);
- rmtree($extract_path);
}
}
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
index 1ac7b5db75a..9670fbe4fda 100644
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -32,7 +32,6 @@ print $jf $junk_data;
close $jf;
my $backup_path = $primary->backup_dir . '/client-backup';
-my $extract_path = $primary->backup_dir . '/extracted-backup';
my @test_configuration = (
{
@@ -137,13 +136,11 @@ for my $tc (@test_configuration)
# Verify tar backup.
$primary->command_ok(
[
- 'pg_verifybackup', '--no-parse-wal',
- '--exit-on-error', $backup_path,
+ 'pg_verifybackup', '--exit-on-error', $backup_path,
],
"verify backup, compression $method");
# Cleanup.
- rmtree($extract_path);
rmtree($backup_path);
}
}
--
2.47.1
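The default-WAL-location logic that v19-0004 adds to main() can be read as a small decision function. The sketch below restates it outside pg_verifybackup.c. It assumes the caller has already run verify_tar_backup() to fill in the two archive paths. resolve_default_wal_path() is a hypothetical name, and error reporting is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical restatement of the v19-0004 default WAL path selection.
 * 'format' is 'p' (plain) or 't' (tar); 'buf' receives the plain-format
 * default.  Returns NULL if no WAL source could be determined.
 */
static const char *
resolve_default_wal_path(const char *wal_path, char format,
                         const char *backup_dir,
                         const char *wal_archive_path,
                         const char *base_archive_path,
                         char *buf, size_t buflen)
{
    assert(format == 'p' || format == 't');

    if (wal_path != NULL)
        return wal_path;            /* -w/--wal-path given explicitly */

    if (format == 'p')
    {
        snprintf(buf, buflen, "%s/pg_wal", backup_dir);
        return buf;
    }
    if (wal_archive_path != NULL)
        return wal_archive_path;    /* separate pg_wal.tar (possibly compressed) */
    if (base_archive_path != NULL)
        return base_archive_path;   /* WAL stored inside base.tar */
    return NULL;                    /* caller reports "WAL archive not found" */
}
```

Returning NULL rather than exiting keeps the sketch testable. The patch itself calls pg_log_error() and exit(1) at that point.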
* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-03-19 10:20 Amul Sul <[email protected]>
parent: Amul Sul <[email protected]>
From: Amul Sul @ 2026-03-19 10:20 UTC (permalink / raw)
To: Andrew Dunstan <[email protected]>; +Cc: Robert Haas <[email protected]>; Chao Li <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>
On Wed, Mar 18, 2026 at 8:46 PM Amul Sul <[email protected]> wrote:
>
> On Wed, Mar 18, 2026 at 5:15 PM Amul Sul <[email protected]> wrote:
> >
> > On Wed, Mar 11, 2026 at 10:38 PM Andrew Dunstan <[email protected]> wrote:
> > > [...]
> > > Looks pretty good. I have squashed them into three patches I think are committable. Also attached is a diff showing what's changed - mainly this:
> > >
> > > . --follow + tar archive rejected (pg_waldump.c) — new validation prevents a confusing pg_fatal when combining --follow with a tar archive
> > > . error messages split (archive_waldump.c) — the single "could not read file" error is now two distinct messages: "WAL segment is too short" (truncated file) vs "unexpected end of archive" (archive EOF) - Fixes an issue raised in review
> > > . hash table cleanup (archive_waldump.c) — free_archive_reader now iterates and frees all remaining hash entries and destroys the table
> > >
> >
> > The final squashed version looks good to me, thank you. But, I would
> > like to propose splitting the 0001 patch into two separate commits: a
> > preparatory refactoring of the pg_waldump code and a standalone commit
> > that moves the tar archive detection and compression logic to a common
> > location, as the latter is an independent improvement to the existing
> > codebase. Additionally, since the test file refactoring was only kept
> > separate to facilitate the review and has already been reviewed, I
> > suggest merging those changes into the main feature patch i.e. 0002.
> > All other elements should remain in a single preparatory refactoring
> > patch for pg_waldump.
> >
> > Attached is the version that includes the proposed split. No
> > additional changes to 0002 and 0003 patches.
> >
>
> Added the two missing 'Reviewed-by' lines to the credit section of the
> commit message and did a minor optimization in get_archive_wal_entry.
>
Attaching an updated version. It includes some tweaks to code
comments, adds an assert inside get_archive_wal_entry(), moves the
archive_read_buf_size declaration and usage into an assert-enabled
check, and makes a minor change to precheck_tar_backup_file() to
assign out-variables only after successful validation.
Regards,
Amul
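The last change mentioned above is assigning out-variables in precheck_tar_backup_file() only after successful validation. This follows a general pattern: classify the file name first, and touch caller-visible state only once the name is known to be acceptable. The sketch below is a simplified, hypothetical stand-in, not the v20 code. precheck_archive_name() and is_tar_suffix() are invented for illustration, and tablespace archives and manifest checks are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: true if 'suffix' names a tar archive we know how to read. */
static bool
is_tar_suffix(const char *suffix)
{
    return strcmp(suffix, ".tar") == 0 || strcmp(suffix, ".tgz") == 0 ||
        strcmp(suffix, ".tar.gz") == 0 || strcmp(suffix, ".tar.lz4") == 0 ||
        strcmp(suffix, ".tar.zst") == 0;
}

/*
 * Classify 'relpath' and, only after the name has been validated, store a
 * copy of 'fullpath' in the matching out-variable.  On failure neither
 * out-variable is touched, so a stray file such as "base.txt" cannot be
 * mistaken for the main archive.
 */
static bool
precheck_archive_name(const char *relpath, const char *fullpath,
                      char **base_archive_path, char **wal_archive_path)
{
    char      **target;
    const char *suffix;
    size_t      len;
    char       *copy;

    assert(relpath != NULL && fullpath != NULL);

    if (strncmp(relpath, "base", 4) == 0)
    {
        suffix = relpath + 4;
        target = base_archive_path;
    }
    else if (strncmp(relpath, "pg_wal", 6) == 0)
    {
        suffix = relpath + 6;
        target = wal_archive_path;
    }
    else
        return false;           /* tablespace archives omitted from sketch */

    if (!is_tar_suffix(suffix))
        return false;           /* rejected before any out-variable is set */

    len = strlen(fullpath) + 1;
    copy = malloc(len);         /* the real frontend code uses pstrdup() */
    if (copy == NULL)
        return false;
    memcpy(copy, fullpath, len);
    *target = copy;
    return true;
}
```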
Attachments:
[application/x-patch] v20-0001-Move-tar-detection-and-compression-logic-to-comm.patch (7.1K, 2-v20-0001-Move-tar-detection-and-compression-logic-to-comm.patch)
From ba736014228ea250b8eb155f2776bb86feed2b55 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Thu, 19 Mar 2026 15:43:30 +0530
Subject: [PATCH v20 1/5] Move tar detection and compression logic to common.
Consolidate tar archive identification and compression-type detection
logic into a shared location. Currently used by pg_basebackup and
pg_verifybackup, this functionality is also required for upcoming
pg_waldump enhancements.
This change promotes code reuse and simplifies maintenance across
frontend tools.
Author: Amul Sul <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Jakub Wartak <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Euler Taveira <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
discussion: https://postgr.es/m/CAAJ_b94bqdWN3h2J-PzzzQ2Npbwct5ZQHggn_QoYGhC2rn-=WQ@mail.gmail.com
---
src/bin/pg_basebackup/pg_basebackup.c | 36 +++++++----------------
src/bin/pg_verifybackup/pg_verifybackup.c | 12 +-------
src/common/compression.c | 30 +++++++++++++++++++
src/include/common/compression.h | 2 ++
4 files changed, 44 insertions(+), 36 deletions(-)
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index fa169a8d642..c1a4672aa6f 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1070,12 +1070,9 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
astreamer *manifest_inject_streamer = NULL;
bool inject_manifest;
bool is_tar,
- is_tar_gz,
- is_tar_lz4,
- is_tar_zstd,
is_compressed_tar;
+ pg_compress_algorithm compressed_tar_algorithm;
bool must_parse_archive;
- int archive_name_len = strlen(archive_name);
/*
* Normally, we emit the backup manifest as a separate file, but when
@@ -1084,24 +1081,13 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
*/
inject_manifest = (format == 't' && strcmp(basedir, "-") == 0 && manifest);
- /* Is this a tar archive? */
- is_tar = (archive_name_len > 4 &&
- strcmp(archive_name + archive_name_len - 4, ".tar") == 0);
-
- /* Is this a .tar.gz archive? */
- is_tar_gz = (archive_name_len > 7 &&
- strcmp(archive_name + archive_name_len - 7, ".tar.gz") == 0);
-
- /* Is this a .tar.lz4 archive? */
- is_tar_lz4 = (archive_name_len > 8 &&
- strcmp(archive_name + archive_name_len - 8, ".tar.lz4") == 0);
-
- /* Is this a .tar.zst archive? */
- is_tar_zstd = (archive_name_len > 8 &&
- strcmp(archive_name + archive_name_len - 8, ".tar.zst") == 0);
+ /* Check whether it is a tar archive and its compression type */
+ is_tar = parse_tar_compress_algorithm(archive_name,
+ &compressed_tar_algorithm);
/* Is this any kind of compressed tar? */
- is_compressed_tar = is_tar_gz || is_tar_lz4 || is_tar_zstd;
+ is_compressed_tar = (is_tar &&
+ compressed_tar_algorithm != PG_COMPRESSION_NONE);
/*
* Injecting the manifest into a compressed tar file would be possible if
@@ -1128,7 +1114,7 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
(spclocation == NULL && writerecoveryconf));
/* At present, we only know how to parse tar archives. */
- if (must_parse_archive && !is_tar && !is_compressed_tar)
+ if (must_parse_archive && !is_tar)
{
pg_log_error("cannot parse archive \"%s\"", archive_name);
pg_log_error_detail("Only tar archives can be parsed.");
@@ -1263,13 +1249,13 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
* If the user has requested a server compressed archive along with
* archive extraction at client then we need to decompress it.
*/
- if (format == 'p')
+ if (format == 'p' && is_compressed_tar)
{
- if (is_tar_gz)
+ if (compressed_tar_algorithm == PG_COMPRESSION_GZIP)
streamer = astreamer_gzip_decompressor_new(streamer);
- else if (is_tar_lz4)
+ else if (compressed_tar_algorithm == PG_COMPRESSION_LZ4)
streamer = astreamer_lz4_decompressor_new(streamer);
- else if (is_tar_zstd)
+ else if (compressed_tar_algorithm == PG_COMPRESSION_ZSTD)
streamer = astreamer_zstd_decompressor_new(streamer);
}
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index cbc9447384f..31f606c45b1 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -941,17 +941,7 @@ precheck_tar_backup_file(verifier_context *context, char *relpath,
}
/* Now, check the compression type of the tar */
- if (strcmp(suffix, ".tar") == 0)
- compress_algorithm = PG_COMPRESSION_NONE;
- else if (strcmp(suffix, ".tgz") == 0)
- compress_algorithm = PG_COMPRESSION_GZIP;
- else if (strcmp(suffix, ".tar.gz") == 0)
- compress_algorithm = PG_COMPRESSION_GZIP;
- else if (strcmp(suffix, ".tar.lz4") == 0)
- compress_algorithm = PG_COMPRESSION_LZ4;
- else if (strcmp(suffix, ".tar.zst") == 0)
- compress_algorithm = PG_COMPRESSION_ZSTD;
- else
+ if (!parse_tar_compress_algorithm(suffix, &compress_algorithm))
{
report_backup_error(context,
"file \"%s\" is not expected in a tar format backup",
diff --git a/src/common/compression.c b/src/common/compression.c
index 92cd4ec7a0d..fefbed68bea 100644
--- a/src/common/compression.c
+++ b/src/common/compression.c
@@ -41,6 +41,36 @@ static int expect_integer_value(char *keyword, char *value,
static bool expect_boolean_value(char *keyword, char *value,
pg_compress_specification *result);
+/*
+ * Look up a compression algorithm by archive file extension. Returns true and
+ * sets *algorithm if the extension is recognized. Otherwise returns false.
+ */
+bool
+parse_tar_compress_algorithm(const char *fname, pg_compress_algorithm *algorithm)
+{
+ int fname_len = strlen(fname);
+
+ if (fname_len >= 4 &&
+ strcmp(fname + fname_len - 4, ".tar") == 0)
+ *algorithm = PG_COMPRESSION_NONE;
+ else if (fname_len >= 4 &&
+ strcmp(fname + fname_len - 4, ".tgz") == 0)
+ *algorithm = PG_COMPRESSION_GZIP;
+ else if (fname_len >= 7 &&
+ strcmp(fname + fname_len - 7, ".tar.gz") == 0)
+ *algorithm = PG_COMPRESSION_GZIP;
+ else if (fname_len >= 8 &&
+ strcmp(fname + fname_len - 8, ".tar.lz4") == 0)
+ *algorithm = PG_COMPRESSION_LZ4;
+ else if (fname_len >= 8 &&
+ strcmp(fname + fname_len - 8, ".tar.zst") == 0)
+ *algorithm = PG_COMPRESSION_ZSTD;
+ else
+ return false;
+
+ return true;
+}
+
/*
* Look up a compression algorithm by name. Returns true and sets *algorithm
* if the name is recognized. Otherwise returns false.
diff --git a/src/include/common/compression.h b/src/include/common/compression.h
index 6c745b90066..f99c747cdd3 100644
--- a/src/include/common/compression.h
+++ b/src/include/common/compression.h
@@ -41,6 +41,8 @@ typedef struct pg_compress_specification
extern void parse_compress_options(const char *option, char **algorithm,
char **detail);
+extern bool parse_tar_compress_algorithm(const char *fname,
+ pg_compress_algorithm *algorithm);
extern bool parse_compress_algorithm(char *name, pg_compress_algorithm *algorithm);
extern const char *get_compress_algorithm_name(pg_compress_algorithm algorithm);
--
2.47.1
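parse_tar_compress_algorithm() is a chain of suffix comparisons. The sketch below is a table-driven equivalent. A local enum stands in for pg_compress_algorithm so that the block compiles on its own, and all names are hypothetical. It also shows how pg_basebackup now derives is_compressed_tar from the result. Like the patch's `>=` length checks, it accepts a bare ".tar" name, whereas the old pg_basebackup code required the name to be strictly longer than the suffix.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Local stand-in for pg_compress_algorithm from common/compression.h. */
typedef enum
{
    COMPRESS_NONE,
    COMPRESS_GZIP,
    COMPRESS_LZ4,
    COMPRESS_ZSTD
} compress_algorithm;

/* Recognized tar archive suffixes and the compression each one implies. */
static const struct
{
    const char *suffix;
    compress_algorithm algorithm;
} tar_suffixes[] = {
    {".tar", COMPRESS_NONE},
    {".tgz", COMPRESS_GZIP},
    {".tar.gz", COMPRESS_GZIP},
    {".tar.lz4", COMPRESS_LZ4},
    {".tar.zst", COMPRESS_ZSTD},
};

/* Returns true and sets *algorithm if 'fname' ends in a known tar suffix. */
static bool
tar_compress_algorithm_from_name(const char *fname, compress_algorithm *algorithm)
{
    size_t      fname_len;

    assert(fname != NULL && algorithm != NULL);
    fname_len = strlen(fname);

    for (size_t i = 0; i < sizeof(tar_suffixes) / sizeof(tar_suffixes[0]); i++)
    {
        size_t      suffix_len = strlen(tar_suffixes[i].suffix);

        if (fname_len >= suffix_len &&
            strcmp(fname + fname_len - suffix_len, tar_suffixes[i].suffix) == 0)
        {
            *algorithm = tar_suffixes[i].algorithm;
            return true;
        }
    }
    return false;
}

/* Mirrors is_compressed_tar = (is_tar && algorithm != NONE) in pg_basebackup. */
static bool
is_compressed_tar_name(const char *fname)
{
    compress_algorithm algorithm;

    return tar_compress_algorithm_from_name(fname, &algorithm) &&
        algorithm != COMPRESS_NONE;
}
```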
[application/x-patch] v20-0002-pg_waldump-Preparatory-refactoring-for-tar-archi.patch (8.4K, 3-v20-0002-pg_waldump-Preparatory-refactoring-for-tar-archi.patch)
From 7b36f9bdebaf9be7e5adb9b8dac25394cb611d0b Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Thu, 19 Mar 2026 15:43:39 +0530
Subject: [PATCH v20 2/5] pg_waldump: Preparatory refactoring for tar archive
WAL decoding.
Several refactoring steps in preparation for adding tar archive WAL
decoding support to pg_waldump:
- Move XLogDumpPrivate and related declarations into a new pg_waldump.h
header, allowing a second source file to share them.
- Factor out required_read_len() so the read-size calculation can be
reused for both regular WAL files and tar-archived WAL.
- Move the WAL segment size variable into XLogDumpPrivate and rename it
to segsize, making it accessible to the archive streamer code.
Author: Amul Sul <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Jakub Wartak <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Euler Taveira <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
discussion: https://postgr.es/m/CAAJ_b94bqdWN3h2J-PzzzQ2Npbwct5ZQHggn_QoYGhC2rn-=WQ@mail.gmail.com
---
src/bin/pg_waldump/pg_waldump.c | 78 +++++++++++++++++++--------------
src/bin/pg_waldump/pg_waldump.h | 26 +++++++++++
2 files changed, 70 insertions(+), 34 deletions(-)
create mode 100644 src/bin/pg_waldump/pg_waldump.h
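The commit message above factors out required_read_len(). The function has three outcomes: a full page, a short page that ends exactly at the end point, or -1 once the requested bytes would cross it. The standalone sketch below checks those outcomes in isolation. The dump_state struct and required_read_len_sketch() are hypothetical stand-ins for XLogDumpPrivate and the real function, and XLOG_BLCKSZ is fixed at PostgreSQL's default of 8192.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XLOG_BLCKSZ 8192            /* PostgreSQL's default WAL page size */
#define INVALID_LSN ((uint64_t) 0)  /* stand-in for InvalidXLogRecPtr */

/* Hypothetical slice of XLogDumpPrivate holding only what the sketch needs. */
typedef struct
{
    uint64_t    endptr;             /* stop point, or INVALID_LSN if none */
    bool        endptr_reached;
} dump_state;

/*
 * Same decision logic as required_read_len(): read a full page unless that
 * would cross endptr; read a short page if the caller's minimum request
 * still fits before endptr; otherwise flag the end point and return -1.
 */
static int
required_read_len_sketch(dump_state *state, uint64_t page_ptr, int req_len)
{
    assert(req_len >= 0 && req_len <= XLOG_BLCKSZ);

    if (state->endptr == INVALID_LSN)
        return XLOG_BLCKSZ;         /* no end point: always a full page */

    if (page_ptr + XLOG_BLCKSZ <= state->endptr)
        return XLOG_BLCKSZ;
    if (page_ptr + (uint64_t) req_len <= state->endptr)
        return (int) (state->endptr - page_ptr);

    state->endptr_reached = true;
    return -1;
}
```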
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index f3446385d6a..5d31b15dbd8 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -29,6 +29,7 @@
#include "common/logging.h"
#include "common/relpath.h"
#include "getopt_long.h"
+#include "pg_waldump.h"
#include "rmgrdesc.h"
#include "storage/bufpage.h"
@@ -43,14 +44,6 @@ static volatile sig_atomic_t time_to_stop = false;
static const RelFileLocator emptyRelFileLocator = {0, 0, 0};
-typedef struct XLogDumpPrivate
-{
- TimeLineID timeline;
- XLogRecPtr startptr;
- XLogRecPtr endptr;
- bool endptr_reached;
-} XLogDumpPrivate;
-
typedef struct XLogDumpConfig
{
/* display options */
@@ -333,6 +326,32 @@ identify_target_directory(char *directory, char *fname, int *WalSegSz)
return NULL; /* not reached */
}
+/*
+ * Returns the size in bytes of the data to be read. Returns -1 if the end
+ * point has already been reached.
+ */
+static inline int
+required_read_len(XLogDumpPrivate *private, XLogRecPtr targetPagePtr,
+ int reqLen)
+{
+ int count = XLOG_BLCKSZ;
+
+ if (XLogRecPtrIsValid(private->endptr))
+ {
+ if (targetPagePtr + XLOG_BLCKSZ <= private->endptr)
+ count = XLOG_BLCKSZ;
+ else if (targetPagePtr + reqLen <= private->endptr)
+ count = private->endptr - targetPagePtr;
+ else
+ {
+ private->endptr_reached = true;
+ return -1;
+ }
+ }
+
+ return count;
+}
+
/* pg_waldump's XLogReaderRoutine->segment_open callback */
static void
WALDumpOpenSegment(XLogReaderState *state, XLogSegNo nextSegNo,
@@ -390,21 +409,12 @@ WALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
XLogRecPtr targetPtr, char *readBuff)
{
XLogDumpPrivate *private = state->private_data;
- int count = XLOG_BLCKSZ;
+ int count = required_read_len(private, targetPagePtr, reqLen);
WALReadError errinfo;
- if (XLogRecPtrIsValid(private->endptr))
- {
- if (targetPagePtr + XLOG_BLCKSZ <= private->endptr)
- count = XLOG_BLCKSZ;
- else if (targetPagePtr + reqLen <= private->endptr)
- count = private->endptr - targetPagePtr;
- else
- {
- private->endptr_reached = true;
- return -1;
- }
- }
+ /* Bail out if the count to be read is not valid */
+ if (count < 0)
+ return -1;
if (!WALRead(state, readBuff, targetPagePtr, count, private->timeline,
&errinfo))
@@ -801,7 +811,6 @@ main(int argc, char **argv)
XLogRecPtr first_record;
char *waldir = NULL;
char *errormsg;
- int WalSegSz;
static struct option long_options[] = {
{"bkp-details", no_argument, NULL, 'b'},
@@ -855,6 +864,7 @@ main(int argc, char **argv)
memset(&stats, 0, sizeof(XLogStats));
private.timeline = 1;
+ private.segsize = 0;
private.startptr = InvalidXLogRecPtr;
private.endptr = InvalidXLogRecPtr;
private.endptr_reached = false;
@@ -1128,18 +1138,18 @@ main(int argc, char **argv)
pg_fatal("could not open directory \"%s\": %m", waldir);
}
- waldir = identify_target_directory(waldir, fname, &WalSegSz);
+ waldir = identify_target_directory(waldir, fname, &private.segsize);
fd = open_file_in_directory(waldir, fname);
if (fd < 0)
pg_fatal("could not open file \"%s\"", fname);
close(fd);
/* parse position from file */
- XLogFromFileName(fname, &private.timeline, &segno, WalSegSz);
+ XLogFromFileName(fname, &private.timeline, &segno, private.segsize);
if (!XLogRecPtrIsValid(private.startptr))
- XLogSegNoOffsetToRecPtr(segno, 0, WalSegSz, private.startptr);
- else if (!XLByteInSeg(private.startptr, segno, WalSegSz))
+ XLogSegNoOffsetToRecPtr(segno, 0, private.segsize, private.startptr);
+ else if (!XLByteInSeg(private.startptr, segno, private.segsize))
{
pg_log_error("start WAL location %X/%08X is not inside file \"%s\"",
LSN_FORMAT_ARGS(private.startptr),
@@ -1149,7 +1159,7 @@ main(int argc, char **argv)
/* no second file specified, set end position */
if (!(optind + 1 < argc) && !XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(segno + 1, 0, WalSegSz, private.endptr);
+ XLogSegNoOffsetToRecPtr(segno + 1, 0, private.segsize, private.endptr);
/* parse ENDSEG if passed */
if (optind + 1 < argc)
@@ -1165,14 +1175,14 @@ main(int argc, char **argv)
close(fd);
/* parse position from file */
- XLogFromFileName(fname, &private.timeline, &endsegno, WalSegSz);
+ XLogFromFileName(fname, &private.timeline, &endsegno, private.segsize);
if (endsegno < segno)
pg_fatal("ENDSEG %s is before STARTSEG %s",
argv[optind + 1], argv[optind]);
if (!XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(endsegno + 1, 0, WalSegSz,
+ XLogSegNoOffsetToRecPtr(endsegno + 1, 0, private.segsize,
private.endptr);
/* set segno to endsegno for check of --end */
@@ -1180,8 +1190,8 @@ main(int argc, char **argv)
}
- if (!XLByteInSeg(private.endptr, segno, WalSegSz) &&
- private.endptr != (segno + 1) * WalSegSz)
+ if (!XLByteInSeg(private.endptr, segno, private.segsize) &&
+ private.endptr != (segno + 1) * private.segsize)
{
pg_log_error("end WAL location %X/%08X is not inside file \"%s\"",
LSN_FORMAT_ARGS(private.endptr),
@@ -1190,7 +1200,7 @@ main(int argc, char **argv)
}
}
else
- waldir = identify_target_directory(waldir, NULL, &WalSegSz);
+ waldir = identify_target_directory(waldir, NULL, &private.segsize);
/* we don't know what to print */
if (!XLogRecPtrIsValid(private.startptr))
@@ -1203,7 +1213,7 @@ main(int argc, char **argv)
/* we have everything we need, start reading */
xlogreader_state =
- XLogReaderAllocate(WalSegSz, waldir,
+ XLogReaderAllocate(private.segsize, waldir,
XL_ROUTINE(.page_read = WALDumpReadPage,
.segment_open = WALDumpOpenSegment,
.segment_close = WALDumpCloseSegment),
@@ -1224,7 +1234,7 @@ main(int argc, char **argv)
* a segment (e.g. we were used in file mode).
*/
if (first_record != private.startptr &&
- XLogSegmentOffset(private.startptr, WalSegSz) != 0)
+ XLogSegmentOffset(private.startptr, private.segsize) != 0)
pg_log_info(ngettext("first record is after %X/%08X, at %X/%08X, skipping over %u byte",
"first record is after %X/%08X, at %X/%08X, skipping over %u bytes",
(first_record - private.startptr)),
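required_read_len() moves the existing endptr clamp out of WALDumpReadPage() unchanged. The function has three outcomes: a full page, a short read up to endptr, or -1 with endptr_reached set. The sketch below reproduces the clamp in isolation. It is not the patch's code. It assumes the default 8 kB XLOG_BLCKSZ, models XLogRecPtr as uint64_t with 0 meaning "invalid", and uses a hypothetical SketchPrivate struct in place of XLogDumpPrivate.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the default 8 kB WAL page size. */
#define SKETCH_XLOG_BLCKSZ 8192

/* Hypothetical stand-in for the relevant XLogDumpPrivate fields. */
typedef struct SketchPrivate
{
	uint64_t	endptr;			/* 0 means "no end position given" */
	bool		endptr_reached;
} SketchPrivate;

/*
 * Same decision as required_read_len(): read a full page if it ends at or
 * before endptr, a short read if at least reqLen bytes lie before endptr,
 * and otherwise flag that the end point has been reached.
 */
static int
sketch_required_read_len(SketchPrivate *priv, uint64_t targetPagePtr,
						 int reqLen)
{
	if (priv->endptr != 0)
	{
		if (targetPagePtr + SKETCH_XLOG_BLCKSZ <= priv->endptr)
			return SKETCH_XLOG_BLCKSZ;
		if (targetPagePtr + reqLen <= priv->endptr)
			return (int) (priv->endptr - targetPagePtr);
		priv->endptr_reached = true;
		return -1;
	}
	return SKETCH_XLOG_BLCKSZ;
}
```

Suppose endptr lies 100 bytes past the page start. A request for 24 bytes then yields a 100-byte read. A request for 200 bytes returns -1 and sets endptr_reached.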
diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h
new file mode 100644
index 00000000000..013b051506f
--- /dev/null
+++ b/src/bin/pg_waldump/pg_waldump.h
@@ -0,0 +1,26 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_waldump.h - decode and display WAL
+ *
+ * Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_waldump/pg_waldump.h
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_WALDUMP_H
+#define PG_WALDUMP_H
+
+#include "access/xlogdefs.h"
+
+/* Contains the necessary information to drive WAL decoding */
+typedef struct XLogDumpPrivate
+{
+ TimeLineID timeline;
+ int segsize;
+ XLogRecPtr startptr;
+ XLogRecPtr endptr;
+ bool endptr_reached;
+} XLogDumpPrivate;
+
+#endif /* PG_WALDUMP_H */
--
2.47.1
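Most of the main() changes in this patch only reroute the segment size from the local WalSegSz into private.segsize. The segment arithmetic itself is unchanged. As a reading aid, the sketch below re-implements what those macros compute using plain integers. The real definitions live in access/xlog_internal.h. The sketch_* names and the 16 MB segment size are assumptions for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: 16 MB segments, the default WAL segment size. */
#define SKETCH_SEGSIZE (16u * 1024 * 1024)

static uint64_t
sketch_byte_to_seg(uint64_t ptr, uint32_t segsize)
{
	return ptr / segsize;		/* like XLByteToSeg() */
}

static bool
sketch_byte_in_seg(uint64_t ptr, uint64_t segno, uint32_t segsize)
{
	return ptr / segsize == segno;	/* like XLByteInSeg() */
}

static uint64_t
sketch_seg_offset_to_ptr(uint64_t segno, uint32_t offset, uint32_t segsize)
{
	return segno * segsize + offset;	/* like XLogSegNoOffsetToRecPtr() */
}

/* Like XLogFileName(): timeline, then the segment number split in two. */
static void
sketch_file_name(char *fname, size_t len, uint32_t tli, uint64_t segno,
				 uint32_t segsize)
{
	uint64_t	segs_per_id = UINT64_C(0x100000000) / segsize;

	snprintf(fname, len, "%08X%08X%08X", tli,
			 (uint32_t) (segno / segs_per_id),
			 (uint32_t) (segno % segs_per_id));
}

/* The --end check in main(): inside the segment, or exactly at its end. */
static bool
sketch_end_ok(uint64_t endptr, uint64_t segno, uint32_t segsize)
{
	return sketch_byte_in_seg(endptr, segno, segsize) ||
		endptr == (segno + 1) * segsize;
}
```

With 16 MB segments there are 256 segments per 4 GB "xlogid". Segment 300 therefore maps to the file name 00000001000000010000002C.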
[application/x-patch] v20-0003-pg_waldump-Add-support-for-reading-WAL-from-tar-.patch (56.2K, 4-v20-0003-pg_waldump-Add-support-for-reading-WAL-from-tar-.patch)
From 4c191b1cbed7eaa19d5ecff3072ce807943ffdf1 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Thu, 19 Mar 2026 15:43:46 +0530
Subject: [PATCH v20 3/5] pg_waldump: Add support for reading WAL from tar
archives
pg_waldump can now accept the path to a tar archive (optionally
compressed with gzip, lz4, or zstd) containing WAL files and decode
them. This was added primarily for pg_verifybackup, which previously
had to skip WAL parsing for tar-format backups.
The implementation uses the existing archive streamer infrastructure
with a hash table to track WAL segments read from the archive. If WAL
files within the archive are not in sequential order, out-of-order
segments are written to a temporary directory (created via mkdtemp under
$TMPDIR or the archive's directory) and read back when needed. An
atexit callback ensures the temporary directory is cleaned up.
The --follow option is not supported when reading from a tar archive.
Author: Amul Sul <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Jakub Wartak <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Euler Taveira <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Discussion: https://postgr.es/m/CAAJ_b94bqdWN3h2J-PzzzQ2Npbwct5ZQHggn_QoYGhC2rn-=WQ@mail.gmail.com
---
doc/src/sgml/ref/pg_waldump.sgml | 23 +-
src/bin/pg_waldump/Makefile | 7 +-
src/bin/pg_waldump/archive_waldump.c | 847 +++++++++++++++++++++++++++
src/bin/pg_waldump/meson.build | 4 +-
src/bin/pg_waldump/pg_waldump.c | 293 +++++++--
src/bin/pg_waldump/pg_waldump.h | 50 ++
src/bin/pg_waldump/t/001_basic.pl | 242 ++++++--
src/tools/pgindent/typedefs.list | 4 +
8 files changed, 1342 insertions(+), 128 deletions(-)
create mode 100644 src/bin/pg_waldump/archive_waldump.c
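The commit message says that out-of-order segments are spilled to a temporary directory. The sketch below is a simplified model of which archive members have to be set aside. It assumes that pg_waldump requests segments in ascending order and that every member is within the requested range. Under those assumptions, a member must be kept for later whenever a lower-numbered segment appears after it in the archive, because the streamer has to read past it to reach that segment.

The model is illustrative only and uses a hypothetical function name. It ignores chunk boundaries, so it does not decide whether a set-aside member stays buffered in memory or is written to disk. It also ignores timeline filtering.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: given the segment numbers of WAL members in archive
 * order, mark each member that must be set aside (buffered or spilled)
 * because a lower-numbered segment appears after it in the archive.
 * Walks from the end, tracking the smallest segment number seen so far.
 * Returns how many members were marked.
 */
static size_t
sketch_mark_out_of_order(const uint64_t *segnos, size_t n, bool *set_aside)
{
	size_t		count = 0;
	uint64_t	min_after = UINT64_MAX;

	for (size_t i = n; i-- > 0;)
	{
		set_aside[i] = segnos[i] > min_after;
		if (set_aside[i])
			count++;
		if (segnos[i] < min_after)
			min_after = segnos[i];
	}
	return count;
}
```

Suppose members are archived in the order 1, 3, 2, 4. Only segment 3 has to be set aside, because the streamer must get past it to reach segment 2.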
diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index d1715ff5124..9bbb4bd5772 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -141,13 +141,21 @@ PostgreSQL documentation
<term><option>--path=<replaceable>path</replaceable></option></term>
<listitem>
<para>
- Specifies a directory to search for WAL segment files or a
- directory with a <literal>pg_wal</literal> subdirectory that
+ Specifies a tar archive, a directory to search for WAL segment files,
+ or a directory with a <literal>pg_wal</literal> subdirectory that
contains such files. The default is to search in the current
directory, the <literal>pg_wal</literal> subdirectory of the
current directory, and the <literal>pg_wal</literal> subdirectory
of <envar>PGDATA</envar>.
</para>
+ <para>
+ If a tar archive is provided and some of its WAL segment files are
+ not in sequential order, those files are written to a temporary
+ directory whose name begins with <filename>waldump_tmp</filename>. This
+ directory is created inside the directory specified by the
+ <envar>TMPDIR</envar> environment variable if it is set; otherwise, it
+ is created within the same directory as the tar archive.
+ </para>
</listitem>
</varlistentry>
@@ -383,6 +391,17 @@ PostgreSQL documentation
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><envar>TMPDIR</envar></term>
+ <listitem>
+ <para>
+ Directory in which to create temporary files when reading WAL from a
+ tar archive with out-of-order segment files. If not set, the temporary
+ directory is created within the same directory as the tar archive.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</refsect1>
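The documentation above places the spill directory under TMPDIR when it is set, and otherwise next to the tar archive. The sketch below shows that selection using plain POSIX calls, loosely modeled on setup_tmpwal_dir() and cleanup_tmpwal_dir_atexit() in the patch. The names are hypothetical. Cleanup here uses rmdir(), which only removes an empty directory, whereas the patch calls rmtree() so that any spilled files are deleted too.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical holder for the created directory, like TmpWalSegDir. */
static char sketch_tmpdir[4096];

/* Remove the (assumed empty) directory at exit; the patch uses rmtree(). */
static void
sketch_cleanup(void)
{
	if (sketch_tmpdir[0] != '\0')
		rmdir(sketch_tmpdir);
}

/*
 * Use $TMPDIR if set, otherwise the archive's directory, and create a
 * uniquely named waldump_tmp-XXXXXX directory there.  Returns NULL on error.
 */
static const char *
sketch_setup_tmpdir(const char *archive_dir)
{
	const char *base = getenv("TMPDIR");
	int			n;

	if (base == NULL)
		base = archive_dir;
	n = snprintf(sketch_tmpdir, sizeof(sketch_tmpdir),
				 "%s/waldump_tmp-XXXXXX", base);
	if (n < 0 || (size_t) n >= sizeof(sketch_tmpdir))
		return NULL;
	if (mkdtemp(sketch_tmpdir) == NULL)
	{
		sketch_tmpdir[0] = '\0';
		return NULL;
	}
	atexit(sketch_cleanup);
	return sketch_tmpdir;
}
```

mkdtemp() replaces the trailing XXXXXX with a unique suffix and creates the directory with mode 0700. Concurrent pg_waldump runs sharing the same TMPDIR therefore do not collide.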
diff --git a/src/bin/pg_waldump/Makefile b/src/bin/pg_waldump/Makefile
index 4c1ee649501..aabb87566a2 100644
--- a/src/bin/pg_waldump/Makefile
+++ b/src/bin/pg_waldump/Makefile
@@ -3,6 +3,9 @@
PGFILEDESC = "pg_waldump - decode and display WAL"
PGAPPICON=win32
+# make these available to TAP test scripts
+export TAR
+
subdir = src/bin/pg_waldump
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
@@ -10,13 +13,15 @@ include $(top_builddir)/src/Makefile.global
OBJS = \
$(RMGRDESCOBJS) \
$(WIN32RES) \
+ archive_waldump.o \
compat.o \
pg_waldump.o \
rmgrdesc.o \
xlogreader.o \
xlogstats.o
-override CPPFLAGS := -DFRONTEND $(CPPFLAGS)
+override CPPFLAGS := -DFRONTEND -I$(libpq_srcdir) $(CPPFLAGS)
+LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils
RMGRDESCSOURCES = $(sort $(notdir $(wildcard $(top_srcdir)/src/backend/access/rmgrdesc/*desc*.c)))
RMGRDESCOBJS = $(patsubst %.c,%.o,$(RMGRDESCSOURCES))
diff --git a/src/bin/pg_waldump/archive_waldump.c b/src/bin/pg_waldump/archive_waldump.c
new file mode 100644
index 00000000000..9cbcae3e8af
--- /dev/null
+++ b/src/bin/pg_waldump/archive_waldump.c
@@ -0,0 +1,847 @@
+/*-------------------------------------------------------------------------
+ *
+ * archive_waldump.c
+ * A generic facility for reading WAL data from tar archives via the
+ * archive streamer infrastructure.
+ *
+ * Portions Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_waldump/archive_waldump.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <unistd.h>
+
+#include "access/xlog_internal.h"
+#include "common/file_perm.h"
+#include "common/hashfn.h"
+#include "common/logging.h"
+#include "fe_utils/simple_list.h"
+#include "pg_waldump.h"
+
+/*
+ * How many bytes should we try to read from a file at once?
+ */
+#define READ_CHUNK_SIZE (128 * 1024)
+
+/* Directory holding WAL segments spilled from the archive */
+char *TmpWalSegDir = NULL;
+
+/*
+ * Check if the start segment number is zero; this indicates a request to read
+ * any WAL file.
+ */
+#define READ_ANY_WAL(privateInfo) ((privateInfo)->start_segno == 0)
+
+/*
+ * Hash entry representing a WAL segment retrieved from the archive.
+ *
+ * While WAL segments are typically read sequentially, individual entries
+ * maintain their own buffers for the following reasons:
+ *
+ * 1. Boundary Handling: The archive streamer provides a continuous byte
+ * stream. A single streaming chunk may contain the end of one WAL segment
+ * and the start of the next. Separate buffers allow us to easily
+ * partition and track these bytes by their respective segments.
+ *
+ * 2. Out-of-Order Support: Dedicated buffers simplify logic when segments
+ * are archived or retrieved out of sequence.
+ *
+ * To minimize the memory footprint, entries and their associated buffers are
+ * freed immediately once consumed. Since pg_waldump does not request the same
+ * bytes twice, a segment is discarded as soon as pg_waldump moves past it.
+ */
+typedef struct ArchivedWALFile
+{
+ uint32 status; /* hash status */
+ const char *fname; /* hash key: WAL segment name */
+
+ StringInfo buf; /* holds WAL bytes read from archive */
+ bool spilled; /* true if the WAL data was spilled to a
+ * temporary file */
+
+ int read_len; /* total bytes of this WAL segment read from the archive */
+} ArchivedWALFile;
+
+static uint32 hash_string_pointer(const char *s);
+#define SH_PREFIX ArchivedWAL
+#define SH_ELEMENT_TYPE ArchivedWALFile
+#define SH_KEY_TYPE const char *
+#define SH_KEY fname
+#define SH_HASH_KEY(tb, key) hash_string_pointer(key)
+#define SH_EQUAL(tb, a, b) (strcmp(a, b) == 0)
+#define SH_SCOPE static inline
+#define SH_RAW_ALLOCATOR pg_malloc0
+#define SH_DECLARE
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+typedef struct astreamer_waldump
+{
+ astreamer base;
+ XLogDumpPrivate *privateInfo;
+} astreamer_waldump;
+
+static ArchivedWALFile *get_archive_wal_entry(const char *fname,
+ XLogDumpPrivate *privateInfo,
+ int WalSegSz);
+static int read_archive_file(XLogDumpPrivate *privateInfo, Size count);
+static void setup_tmpwal_dir(const char *waldir);
+static void cleanup_tmpwal_dir_atexit(void);
+
+static FILE *prepare_tmp_write(const char *fname);
+static void perform_tmp_write(const char *fname, StringInfo buf, FILE *file);
+
+static astreamer *astreamer_waldump_new(XLogDumpPrivate *privateInfo);
+static void astreamer_waldump_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_waldump_finalize(astreamer *streamer);
+static void astreamer_waldump_free(astreamer *streamer);
+
+static bool member_is_wal_file(astreamer_waldump *mystreamer,
+ astreamer_member *member,
+ char **fname);
+
+static const astreamer_ops astreamer_waldump_ops = {
+ .content = astreamer_waldump_content,
+ .finalize = astreamer_waldump_finalize,
+ .free = astreamer_waldump_free
+};
+
+/*
+ * Initializes the tar archive reader: opens the archive, builds a hash table
+ * for WAL entries, reads ahead until a full WAL page header is available to
+ * determine the WAL segment size, and computes start/end segment numbers for
+ * filtering. It also sets up a temporary directory for out-of-order WAL data
+ * and registers an atexit callback to clean it up.
+ */
+void
+init_archive_reader(XLogDumpPrivate *privateInfo, const char *waldir,
+ int *WalSegSz, pg_compress_algorithm compression)
+{
+ int fd;
+ astreamer *streamer;
+ ArchivedWALFile *entry = NULL;
+ XLogLongPageHeader longhdr;
+ XLogSegNo segno;
+ TimeLineID timeline;
+
+ /* Open tar archive and store its file descriptor */
+ fd = open_file_in_directory(waldir, privateInfo->archive_name);
+
+ if (fd < 0)
+ pg_fatal("could not open file \"%s\"", privateInfo->archive_name);
+
+ privateInfo->archive_fd = fd;
+
+ streamer = astreamer_waldump_new(privateInfo);
+
+ /* We must first parse the tar archive. */
+ streamer = astreamer_tar_parser_new(streamer);
+
+ /* If the archive is compressed, decompress before parsing. */
+ if (compression == PG_COMPRESSION_GZIP)
+ streamer = astreamer_gzip_decompressor_new(streamer);
+ else if (compression == PG_COMPRESSION_LZ4)
+ streamer = astreamer_lz4_decompressor_new(streamer);
+ else if (compression == PG_COMPRESSION_ZSTD)
+ streamer = astreamer_zstd_decompressor_new(streamer);
+
+ privateInfo->archive_streamer = streamer;
+
+ /*
+ * Allocate a buffer for reading the archive file to facilitate content
+ * decoding; read requests must not exceed the allocated buffer size.
+ */
+ privateInfo->archive_read_buf = pg_malloc(READ_CHUNK_SIZE);
+
+#ifdef USE_ASSERT_CHECKING
+ privateInfo->archive_read_buf_size = READ_CHUNK_SIZE;
+#endif
+
+ /*
+ * Hash table storing WAL entries read from the archive with an arbitrary
+ * initial size.
+ */
+ privateInfo->archive_wal_htab = ArchivedWAL_create(8, NULL);
+
+ /*
+ * Read until we have at least one full WAL page (XLOG_BLCKSZ bytes) from
+ * the first WAL segment in the archive so we can extract the WAL segment
+ * size from the long page header.
+ */
+ while (entry == NULL || entry->buf->len < XLOG_BLCKSZ)
+ {
+ if (read_archive_file(privateInfo, XLOG_BLCKSZ) == 0)
+ pg_fatal("could not find WAL in archive \"%s\"",
+ privateInfo->archive_name);
+
+ entry = privateInfo->cur_file;
+ }
+
+ /* Set WalSegSz if WAL data is successfully read */
+ longhdr = (XLogLongPageHeader) entry->buf->data;
+
+ if (!IsValidWalSegSize(longhdr->xlp_seg_size))
+ {
+ pg_log_error(ngettext("invalid WAL segment size in WAL file from archive \"%s\" (%d byte)",
+ "invalid WAL segment size in WAL file from archive \"%s\" (%d bytes)",
+ longhdr->xlp_seg_size),
+ privateInfo->archive_name, longhdr->xlp_seg_size);
+ pg_log_error_detail("The WAL segment size must be a power of two between 1 MB and 1 GB.");
+ exit(1);
+ }
+
+ *WalSegSz = longhdr->xlp_seg_size;
+
+ /*
+ * With the WAL segment size available, we can now initialize the
+ * dependent start and end segment numbers.
+ */
+ Assert(XLogRecPtrIsValid(privateInfo->startptr));
+ XLByteToSeg(privateInfo->startptr, privateInfo->start_segno, *WalSegSz);
+
+ if (XLogRecPtrIsValid(privateInfo->endptr))
+ XLByteToSeg(privateInfo->endptr, privateInfo->end_segno, *WalSegSz);
+
+ /*
+ * This WAL segment was fetched before the filtering parameters
+ * (start_segno and end_segno) were initialized. Check it against the
+ * user-provided timeline and range now; if it falls outside them,
+ * remove it from the hash table. Subsequent WAL segments are filtered
+ * by the archive streamer itself, using the updated start_segno and
+ * end_segno values.
+ */
+ XLogFromFileName(entry->fname, &timeline, &segno, privateInfo->segsize);
+ if (privateInfo->timeline != timeline ||
+ privateInfo->start_segno > segno ||
+ privateInfo->end_segno < segno)
+ free_archive_wal_entry(entry->fname, privateInfo);
+
+ /*
+ * Set up a temporary directory for spilled WAL segments and register an
+ * exit callback to remove it upon completion.
+ */
+ setup_tmpwal_dir(waldir);
+ atexit(cleanup_tmpwal_dir_atexit);
+}
+
+/*
+ * Release the archive streamer chain and close the archive file.
+ */
+void
+free_archive_reader(XLogDumpPrivate *privateInfo)
+{
+ /*
+ * NB: Normally, astreamer_finalize() is called before astreamer_free() to
+ * flush any remaining buffered data or to ensure the end of the tar
+ * archive is reached. However, when decoding WAL, once we hit the end
+ * LSN, any remaining buffered data or unread portion of the archive can
+ * be safely ignored.
+ */
+ astreamer_free(privateInfo->archive_streamer);
+
+ /* Free any remaining hash table entries and their buffers. */
+ if (privateInfo->archive_wal_htab != NULL)
+ {
+ ArchivedWAL_iterator iter;
+ ArchivedWALFile *entry;
+
+ ArchivedWAL_start_iterate(privateInfo->archive_wal_htab, &iter);
+ while ((entry = ArchivedWAL_iterate(privateInfo->archive_wal_htab,
+ &iter)) != NULL)
+ {
+ if (entry->buf != NULL)
+ destroyStringInfo(entry->buf);
+ }
+ ArchivedWAL_destroy(privateInfo->archive_wal_htab);
+ privateInfo->archive_wal_htab = NULL;
+ }
+
+ /* Free the reusable read buffer. */
+ if (privateInfo->archive_read_buf != NULL)
+ {
+ pg_free(privateInfo->archive_read_buf);
+ privateInfo->archive_read_buf = NULL;
+ }
+
+ /* Close the file. */
+ if (close(privateInfo->archive_fd) != 0)
+ pg_log_error("could not close file \"%s\": %m",
+ privateInfo->archive_name);
+}
+
+/*
+ * Copies the requested WAL data from the hash entry's buffer into readBuff.
+ * If the buffer does not yet contain the needed bytes, fetches more data from
+ * the tar archive via the archive streamer.
+ */
+int
+read_archive_wal_page(XLogDumpPrivate *privateInfo, XLogRecPtr targetPagePtr,
+ Size count, char *readBuff, int WalSegSz)
+{
+ char *p = readBuff;
+ Size nbytes = count;
+ XLogRecPtr recptr = targetPagePtr;
+ XLogSegNo segno;
+ char fname[MAXFNAMELEN];
+ ArchivedWALFile *entry;
+
+ /* Identify the segment and locate its entry in the archive hash */
+ XLByteToSeg(targetPagePtr, segno, WalSegSz);
+ XLogFileName(fname, privateInfo->timeline, segno, WalSegSz);
+ entry = get_archive_wal_entry(fname, privateInfo, WalSegSz);
+
+ while (nbytes > 0)
+ {
+ char *buf = entry->buf->data;
+ int bufLen = entry->buf->len;
+ XLogRecPtr endPtr;
+ XLogRecPtr startPtr;
+
+ /*
+ * Calculate the LSN range currently residing in the buffer.
+ *
+ * read_len tracks total bytes received for this segment (including
+ * already-discarded data), so endPtr is the LSN just past the last
+ * buffered byte, and startPtr is the LSN of the first buffered byte.
+ */
+ XLogSegNoOffsetToRecPtr(segno, entry->read_len, WalSegSz, endPtr);
+ startPtr = endPtr - bufLen;
+
+ /*
+ * Copy the requested WAL bytes if they are present in the buffer.
+ */
+ if (bufLen > 0 && startPtr <= recptr && recptr < endPtr)
+ {
+ int copyBytes;
+ int offset = recptr - startPtr;
+
+ /*
+ * Given startPtr <= recptr < endPtr and a total buffer size
+ * 'bufLen', the offset (recptr - startPtr) will always be less
+ * than 'bufLen'.
+ */
+ Assert(offset < bufLen);
+
+ copyBytes = Min(nbytes, bufLen - offset);
+ memcpy(p, buf + offset, copyBytes);
+
+ /* Update state for read */
+ recptr += copyBytes;
+ nbytes -= copyBytes;
+ p += copyBytes;
+ }
+ else
+ {
+ /*
+ * Before starting the actual decoding loop, pg_waldump tries to
+ * locate the first valid record from the user-specified start
+ * position, which might not be the start of a WAL record and
+ * could fall in the middle of a record that spans multiple pages.
+ * Consequently, the valid start position the decoder is looking
+ * for could be far away from that initial position.
+ *
+ * This may involve reading across multiple pages, and this
+ * pre-reading fetches data in multiple rounds from the archive
+ * streamer; normally, we would throw away existing buffer
+ * contents to fetch the next set of data, but that existing data
+ * might be needed once the main loop starts. Because previously
+ * read data cannot be re-read by the archive streamer, we delay
+ * resetting the buffer until the main decoding loop is entered.
+ *
+ * Once pg_waldump has entered the main loop, it may re-read the
+ * currently active page, but never an older one; therefore, any
+ * fully consumed WAL data preceding the current page can then be
+ * safely discarded.
+ */
+ if (privateInfo->decoding_started)
+ {
+ resetStringInfo(entry->buf);
+
+ /*
+ * Push back the partial page data for the current page to the
+ * buffer, ensuring a full page remains available for
+ * re-reading if requested.
+ */
+ if (p > readBuff)
+ {
+ Assert((count - nbytes) > 0);
+ appendBinaryStringInfo(entry->buf, readBuff, count - nbytes);
+ }
+ }
+
+ /*
+ * Now, fetch more data. Raise an error if the archive streamer
+ * has moved past our segment (meaning the WAL file in the archive
+ * is shorter than expected) or if reading the archive reached
+ * EOF.
+ */
+ if (privateInfo->cur_file != entry)
+ pg_fatal("WAL segment \"%s\" in archive \"%s\" is too short: read %lld of %lld bytes",
+ fname, privateInfo->archive_name,
+ (long long int) (count - nbytes),
+ (long long int) count);
+ if (read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
+ pg_fatal("unexpected end of archive \"%s\" while reading \"%s\": read %lld of %lld bytes",
+ privateInfo->archive_name, fname,
+ (long long int) (count - nbytes),
+ (long long int) count);
+ }
+ }
+
+ /*
+ * Should have successfully read all the requested bytes or reported a
+ * failure before this point.
+ */
+ Assert(nbytes == 0);
+
+ /*
+ * NB: We return count unchanged. We could return a boolean since we
+ * either successfully read the WAL page or raise an error, but the caller
+ * expects this value to be returned. The routine that reads WAL pages
+ * from physical WAL files follows the same convention.
+ */
+ return count;
+}
+
+/*
+ * Releases the buffer of a WAL entry that is no longer needed, preventing the
+ * accumulation of irrelevant WAL data. Also removes any associated temporary
+ * file and clears privateInfo->cur_file if it points to this entry, so the
+ * archive streamer skips subsequent data for it.
+ */
+void
+free_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo)
+{
+ ArchivedWALFile *entry;
+
+ entry = ArchivedWAL_lookup(privateInfo->archive_wal_htab, fname);
+
+ if (entry == NULL)
+ return;
+
+ /* Destroy the buffer */
+ destroyStringInfo(entry->buf);
+ entry->buf = NULL;
+
+ /* Remove temporary file if any */
+ if (entry->spilled)
+ {
+ char fpath[MAXPGPATH];
+
+ snprintf(fpath, MAXPGPATH, "%s/%s", TmpWalSegDir, fname);
+
+ if (unlink(fpath) == 0)
+ pg_log_debug("removed file \"%s\"", fpath);
+ }
+
+ /* Clear cur_file if it points to the entry being freed */
+ if (privateInfo->cur_file == entry)
+ privateInfo->cur_file = NULL;
+
+ ArchivedWAL_delete_item(privateInfo->archive_wal_htab, entry);
+}
+
+/*
+ * Returns the archived WAL entry from the hash table if it already exists.
+ * Otherwise, reads more data from the archive until the requested entry is
+ * found. If the archive streamer is reading a WAL file from the archive that
+ * is not currently needed, that data is spilled to a temporary file for later
+ * retrieval.
+ */
+static ArchivedWALFile *
+get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo,
+ int WalSegSz)
+{
+ ArchivedWALFile *entry = NULL;
+ FILE *write_fp = NULL;
+
+ /*
+ * Search the hash table first. If the entry is found, return it.
+ * Otherwise, the requested WAL entry hasn't been read from the archive
+ * yet; invoke the archive streamer to fetch it.
+ */
+ while (1)
+ {
+ /*
+ * Search hash table.
+ *
+ * We perform the search inside the loop because a single iteration of
+ * the archive reader may decompress and extract multiple files into
+ * the hash table. One of these newly added files could be the one we
+ * are seeking.
+ */
+ entry = ArchivedWAL_lookup(privateInfo->archive_wal_htab, fname);
+
+ if (entry != NULL)
+ return entry;
+
+ /*
+ * The WAL file entry currently being processed may change during
+ * archive streamer execution. Therefore, maintain a local variable to
+ * reference the previous entry, ensuring that any remaining data in
+ * its buffer is successfully flushed to the temporary file before
+ * switching to the next WAL entry.
+ */
+ entry = privateInfo->cur_file;
+
+ /*
+ * Fetch more data either when no current file is being tracked or
+ * when its buffer has been fully flushed to the temporary file.
+ */
+ if (entry == NULL || entry->buf->len == 0)
+ {
+ if (read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
+ break; /* archive file ended */
+ }
+
+ /*
+ * Archive streamer is reading a non-WAL file or an irrelevant WAL
+ * file.
+ */
+ if (entry == NULL)
+ continue;
+
+ /*
+ * Archive streamer is currently reading a file that isn't the one
+ * asked for, but it's required in the future. It should be written to
+ * a temporary location for retrieval when needed.
+ */
+ Assert(strcmp(fname, entry->fname) != 0);
+
+ /* Create a temporary file if one does not already exist */
+ if (!entry->spilled)
+ {
+ write_fp = prepare_tmp_write(entry->fname);
+ entry->spilled = true;
+ }
+
+ /* Flush data from the buffer to the file */
+ perform_tmp_write(entry->fname, entry->buf, write_fp);
+ resetStringInfo(entry->buf);
+
+ /*
+ * A change in the current segment entry indicates that reading of this
+ * file has finished, so its temporary file can be closed.
+ */
+ if (entry != privateInfo->cur_file && write_fp != NULL)
+ {
+ fclose(write_fp);
+ write_fp = NULL;
+ }
+ }
+
+ /* Requested WAL segment not found */
+ pg_fatal("could not find WAL \"%s\" in archive \"%s\"",
+ fname, privateInfo->archive_name);
+}
+
+/*
+ * Reads up to count bytes from the archive and feeds them to the archive
+ * streamer; returns the number of bytes read (0 at end of file).
+ */
+static int
+read_archive_file(XLogDumpPrivate *privateInfo, Size count)
+{
+ int rc;
+
+ /* The read request must not exceed the allocated buffer size. */
+ Assert(privateInfo->archive_read_buf_size >= count);
+
+ rc = read(privateInfo->archive_fd, privateInfo->archive_read_buf, count);
+ if (rc < 0)
+ pg_fatal("could not read file \"%s\": %m",
+ privateInfo->archive_name);
+
+ /*
+ * Decompress (if required), and then parse the previously read contents
+ * of the tar file.
+ */
+ if (rc > 0)
+ astreamer_content(privateInfo->archive_streamer, NULL,
+ privateInfo->archive_read_buf, rc,
+ ASTREAMER_UNKNOWN);
+
+ return rc;
+}
+
+/*
+ * Create a temporary directory for storing spilled WAL segments.
+ */
+static void
+setup_tmpwal_dir(const char *waldir)
+{
+ char *template;
+
+ /*
+ * Use the directory specified by the TMPDIR environment variable. If it's
+ * not set, fall back to the provided WAL directory to store WAL files
+ * temporarily.
+ */
+ template = psprintf("%s/waldump_tmp-XXXXXX",
+ getenv("TMPDIR") ? getenv("TMPDIR") : waldir);
+ TmpWalSegDir = mkdtemp(template);
+
+ if (TmpWalSegDir == NULL)
+ pg_fatal("could not create directory \"%s\": %m", template);
+
+ canonicalize_path(TmpWalSegDir);
+
+ pg_log_debug("created directory \"%s\"", TmpWalSegDir);
+}
+
+/*
+ * Remove temporary directory at exit, if any.
+ */
+static void
+cleanup_tmpwal_dir_atexit(void)
+{
+ rmtree(TmpWalSegDir, true);
+}
+
+/*
+ * Create an empty placeholder file and return its handle.
+ */
+static FILE *
+prepare_tmp_write(const char *fname)
+{
+ char fpath[MAXPGPATH];
+ FILE *file;
+
+ snprintf(fpath, MAXPGPATH, "%s/%s", TmpWalSegDir, fname);
+
+ /* Create an empty placeholder */
+ file = fopen(fpath, PG_BINARY_W);
+ if (file == NULL)
+ pg_fatal("could not create file \"%s\": %m", fpath);
+
+#ifndef WIN32
+ if (chmod(fpath, pg_file_create_mode))
+ pg_fatal("could not set permissions on file \"%s\": %m",
+ fpath);
+#endif
+
+ pg_log_debug("spilling to temporary file \"%s\"", fpath);
+
+ return file;
+}
+
+/*
+ * Write buffer data to the given file handle.
+ */
+static void
+perform_tmp_write(const char *fname, StringInfo buf, FILE *file)
+{
+ Assert(file);
+
+ errno = 0;
+ if (buf->len > 0 && fwrite(buf->data, buf->len, 1, file) != 1)
+ {
+ /*
+ * If write didn't set errno, assume problem is no disk space
+ */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_fatal("could not write to file \"%s/%s\": %m", TmpWalSegDir, fname);
+ }
+}
+
+/*
+ * Create an astreamer that can read WAL from a tar file.
+ */
+static astreamer *
+astreamer_waldump_new(XLogDumpPrivate *privateInfo)
+{
+ astreamer_waldump *streamer;
+
+ streamer = palloc0_object(astreamer_waldump);
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_waldump_ops;
+
+ streamer->privateInfo = privateInfo;
+
+ return &streamer->base;
+}
+
+/*
+ * Main entry point of the archive streamer for reading WAL data from a tar
+ * file. If a member is identified as a valid WAL file, a hash entry is created
+ * for it, and its contents are copied into that entry's buffer, making them
+ * accessible to the decoding routine.
+ */
+static void
+astreamer_waldump_content(astreamer *streamer, astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
+{
+ astreamer_waldump *mystreamer = (astreamer_waldump *) streamer;
+ XLogDumpPrivate *privateInfo = mystreamer->privateInfo;
+
+ Assert(context != ASTREAMER_UNKNOWN);
+
+ switch (context)
+ {
+ case ASTREAMER_MEMBER_HEADER:
+ {
+ char *fname = NULL;
+ ArchivedWALFile *entry;
+ bool found;
+
+ pg_log_debug("reading \"%s\"", member->pathname);
+
+ if (!member_is_wal_file(mystreamer, member, &fname))
+ break;
+
+ /*
+ * Further checks are skipped if any WAL file can be read.
+ * This typically occurs during initial verification.
+ */
+ if (!READ_ANY_WAL(privateInfo))
+ {
+ XLogSegNo segno;
+ TimeLineID timeline;
+
+ /*
+ * Skip the segment if the timeline does not match or if
+ * it falls outside the caller-specified range.
+ */
+ XLogFromFileName(fname, &timeline, &segno, privateInfo->segsize);
+ if (privateInfo->timeline != timeline ||
+ privateInfo->start_segno > segno ||
+ privateInfo->end_segno < segno)
+ {
+ pfree(fname);
+ break;
+ }
+ }
+
+ entry = ArchivedWAL_insert(privateInfo->archive_wal_htab,
+ fname, &found);
+
+ /*
+ * Shouldn't happen, but if it does, simply ignore the
+ * duplicate WAL file.
+ */
+ if (found)
+ {
+ pg_log_warning("ignoring duplicate WAL \"%s\" found in archive \"%s\"",
+ member->pathname, privateInfo->archive_name);
+ pfree(fname);
+ break;
+ }
+
+ entry->buf = makeStringInfo();
+ entry->spilled = false;
+ entry->read_len = 0;
+ privateInfo->cur_file = entry;
+ }
+ break;
+
+ case ASTREAMER_MEMBER_CONTENTS:
+ if (privateInfo->cur_file)
+ {
+ appendBinaryStringInfo(privateInfo->cur_file->buf, data, len);
+ privateInfo->cur_file->read_len += len;
+ }
+ break;
+
+ case ASTREAMER_MEMBER_TRAILER:
+
+ /*
+ * End of this tar member; mark cur_file NULL so subsequent
+ * content callbacks (if any) know no WAL file is currently
+ * active.
+ */
+ privateInfo->cur_file = NULL;
+ break;
+
+ case ASTREAMER_ARCHIVE_TRAILER:
+ break;
+
+ default:
+ /* Shouldn't happen. */
+ pg_fatal("unexpected state while parsing tar file");
+ }
+}
+
+/*
+ * End-of-stream processing for an astreamer_waldump stream. This is a
+ * terminal streamer so it must have no successor.
+ */
+static void
+astreamer_waldump_finalize(astreamer *streamer)
+{
+ Assert(streamer->bbs_next == NULL);
+}
+
+/*
+ * Free memory associated with an astreamer_waldump stream.
+ */
+static void
+astreamer_waldump_free(astreamer *streamer)
+{
+ Assert(streamer->bbs_next == NULL);
+ pfree(streamer);
+}
+
+/*
+ * Returns true if the archive member name matches the WAL naming format. If
+ * successful, it also outputs the WAL segment name.
+ */
+static bool
+member_is_wal_file(astreamer_waldump *mystreamer, astreamer_member *member,
+ char **fname)
+{
+ int pathlen;
+ char pathname[MAXPGPATH];
+ char *filename;
+
+ /* We are only interested in normal files */
+ if (member->is_directory || member->is_link)
+ return false;
+
+ if (strlen(member->pathname) < XLOG_FNAME_LEN)
+ return false;
+
+ /*
+ * For a correct comparison, we must remove any '.' or '..' components
+ * from the member pathname. Similar to member_verify_header(), we prepend
+ * './' to the path so that canonicalize_path() can properly resolve and
+ * strip these references from the tar member name.
+ */
+ snprintf(pathname, MAXPGPATH, "./%s", member->pathname);
+ canonicalize_path(pathname);
+ pathlen = strlen(pathname);
+
+ /* WAL files from the top-level or pg_wal directory will be decoded */
+ if (pathlen > XLOG_FNAME_LEN &&
+ strncmp(pathname, XLOGDIR, strlen(XLOGDIR)) != 0)
+ return false;
+
+ /* WAL file may appear with a full path (e.g., pg_wal/<name>) */
+ filename = pathname + (pathlen - XLOG_FNAME_LEN);
+ if (!IsXLogFileName(filename))
+ return false;
+
+ *fname = pnstrdup(filename, XLOG_FNAME_LEN);
+
+ return true;
+}
+
+/*
+ * Helper function for WAL file hash table.
+ */
+static uint32
+hash_string_pointer(const char *s)
+{
+ unsigned char *ss = (unsigned char *) s;
+
+ return hash_bytes(ss, strlen(s));
+}
diff --git a/src/bin/pg_waldump/meson.build b/src/bin/pg_waldump/meson.build
index 633a9874bb5..5296f21b82c 100644
--- a/src/bin/pg_waldump/meson.build
+++ b/src/bin/pg_waldump/meson.build
@@ -1,6 +1,7 @@
# Copyright (c) 2022-2026, PostgreSQL Global Development Group
pg_waldump_sources = files(
+ 'archive_waldump.c',
'compat.c',
'pg_waldump.c',
'rmgrdesc.c',
@@ -18,7 +19,7 @@ endif
pg_waldump = executable('pg_waldump',
pg_waldump_sources,
- dependencies: [frontend_code, lz4, zstd],
+ dependencies: [frontend_code, libpq, lz4, zstd],
c_args: ['-DFRONTEND'], # needed for xlogreader et al
kwargs: default_bin_args,
)
@@ -29,6 +30,7 @@ tests += {
'sd': meson.current_source_dir(),
'bd': meson.current_build_dir(),
'tap': {
+ 'env': {'TAR': tar.found() ? tar.full_path() : ''},
'tests': [
't/001_basic.pl',
't/002_save_fullpage.pl',
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 5d31b15dbd8..f28153165e6 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -176,7 +176,7 @@ split_path(const char *path, char **dir, char **fname)
*
* return a read only fd
*/
-static int
+int
open_file_in_directory(const char *directory, const char *fname)
{
int fd = -1;
@@ -327,8 +327,8 @@ identify_target_directory(char *directory, char *fname, int *WalSegSz)
}
/*
- * Returns the size in bytes of the data to be read. Returns -1 if the end
- * point has already been reached.
+ * Returns the number of bytes to read for the given page. Returns -1 if
+ * the requested range has already been reached or exceeded.
*/
static inline int
required_read_len(XLogDumpPrivate *private, XLogRecPtr targetPagePtr,
@@ -440,6 +440,106 @@ WALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
return count;
}
+/*
+ * pg_waldump's XLogReaderRoutine->segment_open callback to support dumping WAL
+ * files from tar archives. Segment tracking is handled by
+ * TarWALDumpReadPage, so no action is needed here.
+ */
+static void
+TarWALDumpOpenSegment(XLogReaderState *state, XLogSegNo nextSegNo,
+ TimeLineID *tli_p)
+{
+ /* No action needed */
+}
+
+/*
+ * pg_waldump's XLogReaderRoutine->segment_close callback to support dumping
+ * WAL files from tar archives. Segment tracking is handled by
+ * TarWALDumpReadPage, so no action is needed here.
+ */
+static void
+TarWALDumpCloseSegment(XLogReaderState *state)
+{
+ /* No action needed */
+}
+
+/*
+ * pg_waldump's XLogReaderRoutine->page_read callback to support dumping WAL
+ * files from tar archives.
+ */
+static int
+TarWALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
+ XLogRecPtr targetPtr, char *readBuff)
+{
+ XLogDumpPrivate *private = state->private_data;
+ int count = required_read_len(private, targetPagePtr, reqLen);
+ int WalSegSz = state->segcxt.ws_segsize;
+ XLogSegNo curSegNo;
+
+ /* Bail out if the count to be read is not valid */
+ if (count < 0)
+ return -1;
+
+ /*
+ * If the target page is in a different segment, free the buffer and/or
+ * temporary file disk space occupied by the previous segment's data.
+ * Since pg_waldump never requests the same WAL bytes twice, moving to a
+ * new segment implies that the previous segment, and any data buffered
+ * for it, will not be needed again.
+ *
+ * Afterward, check whether the next required WAL segment has already been
+ * spilled to the temporary directory before invoking the archive streamer.
+ */
+ curSegNo = state->seg.ws_segno;
+ if (!XLByteInSeg(targetPagePtr, curSegNo, WalSegSz))
+ {
+ char fname[MAXFNAMELEN];
+ XLogSegNo nextSegNo;
+
+ /*
+ * Calculate the next WAL segment to be decoded from the given page
+ * pointer.
+ */
+ XLByteToSeg(targetPagePtr, nextSegNo, WalSegSz);
+ state->seg.ws_tli = private->timeline;
+ state->seg.ws_segno = nextSegNo;
+
+ /* Close the WAL segment file if it is currently open */
+ if (state->seg.ws_file >= 0)
+ {
+ close(state->seg.ws_file);
+ state->seg.ws_file = -1;
+ }
+
+ /*
+ * If in pre-reading mode (prior to actual decoding), do not delete
+ * any entries that might be requested again once the decoding loop
+ * starts. For more details, see the comments in
+ * read_archive_wal_page().
+ */
+ if (private->decoding_started && curSegNo < nextSegNo)
+ {
+ XLogFileName(fname, state->seg.ws_tli, curSegNo, WalSegSz);
+ free_archive_wal_entry(fname, private);
+ }
+
+ /*
+ * If the next segment exists in TmpWalSegDir, open it and read from it.
+ */
+ XLogFileName(fname, state->seg.ws_tli, nextSegNo, WalSegSz);
+ state->seg.ws_file = open_file_in_directory(TmpWalSegDir, fname);
+ }
+
+ /* Continue reading from the open WAL segment, if any */
+ if (state->seg.ws_file >= 0)
+ return WALDumpReadPage(state, targetPagePtr, count, targetPtr,
+ readBuff);
+
+ /* Otherwise, read the WAL page from the archive streamer */
+ return read_archive_wal_page(private, targetPagePtr, count, readBuff,
+ WalSegSz);
+}
+
/*
* Boolean to return whether the given WAL record matches a specific relation
* and optionally block.
@@ -777,8 +877,8 @@ usage(void)
printf(_(" -F, --fork=FORK only show records that modify blocks in fork FORK;\n"
" valid names are main, fsm, vm, init\n"));
printf(_(" -n, --limit=N number of records to display\n"));
- printf(_(" -p, --path=PATH directory in which to find WAL segment files or a\n"
- " directory with a ./pg_wal that contains such files\n"
+ printf(_(" -p, --path=PATH a tar archive or a directory in which to find WAL segment files or\n"
+ " a directory with a pg_wal subdirectory containing such files\n"
" (default: current directory, ./pg_wal, $PGDATA/pg_wal)\n"));
printf(_(" -q, --quiet do not print any output, except for errors\n"));
printf(_(" -r, --rmgr=RMGR only show records generated by resource manager RMGR;\n"
@@ -810,7 +910,9 @@ main(int argc, char **argv)
XLogRecord *record;
XLogRecPtr first_record;
char *waldir = NULL;
+ char *walpath = NULL;
char *errormsg;
+ pg_compress_algorithm compression = PG_COMPRESSION_NONE;
static struct option long_options[] = {
{"bkp-details", no_argument, NULL, 'b'},
@@ -868,6 +970,10 @@ main(int argc, char **argv)
private.startptr = InvalidXLogRecPtr;
private.endptr = InvalidXLogRecPtr;
private.endptr_reached = false;
+ private.decoding_started = false;
+ private.archive_name = NULL;
+ private.start_segno = 0;
+ private.end_segno = UINT64_MAX;
config.quiet = false;
config.bkp_details = false;
@@ -943,7 +1049,7 @@ main(int argc, char **argv)
}
break;
case 'p':
- waldir = pg_strdup(optarg);
+ walpath = pg_strdup(optarg);
break;
case 'q':
config.quiet = true;
@@ -1107,12 +1213,21 @@ main(int argc, char **argv)
goto bad_argument;
}
- if (waldir != NULL)
+ if (walpath != NULL)
{
+ /* check whether the path names a tar archive */
+ if (parse_tar_compress_algorithm(walpath, &compression))
+ {
+ char *fname = NULL;
+
+ split_path(walpath, &waldir, &fname);
+
+ private.archive_name = fname;
+ }
/* validate path points to directory */
- if (!verify_directory(waldir))
+ else if (!verify_directory(walpath))
{
- pg_log_error("could not open directory \"%s\": %m", waldir);
+ pg_log_error("could not open directory \"%s\": %m", walpath);
goto bad_argument;
}
}
@@ -1128,6 +1243,17 @@ main(int argc, char **argv)
int fd;
XLogSegNo segno;
+ /*
+ * If a tar archive is passed using the --path option, all other
+ * arguments become unnecessary.
+ */
+ if (private.archive_name)
+ {
+ pg_log_error("unnecessary command-line arguments specified with tar archive (first is \"%s\")",
+ argv[optind]);
+ goto bad_argument;
+ }
+
split_path(argv[optind], &directory, &fname);
if (waldir == NULL && directory != NULL)
@@ -1138,69 +1264,75 @@ main(int argc, char **argv)
pg_fatal("could not open directory \"%s\": %m", waldir);
}
- waldir = identify_target_directory(waldir, fname, &private.segsize);
- fd = open_file_in_directory(waldir, fname);
- if (fd < 0)
- pg_fatal("could not open file \"%s\"", fname);
- close(fd);
-
- /* parse position from file */
- XLogFromFileName(fname, &private.timeline, &segno, private.segsize);
-
- if (!XLogRecPtrIsValid(private.startptr))
- XLogSegNoOffsetToRecPtr(segno, 0, private.segsize, private.startptr);
- else if (!XLByteInSeg(private.startptr, segno, private.segsize))
+ if (fname != NULL && parse_tar_compress_algorithm(fname, &compression))
{
- pg_log_error("start WAL location %X/%08X is not inside file \"%s\"",
- LSN_FORMAT_ARGS(private.startptr),
- fname);
- goto bad_argument;
+ private.archive_name = fname;
}
-
- /* no second file specified, set end position */
- if (!(optind + 1 < argc) && !XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(segno + 1, 0, private.segsize, private.endptr);
-
- /* parse ENDSEG if passed */
- if (optind + 1 < argc)
+ else
{
- XLogSegNo endsegno;
-
- /* ignore directory, already have that */
- split_path(argv[optind + 1], &directory, &fname);
-
+ waldir = identify_target_directory(waldir, fname, &private.segsize);
fd = open_file_in_directory(waldir, fname);
if (fd < 0)
pg_fatal("could not open file \"%s\"", fname);
close(fd);
/* parse position from file */
- XLogFromFileName(fname, &private.timeline, &endsegno, private.segsize);
+ XLogFromFileName(fname, &private.timeline, &segno, private.segsize);
- if (endsegno < segno)
- pg_fatal("ENDSEG %s is before STARTSEG %s",
- argv[optind + 1], argv[optind]);
+ if (!XLogRecPtrIsValid(private.startptr))
+ XLogSegNoOffsetToRecPtr(segno, 0, private.segsize, private.startptr);
+ else if (!XLByteInSeg(private.startptr, segno, private.segsize))
+ {
+ pg_log_error("start WAL location %X/%08X is not inside file \"%s\"",
+ LSN_FORMAT_ARGS(private.startptr),
+ fname);
+ goto bad_argument;
+ }
- if (!XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(endsegno + 1, 0, private.segsize,
- private.endptr);
+ /* no second file specified, set end position */
+ if (!(optind + 1 < argc) && !XLogRecPtrIsValid(private.endptr))
+ XLogSegNoOffsetToRecPtr(segno + 1, 0, private.segsize, private.endptr);
- /* set segno to endsegno for check of --end */
- segno = endsegno;
- }
+ /* parse ENDSEG if passed */
+ if (optind + 1 < argc)
+ {
+ XLogSegNo endsegno;
+ /* ignore directory, already have that */
+ split_path(argv[optind + 1], &directory, &fname);
- if (!XLByteInSeg(private.endptr, segno, private.segsize) &&
- private.endptr != (segno + 1) * private.segsize)
- {
- pg_log_error("end WAL location %X/%08X is not inside file \"%s\"",
- LSN_FORMAT_ARGS(private.endptr),
- argv[argc - 1]);
- goto bad_argument;
+ fd = open_file_in_directory(waldir, fname);
+ if (fd < 0)
+ pg_fatal("could not open file \"%s\"", fname);
+ close(fd);
+
+ /* parse position from file */
+ XLogFromFileName(fname, &private.timeline, &endsegno, private.segsize);
+
+ if (endsegno < segno)
+ pg_fatal("ENDSEG %s is before STARTSEG %s",
+ argv[optind + 1], argv[optind]);
+
+ if (!XLogRecPtrIsValid(private.endptr))
+ XLogSegNoOffsetToRecPtr(endsegno + 1, 0, private.segsize,
+ private.endptr);
+
+ /* set segno to endsegno for check of --end */
+ segno = endsegno;
+ }
+
+ if (!XLByteInSeg(private.endptr, segno, private.segsize) &&
+ private.endptr != (segno + 1) * private.segsize)
+ {
+ pg_log_error("end WAL location %X/%08X is not inside file \"%s\"",
+ LSN_FORMAT_ARGS(private.endptr),
+ argv[argc - 1]);
+ goto bad_argument;
+ }
}
}
- else
- waldir = identify_target_directory(waldir, NULL, &private.segsize);
+ else if (!private.archive_name)
+ waldir = identify_target_directory(walpath, NULL, &private.segsize);
/* we don't know what to print */
if (!XLogRecPtrIsValid(private.startptr))
@@ -1209,15 +1341,46 @@ main(int argc, char **argv)
goto bad_argument;
}
+ /* --follow is not supported with tar archives */
+ if (config.follow && private.archive_name)
+ {
+ pg_log_error("--follow is not supported when reading from a tar archive");
+ goto bad_argument;
+ }
+
/* done with argument parsing, do the actual work */
/* we have everything we need, start reading */
- xlogreader_state =
- XLogReaderAllocate(private.segsize, waldir,
- XL_ROUTINE(.page_read = WALDumpReadPage,
- .segment_open = WALDumpOpenSegment,
- .segment_close = WALDumpCloseSegment),
- &private);
+ if (private.archive_name)
+ {
+ /*
+ * A NULL WAL directory indicates that the archive file is located in
+ * the current working directory.
+ */
+ if (waldir == NULL)
+ waldir = pg_strdup(".");
+
+ /* Set up for reading tar file */
+ init_archive_reader(&private, waldir, &private.segsize, compression);
+
+ /* Routine to decode WAL files in tar archive */
+ xlogreader_state =
+ XLogReaderAllocate(private.segsize, waldir,
+ XL_ROUTINE(.page_read = TarWALDumpReadPage,
+ .segment_open = TarWALDumpOpenSegment,
+ .segment_close = TarWALDumpCloseSegment),
+ &private);
+ }
+ else
+ {
+ xlogreader_state =
+ XLogReaderAllocate(private.segsize, waldir,
+ XL_ROUTINE(.page_read = WALDumpReadPage,
+ .segment_open = WALDumpOpenSegment,
+ .segment_close = WALDumpCloseSegment),
+ &private);
+ }
+
if (!xlogreader_state)
pg_fatal("out of memory while allocating a WAL reading processor");
@@ -1245,6 +1408,9 @@ main(int argc, char **argv)
if (config.stats == true && !config.quiet)
stats.startptr = first_record;
+ /* Flag indicating that the decoding loop has been entered */
+ private.decoding_started = true;
+
for (;;)
{
if (time_to_stop)
@@ -1326,6 +1492,9 @@ main(int argc, char **argv)
XLogReaderFree(xlogreader_state);
+ if (private.archive_name)
+ free_archive_reader(&private);
+
return EXIT_SUCCESS;
bad_argument:
diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h
index 013b051506f..fd25792b33a 100644
--- a/src/bin/pg_waldump/pg_waldump.h
+++ b/src/bin/pg_waldump/pg_waldump.h
@@ -12,6 +12,14 @@
#define PG_WALDUMP_H
#include "access/xlogdefs.h"
+#include "fe_utils/astreamer.h"
+
+/* Forward declaration */
+struct ArchivedWALFile;
+struct ArchivedWAL_hash;
+
+/* Temporary directory for spilling out-of-order WAL segments from archives */
+extern char *TmpWalSegDir;
/* Contains the necessary information to drive WAL decoding */
typedef struct XLogDumpPrivate
@@ -21,6 +29,48 @@ typedef struct XLogDumpPrivate
XLogRecPtr startptr;
XLogRecPtr endptr;
bool endptr_reached;
+ bool decoding_started;
+
+ /* Fields required to read WAL from archive */
+ char *archive_name; /* tar archive filename */
+ int archive_fd; /* File descriptor for the open tar file */
+
+ astreamer *archive_streamer;
+ char *archive_read_buf; /* Reusable read buffer for archive I/O */
+
+#ifdef USE_ASSERT_CHECKING
+ Size archive_read_buf_size;
+#endif
+
+ /* What the archive streamer is currently reading */
+ struct ArchivedWALFile *cur_file;
+
+ /*
+ * Hash table of all WAL files that the archive stream has read, including
+ * the one currently in progress.
+ */
+ struct ArchivedWAL_hash *archive_wal_htab;
+
+ /*
+ * Pre-computed segment numbers derived from startptr and endptr. Caching
+ * them avoids repeated XLByteToSeg() calls when filtering each archive
+ * member against the requested WAL range.
+ */
+ XLogSegNo start_segno;
+ XLogSegNo end_segno;
} XLogDumpPrivate;
+extern int open_file_in_directory(const char *directory, const char *fname);
+
+extern void init_archive_reader(XLogDumpPrivate *privateInfo,
+ const char *waldir, int *WalSegSz,
+ pg_compress_algorithm compression);
+extern void free_archive_reader(XLogDumpPrivate *privateInfo);
+extern int read_archive_wal_page(XLogDumpPrivate *privateInfo,
+ XLogRecPtr targetPagePtr,
+ Size count, char *readBuff,
+ int WalSegSz);
+extern void free_archive_wal_entry(const char *fname,
+ XLogDumpPrivate *privateInfo);
+
#endif /* PG_WALDUMP_H */
diff --git a/src/bin/pg_waldump/t/001_basic.pl b/src/bin/pg_waldump/t/001_basic.pl
index 5db5d20136f..94c58187412 100644
--- a/src/bin/pg_waldump/t/001_basic.pl
+++ b/src/bin/pg_waldump/t/001_basic.pl
@@ -3,9 +3,13 @@
use strict;
use warnings FATAL => 'all';
+use Cwd;
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Test::More;
+use List::Util qw(shuffle);
+
+my $tar = $ENV{TAR};
program_help_ok('pg_waldump');
program_version_ok('pg_waldump');
@@ -162,6 +166,42 @@ CREATE TABLESPACE ts1 LOCATION '$tblspc_path';
DROP TABLESPACE ts1;
});
+# Test: Decode a continuation record (contrecord) that spans multiple WAL
+# segments.
+#
+# First, consume nearly all remaining room in the current WAL segment,
+# leaving only enough space for the start of a largish record.
+$node->safe_psql(
+ 'postgres', q{
+DO $$
+DECLARE
+ wal_segsize int := setting::int FROM pg_settings WHERE name = 'wal_segment_size';
+ remain int;
+ iters int := 0;
+BEGIN
+ LOOP
+ INSERT into t1(b)
+ select repeat(encode(sha256(g::text::bytea), 'hex'), (random() * 15 + 1)::int)
+ from generate_series(1, 10) g;
+
+ remain := wal_segsize - (pg_current_wal_insert_lsn() - '0/0') % wal_segsize;
+ IF remain < 2 * setting::int from pg_settings where name = 'block_size' THEN
+ RAISE log 'exiting after % iterations, % bytes to end of WAL segment', iters, remain;
+ EXIT;
+ END IF;
+ iters := iters + 1;
+ END LOOP;
+END
+$$;
+});
+
+my $contrecord_lsn = $node->safe_psql('postgres',
+ 'SELECT pg_current_wal_insert_lsn()');
+# Emit a large record that spills over into the next WAL segment (a contrecord)
+$node->safe_psql('postgres',
+ qq{SELECT pg_logical_emit_message(true, 'test 026', repeat('xyzxz', 123456))}
+);
+
my ($end_lsn, $end_walfile) = split /\|/,
$node->safe_psql('postgres',
q{SELECT pg_current_wal_insert_lsn(), pg_walfile_name(pg_current_wal_insert_lsn())}
@@ -198,28 +238,6 @@ command_like(
],
qr/./,
'runs with start and end segment specified');
-command_fails_like(
- [ 'pg_waldump', '--path' => $node->data_dir ],
- qr/error: no start WAL location given/,
- 'path option requires start location');
-command_like(
- [
- 'pg_waldump',
- '--path' => $node->data_dir,
- '--start' => $start_lsn,
- '--end' => $end_lsn,
- ],
- qr/./,
- 'runs with path option and start and end locations');
-command_fails_like(
- [
- 'pg_waldump',
- '--path' => $node->data_dir,
- '--start' => $start_lsn,
- ],
- qr/error: error in WAL record at/,
- 'falling off the end of the WAL results in an error');
-
command_like(
[
'pg_waldump', '--quiet',
@@ -227,22 +245,16 @@ command_like(
],
qr/^$/,
'no output with --quiet option');
-command_fails_like(
- [
- 'pg_waldump', '--quiet',
- '--path' => $node->data_dir,
- '--start' => $start_lsn
- ],
- qr/error: error in WAL record at/,
- 'errors are shown with --quiet');
-
# Test for: Display a message that we're skipping data if `from`
# wasn't a pointer to the start of a record.
+sub test_pg_waldump_skip_bytes
{
+ my ($path, $startlsn, $endlsn) = @_;
+
# Construct a new LSN that is one byte past the original
# start_lsn.
- my ($part1, $part2) = split qr{/}, $start_lsn;
+ my ($part1, $part2) = split qr{/}, $startlsn;
my $lsn2 = hex $part2;
$lsn2++;
my $new_start = sprintf("%s/%X", $part1, $lsn2);
@@ -252,7 +264,8 @@ command_fails_like(
my $result = IPC::Run::run [
'pg_waldump',
'--start' => $new_start,
- $node->data_dir . '/pg_wal/' . $start_walfile
+ '--end' => $endlsn,
+ '--path' => $path,
],
'>' => \$stdout,
'2>' => \$stderr;
@@ -266,15 +279,15 @@ command_fails_like(
sub test_pg_waldump
{
local $Test::Builder::Level = $Test::Builder::Level + 1;
- my @opts = @_;
+ my ($path, $startlsn, $endlsn, @opts) = @_;
my ($stdout, $stderr);
my $result = IPC::Run::run [
'pg_waldump',
- '--path' => $node->data_dir,
- '--start' => $start_lsn,
- '--end' => $end_lsn,
+ '--start' => $startlsn,
+ '--end' => $endlsn,
+ '--path' => $path,
@opts
],
'>' => \$stdout,
@@ -286,40 +299,145 @@ sub test_pg_waldump
return @lines;
}
-my @lines;
+# Create a tar archive of the files in $directory, in shuffled order
+sub generate_archive
+{
+ my ($archive, $directory, $compression_flags) = @_;
-@lines = test_pg_waldump;
-is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
+ my @files;
+ opendir my $dh, $directory or die "opendir: $!";
+ while (my $entry = readdir $dh) {
+ # Skip '.' and '..'
+ next if $entry eq '.' || $entry eq '..';
+ push @files, $entry;
+ }
+ closedir $dh;
-@lines = test_pg_waldump('--limit' => 6);
-is(@lines, 6, 'limit option observed');
+ @files = shuffle @files;
-@lines = test_pg_waldump('--fullpage');
-is(grep(!/^rmgr:.*\bFPW\b/, @lines), 0, 'all output lines are FPW');
+ # move into the WAL directory before archiving files
+ my $cwd = getcwd;
+ chdir($directory) || die "chdir: $!";
+ command_ok([$tar, $compression_flags, $archive, @files]);
+ chdir($cwd) || die "chdir: $!";
+}
-@lines = test_pg_waldump('--stats');
-like($lines[0], qr/WAL statistics/, "statistics on stdout");
-is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+my $tmp_dir = PostgreSQL::Test::Utils::tempdir_short();
-@lines = test_pg_waldump('--stats=record');
-like($lines[0], qr/WAL statistics/, "statistics on stdout");
-is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+my @scenarios = (
+ {
+ 'path' => $node->data_dir,
+ 'is_archive' => 0,
+ 'enabled' => 1
+ },
+ {
+ 'path' => "$tmp_dir/pg_wal.tar",
+ 'compression_method' => 'none',
+ 'compression_flags' => '-cf',
+ 'is_archive' => 1,
+ 'enabled' => 1
+ },
+ {
+ 'path' => "$tmp_dir/pg_wal.tar.gz",
+ 'compression_method' => 'gzip',
+ 'compression_flags' => '-czf',
+ 'is_archive' => 1,
+ 'enabled' => check_pg_config("#define HAVE_LIBZ 1")
+ });
-@lines = test_pg_waldump('--rmgr' => 'Btree');
-is(grep(!/^rmgr: Btree/, @lines), 0, 'only Btree lines');
+for my $scenario (@scenarios)
+{
+ my $path = $scenario->{'path'};
-@lines = test_pg_waldump('--fork' => 'init');
-is(grep(!/fork init/, @lines), 0, 'only init fork lines');
+ SKIP:
+ {
+ skip "tar command is not available", 56
+ if !defined $tar && $scenario->{'is_archive'};
+ skip "$scenario->{'compression_method'} compression not supported by this build", 56
+ if !$scenario->{'enabled'} && $scenario->{'is_archive'};
-@lines = test_pg_waldump(
- '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_t1_oid");
-is(grep(!/rel $default_ts_oid\/$postgres_db_oid\/$rel_t1_oid/, @lines),
- 0, 'only lines for selected relation');
+ # create pg_wal archive
+ if ($scenario->{'is_archive'})
+ {
+ generate_archive($path,
+ $node->data_dir . '/pg_wal',
+ $scenario->{'compression_flags'});
+ }
-@lines = test_pg_waldump(
- '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_i1a_oid",
- '--block' => 1);
-is(grep(!/\bblk 1\b/, @lines), 0, 'only lines for selected block');
+ command_fails_like(
+ [ 'pg_waldump', '--path' => $path ],
+ qr/error: no start WAL location given/,
+ 'path option requires start location');
+ command_like(
+ [
+ 'pg_waldump',
+ '--path' => $path,
+ '--start' => $start_lsn,
+ '--end' => $end_lsn,
+ ],
+ qr/./,
+ 'runs with path option and start and end locations');
+ command_fails_like(
+ [
+ 'pg_waldump',
+ '--path' => $path,
+ '--start' => $start_lsn,
+ ],
+ qr/error: error in WAL record at/,
+ 'falling off the end of the WAL results in an error');
+ command_fails_like(
+ [
+ 'pg_waldump', '--quiet',
+ '--path' => $path,
+ '--start' => $start_lsn
+ ],
+ qr/error: error in WAL record at/,
+ 'errors are shown with --quiet');
+
+ test_pg_waldump_skip_bytes($path, $start_lsn, $end_lsn);
+
+ my @lines = test_pg_waldump($path, $start_lsn, $end_lsn);
+ is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
+
+ @lines = test_pg_waldump($path, $contrecord_lsn, $end_lsn);
+ is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines starting at contrecord LSN');
+
+ test_pg_waldump_skip_bytes($path, $contrecord_lsn, $end_lsn);
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--limit' => 6);
+ is(@lines, 6, 'limit option observed');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--fullpage');
+ is(grep(!/^rmgr:.*\bFPW\b/, @lines), 0, 'all output lines are FPW');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--stats');
+ like($lines[0], qr/WAL statistics/, "statistics on stdout");
+ is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--stats=record');
+ like($lines[0], qr/WAL statistics/, "statistics on stdout");
+ is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--rmgr' => 'Btree');
+ is(grep(!/^rmgr: Btree/, @lines), 0, 'only Btree lines');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--fork' => 'init');
+ is(grep(!/fork init/, @lines), 0, 'only init fork lines');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn,
+ '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_t1_oid");
+ is(grep(!/rel $default_ts_oid\/$postgres_db_oid\/$rel_t1_oid/, @lines),
+ 0, 'only lines for selected relation');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn,
+ '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_i1a_oid",
+ '--block' => 1);
+ is(grep(!/\bblk 1\b/, @lines), 0, 'only lines for selected block');
+
+ # Cleanup.
+ unlink $path if $scenario->{'is_archive'};
+ }
+}
done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 174e2798443..3e2fc711a3e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -147,6 +147,9 @@ ArchiveOpts
ArchiveShutdownCB
ArchiveStartupCB
ArchiveStreamState
+ArchivedWALFile
+ArchivedWAL_hash
+ArchivedWAL_iterator
ArchiverOutput
ArchiverStage
ArrayAnalyzeExtraData
@@ -3542,6 +3545,7 @@ astreamer_recovery_injector
astreamer_tar_archiver
astreamer_tar_parser
astreamer_verify
+astreamer_waldump
astreamer_zstd_frame
auth_password_hook_typ
autovac_table
--
2.47.1
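The member filter added to archive_waldump.c (member_is_wal_file() followed by the timeline and segment-range check) can be approximated by the standalone C sketch below. It re-implements the arithmetic behind IsXLogFileName(), XLogFromFileName() and XLByteToSeg() under hypothetical helper names (is_wal_name, parse_wal_name, lsn_to_segno, keep_wal_member). It is not the patch's code, and it is simpler than member_is_wal_file(): it only strips a leading "./" and "pg_wal/" rather than calling canonicalize_path().

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Length of a WAL segment file name (XLOG_FNAME_LEN in PostgreSQL). */
#define WAL_FNAME_LEN 24

/* Hypothetical stand-in for XLByteToSeg(): map an LSN to its segment number. */
static uint64_t
lsn_to_segno(uint64_t lsn, uint32_t segsize)
{
    return lsn / segsize;
}

/* Mirrors IsXLogFileName(): exactly 24 uppercase hex digits. */
static bool
is_wal_name(const char *name)
{
    return strlen(name) == WAL_FNAME_LEN &&
        strspn(name, "0123456789ABCDEF") == WAL_FNAME_LEN;
}

/*
 * Mirrors XLogFromFileName(): "TTTTTTTTLLLLLLLLSSSSSSSS" encodes the
 * timeline, the high 32 bits of the LSN ("log"), and the segment within it.
 */
static bool
parse_wal_name(const char *name, uint32_t segsize,
               uint32_t *tli, uint64_t *segno)
{
    unsigned int tli_val, log_id, seg_id;

    if (!is_wal_name(name) ||
        sscanf(name, "%08X%08X%08X", &tli_val, &log_id, &seg_id) != 3)
        return false;
    *tli = tli_val;
    *segno = (uint64_t) log_id * (UINT64_C(0x100000000) / segsize) + seg_id;
    return true;
}

/*
 * Hypothetical helper: keep a member only if it is a bare segment name or
 * one under "pg_wal/", on the wanted timeline, and inside
 * [start_segno, end_segno].
 */
static bool
keep_wal_member(const char *pathname, uint32_t segsize, uint32_t want_tli,
                uint64_t start_segno, uint64_t end_segno)
{
    uint32_t tli;
    uint64_t segno;

    if (strncmp(pathname, "./", 2) == 0)
        pathname += 2;
    if (strncmp(pathname, "pg_wal/", 7) == 0)
        pathname += 7;
    if (!parse_wal_name(pathname, segsize, &tli, &segno))
        return false;
    return tli == want_tli && segno >= start_segno && segno <= end_segno;
}
```

With 16 MB segments there are 256 segments per "log" id, so 000000010000000100000002 is segment 258, the same segment that contains LSN 1/02000028. Names such as pg_wal/archive_status/*.done fail the hex check and are skipped.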
[application/x-patch] v20-0004-pg_verifybackup-Enable-WAL-parsing-for-tar-forma.patch (16.7K, 5-v20-0004-pg_verifybackup-Enable-WAL-parsing-for-tar-forma.patch)
From cccb149f57bd3323506d459d005c7552c19aa07d Mon Sep 17 00:00:00 2001
From: Andrew Dunstan <[email protected]>
Date: Thu, 19 Mar 2026 15:43:53 +0530
Subject: [PATCH v20 4/5] pg_verifybackup: Enable WAL parsing for tar-format
backups
Now that pg_waldump supports reading WAL from tar archives, remove the
restriction that forced --no-parse-wal for tar-format backups.
pg_verifybackup now automatically locates the WAL archive: it looks for
a separate pg_wal.tar first, then falls back to the main base.tar. A
new --wal-path option (replacing the old --wal-directory, which is kept
as a silent alias) accepts either a directory or a tar archive path.
Computing the default WAL path is deferred until the backup format is
known, since tar-format and plain-format backups locate WAL differently.
Author: Amul Sul <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Jakub Wartak <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Euler Taveira <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Discussion: https://postgr.es/m/CAAJ_b94bqdWN3h2J-PzzzQ2Npbwct5ZQHggn_QoYGhC2rn-=WQ@mail.gmail.com
---
doc/src/sgml/ref/pg_verifybackup.sgml | 14 ++-
src/bin/pg_verifybackup/pg_verifybackup.c | 96 ++++++++++++-------
src/bin/pg_verifybackup/t/002_algorithm.pl | 4 -
src/bin/pg_verifybackup/t/003_corruption.pl | 4 +-
src/bin/pg_verifybackup/t/007_wal.pl | 20 +++-
src/bin/pg_verifybackup/t/008_untar.pl | 5 +-
src/bin/pg_verifybackup/t/010_client_untar.pl | 5 +-
7 files changed, 91 insertions(+), 57 deletions(-)
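The WAL lookup order described in the commit message above (an explicit --wal-path first, then a separate pg_wal.tar, then base.tar) can be sketched as follows. choose_wal_source() and wal_source are hypothetical names. The sketch probes only an uncompressed pg_wal.tar with stat(). By contrast, the patch records base_archive_path and wal_archive_path while prechecking tar files in precheck_tar_backup_file(), and this sketch does not model compressed archive names.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Which source the hypothetical lookup settled on. */
typedef enum
{
    WAL_SRC_EXPLICIT,           /* user supplied --wal-path */
    WAL_SRC_PG_WAL_TAR,         /* <backup>/pg_wal.tar exists */
    WAL_SRC_BASE_TAR            /* fall back to <backup>/base.tar */
} wal_source;

/*
 * Fill buf with the path to hand to pg_waldump --path and report which
 * rule chose it.  An explicit path (directory or archive) is used as given.
 */
static wal_source
choose_wal_source(const char *backup_dir, const char *wal_path,
                  char *buf, size_t buflen)
{
    struct stat st;

    if (wal_path != NULL)
    {
        snprintf(buf, buflen, "%s", wal_path);
        return WAL_SRC_EXPLICIT;
    }

    /* Prefer a separate WAL archive when the backup contains one. */
    snprintf(buf, buflen, "%s/pg_wal.tar", backup_dir);
    if (stat(buf, &st) == 0 && S_ISREG(st.st_mode))
        return WAL_SRC_PG_WAL_TAR;

    /* Otherwise the WAL, if any, lives inside the main archive. */
    snprintf(buf, buflen, "%s/base.tar", backup_dir);
    return WAL_SRC_BASE_TAR;
}
```

Returning an enum alongside the path keeps the decision easy to log or test without string comparisons.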
diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml
index 61c12975e4a..1695cfe91c8 100644
--- a/doc/src/sgml/ref/pg_verifybackup.sgml
+++ b/doc/src/sgml/ref/pg_verifybackup.sgml
@@ -36,10 +36,7 @@ PostgreSQL documentation
<literal>backup_manifest</literal> generated by the server at the time
of the backup. The backup may be stored either in the "plain" or the "tar"
format; this includes tar-format backups compressed with any algorithm
- supported by <application>pg_basebackup</application>. However, at present,
- <literal>WAL</literal> verification is supported only for plain-format
- backups. Therefore, if the backup is stored in tar-format, the
- <literal>-n, --no-parse-wal</literal> option should be used.
+ supported by <application>pg_basebackup</application>.
</para>
<para>
@@ -261,12 +258,13 @@ PostgreSQL documentation
<varlistentry>
<term><option>-w <replaceable class="parameter">path</replaceable></option></term>
- <term><option>--wal-directory=<replaceable class="parameter">path</replaceable></option></term>
+ <term><option>--wal-path=<replaceable class="parameter">path</replaceable></option></term>
<listitem>
<para>
- Try to parse WAL files stored in the specified directory, rather than
- in <literal>pg_wal</literal>. This may be useful if the backup is
- stored in a separate location from the WAL archive.
+ Try to parse WAL files stored in the specified directory or tar
+ archive, rather than in <literal>pg_wal</literal>. This may be
+ useful if the backup is stored in a separate location from the WAL
+ archive.
</para>
</listitem>
</varlistentry>
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 31f606c45b1..b60ab8739d5 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -74,7 +74,9 @@ pg_noreturn static void report_manifest_error(JsonManifestParseContext *context,
const char *fmt,...)
pg_attribute_printf(2, 3);
-static void verify_tar_backup(verifier_context *context, DIR *dir);
+static void verify_tar_backup(verifier_context *context, DIR *dir,
+ char **base_archive_path,
+ char **wal_archive_path);
static void verify_plain_backup_directory(verifier_context *context,
char *relpath, char *fullpath,
DIR *dir);
@@ -83,7 +85,9 @@ static void verify_plain_backup_file(verifier_context *context, char *relpath,
static void verify_control_file(const char *controlpath,
uint64 manifest_system_identifier);
static void precheck_tar_backup_file(verifier_context *context, char *relpath,
- char *fullpath, SimplePtrList *tarfiles);
+ char *fullpath, SimplePtrList *tarfiles,
+ char **base_archive_path,
+ char **wal_archive_path);
static void verify_tar_file(verifier_context *context, char *relpath,
char *fullpath, astreamer *streamer);
static void report_extra_backup_files(verifier_context *context);
@@ -93,7 +97,7 @@ static void verify_file_checksum(verifier_context *context,
uint8 *buffer);
static void parse_required_wal(verifier_context *context,
char *pg_waldump_path,
- char *wal_directory);
+ char *wal_path);
static astreamer *create_archive_verifier(verifier_context *context,
char *archive_name,
Oid tblspc_oid,
@@ -126,7 +130,8 @@ main(int argc, char **argv)
{"progress", no_argument, NULL, 'P'},
{"quiet", no_argument, NULL, 'q'},
{"skip-checksums", no_argument, NULL, 's'},
- {"wal-directory", required_argument, NULL, 'w'},
+ {"wal-path", required_argument, NULL, 'w'},
+ {"wal-directory", required_argument, NULL, 'w'}, /* deprecated */
{NULL, 0, NULL, 0}
};
@@ -135,7 +140,9 @@ main(int argc, char **argv)
char *manifest_path = NULL;
bool no_parse_wal = false;
bool quiet = false;
- char *wal_directory = NULL;
+ char *wal_path = NULL;
+ char *base_archive_path = NULL;
+ char *wal_archive_path = NULL;
char *pg_waldump_path = NULL;
DIR *dir;
@@ -221,8 +228,8 @@ main(int argc, char **argv)
context.skip_checksums = true;
break;
case 'w':
- wal_directory = pstrdup(optarg);
- canonicalize_path(wal_directory);
+ wal_path = pstrdup(optarg);
+ canonicalize_path(wal_path);
break;
default:
/* getopt_long already emitted a complaint */
@@ -285,10 +292,6 @@ main(int argc, char **argv)
manifest_path = psprintf("%s/backup_manifest",
context.backup_directory);
- /* By default, look for the WAL in the backup directory, too. */
- if (wal_directory == NULL)
- wal_directory = psprintf("%s/pg_wal", context.backup_directory);
-
/*
* Try to read the manifest. We treat any errors encountered while parsing
* the manifest as fatal; there doesn't seem to be much point in trying to
@@ -331,17 +334,6 @@ main(int argc, char **argv)
pfree(path);
}
- /*
- * XXX: In the future, we should consider enhancing pg_waldump to read WAL
- * files from an archive.
- */
- if (!no_parse_wal && context.format == 't')
- {
- pg_log_error("pg_waldump cannot read tar files");
- pg_log_error_hint("You must use -n/--no-parse-wal when verifying a tar-format backup.");
- exit(1);
- }
-
/*
* Perform the appropriate type of verification appropriate based on the
* backup format. This will close 'dir'.
@@ -350,7 +342,7 @@ main(int argc, char **argv)
verify_plain_backup_directory(&context, NULL, context.backup_directory,
dir);
else
- verify_tar_backup(&context, dir);
+ verify_tar_backup(&context, dir, &base_archive_path, &wal_archive_path);
/*
* The "matched" flag should now be set on every entry in the hash table.
@@ -368,12 +360,35 @@ main(int argc, char **argv)
if (context.format == 'p' && !context.skip_checksums)
verify_backup_checksums(&context);
+ /*
+ * By default, WAL files are expected to be found in the backup directory
+ * for plain-format backups. In the case of tar-format backups, if a
+ * separate WAL archive is not found, the WAL files are most likely
+ * included within the main data directory archive.
+ */
+ if (!no_parse_wal && wal_path == NULL)
+ {
+ if (context.format == 'p')
+ wal_path = psprintf("%s/pg_wal", context.backup_directory);
+ else if (wal_archive_path)
+ wal_path = wal_archive_path;
+ else if (base_archive_path)
+ wal_path = base_archive_path;
+ else
+ {
+ pg_log_error("WAL archive not found");
+ pg_log_error_hint("Specify the correct path using the option -w/--wal-path, "
+ "or use -n/--no-parse-wal to skip WAL parsing.");
+ exit(1);
+ }
+ }
+
/*
* Try to parse the required ranges of WAL records, unless we were told
* not to do so.
*/
if (!no_parse_wal)
- parse_required_wal(&context, pg_waldump_path, wal_directory);
+ parse_required_wal(&context, pg_waldump_path, wal_path);
/*
* If everything looks OK, tell the user this, unless we were asked to
@@ -787,7 +802,8 @@ verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
* close when we're done with it.
*/
static void
-verify_tar_backup(verifier_context *context, DIR *dir)
+verify_tar_backup(verifier_context *context, DIR *dir, char **base_archive_path,
+ char **wal_archive_path)
{
struct dirent *dirent;
SimplePtrList tarfiles = {NULL, NULL};
@@ -816,7 +832,8 @@ verify_tar_backup(verifier_context *context, DIR *dir)
char *fullpath;
fullpath = psprintf("%s/%s", context->backup_directory, filename);
- precheck_tar_backup_file(context, filename, fullpath, &tarfiles);
+ precheck_tar_backup_file(context, filename, fullpath, &tarfiles,
+ base_archive_path, wal_archive_path);
pfree(fullpath);
}
}
@@ -875,17 +892,21 @@ verify_tar_backup(verifier_context *context, DIR *dir)
*
* The arguments to this function are mostly the same as the
* verify_plain_backup_file. The additional argument outputs a list of valid
- * tar files.
+ * tar files, along with the full paths to the main archive and the WAL
+ * directory archive.
*/
static void
precheck_tar_backup_file(verifier_context *context, char *relpath,
- char *fullpath, SimplePtrList *tarfiles)
+ char *fullpath, SimplePtrList *tarfiles,
+ char **base_archive_path, char **wal_archive_path)
{
struct stat sb;
Oid tblspc_oid = InvalidOid;
pg_compress_algorithm compress_algorithm;
tar_file *tar;
char *suffix = NULL;
+ bool is_base_archive = false;
+ bool is_wal_archive = false;
/* Should be tar format backup */
Assert(context->format == 't');
@@ -918,9 +939,15 @@ precheck_tar_backup_file(verifier_context *context, char *relpath,
* extension such as .gz, .lz4, or .zst.
*/
if (strncmp("base", relpath, 4) == 0)
+ {
suffix = relpath + 4;
+ is_base_archive = true;
+ }
else if (strncmp("pg_wal", relpath, 6) == 0)
+ {
suffix = relpath + 6;
+ is_wal_archive = true;
+ }
else
{
/* Expected a <tablespaceoid>.tar file here. */
@@ -953,8 +980,13 @@ precheck_tar_backup_file(verifier_context *context, char *relpath,
* Ignore WALs, as reading and verification will be handled through
* pg_waldump.
*/
- if (strncmp("pg_wal", relpath, 6) == 0)
+ if (is_wal_archive)
+ {
+ *wal_archive_path = pstrdup(fullpath);
return;
+ }
+ else if (is_base_archive)
+ *base_archive_path = pstrdup(fullpath);
/*
* Append the information to the list for complete verification at a later
@@ -1188,7 +1220,7 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
*/
static void
parse_required_wal(verifier_context *context, char *pg_waldump_path,
- char *wal_directory)
+ char *wal_path)
{
manifest_data *manifest = context->manifest;
manifest_wal_range *this_wal_range = manifest->first_wal_range;
@@ -1198,7 +1230,7 @@ parse_required_wal(verifier_context *context, char *pg_waldump_path,
char *pg_waldump_cmd;
pg_waldump_cmd = psprintf("\"%s\" --quiet --path=\"%s\" --timeline=%u --start=%X/%08X --end=%X/%08X\n",
- pg_waldump_path, wal_directory, this_wal_range->tli,
+ pg_waldump_path, wal_path, this_wal_range->tli,
LSN_FORMAT_ARGS(this_wal_range->start_lsn),
LSN_FORMAT_ARGS(this_wal_range->end_lsn));
fflush(NULL);
@@ -1366,7 +1398,7 @@ usage(void)
printf(_(" -P, --progress show progress information\n"));
printf(_(" -q, --quiet do not print any output, except for errors\n"));
printf(_(" -s, --skip-checksums skip checksum verification\n"));
- printf(_(" -w, --wal-directory=PATH use specified path for WAL files\n"));
+ printf(_(" -w, --wal-path=PATH use specified path for WAL files\n"));
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -?, --help show this help, then exit\n"));
printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
diff --git a/src/bin/pg_verifybackup/t/002_algorithm.pl b/src/bin/pg_verifybackup/t/002_algorithm.pl
index 0556191ec9d..edc515d5904 100644
--- a/src/bin/pg_verifybackup/t/002_algorithm.pl
+++ b/src/bin/pg_verifybackup/t/002_algorithm.pl
@@ -30,10 +30,6 @@ sub test_checksums
{
# Add switch to get a tar-format backup
push @backup, ('--format' => 'tar');
-
- # Add switch to skip WAL verification, which is not yet supported for
- # tar-format backups
- push @verify, ('--no-parse-wal');
}
# A backup with a bogus algorithm should fail.
diff --git a/src/bin/pg_verifybackup/t/003_corruption.pl b/src/bin/pg_verifybackup/t/003_corruption.pl
index b1d65b8aa0f..882d75d9dc2 100644
--- a/src/bin/pg_verifybackup/t/003_corruption.pl
+++ b/src/bin/pg_verifybackup/t/003_corruption.pl
@@ -193,10 +193,8 @@ for my $scenario (@scenario)
command_ok([ $tar, '-cf' => "$tar_backup_path/base.tar", '.' ]);
chdir($cwd) || die "chdir: $!";
- # Now check that the backup no longer verifies. We must use -n
- # here, because pg_waldump can't yet read WAL from a tarfile.
command_fails_like(
- [ 'pg_verifybackup', '--no-parse-wal', $tar_backup_path ],
+ [ 'pg_verifybackup', $tar_backup_path ],
$scenario->{'fails_like'},
"corrupt backup fails verification: $name");
diff --git a/src/bin/pg_verifybackup/t/007_wal.pl b/src/bin/pg_verifybackup/t/007_wal.pl
index 79087a1f6be..0e0377bfacc 100644
--- a/src/bin/pg_verifybackup/t/007_wal.pl
+++ b/src/bin/pg_verifybackup/t/007_wal.pl
@@ -42,10 +42,10 @@ command_ok([ 'pg_verifybackup', '--no-parse-wal', $backup_path ],
command_ok(
[
'pg_verifybackup',
- '--wal-directory' => $relocated_pg_wal,
+ '--wal-path' => $relocated_pg_wal,
$backup_path
],
- '--wal-directory can be used to specify WAL directory');
+ '--wal-path can be used to specify WAL directory');
# Move directory back to original location.
rename($relocated_pg_wal, $original_pg_wal) || die "rename pg_wal back: $!";
@@ -90,4 +90,20 @@ command_ok(
[ 'pg_verifybackup', $backup_path2 ],
'valid base backup with timeline > 1');
+# Test WAL verification for a tar-format backup with a separate pg_wal.tar,
+# as produced by pg_basebackup --format=tar --wal-method=stream.
+my $backup_path3 = $primary->backup_dir . '/test_tar_wal';
+$primary->command_ok(
+ [
+ 'pg_basebackup',
+ '--pgdata' => $backup_path3,
+ '--no-sync',
+ '--format' => 'tar',
+ '--checkpoint' => 'fast'
+ ],
+ "tar backup with separate pg_wal.tar");
+command_ok(
+ [ 'pg_verifybackup', $backup_path3 ],
+ 'WAL verification succeeds with separate pg_wal.tar');
+
done_testing();
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
index ae67ae85a31..161c08c190d 100644
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -47,7 +47,6 @@ my $tsoid = $primary->safe_psql(
SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1'));
my $backup_path = $primary->backup_dir . '/server-backup';
-my $extract_path = $primary->backup_dir . '/extracted-backup';
my @test_configuration = (
{
@@ -123,14 +122,12 @@ for my $tc (@test_configuration)
# Verify tar backup.
$primary->command_ok(
[
- 'pg_verifybackup', '--no-parse-wal',
- '--exit-on-error', $backup_path,
+ 'pg_verifybackup', '--exit-on-error', $backup_path,
],
"verify backup, compression $method");
# Cleanup.
rmtree($backup_path);
- rmtree($extract_path);
}
}
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
index 1ac7b5db75a..9670fbe4fda 100644
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -32,7 +32,6 @@ print $jf $junk_data;
close $jf;
my $backup_path = $primary->backup_dir . '/client-backup';
-my $extract_path = $primary->backup_dir . '/extracted-backup';
my @test_configuration = (
{
@@ -137,13 +136,11 @@ for my $tc (@test_configuration)
# Verify tar backup.
$primary->command_ok(
[
- 'pg_verifybackup', '--no-parse-wal',
- '--exit-on-error', $backup_path,
+ 'pg_verifybackup', '--exit-on-error', $backup_path,
],
"verify backup, compression $method");
# Cleanup.
- rmtree($extract_path);
rmtree($backup_path);
}
}
--
2.47.1
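For reference, the fallback order that the patch above adds to main() when -w/--wal-path is omitted (plain format: `<backup>/pg_wal`; tar format: the separate pg_wal archive, then the base archive) can be modeled as a standalone function. This is a sketch, not code from the patch: `resolve_wal_path()` and `copy_string()` are hypothetical names, and returning NULL stands in for the "WAL archive not found" error path.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return a heap-allocated copy of s (portable stand-in for strdup). */
static char *
copy_string(const char *s)
{
    size_t      len = strlen(s) + 1;
    char       *copy = malloc(len);

    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/*
 * Hypothetical model of pg_verifybackup's default WAL location when
 * -w/--wal-path is not given.  'format' is 'p' (plain) or 't' (tar).
 * Returns a malloc'd path, or NULL when no WAL location can be inferred
 * (the real code reports "WAL archive not found" and exits instead).
 */
static char *
resolve_wal_path(char format, const char *backup_dir,
                 const char *wal_archive_path,
                 const char *base_archive_path)
{
    if (format == 'p')
    {
        /* Plain format: WAL is expected under <backup_dir>/pg_wal. */
        size_t      len = strlen(backup_dir) + sizeof("/pg_wal");
        char       *path = malloc(len);

        if (path != NULL)
            snprintf(path, len, "%s/pg_wal", backup_dir);
        return path;
    }

    /* Tar format: prefer a separate pg_wal archive, then the base archive. */
    if (wal_archive_path != NULL)
        return copy_string(wal_archive_path);
    if (base_archive_path != NULL)
        return copy_string(base_archive_path);
    return NULL;
}
```

For example, `resolve_wal_path('t', "/bk", NULL, "/bk/base.tar.gz")` yields `/bk/base.tar.gz`, because no separate pg_wal archive was found and the WAL is assumed to live inside the base archive.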
* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-03-19 20:48 Zsolt Parragi <[email protected]>
parent: Amul Sul <[email protected]>
From: Zsolt Parragi @ 2026-03-19 20:48 UTC
To: Amul Sul <[email protected]>; +Cc: Andrew Dunstan <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>
Hello!
Path is ignored with a positional argument, I think this is a bug?
This fails:
pg_waldump --path /wal/dir 000000010000000000000001
And this works:
pg_waldump --path /wal/dir --start 0/01000028 --end 0/010020F8
+{
+ int fname_len = strlen(fname);
+
Shouldn't this use size_t?
+ /*
+ * Setup temporary directory to store WAL segments and set up an exit
+ * callback to remove it upon completion.
+ */
+ setup_tmpwal_dir(waldir);
Maybe this could be deferred to be created only on first use? If I
understand correctly, in a typical scenario waldump won't use this
temporary directory, yet it always creates it.
* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-03-20 11:31 Amul Sul <[email protected]>
parent: Zsolt Parragi <[email protected]>
From: Amul Sul @ 2026-03-20 11:31 UTC
To: Zsolt Parragi <[email protected]>; +Cc: Andrew Dunstan <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>
On Fri, Mar 20, 2026 at 2:18 AM Zsolt Parragi <[email protected]> wrote:
>
> Hello!
>
> Path is ignored with a positional argument, I think this is a bug?
>
> This fails:
>
> pg_waldump --path /wal/dir 000000010000000000000001
>
> And this works:
>
> pg_waldump --path /wal/dir --start 0/01000028 --end 0/010020F8
>
Good catch! I've fixed this in the attached version and updated a test
case to cover this scenario.
> +{
> + int fname_len = strlen(fname);
> +
>
> Shouldn't this use size_t?
>
Okay, that can be used. I’ve done the same in the attached version.
> + /*
> + * Setup temporary directory to store WAL segments and set up an exit
> + * callback to remove it upon completion.
> + */
> + setup_tmpwal_dir(waldir);
>
> Maybe this could be deferred to be created only on first use? If I
> understand correctly, in a typical scenario waldump won't use this
> temporary directory, yet it always creates it.
Yeah, that optimization can be done, but passing the waldir -- which
is only used once -- to the point where the first temp file is created
would require quite a bit of code refactoring that doesn't seem to
offer much gain, IMO.
Regards,
Amul
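Zsolt's suggestion to create the temporary directory only on first use could be sketched as a lazily initialized handle. Everything here is hypothetical (`TmpWalDir`, `get_tmpwal_dir()`), it assumes a POSIX system for `mkdtemp()`, and it follows the placement rule described in the v21 documentation: under `$TMPDIR` if set (this sketch also treats an empty value as unset), otherwise in the archive's directory, with a name starting with `waldump_tmp`.

```c
#define _POSIX_C_SOURCE 200809L

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical lazily-initialized state; not taken from the patch. */
typedef struct TmpWalDir
{
    const char *archive_path;   /* tar archive being read */
    char        path[1024];     /* created directory, valid once 'created' */
    int         created;
} TmpWalDir;

/*
 * Return the temporary directory, creating it on the first call only.
 * The base is $TMPDIR if set and non-empty, otherwise the directory that
 * contains the archive.  Returns NULL on overflow or if mkdtemp() fails.
 */
static const char *
get_tmpwal_dir(TmpWalDir *t)
{
    const char *tmpdir;
    const char *slash;
    int         n;

    if (t->created)
        return t->path;         /* already created: reuse it */

    tmpdir = getenv("TMPDIR");
    if (tmpdir != NULL && tmpdir[0] != '\0')
        n = snprintf(t->path, sizeof(t->path), "%s/waldump_tmpXXXXXX", tmpdir);
    else if ((slash = strrchr(t->archive_path, '/')) != NULL)
        n = snprintf(t->path, sizeof(t->path), "%.*s/waldump_tmpXXXXXX",
                     (int) (slash - t->archive_path), t->archive_path);
    else
        n = snprintf(t->path, sizeof(t->path), "./waldump_tmpXXXXXX");

    if (n < 0 || (size_t) n >= sizeof(t->path))
        return NULL;            /* path too long for the buffer */

    /* mkdtemp() replaces the trailing XXXXXX and creates the directory. */
    if (mkdtemp(t->path) == NULL)
        return NULL;

    t->created = 1;
    return t->path;
}
```

With this shape, a caller that needs to spill a segment calls `get_tmpwal_dir()` just before writing the first file, and an atexit cleanup like the one described in the 0003 commit message only has to remove the directory when `created` is set.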
Attachments:
[application/x-patch] v21-0001-Move-tar-detection-and-compression-logic-to-comm.patch (7.1K, 2-v21-0001-Move-tar-detection-and-compression-logic-to-comm.patch)
From 608372553eb1bb88285081a0382fe2c227c90d60 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Thu, 19 Mar 2026 15:43:30 +0530
Subject: [PATCH v21 1/5] Move tar detection and compression logic to common.
Consolidate tar archive identification and compression-type detection
logic into a shared location. Currently used by pg_basebackup and
pg_verifybackup, this functionality is also required for upcoming
pg_waldump enhancements.
This change promotes code reuse and simplifies maintenance across
frontend tools.
Author: Amul Sul <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Jakub Wartak <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Euler Taveira <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Reviewed-by: Zsolt Parragi <[email protected]>
Discussion: https://postgr.es/m/CAAJ_b94bqdWN3h2J-PzzzQ2Npbwct5ZQHggn_QoYGhC2rn-=WQ@mail.gmail.com
---
src/bin/pg_basebackup/pg_basebackup.c | 36 +++++++----------------
src/bin/pg_verifybackup/pg_verifybackup.c | 12 +-------
src/common/compression.c | 30 +++++++++++++++++++
src/include/common/compression.h | 2 ++
4 files changed, 44 insertions(+), 36 deletions(-)
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index fa169a8d642..c1a4672aa6f 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1070,12 +1070,9 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
astreamer *manifest_inject_streamer = NULL;
bool inject_manifest;
bool is_tar,
- is_tar_gz,
- is_tar_lz4,
- is_tar_zstd,
is_compressed_tar;
+ pg_compress_algorithm compressed_tar_algorithm;
bool must_parse_archive;
- int archive_name_len = strlen(archive_name);
/*
* Normally, we emit the backup manifest as a separate file, but when
@@ -1084,24 +1081,13 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
*/
inject_manifest = (format == 't' && strcmp(basedir, "-") == 0 && manifest);
- /* Is this a tar archive? */
- is_tar = (archive_name_len > 4 &&
- strcmp(archive_name + archive_name_len - 4, ".tar") == 0);
-
- /* Is this a .tar.gz archive? */
- is_tar_gz = (archive_name_len > 7 &&
- strcmp(archive_name + archive_name_len - 7, ".tar.gz") == 0);
-
- /* Is this a .tar.lz4 archive? */
- is_tar_lz4 = (archive_name_len > 8 &&
- strcmp(archive_name + archive_name_len - 8, ".tar.lz4") == 0);
-
- /* Is this a .tar.zst archive? */
- is_tar_zstd = (archive_name_len > 8 &&
- strcmp(archive_name + archive_name_len - 8, ".tar.zst") == 0);
+ /* Check whether it is a tar archive and its compression type */
+ is_tar = parse_tar_compress_algorithm(archive_name,
+ &compressed_tar_algorithm);
/* Is this any kind of compressed tar? */
- is_compressed_tar = is_tar_gz || is_tar_lz4 || is_tar_zstd;
+ is_compressed_tar = (is_tar &&
+ compressed_tar_algorithm != PG_COMPRESSION_NONE);
/*
* Injecting the manifest into a compressed tar file would be possible if
@@ -1128,7 +1114,7 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
(spclocation == NULL && writerecoveryconf));
/* At present, we only know how to parse tar archives. */
- if (must_parse_archive && !is_tar && !is_compressed_tar)
+ if (must_parse_archive && !is_tar)
{
pg_log_error("cannot parse archive \"%s\"", archive_name);
pg_log_error_detail("Only tar archives can be parsed.");
@@ -1263,13 +1249,13 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
* If the user has requested a server compressed archive along with
* archive extraction at client then we need to decompress it.
*/
- if (format == 'p')
+ if (format == 'p' && is_compressed_tar)
{
- if (is_tar_gz)
+ if (compressed_tar_algorithm == PG_COMPRESSION_GZIP)
streamer = astreamer_gzip_decompressor_new(streamer);
- else if (is_tar_lz4)
+ else if (compressed_tar_algorithm == PG_COMPRESSION_LZ4)
streamer = astreamer_lz4_decompressor_new(streamer);
- else if (is_tar_zstd)
+ else if (compressed_tar_algorithm == PG_COMPRESSION_ZSTD)
streamer = astreamer_zstd_decompressor_new(streamer);
}
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index cbc9447384f..31f606c45b1 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -941,17 +941,7 @@ precheck_tar_backup_file(verifier_context *context, char *relpath,
}
/* Now, check the compression type of the tar */
- if (strcmp(suffix, ".tar") == 0)
- compress_algorithm = PG_COMPRESSION_NONE;
- else if (strcmp(suffix, ".tgz") == 0)
- compress_algorithm = PG_COMPRESSION_GZIP;
- else if (strcmp(suffix, ".tar.gz") == 0)
- compress_algorithm = PG_COMPRESSION_GZIP;
- else if (strcmp(suffix, ".tar.lz4") == 0)
- compress_algorithm = PG_COMPRESSION_LZ4;
- else if (strcmp(suffix, ".tar.zst") == 0)
- compress_algorithm = PG_COMPRESSION_ZSTD;
- else
+ if (!parse_tar_compress_algorithm(suffix, &compress_algorithm))
{
report_backup_error(context,
"file \"%s\" is not expected in a tar format backup",
diff --git a/src/common/compression.c b/src/common/compression.c
index 92cd4ec7a0d..ae2089d9406 100644
--- a/src/common/compression.c
+++ b/src/common/compression.c
@@ -41,6 +41,36 @@ static int expect_integer_value(char *keyword, char *value,
static bool expect_boolean_value(char *keyword, char *value,
pg_compress_specification *result);
+/*
+ * Look up a compression algorithm by archive file extension. Returns true and
+ * sets *algorithm if the extension is recognized. Otherwise returns false.
+ */
+bool
+parse_tar_compress_algorithm(const char *fname, pg_compress_algorithm *algorithm)
+{
+ size_t fname_len = strlen(fname);
+
+ if (fname_len >= 4 &&
+ strcmp(fname + fname_len - 4, ".tar") == 0)
+ *algorithm = PG_COMPRESSION_NONE;
+ else if (fname_len >= 4 &&
+ strcmp(fname + fname_len - 4, ".tgz") == 0)
+ *algorithm = PG_COMPRESSION_GZIP;
+ else if (fname_len >= 7 &&
+ strcmp(fname + fname_len - 7, ".tar.gz") == 0)
+ *algorithm = PG_COMPRESSION_GZIP;
+ else if (fname_len >= 8 &&
+ strcmp(fname + fname_len - 8, ".tar.lz4") == 0)
+ *algorithm = PG_COMPRESSION_LZ4;
+ else if (fname_len >= 8 &&
+ strcmp(fname + fname_len - 8, ".tar.zst") == 0)
+ *algorithm = PG_COMPRESSION_ZSTD;
+ else
+ return false;
+
+ return true;
+}
+
/*
* Look up a compression algorithm by name. Returns true and sets *algorithm
* if the name is recognized. Otherwise returns false.
diff --git a/src/include/common/compression.h b/src/include/common/compression.h
index 6c745b90066..f99c747cdd3 100644
--- a/src/include/common/compression.h
+++ b/src/include/common/compression.h
@@ -41,6 +41,8 @@ typedef struct pg_compress_specification
extern void parse_compress_options(const char *option, char **algorithm,
char **detail);
+extern bool parse_tar_compress_algorithm(const char *fname,
+ pg_compress_algorithm *algorithm);
extern bool parse_compress_algorithm(char *name, pg_compress_algorithm *algorithm);
extern const char *get_compress_algorithm_name(pg_compress_algorithm algorithm);
--
2.47.1
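The suffix matching that 0001 moves into src/common/compression.c can be exercised outside the tree with the standalone copy below. It is a sketch, not the committed function: the enum is a local stand-in for `pg_compress_algorithm`, `ends_with()` is a hypothetical helper, and the table-driven loop is an alternative formulation of the patch's if/else chain that accepts the same suffixes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Local stand-in for PostgreSQL's pg_compress_algorithm enum. */
typedef enum
{
    PG_COMPRESSION_NONE,
    PG_COMPRESSION_GZIP,
    PG_COMPRESSION_LZ4,
    PG_COMPRESSION_ZSTD
} pg_compress_algorithm;

/* True if fname ends with suffix; lengths are size_t, as strlen() returns. */
static bool
ends_with(const char *fname, const char *suffix)
{
    size_t      fname_len = strlen(fname);
    size_t      suffix_len = strlen(suffix);

    return fname_len >= suffix_len &&
        strcmp(fname + fname_len - suffix_len, suffix) == 0;
}

/* Table-driven equivalent of the if/else chain in the patch. */
static bool
parse_tar_compress_algorithm(const char *fname, pg_compress_algorithm *algorithm)
{
    static const struct
    {
        const char *suffix;
        pg_compress_algorithm algorithm;
    }           tar_suffixes[] = {
        {".tar", PG_COMPRESSION_NONE},
        {".tgz", PG_COMPRESSION_GZIP},
        {".tar.gz", PG_COMPRESSION_GZIP},
        {".tar.lz4", PG_COMPRESSION_LZ4},
        {".tar.zst", PG_COMPRESSION_ZSTD},
    };

    for (size_t i = 0; i < sizeof(tar_suffixes) / sizeof(tar_suffixes[0]); i++)
    {
        if (ends_with(fname, tar_suffixes[i].suffix))
        {
            *algorithm = tar_suffixes[i].algorithm;
            return true;
        }
    }
    return false;               /* not a recognized tar archive name */
}
```

As in the patch, lengths are compared with `>=`, so a name that is exactly `.tar` is accepted, whereas the removed pg_basebackup code required `> 4`. The shared helper also recognizes `.tgz`, which the old pg_basebackup checks did not.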
[application/x-patch] v21-0002-pg_waldump-Preparatory-refactoring-for-tar-archi.patch (8.4K, 3-v21-0002-pg_waldump-Preparatory-refactoring-for-tar-archi.patch)
From 73b7ca37810df5c30391f4f09a199e672acd6b75 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Thu, 19 Mar 2026 15:43:39 +0530
Subject: [PATCH v21 2/5] pg_waldump: Preparatory refactoring for tar archive
WAL decoding.
Several refactoring steps in preparation for adding tar archive WAL
decoding support to pg_waldump:
- Move XLogDumpPrivate and related declarations into a new pg_waldump.h
header, allowing a second source file to share them.
- Factor out required_read_len() so the read-size calculation can be
reused for both regular WAL files and tar-archived WAL.
- Move the WAL segment size variable into XLogDumpPrivate and rename it
to segsize, making it accessible to the archive streamer code.
Author: Amul Sul <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Jakub Wartak <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Euler Taveira <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Discussion: https://postgr.es/m/CAAJ_b94bqdWN3h2J-PzzzQ2Npbwct5ZQHggn_QoYGhC2rn-=WQ@mail.gmail.com
---
src/bin/pg_waldump/pg_waldump.c | 78 +++++++++++++++++++--------------
src/bin/pg_waldump/pg_waldump.h | 26 +++++++++++
2 files changed, 70 insertions(+), 34 deletions(-)
create mode 100644 src/bin/pg_waldump/pg_waldump.h
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index f3446385d6a..5d31b15dbd8 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -29,6 +29,7 @@
#include "common/logging.h"
#include "common/relpath.h"
#include "getopt_long.h"
+#include "pg_waldump.h"
#include "rmgrdesc.h"
#include "storage/bufpage.h"
@@ -43,14 +44,6 @@ static volatile sig_atomic_t time_to_stop = false;
static const RelFileLocator emptyRelFileLocator = {0, 0, 0};
-typedef struct XLogDumpPrivate
-{
- TimeLineID timeline;
- XLogRecPtr startptr;
- XLogRecPtr endptr;
- bool endptr_reached;
-} XLogDumpPrivate;
-
typedef struct XLogDumpConfig
{
/* display options */
@@ -333,6 +326,32 @@ identify_target_directory(char *directory, char *fname, int *WalSegSz)
return NULL; /* not reached */
}
+/*
+ * Returns the size in bytes of the data to be read. Returns -1 if the end
+ * point has already been reached.
+ */
+static inline int
+required_read_len(XLogDumpPrivate *private, XLogRecPtr targetPagePtr,
+ int reqLen)
+{
+ int count = XLOG_BLCKSZ;
+
+ if (XLogRecPtrIsValid(private->endptr))
+ {
+ if (targetPagePtr + XLOG_BLCKSZ <= private->endptr)
+ count = XLOG_BLCKSZ;
+ else if (targetPagePtr + reqLen <= private->endptr)
+ count = private->endptr - targetPagePtr;
+ else
+ {
+ private->endptr_reached = true;
+ return -1;
+ }
+ }
+
+ return count;
+}
+
/* pg_waldump's XLogReaderRoutine->segment_open callback */
static void
WALDumpOpenSegment(XLogReaderState *state, XLogSegNo nextSegNo,
@@ -390,21 +409,12 @@ WALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
XLogRecPtr targetPtr, char *readBuff)
{
XLogDumpPrivate *private = state->private_data;
- int count = XLOG_BLCKSZ;
+ int count = required_read_len(private, targetPagePtr, reqLen);
WALReadError errinfo;
- if (XLogRecPtrIsValid(private->endptr))
- {
- if (targetPagePtr + XLOG_BLCKSZ <= private->endptr)
- count = XLOG_BLCKSZ;
- else if (targetPagePtr + reqLen <= private->endptr)
- count = private->endptr - targetPagePtr;
- else
- {
- private->endptr_reached = true;
- return -1;
- }
- }
+ /* Bail out if the count to be read is not valid */
+ if (count < 0)
+ return -1;
if (!WALRead(state, readBuff, targetPagePtr, count, private->timeline,
&errinfo))
@@ -801,7 +811,6 @@ main(int argc, char **argv)
XLogRecPtr first_record;
char *waldir = NULL;
char *errormsg;
- int WalSegSz;
static struct option long_options[] = {
{"bkp-details", no_argument, NULL, 'b'},
@@ -855,6 +864,7 @@ main(int argc, char **argv)
memset(&stats, 0, sizeof(XLogStats));
private.timeline = 1;
+ private.segsize = 0;
private.startptr = InvalidXLogRecPtr;
private.endptr = InvalidXLogRecPtr;
private.endptr_reached = false;
@@ -1128,18 +1138,18 @@ main(int argc, char **argv)
pg_fatal("could not open directory \"%s\": %m", waldir);
}
- waldir = identify_target_directory(waldir, fname, &WalSegSz);
+ waldir = identify_target_directory(waldir, fname, &private.segsize);
fd = open_file_in_directory(waldir, fname);
if (fd < 0)
pg_fatal("could not open file \"%s\"", fname);
close(fd);
/* parse position from file */
- XLogFromFileName(fname, &private.timeline, &segno, WalSegSz);
+ XLogFromFileName(fname, &private.timeline, &segno, private.segsize);
if (!XLogRecPtrIsValid(private.startptr))
- XLogSegNoOffsetToRecPtr(segno, 0, WalSegSz, private.startptr);
- else if (!XLByteInSeg(private.startptr, segno, WalSegSz))
+ XLogSegNoOffsetToRecPtr(segno, 0, private.segsize, private.startptr);
+ else if (!XLByteInSeg(private.startptr, segno, private.segsize))
{
pg_log_error("start WAL location %X/%08X is not inside file \"%s\"",
LSN_FORMAT_ARGS(private.startptr),
@@ -1149,7 +1159,7 @@ main(int argc, char **argv)
/* no second file specified, set end position */
if (!(optind + 1 < argc) && !XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(segno + 1, 0, WalSegSz, private.endptr);
+ XLogSegNoOffsetToRecPtr(segno + 1, 0, private.segsize, private.endptr);
/* parse ENDSEG if passed */
if (optind + 1 < argc)
@@ -1165,14 +1175,14 @@ main(int argc, char **argv)
close(fd);
/* parse position from file */
- XLogFromFileName(fname, &private.timeline, &endsegno, WalSegSz);
+ XLogFromFileName(fname, &private.timeline, &endsegno, private.segsize);
if (endsegno < segno)
pg_fatal("ENDSEG %s is before STARTSEG %s",
argv[optind + 1], argv[optind]);
if (!XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(endsegno + 1, 0, WalSegSz,
+ XLogSegNoOffsetToRecPtr(endsegno + 1, 0, private.segsize,
private.endptr);
/* set segno to endsegno for check of --end */
@@ -1180,8 +1190,8 @@ main(int argc, char **argv)
}
- if (!XLByteInSeg(private.endptr, segno, WalSegSz) &&
- private.endptr != (segno + 1) * WalSegSz)
+ if (!XLByteInSeg(private.endptr, segno, private.segsize) &&
+ private.endptr != (segno + 1) * private.segsize)
{
pg_log_error("end WAL location %X/%08X is not inside file \"%s\"",
LSN_FORMAT_ARGS(private.endptr),
@@ -1190,7 +1200,7 @@ main(int argc, char **argv)
}
}
else
- waldir = identify_target_directory(waldir, NULL, &WalSegSz);
+ waldir = identify_target_directory(waldir, NULL, &private.segsize);
/* we don't know what to print */
if (!XLogRecPtrIsValid(private.startptr))
@@ -1203,7 +1213,7 @@ main(int argc, char **argv)
/* we have everything we need, start reading */
xlogreader_state =
- XLogReaderAllocate(WalSegSz, waldir,
+ XLogReaderAllocate(private.segsize, waldir,
XL_ROUTINE(.page_read = WALDumpReadPage,
.segment_open = WALDumpOpenSegment,
.segment_close = WALDumpCloseSegment),
@@ -1224,7 +1234,7 @@ main(int argc, char **argv)
* a segment (e.g. we were used in file mode).
*/
if (first_record != private.startptr &&
- XLogSegmentOffset(private.startptr, WalSegSz) != 0)
+ XLogSegmentOffset(private.startptr, private.segsize) != 0)
pg_log_info(ngettext("first record is after %X/%08X, at %X/%08X, skipping over %u byte",
"first record is after %X/%08X, at %X/%08X, skipping over %u bytes",
(first_record - private.startptr)),
diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h
new file mode 100644
index 00000000000..013b051506f
--- /dev/null
+++ b/src/bin/pg_waldump/pg_waldump.h
@@ -0,0 +1,26 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_waldump.h - decode and display WAL
+ *
+ * Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_waldump/pg_waldump.h
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_WALDUMP_H
+#define PG_WALDUMP_H
+
+#include "access/xlogdefs.h"
+
+/* Contains the necessary information to drive WAL decoding */
+typedef struct XLogDumpPrivate
+{
+ TimeLineID timeline;
+ int segsize;
+ XLogRecPtr startptr;
+ XLogRecPtr endptr;
+ bool endptr_reached;
+} XLogDumpPrivate;
+
+#endif /* PG_WALDUMP_H */
--
2.47.1
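The page-length clamping that 0002 factors out into `required_read_len()` is easy to check in isolation. The sketch below assumes the default 8 kB `XLOG_BLCKSZ` and uses a trimmed-down state struct (`DumpState` is a hypothetical stand-in for `XLogDumpPrivate`); the branch logic mirrors the function in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define XLOG_BLCKSZ 8192        /* assumed: PostgreSQL's default WAL page size */
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

typedef uint64_t XLogRecPtr;

/* Hypothetical trimmed-down stand-in for XLogDumpPrivate. */
typedef struct
{
    XLogRecPtr  endptr;         /* InvalidXLogRecPtr means "no end limit" */
    bool        endptr_reached;
} DumpState;

/*
 * Bytes to read from the page at targetPagePtr, or -1 (with
 * endptr_reached set) if even reqLen bytes would pass endptr.
 */
static int
required_read_len(DumpState *state, XLogRecPtr targetPagePtr, int reqLen)
{
    int         count = XLOG_BLCKSZ;

    if (state->endptr != InvalidXLogRecPtr)
    {
        if (targetPagePtr + XLOG_BLCKSZ <= state->endptr)
            count = XLOG_BLCKSZ;    /* whole page lies before the end */
        else if (targetPagePtr + reqLen <= state->endptr)
            count = (int) (state->endptr - targetPagePtr);  /* partial page */
        else
        {
            state->endptr_reached = true;   /* caller must stop reading */
            return -1;
        }
    }
    return count;
}
```

With `endptr` 100 bytes into a page, a 24-byte request returns 100, while a 200-byte request returns -1 and sets `endptr_reached`; with no end pointer, a full page is always requested.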
[application/x-patch] v21-0003-pg_waldump-Add-support-for-reading-WAL-from-tar-.patch (56.4K, 4-v21-0003-pg_waldump-Add-support-for-reading-WAL-from-tar-.patch)
From 9c87d8e8de19ed7be63874aea61df19a0cb41dd2 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Thu, 19 Mar 2026 15:43:46 +0530
Subject: [PATCH v21 3/5] pg_waldump: Add support for reading WAL from tar
archives
pg_waldump can now accept the path to a tar archive (optionally
compressed with gzip, lz4, or zstd) containing WAL files and decode
them. This was added primarily for pg_verifybackup, which previously
had to skip WAL parsing for tar-format backups.
The implementation uses the existing archive streamer infrastructure
with a hash table to track WAL segments read from the archive. If WAL
files within the archive are not in sequential order, out-of-order
segments are written to a temporary directory (created via mkdtemp under
$TMPDIR or the archive's directory) and read back when needed. An
atexit callback ensures the temporary directory is cleaned up.
The --follow option is not supported when reading from a tar archive.
Author: Amul Sul <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Jakub Wartak <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Euler Taveira <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Reviewed-by: Zsolt Parragi <[email protected]>
Discussion: https://postgr.es/m/CAAJ_b94bqdWN3h2J-PzzzQ2Npbwct5ZQHggn_QoYGhC2rn-=WQ@mail.gmail.com
---
doc/src/sgml/ref/pg_waldump.sgml | 23 +-
src/bin/pg_waldump/Makefile | 7 +-
src/bin/pg_waldump/archive_waldump.c | 847 +++++++++++++++++++++++++++
src/bin/pg_waldump/meson.build | 4 +-
src/bin/pg_waldump/pg_waldump.c | 295 ++++++++--
src/bin/pg_waldump/pg_waldump.h | 50 ++
src/bin/pg_waldump/t/001_basic.pl | 246 ++++++--
src/tools/pgindent/typedefs.list | 4 +
8 files changed, 1346 insertions(+), 130 deletions(-)
create mode 100644 src/bin/pg_waldump/archive_waldump.c
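To make the out-of-order handling described in the 0003 commit message concrete, the following is a simplified model, not the patch's implementation: segments are plain integers rather than file names, a boolean array stands in for the hash table, and "spilling" is only counted rather than written to the temporary directory. `count_spilled_segments()` and `MAX_RANGE` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_RANGE 64            /* sketch limit on the requested segment range */

/*
 * Simplified model of reading WAL segments [start, end] in order from an
 * archive whose members arrive in 'archive_order'.  A segment that shows
 * up before it is needed is "spilled" (the patch writes such segments to
 * a temporary directory); segments outside the range are discarded.
 * Returns the number of spilled segments, or -1 if a segment is missing.
 */
static int
count_spilled_segments(const unsigned *archive_order, size_t n,
                       unsigned start, unsigned end)
{
    bool        spilled[MAX_RANGE] = {false};
    size_t      pos = 0;
    int         nspilled = 0;

    if (end < start || end - start >= MAX_RANGE)
        return -1;

    for (unsigned i = 0; i <= end - start; i++)
    {
        unsigned    next = start + i;
        bool        found = spilled[i]; /* spilled earlier: read it back */

        /* Otherwise keep streaming archive members until 'next' appears. */
        while (!found && pos < n)
        {
            unsigned    seg = archive_order[pos++];

            if (seg == next)
                found = true;   /* in order: decode directly */
            else if (seg > next && seg <= end && !spilled[seg - start])
            {
                spilled[seg - start] = true;    /* needed later: spill it */
                nspilled++;
            }
            /* else: outside the range or already handled, so discard it */
        }
        if (!found)
            return -1;
    }
    return nspilled;
}
```

For an archive that stores segments 3, 1, 2 and a requested range of 1..3, segment 3 arrives early and is spilled once, so the function returns 1; with the segments stored in order it returns 0, which is the case where no temporary files are needed at all.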
diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index d1715ff5124..9bbb4bd5772 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -141,13 +141,21 @@ PostgreSQL documentation
<term><option>--path=<replaceable>path</replaceable></option></term>
<listitem>
<para>
- Specifies a directory to search for WAL segment files or a
- directory with a <literal>pg_wal</literal> subdirectory that
+ Specifies a tar archive of WAL segment files, a directory to search for
+ such files, or a directory with a <literal>pg_wal</literal> subdirectory that
contains such files. The default is to search in the current
directory, the <literal>pg_wal</literal> subdirectory of the
current directory, and the <literal>pg_wal</literal> subdirectory
of <envar>PGDATA</envar>.
</para>
+ <para>
+ If a tar archive is provided and its WAL segment files are not in
+ sequential order, those files will be written to a temporary directory
+ named starting with <filename>waldump_tmp</filename>. This directory will be
+ created inside the directory specified by the <envar>TMPDIR</envar>
+ environment variable if it is set; otherwise, it will be created within
+ the same directory as the tar archive.
+ </para>
</listitem>
</varlistentry>
@@ -383,6 +391,17 @@ PostgreSQL documentation
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><envar>TMPDIR</envar></term>
+ <listitem>
+ <para>
+ Directory in which to create temporary files when reading WAL from a
+ tar archive with out-of-order segment files. If not set, the temporary
+ directory is created within the same directory as the tar archive.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</refsect1>
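The TMPDIR fallback documented above amounts to roughly the following. This is a POSIX-only sketch with hypothetical names (setup_tmp_dir(), cleanup_tmp_dir()). The patch's exit handler calls rmtree(), but this one only removes an empty directory, so it assumes any spilled files were already unlinked.

```c
#define _XOPEN_SOURCE 700       /* for mkdtemp() and unsetenv() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical globals: the scratch directory path and whether it exists. */
static char tmp_dir[4096];
static int  tmp_dir_created = 0;

/* atexit() handler: remove the scratch directory (must be empty here). */
static void
cleanup_tmp_dir(void)
{
    if (tmp_dir_created)
        rmdir(tmp_dir);
}

/*
 * Create "<base>/waldump_tmp-XXXXXX", where <base> is $TMPDIR if it is set
 * and fallback_dir (e.g. the tar archive's directory) otherwise. Call once.
 * Returns the new directory's path, or NULL on failure.
 */
static const char *
setup_tmp_dir(const char *fallback_dir)
{
    const char *base = getenv("TMPDIR");

    if (base == NULL)
        base = fallback_dir;

    if (snprintf(tmp_dir, sizeof(tmp_dir), "%s/waldump_tmp-XXXXXX", base)
        >= (int) sizeof(tmp_dir))
        return NULL;            /* path would be truncated */

    if (mkdtemp(tmp_dir) == NULL)   /* replaces XXXXXX, creates with 0700 */
        return NULL;

    tmp_dir_created = 1;
    atexit(cleanup_tmp_dir);
    return tmp_dir;
}
```

The directory is registered for cleanup only after mkdtemp() succeeds, so the exit handler never touches a path it did not create.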
diff --git a/src/bin/pg_waldump/Makefile b/src/bin/pg_waldump/Makefile
index 4c1ee649501..aabb87566a2 100644
--- a/src/bin/pg_waldump/Makefile
+++ b/src/bin/pg_waldump/Makefile
@@ -3,6 +3,9 @@
PGFILEDESC = "pg_waldump - decode and display WAL"
PGAPPICON=win32
+# make this available to TAP test scripts
+export TAR
+
subdir = src/bin/pg_waldump
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
@@ -10,13 +13,15 @@ include $(top_builddir)/src/Makefile.global
OBJS = \
$(RMGRDESCOBJS) \
$(WIN32RES) \
+ archive_waldump.o \
compat.o \
pg_waldump.o \
rmgrdesc.o \
xlogreader.o \
xlogstats.o
-override CPPFLAGS := -DFRONTEND $(CPPFLAGS)
+override CPPFLAGS := -DFRONTEND -I$(libpq_srcdir) $(CPPFLAGS)
+LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils
RMGRDESCSOURCES = $(sort $(notdir $(wildcard $(top_srcdir)/src/backend/access/rmgrdesc/*desc*.c)))
RMGRDESCOBJS = $(patsubst %.c,%.o,$(RMGRDESCSOURCES))
diff --git a/src/bin/pg_waldump/archive_waldump.c b/src/bin/pg_waldump/archive_waldump.c
new file mode 100644
index 00000000000..9cbcae3e8af
--- /dev/null
+++ b/src/bin/pg_waldump/archive_waldump.c
@@ -0,0 +1,847 @@
+/*-------------------------------------------------------------------------
+ *
+ * archive_waldump.c
+ * A generic facility for reading WAL data from tar archives via archive
+ * streamer.
+ *
+ * Portions Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_waldump/archive_waldump.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <unistd.h>
+
+#include "access/xlog_internal.h"
+#include "common/file_perm.h"
+#include "common/hashfn.h"
+#include "common/logging.h"
+#include "fe_utils/simple_list.h"
+#include "pg_waldump.h"
+
+/*
+ * How many bytes should we try to read from a file at once?
+ */
+#define READ_CHUNK_SIZE (128 * 1024)
+
+/* Temporary exported WAL file directory */
+char *TmpWalSegDir = NULL;
+
+/*
+ * Check if the start segment number is zero; this indicates a request to read
+ * any WAL file.
+ */
+#define READ_ANY_WAL(privateInfo) ((privateInfo)->start_segno == 0)
+
+/*
+ * Hash entry representing a WAL segment retrieved from the archive.
+ *
+ * While WAL segments are typically read sequentially, individual entries
+ * maintain their own buffers for the following reasons:
+ *
+ * 1. Boundary Handling: The archive streamer provides a continuous byte
+ * stream. A single streaming chunk may contain the end of one WAL segment
+ * and the start of the next. Separate buffers allow us to easily
+ * partition and track these bytes by their respective segments.
+ *
+ * 2. Out-of-Order Support: Dedicated buffers simplify logic when segments
+ * are archived or retrieved out of sequence.
+ *
+ * To minimize the memory footprint, entries and their associated buffers are
+ * freed immediately once consumed. Since pg_waldump does not request the same
+ * bytes twice, a segment is discarded as soon as pg_waldump moves past it.
+ */
+typedef struct ArchivedWALFile
+{
+ uint32 status; /* hash status */
+ const char *fname; /* hash key: WAL segment name */
+
+ StringInfo buf; /* holds WAL bytes read from archive */
+ bool spilled; /* true if the WAL data was spilled to a
+ * temporary file */
+
+ int read_len; /* total bytes of this WAL segment read so far */
+} ArchivedWALFile;
+
+static uint32 hash_string_pointer(const char *s);
+#define SH_PREFIX ArchivedWAL
+#define SH_ELEMENT_TYPE ArchivedWALFile
+#define SH_KEY_TYPE const char *
+#define SH_KEY fname
+#define SH_HASH_KEY(tb, key) hash_string_pointer(key)
+#define SH_EQUAL(tb, a, b) (strcmp(a, b) == 0)
+#define SH_SCOPE static inline
+#define SH_RAW_ALLOCATOR pg_malloc0
+#define SH_DECLARE
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+typedef struct astreamer_waldump
+{
+ astreamer base;
+ XLogDumpPrivate *privateInfo;
+} astreamer_waldump;
+
+static ArchivedWALFile *get_archive_wal_entry(const char *fname,
+ XLogDumpPrivate *privateInfo,
+ int WalSegSz);
+static int read_archive_file(XLogDumpPrivate *privateInfo, Size count);
+static void setup_tmpwal_dir(const char *waldir);
+static void cleanup_tmpwal_dir_atexit(void);
+
+static FILE *prepare_tmp_write(const char *fname);
+static void perform_tmp_write(const char *fname, StringInfo buf, FILE *file);
+
+static astreamer *astreamer_waldump_new(XLogDumpPrivate *privateInfo);
+static void astreamer_waldump_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_waldump_finalize(astreamer *streamer);
+static void astreamer_waldump_free(astreamer *streamer);
+
+static bool member_is_wal_file(astreamer_waldump *mystreamer,
+ astreamer_member *member,
+ char **fname);
+
+static const astreamer_ops astreamer_waldump_ops = {
+ .content = astreamer_waldump_content,
+ .finalize = astreamer_waldump_finalize,
+ .free = astreamer_waldump_free
+};
+
+/*
+ * Initializes the tar archive reader: opens the archive, builds a hash table
+ * for WAL entries, reads ahead until a full WAL page header is available to
+ * determine the WAL segment size, and computes start/end segment numbers for
+ * filtering. It also sets up a temporary directory for out-of-order WAL data
+ * and registers an atexit callback to clean it up.
+ */
+void
+init_archive_reader(XLogDumpPrivate *privateInfo, const char *waldir,
+ int *WalSegSz, pg_compress_algorithm compression)
+{
+ int fd;
+ astreamer *streamer;
+ ArchivedWALFile *entry = NULL;
+ XLogLongPageHeader longhdr;
+ XLogSegNo segno;
+ TimeLineID timeline;
+
+ /* Open tar archive and store its file descriptor */
+ fd = open_file_in_directory(waldir, privateInfo->archive_name);
+
+ if (fd < 0)
+ pg_fatal("could not open file \"%s\"", privateInfo->archive_name);
+
+ privateInfo->archive_fd = fd;
+
+ streamer = astreamer_waldump_new(privateInfo);
+
+ /* We must first parse the tar archive. */
+ streamer = astreamer_tar_parser_new(streamer);
+
+ /* If the archive is compressed, decompress before parsing. */
+ if (compression == PG_COMPRESSION_GZIP)
+ streamer = astreamer_gzip_decompressor_new(streamer);
+ else if (compression == PG_COMPRESSION_LZ4)
+ streamer = astreamer_lz4_decompressor_new(streamer);
+ else if (compression == PG_COMPRESSION_ZSTD)
+ streamer = astreamer_zstd_decompressor_new(streamer);
+
+ privateInfo->archive_streamer = streamer;
+
+ /*
+ * Allocate a buffer for reading the archive file to facilitate content
+ * decoding; read requests must not exceed the allocated buffer size.
+ */
+ privateInfo->archive_read_buf = pg_malloc(READ_CHUNK_SIZE);
+
+#ifdef USE_ASSERT_CHECKING
+ privateInfo->archive_read_buf_size = READ_CHUNK_SIZE;
+#endif
+
+ /*
+ * Hash table storing WAL entries read from the archive with an arbitrary
+ * initial size.
+ */
+ privateInfo->archive_wal_htab = ArchivedWAL_create(8, NULL);
+
+ /*
+ * Read until we have at least one full WAL page (XLOG_BLCKSZ bytes) from
+ * the first WAL segment in the archive so we can extract the WAL segment
+ * size from the long page header.
+ */
+ while (entry == NULL || entry->buf->len < XLOG_BLCKSZ)
+ {
+ if (read_archive_file(privateInfo, XLOG_BLCKSZ) == 0)
+ pg_fatal("could not find WAL in archive \"%s\"",
+ privateInfo->archive_name);
+
+ entry = privateInfo->cur_file;
+ }
+
+ /* Set WalSegSz if WAL data is successfully read */
+ longhdr = (XLogLongPageHeader) entry->buf->data;
+
+ if (!IsValidWalSegSize(longhdr->xlp_seg_size))
+ {
+ pg_log_error(ngettext("invalid WAL segment size in WAL file from archive \"%s\" (%d byte)",
+ "invalid WAL segment size in WAL file from archive \"%s\" (%d bytes)",
+ longhdr->xlp_seg_size),
+ privateInfo->archive_name, longhdr->xlp_seg_size);
+ pg_log_error_detail("The WAL segment size must be a power of two between 1 MB and 1 GB.");
+ exit(1);
+ }
+
+ *WalSegSz = longhdr->xlp_seg_size;
+
+ /*
+ * With the WAL segment size available, we can now initialize the
+ * dependent start and end segment numbers.
+ */
+ Assert(!XLogRecPtrIsInvalid(privateInfo->startptr));
+ XLByteToSeg(privateInfo->startptr, privateInfo->start_segno, *WalSegSz);
+
+ if (!XLogRecPtrIsInvalid(privateInfo->endptr))
+ XLByteToSeg(privateInfo->endptr, privateInfo->end_segno, *WalSegSz);
+
+ /*
+ * This WAL segment was fetched before the filtering parameters
+ * (start_segno and end_segno) were fully initialized. Check it against
+ * the requested timeline and segment range now; if it falls outside,
+ * remove it from the hash table. Subsequent WAL segments will be
+ * filtered automatically by the archive streamer using the updated
+ * start_segno and end_segno values.
+ */
+ XLogFromFileName(entry->fname, &timeline, &segno, privateInfo->segsize);
+ if (privateInfo->timeline != timeline ||
+ privateInfo->start_segno > segno ||
+ privateInfo->end_segno < segno)
+ free_archive_wal_entry(entry->fname, privateInfo);
+
+ /*
+ * Set up a temporary directory to store WAL segments, and register an
+ * exit callback to remove it upon completion.
+ */
+ setup_tmpwal_dir(waldir);
+ atexit(cleanup_tmpwal_dir_atexit);
+}
+
+/*
+ * Release the archive streamer chain and close the archive file.
+ */
+void
+free_archive_reader(XLogDumpPrivate *privateInfo)
+{
+ /*
+ * NB: Normally, astreamer_finalize() is called before astreamer_free() to
+ * flush any remaining buffered data or to ensure the end of the tar
+ * archive is reached. However, when decoding WAL, once we hit the end
+ * LSN, any remaining buffered data or unread portion of the archive can
+ * be safely ignored.
+ */
+ astreamer_free(privateInfo->archive_streamer);
+
+ /* Free any remaining hash table entries and their buffers. */
+ if (privateInfo->archive_wal_htab != NULL)
+ {
+ ArchivedWAL_iterator iter;
+ ArchivedWALFile *entry;
+
+ ArchivedWAL_start_iterate(privateInfo->archive_wal_htab, &iter);
+ while ((entry = ArchivedWAL_iterate(privateInfo->archive_wal_htab,
+ &iter)) != NULL)
+ {
+ if (entry->buf != NULL)
+ destroyStringInfo(entry->buf);
+ }
+ ArchivedWAL_destroy(privateInfo->archive_wal_htab);
+ privateInfo->archive_wal_htab = NULL;
+ }
+
+ /* Free the reusable read buffer. */
+ if (privateInfo->archive_read_buf != NULL)
+ {
+ pg_free(privateInfo->archive_read_buf);
+ privateInfo->archive_read_buf = NULL;
+ }
+
+ /* Close the file. */
+ if (close(privateInfo->archive_fd) != 0)
+ pg_log_error("could not close file \"%s\": %m",
+ privateInfo->archive_name);
+}
+
+/*
+ * Copies the requested WAL data from the hash entry's buffer into readBuff.
+ * If the buffer does not yet contain the needed bytes, fetches more data from
+ * the tar archive via the archive streamer.
+ */
+int
+read_archive_wal_page(XLogDumpPrivate *privateInfo, XLogRecPtr targetPagePtr,
+ Size count, char *readBuff, int WalSegSz)
+{
+ char *p = readBuff;
+ Size nbytes = count;
+ XLogRecPtr recptr = targetPagePtr;
+ XLogSegNo segno;
+ char fname[MAXFNAMELEN];
+ ArchivedWALFile *entry;
+
+ /* Identify the segment and locate its entry in the archive hash */
+ XLByteToSeg(targetPagePtr, segno, WalSegSz);
+ XLogFileName(fname, privateInfo->timeline, segno, WalSegSz);
+ entry = get_archive_wal_entry(fname, privateInfo, WalSegSz);
+
+ while (nbytes > 0)
+ {
+ char *buf = entry->buf->data;
+ int bufLen = entry->buf->len;
+ XLogRecPtr endPtr;
+ XLogRecPtr startPtr;
+
+ /*
+ * Calculate the LSN range currently residing in the buffer.
+ *
+ * read_len tracks total bytes received for this segment (including
+ * already-discarded data), so endPtr is the LSN just past the last
+ * buffered byte, and startPtr is the LSN of the first buffered byte.
+ */
+ XLogSegNoOffsetToRecPtr(segno, entry->read_len, WalSegSz, endPtr);
+ startPtr = endPtr - bufLen;
+
+ /*
+ * Copy the requested WAL bytes if they are present in the buffer.
+ */
+ if (bufLen > 0 && startPtr <= recptr && recptr < endPtr)
+ {
+ int copyBytes;
+ int offset = recptr - startPtr;
+
+ /*
+ * Given startPtr <= recptr < endPtr and a total buffer size
+ * 'bufLen', the offset (recptr - startPtr) will always be less
+ * than 'bufLen'.
+ */
+ Assert(offset < bufLen);
+
+ copyBytes = Min(nbytes, bufLen - offset);
+ memcpy(p, buf + offset, copyBytes);
+
+ /* Update state for read */
+ recptr += copyBytes;
+ nbytes -= copyBytes;
+ p += copyBytes;
+ }
+ else
+ {
+ /*
+ * Before starting the actual decoding loop, pg_waldump tries to
+ * locate the first valid record from the user-specified start
+ * position, which might not be the start of a WAL record and
+ * could fall in the middle of a record that spans multiple pages.
+ * Consequently, the valid start position the decoder is looking
+ * for could be far away from that initial position.
+ *
+ * This may involve reading across multiple pages, and this
+ * pre-reading fetches data in multiple rounds from the archive
+ * streamer; normally, we would throw away existing buffer
+ * contents to fetch the next set of data, but that existing data
+ * might be needed once the main loop starts. Because previously
+ * read data cannot be re-read by the archive streamer, we delay
+ * resetting the buffer until the main decoding loop is entered.
+ *
+ * Once pg_waldump has entered the main loop, it may re-read the
+ * currently active page, but never an older one; therefore, any
+ * fully consumed WAL data preceding the current page can then be
+ * safely discarded.
+ */
+ if (privateInfo->decoding_started)
+ {
+ resetStringInfo(entry->buf);
+
+ /*
+ * Push back the partial page data for the current page to the
+ * buffer, ensuring a full page remains available for
+ * re-reading if requested.
+ */
+ if (p > readBuff)
+ {
+ Assert((count - nbytes) > 0);
+ appendBinaryStringInfo(entry->buf, readBuff, count - nbytes);
+ }
+ }
+
+ /*
+ * Now, fetch more data. Raise an error if the archive streamer
+ * has moved past our segment (meaning the WAL file in the archive
+ * is shorter than expected) or if reading the archive reached
+ * EOF.
+ */
+ if (privateInfo->cur_file != entry)
+ pg_fatal("WAL segment \"%s\" in archive \"%s\" is too short: read %lld of %lld bytes",
+ fname, privateInfo->archive_name,
+ (long long int) (count - nbytes),
+ (long long int) count);
+ if (read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
+ pg_fatal("unexpected end of archive \"%s\" while reading \"%s\": read %lld of %lld bytes",
+ privateInfo->archive_name, fname,
+ (long long int) (count - nbytes),
+ (long long int) count);
+ }
+ }
+
+ /*
+ * Should have successfully read all the requested bytes or reported a
+ * failure before this point.
+ */
+ Assert(nbytes == 0);
+
+ /*
+ * NB: We return count unchanged. We could return a boolean since we
+ * either successfully read the WAL page or raise an error, but the caller
+ * expects this value to be returned. The routine that reads WAL pages
+ * from physical WAL files follows the same convention.
+ */
+ return count;
+}
+
+/*
+ * Releases the buffer of a WAL entry that is no longer needed, preventing the
+ * accumulation of irrelevant WAL data. Also removes any associated temporary
+ * file and clears privateInfo->cur_file if it points to this entry, so the
+ * archive streamer skips subsequent data for it.
+ */
+void
+free_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo)
+{
+ ArchivedWALFile *entry;
+
+ entry = ArchivedWAL_lookup(privateInfo->archive_wal_htab, fname);
+
+ if (entry == NULL)
+ return;
+
+ /* Destroy the buffer */
+ destroyStringInfo(entry->buf);
+ entry->buf = NULL;
+
+ /* Remove temporary file if any */
+ if (entry->spilled)
+ {
+ char fpath[MAXPGPATH];
+
+ snprintf(fpath, MAXPGPATH, "%s/%s", TmpWalSegDir, fname);
+
+ if (unlink(fpath) == 0)
+ pg_log_debug("removed file \"%s\"", fpath);
+ }
+
+ /* Clear cur_file if it points to the entry being freed */
+ if (privateInfo->cur_file == entry)
+ privateInfo->cur_file = NULL;
+
+ ArchivedWAL_delete_item(privateInfo->archive_wal_htab, entry);
+}
+
+/*
+ * Returns the archived WAL entry from the hash table if it already exists.
+ * Otherwise, reads more data from the archive until the requested entry is
+ * found. If the archive streamer is reading a WAL file from the archive that
+ * is not currently needed, that data is spilled to a temporary file for later
+ * retrieval.
+ */
+static ArchivedWALFile *
+get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo,
+ int WalSegSz)
+{
+ ArchivedWALFile *entry = NULL;
+ FILE *write_fp = NULL;
+
+ /*
+ * Search the hash table first. If the entry is found, return it.
+ * Otherwise, the requested WAL entry hasn't been read from the archive
+ * yet; invoke the archive streamer to fetch it.
+ */
+ while (1)
+ {
+ /*
+ * Search hash table.
+ *
+ * We perform the search inside the loop because a single iteration of
+ * the archive reader may decompress and extract multiple files into
+ * the hash table. One of these newly added files could be the one we
+ * are seeking.
+ */
+ entry = ArchivedWAL_lookup(privateInfo->archive_wal_htab, fname);
+
+ if (entry != NULL)
+ return entry;
+
+ /*
+ * The WAL file entry currently being processed may change during
+ * archive streamer execution. Therefore, maintain a local variable to
+ * reference the previous entry, ensuring that any remaining data in
+ * its buffer is successfully flushed to the temporary file before
+ * switching to the next WAL entry.
+ */
+ entry = privateInfo->cur_file;
+
+ /*
+ * Fetch more data either when no current file is being tracked or
+ * when its buffer has been fully flushed to the temporary file.
+ */
+ if (entry == NULL || entry->buf->len == 0)
+ {
+ if (read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
+ break; /* archive file ended */
+ }
+
+ /*
+ * Archive streamer is reading a non-WAL file or an irrelevant WAL
+ * file.
+ */
+ if (entry == NULL)
+ continue;
+
+ /*
+ * Archive streamer is currently reading a file that isn't the one
+ * asked for, but it's required in the future. It should be written to
+ * a temporary location for retrieval when needed.
+ */
+ Assert(strcmp(fname, entry->fname) != 0);
+
+ /* Create a temporary file if one does not already exist */
+ if (!entry->spilled)
+ {
+ write_fp = prepare_tmp_write(entry->fname);
+ entry->spilled = true;
+ }
+
+ /* Flush data from the buffer to the file */
+ perform_tmp_write(entry->fname, entry->buf, write_fp);
+ resetStringInfo(entry->buf);
+
+ /*
+ * The change in the current segment entry indicates that the reading
+ * of this file has ended.
+ */
+ if (entry != privateInfo->cur_file && write_fp != NULL)
+ {
+ fclose(write_fp);
+ write_fp = NULL;
+ }
+ }
+
+ /* Requested WAL segment not found */
+ pg_fatal("could not find WAL \"%s\" in archive \"%s\"",
+ fname, privateInfo->archive_name);
+}
+
+/*
+ * Reads up to 'count' bytes from the archive and feeds them to the archive
+ * streamer for decompression and parsing. Returns bytes read (0 at EOF).
+ */
+static int
+read_archive_file(XLogDumpPrivate *privateInfo, Size count)
+{
+ int rc;
+
+ /* The read request must not exceed the allocated buffer size. */
+ Assert(privateInfo->archive_read_buf_size >= count);
+
+ rc = read(privateInfo->archive_fd, privateInfo->archive_read_buf, count);
+ if (rc < 0)
+ pg_fatal("could not read file \"%s\": %m",
+ privateInfo->archive_name);
+
+ /*
+ * Decompress (if required), and then parse the previously read contents
+ * of the tar file.
+ */
+ if (rc > 0)
+ astreamer_content(privateInfo->archive_streamer, NULL,
+ privateInfo->archive_read_buf, rc,
+ ASTREAMER_UNKNOWN);
+
+ return rc;
+}
+
+/*
+ * Set up a temporary directory for storing out-of-order WAL segments.
+ */
+static void
+setup_tmpwal_dir(const char *waldir)
+{
+ char *template;
+
+ /*
+ * Use the directory specified by the TMPDIR environment variable. If it's
+ * not set, fall back to the provided WAL directory to store WAL files
+ * temporarily.
+ */
+ template = psprintf("%s/waldump_tmp-XXXXXX",
+ getenv("TMPDIR") ? getenv("TMPDIR") : waldir);
+ TmpWalSegDir = mkdtemp(template);
+
+ if (TmpWalSegDir == NULL)
+ pg_fatal("could not create directory \"%s\": %m", template);
+
+ canonicalize_path(TmpWalSegDir);
+
+ pg_log_debug("created directory \"%s\"", TmpWalSegDir);
+}
+
+/*
+ * Remove temporary directory at exit, if any.
+ */
+static void
+cleanup_tmpwal_dir_atexit(void)
+{
+ rmtree(TmpWalSegDir, true);
+}
+
+/*
+ * Create an empty placeholder file and return its handle.
+ */
+static FILE *
+prepare_tmp_write(const char *fname)
+{
+ char fpath[MAXPGPATH];
+ FILE *file;
+
+ snprintf(fpath, MAXPGPATH, "%s/%s", TmpWalSegDir, fname);
+
+ /* Create an empty placeholder */
+ file = fopen(fpath, PG_BINARY_W);
+ if (file == NULL)
+ pg_fatal("could not create file \"%s\": %m", fpath);
+
+#ifndef WIN32
+ if (chmod(fpath, pg_file_create_mode))
+ pg_fatal("could not set permissions on file \"%s\": %m",
+ fpath);
+#endif
+
+ pg_log_debug("spilling to temporary file \"%s\"", fpath);
+
+ return file;
+}
+
+/*
+ * Write buffer data to the given file handle.
+ */
+static void
+perform_tmp_write(const char *fname, StringInfo buf, FILE *file)
+{
+ Assert(file);
+
+ errno = 0;
+ if (buf->len > 0 && fwrite(buf->data, buf->len, 1, file) != 1)
+ {
+ /*
+ * If write didn't set errno, assume problem is no disk space
+ */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_fatal("could not write to file \"%s/%s\": %m", TmpWalSegDir, fname);
+ }
+}
+
+/*
+ * Create an astreamer that can read WAL from a tar file.
+ */
+static astreamer *
+astreamer_waldump_new(XLogDumpPrivate *privateInfo)
+{
+ astreamer_waldump *streamer;
+
+ streamer = palloc0_object(astreamer_waldump);
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_waldump_ops;
+
+ streamer->privateInfo = privateInfo;
+
+ return &streamer->base;
+}
+
+/*
+ * Main entry point of the archive streamer for reading WAL data from a tar
+ * file. If a member is identified as a valid WAL file, a hash entry is created
+ * for it, and its contents are copied into that entry's buffer, making them
+ * accessible to the decoding routine.
+ */
+static void
+astreamer_waldump_content(astreamer *streamer, astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
+{
+ astreamer_waldump *mystreamer = (astreamer_waldump *) streamer;
+ XLogDumpPrivate *privateInfo = mystreamer->privateInfo;
+
+ Assert(context != ASTREAMER_UNKNOWN);
+
+ switch (context)
+ {
+ case ASTREAMER_MEMBER_HEADER:
+ {
+ char *fname = NULL;
+ ArchivedWALFile *entry;
+ bool found;
+
+ pg_log_debug("reading \"%s\"", member->pathname);
+
+ if (!member_is_wal_file(mystreamer, member, &fname))
+ break;
+
+ /*
+ * Further checks are skipped if any WAL file can be read. This
+ * is the case while init_archive_reader() reads the first WAL.
+ */
+ if (!READ_ANY_WAL(privateInfo))
+ {
+ XLogSegNo segno;
+ TimeLineID timeline;
+
+ /*
+ * Skip the segment if the timeline does not match or if it
+ * falls outside the caller-specified range.
+ */
+ XLogFromFileName(fname, &timeline, &segno, privateInfo->segsize);
+ if (privateInfo->timeline != timeline ||
+ privateInfo->start_segno > segno ||
+ privateInfo->end_segno < segno)
+ {
+ pfree(fname);
+ break;
+ }
+ }
+
+ entry = ArchivedWAL_insert(privateInfo->archive_wal_htab,
+ fname, &found);
+
+ /*
+ * Shouldn't happen, but if it does, simply ignore the
+ * duplicate WAL file.
+ */
+ if (found)
+ {
+ pg_log_warning("ignoring duplicate WAL \"%s\" found in archive \"%s\"",
+ member->pathname, privateInfo->archive_name);
+ pfree(fname);
+ break;
+ }
+
+ entry->buf = makeStringInfo();
+ entry->spilled = false;
+ entry->read_len = 0;
+ privateInfo->cur_file = entry;
+ }
+ break;
+
+ case ASTREAMER_MEMBER_CONTENTS:
+ if (privateInfo->cur_file)
+ {
+ appendBinaryStringInfo(privateInfo->cur_file->buf, data, len);
+ privateInfo->cur_file->read_len += len;
+ }
+ break;
+
+ case ASTREAMER_MEMBER_TRAILER:
+
+ /*
+ * End of this tar member; mark cur_file NULL so subsequent
+ * content callbacks (if any) know no WAL file is currently
+ * active.
+ */
+ privateInfo->cur_file = NULL;
+ break;
+
+ case ASTREAMER_ARCHIVE_TRAILER:
+ break;
+
+ default:
+ /* Shouldn't happen. */
+ pg_fatal("unexpected state while parsing tar file");
+ }
+}
+
+/*
+ * End-of-stream processing for an astreamer_waldump stream. This is a
+ * terminal streamer so it must have no successor.
+ */
+static void
+astreamer_waldump_finalize(astreamer *streamer)
+{
+ Assert(streamer->bbs_next == NULL);
+}
+
+/*
+ * Free memory associated with an astreamer_waldump stream.
+ */
+static void
+astreamer_waldump_free(astreamer *streamer)
+{
+ Assert(streamer->bbs_next == NULL);
+ pfree(streamer);
+}
+
+/*
+ * Returns true if the archive member name matches the WAL naming format. If
+ * successful, it also outputs the WAL segment name.
+ */
+static bool
+member_is_wal_file(astreamer_waldump *mystreamer, astreamer_member *member,
+ char **fname)
+{
+ int pathlen;
+ char pathname[MAXPGPATH];
+ char *filename;
+
+ /* We are only interested in normal files */
+ if (member->is_directory || member->is_link)
+ return false;
+
+ if (strlen(member->pathname) < XLOG_FNAME_LEN)
+ return false;
+
+ /*
+ * For a correct comparison, we must remove any '.' or '..' components
+ * from the member pathname. Similar to member_verify_header(), we prepend
+ * './' to the path so that canonicalize_path() can properly resolve and
+ * strip these references from the tar member name.
+ */
+ snprintf(pathname, MAXPGPATH, "./%s", member->pathname);
+ canonicalize_path(pathname);
+ pathlen = strlen(pathname);
+
+ /* WAL files from the top-level or pg_wal directory will be decoded */
+ if (pathlen > XLOG_FNAME_LEN &&
+ strncmp(pathname, XLOGDIR, strlen(XLOGDIR)) != 0)
+ return false;
+
+ /* WAL file may appear with a full path (e.g., pg_wal/<name>) */
+ filename = pathname + (pathlen - XLOG_FNAME_LEN);
+ if (!IsXLogFileName(filename))
+ return false;
+
+ *fname = pnstrdup(filename, XLOG_FNAME_LEN);
+
+ return true;
+}
+
+/*
+ * Helper function for WAL file hash table.
+ */
+static uint32
+hash_string_pointer(const char *s)
+{
+ unsigned char *ss = (unsigned char *) s;
+
+ return hash_bytes(ss, strlen(s));
+}
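The buffer arithmetic in read_archive_wal_page() works as follows. An entry's buffer holds the last buf->len of the read_len bytes streamed so far for its segment, so the buffered LSN window is [segno * WalSegSz + read_len - buf->len, segno * WalSegSz + read_len). The sketch below reproduces only that window-and-copy step, using a hypothetical helper copy_from_window(). It leaves out the archive streamer and the buffer-reset logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t lsn_t;

/*
 * Hypothetical helper: buf holds the last buf_len of the read_len bytes
 * received so far for segment segno. Copy as much of [recptr, recptr +
 * nbytes) as is buffered into dst and return the number of bytes copied,
 * or 0 if recptr lies outside the buffered window (the caller would then
 * fetch more data from the archive).
 */
static size_t
copy_from_window(uint64_t segno, uint32_t seg_size, size_t read_len,
                 const char *buf, size_t buf_len,
                 lsn_t recptr, size_t nbytes, char *dst)
{
    lsn_t end_ptr = segno * seg_size + read_len; /* one past last buffered byte */
    lsn_t start_ptr = end_ptr - buf_len;         /* LSN of buf[0] */
    size_t offset, n;

    if (buf_len == 0 || recptr < start_ptr || recptr >= end_ptr)
        return 0;

    offset = (size_t) (recptr - start_ptr);      /* always < buf_len here */
    n = nbytes < buf_len - offset ? nbytes : buf_len - offset;
    memcpy(dst, buf + offset, n);
    return n;
}
```

A partial copy (n < nbytes) corresponds to the loop in read_archive_wal_page() continuing after reading more data from the archive.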
diff --git a/src/bin/pg_waldump/meson.build b/src/bin/pg_waldump/meson.build
index 633a9874bb5..5296f21b82c 100644
--- a/src/bin/pg_waldump/meson.build
+++ b/src/bin/pg_waldump/meson.build
@@ -1,6 +1,7 @@
# Copyright (c) 2022-2026, PostgreSQL Global Development Group
pg_waldump_sources = files(
+ 'archive_waldump.c',
'compat.c',
'pg_waldump.c',
'rmgrdesc.c',
@@ -18,7 +19,7 @@ endif
pg_waldump = executable('pg_waldump',
pg_waldump_sources,
- dependencies: [frontend_code, lz4, zstd],
+ dependencies: [frontend_code, libpq, lz4, zstd],
c_args: ['-DFRONTEND'], # needed for xlogreader et al
kwargs: default_bin_args,
)
@@ -29,6 +30,7 @@ tests += {
'sd': meson.current_source_dir(),
'bd': meson.current_build_dir(),
'tap': {
+ 'env': {'TAR': tar.found() ? tar.full_path() : ''},
'tests': [
't/001_basic.pl',
't/002_save_fullpage.pl',
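The archive member filtering in archive_waldump.c can be approximated as shown below. This covers member_is_wal_file() plus the timeline and segment-range check in astreamer_waldump_content(). The helpers are hypothetical. member_wal_name() is deliberately simpler than the patch: it only strips a leading "./" and "pg_wal/" rather than canonicalizing the path. The filename parsing follows the same "%08X%08X%08X" layout that XLogFromFileName() uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define WAL_FNAME_LEN 24

/* Same test as IsXLogFileName(): exactly 24 uppercase hex digits. */
static bool
is_wal_file_name(const char *s)
{
    return strlen(s) == WAL_FNAME_LEN &&
        strspn(s, "0123456789ABCDEF") == WAL_FNAME_LEN;
}

/*
 * Hypothetical, simplified member_is_wal_file(): accept a WAL file at the
 * top level or directly under pg_wal/, with an optional leading "./".
 * Returns a pointer to the file name within path, or NULL.
 */
static const char *
member_wal_name(const char *path)
{
    if (strncmp(path, "./", 2) == 0)
        path += 2;
    if (strncmp(path, "pg_wal/", 7) == 0)
        path += 7;
    return is_wal_file_name(path) ? path : NULL;
}

/* Keep a segment only if its timeline matches and it lies in [start, end]. */
static bool
wal_member_is_relevant(const char *fname, uint32_t want_tli,
                       uint64_t start_segno, uint64_t end_segno,
                       uint32_t seg_size)
{
    unsigned int tli, log, seg;
    uint64_t segno;

    if (sscanf(fname, "%08X%08X%08X", &tli, &log, &seg) != 3)
        return false;
    segno = (uint64_t) log * (UINT64_C(0x100000000) / seg_size) + seg;
    return tli == want_tli && segno >= start_segno && segno <= end_segno;
}
```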
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 5d31b15dbd8..0f6d2372076 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -176,7 +176,7 @@ split_path(const char *path, char **dir, char **fname)
*
* return a read only fd
*/
-static int
+int
open_file_in_directory(const char *directory, const char *fname)
{
int fd = -1;
@@ -327,8 +327,8 @@ identify_target_directory(char *directory, char *fname, int *WalSegSz)
}
/*
- * Returns the size in bytes of the data to be read. Returns -1 if the end
- * point has already been reached.
+ * Returns the number of bytes to read for the given page. Returns -1 if
+ * the requested range has already been reached or exceeded.
*/
static inline int
required_read_len(XLogDumpPrivate *private, XLogRecPtr targetPagePtr,
@@ -440,6 +440,106 @@ WALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
return count;
}
+/*
+ * pg_waldump's XLogReaderRoutine->segment_open callback to support dumping WAL
+ * files from tar archives. Segment tracking is handled by
+ * TarWALDumpReadPage, so no action is needed here.
+ */
+static void
+TarWALDumpOpenSegment(XLogReaderState *state, XLogSegNo nextSegNo,
+ TimeLineID *tli_p)
+{
+ /* No action needed */
+}
+
+/*
+ * pg_waldump's XLogReaderRoutine->segment_close callback to support dumping
+ * WAL files from tar archives. Segment tracking is handled by
+ * TarWALDumpReadPage, so no action is needed here.
+ */
+static void
+TarWALDumpCloseSegment(XLogReaderState *state)
+{
+ /* No action needed */
+}
+
+/*
+ * pg_waldump's XLogReaderRoutine->page_read callback to support dumping WAL
+ * files from tar archives.
+ */
+static int
+TarWALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
+ XLogRecPtr targetPtr, char *readBuff)
+{
+ XLogDumpPrivate *private = state->private_data;
+ int count = required_read_len(private, targetPagePtr, reqLen);
+ int WalSegSz = state->segcxt.ws_segsize;
+ XLogSegNo curSegNo;
+
+ /* Bail out if the count to be read is not valid */
+ if (count < 0)
+ return -1;
+
+ /*
+ * If the target page is in a different segment, free the buffer and/or
+ * temporary file disk space occupied by the previous segment's data.
+ * Since pg_waldump never requests the same WAL bytes twice, moving to a
+ * new segment implies the previous buffer's data and that segment will
+ * not be needed again.
+ *
+ * Afterward, check for the next required WAL segment's physical existence
+ * in the temporary directory first before invoking the archive streamer.
+ */
+ curSegNo = state->seg.ws_segno;
+ if (!XLByteInSeg(targetPagePtr, curSegNo, WalSegSz))
+ {
+ char fname[MAXFNAMELEN];
+ XLogSegNo nextSegNo;
+
+ /*
+ * Calculate the next WAL segment to be decoded from the given page
+ * pointer.
+ */
+ XLByteToSeg(targetPagePtr, nextSegNo, WalSegSz);
+ state->seg.ws_tli = private->timeline;
+ state->seg.ws_segno = nextSegNo;
+
+ /* Close the WAL segment file if it is currently open */
+ if (state->seg.ws_file >= 0)
+ {
+ close(state->seg.ws_file);
+ state->seg.ws_file = -1;
+ }
+
+ /*
+ * If in pre-reading mode (prior to actual decoding), do not delete
+ * any entries that might be requested again once the decoding loop
+ * starts. For more details, see the comments in
+ * read_archive_wal_page().
+ */
+ if (private->decoding_started && curSegNo < nextSegNo)
+ {
+ XLogFileName(fname, state->seg.ws_tli, curSegNo, WalSegSz);
+ free_archive_wal_entry(fname, private);
+ }
+
+ /*
+ * If the next segment was spilled to the temporary directory, open it
+ */
+ XLogFileName(fname, state->seg.ws_tli, nextSegNo, WalSegSz);
+ state->seg.ws_file = open_file_in_directory(TmpWalSegDir, fname);
+ }
+
+ /* Continue reading from the open WAL segment, if any */
+ if (state->seg.ws_file >= 0)
+ return WALDumpReadPage(state, targetPagePtr, count, targetPtr,
+ readBuff);
+
+ /* Otherwise, read the WAL page from the archive streamer */
+ return read_archive_wal_page(private, targetPagePtr, count, readBuff,
+ WalSegSz);
+}
+
/*
* Boolean to return whether the given WAL record matches a specific relation
* and optionally block.
@@ -777,8 +877,8 @@ usage(void)
printf(_(" -F, --fork=FORK only show records that modify blocks in fork FORK;\n"
" valid names are main, fsm, vm, init\n"));
printf(_(" -n, --limit=N number of records to display\n"));
- printf(_(" -p, --path=PATH directory in which to find WAL segment files or a\n"
- " directory with a ./pg_wal that contains such files\n"
+ printf(_(" -p, --path=PATH a tar archive or a directory in which to find WAL segment files or\n"
+ " a directory with a pg_wal subdirectory containing such files\n"
" (default: current directory, ./pg_wal, $PGDATA/pg_wal)\n"));
printf(_(" -q, --quiet do not print any output, except for errors\n"));
printf(_(" -r, --rmgr=RMGR only show records generated by resource manager RMGR;\n"
@@ -810,7 +910,9 @@ main(int argc, char **argv)
XLogRecord *record;
XLogRecPtr first_record;
char *waldir = NULL;
+ char *walpath = NULL;
char *errormsg;
+ pg_compress_algorithm compression = PG_COMPRESSION_NONE;
static struct option long_options[] = {
{"bkp-details", no_argument, NULL, 'b'},
@@ -868,6 +970,10 @@ main(int argc, char **argv)
private.startptr = InvalidXLogRecPtr;
private.endptr = InvalidXLogRecPtr;
private.endptr_reached = false;
+ private.decoding_started = false;
+ private.archive_name = NULL;
+ private.start_segno = 0;
+ private.end_segno = UINT64_MAX;
config.quiet = false;
config.bkp_details = false;
@@ -943,7 +1049,7 @@ main(int argc, char **argv)
}
break;
case 'p':
- waldir = pg_strdup(optarg);
+ walpath = pg_strdup(optarg);
break;
case 'q':
config.quiet = true;
@@ -1107,14 +1213,25 @@ main(int argc, char **argv)
goto bad_argument;
}
- if (waldir != NULL)
+ if (walpath != NULL)
{
+ /* validate path points to tar archive */
+ if (parse_tar_compress_algorithm(walpath, &compression))
+ {
+ char *fname = NULL;
+
+ split_path(walpath, &waldir, &fname);
+
+ private.archive_name = fname;
+ }
/* validate path points to directory */
- if (!verify_directory(waldir))
+ else if (!verify_directory(walpath))
{
- pg_log_error("could not open directory \"%s\": %m", waldir);
+ pg_log_error("could not open directory \"%s\": %m", walpath);
goto bad_argument;
}
+ else
+ waldir = walpath;
}
if (config.save_fullpage_path != NULL)
@@ -1128,6 +1245,17 @@ main(int argc, char **argv)
int fd;
XLogSegNo segno;
+ /*
+ * If a tar archive is passed using the --path option, all other
+ * arguments become unnecessary.
+ */
+ if (private.archive_name)
+ {
+ pg_log_error("unnecessary command-line arguments specified with tar archive (first is \"%s\")",
+ argv[optind]);
+ goto bad_argument;
+ }
+
split_path(argv[optind], &directory, &fname);
if (waldir == NULL && directory != NULL)
@@ -1138,69 +1266,75 @@ main(int argc, char **argv)
pg_fatal("could not open directory \"%s\": %m", waldir);
}
- waldir = identify_target_directory(waldir, fname, &private.segsize);
- fd = open_file_in_directory(waldir, fname);
- if (fd < 0)
- pg_fatal("could not open file \"%s\"", fname);
- close(fd);
-
- /* parse position from file */
- XLogFromFileName(fname, &private.timeline, &segno, private.segsize);
-
- if (!XLogRecPtrIsValid(private.startptr))
- XLogSegNoOffsetToRecPtr(segno, 0, private.segsize, private.startptr);
- else if (!XLByteInSeg(private.startptr, segno, private.segsize))
+ if (fname != NULL && parse_tar_compress_algorithm(fname, &compression))
{
- pg_log_error("start WAL location %X/%08X is not inside file \"%s\"",
- LSN_FORMAT_ARGS(private.startptr),
- fname);
- goto bad_argument;
+ private.archive_name = fname;
}
-
- /* no second file specified, set end position */
- if (!(optind + 1 < argc) && !XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(segno + 1, 0, private.segsize, private.endptr);
-
- /* parse ENDSEG if passed */
- if (optind + 1 < argc)
+ else
{
- XLogSegNo endsegno;
-
- /* ignore directory, already have that */
- split_path(argv[optind + 1], &directory, &fname);
-
+ waldir = identify_target_directory(waldir, fname, &private.segsize);
fd = open_file_in_directory(waldir, fname);
if (fd < 0)
pg_fatal("could not open file \"%s\"", fname);
close(fd);
/* parse position from file */
- XLogFromFileName(fname, &private.timeline, &endsegno, private.segsize);
+ XLogFromFileName(fname, &private.timeline, &segno, private.segsize);
- if (endsegno < segno)
- pg_fatal("ENDSEG %s is before STARTSEG %s",
- argv[optind + 1], argv[optind]);
+ if (!XLogRecPtrIsValid(private.startptr))
+ XLogSegNoOffsetToRecPtr(segno, 0, private.segsize, private.startptr);
+ else if (!XLByteInSeg(private.startptr, segno, private.segsize))
+ {
+ pg_log_error("start WAL location %X/%08X is not inside file \"%s\"",
+ LSN_FORMAT_ARGS(private.startptr),
+ fname);
+ goto bad_argument;
+ }
- if (!XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(endsegno + 1, 0, private.segsize,
- private.endptr);
+ /* no second file specified, set end position */
+ if (!(optind + 1 < argc) && !XLogRecPtrIsValid(private.endptr))
+ XLogSegNoOffsetToRecPtr(segno + 1, 0, private.segsize, private.endptr);
- /* set segno to endsegno for check of --end */
- segno = endsegno;
- }
+ /* parse ENDSEG if passed */
+ if (optind + 1 < argc)
+ {
+ XLogSegNo endsegno;
+ /* ignore directory, already have that */
+ split_path(argv[optind + 1], &directory, &fname);
- if (!XLByteInSeg(private.endptr, segno, private.segsize) &&
- private.endptr != (segno + 1) * private.segsize)
- {
- pg_log_error("end WAL location %X/%08X is not inside file \"%s\"",
- LSN_FORMAT_ARGS(private.endptr),
- argv[argc - 1]);
- goto bad_argument;
+ fd = open_file_in_directory(waldir, fname);
+ if (fd < 0)
+ pg_fatal("could not open file \"%s\"", fname);
+ close(fd);
+
+ /* parse position from file */
+ XLogFromFileName(fname, &private.timeline, &endsegno, private.segsize);
+
+ if (endsegno < segno)
+ pg_fatal("ENDSEG %s is before STARTSEG %s",
+ argv[optind + 1], argv[optind]);
+
+ if (!XLogRecPtrIsValid(private.endptr))
+ XLogSegNoOffsetToRecPtr(endsegno + 1, 0, private.segsize,
+ private.endptr);
+
+ /* set segno to endsegno for check of --end */
+ segno = endsegno;
+ }
+
+ if (!XLByteInSeg(private.endptr, segno, private.segsize) &&
+ private.endptr != (segno + 1) * private.segsize)
+ {
+ pg_log_error("end WAL location %X/%08X is not inside file \"%s\"",
+ LSN_FORMAT_ARGS(private.endptr),
+ argv[argc - 1]);
+ goto bad_argument;
+ }
}
}
- else
- waldir = identify_target_directory(waldir, NULL, &private.segsize);
+ else if (!private.archive_name)
+ waldir = identify_target_directory(walpath, NULL, &private.segsize);
/* we don't know what to print */
if (!XLogRecPtrIsValid(private.startptr))
@@ -1209,15 +1343,46 @@ main(int argc, char **argv)
goto bad_argument;
}
+ /* --follow is not supported with tar archives */
+ if (config.follow && private.archive_name)
+ {
+ pg_log_error("--follow is not supported when reading from a tar archive");
+ goto bad_argument;
+ }
+
/* done with argument parsing, do the actual work */
/* we have everything we need, start reading */
- xlogreader_state =
- XLogReaderAllocate(private.segsize, waldir,
- XL_ROUTINE(.page_read = WALDumpReadPage,
- .segment_open = WALDumpOpenSegment,
- .segment_close = WALDumpCloseSegment),
- &private);
+ if (private.archive_name)
+ {
+ /*
+ * A NULL WAL directory indicates that the archive file is located in
+ * the current working directory.
+ */
+ if (waldir == NULL)
+ waldir = pg_strdup(".");
+
+ /* Set up for reading tar file */
+ init_archive_reader(&private, waldir, &private.segsize, compression);
+
+ /* Routine to decode WAL files in tar archive */
+ xlogreader_state =
+ XLogReaderAllocate(private.segsize, waldir,
+ XL_ROUTINE(.page_read = TarWALDumpReadPage,
+ .segment_open = TarWALDumpOpenSegment,
+ .segment_close = TarWALDumpCloseSegment),
+ &private);
+ }
+ else
+ {
+ xlogreader_state =
+ XLogReaderAllocate(private.segsize, waldir,
+ XL_ROUTINE(.page_read = WALDumpReadPage,
+ .segment_open = WALDumpOpenSegment,
+ .segment_close = WALDumpCloseSegment),
+ &private);
+ }
+
if (!xlogreader_state)
pg_fatal("out of memory while allocating a WAL reading processor");
@@ -1245,6 +1410,9 @@ main(int argc, char **argv)
if (config.stats == true && !config.quiet)
stats.startptr = first_record;
+ /* Flag indicating that the decoding loop has been entered */
+ private.decoding_started = true;
+
for (;;)
{
if (time_to_stop)
@@ -1326,6 +1494,9 @@ main(int argc, char **argv)
XLogReaderFree(xlogreader_state);
+ if (private.archive_name)
+ free_archive_reader(&private);
+
return EXIT_SUCCESS;
bad_argument:
diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h
index 013b051506f..fd25792b33a 100644
--- a/src/bin/pg_waldump/pg_waldump.h
+++ b/src/bin/pg_waldump/pg_waldump.h
@@ -12,6 +12,14 @@
#define PG_WALDUMP_H
#include "access/xlogdefs.h"
+#include "fe_utils/astreamer.h"
+
+/* Forward declaration */
+struct ArchivedWALFile;
+struct ArchivedWAL_hash;
+
+/* Temporary directory for spilling out-of-order WAL segments from archives */
+extern char *TmpWalSegDir;
/* Contains the necessary information to drive WAL decoding */
typedef struct XLogDumpPrivate
@@ -21,6 +29,48 @@ typedef struct XLogDumpPrivate
XLogRecPtr startptr;
XLogRecPtr endptr;
bool endptr_reached;
+ bool decoding_started;
+
+ /* Fields required to read WAL from archive */
+ char *archive_name; /* tar archive filename */
+ int archive_fd; /* File descriptor for the open tar file */
+
+ astreamer *archive_streamer;
+ char *archive_read_buf; /* Reusable read buffer for archive I/O */
+
+#ifdef USE_ASSERT_CHECKING
+ Size archive_read_buf_size;
+#endif
+
+ /* What the archive streamer is currently reading */
+ struct ArchivedWALFile *cur_file;
+
+ /*
+ * Hash table of all WAL files that the archive stream has read, including
+ * the one currently in progress.
+ */
+ struct ArchivedWAL_hash *archive_wal_htab;
+
+ /*
+ * Pre-computed segment numbers derived from startptr and endptr. Caching
+ * them avoids repeated XLByteToSeg() calls when filtering each archive
+ * member against the requested WAL range.
+ */
+ XLogSegNo start_segno;
+ XLogSegNo end_segno;
} XLogDumpPrivate;
+extern int open_file_in_directory(const char *directory, const char *fname);
+
+extern void init_archive_reader(XLogDumpPrivate *privateInfo,
+ const char *waldir, int *WalSegSz,
+ pg_compress_algorithm compression);
+extern void free_archive_reader(XLogDumpPrivate *privateInfo);
+extern int read_archive_wal_page(XLogDumpPrivate *privateInfo,
+ XLogRecPtr targetPagePtr,
+ Size count, char *readBuff,
+ int WalSegSz);
+extern void free_archive_wal_entry(const char *fname,
+ XLogDumpPrivate *privateInfo);
+
#endif /* PG_WALDUMP_H */
diff --git a/src/bin/pg_waldump/t/001_basic.pl b/src/bin/pg_waldump/t/001_basic.pl
index 5db5d20136f..11df7e092bf 100644
--- a/src/bin/pg_waldump/t/001_basic.pl
+++ b/src/bin/pg_waldump/t/001_basic.pl
@@ -3,9 +3,13 @@
use strict;
use warnings FATAL => 'all';
+use Cwd;
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Test::More;
+use List::Util qw(shuffle);
+
+my $tar = $ENV{TAR};
program_help_ok('pg_waldump');
program_version_ok('pg_waldump');
@@ -162,6 +166,42 @@ CREATE TABLESPACE ts1 LOCATION '$tblspc_path';
DROP TABLESPACE ts1;
});
+# Test: Decode a continuation record (contrecord) that spans multiple WAL
+# segments.
+#
+# Now consume all remaining room in the current WAL segment, leaving
+# only enough space for the start of a largish record.
+$node->safe_psql(
+ 'postgres', q{
+DO $$
+DECLARE
+ wal_segsize int := setting::int FROM pg_settings WHERE name = 'wal_segment_size';
+ remain int;
+ iters int := 0;
+BEGIN
+ LOOP
+ INSERT into t1(b)
+ select repeat(encode(sha256(g::text::bytea), 'hex'), (random() * 15 + 1)::int)
+ from generate_series(1, 10) g;
+
+ remain := wal_segsize - (pg_current_wal_insert_lsn() - '0/0') % wal_segsize;
+ IF remain < 2 * setting::int from pg_settings where name = 'block_size' THEN
+ RAISE log 'exiting after % iterations, % bytes to end of WAL segment', iters, remain;
+ EXIT;
+ END IF;
+ iters := iters + 1;
+ END LOOP;
+END
+$$;
+});
+
+my $contrecord_lsn = $node->safe_psql('postgres',
+ 'SELECT pg_current_wal_insert_lsn()');
+# Generate a continuation record that spans WAL segments
+$node->safe_psql('postgres',
+ qq{SELECT pg_logical_emit_message(true, 'test 026', repeat('xyzxz', 123456))}
+);
+
my ($end_lsn, $end_walfile) = split /\|/,
$node->safe_psql('postgres',
q{SELECT pg_current_wal_insert_lsn(), pg_walfile_name(pg_current_wal_insert_lsn())}
@@ -198,51 +238,23 @@ command_like(
],
qr/./,
'runs with start and end segment specified');
-command_fails_like(
- [ 'pg_waldump', '--path' => $node->data_dir ],
- qr/error: no start WAL location given/,
- 'path option requires start location');
command_like(
[
- 'pg_waldump',
- '--path' => $node->data_dir,
- '--start' => $start_lsn,
- '--end' => $end_lsn,
- ],
- qr/./,
- 'runs with path option and start and end locations');
-command_fails_like(
- [
- 'pg_waldump',
- '--path' => $node->data_dir,
- '--start' => $start_lsn,
- ],
- qr/error: error in WAL record at/,
- 'falling off the end of the WAL results in an error');
-
-command_like(
- [
- 'pg_waldump', '--quiet',
- $node->data_dir . '/pg_wal/' . $start_walfile
+ 'pg_waldump', '--quiet', '--path',
+ $node->data_dir . '/pg_wal/', $start_walfile
],
qr/^$/,
'no output with --quiet option');
-command_fails_like(
- [
- 'pg_waldump', '--quiet',
- '--path' => $node->data_dir,
- '--start' => $start_lsn
- ],
- qr/error: error in WAL record at/,
- 'errors are shown with --quiet');
-
# Test for: Display a message that we're skipping data if `from`
# wasn't a pointer to the start of a record.
+sub test_pg_waldump_skip_bytes
{
+ my ($path, $startlsn, $endlsn) = @_;
+
# Construct a new LSN that is one byte past the original
# start_lsn.
- my ($part1, $part2) = split qr{/}, $start_lsn;
+ my ($part1, $part2) = split qr{/}, $startlsn;
my $lsn2 = hex $part2;
$lsn2++;
my $new_start = sprintf("%s/%X", $part1, $lsn2);
@@ -252,7 +264,8 @@ command_fails_like(
my $result = IPC::Run::run [
'pg_waldump',
'--start' => $new_start,
- $node->data_dir . '/pg_wal/' . $start_walfile
+ '--end' => $endlsn,
+ '--path' => $path,
],
'>' => \$stdout,
'2>' => \$stderr;
@@ -266,15 +279,15 @@ command_fails_like(
sub test_pg_waldump
{
local $Test::Builder::Level = $Test::Builder::Level + 1;
- my @opts = @_;
+ my ($path, $startlsn, $endlsn, @opts) = @_;
my ($stdout, $stderr);
my $result = IPC::Run::run [
'pg_waldump',
- '--path' => $node->data_dir,
- '--start' => $start_lsn,
- '--end' => $end_lsn,
+ '--start' => $startlsn,
+ '--end' => $endlsn,
+ '--path' => $path,
@opts
],
'>' => \$stdout,
@@ -286,40 +299,145 @@ sub test_pg_waldump
return @lines;
}
-my @lines;
+# Create a tar archive with the member order shuffled, so that WAL segments appear out of order
+sub generate_archive
+{
+ my ($archive, $directory, $compression_flags) = @_;
-@lines = test_pg_waldump;
-is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
+ my @files;
+ opendir my $dh, $directory or die "opendir: $!";
+ while (my $entry = readdir $dh) {
+ # Skip '.' and '..'
+ next if $entry eq '.' || $entry eq '..';
+ push @files, $entry;
+ }
+ closedir $dh;
-@lines = test_pg_waldump('--limit' => 6);
-is(@lines, 6, 'limit option observed');
+ @files = shuffle @files;
-@lines = test_pg_waldump('--fullpage');
-is(grep(!/^rmgr:.*\bFPW\b/, @lines), 0, 'all output lines are FPW');
+ # move into the WAL directory before archiving files
+ my $cwd = getcwd;
+ chdir($directory) || die "chdir: $!";
+ command_ok([$tar, $compression_flags, $archive, @files]);
+ chdir($cwd) || die "chdir: $!";
+}
-@lines = test_pg_waldump('--stats');
-like($lines[0], qr/WAL statistics/, "statistics on stdout");
-is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+my $tmp_dir = PostgreSQL::Test::Utils::tempdir_short();
-@lines = test_pg_waldump('--stats=record');
-like($lines[0], qr/WAL statistics/, "statistics on stdout");
-is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+my @scenarios = (
+ {
+ 'path' => $node->data_dir,
+ 'is_archive' => 0,
+ 'enabled' => 1
+ },
+ {
+ 'path' => "$tmp_dir/pg_wal.tar",
+ 'compression_method' => 'none',
+ 'compression_flags' => '-cf',
+ 'is_archive' => 1,
+ 'enabled' => 1
+ },
+ {
+ 'path' => "$tmp_dir/pg_wal.tar.gz",
+ 'compression_method' => 'gzip',
+ 'compression_flags' => '-czf',
+ 'is_archive' => 1,
+ 'enabled' => check_pg_config("#define HAVE_LIBZ 1")
+ });
-@lines = test_pg_waldump('--rmgr' => 'Btree');
-is(grep(!/^rmgr: Btree/, @lines), 0, 'only Btree lines');
+for my $scenario (@scenarios)
+{
+ my $path = $scenario->{'path'};
-@lines = test_pg_waldump('--fork' => 'init');
-is(grep(!/fork init/, @lines), 0, 'only init fork lines');
+ SKIP:
+ {
+ skip "tar command is not available", 56
+ if !defined $tar && $scenario->{'is_archive'};
+ skip "$scenario->{'compression_method'} compression not supported by this build", 56
+ if !$scenario->{'enabled'} && $scenario->{'is_archive'};
-@lines = test_pg_waldump(
- '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_t1_oid");
-is(grep(!/rel $default_ts_oid\/$postgres_db_oid\/$rel_t1_oid/, @lines),
- 0, 'only lines for selected relation');
+ # create pg_wal archive
+ if ($scenario->{'is_archive'})
+ {
+ generate_archive($path,
+ $node->data_dir . '/pg_wal',
+ $scenario->{'compression_flags'});
+ }
-@lines = test_pg_waldump(
- '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_i1a_oid",
- '--block' => 1);
-is(grep(!/\bblk 1\b/, @lines), 0, 'only lines for selected block');
+ command_fails_like(
+ [ 'pg_waldump', '--path' => $path ],
+ qr/error: no start WAL location given/,
+ 'path option requires start location');
+ command_like(
+ [
+ 'pg_waldump',
+ '--path' => $path,
+ '--start' => $start_lsn,
+ '--end' => $end_lsn,
+ ],
+ qr/./,
+ 'runs with path option and start and end locations');
+ command_fails_like(
+ [
+ 'pg_waldump',
+ '--path' => $path,
+ '--start' => $start_lsn,
+ ],
+ qr/error: error in WAL record at/,
+ 'falling off the end of the WAL results in an error');
+ command_fails_like(
+ [
+ 'pg_waldump', '--quiet',
+ '--path' => $path,
+ '--start' => $start_lsn
+ ],
+ qr/error: error in WAL record at/,
+ 'errors are shown with --quiet');
+
+ test_pg_waldump_skip_bytes($path, $start_lsn, $end_lsn);
+
+ my @lines = test_pg_waldump($path, $start_lsn, $end_lsn);
+ is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
+
+ @lines = test_pg_waldump($path, $contrecord_lsn, $end_lsn);
+ is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
+
+ test_pg_waldump_skip_bytes($path, $contrecord_lsn, $end_lsn);
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--limit' => 6);
+ is(@lines, 6, 'limit option observed');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--fullpage');
+ is(grep(!/^rmgr:.*\bFPW\b/, @lines), 0, 'all output lines are FPW');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--stats');
+ like($lines[0], qr/WAL statistics/, "statistics on stdout");
+ is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--stats=record');
+ like($lines[0], qr/WAL statistics/, "statistics on stdout");
+ is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--rmgr' => 'Btree');
+ is(grep(!/^rmgr: Btree/, @lines), 0, 'only Btree lines');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--fork' => 'init');
+ is(grep(!/fork init/, @lines), 0, 'only init fork lines');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn,
+ '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_t1_oid");
+ is(grep(!/rel $default_ts_oid\/$postgres_db_oid\/$rel_t1_oid/, @lines),
+ 0, 'only lines for selected relation');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn,
+ '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_i1a_oid",
+ '--block' => 1);
+ is(grep(!/\bblk 1\b/, @lines), 0, 'only lines for selected block');
+
+ # Cleanup.
+ unlink $path if $scenario->{'is_archive'};
+ }
+}
done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 174e2798443..3e2fc711a3e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -147,6 +147,9 @@ ArchiveOpts
ArchiveShutdownCB
ArchiveStartupCB
ArchiveStreamState
+ArchivedWALFile
+ArchivedWAL_hash
+ArchivedWAL_iterator
ArchiverOutput
ArchiverStage
ArrayAnalyzeExtraData
@@ -3542,6 +3545,7 @@ astreamer_recovery_injector
astreamer_tar_archiver
astreamer_tar_parser
astreamer_verify
+astreamer_waldump
astreamer_zstd_frame
auth_password_hook_typ
autovac_table
--
2.47.1
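The segment-switch step in the tar page-read routine of the pg_waldump patch above comes down to a little LSN arithmetic. The following is a standalone C sketch of that step, not the patch's code: byte_to_seg(), byte_in_seg() and wal_file_name() reproduce the arithmetic of the XLByteToSeg(), XLByteInSeg() and XLogFileName() macros, while switch_segment() and the NO_SEGMENT sentinel are hypothetical names introduced only for illustration. The typedefs are plain stand-ins for PostgreSQL's types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-ins for PostgreSQL's XLogRecPtr, XLogSegNo and TimeLineID. */
typedef uint64_t XLogRecPtr;
typedef uint64_t XLogSegNo;
typedef uint32_t TimeLineID;

/* Hypothetical sentinel: "no segment can be released". */
#define NO_SEGMENT UINT64_MAX

/* Same arithmetic as XLByteToSeg(): which segment holds this LSN? */
static XLogSegNo
byte_to_seg(XLogRecPtr ptr, uint64_t segsz)
{
	return ptr / segsz;
}

/* Same arithmetic as XLByteInSeg(): does this LSN lie in segno? */
static bool
byte_in_seg(XLogRecPtr ptr, XLogSegNo segno, uint64_t segsz)
{
	return ptr / segsz == segno;
}

/* Same layout as XLogFileName(): timeline, then segno split per 4 GB "log id". */
static void
wal_file_name(char *buf, size_t len, TimeLineID tli, XLogSegNo segno,
			  uint64_t segsz)
{
	uint64_t	segs_per_id = UINT64_C(0x100000000) / segsz;

	snprintf(buf, len, "%08X%08X%08X", (unsigned) tli,
			 (unsigned) (segno / segs_per_id),
			 (unsigned) (segno % segs_per_id));
}

/*
 * Hypothetical helper: move *cur_segno to the segment holding target and
 * return the segment whose buffered data may now be released, or NO_SEGMENT.
 * Nothing is released before decoding starts, because the pre-reading phase
 * may ask for the same segment again.
 */
static XLogSegNo
switch_segment(XLogSegNo *cur_segno, XLogRecPtr target, uint64_t segsz,
			   bool decoding_started)
{
	XLogSegNo	prev = *cur_segno;
	XLogSegNo	next;

	if (byte_in_seg(target, prev, segsz))
		return NO_SEGMENT;		/* still inside the current segment */

	next = byte_to_seg(target, segsz);
	*cur_segno = next;

	/* Only a forward move during decoding makes the old segment disposable. */
	if (decoding_started && prev < next)
		return prev;
	return NO_SEGMENT;
}
```

With 16 MB segments, moving from LSN 0/1000000 to 0/2000000 after decoding has begun reports segment 1 as releasable. In the patch the released entry is looked up by file name, since the hash table of archived WAL files is keyed by segment name; wal_file_name() shows how such a key follows from a timeline and segment number.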
[application/x-patch] v21-0004-pg_verifybackup-Enable-WAL-parsing-for-tar-forma.patch (16.7K, 5-v21-0004-pg_verifybackup-Enable-WAL-parsing-for-tar-forma.patch)
From 4afd782e69bce712df4a673615b7bfd36ae903ed Mon Sep 17 00:00:00 2001
From: Andrew Dunstan <[email protected]>
Date: Thu, 19 Mar 2026 15:43:53 +0530
Subject: [PATCH v21 4/5] pg_verifybackup: Enable WAL parsing for tar-format
backups
Now that pg_waldump supports reading WAL from tar archives, remove the
restriction that forced --no-parse-wal for tar-format backups.
pg_verifybackup now automatically locates the WAL archive: it looks for
a separate pg_wal.tar first, then falls back to the main base.tar. A
new --wal-path option (replacing the old --wal-directory, which is kept
as a silent alias) accepts either a directory or a tar archive path.
The default WAL directory preparation is deferred until the backup
format is known, since tar-format backups resolve the WAL path
differently from plain-format ones.
Author: Amul Sul <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Jakub Wartak <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Euler Taveira <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
discussion: https://postgr.es/m/CAAJ_b94bqdWN3h2J-PzzzQ2Npbwct5ZQHggn_QoYGhC2rn-=WQ@mail.gmail.com
---
doc/src/sgml/ref/pg_verifybackup.sgml | 14 ++-
src/bin/pg_verifybackup/pg_verifybackup.c | 96 ++++++++++++-------
src/bin/pg_verifybackup/t/002_algorithm.pl | 4 -
src/bin/pg_verifybackup/t/003_corruption.pl | 4 +-
src/bin/pg_verifybackup/t/007_wal.pl | 20 +++-
src/bin/pg_verifybackup/t/008_untar.pl | 5 +-
src/bin/pg_verifybackup/t/010_client_untar.pl | 5 +-
7 files changed, 91 insertions(+), 57 deletions(-)
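The default WAL-location rule described in the commit message works as follows. An explicit -w/--wal-path always wins. Plain-format backups use <backup>/pg_wal. Tar-format backups prefer a separate pg_wal archive and otherwise fall back to the base archive. This rule can be sketched as a small pure function.

In the sketch below, resolve_wal_path(), copy_string() and classify_archive() are hypothetical helpers, not functions from the patch. classify_archive() restates the "base"/"pg_wal" prefix test done in precheck_tar_backup_file(). A NULL result stands in for the "WAL archive not found" error; in this simplified version it would also cover an out-of-memory failure.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical classification of a tar member of a tar-format backup. */
typedef enum
{
	ARCHIVE_BASE,				/* base.tar, base.tar.gz, ... */
	ARCHIVE_WAL,				/* pg_wal.tar, pg_wal.tar.zst, ... */
	ARCHIVE_TABLESPACE			/* <tablespaceoid>.tar */
} ArchiveKind;

/* Same prefix test as precheck_tar_backup_file(). */
static ArchiveKind
classify_archive(const char *name)
{
	if (strncmp(name, "base", 4) == 0)
		return ARCHIVE_BASE;
	if (strncmp(name, "pg_wal", 6) == 0)
		return ARCHIVE_WAL;
	return ARCHIVE_TABLESPACE;
}

/* Portable strdup() replacement; returns NULL on allocation failure. */
static char *
copy_string(const char *s)
{
	size_t		len = strlen(s) + 1;
	char	   *copy = malloc(len);

	if (copy != NULL)
		memcpy(copy, s, len);
	return copy;
}

/*
 * Hypothetical helper: pick the WAL location handed to pg_waldump --path.
 * format is 'p' (plain) or 't' (tar); archive paths are NULL when the
 * corresponding archive was not seen while scanning the backup directory.
 */
static char *
resolve_wal_path(const char *explicit_path, char format,
				 const char *backup_dir, const char *wal_archive_path,
				 const char *base_archive_path)
{
	if (explicit_path != NULL)
		return copy_string(explicit_path);	/* -w/--wal-path given */

	if (format == 'p')
	{
		size_t		len = strlen(backup_dir) + sizeof("/pg_wal");
		char	   *path = malloc(len);

		if (path != NULL)
			snprintf(path, len, "%s/pg_wal", backup_dir);
		return path;
	}

	if (wal_archive_path != NULL)
		return copy_string(wal_archive_path);	/* separate pg_wal archive */
	if (base_archive_path != NULL)
		return copy_string(base_archive_path);	/* WAL inside base archive */
	return NULL;				/* caller reports "WAL archive not found" */
}
```

Making the decision once, after the backup directory has been scanned, mirrors why the patch defers the default: for a tar-format backup the right answer depends on which archives the scan actually found.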
diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml
index 61c12975e4a..1695cfe91c8 100644
--- a/doc/src/sgml/ref/pg_verifybackup.sgml
+++ b/doc/src/sgml/ref/pg_verifybackup.sgml
@@ -36,10 +36,7 @@ PostgreSQL documentation
<literal>backup_manifest</literal> generated by the server at the time
of the backup. The backup may be stored either in the "plain" or the "tar"
format; this includes tar-format backups compressed with any algorithm
- supported by <application>pg_basebackup</application>. However, at present,
- <literal>WAL</literal> verification is supported only for plain-format
- backups. Therefore, if the backup is stored in tar-format, the
- <literal>-n, --no-parse-wal</literal> option should be used.
+ supported by <application>pg_basebackup</application>.
</para>
<para>
@@ -261,12 +258,13 @@ PostgreSQL documentation
<varlistentry>
<term><option>-w <replaceable class="parameter">path</replaceable></option></term>
- <term><option>--wal-directory=<replaceable class="parameter">path</replaceable></option></term>
+ <term><option>--wal-path=<replaceable class="parameter">path</replaceable></option></term>
<listitem>
<para>
- Try to parse WAL files stored in the specified directory, rather than
- in <literal>pg_wal</literal>. This may be useful if the backup is
- stored in a separate location from the WAL archive.
+ Try to parse WAL files stored in the specified directory or tar
+ archive, rather than in <literal>pg_wal</literal>. This may be
+ useful if the backup is stored in a separate location from the WAL
+ archive.
</para>
</listitem>
</varlistentry>
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 31f606c45b1..b60ab8739d5 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -74,7 +74,9 @@ pg_noreturn static void report_manifest_error(JsonManifestParseContext *context,
const char *fmt,...)
pg_attribute_printf(2, 3);
-static void verify_tar_backup(verifier_context *context, DIR *dir);
+static void verify_tar_backup(verifier_context *context, DIR *dir,
+ char **base_archive_path,
+ char **wal_archive_path);
static void verify_plain_backup_directory(verifier_context *context,
char *relpath, char *fullpath,
DIR *dir);
@@ -83,7 +85,9 @@ static void verify_plain_backup_file(verifier_context *context, char *relpath,
static void verify_control_file(const char *controlpath,
uint64 manifest_system_identifier);
static void precheck_tar_backup_file(verifier_context *context, char *relpath,
- char *fullpath, SimplePtrList *tarfiles);
+ char *fullpath, SimplePtrList *tarfiles,
+ char **base_archive_path,
+ char **wal_archive_path);
static void verify_tar_file(verifier_context *context, char *relpath,
char *fullpath, astreamer *streamer);
static void report_extra_backup_files(verifier_context *context);
@@ -93,7 +97,7 @@ static void verify_file_checksum(verifier_context *context,
uint8 *buffer);
static void parse_required_wal(verifier_context *context,
char *pg_waldump_path,
- char *wal_directory);
+ char *wal_path);
static astreamer *create_archive_verifier(verifier_context *context,
char *archive_name,
Oid tblspc_oid,
@@ -126,7 +130,8 @@ main(int argc, char **argv)
{"progress", no_argument, NULL, 'P'},
{"quiet", no_argument, NULL, 'q'},
{"skip-checksums", no_argument, NULL, 's'},
- {"wal-directory", required_argument, NULL, 'w'},
+ {"wal-path", required_argument, NULL, 'w'},
+ {"wal-directory", required_argument, NULL, 'w'}, /* deprecated */
{NULL, 0, NULL, 0}
};
@@ -135,7 +140,9 @@ main(int argc, char **argv)
char *manifest_path = NULL;
bool no_parse_wal = false;
bool quiet = false;
- char *wal_directory = NULL;
+ char *wal_path = NULL;
+ char *base_archive_path = NULL;
+ char *wal_archive_path = NULL;
char *pg_waldump_path = NULL;
DIR *dir;
@@ -221,8 +228,8 @@ main(int argc, char **argv)
context.skip_checksums = true;
break;
case 'w':
- wal_directory = pstrdup(optarg);
- canonicalize_path(wal_directory);
+ wal_path = pstrdup(optarg);
+ canonicalize_path(wal_path);
break;
default:
/* getopt_long already emitted a complaint */
@@ -285,10 +292,6 @@ main(int argc, char **argv)
manifest_path = psprintf("%s/backup_manifest",
context.backup_directory);
- /* By default, look for the WAL in the backup directory, too. */
- if (wal_directory == NULL)
- wal_directory = psprintf("%s/pg_wal", context.backup_directory);
-
/*
* Try to read the manifest. We treat any errors encountered while parsing
* the manifest as fatal; there doesn't seem to be much point in trying to
@@ -331,17 +334,6 @@ main(int argc, char **argv)
pfree(path);
}
- /*
- * XXX: In the future, we should consider enhancing pg_waldump to read WAL
- * files from an archive.
- */
- if (!no_parse_wal && context.format == 't')
- {
- pg_log_error("pg_waldump cannot read tar files");
- pg_log_error_hint("You must use -n/--no-parse-wal when verifying a tar-format backup.");
- exit(1);
- }
-
/*
* Perform the appropriate type of verification appropriate based on the
* backup format. This will close 'dir'.
@@ -350,7 +342,7 @@ main(int argc, char **argv)
verify_plain_backup_directory(&context, NULL, context.backup_directory,
dir);
else
- verify_tar_backup(&context, dir);
+ verify_tar_backup(&context, dir, &base_archive_path, &wal_archive_path);
/*
* The "matched" flag should now be set on every entry in the hash table.
@@ -368,12 +360,35 @@ main(int argc, char **argv)
if (context.format == 'p' && !context.skip_checksums)
verify_backup_checksums(&context);
+ /*
+ * By default, WAL files are expected to be found in the backup directory
+ * for plain-format backups. In the case of tar-format backups, if a
+ * separate WAL archive is not found, the WAL files are most likely
+ * included within the main data directory archive.
+ */
+ if (wal_path == NULL)
+ {
+ if (context.format == 'p')
+ wal_path = psprintf("%s/pg_wal", context.backup_directory);
+ else if (wal_archive_path)
+ wal_path = wal_archive_path;
+ else if (base_archive_path)
+ wal_path = base_archive_path;
+ else
+ {
+ pg_log_error("WAL archive not found");
+ pg_log_error_hint("Specify the correct path using the option -w/--wal-path, "
+ "or use -n/--no-parse-wal when verifying a tar-format backup.");
+ exit(1);
+ }
+ }
+
/*
* Try to parse the required ranges of WAL records, unless we were told
* not to do so.
*/
if (!no_parse_wal)
- parse_required_wal(&context, pg_waldump_path, wal_directory);
+ parse_required_wal(&context, pg_waldump_path, wal_path);
/*
* If everything looks OK, tell the user this, unless we were asked to
@@ -787,7 +802,8 @@ verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
* close when we're done with it.
*/
static void
-verify_tar_backup(verifier_context *context, DIR *dir)
+verify_tar_backup(verifier_context *context, DIR *dir, char **base_archive_path,
+ char **wal_archive_path)
{
struct dirent *dirent;
SimplePtrList tarfiles = {NULL, NULL};
@@ -816,7 +832,8 @@ verify_tar_backup(verifier_context *context, DIR *dir)
char *fullpath;
fullpath = psprintf("%s/%s", context->backup_directory, filename);
- precheck_tar_backup_file(context, filename, fullpath, &tarfiles);
+ precheck_tar_backup_file(context, filename, fullpath, &tarfiles,
+ base_archive_path, wal_archive_path);
pfree(fullpath);
}
}
@@ -875,17 +892,21 @@ verify_tar_backup(verifier_context *context, DIR *dir)
*
* The arguments to this function are mostly the same as the
* verify_plain_backup_file. The additional argument outputs a list of valid
- * tar files.
+ * tar files, along with the full paths to the main archive and the WAL
+ * directory archive.
*/
static void
precheck_tar_backup_file(verifier_context *context, char *relpath,
- char *fullpath, SimplePtrList *tarfiles)
+ char *fullpath, SimplePtrList *tarfiles,
+ char **base_archive_path, char **wal_archive_path)
{
struct stat sb;
Oid tblspc_oid = InvalidOid;
pg_compress_algorithm compress_algorithm;
tar_file *tar;
char *suffix = NULL;
+ bool is_base_archive = false;
+ bool is_wal_archive = false;
/* Should be tar format backup */
Assert(context->format == 't');
@@ -918,9 +939,15 @@ precheck_tar_backup_file(verifier_context *context, char *relpath,
* extension such as .gz, .lz4, or .zst.
*/
if (strncmp("base", relpath, 4) == 0)
+ {
suffix = relpath + 4;
+ is_base_archive = true;
+ }
else if (strncmp("pg_wal", relpath, 6) == 0)
+ {
suffix = relpath + 6;
+ is_wal_archive = true;
+ }
else
{
/* Expected a <tablespaceoid>.tar file here. */
@@ -953,8 +980,13 @@ precheck_tar_backup_file(verifier_context *context, char *relpath,
* Ignore WALs, as reading and verification will be handled through
* pg_waldump.
*/
- if (strncmp("pg_wal", relpath, 6) == 0)
+ if (is_wal_archive)
+ {
+ *wal_archive_path = pstrdup(fullpath);
return;
+ }
+ else if (is_base_archive)
+ *base_archive_path = pstrdup(fullpath);
/*
* Append the information to the list for complete verification at a later
@@ -1188,7 +1220,7 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
*/
static void
parse_required_wal(verifier_context *context, char *pg_waldump_path,
- char *wal_directory)
+ char *wal_path)
{
manifest_data *manifest = context->manifest;
manifest_wal_range *this_wal_range = manifest->first_wal_range;
@@ -1198,7 +1230,7 @@ parse_required_wal(verifier_context *context, char *pg_waldump_path,
char *pg_waldump_cmd;
pg_waldump_cmd = psprintf("\"%s\" --quiet --path=\"%s\" --timeline=%u --start=%X/%08X --end=%X/%08X\n",
- pg_waldump_path, wal_directory, this_wal_range->tli,
+ pg_waldump_path, wal_path, this_wal_range->tli,
LSN_FORMAT_ARGS(this_wal_range->start_lsn),
LSN_FORMAT_ARGS(this_wal_range->end_lsn));
fflush(NULL);
@@ -1366,7 +1398,7 @@ usage(void)
printf(_(" -P, --progress show progress information\n"));
printf(_(" -q, --quiet do not print any output, except for errors\n"));
printf(_(" -s, --skip-checksums skip checksum verification\n"));
- printf(_(" -w, --wal-directory=PATH use specified path for WAL files\n"));
+ printf(_(" -w, --wal-path=PATH use specified path for WAL files\n"));
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -?, --help show this help, then exit\n"));
printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
diff --git a/src/bin/pg_verifybackup/t/002_algorithm.pl b/src/bin/pg_verifybackup/t/002_algorithm.pl
index 0556191ec9d..edc515d5904 100644
--- a/src/bin/pg_verifybackup/t/002_algorithm.pl
+++ b/src/bin/pg_verifybackup/t/002_algorithm.pl
@@ -30,10 +30,6 @@ sub test_checksums
{
# Add switch to get a tar-format backup
push @backup, ('--format' => 'tar');
-
- # Add switch to skip WAL verification, which is not yet supported for
- # tar-format backups
- push @verify, ('--no-parse-wal');
}
# A backup with a bogus algorithm should fail.
diff --git a/src/bin/pg_verifybackup/t/003_corruption.pl b/src/bin/pg_verifybackup/t/003_corruption.pl
index b1d65b8aa0f..882d75d9dc2 100644
--- a/src/bin/pg_verifybackup/t/003_corruption.pl
+++ b/src/bin/pg_verifybackup/t/003_corruption.pl
@@ -193,10 +193,8 @@ for my $scenario (@scenario)
command_ok([ $tar, '-cf' => "$tar_backup_path/base.tar", '.' ]);
chdir($cwd) || die "chdir: $!";
- # Now check that the backup no longer verifies. We must use -n
- # here, because pg_waldump can't yet read WAL from a tarfile.
command_fails_like(
- [ 'pg_verifybackup', '--no-parse-wal', $tar_backup_path ],
+ [ 'pg_verifybackup', $tar_backup_path ],
$scenario->{'fails_like'},
"corrupt backup fails verification: $name");
diff --git a/src/bin/pg_verifybackup/t/007_wal.pl b/src/bin/pg_verifybackup/t/007_wal.pl
index 79087a1f6be..0e0377bfacc 100644
--- a/src/bin/pg_verifybackup/t/007_wal.pl
+++ b/src/bin/pg_verifybackup/t/007_wal.pl
@@ -42,10 +42,10 @@ command_ok([ 'pg_verifybackup', '--no-parse-wal', $backup_path ],
command_ok(
[
'pg_verifybackup',
- '--wal-directory' => $relocated_pg_wal,
+ '--wal-path' => $relocated_pg_wal,
$backup_path
],
- '--wal-directory can be used to specify WAL directory');
+ '--wal-path can be used to specify WAL directory');
# Move directory back to original location.
rename($relocated_pg_wal, $original_pg_wal) || die "rename pg_wal back: $!";
@@ -90,4 +90,20 @@ command_ok(
[ 'pg_verifybackup', $backup_path2 ],
'valid base backup with timeline > 1');
+# Test WAL verification for a tar-format backup with a separate pg_wal.tar,
+# as produced by pg_basebackup --format=tar --wal-method=stream.
+my $backup_path3 = $primary->backup_dir . '/test_tar_wal';
+$primary->command_ok(
+ [
+ 'pg_basebackup',
+ '--pgdata' => $backup_path3,
+ '--no-sync',
+ '--format' => 'tar',
+ '--checkpoint' => 'fast'
+ ],
+ "tar backup with separate pg_wal.tar");
+command_ok(
+ [ 'pg_verifybackup', $backup_path3 ],
+ 'WAL verification succeeds with separate pg_wal.tar');
+
done_testing();
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
index ae67ae85a31..161c08c190d 100644
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -47,7 +47,6 @@ my $tsoid = $primary->safe_psql(
SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1'));
my $backup_path = $primary->backup_dir . '/server-backup';
-my $extract_path = $primary->backup_dir . '/extracted-backup';
my @test_configuration = (
{
@@ -123,14 +122,12 @@ for my $tc (@test_configuration)
# Verify tar backup.
$primary->command_ok(
[
- 'pg_verifybackup', '--no-parse-wal',
- '--exit-on-error', $backup_path,
+ 'pg_verifybackup', '--exit-on-error', $backup_path,
],
"verify backup, compression $method");
# Cleanup.
rmtree($backup_path);
- rmtree($extract_path);
}
}
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
index 1ac7b5db75a..9670fbe4fda 100644
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -32,7 +32,6 @@ print $jf $junk_data;
close $jf;
my $backup_path = $primary->backup_dir . '/client-backup';
-my $extract_path = $primary->backup_dir . '/extracted-backup';
my @test_configuration = (
{
@@ -137,13 +136,11 @@ for my $tc (@test_configuration)
# Verify tar backup.
$primary->command_ok(
[
- 'pg_verifybackup', '--no-parse-wal',
- '--exit-on-error', $backup_path,
+ 'pg_verifybackup', '--exit-on-error', $backup_path,
],
"verify backup, compression $method");
# Cleanup.
- rmtree($extract_path);
rmtree($backup_path);
}
}
--
2.47.1
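For reference, the pg_waldump command built in parse_required_wal() now passes the WAL location through --path. With this series that location can presumably be either a directory or the pg_wal archive recorded by precheck_tar_backup_file(). The standalone C sketch below reproduces the shape of that command string. format_lsn() and build_waldump_cmd() are hypothetical helpers, and the split of an LSN into high and low 32-bit halves for "%X/%08X" mirrors what LSN_FORMAT_ARGS() supplies to the format string.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for PostgreSQL's XLogRecPtr (64-bit WAL position). */
typedef uint64_t lsn_t;

/*
 * Format an LSN as "%X/%08X" does with LSN_FORMAT_ARGS(): high 32 bits in
 * hex, a slash, then the low 32 bits zero-padded to eight hex digits.
 */
static void
format_lsn(char *out, size_t outlen, lsn_t lsn)
{
	assert(out != NULL && outlen > 0);
	snprintf(out, outlen, "%" PRIX32 "/%08" PRIX32,
			 (uint32_t) (lsn >> 32), (uint32_t) lsn);
}

/*
 * Build a pg_waldump invocation shaped like the one in parse_required_wal().
 * wal_path may name a directory or a tar archive such as pg_wal.tar.
 * Returns snprintf's result (the length it wanted to write).
 */
static int
build_waldump_cmd(char *out, size_t outlen, const char *waldump,
				  const char *wal_path, uint32_t tli, lsn_t start, lsn_t end)
{
	char		s[32];
	char		e[32];

	format_lsn(s, sizeof(s), start);
	format_lsn(e, sizeof(e), end);
	return snprintf(out, outlen,
					"\"%s\" --quiet --path=\"%s\" --timeline=%" PRIu32
					" --start=%s --end=%s",
					waldump, wal_path, tli, s, e);
}
```

For example, start 0x01000028 and end 0x010020F8 on timeline 1 produce `--start=0/01000028 --end=0/010020F8`, the same positions used in the pg_waldump report quoted further down the thread.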
* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-03-20 13:26 Amul Sul <[email protected]>
parent: Amul Sul <[email protected]>
From: Amul Sul @ 2026-03-20 13:26 UTC (permalink / raw)
To: Zsolt Parragi <[email protected]>; +Cc: Andrew Dunstan <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>
On Fri, Mar 20, 2026 at 5:01 PM Amul Sul <[email protected]> wrote:
>
> On Fri, Mar 20, 2026 at 2:18 AM Zsolt Parragi <[email protected]> wrote:
> >
> > Hello!
> >
> > Path is ignored with a positional argument, I think this is a bug?
> >
> > This fails:
> >
> > pg_waldump --path /wal/dir 000000010000000000000001
> >
> > And this works:
> >
> > pg_waldump --path /wal/dir --start 0/01000028 --end 0/010020F8
> >
>
> Good catch! I've fixed this in the attached version and updated a test
> case to cover this scenario.
>
> > +{
> > + int fname_len = strlen(fname);
> > +
> >
> > Shouldn't this use size_t?
> >
>
> Okay, that can be used. I’ve done the same in the attached version.
>
> > + /*
> > + * Setup temporary directory to store WAL segments and set up an exit
> > + * callback to remove it upon completion.
> > + */
> > + setup_tmpwal_dir(waldir);
> >
> > Maybe this could be deferred to be created only on first use? If I
> > understand correctly, in a typical scenario waldump won't use this
> > temporary directory, yet it always creates it.
>
> Yeah, that optimization can be done, but passing the waldir -- which
> is only used once -- to the point where the first temp file is created
> would require quite a bit of code refactoring that doesn't seem to
> offer much gain, IMO.
>
Since Andrew also leans toward creating the directory only when
needed, I have reconsidered the approach. I think we can pass waldir
(the archive directory) via XLogDumpPrivate, and I’ve implemented that
in the attached version.
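To illustrate the deferred approach, here is a minimal standalone sketch. It is hypothetical and is not the code in the attached patch. The directory is created with mkdtemp() only on the first call, under $TMPDIR if it is set and otherwise next to the archive, and an atexit() handler tries to remove it. get_spill_dir(), remove_spill_dir() and the exact "waldump_tmpXXXXXX" template are assumptions; the 0003 documentation only says that the name starts with waldump_tmp.

```c
#define _POSIX_C_SOURCE 200809L	/* for mkdtemp() and setenv() */

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

static char tmpdir_path[4096];
static int	tmpdir_ready = 0;

/* atexit() callback: rmdir() only succeeds if the directory is empty. */
static void
remove_spill_dir(void)
{
	if (tmpdir_ready)
		(void) rmdir(tmpdir_path);
}

/*
 * Return the spill directory, creating it on first use only.  Returns NULL
 * on failure and leaves error reporting to the caller.
 */
static const char *
get_spill_dir(const char *archive_dir)
{
	const char *base;
	int			n;

	assert(archive_dir != NULL);
	if (tmpdir_ready)
		return tmpdir_path;		/* already created: reuse it */

	base = getenv("TMPDIR");
	if (base == NULL || base[0] == '\0')
		base = archive_dir;		/* fall back to the archive's directory */

	n = snprintf(tmpdir_path, sizeof(tmpdir_path), "%s/waldump_tmpXXXXXX", base);
	if (n < 0 || (size_t) n >= sizeof(tmpdir_path))
		return NULL;			/* path too long */

	if (mkdtemp(tmpdir_path) == NULL)
		return NULL;

	tmpdir_ready = 1;
	atexit(remove_spill_dir);
	return tmpdir_path;
}
```

A real implementation would also unlink any spilled segment files before removing the directory. Runs that never see an out-of-order segment never call get_spill_dir(), so no directory is created for them.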
Regards,
Amul
Attachments:
[application/octet-stream] v22-0001-Move-tar-detection-and-compression-logic-to-comm.patch (7.1K, 2-v22-0001-Move-tar-detection-and-compression-logic-to-comm.patch)
From f338bb11c4d69ca092de6a85939b93a1bdb34190 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Thu, 19 Mar 2026 15:43:30 +0530
Subject: [PATCH v22 1/5] Move tar detection and compression logic to common.
Consolidate tar archive identification and compression-type detection
logic into a shared location. Currently used by pg_basebackup and
pg_verifybackup, this functionality is also required for upcoming
pg_waldump enhancements.
This change promotes code reuse and simplifies maintenance across
frontend tools.
Author: Amul Sul <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Jakub Wartak <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Euler Taveira <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Reviewed-by: Zsolt Parragi <[email protected]>
discussion: https://postgr.es/m/CAAJ_b94bqdWN3h2J-PzzzQ2Npbwct5ZQHggn_QoYGhC2rn-=WQ@mail.gmail.com
---
src/bin/pg_basebackup/pg_basebackup.c | 36 +++++++----------------
src/bin/pg_verifybackup/pg_verifybackup.c | 12 +-------
src/common/compression.c | 30 +++++++++++++++++++
src/include/common/compression.h | 2 ++
4 files changed, 44 insertions(+), 36 deletions(-)
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index fa169a8d642..c1a4672aa6f 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1070,12 +1070,9 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
astreamer *manifest_inject_streamer = NULL;
bool inject_manifest;
bool is_tar,
- is_tar_gz,
- is_tar_lz4,
- is_tar_zstd,
is_compressed_tar;
+ pg_compress_algorithm compressed_tar_algorithm;
bool must_parse_archive;
- int archive_name_len = strlen(archive_name);
/*
* Normally, we emit the backup manifest as a separate file, but when
@@ -1084,24 +1081,13 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
*/
inject_manifest = (format == 't' && strcmp(basedir, "-") == 0 && manifest);
- /* Is this a tar archive? */
- is_tar = (archive_name_len > 4 &&
- strcmp(archive_name + archive_name_len - 4, ".tar") == 0);
-
- /* Is this a .tar.gz archive? */
- is_tar_gz = (archive_name_len > 7 &&
- strcmp(archive_name + archive_name_len - 7, ".tar.gz") == 0);
-
- /* Is this a .tar.lz4 archive? */
- is_tar_lz4 = (archive_name_len > 8 &&
- strcmp(archive_name + archive_name_len - 8, ".tar.lz4") == 0);
-
- /* Is this a .tar.zst archive? */
- is_tar_zstd = (archive_name_len > 8 &&
- strcmp(archive_name + archive_name_len - 8, ".tar.zst") == 0);
+ /* Check whether it is a tar archive and its compression type */
+ is_tar = parse_tar_compress_algorithm(archive_name,
+ &compressed_tar_algorithm);
/* Is this any kind of compressed tar? */
- is_compressed_tar = is_tar_gz || is_tar_lz4 || is_tar_zstd;
+ is_compressed_tar = (is_tar &&
+ compressed_tar_algorithm != PG_COMPRESSION_NONE);
/*
* Injecting the manifest into a compressed tar file would be possible if
@@ -1128,7 +1114,7 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
(spclocation == NULL && writerecoveryconf));
/* At present, we only know how to parse tar archives. */
- if (must_parse_archive && !is_tar && !is_compressed_tar)
+ if (must_parse_archive && !is_tar)
{
pg_log_error("cannot parse archive \"%s\"", archive_name);
pg_log_error_detail("Only tar archives can be parsed.");
@@ -1263,13 +1249,13 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
* If the user has requested a server compressed archive along with
* archive extraction at client then we need to decompress it.
*/
- if (format == 'p')
+ if (format == 'p' && is_compressed_tar)
{
- if (is_tar_gz)
+ if (compressed_tar_algorithm == PG_COMPRESSION_GZIP)
streamer = astreamer_gzip_decompressor_new(streamer);
- else if (is_tar_lz4)
+ else if (compressed_tar_algorithm == PG_COMPRESSION_LZ4)
streamer = astreamer_lz4_decompressor_new(streamer);
- else if (is_tar_zstd)
+ else if (compressed_tar_algorithm == PG_COMPRESSION_ZSTD)
streamer = astreamer_zstd_decompressor_new(streamer);
}
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index cbc9447384f..31f606c45b1 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -941,17 +941,7 @@ precheck_tar_backup_file(verifier_context *context, char *relpath,
}
/* Now, check the compression type of the tar */
- if (strcmp(suffix, ".tar") == 0)
- compress_algorithm = PG_COMPRESSION_NONE;
- else if (strcmp(suffix, ".tgz") == 0)
- compress_algorithm = PG_COMPRESSION_GZIP;
- else if (strcmp(suffix, ".tar.gz") == 0)
- compress_algorithm = PG_COMPRESSION_GZIP;
- else if (strcmp(suffix, ".tar.lz4") == 0)
- compress_algorithm = PG_COMPRESSION_LZ4;
- else if (strcmp(suffix, ".tar.zst") == 0)
- compress_algorithm = PG_COMPRESSION_ZSTD;
- else
+ if (!parse_tar_compress_algorithm(suffix, &compress_algorithm))
{
report_backup_error(context,
"file \"%s\" is not expected in a tar format backup",
diff --git a/src/common/compression.c b/src/common/compression.c
index 92cd4ec7a0d..ae2089d9406 100644
--- a/src/common/compression.c
+++ b/src/common/compression.c
@@ -41,6 +41,36 @@ static int expect_integer_value(char *keyword, char *value,
static bool expect_boolean_value(char *keyword, char *value,
pg_compress_specification *result);
+/*
+ * Look up a compression algorithm by archive file extension. Returns true and
+ * sets *algorithm if the extension is recognized. Otherwise returns false.
+ */
+bool
+parse_tar_compress_algorithm(const char *fname, pg_compress_algorithm *algorithm)
+{
+ size_t fname_len = strlen(fname);
+
+ if (fname_len >= 4 &&
+ strcmp(fname + fname_len - 4, ".tar") == 0)
+ *algorithm = PG_COMPRESSION_NONE;
+ else if (fname_len >= 4 &&
+ strcmp(fname + fname_len - 4, ".tgz") == 0)
+ *algorithm = PG_COMPRESSION_GZIP;
+ else if (fname_len >= 7 &&
+ strcmp(fname + fname_len - 7, ".tar.gz") == 0)
+ *algorithm = PG_COMPRESSION_GZIP;
+ else if (fname_len >= 8 &&
+ strcmp(fname + fname_len - 8, ".tar.lz4") == 0)
+ *algorithm = PG_COMPRESSION_LZ4;
+ else if (fname_len >= 8 &&
+ strcmp(fname + fname_len - 8, ".tar.zst") == 0)
+ *algorithm = PG_COMPRESSION_ZSTD;
+ else
+ return false;
+
+ return true;
+}
+
/*
* Look up a compression algorithm by name. Returns true and sets *algorithm
* if the name is recognized. Otherwise returns false.
diff --git a/src/include/common/compression.h b/src/include/common/compression.h
index 6c745b90066..f99c747cdd3 100644
--- a/src/include/common/compression.h
+++ b/src/include/common/compression.h
@@ -41,6 +41,8 @@ typedef struct pg_compress_specification
extern void parse_compress_options(const char *option, char **algorithm,
char **detail);
+extern bool parse_tar_compress_algorithm(const char *fname,
+ pg_compress_algorithm *algorithm);
extern bool parse_compress_algorithm(char *name, pg_compress_algorithm *algorithm);
extern const char *get_compress_algorithm_name(pg_compress_algorithm algorithm);
--
2.47.1
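The suffix checks that 0001 consolidates into parse_tar_compress_algorithm() can also be written as a table lookup. The following is a standalone sketch with a local compress_algo enum standing in for pg_compress_algorithm. It recognizes the same five suffixes as the patch, but it is an alternative formulation rather than the submitted code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Local stand-in for PostgreSQL's pg_compress_algorithm. */
typedef enum
{
	COMPRESS_NONE,
	COMPRESS_GZIP,
	COMPRESS_LZ4,
	COMPRESS_ZSTD
} compress_algo;

/* Return true if fname ends with suffix. */
static bool
has_suffix(const char *fname, const char *suffix)
{
	size_t		flen = strlen(fname);
	size_t		slen = strlen(suffix);

	return flen >= slen && strcmp(fname + flen - slen, suffix) == 0;
}

/*
 * Map an archive file name to its compression algorithm.  The suffixes do
 * not overlap (".tar.gz" does not end in ".tar"), so table order is irrelevant.
 */
static bool
tar_compress_from_name(const char *fname, compress_algo *algo)
{
	static const struct
	{
		const char *suffix;
		compress_algo algo;
	}			table[] = {
		{".tar", COMPRESS_NONE},
		{".tgz", COMPRESS_GZIP},
		{".tar.gz", COMPRESS_GZIP},
		{".tar.lz4", COMPRESS_LZ4},
		{".tar.zst", COMPRESS_ZSTD},
	};

	assert(fname != NULL && algo != NULL);
	for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
	{
		if (has_suffix(fname, table[i].suffix))
		{
			*algo = table[i].algo;
			return true;
		}
	}
	return false;
}
```

As in the patched CreateBackupStreamer(), a caller can derive "is a compressed tar" as a successful match whose algorithm is not COMPRESS_NONE.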
[application/octet-stream] v22-0002-pg_waldump-Preparatory-refactoring-for-tar-archi.patch (8.4K, 3-v22-0002-pg_waldump-Preparatory-refactoring-for-tar-archi.patch)
From 07c53a039d8be2abedc60810ab62eca77d21bfb8 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Thu, 19 Mar 2026 15:43:39 +0530
Subject: [PATCH v22 2/5] pg_waldump: Preparatory refactoring for tar archive
WAL decoding.
Several refactoring steps in preparation for adding tar archive WAL
decoding support to pg_waldump:
- Move XLogDumpPrivate and related declarations into a new pg_waldump.h
header, allowing a second source file to share them.
- Factor out required_read_len() so the read-size calculation can be
reused for both regular WAL files and tar-archived WAL.
- Move the WAL segment size variable into XLogDumpPrivate and rename it
to segsize, making it accessible to the archive streamer code.
Author: Amul Sul <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Jakub Wartak <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Euler Taveira <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
discussion: https://postgr.es/m/CAAJ_b94bqdWN3h2J-PzzzQ2Npbwct5ZQHggn_QoYGhC2rn-=WQ@mail.gmail.com
---
src/bin/pg_waldump/pg_waldump.c | 78 +++++++++++++++++++--------------
src/bin/pg_waldump/pg_waldump.h | 26 +++++++++++
2 files changed, 70 insertions(+), 34 deletions(-)
create mode 100644 src/bin/pg_waldump/pg_waldump.h
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index f3446385d6a..5d31b15dbd8 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -29,6 +29,7 @@
#include "common/logging.h"
#include "common/relpath.h"
#include "getopt_long.h"
+#include "pg_waldump.h"
#include "rmgrdesc.h"
#include "storage/bufpage.h"
@@ -43,14 +44,6 @@ static volatile sig_atomic_t time_to_stop = false;
static const RelFileLocator emptyRelFileLocator = {0, 0, 0};
-typedef struct XLogDumpPrivate
-{
- TimeLineID timeline;
- XLogRecPtr startptr;
- XLogRecPtr endptr;
- bool endptr_reached;
-} XLogDumpPrivate;
-
typedef struct XLogDumpConfig
{
/* display options */
@@ -333,6 +326,32 @@ identify_target_directory(char *directory, char *fname, int *WalSegSz)
return NULL; /* not reached */
}
+/*
+ * Returns the size in bytes of the data to be read. Returns -1 if the end
+ * point has already been reached.
+ */
+static inline int
+required_read_len(XLogDumpPrivate *private, XLogRecPtr targetPagePtr,
+ int reqLen)
+{
+ int count = XLOG_BLCKSZ;
+
+ if (XLogRecPtrIsValid(private->endptr))
+ {
+ if (targetPagePtr + XLOG_BLCKSZ <= private->endptr)
+ count = XLOG_BLCKSZ;
+ else if (targetPagePtr + reqLen <= private->endptr)
+ count = private->endptr - targetPagePtr;
+ else
+ {
+ private->endptr_reached = true;
+ return -1;
+ }
+ }
+
+ return count;
+}
+
/* pg_waldump's XLogReaderRoutine->segment_open callback */
static void
WALDumpOpenSegment(XLogReaderState *state, XLogSegNo nextSegNo,
@@ -390,21 +409,12 @@ WALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
XLogRecPtr targetPtr, char *readBuff)
{
XLogDumpPrivate *private = state->private_data;
- int count = XLOG_BLCKSZ;
+ int count = required_read_len(private, targetPagePtr, reqLen);
WALReadError errinfo;
- if (XLogRecPtrIsValid(private->endptr))
- {
- if (targetPagePtr + XLOG_BLCKSZ <= private->endptr)
- count = XLOG_BLCKSZ;
- else if (targetPagePtr + reqLen <= private->endptr)
- count = private->endptr - targetPagePtr;
- else
- {
- private->endptr_reached = true;
- return -1;
- }
- }
+ /* Bail out if the count to be read is not valid */
+ if (count < 0)
+ return -1;
if (!WALRead(state, readBuff, targetPagePtr, count, private->timeline,
&errinfo))
@@ -801,7 +811,6 @@ main(int argc, char **argv)
XLogRecPtr first_record;
char *waldir = NULL;
char *errormsg;
- int WalSegSz;
static struct option long_options[] = {
{"bkp-details", no_argument, NULL, 'b'},
@@ -855,6 +864,7 @@ main(int argc, char **argv)
memset(&stats, 0, sizeof(XLogStats));
private.timeline = 1;
+ private.segsize = 0;
private.startptr = InvalidXLogRecPtr;
private.endptr = InvalidXLogRecPtr;
private.endptr_reached = false;
@@ -1128,18 +1138,18 @@ main(int argc, char **argv)
pg_fatal("could not open directory \"%s\": %m", waldir);
}
- waldir = identify_target_directory(waldir, fname, &WalSegSz);
+ waldir = identify_target_directory(waldir, fname, &private.segsize);
fd = open_file_in_directory(waldir, fname);
if (fd < 0)
pg_fatal("could not open file \"%s\"", fname);
close(fd);
/* parse position from file */
- XLogFromFileName(fname, &private.timeline, &segno, WalSegSz);
+ XLogFromFileName(fname, &private.timeline, &segno, private.segsize);
if (!XLogRecPtrIsValid(private.startptr))
- XLogSegNoOffsetToRecPtr(segno, 0, WalSegSz, private.startptr);
- else if (!XLByteInSeg(private.startptr, segno, WalSegSz))
+ XLogSegNoOffsetToRecPtr(segno, 0, private.segsize, private.startptr);
+ else if (!XLByteInSeg(private.startptr, segno, private.segsize))
{
pg_log_error("start WAL location %X/%08X is not inside file \"%s\"",
LSN_FORMAT_ARGS(private.startptr),
@@ -1149,7 +1159,7 @@ main(int argc, char **argv)
/* no second file specified, set end position */
if (!(optind + 1 < argc) && !XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(segno + 1, 0, WalSegSz, private.endptr);
+ XLogSegNoOffsetToRecPtr(segno + 1, 0, private.segsize, private.endptr);
/* parse ENDSEG if passed */
if (optind + 1 < argc)
@@ -1165,14 +1175,14 @@ main(int argc, char **argv)
close(fd);
/* parse position from file */
- XLogFromFileName(fname, &private.timeline, &endsegno, WalSegSz);
+ XLogFromFileName(fname, &private.timeline, &endsegno, private.segsize);
if (endsegno < segno)
pg_fatal("ENDSEG %s is before STARTSEG %s",
argv[optind + 1], argv[optind]);
if (!XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(endsegno + 1, 0, WalSegSz,
+ XLogSegNoOffsetToRecPtr(endsegno + 1, 0, private.segsize,
private.endptr);
/* set segno to endsegno for check of --end */
@@ -1180,8 +1190,8 @@ main(int argc, char **argv)
}
- if (!XLByteInSeg(private.endptr, segno, WalSegSz) &&
- private.endptr != (segno + 1) * WalSegSz)
+ if (!XLByteInSeg(private.endptr, segno, private.segsize) &&
+ private.endptr != (segno + 1) * private.segsize)
{
pg_log_error("end WAL location %X/%08X is not inside file \"%s\"",
LSN_FORMAT_ARGS(private.endptr),
@@ -1190,7 +1200,7 @@ main(int argc, char **argv)
}
}
else
- waldir = identify_target_directory(waldir, NULL, &WalSegSz);
+ waldir = identify_target_directory(waldir, NULL, &private.segsize);
/* we don't know what to print */
if (!XLogRecPtrIsValid(private.startptr))
@@ -1203,7 +1213,7 @@ main(int argc, char **argv)
/* we have everything we need, start reading */
xlogreader_state =
- XLogReaderAllocate(WalSegSz, waldir,
+ XLogReaderAllocate(private.segsize, waldir,
XL_ROUTINE(.page_read = WALDumpReadPage,
.segment_open = WALDumpOpenSegment,
.segment_close = WALDumpCloseSegment),
@@ -1224,7 +1234,7 @@ main(int argc, char **argv)
* a segment (e.g. we were used in file mode).
*/
if (first_record != private.startptr &&
- XLogSegmentOffset(private.startptr, WalSegSz) != 0)
+ XLogSegmentOffset(private.startptr, private.segsize) != 0)
pg_log_info(ngettext("first record is after %X/%08X, at %X/%08X, skipping over %u byte",
"first record is after %X/%08X, at %X/%08X, skipping over %u bytes",
(first_record - private.startptr)),
diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h
new file mode 100644
index 00000000000..013b051506f
--- /dev/null
+++ b/src/bin/pg_waldump/pg_waldump.h
@@ -0,0 +1,26 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_waldump.h - decode and display WAL
+ *
+ * Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_waldump/pg_waldump.h
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_WALDUMP_H
+#define PG_WALDUMP_H
+
+#include "access/xlogdefs.h"
+
+/* Contains the necessary information to drive WAL decoding */
+typedef struct XLogDumpPrivate
+{
+ TimeLineID timeline;
+ int segsize;
+ XLogRecPtr startptr;
+ XLogRecPtr endptr;
+ bool endptr_reached;
+} XLogDumpPrivate;
+
+#endif /* PG_WALDUMP_H */
--
2.47.1
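The three outcomes of required_read_len() factored out in 0002 can be checked in isolation. In the standalone sketch below, lsn_t, dump_state and BLCKSZ_SKETCH are hypothetical stand-ins for XLogRecPtr, XLogDumpPrivate and XLOG_BLCKSZ, and the default 8192-byte page size is assumed. It returns a whole page when there is no end point or the page lies entirely before it, returns a partial length when the request still fits before the end point, and otherwise flags endptr_reached and returns -1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BLCKSZ_SKETCH 8192		/* assumption: default XLOG_BLCKSZ */

typedef uint64_t lsn_t;			/* stand-in for XLogRecPtr; 0 = invalid */

typedef struct
{
	lsn_t		endptr;			/* 0 means no end limit */
	bool		endptr_reached;
} dump_state;

static int
read_len_sketch(dump_state *st, lsn_t target_page, int req_len)
{
	assert(req_len > 0 && req_len <= BLCKSZ_SKETCH);

	if (st->endptr == 0)
		return BLCKSZ_SKETCH;	/* no end point: read a whole page */
	if (target_page + BLCKSZ_SKETCH <= st->endptr)
		return BLCKSZ_SKETCH;	/* page lies entirely before the end */
	if (target_page + (lsn_t) req_len <= st->endptr)
		return (int) (st->endptr - target_page);	/* partial last page */

	st->endptr_reached = true;	/* request crosses the end point */
	return -1;
}
```

With an end point of 0/010020F8, for instance, the page at 0/01002000 yields 0xF8 bytes for a small request and -1 for a request that extends past the end point. This matches the WALDumpReadPage() behavior before the refactoring.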
[application/octet-stream] v22-0003-pg_waldump-Add-support-for-reading-WAL-from-tar-.patch (56.9K, 4-v22-0003-pg_waldump-Add-support-for-reading-WAL-from-tar-.patch)
From 1524a491d93f1abb34530bc9fef4116bbd84d33a Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Thu, 19 Mar 2026 15:43:46 +0530
Subject: [PATCH v22 3/5] pg_waldump: Add support for reading WAL from tar
archives
pg_waldump can now accept the path to a tar archive (optionally
compressed with gzip, lz4, or zstd) containing WAL files and decode
them. This was added primarily for pg_verifybackup, which previously
had to skip WAL parsing for tar-format backups.
The implementation uses the existing archive streamer infrastructure
with a hash table to track WAL segments read from the archive. If WAL
files within the archive are not in sequential order, out-of-order
segments are written to a temporary directory (created via mkdtemp under
$TMPDIR or the archive's directory) and read back when needed. An
atexit callback ensures the temporary directory is cleaned up.
The --follow option is not supported when reading from a tar archive.
Author: Amul Sul <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Jakub Wartak <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Euler Taveira <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Reviewed-by: Zsolt Parragi <[email protected]>
discussion: https://postgr.es/m/CAAJ_b94bqdWN3h2J-PzzzQ2Npbwct5ZQHggn_QoYGhC2rn-=WQ@mail.gmail.com
---
doc/src/sgml/ref/pg_waldump.sgml | 23 +-
src/bin/pg_waldump/Makefile | 7 +-
src/bin/pg_waldump/archive_waldump.c | 861 +++++++++++++++++++++++++++
src/bin/pg_waldump/meson.build | 4 +-
src/bin/pg_waldump/pg_waldump.c | 288 +++++++--
src/bin/pg_waldump/pg_waldump.h | 51 ++
src/bin/pg_waldump/t/001_basic.pl | 246 ++++++--
src/tools/pgindent/typedefs.list | 4 +
8 files changed, 1356 insertions(+), 128 deletions(-)
create mode 100644 src/bin/pg_waldump/archive_waldump.c
diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index d1715ff5124..9bbb4bd5772 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -141,13 +141,21 @@ PostgreSQL documentation
<term><option>--path=<replaceable>path</replaceable></option></term>
<listitem>
<para>
- Specifies a directory to search for WAL segment files or a
- directory with a <literal>pg_wal</literal> subdirectory that
+ Specifies a tar archive or a directory to search for WAL segment files
+ or a directory with a <literal>pg_wal</literal> subdirectory that
contains such files. The default is to search in the current
directory, the <literal>pg_wal</literal> subdirectory of the
current directory, and the <literal>pg_wal</literal> subdirectory
of <envar>PGDATA</envar>.
</para>
+ <para>
+ If a tar archive is provided and its WAL segment files are not in
+ sequential order, those files will be written to a temporary directory
+ named starting with <filename>waldump_tmp</filename>. This directory will be
+ created inside the directory specified by the <envar>TMPDIR</envar>
+ environment variable if it is set; otherwise, it will be created within
+ the same directory as the tar archive.
+ </para>
</listitem>
</varlistentry>
@@ -383,6 +391,17 @@ PostgreSQL documentation
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term><envar>TMPDIR</envar></term>
+ <listitem>
+ <para>
+ Directory in which to create temporary files when reading WAL from a
+ tar archive with out-of-order segment files. If not set, the temporary
+ directory is created within the same directory as the tar archive.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</refsect1>
diff --git a/src/bin/pg_waldump/Makefile b/src/bin/pg_waldump/Makefile
index 4c1ee649501..aabb87566a2 100644
--- a/src/bin/pg_waldump/Makefile
+++ b/src/bin/pg_waldump/Makefile
@@ -3,6 +3,9 @@
PGFILEDESC = "pg_waldump - decode and display WAL"
PGAPPICON=win32
+# make these available to TAP test scripts
+export TAR
+
subdir = src/bin/pg_waldump
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
@@ -10,13 +13,15 @@ include $(top_builddir)/src/Makefile.global
OBJS = \
$(RMGRDESCOBJS) \
$(WIN32RES) \
+ archive_waldump.o \
compat.o \
pg_waldump.o \
rmgrdesc.o \
xlogreader.o \
xlogstats.o
-override CPPFLAGS := -DFRONTEND $(CPPFLAGS)
+override CPPFLAGS := -DFRONTEND -I$(libpq_srcdir) $(CPPFLAGS)
+LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils
RMGRDESCSOURCES = $(sort $(notdir $(wildcard $(top_srcdir)/src/backend/access/rmgrdesc/*desc*.c)))
RMGRDESCOBJS = $(patsubst %.c,%.o,$(RMGRDESCSOURCES))
diff --git a/src/bin/pg_waldump/archive_waldump.c b/src/bin/pg_waldump/archive_waldump.c
new file mode 100644
index 00000000000..f372777366e
--- /dev/null
+++ b/src/bin/pg_waldump/archive_waldump.c
@@ -0,0 +1,861 @@
+/*-------------------------------------------------------------------------
+ *
+ * archive_waldump.c
+ * A generic facility for reading WAL data from tar archives via archive
+ * streamer.
+ *
+ * Portions Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * src/bin/pg_waldump/archive_waldump.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <unistd.h>
+
+#include "access/xlog_internal.h"
+#include "common/file_perm.h"
+#include "common/hashfn.h"
+#include "common/logging.h"
+#include "fe_utils/simple_list.h"
+#include "pg_waldump.h"
+
+/*
+ * How many bytes should we try to read from a file at once?
+ */
+#define READ_CHUNK_SIZE (128 * 1024)
+
+/* Temporary directory for spilled WAL segment files */
+char *TmpWalSegDir = NULL;
+
+/*
+ * Check if the start segment number is zero; this indicates a request to read
+ * any WAL file.
+ */
+#define READ_ANY_WAL(privateInfo) ((privateInfo)->start_segno == 0)
+
+/*
+ * Hash entry representing a WAL segment retrieved from the archive.
+ *
+ * While WAL segments are typically read sequentially, individual entries
+ * maintain their own buffers for the following reasons:
+ *
+ * 1. Boundary Handling: The archive streamer provides a continuous byte
+ * stream. A single streaming chunk may contain the end of one WAL segment
+ * and the start of the next. Separate buffers allow us to easily
+ * partition and track these bytes by their respective segments.
+ *
+ * 2. Out-of-Order Support: Dedicated buffers simplify logic when segments
+ * are archived or retrieved out of sequence.
+ *
+ * To minimize the memory footprint, entries and their associated buffers are
+ * freed immediately once consumed. Since pg_waldump does not request the same
+ * bytes twice, a segment is discarded as soon as pg_waldump moves past it.
+ */
+typedef struct ArchivedWALFile
+{
+ uint32 status; /* hash status */
+ const char *fname; /* hash key: WAL segment name */
+
+ StringInfo buf; /* holds WAL bytes read from archive */
+ bool spilled; /* true if the WAL data was spilled to a
+ * temporary file */
+
+ int read_len; /* total bytes received from archive for this
+ * segment, including already-consumed data */
+} ArchivedWALFile;
+
+static uint32 hash_string_pointer(const char *s);
+#define SH_PREFIX ArchivedWAL
+#define SH_ELEMENT_TYPE ArchivedWALFile
+#define SH_KEY_TYPE const char *
+#define SH_KEY fname
+#define SH_HASH_KEY(tb, key) hash_string_pointer(key)
+#define SH_EQUAL(tb, a, b) (strcmp(a, b) == 0)
+#define SH_SCOPE static inline
+#define SH_RAW_ALLOCATOR pg_malloc0
+#define SH_DECLARE
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+typedef struct astreamer_waldump
+{
+ astreamer base;
+ XLogDumpPrivate *privateInfo;
+} astreamer_waldump;
+
+static ArchivedWALFile *get_archive_wal_entry(const char *fname,
+ XLogDumpPrivate *privateInfo);
+static int read_archive_file(XLogDumpPrivate *privateInfo, Size count);
+static void setup_tmpwal_dir(const char *waldir);
+static void cleanup_tmpwal_dir_atexit(void);
+
+static FILE *prepare_tmp_write(const char *fname, XLogDumpPrivate *privateInfo);
+static void perform_tmp_write(const char *fname, StringInfo buf, FILE *file);
+
+static astreamer *astreamer_waldump_new(XLogDumpPrivate *privateInfo);
+static void astreamer_waldump_content(astreamer *streamer,
+ astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context);
+static void astreamer_waldump_finalize(astreamer *streamer);
+static void astreamer_waldump_free(astreamer *streamer);
+
+static bool member_is_wal_file(astreamer_waldump *mystreamer,
+ astreamer_member *member,
+ char **fname);
+
+static const astreamer_ops astreamer_waldump_ops = {
+ .content = astreamer_waldump_content,
+ .finalize = astreamer_waldump_finalize,
+ .free = astreamer_waldump_free
+};
+
+/*
+ * Initializes the tar archive reader: opens the archive, builds a hash table
+ * for WAL entries, reads ahead until a full WAL page header is available to
+ * determine the WAL segment size, and computes start/end segment numbers for
+ * filtering.
+ */
+void
+init_archive_reader(XLogDumpPrivate *privateInfo,
+ pg_compress_algorithm compression)
+{
+ int fd;
+ astreamer *streamer;
+ ArchivedWALFile *entry = NULL;
+ XLogLongPageHeader longhdr;
+ XLogSegNo segno;
+ TimeLineID timeline;
+
+ /* Open tar archive and store its file descriptor */
+ fd = open_file_in_directory(privateInfo->archive_dir,
+ privateInfo->archive_name);
+
+ if (fd < 0)
+ pg_fatal("could not open file \"%s\"", privateInfo->archive_name);
+
+ privateInfo->archive_fd = fd;
+
+ streamer = astreamer_waldump_new(privateInfo);
+
+ /* We must first parse the tar archive. */
+ streamer = astreamer_tar_parser_new(streamer);
+
+ /* If the archive is compressed, decompress before parsing. */
+ if (compression == PG_COMPRESSION_GZIP)
+ streamer = astreamer_gzip_decompressor_new(streamer);
+ else if (compression == PG_COMPRESSION_LZ4)
+ streamer = astreamer_lz4_decompressor_new(streamer);
+ else if (compression == PG_COMPRESSION_ZSTD)
+ streamer = astreamer_zstd_decompressor_new(streamer);
+
+ privateInfo->archive_streamer = streamer;
+
+ /*
+ * Allocate a buffer for reading the archive file to facilitate content
+ * decoding; read requests must not exceed the allocated buffer size.
+ */
+ privateInfo->archive_read_buf = pg_malloc(READ_CHUNK_SIZE);
+
+#ifdef USE_ASSERT_CHECKING
+ privateInfo->archive_read_buf_size = READ_CHUNK_SIZE;
+#endif
+
+ /*
+ * Hash table storing WAL entries read from the archive, created with an
+ * arbitrary initial size.
+ */
+ privateInfo->archive_wal_htab = ArchivedWAL_create(8, NULL);
+
+ /*
+ * Read until we have at least one full WAL page (XLOG_BLCKSZ bytes) from
+ * the first WAL segment in the archive so we can extract the WAL segment
+ * size from the long page header.
+ */
+ while (entry == NULL || entry->buf->len < XLOG_BLCKSZ)
+ {
+ if (read_archive_file(privateInfo, XLOG_BLCKSZ) == 0)
+ pg_fatal("could not find WAL in archive \"%s\"",
+ privateInfo->archive_name);
+
+ entry = privateInfo->cur_file;
+ }
+
+ /* Extract the WAL segment size from the long page header */
+ longhdr = (XLogLongPageHeader) entry->buf->data;
+
+ if (!IsValidWalSegSize(longhdr->xlp_seg_size))
+ {
+ pg_log_error(ngettext("invalid WAL segment size in WAL file from archive \"%s\" (%d byte)",
+ "invalid WAL segment size in WAL file from archive \"%s\" (%d bytes)",
+ longhdr->xlp_seg_size),
+ privateInfo->archive_name, longhdr->xlp_seg_size);
+ pg_log_error_detail("The WAL segment size must be a power of two between 1 MB and 1 GB.");
+ exit(1);
+ }
+
+ privateInfo->segsize = longhdr->xlp_seg_size;
+
+ /*
+ * With the WAL segment size available, we can now initialize the
+ * dependent start and end segment numbers.
+ */
+ Assert(XLogRecPtrIsValid(privateInfo->startptr));
+ XLByteToSeg(privateInfo->startptr, privateInfo->start_segno,
+ privateInfo->segsize);
+
+ if (XLogRecPtrIsValid(privateInfo->endptr))
+ XLByteToSeg(privateInfo->endptr, privateInfo->end_segno,
+ privateInfo->segsize);
+
+ /*
+ * The first WAL segment was read before the filtering parameters
+ * (start_segno and end_segno) were initialized, so the archive streamer
+ * could not filter it. Check its timeline and segment number against the
+ * user-provided range now, and remove it from the hash table if it falls
+ * outside. Subsequent WAL segments are filtered automatically by the
+ * archive streamer using the updated start_segno and end_segno values.
+ */
+ XLogFromFileName(entry->fname, &timeline, &segno, privateInfo->segsize);
+ if (privateInfo->timeline != timeline ||
+ privateInfo->start_segno > segno ||
+ privateInfo->end_segno < segno)
+ free_archive_wal_entry(entry->fname, privateInfo);
+}
+
+/*
+ * Release the archive streamer chain and close the archive file.
+ */
+void
+free_archive_reader(XLogDumpPrivate *privateInfo)
+{
+ /*
+ * NB: Normally, astreamer_finalize() is called before astreamer_free() to
+ * flush any remaining buffered data or to ensure the end of the tar
+ * archive is reached. However, when decoding WAL, once we hit the end
+ * LSN, any remaining buffered data or unread portion of the archive can
+ * be safely ignored.
+ */
+ astreamer_free(privateInfo->archive_streamer);
+
+ /* Free any remaining hash table entries and their buffers. */
+ if (privateInfo->archive_wal_htab != NULL)
+ {
+ ArchivedWAL_iterator iter;
+ ArchivedWALFile *entry;
+
+ ArchivedWAL_start_iterate(privateInfo->archive_wal_htab, &iter);
+ while ((entry = ArchivedWAL_iterate(privateInfo->archive_wal_htab,
+ &iter)) != NULL)
+ {
+ if (entry->buf != NULL)
+ destroyStringInfo(entry->buf);
+ }
+ ArchivedWAL_destroy(privateInfo->archive_wal_htab);
+ privateInfo->archive_wal_htab = NULL;
+ }
+
+ /* Free the reusable read buffer. */
+ if (privateInfo->archive_read_buf != NULL)
+ {
+ pg_free(privateInfo->archive_read_buf);
+ privateInfo->archive_read_buf = NULL;
+ }
+
+ /* Close the file. */
+ if (close(privateInfo->archive_fd) != 0)
+ pg_log_error("could not close file \"%s\": %m",
+ privateInfo->archive_name);
+}
+
+/*
+ * Copies the requested WAL data from the hash entry's buffer into readBuff.
+ * If the buffer does not yet contain the needed bytes, fetches more data from
+ * the tar archive via the archive streamer.
+ */
+int
+read_archive_wal_page(XLogDumpPrivate *privateInfo, XLogRecPtr targetPagePtr,
+ Size count, char *readBuff)
+{
+ char *p = readBuff;
+ Size nbytes = count;
+ XLogRecPtr recptr = targetPagePtr;
+ int segsize = privateInfo->segsize;
+ XLogSegNo segno;
+ char fname[MAXFNAMELEN];
+ ArchivedWALFile *entry;
+
+ /* Identify the segment and locate its entry in the archive hash */
+ XLByteToSeg(targetPagePtr, segno, segsize);
+ XLogFileName(fname, privateInfo->timeline, segno, segsize);
+ entry = get_archive_wal_entry(fname, privateInfo);
+
+ while (nbytes > 0)
+ {
+ char *buf = entry->buf->data;
+ int bufLen = entry->buf->len;
+ XLogRecPtr endPtr;
+ XLogRecPtr startPtr;
+
+ /*
+ * Calculate the LSN range currently residing in the buffer.
+ *
+ * read_len tracks total bytes received for this segment (including
+ * already-discarded data), so endPtr is the LSN just past the last
+ * buffered byte, and startPtr is the LSN of the first buffered byte.
+ */
+ XLogSegNoOffsetToRecPtr(segno, entry->read_len, segsize, endPtr);
+ startPtr = endPtr - bufLen;
+
+ /*
+ * If recptr falls within the buffer, copy as much of the requested data
+ * as the buffer holds.
+ */
+ if (bufLen > 0 && startPtr <= recptr && recptr < endPtr)
+ {
+ int copyBytes;
+ int offset = recptr - startPtr;
+
+ /*
+ * Given startPtr <= recptr < endPtr and a total buffer size
+ * 'bufLen', the offset (recptr - startPtr) will always be less
+ * than 'bufLen'.
+ */
+ Assert(offset < bufLen);
+
+ copyBytes = Min(nbytes, bufLen - offset);
+ memcpy(p, buf + offset, copyBytes);
+
+ /* Update state for read */
+ recptr += copyBytes;
+ nbytes -= copyBytes;
+ p += copyBytes;
+ }
+ else
+ {
+ /*
+ * Before starting the actual decoding loop, pg_waldump tries to
+ * locate the first valid record from the user-specified start
+ * position, which might not be the start of a WAL record and
+ * could fall in the middle of a record that spans multiple pages.
+ * Consequently, the valid start position the decoder is looking
+ * for could be far away from that initial position.
+ *
+ * This pre-reading may span multiple pages and fetch data from the
+ * archive streamer in several rounds. Normally, we would discard the
+ * existing buffer contents before fetching the next chunk, but that
+ * data might be needed again once the main loop starts, and the
+ * archive streamer cannot re-read data it has already produced.
+ * Therefore, we delay resetting the buffer until the main decoding
+ * loop is entered.
+ *
+ * Once pg_waldump has entered the main loop, it may re-read the
+ * currently active page, but never an older one; therefore, any
+ * fully consumed WAL data preceding the current page can then be
+ * safely discarded.
+ */
+ if (privateInfo->decoding_started)
+ {
+ resetStringInfo(entry->buf);
+
+ /*
+ * Push back the partial page data for the current page to the
+ * buffer, ensuring a full page remains available for
+ * re-reading if requested.
+ */
+ if (p > readBuff)
+ {
+ Assert((count - nbytes) > 0);
+ appendBinaryStringInfo(entry->buf, readBuff, count - nbytes);
+ }
+ }
+
+ /*
+ * Now, fetch more data. Raise an error if the archive streamer
+ * has moved past our segment (meaning the WAL file in the archive
+ * is shorter than expected) or if reading the archive reached
+ * EOF.
+ */
+ if (privateInfo->cur_file != entry)
+ pg_fatal("WAL segment \"%s\" in archive \"%s\" is too short: read %lld of %lld bytes",
+ fname, privateInfo->archive_name,
+ (long long int) (count - nbytes),
+ (long long int) count);
+ if (read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
+ pg_fatal("unexpected end of archive \"%s\" while reading \"%s\": read %lld of %lld bytes",
+ privateInfo->archive_name, fname,
+ (long long int) (count - nbytes),
+ (long long int) count);
+ }
+ }
+
+ /*
+ * Should have successfully read all the requested bytes or reported a
+ * failure before this point.
+ */
+ Assert(nbytes == 0);
+
+ /*
+ * NB: We return count unchanged. We could return a boolean since we
+ * either successfully read the WAL page or raise an error, but the caller
+ * expects this value to be returned. The routine that reads WAL pages
+ * from physical WAL files follows the same convention.
+ */
+ return count;
+}
+
+/*
+ * Releases the buffer of a WAL entry that is no longer needed, preventing the
+ * accumulation of irrelevant WAL data. Also removes any associated temporary
+ * file and clears privateInfo->cur_file if it points to this entry, so the
+ * archive streamer skips subsequent data for it.
+ */
+void
+free_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo)
+{
+ ArchivedWALFile *entry;
+
+ entry = ArchivedWAL_lookup(privateInfo->archive_wal_htab, fname);
+
+ if (entry == NULL)
+ return;
+
+ /* Destroy the buffer */
+ destroyStringInfo(entry->buf);
+ entry->buf = NULL;
+
+ /* Remove temporary file if any */
+ if (entry->spilled)
+ {
+ char fpath[MAXPGPATH];
+
+ snprintf(fpath, MAXPGPATH, "%s/%s", TmpWalSegDir, fname);
+
+ if (unlink(fpath) == 0)
+ pg_log_debug("removed file \"%s\"", fpath);
+ }
+
+ /* Clear cur_file if it points to the entry being freed */
+ if (privateInfo->cur_file == entry)
+ privateInfo->cur_file = NULL;
+
+ ArchivedWAL_delete_item(privateInfo->archive_wal_htab, entry);
+}
+
+/*
+ * Returns the archived WAL entry from the hash table if it already exists.
+ * Otherwise, reads more data from the archive until the requested entry is
+ * found. If the archive streamer is reading a WAL file from the archive that
+ * is not currently needed, that data is spilled to a temporary file for later
+ * retrieval.
+ */
+static ArchivedWALFile *
+get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo)
+{
+ ArchivedWALFile *entry = NULL;
+ FILE *write_fp = NULL;
+
+ /*
+ * Search the hash table first. If the entry is found, return it.
+ * Otherwise, the requested WAL entry hasn't been read from the archive
+ * yet; invoke the archive streamer to fetch it.
+ */
+ while (1)
+ {
+ /*
+ * Search hash table.
+ *
+ * We perform the search inside the loop because a single iteration of
+ * the archive reader may decompress and extract multiple files into
+ * the hash table. One of these newly added files could be the one we
+ * are seeking.
+ */
+ entry = ArchivedWAL_lookup(privateInfo->archive_wal_htab, fname);
+
+ if (entry != NULL)
+ return entry;
+
+ /*
+ * Capture the current entry before calling read_archive_file(),
+ * because cur_file may advance to a new segment during streaming. We
+ * hold this reference so we can flush any remaining buffer data and
+ * close the write handle once we detect that cur_file has moved on.
+ */
+ entry = privateInfo->cur_file;
+
+ /*
+ * Fetch more data either when no current file is being tracked or
+ * when its buffer has been fully flushed to the temporary file.
+ */
+ if (entry == NULL || entry->buf->len == 0)
+ {
+ if (read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
+ break; /* archive file ended */
+ }
+
+ /*
+ * No WAL file is being tracked: the archive streamer is reading a
+ * non-WAL file or an irrelevant WAL file.
+ */
+ if (entry == NULL)
+ continue;
+
+ /*
+ * The streamer is producing a WAL segment that isn't the one asked
+ * for; it must be arriving out of order. Spill its data to disk so
+ * it can be read back when needed.
+ */
+ Assert(strcmp(fname, entry->fname) != 0);
+
+ /* Create a temporary file if one does not already exist */
+ if (!entry->spilled)
+ {
+ write_fp = prepare_tmp_write(entry->fname, privateInfo);
+ entry->spilled = true;
+ }
+
+ /* Flush data from the buffer to the file */
+ perform_tmp_write(entry->fname, entry->buf, write_fp);
+ resetStringInfo(entry->buf);
+
+ /*
+ * If cur_file changed since we captured entry above, the archive
+ * streamer has finished this segment and moved on. Close its spill
+ * file handle so data is flushed to disk before the next segment
+ * starts writing to a different handle.
+ */
+ if (entry != privateInfo->cur_file && write_fp != NULL)
+ {
+ fclose(write_fp);
+ write_fp = NULL;
+ }
+ }
+
+ /* Requested WAL segment not found */
+ pg_fatal("could not find WAL \"%s\" in archive \"%s\"",
+ fname, privateInfo->archive_name);
+}
+
+/*
+ * Reads a chunk from the archive file and passes it through the streamer
+ * pipeline for decompression (if needed) and tar member extraction.
+ */
+static int
+read_archive_file(XLogDumpPrivate *privateInfo, Size count)
+{
+ int rc;
+
+ /* The read request must not exceed the allocated buffer size. */
+ Assert(privateInfo->archive_read_buf_size >= count);
+
+ rc = read(privateInfo->archive_fd, privateInfo->archive_read_buf, count);
+ if (rc < 0)
+ pg_fatal("could not read file \"%s\": %m",
+ privateInfo->archive_name);
+
+ /*
+ * Decompress (if required), and then parse the previously read contents
+ * of the tar file.
+ */
+ if (rc > 0)
+ astreamer_content(privateInfo->archive_streamer, NULL,
+ privateInfo->archive_read_buf, rc,
+ ASTREAMER_UNKNOWN);
+
+ return rc;
+}
+
+/*
+ * Set up a temporary directory for storing spilled WAL segments.
+ */
+static void
+setup_tmpwal_dir(const char *waldir)
+{
+ char *template;
+
+ Assert(TmpWalSegDir == NULL);
+
+ /*
+ * Use the directory specified by the TMPDIR environment variable. If it's
+ * not set, fall back to the provided WAL directory to store WAL files
+ * temporarily.
+ */
+ template = psprintf("%s/waldump_tmp-XXXXXX",
+ getenv("TMPDIR") ? getenv("TMPDIR") : waldir);
+ TmpWalSegDir = mkdtemp(template);
+
+ if (TmpWalSegDir == NULL)
+ pg_fatal("could not create directory \"%s\": %m", template);
+
+ canonicalize_path(TmpWalSegDir);
+
+ pg_log_debug("created directory \"%s\"", TmpWalSegDir);
+}
+
+/*
+ * Remove temporary directory at exit, if any.
+ */
+static void
+cleanup_tmpwal_dir_atexit(void)
+{
+ Assert(TmpWalSegDir != NULL);
+
+ rmtree(TmpWalSegDir, true);
+
+ TmpWalSegDir = NULL;
+}
+
+/*
+ * Open a file in the temporary spill directory for writing an out-of-order
+ * WAL segment, creating the directory and registering the cleanup callback
+ * if not already done. Returns the open file handle.
+ */
+static FILE *
+prepare_tmp_write(const char *fname, XLogDumpPrivate *privateInfo)
+{
+ char fpath[MAXPGPATH];
+ FILE *file;
+
+ /*
+ * If not already done, set up the temporary directory for storing WAL
+ * segments and register an exit callback to remove it upon completion.
+ */
+ if (unlikely(TmpWalSegDir == NULL))
+ {
+ setup_tmpwal_dir(privateInfo->archive_dir);
+ atexit(cleanup_tmpwal_dir_atexit);
+ }
+
+ snprintf(fpath, MAXPGPATH, "%s/%s", TmpWalSegDir, fname);
+
+ /* Open the spill file for writing */
+ file = fopen(fpath, PG_BINARY_W);
+ if (file == NULL)
+ pg_fatal("could not create file \"%s\": %m", fpath);
+
+#ifndef WIN32
+ if (chmod(fpath, pg_file_create_mode))
+ pg_fatal("could not set permissions on file \"%s\": %m",
+ fpath);
+#endif
+
+ pg_log_debug("spilling to temporary file \"%s\"", fpath);
+
+ return file;
+}
+
+/*
+ * Write buffer data to the given file handle.
+ */
+static void
+perform_tmp_write(const char *fname, StringInfo buf, FILE *file)
+{
+ Assert(file);
+
+ errno = 0;
+ if (buf->len > 0 && fwrite(buf->data, buf->len, 1, file) != 1)
+ {
+ /*
+ * If write didn't set errno, assume problem is no disk space
+ */
+ if (errno == 0)
+ errno = ENOSPC;
+ pg_fatal("could not write to file \"%s/%s\": %m", TmpWalSegDir, fname);
+ }
+}
+
+/*
+ * Create an astreamer that can read WAL from a tar file.
+ */
+static astreamer *
+astreamer_waldump_new(XLogDumpPrivate *privateInfo)
+{
+ astreamer_waldump *streamer;
+
+ streamer = palloc0_object(astreamer_waldump);
+ *((const astreamer_ops **) &streamer->base.bbs_ops) =
+ &astreamer_waldump_ops;
+
+ streamer->privateInfo = privateInfo;
+
+ return &streamer->base;
+}
+
+/*
+ * Main entry point of the archive streamer for reading WAL data from a tar
+ * file. If a member is identified as a valid WAL file, a hash entry is created
+ * for it, and its contents are copied into that entry's buffer, making them
+ * accessible to the decoding routine.
+ */
+static void
+astreamer_waldump_content(astreamer *streamer, astreamer_member *member,
+ const char *data, int len,
+ astreamer_archive_context context)
+{
+ astreamer_waldump *mystreamer = (astreamer_waldump *) streamer;
+ XLogDumpPrivate *privateInfo = mystreamer->privateInfo;
+
+ Assert(context != ASTREAMER_UNKNOWN);
+
+ switch (context)
+ {
+ case ASTREAMER_MEMBER_HEADER:
+ {
+ char *fname = NULL;
+ ArchivedWALFile *entry;
+ bool found;
+
+ pg_log_debug("reading \"%s\"", member->pathname);
+
+ if (!member_is_wal_file(mystreamer, member, &fname))
+ break;
+
+ /*
+ * Skip range filtering during initial startup, before the WAL
+ * segment size and segment number bounds are known.
+ */
+ if (!READ_ANY_WAL(privateInfo))
+ {
+ XLogSegNo segno;
+ TimeLineID timeline;
+
+ /*
+ * Skip the segment if its timeline does not match or if it
+ * falls outside the caller-specified range.
+ */
+ XLogFromFileName(fname, &timeline, &segno, privateInfo->segsize);
+ if (privateInfo->timeline != timeline ||
+ privateInfo->start_segno > segno ||
+ privateInfo->end_segno < segno)
+ {
+ pfree(fname);
+ break;
+ }
+ }
+
+ entry = ArchivedWAL_insert(privateInfo->archive_wal_htab,
+ fname, &found);
+
+ /*
+ * Shouldn't happen, but if it does, simply ignore the
+ * duplicate WAL file.
+ */
+ if (found)
+ {
+ pg_log_warning("ignoring duplicate WAL \"%s\" found in archive \"%s\"",
+ member->pathname, privateInfo->archive_name);
+ pfree(fname);
+ break;
+ }
+
+ entry->buf = makeStringInfo();
+ entry->spilled = false;
+ entry->read_len = 0;
+ privateInfo->cur_file = entry;
+ }
+ break;
+
+ case ASTREAMER_MEMBER_CONTENTS:
+ if (privateInfo->cur_file)
+ {
+ appendBinaryStringInfo(privateInfo->cur_file->buf, data, len);
+ privateInfo->cur_file->read_len += len;
+ }
+ break;
+
+ case ASTREAMER_MEMBER_TRAILER:
+
+ /*
+ * End of this tar member; mark cur_file NULL so subsequent
+ * content callbacks (if any) know no WAL file is currently
+ * active.
+ */
+ privateInfo->cur_file = NULL;
+ break;
+
+ case ASTREAMER_ARCHIVE_TRAILER:
+ break;
+
+ default:
+ /* Shouldn't happen. */
+ pg_fatal("unexpected state while parsing tar file");
+ }
+}
+
+/*
+ * End-of-stream processing for an astreamer_waldump stream. This is a
+ * terminal streamer so it must have no successor.
+ */
+static void
+astreamer_waldump_finalize(astreamer *streamer)
+{
+ Assert(streamer->bbs_next == NULL);
+}
+
+/*
+ * Free memory associated with an astreamer_waldump stream.
+ */
+static void
+astreamer_waldump_free(astreamer *streamer)
+{
+ Assert(streamer->bbs_next == NULL);
+ pfree(streamer);
+}
+
+/*
+ * Returns true if the archive member name matches the WAL naming format. If
+ * successful, it also outputs the WAL segment name.
+ */
+static bool
+member_is_wal_file(astreamer_waldump *mystreamer, astreamer_member *member,
+ char **fname)
+{
+ int pathlen;
+ char pathname[MAXPGPATH];
+ char *filename;
+
+ /* We are only interested in normal files */
+ if (member->is_directory || member->is_link)
+ return false;
+
+ if (strlen(member->pathname) < XLOG_FNAME_LEN)
+ return false;
+
+ /*
+ * For a correct comparison, we must remove any '.' or '..' components
+ * from the member pathname. Similar to member_verify_header(), we prepend
+ * './' to the path so that canonicalize_path() can properly resolve and
+ * strip these references from the tar member name.
+ */
+ snprintf(pathname, MAXPGPATH, "./%s", member->pathname);
+ canonicalize_path(pathname);
+ pathlen = strlen(pathname);
+
+ /* Skip files in subdirectories other than pg_wal/ */
+ if (pathlen > XLOG_FNAME_LEN &&
+ strncmp(pathname, XLOGDIR, strlen(XLOGDIR)) != 0)
+ return false;
+
+ /* WAL file may appear with a full path (e.g., pg_wal/<name>) */
+ filename = pathname + (pathlen - XLOG_FNAME_LEN);
+ if (!IsXLogFileName(filename))
+ return false;
+
+ *fname = pnstrdup(filename, XLOG_FNAME_LEN);
+
+ return true;
+}
+
+/*
+ * Helper function for WAL file hash table.
+ */
+static uint32
+hash_string_pointer(const char *s)
+{
+ unsigned char *ss = (unsigned char *) s;
+
+ return hash_bytes(ss, strlen(s));
+}
diff --git a/src/bin/pg_waldump/meson.build b/src/bin/pg_waldump/meson.build
index 633a9874bb5..5296f21b82c 100644
--- a/src/bin/pg_waldump/meson.build
+++ b/src/bin/pg_waldump/meson.build
@@ -1,6 +1,7 @@
# Copyright (c) 2022-2026, PostgreSQL Global Development Group
pg_waldump_sources = files(
+ 'archive_waldump.c',
'compat.c',
'pg_waldump.c',
'rmgrdesc.c',
@@ -18,7 +19,7 @@ endif
pg_waldump = executable('pg_waldump',
pg_waldump_sources,
- dependencies: [frontend_code, lz4, zstd],
+ dependencies: [frontend_code, libpq, lz4, zstd],
c_args: ['-DFRONTEND'], # needed for xlogreader et al
kwargs: default_bin_args,
)
@@ -29,6 +30,7 @@ tests += {
'sd': meson.current_source_dir(),
'bd': meson.current_build_dir(),
'tap': {
+ 'env': {'TAR': tar.found() ? tar.full_path() : ''},
'tests': [
't/001_basic.pl',
't/002_save_fullpage.pl',
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 5d31b15dbd8..f82507ef696 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -176,7 +176,7 @@ split_path(const char *path, char **dir, char **fname)
*
* return a read only fd
*/
-static int
+int
open_file_in_directory(const char *directory, const char *fname)
{
int fd = -1;
@@ -327,8 +327,8 @@ identify_target_directory(char *directory, char *fname, int *WalSegSz)
}
/*
- * Returns the size in bytes of the data to be read. Returns -1 if the end
- * point has already been reached.
+ * Returns the number of bytes to read for the given page. Returns -1 if
+ * the requested range has already been reached or exceeded.
*/
static inline int
required_read_len(XLogDumpPrivate *private, XLogRecPtr targetPagePtr,
@@ -412,7 +412,7 @@ WALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
int count = required_read_len(private, targetPagePtr, reqLen);
WALReadError errinfo;
- /* Bail out if the count to be read is not valid */
+ /* Bail out if the end of the requested range has already been reached */
if (count < 0)
return -1;
@@ -440,6 +440,109 @@ WALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
return count;
}
+/*
+ * pg_waldump's XLogReaderRoutine->segment_open callback to support dumping WAL
+ * files from tar archives. Segment tracking is handled by
+ * TarWALDumpReadPage, so no action is needed here.
+ */
+static void
+TarWALDumpOpenSegment(XLogReaderState *state, XLogSegNo nextSegNo,
+ TimeLineID *tli_p)
+{
+ /* No action needed */
+}
+
+/*
+ * pg_waldump's XLogReaderRoutine->segment_close callback to support dumping
+ * WAL files from tar archives. Segment tracking is handled by
+ * TarWALDumpReadPage, so no action is needed here.
+ */
+static void
+TarWALDumpCloseSegment(XLogReaderState *state)
+{
+ /* No action needed */
+}
+
+/*
+ * pg_waldump's XLogReaderRoutine->page_read callback to support dumping WAL
+ * files from tar archives.
+ */
+static int
+TarWALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
+ XLogRecPtr targetPtr, char *readBuff)
+{
+ XLogDumpPrivate *private = state->private_data;
+ int count = required_read_len(private, targetPagePtr, reqLen);
+ int segsize = state->segcxt.ws_segsize;
+ XLogSegNo curSegNo;
+
+ /* Bail out if the end of the requested range has already been reached */
+ if (count < 0)
+ return -1;
+
+ /*
+ * If the target page is in a different segment, release the hash entry
+ * buffer and remove any spilled temporary file for the previous segment.
+ * Since pg_waldump never requests the same WAL bytes twice, moving to a
+ * new segment means the previous segment's data will not be needed again.
+ *
+ * Afterward, check whether the next required WAL segment was already
+ * spilled to the temporary directory before invoking the archive
+ * streamer.
+ */
+ curSegNo = state->seg.ws_segno;
+ if (!XLByteInSeg(targetPagePtr, curSegNo, segsize))
+ {
+ char fname[MAXFNAMELEN];
+ XLogSegNo nextSegNo;
+
+ /*
+ * Calculate the next WAL segment to be decoded from the given page
+ * pointer.
+ */
+ XLByteToSeg(targetPagePtr, nextSegNo, segsize);
+ state->seg.ws_tli = private->timeline;
+ state->seg.ws_segno = nextSegNo;
+
+ /* Close the WAL segment file if it is currently open */
+ if (state->seg.ws_file >= 0)
+ {
+ close(state->seg.ws_file);
+ state->seg.ws_file = -1;
+ }
+
+ /*
+ * If in pre-reading mode (prior to actual decoding), do not delete
+ * any entries that might be requested again once the decoding loop
+ * starts. For more details, see the comments in
+ * read_archive_wal_page().
+ */
+ if (private->decoding_started && curSegNo < nextSegNo)
+ {
+ XLogFileName(fname, state->seg.ws_tli, curSegNo, segsize);
+ free_archive_wal_entry(fname, private);
+ }
+
+ /*
+ * If the next segment exists in the temporary spill directory, open
+ * it and continue reading from there.
+ */
+ if (TmpWalSegDir != NULL)
+ {
+ XLogFileName(fname, state->seg.ws_tli, nextSegNo, segsize);
+ state->seg.ws_file = open_file_in_directory(TmpWalSegDir, fname);
+ }
+ }
+
+ /* Continue reading from the open WAL segment, if any */
+ if (state->seg.ws_file >= 0)
+ return WALDumpReadPage(state, targetPagePtr, count, targetPtr,
+ readBuff);
+
+ /* Otherwise, read the WAL page from the archive streamer */
+ return read_archive_wal_page(private, targetPagePtr, count, readBuff);
+}
+
/*
* Boolean to return whether the given WAL record matches a specific relation
* and optionally block.
@@ -777,8 +880,8 @@ usage(void)
printf(_(" -F, --fork=FORK only show records that modify blocks in fork FORK;\n"
" valid names are main, fsm, vm, init\n"));
printf(_(" -n, --limit=N number of records to display\n"));
- printf(_(" -p, --path=PATH directory in which to find WAL segment files or a\n"
- " directory with a ./pg_wal that contains such files\n"
+ printf(_(" -p, --path=PATH a tar archive or a directory in which to find WAL segment files or\n"
+ " a directory with a pg_wal subdirectory containing such files\n"
" (default: current directory, ./pg_wal, $PGDATA/pg_wal)\n"));
printf(_(" -q, --quiet do not print any output, except for errors\n"));
printf(_(" -r, --rmgr=RMGR only show records generated by resource manager RMGR;\n"
@@ -811,6 +914,7 @@ main(int argc, char **argv)
XLogRecPtr first_record;
char *waldir = NULL;
char *errormsg;
+ pg_compress_algorithm compression = PG_COMPRESSION_NONE;
static struct option long_options[] = {
{"bkp-details", no_argument, NULL, 'b'},
@@ -868,6 +972,10 @@ main(int argc, char **argv)
private.startptr = InvalidXLogRecPtr;
private.endptr = InvalidXLogRecPtr;
private.endptr_reached = false;
+ private.decoding_started = false;
+ private.archive_name = NULL;
+ private.start_segno = 0;
+ private.end_segno = UINT64_MAX;
config.quiet = false;
config.bkp_details = false;
@@ -1109,8 +1217,13 @@ main(int argc, char **argv)
if (waldir != NULL)
{
- /* validate path points to directory */
- if (!verify_directory(waldir))
+ /* Check whether the path looks like a tar archive by its extension */
+ if (parse_tar_compress_algorithm(waldir, &compression))
+ {
+ split_path(waldir, &private.archive_dir, &private.archive_name);
+ }
+ /* Otherwise it must be a directory */
+ else if (!verify_directory(waldir))
{
pg_log_error("could not open directory \"%s\": %m", waldir);
goto bad_argument;
@@ -1128,6 +1241,17 @@ main(int argc, char **argv)
int fd;
XLogSegNo segno;
+ /*
+ * If a tar archive is given via --path, the positional STARTSEG and
+ * ENDSEG arguments are not accepted.
+ */
+ if (private.archive_name)
+ {
+ pg_log_error("unnecessary command-line arguments specified with tar archive (first is \"%s\")",
+ argv[optind]);
+ goto bad_argument;
+ }
+
split_path(argv[optind], &directory, &fname);
if (waldir == NULL && directory != NULL)
@@ -1138,68 +1262,75 @@ main(int argc, char **argv)
pg_fatal("could not open directory \"%s\": %m", waldir);
}
- waldir = identify_target_directory(waldir, fname, &private.segsize);
- fd = open_file_in_directory(waldir, fname);
- if (fd < 0)
- pg_fatal("could not open file \"%s\"", fname);
- close(fd);
-
- /* parse position from file */
- XLogFromFileName(fname, &private.timeline, &segno, private.segsize);
-
- if (!XLogRecPtrIsValid(private.startptr))
- XLogSegNoOffsetToRecPtr(segno, 0, private.segsize, private.startptr);
- else if (!XLByteInSeg(private.startptr, segno, private.segsize))
+ if (fname != NULL && parse_tar_compress_algorithm(fname, &compression))
{
- pg_log_error("start WAL location %X/%08X is not inside file \"%s\"",
- LSN_FORMAT_ARGS(private.startptr),
- fname);
- goto bad_argument;
+ private.archive_dir = waldir;
+ private.archive_name = fname;
}
-
- /* no second file specified, set end position */
- if (!(optind + 1 < argc) && !XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(segno + 1, 0, private.segsize, private.endptr);
-
- /* parse ENDSEG if passed */
- if (optind + 1 < argc)
+ else
{
- XLogSegNo endsegno;
-
- /* ignore directory, already have that */
- split_path(argv[optind + 1], &directory, &fname);
-
+ waldir = identify_target_directory(waldir, fname, &private.segsize);
fd = open_file_in_directory(waldir, fname);
if (fd < 0)
pg_fatal("could not open file \"%s\"", fname);
close(fd);
/* parse position from file */
- XLogFromFileName(fname, &private.timeline, &endsegno, private.segsize);
+ XLogFromFileName(fname, &private.timeline, &segno, private.segsize);
- if (endsegno < segno)
- pg_fatal("ENDSEG %s is before STARTSEG %s",
- argv[optind + 1], argv[optind]);
+ if (!XLogRecPtrIsValid(private.startptr))
+ XLogSegNoOffsetToRecPtr(segno, 0, private.segsize, private.startptr);
+ else if (!XLByteInSeg(private.startptr, segno, private.segsize))
+ {
+ pg_log_error("start WAL location %X/%08X is not inside file \"%s\"",
+ LSN_FORMAT_ARGS(private.startptr),
+ fname);
+ goto bad_argument;
+ }
- if (!XLogRecPtrIsValid(private.endptr))
- XLogSegNoOffsetToRecPtr(endsegno + 1, 0, private.segsize,
- private.endptr);
+ /* no second file specified, set end position */
+ if (!(optind + 1 < argc) && !XLogRecPtrIsValid(private.endptr))
+ XLogSegNoOffsetToRecPtr(segno + 1, 0, private.segsize, private.endptr);
- /* set segno to endsegno for check of --end */
- segno = endsegno;
- }
+ /* parse ENDSEG if passed */
+ if (optind + 1 < argc)
+ {
+ XLogSegNo endsegno;
+ /* ignore directory, already have that */
+ split_path(argv[optind + 1], &directory, &fname);
- if (!XLByteInSeg(private.endptr, segno, private.segsize) &&
- private.endptr != (segno + 1) * private.segsize)
- {
- pg_log_error("end WAL location %X/%08X is not inside file \"%s\"",
- LSN_FORMAT_ARGS(private.endptr),
- argv[argc - 1]);
- goto bad_argument;
+ fd = open_file_in_directory(waldir, fname);
+ if (fd < 0)
+ pg_fatal("could not open file \"%s\"", fname);
+ close(fd);
+
+ /* parse position from file */
+ XLogFromFileName(fname, &private.timeline, &endsegno, private.segsize);
+
+ if (endsegno < segno)
+ pg_fatal("ENDSEG %s is before STARTSEG %s",
+ argv[optind + 1], argv[optind]);
+
+ if (!XLogRecPtrIsValid(private.endptr))
+ XLogSegNoOffsetToRecPtr(endsegno + 1, 0, private.segsize,
+ private.endptr);
+
+ /* set segno to endsegno for check of --end */
+ segno = endsegno;
+ }
+
+ if (!XLByteInSeg(private.endptr, segno, private.segsize) &&
+ private.endptr != (segno + 1) * private.segsize)
+ {
+ pg_log_error("end WAL location %X/%08X is not inside file \"%s\"",
+ LSN_FORMAT_ARGS(private.endptr),
+ argv[argc - 1]);
+ goto bad_argument;
+ }
}
}
- else
+ else if (!private.archive_name)
waldir = identify_target_directory(waldir, NULL, &private.segsize);
/* we don't know what to print */
@@ -1209,15 +1340,46 @@ main(int argc, char **argv)
goto bad_argument;
}
+ /* --follow is not supported with tar archives */
+ if (config.follow && private.archive_name)
+ {
+ pg_log_error("--follow is not supported when reading from a tar archive");
+ goto bad_argument;
+ }
+
/* done with argument parsing, do the actual work */
/* we have everything we need, start reading */
- xlogreader_state =
- XLogReaderAllocate(private.segsize, waldir,
- XL_ROUTINE(.page_read = WALDumpReadPage,
- .segment_open = WALDumpOpenSegment,
- .segment_close = WALDumpCloseSegment),
- &private);
+ if (private.archive_name)
+ {
+ /*
+ * A NULL directory indicates that the archive file is located in the
+ * current working directory.
+ */
+ if (private.archive_dir == NULL)
+ private.archive_dir = pg_strdup(".");
+
+ /* Set up for reading tar file */
+ init_archive_reader(&private, compression);
+
+ /* Routine to decode WAL files in tar archive */
+ xlogreader_state =
+ XLogReaderAllocate(private.segsize, private.archive_dir,
+ XL_ROUTINE(.page_read = TarWALDumpReadPage,
+ .segment_open = TarWALDumpOpenSegment,
+ .segment_close = TarWALDumpCloseSegment),
+ &private);
+ }
+ else
+ {
+ xlogreader_state =
+ XLogReaderAllocate(private.segsize, waldir,
+ XL_ROUTINE(.page_read = WALDumpReadPage,
+ .segment_open = WALDumpOpenSegment,
+ .segment_close = WALDumpCloseSegment),
+ &private);
+ }
+
if (!xlogreader_state)
pg_fatal("out of memory while allocating a WAL reading processor");
@@ -1245,6 +1407,9 @@ main(int argc, char **argv)
if (config.stats == true && !config.quiet)
stats.startptr = first_record;
+ /* Flag indicating that the decoding loop has been entered */
+ private.decoding_started = true;
+
for (;;)
{
if (time_to_stop)
@@ -1326,6 +1491,9 @@ main(int argc, char **argv)
XLogReaderFree(xlogreader_state);
+ if (private.archive_name)
+ free_archive_reader(&private);
+
return EXIT_SUCCESS;
bad_argument:
diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h
index 013b051506f..36893624f53 100644
--- a/src/bin/pg_waldump/pg_waldump.h
+++ b/src/bin/pg_waldump/pg_waldump.h
@@ -12,6 +12,14 @@
#define PG_WALDUMP_H
#include "access/xlogdefs.h"
+#include "fe_utils/astreamer.h"
+
+/* Forward declaration */
+struct ArchivedWALFile;
+struct ArchivedWAL_hash;
+
+/* Temporary directory for spilling out-of-order WAL segments from archives */
+extern char *TmpWalSegDir;
/* Contains the necessary information to drive WAL decoding */
typedef struct XLogDumpPrivate
@@ -21,6 +29,49 @@ typedef struct XLogDumpPrivate
XLogRecPtr startptr;
XLogRecPtr endptr;
bool endptr_reached;
+ bool decoding_started;
+
+ /* Fields required to read WAL from archive */
+ char *archive_dir;
+ char *archive_name; /* Tar archive filename */
+ int archive_fd; /* File descriptor for the open tar file */
+
+ astreamer *archive_streamer;
+ char *archive_read_buf; /* Reusable read buffer for archive I/O */
+
+#ifdef USE_ASSERT_CHECKING
+ Size archive_read_buf_size;
+#endif
+
+ /* What the archive streamer is currently reading */
+ struct ArchivedWALFile *cur_file;
+
+ /*
+ * Hash table of WAL segments currently buffered from the archive,
+ * including any segment currently being streamed. Entries are removed
+ * once consumed, so this does not accumulate all segments ever read.
+ */
+ struct ArchivedWAL_hash *archive_wal_htab;
+
+ /*
+ * Pre-computed segment numbers derived from startptr and endptr. Caching
+ * them avoids repeated XLByteToSeg() calls when filtering each archive
+ * member against the requested WAL range. end_segno is initialized to
+ * UINT64_MAX when no end limit is requested.
+ */
+ XLogSegNo start_segno;
+ XLogSegNo end_segno;
} XLogDumpPrivate;
+extern int open_file_in_directory(const char *directory, const char *fname);
+
+extern void init_archive_reader(XLogDumpPrivate *privateInfo,
+ pg_compress_algorithm compression);
+extern void free_archive_reader(XLogDumpPrivate *privateInfo);
+extern int read_archive_wal_page(XLogDumpPrivate *privateInfo,
+ XLogRecPtr targetPagePtr,
+ Size count, char *readBuff);
+extern void free_archive_wal_entry(const char *fname,
+ XLogDumpPrivate *privateInfo);
+
#endif /* PG_WALDUMP_H */
diff --git a/src/bin/pg_waldump/t/001_basic.pl b/src/bin/pg_waldump/t/001_basic.pl
index 5db5d20136f..11df7e092bf 100644
--- a/src/bin/pg_waldump/t/001_basic.pl
+++ b/src/bin/pg_waldump/t/001_basic.pl
@@ -3,9 +3,13 @@
use strict;
use warnings FATAL => 'all';
+use Cwd;
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Test::More;
+use List::Util qw(shuffle);
+
+my $tar = $ENV{TAR};
program_help_ok('pg_waldump');
program_version_ok('pg_waldump');
@@ -162,6 +166,42 @@ CREATE TABLESPACE ts1 LOCATION '$tblspc_path';
DROP TABLESPACE ts1;
});
+# Test: Decode a continuation record (contrecord) that spans multiple WAL
+# segments.
+#
+# Now consume all remaining room in the current WAL segment, leaving
+# space enough only for the start of a largish record.
+$node->safe_psql(
+ 'postgres', q{
+DO $$
+DECLARE
+ wal_segsize int := setting::int FROM pg_settings WHERE name = 'wal_segment_size';
+ remain int;
+ iters int := 0;
+BEGIN
+ LOOP
+ INSERT into t1(b)
+ select repeat(encode(sha256(g::text::bytea), 'hex'), (random() * 15 + 1)::int)
+ from generate_series(1, 10) g;
+
+ remain := wal_segsize - (pg_current_wal_insert_lsn() - '0/0') % wal_segsize;
+ IF remain < 2 * setting::int from pg_settings where name = 'block_size' THEN
+ RAISE log 'exiting after % iterations, % bytes to end of WAL segment', iters, remain;
+ EXIT;
+ END IF;
+ iters := iters + 1;
+ END LOOP;
+END
+$$;
+});
+
+my $contrecord_lsn = $node->safe_psql('postgres',
+ 'SELECT pg_current_wal_insert_lsn()');
+# Generate contrecord record
+$node->safe_psql('postgres',
+ qq{SELECT pg_logical_emit_message(true, 'test 026', repeat('xyzxz', 123456))}
+);
+
my ($end_lsn, $end_walfile) = split /\|/,
$node->safe_psql('postgres',
q{SELECT pg_current_wal_insert_lsn(), pg_walfile_name(pg_current_wal_insert_lsn())}
@@ -198,51 +238,23 @@ command_like(
],
qr/./,
'runs with start and end segment specified');
-command_fails_like(
- [ 'pg_waldump', '--path' => $node->data_dir ],
- qr/error: no start WAL location given/,
- 'path option requires start location');
command_like(
[
- 'pg_waldump',
- '--path' => $node->data_dir,
- '--start' => $start_lsn,
- '--end' => $end_lsn,
- ],
- qr/./,
- 'runs with path option and start and end locations');
-command_fails_like(
- [
- 'pg_waldump',
- '--path' => $node->data_dir,
- '--start' => $start_lsn,
- ],
- qr/error: error in WAL record at/,
- 'falling off the end of the WAL results in an error');
-
-command_like(
- [
- 'pg_waldump', '--quiet',
- $node->data_dir . '/pg_wal/' . $start_walfile
+ 'pg_waldump', '--quiet', '--path',
+ $node->data_dir . '/pg_wal/', $start_walfile
],
qr/^$/,
'no output with --quiet option');
-command_fails_like(
- [
- 'pg_waldump', '--quiet',
- '--path' => $node->data_dir,
- '--start' => $start_lsn
- ],
- qr/error: error in WAL record at/,
- 'errors are shown with --quiet');
-
# Test for: Display a message that we're skipping data if `from`
# wasn't a pointer to the start of a record.
+sub test_pg_waldump_skip_bytes
{
+ my ($path, $startlsn, $endlsn) = @_;
+
# Construct a new LSN that is one byte past the original
# start_lsn.
- my ($part1, $part2) = split qr{/}, $start_lsn;
+ my ($part1, $part2) = split qr{/}, $startlsn;
my $lsn2 = hex $part2;
$lsn2++;
my $new_start = sprintf("%s/%X", $part1, $lsn2);
@@ -252,7 +264,8 @@ command_fails_like(
my $result = IPC::Run::run [
'pg_waldump',
'--start' => $new_start,
- $node->data_dir . '/pg_wal/' . $start_walfile
+ '--end' => $endlsn,
+ '--path' => $path,
],
'>' => \$stdout,
'2>' => \$stderr;
@@ -266,15 +279,15 @@ command_fails_like(
sub test_pg_waldump
{
local $Test::Builder::Level = $Test::Builder::Level + 1;
- my @opts = @_;
+ my ($path, $startlsn, $endlsn, @opts) = @_;
my ($stdout, $stderr);
my $result = IPC::Run::run [
'pg_waldump',
- '--path' => $node->data_dir,
- '--start' => $start_lsn,
- '--end' => $end_lsn,
+ '--start' => $startlsn,
+ '--end' => $endlsn,
+ '--path' => $path,
@opts
],
'>' => \$stdout,
@@ -286,40 +299,145 @@ sub test_pg_waldump
return @lines;
}
-my @lines;
+# Create a tar archive, shuffle the file order
+sub generate_archive
+{
+ my ($archive, $directory, $compression_flags) = @_;
-@lines = test_pg_waldump;
-is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
+ my @files;
+ opendir my $dh, $directory or die "opendir: $!";
+ while (my $entry = readdir $dh) {
+ # Skip '.' and '..'
+ next if $entry eq '.' || $entry eq '..';
+ push @files, $entry;
+ }
+ closedir $dh;
-@lines = test_pg_waldump('--limit' => 6);
-is(@lines, 6, 'limit option observed');
+ @files = shuffle @files;
-@lines = test_pg_waldump('--fullpage');
-is(grep(!/^rmgr:.*\bFPW\b/, @lines), 0, 'all output lines are FPW');
+ # move into the WAL directory before archiving files
+ my $cwd = getcwd;
+ chdir($directory) || die "chdir: $!";
+ command_ok([$tar, $compression_flags, $archive, @files]);
+ chdir($cwd) || die "chdir: $!";
+}
-@lines = test_pg_waldump('--stats');
-like($lines[0], qr/WAL statistics/, "statistics on stdout");
-is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+my $tmp_dir = PostgreSQL::Test::Utils::tempdir_short();
-@lines = test_pg_waldump('--stats=record');
-like($lines[0], qr/WAL statistics/, "statistics on stdout");
-is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+my @scenarios = (
+ {
+ 'path' => $node->data_dir,
+ 'is_archive' => 0,
+ 'enabled' => 1
+ },
+ {
+ 'path' => "$tmp_dir/pg_wal.tar",
+ 'compression_method' => 'none',
+ 'compression_flags' => '-cf',
+ 'is_archive' => 1,
+ 'enabled' => 1
+ },
+ {
+ 'path' => "$tmp_dir/pg_wal.tar.gz",
+ 'compression_method' => 'gzip',
+ 'compression_flags' => '-czf',
+ 'is_archive' => 1,
+ 'enabled' => check_pg_config("#define HAVE_LIBZ 1")
+ });
-@lines = test_pg_waldump('--rmgr' => 'Btree');
-is(grep(!/^rmgr: Btree/, @lines), 0, 'only Btree lines');
+for my $scenario (@scenarios)
+{
+ my $path = $scenario->{'path'};
-@lines = test_pg_waldump('--fork' => 'init');
-is(grep(!/fork init/, @lines), 0, 'only init fork lines');
+ SKIP:
+ {
+ skip "tar command is not available", 56
+ if !defined $tar && $scenario->{'is_archive'};
+ skip "$scenario->{'compression_method'} compression not supported by this build", 56
+ if !$scenario->{'enabled'} && $scenario->{'is_archive'};
-@lines = test_pg_waldump(
- '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_t1_oid");
-is(grep(!/rel $default_ts_oid\/$postgres_db_oid\/$rel_t1_oid/, @lines),
- 0, 'only lines for selected relation');
+ # create pg_wal archive
+ if ($scenario->{'is_archive'})
+ {
+ generate_archive($path,
+ $node->data_dir . '/pg_wal',
+ $scenario->{'compression_flags'});
+ }
-@lines = test_pg_waldump(
- '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_i1a_oid",
- '--block' => 1);
-is(grep(!/\bblk 1\b/, @lines), 0, 'only lines for selected block');
+ command_fails_like(
+ [ 'pg_waldump', '--path' => $path ],
+ qr/error: no start WAL location given/,
+ 'path option requires start location');
+ command_like(
+ [
+ 'pg_waldump',
+ '--path' => $path,
+ '--start' => $start_lsn,
+ '--end' => $end_lsn,
+ ],
+ qr/./,
+ 'runs with path option and start and end locations');
+ command_fails_like(
+ [
+ 'pg_waldump',
+ '--path' => $path,
+ '--start' => $start_lsn,
+ ],
+ qr/error: error in WAL record at/,
+ 'falling off the end of the WAL results in an error');
+ command_fails_like(
+ [
+ 'pg_waldump', '--quiet',
+ '--path' => $path,
+ '--start' => $start_lsn
+ ],
+ qr/error: error in WAL record at/,
+ 'errors are shown with --quiet');
+
+ test_pg_waldump_skip_bytes($path, $start_lsn, $end_lsn);
+
+ my @lines = test_pg_waldump($path, $start_lsn, $end_lsn);
+ is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
+
+ @lines = test_pg_waldump($path, $contrecord_lsn, $end_lsn);
+ is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
+
+ test_pg_waldump_skip_bytes($path, $contrecord_lsn, $end_lsn);
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--limit' => 6);
+ is(@lines, 6, 'limit option observed');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--fullpage');
+ is(grep(!/^rmgr:.*\bFPW\b/, @lines), 0, 'all output lines are FPW');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--stats');
+ like($lines[0], qr/WAL statistics/, "statistics on stdout");
+ is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--stats=record');
+ like($lines[0], qr/WAL statistics/, "statistics on stdout");
+ is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--rmgr' => 'Btree');
+ is(grep(!/^rmgr: Btree/, @lines), 0, 'only Btree lines');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--fork' => 'init');
+ is(grep(!/fork init/, @lines), 0, 'only init fork lines');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn,
+ '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_t1_oid");
+ is(grep(!/rel $default_ts_oid\/$postgres_db_oid\/$rel_t1_oid/, @lines),
+ 0, 'only lines for selected relation');
+
+ @lines = test_pg_waldump($path, $start_lsn, $end_lsn,
+ '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_i1a_oid",
+ '--block' => 1);
+ is(grep(!/\bblk 1\b/, @lines), 0, 'only lines for selected block');
+
+ # Cleanup.
+ unlink $path if $scenario->{'is_archive'};
+ }
+}
done_testing();
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a4a2ed07816..3f428a64b47 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -147,6 +147,9 @@ ArchiveOpts
ArchiveShutdownCB
ArchiveStartupCB
ArchiveStreamState
+ArchivedWALFile
+ArchivedWAL_hash
+ArchivedWAL_iterator
ArchiverOutput
ArchiverStage
ArrayAnalyzeExtraData
@@ -3544,6 +3547,7 @@ astreamer_recovery_injector
astreamer_tar_archiver
astreamer_tar_parser
astreamer_verify
+astreamer_waldump
astreamer_zstd_frame
auth_password_hook_typ
autovac_table
--
2.47.1
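The XLogDumpPrivate comments in the patch above describe filtering each archive member against a cached start_segno/end_segno range, with end_segno set to UINT64_MAX when no end limit is requested. The --end check in main() is related: it accepts an end LSN that either falls inside the last segment or lands exactly on the first byte of the next one. A standalone C sketch of both ideas follows. It is not code from the patch. lsn_to_segno(), walname_to_segno(), member_in_range() and end_lsn_in_file() are hypothetical helpers that only approximate XLByteToSeg(), XLogFromFileName() and XLByteInSeg(), and the sketch deliberately ignores timeline matching.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for XLogRecPtr and XLogSegNo. */
typedef uint64_t lsn_t;
typedef uint64_t segno_t;

/* Segment containing "lsn"; same arithmetic as XLByteToSeg(). */
static segno_t
lsn_to_segno(lsn_t lsn, uint32_t segsize)
{
    assert(segsize > 0);
    return lsn / segsize;
}

/*
 * Parse a 24-character WAL file name (8 hex digits each for timeline, log id
 * and segment) into a timeline and segment number, in the spirit of
 * XLogFromFileName().  Anything else (history files, .partial files, other
 * archive members) is rejected.
 */
static bool
walname_to_segno(const char *fname, uint32_t segsize,
                 uint32_t *tli, segno_t *segno)
{
    unsigned int t, log, seg;

    if (strlen(fname) != 24 || strspn(fname, "0123456789ABCDEF") != 24)
        return false;
    if (sscanf(fname, "%08X%08X%08X", &t, &log, &seg) != 3)
        return false;

    *tli = t;
    /* segments per 4 GB "log id", as in XLogSegmentsPerXLogId() */
    *segno = (segno_t) log * (UINT64_C(0x100000000) / segsize) + seg;
    return true;
}

/* Keep an archive member only if its segment lies in [start, end]. */
static bool
member_in_range(const char *fname, uint32_t segsize,
                segno_t start_segno, segno_t end_segno)
{
    uint32_t tli;
    segno_t segno;

    if (!walname_to_segno(fname, segsize, &tli, &segno))
        return false;
    return segno >= start_segno && segno <= end_segno;
}

/*
 * End-location check: the end LSN must lie inside segment "segno", or sit
 * exactly on the first byte of the following segment.
 */
static bool
end_lsn_in_file(lsn_t endptr, segno_t segno, uint32_t segsize)
{
    return lsn_to_segno(endptr, segsize) == segno ||
        endptr == (segno + 1) * (lsn_t) segsize;
}
```

With a 16 MB segment size there are 256 segments per log id, so the name 000000010000000100000000 maps to segment number 256. Caching the two segment numbers turns the per-member test into two integer comparisons.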
[application/octet-stream] v22-0004-pg_verifybackup-Enable-WAL-parsing-for-tar-forma.patch (16.7K, 5-v22-0004-pg_verifybackup-Enable-WAL-parsing-for-tar-forma.patch)
From f758c8ae6e97140a9ea329f4e7f6ec8bb6271afd Mon Sep 17 00:00:00 2001
From: Andrew Dunstan <[email protected]>
Date: Thu, 19 Mar 2026 15:43:53 +0530
Subject: [PATCH v22 4/5] pg_verifybackup: Enable WAL parsing for tar-format
backups
Now that pg_waldump supports reading WAL from tar archives, remove the
restriction that forced --no-parse-wal for tar-format backups.
pg_verifybackup now automatically locates the WAL archive: it looks for
a separate pg_wal.tar first, then falls back to the main base.tar. A
new --wal-path option (replacing the old --wal-directory, which is kept
as a silent alias) accepts either a directory or a tar archive path.
The default WAL directory preparation is deferred until the backup
format is known, since tar-format backups resolve the WAL path
differently from plain-format ones.
Author: Amul Sul <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Jakub Wartak <[email protected]>
Reviewed-by: Chao Li <[email protected]>
Reviewed-by: Euler Taveira <[email protected]>
Reviewed-by: Andrew Dunstan <[email protected]>
Discussion: https://postgr.es/m/CAAJ_b94bqdWN3h2J-PzzzQ2Npbwct5ZQHggn_QoYGhC2rn-=WQ@mail.gmail.com
---
doc/src/sgml/ref/pg_verifybackup.sgml | 14 ++-
src/bin/pg_verifybackup/pg_verifybackup.c | 96 ++++++++++++-------
src/bin/pg_verifybackup/t/002_algorithm.pl | 4 -
src/bin/pg_verifybackup/t/003_corruption.pl | 4 +-
src/bin/pg_verifybackup/t/007_wal.pl | 20 +++-
src/bin/pg_verifybackup/t/008_untar.pl | 5 +-
src/bin/pg_verifybackup/t/010_client_untar.pl | 5 +-
7 files changed, 91 insertions(+), 57 deletions(-)
diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml
index 61c12975e4a..1695cfe91c8 100644
--- a/doc/src/sgml/ref/pg_verifybackup.sgml
+++ b/doc/src/sgml/ref/pg_verifybackup.sgml
@@ -36,10 +36,7 @@ PostgreSQL documentation
<literal>backup_manifest</literal> generated by the server at the time
of the backup. The backup may be stored either in the "plain" or the "tar"
format; this includes tar-format backups compressed with any algorithm
- supported by <application>pg_basebackup</application>. However, at present,
- <literal>WAL</literal> verification is supported only for plain-format
- backups. Therefore, if the backup is stored in tar-format, the
- <literal>-n, --no-parse-wal</literal> option should be used.
+ supported by <application>pg_basebackup</application>.
</para>
<para>
@@ -261,12 +258,13 @@ PostgreSQL documentation
<varlistentry>
<term><option>-w <replaceable class="parameter">path</replaceable></option></term>
- <term><option>--wal-directory=<replaceable class="parameter">path</replaceable></option></term>
+ <term><option>--wal-path=<replaceable class="parameter">path</replaceable></option></term>
<listitem>
<para>
- Try to parse WAL files stored in the specified directory, rather than
- in <literal>pg_wal</literal>. This may be useful if the backup is
- stored in a separate location from the WAL archive.
+ Try to parse WAL files stored in the specified directory or tar
+ archive, rather than in <literal>pg_wal</literal>. This may be
+ useful if the backup is stored in a separate location from the WAL
+ archive.
</para>
</listitem>
</varlistentry>
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 31f606c45b1..b60ab8739d5 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -74,7 +74,9 @@ pg_noreturn static void report_manifest_error(JsonManifestParseContext *context,
const char *fmt,...)
pg_attribute_printf(2, 3);
-static void verify_tar_backup(verifier_context *context, DIR *dir);
+static void verify_tar_backup(verifier_context *context, DIR *dir,
+ char **base_archive_path,
+ char **wal_archive_path);
static void verify_plain_backup_directory(verifier_context *context,
char *relpath, char *fullpath,
DIR *dir);
@@ -83,7 +85,9 @@ static void verify_plain_backup_file(verifier_context *context, char *relpath,
static void verify_control_file(const char *controlpath,
uint64 manifest_system_identifier);
static void precheck_tar_backup_file(verifier_context *context, char *relpath,
- char *fullpath, SimplePtrList *tarfiles);
+ char *fullpath, SimplePtrList *tarfiles,
+ char **base_archive_path,
+ char **wal_archive_path);
static void verify_tar_file(verifier_context *context, char *relpath,
char *fullpath, astreamer *streamer);
static void report_extra_backup_files(verifier_context *context);
@@ -93,7 +97,7 @@ static void verify_file_checksum(verifier_context *context,
uint8 *buffer);
static void parse_required_wal(verifier_context *context,
char *pg_waldump_path,
- char *wal_directory);
+ char *wal_path);
static astreamer *create_archive_verifier(verifier_context *context,
char *archive_name,
Oid tblspc_oid,
@@ -126,7 +130,8 @@ main(int argc, char **argv)
{"progress", no_argument, NULL, 'P'},
{"quiet", no_argument, NULL, 'q'},
{"skip-checksums", no_argument, NULL, 's'},
- {"wal-directory", required_argument, NULL, 'w'},
+ {"wal-path", required_argument, NULL, 'w'},
+ {"wal-directory", required_argument, NULL, 'w'}, /* deprecated */
{NULL, 0, NULL, 0}
};
@@ -135,7 +140,9 @@ main(int argc, char **argv)
char *manifest_path = NULL;
bool no_parse_wal = false;
bool quiet = false;
- char *wal_directory = NULL;
+ char *wal_path = NULL;
+ char *base_archive_path = NULL;
+ char *wal_archive_path = NULL;
char *pg_waldump_path = NULL;
DIR *dir;
@@ -221,8 +228,8 @@ main(int argc, char **argv)
context.skip_checksums = true;
break;
case 'w':
- wal_directory = pstrdup(optarg);
- canonicalize_path(wal_directory);
+ wal_path = pstrdup(optarg);
+ canonicalize_path(wal_path);
break;
default:
/* getopt_long already emitted a complaint */
@@ -285,10 +292,6 @@ main(int argc, char **argv)
manifest_path = psprintf("%s/backup_manifest",
context.backup_directory);
- /* By default, look for the WAL in the backup directory, too. */
- if (wal_directory == NULL)
- wal_directory = psprintf("%s/pg_wal", context.backup_directory);
-
/*
* Try to read the manifest. We treat any errors encountered while parsing
* the manifest as fatal; there doesn't seem to be much point in trying to
@@ -331,17 +334,6 @@ main(int argc, char **argv)
pfree(path);
}
- /*
- * XXX: In the future, we should consider enhancing pg_waldump to read WAL
- * files from an archive.
- */
- if (!no_parse_wal && context.format == 't')
- {
- pg_log_error("pg_waldump cannot read tar files");
- pg_log_error_hint("You must use -n/--no-parse-wal when verifying a tar-format backup.");
- exit(1);
- }
-
/*
* Perform the appropriate type of verification appropriate based on the
* backup format. This will close 'dir'.
@@ -350,7 +342,7 @@ main(int argc, char **argv)
verify_plain_backup_directory(&context, NULL, context.backup_directory,
dir);
else
- verify_tar_backup(&context, dir);
+ verify_tar_backup(&context, dir, &base_archive_path, &wal_archive_path);
/*
* The "matched" flag should now be set on every entry in the hash table.
@@ -368,12 +360,35 @@ main(int argc, char **argv)
if (context.format == 'p' && !context.skip_checksums)
verify_backup_checksums(&context);
+ /*
+ * By default, WAL files are expected to be found in the backup directory
+ * for plain-format backups. In the case of tar-format backups, if a
+ * separate WAL archive is not found, the WAL files are most likely
+ * included within the main data directory archive.
+ */
+ if (wal_path == NULL)
+ {
+ if (context.format == 'p')
+ wal_path = psprintf("%s/pg_wal", context.backup_directory);
+ else if (wal_archive_path)
+ wal_path = wal_archive_path;
+ else if (base_archive_path)
+ wal_path = base_archive_path;
+ else
+ {
+ pg_log_error("WAL archive not found");
+ pg_log_error_hint("Specify the correct path using the option -w/--wal-path, "
+ "or use -n/--no-parse-wal when verifying a tar-format backup.");
+ exit(1);
+ }
+ }
+
/*
* Try to parse the required ranges of WAL records, unless we were told
* not to do so.
*/
if (!no_parse_wal)
- parse_required_wal(&context, pg_waldump_path, wal_directory);
+ parse_required_wal(&context, pg_waldump_path, wal_path);
/*
* If everything looks OK, tell the user this, unless we were asked to
@@ -787,7 +802,8 @@ verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
* close when we're done with it.
*/
static void
-verify_tar_backup(verifier_context *context, DIR *dir)
+verify_tar_backup(verifier_context *context, DIR *dir, char **base_archive_path,
+ char **wal_archive_path)
{
struct dirent *dirent;
SimplePtrList tarfiles = {NULL, NULL};
@@ -816,7 +832,8 @@ verify_tar_backup(verifier_context *context, DIR *dir)
char *fullpath;
fullpath = psprintf("%s/%s", context->backup_directory, filename);
- precheck_tar_backup_file(context, filename, fullpath, &tarfiles);
+ precheck_tar_backup_file(context, filename, fullpath, &tarfiles,
+ base_archive_path, wal_archive_path);
pfree(fullpath);
}
}
@@ -875,17 +892,21 @@ verify_tar_backup(verifier_context *context, DIR *dir)
*
* The arguments to this function are mostly the same as the
* verify_plain_backup_file. The additional argument outputs a list of valid
- * tar files.
+ * tar files, along with the full paths to the main archive and the WAL
+ * directory archive.
*/
static void
precheck_tar_backup_file(verifier_context *context, char *relpath,
- char *fullpath, SimplePtrList *tarfiles)
+ char *fullpath, SimplePtrList *tarfiles,
+ char **base_archive_path, char **wal_archive_path)
{
struct stat sb;
Oid tblspc_oid = InvalidOid;
pg_compress_algorithm compress_algorithm;
tar_file *tar;
char *suffix = NULL;
+ bool is_base_archive = false;
+ bool is_wal_archive = false;
/* Should be tar format backup */
Assert(context->format == 't');
@@ -918,9 +939,15 @@ precheck_tar_backup_file(verifier_context *context, char *relpath,
* extension such as .gz, .lz4, or .zst.
*/
if (strncmp("base", relpath, 4) == 0)
+ {
suffix = relpath + 4;
+ is_base_archive = true;
+ }
else if (strncmp("pg_wal", relpath, 6) == 0)
+ {
suffix = relpath + 6;
+ is_wal_archive = true;
+ }
else
{
/* Expected a <tablespaceoid>.tar file here. */
@@ -953,8 +980,13 @@ precheck_tar_backup_file(verifier_context *context, char *relpath,
* Ignore WALs, as reading and verification will be handled through
* pg_waldump.
*/
- if (strncmp("pg_wal", relpath, 6) == 0)
+ if (is_wal_archive)
+ {
+ *wal_archive_path = pstrdup(fullpath);
return;
+ }
+ else if (is_base_archive)
+ *base_archive_path = pstrdup(fullpath);
/*
* Append the information to the list for complete verification at a later
@@ -1188,7 +1220,7 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
*/
static void
parse_required_wal(verifier_context *context, char *pg_waldump_path,
- char *wal_directory)
+ char *wal_path)
{
manifest_data *manifest = context->manifest;
manifest_wal_range *this_wal_range = manifest->first_wal_range;
@@ -1198,7 +1230,7 @@ parse_required_wal(verifier_context *context, char *pg_waldump_path,
char *pg_waldump_cmd;
pg_waldump_cmd = psprintf("\"%s\" --quiet --path=\"%s\" --timeline=%u --start=%X/%08X --end=%X/%08X\n",
- pg_waldump_path, wal_directory, this_wal_range->tli,
+ pg_waldump_path, wal_path, this_wal_range->tli,
LSN_FORMAT_ARGS(this_wal_range->start_lsn),
LSN_FORMAT_ARGS(this_wal_range->end_lsn));
fflush(NULL);
@@ -1366,7 +1398,7 @@ usage(void)
printf(_(" -P, --progress show progress information\n"));
printf(_(" -q, --quiet do not print any output, except for errors\n"));
printf(_(" -s, --skip-checksums skip checksum verification\n"));
- printf(_(" -w, --wal-directory=PATH use specified path for WAL files\n"));
+ printf(_(" -w, --wal-path=PATH use specified path for WAL files\n"));
printf(_(" -V, --version output version information, then exit\n"));
printf(_(" -?, --help show this help, then exit\n"));
printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
diff --git a/src/bin/pg_verifybackup/t/002_algorithm.pl b/src/bin/pg_verifybackup/t/002_algorithm.pl
index 0556191ec9d..edc515d5904 100644
--- a/src/bin/pg_verifybackup/t/002_algorithm.pl
+++ b/src/bin/pg_verifybackup/t/002_algorithm.pl
@@ -30,10 +30,6 @@ sub test_checksums
{
# Add switch to get a tar-format backup
push @backup, ('--format' => 'tar');
-
- # Add switch to skip WAL verification, which is not yet supported for
- # tar-format backups
- push @verify, ('--no-parse-wal');
}
# A backup with a bogus algorithm should fail.
diff --git a/src/bin/pg_verifybackup/t/003_corruption.pl b/src/bin/pg_verifybackup/t/003_corruption.pl
index b1d65b8aa0f..882d75d9dc2 100644
--- a/src/bin/pg_verifybackup/t/003_corruption.pl
+++ b/src/bin/pg_verifybackup/t/003_corruption.pl
@@ -193,10 +193,8 @@ for my $scenario (@scenario)
command_ok([ $tar, '-cf' => "$tar_backup_path/base.tar", '.' ]);
chdir($cwd) || die "chdir: $!";
- # Now check that the backup no longer verifies. We must use -n
- # here, because pg_waldump can't yet read WAL from a tarfile.
command_fails_like(
- [ 'pg_verifybackup', '--no-parse-wal', $tar_backup_path ],
+ [ 'pg_verifybackup', $tar_backup_path ],
$scenario->{'fails_like'},
"corrupt backup fails verification: $name");
diff --git a/src/bin/pg_verifybackup/t/007_wal.pl b/src/bin/pg_verifybackup/t/007_wal.pl
index 79087a1f6be..0e0377bfacc 100644
--- a/src/bin/pg_verifybackup/t/007_wal.pl
+++ b/src/bin/pg_verifybackup/t/007_wal.pl
@@ -42,10 +42,10 @@ command_ok([ 'pg_verifybackup', '--no-parse-wal', $backup_path ],
command_ok(
[
'pg_verifybackup',
- '--wal-directory' => $relocated_pg_wal,
+ '--wal-path' => $relocated_pg_wal,
$backup_path
],
- '--wal-directory can be used to specify WAL directory');
+ '--wal-path can be used to specify WAL directory');
# Move directory back to original location.
rename($relocated_pg_wal, $original_pg_wal) || die "rename pg_wal back: $!";
@@ -90,4 +90,20 @@ command_ok(
[ 'pg_verifybackup', $backup_path2 ],
'valid base backup with timeline > 1');
+# Test WAL verification for a tar-format backup with a separate pg_wal.tar,
+# as produced by pg_basebackup --format=tar --wal-method=stream.
+my $backup_path3 = $primary->backup_dir . '/test_tar_wal';
+$primary->command_ok(
+ [
+ 'pg_basebackup',
+ '--pgdata' => $backup_path3,
+ '--no-sync',
+ '--format' => 'tar',
+ '--checkpoint' => 'fast'
+ ],
+ "tar backup with separate pg_wal.tar");
+command_ok(
+ [ 'pg_verifybackup', $backup_path3 ],
+ 'WAL verification succeeds with separate pg_wal.tar');
+
done_testing();
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
index ae67ae85a31..161c08c190d 100644
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -47,7 +47,6 @@ my $tsoid = $primary->safe_psql(
SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1'));
my $backup_path = $primary->backup_dir . '/server-backup';
-my $extract_path = $primary->backup_dir . '/extracted-backup';
my @test_configuration = (
{
@@ -123,14 +122,12 @@ for my $tc (@test_configuration)
# Verify tar backup.
$primary->command_ok(
[
- 'pg_verifybackup', '--no-parse-wal',
- '--exit-on-error', $backup_path,
+ 'pg_verifybackup', '--exit-on-error', $backup_path,
],
"verify backup, compression $method");
# Cleanup.
rmtree($backup_path);
- rmtree($extract_path);
}
}
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
index 1ac7b5db75a..9670fbe4fda 100644
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -32,7 +32,6 @@ print $jf $junk_data;
close $jf;
my $backup_path = $primary->backup_dir . '/client-backup';
-my $extract_path = $primary->backup_dir . '/extracted-backup';
my @test_configuration = (
{
@@ -137,13 +136,11 @@ for my $tc (@test_configuration)
# Verify tar backup.
$primary->command_ok(
[
- 'pg_verifybackup', '--no-parse-wal',
- '--exit-on-error', $backup_path,
+ 'pg_verifybackup', '--exit-on-error', $backup_path,
],
"verify backup, compression $method");
# Cleanup.
- rmtree($extract_path);
rmtree($backup_path);
}
}
--
2.47.1
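The default WAL location that the 0004 patch chooses when -w/--wal-path is not given can be summarised as a small standalone C sketch: plain-format backups use the backup's pg_wal directory, while tar-format backups prefer a separate pg_wal archive and otherwise fall back to the base archive. default_wal_path(), classify_archive() and copy_string() are hypothetical helpers, not functions from pg_verifybackup. The sketch classifies tar files by prefix only and skips the suffix validation (.tar plus an optional compression extension) that precheck_tar_backup_file() performs.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Kinds of top-level tar files this sketch distinguishes. */
typedef enum
{
    ARCHIVE_OTHER,              /* e.g. <tablespaceoid>.tar */
    ARCHIVE_BASE,               /* base.tar, base.tar.gz, ... */
    ARCHIVE_WAL                 /* pg_wal.tar, pg_wal.tar.zst, ... */
} archive_kind;

/* Prefix test only; the patch additionally validates the suffix. */
static archive_kind
classify_archive(const char *relpath)
{
    if (strncmp(relpath, "base", 4) == 0)
        return ARCHIVE_BASE;
    if (strncmp(relpath, "pg_wal", 6) == 0)
        return ARCHIVE_WAL;
    return ARCHIVE_OTHER;
}

/* malloc'd copy of a string (strdup() is not part of C11). */
static char *
copy_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *p = malloc(len);

    if (p != NULL)
        memcpy(p, s, len);
    return p;
}

/*
 * Default WAL location when -w/--wal-path was not given: the pg_wal
 * directory for plain format; for tar format, a separate WAL archive if one
 * was seen, else the base archive.  NULL means "WAL archive not found".
 */
static char *
default_wal_path(char format, const char *backup_dir,
                 const char *wal_archive_path,
                 const char *base_archive_path)
{
    if (format == 'p')
    {
        size_t len = strlen(backup_dir) + sizeof("/pg_wal");
        char *p = malloc(len);

        if (p != NULL)
            snprintf(p, len, "%s/pg_wal", backup_dir);
        return p;
    }

    assert(format == 't');
    if (wal_archive_path != NULL)
        return copy_string(wal_archive_path);
    if (base_archive_path != NULL)
        return copy_string(base_archive_path);
    return NULL;
}
```

An explicit -w/--wal-path bypasses this logic entirely, because the patch applies the defaults only while wal_path is still NULL.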
^ permalink raw reply [nested|flat] 29+ messages in thread
* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-03-20 19:33 Andrew Dunstan <[email protected]>
parent: Amul Sul <[email protected]>
0 siblings, 2 replies; 29+ messages in thread
From: Andrew Dunstan @ 2026-03-20 19:33 UTC (permalink / raw)
To: Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; +Cc: Robert Haas <[email protected]>; Chao Li <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>
On 2026-03-20 Fr 9:26 AM, Amul Sul wrote:
> On Fri, Mar 20, 2026 at 5:01 PM Amul Sul <[email protected]> wrote:
>> On Fri, Mar 20, 2026 at 2:18 AM Zsolt Parragi <[email protected]> wrote:
>>> Hello!
>>>
>>> Path is ignored with a positional argument, I think this is a bug?
>>>
>>> This fails:
>>>
>>> pg_waldump --path /wal/dir 000000010000000000000001
>>>
>>> And this works:
>>>
>>> pg_waldump --path /wal/dir --start 0/01000028 --end 0/010020F8
>>>
>> Good catch! I've fixed this in the attached version and updated a test
>> case to cover this scenario.
>>
>>> +{
>>> + int fname_len = strlen(fname);
>>> +
>>>
>>> Shouldn't this use size_t?
>>>
>> Okay, that can be used. I’ve done the same in the attached version.
>>
>>> + /*
>>> + * Setup temporary directory to store WAL segments and set up an exit
>>> + * callback to remove it upon completion.
>>> + */
>>> + setup_tmpwal_dir(waldir);
>>>
>>> Maybe this could be deferred to be created only on first use? If I
>>> understand correctly, in a typical scenario waldump won't use this
>>> temporary directory, yet it always creates it.
>> Yeah, that optimization can be done, but passing the waldir -- which
>> is only used once -- to the point where the first temp file is created
>> would require quite a bit of code refactoring that doesn't seem to
>> offer much gain, IMO.
>>
> Since Andrew also leans toward creating the directory only when
> needed, I have reconsidered the approach. I think we can pass waldir
> (the archive directory) via XLogDumpPrivate, and I’ve implemented that
> in the attached version.
>
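Deferring creation of the temporary directory until first use is a standard lazy-initialization pattern. Below is a minimal POSIX-only sketch, assuming the directory path is recorded up front in a private state struct; DemoPrivate, ensure_tmp_dir(), remove_tmp_dir() and dir_exists() are hypothetical names and not the committed code, which passes waldir via XLogDumpPrivate.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical private state: the spill directory path is known up front,
 * but the directory is only created when the first spill file is needed.
 */
typedef struct
{
	const char *tmp_dir;		/* path for spill files */
	bool		tmp_dir_created;	/* has ensure_tmp_dir() run successfully? */
} DemoPrivate;

/* Create the spill directory on first use; returns 0 on success, -1 on error. */
static int
ensure_tmp_dir(DemoPrivate *priv)
{
	assert(priv->tmp_dir != NULL);
	if (priv->tmp_dir_created)
		return 0;				/* already done: no system call needed */
	if (mkdir(priv->tmp_dir, 0700) != 0 && errno != EEXIST)
		return -1;
	priv->tmp_dir_created = true;
	return 0;
}

/* Cleanup hook (e.g. for an exit callback): only acts if ensure_tmp_dir() ran. */
static void
remove_tmp_dir(DemoPrivate *priv)
{
	if (priv->tmp_dir_created)
		(void) rmdir(priv->tmp_dir);
	priv->tmp_dir_created = false;
}

/* True if 'path' exists and is a directory. */
static bool
dir_exists(const char *path)
{
	struct stat st;

	return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}
```

In the common case where nothing needs to be spilled, ensure_tmp_dir() is never called and no directory appears on disk.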
Thanks, committed with very minor tweaks.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-03-21 06:19 Amul Sul <[email protected]>
parent: Andrew Dunstan <[email protected]>
1 sibling, 0 replies; 29+ messages in thread
From: Amul Sul @ 2026-03-21 06:19 UTC
To: Tom Lane <[email protected]>; +Cc: Andrew Dunstan <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>
On Sat, Mar 21, 2026 at 9:19 AM Tom Lane <[email protected]> wrote:
>
> Andrew Dunstan <[email protected]> writes:
> > Thanks, committed with very minor tweaks.
>
> Buildfarm members batta and hachi don't like this very much.
> They fail the pg_verifybackup tests like so:
>
> # Running: pg_verifybackup --exit-on-error /home/admin/batta/buildroot/HEAD/pgsql.build/src/bin/pg_verifybackup/tmp_check/t_008_untar_primary_data/backup/server-backup
> pg_waldump: error: could not find WAL in archive "base.tar.zst"
> pg_verifybackup: error: WAL parsing failed for timeline 1
>
> Only the zstd-compression case fails. I've spent several hours trying
> to reproduce this, without any luck, although I can get a similar
> failure in only the gzip case if I build with --with-wal-blocksize=64.
> I do not have an explanation for the seeming cross-platform
> difference. However after adding a lot of debug tracing, I believe
> I see the bug, or at least a related bug. This bit in
> archive_waldump.c's init_archive_reader is where the error comes from:
>
> /*
> * Read until we have at least one full WAL page (XLOG_BLCKSZ bytes) from
> * the first WAL segment in the archive so we can extract the WAL segment
> * size from the long page header.
> */
> while (entry == NULL || entry->buf->len < XLOG_BLCKSZ)
> {
> if (read_archive_file(privateInfo, XLOG_BLCKSZ) == 0)
> pg_fatal("could not find WAL in archive \"%s\"",
> privateInfo->archive_name);
>
> entry = privateInfo->cur_file;
> }
>
> That looks plausible but is in fact utterly broken when there's not a
> lot of WAL data in the archive, as there is not in this test case.
> There are at least two problems:
>
Thanks for the detailed debugging. I noticed the failure this morning
and had started investigating the issue, but in the meantime, I got
your helpful reply, which saved me a bunch of time and energy.
> 1. read_archive_file reads some data from the source WAL archive and
> shoves it into the astreamer decompression pipeline. However, once it
> runs out of source data, it just returns zero and we fail immediately.
> This does not account for the possibility --- nay, certainty --- that
> there is data queued inside the decompression pipeline. So this
> doesn't work if the data we need has been compressed into less than
> XLOG_BLCKSZ worth of compressed data. (I suppose that the seeming
> cross-platform differences have to do with the effectiveness of the
> compression algorithm, but I don't really understand why it'd not be
> the same everywhere.) We need to do astreamer_finalize once we run
> out of source data. I think the cleanest place to handle that would
> be inside read_archive_file, but its return convention will need some
> rework if we want to put it there (because rc == 0 shouldn't cause an
> immediate failure if we were able to finalize some more data). As an
> ugly experiment I put an astreamer_finalize call into the rc == 0 path
> of the above loop, but it still didn't work, because:
>
> 2. If the decompression pipeline reaches the end of the WAL file that
> we want, the ASTREAMER_MEMBER_TRAILER case in
> astreamer_waldump_content instantly resets privateInfo->cur_file to
> NULL. Then the loop in init_archive_reader cannot exit successfully,
> and it will just read till the end of the archive and fail.
>
> I see that of the three callers of read_archive_file, only
> get_archive_wal_entry is aware of this possibility; but
> init_archive_reader certainly needs to deal with it and I bet
> read_archive_wal_page does too. Moreover, get_archive_wal_entry's
> solution looks to me like a fragile kluge that probably doesn't work
> reliably either, the reason being that privateInfo->cur_file can
> change multiple times during a single call to read_archive_file,
> if the WAL data has been compressed sufficiently. That whole API
> seems to need some rethinking, not to mention better documentation
> than the zero it has now.
>
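Tom's first point, that a read loop must drain the decompression pipeline before concluding that the input is exhausted, can be modelled without the astreamer machinery. The sketch below is hypothetical: Stage, Source, read_until_naive() and read_until() are illustrative names, not astreamer API. The stage simply holds bytes internally until its buffer is full and more data arrives, or until it is finalized, much as a decompressor may hold output. It does not model the second problem (cur_file being reset at the member trailer).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define STAGE_BUF 64			/* internal buffer of the buffering stage */

/* Hypothetical buffering stage: holds input until full or finalized. */
typedef struct
{
	char		pending[STAGE_BUF];	/* bytes still held inside the stage */
	size_t		npending;
	char		out[1024];		/* bytes already delivered downstream */
	size_t		nout;
	bool		finalized;
} Stage;

/* Hypothetical compressed source being fed into the stage. */
typedef struct
{
	const char *data;
	size_t		len;
	size_t		pos;
} Source;

static void
stage_flush(Stage *s)
{
	assert(s->nout + s->npending <= sizeof(s->out));
	memcpy(s->out + s->nout, s->pending, s->npending);
	s->nout += s->npending;
	s->npending = 0;
}

static void
stage_content(Stage *s, const char *data, size_t len)
{
	for (size_t i = 0; i < len; i++)
	{
		if (s->npending == STAGE_BUF)
			stage_flush(s);		/* full and more data arrived: forward */
		s->pending[s->npending++] = data[i];
	}
}

static void
stage_finalize(Stage *s)
{
	stage_flush(s);
	s->finalized = true;
}

/* Feed up to 'chunk' source bytes into the stage; returns bytes fed. */
static size_t
read_source(Source *src, Stage *s, size_t chunk)
{
	size_t		n = src->len - src->pos;

	if (n > chunk)
		n = chunk;
	stage_content(s, src->data + src->pos, n);
	src->pos += n;
	return n;
}

/* Buggy: gives up as soon as the source is exhausted. */
static bool
read_until_naive(Source *src, Stage *s, size_t need, size_t chunk)
{
	while (s->nout < need)
		if (read_source(src, s, chunk) == 0)
			return false;
	return true;
}

/* Fixed: on source exhaustion, finalize once and re-check before failing. */
static bool
read_until(Source *src, Stage *s, size_t need, size_t chunk)
{
	while (s->nout < need)
	{
		if (read_source(src, s, chunk) == 0)
		{
			if (s->finalized)
				return false;	/* really not enough data */
			stage_finalize(s);	/* drain bytes queued in the stage */
		}
	}
	return true;
}
```

With a 40-byte source and a 24-byte requirement, all 40 bytes sit inside the stage after the first read, so the naive loop fails while the fixed loop finalizes and succeeds.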
I agree that init_archive_reader needs that handling, but
read_archive_wal_page doesn't need a fix: it only deals with the
current entry and already holds a reference to it, so there is no
need to fetch it from the hash table again.
init_archive_reader has to scan the hash table because, unlike
get_archive_wal_entry, it doesn't know in advance the specific WAL
file name it is looking for. Please have a look at the attached patch, which
tries to fix that.
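One detail of the attached version's hash-table scan: once the loop `while ((e = ArchivedWAL_iterate(...)) != NULL)` runs to completion without a break, `e` is NULL, so the later `if (e != NULL)` branch that reports a truncated file is unreachable. The v2-0003 version posted later in this thread keeps the candidate in a separate short_entry variable instead. The same pattern, sketched over a plain array rather than the simplehash iterator (DemoEntry and find_usable_entry() are hypothetical names):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an archived WAL file entry. */
typedef struct
{
	const char *fname;
	size_t		read_len;		/* bytes buffered so far */
} DemoEntry;

/*
 * Return the first entry with at least 'need' buffered bytes.  If none
 * qualifies, *short_entry is set to the last entry seen (NULL if there
 * were no entries at all), so the caller can still report it after the
 * loop variable itself has run off the end.
 */
static const DemoEntry *
find_usable_entry(const DemoEntry *entries, size_t n, size_t need,
				  const DemoEntry **short_entry)
{
	assert(short_entry != NULL);
	*short_entry = NULL;
	for (size_t i = 0; i < n; i++)
	{
		if (entries[i].read_len >= need)
			return &entries[i];
		*short_entry = &entries[i];	/* remember it for error reporting */
	}
	return NULL;
}
```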
> While I'm bitching: this error message "could not find WAL in archive
> \"%s\"" seems to me to be completely misleading and off-point.
>
I tried to improve that in the attached version.
regards,
Amul
Attachments:
[application/x-patch] 0001-pg_waldump-buildfarm-fix.patch (3.1K, 2-0001-pg_waldump-buildfarm-fix.patch)
From bde3fb4e3125eed740b5d949a990b4e06d01499a Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Sat, 21 Mar 2026 11:22:50 +0530
Subject: [PATCH] pg_waldump: Handle archive exhaustion in
init_archive_reader().
When read_archive_file() returns 0, the archive may have already
buffered a complete WAL file into the hash table before exhausting
the input. Instead of immediately reporting an error, search the
hash table for an entry containing at least sizeof(XLogLongPageHeader)
bytes. Report a specific error if a WAL entry exists but is too
short (truncated/corrupt), or a generic error if no WAL was found
at all.
Also tighten the loop condition to check for sizeof(XLogLongPageHeader)
rather than XLOG_BLCKSZ, since only the long page header is needed
at this stage.
---
src/bin/pg_waldump/archive_waldump.c | 51 +++++++++++++++++++++++++---
1 file changed, 47 insertions(+), 4 deletions(-)
diff --git a/src/bin/pg_waldump/archive_waldump.c b/src/bin/pg_waldump/archive_waldump.c
index b078c2d6960..5bd1faf3d95 100644
--- a/src/bin/pg_waldump/archive_waldump.c
+++ b/src/bin/pg_waldump/archive_waldump.c
@@ -176,13 +176,56 @@ init_archive_reader(XLogDumpPrivate *privateInfo,
* the first WAL segment in the archive so we can extract the WAL segment
* size from the long page header.
*/
- while (entry == NULL || entry->buf->len < XLOG_BLCKSZ)
+ while (entry == NULL || entry->read_len < sizeof(XLogLongPageHeader))
{
if (read_archive_file(privateInfo, XLOG_BLCKSZ) == 0)
- pg_fatal("could not find WAL in archive \"%s\"",
- privateInfo->archive_name);
+ {
+ ArchivedWAL_iterator iter;
+ ArchivedWALFile *e = NULL;
- entry = privateInfo->cur_file;
+ entry = NULL;
+
+ /*
+ * read_archive_file() returned 0, meaning the archive is
+ * exhausted. However, a sufficiently compressed archive may have
+ * already read a complete WAL file and inserted it into the hash
+ * table before returning. Search the hash table for any entry
+ * that already has enough buffered data to contain the long page
+ * header; if none is found, the archive contains no usable WAL.
+ */
+ ArchivedWAL_start_iterate(privateInfo->archive_wal_htab, &iter);
+ while ((e = ArchivedWAL_iterate(privateInfo->archive_wal_htab,
+ &iter)) != NULL)
+ {
+ if (e->read_len >= sizeof(XLogLongPageHeader))
+ {
+ entry = e;
+ break;
+ }
+ }
+
+ if (entry == NULL)
+ {
+ /*
+ * A WAL file was found in the hash table but it does not
+ * contain enough data to read the long page header,
+ * indicating a truncated or corrupt WAL segment.
+ */
+ if (e != NULL)
+ pg_fatal("could not read file \"%s\" from \"%s\" archive: read %d of %d",
+ e->fname, privateInfo->archive_name, e->read_len,
+ (int) sizeof(XLogLongPageHeader));
+
+ /*
+ * The hash table contains no WAL entries at all, meaning the
+ * archive holds no WAL data.
+ */
+ pg_fatal("could not find WAL in archive \"%s\"",
+ privateInfo->archive_name);
+ }
+ }
+ else
+ entry = privateInfo->cur_file;
}
/* Extract the WAL segment size from the long page header */
--
2.47.1
* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-03-21 06:23 Michael Paquier <[email protected]>
parent: Andrew Dunstan <[email protected]>
1 sibling, 2 replies; 29+ messages in thread
From: Michael Paquier @ 2026-03-21 06:23 UTC
To: Tom Lane <[email protected]>; +Cc: Andrew Dunstan <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>
On Fri, Mar 20, 2026 at 11:49:02PM -0400, Tom Lane wrote:
> Andrew Dunstan <[email protected]> writes:
> > Thanks, committed with very minor tweaks.
>
> Buildfarm members batta and hachi don't like this very much.
> They fail the pg_verifybackup tests like so:
>
> # Running: pg_verifybackup --exit-on-error /home/admin/batta/buildroot/HEAD/pgsql.build/src/bin/pg_verifybackup/tmp_check/t_008_untar_primary_data/backup/server-backup
> pg_waldump: error: could not find WAL in archive "base.tar.zst"
> pg_verifybackup: error: WAL parsing failed for timeline 1
I did not look at what's happening on the host, but it seems like a
safe bet to assume that we are not seeing many failures in the
buildfarm because we don't have many animals that have the idea to add
--with-zstd to their build configuration, like these two.
--
Michael
Attachments:
[application/pgp-signature] signature.asc (833B, 2-signature.asc)
* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-03-21 15:35 Amul Sul <[email protected]>
parent: Michael Paquier <[email protected]>
1 sibling, 1 reply; 29+ messages in thread
From: Amul Sul @ 2026-03-21 15:35 UTC
To: Andrew Dunstan <[email protected]>; +Cc: Tom Lane <[email protected]>; Michael Paquier <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>
On Sat, Mar 21, 2026 at 5:51 PM Andrew Dunstan <[email protected]> wrote:
>
>
> On 2026-03-21 Sa 2:34 AM, Tom Lane wrote:
>
> Michael Paquier <[email protected]> writes:
>
> On Fri, Mar 20, 2026 at 11:49:02PM -0400, Tom Lane wrote:
>
> Buildfarm members batta and hachi don't like this very much.
>
> I did not look at what's happening on the host, but it seems like a
> safe bet to assume that we are not seeing many failures in the
> buildfarm because we don't have many animals that have the idea to add
> --with-zstd to their build configuration, like these two.
>
> That may be part of the story, but only part. I spent a good deal of
> time trying to reproduce batta & hachi's configurations locally, on
> several different platforms, but still couldn't duplicate what they
> are showing.
>
>
> Yeah, I haven't been able to reproduce it either. But while investigating I found a couple of issues. We neglected to add one of the tests to meson.build, and we neglected to close some files, causing errors on Windows.
>
While the proposed fix of closing the file pointer before returning is
correct, we also need to ensure the file is reopened in the next call
to spill any remaining buffered data. I’ve made a small update to
Andrew's 0001 patch to handle this. Also, changes to meson.build don't
seem to be needed as we haven't committed that file yet (unless I am
missing something).
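Why reopening needs care comes down to the fopen mode: "wb" (PostgreSQL's PG_BINARY_W) truncates an existing file, while "ab" (PG_BINARY_A) appends to it, which is why v2-0001 switches prepare_tmp_write() to PG_BINARY_A. A small standard-C sketch of the difference; spill_chunk(), file_size() and spill_in_two_parts() are hypothetical helpers, and the demo writes a scratch file in the current directory.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write 'len' bytes to 'path' using the given fopen mode, then close. */
static int
spill_chunk(const char *path, const char *mode, const char *data, size_t len)
{
	FILE	   *fp;

	assert(path != NULL && mode != NULL);
	fp = fopen(path, mode);
	if (fp == NULL)
		return -1;
	if (fwrite(data, 1, len, fp) != len)
	{
		fclose(fp);
		return -1;
	}
	return fclose(fp) == 0 ? 0 : -1;
}

/* Return the size of the file at 'path', or -1 on error. */
static long
file_size(const char *path)
{
	FILE	   *fp = fopen(path, "rb");
	long		size;

	if (fp == NULL)
		return -1;
	if (fseek(fp, 0, SEEK_END) != 0)
	{
		fclose(fp);
		return -1;
	}
	size = ftell(fp);
	fclose(fp);
	return size;
}

/*
 * Spill one segment in two pieces, closing the handle in between.
 * 'reopen_mode' is the mode used for the second open; returns the
 * resulting file size, or -1 on error.
 */
static long
spill_in_two_parts(const char *path, const char *reopen_mode)
{
	static const char part1[] = "first half of segment|";	/* 22 bytes */
	static const char part2[] = "second half";				/* 11 bytes */

	remove(path);				/* start clean; ignore error if absent */
	if (spill_chunk(path, "wb", part1, strlen(part1)) != 0 ||
		spill_chunk(path, reopen_mode, part2, strlen(part2)) != 0)
		return -1;
	return file_size(path);
}
```

Reopening with "ab" keeps both pieces (33 bytes); reopening with "wb" silently discards the first piece (11 bytes).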
I’ve also reattached the other patches so they don't get lost: v2-0002
is Andrew's patch for the archive streamer, and v2-0003 is the patch I
posted previously [1].
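The decompressor change in v2-0002 can be illustrated in isolation: at finalize time only the bytes actually produced (bytes_written for gzip/lz4, zstd_outBuf.pos for zstd) should be forwarded, not the full allocated buffer length (maxlen); otherwise the next streamer receives stale trailing bytes. The sketch uses hypothetical OutBuf and Sink types rather than the real astreamer structs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OUTBUF_SIZE 16

/* Hypothetical stand-in for a decompressor's output buffer. */
typedef struct
{
	char		data[OUTBUF_SIZE];
	size_t		maxlen;			/* allocated size */
	size_t		pos;			/* bytes actually produced so far */
} OutBuf;

/* Hypothetical downstream consumer that records what it receives. */
typedef struct
{
	char		received[64];
	size_t		len;
} Sink;

static void
sink_content(Sink *sink, const char *data, size_t len)
{
	assert(sink->len + len <= sizeof(sink->received));
	memcpy(sink->received + sink->len, data, len);
	sink->len += len;
}

/* Buggy: forwards the whole allocation, including unwritten bytes. */
static void
finalize_buggy(OutBuf *out, Sink *sink)
{
	sink_content(sink, out->data, out->maxlen);
}

/* Fixed: forwards only the bytes that were produced, if any. */
static void
finalize_fixed(OutBuf *out, Sink *sink)
{
	if (out->pos > 0)
		sink_content(sink, out->data, out->pos);
	out->pos = 0;
}
```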
Regards,
Amul
[1] http://postgr.es/m/CAAJ_b95L5J7bjRNDjRj6WgqFcQeaBD+JX3sAuxPA4uopqEThxA@mail.gmail.com
Attachments:
[application/x-patch] v2-0001-Fix-pg_waldump-archive-reader-file-handle-leak-an.patch (1.8K, 2-v2-0001-Fix-pg_waldump-archive-reader-file-handle-leak-an.patch)
From 322fd5b96e9739937c587460b2780308705f5a83 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Sat, 21 Mar 2026 20:34:15 +0530
Subject: [PATCH v2 1/3] Fix-pg_waldump-archive-reader-file-handle-leak-and-r
---
src/bin/pg_waldump/archive_waldump.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/src/bin/pg_waldump/archive_waldump.c b/src/bin/pg_waldump/archive_waldump.c
index b078c2d6960..1e9ae637940 100644
--- a/src/bin/pg_waldump/archive_waldump.c
+++ b/src/bin/pg_waldump/archive_waldump.c
@@ -474,7 +474,16 @@ get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo)
entry = ArchivedWAL_lookup(privateInfo->archive_wal_htab, fname);
if (entry != NULL)
+ {
+ /*
+ * Found the target segment. Close any open spill file handle to
+ * avoid a leak; any remaining data for that segment will be
+ * written when the file is reopened in a subsequent call.
+ */
+ if (write_fp != NULL)
+ fclose(write_fp);
return entry;
+ }
/*
* Capture the current entry before calling read_archive_file(),
@@ -508,8 +517,8 @@ get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo)
*/
Assert(strcmp(fname, entry->fname) != 0);
- /* Create a temporary file if one does not already exist */
- if (!entry->spilled)
+ /* Open a spill file for this segment if we haven't already */
+ if (!write_fp)
{
write_fp = prepare_tmp_write(entry->fname, privateInfo);
entry->spilled = true;
@@ -631,7 +640,7 @@ prepare_tmp_write(const char *fname, XLogDumpPrivate *privateInfo)
snprintf(fpath, MAXPGPATH, "%s/%s", TmpWalSegDir, fname);
/* Open the spill file for writing */
- file = fopen(fpath, PG_BINARY_W);
+ file = fopen(fpath, PG_BINARY_A);
if (file == NULL)
pg_fatal("could not create file \"%s\": %m", fpath);
--
2.47.1
[application/x-patch] v2-0002-Fix-astreamer-decompressor-finalize-to-send-corre.patch (2.5K, 3-v2-0002-Fix-astreamer-decompressor-finalize-to-send-corre.patch)
From 40e613592ab819c1b8346afe435babf0b212b9ef Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Sat, 21 Mar 2026 19:48:48 +0530
Subject: [PATCH v2 2/3] Fix-astreamer-decompressor-finalize-to-send-correct
---
src/fe_utils/astreamer_gzip.c | 9 +++++----
src/fe_utils/astreamer_lz4.c | 9 +++++----
src/fe_utils/astreamer_zstd.c | 2 +-
3 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/src/fe_utils/astreamer_gzip.c b/src/fe_utils/astreamer_gzip.c
index 2e080c37a58..df392f67cab 100644
--- a/src/fe_utils/astreamer_gzip.c
+++ b/src/fe_utils/astreamer_gzip.c
@@ -347,10 +347,11 @@ astreamer_gzip_decompressor_finalize(astreamer *streamer)
* End of the stream, if there is some pending data in output buffers then
* we must forward it to next streamer.
*/
- astreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->base.bbs_buffer.data,
- mystreamer->base.bbs_buffer.maxlen,
- ASTREAMER_UNKNOWN);
+ if (mystreamer->bytes_written > 0)
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->bytes_written,
+ ASTREAMER_UNKNOWN);
astreamer_finalize(mystreamer->base.bbs_next);
}
diff --git a/src/fe_utils/astreamer_lz4.c b/src/fe_utils/astreamer_lz4.c
index 2bc32b42879..605c188007b 100644
--- a/src/fe_utils/astreamer_lz4.c
+++ b/src/fe_utils/astreamer_lz4.c
@@ -397,10 +397,11 @@ astreamer_lz4_decompressor_finalize(astreamer *streamer)
* End of the stream, if there is some pending data in output buffers then
* we must forward it to next streamer.
*/
- astreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->base.bbs_buffer.data,
- mystreamer->base.bbs_buffer.maxlen,
- ASTREAMER_UNKNOWN);
+ if (mystreamer->bytes_written > 0)
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->bytes_written,
+ ASTREAMER_UNKNOWN);
astreamer_finalize(mystreamer->base.bbs_next);
}
diff --git a/src/fe_utils/astreamer_zstd.c b/src/fe_utils/astreamer_zstd.c
index f26abcfd0fa..4b43ab795e3 100644
--- a/src/fe_utils/astreamer_zstd.c
+++ b/src/fe_utils/astreamer_zstd.c
@@ -347,7 +347,7 @@ astreamer_zstd_decompressor_finalize(astreamer *streamer)
if (mystreamer->zstd_outBuf.pos > 0)
astreamer_content(mystreamer->base.bbs_next, NULL,
mystreamer->base.bbs_buffer.data,
- mystreamer->base.bbs_buffer.maxlen,
+ mystreamer->zstd_outBuf.pos,
ASTREAMER_UNKNOWN);
astreamer_finalize(mystreamer->base.bbs_next);
--
2.47.1
[application/x-patch] v2-0003-pg_waldump-Handle-archive-exhaustion-in-init_arch.patch (3.3K, 4-v2-0003-pg_waldump-Handle-archive-exhaustion-in-init_arch.patch)
From a62de1b7b467a037651a2e1bb3820a390227ce78 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Sat, 21 Mar 2026 20:57:23 +0530
Subject: [PATCH v2 3/3] pg_waldump: Handle archive exhaustion in
init_archive_reader().
When read_archive_file() returns 0, the archive may have already
buffered a complete WAL file into the hash table before exhausting
the input. Instead of immediately reporting an error, search the
hash table for an entry containing at least sizeof(XLogLongPageHeader)
bytes. Report a specific error if a WAL entry exists but is too
short (truncated/corrupt), or a generic error if no WAL was found
at all.
Also tighten the loop condition to check for sizeof(XLogLongPageHeader)
rather than XLOG_BLCKSZ, since only the long page header is needed
at this stage.
---
src/bin/pg_waldump/archive_waldump.c | 55 ++++++++++++++++++++++++++--
1 file changed, 51 insertions(+), 4 deletions(-)
diff --git a/src/bin/pg_waldump/archive_waldump.c b/src/bin/pg_waldump/archive_waldump.c
index 1e9ae637940..dbc1751fb3c 100644
--- a/src/bin/pg_waldump/archive_waldump.c
+++ b/src/bin/pg_waldump/archive_waldump.c
@@ -176,13 +176,60 @@ init_archive_reader(XLogDumpPrivate *privateInfo,
* the first WAL segment in the archive so we can extract the WAL segment
* size from the long page header.
*/
- while (entry == NULL || entry->buf->len < XLOG_BLCKSZ)
+ while (entry == NULL || entry->read_len < sizeof(XLogLongPageHeader))
{
if (read_archive_file(privateInfo, XLOG_BLCKSZ) == 0)
- pg_fatal("could not find WAL in archive \"%s\"",
- privateInfo->archive_name);
+ {
+ ArchivedWAL_iterator iter;
+ ArchivedWALFile *e = NULL;
+ ArchivedWALFile *short_entry = NULL;
- entry = privateInfo->cur_file;
+ entry = NULL;
+
+ /*
+ * read_archive_file() returned 0, meaning the archive is
+ * exhausted. However, a sufficiently compressed archive may have
+ * already read a complete WAL file and inserted it into the hash
+ * table before returning. Search the hash table for any entry
+ * that already has enough buffered data to contain the long page
+ * header; if none is found, the archive contains no usable WAL.
+ */
+ ArchivedWAL_start_iterate(privateInfo->archive_wal_htab, &iter);
+ while ((e = ArchivedWAL_iterate(privateInfo->archive_wal_htab,
+ &iter)) != NULL)
+ {
+ if (e->read_len >= sizeof(XLogLongPageHeader))
+ {
+ entry = e;
+ break;
+ }
+ /* Remember a short entry in case we need to report it */
+ short_entry = e;
+ }
+
+ if (entry == NULL)
+ {
+ /*
+ * A WAL file was found in the hash table but it does not
+ * contain enough data to read the long page header,
+ * indicating a truncated or corrupt WAL segment.
+ */
+ if (short_entry != NULL)
+ pg_fatal("could not read file \"%s\" from \"%s\" archive: read %d of %d",
+ short_entry->fname, privateInfo->archive_name,
+ short_entry->read_len,
+ (int) sizeof(XLogLongPageHeader));
+
+ /*
+ * The hash table contains no WAL entries at all, meaning the
+ * archive holds no WAL data.
+ */
+ pg_fatal("could not find WAL in archive \"%s\"",
+ privateInfo->archive_name);
+ }
+ }
+ else
+ entry = privateInfo->cur_file;
}
/* Extract the WAL segment size from the long page header */
--
2.47.1
* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-03-21 17:26 Amul Sul <[email protected]>
parent: Amul Sul <[email protected]>
0 siblings, 0 replies; 29+ messages in thread
From: Amul Sul @ 2026-03-21 17:26 UTC
To: Andrew Dunstan <[email protected]>; +Cc: Tom Lane <[email protected]>; Michael Paquier <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>
On Sat, Mar 21, 2026 at 9:05 PM Amul Sul <[email protected]> wrote:
>
> On Sat, Mar 21, 2026 at 5:51 PM Andrew Dunstan <[email protected]> wrote:
> >
> >
> > On 2026-03-21 Sa 2:34 AM, Tom Lane wrote:
> >
> > Michael Paquier <[email protected]> writes:
> >
> > On Fri, Mar 20, 2026 at 11:49:02PM -0400, Tom Lane wrote:
> >
> > Buildfarm members batta and hachi don't like this very much.
> >
> > I did not look at what's happening on the host, but it seems like a
> > safe bet to assume that we are not seeing many failures in the
> > buildfarm because we don't have many animals that have the idea to add
> > --with-zstd to their build configuration, like these two.
> >
> > That may be part of the story, but only part. I spent a good deal of
> > time trying to reproduce batta & hachi's configurations locally, on
> > several different platforms, but still couldn't duplicate what they
> > are showing.
> >
> >
> > Yeah, I haven't been able to reproduce it either. But while investigating I found a couple of issues. We neglected to add one of the tests to meson.build, and we neglected to close some files, causing errors on Windows.
> >
>
> While the proposed fix of closing the file pointer before returning is
> correct, we also need to ensure the file is reopened in the next call
> to spill any remaining buffered data. I’ve made a small update to
> Andrew's 0001 patch to handle this. Also, changes to meson.build don't
> seem to be needed as we haven't committed that file yet (unless I am
> missing something).
>
> I’ve also reattached the other patches so they don't get lost: v2-0002
> is Andrew's patch for the archive streamer, and v2-0003 is the patch I
> posted previously [1].
>
>
On further thought, I don't think v2-0001 is the right patch. Consider
the case where we write a temporary file partially: if the next
segment required for decoding is that same segment,
TarWALDumpReadPage() will find the physical file present and continue
decoding, potentially triggering an error later due to the shorter
file.
I have attached the v3-0001 patch, which ensures that once we start
writing a temporary file, we finish writing it before performing the
hash lookup, so we never leave a partial file on disk.
Updated patches are attached; 0002 and 0003 remain the same as before.
Regards,
Amul
Attachments:
[application/x-patch] v3-0001-archive_waldump-skip-hash-lookup-and-tighten-writ.patch (1.8K, 2-v3-0001-archive_waldump-skip-hash-lookup-and-tighten-writ.patch)
From b3bfdac9e425f4cb9fd7d7b6c698dd1607b737ee Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Sat, 21 Mar 2026 22:27:22 +0530
Subject: [PATCH v3 1/3] archive_waldump: skip hash lookup and tighten write_fp
invariant
In get_archive_wal_entry(), when the streamer is still mid-segment
(entry == cur_file), jump directly to read_more instead of looping back
to the top and performing a hash table lookup that is guaranteed to fail.
---
src/bin/pg_waldump/archive_waldump.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/src/bin/pg_waldump/archive_waldump.c b/src/bin/pg_waldump/archive_waldump.c
index b078c2d6960..ee292b6dc8d 100644
--- a/src/bin/pg_waldump/archive_waldump.c
+++ b/src/bin/pg_waldump/archive_waldump.c
@@ -484,6 +484,7 @@ get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo)
*/
entry = privateInfo->cur_file;
+read_more:
/*
* Fetch more data either when no current file is being tracked or
* when its buffer has been fully flushed to the temporary file.
@@ -525,11 +526,20 @@ get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo)
* file handle so data is flushed to disk before the next segment
* starts writing to a different handle.
*/
- if (entry != privateInfo->cur_file && write_fp != NULL)
+ if (entry != privateInfo->cur_file)
{
+ Assert(write_fp);
fclose(write_fp);
write_fp = NULL;
}
+ else
+ /*
+ * The file being written hasn't been completed. We must finish
+ * extracting it before performing the hash lookup; otherwise, the
+ * lookup might return without flushing the current segment buffer,
+ * leaving the file open and incomplete on disk.
+ */
+ goto read_more;
}
/* Requested WAL segment not found */
--
2.47.1
[application/x-patch] v3-0002-Fix-astreamer-decompressor-finalize-to-send-corre.patch (2.5K, 3-v3-0002-Fix-astreamer-decompressor-finalize-to-send-corre.patch)
From 3a52d70947f7c7bc3a0decbd473e95891ad3b6eb Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Sat, 21 Mar 2026 19:48:48 +0530
Subject: [PATCH v3 2/3] Fix-astreamer-decompressor-finalize-to-send-correct
---
src/fe_utils/astreamer_gzip.c | 9 +++++----
src/fe_utils/astreamer_lz4.c | 9 +++++----
src/fe_utils/astreamer_zstd.c | 2 +-
3 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/src/fe_utils/astreamer_gzip.c b/src/fe_utils/astreamer_gzip.c
index 2e080c37a58..df392f67cab 100644
--- a/src/fe_utils/astreamer_gzip.c
+++ b/src/fe_utils/astreamer_gzip.c
@@ -347,10 +347,11 @@ astreamer_gzip_decompressor_finalize(astreamer *streamer)
* End of the stream, if there is some pending data in output buffers then
* we must forward it to next streamer.
*/
- astreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->base.bbs_buffer.data,
- mystreamer->base.bbs_buffer.maxlen,
- ASTREAMER_UNKNOWN);
+ if (mystreamer->bytes_written > 0)
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->bytes_written,
+ ASTREAMER_UNKNOWN);
astreamer_finalize(mystreamer->base.bbs_next);
}
diff --git a/src/fe_utils/astreamer_lz4.c b/src/fe_utils/astreamer_lz4.c
index 2bc32b42879..605c188007b 100644
--- a/src/fe_utils/astreamer_lz4.c
+++ b/src/fe_utils/astreamer_lz4.c
@@ -397,10 +397,11 @@ astreamer_lz4_decompressor_finalize(astreamer *streamer)
* End of the stream, if there is some pending data in output buffers then
* we must forward it to next streamer.
*/
- astreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->base.bbs_buffer.data,
- mystreamer->base.bbs_buffer.maxlen,
- ASTREAMER_UNKNOWN);
+ if (mystreamer->bytes_written > 0)
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->bytes_written,
+ ASTREAMER_UNKNOWN);
astreamer_finalize(mystreamer->base.bbs_next);
}
diff --git a/src/fe_utils/astreamer_zstd.c b/src/fe_utils/astreamer_zstd.c
index f26abcfd0fa..4b43ab795e3 100644
--- a/src/fe_utils/astreamer_zstd.c
+++ b/src/fe_utils/astreamer_zstd.c
@@ -347,7 +347,7 @@ astreamer_zstd_decompressor_finalize(astreamer *streamer)
if (mystreamer->zstd_outBuf.pos > 0)
astreamer_content(mystreamer->base.bbs_next, NULL,
mystreamer->base.bbs_buffer.data,
- mystreamer->base.bbs_buffer.maxlen,
+ mystreamer->zstd_outBuf.pos,
ASTREAMER_UNKNOWN);
astreamer_finalize(mystreamer->base.bbs_next);
--
2.47.1
[application/x-patch] v3-0003-pg_waldump-Handle-archive-exhaustion-in-init_arch.patch (3.3K, 4-v3-0003-pg_waldump-Handle-archive-exhaustion-in-init_arch.patch)
From 242b8904682cec326d059e3d11355ca2315c869c Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Sat, 21 Mar 2026 20:57:23 +0530
Subject: [PATCH v3 3/3] pg_waldump: Handle archive exhaustion in
init_archive_reader().
When read_archive_file() returns 0, the archive may have already
buffered a complete WAL file into the hash table before exhausting
the input. Instead of immediately reporting an error, search the
hash table for an entry containing at least sizeof(XLogLongPageHeader)
bytes. Report a specific error if a WAL entry exists but is too
short (truncated/corrupt), or a generic error if no WAL was found
at all.
Also tighten the loop condition to check for sizeof(XLogLongPageHeader)
rather than XLOG_BLCKSZ, since only the long page header is needed
at this stage.
---
src/bin/pg_waldump/archive_waldump.c | 55 ++++++++++++++++++++++++++--
1 file changed, 51 insertions(+), 4 deletions(-)
diff --git a/src/bin/pg_waldump/archive_waldump.c b/src/bin/pg_waldump/archive_waldump.c
index ee292b6dc8d..943c843e05b 100644
--- a/src/bin/pg_waldump/archive_waldump.c
+++ b/src/bin/pg_waldump/archive_waldump.c
@@ -176,13 +176,60 @@ init_archive_reader(XLogDumpPrivate *privateInfo,
* the first WAL segment in the archive so we can extract the WAL segment
* size from the long page header.
*/
- while (entry == NULL || entry->buf->len < XLOG_BLCKSZ)
+ while (entry == NULL || entry->read_len < sizeof(XLogLongPageHeader))
{
if (read_archive_file(privateInfo, XLOG_BLCKSZ) == 0)
- pg_fatal("could not find WAL in archive \"%s\"",
- privateInfo->archive_name);
+ {
+ ArchivedWAL_iterator iter;
+ ArchivedWALFile *e = NULL;
+ ArchivedWALFile *short_entry = NULL;
- entry = privateInfo->cur_file;
+ entry = NULL;
+
+ /*
+ * read_archive_file() returned 0, meaning the archive is
+ * exhausted. However, a sufficiently compressed archive may have
+ * already read a complete WAL file and inserted it into the hash
+ * table before returning. Search the hash table for any entry
+ * that already has enough buffered data to contain the long page
+ * header; if none is found, the archive contains no usable WAL.
+ */
+ ArchivedWAL_start_iterate(privateInfo->archive_wal_htab, &iter);
+ while ((e = ArchivedWAL_iterate(privateInfo->archive_wal_htab,
+ &iter)) != NULL)
+ {
+ if (e->read_len >= sizeof(XLogLongPageHeader))
+ {
+ entry = e;
+ break;
+ }
+ /* Remember a short entry in case we need to report it */
+ short_entry = e;
+ }
+
+ if (entry == NULL)
+ {
+ /*
+ * A WAL file was found in the hash table but it does not
+ * contain enough data to read the long page header,
+ * indicating a truncated or corrupt WAL segment.
+ */
+ if (short_entry != NULL)
+ pg_fatal("could not read file \"%s\" from \"%s\" archive: read %d of %d",
+ short_entry->fname, privateInfo->archive_name,
+ short_entry->read_len,
+ (int) sizeof(XLogLongPageHeader));
+
+ /*
+ * The hash table contains no WAL entries at all, meaning the
+ * archive holds no WAL data.
+ */
+ pg_fatal("could not find WAL in archive \"%s\"",
+ privateInfo->archive_name);
+ }
+ }
+ else
+ entry = privateInfo->cur_file;
}
/* Extract the WAL segment size from the long page header */
--
2.47.1
* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-03-22 11:24 Andrew Dunstan <[email protected]>
parent: Michael Paquier <[email protected]>
1 sibling, 1 reply; 29+ messages in thread
From: Andrew Dunstan @ 2026-03-22 11:24 UTC
To: Tom Lane <[email protected]>; +Cc: Michael Paquier <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>
On Sun, Mar 22, 2026 at 12:24 AM Tom Lane <[email protected]> wrote:
> I wrote:
> > Unsurprisingly, applying this change to unmodified master results
> > in the pg_waldump and pg_verifybackup tests falling over. More
> > surprisingly, they still fall over after applying your fix to the
> > decompressors, so there's some other source of garbage trailing
> > data. I haven't figured out what.
>
> In the learn-something-new-every-day dept.: good ol' GNU tar itself
> does that. By default, it zero-pads its output to a multiple of 10kB
> after it's written the required terminator. Moreover, this behavior
> is actually specified by POSIX:
>
> -x format
> Specify the output archive format. The pax utility shall support
> the following formats:
> ...
> ustar
> The tar interchange format; see the EXTENDED DESCRIPTION
> section. The default blocksize for this format for character
> special archive files shall be 10240. Implementations shall
> support all blocksize values less than or equal to 32256 that
> are multiples of 512.
>
> So, astreamer_tar_parser_content's idea that it should disallow more
> than 1024 bytes of trailer is completely wrong, which we would have
> figured out long ago if the code attempting to enforce that weren't
> completely broken.
>
> You could argue that this means the tar files our existing utilities
> create aren't POSIX-compliant. I think it's all right though: we
> can just say that we write these files with blocksize 1024 not
> blocksize 10240, and tar-file readers are required to accept that
> per the above spec text.
>
> However, this discourages me from editorializing on the file trailer
> emitted by whatever wrote the tar file we are reading. I think
> emitting it as-is is the most appropriate thing. So we should just
> get rid of astreamer_tar_parser_content's nonfunctional error check
> and not change its behavior otherwise.
>
>
>
OK, patch 5 of this set does that. I reworked your previous patches 2 and 3
slightly, mostly adding comments and fixing a bug in the use of
sizeof(XLogLongPageHeader). Patch 4 here tries to fix the incorrect use of
cur_file in get_archive_wal_entry().
cheers
andrew
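The sizeof(XLogLongPageHeader) bug mentioned above is easy to make. In PostgreSQL, XLogLongPageHeader is a pointer typedef for XLogLongPageHeaderData, so sizeof() on it yields the size of a pointer, not the size of the header. This minimal sketch uses hypothetical Demo* types, and the field list is illustrative rather than PostgreSQL's definition. It shows how the pointer-sized check accepts a buffer that is far too short:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header struct; the field list is illustrative only. */
typedef struct DemoLongPageHeaderData
{
    uint16_t    magic;
    uint16_t    info;
    uint32_t    tli;
    uint64_t    pageaddr;
    uint32_t    rem_len;
    uint64_t    sysid;
    uint32_t    seg_size;
    uint32_t    xlog_blcksz;
} DemoLongPageHeaderData;

/* Same naming pattern as PostgreSQL: the shorter name is a pointer type. */
typedef DemoLongPageHeaderData *DemoLongPageHeader;

/* The struct is bigger than a pointer, so the two checks below differ. */
_Static_assert(sizeof(DemoLongPageHeaderData) > sizeof(DemoLongPageHeader),
               "header struct should be larger than a pointer");

/* Buggy: measures a pointer (commonly 8 bytes), not the header. */
static int
demo_have_header_buggy(size_t read_len)
{
    return read_len >= sizeof(DemoLongPageHeader);
}

/* Correct: measures the header struct itself. */
static int
demo_have_header(size_t read_len)
{
    return read_len >= sizeof(DemoLongPageHeaderData);
}
```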
Attachments:
[text/x-patch] v5-0003-Fix-init_archive_reader-to-not-depend-on-cur_file.patch (3.6K, 3-v5-0003-Fix-init_archive_reader-to-not-depend-on-cur_file.patch)
From 51d53b166df7c8eaebe49756e24088c16764807b Mon Sep 17 00:00:00 2001
From: Andrew Dunstan <[email protected]>
Date: Sun, 22 Mar 2026 06:53:25 -0400
Subject: [PATCH v5 3/5] Fix init_archive_reader to not depend on cur_file.
init_archive_reader() relied on privateInfo->cur_file to track which
WAL segment was being read, but cur_file can become NULL if a member
trailer is processed during a read_archive_file() call. This could
cause hard-to-reproduce "could not find WAL in archive" failures,
particularly with compressed archives where all the WAL data fits
in a small number of compressed bytes.
Fix by scanning the hash table after each read to find any cached
WAL segment with sufficient data, instead of depending on cur_file.
Also reduce the minimum data requirement from XLOG_BLCKSZ to
sizeof(XLogLongPageHeaderData), since we only need the long page
header to extract the segment size.
Add a safety comment on cur_file in pg_waldump.h to document that
it can change during a single read_archive_file() call.
Author: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/[email protected]
---
src/bin/pg_waldump/archive_waldump.c | 22 +++++++++++++++++-----
src/bin/pg_waldump/pg_waldump.h | 9 ++++++++-
2 files changed, 25 insertions(+), 6 deletions(-)
diff --git a/src/bin/pg_waldump/archive_waldump.c b/src/bin/pg_waldump/archive_waldump.c
index cd092a057ef..3fce2183099 100644
--- a/src/bin/pg_waldump/archive_waldump.c
+++ b/src/bin/pg_waldump/archive_waldump.c
@@ -173,17 +173,29 @@ init_archive_reader(XLogDumpPrivate *privateInfo,
privateInfo->archive_wal_htab = ArchivedWAL_create(8, NULL);
/*
- * Read until we have at least one full WAL page (XLOG_BLCKSZ bytes) from
- * the first WAL segment in the archive so we can extract the WAL segment
- * size from the long page header.
+ * Read until we have at least one WAL segment with enough data to extract
+ * the WAL segment size from the long page header.
+ *
+ * We must not rely on cur_file here, because it can become NULL if a
+ * member trailer is processed during a read_archive_file() call. Instead,
+ * scan the hash table after each read to find any entry with sufficient
+ * data.
*/
- while (entry == NULL || entry->buf->len < XLOG_BLCKSZ)
+ while (entry == NULL)
{
+ ArchivedWAL_iterator iter;
+
if (!read_archive_file(privateInfo, XLOG_BLCKSZ))
pg_fatal("could not find WAL in archive \"%s\"",
privateInfo->archive_name);
- entry = privateInfo->cur_file;
+ ArchivedWAL_start_iterate(privateInfo->archive_wal_htab, &iter);
+ while ((entry = ArchivedWAL_iterate(privateInfo->archive_wal_htab,
+ &iter)) != NULL)
+ {
+ if (entry->read_len >= sizeof(XLogLongPageHeaderData))
+ break;
+ }
}
/* Extract the WAL segment size from the long page header */
diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h
index cde7c6ca3f2..ca0dfd97168 100644
--- a/src/bin/pg_waldump/pg_waldump.h
+++ b/src/bin/pg_waldump/pg_waldump.h
@@ -44,7 +44,14 @@ typedef struct XLogDumpPrivate
Size archive_read_buf_size;
#endif
- /* What the archive streamer is currently reading */
+ /*
+ * The buffer for the WAL file the archive streamer is currently reading,
+ * or NULL if none. It is quite risky to examine this anywhere except in
+ * astreamer_waldump_content(), since it can change multiple times during
+ * a single read_archive_file() call. However, it is safe to assume that
+ * if cur_file is different from a particular ArchivedWALFile of interest,
+ * then the archive streamer has finished reading that file.
+ */
struct ArchivedWALFile *cur_file;
/*
--
2.43.0
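Patch 3 relaxes the requirement to just the long page header, because that header is all that is needed to learn the segment size. A hedged sketch of that extraction step follows. The struct layout and the demo_* names are hypothetical. The validity rule assumes PostgreSQL's usual constraint that the WAL segment size must be a power of two between 1 MB and 1 GB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header layout; only seg_size is used below. */
typedef struct DemoLongPageHeaderData
{
    uint16_t    magic;
    uint16_t    info;
    uint32_t    tli;
    uint64_t    pageaddr;
    uint32_t    rem_len;
    uint64_t    sysid;
    uint32_t    seg_size;
    uint32_t    xlog_blcksz;
} DemoLongPageHeaderData;

#define DEMO_MIN_SEG_SIZE (UINT32_C(1) << 20)   /* 1 MB */
#define DEMO_MAX_SEG_SIZE (UINT32_C(1) << 30)   /* 1 GB */

/* Assumed rule: a power of two between 1 MB and 1 GB inclusive. */
static bool
demo_valid_seg_size(uint32_t size)
{
    return size != 0 && (size & (size - 1)) == 0 &&
        size >= DEMO_MIN_SEG_SIZE && size <= DEMO_MAX_SEG_SIZE;
}

/*
 * Read the segment size from the first bytes of a buffered WAL member.
 * Returns false if too few bytes are available or the value is implausible.
 */
static bool
demo_extract_seg_size(const char *buf, size_t len, uint32_t *seg_size)
{
    DemoLongPageHeaderData hdr;

    assert(buf != NULL && seg_size != NULL);
    if (len < sizeof(hdr))
        return false;
    memcpy(&hdr, buf, sizeof(hdr));     /* avoids assuming buf is aligned */
    if (!demo_valid_seg_size(hdr.seg_size))
        return false;
    *seg_size = hdr.seg_size;
    return true;
}
```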
[text/x-patch] v5-0001-Fix-finalization-of-decompressor-astreamers.patch (2.8K, 4-v5-0001-Fix-finalization-of-decompressor-astreamers.patch)
From 9978359771c14b411704d808abc0f602119f0a9f Mon Sep 17 00:00:00 2001
From: Andrew Dunstan <[email protected]>
Date: Sun, 22 Mar 2026 05:31:14 -0400
Subject: [PATCH v5 1/5] Fix finalization of decompressor astreamers.
Send the correct amount of data to the next astreamer, not the
whole allocated buffer size. It's unclear how we missed this bug;
perhaps the use-cases so far are insensitive to trailing garbage.
Author: Andrew Dunstan <[email protected]>
Discussion: https://postgr.es/m/[email protected]
---
src/fe_utils/astreamer_gzip.c | 9 +++++----
src/fe_utils/astreamer_lz4.c | 9 +++++----
src/fe_utils/astreamer_zstd.c | 2 +-
3 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/src/fe_utils/astreamer_gzip.c b/src/fe_utils/astreamer_gzip.c
index 2e080c37a58..df392f67cab 100644
--- a/src/fe_utils/astreamer_gzip.c
+++ b/src/fe_utils/astreamer_gzip.c
@@ -347,10 +347,11 @@ astreamer_gzip_decompressor_finalize(astreamer *streamer)
* End of the stream, if there is some pending data in output buffers then
* we must forward it to next streamer.
*/
- astreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->base.bbs_buffer.data,
- mystreamer->base.bbs_buffer.maxlen,
- ASTREAMER_UNKNOWN);
+ if (mystreamer->bytes_written > 0)
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->bytes_written,
+ ASTREAMER_UNKNOWN);
astreamer_finalize(mystreamer->base.bbs_next);
}
diff --git a/src/fe_utils/astreamer_lz4.c b/src/fe_utils/astreamer_lz4.c
index 2bc32b42879..605c188007b 100644
--- a/src/fe_utils/astreamer_lz4.c
+++ b/src/fe_utils/astreamer_lz4.c
@@ -397,10 +397,11 @@ astreamer_lz4_decompressor_finalize(astreamer *streamer)
* End of the stream, if there is some pending data in output buffers then
* we must forward it to next streamer.
*/
- astreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->base.bbs_buffer.data,
- mystreamer->base.bbs_buffer.maxlen,
- ASTREAMER_UNKNOWN);
+ if (mystreamer->bytes_written > 0)
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->bytes_written,
+ ASTREAMER_UNKNOWN);
astreamer_finalize(mystreamer->base.bbs_next);
}
diff --git a/src/fe_utils/astreamer_zstd.c b/src/fe_utils/astreamer_zstd.c
index f26abcfd0fa..4b43ab795e3 100644
--- a/src/fe_utils/astreamer_zstd.c
+++ b/src/fe_utils/astreamer_zstd.c
@@ -347,7 +347,7 @@ astreamer_zstd_decompressor_finalize(astreamer *streamer)
if (mystreamer->zstd_outBuf.pos > 0)
astreamer_content(mystreamer->base.bbs_next, NULL,
mystreamer->base.bbs_buffer.data,
- mystreamer->base.bbs_buffer.maxlen,
+ mystreamer->zstd_outBuf.pos,
ASTREAMER_UNKNOWN);
astreamer_finalize(mystreamer->base.bbs_next);
--
2.43.0
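The bug fixed by patch 1 comes down to the difference between a buffer's capacity and the amount of valid data in it. Here is a minimal model with hypothetical names (`DemoDecompressor`, `DemoSink`). It assumes a stage that tracks its pending output in a `bytes_written` counter, as the gzip and lz4 decompressors in the patch do:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical downstream consumer that records what it receives. */
typedef struct DemoSink
{
    char        data[64];
    size_t      len;
} DemoSink;

static void
demo_sink_content(DemoSink *sink, const char *data, size_t len)
{
    assert(sink->len + len <= sizeof(sink->data));
    memcpy(sink->data + sink->len, data, len);
    sink->len += len;
}

/* Hypothetical decompressor stage with a fixed-size output buffer. */
typedef struct DemoDecompressor
{
    char        buffer[16];     /* allocated capacity (the "maxlen") */
    size_t      bytes_written;  /* how much of buffer holds real output */
    DemoSink   *next;
} DemoDecompressor;

/* Finalize: forward only pending output, never the whole allocation. */
static void
demo_decompressor_finalize(DemoDecompressor *d)
{
    if (d->bytes_written > 0)
        demo_sink_content(d->next, d->buffer, d->bytes_written);
    d->bytes_written = 0;
}
```

Consider a 16-byte buffer holding 4 bytes of real output. The pre-patch behavior, which forwarded `maxlen`, would also have passed the other 12 bytes of stale buffer contents downstream.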
[text/x-patch] v5-0004-Fix-get_archive_wal_entry-to-handle-cur_file-tran.patch (6.5K, 5-v5-0004-Fix-get_archive_wal_entry-to-handle-cur_file-tran.patch)
From 91ed0e69a7df00be147548973a8172322998b528 Mon Sep 17 00:00:00 2001
From: Andrew Dunstan <[email protected]>
Date: Sun, 22 Mar 2026 06:53:34 -0400
Subject: [PATCH v5 4/5] Fix get_archive_wal_entry to handle cur_file
transitions reliably.
As noted by Tom Lane, get_archive_wal_entry() uses cur_file in an
unsafe way: a single read_archive_file() call can trigger multiple
astreamer callbacks when compression is effective, causing cur_file
to change several times (entry -> NULL -> new entry) within one
call. The old code captured cur_file before the read and checked
for changes after, but this missed intermediate transitions. This
could cause spill-file handles to leak or data to not be flushed
when the streamer finishes one segment and starts another within
the same read.
Restructure the spill logic to explicitly track the entry being
spilled (spill_entry) separately from cur_file, and detect
transitions at the top of each loop iteration. Also ensure spill
file handles are closed on both success and error paths.
Discussion: https://postgr.es/m/[email protected]
---
src/bin/pg_waldump/archive_waldump.c | 117 ++++++++++++++++-----------
1 file changed, 69 insertions(+), 48 deletions(-)
diff --git a/src/bin/pg_waldump/archive_waldump.c b/src/bin/pg_waldump/archive_waldump.c
index 3fce2183099..93ed856c674 100644
--- a/src/bin/pg_waldump/archive_waldump.c
+++ b/src/bin/pg_waldump/archive_waldump.c
@@ -463,11 +463,18 @@ free_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo)
* found. If the archive streamer is reading a WAL file from the archive that
* is not currently needed, that data is spilled to a temporary file for later
* retrieval.
+ *
+ * Because a single read_archive_file() call may trigger multiple astreamer
+ * callbacks (especially when data compresses well), cur_file can change
+ * several times within one call: from one entry to NULL (member trailer),
+ * and then to a new entry (next member header). The spill logic below
+ * handles this by flushing and closing per-entry state whenever we detect
+ * that the streamer has moved on.
*/
static ArchivedWALFile *
get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo)
{
- ArchivedWALFile *entry = NULL;
+ ArchivedWALFile *spill_entry = NULL;
FILE *write_fp = NULL;
/*
@@ -477,6 +484,8 @@ get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo)
*/
while (1)
{
+ ArchivedWALFile *entry;
+
/*
* Search hash table.
*
@@ -488,64 +497,76 @@ get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo)
entry = ArchivedWAL_lookup(privateInfo->archive_wal_htab, fname);
if (entry != NULL)
+ {
+ /* Close any open spill file before returning. */
+ if (write_fp != NULL)
+ fclose(write_fp);
return entry;
-
- /*
- * Capture the current entry before calling read_archive_file(),
- * because cur_file may advance to a new segment during streaming. We
- * hold this reference so we can flush any remaining buffer data and
- * close the write handle once we detect that cur_file has moved on.
- */
- entry = privateInfo->cur_file;
-
- /*
- * Fetch more data either when no current file is being tracked or
- * when its buffer has been fully flushed to the temporary file.
- */
- if (entry == NULL || entry->buf->len == 0)
- {
- if (!read_archive_file(privateInfo, READ_CHUNK_SIZE))
- break; /* archive file ended */
}
/*
- * Archive streamer is reading a non-WAL file or an irrelevant WAL
- * file.
- */
- if (entry == NULL)
- continue;
-
- /*
- * The streamer is producing a WAL segment that isn't the one asked
- * for; it must be arriving out of order. Spill its data to disk so
- * it can be read back when needed.
- */
- Assert(strcmp(fname, entry->fname) != 0);
-
- /* Create a temporary file if one does not already exist */
- if (!entry->spilled)
- {
- write_fp = prepare_tmp_write(entry->fname, privateInfo);
- entry->spilled = true;
- }
-
- /* Flush data from the buffer to the file */
- perform_tmp_write(entry->fname, entry->buf, write_fp);
- resetStringInfo(entry->buf);
-
- /*
- * If cur_file changed since we captured entry above, the archive
- * streamer has finished this segment and moved on. Close its spill
- * file handle so data is flushed to disk before the next segment
- * starts writing to a different handle.
+ * If the streamer has moved on to a different entry than the one we
+ * were spilling, flush any remaining data for the old entry and close
+ * its spill file.
*/
- if (entry != privateInfo->cur_file && write_fp != NULL)
+ if (spill_entry != NULL && spill_entry != privateInfo->cur_file)
{
+ if (spill_entry->buf->len > 0)
+ {
+ perform_tmp_write(spill_entry->fname, spill_entry->buf,
+ write_fp);
+ resetStringInfo(spill_entry->buf);
+ }
fclose(write_fp);
write_fp = NULL;
+ spill_entry = NULL;
}
+
+ /*
+ * If no WAL file is currently being streamed (cur_file is NULL), or
+ * the current spill entry's buffer has been fully flushed, we need
+ * more data from the archive.
+ */
+ if (privateInfo->cur_file == NULL ||
+ (spill_entry != NULL && spill_entry->buf->len == 0))
+ {
+ if (!read_archive_file(privateInfo, READ_CHUNK_SIZE))
+ break; /* archive fully exhausted */
+ continue; /* re-check hash table and cur_file */
+ }
+
+ /*
+ * cur_file points to a WAL segment that isn't the one asked for; it
+ * must be arriving out of order. Spill its data to disk so it can be
+ * read back when needed.
+ */
+ spill_entry = privateInfo->cur_file;
+ Assert(strcmp(fname, spill_entry->fname) != 0);
+
+ /* Create a temporary file if one does not already exist */
+ if (!spill_entry->spilled)
+ {
+ write_fp = prepare_tmp_write(spill_entry->fname, privateInfo);
+ spill_entry->spilled = true;
+ }
+
+ /* Flush data from the buffer to the file */
+ perform_tmp_write(spill_entry->fname, spill_entry->buf, write_fp);
+ resetStringInfo(spill_entry->buf);
+
+ /*
+ * Read more data from the archive. This may add data to the current
+ * spill_entry's buffer, advance cur_file to a new entry, or set
+ * cur_file to NULL (member trailer).
+ */
+ if (!read_archive_file(privateInfo, READ_CHUNK_SIZE))
+ break; /* archive fully exhausted */
}
+ /* Close any open spill file before erroring out. */
+ if (write_fp != NULL)
+ fclose(write_fp);
+
/* Requested WAL segment not found */
pg_fatal("could not find WAL \"%s\" in archive \"%s\"",
fname, privateInfo->archive_name);
--
2.43.0
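The commit message above argues that capturing cur_file before a read and comparing it afterwards misses intermediate transitions. A toy model makes that concrete. Every name here is hypothetical, and the "read" is hard-wired to contain two complete small members. The before/after snapshot sees NULL both times, while a counter updated inside the callbacks sees both members finish:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical member and reader state; names are illustrative only. */
typedef struct DemoMember
{
    const char *name;
    int         finished;       /* set when the member's trailer is seen */
} DemoMember;

typedef struct DemoReader
{
    DemoMember *cur;            /* analogous to cur_file */
    int         completed;      /* members finished, counted in callbacks */
} DemoReader;

/* Callbacks a streamer might invoke; each may change reader->cur. */
static void
demo_member_header(DemoReader *r, DemoMember *m)
{
    r->cur = m;
}

static void
demo_member_trailer(DemoReader *r)
{
    assert(r->cur != NULL);
    r->cur->finished = 1;
    r->completed++;
    r->cur = NULL;
}

/*
 * One "read" that happens to contain two whole small members, so cur goes
 * NULL -> a -> NULL -> b -> NULL within a single call.
 */
static void
demo_read_chunk(DemoReader *r, DemoMember *a, DemoMember *b)
{
    demo_member_header(r, a);
    demo_member_trailer(r);
    demo_member_header(r, b);
    demo_member_trailer(r);
}
```

This is also why the comment added to pg_waldump.h in patch 3 limits examining cur_file to astreamer_waldump_content(): only code running inside the callbacks observes every transition.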
[text/x-patch] v5-0002-Fix-failure-to-finalize-the-decompression-pipelin.patch (6.6K, 6-v5-0002-Fix-failure-to-finalize-the-decompression-pipelin.patch)
download | inline diff:
From 0de5fd971602ae00fd3bd62cf5da0d8f0a3cce5a Mon Sep 17 00:00:00 2001
From: Andrew Dunstan <[email protected]>
Date: Sun, 22 Mar 2026 05:32:45 -0400
Subject: [PATCH v5 2/5] Fix failure to finalize the decompression pipeline at
archive EOF.
archive_waldump.c called astreamer_finalize() nowhere. This meant
that any data retained in decompression buffers at the moment we
detect archive EOF would never reach astreamer_waldump_content(),
resulting in surprising failures if we actually need the last few
bytes of the archive file.
To fix, make read_archive_file() do the finalize once it detects
EOF. Change its API to return a boolean "yes there's more data"
rather than the entirely-misleading raw count of bytes read.
Also document the contract that cur_file can change (or become NULL)
during a single read_archive_file() call, since the decompression
pipeline may produce enough output to trigger multiple astreamer
callbacks.
Author: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/[email protected]
---
src/bin/pg_waldump/archive_waldump.c | 50 +++++++++++++++++++++++-----
src/bin/pg_waldump/pg_waldump.h | 1 +
2 files changed, 42 insertions(+), 9 deletions(-)
diff --git a/src/bin/pg_waldump/archive_waldump.c b/src/bin/pg_waldump/archive_waldump.c
index b078c2d6960..cd092a057ef 100644
--- a/src/bin/pg_waldump/archive_waldump.c
+++ b/src/bin/pg_waldump/archive_waldump.c
@@ -89,7 +89,7 @@ typedef struct astreamer_waldump
static ArchivedWALFile *get_archive_wal_entry(const char *fname,
XLogDumpPrivate *privateInfo);
-static int read_archive_file(XLogDumpPrivate *privateInfo, Size count);
+static bool read_archive_file(XLogDumpPrivate *privateInfo, Size count);
static void setup_tmpwal_dir(const char *waldir);
static void cleanup_tmpwal_dir_atexit(void);
@@ -139,6 +139,7 @@ init_archive_reader(XLogDumpPrivate *privateInfo,
pg_fatal("could not open file \"%s\"", privateInfo->archive_name);
privateInfo->archive_fd = fd;
+ privateInfo->archive_fd_eof = false;
streamer = astreamer_waldump_new(privateInfo);
@@ -178,7 +179,7 @@ init_archive_reader(XLogDumpPrivate *privateInfo,
*/
while (entry == NULL || entry->buf->len < XLOG_BLCKSZ)
{
- if (read_archive_file(privateInfo, XLOG_BLCKSZ) == 0)
+ if (!read_archive_file(privateInfo, XLOG_BLCKSZ))
pg_fatal("could not find WAL in archive \"%s\"",
privateInfo->archive_name);
@@ -236,9 +237,10 @@ free_archive_reader(XLogDumpPrivate *privateInfo)
/*
* NB: Normally, astreamer_finalize() is called before astreamer_free() to
* flush any remaining buffered data or to ensure the end of the tar
- * archive is reached. However, when decoding WAL, once we hit the end
- * LSN, any remaining buffered data or unread portion of the archive can
- * be safely ignored.
+ * archive is reached. read_archive_file() may have done so. However,
+ * when decoding WAL we can stop once we hit the end LSN, so we may never
+ * have read all of the input file. In that case any remaining buffered
+ * data or unread portion of the archive can be safely ignored.
*/
astreamer_free(privateInfo->archive_streamer);
@@ -384,7 +386,7 @@ read_archive_wal_page(XLogDumpPrivate *privateInfo, XLogRecPtr targetPagePtr,
fname, privateInfo->archive_name,
(long long int) (count - nbytes),
(long long int) count);
- if (read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
+ if (!read_archive_file(privateInfo, READ_CHUNK_SIZE))
pg_fatal("unexpected end of archive \"%s\" while reading \"%s\": read %lld of %lld bytes",
privateInfo->archive_name, fname,
(long long int) (count - nbytes),
@@ -490,7 +492,7 @@ get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo)
*/
if (entry == NULL || entry->buf->len == 0)
{
- if (read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
+ if (!read_archive_file(privateInfo, READ_CHUNK_SIZE))
break; /* archive file ended */
}
@@ -540,8 +542,22 @@ get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo)
/*
* Reads a chunk from the archive file and passes it through the streamer
* pipeline for decompression (if needed) and tar member extraction.
+ *
+ * count is the maximum amount to try to read this time. Note that it's
+ * measured in raw file bytes, and may have little to do with how much
+ * comes out of decompression/extraction.
+ *
+ * Returns true if successful, false if there is no more data.
+ *
+ * Callers must be aware that a single call may trigger multiple callbacks
+ * in astreamer_waldump_content, so privateInfo->cur_file can change value
+ * (or become NULL) during a call. In particular, cur_file is set to NULL
+ * when the ASTREAMER_MEMBER_TRAILER callback fires at the end of a tar
+ * member; it is then set to a new entry when the next WAL member's
+ * ASTREAMER_MEMBER_HEADER callback fires, which may or may not happen
+ * within the same call.
*/
-static int
+static bool
read_archive_file(XLogDumpPrivate *privateInfo, Size count)
{
int rc;
@@ -549,6 +565,11 @@ read_archive_file(XLogDumpPrivate *privateInfo, Size count)
/* The read request must not exceed the allocated buffer size. */
Assert(privateInfo->archive_read_buf_size >= count);
+ /* Fail if we already reached EOF in a prior call. */
+ if (privateInfo->archive_fd_eof)
+ return false;
+
+ /* Try to read some more data. */
rc = read(privateInfo->archive_fd, privateInfo->archive_read_buf, count);
if (rc < 0)
pg_fatal("could not read file \"%s\": %m",
@@ -562,8 +583,19 @@ read_archive_file(XLogDumpPrivate *privateInfo, Size count)
astreamer_content(privateInfo->archive_streamer, NULL,
privateInfo->archive_read_buf, rc,
ASTREAMER_UNKNOWN);
+ else
+ {
+ /*
+ * We reached EOF, but there is probably still data queued in the
+ * astreamer pipeline's buffers. Flush it out to ensure that we
+ * process everything.
+ */
+ astreamer_finalize(privateInfo->archive_streamer);
+ /* Set flag to ensure we don't finalize more than once. */
+ privateInfo->archive_fd_eof = true;
+ }
- return rc;
+ return true;
}
/*
diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h
index 36893624f53..cde7c6ca3f2 100644
--- a/src/bin/pg_waldump/pg_waldump.h
+++ b/src/bin/pg_waldump/pg_waldump.h
@@ -35,6 +35,7 @@ typedef struct XLogDumpPrivate
char *archive_dir;
char *archive_name; /* Tar archive filename */
int archive_fd; /* File descriptor for the open tar file */
+ bool archive_fd_eof; /* Have we reached EOF on archive_fd? */
astreamer *archive_streamer;
char *archive_read_buf; /* Reusable read buffer for archive I/O */
--
2.43.0
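The EOF contract introduced by patch 2 can be sketched as a toy pipeline. All names are hypothetical. The model includes a stage that holds back its last three bytes, the way a decompressor may retain output until it is finalized. The first call that sees EOF finalizes and still returns true, because finalizing may have produced new data. Every later call returns false without finalizing again:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_HOLDBACK 3         /* bytes the stage keeps until finalize */

/* Toy model of raw input, one buffering stage, and a final consumer. */
typedef struct DemoPipeline
{
    size_t      src_len;        /* total raw bytes available */
    size_t      src_pos;        /* raw bytes consumed so far */
    size_t      held;           /* bytes retained inside the stage */
    size_t      delivered;      /* bytes that reached the consumer */
    int         finalize_calls;
    bool        eof;            /* analogous to archive_fd_eof */
} DemoPipeline;

static void
demo_stage_content(DemoPipeline *p, size_t len)
{
    size_t      total = p->held + len;
    size_t      pass = total > DEMO_HOLDBACK ? total - DEMO_HOLDBACK : 0;

    p->delivered += pass;
    p->held = total - pass;
}

static void
demo_stage_finalize(DemoPipeline *p)
{
    p->delivered += p->held;    /* flush whatever the stage retained */
    p->held = 0;
    p->finalize_calls++;
}

/* Returns true while there may be more data, false once exhausted. */
static bool
demo_read(DemoPipeline *p, size_t count)
{
    size_t      n;

    if (p->eof)
        return false;           /* already finalized in an earlier call */

    n = p->src_len - p->src_pos;
    if (n > count)
        n = count;
    if (n > 0)
    {
        p->src_pos += n;
        demo_stage_content(p, n);
    }
    else
    {
        demo_stage_finalize(p);
        p->eof = true;
    }
    return true;
}
```

Without the finalize step, the consumer in this model would stop three bytes short. That mirrors the "surprising failures if we actually need the last few bytes" described in the commit message.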
[text/x-patch] v5-0005-Remove-nonfunctional-tar-file-trailer-size-check.patch (2.2K, 7-v5-0005-Remove-nonfunctional-tar-file-trailer-size-check.patch)
download | inline diff:
From cab3e116c1064afcc16e638555c74f7f460be55b Mon Sep 17 00:00:00 2001
From: Andrew Dunstan <[email protected]>
Date: Sun, 22 Mar 2026 06:53:58 -0400
Subject: [PATCH v5 5/5] Remove nonfunctional tar file trailer size check.
The ASTREAMER_ARCHIVE_TRAILER case in astreamer_tar_parser_content()
intended to reject tar files whose trailer exceeded 2 blocks. However,
the check compared 'len' after astreamer_buffer_bytes() had already
consumed all the data and set len to 0, so the pg_fatal() could never
fire.
Moreover, per the POSIX specification for the ustar format, the last
physical block of a tar archive is always full-sized, and "logical
records after the two zero logical records may contain undefined data."
GNU tar, for example, zero-pads its output to a 10kB boundary by
default. So rejecting extra data after the two zero blocks would be
wrong even if the check worked.
Remove the dead check and update the comment to explain why trailing
data is expected and harmless.
Per report from Tom Lane.
Discussion: https://postgr.es/m/[email protected]
---
src/fe_utils/astreamer_tar.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/src/fe_utils/astreamer_tar.c b/src/fe_utils/astreamer_tar.c
index 4b187f0a8c4..f8be5e4ff8a 100644
--- a/src/fe_utils/astreamer_tar.c
+++ b/src/fe_utils/astreamer_tar.c
@@ -236,12 +236,16 @@ astreamer_tar_parser_content(astreamer *streamer, astreamer_member *member,
/*
* We've seen an end-of-archive indicator, so anything more is
- * buffered and sent as part of the archive trailer. But we
- * don't expect more than 2 blocks.
+ * buffered and sent as part of the archive trailer.
+ *
+ * Per POSIX, the last physical block of a tar archive is
+ * always full-sized, so there may be undefined data after the
+ * two zero blocks that mark end-of-archive. GNU tar, for
+ * example, zero-pads to a 10kB boundary by default. We just
+ * buffer whatever we receive and pass it along at finalize
+ * time.
*/
astreamer_buffer_bytes(streamer, &data, &len, len);
- if (len > 2 * TAR_BLOCK_SIZE)
- pg_fatal("tar file trailer exceeds 2 blocks");
return;
default:
--
2.43.0
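To put numbers on the trailer discussion, here is a sketch of the ustar size arithmetic. It assumes 512-byte blocks and GNU tar's default record size of 20 blocks (10240 bytes). The helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_TAR_BLOCK   UINT64_C(512)
#define DEMO_GNU_RECORD  (20 * DEMO_TAR_BLOCK)    /* GNU tar default: 10240 */

/* Round x up to a multiple of unit. */
static uint64_t
demo_round_up(uint64_t x, uint64_t unit)
{
    return (x + unit - 1) / unit * unit;
}

/* Space taken by one member: a header block plus data padded to a block. */
static uint64_t
demo_member_bytes(uint64_t file_size)
{
    return DEMO_TAR_BLOCK + demo_round_up(file_size, DEMO_TAR_BLOCK);
}

/*
 * Bytes after the last member: two zero blocks (end of archive) plus
 * padding that fills out the final record.
 */
static uint64_t
demo_trailer_bytes(uint64_t members_total, uint64_t record_size)
{
    assert(members_total % DEMO_TAR_BLOCK == 0);
    return demo_round_up(members_total + 2 * DEMO_TAR_BLOCK, record_size)
        - members_total;
}
```

Take an archive whose only member is one 16 MB WAL segment. The member occupies 512 + 16777216 = 16777728 bytes. Everything after it (the two zero blocks plus padding to the next 10240-byte boundary) comes to 5632 bytes, i.e. 11 blocks. That is well over the 2 * TAR_BLOCK_SIZE limit the removed check tried to enforce, even though the archive is perfectly valid.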
* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-03-22 21:19 Andrew Dunstan <[email protected]>
parent: Andrew Dunstan <[email protected]>
0 siblings, 1 reply; 29+ messages in thread
From: Andrew Dunstan @ 2026-03-22 21:19 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Michael Paquier <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>
On Sun, Mar 22, 2026 at 2:17 PM Tom Lane <[email protected]> wrote:
> I wrote:
> > ... We can make this function far simpler
> > and more obviously correct if we just accept that we'll read a
> > WAL file completely before spilling it. See my proposed
> > alternative to 0004, attached.
>
> Actually, we can make that better yet by not expecting
> get_archive_wal_entry to clean up after init_archive's
> failure to free all irrelevant hashtable entries.
> Better version attached.
>
>
>
Yeah, this looks good. I know we also still need to do something about
rmtree trying to remove files we haven't closed. But what we have so far in
this set LGTM. If you want to push this I'm good, otherwise I'll look at it
tomorrow or Tuesday.
cheers
andrew
* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-03-24 03:11 Michael Paquier <[email protected]>
parent: Andrew Dunstan <[email protected]>
0 siblings, 1 reply; 29+ messages in thread
From: Michael Paquier @ 2026-03-24 03:11 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Andrew Dunstan <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>
On Sun, Mar 22, 2026 at 11:02:20PM -0400, Tom Lane wrote:
> Proposed patch attached. There might be an argument for using some
> other size than 256K for the other two decompressors, but my
> inclination is to try to make all three use roughly the same block
> size. (See also 66ec01dc4.)
The buildfarm has switched mostly to green, except on this one:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hoatzin&dt=2026-03-23%2006%3A00%3A42
--
Michael
Attachments:
[application/pgp-signature] signature.asc (833B, 2-signature.asc)
* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-03-25 17:28 Andres Freund <[email protected]>
parent: Michael Paquier <[email protected]>
0 siblings, 1 reply; 29+ messages in thread
From: Andres Freund @ 2026-03-25 17:28 UTC (permalink / raw)
To: Michael Paquier <[email protected]>; +Cc: Tom Lane <[email protected]>; Andrew Dunstan <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>
Hi,
On 2026-03-24 12:11:44 +0900, Michael Paquier wrote:
> On Sun, Mar 22, 2026 at 11:02:20PM -0400, Tom Lane wrote:
> > Proposed patch attached. There might be an argument for using some
> > other size than 256K for the other two decompressors, but my
> > inclination is to try to make all three use roughly the same block
> > size. (See also 66ec01dc4.)
>
> The buildfarm has switched mostly to green, except on this one:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hoatzin&dt=2026-03-23%2006%3A00%3A42
I think there are a few more failures. Fairywren regularly fails, including in a
run from today.
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2026-03-25%2003%3A48%3A06
There also are a lot of CI failures. E.g.
https://cirrus-ci.com/task/6153854431002624
https://api.cirrus-ci.com/v1/artifact/task/6153854431002624/testrun/build/testrun/pg_waldump/001_bas...
# Running: pg_waldump --path C:\msys64\tmp\tNfU5IfQ4a/pg_wal.tar.gz --start 0/01806F48 --end 0/03093BD8
[22:46:25.358](3.991s) not ok 160 - runs with path option and start and end locations: exit code 0
[22:46:25.363](0.005s) # Failed test 'runs with path option and start and end locations: exit code 0'
# at C:/cirrus/src/bin/pg_waldump/t/001_basic.pl line 399.
[22:46:25.364](0.001s) ok 161 - runs with path option and start and end locations: no stderr
[22:46:25.365](0.001s) not ok 162 - runs with path option and start and end locations: matches
[22:46:25.365](0.000s) # Failed test 'runs with path option and start and end locations: matches'
# at C:/cirrus/src/bin/pg_waldump/t/001_basic.pl line 399.
[22:46:25.366](0.000s) # ''
# doesn't match '(?^:.)'
At first I suspected that this was due to
commit 1c162c965a1
Author: Fujii Masao <[email protected]>
Date: 2026-03-24 22:33:09 +0900
Report detailed errors from XLogFindNextRecord() failures.
but as far as I can tell there are failures from before that:
https://cirrus-ci.com/task/5501609960013824
which is for 4019f725f5d, preceding 1c162c965a1
and
https://cirrus-ci.com/task/5317196043255808
It does feel, however, like the failure frequency has increased substantially:
https://cirrus-ci.com/github/postgres/postgres/master
Greetings,
Andres Freund
* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-03-28 21:36 Thomas Munro <[email protected]>
parent: Andres Freund <[email protected]>
0 siblings, 3 replies; 29+ messages in thread
From: Thomas Munro @ 2026-03-28 21:36 UTC (permalink / raw)
To: Andres Freund <[email protected]>; +Cc: Michael Paquier <[email protected]>; Tom Lane <[email protected]>; Andrew Dunstan <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>
On Thu, Mar 26, 2026 at 6:28 AM Andres Freund <[email protected]> wrote:
> On 2026-03-24 12:11:44 +0900, Michael Paquier wrote:
> > On Sun, Mar 22, 2026 at 11:02:20PM -0400, Tom Lane wrote:
> > > Proposed patch attached. There might be an argument for using some
> > > other size than 256K for the other two decompressors, but my
> > > inclination is to try to make all three use roughly the same block
> > > size. (See also 66ec01dc4.)
> >
> > The buildfarm has switched mostly to green, except on this one:
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hoatzin&dt=2026-03-23%2006%3A00%3A42
>
> I think there are a few more failures. Fairywren regularly fails, including in a
> run from today.
This fails 100% of the time on my machine, even after e9d72348 and ff84efe4, e.g.:
# Running: pg_waldump --path /tmp/D8WG1Sv2HE/pg_wal.tar --start 0/017A2610 --end 0/02093848
[09:43:29.288](0.148s) not ok 104 - runs with path option and start and end locations: exit code 0
[09:43:29.289](0.001s) # Failed test 'runs with path option and start and end locations: exit code 0'
# at /home/tmunro/projects/postgresql/src/bin/pg_waldump/t/001_basic.pl line 402.
[09:43:29.290](0.001s) not ok 105 - runs with path option and start and end locations: no stderr
[09:43:29.291](0.001s) # Failed test 'runs with path option and start and end locations: no stderr'
# at /home/tmunro/projects/postgresql/src/bin/pg_waldump/t/001_basic.pl line 402.
[09:43:29.291](0.000s) # got: 'pg_waldump: error: could not find WAL "000000010000000000000002" in archive "pg_wal.tar"
# '
I can see that it is wrong about the contents of the tar file:
$ pg_waldump --path _tmp_H_1gv81G1L_pg_wal.tar --start 0/017A2610 --end 0/020934F8 2>&1 | tail -3
rmgr: Hash len (rec/tot): 72/ 72, tx: 720, lsn: 0/01FFC1B8, prev 0/01FFC178, desc: INSERT off 40, blkref #0: rel 1663/5/16397 blk 2, blkref #1: rel 1663/5/16397 blk 0
rmgr: Transaction len (rec/tot): 46/ 46, tx: 720, lsn: 0/01FFC200, prev 0/01FFC1B8, desc: COMMIT 2026-03-29 10:15:24.112967 NZDT
pg_waldump: error: could not find WAL "000000010000000000000002" in archive "_tmp_H_1gv81G1L_pg_wal.tar"
$ tar tvf _tmp_H_1gv81G1L_pg_wal.tar
drwx------ 0 tmunro tmunro 0 Mar 29 10:15 archive_status/
-rw------- 0 tmunro tmunro 0 Mar 29 10:15 archive_status/000000010000000000000002.ready
-rw------- 0 tmunro tmunro 0 Mar 29 10:15 archive_status/000000010000000000000001.ready
drwx------ 0 tmunro tmunro 0 Mar 29 10:08 summaries/
-rw------- 0 tmunro tmunro 16777216 Mar 29 10:15 000000010000000000000002
-rw------- 0 tmunro tmunro 16777216 Mar 29 10:15 000000010000000000000001
-rw------- 0 tmunro tmunro 16777216 Mar 29 10:15 000000010000000000000003
It seems like the place we'd be looking for the file is in
astreamer_tar_header(), so I added in some caveman debugging:
	/*
	 * Parse key fields out of the header.
	 */
	fprintf(stderr, "XXXX [%s] XXXX\n", &buffer[TAR_OFFSET_NAME]);
	strlcpy(member->pathname, &buffer[TAR_OFFSET_NAME], MAXPGPATH);
	if (member->pathname[0] == '\0')
		pg_fatal("tar member has empty name");
Now I see:
XXXX [archive_status/] XXXX
XXXX [archive_status/000000010000000000000002.ready] XXXX
XXXX [archive_status/000000010000000000000001.ready] XXXX
XXXX [summaries/] XXXX
XXXX [PaxHeader/000000010000000000000002] XXXX
XXXX [GNUSparseFile.0/000000010000000000000002] XXXX
XXXX [000000010000000000000001] XXXX
rmgr: XLOG len (rec/tot): 30/ 30, tx: 0, lsn: 0/017A2610, prev 0/017A25F0, desc: NEXTOID 24576
rmgr: Standby len (rec/tot): 42/ 42, tx: 692, lsn: 0/017A2630, prev 0/017A2610, desc: LOCK xid 692 db 5 rel 16384
rmgr: Storage len (rec/tot): 42/ 42, tx: 692, lsn: 0/017A2660, prev 0/017A2630, desc: CREATE base/5/16384
... lots more normal output ...
rmgr: Hash len (rec/tot): 72/ 72, tx: 720, lsn: 0/01FFBED8, prev 0/01FFBE98, desc: INSERT off 97, blkref #0: rel 1663/5/16397 blk 2, blkref #1: rel 1663/5/16397 blk 0
rmgr: Heap len (rec/tot): 575/ 575, tx: 720, lsn: 0/01FFBF20, prev 0/01FFBED8, desc: INSERT off: 12, flags: 0x08, blkref #0: rel 1663/5/16393 blk 52
rmgr: Btree len (rec/tot): 64/ 64, tx: 720, lsn:XXXX [PaxHeader/000000010000000000000003] XXXX
XXXX [GNUSparseFile.0/000000010000000000000003] XXXX
0/01FFC178, prev 0/01FFBF20, desc: INSERT_LEAF off: 344, blkref #0: rel 1663/5/16396 blk 2
rmgr: Hash len (rec/tot): 72/ 72, tx: 720, lsn: 0/01FFC1B8, prev 0/01FFC178, desc: INSERT off 40, blkref #0: rel 1663/5/16397 blk 2, blkref #1: rel 1663/5/16397 blk 0
rmgr: Transaction len (rec/tot): 46/ 46, tx: 720, lsn: 0/01FFC200, prev 0/01FFC1B8, desc: COMMIT 2026-03-29 10:15:24.112967 NZDT
pg_waldump: error: could not find WAL "000000010000000000000002" in archive "_tmp_H_1gv81G1L_pg_wal.tar"
Seems like it already stepped over 000000010000000000000002 earlier?
Could it be a table-of-contents order dependency bug or something like
that?
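For reference while reading the debug output above: in a POSIX ustar header the member name occupies the first 100 bytes, the size is an octal string at offset 124, and the type flag is the byte at offset 156. A member with type flag 'x' is a PAX extended header whose attributes apply to the member that follows it, which is what the PaxHeader/... entries presumably are. Below is a minimal C sketch of decoding those fields, assuming plain ustar headers (it ignores the 155-byte prefix field and GNU base-256 sizes); classify_member(), parse_octal() and member_kind are hypothetical names for illustration, not the astreamer API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* POSIX ustar header layout (one 512-byte block per member). */
#define USTAR_NAME_OFF      0
#define USTAR_NAME_LEN      100
#define USTAR_SIZE_OFF      124
#define USTAR_SIZE_LEN      12
#define USTAR_TYPEFLAG_OFF  156

/* Hypothetical classification of a member's type flag. */
typedef enum
{
	MEMBER_REGULAR,			/* '0' or NUL: ordinary file data */
	MEMBER_DIRECTORY,		/* '5' */
	MEMBER_PAX_HEADER,		/* 'x': PAX attributes for the NEXT member */
	MEMBER_PAX_GLOBAL,		/* 'g': PAX attributes for all later members */
	MEMBER_OTHER
} member_kind;

/* Decode a space/NUL-terminated octal field; returns -1 on a bad digit. */
static int64_t
parse_octal(const char *field, size_t len)
{
	int64_t		value = 0;
	size_t		i = 0;

	while (i < len && field[i] == ' ')
		i++;
	for (; i < len && field[i] != '\0' && field[i] != ' '; i++)
	{
		if (field[i] < '0' || field[i] > '7')
			return -1;
		value = value * 8 + (field[i] - '0');
	}
	return value;
}

/*
 * Copy the member name out of a 512-byte header (it is not NUL-terminated
 * when it uses all 100 bytes), decode the size, and classify the type flag.
 */
static member_kind
classify_member(const char *header, char *name, size_t namesz, int64_t *size)
{
	assert(header != NULL && name != NULL && size != NULL);
	assert(namesz > USTAR_NAME_LEN);

	memcpy(name, header + USTAR_NAME_OFF, USTAR_NAME_LEN);
	name[USTAR_NAME_LEN] = '\0';
	*size = parse_octal(header + USTAR_SIZE_OFF, USTAR_SIZE_LEN);

	switch (header[USTAR_TYPEFLAG_OFF])
	{
		case '0':
		case '\0':
			return MEMBER_REGULAR;
		case '5':
			return MEMBER_DIRECTORY;
		case 'x':
			return MEMBER_PAX_HEADER;
		case 'g':
			return MEMBER_PAX_GLOBAL;
		default:
			return MEMBER_OTHER;
	}
}
```

Applied to the archive above, a reader that only compares the decoded name against a segment name would see PaxHeader/000000010000000000000002 and then GNUSparseFile.0/000000010000000000000002, but never the plain segment name, which would be consistent with the "could not find WAL" error.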
* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-03-28 22:08 Thomas Munro <[email protected]>
parent: Thomas Munro <[email protected]>
From: Thomas Munro @ 2026-03-28 22:08 UTC
To: Tom Lane <[email protected]>; +Cc: Andres Freund <[email protected]>; Michael Paquier <[email protected]>; Andrew Dunstan <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>
On Sun, Mar 29, 2026 at 10:49 AM Tom Lane <[email protected]> wrote:
> Huh. What is your platform exactly? Maybe more to the point,
> what is the tar program you're using?
FreeBSD tar 3.8.2.
> > Seems like it already stepped over 000000010000000000000002 earlier?
> > Could it be a table-of-contents order dependency bug or something like
> > that?
>
> If you look at the TAP script, you'll see that it tries to randomize
> the order of the entries in the tar file (see sub generate_archive).
> So if that's the problem, it shouldn't reproduce 100%, and also we
> should be seeing lots of freckles on the buildfarm. We're not, so
> there must be something off-the-beaten-track about your test
> environment.
Right, I see now. There is something different about
000000010000000000000002 though: it doesn't seem to have a normal
(non-PAX, non-GNU) TOC entry, unlike 000000010000000000000001. Trying
to figure out why...
* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-03-28 22:57 Tomas Vondra <[email protected]>
parent: Thomas Munro <[email protected]>
From: Tomas Vondra @ 2026-03-28 22:57 UTC
To: Tom Lane <[email protected]>; Thomas Munro <[email protected]>; +Cc: Andres Freund <[email protected]>; Michael Paquier <[email protected]>; Andrew Dunstan <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>
On 3/28/26 23:36, Tom Lane wrote:
> I wrote:
>> However ... I do not find any indication in the GNU tar docs
>> that it produces sparse files by default. It looks like you
>> need to say -S/--sparse to make that happen. Maybe you have
>> a version that's been hacked to make that the default?
>
> Bleah. Digging in the man pages at freebsd.org, I read
>
> --read-sparse
> (c, r, u modes only) Read sparse file information from disk.
> This is the reverse of --no-read-sparse and the default behav-
> ior.
>
> It's apparently been there and been default since FreeBSD 13.1.
> This leads one to wonder how come BF member dikkop is managing
> to run this test successfully. I speculate that it's using a
> filesystem type that doesn't do sparse files (cc'ing Vondra
> for confirmation on that).
>
It's running on ufs. But I think the explanation is very simple. We had
a short power outage on Thursday, and the FreeBSD machine failed to boot
properly after the power was restored. IIUC this test is new, right?
I fixed the machine, it'll start running the tests in a couple minutes.
regards
--
Tomas Vondra
* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-03-28 23:15 Thomas Munro <[email protected]>
parent: Thomas Munro <[email protected]>
From: Thomas Munro @ 2026-03-28 23:15 UTC
To: Tom Lane <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; Andrew Dunstan <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>
On Sun, Mar 29, 2026 at 11:37 AM Tom Lane <[email protected]> wrote:
> I wrote:
> > However ... I do not find any indication in the GNU tar docs
> > that it produces sparse files by default. It looks like you
> > need to say -S/--sparse to make that happen. Maybe you have
> > a version that's been hacked to make that the default?
>
> Bleah. Digging in the man pages at freebsd.org, I read
>
> --read-sparse
> (c, r, u modes only) Read sparse file information from disk.
> This is the reverse of --no-read-sparse and the default behav-
> ior.
>
> It's apparently been there and been default since FreeBSD 13.1.
> This leads one to wonder how come BF member dikkop is managing
> to run this test successfully. I speculate that it's using a
> filesystem type that doesn't do sparse files (cc'ing Vondra
> for confirmation on that).
>
> It looks like to make this test stable on modern FreeBSD,
> we need to see if tar accepts --no-read-sparse and use that
> switch if so.
Yeah. Here's my attempt at perl.
I think your Mac probably has a similar tar program BTW... but apfs
probably doesn't go around making holes visible to lseek()
automatically or at least as eagerly as my ZFS system.
Attachments:
[text/x-patch] 0001-Fix-pg_waldump-test-for-libarchive-tar.patch (1.8K, 2-0001-Fix-pg_waldump-test-for-libarchive-tar.patch)
From b1a9131d072dadfe9c5666e75a932ba70b8d2f99 Mon Sep 17 00:00:00 2001
From: Thomas Munro <[email protected]>
Date: Sun, 29 Mar 2026 12:03:42 +1300
Subject: [PATCH] Fix pg_waldump test for libarchive tar.
libarchive tar (the one shipped on macOS and *BSD systems) might, by
default, archive sparse files using a non-standard sparse encoding with
GNU extensions (unlike GNU tar itself). pg_waldump can't read them.
Suppress that, if $TAR understands --no-read-sparse.
Discussion: https://postgr.es/m/1624716.1774736283%40sss.pgh.pa.us
---
src/bin/pg_waldump/t/001_basic.pl | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/src/bin/pg_waldump/t/001_basic.pl b/src/bin/pg_waldump/t/001_basic.pl
index 8bb8fa225f6..d911296bb66 100644
--- a/src/bin/pg_waldump/t/001_basic.pl
+++ b/src/bin/pg_waldump/t/001_basic.pl
@@ -11,6 +11,14 @@ use Test::More;
use List::Util qw(shuffle);
my $tar = $ENV{TAR};
+my @TAR_C_FLAGS;
+
+# libarchive tar (as found on *BSD and macOS) might create sparse files by
+# default, and we can't read them
+if (system("$tar --no-read-sparse -c - /dev/null > /dev/null") == 0)
+{
+ push(@TAR_C_FLAGS, "--no-read-sparse");
+}
program_help_ok('pg_waldump');
program_version_ok('pg_waldump');
@@ -331,7 +339,6 @@ sub test_pg_waldump
sub generate_archive
{
my ($archive, $directory, $compression_flags) = @_;
-
my @files;
opendir my $dh, $directory or die "opendir: $!";
while (my $entry = readdir $dh) {
@@ -346,7 +353,7 @@ sub generate_archive
# move into the WAL directory before archiving files
my $cwd = getcwd;
chdir($directory) || die "chdir: $!";
- command_ok([$tar, $compression_flags, $archive, @files]);
+ command_ok([$tar, @TAR_C_FLAGS, $compression_flags, $archive, @files]);
chdir($cwd) || die "chdir: $!";
}
--
2.52.0
* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-03-29 13:33 Tomas Vondra <[email protected]>
parent: Tomas Vondra <[email protected]>
From: Tomas Vondra @ 2026-03-29 13:33 UTC
To: Tom Lane <[email protected]>; +Cc: Thomas Munro <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; Andrew Dunstan <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>
On 3/29/26 00:12, Tom Lane wrote:
> Tomas Vondra <[email protected]> writes:
>> On 3/28/26 23:36, Tom Lane wrote:
>>> It's apparently been there and been default since FreeBSD 13.1.
>>> This leads one to wonder how come BF member dikkop is managing
>>> to run this test successfully. I speculate that it's using a
>>> filesystem type that doesn't do sparse files (cc'ing Vondra
>>> for confirmation on that).
>
>> It's running on ufs. But I think the explanation is very simple. We had
>> a short power outage on Thursday, and the FreeBSD machine failed to boot
>> properly after the power was restored. IIUC this test is new, right?
>
> Not that new, it dates to b15c15139, about a week ago.
>
> I've reproduced Thomas' failure on a local FreeBSD 15.0 image
> using zfs, and confirmed that this cowboy hack fixes it:
>
Interesting. Then I guess it has to be due to some difference in ufs vs.
zfs, when handling sparse files. It might be useful to add a bit more
variation here, and switch some of the animals to non-default
filesystems (not just the FreeBSD ones, of which we seem to have only two
that run reasonably often). I'd bet most of the Linux systems run on
ext4/xfs, few on btrfs/zfs.
regards
--
Tomas Vondra
* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-03-29 22:11 Thomas Munro <[email protected]>
parent: Tomas Vondra <[email protected]>
From: Thomas Munro @ 2026-03-29 22:11 UTC
To: Tomas Vondra <[email protected]>; +Cc: Tom Lane <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; Andrew Dunstan <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>
On Mon, Mar 30, 2026 at 2:33 AM Tomas Vondra <[email protected]> wrote:
> On 3/29/26 00:12, Tom Lane wrote:
> > I've reproduced Thomas' failure on a local FreeBSD 15.0 image
> > using zfs, and confirmed that this cowboy hack fixes it:
> >
>
> Interesting. Then I guess it has to be due to some difference in ufs vs.
> zfs, when handling sparse files. It might be useful to add a bit more
> variation here, and switch some of the animals to non-default
> filesystems (not just the FreeBSD ones, of which we seem to have only two
> that run reasonably often). I'd bet most of the Linux systems run on
> ext4/xfs, few on btrfs/zfs.
UFS does have sparse files (its ancestor invented them some time
around (time_t) 0), it just doesn't make them unless you tell it to.
PostgreSQL only does that if you set wal_init_zero=false.
ZFS is different because it creates holes automagically when you write
zeroes, at least if compression is enabled so it has to scan all your
bytes anyway.
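Whether a given file system exposes such holes can be checked directly with lseek(2)'s SEEK_DATA and SEEK_HOLE whence values, the same kind of probing visible in the GNU tar strace further down. Below is a minimal C sketch, assuming a platform that provides SEEK_DATA/SEEK_HOLE (FreeBSD, illumos, or Linux with _GNU_SOURCE); count_data_bytes() is a hypothetical helper, not part of PostgreSQL. A file system that does not track holes simply reports the whole file as one data extent.

```c
#define _GNU_SOURCE				/* exposes SEEK_DATA/SEEK_HOLE on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Walk "path" with lseek(SEEK_DATA)/lseek(SEEK_HOLE), print each data extent,
 * and store the total number of data bytes in *data_bytes.  Returns 0 on
 * success, -1 on error.
 */
static int
count_data_bytes(const char *path, off_t *data_bytes)
{
	int			fd = open(path, O_RDONLY);
	off_t		end;
	off_t		pos = 0;
	int			rc = 0;

	assert(data_bytes != NULL);
	if (fd < 0)
		return -1;
	*data_bytes = 0;

	end = lseek(fd, 0, SEEK_END);
	if (end < 0)
		rc = -1;

	while (rc == 0 && pos < end)
	{
		off_t		data = lseek(fd, pos, SEEK_DATA);
		off_t		hole;

		if (data < 0)
		{
			/* ENXIO means there is only a hole from pos to end of file */
			if (errno != ENXIO)
				rc = -1;
			break;
		}
		/* end of file counts as an implicit hole */
		hole = lseek(fd, data, SEEK_HOLE);
		if (hole < 0)
		{
			rc = -1;
			break;
		}
		printf("data [%lld, %lld)\n", (long long) data, (long long) hole);
		*data_bytes += hole - data;
		pos = hole;
	}
	close(fd);
	return rc;
}
```

Pointed at a 16MB WAL segment on ZFS with compression enabled, this would be expected to report fewer data bytes than the file size wherever zero-filled pages became holes; on UFS with the default wal_init_zero it should report one extent covering the whole file (not tested here).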
I was curious to know if BTRFS does that too, or hides
zero-compression at some lower invisible level:
$ echo "hello" > 1MB-sparse.dat
$ truncate -s 512KB 1MB-sparse.dat
$ echo "world" >> 1MB-sparse.dat
$ truncate -s 1MB 1MB-sparse.dat
$ ls -l 1MB-sparse.dat
-rw-rw-r-- 1 tmunro tmunro 1000000 Mar 30 10:11 1MB-sparse.dat
$ du -hs 1MB-sparse.dat
8.0K 1MB-sparse.dat
$ strace tar -S -cf foo.tar 1MB-sparse.dat 2>&1 | grep seek
lseek(4, 0, SEEK_DATA) = 0
lseek(4, 0, SEEK_HOLE) = 4096
lseek(4, 4096, SEEK_DATA) = 512000
lseek(4, 512000, SEEK_HOLE) = 516096
lseek(4, 516096, SEEK_DATA) = -1 ENXIO (No such device or address)
... so that's a yes, lseek sees holes that we didn't ask it to make,
just like on ZFS, but the rest of this trace of GNU tar -S -cf is
interesting:
lseek(5, 0, SEEK_SET) = 0
lseek(5, 0, SEEK_SET) = 0
lseek(4, 0, SEEK_SET) = 0
lseek(4, 512000, SEEK_SET) = 512000
lseek(4, 1000000, SEEK_SET) = 1000000
It didn't write out PAX format! Instead it replicated the holes into
the tar file itself with SEEK_SET.
$ strings foo.tar | grep Sparse
You have to add --format=posix to enable the GNU behaviour that BSD
tar is emulating by default:
$ tar --format=posix -S -cf foo.tar 1MB-sparse.dat
$ strings foo.tar | grep Sparse
./GNUSparseFile.4190/1MB-sparse.dat
I expected GNU tar to be forced to do that if writing to non-seekable
output, eg "tar -S -c 1MB-sparse.dat | cat > foo.tar", but somehow it
manages to write out only ~10KB of plain ustar format that it is able
to restore to the full 1MB apparent size using some other trick, but
... ENOTIME, I dunno how it's doing that. Might be interesting to see
if pg_waldump can read it though, 'cause the bytes aren't all there.
BTW I confirmed that Apple tar does have -S by default too, it's just
that APFS doesn't make holes magically, so this test would presumably
have broken on a Mac if wal_init_zero had been forced to zero (not
tested).
Anyway, given the defaults, GNU tar + ZFS/BTRFS users must be pretty
unlikely to hit this in the wild, and the symptom is a confusing error
in a maintenance tool, not corruption, so I don't think this is a big
deal. I might still try teaching the astreamer code to understand PAX
1.0 when it sees it in the next cycle though, for the benefit of
FreeBSD users. A quick and dirty version could probably just unmangle
the name and skip the first block of data, since any valid WAL file
will not begin with a hole and valid WAL data will end at the first
hole and fail our verification, but of course a real implementation
should read the map properly[1]...
[1] https://www.gnu.org/software/tar/manual/html_node/PAX-1.html
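Per the PAX 1.0 description in [1], a sparse member's data begins with a map written as newline-terminated decimal numbers (the entry count, then offset/length pairs), padded with NULs to a 512-byte boundary, followed by the packed data runs. Below is a minimal C sketch of what reading that map might look like, assuming the whole map is already in the buffer passed in. parse_sparse_map(), unmangle_sparse_name() and sparse_run are hypothetical names, not existing astreamer code, and the name handling is only the quick-and-dirty unmangling described above; a real implementation should take the name from GNU.sparse.name in the preceding PAX header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TAR_BLOCK_SIZE 512

/* One stored data run of a sparse member (hypothetical type). */
typedef struct
{
	uint64_t	offset;			/* position of the run in the real file */
	uint64_t	length;			/* number of data bytes stored for it */
} sparse_run;

/* Parse one '\n'-terminated decimal; returns bytes consumed, 0 on error. */
static size_t
read_decimal(const char *p, size_t avail, uint64_t *out)
{
	size_t		i = 0;
	uint64_t	v = 0;

	while (i < avail && p[i] >= '0' && p[i] <= '9')
	{
		v = v * 10 + (uint64_t) (p[i] - '0');
		i++;
	}
	if (i == 0 || i >= avail || p[i] != '\n')
		return 0;
	*out = v;
	return i + 1;
}

/*
 * Parse the PAX 1.0 sparse map at the start of a member's data.  Returns the
 * number of runs and sets *data_start to the block-aligned offset where the
 * packed data begins, or returns -1 on a malformed or oversized map.
 */
static int
parse_sparse_map(const char *buf, size_t len, sparse_run *runs, int max_runs,
				 size_t *data_start)
{
	size_t		pos;
	size_t		n;
	uint64_t	count;

	assert(buf != NULL && runs != NULL && data_start != NULL);
	n = read_decimal(buf, len, &count);
	if (n == 0 || max_runs < 0 || count > (uint64_t) max_runs)
		return -1;
	pos = n;
	for (uint64_t i = 0; i < count; i++)
	{
		if ((n = read_decimal(buf + pos, len - pos, &runs[i].offset)) == 0)
			return -1;
		pos += n;
		if ((n = read_decimal(buf + pos, len - pos, &runs[i].length)) == 0)
			return -1;
		pos += n;
	}
	/* the map is NUL-padded up to the next tar block boundary */
	*data_start = (pos + TAR_BLOCK_SIZE - 1) / TAR_BLOCK_SIZE * TAR_BLOCK_SIZE;
	return (int) count;
}

/* Quick-and-dirty name recovery: drop a "GNUSparseFile.<n>/" component. */
static void
unmangle_sparse_name(const char *in, char *out, size_t outsz)
{
	const char *p = strstr(in, "GNUSparseFile.");
	const char *slash = (p != NULL) ? strchr(p, '/') : NULL;

	if (p == NULL || slash == NULL || (p != in && p[-1] != '/'))
		snprintf(out, outsz, "%s", in);
	else
		snprintf(out, outsz, "%.*s%s", (int) (p - in), in, slash + 1);
}
```

To rebuild a segment from this, one would start from a zero-filled buffer of GNU.sparse.realsize bytes and copy each run's length bytes, in order, from the packed data after *data_start to that run's offset. Note that for a WAL segment whose first run starts at offset 0, "skip the first block" matches *data_start only while the map fits in 512 bytes.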
* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-03-29 22:20 Thomas Munro <[email protected]>
parent: Thomas Munro <[email protected]>
From: Thomas Munro @ 2026-03-29 22:20 UTC
To: Tomas Vondra <[email protected]>; +Cc: Tom Lane <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; Andrew Dunstan <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>
On Mon, Mar 30, 2026 at 11:11 AM Thomas Munro <[email protected]> wrote:
> ... so that's a yes, lseek sees holes that we didn't ask it to make,
Oops, sorry, I wrote that email too fast and got my examples mixed up,
BTRFS actually *doesn't* do that automatically; that was of course a
trace showing a file with explicitly made holes. So this is probably
a ZFS-only issue unless you're using wal_init_zero=0, and then any
file system could result in PAX-sparse-format tarballs, but even then
only if you use non-default switches that in practice no one will use
with GNU tar, or if you use BSD tar. So in practice this is a
FreeBSD-only issue.
Thread overview: 29+ messages
2026-02-10 09:36 Re: pg_waldump: support decoding of WAL inside tarfile Amul Sul <[email protected]>
2026-02-18 06:58 ` Amul Sul <[email protected]>
2026-03-02 13:00 ` Amul Sul <[email protected]>
2026-03-04 00:37 ` Andrew Dunstan <[email protected]>
2026-03-04 12:52 ` Amul Sul <[email protected]>
2026-03-04 21:50 ` Andrew Dunstan <[email protected]>
2026-03-09 12:26 ` Amul Sul <[email protected]>
2026-03-18 11:45 ` Amul Sul <[email protected]>
2026-03-18 15:16 ` Amul Sul <[email protected]>
2026-03-19 10:20 ` Amul Sul <[email protected]>
2026-03-19 20:48 ` Zsolt Parragi <[email protected]>
2026-03-20 11:31 ` Amul Sul <[email protected]>
2026-03-20 13:26 ` Amul Sul <[email protected]>
2026-03-20 19:33 ` Andrew Dunstan <[email protected]>
2026-03-21 06:19 ` Amul Sul <[email protected]>
2026-03-21 06:23 ` Michael Paquier <[email protected]>
2026-03-21 15:35 ` Amul Sul <[email protected]>
2026-03-21 17:26 ` Amul Sul <[email protected]>
2026-03-22 11:24 ` Andrew Dunstan <[email protected]>
2026-03-22 21:19 ` Andrew Dunstan <[email protected]>
2026-03-24 03:11 ` Michael Paquier <[email protected]>
2026-03-25 17:28 ` Andres Freund <[email protected]>
2026-03-28 21:36 ` Thomas Munro <[email protected]>
2026-03-28 22:08 ` Thomas Munro <[email protected]>
2026-03-28 22:57 ` Tomas Vondra <[email protected]>
2026-03-29 13:33 ` Tomas Vondra <[email protected]>
2026-03-29 22:11 ` Thomas Munro <[email protected]>
2026-03-29 22:20 ` Thomas Munro <[email protected]>
2026-03-28 23:15 ` Thomas Munro <[email protected]>