Re: pg_waldump: support decoding of WAL inside tarfile

public inbox for [email protected]  
help / color / mirror / Atom feed

Re: pg_waldump: support decoding of WAL inside tarfile
7+ messages / 2 participants
[nested] [flat]

* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-01-02 12:29  Amul Sul <[email protected]>
  0 siblings, 1 reply; 7+ messages in thread

From: Amul Sul @ 2026-01-02 12:29 UTC (permalink / raw)
  To: Chao Li <[email protected]>; +Cc: Jakub Wartak <[email protected]>; Robert Haas <[email protected]>; PostgreSQL Hackers <[email protected]>

Attached is the rebased version, includes some code comment improvements.

Regards,
Amul


Attachments:

  [application/octet-stream] v10-0001-Refactor-pg_waldump-Move-some-declarations-to-ne.patch (2.4K, 2-v10-0001-Refactor-pg_waldump-Move-some-declarations-to-ne.patch)
  download | inline diff:
From 6e0f330b88eb027c36b1272b0bb7eccb1c50e39a Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Tue, 24 Jun 2025 11:33:20 +0530
Subject: [PATCH v10 1/8] Refactor: pg_waldump: Move some declarations to new
 pg_waldump.h

This change prepares for a second source file in this directory to
support reading WAL from tar files. Common structures, declarations,
and functions are being exported through this include file so
they can be used in both files.
---
 src/bin/pg_waldump/pg_waldump.c | 12 +++---------
 src/bin/pg_waldump/pg_waldump.h | 27 +++++++++++++++++++++++++++
 2 files changed, 30 insertions(+), 9 deletions(-)
 create mode 100644 src/bin/pg_waldump/pg_waldump.h

diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index aae655966ef..0a0e9b7caaf 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -29,6 +29,7 @@
 #include "common/logging.h"
 #include "common/relpath.h"
 #include "getopt_long.h"
+#include "pg_waldump.h"
 #include "rmgrdesc.h"
 #include "storage/bufpage.h"
 
@@ -37,21 +38,14 @@
  * give a thought about doing the same in pg_walinspect contrib module as well.
  */
 
+int			WalSegSz;
+
 static const char *progname;
 
-static int	WalSegSz;
 static volatile sig_atomic_t time_to_stop = false;
 
 static const RelFileLocator emptyRelFileLocator = {0, 0, 0};
 
-typedef struct XLogDumpPrivate
-{
-	TimeLineID	timeline;
-	XLogRecPtr	startptr;
-	XLogRecPtr	endptr;
-	bool		endptr_reached;
-} XLogDumpPrivate;
-
 typedef struct XLogDumpConfig
 {
 	/* display options */
diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h
new file mode 100644
index 00000000000..926d529f9d6
--- /dev/null
+++ b/src/bin/pg_waldump/pg_waldump.h
@@ -0,0 +1,27 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_waldump.h - decode and display WAL
+ *
+ * Copyright (c) 2025, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		  src/bin/pg_waldump/pg_waldump.h
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_WALDUMP_H
+#define PG_WALDUMP_H
+
+#include "access/xlogdefs.h"
+
+extern int	WalSegSz;
+
+/* Contains the necessary information to drive WAL decoding */
+typedef struct XLogDumpPrivate
+{
+	TimeLineID	timeline;
+	XLogRecPtr	startptr;
+	XLogRecPtr	endptr;
+	bool		endptr_reached;
+} XLogDumpPrivate;
+
+#endif							/* end of PG_WALDUMP_H */
-- 
2.47.1



  [application/octet-stream] v10-0002-Refactor-pg_waldump-Separate-logic-used-to-calcu.patch (2.5K, 3-v10-0002-Refactor-pg_waldump-Separate-logic-used-to-calcu.patch)
  download | inline diff:
From cd4adf2105028a44527fe3fbbea65153de8d8a4c Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Thu, 26 Jun 2025 11:42:53 +0530
Subject: [PATCH v10 2/8] Refactor: pg_waldump: Separate logic used to
 calculate the required read size.

This refactoring prepares the codebase for an upcoming patch that will
support reading WAL from tar files. The logic for calculating the
required read size has been updated to handle both normal WAL files
and WAL files located inside a tar archive.
---
 src/bin/pg_waldump/pg_waldump.c | 46 +++++++++++++++++++++++----------
 1 file changed, 33 insertions(+), 13 deletions(-)

diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 0a0e9b7caaf..a33fc28b9d5 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -327,6 +327,35 @@ identify_target_directory(char *directory, char *fname)
 	return NULL;				/* not reached */
 }
 
+/*
+ * Returns the size in bytes of the data to be read. Returns -1 if the end
+ * point has already been reached.
+ */
+static inline int
+required_read_len(XLogDumpPrivate *private, XLogRecPtr targetPagePtr,
+				  int reqLen)
+{
+	int			count = XLOG_BLCKSZ;
+
+	if (unlikely(private->endptr_reached))
+		return -1;
+
+	if (XLogRecPtrIsValid(private->endptr))
+	{
+		if (targetPagePtr + XLOG_BLCKSZ <= private->endptr)
+			count = XLOG_BLCKSZ;
+		else if (targetPagePtr + reqLen <= private->endptr)
+			count = private->endptr - targetPagePtr;
+		else
+		{
+			private->endptr_reached = true;
+			return -1;
+		}
+	}
+
+	return count;
+}
+
 /* pg_waldump's XLogReaderRoutine->segment_open callback */
 static void
 WALDumpOpenSegment(XLogReaderState *state, XLogSegNo nextSegNo,
@@ -384,21 +413,12 @@ WALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
 				XLogRecPtr targetPtr, char *readBuff)
 {
 	XLogDumpPrivate *private = state->private_data;
-	int			count = XLOG_BLCKSZ;
+	int			count = required_read_len(private, targetPagePtr, reqLen);
 	WALReadError errinfo;
 
-	if (XLogRecPtrIsValid(private->endptr))
-	{
-		if (targetPagePtr + XLOG_BLCKSZ <= private->endptr)
-			count = XLOG_BLCKSZ;
-		else if (targetPagePtr + reqLen <= private->endptr)
-			count = private->endptr - targetPagePtr;
-		else
-		{
-			private->endptr_reached = true;
-			return -1;
-		}
-	}
+	/* Bail out if the count to be read is not valid */
+	if (count < 0)
+		return -1;
 
 	if (!WALRead(state, readBuff, targetPagePtr, count, private->timeline,
 				 &errinfo))
-- 
2.47.1



  [application/octet-stream] v10-0003-Refactor-pg_waldump-Restructure-TAP-tests.patch (5.6K, 4-v10-0003-Refactor-pg_waldump-Restructure-TAP-tests.patch)
  download | inline diff:
From f911d0ff34bc2c2e12fb0849785d2b5864c73594 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Tue, 25 Nov 2025 16:12:11 +0530
Subject: [PATCH v10 3/8] Refactor: pg_waldump: Restructure TAP tests.

Restructured some tests to run inside a loop, facilitating their
re-execution for decoding WAL from tar archives.

== NOTE ==
  This is not intended to be committed separately. It can be merged
  with the next patch, which is the main patch implementing this
  feature.
---
 src/bin/pg_waldump/t/001_basic.pl | 123 ++++++++++++++++--------------
 1 file changed, 67 insertions(+), 56 deletions(-)

diff --git a/src/bin/pg_waldump/t/001_basic.pl b/src/bin/pg_waldump/t/001_basic.pl
index 5db5d20136f..3288fadcf48 100644
--- a/src/bin/pg_waldump/t/001_basic.pl
+++ b/src/bin/pg_waldump/t/001_basic.pl
@@ -198,28 +198,6 @@ command_like(
 	],
 	qr/./,
 	'runs with start and end segment specified');
-command_fails_like(
-	[ 'pg_waldump', '--path' => $node->data_dir ],
-	qr/error: no start WAL location given/,
-	'path option requires start location');
-command_like(
-	[
-		'pg_waldump',
-		'--path' => $node->data_dir,
-		'--start' => $start_lsn,
-		'--end' => $end_lsn,
-	],
-	qr/./,
-	'runs with path option and start and end locations');
-command_fails_like(
-	[
-		'pg_waldump',
-		'--path' => $node->data_dir,
-		'--start' => $start_lsn,
-	],
-	qr/error: error in WAL record at/,
-	'falling off the end of the WAL results in an error');
-
 command_like(
 	[
 		'pg_waldump', '--quiet',
@@ -227,15 +205,6 @@ command_like(
 	],
 	qr/^$/,
 	'no output with --quiet option');
-command_fails_like(
-	[
-		'pg_waldump', '--quiet',
-		'--path' => $node->data_dir,
-		'--start' => $start_lsn
-	],
-	qr/error: error in WAL record at/,
-	'errors are shown with --quiet');
-
 
 # Test for: Display a message that we're skipping data if `from`
 # wasn't a pointer to the start of a record.
@@ -272,7 +241,6 @@ sub test_pg_waldump
 
 	my $result = IPC::Run::run [
 		'pg_waldump',
-		'--path' => $node->data_dir,
 		'--start' => $start_lsn,
 		'--end' => $end_lsn,
 		@opts
@@ -288,38 +256,81 @@ sub test_pg_waldump
 
 my @lines;
 
-@lines = test_pg_waldump;
-is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
+my @scenarios = (
+	{
+		'path' => $node->data_dir
+	});
 
-@lines = test_pg_waldump('--limit' => 6);
-is(@lines, 6, 'limit option observed');
+for my $scenario (@scenarios)
+{
+	my $path = $scenario->{'path'};
 
-@lines = test_pg_waldump('--fullpage');
-is(grep(!/^rmgr:.*\bFPW\b/, @lines), 0, 'all output lines are FPW');
+	SKIP:
+	{
+		command_fails_like(
+			[ 'pg_waldump', '--path' => $path ],
+			qr/error: no start WAL location given/,
+			'path option requires start location');
+		command_like(
+			[
+				'pg_waldump',
+				'--path' => $path,
+				'--start' => $start_lsn,
+				'--end' => $end_lsn,
+			],
+			qr/./,
+			'runs with path option and start and end locations');
+		command_fails_like(
+			[
+				'pg_waldump',
+				'--path' => $path,
+				'--start' => $start_lsn,
+			],
+			qr/error: error in WAL record at/,
+			'falling off the end of the WAL results in an error');
 
-@lines = test_pg_waldump('--stats');
-like($lines[0], qr/WAL statistics/, "statistics on stdout");
-is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+		command_fails_like(
+			[
+				'pg_waldump', '--quiet',
+				'--path' => $path,
+				'--start' => $start_lsn
+			],
+			qr/error: error in WAL record at/,
+			'errors are shown with --quiet');
 
-@lines = test_pg_waldump('--stats=record');
-like($lines[0], qr/WAL statistics/, "statistics on stdout");
-is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
+		@lines = test_pg_waldump('--path' => $path);
+		is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
 
-@lines = test_pg_waldump('--rmgr' => 'Btree');
-is(grep(!/^rmgr: Btree/, @lines), 0, 'only Btree lines');
+		@lines = test_pg_waldump('--path' => $path, '--limit' => 6);
+		is(@lines, 6, 'limit option observed');
 
-@lines = test_pg_waldump('--fork' => 'init');
-is(grep(!/fork init/, @lines), 0, 'only init fork lines');
+		@lines = test_pg_waldump('--path' => $path, '--fullpage');
+		is(grep(!/^rmgr:.*\bFPW\b/, @lines), 0, 'all output lines are FPW');
 
-@lines = test_pg_waldump(
-	'--relation' => "$default_ts_oid/$postgres_db_oid/$rel_t1_oid");
-is(grep(!/rel $default_ts_oid\/$postgres_db_oid\/$rel_t1_oid/, @lines),
-	0, 'only lines for selected relation');
+		@lines = test_pg_waldump('--path' => $path, '--stats');
+		like($lines[0], qr/WAL statistics/, "statistics on stdout");
+		is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
 
-@lines = test_pg_waldump(
-	'--relation' => "$default_ts_oid/$postgres_db_oid/$rel_i1a_oid",
-	'--block' => 1);
-is(grep(!/\bblk 1\b/, @lines), 0, 'only lines for selected block');
+		@lines = test_pg_waldump('--path' => $path, '--stats=record');
+		like($lines[0], qr/WAL statistics/, "statistics on stdout");
+		is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
 
+		@lines = test_pg_waldump('--path' => $path, '--rmgr' => 'Btree');
+		is(grep(!/^rmgr: Btree/, @lines), 0, 'only Btree lines');
+
+		@lines = test_pg_waldump('--path' => $path, '--fork' => 'init');
+		is(grep(!/fork init/, @lines), 0, 'only init fork lines');
+
+		@lines = test_pg_waldump('--path' => $path,
+			'--relation' => "$default_ts_oid/$postgres_db_oid/$rel_t1_oid");
+		is(grep(!/rel $default_ts_oid\/$postgres_db_oid\/$rel_t1_oid/, @lines),
+			0, 'only lines for selected relation');
+
+		@lines = test_pg_waldump('--path' => $path,
+			'--relation' => "$default_ts_oid/$postgres_db_oid/$rel_i1a_oid",
+			'--block' => 1);
+		is(grep(!/\bblk 1\b/, @lines), 0, 'only lines for selected block');
+	}
+}
 
 done_testing();
-- 
2.47.1



  [application/octet-stream] v10-0004-pg_waldump-Add-support-for-archived-WAL-decoding.patch (37.3K, 5-v10-0004-pg_waldump-Add-support-for-archived-WAL-decoding.patch)
  download | inline diff:
From b8fba73c2abf78fd6972ac077b3d22332a4d477b Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Fri, 2 Jan 2026 17:07:34 +0530
Subject: [PATCH v10 4/8] pg_waldump: Add support for archived WAL decoding.

pg_waldump can now accept the path to a tar archive containing WAL
files and decode them. This feature was added primarily for
pg_verifybackup, which previously disabled WAL parsing for
tar-formatted backups.

Note that this patch requires that the WAL files within the archive be
in sequential order; an error will be reported otherwise. The next
patch is planned to remove this restriction.
---
 doc/src/sgml/ref/pg_waldump.sgml     |   8 +-
 src/bin/pg_waldump/Makefile          |   7 +-
 src/bin/pg_waldump/archive_waldump.c | 594 +++++++++++++++++++++++++++
 src/bin/pg_waldump/meson.build       |   4 +-
 src/bin/pg_waldump/pg_waldump.c      | 218 +++++++---
 src/bin/pg_waldump/pg_waldump.h      |  34 ++
 src/bin/pg_waldump/t/001_basic.pl    |  84 +++-
 src/tools/pgindent/typedefs.list     |   3 +
 8 files changed, 876 insertions(+), 76 deletions(-)
 create mode 100644 src/bin/pg_waldump/archive_waldump.c

diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index ce23add5577..d004bb0f67e 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -141,13 +141,17 @@ PostgreSQL documentation
       <term><option>--path=<replaceable>path</replaceable></option></term>
       <listitem>
        <para>
-        Specifies a directory to search for WAL segment files or a
-        directory with a <literal>pg_wal</literal> subdirectory that
+        Specifies a tar archive or a directory to search for WAL segment files
+        or a directory with a <literal>pg_wal</literal> subdirectory that
         contains such files.  The default is to search in the current
         directory, the <literal>pg_wal</literal> subdirectory of the
         current directory, and the <literal>pg_wal</literal> subdirectory
         of <envar>PGDATA</envar>.
        </para>
+       <para>
+        If a tar archive is provided, its WAL segment files must be in
+        sequential order; otherwise, an error will be reported.
+       </para>
       </listitem>
      </varlistentry>
 
diff --git a/src/bin/pg_waldump/Makefile b/src/bin/pg_waldump/Makefile
index 4c1ee649501..aabb87566a2 100644
--- a/src/bin/pg_waldump/Makefile
+++ b/src/bin/pg_waldump/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_waldump - decode and display WAL"
 PGAPPICON=win32
 
+# make these available to TAP test scripts
+export TAR
+
 subdir = src/bin/pg_waldump
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
@@ -10,13 +13,15 @@ include $(top_builddir)/src/Makefile.global
 OBJS = \
 	$(RMGRDESCOBJS) \
 	$(WIN32RES) \
+	archive_waldump.o \
 	compat.o \
 	pg_waldump.o \
 	rmgrdesc.o \
 	xlogreader.o \
 	xlogstats.o
 
-override CPPFLAGS := -DFRONTEND $(CPPFLAGS)
+override CPPFLAGS := -DFRONTEND -I$(libpq_srcdir) $(CPPFLAGS)
+LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils
 
 RMGRDESCSOURCES = $(sort $(notdir $(wildcard $(top_srcdir)/src/backend/access/rmgrdesc/*desc*.c)))
 RMGRDESCOBJS = $(patsubst %.c,%.o,$(RMGRDESCSOURCES))
diff --git a/src/bin/pg_waldump/archive_waldump.c b/src/bin/pg_waldump/archive_waldump.c
new file mode 100644
index 00000000000..6ba067163d5
--- /dev/null
+++ b/src/bin/pg_waldump/archive_waldump.c
@@ -0,0 +1,594 @@
+/*-------------------------------------------------------------------------
+ *
+ * archive_waldump.c
+ *		A generic facility for reading WAL data from tar archives via archive
+ *		streamer.
+ *
+ * Portions Copyright (c) 2026, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		src/bin/pg_waldump/archive_waldump.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include <unistd.h>
+
+#include "access/xlog_internal.h"
+#include "common/hashfn.h"
+#include "common/logging.h"
+#include "fe_utils/simple_list.h"
+#include "pg_waldump.h"
+
+/*
+ * How many bytes should we try to read from a file at once?
+ */
+#define READ_CHUNK_SIZE				(128 * 1024)
+
+/* Hash entry structure for holding WAL segment data read from the archive */
+typedef struct ArchivedWALEntry
+{
+	uint32		status;			/* hash status */
+	XLogSegNo	segno;			/* hash key: WAL segment number */
+	TimeLineID	timeline;		/* timeline of this wal file */
+
+	StringInfoData buf;
+	bool		tmpseg_exists;	/* spill file exists? */
+
+	int			total_read;		/* total read of archived WAL segment */
+} ArchivedWALEntry;
+
+#define SH_PREFIX				ArchivedWAL
+#define SH_ELEMENT_TYPE			ArchivedWALEntry
+#define SH_KEY_TYPE				XLogSegNo
+#define SH_KEY					segno
+#define SH_HASH_KEY(tb, key)	murmurhash64((uint64) key)
+#define SH_EQUAL(tb, a, b)		(a == b)
+#define SH_GET_HASH(tb, a)		a->hash
+#define SH_SCOPE				static inline
+#define SH_RAW_ALLOCATOR		pg_malloc0
+#define SH_DECLARE
+#define SH_DEFINE
+#include "lib/simplehash.h"
+
+static ArchivedWAL_hash *ArchivedWAL_HTAB = NULL;
+
+typedef struct astreamer_waldump
+{
+	astreamer	base;
+	XLogDumpPrivate *privateInfo;
+} astreamer_waldump;
+
+static int	read_archive_file(XLogDumpPrivate *privateInfo, Size count);
+static ArchivedWALEntry *get_archive_wal_entry(XLogSegNo segno,
+											   XLogDumpPrivate *privateInfo);
+
+static astreamer *astreamer_waldump_new(XLogDumpPrivate *privateInfo);
+static void astreamer_waldump_content(astreamer *streamer,
+									  astreamer_member *member,
+									  const char *data, int len,
+									  astreamer_archive_context context);
+static void astreamer_waldump_finalize(astreamer *streamer);
+static void astreamer_waldump_free(astreamer *streamer);
+
+static bool member_is_wal_file(astreamer_waldump *mystreamer,
+							   astreamer_member *member,
+							   XLogSegNo *curSegNo,
+							   TimeLineID *curTimeline);
+
+static const astreamer_ops astreamer_waldump_ops = {
+	.content = astreamer_waldump_content,
+	.finalize = astreamer_waldump_finalize,
+	.free = astreamer_waldump_free
+};
+
+/*
+ * Returns true if the given file is a tar archive and outputs its compression
+ * algorithm.
+ */
+bool
+is_archive_file(const char *fname, pg_compress_algorithm *compression)
+{
+	int			fname_len = strlen(fname);
+	pg_compress_algorithm compress_algo;
+
+	/* Now, check the compression type of the tar */
+	if (fname_len > 4 &&
+		strcmp(fname + fname_len - 4, ".tar") == 0)
+		compress_algo = PG_COMPRESSION_NONE;
+	else if (fname_len > 4 &&
+			 strcmp(fname + fname_len - 4, ".tgz") == 0)
+		compress_algo = PG_COMPRESSION_GZIP;
+	else if (fname_len > 7 &&
+			 strcmp(fname + fname_len - 7, ".tar.gz") == 0)
+		compress_algo = PG_COMPRESSION_GZIP;
+	else if (fname_len > 8 &&
+			 strcmp(fname + fname_len - 8, ".tar.lz4") == 0)
+		compress_algo = PG_COMPRESSION_LZ4;
+	else if (fname_len > 8 &&
+			 strcmp(fname + fname_len - 8, ".tar.zst") == 0)
+		compress_algo = PG_COMPRESSION_ZSTD;
+	else
+		return false;
+
+	*compression = compress_algo;
+
+	return true;
+}
+
+/*
+ * Initializes the tar archive reader, creates a hash table for WAL entries,
+ * checks for existing valid WAL segments in the archive file and retrieves the
+ * segment size, and sets up filters for relevant entries.
+ */
+void
+init_archive_reader(XLogDumpPrivate *privateInfo, const char *waldir,
+					pg_compress_algorithm compression)
+{
+	int			fd;
+	astreamer  *streamer;
+	ArchivedWALEntry *entry = NULL;
+	XLogLongPageHeader longhdr;
+
+	/* Open tar archive and store its file descriptor */
+	fd = open_file_in_directory(waldir, privateInfo->archive_name);
+
+	if (fd < 0)
+		pg_fatal("could not open file \"%s\"", privateInfo->archive_name);
+
+	privateInfo->archive_fd = fd;
+
+	streamer = astreamer_waldump_new(privateInfo);
+
+	/* Before that we must parse the tar archive. */
+	streamer = astreamer_tar_parser_new(streamer);
+
+	/* Before that we must decompress, if archive is compressed. */
+	if (compression == PG_COMPRESSION_GZIP)
+		streamer = astreamer_gzip_decompressor_new(streamer);
+	else if (compression == PG_COMPRESSION_LZ4)
+		streamer = astreamer_lz4_decompressor_new(streamer);
+	else if (compression == PG_COMPRESSION_ZSTD)
+		streamer = astreamer_zstd_decompressor_new(streamer);
+
+	privateInfo->archive_streamer = streamer;
+
+	/* Hash table storing WAL entries read from the archive */
+	ArchivedWAL_HTAB = ArchivedWAL_create(16, NULL);
+
+	/*
+	 * Verify that the archive contains valid WAL files and fetch WAL segment
+	 * size
+	 */
+	while (entry == NULL || entry->buf.len < XLOG_BLCKSZ)
+	{
+		if (read_archive_file(privateInfo, XLOG_BLCKSZ) == 0)
+			pg_fatal("could not find WAL in \"%s\" archive",
+					 privateInfo->archive_name);
+
+		entry = privateInfo->cur_wal;
+	}
+
+	/* Set WalSegSz if WAL data is successfully read */
+	longhdr = (XLogLongPageHeader) entry->buf.data;
+
+	WalSegSz = longhdr->xlp_seg_size;
+
+	if (!IsValidWalSegSize(WalSegSz))
+	{
+		pg_log_error(ngettext("invalid WAL segment size in WAL file from archive \"%s\" (%d byte)",
+							  "invalid WAL segment size in WAL file from archive \"%s\" (%d bytes)",
+							  WalSegSz),
+					 privateInfo->archive_name, WalSegSz);
+		pg_log_error_detail("The WAL segment size must be a power of two between 1 MB and 1 GB.");
+		exit(1);
+	}
+
+	/*
+	 * With the WAL segment size available, we can now initialize the
+	 * dependent start and end segment numbers.
+	 */
+	Assert(!XLogRecPtrIsInvalid(privateInfo->startptr));
+	XLByteToSeg(privateInfo->startptr, privateInfo->startSegNo, WalSegSz);
+
+	if (XLogRecPtrIsInvalid(privateInfo->endptr))
+		privateInfo->endSegNo = UINT64_MAX;
+	else
+		XLByteToSeg(privateInfo->endptr, privateInfo->endSegNo, WalSegSz);
+}
+
+/*
+ * Release the archive streamer chain and close the archive file.
+ */
+void
+free_archive_reader(XLogDumpPrivate *privateInfo)
+{
+	/*
+	 * NB: Normally, astreamer_finalize() is called before astreamer_free() to
+	 * flush any remaining buffered data or to ensure the end of the tar
+	 * archive is reached. However, when decoding a WAL file, once we hit the
+	 * end LSN, any remaining WAL data in the buffer or the tar archive's
+	 * unreached end can be safely ignored.
+	 */
+	astreamer_free(privateInfo->archive_streamer);
+
+	/* Close the file. */
+	if (close(privateInfo->archive_fd) != 0)
+		pg_log_error("could not close file \"%s\": %m",
+					 privateInfo->archive_name);
+}
+
+/*
+ * Copies WAL data from astreamer to readBuff; if unavailable, fetches more
+ * from the tar archive via astreamer.
+ */
+int
+read_archive_wal_page(XLogDumpPrivate *privateInfo, XLogRecPtr targetPagePtr,
+					  Size count, char *readBuff)
+{
+	char	   *p = readBuff;
+	Size		nbytes = count;
+	XLogRecPtr	recptr = targetPagePtr;
+	XLogSegNo	segno;
+	ArchivedWALEntry *entry;
+
+	XLByteToSeg(targetPagePtr, segno, WalSegSz);
+	entry = get_archive_wal_entry(segno, privateInfo);
+
+	while (nbytes > 0)
+	{
+		char	   *buf = entry->buf.data;
+		int			len = entry->buf.len;
+
+		/* WAL record range that the buffer contains */
+		XLogRecPtr	endPtr;
+		XLogRecPtr	startPtr;
+
+		XLogSegNoOffsetToRecPtr(entry->segno, entry->total_read,
+								WalSegSz, endPtr);
+		startPtr = endPtr - len;
+
+		/*
+		 * pg_waldump may request to re-read the currently active page, but
+		 * never a page older than the current one. Therefore, any fully
+		 * consumed WAL data preceding the current page can be safely
+		 * discarded.
+		 */
+		if (recptr >= endPtr)
+		{
+			/* Discard the buffered data */
+			resetStringInfo(&entry->buf);
+			len = 0;
+
+			/*
+			 * Push back the partial page data for the current page to the
+			 * buffer, ensuring it remains full page available for re-reading
+			 * if requested.
+			 */
+			if (p > readBuff)
+			{
+				Assert((count - nbytes) > 0);
+				appendBinaryStringInfo(&entry->buf, readBuff, count - nbytes);
+			}
+		}
+
+		if (len > 0 && recptr > startPtr)
+		{
+			int			skipBytes = 0;
+
+			/*
+			 * The required offset is not at the start of the buffer, so skip
+			 * bytes until reaching the desired offset of the target page.
+			 */
+			skipBytes = recptr - startPtr;
+
+			buf += skipBytes;
+			len -= skipBytes;
+		}
+
+		if (len > 0)
+		{
+			int			readBytes = len >= nbytes ? nbytes : len;
+
+			/* Ensure the reading page is in the buffer */
+			Assert(recptr >= startPtr && recptr < endPtr);
+
+			memcpy(p, buf, readBytes);
+
+			/* Update state for read */
+			nbytes -= readBytes;
+			p += readBytes;
+			recptr += readBytes;
+		}
+		else
+		{
+			/*
+			 * Fetch more data; raise an error if it's not the current segment
+			 * being read by the archive streamer or if reading of the
+			 * archived file has finished.
+			 */
+			if (privateInfo->cur_wal != entry ||
+				read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
+			{
+				char		fname[MAXFNAMELEN];
+
+				XLogFileName(fname, privateInfo->timeline, entry->segno,
+							 WalSegSz);
+				pg_fatal("could not read file \"%s\" from archive \"%s\": read %lld of %lld",
+						 fname, privateInfo->archive_name,
+						 (long long int) count - nbytes,
+						 (long long int) nbytes);
+			}
+		}
+	}
+
+	/*
+	 * Should have either have successfully read all the requested bytes or
+	 * reported a failure before this point.
+	 */
+	Assert(nbytes == 0);
+
+	/*
+	 * NB: We return the fixed value provided as input. Although we could
+	 * return a boolean since we either successfully read the WAL page or
+	 * raise an error, but the caller expects this value to be returned. The
+	 * routine that reads WAL pages from the physical WAL file follows the
+	 * same convention.
+	 */
+	return count;
+}
+
+/*
+ * Reads the archive file and passes it to the archive streamer for
+ * decompression.
+ */
+static int
+read_archive_file(XLogDumpPrivate *privateInfo, Size count)
+{
+	int			rc;
+	char	   *buffer;
+
+	buffer = pg_malloc(READ_CHUNK_SIZE * sizeof(uint8));
+
+	rc = read(privateInfo->archive_fd, buffer, count);
+	if (rc < 0)
+		pg_fatal("could not read file \"%s\": %m",
+				 privateInfo->archive_name);
+
+	/*
+	 * Decompress (if required), and then parse the previously read contents
+	 * of the tar file.
+	 */
+	if (rc > 0)
+		astreamer_content(privateInfo->archive_streamer, NULL,
+						  buffer, rc, ASTREAMER_UNKNOWN);
+	pg_free(buffer);
+
+	return rc;
+}
+
+/*
+ * Returns the archived WAL entry from the hash table if it exists. Otherwise,
+ * it invokes the routine to read the archived file, which then populates the
+ * entry in the hash table.
+ */
+static ArchivedWALEntry *
+get_archive_wal_entry(XLogSegNo segno, XLogDumpPrivate *privateInfo)
+{
+	ArchivedWALEntry *entry = NULL;
+	char		fname[MAXFNAMELEN];
+
+	/* Search hash table */
+	entry = ArchivedWAL_lookup(ArchivedWAL_HTAB, segno);
+
+	if (entry != NULL)
+		return entry;
+
+	/* Needed WAL yet to be decoded from archive, do the same */
+	while (1)
+	{
+		entry = privateInfo->cur_wal;
+
+		/* Fetch more data */
+		if (read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
+			break;			/* archive file ended */
+
+		/*
+		 * Either, here for the first time, or the archived streamer is
+		 * reading a non-WAL file or an irrelevant WAL file.
+		 */
+		if (entry == NULL)
+			continue;
+
+		/* Found the required entry */
+		if (entry->segno == segno)
+			return entry;
+
+		/*
+		 * Ignore if the timeline is different or the current segment is not
+		 * the desired one.
+		 */
+		if (privateInfo->timeline != entry->timeline ||
+			privateInfo->startSegNo > entry->segno ||
+			privateInfo->endSegNo < entry->segno)
+		{
+			privateInfo->cur_wal = NULL;
+			continue;
+		}
+
+		/*
+		 * XXX: If the segment being read not the requested one, the data must
+		 * be buffered, as we currently lack the mechanism to write it to a
+		 * temporary file. This is a known limitation that will be fixed in the
+		 * next patch, as the buffer could grow up to the full WAL segment
+		 * size.
+		 */
+		if (segno > entry->segno)
+			continue;
+
+		/* WAL segments must be archived in order */
+		pg_log_error("WAL files are not archived in sequential order");
+		pg_log_error_detail("Expecting segment number " UINT64_FORMAT " but found " UINT64_FORMAT ".",
+							segno, entry->segno);
+		exit(1);
+	}
+
+	/* Requested WAL segment not found */
+	XLogFileName(fname, privateInfo->timeline, segno, WalSegSz);
+	pg_fatal("could not find file \"%s\" in archive", fname);
+}
+
+/*
+ * Create an astreamer that can read WAL from a tar file.
+ */
+static astreamer *
+astreamer_waldump_new(XLogDumpPrivate *privateInfo)
+{
+	astreamer_waldump *streamer;
+
+	streamer = palloc0(sizeof(astreamer_waldump));
+	*((const astreamer_ops **) &streamer->base.bbs_ops) =
+		&astreamer_waldump_ops;
+
+	streamer->privateInfo = privateInfo;
+
+	return &streamer->base;
+}
+
+/*
+ * Main entry point of the archive streamer for reading WAL data from a tar
+ * file. If a member is identified as a valid WAL file, a hash entry is created
+ * for it, and its contents are copied into that entry's buffer, making them
+ * accessible to the decoding routine.
+ */
+static void
+astreamer_waldump_content(astreamer *streamer, astreamer_member *member,
+						  const char *data, int len,
+						  astreamer_archive_context context)
+{
+	astreamer_waldump *mystreamer = (astreamer_waldump *) streamer;
+	XLogDumpPrivate *privateInfo = mystreamer->privateInfo;
+
+	Assert(context != ASTREAMER_UNKNOWN);
+
+	switch (context)
+	{
+		case ASTREAMER_MEMBER_HEADER:
+			{
+				XLogSegNo	segno;
+				TimeLineID	timeline;
+				ArchivedWALEntry *entry;
+				bool		found;
+
+				pg_log_debug("reading \"%s\"", member->pathname);
+
+				if (!member_is_wal_file(mystreamer, member,
+										&segno, &timeline))
+					break;
+
+				entry = ArchivedWAL_insert(ArchivedWAL_HTAB, segno, &found);
+
+				/*
+				 * Shouldn't happen, but if it does, simply ignore the
+				 * duplicate WAL file.
+				 */
+				if (found)
+				{
+					pg_log_warning("ignoring duplicate WAL file found in archive: \"%s\"",
+								   member->pathname);
+					break;
+				}
+
+				initStringInfo(&entry->buf);
+				entry->timeline = timeline;
+				entry->total_read = 0;
+
+				privateInfo->cur_wal = entry;
+			}
+			break;
+
+		case ASTREAMER_MEMBER_CONTENTS:
+			if (privateInfo->cur_wal)
+			{
+				appendBinaryStringInfo(&privateInfo->cur_wal->buf, data, len);
+				privateInfo->cur_wal->total_read += len;
+			}
+			break;
+
+		case ASTREAMER_MEMBER_TRAILER:
+			privateInfo->cur_wal = NULL;
+			break;
+
+		case ASTREAMER_ARCHIVE_TRAILER:
+			break;
+
+		default:
+			/* Shouldn't happen. */
+			pg_fatal("unexpected state while parsing tar file");
+	}
+}
+
+/*
+ * End-of-stream processing for an astreamer_waldump stream.
+ */
+static void
+astreamer_waldump_finalize(astreamer *streamer)
+{
+	Assert(streamer->bbs_next == NULL);
+}
+
+/*
+ * Free memory associated with a astreamer_waldump stream.
+ */
+static void
+astreamer_waldump_free(astreamer *streamer)
+{
+	Assert(streamer->bbs_next == NULL);
+	pfree(streamer);
+}
+
+/*
+ * Returns true if the archive member name matches the WAL naming format. If
+ * successful, it also outputs the WAL segment number, and timeline.
+ */
+static bool
+member_is_wal_file(astreamer_waldump *mystreamer, astreamer_member *member,
+				   XLogSegNo *curSegNo, TimeLineID *curTimeline)
+{
+	int			pathlen;
+	XLogSegNo	segNo;
+	TimeLineID	timeline;
+	char	   *fname;
+
+	/* We are only interested in normal files. */
+	if (member->is_directory || member->is_link)
+		return false;
+
+	pathlen = strlen(member->pathname);
+	if (pathlen < XLOG_FNAME_LEN)
+		return false;
+
+	/* WAL file could be with full path */
+	fname = member->pathname + (pathlen - XLOG_FNAME_LEN);
+	if (!IsXLogFileName(fname))
+		return false;
+
+	/*
+	 * XXX: On some systems (e.g., OpenBSD), the tar utility includes
+	 * PaxHeaders when creating an archive. These are special entries that
+	 * store extended metadata for the file entry immediately following them,
+	 * and they share the exact same name as that file.
+	 */
+	if (strstr(member->pathname, "PaxHeaders."))
+		return false;
+
+	/* Parse position from file */
+	XLogFromFileName(fname, &timeline, &segNo, WalSegSz);
+
+	*curSegNo = segNo;
+	*curTimeline = timeline;
+
+	return true;
+}
diff --git a/src/bin/pg_waldump/meson.build b/src/bin/pg_waldump/meson.build
index 633a9874bb5..a93bd947ca2 100644
--- a/src/bin/pg_waldump/meson.build
+++ b/src/bin/pg_waldump/meson.build
@@ -1,6 +1,7 @@
 # Copyright (c) 2022-2026, PostgreSQL Global Development Group
 
 pg_waldump_sources = files(
+  'archive_waldump.c',
   'compat.c',
   'pg_waldump.c',
   'rmgrdesc.c',
@@ -18,7 +19,7 @@ endif
 
 pg_waldump = executable('pg_waldump',
   pg_waldump_sources,
-  dependencies: [frontend_code, lz4, zstd],
+  dependencies: [frontend_code, lz4, zstd, libpq],
   c_args: ['-DFRONTEND'], # needed for xlogreader et al
   kwargs: default_bin_args,
 )
@@ -29,6 +30,7 @@ tests += {
   'sd': meson.current_source_dir(),
   'bd': meson.current_build_dir(),
   'tap': {
+    'env': {'TAR': tar.found() ? tar.full_path() : ''},
     'tests': [
       't/001_basic.pl',
       't/002_save_fullpage.pl',
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index a33fc28b9d5..e2c96c3f4ca 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -38,7 +38,7 @@
  * give a thought about doing the same in pg_walinspect contrib module as well.
  */
 
-int			WalSegSz;
+int			WalSegSz = DEFAULT_XLOG_SEG_SIZE;
 
 static const char *progname;
 
@@ -178,7 +178,7 @@ split_path(const char *path, char **dir, char **fname)
  *
  * return a read only fd
  */
-static int
+int
 open_file_in_directory(const char *directory, const char *fname)
 {
 	int			fd = -1;
@@ -444,6 +444,45 @@ WALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
 	return count;
 }
 
+/*
+ * pg_waldump's XLogReaderRoutine->segment_open callback to support dumping WAL
+ * files from tar archives.
+ */
+static void
+TarWALDumpOpenSegment(XLogReaderState *state, XLogSegNo nextSegNo,
+					  TimeLineID *tli_p)
+{
+	/* No action needed */
+}
+
+/*
+ * pg_waldump's XLogReaderRoutine->segment_close callback.
+ */
+static void
+TarWALDumpCloseSegment(XLogReaderState *state)
+{
+	/* No action needed */
+}
+
+/*
+ * pg_waldump's XLogReaderRoutine->page_read callback to support dumping WAL
+ * files from tar archives.
+ */
+static int
+TarWALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
+				   XLogRecPtr targetPtr, char *readBuff)
+{
+	XLogDumpPrivate *private = state->private_data;
+	int			count = required_read_len(private, targetPagePtr, reqLen);
+
+	/* Bail out if the count to be read is not valid */
+	if (count < 0)
+		return -1;
+
+	/* Read the WAL page from the archive streamer */
+	return read_archive_wal_page(private, targetPagePtr, count, readBuff);
+}
+
 /*
  * Boolean to return whether the given WAL record matches a specific relation
  * and optionally block.
@@ -781,8 +820,8 @@ usage(void)
 	printf(_("  -F, --fork=FORK        only show records that modify blocks in fork FORK;\n"
 			 "                         valid names are main, fsm, vm, init\n"));
 	printf(_("  -n, --limit=N          number of records to display\n"));
-	printf(_("  -p, --path=PATH        directory in which to find WAL segment files or a\n"
-			 "                         directory with a ./pg_wal that contains such files\n"
+	printf(_("  -p, --path=PATH        tar archive or a directory in which to find WAL segment files or\n"
+			 "                         a directory with a ./pg_wal that contains such files\n"
 			 "                         (default: current directory, ./pg_wal, $PGDATA/pg_wal)\n"));
 	printf(_("  -q, --quiet            do not print any output, except for errors\n"));
 	printf(_("  -r, --rmgr=RMGR        only show records generated by resource manager RMGR;\n"
@@ -814,7 +853,10 @@ main(int argc, char **argv)
 	XLogRecord *record;
 	XLogRecPtr	first_record;
 	char	   *waldir = NULL;
+	char	   *walpath = NULL;
 	char	   *errormsg;
+	bool		is_archive = false;
+	pg_compress_algorithm compression;
 
 	static struct option long_options[] = {
 		{"bkp-details", no_argument, NULL, 'b'},
@@ -946,7 +988,7 @@ main(int argc, char **argv)
 				}
 				break;
 			case 'p':
-				waldir = pg_strdup(optarg);
+				walpath = pg_strdup(optarg);
 				break;
 			case 'q':
 				config.quiet = true;
@@ -1110,10 +1152,20 @@ main(int argc, char **argv)
 		goto bad_argument;
 	}
 
-	if (waldir != NULL)
+	if (walpath != NULL)
 	{
+		/* validate path points to tar archive */
+		if (is_archive_file(walpath, &compression))
+		{
+			char	   *fname = NULL;
+
+			split_path(walpath, &waldir, &fname);
+
+			private.archive_name = fname;
+			is_archive = true;
+		}
 		/* validate path points to directory */
-		if (!verify_directory(waldir))
+		else if (!verify_directory(walpath))
 		{
 			pg_log_error("could not open directory \"%s\": %m", waldir);
 			goto bad_argument;
@@ -1131,6 +1183,17 @@ main(int argc, char **argv)
 		int			fd;
 		XLogSegNo	segno;
 
+		/*
+		 * If a tar archive is passed using the --path option, all other
+		 * arguments become unnecessary.
+		 */
+		if (is_archive)
+		{
+			pg_log_error("unnecessary command-line arguments specified with tar archive (first is \"%s\")",
+						 argv[optind]);
+			goto bad_argument;
+		}
+
 		split_path(argv[optind], &directory, &fname);
 
 		if (waldir == NULL && directory != NULL)
@@ -1141,69 +1204,77 @@ main(int argc, char **argv)
 				pg_fatal("could not open directory \"%s\": %m", waldir);
 		}
 
-		waldir = identify_target_directory(waldir, fname);
-		fd = open_file_in_directory(waldir, fname);
-		if (fd < 0)
-			pg_fatal("could not open file \"%s\"", fname);
-		close(fd);
-
-		/* parse position from file */
-		XLogFromFileName(fname, &private.timeline, &segno, WalSegSz);
-
-		if (!XLogRecPtrIsValid(private.startptr))
-			XLogSegNoOffsetToRecPtr(segno, 0, WalSegSz, private.startptr);
-		else if (!XLByteInSeg(private.startptr, segno, WalSegSz))
+		if (fname != NULL && is_archive_file(fname, &compression))
 		{
-			pg_log_error("start WAL location %X/%08X is not inside file \"%s\"",
-						 LSN_FORMAT_ARGS(private.startptr),
-						 fname);
-			goto bad_argument;
+			private.archive_name = fname;
+			is_archive = true;
 		}
-
-		/* no second file specified, set end position */
-		if (!(optind + 1 < argc) && !XLogRecPtrIsValid(private.endptr))
-			XLogSegNoOffsetToRecPtr(segno + 1, 0, WalSegSz, private.endptr);
-
-		/* parse ENDSEG if passed */
-		if (optind + 1 < argc)
+		else
 		{
-			XLogSegNo	endsegno;
-
-			/* ignore directory, already have that */
-			split_path(argv[optind + 1], &directory, &fname);
-
+			waldir = identify_target_directory(waldir, fname);
 			fd = open_file_in_directory(waldir, fname);
 			if (fd < 0)
 				pg_fatal("could not open file \"%s\"", fname);
 			close(fd);
 
 			/* parse position from file */
-			XLogFromFileName(fname, &private.timeline, &endsegno, WalSegSz);
+			XLogFromFileName(fname, &private.timeline, &segno, WalSegSz);
 
-			if (endsegno < segno)
-				pg_fatal("ENDSEG %s is before STARTSEG %s",
-						 argv[optind + 1], argv[optind]);
+			if (!XLogRecPtrIsValid(private.startptr))
+				XLogSegNoOffsetToRecPtr(segno, 0, WalSegSz, private.startptr);
+			else if (!XLByteInSeg(private.startptr, segno, WalSegSz))
+			{
+				pg_log_error("start WAL location %X/%08X is not inside file \"%s\"",
+							 LSN_FORMAT_ARGS(private.startptr),
+							 fname);
+				goto bad_argument;
+			}
 
-			if (!XLogRecPtrIsValid(private.endptr))
-				XLogSegNoOffsetToRecPtr(endsegno + 1, 0, WalSegSz,
-										private.endptr);
+			/* no second file specified, set end position */
+			if (!(optind + 1 < argc) && !XLogRecPtrIsValid(private.endptr))
+				XLogSegNoOffsetToRecPtr(segno + 1, 0, WalSegSz, private.endptr);
 
-			/* set segno to endsegno for check of --end */
-			segno = endsegno;
-		}
+			/* parse ENDSEG if passed */
+			if (optind + 1 < argc)
+			{
+				XLogSegNo	endsegno;
 
+				/* ignore directory, already have that */
+				split_path(argv[optind + 1], &directory, &fname);
 
-		if (!XLByteInSeg(private.endptr, segno, WalSegSz) &&
-			private.endptr != (segno + 1) * WalSegSz)
-		{
-			pg_log_error("end WAL location %X/%08X is not inside file \"%s\"",
-						 LSN_FORMAT_ARGS(private.endptr),
-						 argv[argc - 1]);
-			goto bad_argument;
+				fd = open_file_in_directory(waldir, fname);
+				if (fd < 0)
+					pg_fatal("could not open file \"%s\"", fname);
+				close(fd);
+
+				/* parse position from file */
+				XLogFromFileName(fname, &private.timeline, &endsegno, WalSegSz);
+
+				if (endsegno < segno)
+					pg_fatal("ENDSEG %s is before STARTSEG %s",
+							 argv[optind + 1], argv[optind]);
+
+				if (!XLogRecPtrIsValid(private.endptr))
+					XLogSegNoOffsetToRecPtr(endsegno + 1, 0, WalSegSz,
+											private.endptr);
+
+				/* set segno to endsegno for check of --end */
+				segno = endsegno;
+			}
+
+
+			if (!XLByteInSeg(private.endptr, segno, WalSegSz) &&
+				private.endptr != (segno + 1) * WalSegSz)
+			{
+				pg_log_error("end WAL location %X/%08X is not inside file \"%s\"",
+							 LSN_FORMAT_ARGS(private.endptr),
+							 argv[argc - 1]);
+				goto bad_argument;
+			}
 		}
 	}
-	else
-		waldir = identify_target_directory(waldir, NULL);
+	else if (!is_archive)
+		waldir = identify_target_directory(walpath, NULL);
 
 	/* we don't know what to print */
 	if (!XLogRecPtrIsValid(private.startptr))
@@ -1215,12 +1286,36 @@ main(int argc, char **argv)
 	/* done with argument parsing, do the actual work */
 
 	/* we have everything we need, start reading */
-	xlogreader_state =
-		XLogReaderAllocate(WalSegSz, waldir,
-						   XL_ROUTINE(.page_read = WALDumpReadPage,
-									  .segment_open = WALDumpOpenSegment,
-									  .segment_close = WALDumpCloseSegment),
-						   &private);
+	if (is_archive)
+	{
+		/*
+		 * A NULL WAL directory indicates that the archive file is located in
+		 * the current working directory of the pg_waldump execution
+		 */
+		waldir = waldir ? pg_strdup(waldir) : pg_strdup(".");
+
+		/* Set up for reading tar file */
+		init_archive_reader(&private, waldir, compression);
+
+		/* Routine to decode WAL files in tar archive */
+		xlogreader_state =
+			XLogReaderAllocate(WalSegSz, waldir,
+							   XL_ROUTINE(.page_read = TarWALDumpReadPage,
+										  .segment_open = TarWALDumpOpenSegment,
+										  .segment_close = TarWALDumpCloseSegment),
+							   &private);
+	}
+	else
+	{
+		/* Routine to decode WAL files */
+		xlogreader_state =
+			XLogReaderAllocate(WalSegSz, waldir,
+							   XL_ROUTINE(.page_read = WALDumpReadPage,
+										  .segment_open = WALDumpOpenSegment,
+										  .segment_close = WALDumpCloseSegment),
+							   &private);
+	}
+
 	if (!xlogreader_state)
 		pg_fatal("out of memory while allocating a WAL reading processor");
 
@@ -1329,6 +1424,9 @@ main(int argc, char **argv)
 
 	XLogReaderFree(xlogreader_state);
 
+	if (is_archive)
+		free_archive_reader(&private);
+
 	return EXIT_SUCCESS;
 
 bad_argument:
diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h
index 926d529f9d6..ec7a33d40e0 100644
--- a/src/bin/pg_waldump/pg_waldump.h
+++ b/src/bin/pg_waldump/pg_waldump.h
@@ -12,9 +12,13 @@
 #define PG_WALDUMP_H
 
 #include "access/xlogdefs.h"
+#include "fe_utils/astreamer.h"
 
 extern int	WalSegSz;
 
+/* Forward declaration */
+struct ArchivedWALEntry;
+
 /* Contains the necessary information to drive WAL decoding */
 typedef struct XLogDumpPrivate
 {
@@ -22,6 +26,36 @@ typedef struct XLogDumpPrivate
 	XLogRecPtr	startptr;
 	XLogRecPtr	endptr;
 	bool		endptr_reached;
+
+	/* Fields required to read WAL from archive */
+	char	   *archive_name;	/* Tar archive name */
+	int			archive_fd;		/* File descriptor for the open tar file */
+
+	astreamer  *archive_streamer;
+
+	/* What the archive streamer is currently reading */
+	struct ArchivedWALEntry *cur_wal;
+
+	/*
+	 * Although these values can be easily derived from startptr and endptr,
+	 * doing so repeatedly for each archived member would be inefficient, as
+	 * it would involve recalculating and filtering out irrelevant WAL
+	 * segments.
+	 */
+	XLogSegNo	startSegNo;
+	XLogSegNo	endSegNo;
 } XLogDumpPrivate;
 
+extern int	open_file_in_directory(const char *directory, const char *fname);
+
+extern bool is_archive_file(const char *fname,
+							pg_compress_algorithm *compression);
+extern void init_archive_reader(XLogDumpPrivate *privateInfo,
+								const char *waldir,
+								pg_compress_algorithm compression);
+extern void free_archive_reader(XLogDumpPrivate *privateInfo);
+extern int	read_archive_wal_page(XLogDumpPrivate *privateInfo,
+								  XLogRecPtr targetPagePtr,
+								  Size count, char *readBuff);
+
 #endif							/* end of PG_WALDUMP_H */
diff --git a/src/bin/pg_waldump/t/001_basic.pl b/src/bin/pg_waldump/t/001_basic.pl
index 3288fadcf48..13567fbdba1 100644
--- a/src/bin/pg_waldump/t/001_basic.pl
+++ b/src/bin/pg_waldump/t/001_basic.pl
@@ -3,10 +3,13 @@
 
 use strict;
 use warnings FATAL => 'all';
+use Cwd;
 use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
 use Test::More;
 
+my $tar = $ENV{TAR};
+
 program_help_ok('pg_waldump');
 program_version_ok('pg_waldump');
 program_options_handling_ok('pg_waldump');
@@ -235,7 +238,7 @@ command_like(
 sub test_pg_waldump
 {
 	local $Test::Builder::Level = $Test::Builder::Level + 1;
-	my @opts = @_;
+	my ($path, @opts) = @_;
 
 	my ($stdout, $stderr);
 
@@ -243,6 +246,7 @@ sub test_pg_waldump
 		'pg_waldump',
 		'--start' => $start_lsn,
 		'--end' => $end_lsn,
+		'--path' => $path,
 		@opts
 	  ],
 	  '>' => \$stdout,
@@ -254,11 +258,50 @@ sub test_pg_waldump
 	return @lines;
 }
 
-my @lines;
+# Create a tar archive, sorting the file order
+sub generate_archive
+{
+	my ($archive, $directory, $compression_flags) = @_;
+
+	my @files;
+	opendir my $dh, $directory or die "opendir: $!";
+	while (my $entry = readdir $dh) {
+		# Skip '.' and '..'
+		next if $entry eq '.' || $entry eq '..';
+		push @files, $entry;
+	}
+	closedir $dh;
+
+	@files = sort @files;
+
+	# move into the WAL directory before archiving files
+	my $cwd = getcwd;
+	chdir($directory) || die "chdir: $!";
+	command_ok([$tar, $compression_flags, $archive, @files]);
+	chdir($cwd) || die "chdir: $!";
+}
+
+my $tmp_dir = PostgreSQL::Test::Utils::tempdir_short();
 
 my @scenarios = (
 	{
-		'path' => $node->data_dir
+		'path' => $node->data_dir,
+		'is_archive' => 0,
+		'enabled' => 1
+	},
+	{
+		'path' => "$tmp_dir/pg_wal.tar",
+		'compression_method' => 'none',
+		'compression_flags' => '-cf',
+		'is_archive' => 1,
+		'enabled' => 1
+	},
+	{
+		'path' => "$tmp_dir/pg_wal.tar.gz",
+		'compression_method' => 'gzip',
+		'compression_flags' => '-czf',
+		'is_archive' => 1,
+		'enabled' => check_pg_config("#define HAVE_LIBZ 1")
 	});
 
 for my $scenario (@scenarios)
@@ -267,6 +310,19 @@ for my $scenario (@scenarios)
 
 	SKIP:
 	{
+		skip "tar command is not available", 3
+		  if !defined $tar;
+		skip "$scenario->{'compression_method'} compression not supported by this build", 3
+		  if !$scenario->{'enabled'} && $scenario->{'is_archive'};
+
+		  # create pg_wal archive
+		  if ($scenario->{'is_archive'})
+		  {
+			  generate_archive($path,
+				  $node->data_dir . '/pg_wal',
+				  $scenario->{'compression_flags'});
+		  }
+
 		command_fails_like(
 			[ 'pg_waldump', '--path' => $path ],
 			qr/error: no start WAL location given/,
@@ -298,38 +354,42 @@ for my $scenario (@scenarios)
 			qr/error: error in WAL record at/,
 			'errors are shown with --quiet');
 
-		@lines = test_pg_waldump('--path' => $path);
+		my @lines;
+		@lines = test_pg_waldump($path);
 		is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines');
 
-		@lines = test_pg_waldump('--path' => $path, '--limit' => 6);
+		@lines = test_pg_waldump($path, '--limit' => 6);
 		is(@lines, 6, 'limit option observed');
 
-		@lines = test_pg_waldump('--path' => $path, '--fullpage');
+		@lines = test_pg_waldump($path, '--fullpage');
 		is(grep(!/^rmgr:.*\bFPW\b/, @lines), 0, 'all output lines are FPW');
 
-		@lines = test_pg_waldump('--path' => $path, '--stats');
+		@lines = test_pg_waldump($path, '--stats');
 		like($lines[0], qr/WAL statistics/, "statistics on stdout");
 		is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
 
-		@lines = test_pg_waldump('--path' => $path, '--stats=record');
+		@lines = test_pg_waldump($path, '--stats=record');
 		like($lines[0], qr/WAL statistics/, "statistics on stdout");
 		is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output');
 
-		@lines = test_pg_waldump('--path' => $path, '--rmgr' => 'Btree');
+		@lines = test_pg_waldump($path, '--rmgr' => 'Btree');
 		is(grep(!/^rmgr: Btree/, @lines), 0, 'only Btree lines');
 
-		@lines = test_pg_waldump('--path' => $path, '--fork' => 'init');
+		@lines = test_pg_waldump($path, '--fork' => 'init');
 		is(grep(!/fork init/, @lines), 0, 'only init fork lines');
 
-		@lines = test_pg_waldump('--path' => $path,
+		@lines = test_pg_waldump($path,
 			'--relation' => "$default_ts_oid/$postgres_db_oid/$rel_t1_oid");
 		is(grep(!/rel $default_ts_oid\/$postgres_db_oid\/$rel_t1_oid/, @lines),
 			0, 'only lines for selected relation');
 
-		@lines = test_pg_waldump('--path' => $path,
+		@lines = test_pg_waldump($path,
 			'--relation' => "$default_ts_oid/$postgres_db_oid/$rel_i1a_oid",
 			'--block' => 1);
 		is(grep(!/\bblk 1\b/, @lines), 0, 'only lines for selected block');
+
+		# Cleanup.
+		unlink $path if $scenario->{'is_archive'};
 	}
 }
 
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b9e671fcda8..b0d73743cea 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -144,6 +144,8 @@ ArchiveOpts
 ArchiveShutdownCB
 ArchiveStartupCB
 ArchiveStreamState
+ArchivedWALEntry
+ArchivedWAL_hash
 ArchiverOutput
 ArchiverStage
 ArrayAnalyzeExtraData
@@ -3495,6 +3497,7 @@ astreamer_recovery_injector
 astreamer_tar_archiver
 astreamer_tar_parser
 astreamer_verify
+astreamer_waldump
 astreamer_zstd_frame
 auth_password_hook_typ
 autovac_table
-- 
2.47.1



  [application/octet-stream] v10-0005-pg_waldump-Remove-the-restriction-on-the-order-o.patch (13.3K, 6-v10-0005-pg_waldump-Remove-the-restriction-on-the-order-o.patch)
  download | inline diff:
From c2811b147d50a7e2e8f6566344e9f1f85614560d Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Thu, 6 Nov 2025 13:48:33 +0530
Subject: [PATCH v10 5/8] pg_waldump: Remove the restriction on the order of
 archived WAL files.

With previous patch, pg_waldump would stop decoding if WAL files were
not in the required sequence. With this patch, decoding will now
continue.  Any WAL file that is out of order will be written to a
temporary location, from which it will be read later. Once a temporary
file has been read, it will be removed.
---
 doc/src/sgml/ref/pg_waldump.sgml     |   8 +-
 src/bin/pg_waldump/archive_waldump.c | 230 ++++++++++++++++++++++++---
 src/bin/pg_waldump/pg_waldump.c      |  41 ++++-
 src/bin/pg_waldump/pg_waldump.h      |   4 +
 src/bin/pg_waldump/t/001_basic.pl    |   3 +-
 5 files changed, 264 insertions(+), 22 deletions(-)

diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml
index d004bb0f67e..27adf77755c 100644
--- a/doc/src/sgml/ref/pg_waldump.sgml
+++ b/doc/src/sgml/ref/pg_waldump.sgml
@@ -149,8 +149,12 @@ PostgreSQL documentation
         of <envar>PGDATA</envar>.
        </para>
        <para>
-        If a tar archive is provided, its WAL segment files must be in
-        sequential order; otherwise, an error will be reported.
+        If a tar archive is provided and its WAL segment files are not in
+        sequential order, those files will be written to a temporary directory
+        named starting with <filename>waldump_tmp</filename>. This directory will be
+        created inside the directory specified by the <envar>TMPDIR</envar>
+        environment variable if it is set; otherwise, it will be created within
+        the same directory as the tar archive.
        </para>
       </listitem>
      </varlistentry>
diff --git a/src/bin/pg_waldump/archive_waldump.c b/src/bin/pg_waldump/archive_waldump.c
index 6ba067163d5..aa64da9a57d 100644
--- a/src/bin/pg_waldump/archive_waldump.c
+++ b/src/bin/pg_waldump/archive_waldump.c
@@ -17,6 +17,7 @@
 #include <unistd.h>
 
 #include "access/xlog_internal.h"
+#include "common/file_perm.h"
 #include "common/hashfn.h"
 #include "common/logging.h"
 #include "fe_utils/simple_list.h"
@@ -27,6 +28,9 @@
  */
 #define READ_CHUNK_SIZE				(128 * 1024)
 
+/* Temporary exported WAL file directory */
+static char *TmpWalSegDir = NULL;
+
 /* Hash entry structure for holding WAL segment data read from the archive */
 typedef struct ArchivedWALEntry
 {
@@ -64,6 +68,11 @@ typedef struct astreamer_waldump
 static int	read_archive_file(XLogDumpPrivate *privateInfo, Size count);
 static ArchivedWALEntry *get_archive_wal_entry(XLogSegNo segno,
 											   XLogDumpPrivate *privateInfo);
+static void setup_tmpseg_dir(const char *waldir);
+static void cleanup_tmpseg_dir_atexit(void);
+
+static FILE *prepare_tmp_write(XLogSegNo segno);
+static void perform_tmp_write(XLogSegNo segno, StringInfo buf, FILE *file);
 
 static astreamer *astreamer_waldump_new(XLogDumpPrivate *privateInfo);
 static void astreamer_waldump_content(astreamer *streamer,
@@ -121,7 +130,9 @@ is_archive_file(const char *fname, pg_compress_algorithm *compression)
 /*
  * Initializes the tar archive reader, creates a hash table for WAL entries,
  * checks for existing valid WAL segments in the archive file and retrieves the
- * segment size, and sets up filters for relevant entries.
+ * segment size, and sets up filters for relevant entries. It also configures a
+ * temporary directory for out-of-order WAL data and registers an exit callback
+ * to clean up temporary files.
  */
 void
 init_archive_reader(XLogDumpPrivate *privateInfo, const char *waldir,
@@ -197,6 +208,13 @@ init_archive_reader(XLogDumpPrivate *privateInfo, const char *waldir,
 		privateInfo->endSegNo = UINT64_MAX;
 	else
 		XLByteToSeg(privateInfo->endptr, privateInfo->endSegNo, WalSegSz);
+
+	/*
+	 * Setup temporary directory to store WAL segments and set up an exit
+	 * callback to remove it upon completion.
+	 */
+	setup_tmpseg_dir(waldir);
+	atexit(cleanup_tmpseg_dir_atexit);
 }
 
 /*
@@ -370,15 +388,18 @@ read_archive_file(XLogDumpPrivate *privateInfo, Size count)
 }
 
 /*
- * Returns the archived WAL entry from the hash table if it exists. Otherwise,
+ * Returns the archived WAL entry from the hash table if it exists.  Otherwise,
  * it invokes the routine to read the archived file, which then populates the
- * entry in the hash table.
+ * entry in the hash table.  If the archive streamer happens to be reading a
+ * WAL from archive file that is not currently needed, that WAL data is written
+ * to a temporary file.
  */
 static ArchivedWALEntry *
 get_archive_wal_entry(XLogSegNo segno, XLogDumpPrivate *privateInfo)
 {
 	ArchivedWALEntry *entry = NULL;
 	char		fname[MAXFNAMELEN];
+	FILE	   *write_fp = NULL;
 
 	/* Search hash table */
 	entry = ArchivedWAL_lookup(ArchivedWAL_HTAB, segno);
@@ -392,8 +413,11 @@ get_archive_wal_entry(XLogSegNo segno, XLogDumpPrivate *privateInfo)
 		entry = privateInfo->cur_wal;
 
 		/* Fetch more data */
-		if (read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
-			break;			/* archive file ended */
+		if (entry == NULL || entry->buf.len == 0)
+		{
+			if (read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
+				break;			/* archive file ended */
+		}
 
 		/*
 		 * Either, here for the first time, or the archived streamer is
@@ -419,20 +443,31 @@ get_archive_wal_entry(XLogSegNo segno, XLogDumpPrivate *privateInfo)
 		}
 
 		/*
-		 * XXX: If the segment being read not the requested one, the data must
-		 * be buffered, as we currently lack the mechanism to write it to a
-		 * temporary file. This is a known limitation that will be fixed in the
-		 * next patch, as the buffer could grow up to the full WAL segment
-		 * size.
+		 * Archive streamer is currently reading a file that isn't the one
+		 * asked for, but it's required in the future. It should be written to
+		 * a temporary location for retrieval when needed.
 		 */
-		if (segno > entry->segno)
-			continue;
 
-		/* WAL segments must be archived in order */
-		pg_log_error("WAL files are not archived in sequential order");
-		pg_log_error_detail("Expecting segment number " UINT64_FORMAT " but found " UINT64_FORMAT ".",
-							segno, entry->segno);
-		exit(1);
+		/* Create a temporary file if one does not already exist */
+		if (!entry->tmpseg_exists)
+		{
+			write_fp = prepare_tmp_write(entry->segno);
+			entry->tmpseg_exists = true;
+		}
+
+		/* Flush data from the buffer to the file */
+		perform_tmp_write(entry->segno, &entry->buf, write_fp);
+		resetStringInfo(&entry->buf);
+
+		/*
+		 * The change in the current segment entry indicates that the reading
+		 * of this file has ended.
+		 */
+		if (entry != privateInfo->cur_wal && write_fp != NULL)
+		{
+			fclose(write_fp);
+			write_fp = NULL;
+		}
 	}
 
 	/* Requested WAL segment not found */
@@ -441,7 +476,166 @@ get_archive_wal_entry(XLogSegNo segno, XLogDumpPrivate *privateInfo)
 }
 
 /*
- * Create an astreamer that can read WAL from a tar file.
+ * Set up a temporary directory to temporarily store WAL segments.
+ */
+static void
+setup_tmpseg_dir(const char *waldir)
+{
+	char	   *template;
+
+	/*
+	 * Use the directory specified by the TMPDIR environment variable. If it's
+	 * not set, use the provided WAL directory to extract WAL file
+	 * temporarily.
+	 */
+	template = psprintf("%s/waldump_tmp-XXXXXX",
+						getenv("TMPDIR") ? getenv("TMPDIR") : waldir);
+	TmpWalSegDir = mkdtemp(template);
+
+	if (TmpWalSegDir == NULL)
+		pg_fatal("could not create directory \"%s\": %m", template);
+
+	canonicalize_path(TmpWalSegDir);
+
+	pg_log_debug("created directory \"%s\"", TmpWalSegDir);
+}
+
+/*
+ * Removes the temporarily store WAL segments, if any, at exiting.
+ */
+static void
+cleanup_tmpseg_dir_atexit(void)
+{
+	ArchivedWAL_iterator it;
+	ArchivedWALEntry *entry;
+
+	/* Remove temporary segments */
+	ArchivedWAL_start_iterate(ArchivedWAL_HTAB, &it);
+	while ((entry = ArchivedWAL_iterate(ArchivedWAL_HTAB, &it)) != NULL)
+	{
+		if (entry->tmpseg_exists)
+		{
+			remove_tmp_walseg(entry->segno, false);
+			entry->tmpseg_exists = false;
+		}
+	}
+
+	/* Remove temporary directory */
+	if (rmdir(TmpWalSegDir) == 0)
+		pg_log_debug("removed directory \"%s\"", TmpWalSegDir);
+}
+
+/*
+ * Generate the temporary WAL file path.
+ *
+ * Note that the caller is responsible to pfree it.
+ */
+char *
+get_tmp_walseg_path(XLogSegNo segno)
+{
+	char	   *fpath = (char *) palloc(MAXPGPATH);
+
+	Assert(TmpWalSegDir);
+
+	snprintf(fpath, MAXPGPATH, "%s/%08X%08X",
+			 TmpWalSegDir,
+			 (uint32) (segno / XLogSegmentsPerXLogId(WalSegSz)),
+			 (uint32) (segno % XLogSegmentsPerXLogId(WalSegSz)));
+
+	return fpath;
+}
+
+/*
+ * Routine to check whether a temporary file exists for the corresponding WAL
+ * segment number.
+ */
+bool
+tmp_walseg_exists(XLogSegNo segno)
+{
+	ArchivedWALEntry *entry;
+
+	entry = ArchivedWAL_lookup(ArchivedWAL_HTAB, segno);
+
+	if (entry == NULL)
+		return false;
+
+	return entry->tmpseg_exists;
+}
+
+/*
+ * Create an empty placeholder file and return its handle.
+ */
+static FILE *
+prepare_tmp_write(XLogSegNo segno)
+{
+	FILE	   *file;
+	char	   *fpath;
+
+	fpath = get_tmp_walseg_path(segno);
+
+	/* Create an empty placeholder */
+	file = fopen(fpath, PG_BINARY_W);
+	if (file == NULL)
+		pg_fatal("could not create file \"%s\": %m", fpath);
+
+#ifndef WIN32
+	if (chmod(fpath, pg_file_create_mode))
+		pg_fatal("could not set permissions on file \"%s\": %m",
+				 fpath);
+#endif
+
+	pg_log_debug("temporarily exporting file \"%s\"", fpath);
+	pfree(fpath);
+
+	return file;
+}
+
+/*
+ * Write buffer data to the given file handle.
+ */
+static void
+perform_tmp_write(XLogSegNo segno, StringInfo buf, FILE *file)
+{
+	Assert(file);
+
+	errno = 0;
+	if (buf->len > 0 && fwrite(buf->data, buf->len, 1, file) != 1)
+	{
+		/*
+		 * If write didn't set errno, assume problem is no disk space
+		 */
+		if (errno == 0)
+			errno = ENOSPC;
+		pg_fatal("could not write to file \"%s\": %m",
+				 get_tmp_walseg_path(segno));
+	}
+}
+
+/*
+ * Remove temporary file
+ */
+void
+remove_tmp_walseg(XLogSegNo segno, bool update_entry)
+{
+	char	   *fpath = get_tmp_walseg_path(segno);
+
+	if (unlink(fpath) == 0)
+		pg_log_debug("removed file \"%s\"", fpath);
+	pfree(fpath);
+
+	/* Update entry if requested */
+	if (update_entry)
+	{
+		ArchivedWALEntry *entry;
+
+		entry = ArchivedWAL_lookup(ArchivedWAL_HTAB, segno);
+		Assert(entry != NULL);
+		entry->tmpseg_exists = false;
+	}
+}
+
+/*
+ * Create an astreamer that can read WAL from tar file.
  */
 static astreamer *
 astreamer_waldump_new(XLogDumpPrivate *privateInfo)
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index e2c96c3f4ca..96d17e99690 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -474,12 +474,51 @@ TarWALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
 {
 	XLogDumpPrivate *private = state->private_data;
 	int			count = required_read_len(private, targetPagePtr, reqLen);
+	XLogSegNo	nextSegNo;
 
 	/* Bail out if the count to be read is not valid */
 	if (count < 0)
 		return -1;
 
-	/* Read the WAL page from the archive streamer */
+	/*
+	 * If the target page is in a different segment, first check for the WAL
+	 * segment's physical existence in the temporary directory.
+	 */
+	nextSegNo = state->seg.ws_segno;
+	if (!XLByteInSeg(targetPagePtr, nextSegNo, WalSegSz))
+	{
+		if (state->seg.ws_file >= 0)
+		{
+			close(state->seg.ws_file);
+			state->seg.ws_file = -1;
+
+			/* Remove this file, as it is no longer needed. */
+			remove_tmp_walseg(nextSegNo, true);
+		}
+
+		XLByteToSeg(targetPagePtr, nextSegNo, WalSegSz);
+		state->seg.ws_tli = private->timeline;
+		state->seg.ws_segno = nextSegNo;
+
+		/*
+		 * If the next segment exists, open it and continue reading from there
+		 */
+		if (tmp_walseg_exists(nextSegNo))
+		{
+			char	   *fpath;
+
+			fpath = get_tmp_walseg_path(nextSegNo);
+			state->seg.ws_file = open(fpath, O_RDONLY | PG_BINARY, 0);
+			pfree(fpath);
+		}
+	}
+
+	/* Continue reading from the open WAL segment, if any */
+	if (state->seg.ws_file >= 0)
+		return WALDumpReadPage(state, targetPagePtr, count, targetPtr,
+							   readBuff);
+
+	/* Otherwise, read the WAL page from the archive streamer */
 	return read_archive_wal_page(private, targetPagePtr, count, readBuff);
 }
 
diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h
index ec7a33d40e0..03e02625ba1 100644
--- a/src/bin/pg_waldump/pg_waldump.h
+++ b/src/bin/pg_waldump/pg_waldump.h
@@ -58,4 +58,8 @@ extern int	read_archive_wal_page(XLogDumpPrivate *privateInfo,
 								  XLogRecPtr targetPagePtr,
 								  Size count, char *readBuff);
 
+extern char *get_tmp_walseg_path(XLogSegNo segno);
+extern bool tmp_walseg_exists(XLogSegNo segno);
+extern void remove_tmp_walseg(XLogSegNo segno, bool update_entry);
+
 #endif							/* end of PG_WALDUMP_H */
diff --git a/src/bin/pg_waldump/t/001_basic.pl b/src/bin/pg_waldump/t/001_basic.pl
index 13567fbdba1..68b0cdd29e5 100644
--- a/src/bin/pg_waldump/t/001_basic.pl
+++ b/src/bin/pg_waldump/t/001_basic.pl
@@ -7,6 +7,7 @@ use Cwd;
 use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
 use Test::More;
+use List::Util qw(shuffle);
 
 my $tar = $ENV{TAR};
 
@@ -272,7 +273,7 @@ sub generate_archive
 	}
 	closedir $dh;
 
-	@files = sort @files;
+	@files = shuffle @files;
 
 	# move into the WAL directory before archiving files
 	my $cwd = getcwd;
-- 
2.47.1



  [application/octet-stream] v10-0006-pg_verifybackup-Delay-default-WAL-directory-prep.patch (1.7K, 7-v10-0006-pg_verifybackup-Delay-default-WAL-directory-prep.patch)
  download | inline diff:
From 4f62c2263fb567811dd11b275d883ea1e0b7a4db Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Wed, 16 Jul 2025 14:47:43 +0530
Subject: [PATCH v10 6/8] pg_verifybackup: Delay default WAL directory
 preparation.

We are not sure whether to parse WAL from a directory or an archive
until the backup format is known. Therefore, we delay preparing the
default WAL directory until the point of parsing. This delay is
harmless, as the WAL directory is not used elsewhere.
---
 src/bin/pg_verifybackup/pg_verifybackup.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 456e0375e13..c6be75eba74 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -285,10 +285,6 @@ main(int argc, char **argv)
 		manifest_path = psprintf("%s/backup_manifest",
 								 context.backup_directory);
 
-	/* By default, look for the WAL in the backup directory, too. */
-	if (wal_directory == NULL)
-		wal_directory = psprintf("%s/pg_wal", context.backup_directory);
-
 	/*
 	 * Try to read the manifest. We treat any errors encountered while parsing
 	 * the manifest as fatal; there doesn't seem to be much point in trying to
@@ -368,6 +364,10 @@ main(int argc, char **argv)
 	if (context.format == 'p' && !context.skip_checksums)
 		verify_backup_checksums(&context);
 
+	/* By default, look for the WAL in the backup directory, too. */
+	if (wal_directory == NULL)
+		wal_directory = psprintf("%s/pg_wal", context.backup_directory);
+
 	/*
 	 * Try to parse the required ranges of WAL records, unless we were told
 	 * not to do so.
-- 
2.47.1



  [application/octet-stream] v10-0007-pg_verifybackup-Rename-the-wal-directory-switch-.patch (5.9K, 8-v10-0007-pg_verifybackup-Rename-the-wal-directory-switch-.patch)
  download | inline diff:
From d547aeb99303cc83cb1f8ad233804eb2e603d72e Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Tue, 25 Nov 2025 17:32:14 +0530
Subject: [PATCH v10 7/8] pg_verifybackup: Rename the wal-directory switch to
 wal-path

With previous patches to pg_waldump can now decode WAL directly from
tar files.  This means you'll be able to specify a tar archive path
instead of a traditional WAL directory.

To keep things consistent and more versatile, we should also
generalize the input switch for pg_verifybackup. It should accept
either a directory or a tar file path that contains WALs. This change
will also aligning it with the existing manifest-path switch naming.

== NOTE ==
  The corresponding PO files require updating due to this change.
---
 doc/src/sgml/ref/pg_verifybackup.sgml     |  2 +-
 src/bin/pg_verifybackup/pg_verifybackup.c | 22 +++++++++++-----------
 src/bin/pg_verifybackup/t/007_wal.pl      |  4 ++--
 3 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml
index 61c12975e4a..e9b8bfd51b1 100644
--- a/doc/src/sgml/ref/pg_verifybackup.sgml
+++ b/doc/src/sgml/ref/pg_verifybackup.sgml
@@ -261,7 +261,7 @@ PostgreSQL documentation
 
      <varlistentry>
       <term><option>-w <replaceable class="parameter">path</replaceable></option></term>
-      <term><option>--wal-directory=<replaceable class="parameter">path</replaceable></option></term>
+      <term><option>--wal-path=<replaceable class="parameter">path</replaceable></option></term>
       <listitem>
        <para>
         Try to parse WAL files stored in the specified directory, rather than
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index c6be75eba74..97178d585c3 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -93,7 +93,7 @@ static void verify_file_checksum(verifier_context *context,
 								 uint8 *buffer);
 static void parse_required_wal(verifier_context *context,
 							   char *pg_waldump_path,
-							   char *wal_directory);
+							   char *wal_path);
 static astreamer *create_archive_verifier(verifier_context *context,
 										  char *archive_name,
 										  Oid tblspc_oid,
@@ -126,7 +126,7 @@ main(int argc, char **argv)
 		{"progress", no_argument, NULL, 'P'},
 		{"quiet", no_argument, NULL, 'q'},
 		{"skip-checksums", no_argument, NULL, 's'},
-		{"wal-directory", required_argument, NULL, 'w'},
+		{"wal-path", required_argument, NULL, 'w'},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -135,7 +135,7 @@ main(int argc, char **argv)
 	char	   *manifest_path = NULL;
 	bool		no_parse_wal = false;
 	bool		quiet = false;
-	char	   *wal_directory = NULL;
+	char	   *wal_path = NULL;
 	char	   *pg_waldump_path = NULL;
 	DIR		   *dir;
 
@@ -221,8 +221,8 @@ main(int argc, char **argv)
 				context.skip_checksums = true;
 				break;
 			case 'w':
-				wal_directory = pstrdup(optarg);
-				canonicalize_path(wal_directory);
+				wal_path = pstrdup(optarg);
+				canonicalize_path(wal_path);
 				break;
 			default:
 				/* getopt_long already emitted a complaint */
@@ -365,15 +365,15 @@ main(int argc, char **argv)
 		verify_backup_checksums(&context);
 
 	/* By default, look for the WAL in the backup directory, too. */
-	if (wal_directory == NULL)
-		wal_directory = psprintf("%s/pg_wal", context.backup_directory);
+	if (wal_path == NULL)
+		wal_path = psprintf("%s/pg_wal", context.backup_directory);
 
 	/*
 	 * Try to parse the required ranges of WAL records, unless we were told
 	 * not to do so.
 	 */
 	if (!no_parse_wal)
-		parse_required_wal(&context, pg_waldump_path, wal_directory);
+		parse_required_wal(&context, pg_waldump_path, wal_path);
 
 	/*
 	 * If everything looks OK, tell the user this, unless we were asked to
@@ -1198,7 +1198,7 @@ verify_file_checksum(verifier_context *context, manifest_file *m,
  */
 static void
 parse_required_wal(verifier_context *context, char *pg_waldump_path,
-				   char *wal_directory)
+				   char *wal_path)
 {
 	manifest_data *manifest = context->manifest;
 	manifest_wal_range *this_wal_range = manifest->first_wal_range;
@@ -1208,7 +1208,7 @@ parse_required_wal(verifier_context *context, char *pg_waldump_path,
 		char	   *pg_waldump_cmd;
 
 		pg_waldump_cmd = psprintf("\"%s\" --quiet --path=\"%s\" --timeline=%u --start=%X/%08X --end=%X/%08X\n",
-								  pg_waldump_path, wal_directory, this_wal_range->tli,
+								  pg_waldump_path, wal_path, this_wal_range->tli,
 								  LSN_FORMAT_ARGS(this_wal_range->start_lsn),
 								  LSN_FORMAT_ARGS(this_wal_range->end_lsn));
 		fflush(NULL);
@@ -1376,7 +1376,7 @@ usage(void)
 	printf(_("  -P, --progress              show progress information\n"));
 	printf(_("  -q, --quiet                 do not print any output, except for errors\n"));
 	printf(_("  -s, --skip-checksums        skip checksum verification\n"));
-	printf(_("  -w, --wal-directory=PATH    use specified path for WAL files\n"));
+	printf(_("  -w, --wal-path=PATH         use specified path for WAL files\n"));
 	printf(_("  -V, --version               output version information, then exit\n"));
 	printf(_("  -?, --help                  show this help, then exit\n"));
 	printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
diff --git a/src/bin/pg_verifybackup/t/007_wal.pl b/src/bin/pg_verifybackup/t/007_wal.pl
index 79087a1f6be..8ad2234453d 100644
--- a/src/bin/pg_verifybackup/t/007_wal.pl
+++ b/src/bin/pg_verifybackup/t/007_wal.pl
@@ -42,10 +42,10 @@ command_ok([ 'pg_verifybackup', '--no-parse-wal', $backup_path ],
 command_ok(
 	[
 		'pg_verifybackup',
-		'--wal-directory' => $relocated_pg_wal,
+		'--wal-path' => $relocated_pg_wal,
 		$backup_path
 	],
-	'--wal-directory can be used to specify WAL directory');
+	'--wal-path can be used to specify WAL directory');
 
 # Move directory back to original location.
 rename($relocated_pg_wal, $original_pg_wal) || die "rename pg_wal back: $!";
-- 
2.47.1



  [application/octet-stream] v10-0008-pg_verifybackup-enabled-WAL-parsing-for-tar-form.patch (9.9K, 9-v10-0008-pg_verifybackup-enabled-WAL-parsing-for-tar-form.patch)
  download | inline diff:
From 31791ed0320af216c80a8a72017670f0b0e06752 Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Tue, 25 Nov 2025 17:34:26 +0530
Subject: [PATCH v10 8/8] pg_verifybackup: enabled WAL parsing for tar-format
 backup

Now that pg_waldump supports decoding from tar archives, we should
leverage this functionality to remove the previous restriction on WAL
parsing for tar-backed formats.
---
 doc/src/sgml/ref/pg_verifybackup.sgml         |  5 +-
 src/bin/pg_verifybackup/pg_verifybackup.c     | 66 +++++++++++++------
 src/bin/pg_verifybackup/t/002_algorithm.pl    |  4 --
 src/bin/pg_verifybackup/t/003_corruption.pl   |  4 +-
 src/bin/pg_verifybackup/t/008_untar.pl        |  5 +-
 src/bin/pg_verifybackup/t/010_client_untar.pl |  5 +-
 6 files changed, 50 insertions(+), 39 deletions(-)

diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml
index e9b8bfd51b1..16b50b5a4df 100644
--- a/doc/src/sgml/ref/pg_verifybackup.sgml
+++ b/doc/src/sgml/ref/pg_verifybackup.sgml
@@ -36,10 +36,7 @@ PostgreSQL documentation
    <literal>backup_manifest</literal> generated by the server at the time
    of the backup. The backup may be stored either in the "plain" or the "tar"
    format; this includes tar-format backups compressed with any algorithm
-   supported by <application>pg_basebackup</application>. However, at present,
-   <literal>WAL</literal> verification is supported only for plain-format
-   backups. Therefore, if the backup is stored in tar-format, the
-   <literal>-n, --no-parse-wal</literal> option should be used.
+   supported by <application>pg_basebackup</application>.
   </para>
 
   <para>
diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c
index 97178d585c3..baa5f9dfa5e 100644
--- a/src/bin/pg_verifybackup/pg_verifybackup.c
+++ b/src/bin/pg_verifybackup/pg_verifybackup.c
@@ -74,7 +74,9 @@ pg_noreturn static void report_manifest_error(JsonManifestParseContext *context,
 											  const char *fmt,...)
 			pg_attribute_printf(2, 3);
 
-static void verify_tar_backup(verifier_context *context, DIR *dir);
+static void verify_tar_backup(verifier_context *context, DIR *dir,
+							  char **base_archive_path,
+							  char **wal_archive_path);
 static void verify_plain_backup_directory(verifier_context *context,
 										  char *relpath, char *fullpath,
 										  DIR *dir);
@@ -83,7 +85,9 @@ static void verify_plain_backup_file(verifier_context *context, char *relpath,
 static void verify_control_file(const char *controlpath,
 								uint64 manifest_system_identifier);
 static void precheck_tar_backup_file(verifier_context *context, char *relpath,
-									 char *fullpath, SimplePtrList *tarfiles);
+									 char *fullpath, SimplePtrList *tarfiles,
+									 char **base_archive_path,
+									 char **wal_archive_path);
 static void verify_tar_file(verifier_context *context, char *relpath,
 							char *fullpath, astreamer *streamer);
 static void report_extra_backup_files(verifier_context *context);
@@ -136,6 +140,8 @@ main(int argc, char **argv)
 	bool		no_parse_wal = false;
 	bool		quiet = false;
 	char	   *wal_path = NULL;
+	char	   *base_archive_path = NULL;
+	char	   *wal_archive_path = NULL;
 	char	   *pg_waldump_path = NULL;
 	DIR		   *dir;
 
@@ -327,17 +333,6 @@ main(int argc, char **argv)
 		pfree(path);
 	}
 
-	/*
-	 * XXX: In the future, we should consider enhancing pg_waldump to read WAL
-	 * files from an archive.
-	 */
-	if (!no_parse_wal && context.format == 't')
-	{
-		pg_log_error("pg_waldump cannot read tar files");
-		pg_log_error_hint("You must use -n/--no-parse-wal when verifying a tar-format backup.");
-		exit(1);
-	}
-
 	/*
 	 * Perform the appropriate type of verification appropriate based on the
 	 * backup format. This will close 'dir'.
@@ -346,7 +341,7 @@ main(int argc, char **argv)
 		verify_plain_backup_directory(&context, NULL, context.backup_directory,
 									  dir);
 	else
-		verify_tar_backup(&context, dir);
+		verify_tar_backup(&context, dir, &base_archive_path, &wal_archive_path);
 
 	/*
 	 * The "matched" flag should now be set on every entry in the hash table.
@@ -364,9 +359,28 @@ main(int argc, char **argv)
 	if (context.format == 'p' && !context.skip_checksums)
 		verify_backup_checksums(&context);
 
-	/* By default, look for the WAL in the backup directory, too. */
+	/*
+	 * By default, WAL files are expected to be found in the backup directory
+	 * for plain-format backups. In the case of tar-format backups, if a
+	 * separate WAL archive is not found, the WAL files are most likely
+	 * included within the main data directory archive.
+	 */
 	if (wal_path == NULL)
-		wal_path = psprintf("%s/pg_wal", context.backup_directory);
+	{
+		if (context.format == 'p')
+			wal_path = psprintf("%s/pg_wal", context.backup_directory);
+		else if (wal_archive_path)
+			wal_path = wal_archive_path;
+		else if (base_archive_path)
+			wal_path = base_archive_path;
+		else
+		{
+			pg_log_error("WAL archive not found");
+			pg_log_error_hint("Specify the correct path using the option -w/--wal-path.  "
+							  "Or you must use -n/--no-parse-wal when verifying a tar-format backup.");
+			exit(1);
+		}
+	}
 
 	/*
 	 * Try to parse the required ranges of WAL records, unless we were told
@@ -787,7 +801,8 @@ verify_control_file(const char *controlpath, uint64 manifest_system_identifier)
  * close when we're done with it.
  */
 static void
-verify_tar_backup(verifier_context *context, DIR *dir)
+verify_tar_backup(verifier_context *context, DIR *dir, char **base_archive_path,
+				  char **wal_archive_path)
 {
 	struct dirent *dirent;
 	SimplePtrList tarfiles = {NULL, NULL};
@@ -816,7 +831,8 @@ verify_tar_backup(verifier_context *context, DIR *dir)
 			char	   *fullpath;
 
 			fullpath = psprintf("%s/%s", context->backup_directory, filename);
-			precheck_tar_backup_file(context, filename, fullpath, &tarfiles);
+			precheck_tar_backup_file(context, filename, fullpath, &tarfiles,
+									 base_archive_path, wal_archive_path);
 			pfree(fullpath);
 		}
 	}
@@ -875,11 +891,13 @@ verify_tar_backup(verifier_context *context, DIR *dir)
  *
  * The arguments to this function are mostly the same as the
  * verify_plain_backup_file. The additional argument outputs a list of valid
- * tar files.
+ * tar files, along with the full paths to the main archive and the WAL
+ * directory archive.
  */
 static void
 precheck_tar_backup_file(verifier_context *context, char *relpath,
-						 char *fullpath, SimplePtrList *tarfiles)
+						 char *fullpath, SimplePtrList *tarfiles,
+						 char **base_archive_path, char **wal_archive_path)
 {
 	struct stat sb;
 	Oid			tblspc_oid = InvalidOid;
@@ -918,9 +936,17 @@ precheck_tar_backup_file(verifier_context *context, char *relpath,
 	 * extension such as .gz, .lz4, or .zst.
 	 */
 	if (strncmp("base", relpath, 4) == 0)
+	{
 		suffix = relpath + 4;
+
+		*base_archive_path = pstrdup(fullpath);
+	}
 	else if (strncmp("pg_wal", relpath, 6) == 0)
+	{
 		suffix = relpath + 6;
+
+		*wal_archive_path = pstrdup(fullpath);
+	}
 	else
 	{
 		/* Expected a <tablespaceoid>.tar file here. */
diff --git a/src/bin/pg_verifybackup/t/002_algorithm.pl b/src/bin/pg_verifybackup/t/002_algorithm.pl
index 0556191ec9d..edc515d5904 100644
--- a/src/bin/pg_verifybackup/t/002_algorithm.pl
+++ b/src/bin/pg_verifybackup/t/002_algorithm.pl
@@ -30,10 +30,6 @@ sub test_checksums
 	{
 		# Add switch to get a tar-format backup
 		push @backup, ('--format' => 'tar');
-
-		# Add switch to skip WAL verification, which is not yet supported for
-		# tar-format backups
-		push @verify, ('--no-parse-wal');
 	}
 
 	# A backup with a bogus algorithm should fail.
diff --git a/src/bin/pg_verifybackup/t/003_corruption.pl b/src/bin/pg_verifybackup/t/003_corruption.pl
index b1d65b8aa0f..882d75d9dc2 100644
--- a/src/bin/pg_verifybackup/t/003_corruption.pl
+++ b/src/bin/pg_verifybackup/t/003_corruption.pl
@@ -193,10 +193,8 @@ for my $scenario (@scenario)
 			command_ok([ $tar, '-cf' => "$tar_backup_path/base.tar", '.' ]);
 			chdir($cwd) || die "chdir: $!";
 
-			# Now check that the backup no longer verifies. We must use -n
-			# here, because pg_waldump can't yet read WAL from a tarfile.
 			command_fails_like(
-				[ 'pg_verifybackup', '--no-parse-wal', $tar_backup_path ],
+				[ 'pg_verifybackup', $tar_backup_path ],
 				$scenario->{'fails_like'},
 				"corrupt backup fails verification: $name");
 
diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl
index ae67ae85a31..161c08c190d 100644
--- a/src/bin/pg_verifybackup/t/008_untar.pl
+++ b/src/bin/pg_verifybackup/t/008_untar.pl
@@ -47,7 +47,6 @@ my $tsoid = $primary->safe_psql(
 		SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1'));
 
 my $backup_path = $primary->backup_dir . '/server-backup';
-my $extract_path = $primary->backup_dir . '/extracted-backup';
 
 my @test_configuration = (
 	{
@@ -123,14 +122,12 @@ for my $tc (@test_configuration)
 		# Verify tar backup.
 		$primary->command_ok(
 			[
-				'pg_verifybackup', '--no-parse-wal',
-				'--exit-on-error', $backup_path,
+				'pg_verifybackup', '--exit-on-error', $backup_path,
 			],
 			"verify backup, compression $method");
 
 		# Cleanup.
 		rmtree($backup_path);
-		rmtree($extract_path);
 	}
 }
 
diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl
index 1ac7b5db75a..9670fbe4fda 100644
--- a/src/bin/pg_verifybackup/t/010_client_untar.pl
+++ b/src/bin/pg_verifybackup/t/010_client_untar.pl
@@ -32,7 +32,6 @@ print $jf $junk_data;
 close $jf;
 
 my $backup_path = $primary->backup_dir . '/client-backup';
-my $extract_path = $primary->backup_dir . '/extracted-backup';
 
 my @test_configuration = (
 	{
@@ -137,13 +136,11 @@ for my $tc (@test_configuration)
 		# Verify tar backup.
 		$primary->command_ok(
 			[
-				'pg_verifybackup', '--no-parse-wal',
-				'--exit-on-error', $backup_path,
+				'pg_verifybackup', '--exit-on-error', $backup_path,
 			],
 			"verify backup, compression $method");
 
 		# Cleanup.
-		rmtree($extract_path);
 		rmtree($backup_path);
 	}
 }
-- 
2.47.1



^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-01-16 20:38  Robert Haas <[email protected]>
  parent: Amul Sul <[email protected]>
  0 siblings, 2 replies; 7+ messages in thread

From: Robert Haas @ 2026-01-16 20:38 UTC (permalink / raw)
  To: Amul Sul <[email protected]>; +Cc: Chao Li <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>

On Fri, Jan 2, 2026 at 7:30 AM Amul Sul <[email protected]> wrote:
> Attached is the rebased version, includes some code comment improvements.

Regarding 0001, I don't think that passing the WAL segment size around
in a global variable is a good practice. It makes it hard to tell
whether the variable is guaranteed to have been set properly at all
the places where it is used. Of course, this patch set does not
introduce that problem, but I think it could avoid making it worse by
adding WalSegSz to XLogDumpPrivate instead of exporting the global
variable everywhere. That struct seems to be available everyplace
where we need the value. Actually, though, I'm inclined to remove the
file-level global as well. Patch for that attached. I'm not against
global variables in general, but I'm against global variables that are
just there to avoid passing the right stuff around via function
argument lists.

0002 claims to be just a refactoring but if
(unlikely(private->endptr_reached)) return -1 is net new code, so I
object. I realize this contradicts the review comment from Chao Li,
but the principal that refactoring shouldn't change the logic is more
important than whatever value we might get out of adding this sanity
check here -- and I bet if we look into it we'll find that there are
lots of other places in PostgreSQL that would also need to be changed
if we wanted such a sanity check everywhere that we have code like
this. I'd also like to point out that the commit message isn't really
adequate to understand why this helps achieve the overall goal of the
patch set.

0003 looks functionally correct but notice that you've stolen some but
not all of the tests from the "# various ways of specifying WAL range"
section without copying the comment or explaining in the commit
message why you've moved some tests but not others.

Regarding 0004:

Instead of making ArchivedWAL_HTAB a global variable, can we just
store a pointer to it in XLogDumpPrivate? (I would change the name to
something else, something all lower case.) Or maybe the pointer should
go someplace else, but I'm skeptical of a global variable here for the
same reasons as 0001.

I think ArchivedWALFile would be a better name than ArchivedWALEntry,
and cur_wal_file or cur_file would be a better name than cur_wal.

I'm extremely happy with the way that the hash table looks and the new
logic in init_archive_reader() to make sure that we don't need to
reread the beginning of the WAL but can reuse the data already read.
That seems like a huge improvement to me.

+               if (recptr >= endPtr)
+               {
+                       /* Discard the buffered data */
+                       resetStringInfo(&entry->buf);
+                       len = 0;
+
+                       /*
+                        * Push back the partial page data for the
current page to the
+                        * buffer, ensuring it remains full page
available for re-reading
+                        * if requested.
+                        */
+                       if (p > readBuff)
+                       {
+                               Assert((count - nbytes) > 0);
+                               appendBinaryStringInfo(&entry->buf,
readBuff, count - nbytes);
+                       }
+               }

It feels pretty inelegant to me to pull data back into &entry->buf
from readBuff here. I think what you should do is just do this test
once, before entering the while (nbytes > 0) loop. Then you'll always
have p == readBuff, so the nested if statement is no longer required.
Also, the way you have it, if this triggers for the first loop
iteration, it will trigger for every subsquent loop iteration, which
seems wasteful.

+               if (len > 0 && recptr > startPtr)
+               {
+                       int                     skipBytes = 0;
+
+                       /*
+                        * The required offset is not at the start of
the buffer, so skip
+                        * bytes until reaching the desired offset of
the target page.
+                        */
+                       skipBytes = recptr - startPtr;
+
+                       buf += skipBytes;
+                       len -= skipBytes;
+               }

This logic seems pretty confusing. It looks as though len can go
negative, because what if skipBytes > len? I'm not sure that's really
possible here, but there's no comments explaining why it can't and no
assertion verifying that it doesn't. I think you should probably try
to unify this with the following if statement that is gated by (len >
0). In other places where we have code like this, what I've usually
found useful is to compute the number of bytes to copy as Min(bytes
available from source, bytes that will fit into the target), and I
think that technique might be useful here. For example, in
BlockRefTableRead, length is always the number of bytes that the
caller wants which we haven't yet read, and we compute bytes_to_copy =
Min(length, buffer->used - buffer->cursor). We then memcpy(data,
&buffer->data[buffer->cursor], bytes_to_copy) and then adjust data and
length. A similar bit of code not written by me can be found in
WALReadFromBuffers, which maintains nbytes as the remaining number of
bytes to be copied and then computes the size of the next copy as
npagebytes = Min(nbytes, XLOG_BLCKSZ - offset). I think your if (len >
0 && recptr > startPtr) block is trying to do something similar to
what the Min() is doing in those examples, but I find it hard to
understand whether it is doing it correctly. I think you need some
combination of clearer variable names, more comments, and/or rewriting
this to be easier to read and understand without help.

+               entry = privateInfo->cur_wal;
+
+               /* Fetch more data */
+               if (read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
+                       break;                  /* archive file ended */

Why do we set entry to privateInfo->cur_wal and then call
read_archive_file() afterwards? Doesn't read_archive_file() have the
potential to change cur_wal? If so, don't we want to look at the entry
that is current after we've called read_archive_file(), rather than
before? Also, to harp on the naming a bit more, stuff like entry =
privateInfo->cur_wal kind of shows that you haven't made the naming
real consistent. This could be more clear if it said something like
wal_file = privateInfo->cur_wal_file.

+               /* Found the required entry */
+               if (entry->segno == segno)
+                       return entry;
+
+               /*
+                * Ignore if the timeline is different or the current
segment is not
+                * the desired one.
+                */
+               if (privateInfo->timeline != entry->timeline ||
+                       privateInfo->startSegNo > entry->segno ||
+                       privateInfo->endSegNo < entry->segno)
+               {
+                       privateInfo->cur_wal = NULL;
+                       continue;
+               }
+

I don't really understand what's happening here. One thing that's odd
is that the first of these two if statements considers the timeline
and the second does not. It seems unlikely that this is correct. I
suspect that the hash table needs to be keyed by (TLI, segno) rather
than just by TLI, and that get_archive_wal_entry needs to take both a
TLI and a segment number as an argument.  Otherwise, what could
possibly prevent the first if statement from returning the correct
segment from the wrong timeline?

(Likewise, I think stuff in 0005 like prepare_tmp_write() should work
on a TLI and a segno, not just a segno. There probably should be very
few places in the patch where you pass only a segno and no TLI.
get_tmp_walseg_path() is a pretty clear sign of the problem: If you
were including the TLI in the file name, you would just use
XLogFileName() here, but because you left out the TLI, you end up
having to invent your own file naming convention. That seems pretty
bad: wouldn't we like the temporary file names to have the same names
as the files inside the archive? What happens if the same WAL file
exists in the archive with two different TLIs?)

Another thing that's odd is: why is this function resetting
privateInfo->cur_wal to NULL? I feel like it should be the job of
astreamer_waldump_content() to manage the value of
privateInfo->cur_wal, and this function doesn't seem to have any
business touching it. Even if it thinks that we want to ignore that
WAL file for purposes of *this* call to get_archive_wal_entry(),
surely it has no right to decide that the *next* call to this function
should ignore that entry as well.

Honestly, I don't really understand why we need to poke into
privateInfo at all here. I feel like the shape of this loop should be:
while (1) { if (read_archive_file(...) == 0) break; if (we have the
correct entry) return entry; } }. It seems to me that it ought to be
the job of the archive streamer to handle everything else, including
buffering more data and spilling to files. This should just iterate,
repeatedly calling read_archive_file(), until we either get to the
file we want or run out of archive.

+               /*
+                * XXX: If the segment being read not the requested
one, the data must
+                * be buffered, as we currently lack the mechanism to
write it to a
+                * temporary file. This is a known limitation that
will be fixed in the
+                * next patch, as the buffer could grow up to the full
WAL segment
+                * size.
+                */
+               if (segno > entry->segno)
+                       continue;

I couldn't figure out what was going on here for a while. Then I
realized that this makes some sense: if (segno > entry->segno), that
means that the segment number that we've found in the archive is older
than the requested segment number. Because we don't back up, that
means we don't need the data. I think the idea is that we would fall
through here and buffer the data if this if-test fails, but patch 0005
actually removes these two lines of code, so I think something here is
kind of messed up. I think the result will be that with both patches
applied, we'll buffer data we don't actually need for anything.

Generally, I think that some details here should be revisited:

+typedef struct ArchivedWALEntry
+{
+       uint32          status;                 /* hash status */
+       XLogSegNo       segno;                  /* hash key: WAL
segment number */
+       TimeLineID      timeline;               /* timeline of this wal file */
+
+       StringInfoData buf;
+       bool            tmpseg_exists;  /* spill file exists? */
+
+       int                     total_read;             /* total read
of archived WAL segment */
+} ArchivedWALEntry;

The first few members are great, except that timeline needs to be part
of the hash key as well, so you probably need to make a two-element
struct with a segno and a TLI and make that struct be the thing that
comes right after status. After that, I'm not so sure. Essentially, I
think what's happening here is that segno*file_size+total_read is the
end LSN for buf, and the start LSN value is that value minus buf.len.
And if tmpseg_exists is true, then the buffer might be empty, and you
can read however much of the segment we found from the temp file. I
find that confusing. If we're going to do something like this, I feel
like it would make more sense to store XLogRecPtr starttli instead of
int total_read, so that we explicitly mention the first LSN stored in
the buffer, and you infer the ending LSN from the length of the
buffer.

But, stepping back a bit, why do we have a StringInfoData in every
ArchivedWALEntry? It makes sense to do it that way if we're going to
buffer all of the WAL data in memory. In that case, you need a
separate buffer for every ArchivedWALEntry, and so this is logical.
But if we're going to spill to disk, then there should only ever be
one buffer in use at any given time, so why have a buffer in every
ArchivedWALEntry instead of a single buffer inside of XLogDumpPrivate?
We could choose to treat the hash table as the set of previous
segments for which we have written data to disk, and the current
segment and any associated buffer would be tracked inside the
XLogDumpPrivate or astreamer_waldump, which would mean that only one
copy of them would exist. I'm tempted to say that the direction I'm
sketching here is the right way to go, but I'm not 100% certain that's
true. It could be write to have a buffer per WAL file, as you have
here, but if so, we should know why we're doing that and how it fits
into the design, and I'm not currently seeing a real reason for it.

+       if (strstr(member->pathname, "PaxHeaders."))
+               return false;

There is no way that a filename containing the string "PaxHeaders."
could ever pass the IsXLogFileName test just above. We shouldn't need
this.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachments:

  [application/octet-stream] 0001-Remove-file-level-global-WalSegSz.patch.nocfbot (4.6K, 2-0001-Remove-file-level-global-WalSegSz.patch.nocfbot)
  download

^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-01-19 11:13  Amul Sul <[email protected]>
  parent: Robert Haas <[email protected]>
  1 sibling, 1 reply; 7+ messages in thread

From: Amul Sul @ 2026-01-19 11:13 UTC (permalink / raw)
  To: Robert Haas <[email protected]>; +Cc: Chao Li <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>

On Sat, Jan 17, 2026 at 2:08 AM Robert Haas <[email protected]> wrote:
>
> On Fri, Jan 2, 2026 at 7:30 AM Amul Sul <[email protected]> wrote:
> > Attached is the rebased version, includes some code comment improvements.
>

Thanks for the feedback. I am replying to a few comments inline below
regarding elements that are unlikely to change -- explaining the
reasoning why I believe they are correct -- just to ensure we are
aligned. For the remaining points, I generally agree and will either
implement the improvements or provide my reasoning if I don't.

>
> +               if (len > 0 && recptr > startPtr)
> +               {
> +                       int                     skipBytes = 0;
> +
> +                       /*
> +                        * The required offset is not at the start of
> the buffer, so skip
> +                        * bytes until reaching the desired offset of
> the target page.
> +                        */
> +                       skipBytes = recptr - startPtr;
> +
> +                       buf += skipBytes;
> +                       len -= skipBytes;
> +               }
>
> This logic seems pretty confusing. It looks as though len can go
> negative, because what if skipBytes > len? I'm not sure that's really
> possible here, but there's no comments explaining why it can't and no
> assertion verifying that it doesn't. I think you should probably try
> to unify this with the following if statement that is gated by (len >
> 0).

The preceding check if (recptr >= endPtr) ensures that within this
block, recptr is always less than endPtr. Since startPtr is derived as
endPtr - len, recptr naturally falls within the expected range (i.e.,
recptr - startPtr <= len). However, I agree that we can simplify this
by combining it with the subsequent if statement and/or by adding an
assert there.

>
> +               entry = privateInfo->cur_wal;
> +
> +               /* Fetch more data */
> +               if (read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
> +                       break;                  /* archive file ended */
>
> Why do we set entry to privateInfo->cur_wal and then call
> read_archive_file() afterwards? Doesn't read_archive_file() have the
> potential to change cur_wal? If so, don't we want to look at the entry
> that is current after we've called read_archive_file(), rather than
> before? Also, to harp on the naming a bit more, stuff like entry =
> privateInfo->cur_wal kind of shows that you haven't made the naming
> real consistent. This could be more clear if it said something like
> wal_file = privateInfo->cur_wal_file.

I see your point, and you’re right that assigning the entry to
privateInfo->cur_wal here seems useless; however, it becomes necessary
for the logic in patch 0005. I assigned it just before
read_archive_file() because that function can change where the current
WAL ends and a new one starts. I need to track the completed WAL entry
at that moment, as it may contain data that needs to be spilled to a
temporary file. I'll add a comment to clarify this. I also agree with
your feedback on variable naming and will improve that in the next
revision.

>
> +               /* Found the required entry */
> +               if (entry->segno == segno)
> +                       return entry;
> +
> +               /*
> +                * Ignore if the timeline is different or the current
> segment is not
> +                * the desired one.
> +                */
> +               if (privateInfo->timeline != entry->timeline ||
> +                       privateInfo->startSegNo > entry->segno ||
> +                       privateInfo->endSegNo < entry->segno)
> +               {
> +                       privateInfo->cur_wal = NULL;
> +                       continue;
> +               }
> +
>
> I don't really understand what's happening here. One thing that's odd
> is that the first of these two if statements considers the timeline
> and the second does not. It seems unlikely that this is correct. I
> suspect that the hash table needs to be keyed by (TLI, segno) rather
> than just by TLI, and that get_archive_wal_entry needs to take both a
> TLI and a segment number as an argument.  Otherwise, what could
> possibly prevent the first if statement from returning the correct
> segment from the wrong timeline?

Yeah, I need to swap the positions of these checks.

>
> Another thing that's odd is: why is this function resetting
> privateInfo->cur_wal to NULL? I feel like it should be the job of
> astreamer_waldump_content() to manage the value of
> privateInfo->cur_wal, and this function doesn't seem to have any
> business touching it. Even if it thinks that we want to ignore that
> WAL file for purposes of *this* call to get_archive_wal_entry(),
> surely it has no right to decide that the *next* call to this function
> should ignore that entry as well.
>
> Honestly, I don't really understand why we need to poke into
> privateInfo at all here. I feel like the shape of this loop should be:
> while (1) { if (read_archive_file(...) == 0) break; if (we have the
> correct entry) return entry; } }. It seems to me that it ought to be
> the job of the archive streamer to handle everything else, including
> buffering more data and spilling to files. This should just iterate,
> repeatedly calling read_archive_file(), until we either get to the
> file we want or run out of archive.
>

We previously tried having the astreamer function perform all
necessary checks (e.g., TLI and segment number), but this became
problematic in init_archive_reader when we needed to read a single WAL
page just to determine the WAL segment size without any validation. To
bypass those checks, I initially used a READ_ANY_WAL() macro, but it
felt like a hack.

In the current design, the archive streamer is responsible only for
reading and copying data to the buffer. To optimize this and avoid
unnecessary memcpy and  data buffering for WAL files that aren't
needed, we set privateInfo->cur_wal to NULL, signaling the streamer
code to skip those operations.

But, I am thinking that instead of setting privateInfo->cur_wal to
NULL, we could simply discard the buffer data at that place and let
memcpy in astreamer_content copy if it would be that much of an issue.

> +               /*
> +                * XXX: If the segment being read not the requested
> one, the data must
> +                * be buffered, as we currently lack the mechanism to
> write it to a
> +                * temporary file. This is a known limitation that
> will be fixed in the
> +                * next patch, as the buffer could grow up to the full
> WAL segment
> +                * size.
> +                */
> +               if (segno > entry->segno)
> +                       continue;
>
> I couldn't figure out what was going on here for a while. Then I
> realized that this makes some sense: if (segno > entry->segno), that
> means that the segment number that we've found in the archive is older
> than the requested segment number. Because we don't back up, that
> means we don't need the data. I think the idea is that we would fall
> through here and buffer the data if this if-test fails, but patch 0005
> actually removes these two lines of code, so I think something here is
> kind of messed up. I think the result will be that with both patches
> applied, we'll buffer data we don't actually need for anything.
>

Yeah, should discard buffered data too.

> Generally, I think that some details here should be revisited:
>
> +typedef struct ArchivedWALEntry
> +{
> +       uint32          status;                 /* hash status */
> +       XLogSegNo       segno;                  /* hash key: WAL
> segment number */
> +       TimeLineID      timeline;               /* timeline of this wal file */
> +
> +       StringInfoData buf;
> +       bool            tmpseg_exists;  /* spill file exists? */
> +
> +       int                     total_read;             /* total read
> of archived WAL segment */
> +} ArchivedWALEntry;
>
> The first few members are great, except that timeline needs to be part
> of the hash key as well, so you probably need to make a two-element
> struct with a segno and a TLI and make that struct be the thing that
> comes right after status. After that, I'm not so sure. Essentially, I
> think what's happening here is that segno*file_size+total_read is the
> end LSN for buf, and the start LSN value is that value minus buf.len.
> And if tmpseg_exists is true, then the buffer might be empty, and you
> can read however much of the segment we found from the temp file. I
> find that confusing. If we're going to do something like this, I feel
> like it would make more sense to store XLogRecPtr starttli instead of
> int total_read, so that we explicitly mention the first LSN stored in
> the buffer, and you infer the ending LSN from the length of the
> buffer.
>

total_read is not only the total of buffered data; it also includes
data that has been spilled to a temporary file. If the buffer is not
empty, the most recently read data from the archive exists there;
otherwise, if a temporary file exists, that data will be found there.
In short, it tracks the total bytes of WAL that have been read from
the archive. I will improve the code comment to reflect this.

> But, stepping back a bit, why do we have a StringInfoData in every
> ArchivedWALEntry? It makes sense to do it that way if we're going to
> buffer all of the WAL data in memory. In that case, you need a
> separate buffer for every ArchivedWALEntry, and so this is logical.
> But if we're going to spill to disk, then there should only ever be
> one buffer in use at any given time, so why have a buffer in every
> ArchivedWALEntry instead of a single buffer inside of XLogDumpPrivate?
> We could choose to treat the hash table as the set of previous
> segments for which we have written data to disk, and the current
> segment and any associated buffer would be tracked inside the
> XLogDumpPrivate or astreamer_waldump, which would mean that only one
> copy of them would exist. I'm tempted to say that the direction I'm
> sketching here is the right way to go, but I'm not 100% certain that's
> true. It could be write to have a buffer per WAL file, as you have
> here, but if so, we should know why we're doing that and how it fits
> into the design, and I'm not currently seeing a real reason for it.
>

In the earlier version, we had only one buffer where we copied
streamed data, but a problem (previously discussed [1]) arose when the
archive streamer reads data from two WAL files -- i.e., one is ending
and the next is starting. We have to keep track of where the previous
file ends and the next one starts, along with the WAL segment
information that each belongs to.

> +       if (strstr(member->pathname, "PaxHeaders."))
> +               return false;
>
> There is no way that a filename containing the string "PaxHeaders."
> could ever pass the IsXLogFileName test just above. We shouldn't need
> this.
>

This checks the directory name (e.g.,
x_dir/y_dir/PaxHeaders.NNNN/wal_file_name). The name of that metadata
file is exactly the same as the WAL file name, which is why
IsXLogFileName() doesn't help here.

Regards,
Amul

1] http://postgr.es/m/CA+Tgmoardk4VuthHc23vov+AVkhq7eT0mFUs-2ctAnP1uiTaog@mail.gmail.com






^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-01-19 13:30  Robert Haas <[email protected]>
  parent: Amul Sul <[email protected]>
  0 siblings, 1 reply; 7+ messages in thread

From: Robert Haas @ 2026-01-19 13:30 UTC (permalink / raw)
  To: Amul Sul <[email protected]>; +Cc: Chao Li <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>

On Mon, Jan 19, 2026 at 6:14 AM Amul Sul <[email protected]> wrote:
> We previously tried having the astreamer function perform all
> necessary checks (e.g., TLI and segment number), but this became
> problematic in init_archive_reader when we needed to read a single WAL
> page just to determine the WAL segment size without any validation. To
> bypass those checks, I initially used a READ_ANY_WAL() macro, but it
> felt like a hack.
>
> In the current design, the archive streamer is responsible only for
> reading and copying data to the buffer. To optimize this and avoid
> unnecessary memcpy and  data buffering for WAL files that aren't
> needed, we set privateInfo->cur_wal to NULL, signaling the streamer
> code to skip those operations.

Hmm. In that case, the code definitely deserves a comment, as that was
not at all clear to me.

> But, I am thinking that instead of setting privateInfo->cur_wal to
> NULL, we could simply discard the buffer data at that place and let
> memcpy in astreamer_content copy if it would be that much of an issue.

I don't think doing unnecessary copying is the right way forward. The
copying itself could be expensive, but I am a little concerned about
the memory utilization of this code. Suppose the user has increased
the WAL segment size to 1GB or even higher. It seems like we could
buffer a whole segment or maybe even more in some scenarios. If we
avoid copying data we don't need, then we also avoid buffering it in
memory.

In terms of the separation of concerns, we could view setting
privateInfo->cur_wal = NULL as a form of signaling, a way for this
code to tell the astreamer that it doesn't need the data buffering.
However, I think it might be better to make the signaling more
explicit. Instead of having the caller directly set the buffer to
NULL, or directly trim data out of the buffer, maybe it should set
some value in privateInfo that tells the astreamer what to do. For
instance, suppose it sets the oldest LSN that it might still care
about in privateInfo, and then the astreamer is free to do skip
copying of any data prior to that LSN, and discard any that it already
has. Especially if properly commented, I think this might be more
clear than what you have now. I am not 100% sure that's the right
idea, though. I just think right now it's a bit murky who does what
and why the division of responsibilities is what it is.

For example, read_archive_wal_page() currently has some responsibility
for discarding data, but is that because it is the right place to
discard data, or is that just because that's where we have an LSN
available that tells us what we can discard? I don't know, but an
explicit signaling mechanism like what I describe in the previous
paragraph can make it possible for those two things to happen in
different places.

> In the earlier version, we had only one buffer where we copied
> streamed data, but a problem (previously discussed [1]) arose when the
> archive streamer reads data from two WAL files -- i.e., one is ending
> and the next is starting. We have to keep track of where the previous
> file ends and the next one starts, along with the WAL segment
> information that each belongs to.

Hmm, interesting. My first thought was that if we were starting a new
WAL file, isn't it time to flush the data from the old one to a spill
file? But maybe it's not, if the WAL reader can reread records, and a
record can span file boundaries. But probably we need some comment so
that the next reader is not confused, and we definitely need to make
sure that the structure members in ArchivedWALEntry are well-chosen
and well-documented.

> > +       if (strstr(member->pathname, "PaxHeaders."))
> > +               return false;
> >
> > There is no way that a filename containing the string "PaxHeaders."
> > could ever pass the IsXLogFileName test just above. We shouldn't need
> > this.
> >
>
> This checks the directory name (e.g.,
> x_dir/y_dir/PaxHeaders.NNNN/wal_file_name). The name of that metadata
> file is exactly the same as the WAL file name, which is why
> IsXLogFileName() doesn't help here.

I think this code should only be considering files in the toplevel
directory, and skipping over any directories it finds. I absolutely
promise I am not going to commit anything that is specifically looking
for PaxHeaders. Nothing we've ever done with tar files up to now has
required that, and I don't think this should, either.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-01-23 12:27  Amul Sul <[email protected]>
  parent: Robert Haas <[email protected]>
  0 siblings, 0 replies; 7+ messages in thread

From: Amul Sul @ 2026-01-23 12:27 UTC (permalink / raw)
  To: Robert Haas <[email protected]>; +Cc: Chao Li <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>

On Mon, Jan 19, 2026 at 7:01 PM Robert Haas <[email protected]> wrote:
>
> > But, I am thinking that instead of setting privateInfo->cur_wal to
> > NULL, we could simply discard the buffer data at that place and let
> > memcpy in astreamer_content copy if it would be that much of an issue.
>
> I don't think doing unnecessary copying is the right way forward. The
> copying itself could be expensive, but I am a little concerned about
> the memory utilization of this code. Suppose the user has increased
> the WAL segment size to 1GB or even higher. It seems like we could
> buffer a whole segment or maybe even more in some scenarios. If we
> avoid copying data we don't need, then we also avoid buffering it in
> memory.
>
> In terms of the separation of concerns, we could view setting
> privateInfo->cur_wal = NULL as a form of signaling, a way for this
> code to tell the astreamer that it doesn't need the data buffering.
> However, I think it might be better to make the signaling more
> explicit. Instead of having the caller directly set the buffer to
> NULL, or directly trim data out of the buffer, maybe it should set
> some value in privateInfo that tells the astreamer what to do. For
> instance, suppose it sets the oldest LSN that it might still care
> about in privateInfo, and then the astreamer is free to do skip
> copying of any data prior to that LSN, and discard any that it already
> has. Especially if properly commented, I think this might be more
> clear than what you have now. I am not 100% sure that's the right
> idea, though. I just think right now it's a bit murky who does what
> and why the division of responsibilities is what it is.
>

The current implementation of astreamer_waldump_content() does not
have sufficient information to skip WAL segments during the initial
hash entry preparation and data-copying phase. Because the filtration
parameters -- which determine if a segment should be skipped -- depend
on the WAL segment size, we must first read a WAL page through the
streamer to calculate that size, which is done in
init_archive_reader(). Therefore, the responsibility of the archive
streamer is strictly to copy the WAL segment data to the buffer.

The skipping decision is handled inside get_archive_wal_entry(), which
sets privateInfo->cur_wal to NULL. In the next version, I am planning
to add a separate routine (with better commenting) that, along with
setting the pointer to NULL, releases that hash entry to avoid
unnecessary memory usage.

Another option I previously considered was adding the filtration logic
inside the archive streamer itself. However, since the very first read
is required to calculate the WAL segment size, the filter check cannot
be performed immediately. However, we could send a signal to the
archive streamer via privateInfo (e.g., a read_any_wal or
skip_wal_check boolean flag) to disable the filtration check until the
size is calculated. But that approach isn't very elegant; if the first
WAL page we read belongs to a segment we actually want to skip, we
would still have to run the filter check and handle the skip/removal
logic outside of the streamer (i.e., inside init_archive_reader()).
This would result in performing the same filtration check in two
different places.

Therefore, I believe performing the filtration check through
get_archive_wal_entry() and then calling a routine to clear
privateInfo->cur_wal and the hash entry is the better approach, IMO.
Additionally, once we consume a WAL file and move to the next one, the
hash entry and buffer for that WAL can be released to prevent
unnecessary memory consumption by calling the same routine that I am
planning to add.

> > > +       if (strstr(member->pathname, "PaxHeaders."))
> > > +               return false;
> > >
> > > There is no way that a filename containing the string "PaxHeaders."
> > > could ever pass the IsXLogFileName test just above. We shouldn't need
> > > this.
> > >
> >
> > This checks the directory name (e.g.,
> > x_dir/y_dir/PaxHeaders.NNNN/wal_file_name). The name of that metadata
> > file is exactly the same as the WAL file name, which is why
> > IsXLogFileName() doesn't help here.
>
> I think this code should only be considering files in the toplevel
> directory, and skipping over any directories it finds. I absolutely
> promise I am not going to commit anything that is specifically looking
> for PaxHeaders. Nothing we've ever done with tar files up to now has
> required that, and I don't think this should, either.

Ok, fair enough, my intention was to allow decoding of valid WAL data
from any directory in the tar archive, but I will go ahead and add
that restriction as suggested.

Regards,
Amul

^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-01-27 12:23  Amul Sul <[email protected]>
  parent: Robert Haas <[email protected]>
  1 sibling, 1 reply; 7+ messages in thread

From: Amul Sul @ 2026-01-27 12:23 UTC (permalink / raw)
  To: Robert Haas <[email protected]>; +Cc: Chao Li <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>

On Sat, Jan 17, 2026 at 2:08 AM Robert Haas <[email protected]> wrote:
>
> On Fri, Jan 2, 2026 at 7:30 AM Amul Sul <[email protected]> wrote:
> > Attached is the rebased version, includes some code comment improvements.
>

Here are my replies to the remaining review comments. An updated
version of the patch was posted in the previous email [1].

> Regarding 0001, I don't think that passing the WAL segment size around
> in a global variable is a good practice. It makes it hard to tell
> whether the variable is guaranteed to have been set properly at all
> the places where it is used. Of course, this patch set does not
> introduce that problem, but I think it could avoid making it worse by
> adding WalSegSz to XLogDumpPrivate instead of exporting the global
> variable everywhere. That struct seems to be available everyplace
> where we need the value. Actually, though, I'm inclined to remove the
> file-level global as well. Patch for that attached. I'm not against
> global variables in general, but I'm against global variables that are
> just there to avoid passing the right stuff around via function
> argument lists.
>

Yes, it looks good. Added to the v11 patch set.

> 0002 claims to be just a refactoring but if
> (unlikely(private->endptr_reached)) return -1 is net new code, so I
> object. I realize this contradicts the review comment from Chao Li,
> but the principal that refactoring shouldn't change the logic is more
> important than whatever value we might get out of adding this sanity
> check here -- and I bet if we look into it we'll find that there are
> lots of other places in PostgreSQL that would also need to be changed
> if we wanted such a sanity check everywhere that we have code like
> this. I'd also like to point out that the commit message isn't really
> adequate to understand why this helps achieve the overall goal of the
> patch set.
>

Understood, removed it.

> 0003 looks functionally correct but notice that you've stolen some but
> not all of the tests from the "# various ways of specifying WAL range"
> section without copying the comment or explaining in the commit
> message why you've moved some tests but not others.
>

The excluded tests provide WAL files as input to pg_waldump, which is
incompatible with tar archive input. I have updated the commit message
to reflect this.

> Regarding 0004:
>
> Instead of making ArchivedWAL_HTAB a global variable, can we just
> store a pointer to it in XLogDumpPrivate? (I would change the name to
> something else, something all lower case.) Or maybe the pointer should
> go someplace else, but I'm skeptical of a global variable here for the
> same reasons as 0001.
>

Done.

> I think ArchivedWALFile would be a better name than ArchivedWALEntry,
> and cur_wal_file or cur_file would be a better name than cur_wal.
>

Done.

> I'm extremely happy with the way that the hash table looks and the new
> logic in init_archive_reader() to make sure that we don't need to
> reread the beginning of the WAL but can reuse the data already read.
> That seems like a huge improvement to me.
>
> +               if (recptr >= endPtr)
> +               {
> +                       /* Discard the buffered data */
> +                       resetStringInfo(&entry->buf);
> +                       len = 0;
> +
> +                       /*
> +                        * Push back the partial page data for the
> current page to the
> +                        * buffer, ensuring it remains full page
> available for re-reading
> +                        * if requested.
> +                        */
> +                       if (p > readBuff)
> +                       {
> +                               Assert((count - nbytes) > 0);
> +                               appendBinaryStringInfo(&entry->buf,
> readBuff, count - nbytes);
> +                       }
> +               }
>
> It feels pretty inelegant to me to pull data back into &entry->buf
> from readBuff here. I think what you should do is just do this test
> once, before entering the while (nbytes > 0) loop. Then you'll always
> have p == readBuff, so the nested if statement is no longer required.
> Also, the way you have it, if this triggers for the first loop
> iteration, it will trigger for every subsquent loop iteration, which
> seems wasteful.
>

I'm not sure I follow the logic. How would this trigger a copy from
readBuff back into &entry->buf on every iteration? I  believe that
this is a rare case: it should only happen when we reach the end of
the buffer with less than the requested 8kB available. In that
scenario, we copy the remaining bytes, reset the buffer, and then --
since (p > readBuff)  -- perform the pull-back before invoking the
archive stream for the next segment. Am I missing something in the
loop condition?


I replied to the rest of the review comments previously, but I’m not
sure whether replying in multiple emails is the correct approach.
Perhaps, since the patch is taking a bit longer, I wanted to keep the
conversation moving.

Regards,
Amul

1] http://postgr.es/m/CAAJ_b97mWgAkj7CqdRsof4N_J-gxqjKh79b0Aw3JRo0PoZZCJg@mail.gmail.com






^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-01-27 15:55  Robert Haas <[email protected]>
  parent: Amul Sul <[email protected]>
  0 siblings, 0 replies; 7+ messages in thread

From: Robert Haas @ 2026-01-27 15:55 UTC (permalink / raw)
  To: Amul Sul <[email protected]>; +Cc: Chao Li <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>

On Tue, Jan 27, 2026 at 7:23 AM Amul Sul <[email protected]> wrote:
> Yes, it looks good. Added to the v11 patch set.

I'll respond to the rest of this later, but I went ahead and committed that one.

-- 
Robert Haas
EDB: http://www.enterprisedb.com






^ permalink  raw  reply  [nested|flat] 7+ messages in thread

end of thread, other threads:[~2026-01-27 15:55 UTC | newest]

Thread overview: 7+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-01-02 12:29 Re: pg_waldump: support decoding of WAL inside tarfile Amul Sul <[email protected]>
2026-01-16 20:38 ` Robert Haas <[email protected]>
2026-01-19 11:13   ` Amul Sul <[email protected]>
2026-01-19 13:30     ` Robert Haas <[email protected]>
2026-01-23 12:27       ` Amul Sul <[email protected]>
2026-01-27 12:23   ` Amul Sul <[email protected]>
2026-01-27 15:55     ` Robert Haas <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox