public inbox for [email protected]  
help / color / mirror / Atom feed
Re: pg_waldump: support decoding of WAL inside tarfile
35+ messages / 9 participants
[nested] [flat]

* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-04-01 02:05  Thomas Munro <[email protected]>
  0 siblings, 2 replies; 35+ messages in thread

From: Thomas Munro @ 2026-04-01 02:05 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; Andrew Dunstan <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>

On Mon, Mar 30, 2026 at 11:23 AM Tom Lane <[email protected]> wrote:
> Thomas Munro <[email protected]> writes:
> > Anyway, given the defaults, GNU tar + ZFS/BTRFS users must be pretty
> > unlikely to hit this in the wild, and the symptom is a confusing error
> > in a maintenance tool, not corruption, so I don't think this is a big
> > deal.  I might still try teaching the astreamer code to understand PAX
> > 1.0 when it sees it in the next cycle though, for the benefit of
> > FreeBSD users.
>
> I agree that this isn't too critical if the effects are confined to
> pg_waldump.  I believe that pg_basebackup and pg_verifybackup also use
> astreamer_tar.c, but it's not clear to me if they'd ever be asked to
> parse files made by tar(1) and not by our own sparseness-ignorant
> tar-writing code.  If they can be, that'd be a higher-priority reason
> to fill in this gap.

I pushed the workaround for the test.

Yeah I can't see any reason why pg_verifybackup --wal-path=foo.tar
won't suffer the same problem in the wild.  Again, it's not the end of
the world because it'll just fail and you'll probably eventually
figure out why.  So perhaps we should just improve our detection of
archives that we can't handle?  Straw man algorithm:

If you can't find $NAME in the archive, then check if PaxHeaders/$NAME
exists, and if so, fail with 'unsupported TAR format for WAL file "%s"
in archive "%s"' instead.  That'd probably work well enough in
practice, because astreamer_tar.c treats PAX extended header
pseudo-files as regular files (they're not, they have type 'x'), and
both GNU and BSD tar happen to use that.

POSIX doesn't require that naming, so it would in theory be more
correct to teach astreamer_tar.c to recognise PAX extended headers and
fish out enough information and link it to the following archive
member, but a simple test to improve error messaging seems like the
right level of effort here.

Here's a test patch that shows the problem on any system with GNU tar
or BSD tar and a file system that supports sparse files.  The test
succeeds because it looks for "error: could not find WAL" but the idea
would be to change it to look for a new error message like that.  My
motivation was to make this reproducible on any system, in case that's
helpful for Amul and Andrew if they're interested in trying to improve
this edge case in time for the release.  Otherwise I'll come back to
it, but probably not in time...


Attachments:

  [text/x-patch] 0001-Add-a-pg_waldump-test-with-GNU-tar-PAX-format.patch (4.2K, 2-0001-Add-a-pg_waldump-test-with-GNU-tar-PAX-format.patch)
  download | inline diff:
From 084d71f81143f0462caf03569722b5f0b2a147e6 Mon Sep 17 00:00:00 2001
From: Thomas Munro <[email protected]>
Date: Mon, 30 Mar 2026 18:20:09 +1300
Subject: [PATCH] Add a pg_waldump test with GNU tar PAX format.

XXX Update this to test for a new improved error message!

XXX Should this test run for all the scenarios?  Doesn't seem like
compression is relevant to this problem so I just added it as a
standalone test...

XXX No doubt the perl isn't the greatest...
---
 src/bin/pg_waldump/t/001_basic.pl | 73 +++++++++++++++++++++++++++++--
 1 file changed, 69 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_waldump/t/001_basic.pl b/src/bin/pg_waldump/t/001_basic.pl
index ce1f6aa30c0..7f8a319c85d 100644
--- a/src/bin/pg_waldump/t/001_basic.pl
+++ b/src/bin/pg_waldump/t/001_basic.pl
@@ -6,6 +6,7 @@ use warnings FATAL => 'all';
 use Cwd;
 use File::Copy;
 use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::RecursiveCopy;
 use PostgreSQL::Test::Utils;
 use Test::More;
 use List::Util qw(shuffle);
@@ -212,9 +213,13 @@ $node->safe_psql('postgres',
 	qq{SELECT pg_logical_emit_message(true, 'test 026', repeat('xyzxz', 123456))}
 );
 
-my ($end_lsn, $end_walfile) = split /\|/,
+my ($end_lsn, $end_walfile, $wal_segsize) = split /\|/,
   $node->safe_psql('postgres',
-	q{SELECT pg_current_wal_insert_lsn(), pg_walfile_name(pg_current_wal_insert_lsn())}
+	q{SELECT pg_current_wal_insert_lsn(),
+			 pg_walfile_name(pg_current_wal_insert_lsn()),
+			 setting
+        FROM pg_settings
+       WHERE name = 'wal_segment_size'}
   );
 
 my $default_ts_oid = $node->safe_psql('postgres',
@@ -339,7 +344,7 @@ sub test_pg_waldump
 # Create a tar archive, shuffle the file order
 sub generate_archive
 {
-	my ($archive, $directory, $compression_flags) = @_;
+	my ($archive, $directory, $compression_flags, @extra_flags) = @_;
 
 	my @files;
 	opendir my $dh, $directory or die "opendir: $!";
@@ -350,12 +355,17 @@ sub generate_archive
 	}
 	closedir $dh;
 
+	if (!@extra_flags)
+	{
+		@extra_flags = @tar_c_flags;
+	}
+
 	@files = shuffle @files;
 
 	# move into the WAL directory before archiving files
 	my $cwd = getcwd;
 	chdir($directory) || die "chdir: $!";
-	command_ok([$tar, @tar_c_flags, $compression_flags, $archive, @files]);
+	command_ok([$tar, @extra_flags, $compression_flags, $archive, @files]);
 	chdir($cwd) || die "chdir: $!";
 }
 
@@ -477,4 +487,59 @@ for my $scenario (@scenarios)
 	}
 }
 
+SKIP:
+	skip "tar command is not available", 1
+		if !defined $tar;
+
+	my @sparse_flags;
+
+	# Tell $TAR to use GNU tar's PAX sparse file archive format, so we can test
+	# our handling of that.
+
+	# GNU tar
+	@sparse_flags = ("--sparse", "--format=pax")
+		if system("$tar --sparse --format=pax -c " .
+				  $node->data_dir . "/pg_wal/* /dev/null > /dev/null") == 0;
+	# BSD tar (this is the default, but we still need to detect BSD tar)
+	@sparse_flags = ("--read-sparse", "--format=pax")
+		if system("$tar --read-sparse --format=pax -c " .
+				  $node->data_dir . "/pg_wal/* /dev/null > /dev/null") == 0;
+
+	skip "tar command doesn't support GNU PAX format for sparse files", 1
+		if !@sparse_flags;
+
+	PostgreSQL::Test::RecursiveCopy::copypath($node->data_dir . '/pg_wal',
+											  $tmp_dir . '/pg_wal_sparse');
+
+	# truncate the unused part of final WAL file
+	my $end_byte = $end_lsn;
+	$end_byte =~ s/\///;
+	$end_byte = hex($end_byte);
+	$end_byte %= $wal_segsize;
+	truncate $tmp_dir . '/pg_wal_sparse/' . $end_walfile, $end_byte;
+
+	# now re-extend it to create a hole
+	truncate $tmp_dir . '/pg_wal_sparse/' . $end_walfile, $wal_segsize;
+
+	# XXX maybe we should detect sparse files with stat (size > blocks * block
+	# size?), and skip the test if truncate failed to make one... that
+	# might happen on eg windows I think?  otherwise we'd have to tolerate
+	# the pg_waldump command succeeding OR failing with a certain message
+
+	generate_archive($tmp_dir . '/pg_wal_sparse.tar',
+					 $tmp_dir . '/pg_wal_sparse',
+					 '-cf',
+					 @sparse_flags);
+
+	# XXX change this to check for new improved error message
+	command_fails_like(
+		[
+			'pg_waldump',
+			'--path' => $tmp_dir . '/pg_wal_sparse.tar',
+			'--start' => $start_lsn,
+			'--end' => $end_lsn,
+		],
+		qr/error: could not find WAL/,
+		'fails with GNU tar PAX-format sparse files');
+
 done_testing();
-- 
2.53.0



^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-04-01 10:39  Andrew Dunstan <[email protected]>
  parent: Thomas Munro <[email protected]>
  1 sibling, 1 reply; 35+ messages in thread

From: Andrew Dunstan @ 2026-04-01 10:39 UTC (permalink / raw)
  To: Thomas Munro <[email protected]>; Tom Lane <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>


On 2026-03-31 Tu 10:05 PM, Thomas Munro wrote:
> On Mon, Mar 30, 2026 at 11:23 AM Tom Lane<[email protected]> wrote:
>> Thomas Munro<[email protected]> writes:
>>> Anyway, given the defaults, GNU tar + ZFS/BTRFS users must be pretty
>>> unlikely to hit this in the wild, and the symptom is a confusing error
>>> in a maintenance tool, not corruption, so I don't think this is a big
>>> deal.  I might still try teaching the astreamer code to understand PAX
>>> 1.0 when it sees it in the next cycle though, for the benefit of
>>> FreeBSD users.
>> I agree that this isn't too critical if the effects are confined to
>> pg_waldump.  I believe that pg_basebackup and pg_verifybackup also use
>> astreamer_tar.c, but it's not clear to me if they'd ever be asked to
>> parse files made by tar(1) and not by our own sparseness-ignorant
>> tar-writing code.  If they can be, that'd be a higher-priority reason
>> to fill in this gap.
> I pushed the workaround for the test.


It occurred to me this morning that we probably shouldn't run this test 
on Windows, and if we do we shouldn't be using /dev/null (the Windows 
equivalent of which is just "nul"). The simplest fix would just be to 
add a "!$windows_os" to the if test.


cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-04-01 13:26  Andres Freund <[email protected]>
  parent: Andrew Dunstan <[email protected]>
  0 siblings, 1 reply; 35+ messages in thread

From: Andres Freund @ 2026-04-01 13:26 UTC (permalink / raw)
  To: Andrew Dunstan <[email protected]>; +Cc: Thomas Munro <[email protected]>; Tom Lane <[email protected]>; Tomas Vondra <[email protected]>; Michael Paquier <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>

Hi,

On 2026-04-01 06:39:05 -0400, Andrew Dunstan wrote:
> 
> On 2026-03-31 Tu 10:05 PM, Thomas Munro wrote:
> > On Mon, Mar 30, 2026 at 11:23 AM Tom Lane<[email protected]> wrote:
> > > Thomas Munro<[email protected]> writes:
> > > > Anyway, given the defaults, GNU tar + ZFS/BTRFS users must be pretty
> > > > unlikely to hit this in the wild, and the symptom is a confusing error
> > > > in a maintenance tool, not corruption, so I don't think this is a big
> > > > deal.  I might still try teaching the astreamer code to understand PAX
> > > > 1.0 when it sees it in the next cycle though, for the benefit of
> > > > FreeBSD users.
> > > I agree that this isn't too critical if the effects are confined to
> > > pg_waldump.  I believe that pg_basebackup and pg_verifybackup also use
> > > astreamer_tar.c, but it's not clear to me if they'd ever be asked to
> > > parse files made by tar(1) and not by our own sparseness-ignorant
> > > tar-writing code.  If they can be, that'd be a higher-priority reason
> > > to fill in this gap.
> > I pushed the workaround for the test.
> 
> 
> It occurred to me this morning that we probably shouldn't run this test on
> Windows, and if we do we shouldn't be using /dev/null (the Windows
> equivalent of which is just "nul"). The simplest fix would just be to add a
> "!$windows_os" to the if test.

Why should we skip this test on windows?

I think we have historically been way too liberal about sprinkling
!$windows_os test disablements around. More than once there were actual bugs
that we just swept under the rug by disabling the tests that detected them.
Either we support windows or we don't.

Greetings,

Andres Freund





^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-04-01 14:19  Andrew Dunstan <[email protected]>
  parent: Andres Freund <[email protected]>
  0 siblings, 0 replies; 35+ messages in thread

From: Andrew Dunstan @ 2026-04-01 14:19 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: Thomas Munro <[email protected]>; Tom Lane <[email protected]>; Tomas Vondra <[email protected]>; Michael Paquier <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>


On 2026-04-01 We 9:26 AM, Andres Freund wrote:
> Hi,
>
> On 2026-04-01 06:39:05 -0400, Andrew Dunstan wrote:
>> On 2026-03-31 Tu 10:05 PM, Thomas Munro wrote:
>>> On Mon, Mar 30, 2026 at 11:23 AM Tom Lane<[email protected]> wrote:
>>>> Thomas Munro<[email protected]> writes:
>>>>> Anyway, given the defaults, GNU tar + ZFS/BTRFS users must be pretty
>>>>> unlikely to hit this in the wild, and the symptom is a confusing error
>>>>> in a maintenance tool, not corruption, so I don't think this is a big
>>>>> deal.  I might still try teaching the astreamer code to understand PAX
>>>>> 1.0 when it sees it in the next cycle though, for the benefit of
>>>>> FreeBSD users.
>>>> I agree that this isn't too critical if the effects are confined to
>>>> pg_waldump.  I believe that pg_basebackup and pg_verifybackup also use
>>>> astreamer_tar.c, but it's not clear to me if they'd ever be asked to
>>>> parse files made by tar(1) and not by our own sparseness-ignorant
>>>> tar-writing code.  If they can be, that'd be a higher-priority reason
>>>> to fill in this gap.
>>> I pushed the workaround for the test.
>>
>> It occurred to me this morning that we probably shouldn't run this test on
>> Windows, and if we do we shouldn't be using /dev/null (the Windows
>> equivalent of which is just "nul"). The simplest fix would just be to add a
>> "!$windows_os" to the if test.
> Why should we skip this test on windows?
>
> I think we have historically been way too liberal about sprinkling
> !$windows_os test disablements around. More than once there were actual bugs
> that we just swept under the rug by disabling the tests that detected them.
> Either we support windows or we don't.
>

Maybe I misunderstood, but I didn't think this was going to be an issue 
on NTFS.

In general I agree with you, though. I try to avoid skipping things on 
Windows.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com






^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-04-01 18:25  Tom Lane <[email protected]>
  parent: Thomas Munro <[email protected]>
  1 sibling, 2 replies; 35+ messages in thread

From: Tom Lane @ 2026-04-01 18:25 UTC (permalink / raw)
  To: Thomas Munro <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; Andrew Dunstan <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>

Thomas Munro <[email protected]> writes:
> On Mon, Mar 30, 2026 at 11:23 AM Tom Lane <[email protected]> wrote:
>> I agree that this isn't too critical if the effects are confined to
>> pg_waldump.  I believe that pg_basebackup and pg_verifybackup also use
>> astreamer_tar.c, but it's not clear to me if they'd ever be asked to
>> parse files made by tar(1) and not by our own sparseness-ignorant
>> tar-writing code.  If they can be, that'd be a higher-priority reason
>> to fill in this gap.

> Yeah I can't see any reason why pg_verifybackup --wal-path=foo.tar
> won't suffer the same problem in the wild.  Again, it's not the end of
> the world because it'll just fail and you'll probably eventually
> figure out why.  So perhaps we should just improve our detection of
> archives that we can't handle?

After reading the POSIX spec for pax format (in the pax(1) man page),
I think it's absolutely essential that we reject files that contain
pax extension headers.  Those can change the interpretation of the
following file header(s) in nearly arbitrary ways, so we have plenty
of problems besides this sparse-file issue if we just ignore them.

(Of course, later we can consider improving the code to handle them
correctly, but that ain't happening in time for v19.)

Also, if we are admitting the possibility that what we are reading
was made by a platform-supplied tar and not our own code, I think
it verges on lunacy to behave as though unsupported typeflags are
regular files.

So I think we need something more or less like the attached.

			regards, tom lane



Attachments:

  [text/x-diff] v1-tighten-tar-typeflag-handling.patch (5.3K, 2-v1-tighten-tar-typeflag-handling.patch)
  download | inline diff:
diff --git a/src/bin/pg_basebackup/astreamer_inject.c b/src/bin/pg_basebackup/astreamer_inject.c
index 14a96733bad..a5dff0ac1c6 100644
--- a/src/bin/pg_basebackup/astreamer_inject.c
+++ b/src/bin/pg_basebackup/astreamer_inject.c
@@ -224,8 +224,9 @@ astreamer_inject_file(astreamer *streamer, char *pathname, char *data,
 	strlcpy(member.pathname, pathname, MAXPGPATH);
 	member.size = len;
 	member.mode = pg_file_create_mode;
+	member.is_regular = true;
 	member.is_directory = false;
-	member.is_link = false;
+	member.is_symlink = false;
 	member.linktarget[0] = '\0';
 
 	/*
diff --git a/src/bin/pg_verifybackup/astreamer_verify.c b/src/bin/pg_verifybackup/astreamer_verify.c
index 26c98186530..99bd69ce7c9 100644
--- a/src/bin/pg_verifybackup/astreamer_verify.c
+++ b/src/bin/pg_verifybackup/astreamer_verify.c
@@ -165,7 +165,7 @@ member_verify_header(astreamer *streamer, astreamer_member *member)
 	char		pathname[MAXPGPATH];
 
 	/* We are only interested in normal files. */
-	if (member->is_directory || member->is_link)
+	if (!member->is_regular)
 		return;
 
 	/*
diff --git a/src/bin/pg_waldump/archive_waldump.c b/src/bin/pg_waldump/archive_waldump.c
index ff1cc3fa5f0..e4a4bf44a7e 100644
--- a/src/bin/pg_waldump/archive_waldump.c
+++ b/src/bin/pg_waldump/archive_waldump.c
@@ -815,7 +815,7 @@ member_is_wal_file(astreamer_waldump *mystreamer, astreamer_member *member,
 	char	   *filename;
 
 	/* We are only interested in normal files */
-	if (member->is_directory || member->is_link)
+	if (!member->is_regular)
 		return false;
 
 	if (strlen(member->pathname) < XLOG_FNAME_LEN)
diff --git a/src/fe_utils/astreamer_file.c b/src/fe_utils/astreamer_file.c
index 158e9a14f2c..0fca70a4f86 100644
--- a/src/fe_utils/astreamer_file.c
+++ b/src/fe_utils/astreamer_file.c
@@ -228,9 +228,13 @@ astreamer_extractor_content(astreamer *streamer, astreamer_member *member,
 				mystreamer->filename[fnamelen - 1] = '\0';
 
 			/* Dispatch based on file type. */
-			if (member->is_directory)
+			if (member->is_regular)
+				mystreamer->file =
+					create_file_for_extract(mystreamer->filename,
+											member->mode);
+			else if (member->is_directory)
 				extract_directory(mystreamer->filename, member->mode);
-			else if (member->is_link)
+			else if (member->is_symlink)
 			{
 				const char *linktarget = member->linktarget;
 
@@ -238,10 +242,6 @@ astreamer_extractor_content(astreamer *streamer, astreamer_member *member,
 					linktarget = mystreamer->link_map(linktarget);
 				extract_link(mystreamer->filename, linktarget);
 			}
-			else
-				mystreamer->file =
-					create_file_for_extract(mystreamer->filename,
-											member->mode);
 
 			/* Report output file change. */
 			if (mystreamer->report_output_file)
diff --git a/src/fe_utils/astreamer_tar.c b/src/fe_utils/astreamer_tar.c
index 3b094fc0328..3de6f8d6fb2 100644
--- a/src/fe_utils/astreamer_tar.c
+++ b/src/fe_utils/astreamer_tar.c
@@ -272,6 +272,9 @@ astreamer_tar_header(astreamer_tar_parser *mystreamer)
 
 	Assert(mystreamer->base.bbs_buffer.len == TAR_BLOCK_SIZE);
 
+	/* Zero out fields of *member, just for consistency. */
+	memset(member, 0, sizeof(astreamer_member));
+
 	/* Check whether we've got a block of all zero bytes. */
 	for (i = 0; i < TAR_BLOCK_SIZE; ++i)
 	{
@@ -299,12 +302,28 @@ astreamer_tar_header(astreamer_tar_parser *mystreamer)
 	member->mode = read_tar_number(&buffer[TAR_OFFSET_MODE], 8);
 	member->uid = read_tar_number(&buffer[TAR_OFFSET_UID], 8);
 	member->gid = read_tar_number(&buffer[TAR_OFFSET_GID], 8);
-	member->is_directory =
-		(buffer[TAR_OFFSET_TYPEFLAG] == TAR_FILETYPE_DIRECTORY);
-	member->is_link =
-		(buffer[TAR_OFFSET_TYPEFLAG] == TAR_FILETYPE_SYMLINK);
-	if (member->is_link)
-		strlcpy(member->linktarget, &buffer[TAR_OFFSET_LINKNAME], 100);
+
+	switch (buffer[TAR_OFFSET_TYPEFLAG])
+	{
+		case TAR_FILETYPE_PLAIN:
+		case '\0':				/* backwards compatibility hack, per POSIX */
+			member->is_regular = true;
+			break;
+		case TAR_FILETYPE_DIRECTORY:
+			member->is_directory = true;
+			break;
+		case TAR_FILETYPE_SYMLINK:
+			member->is_symlink = true;
+			strlcpy(member->linktarget, &buffer[TAR_OFFSET_LINKNAME], 100);
+			break;
+		case TAR_FILETYPE_PAX_EXTENDED:
+		case TAR_FILETYPE_PAX_EXTENDED_GLOBAL:
+			pg_fatal("pax extensions to tar format are not supported");
+			break;
+		default:
+			/* For special files, set none of the three is_xxx flags */
+			break;
+	}
 
 	/* Compute number of padding bytes. */
 	mystreamer->pad_bytes_expected = tarPaddingBytesRequired(member->size);
diff --git a/src/include/fe_utils/astreamer.h b/src/include/fe_utils/astreamer.h
index f370ef62720..8329e4efbc5 100644
--- a/src/include/fe_utils/astreamer.h
+++ b/src/include/fe_utils/astreamer.h
@@ -83,8 +83,10 @@ typedef struct
 	mode_t		mode;
 	uid_t		uid;
 	gid_t		gid;
+	/* note: special filetypes will set none of these flags */
+	bool		is_regular;
 	bool		is_directory;
-	bool		is_link;
+	bool		is_symlink;
 	char		linktarget[MAXPGPATH];
 } astreamer_member;
 
diff --git a/src/include/pgtar.h b/src/include/pgtar.h
index eb93bdef5c4..d6c434e09b0 100644
--- a/src/include/pgtar.h
+++ b/src/include/pgtar.h
@@ -60,6 +60,8 @@ enum tarFileType
 	TAR_FILETYPE_PLAIN = '0',
 	TAR_FILETYPE_SYMLINK = '2',
 	TAR_FILETYPE_DIRECTORY = '5',
+	TAR_FILETYPE_PAX_EXTENDED = 'x',
+	TAR_FILETYPE_PAX_EXTENDED_GLOBAL = 'g',
 };
 
 extern enum tarError tarCreateHeader(char *h, const char *filename,


^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-04-02 00:16  Thomas Munro <[email protected]>
  parent: Tom Lane <[email protected]>
  1 sibling, 1 reply; 35+ messages in thread

From: Thomas Munro @ 2026-04-02 00:16 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; Andrew Dunstan <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>

On Thu, Apr 2, 2026 at 7:25 AM Tom Lane <[email protected]> wrote:
> Also, if we are admitting the possibility that what we are reading
> was made by a platform-supplied tar and not our own code, I think
> it verges on lunacy to behave as though unsupported typeflags are
> regular files.

Yeah, if this is the first time we parse files we didn't make then
that makes total sense.  I was a bit unsure of that question when I
suggested we reject pax only after we've failed to find a file, in
case there are scenarios that work today with harmless ignorable pax
headers that don't change the file name.

> So I think we need something more or less like the attached.

LGTM.  Tested with both tars here.  I updated that little test patch
for this.  Not sure if you think it's worth a test though, now that
it's so simple.

@Andrew: I tried using File::Spec->devnull() this time.  Are you able
to check if this works OK on Windows, applied on top of Tom's patch?
AFAIK should be able to run this new test and pass, not skip it.  But
it could be that the shell invocation needs tweaking.  It's hard to
tell from CI.  (Huh, apparently Windows ships a copy of BSD tar as
C:\Windows\System32\tar.exe these days.)


Attachments:

  [text/x-patch] 0001-Test-rejection-of-pax-extended-tar-files.patch (3.0K, 2-0001-Test-rejection-of-pax-extended-tar-files.patch)
  download | inline diff:
From 57b9d1a58a9eb30feec72b0ccce6bdfd5e1ef92e Mon Sep 17 00:00:00 2001
From: Thomas Munro <[email protected]>
Date: Mon, 30 Mar 2026 18:20:09 +1300
Subject: [PATCH] Test rejection of pax extended tar files.

Also change 852de579's shell invocation to use NUL on Windows instead of
/dev/null, per feedback from Andrew Dunstan.  That command wasn't
expected to succeed on Windows, but the similar usage added here might.
---
 src/bin/pg_waldump/t/001_basic.pl | 42 +++++++++++++++++++++++++------
 1 file changed, 35 insertions(+), 7 deletions(-)

diff --git a/src/bin/pg_waldump/t/001_basic.pl b/src/bin/pg_waldump/t/001_basic.pl
index ce1f6aa30c0..3b99a8f6d4f 100644
--- a/src/bin/pg_waldump/t/001_basic.pl
+++ b/src/bin/pg_waldump/t/001_basic.pl
@@ -5,6 +5,7 @@ use strict;
 use warnings FATAL => 'all';
 use Cwd;
 use File::Copy;
+use File::Spec;
 use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
 use Test::More;
@@ -13,12 +14,14 @@ use List::Util qw(shuffle);
 my $tar = $ENV{TAR};
 my @tar_c_flags;
 
-# By default, bsdtar archives sparse files in GNU tar's --format=posix --sparse
-# format, so pg_waldump can't find files that ZFS has decided to store with
-# holes.  Turn that off.
-if (system("$tar --no-read-sparse -c - /dev/null > /dev/null") == 0)
+my $devnull = File::Spec->devnull();
+
+# By default, bsdtar archives sparse files in GNU tar's --format=pax --sparse
+# format, so pg_waldump rejects WAL that ZFS has decided to store with holes.
+# Turn that off.
+if (system("$tar --no-read-sparse -c $devnull > $devnull") == 0)
 {
-  push(@tar_c_flags, "--no-read-sparse");
+	push(@tar_c_flags, "--no-read-sparse");
 }
 
 program_help_ok('pg_waldump');
@@ -339,7 +342,7 @@ sub test_pg_waldump
 # Create a tar archive, shuffle the file order
 sub generate_archive
 {
-	my ($archive, $directory, $compression_flags) = @_;
+	my ($archive, $directory, $compression_flags, @extra_flags) = @_;
 
 	my @files;
 	opendir my $dh, $directory or die "opendir: $!";
@@ -355,7 +358,7 @@ sub generate_archive
 	# move into the WAL directory before archiving files
 	my $cwd = getcwd;
 	chdir($directory) || die "chdir: $!";
-	command_ok([$tar, @tar_c_flags, $compression_flags, $archive, @files]);
+	command_ok([$tar, @extra_flags, @tar_c_flags, $compression_flags, $archive, @files]);
 	chdir($cwd) || die "chdir: $!";
 }
 
@@ -477,4 +480,29 @@ for my $scenario (@scenarios)
 	}
 }
 
+SKIP:
+{
+	skip "tar command is not available", 1
+		if !defined $tar;
+
+	skip "tar command doesn't understand --format=pax", 1
+		if system("$tar --format=pax -c " .
+				  $node->data_dir . "/pg_wal/* > $devnull") != 0;
+
+	generate_archive($tmp_dir . '/pg_wal_pax.tar',
+					 $node->data_dir . '/pg_wal',
+					 '-cf',
+					 ("--format=pax"));
+
+	command_fails_like(
+		[
+			'pg_waldump',
+			'--path' => $tmp_dir . '/pg_wal_pax.tar',
+			'--start' => $start_lsn,
+			'--end' => $end_lsn,
+		],
+		qr/error: pax extensions to tar format are not supported/,
+		'fails if pax extended header is detected');
+}
+
 done_testing();
-- 
2.52.0



^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-04-02 01:22  Tom Lane <[email protected]>
  parent: Tom Lane <[email protected]>
  1 sibling, 1 reply; 35+ messages in thread

From: Tom Lane @ 2026-04-02 01:22 UTC (permalink / raw)
  To: Thomas Munro <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; Andrew Dunstan <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>

Thomas Munro <[email protected]> writes:
> On Thu, Apr 2, 2026 at 7:25 AM Tom Lane <[email protected]> wrote:
>> Also, if we are admitting the possibility that what we are reading
>> was made by a platform-supplied tar and not our own code, I think
>> it verges on lunacy to behave as though unsupported typeflags are
>> regular files.

> Yeah, if this is the first time we parse files we didn't make then
> that makes total sense.  I was a bit unsure of that question when I
> suggested we reject pax only after we've failed to find a file, in
> case there are scenarios that work today with harmless ignorable pax
> headers that don't change the file name.

Of course this is all new so far as pg_waldump is concerned.
I'm a bit unclear about whether pg_verifybackup's exposure
is large enough to warrant back-patching any of this.

Looking again at astreamer_tar.c, I suddenly realized that it doesn't
do any meaningful input validation.  So if you feed it junk input,
you get garbage errors that aren't even predictable:

$ head -c 32768 /dev/urandom >junk.tar
$ pg_waldump --path junk.tar --start 1/0x10000000
pg_waldump: error: COPY stream ended before last file was finished
$ head -c 32768 /dev/urandom >junk.tar
$ pg_waldump --path junk.tar --start 1/0x10000000
pg_waldump: error: tar member has empty name
$ head -c 32768 /dev/urandom >junk.tar
$ pg_waldump --path junk.tar --start 1/0x10000000
pg_waldump: error: COPY stream ended before last file was finished
$ head -c 32768 /dev/urandom >junk.tar
$ pg_waldump --path junk.tar --start 1/0x10000000
pg_waldump: error: could not find WAL in archive "junk.tar"

tar itself is considerably saner:

$ tar tf junk.tar 
tar: This does not look like a tar archive
tar: Skipping to next header
tar: Exiting with failure status due to previous errors

So I think we need something like the attached, in addition
to what I sent before.  This just makes astreamer_tar.c use
the isValidTarHeader function that pg_dump already had.
(I decided to const-ify isValidTarHeader's argument while
moving it to a shared location, which in turn requires
const-ifying tarChecksum.)

			regards, tom lane



Attachments:

  [text/x-diff] v1-validate-tar-headers.patch (5.1K, 2-v1-validate-tar-headers.patch)
  download | inline diff:
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 271a2c3e481..fecf6f2d1ce 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -44,6 +44,7 @@
 #include "pg_backup_archiver.h"
 #include "pg_backup_db.h"
 #include "pg_backup_utils.h"
+#include "pgtar.h"
 
 #define TEXT_DUMP_HEADER "--\n-- PostgreSQL database dump\n--\n\n"
 #define TEXT_DUMPALL_HEADER "--\n-- PostgreSQL database cluster dump\n--\n\n"
@@ -2372,7 +2373,7 @@ _discoverArchiveFormat(ArchiveHandle *AH)
 		}
 
 		if (!isValidTarHeader(AH->lookahead))
-			pg_fatal("input file does not appear to be a valid archive");
+			pg_fatal("input file does not appear to be a valid tar archive");
 
 		AH->format = archTar;
 	}
diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_backup_archiver.h
index 365073b3eae..9c3aca6543a 100644
--- a/src/bin/pg_dump/pg_backup_archiver.h
+++ b/src/bin/pg_dump/pg_backup_archiver.h
@@ -465,8 +465,6 @@ extern void InitArchiveFmt_Null(ArchiveHandle *AH);
 extern void InitArchiveFmt_Directory(ArchiveHandle *AH);
 extern void InitArchiveFmt_Tar(ArchiveHandle *AH);
 
-extern bool isValidTarHeader(char *header);
-
 extern void ReconnectToServer(ArchiveHandle *AH, const char *dbname);
 extern void IssueCommandPerBlob(ArchiveHandle *AH, TocEntry *te,
 								const char *cmdBegin, const char *cmdEnd);
diff --git a/src/bin/pg_dump/pg_backup_tar.c b/src/bin/pg_dump/pg_backup_tar.c
index d94d0de2a5d..a3879410c94 100644
--- a/src/bin/pg_dump/pg_backup_tar.c
+++ b/src/bin/pg_dump/pg_backup_tar.c
@@ -984,31 +984,6 @@ tarPrintf(TAR_MEMBER *th, const char *fmt,...)
 	return (int) cnt;
 }
 
-bool
-isValidTarHeader(char *header)
-{
-	int			sum;
-	int			chk = tarChecksum(header);
-
-	sum = read_tar_number(&header[TAR_OFFSET_CHECKSUM], 8);
-
-	if (sum != chk)
-		return false;
-
-	/* POSIX tar format */
-	if (memcmp(&header[TAR_OFFSET_MAGIC], "ustar\0", 6) == 0 &&
-		memcmp(&header[TAR_OFFSET_VERSION], "00", 2) == 0)
-		return true;
-	/* GNU tar format */
-	if (memcmp(&header[TAR_OFFSET_MAGIC], "ustar  \0", 8) == 0)
-		return true;
-	/* not-quite-POSIX format written by pre-9.3 pg_dump */
-	if (memcmp(&header[TAR_OFFSET_MAGIC], "ustar00\0", 8) == 0)
-		return true;
-
-	return false;
-}
-
 /* Given the member, write the TAR header & copy the file */
 static void
 _tarAddFile(ArchiveHandle *AH, TAR_MEMBER *th)
diff --git a/src/fe_utils/astreamer_tar.c b/src/fe_utils/astreamer_tar.c
index 3b094fc0328..8fa3329237d 100644
--- a/src/fe_utils/astreamer_tar.c
+++ b/src/fe_utils/astreamer_tar.c
@@ -260,7 +260,8 @@ astreamer_tar_parser_content(astreamer *streamer, astreamer_member *member,
  * Parse a file header within a tar stream.
  *
  * The return value is true if we found a file header and passed it on to the
- * next astreamer; it is false if we have reached the archive trailer.
+ * next astreamer; it is false if we have found the archive trailer.
+ * We throw error if we see invalid data.
  */
 static bool
 astreamer_tar_header(astreamer_tar_parser *mystreamer)
@@ -289,6 +290,12 @@ astreamer_tar_header(astreamer_tar_parser *mystreamer)
 	if (!has_nonzero_byte)
 		return false;
 
+	/*
+	 * Verify that we have a reasonable-looking header.
+	 */
+	if (!isValidTarHeader(buffer))
+		pg_fatal("input file does not appear to be a valid tar archive");
+
 	/*
 	 * Parse key fields out of the header.
 	 */
diff --git a/src/include/pgtar.h b/src/include/pgtar.h
index eb93bdef5c4..55b8c7c77a4 100644
--- a/src/include/pgtar.h
+++ b/src/include/pgtar.h
@@ -68,7 +68,8 @@ extern enum tarError tarCreateHeader(char *h, const char *filename,
 									 time_t mtime);
 extern uint64 read_tar_number(const char *s, int len);
 extern void print_tar_number(char *s, int len, uint64 val);
-extern int	tarChecksum(char *header);
+extern int	tarChecksum(const char *header);
+extern bool isValidTarHeader(const char *header);
 
 /*
  * Compute the number of padding bytes required for an entry in a tar
diff --git a/src/port/tar.c b/src/port/tar.c
index 592b4fb7b0f..db462b90292 100644
--- a/src/port/tar.c
+++ b/src/port/tar.c
@@ -87,7 +87,7 @@ read_tar_number(const char *s, int len)
  * be 512 bytes, per the tar standard.
  */
 int
-tarChecksum(char *header)
+tarChecksum(const char *header)
 {
 	int			i,
 				sum;
@@ -104,6 +104,35 @@ tarChecksum(char *header)
 	return sum;
 }
 
+/*
+ * Check validity of a tar header (assumed to be 512 bytes long).
+ * We verify the checksum and the magic number / version.
+ */
+bool
+isValidTarHeader(const char *header)
+{
+	int			sum;
+	int			chk = tarChecksum(header);
+
+	sum = read_tar_number(&header[TAR_OFFSET_CHECKSUM], 8);
+
+	if (sum != chk)
+		return false;
+
+	/* POSIX tar format */
+	if (memcmp(&header[TAR_OFFSET_MAGIC], "ustar\0", 6) == 0 &&
+		memcmp(&header[TAR_OFFSET_VERSION], "00", 2) == 0)
+		return true;
+	/* GNU tar format */
+	if (memcmp(&header[TAR_OFFSET_MAGIC], "ustar  \0", 8) == 0)
+		return true;
+	/* not-quite-POSIX format written by pre-9.3 pg_dump */
+	if (memcmp(&header[TAR_OFFSET_MAGIC], "ustar00\0", 8) == 0)
+		return true;
+
+	return false;
+}
+
 
 /*
  * Fill in the buffer pointed to by h with a tar format header. This buffer


^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-04-02 02:14  Thomas Munro <[email protected]>
  parent: Tom Lane <[email protected]>
  0 siblings, 1 reply; 35+ messages in thread

From: Thomas Munro @ 2026-04-02 02:14 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; Andrew Dunstan <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>

On Thu, Apr 2, 2026 at 2:22 PM Tom Lane <[email protected]> wrote:
> Looking again at astreamer_tar.c, I suddenly realized that it doesn't
> do any meaningful input validation.  So if you feed it junk input,
> you get garbage errors that aren't even predictable:

Wow.

> So I think we need something like the attached, in addition
> to what I sent before.  This just makes astreamer_tar.c use
> the isValidTarHeader function that pg_dump already had.
> (I decided to const-ify isValidTarHeader's argument while
> moving it to a shared location, which in turn requires
> const-ifying tarChecksum.)

LGTM.

$ echo -n x | dd of=foo.tar bs=1 seek=257 count=1 conv=notrunc
$ strings foo.tar | grep tar | head -1
xstar
$ pg_waldump --path=foo.tar -s 0/1 -e 0/100
pg_waldump: error: input file does not appear to be a valid tar archive

$ echo -n u | dd of=foo.tar bs=1 seek=257 count=1 conv=notrunc
$ strings foo.tar | grep tar | head -1
ustar
$ pg_waldump --path=foo.tar -s 0/1 -e 0/100
... other output...





^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-04-02 11:36  Andrew Dunstan <[email protected]>
  parent: Thomas Munro <[email protected]>
  0 siblings, 0 replies; 35+ messages in thread

From: Andrew Dunstan @ 2026-04-02 11:36 UTC (permalink / raw)
  To: Thomas Munro <[email protected]>; Tom Lane <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>


On 2026-04-01 We 8:16 PM, Thomas Munro wrote:
> On Thu, Apr 2, 2026 at 7:25 AM Tom Lane<[email protected]> wrote:
>> Also, if we are admitting the possibility that what we are reading
>> was made by a platform-supplied tar and not our own code, I think
>> it verges on lunacy to behave as though unsupported typeflags are
>> regular files.
> Yeah, if this is the first time we parse files we didn't make then
> that makes total sense.  I was a bit unsure of that question when I
> suggested we reject pax only after we've failed to find a file, in
> case there are scenarios that work today with harmless ignorable pax
> headers that don't change the file name.
>
>> So I think we need something more or less like the attached.
> LGTM.  Tested with both tars here.  I updated that little test patch
> for this.  Not sure if you think it's worth a test though, now that
> it's so simple.
>
> @Andrew: I tried usingFile::Spec->devnull() this time.  Are you able
> to check if this works OK on Windows, applied on top of Tom's patch?
> AFAIK should be able to run this new test and pass, not skip it.  But
> it could be that the shell invocation needs tweaking.  It's hard to
> tell from CI.  (Huh, apparently Windows ships a copy of BSD tar as
> C:\Windows\System32\tar.exe these days.)


Yes, that appears to work. I would put a "2>&1" at the end - we don't 
care about the output, just whether or not it succeeds:


C:\Windows\system32>perl -MFile::Spec -e "print File::Spec->devnull();"
nul

C:\Windows\system32>tar --no-read-sparse -c - nul > nul 2>&1 && echo hello

C:\Windows\system32>tar --no-read-sparse -c - nul > nul 2>&1 || echo hello
hello


cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-04-02 16:29  Tom Lane <[email protected]>
  parent: Thomas Munro <[email protected]>
  0 siblings, 3 replies; 35+ messages in thread

From: Tom Lane @ 2026-04-02 16:29 UTC (permalink / raw)
  To: Thomas Munro <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; Andrew Dunstan <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>

Thomas Munro <[email protected]> writes:
> On Thu, Apr 2, 2026 at 2:22 PM Tom Lane <[email protected]> wrote:
>> So I think we need something like the attached, in addition
>> to what I sent before.  This just makes astreamer_tar.c use
>> the isValidTarHeader function that pg_dump already had.

> LGTM.

Pushed, thanks for reviewing!  In the event I decided to back-patch to
v18, where these fixes could protect pg_verifybackup against tar files
it can't handle.  In older branches the astreamer (nee bbstreamer)
logic only exists inside pg_basebackup, and IIUC that's not really
exposed to tar files that weren't made by our own code.  So I did
not bother with back-patching further, though it'd be possible to
do that if someone knows a scenario where it'd matter.

			regards, tom lane





^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-04-02 17:13  Tomas Vondra <[email protected]>
  parent: Tom Lane <[email protected]>
  2 siblings, 0 replies; 35+ messages in thread

From: Tomas Vondra @ 2026-04-02 17:13 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; Thomas Munro <[email protected]>; +Cc: Andres Freund <[email protected]>; Michael Paquier <[email protected]>; Andrew Dunstan <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>

On 4/2/26 18:29, Tom Lane wrote:
> Thomas Munro <[email protected]> writes:
>> On Thu, Apr 2, 2026 at 2:22 PM Tom Lane <[email protected]> wrote:
>>> So I think we need something like the attached, in addition
>>> to what I sent before.  This just makes astreamer_tar.c use
>>> the isValidTarHeader function that pg_dump already had.
> 
>> LGTM.
> 
> Pushed, thanks for reviewing!  In the event I decided to back-patch to
> v18, where these fixes could protect pg_verifybackup against tar files
> it can't handle.  In older branches the astreamer (nee bbstreamer)
> logic only exists inside pg_basebackup, and IIUC that's not really
> exposed to tar files that weren't made by our own code.  So I did
> not bother with back-patching further, though it'd be possible to
> do that if someone knows a scenario where it'd matter.
> 

It seems jay/hippopotamus failed on this, for some reason. Those two
animals are on the same host, just using different compilers. I did
update+reboot the machine this afternoon (before the test), but I don't
see why that would cause the failure.

Maybe there's something special about OpenSUSE?

-- 
Tomas Vondra






^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-04-02 17:43  Tom Lane <[email protected]>
  parent: Tom Lane <[email protected]>
  2 siblings, 1 reply; 35+ messages in thread

From: Tom Lane @ 2026-04-02 17:43 UTC (permalink / raw)
  To: Tomas Vondra <[email protected]>; +Cc: Thomas Munro <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; Andrew Dunstan <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>

Tomas Vondra <[email protected]> writes:
> On 4/2/26 18:29, Tom Lane wrote:
>> Pushed, thanks for reviewing!  In the event I decided to back-patch to
>> v18, where these fixes could protect pg_verifybackup against tar files
>> it can't handle.

> It seems jay/hippopotamus failed on this, for some reason.

Hmm:

#   Failed test 'corrupt backup fails verification: extra_file: matches'
#   at t/003_corruption.pl line 198.
#                   'pg_verifybackup: error: pax extensions to tar format are not supported
# '
#     doesn't match '(?^:extra_file.*present (on disk|in archive "[^"]+") but not in the manifest)'

> Maybe there's something special about OpenSUSE?

Apparently its version of "tar" will produce pax-extended files
at the drop of a hat.  I have an OpenSUSE image around here
somewhere, will see if I can reproduce this.  But while I'm
asking, what filesystem are those animals running on top of?

			regards, tom lane





^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-04-02 18:08  Tomas Vondra <[email protected]>
  parent: Tom Lane <[email protected]>
  0 siblings, 1 reply; 35+ messages in thread

From: Tomas Vondra @ 2026-04-02 18:08 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Thomas Munro <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; Andrew Dunstan <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>



On 4/2/26 19:43, Tom Lane wrote:
> Tomas Vondra <[email protected]> writes:
>> On 4/2/26 18:29, Tom Lane wrote:
>>> Pushed, thanks for reviewing!  In the event I decided to back-patch to
>>> v18, where these fixes could protect pg_verifybackup against tar files
>>> it can't handle.
> 
>> It seems jay/hippopotamus failed on this, for some reason.
> 
> Hmm:
> 
> #   Failed test 'corrupt backup fails verification: extra_file: matches'
> #   at t/003_corruption.pl line 198.
> #                   'pg_verifybackup: error: pax extensions to tar format are not supported
> # '
> #     doesn't match '(?^:extra_file.*present (on disk|in archive "[^"]+") but not in the manifest)'
> 
>> Maybe there's something special about OpenSUSE?
> 
> Apparently its version of "tar" will produce pax-extended files
> at the drop of a hat.  I have an OpenSUSE image around here
> somewhere, will see if I can reproduce this.  But while I'm
> asking, what filesystem are those animals running on top of?
> 

btrfs

-- 
Tomas Vondra






^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-04-02 18:47  Tom Lane <[email protected]>
  parent: Tomas Vondra <[email protected]>
  0 siblings, 4 replies; 35+ messages in thread

From: Tom Lane @ 2026-04-02 18:47 UTC (permalink / raw)
  To: Tomas Vondra <[email protected]>; +Cc: Thomas Munro <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; Andrew Dunstan <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>

Tomas Vondra <[email protected]> writes:
> On 4/2/26 19:43, Tom Lane wrote:
>> Tomas Vondra <[email protected]> writes:
>>> Maybe there's something special about OpenSUSE?

>> Apparently its version of "tar" will produce pax-extended files
>> at the drop of a hat.  I have an OpenSUSE image around here
>> somewhere, will see if I can reproduce this.  But while I'm
>> asking, what filesystem are those animals running on top of?

> btrfs

Yup, so capable of making sparse WAL files.  I can reproduce the
problem here, and what I see is

> tar --version
tar (GNU tar) 1.34
Copyright (C) 2021 Free Software Foundation, Inc.
...

> tar -?
...
  -H, --format=FORMAT        create archive of the given format

 FORMAT is one of the following:
    gnu                      GNU tar 1.13.x format
    oldgnu                   GNU format as per tar <= 1.12
    pax                      POSIX 1003.1-2001 (pax) format
    posix                    same as pax
    ustar                    POSIX 1003.1-1988 (ustar) format
    v7                       old V7 tar format
...
*This* tar defaults to:
--format=posix -f- -b20 --quoting-style=escape --rmt-command=/usr/bin/rmt
--rsh-command=/usr/bin/ssh

So there you have it: pax format by default.  This is unlike what
I see on RHEL or Fedora:

...
*This* tar defaults to:
--format=gnu -f- -b20 --quoting-style=escape --rmt-command=/etc/rmt
--rsh-command=/usr/bin/ssh

So it looks like we need a switch hack similar to what we did for
BSD tar, but injecting "--format=gnu" (or perhaps "--format=ustar"?)
if the tar program will take that.

Interestingly, pg_verifybackup's t/003_corruption.pl test also fails
with the same issue, so apparently this platform is even more
aggressive about sparse-ifying files than Thomas' FreeBSD box.
I wonder how come we managed to pass that test case before on
these machines.

I'm inclined to push the logic for selecting these tar options
into some common subroutine in Test::Utils, rather than having
two copies (and maybe more later).

			regards, tom lane





^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-04-02 19:15  Tom Lane <[email protected]>
  parent: Tom Lane <[email protected]>
  3 siblings, 0 replies; 35+ messages in thread

From: Tom Lane @ 2026-04-02 19:15 UTC (permalink / raw)
  To: Tomas Vondra <[email protected]>; +Cc: Thomas Munro <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; Andrew Dunstan <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>

I wrote:
> Interestingly, pg_verifybackup's t/003_corruption.pl test also fails
> with the same issue, so apparently this platform is even more
> aggressive about sparse-ifying files than Thomas' FreeBSD box.
> I wonder how come we managed to pass that test case before on
> these machines.

The answer to that seems to be that the test scripts for
pg_verifybackup simply fail to detect when it's mishandling
sparse tar entries.  We only check failing cases not successful
cases, and in each case the error checked for is independent of
whether we would have extracted WAL data correctly.  Grumble.

			regards, tom lane





^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-04-02 22:41  Thomas Munro <[email protected]>
  parent: Tom Lane <[email protected]>
  3 siblings, 0 replies; 35+ messages in thread

From: Thomas Munro @ 2026-04-02 22:41 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; Andrew Dunstan <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>

On Fri, Apr 3, 2026 at 7:48 AM Tom Lane <[email protected]> wrote:
> *This* tar defaults to:
> --format=posix -f- -b20 --quoting-style=escape --rmt-command=/usr/bin/rmt
> --rsh-command=/usr/bin/ssh



^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-04-02 22:50  Tom Lane <[email protected]>
  parent: Tom Lane <[email protected]>
  3 siblings, 1 reply; 35+ messages in thread

From: Tom Lane @ 2026-04-02 22:50 UTC (permalink / raw)
  To: Thomas Munro <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; Andrew Dunstan <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>

Thomas Munro <[email protected]> writes:
> On Fri, Apr 3, 2026 at 7:48 AM Tom Lane <[email protected]> wrote:
>> Interestingly, pg_verifybackup's t/003_corruption.pl test also fails
>> with the same issue, so apparently this platform is even more
>> aggressive about sparse-ifying files than Thomas' FreeBSD box.

> Looks like sparse files and BTRFS might be a red herring then, if it's
> simply been told to put a pax header on every file?

It looked to me that it was only putting a pax header on sparse-ified
WAL files.

> How about using --format=ustar, instead of that sparse control stuff?

I did it that way for GNU tar, but did not research whether bsdtar
will take that option.  Feel free to hack on ebba64c08 some more.

(It seems though that the two tars' locutions for "write to stdout"
are different, so we might have to have separate tests even if they
end up pushing the same option.)

			regards, tom lane





^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-04-02 23:43  Sami Imseih <[email protected]>
  parent: Tom Lane <[email protected]>
  2 siblings, 1 reply; 35+ messages in thread

From: Sami Imseih @ 2026-04-02 23:43 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Thomas Munro <[email protected]>; Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; Andrew Dunstan <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>

> >> So I think we need something like the attached, in addition
> >> to what I sent before.  This just makes astreamer_tar.c use
> >> the isValidTarHeader function that pg_dump already had.
>
> > LGTM.
>
> Pushed, thanks for reviewing!  In the event I decided to back-patch to
> v18, where these fixes could protect pg_verifybackup against tar files
> it can't handle.

Hi,

I just encountered a regression test failure for pg_waldump due to ebba64c08d9.

````
――――――――――――――――――――――――――――――――――――――――――――――――――――― ✀
――――――――――――――――――――――――――――――――――――――――――――――――――――――
Listing only the last 100 lines from a long log.
#   at /local/home/simseih/pgdev/installations/worktrees/dev/src/bin/pg_waldump/t/001_basic.pl
line 432.
#          got: 'pg_waldump: error: could not find WAL in archive
"pg_wal.tar.gz"
# '
#     expected: ''
```

and regress_log_001_basic shows this:

```
# Running: /usr/bin/tar --format=ustar -cf /tmp/ja26rXZOnb/pg_wal.tar
archive_status 000000010000000000000002 000000010000000000000001
summaries 000000010000000000000003
[22:25:00.525](0.008s) not ok 101
[22:25:00.525](0.000s) #   Failed test at
/local/home/simseih/pgdev/installations/worktrees/dev/src/bin/pg_waldump/t/001_basic.pl
line 350.
[22:25:00.525](0.000s) # ---------- command failed ----------
[22:25:00.526](0.000s) # /usr/bin/tar --format=ustar -cf
/tmp/ja26rXZOnb/pg_wal.tar archive_status 000000010000000000000002
000000010000000000000001 summaries 000000010000000000000003
[22:25:00.526](0.000s) # -------------- stderr --------------
[22:25:00.526](0.000s) # /usr/bin/tar: value 10012663 out of uid_t
range 0..2097151
```

The --format=ustar has a limit of 2^21 (2097151) for UID/GID [1]
and on my machine the UID is 10012663.

So I found that one way to deal with this is to run the tar command with
--owner=0 --group=0. As far as I can tell, the owner and group IDs don't
matter for these tests, so maybe that is OK.

@@ -1333,6 +1333,10 @@ sub tar_portability_options
                == 0)
        {
                push(@tar_p_flags, "--format=ustar");
+               # ustar format supports UIDs only up to 2^21 (2097151).
+               # Override owner/group to avoid failures on systems where
+               # the running user's UID/GID exceeds that limit.
+               push(@tar_p_flags, "--owner=0", "--group=0");
        }

While this fixes the test, I am now not sure what the broader implications are
for --format=ustar for pg_waldump in the broader discussion?

[1] [https://www.gnu.org/software/tar/manual/html_section/Formats.html]

--
Sami Imseih
Amazon Web Services (AWS)





^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-04-02 23:49  Thomas Munro <[email protected]>
  parent: Tom Lane <[email protected]>
  0 siblings, 1 reply; 35+ messages in thread

From: Thomas Munro @ 2026-04-02 23:49 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; Andrew Dunstan <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>

On Fri, Apr 3, 2026 at 11:50 AM Tom Lane <[email protected]> wrote:
> > How about using --format=ustar, instead of that sparse control stuff?
>
> I did it that way for GNU tar, but did not research whether bsdtar
> will take that option.  Feel free to hack on ebba64c08 some more.
>
> (It seems though that the two tars' locutions for "write to stdout"
> are different, so we might have to have separate tests even if they
> end up pushing the same option.)

I have:

$ tar --version
bsdtar 3.8.2 - libarchive 3.8.2 zlib/1.3.1 liblzma/5.8.1 libzstd/1.5.2
openssl/3.5.4 libb2/bundled
$ gtar --version
tar (GNU tar) 1.35
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html;.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by John Gilmore and Jay Fenlason.

This seems to work for both:

$ tar --format=ustar -c /dev/null  > /dev/null
tar: Removing leading '/' from member names
$ gtar --format=ustar -c /dev/null  > /dev/null
gtar: Removing leading `/' from member names

The attached passes with both, and regress_log_001_basic looks like:

# Running: /usr/bin/tar --format=ustar -cf /tmp/J_ifbfUOSd/pg_wal.tar
archive_status 000000010000000000000001 000000010000000000000003
000000010000000000000002 summaries
[12:12:24.301](0.072s) ok 101

# Running: /usr/local/bin/gtar --format=ustar -cf
/tmp/pbdsHdrAdw/pg_wal.tar 000000010000000000000002 archive_status
000000010000000000000003 summaries 000000010000000000000001
[12:18:14.739](0.050s) ok 101

I think a Windows system could be using either.  BSD tar comes
pre-installed by Microsoft and people often install GNU tools.  So I
think we should use File::Spec->devnull() instead of /dev/null, and
Andrew showed that working.  I doubt Windows is capable of making
sparse files (except perhaps with ReFS?), but it's nice to use the
same code everywhere and future-proof in case GNU carries out its
thread to switch to pax by default.  Windows probably has file
attributes that ustar can't represent (?), so I guess that might
motivate it to use pax headers if they are indeed added only when
needed.

Longer term I think we need to tolerate but ignore pax headers.  If I
understand the spirit of this long evolution, pax archives are
intended to be acceptable to pre-pax implementations, which implies
that they can't really change the meaning of the bits of the file
contents.  That's why GNU's --sparse hides funky file encodings from
old tars by renaming them to GNUSparseFile.%p/%f, and that leads back
to my original suggestion that we should figure out how to detect and
reject pax only if we failed to find the file under the expected name.
(Or of course we could just implement support for that, and I have a
half-baked trial patch for that but now is not the time.)


Attachments:

  [text/x-patch] 0001-Harmonize-tar-option-tests-from-ebba64c0.patch (1.6K, 2-0001-Harmonize-tar-option-tests-from-ebba64c0.patch)
  download | inline diff:
From 0635c0106946ce76c1bb84a5c17b71d0b0e574f7 Mon Sep 17 00:00:00 2001
From: Thomas Munro <[email protected]>
Date: Fri, 3 Apr 2026 12:03:56 +1300
Subject: [PATCH] Harmonize tar option tests from ebba64c0.

*  GNU and BSD tar both understand --format=ustar.
*  Windows lacks /dev/null, but perl knows its local name.

Discussion: https://postgr.es/m/3676229.1775170250%40sss.pgh.pa.us
Backpatch-through: 18
---
 src/test/perl/PostgreSQL/Test/Utils.pm | 15 ++++-----------
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm
index 120999f6ac9..05e1698efa6 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -1328,21 +1328,14 @@ sub tar_portability_options
 
 	# GNU tar typically produces gnu-format archives, which we can read fine.
 	# But some platforms configure it to default to posix/pax format, and
-	# apparently they enable --sparse too.  Override that.
-	if (system("$tar --format=ustar -c -O /dev/null >/dev/null 2>/dev/null")
+	# apparently they enable --sparse too.  BSD tar does something similar.
+	# Override that.
+	my $devnull = File::Spec->devnull();
+	if (system("$tar --format=ustar -c $devnull >$devnull 2>$devnull")
 		== 0)
 	{
 		push(@tar_p_flags, "--format=ustar");
 	}
-
-	# bsdtar also archives sparse files by default, but it spells the switch
-	# to disable that differently.
-	if (system("$tar --no-read-sparse -c - /dev/null >/dev/null 2>/dev/null")
-		== 0)
-	{
-		push(@tar_p_flags, "--no-read-sparse");
-	}
-
 	return @tar_p_flags;
 }
 
-- 
2.53.0



^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-04-03 00:07  Thomas Munro <[email protected]>
  parent: Sami Imseih <[email protected]>
  0 siblings, 0 replies; 35+ messages in thread

From: Thomas Munro @ 2026-04-03 00:07 UTC (permalink / raw)
  To: Sami Imseih <[email protected]>; +Cc: Tom Lane <[email protected]>; Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; Andrew Dunstan <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>

On Fri, Apr 3, 2026 at 12:43 PM Sami Imseih <[email protected]> wrote:
> The --format=ustar has a limit of 2^21 (2097151) for UID/GID [1]
> and on my machine the UID is 10012663.
>
> So I found that one way to deal with this is to run the tar command with
> --owner=0 --group=0. As far as I can tell, the owner and group IDs don't
> matter for these tests, so maybe that is OK.
>
> @@ -1333,6 +1333,10 @@ sub tar_portability_options
>                 == 0)
>         {
>                 push(@tar_p_flags, "--format=ustar");
> +               # ustar format supports UIDs only up to 2^21 (2097151).
> +               # Override owner/group to avoid failures on systems where
> +               # the running user's UID/GID exceeds that limit.
> +               push(@tar_p_flags, "--owner=0", "--group=0");

Interesting.  BSD tar accepts those too, so here's an update to my
previous patch.

> While this fixes the test, I am now not sure what the broader implications are
> for --format=ustar for pg_waldump in the broader discussion?

I think users who have their own tar scripts will mostly be
unaffected, but a small minority will see the new error if they try to
use pg_verifybackup or pg_waldump, and they'll find their way to
--format=ustar, and then they might see the UID/GID error in their own
.tar production scripts, and find their way to adding those switches
too.  Seems OK?  Especially with a documentation note once we've
settle all of this.


Attachments:

  [text/x-patch] v2-0001-Improve-tar-portability-logic-from-ebba64c0.patch (2.1K, 2-v2-0001-Improve-tar-portability-logic-from-ebba64c0.patch)
  download | inline diff:
From a2c065fd5cee53bcde2ccaf92939737e8fc07f06 Mon Sep 17 00:00:00 2001
From: Thomas Munro <[email protected]>
Date: Fri, 3 Apr 2026 12:03:56 +1300
Subject: [PATCH v2] Improve tar portability logic from ebba64c0.

* GNU and BSD tar both understand --format=ustar.
* Windows lacks /dev/null, but perl knows its local name.
* ustar format doesn't like large UID/GID values, so set them to 0.

Backpatch-through: 18
Co-authored-by: Thomas Munro <[email protected]>
Co-authored-by: Sami Imseih <[email protected]>
Discussion: https://postgr.es/m/3676229.1775170250%40sss.pgh.pa.us
Discussion: https://postgr.es/m/CAA5RZ0tt89MgNi4-0F4onH%2B-TFSsysFjMM-tBc6aXbuQv5xBXw%40mail.gmail.com
---
 src/test/perl/PostgreSQL/Test/Utils.pm | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm
index 120999f6ac9..370acfcef7e 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -1328,21 +1328,17 @@ sub tar_portability_options
 
 	# GNU tar typically produces gnu-format archives, which we can read fine.
 	# But some platforms configure it to default to posix/pax format, and
-	# apparently they enable --sparse too.  Override that.
-	if (system("$tar --format=ustar -c -O /dev/null >/dev/null 2>/dev/null")
-		== 0)
-	{
-		push(@tar_p_flags, "--format=ustar");
-	}
-
-	# bsdtar also archives sparse files by default, but it spells the switch
-	# to disable that differently.
-	if (system("$tar --no-read-sparse -c - /dev/null >/dev/null 2>/dev/null")
+	# apparently they enable --sparse too.  BSD tar does something similar.
+	#
+	# ustar format supports UIDs only up to 2^21 (2097151).  Override
+	# owner/group to avoid failures on systems where the running user's UID/GID
+	# exceeds that limit.
+	my $devnull = File::Spec->devnull();
+	if (system("$tar --format=ustar --owner=0 --group=0 -c $devnull >$devnull 2>$devnull")
 		== 0)
 	{
-		push(@tar_p_flags, "--no-read-sparse");
+		push(@tar_p_flags, "--format=ustar", "--owner=0", "--group=0");
 	}
-
 	return @tar_p_flags;
 }
 
-- 
2.53.0



^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-04-03 00:11  Tom Lane <[email protected]>
  parent: Thomas Munro <[email protected]>
  0 siblings, 2 replies; 35+ messages in thread

From: Tom Lane @ 2026-04-03 00:11 UTC (permalink / raw)
  To: Thomas Munro <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; Andrew Dunstan <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>

Thomas Munro <[email protected]> writes:
> On Fri, Apr 3, 2026 at 11:50 AM Tom Lane <[email protected]> wrote:
>>> How about using --format=ustar, instead of that sparse control stuff?

>> I did it that way for GNU tar, but did not research whether bsdtar
>> will take that option.  Feel free to hack on ebba64c08 some more.

> This seems to work for both:

> $ tar --format=ustar -c /dev/null  > /dev/null
> tar: Removing leading '/' from member names
> $ gtar --format=ustar -c /dev/null  > /dev/null
> gtar: Removing leading `/' from member names

Cool.  LGTM.

> I think a Windows system could be using either.  BSD tar comes
> pre-installed by Microsoft and people often install GNU tools.  So I
> think we should use File::Spec->devnull() instead of /dev/null, and
> Andrew showed that working.

Agreed.

> Longer term I think we need to tolerate but ignore pax headers.  If I
> understand the spirit of this long evolution, pax archives are
> intended to be acceptable to pre-pax implementations, which implies
> that they can't really change the meaning of the bits of the file
> contents.

I don't buy that.  For example, POSIX specifies these allowed
fields in an extended header:

    linkpath
        The pathname of a link being created to another file, of any
        type, previously archived. This record shall override the
        linkname field in the following ustar header block(s).

    path
        The pathname of the following file(s). This record shall
        override the name and prefix fields in the following header
        block(s).

    size
        The size of the file in octets, expressed as a decimal number
        using digits from the ISO/IEC 646:1991 standard. This record
        shall override the size field in the following header
        block(s).

GNU tar seems to try hard to ensure that a non-pax-aware tar can
extract *something* from a tar file, but it's not guaranteed that the
something contains the right data or is located at the right pathname.
It looks like the goal is to allow post-processing to pick up the
pieces.

In any case, this is all completely moot if we don't write code to
de-sparse a sparse entry: we will not be able to validate WAL data
if the WAL file is missing some pages.  So I see little point in
having code that tolerates pax headers if it doesn't also do that.

			regards, tom lane





^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-04-03 00:47  Thomas Munro <[email protected]>
  parent: Tom Lane <[email protected]>
  1 sibling, 0 replies; 35+ messages in thread

From: Thomas Munro @ 2026-04-03 00:47 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; Andrew Dunstan <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>

On Fri, Apr 3, 2026 at 1:11 PM Tom Lane <[email protected]> wrote:
> In any case, this is all completely moot if we don't write code to
> de-sparse a sparse entry: we will not be able to validate WAL data
> if the WAL file is missing some pages.  So I see little point in
> having code that tolerates pax headers if it doesn't also do that.

Yeah.  FWIW I spent a few hours hacking on that the other day and
could decode many files, but I now realise that the task was made more
difficult by a problem you fixed: without header validation, small
mistakes resulted in corruption or went bananas.  With that now
addressed, I hope I can get it into shape and propose it for the next
cycle...

For what it's worth, I was just speculating about how one might
reasonably handle unrecognised *non-standard* header names, not the
POSIX-standardised ones which, you're right, we'd probably need to
grok properly.  If we assumed reasonable engineering decisions
following (what I understood to be) the spirit of pax, maybe we could
assume that new non-standard headers either don't affect file contents
and thus could be ignored (think: GNU.windows.permissions=...), or do
affect file contents but have measures in place to prevent unknown
encodings from being exposed to unsuspecting software (think:
deathstation.byte=9bit).  That's a position we could choose to take,
anyway, in the absence of a crystal ball...  Fortunately there aren't
really many implementations of POSIX left, so it's not like we're
dealing with the Fermi Paradox here...





^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-04-03 00:59  Thomas Munro <[email protected]>
  parent: Tom Lane <[email protected]>
  1 sibling, 1 reply; 35+ messages in thread

From: Thomas Munro @ 2026-04-03 00:59 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; Andrew Dunstan <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>

On Fri, Apr 3, 2026 at 1:11 PM Tom Lane <[email protected]> wrote:
> Cool.  LGTM.

Thanks.  I'll go ahead and push v2 shortly, which includes Sami's
change (with credit), unless you have any better ideas for that bit.





^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-04-03 01:11  Tom Lane <[email protected]>
  parent: Thomas Munro <[email protected]>
  0 siblings, 0 replies; 35+ messages in thread

From: Tom Lane @ 2026-04-03 01:11 UTC (permalink / raw)
  To: Thomas Munro <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; Andrew Dunstan <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>

Thomas Munro <[email protected]> writes:
> Thanks.  I'll go ahead and push v2 shortly, which includes Sami's
> change (with credit), unless you have any better ideas for that bit.

Not really.  I wondered for a bit if it was smart to create tar files
whose contents would appear root-owned, but it could only matter if
somebody extracted such a file as root, and I don't see a plausible
pathway for that to happen, since these are merely transient test
files.

We could dodge that hazard by using, say, "--owner=1" but I'm
not sure whether that'd introduce its own problems.  In any case,
walmethods.c's tar_open_for_write is already filling in zeroes
there.

			regards, tom lane





^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-04-03 11:37  Nazir Bilal Yavuz <[email protected]>
  parent: Tom Lane <[email protected]>
  3 siblings, 1 reply; 35+ messages in thread

From: Nazir Bilal Yavuz @ 2026-04-03 11:37 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Thomas Munro <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; Andrew Dunstan <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>

Hi,

On Thu, 2 Apr 2026 at 21:48, Tom Lane <[email protected]> wrote:
>
>  FORMAT is one of the following:
>     gnu                      GNU tar 1.13.x format
>     oldgnu                   GNU format as per tar <= 1.12
>     pax                      POSIX 1003.1-2001 (pax) format
>     posix                    same as pax
>     ustar                    POSIX 1003.1-1988 (ustar) format
>     v7                       old V7 tar format
> ...
> *This* tar defaults to:
> --format=posix -f- -b20 --quoting-style=escape --rmt-command=/usr/bin/rmt
> --rsh-command=/usr/bin/ssh
>
> So there you have it: pax format by default.  This is unlike what
> I see on RHEL or Fedora:

It seems that the problem also applies to OpenBSD [1]:

-F format
    Specify the output archive format, with the default format being
pax. tar currently supports the following formats:

OpenBSD CI tasks started to fail [2] after bc30c704ad with the errors:

```
Listing only the last 100 lines from a long log.
#   at /home/postgres/postgres/src/bin/pg_waldump/t/001_basic.pl line 440.
#          got: 'pg_waldump: error: pax extensions to tar format are
not supported

#   Failed test 'corrupt backup fails verification: extra_file: matches'
#   at /home/postgres/postgres/src/bin/pg_verifybackup/t/003_corruption.pl
line 198.
#                   'pg_verifybackup: error: pax extensions to tar
format are not supported

Summary of Failures:
239/381 postgresql:pg_waldump / pg_waldump/001_basic
                 ERROR            18.58s   exit status 84
225/381 postgresql:pg_verifybackup / pg_verifybackup/003_corruption
                 ERROR            45.12s   exit status 8
```

I also tried Thomas'
"v2-0001-Improve-tar-portability-logic-from-ebba64c0" [3] but it
didn't fix the problem on OpenBSD [4].

[1] https://man.openbsd.org/tar#F
[2] https://cirrus-ci.com/task/5439721360326656
[3] https://postgr.es/m/CA%2BhUKGLMkv_fnGXzVRO8qbx5uHs-qMn151GTJYCfn9w1ZamGNg%40mail.gmail.com
[4] https://cirrus-ci.com/task/5602126958690304

-- 
Regards,
Nazir Bilal Yavuz
Microsoft





^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-04-03 14:37  Thomas Munro <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 1 reply; 35+ messages in thread

From: Thomas Munro @ 2026-04-03 14:37 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Tom Lane <[email protected]>; Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; Andrew Dunstan <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>

On Sat, Apr 4, 2026 at 12:38 AM Nazir Bilal Yavuz <[email protected]> wrote:
> I also tried Thomas'
> "v2-0001-Improve-tar-portability-logic-from-ebba64c0" [3] but it
> didn't fix the problem on OpenBSD [4].

Apparently it wants -F ustar, like this.  Funny that it passed on the
build farm animals though.  Oh, it looks like they changed the default
fairly recently.

https://undeadly.org/cgi?action=article;sid=20240417053301


Attachments:

  [text/x-patch] v3-0001-Improve-tar-portability-logic-from-ebba64c0.patch (2.4K, 2-v3-0001-Improve-tar-portability-logic-from-ebba64c0.patch)
  download | inline diff:
From 4b69f7d64798a5a55fdbb2447cee7a0d4326280c Mon Sep 17 00:00:00 2001
From: Thomas Munro <[email protected]>
Date: Fri, 3 Apr 2026 12:03:56 +1300
Subject: [PATCH v3] Improve tar portability logic from ebba64c0.

* GNU and BSD tar both understand --format=ustar.
* Windows lacks /dev/null, but perl knows its local name.
* ustar format doesn't like large UID/GID values, so set them to 0.
* OpenBSD has its own tar which understands -F ustar.

Backpatch-through: 18
Co-authored-by: Thomas Munro <[email protected]>
Co-authored-by: Sami Imseih <[email protected]>
Reviewed-by: Tom Lane <[email protected]>
Reviewed-by: Nazir Bilal Yavuz <[email protected]>
Discussion: https://postgr.es/m/3676229.1775170250%40sss.pgh.pa.us
Discussion: https://postgr.es/m/CAA5RZ0tt89MgNi4-0F4onH%2B-TFSsysFjMM-tBc6aXbuQv5xBXw%40mail.gmail.com
---
 src/test/perl/PostgreSQL/Test/Utils.pm | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm
index 120999f6ac9..077305cd790 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -1328,21 +1328,24 @@ sub tar_portability_options
 
 	# GNU tar typically produces gnu-format archives, which we can read fine.
 	# But some platforms configure it to default to posix/pax format, and
-	# apparently they enable --sparse too.  Override that.
-	if (system("$tar --format=ustar -c -O /dev/null >/dev/null 2>/dev/null")
+	# apparently they enable --sparse too.  BSD tar (libarchive) does something
+	# similar.
+	#
+	# ustar format supports UIDs only up to 2^21 (2097151).  Override
+	# owner/group to avoid failures on systems where the running user's UID/GID
+	# exceeds that limit.
+	my $devnull = File::Spec->devnull();
+	if (system("$tar --format=ustar --owner=0 --group=0 -c $devnull >$devnull 2>$devnull")
 		== 0)
 	{
-		push(@tar_p_flags, "--format=ustar");
+		push(@tar_p_flags, "--format=ustar", "--owner=0", "--group=0");
 	}
 
-	# bsdtar also archives sparse files by default, but it spells the switch
-	# to disable that differently.
-	if (system("$tar --no-read-sparse -c - /dev/null >/dev/null 2>/dev/null")
-		== 0)
+	# OpenBSD's tar also defaults to pax, but spells the switch differently.
+	if (system("$tar -F ustar -c $devnull >$devnull 2>$devnull"))
 	{
-		push(@tar_p_flags, "--no-read-sparse");
+		push(@tar_p_flags, "-F", "ustar");
 	}
-
 	return @tar_p_flags;
 }
 
-- 
2.47.3



^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-04-03 14:59  Sami Imseih <[email protected]>
  parent: Thomas Munro <[email protected]>
  0 siblings, 1 reply; 35+ messages in thread

From: Sami Imseih @ 2026-04-03 14:59 UTC (permalink / raw)
  To: Thomas Munro <[email protected]>; +Cc: Nazir Bilal Yavuz <[email protected]>; Tom Lane <[email protected]>; Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; Andrew Dunstan <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>

Hi,

> On Sat, Apr 4, 2026 at 12:38 AM Nazir Bilal Yavuz <[email protected]> wrote:
> > I also tried Thomas'
> > "v2-0001-Improve-tar-portability-logic-from-ebba64c0" [3] but it
> > didn't fix the problem on OpenBSD [4].
>
> Apparently it wants -F ustar, like this.  Funny that it passed on the
> build farm animals though.  Oh, it looks like they changed the default
> fairly recently.

LGTM with just a correction of my earlier comment.

< +    # ustar format supports UIDs only up to 2^21 (2097151).  Override
---
> +    # ustar format supports UIDs only up to 2^21 - 1 (2097151).  Override

--
Sami


Attachments:

  [application/octet-stream] v4-0001-Improve-tar-portability-logic-from-ebba64c0.patch (2.4K, 2-v4-0001-Improve-tar-portability-logic-from-ebba64c0.patch)
  download | inline diff:
From fe53cc6d93114a6700dc00a57795e53a0fa0a4ee Mon Sep 17 00:00:00 2001
From: Thomas Munro <[email protected]>
Date: Fri, 3 Apr 2026 12:03:56 +1300
Subject: [PATCH v4 1/1] Improve tar portability logic from ebba64c0.

* GNU and BSD tar both understand --format=ustar.
* Windows lacks /dev/null, but perl knows its local name.
* ustar format doesn't like large UID/GID values, so set them to 0.
* OpenBSD has its own tar which understands -F ustar.

Backpatch-through: 18
Co-authored-by: Thomas Munro <[email protected]>
Co-authored-by: Sami Imseih <[email protected]>
Reviewed-by: Tom Lane <[email protected]>
Reviewed-by: Nazir Bilal Yavuz <[email protected]>
Discussion: https://postgr.es/m/3676229.1775170250%40sss.pgh.pa.us
Discussion: https://postgr.es/m/CAA5RZ0tt89MgNi4-0F4onH%2B-TFSsysFjMM-tBc6aXbuQv5xBXw%40mail.gmail.com
---
 src/test/perl/PostgreSQL/Test/Utils.pm | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm
index 120999f6ac9..050037f0d93 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -1328,21 +1328,24 @@ sub tar_portability_options
 
 	# GNU tar typically produces gnu-format archives, which we can read fine.
 	# But some platforms configure it to default to posix/pax format, and
-	# apparently they enable --sparse too.  Override that.
-	if (system("$tar --format=ustar -c -O /dev/null >/dev/null 2>/dev/null")
+	# apparently they enable --sparse too.  BSD tar (libarchive) does something
+	# similar.
+	#
+	# ustar format supports UIDs only up to 2^21 - 1 (2097151).  Override
+	# owner/group to avoid failures on systems where the running user's UID/GID
+	# exceeds that limit.
+	my $devnull = File::Spec->devnull();
+	if (system("$tar --format=ustar --owner=0 --group=0 -c $devnull >$devnull 2>$devnull")
 		== 0)
 	{
-		push(@tar_p_flags, "--format=ustar");
+		push(@tar_p_flags, "--format=ustar", "--owner=0", "--group=0");
 	}
 
-	# bsdtar also archives sparse files by default, but it spells the switch
-	# to disable that differently.
-	if (system("$tar --no-read-sparse -c - /dev/null >/dev/null 2>/dev/null")
-		== 0)
+	# OpenBSD's tar also defaults to pax, but spells the switch differently.
+	if (system("$tar -F ustar -c $devnull >$devnull 2>$devnull"))
 	{
-		push(@tar_p_flags, "--no-read-sparse");
+		push(@tar_p_flags, "-F", "ustar");
 	}
-
 	return @tar_p_flags;
 }
 
-- 
2.50.1



^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-04-03 15:30  Nazir Bilal Yavuz <[email protected]>
  parent: Sami Imseih <[email protected]>
  0 siblings, 1 reply; 35+ messages in thread

From: Nazir Bilal Yavuz @ 2026-04-03 15:30 UTC (permalink / raw)
  To: Sami Imseih <[email protected]>; +Cc: Thomas Munro <[email protected]>; Tom Lane <[email protected]>; Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; Andrew Dunstan <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>

Hi,

On Fri, 3 Apr 2026 at 17:59, Sami Imseih <[email protected]> wrote:
>
> Hi,
>
> > On Sat, Apr 4, 2026 at 12:38 AM Nazir Bilal Yavuz <[email protected]> wrote:
> > > I also tried Thomas'
> > > "v2-0001-Improve-tar-portability-logic-from-ebba64c0" [3] but it
> > > didn't fix the problem on OpenBSD [4].
> >
> > Apparently it wants -F ustar, like this.  Funny that it passed on the
> > build farm animals though.  Oh, it looks like they changed the default
> > fairly recently.
>
> LGTM with just a correction of my earlier comment.

Thanks for the patches! I confirm that both v3 and v4 fix the problem
for OpenBSD CI.


-- 
Regards,
Nazir Bilal Yavuz
Microsoft





^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-04-04 01:07  Thomas Munro <[email protected]>
  parent: Nazir Bilal Yavuz <[email protected]>
  0 siblings, 1 reply; 35+ messages in thread

From: Thomas Munro @ 2026-04-04 01:07 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Sami Imseih <[email protected]>; Tom Lane <[email protected]>; Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; Andrew Dunstan <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>

On Sat, Apr 4, 2026 at 4:30 AM Nazir Bilal Yavuz <[email protected]> wrote:
> On Fri, 3 Apr 2026 at 17:59, Sami Imseih <[email protected]> wrote:
> > > On Sat, Apr 4, 2026 at 12:38 AM Nazir Bilal Yavuz <[email protected]> wrote:
> > > > I also tried Thomas'
> > > > "v2-0001-Improve-tar-portability-logic-from-ebba64c0" [3] but it
> > > > didn't fix the problem on OpenBSD [4].
> > >
> > > Apparently it wants -F ustar, like this.  Funny that it passed on the
> > > build farm animals though.  Oh, it looks like they changed the default
> > > fairly recently.
> >
> > LGTM with just a correction of my earlier comment.
>
> Thanks for the patches! I confirm that both v3 and v4 fix the problem
> for OpenBSD CI.

Pushed, after testing on an OpenBSD VM and making some corrections:

* I'd screwed up the test command line in a way that worked by coincidence
** OpenBSD tar writes to a tape device by default, so use -f /dev/null
** I'd forgotten == 0, so the result was inverted, hiding that screwup
* -f /dev/null is a better form for all of them because the default
destination is a build option
* needed elsif instead of if, or BSD tar finished up getting both
--format=ustar and -F ustar
* ran perltidy, keeping only the hunks due to this patch

CI passes and shows "212 subtests passed" for all five Unixen +
Windows/mingw, but only "156 subtests passed" for Windows/MSVC.
.cirrus.tasks.yml appears to use the same $TAR for both, namely the
system tar, so I think we can say that *this* thing is working, but
something else might be wrong with our scripting glue somewhere?

The other OSes on our list are AIX and Solaris.  From a quick look at
their manuals, I don't foresee issues with pax or large UIDs.

Hopefully that covers everything!





^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-04-10 07:57  Thomas Munro <[email protected]>
  parent: Thomas Munro <[email protected]>
  0 siblings, 2 replies; 35+ messages in thread

From: Thomas Munro @ 2026-04-10 07:57 UTC (permalink / raw)
  To: Nazir Bilal Yavuz <[email protected]>; +Cc: Sami Imseih <[email protected]>; Tom Lane <[email protected]>; Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; Andrew Dunstan <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>

Nitpicking code review for commit b15c1513:

+read_archive_wal_page(XLogDumpPrivate *privateInfo, XLogRecPtr targetPagePtr,
+                     Size count, char *readBuff)

I thought we agreed to stop using Size for new code?  size_t has been
around since C89.

+               pg_fatal("WAL segment \"%s\" in archive \"%s\" is too short: rea
d %lld of %lld bytes",
+                        fname, privateInfo->archive_name,
+                        (long long int) (count - nbytes),
+                        (long long int) count);

Why cast to long long int?  That's the sort of thing we used to have
to do for int64 (but no longer), but here it's size_t anyway.  %zu has
been around since C99.





^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: pg_waldump: support decoding of WAL inside tarfile
@ 2026-04-10 12:57  Andrew Dunstan <[email protected]>
  parent: Thomas Munro <[email protected]>
  1 sibling, 0 replies; 35+ messages in thread

From: Andrew Dunstan @ 2026-04-10 12:57 UTC (permalink / raw)
  To: Thomas Munro <[email protected]>; Nazir Bilal Yavuz <[email protected]>; +Cc: Sami Imseih <[email protected]>; Tom Lane <[email protected]>; Tomas Vondra <[email protected]>; Andres Freund <[email protected]>; Michael Paquier <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>


On 2026-04-10 Fr 3:57 AM, Thomas Munro wrote:
> Nitpicking code review for commit b15c1513:
>
> +read_archive_wal_page(XLogDumpPrivate *privateInfo, XLogRecPtr targetPagePtr,
> +                     Size count, char *readBuff)
>
> I thought we agreed to stop using Size for new code?  size_t has been
> around since C89.


Must have missed the memo :-(


>
> +               pg_fatal("WAL segment \"%s\" in archive \"%s\" is too short: rea
> d %lld of %lld bytes",
> +                        fname, privateInfo->archive_name,
> +                        (long long int) (count - nbytes),
> +                        (long long int) count);
>
> Why cast to long long int?  That's the sort of thing we used to have
> to do for int64 (but no longer), but here it's size_t anyway.  %zu has
> been around since C99.


will fix. Thanks for looking.


cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Documenting coding style
@ 2026-04-10 14:17  Andres Freund <[email protected]>
  parent: Thomas Munro <[email protected]>
  1 sibling, 2 replies; 35+ messages in thread

From: Andres Freund @ 2026-04-10 14:17 UTC (permalink / raw)
  To: Thomas Munro <[email protected]>; Nazir Bilal Yavuz <[email protected]>; +Cc: Sami Imseih <[email protected]>; Tom Lane <[email protected]>; Tomas Vondra <[email protected]>; Michael Paquier <[email protected]>; Andrew Dunstan <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>

Hi, 

On April 10, 2026 3:57:56 AM EDT, Thomas Munro <[email protected]> wrote:
>Nitpicking code review for commit b15c1513:
>
>+read_archive_wal_page(XLogDumpPrivate *privateInfo, XLogRecPtr targetPagePtr,
>+                     Size count, char *readBuff)
>
>I thought we agreed to stop using Size for new code?  size_t has been
>around since C89.

We really need to start documenting some of this stuff somewhere. Deciding something a few years ago, deep in a thread, won't actually help anyone but the participants (and maybe not even them) to know about it. 

I wonder if we should move the coding style section out of sgml into a top-level CODING_STYLE.md or something like that.

And then obviously add things like Size being deprecated. 

Greetings, 

Andres 


-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.





^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: Documenting coding style
@ 2026-04-10 14:45  Andrew Dunstan <[email protected]>
  parent: Andres Freund <[email protected]>
  1 sibling, 0 replies; 35+ messages in thread

From: Andrew Dunstan @ 2026-04-10 14:45 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; Thomas Munro <[email protected]>; Nazir Bilal Yavuz <[email protected]>; +Cc: Sami Imseih <[email protected]>; Tom Lane <[email protected]>; Tomas Vondra <[email protected]>; Michael Paquier <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>


On 2026-04-10 Fr 10:17 AM, Andres Freund wrote:
> Hi,
>
> On April 10, 2026 3:57:56 AM EDT, Thomas Munro <[email protected]> wrote:
>> Nitpicking code review for commit b15c1513:
>>
>> +read_archive_wal_page(XLogDumpPrivate *privateInfo, XLogRecPtr targetPagePtr,
>> +                     Size count, char *readBuff)
>>
>> I thought we agreed to stop using Size for new code?  size_t has been
>> around since C89.
> We really need to start documenting some of this stuff somewhere. Deciding something a few years ago, deep in a thread, won't actually help anyone but the participants (and maybe not even them) to know about it.
>
> I wonder if we should move the coding style section out of sgml into a top-level CODING_STYLE.md or something like that.
>
> And then obviously add things like Size being deprecated.
>

+many. Great idea.


cheers


andew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com






^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: Documenting coding style
@ 2026-04-10 14:49  Nathan Bossart <[email protected]>
  parent: Andres Freund <[email protected]>
  1 sibling, 1 reply; 35+ messages in thread

From: Nathan Bossart @ 2026-04-10 14:49 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: Thomas Munro <[email protected]>; Nazir Bilal Yavuz <[email protected]>; Sami Imseih <[email protected]>; Tom Lane <[email protected]>; Tomas Vondra <[email protected]>; Michael Paquier <[email protected]>; Andrew Dunstan <[email protected]>; Amul Sul <[email protected]>; Zsolt Parragi <[email protected]>; Robert Haas <[email protected]>; Chao Li <[email protected]>; Anthonin Bonnefoy <[email protected]>; Fujii Masao <[email protected]>; Jakub Wartak <[email protected]>; PostgreSQL Hackers <[email protected]>

On Fri, Apr 10, 2026 at 10:17:44AM -0400, Andres Freund wrote:
> On April 10, 2026 3:57:56 AM EDT, Thomas Munro <[email protected]> wrote:
>> I thought we agreed to stop using Size for new code?  size_t has been
>> around since C89.
> 
> We really need to start documenting some of this stuff somewhere.
> Deciding something a few years ago, deep in a thread, won't actually help
> anyone but the participants (and maybe not even them) to know about it..

I certainly didn't know this.  There's no comment in c.h, either.

> I wonder if we should move the coding style section out of sgml into a
> top-level CODING_STYLE.md or something like that.
> 
> And then obviously add things like Size being deprecated. 

Unless we're going to actually remove the typedef in the near future, I'm
not sure I'd support even marking it deprecated.  If we're going to keep it
around indefinitely, that's just going to become another source of nitpicks
when new contributors inevitably copy/paste some code from the aughts.  A
style page makes the situation a little better, but it's yet another thing
that folks have to remember.

To be clear, if someone proposed a patch that completely removed all traces
of Size, I'd likely support it.  There is indeed no reason not to use
size_t.  (I see that Size has been an alias for size_t since 1998 [0].)
But it's also quite heavily used, so I'd be fine with leaving it around and
considering it fully supported, too.

[0] https://postgr.es/c/0ad5d2a3a8

-- 
nathan





^ permalink  raw  reply  [nested|flat] 35+ messages in thread

* Re: Documenting coding style
@ 2026-04-15 18:10  Peter Eisentraut <[email protected]>
  parent: Nathan Bossart <[email protected]>
  0 siblings, 0 replies; 35+ messages in thread

From: Peter Eisentraut @ 2026-04-15 18:10 UTC (permalink / raw)
  To: Nathan Bossart <[email protected]>; Andres Freund <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>

On 10.04.26 16:49, Nathan Bossart wrote:
>> I wonder if we should move the coding style section out of sgml into a
>> top-level CODING_STYLE.md or something like that.
>>
>> And then obviously add things like Size being deprecated.
> Unless we're going to actually remove the typedef in the near future, I'm
> not sure I'd support even marking it deprecated.  If we're going to keep it
> around indefinitely, that's just going to become another source of nitpicks
> when new contributors inevitably copy/paste some code from the aughts.  A
> style page makes the situation a little better, but it's yet another thing
> that folks have to remember.

In this case, if one wanted to do something, one should at least enhance 
the code comment for Size, which is currently very bare and doesn't 
explain why the type exists.

As some counterexamples, some of which I wrote:

/*
  * Pointer
  *      Variable holding address of any memory resident object.
  *      (obsolescent; use void * or char *)
  */
typedef void *Pointer;

/* Historical names for types in <stdint.h>. */
typedef int8_t int8;
...

  * We require C11 and C++11, so static_assert() is expected to be there.
  * StaticAssertDecl() was previously used for portability, but it's now 
just a
  * plain wrapper and doesn't need to be used in new code.






^ permalink  raw  reply  [nested|flat] 35+ messages in thread


end of thread, other threads:[~2026-04-15 18:10 UTC | newest]

Thread overview: 35+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-04-01 02:05 Re: pg_waldump: support decoding of WAL inside tarfile Thomas Munro <[email protected]>
2026-04-01 10:39 ` Andrew Dunstan <[email protected]>
2026-04-01 13:26   ` Andres Freund <[email protected]>
2026-04-01 14:19     ` Andrew Dunstan <[email protected]>
2026-04-01 18:25 ` Tom Lane <[email protected]>
2026-04-02 00:16   ` Thomas Munro <[email protected]>
2026-04-02 11:36     ` Andrew Dunstan <[email protected]>
2026-04-02 01:22   ` Tom Lane <[email protected]>
2026-04-02 02:14     ` Thomas Munro <[email protected]>
2026-04-02 16:29       ` Tom Lane <[email protected]>
2026-04-02 17:13         ` Tomas Vondra <[email protected]>
2026-04-02 17:43         ` Tom Lane <[email protected]>
2026-04-02 18:08           ` Tomas Vondra <[email protected]>
2026-04-02 18:47             ` Tom Lane <[email protected]>
2026-04-02 19:15               ` Tom Lane <[email protected]>
2026-04-02 22:41               ` Thomas Munro <[email protected]>
2026-04-02 22:50               ` Tom Lane <[email protected]>
2026-04-02 23:49                 ` Thomas Munro <[email protected]>
2026-04-03 00:11                   ` Tom Lane <[email protected]>
2026-04-03 00:47                     ` Thomas Munro <[email protected]>
2026-04-03 00:59                     ` Thomas Munro <[email protected]>
2026-04-03 01:11                       ` Tom Lane <[email protected]>
2026-04-03 11:37               ` Nazir Bilal Yavuz <[email protected]>
2026-04-03 14:37                 ` Thomas Munro <[email protected]>
2026-04-03 14:59                   ` Sami Imseih <[email protected]>
2026-04-03 15:30                     ` Nazir Bilal Yavuz <[email protected]>
2026-04-04 01:07                       ` Thomas Munro <[email protected]>
2026-04-10 07:57                         ` Thomas Munro <[email protected]>
2026-04-10 12:57                           ` Andrew Dunstan <[email protected]>
2026-04-10 14:17                           ` Documenting coding style Andres Freund <[email protected]>
2026-04-10 14:45                             ` Re: Documenting coding style Andrew Dunstan <[email protected]>
2026-04-10 14:49                             ` Re: Documenting coding style Nathan Bossart <[email protected]>
2026-04-15 18:10                               ` Re: Documenting coding style Peter Eisentraut <[email protected]>
2026-04-02 23:43         ` Sami Imseih <[email protected]>
2026-04-03 00:07           ` Thomas Munro <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox