From: Thomas Munro <[email protected]>
To: Tomas Vondra <[email protected]>
Cc: Tom Lane <[email protected]>
Cc: Andres Freund <[email protected]>
Cc: Michael Paquier <[email protected]>
Cc: Andrew Dunstan <[email protected]>
Cc: Amul Sul <[email protected]>
Cc: Zsolt Parragi <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: Chao Li <[email protected]>
Cc: Anthonin Bonnefoy <[email protected]>
Cc: Fujii Masao <[email protected]>
Cc: Jakub Wartak <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Subject: Re: pg_waldump: support decoding of WAL inside tarfile
Date: Mon, 30 Mar 2026 11:11:50 +1300
Message-ID: <CA+hUKGJyvdyWMC-RW1njqevD-q_gTbFq+DyDiFpUJVaG+DY20w@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>
	<[email protected]>
	<CAD5tBcLVWKnph3iB-VPuPKR0dCckOJRFZW2-4H7HTTmhw8-vOg@mail.gmail.com>
	<[email protected]>
	<[email protected]!!.pa.us>
	<CAD5tBcLsYDz+Nzx8MryjxiKaN3fGKd4ZgXuN1Jn=CYxw9dh+AA@mail.gmail.com>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<x2tknjejjouleunkqrvpnwn2tuulunybinycidefm3wmnsyhht@pw5uo3wrqx43>
	<CA+hUKGL2dppjO4o28ZY7n_LTWviKLAi-7KZ=tx5w2HGevCEYPA@mail.gmail.com>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>

On Mon, Mar 30, 2026 at 2:33 AM Tomas Vondra <[email protected]> wrote:
> On 3/29/26 00:12, Tom Lane wrote:
> > I've reproduced Thomas' failure on a local FreeBSD 15.0 image
> > using zfs, and confirmed that this cowboy hack fixes it:
> >
>
> Interesting. Then I guess it has to be due to some difference in ufs vs.
> zfs, when handling sparse files. It might be useful to add a bit more
> variation here, and switch some of the animals to non-default
> filesystems (not just the FreeBSD ones, which we seem to have only two
> that run reasonably often). I'd bet most of the linux systems run on
> ext4/xfs, few on btrfs/zfs.

UFS does have sparse files (its ancestor invented them some time
around (time_t) 0); it just doesn't make them unless you tell it to.
PostgreSQL only creates sparse WAL files if you set wal_init_zero=false.

ZFS is different because it creates holes automagically when you write
zeroes, at least if compression is enabled so it has to scan all your
bytes anyway.

I was curious to know if BTRFS does that too, or hides
zero-compression at some lower invisible level:

$ echo "hello" > 1MB-sparse.dat
$ truncate -s 512KB 1MB-sparse.dat
$ echo "world" >> 1MB-sparse.dat
$ truncate -s 1MB 1MB-sparse.dat
$ ls -l 1MB-sparse.dat
-rw-rw-r-- 1 tmunro tmunro 1000000 Mar 30 10:11 1MB-sparse.dat
$ du -hs 1MB-sparse.dat
8.0K    1MB-sparse.dat
$ strace tar -S -cf foo.tar 1MB-sparse.dat 2>&1 | grep seek
lseek(4, 0, SEEK_DATA)                  = 0
lseek(4, 0, SEEK_HOLE)                  = 4096
lseek(4, 4096, SEEK_DATA)               = 512000
lseek(4, 512000, SEEK_HOLE)             = 516096
lseek(4, 516096, SEEK_DATA)             = -1 ENXIO (No such device or address)

... so that's a yes, lseek sees holes that we didn't ask it to make,
just like on ZFS, but the rest of this trace of GNU tar -S -cf is
interesting:

lseek(5, 0, SEEK_SET)                   = 0
lseek(5, 0, SEEK_SET)                   = 0
lseek(4, 0, SEEK_SET)                   = 0
lseek(4, 512000, SEEK_SET)              = 512000
lseek(4, 1000000, SEEK_SET)             = 1000000

It didn't write out PAX format!  Instead it replicated the holes into
the tar file itself with SEEK_SET.
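
The SEEK_DATA/SEEK_HOLE probing in the first part of that trace can be
sketched roughly as below. This is only an illustration of the lseek
pattern, not GNU tar's actual code; the function name data_extents is
made up here, and os.SEEK_DATA/os.SEEK_HOLE are only available on
platforms that support them (Linux, FreeBSD, ...).

```python
import errno
import os


def data_extents(path):
    """Return [(start, end), ...] byte ranges the filesystem reports as data.

    Hypothetical helper: walks the file with SEEK_DATA/SEEK_HOLE the same
    way the strace output above does.
    """
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                # Find the next byte at or after offset that is not in a hole.
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:
                    break  # nothing but a hole up to end of file
                raise
            # Find where that run of data ends (EOF counts as a hole).
            end = os.lseek(fd, start, os.SEEK_HOLE)
            extents.append((start, end))
            offset = end
    finally:
        os.close(fd)
    return extents
```

On a filesystem that doesn't track holes for this file, the same loop
simply reports one extent covering the whole file, which is why the
result depends on ZFS/BTRFS vs. UFS/APFS rather than on the file's
contents.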

$ strings foo.tar | grep Sparse

You have to add --format=posix to enable the GNU behaviour that BSD
tar is emulating by default:

$ tar --format=posix -S -cf foo.tar 1MB-sparse.dat
$ strings foo.tar | grep Sparse
./GNUSparseFile.4190/1MB-sparse.dat

I expected GNU tar to be forced to do that when writing to non-seekable
output, eg "tar -S -c 1MB-sparse.dat | cat > foo.tar".  Instead it
writes out only ~10KB of plain ustar format, which it can somehow
restore to the full 1MB apparent size using some other trick.
ENOTIME, I dunno how it's doing that.  Might be interesting to see
if pg_waldump can read it though, 'cause the bytes aren't all there.

BTW I confirmed that Apple tar does have -S by default too, it's just
that APFS doesn't make holes magically, so this test would presumably
have broken on a Mac if wal_init_zero had been forced to off (not
tested).

Anyway, given the defaults, GNU tar + ZFS/BTRFS users must be pretty
unlikely to hit this in the wild, and the symptom is a confusing error
in a maintenance tool, not corruption, so I don't think this is a big
deal.  I might still try teaching the astreamer code to understand PAX
1.0 when it sees it in the next cycle though, for the benefit of
FreeBSD users.  A quick and dirty version could probably just unmangle
the name and skip the first block of data: no valid WAL file begins
with a hole, and valid WAL data ends at the first hole, where our
verification fails anyway.  Of course a real implementation should
read the map properly[1]...

[1] https://www.gnu.org/software/tar/manual/html_node/PAX-1.html
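
For illustration, reading the map "properly" might look something like
the sketch below. It assumes the PAX 1.0 layout described in [1]: the
member's data starts with newline-delimited decimal numbers (an entry
count, then offset/size pairs), NUL-padded to a 512-byte boundary, and
the member name has a GNUSparseFile.<pid> directory component inserted.
The function names are hypothetical and this is not astreamer code; a
real implementation would presumably take the name from the
GNU.sparse.name pax header rather than pattern-matching it.

```python
import re

BLOCK = 512  # tar block size


def unmangle_name(name):
    """Drop the 'GNUSparseFile.<pid>/' component from a member name."""
    return re.sub(r"(^|/)GNUSparseFile\.\d+/", r"\1", name, count=1)


def parse_sparse_map(payload):
    """Parse a PAX 1.0 sparse map from the start of a member's data.

    Returns (entries, map_len): entries is a list of (offset, size)
    pairs, and map_len is the number of bytes the map occupies, rounded
    up to a block boundary, so the real data starts at payload[map_len:].
    """
    pos = 0

    def read_number():
        nonlocal pos
        nl = payload.index(b"\n", pos)
        value = int(payload[pos:nl].decode("ascii"))
        pos = nl + 1
        return value

    count = read_number()
    entries = []
    for _ in range(count):
        offset = read_number()
        size = read_number()
        entries.append((offset, size))
    map_len = -(-pos // BLOCK) * BLOCK  # round up to the next block
    return entries, map_len
```

With the two data extents from the trace above, a map of
"2\n0\n4096\n512000\n4096\n" padded to 512 bytes parses to two entries
and a map length of one block, which is the "skip the first block"
shortcut for the common case of a small map.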




