Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w6yMz-004pnv-1x for pgsql-hackers@arkaria.postgresql.org; Sun, 29 Mar 2026 22:12:33 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w6yMx-00HRp2-0h for pgsql-hackers@arkaria.postgresql.org; Sun, 29 Mar 2026 22:12:31 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w6yMw-00HRot-2a for pgsql-hackers@lists.postgresql.org; Sun, 29 Mar 2026 22:12:31 +0000 Received: from mail-dy1-x1335.google.com ([2607:f8b0:4864:20::1335]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1w6yMv-00000001huL-0P38 for pgsql-hackers@lists.postgresql.org; Sun, 29 Mar 2026 22:12:30 +0000 Received: by mail-dy1-x1335.google.com with SMTP id 5a478bee46e88-2c4ae0b1992so62481eec.2 for ; Sun, 29 Mar 2026 15:12:29 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1774822348; cv=none; d=google.com; s=arc-20240605; b=OCdR3mfV+10RcemVB+Za29gFlFbp+OgR9n2z00NqPov/ucALipdvRTy0cQQ+pqOors OuzSrQb7m+oRRr75Dnh2H2wqHlxaybdrBcZM6bxrsmBgnAZIDsGZxVOMH9L8rflZYt2y uq4nMq3XKrsaiZ3JhDNmhw/efufJzRTwBIAqCiGo+LB+Q01mqef+2S1CSas5NiPWrSx6 RJDsrXSDqHpa/JxWglZzWkA7Iz43YkDrR/0Ba/pXwkBN+kbo2fWbUN11Hh6GaGbFnzRc /pWTs27siCE+15An2G9+O/PqMTI9vcNdi6vgYsuVszaJhHqwOiZXcLACrKQJFEIAlHVQ QriQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=jRnG9mgDlxJFP400E1kp6yo9iXDq6x/B7bMvBop4dgY=; fh=jjmGUvASXAL6Ts98a9XPvf39C8ZawKcagcuekb8zbb0=; b=V/a+OTx1yGzrtKQDmD2DI+rWSYEfL815Ff+bazfERuMWpuKAECT7qfqxNl6wM7uokX Uu4BvsHXS49p8qsPEMmT9qnXp+dTqC3fDB2kXH9k+9+2EAJqt2br1hQElVPqXfgLWG8I RoJG3Sh+A2jbMHg724oEPGDeudBqDM9B5TaeJ1yGXG0yXBtLK5tHrF6IB0DTFkbkz5Xi ROT9QrlLMutnW4Gswx3aCb8JWyp8EYIgS3aiWO5HyxF5KbCSzw5l7O7egiu2QTZC0xTN DW7BTr/k3V0mRFZt+HEYQDMddTzj4r8blCcYxFJ5Hm/5z+HGAA5DPJBSygooSJECpYE3 xpPA==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1774822348; x=1775427148; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=jRnG9mgDlxJFP400E1kp6yo9iXDq6x/B7bMvBop4dgY=; b=X+Qn5O5E210nvQBGhHdv/WCC+LtXgGdehMrDcz8IK16O5tGQ+WiQhQLcjEr59HmoxT W7dO9ZaWJQQ1udD2/aUxYZuINHGw1kA1OZYS2P5iyE2ZrNI/C6aHENPt5P8D4BjcbfoK c5n+3ukMat6opX++l8W9bY2BwB54Pu+d55uks7vhY5lTymNd2ritte7TWgketXYRAlF+ 8sIxQSxUfnnaBh2kOj1KEhrAvCk5H6BP7Bk4aOHYqjAK2n9ae5hGJf8kT3ANygduZs0h WOIX8Tkcc/l38KmApFw/zYp1YaUlft2wkhUb8gIa96020QPvtACpwI64VrRyrMfMmv7Q 0NBQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1774822348; x=1775427148; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=jRnG9mgDlxJFP400E1kp6yo9iXDq6x/B7bMvBop4dgY=; b=Dr/w+uI9BxCjkf+Yp05VDGUtJMa9HU4Bl5atXGPitq764gqAysSEeELXH5kXt5h6ze CnCouXMxT8QbYwPzk0EEH9uzyl+Ic2IOG0BDvhPQj3lXk27oNeNWCtrQh1IjboGGYmLV dNeAWYevQyQX/kVO2GwobH302Z9bKo8gquWCAbZ6+QFO7JhCrR8hQwYUHCuqaSbpFX8w JnumekbKQop1jt5lkgFJdeXofR4WaUN/VwZ37u3Izi9OtmjPXUtWA/GNzLDDQO/QEe2s HSI4FY7oIJ10PrkLcCvQOeHGL3bNut4d3XlcROQpyllTP0blgJYvcyP2WU51jjMxUKKd Fc5Q== X-Forwarded-Encrypted: i=1; AJvYcCX00qtiGbK+wEb32BAplEkqWdMEZMOBEdid6NA77GxJI0LEKC9OkXYjT+DsDZJa+ApQe0yvCAGdpMFJCPOg@lists.postgresql.org X-Gm-Message-State: AOJu0Yz5QqbHteuyMrlgKs8lZfS4mp3t5PjAOyWqqWv/hjMxSSzjNvCm eN5B4gumtUBfBI0uTpdHaq4lZJA8fBK4wp1XaA2hbPF8s0QWgJu7MbkLB6bE7mooJqLL9kfiSEs 84hvug5CZJN0V9ZYBzSQ/V3q0PhlmD6A= X-Gm-Gg: ATEYQzxpwgBlYwV5FfXoK6bgr/A64P/XKUaKEeniIFULucTAZLPKS9XLOe52/dyrEqM S+nncNpBVkg+wFQymu+Wr/GMDF/ygT3IjK3Y2MCLNHTnPPREUIoyPfDh9QeD6f4J00hRwX237oe ueWXtqR/hfOmCtEl1wW1xBOGuxtDkC+0WGlM7tZnZrzsKMC7MKsIf3WQUjJOnBOcbt+UGQ5eAJj 7w0yYoAAOAUWcLLPbrhvRVzR76YKIsVHKOvt+NiIffSe1RhLO2tIwuLfbNSHKD0U2+2MkutwYr/ xwX7ycdpuqbQBXBFv+9W9a+00DV+CmNBcZENKKyHQKcRDv2jaGvX7R6+1aPOC3An X-Received: by 2002:a05:7300:a485:b0:2bd:d8e6:90a0 with SMTP id 5a478bee46e88-2c185f3293dmr2330453eec.3.1774822348023; Sun, 29 Mar 2026 15:12:28 -0700 (PDT) MIME-Version: 1.0 References: <2250061.1774104346@sss.pgh.pa.us> <2555285.1774131847@sss.pgh.pa.us> <2609460.1774153487@sss.pgh.pa.us> <2790913.1774200584@sss.pgh.pa.us> <2880042.1774203473@sss.pgh!!.pa.us> <3341199.1774221191@sss.pgh.pa.us> <3424809.1774234940@sss.pgh.pa.us> <1624716.1774736283@sss.pgh.pa.us> <1626907.1774737417@sss.pgh.pa.us> <97a382c0-1f19-4ea0-951f-e37e6abc34a3@vondra.me> <1630755.1774739531@sss.pgh.pa.us> In-Reply-To: From: Thomas Munro Date: Mon, 30 Mar 2026 11:11:50 +1300 X-Gm-Features: AQROBzAxwD9DX25g1ccVpL7qH7rj611GxQzbT30e5Vf_OFLvDXjEQqtgaFVu9vo Message-ID: Subject: Re: pg_waldump: support decoding of WAL inside tarfile To: Tomas Vondra Cc: Tom Lane , Andres Freund , Michael Paquier , Andrew Dunstan , Amul Sul , Zsolt Parragi , Robert Haas , Chao Li , Anthonin Bonnefoy , Fujii Masao , Jakub Wartak , PostgreSQL Hackers Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Mon, Mar 30, 2026 at 2:33=E2=80=AFAM Tomas Vondra wrot= e: > On 3/29/26 00:12, Tom Lane wrote: > > I've reproduced Thomas' failure on a local FreeBSD 15.0 image > > using zfs, and confirmed that this cowboy hack fixes it: > > > > Interesting. Then I guess it has to be due to some difference in ufs vs. > zfs, when handling sparse files. It might be useful to add a bit more > variation here, and switch some of the animals to non-default > filesystems (not just the FreeBSD ones, which we seem to have only two > that run reasonably often). I'd bet most of the linux systems run on > ext4/xfs, few on btrfs/zfs. UFS does have sparse files (its ancestor invented them some time around (time_t) 0), it just doesn't make them unless you tell it to. PostgreSQL only does that if you set wal_init_zero=3Dfalse. ZFS is different because it creates holes automagically when you write zeroes, at least if compression is enabled so it has to scan all your bytes anyway. I was curious to know if BTRFS does that too, or hides zero-compression at some lower invisible level: $ echo "hello" > 1MB-sparse.dat $ truncate -s 512KB 1MB-sparse.dat $ echo "world" >> 1MB-sparse.dat $ truncate -s 1MB 1MB-sparse.dat $ ls -l 1MB-sparse.dat -rw-rw-r-- 1 tmunro tmunro 1000000 Mar 30 10:11 1MB-sparse.dat $ du -hs 1MB-sparse.dat 8.0K 1MB-sparse.dat $ strace tar -S -cf foo.tar 1MB-sparse.dat 2>&1 | grep seek lseek(4, 0, SEEK_DATA) =3D 0 lseek(4, 0, SEEK_HOLE) =3D 4096 lseek(4, 4096, SEEK_DATA) =3D 512000 lseek(4, 512000, SEEK_HOLE) =3D 516096 lseek(4, 516096, SEEK_DATA) =3D -1 ENXIO (No such device or add= ress) ... so that's a yes, lseek sees holes that we didn't ask it to make, just like on ZFS, but the rest of this trace of GNU tar -S -cf is interesting: lseek(5, 0, SEEK_SET) =3D 0 lseek(5, 0, SEEK_SET) =3D 0 lseek(4, 0, SEEK_SET) =3D 0 lseek(4, 512000, SEEK_SET) =3D 512000 lseek(4, 1000000, SEEK_SET) =3D 1000000 It didn't write out PAX format! Instead it replicated the holes into the tar file itself with SEEK_SET. $ strings foo.tar | grep Sparse You have to add --format=3Dposix to enable the GNU behaviour that BSD tar is emulating by default: $ tar --format=3Dposix -S -cf foo.tar 1MB-sparse.dat $ strings foo.tar | grep Sparse ./GNUSparseFile.4190/1MB-sparse.dat I expected GNU tar to be forced to do that if writing to non-seekable output, eg "tar -S -c 1MB-sparse.dat | cat > foo.tar", but somehow it manages to write out only ~10KB of plain ustar format that it is able to restore to the full 1MB apparent size using some other trick, but ... ENOTIME, I dunno how it's doing that. Might be interesting to see if pg_waldump can read it though, 'cause the bytes aren't all there. BTW I confirmed that Apple tar does have -S by default too, it's just that APFS doesn't make holes magically, so this test would presumably have broken on a Mac if wal_init_zero had been forced to zero (not tested). Anyway, given the defaults, GNU tar + ZFS/BTRFS users must be pretty unlikely to hit this in the wild, and the symptom is a confusing error in a maintenance tool, not corruption, so I don't think this is a big deal. I might still try teaching the astreamer code to understand PAX 1.0 when it sees it in the next cycle though, for the benefit of FreeBSD users. A quick and dirty version could probably just unmangle the name and skip the first block of data, since any valid WAL file will not begin with a hole and valid WAL data will end at the first hole and fail our verification, but of course a real implementation should read the map properly[1]... [1] https://www.gnu.org/software/tar/manual/html_node/PAX-1.html