Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w86m4-000DgE-0h for pgsql-hackers@arkaria.postgresql.org; Thu, 02 Apr 2026 01:23:08 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w86m2-003Hci-2A for pgsql-hackers@arkaria.postgresql.org; Thu, 02 Apr 2026 01:23:07 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w86m2-003HcZ-17 for pgsql-hackers@lists.postgresql.org; Thu, 02 Apr 2026 01:23:06 +0000 Received: from sss.pgh.pa.us ([68.162.161.243]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.98.2) (envelope-from ) id 1w86lw-000000007FN-23BF for pgsql-hackers@lists.postgresql.org; Thu, 02 Apr 2026 01:23:06 +0000 Received: from sss1.sss.pgh.pa.us (localhost [127.0.0.1]) by sss.pgh.pa.us (8.15.2/8.15.2) with ESMTP id 6321Mi503118180; Wed, 1 Apr 2026 21:22:44 -0400 From: Tom Lane To: Thomas Munro cc: Tomas Vondra , Andres Freund , Michael Paquier , Andrew Dunstan , Amul Sul , Zsolt Parragi , Robert Haas , Chao Li , Anthonin Bonnefoy , Fujii Masao , Jakub Wartak , PostgreSQL Hackers Subject: Re: pg_waldump: support decoding of WAL inside tarfile In-reply-to: References: <2250061.1774104346@sss.pgh.pa.us> <2555285.1774131847@sss.pgh.pa.us> <2609460.1774153487@sss.pgh.pa.us> <2790913.1774200584@sss.pgh.pa.us> <2880042.1774203473@sss.pgh!!.pa.us> <3341199.1774221191@sss.pgh.pa.us> <3424809.1774234940@sss.pgh.pa.us> <1624716.1774736283@sss.pgh.pa.us> <1626907.1774737417@sss.pgh.pa.us> <97a382c0-1f19-4ea0-951f-e37e6abc34a3@vondra.me> <1630755.1774739531@sss.pgh.pa.us> <1873141.1774823011@sss.pgh.pa.us> <3049460.1775067940@sss.pgh.pa.us> Comments: In-reply-to Thomas Munro message dated "Thu, 02 Apr 2026 13:16:17 +1300" MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----- =_aaaaaaaaaa0" Content-ID: <3117977.1775092827.0@sss.pgh.pa.us> Content-Transfer-Encoding: 8bit Date: Wed, 01 Apr 2026 21:22:44 -0400 Message-ID: <3118179.1775092964@sss.pgh.pa.us> List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk ------- =_aaaaaaaaaa0 Content-Type: text/plain; charset="UTF-8" Content-ID: <3117977.1775092827.1@sss.pgh.pa.us> Content-Transfer-Encoding: 8bit Thomas Munro writes: > On Thu, Apr 2, 2026 at 7:25 AM Tom Lane wrote: >> Also, if we are admitting the possibility that what we are reading >> was made by a platform-supplied tar and not our own code, I think >> it verges on lunacy to behave as though unsupported typeflags are >> regular files. > Yeah, if this is the first time we parse files we didn't make then > that makes total sense. I was a bit unsure of that question when I > suggested we reject pax only after we've failed to find a file, in > case there are scenarios that work today with harmless ignorable pax > headers that don't change the file name. Of course this is all new so far as pg_waldump is concerned. I'm a bit unclear about whether pg_verifybackup's exposure is large enough to warrant back-patching any of this. Looking again at astreamer_tar.c, I suddenly realized that it doesn't do any meaningful input validation. So if you feed it junk input, you get garbage errors that aren't even predictable: $ head -c 32768 /dev/urandom >junk.tar $ pg_waldump --path junk.tar --start 1/0x10000000 pg_waldump: error: COPY stream ended before last file was finished $ head -c 32768 /dev/urandom >junk.tar $ pg_waldump --path junk.tar --start 1/0x10000000 pg_waldump: error: tar member has empty name $ head -c 32768 /dev/urandom >junk.tar $ pg_waldump --path junk.tar --start 1/0x10000000 pg_waldump: error: COPY stream ended before last file was finished $ head -c 32768 /dev/urandom >junk.tar $ pg_waldump --path junk.tar --start 1/0x10000000 pg_waldump: error: could not find WAL in archive "junk.tar" tar itself is considerably saner: $ tar tf junk.tar tar: This does not look like a tar archive tar: Skipping to next header tar: Exiting with failure status due to previous errors So I think we need something like the attached, in addition to what I sent before. This just makes astreamer_tar.c use the isValidTarHeader function that pg_dump already had. (I decided to const-ify isValidTarHeader's argument while moving it to a shared location, which in turn requires const-ifying tarChecksum.) regards, tom lane ------- =_aaaaaaaaaa0 Content-Type: text/x-diff; name="v1-validate-tar-headers.patch"; charset="us-ascii" Content-ID: <3117977.1775092827.2@sss.pgh.pa.us> Content-Description: v1-validate-tar-headers.patch Content-Transfer-Encoding: quoted-printable diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_bac= kup_archiver.c index 271a2c3e481..fecf6f2d1ce 100644 --- a/src/bin/pg_dump/pg_backup_archiver.c +++ b/src/bin/pg_dump/pg_backup_archiver.c @@ -44,6 +44,7 @@ #include "pg_backup_archiver.h" #include "pg_backup_db.h" #include "pg_backup_utils.h" +#include "pgtar.h" = #define TEXT_DUMP_HEADER "--\n-- PostgreSQL database dump\n--\n\n" #define TEXT_DUMPALL_HEADER "--\n-- PostgreSQL database cluster dump\n--\= n\n" @@ -2372,7 +2373,7 @@ _discoverArchiveFormat(ArchiveHandle *AH) } = if (!isValidTarHeader(AH->lookahead)) - pg_fatal("input file does not appear to be a valid archive"); + pg_fatal("input file does not appear to be a valid tar archive"); = AH->format =3D archTar; } diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_bac= kup_archiver.h index 365073b3eae..9c3aca6543a 100644 --- a/src/bin/pg_dump/pg_backup_archiver.h +++ b/src/bin/pg_dump/pg_backup_archiver.h @@ -465,8 +465,6 @@ extern void InitArchiveFmt_Null(ArchiveHandle *AH); extern void InitArchiveFmt_Directory(ArchiveHandle *AH); extern void InitArchiveFmt_Tar(ArchiveHandle *AH); = -extern bool isValidTarHeader(char *header); - extern void ReconnectToServer(ArchiveHandle *AH, const char *dbname); extern void IssueCommandPerBlob(ArchiveHandle *AH, TocEntry *te, const char *cmdBegin, const char *cmdEnd); diff --git a/src/bin/pg_dump/pg_backup_tar.c b/src/bin/pg_dump/pg_backup_t= ar.c index d94d0de2a5d..a3879410c94 100644 --- a/src/bin/pg_dump/pg_backup_tar.c +++ b/src/bin/pg_dump/pg_backup_tar.c @@ -984,31 +984,6 @@ tarPrintf(TAR_MEMBER *th, const char *fmt,...) return (int) cnt; } = -bool -isValidTarHeader(char *header) -{ - int sum; - int chk =3D tarChecksum(header); - - sum =3D read_tar_number(&header[TAR_OFFSET_CHECKSUM], 8); - - if (sum !=3D chk) - return false; - - /* POSIX tar format */ - if (memcmp(&header[TAR_OFFSET_MAGIC], "ustar\0", 6) =3D=3D 0 && - memcmp(&header[TAR_OFFSET_VERSION], "00", 2) =3D=3D 0) - return true; - /* GNU tar format */ - if (memcmp(&header[TAR_OFFSET_MAGIC], "ustar \0", 8) =3D=3D 0) - return true; - /* not-quite-POSIX format written by pre-9.3 pg_dump */ - if (memcmp(&header[TAR_OFFSET_MAGIC], "ustar00\0", 8) =3D=3D 0) - return true; - - return false; -} - /* Given the member, write the TAR header & copy the file */ static void _tarAddFile(ArchiveHandle *AH, TAR_MEMBER *th) diff --git a/src/fe_utils/astreamer_tar.c b/src/fe_utils/astreamer_tar.c index 3b094fc0328..8fa3329237d 100644 --- a/src/fe_utils/astreamer_tar.c +++ b/src/fe_utils/astreamer_tar.c @@ -260,7 +260,8 @@ astreamer_tar_parser_content(astreamer *streamer, astr= eamer_member *member, * Parse a file header within a tar stream. * * The return value is true if we found a file header and passed it on to= the - * next astreamer; it is false if we have reached the archive trailer. + * next astreamer; it is false if we have found the archive trailer. + * We throw error if we see invalid data. */ static bool astreamer_tar_header(astreamer_tar_parser *mystreamer) @@ -289,6 +290,12 @@ astreamer_tar_header(astreamer_tar_parser *mystreamer= ) if (!has_nonzero_byte) return false; = + /* + * Verify that we have a reasonable-looking header. + */ + if (!isValidTarHeader(buffer)) + pg_fatal("input file does not appear to be a valid tar archive"); + /* * Parse key fields out of the header. */ diff --git a/src/include/pgtar.h b/src/include/pgtar.h index eb93bdef5c4..55b8c7c77a4 100644 --- a/src/include/pgtar.h +++ b/src/include/pgtar.h @@ -68,7 +68,8 @@ extern enum tarError tarCreateHeader(char *h, const char= *filename, time_t mtime); extern uint64 read_tar_number(const char *s, int len); extern void print_tar_number(char *s, int len, uint64 val); -extern int tarChecksum(char *header); +extern int tarChecksum(const char *header); +extern bool isValidTarHeader(const char *header); = /* * Compute the number of padding bytes required for an entry in a tar diff --git a/src/port/tar.c b/src/port/tar.c index 592b4fb7b0f..db462b90292 100644 --- a/src/port/tar.c +++ b/src/port/tar.c @@ -87,7 +87,7 @@ read_tar_number(const char *s, int len) * be 512 bytes, per the tar standard. */ int -tarChecksum(char *header) +tarChecksum(const char *header) { int i, sum; @@ -104,6 +104,35 @@ tarChecksum(char *header) return sum; } = +/* + * Check validity of a tar header (assumed to be 512 bytes long). + * We verify the checksum and the magic number / version. + */ +bool +isValidTarHeader(const char *header) +{ + int sum; + int chk =3D tarChecksum(header); + + sum =3D read_tar_number(&header[TAR_OFFSET_CHECKSUM], 8); + + if (sum !=3D chk) + return false; + + /* POSIX tar format */ + if (memcmp(&header[TAR_OFFSET_MAGIC], "ustar\0", 6) =3D=3D 0 && + memcmp(&header[TAR_OFFSET_VERSION], "00", 2) =3D=3D 0) + return true; + /* GNU tar format */ + if (memcmp(&header[TAR_OFFSET_MAGIC], "ustar \0", 8) =3D=3D 0) + return true; + /* not-quite-POSIX format written by pre-9.3 pg_dump */ + if (memcmp(&header[TAR_OFFSET_MAGIC], "ustar00\0", 8) =3D=3D 0) + return true; + + return false; +} + = /* * Fill in the buffer pointed to by h with a tar format header. This buff= er ------- =_aaaaaaaaaa0--