Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vndJu-009gTa-13 for pgsql-hackers@arkaria.postgresql.org; Wed, 04 Feb 2026 13:53:26 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vndJr-00BQjk-2A for pgsql-hackers@arkaria.postgresql.org; Wed, 04 Feb 2026 13:53:23 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vndJr-00BQjS-0P for pgsql-hackers@lists.postgresql.org; Wed, 04 Feb 2026 13:53:23 +0000 Received: from mail-pj1-x102c.google.com ([2607:f8b0:4864:20::102c]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1vndJo-00000000WwH-1l03 for pgsql-hackers@lists.postgresql.org; Wed, 04 Feb 2026 13:53:21 +0000 Received: by mail-pj1-x102c.google.com with SMTP id 98e67ed59e1d1-352f00d0e83so3363255a91.2 for ; Wed, 04 Feb 2026 05:53:20 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1770213199; cv=none; d=google.com; s=arc-20240605; b=Ydo0sk0hlAIpPUGZI0PsF5/k4EdZLy5DuYeQYlR6vG0qbx/JXbj3BOCkbvLfuG2t7d Qld+VIgxLldkCKNbqoth+Z1TjLKCrFeukfCZqx0pMw06x1LQ8fQY5ZLFpiyhidfyEdUC BdjAgNm9k9FoM5S9A4HoQ8bRyg5njuEEgDlKwboSUlqrkmK2qI3bsF4ASzl3UI99noBR fqnnsY5OzJtXu/VQcof7sRQKsxagYZtOX6EXryKOmWk/GlTjh5ahNQQf7U2tR65MbykX Dcko+B7/NDeKv7MdLOyoBu9/g0NeDNYhdTizVEAGBrjPLQbZV0qr+URAHAx8ZF1xm6Zg yYmA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=1hveVeY08COurW5iiTapH+6cw70N8BOGix3EMJcJpNM=; fh=vZ4XS14sNEvnlk+UJfyelKxYjziy6EjnD7lAEIElsW4=; b=ZUxJcZyK2Af+Ws8kF9gdnNUxTsB8X5FciOC+ZSJGnKyRkJMu0MxymN2afCmsyD1pMs ow4bUxP3CfczuOBZSbuJPEHxOzFRi45DEFcwOEinWENXMjwC24Jsl2BrdDJ6hzVmc6eP 1usyDonMaudrsSn7QW5Wmk4uAofavtnkvSYIVVCF5dzlOc9AdfLOjef6FKJN+/YtkaFg kFqjzvlQnV5V4xN/vZ9UtKW67WwC69drml6qMa5+nOZJ/+Rbbilj2XhdNt0h8Bu7Z9+x EEvXYHcKb2zQNchD0t55+dpZcfSjNVCa4/u2Cx1SpC43/jDPsOE2aTEtrXApV4XEL3fG E5og==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1770213199; x=1770817999; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=1hveVeY08COurW5iiTapH+6cw70N8BOGix3EMJcJpNM=; b=XMFLEVVdEXTsVgZNgr077ygI5cn99H9BpY5iafJHyy2AwFDhB1nBmy6cZXzoj6mvaD plGt0D5QttG0/Y4a/Va5MI/ElqYZ+U/257HKj9q41wtdRui707WvQL7Mequvx8Yp2QHb hgx1EgtizNxUOMfKneRZICzC9sViv/lfrzAkdfDDwjLJN/W3dT5d4F9I+nHSCOQfuIL+ 3mEn0D4u8g87q2A3UirsX6Gc/UawLqhEwHz+WOBUrYTpzSzJSG4ySeEV25dv0tCs9AYH hvwOXxo40p69vITGa58hyvqf0w+i5bv8Nq/bbely1IwlWPO7GJAIMyteLBfHISYBfJnL 8+FQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1770213199; x=1770817999; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=1hveVeY08COurW5iiTapH+6cw70N8BOGix3EMJcJpNM=; b=vR9FvJYKG83+rS9jdPvSEJjVes2e8VAY6/7GDdL5dCk2X9y+hLWQBWHxOJWO4TPtDm t4KM7GjO/V0FC8SIsDS1UWdb4HurVNrMULozxhAICI3CcHaOG8o56nwRfvcMf9tpbIob JrtAXqGve4ReLu6BqlzbmjE/L5fvVlGjfTeyhSvCnkL7KHxiKIx03wbkXT8QNVJQEZ1I vOIm0tFUasRWBaXBC1vlL0+XcRC5B0uWwhiXWGRFlpr1xaCIbFvUdab6+HdeQtI5zHhk kjEdxnSeExLapxCXI2TX3AbtUR9hpCHRP78pSSEXrlcOu26eATaqKQ2WKv6yqW2qSBkN Vgpw== X-Forwarded-Encrypted: i=1; AJvYcCWqKYvBwjBwaYT/1AxGXKdvfJklDdH90YCm1JOgSzk4fJ/mEQneSUn2X1FFvvD0ZgWSRvAfHaqUbXayfiZu@lists.postgresql.org X-Gm-Message-State: AOJu0YxyvOMEnSLbcHeXT+1jC6C2hWUZb5K3nF3MwjgLW4OOvgfKzKLf mwp7lf0Hy/OcwqcnaWlibsI8jGhStc4q6hgXvW2Zptl9WNza1f+pHhkrOIGxzFyypgMM3/+uBUE ofE0BDBrtD9cJy1w5C4F//RJIYMFukBc= X-Gm-Gg: AZuq6aI0bAiY2YoqvBcPgFWz1aHJimrzmVK9AXqygDEMLkU8R/zLxVzNWplRp+hU1JD zg39qlBNFyb1UwafVzcHdVPY9UCHT8r2oZKAV+/09dJQrxdHH3D4Vbry+QiRG6Sg79MQmtmD80B 4ZDBleRQ3/qT5vnMxtaY/iVld9HDrT7XU7NY2HdTswCwWKA6HlslBdcN05VQdtuoO52iXCG6vMa h51mAM5sJf6e9G1wlzO4LB/qnvIqXdFqgNtvQAMwnJu58lxIcn82Yqg+OTQP6+n7xPDifzI X-Received: by 2002:a17:90b:3910:b0:34c:3501:d118 with SMTP id 98e67ed59e1d1-354870cff6emr2391016a91.1.1770213199144; Wed, 04 Feb 2026 05:53:19 -0800 (PST) MIME-Version: 1.0 References: <731ADE6F-01C5-4996-BAEE-5851DFC3F502@gmail.com> <3c8e7b02-2152-495a-a0b6-e37cf9286a70@app.fastmail.com> In-Reply-To: <3c8e7b02-2152-495a-a0b6-e37cf9286a70@app.fastmail.com> From: Amul Sul Date: Wed, 4 Feb 2026 19:22:41 +0530 X-Gm-Features: AZwV_QjJjwHIpnDxQ-QCYXQAHta2z5RzpnIQCnJcLs9lvbe6-PXIU8cKxGmeiag Message-ID: Subject: Re: pg_waldump: support decoding of WAL inside tarfile To: Euler Taveira Cc: Robert Haas , Chao Li , Jakub Wartak , PostgreSQL Hackers Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Wed, Jan 28, 2026 at 8:32=E2=80=AFAM Euler Taveira w= rote: > > On Tue, Jan 27, 2026, at 9:06 AM, Amul Sul wrote: > > Hi Euler, Thanks for looking into this, and apologies for the delayed response due to some unavoidable circumstances. The updated version of the patch could take a bit more time, as I am trying to fix and improve the code based on the concerns Robert raised in his previous post. > > I have included your patch (removing WalSegSz) to the attached patch > > set as 0001; all my subsequent patches are now bumped by one. > > > 0003. If this is a refactor, you should move the unlikely logic (if neces= sary) > to 0005. > Yes, that can be done. I have kept them separate for now to make the review easier. I can merge them once the patch is ready for the committer stage, unless the committer prefers to keep them as separate commit. > 0004. You mentioned in the commit message that this patch should be merge= d into > 0005. Reviewing it is a bit hard if you just looked at the patch. However= , it > is clear if you use 'git show -w $HASH0004'. I would expect that this pat= ch > contains > > +my $tar =3D $ENV{TAR}; > + > > ... > > + skip "tar command is not available", 3 > + if !defined $tar; > > because you add the SKIP. As a refactor I expect it to work independently= . > Okay, will do it in the next updated version. > 0005. > > + pg_log_error("unnecessary command-line arguments specified wi= th tar archive (first is \"%s\")", > + argv[optind]); > > Which command-line arguments? This message isn't clear. How about using "= option > %s cannot be used together with tar archive". > In the error message, we simply print only the first one -- we do have similar error messages that use "command-line arguments". > - if (waldir !=3D NULL) > + if (walpath !=3D NULL) > { > + /* validate path points to tar archive */ > + if (is_archive_file(walpath, &compression)) > + { > + char *fname =3D NULL; > + > + split_path(walpath, &waldir, &fname); > + > + private.archive_name =3D fname; > + is_archive =3D true; > + } > /* validate path points to directory */ > - if (!verify_directory(waldir)) > + else if (!verify_directory(walpath)) > > Maybe I'm confused about the fact that the function name is is_archive_fi= le() > and there is a variable called is_archive. Couldn't you test > private.archive_name !=3D NULL instead of using is_archive variable? > Yes, can do that. I hope that doesn't confuse anyone. > - if (!verify_directory(waldir)) > + else if (!verify_directory(walpath)) > { > pg_log_error("could not open directory \"%s\": %m", waldir); > goto bad_argument; > > This is not a problem of this patch but I just want to point out that it = should > be pg_fatal() just like similar code path a few lines below. > > + /* set segno to endsegno for check of --end */ > + segno =3D endsegno; > + } > + > + > + if (!XLByteInSeg(private.endptr, segno, WalSegSz) && > + private.endptr !=3D (segno + 1) * WalSegSz) > > 2 blank lines. Remove one. > That's pre-existing; I'm not sure if I should change this. > + * archive_waldump.c > + * A generic facility for reading WAL data from tar archives via arc= hive > + * streamer. > > The other tools (pg_basebackup and pg_verifybackup) that also use astream= er API > named this similar file as astreamer_SOMETHING.c. It seems a good idea to > follow the same pattern, no? Maybe astreamer_tar_archive.c or > astreamer_archive.c. > Robert responded to this -- thanks, Robert. > +/* Hash entry structure for holding WAL segment data read from the archi= ve */ > +typedef struct ArchivedWALEntry > +{ > + uint32 status; /* hash status */ > + XLogSegNo segno; /* hash key: WAL segment number */ > + TimeLineID timeline; /* timeline of this wal file */ > + > + StringInfoData buf; > + bool tmpseg_exists; /* spill file exists? */ > + > + int total_read; /* total read of archived WAL segment */ > +} ArchivedWALEntry; > > s/wal file/WAL file/ > Okay. > What buf is for? It is the only member that doesn't have a description. I= think > total_read gives the impression that you've read the file and that's the = number > of bytes it got. That's not true; it represents the accumulative length. = Maybe > read_len is a good candidate. > Okay. > +bool > +is_archive_file(const char *fname, pg_compress_algorithm *compression) > +{ > + int fname_len =3D strlen(fname); > + pg_compress_algorithm compress_algo; > + > > Why do you need an extra variable here? I think compress_algo is better t= han > compression. Well, I follow the practice of updating the output variable at the end to avoid any garbage assignment once all operations are successful. If there are strong objections to this, I am happy to change it. > Besides that, it is a good opportunity to move this function to a > common file (common/compression.c?) and use it in other places like > CreateBackupStreamer() -- pg_basebackup. > Yes, it would be a good idea to make it common, let me think about this. > + /* :. */ > + streamer =3D astreamer_tar_parser_new(streamer); > + > + /* Before that we must decompress, if archive is compressed. */ > + if (compression =3D=3D PG_COMPRESSION_GZIP) > > It reads better if you suppress 'Before that we must'. (I think you want = to say > 'After' instead of 'Before'.) > I meant to say 'Before'. This is the archive streamer stack, so the latter one is executed before it. > + /* Hash table storing WAL entries read from the archive */ > + ArchivedWAL_HTAB =3D ArchivedWAL_create(16, NULL); > > Any reason for 16? Comment says nothing about it. > It's an arbitrary choice; I will update the comment. > + if (read_archive_file(privateInfo, XLOG_BLCKSZ) =3D=3D 0) > + pg_fatal("could not find WAL in \"%s\" archive", > + privateInfo->archive_name); > > Could we have a better message here? The read_archive_file() is already t= esting > the error cases and you are testing the EOF case. I would use 'archive \"= %s\"' > or even 'archive file \"%s\". > Okay. > + /* Needed WAL yet to be decoded from archive, do the same */ > + while (1) > + { > + entry =3D privateInfo->cur_wal; > > This comment is not clear. Could you rephrase it? > Okay. > - dependencies: [frontend_code, lz4, zstd], > + dependencies: [frontend_code, lz4, zstd, libpq], > > Put libpq after frontend_code. > Okay. > + > + /* Fields required to read WAL from archive */ > + char *archive_name; /* Tar archive name */ > + int archive_fd; /* File descriptor for the open tar file = */ > + > + astreamer *archive_streamer; > + > + /* What the archive streamer is currently reading */ > + struct ArchivedWALEntry *cur_wal; > + > + /* > + * Although these values can be easily derived from startptr and endp= tr, > + * doing so repeatedly for each archived member would be inefficient,= as > + * it would involve recalculating and filtering out irrelevant WAL > + * segments. > + */ > + XLogSegNo startSegNo; > + XLogSegNo endSegNo; > > It is a matter of style but consistency is good. Per current style snake = case > is preferred instead of CamelCase for these last members. > Okay. > - @lines =3D test_pg_waldump('--path' =3D> $path); > + my @lines; > + @lines =3D test_pg_waldump($path); > > my @lines =3D test_pg_waldump($path); > Okay. > +/* Forward declaration */ > +struct ArchivedWALFile; > > Why don't you move ArchivedWALFile to pg_waldump.h? pg_waldump.h is inclu= ded > into archive_waldump.c. > The full structure is needed in archive_waldump.c only, so I added a forward declaration to declare the pointer inside XLogDumpPrivate. I believe we follow such practices elsewhere in the codebase as well. > 0006. As I said before I would avoid increasing the size of this patch if= an > ordered tar file is viable. Didn't look in deep this patch but I have a f= ew > suggestions: > > * I don't like tmpfile_exists as a member name. A suggestion is 'spilled'= . > Okay. > +#ifndef WIN32 > + if (chmod(fpath, pg_file_create_mode)) > + pg_fatal("could not set permissions on file \"%s\": %m", > + fpath); > +#endif > > s/on/of/ > My suggestion is to use the same sentence in initdb. > I think 'on' seems to be much more relevant than 'of,' and we do have 'could not set permissions on' error messages in other places as wel > could not change permissions of \"%s\": %m > > + pg_log_debug("temporarily exporting file \"%s\"", fpath); > > spilling to temporary file \"%s\" > Okay. > 0008. I'm concerned about this patch. It is breaking backward compatibili= ty if > you are using a long option (--wal-directory). Your proposal is a generic= word > that represents both cases (file and directory). I agree. However, I woul= dn't > remove --wal-directory from the tool. Instead, I would keep it with the s= ame > short option ('w') but add a sentence saying this long option is deprecat= ed and > will be removed in the future or even remove any traces of this long opti= on > from the help and documentation but silently accept the old long option. = I > prefer the latter because it is not a required argument so a deprecation > warning is not necessary IMO. > Yeah, that was discussed with Robert offline and we believe that it is better to make it more generalized; since we can now use the same option to accept both wal-directory and wal-archived. pg_waldump has much more generic options for the same, such as -- path=3DPATH. Regards, Amul