Re: pg_waldump: support decoding of WAL inside tarfile

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Andrew Dunstan <[email protected]>
To: Amul Sul <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: Chao Li <[email protected]>
Cc: Jakub Wartak <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Subject: Re: pg_waldump: support decoding of WAL inside tarfile
Date: Wed, 4 Mar 2026 16:50:30 -0500
Message-ID: <[email protected]> (raw)
In-Reply-To: <CAAJ_b95k4yWsVSbptTS0SF3thZBuQURvFCCAKwwz-0yyZhbnTQ@mail.gmail.com>
References: <CAAJ_b94bqdWN3h2J-PzzzQ2Npbwct5ZQHggn_QoYGhC2rn-=WQ@mail.gmail.com>
	<CA+Tgmob_DB9QHDOxnP7a5Y0yJdeGqY8YNi+uK_811y7cN4mxYA@mail.gmail.com>
	<CAAJ_b97VUiP-DbLNe-ddq64J_RiB4ZcPgAjHkJH-0dbzgjR++A@mail.gmail.com>
	<CA+TgmoYMtcZBaqy9r59eDapaDy3WOdepkFFURu9MV-x-kxEbKg@mail.gmail.com>
	<CAAJ_b95FOeW38gw-3BLmpdnTWHFimopTvf=eTObYUbTOC0x8qg@mail.gmail.com>
	<CA+Tgmob=3POOO8st-v-fCjKCKREQ=+gs5_PBQnoFeNBdERfuEg@mail.gmail.com>
	<CAAJ_b94=gtCeUKkGPUmPj_2SwHV+PiXQ8Mx-1RqfYj3pP3OwpA@mail.gmail.com>
	<CA+TgmoYuC-92tMxZHn-KY=ExF-YkHkd4yOfiMhi9upqk3M1DuA@mail.gmail.com>
	<CAAJ_b974evwS9+tmKVEV+ctEp1scRnLAMrmx9_pzLFHsPtr0hA@mail.gmail.com>
	<CAAJ_b95H+=D2P_fLAbePCF++FAXO0fsJnxFr5NKVe4at=ga6WA@mail.gmail.com>
	<CAAJ_b979p7Z-dBZhxBC3m2VPkYTvYzypMTWkYg2q-e1S5F_f-Q@mail.gmail.com>
	<CAAJ_b96csGv+TvdxK-zp1Zo6zrAxOJ4n-qnxiWO1f0Lk0X0N_g@mail.gmail.com>
	<[email protected]>
	<CAAJ_b95k4yWsVSbptTS0SF3thZBuQURvFCCAKwwz-0yyZhbnTQ@mail.gmail.com>


On 2026-03-04 We 7:52 AM, Amul Sul wrote:
> On Wed, Mar 4, 2026 at 6:07 AM Andrew Dunstan<[email protected]> wrote:
>>
>> On 2026-03-02 Mo 8:00 AM, Amul Sul wrote:
>>> On Wed, Feb 18, 2026 at 12:28 PM Amul Sul<[email protected]> wrote:
>>>> On Tue, Feb 10, 2026 at 3:06 PM Amul Sul<[email protected]> wrote:
>>>>> On Wed, Feb 4, 2026 at 6:39 PM Amul Sul<[email protected]> wrote:
>>>>>> On Wed, Jan 28, 2026 at 2:41 AM Robert Haas<[email protected]> wrote:
>>>>>>> On Tue, Jan 27, 2026 at 7:07 AM Amul Sul<[email protected]> wrote:
>>>>>>>> In the attached version, I am using the WAL segment name as the hash
>>>>>>>> key, which is much more straightforward. I have rewritten
>>>>>>>> read_archive_wal_page(), and it looks much cleaner than before. The
>>>>>>>> logic to discard irrelevant WAL files is still within
>>>>>>>> get_archive_wal_entry. I added an explanation for setting cur_wal to
>>>>>>>> NULL, which is now handled in the separate function I mentioned
>>>>>>>> previously.
>>>>>>>>
>>>>>>>> Kindly have a look at the attached version; let me know if you are
>>>>>>>> still not happy with the current approach for filtering/discarding
>>>>>>>> irrelevant WAL segments. It isn't much different from the previous
>>>>>>>> version, but I have tried to keep it in a separate routine for better
>>>>>>>> code readability, with comments to make it easier to understand. I
>>>>>>>> also added a comment for ArchivedWALFile.
>>>>>>> I feel like the division of labor between get_archive_wal_entry() and
>>>>>>> read_archive_wal_page() is odd. I noticed this in the last version,
>>>>>>> too, and it still seems to be the case. get_archive_wal_entry() first
>>>>>>> calls ArchivedWAL_lookup(). If that finds an entry, it just returns.
>>>>>>> If it doesn't, it loops until an entry for the requested file shows up
>>>>>>> and then returns it. Then control returns to read_archive_wal_page()
>>>>>>> which loops some more until we have all the data we need for the
>>>>>>> requested file. But it seems odd to me to have two separate loops
>>>>>>> here. I think that the first loop is going to call read_archive_file()
>>>>>>> until we find the beginning of the file that we care about and then
>>>>>>> the second one is going to call read_archive_file() some more until we
>>>>>>> have read enough of it to satisfy the request. It feels odd to me to
>>>>>>> do it that way, as if we told somebody to first wait until 9 o'clock
>>>>>>> and then wait another 30 minutes, instead of just telling them to wait
>>>>>>> until 9:30. I realize it's not quite the same thing, because apart
>>>>>>> from calling read_archive_file(), the two loops do different things,
>>>>>>> but I still think it looks odd.
>>>>>>>
>>>>>>> + /*
>>>>>>> + * Ignore if the timeline is different or the current segment is not
>>>>>>> + * the desired one.
>>>>>>> + */
>>>>>>> + XLogFromFileName(entry->fname, &curSegTimeline, &curSegNo, WalSegSz);
>>>>>>> + if (privateInfo->timeline != curSegTimeline ||
>>>>>>> + privateInfo->startSegNo > curSegNo ||
>>>>>>> + privateInfo->endSegNo < curSegNo ||
>>>>>>> + segno > curSegNo)
>>>>>>> + {
>>>>>>> + free_archive_wal_entry(entry->fname, privateInfo);
>>>>>>> + continue;
>>>>>>> + }
>>>>>>>
>>>>>>> The comment doesn't match the code. If it did, the test would be
>>>>>>> (privateInfo->timeline != curSegTimeline || segno != curSegno). But
>>>>>>> instead the segno test is > rather than !=, and the checks against
>>>>>>> startSegNo and endSegNo aren't explained at all. I think I understand
>>>>>>> why the segno test uses > rather than !=, but it's the point of the
>>>>>>> comment to explain things like that, rather than leaving the reader to
>>>>>>> guess. And I don't know why we also need to test startSegNo and
>>>>>>> endSegNo.
>>>>>>>
>>>>>>> I also wonder what the point is of doing XLogFromFileName() on the
>>>>>>> fname provided by the caller and then again on entry->fname. Couldn't
>>>>>>> you just compare the strings?
>>>>>>>
>>>>>>> Again, the division of labor is really odd here. It's the job of
>>>>>>> astreamer_waldump_content() to skip things that aren't WAL files at
>>>>>>> all, but it's the job of get_archive_wal_entry() to skip things that
>>>>>>> are WAL files but not the one we want. I disagree with putting those
>>>>>>> checks in completely separate parts of the code.
>>>>>>>
>>>>>> Keeping the timeline and segment start-end range checks inside the
>>>>>> archive streamer creates a circular dependency that cannot be resolved
>>>>>> without a 'dirty hack'. We must read the first available WAL file page
>>>>>> to determine the wal_segment_size before it can calculate the target
>>>>>> segment range. Moving the checks inside the streamer would make it
>>>>>> impossible to process that initial file, as the necessary filtering
>>>>>> parameters --  would still be unknown which would need to be skipped
>>>>>> for the first read somehow. What if later we realized that the first
>>>>>> WAL file which was allowed to be streamed by skipping that check is
>>>>>> irrelevant and doesn't fall under the start-end segment range?
>>>>>>
>>>>> Please have a look at the attached version, specifically patch 0005.
>>>>> In astreamer_waldump_content(), I have moved the WAL file filtration
>>>>> check from get_archive_wal_entry(). This check will be skipped during
>>>>> the initial read in init_archive_reader(), which instead performs it
>>>>> explicitly once it determines the WAL segment size and the start/end
>>>>> segments.
>>>>>
>>>>> To access the WAL segment size inside astreamer_waldump_content(), I
>>>>> have moved the WAL segment size variable into the XLogDumpPrivate
>>>>> structure in the separate 0004 patch.
>>>> Attached is an updated version including the aforesaid changes. It
>>>> includes a new refactoring patch (0001) that moves the logic for
>>>> identifying tar archives and their compression types from
>>>> pg_basebackup and pg_verifybackup into a separate-reusable function,
>>>> per a suggestion from Euler [1].  Additionally, I have added a test
>>>> for the contrecord decoding to the main patch (now 0006).
>>>>
>>>> 1]http://postgr.es/m/[email protected]
>>>>
>>> Rebased against the latest master, fixed typos in code comments, and
>>> replaced palloc0 with palloc0_object.
>>>
>> Hi Amul.
>>
>>
>> I think this looks in pretty good shape.
>>
> Thank you very much for looking at the patch.
>
>> Attached are patches for a few things I think could be fixed. They are
>> mostly self-explanatory. The TAP test fix is the only sane way I could
>> come up with stopping the skip code you had from reporting a wildly
>> inaccurate number of tests skipped. The sane way to do this from a
>> Test::More perspective is a subtest, but unfortunately meson does not
>> like subtest output, which is why we don't use it elsewhere, so the only
>> way I could come up with was to split this out into a separate test. Of
>> course, we might just say we don't care about the misreport, in which
>> case we could just live with things as they are.
>>
> I agree that the reported skip number was incorrect, and I have
> corrected it in the attached patch. I haven't applied your patch for
> the TAP test improvements yet because I wanted to double-check it
> first with you; the patch as it stood created duplicate tests already
> present in 001_basic.pl. To avoid this duplication, I have added a
> loop that performs tests for both plain and tar WAL directory inputs,
> similar to the approach used in pg_verifybackup for different
> compression type tests (e.g., 008_untar.pl, 010_client_untar.pl). I
> don't have any objection to doing so if you feel the duplication is
> acceptable, but I feel that using a loop for the tests in 001_basic.pl
> is a bit tidier. Let me know your thoughts.


I will take a look.

>
> I have applied all your other patches but skipped the changes to
> pg_verifybackup.c from cf5955-fixes.patch.no-cfbot, as they seem
> unrelated or perhaps I have misunderstood them.


<brown-paper-bag> That's what I get for using a poorly written tool.


cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com

view thread (85+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: pg_waldump: support decoding of WAL inside tarfile
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox