From: Robert Haas <[email protected]>
To: Amul Sul <[email protected]>
Cc: Chao Li <[email protected]>
Cc: Jakub Wartak <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Subject: Re: pg_waldump: support decoding of WAL inside tarfile
Date: Mon, 26 Jan 2026 12:22:33 -0500
Message-ID: <CA+Tgmob=3POOO8st-v-fCjKCKREQ=+gs5_PBQnoFeNBdERfuEg@mail.gmail.com>
In-Reply-To: <CAAJ_b95FOeW38gw-3BLmpdnTWHFimopTvf=eTObYUbTOC0x8qg@mail.gmail.com>
References: <CAAJ_b94bqdWN3h2J-PzzzQ2Npbwct5ZQHggn_QoYGhC2rn-=WQ@mail.gmail.com>
	<CAAJ_b97PQjE4kFD8Qk6UvtLrfPMixw1nxBz0OP5Z2WB2B-uMxQ@mail.gmail.com>
	<CAAJ_b97JA8ehy_UDddrnGwDt9HG5NmJq8ATtmeMqo7YD-=tLyQ@mail.gmail.com>
	<CA+TgmoZjhWDG_AR1i+L1yss-wbuWvxrdRwSdVUUUnVPrJV2CnQ@mail.gmail.com>
	<CAAJ_b94Uh+b41LQG45bZFK+i62EVvv972LiGWWWuR64=-64rTQ@mail.gmail.com>
	<CA+TgmobF5c7ZcZHdEhqwNxGDZzWG2bDtpRaDtoVELWX_VHs_1A@mail.gmail.com>
	<CAAJ_b94gK1np8d1h-2c1YoCccGXr4zspTa-FC7X_bfXZNz=-DA@mail.gmail.com>
	<CA+TgmoayDY5b+bP1vRRN7A3xOP-=+tK13B2C1g-Xm1j4WTrT9Q@mail.gmail.com>
	<CAAJ_b97JAF+Zuoh2FBO79hVwLeaBPwsbXw-fY+313a7LfRQ-Bg@mail.gmail.com>
	<CA+Tgmoardk4VuthHc23vov+AVkhq7eT0mFUs-2ctAnP1uiTaog@mail.gmail.com>
	<CAAJ_b959x5VjmLJFmN78r_QohQuuj=fde11VbbAOHn5TzgEzng@mail.gmail.com>
	<CAAJ_b97_N+0sipFyq80n0jX-nKcjcQEMOSTVg8DsqkHR8dW_Sw@mail.gmail.com>
	<CAKZiRmyDk5KqovS9Ez3iFHd+p-TChSt2QTtWkwJ5Ya-+4gg21g@mail.gmail.com>
	<CAAJ_b956a+e8-HNEpeJ60ByFv7XJRqECPu3B0dozv0ChMRTPbQ@mail.gmail.com>
	<CAAJ_b94f6sUDWiZY90O-t7SWWeSK0nMWT7AsydKkpAL90m0oMQ@mail.gmail.com>
	<[email protected]>
	<CAAJ_b94SEcBVJcsp0Y1-YvLqZWBHPQH4FhRzSJfaH_ah_eL_FQ@mail.gmail.com>
	<CAAJ_b97nn9denP2SAjuXyjwbd3is-VnZVSkiRMJ-5YNiKfO9MA@mail.gmail.com>
	<CA+Tgmob_DB9QHDOxnP7a5Y0yJdeGqY8YNi+uK_811y7cN4mxYA@mail.gmail.com>
	<CAAJ_b97VUiP-DbLNe-ddq64J_RiB4ZcPgAjHkJH-0dbzgjR++A@mail.gmail.com>
	<CA+TgmoYMtcZBaqy9r59eDapaDy3WOdepkFFURu9MV-x-kxEbKg@mail.gmail.com>
	<CAAJ_b95FOeW38gw-3BLmpdnTWHFimopTvf=eTObYUbTOC0x8qg@mail.gmail.com>

On Fri, Jan 23, 2026 at 7:27 AM Amul Sul <[email protected]> wrote:
> Another option I previously considered was adding the filtration logic
> inside the archive streamer itself. However, since the very first read
> is required to calculate the WAL segment size, the filter check cannot
> be performed immediately. However, we could send a signal to the
> archive streamer via privateInfo (e.g., a read_any_wal or
> skip_wal_check boolean flag) to disable the filtration check until the
> size is calculated. But that approach isn't very elegant; if the first
> WAL page we read belongs to a segment we actually want to skip, we
> would still have to run the filter check and handle the skip/removal
> logic outside of the streamer (i.e., inside init_archive_reader()).
> This would result in performing the same filtration check in two
> different places.

I mean, I don't really buy this logic. If the information added to
privateInfo is "here's the LSN before which you can remove stuff," and
it starts out initialized to 0/0, then the read of the first WAL page
causes no problem at all, because nothing is before 0/0. After it gets
updated to some non-zero value, the next call to
astreamer_waldump_content() can handle discarding any data we don't
need.
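
As a rough illustration of that point, here is a minimal C sketch. It is not
code from the patch: the struct, the field name discard_before, and the
function sketch_can_discard() are all invented for this example, and the real
privateInfo layout may look quite different. The only thing taken from the
text above is the idea of a threshold LSN that starts at 0/0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;	/* 64-bit LSN, as in PostgreSQL */

/* Hypothetical stand-in for the state shared with the archive streamer. */
typedef struct sketch_private
{
	/* WAL that ends at or before this LSN is no longer needed; 0/0 = keep all */
	XLogRecPtr	discard_before;
} sketch_private;

/*
 * Can buffered WAL covering [seg_start, seg_end) be thrown away?
 *
 * While discard_before is still 0/0, seg_end is always greater than it,
 * so nothing is discarded and the very first page read needs no special case.
 */
static bool
sketch_can_discard(const sketch_private *priv,
				   XLogRecPtr seg_start, XLogRecPtr seg_end)
{
	assert(seg_start < seg_end);
	return seg_end <= priv->discard_before;
}
```

Under these assumptions, the caller only has to advance discard_before once
the segment size is known; the streamer callback performs the single check on
its next invocation.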

IMHO, the best argument for keeping things as they are is that in some cases,
that decision might result in a bit of delay in discarding old data,
but I don't think that really matters. I think all that we care about
is the peak memory utilization of an operation, and I don't think that
an explicit signaling system should increase that at all.

That said, I'm certainly willing to consider other ideas about how
this can work. However, I feel strongly that the logic needs to be not
only correct, but clear and well-explained. Setting cur_wal to NULL to
make the astreamer skip without adequate comments doesn't meet that
standard. Maybe with some better comments it's all right, but frankly
I'm a bit skeptical. Right now, you're using whether or not cur_wal is
NULL as a signal to skip data or not skip data. How is that better
than passing down the LSN and TLI that you want to read next and
letting the astreamer figure out what to do itself? It's a signaling
mechanism either way, but it seems a lot easier to figure out whether
we always keep the LSN and TLI updated properly than to figure out
whether cur_wal is always NULL at exactly the right times.

-- 
Robert Haas
EDB: http://www.enterprisedb.com





