Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vjGGZ-00C2tP-0O for pgsql-hackers@arkaria.postgresql.org; Fri, 23 Jan 2026 12:27:55 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vjGGX-00HTA8-19 for pgsql-hackers@arkaria.postgresql.org; Fri, 23 Jan 2026 12:27:53 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vjGGW-00HT9z-30 for pgsql-hackers@lists.postgresql.org; Fri, 23 Jan 2026 12:27:53 +0000 Received: from mail-pj1-x1030.google.com ([2607:f8b0:4864:20::1030]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1vjGGU-001uxF-1B for pgsql-hackers@lists.postgresql.org; Fri, 23 Jan 2026 12:27:52 +0000 Received: by mail-pj1-x1030.google.com with SMTP id 98e67ed59e1d1-34c24f4dfb7so1124267a91.0 for ; Fri, 23 Jan 2026 04:27:50 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1769171270; cv=none; d=google.com; s=arc-20240605; b=ayPeuq498Bp8rELGH9D5dojQ+27DVd5IAnrkZktT50lyysdcOlBAKtgBXIt3HTJsgD xhaQx9UcAlogEliht/edXgvTh4AhoxmWIlVxn5ZWGw2HraHn1VitF4F8MszZj6YAdQPA Tthl4Jo20X4r+u9AdQ5PWxH8a/oqK6DZRo0bTMz/QYa0o8XpWH2EJQVCsOG8eG+agDRk kfw58NHeCDPKo8Ud7u3jvcJqgrKDH0w8DO7VV7spGSjFwR7bzkJv/YtKjaSitiCpWPzp qJGGftIQk0k4cSLX1wcgoDGIjRvhD7hIV6wFjSOwGqtg6QwZEjdVZ5yafnQBVofUszts wKSg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=fqXz3atguDrks2y8mIM9vPE5WYrqtb6Rc1GoFeKfJAI=; fh=dFY214hSaRfnHX29e1lofLOkkkgAxJAdNStj2r7l8tk=; b=JCQL5BE7MzRDoejSASJ4JRXh87HELq8q47osgZp/oWQ60026cBanFGw6YA4dvBsVG3 cD2pXCHvKEwBuRD0pzPqtiTK8+YD9ghuZdX//+4Dh9N5UCarnTVIRAeV4/h8TUd6y9OR GqT2vsohcOsUF+4wWJUCOACzDm0iY2hs04Via0Bm/Ju65k44X9VaZZ8CpKB6dupkWEQK c/ooF2nWnRoymCP2LHAgzs4M/kgwaOPJV5niUz3pANNYdTATEVvopbumxadb/J6lWZHj xHcoqXXMqR3nYxQ2qPfDZJ1nigihZINiUfWAz1xYulGc2dURrO4qFeg2nv4NPlA+iiU+ zF7w==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1769171269; x=1769776069; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=fqXz3atguDrks2y8mIM9vPE5WYrqtb6Rc1GoFeKfJAI=; b=AqAduAT+IigZhj8CBcyydtBBRY+Wc2A78ApRQMNq5Pqlhb+18Ju4MEouiTrkHso+Od s6kv7yHSN+Msqj2gRAVo9pjbYK772NBHz/qJTy/TO8/oApG/DLrvx+RaBHRI6g/SM95s 0lN3CAa3xtb8zla/9w/VvfduCKCQIw/kGnmPs6OSSwLXpgqxf73Xgvzhz0howqqrMZv2 TVC88+2Rf6EMCh4z42uOoIMUYxoMRE4R9bl/nQXEM21cXNEbW720xs9GXxbQov1Vam2d OVrBg49pT5H5PZMWubPcodBAmB1EXeyyxmU+3VJ6W30dP+jnnupCYvcD93qBHOqC1BOt 4HJg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1769171270; x=1769776070; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=fqXz3atguDrks2y8mIM9vPE5WYrqtb6Rc1GoFeKfJAI=; b=gLibxfyTCuOzEvhivl9NtbTRDuCv0RQHVnInYOHRuA8+QhFMNn025fdw9oC7RsoZQV V+3YT6io7vGzwL+TegxIQBNXpPdWRIQoWVOh7u4vcS9CWtv/aituQNAHcfuDuwLgn14s d50KkiTiFn1sVDWsT5adL9lMj0/77/Mv1YcAzB3y3x1fikagglDvx+kx3i+Vm0f2T9D6 skGhsXZAphcpRxg+MNlLvqj8cfqHaSVXPGW+41SQT03y1+f4t221000WIss3qqGWCQ6M nEmYnw78d9mLm0CxxHcM15JZ48RXInOBhSL0Ir+CEZSMAlFOSpPxPHPyP/6z30tfp1xy uflQ== X-Forwarded-Encrypted: i=1; AJvYcCUgOoJwF15B5zQ/qAeXlkPxfV7GiPJOkS3duhg+OzZPLAhzBULnzdL3cAzqdscQvyC1FwgMuy1HyBiJ4LZA@lists.postgresql.org X-Gm-Message-State: AOJu0YxEGDcmiZDZxj5iwOqguoGPm8LIVXlQzhMeGxijc9TpTTg46wKx ITrMA7WqPinZ3JlHV0QD2fQyTsook6tu361ISwkDPT2vejMEmnb+kQzh1/mZqE+hlQQS12YEtPz 42M3ECd1IkiWzYlSo4iMI6UlgCXO3ymo= X-Gm-Gg: AZuq6aLhoXMyscaeGjHHzmuXts+SL2O4/2lnw+rzuYH+z9B90T5z10kMU2sGfKglUTG Os60Ysu7purjPM9P63W8p96qfT0PyLLinrjf8ap8D+jaHS5Rf9G8pei7NIw4KDnrzes16YYbs2U WS58M1yfLTpq3gO21skv6hpvqu+isGgR5RI9OM3x/uM0/LKi8fnFSE733C9r39z8aTMKVzXPhlf 2iIFnsn5GfO1W6KAzI9bmdmDAWZ5WZ2nja/LoAdU+hHfStx7o4jY7w8/B72iFQ7fI6D/3zv X-Received: by 2002:a17:90b:3911:b0:341:2141:df76 with SMTP id 98e67ed59e1d1-35367028e9dmr2439522a91.13.1769171269439; Fri, 23 Jan 2026 04:27:49 -0800 (PST) MIME-Version: 1.0 References: <731ADE6F-01C5-4996-BAEE-5851DFC3F502@gmail.com> In-Reply-To: From: Amul Sul Date: Fri, 23 Jan 2026 17:57:11 +0530 X-Gm-Features: AZwV_QgE5Tsz7qu-qLi7uPWn1z_8wykKuFqxNFr6HDdqw-wWL44tgllOFXrZprw Message-ID: Subject: Re: pg_waldump: support decoding of WAL inside tarfile To: Robert Haas Cc: Chao Li , Jakub Wartak , PostgreSQL Hackers Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Mon, Jan 19, 2026 at 7:01=E2=80=AFPM Robert Haas = wrote: > > > But, I am thinking that instead of setting privateInfo->cur_wal to > > NULL, we could simply discard the buffer data at that place and let > > memcpy in astreamer_content copy if it would be that much of an issue. > > I don't think doing unnecessary copying is the right way forward. The > copying itself could be expensive, but I am a little concerned about > the memory utilization of this code. Suppose the user has increased > the WAL segment size to 1GB or even higher. It seems like we could > buffer a whole segment or maybe even more in some scenarios. If we > avoid copying data we don't need, then we also avoid buffering it in > memory. > > In terms of the separation of concerns, we could view setting > privateInfo->cur_wal =3D NULL as a form of signaling, a way for this > code to tell the astreamer that it doesn't need the data buffering. > However, I think it might be better to make the signaling more > explicit. Instead of having the caller directly set the buffer to > NULL, or directly trim data out of the buffer, maybe it should set > some value in privateInfo that tells the astreamer what to do. For > instance, suppose it sets the oldest LSN that it might still care > about in privateInfo, and then the astreamer is free to do skip > copying of any data prior to that LSN, and discard any that it already > has. Especially if properly commented, I think this might be more > clear than what you have now. I am not 100% sure that's the right > idea, though. I just think right now it's a bit murky who does what > and why the division of responsibilities is what it is. > The current implementation of astreamer_waldump_content() does not have sufficient information to skip WAL segments during the initial hash entry preparation and data-copying phase. Because the filtration parameters -- which determine if a segment should be skipped -- depend on the WAL segment size, we must first read a WAL page through the streamer to calculate that size, which is done in init_archive_reader(). Therefore, the responsibility of the archive streamer is strictly to copy the WAL segment data to the buffer. The skipping decision is handled inside get_archive_wal_entry(), which sets privateInfo->cur_wal to NULL. In the next version, I am planning to add a separate routine (with better commenting) that, along with setting the pointer to NULL, releases that hash entry to avoid unnecessary memory usage. Another option I previously considered was adding the filtration logic inside the archive streamer itself. However, since the very first read is required to calculate the WAL segment size, the filter check cannot be performed immediately. However, we could send a signal to the archive streamer via privateInfo (e.g., a read_any_wal or skip_wal_check boolean flag) to disable the filtration check until the size is calculated. But that approach isn't very elegant; if the first WAL page we read belongs to a segment we actually want to skip, we would still have to run the filter check and handle the skip/removal logic outside of the streamer (i.e., inside init_archive_reader()). This would result in performing the same filtration check in two different places. Therefore, I believe performing the filtration check through get_archive_wal_entry() and then calling a routine to clear privateInfo->cur_wal and the hash entry is the better approach, IMO. Additionally, once we consume a WAL file and move to the next one, the hash entry and buffer for that WAL can be released to prevent unnecessary memory consumption by calling the same routine that I am planning to add. > > > + if (strstr(member->pathname, "PaxHeaders.")) > > > + return false; > > > > > > There is no way that a filename containing the string "PaxHeaders." > > > could ever pass the IsXLogFileName test just above. We shouldn't need > > > this. > > > > > > > This checks the directory name (e.g., > > x_dir/y_dir/PaxHeaders.NNNN/wal_file_name). The name of that metadata > > file is exactly the same as the WAL file name, which is why > > IsXLogFileName() doesn't help here. > > I think this code should only be considering files in the toplevel > directory, and skipping over any directories it finds. I absolutely > promise I am not going to commit anything that is specifically looking > for PaxHeaders. Nothing we've ever done with tar files up to now has > required that, and I don't think this should, either. Ok, fair enough, my intention was to allow decoding of valid WAL data from any directory in the tar archive, but I will go ahead and add that restriction as suggested. Regards, Amul