public inbox for [email protected]
help / color / mirror / Atom feedFrom: Amul Sul <[email protected]>
To: Andrew Dunstan <[email protected]>
Cc: Tom Lane <[email protected]>
Cc: Michael Paquier <[email protected]>
Cc: Zsolt Parragi <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: Chao Li <[email protected]>
Cc: Jakub Wartak <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Subject: Re: pg_waldump: support decoding of WAL inside tarfile
Date: Sat, 21 Mar 2026 22:56:38 +0530
Message-ID: <CAAJ_b96d0imBJ9Qm93oe40bEZ_4s-u2stM-JB0LYQC5GcjVW-w@mail.gmail.com> (raw)
In-Reply-To: <CAAJ_b94dfQBXaPiZFaVzH=8mCVY6NZ6JEiFXHWph3HYJiSj=1Q@mail.gmail.com>
References: <CAAJ_b94hkpAJ-Q8FKjGgpbRTmmXOd-agKo1Eii4yOH2N++N36Q@mail.gmail.com>
<[email protected]>
<CAAJ_b94jrauD_FAegTnODszjnF0O+ZFLtXhXBSPhyThRUqSNVg@mail.gmail.com>
<CAAJ_b94fiDcsN6k27hV9772kNjEbtZG5ESQvXL1KQoUcRcxrGA@mail.gmail.com>
<CAAJ_b95JZ7SunSeCgxB3+pC+38B5smD9y4uubAGmZTNo0xtHog@mail.gmail.com>
<CAN4CZFMn_9wgEG0q-9CCXynZ85FzVFoyVU=okWESA42U542ajw@mail.gmail.com>
<CAAJ_b95Oj6Kb6YGsV42Gqy=N7GuOX+FMmEtUbS7NC6BvARN2mQ@mail.gmail.com>
<CAAJ_b96AG+p+n+DZPc4isTDR_rD_-dYfVJGXhG0k2CG+kdzU9g@mail.gmail.com>
<[email protected]>
<[email protected]>
<[email protected]>
<[email protected]>
<[email protected]>
<CAAJ_b94dfQBXaPiZFaVzH=8mCVY6NZ6JEiFXHWph3HYJiSj=1Q@mail.gmail.com>
On Sat, Mar 21, 2026 at 9:05 PM Amul Sul <[email protected]> wrote:
>
> On Sat, Mar 21, 2026 at 5:51 PM Andrew Dunstan <[email protected]> wrote:
> >
> >
> > On 2026-03-21 Sa 2:34 AM, Tom Lane wrote:
> >
> > Michael Paquier <[email protected]> writes:
> >
> > On Fri, Mar 20, 2026 at 11:49:02PM -0400, Tom Lane wrote:
> >
> > Buildfarm members batta and hachi don't like this very much.
> >
> > I did not look at what's happening on the host, but it seems like a
> > safe bet to assume that we are not seeing many failures in the
> > buildfarm because we don't have many animals that have the idea to add
> > --with-zstd to their build configuration, like these two ones.
> >
> > That may be part of the story, but only part. I spent a good deal of
> > time trying to reproduce batta & hachi's configurations locally, on
> > several different platforms, but still couldn't duplicate what they
> > are showing.
> >
> >
> >
> >
> >
> > Yeah, I haven't been able to reproduce it either. But while investigating I found a couple of issues. We neglected to add one of the tests to meson.build, and we neglected to close some files, causing errors on windows.
> >
>
> While the proposed fix of closing the file pointer before returning is
> correct, we also need to ensure the file is reopened in the next call
> to spill any remaining buffered data. I’ve made a small update to
> Andrew's 0001 patch to handle this. Also, changes to meson.build don't
> seem to be needed as we haven't committed that file yet (unless I am
> missing something).
>
> I’ve also reattached the other patches so they don't get lost: v2-0002
> is Andrew's patch for the archive streamer, and v2-0003 is the patch I
> posted previously [1].
>
>
On further thought, I don't think v2-0001 is the right patch. Consider
the case where we write a temporary file partially: if the next
segment required for decoding is that same segment,
TarWALDumpReadPage() will find the physical file present and continue
decoding, potentially triggering an error later due to the shorter
file.
I have attached the v3-0001 patch, which ensures that once we start
writing a temporary file, it should be finished before performing the
lookup. This ensures we don't leave a partial file on disk.
Updated patches are attached; 0002 and 0003 remain the same as before.
Regards,
Amul
Attachments:
[application/x-patch] v3-0001-archive_waldump-skip-hash-lookup-and-tighten-writ.patch (1.8K, 2-v3-0001-archive_waldump-skip-hash-lookup-and-tighten-writ.patch)
download | inline diff:
From b3bfdac9e425f4cb9fd7d7b6c698dd1607b737ee Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Sat, 21 Mar 2026 22:27:22 +0530
Subject: [PATCH v3 1/3] archive_waldump: skip hash lookup and tighten write_fp
invariant
In get_archive_wal_entry(), when the streamer is still mid-segment
(entry == cur_file), jump directly to read_more instead of looping back
to the top and performing a hash table lookup that is guaranteed to fail.
---
src/bin/pg_waldump/archive_waldump.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/src/bin/pg_waldump/archive_waldump.c b/src/bin/pg_waldump/archive_waldump.c
index b078c2d6960..ee292b6dc8d 100644
--- a/src/bin/pg_waldump/archive_waldump.c
+++ b/src/bin/pg_waldump/archive_waldump.c
@@ -484,6 +484,7 @@ get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo)
*/
entry = privateInfo->cur_file;
+read_more:
/*
* Fetch more data either when no current file is being tracked or
* when its buffer has been fully flushed to the temporary file.
@@ -525,11 +526,20 @@ get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo)
* file handle so data is flushed to disk before the next segment
* starts writing to a different handle.
*/
- if (entry != privateInfo->cur_file && write_fp != NULL)
+ if (entry != privateInfo->cur_file)
{
+ Assert(write_fp);
fclose(write_fp);
write_fp = NULL;
}
+ else
+ /*
+ * The file being written hasn't been completed. We must finish
+ * extracting it before performing the hash lookup; otherwise, the
+ * lookup might return without flushing the current segment buffer,
+ * leaving the file open and incomplete on disk.
+ */
+ goto read_more;
}
/* Requested WAL segment not found */
--
2.47.1
[application/x-patch] v3-0002-Fix-astreamer-decompressor-finalize-to-send-corre.patch (2.5K, 3-v3-0002-Fix-astreamer-decompressor-finalize-to-send-corre.patch)
download | inline diff:
From 3a52d70947f7c7bc3a0decbd473e95891ad3b6eb Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Sat, 21 Mar 2026 19:48:48 +0530
Subject: [PATCH v3 2/3] Fix-astreamer-decompressor-finalize-to-send-correct
---
src/fe_utils/astreamer_gzip.c | 9 +++++----
src/fe_utils/astreamer_lz4.c | 9 +++++----
src/fe_utils/astreamer_zstd.c | 2 +-
3 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/src/fe_utils/astreamer_gzip.c b/src/fe_utils/astreamer_gzip.c
index 2e080c37a58..df392f67cab 100644
--- a/src/fe_utils/astreamer_gzip.c
+++ b/src/fe_utils/astreamer_gzip.c
@@ -347,10 +347,11 @@ astreamer_gzip_decompressor_finalize(astreamer *streamer)
* End of the stream, if there is some pending data in output buffers then
* we must forward it to next streamer.
*/
- astreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->base.bbs_buffer.data,
- mystreamer->base.bbs_buffer.maxlen,
- ASTREAMER_UNKNOWN);
+ if (mystreamer->bytes_written > 0)
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->bytes_written,
+ ASTREAMER_UNKNOWN);
astreamer_finalize(mystreamer->base.bbs_next);
}
diff --git a/src/fe_utils/astreamer_lz4.c b/src/fe_utils/astreamer_lz4.c
index 2bc32b42879..605c188007b 100644
--- a/src/fe_utils/astreamer_lz4.c
+++ b/src/fe_utils/astreamer_lz4.c
@@ -397,10 +397,11 @@ astreamer_lz4_decompressor_finalize(astreamer *streamer)
* End of the stream, if there is some pending data in output buffers then
* we must forward it to next streamer.
*/
- astreamer_content(mystreamer->base.bbs_next, NULL,
- mystreamer->base.bbs_buffer.data,
- mystreamer->base.bbs_buffer.maxlen,
- ASTREAMER_UNKNOWN);
+ if (mystreamer->bytes_written > 0)
+ astreamer_content(mystreamer->base.bbs_next, NULL,
+ mystreamer->base.bbs_buffer.data,
+ mystreamer->bytes_written,
+ ASTREAMER_UNKNOWN);
astreamer_finalize(mystreamer->base.bbs_next);
}
diff --git a/src/fe_utils/astreamer_zstd.c b/src/fe_utils/astreamer_zstd.c
index f26abcfd0fa..4b43ab795e3 100644
--- a/src/fe_utils/astreamer_zstd.c
+++ b/src/fe_utils/astreamer_zstd.c
@@ -347,7 +347,7 @@ astreamer_zstd_decompressor_finalize(astreamer *streamer)
if (mystreamer->zstd_outBuf.pos > 0)
astreamer_content(mystreamer->base.bbs_next, NULL,
mystreamer->base.bbs_buffer.data,
- mystreamer->base.bbs_buffer.maxlen,
+ mystreamer->zstd_outBuf.pos,
ASTREAMER_UNKNOWN);
astreamer_finalize(mystreamer->base.bbs_next);
--
2.47.1
[application/x-patch] v3-0003-pg_waldump-Handle-archive-exhaustion-in-init_arch.patch (3.3K, 4-v3-0003-pg_waldump-Handle-archive-exhaustion-in-init_arch.patch)
download | inline diff:
From 242b8904682cec326d059e3d11355ca2315c869c Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Sat, 21 Mar 2026 20:57:23 +0530
Subject: [PATCH v3 3/3] pg_waldump: Handle archive exhaustion in
init_archive_reader().
When read_archive_file() returns 0, the archive may have already
buffered a complete WAL file into the hash table before exhausting
the input. Instead of immediately reporting an error, search the
hash table for an entry containing at least sizeof(XLogLongPageHeader)
bytes. Report a specific error if a WAL entry exists but is too
short (truncated/corrupt), or a generic error if no WAL was found
at all.
Also tighten the loop condition to check for sizeof(XLogLongPageHeader)
rather than XLOG_BLCKSZ, since only the long page header is needed
at this stage.
---
src/bin/pg_waldump/archive_waldump.c | 55 ++++++++++++++++++++++++++--
1 file changed, 51 insertions(+), 4 deletions(-)
diff --git a/src/bin/pg_waldump/archive_waldump.c b/src/bin/pg_waldump/archive_waldump.c
index ee292b6dc8d..943c843e05b 100644
--- a/src/bin/pg_waldump/archive_waldump.c
+++ b/src/bin/pg_waldump/archive_waldump.c
@@ -176,13 +176,60 @@ init_archive_reader(XLogDumpPrivate *privateInfo,
* the first WAL segment in the archive so we can extract the WAL segment
* size from the long page header.
*/
- while (entry == NULL || entry->buf->len < XLOG_BLCKSZ)
+ while (entry == NULL || entry->read_len < sizeof(XLogLongPageHeader))
{
if (read_archive_file(privateInfo, XLOG_BLCKSZ) == 0)
- pg_fatal("could not find WAL in archive \"%s\"",
- privateInfo->archive_name);
+ {
+ ArchivedWAL_iterator iter;
+ ArchivedWALFile *e = NULL;
+ ArchivedWALFile *short_entry = NULL;
- entry = privateInfo->cur_file;
+ entry = NULL;
+
+ /*
+ * read_archive_file() returned 0, meaning the archive is
+ * exhausted. However, a sufficiently compressed archive may have
+ * already read a complete WAL file and inserted it into the hash
+ * table before returning. Search the hash table for any entry
+ * that already has enough buffered data to contain the long page
+ * header; if none is found, the archive contains no usable WAL.
+ */
+ ArchivedWAL_start_iterate(privateInfo->archive_wal_htab, &iter);
+ while ((e = ArchivedWAL_iterate(privateInfo->archive_wal_htab,
+ &iter)) != NULL)
+ {
+ if (e->read_len >= sizeof(XLogLongPageHeader))
+ {
+ entry = e;
+ break;
+ }
+ /* Remember a short entry in case we need to report it */
+ short_entry = e;
+ }
+
+ if (entry == NULL)
+ {
+ /*
+ * A WAL file was found in the hash table but it does not
+ * contain enough data to read the long page header,
+ * indicating a truncated or corrupt WAL segment.
+ */
+ if (short_entry != NULL)
+ pg_fatal("could not read file \"%s\" from \"%s\" archive: read %d of %d",
+ short_entry->fname, privateInfo->archive_name,
+ short_entry->read_len,
+ (int) sizeof(XLogLongPageHeader));
+
+ /*
+ * The hash table contains no WAL entries at all, meaning the
+ * archive holds no WAL data.
+ */
+ pg_fatal("could not find WAL in archive \"%s\"",
+ privateInfo->archive_name);
+ }
+ }
+ else
+ entry = privateInfo->cur_file;
}
/* Extract the WAL segment size from the long page header */
--
2.47.1
view thread (85+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Subject: Re: pg_waldump: support decoding of WAL inside tarfile
In-Reply-To: <CAAJ_b96d0imBJ9Qm93oe40bEZ_4s-u2stM-JB0LYQC5GcjVW-w@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox