public inbox for [email protected]
help / color / mirror / Atom feedFrom: Dharin Shah <[email protected]>
To: Michael Paquier <[email protected]>
Cc: Robert Treat <[email protected]>
Cc: Peter Eisentraut <[email protected]>
Cc: [email protected]
Subject: Re: Fwd: [PATCH] Add zstd compression for TOAST using extended header format
Date: Mon, 29 Dec 2025 14:45:27 +0100
Message-ID: <CAOj6k6dEVi0NvLjMLDhyrJS_n_NZO5D_OU89AO1u53u6NCDDwQ@mail.gmail.com> (raw)
In-Reply-To: <CAOj6k6dG0M-G0gs_Htray5y_pfvacXSjKYw0uJaaYJvOeqk7xw@mail.gmail.com>
References: <CAOj6k6dy2CRVA6Lsb5N59zE-7KNVKt=oYwWyg8ULK8zOOY8e7A@mail.gmail.com>
<CAOj6k6dkseVvZzmEAWvBd6twZsCU0DbN+qeM7CoDuMM3r9doiw@mail.gmail.com>
<CAOj6k6eSsQ_PTUTxW0upn6wp7tzbvQwqkQco-=+majwX8p6JJg@mail.gmail.com>
<[email protected]>
<[email protected]>
<CAOj6k6fxs0Bwjr34W4aURzFPta0DTGLN-1ic-U+-M_EqJ+Wd8A@mail.gmail.com>
<[email protected]>
<CABV9wwOdeFHHNnw+rUsVnFek-oqwsJnJX9KFG6AT5Yqz5RA_vg@mail.gmail.com>
<[email protected]>
<CAOj6k6dG0M-G0gs_Htray5y_pfvacXSjKYw0uJaaYJvOeqk7xw@mail.gmail.com>
Hello Michael,
Following up on the discussion about avoiding method-specific shortcuts in
detoast paths, this patch is a refactor-only step: it introduces a small
decoded/in-memory representation of an on-disk external TOAST pointer, and
refactors detoast_attr() and detoast_attr_slice() to use it.
The goal is to centralize “how do we interpret an external datum?” so that
detoast code paths don’t have to reason directly about va_extinfo encoding
vs payload layout details. This is intended as groundwork for a follow-up
patch adding a new vartag-based method (e.g., zstd) without scattering
special cases in detoast paths.
Key changes
- Introduces DecodedExternalToast + ToastDecompressMethod +
TOAST_EXT_HAS_TCINFO in toast_internals.h.
- Adds a small static decoder in detoast.c (decode_external_toast_pointer())
- Refactors detoast_attr() and detoast_attr_slice() to use a decode ->
fetch -> decompress dispatch pattern
- No on-disk format changes; existing behavior preserved (including error
behavior for unsupported compression builds).
Why HAS_TCINFO?
- Previously, “is compressed?” was used as a proxy for whether the external
payload begins with tcinfo. This patch makes that explicit: HAS_TCINFO
captures payload layout, which is distinct from whether the value is
compressed. This separation is needed for future methods that may store
external compressed payloads without tcinfo.
Testing: Core regression suites pass
Performance: I ran a small detoast-focused benchmark that forces external
storage; results were within run-to-run variance, with no measurable
regression. (Benchmark script attached: benchmark_toast_detoast.sql for
reproduction)
Thanks,
Dharin
On Thu, Dec 25, 2025 at 1:54 AM Dharin Shah <[email protected]> wrote:
> Thanks Michael & Robert,
>
> Agreed — I don’t think it’s realistic or practical to talk about
> deprecating or replacing pglz (or lz4) given on-disk compatibility
> requirements.
>
> > Note that I am not on board with simply reusing varatt_external for
> > zstd-compressed entries, neither do I think that this is the best move
> > ever. It makes the core patch simpler, but it makes things like
> > ToastCompressionId more complicated to think about. If anything, I'd
> > consider a rename of varatt_external as the best way to go with an
> > intermediate "translation" structure only used in memory as I am
> > proposing on the other thread (something that others seem meh enough
> > about but I am not seeing alternate proposals floating around,
> > either). This would make things like detoast_external_attr() less
> > confusing, I think, as the latest patch posted on this thread is
> > actually proving with its shortcut for toast_fetch_datum as one
> > example of something I'd rather not do..
>
> On the design: I understand & share the same concerns that we’d end up
> with multiple “sources of truth” for external compression method
> identification (pglz/lz4 via va_extinfo bits, zstd via vartag), and that
> this pushes method-specific shortcuts into detoast paths.
>
> Would you be OK if I split this into two steps?
>
> 1.First, a refactor-only patch introducing a small decoded/in-memory
> representation of an external TOAST pointer, so detoast/toast code paths
> don’t have to reason directly about tcinfo vs vartag vs va_extinfo. This
> would be a cleanup with no on-disk format change and no behavioral change
> for existing methods. Is this the same “translation structure” approach you
> mentioned in the other thread? If you can point me to it, I’ll align with
> that proposal.
>
> 2. Then, a follow-up patch adding zstd using VARTAG_ONDISK_ZSTD,
> implemented on top of that abstraction to keep zstd handling centralized
> and minimize special-casing in detoast.
> If that direction matches what you had in mind, I can first post the
> proposed translation structure/API for feedback before respinning the zstd
> patch.
>
> Thanks,
> Dharin
>
>
> On Thu, Dec 25, 2025 at 1:25 AM Michael Paquier <[email protected]>
> wrote:
>
>> On Wed, Dec 24, 2025 at 11:50:48AM -0500, Robert Treat wrote:
>> > Agreed that I can't see pglz being removed any time soon, if ever.
>> > Thinking through what a conversion process would look like seems
>> > unwieldy at best, so I think we definitely need it for backwards
>> > compatibility, plus I think it is useful to have a self-contained
>> > option. I'd almost suggest we should look at replacing lz4, but I
>> > don't think that is significantly easier, it just has a smaller, more
>> > invested, blast radius.
>>
>> Backward-compatibility requirements make a replacement of LZ4
>> basically impossible to me, for the same reasons as pglz. We could
>> not replace the bit used in the va_extinfo to track if LZ4 compression
>> is used, either. One thing that I do wonder is if it would make
>> things simpler in the long-run if we introduced a new separated vartag
>> for LZ4-compressed external TOAST pointers as well. At least we'd
>> have a leaner design: it means that we have to keep the
>> varatt_external available on read, but we could update to the new
>> format when writing entries. Or perhaps that's not worth the
>> complication based on the last sentence you are writing..
>>
>> > That said, I do suspect ztsd could quickly
>> > become a popular recommendation and/or default among users /
>> > consultants / service providers.
>>
>> .. Because I strongly suspect that this is going to be true, and that
>> zstd would just be a better replacement over lz4. That's a trend that
>> I see is already going on for wal_compression.
>>
>> Note that I am not on board with simply reusing varatt_external for
>> zstd-compressed entries, neither do I think that this is the best move
>> ever. It makes the core patch simpler, but it makes things like
>> ToastCompressionId more complicated to think about. If anything, I'd
>> consider a rename of varatt_external as the best way to go with an
>> intermediate "translation" structure only used in memory as I am
>> proposing on the other thread (something that others seem meh enough
>> about but I am not seeing alternate proposals floating around,
>> either). This would make things like detoast_external_attr() less
>> confusing, I think, as the latest patch posted on this thread is
>> actually proving with its shortcut for toast_fetch_datum as one
>> example of something I'd rather not do..
>> --
>> Michael
>>
>
Attachments:
[application/octet-stream] benchmark_toast_detoast.sql (2.8K, 3-benchmark_toast_detoast.sql)
download
[application/octet-stream] 0001-refactor-detoast-to-use-decoded-external-toast-abstr.patch (13.2K, 4-0001-refactor-detoast-to-use-decoded-external-toast-abstr.patch)
download | inline diff:
From 49939fe3416803d446ba43c404eae9711a3ae21b Mon Sep 17 00:00:00 2001
From: Dharin Shah <[email protected]>
Date: Sun, 28 Dec 2025 23:21:21 +0100
Subject: [PATCH v1] refactor detoast to use decoded external toast abstraction
---
src/backend/access/common/detoast.c | 326 ++++++++++++++++++++-------
src/include/access/toast_internals.h | 39 ++++
2 files changed, 278 insertions(+), 87 deletions(-)
diff --git a/src/backend/access/common/detoast.c b/src/backend/access/common/detoast.c
index 62651787742..fdade574913 100644
--- a/src/backend/access/common/detoast.c
+++ b/src/backend/access/common/detoast.c
@@ -29,6 +29,70 @@ static struct varlena *toast_fetch_datum_slice(struct varlena *attr,
static struct varlena *toast_decompress_datum(struct varlena *attr);
static struct varlena *toast_decompress_datum_slice(struct varlena *attr, int32 slicelength);
+static struct varlena *toast_fetch_datum_decoded(const DecodedExternalToast *decoded);
+static struct varlena *toast_decompress_decoded(const struct varlena *compressed,
+ const DecodedExternalToast *decoded);
+
+/* ----------
+ * decode_external_toast_pointer -
+ *
+ * Decode external varlena into DecodedExternalToast struct.
+ * Only handles VARTAG_ONDISK with known compression methods.
+ * Returns false for INDIRECT, EXPANDED, or unrecognized compression.
+ * ----------
+ */
+static bool
+decode_external_toast_pointer(const struct varlena *attr,
+ DecodedExternalToast *decoded)
+{
+ struct varatt_external toast_pointer;
+ vartag_external tag;
+
+ Assert(VARATT_IS_EXTERNAL(attr));
+
+ tag = VARTAG_EXTERNAL(attr);
+
+ if (tag == VARTAG_ONDISK)
+ {
+ VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+
+ decoded->toastrelid = toast_pointer.va_toastrelid;
+ decoded->valueid = toast_pointer.va_valueid;
+ decoded->rawsize = toast_pointer.va_rawsize;
+ decoded->extsize = VARATT_EXTERNAL_GET_EXTSIZE(toast_pointer);
+ decoded->flags = TOAST_EXT_IS_ONDISK;
+
+ if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
+ {
+ uint32 cmid = VARATT_EXTERNAL_GET_COMPRESS_METHOD(toast_pointer);
+
+ switch (cmid)
+ {
+ case TOAST_PGLZ_COMPRESSION_ID:
+ decoded->method = TOAST_DECOMP_PGLZ;
+ decoded->flags |= TOAST_EXT_HAS_TCINFO;
+ break;
+ case TOAST_LZ4_COMPRESSION_ID:
+ decoded->method = TOAST_DECOMP_LZ4;
+ decoded->flags |= TOAST_EXT_HAS_TCINFO;
+ break;
+ default:
+ /* Unknown compression - let caller fall back */
+ return false;
+ }
+ }
+ else
+ {
+ decoded->method = TOAST_DECOMP_NONE;
+ }
+
+ return true;
+ }
+
+ /* Not an on-disk datum (INDIRECT, EXPANDED, etc.) */
+ return false;
+}
+
/* ----------
* detoast_external_attr -
*
@@ -115,57 +179,78 @@ detoast_external_attr(struct varlena *attr)
struct varlena *
detoast_attr(struct varlena *attr)
{
- if (VARATT_IS_EXTERNAL_ONDISK(attr))
+ if (VARATT_IS_EXTERNAL(attr))
{
- /*
- * This is an externally stored datum --- fetch it back from there
- */
- attr = toast_fetch_datum(attr);
- /* If it's compressed, decompress it */
- if (VARATT_IS_COMPRESSED(attr))
+ DecodedExternalToast decoded;
+
+ if (decode_external_toast_pointer(attr, &decoded))
{
- struct varlena *tmp = attr;
+ struct varlena *fetched = toast_fetch_datum_decoded(&decoded);
+
+ if (decoded.method != TOAST_DECOMP_NONE)
+ {
+ struct varlena *result = toast_decompress_decoded(fetched, &decoded);
+ pfree(fetched);
+ return result;
+ }
+ return fetched;
+ }
- attr = toast_decompress_datum(tmp);
- pfree(tmp);
+ /* Decode failed: INDIRECT, EXPANDED, or unrecognized compression */
+ if (VARATT_IS_EXTERNAL_ONDISK(attr))
+ {
+ /* Unrecognized compression - legacy path preserves error behavior */
+ attr = toast_fetch_datum(attr);
+ if (VARATT_IS_COMPRESSED(attr))
+ {
+ struct varlena *tmp = attr;
+
+ attr = toast_decompress_datum(tmp);
+ pfree(tmp);
+ }
+ return attr;
}
- }
- else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
- {
- /*
- * This is an indirect pointer --- dereference it
- */
- struct varatt_indirect redirect;
+ else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+ {
+ /*
+ * This is an indirect pointer --- dereference it
+ */
+ struct varatt_indirect redirect;
- VARATT_EXTERNAL_GET_POINTER(redirect, attr);
- attr = (struct varlena *) redirect.pointer;
+ VARATT_EXTERNAL_GET_POINTER(redirect, attr);
+ attr = (struct varlena *) redirect.pointer;
- /* nested indirect Datums aren't allowed */
- Assert(!VARATT_IS_EXTERNAL_INDIRECT(attr));
+ /* nested indirect Datums aren't allowed */
+ Assert(!VARATT_IS_EXTERNAL_INDIRECT(attr));
- /* recurse in case value is still extended in some other way */
- attr = detoast_attr(attr);
+ /* recurse in case value is still extended in some other way */
+ attr = detoast_attr(attr);
- /* if it isn't, we'd better copy it */
- if (attr == (struct varlena *) redirect.pointer)
- {
- struct varlena *result;
+ /* if it isn't, we'd better copy it */
+ if (attr == (struct varlena *) redirect.pointer)
+ {
+ struct varlena *result;
- result = (struct varlena *) palloc(VARSIZE_ANY(attr));
- memcpy(result, attr, VARSIZE_ANY(attr));
- attr = result;
+ result = (struct varlena *) palloc(VARSIZE_ANY(attr));
+ memcpy(result, attr, VARSIZE_ANY(attr));
+ attr = result;
+ }
+ return attr;
+ }
+ else if (VARATT_IS_EXTERNAL_EXPANDED(attr))
+ {
+ /*
+ * This is an expanded-object pointer --- get flat format
+ */
+ attr = detoast_external_attr(attr);
+ /* flatteners are not allowed to produce compressed/short output */
+ Assert(!VARATT_IS_EXTENDED(attr));
+ return attr;
}
}
- else if (VARATT_IS_EXTERNAL_EXPANDED(attr))
- {
- /*
- * This is an expanded-object pointer --- get flat format
- */
- attr = detoast_external_attr(attr);
- /* flatteners are not allowed to produce compressed/short output */
- Assert(!VARATT_IS_EXTENDED(attr));
- }
- else if (VARATT_IS_COMPRESSED(attr))
+
+ /* Handle inline cases (not external) */
+ if (VARATT_IS_COMPRESSED(attr))
{
/*
* This is a compressed value inside of the main tuple
@@ -223,63 +308,75 @@ detoast_attr_slice(struct varlena *attr,
else if (pg_add_s32_overflow(sliceoffset, slicelength, &slicelimit))
slicelength = slicelimit = -1;
- if (VARATT_IS_EXTERNAL_ONDISK(attr))
+ if (VARATT_IS_EXTERNAL(attr))
{
- struct varatt_external toast_pointer;
+ DecodedExternalToast decoded;
- VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+ if (decode_external_toast_pointer(attr, &decoded))
+ {
+ if (decoded.method == TOAST_DECOMP_NONE)
+ return toast_fetch_datum_slice(attr, sliceoffset, slicelength);
+
+ /* Compressed: fetch enough to decompress the requested prefix */
+ if (slicelimit >= 0)
+ {
+ int32 max_size = decoded.extsize;
+
+ /* PGLZ supports partial decompression; LZ4 needs all data */
+ if (decoded.method == TOAST_DECOMP_PGLZ)
+ max_size = pglz_maximum_compressed_size(slicelimit, max_size);
+
+ preslice = toast_fetch_datum_slice(attr, 0, max_size);
+ }
+ else
+ preslice = toast_fetch_datum_decoded(&decoded);
+ }
+ else if (VARATT_IS_EXTERNAL_ONDISK(attr))
+ {
+ /* Unrecognized compression - legacy path */
+ struct varatt_external toast_pointer;
- /* fast path for non-compressed external datums */
- if (!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
- return toast_fetch_datum_slice(attr, sliceoffset, slicelength);
+ VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
- /*
- * For compressed values, we need to fetch enough slices to decompress
- * at least the requested part (when a prefix is requested).
- * Otherwise, just fetch all slices.
- */
- if (slicelimit >= 0)
- {
- int32 max_size = VARATT_EXTERNAL_GET_EXTSIZE(toast_pointer);
+ if (!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
+ return toast_fetch_datum_slice(attr, sliceoffset, slicelength);
- /*
- * Determine maximum amount of compressed data needed for a prefix
- * of a given length (after decompression).
- *
- * At least for now, if it's LZ4 data, we'll have to fetch the
- * whole thing, because there doesn't seem to be an API call to
- * determine how much compressed data we need to be sure of being
- * able to decompress the required slice.
- */
- if (VARATT_EXTERNAL_GET_COMPRESS_METHOD(toast_pointer) ==
- TOAST_PGLZ_COMPRESSION_ID)
- max_size = pglz_maximum_compressed_size(slicelimit, max_size);
+ if (slicelimit >= 0)
+ {
+ int32 max_size = VARATT_EXTERNAL_GET_EXTSIZE(toast_pointer);
- /*
- * Fetch enough compressed slices (compressed marker will get set
- * automatically).
- */
- preslice = toast_fetch_datum_slice(attr, 0, max_size);
+ if (VARATT_EXTERNAL_GET_COMPRESS_METHOD(toast_pointer) ==
+ TOAST_PGLZ_COMPRESSION_ID)
+ max_size = pglz_maximum_compressed_size(slicelimit, max_size);
+
+ preslice = toast_fetch_datum_slice(attr, 0, max_size);
+ }
+ else
+ preslice = toast_fetch_datum(attr);
}
- else
- preslice = toast_fetch_datum(attr);
- }
- else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
- {
- struct varatt_indirect redirect;
+ else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+ {
+ struct varatt_indirect redirect;
- VARATT_EXTERNAL_GET_POINTER(redirect, attr);
+ VARATT_EXTERNAL_GET_POINTER(redirect, attr);
- /* nested indirect Datums aren't allowed */
- Assert(!VARATT_IS_EXTERNAL_INDIRECT(redirect.pointer));
+ /* nested indirect Datums aren't allowed */
+ Assert(!VARATT_IS_EXTERNAL_INDIRECT(redirect.pointer));
- return detoast_attr_slice(redirect.pointer,
- sliceoffset, slicelength);
- }
- else if (VARATT_IS_EXTERNAL_EXPANDED(attr))
- {
- /* pass it off to detoast_external_attr to flatten */
- preslice = detoast_external_attr(attr);
+ return detoast_attr_slice(redirect.pointer,
+ sliceoffset, slicelength);
+ }
+ else if (VARATT_IS_EXTERNAL_EXPANDED(attr))
+ {
+ /* pass it off to detoast_external_attr to flatten */
+ preslice = detoast_external_attr(attr);
+ }
+ else
+ {
+ /* Should not reach here - unknown external type */
+ elog(ERROR, "unexpected external varlena type");
+ preslice = attr; /* keep compiler quiet */
+ }
}
else
preslice = attr;
@@ -462,6 +559,61 @@ toast_fetch_datum_slice(struct varlena *attr, int32 sliceoffset,
return result;
}
+/* ----------
+ * toast_fetch_datum_decoded -
+ *
+ * Fetch TOAST data using pre-decoded pointer info.
+ * ----------
+ */
+static struct varlena *
+toast_fetch_datum_decoded(const DecodedExternalToast *decoded)
+{
+ Relation toastrel;
+ struct varlena *result;
+ int32 attrsize = decoded->extsize;
+
+ result = (struct varlena *) palloc(attrsize + VARHDRSZ);
+
+ /* HAS_TCINFO determines header format, not "is compressed" */
+ if (decoded->flags & TOAST_EXT_HAS_TCINFO)
+ SET_VARSIZE_COMPRESSED(result, attrsize + VARHDRSZ);
+ else
+ SET_VARSIZE(result, attrsize + VARHDRSZ);
+
+ if (attrsize == 0)
+ return result;
+
+ toastrel = table_open(decoded->toastrelid, AccessShareLock);
+ table_relation_fetch_toast_slice(toastrel, decoded->valueid,
+ attrsize, 0, attrsize, result);
+ table_close(toastrel, AccessShareLock);
+
+ return result;
+}
+
+/* ----------
+ * toast_decompress_decoded -
+ *
+ * Decompress TOAST data using decoded method.
+ * ----------
+ */
+static struct varlena *
+toast_decompress_decoded(const struct varlena *compressed,
+ const DecodedExternalToast *decoded)
+{
+ switch (decoded->method)
+ {
+ case TOAST_DECOMP_PGLZ:
+ return pglz_decompress_datum(compressed);
+ case TOAST_DECOMP_LZ4:
+ return lz4_decompress_datum(compressed);
+ case TOAST_DECOMP_NONE:
+ default:
+ elog(ERROR, "unexpected decompression method %d", decoded->method);
+ return NULL;
+ }
+}
+
/* ----------
* toast_decompress_datum -
*
diff --git a/src/include/access/toast_internals.h b/src/include/access/toast_internals.h
index 06ae8583c1e..0245f67b246 100644
--- a/src/include/access/toast_internals.h
+++ b/src/include/access/toast_internals.h
@@ -16,6 +16,45 @@
#include "storage/lockdefs.h"
#include "utils/relcache.h"
#include "utils/snapshot.h"
+#include "varatt.h"
+
+/*
+ * Decompression method for decoded toast pointers. Separate from
+ * ToastCompressionId (2-bit on-disk encoding) to allow future methods.
+ */
+typedef enum ToastDecompressMethod
+{
+ TOAST_DECOMP_NONE = 0,
+ TOAST_DECOMP_PGLZ = 1,
+ TOAST_DECOMP_LZ4 = 2
+} ToastDecompressMethod;
+
+/*
+ * Flags for DecodedExternalToast.
+ *
+ * HAS_TCINFO: Payload starts with tcinfo header. True for PGLZ/LZ4 external;
+ * false for uncompressed or future methods storing raw compressed data.
+ * IS_ONDISK: Set for VARTAG_ONDISK (for future vartag extension).
+ */
+#define TOAST_EXT_HAS_TCINFO 0x01
+#define TOAST_EXT_IS_ONDISK 0x02
+
+/*
+ * Decoded representation of an external on-disk TOAST pointer.
+ * Normalizes vartag/va_extinfo variations; decode once, use throughout.
+ *
+ * HAS_TCINFO indicates payload format (has tcinfo header), distinct from
+ * "is compressed" (extsize < rawsize) - future methods may omit tcinfo.
+ */
+typedef struct DecodedExternalToast
+{
+ Oid toastrelid;
+ Oid valueid;
+ uint32 rawsize; /* Decompressed size; for future methods without tcinfo */
+ uint32 extsize; /* On-disk payload size */
+ ToastDecompressMethod method;
+ uint8 flags;
+} DecodedExternalToast;
/*
* The information at the start of the compressed toast data.
--
2.39.3 (Apple Git-146)
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected], [email protected]
Subject: Re: Fwd: [PATCH] Add zstd compression for TOAST using extended header format
In-Reply-To: <CAOj6k6dEVi0NvLjMLDhyrJS_n_NZO5D_OU89AO1u53u6NCDDwQ@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox