Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vaxik-00GSfq-0f for pgsql-hackers@arkaria.postgresql.org; Wed, 31 Dec 2025 15:02:43 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vaxij-006XPO-0d for pgsql-hackers@arkaria.postgresql.org; Wed, 31 Dec 2025 15:02:41 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vaxii-006XPF-2l for pgsql-hackers@lists.postgresql.org; Wed, 31 Dec 2025 15:02:41 +0000 Received: from mail-ed1-x52a.google.com ([2a00:1450:4864:20::52a]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1vaxih-003myP-09 for pgsql-hackers@lists.postgresql.org; Wed, 31 Dec 2025 15:02:41 +0000 Received: by mail-ed1-x52a.google.com with SMTP id 4fb4d7f45d1cf-64b5ed53d0aso15673493a12.3 for ; Wed, 31 Dec 2025 07:02:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1767193356; x=1767798156; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=Ln+KnFsA6V0pN0N71RIuEo7MJj2U2BVVETO5urq1UlM=; b=QdyyVPqg6Obf8M0Tn6SdHb/ncA2oOLqHkJ7+ub9B1asol1gEkrv1myGmYqw00SD1wt QcWwoaAUVnWL1+TE65+or2ux8MLgrbsxJ56+0gV2JHEa+eLEBlgCNh03trinOrfYoyWK ZJZiV835KL66qY8fEtczjE5CmZR1aGuqy4XFIalIC6Cn6SRyUJ0pn4xrLKQwjiHhTyJC Dt8c8HCKAMIWm9sasBCQA5Bs6wBS876kS3fTKtqY2bqEXJjpv9i11GbMJMSZjYmrAiDz +1zhyA0q6MhpoZlfzzumLzIcoNMqMY26YND90g+CAuGDFI3aqyTZaKgKIfIIeg18BiUQ nRoQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1767193356; x=1767798156; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=Ln+KnFsA6V0pN0N71RIuEo7MJj2U2BVVETO5urq1UlM=; b=B+GCts99h3HFZmSW/P2W2/yKjxVjTGM8mcVqrulTSn3quiOYuZI2vgcoA+tZe0t6qr E7dA5lKuIq/rur1qOVDlu9GYlbDkPShH2O2s9YNNED29VENbZESiHf7vL/mz7oIPRUg0 XLP6+2RyDnhWbfIaWcz0MOqYHk5oYpnc//otd9TgjlhBAtshNLNFfSjUP4aohg1MZMsg Bm1rNHVGSPzTh+lvDXpVR78+w9l1HDMqKT/XpQwCABLb6jbKMIezguk95UBd2uM0Qnbx b9p9RnddmUn8jg0LDIkWpUY0adlbdeLoIGVf5WNjCgA8txBTlTH8/Y21lImzlMYYTUmg 2TRg== X-Forwarded-Encrypted: i=1; AJvYcCUKVDFuf2F4z+YEZ6hAsqDx2iMoPlhNFYcs4XbMlA6VRyihUYWBcclbQgmKCRENbde/yLoSIKwxeLfsuRbr@lists.postgresql.org X-Gm-Message-State: AOJu0YyrSOMpAQYEkWyWcDroh7otIzxi0RTppT4iFmiVg6b9aO+jjMZ3 C6b0v1nNB71FN3wcDGjLIC3JZahV/zBZhb46wc3FvyMycUt2mrSOychotiqJBeQzloNxl+AVF/N j9Z3vq0jUUEWnW/5hk+iiNKj4cOs/WGk= X-Gm-Gg: AY/fxX7sB7yQQqADhP8l40DZEdra+Z9bgX3uMrYNlBmxy9xTXnr+qkVa+nXzXQ0J0t2 wakdoLkUhnWV03aoCt68Q9Z3QqNpOv6sT+QORr2pBKTWpZsBd7wvWEHuEXHrJ9yUH93dyC9hU3U 6ZhgHmEqlcY2SjMGNeK7qrw6G/GC6iPPtTxLdaxWQRFtETS+RipxzR5o5ew+o6WaKItLQKHcKBM Wk6xF51F7ETAmoaZgm7YoiKMmnRHOqxux9Fx5x+4Pidn12g2ib6z37JdiNNmfq5OizR2A5nvq0E WjSxfSuY5YZv10oCyUX4AYMvH+pVur4H6MyHTjMj X-Google-Smtp-Source: AGHT+IGLiH2MCfkniJ8XkNK68wMxk9S7mluxIRpElTHXwAMEV0sgXq/ItXCJyJ6X50Fwvo1pqPPCMOCe8p1yArMe7uY= X-Received: by 2002:a17:907:7246:b0:b73:792c:6326 with SMTP id a640c23a62f3a-b8036ebbbc5mr3579158366b.11.1767193356122; Wed, 31 Dec 2025 07:02:36 -0800 (PST) MIME-Version: 1.0 References: <4c70b0f6-5aba-464c-b145-464a620c1222@eisentraut.org> In-Reply-To: From: Dharin Shah Date: Wed, 31 Dec 2025 16:02:24 +0100 X-Gm-Features: AQt7F2pBF_YlyUqcXtvFrqGUNektPOHwjOJ3BXRPf_f_LJDWtZ_syRvprDpsY04 Message-ID: Subject: Re: Fwd: [PATCH] Add zstd compression for TOAST using extended header format To: Michael Paquier Cc: Robert Treat , Peter Eisentraut , pgsql-hackers@lists.postgresql.org Content-Type: multipart/alternative; boundary="0000000000003dbc7e064740c4a3" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000003dbc7e064740c4a3 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Thanks Michael, After looking more closely at your =E2=80=9C8=E2=80=91byte TOAST values / i= nfinite loop=E2=80=9D thread and patch series, I see this is very much the same direction you outlined there: introduce a normalized in-memory representation for external pointers (toast_external_data) and keep most call sites from having to reason about vartag_external/va_extinfo details directly [1]. For this refactor patch I kept the decoder local to detoast.c to minimize scope and avoid committing to a broader API boundary too early. But if the consensus heads toward a shared interface closer to the format definitions (as in your toast_external approach), I=E2=80=99m happy to respin/rework th= is patch to align with that direction, rather than working on parallel abstractions. It should also be straightforward to mold this refactor in the direction of the 8=E2=80=91byte value-id work without changing the over= all detoast structure. On HAS_TCINFO flag: the intent is to make payload layout explicit. In the current code, =E2=80=9Cexternal is compressed=E2=80=9D effectively implies = =E2=80=9Cpayload begins with tcinfo=E2=80=9D, which is wired into fetch/slice logic. For a vartag-b= ased follow-up (e.g., zstd), we may want compressed payloads without a tcinfo prefix, so having an explicit flag keeps detoast paths uniform and avoids method-specific shortcuts. Let me know what you=E2=80=99d prefer for next steps: keep this patch as a detoast-local refactor, or respin it to align more directly with a shared decoded external-pointer interface in the direction of the 8=E2=80=91byte w= ork. [1] https://www.postgresql.org/message-id/flat/CAN-LCVNsE4x0k11ZRWvU4ySTbe98fwA= 16qzV7p8dxogWnD5Jng%40mail.gmail.com#8253d63beab18b73706e3555ce19a0f4 Thanks, Dharin On Tue, Dec 30, 2025 at 12:46=E2=80=AFAM Michael Paquier wrote: > On Mon, Dec 29, 2025 at 02:45:27PM +0100, Dharin Shah wrote: > > The goal is to centralize =E2=80=9Chow do we interpret an external datu= m?=E2=80=9D so > that > > detoast code paths don=E2=80=99t have to reason directly about va_extin= fo > encoding > > vs payload layout details. This is intended as groundwork for a follow-= up > > patch adding a new vartag-based method (e.g., zstd) without scattering > > special cases in detoast paths. > > +static bool > +decode_external_toast_pointer(const struct varlena *attr, > + DecodedExternalToast > *decoded) > [...] > +typedef enum ToastDecompressMethod > +{ > + TOAST_DECOMP_NONE =3D 0, > + TOAST_DECOMP_PGLZ =3D 1, > + TOAST_DECOMP_LZ4 =3D 2 > +} ToastDecompressMethod; > + > +typedef struct DecodedExternalToast > +{ > + Oid toastrelid; > + Oid valueid; > + uint32 rawsize; /* Decompressed size; for > future methods without tcinfo */ > + uint32 extsize; /* On-disk payload size *= / > + ToastDecompressMethod method; > + uint8 flags; > +} DecodedExternalToast; > > Yeah, honestly this is a layer I have been thinking about as well as > one option, but contrary to you I have been focusing on putting that > into varatt.h, with the exception of the value being an Oid8. I think > that you have an interesting point in focusing your implementation to > be stored in the detoast part, though. I'd need to spend a bit more > time to see the result this would lead at with the larger 8-byte issue > in mind, but this is something that would come at no real cost as it > has no function pointer redirection compared to what I was first > envisioning on the other thread. That's especially true if it makes > the CompressionId business easier to mold around when adding a new > vartag. > > > Why HAS_TCINFO? > > - Previously, =E2=80=9Cis compressed?=E2=80=9D was used as a proxy for = whether the > external > > payload begins with tcinfo. This patch makes that explicit: HAS_TCINFO > > captures payload layout, which is distinct from whether the value is > > compressed. This separation is needed for future methods that may store > > external compressed payloads without tcinfo. > > It is possible to model the on-memory data as we want. This > suggestion would be OK with some flags. > -- > Michael > --0000000000003dbc7e064740c4a3 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Thanks Michael,

After looking more closely at your = =E2=80=9C8=E2=80=91byte TOAST values / infinite loop=E2=80=9D thread and pa= tch series, I see this is very much the same direction you outlined there: = introduce a normalized in-memory representation for external pointers (toas= t_external_data) and keep most call sites from having to reason about varta= g_external/va_extinfo details directly [1].

For this refactor patch = I kept the decoder local to detoast.c to minimize scope and avoid committin= g to a broader API boundary too early. But if the consensus heads toward a = shared interface closer to the format definitions (as in your toast_externa= l approach), I=E2=80=99m happy to respin/rework this patch to align with th= at direction, rather than working on parallel abstractions.=C2=A0It should = also be straightforward to mold this refactor in the direction of the 8=E2= =80=91byte value-id work without changing the overall detoast structure.
On HAS_TCINFO flag: the intent is to make payload layout explicit. In = the current code, =E2=80=9Cexternal is compressed=E2=80=9D effectively impl= ies =E2=80=9Cpayload begins with tcinfo=E2=80=9D, which is wired into fetch= /slice logic. For a vartag-based follow-up (e.g., zstd), we may want compre= ssed payloads without a tcinfo prefix, so having an explicit flag keeps det= oast paths uniform and avoids method-specific shortcuts.

Let me know what you=E2=80=99d prefer for next steps: keep this patch as a= detoast-local refactor, or respin it to align more directly with a shared = decoded external-pointer interface in the direction of the 8=E2=80=91byte w= ork.


On Tue, Dec 30, 2025 at 12:46=E2=80=AFAM Mich= ael Paquier <michael@paquier.xyz<= /a>> wrote:
O= n Mon, Dec 29, 2025 at 02:45:27PM +0100, Dharin Shah wrote:
> The goal is to centralize =E2=80=9Chow do we interpret an external dat= um?=E2=80=9D so that
> detoast code paths don=E2=80=99t have to reason directly about va_exti= nfo encoding
> vs payload layout details. This is intended as groundwork for a follow= -up
> patch adding a new vartag-based method (e.g., zstd) without scattering=
> special cases in detoast paths.

+static bool
+decode_external_toast_pointer(const struct varlena *attr,
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 DecodedExternalToast *decoded)
[...]
+typedef enum ToastDecompressMethod
+{
+=C2=A0 =C2=A0 =C2=A0 =C2=A0TOAST_DECOMP_NONE =3D 0,
+=C2=A0 =C2=A0 =C2=A0 =C2=A0TOAST_DECOMP_PGLZ =3D 1,
+=C2=A0 =C2=A0 =C2=A0 =C2=A0TOAST_DECOMP_LZ4 =3D 2
+} ToastDecompressMethod;
+
+typedef struct DecodedExternalToast
+{
+=C2=A0 =C2=A0 =C2=A0 =C2=A0Oid=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0toastrelid;
+=C2=A0 =C2=A0 =C2=A0 =C2=A0Oid=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0valueid;
+=C2=A0 =C2=A0 =C2=A0 =C2=A0uint32=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 rawsiz= e;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 /* Decompressed s= ize; for future methods without tcinfo */
+=C2=A0 =C2=A0 =C2=A0 =C2=A0uint32=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 extsiz= e;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 /* On-disk payloa= d size */
+=C2=A0 =C2=A0 =C2=A0 =C2=A0ToastDecompressMethod method;
+=C2=A0 =C2=A0 =C2=A0 =C2=A0uint8=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0f= lags;
+} DecodedExternalToast;

Yeah, honestly this is a layer I have been thinking about as well as
one option, but contrary to you I have been focusing on putting that
into varatt.h, with the exception of the value being an Oid8.=C2=A0 I think=
that you have an interesting point in focusing your implementation to
be stored in the detoast part, though.=C2=A0 I'd need to spend a bit mo= re
time to see the result this would lead at with the larger 8-byte issue
in mind, but this is something that would come at no real cost as it
has no function pointer redirection compared to what I was first
envisioning on the other thread.=C2=A0 That's especially true if it mak= es
the CompressionId business easier to mold around when adding a new
vartag.

> Why HAS_TCINFO?
> - Previously, =E2=80=9Cis compressed?=E2=80=9D was used as a proxy for= whether the external
> payload begins with tcinfo. This patch makes that explicit: HAS_TCINFO=
> captures payload layout, which is distinct from whether the value is > compressed. This separation is needed for future methods that may stor= e
> external compressed payloads without tcinfo.

It is possible to model the on-memory data as we want.=C2=A0 This
suggestion would be OK with some flags.
--
Michael
--0000000000003dbc7e064740c4a3--