From: Dharin Shah
To: Michael Paquier
Cc: Robert Treat, Peter Eisentraut, pgsql-hackers@lists.postgresql.org
Date: Thu, 25 Dec 2025 01:54:46 +0100
Subject: Re: Fwd: [PATCH] Add zstd compression for TOAST using extended header format

Thanks Michael & Robert,

Agreed. I don’t think it’s realistic or practical to talk about deprecating or replacing pglz (or lz4), given on-disk compatibility requirements.

> Note that I am not on board with simply reusing varatt_external for
> zstd-compressed entries, neither do I think that this is the best move
> ever. It makes the core patch simpler, but it makes things like
> ToastCompressionId more complicated to think about. If anything, I'd
> consider a rename of varatt_external as the best way to go with an
> intermediate "translation" structure only used in memory as I am
> proposing on the other thread (something that others seem meh enough
> about but I am not seeing alternate proposals floating around,
> either). This would make things like detoast_external_attr() less
> confusing, I think, as the latest patch posted on this thread is
> actually proving with its shortcut for toast_fetch_datum as one
> example of something I'd rather not do..

On the design: I understand & share the same concerns that we’d end up with multiple “sources of truth” for external compression method identification (pglz/lz4 via va_extinfo bits, zstd via vartag), and that this pushes method-specific shortcuts into detoast paths.

Would you be OK if I split this into two steps?

1. First, a refactor-only patch introducing a small decoded, in-memory representation of an external TOAST pointer, so that detoast/toast code paths don’t have to reason directly about tcinfo vs. vartag vs. va_extinfo. This would be a cleanup with no on-disk format change and no behavioral change for existing methods. Is this the same “translation structure” approach you mentioned in the other thread? If you can point me to it, I’ll align with that proposal.

2. Then, a follow-up patch adding zstd using VARTAG_ONDISK_ZSTD, implemented on top of that abstraction to keep zstd handling centralized and minimize special-casing in detoast.

If that direction matches what you had in mind, I can first post the proposed translation structure/API for feedback before respinning the zstd patch.
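
For illustration only, a minimal C sketch of what such a decoded, in-memory representation could look like. Every name in it (SketchOnDiskPointer, SketchDecodedPointer, sketch_decode_pointer, the SKETCH_* constants) is hypothetical and not an existing PostgreSQL definition, and the field layout is simplified. The 30-bit external size / 2-bit method split of va_extinfo and the "compressed only if external size < raw size minus header" test are modelled on the existing varatt_external handling; the tag values, including the one standing in for VARTAG_ONDISK_ZSTD, are assumptions. The idea is that the decoder is the only place that looks at vartag and va_extinfo bits together:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tag values; the ZSTD one stands in for VARTAG_ONDISK_ZSTD. */
enum
{
    SKETCH_VARTAG_ONDISK = 18,
    SKETCH_VARTAG_ONDISK_ZSTD = 19
};

/* Assumed va_extinfo layout: low 30 bits = external size, top 2 = method. */
#define SKETCH_EXTSIZE_BITS 30
#define SKETCH_EXTSIZE_MASK ((UINT32_C(1) << SKETCH_EXTSIZE_BITS) - 1)
#define SKETCH_VARHDRSZ 4

typedef enum
{
    SKETCH_METHOD_NONE,     /* stored externally, not compressed */
    SKETCH_METHOD_PGLZ,
    SKETCH_METHOD_LZ4,
    SKETCH_METHOD_ZSTD
} SketchCompressionMethod;

/* Simplified stand-in for the on-disk external pointer. */
typedef struct
{
    uint8_t  vartag;
    int32_t  va_rawsize;    /* original size, including header */
    uint32_t va_extinfo;    /* external size plus method bits */
    uint32_t va_valueid;
    uint32_t va_toastrelid;
} SketchOnDiskPointer;

/* Decoded in-memory form: the single source of truth for callers. */
typedef struct
{
    SketchCompressionMethod method;
    int32_t  rawsize;
    uint32_t extsize;
    uint32_t valueid;
    uint32_t toastrelid;
} SketchDecodedPointer;

/*
 * The only function that interprets vartag and va_extinfo bits together.
 * Returns false for an unknown tag or unknown legacy method bits.
 */
bool
sketch_decode_pointer(const SketchOnDiskPointer *in, SketchDecodedPointer *out)
{
    uint32_t extsize = in->va_extinfo & SKETCH_EXTSIZE_MASK;
    uint32_t method_bits = in->va_extinfo >> SKETCH_EXTSIZE_BITS;

    out->rawsize = in->va_rawsize;
    out->extsize = extsize;
    out->valueid = in->va_valueid;
    out->toastrelid = in->va_toastrelid;

    switch (in->vartag)
    {
        case SKETCH_VARTAG_ONDISK:
            /* Legacy tag: compressed only if extsize < rawsize - header. */
            if ((int64_t) extsize >= (int64_t) in->va_rawsize - SKETCH_VARHDRSZ)
                out->method = SKETCH_METHOD_NONE;
            else if (method_bits == 0)
                out->method = SKETCH_METHOD_PGLZ;
            else if (method_bits == 1)
                out->method = SKETCH_METHOD_LZ4;
            else
                return false;
            return true;

        case SKETCH_VARTAG_ONDISK_ZSTD:
            /* Hypothetical new tag: the method is implied by the tag. */
            out->method = SKETCH_METHOD_ZSTD;
            return true;

        default:
            return false;
    }
}
```

With something along these lines, callers such as the detoast paths would switch on the decoded method alone, so adding a method later would touch the decoder rather than each caller.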

Thanks,
Dharin

On Thu, Dec 25, 2025 at 1:25 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Wed, Dec 24, 2025 at 11:50:48AM -0500, Robert Treat wrote:
> > Agreed that I can't see pglz being removed any time soon, if ever.
> > Thinking through what a conversion process would look like seems
> > unwieldy at best, so I think we definitely need it for backwards
> > compatibility, plus I think it is useful to have a self-contained
> > option. I'd almost suggest we should look at replacing lz4, but I
> > don't think that is significantly easier, it just has a smaller, more
> > invested, blast radius.
>
> Backward-compatibility requirements make a replacement of LZ4
> basically impossible to me, for the same reasons as pglz. We could
> not replace the bit used in the va_extinfo to track if LZ4 compression
> is used, either. One thing that I do wonder is if it would make
> things simpler in the long-run if we introduced a new separated vartag
> for LZ4-compressed external TOAST pointers as well. At least we'd
> have a leaner design: it means that we have to keep the
> varatt_external available on read, but we could update to the new
> format when writing entries. Or perhaps that's not worth the
> complication based on the last sentence you are writing..
>
> > That said, I do suspect zstd could quickly
> > become a popular recommendation and/or default among users /
> > consultants / service providers.
>
> .. Because I strongly suspect that this is going to be true, and that
> zstd would just be a better replacement over lz4. That's a trend that
> I see is already going on for wal_compression.
>
> Note that I am not on board with simply reusing varatt_external for
> zstd-compressed entries, neither do I think that this is the best move
> ever. It makes the core patch simpler, but it makes things like
> ToastCompressionId more complicated to think about. If anything, I'd
> consider a rename of varatt_external as the best way to go with an
> intermediate "translation" structure only used in memory as I am
> proposing on the other thread (something that others seem meh enough
> about but I am not seeing alternate proposals floating around,
> either). This would make things like detoast_external_attr() less
> confusing, I think, as the latest patch posted on this thread is
> actually proving with its shortcut for toast_fetch_datum as one
> example of something I'd rather not do..
> --
> Michael