public inbox for [email protected]
help / color / mirror / Atom feedFrom: Merlin Moncure <[email protected]>
To: Dominique Devienne <[email protected]>
Cc: [email protected]
Subject: Re: Aggregate versions of hashing functions (md5, sha1, etc...)
Date: Thu, 10 Jul 2025 13:02:59 -0600
Message-ID: <CAHyXU0ztigAau2x5zut6tdjNVXENEGR00F-Mvx4X-20zFz-NnQ@mail.gmail.com> (raw)
In-Reply-To: <CAFCRh-8FOZiycyfX4uPB8MTHQTxqNVuW0pdKBuFNQneEZy1PwQ@mail.gmail.com>
References: <CAFCRh-8FOZiycyfX4uPB8MTHQTxqNVuW0pdKBuFNQneEZy1PwQ@mail.gmail.com>
On Thu, Jul 10, 2025 at 5:48 AM Dominique Devienne <[email protected]>
wrote:
> We store scientific information in PostgreSQL, and some of that is
> bytea and large, thus we must "chunk it" both for performance, and not
> be limited to 1GB (we do exceed that, in rare occasions).
>
> Recently I added md5/sha1 hashing support for such values (for various
> reasons, to track corruptions in our ETLs, now fixed, but also in the
> future for custom smart sync and such), and was surprised to discover
> there are no aggregate versions of those functions, neither the
> built-in md5 one (now with bytea overload), nor for pgcrypto. Did I
> miss something?
>
> Any chance this might be added in the future?
>
> Seems so logical to me, that these hashing functions were available
> are aggregates, I can't be the first one to think of that, can it?
>
> Thanks, --DD
>
> PS: The holly-grail IMHO, would be built-in support for hashing, with
> intelligent lazy-compute and persistence correctly reset on changes.
> Probably opt-in.
>
> PPS: Built-in Support for xxh64, or at least in official pgcrypto,
> would also be nice. With aggregate!
>
> PPPS: I remember Oracle LOBs supporting a built-in implicit mime-type
> attribute on them, which was used as the content-type with served over
> the built-in WebDAV extension for the DB. I wish PostgreSQL had
> something like that too.
>
> PPPPS: the lo extension is not viable for us, FWIW. All the above
> should be opt-in on bytea columns IMHO.
>
Does it have to match the md5 of the 'unchunked' variant exactly? If not,
maybe you can rig a custom aggregate that would just "hash amend" the
chunks using the n-1 chunk has as salt, this would be fast and easy, at the
cost of not matching the unchunked variant.
I had to do something very similar with compression, I had a need to be
able to compress bytea values with lz4 at SQL level due to limited support
for extensions in the cloud. it works great...if a bit slow, and requires
the database to handle the decompression.
merlin
view thread (2+ messages)
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected]
Subject: Re: Aggregate versions of hashing functions (md5, sha1, etc...)
In-Reply-To: <CAHyXU0ztigAau2x5zut6tdjNVXENEGR00F-Mvx4X-20zFz-NnQ@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox