From: Dominique Devienne <[email protected]>
To: Ron Johnson <[email protected]>
Cc: pgsql-general <[email protected]>
Subject: Re: Aggregate versions of hashing functions (md5, sha1, etc...)
Date: Fri, 11 Jul 2025 14:11:44 +0200
Message-ID: <CAFCRh-9dMQC99F22VreuOF9sv7kNjqVzXvaHZQerk0aBHUyhTA@mail.gmail.com>
In-Reply-To: <CAFCRh-8chpiFs1oOGnOS=wTRd9y0t4cojv2iMwr0Zgo+j1RYTg@mail.gmail.com>
References: <CAFCRh-8FOZiycyfX4uPB8MTHQTxqNVuW0pdKBuFNQneEZy1PwQ@mail.gmail.com>
	<[email protected]>
	<CANzqJaAWF7hr0OjMHBDpsO6O-AtPLu1RwopBEVYqgBT=Rmwrbg@mail.gmail.com>
	<CAFCRh-8chpiFs1oOGnOS=wTRd9y0t4cojv2iMwr0Zgo+j1RYTg@mail.gmail.com>

On Fri, Jul 11, 2025 at 11:00 AM Dominique Devienne <[email protected]> wrote:
> The current md5() and pgcrypto.digest() functions roll the x1
> init, xN process, and x1 finish into a single call, processing a
> single bytea (or perhaps more intelligently for TOAST'ed values, the
> 2K "rows" of those in streaming-fashion, hopefully. Can a dev confirm?)

FWIW, I've [asked ChatGPT about that][1], and assuming it's right (md5
and pgcrypto.digest do not leverage the "substring-optimization" on
TOAST'ed bytea), that's an unfortunate lost opportunity, especially for
byteas approaching the 1GB limit. And again (sorry to lay it on
thick...), when one has to chunk manually for sizes > 1GB, the lack
of an aggregate is a bit crippling, I'm afraid.
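
For illustration only, here is a minimal Python sketch of the init /
process / finish pattern described in the quoted text. It is not
PostgreSQL code: `streaming_digest` and `split` are hypothetical helper
names, and the 2000-byte chunk size is an arbitrary stand-in. It shows
that feeding chunks one at a time gives the same digest as hashing the
whole value at once, which is the property a streaming aggregate would
rely on.

```python
import hashlib
from typing import Iterable, Iterator


def streaming_digest(chunks: Iterable[bytes], algo: str = "sha256") -> str:
    """Hash a sequence of chunks without ever concatenating them."""
    h = hashlib.new(algo)      # x1 init
    for chunk in chunks:       # xN process, one chunk at a time
        h.update(chunk)
    return h.hexdigest()       # x1 finish


def split(data: bytes, size: int) -> Iterator[bytes]:
    """Yield consecutive slices of at most `size` bytes (hypothetical helper)."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


# Stand-in payload; a real case would read chunks from storage instead.
payload = bytes(range(256)) * 40

chunked = streaming_digest(split(payload, 2000))
one_shot = hashlib.sha256(payload).hexdigest()
assert chunked == one_shot
```

The chunks must be fed in a well-defined order, so an aggregate form
would presumably need an ORDER BY on its input to be deterministic.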

So again, can a dev confirm what ChatGPT blurted out?

And if true, is there any interest in improving the current scalar
digests so that they truly stream over TOAST'ed values?

And of course, the main point of this thread: any interest in adding
(true streaming) aggregate support in a future version?

Thanks, --DD

[1]: https://chatgpt.com/share/6870fe03-416c-800e-8633-a76e478a794a





