public inbox for [email protected]
help / color / mirror / Atom feedFrom: Dominique Devienne <[email protected]>
To: Ron Johnson <[email protected]>
Cc: pgsql-general <[email protected]>
Subject: Re: Aggregate versions of hashing functions (md5, sha1, etc...)
Date: Fri, 11 Jul 2025 11:00:49 +0200
Message-ID: <CAFCRh-8chpiFs1oOGnOS=wTRd9y0t4cojv2iMwr0Zgo+j1RYTg@mail.gmail.com> (raw)
In-Reply-To: <CANzqJaAWF7hr0OjMHBDpsO6O-AtPLu1RwopBEVYqgBT=Rmwrbg@mail.gmail.com>
References: <CAFCRh-8FOZiycyfX4uPB8MTHQTxqNVuW0pdKBuFNQneEZy1PwQ@mail.gmail.com>
<[email protected]>
<CANzqJaAWF7hr0OjMHBDpsO6O-AtPLu1RwopBEVYqgBT=Rmwrbg@mail.gmail.com>
On Thu, Jul 10, 2025 at 7:11 PM Ron Johnson <[email protected]> wrote:
> On Thu, Jul 10, 2025 at 12:26 PM Adrian Klaver <[email protected]> wrote:
>> On 7/10/25 04:48, Dominique Devienne wrote:
>> > Seems so logical to me, that these hashing functions were available
>> > are aggregates, I can't be the first one to think of that, can it?
>>
>> I've been on this list since late 2002 and I don't recall this ever
>> being brought up. Now it is entirely possible that age has dimmed my
>> recall abilities:) Though a quick search seems to confirm my memory.
>
> What even is an aggregate hash function? (I can imagine a few possibilities, but don't want to assume.)
Well, it's so obvious to me, I wonder if you're baiting me :)
Any hasher/digest inits some internal state, processes bytes,
typically in "streaming-fashion" via successive byte spans (equivalent
to PostgreSQL's bytea), and yields a digest of various length at the
end. The current md5() and pgcrypto.digest() functions roll the x1
init, xN process, and x1 finish into a single call, processing a
single bytea (or perhaps more intelligently for TOAST'ed values, the
2K "rows" of those in streaming-fashion, hopefully. Can a dev
confirm?). As an aggregate, the processing is extended to all values
aggregated. That's it. Obviously order-sensitive, so an explicit ORDER
BY inside the aggregate call is DE RIGEUR, but that's normal. As I
mentioned already, SQLite supports sha3_agg() for almost a year, and
had sha(1|3)_query, which is conceptually similar (although hashes
value types too, since multi-column and dynamically typed), for years
(8+ for sha3, probably decades for sha1).
Basically anyone who knows hashing/digests and has ever written an
aggregate UDF (in SQLite or elsewhere), understands what I'm talking
about. --DD
view thread (3+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected]
Subject: Re: Aggregate versions of hashing functions (md5, sha1, etc...)
In-Reply-To: <CAFCRh-8chpiFs1oOGnOS=wTRd9y0t4cojv2iMwr0Zgo+j1RYTg@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox