public inbox for [email protected]  
help / color / mirror / Atom feed
From: Greg Sabino Mullane <[email protected]>
To: Josh Triplett <[email protected]>
Cc: [email protected]
Subject: Re: Storing and comparing columns of cryptographic hashes?
Date: Mon, 8 Apr 2024 10:43:51 -0400
Message-ID: <CAKAnmmLMa-q3F4K42J=yN15VFHyqs5JNbW=tsYQE=ZmdQGVwzg@mail.gmail.com> (raw)
In-Reply-To: <ZhP6OmFXob6W3uLK@localhost>
References: <ZhP6OmFXob6W3uLK@localhost>

On Mon, Apr 8, 2024 at 10:08 AM Josh Triplett <[email protected]> wrote:

> - Is there a way to tell postgresql "this column contains cryptographic
> hashes, so you can do hash joins using any subset of the bits,
> without having to hash them again"? If not, should there be?
>

No, and no. (if I understand your question correctly). You could use a
functional index, I suppose, but seems premature optimization.


> - Is `bit(256)` the right type to use to store 32-byte hash values with no
> additional overhead?
>

No, you would want bytea. I would store the value in a TEXT field, unless
you really worried about space savings. The hexadecimal value will be far
easier to debug and work with, and you can use a simple b-tree index.

- What would be the simplest way, given an input array of hashes (which
>   I may have to pass in as an array and use `unnest`), to filter out all
>   the values that already exist, *and* generate a corresponding bitmap
>   in the same order for present/not-present for the entire array (to
>   send back to the user)? Filtering seems easy enough, but generating
>   the bitmap less so.
>

Something like this:

SELECT array_agg(case when t.bhash is null then 1 else 0 end)
from unnest(array['blakehash1', 'blakehash2', etc...]) as a(x)
left join mytable t on t.bhash = a.x;


> - Does it make more sense to store the values as one row per value, or
>   as one row per group of values?


Hard to answer without knowing more, but I'd lean towards simple and one
row per value.

Your proposal (query db, do external work, update db) also sets of lots of
concurrency red flags, so be mindful of that.

Cheers,
Greg


reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: Storing and comparing columns of cryptographic hashes?
  In-Reply-To: <CAKAnmmLMa-q3F4K42J=yN15VFHyqs5JNbW=tsYQE=ZmdQGVwzg@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox