public inbox for [email protected]
help / color / mirror / Atom feedCan we get sha* function over text, that could be used in index?
5+ messages / 4 participants
[nested] [flat]
* Can we get sha* function over text, that could be used in index?
@ 2026-02-18 14:58 hubert depesz lubaczewski <[email protected]>
0 siblings, 2 replies; 5+ messages in thread
From: hubert depesz lubaczewski @ 2026-02-18 14:58 UTC (permalink / raw)
To: pgsql-general
Hi,
So, we have various sha* functions.
And I recently got asked about using them as a based for unique index
over long texts.
Normally one would do it with md5(text), but the person asking wanted to
use sha(). and these functions work only on bytea.
And apparently - we can't.
'text-value'::bytea won't work for some specific text values.
convert_to() isn't immutable.
I figured out that I can do something like:
SELECT
sha256(
string_agg( ascii( t )::text, ',' ORDER BY idx )::bytea
)
FROM
regexp_split_to_table( 'INPUT_STRING', '' ) WITH ORDINALITY AS x ( t, idx );
But that's hardly sane solution.
I've read bug report from 2008:
https://www.postgresql.org/message-id/flat/48D20645.1090503%40gmx.net#ce27df4802c9854a9eb77066a5c7cb...
And while I kinda undestand, create-conversion, server-encoding, I don't
really *grok* why we can't have immutable conversion to bytea. And/or
versions of sha* functions that simply work on text.
Is it doable? How does it work in md5()? Apparently it does also work in
pgcrypto/digest(), so there should be a way to get it in core sha*
functions?
Best regards,
depesz
^ permalink raw reply [nested|flat] 5+ messages in thread
* Re: Can we get sha* function over text, that could be used in index?
@ 2026-02-18 15:07 Ron Johnson <[email protected]>
parent: hubert depesz lubaczewski <[email protected]>
1 sibling, 0 replies; 5+ messages in thread
From: Ron Johnson @ 2026-02-18 15:07 UTC (permalink / raw)
To: pgsql-general
On Wed, Feb 18, 2026 at 9:58 AM hubert depesz lubaczewski <[email protected]>
wrote:
> Hi,
> So, we have various sha* functions.
>
> And I recently got asked about using them as a based for unique index
> over long texts.
> Normally one would do it with md5(text), but the person asking wanted to
> use sha().
I think I'd push back, asking them if they really need
cryptographically-secure hashing (which they most probably don't).
--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!
^ permalink raw reply [nested|flat] 5+ messages in thread
* Re: Can we get sha* function over text, that could be used in index?
@ 2026-02-19 14:48 Peter Eisentraut <[email protected]>
parent: hubert depesz lubaczewski <[email protected]>
1 sibling, 1 reply; 5+ messages in thread
From: Peter Eisentraut @ 2026-02-19 14:48 UTC (permalink / raw)
To: [email protected]; pgsql-general
On 18.02.26 15:58, hubert depesz lubaczewski wrote:
> And while I kinda undestand, create-conversion, server-encoding, I don't
> really*grok* why we can't have immutable conversion to bytea. And/or
> versions of sha* functions that simply work on text.
Hash functions fundamentally work on a sequence of bytes, so bytea is
the right type. The encoding of text into bytes is complicated, so it
seems better if you handle that yourself depending on the local
requirements.
^ permalink raw reply [nested|flat] 5+ messages in thread
* Re: Can we get sha* function over text, that could be used in index?
@ 2026-02-19 22:30 Peter J. Holzer <[email protected]>
parent: Peter Eisentraut <[email protected]>
0 siblings, 1 reply; 5+ messages in thread
From: Peter J. Holzer @ 2026-02-19 22:30 UTC (permalink / raw)
To: [email protected]
On 2026-02-19 15:48:33 +0100, Peter Eisentraut wrote:
> On 18.02.26 15:58, hubert depesz lubaczewski wrote:
> > And while I kinda undestand, create-conversion, server-encoding, I don't
> > really*grok* why we can't have immutable conversion to bytea. And/or
> > versions of sha* functions that simply work on text.
>
> Hash functions fundamentally work on a sequence of bytes, so bytea is the
> right type. The encoding of text into bytes is complicated,
Maybe, but it needs to be done anyway, sicne text is ultimately stored
as a sequence of bytes on disk and sent as a sequence of bytes over the
wire. So the code should be present already.
Something like
encode(s text, enc text) -> bytea
Encodes s in encoding enc. E.g. encode('Tröt!', 'utf-8')
produces \x5472c3b67421
decode(d bytea, enc text) -> text
Decodes d assuming encoding enc. E.g.
decode('\x5472c3b67421'::bytea, 'utf-8') produces 'Tröt!'.
might be generally useful.
> so it seems better if you handle that yourself depending on the local
> requirements.
I would probably do that kind of processing in the application code,
but I can see that one might want it in the database.
hjp
--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | [email protected] | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"
Attachments:
[application/pgp-signature] signature.asc (833B, 2-signature.asc)
download
^ permalink raw reply [nested|flat] 5+ messages in thread
* Re: Can we get sha* function over text, that could be used in index?
@ 2026-02-21 12:40 Peter J. Holzer <[email protected]>
parent: Peter J. Holzer <[email protected]>
0 siblings, 0 replies; 5+ messages in thread
From: Peter J. Holzer @ 2026-02-21 12:40 UTC (permalink / raw)
To: [email protected]
On 2026-02-19 23:30:17 +0100, Peter J. Holzer wrote:
> Something like
> encode(s text, enc text) -> bytea
> decode(d bytea, enc text) -> text
> might be generally useful.
Please ignore this email. These are basically the same as convert_to()
and convert_from() which already exist.
hjp
--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | [email protected] | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"
Attachments:
[application/pgp-signature] signature.asc (833B, 2-signature.asc)
download
^ permalink raw reply [nested|flat] 5+ messages in thread
end of thread, other threads:[~2026-02-21 12:40 UTC | newest]
Thread overview: 5+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-02-18 14:58 Can we get sha* function over text, that could be used in index? hubert depesz lubaczewski <[email protected]>
2026-02-18 15:07 ` Ron Johnson <[email protected]>
2026-02-19 14:48 ` Peter Eisentraut <[email protected]>
2026-02-19 22:30 ` Peter J. Holzer <[email protected]>
2026-02-21 12:40 ` Peter J. Holzer <[email protected]>
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox