From: Claudio Freire <[email protected]>
To: Ivan Voras <[email protected]>
Cc: postgres performance list <[email protected]>
Subject: Re: Indexes for hashes
Date: Fri, 17 Jun 2016 00:51:03 -0300
Message-ID: <CAGTBQpY7apkp79d2a+mgz-o0MggLrY-nGaMFZBjvGCuvyAA75A@mail.gmail.com>
In-Reply-To: <CAF-QHFULmVOdrqwtR7AKRnnx6=GbAW7S6v6f4jACEOVENef7NA@mail.gmail.com>
References: <CAF-QHFULmVOdrqwtR7AKRnnx6=GbAW7S6v6f4jACEOVENef7NA@mail.gmail.com>
List-Unsubscribe: <mailto:[email protected]?body=unsub%20pgsql-performance>

On Wed, Jun 15, 2016 at 6:34 AM, Ivan Voras <[email protected]> wrote:
>
> I have an application which stores a large amount of hex-encoded hash
> strings (nearly 100 GB of them), which means:
>
> The number of distinct characters (alphabet) is limited to 16
> Each string is of the same length, 64 characters
> The strings are essentially random
>
> Creating a B-Tree index on this results in the index size being larger than
> the table itself, and there are disk space constraints.
>
> I've found the SP-GIST radix tree index, and thought it could be a good
> match for the data because of the above constraints. An attempt to create it
> (as in CREATE INDEX ON t USING spgist(field_name)) apparently takes more
> than 12 hours (while a similar B-tree index takes a few hours at most), so
> I've interrupted it because "it probably is not going to finish in a
> reasonable time". Some slides I found on the spgist index allude that both
> build time and size are not really suitable for this purpose.
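
I've found that B-tree indexes built on a hash of the column tend to
perform well in these situations:

    CREATE INDEX ON t USING btree (hashtext(fieldname));

The index then stores a 4-byte integer per row instead of the full
64-character string. However, you'll have to modify your queries to
match on both the hashtext and the text itself, since different strings
can share the same hashtext value:

    SELECT * FROM t WHERE hashtext(fieldname) = hashtext('blabla') AND
    fieldname = 'blabla';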
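
To illustrate why the query needs both conditions, here is a minimal in-memory sketch in Python. It is a model of the idea, not of how PostgreSQL implements it. The names `hash32` and `HashedIndex` are hypothetical, and `zlib.crc32` is only a stand-in for `hashtext()` (a different function that produces different values). The index maps a 32-bit hash to row positions, and the lookup rechecks the full string to discard collisions.

```python
import zlib
from collections import defaultdict


def hash32(value: str) -> int:
    """Stand-in for hashtext(): reduce a string to a 32-bit integer."""
    return zlib.crc32(value.encode("ascii"))


class HashedIndex:
    """Toy model of a table plus an index on a 32-bit hash of its column."""

    def __init__(self, hash_fn=hash32):
        self.hash_fn = hash_fn
        self.rows = []                   # the "table": full 64-char strings
        self.index = defaultdict(list)   # small hash key -> row positions

    def insert(self, value: str) -> None:
        # Record the row position under the hash key, then store the row.
        self.index[self.hash_fn(value)].append(len(self.rows))
        self.rows.append(value)

    def lookup(self, value: str) -> list:
        # Step 1: narrow down by hash, like hashtext(fieldname) = hashtext(x).
        candidates = self.index.get(self.hash_fn(value), [])
        # Step 2: recheck the full string, like fieldname = x, because
        # different strings may share the same 32-bit hash.
        return [pos for pos in candidates if self.rows[pos] == value]
```

Passing a deliberately weak hash such as `hash_fn=len` makes every 64-character string collide, which shows that the recheck in step 2 is what keeps the result correct; the hash only narrows the candidate set.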