public inbox for [email protected]  
help / color / mirror / Atom feed
From: Ivan Voras <[email protected]>
To: postgres performance list <[email protected]>
Subject: Indexes for hashes
Date: Wed, 15 Jun 2016 11:34:18 +0200
Message-ID: <CAF-QHFULmVOdrqwtR7AKRnnx6=GbAW7S6v6f4jACEOVENef7NA@mail.gmail.com> (raw)
List-Unsubscribe:  <mailto:[email protected]?body=unsub%20pgsql-performance>

Hi,

I have an application which stores a large amounts of hex-encoded hash
strings (nearly 100 GB of them), which means:

   - The number of distinct characters (alphabet) is limited to 16
   - Each string is of the same length, 64 characters
   - The strings are essentially random

Creating a B-Tree index on this results in the index size being larger than
the table itself, and there are disk space constraints.

I've found the SP-GIST radix tree index, and thought it could be a good
match for the data because of the above constraints. An attempt to create
it (as in CREATE INDEX ON t USING spgist(field_name)) apparently takes more
than 12 hours (while a similar B-tree index takes a few hours at most), so
I've interrupted it because "it probably is not going to finish in a
reasonable time". Some slides I found on the spgist index allude that both
build time and size are not really suitable for this purpose.

My question is: what would be the most size-efficient index for this
situation?


reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected]
  Subject: Re: Indexes for hashes
  In-Reply-To: <CAF-QHFULmVOdrqwtR7AKRnnx6=GbAW7S6v6f4jACEOVENef7NA@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox