Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bD7Ff-0002Bx-J9 for pgsql-performance@arkaria.postgresql.org; Wed, 15 Jun 2016 09:36:47 +0000 Received: from localhost ([127.0.0.1] helo=postgresql.org) by malur.postgresql.org with smtp (Exim 4.84_2) (envelope-from ) id 1bD7Ff-0003tp-68 for pgsql-performance@arkaria.postgresql.org; Wed, 15 Jun 2016 09:36:47 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1bD7Dz-0001Hb-1b for pgsql-performance@postgresql.org; Wed, 15 Jun 2016 09:35:03 +0000 Received: from mail-wm0-x233.google.com ([2a00:1450:400c:c09::233]) by magus.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.84_2) (envelope-from ) id 1bD7Dv-0005Wm-MJ for pgsql-performance@postgresql.org; Wed, 15 Jun 2016 09:35:02 +0000 Received: by mail-wm0-x233.google.com with SMTP id a66so15307540wme.0 for ; Wed, 15 Jun 2016 02:34:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:from:date:message-id:subject:to; bh=/7Z5Ms+o+r8Whxp6Hhpm1MLcI5G9PA3kc+qC3II5ons=; b=XE3iqqIYcFFv0ofCZnarVOvl1ay8mHOa7899BGIotQZ/rqD+rZVMFfcbgtskkmp0sL UWwaeBdvqvlavZWAwP6tdN9VUqI9a1Sa+s4owKFiEqCKXBdCNQeS3YoXtyl8ed8KJmHf LiZ+PJ5mltYCbGBvnY704Obbf7KQlix/ytxo3/kZzYv6VSq+r4G1wwWVjxs47OcWqt+P Hf0iswMTpB93o4CZMQYbpjeJaWKt2ADvBeoAPMdK1sWqobwayRPrZK+KMQeLalkhu8/f 66Gz/2aCrN7KmigB2gKjrPqxpXcBqKor7kzZUrhqp4BmqZLz0BlUdWHon2/6Edyd2sWM hl9g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:from:date:message-id:subject:to; bh=/7Z5Ms+o+r8Whxp6Hhpm1MLcI5G9PA3kc+qC3II5ons=; b=XaEDCgYIEnsVvchrNmzhzq3m8uVcE4vpFg7a5S44fl8YRg7fz/vraMpQj9NHEfjbwg BRj9+P04pnbaXT3c/Qy2ObGTVt5iln+G1DxgITDg/J5PKlxWN94sNmIVe10Ux9s6eReE X+cl0jU8yGBQIu9Ze1OUNzW+n1Pk7LW4jqQ6eErxioAyfxbjFkZnuBo8yrDPcLN2bI1S KSy4UqCqkgDFbXTF4OcXMplQWWK8p9kRBcvmwtpVvK9qp1CO75UP8C94KbYTG8XOEFwu dbFBj6fw0ToN9xsvfKDr2AmjKkCs50OB5Vh3uC14QGpxipypD1K9YYvclylJMLQ8K2sJ 5M3w== X-Gm-Message-State: ALyK8tKNf6ywz/pOb0IndCY/6PZnAjpoZ5NiirHUvasyp/XqnrC2onGgWZcclQ0f39Vm6UMmL2ttOtCWQZssJg== X-Received: by 10.28.164.198 with SMTP id n189mr6132314wme.12.1465983298327; Wed, 15 Jun 2016 02:34:58 -0700 (PDT) MIME-Version: 1.0 Received: by 10.194.70.7 with HTTP; Wed, 15 Jun 2016 02:34:18 -0700 (PDT) From: Ivan Voras Date: Wed, 15 Jun 2016 11:34:18 +0200 Message-ID: Subject: Indexes for hashes To: postgres performance list Content-Type: multipart/alternative; boundary=94eb2c081fbcbe5adf05354dd588 X-Pg-Spam-Score: -2.7 (--) List-Archive: List-Help: List-ID: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: X-Mailing-List: pgsql-performance Precedence: bulk Sender: pgsql-performance-owner@postgresql.org --94eb2c081fbcbe5adf05354dd588 Content-Type: text/plain; charset=UTF-8 Hi, I have an application which stores a large amounts of hex-encoded hash strings (nearly 100 GB of them), which means: - The number of distinct characters (alphabet) is limited to 16 - Each string is of the same length, 64 characters - The strings are essentially random Creating a B-Tree index on this results in the index size being larger than the table itself, and there are disk space constraints. I've found the SP-GIST radix tree index, and thought it could be a good match for the data because of the above constraints. An attempt to create it (as in CREATE INDEX ON t USING spgist(field_name)) apparently takes more than 12 hours (while a similar B-tree index takes a few hours at most), so I've interrupted it because "it probably is not going to finish in a reasonable time". Some slides I found on the spgist index allude that both build time and size are not really suitable for this purpose. My question is: what would be the most size-efficient index for this situation? --94eb2c081fbcbe5adf05354dd588 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Hi,

I have an application which stores = a large amounts of hex-encoded hash strings (nearly 100 GB of them), which = means:
  • The number of distinct characters (alphabet) is li= mited to 16
  • Each string is of the same length, 64 characters
  • The strings are essentially random
Creating a B-Tree index = on this results in the index size being larger than the table itself, and t= here are disk space constraints.

I've fo= und the SP-GIST radix tree index, and thought it could be a good match for = the data because of the above constraints. An attempt to create it (as in C= REATE INDEX ON t USING spgist(field_name)) apparently takes more than 12 ho= urs (while a similar B-tree index takes a few hours at most), so I've i= nterrupted it because "it probably is not going to finish in a reasona= ble time". Some slides I found on the spgist index allude that both bu= ild time and size are not really suitable for this purpose.

<= /div>
My question is: what would be the most size-efficient index for t= his situation?
--94eb2c081fbcbe5adf05354dd588--