Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bDBPC-0007J2-Pl for pgsql-performance@arkaria.postgresql.org; Wed, 15 Jun 2016 14:02:54 +0000 Received: from localhost ([127.0.0.1] helo=postgresql.org) by malur.postgresql.org with smtp (Exim 4.84_2) (envelope-from ) id 1bDBPC-0005Rk-CS for pgsql-performance@arkaria.postgresql.org; Wed, 15 Jun 2016 14:02:54 +0000 Received: from makus.postgresql.org ([2001:4800:1501:1::229]) by malur.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1bDBNV-00032W-PR for pgsql-performance@postgresql.org; Wed, 15 Jun 2016 14:01:10 +0000 Received: from mail-lb0-x235.google.com ([2a00:1450:4010:c04::235]) by makus.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.84_2) (envelope-from ) id 1bDBNR-0005Gw-JT for pgsql-performance@postgresql.org; Wed, 15 Jun 2016 14:01:08 +0000 Received: by mail-lb0-x235.google.com with SMTP id o4so1659879lbp.2 for ; Wed, 15 Jun 2016 07:01:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=mMvZ60T0wfd12eWSUh0ELm/a56E5KzMsXPNVpgxEzPU=; b=P/X5A18XGVdb7PUYTT8m4JasShSm+i8+52GrUN34yUQiIGIrBhxMLO6nz66j7QZWtq 5W6CdnCGO2n5OV7MYev0baO7x9v+LkhMREyLMpq2qQFnrRRwMKH1MItGJ+pRazK1e3EW YclouAc/nErQWKTzLXdfzt2Zq6HXfVMky0SkjXfGudsb+Va6p7Vsdx29W3MoxDXJy8Ez SdyTh02NIV9bTUeFsDFdsFXYhcRk40ldhfEkflGrjN87YjgogeT/vhmRzR6Tm+9CVV7H PvEgEqpf9kc53E5xH9JTQySU6YtUUcQXFBCKzF01sKRkXVHfyC+lXXcKT9dh7D7qkOMh 4snw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=mMvZ60T0wfd12eWSUh0ELm/a56E5KzMsXPNVpgxEzPU=; b=i2sCrG/1tMXRTIPQdnUr74aHhZW+XuafoQLGZy2PEtMmKbb3ok7p8mYu61y198lKyk MnJGqREyT6KNhQ3gTqOjBXzDipaAbCwpQiopmwoyl7VUYKuOG2BgK8Bjitm+oSL2N+mL +RRYGObYThbkMnFkeNdzXxdGr5SfQKkK9iFOC7VAQwufDqVyxjCmNCh8B3hr/iXM6ov/ OBpjSDZzY0QNrY7qg3BH3leCHeJyHE60t0TFg1IPSK7sPaAz6zjRgY7QrMCXQ4pHu6KC QnjJHGtRLyASai332V16b0u/rJw4qkgqz1+1Eab3tkaNAF1kz9pGeE2FHEgVkOeo8YXj 2fbg== X-Gm-Message-State: ALyK8tI94zhZarpJQg14Nbm2D+erBgFsutpS2wGlZyy7WUXEbyl7A2s42sRPLsqg/dKSKcYOj4xt6cgQkMxaEA== X-Received: by 10.194.67.101 with SMTP id m5mr12692131wjt.129.1465999262508; Wed, 15 Jun 2016 07:01:02 -0700 (PDT) MIME-Version: 1.0 Received: by 10.194.70.7 with HTTP; Wed, 15 Jun 2016 07:00:22 -0700 (PDT) In-Reply-To: References: <20160615133833.GA29449@depesz.com> From: Ivan Voras Date: Wed, 15 Jun 2016 16:00:22 +0200 Message-ID: Subject: Re: Indexes for hashes To: julyanto@equnix.co.id Cc: postgres performance list Content-Type: multipart/alternative; boundary=047d7bf0d27c486a590535518d88 X-Pg-Spam-Score: -2.7 (--) List-Archive: List-Help: List-ID: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: X-Mailing-List: pgsql-performance Precedence: bulk Sender: pgsql-performance-owner@postgresql.org --047d7bf0d27c486a590535518d88 Content-Type: text/plain; charset=UTF-8 Hi, This idea is similar to the substring one, and while it does give excellent performance and small size, it requires application code modifications, so it's out. On 15 June 2016 at 15:58, julyanto SUTANDANG wrote: > Hi Ivan, > > How about using crc32 ? and then index the integer as the result of crc32 > function? you can split the hash into 2 part and do crc32 2x ? and then > create composite index on both integer (the crc32 result) > instead of using 64 char, you only employ 2 integer as index key. > > Regards, > > Jul > > On Wed, Jun 15, 2016 at 8:54 PM, Ivan Voras wrote: > >> Hi, >> >> I understand your idea, and have also been thinking about it. Basically, >> existing applications would need to be modified, however slightly, and that >> wouldn't be good. >> >> >> >> >> On 15 June 2016 at 15:38, hubert depesz lubaczewski >> wrote: >> >>> On Wed, Jun 15, 2016 at 11:34:18AM +0200, Ivan Voras wrote: >>> > I have an application which stores a large amounts of hex-encoded hash >>> > strings (nearly 100 GB of them), which means: >>> >>> Why do you keep them hex encoded, and not use bytea? >>> >>> I made a sample table with 1 million rows, looking like this: >>> >>> Table "public.new" >>> Column | Type | Modifiers >>> ---------+-------+----------- >>> texthex | text | >>> a_bytea | bytea | >>> >>> values are like: >>> >>> $ select * from new limit 10; >>> texthex | >>> a_bytea >>> >>> ------------------------------------------------------------------+-------------------------------------------------------------------- >>> c968f64426b941bc9a8f6d4e87fc151c7a7192679837618570b7989c67c31e2f | >>> \xc968f64426b941bc9a8f6d4e87fc151c7a7192679837618570b7989c67c31e2f >>> 61dffbf002d7fc3db9df5953aca7f2e434a78d4c5fdd0db6f90f43ee8c4371db | >>> \x61dffbf002d7fc3db9df5953aca7f2e434a78d4c5fdd0db6f90f43ee8c4371db >>> 757acf228adf2357356fd38d03b80529771f211e0ad3b35b66a23d6de53e5033 | >>> \x757acf228adf2357356fd38d03b80529771f211e0ad3b35b66a23d6de53e5033 >>> fba35b8b33a7fccc2ac7d96389d43be509ff17636fe3c0f8a33af6d009f84f15 | >>> \xfba35b8b33a7fccc2ac7d96389d43be509ff17636fe3c0f8a33af6d009f84f15 >>> ecd8587a8b9acae650760cea8683f8e1c131c4054c0d64b1d7de0ff269ccc61a | >>> \xecd8587a8b9acae650760cea8683f8e1c131c4054c0d64b1d7de0ff269ccc61a >>> 11782c73bb3fc9f281b41d3eff8c1e7907b3494b3abe7b6982c1e88f49dad2ea | >>> \x11782c73bb3fc9f281b41d3eff8c1e7907b3494b3abe7b6982c1e88f49dad2ea >>> 5862bd8d645e4d44997a485c616bc18f1acabeaec5df3c3b09b9d4c08643e852 | >>> \x5862bd8d645e4d44997a485c616bc18f1acabeaec5df3c3b09b9d4c08643e852 >>> 2d09a5cca2c03153a55faa3aff13df0f0593f4355a1b2cfcf9237c2931b4918c | >>> \x2d09a5cca2c03153a55faa3aff13df0f0593f4355a1b2cfcf9237c2931b4918c >>> 2186eb2bcc12319ee6e00f7e08a1d61e379a75c01c579c29d0338693bc31c7c7 | >>> \x2186eb2bcc12319ee6e00f7e08a1d61e379a75c01c579c29d0338693bc31c7c7 >>> 2061bd05049c51bd1162e4d77f72a37f06d2397fc522ef587ed172a5ad8d57aa | >>> \x2061bd05049c51bd1162e4d77f72a37f06d2397fc522ef587ed172a5ad8d57aa >>> (10 rows) >>> >>> created two indexes: >>> create index i1 on new (texthex); >>> create index i2 on new (a_bytea); >>> >>> i1 is 91MB, and i2 is 56MB. >>> >>> Index creation was also much faster - best out of 3 runs for i1 was >>> 4928.982 >>> ms, best out of 3 runs for i2 was 2047.648 ms >>> >>> Best regards, >>> >>> depesz >>> >>> >> > > > -- > > > Julyanto SUTANDANG > > Equnix Business Solutions, PT > (An Open Source an Open Mind Company) > > Pusat Niaga ITC Roxy Mas Blok C2/42. Jl. KH Hasyim Ashari 125, Jakarta > Pusat > T: +6221 22866662 F: +62216315281 M: +628164858028 > > > Caution: The information enclosed in this email (and any attachments) may > be legally privileged and/or confidential and is intended only for the use > of the addressee(s). No addressee should forward, print, copy, or otherwise > reproduce this message in any manner that would allow it to be viewed by > any individual not originally listed as a recipient. If the reader of this > message is not the intended recipient, you are hereby notified that any > unauthorized disclosure, dissemination, distribution, copying or the taking > of any action in reliance on the information herein is strictly prohibited. > If you have received this communication in error, please immediately notify > the sender and delete this message.Unless it is made by the authorized > person, any views expressed in this message are those of the individual > sender and may not necessarily reflect the views of PT Equnix Business > Solutions. > --047d7bf0d27c486a590535518d88 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Hi,

This idea is similar to the substri= ng one, and while it does give excellent performance and small size, it req= uires application code modifications, so it's out.

=

On 15 June = 2016 at 15:58, julyanto SUTANDANG <julyanto@equnix.co.id> wrote:
Hi Ivan,=C2=A0<= div>
How about using crc32 ? and then index the integer as th= e result of crc32 function? you can split the hash into 2 part and do crc32= 2x ? and then create composite index on both integer (the crc32 result)
instead of using 64 char, you only employ 2 integer as index key.= =C2=A0

Regards,=C2=A0

Jul=

On Wed, Jun 15, 2016 at 8:54 PM, Ivan Voras <ivoras@gmail= .com> wrote:
Hi,

I understand your idea, and have also = been thinking about it. Basically, existing applications would need to be m= odified, however slightly, and that wouldn't be good.




On 15 June 2016 at 15:38, hubert depesz lub= aczewski <depesz@depesz.com> wrote:
On Wed, Jun 15, 2016 at 11:34:18AM +0200, Ivan Voras wrote= :
> I have an application which stores a large amounts of hex= -encoded hash
> strings (nearly 100 GB of them), which means:

Why do you keep them hex encoded, and not use bytea?

I made a sample table with 1 million rows, looking like this:

=C2=A0 =C2=A0 =C2=A0Table "public.new"
=C2=A0Column=C2=A0 | Type=C2=A0 | Modifiers
---------+-------+-----------
=C2=A0texthex | text=C2=A0 |
=C2=A0a_bytea | bytea |

values are like:

$ select * from new limit 10;
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0texthex=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 |=C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 a_bytea
------------------------------------------------------------------+--------= ------------------------------------------------------------
=C2=A0c968f64426b941bc9a8f6d4e87fc151c7a7192679837618570b7989c67c31e2f | \x= c968f64426b941bc9a8f6d4e87fc151c7a7192679837618570b7989c67c31e2f
=C2=A061dffbf002d7fc3db9df5953aca7f2e434a78d4c5fdd0db6f90f43ee8c4371db | \x= 61dffbf002d7fc3db9df5953aca7f2e434a78d4c5fdd0db6f90f43ee8c4371db
=C2=A0757acf228adf2357356fd38d03b80529771f211e0ad3b35b66a23d6de53e5033 | \x= 757acf228adf2357356fd38d03b80529771f211e0ad3b35b66a23d6de53e5033
=C2=A0fba35b8b33a7fccc2ac7d96389d43be509ff17636fe3c0f8a33af6d009f84f15 | \x= fba35b8b33a7fccc2ac7d96389d43be509ff17636fe3c0f8a33af6d009f84f15
=C2=A0ecd8587a8b9acae650760cea8683f8e1c131c4054c0d64b1d7de0ff269ccc61a | \x= ecd8587a8b9acae650760cea8683f8e1c131c4054c0d64b1d7de0ff269ccc61a
=C2=A011782c73bb3fc9f281b41d3eff8c1e7907b3494b3abe7b6982c1e88f49dad2ea | \x= 11782c73bb3fc9f281b41d3eff8c1e7907b3494b3abe7b6982c1e88f49dad2ea
=C2=A05862bd8d645e4d44997a485c616bc18f1acabeaec5df3c3b09b9d4c08643e852 | \x= 5862bd8d645e4d44997a485c616bc18f1acabeaec5df3c3b09b9d4c08643e852
=C2=A02d09a5cca2c03153a55faa3aff13df0f0593f4355a1b2cfcf9237c2931b4918c | \x= 2d09a5cca2c03153a55faa3aff13df0f0593f4355a1b2cfcf9237c2931b4918c
=C2=A02186eb2bcc12319ee6e00f7e08a1d61e379a75c01c579c29d0338693bc31c7c7 | \x= 2186eb2bcc12319ee6e00f7e08a1d61e379a75c01c579c29d0338693bc31c7c7
=C2=A02061bd05049c51bd1162e4d77f72a37f06d2397fc522ef587ed172a5ad8d57aa | \x= 2061bd05049c51bd1162e4d77f72a37f06d2397fc522ef587ed172a5ad8d57aa
(10 rows)

created two indexes:
create index i1 on new (texthex);
create index i2 on new (a_bytea);

i1 is 91MB, and i2 is 56MB.

Index creation was also much faster - best out of 3 runs for i1 was 4928.98= 2
ms, best out of 3 runs for i2 was 2047.648 ms

Best regards,

depesz





<= /div>--
<= br>
Julyanto SUTANDANG

Equnix Business Solutions, PT
(An Open = Source an Open Mind Company)

Pusat Niaga ITC Roxy Mas Blok C2/42.=C2= =A0 Jl. KH Hasyim Ashari 125, Jakarta Pusat
T: +6221 22866662 F: <= a href=3D"tel:%2B62216315281" value=3D"+62216315281" target=3D"_blank">+622= 16315281 M: +628164858028


Caution: The information enclos= ed in this email (and any attachments) may be legally privileged and/or con= fidential and is intended only for the use of the addressee(s). No addresse= e should forward, print, copy, or otherwise reproduce this message in any m= anner that would allow it to be viewed by any individual not originally lis= ted as a recipient. If the reader of this message is not the intended recip= ient, you are hereby notified that any unauthorized disclosure, disseminati= on, distribution, copying or the taking of any action in reliance on the in= formation herein is strictly prohibited. If you have received this communic= ation in error, please immediately notify the sender and delete this messag= e.Unless it is made by the authorized person, any views expressed in this m= essage are those of the individual sender and may not necessarily reflect t= he views of PT Equnix Business Solutions.

--047d7bf0d27c486a590535518d88--