Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bDBhF-0008GR-HV for pgsql-performance@arkaria.postgresql.org; Wed, 15 Jun 2016 14:21:33 +0000 Received: from localhost ([127.0.0.1] helo=postgresql.org) by malur.postgresql.org with smtp (Exim 4.84_2) (envelope-from ) id 1bDBhF-0003ZL-3C for pgsql-performance@arkaria.postgresql.org; Wed, 15 Jun 2016 14:21:33 +0000 Received: from makus.postgresql.org ([2001:4800:1501:1::229]) by malur.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1bDBhE-0003Yp-1d for pgsql-performance@postgresql.org; Wed, 15 Jun 2016 14:21:32 +0000 Received: from mail-lf0-x233.google.com ([2a00:1450:4010:c07::233]) by makus.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.84_2) (envelope-from ) id 1bDBhB-0005jO-1S for pgsql-performance@postgresql.org; Wed, 15 Jun 2016 14:21:30 +0000 Received: by mail-lf0-x233.google.com with SMTP id l188so12028430lfe.2 for ; Wed, 15 Jun 2016 07:21:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=7hbQdP17eT6OFz6A569F39pl7BeifL+PNxEK4qQpf6o=; b=SeKjnA2NA+wjHGYHdhnVpOWF9bkNIcgMgDywPNmI6cmTKMfA575EE5R1LBCf7Wuz9B 0HZFmWKxMkOHo07A6t+SAHWZJ7ULvhmBEMKhPZZkOUnJT7XWW7e1c+wi6z4b6ZKlWfHp WNDfUDbvBWvjiVxaYRjuRO89Pmc7Dtq7IGdk9fWUNgQ917EV+xuwq8w/fpsilRjFauRz HAciafTWQtsDkVY4B+wD8mcWX23tk0DVzw4E/nZoG57n95OqbZmBI6eV0eI9gTxM6o2p mOMwRazlxruofiPRZkQ6XgS3FRZRVw1ePpvAjuROMnlAmqJBp//g3LGTpqOPOv5RYBBk GFng== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=7hbQdP17eT6OFz6A569F39pl7BeifL+PNxEK4qQpf6o=; b=MsneWfzmYwSNP10z5LEN7t9hB5M6vl7pqs5WjpvrUzeuDorlOcpKHbQZytK8oYEOpg RxIgPNgPHBIMyW3wJNChn+QgEEhyIF25URFK6li2an8QUHgB6Kl2HNaWmy2x7kyvcgaQ YBTO1UHcBvyr193ADzwjFvA2daCJKDgm/d52QqCLJRxGwQgzXe9uPCuLMKXYZBSPjZj9 zwGXZp9HvdjZGRMQlXXWl8mrEwKrKWuDC+YXLzlVcHtbJyTzfIWS+AF0BMgOoJ3koMJL uBzHLFQ387osc4y/ajJftQU9XnVPn+cqJGy4HB3W+jxofYA5lOs3y1CM4nKJOMysb+5q MNCw== X-Gm-Message-State: ALyK8tJ/dW2QgNUl9DpW0sRynyAmyN/Dr1eDYpOs157K+l470jOFm5ae7FWGPNzfK30HjPwwg3WmYRVZk99prA== X-Received: by 10.28.164.193 with SMTP id n184mr369714wme.12.1466000486267; Wed, 15 Jun 2016 07:21:26 -0700 (PDT) MIME-Version: 1.0 Received: by 10.194.70.7 with HTTP; Wed, 15 Jun 2016 07:20:46 -0700 (PDT) In-Reply-To: References: <20160615133833.GA29449@depesz.com> From: Ivan Voras Date: Wed, 15 Jun 2016 16:20:46 +0200 Message-ID: Subject: Re: Indexes for hashes To: julyanto@equnix.co.id Cc: postgres performance list Content-Type: multipart/alternative; boundary=94eb2c074efa398332053551d6f2 X-Pg-Spam-Score: -2.7 (--) List-Archive: List-Help: List-ID: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: X-Mailing-List: pgsql-performance Precedence: bulk Sender: pgsql-performance-owner@postgresql.org --94eb2c074efa398332053551d6f2 Content-Type: text/plain; charset=UTF-8 Hi, Just for testing... is there a fast (i.e. written in C) crc32 or a similar small hash function for PostgreSQL? On 15 June 2016 at 16:00, Ivan Voras wrote: > Hi, > > This idea is similar to the substring one, and while it does give > excellent performance and small size, it requires application code > modifications, so it's out. > > > On 15 June 2016 at 15:58, julyanto SUTANDANG > wrote: > >> Hi Ivan, >> >> How about using crc32 ? and then index the integer as the result of crc32 >> function? you can split the hash into 2 part and do crc32 2x ? and then >> create composite index on both integer (the crc32 result) >> instead of using 64 char, you only employ 2 integer as index key. >> >> Regards, >> >> Jul >> >> On Wed, Jun 15, 2016 at 8:54 PM, Ivan Voras wrote: >> >>> Hi, >>> >>> I understand your idea, and have also been thinking about it. Basically, >>> existing applications would need to be modified, however slightly, and that >>> wouldn't be good. >>> >>> >>> >>> >>> On 15 June 2016 at 15:38, hubert depesz lubaczewski >>> wrote: >>> >>>> On Wed, Jun 15, 2016 at 11:34:18AM +0200, Ivan Voras wrote: >>>> > I have an application which stores a large amounts of hex-encoded hash >>>> > strings (nearly 100 GB of them), which means: >>>> >>>> Why do you keep them hex encoded, and not use bytea? >>>> >>>> I made a sample table with 1 million rows, looking like this: >>>> >>>> Table "public.new" >>>> Column | Type | Modifiers >>>> ---------+-------+----------- >>>> texthex | text | >>>> a_bytea | bytea | >>>> >>>> values are like: >>>> >>>> $ select * from new limit 10; >>>> texthex | >>>> a_bytea >>>> >>>> ------------------------------------------------------------------+-------------------------------------------------------------------- >>>> c968f64426b941bc9a8f6d4e87fc151c7a7192679837618570b7989c67c31e2f | >>>> \xc968f64426b941bc9a8f6d4e87fc151c7a7192679837618570b7989c67c31e2f >>>> 61dffbf002d7fc3db9df5953aca7f2e434a78d4c5fdd0db6f90f43ee8c4371db | >>>> \x61dffbf002d7fc3db9df5953aca7f2e434a78d4c5fdd0db6f90f43ee8c4371db >>>> 757acf228adf2357356fd38d03b80529771f211e0ad3b35b66a23d6de53e5033 | >>>> \x757acf228adf2357356fd38d03b80529771f211e0ad3b35b66a23d6de53e5033 >>>> fba35b8b33a7fccc2ac7d96389d43be509ff17636fe3c0f8a33af6d009f84f15 | >>>> \xfba35b8b33a7fccc2ac7d96389d43be509ff17636fe3c0f8a33af6d009f84f15 >>>> ecd8587a8b9acae650760cea8683f8e1c131c4054c0d64b1d7de0ff269ccc61a | >>>> \xecd8587a8b9acae650760cea8683f8e1c131c4054c0d64b1d7de0ff269ccc61a >>>> 11782c73bb3fc9f281b41d3eff8c1e7907b3494b3abe7b6982c1e88f49dad2ea | >>>> \x11782c73bb3fc9f281b41d3eff8c1e7907b3494b3abe7b6982c1e88f49dad2ea >>>> 5862bd8d645e4d44997a485c616bc18f1acabeaec5df3c3b09b9d4c08643e852 | >>>> \x5862bd8d645e4d44997a485c616bc18f1acabeaec5df3c3b09b9d4c08643e852 >>>> 2d09a5cca2c03153a55faa3aff13df0f0593f4355a1b2cfcf9237c2931b4918c | >>>> \x2d09a5cca2c03153a55faa3aff13df0f0593f4355a1b2cfcf9237c2931b4918c >>>> 2186eb2bcc12319ee6e00f7e08a1d61e379a75c01c579c29d0338693bc31c7c7 | >>>> \x2186eb2bcc12319ee6e00f7e08a1d61e379a75c01c579c29d0338693bc31c7c7 >>>> 2061bd05049c51bd1162e4d77f72a37f06d2397fc522ef587ed172a5ad8d57aa | >>>> \x2061bd05049c51bd1162e4d77f72a37f06d2397fc522ef587ed172a5ad8d57aa >>>> (10 rows) >>>> >>>> created two indexes: >>>> create index i1 on new (texthex); >>>> create index i2 on new (a_bytea); >>>> >>>> i1 is 91MB, and i2 is 56MB. >>>> >>>> Index creation was also much faster - best out of 3 runs for i1 was >>>> 4928.982 >>>> ms, best out of 3 runs for i2 was 2047.648 ms >>>> >>>> Best regards, >>>> >>>> depesz >>>> >>>> >>> >> >> >> -- >> >> >> Julyanto SUTANDANG >> >> Equnix Business Solutions, PT >> (An Open Source an Open Mind Company) >> >> Pusat Niaga ITC Roxy Mas Blok C2/42. Jl. KH Hasyim Ashari 125, Jakarta >> Pusat >> T: +6221 22866662 F: +62216315281 M: +628164858028 >> >> >> Caution: The information enclosed in this email (and any attachments) may >> be legally privileged and/or confidential and is intended only for the use >> of the addressee(s). No addressee should forward, print, copy, or otherwise >> reproduce this message in any manner that would allow it to be viewed by >> any individual not originally listed as a recipient. If the reader of this >> message is not the intended recipient, you are hereby notified that any >> unauthorized disclosure, dissemination, distribution, copying or the taking >> of any action in reliance on the information herein is strictly prohibited. >> If you have received this communication in error, please immediately notify >> the sender and delete this message.Unless it is made by the authorized >> person, any views expressed in this message are those of the individual >> sender and may not necessarily reflect the views of PT Equnix Business >> Solutions. >> > > --94eb2c074efa398332053551d6f2 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Hi,

Just for testing... is there a fast= (i.e. written in C) crc32 or a similar small hash function for PostgreSQL?=


On 15 June 2016 at 16:00, Ivan Voras <ivoras@gmail.com> wrote:
Hi,

=
This idea is similar to the substring one, and while it does giv= e excellent performance and small size, it requires application code modifi= cations, so it's out.

<= div class=3D"h5">

= On 15 June 2016 at 15:58, julyanto SUTANDANG <julyanto@equnix.co.id> wrote:
Hi = Ivan,=C2=A0

How about using crc32 ? and then index the i= nteger as the result of crc32 function? you can split the hash into 2 part = and do crc32 2x ? and then create composite index on both integer (the crc3= 2 result)
instead of using 64 char, you only employ 2 integer as = index key.=C2=A0

Regards,=C2=A0

Jul

On Wed, Jun 15, 2016 at 8:54 PM, Ivan Voras <ivoras@gmail.= com> wrote:
Hi,

I understand your idea, and have also b= een thinking about it. Basically, existing applications would need to be mo= dified, however slightly, and that wouldn't be good.



=
On 15 June 2016 at 15:38, hubert depesz luba= czewski <depesz@depesz.com> wrote:
On Wed, Jun 15, 2016 at 11:34:18AM +0200, Ivan Voras wrote= :
> I have an application which stores a large amounts of hex= -encoded hash
> strings (nearly 100 GB of them), which means:

Why do you keep them hex encoded, and not use bytea?

I made a sample table with 1 million rows, looking like this:

=C2=A0 =C2=A0 =C2=A0Table "public.new"
=C2=A0Column=C2=A0 | Type=C2=A0 | Modifiers
---------+-------+-----------
=C2=A0texthex | text=C2=A0 |
=C2=A0a_bytea | bytea |

values are like:

$ select * from new limit 10;
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0texthex=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 |=C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 a_bytea
------------------------------------------------------------------+--------= ------------------------------------------------------------
=C2=A0c968f64426b941bc9a8f6d4e87fc151c7a7192679837618570b7989c67c31e2f | \x= c968f64426b941bc9a8f6d4e87fc151c7a7192679837618570b7989c67c31e2f
=C2=A061dffbf002d7fc3db9df5953aca7f2e434a78d4c5fdd0db6f90f43ee8c4371db | \x= 61dffbf002d7fc3db9df5953aca7f2e434a78d4c5fdd0db6f90f43ee8c4371db
=C2=A0757acf228adf2357356fd38d03b80529771f211e0ad3b35b66a23d6de53e5033 | \x= 757acf228adf2357356fd38d03b80529771f211e0ad3b35b66a23d6de53e5033
=C2=A0fba35b8b33a7fccc2ac7d96389d43be509ff17636fe3c0f8a33af6d009f84f15 | \x= fba35b8b33a7fccc2ac7d96389d43be509ff17636fe3c0f8a33af6d009f84f15
=C2=A0ecd8587a8b9acae650760cea8683f8e1c131c4054c0d64b1d7de0ff269ccc61a | \x= ecd8587a8b9acae650760cea8683f8e1c131c4054c0d64b1d7de0ff269ccc61a
=C2=A011782c73bb3fc9f281b41d3eff8c1e7907b3494b3abe7b6982c1e88f49dad2ea | \x= 11782c73bb3fc9f281b41d3eff8c1e7907b3494b3abe7b6982c1e88f49dad2ea
=C2=A05862bd8d645e4d44997a485c616bc18f1acabeaec5df3c3b09b9d4c08643e852 | \x= 5862bd8d645e4d44997a485c616bc18f1acabeaec5df3c3b09b9d4c08643e852
=C2=A02d09a5cca2c03153a55faa3aff13df0f0593f4355a1b2cfcf9237c2931b4918c | \x= 2d09a5cca2c03153a55faa3aff13df0f0593f4355a1b2cfcf9237c2931b4918c
=C2=A02186eb2bcc12319ee6e00f7e08a1d61e379a75c01c579c29d0338693bc31c7c7 | \x= 2186eb2bcc12319ee6e00f7e08a1d61e379a75c01c579c29d0338693bc31c7c7
=C2=A02061bd05049c51bd1162e4d77f72a37f06d2397fc522ef587ed172a5ad8d57aa | \x= 2061bd05049c51bd1162e4d77f72a37f06d2397fc522ef587ed172a5ad8d57aa
(10 rows)

created two indexes:
create index i1 on new (texthex);
create index i2 on new (a_bytea);

i1 is 91MB, and i2 is 56MB.

Index creation was also much faster - best out of 3 runs for i1 was 4928.98= 2
ms, best out of 3 runs for i2 was 2047.648 ms

Best regards,

depesz





<= /div>--
<= br>
Julyanto SUTANDANG

Equnix Business Solutions, PT
(An Open = Source an Open Mind Company)

Pusat Niaga ITC Roxy Mas Blok C2/42.=C2= =A0 Jl. KH Hasyim Ashari 125, Jakarta Pusat
T: +6221 22866662 F: <= a href=3D"tel:%2B62216315281" value=3D"+62216315281" target=3D"_blank">+622= 16315281 M: +628164858028


Caution: The information enclos= ed in this email (and any attachments) may be legally privileged and/or con= fidential and is intended only for the use of the addressee(s). No addresse= e should forward, print, copy, or otherwise reproduce this message in any m= anner that would allow it to be viewed by any individual not originally lis= ted as a recipient. If the reader of this message is not the intended recip= ient, you are hereby notified that any unauthorized disclosure, disseminati= on, distribution, copying or the taking of any action in reliance on the in= formation herein is strictly prohibited. If you have received this communic= ation in error, please immediately notify the sender and delete this messag= e.Unless it is made by the authorized person, any views expressed in this m= essage are those of the individual sender and may not necessarily reflect t= he views of PT Equnix Business Solutions.


--94eb2c074efa398332053551d6f2--