Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bDBMW-000778-1s for pgsql-performance@arkaria.postgresql.org; Wed, 15 Jun 2016 14:00:08 +0000 Received: from localhost ([127.0.0.1] helo=postgresql.org) by malur.postgresql.org with smtp (Exim 4.84_2) (envelope-from ) id 1bDBMV-0000b7-Kk for pgsql-performance@arkaria.postgresql.org; Wed, 15 Jun 2016 14:00:07 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1bDBKo-0003Iz-Dg for pgsql-performance@postgresql.org; Wed, 15 Jun 2016 13:58:22 +0000 Received: from mail-lb0-x236.google.com ([2a00:1450:4010:c04::236]) by magus.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.84_2) (envelope-from ) id 1bDBKk-0002s7-9A for pgsql-performance@postgresql.org; Wed, 15 Jun 2016 13:58:21 +0000 Received: by mail-lb0-x236.google.com with SMTP id xp5so1585167lbb.0 for ; Wed, 15 Jun 2016 06:58:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=equnix-biz.20150623.gappssmtp.com; s=20150623; h=mime-version:reply-to:sender:in-reply-to:references:date:message-id :subject:from:to:cc; bh=GqnHuCyvT9Y9itfPYXh9cmQHBB+JMbAG5tRvr3bWH4s=; b=WSBos3uPNXqlNnKtvsPilScU0tw/KspXP1ZOhRRlabcuvkpUoWGE9B2w6AaKtGQjZd 3S5pKom+X5qsRU6QRSs2dDVV8tAcsI2E2N9RzlbFcxcU3W69dwWGQ945c6VDjV1d4RKT G680lO4R1rNK/pl9WEEc64ghp9jsfaJs2U+M5hXKzzFRHCAzCjvQSvg65aaAwK/wEgoc VEIP9o9l64mkP17tU+fiH7ALY+DyNL0kI9zmxj3NYjle7qVrMirNcKMZx8UM7P1AHHt+ vGW+30cOzjV837u+nFgipkTE1TGgKdAsCjhF5u/izKYab2a1MbFvS+xZmZt0vYH6CiMU pNAw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:reply-to:sender:in-reply-to :references:date:message-id:subject:from:to:cc; bh=GqnHuCyvT9Y9itfPYXh9cmQHBB+JMbAG5tRvr3bWH4s=; b=IonnLpkANSEO1lMsVDvPSu6BM6bRnujwfne/hDXKSjWCUrwVHrkAjdaHlJufxMaisV Ni0IblvVkwbRyN3JpLbmv86f1Ap3OtPm1BfVMDYCBiGSBjRQnf4AZxt4w9u2AURBh/0x nC/rrmwwgR5J0K/uMUck8SeygR8KjUD3kJe9ht3I5dGBsEREYg36y83wq5nst0AgJ4Mh QYr96CYRceZJmekCezOTRZfNU7SREhwOwMs2V2KNLiaqBOTC9N039aTPH/BZ9R/kePSK ndacRNZoO+XAQR3NqR8zx7heeU3LuT3I7VmSc1i02u4VcoErOSrpN6acnnlR6w8WGTLu czyQ== X-Gm-Message-State: ALyK8tJaVvohaUG3i/3N1ZSmi21ZLFmbSZi/sA/dJjuEYPY71EGZxQj+k68fVutU+BYmniBmL4ajJlLua2gb8w== MIME-Version: 1.0 X-Received: by 10.28.135.137 with SMTP id j131mr11702260wmd.47.1465999095801; Wed, 15 Jun 2016 06:58:15 -0700 (PDT) Reply-To: julyanto@equnix.co.id Received: by 10.194.24.100 with HTTP; Wed, 15 Jun 2016 06:58:15 -0700 (PDT) X-Originating-IP: [180.245.84.149] In-Reply-To: References: <20160615133833.GA29449@depesz.com> Date: Wed, 15 Jun 2016 20:58:15 +0700 X-Google-Sender-Auth: XbNFKLqXCrvdDLL_hLTlLeh5dSw Message-ID: Subject: Re: Indexes for hashes From: julyanto SUTANDANG To: Ivan Voras Cc: depesz@depesz.com, postgres performance list Content-Type: multipart/alternative; boundary=001a114431e258c0e70535518371 X-Pg-Spam-Score: -2.6 (--) List-Archive: List-Help: List-ID: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: X-Mailing-List: pgsql-performance Precedence: bulk Sender: pgsql-performance-owner@postgresql.org --001a114431e258c0e70535518371 Content-Type: text/plain; charset=UTF-8 Hi Ivan, How about using crc32 ? and then index the integer as the result of crc32 function? you can split the hash into 2 part and do crc32 2x ? and then create composite index on both integer (the crc32 result) instead of using 64 char, you only employ 2 integer as index key. Regards, Jul On Wed, Jun 15, 2016 at 8:54 PM, Ivan Voras wrote: > Hi, > > I understand your idea, and have also been thinking about it. Basically, > existing applications would need to be modified, however slightly, and that > wouldn't be good. > > > > > On 15 June 2016 at 15:38, hubert depesz lubaczewski > wrote: > >> On Wed, Jun 15, 2016 at 11:34:18AM +0200, Ivan Voras wrote: >> > I have an application which stores a large amounts of hex-encoded hash >> > strings (nearly 100 GB of them), which means: >> >> Why do you keep them hex encoded, and not use bytea? >> >> I made a sample table with 1 million rows, looking like this: >> >> Table "public.new" >> Column | Type | Modifiers >> ---------+-------+----------- >> texthex | text | >> a_bytea | bytea | >> >> values are like: >> >> $ select * from new limit 10; >> texthex | >> a_bytea >> >> ------------------------------------------------------------------+-------------------------------------------------------------------- >> c968f64426b941bc9a8f6d4e87fc151c7a7192679837618570b7989c67c31e2f | >> \xc968f64426b941bc9a8f6d4e87fc151c7a7192679837618570b7989c67c31e2f >> 61dffbf002d7fc3db9df5953aca7f2e434a78d4c5fdd0db6f90f43ee8c4371db | >> \x61dffbf002d7fc3db9df5953aca7f2e434a78d4c5fdd0db6f90f43ee8c4371db >> 757acf228adf2357356fd38d03b80529771f211e0ad3b35b66a23d6de53e5033 | >> \x757acf228adf2357356fd38d03b80529771f211e0ad3b35b66a23d6de53e5033 >> fba35b8b33a7fccc2ac7d96389d43be509ff17636fe3c0f8a33af6d009f84f15 | >> \xfba35b8b33a7fccc2ac7d96389d43be509ff17636fe3c0f8a33af6d009f84f15 >> ecd8587a8b9acae650760cea8683f8e1c131c4054c0d64b1d7de0ff269ccc61a | >> \xecd8587a8b9acae650760cea8683f8e1c131c4054c0d64b1d7de0ff269ccc61a >> 11782c73bb3fc9f281b41d3eff8c1e7907b3494b3abe7b6982c1e88f49dad2ea | >> \x11782c73bb3fc9f281b41d3eff8c1e7907b3494b3abe7b6982c1e88f49dad2ea >> 5862bd8d645e4d44997a485c616bc18f1acabeaec5df3c3b09b9d4c08643e852 | >> \x5862bd8d645e4d44997a485c616bc18f1acabeaec5df3c3b09b9d4c08643e852 >> 2d09a5cca2c03153a55faa3aff13df0f0593f4355a1b2cfcf9237c2931b4918c | >> \x2d09a5cca2c03153a55faa3aff13df0f0593f4355a1b2cfcf9237c2931b4918c >> 2186eb2bcc12319ee6e00f7e08a1d61e379a75c01c579c29d0338693bc31c7c7 | >> \x2186eb2bcc12319ee6e00f7e08a1d61e379a75c01c579c29d0338693bc31c7c7 >> 2061bd05049c51bd1162e4d77f72a37f06d2397fc522ef587ed172a5ad8d57aa | >> \x2061bd05049c51bd1162e4d77f72a37f06d2397fc522ef587ed172a5ad8d57aa >> (10 rows) >> >> created two indexes: >> create index i1 on new (texthex); >> create index i2 on new (a_bytea); >> >> i1 is 91MB, and i2 is 56MB. >> >> Index creation was also much faster - best out of 3 runs for i1 was >> 4928.982 >> ms, best out of 3 runs for i2 was 2047.648 ms >> >> Best regards, >> >> depesz >> >> > -- Julyanto SUTANDANG Equnix Business Solutions, PT (An Open Source an Open Mind Company) Pusat Niaga ITC Roxy Mas Blok C2/42. Jl. KH Hasyim Ashari 125, Jakarta Pusat T: +6221 22866662 F: +62216315281 M: +628164858028 Caution: The information enclosed in this email (and any attachments) may be legally privileged and/or confidential and is intended only for the use of the addressee(s). No addressee should forward, print, copy, or otherwise reproduce this message in any manner that would allow it to be viewed by any individual not originally listed as a recipient. If the reader of this message is not the intended recipient, you are hereby notified that any unauthorized disclosure, dissemination, distribution, copying or the taking of any action in reliance on the information herein is strictly prohibited. If you have received this communication in error, please immediately notify the sender and delete this message.Unless it is made by the authorized person, any views expressed in this message are those of the individual sender and may not necessarily reflect the views of PT Equnix Business Solutions. --001a114431e258c0e70535518371 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Hi Ivan,=C2=A0

How about using crc32 ? = and then index the integer as the result of crc32 function? you can split t= he hash into 2 part and do crc32 2x ? and then create composite index on bo= th integer (the crc32 result)
instead of using 64 char, you only = employ 2 integer as index key.=C2=A0

Regards,=C2= =A0

Jul

=
On Wed, Jun 15, 2016 at 8:54 PM, Ivan Voras <ivo= ras@gmail.com> wrote:
Hi,

I understand your idea, and h= ave also been thinking about it. Basically, existing applications would nee= d to be modified, however slightly, and that wouldn't be good.




On 1= 5 June 2016 at 15:38, hubert depesz lubaczewski <depesz@depesz.com>= wrote:
On Wed, Jun 15, 2016= at 11:34:18AM +0200, Ivan Voras wrote:
> I have an application which stores a large amounts of hex= -encoded hash
> strings (nearly 100 GB of them), which means:

Why do you keep them hex encoded, and not use bytea?

I made a sample table with 1 million rows, looking like this:

=C2=A0 =C2=A0 =C2=A0Table "public.new"
=C2=A0Column=C2=A0 | Type=C2=A0 | Modifiers
---------+-------+-----------
=C2=A0texthex | text=C2=A0 |
=C2=A0a_bytea | bytea |

values are like:

$ select * from new limit 10;
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0texthex=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 |=C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 a_bytea
------------------------------------------------------------------+--------= ------------------------------------------------------------
=C2=A0c968f64426b941bc9a8f6d4e87fc151c7a7192679837618570b7989c67c31e2f | \x= c968f64426b941bc9a8f6d4e87fc151c7a7192679837618570b7989c67c31e2f
=C2=A061dffbf002d7fc3db9df5953aca7f2e434a78d4c5fdd0db6f90f43ee8c4371db | \x= 61dffbf002d7fc3db9df5953aca7f2e434a78d4c5fdd0db6f90f43ee8c4371db
=C2=A0757acf228adf2357356fd38d03b80529771f211e0ad3b35b66a23d6de53e5033 | \x= 757acf228adf2357356fd38d03b80529771f211e0ad3b35b66a23d6de53e5033
=C2=A0fba35b8b33a7fccc2ac7d96389d43be509ff17636fe3c0f8a33af6d009f84f15 | \x= fba35b8b33a7fccc2ac7d96389d43be509ff17636fe3c0f8a33af6d009f84f15
=C2=A0ecd8587a8b9acae650760cea8683f8e1c131c4054c0d64b1d7de0ff269ccc61a | \x= ecd8587a8b9acae650760cea8683f8e1c131c4054c0d64b1d7de0ff269ccc61a
=C2=A011782c73bb3fc9f281b41d3eff8c1e7907b3494b3abe7b6982c1e88f49dad2ea | \x= 11782c73bb3fc9f281b41d3eff8c1e7907b3494b3abe7b6982c1e88f49dad2ea
=C2=A05862bd8d645e4d44997a485c616bc18f1acabeaec5df3c3b09b9d4c08643e852 | \x= 5862bd8d645e4d44997a485c616bc18f1acabeaec5df3c3b09b9d4c08643e852
=C2=A02d09a5cca2c03153a55faa3aff13df0f0593f4355a1b2cfcf9237c2931b4918c | \x= 2d09a5cca2c03153a55faa3aff13df0f0593f4355a1b2cfcf9237c2931b4918c
=C2=A02186eb2bcc12319ee6e00f7e08a1d61e379a75c01c579c29d0338693bc31c7c7 | \x= 2186eb2bcc12319ee6e00f7e08a1d61e379a75c01c579c29d0338693bc31c7c7
=C2=A02061bd05049c51bd1162e4d77f72a37f06d2397fc522ef587ed172a5ad8d57aa | \x= 2061bd05049c51bd1162e4d77f72a37f06d2397fc522ef587ed172a5ad8d57aa
(10 rows)

created two indexes:
create index i1 on new (texthex);
create index i2 on new (a_bytea);

i1 is 91MB, and i2 is 56MB.

Index creation was also much faster - best out of 3 runs for i1 was 4928.98= 2
ms, best out of 3 runs for i2 was 2047.648 ms

Best regards,

depesz





--
=


Julyanto SUTANDANG

Equnix Business Solutions, = PT
(An Open Source an Open Mind Company)

Pusat Niaga ITC Roxy Mas= Blok C2/42.=C2=A0 Jl. KH Hasyim Ashari 125, Jakarta Pusat
T: +6221 2286= 6662 F: +62216315281 M: +628164858028


Caution: The information e= nclosed in this email (and any attachments) may be legally privileged and/o= r confidential and is intended only for the use of the addressee(s). No add= ressee should forward, print, copy, or otherwise reproduce this message in = any manner that would allow it to be viewed by any individual not originall= y listed as a recipient. If the reader of this message is not the intended = recipient, you are hereby notified that any unauthorized disclosure, dissem= ination, distribution, copying or the taking of any action in reliance on t= he information herein is strictly prohibited. If you have received this com= munication in error, please immediately notify the sender and delete this m= essage.Unless it is made by the authorized person, any views expressed in t= his message are those of the individual sender and may not necessarily refl= ect the views of PT Equnix Business Solutions.
--001a114431e258c0e70535518371--