Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtp (Exim 4.84_2) (envelope-from ) id 1atbab-0008GZ-LF for pgsql-performance@arkaria.postgresql.org; Fri, 22 Apr 2016 13:57:45 +0000 Received: from localhost ([127.0.0.1] helo=postgresql.org) by malur.postgresql.org with smtp (Exim 4.84_2) (envelope-from ) id 1atbab-0001eF-3O for pgsql-performance@arkaria.postgresql.org; Fri, 22 Apr 2016 13:57:45 +0000 Received: from makus.postgresql.org ([2001:4800:1501:1::229]) by malur.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1atbaZ-0001d6-QZ for pgsql-performance@postgresql.org; Fri, 22 Apr 2016 13:57:44 +0000 Received: from mail-oi0-x234.google.com ([2607:f8b0:4003:c06::234]) by makus.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.84_2) (envelope-from ) id 1atbaW-0004Yc-GD for pgsql-performance@postgresql.org; Fri, 22 Apr 2016 13:57:42 +0000 Received: by mail-oi0-x234.google.com with SMTP id x201so117470540oif.3 for ; Fri, 22 Apr 2016 06:57:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc; bh=6lxtt2nGvUOp1GRZop3N5MErrtR0n/LKJO4D8bTFe5Q=; b=gz1d1irBDEQ3CEP8sdw7KI3AMAgj4S99cnuXxwHzWaqzCx4mAX1J5hBzyZjH0lFqWB 05LxDlnlyHHJHiTObCn2NmaPrhh9sIQhjYijFNAvvuBVRDwjNGfOF3K8Yww9zjfC96Jj phyuJ1Ylv825zsa/Mq4NDcM5hkPF9bOIyLJoc3XKDp7pZnxdym337ApwBFhTmPW1UC9s kSTrsOH+/h0UB8q6oJ++vxUDpsb54+4bQquiv0bT/G+6PkSha7ardrjRtZubxRlv3BFc Yk2ORbgKKCKcpQDVUVdhAchetDIBgIIPqR4lk1OsbspQePl+y1XPZWj+gErj3mK/RecS Vz1A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc; bh=6lxtt2nGvUOp1GRZop3N5MErrtR0n/LKJO4D8bTFe5Q=; b=RdTjdjwqJz0rFrFCD7+YP1KN+dctBEIe5CEGiKGMHWmMbZ4ZDJ8/BEYt9GPCBz0C+y hjIIzxlEo4Be7mNKfS8ocf33XHiDD6dwYqyB5EF59ye6W67GvHBihc7wnYVje5GAjxsC sUFwpVVOcZ8LZkGHinzBuxv+FS7ohL0MboJ+5KZN8kmy038skJ8zz+VIQDqBmcRDbCs1 ls/oBR610XWlZvTlWwD8X9/E0i52SqRRZxwKovb2dE6r1z9mpMblgX0iI55I7HsK/lXf NI8jG3J2NeauFyIsX6MUie1zwCVtb4bCZ4KC26gnznsxG/pluZlXsWDc5OPyF3EtVWry bofg== X-Gm-Message-State: AOPr4FV5akZfxelbeBm3iZbgaPTBQFVcMeQI05Hdl5QN4J/cqgQA3sZMv16TO9ofWqY6KcfeHxsynrwWHK3VXA== MIME-Version: 1.0 X-Received: by 10.157.16.43 with SMTP id h40mr8567257ote.185.1461333459293; Fri, 22 Apr 2016 06:57:39 -0700 (PDT) Received: by 10.202.171.12 with HTTP; Fri, 22 Apr 2016 06:57:39 -0700 (PDT) In-Reply-To: References: <5717D07E.3070802@sigaev.ru> Date: Fri, 22 Apr 2016 09:57:39 -0400 Message-ID: Subject: Re: Performant queries on table with many boolean columns From: Rob Imig To: Jeff Janes Cc: Teodor Sigaev , "pgsql-performance@postgresql.org" Content-Type: multipart/alternative; boundary=001a1135bf38bd601b0531133597 X-Pg-Spam-Score: -2.4 (--) List-Archive: List-Help: List-ID: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: X-Mailing-List: pgsql-performance Precedence: bulk Sender: pgsql-performance-owner@postgresql.org --001a1135bf38bd601b0531133597 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Just to followup where I'm at, I've constructed a new column which is a 100 bit bitstring representing all the flags. Created a b-tree index on that column and can now do super fast lookups (2) for specific scenarios however getting the behavior I need would require a huge amount of OR conditions (as Rick mentioned earlier). Another option is to do bitwiser operators (3) but that seems really slow. Not sure how I can speed that up. For my specific use-case I think we are going to be able to shard by a category so performance will be acceptable, so this is turning into an educational exercise. *1. SELECT..WHERE on each boolean property* rimig=3D# explain analyze select bitstr from bloomtest_bi where prop0 AND prop1 AND prop2 AND prop3 AND prop4 AND prop5 AND prop6 AND prop7 AND prop8 AND prop9 AND prop10 AND prop11 AND prop12 AND prop13 AND prop14 AND prop15 AND prop16 AND prop17 AND prop18 AND prop19 AND prop20 AND prop21 AND prop22 AND prop23 AND prop24 AND prop25 AND prop26 AND prop27 AND prop28 AND prop29 AND prop30 AND prop31 AND prop32 AND prop33 AND prop34 AND prop35 AND prop36 AND prop37 AND prop38 AND prop39 AND prop40 AND prop41 AND prop42 AND prop43 AND prop44 AND prop45 AND prop46 AND prop47 AND prop48 AND prop49 AND prop50 AND prop51 AND prop52 AND prop53 AND prop54 AND prop55 AND prop56 AND prop57 AND prop58 AND prop59 AND prop60 AND prop61 AND prop62 AND prop63 AND prop64; QUERY PLAN ---------------------------------------------------------------------------= ---------------------------------------------------------------------------= ---------------------------------------------------------------------------= ---------------------------------------------------------------------------= ---------------------------------------------------------------------------= ---------------------------------------------------------------------------= ---------------------------------------------------------------------------= ---------------------------------------------------------------------------= ---------------------------------------------------------------------------= --------------------------------------- Seq Scan on bloomtest_bi (cost=3D0.00..350770.00 rows=3D6 width=3D18) (ac= tual time=3D229.365..2576.391 rows=3D9 loops=3D1) Filter: (prop0 AND prop1 AND prop2 AND prop3 AND prop4 AND prop5 AND prop6 AND prop7 AND prop8 AND prop9 AND prop10 AND prop11 AND prop12 AND prop13 AND prop14 AND prop15 AND prop16 AND prop17 AND prop18 AND prop19 AND prop20 AND prop21 AND prop22 AND prop23 AND prop24 AND prop25 AND prop26 AND prop27 AND prop28 AND prop29 AND prop30 AND prop31 AND prop32 AND prop33 AND prop34 AND prop35 AND prop36 AND prop37 AND prop38 AND prop39 AND prop40 AND prop41 AND prop42 AND prop43 AND prop44 AND prop45 AND prop46 AND prop47 AND prop48 AND prop49 AND prop50 AND prop51 AND prop52 AND prop53 AND prop54 AND prop55 AND prop56 AND prop57 AND prop58 AND prop59 AND prop60 AND prop61 AND prop62 AND prop63 AND prop64) Rows Removed by Filter: 11999991 Total runtime: 2576.420 ms (4 rows) *Time: 2577.160 ms* *2. SELECT..WHERE on exact bitstring match* (standard b-tree index on bitstr so obviously fast here) This would mean I'd have to OR all the conditions which is a bit gnarly. rimig=3D# explain analyze select bitstr from bloomtest_bi where bitstr =3D '11111111111111111111111111111111111111111111111111111111111111111011011101= 110011111100110001101000111'; QUERY PLAN ---------------------------------------------------------------------------= --------------------------------------------------------------------- Index Only Scan using i_gist on bloomtest_bi (cost=3D0.56..8.58 rows=3D1 width=3D18) (actual time=3D0.040..0.040 rows=3D1 loops=3D1) Index Cond: (bitstr =3D B'1111111111111111111111111111111111111111111111111111111111111111101101110= 1110011111100110001101000111'::bit varying) Heap Fetches: 1 Total runtime: 0.056 ms (4 rows) *Time: 0.443 ms* *3. SELECT..WHERE using bitwise operator* This gets all the results I need however it's slow. rimig=3D# explain analyze select bitstr from bloomtest_bi where (bitstr & '11111111111111111111111111111111111111111111111111111111111111111000000000= 000000000000000000000000000' ) =3D '11111111111111111111111111111111111111111111111111111111111111111000000000= 000000000000000000000000000'; QUERY PLAN ---------------------------------------------------------------------------= ---------------------------------------------------------------------------= ---------------------------------------------------------------------------= ---------------------------------- Seq Scan on bloomtest_bi (cost=3D0.00..410770.00 rows=3D60000 width=3D18) (actual time=3D856.595..9359.566 rows=3D9 loops=3D1) Filter: (((bitstr)::"bit" & B'1111111111111111111111111111111111111111111111111111111111111111100000000= 0000000000000000000000000000'::"bit") =3D B'1111111111111111111111111111111111111111111111111111111111111111100000000= 0000000000000000000000000000'::"bit") Rows Removed by Filter: 11999991 Total runtime: 9359.593 ms (4 rows) *Time: 9360.072 ms* On Thu, Apr 21, 2016 at 3:34 PM, Rob Imig wrote: > Hey all, > > Lots of interesting suggestions! I'm loving it. > > Just came back to this a bit earlier today and made a sample table to see > what non-index performance would be. Constructed data just like above (us= ed > 12M rows and 80% true for all 100 boolean columns) > > Here's an analyze for what I'd expect to be the types of queries that I'l= l > be handling from the frontend. I would expect around 40-70 properties per > query. > > Now I'm going to start experimenting with some ideas above and other > tuning. This isn't as bad as I thought it would be, though would like to > get this under 200ms. > > rimig=3D# explain analyze select count(*) from bloomtest where prop0 AND > prop1 AND prop2 AND prop3 AND prop4 AND prop5 AND prop6 AND prop7 AND pro= p8 > AND prop9 AND prop10 AND prop11 AND prop12 AND prop13 AND prop14 AND prop= 15 > AND prop16 AND prop17 AND prop18 AND prop19 AND prop20 AND prop21 AND > prop22 AND prop23 AND prop24 AND prop25 AND prop26 AND prop27 AND prop28 > AND prop29 AND prop30 AND prop31 AND prop32 AND prop33 AND prop34 AND > prop35 AND prop36 AND prop37 AND prop38 AND prop39 AND prop40 AND prop41 > AND prop42 AND prop43 AND prop44 AND prop45 AND prop46 AND prop47 AND > prop48 AND prop49 AND prop50 AND prop51 AND prop52 AND prop53 AND prop54 > AND prop55 AND prop56 AND prop57 AND prop58 AND prop59 AND prop60 AND > prop61 AND prop62 AND prop63 AND prop64; > > Aggregate (cost=3D351563.03..351563.04 rows=3D1 width=3D0) (actual > time=3D2636.829..2636.829 rows=3D1 loops=3D1) > > -> Seq Scan on bloomtest (cost=3D0.00..351563.02 rows=3D3 width=3D0) > (actual time=3D448.200..2636.811 rows=3D9 loops=3D1) > > Filter: (prop0 AND prop1 AND prop2 AND prop3 AND prop4 AND prop5 > AND prop6 AND prop7 AND prop8 AND prop9 AND prop10 AND prop11 AND prop12 > AND prop13 AND prop14 AND prop15 AND prop16 AND prop17 AND prop18 AND > prop19 AND prop20 AND prop21 AND prop22 AND prop23 AND prop24 AND prop25 > AND prop26 AND prop27 AND prop28 AND prop29 AND prop30 AND prop31 AND > prop32 AND prop33 AND prop34 AND prop35 AND prop36 AND prop37 AND prop38 > AND prop39 AND prop40 AND prop41 AND prop42 AND prop43 AND prop44 AND > prop45 AND prop46 AND prop47 AND prop48 AND prop49 AND prop50 AND prop51 > AND prop52 AND prop53 AND prop54 AND prop55 AND prop56 AND prop57 AND > prop58 AND prop59 AND prop60 AND prop61 AND prop62 AND prop63 AND prop64) > > Rows Removed by Filter: 11999991 > > Total runtime: 2636.874 ms > > On Thu, Apr 21, 2016 at 12:45 PM, Jeff Janes wrote= : > >> On Wed, Apr 20, 2016 at 11:54 AM, Teodor Sigaev wrote= : >> >> >> >> The obvious thing seems to make a table with ~100 columns, with 1 >> column >> >> for each boolean property. Though, what type of indexing strategy wou= ld >> >> one use on that table? Doesn't make sense to do BTREE. Is there a >> better >> >> way to structure it? >> >> >> > looks like a deal for contrib/bloom index in upcoming 9.6 release >> >> Not without doing a custom compilation with an increased INDEX_MAX_KEYS: >> >> ERROR: cannot use more than 32 columns in an index >> >> But even so, I'm skeptical this would do better than a full scan. It >> would be interesting to test that. >> >> Cheers, >> >> Jeff >> > > --001a1135bf38bd601b0531133597 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Just to followup where I'm at, I've constructed a = new column which is a 100 bit bitstring representing all the flags. Created= a b-tree index on that column and can now do super fast lookups (2) for sp= ecific scenarios however getting the behavior I need would require a huge a= mount of OR conditions (as Rick mentioned earlier). Another option is to do= bitwiser operators (3) but that seems really slow. Not sure how I can spee= d that up.

For my specific use-case I think we are going= to be able to shard by a category so performance will be acceptable, so th= is is turning into an educational exercise.

1. SE= LECT..WHERE on each boolean property

rimig=3D# explain analyze select bitstr from= bloomtest_bi where prop0 AND prop1 AND prop2 AND prop3 AND prop4 AND prop5= AND prop6 AND prop7 AND prop8 AND prop9 AND prop10 AND prop11 AND prop12 A= ND prop13 AND prop14 AND prop15 AND prop16 AND prop17 AND prop18 AND prop19= AND prop20 AND prop21 AND prop22 AND prop23 AND prop24 AND prop25 AND prop= 26 AND prop27 AND prop28 AND prop29 AND prop30 AND prop31 AND prop32 AND pr= op33 AND prop34 AND prop35 AND prop36 AND prop37 AND prop38 AND prop39 AND = prop40 AND prop41 AND prop42 AND prop43 AND prop44 AND prop45 AND prop46 AN= D prop47 AND prop48 AND prop49 AND prop50 AND prop51 AND prop52 AND prop53 = AND prop54 AND prop55 AND prop56 AND prop57 AND prop58 AND prop59 AND prop6= 0 AND prop61 AND prop62 AND prop63 AND prop64;

=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 QUERY PLAN =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0

--------------------------------------------= ---------------------------------------------------------------------------= ---------------------------------------------------------------------------= ---------------------------------------------------------------------------= ---------------------------------------------------------------------------= ---------------------------------------------------------------------------= ---------------------------------------------------------------------------= ---------------------------------------------------------------------------= ---------------------------------------------------------------------------= ----------------------------------------------------------------------

=C2=A0Seq Scan on bloomtest_bi=C2=A0 (cost= =3D0.00..350770.00 rows=3D6 width=3D18) (actual time=3D229.365..2576.391 ro= ws=3D9 loops=3D1)

=C2=A0=C2=A0 Filter: (prop0 AND prop1 AND pr= op2 AND prop3 AND prop4 AND prop5 AND prop6 AND prop7 AND prop8 AND prop9 A= ND prop10 AND prop11 AND prop12 AND prop13 AND prop14 AND prop15 AND prop16= AND prop17 AND prop18 AND prop19 AND prop20 AND prop21 AND prop22 AND prop= 23 AND prop24 AND prop25 AND prop26 AND prop27 AND prop28 AND prop29 AND pr= op30 AND prop31 AND prop32 AND prop33 AND prop34 AND prop35 AND prop36 AND = prop37 AND prop38 AND prop39 AND prop40 AND prop41 AND prop42 AND prop43 AN= D prop44 AND prop45 AND prop46 AND prop47 AND prop48 AND prop49 AND prop50 = AND prop51 AND prop52 AND prop53 AND prop54 AND prop55 AND prop56 AND prop5= 7 AND prop58 AND prop59 AND prop60 AND prop61 AND prop62 AND prop63 AND pro= p64)

=C2=A0=C2=A0 Rows Removed by Filter: 1199999= 1

=C2=A0Total runtime: 2576.420 ms

(4 rows)


Time: 2577.160 ms



2. SELECT..WHERE on exact bitstring= match (standard b-tree index on bitstr so obviously fast here)

This would mean I'd have to OR all t= he conditions which is a bit gnarly.


rimig=3D# explain analyze select bitstr from= bloomtest_bi where bitstr =3D '111111111111111111111111111111111111111= 11111111111111111111111111011011101110011111100110001101000111';=

=C2=A0=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 QUERY PLAN=C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0

--------------------------------------------= ---------------------------------------------------------------------------= -------------------------

=C2=A0Index Only Scan using i_gist on bloomt= est_bi=C2=A0 (cost=3D0.56..8.58 rows=3D1 width=3D18) (actual time=3D0.040..= 0.040 rows=3D1 loops=3D1)

=C2=A0=C2=A0 Index Cond: (bitstr =3D B'1= 111111111111111111111111111111111111111111111111111111111111111101101110111= 0011111100110001101000111'::bit varying)

=C2=A0=C2=A0 Heap Fetches: 1

=C2=A0Total runtime: 0.056 ms

(4 rows)


Time: 0.443 ms


3. SELECT..WHERE using bitwise operator<= /b>=C2=A0

This gets all the results I n= eed however it's slow.=C2=A0

rimig=3D= # explain analyze select bitstr from bloomtest_bi where (bitstr & '= 111111111111111111111111111111111111111111111111111111111111111110000000000= 00000000000000000000000000' ) =3D '11111111111111111111111111111111= 111111111111111111111111111111111000000000000000000000000000000000000';=

=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 QUERY PLAN=C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0

---------------------------------------------------------------= ---------------------------------------------------------------------------= ---------------------------------------------------------------------------= ----------------------------------------------

=C2=A0Seq Scan on bloomtest_bi=C2=A0 (cost=3D0.00..410770.00 r= ows=3D60000 width=3D18) (actual time=3D856.595..9359.566 rows=3D9 loops=3D1= )

=C2=A0=C2=A0 Filter: (((bitstr):= :"bit" & B'1111111111111111111111111111111111111111111111= 1111111111111111111000000000000000000000000000000000000'::"bit&quo= t;) =3D B'1111111111111111111111111111111111111111111111111111111111111= 1111000000000000000000000000000000000000'::"bit")

<= p class=3D"">=C2=A0=C2=A0 Rows Removed by Filter: 11999991=

=C2=A0Total runtime: 9359.593 ms<= /span>

(4 rows)

<= span class=3D"">

Time: 9360.072 ms



On Thu, Apr 21, 2016 at 3:34 PM, Rob Imig <= rimig88@gmail.com> wrote:
Hey all,

Lots of interesting suggestions! I'm lovi= ng it.

Just came back to this a bit earlier today and ma= de a sample table to see what non-index performance would be. Constructed d= ata just like above (used 12M rows and 80% true for all 100 boolean columns= )

Here's an analyze for what I'd expect to= be the types of queries that I'll be handling from the frontend. I wou= ld expect around 40-70 properties per query.

Now I= 'm going to start experimenting with some ideas above and other tuning.= This isn't as bad as I thought it would be, though would like to get t= his under 200ms.

rimig=3D# explain analyze select count(*) from bloomtest where pro= p0 AND prop1 AND prop2 AND prop3 AND prop4 AND prop5 AND prop6 AND prop7 AN= D prop8 AND prop9 AND prop10 AND prop11 AND prop12 AND prop13 AND prop14 AN= D prop15 AND prop16 AND prop17 AND prop18 AND prop19 AND prop20 AND prop21 = AND prop22 AND prop23 AND prop24 AND prop25 AND prop26 AND prop27 AND prop2= 8 AND prop29 AND prop30 AND prop31 AND prop32 AND prop33 AND prop34 AND pro= p35 AND prop36 AND prop37 AND prop38 AND prop39 AND prop40 AND prop41 AND p= rop42 AND prop43 AND prop44 AND prop45 AND prop46 AND prop47 AND prop48 AND= prop49 AND prop50 AND prop51 AND prop52 AND prop53 AND prop54 AND prop55 A= ND prop56 AND prop57 AND prop58 AND prop59 AND prop60 AND prop61 AND prop62= AND prop63 AND prop64;

=C2=A0Aggregate=C2=A0 (cost=3D351563.03..351563.04 rows=3D1 width= =3D0) (actual time=3D2636.829..2636.829 rows=3D1 loops=3D1)

=C2=A0=C2=A0 ->=C2=A0 Seq Scan on bloomtest=C2=A0 (cost=3D0.00.= .351563.02 rows=3D3 width=3D0) (actual time=3D448.200..2636.811 rows=3D9 lo= ops=3D1)

=C2=A0=C2=A0 =C2=A0 =C2=A0 =C2=A0 Filter: (prop0 AND prop1 AND pro= p2 AND prop3 AND prop4 AND prop5 AND prop6 AND prop7 AND prop8 AND prop9 AN= D prop10 AND prop11 AND prop12 AND prop13 AND prop14 AND prop15 AND prop16 = AND prop17 AND prop18 AND prop19 AND prop20 AND prop21 AND prop22 AND prop2= 3 AND prop24 AND prop25 AND prop26 AND prop27 AND prop28 AND prop29 AND pro= p30 AND prop31 AND prop32 AND prop33 AND prop34 AND prop35 AND prop36 AND p= rop37 AND prop38 AND prop39 AND prop40 AND prop41 AND prop42 AND prop43 AND= prop44 AND prop45 AND prop46 AND prop47 AND prop48 AND prop49 AND prop50 A= ND prop51 AND prop52 AND prop53 AND prop54 AND prop55 AND prop56 AND prop57= AND prop58 AND prop59 AND prop60 AND prop61 AND prop62 AND prop63 AND prop= 64)

=C2=A0=C2=A0 =C2=A0 =C2=A0 =C2=A0 Rows Removed by Filter: 11999991=

=C2=A0Total runtime: 2636.874 ms


On Thu, Apr 21, 2016 at 12:45 PM, Jeff Janes <jeff.ja= nes@gmail.com> wrote:
On Wed, Apr 20, 2016 at 11:54 AM, Teodor Sigaev <teodor@sigaev.ru> wrote:
>>
>> The obvious thing seems to make a table with ~100 columns, with 1 = column
>> for each boolean property. Though, what type of indexing strategy = would
>> one use on that table? Doesn't make sense to do BTREE. Is ther= e a better
>> way to structure it?
>>
> looks like a deal for contrib/bloom index in upcoming 9.6 release

Not without doing a custom compilation with an increased INDEX_MAX_K= EYS:

ERROR:=C2=A0 cannot use more than 32 columns in an index

But even so, I'm skeptical this would do better than a full scan.=C2=A0= It
would be interesting to test that.

Cheers,

Jeff


--001a1135bf38bd601b0531133597--