MIME-Version: 1.0
Date: Wed, 20 Apr 2016 14:41:54 -0400
Message-ID: 
 <CANcrS5pR1P1Tj=e-RQQ=FF3WPAy_fyruS0YJer-+iJHxR1JAiA@mail.gmail.com>
Subject: Performant queries on table with many boolean columns
From: Rob Imig <rimig88@gmail.com>
To: pgsql-performance@postgresql.org
Content-Type: multipart/alternative; boundary=001a11c2ce98a3c03f0530eef206
Precedence: bulk
Sender: pgsql-performance-owner@postgresql.org

--001a11c2ce98a3c03f0530eef206
Content-Type: text/plain; charset=UTF-8

Hey all,

New to the lists so please let me know if this isn't the right place for
this question.

I am trying to understand how to structure a table for optimal read
performance. The data will not change frequently, so you can basically
think of it as static; I am only concerned with optimizing reads from
basic SELECT ... WHERE queries.

The data:

   - ~20 million records
   - Each record has 1 id and ~100 boolean properties
   - Each boolean property has ~85% of the records as true


The retrieval will always be something like "SELECT id FROM <table> WHERE
<conditions>".

<conditions> will be some arbitrary subset of the ~100 boolean columns, and
the result should be the ids that match all of the conditions (true for each
boolean column). Example:
WHERE prop1 AND prop18 AND prop24
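
To make the setup concrete, here is a minimal sketch of the layout and query
described above. It uses Python's built-in sqlite3 module purely as a small,
self-contained stand-in for PostgreSQL. The table name "records", the helper
names, and the 1,000-row random sample are assumptions for illustration, not
the real schema or data.

```python
import random
import sqlite3

N_PROPS = 100   # ~100 boolean properties, as described above
N_ROWS = 1_000  # tiny stand-in for the ~20 million real records (assumption)
P_TRUE = 0.85   # each property is true for ~85% of records


def build_table(conn, n_rows=N_ROWS, n_props=N_PROPS, seed=0):
    """Create the wide one-column-per-property table and fill it randomly."""
    cols = ", ".join(f"prop{i} BOOLEAN NOT NULL" for i in range(1, n_props + 1))
    conn.execute(f"CREATE TABLE records (id INTEGER PRIMARY KEY, {cols})")
    rng = random.Random(seed)
    placeholders = ", ".join("?" for _ in range(n_props + 1))
    rows = (
        # Store booleans as 0/1 integers so the stand-in database accepts them.
        [row_id] + [int(rng.random() < P_TRUE) for _ in range(n_props)]
        for row_id in range(1, n_rows + 1)
    )
    conn.executemany(f"INSERT INTO records VALUES ({placeholders})", rows)


def matching_ids(conn, props):
    """Return the ids for which every listed property is true."""
    where = " AND ".join(f"prop{p}" for p in props)
    query = f"SELECT id FROM records WHERE {where} ORDER BY id"
    return [row[0] for row in conn.execute(query)]


conn = sqlite3.connect(":memory:")
build_table(conn)
ids = matching_ids(conn, [1, 18, 24])  # WHERE prop1 AND prop18 AND prop24
print(len(ids), "of", N_ROWS, "rows match")
```

The real question is how to make the equivalent of matching_ids fast at
20 million rows for any combination of properties.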


The obvious approach seems to be a table with ~100 columns, one column
for each boolean property. But what type of indexing strategy would one
use on that table? A BTREE index doesn't seem to make sense here. Is there
a better way to structure it?
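
For a rough sense of why a BTREE on each column looks unhelpful, here is a
back-of-the-envelope sketch of how many rows a query would still return. It
assumes the properties are statistically independent, which is a
simplification that may not hold for the real data; expected_matches is just
an illustrative helper.

```python
N_RECORDS = 20_000_000  # ~20 million records
P_TRUE = 0.85           # each property is true for ~85% of records


def expected_matches(k, n=N_RECORDS, p=P_TRUE):
    """Expected rows matching k ANDed properties, assuming independence."""
    return n * p ** k


for k in (1, 3, 10, 20):
    print(f"{k:>2} conditions: ~{expected_matches(k):,.0f} rows")
```

Under that assumption a single condition still matches 85% of the table, and
three conditions together still match roughly 61% of it, so no single-column
index is selective on its own.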


Any and all advice/tips/questions appreciated!

Thanks,
Rob


--001a11c2ce98a3c03f0530eef206--