public inbox for [email protected]  
help / color / mirror / Atom feed
From: Jeff Janes <[email protected]>
To: Rob Imig <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: Performant queries on table with many boolean columns
Date: Thu, 21 Apr 2016 09:36:37 -0700
Message-ID: <CAMkU=1yiaPZAuePwbD_=GS-JZ7r5hzpc9FhZbMX2DEBqTU=BRw@mail.gmail.com> (raw)
In-Reply-To: <CANcrS5pR1P1Tj=e-RQQ=FF3WPAy_fyruS0YJer-+iJHxR1JAiA@mail.gmail.com>
References: <CANcrS5pR1P1Tj=e-RQQ=FF3WPAy_fyruS0YJer-+iJHxR1JAiA@mail.gmail.com>
List-Unsubscribe:  <mailto:[email protected]?body=unsub%20pgsql-performance>

On Wed, Apr 20, 2016 at 11:41 AM, Rob Imig <[email protected]> wrote:
> Hey all,
>
> New to the lists so please let me know if this isn't the right place for
> this question.
>
> I am trying to understand how to structure a table to allow for optimal
> performance on retrieval. The data will not change frequently so you can
> basically think of it as static and only concerned about optimizing reads
> from basic SELECT...WHERE queries.
>
> The data:
>
> ~20 million records
> Each record has 1 id and ~100 boolean properties
> Each boolean property has ~85% of the records as true
>
>
> The retrieval will always be something like "SELECT id FROM <table> WHERE
> <conditions>.
>
> <conditions> will be some arbitrary set of the ~100 boolean columns and you
> want the ids that match all of the conditions (true for each boolean
> column). Example:
> WHERE prop1 AND prop18 AND prop24


Is 3 a typical number of conditions to have?

85%^3 is 61.4%, so you are fetching most of the table.  At that point,
I think I would give up on indexes and just expect to do a full table
scan each time.   Which means a single column
bit-string data type might be the way to go, although the construction
of the queries would then be more cumbersome, especially if you will
do by hand.

I think the only way to know for sure is to write a few scripts to benchmark it.

Cheers,

Jeff


-- 
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected]
  Subject: Re: Performant queries on table with many boolean columns
  In-Reply-To: <CAMkU=1yiaPZAuePwbD_=GS-JZ7r5hzpc9FhZbMX2DEBqTU=BRw@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox