Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtp (Exim 4.84_2) (envelope-from ) id 1asx6F-0005mQ-4p for pgsql-performance@arkaria.postgresql.org; Wed, 20 Apr 2016 18:43:43 +0000 Received: from localhost ([127.0.0.1] helo=postgresql.org) by malur.postgresql.org with smtp (Exim 4.84_2) (envelope-from ) id 1asx6E-00070F-M3 for pgsql-performance@arkaria.postgresql.org; Wed, 20 Apr 2016 18:43:42 +0000 Received: from makus.postgresql.org ([2001:4800:1501:1::229]) by malur.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1asx4Y-000502-QD for pgsql-performance@postgresql.org; Wed, 20 Apr 2016 18:41:58 +0000 Received: from mail-oi0-x230.google.com ([2607:f8b0:4003:c06::230]) by makus.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.84_2) (envelope-from ) id 1asx4V-0003yb-Ve for pgsql-performance@postgresql.org; Wed, 20 Apr 2016 18:41:57 +0000 Received: by mail-oi0-x230.google.com with SMTP id k142so49693051oib.1 for ; Wed, 20 Apr 2016 11:41:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to; bh=wu22RtjUdkCMBvVay1/80MHd1J3xDirvCL2sh3tKUrk=; b=fKPc8/deRTVDnyXsi8klQTHGZdQ+VESn5fO2R241WxclsgdHCYYmEF1q0lqZeaYIq9 efFaAGGWugq6UDhV6l0jfM1+ZGqEfqgcSSEDF0sDdh8eumfHu78/Zpu/UAkeP1JPyz4t tcL1jxxrjG68XGxjMubSbabEzgMmmHQix+j/AwrTzUlpImElmh5LWdiGLDkF+j7BvI4i vIZ/u9ZL33whOeprF86BNoeriacyR0PHl3etDDyfXY9A4G1BdsmIrycn9akCl1M5iHLc RtHNhxEOQUfK8ynOUOXgIJrd58tilxy/ADubPq9s5iYaAvZ8gSsZjry/WPdogy1UWERX LHzA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:date:message-id:subject:from:to; bh=wu22RtjUdkCMBvVay1/80MHd1J3xDirvCL2sh3tKUrk=; b=gpzyJfLU1C7f8mePiy221FWzHoOl/UYVE8dx9HmAgtrFtuvDEDWFqh59TOKlAb6LGU mZBJQYGuB5NLWa6FfQttKkMhw0OPwIuxdp+us3Neotz23Fz3Fi2WYClRyH+MbbYm5UBl r0fZTkcneI0NZ6CXmKn1EI+UWFU1vJTyom8drSelG6g5J8IO8O/jL2oYDPcRWacJY8E1 qmrY0eT0Dx1lqUh2ieHMmhJHhYR/cF+fOIq/qeQ97Xfqxb1GZPwIkRJCRkw8+h3vUDd9 AfEcshNoCISVc8SaQAC1oduIv37XGZ/4r9s2RIdCqlTWpvgSczu+ayNRw4CLniwbxKD7 pipA== X-Gm-Message-State: AOPr4FVxE467WhyoTwsOUn9sPkyFPL0aQ+oBRE8zHRmjYnmpoLqJbxCs98O9OPCUHlPYdq3ax1ixRgz2XRecsg== MIME-Version: 1.0 X-Received: by 10.182.42.138 with SMTP id o10mr4374848obl.76.1461177714717; Wed, 20 Apr 2016 11:41:54 -0700 (PDT) Received: by 10.202.171.86 with HTTP; Wed, 20 Apr 2016 11:41:54 -0700 (PDT) Date: Wed, 20 Apr 2016 14:41:54 -0400 Message-ID: Subject: Performant queries on table with many boolean columns From: Rob Imig To: pgsql-performance@postgresql.org Content-Type: multipart/alternative; boundary=001a11c2ce98a3c03f0530eef206 X-Pg-Spam-Score: -2.4 (--) List-Archive: List-Help: List-ID: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: X-Mailing-List: pgsql-performance Precedence: bulk Sender: pgsql-performance-owner@postgresql.org --001a11c2ce98a3c03f0530eef206 Content-Type: text/plain; charset=UTF-8 Hey all, New to the lists so please let me know if this isn't the right place for this question. I am trying to understand how to structure a table to allow for optimal performance on retrieval. The data will not change frequently so you can basically think of it as static and only concerned about optimizing reads from basic SELECT...WHERE queries. The data: - ~20 million records - Each record has 1 id and ~100 boolean properties - Each boolean property has ~85% of the records as true The retrieval will always be something like "SELECT id FROM WHERE . will be some arbitrary set of the ~100 boolean columns and you want the ids that match all of the conditions (true for each boolean column). Example: WHERE prop1 AND prop18 AND prop24 The obvious thing seems to make a table with ~100 columns, with 1 column for each boolean property. Though, what type of indexing strategy would one use on that table? Doesn't make sense to do BTREE. Is there a better way to structure it? Any and all advice/tips/questions appreciated! Thanks, Rob --001a11c2ce98a3c03f0530eef206 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Hey all,

New to the lists so please let= me know if this isn't the right place for this question.
I am trying to understand how to structure a table to allow for= optimal performance on retrieval. The data will not change frequently so y= ou can basically think of it as static and only concerned about optimizing = reads from basic SELECT...WHERE queries.

The data:=
  • ~20 million records
  • Each record has 1 id and= ~100 boolean properties
  • Each boolean property has ~85% of the = records as true

The retrieval will a= lways be something like "SELECT id FROM <table> WHERE <condit= ions>.

<conditions> will be some arbitrar= y set of the ~100 boolean columns and you want the ids that match all of th= e conditions (true for each boolean column). Example:=C2=A0
WHERE= prop1 AND prop18 AND prop24


The ob= vious thing seems to make a table with ~100 columns, with 1 column for each= boolean property. Though, what type of indexing strategy would one use on = that table? Doesn't make sense to do BTREE. Is there a better way to st= ructure it?


Any and all advice/tips= /questions appreciated!

Thanks,
Rob

--001a11c2ce98a3c03f0530eef206--