Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtp (Exim 4.84_2) (envelope-from ) id 1atHcf-00016a-6F for pgsql-performance@arkaria.postgresql.org; Thu, 21 Apr 2016 16:38:33 +0000 Received: from localhost ([127.0.0.1] helo=postgresql.org) by malur.postgresql.org with smtp (Exim 4.84_2) (envelope-from ) id 1atHce-0002Zn-PX for pgsql-performance@arkaria.postgresql.org; Thu, 21 Apr 2016 16:38:32 +0000 Received: from makus.postgresql.org ([2001:4800:1501:1::229]) by malur.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1atHax-00062N-3u for pgsql-performance@postgresql.org; Thu, 21 Apr 2016 16:36:47 +0000 Received: from mail-lb0-x22f.google.com ([2a00:1450:4010:c04::22f]) by makus.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.84_2) (envelope-from ) id 1atHar-0006wN-08 for pgsql-performance@postgresql.org; Thu, 21 Apr 2016 16:36:45 +0000 Received: by mail-lb0-x22f.google.com with SMTP id u8so30181242lbk.0 for ; Thu, 21 Apr 2016 09:36:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc; bh=VE/7zZ2DZkRPi0qYpzm9ouDZcA0GYd67GrnYyenhCq4=; b=TUIAMWTCuJmHf6wdTG8mPoJE0tUvGh2tkoHbQ9jgCn3aLhXTfKt4KncYzOfwWvb3zr tgQ3UE6zxhkMAkMKLuOtvb1F5TCoC44CCW2+L7pKzmpuSr1+SEQbi31M68MsO3jGrt5z bwOyi4FNyYAZcHlKYglDG4YpLf8ncgINRPVwJkmctxwj50rsi0zEAX7gevykvmx4eDYk RImL+uaP8sVrEJQCAdD6iRUcAOzhZ5FSajfCRbSI1E5tFRPBk+SndPtQOqR6Ko0/KmjL XitehqScrpEH9iBZZUuKbTpG+/sHr6t68H5TSc8a7sF7bwocHhaYquiOsPdH5lyd4bW/ U9lg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc; bh=VE/7zZ2DZkRPi0qYpzm9ouDZcA0GYd67GrnYyenhCq4=; b=GnsPgoW7oIABC+PaEzu6OEh8+OkSw+9DYwj46FFoOnar0O+kRrtI4F4db4hFksfPV0 GFiI5MJJQ8uxYCLL0bDf1lq9GoPvkLARG7BWiXmsbqtU7xfm2fSH3TSjM4BKFDg2yP7+ jgHslVcDTsRtdtAMsZtNWubxEVriZtO/iI+qPdRzJxDlXrGdqLgwcRp6kFVS/wzeKvR3 xvnlQ2C/664Zw3VpNIz66TkyBkCZhFQcIB8LiFT5qVC5/lBWWa7FYzkd7j2wz7SbYLVX 9x+6Sz6Acli9re0ZFIkpEj3EEiviNn0M8pcx0HWCvYyMbjZrRVfUGIL+ll6oHhPA743G DZjQ== X-Gm-Message-State: AOPr4FXH4ALc/AwHCPiPx+cMvEKDNgXfN7WFk8ob/eaut0pO98VxZPehpInOSVwDnlM/Xnb+3F3nnJDKpnErqA== MIME-Version: 1.0 X-Received: by 10.112.173.71 with SMTP id bi7mr6672121lbc.61.1461256597736; Thu, 21 Apr 2016 09:36:37 -0700 (PDT) Received: by 10.25.19.28 with HTTP; Thu, 21 Apr 2016 09:36:37 -0700 (PDT) In-Reply-To: References: Date: Thu, 21 Apr 2016 09:36:37 -0700 Message-ID: Subject: Re: Performant queries on table with many boolean columns From: Jeff Janes To: Rob Imig Cc: "pgsql-performance@postgresql.org" Content-Type: text/plain; charset=UTF-8 X-Pg-Spam-Score: -2.7 (--) List-Archive: List-Help: List-ID: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: X-Mailing-List: pgsql-performance Precedence: bulk Sender: pgsql-performance-owner@postgresql.org On Wed, Apr 20, 2016 at 11:41 AM, Rob Imig wrote: > Hey all, > > New to the lists so please let me know if this isn't the right place for > this question. > > I am trying to understand how to structure a table to allow for optimal > performance on retrieval. The data will not change frequently so you can > basically think of it as static and only concerned about optimizing reads > from basic SELECT...WHERE queries. > > The data: > > ~20 million records > Each record has 1 id and ~100 boolean properties > Each boolean property has ~85% of the records as true > > > The retrieval will always be something like "SELECT id FROM WHERE > . > > will be some arbitrary set of the ~100 boolean columns and you > want the ids that match all of the conditions (true for each boolean > column). Example: > WHERE prop1 AND prop18 AND prop24 Is 3 a typical number of conditions to have? 85%^3 is 61.4%, so you are fetching most of the table. At that point, I think I would give up on indexes and just expect to do a full table scan each time. Which means a single column bit-string data type might be the way to go, although the construction of the queries would then be more cumbersome, especially if you will do by hand. I think the only way to know for sure is to write a few scripts to benchmark it. Cheers, Jeff -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance