Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtp (Exim 4.80) (envelope-from ) id 1aZOIo-0007LU-MA for pgsql-performance@arkaria.postgresql.org; Fri, 26 Feb 2016 19:43:51 +0000 Received: from localhost ([127.0.0.1] helo=postgresql.org) by malur.postgresql.org with smtp (Exim 4.84) (envelope-from ) id 1aZOIo-0000UH-5o for pgsql-performance@arkaria.postgresql.org; Fri, 26 Feb 2016 19:43:50 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA384:256) (Exim 4.84) (envelope-from ) id 1aZOIm-0000S9-Su for pgsql-performance@postgresql.org; Fri, 26 Feb 2016 19:43:49 +0000 Received: from mail-ig0-x234.google.com ([2607:f8b0:4001:c05::234]) by magus.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.84) (envelope-from ) id 1aZOIi-0003iJ-JD for pgsql-performance@postgresql.org; Fri, 26 Feb 2016 19:43:48 +0000 Received: by mail-ig0-x234.google.com with SMTP id g6so45838490igt.1 for ; Fri, 26 Feb 2016 11:43:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to; bh=xQK5FeiKMT1Pev/f/QWGlJ/txbu/tR8wxzZO8jSTC98=; b=xK3BpZ26EKvB6aMhmivD7iKkhsrAREBrJdZxQVvL35LWHpYrCCiE99PA19kz4ppYZh ZEZWAEW+FM5qYblTzJSsNDXm0kUzqBWggTghsWQ+7iEZdTiYuWMWs5ygLIlvoUx6YG/8 ZgzreeeJ1Y0QFgJ9TEX8zO8iZM47VuXLS13ZHaiC7P4ULgZOgIAjqmi2z25pnMwUPjS8 n0+ynlF6T1vUYqKvUrwCfw9spXaG1y/I+dVcy9UcfW32o2JRTnxtrNQFdbufb8CHvkve etw43VNU36FhHo8AlXI2vI4eIeX8VxBjTyrvmouz6bnfS1gp+ngOvgsEFYpEVFZJDTuc k2FA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:date:message-id:subject:from:to; bh=xQK5FeiKMT1Pev/f/QWGlJ/txbu/tR8wxzZO8jSTC98=; b=X+609vMOEv9DOkMqbC2qaRAYhG35lEhMzEyiRCWUlJFFwYOEq9YaDHaRyi+ZsevImC bw5opRPq4xdARxXg3OC/LtE9v+JYcPY90ojwlwlmdToRYa6kD7SU3+PgoBIZGTfuZFwi ySzALWKUPPC+7yS0AwjEM3luIeaPCnWJFHNxqAqWiOkb+KkaYoX39RWRLedvt8YKOufl L++SjTV0cNKNUxrk3yAcFS295V7bCkLL4MVwSgp0agEeADHvI37XLXlSgsC/fZQRmTOF WuRatWY7UQHOh7ZfcJe9YxtEgKm/NqYuaoY//cJau2vK2QlSnoRxMN8bTBAf884ivAuL l7/A== X-Gm-Message-State: AD7BkJKyKY8RG1gkkeRtDew/mxhPDi6zH8mkKERNmO7dqMBBa+gGkhgYW7R6yxHA6n3C1T35+LdLMpGuUm8oZQ== MIME-Version: 1.0 X-Received: by 10.50.8.42 with SMTP id o10mr2152852iga.59.1456515822931; Fri, 26 Feb 2016 11:43:42 -0800 (PST) Received: by 10.107.190.68 with HTTP; Fri, 26 Feb 2016 11:43:42 -0800 (PST) Date: Fri, 26 Feb 2016 13:43:42 -0600 Message-ID: Subject: Odd behavior with indices From: joe meiring To: pgsql-performance@postgresql.org Content-Type: multipart/alternative; boundary=089e016345303c615b052cb18461 X-Pg-Spam-Score: -2.7 (--) List-Archive: List-Help: List-ID: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: X-Mailing-List: pgsql-performance Precedence: bulk Sender: pgsql-performance-owner@postgresql.org --089e016345303c615b052cb18461 Content-Type: text/plain; charset=UTF-8 Also available on S.O.: http://stackoverflow.com/questions/35658238/postgres-odd-behavior-with-indices I've got a datavalue table with ~200M rows or so, with indices on both site_id and parameter_id. I need to execute queries like "return all sites with data" and "return all parameters with data". The site table has only 200 rows or so, and the parameter table has only 100 or so rows. The site query is fast and uses the index: EXPLAIN ANALYZEselect *from sitewhere exists ( select 1 from datavalue where datavalue.site_id = site.id limit 1); Seq Scan on site (cost=0.00..64.47 rows=64 width=113) (actual time=0.046..1.106 rows=89 loops=1) Filter: (SubPlan 1) Rows Removed by Filter: 39 SubPlan 1 -> Limit (cost=0.44..0.47 rows=1 width=0) (actual time=0.008..0.008 rows=1 loops=128) -> Index Only Scan using ix_datavalue_site_id on datavalue (cost=0.44..8142.71 rows=248930 width=0) (actual time=0.008..0.008 rows=1 loops=128) Index Cond: (site_id = site.id) Heap Fetches: 0 Planning time: 0.361 ms Execution time: 1.149 ms The same query for parameters is rather slow and does NOT use the index: EXPLAIN ANALYZEselect *from parameterwhere exists ( select 1 from datavalue where datavalue.parameter_id = parameter.id limit 1); Seq Scan on parameter (cost=0.00..20.50 rows=15 width=2648) (actual time=2895.972..21331.701 rows=15 loops=1) Filter: (SubPlan 1) Rows Removed by Filter: 6 SubPlan 1 -> Limit (cost=0.00..0.34 rows=1 width=0) (actual time=1015.790..1015.790 rows=1 loops=21) -> Seq Scan on datavalue (cost=0.00..502127.10 rows=1476987 width=0) (actual time=1015.786..1015.786 rows=1 loops=21) Filter: (parameter_id = parameter.id) Rows Removed by Filter: 7739355 Planning time: 0.123 ms Execution time: 21331.736 ms What the deuce is going on here? Alternatively, whats a good way to do this? Any help/guidance appreciated! Some of the table description: \d datavalue id BIGINT DEFAULT nextval('datavalue_id_seq'::regclass) NOT NULL, value DOUBLE PRECISION NOT NULL, site_id INTEGER NOT NULL, parameter_id INTEGER NOT NULL, deployment_id INTEGER, instrument_id INTEGER, invalid BOOLEAN, Indexes: "datavalue_pkey" PRIMARY KEY, btree (id) "datavalue_datetime_utc_site_id_parameter_id_instrument_id_key" UNIQUE CONSTRAINT, btree (datetime_utc, site_id, parameter_id, instrument_id) "ix_datavalue_instrument_id" btree (instrument_id) "ix_datavalue_parameter_id" btree (parameter_id) "ix_datavalue_site_id" btree (site_id) "tmp_idx" btree (site_id, datetime_utc) Foreign-key constraints: "datavalue_instrument_id_fkey" FOREIGN KEY (instrument_id) REFERENCES instrument(id) ON UPDATE CASCADE ON DELETE CASCADE "datavalue_parameter_id_fkey" FOREIGN KEY (parameter_id) REFERENCES parameter(id) ON UPDATE CASCADE ON DELETE CASCADE "datavalue_site_id_fkey" FOREIGN KEY (site_id) REFERENCES coastal.site(id) ON UPDATE CASCADE ON DELETE CASCADE "datavalue_statistic_type_id_fkey" --089e016345303c615b052cb18461 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Also available on S.O.:=C2=A0


I've got a=C2=A0datavalue=C2=A0table with ~20= 0M rows or so, with indices on both=C2=A0site_id=C2=A0= and=C2=A0parameter_id. I need to execute queries like = "return all sites with data" and "return all parameters with= data". The=C2=A0site=C2=A0table has only 200 row= s or so, and the=C2=A0parameter=C2=A0table has only 10= 0 or so rows.

The=C2=A0s= ite=C2=A0query is fast and uses the index:

EXPLAIN ANALYZE
select *
from site
where exists (
      select 1 from datavalue
      where datavalue.site_id =3D<=
span class=3D"" style=3D"margin:0px;padding:0px;border:0px;color:black"> si=
te.id limit 1
);

Seq Scan on site  (cost=3D0.00=
..=
64.47 rows=3D64 width=3D=
1=
13) (actual time=3D0.046..1.106=
 rows=3D89 loops=3D1)
  Filter: (SubPlan 1)
  Rows Removed by Filter: 39
  SubPlan 1
    ->  Limit  (cost=3D0.44..=
0=
.47 rows=3D1 width=3D0) (actual time=3D0.008..0.008 rows=3D1 loops=3D128)
          ->  Index Only Scan u=
sing ix_datavalue_site_id on datavalue  (c=
ost=3D0.44..8142.71 rows=3D248930 width=3D0) (actual time=
=
=3D0.008..0.008 rows=3D1 =
loops=3D128)
                Index Cond: (site_id=
 =3D site.id)
                Heap Fetches: 0
Planning time: 0.361 ms
Execution time: 1.149 ms

The same query for parameters is rather slow and = does NOT use the index:

EXPLAIN ANALYZE
select *
from parameter
where exists (
      select 1 from datavalue
      where datavalue.parameter_id =3D parameter.id limit 1
);

Seq Scan on parameter  (cost=3D0.00=
..20.50 rows=3D15 width=
=
=3D2648) (actual time=3D2895.972..21331.701 rows=3D15 loops=
=
=3D1)
  Filter: (SubPlan 1)
  Rows Removed by Filter: 6
  SubPlan 1
    ->  Limit  (cost=3D0.00..=
0=
.34 rows=3D1 width=3D0) (actual time=3D1015.790..1015.790=
 <=
/span>rows=3D1 loops=3D21)
          ->  Seq Scan on datavalue  <=
span class=3D"" style=3D"margin:0px;padding:0px;border:0px;color:black">(cost=3D0.00..502127.10 rows=
=3D1476987 width=3D0) (actual tim=
e=3D1015.786..1015.786 rows=
=
=3D1 loops=3D21)
                Filter: (parameter_id =3D par=
ameter.id)
                Rows Removed by Filter: 7739355
Planning time: 0.123 ms
Execution time: 21331.736 ms

What the deuce is going on here? Alternatively,= whats a good way to do this?

Any help= /guidance appreciated!



Some o= f the table description:

\d datavalue
id BIGINT DEFAULT nextval=
('datavalue_id_seq'::regclass) NOT NULL,
value DOUBLE PRECISION NOT NULL,
site_id INTEGER NOT NULL,
parameter_id INTEGER NOT NULL,
deployment_id INTEGER,
instrument_id INTEGER,
invalid BOOLEAN,
Indexes:
    "datavalue_pkey" PRIMARY <=
span class=3D"" style=3D"margin:0px;padding:0px;border:0px;color:rgb(0,0,13=
9)">KEY, btree (id)
    "datavalue_datetime_utc_site_id_parameter_id_instrument_id_k=
ey" UNIQUE CONSTRAINT, btr=
ee (datetime_utc, site_id, parameter_id=
, instrument_id)
    "ix_datavalue_instrument_id" btree (inst=
rument_id)
    "ix_datavalue_parameter_id" btree (param=
eter_id)
    "ix_datavalue_site_id" btree (site_id)
    "tmp_idx" btree (site_id, date=
time_utc)
Foreign-key constraints:
    "datavalue_instrument_id_fkey" FOREIG=
N KEY (instrument_id) REFERENCES instrument(id) ON UPDATE CASCADE ON DELETE CASCADE
    "datavalue_parameter_id_fkey" FOREIG=
N KEY (parameter_id) =
REFERENCES parameter(id) ON UPDATE CASCADE ON DELETE CASCADE
    "datavalue_site_id_fkey" FOREIGN KEY (site_id) REFERENCES=
 coastal.site(id) ON<=
span class=3D"" style=3D"margin:0px;padding:0px;border:0px;color:black"> UPDATE CASCADE ON <=
span class=3D"" style=3D"margin:0px;padding:0px;border:0px;color:rgb(0,0,13=
9)">DELETE CASCADE
    "datavalue_statistic_type_id_fkey"
<= pre class=3D"" style=3D"margin-top:0px;margin-bottom:1em;padding:5px;border= :0px;font-size:13px;overflow:auto;width:auto;max-height:600px;font-family:C= onsolas,Menlo,Monaco,'Lucida Console','Liberation Mono',= 9;DejaVu Sans Mono','Bitstream Vera Sans Mono','Courier New= ',monospace,sans-serif;color:rgb(57,51,24);word-wrap:normal;background-= color:rgb(238,238,238)">
--089e016345303c615b052cb18461--