Received: from localhost (maia-3.hub.org [200.46.204.184]) by postgresql.org (Postfix) with ESMTP id 787779FB936 for ; Thu, 24 May 2007 20:16:05 -0300 (ADT) Received: from postgresql.org ([200.46.204.71]) by localhost (mx1.hub.org [200.46.204.184]) (amavisd-maia, port 10024) with ESMTP id 21121-05 for ; Thu, 24 May 2007 20:15:52 -0300 (ADT) X-Greylist: from auto-whitelisted by SQLgrey-1.7.5 Received: from mis010.exch010.intermedia.net (mis010.exch010.intermedia.net [64.78.61.97]) by postgresql.org (Postfix) with ESMTP id B3D3E9FB939 for ; Thu, 24 May 2007 20:15:56 -0300 (ADT) Received: from ehost010-33.exch010.intermedia.net ([64.78.20.173]) by mis010.exch010.intermedia.net with Microsoft SMTPSVC(6.0.3790.1830); Thu, 24 May 2007 16:11:34 -0700 X-MimeOLE: Produced By Microsoft Exchange V6.5 Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Subject: index vs. seq scan choice? Date: Thu, 24 May 2007 16:15:54 -0700 Message-ID: <8C5B026B51B6854CBE88121DBF097A86C3A30D@ehost010-33.exch010.intermedia.net> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: index vs. seq scan choice? Thread-Index: AceeWXpA1aWjKfrUSniY2q7niQL4mA== From: "George Pavlov" To: X-OriginalArrivalTime: 24 May 2007 23:11:34.0058 (UTC) FILETIME=[DF1870A0:01C79E58] X-Virus-Scanned: Maia Mailguard 1.0.1 X-Archive-Number: 200705/1223 X-Sequence-Number: 114417 I am trying to figure out how the distribution of data affects index usage by the query because I am seeing some behavior that does not seem optimal to my uneducated eye.=20 I am on PG 8.1.8. I have two tables foo and foo_detail, both have been vacuum analyzed recently. Both have a property_id column, both have an index on it. The foo table has a state_code, also indexed, and the relative share of rows for the two state_codes used in the example below is: PA 2842 2.80% MN 2858 2.81% The distribution of distinct property_ids is fairly similar: PA 719 2.90% MN 765 3.09% A simple query filtered by PA vs. MN produces different results (see below). The PA query does a Seq Scan, the MN query uses the index and is >20 times faster. Both return about the same number of rows. I tried it with all state_codes that have rows in foo and it seems that the cutoff is somewhere around 3%, but there isn't a direct correlation (there are state_codes that are < 3% that trigger a Seq Scan and there are ones above 3% that result in an Index scan). I am curious what could make the PA query to ignore the index. What are the specific stats that are being used to make this decision? Would it perform better if it were to use the index? Anything I can do to "nudge" it towards using the index, which seems like a rather beneficial thing? The actual queries: explain analyze select f.property_id from foo f inner join foo_detail fd using (property_id) where f.state_code =3D 'PA' Merge Join (cost=3D17842.71..18436.30 rows=3D3347 width=3D4) (actual time=3D594.538..972.032 rows=3D2842 loops=3D1) Merge Cond: ("outer".property_id =3D "inner".property_id) -> Sort (cost=3D4381.72..4390.09 rows=3D3347 width=3D4) (actual time=3D14.092..18.497 rows=3D2842 loops=3D1) Sort Key: f.property_id -> Bitmap Heap Scan on foo f (cost=3D22.71..4185.78 = rows=3D3347 width=3D4) (actual time=3D0.826..7.008 rows=3D2842 loops=3D1) Recheck Cond: (state_code =3D 'PA'::bpchar) -> Bitmap Index Scan on mv_search_state (cost=3D0.00..22.71 rows=3D3347 width=3D0) (actual time=3D0.734..0.734 = rows=3D2842 loops=3D1) Index Cond: (state_code =3D 'PA'::bpchar) -> Sort (cost=3D13460.99..13732.84 rows=3D108742 width=3D4) (actual time=3D580.312..754.012 rows=3D110731 loops=3D1) Sort Key: fd.property_id -> Seq Scan on foo_detail fd (cost=3D0.00..4364.42 = rows=3D108742 width=3D4) (actual time=3D0.006..210.846 rows=3D108742 loops=3D1) Total runtime: 991.852 ms explain analyze select f.property_id from foo f inner join foo_detail fd using (property_id) where f.state_code =3D 'MN' Nested Loop (cost=3D7.62..8545.85 rows=3D1036 width=3D4) (actual time=3D0.877..44.196 rows=3D2858 loops=3D1) -> Bitmap Heap Scan on foo f (cost=3D7.62..2404.44 rows=3D1036 = width=3D4) (actual time=3D0.852..6.579 rows=3D2858 loops=3D1) Recheck Cond: (state_code =3D 'MN'::bpchar) -> Bitmap Index Scan on mv_search_state (cost=3D0.00..7.62 rows=3D1036 width=3D0) (actual time=3D0.744..0.744 rows=3D2858 = loops=3D1) Index Cond: (state_code =3D 'MN'::bpchar) -> Index Scan using ix_fd on foo_detail fd (cost=3D0.00..5.92 = rows=3D1 width=3D4) (actual time=3D0.005..0.007 rows=3D1 loops=3D2858) Index Cond: ("outer".property_id =3D fd.property_id) Total runtime: 48.439 ms