From: Mike Broers <[email protected]>
To: David Rowley <[email protected]>
Cc: Tom Lane <[email protected]>
Cc: postgres performance list <[email protected]>
Subject: Re: query of partitioned object doesnt use index in qa
Date: Mon, 25 Sep 2017 11:21:41 -0500
Message-ID: <CAB9893gy7GC3S-_4raZ6b=Mpxn+j37X8j9hacwiicENsEOTojw@mail.gmail.com> (raw)
In-Reply-To: <CAKJS1f_ePEgnk_VwbB110c0xFOUVRLXOZvoJNhEzDjw8Jr9uiA@mail.gmail.com>
References: <CAB9893izHQaPTk1bGEDs8UTQUTKtpj1sk6PLyWrvXU-j0JBFaQ@mail.gmail.com>
	<CAKJS1f-AkrKfLEsrb7ymZve_b3e9cTKUcEdeeeJkVWnOTVdPnA@mail.gmail.com>
	<CAB9893hmTC-TMeFN8S91NWS_++3w2t5D0X7O-ogsZZ8zEyxv6w@mail.gmail.com>
	<[email protected]>
	<CAB9893g-1fpvh=0snbe7qFJKfXEsn2YxR3ZWZ6-JxrMCyaZg3Q@mail.gmail.com>
	<CAB9893iWzGVDh1GRKPdDwUx=fbe4uKCx2GxQOy3jDQ9Xe+uD8Q@mail.gmail.com>
	<CAKJS1f_ePEgnk_VwbB110c0xFOUVRLXOZvoJNhEzDjw8Jr9uiA@mail.gmail.com>
List-Unsubscribe:  <mailto:[email protected]?body=unsub%20pgsql-performance>

Very helpful, thank you for the additional insight. I'd never looked at
pg_stats, and it does reveal a difference in the distribution of
validation_status_code between qa and production:

prod:
│ most_common_vals       │ {P,F}                  │
│ most_common_freqs      │ {0.925967,0.000933333} │
│ histogram_bounds       │ ❏                      │
│ correlation            │ 0.995533               │

qa:
│ most_common_vals  │ {P}        │
│ most_common_freqs │ {0.861633} │
│ histogram_bounds  │ ❏          │
│ correlation       │ 0.999961   │

So the way I am reading this, given these statistics there is likely no
sensible way to keep postgres from concluding that it will have to scan the
whole table.  I can force the index by setting session parameters for this
particular query, but I probably shouldn't be reaching for system settings
to brute-force random fetches.
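
For reference, a minimal sketch of what scoping such an override to a single transaction (rather than the whole session) might look like. The partition name is the placeholder from the quoted message below, the query shape is a hypothetical reconstruction from the two predicates discussed in this thread, and the cost value is illustrative only, not a tuned recommendation:

```sql
BEGIN;

-- SET LOCAL lasts only until COMMIT/ROLLBACK, so the rest of the
-- session keeps the server default.
SET LOCAL random_page_cost = 1.1;  -- illustrative value, an assumption

-- Hypothetical query built from the predicates discussed in this thread;
-- EXPLAIN shows the chosen plan without executing the query.
EXPLAIN
SELECT count(*)
FROM event__99999999 e
WHERE e.validation_status_code = 'P'
  AND e.body->>'SID' IS NOT NULL;

COMMIT;
```

Comparing the EXPLAIN output with and without the SET LOCAL line would show whether the lower page cost alone is enough to change the plan.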

thanks again for the assistance!



On Wed, Sep 20, 2017 at 6:05 PM, David Rowley <[email protected]> wrote:

> On 21 September 2017 at 04:15, Mike Broers <[email protected]> wrote:
> > Ultimately I think this is just highlighting the need in my environment
> > to set random_page_cost lower (we are on an SSD SAN anyway..), but I
> > don't think I have a satisfactory reason why the row estimates are so
> > bad in the QA planner and why it doesn't use that partition index there.
>
> Without the index there are no stats to allow the planner to perform a
> good estimate on "e.body->>'SID' is not null", so it applies a default
> of 99.5%. So, as a simple example, suppose you have a partition with 1
> million rows. If you apply 99.5% to that you get 995000 rows. Now if
> you add the selectivity for "e.validation_status_code = 'P' ", let's
> say that's 50%, the row estimate for the entire WHERE clause would be
> 497500 (1000000 * 0.995 * 0.5). Since the 99.5% is applied in both
> cases, then the only variable part is validation_status_code. Perhaps
> validation_status_code  = 'P' is much more common in QA than in
> production.
>
> You can look at the stats as gathered by ANALYZE with:
>
> \x on
> select * from pg_stats where tablename = 'event__99999999' and attname
> = 'validation_status_code';
> \x off
>
> --
>  David Rowley                   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>

