PG Query Planner

public inbox for [email protected]  
help / color / mirror / Atom feed

PG Query Planner
5+ messages / 4 participants
[nested] [flat]

* PG Query Planner
@ 2022-04-19 17:36 Kenny Bachman <[email protected]>
  2022-04-19 17:44 ` Re: PG Query Planner Gaurav Anand <[email protected]>
  0 siblings, 1 reply; 5+ messages in thread

From: Kenny Bachman @ 2022-04-19 17:36 UTC (permalink / raw)
  To: [email protected]

Hello,

I wonder how the query planner works in postgresql. So, I have a query that
takes 6 seconds with an index scan. However, the same query takes 0.1ms
when I set disable index scan parameter.

How do I get the planner to make the right decision? Also, I ran ANALYZE
command many times.

And I have an index on job_next_process_time  column but did not use it by
the planner.

Limit  (cost=0.56..29.04 rows=1 width=695) (actual time=6386.751..6386.753
rows=0 loops=1)
   ->  Nested Loop  (cost=0.56..692987.46 rows=24331 width=695) (actual
time=6386.749..6386.751 rows=0 loops=1)
         Join Filter: (subs.offer_id = offer1.id)
         ->  Index Scan using subs_pkey on subscription subs
 (cost=0.56..572151.65 rows=8053633 width=695) (actual time=0.008..5554.872
rows=8021769 loops=1)
               Filter: ((NOT job_in_progress) AND (job_next_process_time <
'2022-04-19 09:25:25.535'::timestamp without time zone))
               Rows Removed by Filter: 72039
         ->  Materialize  (cost=0.00..31.31 rows=1 width=8) (actual
time=0.000..0.000 rows=0 loops=8021769)
               ->  Seq Scan on offer offer1  (cost=0.00..31.31 rows=1
width=8) (actual time=0.087..0.088 rows=0 loops=1)
                     Filter: (NOT is_external_lifecycle_management)
                     Rows Removed by Filter: 334
 Planning Time: 1.335 ms
 Execution Time: 6386.792 ms

*SET enable_indexscan = OFF;*

  Limit  (cost=84760.55..84760.55 rows=1 width=695) (actual
time=0.092..0.093 rows=0 loops=1)
   ->  Sort  (cost=84760.55..84822.63 rows=24832 width=695) (actual
time=0.092..0.092 rows=0 loops=1)
         Sort Key: subs.id
         Sort Method: quicksort  Memory: 25kB
         ->  Nested Loop  (cost=955.54..84636.39 rows=24832 width=695)
(actual time=0.090..0.090 rows=0 loops=1)
               ->  Seq Scan on offer offer1  (cost=0.00..31.31 rows=1
width=8) (actual time=0.089..0.089 rows=0 loops=1)
                     Filter: (NOT is_external_lifecycle_management)
                     Rows Removed by Filter: 334
               ->  Bitmap Heap Scan on subscription subs
 (cost=955.54..83681.53 rows=92355 width=695) (never executed)
                     Recheck Cond: (offer_id = offer1.id)
                     Filter: ((NOT job_in_progress) AND
(job_next_process_time < '2022-04-19 09:25:25.535'::timestamp without time
zone))
                     ->  Bitmap Index Scan on i_fk_subscription_offer
 (cost=0.00..932.45 rows=93029 width=0) (never executed)
                           Index Cond: (offer_id = offer1.id)
 Planning Time: 0.266 ms
 Execution Time: 0.126 ms


^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: PG Query Planner
  2022-04-19 17:36 PG Query Planner Kenny Bachman <[email protected]>
@ 2022-04-19 17:44 ` Gaurav Anand <[email protected]>
  2022-04-19 17:56   ` Re: PG Query Planner Kenny Bachman <[email protected]>
  0 siblings, 1 reply; 5+ messages in thread

From: Gaurav Anand @ 2022-04-19 17:44 UTC (permalink / raw)
  To: Kenny Bachman <[email protected]>; +Cc: [email protected]

Looks like your Index has gone wrong, instead of 92355 records it is
scanning     8021769 rows which is why it is taking 6s.

Share the sql too

On Tue, 19 Apr 2022 at 11:07 PM, Kenny Bachman <[email protected]>
wrote:

> Hello,
>
> I wonder how the query planner works in postgresql. So, I have a query
> that takes 6 seconds with an index scan. However, the same query takes
> 0.1ms when I set disable index scan parameter.
>
> How do I get the planner to make the right decision? Also, I ran ANALYZE
> command many times.
>
> And I have an index on job_next_process_time  column but did not use it by
> the planner.
>
> Limit  (cost=0.56..29.04 rows=1 width=695) (actual time=6386.751..6386.753
> rows=0 loops=1)
>    ->  Nested Loop  (cost=0.56..692987.46 rows=24331 width=695) (actual
> time=6386.749..6386.751 rows=0 loops=1)
>          Join Filter: (subs.offer_id = offer1.id)
>          ->  Index Scan using subs_pkey on subscription subs
>  (cost=0.56..572151.65 rows=8053633 width=695) (actual time=0.008..5554.872
> rows=8021769 loops=1)
>                Filter: ((NOT job_in_progress) AND (job_next_process_time <
> '2022-04-19 09:25:25.535'::timestamp without time zone))
>                Rows Removed by Filter: 72039
>          ->  Materialize  (cost=0.00..31.31 rows=1 width=8) (actual
> time=0.000..0.000 rows=0 loops=8021769)
>                ->  Seq Scan on offer offer1  (cost=0.00..31.31 rows=1
> width=8) (actual time=0.087..0.088 rows=0 loops=1)
>                      Filter: (NOT is_external_lifecycle_management)
>                      Rows Removed by Filter: 334
>  Planning Time: 1.335 ms
>  Execution Time: 6386.792 ms
>
> *SET enable_indexscan = OFF;*
>
>   Limit  (cost=84760.55..84760.55 rows=1 width=695) (actual
> time=0.092..0.093 rows=0 loops=1)
>    ->  Sort  (cost=84760.55..84822.63 rows=24832 width=695) (actual
> time=0.092..0.092 rows=0 loops=1)
>          Sort Key: subs.id
>          Sort Method: quicksort  Memory: 25kB
>          ->  Nested Loop  (cost=955.54..84636.39 rows=24832 width=695)
> (actual time=0.090..0.090 rows=0 loops=1)
>                ->  Seq Scan on offer offer1  (cost=0.00..31.31 rows=1
> width=8) (actual time=0.089..0.089 rows=0 loops=1)
>                      Filter: (NOT is_external_lifecycle_management)
>                      Rows Removed by Filter: 334
>                ->  Bitmap Heap Scan on subscription subs
>  (cost=955.54..83681.53 rows=92355 width=695) (never executed)
>                      Recheck Cond: (offer_id = offer1.id)
>                      Filter: ((NOT job_in_progress) AND
> (job_next_process_time < '2022-04-19 09:25:25.535'::timestamp without time
> zone))
>                      ->  Bitmap Index Scan on i_fk_subscription_offer
>  (cost=0.00..932.45 rows=93029 width=0) (never executed)
>                            Index Cond: (offer_id = offer1.id)
>  Planning Time: 0.266 ms
>  Execution Time: 0.126 ms
>
-- 

Thanks.

Regards,
Gaurav Anand

[image: logo] <https://bit.ly/3B9rOrW;

-- 


This communication is confidential and
subject to and governed by Saama’s 
Electronic
Communications Disclaimer. 
<https://www.saama.com/email-communication-disclaimer/;



 


^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: PG Query Planner
  2022-04-19 17:36 PG Query Planner Kenny Bachman <[email protected]>
  2022-04-19 17:44 ` Re: PG Query Planner Gaurav Anand <[email protected]>
@ 2022-04-19 17:56   ` Kenny Bachman <[email protected]>
  2022-04-19 22:23     ` Re: PG Query Planner David G. Johnston <[email protected]>
  2022-04-20 04:03     ` Re: PG Query Planner Jeff Janes <[email protected]>
  0 siblings, 2 replies; 5+ messages in thread

From: Kenny Bachman @ 2022-04-19 17:56 UTC (permalink / raw)
  To: Gaurav Anand <[email protected]>; +Cc: [email protected]

Hello,

My query is :

SELECT subs.id AS id1_109,
scriber_id AS subs_109
FROM subscription subs
LEFT OUTER JOIN offer offer1
ON subs.offer_id = offer1.id
WHERE offer1.is_external_lifecycle_management = FALSE
AND subs.job_next_process_time < '2022-04-19 09:25:25.535'
AND subs.job_in_progress = FALSE
ORDER BY subs.id ASC LIMIT 1 ;


Gaurav Anand <[email protected]>, 19 Nis 2022 Sal, 20:45 tarihinde
şunu yazdı:

> Looks like your Index has gone wrong, instead of 92355 records it is
> scanning     8021769 rows which is why it is taking 6s.
>
> Share the sql too
>
> On Tue, 19 Apr 2022 at 11:07 PM, Kenny Bachman <[email protected]>
> wrote:
>
>> Hello,
>>
>> I wonder how the query planner works in postgresql. So, I have a query
>> that takes 6 seconds with an index scan. However, the same query takes
>> 0.1ms when I set disable index scan parameter.
>>
>> How do I get the planner to make the right decision? Also, I ran ANALYZE
>> command many times.
>>
>> And I have an index on job_next_process_time  column but did not use it
>> by the planner.
>>
>> Limit  (cost=0.56..29.04 rows=1 width=695) (actual
>> time=6386.751..6386.753 rows=0 loops=1)
>>    ->  Nested Loop  (cost=0.56..692987.46 rows=24331 width=695) (actual
>> time=6386.749..6386.751 rows=0 loops=1)
>>          Join Filter: (subs.offer_id = offer1.id)
>>          ->  Index Scan using subs_pkey on subscription subs
>>  (cost=0.56..572151.65 rows=8053633 width=695) (actual time=0.008..5554.872
>> rows=8021769 loops=1)
>>                Filter: ((NOT job_in_progress) AND (job_next_process_time
>> < '2022-04-19 09:25:25.535'::timestamp without time zone))
>>                Rows Removed by Filter: 72039
>>          ->  Materialize  (cost=0.00..31.31 rows=1 width=8) (actual
>> time=0.000..0.000 rows=0 loops=8021769)
>>                ->  Seq Scan on offer offer1  (cost=0.00..31.31 rows=1
>> width=8) (actual time=0.087..0.088 rows=0 loops=1)
>>                      Filter: (NOT is_external_lifecycle_management)
>>                      Rows Removed by Filter: 334
>>  Planning Time: 1.335 ms
>>  Execution Time: 6386.792 ms
>>
>> *SET enable_indexscan = OFF;*
>>
>>   Limit  (cost=84760.55..84760.55 rows=1 width=695) (actual
>> time=0.092..0.093 rows=0 loops=1)
>>    ->  Sort  (cost=84760.55..84822.63 rows=24832 width=695) (actual
>> time=0.092..0.092 rows=0 loops=1)
>>          Sort Key: subs.id
>>          Sort Method: quicksort  Memory: 25kB
>>          ->  Nested Loop  (cost=955.54..84636.39 rows=24832 width=695)
>> (actual time=0.090..0.090 rows=0 loops=1)
>>                ->  Seq Scan on offer offer1  (cost=0.00..31.31 rows=1
>> width=8) (actual time=0.089..0.089 rows=0 loops=1)
>>                      Filter: (NOT is_external_lifecycle_management)
>>                      Rows Removed by Filter: 334
>>                ->  Bitmap Heap Scan on subscription subs
>>  (cost=955.54..83681.53 rows=92355 width=695) (never executed)
>>                      Recheck Cond: (offer_id = offer1.id)
>>                      Filter: ((NOT job_in_progress) AND
>> (job_next_process_time < '2022-04-19 09:25:25.535'::timestamp without time
>> zone))
>>                      ->  Bitmap Index Scan on i_fk_subscription_offer
>>  (cost=0.00..932.45 rows=93029 width=0) (never executed)
>>                            Index Cond: (offer_id = offer1.id)
>>  Planning Time: 0.266 ms
>>  Execution Time: 0.126 ms
>>
> --
>
> Thanks.
>
> Regards,
> Gaurav Anand
>
> [image: logo] <https://bit.ly/3B9rOrW;
>
> This communication is confidential and subject to and governed by Saama’s Electronic
> Communications Disclaimer.
> <https://www.saama.com/email-communication-disclaimer/;
>
>
>


^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: PG Query Planner
  2022-04-19 17:36 PG Query Planner Kenny Bachman <[email protected]>
  2022-04-19 17:44 ` Re: PG Query Planner Gaurav Anand <[email protected]>
  2022-04-19 17:56   ` Re: PG Query Planner Kenny Bachman <[email protected]>
@ 2022-04-19 22:23     ` David G. Johnston <[email protected]>
  1 sibling, 0 replies; 5+ messages in thread

From: David G. Johnston @ 2022-04-19 22:23 UTC (permalink / raw)
  To: Kenny Bachman <[email protected]>; +Cc: Gaurav Anand <[email protected]>; Pgsql-admin <[email protected]>

On Tue, Apr 19, 2022 at 10:57 AM Kenny Bachman <[email protected]>
wrote:

> SELECT subs.id AS id1_109,
> scriber_id AS subs_109
> FROM subscription subs
> LEFT OUTER JOIN offer offer1
> ON subs.offer_id = offer1.id
> WHERE offer1.is_external_lifecycle_management = FALSE
>

You've made the left outer join specification pointless here.

What version?

The fundamental issue is the first query is optimistic - it expects it will
find at least one result, potentially fairly quickly, if it just walks the
possible subscriptions in order.  But it doesn't ever find a result after
scanning the whole thing.

The second query figures it can prove that there are no possible results
because the offer1 restriction makes the final result an empty set.

I'm not sure how to rework the query here - though I'd at least clean up
the lie that you are doing an outer join even if that doesn't impact the
query plan (though it may very well do that).

The statistics seem OK, and if you don't actually make changes to the
data, repeatedly running analyze is no better than running it once.

You may find some success using a materialized CTE explicitly to scan offer
and then semi-join that result to subscriptions; the empty materialized CTE
needed for the semi-join would then enable the short-circuit that is seen
in the second query.

The convention on these lists is to inline or bottom post, trimming context.

David J.

^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: PG Query Planner
  2022-04-19 17:36 PG Query Planner Kenny Bachman <[email protected]>
  2022-04-19 17:44 ` Re: PG Query Planner Gaurav Anand <[email protected]>
  2022-04-19 17:56   ` Re: PG Query Planner Kenny Bachman <[email protected]>
@ 2022-04-20 04:03     ` Jeff Janes <[email protected]>
  1 sibling, 0 replies; 5+ messages in thread

From: Jeff Janes @ 2022-04-20 04:03 UTC (permalink / raw)
  To: Kenny Bachman <[email protected]>; +Cc: Gaurav Anand <[email protected]>; Pgsql-admin <[email protected]>

On Tue, Apr 19, 2022 at 1:57 PM Kenny Bachman <[email protected]>
wrote:

> Hello,
>
> My query is :
>
> SELECT subs.id AS id1_109,
> scriber_id AS subs_109
> FROM subscription subs
> LEFT OUTER JOIN offer offer1
> ON subs.offer_id = offer1.id
> WHERE offer1.is_external_lifecycle_management = FALSE
> AND subs.job_next_process_time < '2022-04-19 09:25:25.535'
> AND subs.job_in_progress = FALSE
> ORDER BY subs.id ASC LIMIT 1 ;
>

An easy (if you can control the queries) way to force the faster plan is to
prohibit it from using using the index to fulfill the ORDER BY, by changing
it to "ORDER BY subs.id+0 ASC LIMIT 1"

As for getting the planner to get a better plan on its own, I don't think
there is much hope.  The difference between 0 rows with
is_external_lifecycle_management=FALSE, and 1 row meeting that, is the
smallest possible difference. Yet still the ratio between them is
infinite.  Changing a histogram bin count or adding a multivariate
statistic is not going to change that.

Maybe the executor should be smart enough to cut off the nested loop once
it sees the Materialize will never return a row.  But that is not a change
you can make in user-land.

Cheers,

Jeff

>

^ permalink  raw  reply  [nested|flat] 5+ messages in thread

end of thread, other threads:[~2022-04-20 04:03 UTC | newest]

Thread overview: 5+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2022-04-19 17:36 PG Query Planner Kenny Bachman <[email protected]>
2022-04-19 17:44 ` Gaurav Anand <[email protected]>
2022-04-19 17:56   ` Kenny Bachman <[email protected]>
2022-04-19 22:23     ` David G. Johnston <[email protected]>
2022-04-20 04:03     ` Jeff Janes <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox