Cheaper subquery scan not considered unless offset 0

public inbox for [email protected]  
help / color / mirror / Atom feed

Cheaper subquery scan not considered unless offset 0
5+ messages / 3 participants
[nested] [flat]

* Cheaper subquery scan not considered unless offset 0
@ 2017-10-29 11:24 Benjamin Coutu <[email protected]>
  2017-10-29 11:46 ` Re: Cheaper subquery scan not considered unless offset 0 David Rowley <[email protected]>
  2017-10-29 14:58 ` Re: Cheaper subquery scan not considered unless offset 0 Tom Lane <[email protected]>
  0 siblings, 2 replies; 5+ messages in thread

From: Benjamin Coutu @ 2017-10-29 11:24 UTC (permalink / raw)
  To: pgsql-performance; +Cc: Tom Lane <[email protected]>

Hello everyone,

Please consider the following three semantically equivalent, but differently written queries:

Query A:

SELECT * FROM items a INNER JOIN (
  SELECT item, sum(amount) stock FROM stocktransactions GROUP BY item HAVING sum(amount) >= 1
) b ON b.item = a. "ID"

Query B:

SELECT * FROM items a INNER JOIN (
  SELECT item, sum(amount) stock FROM stocktransactions GROUP BY item
) b ON b.item = a. "ID" WHERE b.stock >= 1

Query C:

SELECT * FROM items a INNER JOIN (
  SELECT item, sum(amount) stock FROM stocktransactions b GROUP BY item OFFSET 0
) b ON b.item = a. "ID" WHERE b.stock >= 1

FYI: stocktransactions.item and stocktransactions.amount have not null constraints and stocktransactions.item is a foreign key referencing items.ID, the primary key of items.

Queries A + B generate the same plan and execute as follows:

Merge Join  (cost=34935.30..51701.59 rows=22285 width=344) (actual time=463.824..659.553 rows=15521 loops=1)
  Merge Cond: (a."ID" = b.item)
  ->  Index Scan using "PK_items_ID" on items a  (cost=0.42..15592.23 rows=336083 width=332) (actual time=0.012..153.899 rows=336064 loops=1)
  ->  Sort  (cost=34934.87..34990.59 rows=22285 width=12) (actual time=463.677..466.146 rows=15521 loops=1)
        Sort Key: b.item
        Sort Method: quicksort  Memory: 1112kB
        ->  Finalize HashAggregate  (cost=32879.78..33102.62 rows=22285 width=12) (actual time=450.724..458.667 rows=15521 loops=1)
              Group Key: b.item
              Filter: (sum(b.amount) >= '1'::double precision)
              Rows Removed by Filter: 48277
              ->  Gather  (cost=27865.65..32545.50 rows=44570 width=12) (actual time=343.715..407.243 rows=162152 loops=1)
                    Workers Planned: 2
                    Workers Launched: 2
                    ->  Partial HashAggregate  (cost=26865.65..27088.50 rows=22285 width=12) (actual time=336.416..348.105 rows=54051 loops=3)
                          Group Key: b.item
                          ->  Parallel Seq Scan on stocktransactions b  (cost=0.00..23281.60 rows=716810 width=12) (actual time=0.015..170.646 rows=579563 loops=3)
Planning time: 0.277 ms
Execution time: 661.342 ms

Plan C though, thanks to the "offset optimization fence", executes the following, more efficient plan:

Nested Loop  (cost=32768.77..41146.56 rows=7428 width=344) (actual time=456.611..525.395 rows=15521 loops=1 total=525.395)
  ->  Subquery Scan on c  (cost=32768.35..33269.76 rows=7428 width=12) (actual time=456.591..475.204 rows=15521 loops=1 total=475.204)
        Filter: (c.stock >= '1'::double precision)
        Rows Removed by Filter: 48277
        ->  Finalize HashAggregate  (cost=32768.35..32991.20 rows=22285 width=12) (actual time=456.582..468.124 rows=63798 loops=1 total=468.124)
              Group Key: b.item
              ->  Gather  (cost=27865.65..32545.50 rows=44570 width=12) (actual time=348.479..415.463 rows=162085 loops=1 total=415.463)
                    Workers Planned: 2
                    Workers Launched: 2
                    ->  Partial HashAggregate  (cost=26865.65..27088.50 rows=22285 width=12) (actual time=343.952..355.912 rows=54028 loops=3 total=1067.736)
                          Group Key: b.item
                          ->  Parallel Seq Scan on stocktransactions b  (cost=0.00..23281.60 rows=716810 width=12) (actual time=0.015..172.235 rows=579563 loops=3 total=516.705)
  ->  Index Scan using "PK_items_ID" on items a  (cost=0.42..1.05 rows=1 width=332) (actual time=0.003..0.003 rows=1 loops=15521 total=46.563)
        Index Cond: ("ID" = c.item)
Planning time: 0.223 ms
Execution time: 526.203 ms

I'm wondering, given that Query C's plan has lower overall costs than Query A/B's, why wouldn't the planner choose to execute that plan for queries A+B as well?
It has lower projected startup cost as well as lower total cost so apparently the optimzer does not consider such a plan with a subquery scan at all (otherwise it would choose it based on the lower cost estimates, right?) unless one forces it to via OFFSET 0.

Though I wouldn't necessarily consider this a bug, it is an issue that one has to explicitly work around with inadvisable optimization fences and it would be great if this could be fixed.

Thanks to the developer community for delivering this great product, I hope this helps in enhancing it.

Cheers,

Benjamin

-- 

Bejamin Coutu
[email protected]

ZeyOS, Inc.
http://www.zeyos.com

-- 
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Cheaper subquery scan not considered unless offset 0
  2017-10-29 11:24 Cheaper subquery scan not considered unless offset 0 Benjamin Coutu <[email protected]>
@ 2017-10-29 11:46 ` David Rowley <[email protected]>
  1 sibling, 0 replies; 5+ messages in thread

From: David Rowley @ 2017-10-29 11:46 UTC (permalink / raw)
  To: Benjamin Coutu <[email protected]>; +Cc: pgsql-performance; Tom Lane <[email protected]>

On 30 October 2017 at 00:24, Benjamin Coutu <[email protected]> wrote:
>   ->  Index Scan using "PK_items_ID" on items a  (cost=0.42..1.05 rows=1 width=332) (actual time=0.003..0.003 rows=1 loops=15521 total=46.563)

I've never seen EXPLAIN output like that before.

Is this some modified version of PostgreSQL?

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Cheaper subquery scan not considered unless offset 0
  2017-10-29 11:24 Cheaper subquery scan not considered unless offset 0 Benjamin Coutu <[email protected]>
@ 2017-10-29 14:58 ` Tom Lane <[email protected]>
  1 sibling, 0 replies; 5+ messages in thread

From: Tom Lane @ 2017-10-29 14:58 UTC (permalink / raw)
  To: Benjamin Coutu <[email protected]>; +Cc: pgsql-performance

Benjamin Coutu <[email protected]> writes:
> Please consider the following three semantically equivalent, but differently written queries:
> ...
> Queries A + B generate the same plan and execute as follows:

>         ->  Finalize HashAggregate  (cost=32879.78..33102.62 rows=22285 width=12) (actual time=450.724..458.667 rows=15521 loops=1)
>               Group Key: b.item
>               Filter: (sum(b.amount) >= '1'::double precision)
>               Rows Removed by Filter: 48277

> Plan C though, thanks to the "offset optimization fence", executes the following, more efficient plan:

>   ->  Subquery Scan on c  (cost=32768.35..33269.76 rows=7428 width=12) (actual time=456.591..475.204 rows=15521 loops=1 total=475.204)
>         Filter: (c.stock >= '1'::double precision)
>         Rows Removed by Filter: 48277
>         ->  Finalize HashAggregate  (cost=32768.35..32991.20 rows=22285 width=12) (actual time=456.582..468.124 rows=63798 loops=1 total=468.124)
>               Group Key: b.item

Huh.  So we can see that the grouping step produces 63798 rows in reality,
of which 15521 pass the >= filter condition.  In Plan C, the planner
estimates the total number of group rows at 22285; then, having no
information about the statistics of c.stock, it uses DEFAULT_INEQ_SEL
(0.333) as the filter selectivity estimate, arriving at 7428 as the
estimated number of result rows for the subquery.

In Plan A+B, the planner presumably estimated the number of group rows at
22285 as well, but then it comes up with 22285 as the overall result.
Uh, what about the HAVING?

Evidently, the difference between 7428 and 22285 estimated rows out of
the subquery is enough to prompt a change in join plan for this query.
Since the true number is in between, it's just luck that Plan C is faster.
I don't put any great amount of stock in one join plan or the other
having been chosen for this case based on those estimates.

But ... what about the HAVING?  I took a quick look around and couldn't
find anyplace where the selectivity of an aggregate's filter condition
gets accounted for, which explains this observed behavior.  That seems
like a big oversight :-(

Now, it's true that we're basically never gonna be able to do better than
default selectivity estimates for post-aggregation filter conditions.
Maybe, at some point in the dim past, somebody intentionally decided that
applying the standard selectivity estimation logic to HAVING clauses was a
loser.  But I don't see any comments to that effect, and anyway taking the
selectivity as 1.0 all the time doesn't seem very bright either.

Changing this in back branches might be too much of a behavioral change,
but it seems like we oughta change HEAD to apply standard selectivity
estimation to the HAVING clause.

			regards, tom lane

-- 
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Cheaper subquery scan not considered unless offset 0
@ 2017-10-29 13:17 Benjamin Coutu <[email protected]>
  0 siblings, 0 replies; 5+ messages in thread

From: Benjamin Coutu @ 2017-10-29 13:17 UTC (permalink / raw)
  To: David Rowley <[email protected]>; +Cc: pgsql-performance; Tom Lane <[email protected]>

It's not a modified postgres version. It's simply for my convenience that my tooling calculats "total" as "actual time" multiplied by "loops". Looks like I didn't properly strip that away when copy-pasting.

Here are the queries and original plans again, sorry for the confusion.

Query A:

SELECT * FROM items a INNER JOIN (
  SELECT item, sum(amount) stock FROM stocktransactions b GROUP BY item HAVING sum(amount) >= 1
) c ON c.item = a."ID"

Query B:

SELECT * FROM items a INNER JOIN (
  SELECT item, sum(amount) stock FROM stocktransactions b GROUP BY item
) c ON c.item = a."ID" WHERE c.stock >= 1

Query C:

SELECT * FROM items a INNER JOIN (
  SELECT item, sum(amount) stock FROM stocktransactions b GROUP BY item OFFSET 0
) c ON c.item = a."ID" WHERE c.stock >= 1

Queries A + B generate the same plan and execute as follows:

Merge Join  (cost=34935.30..51701.59 rows=22285 width=344) (actual time=463.824..659.553 rows=15521 loops=1)
  Merge Cond: (a."ID" = b.item)
  ->  Index Scan using "PK_items_ID" on items a  (cost=0.42..15592.23 rows=336083 width=332) (actual time=0.012..153.899 rows=336064 loops=1)
  ->  Sort  (cost=34934.87..34990.59 rows=22285 width=12) (actual time=463.677..466.146 rows=15521 loops=1)
        Sort Key: b.item
        Sort Method: quicksort  Memory: 1112kB
        ->  Finalize HashAggregate  (cost=32879.78..33102.62 rows=22285 width=12) (actual time=450.724..458.667 rows=15521 loops=1)
              Group Key: b.item
              Filter: (sum(b.amount) >= '1'::double precision)
              Rows Removed by Filter: 48277
              ->  Gather  (cost=27865.65..32545.50 rows=44570 width=12) (actual time=343.715..407.243 rows=162152 loops=1)
                    Workers Planned: 2
                    Workers Launched: 2
                    ->  Partial HashAggregate  (cost=26865.65..27088.50 rows=22285 width=12) (actual time=336.416..348.105 rows=54051 loops=3)
                          Group Key: b.item
                          ->  Parallel Seq Scan on stocktransactions b  (cost=0.00..23281.60 rows=716810 width=12) (actual time=0.015..170.646 rows=579563 loops=3)
Planning time: 0.277 ms
Execution time: 661.342 ms

Plan C though, thanks to the "offset optimization fence", executes the following, more efficient plan:

Nested Loop  (cost=32768.77..41146.56 rows=7428 width=344) (actual time=456.611..525.395 rows=15521 loops=1)
  ->  Subquery Scan on c  (cost=32768.35..33269.76 rows=7428 width=12) (actual time=456.591..475.204 rows=15521 loops=1)
        Filter: (c.stock >= '1'::double precision)
        Rows Removed by Filter: 48277
        ->  Finalize HashAggregate  (cost=32768.35..32991.20 rows=22285 width=12) (actual time=456.582..468.124 rows=63798 loops=1)
              Group Key: b.item
              ->  Gather  (cost=27865.65..32545.50 rows=44570 width=12) (actual time=348.479..415.463 rows=162085 loops=1)
                    Workers Planned: 2
                    Workers Launched: 2
                    ->  Partial HashAggregate  (cost=26865.65..27088.50 rows=22285 width=12) (actual time=343.952..355.912 rows=54028 loops=3)
                          Group Key: b.item
                          ->  Parallel Seq Scan on stocktransactions b  (cost=0.00..23281.60 rows=716810 width=12) (actual time=0.015..172.235 rows=579563 loops=3)
  ->  Index Scan using "PK_items_ID" on items a  (cost=0.42..1.05 rows=1 width=332) (actual time=0.003..0.003 rows=1 loops=15521)
        Index Cond: ("ID" = c.item)
Planning time: 0.223 ms
Execution time: 526.203 ms


========== Original ==========
From: David Rowley <[email protected]>
To: Benjamin Coutu <[email protected]>
Date: Sun, 29 Oct 2017 12:46:42 +0100
Subject: Re: [PERFORM] Cheaper subquery scan not considered unless offset 0

> 
> 
> On 30 October 2017 at 00:24, Benjamin Coutu <[email protected]> wrote:
> >   ->  Index Scan using "PK_items_ID" on items a  (cost=0.42..1.05 rows=1 width=332) (actual time=0.003..0.003 rows=1 loops=15521 total=46.563)
> 
> I've never seen EXPLAIN output like that before.
> 
> Is this some modified version of PostgreSQL?
>



-- 
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance




^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Cheaper subquery scan not considered unless offset 0
@ 2017-10-29 14:41 Benjamin Coutu <[email protected]>
  0 siblings, 0 replies; 5+ messages in thread

From: Benjamin Coutu @ 2017-10-29 14:41 UTC (permalink / raw)
  To: David Rowley <[email protected]>; +Cc: pgsql-performance; Tom Lane <[email protected]>

There is actually another separate issue here apart from that the planner obviously choosing the wrong plan as originally described in my last message, a plan it knows to be more expensive based on cost estimates.

Take a look at the way the filter condition is treated differently when estimating the number of returned rows when applied in different nodes.

Queries A/B:

 ->  Finalize HashAggregate  (cost=32879.78..33102.62 rows=22285 width=12) (actual time=450.724..458.667 rows=15521 loops=1)
              Group Key: b.item
              Filter: (sum(b.amount) >= '1'::double precision)
              Rows Removed by Filter: 48277
              ->  Gather ...

Query C:

 ->  Subquery Scan on c  (cost=32768.35..33269.76 rows=7428 width=12) (actual time=456.591..475.204 rows=15521 loops=1)
        Filter: (c.stock >= '1'::double precision)
        Rows Removed by Filter: 48277
        ->  Finalize HashAggregate  (cost=32768.35..32991.20 rows=22285 width=12) (actual time=456.582..468.124 rows=63798 loops=1)
              Group Key: b.item
              ->  Gather ...

Interestingly enough the subquery scan with query C correctly accounts for the filter when estimating rows=7428, while A/B doesn't seem to account for the filter in the HasAggregate node (estimated rows=22285). This looks like a bug.

-- 
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

^ permalink  raw  reply  [nested|flat] 5+ messages in thread

end of thread, other threads:[~2017-10-29 14:58 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2017-10-29 11:24 Cheaper subquery scan not considered unless offset 0 Benjamin Coutu <[email protected]>
2017-10-29 11:46 ` David Rowley <[email protected]>
2017-10-29 14:58 ` Tom Lane <[email protected]>
2017-10-29 13:17 Re: Cheaper subquery scan not considered unless offset 0 Benjamin Coutu <[email protected]>
2017-10-29 14:41 Re: Cheaper subquery scan not considered unless offset 0 Benjamin Coutu <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox