public inbox for [email protected]
help / color / mirror / Atom feedselect with max functions
8+ messages / 5 participants
[nested] [flat]
* select with max functions
@ 2017-10-01 12:41 Mariel Cherkassky <[email protected]>
2017-10-01 13:35 ` Re: select with max functions Gerardo Herzig <[email protected]>
2017-10-01 18:48 ` Re: select with max functions Andreas Kretschmer <[email protected]>
0 siblings, 2 replies; 8+ messages in thread
From: Mariel Cherkassky @ 2017-10-01 12:41 UTC (permalink / raw)
To: pgsql-performance
Hi,
I need to use the max function in my query. I had very bad performance when
I used the max :
SELECT Ma.User_Id,
COUNT(*) COUNT
FROM Manuim Ma
WHERE Ma.Bb_Open_Date =
(SELECT max(Bb_Open_Date)
FROM Manuim Man
WHERE Man.User_Id = Ma.User_Id
)
GROUP BY Ma.User_Id
HAVING COUNT(*) > 1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------
GroupAggregate (cost=0.56..3250554784.13 rows=115111 width=18)
Group Key: ma.user_id
Filter: (count(*) > 1)
-> Index Scan using manuim_i_user_id on manuim ma
(cost=0.56..3250552295.59 rows=178324 width=10)
Filter: (bb_open_date = (SubPlan 1))
SubPlan 1
-> Aggregate (cost=90.98..90.99 rows=1 width=8)
-> Index Scan using manuim_i_user_id on manuim man
(cost=0.56..90.92 rows=22 width=8)
Index Cond: ((user_id)::text = (ma.user_id)::text)
(9 rows)
So I used the limit 1 option :
SELECT Ma.User_Id,
COUNT(*) COUNT
FROM Manuim Ma
WHERE Ma.Bb_Open_Date =
(SELECT Bb_Open_Date
FROM Manuim Man
WHERE Man.User_Id = Ma.User_Id order
by bb_open_date desc limit 1
)
GROUP BY Ma.User_Id
HAVING COUNT(*) > 1;
and the performance are still the same :
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=0.56..3252248863.46 rows=115111 width=18)
Group Key: ma.user_id
Filter: (count(*) > 1)
-> Index Scan using manuim_i_user_id on manuim ma
(cost=0.56..3252246374.92 rows=178324 width=10)
Filter: (bb_open_date = (SubPlan 1))
SubPlan 1
-> Limit (cost=91.03..91.03 rows=1 width=8)
-> Sort (cost=91.03..91.09 rows=22 width=8)
Sort Key: man.bb_open_date DESC
-> Index Scan using manuim_i_user_id on manuim man
(cost=0.56..90.92 rows=22 width=8)
Index Cond: ((user_id)::text =
(ma.user_id)::text)
(11 rows)
the reading on the table manuim takes a lot of effort, what else can I do ?
the table`s size is 8G.
select count(*) from manuim;
count
----------
35664828
(1 row)
the indexes on the table :
"manuim_bb_open_date" btree (bb_open_date)
"manuim_i_user_id" btree (user_id)
Any idea how can I continue from here ? Thanks , Mariel.
^ permalink raw reply [nested|flat] 8+ messages in thread
* Re: select with max functions
2017-10-01 12:41 select with max functions Mariel Cherkassky <[email protected]>
@ 2017-10-01 13:35 ` Gerardo Herzig <[email protected]>
1 sibling, 0 replies; 8+ messages in thread
From: Gerardo Herzig @ 2017-10-01 13:35 UTC (permalink / raw)
To: Mariel Cherkassky <[email protected]>; +Cc: pgsql-performance
----- Mensaje original -----
> De: "Mariel Cherkassky" <[email protected]>
> Para: [email protected]
> Enviados: Domingo, 1 de Octubre 2017 9:41:37
> Asunto: [PERFORM] select with max functions
>
> Hi,
> I need to use the max function in my query. I had very bad performance when
> I used the max :
>
> SELECT Ma.User_Id,
> COUNT(*) COUNT
> FROM Manuim Ma
> WHERE Ma.Bb_Open_Date =
> (SELECT max(Bb_Open_Date)
> FROM Manuim Man
> WHERE Man.User_Id = Ma.User_Id
> )
> GROUP BY Ma.User_Id
> HAVING COUNT(*) > 1;
>
>
> QUERY PLAN
>
> ---------------------------------------------------------------------------------------------------------
> GroupAggregate (cost=0.56..3250554784.13 rows=115111 width=18)
> Group Key: ma.user_id
> Filter: (count(*) > 1)
> -> Index Scan using manuim_i_user_id on manuim ma
> (cost=0.56..3250552295.59 rows=178324 width=10)
> Filter: (bb_open_date = (SubPlan 1))
> SubPlan 1
> -> Aggregate (cost=90.98..90.99 rows=1 width=8)
> -> Index Scan using manuim_i_user_id on manuim man
> (cost=0.56..90.92 rows=22 width=8)
> Index Cond: ((user_id)::text = (ma.user_id)::text)
> (9 rows)
>
>
>
> So I used the limit 1 option :
>
> SELECT Ma.User_Id,
> COUNT(*) COUNT
> FROM Manuim Ma
> WHERE Ma.Bb_Open_Date =
> (SELECT Bb_Open_Date
> FROM Manuim Man
> WHERE Man.User_Id = Ma.User_Id order
> by bb_open_date desc limit 1
> )
> GROUP BY Ma.User_Id
> HAVING COUNT(*) > 1;
>
> and the performance are still the same :
>
> QUERY PLAN
>
> ---------------------------------------------------------------------------------------------------------------
> GroupAggregate (cost=0.56..3252248863.46 rows=115111 width=18)
> Group Key: ma.user_id
> Filter: (count(*) > 1)
> -> Index Scan using manuim_i_user_id on manuim ma
> (cost=0.56..3252246374.92 rows=178324 width=10)
> Filter: (bb_open_date = (SubPlan 1))
> SubPlan 1
> -> Limit (cost=91.03..91.03 rows=1 width=8)
> -> Sort (cost=91.03..91.09 rows=22 width=8)
> Sort Key: man.bb_open_date DESC
> -> Index Scan using manuim_i_user_id on manuim man
> (cost=0.56..90.92 rows=22 width=8)
> Index Cond: ((user_id)::text =
> (ma.user_id)::text)
> (11 rows)
>
>
>
> the reading on the table manuim takes a lot of effort, what else can I do ?
> the table`s size is 8G.
>
> select count(*) from manuim;
> count
> ----------
> 35664828
> (1 row)
>
> the indexes on the table :
> "manuim_bb_open_date" btree (bb_open_date)
> "manuim_i_user_id" btree (user_id)
>
>
> Any idea how can I continue from here ? Thanks , Mariel.
Start by posting the results of "explain analyze" of that queries, so we can see some timming stuff.
Gerardo
--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
^ permalink raw reply [nested|flat] 8+ messages in thread
* Re: select with max functions
2017-10-01 12:41 select with max functions Mariel Cherkassky <[email protected]>
@ 2017-10-01 18:48 ` Andreas Kretschmer <[email protected]>
2017-10-02 13:25 ` Re: select with max functions Mariel Cherkassky <[email protected]>
1 sibling, 1 reply; 8+ messages in thread
From: Andreas Kretschmer @ 2017-10-01 18:48 UTC (permalink / raw)
To: pgsql-performance
Am 01.10.2017 um 14:41 schrieb Mariel Cherkassky:
> Hi,
> I need to use the max function in my query. I had very bad performance
> when I used the max :
>
> SELECT Ma.User_Id,
> COUNT(*) COUNT
> FROM Manuim Ma
> WHERE Ma.Bb_Open_Date =
> (SELECT max(Bb_Open_Date)
> FROM Manuim Man
> WHERE Man.User_Id = Ma.User_Id
> )
> GROUP BY Ma.User_Id
> HAVING COUNT(*) > 1;
>
>
> Any idea how can I continue from here ? Thanks , Mariel.
Maybe you can rewrite it, for instance to
select distinct on (user_id, bb_open_date) user_id, bb_open_date,
count(1) from Manuim group by 1,2 having count(1) > 1;
maybe much cheaper, but untested! If not, please share more details, at
least table-definition.
Regards, Andreas
--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com
--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
^ permalink raw reply [nested|flat] 8+ messages in thread
* Re: select with max functions
2017-10-01 12:41 select with max functions Mariel Cherkassky <[email protected]>
2017-10-01 18:48 ` Re: select with max functions Andreas Kretschmer <[email protected]>
@ 2017-10-02 13:25 ` Mariel Cherkassky <[email protected]>
2017-10-02 13:45 ` Re: select with max functions Gerardo Herzig <[email protected]>
0 siblings, 1 reply; 8+ messages in thread
From: Mariel Cherkassky @ 2017-10-02 13:25 UTC (permalink / raw)
To: Andreas Kretschmer <[email protected]>; +Cc: pgsql-performance
Andreas I tried to rewrite it with the function rank() but I failed. The
query you wrote isnt the same as what I search. Moreover, I cant use
explain analyze because it is taking to much time to run and I'm getting
timeout..
2017-10-01 21:48 GMT+03:00 Andreas Kretschmer <[email protected]>:
>
>
> Am 01.10.2017 um 14:41 schrieb Mariel Cherkassky:
>
>> Hi,
>> I need to use the max function in my query. I had very bad performance
>> when I used the max :
>>
>> SELECT Ma.User_Id,
>> COUNT(*) COUNT
>> FROM Manuim Ma
>> WHERE Ma.Bb_Open_Date =
>> (SELECT max(Bb_Open_Date)
>> FROM Manuim Man
>> WHERE Man.User_Id = Ma.User_Id
>> )
>> GROUP BY Ma.User_Id
>> HAVING COUNT(*) > 1;
>>
>>
>> Any idea how can I continue from here ? Thanks , Mariel.
>>
>
>
> Maybe you can rewrite it, for instance to
>
> select distinct on (user_id, bb_open_date) user_id, bb_open_date, count(1)
> from Manuim group by 1,2 having count(1) > 1;
>
> maybe much cheaper, but untested! If not, please share more details, at
> least table-definition.
>
> Regards, Andreas
>
> --
> 2ndQuadrant - The PostgreSQL Support Company.
> www.2ndQuadrant.com
>
>
>
> --
> Sent via pgsql-performance mailing list ([email protected])
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>
^ permalink raw reply [nested|flat] 8+ messages in thread
* Re: select with max functions
2017-10-01 12:41 select with max functions Mariel Cherkassky <[email protected]>
2017-10-01 18:48 ` Re: select with max functions Andreas Kretschmer <[email protected]>
2017-10-02 13:25 ` Re: select with max functions Mariel Cherkassky <[email protected]>
@ 2017-10-02 13:45 ` Gerardo Herzig <[email protected]>
2017-10-02 14:45 ` Re: select with max functions Mariel Cherkassky <[email protected]>
0 siblings, 1 reply; 8+ messages in thread
From: Gerardo Herzig @ 2017-10-02 13:45 UTC (permalink / raw)
To: Mariel Cherkassky <[email protected]>; +Cc: pgsql-performance
----- Mensaje original -----
> De: "Mariel Cherkassky" <[email protected]>
> Para: "Andreas Kretschmer" <[email protected]>
> CC: [email protected]
> Enviados: Lunes, 2 de Octubre 2017 10:25:19
> Asunto: Re: [PERFORM] select with max functions
>
> Andreas I tried to rewrite it with the function rank() but I failed. The
> query you wrote isnt the same as what I search. Moreover, I cant use
> explain analyze because it is taking to much time to run and I'm getting
> timeout..
>
> 2017-10-01 21:48 GMT+03:00 Andreas Kretschmer <[email protected]>:
Do a "set statement_timeout TO 0" prior to "explain analyze"
--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
^ permalink raw reply [nested|flat] 8+ messages in thread
* Re: select with max functions
2017-10-01 12:41 select with max functions Mariel Cherkassky <[email protected]>
2017-10-01 18:48 ` Re: select with max functions Andreas Kretschmer <[email protected]>
2017-10-02 13:25 ` Re: select with max functions Mariel Cherkassky <[email protected]>
2017-10-02 13:45 ` Re: select with max functions Gerardo Herzig <[email protected]>
@ 2017-10-02 14:45 ` Mariel Cherkassky <[email protected]>
2017-10-02 15:29 ` Re: select with max functions Tom Lane <[email protected]>
0 siblings, 1 reply; 8+ messages in thread
From: Mariel Cherkassky @ 2017-10-02 14:45 UTC (permalink / raw)
To: Gerardo Herzig <[email protected]>; +Cc: pgsql-performance
explain analyze SELECT Ma.User_Id,
COUNT(*) COUNT
FROM Manuim Ma
WHERE Ma.Bb_Open_Date =
(SELECT Bb_Open_Date
FROM Manuim Man
WHERE Man.User_Id = Ma.User_Id order
by bb_open_date desc limit 1
)
GROUP BY Ma.User_Id
HAVING COUNT(*) > 1;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
----------------------------------------
GroupAggregate (cost=0.56..2430770384.80 rows=128137 width=18) (actual
time=55.823..2970443.757 rows=1213 loops=1)
Group Key: ma.user_id
Filter: (count(*) > 1)
Rows Removed by Filter: 3693020
-> Index Scan using manuim_i_user_id on manuim ma
(cost=0.56..2430767766.00 rows=178324 width=10) (actual time=0.249
..2966355.734 rows=3695461 loops=1)
Filter: (bb_open_date = (SubPlan 1))
Rows Removed by Filter: 31969367
SubPlan 1
-> Limit (cost=68.00..68.00 rows=1 width=8) (actual
time=0.082..0.082 rows=0 loops=35664828)
-> Sort (cost=68.00..68.04 rows=16 width=8) (actual
time=0.081..0.081 rows=0 loops=35664828)
Sort Key: man.bb_open_date DESC
Sort Method: quicksort Memory: 25kB
-> Index Scan using manuim_i_user_id on manuim man
(cost=0.56..67.92 rows=16 width=8) (actual ti
me=0.001..0.069 rows=85 loops=35664828)
Index Cond: ((user_id)::text =
(ma.user_id)::text)
Planning time: 0.414 ms
Execution time: 2970444.732 ms
(16 rows)
2017-10-02 16:45 GMT+03:00 Gerardo Herzig <[email protected]>:
>
>
> ----- Mensaje original -----
> > De: "Mariel Cherkassky" <[email protected]>
> > Para: "Andreas Kretschmer" <[email protected]>
> > CC: [email protected]
> > Enviados: Lunes, 2 de Octubre 2017 10:25:19
> > Asunto: Re: [PERFORM] select with max functions
> >
> > Andreas I tried to rewrite it with the function rank() but I failed. The
> > query you wrote isnt the same as what I search. Moreover, I cant use
> > explain analyze because it is taking to much time to run and I'm getting
> > timeout..
> >
> > 2017-10-01 21:48 GMT+03:00 Andreas Kretschmer <[email protected]>:
>
> Do a "set statement_timeout TO 0" prior to "explain analyze"
>
^ permalink raw reply [nested|flat] 8+ messages in thread
* Re: select with max functions
2017-10-01 12:41 select with max functions Mariel Cherkassky <[email protected]>
2017-10-01 18:48 ` Re: select with max functions Andreas Kretschmer <[email protected]>
2017-10-02 13:25 ` Re: select with max functions Mariel Cherkassky <[email protected]>
2017-10-02 13:45 ` Re: select with max functions Gerardo Herzig <[email protected]>
2017-10-02 14:45 ` Re: select with max functions Mariel Cherkassky <[email protected]>
@ 2017-10-02 15:29 ` Tom Lane <[email protected]>
2017-10-03 21:16 ` Re: select with max functions Mark Kirkwood <[email protected]>
0 siblings, 1 reply; 8+ messages in thread
From: Tom Lane @ 2017-10-02 15:29 UTC (permalink / raw)
To: Mariel Cherkassky <[email protected]>; +Cc: Gerardo Herzig <[email protected]>; pgsql-performance
Mariel Cherkassky <[email protected]> writes:
> explain analyze SELECT Ma.User_Id,
> COUNT(*) COUNT
> FROM Manuim Ma
> WHERE Ma.Bb_Open_Date =
> (SELECT Bb_Open_Date
> FROM Manuim Man
> WHERE Man.User_Id = Ma.User_Id order
> by bb_open_date desc limit 1
> )
> GROUP BY Ma.User_Id
> HAVING COUNT(*) > 1;
The core problem with this query is that the sub-select has to be done
over again for each row of the outer table, since it's a correlated
sub-select (ie, it refers to Ma.User_Id from the outer table). Replacing
a max() call with handmade logic doesn't do anything to help that.
I'd try refactoring it so that you calculate the max Bb_Open_Date just
once for each user id, perhaps along the lines of
SELECT Ma.User_Id,
COUNT(*) COUNT
FROM Manuim Ma,
(SELECT User_Id, max(Bb_Open_Date) as max
FROM Manuim Man
GROUP BY User_Id) ss
WHERE Ma.User_Id = ss.User_Id AND
Ma.Bb_Open_Date = ss.max
GROUP BY Ma.User_Id
HAVING COUNT(*) > 1;
This is still not going to be instantaneous, but it might be better.
It's possible that an index on (User_Id, Bb_Open_Date) would help,
but I'm not sure.
regards, tom lane
--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
^ permalink raw reply [nested|flat] 8+ messages in thread
* Re: select with max functions
2017-10-01 12:41 select with max functions Mariel Cherkassky <[email protected]>
2017-10-01 18:48 ` Re: select with max functions Andreas Kretschmer <[email protected]>
2017-10-02 13:25 ` Re: select with max functions Mariel Cherkassky <[email protected]>
2017-10-02 13:45 ` Re: select with max functions Gerardo Herzig <[email protected]>
2017-10-02 14:45 ` Re: select with max functions Mariel Cherkassky <[email protected]>
2017-10-02 15:29 ` Re: select with max functions Tom Lane <[email protected]>
@ 2017-10-03 21:16 ` Mark Kirkwood <[email protected]>
0 siblings, 0 replies; 8+ messages in thread
From: Mark Kirkwood @ 2017-10-03 21:16 UTC (permalink / raw)
To: Tom Lane <[email protected]>; Mariel Cherkassky <[email protected]>; +Cc: Gerardo Herzig <[email protected]>; pgsql-performance
On 03/10/17 04:29, Tom Lane wrote:
> Mariel Cherkassky <[email protected]> writes:
>> explain analyze SELECT Ma.User_Id,
>> COUNT(*) COUNT
>> FROM Manuim Ma
>> WHERE Ma.Bb_Open_Date =
>> (SELECT Bb_Open_Date
>> FROM Manuim Man
>> WHERE Man.User_Id = Ma.User_Id order
>> by bb_open_date desc limit 1
>> )
>> GROUP BY Ma.User_Id
>> HAVING COUNT(*) > 1;
> The core problem with this query is that the sub-select has to be done
> over again for each row of the outer table, since it's a correlated
> sub-select (ie, it refers to Ma.User_Id from the outer table). Replacing
> a max() call with handmade logic doesn't do anything to help that.
> I'd try refactoring it so that you calculate the max Bb_Open_Date just
> once for each user id, perhaps along the lines of
>
> SELECT Ma.User_Id,
> COUNT(*) COUNT
> FROM Manuim Ma,
> (SELECT User_Id, max(Bb_Open_Date) as max
> FROM Manuim Man
> GROUP BY User_Id) ss
> WHERE Ma.User_Id = ss.User_Id AND
> Ma.Bb_Open_Date = ss.max
> GROUP BY Ma.User_Id
> HAVING COUNT(*) > 1;
>
> This is still not going to be instantaneous, but it might be better.
>
> It's possible that an index on (User_Id, Bb_Open_Date) would help,
> but I'm not sure.
>
> regards, tom lane
>
>
Further ideas based on Tom's rewrite: If that MAX is still expensive it
might be worth breaking
SELECT User_Id, max(Bb_Open_Date) as max
FROM Manuim Man
GROUP BY User_Id
out into a VIEW, and considering making it MATERIALIZED, or creating an
equivalent trigger based summary table (there are examples in the docs
of how to do this).
Cheers
Mark
--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
^ permalink raw reply [nested|flat] 8+ messages in thread
end of thread, other threads:[~2017-10-03 21:16 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2017-10-01 12:41 select with max functions Mariel Cherkassky <[email protected]>
2017-10-01 13:35 ` Gerardo Herzig <[email protected]>
2017-10-01 18:48 ` Andreas Kretschmer <[email protected]>
2017-10-02 13:25 ` Mariel Cherkassky <[email protected]>
2017-10-02 13:45 ` Gerardo Herzig <[email protected]>
2017-10-02 14:45 ` Mariel Cherkassky <[email protected]>
2017-10-02 15:29 ` Tom Lane <[email protected]>
2017-10-03 21:16 ` Mark Kirkwood <[email protected]>
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox