public inbox for [email protected]  
help / color / mirror / Atom feed
select with max functions
8+ messages / 5 participants
[nested] [flat]

* select with max functions
@ 2017-10-01 12:41  Mariel Cherkassky <[email protected]>
  0 siblings, 2 replies; 8+ messages in thread

From: Mariel Cherkassky @ 2017-10-01 12:41 UTC (permalink / raw)
  To: pgsql-performance

Hi,
I need to use the max function in my query. I had very bad performance when
I used the max  :

               SELECT Ma.User_Id,
                      COUNT(*) COUNT
               FROM   Manuim Ma
               WHERE  Ma.Bb_Open_Date  =
                                  (SELECT max(Bb_Open_Date)
                                   FROM   Manuim Man
                                   WHERE  Man.User_Id = Ma.User_Id
                                  )
               GROUP  BY Ma.User_Id
               HAVING COUNT(*) > 1;


                                               QUERY PLAN

---------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=0.56..3250554784.13 rows=115111 width=18)
   Group Key: ma.user_id
   Filter: (count(*) > 1)
   ->  Index Scan using manuim_i_user_id on manuim ma
(cost=0.56..3250552295.59 rows=178324 width=10)
         Filter: (bb_open_date = (SubPlan 1))
         SubPlan 1
           ->  Aggregate  (cost=90.98..90.99 rows=1 width=8)
                 ->  Index Scan using manuim_i_user_id on manuim man
(cost=0.56..90.92 rows=22 width=8)
                       Index Cond: ((user_id)::text = (ma.user_id)::text)
(9 rows)



So I used the limit 1 option :

               SELECT Ma.User_Id,
                      COUNT(*) COUNT
               FROM   Manuim Ma
               WHERE  Ma.Bb_Open_Date  =
                                  (SELECT Bb_Open_Date
                                   FROM   Manuim Man
                                   WHERE  Man.User_Id = Ma.User_Id order
by                                   bb_open_date desc limit 1
                                  )
               GROUP  BY Ma.User_Id
               HAVING COUNT(*) > 1;

and the performance are still the same :

                                                  QUERY PLAN

---------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=0.56..3252248863.46 rows=115111 width=18)
   Group Key: ma.user_id
   Filter: (count(*) > 1)
   ->  Index Scan using manuim_i_user_id on manuim ma
(cost=0.56..3252246374.92 rows=178324 width=10)
         Filter: (bb_open_date = (SubPlan 1))
         SubPlan 1
           ->  Limit  (cost=91.03..91.03 rows=1 width=8)
                 ->  Sort  (cost=91.03..91.09 rows=22 width=8)
                       Sort Key: man.bb_open_date DESC
                       ->  Index Scan using manuim_i_user_id on manuim man
(cost=0.56..90.92 rows=22 width=8)
                             Index Cond: ((user_id)::text =
(ma.user_id)::text)
(11 rows)



the reading on the table manuim takes a lot of effort, what else can I do ?
the table`s size is 8G.

select count(*) from manuim;
  count
----------
 35664828
(1 row)

the indexes on the table :
 "manuim_bb_open_date" btree (bb_open_date)
"manuim_i_user_id" btree (user_id)


Any idea how can I continue from here ? Thanks , Mariel.


^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: select with max functions
@ 2017-10-01 13:35  Gerardo Herzig <[email protected]>
  parent: Mariel Cherkassky <[email protected]>
  1 sibling, 0 replies; 8+ messages in thread

From: Gerardo Herzig @ 2017-10-01 13:35 UTC (permalink / raw)
  To: Mariel Cherkassky <[email protected]>; +Cc: pgsql-performance



----- Mensaje original -----
> De: "Mariel Cherkassky" <[email protected]>
> Para: [email protected]
> Enviados: Domingo, 1 de Octubre 2017 9:41:37
> Asunto: [PERFORM] select with max functions
> 
> Hi,
> I need to use the max function in my query. I had very bad performance when
> I used the max  :
> 
>                SELECT Ma.User_Id,
>                       COUNT(*) COUNT
>                FROM   Manuim Ma
>                WHERE  Ma.Bb_Open_Date  =
>                                   (SELECT max(Bb_Open_Date)
>                                    FROM   Manuim Man
>                                    WHERE  Man.User_Id = Ma.User_Id
>                                   )
>                GROUP  BY Ma.User_Id
>                HAVING COUNT(*) > 1;
> 
> 
>                                                QUERY PLAN
> 
> ---------------------------------------------------------------------------------------------------------
>  GroupAggregate  (cost=0.56..3250554784.13 rows=115111 width=18)
>    Group Key: ma.user_id
>    Filter: (count(*) > 1)
>    ->  Index Scan using manuim_i_user_id on manuim ma
> (cost=0.56..3250552295.59 rows=178324 width=10)
>          Filter: (bb_open_date = (SubPlan 1))
>          SubPlan 1
>            ->  Aggregate  (cost=90.98..90.99 rows=1 width=8)
>                  ->  Index Scan using manuim_i_user_id on manuim man
> (cost=0.56..90.92 rows=22 width=8)
>                        Index Cond: ((user_id)::text = (ma.user_id)::text)
> (9 rows)
> 
> 
> 
> So I used the limit 1 option :
> 
>                SELECT Ma.User_Id,
>                       COUNT(*) COUNT
>                FROM   Manuim Ma
>                WHERE  Ma.Bb_Open_Date  =
>                                   (SELECT Bb_Open_Date
>                                    FROM   Manuim Man
>                                    WHERE  Man.User_Id = Ma.User_Id order
> by                                   bb_open_date desc limit 1
>                                   )
>                GROUP  BY Ma.User_Id
>                HAVING COUNT(*) > 1;
> 
> and the performance are still the same :
> 
>                                                   QUERY PLAN
> 
> ---------------------------------------------------------------------------------------------------------------
>  GroupAggregate  (cost=0.56..3252248863.46 rows=115111 width=18)
>    Group Key: ma.user_id
>    Filter: (count(*) > 1)
>    ->  Index Scan using manuim_i_user_id on manuim ma
> (cost=0.56..3252246374.92 rows=178324 width=10)
>          Filter: (bb_open_date = (SubPlan 1))
>          SubPlan 1
>            ->  Limit  (cost=91.03..91.03 rows=1 width=8)
>                  ->  Sort  (cost=91.03..91.09 rows=22 width=8)
>                        Sort Key: man.bb_open_date DESC
>                        ->  Index Scan using manuim_i_user_id on manuim man
> (cost=0.56..90.92 rows=22 width=8)
>                              Index Cond: ((user_id)::text =
> (ma.user_id)::text)
> (11 rows)
> 
> 
> 
> the reading on the table manuim takes a lot of effort, what else can I do ?
> the table`s size is 8G.
> 
> select count(*) from manuim;
>   count
> ----------
>  35664828
> (1 row)
> 
> the indexes on the table :
>  "manuim_bb_open_date" btree (bb_open_date)
> "manuim_i_user_id" btree (user_id)
> 
> 
> Any idea how can I continue from here ? Thanks , Mariel.

Start by posting the results of "explain analyze" of that queries, so we can see some timming stuff.

Gerardo


-- 
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: select with max functions
@ 2017-10-01 18:48  Andreas Kretschmer <[email protected]>
  parent: Mariel Cherkassky <[email protected]>
  1 sibling, 1 reply; 8+ messages in thread

From: Andreas Kretschmer @ 2017-10-01 18:48 UTC (permalink / raw)
  To: pgsql-performance



Am 01.10.2017 um 14:41 schrieb Mariel Cherkassky:
> Hi,
> I need to use the max function in my query. I had very bad performance 
> when I used the max  :
>
>                SELECT Ma.User_Id,
>                       COUNT(*) COUNT
>                FROM   Manuim Ma
>                WHERE  Ma.Bb_Open_Date  =
>                                   (SELECT max(Bb_Open_Date)
>                                    FROM  Manuim Man
>                                    WHERE Man.User_Id = Ma.User_Id
>                                   )
>                GROUP  BY Ma.User_Id
>                HAVING COUNT(*) > 1;
>
>
> Any idea how can I continue from here ? Thanks , Mariel.


Maybe you can rewrite it, for instance to

select distinct on (user_id, bb_open_date) user_id, bb_open_date, 
count(1) from Manuim group by 1,2 having count(1) > 1;

maybe much cheaper, but untested! If not, please share more details, at 
least table-definition.

Regards, Andreas

-- 
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com



-- 
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: select with max functions
@ 2017-10-02 13:25  Mariel Cherkassky <[email protected]>
  parent: Andreas Kretschmer <[email protected]>
  0 siblings, 1 reply; 8+ messages in thread

From: Mariel Cherkassky @ 2017-10-02 13:25 UTC (permalink / raw)
  To: Andreas Kretschmer <[email protected]>; +Cc: pgsql-performance

Andreas I tried to rewrite it with the function rank() but I failed. The
query you wrote isnt the same as what I search. Moreover, I cant use
explain analyze because it is taking to much time to run and I'm getting
timeout..

2017-10-01 21:48 GMT+03:00 Andreas Kretschmer <[email protected]>:

>
>
> Am 01.10.2017 um 14:41 schrieb Mariel Cherkassky:
>
>> Hi,
>> I need to use the max function in my query. I had very bad performance
>> when I used the max  :
>>
>>                SELECT Ma.User_Id,
>>                       COUNT(*) COUNT
>>                FROM   Manuim Ma
>>                WHERE  Ma.Bb_Open_Date  =
>>                                   (SELECT max(Bb_Open_Date)
>>                                    FROM  Manuim Man
>>                                    WHERE Man.User_Id = Ma.User_Id
>>                                   )
>>                GROUP  BY Ma.User_Id
>>                HAVING COUNT(*) > 1;
>>
>>
>> Any idea how can I continue from here ? Thanks , Mariel.
>>
>
>
> Maybe you can rewrite it, for instance to
>
> select distinct on (user_id, bb_open_date) user_id, bb_open_date, count(1)
> from Manuim group by 1,2 having count(1) > 1;
>
> maybe much cheaper, but untested! If not, please share more details, at
> least table-definition.
>
> Regards, Andreas
>
> --
> 2ndQuadrant - The PostgreSQL Support Company.
> www.2ndQuadrant.com
>
>
>
> --
> Sent via pgsql-performance mailing list ([email protected])
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>


^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: select with max functions
@ 2017-10-02 13:45  Gerardo Herzig <[email protected]>
  parent: Mariel Cherkassky <[email protected]>
  0 siblings, 1 reply; 8+ messages in thread

From: Gerardo Herzig @ 2017-10-02 13:45 UTC (permalink / raw)
  To: Mariel Cherkassky <[email protected]>; +Cc: pgsql-performance



----- Mensaje original -----
> De: "Mariel Cherkassky" <[email protected]>
> Para: "Andreas Kretschmer" <[email protected]>
> CC: [email protected]
> Enviados: Lunes, 2 de Octubre 2017 10:25:19
> Asunto: Re: [PERFORM] select with max functions
> 
> Andreas I tried to rewrite it with the function rank() but I failed. The
> query you wrote isnt the same as what I search. Moreover, I cant use
> explain analyze because it is taking to much time to run and I'm getting
> timeout..
> 
> 2017-10-01 21:48 GMT+03:00 Andreas Kretschmer <[email protected]>:

Do a "set statement_timeout TO 0" prior to "explain analyze"


-- 
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: select with max functions
@ 2017-10-02 14:45  Mariel Cherkassky <[email protected]>
  parent: Gerardo Herzig <[email protected]>
  0 siblings, 1 reply; 8+ messages in thread

From: Mariel Cherkassky @ 2017-10-02 14:45 UTC (permalink / raw)
  To: Gerardo Herzig <[email protected]>; +Cc: pgsql-performance

explain analyze   SELECT Ma.User_Id,
                      COUNT(*) COUNT
               FROM   Manuim Ma
               WHERE  Ma.Bb_Open_Date  =
                                  (SELECT Bb_Open_Date
                                   FROM   Manuim Man
                                   WHERE  Man.User_Id = Ma.User_Id order
by                                   bb_open_date desc limit 1
                                  )
               GROUP  BY Ma.User_Id
               HAVING COUNT(*) > 1;

 QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------
----------------------------------------
 GroupAggregate  (cost=0.56..2430770384.80 rows=128137 width=18) (actual
time=55.823..2970443.757 rows=1213 loops=1)
   Group Key: ma.user_id
   Filter: (count(*) > 1)
   Rows Removed by Filter: 3693020
   ->  Index Scan using manuim_i_user_id on manuim ma
(cost=0.56..2430767766.00 rows=178324 width=10) (actual time=0.249
..2966355.734 rows=3695461 loops=1)
         Filter: (bb_open_date = (SubPlan 1))
         Rows Removed by Filter: 31969367
         SubPlan 1
           ->  Limit  (cost=68.00..68.00 rows=1 width=8) (actual
time=0.082..0.082 rows=0 loops=35664828)
                 ->  Sort  (cost=68.00..68.04 rows=16 width=8) (actual
time=0.081..0.081 rows=0 loops=35664828)
                       Sort Key: man.bb_open_date DESC
                       Sort Method: quicksort  Memory: 25kB
                       ->  Index Scan using manuim_i_user_id on manuim man
(cost=0.56..67.92 rows=16 width=8) (actual ti
me=0.001..0.069 rows=85 loops=35664828)
                             Index Cond: ((user_id)::text =
(ma.user_id)::text)
 Planning time: 0.414 ms
 Execution time: 2970444.732 ms
(16 rows)

2017-10-02 16:45 GMT+03:00 Gerardo Herzig <[email protected]>:

>
>
> ----- Mensaje original -----
> > De: "Mariel Cherkassky" <[email protected]>
> > Para: "Andreas Kretschmer" <[email protected]>
> > CC: [email protected]
> > Enviados: Lunes, 2 de Octubre 2017 10:25:19
> > Asunto: Re: [PERFORM] select with max functions
> >
> > Andreas I tried to rewrite it with the function rank() but I failed. The
> > query you wrote isnt the same as what I search. Moreover, I cant use
> > explain analyze because it is taking to much time to run and I'm getting
> > timeout..
> >
> > 2017-10-01 21:48 GMT+03:00 Andreas Kretschmer <[email protected]>:
>
> Do a "set statement_timeout TO 0" prior to "explain analyze"
>


^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: select with max functions
@ 2017-10-02 15:29  Tom Lane <[email protected]>
  parent: Mariel Cherkassky <[email protected]>
  0 siblings, 1 reply; 8+ messages in thread

From: Tom Lane @ 2017-10-02 15:29 UTC (permalink / raw)
  To: Mariel Cherkassky <[email protected]>; +Cc: Gerardo Herzig <[email protected]>; pgsql-performance

Mariel Cherkassky <[email protected]> writes:
> explain analyze   SELECT Ma.User_Id,
>                       COUNT(*) COUNT
>                FROM   Manuim Ma
>                WHERE  Ma.Bb_Open_Date  =
>                                   (SELECT Bb_Open_Date
>                                    FROM   Manuim Man
>                                    WHERE  Man.User_Id = Ma.User_Id order
> by                                   bb_open_date desc limit 1
>                                   )
>                GROUP  BY Ma.User_Id
>                HAVING COUNT(*) > 1;

The core problem with this query is that the sub-select has to be done
over again for each row of the outer table, since it's a correlated
sub-select (ie, it refers to Ma.User_Id from the outer table).  Replacing
a max() call with handmade logic doesn't do anything to help that.
I'd try refactoring it so that you calculate the max Bb_Open_Date just
once for each user id, perhaps along the lines of

SELECT Ma.User_Id,
       COUNT(*) COUNT
       FROM   Manuim Ma,
              (SELECT User_Id, max(Bb_Open_Date) as max
               FROM   Manuim Man
               GROUP BY User_Id) ss
       WHERE  Ma.User_Id = ss.User_Id AND
              Ma.Bb_Open_Date = ss.max
       GROUP  BY Ma.User_Id
       HAVING COUNT(*) > 1;

This is still not going to be instantaneous, but it might be better.

It's possible that an index on (User_Id, Bb_Open_Date) would help,
but I'm not sure.

			regards, tom lane


-- 
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



^ permalink  raw  reply  [nested|flat] 8+ messages in thread

* Re: select with max functions
@ 2017-10-03 21:16  Mark Kirkwood <[email protected]>
  parent: Tom Lane <[email protected]>
  0 siblings, 0 replies; 8+ messages in thread

From: Mark Kirkwood @ 2017-10-03 21:16 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; Mariel Cherkassky <[email protected]>; +Cc: Gerardo Herzig <[email protected]>; pgsql-performance

On 03/10/17 04:29, Tom Lane wrote:
> Mariel Cherkassky <[email protected]> writes:
>> explain analyze   SELECT Ma.User_Id,
>>                        COUNT(*) COUNT
>>                 FROM   Manuim Ma
>>                 WHERE  Ma.Bb_Open_Date  =
>>                                    (SELECT Bb_Open_Date
>>                                     FROM   Manuim Man
>>                                     WHERE  Man.User_Id = Ma.User_Id order
>> by                                   bb_open_date desc limit 1
>>                                    )
>>                 GROUP  BY Ma.User_Id
>>                 HAVING COUNT(*) > 1;
> The core problem with this query is that the sub-select has to be done
> over again for each row of the outer table, since it's a correlated
> sub-select (ie, it refers to Ma.User_Id from the outer table).  Replacing
> a max() call with handmade logic doesn't do anything to help that.
> I'd try refactoring it so that you calculate the max Bb_Open_Date just
> once for each user id, perhaps along the lines of
>
> SELECT Ma.User_Id,
>         COUNT(*) COUNT
>         FROM   Manuim Ma,
>                (SELECT User_Id, max(Bb_Open_Date) as max
>                 FROM   Manuim Man
>                 GROUP BY User_Id) ss
>         WHERE  Ma.User_Id = ss.User_Id AND
>                Ma.Bb_Open_Date = ss.max
>         GROUP  BY Ma.User_Id
>         HAVING COUNT(*) > 1;
>
> This is still not going to be instantaneous, but it might be better.
>
> It's possible that an index on (User_Id, Bb_Open_Date) would help,
> but I'm not sure.
>
> 			regards, tom lane
>
>

Further ideas based on Tom's rewrite: If that MAX is still expensive it 
might be worth breaking


SELECT User_Id, max(Bb_Open_Date) as max
                FROM   Manuim Man
                GROUP BY User_Id

out into a VIEW, and considering making it MATERIALIZED, or creating an 
equivalent  trigger based summary table (there are examples in the docs 
of how to do this).

Cheers

Mark


-- 
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance




^ permalink  raw  reply  [nested|flat] 8+ messages in thread


end of thread, other threads:[~2017-10-03 21:16 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2017-10-01 12:41 select with max functions Mariel Cherkassky <[email protected]>
2017-10-01 13:35 ` Gerardo Herzig <[email protected]>
2017-10-01 18:48 ` Andreas Kretschmer <[email protected]>
2017-10-02 13:25   ` Mariel Cherkassky <[email protected]>
2017-10-02 13:45     ` Gerardo Herzig <[email protected]>
2017-10-02 14:45       ` Mariel Cherkassky <[email protected]>
2017-10-02 15:29         ` Tom Lane <[email protected]>
2017-10-03 21:16           ` Mark Kirkwood <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox