Re: Query performance issue

public inbox for [email protected]  
help / color / mirror / Atom feed

Re: Query performance issue
9+ messages / 6 participants
[nested] [flat]

* Re: Query performance issue
@ 2024-10-22 05:31  yudhi s <[email protected]>
  0 siblings, 2 replies; 9+ messages in thread

From: yudhi s @ 2024-10-22 05:31 UTC (permalink / raw)
  To: [email protected]; [email protected]

On Thu, Oct 17, 2024 at 3:06 AM Peter J. Holzer <[email protected]> wrote:

>
> The execution plan looks like a postgresql execution plan, not a mysql
> execution plan. Did you run this query on postgresql? That may be
> interesting for comparison purposese, but ultimately it is useless: You
> won't get mysql to work like postgresql, and any tips to speed up this
> query on postgresql (which is all you can expect on a postgresql mailing
> list) probably won't work on mysql.
>
>
>

Tried running the same in postgres and below is the plan from bothe
postgres and mysql. Can you please guide me to understand ,  if anything
else can be done to make it better?

https://gist.github.com/databasetech0073/746353a9e76d5e29b2fc6abdc80cdef8


^ permalink  raw  reply  [nested|flat] 9+ messages in thread

* Re: Query performance issue
@ 2024-10-22 16:00  Jeff Ross <[email protected]>
  parent: yudhi s <[email protected]>
  1 sibling, 0 replies; 9+ messages in thread

From: Jeff Ross @ 2024-10-22 16:00 UTC (permalink / raw)
  To: [email protected]

On 10/21/24 23:31, yudhi s wrote:

>
> On Thu, Oct 17, 2024 at 3:06 AM Peter J. Holzer <[email protected]> wrote:
>
>
>     The execution plan looks like a postgresql execution plan, not a mysql
>     execution plan. Did you run this query on postgresql? That may be
>     interesting for comparison purposese, but ultimately it is
>     useless: You
>     won't get mysql to work like postgresql, and any tips to speed up this
>     query on postgresql (which is all you can expect on a postgresql
>     mailing
>     list) probably won't work on mysql.
>
>
>
> Tried running the same in postgres and below is the plan from bothe 
> postgres and mysql. Can you please guide me to understand ,  if 
> anything else can be done to make it better?
>
> https://gist.github.com/databasetech0073/746353a9e76d5e29b2fc6abdc80cdef8
>
Here's your postgres query as analyzed and made more readable by the 
most excellent explain.depesz.com.

https://explain.depesz.com/s/VyeM#html

Under the hints tab are suggestions to bump your work_mem to avoid 
writing sorts out to disk.

Jeff


^ permalink  raw  reply  [nested|flat] 9+ messages in thread

* Re: Query performance issue
@ 2024-10-22 19:01  Greg Sabino Mullane <[email protected]>
  parent: yudhi s <[email protected]>
  1 sibling, 2 replies; 9+ messages in thread

From: Greg Sabino Mullane @ 2024-10-22 19:01 UTC (permalink / raw)
  To: yudhi s <[email protected]>; +Cc: [email protected]; [email protected]

To be frank, there is so much wrong with this query that it is hard to know
where to start. But a few top items:

* Make sure all of the tables involved have been analyzed. You might want
to bump default_statistics_target up and see if that helps.

* As mentioned already, increase work_mem, as you have things spilling to
disk (e.g. external merge Disk: 36280kB)

* Don't use the "FROM table1, table2, table3" syntax but use "FROM table1
JOIN table2 ON (...) JOIN table3 ON (...)

* Try not to use subselects. Things like WHERE x IN (SELECT ...) are
expensive and hard to optimize.

* You have useless GROUP BY clauses in there. Remove to simplify the query

* There is no LIMIT. Does the client really need all 135,214 rows?

Cheers,
Greg

^ permalink  raw  reply  [nested|flat] 9+ messages in thread

* Re: Query performance issue
@ 2024-10-22 20:00  Ron Johnson <[email protected]>
  parent: Greg Sabino Mullane <[email protected]>
  1 sibling, 1 reply; 9+ messages in thread

From: Ron Johnson @ 2024-10-22 20:00 UTC (permalink / raw)
  To: pgsql-generallists.postgresql.org <[email protected]>

On Tue, Oct 22, 2024 at 3:02 PM Greg Sabino Mullane <[email protected]>
wrote:
[snip]

> * Don't use the "FROM table1, table2, table3" syntax but use "FROM table1
> JOIN table2 ON (...) JOIN table3 ON (...)
>

Why?

-- 
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> crustacean!


^ permalink  raw  reply  [nested|flat] 9+ messages in thread

* Re: Query performance issue
@ 2024-10-22 20:06  David G. Johnston <[email protected]>
  parent: Ron Johnson <[email protected]>
  0 siblings, 1 reply; 9+ messages in thread

From: David G. Johnston @ 2024-10-22 20:06 UTC (permalink / raw)
  To: Ron Johnson <[email protected]>; +Cc: pgsql-generallists.postgresql.org <[email protected]>

On Tue, Oct 22, 2024 at 1:01 PM Ron Johnson <[email protected]> wrote:

> On Tue, Oct 22, 2024 at 3:02 PM Greg Sabino Mullane <[email protected]>
> wrote:
> [snip]
>
>> * Don't use the "FROM table1, table2, table3" syntax but use "FROM table1
>> JOIN table2 ON (...) JOIN table3 ON (...)
>>
>
> Why?
>
>
Readability is improved when done correctly; and you need to know it for
outer joins anyway, so consistency.

David J.


^ permalink  raw  reply  [nested|flat] 9+ messages in thread

* Re: Query performance issue
@ 2024-10-22 20:36  yudhi s <[email protected]>
  parent: Greg Sabino Mullane <[email protected]>
  1 sibling, 1 reply; 9+ messages in thread

From: yudhi s @ 2024-10-22 20:36 UTC (permalink / raw)
  To: Greg Sabino Mullane <[email protected]>; [email protected]; +Cc: [email protected]; [email protected]

On Wed, Oct 23, 2024 at 12:32 AM Greg Sabino Mullane <[email protected]>
wrote:

> To be frank, there is so much wrong with this query that it is hard to
> know where to start. But a few top items:
>
> * Make sure all of the tables involved have been analyzed. You might want
> to bump default_statistics_target up and see if that helps.
>
> * As mentioned already, increase work_mem, as you have things spilling to
> disk (e.g. external merge Disk: 36280kB)
>
> * Don't use the "FROM table1, table2, table3" syntax but use "FROM table1
> JOIN table2 ON (...) JOIN table3 ON (...)
>
> * Try not to use subselects. Things like WHERE x IN (SELECT ...) are
> expensive and hard to optimize.
>
> * You have useless GROUP BY clauses in there. Remove to simplify the query
>
> * There is no LIMIT. Does the client really need all 135,214 rows?
>
>
>
I tried running the query by removing both the "group by" from the inner
subqueries (I think the initial thought was that they will give distinct
records to the outer query and will thus help), and added limit 500 at the
last and also set the work_mem to 2GB for that session before running the
query. But seeing the response increased to ~5 seconds (from ~3.1 seconds
earlier). Below I have updated the execution plan for the same at the last
section.

https://gist.github.com/databasetech0073/746353a9e76d5e29b2fc6abdc80cdef8

Again , not able to clearly understand the third point you said below. Can
you please clarify a bit more. Do you mean we should write it as exists
/not exists rather IN and NOT IN and that will improve the performance? I
hope the third point doesn't matter much as we have all equijoin used here.
Correct me if I'm wrong.

*"Try not to use subselects. Things like WHERE x IN (SELECT ...) are
expensive and hard to optimize*."

Additionally in the plan which mysql makes and showing the highest response
time, is it suffering because of differences of the speed of the underlying
IO/storage or is it just because of the optimization features which are
available in postgres and not there in mysql ? Trying to understand if it
can be identified from the execution plan itself.

^ permalink  raw  reply  [nested|flat] 9+ messages in thread

* Re: Query performance issue
@ 2024-10-22 20:47  Laurenz Albe <[email protected]>
  parent: David G. Johnston <[email protected]>
  0 siblings, 0 replies; 9+ messages in thread

From: Laurenz Albe @ 2024-10-22 20:47 UTC (permalink / raw)
  To: David G. Johnston <[email protected]>; Ron Johnson <[email protected]>; +Cc: pgsql-generallists.postgresql.org <[email protected]>

On Tue, 2024-10-22 at 13:06 -0700, David G. Johnston wrote:
> On Tue, Oct 22, 2024 at 1:01 PM Ron Johnson <[email protected]> wrote:
> > On Tue, Oct 22, 2024 at 3:02 PM Greg Sabino Mullane <[email protected]> wrote:
> > > * Don't use the "FROM table1, table2, table3" syntax but use "FROM table1 JOIN table2 ON (...) JOIN table3 ON (...)
> > 
> > Why?
> 
> Readability is improved when done correctly; and you need to know it for outer joins anyway, so consistency.

Also, you cannot accidentally forget a join condition.

Yours,
Laurenz Albe






^ permalink  raw  reply  [nested|flat] 9+ messages in thread

* Re: Query performance issue
@ 2024-10-22 21:23  yudhi s <[email protected]>
  parent: yudhi s <[email protected]>
  0 siblings, 1 reply; 9+ messages in thread

From: yudhi s @ 2024-10-22 21:23 UTC (permalink / raw)
  To: Greg Sabino Mullane <[email protected]>; [email protected]; +Cc: [email protected]; [email protected]

On Wed, Oct 23, 2024 at 2:06 AM yudhi s <[email protected]> wrote:

>
>
> On Wed, Oct 23, 2024 at 12:32 AM Greg Sabino Mullane <[email protected]>
> wrote:
>
>> To be frank, there is so much wrong with this query that it is hard to
>> know where to start. But a few top items:
>>
>> * Make sure all of the tables involved have been analyzed. You might want
>> to bump default_statistics_target up and see if that helps.
>>
>> * As mentioned already, increase work_mem, as you have things spilling to
>> disk (e.g. external merge Disk: 36280kB)
>>
>> * Don't use the "FROM table1, table2, table3" syntax but use "FROM table1
>> JOIN table2 ON (...) JOIN table3 ON (...)
>>
>> * Try not to use subselects. Things like WHERE x IN (SELECT ...) are
>> expensive and hard to optimize.
>>
>> * You have useless GROUP BY clauses in there. Remove to simplify the query
>>
>> * There is no LIMIT. Does the client really need all 135,214 rows?
>>
>>
>>
> I tried running the query by removing both the "group by" from the inner
> subqueries (I think the initial thought was that they will give distinct
> records to the outer query and will thus help), and added limit 500 at the
> last and also set the work_mem to 2GB for that session before running the
> query. But seeing the response increased to ~5 seconds (from ~3.1 seconds
> earlier). Below I have updated the execution plan for the same at the last
> section.
>
> https://gist.github.com/databasetech0073/746353a9e76d5e29b2fc6abdc80cdef8
>
> Again , not able to clearly understand the third point you said below. Can
> you please clarify a bit more. Do you mean we should write it as exists
> /not exists rather IN and NOT IN and that will improve the performance? I
> hope the third point doesn't matter much as we have all equijoin used here.
> Correct me if I'm wrong.
>
> *"Try not to use subselects. Things like WHERE x IN (SELECT ...) are
> expensive and hard to optimize*."
>
> Additionally in the plan which mysql makes and showing the highest
> response time, is it suffering because of differences of the speed of the
> underlying IO/storage or is it just because of the optimization features
> which are available in postgres and not there in mysql ? Trying to
> understand if it can be identified from the execution plan itself.
>

Additionally, I see below differences, when I compare two specific lines of
the plan which fetches data using the same index in both posgres and mysql,
so does it point that the speed in mysql is slower as compared to postgres.
And thus is it possible that the underlying infrastructure is playing a
role here in the mysql slowness too and that might be the key one here to
first address for mysql
OR its any optimization feature which helps postgres to give it an edge in
performance here over mysql?

In mysql plan:-
-> Index lookup on EX_STS using EX_STS_INDEX (AID=b3.AID, RC_ID=b3.RC_ID,
RC_VNB=b3.RC_VNB)  (cost=0.43 rows=2) (actual time=0.014..0.021 rows=2
loops=70904)
VS
In postgres plan:-
-> Index Scan using EX_STS_INDEX on RCE_STS EX_STS  (cost=0.42..0.82 rows=1
width=424) (actual time=0.006..0.007 rows=2 loops=70904)

**************

In mysql plan
-> Covering index lookup on mns using M_INF_AID_index
(AID='XXXXXXXXXXXXXXXXXXX')  (cost=9187.54 rows=72748) (actual
time=0.058..19.637 rows=35980 loops=1)
                                                            -> Filter:
(RNS.ASID = 'XXXXXXXXXXXXXXXXXXX')  (cost=43.50 rows=42) (actual
time=0.042..0.218 rows=97 loops=1)
VS
In postgres plan
->  Bitmap Index Scan on M_INF_AID_index  (cost=0.00..406.98 rows=36074
width=0) (actual time=0.790..0.790 rows=35980 loops=1)
                            Index Cond: ((AID)::text =
'XXXXXXXXXXXXXXXXXXX'::text)


^ permalink  raw  reply  [nested|flat] 9+ messages in thread

* Re: Query performance issue
@ 2024-10-24 14:01  Greg Sabino Mullane <[email protected]>
  parent: yudhi s <[email protected]>
  0 siblings, 0 replies; 9+ messages in thread

From: Greg Sabino Mullane @ 2024-10-24 14:01 UTC (permalink / raw)
  To: yudhi s <[email protected]>; +Cc: [email protected]; [email protected]; [email protected]

>
> Additionally in the plan which mysql makes and showing the highest
> response time, is it suffering because of differences of the speed of the
> underlying IO/storage or is it just because of the optimization features
> which are available in postgres and not there in mysql ? Trying to
> understand if it can be identified from the execution plan itself.
>

I think trying to compare postgres and mysql plans against each other is
not a very useful endeavor. There are fundamental design decisions between
the two. Focus on making your Postgres query the best it can be, full stop.
Optimize your queries, make sure the database is analyzed, and tweak some
configs as needed.

Also, you cannot accidentally forget a join condition.

Yes, this is the primary reason. Cartesian joins hurt.

Again , not able to clearly understand the third point you said below. Can
> you please clarify a bit more. Do you mean we should write it as exists
> /not exists rather IN and NOT IN and that will improve the performance?

It gives Postgres more options on how to do things, so yes, it can be
better.

Cheers,
Greg

^ permalink  raw  reply  [nested|flat] 9+ messages in thread

end of thread, other threads:[~2024-10-24 14:01 UTC | newest]

Thread overview: 9+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2024-10-22 05:31 Re: Query performance issue yudhi s <[email protected]>
2024-10-22 16:00 ` Jeff Ross <[email protected]>
2024-10-22 19:01 ` Greg Sabino Mullane <[email protected]>
2024-10-22 20:00   ` Ron Johnson <[email protected]>
2024-10-22 20:06     ` David G. Johnston <[email protected]>
2024-10-22 20:47       ` Laurenz Albe <[email protected]>
2024-10-22 20:36   ` yudhi s <[email protected]>
2024-10-22 21:23     ` yudhi s <[email protected]>
2024-10-24 14:01       ` Greg Sabino Mullane <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox