Re: Problem with a Query

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Tom Lane <[email protected]>
To: Siraj G <[email protected]>
Cc: [email protected]
Subject: Re: Problem with a Query
Date: Tue, 13 Aug 2024 00:27:11 -0400
Message-ID: <[email protected]> (raw)
In-Reply-To: <CAC5iy610q5vrvmcU88CTEP7367CJVYagDmpYnKZ9G08uQf9ZUg@mail.gmail.com>
References: <CAC5iy610q5vrvmcU88CTEP7367CJVYagDmpYnKZ9G08uQf9ZUg@mail.gmail.com>

Siraj G <[email protected]> writes:
> We migrated a PgSQL database from Cloud SQL to compute engine and since
> then there is a SQL we observed taking a long time. After some study, I
> found that the SQL is using NESTED LOOP where the cost is too high.

The core of your problem seems to be here:

>                      ->  Index Scan using marketing_a_cancel__55ffff_idx on
> marketing_app_leadhistory w0  (cost=0.57..4274.30 rows=1 width=8) (actual
> time=46.678..51.232 rows=44 loops=1)
>                            Index Cond: ((cancel_event_id IS NOT NULL) AND
> (cancel_event_type = 1))
>                            Filter: ((status_id = 93) AND
> ((followup_date)::date >= '2024-08-01'::date) AND ((followup_date)::date <=
> '2024-08-07'::date))
>                            Rows Removed by Filter: 22268
>                            Buffers: shared hit=9170 read=19

If the planner had estimated 40-some rows out of this step, rather
than one, it would certainly not have chosen to use nestloop joins
atop this.  So the big problem to focus on is making that estimate
better.

A secondary problem is that the choice of index seems poor: the
index itself is selecting 44+22268 = 22312 rows and then the filter
condition is throwing away 99.8% of those rows.  Probably, using
an index on (status_id, followup_date) would have worked better.

I suspect that both of these things are tied to the non-normalization
of your "cancel" condition.  The planner probably believes that
"cancel_event_id IS NOT NULL" is statistically independent of
"cancel_event_type = 1"; but I'll bet it isn't, and thus the index
condition selects many more rows than the planner guessed.  You might
be able to improve that estimate by creating extended stats on both of
those columns, but really a better idea would be to take a step back
and figure out if those two columns can't be merged into one.

			regards, tom lane

view thread (2+ messages)

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: Problem with a Query
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox