Re: Problem with a Query

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Siraj G <[email protected]>
To: Tom Lane <[email protected]>
Cc: [email protected]
Subject: Re: Problem with a Query
Date: Mon, 26 Aug 2024 15:59:57 +0530
Message-ID: <CAC5iy62+3TSUo_wys0izZsN=LL2SqQMwj_NiYY4SOq9_hrd=KA@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <CAC5iy610q5vrvmcU88CTEP7367CJVYagDmpYnKZ9G08uQf9ZUg@mail.gmail.com>
	<[email protected]>

Thanks Tom. Collecting full stats on the tables involved corrected the
execution.

On Tue, Aug 13, 2024 at 9:57 AM Tom Lane <[email protected]> wrote:

> Siraj G <[email protected]> writes:
> > We migrated a PgSQL database from Cloud SQL to compute engine and since
> > then there is a SQL we observed taking a long time. After some study, I
> > found that the SQL is using NESTED LOOP where the cost is too high.
>
> The core of your problem seems to be here:
>
> >                      ->  Index Scan using marketing_a_cancel__55ffff_idx
> on
> > marketing_app_leadhistory w0  (cost=0.57..4274.30 rows=1 width=8) (actual
> > time=46.678..51.232 rows=44 loops=1)
> >                            Index Cond: ((cancel_event_id IS NOT NULL) AND
> > (cancel_event_type = 1))
> >                            Filter: ((status_id = 93) AND
> > ((followup_date)::date >= '2024-08-01'::date) AND ((followup_date)::date
> <=
> > '2024-08-07'::date))
> >                            Rows Removed by Filter: 22268
> >                            Buffers: shared hit=9170 read=19
>
> If the planner had estimated 40-some rows out of this step, rather
> than one, it would certainly not have chosen to use nestloop joins
> atop this.  So the big problem to focus on is making that estimate
> better.
>
> A secondary problem is that the choice of index seems poor: the
> index itself is selecting 44+22268 = 22312 rows and then the filter
> condition is throwing away 99.8% of those rows.  Probably, using
> an index on (status_id, followup_date) would have worked better.
>
> I suspect that both of these things are tied to the non-normalization
> of your "cancel" condition.  The planner probably believes that
> "cancel_event_id IS NOT NULL" is statistically independent of
> "cancel_event_type = 1"; but I'll bet it isn't, and thus the index
> condition selects many more rows than the planner guessed.  You might
> be able to improve that estimate by creating extended stats on both of
> those columns, but really a better idea would be to take a step back
> and figure out if those two columns can't be merged into one.
>
>                         regards, tom lane
>

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: Problem with a Query
  In-Reply-To: <CAC5iy62+3TSUo_wys0izZsN=LL2SqQMwj_NiYY4SOq9_hrd=KA@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox