public inbox for [email protected]
help / color / mirror / Atom feedFrom: Siraj G <[email protected]>
To: Tom Lane <[email protected]>
Cc: [email protected]
Subject: Re: Problem with a Query
Date: Mon, 26 Aug 2024 15:59:57 +0530
Message-ID: <CAC5iy62+3TSUo_wys0izZsN=LL2SqQMwj_NiYY4SOq9_hrd=KA@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <CAC5iy610q5vrvmcU88CTEP7367CJVYagDmpYnKZ9G08uQf9ZUg@mail.gmail.com>
<[email protected]>
Thanks Tom. Collecting full stats on the tables involved corrected the
execution.
On Tue, Aug 13, 2024 at 9:57 AM Tom Lane <[email protected]> wrote:
> Siraj G <[email protected]> writes:
> > We migrated a PgSQL database from Cloud SQL to compute engine and since
> > then there is a SQL we observed taking a long time. After some study, I
> > found that the SQL is using NESTED LOOP where the cost is too high.
>
> The core of your problem seems to be here:
>
> > -> Index Scan using marketing_a_cancel__55ffff_idx
> on
> > marketing_app_leadhistory w0 (cost=0.57..4274.30 rows=1 width=8) (actual
> > time=46.678..51.232 rows=44 loops=1)
> > Index Cond: ((cancel_event_id IS NOT NULL) AND
> > (cancel_event_type = 1))
> > Filter: ((status_id = 93) AND
> > ((followup_date)::date >= '2024-08-01'::date) AND ((followup_date)::date
> <=
> > '2024-08-07'::date))
> > Rows Removed by Filter: 22268
> > Buffers: shared hit=9170 read=19
>
> If the planner had estimated 40-some rows out of this step, rather
> than one, it would certainly not have chosen to use nestloop joins
> atop this. So the big problem to focus on is making that estimate
> better.
>
> A secondary problem is that the choice of index seems poor: the
> index itself is selecting 44+22268 = 22312 rows and then the filter
> condition is throwing away 99.8% of those rows. Probably, using
> an index on (status_id, followup_date) would have worked better.
>
> I suspect that both of these things are tied to the non-normalization
> of your "cancel" condition. The planner probably believes that
> "cancel_event_id IS NOT NULL" is statistically independent of
> "cancel_event_type = 1"; but I'll bet it isn't, and thus the index
> condition selects many more rows than the planner guessed. You might
> be able to improve that estimate by creating extended stats on both of
> those columns, but really a better idea would be to take a step back
> and figure out if those two columns can't be merged into one.
>
> regards, tom lane
>
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected]
Subject: Re: Problem with a Query
In-Reply-To: <CAC5iy62+3TSUo_wys0izZsN=LL2SqQMwj_NiYY4SOq9_hrd=KA@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox