Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1siZfw-006Tba-QW for pgsql-general@arkaria.postgresql.org; Mon, 26 Aug 2024 13:22:28 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1siZfu-00A6W6-FU for pgsql-general@arkaria.postgresql.org; Mon, 26 Aug 2024 13:22:27 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1siZfu-00A6Vy-3a for pgsql-general@lists.postgresql.org; Mon, 26 Aug 2024 13:22:26 +0000 Received: from mail-oo1-xc29.google.com ([2607:f8b0:4864:20::c29]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1siZfr-001abw-8i for pgsql-general@lists.postgresql.org; Mon, 26 Aug 2024 13:22:26 +0000 Received: by mail-oo1-xc29.google.com with SMTP id 006d021491bc7-5d5b1e33fa8so3203356eaf.3 for ; Mon, 26 Aug 2024 06:22:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1724678543; x=1725283343; darn=lists.postgresql.org; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :from:to:cc:subject:date:message-id:reply-to; bh=ZvL5PB4t61yPxZxVU1dLwScuRGxN+KyOk7HriyKbaBs=; b=FrlCYCM0RaipKi4QV7MK862LtgqlqQVScy7eZVSpYmsJJKY3w+ZgGOoXQFZJuOzvCx SWDiM1+9319vkAm7Usj8BjubgR63CRHvtQNGBzHzLaC+GOF5j2T/trKat27ILBxghDCm 6TJ1i7qe0ZwfnXo9iDtqvFCpzrVMVrHVJ09j5tDqI6LxjroYx1qsnQ3hwmDXKR5oOTrJ k4DEe+gU823/gbyMDsVLEvv47/eXksmvUg2kiM8JcM1MQA7FSiJ5yqhldZkSMFw1JopC 5l1Ixaxp4CTbuzDTYRnvSdyrv2TJnS8ZVXEr4jX4wUAkScNuVd7QZzdjqlGhIM8k418b eaSA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1724678543; x=1725283343; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=ZvL5PB4t61yPxZxVU1dLwScuRGxN+KyOk7HriyKbaBs=; b=ERJ/Z62I2o9EUu8j+otHfHW1Bma5HlvqKB+eubYxsvvaN0RlBE94HrRQPHWYGwSh3I FofEBlXWGkdMbGD0HGe+iRIM2j6fQD7Cjy5SIR0owyCNMlV/G9Zv04+Voho4WyG5zfa9 pa4bJAsE9H6/iZlpqhwlQby5uXNRs1+0D45AxhoVyCAFPXT5hYOTjzrm5dhDYtzqwTtu sjnz+MiZRozUY1LfrpebxhjL1mn3j3dGNEFIfEX0wy/d/xNkNZ/n+DuRse4+yeO87F/6 lHEEo39d7PtZ3CFzZBbAfOKSwr1ItI9yyPD3Ws51S8xy9P0bn/nhLxwcReynt22UOry4 zsKQ== X-Gm-Message-State: AOJu0YwQW74714Ka6FN8vde1M9AA4DL3N8ixVKEkowIWhXjhXC5N5gAN IQJMJRJk/tDfaS9JbOmnYCTZXqWv8DzovZ/Q5dV6rjQegMuT5NbtcDiY0JOC1qGjQ1rGMLf8Pw8 C6ZIHs27Q7ma7vcTjsbTxMmbhMSJxgafY X-Google-Smtp-Source: AGHT+IEmBeCsTVMLgKd2MIMH5VLpObOqRjw1UTVWroE9SJ/f6jrlVBXD9M533DgwOWsZ3UTdAKZnRB9iMBkplL24vLo= X-Received: by 2002:a4a:e8c2:0:b0:5dc:f89e:d77f with SMTP id 006d021491bc7-5dcf89ed7d5mr4774899eaf.7.1724678542749; Mon, 26 Aug 2024 06:22:22 -0700 (PDT) MIME-Version: 1.0 References: <1155364.1723523231@sss.pgh.pa.us> In-Reply-To: From: Ron Johnson Date: Mon, 26 Aug 2024 09:22:11 -0400 Message-ID: Subject: Re: Problem with a Query To: "pgsql-generallists.postgresql.org" Content-Type: multipart/alternative; boundary="000000000000e478810620960357" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000e478810620960357 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Aggressive autoanalyze and autovacuum settings solve most query problems. These are my settings: default_statistics_target =3D 5000 autovacuum_vacuum_scale_factor =3D 0.015 autovacuum_vacuum_threshold =3D 250 autovacuum_analyze_scale_factor =3D 0.015 autovacuum_analyze_threshold =3D 250 Such a high default_statistics_target value is controversial, but works for our databases, and resetting it to 100 doesn't noticably speed up slow parse/optimize on queries that take a long time to parse/optimize any more than the 5000 value. On Mon, Aug 26, 2024 at 6:30=E2=80=AFAM Siraj G wrote= : > Thanks Tom. Collecting full stats on the tables involved corrected the > execution. > > On Tue, Aug 13, 2024 at 9:57=E2=80=AFAM Tom Lane wrot= e: > >> Siraj G writes: >> > We migrated a PgSQL database from Cloud SQL to compute engine and sinc= e >> > then there is a SQL we observed taking a long time. After some study, = I >> > found that the SQL is using NESTED LOOP where the cost is too high. >> >> The core of your problem seems to be here: >> >> > -> Index Scan using >> marketing_a_cancel__55ffff_idx on >> > marketing_app_leadhistory w0 (cost=3D0.57..4274.30 rows=3D1 width=3D8= ) >> (actual >> > time=3D46.678..51.232 rows=3D44 loops=3D1) >> > Index Cond: ((cancel_event_id IS NOT NULL) >> AND >> > (cancel_event_type =3D 1)) >> > Filter: ((status_id =3D 93) AND >> > ((followup_date)::date >=3D '2024-08-01'::date) AND >> ((followup_date)::date <=3D >> > '2024-08-07'::date)) >> > Rows Removed by Filter: 22268 >> > Buffers: shared hit=3D9170 read=3D19 >> >> If the planner had estimated 40-some rows out of this step, rather >> than one, it would certainly not have chosen to use nestloop joins >> atop this. So the big problem to focus on is making that estimate >> better. >> >> A secondary problem is that the choice of index seems poor: the >> index itself is selecting 44+22268 =3D 22312 rows and then the filter >> condition is throwing away 99.8% of those rows. Probably, using >> an index on (status_id, followup_date) would have worked better. >> >> I suspect that both of these things are tied to the non-normalization >> of your "cancel" condition. The planner probably believes that >> "cancel_event_id IS NOT NULL" is statistically independent of >> "cancel_event_type =3D 1"; but I'll bet it isn't, and thus the index >> condition selects many more rows than the planner guessed. You might >> be able to improve that estimate by creating extended stats on both of >> those columns, but really a better idea would be to take a step back >> and figure out if those two columns can't be merged into one. >> >> regards, tom lane >> > --=20 Death to America, and butter sauce. Iraq lobster! --000000000000e478810620960357 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable

Aggressive=C2=A0autoanalyz= e=C2=A0and autovacuum settings solve most query problems.=C2=A0 These are m= y settings:
default_statistics_target = =3D 5000
autovacuum_vacuum_sc= ale_factor =3D 0.015
autovacuum_vacuum_threshold =3D 250
autovacuum_a= nalyze_scale_factor =3D 0.015
autovacuum_analyze_threshold =3D 250

Such a high default_statistics_target value = is controversial, but works for our databases, and resetting it to 100 does= n't noticably speed up slow parse/optimize on queries that take a long = time to parse/optimize any more than the 5000 value.

On Mon, = Aug 26, 2024 at 6:30=E2=80=AFAM Siraj G <tosiraj.g@gmail.com> wrote:
Thanks Tom. Collecting full sta= ts on the tables involved corrected=C2=A0the execution.

On Tue, Aug 13, 2024= at 9:57=E2=80=AFAM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Siraj G <tosiraj.g@gmail.com> writes:
> We migrated a PgSQL database from Cloud SQL to compute engine and sinc= e
> then there is a SQL we observed taking a long time. After some study, = I
> found that the SQL is using NESTED LOOP where the cost is too high.
The core of your problem seems to be here:

>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 ->=C2=A0 Index Scan using marketing_a_cancel__55ffff_idx on
> marketing_app_leadhistory w0=C2=A0 (cost=3D0.57..4274.30 rows=3D1 widt= h=3D8) (actual
> time=3D46.678..51.232 rows=3D44 loops=3D1)
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 Index Cond: ((cancel_event_id IS NOT NULL) AND<= br> > (cancel_event_type =3D 1))
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 Filter: ((status_id =3D 93) AND
> ((followup_date)::date >=3D '2024-08-01'::date) AND ((follo= wup_date)::date <=3D
> '2024-08-07'::date))
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 Rows Removed by Filter: 22268
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 Buffers: shared hit=3D9170 read=3D19

If the planner had estimated 40-some rows out of this step, rather
than one, it would certainly not have chosen to use nestloop joins
atop this.=C2=A0 So the big problem to focus on is making that estimate
better.

A secondary problem is that the choice of index seems poor: the
index itself is selecting 44+22268 =3D 22312 rows and then the filter
condition is throwing away 99.8% of those rows.=C2=A0 Probably, using
an index on (status_id, followup_date) would have worked better.

I suspect that both of these things are tied to the non-normalization
of your "cancel" condition.=C2=A0 The planner probably believes t= hat
"cancel_event_id IS NOT NULL" is statistically independent of
"cancel_event_type =3D 1"; but I'll bet it isn't, and thu= s the index
condition selects many more rows than the planner guessed.=C2=A0 You might<= br> be able to improve that estimate by creating extended stats on both of
those columns, but really a better idea would be to take a step back
and figure out if those two columns can't be merged into one.

=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 regards, tom lane


--
Death to America, and butter sauce.
Iraq lobster!
<= /div>
--000000000000e478810620960357--