Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1siWzn-005lvM-Q8 for pgsql-general@arkaria.postgresql.org; Mon, 26 Aug 2024 10:30:47 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1siWzl-007t7P-Vg for pgsql-general@arkaria.postgresql.org; Mon, 26 Aug 2024 10:30:46 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1siWzE-007olI-Ml for pgsql-general@lists.postgresql.org; Mon, 26 Aug 2024 10:30:13 +0000 Received: from mail-ed1-x52f.google.com ([2a00:1450:4864:20::52f]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1siWzB-001Z5v-Ir for pgsql-general@lists.postgresql.org; Mon, 26 Aug 2024 10:30:12 +0000 Received: by mail-ed1-x52f.google.com with SMTP id 4fb4d7f45d1cf-5bf006f37daso6836394a12.1 for ; Mon, 26 Aug 2024 03:30:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1724668210; x=1725273010; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=yKlyQ3UkchtjSUi/Cdi7c7KGeGLxmxC7vvr2tGvsDfQ=; b=bRZSxcWYOxPEdwxgQxXK2otbIRxb3a49YMq+Jajg5wu4gvVXJIN/Dwk9mVadBJNrxp tfFYWZvOPqaQwJmlMzctPuwzCuCVfldalHIQHjJbHIU+ilivzgSYGvLldr3qFEfXQvsc qduXF9CB+GgCQcAenDqeveIJ4sduEQBX7JkEsElKk9Fz4yhC52aZzZvgnkZCf5P9CYdz F+P8jzHgcI1QJ4CZrp+TMeI9ujCoETPZnHHj8khT/xOORITsbZ7eqK5SMgxhRTa7mBEC zL9EXeD502UTWUxE54FU+cjmAdrQJaUcyq23fjwPbr8Fvvuaerc7WJpxKjOrZGvzd3cE +h0Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1724668210; x=1725273010; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=yKlyQ3UkchtjSUi/Cdi7c7KGeGLxmxC7vvr2tGvsDfQ=; b=ouS7qR0dMDwI94n1ViBFDBIaGe/+6PdcGI5ic3NNKyuha5vFzTBxeJ8C4zGu0H8f4J HSHbFZDSZITzEkcCgvAhj89cRMio1OJ+/wwnzhD4Pzy8QvjVgFRMdLDrbKvlsX3HShV+ pj7awUblBhfJsPQ4JgAQ4QO9ropMXcquF1NQBS5XISkfgK1vUiaNYONPYFKyxBxmEkda SDGwxjme022oNIBn8Gghyhv++Mew8MU7AGjxvhs+1kXEbd3hQVmrcPAYejDbu9c7UN83 KqSweux5KRHaVHTO8sq7S5S4aK60awXO4tng54msfBGKsZSw0rCq1tG7lq/GelJac929 MvBA== X-Gm-Message-State: AOJu0YywS7YzcppEDF/vbKOU6peGThalnC6cTx1ioa5spRqVQPJVd9Fm z+HB2VEiOfP+LMYDq4Ys6IBQjGkPnzYFxtV1Kt4jol2vfhWTEwv0xwGnxn18a+mWu4Tl8eP+ceo E+y1kD0IVhwdJAaaGpMK3Hyhhh5c= X-Google-Smtp-Source: AGHT+IHF1cXZRycs73Fr8aJtdV1CM7roNURmZtGTIfdFzgM4X3DNNOgNFIgNr/FQeQIGkaRz9V7dRh4mn+q9XfjXE4Y= X-Received: by 2002:a05:6402:2345:b0:5be:e9f8:9ba4 with SMTP id 4fb4d7f45d1cf-5c088114571mr11182721a12.4.1724668209366; Mon, 26 Aug 2024 03:30:09 -0700 (PDT) MIME-Version: 1.0 References: <1155364.1723523231@sss.pgh.pa.us> In-Reply-To: <1155364.1723523231@sss.pgh.pa.us> From: Siraj G Date: Mon, 26 Aug 2024 15:59:57 +0530 Message-ID: Subject: Re: Problem with a Query To: Tom Lane Cc: pgsql-general@lists.postgresql.org Content-Type: multipart/alternative; boundary="000000000000f98d500620939b19" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000f98d500620939b19 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Thanks Tom. Collecting full stats on the tables involved corrected the execution. On Tue, Aug 13, 2024 at 9:57=E2=80=AFAM Tom Lane wrote: > Siraj G writes: > > We migrated a PgSQL database from Cloud SQL to compute engine and since > > then there is a SQL we observed taking a long time. After some study, I > > found that the SQL is using NESTED LOOP where the cost is too high. > > The core of your problem seems to be here: > > > -> Index Scan using marketing_a_cancel__55ffff_id= x > on > > marketing_app_leadhistory w0 (cost=3D0.57..4274.30 rows=3D1 width=3D8)= (actual > > time=3D46.678..51.232 rows=3D44 loops=3D1) > > Index Cond: ((cancel_event_id IS NOT NULL) A= ND > > (cancel_event_type =3D 1)) > > Filter: ((status_id =3D 93) AND > > ((followup_date)::date >=3D '2024-08-01'::date) AND ((followup_date)::d= ate > <=3D > > '2024-08-07'::date)) > > Rows Removed by Filter: 22268 > > Buffers: shared hit=3D9170 read=3D19 > > If the planner had estimated 40-some rows out of this step, rather > than one, it would certainly not have chosen to use nestloop joins > atop this. So the big problem to focus on is making that estimate > better. > > A secondary problem is that the choice of index seems poor: the > index itself is selecting 44+22268 =3D 22312 rows and then the filter > condition is throwing away 99.8% of those rows. Probably, using > an index on (status_id, followup_date) would have worked better. > > I suspect that both of these things are tied to the non-normalization > of your "cancel" condition. The planner probably believes that > "cancel_event_id IS NOT NULL" is statistically independent of > "cancel_event_type =3D 1"; but I'll bet it isn't, and thus the index > condition selects many more rows than the planner guessed. You might > be able to improve that estimate by creating extended stats on both of > those columns, but really a better idea would be to take a step back > and figure out if those two columns can't be merged into one. > > regards, tom lane > --000000000000f98d500620939b19 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Thanks Tom. Collecting full stats on the tables involved c= orrected=C2=A0the execution.

On Tue, Aug 13, 2024 at 9:57=E2=80=AFAM Tom L= ane <tgl@sss.pgh.pa.us> wrot= e:
Siraj G <<= a href=3D"mailto:tosiraj.g@gmail.com" target=3D"_blank">tosiraj.g@gmail.com= > writes:
> We migrated a PgSQL database from Cloud SQL to compute engine and sinc= e
> then there is a SQL we observed taking a long time. After some study, = I
> found that the SQL is using NESTED LOOP where the cost is too high.
The core of your problem seems to be here:

>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 ->=C2=A0 Index Scan using marketing_a_cancel__55ffff_idx on
> marketing_app_leadhistory w0=C2=A0 (cost=3D0.57..4274.30 rows=3D1 widt= h=3D8) (actual
> time=3D46.678..51.232 rows=3D44 loops=3D1)
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 Index Cond: ((cancel_event_id IS NOT NULL) AND<= br> > (cancel_event_type =3D 1))
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 Filter: ((status_id =3D 93) AND
> ((followup_date)::date >=3D '2024-08-01'::date) AND ((follo= wup_date)::date <=3D
> '2024-08-07'::date))
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 Rows Removed by Filter: 22268
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 Buffers: shared hit=3D9170 read=3D19

If the planner had estimated 40-some rows out of this step, rather
than one, it would certainly not have chosen to use nestloop joins
atop this.=C2=A0 So the big problem to focus on is making that estimate
better.

A secondary problem is that the choice of index seems poor: the
index itself is selecting 44+22268 =3D 22312 rows and then the filter
condition is throwing away 99.8% of those rows.=C2=A0 Probably, using
an index on (status_id, followup_date) would have worked better.

I suspect that both of these things are tied to the non-normalization
of your "cancel" condition.=C2=A0 The planner probably believes t= hat
"cancel_event_id IS NOT NULL" is statistically independent of
"cancel_event_type =3D 1"; but I'll bet it isn't, and thu= s the index
condition selects many more rows than the planner guessed.=C2=A0 You might<= br> be able to improve that estimate by creating extended stats on both of
those columns, but really a better idea would be to take a step back
and figure out if those two columns can't be merged into one.

=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 regards, tom lane
--000000000000f98d500620939b19--