MIME-Version: 1.0
From: Malay Keshav <malay.keshav@databricks.com>
Date: Thu, 29 Sep 2022 11:33:54 -0700
Message-ID: 
 <CAJzqzvo+zATPZTtmFK0gjRQUftOuyMr6mB_bT_Pk+T2QNiSdrw@mail.gmail.com>
Subject: [Bug][Ver 11]: Generic query plan selected is worse than custom query
 plan
To: pgsql-bugs@lists.postgresql.org, pgsql-sql@lists.postgresql.org
Cc: "malay.keshav@gmail.com" <malay.keshav@gmail.com>
Content-Type: multipart/alternative; boundary="0000000000004ef49005e9d51f78"
Archived-At: 
 <https://www.postgresql.org/message-id/CAJzqzvo%2BzATPZTtmFK0gjRQUftOuyMr6mB_bT_Pk%2BT2QNiSdrw%40mail.gmail.com>
Precedence: bulk

--0000000000004ef49005e9d51f78
Content-Type: text/plain; charset="UTF-8"

Hi,

We run Postgres 11.13 for our company's critical database. Recently, after
adding an index to a table, we saw a significant degradation in one
specific query's execution time.

We found that Postgres 11 caches a generic execution plan for a
parameterized query on the 6th execution of the query, based on a
heuristic comparison between the generic plan and the custom plans for
that query.
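
To make that description concrete, here is a simplified sketch of the
heuristic as we read it, written in Python for illustration only. The
function name, its arguments, and the constant are our own hypothetical
names, not Postgres identifiers, and the comparison on planner cost
estimates (rather than measured run times) is our assumption about how
version 11 behaves.

```python
from typing import Sequence

# Assumption: number of executions that get a custom plan before the
# generic plan is even considered.
CUSTOM_PLAN_TRIES = 5


def use_custom_plan(custom_costs: Sequence[float], generic_cost: float) -> bool:
    """Return True if the next execution should be planned with a custom plan.

    custom_costs holds the planner's cost estimates for the custom plans
    built so far; generic_cost is the estimate for the generic plan.
    These are estimates, not measured execution times.
    """
    if len(custom_costs) < CUSTOM_PLAN_TRIES:
        # Too few samples: keep building custom plans.
        return True
    avg_custom_cost = sum(custom_costs) / len(custom_costs)
    # Switch to the generic plan only if its estimate is cheaper than
    # the average custom-plan estimate.
    return generic_cost >= avg_custom_cost


print(use_custom_plan([120.0] * 4, 100.0))  # fewer than 5 custom plans
print(use_custom_plan([120.0] * 5, 100.0))  # generic estimate is cheaper
```

Under these assumptions the first call prints True and the second prints
False, which matches the switch we observe on the 6th execution.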

In our particular case, the Postgres engine decided to pick the generic
query plan and cache it for all further calls of that query. My
understanding was that the generic query plan would only be selected if it
had a better execution time than the custom query plan, which is not true
in our case.

We were able to reproduce this deterministically, using the same query
parameters to trigger the engine to pick the bad generic query plan on the
6th run (the first 5 runs show the engine using the efficient query plan).
Why does the engine pick the generic query plan when its execution time is
worse than that of the custom query plan? Is this a bug?

We have run VACUUM ANALYZE, created new tables from the existing data,
etc., but the problem persists. Oddly, this happens in only one of our
many deployed regions, which suggests it is related to that region's data
distribution. We were also able to trick the Postgres engine into not
caching the generic plan and always using the custom query plan on each
execution. We did this by formulating a query that, on the 6th execution,
triggers the heuristic to pick the custom plan. However, finding such a
query for each of the hundreds of queries we run against the database is
neither scalable nor practical.

What are our options other than upgrading to Postgres 12, which provides a
configuration setting (plan_cache_mode) to force a custom query plan on
every execution?

All best,
Malay Keshav


--0000000000004ef49005e9d51f78--