MIME-Version: 1.0
References: <CAG-eXHJ+KbQ8_k-jKSGZU9V6HkLKU2Jqz7nYMYGhHuC-Zqm7qQ@mail.gmail.com>
 <CAGsyd8WqPEgoAkNO0Q7rpQpOWOZ-Z6wCM7xh5d6nXCxLH_GM_A@mail.gmail.com> <CAFL4M8EmboE4wXBHe2EMFcShxUAxXgWFa4TT-iVD2hJcHumetg@mail.gmail.com>
In-Reply-To: <CAFL4M8EmboE4wXBHe2EMFcShxUAxXgWFa4TT-iVD2hJcHumetg@mail.gmail.com>
From: David Mullineux <dmullx@gmail.com>
Date: Fri, 8 Nov 2024 13:39:02 +0000
Message-ID: <CAGsyd8X7U07UK8hjapwYBfbtK0KnMSxLtH6BFaxe1_i2=BR-+A@mail.gmail.com>
Subject: Re: Performance Issue with Hash Partition Query Execution in
 PostgreSQL 16
To: ravi k <ravisql09@gmail.com>
Cc: Ramakrishna m <ram.pgdb@gmail.com>, pgsql-general <pgsql-general@lists.postgresql.org>
Content-Type: multipart/alternative; boundary="0000000000005c87ed062666e0cd"
Archived-At: <https://www.postgresql.org/message-id/CAGsyd8X7U07UK8hjapwYBfbtK0KnMSxLtH6BFaxe1_i2%3DBR-%2BA%40mail.gmail.com>
Precedence: bulk

--0000000000005c87ed062666e0cd
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Just spotted a potential problem. The indexed column is a bigint. In your
prepared statement, are you passing a string or a bigint?
I notice your plan is doing an implicit type conversion when you run it
manually.
Sometimes the wrong parameter type will stop the planner from using the
index.
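
A rough way to check this, as a sketch only: the table and statement names
below (t_demo, by_bigint, by_numeric) are made up for illustration, and I am
assuming the driver might be binding the value as numeric rather than
bigint. Compare the two plans and look for a cast applied to the column
itself, e.g. (id)::numeric, in the second one.

```sql
-- Hypothetical two-partition stand-in for the real 16-partition table.
CREATE TABLE t_demo (id bigint PRIMARY KEY, payload text)
    PARTITION BY HASH (id);
CREATE TABLE t_demo_p0 PARTITION OF t_demo
    FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE t_demo_p1 PARTITION OF t_demo
    FOR VALUES WITH (MODULUS 2, REMAINDER 1);

-- Parameter type matches the column type.
PREPARE by_bigint(bigint) AS
    SELECT * FROM t_demo WHERE id = $1;

-- Assumed mismatch: the parameter is declared numeric, so the comparison
-- is done in numeric and the bigint column has to be cast.
PREPARE by_numeric(numeric) AS
    SELECT * FROM t_demo WHERE id = $1;

EXPLAIN (COSTS OFF) EXECUTE by_bigint(42);
EXPLAIN (COSTS OFF) EXECUTE by_numeric(42);

-- Lists the prepared statements of the current session only, with their
-- declared parameter types and how often each kind of plan was used.
SELECT name, parameter_types, generic_plans, custom_plans
FROM pg_prepared_statements;
```

If the application's statement shows up with something other than bigint in
parameter_types, that would be worth fixing on the client side first.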

On Fri, 8 Nov 2024, 03:07 ravi k, <ravisql09@gmail.com> wrote:

> Hi,
>
> Thanks for the suggestions.
>
> A few more observations:
>
> 1) No sequential scans show up in pg_stat_user_tables (assuming the stats
> are accurate in Postgres 16). If parameter sniffing were happening, a
> sequential scan would be the more likely result, right?
>
> 2) No blocking or I/O issues during that time.
>
> 3) Even with the LIMIT clause, and even if it touched all partitions, it
> should still have completed in milliseconds, as this is just one record.
>
> 4) We cannot enable auto_explain in prod, as it is expensive and with high
> TPS we may face latency issues. The issue cannot be reproduced in lower
> environments (it happens in roughly one case in a million).
>
> This is a puzzle to us. If anyone has experienced this, please share your
> experience.
>
> Regards,
> Ravi
>
> On Thu, 7 Nov, 2024, 3:41 am David Mullineux, <dmullx@gmail.com> wrote:
>
>> It might be worth eliminating the use of cached plans here. Is your app
>> using prepared statements at all?
>> The point is that once the same prepared statement has been executed
>> five times, the planner can switch to a cached generic plan and keep
>> reusing it. This is a good trade-off, as it avoids costly planning time
>> for repetitive queries. But if you are querying manually, a custom plan
>> is generated anew each time.
>> A quick ANALYZE of the table should refresh the stats and invalidate any
>> cached plans.
>> This may not be your problem; it is just worth eliminating from the list
>> of potential causes.
>>
>> On Wed, 6 Nov 2024, 17:14 Ramakrishna m, <ram.pgdb@gmail.com> wrote:
>>
>>> Hi Team,
>>>
>>> One of the queries, which retrieves a single record from a table with
>>> 16 hash partitions, is taking more than 10 seconds to execute. In
>>> contrast, when we run the same query manually, it completes within
>>> milliseconds. This issue is exhausting the application pools. Are there
>>> any known bugs in PostgreSQL 16 hash partitioning? Please find the
>>> attached log, table, and execution plan.
>>>
>>> Size of each partition : 300GB
>>> Index Size : 12GB
>>>
>>> Postgres Version : 16.x
>>> Shared Buffers : 75 GB
>>> Effective_cache_size : 175 GB
>>> Work_mem : 4MB
>>> Max_connections : 3000
>>>
>>> OS  : Ubuntu 22.04
>>> Ram : 384 GB
>>> CPU : 64
>>>
>>> Please let us know if you need any further information or if there are
>>> additional details required.
>>>
>>>
>>> Regards,
>>> Ram.
>>>
>>

--0000000000005c87ed062666e0cd--