Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1t9PCt-009v4x-Pu for pgsql-general@arkaria.postgresql.org; Fri, 08 Nov 2024 13:39:23 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1t9PCq-009Kxg-H1 for pgsql-general@arkaria.postgresql.org; Fri, 08 Nov 2024 13:39:21 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1t9PCq-009Ks2-1m for pgsql-general@lists.postgresql.org; Fri, 08 Nov 2024 13:39:20 +0000 Received: from mail-oi1-x236.google.com ([2607:f8b0:4864:20::236]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1t9PCj-000oJp-TC for pgsql-general@lists.postgresql.org; Fri, 08 Nov 2024 13:39:19 +0000 Received: by mail-oi1-x236.google.com with SMTP id 5614622812f47-3e607556c83so1380319b6e.1 for ; Fri, 08 Nov 2024 05:39:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1731073153; x=1731677953; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=V2KbEk21rA/I+cC6PgEe2hcjYbq7LaYbx3LXududbRI=; b=HbxQJHPfC6/TLWYB8xR77Z4Xj0ZKGpl+eFcgnMlDnopVoqekllj9cc1mBgJi2wMNTo TJkZzpvHq5flLCoFwXcLbB1IdiV7f46lwPg5Y6mCob0hW/gSuQqrTt+yDYrRWI/JlxFw cK93CHKkvyhbyo7LW2YM7G5ipZ6UQUf29swX7kONEZhCNCr+zrMSBYg6VJ4WEOefDVWG IyOV+rXLT3ENnh3lHeCZS2bDqsGVBE4NiBwO2ZViltNTai7l8m4mBh6MeAvxp6p522hi DJ8zz+2gWJlVBr2kIQu6wX3epVC2+OSax/X28xuugLVmSy1nhAPL8Wi3xZcGjpuTVcpW 8lEQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1731073153; x=1731677953; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=V2KbEk21rA/I+cC6PgEe2hcjYbq7LaYbx3LXududbRI=; b=N1mYZZ3YUU7Sb7EJcV4JsRvYCLdQSE3RuKht4i7VbFowGi1KYKofNEg+1bWG2hz0nj 29L8qg044tnuj+HU81zXZsu6tONxuIUm/MRu8CNnV+OgOOJzVHhfyMOTzMX9mMz6fd08 KRb2qJpN4FBRIC5eM1DIiKNkeDA8z+N5+WMBLGBCrIEgwAwtqJNJG7x2Ck74/LmMig9D kU3BzsvXo+c2pRER0JhVoVodIgugUavBm9//FcwLMD2zPt2HhidhS/NAkU9tvdbMNvOg tirUUwLlhSm69rtVKONBAtEAtRiHKeZiwtZo9o9daVbHJTV0EyKdXvl8zae0STlzMDKT 5OkQ== X-Forwarded-Encrypted: i=1; AJvYcCU/zdukizafTudAk0h+HTurL+ORVyoocWEaokwGQU2+Z7+BOOaqSg8lnDl9u0XhHNmjybERrOXm3bAvRavP@lists.postgresql.org X-Gm-Message-State: AOJu0YytDx+ihmjxgom1XVUgZ7NnaoQ88DW86HO9rVV59/silAs89Iab B3LkxThe2jVI9MOhuniPneHLOBtiD9KKfzfqXq/xwaHuK/rhEWYTYfGJbnC4IsB1C1oDSg/10+j vX4S907U8+LajyDSo/RIP+N7oubMVRg== X-Google-Smtp-Source: AGHT+IEU+8m8kJjyHwk36a/erPTgOtOcFONWDveFVSHMh/Ns516FP73sYZpRM/uvgpqLyosiavkA2Xvhhqie8I++oKk= X-Received: by 2002:a05:6870:ac22:b0:287:9a03:94f0 with SMTP id 586e51a60fabf-295602608c8mr3454391fac.27.1731073152941; Fri, 08 Nov 2024 05:39:12 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: David Mullineux Date: Fri, 8 Nov 2024 13:39:02 +0000 Message-ID: Subject: Re: Performance Issue with Hash Partition Query Execution in PostgreSQL 16 To: ravi k Cc: Ramakrishna m , pgsql-general Content-Type: multipart/alternative; boundary="0000000000005c87ed062666e0cd" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000005c87ed062666e0cd Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Just spotted a potential problem. The indexed column is a bigint. Are you, in your prepared statement passing a string or a big int ? I notice your plan is doing an implicit type conversion when you run it manually. Sometimes the wrong type will make it not use the index. On Fri, 8 Nov 2024, 03:07 ravi k, wrote: > Hi , > > Thanks for the suggestions. > > Two more observations: > > 1) no sequence scan noticed from pg_stat_user_tables ( hope stats are > accurate in postgres 16) if parameter sniffing happens the possibility of > going to sequence scan is more right. > > 2) no blockings or IO issue during the time. > > 3) even with limit clause if touch all partitions also it could have been > completed in milliseconds as this is just one record. > > 4) auto_explain in prod we cannot enable as this is expensive and with > high TPS we may face latency issues and lower environment this issue cann= ot > be reproduced,( this is happening out of Million one case) > > This looks puzzle to us, just in case anyone experianced pls share your > experience. > > Regards, > Ravi > > On Thu, 7 Nov, 2024, 3:41=E2=80=AFam David Mullineux, = wrote: > >> It might be worth eliminating the use of cached plans here. Is your app >> using prepared statements at all? >> Point is that if the optimizer sees the same prepared query , 5 times, >> the it locks the plan that it found at that time. This is a good trade = off >> as it avoids costly planning-time for repetitive queries. But if you are >> manually querying, the a custom plan will be generated anew. >> A quick analyze of the table should reset the stats and invalidate any >> cached plans. >> This may not be your problem just worth eliminating it from the list of >> potential causes. >> >> On Wed, 6 Nov 2024, 17:14 Ramakrishna m, wrote: >> >>> Hi Team, >>> >>> One of the queries, which retrieves a single record from a table with 1= 6 >>> hash partitions, is taking more than 10 seconds to execute. In contrast= , >>> when we run the same query manually, it completes within milliseconds. = This >>> issue is causing exhaustion of the application pools. Do we have any bu= gs >>> in postgrs16 hash partitions? Please find the attached log, table, and >>> execution plan. >>> >>> size of the each partitions : 300GB >>> Index Size : 12GB >>> >>> Postgres Version : 16.x >>> Shared Buffers : 75 GB >>> Effective_cache : 175 GB >>> Work _mem : 4MB >>> Max_connections : 3000 >>> >>> OS : Ubuntu 22.04 >>> Ram : 384 GB >>> CPU : 64 >>> >>> Please let us know if you need any further information or if there are >>> additional details required. >>> >>> >>> Regards, >>> Ram. >>> >> --0000000000005c87ed062666e0cd Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Just spotted a potential problem. The indexed column is a= bigint. Are you, in your prepared statement passing a string or a big int = ?
I notice your plan is doing an implicit type conversion = when you run it manually.
Sometimes the wrong type w= ill make it not use the index.

On Fri, 8 Nov 2024, 03:07 ravi k, <<= a href=3D"mailto:ravisql09@gmail.com">ravisql09@gmail.com> wrote:
Hi ,

Thanks for the suggestions.

Two more observations:

1) no sequence scan noticed from pg_stat_= user_tables ( hope stats are accurate in postgres 16) if parameter sniffing= happens the possibility of going to=C2=A0 sequence scan is more right.

2) no blockings or IO issue= during the time.

3) eve= n with limit clause if touch all partitions also it could have been complet= ed in milliseconds as this is just one record.

<= /div>
4) auto_explain in prod we cannot enable as this is = expensive and with high TPS we may face latency issues and lower environmen= t this issue cannot be reproduced,( this is happening out of Million one ca= se)

This looks puzzle to= us, just in case anyone experianced pls share your experience.

Regards,
Rav= i

On Thu, 7 Nov, 2024, 3:41=E2=80=AFam David Mullineux, <dmullx@gmail= .com> wrote:
It might be worth eliminating the use of cached plans here. Is your app = using prepared statements at all?=C2=A0=C2=A0
Point is tha= t if the optimizer sees the same prepared query , 5 times, the=C2=A0 it loc= ks the plan that it found at that time. This is a good trade off as it avoi= ds costly planning-time for repetitive queries. But if you are manually que= rying, the=C2=A0 a custom plan will be generated=C2=A0 anew.
A quick analyze of the table should reset the stats and invalidat= e any cached plans.
This may not be your problem=C2= =A0 just worth eliminating it from the list of potential causes.

On We= d, 6 Nov 2024, 17:14 Ramakrishna m, <ram.pgdb@gmail.com&g= t; wrote:
Hi = Team,

One of the queries, which retriev= es a single record from a table with 16 hash partitions, is taking more tha= n 10 seconds to execute. In contrast, when we run the same query manually, = it completes within milliseconds. This issue is causing exhaustion of the a= pplication pools.=C2=A0Do we have any bugs in postgrs16 hash partitions? Pl= ease find the attached log, table, and execution plan.=C2=A0

size of the each partitions : 300GB=C2=A0
Index= Size : 12GB

Post= gres Version : 16.x
Shared Buffers : 75 GB
Effective_cache :=C2=A0 175 GB
Work _mem : 4MB<= /span>
Max_connections : 3000

OS=C2=A0 = :=C2=A0Ubuntu 22.04
Ram : 384 GB
CPU = : 64

Please let us kno= w if you need any further information or if there are additional details re= quired.=C2=A0=C2=A0


Regards,
Ram.
--0000000000005c87ed062666e0cd--