Top -N Query performance issue and high CPU usage

public inbox for [email protected]  
help / color / mirror / Atom feed

Top -N Query performance issue and high CPU usage
24+ messages / 7 participants
[nested] [flat]

* Top -N Query performance issue and high CPU usage
@ 2026-01-31 13:30  yudhi s <[email protected]>
  0 siblings, 3 replies; 24+ messages in thread

From: yudhi s @ 2026-01-31 13:30 UTC (permalink / raw)
  To: pgsql-general <[email protected]>

Hello Experts,
 We have a "Select" query which is using three to five main transaction
tables (txn_tbl, txn_status, txn_decision, txn_sale, ath) holding ~2million
rows in each of them(which is going to increase to have ~50-100million in
future) and others(6-7) tables out of which some are master and some other
small tables.

When we are running this query , and it's taking ~2-3seconds , however when
we hit this query from 10-15 session at same time its causing CPU spike up
to ~50-60% for the DB instance and this is increasing and touching 90% when
we are increasing the hits further to 40-50 times concurrently.

This query is going to be called in the first page of an UI screen and is
supposed to show the latest 1000 rows based on a certain transaction date.
This query is supposed to allow thousands of users to hit this same query
at the first landing page at the same time.  Its postgres version 17.  The
instance has 2-VCPU and 16GB RAM.

I have the following questions.

1)Why is this query causing a high cpu spike ,if there is any way in
postgres to understand what part/line of the query is contributing to the
high cpu time?
2)How can we tune this query to further reduce response time and mainly CPU
consumption ? Is any additional index or anything will make this plan
better further?
3) Is there any guidance or best practices exists , to create/design top
N-queries for such UI scenarios where performance is an important factor?
4)And based on the CPU core and memory , is there any calculation by using
which , we can say that this machine can support a maximum N number of
concurrent queries of such type beyond which we need more cpu cores
machines?

Below is the query and its current plan:-
https://gist.github.com/databasetech0073/6688701431dc4bf4eaab8d345c1dc65f

Regards
Yudhi

^ permalink  raw  reply  [nested|flat] 24+ messages in thread

* Re: Top -N Query performance issue and high CPU usage
@ 2026-01-31 14:41  David Mullineux <[email protected]>
  parent: yudhi s <[email protected]>
  2 siblings, 1 reply; 24+ messages in thread

From: David Mullineux @ 2026-01-31 14:41 UTC (permalink / raw)
  To: yudhi s <[email protected]>; +Cc: pgsql-general <[email protected]>

On Sat, 31 Jan 2026, 13:30 yudhi s, <[email protected]> wrote:

> Hello Experts,
>  We have a "Select" query which is using three to five main transaction
> tables (txn_tbl, txn_status, txn_decision, txn_sale, ath) holding ~2million
> rows in each of them(which is going to increase to have ~50-100million in
> future) and others(6-7) tables out of which some are master and some other
> small tables.
>
> When we are running this query , and it's taking ~2-3seconds , however
> when we hit this query from 10-15 session at same time its causing CPU
> spike up to ~50-60% for the DB instance and this is increasing and touching
> 90% when we are increasing the hits further to 40-50 times concurrently.
>
> This query is going to be called in the first page of an UI screen and is
> supposed to show the latest 1000 rows based on a certain transaction date.
> This query is supposed to allow thousands of users to hit this same query
> at the first landing page at the same time.  Its postgres version 17.  The
> instance has 2-VCPU and 16GB RAM.
>
> I have the following questions.
>
> 1)Why is this query causing a high cpu spike ,if there is any way in
> postgres to understand what part/line of the query is contributing to the
> high cpu time?
> 2)How can we tune this query to further reduce response time and mainly
> CPU consumption ? Is any additional index or anything will make this plan
> better further?
> 3) Is there any guidance or best practices exists , to create/design top
> N-queries for such UI scenarios where performance is an important factor?
> 4)And based on the CPU core and memory , is there any calculation by using
> which , we can say that this machine can support a maximum N number of
> concurrent queries of such type beyond which we need more cpu cores
> machines?
>
> Below is the query and its current plan:-
> https://gist.github.com/databasetech0073/6688701431dc4bf4eaab8d345c1dc65f
>
> Regards
> Yudhi
>

Plan says it's using temp files for sorting....I would suggest you increase
work_mem for this to avoid temp.fike creation...Although not the answer to
all your problems, it would be a good start .


^ permalink  raw  reply  [nested|flat] 24+ messages in thread

* Re: Top -N Query performance issue and high CPU usage
@ 2026-01-31 16:14  Adrian Klaver <[email protected]>
  parent: yudhi s <[email protected]>
  2 siblings, 1 reply; 24+ messages in thread

From: Adrian Klaver @ 2026-01-31 16:14 UTC (permalink / raw)
  To: yudhi s <[email protected]>; pgsql-general <[email protected]>

On 1/31/26 05:30, yudhi s wrote:
> Hello Experts,

> This query is going to be called in the first page of an UI screen and 
> is supposed to show the latest 1000 rows based on a certain transaction 
> date. This query is supposed to allow thousands of users to hit this 
> same query at the first landing page at the same time. Its postgres 
> version 17.  The instance has 2-VCPU and 16GB RAM.

1) Without even looking at the plan I'm going to say 2-VCPU and 16GB RAM 
and is insufficient resources for what you want to do.

2) You will need to provide the schema definitions for the tables involved.

4)And based on the CPU core and memory , is there any calculation by 
using which , we can say that this machine can support a maximum N 
number of concurrent queries of such type beyond which we need more cpu 
cores machines?

You already have the beginnings of a chart:

1 session 2-3 secs

10-15 sessions 50-60% usage

40-50 sessions 90% usage

> 
> Regards
> Yudhi

-- 
Adrian Klaver
[email protected]

^ permalink  raw  reply  [nested|flat] 24+ messages in thread

* Re: Top -N Query performance issue and high CPU usage
@ 2026-01-31 19:39  yudhi s <[email protected]>
  parent: David Mullineux <[email protected]>
  0 siblings, 0 replies; 24+ messages in thread

From: yudhi s @ 2026-01-31 19:39 UTC (permalink / raw)
  To: David Mullineux <[email protected]>; +Cc: pgsql-general <[email protected]>

>
>
>>
> Plan says it's using temp files for sorting....I would suggest you
> increase work_mem for this to avoid temp.fike creation...Although not the
> answer to all your problems, it would be a good start .
>
>
Even setting work_mem to 64MB remove all the "temp read" and showig all
memory reads, but still we are seeing similar cpu spike when executing this
query from multiple sessions and also the response time is staying same.


^ permalink  raw  reply  [nested|flat] 24+ messages in thread

* Re: Top -N Query performance issue and high CPU usage
@ 2026-01-31 19:46  yudhi s <[email protected]>
  parent: Adrian Klaver <[email protected]>
  0 siblings, 3 replies; 24+ messages in thread

From: yudhi s @ 2026-01-31 19:46 UTC (permalink / raw)
  To: Adrian Klaver <[email protected]>; +Cc: pgsql-general <[email protected]>

Thank you.

>
> 1) Without even looking at the plan I'm going to say 2-VCPU and 16GB RAM
> and is insufficient resources for what you want to do.
>
>
Can you please explain a bit in detail, how much minimum VCPU and RAM will
be enough resources to suffice this requirement? and you normally do that
calculation?

> 2) You will need to provide the schema definitions for the tables involved.
>
> Do you mean table DDL or just the index definitions on the tables should
help?

Also i was trying to understand , by just looking into the "explain
analyze" output, is there any way we can tie the specific step in the plan
, which is the major contributor of the cpu resources? Such that we can
then try to fix that part rather than looking throughout the query as its
big query?

And if any suggestion to improve the TOP-N queries where the base table may
have many rows in it.

^ permalink  raw  reply  [nested|flat] 24+ messages in thread

* Re: Top -N Query performance issue and high CPU usage
@ 2026-01-31 21:05  Ron Johnson <[email protected]>
  parent: yudhi s <[email protected]>
  2 siblings, 1 reply; 24+ messages in thread

From: Ron Johnson @ 2026-01-31 21:05 UTC (permalink / raw)
  To: pgsql-general <[email protected]>

On Sat, Jan 31, 2026 at 2:47 PM yudhi s <[email protected]> wrote:

> Thank you.
>
>>
>> 1) Without even looking at the plan I'm going to say 2-VCPU and 16GB RAM
>> and is insufficient resources for what you want to do.
>>
>>
> Can you please explain a bit in detail, how much minimum VCPU and RAM will
> be enough resources to suffice this requirement? and you normally do that
> calculation?
>
>
>> 2) You will need to provide the schema definitions for the tables
>> involved.
>>
>> Do you mean table DDL or just the index definitions on the tables should
> help?
>
> Also i was trying to understand , by just looking into the "explain
> analyze" output, is there any way we can tie the specific step in the plan
> , which is the major contributor of the cpu resources? Such that we can
> then try to fix that part rather than looking throughout the query as its
> big query?
>

It looks like 71% (748ms of a total 1056ms) of elapsed time is taken by the
c_1.tran_date  external sort on line 150.

That, obviously, is what you should work on.

1. You say you increased work_mem.  From what, to what?
2. But that it did not reduce execution time.  Please post the EXPLAIN from
after increasing work_mem.
3. Did you remember to run SELECT pg_reload_conf(); after increasing
work_mem?
4. Is there an index on APP_schema.txn_tbl.tran_date?

And if any suggestion to improve the TOP-N queries where the base table may
> have many rows in it.
>

-- 
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


^ permalink  raw  reply  [nested|flat] 24+ messages in thread

* Re: Top -N Query performance issue and high CPU usage
@ 2026-02-01 12:54  Luigi Nardi <[email protected]>
  parent: Ron Johnson <[email protected]>
  0 siblings, 0 replies; 24+ messages in thread

From: Luigi Nardi @ 2026-02-01 12:54 UTC (permalink / raw)
  To: pgsql-general <[email protected]>

On Sat, Jan 31, 2026 at 10:05 PM Ron Johnson <[email protected]>
wrote:

> On Sat, Jan 31, 2026 at 2:47 PM yudhi s <[email protected]>
> wrote:
>
>> Thank you.
>>
>>>
>>> 1) Without even looking at the plan I'm going to say 2-VCPU and 16GB RAM
>>> and is insufficient resources for what you want to do.
>>>
>>>
>> Can you please explain a bit in detail, how much minimum VCPU and RAM
>> will be enough resources to suffice this requirement? and you normally do
>> that calculation?
>>
>>
>>> 2) You will need to provide the schema definitions for the tables
>>> involved.
>>>
>>> Do you mean table DDL or just the index definitions on the tables should
>> help?
>>
>> Also i was trying to understand , by just looking into the "explain
>> analyze" output, is there any way we can tie the specific step in the plan
>> , which is the major contributor of the cpu resources? Such that we can
>> then try to fix that part rather than looking throughout the query as its
>> big query?
>>
>
> It looks like 71% (748ms of a total 1056ms) of elapsed time is taken by
> the c_1.tran_date  external sort on line 150.
>
> That, obviously, is what you should work on.
>
> 1. You say you increased work_mem.  From what, to what?
> 2. But that it did not reduce execution time.  Please post the EXPLAIN
> from after increasing work_mem.
> 3. Did you remember to run SELECT pg_reload_conf(); after increasing
> work_mem?
> 4. Is there an index on APP_schema.txn_tbl.tran_date?
>
> And if any suggestion to improve the TOP-N queries where the base table
>> may have many rows in it.
>>
>
>
The DBtune Free Edition <http://app.dbtune.com; can help you find the
correct adjustments for work_mem and other server parameters
<https://dbtune.com/blog/dbtunes-multi-dimensional-performance-tuning-space;.
It's designed to help optimize your PostgreSQL runtime for your current
hardware setup.



> --
> Death to <Redacted>, and butter sauce.
> Don't boil me, I'm still alive.
> <Redacted> lobster!
>


^ permalink  raw  reply  [nested|flat] 24+ messages in thread

* Re: Top -N Query performance issue and high CPU usage
@ 2026-02-01 21:47  Peter J. Holzer <[email protected]>
  parent: yudhi s <[email protected]>
  2 siblings, 3 replies; 24+ messages in thread

From: Peter J. Holzer @ 2026-02-01 21:47 UTC (permalink / raw)
  To: [email protected]

On 2026-02-01 01:16:56 +0530, yudhi s wrote:
> Thank you. 
> 
> 
>     1) Without even looking at the plan I'm going to say 2-VCPU and 16GB RAM
>     and is insufficient resources for what you want to do.
> 
> 
> 
> Can you please explain a bit in detail, how much minimum VCPU and RAM will be
> enough resources to suffice this requirement? and you normally do that
> calculation?

You wrote:

| This query is supposed to allow thousands of users to hit this same
| query at the first landing page at the same time.

If you meant that literally, you would need thousands of cores to handle
those thousands of simultaneous queries and enough RAM for thousands of
sessions, each performing a rather complex query. So possibly hundreds
of maybe even thousands of gigabytes, not 16.

However, maybe you didn't mean that. There are relatively few
applications where thousands of users log in within a second. Maybe you
just meant that there would be thousands of users logged in in total. If
so, how many simultaneus queries do you really expect?

If you do have that many simultaneous accesses to the landing page, and
you can't speed up the query significantly (I take it you've seen the
suggestion to check whether there's an index on
APP_schema.txn_tbl.tran_date), then maybe you don't need to perform it
for every user? I don't know what the query is supposed to do, but
unless the "ent_id" is really a user id, it doesn't seem to be specific
to the user. So maybe you can cache the result for a minute or an hour
and show the same result to everybody who logs in during that time.

        hjp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | [email protected]         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Attachments:

  [application/pgp-signature] signature.asc (833B, 2-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 24+ messages in thread

* Re: Top -N Query performance issue and high CPU usage
@ 2026-02-01 21:56  Ron Johnson <[email protected]>
  parent: Peter J. Holzer <[email protected]>
  2 siblings, 0 replies; 24+ messages in thread

From: Ron Johnson @ 2026-02-01 21:56 UTC (permalink / raw)
  To: [email protected]

On Sun, Feb 1, 2026 at 4:47 PM Peter J. Holzer <[email protected]> wrote:
[snip]

> If you do have that many simultaneous accesses to the landing page, and
> you can't speed up the query significantly (I take it you've seen the
> suggestion to check whether there's an index on
> APP_schema.txn_tbl.tran_date), then maybe you don't need to perform it
> for every user? I don't know what the query is supposed to do, but
> unless the "ent_id" is really a user id, it doesn't seem to be specific
> to the user. So maybe you can cache the result for a minute or an hour
> and show the same result to everybody who logs in during that time.
>

That's what I was thinking, too: app server background process continually
runs that query in a loop, feeding the results to a shared cache; the end
user connections then read the latest version of the cached results.

-- 
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

^ permalink  raw  reply  [nested|flat] 24+ messages in thread

* Re: Top -N Query performance issue and high CPU usage
@ 2026-02-02 01:06  Adrian Klaver <[email protected]>
  parent: yudhi s <[email protected]>
  2 siblings, 0 replies; 24+ messages in thread

From: Adrian Klaver @ 2026-02-02 01:06 UTC (permalink / raw)
  To: yudhi s <[email protected]>; +Cc: pgsql-general <[email protected]>

On 1/31/26 11:46, yudhi s wrote:
> Thank you.
> 
> 
>     1) Without even looking at the plan I'm going to say 2-VCPU and 16GB
>     RAM
>     and is insufficient resources for what you want to do.
> 
> 
> Can you please explain a bit in detail, how much minimum VCPU and RAM 
> will be enough resources to suffice this requirement? and you normally 
> do that calculation?

Don't know what the minimum requirements are. It would depend on many 
variables 1) The plan being chosen, which in turn depends on the schema 
information as well as the data turnover. 2) What the VCPU is actually 
emulating. 3) The efficiency of of the virtual machines/containers with 
regard to accessing memory and storage. 4) The service limits of the 
virtualization. 5) What the storage system and how performant it is.

In other words this is something you will need to test and derive your 
own formula for.

> 
>     2) You will need to provide the schema definitions for the tables
>     involved.
> 
> Do you mean table DDL or just the index definitions on the tables should 
> help?

Basically what you get in psql when you do \d some_table.

> 
> Also i was trying to understand , by just looking into the "explain 
> analyze" output, is there any way we can tie the specific step in the 
> plan , which is the major contributor of the cpu resources? Such that we 
> can then try to fix that part rather than looking throughout the query 
> as its big query?
> 
> And if any suggestion to improve the TOP-N queries where the base table 
> may have many rows in it.


-- 
Adrian Klaver
[email protected]




Received: from malur.postgresql.org ([217.196.149.56])
	by arkaria.postgresql.org with esmtps  (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
	(Exim 4.96)
	(envelope-from <[email protected]>)
	id 1vmiOY-00FYkq-0e
	for [email protected];
	Mon, 02 Feb 2026 01:06:26 +0000
Received: from localhost ([127.0.0.1] helo=malur.postgresql.org)
	by malur.postgresql.org with esmtp (Exim 4.96)
	(envelope-from <[email protected]>)
	id 1vmiOW-00Axt2-0y
	for [email protected];
	Mon, 02 Feb 2026 01:06:25 +0000
Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29])
	by malur.postgresql.org with esmtps  (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
	(Exim 4.96)
	(envelope-from <[email protected]>)
	id 1vmiOV-00Axsu-0X
	for [email protected];
	Mon, 02 Feb 2026 01:06:24 +0000
Received: from fout-a7-smtp.messagingengine.com ([103.168.172.150])
	by magus.postgresql.org with esmtps  (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
	(Exim 4.98.2)
	(envelope-from <[email protected]>)
	id 1vmiOS-00000000aZz-2Exr
	for [email protected];
	Mon, 02 Feb 2026 01:06:23 +0000
Received: from phl-compute-07.internal (phl-compute-07.internal [10.202.2.47])
	by mailfout.phl.internal (Postfix) with ESMTP id 48692EC0120;
	Sun,  1 Feb 2026 20:06:18 -0500 (EST)
Received: from phl-frontend-03 ([10.202.2.162])
  by phl-compute-07.internal (MEProxy); Sun, 01 Feb 2026 20:06:18 -0500
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=aklaver.com; h=
	cc:cc:content-transfer-encoding:content-type:content-type:date
	:date:from:from:in-reply-to:in-reply-to:message-id:mime-version
	:references:reply-to:subject:subject:to:to; s=fm2; t=1769994378;
	 x=1770080778; bh=ir+G/EPW/FpWxQlPktDREsIx+678cu9NdBR2jW/r3h0=; b=
	KqijiKGUh7YcYMtTXYC1GQ6q+8SigLhJdR2aISyzlWe1zSunPtaq6M5H5YXipOtZ
	ezhoXxM250KIt5pWe5TO301z6/jM1Qfx8el9oB5/k1+93xYVQD0ZWR6TO+vdwlgX
	GgYTI30RBIYQQLPJRBXzWABwJUSgIAlQY52jvOOz1dwy0Xg1/p3uQTcjUu/CXvW+
	ysKe31qp+cXjhegkQCnqQifO6/wj2JI0YxVW/EX9xq/XqA/+WuSqgA0tv79mP0ij
	03tW6OW1bPR9rChJXFo3AFVHmT52zP4p7ZOUpfrwUqhjdfSCxzuxVljq3UTmR8aj
	AnMW92nIiKZvsqMNroEV0Q==
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=
	messagingengine.com; h=cc:cc:content-transfer-encoding
	:content-type:content-type:date:date:feedback-id:feedback-id
	:from:from:in-reply-to:in-reply-to:message-id:mime-version
	:references:reply-to:subject:subject:to:to:x-me-proxy
	:x-me-sender:x-me-sender:x-sasl-enc; s=fm3; t=1769994378; x=
	1770080778; bh=ir+G/EPW/FpWxQlPktDREsIx+678cu9NdBR2jW/r3h0=; b=l
	Hcmw+w1iJ2rNpIVocnWU/HHJo6ahNFVqh1WpXNeW9z8SN0W9hWI50FJ3Hn7fKIA6
	b11xNfmPzVgRLImuAEy2mZDJ5zsaWt/S9J0cV77a5j7bJYUYW+jlItDx3TWF29hH
	Z/S+DyE6R2J6dFadqLJvSzle0SyZ9MYGlHkNDpg2vtP4AbOgn/gDo1m0mGHdCrBu
	BLDeealet2zjcfBW5WKxNABIjh9+Vx06jsAXelZyKPckvXuJPGcyHGkZs4un5Cg+
	izLzYODusvA8eH6uA+8ndXI7JYFHG1pvXEHXM6evy1la4oy3pMWDgt2Z4ZlCtNEP
	WiGKkDmNiWcaeWyKY78rg==
X-ME-Sender: <xms:ivh_aTMeR6Xfv0rx4LHA_1w5Rv3A9EB2LhkJuvpxwtWvpXbuffdhJw>
    <xme:ivh_aT9NAS70NqUV-vgUu29iSmePiXa21sPTzZxnMrBuDgS7dr5FBt1GrcDi3-6C4
    wRnyND1ppZ-RJf_vP15rOSkXWKPXa0nKQUjW1DEwjaSr7jdws8>
X-ME-Received: <xmr:ivh_af5zUc5YHuLLCE_IBpDuwEMgObgdiU77r_U4NnuNZ7yakatn8CMQ-eA1FD-DxUilGAkwGnkSEf0CWoYDfwA74PkU8tK8>
X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefgedrtddtgddujeeifeduucetufdoteggodetrf
    dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfurfetoffkrfgpnffqhgenuceu
    rghilhhouhhtmecufedttdenucenucfjughrpefkffggfgfuvfevfhfhjggtgfesthekre
    dttddvjeenucfhrhhomheptegurhhirghnucfmlhgrvhgvrhcuoegrughrihgrnhdrkhhl
    rghvvghrsegrkhhlrghvvghrrdgtohhmqeenucggtffrrghtthgvrhhnpeefgeefieeutd
    fggfetgefgheekjeehteeileeigfetieekjedvieeviefgheevtdenucevlhhushhtvghr
    ufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhrohhmpegrughrihgrnhdrkhhlrghvvg
    hrsegrkhhlrghvvghrrdgtohhmpdhnsggprhgtphhtthhopedvpdhmohguvgepshhmthhp
    ohhuthdprhgtphhtthhopehlvggrrhhnvghruggrthgrsggrshgvleelsehgmhgrihhlrd
    gtohhmpdhrtghpthhtohepphhgshhqlhdqghgvnhgvrhgrlheslhhishhtshdrphhoshht
    ghhrvghsqhhlrdhorhhg
X-ME-Proxy: <xmx:ivh_aS3g5mA15cEtEfvFjK6ZYaC9Uq47X6cmypDhv14MK4uGvk06CQ>
    <xmx:ivh_aeBe-EjyOEIY66JzJOD4m9DxJzN0DxeTdZNg0eRAxcdCVgUr5A>
    <xmx:ivh_ac2QFYtWy8fSBsi6t9ko6PClbfS8oVVf0wNCCBgYGogo0DegmQ>
    <xmx:ivh_aQuI17b_2vBM5CoCIR-OqlpByxq-o-RdjTZKXvDnVqqAATFFXw>
    <xmx:ivh_aQfuG44DZAOwCThKZJKsayyHFKPMZiQxgxHqwLQVOK7a0jsxm8k4>
Feedback-ID: i76984098:Fastmail
Received: by mail.messagingengine.com (Postfix) with ESMTPA; Sun,
 1 Feb 2026 20:06:17 -0500 (EST)
Message-ID: <[email protected]>
Date: Sun, 1 Feb 2026 17:06:17 -0800
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: Top -N Query performance issue and high CPU usage
To: yudhi s <[email protected]>
Cc: pgsql-general <[email protected]>
References: <CAEzWdqd0SPkZMYNaAbERdgczkfQqLmNV5JBMmF-F9s7KjxJ0gw@mail.gmail.com>
 <[email protected]>
 <CAEzWdqd6LAHs+FiFeJLqDTS-QBLq6+foE1-mgBC9AXVpFmVnZg@mail.gmail.com>
Content-Language: en-US
From: Adrian Klaver <[email protected]>
In-Reply-To: <CAEzWdqd6LAHs+FiFeJLqDTS-QBLq6+foE1-mgBC9AXVpFmVnZg@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
List-Id: <pgsql-general.lists.postgresql.org>
List-Help: <https://lists.postgresql.org/manage/;
List-Subscribe: <https://lists.postgresql.org/manage/;
List-Post: <mailto:[email protected]>
List-Owner: <mailto:[email protected]>
List-Archive: <https://www.postgresql.org/list/pgsql-general;
Archived-At: <https://www.postgresql.org/message-id/b87f6128-5df8-4fca-ae25-da7c7fa870eb%40aklaver.com;
Precedence: bulk

On 1/31/26 11:46, yudhi s wrote:
> Thank you.
> 
> 
>     1) Without even looking at the plan I'm going to say 2-VCPU and 16GB
>     RAM
>     and is insufficient resources for what you want to do.
> 
> 
> Can you please explain a bit in detail, how much minimum VCPU and RAM 
> will be enough resources to suffice this requirement? and you normally 
> do that calculation?

Don't know what the minimum requirements are. It would depend on many 
variables 1) The plan being chosen, which in turn depends on the schema 
information as well as the data turnover. 2) What the VCPU is actually 
emulating. 3) The efficiency of of the virtual machines/containers with 
regard to accessing memory and storage. 4) The service limits of the 
virtualization. 5) What the storage system and how performant it is.

In other words this is something you will need to test and derive your 
own formula for.

> 
>     2) You will need to provide the schema definitions for the tables
>     involved.
> 
> Do you mean table DDL or just the index definitions on the tables should 
> help?

Basically what you get in psql when you do \d some_table.

> 
> Also i was trying to understand , by just looking into the "explain 
> analyze" output, is there any way we can tie the specific step in the 
> plan , which is the major contributor of the cpu resources? Such that we 
> can then try to fix that part rather than looking throughout the query 
> as its big query?
> 
> And if any suggestion to improve the TOP-N queries where the base table 
> may have many rows in it.


-- 
Adrian Klaver
[email protected]






^ permalink  raw  reply  [nested|flat] 24+ messages in thread

* Re: Top -N Query performance issue and high CPU usage
@ 2026-02-02 04:17  yudhi s <[email protected]>
  parent: Peter J. Holzer <[email protected]>
  2 siblings, 0 replies; 24+ messages in thread

From: yudhi s @ 2026-02-02 04:17 UTC (permalink / raw)
  To: [email protected]; Ron Johnson <[email protected]>; [email protected]

On Mon, Feb 2, 2026 at 3:17 AM Peter J. Holzer <[email protected]> wrote:

>
> However, maybe you didn't mean that. There are relatively few
> applications where thousands of users log in within a second. Maybe you
> just meant that there would be thousands of users logged in in total. If
> so, how many simultaneus queries do you really expect?
>
> If you do have that many simultaneous accesses to the landing page, and
> you can't speed up the query significantly (I take it you've seen the
> suggestion to check whether there's an index on
> APP_schema.txn_tbl.tran_date), then maybe you don't need to perform it
> for every user? I don't know what the query is supposed to do, but
> unless the "ent_id" is really a user id, it doesn't seem to be specific
> to the user. So maybe you can cache the result for a minute or an hour
> and show the same result to everybody who logs in during that time.
>
>
Thank you so much. I need to get back on the exact number of such queries
which can hit the database. However, as 1000 of users will be there, so the
possibility of all logging into the system on the same page at same time
needs to be found out. Will double check on this.

However,  when you said caching :- The results on the base tables are going
to be ~30-50 million. This landing page has filters on it so it may be of
30+ different combinations based on the user's choice. So do you suggest ,
we will populate the base data in a materialized view(named like "landing
page data") which we can refresh (maybe once in ~5 minutes behind the
scenes) and then that can be queried in the landing page directly. And we
can have suitable indexes created on the materialized view based on the
dynamic filter criteria?

^ permalink  raw  reply  [nested|flat] 24+ messages in thread

* Re: Top -N Query performance issue and high CPU usage
@ 2026-02-02 11:39  yudhi s <[email protected]>
  parent: Peter J. Holzer <[email protected]>
  2 siblings, 1 reply; 24+ messages in thread

From: yudhi s @ 2026-02-02 11:39 UTC (permalink / raw)
  To: [email protected]

On Mon, Feb 2, 2026 at 3:17 AM Peter J. Holzer <[email protected]> wrote:

>
> If you do have that many simultaneous accesses to the landing page, and
> you can't speed up the query significantly (I take it you've seen the
> suggestion to check whether there's an index on
> APP_schema.txn_tbl.tran_date), then maybe you don't need to perform it
> for every user? I don't know what the query is supposed to do, but
> unless the "ent_id" is really a user id, it doesn't seem to be specific
> to the user. So maybe you can cache the result for a minute or an hour
> and show the same result to everybody who logs in during that time.
>
>
>

There was no index on column  tran_date  , I created one and it's making
the query finish in  ~200ms, a lot faster than in the past. Below is the
portion of the query and its plan which actually consumes most of the
resource and time post the new index creation.

https://gist.github.com/databasetech0073/344df46c328e02b98961fab0cd221492

1) Now the part  which takes time is the "nested loop" join on the
"ent_id"  column. Can we do anything to make it much better/faster?

2) Also another question I had was,  with this new index the table scan of
txn_tbl is now fully eliminated by the "Index Scan Backward" even i have
other columns from that table projected in the query, so how its getting
all those column values without visiting table but just that index scan
backward operation?

^ permalink  raw  reply  [nested|flat] 24+ messages in thread

* Re: Top -N Query performance issue and high CPU usage
@ 2026-02-02 13:34  Ron Johnson <[email protected]>
  parent: yudhi s <[email protected]>
  0 siblings, 1 reply; 24+ messages in thread

From: Ron Johnson @ 2026-02-02 13:34 UTC (permalink / raw)
  To: pgsql-generallists.postgresql.org <[email protected]>

On Mon, Feb 2, 2026 at 6:39 AM yudhi s <[email protected]> wrote:

>
>
> On Mon, Feb 2, 2026 at 3:17 AM Peter J. Holzer <[email protected]> wrote:
>
>>
>> If you do have that many simultaneous accesses to the landing page, and
>> you can't speed up the query significantly (I take it you've seen the
>> suggestion to check whether there's an index on
>> APP_schema.txn_tbl.tran_date), then maybe you don't need to perform it
>> for every user? I don't know what the query is supposed to do, but
>> unless the "ent_id" is really a user id, it doesn't seem to be specific
>> to the user. So maybe you can cache the result for a minute or an hour
>> and show the same result to everybody who logs in during that time.
>>
>>
>>
>
> There was no index on column  tran_date  , I created one and it's making
> the query finish in  ~200ms, a lot faster than in the past. Below is the
> portion of the query and its plan which actually consumes most of the
> resource and time post the new index creation.
>
> https://gist.github.com/databasetech0073/344df46c328e02b98961fab0cd221492
>
> 1) Now the part  which takes time is the "nested loop" join on the
> "ent_id"  column. Can we do anything to make it much better/faster?
>
> 2) Also another question I had was,  with this new index the table scan of
> txn_tbl is now fully eliminated by the "Index Scan Backward" even i have
> other columns from that table projected in the query, so how its getting
> all those column values without visiting table but just that index scan
> backward operation?
>

Reading through EXPLAIN output isn't always a mystery.

Search for "actual time" and you'll find row 53, which is the "deepest"
(most nested) row with the highest actual time.

That tells you where the time is now spent, and what it's doing.

-- 
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


^ permalink  raw  reply  [nested|flat] 24+ messages in thread

* Re: Top -N Query performance issue and high CPU usage
@ 2026-02-02 13:53  yudhi s <[email protected]>
  parent: Ron Johnson <[email protected]>
  0 siblings, 1 reply; 24+ messages in thread

From: yudhi s @ 2026-02-02 13:53 UTC (permalink / raw)
  To: Ron Johnson <[email protected]>; +Cc: pgsql-generallists.postgresql.org <[email protected]>

On Mon, Feb 2, 2026 at 7:04 PM Ron Johnson <[email protected]> wrote:

> On Mon, Feb 2, 2026 at 6:39 AM yudhi s <[email protected]>
> wrote:
>
>>
>>
>> On Mon, Feb 2, 2026 at 3:17 AM Peter J. Holzer <[email protected]> wrote:
>>
>>>
>>> If you do have that many simultaneous accesses to the landing page, and
>>> you can't speed up the query significantly (I take it you've seen the
>>> suggestion to check whether there's an index on
>>> APP_schema.txn_tbl.tran_date), then maybe you don't need to perform it
>>> for every user? I don't know what the query is supposed to do, but
>>> unless the "ent_id" is really a user id, it doesn't seem to be specific
>>> to the user. So maybe you can cache the result for a minute or an hour
>>> and show the same result to everybody who logs in during that time.
>>>
>>>
>>>
>>
>> There was no index on column  tran_date  , I created one and it's making
>> the query finish in  ~200ms, a lot faster than in the past. Below is the
>> portion of the query and its plan which actually consumes most of the
>> resource and time post the new index creation.
>>
>> https://gist.github.com/databasetech0073/344df46c328e02b98961fab0cd221492
>>
>> 1) Now the part  which takes time is the "nested loop" join on the
>> "ent_id"  column. Can we do anything to make it much better/faster?
>>
>> 2) Also another question I had was,  with this new index the table scan
>> of txn_tbl is now fully eliminated by the "Index Scan Backward" even i have
>> other columns from that table projected in the query, so how its getting
>> all those column values without visiting table but just that index scan
>> backward operation?
>>
>
> Reading through EXPLAIN output isn't always a mystery.
>
> Search for "actual time" and you'll find row 53, which is the "deepest"
> (most nested) row with the highest actual time.
>
> That tells you where the time is now spent, and what it's doing.
>
>
>
My apologies if i misunderstand the plan, But If I see,   it's spending
~140ms(140ms-6ms) i.e. almost all the time now, in performing the below
nested loop join. So my question was , is there any possibility to reduce
the resource consumption or response time further here?  Hope my
understanding is correct here.

-> Nested Loop (cost=266.53..1548099.38 rows=411215 width=20) (actual time=
*6.009..147.695* rows=1049 loops=1)
Join Filter: ((df.ent_id)::numeric = m.ent_id)
Rows Removed by Join Filter: 513436
Buffers: shared hit=1939


^ permalink  raw  reply  [nested|flat] 24+ messages in thread

* Re: Top -N Query performance issue and high CPU usage
@ 2026-02-02 15:00  Ron Johnson <[email protected]>
  parent: yudhi s <[email protected]>
  0 siblings, 1 reply; 24+ messages in thread

From: Ron Johnson @ 2026-02-02 15:00 UTC (permalink / raw)
  To: pgsql-generallists.postgresql.org <[email protected]>

On Mon, Feb 2, 2026 at 8:53 AM yudhi s <[email protected]> wrote:

>
>
> On Mon, Feb 2, 2026 at 7:04 PM Ron Johnson <[email protected]>
> wrote:
>
>> On Mon, Feb 2, 2026 at 6:39 AM yudhi s <[email protected]>
>> wrote:
>>
>>>
>>>
>>> On Mon, Feb 2, 2026 at 3:17 AM Peter J. Holzer <[email protected]> wrote:
>>>
>>>>
>>>> If you do have that many simultaneous accesses to the landing page, and
>>>> you can't speed up the query significantly (I take it you've seen the
>>>> suggestion to check whether there's an index on
>>>> APP_schema.txn_tbl.tran_date), then maybe you don't need to perform it
>>>> for every user? I don't know what the query is supposed to do, but
>>>> unless the "ent_id" is really a user id, it doesn't seem to be specific
>>>> to the user. So maybe you can cache the result for a minute or an hour
>>>> and show the same result to everybody who logs in during that time.
>>>>
>>>>
>>>>
>>>
>>> There was no index on column  tran_date  , I created one and it's
>>> making the query finish in  ~200ms, a lot faster than in the past. Below is
>>> the portion of the query and its plan which actually consumes most of the
>>> resource and time post the new index creation.
>>>
>>> https://gist.github.com/databasetech0073/344df46c328e02b98961fab0cd221492
>>>
>>> 1) Now the part  which takes time is the "nested loop" join on the
>>> "ent_id"  column. Can we do anything to make it much better/faster?
>>>
>>> 2) Also another question I had was,  with this new index the table scan
>>> of txn_tbl is now fully eliminated by the "Index Scan Backward" even i have
>>> other columns from that table projected in the query, so how its getting
>>> all those column values without visiting table but just that index scan
>>> backward operation?
>>>
>>
>> Reading through EXPLAIN output isn't always a mystery.
>>
>> Search for "actual time" and you'll find row 53, which is the "deepest"
>> (most nested) row with the highest actual time.
>>
>> That tells you where the time is now spent, and what it's doing.
>>
>>
>>
> My apologies if i misunderstand the plan, But If I see,   it's spending
> ~140ms(140ms-6ms) i.e. almost all the time now, in performing the below
> nested loop join. So my question was , is there any possibility to reduce
> the resource consumption or response time further here?  Hope my
> understanding is correct here.
>
> -> Nested Loop (cost=266.53..1548099.38 rows=411215 width=20) (actual time=
> *6.009..147.695* rows=1049 loops=1)
> Join Filter: ((df.ent_id)::numeric = m.ent_id)
> Rows Removed by Join Filter: 513436
> Buffers: shared hit=1939
>

I don't see m.ent_id in the actual query.  Did you only paste a portion of
the query?

Also, casting in a JOIN typically brutalizes the ability to use an index.

-- 
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


^ permalink  raw  reply  [nested|flat] 24+ messages in thread

* Re: Top -N Query performance issue and high CPU usage
@ 2026-02-02 18:39  yudhi s <[email protected]>
  parent: Ron Johnson <[email protected]>
  0 siblings, 1 reply; 24+ messages in thread

From: yudhi s @ 2026-02-02 18:39 UTC (permalink / raw)
  To: Ron Johnson <[email protected]>; +Cc: pgsql-generallists.postgresql.org <[email protected]>

On Mon, Feb 2, 2026 at 8:57 PM Ron Johnson <[email protected]> wrote:

>
>> My apologies if i misunderstand the plan, But If I see,   it's spending
>> ~140ms(140ms-6ms) i.e. almost all the time now, in performing the below
>> nested loop join. So my question was , is there any possibility to reduce
>> the resource consumption or response time further here?  Hope my
>> understanding is correct here.
>>
>> -> Nested Loop (cost=266.53..1548099.38 rows=411215 width=20) (actual
>> time=*6.009..147.695* rows=1049 loops=1)
>> Join Filter: ((df.ent_id)::numeric = m.ent_id)
>> Rows Removed by Join Filter: 513436
>> Buffers: shared hit=1939
>>
>
> I don't see m.ent_id in the actual query.  Did you only paste a portion
> of the query?
>
> Also, casting in a JOIN typically brutalizes the ability to use an index.
>
>
> Thank you.
Actually i tried executing the first two CTE where the query was spending
most of the time  and teh alias has changed. Also here i have changed the
real table names before putting it here, hope that is fine.
However , i verified the data type of the ent_id column in "ent" its "int8"
and in table "txn_tbl" is "numeric 12", so do you mean to say this
difference in the data type is causing this high response time during the
nested loop join? My understanding was it will be internally castable
without additional burden. Also, even i tried creating an index on the
"(df.ent_id)::numeric"
its still reulting into same plan and response time.


^ permalink  raw  reply  [nested|flat] 24+ messages in thread

* Re: Top -N Query performance issue and high CPU usage
@ 2026-02-02 19:31  Ron Johnson <[email protected]>
  parent: yudhi s <[email protected]>
  0 siblings, 1 reply; 24+ messages in thread

From: Ron Johnson @ 2026-02-02 19:31 UTC (permalink / raw)
  To: pgsql-generallists.postgresql.org <[email protected]>

On Mon, Feb 2, 2026 at 1:39 PM yudhi s <[email protected]> wrote:

> On Mon, Feb 2, 2026 at 8:57 PM Ron Johnson <[email protected]>
> wrote:
>
>>
>>> My apologies if i misunderstand the plan, But If I see,   it's spending
>>> ~140ms(140ms-6ms) i.e. almost all the time now, in performing the below
>>> nested loop join. So my question was , is there any possibility to reduce
>>> the resource consumption or response time further here?  Hope my
>>> understanding is correct here.
>>>
>>> -> Nested Loop (cost=266.53..1548099.38 rows=411215 width=20) (actual
>>> time=*6.009..147.695* rows=1049 loops=1)
>>> Join Filter: ((df.ent_id)::numeric = m.ent_id)
>>> Rows Removed by Join Filter: 513436
>>> Buffers: shared hit=1939
>>>
>>
>> I don't see m.ent_id in the actual query.  Did you only paste a portion
>> of the query?
>>
>> Also, casting in a JOIN typically brutalizes the ability to use an index.
>>
>>
>> Thank you.
> Actually i tried executing the first two CTE where the query was spending
> most of the time  and teh alias has changed.
>

We need to see everything, not just what you think is relevant.


> Also here i have changed the real table names before putting it here, hope
> that is fine.
> However , i verified the data type of the ent_id column in "ent" its
> "int8" and in table "txn_tbl" is "numeric 12", so do you mean to say this
> difference in the data type is causing this high response time during the
> nested loop join? My understanding was it will be internally castable
> without additional burden. Also, even i tried creating an index on the "(df.ent_id)::numeric"
> its still reulting into same plan and response time.
>

If you'd shown the "\d" table definitions like Adrian asked two days ago,
we'd know what indexes are on each table, and not have to beg you to
dispense dribs and drabs of information.

-- 
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


^ permalink  raw  reply  [nested|flat] 24+ messages in thread

* Re: Top -N Query performance issue and high CPU usage
@ 2026-02-02 20:43  yudhi s <[email protected]>
  parent: Ron Johnson <[email protected]>
  0 siblings, 1 reply; 24+ messages in thread

From: yudhi s @ 2026-02-02 20:43 UTC (permalink / raw)
  To: Ron Johnson <[email protected]>; +Cc: pgsql-generallists.postgresql.org <[email protected]>

On Tue, Feb 3, 2026 at 1:01 AM Ron Johnson <[email protected]> wrote:

> On Mon, Feb 2, 2026 at 1:39 PM yudhi s <[email protected]>
> wrote:
>
>> On Mon, Feb 2, 2026 at 8:57 PM Ron Johnson <[email protected]>
>> wrote:
>>
>>>
>>>> My apologies if i misunderstand the plan, But If I see,   it's spending
>>>> ~140ms(140ms-6ms) i.e. almost all the time now, in performing the below
>>>> nested loop join. So my question was , is there any possibility to reduce
>>>> the resource consumption or response time further here?  Hope my
>>>> understanding is correct here.
>>>>
>>>> -> Nested Loop (cost=266.53..1548099.38 rows=411215 width=20) (actual
>>>> time=*6.009..147.695* rows=1049 loops=1)
>>>> Join Filter: ((df.ent_id)::numeric = m.ent_id)
>>>> Rows Removed by Join Filter: 513436
>>>> Buffers: shared hit=1939
>>>>
>>>
>>> I don't see m.ent_id in the actual query.  Did you only paste a portion
>>> of the query?
>>>
>>> Also, casting in a JOIN typically brutalizes the ability to use an index.
>>>
>>>
>>> Thank you.
>> Actually i tried executing the first two CTE where the query was spending
>> most of the time  and teh alias has changed.
>>
>
> We need to see everything, not just what you think is relevant.
>
>
>> Also here i have changed the real table names before putting it here,
>> hope that is fine.
>> However , i verified the data type of the ent_id column in "ent" its
>> "int8" and in table "txn_tbl" is "numeric 12", so do you mean to say this
>> difference in the data type is causing this high response time during the
>> nested loop join? My understanding was it will be internally castable
>> without additional burden. Also, even i tried creating an index on the "(df.ent_id)::numeric"
>> its still reulting into same plan and response time.
>>
>
> If you'd shown the "\d" table definitions like Adrian asked two days ago,
> we'd know what indexes are on each table, and not have to beg you to
> dispense dribs and drabs of information.
>
>
I am unable to run "\d" from the dbeaver sql worksheet. However,  I have
fetched the DDL for the three tables and their selected columns, used in
the smaller version of the query and its plan , which I recently updated.

https://gist.github.com/databasetech0073/e4290b085f8f974e315fb41bdc47a1f3

https://gist.github.com/databasetech0073/344df46c328e02b98961fab0cd221492

Regards
Yudhi


^ permalink  raw  reply  [nested|flat] 24+ messages in thread

* Re: Top -N Query performance issue and high CPU usage
@ 2026-02-02 23:19  Ron Johnson <[email protected]>
  parent: yudhi s <[email protected]>
  0 siblings, 1 reply; 24+ messages in thread

From: Ron Johnson @ 2026-02-02 23:19 UTC (permalink / raw)
  To: yudhi s <[email protected]>; +Cc: pgsql-generallists.postgresql.org <[email protected]>

On Mon, Feb 2, 2026 at 3:43 PM yudhi s <[email protected]> wrote:

>
> On Tue, Feb 3, 2026 at 1:01 AM Ron Johnson <[email protected]>
> wrote:
>
>> On Mon, Feb 2, 2026 at 1:39 PM yudhi s <[email protected]>
>> wrote:
>>
>>> On Mon, Feb 2, 2026 at 8:57 PM Ron Johnson <[email protected]>
>>> wrote:
>>>
>>>>
>>>>> My apologies if i misunderstand the plan, But If I see,   it's
>>>>> spending ~140ms(140ms-6ms) i.e. almost all the time now, in performing the
>>>>> below nested loop join. So my question was , is there any possibility to
>>>>> reduce the resource consumption or response time further here?  Hope my
>>>>> understanding is correct here.
>>>>>
>>>>> -> Nested Loop (cost=266.53..1548099.38 rows=411215 width=20) (actual
>>>>> time=*6.009..147.695* rows=1049 loops=1)
>>>>> Join Filter: ((df.ent_id)::numeric = m.ent_id)
>>>>> Rows Removed by Join Filter: 513436
>>>>> Buffers: shared hit=1939
>>>>>
>>>>
>>>> I don't see m.ent_id in the actual query.  Did you only paste a
>>>> portion of the query?
>>>>
>>>> Also, casting in a JOIN typically brutalizes the ability to use an
>>>> index.
>>>>
>>>>
>>>> Thank you.
>>> Actually i tried executing the first two CTE where the query was
>>> spending most of the time  and teh alias has changed.
>>>
>>
>> We need to see everything, not just what you think is relevant.
>>
>>
>>> Also here i have changed the real table names before putting it here,
>>> hope that is fine.
>>> However , i verified the data type of the ent_id column in "ent" its
>>> "int8" and in table "txn_tbl" is "numeric 12", so do you mean to say this
>>> difference in the data type is causing this high response time during the
>>> nested loop join? My understanding was it will be internally castable
>>> without additional burden. Also, even i tried creating an index on the "(df.ent_id)::numeric"
>>> its still reulting into same plan and response time.
>>>
>>
>> If you'd shown the "\d" table definitions like Adrian asked two days ago,
>> we'd know what indexes are on each table, and not have to beg you to
>> dispense dribs and drabs of information.
>>
>>
> I am unable to run "\d" from the dbeaver sql worksheet. However,  I have
> fetched the DDL for the three tables and their selected columns, used in
> the smaller version of the query and its plan , which I recently updated.
>
> https://gist.github.com/databasetech0073/e4290b085f8f974e315fb41bdc47a1f3
>
> https://gist.github.com/databasetech0073/344df46c328e02b98961fab0cd221492
>

Lines 30-32 are where most of the time and effort are taken.

I can't be certain, but changing APP_schema.ent.ent_id from NUMERIC to int8
(with a CHECK constraint to, well, constrain it to 12 digits, if really
necessary) is something I'd test.

-- 
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


^ permalink  raw  reply  [nested|flat] 24+ messages in thread

* Re: Top -N Query performance issue and high CPU usage
@ 2026-02-03 09:26  yudhi s <[email protected]>
  parent: Ron Johnson <[email protected]>
  0 siblings, 1 reply; 24+ messages in thread

From: yudhi s @ 2026-02-03 09:26 UTC (permalink / raw)
  To: Ron Johnson <[email protected]>; +Cc: pgsql-generallists.postgresql.org <[email protected]>

On Tue, Feb 3, 2026 at 4:50 AM Ron Johnson <[email protected]> wrote:

> On Mon, Feb 2, 2026 at 3:43 PM yudhi s <[email protected]>
> wrote:
>
>>
>> On Tue, Feb 3, 2026 at 1:01 AM Ron Johnson <[email protected]>
>> wrote:
>>
>>> On Mon, Feb 2, 2026 at 1:39 PM yudhi s <[email protected]>
>>> wrote:
>>>
>>>> On Mon, Feb 2, 2026 at 8:57 PM Ron Johnson <[email protected]>
>>>> wrote:
>>>>
>>>>>
>>>>>> My apologies if i misunderstand the plan, But If I see,   it's
>>>>>> spending ~140ms(140ms-6ms) i.e. almost all the time now, in performing the
>>>>>> below nested loop join. So my question was , is there any possibility to
>>>>>> reduce the resource consumption or response time further here?  Hope my
>>>>>> understanding is correct here.
>>>>>>
>>>>>> -> Nested Loop (cost=266.53..1548099.38 rows=411215 width=20) (actual
>>>>>> time=*6.009..147.695* rows=1049 loops=1)
>>>>>> Join Filter: ((df.ent_id)::numeric = m.ent_id)
>>>>>> Rows Removed by Join Filter: 513436
>>>>>> Buffers: shared hit=1939
>>>>>>
>>>>>
>>>>> I don't see m.ent_id in the actual query.  Did you only paste a
>>>>> portion of the query?
>>>>>
>>>>> Also, casting in a JOIN typically brutalizes the ability to use an
>>>>> index.
>>>>>
>>>>>
>>>>> Thank you.
>>>> Actually i tried executing the first two CTE where the query was
>>>> spending most of the time  and teh alias has changed.
>>>>
>>>
>>> We need to see everything, not just what you think is relevant.
>>>
>>>
>>>> Also here i have changed the real table names before putting it here,
>>>> hope that is fine.
>>>> However , i verified the data type of the ent_id column in "ent" its
>>>> "int8" and in table "txn_tbl" is "numeric 12", so do you mean to say this
>>>> difference in the data type is causing this high response time during the
>>>> nested loop join? My understanding was it will be internally castable
>>>> without additional burden. Also, even i tried creating an index on the "(df.ent_id)::numeric"
>>>> its still reulting into same plan and response time.
>>>>
>>>
>>> If you'd shown the "\d" table definitions like Adrian asked two days
>>> ago, we'd know what indexes are on each table, and not have to beg you to
>>> dispense dribs and drabs of information.
>>>
>>>
>> I am unable to run "\d" from the dbeaver sql worksheet. However,  I have
>> fetched the DDL for the three tables and their selected columns, used in
>> the smaller version of the query and its plan , which I recently updated.
>>
>> https://gist.github.com/databasetech0073/e4290b085f8f974e315fb41bdc47a1f3
>>
>> https://gist.github.com/databasetech0073/344df46c328e02b98961fab0cd221492
>>
>
> Lines 30-32 are where most of the time and effort are taken.
>
> I can't be certain, but changing APP_schema.ent.ent_id from NUMERIC to
> int8 (with a CHECK constraint to, well, constrain it to 12 digits, if
> really necessary) is something I'd test.
>
> --
>


Thank you so much.

After making the data types equal on both tables for the column ent_id the
plan now looks as below. The costing function sinow removed. So it must be
helping reduce CPU cycle consumption to some extent, But,  I still see
~100ms is spent in this step. Is there anything we can do to further drop
the response time here? Or it's the best time we can get here.

  ->  Nested Loop  (cost=262.77..1342550.91 rows=579149 width=20) (*actual
time=6.406..107.946* rows=1049 loops=1)
              Join Filter: (*df.ent_id = m.ent_id*)
              Rows Removed by Join Filter: 514648
              Buffers: shared hit=1972


Also I do see in some other steps in the plan , the casting function is
getting used. For example in the below filter. Here txn_tbl_type_nm is
defined as Varchar(25) and still it's trying to cast it to Text. Can we do
anything to avoid these force casts as these must consume the CPU cycles?

    AND txn_tbl_dcsn.txn_tbl_txn_sts_tx NOT IN ('STATUS_A','STATUS_B')
    WHERE txn_tbl.txn_tbl_type_nm IN ('TYPE1','TYPE2','TYPE3')

  ->  Index Scan Backward using txn_tbl_due_dt_idx on txn_tbl df
 (cost=0.43..115879.87 rows=1419195 width=20) (actual time=0.019..20.377
rows=43727 loops=1)
Filter: *((txn_tbl_type_nm)::text = ANY ('{TYPE1,TYPE2,TYPE3}'::text[])*)
Rows Removed by Filter: 17
Buffers: shared hit=1839

The plan is as below.

https://gist.github.com/databasetech0073/558377c1939a9291e7b72b1cbac7c9f9

Regards
Yudhi


^ permalink  raw  reply  [nested|flat] 24+ messages in thread

* Re: Top -N Query performance issue and high CPU usage
@ 2026-02-03 15:59  Ron Johnson <[email protected]>
  parent: yudhi s <[email protected]>
  0 siblings, 1 reply; 24+ messages in thread

From: Ron Johnson @ 2026-02-03 15:59 UTC (permalink / raw)
  To: pgsql-generallists.postgresql.org <[email protected]>

On Tue, Feb 3, 2026 at 4:26 AM yudhi s <[email protected]> wrote:

> On Tue, Feb 3, 2026 at 4:50 AM Ron Johnson <[email protected]>
> wrote:
>
>> On Mon, Feb 2, 2026 at 3:43 PM yudhi s <[email protected]>
>> wrote:
>>
>>>
>>> On Tue, Feb 3, 2026 at 1:01 AM Ron Johnson <[email protected]>
>>> wrote:
>>>
>>>> On Mon, Feb 2, 2026 at 1:39 PM yudhi s <[email protected]>
>>>> wrote:
>>>>
>>>>> On Mon, Feb 2, 2026 at 8:57 PM Ron Johnson <[email protected]>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>> My apologies if i misunderstand the plan, But If I see,   it's
>>>>>>> spending ~140ms(140ms-6ms) i.e. almost all the time now, in performing the
>>>>>>> below nested loop join. So my question was , is there any possibility to
>>>>>>> reduce the resource consumption or response time further here?  Hope my
>>>>>>> understanding is correct here.
>>>>>>>
>>>>>>> -> Nested Loop (cost=266.53..1548099.38 rows=411215 width=20)
>>>>>>> (actual time=*6.009..147.695* rows=1049 loops=1)
>>>>>>> Join Filter: ((df.ent_id)::numeric = m.ent_id)
>>>>>>> Rows Removed by Join Filter: 513436
>>>>>>> Buffers: shared hit=1939
>>>>>>>
>>>>>>
>>>>>> I don't see m.ent_id in the actual query.  Did you only paste a
>>>>>> portion of the query?
>>>>>>
>>>>>> Also, casting in a JOIN typically brutalizes the ability to use an
>>>>>> index.
>>>>>>
>>>>>>
>>>>>> Thank you.
>>>>> Actually i tried executing the first two CTE where the query was
>>>>> spending most of the time  and teh alias has changed.
>>>>>
>>>>
>>>> We need to see everything, not just what you think is relevant.
>>>>
>>>>
>>>>> Also here i have changed the real table names before putting it here,
>>>>> hope that is fine.
>>>>> However , i verified the data type of the ent_id column in "ent" its
>>>>> "int8" and in table "txn_tbl" is "numeric 12", so do you mean to say this
>>>>> difference in the data type is causing this high response time during the
>>>>> nested loop join? My understanding was it will be internally castable
>>>>> without additional burden. Also, even i tried creating an index on the "(df.ent_id)::numeric"
>>>>> its still reulting into same plan and response time.
>>>>>
>>>>
>>>> If you'd shown the "\d" table definitions like Adrian asked two days
>>>> ago, we'd know what indexes are on each table, and not have to beg you to
>>>> dispense dribs and drabs of information.
>>>>
>>>>
>>> I am unable to run "\d" from the dbeaver sql worksheet. However,  I have
>>> fetched the DDL for the three tables and their selected columns, used in
>>> the smaller version of the query and its plan , which I recently updated.
>>>
>>> https://gist.github.com/databasetech0073/e4290b085f8f974e315fb41bdc47a1f3
>>>
>>> https://gist.github.com/databasetech0073/344df46c328e02b98961fab0cd221492
>>>
>>
>> Lines 30-32 are where most of the time and effort are taken.
>>
>> I can't be certain, but changing APP_schema.ent.ent_id from NUMERIC to
>> int8 (with a CHECK constraint to, well, constrain it to 12 digits, if
>> really necessary) is something I'd test.
>>
>> --
>>
>
>
> Thank you so much.
>
> After making the data types equal on both tables for the column ent_id the
> plan now looks as below. The costing function sinow removed. So it must be
> helping reduce CPU cycle consumption to some extent, But,  I still see
> ~100ms is spent in this step. Is there anything we can do to further drop
> the response time here? Or it's the best time we can get here.
>
>   ->  Nested Loop  (cost=262.77..1342550.91 rows=579149 width=20) (*actual
> time=6.406..107.946* rows=1049 loops=1)
>               Join Filter: (*df.ent_id = m.ent_id*)
>               Rows Removed by Join Filter: 514648
>               Buffers: shared hit=1972
>

Hmm.  What does pg_stat_user_tables say about when you last analyzed and
vacuumed APP_schema.txn_tbl and APP_schema.ent?

Beyond "aggressively keep those two tables analyzed, via reducing
autovacuum_analyze_scale_factor to something like 0.05, and adding
'vacuumdb -d mumble -j2 --analyze-only -t APP_schema.txn_tbl -t
APP_schema.ent' to crontab", I'm out of ideas.  An 85% speed improvement is
nothing to sneeze at, though.


> Also I do see in some other steps in the plan , the casting function is
> getting used. For example in the below filter. Here txn_tbl_type_nm is
> defined as Varchar(25) and still it's trying to cast it to Text. Can we do
> anything to avoid these force casts as these must consume the CPU cycles?
>
>     AND txn_tbl_dcsn.txn_tbl_txn_sts_tx NOT IN ('STATUS_A','STATUS_B')
>     WHERE txn_tbl.txn_tbl_type_nm IN ('TYPE1','TYPE2','TYPE3')
>
>   ->  Index Scan Backward using txn_tbl_due_dt_idx on txn_tbl df
>  (cost=0.43..115879.87 rows=1419195 width=20) (actual time=0.019..20.377
> rows=43727 loops=1)
> Filter: *((txn_tbl_type_nm)::text = ANY ('{TYPE1,TYPE2,TYPE3}'::text[])*)
> Rows Removed by Filter: 17
> Buffers: shared hit=1839
>

There is no VARCHAR or CHAR; there is only TEXT.  Thus, this is 100%
expected and normal.

-- 
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


^ permalink  raw  reply  [nested|flat] 24+ messages in thread

* Re: Top -N Query performance issue and high CPU usage
@ 2026-02-03 16:07  Adrian Klaver <[email protected]>
  parent: Ron Johnson <[email protected]>
  0 siblings, 0 replies; 24+ messages in thread

From: Adrian Klaver @ 2026-02-03 16:07 UTC (permalink / raw)
  To: Ron Johnson <[email protected]>; pgsql-generallists.postgresql.org <[email protected]>

On 2/3/26 07:59, Ron Johnson wrote:

> 
> 
> There is no VARCHAR or CHAR; there is only TEXT.  Thus, this is 100% 
> expected and normal.

What Ron is saying is that there are varchar and char types, but they 
boil down to text per:

https://www.postgresql.org/docs/current/datatype-character.html

"text is PostgreSQL's native string data type, in that most built-in 
functions operating on strings are declared to take or return text not 
character varying. For many purposes, character varying acts as though 
it were a domain over text."

As to performance see:

"
Tip

There is no performance difference among these three types, apart from 
increased storage space when using the blank-padded type, and a few 
extra CPU cycles to check the length when storing into a 
length-constrained column. While character(n) has performance advantages 
in some other database systems, there is no such advantage in 
PostgreSQL; in fact character(n) is usually the slowest of the three 
because of its additional storage costs. In most situations text or 
character varying should be used instead.
"

> 
> -- 
> Death to <Redacted>, and butter sauce.
> Don't boil me, I'm still alive.
> <Redacted> lobster!

-- 
Adrian Klaver
[email protected]

^ permalink  raw  reply  [nested|flat] 24+ messages in thread

* Re: Top -N Query performance issue and high CPU usage
@ 2026-02-04 15:48  [email protected]
  parent: yudhi s <[email protected]>
  2 siblings, 1 reply; 24+ messages in thread

From: [email protected] @ 2026-02-04 15:48 UTC (permalink / raw)
  To: [email protected]

Have you tried adding an index to txn_tbl.txn_type? 
And a vacuum on all tables? It seems the visibility map is outdated.

I'm using https://explain.dalibo.com to view the plan visually; it's more convenient.

You could use the option to periodically save the results of queries with common filters to another table, and then retrieve the results from that table when a user performs a query with their own filters.
You should also store the user's query results somewhere for a while to prevent excessive database access.

I imagine this is some kind of dashboard that each user is taken to after authenticating. It looks nice in presentations, but after a while in production, it can make the system unusable. I had to remove similar charts from the homepage of a system because after a year of work, they were taking a minute to load.

 On Saturday, January 31, 2026 at 08:30:33 AM GMT-5, yudhi s <[email protected]> wrote:
 Hello Experts,
 We have a "Select" query which is using three to five main transaction tables (txn_tbl, txn_status, txn_decision, txn_sale, ath) holding ~2million rows in each of them(which is going to increase to have ~50-100million in future) and others(6-7) tables out of which some are master and some other small tables.

When we are running this query , and it's taking ~2-3seconds , however when we hit this query from 10-15 session at same time its causing CPU spike up to ~50-60% for the DB instance and this is increasing and touching 90% when we are increasing the hits further to 40-50 times concurrently.

This query is going to be called in the first page of an UI screen and is supposed to show the latest 1000 rows based on a certain transaction date. This query is supposed to allow thousands of users to hit this same query at the first landing page at the same time.

Its postgres version 17.  The instance has 2-VCPU and 16GB RAM.

I have the following questions.

1)Why is this query causing a high cpu spike ,if there is any way in postgres to understand what part/line of the query is contributing to the high cpu time?
2)How can we tune this query to further reduce response time and mainly CPU consumption ? Is any additional index or anything will make this plan better further?
3) Is there any guidance or best practices exists , to create/design top N-queries for such UI scenarios where performance is an important factor?
4)And based on the CPU core and memory , is there any calculation by using which , we can say that this machine can support a maximum N number of concurrent queries of such type beyond which we need more cpu cores machines?
Below is the query and its current plan:-https://gist.github.com/databasetech0073/6688701431dc4bf4eaab8d345c1dc65f
RegardsYudhi

^ permalink  raw  reply  [nested|flat] 24+ messages in thread

* Re: Top -N Query performance issue and high CPU usage
@ 2026-02-05 08:35  yudhi s <[email protected]>
  parent: [email protected]
  0 siblings, 0 replies; 24+ messages in thread

From: yudhi s @ 2026-02-05 08:35 UTC (permalink / raw)
  To: [email protected]; +Cc: [email protected]

On Wed, Feb 4, 2026 at 9:18 PM <[email protected]> wrote:

>
> Have you tried adding an index to txn_tbl.txn_type?
> And a vacuum on all tables? It seems the visibility map is outdated.
>
> I'm using https://explain.dalibo.com to view the plan visually; it's more
> convenient.
>
> You could use the option to periodically save the results of queries with
> common filters to another table, and then retrieve the results from that
> table when a user performs a query with their own filters.
> You should also store the user's query results somewhere for a while to
> prevent excessive database access.
>
> I imagine this is some kind of dashboard that each user is taken to after
> authenticating. It looks nice in presentations, but after a while in
> production, it can make the system unusable. I had to remove similar charts
> from the homepage of a system because after a year of work, they were
> taking a minute to load.
>
>
>  On Saturday, January 31, 2026 at 08:30:33 AM GMT-5, yudhi s <
> [email protected]> wrote:
>  Hello Experts,
>  We have a "Select" query which is using three to five main transaction
> tables (txn_tbl, txn_status, txn_decision, txn_sale, ath) holding ~2million
> rows in each of them(which is going to increase to have ~50-100million in
> future) and others(6-7) tables out of which some are master and some other
> small tables.
>
> When we are running this query , and it's taking ~2-3seconds , however
> when we hit this query from 10-15 session at same time its causing CPU
> spike up to ~50-60% for the DB instance and this is increasing and touching
> 90% when we are increasing the hits further to 40-50 times concurrently.
>
> This query is going to be called in the first page of an UI screen and is
> supposed to show the latest 1000 rows based on a certain transaction date.
> This query is supposed to allow thousands of users to hit this same query
> at the first landing page at the same time.
>
> Its postgres version 17.  The instance has 2-VCPU and 16GB RAM.
>
> I have the following questions.
>
> 1)Why is this query causing a high cpu spike ,if there is any way in
> postgres to understand what part/line of the query is contributing to the
> high cpu time?
> 2)How can we tune this query to further reduce response time and mainly
> CPU consumption ? Is any additional index or anything will make this plan
> better further?
> 3) Is there any guidance or best practices exists , to create/design top
> N-queries for such UI scenarios where performance is an important factor?
> 4)And based on the CPU core and memory , is there any calculation by using
> which , we can say that this machine can support a maximum N number of
> concurrent queries of such type beyond which we need more cpu cores
> machines?
> Below is the query and its current plan:-
> https://gist.github.com/databasetech0073/6688701431dc4bf4eaab8d345c1dc65f
> RegardsYudhi
>
>
>
As folks suggested , adding an index on "tran_date" and combining the CTE
to two, and making the data type equal for the "ent_id" has helped reduce
the response to a large extent. Now I am trying to see if we can reduce any
further. As most of the time(100-20=~80ms) is now on materialize loop which
is happening 43K times.

Also thinking if adding "txn_tbl_type_nm" column to the index i.e.
composite index on (tran_date,txn_tbl_type_nm) will be advisable , in cases
where , ~500K rows will be filtered  by the *txn_tbl_type_nm *filter
criteria (currently its just 17 rows getting filtered though for this case).

https://gist.github.com/databasetech0073/558377c1939a9291e7b72b1cbac7c9f9

-> Nested Loop (cost=263.20..1680202.56 rows=483106 width=20) (actual
time=6.421..111.220 rows=1000 loops=1)
Buffers: shared hit=6168
-> Nested Loop (cost=262.77..1342550.91 rows=579149 width=20) (*actual
time=6.406..107.946* rows=1049 loops=1)
Join Filter: (df.ent_id = m.ent_id)
Rows Removed by Join Filter: 514648
Buffers: shared hit=1972
-> Index Scan Backward using txn_tbl_due_dt_idx on txn_tbl df
(cost=0.43..115879.87 rows=1419195 width=20) (*actual time=0.019..20.377*
rows=43727 loops=1)
*Filter: ((txn_tbl_type_nm)::text = ANY ('{TYPE1,TYPE2,TYPE3}'::text[]))*
*Rows Removed by Filter: 17*
Buffers: shared hit=1839
-> Materialize (cost=262.35..364.01 rows=58 width=8) (actual
time=0.000..0.001 rows=12 loops=43727)
Buffers: shared hit=133



Regards
Yudhi


^ permalink  raw  reply  [nested|flat] 24+ messages in thread

end of thread, other threads:[~2026-02-05 08:35 UTC | newest]

Thread overview: 24+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-01-31 13:30 Top -N Query performance issue and high CPU usage yudhi s <[email protected]>
2026-01-31 14:41 ` David Mullineux <[email protected]>
2026-01-31 19:39   ` yudhi s <[email protected]>
2026-01-31 16:14 ` Adrian Klaver <[email protected]>
2026-01-31 19:46   ` yudhi s <[email protected]>
2026-01-31 21:05     ` Ron Johnson <[email protected]>
2026-02-01 12:54       ` Luigi Nardi <[email protected]>
2026-02-01 21:47     ` Peter J. Holzer <[email protected]>
2026-02-01 21:56       ` Ron Johnson <[email protected]>
2026-02-02 04:17       ` yudhi s <[email protected]>
2026-02-02 11:39       ` yudhi s <[email protected]>
2026-02-02 13:34         ` Ron Johnson <[email protected]>
2026-02-02 13:53           ` yudhi s <[email protected]>
2026-02-02 15:00             ` Ron Johnson <[email protected]>
2026-02-02 18:39               ` yudhi s <[email protected]>
2026-02-02 19:31                 ` Ron Johnson <[email protected]>
2026-02-02 20:43                   ` yudhi s <[email protected]>
2026-02-02 23:19                     ` Ron Johnson <[email protected]>
2026-02-03 09:26                       ` yudhi s <[email protected]>
2026-02-03 15:59                         ` Ron Johnson <[email protected]>
2026-02-03 16:07                           ` Adrian Klaver <[email protected]>
2026-02-02 01:06     ` Adrian Klaver <[email protected]>
2026-02-04 15:48 ` [email protected]
2026-02-05 08:35   ` yudhi s <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox