Re: Top -N Query performance issue and high CPU usage

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Ron Johnson <[email protected]>
To: pgsql-generallists.postgresql.org <[email protected]>
Subject: Re: Top -N Query performance issue and high CPU usage
Date: Tue, 3 Feb 2026 10:59:51 -0500
Message-ID: <CANzqJaAE6bevjj43W-94Bx1xGB9OPbUtOyXV2kaGpju1wt3k+A@mail.gmail.com> (raw)
In-Reply-To: <CAEzWdqc3hdnZbHTLd1kXxR3FUdxGpM7=Eea9c8Hp7QsF+HKMtg@mail.gmail.com>
References: <CAEzWdqd0SPkZMYNaAbERdgczkfQqLmNV5JBMmF-F9s7KjxJ0gw@mail.gmail.com>
	<[email protected]>
	<CAEzWdqd6LAHs+FiFeJLqDTS-QBLq6+foE1-mgBC9AXVpFmVnZg@mail.gmail.com>
	<vecavrvgzoxkks66nw2gvt3vot5lwbcm7f65iopgjbw72v2lc6@qd5leh3coj7g>
	<CAEzWdqcAbi0GYp_K64oZTeUeN3YN7-eFQ2m2fZDRvmnJx5Lb5w@mail.gmail.com>
	<CANzqJaDf4kKc89e_9YGZ+BorPbViYgPZomo1ssQO9utOHeStCg@mail.gmail.com>
	<CAEzWdqc-2O8mWGdeDhnzKrp7-kwC99sqJ+ArWUS38WuHUKP-UQ@mail.gmail.com>
	<CANzqJaBucLq65V9OH_Ruah7S=g+5s-L8yFkjELdAerZszzcOXA@mail.gmail.com>
	<CAEzWdqd3dcJoDDaG8t1nsgrdy7Tw-EvD1zMCXHy_uOgkLFAZdQ@mail.gmail.com>
	<CANzqJaDHtROOVSEB_i6Lo+wFki-vNuQ6wDTNGT-hCWRsMBpvZQ@mail.gmail.com>
	<CAEzWdqddHgqsZCachpQMgZmRRRuTQ9HxvdP6=Lhr+TYtPL=w-A@mail.gmail.com>
	<CANzqJaDxOsp=QNqa5bK4JXKOt7uHfxNpcTPZGWuVUMpGjNUB4Q@mail.gmail.com>
	<CAEzWdqc3hdnZbHTLd1kXxR3FUdxGpM7=Eea9c8Hp7QsF+HKMtg@mail.gmail.com>

On Tue, Feb 3, 2026 at 4:26 AM yudhi s <[email protected]> wrote:

> On Tue, Feb 3, 2026 at 4:50 AM Ron Johnson <[email protected]>
> wrote:
>
>> On Mon, Feb 2, 2026 at 3:43 PM yudhi s <[email protected]>
>> wrote:
>>
>>>
>>> On Tue, Feb 3, 2026 at 1:01 AM Ron Johnson <[email protected]>
>>> wrote:
>>>
>>>> On Mon, Feb 2, 2026 at 1:39 PM yudhi s <[email protected]>
>>>> wrote:
>>>>
>>>>> On Mon, Feb 2, 2026 at 8:57 PM Ron Johnson <[email protected]>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>> My apologies if i misunderstand the plan, But If I see,   it's
>>>>>>> spending ~140ms(140ms-6ms) i.e. almost all the time now, in performing the
>>>>>>> below nested loop join. So my question was , is there any possibility to
>>>>>>> reduce the resource consumption or response time further here?  Hope my
>>>>>>> understanding is correct here.
>>>>>>>
>>>>>>> -> Nested Loop (cost=266.53..1548099.38 rows=411215 width=20)
>>>>>>> (actual time=*6.009..147.695* rows=1049 loops=1)
>>>>>>> Join Filter: ((df.ent_id)::numeric = m.ent_id)
>>>>>>> Rows Removed by Join Filter: 513436
>>>>>>> Buffers: shared hit=1939
>>>>>>>
>>>>>>
>>>>>> I don't see m.ent_id in the actual query.  Did you only paste a
>>>>>> portion of the query?
>>>>>>
>>>>>> Also, casting in a JOIN typically brutalizes the ability to use an
>>>>>> index.
>>>>>>
>>>>>>
>>>>>> Thank you.
>>>>> Actually i tried executing the first two CTE where the query was
>>>>> spending most of the time  and teh alias has changed.
>>>>>
>>>>
>>>> We need to see everything, not just what you think is relevant.
>>>>
>>>>
>>>>> Also here i have changed the real table names before putting it here,
>>>>> hope that is fine.
>>>>> However , i verified the data type of the ent_id column in "ent" its
>>>>> "int8" and in table "txn_tbl" is "numeric 12", so do you mean to say this
>>>>> difference in the data type is causing this high response time during the
>>>>> nested loop join? My understanding was it will be internally castable
>>>>> without additional burden. Also, even i tried creating an index on the "(df.ent_id)::numeric"
>>>>> its still reulting into same plan and response time.
>>>>>
>>>>
>>>> If you'd shown the "\d" table definitions like Adrian asked two days
>>>> ago, we'd know what indexes are on each table, and not have to beg you to
>>>> dispense dribs and drabs of information.
>>>>
>>>>
>>> I am unable to run "\d" from the dbeaver sql worksheet. However,  I have
>>> fetched the DDL for the three tables and their selected columns, used in
>>> the smaller version of the query and its plan , which I recently updated.
>>>
>>> https://gist.github.com/databasetech0073/e4290b085f8f974e315fb41bdc47a1f3
>>>
>>> https://gist.github.com/databasetech0073/344df46c328e02b98961fab0cd221492
>>>
>>
>> Lines 30-32 are where most of the time and effort are taken.
>>
>> I can't be certain, but changing APP_schema.ent.ent_id from NUMERIC to
>> int8 (with a CHECK constraint to, well, constrain it to 12 digits, if
>> really necessary) is something I'd test.
>>
>> --
>>
>
>
> Thank you so much.
>
> After making the data types equal on both tables for the column ent_id the
> plan now looks as below. The costing function sinow removed. So it must be
> helping reduce CPU cycle consumption to some extent, But,  I still see
> ~100ms is spent in this step. Is there anything we can do to further drop
> the response time here? Or it's the best time we can get here.
>
>   ->  Nested Loop  (cost=262.77..1342550.91 rows=579149 width=20) (*actual
> time=6.406..107.946* rows=1049 loops=1)
>               Join Filter: (*df.ent_id = m.ent_id*)
>               Rows Removed by Join Filter: 514648
>               Buffers: shared hit=1972
>

Hmm.  What does pg_stat_user_tables say about when you last analyzed and
vacuumed APP_schema.txn_tbl and APP_schema.ent?

Beyond "aggressively keep those two tables analyzed, via reducing
autovacuum_analyze_scale_factor to something like 0.05, and adding
'vacuumdb -d mumble -j2 --analyze-only -t APP_schema.txn_tbl -t
APP_schema.ent' to crontab", I'm out of ideas.  An 85% speed improvement is
nothing to sneeze at, though.


> Also I do see in some other steps in the plan , the casting function is
> getting used. For example in the below filter. Here txn_tbl_type_nm is
> defined as Varchar(25) and still it's trying to cast it to Text. Can we do
> anything to avoid these force casts as these must consume the CPU cycles?
>
>     AND txn_tbl_dcsn.txn_tbl_txn_sts_tx NOT IN ('STATUS_A','STATUS_B')
>     WHERE txn_tbl.txn_tbl_type_nm IN ('TYPE1','TYPE2','TYPE3')
>
>   ->  Index Scan Backward using txn_tbl_due_dt_idx on txn_tbl df
>  (cost=0.43..115879.87 rows=1419195 width=20) (actual time=0.019..20.377
> rows=43727 loops=1)
> Filter: *((txn_tbl_type_nm)::text = ANY ('{TYPE1,TYPE2,TYPE3}'::text[])*)
> Rows Removed by Filter: 17
> Buffers: shared hit=1839
>

There is no VARCHAR or CHAR; there is only TEXT.  Thus, this is 100%
expected and normal.

-- 
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

view thread (24+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected]
  Subject: Re: Top -N Query performance issue and high CPU usage
  In-Reply-To: <CANzqJaAE6bevjj43W-94Bx1xGB9OPbUtOyXV2kaGpju1wt3k+A@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox