Re: Why is this query touching 4gb of buffers?

public inbox for [email protected]  
help / color / mirror / Atom feed

Re: Why is this query touching 4gb of buffers?
5+ messages / 2 participants
[nested] [flat]

* Re: Why is this query touching 4gb of buffers?
@ 2025-10-24 11:19 hubert depesz lubaczewski <[email protected]>
  2025-10-24 12:54 ` Re: Why is this query touching 4gb of buffers? Tom Lane <[email protected]>
  0 siblings, 1 reply; 5+ messages in thread

From: hubert depesz lubaczewski @ 2025-10-24 11:19 UTC (permalink / raw)
  To: PostgreSQL General <[email protected]>

On Fri, Oct 24, 2025 at 01:01:48PM +0200, hubert depesz lubaczewski wrote:
> Hi,
> I have weird-ish case, that I can't grok, or at least explain in
> hand-wavy way.

A bit more info. Due to how the database is setup we have MANY "copies"
of the same table - same name, same columns, different schema, different
(but similar) data.

So, I tested the problem on another instance of this table. And there
was something really, really weird.

(this query has distinct on two columns, but it doesn't seem to be relevant, and tests whether having it, or not, showed not to generate any reasonable differences).

First run of the query generated:

                                                                                QUERY PLAN
══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
 HashAggregate  (cost=181382.51..181745.19 rows=36268 width=16) (actual time=27962.826..27962.826 rows=0 loops=1)
   Group Key: communication_channel_id, root_account_id
   Batches: 1  Memory Usage: 1561kB
   Buffers: shared hit=4624 read=117838 dirtied=486
   ->  Index Scan using index_some_table_pending on some_table  (cost=0.42..178322.57 rows=611988 width=16) (actual time=27962.567..27962.567 rows=0 loops=1)
         Index Cond: (send_at <= '2025-10-23 12:35:48'::timestamp without time zone)
         Buffers: shared hit=4624 read=117838 dirtied=486
 Planning:
   Buffers: shared hit=174
 Planning Time: 1.863 ms
 Execution Time: 27963.620 ms
(11 rows)

Then, immediately I reran it, without reindex, without analyze, without anything. And I got:

                                                                            QUERY PLAN
══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
 HashAggregate  (cost=181388.32..181751.00 rows=36268 width=16) (actual time=0.653..0.653 rows=0 loops=1)
   Group Key: communication_channel_id, root_account_id
   Batches: 1  Memory Usage: 1561kB
   Buffers: shared hit=424
   ->  Index Scan using index_some_table_pending on some_table  (cost=0.42..178328.27 rows=612009 width=16) (actual time=0.438..0.438 rows=0 loops=1)
         Index Cond: (send_at <= '2025-10-23 12:35:48'::timestamp without time zone)
         Buffers: shared hit=424
 Planning Time: 0.123 ms
 Execution Time: 1.237 ms
(9 rows)

Time is irrelevant, the point is that we are going down from ~120k buffers
"touched" to 424 buffers. What is going on?

I tested the same case on yet another setup, and ran simlar query
multiple times in a row on secondary, and got:
Buffers: shared hit=113849 read=198047 => Execution Time: 1359.661 ms
Buffers: shared hit=311896 => Execution Time: 246.143 ms

But when I ran the query on primary server tghere was very visiolble
change in buffers accesses:

Buffers: shared hit=114893 read=197776 dirtied=5528 => Execution Time: 75863.479 ms
Buffers: shared hit=775 => Execution Time: 2.360 ms

This didn't change buffers "touches" on secondary, though.

What am I missing?

Best regards,

depesz








^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Why is this query touching 4gb of buffers?
  2025-10-24 11:19 Re: Why is this query touching 4gb of buffers? hubert depesz lubaczewski <[email protected]>
@ 2025-10-24 12:54 ` Tom Lane <[email protected]>
  2025-10-24 12:56   ` Re: Why is this query touching 4gb of buffers? hubert depesz lubaczewski <[email protected]>
  0 siblings, 1 reply; 5+ messages in thread

From: Tom Lane @ 2025-10-24 12:54 UTC (permalink / raw)
  To: [email protected]; +Cc: PostgreSQL General <[email protected]>

hubert depesz lubaczewski <[email protected]> writes:
> First run of the query generated:
> ...
>    ->  Index Scan using index_some_table_pending on some_table  (cost=0.42..178322.57 rows=611988 width=16) (actual time=27962.567..27962.567 rows=0 loops=1)
>          Index Cond: (send_at <= '2025-10-23 12:35:48'::timestamp without time zone)
>          Buffers: shared hit=4624 read=117838 dirtied=486

> Then, immediately I reran it, without reindex, without analyze, without anything. And I got:

>    ->  Index Scan using index_some_table_pending on some_table  (cost=0.42..178328.27 rows=612009 width=16) (actual time=0.438..0.438 rows=0 loops=1)
>          Index Cond: (send_at <= '2025-10-23 12:35:48'::timestamp without time zone)
>          Buffers: shared hit=424

> Time is irrelevant, the point is that we are going down from ~120k buffers
> "touched" to 424 buffers. What is going on?

The first execution probably had to set hint bits on a whole lot
of recently-deleted rows.

			regards, tom lane






^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Why is this query touching 4gb of buffers?
  2025-10-24 11:19 Re: Why is this query touching 4gb of buffers? hubert depesz lubaczewski <[email protected]>
  2025-10-24 12:54 ` Re: Why is this query touching 4gb of buffers? Tom Lane <[email protected]>
@ 2025-10-24 12:56   ` hubert depesz lubaczewski <[email protected]>
  2025-10-24 13:01     ` Re: Why is this query touching 4gb of buffers? Tom Lane <[email protected]>
  0 siblings, 1 reply; 5+ messages in thread

From: hubert depesz lubaczewski @ 2025-10-24 12:56 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: PostgreSQL General <[email protected]>

On Fri, Oct 24, 2025 at 08:54:06AM -0400, Tom Lane wrote:
> hubert depesz lubaczewski <[email protected]> writes:
> > First run of the query generated:
> > ...
> >    ->  Index Scan using index_some_table_pending on some_table  (cost=0.42..178322.57 rows=611988 width=16) (actual time=27962.567..27962.567 rows=0 loops=1)
> >          Index Cond: (send_at <= '2025-10-23 12:35:48'::timestamp without time zone)
> >          Buffers: shared hit=4624 read=117838 dirtied=486
> 
> > Then, immediately I reran it, without reindex, without analyze, without anything. And I got:
> 
> >    ->  Index Scan using index_some_table_pending on some_table  (cost=0.42..178328.27 rows=612009 width=16) (actual time=0.438..0.438 rows=0 loops=1)
> >          Index Cond: (send_at <= '2025-10-23 12:35:48'::timestamp without time zone)
> >          Buffers: shared hit=424
> 
> > Time is irrelevant, the point is that we are going down from ~120k buffers
> > "touched" to 424 buffers. What is going on?
> 
> The first execution probably had to set hint bits on a whole lot
> of recently-deleted rows.

But why it doesn't happen/help on secondary?

Subsequent runs on secondary still have to "touch" hundreds of thousands
of pages", even if I'll do the run on primary that would reset hint
bits.

So, on primary - reruning the query fixes the "how many pages we touch"
- but on secondary, the number generally doesn't change, at least
withint 15-20 minute window.

Best regards,

depesz

^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Why is this query touching 4gb of buffers?
  2025-10-24 11:19 Re: Why is this query touching 4gb of buffers? hubert depesz lubaczewski <[email protected]>
  2025-10-24 12:54 ` Re: Why is this query touching 4gb of buffers? Tom Lane <[email protected]>
  2025-10-24 12:56   ` Re: Why is this query touching 4gb of buffers? hubert depesz lubaczewski <[email protected]>
@ 2025-10-24 13:01     ` Tom Lane <[email protected]>
  2025-10-24 13:18       ` Re: Why is this query touching 4gb of buffers? hubert depesz lubaczewski <[email protected]>
  0 siblings, 1 reply; 5+ messages in thread

From: Tom Lane @ 2025-10-24 13:01 UTC (permalink / raw)
  To: [email protected]; +Cc: PostgreSQL General <[email protected]>

hubert depesz lubaczewski <[email protected]> writes:
> On Fri, Oct 24, 2025 at 08:54:06AM -0400, Tom Lane wrote:
>> The first execution probably had to set hint bits on a whole lot
>> of recently-deleted rows.

> But why it doesn't happen/help on secondary?

IIRC, secondaries aren't authorized to update hint bits for
themselves, they have to wait for the primary to do it and then
propagate the new data.  There might also be some question of
what the oldest open transaction is ...

			regards, tom lane



Received: from malur.postgresql.org ([217.196.149.56])
	by arkaria.postgresql.org with esmtps  (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
	(Exim 4.94.2)
	(envelope-from <[email protected]>)
	id 1vCHPz-00BZNK-FI
	for [email protected]; Fri, 24 Oct 2025 13:01:18 +0000
Received: from localhost ([127.0.0.1] helo=malur.postgresql.org)
	by malur.postgresql.org with esmtp (Exim 4.94.2)
	(envelope-from <[email protected]>)
	id 1vCHPy-00GTUC-5N
	for [email protected]; Fri, 24 Oct 2025 13:01:17 +0000
Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29])
	by malur.postgresql.org with esmtps  (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
	(Exim 4.94.2)
	(envelope-from <[email protected]>)
	id 1vCHPx-00GTU4-RP
	for [email protected]; Fri, 24 Oct 2025 13:01:16 +0000
Received: from sss.pgh.pa.us ([68.162.161.243])
	by magus.postgresql.org with esmtps  (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
	(Exim 4.96)
	(envelope-from <[email protected]>)
	id 1vCHPu-003zl9-1r
	for [email protected];
	Fri, 24 Oct 2025 13:01:16 +0000
Received: from sss1.sss.pgh.pa.us (localhost [127.0.0.1])
	by sss.pgh.pa.us (8.15.2/8.15.2) with ESMTP id 59OD1BRM1755688;
	Fri, 24 Oct 2025 09:01:11 -0400
From: Tom Lane <[email protected]>
To: [email protected]
cc: PostgreSQL General <[email protected]>
Subject: Re: Why is this query touching 4gb of buffers?
In-reply-to: <[email protected]>
References: <[email protected]> <[email protected]> <[email protected]> <[email protected]>
Comments: In-reply-to hubert depesz lubaczewski <[email protected]>
	message dated "Fri, 24 Oct 2025 14:56:12 +0200"
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <[email protected]>
Date: Fri, 24 Oct 2025 09:01:11 -0400
Message-ID: <[email protected]>
List-Id: <pgsql-general.lists.postgresql.org>
List-Help: <https://lists.postgresql.org/manage/;
List-Subscribe: <https://lists.postgresql.org/manage/;
List-Post: <mailto:[email protected]>
List-Owner: <mailto:[email protected]>
List-Archive: <https://www.postgresql.org/list/pgsql-general;
Archived-At: <https://www.postgresql.org/message-id/1755687.1761310871%40sss.pgh.pa.us;
Precedence: bulk

hubert depesz lubaczewski <[email protected]> writes:
> On Fri, Oct 24, 2025 at 08:54:06AM -0400, Tom Lane wrote:
>> The first execution probably had to set hint bits on a whole lot
>> of recently-deleted rows.

> But why it doesn't happen/help on secondary?

IIRC, secondaries aren't authorized to update hint bits for
themselves, they have to wait for the primary to do it and then
propagate the new data.  There might also be some question of
what the oldest open transaction is ...

			regards, tom lane







^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Why is this query touching 4gb of buffers?
  2025-10-24 11:19 Re: Why is this query touching 4gb of buffers? hubert depesz lubaczewski <[email protected]>
  2025-10-24 12:54 ` Re: Why is this query touching 4gb of buffers? Tom Lane <[email protected]>
  2025-10-24 12:56   ` Re: Why is this query touching 4gb of buffers? hubert depesz lubaczewski <[email protected]>
  2025-10-24 13:01     ` Re: Why is this query touching 4gb of buffers? Tom Lane <[email protected]>
@ 2025-10-24 13:18       ` hubert depesz lubaczewski <[email protected]>
  0 siblings, 0 replies; 5+ messages in thread

From: hubert depesz lubaczewski @ 2025-10-24 13:18 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: PostgreSQL General <[email protected]>

On Fri, Oct 24, 2025 at 09:01:11AM -0400, Tom Lane wrote:
> hubert depesz lubaczewski <[email protected]> writes:
> > On Fri, Oct 24, 2025 at 08:54:06AM -0400, Tom Lane wrote:
> >> The first execution probably had to set hint bits on a whole lot
> >> of recently-deleted rows.
> 
> > But why it doesn't happen/help on secondary?
> 
> IIRC, secondaries aren't authorized to update hint bits for
> themselves, they have to wait for the primary to do it and then
> propagate the new data.  There might also be some question of
> what the oldest open transaction is ...

OK. So tested this idea on yet another "setup".

We have single primary, and two streaming replicas. Before test I ran:
select now() - min(xact_start), now() - pg_last_xact_replay_timestamp() from pg_stat_activity

on both replicas, and got:

    ?column?     │    ?column?
═════════════════╪═════════════════
 00:00:00.003007 │ 00:00:00.003673
(1 row)

and

 ?column? │     ?column?
══════════╪══════════════════
 00:00:00 │ -00:00:00.006129
(1 row)

Then I ran the problematic query on replica 1 (the first one), and got
numbers:

Buffers: shared hit=21107
Execution Time: 18.621 ms

Subsequent runs on the replica showed the same buffers usage, and
similar time.

Then I ran this query twice on primary, and noticed improvement:
Buffers: shared hit=569 read=20927 dirtied=498
Execution Time: 2596.283 ms

and on 2nd run:

Buffers: shared hit=391
Execution Time: 2.015 ms

Awesome. Well, mostly.

Now, I waited ~ 1 minute, checked replication lag, and oldest transactions
replies, where all lags were < 1s, and longest transaction across
primary/secondary1/secondary2 was ~ 4s.

Then I re-ran the query on secondary 1 and got:
Buffers: shared hit=21107
Execution Time: 18.403 ms

Interestingly, on one "setup" running vacuum analyze of the table helped
execution on secondary, but on this one, it doesn't.

Best regards,

depesz

^ permalink  raw  reply  [nested|flat] 5+ messages in thread

end of thread, other threads:[~2025-10-24 13:18 UTC | newest]

Thread overview: 5+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-10-24 11:19 Re: Why is this query touching 4gb of buffers? hubert depesz lubaczewski <[email protected]>
2025-10-24 12:54 ` Tom Lane <[email protected]>
2025-10-24 12:56   ` hubert depesz lubaczewski <[email protected]>
2025-10-24 13:01     ` Tom Lane <[email protected]>
2025-10-24 13:18       ` hubert depesz lubaczewski <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox