Re: log XLogPrefetch stats at end of recovery

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Jakub Wartak <[email protected]>
To: Lakshmi N <[email protected]>
To: Thomas Munro <[email protected]>
Cc: Bharath Rupireddy <[email protected]>
Cc: SATYANARAYANA NARLAPURAM <[email protected]>
Cc: [email protected]
Subject: Re: log XLogPrefetch stats at end of recovery
Date: Tue, 24 Mar 2026 09:53:44 +0100
Message-ID: <CAKZiRmz3n24HZVzQX4=25y4tMZ9PjYQjmbGhJkrziAQjdHru5g@mail.gmail.com> (raw)
In-Reply-To: <CA+3i_M9s6T-jOsvSZk5y1HVvyz_RVbiEQZGdwPRQrw+zZcZ5Eg@mail.gmail.com>
References: <CA+3i_M-+WVyrRcfnxa4gt+KQUzH-LTQ8D2JCGE_O0WzJJtZfUg@mail.gmail.com>
	<CAHg+QDfLh-yb+OacN6qPYF6H3SPDmVJUiors7vkSMq4yxfhNrA@mail.gmail.com>
	<CALj2ACWovfe_A43LP5fi0mJenZ-i99_ux68YCzX6D2CO5mB-gg@mail.gmail.com>
	<CAKZiRmwuKVHaQa_3PirbBqzoCKciuqcn2Tn4Df4d8icq4e3DCA@mail.gmail.com>
	<CA+3i_M9s6T-jOsvSZk5y1HVvyz_RVbiEQZGdwPRQrw+zZcZ5Eg@mail.gmail.com>

On Tue, Mar 24, 2026 at 9:28 AM Lakshmi N <[email protected]> wrote:
>
> Hi all,
>
> Thank you for your feedback. Please see the attached patch.
>
> On Mon, Mar 23, 2026 at 4:30 AM Jakub Wartak <[email protected]> wrote:
>>
>> On Sun, Mar 22, 2026 at 1:43 AM Bharath Rupireddy
>> <[email protected]> wrote:
>>
>> Hi,
>>
>> > On Sat, Mar 21, 2026 at 1:16 AM SATYANARAYANA NARLAPURAM
>> > <[email protected]> wrote:
>> > >
>> > > > While investigating a long recovery, I noticed that XLogPrefetch stats were not logged at the end of recovery. This log message will be useful to understand how effective XLogPrefetch was during recovery. Adding a patch to address this.
>> > >
>> > > Applied this patch and validated the log message. This log message appears to be useful to me, particularly while doing fleet wide analysis.
>> > >
>> > > 2026-03-20 23:33:13.756 PDT [2265441] LOG:  XLogPrefetcher stats: prefetch=14, hit=6, skip_init=5, skip_new=28, skip_fpw=18, skip_rep=996
>> >
>> > This looks useful to understand how the prefetch helped during long recoveries.
>> >
>> > > I am wondering if we can periodically log this in standby mode as well, not just before promoting?
>> >
>> > Timer-based startup progress messaging allows logging such things
>> > (ereport_startup_progress API). There was an attempt to enable "redo
>> > in progress" for standbys, but that seemed to flood the standby logs
>> > even at the default progress interval of 10 sec.
>> >
>> > Having said that, the prefetcher stats could be added to the existing
>> > ereport_startup_progress("redo in progress xxx") message that works
>> > for crash recoveries—however, I don't prefer doing a bunch of atomic
>> > reads every progress interval of 10 sec.
>>
>> > Therefore, logging at the end of recovery looks good to me.
>>
>> +1 from me too to only of logging at the end of recovery (so -1 to logging
>> every now and then). If someone is interested in current state (or progress
>> over time) I think he can query pg_stat_recovery_prefetch view already, even
>> today, right?
>>
>> > I reviewed the patch. I have the following comment:
>> >
>> > + elog(LOG, "XLogPrefetcher stats: prefetch=%lu, hit=%lu,
>> > skip_init=%lu, skip_new=%lu, skip_fpw=%lu, skip_rep=%lu",
>> >
>> > XLogPrefetcher is an internal data structure name, how about "redo
>> > prefetch stats: xxxx" to be consistent with other redo log messages?
>>
>> +1
>
>
> please find the attached patch addressing this.

Hi Lakshmi,

The pg_stat_get_recovery_prefetch view has 3 additional columns, but they are
showing current situation, maybe this $patch could be enhanced so that it also
calculates averages for some of them? I mean perhaps not just hit ratio would be
nice, but also the e.g. __average__ wal distance or average look ahead
(block distance).
I'm not sure maybe please also wait for input from others if they find
it a good idea
or not.

Also if we are adding such pretty cryptic fields as "skip_fpw" to log,
we would need some
place in documentaiton to explain their meaining probably.
monitoring.sgml already
explains them, so maybe just link to that. I mean if you look what we
have today (below),
they are little less crypting and have different format, not
"skip_fpw=123" but more
like "17 removed" so maybe "17 skipped due to FPW". Maybe

redo done at 0/75C7B290 system usage: CPU: user: 1.02 s, system: 0.51
s, elapsed: 1.53 s
checkpoint complete: end-of-recovery fast wait: wrote 16289 buffers
(99.4%), wrote 3 SLRU buffers; 0 WAL file(s) added, 17 removed, 0
recycled; write=0.049 s, sync=0.191 s, total=0.264 s; sync files=10,
longest=0.187 s, average=0.020 s; distance=287186 kB, estimate=287186
kB; lsn=0/75C7B2C8, redo lsn=0/75C7B2C8

so instead of like:
redo prefetch stats: prefetch=%lu, hit=%lu, skip_init=%lu,
skip_new=%lu, skip_fpw=%lu, skip_rep=%lu"

something like below ones:
redo prefetch stats: done %lu prefetches, %lu hit, %lu zero-initated, ..
redo prefetch stats: done %lu prefetches, (%d% hit ratio), %lu
zero-initated, .. or something like that

-J.

view thread (11+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: log XLogPrefetch stats at end of recovery
  In-Reply-To: <CAKZiRmz3n24HZVzQX4=25y4tMZ9PjYQjmbGhJkrziAQjdHru5g@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox