Re: [WIP] Pipelined Recovery

From: Imran Zaheer <[email protected]>
To: [email protected]
Cc: Xuneng Zhou <[email protected]>
Cc: Zsolt Parragi <[email protected]>
Cc: Jakub Wartak <[email protected]>
Cc: Hayato Kuroda (Fujitsu) <[email protected]>
Cc: pgsql-hackers <[email protected]>
Subject: Re: [WIP] Pipelined Recovery
Date: Wed, 8 Apr 2026 13:46:04 +0500
Message-ID: <CA+UBfa=qDfWB90w5AsmX4f3PbeeM++GbaoVagd9ff-DKQDLvWA@mail.gmail.com>
In-Reply-To: <CAAAe_zCxg2NTG_i1erLQQr8Wn+6SQ3EMOmp+N4J58Xxb21g2BQ@mail.gmail.com>
References: <CA+UBfa=vDV8wbmAV0pgrx-FuJh+x8YOW23vJ90Jzr=14rV+9jA@mail.gmail.com>
	<OS9PR01MB12149A4E7927072A215AEED69F565A@OS9PR01MB12149.jpnprd01.prod.outlook.com>
	<CA+UBfakmkdtauuRsOVXFqhFVJt0nTdEadx94tJn+qG0Pe8Wjfw@mail.gmail.com>
	<CAN4CZFM7FV0VTNkujD=Mb7tNa+jkmEfnX7carvj95fY6Tp11FQ@mail.gmail.com>
	<CA+UBfamW6NuuMMQTDRPDQ0a9fWN_u2OvjEF98u3CfYKTBcOZMw@mail.gmail.com>
	<CA+UBfa=Dv-2tLSEKHJ0YFFH8PCTHxnX9rtVZeV8gd8q1a-GmYA@mail.gmail.com>
	<CA+UBfa=PKdShpSTTTSHwXdGPZnm2rGMKPjERNOdS0SG9t9CT3Q@mail.gmail.com>
	<CABPTF7WVW2x4XitXttHwCamSZcBn=Q+wLjf+M+MuEbZSAxqdDw@mail.gmail.com>
	<CAAAe_zCxg2NTG_i1erLQQr8Wn+6SQ3EMOmp+N4J58Xxb21g2BQ@mail.gmail.com>

>
> Hi Xuneng, Imran, and everyone,
>

Hi Henson and Xuneng.

Thanks for explaining the approaches to Xuneng.

>
> The two approaches target different bottlenecks. The current patch
> parallelizes WAL decoding, which keeps the redo path single-threaded
> and avoids the Hot Standby visibility problem entirely.
>

You are right, the two approaches target different bottlenecks. The
pipeline patch aims to improve overall CPU throughput and to save CPU
time by offloading the steps we can safely do in parallel without
causing synchronization problems.

> One thing I am curious about in the current patch: WAL records are
> already in a serialized format on disk. The producer decodes them and
> then re-serializes into a different custom format for shm_mq. What is
> the advantage of this second serialization format over simply passing
> the raw WAL bytes after CRC validation and letting the consumer decode
> directly? Offloading CRC to a separate core could still improve
> throughput at the cost of higher total CPU usage, without needing the
> custom format.
>

Thanks. You are right, there was no need to serialize the decoded record
again. I was not aware that we already have the record as contiguous
bytes in memory. In my next patch I will remove this extra
serialization step.
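
To illustrate the raw-bytes handoff, here is a minimal sketch. It is not
code from the patch. It assumes a simplified, hypothetical record layout
(`ToyRecordHeader`, not PostgreSQL's `XLogRecord`) and a slow bitwise
CRC-32C; all `toy_*` names are made up for illustration. The producer
only validates the CRC of the contiguous raw bytes, and the consumer
decodes from those same bytes, so no second format is involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified record header; NOT PostgreSQL's XLogRecord. */
typedef struct ToyRecordHeader
{
    uint32_t tot_len;   /* header + payload length in bytes */
    uint32_t crc;       /* CRC-32C over the payload only (assumption) */
} ToyRecordHeader;

/* Bitwise CRC-32C (Castagnoli); a slow stand-in for an optimized CRC. */
uint32_t
toy_crc32c(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++)
    {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Write one contiguous record into buf; returns its length, or 0 if buf is too small. */
size_t
toy_record_build(uint8_t *buf, size_t bufsize,
                 const void *payload, size_t payload_len)
{
    ToyRecordHeader hdr;

    if (bufsize < sizeof(hdr) + payload_len)
        return 0;
    hdr.tot_len = (uint32_t) (sizeof(hdr) + payload_len);
    hdr.crc = toy_crc32c(payload, payload_len);
    memcpy(buf, &hdr, sizeof(hdr));
    memcpy(buf + sizeof(hdr), payload, payload_len);
    return hdr.tot_len;
}

/* Producer side: check length and CRC of the raw bytes; nothing is re-encoded. */
bool
toy_producer_validate(const uint8_t *raw, size_t len)
{
    ToyRecordHeader hdr;

    if (len < sizeof(hdr))
        return false;
    memcpy(&hdr, raw, sizeof(hdr));
    if (hdr.tot_len != len)
        return false;
    return toy_crc32c(raw + sizeof(hdr), len - sizeof(hdr)) == hdr.crc;
}

/* Consumer side: decode directly from the same raw bytes the producer validated. */
const uint8_t *
toy_consumer_payload(const uint8_t *raw, size_t len, size_t *payload_len)
{
    assert(len >= sizeof(ToyRecordHeader));
    *payload_len = len - sizeof(ToyRecordHeader);
    return raw + sizeof(ToyRecordHeader);
}
```

In a real pipeline the validated bytes would travel through the queue
(shm_mq in the patch) unchanged; the sketch leaves the queue out and
only shows that validation and decoding can share one byte format.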

> Koichi's approach parallelizes redo (buffer I/O) itself, which attacks
> a larger cost — Jakub's flamegraphs show BufferAlloc ->
> GetVictimBuffer -> FlushBuffer dominating in both p0 and p1 — but at
> the expense of much harder concurrency problems.
>
> Whether the decode pipelining ceiling is high enough, or whether the
> redo parallelization complexity is tractable, seems like the central
> design question for this area.

I still have to investigate the `GetVictimBuffer` problem that Jakub
mentioned. Meanwhile, I have been trying to work out how we can safely
offload the work done by `XLogReadBufferForRedoExtended` to a separate
pipeline worker. Alternatively, we could try prefetching the buffer
header so that the main redo loop doesn't have to spend time getting
the buffer.
Thanks for the feedback. That was helpful.


Regards,
Imran Zaheer
