From: Xuneng Zhou
Date: Wed, 22 Apr 2026 17:43:56 +0800
Subject: Re: [WIP] Pipelined Recovery
To: Imran Zaheer, assam258@gmail.com
Cc: Zsolt Parragi, Jakub Wartak, "Hayato Kuroda (Fujitsu)", pgsql-hackers

Hi Henson, Imran,

On Wed, Apr 8, 2026 at 7:14 PM Imran Zaheer wrote:
>
> Hi
>
> I am uploading the new version with the following fixes
>
> * Rebased version.
> * Skip serialization of decoded records. As pointed out by Henson,
>   there was no need to serialize the records again for the shm_mq.
>   We can simply pass the contiguous bytes, with minor pointer fixing,
>   to the shm_mq.
>
> This time I am uploading the benchmarking results to Drive and
> attaching the link here. Otherwise my mail will be held for
> moderation (my guess is that the overall attachment size is greater
> than 1MB).
>
> I am still not sure whether my testing approach is good enough.
> Because sometimes I am not able to get the same performance
> improvement with the pgbench builtin scripts as I got with the
> custom SQL scripts. Maybe pgbench is not creating enough WAL to
> test on, or maybe I am missing something.
>
> Benchmarks: https://drive.google.com/file/d/1Y4SYVnrFEQRE5T2r87rrTr7SWC9m19Si/view?usp=sharing
>
> Thanks & Regards
> Imran Zaheer
>
> Imran Zaheer
>
> On Wed, Apr 8, 2026 at 1:46 PM Imran Zaheer wrote:
> >
> > >
> > > Hi Xuneng, Imran, and everyone,
> > >
> >
> > Hi Henson and Xuneng.
> >
> > Thanks for explaining the approaches to Xuneng.
> >
> > >
> > > The two approaches target different bottlenecks. The current patch
> > > parallelizes WAL decoding, which keeps the redo path single-threaded
> > > and avoids the Hot Standby visibility problem entirely.
> > >
> >
> > You are right, both approaches target different bottlenecks. The
> > pipeline patch aims to improve overall CPU throughput and to save
> > CPU time by offloading the steps we can safely do in parallel
> > without causing synchronization problems.
> >
> > > One thing I am curious about in the current patch: WAL records are
> > > already in a serialized format on disk. The producer decodes them and
> > > then re-serializes into a different custom format for shm_mq. What is
> > > the advantage of this second serialization format over simply passing
> > > the raw WAL bytes after CRC validation and letting the consumer decode
> > > directly? Offloading CRC to a separate core could still improve
> > > throughput at the cost of higher total CPU usage, without needing the
> > > custom format.
> > >
> >
> > Thanks. You are right, there was no need to serialize the decoded
> > record again. I was not aware that we already have contiguous bytes
> > in memory. In my next patch I will remove this extra serialization
> > step.
> >
> > > Koichi's approach parallelizes redo (buffer I/O) itself, which attacks
> > > a larger cost -- Jakub's flamegraphs show BufferAlloc ->
> > > GetVictimBuffer -> FlushBuffer dominating in both p0 and p1 -- but at
> > > the expense of much harder concurrency problems.
> > >
> > > Whether the decode pipelining ceiling is high enough, or whether the
> > > redo parallelization complexity is tractable, seems like the central
> > > design question for this area.
> >
> > I still have to investigate the problem related to `GetVictimBuffer`
> > that Jakub mentioned. But I was trying to work out how we can safely
> > offload the work done by `XLogReadBufferForRedoExtended` to a separate
> > pipeline worker, or maybe we can try prefetching the buffer header so
> > the main redo loop doesn't have to spend time getting the buffer.

Thanks for your clarification! I'll try to review this patch later.

--
Best,
Xuneng
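
For readers following the thread, the idea quoted above of passing a record's contiguous bytes through the queue "with minor pointer fixing" can be pictured roughly as below. This is only a sketch under stated assumptions, not the patch's code: `DemoRecord`, `demo_record_create` and `demo_record_attach` are hypothetical names, the struct is not a PostgreSQL structure, and the caller is assumed to copy the bytes itself (for example with `memcpy`) in place of a real shm_mq transfer.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a decoded record: a fixed header followed by
 * its payload in one contiguous allocation.
 */
typedef struct DemoRecord
{
    size_t      total_size; /* header plus payload, in bytes */
    size_t      main_len;   /* payload length */
    size_t      main_off;   /* payload offset from the start of the record */
    char       *main_data;  /* absolute pointer, valid only where it was set */
} DemoRecord;

/* Producer side: build the record as a single contiguous chunk. */
static DemoRecord *
demo_record_create(const char *payload, size_t len)
{
    size_t      total = sizeof(DemoRecord) + len;
    DemoRecord *rec = malloc(total);

    if (rec == NULL)
        return NULL;
    rec->total_size = total;
    rec->main_len = len;
    rec->main_off = sizeof(DemoRecord);
    rec->main_data = (char *) rec + rec->main_off;
    memcpy(rec->main_data, payload, len);
    return rec;
}

/*
 * Consumer side: the bytes arrived at a new address, so the stored pointer
 * still refers to the producer's copy. Recompute it from the offset after
 * checking that the header is consistent with the number of bytes received.
 * The buffer is assumed to be suitably aligned for DemoRecord.
 */
static DemoRecord *
demo_record_attach(void *buf, size_t nbytes)
{
    DemoRecord *rec = buf;

    if (nbytes < sizeof(DemoRecord) || rec->total_size != nbytes)
        return NULL;
    if (rec->main_off > nbytes || rec->main_len > nbytes - rec->main_off)
        return NULL;
    rec->main_data = (char *) rec + rec->main_off;
    return rec;
}
```

Storing an offset next to the pointer is one way to make the fix-up cheap: the consumer does a single addition per pointer instead of decoding or deserializing the record again.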