From: Xuneng Zhou
Date: Wed, 11 Mar 2026 23:11:23 +0800
Subject: Re: Streamify more code paths
To: Andres Freund
Cc: Michael Paquier, pgsql-hackers, Nazir Bilal Yavuz

On Wed, Mar 11, 2026 at 9:37 AM Xuneng Zhou wrote:
>
> Hi Andres,
>
> On Wed, Mar 11, 2026 at 7:04 AM Andres Freund wrote:
> >
> > Hi,
> >
> > On 2026-03-10 19:28:29 +0900, Michael Paquier wrote:
> > > On Tue, Mar 10, 2026 at 02:06:12PM +0800, Xuneng Zhou wrote:
> > > > Here’s v5 of the patchset. The wal_logging_large patch has been
> > > > removed, as no performance gains were observed in the benchmark runs.
> > >
> > > Looking at the numbers you are posting, it is harder to get excited
> > > about the hash, gin, bloom_vacuum and wal_logging.
> >
> > It's perhaps worth emphasizing that, to allow real world usage of direct IO,
> > we'll need streaming implementation for most of these. Also, on windows the OS
> > provided readahead is ... not aggressive, so you'll hit IO stalls much more
> > frequently than you'd on linux (and some of the BSDs).
> >
> > It might be a good idea to run the benchmarks with debug_io_direct=data.
> > That'll make them very slow, since the write side doesn't yet use AIO and thus
> > will do a lot of synchronous writes, but it should still allow to evaluate the
> > gains from using read stream.
> >
> >
> > The other thing that's kinda important to evaluate read streams is to test on
> > higher latency storage, even without direct IO. Many workloads are not at all
> > benefiting from AIO when run on a local NVMe SSD with < 10us latency, but are
> > severely IO bound when run on a cloud storage disk with 0.5ms - 4ms latency.
> >
> >
> > To be able to test such higher latencies locally, I've found it quite useful
> > to use dm_delay above a fast disk. See [1].
>
> Thanks for the tips! I currently don’t have access to a machine or
> cloud instance with slower SSDs or HDDs that have higher latency. I’ll
> try running the benchmark with debug_io_direct=data and dm_delay, as
> you suggested, to see if the results vary.
>
> >
> > > The worker method seems more efficient, may show that we are out of noise
> > > level.
> >
> > I think that's more likely to show that memory bandwidth, probably due to
> > checksum computations, is a factor. The memory copy (from the kernel page
> > cache, with buffered IO) and the checksum computations (when checksums are
> > enabled) are parallelized by worker, but not by io_uring.
> >
> >
> > Greetings,
> >
> > Andres Freund
> >
> >
> > [1]
> >
> > https://docs.kernel.org/admin-guide/device-mapper/delay.html
> >
> > Assuming /dev/md0 is mounted to /srv, and a delay of 1ms should be
> > introduced for it:
> >
> > umount /srv && dmsetup create delayed --table "0 $(blockdev --getsz /dev/md0) delay /dev/md0 0 1" /dev/md0 && mount /dev/mapper/delayed /srv/
> >
> > To update the amount of delay to 3ms the following can be used:
> > dmsetup suspend delayed && dmsetup reload delayed --table "0 $(blockdev --getsz /dev/md0) delay /dev/md0 0 3" /dev/md0 && dmsetup resume delayed
> >
> > (I will often just update the delay to 0 for comparison runs, as that
> > doesn't require remounting)
>

With debug_io_direct=data and dm_delay, the results look quite promising!

medium size / io_uring

gin_vacuum_medium  base= 1619.9ms  patch= 301.8ms  5.37x ( 81.4%)  (reads=1571→947, io_time=1524.86→207.48ms)

The average runtime increases significantly after adding the manual
device delay, so it will take some time to complete all the test runs.
I was also busy with something else today... Once the runs are
finished, I’ll share the results and the script to reproduce them.

-- 
Best,
Xuneng
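
For readers checking the result line above, here is a minimal sketch of how the "5.37x ( 81.4%)" figures can be reproduced from the base and patch timings. It assumes the factor is base/patch and the percentage is the relative reduction in runtime; that reading matches the posted numbers, but the benchmark script has not been shared yet, so the function name and formula below are illustrative and not taken from it.

```python
def speedup(base_ms: float, patch_ms: float) -> tuple[float, float]:
    """Return (speedup factor, percent reduction in runtime).

    Assumed definitions, not taken from the benchmark script:
      factor    = base / patch
      reduction = (base - patch) / base * 100
    """
    factor = base_ms / patch_ms
    reduction_pct = (base_ms - patch_ms) / base_ms * 100.0
    return factor, reduction_pct


# Timings from the gin_vacuum_medium / io_uring line in this mail.
factor, reduction = speedup(1619.9, 301.8)
print(f"{factor:.2f}x ({reduction:.1f}%)")  # prints "5.37x (81.4%)"
```

Applying the same function to the reported io_time values (1524.86ms and 207.48ms) gives roughly a 7.35x reduction in time spent in IO, which suggests most of the runtime gain comes from the IO wait itself.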