Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1umd5E-00170t-Tj for pgsql-hackers@arkaria.postgresql.org; Thu, 14 Aug 2025 18:53:53 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1umd5C-008tvL-U6 for pgsql-hackers@arkaria.postgresql.org; Thu, 14 Aug 2025 18:53:51 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1umd5B-008tvD-P4 for pgsql-hackers@lists.postgresql.org; Thu, 14 Aug 2025 18:53:50 +0000 Received: from fhigh-b8-smtp.messagingengine.com ([202.12.124.159]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1umd58-000mFZ-31 for pgsql-hackers@lists.postgresql.org; Thu, 14 Aug 2025 18:53:49 +0000 Received: from phl-compute-03.internal (phl-compute-03.internal [10.202.2.43]) by mailfhigh.stl.internal (Postfix) with ESMTP id 26DAC7A00D2; Thu, 14 Aug 2025 14:53:44 -0400 (EDT) Received: from phl-mailfrontend-01 ([10.202.2.162]) by phl-compute-03.internal (MEProxy); Thu, 14 Aug 2025 14:53:44 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=anarazel.de; h= cc:cc:content-type:content-type:date:date:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:subject :subject:to:to; s=fm3; t=1755197624; x=1755284024; bh=s4BbuqNNHU 0NxouxcfyR9+Bv4FtoxcB/1Dh5AF79rrw=; b=j5O729sZUrWxiYxbddnJb7ZmzD GPYVLqBdBm2FLZYxprLifreNbkUoJ4RozFsZs4Hmth0ts+fUTjRybbJKEOfrKMoe aJtGc3nLZX6DfKQJk2cLMWo2GCbslEzidDuNoUKSgdEdKgRsgkGQfHC/yq8Hcbtt nUE3cXxdKtgOCbXVC/wbkCYINJOw+12oiJBqEPXkw1T1Sf+jNRJbLMvOXk9zInK8 6A6QLheUkxJqZvc0JuDQL6SznEGZA9yyBkWLN97kzvUr4pEsEQsnS9g8Ccx13M9k jMQexaj4OQllwlwFXx78RlG/Lbqx66y2L/KmQsqGzn9vQFkwunaTgnTP/o0Q== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-type:content-type:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:subject:subject:to :to:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s=fm3; t= 1755197624; x=1755284024; bh=s4BbuqNNHU0NxouxcfyR9+Bv4FtoxcB/1Dh 5AF79rrw=; b=T16ofcmfH8ZuHIQNQqvNq+Lx8PAlJDCiqEhXAz/nQF2VSn8Vz/D QlzyJbS+9LKiaCAroCj5fGWwTnPOG+y+5rU79CV1VmjKsJep7iu7UP/WhZ0oz0Ky cLq8toizXmb3woyn/FqmWgV599Qd+duAtF321+4M3gdYF9EDr/44haaq0z4mrbkq EgBBPsf/JM0C9zOwI5i81RIhEyGKRHys9NiRf/2uIaPdaL1T+a5aYJMfOZ5azfEy QgYfp+sb3iJc3G/cnza3CTH8WwzaayPpA0WTMZwk9ZmEB87eg3u7YFhaVVbGMNDo Bzf4qYx2pm9/nYoFcRPhtliy4YWVc3C/h6w== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeeffedrtdefgddugedukeehucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfurfetoffkrfgpnffqhgenuceu rghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmnecujf gurhepfffhvfevuffkfhggtggujgesthdtsfdttddtvdenucfhrhhomheptehnughrvghs ucfhrhgvuhhnugcuoegrnhgurhgvshesrghnrghrrgiivghlrdguvgeqnecuggftrfgrth htvghrnhepfeffgfelvdffgedtveelgfdtgefghfdvkefggeetieevjeekteduleevjefh ueegnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomheprg hnughrvghssegrnhgrrhgriigvlhdruggvpdhnsggprhgtphhtthhopedutddpmhhouggv pehsmhhtphhouhhtpdhrtghpthhtohepphhgsegsohifthdrihgvpdhrtghpthhtohepkh hnihiihhhnihhksehgrghrrhgvthdrrhhupdhrtghpthhtohepsgihrghvuhiikedusehg mhgrihhlrdgtohhmpdhrtghpthhtohepughilhhiphgsrghlrghuthesghhmrghilhdrtg homhdprhgtphhtthhopehmvghlrghnihgvphhlrghgvghmrghnsehgmhgrihhlrdgtohhm pdhrtghpthhtoheprhhosggvrhhtmhhhrggrshesghhmrghilhdrtghomhdprhgtphhtth hopehthhhomhgrshdrmhhunhhrohesghhmrghilhdrtghomhdprhgtphhtthhopehpghhs qhhlqdhhrggtkhgvrhhssehlihhsthhsrdhpohhsthhgrhgvshhqlhdrohhrghdprhgtph htthhopehgkhhokhholhgrthhoshesphhrohhtohhnmhgrihhlrdgtohhm X-ME-Proxy: Feedback-ID: id4a34324:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Thu, 14 Aug 2025 14:53:42 -0400 (EDT) Date: Thu, 14 Aug 2025 14:53:42 -0400 From: Andres Freund To: Peter Geoghegan Cc: Tomas Vondra , Thomas Munro , Nazir Bilal Yavuz , Robert Haas , Melanie Plageman , PostgreSQL Hackers , Georgios , Konstantin Knizhnik , Dilip Kumar Subject: Re: index prefetching Message-ID: <6wyxbnry2unm3kbcu2sabhzhs7baoedlg77xqm42chpofjq45g@igst42zpl7ok> References: <6cb6109d-71d6-490c-8056-d8885081b008@vondra.me> <7c308720-5cd4-4508-8cc0-6e55a6352731@vondra.me> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Hi, On 2025-08-14 14:44:44 -0400, Peter Geoghegan wrote: > On Thu Aug 14, 2025 at 1:57 PM EDT, Peter Geoghegan wrote: > > The only interesting thing about the flame graph is just how little > > difference there seems to be (at least for this particular perf event > > type). > > I captured method_io_uring.c DEBUG output from running each query in the > server log, in the hope that it would shed some light on what's really going > on here. I think that it just might. > > I count a total of 12,401 distinct sleeps for the sequential/slow backwards > scan test case: > > $ grep -E "wait_one with [1-9][0-9]* sleeps" sequential.txt | head > 2025-08-14 14:35:03.278 EDT [2516983][client backend] [[unknown]][0/1:0] DEBUG: 00000: wait_one with 1 sleeps > 2025-08-14 14:35:03.278 EDT [2516983][client backend] [[unknown]][0/1:0] DEBUG: 00000: wait_one with 1 sleeps > 2025-08-14 14:35:03.278 EDT [2516983][client backend] [[unknown]][0/1:0] DEBUG: 00000: wait_one with 1 sleeps > 2025-08-14 14:35:03.278 EDT [2516983][client backend] [[unknown]][0/1:0] DEBUG: 00000: wait_one with 1 sleeps > 2025-08-14 14:35:03.278 EDT [2516983][client backend] [[unknown]][0/1:0] DEBUG: 00000: wait_one with 1 sleeps > 2025-08-14 14:35:03.278 EDT [2516983][client backend] [[unknown]][0/1:0] DEBUG: 00000: wait_one with 1 sleeps > 2025-08-14 14:35:03.279 EDT [2516983][client backend] [[unknown]][0/1:0] DEBUG: 00000: wait_one with 1 sleeps > 2025-08-14 14:35:03.279 EDT [2516983][client backend] [[unknown]][0/1:0] DEBUG: 00000: wait_one with 1 sleeps > 2025-08-14 14:35:03.279 EDT [2516983][client backend] [[unknown]][0/1:0] DEBUG: 00000: wait_one with 1 sleeps > 2025-08-14 14:35:03.279 EDT [2516983][client backend] [[unknown]][0/1:0] DEBUG: 00000: wait_one with 1 sleeps > $ grep -E "wait_one with [1-9][0-9]* sleeps" sequential.txt | awk '{ total += $11 } END { print total }' > 12401 > > But there are only 3 such sleeps seen when the random backwards scan query is > run -- which might begin to explain the mystery of why it runs so much faster: > > $ grep -E "wait_one with [1-9][0-9]* sleeps" random.txt | awk '{ total += $11 } END { print total }' > 104 I think this is just an indicator of being IO bound. That message is output whenever we have to wait for IO to finish. So if one workload prints that a 12k times and another 104 times, that's because the latter didn't have to wait for IO to complete, because it already had completed by the time we needed the IO to have finished to continue. Factors potentially leading to slower IO: - sometimes random IO *can* be faster for SSDs, because it allows different flash chips to work concurrently, rather than being bound by the speed of one one flash chip - it's possible that with your SSD the sequential IO leads to more IO combining. Larger IOs always have a higher latency than smaller IOs - but obviously fewer IOs are needed. The increased latency may be bad enough for your access pattern to trigger more waits. It's *not* necessarily enough to just lower io_combine_limit, the OS also can do combining. I'd see what changes if you temporarily reduce /sys/block/nvme6n1/queue/max_sectors_kb to a smaller size. Could you show iostat for both cases? Greetings, Andres Freund