Date: Wed, 3 Sep 2025 14:47:25 -0400
From: Andres Freund
To: Peter Geoghegan
Cc: Tomas Vondra, Thomas Munro, Nazir Bilal Yavuz, Robert Haas, Melanie Plageman, PostgreSQL Hackers, Georgios, Konstantin Knizhnik, Dilip Kumar
Subject: Re: index prefetching

Hi,

I spent a fair bit more time analyzing this issue.

On 2025-08-28 21:10:48 -0400, Andres Freund wrote:
> On 2025-08-28 19:57:17 -0400, Peter Geoghegan wrote:
> > On Thu, Aug 28, 2025 at 7:52 PM Tomas Vondra wrote:
> > I'm not sure that Thomas'/your patch to ameliorate the problem on the
> > read stream side is essential here. Perhaps Andres can just take a
> > look at the test case + feature branch, without the extra patches.
> > That way he'll be able to see whatever the immediate problem is, which
> > might be all we need.
>
> It seems caused to a significant degree by waiting at low queue depths. If I
> comment out the stream->distance-- in read_stream_start_pending_read() the
> regression is reduced greatly.
>
> As far as I can tell, after that the process is CPU bound, i.e. IO waits don't
> play a role.

Indeed the actual AIO subsystem is unrelated, from what I can tell:

I hacked up read_stream.c/bufmgr.c to do readahead even if the buffer is in
shared_buffers. With that, the negative performance impact of doing
enable_indexscan_prefetch=1 is of a similar magnitude even if the table is
already entirely in shared buffers. I.e. actual IO is unrelated.

I compared perf stat -ddd output for enable_indexscan_prefetch=0 with
enable_indexscan_prefetch=1. The only real difference is a substantial (~3x)
increase in branch misses.

I then took a perf profile to see where all those misses are from. The first
source is:

> I see a variety for increased CPU usage:
>
> 1) The private ref count infrastructure in bufmgr.c gets a bit slower once
>    more buffers are pinned

The problem mainly seems to be that the branches in the loop at the start of
GetPrivateRefCountEntry() are entirely unpredictable in this workload.

I had an old patch that tried to make it possible to use SIMD for the search,
by using a separate array for the Buffer ids - with that gcc generates fairly
crappy code, but does make the code branchless.

Here that substantially reduces the overhead of doing prefetching. Afterwards
it's not a meaningful source of misses anymore.

> 3) same issue with the resowner tracking

This one is much harder to address:

a) The "key" we are searching for is much wider (16 bytes), making
   vectorization of the search less helpful

b) because we search up to owner->narr instead of a fixed length, the compiler
   wouldn't be able to auto-vectorize anyway

c) the branch misses are partially caused by ResourceOwnerForget()
   "scrambling" the order in the array when forgetting an element

I don't know how to fix this right now. I nevertheless wanted to see how big
the impact of this is, so I just neutered
ResourceOwner{Remember,Forget}{Buffer,BufferIO} - that's obviously not
correct, but suffices to see that the performance difference reduces
substantially.

But not completely, unfortunately.

> But there's some additional difference in performance I don't yet
> understand...

I still don't think I fully understand why the impact of this is so large.
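
To make the "separate array for the Buffer ids" idea above concrete, here is a
minimal sketch. It is not the old patch and not the bufmgr.c code:
find_slot(), N_ENTRIES and the plain int32_t ids are hypothetical stand-ins
chosen for illustration. The assumption is only that the ids live in their own
fixed-length array, so the search can inspect every slot and pick the result
with arithmetic instead of an early-exit branch. Whether a given compiler
actually emits branchless or vectorized code for this is not guaranteed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed length; a constant trip count lets the compiler unroll. */
#define N_ENTRIES 8

/*
 * Return the index of the slot holding 'id', or -1 if no slot holds it.
 * Every slot is inspected; the result is selected with a mask rather than
 * by breaking out of the loop on the first match. If 'id' occurs more than
 * once, the last matching index is returned.
 */
static int
find_slot(const int32_t *arr, int32_t id)
{
    int found = -1;

    assert(arr != NULL);

    for (int i = 0; i < N_ENTRIES; i++)
    {
        int match = (arr[i] == id);   /* 1 on match, 0 otherwise */
        int mask = -match;            /* all ones on match, zero otherwise */

        /* Keep 'found' when mask is zero, take 'i' when mask is all ones. */
        found = (found & ~mask) | (i & mask);
    }
    return found;
}
```

A caller would use the returned index to look up the matching entry in a
second, parallel array holding the rest of the per-entry data. That split is
what keeps the searched array narrow; the 16-byte key in the resowner case is
the opposite situation.
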
The branch misses appear to be the only thing differentiating the two cases,
but with resowners neutralized, the remaining difference in branch misses
seems too large - it's not like the sequence of block numbers is more
predictable without prefetching...

The main increase in branch misses is in index_scan_stream_read_next...

Greetings,

Andres Freund