Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rai7D-00B5Ox-U9 for pgsql-hackers@arkaria.postgresql.org; Thu, 15 Feb 2024 20:13:52 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1rai7B-00FIXc-PQ for pgsql-hackers@arkaria.postgresql.org; Thu, 15 Feb 2024 20:13:49 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rai79-00FIXT-Kk for pgsql-hackers@lists.postgresql.org; Thu, 15 Feb 2024 20:13:49 +0000 Received: from wout5-smtp.messagingengine.com ([64.147.123.21]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rai75-006vvl-Oz for pgsql-hackers@lists.postgresql.org; Thu, 15 Feb 2024 20:13:46 +0000 Received: from compute6.internal (compute6.nyi.internal [10.202.2.47]) by mailout.west.internal (Postfix) with ESMTP id 388F032001C6; Thu, 15 Feb 2024 15:13:40 -0500 (EST) Received: from mailfrontend1 ([10.202.2.162]) by compute6.internal (MEProxy); Thu, 15 Feb 2024 15:13:41 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=anarazel.de; h= cc:cc:content-transfer-encoding:content-type:content-type:date :date:from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to; s=fm1; t=1708028019; x=1708114419; bh=J2yTBIFxW0bS4P9gt9xGjnH8UV5R649v5MyMxfYhFCc=; b= DcQmSduH5RcujL2hlvEsUOf2Brai9jaV5OhieND4Ll5LiX26LLPaan3o+e8P1kgT tYENXWhTDcy8UKF4BEGLdHIqaqq0hQyPP8mdRiahx2rdDpYP7VS5Xe/IDI8S5MB9 dcK8tKnK0vqrLuds1J04exQhnRHhJE8TzU+PDO71AbpJmYMago+bXQtCqtyFKS/z 6xKIE6ZzpdqqZfic2JX3Dcn81XQQl7Fh9PB1EcIV+t8KT586Ic7F52i9vsD6rueV He7COic3zpIXPk8rKM5IR6wEj9fY2pmEZw6cp001EDARFvjsQI5FkmVwQQhKg0aK eEkPiHj65bYGr1yT5a4k8g== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:content-type:date:date:feedback-id:feedback-id :from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to:x-me-proxy:x-me-proxy :x-me-sender:x-me-sender:x-sasl-enc; s=fm1; t=1708028019; x= 1708114419; bh=J2yTBIFxW0bS4P9gt9xGjnH8UV5R649v5MyMxfYhFCc=; b=D LyWJBJ3h9Po9efo0U5F3o1b5L1khyJtX5stNlqyVEioa42HVYL+4fpYU3Cq6Ck1Y ZyAxM27whGCVeIioCa/JWLUeXjECZjX0jnvCpCQeWfLJOv0oEsHrG6+M2S33Zxbz bAfQ0SEJWF3qY+IqsRvswtMhi/JdDQluMx7QPon2eqZM60NqdoTsicSvE3zLuzCk pmK3drH+NO8y8RddIyYp00BlFfxtIWkQyFyQA84AdmtcU3nZdrH+BIwgMrwwJp+E H1iTVJu9XSzubsTDNfnHhuqBAgncWn1ZSdjVY5qayF9x4uak34dx9+6CXsYx6vk7 eSIJh2gjtCm2ufQ2T7MRw== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvledrvddtgddufeehucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne cujfgurhepfffhvfevuffkfhggtggugfgjsehtkeertddttdejnecuhfhrohhmpeetnhgu rhgvshcuhfhrvghunhguuceorghnughrvghssegrnhgrrhgriigvlhdruggvqeenucggtf frrghtthgvrhhnpefhieetgfelveettdffjeevudejkeejudfgtdevffejledvtdefkeeg gfejhffggeenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhroh hmpegrnhgurhgvshesrghnrghrrgiivghlrdguvg X-ME-Proxy: Feedback-ID: id4a34324:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Thu, 15 Feb 2024 15:13:38 -0500 (EST) Date: Thu, 15 Feb 2024 12:13:37 -0800 From: Andres Freund To: Peter Geoghegan Cc: Tomas Vondra , Melanie Plageman , Robert Haas , PostgreSQL Hackers , Georgios , Thomas Munro , Konstantin Knizhnik , Dilip Kumar Subject: Re: index prefetching Message-ID: <20240215201337.7amzw3hpvng7wphb@awork3.anarazel.de> References: <4f5f16ef-df1e-4e09-9b3f-2e0961ab5117@enterprisedb.com> <4736207c-8ea6-40cb-ac52-41af00b58bbc@enterprisedb.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Hi, On 2024-02-15 12:53:10 -0500, Peter Geoghegan wrote: > On Thu, Feb 15, 2024 at 12:26 PM Tomas Vondra > wrote: > > I may be missing something, but it seems fairly self-evident to me an > > entry at the beginning of an index page won't get prefetched (assuming > > the page-at-a-time thing). > > Sure, if the first item on the page is also the first item that we > need the scan to return (having just descended the tree), then it > won't get prefetched under a scheme that sticks with the current > page-at-a-time behavior (at least in v1). Just like when the first > item that we need the scan to return is from the middle of the page, > or more towards the end of the page. > > It is of course also true that we can't prefetch the next page's > first item until we actually visit the next page -- clearly that's > suboptimal. Just like we can't prefetch any other, later tuples from > the next page (until such time as we have determined for sure that > there really will be a next page, and have called _bt_readpage for > that next page.) > > This is why I don't think that the tuples with lower page offset > numbers are in any way significant here. The significant part is > whether or not you'll actually need to visit more than one leaf page > in the first place (plus the penalty from not being able to reorder > the work across page boundaries in your initial v1 of prefetching). To me this your phrasing just seems to reformulate the issue. In practical terms you'll have to wait for the full IO latency when fetching the table tuple corresponding to the first tid on a leaf page. Of course that's also the moment you had to visit another leaf page. Whether the stall is due to visit another leaf page or due to processing the first entry on such a leaf page is a distinction without a difference. > > That's certainly true / helpful, and it makes the "first entry" issue > > much less common. But the issue is still there. Of course, this says > > nothing about the importance of the issue - the impact may easily be so > > small it's not worth worrying about. > > Right. And I want to be clear: I'm really *not* sure how much it > matters. I just doubt that it's worth worrying about in v1 -- time > grows short. Although I agree that we should commit a v1 that leaves > the door open to improving matters in this area in v2. I somewhat doubt that it's realistic to aim for 17 at this point. We seem to still be doing fairly fundamental architectual work. I think it might be the right thing even for 18 to go for the simpler only-a-single-leaf-page approach though. I wonder if there are prerequisites that can be tackled for 17. One idea is to work on infrastructure to provide executor nodes with information about the number of tuples likely to be fetched - I suspect we'll trigger regressions without that in place. One way to *sometimes* process more than a single leaf page, without having to redesign kill_prior_tuple, would be to use the visibilitymap to check if the target pages are all-visible. If all the table pages on a leaf page are all-visible, we know that we don't need to kill index entries, and thus can move on to the next leaf page Greetings, Andres Freund