Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1q7Pdt-0005cT-Oz for pgsql-hackers@arkaria.postgresql.org; Fri, 09 Jun 2023 00:06:14 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.92) (envelope-from ) id 1q7Pds-00009s-Kj for pgsql-hackers@arkaria.postgresql.org; Fri, 09 Jun 2023 00:06:12 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1q7Pdr-00009U-Ia for pgsql-hackers@lists.postgresql.org; Fri, 09 Jun 2023 00:06:12 +0000 Received: from wout2-smtp.messagingengine.com ([64.147.123.25]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1q7Pdm-00178y-5E for pgsql-hackers@lists.postgresql.org; Fri, 09 Jun 2023 00:06:11 +0000 Received: from compute6.internal (compute6.nyi.internal [10.202.2.47]) by mailout.west.internal (Postfix) with ESMTP id E8755320014C; Thu, 8 Jun 2023 20:06:02 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute6.internal (MEProxy); Thu, 08 Jun 2023 20:06:03 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=anarazel.de; h= cc:cc:content-type:content-type:date:date:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:sender :subject:subject:to:to; s=fm1; t=1686269162; x=1686355562; bh=W4 WOC5+wIJNoCA3v6wtnMPfxMmkIOoZOG8X4sKBm7x0=; b=hhkzfs7KpBiHNA2gJf nktaJ4c1z5nod4f1MhaNi0N8pHWKQ9uwTJLl6mqCHgTguMcVmuQ9BIffOhu8fZam +BZTinmfOh0IhfyN6JB74QLTEa+XKMT1/fIfzuGvrw0gjoWvkWuXzw02qHj8lI/T KEBVGfvk4igLvp3I+9frZxQcarSzfb6hX86YqmzntOXZx5Yht0RxtEtiOn3hp5jP cTfzRsf834siwlwNZHxQY9yf2KfMHio0k6s4PJqxilvskOfkdJLEvQyM6GbirMps i2WbGAGvJ5mb9wdN40dPlCvU34msAfCTbQeUTwncpMuGSJZZbLA304J1t3L9m0m5 L/LA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-type:content-type:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:sender:subject :subject:to:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm1; t=1686269162; x=1686355562; bh=W4WOC5+wIJNoC A3v6wtnMPfxMmkIOoZOG8X4sKBm7x0=; b=RtxxtIWnysps+OQ84oz9i+hDkaDBV qfsbxZpPo9155qfUNwNs8juBetlMly0JmnRPlnnVHlw+lTnxls8CMOZzJqc/UmVC Lrn2qLlRtDxNNxqbqfBAuZHSt1eEEaTr01Js+3ZdljX78rK46Hed+Rvjc/6pSOjR 1mwtheh8JZnHOKhMRTFXZA9liVmfDWwvU7cr5wBqLg5ro5SFUCDG+O7iRueSuHBj KhtHhnokP7POlIQWt2bQH10a2HGH6GrovqwyoLTa7ujNQAaqa9pJWaBMVFSpgFMs 3HC2j4q7u1V9fpwWLnERJg5/SozsapoYaITV6cqE9cILiuojouVhFRBzQ== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvhedrgedtjedgfedtucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne cujfgurhepfffhvfevuffkfhggtggujgesthdtredttddtvdenucfhrhhomheptehnughr vghsucfhrhgvuhhnugcuoegrnhgurhgvshesrghnrghrrgiivghlrdguvgeqnecuggftrf grthhtvghrnhepvdfffeevhfetveffgeeiteefhfdtvdffjeevhfeuteegleduheetvedu ieettddunecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomh eprghnughrvghssegrnhgrrhgriigvlhdruggv X-ME-Proxy: Feedback-ID: id4a34324:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Thu, 8 Jun 2023 20:06:02 -0400 (EDT) Date: Thu, 8 Jun 2023 17:06:00 -0700 From: Andres Freund To: Tomas Vondra Cc: PostgreSQL Hackers , Georgios Subject: Re: index prefetching Message-ID: <20230609000600.syqy447e6metnvyj@awork3.anarazel.de> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Hi, On 2023-06-08 17:40:12 +0200, Tomas Vondra wrote: > At pgcon unconference I presented a PoC patch adding prefetching for > indexes, along with some benchmark results demonstrating the (pretty > significant) benefits etc. The feedback was quite positive, so let me > share the current patch more widely. I'm really excited about this work. > 1) pairing-heap in GiST / SP-GiST > > For most AMs, the index state is pretty trivial - matching items from a > single leaf page. Prefetching that is pretty trivial, even if the > current API is a bit cumbersome. > > Distance queries on GiST and SP-GiST are a problem, though, because > those do not just read the pointers into a simple array, as the distance > ordering requires passing stuff through a pairing-heap :-( > > I don't know how to best deal with that, especially not in the simple > API. I don't think we can "scan forward" stuff from the pairing heap, so > the only idea I have is actually having two pairing-heaps. Or maybe > using the pairing heap for prefetching, but stashing the prefetched > pointers into an array and then returning stuff from it. > > In the patch I simply prefetch items before we add them to the pairing > heap, which is good enough for demonstrating the benefits. I think it'd be perfectly fair to just not tackle distance queries for now. > 2) prefetching from executor > > Another question is whether the prefetching shouldn't actually happen > even higher - in the executor. That's what Andres suggested during the > unconference, and it kinda makes sense. That's where we do prefetching > for bitmap heap scans, so why should this happen lower, right? Yea. I think it also provides potential for further optimizations in the future to do it at that layer. One thing I have been wondering around this is whether we should not have split the code for IOS and plain indexscans... > 4) per-leaf prefetching > > The code is restricted only prefetches items from one leaf page. If the > index scan needs to scan multiple (many) leaf pages, we have to process > the first leaf page first before reading / prefetching the next one. > > I think this is acceptable limitation, certainly for v0. Prefetching > across multiple leaf pages seems way more complex (particularly for the > cases using pairing heap), so let's leave this for the future. Hm. I think that really depends on the shape of the API we end up with. If we move the responsibility more twoards to the executor, I think it very well could end up being just as simple to prefetch across index pages. > 5) index-only scans > > I'm not sure what to do about index-only scans. On the one hand, the > point of IOS is not to read stuff from the heap at all, so why prefetch > it. OTOH if there are many allvisible=false pages, we still have to > access that. And if that happens, this leads to the bizarre situation > that IOS is slower than regular index scan. But to address this, we'd > have to consider the visibility during prefetching. That should be easy to do, right? > Benchmark / TPC-H > ----------------- > > I ran the 22 queries on 100GB data set, with parallel query either > disabled or enabled. And I measured timing (and speedup) for each query. > The speedup results look like this (see the attached PDF for details): > > query serial parallel > 1 101% 99% > 2 119% 100% > 3 100% 99% > 4 101% 100% > 5 101% 100% > 6 12% 99% > 7 100% 100% > 8 52% 67% > 10 102% 101% > 11 100% 72% > 12 101% 100% > 13 100% 101% > 14 13% 100% > 15 101% 100% > 16 99% 99% > 17 95% 101% > 18 101% 106% > 19 30% 40% > 20 99% 100% > 21 101% 100% > 22 101% 107% > > The percentage is (timing patched / master, so <100% means faster, >100% > means slower). > > The different queries are affected depending on the query plan - many > queries are close to 100%, which means "no difference". For the serial > case, there are about 4 queries that improved a lot (6, 8, 14, 19), > while for the parallel case the benefits are somewhat less significant. > > My explanation is that either (a) parallel case used a different plan > with fewer index scans or (b) the parallel query does more concurrent > I/O simply by using parallel workers. Or maybe both. > > There are a couple regressions too, I believe those are due to doing too > much prefetching in some cases, and some of the heuristics mentioned > earlier should eliminate most of this, I think. I'm a bit confused by some of these numbers. How can OS-level prefetching lead to massive prefetching in the alread cached case, e.g. in tpch q06 and q08? Unless I missed what "xeon / cached (speedup)" indicates? I think it'd be good to run a performance comparison of the unpatched vs patched cases, with prefetching disabled for both. It's possible that something in the patch caused unintended changes (say spilling during a hashagg, due to larger struct sizes). Greetings, Andres Freund