Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1q85Ig-0007jd-CX for pgsql-hackers@arkaria.postgresql.org; Sat, 10 Jun 2023 20:35:06 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.92) (envelope-from ) id 1q85Ie-0004hY-W3 for pgsql-hackers@arkaria.postgresql.org; Sat, 10 Jun 2023 20:35:04 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1q85Ie-0004hM-MR for pgsql-hackers@lists.postgresql.org; Sat, 10 Jun 2023 20:35:04 +0000 Received: from wout5-smtp.messagingengine.com ([64.147.123.21]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1q85Ib-001TJP-JX for pgsql-hackers@lists.postgresql.org; Sat, 10 Jun 2023 20:35:04 +0000 Received: from compute2.internal (compute2.nyi.internal [10.202.2.46]) by mailout.west.internal (Postfix) with ESMTP id 743553200201; Sat, 10 Jun 2023 16:34:58 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute2.internal (MEProxy); Sat, 10 Jun 2023 16:34:58 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=anarazel.de; h= cc:cc:content-type:content-type:date:date:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:sender :subject:subject:to:to; s=fm1; t=1686429298; x=1686515698; bh=4N 9o92ZD0aniM8bISAxFQIGgGpkvLRUjuyzoFTyvYnM=; b=nVfVp3tZBDVY8n9Ssf Kif36HqplzF2d5t1p7xPdaPH2m4waArFA4qf+CpyjUjlmi/LslpfzfAdx+5h0y+6 ya8WcX3TJe9BptvzcbQbWOORE1sfBUaAA5W+yGzNAJzxmNKNxHqOPHXtWPm4ufDN r//eYspCpmkeYWd7VJR/rl3tHQnlZQstErm57mvp8BMGNuGnan7ZEuFzaUaSUDrh ge6Z8qOOILM5oyBmQQnISIvvoqEpTufZCIfU1ApiqmkYvg3B07Dr/CZ9ZTRiXwUZ 4ppAx/EYqtLQyXJSl503dN4f03baJThs4jJZbquGjr23CLYqhXpeyVkqFKblOIiv OMcg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-type:content-type:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:sender:subject :subject:to:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender :x-sasl-enc; s=fm1; t=1686429298; x=1686515698; bh=4N9o92ZD0aniM 8bISAxFQIGgGpkvLRUjuyzoFTyvYnM=; b=wq5ySAELyXSDYHD9kIw3F4C63WE9M pA/nckbt0MBa2blrBeLRardV9x6vo42XulKRPfyNHX9KvjSzMAQlsLq7WRCeoTQM xR4Eoqo2lOS1mpFhs9xtQtXATANWssld5uP1yuNBevDYme9m0YYmUm040yM+bhTj g77LNNXnu9sjQzv9rabzc2p+vQ7NzGauv38bAkd5+50BOmTKzc3xsI7JpTPX+fGE 4pPEAy5Ac8TINlWigKkA++nYjKzUYsivwLyR9TV10t2u3LtpbT8JNLwWN8ixKyvX +a4gKnX0b8Dia542ePPt5NN20rP+3NP9kCTgzMUTS8q7f3BmmooSxxYqw== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvhedrgedutddgudehfecutefuodetggdotefrod ftvfcurfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfgh necuuegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmd enucfjughrpeffhffvvefukfhfgggtuggjsehttdertddttddvnecuhfhrohhmpeetnhgu rhgvshcuhfhrvghunhguuceorghnughrvghssegrnhgrrhgriigvlhdruggvqeenucggtf frrghtthgvrhhnpedvffefvefhteevffegieetfefhtddvffejvefhueetgeeludehteev udeitedtudenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhroh hmpegrnhgurhgvshesrghnrghrrgiivghlrdguvg X-ME-Proxy: Feedback-ID: id4a34324:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Sat, 10 Jun 2023 16:34:57 -0400 (EDT) Date: Sat, 10 Jun 2023 13:34:56 -0700 From: Andres Freund To: Tomas Vondra Cc: PostgreSQL Hackers , Georgios Subject: Re: index prefetching Message-ID: <20230610203456.5gancfekm4pj4pbs@awork3.anarazel.de> References: <20230609000600.syqy447e6metnvyj@awork3.anarazel.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Hi, On 2023-06-09 12:18:11 +0200, Tomas Vondra wrote: > > > >> 2) prefetching from executor > >> > >> Another question is whether the prefetching shouldn't actually happen > >> even higher - in the executor. That's what Andres suggested during the > >> unconference, and it kinda makes sense. That's where we do prefetching > >> for bitmap heap scans, so why should this happen lower, right? > > > > Yea. I think it also provides potential for further optimizations in the > > future to do it at that layer. > > > > One thing I have been wondering around this is whether we should not have > > split the code for IOS and plain indexscans... > > > > Which code? We already have nodeIndexscan.c and nodeIndexonlyscan.c? Or > did you mean something else? Yes, I meant that. > >> 4) per-leaf prefetching > >> > >> The code is restricted only prefetches items from one leaf page. If the > >> index scan needs to scan multiple (many) leaf pages, we have to process > >> the first leaf page first before reading / prefetching the next one. > >> > >> I think this is acceptable limitation, certainly for v0. Prefetching > >> across multiple leaf pages seems way more complex (particularly for the > >> cases using pairing heap), so let's leave this for the future. > > > > Hm. I think that really depends on the shape of the API we end up with. If we > > move the responsibility more twoards to the executor, I think it very well > > could end up being just as simple to prefetch across index pages. > > > > Maybe. I'm open to that idea if you have idea how to shape the API to > make this possible (although perhaps not in v0). I'll try to have a look. > > I'm a bit confused by some of these numbers. How can OS-level prefetching lead > > to massive prefetching in the alread cached case, e.g. in tpch q06 and q08? > > Unless I missed what "xeon / cached (speedup)" indicates? > > > > I forgot to explain what "cached" means in the TPC-H case. It means > second execution of the query, so you can imagine it like this: > > for q in `seq 1 22`; do > > 1. drop caches and restart postgres Are you doing it in that order? If so, the pagecache can end up being seeded by postgres writing out dirty buffers. > 2. run query $q -> uncached > > 3. run query $q -> cached > > done > > So the second execution has a chance of having data in memory - but > maybe not all, because this is a 100GB data set (so ~200GB after > loading), but the machine only has 64GB of RAM. > > I think a likely explanation is some of the data wasn't actually in > memory, so prefetching still did something. Ah, ok. > > I think it'd be good to run a performance comparison of the unpatched vs > > patched cases, with prefetching disabled for both. It's possible that > > something in the patch caused unintended changes (say spilling during a > > hashagg, due to larger struct sizes). > > > > That's certainly a good idea. I'll do that in the next round of tests. I > also plan to do a test on data set that fits into RAM, to test "properly > cached" case. Cool. It'd be good to measure both the case of all data already being in s_b (to see the overhead of the buffer mapping lookups) and the case where the data is in the kernel pagecache (to see the overhead of pointless posix_fadvise calls). Greetings, Andres Freund