Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rafve-00Autb-1m for pgsql-hackers@arkaria.postgresql.org; Thu, 15 Feb 2024 17:53:46 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1rafvb-00EoYy-0B for pgsql-hackers@arkaria.postgresql.org; Thu, 15 Feb 2024 17:53:43 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rafva-00EoYo-EF for pgsql-hackers@lists.postgresql.org; Thu, 15 Feb 2024 17:53:42 +0000 Received: from mail-vs1-xe31.google.com ([2607:f8b0:4864:20::e31]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1rafvW-006uwU-Uh for pgsql-hackers@lists.postgresql.org; Thu, 15 Feb 2024 17:53:40 +0000 Received: by mail-vs1-xe31.google.com with SMTP id ada2fe7eead31-4676a37e2c4so386687137.1 for ; Thu, 15 Feb 2024 09:53:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bowt-ie.20230601.gappssmtp.com; s=20230601; t=1708019618; x=1708624418; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=qQ6pvAyC7v9qYPVDHK5J+1KeR4dvqbPBPriYgnx47bw=; b=axxcNfwVTyWUnXZGl44mogBnCSZB/EvRzlQfxN6u6Igh25ghYeBRF5oRyDbN+ZPlk6 WdtCpvugfPpqCLecKelDN+i0771fRR1ulrs5KVcr/0wdjB7/snJA4JQTL00ATyX9soTl 3xlW/MaLI3TGdix9/2hAa0fcQj3v0ZvRmNPpOlrueaBTX5LSqVfzXLCyNxeI8jw6OWPZ Y/XILnNPGPcwC0jlv/baiTCmlym6b2nybMERmYA2ONpl3Y4B3e1pSvL0WEuEumXU/Y5M ROs6v9O+V6q1COEJMQvFTUChRTmgYEcMNFIt9AhuM4SvD6zTak7eBkWGLeK6r2ul5gwO Rglg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708019618; x=1708624418; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=qQ6pvAyC7v9qYPVDHK5J+1KeR4dvqbPBPriYgnx47bw=; b=mgobiiMzmZ80oOZakAzllqI+jxwBW6GZrQaoCiwQWX+nHSkNm68oY6FlAIqASK9EqA zE+3nVtCoW0wxw4lobhvhx3uBJll4ikNKjOGm3bCDAQK8pnnjQ5BuDYasfsJWJtpstmQ p0M2Wq3cA1X+nBr62yapfmdCmspNmqOYgV4EsCc55Y3gp/3lYE0QTqVS93wLcAzxmqtT orP7ly2Ycyw3d8oyOmfacw1yFmwYVFX7OneqEOqm/apl5UJT5vTw1az8K5+qZPIfdc2q a2vZsuWLSZiwru6vYDZiCG6SmoTauumMxWuaBxFO7gwqJGEoRARIq95Bn1FFIsGZMxbj 6dTg== X-Forwarded-Encrypted: i=1; AJvYcCUX1EYPaCPWabgh928VhHP+eNtn082C3vQDEBS0grzh961OOWUz9ZSVKqfFtUhU7WEHKMCMlxnZMLbOtWAxqZELbR4NCdpb3EH0r6ZIZqEhYmBI X-Gm-Message-State: AOJu0YzkvBPx34wm9/Yesan4bNgc+iFYXPcva3P6sECH7d5yDLZAXwuK 4Z4VxcExzLHfjHGfg7CM1FA91tjKZ+rMKMB1tPRpOp2Oeliw6g1xad0aA1xXPGGC7lukuflYjTB OuUwJlz/Y9CgyT2FZL9uh+eRcGLsaF1XmAKXTAw== X-Google-Smtp-Source: AGHT+IHLJNof3eqkHR0mB69F9wIOrR790O0bMsFWzbhWB9HxahVlugWKy5uX3zFJN7dIoBg80xJBhq0v53rmFUwzMIE= X-Received: by 2002:a05:6102:50a3:b0:46c:fb37:393b with SMTP id bl35-20020a05610250a300b0046cfb37393bmr3351154vsb.11.1708019617701; Thu, 15 Feb 2024 09:53:37 -0800 (PST) MIME-Version: 1.0 References: <8ec36f51-b863-60e3-20e2-b9c981c5ce5e@enterprisedb.com> <56176b8d-956c-487e-ab09-310db4581c07@enterprisedb.com> <4867452a-b853-4813-a6da-9bb06a336f8b@enterprisedb.com> <4f5f16ef-df1e-4e09-9b3f-2e0961ab5117@enterprisedb.com> <4736207c-8ea6-40cb-ac52-41af00b58bbc@enterprisedb.com> In-Reply-To: From: Peter Geoghegan Date: Thu, 15 Feb 2024 12:53:10 -0500 Message-ID: Subject: Re: index prefetching To: Tomas Vondra Cc: Melanie Plageman , Robert Haas , Andres Freund , PostgreSQL Hackers , Georgios , Thomas Munro , Konstantin Knizhnik , Dilip Kumar Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Thu, Feb 15, 2024 at 12:26=E2=80=AFPM Tomas Vondra wrote: > I may be missing something, but it seems fairly self-evident to me an > entry at the beginning of an index page won't get prefetched (assuming > the page-at-a-time thing). Sure, if the first item on the page is also the first item that we need the scan to return (having just descended the tree), then it won't get prefetched under a scheme that sticks with the current page-at-a-time behavior (at least in v1). Just like when the first item that we need the scan to return is from the middle of the page, or more towards the end of the page. It is of course also true that we can't prefetch the next page's first item until we actually visit the next page -- clearly that's suboptimal. Just like we can't prefetch any other, later tuples from the next page (until such time as we have determined for sure that there really will be a next page, and have called _bt_readpage for that next page.) This is why I don't think that the tuples with lower page offset numbers are in any way significant here. The significant part is whether or not you'll actually need to visit more than one leaf page in the first place (plus the penalty from not being able to reorder the work across page boundaries in your initial v1 of prefetching). > If I understand your point about boundary cases / suffix truncation, > that helps us by (a) picking the split in a way to minimize a single key > spanning multiple pages, if possible and (b) increasing the number of > entries that fit onto a single index page. More like it makes the boundaries between leaf pages (i.e. high keys) align with the "natural boundaries of the key space". Simple point queries should practically never require more than a single leaf page access as a result. Even somewhat complicated index scans that are reasonably selective (think tens to low hundreds of matches) don't tend to need to read more than a single leaf page match, at least with equality type scan keys for the index qual. > That's certainly true / helpful, and it makes the "first entry" issue > much less common. But the issue is still there. Of course, this says > nothing about the importance of the issue - the impact may easily be so > small it's not worth worrying about. Right. And I want to be clear: I'm really *not* sure how much it matters. I just doubt that it's worth worrying about in v1 -- time grows short. Although I agree that we should commit a v1 that leaves the door open to improving matters in this area in v2. > One case I've been thinking about is sorting using index, where we often > read large part of the index. That definitely seems like a case where reordering work/desynchronization of the heap and index scans might be relatively important. > > I think that there is no question that this will need to not > > completely disable kill_prior_tuple -- I'd be surprised if one single > > person disagreed with me on this point. There is also a more nuanced > > way of describing this same restriction, but we don't necessarily need > > to agree on what exactly that is right now. > > > > Even for the page-at-a-time approach? Or are you talking about the v2? I meant that the current kill_prior_tuple behavior isn't sacred, and can be revised in v2, for the benefit of lifting the restriction on prefetching. But that's going to involve a trade-off of some kind. And not a particularly simple one. > Yeah. The basic idea was that by moving this above index AM it will work > for all indexes automatically - but given the current discussion about > kill_prior_tuple, locking etc. I'm not sure that's really feasible. > > The index AM clearly needs to have more control over this. Cool. I think that that makes the layering question a lot clearer, then. -- Peter Geoghegan