Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1raiNX-00B6jp-8p for pgsql-hackers@arkaria.postgresql.org; Thu, 15 Feb 2024 20:30:43 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1raiNV-00FTX4-6R for pgsql-hackers@arkaria.postgresql.org; Thu, 15 Feb 2024 20:30:41 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1raiNU-00FTWw-P1 for pgsql-hackers@lists.postgresql.org; Thu, 15 Feb 2024 20:30:40 +0000 Received: from mail-vk1-xa36.google.com ([2607:f8b0:4864:20::a36]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1raiNN-006w30-Ow for pgsql-hackers@lists.postgresql.org; Thu, 15 Feb 2024 20:30:39 +0000 Received: by mail-vk1-xa36.google.com with SMTP id 71dfb90a1353d-4b978e5e240so523470e0c.0 for ; Thu, 15 Feb 2024 12:30:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bowt-ie.20230601.gappssmtp.com; s=20230601; t=1708029032; x=1708633832; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=KJlp/jZw1LpPXPhc6ZE7zkTqK6tV9zqp0e331aivai8=; b=hkZhsTDeXz4k0FzO9JSV1JsSTXofs5Wk+aTkE0pi83Lt2hasii+Bxt0niA5kiGIqj7 CI9bIcnwDLL1SNrYWRc7P9MJIJpthlqe92dEIw/I8gXwTf1mYT0J54DGOvM3WgwX0bu8 it8YilR9KOKsHq+C98tn3IXsb0zuTmv0wejpgcmLAdVVEcHkpvEh2yZIcqdoohzJzdrW LqIsbr84dNoszOSQ6gcWX5uGSAeq+iebYhKjStH+ANbyzD55PjkNl6MOJDSlNCglpsKF NBK+4fD3W9y8z/FVf9nBd7LjjKnvEXrX96gTFstS6P0+WXqrWx8Tt1bSeKZQrpYXZDvR 1UBg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708029032; x=1708633832; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=KJlp/jZw1LpPXPhc6ZE7zkTqK6tV9zqp0e331aivai8=; b=i7rZ/j4y5gUFRrKnZD6WBCbKMdGkLJ1GCTihT2TUQutS+8Q02SVaEI8SLTuaNq7NeQ U5AvlTqb77ypr+VaxG4iiahNdVeKaSnm2X+FsMnJY0hJR+b8QqqvOwe3Ipn23suWiPQw WrH5FKXumXH2tPHDtoPIIoP2MDjDeYVwcWv7LAckaQjxDX5Av5vnhvp7hrSsZGasrmfw ouiLe/7jf8618Y7VF93MuF635NdWqDvS1Lw2/eUxdvNrzJl05pwQlD6X1Nj0cGSaAFOj 0zlsiKoQ7ILM1rC5t8jJ7kpvtgGyV3iVzsj9E5JfmY3xHUeJ3ouVTHUMafM3D/LkIqZA 3GCA== X-Forwarded-Encrypted: i=1; AJvYcCU9JbcOAhoiXhLapHw0rL2zPVVIKPBrmhlaZ83gymkEo00UQzKMR3bzAPytCSfM2bH4CbWlh7edjInGJtxUemVNE/M2FYz2I7oYaToLh21OqgDd X-Gm-Message-State: AOJu0YyPXpPkEZVO2WHRrMEBpOL0kRst6MJ40uBYyBkyDmoah+9qd2AS HU55vtJru4NNnyGD2LBg89hICEtw/odcAuAfTyuMkRa4hL2z+5vtrGZZE1E4xwpi2jux+1Z1oye +wOkqcpm6lXlli4HX3/CYvlDBdMzLe+xslXwZBg== X-Google-Smtp-Source: AGHT+IGBDwezJSWQsDg7shpOHeI3O5e8dPwN8tQ+3rVlLUlCnoztMp/1iqFKYJDdEtb0bvfecXogvrPz6cVehmb9wEw= X-Received: by 2002:a1f:4b07:0:b0:4c0:285e:79a with SMTP id y7-20020a1f4b07000000b004c0285e079amr2501451vka.3.1708029032579; Thu, 15 Feb 2024 12:30:32 -0800 (PST) MIME-Version: 1.0 References: <4f5f16ef-df1e-4e09-9b3f-2e0961ab5117@enterprisedb.com> <4736207c-8ea6-40cb-ac52-41af00b58bbc@enterprisedb.com> <20240215201337.7amzw3hpvng7wphb@awork3.anarazel.de> In-Reply-To: <20240215201337.7amzw3hpvng7wphb@awork3.anarazel.de> From: Peter Geoghegan Date: Thu, 15 Feb 2024 15:30:06 -0500 Message-ID: Subject: Re: index prefetching To: Andres Freund Cc: Tomas Vondra , Melanie Plageman , Robert Haas , PostgreSQL Hackers , Georgios , Thomas Munro , Konstantin Knizhnik , Dilip Kumar Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Thu, Feb 15, 2024 at 3:13=E2=80=AFPM Andres Freund = wrote: > > This is why I don't think that the tuples with lower page offset > > numbers are in any way significant here. The significant part is > > whether or not you'll actually need to visit more than one leaf page > > in the first place (plus the penalty from not being able to reorder > > the work across page boundaries in your initial v1 of prefetching). > > To me this your phrasing just seems to reformulate the issue. What I said to Tomas seems very obvious to me. I think that there might have been some kind of miscommunication (not a real disagreement). I was just trying to work through that. > In practical terms you'll have to wait for the full IO latency when fetch= ing > the table tuple corresponding to the first tid on a leaf page. Of course > that's also the moment you had to visit another leaf page. Whether the st= all > is due to visit another leaf page or due to processing the first entry on= such > a leaf page is a distinction without a difference. I don't think anybody said otherwise? > > > That's certainly true / helpful, and it makes the "first entry" issue > > > much less common. But the issue is still there. Of course, this says > > > nothing about the importance of the issue - the impact may easily be = so > > > small it's not worth worrying about. > > > > Right. And I want to be clear: I'm really *not* sure how much it > > matters. I just doubt that it's worth worrying about in v1 -- time > > grows short. Although I agree that we should commit a v1 that leaves > > the door open to improving matters in this area in v2. > > I somewhat doubt that it's realistic to aim for 17 at this point. That's a fair point. Tomas? > We seem to > still be doing fairly fundamental architectual work. I think it might be = the > right thing even for 18 to go for the simpler only-a-single-leaf-page > approach though. I definitely think it's a good idea to have that as a fall back option. And to not commit ourselves to having something better than that for v1 (though we probably should commit to making that possible in v2). > I wonder if there are prerequisites that can be tackled for 17. One idea = is to > work on infrastructure to provide executor nodes with information about t= he > number of tuples likely to be fetched - I suspect we'll trigger regressio= ns > without that in place. I don't think that there'll be regressions if we just take the simpler only-a-single-leaf-page approach. At least it seems much less likely. > One way to *sometimes* process more than a single leaf page, without havi= ng to > redesign kill_prior_tuple, would be to use the visibilitymap to check if = the > target pages are all-visible. If all the table pages on a leaf page are > all-visible, we know that we don't need to kill index entries, and thus c= an > move on to the next leaf page It's possible that we'll need a variety of different strategies. nbtree already has two such strategies in _bt_killitems(), in a way. Though its "Modified while not pinned means hinting is not safe" path (LSN doesn't match canary value path) seems pretty naive. The prefetching stuff might present us with a good opportunity to replace that with something fundamentally better. --=20 Peter Geoghegan