Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rafV4-00Ash3-UY for pgsql-hackers@arkaria.postgresql.org; Thu, 15 Feb 2024 17:26:19 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1rafV3-00Ehou-IS for pgsql-hackers@arkaria.postgresql.org; Thu, 15 Feb 2024 17:26:17 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rafV3-00Ehom-7M for pgsql-hackers@lists.postgresql.org; Thu, 15 Feb 2024 17:26:17 +0000 Received: from mail-lj1-x230.google.com ([2a00:1450:4864:20::230]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1rafUz-007Mrx-9K for pgsql-hackers@lists.postgresql.org; Thu, 15 Feb 2024 17:26:15 +0000 Received: by mail-lj1-x230.google.com with SMTP id 38308e7fff4ca-2d0b4ea773eso14486891fa.0 for ; Thu, 15 Feb 2024 09:26:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=enterprisedb.com; s=google; t=1708017972; x=1708622772; darn=lists.postgresql.org; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=WonOh/2WPVBZO4AI7sHRvDYsY0qcmoTLnp/uvmqPIs4=; b=L+kwmhEuEnf9fbvem6boHHEmxNbkumoFMiLZa0jO4I2Grz1xCN7rNk2bul8eYAXZcM srG6rWbMVi/N97kRZETu9kZtEh5OzlYIp324+G7WY27oEtkTP3YfaYEMk8HELBF+nhuB F/+fpQyYcbrRJYOupVR/XLOiV/0US5qF91r0ML5rZQwCWQvr78YHkr1laJBGFYIyPMjN XAT8dCBbFk2QaKxxeUVowM2YCVCUoScykLgZkVoDT5bqn8Cour5S5Fz5CZ1wl3HzoAFA A+/hrj69f+wwbcbI4lSiLFVY3hNn58yZW6B6Rna33j89Pl4l1H3ZTVx+JcHBl9TP684l dgNQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708017972; x=1708622772; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=WonOh/2WPVBZO4AI7sHRvDYsY0qcmoTLnp/uvmqPIs4=; b=hgFpfM6rPgjkAWEdADEhniJEG2Ybw6+/w4xFa3H9cDQgBu5unkNNOHTb9R9lncFfFi 1h9Z6JahmizOJOpT9Bq8+CpJQD5epAXnAHFbABDIDJ5/UYI0QMdFJB9TeG7dnpNW6wW8 YR5bUXqH7eTbCafiwUBNkHnFo4PSpctfUQSjLqhxXdWN0OlbW7uTyusH2Ewx59FlHnZ5 o9D+78QOsc4GWk86kuLeF9H/s/hT2AGRYUUfbHwS0W2YuFphPh+SLPmUSiSymuhzitAj FzxTtAtCK7O/uPDBwXp1ylb9+QareE+MrFfneJ/RW5GQgORuxFOdxHPlxP8yFxv+fqhe m6uA== X-Forwarded-Encrypted: i=1; AJvYcCWJzc7lruWAoE86OuI7xsYc7WV75L+0x5WQtaO/eKGktUIadr4qBlnn4BPiOq0gxuqhmPee07tUBHNqukMK+Hmng7iP1V1ysORlCpVUw7STMVKL X-Gm-Message-State: AOJu0YzPGDNllcWUfI/RUQxH9MqVrM6Zl4QHxR8X5PC76d2Mi3NHn8IK lPP9WlEGjLSz2S7DJZp0U7FsQ/TPK0diihx/D3H3nDwPhBkLAYUaKNhgs/cc5A== X-Google-Smtp-Source: AGHT+IFvzJK22AXjCniQ9T6zXCKojD8CfGOjtngE+usuIEkFqrWe8/ihe5kWmTjNjzVzXEiluhzRIg== X-Received: by 2002:a2e:8950:0:b0:2d1:26f0:8167 with SMTP id b16-20020a2e8950000000b002d126f08167mr1907330ljk.35.1708017971668; Thu, 15 Feb 2024 09:26:11 -0800 (PST) Received: from [10.137.0.18] (ip-86-49-229-30.bb.vodafone.cz. [86.49.229.30]) by smtp.gmail.com with ESMTPSA id bh8-20020a05600005c800b0033b888a0a63sm2571193wrb.0.2024.02.15.09.26.10 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 15 Feb 2024 09:26:11 -0800 (PST) Message-ID: Date: Thu, 15 Feb 2024 18:26:09 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: index prefetching Content-Language: en-US To: Peter Geoghegan Cc: Melanie Plageman , Robert Haas , Andres Freund , PostgreSQL Hackers , Georgios , Thomas Munro , Konstantin Knizhnik , Dilip Kumar References: <8ec36f51-b863-60e3-20e2-b9c981c5ce5e@enterprisedb.com> <56176b8d-956c-487e-ab09-310db4581c07@enterprisedb.com> <4867452a-b853-4813-a6da-9bb06a336f8b@enterprisedb.com> <4f5f16ef-df1e-4e09-9b3f-2e0961ab5117@enterprisedb.com> <4736207c-8ea6-40cb-ac52-41af00b58bbc@enterprisedb.com> From: Tomas Vondra In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On 2/15/24 17:42, Peter Geoghegan wrote: > On Thu, Feb 15, 2024 at 9:36 AM Tomas Vondra > wrote: >> On 2/15/24 00:06, Peter Geoghegan wrote: >>> I suppose that it might be much more important than I imagine it is >>> right now, but it'd be nice to have something a bit more concrete to >>> go on. >>> >> >> This probably depends on which corner cases are considered important. >> >> The page-at-a-time approach essentially means index items at the >> beginning of the page won't get prefetched (or vice versa, prefetch >> distance drops to 0 when we get to end of index page). > > I don't think that's true. At least not for nbtree scans. > > As I went into last year, you'd get the benefit of the work I've done > on "boundary cases" (most recently in commit c9c0589f from just a > couple of months back), which helps us get the most out of suffix > truncation. This maximizes the chances of only having to scan a single > index leaf page in many important cases. So I can see no reason why > index items at the beginning of the page are at any particular > disadvantage (compared to those from the middle or the end of the > page). > I may be missing something, but it seems fairly self-evident to me an entry at the beginning of an index page won't get prefetched (assuming the page-at-a-time thing). If I understand your point about boundary cases / suffix truncation, that helps us by (a) picking the split in a way to minimize a single key spanning multiple pages, if possible and (b) increasing the number of entries that fit onto a single index page. That's certainly true / helpful, and it makes the "first entry" issue much less common. But the issue is still there. Of course, this says nothing about the importance of the issue - the impact may easily be so small it's not worth worrying about. > Where you might have a problem is cases where it's just inherently > necessary to visit more than a single leaf page, despite the best > efforts of the nbtsplitloc.c logic -- cases where the scan just > inherently needs to return tuples that "straddle the boundary between > two neighboring pages". That isn't a particularly natural restriction, > but it's also not obvious that it's all that much of a disadvantage in > practice. > One case I've been thinking about is sorting using index, where we often read large part of the index. >> It certainly was a great improvement, no doubt about that. I dislike the >> restriction, but that's partially for aesthetic reasons - it just seems >> it'd be nice to not have this. >> >> That being said, I'd be OK with having this restriction if it makes v1 >> feasible. For me, the big question is whether it'd mean we're stuck with >> this restriction forever, or whether there's a viable way to improve >> this in v2. > > I think that there is no question that this will need to not > completely disable kill_prior_tuple -- I'd be surprised if one single > person disagreed with me on this point. There is also a more nuanced > way of describing this same restriction, but we don't necessarily need > to agree on what exactly that is right now. > Even for the page-at-a-time approach? Or are you talking about the v2? >> And I don't have answer to that :-( I got completely lost in the ongoing >> discussion about the locking implications (which I happily ignored while >> working on the PoC patch), layering tensions and questions which part >> should be "in control". > > Honestly, I always thought that it made sense to do things on the > index AM side. When you went the other way I was surprised. Perhaps I > should have said more about that, sooner, but I'd already said quite a > bit at that point, so... > > Anyway, I think that it's pretty clear that "naive desynchronization" > is just not acceptable, because that'll disable kill_prior_tuple > altogether. So you're going to have to do this in a way that more or > less preserves something like the current kill_prior_tuple behavior. > It's going to have some downsides, but those can be managed. They can > be managed from within the index AM itself, a bit like the > _bt_killitems() no-pin stuff does things already. > > Obviously this interpretation suggests that doing things at the index > AM level is indeed the right way to go, layering-wise. Does it make > sense to you, though? > Yeah. The basic idea was that by moving this above index AM it will work for all indexes automatically - but given the current discussion about kill_prior_tuple, locking etc. I'm not sure that's really feasible. The index AM clearly needs to have more control over this. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company