From: Peter Geoghegan
Date: Thu, 15 Feb 2024 11:42:07 -0500
Subject: Re: index prefetching
To: Tomas Vondra
Cc: Melanie Plageman, Robert Haas, Andres Freund, PostgreSQL Hackers, Georgios, Thomas Munro, Konstantin Knizhnik, Dilip Kumar

On Thu, Feb 15, 2024 at 9:36 AM Tomas Vondra wrote:
> On 2/15/24 00:06, Peter Geoghegan wrote:
> > I suppose that it might be much more important than I imagine it is
> > right now, but it'd be nice to have something a bit more concrete to
> > go on.
> >
>
> This probably depends on which corner cases are considered important.
>
> The page-at-a-time approach essentially means index items at the
> beginning of the page won't get prefetched (or vice versa, prefetch
> distance drops to 0 when we get to end of index page).

I don't think that's true. At least not for nbtree scans.

As I went into last year, you'd get the benefit of the work I've done
on "boundary cases" (most recently in commit c9c0589f from just a
couple of months back), which helps us get the most out of suffix
truncation. This maximizes the chances of only having to scan a single
index leaf page in many important cases. So I can see no reason why
index items at the beginning of the page are at any particular
disadvantage (compared to those from the middle or the end of the
page).

Where you might have a problem is cases where it's just inherently
necessary to visit more than a single leaf page, despite the best
efforts of the nbtsplitloc.c logic -- cases where the scan just
inherently needs to return tuples that "straddle the boundary between
two neighboring pages". That isn't a particularly natural restriction,
but it's also not obvious that it's all that much of a disadvantage in
practice.

> It certainly was a great improvement, no doubt about that. I dislike the
> restriction, but that's partially for aesthetic reasons - it just seems
> it'd be nice to not have this.
>
> That being said, I'd be OK with having this restriction if it makes v1
> feasible. For me, the big question is whether it'd mean we're stuck with
> this restriction forever, or whether there's a viable way to improve
> this in v2.

I think that there is no question that this will need to not
completely disable kill_prior_tuple -- I'd be surprised if one single
person disagreed with me on this point. There is also a more nuanced
way of describing this same restriction, but we don't necessarily need
to agree on what exactly that is right now.

> And I don't have answer to that :-( I got completely lost in the ongoing
> discussion about the locking implications (which I happily ignored while
> working on the PoC patch), layering tensions and questions which part
> should be "in control".

Honestly, I always thought that it made sense to do things on the
index AM side. When you went the other way I was surprised. Perhaps I
should have said more about that, sooner, but I'd already said quite a
bit at that point, so...

Anyway, I think that it's pretty clear that "naive desynchronization"
is just not acceptable, because that'll disable kill_prior_tuple
altogether. So you're going to have to do this in a way that more or
less preserves something like the current kill_prior_tuple behavior.
It's going to have some downsides, but those can be managed. They can
be managed from within the index AM itself, a bit like the
_bt_killitems() no-pin stuff does things already.

Obviously this interpretation suggests that doing things at the index
AM level is indeed the right way to go, layering-wise. Does it make
sense to you, though?

-- 
Peter Geoghegan