Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w9cor-001b4P-2D for pgsql-hackers@arkaria.postgresql.org; Mon, 06 Apr 2026 05:48:18 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w9cop-006kTz-2Q for pgsql-hackers@arkaria.postgresql.org; Mon, 06 Apr 2026 05:48:16 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w9cop-006kTr-0h for pgsql-hackers@lists.postgresql.org; Mon, 06 Apr 2026 05:48:15 +0000 Received: from mail-wm1-x32c.google.com ([2a00:1450:4864:20::32c]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1w9com-00000000o3L-2eEl for pgsql-hackers@lists.postgresql.org; Mon, 06 Apr 2026 05:48:14 +0000 Received: by mail-wm1-x32c.google.com with SMTP id 5b1f17b1804b1-4838c15e3cbso26957095e9.3 for ; Sun, 05 Apr 2026 22:48:12 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1775454491; cv=none; d=google.com; s=arc-20240605; b=YvUcRNlEux7m2/jAQqmsAAdW8xGTyfFSg/tt/rSFyIa8oaMSzdIspDk6zKpTF2CMl0 nH/Ni71XHUL6cI90y/M4M/9pl+S9GDpYU2OwIS1M61Ny7wASpVhTcvt/BnyEDHI4ydTW CRrSDRN2Ef4Trddk3JCT9ORQzvaS0NFeKPJdbza+u73RnlgrIHfh4FVCxvUrW99agyMN vEvmVD91OA//cj8HT0ftWgiHcjh5wBuhx2rNNS35NAghM+6kiRmNAYeEGgYv9ui64GGF MncomHBVK7cYBC5cJuJqRoaiekpZ2iwSfsm3YyHOXTWfhV6ez7ohnuL6wyOg+fG5hil+ qpgg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=8zqTlEcptm6SWq3F3lcpgIvYsB75/ufnvx66v7Wr39c=; fh=156Sskh5lFzgLpqDunY1ZWwVpmVmT9hOOPqHaLx2Zx4=; b=ALNxi3dNymrW9OYvWs1imuT+YOC9K8Qux8T41I1Ls0mFqeNak7rZfUS9ev2vHk+CJw X7FsefSmqbZEAbftIWukL1+pTCwWJBPEpG2gh6uvuAXcphD5WUa8MNyZ+CL6mujG2s5B r9KZAerb1zneSz/q2blclzQ3pziIz5fsWxo7mkZG89K3cBvD7KObrjZfqJ303FUrtpMx vszftXVHZhoQKGyKClQggVCEH+iu2/ism5fGZNqsOMtToEsO3hz9OoAokRx3/USw/ZYV SxLw0yWYkSnGWqBDpPhGlomBjgGhetTAjaXTZMxTdqPwNePeMVLVOleawTbcFldASeef n2cw==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bowt-ie.20251104.gappssmtp.com; s=20251104; t=1775454491; x=1776059291; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=8zqTlEcptm6SWq3F3lcpgIvYsB75/ufnvx66v7Wr39c=; b=cq3B5wxtROIw/i78IFkDhCRNU6/pIwu5sJXOasXH7HIPzuFjYvDPU8v9ws6wQsYpav Fr/05xXj5YDQP6V0qc8m5DKvuCPLc/1E823TL6S8LoXNaE8FF/2+E6MRWiEi8mnedFLJ +tCeUD+3ynJ5fLnv6jaTVKSsMuCNLiaQU7nkr3qgzlTOvmcWss5h4nqBFRCpkckgf+++ iJNaWzqWPKYilTJ0DCGCsRp+ZyVSW6e48/wHKStUAVY6mDX0fzxhN7JlH9OsJUq5ALJA DsJ+JMest7SkTgeT9uF6EtPLJ+OftmY/3u6f8N48GOXR3UDE+SiQtkexPDBe8IaAcFR+ ByEw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1775454491; x=1776059291; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=8zqTlEcptm6SWq3F3lcpgIvYsB75/ufnvx66v7Wr39c=; b=F0BwOvYhjptpdWmqvniTofmYUg3DesdwmKzL0clBLwDt8rM8tIvCMMOxitifCAC/3r 7Ubw3nE91Iq6p78FirNcn+X0nM0qSx36O/EWkaY97CCuNxTV0czydzIwgkAMT0pcPntt rq/Em2rhBbIKIFWPDRNa7inH48mHXeZwfW4RLuI0UKZFWA+sHzr/kcRVtkzireOHNwiL obJ7s1smawHlQYY9Vpz4oZq8A4npjIcjpnp++8q3Zn63zhrw5t+IQ9qg3oLbLiMknHUf DT0Wg/qZWUL80zZPfjLXUhxHsjrwi0XKuLxgVabXzT3Z6d688GdOqUtEdHK6jnBAPjIj DHwg== X-Forwarded-Encrypted: i=1; AJvYcCUS3X6HYzi74T7bcBsZ0atwlpYNGcXTILB+4/X+YOOHFVRMCAZ+40D4T7V5BgpX6WFVPG3L72kaXr7+V5oP@lists.postgresql.org X-Gm-Message-State: AOJu0YwtOr7PpdfkWLexueXkR2jW7pbCF2yvu3vG1vNxgfrVUOLAq2bB l39KhyEGBZT6MT2O8pkZQMzPkXR50Jg6CkLSU8R+mHm3i6AgBr8BIlud+9F2X9KMSdnPZz1L3D9 Sxnk6hs4vM6x0kCxR5n9OY6rwc2a9HxwFLBfVIZK6qA== X-Gm-Gg: AeBDieu1SJqbtIeJUbxXSoyV/I7VOzxHbzn0sY4ViRZLEr+iyltwP6j0pqIkEVtJQNx 22sVmPIF7RW0Q6+FfZh/E17/37ZR7ES6L+0x4VcHLKYVSOIWcRh4h1Gj2Ol8MlHwktFfVspO+lU 3qZWgauGPNuynQJsbYgpgpmgswBuSsI0URIajfkc2jL8CblPhjB8bq/eefz0IAkVhvJj9toIjPp JFmrM/Doh7TUoebTcdKiIJwauJSWjg2ksxu5pb4fhuN+jX8LFzdRUOfjl/qDcmMq8pTEDElkD/Y h/W+yS1eRTNifOsS75Ye6eIiv4cW4hstgnb0ty/Otp13jiJvQ9xJFQ== X-Received: by 2002:a05:600c:1d1d:b0:485:3ec6:e634 with SMTP id 5b1f17b1804b1-4889977600bmr157594185e9.15.1775454490544; Sun, 05 Apr 2026 22:48:10 -0700 (PDT) MIME-Version: 1.0 References: <6sphk3ycctmbihlrykts7uj6mjakop6wrq2dhe3vnlmrnldz2f@uuwmkd6jjrxa> In-Reply-To: <6sphk3ycctmbihlrykts7uj6mjakop6wrq2dhe3vnlmrnldz2f@uuwmkd6jjrxa> From: Peter Geoghegan Date: Mon, 6 Apr 2026 01:47:43 -0400 X-Gm-Features: AQROBzC1eMgUfbCI1N-byNXeUjN41GQPv454KG6x19JFSf3s9dMpErGxTZkVvfU Message-ID: Subject: Re: index prefetching To: Andres Freund Cc: Tomas Vondra , Alexandre Felipe , Thomas Munro , Nazir Bilal Yavuz , Robert Haas , Melanie Plageman , PostgreSQL Hackers , Georgios , Konstantin Knizhnik , Dilip Kumar Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Sun, Apr 5, 2026 at 7:01=E2=80=AFPM Andres Freund w= rote: > > It's unclear if you're asking (or at least suggesting) that I should > > invent a new table_index_getnext_slot-like shim, used only during > > index-only scans, that doesn't require passing a slot (but does > > require getting a table slot some other way, likely at scan startup). > > Why would it require creating a tableam shaped slot internally? It seems > rather silly to store a tuple fetched to verify visibility into a slot, w= here > it is never used. Storing in a tuple requires an indirect things like > IncrBufferRefCount(), which isn't that cheap. > > heapam_index_getnext_slot(index_only=3D>1) > -> heapam_index_fetch_heap_item() > -> heapam_index_fetch_tuple() The fact is that nodeIndexOnlyscan.c calls index_fetch_heap right now, on master, which uses a TupleTableSlot. Yes, using a slot is suboptimal. But it's nothing new. And this is the first time I'm hearing about this being a problem. A couple of days before feature freeze. > So I guess yes, I am suggesting that we have a separate > table_index_getnext_ios() or such, which is only usable for index only sc= ans. What if we offered such an interface but still used a slot internally? That way we'd have the API you're asking for, without any significant changes to heap_hot_search_buffer. The caller doesn't need to know about the table slot's usage. I get that making the caller use a slot interface with a slot that they don't exactly need is a bit hokey. We can fix that part, without seriously risking not getting the patch into 19. > > Actually, I don't know if we really *should* treat the selfuncs.c > > caller any different (except as needed to avoid spurious assertion > > failures). > I don't think it's worth assuming that that's the code in the core > infrstructure. To be clear, I agree. > scan->heapRelation->rd_tableam->index_fetch_batch_init(scan, batc= h, new_alloc); > > that's a lot of indirection to go through. > > > Which is what then made me think of embedding something like a > TableIndexFetchRoutine (with batch_init, reset, markpos, restrpos, end > members) in IndexScanDesc. That way table_index_{fetch_reset,batch_Init,.= ..}() > wouldn't need this level of indirection. But that's the same level of indirection as before? Except it's once per batch instead of once per tuple. > Not having the same code in multiple table AMs would make it easier to wr= ite > them, which would be nice in itself, but I am more concerned with ending = up > with copies of heapam_index_getnext_scanbatch_pos() in various AMs, us fi= nding > problems with heapam_index_getnext_scanbatch_pos() in a minor release, an= d > having a harder time adjusting things due to the copied code. I'll need to study this further. > > What do you want to do about it? index_getnext_tid isn't adding too > > much right now, I'd say. > > It does indeed not add a whole lot. > I would move it into a tableam_util_fetch_next_tid() (or such) helper, I > think. That also makes it easier to get rid of the table_index_fetch_res= et() > and move the pgstat_count_index_tuples(scan->indexRelation, 1) into one p= lace. I'll put it into indexbatch.h (next to the function that calls amgetbatch), and make sure that the amgettuple path is consistent in terms of how it calls pgstat_count_index_tuples(), and how the fetch/index scan gets reset. I don't think that we should be resetting here -- even if it's the historical behavior. I'm absolutely sure that whatever we do shouldn't vary based on whether amgetbatch or amgettuple was used (that was not intended). > > > Can'tt the *2 lead to computing a xs_vm_items bigger than scan->maxit= emsbatch? > > > > Yes, it can. > > But we prevent it from overflow by a Max()/Min() in the loop headers. Right. > > > I'd set scan->batchcache[i] to NULL after freeing. > > > > I tried that out, but I *think* it regressed pgbench SELECT by about > > ~1.5% of total TPS (I'd need more thorough testing to confirm this). > > Are you sure we need this? > > That seems surprising. But no, I don't think we need this. I'll revisit, might have been wrong. > It's perhaps ok, I just was a bit surprised to see the callback inside a > function with that name. I'd perhaps just name it > tableam_util_release_batch() (and rename batch_free accordingly). Will do. > I'd probably just assert !kill_prior_tuple. Will do. --=20 Peter Geoghegan