Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vy934-00HUEN-1Z for pgsql-hackers@arkaria.postgresql.org; Thu, 05 Mar 2026 13:47:30 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vy930-00HPdn-1z for pgsql-hackers@arkaria.postgresql.org; Thu, 05 Mar 2026 13:47:27 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vy930-00HPde-0g for pgsql-hackers@lists.postgresql.org; Thu, 05 Mar 2026 13:47:26 +0000 Received: from mail-ej1-x635.google.com ([2a00:1450:4864:20::635]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1vy92y-00000000XcG-3TCo for pgsql-hackers@lists.postgresql.org; Thu, 05 Mar 2026 13:47:25 +0000 Received: by mail-ej1-x635.google.com with SMTP id a640c23a62f3a-b932fe2e1a7so1078890866b.1 for ; Thu, 05 Mar 2026 05:47:24 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1772718443; cv=none; d=google.com; s=arc-20240605; b=irBs0MNrfZEpVhPBiGG6uQBzBvnpT68+ZzR7WGDGv6DAuX+U61MrE4xlmeg57GU3BO nq0uCwT5UNvs+Kvr1oEH7Xe5sAE3/5wp+vLyzktcnhceF9mrRwY+WYvsHhESL1yblboY D6+a26vadAQyT3D714wMV1C1xrmFoyYbA5GXI7BeLqzH7YKHjF4oUi2RJs+CpmvhPfn4 T9Nhf722j1jw+ZcIB5Z3L9uZb9HddvikjcUowxxlhZqVauO7wHK2RmZO0dQSwuPLvHM6 BV0gWWU1jRZ3LAtLltfWZ0coTnzCIcALuUyfdNyaEEx19kf8ZECbEPMvk+p5+VcWtqgR C2jQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:dkim-signature; bh=OSErWGwAAKyBKOzCAU6fw9zXm4BzPT63DUjSZ+iE1r0=; fh=1iDZL2dWVkRDeGftEAXih1CtyeUomEAxo+2K6XmMKaQ=; b=SQG4kdHwynZvxJAv7Dn1jPaaVDG3GhDvk9Z25QNywAerocz0n5Mf1MveLwGtuITNY/ 5sfaqTA3Y0MxSUNaLpRrL6nVAcax7aPXVN63o7sKvvZTjB8/JzvCJumqFPGlriXRzlNo XoI0ZaRYkP0xqZVTBJRKzjQGt1YgpatfERZ7ygicBmuiGET1Of9oKaWsmF78/cp3aTfx cF2YW4mlje6rM+KX00oa9POc5wfhaFvcAjVgbMxPCyEY69X27M3CiITILINpmRktj1iY 8+5XZyxA519RDgNtJjZdq0I0us4LpS+tseO51M6q+9QStT0SArL7+ZMfP8As2PL6Saq3 aQLQ==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1772718443; x=1773323243; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=OSErWGwAAKyBKOzCAU6fw9zXm4BzPT63DUjSZ+iE1r0=; b=iBlVw2knUpcnX/TfQrTapK19n09qB3G6fvWMwCo/lrBadSQn7mXT0lpwWHg2p4CqK/ 06vcckuqs59Ny72EPkIec89PhRNfUZVVGMzuLXx/e87t7aHB179zZyFbiUAH40Hgu/Ut eypv7rozGtqqHBwcGflR/dzykJs9BiGVT/dDkLkE4gRKCzk13rvtFjIDCih3XcQ5N8lg zjBM2KYa4BMwVJ/uzdPZIsKP8QJniKXsOZhA1yBEJ85r100nvBiWrTkyapxVTFopDvdy 2RY9nNNVpOo6Njcw/gZS1cQ9gB1dSe1jikGXz45mmNI7vFsciA8ZwVDcNrknYsIyoSvb 9YHg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1772718443; x=1773323243; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=OSErWGwAAKyBKOzCAU6fw9zXm4BzPT63DUjSZ+iE1r0=; b=bNeKS5V48gp5x7IDx9AB4wXoM14zrGt1ZJ8ZOheDnSWWW6Uhj8HtHLvDw6udrgEfLI 59JBTAYZLkCXEA31iSgGxNpgIxQerRyEQAQ1n49JjpkmyKWnmGg3Yh8vaJvEYJKbhsU0 Z5R+nagHVUAthueiwtlDjp09PkBhfCTZDHM8u0Q8R21v3NxeJSWGCPmvd0lDWGEHVdBg cLdfnIs/JDOl49JcX1ANGlsEP7JFMosyD21c2C0UbED4tVP6N3Bb5gQf1lFvIANCnoyr Rl6FM5k+G/41C/zj6Eq7807d0quglbEEKAdz0/TjXftOx6wnFQT4E19v+C4uCDtX5eQ7 QeRA== X-Forwarded-Encrypted: i=1; AJvYcCVsnp42M0xdXZ0Zcs+NxRyJB7qwOMOsj1tzPv9tpo3szhygjannkO/CJumzij7iW9S+gVJZ9xDT6niH4zZ8@lists.postgresql.org X-Gm-Message-State: AOJu0YyPwgTdKJamlfVzQ4diokCJX6QaQlVowT6dcEkD7rfIgKCBpjVE rFxv0KWPp6cReWWm+TeK3ySjSQ7/kxsxemrZw/DxP9Ed7B2JE3dKu71rhy92U8Zl0GAARVL+4yL jD5qbbcKUKLEcIOBw3moz8yrRRqiMEAk= X-Gm-Gg: ATEYQzw3E520Utv1GwFZR9VJYK0uN25+kCRjgwlGjmTrAvkm526IABqC8TB3VtdTotw 0ecoGJTg2IxqZR7Btx4TO/8+sOLuNbEjiXysBH1Tf6ZqAdF+P09frWfdPjZJ4MU/ASWP9mZA4Nd M3v4vnPVOEB9Do1irSxi3gghRBWdiBboGJCwcJxYVFF/usWHX/2j3ObJgxaU95z4hQzRRht+rtk y1kzkkIwFic0qTVhN3fsYitl8V7yWreONbvf+S9gSveWWmTBKXS4K1cXqTu6PobxicvnTqxmZvB 92SIGo6R+I4NUQSDCt5yyh8u6WqdEevisxLBZOsNqidhnVsBGw== X-Received: by 2002:a17:907:608d:b0:b94:2025:313 with SMTP id a640c23a62f3a-b942025203dmr13588066b.32.1772718443025; Thu, 05 Mar 2026 05:47:23 -0800 (PST) MIME-Version: 1.0 References: <7e707787-272a-4c52-b5f1-5ac990514ecc@vondra.me> <3cbwjhwkomjv7jifau4yhb357gfnckut3sdrlbmhwzesd3kngj@affs2mpxg4gh> In-Reply-To: From: Alexandre Felipe Date: Thu, 5 Mar 2026 13:47:10 +0000 X-Gm-Features: AaiRm51Sjj0hYS0OAs9foQtvSJY7nPtVC9_oe5ElF5wSGnsEeBvMKWZMcHVSZ34 Message-ID: Subject: Re: index prefetching To: Andres Freund Cc: Tomas Vondra , Peter Geoghegan , Thomas Munro , Nazir Bilal Yavuz , Robert Haas , Melanie Plageman , PostgreSQL Hackers , Georgios , Konstantin Knizhnik , Dilip Kumar Content-Type: multipart/alternative; boundary="00000000000015574e064c472da7" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000015574e064c472da7 Content-Type: text/plain; charset="UTF-8" Thank you Andres, I see, It combines an array that is fast for few buffers and a hash that in theory scales well for very large number of buffers. And avoids using an array that would be fast but would multiply the memory usage by the number of backends. > Index prefetching patch: > uncorrelated: 228.936 ms > correlated: 71.684 ms I did some tests > Possible improvements to refcount tracking: > > - increase REFCOUNT_ARRAY_ENTRIES - there's a very significant cliff at 8 > right now, and with vectorized lookup it might not hurt too much to go to 16 > or so Yes, that is true, but only up to 16, the index prefetch test I was doing was getting to 90 or so, and that was clipped by max_pinned_buffers. Also, I noticed a commit 3 months ago that removed the mid-loop return that effectively will add the first few pins right to left instead of left to right. Maybe this works well with vectorisation, but I see an optimization for the for the (pin/unpin)+ sequence, what about the pin(pin/unpin)+ sequence. The previous code would always find the buffers on the first or the second iteration, the new implementation will have to go to the 7th or 8th iteration, (or I am not missing something very important). > - To make the cliff at REFCOUNT_ARRAY_ENTRIES smaller, replace dynahash with > simplehash. That should reduce the perf penalty a good bit. This is also true, even remove the refcount array completely. > Unfortunately it's not just the refcount tracking, it's also resowner > management that gets more expensive. I didn't read this sentence until I came back to reply. It is exactly what I noticed. Once we fix the reference counting the resowner still puts a floor. And that is even more important when a buffer is pinned multiple times because the resowner will add one entry to the buffer for each pin. There is another problem, ResOwnerReleaseBuffer unlocks buffers, even if it is not the owner of the lock. I think this deserves a separate thread. --00000000000015574e064c472da7 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable

Thank you Andres,

I see, It combines an array that is fast for few buff= ers and a hash that in theory
scales well for very large number o= f buffers. And avoids using an array that
would be fast but would= multiply the memory usage by the number of backends.

<= div>> Index prefetching patch:
> uncorrelated: 228.936 ms
> = correlated:=C2=A0 =C2=A071.684=C2=A0 ms


=
I did some tests
> Possible improvements to refcount tracking:>=C2=A0
> - increase REFCOUNT_ARRAY_ENTRIES - there's a very= significant cliff at 8
>=C2=A0 right now, and with vectorized lookup= it might not hurt too much to go to 16
>=C2=A0 or so

Y= es, that is true, but only up to 16, the index prefetch test I was doing wa= s
getting to 90 or so, and that was clipped by max_pinned_buffers= .
Also, I noticed a commit 3 months ago that removed the mid-loop= return
that effectively will add the first few pins right to lef= t instead of left to right.

Maybe this works well = with vectorisation, but I see an optimization for the
for the (pi= n/unpin)+ sequence, what about the pin(pin/unpin)+ sequence.
The previou= s code would always find the buffers on the first or the second
i= teration, the new implementation will have to go to the 7th or 8th iteratio= n,
(or I am not missing something very important).

=
> - To make the cliff at REFCOUNT_ARRAY_ENTRIES smaller, repl= ace dynahash with
>=C2=A0 simplehash. That should reduce the perf pen= alty a good bit.

This is also true, even remove the refcount = array completely.

> Unfortunately it's not just the refcount = tracking, it's also resowner
> management that gets more expensiv= e.

I didn't read this sentence until I came back to reply.=C2=A0= It is exactly what I noticed.
Once we fix the reference counting = the resowner still puts a floor.
And that is even more important = when a buffer is pinned multiple times
because the resowner will = add one entry to the buffer for each pin.

There is another problem,= =C2=A0ResOwnerReleaseBuffer unlocks buffers,
even if it is not th= e owner of the lock.

I think this deserves a separ= ate thread.


--00000000000015574e064c472da7--