Re: Addressing buffer private reference count scalability issue

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Andres Freund <[email protected]>
To: Alexandre Felipe <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Subject: Re: Addressing buffer private reference count scalability issue
Date: Sun, 8 Mar 2026 13:09:32 -0400
Message-ID: <krknshnvus4qhehtoqtwnroemgxqwlfmykark6umd6hf64xnku@ibxx734ds3ga> (raw)
In-Reply-To: <CAE8JnxNTETEUiAOF31=_yo=pvyAi9npOeJfcTvEJJbi4vomtYA@mail.gmail.com>
References: <CAE8JnxNTETEUiAOF31=_yo=pvyAi9npOeJfcTvEJJbi4vomtYA@mail.gmail.com>

Hi,

On 2026-03-08 16:09:07 +0000, Alexandre Felipe wrote:
>    2.
> 
>    Refactoring reference counting: Before starting to change code and
>    potentially breaking things I considered prudent to isolate it to limit the
>    damage. This code was part of a 8k+ LOC file.

Not allowing, at least the fast paths, to be inlined will make the most common
cases measurably slower, I've tried that.


>    3.
> 
>    Using simplehash: Simply replacing the HTAB for a simplehash, and adding
>    a new set of macros SH_ENTRY_EMPTY, SH_MAKE_EMPTY, SH_MAKE_IN_USE.

Yea, we'll need to that. Peter Geoghegan has a patch for it as well.


> Here I assume that the buffer buffer sequence is independent enough from the
> array size, so I use the buffer as the hash key directly, omitting a hash
> function call.

I doubt that that's good enough. The hash table code relies on the bits being
well mixed, but you won't get that with buffer ids.


>    4.
> 
>    Compact PrivateRefCountEntry: The original implementation used a 4-byte
>    key and 8-byte value. Reference count uses 32 bits, and it is unreasonable
>    to expect one backend to pin the same buffer 1 billion times. The lock mode
>    uses 32 bits but can only assume 4 values. So I packed them in one single
>    uint32, giving 30 bits for count and 2 bits for lock mode. This makes the
>    entries 8-byte long, on 64-bit CPUs this represents more than a 1/3
>    reduction in memory. This makes the array aligned with the 64-bit words,
>    copying one entry can be completed in one instruction, and every entry will
>    be aligned.
>    5.

I'm rather sceptical that the overhead of needing to shift and mask is worth
it. I suspect we'll also want to add more state for each entry (e.g. I think
it may be worth getting rid of the io-in-progress resowner).



>    REFCOUNT_ARRAY_ENTRIES=0: since the simplehash is basically some array
>    lookup, it is worth trying to remove it completely and keep only the hash.
>    For small values we would be trading a few branches for a buffer % SIZE,
>    for the use case of prefetch where pin/unpin in a FIFO fashion, it will
>    save an 8-entry array lookup, and some extra data moves.

I doubt that that's ok, in the vast majority of access we will have 0-2
buffers pinned. And even when we have pinned more buffers, it's *exceedingly*
common to access the same entry repeatedly (e.g. pin, lock, unlock, unpin),
adding a few cycles to each of those repeated accesses will quickly show up.


> From 56bfdd6d7296397a7b3f0b282e239fdc86d6580d Mon Sep 17 00:00:00 2001
> From: Alexandre Felipe <[email protected]>
> Date: Fri, 6 Mar 2026 17:15:44 +0000
> Subject: [PATCH 4/5] Compact PrivateRefCountEntry
> 
> The previous implementation used an 8-bite (64-bit) entry to store
> a uint32 count and an uint32 lock mode. That is fine when we store
> the data separate from the key (buffer). But in the simplehash
> {key, value} are stored together, so each entry is 12-bytes.
> This is somewhat awkward as we have to either pad the entry to 16-bytes,
> or the access will be an alternating aligned/misaligned addreses.
> 
> However, we are probably not going to need even 16-bits for the count
> and 2-bits is enough for the lock mode. So we can easily pack these
> int to a single uint32.

I wouldn't want to rely on a 16bit pin counter anyway.


> Incrementing/decrementing the count continue the same, just using
> 4 instead of 1, lock mode access will require one or two additional
> bitwise operations.
> 
> No bit-shifts are required.

I don't know how that last sentence can be true, given that:

> -struct PrivateRefCountEntry
> +#define PRIVATEREFCOUNT_LOCKMODE_MASK	0x3
> +#define ONE_PRIVATE_REFERENCE			4	/* 1 << 2 */
> +
> +#define PrivateRefCountGetLockMode(d)	((BufferLockMode)((d) & PRIVATEREFCOUNT_LOCKMODE_MASK))
> +#define PrivateRefCountSetLockMode(d, m) ((d) = ((d) & ~PRIVATEREFCOUNT_LOCKMODE_MASK) | (m))
> +#define PrivateRefCountGetRefCount(d)	((int32)((d) >> 2))
> +#define PrivateRefCountIsZero(d)		((d) < ONE_PRIVATE_REFERENCE)

Involves shifts at least in PrivateRefCountGetRefCount()..


Greetings,

Andres Freund

view thread (6+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: Addressing buffer private reference count scalability issue
  In-Reply-To: <krknshnvus4qhehtoqtwnroemgxqwlfmykark6umd6hf64xnku@ibxx734ds3ga>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox