public inbox for [email protected]
help / color / mirror / Atom feedFrom: Andres Freund <[email protected]>
To: Yura Sokolov <[email protected]>
Cc: Melanie Plageman <[email protected]>
Cc: Noah Misch <[email protected]>
Cc: Heikki Linnakangas <[email protected]>
Cc: Kirill Reshke <[email protected]>
Cc: Matthias van de Meent <[email protected]>
Cc: [email protected]
Cc: Thomas Munro <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: Michael Paquier <[email protected]>
Subject: Re: Buffer locking is special (hints, checksums, AIO writes)
Date: Tue, 31 Mar 2026 18:05:46 -0400
Message-ID: <dsmizugtxgrp7zgkndnr6gl5sszy7ckbjmffsacfgstajrpjid@kjoodzobjcex> (raw)
In-Reply-To: <[email protected]>
References: <jtg5cu4n6h5lib3kzx66ju4yhh6kmviaud7oq6dtut6c4q4rdi@xwsfoagt3c2b>
<k5j77f3q6ztihnjnx2nqxzyor6fbj2qxcbhzuxhkh2yy63jyfg@p72phigar3n4>
<5ubipyssiju5twkb7zgqwdr7q2vhpkpmuelxfpanetlk6ofnop@hvxb4g2amb2d>
<[email protected]>
<[email protected]>
<mheeefrtikvgjnjsenocvo3afj7vlpr5rljkzevssxs77n2zdt@sugqtjhryydo>
<CAAKRu_Z=xrBWhAB5wRG8x-gnjNa7ZnW-sSsEbBfR1C8gwuTD8g@mail.gmail.com>
<yn4f45z6wl6wstuwha3qjsicrc6uvymkldtqz2ttchz7zdue3y@n6fll5dcter4>
<q6trx7a5vd3j5tddamwxahrojopboov6eeue2jkjmfpkhdfgjj@w3hapotwxroo>
<[email protected]>
Hi,
On 2026-03-31 19:02:33 +0300, Yura Sokolov wrote:
> 27.03.2026 23:00, Andres Freund wrote:
> > On 2026-03-25 18:35:55 -0400, Andres Freund wrote:
> >> Running it through valgrind and then will work on reading through one more
> >> time and pushing them.
> >
> > And done.
> >
> > Phew, this project took way longer than I'd though it'd take.
>
> In addition to bug with BM_IO_ERROR [1] , I found race condition in
> PinBuffer in this lines of code:
>
> if (unlikely(skip_if_not_valid && !(old_buf_state & BM_VALID)))
> return false;
>
> /*
> * We're not allowed to increase the refcount while the buffer
> * header spinlock is held. Wait for the lock to be released.
> */
> if (old_buf_state & BM_LOCKED)
> old_buf_state = WaitBufHdrUnlocked(buf);
>
> While we waited for buffer header for being unlocked, it may become
> invalid, isn't it?
> Therefore, check related to skip_if_not_valid have to happen after waiting.
Yea, that does seem wrong. Not sure how it ended up that way.
I think it may be better to add a continue after the WaitBufHdrUnlocked(), so
that we restart the loop, rather than moving the skip_if_not_valid check.
> ....
>
> Another question: previously we had to wait for buffer for being unlocked
> because UnlockBufHdr wrote to buf->state unconditionally, therefore our pin
> increment could be lost.
> Now UnlockBufHdr and UnlockBufHdrExt does proper atomic operations and
> preserves concurrent changes. Are we still need to wait?
Yes.
> Most of time PinBuffer is called protected by BufTable's partition LWLock,
> therefore buffer may not be changed in dramatic way.
I don't think the partition locks are sufficient protection for everything. We
have a few places in the code that want to be able to modify the buffer state
depending on whether the buffer is already pinned, and I don't think all of
them currently hold the relevant buffer mapping partition's lock. If pinning
were not to wait for an existing header lock, such checks would not easily be
doable.
Perhaps we could fix all the relevant places by acquiring the partition lock
in a few more places. But I think that'd be going in exactly the opposite
direction we should to. The partition locks are quite contended locks and we
should work on getting rid of them eventually. Building them into the
protection model seems quite unwise.
I think many of the places that currently do rely on the buffer header
spinlock can be converted to CAS loops.
I'm not really sure how much that's worth though - the WaitBufHdrLocked() in
PinBuffer() is pretty hard to hit in realistic workloads. What would be really
nice, is to be able to replace the CAS() with an atomic add (since those are
considerably faster), but that's not really possible regardless of the need
for WaitBufHdrLocked(), because we can't just add BUF_USAGECOUNT_ONE, as that
would allow increasing the usage count too far.
I would like to eventually narrow the definition of the buffer header spinlock
to just be about the "identity" of the buffer, which then would only be needed
by things like DropRelationBuffers() and when changing the buffer's identity.
> But call in ReadRecentBuffer is the exception. It is not protected by
> partition lock and have to make additional checks. That is why you
> introduced skip_if_not_valid.
>
> Does optimization of ReadRecentBuffer pays for WaitBufHdrUnlocked?
As mentioned above, I don't think it's just ReadRecentBuffer that relies on
the buffer header spinlock preventing new pins.
Greetings,
Andres Freund
view thread (83+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Subject: Re: Buffer locking is special (hints, checksums, AIO writes)
In-Reply-To: <dsmizugtxgrp7zgkndnr6gl5sszy7ckbjmffsacfgstajrpjid@kjoodzobjcex>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox