public inbox for [email protected]
help / color / mirror / Atom feedFrom: Alexander Korotkov <[email protected]>
To: Andres Freund <[email protected]>
Cc: pgsql-hackers <[email protected]>
Subject: Re: pg_atomic_compare_exchange_*() and memory barriers
Date: Sat, 8 Mar 2025 14:12:13 +0200
Message-ID: <CAPpHfdv5y63auGJ_QGJ7VDA1z7cS+YcfUtgAxGit5c2EApbMBA@mail.gmail.com> (raw)
In-Reply-To: <CAPpHfduxVxkZpCaYRv_whvNyPxCTSyEgXR02sbo=mhGF0MDQEg@mail.gmail.com>
References: <CAPpHfdudejddHLLw=Uk_7pZ=1g2SZXfwubSgmWUhJ7fBUtR9rg@mail.gmail.com>
<2muwyx6a5vojkg7iegknhnkcch3lfxptsxk7icwuh7szkvvu2y@vc3ukkfvnu6i>
<CAPpHfdvucbQ9w7_eJYefUUp6w-WZUes+SRfq+GKfhukhs7kLqg@mail.gmail.com>
<oc4iicdkwyhdf5o5vbwsl7jdlqnds37xtf27wuxvhy3abxoo6i@4ek3xp5j6niy>
<CAPpHfdvp=_1NRF4YFFp9Oii7mRR5V4b-C2aukbuLQjWMjynYrw@mail.gmail.com>
<aw3hirtizbn42fkl57bjeafzws3b2bvhknimbxyoi23i43sajb@i65p2ubb6zte>
<CAPpHfdvp3vNVp5_Rx1RtwLqhjbzzUxwkqzvsh=1E3A+iePvWBg@mail.gmail.com>
<vwtct75cykxo3rxjipye2bfvkaftncps4ycdock5vbvnwqtte5@h2hklzn36ckm>
<CAPpHfduxVxkZpCaYRv_whvNyPxCTSyEgXR02sbo=mhGF0MDQEg@mail.gmail.com>
Hi, Andres!
On Fri, Mar 7, 2025 at 7:54 PM Alexander Korotkov <[email protected]>
wrote:
> On Fri, Mar 7, 2025 at 7:46 PM Andres Freund <[email protected]> wrote:
> > On 2025-03-07 19:44:20 +0200, Alexander Korotkov wrote:
> > > On Fri, Mar 7, 2025 at 7:38 PM Andres Freund <[email protected]>
wrote:
> > > > On 2025-03-07 19:15:23 +0200, Alexander Korotkov wrote:
> > > > > On Fri, Mar 7, 2025 at 7:07 PM Andres Freund <[email protected]>
wrote:
> > > > > > What is the access pattern and the observed problems with it
that made you
> > > > > > look at the disassembly?
> > > > >
> > > > > Check this code.
> > > > >
> > > > > l1: pg_atomic_write_u64(&XLogCtl->xlblocks[nextidx],
NewPageEndPtr);
> > > >
> > > > > /*
> > > > > * Try to advance XLogCtl->InitializedUpTo.
> > > > > *
> > > > > * If the CAS operation failed, then some of previous
pages are not
> > > > > * initialized yet, and this backend gives up.
> > > > > *
> > > > > * Since initializer of next page might give up on
advancing of
> > > > > * InitializedUpTo, this backend have to attempt
advancing until it
> > > > > * find page "in the past" or concurrent backend
succeeded at
> > > > > * advancing. When we finish advancing
XLogCtl->InitializedUpTo, we
> > > > > * notify all the waiters with
XLogCtl->InitializedUpToCondVar.
> > > > > */
> > > > > l2: while
(pg_atomic_compare_exchange_u64(&XLogCtl->InitializedUpTo,
> > > > > &NewPageBeginPtr, NewPageEndPtr))
> > > > > {
> > > > > NewPageBeginPtr = NewPageEndPtr;
> > > > > NewPageEndPtr = NewPageBeginPtr + XLOG_BLCKSZ;
> > > > > nextidx = XLogRecPtrToBufIdx(NewPageBeginPtr);
> > > > >
> > > > > l3: if (pg_atomic_read_u64(&XLogCtl->xlblocks[nextidx]) !=
> > > > > NewPageEndPtr)
> > > > > {
> > > > > /*
> > > > > * Page at nextidx wasn't initialized yet, so we
cann't move
> > > > > * InitializedUpto further. It will be moved by
backend
> > > > > which
> > > > > * will initialize nextidx.
> > > > > */
> > > > >
> > > > > ConditionVariableBroadcast(&XLogCtl->InitializedUpToCondVar);
> > > > > break;
> > > > > }
> > > > > }
> > > > >
> > > > > Consider the following execution order with process 1 (p1) and
process 2
> > > > > (p2).
> > > >
> > > > On 2025-03-07 19:24:39 +0200, Alexander Korotkov wrote:
> > > > > Sorry, I messed this up.
> > > > > The correct sequence is following.
> > > > >
> > > > > 1. p1 executes l1
> > > > > 2. p1 executes l2 with failure
> > > > > 3. p2 executes l2 with success
> > > > > 4. p2 execute l3, but doesn't see the results of step 1, because 3
> > > > > didn't provide enough of memory barrier
> > > >
> > > > Did you mean because 2) didn't provide enough of a memory barrier?
Because 3)
> > > > does, right?
> > >
> > > Yes, exactly.
> > >
> > > > You could get in exactly same the situation if the p1 is scheduled
out by the
> > > > OS after step 1, no?
> > >
> > > No. In that case, p1 will execute l2 with success. p1 executes l2
> > > with failure only because it goes before p2 executes l2.
> >
> > In my scenario p1 will not execute l2 because p2 gets scheduled before
it can
> > do so. So p1 cant yet execute l2 before p2 executes l2.
>
> In my scenario p1.l2 will be successful if executed after p2.l2 and
> failed if executed before p2.l2. Imagine initial value is 0. p1
> tries to change 1=>2, while p2 tries to change 0 => 1.
My explanation was cumbersome form the beginning. Let me come with more
clear example.
Let us have a shared memory variable v, and shared memory flag f. The
initial state is as following.
v = 0
f = false
Two processes p1 and p2 runs the following programs in pseudocode. Atomic
compare-exchange operation is represented as CAS(variable, oldval,
newval). In this example we don't need to update oldval on failure or even
know whether operation is successful.
p1:
CAS(v, 0, 1)
if f:
CAS(v, 1, 2)
p2:
f = true
CAS(v, 1, 2)
I think if CAS() implements full barrier semantics, the invariant is that v
eventually get the value 2. If CAS(v, 1, 2) by p2 goes before CAS(v, 0, 1)
by p1, p1 will reliably see f == true and apply CAS(v, 1, 2).
If CAS() don't implement full barrier on failure this invariant gets broken
(that is CAS() by p2 fails, but p1 see f == false).
I'm not an expert in formal specifications of memory models. But I'm quite
surprised we're discussing whether memory barrier on compare-exchange
failure might matter. For me at least the fact
that __atomic_compare_exchange_n() have failure_memorder argument is a
quite an evidence of that.
------
Regards,
Alexander Korotkov
Supabase
view thread (20+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected]
Subject: Re: pg_atomic_compare_exchange_*() and memory barriers
In-Reply-To: <CAPpHfdv5y63auGJ_QGJ7VDA1z7cS+YcfUtgAxGit5c2EApbMBA@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox