Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tqbRP-007at1-Tw for pgsql-hackers@arkaria.postgresql.org; Fri, 07 Mar 2025 17:24:56 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tqbRO-00CjYt-D6 for pgsql-hackers@arkaria.postgresql.org; Fri, 07 Mar 2025 17:24:54 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tqbRO-00CjYk-1W for pgsql-hackers@lists.postgresql.org; Fri, 07 Mar 2025 17:24:54 +0000 Received: from mail-ej1-x634.google.com ([2a00:1450:4864:20::634]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1tqbRK-001XOi-0y for pgsql-hackers@postgresql.org; Fri, 07 Mar 2025 17:24:53 +0000 Received: by mail-ej1-x634.google.com with SMTP id a640c23a62f3a-abf42913e95so314920166b.2 for ; Fri, 07 Mar 2025 09:24:51 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1741368290; x=1741973090; darn=postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=OH8q1DAQHKUm71/RwM60uSvzcG/7U2E0ldA0HwnwKwE=; b=YkmlGQvLZUoj/+Q+Rpt1QERG8iqFaeTJT9uROSkt6bnV5/ChJy8tIKi34KgPxmML4w iXl0t0ZPzPMtKxifeznW3WhKGbD2paidGm1ysWQ8thgInQ4MqFoUwnD0FMZjZvMwhHph JexK1gF52FleY/zHdYBHi4uHGRwQr/5SBGaSbknb8/hskEjF2jzUYif7jp2sTPTdH9iC PKW3501ZqEbTCF6flq/rJLZrSm8WS88LOV8em5T6oKzur2dT+JfFLs/u7ZQAh+Uwjimb mmtRnKCnfziMLxIpLfCNizq9xOdq+x7f8c0Lgsg14CDROY27UrHDB2nkFcN3haO3TA1r A1Qw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741368290; x=1741973090; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=OH8q1DAQHKUm71/RwM60uSvzcG/7U2E0ldA0HwnwKwE=; b=SEx0B8hVmMENZkhSJMCtpfq1J2TkmRJzr406zjPaSSX+b8U6mNvCFV5Clgw4HR3zII pUK5M2c1Hb9LH8CkbrbZQY3/fOtyRS4B0EfD3DIRlscTeJYZKzrVU0IkxXW/qT8yMlDO wCe800EeU/nUcyriJoTJ3v908L51yqJdYLb6Ut/qCprhVYLCd/SLeXxYHYVVK913+5+X nKMYKQyWqQsgyyg/hr3l42f6Hx77cQNq372DgXDSKBwZ7yOQWuQSH+4wbDSbZCiBj0nD Ocqzg6K7SjUTG+oUCceEbzwTZITxoXxXKdvmj2W/D1Zbnk/b6eXrWIXEST0aJsDUEkwi gSwQ== X-Gm-Message-State: AOJu0YxAoroIrpfHPvYi+Qomrgsi5QdIUfGWljJEYdcQgw/KHHssDro0 8CxCeS/kIAPld1JzfkO17Z+OUfXtvesmgBTySmw/C8iRX8TF7lm1ZA9sqla521WEOKuqZcAtMfp 2430jvnHqDfCwTUo08XwBjZS+2RQE55jN X-Gm-Gg: ASbGncspxU5ftVkP/5JXq1A3v0CtyvXI69d73/MZlU+HQKicRqFW2OSJHqFuXnvd2pl 1iyVLqKpUVMejksrYBs8bo9SNUvrDdJoZmGOG2E+yYWpW/6weNx3TXQfkZ/6z1VzieUEMQy5DEH 8funnKxm+Y7VqROJMCJJ+fKrwZ2g== X-Google-Smtp-Source: AGHT+IEX+/evB7BIr9x4GYIINdBsu2ELX3eUc1K/lAOtn7CuyZu7moYruk6AsvX143oR9LO1D+O21Q6PF2Te8eDCcLk= X-Received: by 2002:a17:907:7213:b0:abf:3f82:1218 with SMTP id a640c23a62f3a-ac252fba0fdmr448667666b.46.1741368290341; Fri, 07 Mar 2025 09:24:50 -0800 (PST) MIME-Version: 1.0 References: <2muwyx6a5vojkg7iegknhnkcch3lfxptsxk7icwuh7szkvvu2y@vc3ukkfvnu6i> In-Reply-To: From: Alexander Korotkov Date: Fri, 7 Mar 2025 19:24:39 +0200 X-Gm-Features: AQ5f1Jp9qHxmrMZLk6UjjnQ0vokuhP1y6LfRT6BKyGx_bl6s-510ZqdcaHNZeZo Message-ID: Subject: Re: pg_atomic_compare_exchange_*() and memory barriers To: Andres Freund Cc: pgsql-hackers Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Fri, Mar 7, 2025 at 7:15=E2=80=AFPM Alexander Korotkov wrote: > > On Fri, Mar 7, 2025 at 7:07=E2=80=AFPM Andres Freund = wrote: > > On 2025-03-07 18:39:42 +0200, Alexander Korotkov wrote: > > > On Fri, Mar 7, 2025 at 6:02=E2=80=AFPM Andres Freund wrote: > > > > > > > > On 2025-03-07 17:47:08 +0200, Alexander Korotkov wrote: > > > > > While investigating a bug in the patch to get rid of WALBufMappin= gLock, I > > > > > found that the surrounding pg_atomic_compare_exchange_u64() fixes= the > > > > > problem for me. > > > > > > > > That sounds more likely to be due to slowing down things enough to = make a race > > > > less likely to be hit. Or a compiler bug. > > > > > > > > > > > > > That doesn't feel right because, according to the > > > > > comments, both pg_atomic_compare_exchange_u32() and > > > > > pg_atomic_compare_exchange_u64() should provide full memory barri= er > > > > > semantics. So, I decided to investigate this further. > > > > > > > > > > In my case, the implementation of pg_atomic_compare_exchange_u64(= ) is based > > > > > on __atomic_compare_exchange_n(). > > > > > > > > > > static inline bool > > > > > pg_atomic_compare_exchange_u64_impl(volatile pg_atomic_uint64 *pt= r, > > > > > uint64 *expected, uint64 newv= al) > > > > > { > > > > > AssertPointerAlignment(expected, 8); > > > > > return __atomic_compare_exchange_n(&ptr->value, expected, new= val, false, > > > > > __ATOMIC_SEQ_CST, __ATOMIC= _SEQ_CST); > > > > > } > > > > > > > > > > According to the docs __ATOMIC_SEQ_CST enforces total ordering wi= th *all > > > > > other __ATOMIC_SEQ_CST operations*. It only says about other > > > > > __ATOMIC_SEQ_CST operations; nothing is said about regular reads/= writes. > > > > > This sounds quite far from our comment, promising full barrier se= mantics, > > > > > which, in my understanding, means the equivalent of > > > > > both pg_read_barrier()/pg_write_barrier(), which should barrier *= all other > > > > > read/writes*. > > > > > > > > A memory barrier in one process/thread always needs be paired with = a barrier > > > > in another process/thread. It's not enough to have a memory barrier= on one > > > > side but not the other. > > > > > > Yes, there surely should be a memory barrier on another side. But > > > does __ATOMIC_SEQ_CST works as a barrier for the regular read/write > > > operations on the same side? > > > > Yes, if it's paired with another barrier on the other side. The regular > > read/write operations are basically protected transitively, due to > > > > a) An acquire barrier preventing non-atomic reads/writes in the same th= read > > from appearing to have been moved to before the barrier > > > > b) A release barrier preventing non-atomic reads/writes in the same thr= ead > > from appearing to have been moved to after the barrier > > > > c) The other thread being guaranteed a) and b) for the other threads' > > non-atomic reads/writes depending on the type of barrier > > > > d) The atomic value itself being guaranteed to be, well, atomic > > Yes, looks good to me. > > > > > > This function first checks if LSE instructions are present. If s= o, > > > > > instruction LSE casal is used. If not (in my case), the loop of > > > > > ldaxr/stlxr is used instead. The documentation says both ldaxr/s= tlxr > > > > > provides one-way barriers. Read/writes after ldaxr will be obser= ved after, > > > > > and read/writes before stlxr will be observed before. So, a pair= of these > > > > > instructions provides a full memory barrier. However, if we don'= t observe > > > > > the expected value, only ldaxr gets executed. So, an unsuccessfu= l > > > > > pg_atomic_compare_exchange_u64() attempt will leave us with a one= -way > > > > > barrier, and that caused a bug in my case. > > > > > > > > That has to be a compiler bug. We specify __ATOMIC_SEQ_CST for bot= h > > > > success_memorder *and* failure_memorder. > > > > > > > > What compiler & version is this? > > > > > > I've tried and got the same results with two compilers. > > > gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 > > > Ubuntu clang version 19.1.1 (1ubuntu1~24.04.2) > > > > Thinking more about it I wonder if the behaviour of not doing a release= in > > case the atomic fails *might* arguably actually be correct - if the > > compare-exchange fails, there's nothing that the non-atomic values coul= d be > > ordered against. > > > > What is the access pattern and the observed problems with it that made = you > > look at the disassembly? > > Check this code. > > l1: pg_atomic_write_u64(&XLogCtl->xlblocks[nextidx], NewPageEndPtr); > > /* > * Try to advance XLogCtl->InitializedUpTo. > * > * If the CAS operation failed, then some of previous pages are n= ot > * initialized yet, and this backend gives up. > * > * Since initializer of next page might give up on advancing of > * InitializedUpTo, this backend have to attempt advancing until = it > * find page "in the past" or concurrent backend succeeded at > * advancing. When we finish advancing XLogCtl->InitializedUpTo,= we > * notify all the waiters with XLogCtl->InitializedUpToCondVar. > */ > l2: while (pg_atomic_compare_exchange_u64(&XLogCtl->InitializedUpTo, = &NewPageBeginPtr, NewPageEndPtr)) > { > NewPageBeginPtr =3D NewPageEndPtr; > NewPageEndPtr =3D NewPageBeginPtr + XLOG_BLCKSZ; > nextidx =3D XLogRecPtrToBufIdx(NewPageBeginPtr); > > l3: if (pg_atomic_read_u64(&XLogCtl->xlblocks[nextidx]) !=3D NewP= ageEndPtr) > { > /* > * Page at nextidx wasn't initialized yet, so we cann't m= ove > * InitializedUpto further. It will be moved by backend w= hich > * will initialize nextidx. > */ > ConditionVariableBroadcast(&XLogCtl->InitializedUpToCondV= ar); > break; > } > } > > Consider the following execution order with process 1 (p1) and process 2 = (p2). > > 1. p1 executes l1 > 2. p2 executes l2 with success > 3. p1 executes l2 with failure > 4. p2 execute l3, but doesn't see the results of step 1, because 3 didn't= provide enough of memory barrier Sorry, I messed this up. The correct sequence is following. 1. p1 executes l1 2. p1 executes l2 with failure 3. p2 executes l2 with success 4. p2 execute l3, but doesn't see the results of step 1, because 3 didn't provide enough of memory barrier Note that step 2 would be successful if it would be after step 3. ------ Regards, Alexander Korotkov Supabase