From: Alexander Korotkov
Date: Fri, 7 Mar 2025 19:54:43 +0200
Subject: Re: pg_atomic_compare_exchange_*() and memory barriers
To: Andres Freund
Cc: pgsql-hackers

On Fri, Mar 7, 2025 at 7:46 PM Andres Freund wrote:
> On 2025-03-07 19:44:20 +0200, Alexander Korotkov wrote:
> > On Fri, Mar 7, 2025 at 7:38 PM Andres Freund wrote:
> > > On 2025-03-07 19:15:23 +0200, Alexander Korotkov wrote:
> > > > On Fri, Mar 7, 2025 at 7:07 PM Andres Freund wrote:
> > > > > What is the access pattern and the observed problems with it that made you
> > > > > look at the disassembly?
> > > >
> > > > Check this code.
> > > >
> > > > l1:     pg_atomic_write_u64(&XLogCtl->xlblocks[nextidx], NewPageEndPtr);
> > > >
> > > >         /*
> > > >          * Try to advance XLogCtl->InitializedUpTo.
> > > >          *
> > > >          * If the CAS operation failed, then some of previous pages are not
> > > >          * initialized yet, and this backend gives up.
> > > >          *
> > > >          * Since initializer of next page might give up on advancing of
> > > >          * InitializedUpTo, this backend have to attempt advancing until it
> > > >          * find page "in the past" or concurrent backend succeeded at
> > > >          * advancing.  When we finish advancing XLogCtl->InitializedUpTo, we
> > > >          * notify all the waiters with XLogCtl->InitializedUpToCondVar.
> > > >          */
> > > > l2:     while (pg_atomic_compare_exchange_u64(&XLogCtl->InitializedUpTo,
> > > >                                               &NewPageBeginPtr, NewPageEndPtr))
> > > >         {
> > > >             NewPageBeginPtr = NewPageEndPtr;
> > > >             NewPageEndPtr = NewPageBeginPtr + XLOG_BLCKSZ;
> > > >             nextidx = XLogRecPtrToBufIdx(NewPageBeginPtr);
> > > >
> > > > l3:         if (pg_atomic_read_u64(&XLogCtl->xlblocks[nextidx]) != NewPageEndPtr)
> > > >             {
> > > >                 /*
> > > >                  * Page at nextidx wasn't initialized yet, so we can't move
> > > >                  * InitializedUpto further. It will be moved by backend which
> > > >                  * will initialize nextidx.
> > > >                  */
> > > >                 ConditionVariableBroadcast(&XLogCtl->InitializedUpToCondVar);
> > > >                 break;
> > > >             }
> > > >         }
> > > >
> > > > Consider the following execution order with process 1 (p1) and process 2
> > > > (p2).
> > >
> > > On 2025-03-07 19:24:39 +0200, Alexander Korotkov wrote:
> > > > Sorry, I messed this up.
> > > > The correct sequence is following.
> > > >
> > > > 1. p1 executes l1
> > > > 2. p1 executes l2 with failure
> > > > 3. p2 executes l2 with success
> > > > 4. p2 executes l3, but doesn't see the results of step 1, because 3
> > > > didn't provide enough of memory barrier
> > >
> > > Did you mean because 2) didn't provide enough of a memory barrier? Because 3)
> > > does, right?
> >
> > Yes, exactly.
> >
> > > You could get in exactly the same situation if the p1 is scheduled out by the
> > > OS after step 1, no?
> >
> > No.  In that case, p1 will execute l2 with success.  p1 executes l2
> > with failure only because it goes before p2 executes l2.
>
> In my scenario p1 will not execute l2 because p2 gets scheduled before it can
> do so. So p1 can't yet execute l2 before p2 executes l2.

In my scenario, p1.l2 succeeds if executed after p2.l2 and fails if
executed before p2.l2.  Imagine the initial value is 0: p1 tries to
change 1 => 2, while p2 tries to change 0 => 1.

------
Regards,
Alexander Korotkov
Supabase
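P.S. The success and failure outcomes in the scenario above (initial value 0, p1 attempting 1 => 2, p2 attempting 0 => 1) can be sketched with plain C11 atomics. This is only an illustration under stated assumptions: `try_advance`, `run_order`, and `cas_outcome` are hypothetical names, the counter is a stand-in for XLogCtl->InitializedUpTo, and `atomic_compare_exchange_strong` is used in place of `pg_atomic_compare_exchange_u64`. It runs both orders in a single thread, so it shows which CAS succeeds in each order but says nothing about memory-barrier visibility between processes.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of one run; not part of PostgreSQL. */
typedef struct
{
    bool     p1_ok;        /* did p1's CAS 1 => 2 succeed? */
    bool     p2_ok;        /* did p2's CAS 0 => 1 succeed? */
    uint64_t final_value;  /* counter value after both attempts */
} cas_outcome;

/* One CAS attempt: change *ptr from 'from' to 'to' only if it equals 'from'. */
static bool
try_advance(_Atomic uint64_t *ptr, uint64_t from, uint64_t to)
{
    uint64_t expected = from;

    return atomic_compare_exchange_strong(ptr, &expected, to);
}

/* Replay the two CAS attempts in the given order, starting from 0. */
cas_outcome
run_order(bool p1_first)
{
    _Atomic uint64_t initialized_up_to;  /* stand-in for InitializedUpTo */
    cas_outcome      result;

    atomic_init(&initialized_up_to, 0);

    if (p1_first)
    {
        /* p1.l2 runs before p2.l2: the value is still 0, so 1 => 2 fails. */
        result.p1_ok = try_advance(&initialized_up_to, 1, 2);
        result.p2_ok = try_advance(&initialized_up_to, 0, 1);
    }
    else
    {
        /* p2.l2 runs first and moves 0 => 1, so p1's 1 => 2 then succeeds. */
        result.p2_ok = try_advance(&initialized_up_to, 0, 1);
        result.p1_ok = try_advance(&initialized_up_to, 1, 2);
    }

    result.final_value = atomic_load(&initialized_up_to);
    return result;
}
```

Calling `run_order(true)` corresponds to steps 2 and 3 of the sequence quoted above: p1's CAS fails and the counter ends at 1. `run_order(false)` corresponds to p1 being scheduled out after l1: both CAS attempts succeed and the counter ends at 2.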