Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tqZv3-0078DN-77 for pgsql-hackers@arkaria.postgresql.org; Fri, 07 Mar 2025 15:47:25 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tqZv1-00902q-P8 for pgsql-hackers@arkaria.postgresql.org; Fri, 07 Mar 2025 15:47:23 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tqZv1-00902h-5B for pgsql-hackers@lists.postgresql.org; Fri, 07 Mar 2025 15:47:23 +0000 Received: from mail-ej1-x632.google.com ([2a00:1450:4864:20::632]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1tqZuy-001W1r-2z for pgsql-hackers@postgresql.org; Fri, 07 Mar 2025 15:47:22 +0000 Received: by mail-ej1-x632.google.com with SMTP id a640c23a62f3a-ac202264f9cso375482166b.0 for ; Fri, 07 Mar 2025 07:47:20 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1741362439; x=1741967239; darn=postgresql.org; h=to:subject:message-id:date:from:mime-version:from:to:cc:subject :date:message-id:reply-to; bh=TfJRTn3Crb5D9CJEb2pKPUPcNj0+0zm7iSO7c3StUGI=; b=RmXEU+yYxN5fzqtIftZ6aeHJBaiACeNdXVIfmKOmnLUo0ImzdrHPDurJJFtCxgkzG2 5tqmdMR8LDnqxgMkpauVI+9k3vlYcbM4OTjtFjLVIxBSkwiA54j5nnkOEDi5ZrUgL5I8 LylpQn8tU/ldPeJInvxkQpkDPEhI/kSvvSKQKdko3nfduYKiK/joyITOZVQLh6RGTkD6 89VAABJzD5DlGyT85Wt2LhORRP7KLbvPxr22bUiVBeuiZrTa2erC4MfzmMdgej7EWssp oXnw9WBhd/IPmdDCxil1AwmjyBgAsuZtLHM0RG9dBI2CKSQikvC4emSjICbdZp8jBpgg NuCg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741362439; x=1741967239; h=to:subject:message-id:date:from:mime-version:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=TfJRTn3Crb5D9CJEb2pKPUPcNj0+0zm7iSO7c3StUGI=; b=Y5/tDhmbWm53Iy/6RwtMdpeF9rxknBN5IPn8mfh8bp3rA6pcHSMpD6cKIjMrEsSxtY ktIKsJT/fjIoBAYguzgLrMB2EkuisNNk4LHl4AW9cEnpP+sqcaHxccHFMlOhI2k24MlI 7TU/vJJlm5FnfGPV4V2kPGQtGdjvDhku+bvRHqWPpTgEVTh2Zr7xyNCUEYywCAfRM9Wq vkw98HvtaBuLxGbl8NwlaPPAEQ+zk+/u0owmygix2nUqAh35wwLByY9NmOIALn8h3/4m /yr595ybr0mZAtEjDL8AWEwlIsb+6+enSZoK3rlbpegSlEFa3i1CcQk3mfe8H6hRrI1D VHBQ== X-Gm-Message-State: AOJu0YxrZnd7mm6u2+BpMuvZJNtqUr8dxEvipm5w1eXhhO/Af3YgBiYR Uo3fuIROOOVL1fGfny5pPM4NW7o4+Z5aEse0vxB0RkvfnfBe8M4sJVEsTm1u0W5o8loDGE/ukYa bfbZe97jxuv6vsNSHiT4gRmoQRBZV0BD3 X-Gm-Gg: ASbGncvXDENCQRdosBrI7odHkEek8MdVKwYJ+np1rFSAe7ClGQsUf2bftJwDUero/mh 7RE9q87umfGE+fhK4Y0OOw32rH9TIha4+2wgYuzbbVV7p/6EOlGE4nKcv0xyxdqFve2XY/nLrh/ cB7w0072VpK9J7dmNWbbk54Jye8w== X-Google-Smtp-Source: AGHT+IEYIzlJv1VckWnvxAppYSU2rQRb55sXs3IVoMWg8JO7JR5GWZCmokOjTA+v26/44fix3aOqeQPVMcDx2e97xh4= X-Received: by 2002:a17:907:a089:b0:abf:3cb2:1c03 with SMTP id a640c23a62f3a-ac252732890mr318509766b.8.1741362439293; Fri, 07 Mar 2025 07:47:19 -0800 (PST) MIME-Version: 1.0 From: Alexander Korotkov Date: Fri, 7 Mar 2025 17:47:08 +0200 X-Gm-Features: AQ5f1JrShv4xtO6f5Zh3Ol35PvbOcWk4EVYcaiCgaN0lbI5r2iszWMFe4ddhx0U Message-ID: Subject: pg_atomic_compare_exchange_*() and memory barriers To: pgsql-hackers Content-Type: multipart/alternative; boundary="0000000000009e97e7062fc289bf" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000009e97e7062fc289bf Content-Type: text/plain; charset="UTF-8" Hi all, While investigating a bug in the patch to get rid of WALBufMappingLock, I found that the surrounding pg_atomic_compare_exchange_u64() fixes the problem for me. That doesn't feel right because, according to the comments, both pg_atomic_compare_exchange_u32() and pg_atomic_compare_exchange_u64() should provide full memory barrier semantics. So, I decided to investigate this further. In my case, the implementation of pg_atomic_compare_exchange_u64() is based on __atomic_compare_exchange_n(). static inline bool pg_atomic_compare_exchange_u64_impl(volatile pg_atomic_uint64 *ptr, uint64 *expected, uint64 newval) { AssertPointerAlignment(expected, 8); return __atomic_compare_exchange_n(&ptr->value, expected, newval, false, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST); } According to the docs __ATOMIC_SEQ_CST enforces total ordering with *all other __ATOMIC_SEQ_CST operations*. It only says about other __ATOMIC_SEQ_CST operations; nothing is said about regular reads/writes. This sounds quite far from our comment, promising full barrier semantics, which, in my understanding, means the equivalent of both pg_read_barrier()/pg_write_barrier(), which should barrier *all other read/writes*. I've found that __atomic_compare_exchange_n() ends up with usage of __aarch64_cas8_acq_rel, which disassembles as following. 0000000000000860 <__aarch64_cas8_acq_rel>: 860: d503245f bti c 864: 90000110 adrp x16, 20000 <__data_start> 868: 3940a210 ldrb w16, [x16, #40] 86c: 34000070 cbz w16, 878 <__aarch64_cas8_acq_rel+0x18> 870: c8e0fc41 casal x0, x1, [x2] 874: d65f03c0 ret 878: aa0003f0 mov x16, x0 87c: c85ffc40 ldaxr x0, [x2] 880: eb10001f cmp x0, x16 884: 54000061 b.ne 890 <__aarch64_cas8_acq_rel+0x30> // b.any 888: c811fc41 stlxr w17, x1, [x2] 88c: 35ffff91 cbnz w17, 87c <__aarch64_cas8_acq_rel+0x1c> 890: d65f03c0 ret This function first checks if LSE instructions are present. If so, instruction LSE casal is used. If not (in my case), the loop of ldaxr/stlxr is used instead. The documentation says both ldaxr/stlxr provides one-way barriers. Read/writes after ldaxr will be observed after, and read/writes before stlxr will be observed before. So, a pair of these instructions provides a full memory barrier. However, if we don't observe the expected value, only ldaxr gets executed. So, an unsuccessful pg_atomic_compare_exchange_u64() attempt will leave us with a one-way barrier, and that caused a bug in my case. In contrast, __sync_val_compare_and_swap() uses __aarch64_cas8_sync, which is quite the same as __aarch64_cas8_acq_rel, but has an explicit memory barrier in the end. I have a couple of ideas on how to fix this problem. 1. Provide full barrier semantics for pg_atomic_compare_exchange_*(). In particular, for __atomic_compare_exchange_n(), we should probably use __ATOMIC_ACQ_REL for success and run an explicit memory barrier in the case of failure. 2. Document that pg_atomic_compare_exchange_*() doesn't necessarily provide a memory barrier on failure. But then we need to carefully check if existing use-cases need explicit memory barriers now. Any thoughts? Links 1. https://www.postgresql.org/message-id/CAPpHfdsWcQb-u-9K%3DipneBf8CMhoUuBWKYc%2BXWJEHVdtONOepQ%40mail.gmail.com 2. https://developer.arm.com/documentation/100941/0101/Barriers ------ Regards, Alexander Korotkov Supabase --0000000000009e97e7062fc289bf Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi all,

While investigating a bug in the patch to g= et rid of WALBufMappingLock, I found that the surrounding pg_atomic_compare= _exchange_u64() fixes the problem for me.=C2=A0 That doesn't feel right= because, according to the comments, both pg_atomic_compare_exchange_u32() = and pg_atomic_compare_exchange_u64() should provide full memory barrier sem= antics.=C2=A0 So, I decided to investigate this further.

In my case,= the implementation of pg_atomic_compare_exchange_u64() is based on __atomi= c_compare_exchange_n().

static inline bool=
pg_atomic_compare_exchange_u64_impl(volatile pg_atomic_uint64 *ptr,
= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 uint64 *expected, uint= 64 newval)
{
=C2=A0 =C2=A0 AssertPointerAlignment(expected, 8);
= =C2=A0 =C2=A0 return __atomic_compare_exchange_n(&ptr->value, expect= ed, newval, false,
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0__ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

Accord= ing to the docs=C2=A0__ATOMIC_SEQ_CST enforces total ordering with all o= ther __ATOMIC_SEQ_CST operations.=C2=A0 It only says about other __ATOM= IC_SEQ_CST operations; nothing is said about regular reads/writes.=C2=A0 Th= is sounds quite far from our comment, promising full barrier semantics, whi= ch, in my understanding, means the equivalent of both=C2=A0pg_read_barrier(= )/pg_write_barrier(), which should barrier all other read/writes.
I've found that=C2=A0__atomic_compare_exchange_n() end= s up with usage of=C2=A0__aarch64_cas8_acq_rel, which disassembles as follo= wing.

0000000000000860 &l= t;__aarch64_cas8_acq_rel>:
=C2=A0860: =C2=A0 d503245f =C2=A0 =C2=A0bt= i c
=C2=A0864: =C2=A0 90000110 =C2=A0 =C2=A0adrp =C2=A0 =C2=A0x16, 20000= <__data_start>
=C2=A0868: =C2=A0 3940a210 =C2=A0 =C2=A0ldrb =C2= =A0 =C2=A0w16, [x16, #40]
=C2=A086c: =C2=A0 34000070 =C2=A0 =C2=A0cbz w1= 6, 878 <__aarch64_cas8_acq_rel+0x18>
=C2=A0870: =C2=A0 c8e0fc41 = =C2=A0 =C2=A0casal =C2=A0 x0, x1, [x2]
=C2=A0874: =C2=A0 d65f03c0 =C2=A0= =C2=A0ret
=C2=A0878: =C2=A0 aa0003f0 =C2=A0 =C2=A0mov x16, x0
=C2=A0= 87c: =C2=A0 c85ffc40 =C2=A0 =C2=A0ldaxr =C2=A0 x0, [x2]
=C2=A0880: =C2= =A0 eb10001f =C2=A0 =C2=A0cmp x0, x16
=C2=A0884: =C2=A0 54000061 =C2=A0 = =C2=A0b.ne =C2=A0 =C2=A0890 <__aarch64_cas8_= acq_rel+0x30> =C2=A0// b.any
=C2=A0888: =C2=A0 c811fc41 =C2=A0 =C2=A0= stlxr =C2=A0 w17, x1, [x2]
=C2=A088c: =C2=A0 35ffff91 =C2=A0 =C2=A0cbnz = =C2=A0 =C2=A0w17, 87c <__aarch64_cas8_acq_rel+0x1c>
=C2=A0890: =C2= =A0 d65f03c0 =C2=A0 =C2=A0ret

This function firs= t checks if LSE instructions are present.=C2=A0 If so, instruction LSE casa= l is used.=C2=A0 If not (in my case), the loop of ldaxr/stlxr is used inste= ad.=C2=A0 The documentation says both ldaxr/stlxr provides one-way barriers= .=C2=A0 Read/writes after ldaxr will be observed after, and read/writes bef= ore stlxr will be observed before.=C2=A0 So, a pair of these instructions p= rovides a full memory barrier.=C2=A0 However, if we don't observe the e= xpected value, only ldaxr gets executed.=C2=A0 So, an unsuccessful pg_atomi= c_compare_exchange_u64() attempt will leave us with a one-way barrier, and = that caused a bug in my case.=C2=A0 In contrast, __sync_val_compare_and_swa= p() uses=C2=A0__aarch64_cas8_sync, which is quite the same as __aarch64_cas= 8_acq_rel, but has an explicit memory barrier in the end.

I have a couple of ideas on how to fix this problem.
1. Provide= full barrier semantics for pg_atomic_compare_exchange_*().=C2=A0 In partic= ular, for __atomic_compare_exchange_n(), we should probably use=C2=A0__ATOM= IC_ACQ_REL for success and run an explicit memory barrier in the case of fa= ilure.
2. Document that pg_atomic_compare_exchange_*() doesn'= t necessarily provide a memory barrier on failure.=C2=A0 But then we need t= o carefully check if existing use-cases need explicit memory barriers now.<= /div>

Any thoughts?
--0000000000009e97e7062fc289bf--