Date: Fri, 7 Mar 2025 12:46:58 -0500
From: Andres Freund
To: Alexander Korotkov
Cc: pgsql-hackers
Subject: Re: pg_atomic_compare_exchange_*() and memory barriers
References: <2muwyx6a5vojkg7iegknhnkcch3lfxptsxk7icwuh7szkvvu2y@vc3ukkfvnu6i>

On 2025-03-07 19:44:20 +0200, Alexander Korotkov wrote:
> On Fri, Mar 7, 2025 at 7:38 PM Andres Freund wrote:
> > On 2025-03-07 19:15:23 +0200, Alexander Korotkov wrote:
> > > On Fri, Mar 7, 2025 at 7:07 PM Andres Freund wrote:
> > > > What is the access pattern and the observed problems with it that made you
> > > > look at the disassembly?
> > >
> > > Check this code.
> > >
> > > l1:     pg_atomic_write_u64(&XLogCtl->xlblocks[nextidx], NewPageEndPtr);
> > >
> > >         /*
> > >          * Try to advance XLogCtl->InitializedUpTo.
> > >          *
> > >          * If the CAS operation failed, then some of previous pages are not
> > >          * initialized yet, and this backend gives up.
> > >          *
> > >          * Since initializer of next page might give up on advancing of
> > >          * InitializedUpTo, this backend have to attempt advancing until it
> > >          * find page "in the past" or concurrent backend succeeded at
> > >          * advancing.  When we finish advancing XLogCtl->InitializedUpTo, we
> > >          * notify all the waiters with XLogCtl->InitializedUpToCondVar.
> > >          */
> > > l2:     while (pg_atomic_compare_exchange_u64(&XLogCtl->InitializedUpTo,
> > >                                               &NewPageBeginPtr, NewPageEndPtr))
> > >         {
> > >             NewPageBeginPtr = NewPageEndPtr;
> > >             NewPageEndPtr = NewPageBeginPtr + XLOG_BLCKSZ;
> > >             nextidx = XLogRecPtrToBufIdx(NewPageBeginPtr);
> > >
> > > l3:         if (pg_atomic_read_u64(&XLogCtl->xlblocks[nextidx]) !=
> > >                 NewPageEndPtr)
> > >             {
> > >                 /*
> > >                  * Page at nextidx wasn't initialized yet, so we cann't move
> > >                  * InitializedUpto further.  It will be moved by backend which
> > >                  * will initialize nextidx.
> > >                  */
> > >
> > >                 ConditionVariableBroadcast(&XLogCtl->InitializedUpToCondVar);
> > >                 break;
> > >             }
> > >         }
> > >
> > > Consider the following execution order with process 1 (p1) and process 2
> > > (p2).
> >
> > On 2025-03-07 19:24:39 +0200, Alexander Korotkov wrote:
> > > Sorry, I messed this up.
> > > The correct sequence is following.
> > >
> > > 1. p1 executes l1
> > > 2. p1 executes l2 with failure
> > > 3. p2 executes l2 with success
> > > 4. p2 execute l3, but doesn't see the results of step 1, because 3
> > > didn't provide enough of memory barrier
> >
> > Did you mean because 2) didn't provide enough of a memory barrier? Because 3)
> > does, right?
>
> Yes, exactly.
>
> > You could get in exactly same the situation if the p1 is scheduled out by the
> > OS after step 1, no?
>
> No.  In that case, p1 will execute l2 with success.  p1 executes l2
> with failure only because it goes before p2 executes l2.

In my scenario p1 will not execute l2 because p2 gets scheduled before it can
do so. So p1 can't yet execute l2 before p2 executes l2.

Greetings,

Andres Freund
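
For readers following the thread, the l1/l2/l3 pattern quoted above can be
sketched with plain C11 atomics. This is only an illustration under stated
assumptions: it uses <stdatomic.h> rather than the pg_atomic_* API, and the
names publish_page, buf_idx, NBUFFERS and PAGE_SIZE are hypothetical stand-ins.
Every atomic operation here is sequentially consistent, so the store at l1 is
ordered before the compare-exchange at l2 whether that compare-exchange
succeeds or fails. That is the ordering the numbered sequence depends on; the
sketch says nothing about what pg_atomic_compare_exchange_u64() itself
guarantees on failure.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NBUFFERS  8       /* hypothetical number of page slots */
#define PAGE_SIZE 8192    /* hypothetical stand-in for XLOG_BLCKSZ */

/* End pointer of the page held in each slot; stand-in for xlblocks[]. */
static _Atomic uint64_t xlblocks[NBUFFERS];
/* Everything below this position is initialized; stand-in for InitializedUpTo. */
static _Atomic uint64_t initialized_upto;

/* Map a page start position to its slot (stand-in for XLogRecPtrToBufIdx). */
static size_t
buf_idx(uint64_t begin)
{
    return (size_t) ((begin / PAGE_SIZE) % NBUFFERS);
}

/* Mark the page starting at 'begin' as initialized, then try to advance. */
static void
publish_page(uint64_t begin)
{
    uint64_t end = begin + PAGE_SIZE;
    uint64_t expected = begin;

    assert(begin % PAGE_SIZE == 0);

    /* l1: seq_cst store publishing this page */
    atomic_store(&xlblocks[buf_idx(begin)], end);

    /*
     * l2: seq_cst compare-exchange.  On failure it still acts as a seq_cst
     * load, so it cannot be ordered before the store at l1.
     */
    while (atomic_compare_exchange_strong(&initialized_upto, &expected, end))
    {
        begin = end;
        end = begin + PAGE_SIZE;
        expected = begin;

        /* l3: stop if the next page has not been published yet */
        if (atomic_load(&xlblocks[buf_idx(begin)]) != end)
            break;
    }
}
```

Run single-threaded, publishing the second page first leaves initialized_upto
untouched (the compare-exchange at l2 fails), and publishing the first page
afterwards advances it past both pages, because l3 finds the earlier store.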