Date: Fri, 7 Mar 2025 12:38:21 -0500
From: Andres Freund
To: Alexander Korotkov
Cc: pgsql-hackers
Subject: Re: pg_atomic_compare_exchange_*() and memory barriers

Hi,

On 2025-03-07 19:15:23 +0200, Alexander Korotkov wrote:
> On Fri, Mar 7, 2025 at 7:07 PM Andres Freund wrote:
> > What is the access pattern and the observed problems with it that made you
> > look at the disassembly?
>
> Check this code.
>
> l1:     pg_atomic_write_u64(&XLogCtl->xlblocks[nextidx], NewPageEndPtr);
>
>         /*
>          * Try to advance XLogCtl->InitializedUpTo.
>          *
>          * If the CAS operation failed, then some of previous pages are not
>          * initialized yet, and this backend gives up.
>          *
>          * Since initializer of next page might give up on advancing of
>          * InitializedUpTo, this backend have to attempt advancing until it
>          * find page "in the past" or concurrent backend succeeded at
>          * advancing.  When we finish advancing XLogCtl->InitializedUpTo, we
>          * notify all the waiters with XLogCtl->InitializedUpToCondVar.
>          */
> l2:     while (pg_atomic_compare_exchange_u64(&XLogCtl->InitializedUpTo,
>                                               &NewPageBeginPtr, NewPageEndPtr))
>         {
>             NewPageBeginPtr = NewPageEndPtr;
>             NewPageEndPtr = NewPageBeginPtr + XLOG_BLCKSZ;
>             nextidx = XLogRecPtrToBufIdx(NewPageBeginPtr);
>
> l3:         if (pg_atomic_read_u64(&XLogCtl->xlblocks[nextidx]) !=
>                 NewPageEndPtr)
>             {
>                 /*
>                  * Page at nextidx wasn't initialized yet, so we cann't move
>                  * InitializedUpto further. It will be moved by backend which
>                  * will initialize nextidx.
>                  */
>                 ConditionVariableBroadcast(&XLogCtl->InitializedUpToCondVar);
>                 break;
>             }
>         }
>
> Consider the following execution order with process 1 (p1) and process 2
> (p2).

On 2025-03-07 19:24:39 +0200, Alexander Korotkov wrote:
> Sorry, I messed this up.
> The correct sequence is following.
>
> 1. p1 executes l1
> 2. p1 executes l2 with failure
> 3. p2 executes l2 with success
> 4. p2 execute l3, but doesn't see the results of step 1, because 3
>    didn't provide enough of memory barrier

Did you mean because 2) didn't provide enough of a memory barrier? Because 3)
does, right?

You could get into exactly the same situation if p1 is scheduled out by the
OS after step 1, no?

Greetings,

Andres Freund
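
To make the four-step sequence above concrete, here is a minimal single-threaded sketch of it. It is an illustration only: it uses C11 <stdatomic.h> with the default sequentially consistent ordering instead of PostgreSQL's pg_atomic_* API, and every name in it (BLCKSZ, NBUFS, xlblocks, initialized_up_to, publish_page, try_advance, run_sequence) is a hypothetical stand-in rather than the actual xlog.c code. Because it runs in one thread, it cannot reproduce a missing barrier; it only replays the interleaving and shows the outcome when p2 does see p1's write at l3.

```c
#include <stdatomic.h>
#include <stdint.h>

#define BLCKSZ 8192u   /* hypothetical page size, stands in for XLOG_BLCKSZ */
#define NBUFS  4u      /* hypothetical number of buffer slots */

/* End pointer of the page currently held in each slot (xlblocks stand-in). */
static _Atomic uint64_t xlblocks[NBUFS];
/* Stand-in for XLogCtl->InitializedUpTo. */
static _Atomic uint64_t initialized_up_to;

static unsigned buf_idx(uint64_t ptr)
{
    return (unsigned) ((ptr / BLCKSZ) % NBUFS);
}

/* l1: announce that the page starting at 'begin' has been initialized. */
static void publish_page(uint64_t begin)
{
    atomic_store(&xlblocks[buf_idx(begin)], begin + BLCKSZ);
}

/*
 * l2 + l3: try to advance initialized_up_to, starting from 'begin'.
 * Returns how many CAS operations succeeded (0 means this caller gave up).
 */
static int try_advance(uint64_t begin)
{
    uint64_t end = begin + BLCKSZ;
    int advanced = 0;

    /* l2: on failure the CAS stores the current value into 'begin'. */
    while (atomic_compare_exchange_strong(&initialized_up_to, &begin, end))
    {
        advanced++;
        begin = end;
        end = begin + BLCKSZ;

        /* l3: stop if the next page has not been published yet. */
        if (atomic_load(&xlblocks[buf_idx(begin)]) != end)
            break;
    }
    return advanced;
}

/* Replays steps 1-4 in one thread and returns the final initialized_up_to. */
static uint64_t run_sequence(void)
{
    /* Reset state so the function can be called repeatedly. */
    for (unsigned i = 0; i < NBUFS; i++)
        atomic_store(&xlblocks[i], 0);
    atomic_store(&initialized_up_to, 0);

    publish_page(0);             /* p2 ran its own l1 for page 0 earlier */
    publish_page(BLCKSZ);        /* step 1: p1 executes l1 for page 1 */
    (void) try_advance(BLCKSZ);  /* step 2: p1's CAS fails, p1 gives up */
    (void) try_advance(0);       /* steps 3-4: p2's CAS succeeds, then l3 */

    return atomic_load(&initialized_up_to);
}
```

In this replay p2 sees page 1 at l3 and advances past both pages. If p2 instead read a stale value at l3, it would break out after the first page while p1 has already given up, which appears to be the situation the quoted sequence describes.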