From: Heikki Linnakangas
Date: Sat, 7 Feb 2026 14:38:53 +0200
Subject: Re: Buffer locking is special (hints, checksums, AIO writes)
To: Andres Freund, Melanie Plageman, Noah Misch
Cc: Kirill Reshke, Matthias van de Meent, pgsql-hackers@postgresql.org, Thomas Munro, Robert Haas, Michael Paquier
Message-ID: <03041d48-1e15-4741-b365-0809f2bc75c4@iki.fi>
In-Reply-To: <5ubipyssiju5twkb7zgqwdr7q2vhpkpmuelxfpanetlk6ofnop@hvxb4g2amb2d>

On 03/02/2026 00:33, Andres Freund wrote:
> - Now that we use the normal order of WAL logging, we don't need to delay
>   checkpoint starts anymore.
>
>   I think the explanation for why that is ok is correct [1], but it needs to
>   be looked at by somebody with experience around this. Maybe Heikki?

So that's patch 0004 "bufmgr: Switch to standard order in
MarkBufferDirtyHint()". Yes, looks correct to me.

> /*
>  * Update RedoRecPtr so that we can make the right decision. It's possible
>  * that a new checkpoint will start just after GetRedoRecPtr(), but that
>  * is ok, as the buffer is already dirty, ensuring that any BufferSync()
>  * started after the buffer was marked dirty cannot complete without
>  * flushing this buffer. If a checkpoint started between marking the
>  * buffer dirty and this check, we will emit an unnecessary WAL record (as
>  * the buffer will be written out as part of the checkpoint), but the
>  * window for that is small.
>  */
> RedoRecPtr = GetRedoRecPtr();

That "small window" is actually pretty big if you think of it a little
more loosely. Our rule is that we write the full page image if a
checkpoint has started since the page LSN, but that's very conservative
already. It would be sufficient to write the full page image only if the
checkpoint has already flushed the page. This small window is just a
special case of that conservatism. I've been thinking of trying to track
that more accurately for a long time, because it would smooth out the WAL
spike when a checkpoint begins.

That gets off-topic, but my point is that it feels a little silly to
mention that small window when there's the other giant panoramic window
next to it.

- Heikki
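
For readers following along, here is a minimal sketch of the two full-page-image rules contrasted above. It is an illustration only, not PostgreSQL code: the function names, the `flushed_by_checkpoint` flag and the `XLogRecPtr` typedef are hypothetical stand-ins, and nothing in the tree tracks per-page flush state this way today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for PostgreSQL's WAL position type. */
typedef uint64_t XLogRecPtr;

/*
 * Conservative rule described in the mail: take a full page image
 * whenever a checkpoint has started since the page was last modified,
 * i.e. the page LSN is not past the checkpoint's redo pointer.
 */
bool
needs_fpi_conservative(XLogRecPtr page_lsn, XLogRecPtr redo_rec_ptr)
{
    return page_lsn <= redo_rec_ptr;
}

/*
 * Tighter rule suggested in the mail: a full page image is only needed
 * if that checkpoint has already flushed the page.  The flag is an
 * assumption made for this sketch; tracking it accurately is the open
 * problem.
 */
bool
needs_fpi_tighter(XLogRecPtr page_lsn, XLogRecPtr redo_rec_ptr,
                  bool flushed_by_checkpoint)
{
    return page_lsn <= redo_rec_ptr && flushed_by_checkpoint;
}
```

Under these assumptions, a page with LSN 100 and a redo pointer of 200 always gets a full page image under the conservative rule, but under the tighter rule only once the checkpoint has flushed it. The gap between the two is the "giant panoramic window", and the window mentioned in the code comment is one small slice of it.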