Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vQLK5-005wWw-1Q for pgsql-hackers@arkaria.postgresql.org; Tue, 02 Dec 2025 08:01:22 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vQLK3-006YH4-23 for pgsql-hackers@arkaria.postgresql.org; Tue, 02 Dec 2025 08:01:19 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vQLK3-006YGv-0L for pgsql-hackers@lists.postgresql.org; Tue, 02 Dec 2025 08:01:19 +0000 Received: from lahtoruutu.iki.fi ([2a0b:5c81:1c1::37]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vQLJz-002i7S-38 for pgsql-hackers@postgresql.org; Tue, 02 Dec 2025 08:01:18 +0000 Received: from [10.0.2.15] (unknown [130.41.208.2]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange x25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: hlinnaka) by lahtoruutu.iki.fi (Postfix) with ESMTPSA id 4dLCrg0T29z49Pxn; Tue, 02 Dec 2025 10:01:06 +0200 (EET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=iki.fi; s=lahtoruutu; t=1764662468; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3KRrBnNTgEDug0ty/Bc+JEbqE7/AVaZdcB6F2OdTwsY=; b=COcVF1wJghhyofUilmSuGNMwx/v+xxgQzbd4bqMhy18BYbTdhzOx2iYQIsb+AXUH5upnpN I0vJ2Owi6i6xREkF8ASrF0His+tkVlxSFpA5PJAmiCMpByDKvFL99oSpa6mDDQa4Ln9YYA D/WaSj9mtsK9ElxFiterXn1/Udk+0f3F7wv0giGIFzjc0KBR8IoKCvgn7oxNRLdTIyTG/b AhU/Nivi+9msWLCYjXR095rlHb37KKp6BQLm4xqgIRu2cqFbJ0j8SZB+cfbhkOp40hPL6l RHY321FsMmghNijYAsvqsrHRVVVDsQ7hLnblx6Ifih+9fE688G4/yOvHyzETAQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=iki.fi; s=lahtoruutu; t=1764662468; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3KRrBnNTgEDug0ty/Bc+JEbqE7/AVaZdcB6F2OdTwsY=; b=v5tuRxTSWaVYL6Cw2LtHR8EoKwVWNvNWlooIKZlXPI6jiyOEvgZNTed5gMKwDqHT29yB2n v/uQy5gaWCDGfvZUjlcUkli2tiMGBp2d3EuzGtzm69BE1HOzdhiltUY7kfgQ3fVCSj6fea 5Ffdav1jVaGMvQAPieiR9OnGZ2xYEqtKSVmIkpsrNoK22bTJf5+1xaFv1OoAQSCI74dwon gamSLxnuEwJ4+fsQ1O9q6gnlUngR/IIKRSZyMx1zu1zFRAAzPVUvMc9UICT1VLzxb2eRFj H9Qmir+hf8nHiO45DyAJtRToMtMZUgTimm8FQXSvH+Lviyu+kKlz4USdeUc6Jg== ARC-Authentication-Results: i=1; ORIGINATING; auth=pass smtp.auth=hlinnaka smtp.mailfrom=hlinnaka@iki.fi ARC-Seal: i=1; a=rsa-sha256; d=iki.fi; s=lahtoruutu; cv=none; t=1764662468; b=NnUdfHCxY1pwmWLAFYgpbOyXBgn4fMX5tu9qakuTu1QUmGw7LzqvDA0HqNC1eFUz1nN+Xi TVhjZ6IIz1IzY6K0kgKelRDIbMCT+0KEADTKJVuzRopXxAFFoC+Dc24NMut90bj5LFegB1 ai0vAV1fPuF2mVTKoycot1+UqBi89mcZZ11r+EIYG7Y9Vch731Co6WCA1XXy4phAwZvrem t0laqS5uXTBigzElUqtzwNGjG7PIID90gT/fb3wGCSSah9mqjQn0qJv7aOgqwZBxZlakAq UncB4nWqtj/R//aijTXFs3J1uCiGzuOTHRwdXNH0ZujEUlVxe3lrY0WxlifEgg== Message-ID: Date: Tue, 2 Dec 2025 10:01:06 +0200 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: Buffer locking is special (hints, checksums, AIO writes) To: Andres Freund , Melanie Plageman Cc: Matthias van de Meent , pgsql-hackers@postgresql.org, Thomas Munro , Noah Misch , Robert Haas , Michael Paquier References: <6kmid26do57ykqfpvq6iieniy4djsymhrypkjccazq5g4bbe6a@2y6owwv7qpex> <3je3ahgf7rrmmurxo6hnlhg5d3ffwfrtjwjxd6jm5srlv5iebp@vxqk5qtgmowr> <3w7v3w6a57jnssokap4k7thoekig72flnyhd4wp3yftzdd7lm7@f6lpcfen6hr7> <6rgb2nvhyvnszz4ul3wfzlf5rheb2kkwrglthnna7qhe24onwr@vw27225tkyar> <3nce7i72ayzkunai6mkz24ckbxk74jodz4ua2chcdrwppxlxcd@w6x5kfkjrkru> Content-Language: en-US From: Heikki Linnakangas In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On 25/11/2025 22:46, Andres Freund wrote: > On 2025-11-25 15:02:02 -0500, Melanie Plageman wrote: >> On Tue, Nov 25, 2025 at 11:54 AM Andres Freund wrote: >>>> --- a/src/backend/storage/freespace/freespace.c >>>> +++ b/src/backend/storage/freespace/freespace.c >>>> @@ -904,14 +904,22 @@ fsm_vacuum_page(Relation rel, FSMAddress addr, >>>> max_avail = fsm_get_max_avail(page); >>>> /* >>>> - * Reset the next slot pointer. This encourages the use of low-numbered >>>> - * pages, increasing the chances that a later vacuum can truncate the >>>> - * relation. We don't bother with a lock here, nor with marking the page >>>> - * dirty if it wasn't already, since this is just a hint. >>>> + * Try to reset the next slot pointer. This encourages the use of >>>> + * low-numbered pages, increasing the chances that a later vacuum can >>>> + * truncate the relation. We don't bother with a lock here, nor with >>>> + * marking the page dirty if it wasn't already, since this is just a hint. >>>> + * >>>> + * To be allowed to update the page without an exclusive lock, we have to >>>> + * use the hint bit infrastructure. >>>> */ >>>> >>>> What the heck? This didn't even take a share lock before... >>> >>> It is insane. I do not understand how anybody thought this was ok. I think I >>> probably should split this out into a separate commit, since it's so insane. >> >> Aren't you going to backport having it take a lock? Then it can be >> separate (e.g. have it take a share lock in the "fix" commit and then >> this commit bumps it to share-exclusive). > > I wasn't thinking we should backport that. It's been this way for ages, > without having caused known issues.... I wrote that originally :-). The FSM code always treats the FSM page contents as potentially corrupted garbage. FSM page updates are not WAL-logged, for starters. And as the comment says, the next slot pointer is just a hint for where within the page to start looking for free space. Page checksums were added later, and now we know about the problems of modifying a page a write() is in progress, on some filesystems with filesystem-level checksums. So it's a good idea to tighten it up, but it was fine back then. >>>> diff --git a/src/backend/storage/freespace/fsmpage.c >>>> b/src/backend/storage/freesp> >>>> /* >>>> * Update the next-target pointer. Note that we do this even if we're only >>>> * holding a shared lock, on the grounds that it's better to use a shared >>>> * lock and get a garbled next pointer every now and then, than take the >>>> * concurrency hit of an exclusive lock. >>>> >>>> We appear to avoid the garbling now? >>> >>> I don't think so. Two backends concurrently can do fsm_search_avail() and >>> one backend might set a hint to a page that is already used up by the other >>> one. At least I think so? >> >> Maybe I don't know what it meant by garbled, but I thought it was >> talking about two backends each trying to set fp_next_slot. If they >> now have to have a share-exclusive lock and they can't both have a >> share-exclusive lock at the same time, then it seems like that >> wouldn't be a problem. It sounds like you may be talking about a >> backend taking up the freespace of a page that is referred to by the >> fp_next_slot? > > Yes, a version of the latter. The value that fp_next_slot will be set to can > be outdated by the time we actually set it, unless we do all of > fsm_search_avail() under some form of exclusive lock - clearly not something > desirable. I'm pretty sure the "garbled" in the comment means the former, not the latter. I.e. it means that the pointer itself might become garbage. Would be good to update the comment if that's no longer possible. But speaking of that: why do we not allow two processes to concurrently set hint bits on a page anymore? - Heikki