From: Robert Haas
Date: Tue, 9 Sep 2025 10:00:04 -0400
Subject: Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
To: Melanie Plageman
Cc: Andres Freund, Kirill Reshke, Andrey Borodin, PostgreSQL Hackers, Heikki Linnakangas

On Mon, Sep 8, 2025 at 6:29 PM Melanie Plageman wrote:
> But, I think you're right that maintaining the order of operations
> proposed in transam/README is more important. As such, in attached
> v11, I've modified this patch and the other patches where I replace
> visibilitymap_set() with visibilitymap_set_vmbits() to exclusively
> lock the vmbuffer before the critical section.
> visibilitymap_set_vmbits() asserts that we have the vmbuffer
> exclusively locked, so we should be good.

That sounds good. I think it is OK to keep some of the odd things that
we're currently doing if they're hard to eliminate, but if they're not
really needed then I'd rather see us standardize the code. I feel (and
I think you may agree, based on other conversations that we've had)
that the visibility map code is somewhat oddly structured, and I'd
like to see us push the amount of oddness down rather than up, if we
can reasonably do so without breaking everything.

> The only difference is I replaced the phrase "LSN interlock" with
> "being dropped or truncated later in recovery" -- which is more
> specific and, I thought, more clear. Without this comment, it took me
> some time to understand the scenarios that might lead us to skip
> updating the heap block. heap_xlog_visible() has cause to describe
> this situation in an earlier comment -- which is why I think the LSN
> interlock comment is less confusing there.
>
> Anyway, I'm open to changing the comment. I could:
> 1) copy-paste the same comment as heap_xlog_visible()
> 2) refer to the comment in heap_xlog_visible() (comment seemed a bit
> short for that)
> 3) diverge the comments further by improving the new comment in
> heap_xlog_multi_insert() in some way
> 4) something else?

IMHO, copying and pasting comments is not great, and comments with
identical intent and divergent wording are also not great.
The former is not great because having a whole bunch of copies of the
same comment, especially if it's a block comment rather than a
1-liner, uses up a bunch of space and creates a maintenance hazard in
the sense that future updates might not get propagated to all copies.
The latter is not great because it makes it hard to grep for other
instances that should be adjusted when you adjust one, and also
because if one version really is better than the other then ideally
we'd like to have the good version everywhere. Of course, there's some
tension between these two goals.

In this particular case, thinking a little harder about your proposed
change, it seems to me that "LSN interlock" is more clear about what
the immediate test is that would cause us to skip updating the heap
page, and "being dropped or truncated later in recovery" is more clear
about what the larger state of the world that would lead to that
situation is. But whatever preference anyone might have about which
way to go with that choice, it is hard to see why the preference
should go one way in one case and the other way in another case.
Therefore, I favor an approach that leads either to an identical
comment in both places, or to one comment referring to the other.

> > The second paragraph does not convince me at all. I see no reason to
> > believe that this is safe, or that it is a good idea. The code in
> > heap_xlog_visible() thinks it's OK to unlock and relock the page to
> > make visibilitymap_set() happy, which is cringy but probably safe for
> > lack of concurrent writers, but skipping locking altogether seems
> > deeply unwise.
>
> Actually in master, heap_xlog_visible() has no lock on the heap page
> when it calls visibilitymap_set(). It releases that lock before
> recording the freespace in the FSM and doesn't take it again.
>
> It does unlock and relock the VM page -- because visibilitymap_set()
> expects to take the lock on the VM.
>
> I agree that not holding the heap lock while updating the VM is
> unsatisfying. We can't hold it while doing the IO to read in the VM
> block in XLogReadBufferForRedoExtended(). So, we could take it again
> before calling visibilitymap_set(). But we don't always have the heap
> buffer, though. I suspect this is partially why heap_xlog_visible()
> unconditionally passes InvalidBuffer to visibilitymap_set() as the
> heap buffer and has special case handling for recovery when we don't
> have the heap buffer.

You know, I wasn't thinking carefully enough about the distinction
between the heap page and the visibility map page here. I thought you
were saying that you were modifying a page without a lock on that
page, but you aren't: you're saying you're modifying a page without a
lock on another page to which it is related. The former seems
disastrous, but the latter might be OK. However, I'm sort of confused
about what the comment is trying to say to justify that:

+ * It is only okay to set the VM bits without holding the heap page lock
+ * because we can expect no other writers of this page.

It is not exactly clear to me whether "this page" here refers to the
heap page or the VM page. If it means the heap page, why should that
be so if we haven't got any kind of lock? If it means the VM page,
then why is the heap page even relevant?

-- 
Robert Haas
EDB: http://www.enterprisedb.com