Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uviGX-000ESu-Bb for pgsql-hackers@arkaria.postgresql.org; Mon, 08 Sep 2025 20:15:06 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1uviGW-007WYR-HK for pgsql-hackers@arkaria.postgresql.org; Mon, 08 Sep 2025 20:15:04 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uviGW-007WYJ-7u for pgsql-hackers@lists.postgresql.org; Mon, 08 Sep 2025 20:15:04 +0000 Received: from mail-ed1-x52a.google.com ([2a00:1450:4864:20::52a]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1uviGS-001GJ2-2X for pgsql-hackers@lists.postgresql.org; Mon, 08 Sep 2025 20:15:04 +0000 Received: by mail-ed1-x52a.google.com with SMTP id 4fb4d7f45d1cf-620724883e6so8010204a12.2 for ; Mon, 08 Sep 2025 13:15:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1757362500; x=1757967300; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=WMxFvtjakTTv2R0FCuycLkd1AOFMdPqViMs6BhNRkMM=; b=IvxWf9S3giNJtK1UKalIcmUra7otN9Sq48oBnXiHL8kCRNARvYbNytVQ23R/kX028Q pQkXVPCNyjOb5ZAF0Fpw7ADO6bQ3/sdut6LPIltW6imiTYbYzlqIWw4bE829ALnmCxkx TIrSj7Ll8L0UNzId2QP2TsRHpv5t7Auv49rpbwu3/3adiX/wZAvkPlAJI4F9LMJxqoxq ZaWqIktWMjF8oLkUtvVLCnrmydL7kG+uuK51xxF+4NRQwy4PIQK5sOsZVx+RBUXeET3m zbH65QyvaDoG0YQ+s0hzOcqqwOvWtUoX/elffkjMnpq0ahaQ5mxUhpMMJGL70cSz4SkY 5HRw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1757362500; x=1757967300; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=WMxFvtjakTTv2R0FCuycLkd1AOFMdPqViMs6BhNRkMM=; b=c82Vw5MG97I0PQ9RDj+apolDQ/W/fOebbikLD9hJ3npHZ0o6vUXYZsOsSPZkbfVbCC v/BZpXivmSFoqyw2tBGPTmGmiJGlbkFNEwKO+7vkpr89nSmZqXe+6dejGkos2JO/cpfh HBxAqOCv992dwAP6+GEBJcwnUCkwcB7GlrAEoNYDvniaJqg1F/qIztn53qSlxQRPFtVW oaJWGLIScJOG3d/e70+6qLAFJjTLM9sEcZ5+4f6s4ELQsMvQO60wILZzM0AnuvLiX4y5 TdPkFEsgFpp56I3tcWh3tpG8g+eZl42NYlEcw0fbJVmRVJKtlkFkYZwx3X2FWNaLrXpU yQ9Q== X-Forwarded-Encrypted: i=1; AJvYcCXC2g/0dTQzqq46W/jxEJx6VbbOeGg/4E0rX+69SMAOxqYiedLgdiCx7OW+s3eutg5SgLoft1hiNXO4WPwU@lists.postgresql.org X-Gm-Message-State: AOJu0YwL9NhJHO03SSqZg+Mux0YO15LfPHXQG6wlbTnTolqOhchDek6p uI9Mm7xtlraKhBJWkuE/Y8NpHfrEvI14E8JClBM8OfONxJeKP7XBfR7DV6i6JelxEvRBhdFARf4 FwlpjI0qR+SydO+2sj6eU2zmnIvSxHhI= X-Gm-Gg: ASbGncts78kjD6FPdLtw91W1uKCuKLhMqJUreD0F+PHB5Hh/b76lioldZuVZcycpip3 B3qDf7Bvwe9QNGkRZpHadgkBYairP5/YLm/TAXpeI45QSgLva1mWZBUsiThW6L2ThYU8DB9Mm07 AlwpiffcGWqV9YPngIOZCWGTd3NyMitdyWUiCBfZBpZ+Mm9teWkhwCqxwfUtcfAgQerL5tdqfCm KdaZvsKahYCJ3Zj0+4= X-Google-Smtp-Source: AGHT+IHlvdYvuVsT/UyzcBP8HgPEm6/CHSNdTrfz/AyD9koXQ7EgVO+3KHB5PAnu/xslS94gCXe3kyEyKxTlJgb/akY= X-Received: by 2002:a17:906:1354:b0:b04:b900:451d with SMTP id a640c23a62f3a-b04b90061d8mr620109866b.26.1757362499878; Mon, 08 Sep 2025 13:14:59 -0700 (PDT) MIME-Version: 1.0 References: <87DD95AA-274F-4F4F-BAD9-7738E5B1F905@yandex-team.ru> In-Reply-To: From: Robert Haas Date: Mon, 8 Sep 2025 16:14:47 -0400 X-Gm-Features: Ac12FXxzV-K9T6G2nU2uWjwoAXa9BqPLDcnrrXttNOksvogiZrWijiY0L3VCgKA Message-ID: Subject: Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) To: Melanie Plageman Cc: Andres Freund , Kirill Reshke , Andrey Borodin , PostgreSQL Hackers , Heikki Linnakangas Content-Type: text/plain; charset="UTF-8" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Reviewing 0003: + /* + * If we're only adding already frozen rows to a previously empty + * page, mark it as all-frozen and update the visibility map. We're + * already holding a pin on the vmbuffer. + */ else if (all_frozen_set) + { PageSetAllVisible(page); + LockBuffer(vmbuffer, BUFFER_LOCK_EXCLUSIVE); + visibilitymap_set_vmbits(relation, + BufferGetBlockNumber(buffer), + vmbuffer, + VISIBILITYMAP_ALL_VISIBLE | + VISIBILITYMAP_ALL_FROZEN); Locking a buffer in a critical section violates the order of operations proposed in the 'Write-Ahead Log Coding' section of src/backend/access/transam/README. + * Now read and update the VM block. Even if we skipped updating the heap + * page due to the file being dropped or truncated later in recovery, it's + * still safe to update the visibility map. Any WAL record that clears + * the visibility map bit does so before checking the page LSN, so any + * bits that need to be cleared will still be cleared. + * + * It is only okay to set the VM bits without holding the heap page lock + * because we can expect no other writers of this page. The first paragraph of this paraphrases a similar content in xlog_heap_visible(), but I don't see the variation in phrasing as an improvement. The second paragraph does not convince me at all. I see no reason to believe that this is safe, or that it is a good idea. The code in xlog_heap_visible() thinks its OK to unlock and relock the page to make visibilitymap_set() happy, which is cringy but probably safe for lack of concurrent writers, but skipping locking altogether seems deeply unwise. - * visibilitymap_set - set a bit in a previously pinned page + * visibilitymap_set - set bit(s) in a previously pinned page and log + * visibilitymap_set_vmbits - set bit(s) in a pinned page I suspect the indentation was done with a different mix of spaces and tabs here, because this doesn't align for me. In general, this idea makes some sense to me -- there doesn't seem to be any particularly good reason why the visibility-map update should be handled by a different WAL record than the all-visible flag on the page itself. It's a little hard for me to make that statement too conclusively without studying more of the patches than I've had time to do today, but off the top of my head it seems to make sense. However, I'm not sure you've taken enough care with the details here. -- Robert Haas EDB: http://www.enterprisedb.com