Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1v6Tqp-005Gu3-GZ for pgsql-hackers@arkaria.postgresql.org; Wed, 08 Oct 2025 13:05:03 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1v6Tqn-00ABFV-BS for pgsql-hackers@arkaria.postgresql.org; Wed, 08 Oct 2025 13:05:02 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1v6Tqn-00ABFN-1q for pgsql-hackers@lists.postgresql.org; Wed, 08 Oct 2025 13:05:01 +0000 Received: from mail-ej1-x630.google.com ([2a00:1450:4864:20::630]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1v6Tql-0012Q6-1y for pgsql-hackers@lists.postgresql.org; Wed, 08 Oct 2025 13:05:01 +0000 Received: by mail-ej1-x630.google.com with SMTP id a640c23a62f3a-b4f323cf89bso418902266b.2 for ; Wed, 08 Oct 2025 06:04:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1759928698; x=1760533498; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=HCd47A9QM8wMBYohr6Fs6HgshUTvLqNHw1Zd3CTC8UY=; b=UkyNnYL0ntc1MYj1MCw7ZuDBW9ph74wen14RJtzc9K68vKVeATOR8XA9DwZlB+CflB i+9dJtjltTRS+ExbYv1pIe2mt25+csoBRLYXDl/ldFoJwYgGDnnUGVXeEqsxLjexN9B0 ZheF62RSpnyqM571HazrnKEcKn2l+aZ2vMyfUezcXY5t0Uxt6UhTehFnbdkJ6ZA5P3dB E1euSDtilu+FB2XwetsXI5ZJohWSsX20TgfNu1tGooYb7SFa89iYnWxZBq5IFPmDHOKY uW8vBVj/79kG/85Sg1Pbb8D0RWxgItu4FwoyeCwjtDSCEeGQGQUudOn3IVOkmzo9qWSu oM4A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1759928698; x=1760533498; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=HCd47A9QM8wMBYohr6Fs6HgshUTvLqNHw1Zd3CTC8UY=; b=IwGEE2fVz2S2BNj7u8Bnhnj7wJ6nYor/TbJB8rKpHksIUaFu9pelFVAqjNWyhLX3+t TVavQ8HRNyPLCjqaaANDNtRwywI/nCDcEMphhD1xb/3ClVvLb3BoXocsy7345bNuaS88 NmdLDffCZReUb5H3qu3wvFwsVA4HJGoDShGf6CQSRRN1zezaRdXwlZn6CNpnop4fSURZ kxwmXImjef+4Qudacw9iRzd2QLKe+EZth1O1sYKUfQx1Lrf8BsWflVBkX8JECDo3DBhw nSH2YzwmXNwfBC4tV+KnabVwRVBMVuRXaCmxjHhmsw/2aYjq3r9bW8I/a8WSqgtKXQJt IYyQ== X-Forwarded-Encrypted: i=1; AJvYcCUlUAxlS4hZkVMyiUnA0Sq8XtNe09/kZO1bOx1kGc/GDNwqPlYPstdSziyGDtOUzWJoq1YhBGilfNmUbXVB@lists.postgresql.org X-Gm-Message-State: AOJu0YwErGGosnAshhvQWOHxk2m/s8GtMmZbRGRC0J03mdEAt4TUP1es 8bgwqxJtIHt8j5gD3AcpuTCWAoPMs4dg5osNzvyJH8v9kpxIXyx6iJVpnm0lOGPdzX8gBUNPn3H xNP9SoK0wdpPGRHE5udaDEXwr1egB7YA= X-Gm-Gg: ASbGncvijvPdBICOLHV1EWuxEWQLIoPCygKwx21lGNP2pmyf5qCq9M8XiYl4kHu/lJi gSaJJw6D7tTqbGCGsRoDaRKADSm2XJKUj6gV6rTrdm4svT5KX/JxBO9IGBp71lk3HBuz9plyeKt 6d8f36LpgKF/Mh0suXSzaR3YBTjG18mt2DVxWnvPac85enzFGNxW4R8VQkZ8ng2wX2tQ/7Z69h8 Km37quKx/BWKAc3pQlLRrWj9a3QRYxdlg== X-Google-Smtp-Source: AGHT+IEcgyN0uKzNpzct4Hls9QBfcUiSt+7aof5UoabfhlveNOBCOrbM9rswX5+CFP1OzSDUO9oAxZShKAHAcAgFqaI= X-Received: by 2002:a17:907:c2a:b0:b3d:e9be:7ac2 with SMTP id a640c23a62f3a-b50a9d6f5f3mr378064766b.5.1759928698267; Wed, 08 Oct 2025 06:04:58 -0700 (PDT) MIME-Version: 1.0 References: <013D63E2-5D75-492E-85FF-1D5CC0148C82@gmail.com> <45b3a1dd6be28dd409850a5503c029a97ca0754e.camel@cybertec.at> <483AD9AA-39C1-4F4B-B0D6-4CBA0014E877@gmail.com> <083a4d75ab127a603743058bce26e79ba793494e.camel@cybertec.at> In-Reply-To: From: Robert Haas Date: Wed, 8 Oct 2025 09:04:46 -0400 X-Gm-Features: AS18NWB38fzUR0wbaYmrWvXUbzh80495Cc4ikJrD8inxZ-P05gJCf-x-jMUtTuo Message-ID: Subject: Re: The ability of postgres to determine loss of files of the main fork To: Frits Hoogland Cc: Andres Freund , Laurenz Albe , Aleksander Alekseev , pgsql-hackers@lists.postgresql.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Wed, Oct 1, 2025 at 11:25=E2=80=AFAM Frits Hoogland wrote: > What would be a achievable way of making postgres under the relation size= ? > How about a field in pg_class that keeps the final data page, so that the= catalog > keeps the size, which then allows utilities and the database itself to un= derstand how > many segments should exist? I think that would definitely be impractical. Your idea of having an option for amcheck that is the reverse of heapallindexed (indexallheaped?) seems perfectly reasonable as a debugging tool and probably not that hard to implement, but actually noticing organically would be tricky, both in terms of code complexity and also in terms of performance. Updating pg_class every time we extend any relation in the system by a block is definitely going to be painfully slow -- and there's also the problem that you can't very well track the length of pg_class itself by updating pg_class, because you might not be able to update pg_class without extending it. What seems more practical is to store metadata in a metapage within each relation or in some separate storage. However, even that is far from problem-free. Even in the best case where there are no other problems, you're talking about emitting WAL records upon relation extension, which I suspect would cause a quite noticeable impact if you did it for every block. An idea that I had was to keep track of the number of segments rather than the entire length of the relation. That's not as good, because then you can't detect truncation of the last file, but it would be good enough to detect the disappearance of entire files, and it would mean that the metadata only needs to be updated once per GB of the relation rather than every time you extend. But even this has a lot of engineering challenges. To really be able to do the cross-checks in a meaningful way, you'd want the md* functions to have access to the information -- and I'm having some difficulty imagining how we would arrange for that. For instance, if mdread() is asked for a block and first needs to know whether that block (or the containing segment) should exist, it's not going to have access to the relcache to check some cached data. We could possibly cache something in the SMgrRelation, but if the cache is not populated, then we'd have to read the data from the original source. But surely we can't have mdread() calling ReadBuffer(); that would be a huge layering violation and would likely cause some very unpleasant problems. I expect there is some way to rejigger things so that the md.c layer has to be told by the higher layers how many segments can exist and then to figure out a way to bootstrap things, but it's probably all quite complicated so I am definitely not volunteering to be the one to do the work... --=20 Robert Haas EDB: http://www.enterprisedb.com