Re: The ability of postgres to determine loss of files of the main fork

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Frits Hoogland <[email protected]>
To: Jakub Wartak <[email protected]>
Cc: Michael Banck <[email protected]>
Cc: Tom Lane <[email protected]>
Cc: Aleksander Alekseev <[email protected]>
Cc: [email protected]
Subject: Re: The ability of postgres to determine loss of files of the main fork
Date: Wed, 1 Oct 2025 14:15:06 +0200
Message-ID: <[email protected]> (raw)
In-Reply-To: <CAKZiRmwHKY=KTjBEL3S2cVQpo1OjHyky4BdgJm4Hkv1-ig9PfQ@mail.gmail.com>
References: <[email protected]>
	<CAJ7c6TMYvYBQpgv8k+4QnajqPcroOzrj7d3snBBTKX4tHQfVYw@mail.gmail.com>
	<[email protected]>
	<[email protected]>
	<CAKZiRmwHKY=KTjBEL3S2cVQpo1OjHyky4BdgJm4Hkv1-ig9PfQ@mail.gmail.com>

Thank you for looking into this Jakub, Aleksander, Michael, Tom,

There are more scenario's where files might be missing: how about a backup somehow missing files?
But also deliberate sabotage, you can hide data in this way, and it would be hard to detect, and even
harder to understand what happened, why it happened and how it happened. I dare to say quite ideal
if you want a name to be removed from a database a covert way, and know something about the
database and have access, or somehow can manipulate in the data directory.

Without having the exact sourcecode files and functions handy, the way I think using the segments
works in postgres, is that the database, because of the absence of any indicator of size, will scan 
the relfilenode, and when it reaches the set limit (1GB), it will try to open the next segment. If that 
segment doesn't exist, the database assumes it's the end of the segment. That is all. That is why there
is no error: it's exactly alike when there truly is no more data, and there is no metadata to understand
there is supposed to be more data.

And therefore I talked about the primary key: that contains "indirect" data to be able to detect if a
table has a missing segment. But I think really what is needed is a max page number to indicate
the known size.



Frits Hoogland




> On 1 Oct 2025, at 13:20, Jakub Wartak <[email protected]> wrote:
> 
> On Wed, Oct 1, 2025 at 9:02 AM Michael Banck <[email protected]> wrote:
>> 
>> Hi,
>> 
>> wow, this is one of the most terrifying threads I've ever seen...
> 
> Same.
> 
>> On Tue, Sep 30, 2025 at 12:41:29PM -0400, Tom Lane wrote:
>>> Aleksander Alekseev <[email protected]> writes:
>>>>> Therefore, I would like to request an enhancement: add an option to
>>>>> verify_heapam() that causes the primary key index to be scanned and makes
>>>>> sure that all line pointers in the index point to existing tuples.
>>> 
>>>> ... IMO there is little value in adding a check for the existence of
>>>> the segments for a single table. And the *real* check will not differ
>>>> much from something like SELECT * FROM my_table, or from making a
>>>> complete backup of the database.
>>> 
>>> As Frits mentioned, neither of those actions will really notice if a
>>> table has been truncated via loss of a segment.
>> 
>> Is there a valid case for a missing segment? If not, couldn't this be
>> caught somewhere in the storage manager?
>> 
> 
> I've took a look on PG17 and in _mfd_openseg() there's if fd < 0
> return NULL after open(), but out of it's callers only _mdfd_getseg()
> seems to be alerting on that NULL. To me this seems like a bug,
> because i've seen way too many times people and software deleting
> files randomly. Even simple crashes (with e2fsck, xfs_repair) could
> put orphaned inodes into /lost+found. IMHO all files should be opened
> at least on startup to check integrity, because the non-zero return
> code (during such SELECT) for openat(2) seems o be coming out of
> RelationGetNumberOfBlocksInFork()->table_block_relation_size()->smgrnblocks()->mdnblocks()->_mdfd_openseg().
> Now if the 1st seg file would be missing we would complain in
> mdopenfork(). mdnblocks() says even "all active segments of the
> relation are opened...", but even that apparently is not true.
> 
> The bigger context seems to be be that 049469e7e7cfe0c69 (2015) could
> be culprit here as well, as it is stated there that mdnblocks() could
> earlier create zero-length files back in day and it removed that
> ereport(ERROR) when unable to access that file.
> 
> Another idea (than this being a bug) is that Thomas had a large
> relation patchset back in [1], but I wouldn't be a fan of us operating
> on 31-32TB files ;)
> 
> -J.
> 
> [1] - https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BBGXwMbrvzXAjL8VMGf25y_ga_XnO741g10y0%3Dm6dDiA...

view thread (21+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: The ability of postgres to determine loss of files of the main fork
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox