public inbox for [email protected]  
help / color / mirror / Atom feed
From: Andres Freund <[email protected]>
To: Tom Lane <[email protected]>
Cc: Thomas Munro <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Subject: Re: subscriptionCheck failures on nightjar
Date: Wed, 13 Feb 2019 09:11:01 -0800
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<CAEepm=1pbie9C_PtojGum7qXAAU1hB8JtA6v_9dQFPgay3PcZg@mail.gmail.com>
	<[email protected]>

Hi,

On 2019-02-13 11:57:32 -0500, Tom Lane wrote:
> I've managed to reproduce this locally, and obtained this PANIC:

Cool. How exactly?

Nice catch.

> Anyway, I think we might be able to fix this along the lines of
> 
>     CloseTransientFile(fd);
> 
> +   /* ensure snapshot file is down to stable storage */
> +   fsync_fname(tmppath, false);
>     fsync_fname("pg_logical/snapshots", true);
> 
>     /*
>      * We may overwrite the work from some other backend, but that's ok, our
>      * snapshot is valid as well, we'll just have done some superfluous work.
>      */
>     if (rename(tmppath, path) != 0)
>     {
>         ereport(ERROR,
>                 (errcode_for_file_access(),
>                  errmsg("could not rename file \"%s\" to \"%s\": %m",
>                         tmppath, path)));
>     }
> 
> -   /* make sure we persist */
> -   fsync_fname(path, false);
> +   /* ensure we persist the file rename */
>     fsync_fname("pg_logical/snapshots", true);

Hm, but that's not the same? On some filesystems one needs the directory
fsync, on some the file fsync, and I think both in some cases. ISTM we
should just open the file before the rename, and then use fsync() on the
filehandle rather than the filename.  Then a concurrent rename ought not
to hurt us?


> The existing code here seems simply wacky/unsafe to me regardless
> of this race condition: couldn't it potentially result in a corrupt
> snapshot file appearing with a valid name, if the system crashes
> after persisting the rename but before it's pushed the data out?

What do you mean precisely with "before it's pushed the data out"?


> I also wonder why bother with the directory sync just before the
> rename.

Because on some FS/OS combinations the size of the renamed-into-place
file isn't guaranteed to be durable unless the directory was
fsynced. Otherwise there's the possibility to have incomplete data if we
crash after renaming, but before fsyncing.

Greetings,

Andres Freund




view thread (44+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected]
  Subject: Re: subscriptionCheck failures on nightjar
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox