From: Thomas Munro <[email protected]>
To: Tom Lane <[email protected]>
Cc: Andres Freund <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Subject: Re: subscriptionCheck failures on nightjar
Date: Thu, 14 Feb 2019 09:52:33 +1300
Message-ID: <CAEepm=0wB7vgztC5sg2nmJ-H3bnrBT5GQfhUzP+Ffq-WT3g8VA@mail.gmail.com>
In-Reply-To: <[email protected]>
References: <[email protected]>
	<CAEepm=1pbie9C_PtojGum7qXAAU1hB8JtA6v_9dQFPgay3PcZg@mail.gmail.com>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>

On Thu, Feb 14, 2019 at 8:11 AM Tom Lane <[email protected]> wrote:
> Andres Freund <[email protected]> writes:
> > I was kinda pondering just open coding it.  I am not yet convinced that
> > my idea of just using an open FD isn't the least bad approach for the
> > issue at hand.  What precisely is the NFS issue you're concerned about?
>
> I'm not sure that fsync-on-FD after the rename will work, considering that
> the issue here is that somebody might've unlinked the file altogether
> before we get to doing the fsync.  I don't have a hard time believing that
> that might result in a failure report on NFS or similar.  Yeah, it's
> hypothetical, but the argument that we need a repeat fsync at all seems
> equally hypothetical.
>
> > Right now fsync_fname_ext isn't exposed outside fd.c...
>
> Mmm.  That makes it easier to consider changing its API.

Just to make sure I understand: it's OK for the file not to be there
when we try to fsync it by name, because a concurrent checkpoint can
remove it, having determined that we don't need it anymore?  In other
words, we really needed either missing_ok=true semantics, or to use
the fd we already had instead of the name?
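To make the two options concrete, here is a minimal sketch in plain POSIX C. It is not the fd.c code: fsync_by_name() and rename_then_fsync_fd() are hypothetical names used only for illustration, and the error handling is reduced to return codes rather than ereport(). The first function shows missing_ok-style semantics when syncing by name; the second shows syncing through the descriptor we already hold, so that a concurrent unlink of the name cannot produce ENOENT (whether that fsync could fail some other way on NFS is the open question above).

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: fsync a file identified by name.
 * If the file has vanished and missing_ok is true, treat that as success.
 * Returns 0 on success, -1 on failure with errno set.
 */
int
fsync_by_name(const char *path, bool missing_ok)
{
    int fd = open(path, O_RDWR);

    if (fd < 0)
    {
        /* Someone (e.g. a concurrent checkpoint) may have removed it. */
        if (errno == ENOENT && missing_ok)
            return 0;
        return -1;
    }

    int rc = fsync(fd);
    int saved_errno = errno;    /* close() may clobber errno */

    close(fd);
    errno = saved_errno;
    return rc;
}

/*
 * Hypothetical helper: rename a temporary file into place, then fsync it
 * through the descriptor the caller already holds. No second lookup by
 * name happens, so removal of the new name cannot cause ENOENT here.
 * Returns 0 on success, -1 on failure with errno set.
 */
int
rename_then_fsync_fd(int fd, const char *tmppath, const char *path)
{
    if (rename(tmppath, path) != 0)
        return -1;
    return fsync(fd);
}
```

With the first variant the caller has to decide per call site whether a missing file is acceptable; with the second the question never arises for the file itself, at the cost of keeping the descriptor open across the rename.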

I found three examples of this failing with an ERROR (without turning
the buildfarm red, so nobody noticed) before the PANIC patch went in:

https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=nightjar&dt=2018-09-10%2020%3A...
2018-09-10 17:20:09.247 EDT [23287] sub1 ERROR:  could not open file
"pg_logical/snapshots/0-161D778.snap": No such file or directory
2018-09-10 17:20:09.247 EDT [23285] ERROR:  could not receive data
from WAL stream: ERROR:  could not open file
"pg_logical/snapshots/0-161D778.snap": No such file or directory

https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=nightjar&dt=2018-08-31%2023%3A...
2018-08-31 19:52:06.634 EDT [52724] sub1 ERROR:  could not open file
"pg_logical/snapshots/0-161D718.snap": No such file or directory
2018-08-31 19:52:06.634 EDT [52721] ERROR:  could not receive data
from WAL stream: ERROR:  could not open file
"pg_logical/snapshots/0-161D718.snap": No such file or directory

https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=nightjar&dt=2018-08-22%2021%3A...
2018-08-22 18:10:29.422 EDT [44208] sub1 ERROR:  could not open file
"pg_logical/snapshots/0-161D718.snap": No such file or directory
2018-08-22 18:10:29.422 EDT [44206] ERROR:  could not receive data
from WAL stream: ERROR:  could not open file
"pg_logical/snapshots/0-161D718.snap": No such file or directory

-- 
Thomas Munro
http://www.enterprisedb.com
