From: Thomas Munro
Date: Thu, 14 Feb 2019 09:52:33 +1300
Subject: Re: subscriptionCheck failures on nightjar
To: Tom Lane
Cc: Andres Freund, PostgreSQL Hackers

On Thu, Feb 14, 2019 at 8:11 AM Tom Lane wrote:
> Andres Freund writes:
> > I was kinda pondering just open coding it.  I am not yet convinced that
> > my idea of just using an open FD isn't the least bad approach for the
> > issue at hand.  What precisely is the NFS issue you're concerned about?
>
> I'm not sure that fsync-on-FD after the rename will work, considering that
> the issue here is that somebody might've unlinked the file altogether
> before we get to doing the fsync.
> I don't have a hard time believing that
> that might result in a failure report on NFS or similar.  Yeah, it's
> hypothetical, but the argument that we need a repeat fsync at all seems
> equally hypothetical.
>
> > Right now fsync_fname_ext isn't exposed outside fd.c...
>
> Mmm.  That makes it easier to consider changing its API.

Just to make sure I understand: it's OK for the file not to be there
when we try to fsync it by name, because a concurrent checkpoint can
remove it, having determined that we don't need it anymore?  In other
words, we really needed either missing_ok=true semantics, or to use
the fd we already had instead of the name?

I found 3 examples of this failing with an ERROR (though not turning
the BF red, so nobody noticed) before the PANIC patch went in:

https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=nightjar&dt=2018-09-10%2020%3A54%3A21&stg=subscription-check

2018-09-10 17:20:09.247 EDT [23287] sub1 ERROR: could not open file "pg_logical/snapshots/0-161D778.snap": No such file or directory
2018-09-10 17:20:09.247 EDT [23285] ERROR: could not receive data from WAL stream: ERROR: could not open file "pg_logical/snapshots/0-161D778.snap": No such file or directory

https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=nightjar&dt=2018-08-31%2023%3A25%3A41&stg=subscription-check

2018-08-31 19:52:06.634 EDT [52724] sub1 ERROR: could not open file "pg_logical/snapshots/0-161D718.snap": No such file or directory
2018-08-31 19:52:06.634 EDT [52721] ERROR: could not receive data from WAL stream: ERROR: could not open file "pg_logical/snapshots/0-161D718.snap": No such file or directory

https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=nightjar&dt=2018-08-22%2021%3A49%3A18&stg=subscription-check

2018-08-22 18:10:29.422 EDT [44208] sub1 ERROR: could not open file "pg_logical/snapshots/0-161D718.snap": No such file or directory
2018-08-22 18:10:29.422 EDT [44206] ERROR: could not receive data from WAL stream: ERROR: could not open file "pg_logical/snapshots/0-161D718.snap": No such file or directory

-- 
Thomas Munro
http://www.enterprisedb.com
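
PS: To make the first of the two options above concrete, here is a
minimal standalone sketch of "missing_ok" semantics for fsync-by-name.
It is not the real fsync_fname_ext() from fd.c; fsync_by_name() and its
signature are hypothetical names invented for illustration, and it uses
only plain POSIX calls.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Hypothetical helper (NOT PostgreSQL's fsync_fname_ext): fsync a regular
 * file by name.  Returns 0 on success, and also when the file has already
 * been removed and missing_ok is true.  Otherwise returns -1 with errno set.
 */
static int
fsync_by_name(const char *path, bool missing_ok)
{
    int fd = open(path, O_RDWR);

    if (fd < 0)
    {
        /* A concurrent unlink shows up here as ENOENT. */
        if (errno == ENOENT && missing_ok)
            return 0;
        return -1;
    }

    if (fsync(fd) != 0)
    {
        int save_errno = errno;     /* close() may clobber errno */

        close(fd);
        errno = save_errno;
        return -1;
    }

    return close(fd);
}
```

The other option, calling fsync() on a descriptor that was opened before
the file could be unlinked, avoids the name lookup entirely.  On local
POSIX filesystems an unlinked file stays alive while a descriptor is
open, so that fsync() is expected to succeed; whether NFS or similar
behaves the same way is exactly the open question upthread.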