From: Tom Lane
To: Andres Freund
cc: Thomas Munro, PostgreSQL Hackers
Subject: Re: subscriptionCheck failures on nightjar
In-reply-to: <20190213215147.cjbymfojf6xndr4t@alap3.anarazel.de>
Date: Wed, 26 Jun 2019 21:10:49 -0400
Message-ID: <21917.1561597849@sss.pgh.pa.us>

Andres Freund writes:
> On 2019-02-14 09:52:33 +1300, Thomas Munro wrote:
>> Just to make sure I understand: it's OK for the file not to be there
>> when we try to fsync it by name, because a concurrent checkpoint can
>> remove it, having determined that we don't need it anymore? In other
>> words, we really needed either missing_ok=true semantics, or to use
>> the fd we already had instead of the name?

> I'm not yet sure that that's actually something that's supposed to
> happen, I got to spend some time analysing how this actually
> happens. Normally the contents of the slot should actually prevent it
> from being removed (as they're newer than
> ReplicationSlotsComputeLogicalRestartLSN()). I kind of wonder if that's
> a bug in the drop logic in newer releases.

My animal dromedary just reproduced this failure, which we've previously
only seen on nightjar.

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dromedary&dt=2019-06-26%2023%3A57%3A45

			regards, tom lane