Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.89) (envelope-from ) id 1iBMOg-0001YZ-EI for pgsql-hackers@arkaria.postgresql.org; Fri, 20 Sep 2019 17:08:43 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.89) (envelope-from ) id 1iBMOe-0007wj-IZ for pgsql-hackers@arkaria.postgresql.org; Fri, 20 Sep 2019 17:08:40 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.89) (envelope-from ) id 1iBMOc-0007wO-At for pgsql-hackers@lists.postgresql.org; Fri, 20 Sep 2019 17:08:40 +0000 Received: from out2-smtp.messagingengine.com ([66.111.4.26]) by makus.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1iBMOZ-00049l-86 for pgsql-hackers@lists.postgresql.org; Fri, 20 Sep 2019 17:08:37 +0000 Received: from compute5.internal (compute5.nyi.internal [10.202.2.45]) by mailout.nyi.internal (Postfix) with ESMTP id D505F215B2; Fri, 20 Sep 2019 13:08:33 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute5.internal (MEProxy); Fri, 20 Sep 2019 13:08:33 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=anarazel.de; h= date:from:to:cc:subject:message-id:references:mime-version :content-type:in-reply-to; s=fm2; bh=M28aIuhb7Pko9Y+0zV/M7HNyLeW j+SA5Oq8NUM+JoG4=; b=qGblii2uZS/zFdG8bIEjjIBVKps3WjRQ95fFiHHBUes iD1ZgXJQF9/QjxN/UnLGwhQRXXPRxKtnZXxepBF4Jidezdyz1gU8Ovn5uuKqocvG zKZNFLrzXYRO3fQVRxFg7mRyA8Ie3Pe08dygIK15m5cDiq+Ev7vsq+SwT+COGMHx sDYcDpiAJdzoEuS9Q3umb1woAOXL9HWrL1SNCtHdUlmo4qNy0ZN9eV6ZFCSni6kg KHERNVJspmuVpnuPuPyHbkCyWPeYUavIJUIYos3Q0D2tF6/hV5fKVycRJFD7ry9J 0+1nRSieiQ60GOdaA7XUhVV/KLchVGvgZHShhATBYbA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to:x-me-proxy :x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s=fm3; bh=M28aIu hb7Pko9Y+0zV/M7HNyLeWj+SA5Oq8NUM+JoG4=; b=Uw+pJomxnSHI3yYy6JZSn3 5fboCCcOYuv/e3EAITozxt5z1oKT1/LxFdjM4eFCkBrst38sxXkxbPMQ20YW5aPs A1OTMT21rbcZhlJJkOr3tja9arR1YfAW52TYfSFqN6hvLbz/EJgd1hQ1G5miHxuX nvRPL8/giquo+CPmSCj0xUz2CegretPfED4qeW7RkEzcjf1/cVtMFnEI8WdH7u5s gn1ScBUGQMD1xfJ/MbIWDCk1IBaOj0HcclVvlBwr6fRaIrvSWKQYpIF0u5y6JRf9 7gfh20TPzEtdE8mzf5iJqp8Yd49tVHKjddH68K+ZylYvRY8mX+oJvDQU6s/3ShgQ == X-ME-Sender: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedufedrvddvgddutdelucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne cujfgurhepfffhvffukfhfgggtuggjsehttdertddttddvnecuhfhrohhmpeetnhgurhgv shcuhfhrvghunhguuceorghnughrvghssegrnhgrrhgriigvlhdruggvqeenucfkphepie ejrdduiedtrddvudekrddvfeejnecurfgrrhgrmhepmhgrihhlfhhrohhmpegrnhgurhgv shesrghnrghrrgiivghlrdguvgenucevlhhushhtvghrufhiiigvpedt X-ME-Proxy: Received: from intern.anarazel.de (c-67-160-218-237.hsd1.ca.comcast.net [67.160.218.237]) by mail.messagingengine.com (Postfix) with ESMTPA id 5A5B28005B; Fri, 20 Sep 2019 13:08:32 -0400 (EDT) Date: Fri, 20 Sep 2019 10:08:31 -0700 From: Andres Freund To: Kuntal Ghosh Cc: Michael Paquier , Tomas Vondra , Tom Lane , Robert Haas , Thomas Munro , PostgreSQL Hackers Subject: Re: subscriptionCheck failures on nightjar Message-ID: <20190920170831.aaljabal6lyivre5@alap3.anarazel.de> References: <20190813080435.GL2551@paquier.xyz> <20190826132904.3ayuw36qzl2c4ktr@development> <7361.1568738373@sss.pgh.pa.us> <20190917194510.iqwyl3be62pz7l27@development> <20190918005815.GB8909@paquier.xyz> <20190918215808.yonxqgycme6pbctp@development> <20190919042305.GA21144@paquier.xyz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Precedence: bulk Hi, On 2019-09-19 17:20:15 +0530, Kuntal Ghosh wrote: > It seems there is a pattern how the error is occurring in different > systems. Following are the relevant log snippets: > > nightjar: > sub3 LOG: received replication command: CREATE_REPLICATION_SLOT > "sub3_16414_sync_16394" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT > sub3 LOG: logical decoding found consistent point at 0/160B578 > sub1 PANIC: could not open file > "pg_logical/snapshots/0-160B578.snap": No such file or directory > > dromedary scenario 1: > sub3_16414_sync_16399 LOG: received replication command: > CREATE_REPLICATION_SLOT "sub3_16414_sync_16399" TEMPORARY LOGICAL > pgoutput USE_SNAPSHOT > sub3_16414_sync_16399 LOG: logical decoding found consistent point at 0/15EA694 > sub2 PANIC: could not open file > "pg_logical/snapshots/0-15EA694.snap": No such file or directory > > > dromedary scenario 2: > sub3_16414_sync_16399 LOG: received replication command: > CREATE_REPLICATION_SLOT "sub3_16414_sync_16399" TEMPORARY LOGICAL > pgoutput USE_SNAPSHOT > sub3_16414_sync_16399 LOG: logical decoding found consistent point at 0/15EA694 > sub1 PANIC: could not open file > "pg_logical/snapshots/0-15EA694.snap": No such file or directory > > While subscription 3 is created, it eventually reaches to a consistent > snapshot point and prints the WAL location corresponding to it. It > seems sub1/sub2 immediately fails to serialize the snapshot to the > .snap file having the same WAL location. Since now a number of people (I tried as well), failed to reproduce this locally, I propose that we increase the log-level during this test on master. And perhaps expand the set of debugging information. With the hope that the additional information on the cases encountered on the bf helps us build a reproducer or, even better, diagnose the issue directly. If people agree, I'll come up with a patch. Greetings, Andres Freund