public inbox for [email protected]  
From: Andres Freund <[email protected]>
To: Kuntal Ghosh <[email protected]>
Cc: Michael Paquier <[email protected]>
Cc: Tomas Vondra <[email protected]>
Cc: Tom Lane <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: Thomas Munro <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Subject: Re: subscriptionCheck failures on nightjar
Date: Fri, 20 Sep 2019 10:08:31 -0700
Message-ID: <[email protected]>
In-Reply-To: <CAGz5QC+5_mPFoDj7ZSMV0gwvMY+kdOp4t1w=TTDpzuV9F2-X6g@mail.gmail.com>
References: <[email protected]>
	<20190826132904.3ayuw36qzl2c4ktr@development>
	<CA+TgmoaNOMG9+Ho9d3CX+-10O7+nqqvmSpXb1m0F3dqWB4C-8g@mail.gmail.com>
	<[email protected]>
	<20190917194510.iqwyl3be62pz7l27@development>
	<[email protected]>
	<CAGz5QCJv5JbRDsATDTkJqq7h9F7u0QLnNnLHfxR1nEOa4DnkJQ@mail.gmail.com>
	<20190918215808.yonxqgycme6pbctp@development>
	<[email protected]>
	<CAGz5QC+5_mPFoDj7ZSMV0gwvMY+kdOp4t1w=TTDpzuV9F2-X6g@mail.gmail.com>

Hi,

On 2019-09-19 17:20:15 +0530, Kuntal Ghosh wrote:
> It seems there is a pattern how the error is occurring in different
> systems. Following are the relevant log snippets:
> 
> nightjar:
> sub3 LOG:  received replication command: CREATE_REPLICATION_SLOT
> "sub3_16414_sync_16394" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
> sub3 LOG:  logical decoding found consistent point at 0/160B578
> sub1 PANIC:  could not open file
> "pg_logical/snapshots/0-160B578.snap": No such file or directory
> 
> dromedary scenario 1:
> sub3_16414_sync_16399 LOG:  received replication command:
> CREATE_REPLICATION_SLOT "sub3_16414_sync_16399" TEMPORARY LOGICAL
> pgoutput USE_SNAPSHOT
> sub3_16414_sync_16399 LOG:  logical decoding found consistent point at 0/15EA694
> sub2 PANIC:  could not open file
> "pg_logical/snapshots/0-15EA694.snap": No such file or directory
> 
> 
> dromedary scenario 2:
> sub3_16414_sync_16399 LOG:  received replication command:
> CREATE_REPLICATION_SLOT "sub3_16414_sync_16399" TEMPORARY LOGICAL
> pgoutput USE_SNAPSHOT
> sub3_16414_sync_16399 LOG:  logical decoding found consistent point at 0/15EA694
> sub1 PANIC:  could not open file
> "pg_logical/snapshots/0-15EA694.snap": No such file or directory
> 
> While subscription 3 is being created, it eventually reaches a consistent
> snapshot point and prints the WAL location corresponding to it. It
> seems sub1/sub2 then immediately fail to serialize the snapshot to the
> .snap file with the same WAL location.
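
The matching WAL locations are what one would expect if the snapshot file name is derived directly from the LSN of the consistent point. The following is a minimal Python sketch, assuming the naming scheme that, to my reading, `SnapBuildSerialize()` in snapbuild.c uses: the upper and lower 32 bits of the LSN printed as unpadded uppercase hex, i.e. `pg_logical/snapshots/%X-%X.snap`. It maps each logged consistent point to the path named in the corresponding PANIC. `parse_lsn` and `snap_path` are hypothetical helpers for illustration only, not PostgreSQL functions.

```python
def parse_lsn(text):
    """Parse a textual LSN like '0/160B578' into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def snap_path(lsn):
    """Build the serialized-snapshot path for an LSN (assumed %X-%X naming)."""
    return "pg_logical/snapshots/%X-%X.snap" % (lsn >> 32, lsn & 0xFFFFFFFF)


# Consistent points and PANIC paths taken from the logs quoted above.
observed = {
    "0/160B578": "pg_logical/snapshots/0-160B578.snap",  # nightjar
    "0/15EA694": "pg_logical/snapshots/0-15EA694.snap",  # dromedary
}

for consistent_point, panic_path in observed.items():
    derived = snap_path(parse_lsn(consistent_point))
    print(consistent_point, "->", derived, derived == panic_path)
```

Under that assumption, each failing sub1/sub2 process is touching exactly the file that sub3's slot creation would serialize at its consistent point.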

Since a number of people (myself included) have now failed to reproduce
this locally, I propose that we increase the log level during this test
on master, and perhaps expand the set of debugging information, in the
hope that the additional information from the cases encountered on the
buildfarm helps us build a reproducer or, even better, diagnose the
issue directly. If people agree, I'll come up with a patch.

Greetings,

Andres Freund