Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.89) (envelope-from ) id 1iBQPx-0002pV-Hh for pgsql-hackers@arkaria.postgresql.org; Fri, 20 Sep 2019 21:26:18 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.89) (envelope-from ) id 1iBQPw-0004EK-2f for pgsql-hackers@arkaria.postgresql.org; Fri, 20 Sep 2019 21:26:16 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.89) (envelope-from ) id 1iBQPu-0004Dx-1J for pgsql-hackers@lists.postgresql.org; Fri, 20 Sep 2019 21:26:15 +0000 Received: from wout3-smtp.messagingengine.com ([64.147.123.19]) by magus.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.89) (envelope-from ) id 1iBQPp-0003Mq-Kj for pgsql-hackers@lists.postgresql.org; Fri, 20 Sep 2019 21:26:13 +0000 Received: from compute5.internal (compute5.nyi.internal [10.202.2.45]) by mailout.west.internal (Postfix) with ESMTP id 108B7448; Fri, 20 Sep 2019 17:26:06 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute5.internal (MEProxy); Fri, 20 Sep 2019 17:26:06 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=anarazel.de; h= date:from:to:cc:subject:message-id:references:mime-version :content-type:in-reply-to; s=fm2; bh=8Q38gr7WCUZK5Ra8RJAT2xMsYk4 nwjXlurpCYJ9AGaQ=; b=lgKI9SzgX7/7zAfknegjKG8ZsFTPnnZusiGxD4JFXKG ypoi5TRR/E1jrvmBTb+Ff/y1vYjgCykHwn2JXX+Nyu4OzkhvdsBtuVMBDTTLW9P0 JCFLGNf0oTdGGithmPuC/AL2H7aceIs91X6KOUCt09jZqUWyYHHtlbNcZnjmNhHd JjN2LAD1p8oDXaUqXez8zrSneDqTZZlKY+HZVFq09DMHITaghmjsaTpAKBtpOTUO uxvUAiLnTHZjk6TF+oMU0OzewL3U4iN2y06ZmYh7wdcqbY4MehRQRD2P8re11s+6 HRQj/JqmpcpQDFXEVcf9rnNb/1yKFqWkk8OQJa93o8w== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to:x-me-proxy :x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s=fm3; bh=8Q38gr 7WCUZK5Ra8RJAT2xMsYk4nwjXlurpCYJ9AGaQ=; b=zjRcyza9ufeNTLaJVdNNzz /97XCcc5tP87hsm/ueTB7Nvki5q5yTTUOSfjMg9/XOh140zSzq3DnehEq4VBRUU5 rkhDNOiK5JMRTF5CJRkVa4yGN2CnTGCjuvKQHdiUpyUOfJ9KrSHD2jCPI1nw8Nv1 Llab6PNqqUn3BE6IrMbhH5juPaEfd/HdzeK1lHx0jNWdwzTGvdB2/XKGpkHxkvx/ bxj8vvuY0YYJvzzyB9/vDC6hhmKyLy6F4dceoqmrWK+DWJGIfluZPZqQo0e6+0cT 52qxggpAxqFCZ+ji7/9dmYL5yAksK3GGv7bGBurN7VMKRG/wtm6hgM3QntuG3T1w == X-ME-Sender: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedufedrvddvgdduiedtucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne cujfgurhepfffhvffukfhfgggtuggjsehmtderredttddvnecuhfhrohhmpeetnhgurhgv shcuhfhrvghunhguuceorghnughrvghssegrnhgrrhgriigvlhdruggvqeenucfkphepie ejrdduiedtrddvudekrddvfeejnecurfgrrhgrmhepmhgrihhlfhhrohhmpegrnhgurhgv shesrghnrghrrgiivghlrdguvgenucevlhhushhtvghrufhiiigvpedt X-ME-Proxy: Received: from intern.anarazel.de (c-67-160-218-237.hsd1.ca.comcast.net [67.160.218.237]) by mail.messagingengine.com (Postfix) with ESMTPA id A5A08D6005A; Fri, 20 Sep 2019 17:26:04 -0400 (EDT) Date: Fri, 20 Sep 2019 14:26:03 -0700 From: Andres Freund To: Tom Lane Cc: Kuntal Ghosh , Michael Paquier , Tomas Vondra , Robert Haas , Thomas Munro , PostgreSQL Hackers Subject: Re: subscriptionCheck failures on nightjar Message-ID: <20190920212603.7zlgrlwtdirbmuw7@alap3.anarazel.de> References: <7361.1568738373@sss.pgh.pa.us> <20190917194510.iqwyl3be62pz7l27@development> <20190918005815.GB8909@paquier.xyz> <20190918215808.yonxqgycme6pbctp@development> <20190919042305.GA21144@paquier.xyz> <20190920170831.aaljabal6lyivre5@alap3.anarazel.de> <29511.1569011121@sss.pgh.pa.us> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="33t7ipt62rzqqnz6" Content-Disposition: inline In-Reply-To: <29511.1569011121@sss.pgh.pa.us> List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Precedence: bulk --33t7ipt62rzqqnz6 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Hi, On 2019-09-20 16:25:21 -0400, Tom Lane wrote: > Andres Freund writes: > > Since now a number of people (I tried as well), failed to reproduce this > > locally, I propose that we increase the log-level during this test on > > master. And perhaps expand the set of debugging information. With the > > hope that the additional information on the cases encountered on the bf > > helps us build a reproducer or, even better, diagnose the issue > > directly. If people agree, I'll come up with a patch. > > I recreated my freebsd-9-under-qemu setup and I can still reproduce > the problem, though not with high reliability (order of 1 time in 10). > Anything particular you want logged? A DEBUG2 log would help a fair bit, because it'd log some information about what changes the "horizons" determining when data may be removed. Perhaps with the additional elogs attached? I lowered some messages to DEBUG2 so we don't have to suffer the noise of the ipc.c DEBUG3 messages. If I use a TEMP_CONFIG setting log_min_messages=DEBUG2 with the patches applied, the subscription tests still pass. I hope they still fail on your setup, even though the increased logging volume probably changes timing somewhat. Greetings, Andres Freund --33t7ipt62rzqqnz6 Content-Type: text/x-diff; charset=us-ascii Content-Disposition: attachment; filename="verbose.diff" diff --git i/src/backend/access/transam/xlog.c w/src/backend/access/transam/xlog.c index b7ff004234a..0d48ef3f039 100644 --- i/src/backend/access/transam/xlog.c +++ w/src/backend/access/transam/xlog.c @@ -2676,9 +2676,16 @@ XLogSetAsyncXactLSN(XLogRecPtr asyncXactLSN) void XLogSetReplicationSlotMinimumLSN(XLogRecPtr lsn) { + XLogRecPtr old_lsn; + SpinLockAcquire(&XLogCtl->info_lck); + old_lsn = XLogCtl->replicationSlotMinLSN; XLogCtl->replicationSlotMinLSN = lsn; SpinLockRelease(&XLogCtl->info_lck); + + elog(DEBUG2, "updating slot minimum lsn from %X/%X to %X/%X", + (uint32) (old_lsn >> 32), (uint32) old_lsn, + (uint32) (lsn >> 32), (uint32) lsn); } diff --git i/src/backend/replication/logical/snapbuild.c w/src/backend/replication/logical/snapbuild.c index 0bd1d0f9545..f7d270340de 100644 --- i/src/backend/replication/logical/snapbuild.c +++ w/src/backend/replication/logical/snapbuild.c @@ -913,7 +913,7 @@ SnapBuildPurgeCommittedTxn(SnapBuild *builder) memcpy(builder->committed.xip, workspace, surviving_xids * sizeof(TransactionId)); - elog(DEBUG3, "purged committed transactions from %u to %u, xmin: %u, xmax: %u", + elog(DEBUG2, "purged committed transactions from %u to %u, xmin: %u, xmax: %u", (uint32) builder->committed.xcnt, (uint32) surviving_xids, builder->xmin, builder->xmax); builder->committed.xcnt = surviving_xids; @@ -1140,7 +1140,7 @@ SnapBuildProcessRunningXacts(SnapBuild *builder, XLogRecPtr lsn, xl_running_xact xmin = ReorderBufferGetOldestXmin(builder->reorder); if (xmin == InvalidTransactionId) xmin = running->oldestRunningXid; - elog(DEBUG3, "xmin: %u, xmax: %u, oldest running: %u, oldest xmin: %u", + elog(DEBUG2, "xmin: %u, xmax: %u, oldest running: %u, oldest xmin: %u", builder->xmin, builder->xmax, running->oldestRunningXid, xmin); LogicalIncreaseXminForSlot(lsn, xmin); @@ -1952,6 +1952,10 @@ CheckPointSnapBuild(void) /* now check for the restart ptrs from existing slots */ cutoff = ReplicationSlotsComputeLogicalRestartLSN(); + elog(DEBUG2, "doing snapbuild checkpoint with restart lsn %X/%X, redo %X/%X", + (uint32) (cutoff >> 32), (uint32) cutoff, + (uint32) (redo >> 32), (uint32) redo); + /* don't start earlier than the restart lsn */ if (redo < cutoff) cutoff = redo; diff --git i/src/backend/storage/ipc/procarray.c w/src/backend/storage/ipc/procarray.c index 8abcfdf841f..9e8a93e5962 100644 --- i/src/backend/storage/ipc/procarray.c +++ w/src/backend/storage/ipc/procarray.c @@ -2981,16 +2981,24 @@ void ProcArraySetReplicationSlotXmin(TransactionId xmin, TransactionId catalog_xmin, bool already_locked) { + TransactionId old_xmin; + TransactionId old_catalog_xmin; + Assert(!already_locked || LWLockHeldByMe(ProcArrayLock)); if (!already_locked) LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); + old_xmin = procArray->replication_slot_xmin; + old_catalog_xmin = procArray->replication_slot_catalog_xmin; procArray->replication_slot_xmin = xmin; procArray->replication_slot_catalog_xmin = catalog_xmin; if (!already_locked) LWLockRelease(ProcArrayLock); + + elog(DEBUG2, "updating slot xmin/catalog_xmin from %u/%u to %u/%u", + old_xmin, old_catalog_xmin, xmin, catalog_xmin); } /* --33t7ipt62rzqqnz6--