Date: Mon, 26 Aug 2019 15:29:04 +0200
From: Tomas Vondra
To: Michael Paquier
Cc: Andres Freund, Thomas Munro, Tom Lane, PostgreSQL Hackers
Subject: Re: subscriptionCheck failures on nightjar
Message-ID: <20190826132904.3ayuw36qzl2c4ktr@development>
In-Reply-To: <20190813080435.GL2551@paquier.xyz>

On Tue, Aug 13, 2019 at 05:04:35PM +0900, Michael Paquier wrote:
>On Wed, Feb 13, 2019 at 01:51:47PM -0800, Andres Freund wrote:
>> I'm not yet
>> sure that that's actually something that's supposed to
>> happen, I got to spend some time analysing how this actually
>> happens. Normally the contents of the slot should actually prevent it
>> from being removed (as they're newer than
>> ReplicationSlotsComputeLogicalRestartLSN()). I kind of wonder if that's
>> a bug in the drop logic in newer releases.
>
>In the same context, could it be a consequence of 9915de6c which has
>introduced a conditional variable to control slot operations? This
>could have exposed more easily a pre-existing race condition.
>--

This is one of the remaining open items, and we don't seem to be moving
forward with it :-(

I'm willing to take a stab at it, but to do that I need a way to
reproduce it. Tom, you mentioned you've managed to reproduce it in a
qemu instance, but that it took some fiddling with qemu parameters or
something. Can you share what exactly was necessary?

An observation about the issue - while we started to notice this after
December, that's mostly because the PANIC patch went in shortly before.
We have, however, seen the issue before, as Thomas Munro mentioned in [1].

Those reports are from August, so it's quite possible something in the
first CF upset the code. And there's only a single commit in 2018-07
that seems related to logical decoding / snapshots [2], i.e. f49a80c:

  commit f49a80c481f74fa81407dce8e51dea6956cb64f8
  Author: Alvaro Herrera
  Date:   Tue Jun 26 16:38:34 2018 -0400

      Fix "base" snapshot handling in logical decoding

      ...

The other reason to suspect this is related is that the fix also made it
to REL_11_STABLE at that time, and if you check the buildfarm data [3],
you'll see that 11 fails on nightjar too, from time to time.

This means it's not a 12+ only issue, it's a live issue on 11. I don't
know if f49a80c is the culprit, or if it simply uncovered a pre-existing
bug (e.g. due to timing).

[1] https://www.postgresql.org/message-id/CAEepm%3D0wB7vgztC5sg2nmJ-H3bnrBT5GQfhUzP%2BFfq-WT3g8VA%40mail.gmail.com

[2] https://commitfest.postgresql.org/18/1650/

[3] https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=nightjar&br=REL_11_STABLE

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services