Date: Thu, 13 Mar 2025 14:40:50 +0200
Subject: Re: Random pg_upgrade 004_subscription test failure on drongo
To: vignesh C, PostgreSQL Hackers
From: Heikki Linnakangas

On 13/03/2025 11:04, vignesh C wrote:
> ## Analysis
> I think it was caused by a STATUS_DELETE_PENDING failure and is not
> related to the recent updates to pg_upgrade.
>
> The file "base/1/2683" is an index file for
> pg_largeobject_loid_pn_index, and the output meant that file creation
> failed. Below is a backtrace.
>
> ```
> pgwin32_open() // <-- this returns -1
> open()
> BasicOpenFilePerm()
> PathNameOpenFilePerm()
> PathNameOpenFile()
> mdcreate()
> smgrcreate()
> RelationCreateStorage()
> RelationSetNewRelfilenumber()
> ExecuteTruncateGuts()
> ExecuteTruncate()
> ```
>
> But this is strange.
> Before calling mdcreate(), we surely unlink the file that has the
> same name. Below is a trace down to unlink.
>
> ```
> pgunlink()
> unlink()
> mdunlinkfork()
> mdunlink()
> smgrdounlinkall()
> RelationSetNewRelfilenumber() // common path with above
> ExecuteTruncateGuts()
> ExecuteTruncate()
> ```
>
> I found that Thomas said [4] that pgunlink() sometimes cannot remove a
> file even if it returns OK; in that case the NTSTATUS is
> STATUS_DELETE_PENDING. Also, a comment in pgwin32_open_handle()
> mentions the same thing:
>
> ```
> /*
>  * ERROR_ACCESS_DENIED is returned if the file is deleted but not yet
>  * gone (Windows NT status code is STATUS_DELETE_PENDING). In that
>  * case, we'd better ask for the NT status too so we can translate it
>  * to a more Unix-like error. We hope that nothing clobbers the NT
>  * status in between the internal NtCreateFile() call and CreateFile()
>  * returning.
>  *
> ```
>
> The definition of STATUS_DELETE_PENDING can be seen in [5]. Based on
> that, open() can indeed fail with STATUS_DELETE_PENDING if the file is
> opened while its deletion is still pending.
> ---------------------------------------------
>
> This was fixed by the following change in the target upgrade nodes:
> bgwriter_lru_maxpages = 0
> checkpoint_timeout = 1h
>
> Attached is a patch along similar lines for 004_subscription.

Hmm, this problem isn't limited to this one pg_upgrade test, right? It
could happen with any pg_upgrade invocation. And perhaps in a running
server too, if a relfilenumber is reused quickly.

In dropdb() and DropTableSpace() we do this:

    WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SMGRRELEASE));

Should we do the same here? Not sure where exactly to put that; perhaps
in mdcreate(), if the creation fails with STATUS_DELETE_PENDING.

-- 
Heikki Linnakangas
Neon (https://neon.tech)
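
A rough sketch of the retry shape suggested above, as a standalone simulation rather than PostgreSQL code. Everything in it is hypothetical: the create_result codes, the callback types, and the sim_* helpers are stand-ins for the real file creation in mdcreate() and for the SMGRRELEASE barrier, and it assumes the delete-pending condition can be told apart from other creation failures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of one creation attempt (not a PostgreSQL type). */
typedef enum
{
	CREATE_OK,
	CREATE_DELETE_PENDING,	/* old file unlinked, but a handle is still open */
	CREATE_FAILED
} create_result;

/* Hypothetical callbacks standing in for file creation and the barrier. */
typedef create_result (*create_fn) (const char *path, void *state);
typedef void (*release_fn) (void *state);

/*
 * Try to create the file.  If that fails only because a deletion is still
 * pending, ask everyone to release their handles, then retry exactly once.
 */
static create_result
create_with_barrier_retry(const char *path, create_fn try_create,
						  release_fn release_handles, void *state)
{
	create_result r;

	assert(try_create != NULL && release_handles != NULL);

	r = try_create(path, state);
	if (r == CREATE_DELETE_PENDING)
	{
		release_handles(state);		/* stand-in for emitting + waiting on the barrier */
		r = try_create(path, state);
	}
	return r;
}

/* Simulated environment: one stale handle keeps the old file "pending". */
typedef struct
{
	bool		stale_handle_open;
	int			barrier_calls;
} sim_state;

static create_result
sim_try_create(const char *path, void *state)
{
	sim_state  *s = state;

	(void) path;
	return s->stale_handle_open ? CREATE_DELETE_PENDING : CREATE_OK;
}

static void
sim_release_handles(void *state)
{
	sim_state  *s = state;

	s->stale_handle_open = false;
	s->barrier_calls++;
}
```

The barrier is only paid for on the failure path, so the common case of a clean creation costs nothing extra. Whether a single retry is enough in the real server is an open question this sketch does not answer.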