Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tseUU-003zbM-FC for pgsql-hackers@arkaria.postgresql.org; Thu, 13 Mar 2025 09:04:34 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tseUS-008OmL-Lj for pgsql-hackers@arkaria.postgresql.org; Thu, 13 Mar 2025 09:04:32 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tseUS-008OjL-A0 for pgsql-hackers@lists.postgresql.org; Thu, 13 Mar 2025 09:04:32 +0000 Received: from mail-yw1-x1134.google.com ([2607:f8b0:4864:20::1134]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1tseUO-002cQp-1e for pgsql-hackers@lists.postgresql.org; Thu, 13 Mar 2025 09:04:31 +0000 Received: by mail-yw1-x1134.google.com with SMTP id 00721157ae682-6f768e9be1aso18894017b3.0 for ; Thu, 13 Mar 2025 02:04:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1741856667; x=1742461467; darn=lists.postgresql.org; h=to:subject:message-id:date:from:mime-version:from:to:cc:subject :date:message-id:reply-to; bh=gL6ZoBEVpNlYb7r8uceNy9HjzJkOtbGLF3l4eqnIOoA=; b=Rsr7wdAmK3aCSQj/5meJ5IcpfC6SUecd7VDs9CRoaFM9etAfSF+HkncQO6EbbfBUpc +gPj1O48MfuzotRnNJqo2vlm//XOxaFIitPFbNkhFSWtAX7b/+GZMneyPV03izP9yrsb GOkn8wKdh0J7vySjLGfmxbdxnp/7KAuxkzSSxCrU7yweCelv1GMYIq3EkDm7rKSHR5+F /Um+/bE1bUHeEdxltlrxImZnmUK0rb+KGqQqobCTjXybi2JBHjfGfL8l4S87wmkoO4Pn KtBhPndq8iPnRtxiNBxUyr3MwLC84/GBNYMOM+yuhB/JRpJrOns65cctLsjxvHUBGm6a NIuQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1741856667; x=1742461467; h=to:subject:message-id:date:from:mime-version:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=gL6ZoBEVpNlYb7r8uceNy9HjzJkOtbGLF3l4eqnIOoA=; b=UJkceATFgN7mF4rwOSvOHQihlpDk35wQUeIsrThfaZRVmb50Rags3Onc5ugK8hO70H L83M1+FEqdXQdSuQV4GbLzwMin/QkytUwrAxyXep2HWzlbe66uOixHyHq/McMca+qdHh CtjE+1ZNthmaDNphmCuPcpS05e3/pk296M252ikCk3O3Yhg3LhwkaePJ2uzPU45P0H9u qUSOH+sPs8zOBLnFSN3rb+WmI+VF1peEMlYtAg1SboKAnQCJUa0OZrT/ErNOST5aCYyg eAmvtflvNhluvUfgtQPVNPcSm4YhN3i6xZUPeNOc+crdvUOTqxdfsz9QC7vHxpaisIVq VBIA== X-Gm-Message-State: AOJu0YzsJFe9Y6mA/RaKrKdDeYE+tRb13TxLk5IY6InT1ZyRc2sxcQ8C U7gN3GtM9I4eucDpFSK+ulQGjGcyFB3YOgVokrhbyOALy7T/om86tIOqEffjwwLli0HtR0orMvc LwxIUbDZdtMrAfNxajE0N2l5eab6wl9kg75A= X-Gm-Gg: ASbGnctSnzIDG9hZI6lOq/hL8Tb4NnG9wwQ4xsccdj0PCf8WDPLVxAGCydC0KkUKYVx 3ovtBeEJQUy0DrV7gNkbJofCRMEv6tIBfoy4pkoX3CvSmvke/d0/vH8vRO6Hjoq94bVfCYoHZuf HZOdRVAy5BYGOfXXm6akwZ4iNjxw== X-Google-Smtp-Source: AGHT+IGJ2U/14hC2nasXNierSLvGov3Cs1HpAFuTcLMIYt76gCY8CoJq1Cgmb2+5C7TLQniC67QeLF7aW74BB0PaHhs= X-Received: by 2002:a05:690c:fce:b0:6fb:949b:57b with SMTP id 00721157ae682-6ff2f811c4amr20684847b3.12.1741856667122; Thu, 13 Mar 2025 02:04:27 -0700 (PDT) MIME-Version: 1.0 From: vignesh C Date: Thu, 13 Mar 2025 14:34:15 +0530 X-Gm-Features: AQ5f1JqMwwqusI3jBn23RrhuGCVjZz-RWAcKdLehHNfqfpBFAzIqj6Lne3bauro Message-ID: Subject: Random pg_upgrade 004_subscription test failure on drongo To: PostgreSQL Hackers Content-Type: multipart/mixed; boundary="000000000000e54ea80630359bbd" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000e54ea80630359bbd Content-Type: text/plain; charset="UTF-8" Hi, In one of the recent buildfarm runs, the test pg_upgrade/004_subscription test fails at [1]. It has failed with the following: Restoring database schemas in the new cluster *failure* Consult the last few lines of "C:/prog/bf/root/HEAD/pgsql.build/testrun/pg_upgrade/004_subscription/data/t_004_subscription_new_sub_data/pgdata/pg_upgrade_output.d/20250310T194018.517/log/pg_upgrade_dump_1.log" for the probable cause of the failure. Failure, exiting Unfortunately we don't have pg_upgrade_output.d contents in buildfarm to see what is the exact reason. I checked old buildfarm failures and found a similar failure in drongo for pg_upgarde/003_logical_slots at [2]. The test failure can happen because of the reason mentioned at [3] and [6]. I'm just requoting the contents from there, if you have checked [3] and [6] skip the below contents: --------------------------------------------- The failure reason for the earlier failure was found to be: According to it, the TRUNCATE command seemed to be failed due to a "File exists" error. 2023-11-15 00:02:02.239 UTC [1752:18] 003_logical_slots.pl ERROR: could not create file "base/1/2683": File exists 2023-11-15 00:02:02.239 UTC [1752:19] 003_logical_slots.pl STATEMENT: -- For binary upgrade, preserve pg_largeobject and index relfilenodes SELECT pg_catalog.binary_upgrade_set_next_index_relfilenode('2683'::pg_catalog.oid); SELECT pg_catalog.binary_upgrade_set_next_heap_relfilenode('2613'::pg_catalog.oid); TRUNCATE pg_catalog.pg_largeobject; ... ## Analysis I think it was caused due to the STATUS_DELETE_PENDING failure, not related with recent updates for pg_upgrade. The file "base/1/2683" is an index file for pg_largeobject_loid_pn_index, and the output meant that file creation failed. Below is a backtrace. ``` pgwin32_open() // <-- this returns -1 open() BasicOpenFilePerm() PathNameOpenFilePerm() PathNameOpenFile() mdcreate() smgrcreate() RelationCreateStorage() RelationSetNewRelfilenumber() ExecuteTruncateGuts() ExecuteTruncate() ``` But this is strange. Before calling mdcreate(), we surely unlink the file which have the same name. Below is a trace until unlink. ``` pgunlink() unlink() mdunlinkfork() mdunlink() smgrdounlinkall() RelationSetNewRelfilenumber() // common path with above ExecuteTruncateGuts() ExecuteTruncate() ``` I found Thomas said that [4] pgunlink sometimes could not remove a file even if it returns OK, at that time NTSTATUS is STATUS_DELETE_PENDING. Also, a comment in pgwin32_open_handle() mentions the same thing: ``` /* * ERROR_ACCESS_DENIED is returned if the file is deleted but not yet * gone (Windows NT status code is STATUS_DELETE_PENDING). In that * case, we'd better ask for the NT status too so we can translate it * to a more Unix-like error. We hope that nothing clobbers the NT * status in between the internal NtCreateFile() call and CreateFile() * returning. * ``` The definition of STATUS_DELETE_PENDING can be seen in [5]. Based on that, indeed, open() would be able to fail with STATUS_DELETE_PENDING if the deletion is pending but it is trying to open. --------------------------------------------- This was fixed by the following change in the target upgrade nodes: bgwriter_lru_maxpages = 0 checkpoint_timeout = 1h Attached is a patch in similar lines for 004_subscription. [1] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2025-03-10%2019%3A26%3A35 [2] - https://www.postgresql.org/message-id/flat/TYAPR01MB5866AB7FD922CE30A2565B8BF5A8A%40TYAPR01MB5866.jpnprd01.prod.outlook.com [3] - https://www.postgresql.org/message-id/TYAPR01MB5866A4E7342088E91362BEF0F5BBA%40TYAPR01MB5866.jpnprd01.prod.outlook.com [4] - https://www.postgresql.org/message-id/CA%2BhUKGKsdzw06c5nnb%3DKYG9GmvyykoVpJA_VR3k0r7cZOKcx6Q%40mail.gmail.com [5] - https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-erref/596a1078-e883-4972-9bbc-49e60bebca55 [6]- https://www.postgresql.org/message-id/f0d303f1-e380-5988-91c7-74b755abd4bb%40gmail.com Regards, Vignesh --000000000000e54ea80630359bbd Content-Type: application/octet-stream; name="0001-Fix-an-intermetant-BF-failure-in-004_subscription.patch" Content-Disposition: attachment; filename="0001-Fix-an-intermetant-BF-failure-in-004_subscription.patch" Content-Transfer-Encoding: base64 Content-ID: X-Attachment-Id: f_m8746d0z0 RnJvbSBjNzg2NWFlZDg0OTQyZTc2ZDg4ZmMwOGZkYmNlNzRlMmIwMDFjZWMyIE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBWaWduZXNoIDx2aWduZXNoMjFAZ21haWwuY29tPgpEYXRlOiBU aHUsIDEzIE1hciAyMDI1IDE0OjAxOjAzICswNTMwClN1YmplY3Q6IFtQQVRDSF0gRml4IGFuIGlu dGVybWV0YW50IEJGIGZhaWx1cmUgaW4gMDA0X3N1YnNjcmlwdGlvbi4KCkR1cmluZyB1cGdyYWRl LCB3aGVuIHBnX3Jlc3RvcmUgcGVyZm9ybXMgQ1JFQVRFIERBVEFCQVNFLCBiZ3dyaXRlciBvcgpj aGVja3BvaW50ZXIgbWF5IGZsdXNoIGJ1ZmZlcnMgYW5kIGhvbGQgYSBmaWxlIGhhbmRsZSBmb3Ig cGdfbGFyZ2VvYmplY3QsCnNvIGxhdGVyIFRSVU5DQVRFIHBnX2xhcmdlb2JqZWN0IGNvbW1hbmQg d2lsbCBmYWlsIGlmIE9TIChzdWNoIGFzIG9sZGVyCldpbmRvd3MgdmVyc2lvbnMpIGRvZXNuJ3Qg cmVtb3ZlIGFuIHVubGlua2VkIGZpbGUgY29tcGxldGVseSB0aWxsIGl0J3MKb3Blbi4KLS0tCiBz cmMvYmluL3BnX3VwZ3JhZGUvdC8wMDRfc3Vic2NyaXB0aW9uLnBsIHwgMTMgKysrKysrKysrKysr KwogMSBmaWxlIGNoYW5nZWQsIDEzIGluc2VydGlvbnMoKykKCmRpZmYgLS1naXQgYS9zcmMvYmlu L3BnX3VwZ3JhZGUvdC8wMDRfc3Vic2NyaXB0aW9uLnBsIGIvc3JjL2Jpbi9wZ191cGdyYWRlL3Qv MDA0X3N1YnNjcmlwdGlvbi5wbAppbmRleCBlM2NjZmY5ZjI3MC4uOTUxMGUzMmE3ZTMgMTAwNjQ0 Ci0tLSBhL3NyYy9iaW4vcGdfdXBncmFkZS90LzAwNF9zdWJzY3JpcHRpb24ucGwKKysrIGIvc3Jj L2Jpbi9wZ191cGdyYWRlL3QvMDA0X3N1YnNjcmlwdGlvbi5wbApAQCAtMjksNiArMjksMTkgQEAg bXkgJG9sZGJpbmRpciA9ICRvbGRfc3ViLT5jb25maWdfZGF0YSgnLS1iaW5kaXInKTsKICMgSW5p dGlhbGl6ZSB0aGUgbmV3IHN1YnNjcmliZXIKIG15ICRuZXdfc3ViID0gUG9zdGdyZVNRTDo6VGVz dDo6Q2x1c3Rlci0+bmV3KCduZXdfc3ViJyk7CiAkbmV3X3N1Yi0+aW5pdDsKKworIyBEdXJpbmcg dXBncmFkZSwgd2hlbiBwZ19yZXN0b3JlIHBlcmZvcm1zIENSRUFURSBEQVRBQkFTRSwgYmd3cml0 ZXIgb3IKKyMgY2hlY2twb2ludGVyIG1heSBmbHVzaCBidWZmZXJzIGFuZCBob2xkIGEgZmlsZSBo YW5kbGUgZm9yIHRoZSBzeXN0ZW0gdGFibGUuCisjIFNvLCBpZiBsYXRlciBkdWUgdG8gc29tZSBy ZWFzb24gd2UgbmVlZCB0byByZS1jcmVhdGUgdGhlIGZpbGUgd2l0aCB0aGUgc2FtZQorIyBuYW1l IGxpa2UgYSBUUlVOQ0FURSBjb21tYW5kIG9uIHRoZSBzYW1lIHRhYmxlLCB0aGVuIHRoZSBjb21t YW5kIHdpbGwgZmFpbAorIyBpZiBPUyAoc3VjaCBhcyBvbGRlciBXaW5kb3dzIHZlcnNpb25zKSBk b2Vzbid0IHJlbW92ZSBhbiB1bmxpbmtlZCBmaWxlCisjIGNvbXBsZXRlbHkgdGlsbCBpdCBpcyBv cGVuLgorJG5ld19zdWItPmFwcGVuZF9jb25mKAorCSdwb3N0Z3Jlc3FsLmNvbmYnLCBxeworYmd3 cml0ZXJfbHJ1X21heHBhZ2VzID0gMAorY2hlY2twb2ludF90aW1lb3V0ID0gMWgKK30pOworCiBt eSAkbmV3YmluZGlyID0gJG5ld19zdWItPmNvbmZpZ19kYXRhKCctLWJpbmRpcicpOwogCiAjIElu IGEgVlBBVEggYnVpbGQsIHdlJ2xsIGJlIHN0YXJ0ZWQgaW4gdGhlIHNvdXJjZSBkaXJlY3Rvcnks IGJ1dCB3ZSB3YW50Ci0tIAoyLjQzLjAKCg== --000000000000e54ea80630359bbd--