From: vignesh C <[email protected]>
To: Heikki Linnakangas <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Subject: Re: Random pg_upgrade 004_subscription test failure on drongo
Date: Mon, 22 Sep 2025 14:28:35 +0530
Message-ID: <CALDaNm1NtWVosSSb9mp3OKic60em5HF2zmURC77MLWyYLMWqyw@mail.gmail.com>
In-Reply-To: <CALDaNm2y+nf-V9tjKwvbPprobZs1t_UrcCpJ0qYD5-KkOUFAyg@mail.gmail.com>
References: <CALDaNm3tjY44HoSwY84=XGEbTg0ruVfD4hAMTm=TgBqVysH4Qw@mail.gmail.com>
	<[email protected]>
	<CALDaNm2y+nf-V9tjKwvbPprobZs1t_UrcCpJ0qYD5-KkOUFAyg@mail.gmail.com>

On Fri, 21 Mar 2025 at 18:54, vignesh C <[email protected]> wrote:
>
> On Thu, 13 Mar 2025 at 18:10, Heikki Linnakangas <[email protected]> wrote:
> >
> >
> > Hmm, this problem isn't limited to this one pg_upgrade test, right? It
> > could happen with any pg_upgrade invocation. And perhaps in a running
> > server too, if a relfilenumber is reused quickly. In dropdb() and
> > DropTableSpace() we do this:
> >
> > WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SMGRRELEASE));
> >
> > Should we do the same here? Not sure where exactly to put that; perhaps
> > in mdcreate(), if the creation fails with STATUS_DELETE_PENDING.
>
> How about a patch similar to the attached one? I have run pg_upgrade
> tests multiple times, but unfortunately, I was unable to reproduce the
> issue or verify these changes.

CFBot reported an issue on one of its machines; here is an updated
version of the patch.

Regards,
Vignesh


Attachments:

  [application/octet-stream] v2-0001-Fix-issue-with-file-handle-retention-during-CREAT.patch (2.7K, 2-v2-0001-Fix-issue-with-file-handle-retention-during-CREAT.patch)
From f076ec514631034e081740291d069a1f20fbb0a1 Mon Sep 17 00:00:00 2001
From: Vignesh <[email protected]>
Date: Fri, 21 Mar 2025 18:24:48 +0530
Subject: [PATCH v2] Fix issue with file handle retention during CREATE
 DATABASE in pg_restore

During upgrades, when pg_restore performs CREATE DATABASE, the
bgwriter or checkpointer may flush buffers and hold a file handle
for the table. This causes issues if the table needs to be re-created
later (e.g., after a TRUNCATE command), especially on OSes like older
versions of Windows, where unlinked files aren't fully removed until
they are no longer open.

This commit fixes the issue by checking for STATUS_DELETE_PENDING when
file creation fails, emitting a PROCSIGNAL_BARRIER_SMGRRELEASE barrier,
and waiting for it with WaitForProcSignalBarrier, so that all backends
close their smgr file descriptors before the creation is retried once.
---
 src/backend/storage/smgr/md.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 2ccb0faceb5..a97afedafdd 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -31,10 +31,20 @@
 #include "miscadmin.h"
 #include "pg_trace.h"
 #include "pgstat.h"
+
+#if defined(WIN32) && !defined(__CYGWIN__)
+#include "port/win32ntdll.h"
+#endif
+
 #include "storage/aio.h"
 #include "storage/bufmgr.h"
 #include "storage/fd.h"
 #include "storage/md.h"
+
+#if defined(WIN32) && !defined(__CYGWIN__)
+#include "storage/procsignal.h"
+#endif
+
 #include "storage/relfilelocator.h"
 #include "storage/smgr.h"
 #include "storage/sync.h"
@@ -214,6 +224,9 @@ mdcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo)
 	MdfdVec    *mdfd;
 	RelPathStr	path;
 	File		fd;
+#if defined(WIN32) && !defined(__CYGWIN__)
+	bool		retryattempted = false;
+#endif
 
 	if (isRedo && reln->md_num_open_segs[forknum] > 0)
 		return;					/* created and opened already... */
@@ -235,6 +248,9 @@ mdcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo)
 
 	path = relpath(reln->smgr_rlocator, forknum);
 
+#if defined(WIN32) && !defined(__CYGWIN__)
+retry:
+#endif
 	fd = PathNameOpenFile(path.str, _mdfd_open_flags() | O_CREAT | O_EXCL);
 
 	if (fd < 0)
@@ -245,6 +261,15 @@ mdcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo)
 			fd = PathNameOpenFile(path.str, _mdfd_open_flags());
 		if (fd < 0)
 		{
+#if defined(WIN32) && !defined(__CYGWIN__)
+			if (!retryattempted && pg_RtlGetLastNtStatus() == STATUS_DELETE_PENDING)
+			{
+				retryattempted = true;
+				WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SMGRRELEASE));
+				goto retry;
+			}
+#endif
+
 			/* be sure to report the error reported by create, not open */
 			errno = save_errno;
 			ereport(ERROR,
-- 
2.43.0
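
For readers who want the control flow without the md.c context, the
retry-once logic in the patch above can be sketched in isolation. This
is only an illustration under stated assumptions: fake_create(),
fake_release_barrier(), create_with_one_retry() and the two counters
are hypothetical stand-ins, not PostgreSQL functions. The real patch
uses PathNameOpenFile(), pg_RtlGetLastNtStatus() and
WaitForProcSignalBarrier(EmitProcSignalBarrier(...)) instead, and uses
a goto rather than a loop.

```c
#include <assert.h>
#include <stdbool.h>

/* All names below are hypothetical stand-ins, not PostgreSQL functions. */

static int lingering_handles = 1;	/* handles other processes still hold */
static int barrier_calls = 0;		/* times we asked for a release */

/* Simulated create: fails with "delete pending" while handles linger. */
static int
fake_create(bool *delete_pending)
{
	*delete_pending = (lingering_handles > 0);
	return *delete_pending ? -1 : 3;	/* 3 is a pretend descriptor */
}

/* Simulated barrier: every other process closes its handles. */
static void
fake_release_barrier(void)
{
	barrier_calls++;
	lingering_handles = 0;
}

/* Create the file, retrying at most once after asking for a release. */
int
create_with_one_retry(void)
{
	bool		retry_attempted = false;

	for (;;)
	{
		bool		delete_pending;
		int			fd = fake_create(&delete_pending);

		if (fd >= 0)
			return fd;
		if (retry_attempted || !delete_pending)
			return -1;			/* give up; caller reports the error */

		/* First delete-pending failure: release handles, then retry. */
		retry_attempted = true;
		fake_release_barrier();
	}
}
```

The retry_attempted flag bounds the work to a single barrier per call,
so a failure that the barrier cannot cure still surfaces as an error
instead of looping.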
