public inbox for [email protected]  
From: Noah Misch <[email protected]>
To: Tomas Vondra <[email protected]>
Cc: [email protected]
Subject: Re: strange git problems on turaco
Date: Sun, 1 Dec 2024 19:46:23 -0800
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
References: <[email protected]>

On Mon, Dec 02, 2024 at 02:20:35AM +0100, Tomas Vondra wrote:
> turaco seems to be having some strange git issues - some of the
> buildfarm runs fail like this:
> 
> 
> turaco:REL_16_STABLE [22:41:11] OK
> Sun Dec  1 22:41:27 2024: buildfarm run for turaco:REL_17_STABLE starting
> turaco:REL_17_STABLE [22:41:27] checking out source ...
> Missing checked out branch bf_REL_17_STABLE:
> fatal: not a git repository (or any parent up to mount point /mnt)
> Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
> turaco:REL_17_STABLE [22:41:32] failed at stage pgsql-Git
> Sun Dec  1 22:41:33 2024: buildfarm run for turaco:HEAD starting
> turaco:HEAD          [22:41:33] checking out source ...
> 
> 
> I initially suspected this might be due to aging storage (SD card on
> rpi), but I replaced that, and there's nothing strange in dmesg. Also,
> other branches seem to be working fine ...
> 
> Any ideas what could be causing this?

I had this happen ~9 times on the host of my AIX buildfarm members.  Example:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2024-07-10%2019%3A51%3A28

I figured it was some system problem, so I didn't root-cause it.  I carry the
following workaround in my fork of the buildfarm client code.  Before I
installed this workaround, the unknown problem caused failure reports and work
stoppage ~4 times; since then, logs show the workaround has prevented damage 5
times.  The last "removed intruder .git" log message appeared on 2024-07-23.
There was no kernel reboot, and the logs don't indicate that buildfarm client
processes were terminated involuntarily, either.

diff --git a/PGBuild/SCM.pm b/PGBuild/SCM.pm
index dcfd180..2cd610a 100644
--- a/PGBuild/SCM.pm
+++ b/PGBuild/SCM.pm
@@ -1059,9 +1059,19 @@ sub _update_target
 	my @gitlog;
 
 	# If a run crashed during copy_source(), repair.
-	if (-d "./git-save" && !-d "$target/.git")
+	if (-d "./git-save")
 	{
+		# As of 2024-07-13, the following has happened about four times in the
+		# last month, to different gcc111 animals.  Despite no known crash,
+		# there's a git-save directory containing the proper git repo, and
+		# there's a bogus .git missing most content.  Remove the bogus one.
+		# This is deeply hacky, but it beats buildfarm report noise and manual
+		# intervention.
+		if (rmtree("$target/.git") > 0) {
+			print "removed intruder .git\n" if $verbose;
+		}
 		move "./git-save", "$target/.git";
+		print "restored git-save\n" if $verbose;
 	}
 
 	chdir $target;