public inbox for [email protected]  
strange git problems on turaco

* strange git problems on turaco
@ 2024-12-02 01:20 Tomas Vondra <[email protected]>
  2024-12-02 01:56 ` Re: strange git problems on turaco Tom Lane <[email protected]>
  2024-12-02 03:46 ` Re: strange git problems on turaco Noah Misch <[email protected]>
  0 siblings, 2 replies; 7+ messages in thread

From: Tomas Vondra @ 2024-12-02 01:20 UTC (permalink / raw)
  To: [email protected]

Hi,

turaco seems to be having some strange git issues - some of the
buildfarm runs fail like this:


turaco:REL_16_STABLE [22:41:11] OK
Sun Dec  1 22:41:27 2024: buildfarm run for turaco:REL_17_STABLE starting
turaco:REL_17_STABLE [22:41:27] checking out source ...
Missing checked out branch bf_REL_17_STABLE:
fatal: not a git repository (or any parent up to mount point /mnt)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
turaco:REL_17_STABLE [22:41:32] failed at stage pgsql-Git
Sun Dec  1 22:41:33 2024: buildfarm run for turaco:HEAD starting
turaco:HEAD          [22:41:33] checking out source ...


I initially suspected this might be due to aging storage (SD card on
rpi), but I replaced that, and there's nothing strange in dmesg. Also,
other branches seem to be working fine ...

Any ideas what could be causing this?


regards

-- 
Tomas Vondra

* Re: strange git problems on turaco

From: Tom Lane @ 2024-12-02 01:56 UTC (permalink / raw)
  To: Tomas Vondra <[email protected]>; +Cc: [email protected]

Tomas Vondra <[email protected]> writes:
> turaco seems to be having some strange git issues - some of the
> buildfarm runs fail like this:

Have you tried rm -rf'ing its git repo and letting the script
check that out from scratch?  The fact that it's just the 17
branch has a whiff of repo corruption.

Andrew might correct me, but I think you have to remove
both the pgmirror.git directory and the per-branch pgsql
subdirectories to be clean.  Don't remove the various
<animal>.* status files.

			regards, tom lane
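
A minimal shell sketch of the cleanup described above. The buildroot layout
is an assumption made for illustration (mirror at <buildroot>/pgmirror.git,
each checkout at <buildroot>/<branch>/pgsql, status files beside the
checkout), and the function name clean_buildroot is hypothetical; neither is
taken from the buildfarm client itself. By default only the per-branch
checkouts are removed, which a later reply in this thread says is usually
enough; passing "yes" as the second argument also removes the mirror.

```shell
#!/bin/sh
# Sketch only. Assumed layout (not confirmed by the thread):
#   $buildroot/pgmirror.git          local mirror
#   $buildroot/<branch>/pgsql        per-branch checkout
#   $buildroot/<branch>/<animal>.*   status files, which must be kept

clean_buildroot() {
    buildroot=$1
    with_mirror=${2:-no}

    # Remove each per-branch checkout; everything else in the branch
    # directory, including the <animal>.* status files, is left alone.
    for dir in "$buildroot"/*/; do
        [ -d "${dir}pgsql" ] || continue
        rm -rf -- "${dir}pgsql"
    done

    # The mirror is removed only on explicit request.
    if [ "$with_mirror" = yes ]; then
        rm -rf -- "$buildroot/pgmirror.git"
    fi
}
```

Calling clean_buildroot with a buildroot path and "yes" would correspond to
the fuller cleanup; without the second argument the mirror is kept.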

* Re: strange git problems on turaco

From: Tomas Vondra @ 2024-12-02 02:23 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: [email protected]



On 12/2/24 02:56, Tom Lane wrote:
> Tomas Vondra <[email protected]> writes:
>> turaco seems to be having some strange git issues - some of the
>> buildfarm runs fail like this:
> 
> Have you tried rm -rf'ing its git repo and letting the script
> check that out from scratch?  The fact that it's just the 17
> branch has a whiff of repo corruption.
> 

I actually nuked the whole buildroot, because the old SD card was having
issues and I wasn't sure what might be corrupted. So it's all fresh. But
I also first ran

   ./run_branches.pl --run-all --nosend --nostatus

just to make sure everything works fine, and it did ...

> Andrew might correct me, but I think you have to remove
> both the pgmirror.git directory and the per-branch pgsql
> subdirectories to be clean.  Don't remove the various
> <animal>.* status files.
> 

Done. Let's see how quickly it breaks again.


regards

-- 
Tomas Vondra

* Re: strange git problems on turaco

From: Andrew Dunstan @ 2024-12-12 15:46 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; Tomas Vondra <[email protected]>; +Cc: [email protected]


On 2024-12-01 Su 8:56 PM, Tom Lane wrote:
> Tomas Vondra <[email protected]> writes:
>> turaco seems to be having some strange git issues - some of the
>> buildfarm runs fail like this:
> Have you tried rm -rf'ing its git repo and letting the script
> check that out from scratch?  The fact that it's just the 17
> branch has a whiff of repo corruption.
>
> Andrew might correct me, but I think you have to remove
> both the pgmirror.git directory and the per-branch pgsql
> subdirectories to be clean.  Don't remove the various
> <animal>.* status files.
>
> 			


In most cases you only have to remove the per-branch pgsql directory.
I've only very occasionally seen corruption of the mirror.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

* Re: strange git problems on turaco

From: Noah Misch @ 2024-12-02 03:46 UTC (permalink / raw)
  To: Tomas Vondra <[email protected]>; +Cc: [email protected]

On Mon, Dec 02, 2024 at 02:20:35AM +0100, Tomas Vondra wrote:
> turaco seems to be having some strange git issues - some of the
> buildfarm runs fail like this:
> 
> 
> turaco:REL_16_STABLE [22:41:11] OK
> Sun Dec  1 22:41:27 2024: buildfarm run for turaco:REL_17_STABLE starting
> turaco:REL_17_STABLE [22:41:27] checking out source ...
> Missing checked out branch bf_REL_17_STABLE:
> fatal: not a git repository (or any parent up to mount point /mnt)
> Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
> turaco:REL_17_STABLE [22:41:32] failed at stage pgsql-Git
> Sun Dec  1 22:41:33 2024: buildfarm run for turaco:HEAD starting
> turaco:HEAD          [22:41:33] checking out source ...
> 
> 
> I initially suspected this might be due to aging storage (SD card on
> rpi), but I replaced that, and there's nothing strange in dmesg. Also,
> other branches seem to be working fine ...
> 
> Any ideas what could be causing this?

I had this happen ~9 times on the host of my AIX buildfarm members.  Example:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2024-07-10%2019%3A51%3A28

I figured it was some system problem, so I didn't root-cause it.  I carry the
following workaround in my fork of the buildfarm client code.  The unknown
problem caused failure reports and work stoppage ~4 times before I installed
this workaround, then logs show the workaround prevented damage 5 times.  The
last "removed intruder .git" log message appeared on 2024-07-23.  There was no
kernel reboot, and logs don't point to buildfarm client processes getting
involuntary termination, either.

diff --git a/PGBuild/SCM.pm b/PGBuild/SCM.pm
index dcfd180..2cd610a 100644
--- a/PGBuild/SCM.pm
+++ b/PGBuild/SCM.pm
@@ -1059,9 +1059,19 @@ sub _update_target
 	my @gitlog;
 
 	# If a run crashed during copy_source(), repair.
-	if (-d "./git-save" && !-d "$target/.git")
+	if (-d "./git-save")
 	{
+		# As of 2024-07-13, the following has happened about four times in the
+		# last month, to different gcc111 animals.  Despite no known crash,
+		# there's a git-save directory containing the proper git repo, and
+		# there's a bogus .git missing most content.  Remove the bogus one.
+		# This is deeply hacky, but it beats buildfarm report noise and manual
+		# intervention.
+		if (rmtree("$target/.git") > 0) {
+			print "removed intruder .git\n" if $verbose;
+		}
 		move "./git-save", "$target/.git";
+		print "restored git-save\n" if $verbose;
 	}
 
 	chdir $target;
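
For anyone who needs to make the same repair by hand, the logic of the patch
above can be restated as a small shell function. This is a sketch, not part
of the buildfarm client: the function name restore_git_save is hypothetical,
and it assumes it is run from the directory that holds ./git-save, with the
checkout directory (the patch's $target) passed as the first argument.

```shell
#!/bin/sh
# Sketch only: a manual, shell version of the repair step in the patch.
# Assumes the current directory contains ./git-save and that "$1" is the
# checkout directory whose .git should be restored.

restore_git_save() {
    target=$1

    # Nothing to repair unless a saved repository was left behind.
    [ -d ./git-save ] || return 0

    # Any .git found next to a surviving git-save is treated as bogus,
    # as in the patch, and removed before the saved copy is moved back.
    if [ -e "$target/.git" ]; then
        rm -rf -- "$target/.git"
        echo "removed intruder .git"
    fi

    mv ./git-save "$target/.git"
    echo "restored git-save"
}
```

When no git-save directory exists the function does nothing and prints
nothing, so it should be safe to run against a healthy branch directory.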

* Re: strange git problems on turaco

From: Tomas Vondra @ 2024-12-02 13:51 UTC (permalink / raw)
  To: Noah Misch <[email protected]>; +Cc: [email protected]

On 12/2/24 04:46, Noah Misch wrote:
> On Mon, Dec 02, 2024 at 02:20:35AM +0100, Tomas Vondra wrote:
>> turaco seems to be having some strange git issues - some of the
>> buildfarm runs fail like this:
>>
>>
>> turaco:REL_16_STABLE [22:41:11] OK
>> Sun Dec  1 22:41:27 2024: buildfarm run for turaco:REL_17_STABLE starting
>> turaco:REL_17_STABLE [22:41:27] checking out source ...
>> Missing checked out branch bf_REL_17_STABLE:
>> fatal: not a git repository (or any parent up to mount point /mnt)
>> Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
>> turaco:REL_17_STABLE [22:41:32] failed at stage pgsql-Git
>> Sun Dec  1 22:41:33 2024: buildfarm run for turaco:HEAD starting
>> turaco:HEAD          [22:41:33] checking out source ...
>>
>>
>> I initially suspected this might be due to aging storage (SD card on
>> rpi), but I replaced that, and there's nothing strange in dmesg. Also,
>> other branches seem to be working fine ...
>>
>> Any ideas what could be causing this?
> 
> I had this happen ~9 times on the host of my AIX buildfarm members.  Example:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2024-07-10%2019%3A51%3A28
> 
> I figured it was some system problem, so I didn't root-cause it.  I carry the
> following workaround in my fork of the buildfarm client code.  The unknown
> problem caused failure reports and work stoppage ~4 times before I installed
> this workaround, then logs show the workaround prevented damage 5 times.  The
> last "removed intruder .git" log message appeared on 2024-07-23.  There was no
> kernel reboot, and logs don't point to buildfarm client processes getting
> involuntary termination, either.
> 

Thanks. I suspect some system issue too, but I didn't want to blame the
system without some kind of proof. I applied your patch; let's see
whether it helps over the next couple of runs.


regards

-- 
Tomas Vondra

* Re: strange git problems on turaco

From: Andrew Dunstan @ 2024-12-12 15:43 UTC (permalink / raw)
  To: Noah Misch <[email protected]>; Tomas Vondra <[email protected]>; +Cc: [email protected]


On 2024-12-01 Su 10:46 PM, Noah Misch wrote:
> On Mon, Dec 02, 2024 at 02:20:35AM +0100, Tomas Vondra wrote:
>> turaco seems to be having some strange git issues - some of the
>> buildfarm runs fail like this:
>>
>>
>> turaco:REL_16_STABLE [22:41:11] OK
>> Sun Dec  1 22:41:27 2024: buildfarm run for turaco:REL_17_STABLE starting
>> turaco:REL_17_STABLE [22:41:27] checking out source ...
>> Missing checked out branch bf_REL_17_STABLE:
>> fatal: not a git repository (or any parent up to mount point /mnt)
>> Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
>> turaco:REL_17_STABLE [22:41:32] failed at stage pgsql-Git
>> Sun Dec  1 22:41:33 2024: buildfarm run for turaco:HEAD starting
>> turaco:HEAD          [22:41:33] checking out source ...
>>
>>
>> I initially suspected this might be due to aging storage (SD card on
>> rpi), but I replaced that, and there's nothing strange in dmesg. Also,
>> other branches seem to be working fine ...
>>
>> Any ideas what could be causing this?
> I had this happen ~9 times on the host of my AIX buildfarm members.  Example:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2024-07-10%2019%3A51%3A28
>
> I figured it was some system problem, so I didn't root-cause it.  I carry the
> following workaround in my fork of the buildfarm client code.  The unknown
> problem caused failure reports and work stoppage ~4 times before I installed
> this workaround, then logs show the workaround prevented damage 5 times.  The
> last "removed intruder .git" log message appeared on 2024-07-23.  There was no
> kernel reboot, and logs don't point to buildfarm client processes getting
> involuntary termination, either.
>
> diff --git a/PGBuild/SCM.pm b/PGBuild/SCM.pm
> index dcfd180..2cd610a 100644
> --- a/PGBuild/SCM.pm
> +++ b/PGBuild/SCM.pm
> @@ -1059,9 +1059,19 @@ sub _update_target
>   	my @gitlog;
>   
>   	# If a run crashed during copy_source(), repair.
> -	if (-d "./git-save" && !-d "$target/.git")
> +	if (-d "./git-save")
>   	{
> +		# As of 2024-07-13, the following has happened about four times in the
> +		# last month, to different gcc111 animals.  Despite no known crash,
> +		# there's a git-save directory containing the proper git repo, and
> +		# there's a bogus .git missing most content.  Remove the bogus one.
> +		# This is deeply hacky, but it beats buildfarm report noise and manual
> +		# intervention.
> +		if (rmtree("$target/.git") > 0) {
> +			print "removed intruder .git\n" if $verbose;
> +		}
>   		move "./git-save", "$target/.git";
> +		print "restored git-save\n" if $verbose;
>   	}
>   
>   	chdir $target;
>
>

[catching up a huge email backlog]

That's kinda weird. The .git directory doesn't get moved at all if you 
have vpath turned on or you're building with meson (which always does 
vpath). So that's one possible workaround.

I guess I should put something like this in the next release ... will go 
and do that.


cheers


andrew






--
Andrew Dunstan
EDB: https://www.enterprisedb.com
