strange git problems on turaco
* strange git problems on turaco
From: Tomas Vondra <[email protected]> @ 2024-12-02 01:20 UTC
To: [email protected]

Hi,

turaco seems to be having some strange git issues - some of the
buildfarm runs fail like this:

  turaco:REL_16_STABLE [22:41:11] OK
  Sun Dec 1 22:41:27 2024: buildfarm run for turaco:REL_17_STABLE starting
  turaco:REL_17_STABLE [22:41:27] checking out source ...
  Missing checked out branch bf_REL_17_STABLE:
  fatal: not a git repository (or any parent up to mount point /mnt)
  Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
  turaco:REL_17_STABLE [22:41:32] failed at stage pgsql-Git
  Sun Dec 1 22:41:33 2024: buildfarm run for turaco:HEAD starting
  turaco:HEAD [22:41:33] checking out source ...

I initially suspected this might be due to aging storage (SD card on
rpi), but I replaced that, and there's nothing strange in dmesg. Also,
other branches seem to be working fine ...

Any ideas what could be causing this?

regards

--
Tomas Vondra
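[Editor's note] To see which branches are affected, and how often, the run output quoted above can be scanned for "failed at stage" lines. The following is a minimal Python sketch, not part of the buildfarm client; the function name `failed_stages` and the regular expression are assumptions derived only from the log excerpt in this message.

```python
import re

# Matches status lines shaped like the one in the excerpt, e.g.
#   turaco:REL_17_STABLE [22:41:32] failed at stage pgsql-Git
# The pattern is inferred from that excerpt only (an assumption).
FAILURE_LINE = re.compile(
    r"^(?P<animal>\w+):(?P<branch>\w+)\s+"
    r"\[(?P<time>\d{2}:\d{2}:\d{2})\]\s+"
    r"failed at stage (?P<stage>\S+)\s*$"
)


def failed_stages(log_text):
    """Return (animal, branch, time, stage) for every failure line found."""
    failures = []
    for line in log_text.splitlines():
        match = FAILURE_LINE.match(line.strip())
        if match:
            failures.append(
                (match["animal"], match["branch"], match["time"], match["stage"])
            )
    return failures


# Sample input copied from the report above.
SAMPLE = """\
turaco:REL_16_STABLE [22:41:11] OK
Sun Dec 1 22:41:27 2024: buildfarm run for turaco:REL_17_STABLE starting
turaco:REL_17_STABLE [22:41:27] checking out source ...
turaco:REL_17_STABLE [22:41:32] failed at stage pgsql-Git
"""
```

Calling `failed_stages(SAMPLE)` yields a single entry for REL_17_STABLE at stage pgsql-Git; running it over a longer log would show whether the failures stay confined to one branch.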
* Re: strange git problems on turaco
From: Tom Lane <[email protected]> @ 2024-12-02 01:56 UTC
To: Tomas Vondra <[email protected]>; +Cc: [email protected]
parent: Tomas Vondra <[email protected]>

Tomas Vondra <[email protected]> writes:
> turaco seems to be having some strange git issues - some of the
> buildfarm runs fail like this:

Have you tried rm -rf'ing its git repo and letting the script
check that out from scratch? The fact that it's just the 17
branch has a whiff of repo corruption.

Andrew might correct me, but I think you have to remove
both the pgmirror.git directory and the per-branch pgsql
subdirectories to be clean. Don't remove the various
<animal>.* status files.

regards, tom lane
* Re: strange git problems on turaco
From: Tomas Vondra <[email protected]> @ 2024-12-02 02:23 UTC
To: Tom Lane <[email protected]>; +Cc: [email protected]
parent: Tom Lane <[email protected]>

On 12/2/24 02:56, Tom Lane wrote:
> Tomas Vondra <[email protected]> writes:
>> turaco seems to be having some strange git issues - some of the
>> buildfarm runs fail like this:
>
> Have you tried rm -rf'ing its git repo and letting the script
> check that out from scratch? The fact that it's just the 17
> branch has a whiff of repo corruption.
>

I actually nuked the whole buildroot, because the old SD card was having
issues and I wasn't sure what might be corrupted. So it's all fresh.

But I also first ran

  ./run_branches.pl --run-all --nosend --nostatus

just to make sure everything works fine, and it did ...

> Andrew might correct me, but I think you have to remove
> both the pgmirror.git directory and the per-branch pgsql
> subdirectories to be clean. Don't remove the various
> <animal>.* status files.
>

Done. Let's see how quickly it breaks again.

regards

--
Tomas Vondra
* Re: strange git problems on turaco
From: Noah Misch <[email protected]> @ 2024-12-02 03:46 UTC
To: Tomas Vondra <[email protected]>; +Cc: [email protected]
parent: Tomas Vondra <[email protected]>

On Mon, Dec 02, 2024 at 02:20:35AM +0100, Tomas Vondra wrote:
> turaco seems to be having some strange git issues - some of the
> buildfarm runs fail like this:
>
>
> turaco:REL_16_STABLE [22:41:11] OK
> Sun Dec 1 22:41:27 2024: buildfarm run for turaco:REL_17_STABLE starting
> turaco:REL_17_STABLE [22:41:27] checking out source ...
> Missing checked out branch bf_REL_17_STABLE:
> fatal: not a git repository (or any parent up to mount point /mnt)
> Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
> turaco:REL_17_STABLE [22:41:32] failed at stage pgsql-Git
> Sun Dec 1 22:41:33 2024: buildfarm run for turaco:HEAD starting
> turaco:HEAD [22:41:33] checking out source ...
>
>
> I initially suspected this might be due to aging storage (SD card on
> rpi), but I replaced that, and there's nothing strange in dmesg. Also,
> other branches seem to be working fine ...
>
> Any ideas what could be causing this?

I had this happen ~9 times on the host of my AIX buildfarm members. Example:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2024-07-10%2019%3A51%3A28

I figured it was some system problem, so I didn't root-cause it. I carry the
following workaround in my fork of the buildfarm client code. The unknown
problem caused failure reports and work stoppage ~4 times before I installed
this workaround, then logs show the workaround prevented damage 5 times. The
last "removed intruder .git" log message appeared on 2024-07-23. There was no
kernel reboot, and logs don't point to buildfarm client processes getting
involuntary termination, either.

diff --git a/PGBuild/SCM.pm b/PGBuild/SCM.pm
index dcfd180..2cd610a 100644
--- a/PGBuild/SCM.pm
+++ b/PGBuild/SCM.pm
@@ -1059,9 +1059,19 @@ sub _update_target
     my @gitlog;
 
     # If a run crashed during copy_source(), repair.
-    if (-d "./git-save" && !-d "$target/.git")
+    if (-d "./git-save")
     {
+        # As of 2024-07-13, the following has happened about four times in the
+        # last month, to different gcc111 animals. Despite no known crash,
+        # there's a git-save directory containing the proper git repo, and
+        # there's a bogus .git missing most content. Remove the bogus one.
+        # This is deeply hacky, but it beats buildfarm report noise and manual
+        # intervention.
+        if (rmtree("$target/.git") > 0) {
+            print "removed intruder .git\n" if $verbose;
+        }
         move "./git-save", "$target/.git";
+        print "restored git-save\n" if $verbose;
     }
 
     chdir $target;
* Re: strange git problems on turaco
From: Tomas Vondra <[email protected]> @ 2024-12-02 13:51 UTC
To: Noah Misch <[email protected]>; +Cc: [email protected]
parent: Noah Misch <[email protected]>

On 12/2/24 04:46, Noah Misch wrote:
> On Mon, Dec 02, 2024 at 02:20:35AM +0100, Tomas Vondra wrote:
>> turaco seems to be having some strange git issues - some of the
>> buildfarm runs fail like this:
>>
>>
>> turaco:REL_16_STABLE [22:41:11] OK
>> Sun Dec 1 22:41:27 2024: buildfarm run for turaco:REL_17_STABLE starting
>> turaco:REL_17_STABLE [22:41:27] checking out source ...
>> Missing checked out branch bf_REL_17_STABLE:
>> fatal: not a git repository (or any parent up to mount point /mnt)
>> Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
>> turaco:REL_17_STABLE [22:41:32] failed at stage pgsql-Git
>> Sun Dec 1 22:41:33 2024: buildfarm run for turaco:HEAD starting
>> turaco:HEAD [22:41:33] checking out source ...
>>
>>
>> I initially suspected this might be due to aging storage (SD card on
>> rpi), but I replaced that, and there's nothing strange in dmesg. Also,
>> other branches seem to be working fine ...
>>
>> Any ideas what could be causing this?
>
> I had this happen ~9 times on the host of my AIX buildfarm members. Example:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2024-07-10%2019%3A51%3A28
>
> I figured it was some system problem, so I didn't root-cause it. I carry the
> following workaround in my fork of the buildfarm client code. The unknown
> problem caused failure reports and work stoppage ~4 times before I installed
> this workaround, then logs show the workaround prevented damage 5 times. The
> last "removed intruder .git" log message appeared on 2024-07-23. There was no
> kernel reboot, and logs don't point to buildfarm client processes getting
> involuntary termination, either.
>

Thanks. I suspect some system issue too, but I didn't want to blame the
system without some kind of proof. I applied your patch, let's see if
that helped after a couple runs.

regards

--
Tomas Vondra
* Re: strange git problems on turaco
From: Andrew Dunstan <[email protected]> @ 2024-12-12 15:43 UTC
To: Noah Misch <[email protected]>; Tomas Vondra <[email protected]>; +Cc: [email protected]
parent: Noah Misch <[email protected]>

On 2024-12-01 Su 10:46 PM, Noah Misch wrote:
> On Mon, Dec 02, 2024 at 02:20:35AM +0100, Tomas Vondra wrote:
>> turaco seems to be having some strange git issues - some of the
>> buildfarm runs fail like this:
>>
>>
>> turaco:REL_16_STABLE [22:41:11] OK
>> Sun Dec 1 22:41:27 2024: buildfarm run for turaco:REL_17_STABLE starting
>> turaco:REL_17_STABLE [22:41:27] checking out source ...
>> Missing checked out branch bf_REL_17_STABLE:
>> fatal: not a git repository (or any parent up to mount point /mnt)
>> Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
>> turaco:REL_17_STABLE [22:41:32] failed at stage pgsql-Git
>> Sun Dec 1 22:41:33 2024: buildfarm run for turaco:HEAD starting
>> turaco:HEAD [22:41:33] checking out source ...
>>
>>
>> I initially suspected this might be due to aging storage (SD card on
>> rpi), but I replaced that, and there's nothing strange in dmesg. Also,
>> other branches seem to be working fine ...
>>
>> Any ideas what could be causing this?
> I had this happen ~9 times on the host of my AIX buildfarm members. Example:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2024-07-10%2019%3A51%3A28
>
> I figured it was some system problem, so I didn't root-cause it. I carry the
> following workaround in my fork of the buildfarm client code. The unknown
> problem caused failure reports and work stoppage ~4 times before I installed
> this workaround, then logs show the workaround prevented damage 5 times. The
> last "removed intruder .git" log message appeared on 2024-07-23. There was no
> kernel reboot, and logs don't point to buildfarm client processes getting
> involuntary termination, either.
>
> diff --git a/PGBuild/SCM.pm b/PGBuild/SCM.pm
> index dcfd180..2cd610a 100644
> --- a/PGBuild/SCM.pm
> +++ b/PGBuild/SCM.pm
> @@ -1059,9 +1059,19 @@ sub _update_target
>      my @gitlog;
>
>      # If a run crashed during copy_source(), repair.
> -    if (-d "./git-save" && !-d "$target/.git")
> +    if (-d "./git-save")
>      {
> +        # As of 2024-07-13, the following has happened about four times in the
> +        # last month, to different gcc111 animals. Despite no known crash,
> +        # there's a git-save directory containing the proper git repo, and
> +        # there's a bogus .git missing most content. Remove the bogus one.
> +        # This is deeply hacky, but it beats buildfarm report noise and manual
> +        # intervention.
> +        if (rmtree("$target/.git") > 0) {
> +            print "removed intruder .git\n" if $verbose;
> +        }
>          move "./git-save", "$target/.git";
> +        print "restored git-save\n" if $verbose;
>      }
>
>      chdir $target;
>
>

[catching up a huge email backlog]

That's kinda weird. The .git directory doesn't get moved at all if you
have vpath turned on or you're building with meson (which always does
vpath). So that's one possible workaround.

I guess I should put something like this in the next release ... will go
and do that.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
* Re: strange git problems on turaco
From: Andrew Dunstan <[email protected]> @ 2024-12-12 15:46 UTC
To: Tom Lane <[email protected]>; Tomas Vondra <[email protected]>; +Cc: [email protected]
parent: Tom Lane <[email protected]>

On 2024-12-01 Su 8:56 PM, Tom Lane wrote:
> Tomas Vondra <[email protected]> writes:
>> turaco seems to be having some strange git issues - some of the
>> buildfarm runs fail like this:
> Have you tried rm -rf'ing its git repo and letting the script
> check that out from scratch? The fact that it's just the 17
> branch has a whiff of repo corruption.
>
> Andrew might correct me, but I think you have to remove
> both the pgmirror.git directory and the per-branch pgsql
> subdirectories to be clean. Don't remove the various
> <animal>.* status files.
>
>

In most cases you only have to remove the per-branch pgsql directory.
I've only very occasionally seen corruption of the mirror.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com