From: Tom Lane
To: buildfarm-members@lists.postgresql.org
Subject: Possible infinite loop on buildfarm animals
Date: Fri, 15 Mar 2024 14:42:14 -0400
Message-ID: <2530379.1710528134@sss.pgh.pa.us>

Between approximately 11:05 UTC and 13:25 UTC today (15 Mar 2024), the Postgres git repo contained a buggy test recipe that caused an infinite loop that will eventually exhaust disk space. If you have any animals that might have launched a test run on HEAD in that interval, you might want to check up on them.

A manual kill of the process that's consuming 100% CPU should be enough to get out of it.

Another idea to consider is to set the wait_timeout parameter in your animals' configuration files, to put an upper bound on the total elapsed time for a run. By default that's infinite, since it's really hard to select a one-size-fits-all value ... but it's a good backstop if you don't mind picking machine-specific limits.

			regards, tom lane
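
For illustration only, the general idea of an elapsed-time backstop can be sketched in a few lines of Python. This is not the buildfarm client's code and says nothing about how wait_timeout is implemented there; the function name run_with_limit and the 2-second limit are made up for the example.

```python
import subprocess
import sys


def run_with_limit(cmd, limit_seconds):
    """Run cmd; return its exit code, or None if it exceeded limit_seconds."""
    try:
        completed = subprocess.run(cmd, timeout=limit_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child and waits for it before re-raising,
        # so by the time we get here the runaway process is gone.
        return None
    return completed.returncode


if __name__ == "__main__":
    # Hypothetical stand-in for a runaway test: a busy loop that never exits.
    runaway = [sys.executable, "-c", "while True: pass"]
    result = run_with_limit(runaway, limit_seconds=2)
    print("killed after timeout" if result is None else f"exit code {result}")
```

Note that this sketch only kills the direct child; anything the child itself spawned would need separate handling (for instance a process group), which is part of why a machine-specific limit is only a backstop and not a substitute for checking on the animal.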