Date: Tue, 17 Mar 2026 21:01:35 +0100
From: "Peter 'PMc' Much"
To: Tom Lane
Cc: Tomas Vondra, pgsql-hackers@lists.postgresql.org
Subject: Re: Need help debugging SIGBUS crashes
In-Reply-To: <392255.1773756727@sss.pgh.pa.us>

On Tue, Mar 17, 2026 at 10:12:07AM -0400, Tom Lane wrote:
! Tomas Vondra writes:
! > On 3/17/26 13:17, Peter 'PMc' Much wrote:
! >> So I am now quite clueless on how to proceed further, and could
! >> really use some educated inspiration. I can not even say if this is
! >> a postgres issue or a FreeBSD issue (but it doesn't happen to any
! >> other program).
!
! > I agree it's hard to deduce anything from the backtraces with the
! > interesting bits optimized out. Rebuilding the OS with -O0 might be an
! > overkill, I'd probably start by building just Postgres. That'd at least
! > give us some idea what happens there, you could inspect the memory
! > context etc.
!
! What I'm seeing is that malloc's internal data structures are already
! corrupt during startup of an autovacuum worker. I think the most
! likely theory is that this somehow traces to our old habit of
! launching postmaster child processes from a signal handler, something
! that violates the spirit and probably the letter of POSIX, and which
! we can clearly see was being done here. But we got rid of that in PG
! v16, so if I were Peter my first move would be to upgrade to something
! later than 15.x.

My thinking was: if there is an issue inside FreeBSD (which is what it
somewhat looks like), then I want it hunted down as such, rather than
having it possibly covered up by a newer version that might do things
differently.

Now, what I understand here is:

A) I can stop searching for who is creating the SIGUSR1 signals,
   because these are created inside of PG Rel. 15 itself.
B) There is a potential issue in doing fork() within a signal handler
   and then continuing to do malloc() in that new process, which is
   why this practice has been abandoned from PG Rel. 16 onwards.

In that case there is indeed good reason to upgrade.

The one thing I don't get is then: as this apparently has nothing to
do with any special configuration on my site, but is a genuine issue,
why does it happen to me now (and didn't blow up elsewhere some ten
years ago already)?

! Why it was okay in older FreeBSD and not so much in v14, who knows?

Maybe it wasn't. Here it appeared out of thin air in February, while
the system was upgraded from 13.5 to 14.3 in July '25 and ran without
problems for these eight months. So this is not directly or solely
related to FBSD R.14. It happens more often during massive memory use,
but that is not a strict condition either, and I did not find any
other solid determining condition.

So yes, if there is reason to believe the annoyance might just
disappear in PG-16, then that is likely the most viable strategy.

Thanks a lot for all inspiration! :)
PMc