Date: Tue, 17 Mar 2026 16:29:14 +0100
From: "Peter 'PMc' Much"
To: Jakub Wartak
Cc: pgsql-hackers@lists.postgresql.org
Subject: Re: Need help debugging SIGBUS crashes

On Tue, Mar 17, 2026 at 02:50:25PM +0100, Jakub Wartak wrote:
!
! Not an answer from a regular FreeBSD guy, but more questions:
!
! So have you removed those ZFS patches or not? (You said you reverted
! only the NUMA ones.)

They are completely removed now.

! Maybe those ZFS patches corrupt some memory and jemalloc just hits
! those regions? I would revert the kernel to stock.

Yes, I would, too, but I can't. There are patches for Kerberos (FreeBSD
14 still uses that very old Heimdal implementation, which is why I am
kind of stuck with PG 15, and upgrading that one will be a bit of work),
and there are patches to make IPv6 fragmentation work with the
firewalls. In short, removing all of the patches would make SSO and
networking fall apart entirely and leave the site nonfunctional.

OTOH, this crash seems to prefer happening in production. Last night,
when it happened, the machine was busy rebuilding the OS etc. for other
nodes to upgrade to 14.4, and then I got bored and additionally ran an
LLM for entertainment. So the server had some 25 GB paged out when the
nightly housekeeping started to push the daily log data into the
databases, which then led to the crash.

That means A) I have no good idea how to reproduce such conditions
properly in a test scenario, and B) it is not impossible that there is
a bug (somewhere) that just doesn't usually hit orderly people who run
their databases on rather overprovisioned machines.

! Are you using hugepages? The jemalloc stack also contains "_large_",
! so can we assume jemalloc is using hugepages?

I think I remember trying that once, but hugepages with Postgres do not
work on FreeBSD. The docs also say: "this setting is supported only on
Linux and Windows."

! I don't know if that might help, but the last time I hunted down a
! SIGBUS [0] it was due to our incorrect patches (causing NUMA hugepage
! imbalances across nodes; our patch has some pause there, but what I
! did to track it down was to stack-trace
! calls to the Linux kernel's do_sigbus() routine via eBPF). Possibly
! you could hijack/detect some traps and/or hijack some routines using
! FreeBSD's DTrace, and that would give some hints?

Thank you, at the moment everything helps. :) DTrace is super cool, but
one also needs to understand the code first before getting useful
insight out of it. So any approach will imply a bunch of work, and I am
currently looking for the shortest path to an unknown target. ;)

PMc