Re: Need help debugging SIGBUS crashes


From: Jakub Wartak <[email protected]>
To: Peter 'PMc' Much <[email protected]>
Cc: [email protected]
Subject: Re: Need help debugging SIGBUS crashes
Date: Tue, 17 Mar 2026 14:50:25 +0100
Message-ID: <CAKZiRmyQz+jZWLC4GbyuCa6cjurS0nECgFbYVyjgxB3Hgo+VnQ@mail.gmail.com>
In-Reply-To: <[email protected]>
References: <[email protected]>

Hi,

On Tue, Mar 17, 2026 at 1:27 PM Peter 'PMc' Much
<[email protected]> wrote:
>
> Hello,
>   please excuse I am writing here, I wrote earlier to the users list
> but got no answer.
>
> I am observing repeated SIGBUS crashes of the postgres backend binary
> on FreeBSD, starting at Feb 2, every couple of weeks.
> The postgres is 15.15, the FreeBSD Release was 14.3, the crashes
> happen in malloc().
>
> The crashes happened on different PG clusters (running off the same
> binaries), so they cannot be pinpointed to a specific application.
>
> After following a few red herrings, I figured that I had patched
> into the NUMA allocation policy in the kernel on Dec 18, so I
> obviously thought this was the actual cause for the crashes. But
> apparently it isn't. I removed the patches that would relate to
> malloc() (and left only those relating to ZFS) - and after some
> days got another crash.
>
> So, yesterday I upgraded to FreeBSD 14.4, removed all my patches
> for NUMA, and in addition disabled NUMA entirely with
>    vm.numa.disabled=1
> and added debugging info for libc. I intended to also add debugging
> to postgres - but tonight I already got another crash: the problem
> is apparently not related to NUMA.
[..]

>     frame #6: 0x0000000829687afd libc.so.7`__je_arena_extent_alloc_large(tsdn=<unavailable>, arena=0x00003e616aa00980, usize=32768, alignment=<unavailable>, zero=0x0000000820c5bedf) at jemalloc_arena.c:448:12
>     frame #7: 0x00000008296afca0 libc.so.7`__je_large_palloc(tsdn=0x00003e616a889090, arena=<unavailable>, usize=<unavailable>, alignment=64, zero=<unavailable>) at jemalloc_large.c:47:43
>     frame #8: 0x00000008296afb02 libc.so.7`__je_large_malloc(tsdn=<unavailable>, arena=<unavailable>, usize=<unavailable>, zero=<unavailable>) at jemalloc_large.c:17:9 [artificial]
[..]

Not an answer from a regular FreeBSD guy, but more questions:

So have you removed those ZFS patches or not? (You said you reverted only
the NUMA ones.) Maybe those ZFS patches corrupt some memory and jemalloc just
hits those regions? I would revert the kernel to stock, as otherwise nobody
will be able to tell what's happening there :)

Are you using hugepages? The jemalloc stack also contains "_large_", so can we
assume jemalloc is using hugepages?

I don't know if that might help, but the last time I hunted down a SIGBUS [0]
it was due to our incorrect patches (causing NUMA hugepage imbalances across
nodes; our patch is paused for now, but what I did to track it down was to
capture a stack trace at the Linux kernel's do_sigbus() routine via eBPF).
Possibly you could detect some traps and/or hook some routines using DTrace,
which is available in FreeBSD, and that would give some hints?

-J.

[0] - https://www.postgresql.org/message-id/CAKZiRmww2P6QAzu6W%2BvxB89i5Ha-YRSHMeyr6ax2Lymcu3LUcw%40mail.g...
