From: Jakub Wartak
Date: Tue, 17 Mar 2026 14:50:25 +0100
Subject: Re: Need help debugging SIGBUS crashes
To: "Peter 'PMc' Much"
Cc: pgsql-hackers@lists.postgresql.org

Hi,

On Tue, Mar 17, 2026 at 1:27 PM Peter 'PMc' Much wrote:
>
> Hello,
> please excuse I am writing here, I wrote earlier to the users list
> but got no answer.
>
> I am observing repeated SIGBUS crashes of the postgres backend binary
> on FreeBSD, starting at Feb 2, every couple of weeks.
> The postgres is 15.15, the FreeBSD Release was 14.3, the crashes
> happen in malloc().
>
> The crashes happened on different PG clusters (running off the same
> binaries), so they cannot be pinpointed to a specific application.
>
> After following a few red herrings, I figured that I had patched
> into the NUMA allocation policy in the kernel at Dec 18, so I
> obviously thought this being the actual cause for the crashes. But
> apparently it isn't. I removed the patches that would relate to
> malloc() (and left only those relating to ZFS) - and after some
> days got another crash.
>
> So, yesterday I upgraded to FreeBSD 14.4, removed all my patches
> for NUMA, and in addition disabled NUMA entirely with
> vm.numa.disabled=1
> and added debugging info for libc. I intended to also add debugging
> to postgres - but tonight I already got another crash: the problem
> is apparently not related to NUMA.
[..]
> frame #6: 0x0000000829687afd libc.so.7`__je_arena_extent_alloc_large(tsdn=, arena=0x00003e616aa00980, usize=32768, alignment=, zero=0x0000000820c5bedf) at jemalloc_arena.c:448:12
> frame #7: 0x00000008296afca0 libc.so.7`__je_large_palloc(tsdn=0x00003e616a889090, arena=, usize=, alignment=64, zero=) at jemalloc_large.c:47:43
> frame #8: 0x00000008296afb02 libc.so.7`__je_large_malloc(tsdn=, arena=, usize=, zero=) at jemalloc_large.c:17:9 [artificial]
[..]

Not an answer from a regular FreeBSD guy, but more questions:

So have you removed those ZFS patches or not? (You said you reverted only
the NUMA ones.) Maybe those ZFS patches corrupt some memory and jemalloc
just hits those regions? I would revert the kernel to stock, as otherwise
nobody will be able to tell what's happening there :)

Are you using huge pages? The jemalloc stack also contains "_large_", so
can we assume jemalloc is using huge pages?

I don't know if that might help, but the last time I hunted down a SIGBUS
[0], it was due to our own incorrect patches (causing NUMA huge page
imbalances across nodes; that patch is on pause for now). What I did to
track it down was to capture stack traces at the Linux kernel's
do_sigbus() routine via eBPF. Possibly you could detect some traps and/or
hook some routines using DTrace, which FreeBSD has, and that would give
some hints?

-J.

[0] - https://www.postgresql.org/message-id/CAKZiRmww2P6QAzu6W%2BvxB89i5Ha-YRSHMeyr6ax2Lymcu3LUcw%40mail.gmail.com
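A minimal userland sketch related to the trap-detection idea, assuming a POSIX system such as FreeBSD or Linux. Instead of tracing in the kernel, a process can install a SIGBUS handler with sigaction() and SA_SIGINFO and read si_addr and si_code from the siginfo_t to see which address faulted. The demo provokes a SIGBUS on purpose by touching a file-backed mmap() page after the file has been truncated; that is one well-known SIGBUS source, and nothing in this thread says it is the cause of the reported crashes. The names demo_catch_sigbus() and on_sigbus() and the /tmp path are hypothetical illustrations, not PostgreSQL or jemalloc APIs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf bus_env;
static void *volatile bus_addr;   /* faulting address reported by the kernel */
static volatile int bus_code;     /* si_code of the SIGBUS */

/* Hypothetical SA_SIGINFO handler: record where the fault happened, jump back. */
static void on_sigbus(int sig, siginfo_t *info, void *uctx)
{
    (void) sig;
    (void) uctx;
    bus_addr = info->si_addr;
    bus_code = info->si_code;
    siglongjmp(bus_env, 1);
}

/*
 * Hypothetical demo: map one page of a temporary file, shrink the file to
 * zero bytes, then read the mapping. Returns 1 if SIGBUS was caught and the
 * reported address equals the mapped page, 0 otherwise.
 */
int demo_catch_sigbus(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    char path[] = "/tmp/sigbus-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return 0;
    unlink(path);                       /* file goes away once fd is closed */
    if (ftruncate(fd, page) != 0) {
        close(fd);
        return 0;
    }

    char *p = mmap(NULL, (size_t) page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return 0;
    }

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigbus;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old);

    int caught = 0;
    if (ftruncate(fd, 0) == 0) {        /* the mapped page now lies beyond EOF */
        if (sigsetjmp(bus_env, 1) == 0) {
            volatile char c = p[0];     /* expected to raise SIGBUS */
            (void) c;
        } else {
            caught = (bus_addr == (void *) p);
            fprintf(stderr, "SIGBUS at %p, si_code=%d\n", bus_addr, bus_code);
        }
    }

    sigaction(SIGBUS, &old, NULL);      /* restore the previous disposition */
    munmap(p, (size_t) page);
    close(fd);
    return caught;
}
```

Calling demo_catch_sigbus() should return 1 and print the faulting address to stderr. Jumping out of the handler with siglongjmp() is only reasonable in a small test like this; in a real backend it is probably more useful to just log si_addr and si_code and let the process crash, so the core dump stays available.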