public inbox for [email protected]  
help / color / mirror / Atom feed
From: Chris Hofstaedtler <[email protected]>
To: Thomas Munro <[email protected]>
Cc: [email protected]
Subject: Re: BUG #19416: Backend SIGSEGV in ExecShutdownHashJoin/ExecHashTableDetach/dsa_free
Date: Tue, 7 Apr 2026 10:53:36 +0200
Message-ID: <[email protected]> (raw)
In-Reply-To: <CA+hUKG+u6BCEJH27qTJd-SPxtehEpt=RbLc_ON7itYvrfVTc8A@mail.gmail.com>
References: <[email protected]>
	<CA+hUKG+u6BCEJH27qTJd-SPxtehEpt=RbLc_ON7itYvrfVTc8A@mail.gmail.com>

* Thomas Munro <[email protected]> [260328 08:03]:
>On Thu, Feb 26, 2026 at 11:28 AM PG Bug reporting form
><[email protected]> wrote:
>> #0  0x0000564830c3c9d1 in dsa_free (area=0x5648621a9280, dp=<optimized out>)
>> at ./build/../src/backend/utils/mmgr/dsa.c:845
>> 845     ./build/../src/backend/utils/mmgr/dsa.c: No such file or directory.
>> (gdb) bt full
>> #0  0x0000564830c3c9d1 in dsa_free (area=0x5648621a9280, dp=<optimized out>)
>> at ./build/../src/backend/utils/mmgr/dsa.c:845
>>         segment_map = 0x5648621a92b8
>>         pageno = 0
>>         span_pointer = 0
>>         span = <optimized out>
>>         superblock = <optimized out>
>>         object = <optimized out>
>>         size = <optimized out>
>>         size_class = <optimized out>
>> #1  0x0000564830917ae1 in ExecHashTableDetach (hashtable=0x564862147fa0) at
>> ./build/../src/backend/executor/nodeHash.c:3439
>>         i = <optimized out>
>>         pstate = 0x7ef9970b5040
>
>Hmm, it's not supposed to be possible for two backends to reach the
>dsa_free() at nodeHash.c:3439.  But it does smell a bit like that
>that's what happened... hmmm. We were in _RUN phase, and then this
>barrier should only let one process through to _FREE.  I'd be
>interested in the contents of pstate->build_barrier in frame 1 if you
>get another core file with the same stack in it.

Sorry for the lack of followup. Currently we are thinking our 
problem matches the bug described in https://postgr.es/m/[email protected]
("[BUG + PATCH] DSA pagemap out-of-bounds in make_new_segment 
odd-sized path").

>Can you share the
>outline of the query plan?   Does it happen to have a full/right PHJ
>in it?

It appears we forgot to capture more of the logs (before they expired).
Here is a plan from a smaller machine carrying a similar dataset -
unfortunately I cannot guarantee it's the same plan, although it seems close.
Indeed it has a "Parallel Hash Right Join" node.


  Nested Loop  (cost=13030498.63..46923224.94 rows=21144 width=36)
    ->  Nested Loop  (cost=13030498.19..46911677.41 rows=21144 width=44)
          ->  GroupAggregate  (cost=13030497.75..46808339.62 rows=54669 width=2120)
                Group Key: subscriber.id
                Filter: (last_agg(nsr.voicemail_present ORDER BY nsr.sort) = 'true'::text)
                ->  Gather Merge  (cost=13030497.75..19559670.85 rows=53993659 width=23)
                      Workers Planned: 5
                      ->  Sort  (cost=13029497.68..13056494.51 rows=10798732 width=23)
                            Sort Key: subscriber.id, nsr.sort
                            ->  Parallel Hash Left Join  (cost=10633766.78..11767970.44 rows=10798732 width=23)
                                  Hash Cond: (subscriber.id = subscription.subscriber_id)
                                  Filter: ((nsr.postpaid IS NULL) OR (nsr.postpaid = (subscription.id IS NOT NULL)))
                                  ->  Hash Left Join  (cost=10585690.41..11619799.44 rows=11184278 width=24)
                                        Hash Cond: (subscriber.id = subscriber_options.subscriber_id)
                                        Filter: ((nsr.option_ids IS NULL) OR (nsr.option_ids <@ subscriber_options.option_ids))
                                        ->  Parallel Hash Right Join  (cost=9142123.87..10141640.84 rows=13177920 width=49)
                                              Hash Cond: ((optionvalue_lso.subscriber_id = subscriber.id) AND (optionvalue_lso.option_id = nsr.optionvalue_option_id) AND (optionvalue_lso.value = nsr.optionvalue_value))
                                              Filter: ((nsr.optionvalue_option_id IS NULL) OR (optionvalue_lso.id IS NOT NULL))
                                              ->  Parallel Seq Scan on lnk_subscriber_option optionvalue_lso  (cost=0.00..728660.20 rows=1901347 width=21)
                                                    Filter: ((activationdate <= now()) AND (deletiondate > now()))
                                              ->  Parallel Hash  (cost=8769950.27..8769950.27 rows=13177920 width=62)
                                                    ->  Parallel Hash Join  (cost=2888653.49..8769950.27 rows=13177920 width=62)
                                                          Hash Cond: (lnk_subscriber_simcard.subscriber_id = subscriber.id)
                                                          Join Filter: (((nsr.brand_id IS NULL) OR (nsr.brand_id = subscriber.brand_id)) AND ((nsr.registered IS NULL) OR (nsr.registered = subscriber.registered)) AND ((nsr.customer_category_ids IS NULL) OR (basedata.customercategory_id = ANY (nsr.customer_category_ids))) AND ((nsr.subscriberstatus_ids IS NULL) OR (lnk_subscriber_status.subscriberstatus_id = ANY (nsr.subscriberstatus_ids))))
                                                          ->  Nested Loop  (cost=1148075.17..6707980.64 rows=34348223 width=126)
                                                                Join Filter: ((simcard.imsi >= nsr.imsi_start) AND (simcard.imsi <= nsr.imsi_end))
                                                                ->  Parallel Hash Join  (cost=1148075.17..1298129.01 rows=1807801 width=12)
                                                                      Hash Cond: (lnk_subscriber_simcard.simcard_id = simcard.id)
                                                                      ->  Parallel Seq Scan on lnk_subscriber_simcard  (cost=0.00..145308.36 rows=1807801 width=8)
                                                                            Filter: ((activationdate <= now()) AND (deletiondate > now()))
                                                                      ->  Parallel Hash  (cost=1118684.52..1118684.52 rows=2351252 width=12)
                                                                            ->  Parallel Seq Scan on simcard  (cost=0.00..1118684.52 rows=2351252 width=12)
                                                                ->  Materialize  (cost=0.00..7.56 rows=171 width=138)
                                                                      ->  Seq Scan on network_settings_rule nsr  (cost=0.00..6.71 rows=171 width=138)
                                                          ->  Parallel Hash  (cost=1717950.74..1717950.74 rows=1810207 width=21)
                                                                ->  Parallel Hash Left Join  (cost=778201.62..1717950.74 rows=1810207 width=21)
                                                                      Hash Cond: (subscriber.id = lnk_subscriber_basedata.subscriber_id)
                                                                      ->  Parallel Hash Join  (cost=342951.95..1270264.89 rows=1810207 width=17)
                                                                            Hash Cond: (lnk_subscriber_status.subscriber_id = subscriber.id)
                                                                            ->  Parallel Bitmap Heap Scan on lnk_subscriber_status  (cost=102927.10..1025488.25 rows=1810207 width=8)
                                                                                  Recheck Cond: (deletiondate > now())
                                                                                  Filter: (activationdate <= now())
                                                                                  ->  Bitmap Index Scan on idx_subscriber_status_deletiondate  (cost=0.00..100664.35 rows=9051037 width=0)
                                                                                        Index Cond: (deletiondate > now())
                                                                            ->  Parallel Hash  (cost=216653.82..216653.82 rows=1869682 width=9)
                                                                                  ->  Parallel Seq Scan on subscriber  (cost=0.00..216653.82 rows=1869682 width=9)
                                                                      ->  Parallel Hash  (cost=413397.43..413397.43 rows=1748179 width=12)
                                                                            ->  Parallel Hash Left Join  (cost=228746.63..413397.43 rows=1748179 width=12)
                                                                                  Hash Cond: (lnk_subscriber_basedata.basedata_id = basedata.id)
                                                                                  ->  Parallel Seq Scan on lnk_subscriber_basedata  (cost=0.00..180061.84 rows=1748179 width=12)
                                                                                        Filter: ((now() >= activationdate) AND (now() < deletiondate))
                                                                                  ->  Parallel Hash  (cost=201487.39..201487.39 rows=2180739 width=8)
                                                                                        ->  Parallel Seq Scan on basedata  (cost=0.00..201487.39 rows=2180739 width=8)
                                        ->  Hash  (cost=1434914.41..1434914.41 rows=692170 width=36)
                                              ->  Subquery Scan on subscriber_options  (cost=1345069.22..1434914.41 rows=692170 width=36)
                                                    ->  HashAggregate  (cost=1345069.22..1427992.71 rows=692170 width=36)
                                                          Group Key: lnk_subscriber_option.subscriber_id
                                                          Planned Partitions: 8
                                                          ->  Index Scan using idx_option_subscriber_combined on lnk_subscriber_option  (cost=0.57..1033129.48 rows=9506735 width=8)
                                                                Index Cond: ((activationdate <= now()) AND (deletiondate > now()))
                                  ->  Parallel Hash  (cost=46189.54..46189.54 rows=150946 width=8)
                                        ->  Parallel Hash Join  (cost=14649.64..46189.54 rows=150946 width=8)
                                              Hash Cond: (lnk_subscription_status.subscription_id = subscription.id)
                                              ->  Hash Join  (cost=1.27..31144.93 rows=150946 width=4)
                                                    Hash Cond: (lnk_subscription_status.subscriptionstatus_id = subscriptionstatus.id)
                                                    ->  Parallel Seq Scan on lnk_subscription_status  (cost=0.00..30421.01 rows=211324 width=8)
                                                          Filter: ((activationdate <= now()) AND (deletiondate > now()))
                                                    ->  Hash  (cost=1.14..1.14 rows=10 width=4)
                                                          ->  Seq Scan on subscriptionstatus  (cost=0.00..1.14 rows=10 width=4)
                                                                Filter: activestate
                                              ->  Parallel Hash  (cost=11245.39..11245.39 rows=272239 width=8)
                                                    ->  Parallel Seq Scan on subscription  (cost=0.00..11245.39 rows=272239 width=8)
          ->  Index Scan using idx_lnk_subscriber_msisdn_subscriber_deletiondate on lnk_subscriber_msisdn  (cost=0.44..1.87 rows=1 width=8)
                Index Cond: ((subscriber_id = subscriber.id) AND (deletiondate > now()))
                Filter: (activationdate <= now())
    ->  Index Scan using pkey_msisdn on msisdn  (cost=0.43..0.53 rows=1 width=20)
          Index Cond: (id = lnk_subscriber_msisdn.msisdn_id)


For now our plan is to wait for the next 18.x stable release, and then reenable
enable_parallel_hash. Then see if it occours again.

As a side-note, my colleagues noted that the problematic query has 
drastic better timings with enable_parallel_hash = off. Something on our list
for future exploration.

Best,
Chris







reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: BUG #19416: Backend SIGSEGV in ExecShutdownHashJoin/ExecHashTableDetach/dsa_free
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox