From: Tomas Vondra <[email protected]>
To: Chengpeng Yan <[email protected]>
Cc: [email protected] <[email protected]>
Cc: John Naylor <[email protected]>
Subject: Re: Add a greedy join search algorithm to handle large join problems
Date: Wed, 10 Dec 2025 00:30:47 +0100
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>



On 12/9/25 20:20, Tomas Vondra wrote:
> On 12/2/25 14:04, Chengpeng Yan wrote:
>> Hi,
>>
>>
>>
>>> On Dec 2, 2025, at 18:56, Tomas Vondra <[email protected]> wrote:
>>>
>>> I think a much broader evaluation will be needed, comparing not just the
>>> planning time, but also the quality of the final plan. Which for the
>>> starjoin tests does not really matter, as the plans are all equal in
>>> this regard.
>>
>>
>> Many thanks for your feedback. 
>>
>> You are absolutely right — plan quality is also very important. In my
>> initial email I only showed the improvements in planning time, but did
>> not provide results regarding plan quality. I will run tests on more
>> complex join scenarios, evaluating both planning time and plan quality.
>>
> 
> I was trying to do some simple experiments by comparing plans for TPC-DS
> queries, but unfortunately I get a lot of crashes with the patch. All
> the backtraces look very similar - see the attached example. The root
> cause seems to be that sort_inner_and_outer() sees
> 
>     inner_path = NULL
> 
> I haven't investigated this very much, but I suppose the GOO code should
> be calling set_cheapest() from somewhere.
> 

FWIW, after looking at the failing queries and tweaking them a bit, the
issue seems to be related to aggregates in the select list. For example,
this TPC-DS query (Q7) fails:

 select  i_item_id,
        avg(ss_quantity) agg1,
        avg(ss_list_price) agg2,
        avg(ss_coupon_amt) agg3,
        avg(ss_sales_price) agg4
 from store_sales, customer_demographics, date_dim, item, promotion
 where ss_sold_date_sk = d_date_sk and
       ss_item_sk = i_item_sk and
       ss_cdemo_sk = cd_demo_sk and
       ss_promo_sk = p_promo_sk and
       cd_gender = 'F' and
       cd_marital_status = 'W' and
       cd_education_status = 'Primary' and
       (p_channel_email = 'N' or p_channel_event = 'N') and
       d_year = 1998
 group by i_item_id
 order by i_item_id
 LIMIT 100;

but if I remove the aggregates, it plans just fine:

 select i_item_id
 from store_sales, customer_demographics, date_dim, item, promotion
 where ss_sold_date_sk = d_date_sk and
       ss_item_sk = i_item_sk and
       ss_cdemo_sk = cd_demo_sk and
       ss_promo_sk = p_promo_sk and
       cd_gender = 'F' and
       cd_marital_status = 'W' and
       cd_education_status = 'Primary' and
       (p_channel_email = 'N' or p_channel_event = 'N') and
       d_year = 1998
 group by i_item_id
 order by i_item_id
 LIMIT 100;

The backtrace matches the one I already posted, so I'm not going to post
it again.

I looked at a couple more failing queries, and removing the aggregates
fixes them too. There may be other issues/crashes, of course.


regards

-- 
Tomas Vondra





