Re: Add a greedy join search algorithm to handle large join problems

public inbox for [email protected]  
help / color / mirror / Atom feed

From: John Naylor <[email protected]>
To: Tomas Vondra <[email protected]>
Cc: Chengpeng Yan <[email protected]>
Cc: lakshmi <[email protected]>
Cc: Pavel Stehule <[email protected]>
Cc: [email protected] <[email protected]>
Cc: Robert Haas <[email protected]>
Subject: Re: Add a greedy join search algorithm to handle large join problems
Date: Thu, 14 May 2026 12:59:55 +0700
Message-ID: <CANWCAZZkHwULLrZhyXbsyi5adXjwDTKFX0PZ9Z5pep1zh2=gAw@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<CANWCAZY529EPHyo1kLnEzjFBq-UaDPc3KErK=ApqDZZ1Oc-XHg@mail.gmail.com>
	<CAFj8pRCO5ocbr-wFWx5QsKdfkW-=XuQ6zkW5FES7ERQZQHtpwQ@mail.gmail.com>
	<[email protected]>
	<CAFj8pRASJuRQKHOoBTnR5aRUeRKpNAmrYQcBrQb=yqeZ_8me9Q@mail.gmail.com>
	<CAEvyyTi1M6JhHb6sR+xK-kp2bezMoADSC+RY2A+DbdEn+_BLxA@mail.gmail.com>
	<[email protected]>
	<[email protected]>
	<CANWCAZbbTazxGeMU=qdyi1kBr_Nkjv1n6vZR-hW30QbqVqkx1Q@mail.gmail.com>
	<[email protected]>
	<CANWCAZaetVkaZnR_fxw1DAUrSJ+sQ1PT6UFdwHarHUNNPzWueg@mail.gmail.com>
	<[email protected]>
	<CANWCAZZqm9wcn_W0S=V_WSt2h6zGUSQTfz3nFOUoj97=r5z_9A@mail.gmail.com>
	<[email protected]>

On Wed, May 13, 2026 at 6:54 PM Tomas Vondra <[email protected]> wrote:
>
> On 5/13/26 06:34, John Naylor wrote:
> > On Tue, May 12, 2026 at 8:42 PM Tomas Vondra <[email protected]> wrote:

> >> To got to the GEQO/GOO code, people have to adjust the limits, so that
> >> (join_collapse_limit >= geqo_threshold). AFAIK almost no one does that,
> >> so most join problems are smaller that geqo_threshold and so handled by
> >> regular DP (for smaller subproblems).
> >>
> >> But let's say someone adjusts the GUCs, gets to GOO, and it handles a
> >> prefix of the problem using the DP approach. How is that different from
> >> keeping the (join_collapse_limit < geqo_threshold) and not even getting
> >> to GOO? Why not to just leave join_collapse_limit to a low value?
> >
> > One flavor of the second idea above (found in the literature) is to
> > start with DP, and after some new GUC limit (under the hood: # of
> > times calling make_join_rel), pick one of the incomplete subproblems
> > and finish with a heuristic search. It switches strategy on the fly,
> > rather than choosing upfront. That's the difference from now, although
> > I'm not sure offhand how join_collase_limit chooses its subproblems
> > out of the bigger problem.
> >
>
> Not sure, for two reasons.
>
> How often do we expect this to be used? AFAIK GEQO is not used very
> often in practice, because geqo_threshold is so much higher than
> join_collapse_limit. So even huge joins get divided into problems
> handled by DP. We might invest a lot of time and complexity into
> something that gets used extremely rarely ...

That's a risk to keep in mind.

In cases where planning time is a substantial part of the runtime
(like in your thread on OLTP star joins), a user could lower the
threshold for greedy search, since it's fast. That's not an option
currently because GEQO is slow.

With a dynamic strategy, I imagine join_collapse_limit would mean
something different (one technique in the literature is to refine the
greedy output on subproblems of size 4-7), or be replaced with a
different tuning knob.

Anyway, these questions reinforce that this is potential follow-up work.

> Also, couldn't this easily lead to unpredictable planning behavior? Say
> the budget relies on counting "make_join_rel" calls, or something like
> that. Then someone adds or removes an index on one of the tables, and we
> suddenly start to consider fewer/more candidate orders. Or maybe someone
> refactors the code a little bit, moving the calls. And suddenly, we hit
> the threshold at a different place. But maybe it's OK for a heuristics?

That's a good question.

--
John Naylor
Amazon Web Services

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Add a greedy join search algorithm to handle large join problems
  In-Reply-To: <CANWCAZZkHwULLrZhyXbsyi5adXjwDTKFX0PZ9Z5pep1zh2=gAw@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox