public inbox for [email protected]  
help / color / mirror / Atom feed
From: =?utf-8?B?Y2NhNTUwNw==?= <[email protected]>
To: =?utf-8?B?Sm9obiBOYXlsb3I=?= <[email protected]>
Cc: =?utf-8?B?U2FtaSBJbXNlaWg=?= <[email protected]>
Cc: =?utf-8?B?SGVpa2tpIExpbm5ha2FuZ2Fz?= <[email protected]>
Cc: =?utf-8?B?cGdzcWwtaGFja2Vycw==?= <[email protected]>
Subject: Re: Support loser tree for k-way merge
Date: Mon, 8 Dec 2025 14:46:40 +0800
Message-ID: <[email protected]> (raw)
In-Reply-To: <CANWCAZZ6KpsL+oEJFyQxqMMNv7FHRxPqUGafn9sqJzg_BmH5Qg@mail.gmail.com>
References: <[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<CAA5RZ0u=WDa-v+8Y_yE4WeRS8VtmcmQGuWVjZFCqZcXx-hEP3w@mail.gmail.com>
	<CANWCAZYKgc4xFbBXtePw_XY+a+Ao_-CxZ+Z8YWSrK2B1HqtdWg@mail.gmail.com>
	<[email protected]>
	<[email protected]>
	<CANWCAZZ6KpsL+oEJFyQxqMMNv7FHRxPqUGafn9sqJzg_BmH5Qg@mail.gmail.com>

> I summarized the number of comparisons needed for different 'k':
>
> k = 2, heap: 1, loser tree: 1
> k = 3, heap: 2, loser tree: [1, 2]
> k = 4, heap: [2, 3], loser tree: 2
> k = 5, heap: [2, 4], loser tree: [2, 3]
>
> So if k < 5, the loser tree is never worse than the heap for any input data.

> Please explain your notation. For starters, does "comparison" refer to
> sortkey comparison or does it include checking for sentinel?

The "k" is "k-way merge", and the "comparisons" are the number of tuple comparisons
during one heap's sift down or loser tree's adjustment.

> If loser tree can't early return, why is the number not always a constant?

Because the leaf nodes are in the bottom two levels of the loser tree, and different levels
have different number of comparisons.

> If "k" is very small, I'm guessing the merge step is small compared to
> sorting the individual runs, in which case it matters less which one
> to use. That's just a guess, though -- we need structured testing.

Yeah, the loser tree at most reduce one tuple comparison in each adjustment if k < 5.

> If you, the patch author, cannot demonstrate how to choose this
> setting, what makes you think our users can? (That said, a temporary
> GUC is useful for testing)

Yeah, users may have no ideas about the characteristics of their data, so it's hard for them
to choose the setting.

> Here's a half-baked idea: If the regressions are mostly in
> low-cardinality inputs, is it possible to add a fast path that just
> checks if the current key is the same as the last one?

For heap, it reduces one tuple comparison if the keys are same and increase one if not.
For loser tree, it reduces many tuple comparisons (maybe tree's height - 1?) if the keys
are same and increase one if not. The bad case is all keys are different, so we still need
to decide when to use the fast path, it's hard I think.

--
Regards,
ChangAo Chen


reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Support loser tree for k-way merge
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox