Re: scale parallel_tuple_cost by tuple width

public inbox for [email protected]  
help / color / mirror / Atom feed

From: David Rowley <[email protected]>
To: Tom Lane <[email protected]>
Cc: Andrew Dunstan <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Subject: Re: scale parallel_tuple_cost by tuple width
Date: Tue, 31 Mar 2026 11:51:35 +1300
Message-ID: <CAApHDvpOPs-Ywcze5=eyi4s5hO1NM9RA8No20Q=s+0L3LiorHw@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>

On Tue, 31 Mar 2026 at 03:17, Tom Lane <[email protected]> wrote:
>
> Andrew Dunstan <[email protected]> writes:
> > While investigating a performance issue, I found that it was extremely
> > difficult to get a parallel plan in some cases due to the fixed
> > parallel_tuple_cost. But this cost is not really fixed - it's going to
> > be larger for larger tuples. So this proposal adjusts the cost used
> > according to how large we expect the results to be.
>
> Unfortunately, I'm afraid that this is going to produce mostly
> "garbage in, garbage out" estimates, because our opinion of how wide
> tuples-in-flight are is pretty shaky.  (See get_expr_width and
> particularly get_typavgwidth, and note that we only have good
> statistics-based numbers for plain Vars not function outputs.)
> I agree that it could be useful to have some kind of adjustment here,
> but I fear that linear scaling is putting way too much faith in the
> quality of the data.

(I suspect you're saying this because of the "Benchmark 2" in the text
file, which contains aggregates which return a varlena type, of which
we won't estimate the width very well for...)

Sure, it's certainly true that there are cases where we don't get the
width estimate right, but that's not stopped us using them before. So
why is this case so much more critical?  We now also have GROUP BY
before join abilities in the planner, which I suspect must also be
putting trust into the very same thing. Also, varlena-returning
Aggrefs aren't going to be the Gather/GatherMerge targetlist, so why
avoid making improvements in this area because we're not great at one
of the things that could be in the targetlist?

For the patch and the analysis: This reminds me of [1], where some
reverse-engineering of costs from query run-times was done, which
ended up figuring out what we set APPEND_CPU_COST_MULTIPLIER to. To
get that for this case would require various tests with different
tuple widths and ensuring that the costs scale linearly with the
run-time of the query with the patched version. Of course, the test
query would have to have perfect width estimates, but that could be
easy enough to do by populating a text typed GROUP BY column and
populating that with all the same width of text for each of the tests
before increasing the width for the next test, using a fixed-width
aggregate each time, e.g count(*). The "#define
PARALLEL_TUPLE_COST_REF_WIDTH 100" does seem to be quite a round
number. It would be good to know how close this is to reality.
Ideally, it would be good to see results from an Apple M<something>
chip and recent x86. In my experience, these produce very different
performance results, so it might be nice to find a value that is
somewhere in the middle of what we get from those machines. I suspect
having the GROUP BY column with text widths from 8 to 1024, increasing
in powers of two would be enough data points.

David

[1] https://postgr.es/m/CAKJS1f9UXdk6ZYyqbJnjFO9a9hyHKGW7B=ZRh-rxy9qxfPA5Gw@mail.gmail.com

view thread (6+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected]
  Subject: Re: scale parallel_tuple_cost by tuple width
  In-Reply-To: <CAApHDvpOPs-Ywcze5=eyi4s5hO1NM9RA8No20Q=s+0L3LiorHw@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox