MIME-Version: 1.0
References: 
 <CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com>
 <CAKZiRmxtJAFG7e1+Vs9B8ngON=AOzJbuws+1ZeH4LsbJh5AzoQ@mail.gmail.com>
 <CA+TgmoY9Ne_Sh10u6LSPc3wvOQPLp3kF9nBp3nqJEG2JGF2QiA@mail.gmail.com>
 <CA+Tgmoa57S6mP=aTOXH2-gDAL4TMO1WbGgrHSg0s6J4zUH=04g@mail.gmail.com>
 <DEB1SAGVDX7S.JMV96X4R0RCS@gmail.com>
 <CA+Tgmoaf__2B0BUL+vrg28P+3buX=Ti-kybqkHiLTtFrrCfzuA@mail.gmail.com>
 <CA+TgmoYpcLNOuypOTdgCSLW7FuA=t6BtB3meTARHX2-Dj_81xQ@mail.gmail.com>
 <DEFNP3WX7BLX.1X7DS1P9X6JIF@nmfay.com>
 <CA+TgmoZjv9OyFu1Gkt78w0vWEti8S33w8trYHmErf-GMmGSi=w@mail.gmail.com>
 <DELPC4B56Y5A.3OBZ3WCH3T4VX@nmfay.com>
 <CA+TgmoaOSBQD9Ux4eG40w723ZN=c0J7p-+oX4+J8urUeyLMo5w@mail.gmail.com>
 <CAOYmi+=g+MMoOpWkk2weXWKJcKH0eKey8gKHHdH0dF4Tiawrhw@mail.gmail.com>
 <CA+TgmobwaT=PXPDDrgDup+jA8KHBbkxigtziD-zNzAKKkQYVgQ@mail.gmail.com>
 <CAOYmi+mOmEW=amDRQMfw6-Fb3ZmDEQFaJzwk8Bc8W8DzaP85XQ@mail.gmail.com>
 <CA+TgmoaX2AMW4cdFM3OngBJxmxpkdmzF33R7-CWhvRLfucbFMg@mail.gmail.com>
 <CAOYmi+k4AyWCQHK=XVF99KVDuFkqxcADao61OWGLxu0nRYMONQ@mail.gmail.com>
 <CA+TgmoZ0x3ym_oueXRWzbM_=6ucKoPZVGj3rRMLBDC_FnetXDw@mail.gmail.com>
 <CAP53Pkycc=7N2bLzVT3x+qE1JamvRZWev5tFjdLJ1+-AV3Di+Q@mail.gmail.com>
 <CA+TgmoaKhuD91RnazbRyGkmP7--JdNq8oNDC3UcgTZSWbMxC7w@mail.gmail.com>
 <CAP53Pkw5-wMEeDJXFmqo_RTyL_spzCXb7HHKrbSnQqokVoZcNQ@mail.gmail.com>
 <CA+Tgmob-69bzbdi3U_QtebqAf6u1y8js=5=oNK639csVe1VbhA@mail.gmail.com>
 <CA+TgmoaZMOikxK=LqS+Jn+835h9S139JLGk-3LyETVXw5W5j=w@mail.gmail.com>
 <1136161.1769654478@sss.pgh.pa.us>
 <CAP53PkwZ1ZTMARKg6iEfAw9qzBhkjBitj-9gr_Jvy7k2AwGgWA@mail.gmail.com>
 <CAMbWs4--NuEUFE_xTo991TRXaZryE29jarJPDnVxoaQOYdt7tA@mail.gmail.com>
 <CA+TgmobzR+XMGbRosVPbjHbSo4+cgJn=qZK6w05aF1sbj=C+9Q@mail.gmail.com>
 <CA+TgmoawzvCoZAwFS85tE5+c8vBkqgcS8ZstQ_ohjXQ9wGT9sw@mail.gmail.com>
 <CA+TgmoYS4ZCVAF2jTce=bMP0Oq_db_srocR4cZyO0OBp9oUoGg@mail.gmail.com>
 <CAK98qZ2RzbgCHrSg4zLkvpzyBam_X6te-KF8w1+_vON9BAVMEw@mail.gmail.com>
 <CA+TgmoaCdsuvNn6T6SfQ_0YD2Hh2+hgTXh9fTGHQhPg1zvy2rQ@mail.gmail.com>
 <CAK98qZ1zWzRB0ABG7ULzTeWKRR5C7-FxLqM-6v8wQDiFM+DNAg@mail.gmail.com>
 <CA+Tgmob7ozJAs5SU6bD2RfAt4w_AmsMGz-aaVA6WeLXHkBypOg@mail.gmail.com>
 <CAK98qZ1J42RoAsHnYWGPPmHziZmzmqE=Lp_O6WJ-9aKK2qjikA@mail.gmail.com>
 <CA+TgmoYjcBA6dw3nwiyfDzPXTCrxTZPXDMrc2TrDJcL1cPK6iA@mail.gmail.com>
 <CA+TgmoYru-vxoTKfwjQby30r2OkTXfb18Km_=VLs6qk8Akr0-g@mail.gmail.com>
 <CAKFQuwYxTMx1e1dvVLd5vs8CO2Xe__YRCW0m4v2UErG=5Lu7kQ@mail.gmail.com>
 <CA+TgmoaPH3rOmH1zw8s10Az3TQKRbDnF3dKYZCAFvx_DgFV5_Q@mail.gmail.com>
 <CAKFQuwZ7CRbvmJ60EjvEUJP8X9hWVCHJz8yJZ0-xRpYwL=fugA@mail.gmail.com>
 <CA+TgmoYsG2pAOG4YYWwnGsaL+QfOZopi_m+f_hZqBZ3meNEkjg@mail.gmail.com>
 <CAKFQuwY4i6xXCRHd=ccrFz0sQOfiWRxhobiBrzyK0rzu1-Y33Q@mail.gmail.com>
 <CA+Tgmobbj_TaCsYmr1grJBTRKaFaxFfotXn1T6LSXs9GQ8_Kyw@mail.gmail.com>
 <CA+TgmoZzM2i+p-Rxdphs4qx7sshn-kzxF91ASQ5duOo0dFRXLQ@mail.gmail.com>
 <CAP53PkzKeD=t90OfeMsniYrcRe2THQbUx3g6wV17Y=ZtiwmWTQ@mail.gmail.com>
In-Reply-To: 
 <CAP53PkzKeD=t90OfeMsniYrcRe2THQbUx3g6wV17Y=ZtiwmWTQ@mail.gmail.com>
From: Robert Haas <robertmhaas@gmail.com>
Date: Mon, 16 Mar 2026 16:51:09 -0400
Message-ID: 
 <CA+TgmoYMpO=+EQwDhMoH=zmTGbB6TD5-_9bciNEoDArP8JjSMw@mail.gmail.com>
Subject: Re: pg_plan_advice
To: Lukas Fittl <lukas@fittl.com>
Cc: "David G. Johnston" <david.g.johnston@gmail.com>,
	PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: 
 <https://www.postgresql.org/message-id/CA%2BTgmoYMpO%3D%2BEQwDhMoH%3DzmTGbB6TD5-_9bciNEoDArP8JjSMw%40mail.gmail.com>
Precedence: bulk

On Fri, Mar 13, 2026 at 4:39=E2=80=AFAM Lukas Fittl <lukas@fittl.com> wrote=
:
> From a code review perspective, I found some minor issues:
>
> - Identifiers that are named like advice types cause an error (so I
> can't name my table "hash_join")

Thanks, I extracted and committed your fix for that issue.

> - pgpa_destroy_join_unroller does not free the inner_unrollers values
> (which I think goes against the intent of cleaning up the allocations
> right away)

Yeah, that makes sense.

> - pgpa_parser uses pgpa_yyerror, but that function doesn't use
> elog(ERROR) like the main parser does - I think it'd be more
> consistent to use YYABORT explicitly, e.g. how we do it in cubeparse

Perhaps, but I think this needs more thought, because you haven't done
anything about the calls in pgpa_scanner.l. I think right now, the
operating principle is that parsing continues after an error and that
we mustn't crash as a result, though the resulting tree will probably
be invalid for practical use. If we're going to change that to stop on
the spot, I think we should probably do it for both lexing and
parsing, and think about whether that leads to any other changes or
simplificatons.

> - pgpa_scanner accepts integers with underscores, but incorrectly uses
> a simple strtoint on them (which would fail), instead of pg_strtoint32
> / pg_strtoint32_safe

OK.

> - pgpa_walk_recursively uses "0" for the boolean value being passed to
> a recursive call in one case (should be "false")

OK.

> From a usability perspective, I do wonder about two things when it
> comes to users specifying advice directly (which will happen in
> practice, e.g. when debugging plan problems):
>
> 1) The lack of being able to target scan advice (e.g. SEQ_SCAN) to a
> partition parent is frustrating - I believe we discussed partitioning
> earlier in the thread in terms of gathering+applying it, but I do
> wonder if we shouldn't at least allow users to specify a partitioned
> table/partitioned index, instead of only the children. See attached
> nocfbot-0002 for an idea what would be enough, I think.

I'm not on board with this without a lot more study. I've been down
this road before, and it can easily end in tears. Examining the
partition structure on the fly can have a performance cost, and it
might even have security ramifications or at least bugs if there are
concurrent modifications to the partitioning structure happening.
Also, the test_plan_advice framework doesn't do much to tell us
whether this actually works. Also, I understand the frustration and
I'm sure we'll want to introduce various forms of wildcards, but I
think there will be a lot of opinions about that should actually look
like. One can, as you've done here, follow index links from child to
parent. One can do a wildcard match on the index name. One could want
to specify an index on a particular column rather than a specific
name, to survive index renamings. I wouldn't be surprised if there are
other ideas, too. Three weeks before feature freeze is not the time to
be taking an opinionated position on what the best answers will
ultimately turn out to be. It's easy to write a tool that will spit
out matching index advice for all indexes involving in a partitioning
hierarchy, and I think that's what people should do for now. Or,
perhaps they should use the generated advice from actual plans instead
of writing hand-crafted advice.

We always have the option to add more to this in the future, but
taking things out is not half so easy.

> 2) I find the join strategy advice hard to use - pg_hint_plan has
> hints (e.g. HashJoin) that allow saying "use a hash join when joining
> these two rels in any order", vs pg_plan_advice requires setting a
> specific outer rel, which only makes sense when you want to fully
> specify every aspect of the plan. I suspect users who directly write
> advice will struggle with specifying join strategy advice as it is
> right now. We could consider having a different syntax for saying "I
> want a hash join when these two rels are joined, but I don't care
> about the order", e.g. "USE_HASH_JOIN(a b)". If you think that's
> worthwhile I'd be happy to take a stab at a patch.

I'd be inclined to classify that as a design error in pg_hint_plan,
but maybe I'm just not understanding something. Under what
circumstances would you know that you wanted two tables to be joined
via a hash join but not care which one was on which side of the join?
The rels on the two sides are treated very differently. One of them
needs to be small enough to fit in the hash table and should be one
where repeated index lookups aren't better (else, you should be
advising a nested loop and an index scan). The other can be basically
anything. What I know (in a situation where I might write some advice
manually) is that I want a certain table to be the one that goes into
the hash table, not that there should be a hash join someplace in the
general vicinity of one of the tables.

Also, there's a definitional question here. What exactly does
USE_HASH_JOIN(a b) mean? Possible definitions:

1. "a" must be joined directly to "b" without any other tables on
either side, and a hash join must be used to perform that join.
2. either "a" must appear on the inner side of a hash join with "b"
somewhere on the other side, possibly accompanied by other tables, or
the reverse
3. the plan must contain at least one hash join where "a" and "b"
appear on opposite sides of the join

Suppose I have a fact table f and three dimension tables d1, d2, and
d3. If I wrote USE_HASH_JOIN(f d1) USE_HASH_JOIN(f d2) USE_HASH_JOIN(f
d3), what happens? Under definition 1, the advice fails, because f
must be first joined to either d1, d2, or d3, and whichever one is
chosen, the other advice can't now be satisfied. Under definition 2, I
will definitely end up with hash joins all the way through, but it's
possible that the driving table will be one of the dimension tables,
which can be first joined to f and then to the other tables. Under
definition 3, I'm not even guaranteed to end up with all hash joins: I
can join d1, d2, and d3 to each other in any way I like, and then hash
join the result of that to f in either direction, and the rule is
satisfied for all three advice items.

None of those possibilities sound right. I argue that definitions 1
and 3 produce such absurd results in this scenario that they're not
even worth any further consideration, but I can see someone arguing
that definition 2 doesn't sound so bad. After all, you could still fix
it to achieve the probably-expected outcome by also writing
JOIN_ORDER(f). But that only works here because we know what we want
the driving table to me. Suppose alternatively that we have two large
tables B1 and B2 and a small table S, and we've figured out that the
planner tends to like to use the index on table S when it really ought
to be using a hash join, but we trust its judgement as to how to join
B1 and B2. With HASH_JOIN() as I've implemented it, that's easy: just
write HASH_JOIN(S). With your USE_HASH_JOIN, how do I do that exactly?
If I write USE_HASH_JOIN(B1 S), definition 2 permits a plan like this:

Hash Join
  -> Nested Loop
    -> Seq Scan on B2
    -> Index Scan on S
  -> Hash
    -> Seq Scan on B1

To me, that looks a heck of a lot like my USE_HASH_JOIN() locution
just didn't do anything. It certainly didn't do what I intended it to
do, and the only way I can make sure that something like this doesn't
happen is to *also* constrain the join order. If I'm willing to decide
which of B1 and B2 should be the driving table, then I can write
JOIN_ORDER(B1 B2 S) or JOIN_ORDER(B2 B1 S) and now everything is
fixed. But this seems 100% backwards given your stated goal: you want
to be able to constraint the join method without constraining the join
order. As I see it, the problem here is that a symmetric USE_HASH_JOIN
directive either uses something like definition 1 which is too tight a
constraint, or definition 2 or 3 which are extremely weak constraints
that essentially allow the planner to satisfy the constraint using
some other part of the plan tree.

Now, of course, I got to pick these examples, so I picked examples
that prove my point. Maybe there are examples where a "one side or the
other" constraint actually works better. But I don't know what those
examples are. When I've experimented this kind of thing, I've found
that I never get the results that I want because the planner just does
something stupid that technically satisfies the constraint but is
nothing like what I actually meant. If you know of examples where my
definitions suck and the "one side or the other" definition produces
great results, I'd love to hear about them ... although I would have
loved to hear about them even more 4.5 months ago when I first posted
this patch set and already had the phrase "useless in practice" in the
README on exactly this topic. This is exactly why I put the patches up
for design review before they were fully baked.

> For v20-0001, from a quick conceptual review:
>
> I find the two separate GUC mechanisms for local backend vs shared
> memory a bit confusing as a user (which one should I be using?).
> Enabling the shared memory mechanism on a system-wide basis seems like
> it'd likely have too high overhead anyway for production systems?
> (specifically worried about the advice capturing for each plan, though
> I haven't benchmarked it)
>
> I wonder if we shouldn't keep this simpler for now, and e.g. only do
> the backend local version to start - we could iterate a bit on
> system-wide collection out-of-core, e.g. I'm considering teaching
> pg_stat_plans to optionally collect plan advice the first time it sees
> a plan ID (plan advice is roughly a superset of what we consider a
> plan ID there), and then we could come back to this for PG20.

The shared version is rather useful for testing, though. That's
actually why I created it initially: turn on the shared collector, run
the regression tests, and then use SQL to look through the collector
results for interesting things. You can't do that with the local
collector.

> To help assess impact, I did a quick test run and looked at three
> not-yet-committed patches in the commitfest that affect planner logic
> ([0], [1] and [2]), to see if they'd require pg_plan_advice changes
> (master with v20-0002 applied). Maybe I picked the wrong patches, but
> at least with those no pg_plan_advice changes were needed with the
> test_plan_advice test enabled.

Nice.

> On the code itself:
> - Is there a reason you're setting
> "pg_plan_advice.always_store_advice_details" to true, instead of using
> pg_plan_advice_request_advice_generation?
> - I wonder if we could somehow detect advice not matching? Right now
> that'd be silently ignored, i.e. you'd only get a test failure when we
> generate the wrong advice that causes a plan change in the regression
> tests.

I think it already does more than what you seem to be thinking:
test_plan_advice checks the advice feedback, too. However, it's also
true that even more could be done. The code proposed here checks that
all advice is /* matched */ without being /* failed */ or /*
inapplicable */, but I have code locally that checks the reverse,
namely that every decision that, during the re-plan, every decision
could have been constrained by advice actually was. I felt it was a
bit too late to add that to what I was submitting for v19, but that
decision could certainly be revisited. Taking it even further, we
could do structural comparisons of the before-and-after plans, or we
could search for disabled nodes aside from what default_pgs_mask
should imply. I'm not very confident that those things are worth the
code they would take to implement, though. The existing checks found a
lot of bugs, but I think whatever is still wrong is probably not
super-likely to be found by additional cross-checks. That could be
wrong; it's just a hunch.

> For v20-0003, initial thoughts:
>
> I think getting at least a basic version of this in would be good, as
> a server-wide way to set advice for queries can help people get out of
> a problem when Postgres behaves badly - and we know from pg_hint_plan
> (which has a hint table) that this can be useful even without doing
> any kind of parameter sniffing/etc to be smart about different
> parameters for the same query.

Yep.

> The name "stash" feels a bit confusing as an end-user facing term.
> Maybe something like "pg_apply_advice", or "pg_query_advice" would be
> better? (though I kind of wish we could tie it more closely to "plan
> advice", but e.g. "pg_plan_advice_apply" feels too lengthy)

I've heard that other people also find "stash" not as intuitive as it
could be, and I'm open to changing it. However, whatever we call this
has to make its relationship to pg_plan_advice clear. If we have
pg_plan_advice and pg_query_advice, which one is the core module and
which one is automatically supplying advice? You can't tell from the
name, which I think will be confusing. In the long run, I suspect we
will end up with a moderately-large pile of tools for either capturing
advice or automatically supplying it, and we should think about how
we'd like all those to get named. Right now, I suppose we might end up
with something like this:

pg_plan_advice: The core. There can only be one.

different ways of capturing advice strings: pg_collect_advice,
pg_capture_advice, pg_save_advice, pg_baseline_advice, ....

different ways of supplying advice strings: pg_stash_advice,
pg_auto_advice, pg_lookup_advice, pg_store_advice,
pg_advice_from_query_comment, ...

I don't love this, because the pattern I see developing here is that
module names will either be very long or they'll just take a word from
an existing module name (like "collect") and replace it with a synonym
(like "capture"). That's not going to produce anything very mnemonic,
so it would be nice to do better. But I think the route to doing
better has to be to get more specific with the naming, not less. Like,
if we renamed pg_stash_advice to
pg_lookup_in_memory_advice_by_query_id, then the name tells you
EXACTLY what it does, which is great. Unfortunately, the name is also
very long, which is why I didn't go that route.

--=20
Robert Haas
EDB: http://www.enterprisedb.com