Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w2EuN-00075a-2l for pgsql-hackers@arkaria.postgresql.org; Mon, 16 Mar 2026 20:51:28 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w2EuM-00Co8u-0u for pgsql-hackers@arkaria.postgresql.org; Mon, 16 Mar 2026 20:51:26 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w2EuL-00Co8m-2x for pgsql-hackers@lists.postgresql.org; Mon, 16 Mar 2026 20:51:25 +0000 Received: from mail-ej1-x62a.google.com ([2a00:1450:4864:20::62a]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1w2EuI-00000000UHk-3f7o for pgsql-hackers@lists.postgresql.org; Mon, 16 Mar 2026 20:51:25 +0000 Received: by mail-ej1-x62a.google.com with SMTP id a640c23a62f3a-b93698bb57aso24318866b.0 for ; Mon, 16 Mar 2026 13:51:22 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1773694282; cv=none; d=google.com; s=arc-20240605; b=FAs41TcgQThTCpDIRQdk3kkhiQk5ulS0KDbtuat62huTAgQJLWLksucnvRxkZlabsS moVAe6OtyJ/2i0mCqM2H6JjljTsQMo6v1f3iJ6WwLz+0uxkWLoQbdu5/wcCmOOXXpbgW BoW6Ty4UMu8ykRF1n5xMnf7MbA3TI6n9RVkqy7OCiCf1eUV+fMr1MAIwNtgxLbbXsYBY j34EkleERpUg0sDb9cCGk/iIhFPIoc37A72Mub+IDRiHE2G1888m2FkerXI6RApin5TA VSXCXS5iVSPq2eEsAsBtl/fTO55Fhz0iQYMAHSX2l0PjqzGryyLTdjOyGOoQRWp9ip29 Rt7A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=GxehdV9qwH2TnZg/w/FfMCe1AoTSFgt4nzRcQzdIK5M=; fh=o5P7BajRZxzWUSmWpez+NBfDAJ1QUSSPXF09zFt+abQ=; b=kZYgqX/jOWroRi6PNlvvcJIIvWBzc+O6hOMW9KeAobubfgMMk9N/cL5nPDcNeDwa9c 9jbXrgrW//pgQQIo1SfdKlotfGqwSHYTkisVcFvD0VCwRJ9n9Co0rhWkjQDpWzIhZeHR q/gSDrSIdifRE8EANvtH8Ku6XxrFVB5+bu34oLkpSjnGNwqJkXMgcNMAlfqDZM7f55UF wEKNJvn9QghLu9XdzZpGR7/BXMRdioD3JZqYrUgPG3EBM+bz5UnBf5xAzEYU/aG1Eccw Lpm2gQQxJW8zXwTLUx9hdEYZrue4OnoxMkCdhSVsRmaugou0ZrXpRwkh9xE4rh10TCUJ DGmQ==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1773694282; x=1774299082; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=GxehdV9qwH2TnZg/w/FfMCe1AoTSFgt4nzRcQzdIK5M=; b=cBeIKoPxdGWB/Oc98wsenX3RbvBZ8HxTsacuYL+J5Bqo7gLQh737y/a+EkshIlqecd ZBF7x4/jt5+JsJrJ6WGGZhXIn1BvvatywUoFkvyK087sr82G0Hzb4wB1Le/Og9LJoMo1 O12yvkhOltwYrEux9cGRX5LstGfIuhO3oel7nBSPWgvHyjMoezotq5lzoHgyWuFkMURM m/ALU4EgFtUqW6mdqvdAoTccDcicMy2+O9RtcWa8yZ18Y1Nq9iQHs3q/xHbLF2tumING N+zUwXP82c8RT6zRfo7C9E/Ds+WZXumkU9PnIpW96WAaSauTy/2LVIciuK5+3EiVgywi ZRtw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1773694282; x=1774299082; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=GxehdV9qwH2TnZg/w/FfMCe1AoTSFgt4nzRcQzdIK5M=; b=Tl4/n3eYHXV1X2gPNunK7YGsNEiQ2CVxSse8P9rI4KjYyNXKolScEZxQLwp9w9QYFZ IV/3lMOfbdDWdna2FLka2GavCEjzgqPZH6Wkz/gtfk7i9+++BM1oYmRKtX6t/au7e8TG qcKkqolGq+sq9uX5OcTMW98E/iyfuFEswjT0K7DTfbdlMwXbpOcNXGb3HTDMOqUKeXdy GpY0/9HSXQwVF5Y1Ne5IGEoCUXeNdH32Rmmngk8+p/1v64J25J9YR+J3F4PBVGChlNMW DETnzN3k/501ruMxVBFDqY+BtNiqFpsh3XLqIndDmUVqz4/tEudTNzIwls3z3Zsg5aWn qN+A== X-Forwarded-Encrypted: i=1; AJvYcCUKjZ2IB6vmP0ACj0WmxM5Te/slEXGGcNTLaWnf6LXnDGlMg09CXRgJneBRlW7q1tF2oj+raRvZwuOHCh0M@lists.postgresql.org X-Gm-Message-State: AOJu0Yxwu4a5cNfjXCo3JUPLG8x5rFNQy6kPqKA8p13W1slTMVn5Ku2q faowdsMaXa8Z7CiCPe89Q8omFO8aoI3Wxb7Dvg0W8jGlhEuts18shHEEC/Ew+m31zrqtqy+wHOO 8csxGzuwmIwc7i3I8+5C/f2RGQIncAbc= X-Gm-Gg: ATEYQzyx2CmKHDPcy1QAFpMBfj1pAzdI6J1EOMzT4XMUlKwFcQM3qQgNAeDWa7QM930 nlMK7T1FFOd/KYeDpc/PXOPA7xqWxP2dBW6uQ8wGlQQGvno1AwgGVTUx1zPvVzrc+eKFZoET3qv 6pOwCncff5EXpjwmMJVcNr7d8CBUe3VHjoPVtGK7rFQOfqsJP1J3S58WQWggESYYNG69EwCh4qK +BCxSaBuGKT4dOQ9qRqpVqIkP+SHxXj2R8qMRAYl99go+DoltDIk/gIyo2PHYeYu8EDrJtZFdVN US8Af8kXAD78nNi7XNi/M+GcF3/L1Xu2V4j16WU= X-Received: by 2002:a17:907:968b:b0:b97:1d24:c000 with SMTP id a640c23a62f3a-b97d6dac0fdmr69262466b.24.1773694281343; Mon, 16 Mar 2026 13:51:21 -0700 (PDT) MIME-Version: 1.0 References: <1136161.1769654478@sss.pgh.pa.us> In-Reply-To: From: Robert Haas Date: Mon, 16 Mar 2026 16:51:09 -0400 X-Gm-Features: AaiRm51bI-7mmKWKhWGoZmvhHjQcgtWYDd-7DYYlfMf0B-Q0vSl5dX9wDiRYGWI Message-ID: Subject: Re: pg_plan_advice To: Lukas Fittl Cc: "David G. Johnston" , PostgreSQL Hackers Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Fri, Mar 13, 2026 at 4:39=E2=80=AFAM Lukas Fittl wrote= : > From a code review perspective, I found some minor issues: > > - Identifiers that are named like advice types cause an error (so I > can't name my table "hash_join") Thanks, I extracted and committed your fix for that issue. > - pgpa_destroy_join_unroller does not free the inner_unrollers values > (which I think goes against the intent of cleaning up the allocations > right away) Yeah, that makes sense. > - pgpa_parser uses pgpa_yyerror, but that function doesn't use > elog(ERROR) like the main parser does - I think it'd be more > consistent to use YYABORT explicitly, e.g. how we do it in cubeparse Perhaps, but I think this needs more thought, because you haven't done anything about the calls in pgpa_scanner.l. I think right now, the operating principle is that parsing continues after an error and that we mustn't crash as a result, though the resulting tree will probably be invalid for practical use. If we're going to change that to stop on the spot, I think we should probably do it for both lexing and parsing, and think about whether that leads to any other changes or simplificatons. > - pgpa_scanner accepts integers with underscores, but incorrectly uses > a simple strtoint on them (which would fail), instead of pg_strtoint32 > / pg_strtoint32_safe OK. > - pgpa_walk_recursively uses "0" for the boolean value being passed to > a recursive call in one case (should be "false") OK. > From a usability perspective, I do wonder about two things when it > comes to users specifying advice directly (which will happen in > practice, e.g. when debugging plan problems): > > 1) The lack of being able to target scan advice (e.g. SEQ_SCAN) to a > partition parent is frustrating - I believe we discussed partitioning > earlier in the thread in terms of gathering+applying it, but I do > wonder if we shouldn't at least allow users to specify a partitioned > table/partitioned index, instead of only the children. See attached > nocfbot-0002 for an idea what would be enough, I think. I'm not on board with this without a lot more study. I've been down this road before, and it can easily end in tears. Examining the partition structure on the fly can have a performance cost, and it might even have security ramifications or at least bugs if there are concurrent modifications to the partitioning structure happening. Also, the test_plan_advice framework doesn't do much to tell us whether this actually works. Also, I understand the frustration and I'm sure we'll want to introduce various forms of wildcards, but I think there will be a lot of opinions about that should actually look like. One can, as you've done here, follow index links from child to parent. One can do a wildcard match on the index name. One could want to specify an index on a particular column rather than a specific name, to survive index renamings. I wouldn't be surprised if there are other ideas, too. Three weeks before feature freeze is not the time to be taking an opinionated position on what the best answers will ultimately turn out to be. It's easy to write a tool that will spit out matching index advice for all indexes involving in a partitioning hierarchy, and I think that's what people should do for now. Or, perhaps they should use the generated advice from actual plans instead of writing hand-crafted advice. We always have the option to add more to this in the future, but taking things out is not half so easy. > 2) I find the join strategy advice hard to use - pg_hint_plan has > hints (e.g. HashJoin) that allow saying "use a hash join when joining > these two rels in any order", vs pg_plan_advice requires setting a > specific outer rel, which only makes sense when you want to fully > specify every aspect of the plan. I suspect users who directly write > advice will struggle with specifying join strategy advice as it is > right now. We could consider having a different syntax for saying "I > want a hash join when these two rels are joined, but I don't care > about the order", e.g. "USE_HASH_JOIN(a b)". If you think that's > worthwhile I'd be happy to take a stab at a patch. I'd be inclined to classify that as a design error in pg_hint_plan, but maybe I'm just not understanding something. Under what circumstances would you know that you wanted two tables to be joined via a hash join but not care which one was on which side of the join? The rels on the two sides are treated very differently. One of them needs to be small enough to fit in the hash table and should be one where repeated index lookups aren't better (else, you should be advising a nested loop and an index scan). The other can be basically anything. What I know (in a situation where I might write some advice manually) is that I want a certain table to be the one that goes into the hash table, not that there should be a hash join someplace in the general vicinity of one of the tables. Also, there's a definitional question here. What exactly does USE_HASH_JOIN(a b) mean? Possible definitions: 1. "a" must be joined directly to "b" without any other tables on either side, and a hash join must be used to perform that join. 2. either "a" must appear on the inner side of a hash join with "b" somewhere on the other side, possibly accompanied by other tables, or the reverse 3. the plan must contain at least one hash join where "a" and "b" appear on opposite sides of the join Suppose I have a fact table f and three dimension tables d1, d2, and d3. If I wrote USE_HASH_JOIN(f d1) USE_HASH_JOIN(f d2) USE_HASH_JOIN(f d3), what happens? Under definition 1, the advice fails, because f must be first joined to either d1, d2, or d3, and whichever one is chosen, the other advice can't now be satisfied. Under definition 2, I will definitely end up with hash joins all the way through, but it's possible that the driving table will be one of the dimension tables, which can be first joined to f and then to the other tables. Under definition 3, I'm not even guaranteed to end up with all hash joins: I can join d1, d2, and d3 to each other in any way I like, and then hash join the result of that to f in either direction, and the rule is satisfied for all three advice items. None of those possibilities sound right. I argue that definitions 1 and 3 produce such absurd results in this scenario that they're not even worth any further consideration, but I can see someone arguing that definition 2 doesn't sound so bad. After all, you could still fix it to achieve the probably-expected outcome by also writing JOIN_ORDER(f). But that only works here because we know what we want the driving table to me. Suppose alternatively that we have two large tables B1 and B2 and a small table S, and we've figured out that the planner tends to like to use the index on table S when it really ought to be using a hash join, but we trust its judgement as to how to join B1 and B2. With HASH_JOIN() as I've implemented it, that's easy: just write HASH_JOIN(S). With your USE_HASH_JOIN, how do I do that exactly? If I write USE_HASH_JOIN(B1 S), definition 2 permits a plan like this: Hash Join -> Nested Loop -> Seq Scan on B2 -> Index Scan on S -> Hash -> Seq Scan on B1 To me, that looks a heck of a lot like my USE_HASH_JOIN() locution just didn't do anything. It certainly didn't do what I intended it to do, and the only way I can make sure that something like this doesn't happen is to *also* constrain the join order. If I'm willing to decide which of B1 and B2 should be the driving table, then I can write JOIN_ORDER(B1 B2 S) or JOIN_ORDER(B2 B1 S) and now everything is fixed. But this seems 100% backwards given your stated goal: you want to be able to constraint the join method without constraining the join order. As I see it, the problem here is that a symmetric USE_HASH_JOIN directive either uses something like definition 1 which is too tight a constraint, or definition 2 or 3 which are extremely weak constraints that essentially allow the planner to satisfy the constraint using some other part of the plan tree. Now, of course, I got to pick these examples, so I picked examples that prove my point. Maybe there are examples where a "one side or the other" constraint actually works better. But I don't know what those examples are. When I've experimented this kind of thing, I've found that I never get the results that I want because the planner just does something stupid that technically satisfies the constraint but is nothing like what I actually meant. If you know of examples where my definitions suck and the "one side or the other" definition produces great results, I'd love to hear about them ... although I would have loved to hear about them even more 4.5 months ago when I first posted this patch set and already had the phrase "useless in practice" in the README on exactly this topic. This is exactly why I put the patches up for design review before they were fully baked. > For v20-0001, from a quick conceptual review: > > I find the two separate GUC mechanisms for local backend vs shared > memory a bit confusing as a user (which one should I be using?). > Enabling the shared memory mechanism on a system-wide basis seems like > it'd likely have too high overhead anyway for production systems? > (specifically worried about the advice capturing for each plan, though > I haven't benchmarked it) > > I wonder if we shouldn't keep this simpler for now, and e.g. only do > the backend local version to start - we could iterate a bit on > system-wide collection out-of-core, e.g. I'm considering teaching > pg_stat_plans to optionally collect plan advice the first time it sees > a plan ID (plan advice is roughly a superset of what we consider a > plan ID there), and then we could come back to this for PG20. The shared version is rather useful for testing, though. That's actually why I created it initially: turn on the shared collector, run the regression tests, and then use SQL to look through the collector results for interesting things. You can't do that with the local collector. > To help assess impact, I did a quick test run and looked at three > not-yet-committed patches in the commitfest that affect planner logic > ([0], [1] and [2]), to see if they'd require pg_plan_advice changes > (master with v20-0002 applied). Maybe I picked the wrong patches, but > at least with those no pg_plan_advice changes were needed with the > test_plan_advice test enabled. Nice. > On the code itself: > - Is there a reason you're setting > "pg_plan_advice.always_store_advice_details" to true, instead of using > pg_plan_advice_request_advice_generation? > - I wonder if we could somehow detect advice not matching? Right now > that'd be silently ignored, i.e. you'd only get a test failure when we > generate the wrong advice that causes a plan change in the regression > tests. I think it already does more than what you seem to be thinking: test_plan_advice checks the advice feedback, too. However, it's also true that even more could be done. The code proposed here checks that all advice is /* matched */ without being /* failed */ or /* inapplicable */, but I have code locally that checks the reverse, namely that every decision that, during the re-plan, every decision could have been constrained by advice actually was. I felt it was a bit too late to add that to what I was submitting for v19, but that decision could certainly be revisited. Taking it even further, we could do structural comparisons of the before-and-after plans, or we could search for disabled nodes aside from what default_pgs_mask should imply. I'm not very confident that those things are worth the code they would take to implement, though. The existing checks found a lot of bugs, but I think whatever is still wrong is probably not super-likely to be found by additional cross-checks. That could be wrong; it's just a hunch. > For v20-0003, initial thoughts: > > I think getting at least a basic version of this in would be good, as > a server-wide way to set advice for queries can help people get out of > a problem when Postgres behaves badly - and we know from pg_hint_plan > (which has a hint table) that this can be useful even without doing > any kind of parameter sniffing/etc to be smart about different > parameters for the same query. Yep. > The name "stash" feels a bit confusing as an end-user facing term. > Maybe something like "pg_apply_advice", or "pg_query_advice" would be > better? (though I kind of wish we could tie it more closely to "plan > advice", but e.g. "pg_plan_advice_apply" feels too lengthy) I've heard that other people also find "stash" not as intuitive as it could be, and I'm open to changing it. However, whatever we call this has to make its relationship to pg_plan_advice clear. If we have pg_plan_advice and pg_query_advice, which one is the core module and which one is automatically supplying advice? You can't tell from the name, which I think will be confusing. In the long run, I suspect we will end up with a moderately-large pile of tools for either capturing advice or automatically supplying it, and we should think about how we'd like all those to get named. Right now, I suppose we might end up with something like this: pg_plan_advice: The core. There can only be one. different ways of capturing advice strings: pg_collect_advice, pg_capture_advice, pg_save_advice, pg_baseline_advice, .... different ways of supplying advice strings: pg_stash_advice, pg_auto_advice, pg_lookup_advice, pg_store_advice, pg_advice_from_query_comment, ... I don't love this, because the pattern I see developing here is that module names will either be very long or they'll just take a word from an existing module name (like "collect") and replace it with a synonym (like "capture"). That's not going to produce anything very mnemonic, so it would be nice to do better. But I think the route to doing better has to be to get more specific with the naming, not less. Like, if we renamed pg_stash_advice to pg_lookup_in_memory_advice_by_query_id, then the name tells you EXACTLY what it does, which is great. Unfortunately, the name is also very long, which is why I didn't go that route. --=20 Robert Haas EDB: http://www.enterprisedb.com