Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vvbqf-00A6ad-0z for pgsql-hackers@arkaria.postgresql.org; Thu, 26 Feb 2026 13:56:13 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vvbqe-00Cm7L-0q for pgsql-hackers@arkaria.postgresql.org; Thu, 26 Feb 2026 13:56:12 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vvbqd-00Cm7C-33 for pgsql-hackers@lists.postgresql.org; Thu, 26 Feb 2026 13:56:11 +0000 Received: from mail-ej1-x634.google.com ([2a00:1450:4864:20::634]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1vvbqa-00000001OoB-3ZvT for pgsql-hackers@lists.postgresql.org; Thu, 26 Feb 2026 13:56:11 +0000 Received: by mail-ej1-x634.google.com with SMTP id a640c23a62f3a-b935a74b7c2so103719866b.3 for ; Thu, 26 Feb 2026 05:56:09 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1772114169; cv=none; d=google.com; s=arc-20240605; b=f95nVnaLxLKNY3Ah+FSWzng90a3h2P/thdEjCTdYHjQ8g3AOtCd8RB7jzjwSp73Q+G EIJS6xD4b3oxUz9v9Y7uJe9g5C5jFHqR+Z4igZ1MwHv4qUXvkQ2B7giyiojtgrnhtkUx RYYcMbGvfnDGEdSxXeoYOVIUnuIqZbXtiqafOvfC84jbZwK/p6HrQon5b9Soye6u7PgP vnT9mjr7SQqNApst2YgNe4dGT58rto/9Vz42XLeZUxcmaW3YnKB2IkeYJ6eFwqQdeTtV J40ufwgoIX6VrogQwC8rPivbp1ctTIUm12qPuwlyCAbptw4D5VdZh021wWvJqnN+n5Gn qgeg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=u7VbZ8S9S54Bcey3hNhxk7vvSMqlPFKMYnOo9oS0Z0Q=; fh=j6Qhi6hZcKZKHng99Wn9KupXXG1do8LIfyeW47fdLOg=; b=b7OZ0gMbNQN3lbdPVpj7EWaYZAKTGJL0zJ5wzT3oHd2/WLSVLAzIdJ1PwBp7SmSqYj Z8ePxOTOTzsbAfYNhPPkRlVySHOtRJL6YnHRbs8G96YfNT6UBy5dshCITKqe+a1e1LGk K0Hg2BoCljuA5y2Dg1raS1ho1PTn5r6aUhzpH8WHgps6SYiCQuvMN5sVY7lq/LhB3akE cmQtVB8pyFSR/mUuRwM6Zi9qf4YTXXmu/ZbvJJbUYXp9sKisoQiV1kLGG3oxhLsOQJbx Ke2TuWAUnUyKRRz/gKEtxiP6BVvtShf/CkKmySvTEZksLPh5xP5LxcS9e1yb1pV3hRnX SYTw==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1772114169; x=1772718969; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=u7VbZ8S9S54Bcey3hNhxk7vvSMqlPFKMYnOo9oS0Z0Q=; b=dlpE9vzGhU8hr81Vhp9cwOKygNZNogaN6OGJNaP11oc/7mG/wyfol4mHOPEvmatJ8m 1XuSqN3RyVd3KLEZBaiKk0UbNttU0ShCrNcVpl4h0g0i4SIdgszxEJU2vkJ1nmZQ90zf d6/ZXiaUyF1/HRJW6hW+hPbK0XW1SQ7E1p3VlnYD/yhfqBTVXVf3bUQ7ZA7+HB3BdR+Z qadfb4rQ2oYwlO+kLUVar+ILu58pb9Smv+uAg92Te6R3Z6JoixuY7kedPKGYCHP/GYrz UI7gCF4eRU4HpXewtHw+vHIVjZQdw5ym2JrMfEO4Lpkz1C8Eihon0eMzI4XYZMSFrft0 XQhw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1772114169; x=1772718969; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=u7VbZ8S9S54Bcey3hNhxk7vvSMqlPFKMYnOo9oS0Z0Q=; b=xU9UcCl+epauBLAYLdj8pjvuf8KmmqUc2Iay08ZHiZ933ydiFPujRdW+t7yjC8sI+M plwuO0PgFuBPzgf9kZPTb05FGJT62GsEDe9AsHZw624EQ0Qr4118NgPr4daqtqAgReMN Wx8dFqvFTQh/FOi992s6tBdOrnE3k6GiTIpsJqrIA4luctXJLz7XJ91xeq8tIhZXclQO wpaaIPNEJcPKLnT04aW9abrtRAUo55hEmdACuGymEr5Ev1zwmxL04TP358ogQnRo3xEq cXSqAdj4cwl0OZRwNtBQF6nW/+zjTTSopytyUwSHpjsz5JhZqr3kdxf3ikJXDBnNW24l 8dVg== X-Forwarded-Encrypted: i=1; AJvYcCUxcJhFL/26ZHXKZ9tLotXjoQtZQD0fmuzx4j3yoasQAqSQgfL+T1q7Kbxhl42TMg6ycumEJ2/CYNX/XMPv@lists.postgresql.org X-Gm-Message-State: AOJu0YyEEFzFAvFLYR3NA8Ola7OCw8gpHsz+2/DjLoxQvnRSSEpOkSPS mZH8lP/LNyDX9nUO4qdh2+ipJTtodr2nq97EzZYeSYWzG3fin5fDJu0+gV8sq40iXr7k4KbF+CP kUiAWAsni9lX/f7jGnuYpn4MKRsuGVp4= X-Gm-Gg: ATEYQzxk2AQwoyObfxazIGxE/6ltPgG6gLtLdBZsdtz0soqB2/7fQjiiiMfT3UODths q2R39GlwRayXNI+1YLBDebgCAPB57FX3vNe26jq/P2yB1Dej1vS1yFkCIZ257Ax5s+twNrxBNzi WJsMT8qLRQmPdlGxsXAEroBVmTuz3LeNnq8HOcVb4hGrWbBuV2Z/NZn8GyduGCj8kqfaSEvIQku jabQzMhX1K0mxeUWTwYsofl0sqBQhCcl09/Q9+t4HvN27A7O5MAv+KIB+wJREVNPPmsy4lfhAlD 9neE40mfM9CGboA+na8uXyUBjGzzOld/1WhAa+o= X-Received: by 2002:a17:907:a45:b0:b88:504a:f381 with SMTP id a640c23a62f3a-b935b528ffemr173047166b.23.1772114168561; Thu, 26 Feb 2026 05:56:08 -0800 (PST) MIME-Version: 1.0 References: <1136161.1769654478@sss.pgh.pa.us> In-Reply-To: From: Robert Haas Date: Thu, 26 Feb 2026 08:55:56 -0500 X-Gm-Features: AaiRm51k21RwSUFjsqYSWhy8nKiLHEZwu2MJbYzaSFiPjYAfZciMmApUmle5KOQ Message-ID: Subject: Re: pg_plan_advice To: Alexandra Wang Cc: Richard Guo , Lukas Fittl , Tom Lane , Jacob Champion , Dian Fay , Matheus Alcantara , Jakub Wartak , PostgreSQL Hackers Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Thanks, Alex, for the review. On Mon, Feb 23, 2026 at 8:10=E2=80=AFPM Alexandra Wang wrote: > While reading the patch, I noticed several places where we infer > planner intent from currently observed plan shapes (e.g., Gather > projection, Agg used for semijoin uniqueness, Append elision, etc.), > which I quote below: [....] > All of the interpretations above look reasonable, but they made me > wonder how modules like this detect planner-behavior drift over time > if we add new plan node types, add or change plan shapes, or even a > new debug GUC. In my personal opinion, this is one of the really key issues for this patch and deserves more discussion, but it feels like it's been very difficult to get any senior hackers to pay attention to this patch. I'm not sure whether that's because it's a huge patch, or they just don't want to touch it with a ten foot pole, or they're just busy with their own work, but it's a little hard as the patch author to know how reasonable the decisions you've made are until you hear what other people think of them. In terms of keeping things working, I think that test_plan_advice has a pretty good chance of catching future breakage. If somebody adds a new planner optimization that breaks the plan tree walk that pg_plan_advice does, they should also be adding test cases for it. If test_plan_advice can't generate plan advice any more after that change, the patch will cause a test failure there, or if it can generate plan advice but that plan advice doesn't properly apply to planning, again a test failure will result. But as to whether test_plan_advice is a good enough test or whether more things need to be added for people to feel comfortable, well, that's a question. Also, there's the issue that even if test_plan_advice is good enough, or can be made so, that still potentially burdens future patch authors with the task of updating pg_plan_advice for whatever they do. That's not unique to this patch, of course; any patch that expands the capabilities of PostgreSQL in a new direction has that issue to some degree. To take a few examples from my own past work, consider wait events or parallel query. But there's still a judgement call to made about whether the patch adds too much burden for what we get out of it. Personally, I think that some kind of user control over the planner is one of the really big things that experienced PostgreSQL users want, and therefore I'm inclined to believe it's worth it. But most people are in favor of their own patches, so that's not really surprising. > I realize this isn't specific to this patch, but I'm curious whether > something like monitoring a possible set of parent-child plan node > patterns has ever been considered, to make structural changes easier > to notice when those invariants stop holding. I don't think this is > actionable for this patch, but I'm just recording the thought here > while reading. One thing I've been thinking about is that we might want to consider annotating some planner nodes with a little more information so that pg_plan_advice doesn't have to infer quite as much. For example, if we annotated Agg nodes with their purpose (regular agg, eager agg, partitionwise agg, semijoin uniqueness) and their relid set, pg_plan_advice would have an easier time figuring out what's going on and could throw errors if unexpected things happen instead of delivering wrong results, possibly silently. I have avoided doing that kind of thing so far because I wanted to keep the core changes as small as possible, but there's an argument that I've taken too much risk there. One of the other things that bugs me about this patch is that if it turns out that there are things in the advice generation machinery that are not fully reliable, it may not be possible to fix those without an ABI break. That is, we can fix any problem we might have by having the core planner publish more information, but we've been trying really hard lately to avoid changing structure definitions or function signatures in minor releases, and I can imagine that it wouldn't be that hard for pg_plan_advice to be shown to have some problem that can't be fixed any other way. So what happens if I commit this and then such a bug shows up? Do we compromise on ABI stability, or do we just leave the bug unfixed for the life time of that release and fix it in the next major release, or what? I suppose it's hard to know in the abstract, but this is another area where getting some opinions from other committers would be awfully useful. > I also noticed that several tests in pg_plan_advice are not listed in > the Makefile. Specifically, they are semijoin.sql, syntax.sql, > prepared.sql, and local_collector.sql. I ran them manually, and they > all passed. Yeah, thanks, I actually also found this myself while doing some refactoring, shortly before you sent this email. Will fix. > In pgpa_walk_recursively(): > > /* > > * Check the future_query_features list to see whether this was previou= sly > > * identified as a plan node that needs to be treated as a query featur= e. > > * We must do this before handling elided nodes, because if there's an > > * elided node associated with a future query feature, the RTIs associa= ted > > * with the elided node should be the only ones attributed to the query > > * feature. > > */ > > foreach_ptr(pgpa_query_feature, qf, walker->future_query_features) > > { > > if (qf->plan =3D=3D plan) > > { > > active_query_features =3D list_copy(active_query_features); > > active_query_features =3D lappend(active_query_features, qf); > > walker->future_query_features =3D > > list_delete_ptr(walker->future_query_features, plan); > > break; > > } > > } > > Should this be deleting "qf" instead of "plan"? Right now the list is > not emptied after the plan tree walk as intended. Yeah, I think you're right. I don't think that mistake breaks anything, it just means the list grows needlessly long, but I'll fix it. > In pgpa_decompose_join(): > > if (found_any_outer_gather && > > ... > > if (found_any_inner_gather && > > "found_any_{outer,inner}_gather" should be dereferenced. Yikes, good catch. > In pgpa_trim_shared_advice(): > > /* Don't leave stale pointers around. */ > > memset(&chunk_array[remaining_chunk_count], 0, > > sizeof(pgpa_shared_advice_chunk *) > > * (total_chunk_count - remaining_chunk_count)); > > Should the element size be "sizeof(dsa_pointer)", consistent with the > preceding memmove()? Yes, thanks. > In pg_plan_advice_advice_check_hook(): > > if (error !=3D NULL) > > { > > GUC_check_errdetail("Could not parse advice: %s", error); // > > return false; > > } > > > > MemoryContextSwitchTo(oldcontext); > > MemoryContextDelete(tmpcontext); > > > > return true; > > If an error occurs, do we leak the memory context? Should we also > switch and delete memory context before returning false? I think this should be mostly harmless. The parent context, at least in most cases, shouldn't be anything that long-lived, and when it's destroyed, it will take this context with it. I think there are some cases where the parent context can be longer-lived, but they shouldn't happen often enough for this to become a significant issue. On the other hand, there's also no reason that I can see not to tighten this up. I changed it like this: if (error !=3D NULL) GUC_check_errdetail("Could not parse advice: %s", error); MemoryContextSwitchTo(oldcontext); MemoryContextDelete(tmpcontext); return (error =3D=3D NULL); > In pg_get_collected_local_advice(), we skip rows when: > > > if (!member_can_set_role(userid, ca->userid)) > > continue; > > Do we need to do the same check for pg_get_collected_shared_advice()? Hmm, yeah, probably a good idea. > In pgpa_planner_append_feedback(), "StringInfoData buf;" is unused. ... > The "ExplainState *explain_state" field and the "MemoryContext > trove_cxt" field in "struct pgpa_planner_state" are both unused. Thanks, deleted this things. --=20 Robert Haas EDB: http://www.enterprisedb.com