Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w3w9i-001hVT-1b for pgsql-hackers@arkaria.postgresql.org; Sat, 21 Mar 2026 13:14:18 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w3w9f-00AQe3-1s for pgsql-hackers@arkaria.postgresql.org; Sat, 21 Mar 2026 13:14:16 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w3w9f-00AQdv-0l for pgsql-hackers@lists.postgresql.org; Sat, 21 Mar 2026 13:14:15 +0000 Received: from mail-ed1-x531.google.com ([2a00:1450:4864:20::531]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1w3w9c-00000000LP2-3KGe for pgsql-hackers@lists.postgresql.org; Sat, 21 Mar 2026 13:14:15 +0000 Received: by mail-ed1-x531.google.com with SMTP id 4fb4d7f45d1cf-669462b0fecso1224292a12.3 for ; Sat, 21 Mar 2026 06:14:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1774098852; cv=none; d=google.com; s=arc-20240605; b=B19L1OMPHbRLveQffgzH9ZT/hLErp1v3uw7u+G80ITKNcPLTvVJHaApJQ9XUUkOWFo 4Jx+HyUbt+UyTaP0NBkOD2zvdf3i4xFiecTQ+SiN6XA83cEI+7QCN/4I7xHIh9NZGCsR BNYoq9HLOTdmxvEVmBVnIb5FCjEG04yFis6uPBW89xE5dDG1F9i3XOgDfUfVsqBokqgF eBs9xmi8I3g3f8VyVloWCRpzQD9azr69dVxfts/1V0ghseM6tzwiGInk4Cb/ECIKdcAp kvs7IZBs55qYzvKcDAoSXcuuFSrtLOaQj9i+a030E6bUZ7nJ/3e+O01F6jBaphTIus1r v64w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=K56gou81sbgbuFh4j1tdPiteedKCvRail7SWFVvjrzY=; fh=YhmxPC0NYagsxVMppzYMgO0XMWsRSWzMegLjmIklu8U=; b=OI4mzn3EkfhJgCERcOlVsE5vcoOU1jhpbFg7guf2BuqCEZ3pKZ1t8jvvm2BffoqZWI TN7F9im9ZXpwj2KxMzfm8QHqR1SOGf38ahwCa0e/Fr+W/cd49btgDIyVhPobk5VUmPn+ 3yf4sAJdL2/DJWOd98FqGm7igdBXX/HZHv/CJSO/65vtWzvaHwGYTZEwi2KsZbv/oHVw FxzkjkZW/vts7kNX7pdtW2tEzbonSWmXTDcSvDkAGhV8rsq42v5THtUYcuZxqQdExsgj dhIW8S1dSeDNjWyFFSy8K3F3weczil/v4IpNry6dGFmc5U2CnWc4+1/BCkU3UxS7mnsF /xrQ==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1774098852; x=1774703652; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=K56gou81sbgbuFh4j1tdPiteedKCvRail7SWFVvjrzY=; b=lrQOOOqtAZQHuCkFDkC7KgyEi7IjKDD8kIfW1/ahCajE/Ya1RQ+cZqkouxcjSb4vN7 ncZLkXguPxTYgb+E4ApM88R9KiG8Sk8mNo8rEl1X55qUgEgRG3OjyZorhsOPeYQlj+u9 aDy7ZpFgDeGuKfhmuHYa4qGNYE8eqBPMzTQPu8lhsZGunT4zzrlpz6Z19rmwXPHk9deK QSg7nOZMR8kzbamGhfAfL2rDZ/o3moy3hAGtYpgvMTO/uDK+9UGXp8UnptMxQMmtcQQz SOJWQwngNNzjGgd2N+5bHu3O/NMk8sEnY9eJNhDmjB0zYjcW4Tpa4vfRoVZI8o4RIBi4 CLAg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1774098852; x=1774703652; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=K56gou81sbgbuFh4j1tdPiteedKCvRail7SWFVvjrzY=; b=RGIIc+3Y+7s5DI0biPLCkYaokjIXEiBOj2ULjqkHaU5IT6NpNbdTZDIB6Xhqd8IYKT G4IgDjkMYST7V/nhacp2t70cxWWc1Yjh+ebfhvvze3YLkE8JP9hilOjo/beTD5huu6Wy DYSNKSBgz5IqXfOqeE/7OXhk6FAMbtmd0Jz804UhVjbUqaB2E4jNNX1leQ0dPRg+Czoe 9pSP/14uBMEi7P9hzy4uOTc7eX05ljPZDImAqdyDvHc3hhXWPg4r3Y/dvTu+xI80hL/t 3m8dzUxTpYqGyxqGYrOco+6SySDEIJ9EpghJmUYKdyZiBsPnLZdJWu/EIO/J8eVkzfl9 aRFQ== X-Gm-Message-State: AOJu0Yyj87CnykcvNjr4YSPrQn2Jhff+9Y/VRgIN9855AQvbmv20aHeR AWmTJz10BKjcLjUY5LsaWVC5WiCdgeBZb7G1H2jWdJSmMr0oKnhGVl/9/HDiU9EuQWsQgHopMIB J7VGW1L0w8RrN6lXQdGX5aWGx56qF+z1LZYT8 X-Gm-Gg: ATEYQzzYYYOyykAbCdw0nIOK4TviYQDUEfo/088XcBlPukajqMNt5yWiwQL0ImcRoi+ Xk4wkwRJ4y+7mhArPUTiSGnHPf+0/+9WSAPu7diwBfSvGOoG05olaAMDemCW3+hbbumpMxuTOJp Xcd8wjA+3zTgX7jGCjUsjhAJYADbutsie5UTaBqGXpsVYxai89szDqg4YL5htxqFjcoClSPQdDX OBxMyhKZDaER5hcizbF3HZqqeGkXi/RT6KnlBZiePn4Bu+jK/LhBm+wv0iITE+w8eg9+ITMZsVJ i+8m6HIyDNz9LhH6ZRUhzPm+8hxGqvfrJVi+AS4= X-Received: by 2002:a17:906:e86:b0:b97:f2b3:49b8 with SMTP id a640c23a62f3a-b982f4deef9mr338976066b.47.1774098851849; Sat, 21 Mar 2026 06:14:11 -0700 (PDT) MIME-Version: 1.0 References: <1136161.1769654478@sss.pgh.pa.us> <1299934.1773938807@sss.pgh.pa.us> In-Reply-To: From: Robert Haas Date: Sat, 21 Mar 2026 09:13:59 -0400 X-Gm-Features: AaiRm52rzY48UAwhp0rfcI_Sg_fWY7yyNmDddVlxj7NA0aTBVHHNJBgRAtLJxMI Message-ID: Subject: Re: pg_plan_advice To: Tom Lane Cc: PostgreSQL Hackers Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Thu, Mar 19, 2026 at 4:38=E2=80=AFPM Robert Haas = wrote: > schnauzer's got a failure that looks like this: > > +WARNING: supplied plan advice was not enforced > +DETAIL: advice SEQ_SCAN(pg_trigger@exists_1) feedback is "matched, fail= ed" > +advice NO_GATHER(pg_trigger@exists_1) feedback is "matched, failed" > > And an older run on skink has a failure that looks like this: > > +WARNING: supplied plan advice was not enforced > +DETAIL: advice SEQ_SCAN(pg_trigger@exists_6) feedback is "matched, fail= ed" > +advice NO_GATHER(pg_trigger@exists_6) feedback is "matched, failed" I spent a bunch of time looking into this. I don't have a definitive answer to the question of what to do about it yet, but I wanted to write down some initial thoughts. First, I think that I broke fix_alternative_subplan when I added disabled_nodes. Before, when disabling things just affected the cost, the logic here would have taken what was disabled into account in picking between alternatives. Now it doesn't, because I didn't add a disabled_nodes field to SubPlan. You can see that it breaking with this test case: CREATE TABLE t1 (a int); CREATE TABLE t2 (a int); CREATE INDEX ON t2(a); INSERT INTO t1 SELECT generate_series(1, 1000); INSERT INTO t2 SELECT generate_series(1, 100000); ANALYZE; EXPLAIN (VERBOSE, COSTS ON) SELECT * FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.a =3D t1.a) OR t1.a < 0; SET enable_seqscan =3D off; SET enable_indexonlyscan =3D off; EXPLAIN (VERBOSE, COSTS ON) SELECT * FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.a =3D t1.a) OR t1.a < 0; With the currently-committed code, you get a hashed SubPlan both times, and it's just disabled in the second case. But there's a perfectly good non-hashed variant that is not disabled: scan t2 using a parameterized Index Scan. So that sucks. I have a small patch to fix this, which I'll post later. I don't think we can or should do anything about this in released branches, but we should fix it in master regardless of what happens to pg_plan_advice or test_plan_advice. Second, in terms of actually fixing the problem, I think the issue here is that the scope I chose for pg_plan_advice doesn't quite fit the goal of making test_plan_advice the canonical way of testing this code. In order to keep this project manageable in size, I decided that, for the first version, pg_plan_advice wasn't going to try to control anything above the level of scan/join planning, so for example we don't care what kind of aggregation the planner chooses. That should be OK for test_plan_advice, because test_plan_advice works by checking if all the advice applied cleanly, and since we didn't emit any advice about the aggregation method in the first place, there can't be any problem with applying it later. In other words, if the aggregation method chosen does flip between consecutive planning cycles, it should not cause a test_plan_advice test failure. This same general principle applies to a bunch of other cases too, like set operation planning: if there's more than one way to do it, the planner can change its mind and test_plan_advice should not care. However, there's an important exception: if something changes above the level of scan/join planning that affects what rels are involved in scan/join planning, then a plan change will cause a test_plan_advice failure. The failures above are of that type: the way the AlternativeSubPlan machinery works is that the query gets copied and replanned, and each plan has a separate plan name. So the final plan has either pg_trigger@exists_5 or pg_trigger@exists_6 in it, but never both. There's one other case that I think is similar, which is the MinMaxAggPath stuff: if we choose the MinMaxAggPath, then the final plan will have something like t1@minmax_1 where it otherwise would have had just t1, or perhaps t1@minmax_1 instead of t1@something_else. These cases have something in common, which is that they are the only cases where we make a new PlannerInfo to try replanning part of the query in a second way. That's the pattern that causes breakage here. I would be remiss if I did not mention that Jakub Wartak was poking at me about the MinMaxAggPath case a while back, but I dismissed it as out of scope, which was accurate, but that's because I didn't think at the time that it would destabilize test_plan_advice. Now, I think it could, although I don't think we have seen any such failures in the buildfarm yet. Perhaps a concurrent statistics change is more likely to flip the hashed/non-hashed SubPlan decision than the choice of whether to use MinMaxAggPath. One approach that I considered here is to try to unify the "sibling" relations. If the final plan is bound to contain either pg_trigger@exists_5 or pg_trigger@exists_6, maybe the advice shouldn't actually think there are two separate things. Maybe instead of subqueries exists_1 ... exists_6 we should end up with subqueries exists_1 ... exists_3, with each name used twice. That amounts to deciding that the patch to give each subplan a unique name got it wrong. While this idea has some appeal, ultimately I think it's a loser, because addressing the problem this way wouldn't actually fix the test_plan_advice instability we're currently seeing, or at least not necessarily. For example, in the test case earlier in this email, INDEX_SCAN(t2@whatever) can only be applied if the non-hashed subplan is chosen, because we won't consider a plan index scan to even be a possibility for a full table scan. An index-only scan is considered, but not a plain index scan. This means that even if we flattened the rels in each pair of siblings together and called each pair by the same name, we could still see failures to apply advice cleanly. Moreover, that's assuming that optimizations like self-join elimination, left join removal, and partition pruning always apply in the same way to both copies, which doesn't sound like a safe assumption at all. A second approach is to try to control the initial conditions of the two planning cycles better. Possibly just running the tests serially instead of in parallel would get that done, but that seems too slow to consider and, even if we did it anyway, I'm not all that sure that an autoanalyze or autovacuum in the background couldn't mess us up. Or, alternatively, if we could keep the second planning cycle from seeing different statistics than the first, I think that would do it, but I don't think there's a feasible way of doing that. So I'm left with the idea that to get test_plan_advice to be fully stable on these slower machines, it will probably be necessary to make it control which AlternativeSubPlan is chosen and whether a MinMaxAggPath is chosen or not. I have some ideas about how to accomplish that in a reasonably elegant fashion without adding too much new machinery, but I need to spend some more time validating those ideas before committing to a precise course of action. More soon. --=20 Robert Haas EDB: http://www.enterprisedb.com