Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w8RhL-000XTA-1O for pgsql-hackers@arkaria.postgresql.org; Thu, 02 Apr 2026 23:43:39 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w8RhK-008lfs-09 for pgsql-hackers@arkaria.postgresql.org; Thu, 02 Apr 2026 23:43:38 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w8RhJ-008lfk-2S for pgsql-hackers@lists.postgresql.org; Thu, 02 Apr 2026 23:43:38 +0000 Received: from sss.pgh.pa.us ([68.162.161.243]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.98.2) (envelope-from ) id 1w8RhI-00000000GU5-17Wp for pgsql-hackers@lists.postgresql.org; Thu, 02 Apr 2026 23:43:37 +0000 Received: from sss1.sss.pgh.pa.us (localhost [127.0.0.1]) by sss.pgh.pa.us (8.15.2/8.15.2) with ESMTP id 632NhXEA3683431; Thu, 2 Apr 2026 19:43:33 -0400 From: Tom Lane To: Robert Haas cc: Alexander Lakhin , Lukas Fittl , PostgreSQL Hackers Subject: Re: pg_plan_advice In-reply-to: References: <1299934.1773938807@sss.pgh.pa.us> <0afba1ce-c946-4131-972d-191d9a1c097c@gmail.c! om> Comments: In-reply-to Robert Haas message dated "Fri, 27 Mar 2026 08:55:41 -0400" MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-ID: <3683429.1775173413.1@sss.pgh.pa.us> Content-Transfer-Encoding: quoted-printable Date: Thu, 02 Apr 2026 19:43:33 -0400 Message-ID: <3683430.1775173413@sss.pgh.pa.us> List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk My animal sifaka just showed an all-new type of test_plan_advice failure [1]: diff -U3 /Users/buildfarm/bf-data/HEAD/pgsql.build/src/test/regress/expect= ed/limit.out /Users/buildfarm/bf-data/HEAD/pgsql.build/src/test/modules/te= st_plan_advice/tmp_check/results/limit.out --- /Users/buildfarm/bf-data/HEAD/pgsql.build/src/test/regress/expected/li= mit.out 2026-04-02 12:35:13 +++ /Users/buildfarm/bf-data/HEAD/pgsql.build/src/test/modules/test_plan_a= dvice/tmp_check/results/limit.out 2026-04-02 12:49:59 @@ -5,6 +5,8 @@ SELECT ''::text AS two, unique1, unique2, stringu1 FROM onek WHERE unique1 > 50 ORDER BY unique1 LIMIT 2; +WARNING: supplied plan advice was not enforced +DETAIL: advice INDEX_SCAN(onek public.onek_unique1) feedback is "matched= , inapplicable, failed" two | unique1 | unique2 | stringu1 = -----+---------+---------+---------- | 51 | 76 | ZBAAAA =3D=3D=3D EOF =3D=3D=3D [12:50:02.062](11.620s) not ok 1 - regression tests pass This is unlike the other cases we've been looking at: no sub-selects, no GEQO, not even any joins. There is pretty much only one plausible plan for that query, so how did it fail? After looking around, I think the likely explanation is that the concurrently-run alter_table.sql test feels free to mess with the set of indexes on onek. It doesn't drop onek_unique1, but it does momentarily rename it: ALTER INDEX onek_unique1 RENAME TO attmp_onek_unique1; ALTER INDEX attmp_onek_unique1 RENAME TO onek_unique1; I've not looked closely at pg_plan_advice, but if it matches indexes by name then it seems there's a window here where the advice would fail to apply. Also, further down we find ALTER TABLE onek ADD CONSTRAINT onek_unique1_constraint UNIQUE (unique1); ALTER INDEX onek_unique1_constraint RENAME TO onek_unique1_constraint_foo; ALTER TABLE onek DROP CONSTRAINT onek_unique1_constraint_foo; which means there's a window there where there are two plausible indexes to choose. Will test_plan_advice cope if the transient one is chosen? We could imagine dodging this problem either by having alter_table.sql test some purpose-built table instead of a shared one, or by having it do these hacks inside transactions so that other sessions can't see the intermediate states. But I'm quite resistant to that answer, because I think part of the point here is to ensure that concurrent DDL doesn't misbehave. (Which it doesn't: these test fragments have been there since 2018 and 2012 respectively, and not caused issues AFAIK.) Preventing our tests from exercising concurrent DDL in order to satisfy test_plan_advice is not a good plan IMO. There's also the prospect of a long tail of whack-a-mole as we locate other places in the tests where this sort of thing happens occasionally. So I'm not sure what to do here, but we have a problem. regards, tom lane [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=3Dsifaka&dt=3D= 2026-04-02%2016%3A35%3A13