Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wSaZ1-003HLP-2u for pgsql-hackers@arkaria.postgresql.org; Thu, 28 May 2026 13:14:20 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1wSaYy-00CCQz-2U for pgsql-hackers@arkaria.postgresql.org; Thu, 28 May 2026 13:14:17 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wSaYy-00CCQq-1A for pgsql-hackers@lists.postgresql.org; Thu, 28 May 2026 13:14:17 +0000 Received: from mail-wr1-f48.google.com ([209.85.221.48]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1wSaYx-00000001AaX-11wa for pgsql-hackers@postgresql.org; Thu, 28 May 2026 13:14:16 +0000 Received: by mail-wr1-f48.google.com with SMTP id ffacd0b85a97d-45eea4c0649so500921f8f.0 for ; Thu, 28 May 2026 06:14:15 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1779974054; cv=none; d=google.com; s=arc-20240605; b=R4djHwtgIcb3EWRl9gRTGpSG4oaoWpAYvBc8Mq7UTROI5pp4uWBhfeDzMtgIf9rBZJ w1p6ysCup7sc1j4BwcrkBzpM6kH8yJ5tKOXEG2zDqfmFXmBvvXtnxkhUK+ZjyHf3F0aj goxhmnrvjpf5ooX9mJCZUHre9T4or0qoN2VbdPiQSYJhQwVLcuOseX7QAii9kB+m+yst 11OTqN9vrm7/FhmadVa/iQ/txAYGp9PkR15S5laepD3qbHJ/zhdO84ZVTp01PN74EGx9 /OfRwog2D42ACAqG+cZdkcTQ9EYRuw8ZMPG69UZwdnasmF2jzQUUvzrT17AL7N7Afgnj nP8A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version; bh=N1HoCKPBimpB1AiVwPn4qgo0tlAd+AHaWvYJp9TR40A=; fh=JtKN8TFTcxKNDIl4hyGGIbEaD64U7YHzT5UhmIjtM8g=; b=Li/nhJXzJtbIhEk5j44tDldD7jyyeDpw3T11T01esdvN4QWNhgyd8+afwMFWQWWAfX cl/lasG4mgHmwZULMDNJdo8EuBi2wWSRBJM3+oKqU3Sgd9GS7xhCfxqCkqrGr8b4t1BJ PxRN1UcUMS79az2iMwdS2QshJ93a4FXg0UpX7Pv1PQNUmQs4QtHHXJKXwQtZgx2hK+9J EBT4ZAXcuJo1esjBJgigFrLWGpwa98GKPUu979WCxDzSkz2/p/cXIPmKTrXFDCqGpCSD a/SevEHe3jQPRaRft+W2y2ORGU6C4FXXRIYGTpAl45oNzvHPlFAWrtN+bWGax5nOWQ4j y+2Q==; darn=postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1779974054; x=1780578854; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=N1HoCKPBimpB1AiVwPn4qgo0tlAd+AHaWvYJp9TR40A=; b=a2W0tJw5nSyK9Qm+F2a7/BZOAKmslpX3sPN3ANhXCfKTwXX2adfx3mRNtoRH1+Oh0J NgJfVO8be1XhV9iWw3hvOsKNWmNPYROxbUs6tXiobL+6eXT+hle1cxsO2NnBPCBZ0h8P DW/tOGyeKaBCKo5uv2Tph08Vjm1RTv21Zj2yZMqRFII4D83Y7BKHVXk3Mlmn0GJ9Yd3V iwIYjYWtvCEZ87qhfMK+MOcL+0yxwQGfZzeRyx4Hhs9RUFVb214E85eQtS7qjnVB+O/p gaahL9sYJ5lpNjDkHVB3SxaA8S7jR4xTN2sXi6QmqEGlmBIwsDWSK+FKb/aPRMBly2Hn U+bQ== X-Forwarded-Encrypted: i=1; AFNElJ/pMHUuNvckyWn0mHg1kwdIjixX7Sqn62VLYYp9dkfpVMnfc9CKdvr8JW2jsjjmlSjgNkMa7Y0H9FLfP5iQ@postgresql.org X-Gm-Message-State: AOJu0YwmX2vYVcz9tA4/krNRqTEQOJYHGXZJ99TCKtjPlTJNIHEJ3Uvz wChi/Fmw2fIe9DmUpUWiuV/jg9mK74jmHzGB3hKufnCFoJmDkB56z+dL+M5UHiXQ0pRc8FTmcWD Q5X8+tnwkZEuziMIppC6S4iOMTSPjmXk= X-Gm-Gg: Acq92OEilWOlG1vYWyGYW+qEt4XoZsFq8oRgnpwGQKVwNsdtq01yapnLrSjkJn9MfDe 0chYRhBkoBhDDeoOgiWIUQWpCkVzJhxMwrhw6juLV2kYiJxcdB/A8a+6KTCRYCCCbvrHAHihND2 uK66vQiftM9Ce9qXLRruYBQmC7bXXQ5UGpPZyBRDKXNm/WrdFXQtx5FWtdzkAF3C5tHeKC6FBS3 YnxCL3fzTUlwRZAjl/m/gtI+FoFQ3rTYMfnOaF4xWq5GR01m59wkpqzIrBqlfmC5tx/E99yZPvW P+E6iPRt4ttyvdh385T2roDjExl5 X-Received: by 2002:a5d:59c4:0:b0:44a:9b52:8891 with SMTP id ffacd0b85a97d-45eb36909d9mr42050631f8f.11.1779974053363; Thu, 28 May 2026 06:14:13 -0700 (PDT) MIME-Version: 1.0 References: <54c35fb9-da3a-4754-ab8c-46ed0b612465@vondra.me> <684c70d7-180e-461d-9377-600c2db581ba@vondra.me> <605328.1747710381@sss.pgh.pa.us> <691261.1747755511@sss.pgh.pa.us> In-Reply-To: From: Thom Brown Date: Thu, 28 May 2026 14:13:46 +0100 X-Gm-Features: AVHnY4Lmf6C4iDxoJ9tNGhZqmxYJnNg1PC6JygZX6UfjPUt1aef4c14hh1HSVQg Message-ID: Subject: Re: generic plans and "initial" pruning To: Amit Langote Cc: Chao Li , Tom Lane , Tender Wang , Alexander Lakhin , Tomas Vondra , Robert Haas , Alvaro Herrera , Andres Freund , Daniel Gustafsson , David Rowley , PostgreSQL Hackers Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Thu, 28 May 2026 at 09:14, Amit Langote wrote: > > Hi Thom, > > On Wed, May 27, 2026 at 9:03=E2=80=AFPM Thom Brown wrote= : > > > > On Sat, 4 Apr 2026 at 13:11, Amit Langote wro= te: > > > > > > Attached is a redesigned version. While working on the previous > > > design, I grew increasingly uncomfortable with CachedPlanPrepData -- > > > it was smuggling executor state out of GetCachedPlan() through an > > > out-parameter, which papered over the real problem: GetCachedPlan() > > > was doing too much. The main change in this version is architectural: > > > GetCachedPlan() no longer acquires execution locks. Callers now own > > > that responsibility, which is natural because each call site iterates > > > stmt_list differently and manages execution state in its own way -- > > > and it lets them choose between conservative lock-all and > > > pruning-aware locking where appropriate. > > > > > > Non-portal call sites remain on the conservative path for now. > > > _SPI_execute_plan requires care around snapshot setup, which happens > > > after plan fetch rather than before. SQL functions have a different > > > issue: init_execution_state() fetches the plan while postquel_start() > > > handles execution, with execution_state containers in between, making > > > it harder to thread a prepped QueryDesc through. The portal path and > > > EXPLAIN EXECUTE cover the most common > > > prepared-statement-with-partitions workloads; the remaining sites can > > > be converted incrementally. > > > > > > This is now starting to feel closer to what Tom suggested back in > > > January 2023 [1], where he proposed getting rid of > > > AcquireExecutorLocks() inside GetCachedPlan() entirely and pushing > > > lock acquisition out to callers. He noted that "we'd be pushing the > > > responsibility for looping back and re-planning out to fairly > > > high-level calling code" and that "we'd definitely be changing some > > > fundamental APIs." That is the direction I came around to over the > > > last couple of weeks while wrestling with CachedPlanPrepData. The > > > reverted approach also tried to follow Tom's direction but moved > > > locking into ExecutorStart(), which forced it to handle plan > > > invalidation from inside the executor by mutating the CachedPlan > > > in-place. This version moves locking out to the callers instead, so > > > the executor and plan cache never reach into each other. > > > > > > The series is now four patches: > > > > > > 0001: Move execution lock acquisition out of GetCachedPlan(). Adds > > > AcquireExecutorLocks() as a caller-facing function with validity chec= k > > > and retry. Adds PortalLockCachedPlan() in pquery.c to centralize the > > > portal retry logic. All callers are converted. No behavioral change. > > > > > > 0002: Refactor executor's initial partition pruning setup. Cleanup > > > only, no behavioral change. > > > > > > 0003: Introduce ExecutorPrep() and refactor executor startup. Factors > > > range table init, permission checks, and initial pruning out of > > > InitPlan(). Scaffolding for 0004; all callers still go through the > > > normal ExecutorStart() path. > > > > > > 0004: Use pruning-aware locking for single-statement cached plans. > > > Adds ExecutorPrepAndLock() which locks unprunable relations, runs > > > ExecutorPrep() to determine surviving partitions, then locks only > > > those. Extends PortalLockCachedPlan() with a pruning-aware path for > > > eligible plans. Multi-statement CachedPlans (from rule rewriting) > > > always use conservative locking. In principle, this could be relaxed > > > if the planner can prove that no pruning expression reads state > > > modified by an earlier statement, but that is left for a future patch= . > > > Includes regression tests. > > > > > > In case it's not clear, I'm not targeting v19 at this point. I'd lik= e > > > to get this into v20 CF1 and would welcome review from anyone > > > interested. > > > > After not having looked at this in close to 2 years, I thought I'd > > give it another look. > > Thanks for taking a look. > > > Not found any user-facing issues, and I'm liking > > seeing so few locks in pg_locks. I can see that with pruning disabled, > > the fallback works, pruning-aware locking is working via SPI through > > plpgsql, running ALTER between executions and also invalidating > > indexes force replans, and it's looking good. > > > > But I also think there might be a bug in patch 0001, but I'd > > appreciate checking my reasoning because I'm not fully confident I've > > been diligent enough. > > > > When PortalStart() opens a SELECT cursor that's backed by a cached > > plan, it does roughly the following. It builds a queryDesc (an > > executor-side struct), one of whose fields is a pointer into the plan > > tree inside the portal's cached plan. Then it calls > > PortalLockCachedPlan() to acquire the necessary locks, and finally > > hands the queryDesc over to the executor. > > > > My worry is about what happens if the cached plan turns out to be > > stale, for instance because someone ran DDL on a referenced table. In > > that case PortalLockCachedPlan() throws the old plan away (via > > ReleaseCachedPlan) and fetches a freshly-built replacement, updtating > > the portal's own pointers to match. But the queryDesc from earlier > > isn't touched. Its plan pointer still references the old, now-released > > plan. From what I can see, once that old plan's last reference is > > dropped its memory can be freed, which would leave the executor > > reading from freed memory in the next step. > > > > The bit I'm least sure about is whether the old plan's memory really > > does get reclaimed straight away when its refcount hits zero. If > > something keeps it alive longer then this isn't a bug, or at least not > > as bad as I'm making out. I had a look but couldn't convince myself > > either way from the code alone. To actually hit this you'd need a > > cursor on a cached plan, plus an invalidation arriving in the small > > window between the portal being set up and the cursor being opened. > > The race condition is brief, and I've not been able to hit it in > > testing. > > > > The thing that got me thinking this is real: patch 0004 modifies > > PortalLockCachedPlan() so that whenever it replans, it also rebuilds > > the queryDesc. That's pretty much the fix I'd expect for this, which > > makes me suspect somebody hit it at some point. But 0004 only applies > > that fix on the new pruning-aware code path, and it was mentioned in > > the thread that 0001 to 0003 might land before 0004. If so, master > > would carry the bug in the gap between the two. > > > > I suspect a way to deal with it would be to move the CreateQueryDesc > > call in the SELECT case to after PortalLockCachedPlan() returns, which > > is what the other portal strategies already seem to do. Alternatively, > > you could bring 0004's changes in this area into 0001 and have > > PortalLockCachedPlan() always rebuild the queryDesc when it replans. > > > > If I've got this wrong and there's some lifetime mechanism I missed > > that keeps the old plan's memory alive, then it's a non-issue and I'm > > misreading the code. If I have got it wrong, could you please add > > comments to make what is going on clearer? > > It's a real bug. > > You're right that if PortalLockCachedPlan() replans, the QueryDesc > created before the call still points at the old PlannedStmt from the > released plan. And yes, 0004 happens to fix it by rebuilding the > QueryDesc inside PortalLockCachedPlan(), but 0001 through 0003 are > broken on their own. > > Attached is an updated set with the fix: CreateQueryDesc now runs > after PortalLockCachedPlan() returns, as you suggested. That said, > I'll probably focus first on settling the plancache refactoring that > spun off from this thread [1], and then start a new thread for the > pruning-aware locking work on top of it, incorporating parts of this > series. Thanks. I've done another pass. I see a reference to AcquireExecutorLocksUnpruned(), but I can't find this function. Is this supposed to be AcquireExecutorLocksPrepared()? And also I have a question about the new firstResultRels code If I've followed it right, the bit in setrefs.c records the lowest-numbered RT index from leaf_result_relids as the per-ModifyTable fallback that's used when all real targets get pruned away, and the executor side looks it up via linitial_int(node->resultRelations). For that to work those two have to pick the same RT index, and the comment justifies it with "partition expansion preserves RT index order". Where is that preservation guaranteed? And with the assertion in ExecInitModifyTable: Assert(list_member_int(estate->es_plannedstmt->firstResultRels, rti)); With writable CTEs producing more than one ModifyTable node the list has several entries, so all the assert really checks is that some recorded entry matches, not that the one recorded for this particular node matches. If that's correct, then in a case where the wrong entry happened to line up the right relation wouldn't be locked and nothing would complain. Is there something that keeps these in order somewhere? Thom