Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wSuUb-000BlS-0Y for pgsql-hackers@arkaria.postgresql.org; Fri, 29 May 2026 10:31:05 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1wSuUY-002Eqs-0j for pgsql-hackers@arkaria.postgresql.org; Fri, 29 May 2026 10:31:02 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wSuUX-002Eqj-2s for pgsql-hackers@lists.postgresql.org; Fri, 29 May 2026 10:31:02 +0000 Received: from mail-wr1-f41.google.com ([209.85.221.41]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1wSuUV-000000007WO-2dTH for pgsql-hackers@postgresql.org; Fri, 29 May 2026 10:31:01 +0000 Received: by mail-wr1-f41.google.com with SMTP id ffacd0b85a97d-45eee266c6cso874217f8f.1 for ; Fri, 29 May 2026 03:30:58 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1780050657; cv=none; d=google.com; s=arc-20240605; b=EqWTav2iLx45+t4z+bwafmlp4GUxIHyFdaNEe67khXf7Zo1s+no9sa99brCkmyGciJ xm+fDP2w3UmEuRHA8R1f/jLyhYJovMRmkVZricSxW6Y+FSK2wR0YZPk+IpkmiYvizuYx deya5kegVr4UU1AHqDLDUenM9+vwb1rc/yeDk2z+CgDDLugq01O1O/pl9bp4p+CwsAwW KyTFdTBXMiDT+DacMfXxJhJ7Ai0ZTlWvfLJKNE+QCxCCClibBg4gIQNudxgl2jteBEvj NEMaGU7tiPp5ZIPSzFogkezxu0rzMLwzxkWckeTZQktfUOH2FAa+x99p82KrwHgV19At eVkw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version; bh=3bB2lBdMnzy58RxmPqgB9KuRCsWV5blGwY4RRc60qBw=; fh=geQcAAx7sa77yUK3CRGv6GN30ltJK5V1b4ClwfbAJK8=; b=kT5VLtGT1B0tkmUWPQDVtWfOY6eX1kDk4N4CHPZmxEB8GYG0SWpZKnb1gW5CtJabdP hEo4mLRIu+uJZfiOO+zUC9LontKzFpW73K/+kWmLbNMjfAxYETwtnHObQuBgC9CdkdHi XesMVJA6QlnT0a3H76srPrVb3HGnpRAUM+meSOGcXZzfgy6UiPJnmKH7ByXQiwlocGzY lHaduJunckxNoTEy6qFy/huORbS0nn2sTycVrxIHhAyiRM+LEdjSLtdsK0THo1jZ0Q// zVgkAs8lkhmNlNg58iqs4VVFF51bT/e+o+65zJC5wHh8Dd9HZpbZxefLrfhTBXYWvX2U 3YZQ==; darn=postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1780050657; x=1780655457; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=3bB2lBdMnzy58RxmPqgB9KuRCsWV5blGwY4RRc60qBw=; b=jJRboMdAwfzuRBsV6ZTYwEO7I9A4Is7W1fixhSdj44U1oM3ecW2x+iHTrz7VpFNabh 0rZl4aC2B4jQtr/XD4C4gFwZmg0xTkWKE2yUaBAVufYSgETxl2nM+vMsOGs70uzNddA7 N4hUrSZl9EA8iEV7DVsJIu++0RAlhiWRxyyEz5CnHDk9hsEtL4tEKzwMdS41WzgaU96Y Awk2MWs2NQRanl7gPolekNTeS+gsfYx2FZgOK7kvQUL7UWA9Z4KWoC9wp/fxdRRXJ7bT mse4COsXm70U1a3ZHOv4ibLC7o9zqDLFUG66HI9t5KiRK7D4pydlbGgZTrxCdyuLdE2q 19Zg== X-Forwarded-Encrypted: i=1; AFNElJ9nb1+oKVomGXJNkXl1v6OgVoRo7Xy4SAuYGDA6y5tqwrSYMR6l6qNnRZysPkF2KRl9ITyZyIQH5RqAr6LG@postgresql.org X-Gm-Message-State: AOJu0Yxja8ywIKuYLJ72jxitTwKb6d95D+pm2Yudky0NxjH+qXgnHEkS 8ZiCdb6wZbpc0XWhO7asQ2TG0AVBJdTjKWe/CWR4HZIedXr//m1m3HjbMtYkQKGPs/zl26P+WMT 8vWWCLhArWwl9eD0WQM72nuPtCc0fyBA= X-Gm-Gg: Acq92OE5zj0gwro6ynJCyA2De3LXNBrVR4f3LKyOF+fZa/OxxEepuXwny01t8mXgdGv 5LrTGPQNJcVOiD0mYedKve/bVA22ejzj984S3Oj2jnnw2+KoHmOaMuRjFgzk7iv9Hlu9AAO0Dq6 m4GUDT91G8YNnhQxjP7jOesDwI3T+rHm3H6Twm2VbVIfiCesc6716jRXkIqHHkW5MGof7A3Eesy cy5J+VUntVu3vOclff8Z9w9WALaDNFEaBbsvJJ70g+ai2nklatC9geFWLZM7yC8jRN/ginjmCRd Efu2r0cFT5jgsiRvnQ== X-Received: by 2002:a05:600c:3b10:b0:490:5872:e641 with SMTP id 5b1f17b1804b1-4909c0cfe40mr41659625e9.18.1780050657194; Fri, 29 May 2026 03:30:57 -0700 (PDT) MIME-Version: 1.0 References: <54c35fb9-da3a-4754-ab8c-46ed0b612465@vondra.me> <684c70d7-180e-461d-9377-600c2db581ba@vondra.me> <605328.1747710381@sss.pgh.pa.us> <691261.1747755511@sss.pgh.pa.us> In-Reply-To: From: Thom Brown Date: Fri, 29 May 2026 11:30:29 +0100 X-Gm-Features: AVHnY4KRsgVpJWNDDW0eO7jix3Xv3ljCEAO8Xb-uO0eWJWc_yeIwn8YZ99BcktY Message-ID: Subject: Re: generic plans and "initial" pruning To: Amit Langote Cc: Chao Li , Tom Lane , Tender Wang , Alexander Lakhin , Tomas Vondra , Robert Haas , Alvaro Herrera , Andres Freund , Daniel Gustafsson , David Rowley , PostgreSQL Hackers Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Fri, 29 May 2026 at 09:57, Amit Langote wrote: > > On Thu, May 28, 2026 at 10:14=E2=80=AFPM Thom Brown wrot= e: > > On Thu, 28 May 2026 at 09:14, Amit Langote wr= ote: > > > It's a real bug. > > > > > > You're right that if PortalLockCachedPlan() replans, the QueryDesc > > > created before the call still points at the old PlannedStmt from the > > > released plan. And yes, 0004 happens to fix it by rebuilding the > > > QueryDesc inside PortalLockCachedPlan(), but 0001 through 0003 are > > > broken on their own. > > > > > > Attached is an updated set with the fix: CreateQueryDesc now runs > > > after PortalLockCachedPlan() returns, as you suggested. That said, > > > I'll probably focus first on settling the plancache refactoring that > > > spun off from this thread [1], and then start a new thread for the > > > pruning-aware locking work on top of it, incorporating parts of this > > > series. > > > > Thanks. > > > > I've done another pass. I see a reference to > > AcquireExecutorLocksUnpruned(), but I can't find this function. Is > > this supposed to be AcquireExecutorLocksPrepared()? > > You're right, stale comment. It should say > AcquireExecutorLocksPrepared(). Fixed. > > > And also I have a question about the new firstResultRels code > > > > If I've followed it right, the bit in setrefs.c records the > > lowest-numbered RT index from leaf_result_relids as the > > per-ModifyTable fallback that's used when all real targets get pruned > > away, and the executor side looks it up via > > linitial_int(node->resultRelations). For that to work those two have > > to pick the same RT index, and the comment justifies it with > > "partition expansion preserves RT index order". Where is that > > preservation guaranteed? > > The ordering comes from expand_inherited_rtentry(), which adds child > partitions to the range table sequentially in partition bound order. > Since ModifyTable.resultRelations is built from the same expansion, > its first element is the lowest-numbered RT index among the leaf > partitions for that node. That is the same value > bms_next_member(leaf_result_relids, -1) returns from the Bitmapset, > because Bitmapset iteration returns members in ascending order. I've > added a comment in setrefs.c pointing to expand_inherited_rtentry() as > the source of this guarantee. > > > And with the assertion in ExecInitModifyTable: > > > > Assert(list_member_int(estate->es_plannedstmt->firstResultRels, rti)); > > > > With writable CTEs producing more than one ModifyTable node the list > > has several entries, so all the assert really checks is that some > > recorded entry matches, not that the one recorded for this particular > > node matches. If that's correct, then in a case where the wrong entry > > happened to line up the right relation wouldn't be locked and nothing > > would complain. Is there something that keeps these in order > > somewhere? > > This is a fair observation -- the Assert checks membership in the > global list rather than per-node correspondence. But node A's rti > can't accidentally pass the Assert by matching an entry recorded for > node B. Each ModifyTable node gets its own partition expansion with > distinct RT entries. In a writable CTE like: > > WITH upd1 AS (UPDATE t SET ...), > upd2 AS (UPDATE t SET ...) > UPDATE t SET ... > > each UPDATE creates a separate set of leaf partition RT entries -- > upd1 might get RT indexes 5,6,7, upd2 gets 8,9,10, and the main UPDATE > gets 11,12,13. The global firstResultRels list would be [5, 8, 11]. > When ExecInitModifyTable falls back to linitial_int(resultRelations) > for a given node, it finds that node's own entry, because the RT index > sets are disjoint across nodes. > > That said, it's worth being explicit about what protections exist at > each layer, since this is safety-critical code: > > 1. AcquireExecutorLocksPrepared(), added by 0004, locks every entry in > firstResultRels unconditionally. So regardless of which rti a > ModifyTable node falls back to, the relation will be locked. > > 2. ExecGetRangeTableRelation() has two checks when opening a relation. > For non-result relations (isResultRel=3Dfalse), it checks > es_unpruned_relids and raises an ERROR in release builds if the > relation was pruned. For result relations (isResultRel=3Dtrue), that > check is intentionally skipped -- it has to be, because at least one > result relation per ModifyTable node must remain openable even when > all partitions are pruned, since executor code paths like ExecMerge() > and ExecInitPartitionInfo() rely on resultRelInfo[0] being initialized > (see commit 28317de723b). The remaining protection for result > relations is Assert(CheckRelationLockedByMe()) inside table_open, > which fires in debug builds. > > 3. I've tightened ExecInitModifyTable to close this gap: the > all-pruned fallback path now raises an elog(ERROR) in release builds > if linitial_int(resultRelations) is not found in firstResultRels, > rather than just an Assert. This gives result relations a > production-visible check comparable to what es_unpruned_relids > provides for scan relations. > > So the net effect is that for scan relations, opening a > pruned-and-unlocked relation is caught by an ERROR in production via > es_unpruned_relids. For result relations on the all-pruned fallback > path, it's now also caught by an ERROR in production via the > firstResultRels check in ExecInitModifyTable. The locking in > AcquireExecutorLocksPrepared() ensures the relation is always locked > regardless. > > Thanks again for the review. A close look at these aspects by someone > other than me is very useful. Ah, the disjoint RT-entries point is what I was missing. I'd been reading firstResultRels as a flat list where in theory any entry could line up with any node's lookup, which is what made the assert feel potentially insufficient. If each ModifyTable's expansion produces its own non-overlapping set of leaf RT indexes then membership in the global list really is equivalent to membership in this node's own entry, and the assert is sufficient as it stands. Walking through the writable-CTE case helped. Thanks Thom