Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uSatG-009yeF-5o for pgsql-hackers@arkaria.postgresql.org; Fri, 20 Jun 2025 12:30:42 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1uSatD-001F3E-4T for pgsql-hackers@arkaria.postgresql.org; Fri, 20 Jun 2025 12:30:39 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uSatC-001F36-Pn for pgsql-hackers@lists.postgresql.org; Fri, 20 Jun 2025 12:30:39 +0000 Received: from mail-pj1-x1030.google.com ([2607:f8b0:4864:20::1030]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1uSatB-0037dF-1I for pgsql-hackers@postgresql.org; Fri, 20 Jun 2025 12:30:39 +0000 Received: by mail-pj1-x1030.google.com with SMTP id 98e67ed59e1d1-3138d31e40aso1628703a91.1 for ; Fri, 20 Jun 2025 05:30:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1750422635; x=1751027435; darn=postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=WULBnkISPZxxwU8WMY3wxka+l2Bfm3M+nI4xBUwxZGk=; b=TaHIQH5POoAbPTBtFkb0+xFLjDbGrltPPrtaAiMo0syKsIFq54m6+5M5pkN31/PVWc s7btWPNT4tpj/FnjmKDdfLdncB8VXrbUiSqvxitHfRRuVb02Pxgk9b2YJJPQyhxsTeJj NwHYCUJf05raBOMKUUuiltp2jZAJgXTv82lWkqyNKmHlePJtjRbIz4519Ho75kKwyGI4 Oj9+HqgqC7NUd6GtjkokLI8+K3mEu/zk/Cobxcl0ngkFay9UgXbbqnm4VlXOC56UYTz7 6cLIyxfR0+EtEJm/H0xH7KSdGewWld85WRdBwx2BrsV80cHmAJoHobKy74MlUzaS/8tG 5S6A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1750422635; x=1751027435; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=WULBnkISPZxxwU8WMY3wxka+l2Bfm3M+nI4xBUwxZGk=; b=pzdZHEAUt6ZYVBZnd9LzRtkFa2Nf/kX6BYuyWfIxSmT1ApSZCNdXRNr8SgmAw7vv4G d6RnoWodhxPGSpFunzxEUJj20yn3+RnCz1WiZGvWZ18xOXc5abM+59Vy6uwKywzvwR0v 2C4SP8il3yYZ8jdXuMM/6fR1zs19dGrKrm04L7sN5UsENEKf1KwVVyhne8ZV55BIqTm4 gYpm243dWEBOT9bq2G0p5knzl5bYOgke+DN0CNFCMseCdyWEZHsAA12Gz5dhtD6l5MAV Nkvqpu4IXBcFcF3vq+2DVvIxy0onTlcWRBxhOTBb7KSHgzDxWG1kozRB5SzeHxWqIeom wUrA== X-Forwarded-Encrypted: i=1; AJvYcCUGiB6xCauzqiaFLCjLJxPLH+UiDpQCXl3q7DvM25N7H1jZd06pfj7683t+q8S/F1wxez4XW0RZfo2BFwOB@postgresql.org X-Gm-Message-State: AOJu0YwKJh401x66CWTkc04KCRTBW8fCECvV+U5TIXoYuXFcsIlpj3wg qm54YXRNagfIcgzB0Zzx3kJK4H8odYLzZw+u88CZQBQVkgkXvD33eZbsIV0FfoFgcN3Vlg+d9gc Tj5bZ4r1Ejj48RubaE7V2jrt5hQM2AAE= X-Gm-Gg: ASbGncsD9KUHdoyO43FvzU5rUkIPdiXD7Fam6JZmh1y940A+A3qSjxsdEthNDOniEWU Ib6K6QApTweEaInjH6FsJUvJWFf9nxvVhqEJ8VNHdfhxZqX1yoORraI83zupLT5U5UUbZy5kiRv 4BKz46ucJOl07crz8pZ51uskMaAa4w6D+gFm01/E6qFFK6 X-Google-Smtp-Source: AGHT+IF/PFxh0lxs4RXRie2WFBN7BbcdhnrQa2x79OrrjXbFH9QiPu8yJ6RYlf/nFYxMYe5Skopml9R9vnCdpxqRUPQ= X-Received: by 2002:a17:90b:3508:b0:312:639:a064 with SMTP id 98e67ed59e1d1-3159d8e0b60mr4462269a91.28.1750422634920; Fri, 20 Jun 2025 05:30:34 -0700 (PDT) MIME-Version: 1.0 References: <54c35fb9-da3a-4754-ab8c-46ed0b612465@vondra.me> <684c70d7-180e-461d-9377-600c2db581ba@vondra.me> <605328.1747710381@sss.pgh.pa.us> <691261.1747755511@sss.pgh.pa.us> In-Reply-To: From: Amit Langote Date: Fri, 20 Jun 2025 21:30:18 +0900 X-Gm-Features: AX0GCFtpQWM4cl_gu31VIe7zhYnsp0ZZrUHYnBrIpWqK7eECvNff958epGNasgI Message-ID: Subject: Re: generic plans and "initial" pruning To: Tom Lane Cc: Tender Wang , Alexander Lakhin , Tomas Vondra , Robert Haas , Alvaro Herrera , Andres Freund , Daniel Gustafsson , David Rowley , PostgreSQL Hackers , Thom Brown Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Thu, May 22, 2025 at 5:12=E2=80=AFPM Amit Langote wrote: > I have pushed out the revert now. > > Note that I=E2=80=99ve only reverted the changes related to deferring loc= ks on > prunable partitions. I=E2=80=99m planning to leave the preparatory commit= s > leading up to that one in place unless anyone objects. For reference, > here they are in chronological order (the last 3 are bug fixes): > > bb3ec16e14d Move PartitionPruneInfo out of plan nodes into PlannedStmt > d47cbf474ec Perform runtime initial pruning outside ExecInitNode() > cbc127917e0 Track unpruned relids to avoid processing pruned relations > 75dfde13639 Fix an oversight in cbc127917 to handle MERGE correctly > cbb9086c9ef Fix bug in cbc127917 to handle nested Append correctly > 28317de723b Ensure first ModifyTable rel initialized if all are pruned > > I think separating initial pruning from plan node initialization is > still worthwhile on its own, as evidenced by the improvements in > cbc127917e. I've been thinking about how to address the concerns Tom raised about the reverted patch. Here's a summary of where my thinking currently stands. * CachedPlan invalidation handling: The first issue is the part of the old design where a CachedPlan invalidated during executor startup -- while locking unpruned partitions -- was modified in place to replace the stale PlannedStmts in its stmt_list with new ones obtained by replanning all queries in the enclosing CachedPlanSource's query_list. I did that mainly to ensure that replanning happens as soon as the executor discovers the plan is invalid, instead of returning to the caller and requiring them to go back to plancache.c to trigger replanning. There were many issues with making that approach work in practice, because different callers of the executor have different ways of running plans from a CachedPlan -- with pquery.c in particular being hard to refactor cleanly to support that flow. The first alternative I came up with is to place only the query whose PlannedStmt is being initialized into a standalone CachedPlanSource and create a corresponding standalone CachedPlan. "Standalone" here means that both objects are "saved" independently of the original CachedPlanSource and CachedPlan, but are still tracked by the invalidation callbacks. But thinking about it more recently, what's actually important is not whether we construct a new CachedPlan at all, but simply that we replan just the one query that needs to be run, and use the resulting PlannedStmt directly. The planner will have taken all required locks, so we don't need to register the plan with the invalidation machinery -- concurrent invalidations can't affect correctness. In that case, the replanned PlannedStmt can be treated as transient executor-local state, with no need to carry any of the plan cache infrastructure along with it. To support that, I further assume that, because replanning and execution happen essentially back-to-back, there's no opportunity for role-based or xmin-based invalidation (as is checked for a CachedPlan in CheckCachedPlan()) to affect the plan in between. If that reasoning holds, then we don't need to register the replanned statement with the invalidation machinery at all. Because we wouldn't have touched the original CachedPlan at all, the stale PlannedStmts in it wouldn't be replaced until the next GetCachedPlan() call triggers replanning. I'm willing to accept that as a tradeoff for a less invasive design to handle replanning in the executor. Finally, it's worth noting that the executor is always passed the entire CachedPlan, regardless of which individual statement is being executed. Without per-statement validity tracking, it's hard for the executor to tell whether replanning is actually needed for a given query when the CachedPlan is marked invalid (is_valid=3Dfalse), making it impossible to selectively replan just one. To support that, what I would need is validity tracking at the level of individual PlannedStmts -- and perhaps even Querys -- in the source's query_list, with the current is_valid flag effectively serving as the logical AND of all the individual flags. We didn't need that in the old design, because we'd replace all statements to mark the CachedPlan valid again -- though Tom was right to point out flaws in the assumption that setting is_valid like that was actually safe. * ExecutorStart() interface damage control: The other aspect I=E2=80=99ve been thinking about is how to contain the changes required inside ExecutorStart(), and limit the disruption to ExecutorStart_hooks in particular, while keeping changes for outside callers narrowly scoped. In the previous patch, pruning, locking, and invalidation checking were all done inside InitPlan(), which is called by standard_ExecutorStart() -- an implementation choice that was potentially disruptive to extensions using ExecutorStart_hook. Since such hooks are expected to call standard_ExecutorStart() to perform core plan initialization, they would have to check afterward whether the plan had actually been initialized successfully, in case an invalidation occurred during InitPlan(). That wasn=E2=80=99t optional, and = it made it easy for hook authors to miss the fact that standard_ExecutorStart() could return without initializing the plan, breaking expectations that were previously reliable. Separately, for top-level callers of the executor, the patch introduced a new entry point, ExecutorStartCachedPlan(), to avoid requiring each caller to implement its own replanning loop. But that approach was also awkward, since it required switching to a nonstandard function just to get correct behavior. What I=E2=80=99m thinking now is that we should instead move the logic for pruning, deferred locking, and replanning directly into ExecutorStart() itself. In the reverted patch, callers were affected mainly because they had to choose between ExecutorStart() and a new entry point, ExecutorStartCachedPlan(), which existed solely to handle invalidation and replanning. That divergence from the standard API made things awkward at the call site. In contrast, the design I=E2=80=99m proposing avoids any need for new execu= tor entry points -- ExecutorStart() retains its original signature and behavior, with the added benefit that replanning and pruning are now handled internally before hooks or standard initialization logic are invoked. The design requires moving some code from standard_ExecutorStart() -- specifically the code that sets up the EState and parameters -- and from InitPlan() -- namely, the parts that initialize the range table, partition pruning state, and perform ExecDoInitialPruning(). The callers of ExecutorStart() do still need to ensure that they pass the CachedPlan, the CachedPlanSource, and the query_index in QueryDesc via CreateQueryDesc(). The executor=E2=80=99s external API remains unchange= d. Importantly, this restructuring would not require any behavioral changes for existing ExecutorStart_hook implementations. From a hook=E2=80= =99s point of view, this is a code motion change only. Hooks are still invoked at the same point, but they=E2=80=99re now guaranteed to receive a plan that is valid and ready for execution. This avoids the control flow surprises introduced by the reverted patch -- specifically, the need for hooks to detect whether standard_ExecutorStart() had completed successfully -- while preserving the executor=E2=80=99s API and execution contract as they exist in master. I=E2=80=99ll hold off on writing any code for now -- just wanted to lay out this direction and hear what others think, especially Tom. -- Thanks, Amit Langote