Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uuGUC-005uH3-Ax for pgsql-hackers@arkaria.postgresql.org; Thu, 04 Sep 2025 20:23:13 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1uuGUB-0027ED-FO for pgsql-hackers@arkaria.postgresql.org; Thu, 04 Sep 2025 20:23:11 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uuGUA-0027E5-JZ for pgsql-hackers@lists.postgresql.org; Thu, 04 Sep 2025 20:23:11 +0000 Received: from fhigh-b4-smtp.messagingengine.com ([202.12.124.155]) by makus.postgresql.org with smtp (Exim 4.96) (envelope-from ) id 1uuGU9-000Yuf-07 for pgsql-hackers@lists.postgresql.org; Thu, 04 Sep 2025 20:23:10 +0000 Received: from phl-compute-06.internal (phl-compute-06.internal [10.202.2.46]) by mailfhigh.stl.internal (Postfix) with ESMTP id 17BC27A0563; Thu, 4 Sep 2025 16:23:08 -0400 (EDT) Received: from phl-mailfrontend-02 ([10.202.2.163]) by phl-compute-06.internal (MEProxy); Thu, 04 Sep 2025 16:23:08 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=anarazel.de; h= cc:cc:content-type:content-type:date:date:from:from:in-reply-to :in-reply-to:message-id:mime-version:references:reply-to:subject :subject:to:to; s=fm1; t=1757017387; x=1757103787; bh=e3Npj/4cBi YIJNGLKPm0p8IpU7IbTh5mNsn2oxJK/SY=; b=jS6kHxKDqW+vt/dOof/1/hAwIb Tthibj7T7EiBadnTMBMChlgISXUCbgxu7qTdPxI4JLa6tzXC/pE7uQ+RPx6eZ3OD gnYUQEZWgs8WkN8vTIEi3HQioEMcGIIDBFiN722Dxc5w1gLUuPjQ1acmet2nnc1C S33qgGEKuqnmxj+yj9N/7Ta39rhyyXIpHcJMSFVbxzRcqe8Cuvz3alS2kNf/VxDu lR33w8Xi5BxdTbadu/SN3VAQYFqZ5J1f0qhxX/MmoFA5zZrDp7bEKNtqRBHaOxbS l74gZR9Ih//iqmw9mtbEYZ7iRDplruvUJ6hPZUva5ipsDzks11pPAtVDCpKA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-type:content-type:date:date :feedback-id:feedback-id:from:from:in-reply-to:in-reply-to :message-id:mime-version:references:reply-to:subject:subject:to :to:x-me-proxy:x-me-sender:x-me-sender:x-sasl-enc; s=fm1; t= 1757017387; x=1757103787; bh=e3Npj/4cBiYIJNGLKPm0p8IpU7IbTh5mNsn 2oxJK/SY=; b=WAtMb+M0Z9Sng7/IzGdgwMMlhEjMAti0e6scgv01qLCCHTi4a7x OI9bFeEzZHq/vcqfe46MjBqxixm9Znxujyco4N9L6zugt2GgmvunCv5AYjjyJ1rb SUi2S87nRRJURoZC8G0ZL8np9gUwyU0V7v5ETEuq6plDyi/vzzpEIjpvN4CpKR1U fgzZLSXwEkOW0zBAWfCQnBdiBIdQm+I83Vz80aYxzkjuAyBY3re3GY7bVnzv65VJ hhphO8lILw5SnTRJQjWG7LTuDkd3LNqJ9SPjNofDa0o6VyA5YHUhvB4Aah/2X3pf pnDruSyvtg+xto80BJZljXYZIy73o5VGkZA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeeffedrtdeggdeileehucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfurfetoffkrfgpnffqhgenuceurghi lhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmnecujfgurh epfffhvfevuffkfhggtggujgesthdtsfdttddtvdenucfhrhhomheptehnughrvghsucfh rhgvuhhnugcuoegrnhgurhgvshesrghnrghrrgiivghlrdguvgeqnecuggftrfgrthhtvg hrnhepfeffgfelvdffgedtveelgfdtgefghfdvkefggeetieevjeekteduleevjefhueeg necuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomheprghnug hrvghssegrnhgrrhgriigvlhdruggvpdhnsggprhgtphhtthhopedvpdhmohguvgepshhm thhpohhuthdprhgtphhtthhopehluhhkrghssehfihhtthhlrdgtohhmpdhrtghpthhtoh epphhgshhqlhdqhhgrtghkvghrsheslhhishhtshdrphhoshhtghhrvghsqhhlrdhorhhg X-ME-Proxy: Feedback-ID: id4a34324:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Thu, 4 Sep 2025 16:23:07 -0400 (EDT) Date: Thu, 4 Sep 2025 16:23:06 -0400 From: Andres Freund To: Lukas Fittl Cc: PostgreSQL Hackers Subject: Re: Stack-based tracking of per-node WAL/buffer usage Message-ID: References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Hi, On 2025-08-31 16:57:01 -0700, Lukas Fittl wrote: > Please find attached a patch series that introduces a new paradigm for how > per-node WAL/buffer usage is tracked, with two primary goals: (1) reduce > overhead of EXPLAIN ANALYZE, (2) enable future work like tracking estimated > distinct buffer hits [0]. I like this for a third reason: To separate out buffer access statistics for the index and the table in index scans. Right now it's very hard to figure out if a query is slow because of the index lookups or finding the tuples in the table. > 0001: Separate node instrumentation from other use of Instrumentation struct > > Previously different places (e.g. query "total time") were repurposing > the per-node Instrumentation struct. Instead, simplify the Instrumentation > struct to only track time, WAL/buffer usage, and tuple counts. Similarly, > drop the use of InstrEndLoop outside of per-node instrumentation. Introduce > the NodeInstrumentation struct to carry forward the per-node > instrumentation information. It's mildly odd that the two types of instrumentation have a different definition of 'total' (one double, one instr_time). > 0003: Introduce stack for tracking per-node WAL/buffer usage > From 4375fcb4141f18d6cd927659970518553aa3fe94 Mon Sep 17 00:00:00 2001 > From: Lukas Fittl > Date: Sun, 31 Aug 2025 16:37:05 -0700 > Subject: [PATCH v1 3/3] Introduce stack for tracking per-node WAL/buffer usage > diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c > index b83ced9a57a..1c2268bc608 100644 > --- a/src/backend/executor/execMain.c > +++ b/src/backend/executor/execMain.c > @@ -312,6 +312,7 @@ standard_ExecutorRun(QueryDesc *queryDesc, > DestReceiver *dest; > bool sendTuples; > MemoryContext oldcontext; > + InstrStackResource *res; > > /* sanity checks */ > Assert(queryDesc != NULL); > @@ -333,6 +334,9 @@ standard_ExecutorRun(QueryDesc *queryDesc, > if (queryDesc->totaltime) > InstrStart(queryDesc->totaltime); > > + /* Start up per-query node level instrumentation */ > + res = InstrStartQuery(); > + > /* > * extract information from the query descriptor and the query feature. > */ > @@ -382,6 +386,9 @@ standard_ExecutorRun(QueryDesc *queryDesc, > if (sendTuples) > dest->rShutdown(dest); > > + /* Shut down per-query node level instrumentation */ > + InstrShutdownQuery(res); > + > if (queryDesc->totaltime) > InstrStop(queryDesc->totaltime, estate->es_processed); Why are we doing Instr{Start,Stop}Query when not using instrumentation? Resowner stuff ain't free, so I'd skip them when not necessary. > diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c > index d286471254b..7436f307994 100644 > --- a/src/backend/executor/execProcnode.c > +++ b/src/backend/executor/execProcnode.c > @@ -823,8 +823,17 @@ ExecShutdownNode_walker(PlanState *node, void *context) > > /* Stop the node if we started it above, reporting 0 tuples. */ > if (node->instrument && node->instrument->running) > + { > InstrStopNode(node->instrument, 0); > > + /* > + * Propagate WAL/buffer stats to the parent node on the > + * instrumentation stack (which is where InstrStopNode returned us > + * to). > + */ > + InstrNodeAddToCurrent(&node->instrument->stack); > + } > + > return false; > } Can we rely on this being reached? Note that ExecutePlan() calls ExecShutdownNode() conditionally: /* * If we know we won't need to back up, we can release resources at this * point. */ if (!(estate->es_top_eflags & EXEC_FLAG_BACKWARD)) ExecShutdownNode(planstate); > +static void > +ResOwnerReleaseInstrStack(Datum res) > +{ > + /* > + * XXX: Registered resources are *not* called in reverse order, i.e. we'll > + * get what was first registered first at shutdown. To avoid handling > + * that, we are resetting the stack here on abort (instead of recovering > + * to previous). > + */ > + pgInstrStack = NULL; > +} Hm, doesn't that mean we loose track of instrumentation if you e.g. do an EXPLAIN ANALYZE of a query that executes a function, which internally triggers an error and catches it? I wonder if the solution could be to walk the stack and search for the to-be-released element. Greetings, Andres Freund