Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wAEuK-002C51-09 for pgsql-hackers@arkaria.postgresql.org; Tue, 07 Apr 2026 22:28:28 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1wAEuI-002tiv-1Z for pgsql-hackers@arkaria.postgresql.org; Tue, 07 Apr 2026 22:28:26 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wAEuI-002tim-08 for pgsql-hackers@lists.postgresql.org; Tue, 07 Apr 2026 22:28:26 +0000 Received: from mail-qk1-x731.google.com ([2607:f8b0:4864:20::731]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1wAEuG-000000016yM-1iPM for pgsql-hackers@lists.postgresql.org; Tue, 07 Apr 2026 22:28:25 +0000 Received: by mail-qk1-x731.google.com with SMTP id af79cd13be357-8cfc5941028so924831785a.1 for ; Tue, 07 Apr 2026 15:28:24 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1775600902; cv=none; d=google.com; s=arc-20240605; b=FY7ehPyhWkw3wYu4pWOtVmdpK23QsD6RBYSplMaIwhgX7r9iEn4Ba08QVGXKZrefpU aVcHZ02ryW4usZIsCW5i2lJTh4b3Vv/z3Dk7erdxdVXROV0dRyjuciRmVaGhLKla9zkd ThAJheJ28nfuA8hzpXQ3W03EUQkKJusVqQEEDy9U/y+Bkj9cgr7WzXB/RhF6HqabfkSU hz/UPxFPTImUqyR/dDxsCcPODVZCB7MByYt7ozT382gdvbXVDUqTlGa3cvjBROBANrXT CcHsuZH0go4szM+0wEKmIrUMN3c3lkWUMw4cluHLpH02GEoelFnqxqx8nLWcdwHlhpG4 Nd8g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=Eux48GjkTF+ZI5andsmtOGryzAI6nRYX3VhsLo911QE=; fh=RaW6yj/wXbNAt3m0BR61ceqNzdcVGqKwDanbGkF0QzA=; b=IZQtmuAnVAxGFnfjktdTzkydp4t3JmGmWBBZ1RBceU3IPmivPWZHsIZlqWeMuekZOo xt/Pqzb/G2gluBfbUx+t+yK9cGLOxA9yQzTnQvMDr4H7/BdnA87M+rAsDbZ47qJXhpKY 6lVoRBjAAVU+g71xpV+RouSSTs5Bp6A/ByHSHzToQDwlEZ5SJV1H/wu4CtFtI5pHfMNO NwDD0+IJsxPHHLzpUgGEwT5rEX2+TKW613Ft9V1HcidfYvcChgh76+Sqq9kmp7dV8Wh8 USsYALInSV9ucCyOK/GqU+2ZiZng3yIWmuF6+gW8nVZYoMJ6aSW9hPZUvAYMk8opyNQO u7eg==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fittl.com; s=google; t=1775600902; x=1776205702; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=Eux48GjkTF+ZI5andsmtOGryzAI6nRYX3VhsLo911QE=; b=SjMkoq1EF4/FPvPuOCKETuF7nalstWuhJ++Mbsp2LJrD1ZQomzq0KchmyeL6uP+5Ne 4+KuuF2iYnN8VEWqu6IAb2j884K1Rm/SL/TDlzwnsMdeH3O5gdV2E7j814qLS4kv1/p/ z5rb++7qZuAAXMkgvEPzjr6BHDLFqE/5k+Vgg= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1775600902; x=1776205702; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=Eux48GjkTF+ZI5andsmtOGryzAI6nRYX3VhsLo911QE=; b=ZRtVJWQB03Br/MiBybEhCTHvoxrYRrYWV1oiw7uOjAJgIoRLtfj65x8UYJydXl80/C Wnh5JjT4PpUVg6MAU8Xa5t9B9KZGKM71Id4WpxgNIP2a04HFEGtvGXABwtWi/1w4KL3r Phey+ALTheUvF0nypaRUf0RStRxJEfot/OcDC0EWQwMn1XF08dS1WRdk1HYdtNk417oW /fgm9jZ6d5BxxJcHNfnEZrKHoHUJdreJqFRwaRKfHnQNrWmiYYTpjn0VozgbLL2d1XIz jrnsOrPlOYO3KU+xhuh4mF4ltmAtRS+WDOzvKaaHQ7BnXflxntKB+etxepYe//FX4N/J B+XA== X-Forwarded-Encrypted: i=1; AJvYcCXsG1weNI4e5024dSTjUd6nZSBsxnDNVz+z6jMs3dQaXJVciYKEqpI02fvHBndt9UOLKNxzM4iAk3R41TnU@lists.postgresql.org X-Gm-Message-State: AOJu0Yx/hI6wYF33Fqe+RV8Bl97PrzjFxJwjb9iWSbpbnzBjIliqhO1w hvE6FCqWPR2VAZbWlxM6tCFlVzawE8aLx0oY2Sq5DVx0+9h9OpINMAcVm6ae9a/YigzUxoVqxBO Y74iLzROFA8fSyiXZdpY+j3eQO2IPwb8NcyRCFLv+ X-Gm-Gg: AeBDiesRrvLVDqJD30q2NMpC4+mghSTVIwAF8MWhVlqiR6ol+OoRAe0H8RuG3K8kpLP V9v+uGdy0ZI5o/Bs3pg3jXSJxOhhW4LJRB/K/StWN+36zbyQhnYMPIhTSUlRxFOY2u4Hhuwy9lJ IGmhQ5nVUUaFmb5sWGrGeLwH66o7H/PHPWQArthEK7TU6rK7p92cWhWHWsK1yMqShnUP0XgXOhJ rQ4MG0phUNiTi96YZSn4D/Fh9CO7pvYiFQBDrPM/2VJUpdCDhMSLYX4NGzTD2kU8rWYCnhGw6Mu +Pj5vzHr5vy8OEjN3eN0PLWI1hWFGpuaIVVLgt4alqa78sCRXDwlAiWD0iConRgT+V0OoRlo X-Received: by 2002:a05:620a:29c1:b0:8ca:3c67:8925 with SMTP id af79cd13be357-8d41bfc4cb6mr2704827885a.64.1775600902536; Tue, 07 Apr 2026 15:28:22 -0700 (PDT) MIME-Version: 1.0 References: <57biou6l65r7gr4nunoe6lignz2x6m3w45gihoypaez4pc46di@txj3bakhj66l> <3xbje45m5knff52mye5dfnrjdnwv7it2bzmqac3jqe66fvop4a@xvhy6zx7n6sb> In-Reply-To: From: Lukas Fittl Date: Tue, 7 Apr 2026 15:27:45 -0700 X-Gm-Features: AQROBzC9MbF3qGGU09JtLHsJRf-y38Lth0_shgyyXWAyiKfovc109bEA5f4veEA Message-ID: Subject: Re: Stack-based tracking of per-node WAL/buffer usage To: Andres Freund Cc: Heikki Linnakangas , PostgreSQL Hackers , Tomas Vondra , Peter Smith , Zsolt Parragi Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Tue, Apr 7, 2026 at 3:19=E2=80=AFPM Andres Freund w= rote: > > Hi, > > On 2026-04-07 13:30:11 -0700, Lukas Fittl wrote: > > 0001 is the change to make queryDesc->totaltime be allocated by > > ExecutorStart instead of plugins themselves, and adds a > > queryDesc->totaltime_options to have plugins request which level of > > summary instrumentation they need. This change is pretty simple, and > > could still make sense to get into 19. Because of the earlier > > Instrumentation refactoring that was pushed (thanks!) we're already > > asking extensions allocating queryDesc->totaltime to modify their use > > of InstrAlloc, so I think we might as well clean this up now. > > Hm. That's a fair argument. They indeed would have to again change next > release > > It's not a complicated change and removes more lines than it adds. > > I guess one thing I'm not sure is whether the fields shouldn't be renamed= at > the same time: > > a) To prevent extensions from continuing to set it, most of them do not t= est > against assert enabled builds. With a different name they would get a > compiler error. > > b) "totaltime" and "totaltime_options" are pretty poor descriptors of tra= cking > query level statistics. If everyone has to change anyway, this is a g= ood > occasion. > > 'query_instr[_options]'? > > > Any opinions? I think renaming makes sense - both to make sure extensions reconsider how they use it, and because "totaltime" is a bad name anyway, because its not just about timing (and hasn't been for many releases). "query_instr[_options]" seems reasonable to me, although we could drop the "query_" since it'd be "queryDesc->query_instr" vs "queryDesc->instr". > > > 0002 is just ExecProcNodeInstr moved to instrument.c, as Andres had > > suggested previously. We still get some quick performance wins from > > doing that (see end of email), and again, its a simple change, so > > could be considered if someone has bandwidth remaining. I've added a > > later patch that then does the more complex inlining and gets us the > > full speed up. > > Here it needs a few more inlines to get the full performance, otherwise i= t > doesn't inline all the helpers. I think on balance I didn't like the > prototype in instrument.h, that's too widely included, and it might even = cause > some circularity issues. It seems better to do the somewhat ugly thing a= nd > have the prototype be in executor.h. Yeah, that makes sense. > > > 0002 measurements (with current master and TSC clock source used for > > timing, best of three): > > > > CREATE TABLE lotsarows(key int not null); > > INSERT INTO lotsarows SELECT generate_series(1, 50000000); > > VACUUM FREEZE lotsarows; > > With the somewhat more extreme benchmark I used in the rdtsc thread and t= he > added inline mentioned above I see a bit bigger wins. See the attached > explainbench.sql - it doesn't quite cover all the combinations, but I thi= nk it > gives a good enough overview. > > c=3D1 pgbench -f ~/tmp/explainbench.sql -P5 -r -t 10 > > master: > statement latencies in milliseconds and failures: > 200.800 0 SELECT pg_prewarm('pgbench_accounts'); > 0.098 0 PREPARE query AS SELECT * FROM pgbench_account= s OFFSET 5000000 LIMIT 1; > 212.010 0 EXPLAIN (ANALYZE, BUFFERS OFF, WAL OFF, TIMING= OFF) > 268.648 0 EXPLAIN (ANALYZE, BUFFERS OFF, WAL OFF, TIMING= ON) > 232.421 0 EXPLAIN (ANALYZE, BUFFERS ON, WAL ON, TIMING O= FF) > 283.531 0 EXPLAIN (ANALYZE, BUFFERS ON, WAL ON, TIMING O= N) > 0.030 0 DEALLOCATE query; > > > 0002: > > statement latencies in milliseconds and failures: > 201.558 0 SELECT pg_prewarm('pgbench_accounts'); > 0.103 0 PREPARE query AS SELECT * FROM pgbench_account= s OFFSET 5000000 LIMIT 1; > 188.696 0 EXPLAIN (ANALYZE, BUFFERS OFF, WAL OFF, TIMING= OFF) > 244.479 0 EXPLAIN (ANALYZE, BUFFERS OFF, WAL OFF, TIMING= ON) > 223.773 0 EXPLAIN (ANALYZE, BUFFERS ON, WAL ON, TIMING O= FF) > 266.947 0 EXPLAIN (ANALYZE, BUFFERS ON, WAL ON, TIMING O= N) > 0.034 0 DEALLOCATE query; > > That's something like 4-12%. > > Pretty nice for a patch that just adds a few lines around and adds a few > inlines. Agreed. > > @@ -334,6 +334,9 @@ explain_ExecutorStart(QueryDesc *queryDesc, int efl= ags) > > > > if (auto_explain_enabled()) > > { > > + /* We're always interested in runtime */ > > + queryDesc->totaltime_options |=3D INSTRUMENT_TIMER; > > > - queryDesc->totaltime =3D InstrAlloc(INSTRUMENT_AL= L); > > Not that it's going to make a significant difference, but it is nice that= this > now would need to track less. Yup. > > Kinda wonder about having > EXPLAIN (ANALYZE BUFFERS totals_only, WAL totals_only) ...; > > in plenty cases that'd be all one needs, at substantially lower cost. True. I don't like the name "totals_only", but I like the concept. Today someone has to go to pg_stat_statements to get just the total numbers, without running them for all nodes with EXPLAIN ANALYZE (and incurring its overhead). Thanks, Lukas --=20 Lukas Fittl