Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w30RR-000oHu-2V for pgsql-hackers@arkaria.postgresql.org; Wed, 18 Mar 2026 23:36:45 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w30RP-00FUZU-2W for pgsql-hackers@arkaria.postgresql.org; Wed, 18 Mar 2026 23:36:43 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w30RP-00FUZL-1M for pgsql-hackers@lists.postgresql.org; Wed, 18 Mar 2026 23:36:43 +0000 Received: from mail-qv1-xf2d.google.com ([2607:f8b0:4864:20::f2d]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1w30RM-000000011Yh-2KdF for pgsql-hackers@lists.postgresql.org; Wed, 18 Mar 2026 23:36:42 +0000 Received: by mail-qv1-xf2d.google.com with SMTP id 6a1803df08f44-89c55a0a470so5750576d6.0 for ; Wed, 18 Mar 2026 16:36:40 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1773876999; cv=none; d=google.com; s=arc-20240605; b=Zym7p2TtsDnUAu26JNUikw1ADQVGVS4W46ef3TRrt1jLMcYjy3CNq61CYMFFOPrkrp 7R05BklQ34hGSkZpVNlOfQhCZvO/UqpJNea8XPJcwWqdwu2k2wJBpErhLCa7OL+j0heA +dEDmkXtyH0E+0OTRzlebaIfVNoe7+Ohm0LWBb9sIJtYNdkViRa+ZlBnVJi26ONeIj+A j94cJjzWFxz6LgkNFN3ayc+dif66pV67wjJWH4rhytl+ITYr/ns9BKLNB+L8C8pPFBPI mbFMHMdap46x7X+/udtbLhjNbAqnakCkwwH+RvASQjUJhy9+JcZSwkui5t7pdAFEofIf +Ptw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=L1H16wCDljVfaNijB0USG090uSWvD0SvB36v499SRj4=; fh=EKO2KrSnR0sQyp7otMpFCbFnqjFc30nwVnS+knl8zss=; b=BAFevFOhQwb/S8fuIsjZ176Y29Gf5L6/nYxsgHyf6GJ7Sn15gp4gUUDFYh2BTDtu5E IWuxFRL+K3vyjB37UoilzC2XR+zC0aIwz1mzF08tES3nVCvAJM5BWj039lzHZP07o8Sj XBFf7MDMbLKDY0eaX0z5nCJNKtcbT31c8wxVTDnG+XusMrz+wPXa+H7jJj/+BFNzc6Yg e+peH4Kbb/AsbUKLcw/QJR4a12NrKjKGUeSQo8zdVkBB3jULS8/6owgG3ewLeuxD2Kci PfFOOUWPjZ9Ebbo5fXv3A9z33fviuvPP1Y/rjdNGiWEW76DXlKNgVNrqSNiUO2GRuuDg 0V6w==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fittl.com; s=google; t=1773876999; x=1774481799; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=L1H16wCDljVfaNijB0USG090uSWvD0SvB36v499SRj4=; b=YejRWbWV297ICcBLD6nQ/2xcdKCkstTCmx5AUJxbX19odOFKlwoK2Pba8ZEr5N9RjV NY0LSJvJ+jvEsStgdKQTorR8hqmWwJpIEEZXqunsMfQ/4AF/lBoKwPZ2QZrlEnW5aeOI JeCWO40DkJa4VZdbcOeJmR+vRozL0IqkY7kuQ= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1773876999; x=1774481799; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=L1H16wCDljVfaNijB0USG090uSWvD0SvB36v499SRj4=; b=To5ciqZ8j8Y09CBLeJHG/6qCJlo5Y/THnnVFmo1brdiy4CFRjKOo168Bcg4FCSjYrA 2Esg2d9w2ta93oWd5O3k56l1Of5uBXyNIl4xVvk8u79wZwsfHw3+YDuYjq3N12r3Q/0y zd9Uzse0SYZH8+GCyUfyr6ZfUyW/W8Hy81jGL8Ve1JQfu9lUqjxjJSwe6PLJt8y61nTR rPFfukEgL5jOIkxFCse75rvgaseOZ0fsaxPgTXpOZHv71PRq5bEGj4Bkjf/OekVFF8LD t4lFmCDJekyvwYSXmjXc0ZAvF2gDoJsTGU+NFsKptOSiLcc9v88hm+dHghsFi0VlNidZ 4NFA== X-Forwarded-Encrypted: i=1; AJvYcCXtNZ0SDPBqQYB6ToPgF+4hnylF0wzqb70WMzVHV7dUuoTBbIiMPjRye0Ik6j5hfOmMYvcEnQ64Hr+o+EGd@lists.postgresql.org X-Gm-Message-State: AOJu0YyDH5ZOm++J5pX2DWEEhjzj7jvEtRfp1drTo2CtqS3NIlM5YZQP YdXuMnEXu6d3/zIxomlqbR1DcbpP4xBZfutdVpXdFC4NKy0b8rZowoldOrR5FKpHwWyggqXAhnk oCdgdhEaHbI+SPjRS5hIYeu2luI6nz3/Z1LEL7vup X-Gm-Gg: ATEYQzzrrq+vK46z8nMOFtyHUfD/TF4Q0l0h6rn6p/EluVGkWTWmUlSmI601k4wKlWM DB6zjDJJ8eQZIF/Uiebv8TIPD8F1DGY+F5yjdM9xJLb9lIkPuyPqwDEVWgS45yfMrg+dUX/UNsZ +3988g/Jg5kUVsDWjsUkGuuqmWZbuMy5PfQ9mkvn20TrfWiBaNVgXvjSVF64GpL3dmQhu6sd2En mPforj2Pg5ON27gqlZQ+3VCAKdyVsY2lgacQCiKilVlqtF2hBgfVvoCpuJ9TSrU/MPorYA2NNbu lmkDZnE5++E4sxiayXslBEu5Up4n0RLyPn7pcMCfi6n+Qlw9qwNeOnVcxdXoTrtMA0WiNQcc X-Received: by 2002:a05:6214:3f8d:b0:89c:551e:1cd8 with SMTP id 6a1803df08f44-89c6b5f2dafmr85851526d6.62.1773876999316; Wed, 18 Mar 2026 16:36:39 -0700 (PDT) MIME-Version: 1.0 References: <06086cb4-881b-4f5a-96af-f275220ff52d@vondra.me> In-Reply-To: From: Lukas Fittl Date: Wed, 18 Mar 2026 16:36:03 -0700 X-Gm-Features: AaiRm52DUtKKpFFOVClhi38weQv41IApq1oQ6DN_AL5EJbnouZH_Z1L2mQB3Sn4 Message-ID: Subject: Re: Stack-based tracking of per-node WAL/buffer usage To: Zsolt Parragi Cc: Tomas Vondra , PostgreSQL Hackers , Andres Freund , Peter Smith Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Wed, Mar 18, 2026 at 1:49=E2=80=AFPM Zsolt Parragi wrote: > > + instr_stack.stack_space *=3D 2; > + instr_stack.entries =3D repalloc_array(instr_stack.entries, > Instrumentation *, instr_stack.stack_space); > > Can't this also cause issues with OOM? repalloc_array failing, but we > already doubled stack_space. > The initialization above uses the same order, but that should be safe > as entries is initially NULL. Yeah, that's fair, I think its reasonable to do the update of instr_stack.stack_space after the allocation succeeded. I also think its good to do that even for the initialization case - it seems extremely unlikely, but the fact that entries is NULL doesn't protect us against a subsequent InstrPushStack running after an OOM and the check whether InstrStackGrow should be called runs based on the stack space, not the entries array. I have a fix for that staged for the next iteration. > + * 2) Accumulate all instrumentation to the currently active instrumenta= tion, > + * so that callers get a complete picture of activity, even after an = abort > ... > + if (idx >=3D 0) > + { > + while (instr_stack.stack_size > idx + 1) > + instr_stack.stack_size--; > + > + InstrPopStack(instr); > + } > > There seems to be one more bug in this: > > 1. EXPLAIN ANALYZE fires a trigger > 2. The trigger function throws ERROR, InstrStopTrigger never runs > 3. ResOwnerReleaseInstrumentation runs but only checks > unfinalized_children, not triggers > 4. InstrStopFinalize discards the trigger entry > 5. Trigger instrumentation information shows 0 Hmm, so I think you're correct that a trigger function error would cause any stack-based instrumentation from the trigger to get lost. In practice that doesn't matter today, since triggers never capture WAL/buffer usage data (only timing), but its maybe a design flaw because trigger instrumentation is its own thing without a defined relationship with the stack, unlike NodeInstrumentation which is registered to the query that handles the aborts. In a sense this is a similar situation to the EXPLAIN (SERIALIZE) per-tuple handling we talked about previously - we have instrumentation that's related to a query, but its not per-node, so using NodeInstrumentation doesn't really make sense. I could imagine three ways forward, if we want to address that now (vs documenting that this isn't handled but effectively not a problem): 1) We add a second list for unfinalized trigger instrumentations on QueryInstrumentation, and accumulate that too in ResOwnerReleaseInstrumentation 2) We call InstrStopFinalize in ExecCallTriggerFunc if an error was thrown from the trigger function (i.e. turn the existing PG_FINALLY block into a PG_CATCH) 3) We generalize the QueryInstrumentation children handling to allow other types of Instrumentation (i.e. not just NodeInstrumentation) I think (3) would be ideal and would let us deal with EXPLAIN (SERIALIZE) too, but is complicated by the fact that ResOwnerReleaseInstrumentation needs to have reference to the full allocated struct (not just the Instrumentation contained within) so it can call pfree to avoid leaking memory until top transaction end. In a prior iteration we had the Instrumentation allocated separately inside NodeInstrumentation (so its a pointer and can thus be freed independently / replaced with a copy on clean exit), which allows ResOwnerReleaseInstrumentation to just deal with Instrumentation, but that becomes inconvenient when dealing with parallel workers. There is an alternate design I had considered, which is to basically keep two copies of Instrumentation in NodeInstrumentation: One is a pointer to the running instrumentation (allocated in a memory context that survives long enough in an abort, and which will be freed upon abort or clean exit), and one is a direct member of the containing struct (like we have it today), and gets updated via a memcpy() upon a clean exit. I think that'd make the API easier to use and the same concept could then be applied to TriggerInstrumentation, but the big downside is that we'd be doubling memory usage because whilst we're running we'd both have the allocation in the higher memory context, and the direct member of the containing struct (to return the result to the caller). Because of that, I feel like we should do (1) or (2) for now - but I'll also wait if Andres or others have additional feedback on 0005 before proceeding with further changes. I also do think that the 0001-0004 patches are good to be committed unless anyone had additional feedback there. Thanks, Lukas -- Lukas Fittl