Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wTuk9-000pTm-0V for pgsql-hackers@arkaria.postgresql.org; Mon, 01 Jun 2026 04:59:17 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1wTuk5-008rsw-2j for pgsql-hackers@arkaria.postgresql.org; Mon, 01 Jun 2026 04:59:14 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wTuk5-008rsn-1V for pgsql-hackers@lists.postgresql.org; Mon, 01 Jun 2026 04:59:13 +0000 Received: from mail-pj1-x1031.google.com ([2607:f8b0:4864:20::1031]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1wTuk3-00000000b4U-1miX for pgsql-hackers@postgresql.org; Mon, 01 Jun 2026 04:59:13 +0000 Received: by mail-pj1-x1031.google.com with SMTP id 98e67ed59e1d1-36dac5d5da0so274555a91.2 for ; Sun, 31 May 2026 21:59:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1780289948; x=1780894748; darn=postgresql.org; h=content-transfer-encoding:mime-version:user-agent:references :in-reply-to:from:subject:cc:to:message-id:date:from:to:cc:subject :date:message-id:reply-to; bh=Q0BvfVW9dEdVKkHQqhIBmKj5ltkSqAX5VU5izVm7YLQ=; b=WsgUma7FARiEAqWRuvTyJgPTQrI6Rw0gvwxgg2M0KsuaQWiMfYOCIqCoBM43RPcbf8 em+2pSmatRNtdDnciHqQEFuhLyAgkrL0AYoWtR5QlWi2A4/gyuYEeN0lXS0T0lGWdGo5 b6+52Bsf4Eks8k2OqVewZIhKs3M7uSiKMlGgnO9kxT8qCkIWOrFeWG0NfsBrC6lpczJJ x2vVfjmiwTscItFXmr9OhpzUGvda0sq4phQ79xB8PaD7k4ttxIlqKMcghgP/Cxd+ZQTl 1zOs6Pm/dzktIC4SqckCPqOVLG9k1a1gKemUPehhxSfYm1Q31Oeo330izfL9ajG1ULXf cjMQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1780289948; x=1780894748; h=content-transfer-encoding:mime-version:user-agent:references :in-reply-to:from:subject:cc:to:message-id:date:x-gm-gg :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=Q0BvfVW9dEdVKkHQqhIBmKj5ltkSqAX5VU5izVm7YLQ=; b=TXqUgPiWJrxqIrkWXdkjMKuRmttvfQRGr7qBYXsNBm6hPT9cmuqrXyqke/CkrtP4Hj eqrRV3OaPKgVPxXMj6HVdELtUXTSsPs3+Fj5pzOJRMVe3+umyMxb9Qy8c0dIXoBNcAa9 X+HQpE9SwACi4rtqqaDm3X7zho+6P0KPw8JkYDVqWnVmDo3IoDWTqyyG7SWsQDQhy7Ft alCnrTFP7hnSYSizW5fh9Z0KIW7kduOGUlzEnf61GWsT/ohyvtP0dICCi8g9pb0awz2I H/dK6LlR41vtvO1FzeCK5hF8FraHKdrIKskcSzncLGkqjE/Y5cH82HW1elSCYyKO6Vcj ob/w== X-Forwarded-Encrypted: i=1; AFNElJ85unKvPZyJ7FR12Zk/5bOoP+0noA5s2nC9G6WQHc33U3949TH6QG7VXzKJq8fiNff1R2fl5e2Vcbw/6ilo@postgresql.org X-Gm-Message-State: AOJu0Yy5g9UUgnmR+Qx3U2yAMezkIMffNw+ge2RdiH4IVXLberpfX2xf Wu01FFdd0hIbpfvnQdPX5U2tGyR8M5dUAMJIgf9DnWfWoQRGIXG7FHer X-Gm-Gg: Acq92OHy3auGA+9NLDGCfl2rOea6v4Nb23DB6F4Yn29xBqw/Y2FbYEvW2Yp0EwMCAWi ZWW61ChTdld+6mzignacc0ddPWs6O5Qdb+XrC+ODY1xLkjAN7e9IzLVMWD/xuSwdxGo+TG8cs/4 oADXqq67iTq9Hf3Vs7/kgMHPUY9oQdft9SacOigpABDpqXP0T3+9aEa9QqSq0LaYN9DUrVCijWz eH1YoK+Z+nSZhYXWncsDoc7iWncYxGe1NAl/P3dfcRHanmWg2AVvuGvtOY8emLMZB2P5MxihUo7 58rSrdRlseQ8sJCf1u6S4Ig9gRPNhu1qKM+r/qYmnanBgMRX3lQehX6awIbvAuXBahZSQUzLswh GNgC1TaWM8D+FE865O1pEMzLYCOTy+ai94LnAYIM7TzIc1wBiuCP/zBJGpz6Cs1WOQpn3IXgztU CySnLTWUFHBQfta7nrVap9JR3eBSWq9FwwvlDtqRePBKQB63HJZUtA0bxB9TniUOxhncZqg4HoF QIoO/pCyQ== X-Received: by 2002:a17:90a:ce18:b0:366:2e1f:393 with SMTP id 98e67ed59e1d1-36c501fc1d9mr5577386a91.21.1780289948168; Sun, 31 May 2026 21:59:08 -0700 (PDT) Received: from localhost (KD036014041111.ppp-bb.dion.ne.jp. [36.14.41.111]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-36bbfc971c8sm9204781a91.3.2026.05.31.21.59.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 31 May 2026 21:59:07 -0700 (PDT) Date: Mon, 01 Jun 2026 13:58:58 +0900 (JST) Message-Id: <20260601.135858.1116584574478485492.horikyota.ntt@gmail.com> To: samimseih@gmail.com Cc: lukas@fittl.com, pgsql-hackers@postgresql.org Subject: Re: Improve pg_stat_statements scalability From: Kyotaro Horiguchi In-Reply-To: References: User-Agent: Mew version 6.8 on Emacs 29.4 Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Hello. I only reviewed v3-0001 so far, and the comments below are limited to that patch. Some of them are higher-level observations rather than comments on individual code fragments. +Datum +pg_stat_report_anytime(PG_FUNCTION_ARGS) The name pg_stat_report_anytime() feels a bit confusing to me. While "report" makes sense from an implementation perspective because the function ultimately reports local statistics to the shared statistics subsystem, that meaning is not very obvious from the perspective of a SQL-visible function. Likewise, I'm not sure that "report_anytime" clearly conveys what the function actually does. Although this is not intended for general users, we already have pg_stat_force_next_flush(), and it seems preferable to follow a similar naming convention. With that in mind, perhaps something like pg_stat_flush_anytime_stats() would be more descriptive. +void +pgstat_report_anytime_stat(void) In the previous thread, I understood that there was discussion about introducing a flush interval of around one second, and that the direction later shifted toward reusing PGSTAT_MIN_INTERVAL. While the introduction of a manually triggered API changes when flushes are initiated, it does not seem to change the need for limiting the flush frequency itself. As far as I can see, the current implementation does not apply any throttling to calls of this API. In theory, a large number of backends could invoke it frequently and generate a high flush rate. For example, if 1000 backends were to call it once per second, that would result in 1000 shared-stats updates per second. Whether such a usage pattern would occur in practice is a separate question. However, given that existing pgstat code uses PGSTAT_MIN_INTERVAL to limit flush frequency, it seems to me that anytime stats should probably retain a similar restriction. - Assert(did_flush || nowait); + /* + * When nowait is false we block for the lock, so the only reason a + * flush_pending_cb can legitimately return false is that the entry + * has active transaction state that must not be freed yet (e.g. + * relation stats with trans != NULL). That situation only arises + * mid-transaction, hence the IsTransactionOrTransactionBlock() check. + */ + Assert(did_flush || nowait || IsTransactionOrTransactionBlock()); Given the current design, the assertion itself looks reasonable to me, since we're now allowing flush_pending_cb() to return false during mid-transaction flushes. However, I find the comment a bit difficult to reconcile with the actual condition. The comment explains a specific reason why flush_pending_cb() may legitimately return false, while the assertion itself only distinguishes between transaction and non-transaction contexts. As I understand it, the intent here is to keep the existing requirement outside a transaction while relaxing it for mid-transaction flushes. If so, I wonder whether expressing the assertion as Assert(IsTransactionOrTransactionBlock() || (did_flush || nowait)); would make that intent a bit more obvious. If that's indeed the intent, perhaps the comment could also be phrased in terms of "the assertion is relaxed during a transaction" rather than describing a specific reason why flush_pending_cb() may return false. * Ignore entries that didn't accumulate any actual counts, such as - * indexes that were opened by the planner but not used. + * indexes that were opened by the planner but not used. The entry cannot + * be freed if there is active transaction state, since + * AtEOXact_PgStat_Relations will still merge counters into it. */ if (pg_memory_is_all_zeros(&lstats->counts, sizeof(struct PgStat_TableCounts))) - return true; + return (lstats->trans == NULL); I may be missing something, but I think there may be a more fundamental issue behind this change. Historically, flushing and freeing an entry were effectively the same decision because flushing only happened after transaction end. With anytime flushes, however, those become separate concerns. A callback may be able to flush everything that is appropriate for the current flush context, while the caller may still need to keep the entry around for later transaction-end processing. That makes it hard to reason about checks such as pg_memory_is_all_zeros(&lstats->counts, ...). This check still appears to tell whether the counters themselves are zero, but the proposed change makes it look as if that is no longer enough to determine whether the entry is "empty", because entry lifetime is also folded into the callback result. I wonder whether it would be cleaner for the caller to make the lifetime decision, based on the kind of flush being performed, and for the callback to be told that flush context explicitly. For example, the callback could be passed whether this is an anytime flush or a transaction-boundary flush, flush the counters that are appropriate for that context, and then return whether any counters remain. That way, the callback would not need to infer entry lifetime from transaction state, and the return value could keep a simpler meaning. Regards. -- Kyotaro Horiguchi NTT Open Source Software Center