Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vq9YU-005Y1Y-2w for pgsql-hackers@arkaria.postgresql.org; Wed, 11 Feb 2026 12:42:56 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vq9YS-003tOy-0G for pgsql-hackers@arkaria.postgresql.org; Wed, 11 Feb 2026 12:42:52 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vq9YR-003tOq-2R for pgsql-hackers@lists.postgresql.org; Wed, 11 Feb 2026 12:42:52 +0000 Received: from mail-wr1-x42a.google.com ([2a00:1450:4864:20::42a]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1vq9YQ-000000009Ge-0XCa for pgsql-hackers@lists.postgresql.org; Wed, 11 Feb 2026 12:42:51 +0000 Received: by mail-wr1-x42a.google.com with SMTP id ffacd0b85a97d-436356740e6so4552390f8f.2 for ; Wed, 11 Feb 2026 04:42:49 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1770813768; cv=none; d=google.com; s=arc-20240605; b=PLq1mk4Dz6llQyqWjgPjF7UVWDj7KC+mSK8qa0sUiOzOhYL29QoyZKNo1kIDdp2Zeg iTkLScBlNPSgC1F4DMQgYLEVzOVvoLXz339SdNMVq/uceZt6mKA9jcx6/c6D64B5NrH5 JNzV4AyqDR7seHxhDU6KLX+Sv7V5ZsW7Zzn8wMFTd3L1aacFwv0C6MtyZtVR5eoTgeQl R2RQAjisOaKPZi9hzsrBVkiUaZOFNdlsCwHXI5jhVSVHRdslSmzvNHa71dcpPZCcZynj x0pooGMIvx5A38fMfKpDN0FKOyPcr6CjQSGmXFAwvWmdVxBSXkB/mDXsUDr10Hc9DK8o fM6A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:dkim-signature; bh=sErt9eTdvHaXBAdaXccRZLGJF7Nzn+kdI0nzyfjHRLw=; fh=6oPqKe4Dq3Rq4aYHVsIFauLSc3P7U5uaYpKtUm4yIw8=; b=hY1aJayDOO5tBAp0mlTLRgBiSH2p1wi6hNBS6An2Ju+tFzuTb3xhjL186d1wffpKPW w24ZM/3GSIBX9MA4CfH2Xr2MexyBkgmNtXiIi1oeo+U4rYgDQIJd4soeNOZPtmdhf3QH e6sElIovIxDHmu/ZiFUhAh9InJgxAN6VqwsZO3NtjRtOMWGkF4wuFP0k7b9u9KK7a7JX qoIx3sXB8TF0G6fXZhw8DvV2uSiOx5O4N0aXv4bTf696QweoKN97jUWJgo6vME5ox3IC LRa2fwpNIzO7QLRrUgYk8iD47MYlA+BUKR1n9uzMpiHKSZzjsbUlIClN1jfB7yWPwe0O uwww==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cybertec.at; s=google; t=1770813768; x=1771418568; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=sErt9eTdvHaXBAdaXccRZLGJF7Nzn+kdI0nzyfjHRLw=; b=BRaGFqGdOLJsC3advUzHss8B+BV+sQDnjHAIVAmXWwdHiiaAuXCdQ4lb8mPOb/zJ1C hUy1IVPPYeEkpqfl+/zKAAE6UfFsQFt3Ps3gcupWUktCH3jGKtoYV0FRZDS1gy9N0LIE xTFrsOHfdBFzMZthSARZiCm1OCOi9WuAs3SjAvUR77ZZPsFvqFNdH4X48C4gf/c9NpPO lEiXNHdsxt4GVQTK+OM/IFPBoPLKOwoVBefxBoVLJQj2HbJc8fkgX441ix4TJPr5l5de Ye+KaCQLctI3b4aRbk0YNTDHYztEvvEuyPqQSemkYehgRsm7oqDL2ZC7AB6xgfpY8QRP G7kw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1770813768; x=1771418568; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=sErt9eTdvHaXBAdaXccRZLGJF7Nzn+kdI0nzyfjHRLw=; b=KbmUkapyRG+uQBzw6HHC7UfrFuXv7+5n5vfv/7FOJZ6yosFdtyi2330ULw9uBtmV/C 2u1YPMbfx67QsGmXAmheBEpEEnaWJTMEeNTeW6ACUL22DdYX8F4hvlu0Sv33Cb2YkIaJ aCIFiWH2sknJ6h/rzJIWXzXqHVyAXONyTqiY6iyRxUFFszIPefKjhhIvWND4pdRkrgdo JV98yHzd612riXq4BuwnEOgz6teABjA9VgqqZgRJUN+lrej7JQeDIpd5s2+LQlasL0C/ eNggjGOHv8EAyl4MpJn+JxOglEa+P6Sw3Gti0yuujlQXZYR8czjNi5/uaELygZf3jmfG eSmA== X-Forwarded-Encrypted: i=1; AJvYcCVDBJpC7qVpGJzp5dBLpRx3w4SxF7dJ4Y24sNjYgX+71ANoKvZ3W5ziJ6pE57qmrW60Wk6SbdYdqo6dSFPM@lists.postgresql.org X-Gm-Message-State: AOJu0YzhzW/z2nrehzEJJ7UusrVwodepeAzjQ+xi5tLv7EckpVyXpz9O uLmBSwWPQXhTeeZ7ULYx5+TCpKC53TmFtvNKItzQ33r+mjgxmY871SWDS8zn4HLkjsoB2enFPJE wb2ruWTjBtFt7uyssEcXE2c/Z8hwdxfdSpLCqJOmOxg== X-Gm-Gg: AZuq6aKLr4eIAIvVOdtqHkhu4QcjURvPL7HrZmWgBtfk+rk726xa+z+wsKONE2GzhIM ayGEBqajyx2sMVlKU0DiTIKuasTxM5odbpWSO/zkYooMfpUiazz2rfbk9Pe/dU/e+r2+E2jODLt SmZUGBaClQTngMusfzjEeHcadXblnJyIsw6hsJ4keNpELpvXXBGCM3Khg/att86wIJkqSN0yiDo oTe++FLFft6QQUP3+GSni8B4XxkoR3keQJJl9AsHVieLq7fZr4w8detU9xhDPy8ySvJBSWvmQHF PCTKWsQM X-Received: by 2002:a05:6000:1447:b0:435:96ec:679e with SMTP id ffacd0b85a97d-43782c25634mr3967504f8f.23.1770813768417; Wed, 11 Feb 2026 04:42:48 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Ants Aasma Date: Wed, 11 Feb 2026 14:42:36 +0200 X-Gm-Features: AZwV_Qj83tHxpZreQEEk436kVeOnoBJgNIzyOHuh5wW8n2vc5DBJhmH7SCBwV-s Message-ID: Subject: Re: pg_stat_io_histogram To: Jakub Wartak Cc: Andres Freund , PostgreSQL Hackers Content-Type: text/plain; charset="UTF-8" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Tue, 27 Jan 2026 at 14:06, Jakub Wartak wrote: > > Hm. Isn't 128us a pretty high floor for at least reads and writes? On a good > > NVMe disk you'll get < 10us, after all. > > I was blind and concentrated way too much on the bad-behaving I/O rather than good > I/O - let's call it I/O negativity bias 8) > > Now v2 contains the min bucket lowered to 8us (but max then is just ~131ms, I > didn't want it to use more than 64b total, 16*4b (uint32)=64b and well > 16*8b(uint64)=128b already, so that's why it's capped max at 131072us right now). I have toyed around with similar histogram implementations as I have dealt with multiple cases where having a latency histogram would have made diagnosis much faster. So thank you for working on this. I think it would be useful to have a max higher than 131ms. I've seen some cases with buggy multipathing driver and self-DDOS'ing networking hardware where the problem latencies have been in the 20s - 60s range. Being able to attribute the whole time to I/O allows quickly ruling out other problems. Seeing a count in 131ms+ bucket is a strong hint, seeing a count in 34s-68s bucket is a smoking gun. Is the main concern for limiting the range cache-misses/pollution when counting I/O or is it memory overhead and cost of collecting? It seems quite wasteful to replicate the histogram 240x for each object/context/op combination. I don't think it matters for I/O instrumentation overhead - each backend is only doing a limited amount of different I/O categories and the lower buckets are likely to be on the same cache line with the counter that gets touched anyway. For higher buckets the overhead should be negligible compared to the cost of the I/O itself. What I'm worried about is that this increases PgStat_PendingIO from 5.6KB to 30KB. This whole chunk of memory needs to be scanned and added to shared memory structures element by element. Compiler auto vectorization doesn't seem to kick in on pgstat_io_flush_cb(), but even then scanning an extra 25KB of mostly zeroes on every commit doesn't seem great. Maybe making the histogram accumulation conditional on the counter field being non-zero is enough to avoid any issues? I haven't yet constructed a benchmark to see if it's actually a problem or not. Select only pgbench with small shared buffers and scale that fits into page cache should be an adversarial use case while still being reasonably realistic. I'm not familiar enough with the new stats infrastructure to tell whether it's a problem, but it seems odd that pgstat_flush_backend_entry_io() isn't modified to aggregate the histograms. Regards, Ants Aasma