From: Ants Aasma
Date: Wed, 18 Feb 2026 19:37:16 +0200
Subject: Re: pg_stat_io_histogram
To: Jakub Wartak
Cc: Andres Freund, PostgreSQL Hackers

On Thu, 12 Feb 2026 at 12:31, Ants Aasma wrote:
> I meant this:
>
> @@ -287,6 +287,7 @@ pgstat_io_flush_cb(bool nowait)
>      bktype_shstats->times[io_object][io_context][io_op] +=
>          INSTR_TIME_GET_MICROSEC(time);
>
> +    if (PendingIOStats.counts[io_object][io_context][io_op] > 0)
>      for(int b = 0; b < PGSTAT_IO_HIST_BUCKETS; b++)
>          bktype_shstats->hist_time_buckets[io_object][io_context][io_op][b] +=
>              PendingIOStats.pending_hist_time_buckets[io_object][io_context][io_op][b];
>
> Most object/context/op combinations will have a 0 count, so no point in actually looking at the histogram.

For this to be of any use, the memset that comes after lock release would also have to be adjusted to be conditional.

I was also able to convince clang and gcc to vectorize these loops. I had to split the innermost loop so the time calculation with the /1000 for microseconds conversion and the conditional histogram loop are done separately, mark all the loops with nounroll pragmas, and tag the innermost loop for vectorization by clang.
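
To make the conditional flush plus conditional reset idea concrete, here is a minimal standalone sketch. It is not the patch code: the struct, the array dimensions and the function name flush_pending are all made up for illustration, there is no locking, and the reset happens in the same pass rather than after a lock release.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical dimensions, chosen only for this sketch. */
#define N_OBJECTS  3
#define N_CONTEXTS 5
#define N_OPS      8
#define N_BUCKETS  16

/* Hypothetical stand-in for the pending/shared I/O stats structs. */
typedef struct IoStats
{
    uint64_t counts[N_OBJECTS][N_CONTEXTS][N_OPS];
    uint64_t hist[N_OBJECTS][N_CONTEXTS][N_OPS][N_BUCKETS];
} IoStats;

/*
 * Add pending counters into the shared ones. A histogram is only read,
 * and only reset, when its counter is non-zero; slots with a zero count
 * are assumed to have an all-zero histogram already.
 * Returns the number of histograms that were actually visited.
 */
int
flush_pending(IoStats *shared, IoStats *pending)
{
    int visited = 0;

    for (int obj = 0; obj < N_OBJECTS; obj++)
    {
        for (int ctx = 0; ctx < N_CONTEXTS; ctx++)
        {
            for (int op = 0; op < N_OPS; op++)
            {
                uint64_t n = pending->counts[obj][ctx][op];

                if (n == 0)
                    continue;   /* nothing recorded, skip the histogram */

                shared->counts[obj][ctx][op] += n;
                for (int b = 0; b < N_BUCKETS; b++)
                    shared->hist[obj][ctx][op][b] +=
                        pending->hist[obj][ctx][op][b];

                /* Conditional reset instead of one memset of everything. */
                pending->counts[obj][ctx][op] = 0;
                memset(pending->hist[obj][ctx][op], 0,
                       sizeof(pending->hist[obj][ctx][op]));
                visited++;
            }
        }
    }
    return visited;
}
```

With only one object/context/op slot populated, a single histogram is read and cleared, and the remaining slots are never touched.
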
But looking at the benchmark results below, probably not worth the effort.

>> .. but the main problem, even if I do all of that I won't be able to
>> reliably measure the impact, probably the best I could say is
>> "runs good as well as master, +/- 3%".
>>
>> Could you somehow help me with this? I mean should we reduce the scope(remove
>> context) and add that "if"?
>
>
> I think if we only aggregate histograms conditionally, then having a ton of different histograms is less of a problem. Only the histograms that have any data will get accessed. The overhead is limited to the memory usage which I think is acceptable.
>
> I'll run a few benchmarks on what I have available here to see if I can tease out anything more than the no effect with a 3% error margin we have today.

I benchmarked with gcc 15.2 with "-O2 -march=x86-64 -fno-omit-frame-pointer" to match what most users have. The CPU is Ryzen 9 9900X. I concentrated on two codepaths, pgstat_io_flush_cb and pgstat_count_io_op_time. Configuration is default except the following:

track_io_timing = 'on'
io_method = 'io_uring'
io_combine_limit = '1'
effective_io_concurrency = '1'

For checking if aggregating statistics has an effect I used pgbench scale 100 in read only mode, 10 60s runs each. It will do 1-2 reads from page cache per select.

    bin    | concurrency |   avg    | stddev_fraction |  diff
-----------+-------------+----------+-----------------+---------
 pg-iohist |           1 |  92927.2 |         0.00254 | 1.00735
 pg-master |           1 |  92248.9 |         0.00214 |
 pg-iohist |          12 | 618342.3 |         0.00828 | 1.00035
 pg-master |          12 | 618127.9 |         0.00819 |
 pg-iohist |          24 | 591228.1 |         0.00889 | 0.98858
 pg-master |          24 | 598058.5 |         0.00846 |

perf measurement shows 0.00% samples in pgstat_io_flush_cb, and 0.07% in pgstat_count_io_op_time. After checking the logic in pgstat_report_stat() these make sense - stats are aggregated at most once per second per backend.

For the I/O collection, I tried using prewarm, but got really noisy results from it.
So instead I created a table with 100k rows with one row per page, vacuumed it and benchmarked select count(*) over it. Interestingly, setting effective_io_concurrency = 1 made the results both more consistent and faster.

    bin    |  avg  | stddev_fraction |  diff
-----------+-------+-----------------+---------
 pg-iohist | 7.526 |         0.01012 | 0.99396
 pg-master | 7.572 |         0.01186 |

perf measurement shows 0.40% spent in pgstat_count_io_op_time. With a -march=native build it goes up to 0.62%, mostly thanks to tps going up by 30%. Checksum calculations really love AVX-512.

I think performance-wise the patch is fine as is; there is negligible overhead even in the most adverse conditions.

I still want to look at the memory overhead more closely. The 30kB per backend seems tolerable to me, but I think having it in PgStat_BktypeIO is not great. This makes PgStat_IO 30kB * BACKEND_NUM_TYPES bigger, or ~0.5MB. Having a stats snapshot be half a megabyte bigger for no reason seems too wasteful.

Regards,
Ants Aasma