public inbox for [email protected]  

pgsql: Increase default maintenance_io_concurrency to 16
10+ messages / 5 participants

* pgsql: Increase default maintenance_io_concurrency to 16
@ 2025-03-18 13:08  Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 10+ messages in thread

From: Melanie Plageman @ 2025-03-18 13:08 UTC
  To: [email protected]

Increase default maintenance_io_concurrency to 16

Since its introduction in fc34b0d9de27a, the default
maintenance_io_concurrency has been larger than the default
effective_io_concurrency. maintenance_io_concurrency primarily
controlled prefetching done on behalf of the whole system, for
operations like recovery. Therefore it makes sense for it to have a
value equal to or greater than effective_io_concurrency, which controls
I/O concurrency for reading a relation in a bitmap heap scan.
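To make "I/O concurrency" concrete, here is a toy model of a reader that keeps at most a fixed number of block reads outstanding, which is the kind of limit effective_io_concurrency and maintenance_io_concurrency express. It is a sketch, not PostgreSQL's prefetching or read-stream code: the function names are hypothetical, and reads are assumed to complete in FIFO order, which real storage does not promise.

```python
from collections import deque


def prefetch_schedule(blocks, io_concurrency):
    """Toy model: issue reads ahead of the consumer, never exceeding
    io_concurrency outstanding requests. Reads are assumed to complete in
    the order they were issued (FIFO), which real storage does not promise."""
    if io_concurrency < 1:
        raise ValueError("io_concurrency must be at least 1")
    events = []
    in_flight = deque()
    pending = iter(blocks)
    exhausted = False
    while True:
        # Top up the queue of outstanding reads until the limit is reached.
        while not exhausted and len(in_flight) < io_concurrency:
            try:
                blk = next(pending)
            except StopIteration:
                exhausted = True
                break
            in_flight.append(blk)
            events.append(("issue", blk))
        if not in_flight:
            break
        # Hand the oldest outstanding block to the consumer.
        events.append(("consume", in_flight.popleft()))
    return events


def max_in_flight(events):
    """Peak number of reads issued but not yet consumed."""
    depth = peak = 0
    for kind, _ in events:
        depth += 1 if kind == "issue" else -1
        peak = max(peak, depth)
    return peak


events = prefetch_schedule(range(100), io_concurrency=16)
print(max_in_flight(events))  # 16
```

With io_concurrency=1 the model degenerates to issue-one, consume-one, i.e. no read-ahead at all.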

ff79b5b2ab increased effective_io_concurrency to 16, so we'll increase
maintenance_io_concurrency as well. For now, though, we'll keep the
defaults of effective_io_concurrency and maintenance_io_concurrency
equal to one another (16).

On fast, high-IOPS systems, significantly higher values of
maintenance_io_concurrency are observably beneficial [1]. However, such
values would flood low-IOPS systems and increase overall system I/O
latency.
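The trade-off in the paragraph above can be reasoned about with Little's law, a general queueing result that the commit itself does not cite: the average number of requests in flight equals throughput times latency. A minimal sketch; the device figures are hypothetical round numbers, not measurements from the linked discussion.

```python
def inflight_to_sustain(iops: float, latency_s: float) -> float:
    """Little's law: requests in flight = throughput (IOPS) x latency (s)."""
    return iops * latency_s


def latency_when_saturated(in_flight: float, device_iops: float) -> float:
    """Little's law rearranged: if a device completes at most device_iops
    requests per second, keeping in_flight requests queued implies an average
    latency of roughly in_flight / device_iops seconds."""
    return in_flight / device_iops


# Hypothetical fast device: 100k IOPS at 200 us needs about 20 reads in flight.
print(inflight_to_sustain(100_000, 0.000_2))
# Hypothetical slow device: 200 IOPS; queueing 16 reads means ~80 ms waits.
print(latency_when_saturated(16, 200))
```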

It is worth mentioning that since 9256822608f and c3e775e608f,
maintenance_io_concurrency also controls the I/O concurrency of each
vacuum worker. Since many autovacuum workers may be simultaneously
issuing I/Os, we want to keep maintenance_io_concurrency appropriately
conservative.
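A back-of-the-envelope upper bound for the concern above: if each autovacuum worker may keep up to maintenance_io_concurrency reads outstanding, the cluster-wide worst case is the product of the two settings. This sketch assumes every worker is running and keeping its queue full at the same time; it ignores cost-based vacuum delay and any other throttling.

```python
def worst_case_vacuum_ios(workers: int, maintenance_io_concurrency: int) -> int:
    """Upper bound on concurrent vacuum reads, assuming every worker keeps
    maintenance_io_concurrency reads in flight at the same time."""
    return workers * maintenance_io_concurrency


# autovacuum_max_workers defaults to 3; maintenance_io_concurrency was 10
# before this commit and 16 after it.
for setting in (10, 16):
    print(setting, worst_case_vacuum_ios(3, setting))
```

With default settings the bound moves from 30 to 48 outstanding reads.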

[1] https://postgr.es/m/c5d52837-6256-0556-ac8c-d6d3d558820a%40enterprisedb.com

Suggested-by: Jakub Wartak <[email protected]>
Discussion: https://postgr.es/m/CAKZiRmxdHQaU%2B2Zpe6d%3Dx%3D0vigJ1sfWwwVYLJAf%3Dud_wQ_VcUw%40mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/cc6be07ebde2aef7cc0507f997f563ce77c00a00

Modified Files
--------------
doc/src/sgml/config.sgml                      | 8 ++++----
src/backend/utils/misc/postgresql.conf.sample | 2 +-
src/include/storage/bufmgr.h                  | 2 +-
3 files changed, 6 insertions(+), 6 deletions(-)


* Re: pgsql: Increase default maintenance_io_concurrency to 16
@ 2025-03-18 20:08  Bruce Momjian <[email protected]>
  parent: Melanie Plageman <[email protected]>
  0 siblings, 1 reply; 10+ messages in thread

From: Bruce Momjian @ 2025-03-18 20:08 UTC
  To: Melanie Plageman <[email protected]>; +Cc: [email protected]


This commit makes our default random_page_cost = 4 out of line with
these new settings (assumes modern SSD/NAS/SAN hardware) and more out of
line with reality.

---------------------------------------------------------------------------

On Tue, Mar 18, 2025 at 01:08:47PM +0000, Melanie Plageman wrote:
> Increase default maintenance_io_concurrency to 16
> [...]


-- 
  Bruce Momjian  <[email protected]>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Do not let urgent matters crowd out time for investment in the future.






* Re: pgsql: Increase default maintenance_io_concurrency to 16
@ 2025-03-18 20:13  Andres Freund <[email protected]>
  parent: Bruce Momjian <[email protected]>
  0 siblings, 1 reply; 10+ messages in thread

From: Andres Freund @ 2025-03-18 20:13 UTC
  To: Bruce Momjian <[email protected]>; +Cc: Melanie Plageman <[email protected]>; [email protected]

Hi,

On 2025-03-18 16:08:22 -0400, Bruce Momjian wrote:
> This commit makes our default random_page_cost = 4 out of line with
> these new settings (assumes modern SSD/NAS/SAN hardware) and more out of
> line with reality.

How so? That seems like an independent consideration to me.

Greetings,

Andres Freund






* Increase default maintenance_io_concurrency to 16
@ 2025-03-18 20:22  Bruce Momjian <[email protected]>
  parent: Andres Freund <[email protected]>
  0 siblings, 1 reply; 10+ messages in thread

From: Bruce Momjian @ 2025-03-18 20:22 UTC
  To: Andres Freund <[email protected]>; +Cc: Melanie Plageman <[email protected]>; PostgreSQL-development <[email protected]>

On Tue, Mar 18, 2025 at 04:13:26PM -0400, Andres Freund wrote:
> Hi,
> 
> On 2025-03-18 16:08:22 -0400, Bruce Momjian wrote:
> > This commit makes our default random_page_cost = 4 out of line with
> > these new settings (assumes modern SSD/NAS/SAN hardware) and more out of
> > line with reality.
> 
> How so? That seems like an independent consideration to me.

[thread moved to hackers]

Uh, I think our old random_page_cost and *_io_concurrency assumed
magnetic disks --- now *_io_concurrency assumes more modern hardware and
random_page_cost assumes magnetic.


* Re: Increase default maintenance_io_concurrency to 16
@ 2025-03-18 20:27  Andres Freund <[email protected]>
  parent: Bruce Momjian <[email protected]>
  0 siblings, 1 reply; 10+ messages in thread

From: Andres Freund @ 2025-03-18 20:27 UTC
  To: Bruce Momjian <[email protected]>; +Cc: Melanie Plageman <[email protected]>; PostgreSQL-development <[email protected]>

Hi,

On 2025-03-18 16:22:45 -0400, Bruce Momjian wrote:
> Uh, I think our old random_page_cost and *_io_concurrency assumed
> magnetic disks --- now *_io_concurrency assumes more modern hardware and
> random_page_cost assumes magnetic.

The cost difference between random and non-random IO is actually still
reasonably accurate with NVMe drives. You can argue that random_page_cost should be
2.5, but that really depends on the specific hardware.

Particularly for cloud-style networked storage, you could even argue that the
difference between sequential and random IO has *grown* given recent changes in
PG (I/O combining in PG 17), as random IOs much more quickly lead to exhausting
IOPS quotas.
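A rough sketch of the IOPS-quota point, under simplifying assumptions: with I/O combining, adjacent blocks can be merged into one request of up to io_combine_limit (128kB, i.e. 16 blocks of 8kB, by default in PG 17), whereas blocks scattered across a relation each cost one request. The quota figure is hypothetical, and real cloud volumes typically also cap throughput in bytes, which this model ignores.

```python
import math

BLOCKS_PER_COMBINED_IO = 16  # 128kB / 8kB blocks; assumes default settings


def read_requests(n_blocks: int, sequential: bool,
                  blocks_per_io: int = BLOCKS_PER_COMBINED_IO) -> int:
    """Requests needed to read n_blocks: sequential runs are merged into
    combined reads, random blocks are read one request each."""
    if sequential:
        return math.ceil(n_blocks / blocks_per_io)
    return n_blocks


def seconds_at_quota(requests: int, iops_quota: float) -> float:
    """Time to issue the requests if the only limit is an IOPS quota."""
    return requests / iops_quota


QUOTA = 4_000  # hypothetical provisioned IOPS
for sequential in (True, False):
    reqs = read_requests(160_000, sequential)
    print("sequential" if sequential else "random", reqs, seconds_at_quota(reqs, QUOTA))
```

Reading the same 160,000 blocks takes 10,000 requests sequentially but 160,000 randomly, so under a pure IOPS cap the random pattern is 16x slower even if each individual request had identical latency.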

I still don't think adjusting random_page_cost has any meaningful relation to
the change at hand.

Greetings,

Andres Freund


* Re: Increase default maintenance_io_concurrency to 16
@ 2025-03-18 20:35  Bruce Momjian <[email protected]>
  parent: Andres Freund <[email protected]>
  0 siblings, 1 reply; 10+ messages in thread

From: Bruce Momjian @ 2025-03-18 20:35 UTC
  To: Andres Freund <[email protected]>; +Cc: Melanie Plageman <[email protected]>; PostgreSQL-development <[email protected]>

On Tue, Mar 18, 2025 at 04:27:18PM -0400, Andres Freund wrote:
> The cost difference between random and non-random IO is actually still
> reasonably accurate with NVMe drives. You can argue that random_page_cost should be
> 2.5, but that really depends on the specific hardware.

Uh, the random_page_cost = 4 assumes caching, so it is assuming actual
random I/O to be 40x slower, which I doubt is true for SSDs:

	https://www.postgresql.org/docs/current/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS
	
	Random access to mechanical disk storage is normally much more expensive
	than four times sequential access. However, a lower default is used
	(4.0) because the majority of random accesses to disk, such as indexed
	reads, are assumed to be in cache. The default value can be thought of
	as modeling random access as 40 times slower than sequential, while
	expecting 90% of random reads to be cached.
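The documentation's arithmetic can be reproduced if one assumes (our reading; the text does not say so explicitly) that a cached random read costs nothing relative to seq_page_cost = 1.0, so only the uncached fraction pays the full 40x penalty. Andres points out below that the 90% figure does not appear in the planner code, so treat this as a model of the prose, not of costsize.c.

```python
def effective_random_page_cost(raw_ratio: float, cache_hit_rate: float) -> float:
    """Model behind the documentation's wording: a cached read is free and a
    miss costs raw_ratio times a sequential page read."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    return (1.0 - cache_hit_rate) * raw_ratio


print(effective_random_page_cost(40.0, 0.90))  # close to the default of 4.0
print(effective_random_page_cost(40.0, 0.80))  # lower hit rate, higher setting
```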

> Particularly for cloud-style networked storage, you could even argue that the
> difference between sequential and random IO has *grown* given recent changes in
> PG (I/O combining in PG 17), as random IOs much more quickly lead to exhausting
> IOPS quotas.
> 
> I still don't think adjusting random_page_cost has any meaningful relation to
> the change at hand.

Okay, I was just asking.


* Re: Increase default maintenance_io_concurrency to 16
@ 2025-03-18 21:04  Andres Freund <[email protected]>
  parent: Bruce Momjian <[email protected]>
  0 siblings, 2 replies; 10+ messages in thread

From: Andres Freund @ 2025-03-18 21:04 UTC
  To: Bruce Momjian <[email protected]>; +Cc: Melanie Plageman <[email protected]>; PostgreSQL-development <[email protected]>; Greg Smith <[email protected]>

Hi,

On 2025-03-18 16:35:29 -0400, Bruce Momjian wrote:
> Uh, the random_page_cost = 4 assumes caching, so it is assuming actual
> random I/O to be 40x slower, which I doubt is true for SSDs:

Uh, huh:

> 	https://www.postgresql.org/docs/current/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS
>
> 	Random access to mechanical disk storage is normally much more expensive
> 	than four times sequential access. However, a lower default is used
> 	(4.0) because the majority of random accesses to disk, such as indexed
> 	reads, are assumed to be in cache. The default value can be thought of
> 	as modeling random access as 40 times slower than sequential, while
> 	expecting 90% of random reads to be cached.

Is that actually a good description of what we assume? I don't know where that
90% is coming from? Briefly skimming through selfuncs.c and costsize.c I don't
see anything.

The relevant change:

commit c1d9df4fa227781b31be44a5a3024865a7f48049
Author: Bruce Momjian <[email protected]>
Date:   2012-02-14 16:54:54 -0500

    Document random page cost is only 4x sequential, and not 40x.

The relevant discussion seems to be:
https://postgr.es/m/4F31A05A.1060506%402ndQuadrant.com

But I don't see any origin of that number in that thread.

I am not sure if I found the correct email for Greg Smith?

Greetings,

Andres Freund






* Re: Increase default maintenance_io_concurrency to 16
@ 2025-03-18 21:19  Bruce Momjian <[email protected]>
  parent: Andres Freund <[email protected]>
  1 sibling, 1 reply; 10+ messages in thread

From: Bruce Momjian @ 2025-03-18 21:19 UTC
  To: Andres Freund <[email protected]>; +Cc: Melanie Plageman <[email protected]>; PostgreSQL-development <[email protected]>; Greg Smith <[email protected]>

On Tue, Mar 18, 2025 at 05:04:46PM -0400, Andres Freund wrote:
> Hi,
> 
> On 2025-03-18 16:35:29 -0400, Bruce Momjian wrote:
> > Uh, the random_page_cost = 4 assumes caching, so it is assuming actual
> > random I/O to be 40x slower, which I doubt is true for SSDs:
> 
> Uh, huh:
> 
> > 	https://www.postgresql.org/docs/current/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS
> >
> > 	Random access to mechanical disk storage is normally much more expensive
> > 	than four times sequential access. However, a lower default is used
> > 	(4.0) because the majority of random accesses to disk, such as indexed
> > 	reads, are assumed to be in cache. The default value can be thought of
> > 	as modeling random access as 40 times slower than sequential, while
> > 	expecting 90% of random reads to be cached.
> 
> Is that actually a good description of what we assume? I don't know where that
> 90% is coming from? Briefly skimming through selfuncs.c and costsize.c I don't
> see anything.

The next paragraph is:

	If you believe a 90% cache rate is an incorrect assumption
	for your workload, you can increase random_page_cost to better
	reflect the true cost of random storage reads. Correspondingly,
	if your data is likely to be completely in cache, such as when
	the database is smaller than the total server memory, decreasing
	random_page_cost can be appropriate. Storage that has a low random
	read cost relative to sequential, e.g., solid-state drives, might
	also be better modeled with a lower value for random_page_cost,
	e.g., 1.1.

> The relevant change:
> 
> commit c1d9df4fa227781b31be44a5a3024865a7f48049
> Author: Bruce Momjian <[email protected]>
> Date:   2012-02-14 16:54:54 -0500
> 
>     Document random page cost is only 4x sequential, and not 40x.
> 
> The relevant discussion seems to be:
> https://postgr.es/m/4F31A05A.1060506%402ndQuadrant.com
> 
> But I don't see any origin of that number in that thread.
> 
> I am not sure if I found the correct email for Greg Smith?

Yes, I can't say there is much research behind the value, and even if
there was, the assumed hardware is unlikely to be relevant today.

* Re: Increase default maintenance_io_concurrency to 16
@ 2025-03-18 21:47  Tom Lane <[email protected]>
  parent: Bruce Momjian <[email protected]>
  0 siblings, 0 replies; 10+ messages in thread

From: Tom Lane @ 2025-03-18 21:47 UTC
  To: Bruce Momjian <[email protected]>; +Cc: Andres Freund <[email protected]>; Melanie Plageman <[email protected]>; PostgreSQL-development <[email protected]>; Greg Smith <[email protected]>

Bruce Momjian <[email protected]> writes:
>>> https://www.postgresql.org/docs/current/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS
>>> 
>>> Random access to mechanical disk storage is normally much more expensive
>>> than four times sequential access. However, a lower default is used
>>> (4.0) because the majority of random accesses to disk, such as indexed
>>> reads, are assumed to be in cache. The default value can be thought of
>>> as modeling random access as 40 times slower than sequential, while
>>> expecting 90% of random reads to be cached.

This explanation is, in fact, made up out of whole cloth.  It has
diddly-squat to do with where the number came from, although maybe
it's an okay rationalization for continuing to use 4.0 on modern
hardware.

Where the number came from is that I did a bunch of testing back in
the late 90s and random_page_cost = 4.0 made the planner's ratios of
seqscan vs indexscan costs match up with observed timings.  Of course,
those observations were made with spinning-rust drives, so you'd not
get the same results on SSDs.
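The seqscan-vs-indexscan calibration described above can be illustrated with a deliberately crude I/O-only model. This is not the planner's actual formula: costsize.c also charges CPU per tuple and per index entry, reads index pages, and adjusts heap-page estimates for correlation and effective_cache_size. Here a sequential scan costs table_pages * seq_page_cost, and an index scan costs random_page_cost per heap page it fetches.

```python
def breakeven_heap_pages(table_pages: int, random_page_cost: float,
                         seq_page_cost: float = 1.0) -> float:
    """Heap pages an index scan can fetch before its I/O cost exceeds a full
    sequential scan, in a model that ignores CPU and index-page costs."""
    return table_pages * seq_page_cost / random_page_cost


for rpc in (4.0, 1.1):
    print(rpc, breakeven_heap_pages(10_000, rpc))
```

For a 10,000-page table, lowering random_page_cost from 4.0 to 1.1 moves the crossover from 2,500 to about 9,091 fetched pages, so the planner would favor index scans over a much wider range of selectivities.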

> Yes, I can't say there is much research behind the value, and even if
> there was, the assumed hardware is unlikely to be relevant today.

Yes as to the latter, no as to the former.

			regards, tom lane


* Re: Increase default maintenance_io_concurrency to 16
@ 2025-03-18 23:52  Gregory Smith <[email protected]>
  parent: Andres Freund <[email protected]>
  1 sibling, 0 replies; 10+ messages in thread

From: Gregory Smith @ 2025-03-18 23:52 UTC
  To: Andres Freund <[email protected]>; +Cc: Bruce Momjian <[email protected]>; Melanie Plageman <[email protected]>; PostgreSQL-development <[email protected]>; Greg Smith <[email protected]>

On Tue, Mar 18, 2025 at 5:04 PM Andres Freund <[email protected]> wrote:

> Is that actually a good description of what we assume? I don't know where
> that 90% is coming from?

That one's all my fault.  It was an attempt to curve-fit backwards why the
4.0 number Tom set with his initial commit worked as well as it did, given
that underlying storage was closer to 50X slower, and I sold the idea well
enough for Bruce to follow the reasoning and commit it.  Back then there
was a regular procession of people who measured the actual rate and
wondered why there was an order-of-magnitude difference between those
measurements and the parameter.  Pointing them toward thinking in terms of
the cached read percentage too did a reasonable job of deflecting them onto
why the model was more complicated than it seems.  I intended to follow
that up with more measurements, only to lose the whole project into a
non-disclosure void I have only recently escaped.
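The curve-fit described above can be run in reverse with the simple model the documentation's wording implies (cached reads treated as free, misses paying the full raw ratio; this is an interpretation, not planner code): given the 4.0 setting and an assumed raw random:sequential ratio, solve for the cache hit rate that would make them agree.

```python
def implied_cache_hit_rate(setting: float, raw_ratio: float) -> float:
    """Solve setting = (1 - hit_rate) * raw_ratio for hit_rate."""
    if not 0 < setting <= raw_ratio:
        raise ValueError("need 0 < setting <= raw_ratio")
    return 1.0 - setting / raw_ratio


print(implied_cache_hit_rate(4.0, 40.0))  # the documented 40x gives ~90%
print(implied_cache_hit_rate(4.0, 50.0))  # a ~50x ratio would need ~92%
```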

I agree with your observation that the underlying cost of a non-sequential
read stall on cloud storage is not markedly better than the original
random:sequential ratio of mechanical drives.  And the PG17 refactoring
to improve I/O chunking worked to magnify that further.

The side of this problem I'm working on again is assembling a useful mix
of workloads so that I can try changing one of these magic constants with
higher confidence.  My main working set so far is write-performance
regression tests against the OpenStreetMap loading workload that
I've been blogging about, plus the old read-only SELECT-only queries,
spaced along a scale/client grid.  My experiments so far have been around
another Tom special, the maximum buffer usage count limit, which turned
into another black hole full of work I have only recently escaped.  I
haven't really thought much yet about a workload set that would allow
adjusting random_page_cost.  On the query side we've been pretty heads-down
on the TPC-H and ClickBench sets.  I don't have buffer internals data from
those yet, though; I'll have to add that to the work queue.

--
Greg Smith
Director of Open Source Strategy, Crunchy Data
[email protected]


Thread overview:
2025-03-18 13:08 pgsql: Increase default maintenance_io_concurrency to 16 Melanie Plageman <[email protected]>
2025-03-18 20:08 ` Bruce Momjian <[email protected]>
2025-03-18 20:13   ` Andres Freund <[email protected]>
2025-03-18 20:22     ` Increase default maintenance_io_concurrency to 16 Bruce Momjian <[email protected]>
2025-03-18 20:27       ` Re: Increase default maintenance_io_concurrency to 16 Andres Freund <[email protected]>
2025-03-18 20:35         ` Re: Increase default maintenance_io_concurrency to 16 Bruce Momjian <[email protected]>
2025-03-18 21:04           ` Re: Increase default maintenance_io_concurrency to 16 Andres Freund <[email protected]>
2025-03-18 21:19             ` Re: Increase default maintenance_io_concurrency to 16 Bruce Momjian <[email protected]>
2025-03-18 21:47               ` Re: Increase default maintenance_io_concurrency to 16 Tom Lane <[email protected]>
2025-03-18 23:52             ` Re: Increase default maintenance_io_concurrency to 16 Gregory Smith <[email protected]>
