Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uJyWu-00AahU-QJ for pgsql-hackers@arkaria.postgresql.org; Tue, 27 May 2025 17:56:01 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1uJyWs-009rhG-Ef for pgsql-hackers@arkaria.postgresql.org; Tue, 27 May 2025 17:55:58 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from <9erthalion6@gmail.com>) id 1uJyWs-009rh7-3h for pgsql-hackers@lists.postgresql.org; Tue, 27 May 2025 17:55:58 +0000 Received: from mail-wr1-x435.google.com ([2a00:1450:4864:20::435]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from <9erthalion6@gmail.com>) id 1uJyWo-000PN1-34 for pgsql-hackers@lists.postgresql.org; Tue, 27 May 2025 17:55:57 +0000 Received: by mail-wr1-x435.google.com with SMTP id ffacd0b85a97d-3a4c6c0a9c7so2591459f8f.3 for ; Tue, 27 May 2025 10:55:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1748368554; x=1748973354; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=29DhBPvJSuZSbsbFVqTNsLLh+gvmKQoE8QQR0fa4+fg=; b=FZCFhDMXnFKB46L1G/tOjNHt+QQZ4oWuhJCSt//EdDwka9R6HjBfjXnmCe7BsD0e0I WNgkGP4Yh1hDN0x4kHnfsA4s5+6LfsmZj+8aBC7ckjERlDHm5ofqdfd1Sbc7svMKpaJY qjZEJ12Lskh2exfYp+FkE6bAAetCwu+1rP5u4uduJSvI0SGkzPtIryjiiYppZ65DPDPw h2HOJaBCuGwwl3OaIV8h+Xy2oPBF8IuV3rbkCWMKSkFIn6AFVSLcoRzfBwf/6H1VlSlR UuCdkNtQ28K3Bo07EtWD0Zz9jY7RtAm5bstCvQ3rw6SilLJsxGMwtkrAD3u++eFHSWjo BIzQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1748368554; x=1748973354; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=29DhBPvJSuZSbsbFVqTNsLLh+gvmKQoE8QQR0fa4+fg=; b=roJ57p5stx4+ctSTBVEU9wDaBNv2v/CKLSNnOOhUQCkNd000LY9u2VOOAGwg0kANa5 70DtX2EisD8HTwETcW4ysADTrgAA2IkK5aWOvXQ+Vv7Y8U0vv2lh0+n8yo+uxFAB76Aq bednZot3v0q80Pq1oSb8Yt7Y7nDK6EtX+3akObo0KjFbgEZi1XbtlRRnoqk/6H1+R8Fc WS/QJPZd0Y+qJi9pB/jCZB9+HSpvR670RRtIkV/3Xo0la4wD7dCOpxL8zPC4IDZiCrO2 hGGw6vSnZbPTQIX6yPH68aiJcTkzTFHz/FjpTLn3H2wKhKd76PkuVJm5eIINxxJskSQs 6iwg== X-Gm-Message-State: AOJu0YwfTIAC8eZM7zjnYQhBlJFXEft56B+ZCXfojEvaxcwu2FbpP/59 6tI6f11cVXRkpZoaNpv+KykkygVITDSwbFex4XN0LKq7yyhXx6kRXugg5gDCxExmB9YMPkpKlsq j4oOYUYQRXG3rRcTk1o5QzzD5RsN8Vn8= X-Gm-Gg: ASbGncvWvly3w66I/Zc7x/YPl9ZmGXIdctZqxnomr9YLqlX2Owpdu9Hbn0YzOQ2OtGs WHzTEY+o5o1qpIri0CgBWl7GyM0Aj5UcLEQTO/U99RtOiIlb/acKUYqdlZHIW4pJb95pui2dTlQ tRtv7wFokyvjv/wpakycGFmQB7Sg8kZfDxDBoYhRHZFpR3vOMV42WFO3WdopWw5AP7 X-Google-Smtp-Source: AGHT+IFFWvMgA3kOcidfv8coY5v6eSeAC0t9qLXidS05zfwNC44ZC6+mjEsabjP70u5ULzXA1RIm7muyKUYYXBjXyEQ= X-Received: by 2002:a05:6000:24c8:b0:3a4:e706:530d with SMTP id ffacd0b85a97d-3a4e7065699mr429871f8f.38.1748368553867; Tue, 27 May 2025 10:55:53 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Dmitry Dolgov <9erthalion6@gmail.com> Date: Tue, 27 May 2025 19:55:39 +0200 X-Gm-Features: AX0GCFsUq2EdETDS_IJQ0SP0OE1e_4nWZKwXzObDfJIoqf9F3l0M106obRvKqV0 Message-ID: Subject: Re: Automatically sizing the IO worker pool To: Thomas Munro Cc: PostgreSQL Hackers Content-Type: multipart/alternative; boundary="00000000000096fea7063621c68b" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000096fea7063621c68b Content-Type: text/plain; charset="UTF-8" On Mon, May 26, 2025, 8:01 AM Thomas Munro wrote: > But ... I'm not even sure if we can say that our > I/O arrivals have a Poisson distribution, since they are not all > independent. > Yeah, a good point, one have to be careful with assumptions about distribution -- from what I've read many processes in computer systems are better described by a Pareto. But the beauty of the queuing theory is that many results are independent from the distribution (not sure about dependencies though). In this version I went back to basics and built something that looks > more like the controls of a classic process/thread pool (think Apache) > or connection pool (think JDBC), with a couple of additions based on > intuition: (1) a launch interval, which acts as a bit of damping > against overshooting on brief bursts that are too far apart, and (2) > the queue length > workers * k as a simple way to determine that > latency is being introduced by not having enough workers. Perhaps > there is a good way to compute an adaptive value for k with some fancy > theories, but k=1 seems to have *some* basis: that's the lowest number > which the pool is too small and *certainly* introducing latency, but > any lower constant is harder to defend because we don't know how many > workers are already awake and about to consume tasks. Something from > queuing theory might provide an adaptive value, but in the end, I > figured we really just want to know if the queue is growing ie in > danger of overflowing (note: the queue is small! 64, and not > currently changeable, more on that later, and the overflow behaviour > is synchronous I/O as back-pressure). You seem to be suggesting that > k=1 sounds too low, not too high, but there is that separate > time-based defence against overshoot in response to rare bursts. > I probably had to start with a statement that I find the current approach reasonable, and I'm only curious if there is more to get out of it. I haven't benchmarked the patch yet (plan getting to it when I'll get back), and can imagine practical considerations significantly impacting any potential solution. About control theory... yeah. That's an interesting bag of tricks. > FWIW Melanie and (more recently) I have looked into textbook control > algorithms at a higher level of the I/O stack (and Melanie gave a talk > about other applications in eg VACUUM at pgconf.dev). In > read_stream.c, where I/O demand is created, we've been trying to set > the desired I/O concurrency level and thus lookahead distance with > adaptive feedback. We've tried a lot of stuff. I hope we can share > some concept patches some time soon, well, maybe in this cycle. Some > interesting recent experiments produced graphs that look a lot like > the ones in the book "Feedback Control for Computer Systems" (an easy > software-person book I found for people without an engineering/control > theory background where the problems match our world more closely, cf > typical texts that are about controlling motors and other mechanical > stuff...). Experimental goals are: find the the smallest concurrent > I/O request level (and thus lookahead distance and thus speculative > work done and buffers pinned) that keeps the I/O stall probability > near zero (and keep adapting, since other queries and applications are > sharing system I/O queues), and if that's not even possible, find the > highest concurrent I/O request level that doesn't cause extra latency > due to queuing in lower levels (I/O workers, kernel, ..., disks). > That second part is quite hard. In other words, if higher levels own > that problem and bring the adaptivity, then perhaps io_method=worker > can get away with being quite stupid. Just a thought... > Looking forward to it. And thanks for the reminder about the talk, wanted to watch it already long time ago, but somehow didn't managed yet. > --00000000000096fea7063621c68b Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
On Mon, May 26, 2025, 8:01 AM Thomas Mun= ro <thomas.munro@gmail.com= > wrote:
But ... = I'm not even sure if we can say that our
I/O arrivals have a Poisson distribution, since they are not all
independent.

Yeah, a good point, one have to be careful with assumptions abo= ut distribution -- from what I've read many processes in computer syste= ms are better described by a Pareto. But the beauty of the queuing theory i= s that many results are independent from the distribution (not sure about d= ependencies though).

In this version I went back to basics and built something that looks
more like the controls of a classic process/thread pool (think Apache)
or connection pool (think JDBC), with a couple of additions based on
intuition: (1) a launch interval, which acts as a bit of damping
against overshooting on brief bursts that are too far apart, and (2)
the queue length > workers * k as a simple way to determine that
latency is being introduced by not having enough workers.=C2=A0 Perhaps
there is a good way to compute an adaptive value for k with some fancy
theories, but k=3D1 seems to have *some* basis: that's the lowest numbe= r
which the pool is too small and *certainly* introducing latency, but
any lower constant is harder to defend because we don't know how many workers are already awake and about to consume tasks.=C2=A0 Something from<= br> queuing theory might provide an adaptive value, but in the end, I
figured we really just want to know if the queue is growing ie in
danger of overflowing (note: the queue is small!=C2=A0 64, and not
currently changeable, more on that later, and the overflow behaviour
is synchronous I/O as back-pressure).=C2=A0 You seem to be suggesting that<= br> k=3D1 sounds too low, not too high, but there is that separate
time-based defence against overshoot in response to rare bursts.

I probably had to = start with a statement that I find the current approach reasonable, and I&#= 39;m only curious if there is more to get out of it. I haven't benchmar= ked the patch yet (plan getting to it when I'll get back), and can imag= ine practical considerations significantly impacting any potential solution= .

pgconf.dev).=C2=A0 In
read_stream.c, where I/O demand is created, we've been trying to set the desired I/O concurrency level and thus lookahead distance with
adaptive feedback.=C2=A0 We've tried a lot of stuff.=C2=A0 I hope we ca= n share
some concept patches some time soon, well, maybe in this cycle.=C2=A0 Some<= br> interesting recent experiments produced graphs that look a lot like
the ones in the book "Feedback Control for Computer Systems" (an = easy
software-person book I found for people without an engineering/control
theory background where the problems match our world more closely, cf
typical texts that are about controlling motors and other mechanical
stuff...).=C2=A0 Experimental goals are: find the the smallest concurrent I/O request level (and thus lookahead distance and thus speculative
work done and buffers pinned) that keeps the I/O stall probability
near zero (and keep adapting, since other queries and applications are
sharing system I/O queues), and if that's not even possible, find the highest concurrent I/O request level that doesn't cause extra latency due to queuing in lower levels (I/O workers, kernel, ...,=C2=A0 disks).
That second part is quite hard.=C2=A0 In other words, if higher levels own<= br> that problem and bring the adaptivity, then perhaps io_method=3Dworker
can get away with being quite stupid.=C2=A0 Just a thought...

Looking forwar= d to it. And thanks for the reminder about the talk, wanted to watch it alr= eady long time ago, but somehow didn't managed yet.
--00000000000096fea7063621c68b--