MIME-Version: 1.0
References: 
 <CA+hUKG+m4xV0LMoH2c=oRAdEXuCnh+tGBTWa7uFeFMGgTLAw+Q@mail.gmail.com>
 <it7fexpclowjku57bsdh4uqr366wa2fxtq5ahzxczoxonmbh5s@g2f5oesiakzq>
 <CA+hUKGJULUTkT2LpeHTSt3KHbJrYNBT-kj1-OhMRV_PnUQ_57A@mail.gmail.com>
In-Reply-To: 
 <CA+hUKGJULUTkT2LpeHTSt3KHbJrYNBT-kj1-OhMRV_PnUQ_57A@mail.gmail.com>
From: Dmitry Dolgov <9erthalion6@gmail.com>
Date: Tue, 27 May 2025 19:55:39 +0200
Message-ID: 
 <CA+q6zcW0qh2aPKg0z58mp-Ba8avp7MWkS00ADrOOv=CBzJJMLA@mail.gmail.com>
Subject: Re: Automatically sizing the IO worker pool
To: Thomas Munro <thomas.munro@gmail.com>
Cc: PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>
Content-Type: multipart/alternative; boundary="00000000000096fea7063621c68b"
Archived-At: 
 <https://www.postgresql.org/message-id/CA%2Bq6zcW0qh2aPKg0z58mp-Ba8avp7MWkS00ADrOOv%3DCBzJJMLA%40mail.gmail.com>
Precedence: bulk

--00000000000096fea7063621c68b
Content-Type: text/plain; charset="UTF-8"

On Mon, May 26, 2025, 8:01 AM Thomas Munro <thomas.munro@gmail.com> wrote:

> But ... I'm not even sure if we can say that our
> I/O arrivals have a Poisson distribution, since they are not all
> independent.
>

Yeah, good point, one has to be careful with assumptions about the
distribution -- from what I've read, many processes in computer systems are
better described by a Pareto distribution. But the beauty of queuing theory
is that many results are independent of the distribution (not sure about
dependencies between arrivals though).
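
One example of such a distribution-free result is Little's law, L = lambda * W
(average number in the system = arrival rate * average time in the system).
Picking it is my own illustration, not something named earlier in the thread.
Below is a minimal sketch, assuming a single-server FIFO queue; the function
names, rates and shape parameters are all hypothetical. It measures both sides
independently for exponential and for Pareto inter-arrival gaps.

```python
import random


def fifo_departures(arrivals, services):
    """Departure times for a single-server FIFO queue (assumed model)."""
    departures = []
    free_at = 0.0
    for arrival, service in zip(arrivals, services):
        start = max(arrival, free_at)  # wait if the server is still busy
        free_at = start + service
        departures.append(free_at)
    return departures


def littles_law_sides(arrivals, departures):
    """Return (L, lambda * W) measured over [0, last departure]."""
    horizon = max(departures)
    # L: time-average number of jobs in the system, from an event sweep.
    events = sorted([(t, 1) for t in arrivals] + [(t, -1) for t in departures])
    area, in_system, prev = 0.0, 0, 0.0
    for t, delta in events:
        area += in_system * (t - prev)
        in_system += delta
        prev = t
    avg_in_system = area / horizon
    # lambda * W: arrival rate times mean time spent in the system.
    rate = len(arrivals) / horizon
    mean_sojourn = sum(d - a for a, d in zip(arrivals, departures)) / len(arrivals)
    return avg_in_system, rate * mean_sojourn


def measure(gap, rng, jobs=1000):
    """Generate arrivals from gap(), run the queue, return both sides."""
    t, arrivals = 0.0, []
    for _ in range(jobs):
        t += gap()
        arrivals.append(t)
    services = [rng.uniform(0.5, 1.5) for _ in arrivals]
    return littles_law_sides(arrivals, fifo_departures(arrivals, services))


if __name__ == "__main__":
    rng = random.Random(42)
    print("exponential gaps:", measure(lambda: rng.expovariate(0.5), rng))
    print("pareto gaps:     ", measure(lambda: rng.paretovariate(2.5), rng))
```

Over a window that starts and ends with an empty system, the two numbers agree
up to rounding for either arrival process. This particular identity does not
need independent arrivals either, though that says nothing about other
queuing results that do assume a specific distribution.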

> In this version I went back to basics and built something that looks
> more like the controls of a classic process/thread pool (think Apache)
> or connection pool (think JDBC), with a couple of additions based on
> intuition: (1) a launch interval, which acts as a bit of damping
> against overshooting on brief bursts that are too far apart, and (2)
> the queue length > workers * k as a simple way to determine that
> latency is being introduced by not having enough workers.  Perhaps
> there is a good way to compute an adaptive value for k with some fancy
> theories, but k=1 seems to have *some* basis: that's the lowest number
> at which the pool is too small and *certainly* introducing latency, but
> any lower constant is harder to defend because we don't know how many
> workers are already awake and about to consume tasks.  Something from
> queuing theory might provide an adaptive value, but in the end, I
> figured we really just want to know if the queue is growing ie in
> danger of overflowing (note: the queue is small!  64, and not
> currently changeable, more on that later, and the overflow behaviour
> is synchronous I/O as back-pressure).  You seem to be suggesting that
> k=1 sounds too low, not too high, but there is that separate
> time-based defence against overshoot in response to rare bursts.
>
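
The two conditions in the quoted paragraph can be written down as a single
decision function. This is only a minimal sketch of how I read the description,
not the code from the patch: the names (PoolState, should_launch_worker), the
launch_interval default and the max_workers cap are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PoolState:
    workers: int        # workers currently running
    last_launch: float  # time of the most recent launch, in seconds


def should_launch_worker(state, queue_length, now, k=1.0,
                         launch_interval=0.1, max_workers=32):
    """Return True if one more I/O worker should be started.

    launch_interval and max_workers are assumed values for illustration.
    """
    if state.workers >= max_workers:
        return False
    # (1) Damping: at most one launch per launch_interval, so a brief
    # burst does not make the pool overshoot.
    if now - state.last_launch < launch_interval:
        return False
    # (2) Backlog test: more queued requests than workers * k suggests
    # that latency is being introduced by a pool that is too small.
    return queue_length > state.workers * k


if __name__ == "__main__":
    state = PoolState(workers=2, last_launch=0.0)
    print(should_launch_worker(state, queue_length=3, now=1.0))
    print(should_launch_worker(state, queue_length=3, now=0.05))
```

With k=1 and two workers, a queue of three requests triggers a launch only once
the launch interval has passed since the previous one.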

I probably should have started by saying that I find the current approach
reasonable, and I'm only curious whether there is more to get out of it. I
haven't benchmarked the patch yet (I plan to get to it when I get back),
and I can imagine practical considerations significantly impacting any
potential solution.

> About control theory... yeah.  That's an interesting bag of tricks.
> FWIW Melanie and (more recently) I have looked into textbook control
> algorithms at a higher level of the I/O stack (and Melanie gave a talk
> about other applications in eg VACUUM at pgconf.dev).  In
> read_stream.c, where I/O demand is created, we've been trying to set
> the desired I/O concurrency level and thus lookahead distance with
> adaptive feedback.  We've tried a lot of stuff.  I hope we can share
> some concept patches some time soon, well, maybe in this cycle.  Some
> interesting recent experiments produced graphs that look a lot like
> the ones in the book "Feedback Control for Computer Systems" (an easy
> software-person book I found for people without an engineering/control
> theory background where the problems match our world more closely, cf
> typical texts that are about controlling motors and other mechanical
> stuff...).  Experimental goals are: find the smallest concurrent
> I/O request level (and thus lookahead distance and thus speculative
> work done and buffers pinned) that keeps the I/O stall probability
> near zero (and keep adapting, since other queries and applications are
> sharing system I/O queues), and if that's not even possible, find the
> highest concurrent I/O request level that doesn't cause extra latency
> due to queuing in lower levels (I/O workers, kernel, ...,  disks).
> That second part is quite hard.  In other words, if higher levels own
> that problem and bring the adaptivity, then perhaps io_method=worker
> can get away with being quite stupid.  Just a thought...
>

Looking forward to it. And thanks for the reminder about the talk, I've
wanted to watch it for a long time, but somehow haven't managed to yet.

--00000000000096fea7063621c68b--