From: wenhui qiu
Date: Mon, 26 May 2025 10:17:25 +0800
Subject: Re: Automatically sizing the IO worker pool
To: Dmitry Dolgov <9erthalion6@gmail.com>
Cc: Thomas Munro, PostgreSQL Hackers

Hi,

> On Sun, Apr 13, 2025 at 04:59:54AM GMT, Thomas Munro wrote:
> It's hard to know how to set io_workers=3. If it's too small,
> io_method=worker's small submission queue overflows and it silently
> falls back to synchronous IO. If it's too high, it generates a lot of
> pointless wakeups and scheduling overhead, which might be considered
> an independent problem or not, but having the right size pool
> certainly mitigates it. Here's a patch to replace that GUC with:
>
>     io_min_workers=1
>     io_max_workers=8
>     io_worker_idle_timeout=60s
>     io_worker_launch_interval=500ms
>
> It grows the pool when a backlog is detected (better ideas for this
> logic welcome), and lets idle workers time out.

I also like the idea. Can we set something like this?

io_workers = 3
io_max_workers = cpu/4
io_workers_oversubscribe = 3 (range 1-8)
io_workers * io_workers_oversubscribe <= io_max_workers
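
To make the constraint concrete, here is a rough sketch in Python. The function name proposed_limits is hypothetical, and reading "cpu/4" as integer division of the CPU count is an assumption made for illustration; none of this comes from the patch.

```python
import os


def proposed_limits(io_workers=3, oversubscribe=3, cpu_count=None):
    """Check the proposed settings and return (io_max_workers, burst_ceiling).

    Assumption: io_max_workers = cpu_count // 4 (at least 1), and the pool
    may burst up to io_workers * io_workers_oversubscribe.
    """
    if not 1 <= oversubscribe <= 8:
        raise ValueError("io_workers_oversubscribe must be in the range 1-8")
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    io_max_workers = max(1, cpu_count // 4)
    burst_ceiling = io_workers * oversubscribe
    # The proposed rule: io_workers * io_workers_oversubscribe <= io_max_workers
    if burst_ceiling > io_max_workers:
        raise ValueError(
            f"{io_workers} * {oversubscribe} = {burst_ceiling} "
            f"exceeds io_max_workers = {io_max_workers}"
        )
    return io_max_workers, burst_ceiling
```

Under this reading, the example values (3 workers, oversubscribe factor 3) only satisfy the rule on machines with at least 36 CPUs, so the defaults might need to scale with the machine.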

On Sun, May 25, 2025 at 3:20 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> > On Sun, Apr 13, 2025 at 04:59:54AM GMT, Thomas Munro wrote:
> > It's hard to know how to set io_workers=3. If it's too small,
> > io_method=worker's small submission queue overflows and it silently
> > falls back to synchronous IO. If it's too high, it generates a lot of
> > pointless wakeups and scheduling overhead, which might be considered
> > an independent problem or not, but having the right size pool
> > certainly mitigates it. Here's a patch to replace that GUC with:
> >
> >     io_min_workers=1
> >     io_max_workers=8
> >     io_worker_idle_timeout=60s
> >     io_worker_launch_interval=500ms
> >
> > It grows the pool when a backlog is detected (better ideas for this
> > logic welcome), and lets idle workers time out.
>
> I like the idea. In fact, I've been pondering something like a
> "smart" configuration for quite some time, and I'm convinced that a
> similar approach needs to be applied to many performance-related GUCs.
>
> Idle timeout and launch interval serving as a measure of sensitivity
> makes sense to me, but growing the pool whenever a backlog is detected
> (queue_depth > nworkers, so even the slightest backlog?) seems somewhat
> arbitrary. From what I understand, the pool's growth rate is constant
> and does not depend on worker demand (i.e. queue_depth)? It may sound
> fancy, but I have the impression that it should be possible to apply
> what's called a "low-pass filter" in control theory (a sort of transfer
> function with an exponential decay) to smooth out the demand and adjust
> the worker pool based on that.
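
A minimal sketch of the low-pass idea, assuming the simplest such filter: an exponentially weighted moving average of sampled queue depth. The names smooth_demand, target_workers and alpha are hypothetical and not taken from the patch; the clamp defaults reuse the io_min_workers and io_max_workers values quoted above.

```python
import math


def smooth_demand(samples, alpha=0.2):
    """First-order low-pass filter (exponentially weighted moving average).

    alpha in (0, 1]: smaller values decay more slowly and smooth harder.
    Returns the filtered value after each sample.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    estimate = None
    for depth in samples:
        if estimate is None:
            estimate = float(depth)  # seed the filter with the first sample
        else:
            estimate = alpha * depth + (1.0 - alpha) * estimate
        smoothed.append(estimate)
    return smoothed


def target_workers(smoothed_depth, min_workers=1, max_workers=8):
    """Map the smoothed demand to a pool size, clamped to [min, max]."""
    return max(min_workers, min(max_workers, math.ceil(smoothed_depth)))
```

With alpha=0.2, a single spike of 20 queued IOs after a steady depth of 1 moves the estimate to about 4.8 rather than 20, so the pool would grow toward 5 workers instead of jumping straight to the maximum. Whether that is the right sensitivity is exactly the kind of tuning question that would need measurement.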

> As a side note, it might be far-fetched, but there are tools in
> queueing theory for working out how many workers are needed to
> guarantee a certain low queueing probability. For that, one needs an
> average arrival rate (in our case, the average number of IO operations
> dispatched to workers) and an average service rate (the average number
> of IO operations performed by workers).
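
One such tool is the Erlang C formula, sketched below. It assumes an M/M/c queue (Poisson arrivals, exponentially distributed service times, identical workers), which real IO workloads may not satisfy, so treat the result as a rough guide. The function names are hypothetical; both rates must use the same time unit, and service_rate is per worker.

```python
import math


def erlang_c(arrival_rate, service_rate, workers):
    """Probability that an arriving IO has to queue in an M/M/c system."""
    offered_load = arrival_rate / service_rate  # in units of busy workers
    if workers <= offered_load:
        return 1.0  # unstable: the queue grows without bound
    utilization = offered_load / workers
    waiting_term = (
        offered_load ** workers / math.factorial(workers) / (1.0 - utilization)
    )
    no_wait_terms = sum(
        offered_load ** k / math.factorial(k) for k in range(workers)
    )
    return waiting_term / (no_wait_terms + waiting_term)


def workers_needed(arrival_rate, service_rate, max_wait_probability, limit=1024):
    """Smallest worker count whose queueing probability meets the target."""
    for count in range(1, limit + 1):
        if erlang_c(arrival_rate, service_rate, count) <= max_wait_probability:
            return count
    raise ValueError("no worker count up to the limit meets the target")
```

For example, if IOs arrive at the same average rate that one worker can service them, two workers leave a one-in-three chance of queueing, and three workers bring it below 10%.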

