MIME-Version: 1.0
From: Greg Hennessy <greg.hennessy@gmail.com>
Date: Thu, 10 Jul 2025 15:39:12 -0400
Message-ID: <CA+mZaOMbOTUrs1QXWKyaxLfWGM2N21dQ=72GRx6jodALL-r4aQ@mail.gmail.com>
Subject: optimizing number of workers
To: pgsql-general@lists.postgresql.org
Content-Type: multipart/alternative; boundary="0000000000000649c90639985bf0"
Archived-At: <https://www.postgresql.org/message-id/CA%2BmZaOMbOTUrs1QXWKyaxLfWGM2N21dQ%3D72GRx6jodALL-r4aQ%40mail.gmail.com>
Precedence: bulk

--0000000000000649c90639985bf0
Content-Type: text/plain; charset="UTF-8"

Having just received a shiny new dual-CPU machine to use as a PostgreSQL
server, I'm trying to make a reasonable effort to configure it correctly.
The hardware has 128 cores, and I am running a VM with Red Hat 9 and
PostgreSQL 16.9.

In postgresql.conf I have:
max_worker_processes = 90               # (change requires restart)
max_parallel_workers_per_gather = 72    # gsh 26 oct 2022
max_parallel_maintenance_workers = 72   # gsh 12 jun 2025
max_parallel_workers =  72              # gsh 12 jun 2025
max_logical_replication_workers = 72    # gsh 12 jun 2025
max_sync_workers_per_subscription = 72   # gsh 12 jun 2025
autovacuum_max_workers = 12             # max number of autovacuum subprocesses

When I do a simple count on a large table (large being 1.8 billion rows),
I get only 10 workers used.

prod_v1_0_0_rc1=# explain (analyze, buffers) select count(*) from gaiadr3.gaia_source;

             QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=14379796.81..14379796.82 rows=1 width=8) (actual time=16702.806..16705.479 rows=1 loops=1)
   Buffers: shared hit=2507481
   ->  Gather  (cost=14379795.78..14379796.79 rows=10 width=8) (actual time=16702.513..16705.470 rows=11 loops=1)
         Workers Planned: 10
         Workers Launched: 10
         Buffers: shared hit=2507481
         ->  Partial Aggregate  (cost=14379785.78..14379785.79 rows=1 width=8) (actual time=16691.820..16691.821 rows=1 loops=11)
               Buffers: shared hit=2507481
               ->  Parallel Index Only Scan using gaia_source_nest128 on gaia_source  (cost=0.58..13926632.85 rows=181261171 width=0) (actual time=0.025..9559.644 rows=164700888 loops=11)
                     Heap Fetches: 0
                     Buffers: shared hit=2507481
 Planning:
   Buffers: shared hit=163
 Planning Time: 14.898 ms
 Execution Time: 16705.592 ms

Postgres has chosen to use only a small fraction of the CPUs I have on
my machine. Given the query returns an answer in about 8 seconds, it may be
that PostgreSQL has allocated the proper number of workers. But if I wanted
to tweak some config parameters to see whether using more workers
would give me an answer faster, I don't see any obvious knobs
to turn. Are there parameters that I can adjust to see if I can increase
throughput? Would adjusting parallel_setup_cost or parallel_tuple_cost
be likely to help?
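
One way such an experiment might look, as a sketch only: it assumes the
parallel_workers table storage parameter (which overrides the planner's
size-based worker estimate) and a session-level
max_parallel_workers_per_gather are the relevant controls. The value 32 is
an arbitrary illustration, not a recommendation, and the worker count
actually launched is still limited by max_parallel_workers and
max_worker_processes.

```sql
-- Session-level experiment; nothing here edits postgresql.conf.
-- 32 is a hypothetical value chosen only for illustration.

-- Ask the planner to use this many workers for parallel scans of the table,
-- instead of the number it derives from the relation size.
ALTER TABLE gaiadr3.gaia_source SET (parallel_workers = 32);

-- Upper bound on workers per Gather node for this session.
SET max_parallel_workers_per_gather = 32;

-- Re-run the same query and compare "Workers Planned" / "Workers Launched"
-- and the execution time against the 10-worker plan.
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM gaiadr3.gaia_source;

-- Put things back afterwards.
ALTER TABLE gaiadr3.gaia_source RESET (parallel_workers);
RESET max_parallel_workers_per_gather;
```

Repeating the run with a few different values (say 10, 20, 32, 64) would
show whether the count actually scales with workers or flattens out.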

--0000000000000649c90639985bf0--