public inbox for [email protected]  
help / color / mirror / Atom feed
BUG #19505: Some weird spikes postgresql processes in database (up to 200k sometime) without apparent reasons.
4+ messages / 3 participants
[nested] [flat]

* BUG #19505: Some weird spikes postgresql processes in database (up to 200k sometime) without apparent reasons.
@ 2026-06-02 18:36  PG Bug reporting form <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: PG Bug reporting form @ 2026-06-02 18:36 UTC (permalink / raw)
  To: [email protected]; +Cc: [email protected]

The following bug has been logged on the website:

Bug reference:      19505
Logged by:          Maxim Boguk
Email address:      [email protected]
PostgreSQL version: 18.4
Operating system:   Ubuntu 24.04.4 LTS
Description:        

I started investigation of this issue after found that process count of
postgresql on my replica sometime jump to 200k+ (with max_connections=1000
and real connections under 100 most time).
Somehow single (seems random by always heavy/analytical) query spawn
thousands of the threads and tens thousands of parallel workers.

After some logging I caught one snapshot (ps -u postgres -L -o
pid,tid,ppid,lstart,args -ww 2 ) with 39257 processes:

[postgres@db ~/tmp]$ zcat ps-L-2026-06-02_17-40-22.gz | wc -l
39257

Main content is:
PID          TID               PPID  StartTime
command
2158552 2158552  948705 Tue Jun  2 17:40:17 2026 postgres: 18/main:
background_shared db [local] SELECT

Then:
The same PID but 1620 different TIDS.
PID          TID               PPID  StartTime
command
#main process
2158557 2158557  948705 Tue Jun  2 17:40:18 2026 postgres: 18/main:
background_shared db [local] SELECT
#1620 threads
2158557 2158607  948705 Tue Jun  2 17:40:20 2026 postgres: 18/main:
background_shared db [local] SELECT
2158557 2158608  948705 Tue Jun  2 17:40:20 2026 postgres: 18/main:
background_shared db [local] SELECT
2158557 2158609  948705 Tue Jun  2 17:40:20 2026 postgres: 18/main:
background_shared db [local] SELECT

Then, 37571 rows!!! of:
PID          TID               PPID  StartTime
command
2158579 2159176  948705 Tue Jun  2 17:40:20 2026 postgres: 18/main: parallel
worker for PID 2158557
2158579 2159179  948705 Tue Jun  2 17:40:20 2026 postgres: 18/main: parallel
worker for PID 2158557
2158579 2159183  948705 Tue Jun  2 17:40:20 2026 postgres: 18/main: parallel
worker for PID 2158557
2158579 2159196  948705 Tue Jun  2 17:40:20 2026 postgres: 18/main: parallel
worker for PID 2158557
2158579 2159198  948705 Tue Jun  2 17:40:20 2026 postgres: 18/main: parallel
worker for PID 2158557
2158579 2159202  948705 Tue Jun  2 17:40:20 2026 postgres: 18/main: parallel
worker for PID 2158557

I double checked the query (it had been logged in database log): it run with
6 worker processes and without any issues on manual run.

Related db configuration:
max_connections = 1000
max_worker_processes = 128              # (change requires restart)
max_parallel_workers_per_gather = 16    # limited by max_parallel_workers
max_parallel_workers = 64
io_method = io_uring                    # worker, io_uring, sync
io_max_concurrency = -1         # Max number of IOs that one process
jit = on (usual suspect in case of weird things going on)

Given that situation happens like 1-10 times per hour (and lead for short LA
spikes up to 10000) - it's seriously affect the database replica
performance.

No external/non-standard/C extensions except of pgq and postgis loaded into
the database.

I can look for any additional information and  perform any local research
but currently I'm out of ideas what my next steps should be.

PS: it's seems that the issue could be triggered by different queries, but
not the one particular.







^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: BUG #19505: Some weird spikes postgresql processes in database (up to 200k sometime) without apparent reasons.
@ 2026-06-02 18:51  Maxim Boguk <[email protected]>
  parent: PG Bug reporting form <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Maxim Boguk @ 2026-06-02 18:51 UTC (permalink / raw)
  To: [email protected]; [email protected]

On Tue, Jun 2, 2026 at 9:37 PM PG Bug reporting form <[email protected]>
wrote:

> The following bug has been logged on the website:
>
> Bug reference:      19505
> Logged by:          Maxim Boguk
> Email address:      [email protected]
> PostgreSQL version: 18.4
> Operating system:   Ubuntu 24.04.4 LTS
> Description:
>
> I started investigation of this issue after found that process count of
> postgresql on my replica sometime jump to 200k+ (with max_connections=1000
> and real connections under 100 most time).
> Somehow single (seems random by always heavy/analytical) query spawn
> thousands of the threads and tens thousands of parallel workers.
>
> After some logging I caught one snapshot (ps -u postgres -L -o
> pid,tid,ppid,lstart,args -ww 2 ) with 39257 processes:
>
> [postgres@db ~/tmp]$ zcat ps-L-2026-06-02_17-40-22.gz | wc -l
> 39257
>
> Main content is:
> PID          TID               PPID  StartTime
> command
> 2158552 2158552  948705 Tue Jun  2 17:40:17 2026 postgres: 18/main:
> background_shared db [local] SELECT
>
> Then:
> The same PID but 1620 different TIDS.
> PID          TID               PPID  StartTime
> command
> #main process
> 2158557 2158557  948705 Tue Jun  2 17:40:18 2026 postgres: 18/main:
> background_shared db [local] SELECT
> #1620 threads
> 2158557 2158607  948705 Tue Jun  2 17:40:20 2026 postgres: 18/main:
> background_shared db [local] SELECT
> 2158557 2158608  948705 Tue Jun  2 17:40:20 2026 postgres: 18/main:
> background_shared db [local] SELECT
> 2158557 2158609  948705 Tue Jun  2 17:40:20 2026 postgres: 18/main:
> background_shared db [local] SELECT
>
> Then, 37571 rows!!! of:
> PID          TID               PPID  StartTime
> command
> 2158579 2159176  948705 Tue Jun  2 17:40:20 2026 postgres: 18/main:
> parallel
> worker for PID 2158557
> 2158579 2159179  948705 Tue Jun  2 17:40:20 2026 postgres: 18/main:
> parallel
> worker for PID 2158557
> 2158579 2159183  948705 Tue Jun  2 17:40:20 2026 postgres: 18/main:
> parallel
> worker for PID 2158557
> 2158579 2159196  948705 Tue Jun  2 17:40:20 2026 postgres: 18/main:
> parallel
> worker for PID 2158557
> 2158579 2159198  948705 Tue Jun  2 17:40:20 2026 postgres: 18/main:
> parallel
> worker for PID 2158557
> 2158579 2159202  948705 Tue Jun  2 17:40:20 2026 postgres: 18/main:
> parallel
> worker for PID 2158557
>
> I double checked the query (it had been logged in database log): it run
> with
> 6 worker processes and without any issues on manual run.
>
> Related db configuration:
> max_connections = 1000
> max_worker_processes = 128              # (change requires restart)
> max_parallel_workers_per_gather = 16    # limited by max_parallel_workers
> max_parallel_workers = 64
> io_method = io_uring                    # worker, io_uring, sync
> io_max_concurrency = -1         # Max number of IOs that one process
> jit = on (usual suspect in case of weird things going on)
>
> Given that situation happens like 1-10 times per hour (and lead for short
> LA
> spikes up to 10000) - it's seriously affect the database replica
> performance.
>
> No external/non-standard/C extensions except of pgq and postgis loaded into
> the database.
>
> I can look for any additional information and  perform any local research
> but currently I'm out of ideas what my next steps should be.
>
> PS: it's seems that the issue could be triggered by different queries, but
> not the one particular.
>
>
UPD:

Looks like the issue is always triggered by queries of the same kind (below
query and explain),
but I can't see any way how this query can spawn 10k-100k threads (or spawn
any threads at all).
PS: The problem started after upgrading 17->18, so the next suspect I am
going to test is "io_method = io_uring".

explain analyze  WITH _tmp as (
SELECT
"job_stats_master"."device", "job_stats_master"."organic",
"job_stats_master"."paid", "job_stats_master"."global_action",
"job_stats_master"."employer_id",
SUM(COALESCE(stats_count, 1)), sum(coalesce(price, 0)) AS spent,
DATE(job_stats_master.created_at AT TIME ZONE 'UTC' AT TIME ZONE
'Etc/UTC'),
coalesce(SUM(price) filter(where bid_metric_matches(bid_metric,
global_action) AND paid AND NOT organic), 0) earned,
"job_stats_master"."job_board_id", "job_stats_master"."job_group_id",
"job_stats_master"."country_id", "job_stats_master"."gcc",
"job_stats_master"."exchange_id", "job_stats_master"."bid_metric",
sum(coalesce(price_publisher, 0)), sum(coalesce(cpc, 0)),
sum(coalesce(price, 0) * coalesce(ncc, 1)) AS spent_ncc,
sum(coalesce(price, 0) / coalesce(gcc, 1)) AS spent_gcc,
sum(coalesce(price_publisher, 0)) AS publisher_spent
FROM "job_stats_master"
WHERE
"job_stats_master"."job_board_id" = 30875
AND (job_stats_master.created_at >= '2026-05-01 00:00:00')
AND (job_stats_master.created_at <= '2026-05-31 23:59:59.999999')
AND ((job_stats_master.unpaid_reason NOT IN (7, 9, 19) OR
job_stats_master.unpaid_reason IS NULL))
AND "job_stats_master"."employer_id" NOT IN (20894, 28518, 28508, 28517,
20691, 24731, 28280, 20692, 20690, 28281, 28509, 20700, 28284, 28520,
20895, 24730, 20698, 20699, 28285, 28519, 28521, 28613, 28607, 28612,
28602, 28603, 28604, 28610, 28605, 28606, 28608, 28609, 28614, 28615,
28611, 28629, 28630, 28625, 28626, 28627, 28633, 28632, 28622, 28624,
28620, 28621, 28623, 28628, 28631, 28652, 28653, 28654, 28658, 28656,
28647, 28648, 28651, 28649, 28655, 28650, 28659, 28657, 28660, 20893,
28279, 19292, 28527, 19291, 28278, 24707, 19290, 28526, 28525, 18157,
18158, 18156, 18155, 28255, 28256, 28258, 28259, 28260, 24689, 28262,
28265, 28268, 28267, 24688, 28257, 22774, 20701, 28283, 21597, 24729,
20702, 22781, 20703, 20896, 28282, 28524, 22780, 28522, 28523, 7400, 7397,
7399, 8118, 8110, 7402, 6706, 6700, 6703, 8113, 8108, 9457, 8109, 9453,
7068, 8114, 7401, 9456, 6866, 8117, 9455, 8115, 7398, 6707, 7403, 9454,
7404, 8116, 6708, 21129, 21130, 21131, 21132, 24369, 24364, 24368, 24365,
24367, 24366, 18576, 18574, 18575, 18573, 24433, 21414, 2887, 60, 62, 61,
10381, 20676, 20677, 20851, 20972, 18220, 18219, 18221, 18222, 3182, 3158,
3157, 5213, 5212, 3156, 3839, 3840, 6919, 5818, 6951, 9502, 3841, 10489,
10490, 10491, 15231, 10493, 15232, 10492, 9189, 9185, 9187, 9188, 9190,
9186, 8898, 8900, 8897, 8899, 8895, 8896, 8912, 8909, 8914, 8913, 8911,
8910, 18184, 18182, 18185, 18183, 10858, 20706, 10861, 10859, 10857, 10860,
19570, 8906, 8908, 8907, 8904, 8905, 8903, 7686, 7685, 7684, 7682, 7687,
7683, 7689, 7692, 7688, 7693, 7690, 7691, 9503, 8225, 11593, 8223, 8227,
8222, 8226, 8224, 3295, 3294, 4151, 10468, 3293, 4081, 5668, 3153, 3152,
3154, 3480, 3148, 3150, 3482, 3149, 3291, 3292, 13708, 3987, 3290, 3533,
20675, 10298, 10290, 10297, 10299, 10296, 3286, 3284, 5320, 3951, 3907,
3459, 3285, 3529, 3289, 3288, 3287, 6894, 19552, 7873, 6240, 21467, 6816,
17328, 4598, 4599, 4600, 18244)
GROUP BY "job_stats_master"."device", "job_stats_master"."organic",
"job_stats_master"."paid", "job_stats_master"."global_action",
"job_stats_master"."employer_id", "job_stats_master"."job_board_id",
"job_stats_master"."job_group_id", "job_stats_master"."country_id",
DATE(job_stats_master.created_at AT TIME ZONE 'UTC' AT TIME ZONE
'Etc/UTC'), "job_stats_master"."gcc", "job_stats_master"."exchange_id",
"job_stats_master"."bid_metric"
)
        SELECT employers.enterprise_id AS tier0_enterprise_id, _tmp.*
        FROM _tmp
        INNER JOIN employers on employers.id = _tmp.employer_id
;



                                                   QUERY PLAN


-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=117116.37..156282.19 rows=556873 width=280) (actual
time=626.650..1874.724 rows=313222.00 loops=1)
   Hash Cond: (job_stats_master.employer_id = employers.id)
   Buffers: shared hit=505859
   ->  Finalize GroupAggregate  (cost=115548.94..154129.81 rows=556873
width=276) (actual time=593.246..1799.007 rows=313222.00 loops=1)
         Group Key: job_stats_master.device, job_stats_master.organic,
job_stats_master.paid, job_stats_master.global_action,
job_stats_master.employer_id, job_stats_master.job_group_id,
job_stats_master.country_id, (date(((job_stats_master.created_at AT TIME
ZONE 'UTC'::text) AT TIME ZONE 'Etc/UTC'::text))), job_stats_master.gcc,
job_stats_master.exchange_id, job_stats_master.bid_metric
         Buffers: shared hit=495471
         ->  Gather Merge  (cost=115548.94..130184.30 rows=556872
width=244) (actual time=593.228..1242.326 rows=475158.00 loops=1)
               Workers Planned: 6
               Workers Launched: 6
               Buffers: shared hit=495471
               ->  Partial GroupAggregate  (cost=105548.90..110653.56
rows=92812 width=244) (actual time=569.484..682.066 rows=67879.71 loops=7)
                     Group Key: job_stats_master.device,
job_stats_master.organic, job_stats_master.paid,
job_stats_master.global_action, job_stats_master.employer_id,
job_stats_master.job_group_id, job_stats_master.country_id,
(date(((job_stats_master.created_at AT TIME ZONE 'UTC'::text) AT TIME ZONE
'Etc/UTC'::text))), job_stats_master.gcc, job_stats_master.exchange_id,
job_stats_master.bid_metric
                     Buffers: shared hit=495471
                     ->  Sort  (cost=105548.90..105641.71 rows=92812
width=72) (actual time=569.430..580.918 rows=87120.71 loops=7)
                           Sort Key: job_stats_master.device,
job_stats_master.organic, job_stats_master.paid,
job_stats_master.global_action, job_stats_master.employer_id,
job_stats_master.job_group_id, job_stats_master.country_id,
(date(((job_stats_master.created_at AT TIME ZONE 'UTC'::text) AT TIME ZONE
'Etc/UTC'::text))), job_stats_master.gcc, job_stats_master.exchange_id,
job_stats_master.bid_metric
                           Sort Method: quicksort  Memory: 10559kB
                           Buffers: shared hit=495471
                           Worker 0:  Sort Method: quicksort  Memory: 9807kB
                           Worker 1:  Sort Method: quicksort  Memory: 9684kB
                           Worker 2:  Sort Method: quicksort  Memory: 9818kB
                           Worker 3:  Sort Method: quicksort  Memory: 9772kB
                           Worker 4:  Sort Method: quicksort  Memory: 9764kB
                           Worker 5:  Sort Method: quicksort  Memory: 9702kB
                           ->  Parallel Bitmap Heap Scan on
job_stats_new_2026_05 job_stats_master  (cost=34412.46..102021.67
rows=92812 width=72) (actual time=174.110..463.709 rows=87120.71 loops=7)
                                 Recheck Cond: ((job_board_id = 30875) AND
(created_at >= '2026-05-01 00:00:00'::timestamp without time zone) AND
(created_at <= '2026-05-31 23:59:59.999999'::timestamp without time zone))
                                 Filter: (((unpaid_reason <> ALL
('{7,9,19}'::integer[])) OR (unpaid_reason IS NULL)) AND (employer_id <>
ALL
('{20894,28518,28508,28517,20691,24731,28280,20692,20690,28281,28509,20700,28284,28520,20895,24730,20698,20699,28285,28519,28521,28613,28607,28612,28602,28603,28604,28610,28605,28606,28608,28609,28614,28615,28611,28629,28630,28625,28626,28627,28633,28632,28622,28624,28620,28621,28623,28628,28631,28652,28653,28654,28658,28656,28647,28648,28651,28649,28655,28650,28659,28657,28660,20893,28279,19292,28527,19291,28278,24707,19290,28526,28525,18157,18158,18156,18155,28255,28256,28258,28259,28260,24689,28262,28265,28268,28267,24688,28257,22774,20701,28283,21597,24729,20702,22781,20703,20896,28282,28524,22780,28522,28523,7400,7397,7399,8118,8110,7402,6706,6700,6703,8113,8108,9457,8109,9453,7068,8114,7401,9456,6866,8117,9455,8115,7398,6707,7403,9454,7404,8116,6708,21129,21130,21131,21132,24369,24364,24368,24365,24367,24366,18576,18574,18575,18573,24433,21414,2887,60,62,61,10381,20676,20677,20851,20972,18220,18219,18221,18222,3182,3158,3157,5213,5212,3156,3839,3840,6919,5818,6951,9502,3841,10489,10490,10491,15231,10493,15232,10492,9189,9185,9187,9188,9190,9186,8898,8900,8897,8899,8895,8896,8912,8909,8914,8913,8911,8910,18184,18182,18185,18183,10858,20706,10861,10859,10857,10860,19570,8906,8908,8907,8904,8905,8903,7686,7685,7684,7682,7687,7683,7689,7692,7688,7693,7690,7691,9503,8225,11593,8223,8227,8222,8226,8224,3295,3294,4151,10468,3293,4081,5668,3153,3152,3154,3480,3148,3150,3482,3149,3291,3292,13708,3987,3290,3533,20675,10298,10290,10297,10299,10296,3286,3284,5320,3951,3907,3459,3285,3529,3289,3288,3287,6894,19552,7873,6240,21467,6816,17328,4598,4599,4600,18244}'::integer[])))
                                 Rows Removed by Filter: 45
                                 Heap Blocks: exact=76322
                                 Buffers: shared hit=495195
                                 Worker 0:  Heap Blocks: exact=69229
                                 Worker 1:  Heap Blocks: exact=67759
                                 Worker 2:  Heap Blocks: exact=69407
                                 Worker 3:  Heap Blocks: exact=68584
                                 Worker 4:  Heap Blocks: exact=68597
                                 Worker 5:  Heap Blocks: exact=67828
                                 ->  Bitmap Index Scan on
job_stats_new_2026_05_job_board_id_job_reference_created_at_idx
 (cost=0.00..34356.49 rows=635582 width=0) (actual time=90.252..90.253
rows=610218.00 loops=1)
                                       Index Cond: ((job_board_id = 30875)
AND (created_at >= '2026-05-01 00:00:00'::timestamp without time zone) AND
(created_at <= '2026-05-31 23:59:59.999999'::timestamp without time zone))
                                       Index Searches: 1
                                       Buffers: shared hit=7343
   ->  Hash  (cost=1290.53..1290.53 rows=25173 width=8) (actual
time=33.285..33.292 rows=25173.00 loops=1)
         Buckets: 32768  Batches: 1  Memory Usage: 1240kB
         Buffers: shared hit=10388
         ->  Seq Scan on employers  (cost=0.00..1290.53 rows=25173 width=8)
(actual time=18.012..30.607 rows=25173.00 loops=1)
               Buffers: shared hit=10388
 Planning:
   Buffers: shared hit=1212
 Planning Time: 2.535 ms
 JIT:
   Functions: 88
   Options: Inlining false, Optimization false, Expressions true, Deforming
true
   Timing: Generation 11.607 ms (Deform 5.619 ms), Inlining 0.000 ms,
Optimization 7.214 ms, Emission 90.846 ms, Total 109.668 ms
 Execution Time: 1902.285 ms


-- 
Maxim Boguk
Senior Postgresql DBA

Phone UA: +380 99 143 0000
Phone AU: +61  45 218 5678


^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: BUG #19505: Some weird spikes postgresql processes in database (up to 200k sometime) without apparent reasons.
@ 2026-06-22 20:22  Maxim Boguk <[email protected]>
  parent: Maxim Boguk <[email protected]>
  0 siblings, 1 reply; 4+ messages in thread

From: Maxim Boguk @ 2026-06-22 20:22 UTC (permalink / raw)
  To: [email protected]; [email protected]

On Tue, Jun 2, 2026 at 9:51 PM Maxim Boguk <[email protected]> wrote:

>
>
> On Tue, Jun 2, 2026 at 9:37 PM PG Bug reporting form <
> [email protected]> wrote:
>
>> The following bug has been logged on the website:
>>
>> Bug reference:      19505
>> Logged by:          Maxim Boguk
>> Email address:      [email protected]
>> PostgreSQL version: 18.4
>> Operating system:   Ubuntu 24.04.4 LTS
>> Description:
>>
>> I started investigation of this issue after found that process count of
>> postgresql on my replica sometime jump to 200k+ (with max_connections=1000
>> and real connections under 100 most time).
>> Somehow single (seems random by always heavy/analytical) query spawn
>> thousands of the threads and tens thousands of parallel workers.
>>
>> After some logging I caught one snapshot (ps -u postgres -L -o
>> pid,tid,ppid,lstart,args -ww 2 ) with 39257 processes:
>>
>> [postgres@db ~/tmp]$ zcat ps-L-2026-06-02_17-40-22.gz | wc -l
>> 39257
>>
>> Main content is:
>> PID          TID               PPID  StartTime
>> command
>> 2158552 2158552  948705 Tue Jun  2 17:40:17 2026 postgres: 18/main:
>> background_shared db [local] SELECT
>>
>> Then:
>> The same PID but 1620 different TIDS.
>> PID          TID               PPID  StartTime
>> command
>> #main process
>> 2158557 2158557  948705 Tue Jun  2 17:40:18 2026 postgres: 18/main:
>> background_shared db [local] SELECT
>> #1620 threads
>> 2158557 2158607  948705 Tue Jun  2 17:40:20 2026 postgres: 18/main:
>> background_shared db [local] SELECT
>> 2158557 2158608  948705 Tue Jun  2 17:40:20 2026 postgres: 18/main:
>> background_shared db [local] SELECT
>> 2158557 2158609  948705 Tue Jun  2 17:40:20 2026 postgres: 18/main:
>> background_shared db [local] SELECT
>>
>> Then, 37571 rows!!! of:
>> PID          TID               PPID  StartTime
>> command
>> 2158579 2159176  948705 Tue Jun  2 17:40:20 2026 postgres: 18/main:
>> parallel
>> worker for PID 2158557
>> 2158579 2159179  948705 Tue Jun  2 17:40:20 2026 postgres: 18/main:
>> parallel
>> worker for PID 2158557
>> 2158579 2159183  948705 Tue Jun  2 17:40:20 2026 postgres: 18/main:
>> parallel
>> worker for PID 2158557
>> 2158579 2159196  948705 Tue Jun  2 17:40:20 2026 postgres: 18/main:
>> parallel
>> worker for PID 2158557
>> 2158579 2159198  948705 Tue Jun  2 17:40:20 2026 postgres: 18/main:
>> parallel
>> worker for PID 2158557
>> 2158579 2159202  948705 Tue Jun  2 17:40:20 2026 postgres: 18/main:
>> parallel
>> worker for PID 2158557
>>
>> I double checked the query (it had been logged in database log): it run
>> with
>> 6 worker processes and without any issues on manual run.
>>
>> Related db configuration:
>> max_connections = 1000
>> max_worker_processes = 128              # (change requires restart)
>> max_parallel_workers_per_gather = 16    # limited by max_parallel_workers
>> max_parallel_workers = 64
>> io_method = io_uring                    # worker, io_uring, sync
>> io_max_concurrency = -1         # Max number of IOs that one process
>> jit = on (usual suspect in case of weird things going on)
>>
>> Given that situation happens like 1-10 times per hour (and lead for short
>> LA
>> spikes up to 10000) - it's seriously affect the database replica
>> performance.
>>
>> No external/non-standard/C extensions except of pgq and postgis loaded
>> into
>> the database.
>>
>> I can look for any additional information and  perform any local research
>> but currently I'm out of ideas what my next steps should be.
>>
>> PS: it's seems that the issue could be triggered by different queries, but
>> not the one particular
>
>

Update:  issue had been triggered by unconstrained spawn of helper threads
for io_method=io_uring
(thousands/ten thousands of helper "iou-wrk-****" threads per bitmap scan).
Switching to the io_method=worker fixed problem.

Seems io_uring have some unexpected issues with unconstrained threads spawn.


-- 
Maxim Boguk
Senior Postgresql DBA

Phone UA: +380 99 143 0000
Phone AU: +61  45 218 5678


^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: BUG #19505: Some weird spikes postgresql processes in database (up to 200k sometime) without apparent reasons.
@ 2026-06-22 20:53  Andres Freund <[email protected]>
  parent: Maxim Boguk <[email protected]>
  0 siblings, 0 replies; 4+ messages in thread

From: Andres Freund @ 2026-06-22 20:53 UTC (permalink / raw)
  To: Maxim Boguk <[email protected]>; +Cc: [email protected]

Hi,

On 2026-06-22 23:22:53 +0300, Maxim Boguk wrote:
> >> I started investigation of this issue after found that process count of
> >> postgresql on my replica sometime jump to 200k+ (with max_connections=1000
> >> and real connections under 100 most time).

You say upthread:

> Given that situation happens like 1-10 times per hour (and lead for short LA
> spikes up to 10000) - it's seriously affect the database replica
> performance.

I don't immediately see how the kernel worker thread creations leads to such
an LA and/or replica performance issues.


> Update:  issue had been triggered by unconstrained spawn of helper threads
> for io_method=io_uring
> (thousands/ten thousands of helper "iou-wrk-****" threads per bitmap scan).

Note that these are kernel controlled, not postgres controlled.  Newer kernels
will typically start fewer worker threads, because more of the IO can be
handled "full asynchronously" (i.e. without workers).

What is effective_io_concurrency set to?

I assume io_max_concurrency = -1 ends up with io_max_concurrency = 64 (the max
auto-tuned value)?


> Switching to the io_method=worker fixed problem.

Did that actually fix any performance issues?


> Seems io_uring have some unexpected issues with unconstrained threads spawn.

I don't really see how the query you described could trigger that many kernel
worker threads. Within each backend that should be bound by io_max_concurrency
(or perhaps io_max_concurrency * 2, due to the some kernel partitioning).
That's a far cry from 1620.

Across a parallel query it should be bound by by io_max_concurrency *
#postgres processes involved in query.  Assuming an effective_io_concurrency >
64, and io_max_concurrency = 64, your query shouldn't trigger more than 7 * 64
worker threads (due to the number of workers chosen) - clearly not a small
number, but not 1620 (even if it were not within one process). And even if all
16 workers were used, that's 17*64, still a lot less.

If this is happening with a current kernel, I think you need to make a
bugreport to ubuntu.

Greetings,

Andres Freund






^ permalink  raw  reply  [nested|flat] 4+ messages in thread


end of thread, other threads:[~2026-06-22 20:53 UTC | newest]

Thread overview: 4+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-06-02 18:36 BUG #19505: Some weird spikes postgresql processes in database (up to 200k sometime) without apparent reasons. PG Bug reporting form <[email protected]>
2026-06-02 18:51 ` Maxim Boguk <[email protected]>
2026-06-22 20:22   ` Maxim Boguk <[email protected]>
2026-06-22 20:53     ` Andres Freund <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox