public inbox for [email protected]help / color / mirror / Atom feed
BUG #19505: Some weird spikes postgresql processes in database (up to 200k sometime) without apparent reasons. 4+ messages / 3 participants [nested] [flat]
* BUG #19505: Some weird spikes postgresql processes in database (up to 200k sometime) without apparent reasons. @ 2026-06-02 18:36 PG Bug reporting form <[email protected]> 2026-06-02 18:51 ` Re: BUG #19505: Some weird spikes postgresql processes in database (up to 200k sometime) without apparent reasons. Maxim Boguk <[email protected]> 0 siblings, 1 reply; 4+ messages in thread From: PG Bug reporting form @ 2026-06-02 18:36 UTC (permalink / raw) To: [email protected]; +Cc: [email protected] The following bug has been logged on the website: Bug reference: 19505 Logged by: Maxim Boguk Email address: [email protected] PostgreSQL version: 18.4 Operating system: Ubuntu 24.04.4 LTS Description: I started investigation of this issue after found that process count of postgresql on my replica sometime jump to 200k+ (with max_connections=1000 and real connections under 100 most time). Somehow single (seems random by always heavy/analytical) query spawn thousands of the threads and tens thousands of parallel workers. After some logging I caught one snapshot (ps -u postgres -L -o pid,tid,ppid,lstart,args -ww 2 ) with 39257 processes: [postgres@db ~/tmp]$ zcat ps-L-2026-06-02_17-40-22.gz | wc -l 39257 Main content is: PID TID PPID StartTime command 2158552 2158552 948705 Tue Jun 2 17:40:17 2026 postgres: 18/main: background_shared db [local] SELECT Then: The same PID but 1620 different TIDS. PID TID PPID StartTime command #main process 2158557 2158557 948705 Tue Jun 2 17:40:18 2026 postgres: 18/main: background_shared db [local] SELECT #1620 threads 2158557 2158607 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: background_shared db [local] SELECT 2158557 2158608 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: background_shared db [local] SELECT 2158557 2158609 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: background_shared db [local] SELECT Then, 37571 rows!!! of: PID TID PPID StartTime command 2158579 2159176 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: parallel worker for PID 2158557 2158579 2159179 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: parallel worker for PID 2158557 2158579 2159183 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: parallel worker for PID 2158557 2158579 2159196 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: parallel worker for PID 2158557 2158579 2159198 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: parallel worker for PID 2158557 2158579 2159202 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: parallel worker for PID 2158557 I double checked the query (it had been logged in database log): it run with 6 worker processes and without any issues on manual run. Related db configuration: max_connections = 1000 max_worker_processes = 128 # (change requires restart) max_parallel_workers_per_gather = 16 # limited by max_parallel_workers max_parallel_workers = 64 io_method = io_uring # worker, io_uring, sync io_max_concurrency = -1 # Max number of IOs that one process jit = on (usual suspect in case of weird things going on) Given that situation happens like 1-10 times per hour (and lead for short LA spikes up to 10000) - it's seriously affect the database replica performance. No external/non-standard/C extensions except of pgq and postgis loaded into the database. I can look for any additional information and perform any local research but currently I'm out of ideas what my next steps should be. PS: it's seems that the issue could be triggered by different queries, but not the one particular. ^ permalink raw reply [nested|flat] 4+ messages in thread
* Re: BUG #19505: Some weird spikes postgresql processes in database (up to 200k sometime) without apparent reasons. 2026-06-02 18:36 BUG #19505: Some weird spikes postgresql processes in database (up to 200k sometime) without apparent reasons. PG Bug reporting form <[email protected]> @ 2026-06-02 18:51 ` Maxim Boguk <[email protected]> 2026-06-22 20:22 ` Re: BUG #19505: Some weird spikes postgresql processes in database (up to 200k sometime) without apparent reasons. Maxim Boguk <[email protected]> 0 siblings, 1 reply; 4+ messages in thread From: Maxim Boguk @ 2026-06-02 18:51 UTC (permalink / raw) To: [email protected]; [email protected] On Tue, Jun 2, 2026 at 9:37 PM PG Bug reporting form <[email protected]> wrote: > The following bug has been logged on the website: > > Bug reference: 19505 > Logged by: Maxim Boguk > Email address: [email protected] > PostgreSQL version: 18.4 > Operating system: Ubuntu 24.04.4 LTS > Description: > > I started investigation of this issue after found that process count of > postgresql on my replica sometime jump to 200k+ (with max_connections=1000 > and real connections under 100 most time). > Somehow single (seems random by always heavy/analytical) query spawn > thousands of the threads and tens thousands of parallel workers. > > After some logging I caught one snapshot (ps -u postgres -L -o > pid,tid,ppid,lstart,args -ww 2 ) with 39257 processes: > > [postgres@db ~/tmp]$ zcat ps-L-2026-06-02_17-40-22.gz | wc -l > 39257 > > Main content is: > PID TID PPID StartTime > command > 2158552 2158552 948705 Tue Jun 2 17:40:17 2026 postgres: 18/main: > background_shared db [local] SELECT > > Then: > The same PID but 1620 different TIDS. > PID TID PPID StartTime > command > #main process > 2158557 2158557 948705 Tue Jun 2 17:40:18 2026 postgres: 18/main: > background_shared db [local] SELECT > #1620 threads > 2158557 2158607 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: > background_shared db [local] SELECT > 2158557 2158608 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: > background_shared db [local] SELECT > 2158557 2158609 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: > background_shared db [local] SELECT > > Then, 37571 rows!!! of: > PID TID PPID StartTime > command > 2158579 2159176 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: > parallel > worker for PID 2158557 > 2158579 2159179 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: > parallel > worker for PID 2158557 > 2158579 2159183 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: > parallel > worker for PID 2158557 > 2158579 2159196 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: > parallel > worker for PID 2158557 > 2158579 2159198 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: > parallel > worker for PID 2158557 > 2158579 2159202 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: > parallel > worker for PID 2158557 > > I double checked the query (it had been logged in database log): it run > with > 6 worker processes and without any issues on manual run. > > Related db configuration: > max_connections = 1000 > max_worker_processes = 128 # (change requires restart) > max_parallel_workers_per_gather = 16 # limited by max_parallel_workers > max_parallel_workers = 64 > io_method = io_uring # worker, io_uring, sync > io_max_concurrency = -1 # Max number of IOs that one process > jit = on (usual suspect in case of weird things going on) > > Given that situation happens like 1-10 times per hour (and lead for short > LA > spikes up to 10000) - it's seriously affect the database replica > performance. > > No external/non-standard/C extensions except of pgq and postgis loaded into > the database. > > I can look for any additional information and perform any local research > but currently I'm out of ideas what my next steps should be. > > PS: it's seems that the issue could be triggered by different queries, but > not the one particular. > > UPD: Looks like the issue is always triggered by queries of the same kind (below query and explain), but I can't see any way how this query can spawn 10k-100k threads (or spawn any threads at all). PS: The problem started after upgrading 17->18, so the next suspect I am going to test is "io_method = io_uring". explain analyze WITH _tmp as ( SELECT "job_stats_master"."device", "job_stats_master"."organic", "job_stats_master"."paid", "job_stats_master"."global_action", "job_stats_master"."employer_id", SUM(COALESCE(stats_count, 1)), sum(coalesce(price, 0)) AS spent, DATE(job_stats_master.created_at AT TIME ZONE 'UTC' AT TIME ZONE 'Etc/UTC'), coalesce(SUM(price) filter(where bid_metric_matches(bid_metric, global_action) AND paid AND NOT organic), 0) earned, "job_stats_master"."job_board_id", "job_stats_master"."job_group_id", "job_stats_master"."country_id", "job_stats_master"."gcc", "job_stats_master"."exchange_id", "job_stats_master"."bid_metric", sum(coalesce(price_publisher, 0)), sum(coalesce(cpc, 0)), sum(coalesce(price, 0) * coalesce(ncc, 1)) AS spent_ncc, sum(coalesce(price, 0) / coalesce(gcc, 1)) AS spent_gcc, sum(coalesce(price_publisher, 0)) AS publisher_spent FROM "job_stats_master" WHERE "job_stats_master"."job_board_id" = 30875 AND (job_stats_master.created_at >= '2026-05-01 00:00:00') AND (job_stats_master.created_at <= '2026-05-31 23:59:59.999999') AND ((job_stats_master.unpaid_reason NOT IN (7, 9, 19) OR job_stats_master.unpaid_reason IS NULL)) AND "job_stats_master"."employer_id" NOT IN (20894, 28518, 28508, 28517, 20691, 24731, 28280, 20692, 20690, 28281, 28509, 20700, 28284, 28520, 20895, 24730, 20698, 20699, 28285, 28519, 28521, 28613, 28607, 28612, 28602, 28603, 28604, 28610, 28605, 28606, 28608, 28609, 28614, 28615, 28611, 28629, 28630, 28625, 28626, 28627, 28633, 28632, 28622, 28624, 28620, 28621, 28623, 28628, 28631, 28652, 28653, 28654, 28658, 28656, 28647, 28648, 28651, 28649, 28655, 28650, 28659, 28657, 28660, 20893, 28279, 19292, 28527, 19291, 28278, 24707, 19290, 28526, 28525, 18157, 18158, 18156, 18155, 28255, 28256, 28258, 28259, 28260, 24689, 28262, 28265, 28268, 28267, 24688, 28257, 22774, 20701, 28283, 21597, 24729, 20702, 22781, 20703, 20896, 28282, 28524, 22780, 28522, 28523, 7400, 7397, 7399, 8118, 8110, 7402, 6706, 6700, 6703, 8113, 8108, 9457, 8109, 9453, 7068, 8114, 7401, 9456, 6866, 8117, 9455, 8115, 7398, 6707, 7403, 9454, 7404, 8116, 6708, 21129, 21130, 21131, 21132, 24369, 24364, 24368, 24365, 24367, 24366, 18576, 18574, 18575, 18573, 24433, 21414, 2887, 60, 62, 61, 10381, 20676, 20677, 20851, 20972, 18220, 18219, 18221, 18222, 3182, 3158, 3157, 5213, 5212, 3156, 3839, 3840, 6919, 5818, 6951, 9502, 3841, 10489, 10490, 10491, 15231, 10493, 15232, 10492, 9189, 9185, 9187, 9188, 9190, 9186, 8898, 8900, 8897, 8899, 8895, 8896, 8912, 8909, 8914, 8913, 8911, 8910, 18184, 18182, 18185, 18183, 10858, 20706, 10861, 10859, 10857, 10860, 19570, 8906, 8908, 8907, 8904, 8905, 8903, 7686, 7685, 7684, 7682, 7687, 7683, 7689, 7692, 7688, 7693, 7690, 7691, 9503, 8225, 11593, 8223, 8227, 8222, 8226, 8224, 3295, 3294, 4151, 10468, 3293, 4081, 5668, 3153, 3152, 3154, 3480, 3148, 3150, 3482, 3149, 3291, 3292, 13708, 3987, 3290, 3533, 20675, 10298, 10290, 10297, 10299, 10296, 3286, 3284, 5320, 3951, 3907, 3459, 3285, 3529, 3289, 3288, 3287, 6894, 19552, 7873, 6240, 21467, 6816, 17328, 4598, 4599, 4600, 18244) GROUP BY "job_stats_master"."device", "job_stats_master"."organic", "job_stats_master"."paid", "job_stats_master"."global_action", "job_stats_master"."employer_id", "job_stats_master"."job_board_id", "job_stats_master"."job_group_id", "job_stats_master"."country_id", DATE(job_stats_master.created_at AT TIME ZONE 'UTC' AT TIME ZONE 'Etc/UTC'), "job_stats_master"."gcc", "job_stats_master"."exchange_id", "job_stats_master"."bid_metric" ) SELECT employers.enterprise_id AS tier0_enterprise_id, _tmp.* FROM _tmp INNER JOIN employers on employers.id = _tmp.employer_id ; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Hash Join (cost=117116.37..156282.19 rows=556873 width=280) (actual time=626.650..1874.724 rows=313222.00 loops=1) Hash Cond: (job_stats_master.employer_id = employers.id) Buffers: shared hit=505859 -> Finalize GroupAggregate (cost=115548.94..154129.81 rows=556873 width=276) (actual time=593.246..1799.007 rows=313222.00 loops=1) Group Key: job_stats_master.device, job_stats_master.organic, job_stats_master.paid, job_stats_master.global_action, job_stats_master.employer_id, job_stats_master.job_group_id, job_stats_master.country_id, (date(((job_stats_master.created_at AT TIME ZONE 'UTC'::text) AT TIME ZONE 'Etc/UTC'::text))), job_stats_master.gcc, job_stats_master.exchange_id, job_stats_master.bid_metric Buffers: shared hit=495471 -> Gather Merge (cost=115548.94..130184.30 rows=556872 width=244) (actual time=593.228..1242.326 rows=475158.00 loops=1) Workers Planned: 6 Workers Launched: 6 Buffers: shared hit=495471 -> Partial GroupAggregate (cost=105548.90..110653.56 rows=92812 width=244) (actual time=569.484..682.066 rows=67879.71 loops=7) Group Key: job_stats_master.device, job_stats_master.organic, job_stats_master.paid, job_stats_master.global_action, job_stats_master.employer_id, job_stats_master.job_group_id, job_stats_master.country_id, (date(((job_stats_master.created_at AT TIME ZONE 'UTC'::text) AT TIME ZONE 'Etc/UTC'::text))), job_stats_master.gcc, job_stats_master.exchange_id, job_stats_master.bid_metric Buffers: shared hit=495471 -> Sort (cost=105548.90..105641.71 rows=92812 width=72) (actual time=569.430..580.918 rows=87120.71 loops=7) Sort Key: job_stats_master.device, job_stats_master.organic, job_stats_master.paid, job_stats_master.global_action, job_stats_master.employer_id, job_stats_master.job_group_id, job_stats_master.country_id, (date(((job_stats_master.created_at AT TIME ZONE 'UTC'::text) AT TIME ZONE 'Etc/UTC'::text))), job_stats_master.gcc, job_stats_master.exchange_id, job_stats_master.bid_metric Sort Method: quicksort Memory: 10559kB Buffers: shared hit=495471 Worker 0: Sort Method: quicksort Memory: 9807kB Worker 1: Sort Method: quicksort Memory: 9684kB Worker 2: Sort Method: quicksort Memory: 9818kB Worker 3: Sort Method: quicksort Memory: 9772kB Worker 4: Sort Method: quicksort Memory: 9764kB Worker 5: Sort Method: quicksort Memory: 9702kB -> Parallel Bitmap Heap Scan on job_stats_new_2026_05 job_stats_master (cost=34412.46..102021.67 rows=92812 width=72) (actual time=174.110..463.709 rows=87120.71 loops=7) Recheck Cond: ((job_board_id = 30875) AND (created_at >= '2026-05-01 00:00:00'::timestamp without time zone) AND (created_at <= '2026-05-31 23:59:59.999999'::timestamp without time zone)) Filter: (((unpaid_reason <> ALL ('{7,9,19}'::integer[])) OR (unpaid_reason IS NULL)) AND (employer_id <> ALL ('{20894,28518,28508,28517,20691,24731,28280,20692,20690,28281,28509,20700,28284,28520,20895,24730,20698,20699,28285,28519,28521,28613,28607,28612,28602,28603,28604,28610,28605,28606,28608,28609,28614,28615,28611,28629,28630,28625,28626,28627,28633,28632,28622,28624,28620,28621,28623,28628,28631,28652,28653,28654,28658,28656,28647,28648,28651,28649,28655,28650,28659,28657,28660,20893,28279,19292,28527,19291,28278,24707,19290,28526,28525,18157,18158,18156,18155,28255,28256,28258,28259,28260,24689,28262,28265,28268,28267,24688,28257,22774,20701,28283,21597,24729,20702,22781,20703,20896,28282,28524,22780,28522,28523,7400,7397,7399,8118,8110,7402,6706,6700,6703,8113,8108,9457,8109,9453,7068,8114,7401,9456,6866,8117,9455,8115,7398,6707,7403,9454,7404,8116,6708,21129,21130,21131,21132,24369,24364,24368,24365,24367,24366,18576,18574,18575,18573,24433,21414,2887,60,62,61,10381,20676,20677,20851,20972,18220,18219,18221,18222,3182,3158,3157,5213,5212,3156,3839,3840,6919,5818,6951,9502,3841,10489,10490,10491,15231,10493,15232,10492,9189,9185,9187,9188,9190,9186,8898,8900,8897,8899,8895,8896,8912,8909,8914,8913,8911,8910,18184,18182,18185,18183,10858,20706,10861,10859,10857,10860,19570,8906,8908,8907,8904,8905,8903,7686,7685,7684,7682,7687,7683,7689,7692,7688,7693,7690,7691,9503,8225,11593,8223,8227,8222,8226,8224,3295,3294,4151,10468,3293,4081,5668,3153,3152,3154,3480,3148,3150,3482,3149,3291,3292,13708,3987,3290,3533,20675,10298,10290,10297,10299,10296,3286,3284,5320,3951,3907,3459,3285,3529,3289,3288,3287,6894,19552,7873,6240,21467,6816,17328,4598,4599,4600,18244}'::integer[]))) Rows Removed by Filter: 45 Heap Blocks: exact=76322 Buffers: shared hit=495195 Worker 0: Heap Blocks: exact=69229 Worker 1: Heap Blocks: exact=67759 Worker 2: Heap Blocks: exact=69407 Worker 3: Heap Blocks: exact=68584 Worker 4: Heap Blocks: exact=68597 Worker 5: Heap Blocks: exact=67828 -> Bitmap Index Scan on job_stats_new_2026_05_job_board_id_job_reference_created_at_idx (cost=0.00..34356.49 rows=635582 width=0) (actual time=90.252..90.253 rows=610218.00 loops=1) Index Cond: ((job_board_id = 30875) AND (created_at >= '2026-05-01 00:00:00'::timestamp without time zone) AND (created_at <= '2026-05-31 23:59:59.999999'::timestamp without time zone)) Index Searches: 1 Buffers: shared hit=7343 -> Hash (cost=1290.53..1290.53 rows=25173 width=8) (actual time=33.285..33.292 rows=25173.00 loops=1) Buckets: 32768 Batches: 1 Memory Usage: 1240kB Buffers: shared hit=10388 -> Seq Scan on employers (cost=0.00..1290.53 rows=25173 width=8) (actual time=18.012..30.607 rows=25173.00 loops=1) Buffers: shared hit=10388 Planning: Buffers: shared hit=1212 Planning Time: 2.535 ms JIT: Functions: 88 Options: Inlining false, Optimization false, Expressions true, Deforming true Timing: Generation 11.607 ms (Deform 5.619 ms), Inlining 0.000 ms, Optimization 7.214 ms, Emission 90.846 ms, Total 109.668 ms Execution Time: 1902.285 ms -- Maxim Boguk Senior Postgresql DBA Phone UA: +380 99 143 0000 Phone AU: +61 45 218 5678 ^ permalink raw reply [nested|flat] 4+ messages in thread
* Re: BUG #19505: Some weird spikes postgresql processes in database (up to 200k sometime) without apparent reasons. 2026-06-02 18:36 BUG #19505: Some weird spikes postgresql processes in database (up to 200k sometime) without apparent reasons. PG Bug reporting form <[email protected]> 2026-06-02 18:51 ` Re: BUG #19505: Some weird spikes postgresql processes in database (up to 200k sometime) without apparent reasons. Maxim Boguk <[email protected]> @ 2026-06-22 20:22 ` Maxim Boguk <[email protected]> 2026-06-22 20:53 ` Re: BUG #19505: Some weird spikes postgresql processes in database (up to 200k sometime) without apparent reasons. Andres Freund <[email protected]> 0 siblings, 1 reply; 4+ messages in thread From: Maxim Boguk @ 2026-06-22 20:22 UTC (permalink / raw) To: [email protected]; [email protected] On Tue, Jun 2, 2026 at 9:51 PM Maxim Boguk <[email protected]> wrote: > > > On Tue, Jun 2, 2026 at 9:37 PM PG Bug reporting form < > [email protected]> wrote: > >> The following bug has been logged on the website: >> >> Bug reference: 19505 >> Logged by: Maxim Boguk >> Email address: [email protected] >> PostgreSQL version: 18.4 >> Operating system: Ubuntu 24.04.4 LTS >> Description: >> >> I started investigation of this issue after found that process count of >> postgresql on my replica sometime jump to 200k+ (with max_connections=1000 >> and real connections under 100 most time). >> Somehow single (seems random by always heavy/analytical) query spawn >> thousands of the threads and tens thousands of parallel workers. >> >> After some logging I caught one snapshot (ps -u postgres -L -o >> pid,tid,ppid,lstart,args -ww 2 ) with 39257 processes: >> >> [postgres@db ~/tmp]$ zcat ps-L-2026-06-02_17-40-22.gz | wc -l >> 39257 >> >> Main content is: >> PID TID PPID StartTime >> command >> 2158552 2158552 948705 Tue Jun 2 17:40:17 2026 postgres: 18/main: >> background_shared db [local] SELECT >> >> Then: >> The same PID but 1620 different TIDS. >> PID TID PPID StartTime >> command >> #main process >> 2158557 2158557 948705 Tue Jun 2 17:40:18 2026 postgres: 18/main: >> background_shared db [local] SELECT >> #1620 threads >> 2158557 2158607 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: >> background_shared db [local] SELECT >> 2158557 2158608 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: >> background_shared db [local] SELECT >> 2158557 2158609 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: >> background_shared db [local] SELECT >> >> Then, 37571 rows!!! of: >> PID TID PPID StartTime >> command >> 2158579 2159176 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: >> parallel >> worker for PID 2158557 >> 2158579 2159179 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: >> parallel >> worker for PID 2158557 >> 2158579 2159183 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: >> parallel >> worker for PID 2158557 >> 2158579 2159196 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: >> parallel >> worker for PID 2158557 >> 2158579 2159198 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: >> parallel >> worker for PID 2158557 >> 2158579 2159202 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: >> parallel >> worker for PID 2158557 >> >> I double checked the query (it had been logged in database log): it run >> with >> 6 worker processes and without any issues on manual run. >> >> Related db configuration: >> max_connections = 1000 >> max_worker_processes = 128 # (change requires restart) >> max_parallel_workers_per_gather = 16 # limited by max_parallel_workers >> max_parallel_workers = 64 >> io_method = io_uring # worker, io_uring, sync >> io_max_concurrency = -1 # Max number of IOs that one process >> jit = on (usual suspect in case of weird things going on) >> >> Given that situation happens like 1-10 times per hour (and lead for short >> LA >> spikes up to 10000) - it's seriously affect the database replica >> performance. >> >> No external/non-standard/C extensions except of pgq and postgis loaded >> into >> the database. >> >> I can look for any additional information and perform any local research >> but currently I'm out of ideas what my next steps should be. >> >> PS: it's seems that the issue could be triggered by different queries, but >> not the one particular > > Update: issue had been triggered by unconstrained spawn of helper threads for io_method=io_uring (thousands/ten thousands of helper "iou-wrk-****" threads per bitmap scan). Switching to the io_method=worker fixed problem. Seems io_uring have some unexpected issues with unconstrained threads spawn. -- Maxim Boguk Senior Postgresql DBA Phone UA: +380 99 143 0000 Phone AU: +61 45 218 5678 ^ permalink raw reply [nested|flat] 4+ messages in thread
* Re: BUG #19505: Some weird spikes postgresql processes in database (up to 200k sometime) without apparent reasons. 2026-06-02 18:36 BUG #19505: Some weird spikes postgresql processes in database (up to 200k sometime) without apparent reasons. PG Bug reporting form <[email protected]> 2026-06-02 18:51 ` Re: BUG #19505: Some weird spikes postgresql processes in database (up to 200k sometime) without apparent reasons. Maxim Boguk <[email protected]> 2026-06-22 20:22 ` Re: BUG #19505: Some weird spikes postgresql processes in database (up to 200k sometime) without apparent reasons. Maxim Boguk <[email protected]> @ 2026-06-22 20:53 ` Andres Freund <[email protected]> 0 siblings, 0 replies; 4+ messages in thread From: Andres Freund @ 2026-06-22 20:53 UTC (permalink / raw) To: Maxim Boguk <[email protected]>; +Cc: [email protected] Hi, On 2026-06-22 23:22:53 +0300, Maxim Boguk wrote: > >> I started investigation of this issue after found that process count of > >> postgresql on my replica sometime jump to 200k+ (with max_connections=1000 > >> and real connections under 100 most time). You say upthread: > Given that situation happens like 1-10 times per hour (and lead for short LA > spikes up to 10000) - it's seriously affect the database replica > performance. I don't immediately see how the kernel worker thread creations leads to such an LA and/or replica performance issues. > Update: issue had been triggered by unconstrained spawn of helper threads > for io_method=io_uring > (thousands/ten thousands of helper "iou-wrk-****" threads per bitmap scan). Note that these are kernel controlled, not postgres controlled. Newer kernels will typically start fewer worker threads, because more of the IO can be handled "full asynchronously" (i.e. without workers). What is effective_io_concurrency set to? I assume io_max_concurrency = -1 ends up with io_max_concurrency = 64 (the max auto-tuned value)? > Switching to the io_method=worker fixed problem. Did that actually fix any performance issues? > Seems io_uring have some unexpected issues with unconstrained threads spawn. I don't really see how the query you described could trigger that many kernel worker threads. Within each backend that should be bound by io_max_concurrency (or perhaps io_max_concurrency * 2, due to the some kernel partitioning). That's a far cry from 1620. Across a parallel query it should be bound by by io_max_concurrency * #postgres processes involved in query. Assuming an effective_io_concurrency > 64, and io_max_concurrency = 64, your query shouldn't trigger more than 7 * 64 worker threads (due to the number of workers chosen) - clearly not a small number, but not 1620 (even if it were not within one process). And even if all 16 workers were used, that's 17*64, still a lot less. If this is happening with a current kernel, I think you need to make a bugreport to ubuntu. Greetings, Andres Freund ^ permalink raw reply [nested|flat] 4+ messages in thread
end of thread, other threads:[~2026-06-22 20:53 UTC | newest] Thread overview: 4+ messages (download: mbox mbox.gz follow: Atom feed) -- links below jump to the message on this page -- 2026-06-02 18:36 BUG #19505: Some weird spikes postgresql processes in database (up to 200k sometime) without apparent reasons. PG Bug reporting form <[email protected]> 2026-06-02 18:51 ` Maxim Boguk <[email protected]> 2026-06-22 20:22 ` Maxim Boguk <[email protected]> 2026-06-22 20:53 ` Andres Freund <[email protected]>
This inbox is served by agora; see mirroring instructions for how to clone and mirror all data and code used for this inbox