Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wblB9-002ouV-1b for pgsql-bugs@arkaria.postgresql.org; Mon, 22 Jun 2026 20:23:35 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1wblB8-007xp3-0m for pgsql-bugs@arkaria.postgresql.org; Mon, 22 Jun 2026 20:23:34 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wblB7-007xov-2v for pgsql-bugs@lists.postgresql.org; Mon, 22 Jun 2026 20:23:33 +0000 Received: from mail-ed1-x52d.google.com ([2a00:1450:4864:20::52d]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1wblB5-00000001oJ6-2TkC for pgsql-bugs@lists.postgresql.org; Mon, 22 Jun 2026 20:23:33 +0000 Received: by mail-ed1-x52d.google.com with SMTP id 4fb4d7f45d1cf-69690062350so4499688a12.0 for ; Mon, 22 Jun 2026 13:23:31 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1782159810; cv=none; d=google.com; s=arc-20240605; b=d328frCHiLxs8xVL2cvydjUUNb5rfLwE7T0M96R/Dm0uRunUQFjQPSj+yEnsIu80Q/ GIMcvAflBJnuGFHYj0z7SLTzDNb+JymvlLNYod9oTyZFLS9ieC4LQBVCNPgXHhVMcrZC s6SrBX9mi5/manTOQZkjEOulgo7dkbJIPLFCfLYe8hNFDSuRJCeRLqggR8Lk9AuhW5Ew oCf/rmoy15qBegQaGbk6n1isScNL0STRKerF6qdWz/BDUkwcWXyLsjdge4MZ3QvIltz2 t+IKbasFJgvaxsAEZj2oXkuI31I9y7+4UDIdByCLv66cAj4x0v7oJU86M0HWxs5X2T5z KgbA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=meZ+6ASJpBW9J7f9e0IvS3P9NarPqifpv6NCar1O84w=; fh=yataC2CPNNtsIGal3q4RNqweG0cBxrcIv46PSnYYpY8=; b=aVe7QgK2256PPc9hHN+NYeiLAV7NqSxqOnI2LVwEB/QagELwXN/Q1sHBpwrAkLTI+P 7lF1dJUO0zdfiphFZKMiM8shhf22+TuLBkT7+nlMB30M2qh00K3QhivG7er0edqSuYPA WnJBxvc0lmZMcRHvx1DB0zjh3beplIQcwSNNyDJpWjVVpvkSioMZtP6syqgmRA2yRVG+ KX1Zfxt2G3AW9o8GaBEj8s7C/DJEXThmnBtddgKBIU44SBZAHHvykFrFiGMscK8SJhTn OprCh4tLykZ4z7GcvuFoE3CGVzK5ZHF1+5x/v2tEuY8/zTkp+27lo0O4IeYftnT7AFVq PWEw==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1782159810; x=1782764610; darn=lists.postgresql.org; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :from:to:cc:subject:date:message-id:reply-to; bh=meZ+6ASJpBW9J7f9e0IvS3P9NarPqifpv6NCar1O84w=; b=GekjANFk1IJhNaFbdC5lDIe5KC+mbLl/MZFA02PZzj36IWsSUWdLPCC40jbum1w2UW +Z2T5qJ1OLjMAA3RDT5MKcmrHOBmx4rArv3WjRj7FbadsJC5NRRPvZ5Ahc0l01PJNRBx BwIkf0CUmSr1uDQc06PZBTPgaZspb93bmCf3tnWFuzv93viqwVq3BrLkg6Ty5Px4B/pt 3rNlFIdjyvTbSVMi+E5CPQaCQTJ2GC2s0N8AjKKyY80F2yzIx/dXf0917S9JdYJoSLqA /PNHv0FdtdCJ85OZfehB/5TdL3ugWEAN+Bh0l7jYcRrIKpCsy4x66pXceUfHumvz68+M 5jhg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1782159810; x=1782764610; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :x-gm-gg:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=meZ+6ASJpBW9J7f9e0IvS3P9NarPqifpv6NCar1O84w=; b=oCOaXZQd8mG8PO3yIMTkRP6TkmbcGfpKDzUo09l3dbj2wEaAxG1pI5KC2WV5CdDAEW jV9vZoIaBIuQ8woFg5PXp3WhNySjNztE08VgX8l5rFmI4YbNTz0sH1KGTWQJfI/zGA+x loxrMSJEUxhvO3skuQgFA2xF/wZajlQwU2E10XA7rxFWjOzBr1Ah3mfXPKdKMjhKhU88 xsJQpZEzvck1akhwDBo4PggfYLhe2cnPJYhAAl/xNPl5tPNYwjBcI4agjsIDABN/cyn9 fxvbp75SYpMF1pCc8oIItGSFbN9Jky7N3e4Yrfk8r4XeVPO1pCSa3jBTq6erxeyDoZVA I//g== X-Forwarded-Encrypted: i=1; AFNElJ+k62zIgiXNjYGVsxHsJk/XjM61sf9tc0ZAbEgMP+dizlKfmJf1WXJQEi/6gtUtPyQNjiv3fFNGNXzJ@lists.postgresql.org X-Gm-Message-State: AOJu0Yx5NeArP4PwcyiQ46K2fZLzqT5qEFVQ8gFT7Y42xAsKkw2SB74z H4KGCN6jCXx8m6p9pIrTbG1Nyaa2CX0b19mPXhOiUG8PUg2Spx/KucLW//xqs7KFe+LvG2m8x4E UQtqGF4wlO4U6CCTRxnyAklriuAU8Snc= X-Gm-Gg: AfdE7cnLG37LSRDlkHIgAsZlfgRbk3AduLCGU7FIJQsALGjBbrZqxmBe2FRCVL7zAKR /LZ1WR3Uedh5RqVDLUbtsyOwjZ5g/VOZZVZovmhOZHxNVYFtJgQw57B9nzpz4MriE93e3CtM+x+ nx6Yh1ioeT7Tvs1PNz6uBHr9ISMb/l7i1FO3+XW0IObgjBwwU1pZ/JpaF0xkKs2QF71UTytOGlq 0BHe6QGwTOuImCHS9vpubWiP/AqIwI5DA0nmMTIZoiF6hPnKiLUUBfCSqMZ/kHShXyeiOpUQiZS whiaH75+Xzf+Ehz5xFE82Y5Am6k4pdDYr5j54aXi/gSlASBf35c0EySScn8Q X-Received: by 2002:a05:6402:1ed5:b0:697:d53e:5855 with SMTP id 4fb4d7f45d1cf-697d53e59bdmr233915a12.16.1782159810124; Mon, 22 Jun 2026 13:23:30 -0700 (PDT) MIME-Version: 1.0 References: <19505-7ec445dffe2732d2@postgresql.org> In-Reply-To: From: Maxim Boguk Date: Mon, 22 Jun 2026 23:22:53 +0300 X-Gm-Features: AVVi8CcFQkexcVcM7oDRfD5TDIVty8bIbEP-X8I4X5ix7DKoBh7GHy7rM-xM2m4 Message-ID: Subject: Re: BUG #19505: Some weird spikes postgresql processes in database (up to 200k sometime) without apparent reasons. To: maxim.boguk@gmail.com, pgsql-bugs@lists.postgresql.org Content-Type: multipart/alternative; boundary="0000000000006a68800654dd6af2" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000006a68800654dd6af2 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Tue, Jun 2, 2026 at 9:51=E2=80=AFPM Maxim Boguk = wrote: > > > On Tue, Jun 2, 2026 at 9:37=E2=80=AFPM PG Bug reporting form < > noreply@postgresql.org> wrote: > >> The following bug has been logged on the website: >> >> Bug reference: 19505 >> Logged by: Maxim Boguk >> Email address: maxim.boguk@gmail.com >> PostgreSQL version: 18.4 >> Operating system: Ubuntu 24.04.4 LTS >> Description: >> >> I started investigation of this issue after found that process count of >> postgresql on my replica sometime jump to 200k+ (with max_connections=3D= 1000 >> and real connections under 100 most time). >> Somehow single (seems random by always heavy/analytical) query spawn >> thousands of the threads and tens thousands of parallel workers. >> >> After some logging I caught one snapshot (ps -u postgres -L -o >> pid,tid,ppid,lstart,args -ww 2 ) with 39257 processes: >> >> [postgres@db ~/tmp]$ zcat ps-L-2026-06-02_17-40-22.gz | wc -l >> 39257 >> >> Main content is: >> PID TID PPID StartTime >> command >> 2158552 2158552 948705 Tue Jun 2 17:40:17 2026 postgres: 18/main: >> background_shared db [local] SELECT >> >> Then: >> The same PID but 1620 different TIDS. >> PID TID PPID StartTime >> command >> #main process >> 2158557 2158557 948705 Tue Jun 2 17:40:18 2026 postgres: 18/main: >> background_shared db [local] SELECT >> #1620 threads >> 2158557 2158607 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: >> background_shared db [local] SELECT >> 2158557 2158608 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: >> background_shared db [local] SELECT >> 2158557 2158609 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: >> background_shared db [local] SELECT >> >> Then, 37571 rows!!! of: >> PID TID PPID StartTime >> command >> 2158579 2159176 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: >> parallel >> worker for PID 2158557 >> 2158579 2159179 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: >> parallel >> worker for PID 2158557 >> 2158579 2159183 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: >> parallel >> worker for PID 2158557 >> 2158579 2159196 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: >> parallel >> worker for PID 2158557 >> 2158579 2159198 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: >> parallel >> worker for PID 2158557 >> 2158579 2159202 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: >> parallel >> worker for PID 2158557 >> >> I double checked the query (it had been logged in database log): it run >> with >> 6 worker processes and without any issues on manual run. >> >> Related db configuration: >> max_connections =3D 1000 >> max_worker_processes =3D 128 # (change requires restart) >> max_parallel_workers_per_gather =3D 16 # limited by max_parallel_work= ers >> max_parallel_workers =3D 64 >> io_method =3D io_uring # worker, io_uring, sync >> io_max_concurrency =3D -1 # Max number of IOs that one process >> jit =3D on (usual suspect in case of weird things going on) >> >> Given that situation happens like 1-10 times per hour (and lead for shor= t >> LA >> spikes up to 10000) - it's seriously affect the database replica >> performance. >> >> No external/non-standard/C extensions except of pgq and postgis loaded >> into >> the database. >> >> I can look for any additional information and perform any local researc= h >> but currently I'm out of ideas what my next steps should be. >> >> PS: it's seems that the issue could be triggered by different queries, b= ut >> not the one particular > > Update: issue had been triggered by unconstrained spawn of helper threads for io_method=3Dio_uring (thousands/ten thousands of helper "iou-wrk-****" threads per bitmap scan). Switching to the io_method=3Dworker fixed problem. Seems io_uring have some unexpected issues with unconstrained threads spawn= . --=20 Maxim Boguk Senior Postgresql DBA Phone UA: +380 99 143 0000 Phone AU: +61 45 218 5678 --0000000000006a68800654dd6af2 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


On Tue, Jun 2, 2026 at 9:51=E2=80=AFPM Maxim Boguk <maxim.boguk@gmail.com> wrote:


On Tue, Jun 2, 2026 at 9:37=E2=80=AFPM PG Bug reporting form <noreply@postgresql.org= > wrote:
= The following bug has been logged on the website:

Bug reference:=C2=A0 =C2=A0 =C2=A0 19505
Logged by:=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 Maxim Boguk
Email address:=C2=A0 =C2=A0 =C2=A0 maxim.boguk@gmail.com
PostgreSQL version: 18.4
Operating system:=C2=A0 =C2=A0Ubuntu 24.04.4 LTS
Description:=C2=A0 =C2=A0 =C2=A0 =C2=A0

I started investigation of this issue after found that process count of
postgresql on my replica sometime jump to 200k+ (with max_connections=3D100= 0
and real connections under 100 most time).
Somehow single (seems random by always heavy/analytical) query spawn
thousands of the threads and tens thousands of parallel workers.

After some logging I caught one snapshot (ps -u postgres -L -o
pid,tid,ppid,lstart,args -ww 2 ) with 39257 processes:

[postgres@db ~/tmp]$ zcat ps-L-2026-06-02_17-40-22.gz | wc -l
39257

Main content is:
PID=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 TID=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0PPID=C2=A0 StartTime
command
2158552 2158552=C2=A0 948705 Tue Jun=C2=A0 2 17:40:17 2026 postgres: 18/mai= n:
background_shared db [local] SELECT

Then:
The same PID but 1620 different TIDS.
PID=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 TID=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0PPID=C2=A0 StartTime
command
#main process
2158557 2158557=C2=A0 948705 Tue Jun=C2=A0 2 17:40:18 2026 postgres: 18/mai= n:
background_shared db [local] SELECT
#1620 threads
2158557 2158607=C2=A0 948705 Tue Jun=C2=A0 2 17:40:20 2026 postgres: 18/mai= n:
background_shared db [local] SELECT
2158557 2158608=C2=A0 948705 Tue Jun=C2=A0 2 17:40:20 2026 postgres: 18/mai= n:
background_shared db [local] SELECT
2158557 2158609=C2=A0 948705 Tue Jun=C2=A0 2 17:40:20 2026 postgres: 18/mai= n:
background_shared db [local] SELECT

Then, 37571 rows!!! of:
PID=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 TID=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0PPID=C2=A0 StartTime
command
2158579 2159176=C2=A0 948705 Tue Jun=C2=A0 2 17:40:20 2026 postgres: 18/mai= n: parallel
worker for PID 2158557
2158579 2159179=C2=A0 948705 Tue Jun=C2=A0 2 17:40:20 2026 postgres: 18/mai= n: parallel
worker for PID 2158557
2158579 2159183=C2=A0 948705 Tue Jun=C2=A0 2 17:40:20 2026 postgres: 18/mai= n: parallel
worker for PID 2158557
2158579 2159196=C2=A0 948705 Tue Jun=C2=A0 2 17:40:20 2026 postgres: 18/mai= n: parallel
worker for PID 2158557
2158579 2159198=C2=A0 948705 Tue Jun=C2=A0 2 17:40:20 2026 postgres: 18/mai= n: parallel
worker for PID 2158557
2158579 2159202=C2=A0 948705 Tue Jun=C2=A0 2 17:40:20 2026 postgres: 18/mai= n: parallel
worker for PID 2158557

I double checked the query (it had been logged in database log): it run wit= h
6 worker processes and without any issues on manual run.

Related db configuration:
max_connections =3D 1000
max_worker_processes =3D 128=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 # (change requires restart)
max_parallel_workers_per_gather =3D 16=C2=A0 =C2=A0 # limited by max_parall= el_workers
max_parallel_workers =3D 64
io_method =3D io_uring=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 # worker, io_uring, sync
io_max_concurrency =3D -1=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0# Max number of = IOs that one process
jit =3D on (usual suspect in case of weird things going on)

Given that situation happens like 1-10 times per hour (and lead for short L= A
spikes up to 10000) - it's seriously affect the database replica
performance.

No external/non-standard/C extensions except of pgq and postgis loaded into=
the database.

I can look for any additional information and=C2=A0 perform any local resea= rch
but currently I'm out of ideas what my next steps should be.

PS: it's seems that the issue could be triggered by different queries, = but
not the one particular

=

Update:=C2=A0 issue had been triggered by uncons= trained spawn of helper threads for=C2=A0io_method=3Dio_uring
(thousands/ten thousands of helper=C2=A0"iou-wrk-****" threads=C2=A0per bitmap scan).
<= div style=3D"font-family:monospace,monospace;font-size:large" class=3D"gmai= l_default">Switching to the io_method=3Dworker fixed problem.

Seems io_uring have some unexpected issues with u= nconstrained threads spawn.


--
Maxim Boguk
Senior Postgresql DBA

Phone UA: +380 99 143 0000
= Phone AU: +61=C2=A0 45 218 5678

--0000000000006a68800654dd6af2--