Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uLTVg-009x4e-Kl for pgsql-admin@arkaria.postgresql.org; Sat, 31 May 2025 21:12:56 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1uLTVf-00Fmyg-Ai for pgsql-admin@arkaria.postgresql.org; Sat, 31 May 2025 21:12:55 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uLTVe-00FmyY-Td for pgsql-admin@lists.postgresql.org; Sat, 31 May 2025 21:12:54 +0000 Received: from cloud.gatewaynet.com ([185.90.37.94]) by magus.postgresql.org with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1uLTVc-0016Lm-0v for pgsql-admin@lists.postgresql.org; Sat, 31 May 2025 21:12:54 +0000 Content-Type: multipart/alternative; boundary="------------rnQjT051q6K8he20ARviwIyC" Message-ID: Date: Sun, 1 Jun 2025 00:12:50 +0300 MIME-Version: 1.0 Subject: Re: PostgreSQL 16.6 , query stuck with STAT Ssl, wait_event_type : IPC , wait_event : ParallelFinish From: Achilleas Mantzios To: pgsql-admin References: Content-Language: en-US In-Reply-To: List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk This is a multi-part message in MIME format. --------------rnQjT051q6K8he20ARviwIyC Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit On 31/5/25 23:43, Achilleas Mantzios wrote: > Hi > > a query is stuck with the above, it seems it waits for parallel worker > to finish, however , there are no parallel works running : > > postgres@[local]/dynacom=# SELECT application_name, backend_type, > backend_start,xact_start,query_start,wait_event_type, wait_event > ,state FROM pg_stat_activity; > application_name |         backend_type         | >         backend_start         |          xact_start           | >          query_start          | wait_event_type |     wait_event >      | state > ------------------+------------------------------+-------------------------------+-------------------------------+-------------------------------+-----------------+---------------------+-------- > >                  | autovacuum launcher          | 2024-11-29 > 17:48:50.92935+02  |                               | >                               | Activity        | AutoVacuumMain      | >                  | logical replication launcher | 2024-11-29 > 17:48:50.929496+02 |                               | >                               | Activity        | LogicalLauncherMain | > DBMIRROR         | client backend               | 2025-05-31 > 19:04:16.724305+03 | 2025-05-31 19:05:21.686093+03 | 2025-05-31 > 19:05:21.909936+03 | IPC             | ParallelFinish      | active >                  | client backend               | 2025-05-31 > 23:31:30.030806+03 |                               | 2025-05-31 > 23:35:05.045573+03 | Client          | ClientRead          | idle > psql             | client backend               | 2025-05-31 > 23:29:33.863485+03 | 2025-05-31 23:35:09.322972+03 | 2025-05-31 > 23:35:09.322972+03 |                 |                     | active > RXMLFVSLS        | client backend               | 2025-05-31 > 23:32:37.351131+03 |                               | 2025-05-31 > 23:35:09.295221+03 | Client          | ClientRead          | idle > psql             | client backend               | 2025-04-28 > 16:59:55.968442+03 |                               | 2025-05-27 > 16:43:56.338228+03 | Client          | ClientRead          | idle >                  | background writer            | 2024-11-29 > 17:48:50.916876+02 |                               | >                               | Activity        | BgWriterMain        | >                  | archiver                     | 2024-12-03 > 18:57:36.447067+02 |                               | >                               | Activity        | ArchiverMain        | >                  | checkpointer                 | 2024-11-29 > 17:48:50.916648+02 |                               | >                               | Activity        | CheckpointerMain    | >                  | walwriter                    | 2024-11-29 > 17:48:50.928789+02 |                               | >                               | Activity        | WalWriterMain       | > (11 rows) > > postgres@[local]/dynacom=# > > So, I will terminate this backend now to get the system working again, > we are curious why this happened, our system serves daily 22M+ > transactions, this is Saturday night hence the low traffic. > > postgres@smadb:~$ lsb_release -a > No LSB modules are available. > Distributor ID: Debian > Description:    Debian GNU/Linux 12 (bookworm) > Release:        12 > Codename:       bookworm > postgres@smadb:~$ psql -Aqt -c 'select version()' > PostgreSQL 16.6 on x86_64-pc-linux-gnu, compiled by gcc (Debian > 12.2.0-14) 12.2.0, 64-bit > postgres@smadb:~$ > > Some additional info : 1690535 : the pid in question we found no trace or indication for OOM killer , root@smadb:~# strace -p 1690535 strace: Process 1690535 attached epoll_wait(12, also lsof showed : postgres  1690535 1690649 Sweeper           postgres    0r      CHR                1,3        0t0          4 /dev/null postgres  1690535 1690649 Sweeper           postgres    1w     FIFO               0,14        0t0  213624178 pipe postgres  1690535 1690649 Sweeper           postgres    2w     FIFO               0,14        0t0  213624178 pipe postgres  1690535 1690649 Sweeper           postgres    3u  a_inode               0,15          0       1059 [signalfd] postgres  1690535 1690649 Sweeper           postgres    4r     FIFO               0,14        0t0  213624177 pipe postgres  1690535 1690649 Sweeper           postgres    5u  a_inode               0,15          0       1059 [eventpoll:3,4,11] postgres  1690535 1690649 Sweeper           postgres    6u      REG               8,32       8192  157352475 /raid4/pgsql/data/PG_16_202 307071/207491653/2601 postgres  1690535 1690649 Sweeper           postgres    7u      REG               8,32     450560  157352631 /raid4/pgsql/data/PG_16_202 307071/207491653/207493206 postgres  1690535 1690649 Sweeper           postgres    8u      REG               8,32      40960  157356847 /raid4/pgsql/data/PG_16_202 307071/207491653/207503536 postgres  1690535 1690649 Sweeper           postgres    9u      REG               8,32      40960  157356848 /raid4/pgsql/data/PG_16_202 307071/207491653/207503538 postgres  1690535 1690649 Sweeper           postgres   10u      REG               8,32      40960  157357848 /raid4/pgsql/data/PG_16_202 307071/207491653/207504627 postgres  1690535 1690649 Sweeper           postgres   11u     IPv4         1241927029        0t0        TCP smadb.internal.net:postgres ql->sma.internal.net:42615 (ESTABLISHED) postgres  1690535 1690649 Sweeper           postgres   12u  a_inode               0,15          0       1059 [eventpoll:3,4] So , as far as we understand, it waited for an inode ? I tried to pg_terminate_backend(1690535); it did nothing to the process, then pg_cancel_backend(1690535); I went to the shell and did a normal kill 1690535 it did nothing, and before i kill -9 it, I just tried kill -HUP, and it woke up with 10.9.0.10(42615) [1690535] 683b2880.19cba7 2025-05-31 23:53:48.231 EEST DBMIRROR postgres@dynacom line:4 FATAL:  terminating connection due to administrator command 10.9.0.10(42615) [1690535] 683b2880.19cba7 2025-05-31 23:53:48.231 EEST DBMIRROR postgres@dynacom line:5 STATEMENT:  SELECT pd.XID,MAX(SeqId) FROM dbmirror_Pending pd LEFT JOIN dbmirror_MirroredTransaction mt INNER JOIN dbmirror_MirrorHost mh ON mt.MirrorHostId =  mh.MirrorHostId AND mh.HostName= '192.168.211.1'  ON pd.XID = mt.XID WHERE mt.XID is null and (pd.slaveid is null or pd.slaveid = '579')  GROUP BY pd.XID  ORDER BY MAX(pd.SeqId) 10.9.0.10(42615) [1690535] 683b2880.19cba7 2025-05-31 23:53:48.234 EEST DBMIRROR postgres@dynacom line:6 LOG:  disconnection: session time: 4:49:31.510 user=postgr es database=dynacom host=10.9.0.10 port=42615 --------------rnQjT051q6K8he20ARviwIyC Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: 8bit

On 31/5/25 23:43, Achilleas Mantzios wrote:

Hi

a query is stuck with the above, it seems it waits for parallel worker to finish, however , there are no parallel works running :

postgres@[local]/dynacom=# SELECT application_name, backend_type, backend_start,xact_start,query_start,wait_event_type, wait_event ,state FROM pg_stat_activity;
application_name |         backend_type         |         backend_start         |          xact_start           |          query_start          | wait_event_type |     wait_event      | state   
------------------+------------------------------+-------------------------------+-------------------------------+-------------------------------+-----------------+---------------------+--------
                 | autovacuum launcher          | 2024-11-29 17:48:50.92935+02  |                               |                               | Activity        | AutoVacuumMain      |  
                 | logical replication launcher | 2024-11-29 17:48:50.929496+02 |                               |                               | Activity        | LogicalLauncherMain |  
DBMIRROR         | client backend               | 2025-05-31 19:04:16.724305+03 | 2025-05-31 19:05:21.686093+03 | 2025-05-31 19:05:21.909936+03 | IPC             | ParallelFinish      | active
                 | client backend               | 2025-05-31 23:31:30.030806+03 |                               | 2025-05-31 23:35:05.045573+03 | Client          | ClientRead          | idle
psql             | client backend               | 2025-05-31 23:29:33.863485+03 | 2025-05-31 23:35:09.322972+03 | 2025-05-31 23:35:09.322972+03 |                 |                     | active
RXMLFVSLS        | client backend               | 2025-05-31 23:32:37.351131+03 |                               | 2025-05-31 23:35:09.295221+03 | Client          | ClientRead          | idle
psql             | client backend               | 2025-04-28 16:59:55.968442+03 |                               | 2025-05-27 16:43:56.338228+03 | Client          | ClientRead          | idle
                 | background writer            | 2024-11-29 17:48:50.916876+02 |                               |                               | Activity        | BgWriterMain        |  
                 | archiver                     | 2024-12-03 18:57:36.447067+02 |                               |                               | Activity        | ArchiverMain        |  
                 | checkpointer                 | 2024-11-29 17:48:50.916648+02 |                               |                               | Activity        | CheckpointerMain    |  
                 | walwriter                    | 2024-11-29 17:48:50.928789+02 |                               |                               | Activity        | WalWriterMain       |  
(11 rows)

postgres@[local]/dynacom=#

So, I will terminate this backend now to get the system working again, we are curious why this happened, our system serves daily 22M+ transactions, this is Saturday night hence the low traffic.

postgres@smadb:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 12 (bookworm)
Release:        12
Codename:       bookworm
postgres@smadb:~$ psql -Aqt -c 'select version()'
PostgreSQL 16.6 on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
postgres@smadb:~$


Some additional info :

1690535 : the pid in question

we found no trace or indication for OOM killer ,

root@smadb:~# strace -p 1690535      
strace: Process 1690535 attached

epoll_wait(12,

also lsof showed :

postgres  1690535 1690649 Sweeper           postgres    0r      CHR                1,3        0t0          4 /dev/null
postgres  1690535 1690649 Sweeper           postgres    1w     FIFO               0,14        0t0  213624178 pipe
postgres  1690535 1690649 Sweeper           postgres    2w     FIFO               0,14        0t0  213624178 pipe
postgres  1690535 1690649 Sweeper           postgres    3u  a_inode               0,15          0       1059 [signalfd]
postgres  1690535 1690649 Sweeper           postgres    4r     FIFO               0,14        0t0  213624177 pipe
postgres  1690535 1690649 Sweeper           postgres    5u  a_inode               0,15          0       1059 [eventpoll:3,4,11]
postgres  1690535 1690649 Sweeper           postgres    6u      REG               8,32       8192  157352475 /raid4/pgsql/data/PG_16_202
307071/207491653/2601
postgres  1690535 1690649 Sweeper           postgres    7u      REG               8,32     450560  157352631 /raid4/pgsql/data/PG_16_202
307071/207491653/207493206
postgres  1690535 1690649 Sweeper           postgres    8u      REG               8,32      40960  157356847 /raid4/pgsql/data/PG_16_202
307071/207491653/207503536
postgres  1690535 1690649 Sweeper           postgres    9u      REG               8,32      40960  157356848 /raid4/pgsql/data/PG_16_202
307071/207491653/207503538
postgres  1690535 1690649 Sweeper           postgres   10u      REG               8,32      40960  157357848 /raid4/pgsql/data/PG_16_202
307071/207491653/207504627
postgres  1690535 1690649 Sweeper           postgres   11u     IPv4         1241927029        0t0        TCP smadb.internal.net:postgres
ql->sma.internal.net:42615 (ESTABLISHED)

postgres  1690535 1690649 Sweeper           postgres   12u  a_inode               0,15          0       1059 [eventpoll:3,4]

So , as far as we understand, it waited for an inode ?

I tried to pg_terminate_backend(1690535); it did nothing to the process, then pg_cancel_backend(1690535);

I went to the shell and did a normal 

kill 1690535

it did nothing, and before i kill -9 it, I just tried kill -HUP, and it woke up with

10.9.0.10(42615) [1690535] 683b2880.19cba7 2025-05-31 23:53:48.231 EEST DBMIRROR postgres@dynacom line:4 FATAL:  terminating connection due to administrator command
10.9.0.10(42615) [1690535] 683b2880.19cba7 2025-05-31 23:53:48.231 EEST DBMIRROR postgres@dynacom line:5 STATEMENT:  SELECT pd.XID,MAX(SeqId) FROM dbmirror_Pending
pd LEFT JOIN dbmirror_MirroredTransaction mt INNER JOIN dbmirror_MirrorHost mh ON mt.MirrorHostId =  mh.MirrorHostId AND mh.HostName= '192.168.211.1'  ON pd.XID =
mt.XID WHERE mt.XID is null and (pd.slaveid is null or pd.slaveid = '579')  GROUP BY pd.XID  ORDER BY MAX(pd.SeqId)
10.9.0.10(42615) [1690535] 683b2880.19cba7 2025-05-31 23:53:48.234 EEST DBMIRROR postgres@dynacom line:6 LOG:  disconnection: session time: 4:49:31.510 user=postgr
es database=dynacom host=10.9.0.10 port=42615


--------------rnQjT051q6K8he20ARviwIyC--