Query Spins

public inbox for [email protected]  

3+ messages / 2 participants

* Query Spins

From: Murthy Nunna @ 2025-09-01 03:52 UTC
  To: [email protected] <[email protected]>

Hi,

PostgreSQL 16.10

I have a query which runs fine most of the time. When it runs fine, it spawns parallel workers. In pg_stat_activity, wait_event is blank, state is active and backend_type = "client backend" for the main query. For the parallel workers of this query I see wait_event = MessageQueueSend, state is active and backend_type = "parallel worker".

But sometimes it runs without parallel workers: wait_event is blank, state is active and backend_type = "client backend". In that case it never finishes and uses a lot of CPU. The sockets on both the database server and the client host are in ESTABLISHED state (netstat -tulpa | grep <client_port>).

I appreciate any help you can provide.


* Re: Query Spins

From: Laurenz Albe @ 2025-09-01 07:52 UTC
  To: Murthy Nunna <[email protected]>; [email protected] <[email protected]>

On Mon, 2025-09-01 at 03:52 +0000, Murthy Nunna wrote:
> PostgreSQL 16.10
>
> I have a query which runs fine most of the time. When it runs fine, it spawns
> parallel workers. In pg_stat_activity, wait_event is blank, state is active and
> backend_type = "client backend" for the main query. For the parallel workers of
> this query I see wait_event = MessageQueueSend, state is active and
> backend_type = "parallel worker".
>
> But sometimes it runs without parallel workers: wait_event is blank, state is
> active and backend_type = "client backend". In that case it never finishes and
> uses a lot of CPU. The sockets on both the database server and the client host
> are in ESTABLISHED state (netstat -tulpa | grep <client_port>).

Perhaps no workers are spawned because the "max_parallel_workers" limit has
already been reached by other backends.  Check the number of concurrent parallel
workers the next time the query hangs.

I cannot know the source of your performance problems, but perhaps it is
a combination of system overload and lack of available parallel worker
processes, which might well go together.

Yours,
Laurenz Albe
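
A minimal sketch of the check suggested in this reply, for anyone following along. It assumes psql can connect through the usual PG* environment variables; the function names are hypothetical, and the queries rely only on the standard pg_stat_activity and pg_settings views (leader_pid is available in PostgreSQL 13 and later).

```shell
# Sketch: list active parallel workers per leader, and the configured limits.
# Assumes psql connects via PGHOST/PGUSER/PGDATABASE or similar defaults.

# Prints "<leader_pid> <worker_count>" for every leader that currently
# has parallel workers attached.
parallel_workers_by_leader() {
    psql -X -At -F ' ' -c "
        SELECT leader_pid, count(*)
        FROM pg_stat_activity
        WHERE backend_type = 'parallel worker'
        GROUP BY leader_pid
        ORDER BY leader_pid;"
}

# Prints the settings that cap how many parallel workers can run.
parallel_worker_limits() {
    psql -X -At -F ' ' -c "
        SELECT name, setting
        FROM pg_settings
        WHERE name IN ('max_worker_processes',
                       'max_parallel_workers',
                       'max_parallel_workers_per_gather')
        ORDER BY name;"
}
```

Running both functions while the query is stuck shows whether the total worker count has reached max_parallel_workers and whether the stuck leader has any workers at all.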


* RE: Query Spins

From: Murthy Nunna @ 2025-09-05 01:47 UTC
  To: [email protected] <[email protected]>

More information:
1) At 5:30 pm, the query is fired 5 times from a client.
2) 4 of them finished. One got stuck (it spins, consuming CPU).
3) The underlying tables are pretty big and dynamic, but I have been running autovacuum and analyze aggressively at the table level.
4) max_parallel_workers and max_worker_processes are both set to 8.

I collected the following information shortly after 5:30 pm, once all 5 queries had been fired.

ps -aef | grep postgres | grep worker

postgres 2162504 1802555 99 17:30 ?        00:00:04 postgres: parallel worker for PID 2162501
postgres 2162505 1802555 99 17:30 ?        00:00:04 postgres: parallel worker for PID 2162501
postgres 2162506 1802555 99 17:30 ?        00:00:04 postgres: parallel worker for PID 2162502
postgres 2162507 1802555 99 17:30 ?        00:00:04 postgres: parallel worker for PID 2162502
postgres 2162508 1802555 99 17:30 ?        00:00:04 postgres: parallel worker for PID 2162503
postgres 2162509 1802555 99 17:30 ?        00:00:04 postgres: parallel worker for PID 2162503
postgres 2162827 1802555 99 17:30 ?        00:00:03 postgres: parallel worker for PID 2162703

It is almost certain that max_parallel_workers was exhausted. So what? The remaining query should not get stuck!
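
For reference, a small sketch that tallies workers per leader straight from ps output like the listing above. The function name is hypothetical; it assumes the process title format shown ("postgres: parallel worker for PID <n>"), where the leader PID is the last field.

```shell
# Reads "ps -ef" style lines on stdin and prints "<leader_pid> <worker_count>",
# sorted by leader PID. Lines that are not parallel workers are ignored.
count_workers_by_leader() {
    awk '/postgres: parallel worker for PID [0-9]+/ {
             workers[$NF]++          # $NF is the leader PID
         }
         END {
             for (pid in workers) print pid, workers[pid]
         }' | sort -n
}
```

Usage would be `ps -ef | count_workers_by_leader`. Applied to the listing above, it reports two workers each for leaders 2162501, 2162502 and 2162503, and one for 2162703.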

The leader PID of the stuck query is 2162502. Its parallel workers (2162506 and 2162507) are long gone; the following commands return nothing:
ps -aef | grep 2162506 | grep -v grep
ps -aef | grep 2162507 | grep -v grep

The following is from pg_stat_activity:

pid           | 2162502
client_port   | 37264
xact_start    | 2025-09-04 17:30
query_start   | 2025-09-04 17:30
state_change  | 2025-09-04 17:30
wait_event    | 
state         | active
now           | 2025-09-04 20:10
time_running  | 02:40:00.509246

The interesting part is the temp files: pgsql_tmp2162502.1 has stayed at 242155520 bytes for a long time, but its last-modified timestamp keeps changing.

ls -ltr base/pgsql_tmp/
total 277864
-rw-------. 1 postgres postgres  16359424 Sep  4 17:30 pgsql_tmp2162502.0
-rw-------. 1 postgres postgres 242155520 Sep  4 20:21 pgsql_tmp2162502.1
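
A rough sketch for recording that observation over time rather than re-running ls by hand. The function names are hypothetical and it relies on GNU stat (Linux). A constant size with an advancing modification time would be consistent with the backend rewriting the file in place instead of extending it, but that reading is an assumption and not something established in this thread.

```shell
# Prints "<size_bytes> <mtime_epoch>" for one file (GNU stat).
file_size_mtime() {
    stat -c '%s %Y' -- "$1"
}

# Samples a file repeatedly: watch_temp_file <file> [interval_s] [samples]
# Emits one "<size_bytes> <mtime_epoch>" line per sample.
watch_temp_file() {
    wtf_file=$1
    wtf_interval=${2:-5}
    wtf_samples=${3:-3}
    wtf_i=0
    while [ "$wtf_i" -lt "$wtf_samples" ]; do
        file_size_mtime "$wtf_file" || return 1
        wtf_i=$((wtf_i + 1))
        if [ "$wtf_i" -lt "$wtf_samples" ]; then
            sleep "$wtf_interval"
        fi
    done
    return 0
}
```

For example, `watch_temp_file base/pgsql_tmp/pgsql_tmp2162502.1 10 6` takes six samples ten seconds apart; if the first column stays fixed while the second keeps increasing, the file is being written without growing.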
