Re: Streaming replica hangs periodically for ~ 1 second

public inbox for [email protected]  
help / color / mirror / Atom feed

Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
27+ messages / 7 participants
[nested] [flat]

* Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
@ 2025-08-20 15:14  Adrian Klaver <[email protected]>
  0 siblings, 1 reply; 27+ messages in thread

From: Adrian Klaver @ 2025-08-20 15:14 UTC (permalink / raw)
  To: [email protected]; +Cc: PostgreSQL General <[email protected]>

On 8/20/25 04:32, hubert depesz lubaczewski wrote:
> On Tue, Aug 19, 2025 at 11:39:03AM -0700, Adrian Klaver wrote:
>>> Every now and then (usually every 3-5 minutes, but not through the whole
>>> day), we see situations where every query suddently takes ~ 1 second.
>> Given the subject line, what you are reporting is happening on the replica,
>> correct?
> 
> Yes.
> 
>> If so where is the replica relative to the primary, in terms of network
>> distance?
> 
> =$ ping -c 10 primary
> reports:
> 10 packets transmitted, 10 received, 0% packet loss, time 9181ms
> rtt min/avg/max/mdev = 0.942/0.956/0.991/0.012 ms
> 
>> Also what are the 'hardware' specifications on the replica instance?
> 
> c8g.48xlarge ec2 instance. It is arm64, 192 cores, with 384 gb of ram.
> 
> As for storage, this is relatitvely slow, because this db is rather
> small:
> gp3 500gb volume, with 6000 iops. At no point is IO in any way close to
> limits, the whole db fits easily in ram.

Hmm.

 From initial post:

"For ~ 1 second there are no logs going to log (we usually have at 5-20
messages logged per second), no connection, nothing. And then we get
bunch (30+) messages with the same milisecond time."

Are the 30+ messages all coming in on one connection or multiple 
connections?

Also to be clear these are statements that are being run on the replica 
locally, correct?

Does the AWS monitoring indicate any issues?

> 
> Best regards,
> 
> depesz
> 


-- 
Adrian Klaver
[email protected]






^ permalink  raw  reply  [nested|flat] 27+ messages in thread

* Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
@ 2025-08-20 16:08  hubert depesz lubaczewski <[email protected]>
  parent: Adrian Klaver <[email protected]>
  0 siblings, 3 replies; 27+ messages in thread

From: hubert depesz lubaczewski @ 2025-08-20 16:08 UTC (permalink / raw)
  To: Adrian Klaver <[email protected]>; +Cc: PostgreSQL General <[email protected]>

On Wed, Aug 20, 2025 at 08:14:47AM -0700, Adrian Klaver wrote:
> Hmm.
> 
> From initial post:
> 
> "For ~ 1 second there are no logs going to log (we usually have at 5-20
> messages logged per second), no connection, nothing. And then we get
> bunch (30+) messages with the same milisecond time."
> 
> Are the 30+ messages all coming in on one connection or multiple
> connections?

Multiple connections.

> Also to be clear these are statements that are being run on the replica
> locally, correct?

What do you mean locally?

Application servers are *not* on the same server as the db. So no, they
are not local.

> Does the AWS monitoring indicate any issues?

Nope. All looks "fine".

Same for our monitoring  - CPU usage, iops, load.

Best regards,

depesz







^ permalink  raw  reply  [nested|flat] 27+ messages in thread

* Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
@ 2025-08-20 17:24  Rob Sargent <[email protected]>
  parent: hubert depesz lubaczewski <[email protected]>
  2 siblings, 0 replies; 27+ messages in thread

From: Rob Sargent @ 2025-08-20 17:24 UTC (permalink / raw)
  To: [email protected]; +Cc: Adrian Klaver <[email protected]>; PostgreSQL General <[email protected]>



> On Aug 20, 2025, at 10:08 AM, hubert depesz lubaczewski <[email protected]> wrote:
> 
> On Wed, Aug 20, 2025 at 08:14:47AM -0700, Adrian Klaver wrote:
>> Hmm.
>> 
>> From initial post:
>> 
>> "For ~ 1 second there are no logs going to log (we usually have at 5-20
>> messages logged per second), no connection, nothing. And then we get
>> bunch (30+) messages with the same milisecond time."

Is that logger ms or db server ms?  The latter seems unlikely to me. 







^ permalink  raw  reply  [nested|flat] 27+ messages in thread

* Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
@ 2025-08-20 17:30  Thom Brown <[email protected]>
  parent: hubert depesz lubaczewski <[email protected]>
  2 siblings, 1 reply; 27+ messages in thread

From: Thom Brown @ 2025-08-20 17:30 UTC (permalink / raw)
  To: [email protected]; +Cc: Adrian Klaver <[email protected]>; PostgreSQL General <[email protected]>

On Wed, 20 Aug 2025 at 17:08, hubert depesz lubaczewski
<[email protected]> wrote:
>
> On Wed, Aug 20, 2025 at 08:14:47AM -0700, Adrian Klaver wrote:
> > Hmm.
> >
> > From initial post:
> >
> > "For ~ 1 second there are no logs going to log (we usually have at 5-20
> > messages logged per second), no connection, nothing. And then we get
> > bunch (30+) messages with the same milisecond time."
> >
> > Are the 30+ messages all coming in on one connection or multiple
> > connections?
>
> Multiple connections.
>
> > Also to be clear these are statements that are being run on the replica
> > locally, correct?
>
> What do you mean locally?
>
> Application servers are *not* on the same server as the db. So no, they
> are not local.
>
> > Does the AWS monitoring indicate any issues?
>
> Nope. All looks "fine".
>
> Same for our monitoring  - CPU usage, iops, load.
>
> Best regards,
>
> depesz

Do you have THP enabled? Can you use mpstat and see what %steal shows as?

Thom






^ permalink  raw  reply  [nested|flat] 27+ messages in thread

* Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
@ 2025-08-20 17:45  Adrian Klaver <[email protected]>
  parent: hubert depesz lubaczewski <[email protected]>
  2 siblings, 1 reply; 27+ messages in thread

From: Adrian Klaver @ 2025-08-20 17:45 UTC (permalink / raw)
  To: [email protected]; +Cc: PostgreSQL General <[email protected]>

On 8/20/25 09:08, hubert depesz lubaczewski wrote:
> On Wed, Aug 20, 2025 at 08:14:47AM -0700, Adrian Klaver wrote:
>> Hmm.
>>
>>  From initial post:
>>
>> "For ~ 1 second there are no logs going to log (we usually have at 5-20
>> messages logged per second), no connection, nothing. And then we get
>> bunch (30+) messages with the same milisecond time."
>>
>> Are the 30+ messages all coming in on one connection or multiple
>> connections?
> 
> Multiple connections.
> 
>> Also to be clear these are statements that are being run on the replica
>> locally, correct?
> 
> What do you mean locally?

I should have been clearer. Are the queries being run against the 
replica or the primary?

> 
> Application servers are *not* on the same server as the db. So no, they
> are not local.

Where are the application servers relative to the replica server?

How many applications servers are hitting the database?

> 
>> Does the AWS monitoring indicate any issues?
> 
> Nope. All looks "fine".


> 
> Same for our monitoring  - CPU usage, iops, load.
> 
> Best regards,
> 
> depesz
> 


-- 
Adrian Klaver
[email protected]






^ permalink  raw  reply  [nested|flat] 27+ messages in thread

* Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
@ 2025-08-21 10:03  hubert depesz lubaczewski <[email protected]>
  parent: Thom Brown <[email protected]>
  0 siblings, 1 reply; 27+ messages in thread

From: hubert depesz lubaczewski @ 2025-08-21 10:03 UTC (permalink / raw)
  To: Thom Brown <[email protected]>; +Cc: Adrian Klaver <[email protected]>; PostgreSQL General <[email protected]>

On Wed, Aug 20, 2025 at 06:30:00PM +0100, Thom Brown wrote:
> Do you have THP enabled? Can you use mpstat and see what %steal shows as?

Sorry, what is "THP"? I tried searching for "what is thp", and most
common search results are related to some chemical compound.

mpstat, yes, we have it. Let me quickly show what I get:

Two days ago, at ~ 10:10pm UTC I saw this in Pg logs:

# grep -oP '^2025-08-19 22:09:2\d\.\d+ UTC' postgresql-2025-08-19_220000.csv | uniq -c | grep -C3 -P '^\s*\d\d'
      2 2025-08-19 22:09:29.084 UTC
      1 2025-08-19 22:09:29.094 UTC
      2 2025-08-19 22:09:29.097 UTC
     70 2025-08-19 22:09:29.109 UTC
     90 2025-08-19 22:09:29.110 UTC
      6 2025-08-19 22:09:29.111 UTC
      1 2025-08-19 22:09:29.153 UTC
      1 2025-08-19 22:09:29.555 UTC

As you can see we have 70, and then 90 messages all logged with the same
timestamp. All of them (160) were "duration:x" lines.

At the time, mpstat (called using `mpstat 5`) looked like this:

             CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
22:09:04     all    2.25    0.00    0.27    0.24    0.00    0.10    0.00    0.00    0.00   97.14
22:09:09     all    2.70    0.00    0.28    0.27    0.00    0.12    0.00    0.00    0.00   96.63
22:09:14     all    2.83    0.00    0.63    0.23    0.00    0.09    0.00    0.00    0.00   96.22
22:09:19     all    2.66    0.00    0.28    0.23    0.00    0.10    0.00    0.00    0.00   96.74
22:09:24     all    2.36    0.00    0.27    0.21    0.00    0.10    0.00    0.00    0.00   97.06
22:09:29     all    2.49    0.00    0.40    0.40    0.00    0.11    0.00    0.00    0.00   96.60
22:09:34     all    2.65    0.00    0.28    0.23    0.00    0.11    0.00    0.00    0.00   96.73
22:09:39     all    2.26    0.00    0.26    0.21    0.00    0.09    0.00    0.00    0.00   97.19
22:09:44     all    2.31    0.00    0.25    0.22    0.00    0.09    0.00    0.00    0.00   97.13
22:09:49     all    2.44    0.00    0.22    0.24    0.00    0.08    0.00    0.00    0.00   97.02
22:09:54     all    2.23    0.00    0.23    0.28    0.00    0.10    0.00    0.00    0.00   97.16
22:09:59     all    2.48    0.00    0.26    0.30    0.00    0.11    0.00    0.00    0.00   96.85
22:10:04     all    2.29    0.00    0.28    0.26    0.00    0.12    0.00    0.00    0.00   97.06
22:10:09     all    2.31    0.00    0.22    0.21    0.00    0.11    0.00    0.00    0.00   97.14
22:10:14     all    2.38    0.00    0.25    0.21    0.00    0.10    0.00    0.00    0.00   97.05
22:10:19     all    2.54    0.00    0.24    0.20    0.00    0.11    0.00    0.00    0.00   96.91
22:10:24     all    2.26    0.00    0.25    0.22    0.00    0.10    0.00    0.00    0.00   97.17
22:10:29     all    2.12    0.00    0.27    0.24    0.00    0.09    0.00    0.00    0.00   97.27
22:10:34     all    2.56    0.00    0.26    0.26    0.00    0.12    0.00    0.00    0.00   96.80
22:10:39     all    2.16    0.00    0.23    0.24    0.00    0.10    0.00    0.00    0.00   97.27
22:10:44     all    2.37    0.00    0.26    0.22    0.00    0.10    0.00    0.00    0.00   97.05
22:10:49     all    2.25    0.00    0.25    0.61    0.00    0.09    0.00    0.00    0.00   96.80
22:10:54     all    2.41    0.00    0.28    0.22    0.00    0.10    0.00    0.00    0.00   96.99
22:10:59     all    2.83    0.00    0.29    0.19    0.00    0.12    0.00    0.00    0.00   96.57

Best regards,

depesz







^ permalink  raw  reply  [nested|flat] 27+ messages in thread

* Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
@ 2025-08-21 10:07  hubert depesz lubaczewski <[email protected]>
  parent: Adrian Klaver <[email protected]>
  0 siblings, 1 reply; 27+ messages in thread

From: hubert depesz lubaczewski @ 2025-08-21 10:07 UTC (permalink / raw)
  To: Adrian Klaver <[email protected]>; +Cc: PostgreSQL General <[email protected]>

On Wed, Aug 20, 2025 at 10:45:13AM -0700, Adrian Klaver wrote:
> On 8/20/25 09:08, hubert depesz lubaczewski wrote:
> > On Wed, Aug 20, 2025 at 08:14:47AM -0700, Adrian Klaver wrote:
> > > Hmm.
> > > 
> > >  From initial post:
> > > 
> > > "For ~ 1 second there are no logs going to log (we usually have at 5-20
> > > messages logged per second), no connection, nothing. And then we get
> > > bunch (30+) messages with the same milisecond time."
> > > Are the 30+ messages all coming in on one connection or multiple
> > > connections?
> > Multiple connections.
> > > Also to be clear these are statements that are being run on the replica
> > > locally, correct?
> > What do you mean locally?
> I should have been clearer. Are the queries being run against the replica or
> the primary?

All to replica. Primary has its own work, of course, but the problem
we're experiencing is on replicas.

> > Application servers are *not* on the same server as the db. So no, they
> > are not local.
> Where are the application servers relative to the replica server?

Well, there is a lot of them. All are in AWS EC2, same region, various
availability zones. Generally we use 3 AZs, with more or less equal
split, so I'd say ~ 33% of app servers is within the same AZ.

> How many applications servers are hitting the database?

To be honest, I'm not sure. I have visibility into dbs, and bouncers,
not really into Apps. I know that these are automatically dynamically
scaled, so number of app server is very varying.

I'd say anything from 40 to 200 app servers hit first layer of bouncers,
which we usually have 6-9 (2-3 per az).

These go to 2nd layer of bouncers, on the db server itself.

Best regards,

depesz

^ permalink  raw  reply  [nested|flat] 27+ messages in thread

* Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
@ 2025-08-21 11:41  Thom Brown <[email protected]>
  parent: hubert depesz lubaczewski <[email protected]>
  0 siblings, 1 reply; 27+ messages in thread

From: Thom Brown @ 2025-08-21 11:41 UTC (permalink / raw)
  To: [email protected]; +Cc: Adrian Klaver <[email protected]>; PostgreSQL General <[email protected]>

On Thu, 21 Aug 2025 at 11:03, hubert depesz lubaczewski
<[email protected]> wrote:
>
> On Wed, Aug 20, 2025 at 06:30:00PM +0100, Thom Brown wrote:
> > Do you have THP enabled? Can you use mpstat and see what %steal shows as?
>
> Sorry, what is "THP"? I tried searching for "what is thp", and most
> common search results are related to some chemical compound.

Ah, yeah I meant transparent hugepage:

cat /sys/kernel/mm/transparent_hugepage/enabled

This should show it being set as "never".

>
> mpstat, yes, we have it. Let me quickly show what I get:
>
> Two days ago, at ~ 10:10pm UTC I saw this in Pg logs:
>
> # grep -oP '^2025-08-19 22:09:2\d\.\d+ UTC' postgresql-2025-08-19_220000.csv | uniq -c | grep -C3 -P '^\s*\d\d'
>       2 2025-08-19 22:09:29.084 UTC
>       1 2025-08-19 22:09:29.094 UTC
>       2 2025-08-19 22:09:29.097 UTC
>      70 2025-08-19 22:09:29.109 UTC
>      90 2025-08-19 22:09:29.110 UTC
>       6 2025-08-19 22:09:29.111 UTC
>       1 2025-08-19 22:09:29.153 UTC
>       1 2025-08-19 22:09:29.555 UTC
>
> As you can see we have 70, and then 90 messages all logged with the same
> timestamp. All of them (160) were "duration:x" lines.
>
> At the time, mpstat (called using `mpstat 5`) looked like this:
>
>              CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
> 22:09:04     all    2.25    0.00    0.27    0.24    0.00    0.10    0.00    0.00    0.00   97.14
> 22:09:09     all    2.70    0.00    0.28    0.27    0.00    0.12    0.00    0.00    0.00   96.63
> 22:09:14     all    2.83    0.00    0.63    0.23    0.00    0.09    0.00    0.00    0.00   96.22
> 22:09:19     all    2.66    0.00    0.28    0.23    0.00    0.10    0.00    0.00    0.00   96.74
> 22:09:24     all    2.36    0.00    0.27    0.21    0.00    0.10    0.00    0.00    0.00   97.06
> 22:09:29     all    2.49    0.00    0.40    0.40    0.00    0.11    0.00    0.00    0.00   96.60
> 22:09:34     all    2.65    0.00    0.28    0.23    0.00    0.11    0.00    0.00    0.00   96.73
> 22:09:39     all    2.26    0.00    0.26    0.21    0.00    0.09    0.00    0.00    0.00   97.19
> 22:09:44     all    2.31    0.00    0.25    0.22    0.00    0.09    0.00    0.00    0.00   97.13
> 22:09:49     all    2.44    0.00    0.22    0.24    0.00    0.08    0.00    0.00    0.00   97.02
> 22:09:54     all    2.23    0.00    0.23    0.28    0.00    0.10    0.00    0.00    0.00   97.16
> 22:09:59     all    2.48    0.00    0.26    0.30    0.00    0.11    0.00    0.00    0.00   96.85
> 22:10:04     all    2.29    0.00    0.28    0.26    0.00    0.12    0.00    0.00    0.00   97.06
> 22:10:09     all    2.31    0.00    0.22    0.21    0.00    0.11    0.00    0.00    0.00   97.14
> 22:10:14     all    2.38    0.00    0.25    0.21    0.00    0.10    0.00    0.00    0.00   97.05
> 22:10:19     all    2.54    0.00    0.24    0.20    0.00    0.11    0.00    0.00    0.00   96.91
> 22:10:24     all    2.26    0.00    0.25    0.22    0.00    0.10    0.00    0.00    0.00   97.17
> 22:10:29     all    2.12    0.00    0.27    0.24    0.00    0.09    0.00    0.00    0.00   97.27
> 22:10:34     all    2.56    0.00    0.26    0.26    0.00    0.12    0.00    0.00    0.00   96.80
> 22:10:39     all    2.16    0.00    0.23    0.24    0.00    0.10    0.00    0.00    0.00   97.27
> 22:10:44     all    2.37    0.00    0.26    0.22    0.00    0.10    0.00    0.00    0.00   97.05
> 22:10:49     all    2.25    0.00    0.25    0.61    0.00    0.09    0.00    0.00    0.00   96.80
> 22:10:54     all    2.41    0.00    0.28    0.22    0.00    0.10    0.00    0.00    0.00   96.99
> 22:10:59     all    2.83    0.00    0.29    0.19    0.00    0.12    0.00    0.00    0.00   96.57

This output looks fine, so it doesn't show anything concerning, so
suggests the issue is somehow on the Postgres side.

Did you happen to poll pg_stat_activity at the time to see whether you
had lots of IPC waits? I'm wondering whether the storage layer is
freezing up for a moment.

Thom






^ permalink  raw  reply  [nested|flat] 27+ messages in thread

* Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
@ 2025-08-21 12:03  hubert depesz lubaczewski <[email protected]>
  parent: Thom Brown <[email protected]>
  0 siblings, 0 replies; 27+ messages in thread

From: hubert depesz lubaczewski @ 2025-08-21 12:03 UTC (permalink / raw)
  To: Thom Brown <[email protected]>; +Cc: Adrian Klaver <[email protected]>; PostgreSQL General <[email protected]>

On Thu, Aug 21, 2025 at 12:41:44PM +0100, Thom Brown wrote:
> Ah, yeah I meant transparent hugepage:
> cat /sys/kernel/mm/transparent_hugepage/enabled
> This should show it being set as "never".

Ah. Sorry, couldn't decipher. Yes, it's "never".

> > # grep -oP '^2025-08-19 22:09:2\d\.\d+ UTC' postgresql-2025-08-19_220000.csv | uniq -c | grep -C3 -P '^\s*\d\d'
> >       2 2025-08-19 22:09:29.084 UTC
> >       1 2025-08-19 22:09:29.094 UTC
> >       2 2025-08-19 22:09:29.097 UTC
> >      70 2025-08-19 22:09:29.109 UTC
> >      90 2025-08-19 22:09:29.110 UTC
> >       6 2025-08-19 22:09:29.111 UTC
> >       1 2025-08-19 22:09:29.153 UTC
> >       1 2025-08-19 22:09:29.555 UTC
…
> > 22:10:54     all    2.41    0.00    0.28    0.22    0.00    0.10    0.00    0.00    0.00   96.99
> > 22:10:59     all    2.83    0.00    0.29    0.19    0.00    0.12    0.00    0.00    0.00   96.57
> 
> This output looks fine, so it doesn't show anything concerning, so
> suggests the issue is somehow on the Postgres side.
> 
> Did you happen to poll pg_stat_activity at the time to see whether you
> had lots of IPC waits? I'm wondering whether the storage layer is
> freezing up for a moment.

So, we get select * from pg_stat_activity, for client backends that are
not idle, every 29 seconds.
So, 1 second "freeze" is impossible to cathc.

Plus - I suspect that if I ran select * from pg_stat_activity while "in
freeze", it would also get frozen.

Anyway, I have data from 22:09:22 and 22:09:51. In both cases only
4 non-idle backend.

6 of them had NULL in wait_event*

one was Client/ClientRead and one was IPC/BgWorkerShutdown.

State_change for the IPC/BgWorkerShutdown backend was 2025-08-19
22:09:51.79504+00 so it was well past the moment when the problem
struck.

Best regards,

depesz







^ permalink  raw  reply  [nested|flat] 27+ messages in thread

* Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
@ 2025-08-21 15:04  Adrian Klaver <[email protected]>
  parent: hubert depesz lubaczewski <[email protected]>
  0 siblings, 1 reply; 27+ messages in thread

From: Adrian Klaver @ 2025-08-21 15:04 UTC (permalink / raw)
  To: [email protected]; +Cc: PostgreSQL General <[email protected]>

On 8/21/25 03:07, hubert depesz lubaczewski wrote:
> On Wed, Aug 20, 2025 at 10:45:13AM -0700, Adrian Klaver wrote:
>> On 8/20/25 09:08, hubert depesz lubaczewski wrote:
>>> On Wed, Aug 20, 2025 at 08:14:47AM -0700, Adrian Klaver wrote:
>>>> Hmm.
>>>>
>>>>   From initial post:
>>>>
>>>> "For ~ 1 second there are no logs going to log (we usually have at 5-20
>>>> messages logged per second), no connection, nothing. And then we get
>>>> bunch (30+) messages with the same milisecond time."
>>>> Are the 30+ messages all coming in on one connection or multiple
>>>> connections?
>>> Multiple connections.
>>>> Also to be clear these are statements that are being run on the replica
>>>> locally, correct?
>>> What do you mean locally?
>> I should have been clearer. Are the queries being run against the replica or
>> the primary?
> 
> All to replica. Primary has its own work, of course, but the problem
> we're experiencing is on replicas.

If I am following there is more then one primary --> replica pair and 
the problem exists across all the pairs.

>> How many applications servers are hitting the database?
> 
> To be honest, I'm not sure. I have visibility into dbs, and bouncers,
> not really into Apps. I know that these are automatically dynamically
> scaled, so number of app server is very varying.
> 
> I'd say anything from 40 to 200 app servers hit first layer of bouncers,
> which we usually have 6-9 (2-3 per az).
> 
> These go to 2nd layer of bouncers, on the db server itself.

By bouncer I assume you mean something like pgBouncer, a connection pooler.

Is it possible to determine what bouncer the queries in question are 
coming from?

> 
> Best regards,
> 
> depesz
> 


-- 
Adrian Klaver
[email protected]






^ permalink  raw  reply  [nested|flat] 27+ messages in thread

* Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
@ 2025-08-21 15:13  hubert depesz lubaczewski <[email protected]>
  parent: Adrian Klaver <[email protected]>
  0 siblings, 1 reply; 27+ messages in thread

From: hubert depesz lubaczewski @ 2025-08-21 15:13 UTC (permalink / raw)
  To: Adrian Klaver <[email protected]>; +Cc: PostgreSQL General <[email protected]>

On Thu, Aug 21, 2025 at 08:04:25AM -0700, Adrian Klaver wrote:
> > > > > "For ~ 1 second there are no logs going to log (we usually have at 5-20
> > > > > messages logged per second), no connection, nothing. And then we get
> > > > > bunch (30+) messages with the same milisecond time."
> > > > > Are the 30+ messages all coming in on one connection or multiple
> > > > > connections?
> > > > Multiple connections.
> > > > > Also to be clear these are statements that are being run on the replica
> > > > > locally, correct?
> > > > What do you mean locally?
> > > I should have been clearer. Are the queries being run against the replica or
> > > the primary?
> > All to replica. Primary has its own work, of course, but the problem
> > we're experiencing is on replicas.
> 
> If I am following there is more then one primary --> replica pair and the
> problem exists across all the pairs.

Not all. We have ~ 300 such clusters. The thing doesn't cause any
customer-visible issues (after all it's just 1 second delay every so
often), so it's generally overlooked when it happens.

But we were paying closer attention to one such cluster, and then couple
of other, and we've seen this behavior.

> > > How many applications servers are hitting the database?
> > 
> > To be honest, I'm not sure. I have visibility into dbs, and bouncers,
> > not really into Apps. I know that these are automatically dynamically
> > scaled, so number of app server is very varying.
> > 
> > I'd say anything from 40 to 200 app servers hit first layer of bouncers,
> > which we usually have 6-9 (2-3 per az).
> > 
> > These go to 2nd layer of bouncers, on the db server itself.
> 
> By bouncer I assume you mean something like pgBouncer, a connection pooler.
> Is it possible to determine what bouncer the queries in question are coming
> from?

From the POV of db, all queries are coming from one of N localhost
bouncers. N is usually 2…6.
From the POV of the local bouncers, the queries come from range of
remote bouncers.

Generally we haven't seen any correlation between queries coming from
specific ranges of ips. Logged queries, the ones that we see with
runtime of 1s, have comments that indicate source, and they some from
"all-around". Specifically, "DISCARD ALL" queries are generated by
bouncers themselves (both layers).

Just so that it will be clear, I don't expect anyone to be able to
diagnose the problem based on description. I'm looking more into idea
what to look for. The issue is that with the situation being pretty
short, and happening on servers with non-trivial query load, I can't do
stuff, like, for example, strace, stuff.

Best regards,

depesz

^ permalink  raw  reply  [nested|flat] 27+ messages in thread

* Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
@ 2025-08-21 15:59  Adrian Klaver <[email protected]>
  parent: hubert depesz lubaczewski <[email protected]>
  0 siblings, 1 reply; 27+ messages in thread

From: Adrian Klaver @ 2025-08-21 15:59 UTC (permalink / raw)
  To: [email protected]; +Cc: PostgreSQL General <[email protected]>

On 8/21/25 08:13, hubert depesz lubaczewski wrote:
> On Thu, Aug 21, 2025 at 08:04:25AM -0700, Adrian Klaver wrote:

>> By bouncer I assume you mean something like pgBouncer, a connection pooler.
>> Is it possible to determine what bouncer the queries in question are coming
>> from?
> 
>  From the POV of db, all queries are coming from one of N localhost
> bouncers. N is usually 2…6.
>  From the POV of the local bouncers, the queries come from range of
> remote bouncers.
> 
> Generally we haven't seen any correlation between queries coming from
> specific ranges of ips. Logged queries, the ones that we see with
> runtime of 1s, have comments that indicate source, and they some from
> "all-around". Specifically, "DISCARD ALL" queries are generated by
> bouncers themselves (both layers).

Well so much for that theory:)

> 
> Just so that it will be clear, I don't expect anyone to be able to
> diagnose the problem based on description. I'm looking more into idea
> what to look for. The issue is that with the situation being pretty
> short, and happening on servers with non-trivial query load, I can't do
> stuff, like, for example, strace, stuff.

Getting to the bottom of the bag of ideas:

Have you looked at the OS system log for the time period involved?

You mentioned this seemed to involve PREPARE and DISCARD ALL.
Is this the same set of statements or is it all over the place?

Also it would be helpful to know what bouncer you are actually using and 
what mode you are running in?


> 
> Best regards,
> 
> depesz
> 


-- 
Adrian Klaver
[email protected]






^ permalink  raw  reply  [nested|flat] 27+ messages in thread

* Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
@ 2025-08-21 16:51  hubert depesz lubaczewski <[email protected]>
  parent: Adrian Klaver <[email protected]>
  0 siblings, 1 reply; 27+ messages in thread

From: hubert depesz lubaczewski @ 2025-08-21 16:51 UTC (permalink / raw)
  To: Adrian Klaver <[email protected]>; +Cc: PostgreSQL General <[email protected]>

On Thu, Aug 21, 2025 at 08:59:03AM -0700, Adrian Klaver wrote:
> Getting to the bottom of the bag of ideas:
> Have you looked at the OS system log for the time period involved?

Yes. Mostly dmesg. Nothing interesting logged around the time.

> You mentioned this seemed to involve PREPARE and DISCARD ALL.
> Is this the same set of statements or is it all over the place?

No. From what I can tell it's random sample.

> Also it would be helpful to know what bouncer you are actually using and
> what mode you are running in?

pgBouncer, version 1.23.1. As for more... mostly transaction pooling.
Applications go using transaction pooling, but people (dbas, ops) have
session pooling.

Best regards,

depesz







^ permalink  raw  reply  [nested|flat] 27+ messages in thread

* Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
@ 2025-08-21 18:17  Adrian Klaver <[email protected]>
  parent: hubert depesz lubaczewski <[email protected]>
  0 siblings, 2 replies; 27+ messages in thread

From: Adrian Klaver @ 2025-08-21 18:17 UTC (permalink / raw)
  To: [email protected]; +Cc: PostgreSQL General <[email protected]>

On 8/21/25 09:51, hubert depesz lubaczewski wrote:
> On Thu, Aug 21, 2025 at 08:59:03AM -0700, Adrian Klaver wrote:
>> Getting to the bottom of the bag of ideas:
>> Have you looked at the OS system log for the time period involved?
> 
> Yes. Mostly dmesg. Nothing interesting logged around the time.
> 
>> You mentioned this seemed to involve PREPARE and DISCARD ALL.
>> Is this the same set of statements or is it all over the place?
> 
> No. From what I can tell it's random sample.
> 
>> Also it would be helpful to know what bouncer you are actually using and
>> what mode you are running in?
> 
> pgBouncer, version 1.23.1. As for more... mostly transaction pooling.
> Applications go using transaction pooling, but people (dbas, ops) have
> session pooling.

Have you looked at?:

https://www.pgbouncer.org/changelog.html#pgbouncer-124x

To see if anything stands out.

Then there is:

https://www.pgbouncer.org/config.html#max_prepared_statements

The below may also be worth looking at:

https://github.com/pgbouncer/pgbouncer/pull/1144

I can't help thinking that there is a caching issue at stake, though 
that is just a guess.


> 
> Best regards,
> 
> depesz
> 


-- 
Adrian Klaver
[email protected]






^ permalink  raw  reply  [nested|flat] 27+ messages in thread

* Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
@ 2025-08-21 18:38  Chris Wilson <[email protected]>
  parent: Adrian Klaver <[email protected]>
  1 sibling, 1 reply; 27+ messages in thread

From: Chris Wilson @ 2025-08-21 18:38 UTC (permalink / raw)
  To: Adrian Klaver <[email protected]>; +Cc: [email protected]; PostgreSQL General <[email protected]>

If all your queries are coming through pgBouncer, and only those hang (the
server itself responds if you connect directly to it), then it might be
this pgBouncer issue:

https://github.com/pgbouncer/pgbouncer/issues/1054

Although that issue is now "closed", because the invisible "debug" log
message was upgraded to a warning (and I don't think that change is in any
released version), the underlying problem still exists: pgbouncer hangs
completely (stops forwarding packets) for a while if the PAM
authentication queue becomes full.

If you have a relatively slow PAM service (such as pam_ldap) then you can
trigger it by opening ~100 connections to pgBouncer simultaneously (without
waiting for previous ones to authenticate), something like this:

for i in `seq 1 100`; do psql -h pgbouncer -p 6432 -U user db_name -c
"SELECT 1" & done


Thanks, Chris.

On Thu, 21 Aug 2025 at 19:17, Adrian Klaver <[email protected]>
wrote:

> On 8/21/25 09:51, hubert depesz lubaczewski wrote:
> > On Thu, Aug 21, 2025 at 08:59:03AM -0700, Adrian Klaver wrote:
> >> Getting to the bottom of the bag of ideas:
> >> Have you looked at the OS system log for the time period involved?
> >
> > Yes. Mostly dmesg. Nothing interesting logged around the time.
> >
> >> You mentioned this seemed to involve PREPARE and DISCARD ALL.
> >> Is this the same set of statements or is it all over the place?
> >
> > No. From what I can tell it's random sample.
> >
> >> Also it would be helpful to know what bouncer you are actually using and
> >> what mode you are running in?
> >
> > pgBouncer, version 1.23.1. As for more... mostly transaction pooling.
> > Applications go using transaction pooling, but people (dbas, ops) have
> > session pooling.
>
> Have you looked at?:
>
> https://www.pgbouncer.org/changelog.html#pgbouncer-124x
>
> To see if anything stands out.
>
> Then there is:
>
> https://www.pgbouncer.org/config.html#max_prepared_statements
>
> The below may also be worth looking at:
>
> https://github.com/pgbouncer/pgbouncer/pull/1144
>
> I can't help thinking that there is a caching issue at stake, though
> that is just a guess.
>
>
> >
> > Best regards,
> >
> > depesz
> >
>
>
> --
> Adrian Klaver
> [email protected]
>
>
>


^ permalink  raw  reply  [nested|flat] 27+ messages in thread

* Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
@ 2025-08-22 12:37  hubert depesz lubaczewski <[email protected]>
  parent: Adrian Klaver <[email protected]>
  1 sibling, 1 reply; 27+ messages in thread

From: hubert depesz lubaczewski @ 2025-08-22 12:37 UTC (permalink / raw)
  To: Adrian Klaver <[email protected]>; +Cc: PostgreSQL General <[email protected]>

On Thu, Aug 21, 2025 at 11:17:27AM -0700, Adrian Klaver wrote:
> Have you looked at?:
> https://www.pgbouncer.org/changelog.html#pgbouncer-124x
> To see if anything stands out.
> Then there is:
> https://www.pgbouncer.org/config.html#max_prepared_statements
> The below may also be worth looking at:
> https://github.com/pgbouncer/pgbouncer/pull/1144
> I can't help thinking that there is a caching issue at stake, though that is
> just a guess.

Will check, but we don't use prepared statements, so this doesn't seem
to be part-of-the-problem.

Best regards,

depesz







^ permalink  raw  reply  [nested|flat] 27+ messages in thread

* Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
@ 2025-08-22 12:39  hubert depesz lubaczewski <[email protected]>
  parent: Chris Wilson <[email protected]>
  0 siblings, 0 replies; 27+ messages in thread

From: hubert depesz lubaczewski @ 2025-08-22 12:39 UTC (permalink / raw)
  To: Chris Wilson <[email protected]>; +Cc: Adrian Klaver <[email protected]>; PostgreSQL General <[email protected]>

On Thu, Aug 21, 2025 at 07:38:34PM +0100, Chris Wilson wrote:
> If all your queries are coming through pgBouncer, and only those hang (the
> server itself responds if you connect directly to it), then it might be
> this pgBouncer issue:
> 
> https://github.com/pgbouncer/pgbouncer/issues/1054
> 
> Although that issue is now "closed", because the invisible "debug" log
> message was upgraded to a warning (and I don't think that change is in any
> released version), the underlying problem still exists: pgbouncer hangs
> completely (stops forwarding packets) for a while if the PAM
> authentication queue becomes full.
> 
> If you have a relatively slow PAM service (such as pam_ldap) then you can
> trigger it by opening ~100 connections to pgBouncer simultaneously (without
> waiting for previous ones to authenticate), something like this:
> 
> for i in `seq 1 100`; do psql -h pgbouncer -p 6432 -U user db_name -c
> "SELECT 1" & done

Please note that during the time of problem *NOTHING* seems to be
happening in Pg. There are no messages in logs about anything.

We don't have direct access to db, so there is no sane way to check if
directly sent queries will work.

I can, of course, start shell, and run queries there, but ephemeral
nature of the problem makes it more difficult.

As for authentication - there is no ldap/pam in place. All auth is based
on certs/passowrds-in-file, and we never noticed authentication
slowdowns.

Plus - queries like 'DISCARD ALL' are not ran because someone connected…

Best regards,

depesz








^ permalink  raw  reply  [nested|flat] 27+ messages in thread

* Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
@ 2025-08-22 14:33  Adrian Klaver <[email protected]>
  parent: hubert depesz lubaczewski <[email protected]>
  0 siblings, 1 reply; 27+ messages in thread

From: Adrian Klaver @ 2025-08-22 14:33 UTC (permalink / raw)
  To: [email protected]; +Cc: PostgreSQL General <[email protected]>

On 8/22/25 05:37, hubert depesz lubaczewski wrote:
> On Thu, Aug 21, 2025 at 11:17:27AM -0700, Adrian Klaver wrote:
>> Have you looked at?:
>> https://www.pgbouncer.org/changelog.html#pgbouncer-124x
>> To see if anything stands out.
>> Then there is:
>> https://www.pgbouncer.org/config.html#max_prepared_statements
>> The below may also be worth looking at:
>> https://github.com/pgbouncer/pgbouncer/pull/1144
>> I can't help thinking that there is a caching issue at stake, though that is
>> just a guess.
> 
> Will check, but we don't use prepared statements, so this doesn't seem
> to be part-of-the-problem.

In your OP there is:

"Every now and then (usually every 3-5 minutes, but not through the whole
day), we see situations where every query suddently takes ~ 1 second.


I'm talkign about "prepare" for preparing statements. binds. also
"discard all".
"

Is that referring to something else?

> 
> Best regards,
> 
> depesz
> 


-- 
Adrian Klaver
[email protected]






^ permalink  raw  reply  [nested|flat] 27+ messages in thread

* Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
@ 2025-08-22 15:03  hubert depesz lubaczewski <[email protected]>
  parent: Adrian Klaver <[email protected]>
  0 siblings, 2 replies; 27+ messages in thread

From: hubert depesz lubaczewski @ 2025-08-22 15:03 UTC (permalink / raw)
  To: Adrian Klaver <[email protected]>; +Cc: PostgreSQL General <[email protected]>; Chris Wilson <[email protected]>

On Fri, Aug 22, 2025 at 07:33:03AM -0700, Adrian Klaver wrote:
> On 8/22/25 05:37, hubert depesz lubaczewski wrote:
> > On Thu, Aug 21, 2025 at 11:17:27AM -0700, Adrian Klaver wrote:
> > > Have you looked at?:
> > > https://www.pgbouncer.org/changelog.html#pgbouncer-124x
> > > To see if anything stands out.
> > > Then there is:
> > > https://www.pgbouncer.org/config.html#max_prepared_statements
> > > The below may also be worth looking at:
> > > https://github.com/pgbouncer/pgbouncer/pull/1144
> > > I can't help thinking that there is a caching issue at stake, though that is
> > > just a guess.
> > 
> > Will check, but we don't use prepared statements, so this doesn't seem
> > to be part-of-the-problem.
> 
> In your OP there is:
> 
> "Every now and then (usually every 3-5 minutes, but not through the whole
> day), we see situations where every query suddently takes ~ 1 second.
> 
> 
> I'm talkign about "prepare" for preparing statements. binds. also
> "discard all".
> "
> 
> Is that referring to something else?

No. Same problem. The thing is that it comes and goes.

*BUT*

I got repeatable case today. Is is breaking on its own everyy
~ 5 minutes.

So I made test.sql with:
\timing on
SELECT 'SELECT now(), pg_sleep(.05);' FROM generate_series(1,100000) i \gexec

and ran it like this:

psql -p 5432 -d the_db -f test.sql -qAtX | perl -pe 's/\|\s*$/ :: /'

This bypassed pgbouncer, connected via unix socket, the whole thing got simpler.

After 4 minutes of 50ms runtimes I got:

2025-08-22 14:44:14.013541+00 :: Time: 50.090 ms
2025-08-22 14:44:14.063636+00 :: Time: 50.088 ms
2025-08-22 14:44:14.11373+00 :: Time: 50.085 ms
2025-08-22 14:44:14.163822+00 :: Time: 50.099 ms
2025-08-22 14:44:14.213927+00 :: Time: 1393.932 ms (00:01.394)
2025-08-22 14:44:15.607872+00 :: Time: 50.171 ms
2025-08-22 14:44:15.658097+00 :: Time: 50.209 ms
2025-08-22 14:44:15.708353+00 :: Time: 50.212 ms
2025-08-22 14:44:15.758589+00 :: Time: 50.170 ms

This is pg_sleep. There are no locks, nothing else. I think this
disproves idea that the problem is about pgbouncer, and it kinda looks
like a thing in pg?

Interestingly: ran strace on the pg backend that was doing this loop,
like this:
strace -tt -T -ff -p 1266521  -s 256 -o a.log

And it showed something really strange:

…
14:57:00.864684 recvfrom(9, "Q\0\0\0CSELECT now()::timestamptz(2), pg_backend_pid(), pg_sleep(.05);\0", 8192, 0, NULL, NULL) = 68 <0.000007>
14:57:00.864739 epoll_pwait(11, [], 1, 50, NULL, 8) = 0 <0.050063>
14:57:00.914823 sendto(9, "T\0\0\0X\0\3now\0\0\0\0\0\0\0\0\0\4\240\0\10\0\0\0\2\0\0pg_backend_pid\0\0\0\0\0\0\0\0\0\0\27\0\4\377\377\377\377\0\0pg_sleep\0\0\0\0\0\0\0\0\0\10\346\0\4\377\377\377\377\0\0D\0\0\0002\0\3\0\0\0\0312025-08-22 14:57:00.86+00\0\0\0\0071266521\0\0\0\0C\0\0\0\14:57:00.914849 recvfrom(9, "Q\0\0\0CSELECT now()::timestamptz(2), pg_backend_pid(), pg_sleep(.05);\0", 8192, 0, NULL, NULL) = 68 <0.000007>
14:57:00.914891 epoll_pwait(11, [], 1, 50, NULL, 8) = 0 <0.050064>
14:57:00.964978 sendto(9, "T\0\0\0X\0\3now\0\0\0\0\0\0\0\0\0\4\240\0\10\0\0\0\2\0\0pg_backend_pid\0\0\0\0\0\0\0\0\0\0\27\0\4\377\377\377\377\0\0pg_sleep\0\0\0\0\0\0\0\0\0\10\346\0\4\377\377\377\377\0\0D\0\0\0002\0\3\0\0\0\0312025-08-22 14:57:00.91+00\0\0\0\0071266521\0\0\0\0C\0\0\0\
14:57:00.965006 recvfrom(9, "Q\0\0\0CSELECT now()::timestamptz(2), pg_backend_pid(), pg_sleep(.05);\0", 8192, 0, NULL, NULL) = 68 <0.000007>                                                                                                                                               14:57:00.965052 epoll_pwait(11, [], 1, 50, NULL, 8) = 0 <0.050063>
14:57:01.015133 futex(0xffe7a6636538, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) = 0 <1.354927>
14:57:02.370180 write(2, "\0\0\36\1YS\23\0T2025-08-22 14:57:02.370 UTC,\"postgres\",\"the.db\",1266521,\"[local]\",68a884fb.135359,5,\"SELECT\",2025-08-22 14:55:55 UTC,85/0,0,LOG,00000,\"duration: 1405.106 ms  statement: SELECT now()::timestamptz(2), pg_backend_pid(), pg_sleep(.05);
14:57:02.370521 sendto(8, "\2\0\0\0@\0\0\0\216A\0\0\0\0\0\0\t\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\255\221\33\0\0\0\0\0\262\341\6\0\0\0\0\0\0\0\0\0\0\0\0\0", 64, 0, NULL, 0) = 64 <0.000041>
14:57:02.370639 sendto(9, "T\0\0\0X\0\3now\0\0\0\0\0\0\0\0\0\4\240\0\10\0\0\0\2\0\0pg_backend_pid\0\0\0\0\0\0\0\0\0\0\27\0\4\377\377\377\377\0\0pg_sleep\0\0\0\0\0\0\0\0\0\10\346\0\4\377\377\377\377\0\0D\0\0\0002\0\3\0\0\0\0312025-08-22 14:57:00.97+00\0\0\0\0071266521\0\0\0\0C\0\0\0\14:57:02.371063 recvfrom(9, "Q\0\0\0CSELECT now()::timestamptz(2), pg_backend_pid(), pg_sleep(.05);\0", 8192, 0, NULL, NULL) = 68 <0.000019>
14:57:02.371215 epoll_pwait(11, [], 1, 50, NULL, 8) = 0 <0.050090>
14:57:02.421381 sendto(9, "T\0\0\0X\0\3now\0\0\0\0\0\0\0\0\0\4\240\0\10\0\0\0\2\0\0pg_backend_pid\0\0\0\0\0\0\0\0\0\0\27\0\4\377\377\377\377\0\0pg_sleep\0\0\0\0\0\0\0\0\0\10\346\0\4\377\377\377\377\0\0D\0\0\0002\0\3\0\0\0\0312025-08-22 14:57:02.37+00\0\0\0\0071266521\0\0\0\0C\0\0\0\
14:57:02.421480 recvfrom(9, 0xaaaae7b3fc08, 8192, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable) <0.000008>                                                                                                                                                                 14:57:02.421507 epoll_pwait(4, [{EPOLLIN, {u32=470567512, u64=187651886697048}}], 1, -1, NULL, 8) = 1 <0.000037>
14:57:02.421570 recvfrom(9, "Q\0\0\0CSELECT now()::timestamptz(2), pg_backend_pid(), pg_sleep(.05);\0", 8192, 0, NULL, NULL) = 68 <0.000012>
14:57:02.421673 epoll_pwait(11, [], 1, 50, NULL, 8) = 0 <0.050099>
14:57:02.471866 sendto(9, "T\0\0\0X\0\3now\0\0\0\0\0\0\0\0\0\4\240\0\10\0\0\0\2\0\0pg_backend_pid\0\0\0\0\0\0\0\0\0\0\27\0\4\377\377\377\377\0\0pg_sleep\0\0\0\0\0\0\0\0\0\10\346\0\4\377\377\377\377\0\0D\0\0\0002\0\3\0\0\0\0312025-08-22 14:57:02.42+00\0\0\0\0071266521\0\0\0\0C\0\0\0\14:57:02.471961 recvfrom(9, 0xaaaae7b3fc08, 8192, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable) <0.000008>
…

As in, normal execution is: recv from, epoll_pwait that takes 0.05s, and sendto.
all clear, nothing fancy.

*BUT*
in case of this one problematic thing there is futex() call that took 1.354s ?! Checked, and in straced ~ 800 queries this is the only futex call() ?!

Any idea on what could it be?

Best regards,

depesz







^ permalink  raw  reply  [nested|flat] 27+ messages in thread

* Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
@ 2025-08-22 15:07  Pavel Stehule <[email protected]>
  parent: hubert depesz lubaczewski <[email protected]>
  1 sibling, 1 reply; 27+ messages in thread

From: Pavel Stehule @ 2025-08-22 15:07 UTC (permalink / raw)
  To: [email protected]; +Cc: Adrian Klaver <[email protected]>; PostgreSQL General <[email protected]>; Chris Wilson <[email protected]>

pá 22. 8. 2025 v 17:03 odesílatel hubert depesz lubaczewski <
[email protected]> napsal:

> On Fri, Aug 22, 2025 at 07:33:03AM -0700, Adrian Klaver wrote:
> > On 8/22/25 05:37, hubert depesz lubaczewski wrote:
> > > On Thu, Aug 21, 2025 at 11:17:27AM -0700, Adrian Klaver wrote:
> > > > Have you looked at?:
> > > > https://www.pgbouncer.org/changelog.html#pgbouncer-124x
> > > > To see if anything stands out.
> > > > Then there is:
> > > > https://www.pgbouncer.org/config.html#max_prepared_statements
> > > > The below may also be worth looking at:
> > > > https://github.com/pgbouncer/pgbouncer/pull/1144
> > > > I can't help thinking that there is a caching issue at stake, though
> that is
> > > > just a guess.
> > >
> > > Will check, but we don't use prepared statements, so this doesn't seem
> > > to be part-of-the-problem.
> >
> > In your OP there is:
> >
> > "Every now and then (usually every 3-5 minutes, but not through the whole
> > day), we see situations where every query suddently takes ~ 1 second.
> >
> >
> > I'm talkign about "prepare" for preparing statements. binds. also
> > "discard all".
> > "
> >
> > Is that referring to something else?
>
> No. Same problem. The thing is that it comes and goes.
>
> *BUT*
>
> I got repeatable case today. Is is breaking on its own everyy
> ~ 5 minutes.
>
> So I made test.sql with:
> \timing on
> SELECT 'SELECT now(), pg_sleep(.05);' FROM generate_series(1,100000) i
> \gexec
>
> and ran it like this:
>
> psql -p 5432 -d the_db -f test.sql -qAtX | perl -pe 's/\|\s*$/ :: /'
>
> This bypassed pgbouncer, connected via unix socket, the whole thing got
> simpler.
>
> After 4 minutes of 50ms runtimes I got:
>
> 2025-08-22 14:44:14.013541+00 :: Time: 50.090 ms
> 2025-08-22 14:44:14.063636+00 :: Time: 50.088 ms
> 2025-08-22 14:44:14.11373+00 :: Time: 50.085 ms
> 2025-08-22 14:44:14.163822+00 :: Time: 50.099 ms
> 2025-08-22 14:44:14.213927+00 :: Time: 1393.932 ms (00:01.394)
> 2025-08-22 14:44:15.607872+00 :: Time: 50.171 ms
> 2025-08-22 14:44:15.658097+00 :: Time: 50.209 ms
> 2025-08-22 14:44:15.708353+00 :: Time: 50.212 ms
> 2025-08-22 14:44:15.758589+00 :: Time: 50.170 ms
>
> This is pg_sleep. There are no locks, nothing else. I think this
> disproves idea that the problem is about pgbouncer, and it kinda looks
> like a thing in pg?
>
> Interestingly: ran strace on the pg backend that was doing this loop,
> like this:
> strace -tt -T -ff -p 1266521  -s 256 -o a.log
>
> And it showed something really strange:
>
> …
> 14:57:00.864684 recvfrom(9, "Q\0\0\0CSELECT now()::timestamptz(2),
> pg_backend_pid(), pg_sleep(.05);\0", 8192, 0, NULL, NULL) = 68 <0.000007>
> 14:57:00.864739 epoll_pwait(11, [], 1, 50, NULL, 8) = 0 <0.050063>
> 14:57:00.914823 sendto(9,
> "T\0\0\0X\0\3now\0\0\0\0\0\0\0\0\0\4\240\0\10\0\0\0\2\0\0pg_backend_pid\0\0\0\0\0\0\0\0\0\0\27\0\4\377\377\377\377\0\0pg_sleep\0\0\0\0\0\0\0\0\0\10\346\0\4\377\377\377\377\0\0D\0\0\0002\0\3\0\0\0\0312025-08-22
> 14:57:00.86+00\0\0\0\0071266521\0\0\0\0C\0\0\0\14:57:00.914849 recvfrom(9,
> "Q\0\0\0CSELECT now()::timestamptz(2), pg_backend_pid(), pg_sleep(.05);\0",
> 8192, 0, NULL, NULL) = 68 <0.000007>
> 14:57:00.914891 epoll_pwait(11, [], 1, 50, NULL, 8) = 0 <0.050064>
> 14:57:00.964978 sendto(9,
> "T\0\0\0X\0\3now\0\0\0\0\0\0\0\0\0\4\240\0\10\0\0\0\2\0\0pg_backend_pid\0\0\0\0\0\0\0\0\0\0\27\0\4\377\377\377\377\0\0pg_sleep\0\0\0\0\0\0\0\0\0\10\346\0\4\377\377\377\377\0\0D\0\0\0002\0\3\0\0\0\0312025-08-22
> 14:57:00.91+00\0\0\0\0071266521\0\0\0\0C\0\0\0\
> 14:57:00.965006 recvfrom(9, "Q\0\0\0CSELECT now()::timestamptz(2),
> pg_backend_pid(), pg_sleep(.05);\0", 8192, 0, NULL, NULL) = 68 <0.000007>
>
>
>  14:57:00.965052 epoll_pwait(11, [], 1, 50, NULL, 8) = 0 <0.050063>
> 14:57:01.015133 futex(0xffe7a6636538,
> FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) =
> 0 <1.354927>
> 14:57:02.370180 write(2, "\0\0\36\1YS\23\0T2025-08-22 14:57:02.370
> UTC,\"postgres\",\"the.db\",1266521,\"[local]\",68a884fb.135359,5,\"SELECT\",2025-08-22
> 14:55:55 UTC,85/0,0,LOG,00000,\"duration: 1405.106 ms  statement: SELECT
> now()::timestamptz(2), pg_backend_pid(), pg_sleep(.05);
> 14:57:02.370521 sendto(8, "\2\0\0\0@\0\0\0\216A\0\0\0\0\0\0\t\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\255\221\33\0\0\0\0\0\262\341\6\0\0\0\0\0\0\0\0\0\0\0\0\0",
> 64, 0, NULL, 0) = 64 <0.000041>
> 14:57:02.370639 sendto(9,
> "T\0\0\0X\0\3now\0\0\0\0\0\0\0\0\0\4\240\0\10\0\0\0\2\0\0pg_backend_pid\0\0\0\0\0\0\0\0\0\0\27\0\4\377\377\377\377\0\0pg_sleep\0\0\0\0\0\0\0\0\0\10\346\0\4\377\377\377\377\0\0D\0\0\0002\0\3\0\0\0\0312025-08-22
> 14:57:00.97+00\0\0\0\0071266521\0\0\0\0C\0\0\0\14:57:02.371063 recvfrom(9,
> "Q\0\0\0CSELECT now()::timestamptz(2), pg_backend_pid(), pg_sleep(.05);\0",
> 8192, 0, NULL, NULL) = 68 <0.000019>
> 14:57:02.371215 epoll_pwait(11, [], 1, 50, NULL, 8) = 0 <0.050090>
> 14:57:02.421381 sendto(9,
> "T\0\0\0X\0\3now\0\0\0\0\0\0\0\0\0\4\240\0\10\0\0\0\2\0\0pg_backend_pid\0\0\0\0\0\0\0\0\0\0\27\0\4\377\377\377\377\0\0pg_sleep\0\0\0\0\0\0\0\0\0\10\346\0\4\377\377\377\377\0\0D\0\0\0002\0\3\0\0\0\0312025-08-22
> 14:57:02.37+00\0\0\0\0071266521\0\0\0\0C\0\0\0\
> 14:57:02.421480 recvfrom(9, 0xaaaae7b3fc08, 8192, 0, NULL, NULL) = -1
> EAGAIN (Resource temporarily unavailable) <0.000008>
>
>
>  14:57:02.421507 epoll_pwait(4, [{EPOLLIN, {u32=470567512,
> u64=187651886697048}}], 1, -1, NULL, 8) = 1 <0.000037>
> 14:57:02.421570 recvfrom(9, "Q\0\0\0CSELECT now()::timestamptz(2),
> pg_backend_pid(), pg_sleep(.05);\0", 8192, 0, NULL, NULL) = 68 <0.000012>
> 14:57:02.421673 epoll_pwait(11, [], 1, 50, NULL, 8) = 0 <0.050099>
> 14:57:02.471866 sendto(9,
> "T\0\0\0X\0\3now\0\0\0\0\0\0\0\0\0\4\240\0\10\0\0\0\2\0\0pg_backend_pid\0\0\0\0\0\0\0\0\0\0\27\0\4\377\377\377\377\0\0pg_sleep\0\0\0\0\0\0\0\0\0\10\346\0\4\377\377\377\377\0\0D\0\0\0002\0\3\0\0\0\0312025-08-22
> 14:57:02.42+00\0\0\0\0071266521\0\0\0\0C\0\0\0\14:57:02.471961 recvfrom(9,
> 0xaaaae7b3fc08, 8192, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily
> unavailable) <0.000008>
> …
>
> As in, normal execution is: recv from, epoll_pwait that takes 0.05s, and
> sendto.
> all clear, nothing fancy.
>
> *BUT*
> in case of this one problematic thing there is futex() call that took
> 1.354s ?! Checked, and in straced ~ 800 queries this is the only futex
> call() ?!
>
> Any idea on what could it be?
>

Few years ago I had a similar experience - some sec mystic lags - the
problem was in virtualization.

Regards

Pavel



>
> Best regards,
>
> depesz
>
>
>
>


^ permalink  raw  reply  [nested|flat] 27+ messages in thread

* Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
@ 2025-08-22 15:19  hubert depesz lubaczewski <[email protected]>
  parent: Pavel Stehule <[email protected]>
  0 siblings, 0 replies; 27+ messages in thread

From: hubert depesz lubaczewski @ 2025-08-22 15:19 UTC (permalink / raw)
  To: Pavel Stehule <[email protected]>; +Cc: Adrian Klaver <[email protected]>; PostgreSQL General <[email protected]>; Chris Wilson <[email protected]>

On Fri, Aug 22, 2025 at 05:07:09PM +0200, Pavel Stehule wrote:
> > Any idea on what could it be?
> Few years ago I had a similar experience - some sec mystic lags - the
> problem was in virtualization.

While I can't discount this idea, after all it is aws ec2, so virtual
boxes, what makes me wonder is:

in case of this slow situation there is futex call, which takes
a second. But there are no futex calls in any other query processing
logs.

So, while it is obvious that futex "wasted" this time, the question is:
why would pg run futex() for this one query at all?

Best regards,

depesz

^ permalink  raw  reply  [nested|flat] 27+ messages in thread

* Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
@ 2025-08-22 15:21  Tom Lane <[email protected]>
  parent: hubert depesz lubaczewski <[email protected]>
  1 sibling, 1 reply; 27+ messages in thread

From: Tom Lane @ 2025-08-22 15:21 UTC (permalink / raw)
  To: [email protected]; +Cc: Adrian Klaver <[email protected]>; PostgreSQL General <[email protected]>; Chris Wilson <[email protected]>

hubert depesz lubaczewski <[email protected]> writes:
> I got repeatable case today. Is is breaking on its own everyy
> ~ 5 minutes.

Interesting.  That futex call is presumably caused by interaction
with some other process within the standby server, and the only
plausible candidate really is the startup process (which is replaying
WAL received from the primary).  There are cases where WAL replay
will take locks that can block queries on the standby.  Can you
correlate the delays on the standby server with any DDL events
occurring on the primary?

			regards, tom lane

^ permalink  raw  reply  [nested|flat] 27+ messages in thread

* Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
@ 2025-08-22 15:30  hubert depesz lubaczewski <[email protected]>
  parent: Tom Lane <[email protected]>
  0 siblings, 2 replies; 27+ messages in thread

From: hubert depesz lubaczewski @ 2025-08-22 15:30 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Adrian Klaver <[email protected]>; PostgreSQL General <[email protected]>; Chris Wilson <[email protected]>

On Fri, Aug 22, 2025 at 11:21:22AM -0400, Tom Lane wrote:
> hubert depesz lubaczewski <[email protected]> writes:
> > I got repeatable case today. Is is breaking on its own everyy
> > ~ 5 minutes.
> 
> Interesting.  That futex call is presumably caused by interaction
> with some other process within the standby server, and the only
> plausible candidate really is the startup process (which is replaying
> WAL received from the primary).  There are cases where WAL replay
> will take locks that can block queries on the standby.  Can you
> correlate the delays on the standby server with any DDL events
> occurring on the primary?

Nope. Plus there is certain repetition of these cases, so even if I'd
miss *some* create table/alter, it just isn't going to be happening
every 4-5 minutes.

For example, looking at logs for the last ~2h, and just checking
situation when there are more than 20 messages in the same milisecond,
I can see:

    108 14:02:03.149
     25 14:04:01.619
    110 14:05:36.924
     77 14:05:36.925
    108 14:09:28.155
     38 14:13:52.481
     63 14:13:52.482
     73 14:13:52.484
    146 14:18:19.338
     39 14:18:19.339
     24 14:20:01.694
     82 14:23:07.352
     55 14:23:07.353
     37 14:23:07.353
     45 14:27:44.125
    132 14:27:44.126
    109 14:31:41.593
     70 14:31:41.594
     24 14:32:01.205
     21 14:34:01.477
     79 14:35:36.761
    104 14:35:36.762
     22 14:39:49.541
    151 14:39:49.542
     22 14:39:49.543
    112 14:44:15.607
     73 14:44:15.608
     28 14:48:01.256
     50 14:48:25.588
    131 14:48:25.589
    139 14:52:44.391
     74 14:57:02.369
    117 14:57:02.370
     20 15:00:02.008
    137 15:00:43.982
     34 15:00:43.983
     20 15:01:01.110
     22 15:04:21.037
    153 15:04:21.038
     20 15:08:01.136
     31 15:08:55.798
    126 15:08:55.799
     76 15:13:46.654
     83 15:13:46.655
     20 15:17:01.700
    107 15:18:42.112
     72 15:18:42.113
    124 15:23:48.689
     32 15:23:48.690
     25 15:23:48.690
     28 15:24:01.397

So, while there are outliers, I'd say that most of the problems happens every
3-5 minutes.

depesz







^ permalink  raw  reply  [nested|flat] 27+ messages in thread

* Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
@ 2025-08-22 15:39  Adrian Klaver <[email protected]>
  parent: hubert depesz lubaczewski <[email protected]>
  1 sibling, 1 reply; 27+ messages in thread

From: Adrian Klaver @ 2025-08-22 15:39 UTC (permalink / raw)
  To: [email protected]; Tom Lane <[email protected]>; +Cc: PostgreSQL General <[email protected]>; Chris Wilson <[email protected]>

On 8/22/25 08:30, hubert depesz lubaczewski wrote:
> On Fri, Aug 22, 2025 at 11:21:22AM -0400, Tom Lane wrote:
>> hubert depesz lubaczewski <[email protected]> writes:
>>> I got repeatable case today. Is is breaking on its own everyy
>>> ~ 5 minutes.
>>
>> Interesting.  That futex call is presumably caused by interaction
>> with some other process within the standby server, and the only
>> plausible candidate really is the startup process (which is replaying
>> WAL received from the primary).  There are cases where WAL replay
>> will take locks that can block queries on the standby.  Can you
>> correlate the delays on the standby server with any DDL events
>> occurring on the primary?
> 
> Nope. Plus there is certain repetition of these cases, so even if I'd
> miss *some* create table/alter, it just isn't going to be happening
> every 4-5 minutes.
> 

> So, while there are outliers, I'd say that most of the problems happens every
> 3-5 minutes.

Are you using the Postgres community version or the AWS variant?

> 
> depesz
> 


-- 
Adrian Klaver
[email protected]






^ permalink  raw  reply  [nested|flat] 27+ messages in thread

* Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
@ 2025-08-22 15:42  hubert depesz lubaczewski <[email protected]>
  parent: Adrian Klaver <[email protected]>
  0 siblings, 0 replies; 27+ messages in thread

From: hubert depesz lubaczewski @ 2025-08-22 15:42 UTC (permalink / raw)
  To: Adrian Klaver <[email protected]>; +Cc: Tom Lane <[email protected]>; PostgreSQL General <[email protected]>; Chris Wilson <[email protected]>

On Fri, Aug 22, 2025 at 08:39:21AM -0700, Adrian Klaver wrote:
> On 8/22/25 08:30, hubert depesz lubaczewski wrote:
> > On Fri, Aug 22, 2025 at 11:21:22AM -0400, Tom Lane wrote:
> > > hubert depesz lubaczewski <[email protected]> writes:
> > > > I got repeatable case today. Is is breaking on its own everyy
> > > > ~ 5 minutes.
> > > 
> > > Interesting.  That futex call is presumably caused by interaction
> > > with some other process within the standby server, and the only
> > > plausible candidate really is the startup process (which is replaying
> > > WAL received from the primary).  There are cases where WAL replay
> > > will take locks that can block queries on the standby.  Can you
> > > correlate the delays on the standby server with any DDL events
> > > occurring on the primary?
> > 
> > Nope. Plus there is certain repetition of these cases, so even if I'd
> > miss *some* create table/alter, it just isn't going to be happening
> > every 4-5 minutes.
> > 
> 
> > So, while there are outliers, I'd say that most of the problems happens every
> > 3-5 minutes.
> 
> Are you using the Postgres community version or the AWS variant?

Community. From pgdg repo.

Best regards,

depesz







^ permalink  raw  reply  [nested|flat] 27+ messages in thread

* Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
@ 2025-08-22 15:55  Tom Lane <[email protected]>
  parent: hubert depesz lubaczewski <[email protected]>
  1 sibling, 1 reply; 27+ messages in thread

From: Tom Lane @ 2025-08-22 15:55 UTC (permalink / raw)
  To: [email protected]; +Cc: Adrian Klaver <[email protected]>; PostgreSQL General <[email protected]>; Chris Wilson <[email protected]>

hubert depesz lubaczewski <[email protected]> writes:
> On Fri, Aug 22, 2025 at 11:21:22AM -0400, Tom Lane wrote:
>> Interesting.  That futex call is presumably caused by interaction
>> with some other process within the standby server, and the only
>> plausible candidate really is the startup process (which is replaying
>> WAL received from the primary).  There are cases where WAL replay
>> will take locks that can block queries on the standby.  Can you
>> correlate the delays on the standby server with any DDL events
>> occurring on the primary?

> Nope. Plus there is certain repetition of these cases, so even if I'd
> miss *some* create table/alter, it just isn't going to be happening
> every 4-5 minutes.

Nonetheless, I'm suspecting an interaction with the startup process,
because there just isn't that much else that this process could be
needing to deal with.  Can you try strace'ing both the process doing
the test query and the startup process, to see what the startup
process is doing at the times the futex calls happen?

			regards, tom lane






^ permalink  raw  reply  [nested|flat] 27+ messages in thread

* Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
@ 2025-08-22 16:01  hubert depesz lubaczewski <[email protected]>
  parent: Tom Lane <[email protected]>
  0 siblings, 0 replies; 27+ messages in thread

From: hubert depesz lubaczewski @ 2025-08-22 16:01 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Adrian Klaver <[email protected]>; PostgreSQL General <[email protected]>; Chris Wilson <[email protected]>

On Fri, Aug 22, 2025 at 11:55:18AM -0400, Tom Lane wrote:
> Nonetheless, I'm suspecting an interaction with the startup process,
> because there just isn't that much else that this process could be
> needing to deal with.  Can you try strace'ing both the process doing
> the test query and the startup process, to see what the startup
> process is doing at the times the futex calls happen?

I can, but not today. Will get back to you about it, next week, though.

Best regards,

depesz







^ permalink  raw  reply  [nested|flat] 27+ messages in thread

end of thread, other threads:[~2025-08-22 16:01 UTC | newest]

Thread overview: 27+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-08-20 15:14 Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug Adrian Klaver <[email protected]>
2025-08-20 16:08 ` hubert depesz lubaczewski <[email protected]>
2025-08-20 17:24   ` Rob Sargent <[email protected]>
2025-08-20 17:30   ` Thom Brown <[email protected]>
2025-08-21 10:03     ` hubert depesz lubaczewski <[email protected]>
2025-08-21 11:41       ` Thom Brown <[email protected]>
2025-08-21 12:03         ` hubert depesz lubaczewski <[email protected]>
2025-08-20 17:45   ` Adrian Klaver <[email protected]>
2025-08-21 10:07     ` hubert depesz lubaczewski <[email protected]>
2025-08-21 15:04       ` Adrian Klaver <[email protected]>
2025-08-21 15:13         ` hubert depesz lubaczewski <[email protected]>
2025-08-21 15:59           ` Adrian Klaver <[email protected]>
2025-08-21 16:51             ` hubert depesz lubaczewski <[email protected]>
2025-08-21 18:17               ` Adrian Klaver <[email protected]>
2025-08-21 18:38                 ` Chris Wilson <[email protected]>
2025-08-22 12:39                   ` hubert depesz lubaczewski <[email protected]>
2025-08-22 12:37                 ` hubert depesz lubaczewski <[email protected]>
2025-08-22 14:33                   ` Adrian Klaver <[email protected]>
2025-08-22 15:03                     ` hubert depesz lubaczewski <[email protected]>
2025-08-22 15:07                       ` Pavel Stehule <[email protected]>
2025-08-22 15:19                         ` hubert depesz lubaczewski <[email protected]>
2025-08-22 15:21                       ` Tom Lane <[email protected]>
2025-08-22 15:30                         ` hubert depesz lubaczewski <[email protected]>
2025-08-22 15:39                           ` Adrian Klaver <[email protected]>
2025-08-22 15:42                             ` hubert depesz lubaczewski <[email protected]>
2025-08-22 15:55                           ` Tom Lane <[email protected]>
2025-08-22 16:01                             ` hubert depesz lubaczewski <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox