Re: Possible causes of high_replay lag, given replication settings?
* Re: Possible causes of high_replay lag, given replication settings?
From: Jon Zeppieri @ 2025-07-25 13:57 UTC
To: Nick Cleaton <[email protected]>; +Cc: [email protected]

On Wed, Jul 23, 2025 at 4:27 PM Nick Cleaton <[email protected]> wrote:
>
> On Fri, 18 Jul 2025 at 21:29, Jon Zeppieri <[email protected]> wrote:
> >
> > I just had a situation where physical replication fell far behind
> > (hours). The write and flush lag times were 0, but replay_lag was
> > high. The replica has hot_standby_feedback on, and both
> > max_standby_streaming_delay and max_standby_archive_delay are set to
> > 30s.
> >
> > What could cause a situation like this? If the network were a problem,
> > I'd expect the other _lag times to be high. So it appears that the
> > replica was getting the WAL but was unable to apply it. Are there
> > situations where the replica cannot apply WAL other than the kinds of
> > conflicts that would be addressed by the _delay settings?
> >
> > I checked pg_stat_database_conflicts, but there was nothing in it -- all zeros.
>
> This can happen when there are several busy writing processes on the
> primary. The single replay process on the replica can't keep up with
> the writes.

Thanks for the response, Nick. I'm curious why the situation you
describe wouldn't also lead to the write_lag and flush_lag also being
high. If the problem is simply keeping up with the primary, wouldn't
you expect all three lag times to be elevated?

- Jon
* Re: Possible causes of high_replay lag, given replication settings?
From: Greg Sabino Mullane @ 2025-07-25 23:12 UTC
To: Jon Zeppieri <[email protected]>; +Cc: Nick Cleaton <[email protected]>; [email protected]

On Fri, Jul 25, 2025 at 9:57 AM Jon Zeppieri <[email protected]> wrote:

> Thanks for the response, Nick. I'm curious why the situation you describe
> wouldn't also lead to the write_lag and flush_lag also being
> high. If the problem is simply keeping up with the primary, wouldn't you
> expect all three lag times to be elevated?
>

No - write and flush are pretty quick and simple, it's just putting the
WAL onto the local disk. Replay involves a lot more work as we have to
parse the WAL and apply the changes, which means doing a lot of I/O
across many files. Still, *hours* to me indicates more than just a lot
of extra traffic. Check that recovery_min_apply_delay is still 0, then
log onto the replica and see what's going on with regards to open
transactions and locks.

Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support
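The distinction between write/flush lag and replay lag can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes the three values have already been read from the write_lag, flush_lag, and replay_lag columns of pg_stat_replication on the primary and converted to timedelta (NULL becoming None). The function name classify_lag, the 30-second threshold, and the returned labels are hypothetical and are not part of PostgreSQL.

```python
from datetime import timedelta
from typing import Optional


def classify_lag(
    write_lag: Optional[timedelta],
    flush_lag: Optional[timedelta],
    replay_lag: Optional[timedelta],
    threshold: timedelta = timedelta(seconds=30),  # hypothetical cutoff
) -> str:
    """Return the earliest stage whose lag is at or above the threshold.

    "write"  : WAL is slow to reach the standby or be written there
               (network or the standby's write path).
    "flush"  : WAL arrives but is slow to be flushed to the standby's disk.
    "replay" : WAL is received and flushed but not applied, so replay
               itself is slow or blocked on the standby.
    "ok"     : no stage is at or above the threshold.
    """
    zero = timedelta(0)
    # Treat a missing (NULL) lag value as zero for this sketch.
    write = zero if write_lag is None else write_lag
    flush = zero if flush_lag is None else flush_lag
    replay = zero if replay_lag is None else replay_lag

    if write >= threshold:
        return "write"
    if flush >= threshold:
        return "flush"
    if replay >= threshold:
        return "replay"
    return "ok"


# The situation reported in this thread: write and flush lag of 0,
# replay lag measured in hours.
print(classify_lag(timedelta(0), timedelta(0), timedelta(hours=3)))
```

For the values reported in this thread the sketch returns "replay", which matches the reading that the standby had the WAL on disk but was not applying it.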
* Re: Possible causes of high_replay lag, given replication settings?
From: Jon Zeppieri @ 2025-07-26 15:43 UTC
To: Greg Sabino Mullane <[email protected]>; +Cc: Nick Cleaton <[email protected]>; [email protected]

On Fri, Jul 25, 2025 at 7:13 PM Greg Sabino Mullane <[email protected]> wrote:
>
> On Fri, Jul 25, 2025 at 9:57 AM Jon Zeppieri <[email protected]> wrote:
>>
>> Thanks for the response, Nick. I'm curious why the situation you
>> describe wouldn't also lead to the write_lag and flush_lag also being
>> high. If the problem is simply keeping up with the primary, wouldn't
>> you expect all three lag times to be elevated?
>
>
> No - write and flush are pretty quick and simple, it's just putting the
> WAL onto the local disk. Replay involves a lot more work as we have to
> parse the WAL and apply the changes, which means doing a lot of I/O
> across many files. Still, *hours* to me indicates more than just a lot
> of extra traffic. Check that recovery_min_apply_delay is still 0, then
> log onto the replica and see what's going on with regards to open
> transactions and locks.

Thanks Greg. `recovery_min_apply_delay` is 0, just checked.

Also, I didn't mention in my initial post that it seemed the cause of
the delay was long-running queries on the replica, rather than the
primary. It's possible, of course, that I'm wrong, but I was able to
get the replica moving again when I killed off old queries on the
replica. If those were the problem, though, then I don't understand why
the max_standby_streaming_delay didn't prevent that situation.

- Jon