public inbox for [email protected]help / color / mirror / Atom feed
Re: Disabling vacuum truncate for autovacuum 3+ messages / 2 participants [nested] [flat]
* Re: Disabling vacuum truncate for autovacuum @ 2024-12-26 20:21 Jeremy Schneider <[email protected]> 0 siblings, 1 reply; 3+ messages in thread From: Jeremy Schneider @ 2024-12-26 20:21 UTC (permalink / raw) To: Will Storey <[email protected]>; [email protected] On Mon, 16 Dec 2024 16:25:06 -0800 Will Storey <[email protected]> wrote: > I would like to disable vacuum's truncate behaviour for autovacuum. > Previously I had an outage due to its access exclusive lock when it > was replicated to a hot standby. > > When that outage happened it was from a VACUUM call in a cronjob > rather than autovacuum. I now run such VACUUMs with TRUNCATE false > which avoids the issue for these. However I've realized that > autovacuum could cause this as well. This is of interest to me > because I'm looking at tuning autovacuum and getting rid of the > cronjob, but I've realized relying on autovacuum could be dangerous > because of the truncates. Can you tell us a little bit more about the outage? Autovacuum is designed to quickly relinquish this lock if there is any contention, and the dangers of disabling autovacuum are significant, so your statement about autovac being "dangerous" will raise a lot of eyebrows. Did your outage involve hot standbys serving read-only traffic, or did it only involve a read-write database? What was the exact nature of the outage and how did you narrow down the cause to the exclusive lock held specifically during an autovacuum truncation? -Jeremy ^ permalink raw reply [nested|flat] 3+ messages in thread
* Re: Disabling vacuum truncate for autovacuum @ 2024-12-26 21:24 Will Storey <[email protected]> parent: Jeremy Schneider <[email protected]> 0 siblings, 1 reply; 3+ messages in thread From: Will Storey @ 2024-12-26 21:24 UTC (permalink / raw) To: Jeremy Schneider <[email protected]>; +Cc: [email protected] On Thu 2024-12-26 12:21:08 -0800, Jeremy Schneider wrote: > On Mon, 16 Dec 2024 16:25:06 -0800 > Will Storey <[email protected]> wrote: > > > I would like to disable vacuum's truncate behaviour for autovacuum. > > Previously I had an outage due to its access exclusive lock when it > > was replicated to a hot standby. > > > > When that outage happened it was from a VACUUM call in a cronjob > > rather than autovacuum. I now run such VACUUMs with TRUNCATE false > > which avoids the issue for these. However I've realized that > > autovacuum could cause this as well. This is of interest to me > > because I'm looking at tuning autovacuum and getting rid of the > > cronjob, but I've realized relying on autovacuum could be dangerous > > because of the truncates. > > Can you tell us a little bit more about the outage? Autovacuum is > designed to quickly relinquish this lock if there is any contention, and > the dangers of disabling autovacuum are significant, so your statement > about autovac being "dangerous" will raise a lot of eyebrows. > > Did your outage involve hot standbys serving read-only traffic, or did > it only involve a read-write database? > > What was the exact nature of the outage and how did you narrow down the > cause to the exclusive lock held specifically during an autovacuum > truncation? My incident was actually not caused by autovacuum. A VACUUM was run against the primary by a cronjob. A web service running read queries against hot standbys went down for several minutes as its queries were stuck in a lock queue. While this was VACUUM, my understanding is that autovacuum could do this as well because it does not see the queries on the hot standby that could be blocked by it, so it won't know to stop its work. I think this issue is part of what lead to the addition of the vacuum_truncate reloption discussed in https://www.postgresql.org/message-id/flat/CAHGQGwE5UqFqSq1%3DkV3QtTUtXphTdyHA-8rAj4A%3DY%2Be4kyp3BQ..., e.g. this message: https://www.postgresql.org/message-id/20190408044345.ndxsnveqqlj3m67g%40alap3.anarazel.de. I could be misunderstanding it though! As I recall, I confirmed the cause via query logs. I noticed the table was vacuumed at the time, which lead me to learning about the page truncation behaviour. It has been a couple years though. The cronjob still runs every night, but now with TRUNCATE false. I've been thinking of trying to get rid of it and rely more on autovacuum which is why I've been revisiting this. As well, we're no longer protected by old_snapshot_threshold disabling the page truncation globally, due to that being removed. ^ permalink raw reply [nested|flat] 3+ messages in thread
* Re: Disabling vacuum truncate for autovacuum @ 2024-12-26 21:43 Jeremy Schneider <[email protected]> parent: Will Storey <[email protected]> 0 siblings, 0 replies; 3+ messages in thread From: Jeremy Schneider @ 2024-12-26 21:43 UTC (permalink / raw) To: Will Storey <[email protected]>; +Cc: [email protected] On Thu, 26 Dec 2024 13:24:03 -0800 Will Storey <[email protected]> wrote: > My incident was actually not caused by autovacuum. A VACUUM was run > against the primary by a cronjob. A web service running read queries > against hot standbys went down for several minutes as its queries > were stuck in a lock queue. > > ... > > As I recall, I confirmed the cause via query logs. I noticed the > table was vacuumed at the time, which lead me to learning about the > page truncation behaviour. It has been a couple years though. Ah - thanks - this is very helpful. I have also seen issues specifically with hot standbys, which continue holding the exclusive lock even when the primary read-write instance releases the lock. A better solution in my opinion would be to enhance the WAL replay process so that it can somehow temporarily relinquish the exclusive lock under contention, similar to what the primary read-write instance is able to do. This is not an easy enhancement to make. Maybe we'd need the primary to put more information into the WAL than it does today. Maybe we'd need to leverage hot_standby_feedback to enable standbys to signal a primary to release the lock. Anyway thanks for the report - we need people reporting these issues on the lists so that there's a little visibility into the impact. Personally I'm still hesitant about the idea of globally disabling vacuum truncation. That was never the goal of the old_snapshot_threshold feature, interesting that you were able to capitalize on the side-effect. Personally I'd still favor disabling it only on the tables that are both frequently vacuumed and also frequently queried on hot standbys. In a pinch, you could disable it for all tables with a bit of dynamic SQL and ensuring that new tables created in the future include the syntax to disable it too. -Jeremy ^ permalink raw reply [nested|flat] 3+ messages in thread
end of thread, other threads:[~2024-12-26 21:43 UTC | newest] Thread overview: 3+ messages (download: mbox mbox.gz follow: Atom feed) -- links below jump to the message on this page -- 2024-12-26 20:21 Re: Disabling vacuum truncate for autovacuum Jeremy Schneider <[email protected]> 2024-12-26 21:24 ` Will Storey <[email protected]> 2024-12-26 21:43 ` Jeremy Schneider <[email protected]>
This inbox is served by agora; see mirroring instructions for how to clone and mirror all data and code used for this inbox