Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tQvL1-0064Eq-Gg for pgsql-general@arkaria.postgresql.org; Thu, 26 Dec 2024 21:24:11 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tQvKz-004nTP-95 for pgsql-general@arkaria.postgresql.org; Thu, 26 Dec 2024 21:24:09 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tQvKy-004nTG-PX for pgsql-general@lists.postgresql.org; Thu, 26 Dec 2024 21:24:08 +0000 Received: from mail-pl1-x62c.google.com ([2607:f8b0:4864:20::62c]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1tQvKw-000hf0-3C for pgsql-general@lists.postgresql.org; Thu, 26 Dec 2024 21:24:07 +0000 Received: by mail-pl1-x62c.google.com with SMTP id d9443c01a7336-21636268e43so10165465ad.2 for ; Thu, 26 Dec 2024 13:24:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1735248246; x=1735853046; darn=lists.postgresql.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:sender:from:to:cc:subject:date:message-id :reply-to; bh=Jxj/WIM+w2J79F3IkUGBha/DLe4Wo+BaKs3ha/x4hJk=; b=nWVXAAb/xrKcVkZVtHPVrjw2G5ldTZ/dV1TVURnxXvkymSUIbSt+zMpWOFdfFldPzW eKDQwYDE/8Kpgd4EUXuHdV+JX4CoBhRQa4ZcYYuhP9UK5lTUr93JwOJQJnRw/yg3jNOF IFjiFNV3s6WY/tdXnZ8tYzaJAc6n+ZrMumuBpiWgtSoTKTosdk4mLqi7srzAmGKPmjej Rvpl+oUn39OKX68ir0veJwpBDU5fquyZ3HP4+uZ1jBVgQIYtT71chMjjRU+4rnQLb80R MQ5tpem7fGEyKE07dNNSzNPj0OBU/hJJ7IVFHzaEW15zXfQVvKcWg86Gr4EGWrgQSaii oClw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1735248246; x=1735853046; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:sender:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Jxj/WIM+w2J79F3IkUGBha/DLe4Wo+BaKs3ha/x4hJk=; b=Jy+7Ll7k8YVHf5f5bXUn67B9nPdpIrwQCZ0TdShUMOcpAkoLuxsC2GQ3ikXPbR6xDT HjXO4fILZYdPHoPQAkcLcGU8EXUpd4UR3nybbh5phNIh69p4wsOhdHrKlr7tMpIk7Rqi zW10EZJJH9MFLgRai4FdY6NTzF8O7BCEj8vBRySHm5gpMucWzyYzlpodVf2cRs10tyaa qeTgVTMiTMsV57wzjCg7PltI27qu8JCd5h0deDbs0xvLhXbdrU1a2XoRha08Usf3EA/U 9MAQqIKrL2dTU8gckWkiSvatHdF0y7veE6ugux+CDPFOVd1kV5SdF2fFwbtkyW7aXghm 3Itg== X-Gm-Message-State: AOJu0YxfszY4uQtibpw+6PS1rYYEWsApuhLEv0rnccgkj9w/LWx0Uz1c k8FOUJi83+UaSFJePzzZoPbotWU44JztgPPIDyRuRuWgbSRq7x9T X-Gm-Gg: ASbGncvp13vN/Wwi5VoBTo7zSbziFY47sxiEAq4xsVqk7j7owzQTm2Ld3iXbcofBwuI kEqT7xvAlb9IaAmABVgccr35DObnEG46NNXDyRL+UkHqINjAKfAnhymy+B/o75xWnvb9JUCZD/R u20t9LIomneYesQzsA6oe+lc9UesQWb56a4ucqJeR1N9Rl6stZw/0KGMKAKKpWWft553B8+oaCg UCA2OmgoAFh7/BHVbFivUNCVAe0ue5XXNBQ/bBDtdscD/eMuh9GyuHSbJdYNXujaDU/dkmtWgx9 E3vQJ7U= X-Google-Smtp-Source: AGHT+IHEUH4999WqGe8/TpPsX9UIY8EobvtoilkFQHrfkhr5Psl0suQj/G+xu5Ou4eirba5ooHu/ag== X-Received: by 2002:a17:903:186:b0:215:8f2e:eeda with SMTP id d9443c01a7336-219e6f273a8mr310285155ad.52.1735248245811; Thu, 26 Dec 2024 13:24:05 -0800 (PST) Received: from dev.null (d23-16-179-220.bchsia.telus.net. [23.16.179.220]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-219dc9f507csm123760655ad.196.2024.12.26.13.24.04 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 26 Dec 2024 13:24:05 -0800 (PST) Sender: Will Storey Date: Thu, 26 Dec 2024 13:24:03 -0800 From: Will Storey To: Jeremy Schneider Cc: pgsql-general@lists.postgresql.org Subject: Re: Disabling vacuum truncate for autovacuum Message-ID: References: <20241226122108.33e906a3@jeremy-ThinkPad-T430s> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20241226122108.33e906a3@jeremy-ThinkPad-T430s> List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Thu 2024-12-26 12:21:08 -0800, Jeremy Schneider wrote: > On Mon, 16 Dec 2024 16:25:06 -0800 > Will Storey wrote: > > > I would like to disable vacuum's truncate behaviour for autovacuum. > > Previously I had an outage due to its access exclusive lock when it > > was replicated to a hot standby. > > > > When that outage happened it was from a VACUUM call in a cronjob > > rather than autovacuum. I now run such VACUUMs with TRUNCATE false > > which avoids the issue for these. However I've realized that > > autovacuum could cause this as well. This is of interest to me > > because I'm looking at tuning autovacuum and getting rid of the > > cronjob, but I've realized relying on autovacuum could be dangerous > > because of the truncates. > > Can you tell us a little bit more about the outage? Autovacuum is > designed to quickly relinquish this lock if there is any contention, and > the dangers of disabling autovacuum are significant, so your statement > about autovac being "dangerous" will raise a lot of eyebrows. > > Did your outage involve hot standbys serving read-only traffic, or did > it only involve a read-write database? > > What was the exact nature of the outage and how did you narrow down the > cause to the exclusive lock held specifically during an autovacuum > truncation? My incident was actually not caused by autovacuum. A VACUUM was run against the primary by a cronjob. A web service running read queries against hot standbys went down for several minutes as its queries were stuck in a lock queue. While this was VACUUM, my understanding is that autovacuum could do this as well because it does not see the queries on the hot standby that could be blocked by it, so it won't know to stop its work. I think this issue is part of what lead to the addition of the vacuum_truncate reloption discussed in https://www.postgresql.org/message-id/flat/CAHGQGwE5UqFqSq1%3DkV3QtTUtXphTdyHA-8rAj4A%3DY%2Be4kyp3BQ%40mail.gmail.com, e.g. this message: https://www.postgresql.org/message-id/20190408044345.ndxsnveqqlj3m67g%40alap3.anarazel.de. I could be misunderstanding it though! As I recall, I confirmed the cause via query logs. I noticed the table was vacuumed at the time, which lead me to learning about the page truncation behaviour. It has been a couple years though. The cronjob still runs every night, but now with TRUNCATE false. I've been thinking of trying to get rid of it and rely more on autovacuum which is why I've been revisiting this. As well, we're no longer protected by old_snapshot_threshold disabling the page truncation globally, due to that being removed.