Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sCIYw-004eUC-Al for pgsql-general@arkaria.postgresql.org; Wed, 29 May 2024 12:37:52 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1sCIXy-00F0zO-2F for pgsql-general@arkaria.postgresql.org; Wed, 29 May 2024 12:36:50 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sCIXw-00F0z2-Kd for pgsql-general@lists.postgresql.org; Wed, 29 May 2024 12:36:49 +0000 Received: from wfout4-smtp.messagingengine.com ([64.147.123.147]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sCIXp-001L4f-C3 for pgsql-general@postgresql.org; Wed, 29 May 2024 12:36:45 +0000 Received: from compute4.internal (compute4.nyi.internal [10.202.2.44]) by mailfout.west.internal (Postfix) with ESMTP id 04FA61C0010D; Wed, 29 May 2024 08:36:38 -0400 (EDT) Received: from mailfrontend2 ([10.202.2.163]) by compute4.internal (MEProxy); Wed, 29 May 2024 08:36:39 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=purefiction.net; h=cc:cc:content-transfer-encoding:content-type:content-type :date:date:from:from:in-reply-to:in-reply-to:message-id :mime-version:references:reply-to:subject:subject:to:to; s=fm3; t=1716986198; x=1717072598; bh=ychZd7Us7N/T1Sbt4W/WAcUYdXStmggz uHjrF4DDr3U=; b=pxjGJj1hPF0GinvqlzMEMonoK9rKH/hGMz4268MsQptOr5Vn QhuN7iVuEWvC3rcXpkVA+pxgtYcXoqQpiVCG5C/Uxh6hGLYsi2ZyP2HLdRhVFHxY gXKkHvA6USAeRWGl08fdMeOUBi5qRIydv8gwaZ0tveTEIq8HBvDNALOZOdo2ey+r dd857HSGZSTiGw0Z8puJXoAPdkC7AySq4vAXLbY2nFGSuSH3K2aOKUS2e3o+KRYm HeHG2YAAQfPbj9LchnE9BCWRSw3JV0njlKJmZEe8S7TtyLmHFSowrmrvZMQ9SupZ vw9qLPWH69AFGpRmj4Z37mHSKvIjJ3jr25rjaA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:content-type:date:date:feedback-id:feedback-id :from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to:x-me-proxy:x-me-proxy :x-me-sender:x-me-sender:x-sasl-enc; s=fm1; t=1716986198; x= 1717072598; bh=ychZd7Us7N/T1Sbt4W/WAcUYdXStmggzuHjrF4DDr3U=; b=K o67N8XZiR9R32ewig7bkLQ6/yOvrfOvG+WP4BLmRnsy7RFbkc2IrP8JECYxtOEoO Z/wk3N6V9o2IuKzCspVt47Q2X/m9xCddloGY0A5Mtg5oTqbuAmMzdTSJDYFwYBOo z+VKoJlKXjRdeOdroOZIaOFbd9rPNBbMYveeC+z+PSYhl2qghwUk76sK03BB1Xg4 Wr7qTQBtD2QzeoquI2hympm1w3oXfVT+kb0oDF1iNvinyWB9dBe7Bzf8hzU2Hlt9 mMwRXzXvK2lwfnfV9uC+q66G6TxB3NXLlBKwzDGU+IbU2aCAnn7hszMYXPeGWsza iT7k1bTlS/T/k2ZT5xORw== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvledrvdekuddghedvucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucenucfjughrpegtggfuhfgjffevgffkfhfvofesth hqmhdthhdtjeenucfhrhhomheptehlvgigrghnuggvrhcuufhtrghusghouceorghlvgig sehpuhhrvghfihgtthhiohhnrdhnvghtqeenucggtffrrghtthgvrhhnpedtgeefkeeiud ejgeeluddulefhfeeujefhgeehfeegfeetgeeivdehgefhgfdvudenucevlhhushhtvghr ufhiiigvpedtnecurfgrrhgrmhepmhgrihhlfhhrohhmpegrlhgvgiesphhurhgvfhhitg htihhonhdrnhgvth X-ME-Proxy: Feedback-ID: i341740b3:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Wed, 29 May 2024 08:36:37 -0400 (EDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3774.500.171.1.1\)) Subject: Re: Use of inefficient index in the presence of dead tuples From: Alexander Staubo In-Reply-To: <2771.1716944001@sss.pgh.pa.us> Date: Wed, 29 May 2024 14:36:24 +0200 Cc: "pgsql-general@postgresql.org" Content-Transfer-Encoding: quoted-printable Message-Id: <2C67231C-0A63-4B9D-AA9D-8BF69D29BC3C@purefiction.net> References: <2771.1716944001@sss.pgh.pa.us> To: Tom Lane X-Mailer: Apple Mail (2.3774.500.171.1.1) List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk > On 29 May 2024, at 02:53, Tom Lane wrote: >=20 > Alexander Staubo writes: >> (2) Set up schema. It's important to create the index before = insertion, in order to provoke a >> situation where the indexes have dead tuples: >> ... >> (4) Then ensure all tuples are dead except one: >=20 >> DELETE FROM outbox_batches; >> INSERT INTO outbox_batches (receiver, id) VALUES ('dummy', = 'test'); >=20 >> (5) Analyze: >=20 >> ANALYZE outbox_batches; >=20 > So the problem here is that the ANALYZE didn't see any of the dead = rows > and thus there is no way to know that they all match 'dummy'. The = cost > estimation is based on the conclusion that there is exactly one row > that will pass the index condition in each case, and thus the "right" > index doesn't look any cheaper than the "wrong" one --- in fact, it > looks a little worse because of the extra access to the visibility > map that will be incurred by an index-only scan. >=20 > I'm unpersuaded by the idea that ANALYZE should count dead tuples. > Since those are going to go away pretty soon, we would risk > estimating on the basis of no-longer-relevant stats and thus > creating problems worse than the one we solve. Mind you, =E2=80=9Cpretty soon=E2=80=9D could actually be =E2=80=9Chours" = if a pg_dump is running, or some other long-running transaction is = holding back the xmin. Granted, long-running transactions should be = avoided, but they happen, and the result is operationally surprising. I have another use case where I used a transaction to do lock a resource = to prevent concurrent access. I.e. the logic did =E2=80=9CSELECT =E2=80=A6= FROM =E2=80=A6 WHERE id =3D $1 FOR UPDATE=E2=80=9D and held that = transaction open for hours while doing maintenance. This ended up = causing the exact same index issue with dead tuples, with some queries = taking 30 minutes where they previously took just a few milliseconds. In = retrospect, this process should have used advisory locks to avoid = holding back vacuums. But the point stands that a small amount dead = tuple cruft can massively skew performance in surprising ways. > What is interesting here is that had you done ANALYZE *before* > the delete-and-insert, you'd have been fine. So it seems like > somewhat out-of-date stats would have benefited you. >=20 > It would be interesting to see a non-artificial example that took > into account when the last auto-vacuum and auto-analyze really > happened, so we could see if there's any less-fragile way of > dealing with this situation. Just to clarify, this is a real use case, though the repro is of course = artificial since the real production case is inserting and deleting rows = very quickly. According to collected metrics, the average time since the last = autoanalyze is around 20 seconds for this table, same for autovacuum. = The times I have observed poor performance is in situations where the = autovacuum was not able reclaim non-removable rows, i.e. it=E2=80=99s = not the absence of autovacuum, but rather the inability to clear up dead = tuples.