Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sC7ZK-003Ien-8l for pgsql-general@arkaria.postgresql.org; Wed, 29 May 2024 00:53:31 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1sC7ZI-00CWi0-HV for pgsql-general@arkaria.postgresql.org; Wed, 29 May 2024 00:53:28 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sC7ZI-00CWhs-6T for pgsql-general@lists.postgresql.org; Wed, 29 May 2024 00:53:28 +0000 Received: from sss.pgh.pa.us ([68.162.161.243]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sC7ZC-002Ret-UL for pgsql-general@postgresql.org; Wed, 29 May 2024 00:53:26 +0000 Received: from pro.sss.pgh.pa.us (localhost [127.0.0.1]) by sss.pgh.pa.us (8.15.2/8.15.2) with ESMTP id 44T0rLto1491324; Tue, 28 May 2024 20:53:21 -0400 From: Tom Lane To: Alexander Staubo cc: "pgsql-general@postgresql.org" Subject: Re: Use of inefficient index in the presence of dead tuples In-reply-to: References: Comments: In-reply-to Alexander Staubo message dated "Tue, 28 May 2024 10:00:22 +0200" MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-ID: <2770.1716944001.1@sss.pgh.pa.us> Content-Transfer-Encoding: quoted-printable Date: Tue, 28 May 2024 17:53:21 -0700 Message-ID: <2771.1716944001@sss.pgh.pa.us> List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Alexander Staubo writes: > (2) Set up schema. It's important to create the index before insertion, = in order to provoke a > situation where the indexes have dead tuples: > ... > (4) Then ensure all tuples are dead except one: > DELETE FROM outbox_batches; > INSERT INTO outbox_batches (receiver, id) VALUES ('dummy', 'test'); > (5) Analyze: > ANALYZE outbox_batches; So the problem here is that the ANALYZE didn't see any of the dead rows and thus there is no way to know that they all match 'dummy'. The cost estimation is based on the conclusion that there is exactly one row that will pass the index condition in each case, and thus the "right" index doesn't look any cheaper than the "wrong" one --- in fact, it looks a little worse because of the extra access to the visibility map that will be incurred by an index-only scan. I'm unpersuaded by the idea that ANALYZE should count dead tuples. Since those are going to go away pretty soon, we would risk estimating on the basis of no-longer-relevant stats and thus creating problems worse than the one we solve. What is interesting here is that had you done ANALYZE *before* the delete-and-insert, you'd have been fine. So it seems like somewhat out-of-date stats would have benefited you. It would be interesting to see a non-artificial example that took into account when the last auto-vacuum and auto-analyze really happened, so we could see if there's any less-fragile way of dealing with this situation. regards, tom lane