Re: Use of inefficient index in the presence of dead tuples

From: Tom Lane <[email protected]>
To: Alexander Staubo <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: Use of inefficient index in the presence of dead tuples
Date: Tue, 28 May 2024 17:53:21 -0700
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
References: <[email protected]>

Alexander Staubo <[email protected]> writes:
> (2) Set up schema. It's important to create the index before insertion, in order to provoke a
> situation where the indexes have dead tuples:
> ...
> (4) Then ensure all tuples are dead except one:

>     DELETE FROM outbox_batches;
>     INSERT INTO outbox_batches (receiver, id) VALUES ('dummy', 'test');

> (5) Analyze:

>     ANALYZE outbox_batches;

So the problem here is that ANALYZE didn't see any of the dead rows,
and thus the planner has no way to know that they all match 'dummy'.  The cost
estimation is based on the conclusion that there is exactly one row
that will pass the index condition in each case, and thus the "right"
index doesn't look any cheaper than the "wrong" one --- in fact, it
looks a little worse because of the extra access to the visibility
map that will be incurred by an index-only scan.
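
For anyone following along, one way to see what ANALYZE actually
recorded is to look at the pg_stats view and the pg_class catalog.
This is only a sketch, assuming the table and column names from the
quoted steps (outbox_batches, receiver); what it returns depends on
the data and on when ANALYZE last ran:

```sql
-- Per-column statistics the planner uses for selectivity estimates.
SELECT attname, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'outbox_batches'
  AND attname = 'receiver';

-- The planner's row-count and page estimates for the table itself.
SELECT reltuples, relpages, relallvisible
FROM pg_class
WHERE relname = 'outbox_batches';
```

If ANALYZE ran after the delete-and-insert, one would expect these
numbers to describe only the single live row, which is consistent
with the one-row estimate described above.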

I'm unpersuaded by the idea that ANALYZE should count dead tuples.
Since those are going to go away pretty soon, we would risk
estimating on the basis of no-longer-relevant stats and thus
creating problems worse than the one we solve.

What is interesting here is that had you done ANALYZE *before*
the delete-and-insert, you'd have been fine.  So it seems like
somewhat out-of-date stats would have benefited you.
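
As a sketch of that reordering, using only the statements quoted
above (the elided setup steps are assumed to have run already):

```sql
-- Gather statistics while the table still holds the bulk of the rows.
ANALYZE outbox_batches;

-- Then make all tuples dead except one, as in the quoted step (4).
DELETE FROM outbox_batches;
INSERT INTO outbox_batches (receiver, id) VALUES ('dummy', 'test');

-- No manual ANALYZE here, so the planner keeps the older statistics.
```

Note that this only holds until the next auto-analyze, which may be
triggered by the delete itself depending on the autovacuum settings.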

It would be interesting to see a non-artificial example that took
into account when the last auto-vacuum and auto-analyze really
happened, so we could see if there's any less-fragile way of
dealing with this situation.
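
One way to collect that information is the pg_stat_user_tables view.
A minimal query, assuming the table name from the quoted example:

```sql
-- Live/dead tuple estimates plus the last manual and automatic
-- vacuum and analyze timestamps for the table.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum,
       last_analyze,
       last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'outbox_batches';
```

Capturing this output alongside the EXPLAIN plan would show how stale
the statistics were relative to the dead tuples at the time the bad
plan was chosen.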

			regards, tom lane
