Re: Use of inefficient index in the presence of dead tuples

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Alexander Staubo <[email protected]>
To: Laurenz Albe <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: Use of inefficient index in the presence of dead tuples
Date: Tue, 28 May 2024 15:20:46 +0200
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>

On 28 May 2024, at 13:02, Laurenz Albe <[email protected]> wrote:
> ANALYZE considers only the live rows, so PostgreSQL knows that the query will
> return only few results.  So it chooses the smaller index rather than the one
> that matches the WHERE condition perfectly.
> 
> Unfortunately, it has to wade through all the deleted rows, which is slow.

Sounds like the planner _should_ take the dead tuples into account. I’m surprised there are no parameters to tweak to make the planner understand that one index is more selective even though it is technically larger.

> But try to execute the query a second time, and it will be much faster.
> PostgreSQL marks the index entries as "dead" during the first execution, so the
> second execution won't have to look at the heap any more.

Of course. It’s still not _free_; it’s still trawling through many megabytes of dead data, and going through the shared buffer cache and therefore competing with other queries that need shared buffers. 

> I understand your pain, but your use case is somewhat unusual.

I don’t think rapidly updated tables is an unusual use of Postgres, nor is the problem of long-running transaction preventing dead tuple vacuuming.

> What I would consider in your place is
> a) running an explicit VACUUM after you delete lots of rows or

The rows are deleted individually. It’s just that there are many transactions doing it concurrently.

> b) using partitioning to get rid of old data

Partitioning will generate dead tuples in the original partition when tuples are moved to the other partition, so I’m not sure how that would help?

I did explore a solution which is my “plan B” — adding a “done” column, then using “UPDATE … SET done = true” rather than deleting the rows. This causes dead tuples, of course, but then adding a new index with a “… WHERE NOT done” filter fixes the problem by forcing the query to use the right index. However, with this solution, rows will still have to be deleted *sometime*, so this just delays the problem. But it would allow a “batch cleanup”: “DELETE … WHERE done; VACUUM” in one fell swoop.

view thread (8+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected]
  Subject: Re: Use of inefficient index in the presence of dead tuples
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox