Content-Type: text/plain;
	charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3774.500.171.1.1\))
Subject: Re: Use of inefficient index in the presence of dead tuples
From: Alexander Staubo <alex@purefiction.net>
In-Reply-To: <2771.1716944001@sss.pgh.pa.us>
Date: Wed, 29 May 2024 14:36:24 +0200
Cc: "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Content-Transfer-Encoding: quoted-printable
Message-Id: <2C67231C-0A63-4B9D-AA9D-8BF69D29BC3C@purefiction.net>
References: <DC43B9C3-7BCB-4671-A69E-B0061C710241@purefiction.net>
 <2771.1716944001@sss.pgh.pa.us>
To: Tom Lane <tgl@sss.pgh.pa.us>
Archived-At: <https://www.postgresql.org/message-id/2C67231C-0A63-4B9D-AA9D-8BF69D29BC3C%40purefiction.net>
Precedence: bulk

> On 29 May 2024, at 02:53, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Alexander Staubo <alex@purefiction.net> writes:
>> (2) Set up schema. It's important to create the index before insertion,
>> in order to provoke a situation where the indexes have dead tuples:
>> ...
>> (4) Then ensure all tuples are dead except one:
>
>>    DELETE FROM outbox_batches;
>>    INSERT INTO outbox_batches (receiver, id) VALUES ('dummy', 'test');
>
>> (5) Analyze:
>
>>    ANALYZE outbox_batches;
>
> So the problem here is that the ANALYZE didn't see any of the dead rows
> and thus there is no way to know that they all match 'dummy'.  The cost
> estimation is based on the conclusion that there is exactly one row
> that will pass the index condition in each case, and thus the "right"
> index doesn't look any cheaper than the "wrong" one --- in fact, it
> looks a little worse because of the extra access to the visibility
> map that will be incurred by an index-only scan.
>
> I'm unpersuaded by the idea that ANALYZE should count dead tuples.
> Since those are going to go away pretty soon, we would risk
> estimating on the basis of no-longer-relevant stats and thus
> creating problems worse than the one we solve.

Mind you, “pretty soon” could actually be “hours” if a pg_dump is
running, or some other long-running transaction is holding back the
xmin. Granted, long-running transactions should be avoided, but they
happen, and the result is operationally surprising.
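
For anyone who wants to check for this on their own system, here is a
minimal sketch of a query that lists the sessions most likely to be
holding back the xmin horizon. It relies only on the standard
pg_stat_activity view; the LIMIT and the column alias are arbitrary
choices of mine, and it does not cover replication slots or prepared
transactions, which can also hold the horizon back.

```sql
-- Sessions that hold a snapshot (backend_xmin) or an assigned
-- transaction ID (backend_xid), oldest first. Either one can keep
-- VACUUM from removing dead tuples.
SELECT pid,
       usename,
       state,
       xact_start,
       backend_xid,
       backend_xmin,
       greatest(age(backend_xid), age(backend_xmin)) AS horizon_age
FROM pg_stat_activity
WHERE backend_xid IS NOT NULL
   OR backend_xmin IS NOT NULL
ORDER BY horizon_age DESC NULLS LAST
LIMIT 10;
```

A session that shows up as "idle in transaction" with an old xact_start
is the usual suspect.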

I have another use case where I used a transaction to lock a resource
to prevent concurrent access. I.e. the logic did “SELECT … FROM …
WHERE id = $1 FOR UPDATE” and held that transaction open for hours
while doing maintenance. This ended up causing the exact same index
issue with dead tuples, with some queries taking 30 minutes where they
previously took just a few milliseconds. In retrospect, this process
should have used advisory locks to avoid holding back vacuums. But the
point stands that a small amount of dead-tuple cruft can massively skew
performance in surprising ways.

> What is interesting here is that had you done ANALYZE *before*
> the delete-and-insert, you'd have been fine.  So it seems like
> somewhat out-of-date stats would have benefited you.
>
> It would be interesting to see a non-artificial example that took
> into account when the last auto-vacuum and auto-analyze really
> happened, so we could see if there's any less-fragile way of
> dealing with this situation.

Just to clarify, this is a real use case, though the repro is of course
artificial since the real production case is inserting and deleting rows
very quickly.

According to collected metrics, the average time since the last
autoanalyze is around 20 seconds for this table, and the same for
autovacuum. The times I have observed poor performance are situations
where autovacuum ran but was not able to reclaim dead rows because they
were not yet removable, i.e. the problem is not the absence of
autovacuum, but rather its inability to clear up dead tuples.
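
For reference, a sketch of how this state can be observed, assuming the
table name from the repro (outbox_batches) and the standard statistics
views:

```sql
-- Recent autovacuum/autoanalyze timestamps alongside the dead-tuple
-- estimate. A large n_dead_tup despite a recent last_autovacuum
-- suggests the dead rows could not be removed yet.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_autovacuum,
       last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'outbox_batches';

-- A manual verbose vacuum reports how many dead row versions are
-- not yet removable.
VACUUM (VERBOSE) outbox_batches;
```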