MIME-Version: 1.0
References: <78574B24-BE0A-42C5-8075-3FA9FA63B8FC@amazon.com>
In-Reply-To: <78574B24-BE0A-42C5-8075-3FA9FA63B8FC@amazon.com>
From: Matthias van de Meent <boekewurm+postgres@gmail.com>
Date: Mon, 10 Feb 2025 18:17:42 +0100
Message-ID: <CAEze2WjjOg+gE1VUZ2Omd-26MniaY6-UJghqzLZMHpVkDEUy8w@mail.gmail.com>
Subject: Re: Expanding HOT updates for expression and partial indexes
To: "Burd, Greg" <gregburd@amazon.com>
Cc: "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: <https://www.postgresql.org/message-id/CAEze2WjjOg%2BgE1VUZ2Omd-26MniaY6-UJghqzLZMHpVkDEUy8w%40mail.gmail.com>
Precedence: bulk

On Thu, 6 Feb 2025 at 23:24, Burd, Greg <gregburd@amazon.com> wrote:
>
> Attached find a patch that expands the cases where heap-only tuple (HOT)
> updates are possible without changing the basic semantics of HOT. This is
> accomplished by examining expression indexes for changes to determine if
> indexes require updating or not. A similar approach is taken for partial
> indexes, the predicate is evaluated and, in some cases, HOT updates are
> allowed. Even with this patch if any index is changed, all indexes are
> updated. Only in cases where none are modified will this patch allow the
> HOT path.

So, effectively this disables the amsummarizing-based optimizations of
https://postgr.es/c/19d8e2308 ? That sounds like a bad degradation in
behaviour.

> I’m also aware of PHOT [4] and WARM [5] which allow for updating some, but
> not all indexes while remaining on the HOT update path, this patch does not
> attempt to accomplish that.
>
> [...] This opens the door to future improvements by providing a way to pass
> a bitmap of modified indexes along to be addressed by something similar to
> the PHOT/WARM logic.

<sidetrack>

I have serious doubts about the viability of any proposal working to
implement PHOT/WARM in PostgreSQL, as they seem to have an inherent
nature of fundamentally breaking the TID lifecycle:
We won't be able to clean up a dead-to-everyone TID that was
PHOT-updated, because some index Y may still rely on it, and we can't
remove the TID from that same index Y because there is still a live
PHOT/WARM tuple later in the chain whose values for that index haven't
changed since that dead-to-everyone tuple, and thus this PHOT/WARM
tuple is the one pointed to by that index.
For HOT, this isn't much of an issue, because there is just one TID
that's impacted (and it only occupies a single LP slot, with
LP_REDIRECT). However, with PHOT/WARM, you could fairly easily fill a
page with TIDs (or even full tuples) you can't clean up with VACUUM
until the moment the PHOT/WARM/HOT chain is broken (due to an UPDATE
leaving the page or the final entry getting DELETE-d).

Unless we are somehow able to replace the TIDs in indexes from
"intermediate dead PHOT" to "base TID"/"latest TID" (either of which
is probably also problematic for indexes that expect a TID to appear
exactly once in the index at any point in time), I don't think the
system is viable if we maintain only a single data structure to
contain all dead TIDs. If we had a datastore for dead items per index,
that'd be more likely to work, but it would also significantly
increase the memory overhead of vacuuming tables.
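To make the TID-lifecycle argument concrete, here is a toy model in C. It is hypothetical and greatly simplified (no line pointers, pruning, or index AM behaviour is modelled), and `entry_point` and `tid_is_reclaimable` are invented names. Each version of one update chain records the key every index would store for it; a dead version stays pinned for as long as some index can only reach the live tuple through that version's TID.

```c
#include <stdbool.h>

/*
 * Toy model (hypothetical, not PostgreSQL code) of one PHOT/WARM-style
 * update chain on a single heap page. keys[v * nindexes + i] is the value
 * index i would store for tuple version v. Version nversions - 1 is the
 * only live version; all earlier versions are assumed dead to everyone.
 */

/*
 * Index i must keep pointing at the first version of the trailing run of
 * versions whose key for index i equals the live version's key: that TID
 * is the index's only way into the chain for the live tuple.
 */
static int
entry_point(const int *keys, int nversions, int nindexes, int i)
{
    int         live = nversions - 1;
    int         v = live;

    while (v > 0 && keys[(v - 1) * nindexes + i] == keys[live * nindexes + i])
        v--;
    return v;
}

/* A dead version is reclaimable only if no index enters the chain through it. */
static bool
tid_is_reclaimable(const int *keys, int nversions, int nindexes, int v)
{
    if (v == nversions - 1)
        return false;           /* the live version is never reclaimable */
    for (int i = 0; i < nindexes; i++)
        if (entry_point(keys, nversions, nindexes, i) == v)
            return false;
    return true;
}
```

With key rows {10, 20}, {11, 20}, {11, 21} (the first update changes only index 0's column, the second only index 1's), version 0 is reclaimable while the dead version 1 is not, because index 0 still reaches the live tuple through it. In this model such a version stays pinned until the chain is broken.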

</sidetrack>

> I have a few concerns with the patch, things I’d greatly appreciate your
> thoughts on:
>
> First, I pass an EState along the update path to enable running the checks
> in heapam, this works but leaves me feeling as if I violated separation of
> concerns. If there is a better way to do this let me know or if you think
> the cost of creating one in the execIndexing.c ExecIndexesRequiringUpdates()
> is okay that’s another possibility.

I think that doesn't have to be bad.

> Third, there is overhead to this patch, it is no longer a single simple
> bitmap test to choose HOT or not in heap_update().

Why can't it mostly be that simple in simple cases?

I mean, it's clear that "updated indexed column's value == non-HOT
update". And that to determine whether an updated *projected* column's
value (i.e., expression index column's value) was actually updated we
need to calculate the previous and current index value, thus execute
the projection twice. But why would we have significant additional
overhead if there are no expression indexes, or when we can know by
bitmap overlap that the only interesting cases are summarizing
indexes?

I would've implemented this with (1) two new bitmaps, one each for
normal and summarizing indexes, each containing the columns that are
used exclusively in expression indexes (and which should thus trigger
the (comparatively) expensive recalculation).

Then, I'd maintain a (cached) list of unique projections/expressions
found in indexes, so that 30 indexes on e.g.
((mycolumn::jsonb)->>'metadata') result in only 1 check for
differences, rather than 30. The "new" output of these expression
evaluations would be stored to be used later as index datums, reducing
the number of per-expression evaluations to 2 at most, rather than
2+1 when the index needs an insertion but the expression itself
wasn't updated.
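The deduplication idea can be sketched as a small cache of distinct index expressions. This is a hypothetical illustration, not PostgreSQL code: `ExprCache`, `expr_cache_slot`, `expr_cache_compare` and the `EvalFn` callback are invented names, and expressions are keyed by their text only for simplicity (real code would compare expression trees, e.g. with `equal()`).

```c
#include <stdbool.h>
#include <string.h>

#define MAX_EXPRS 64

/* Hypothetical cache of the distinct expressions used by a table's indexes. */
typedef struct ExprCache
{
    const char *exprs[MAX_EXPRS];   /* distinct expressions, by text */
    int         nexprs;
} ExprCache;

/* Return the slot for expr, adding it if not seen yet; -1 if the cache is full. */
static int
expr_cache_slot(ExprCache *cache, const char *expr)
{
    for (int i = 0; i < cache->nexprs; i++)
        if (strcmp(cache->exprs[i], expr) == 0)
            return i;
    if (cache->nexprs == MAX_EXPRS)
        return -1;
    cache->exprs[cache->nexprs] = expr;
    return cache->nexprs++;
}

/* Hypothetical evaluator: computes one expression's value for one tuple. */
typedef long (*EvalFn) (const char *expr, const void *tuple);

/*
 * Evaluate each distinct expression once for the old tuple and once for
 * the new one, keep the new values for later use as index datums, and
 * record which values changed. Returns the number of evaluations done.
 */
static int
expr_cache_compare(const ExprCache *cache, EvalFn eval,
                   const void *oldtup, const void *newtup,
                   long *newvals, bool *changed)
{
    int         nevals = 0;

    for (int i = 0; i < cache->nexprs; i++)
    {
        long        oldval = eval(cache->exprs[i], oldtup);
        long        newval = eval(cache->exprs[i], newtup);

        nevals += 2;
        newvals[i] = newval;
        changed[i] = (oldval != newval);
    }
    return nevals;
}
```

Registering `((mycolumn::jsonb)->>'metadata')` for 30 indexes yields a single slot, so the comparison step evaluates it twice rather than 60 times, and `newvals` can be reused when forming the index tuples.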

So, it'd be something like (pseudocode):

if (bms_overlap(updated_columns, hotblocking))
    /* if columns only indexed through expressions were updated, do
     * expensive stuff. Otherwise, it's a normal non-HOT update. */
    if (bms_subset_compare(updated_columns, hot_expression_columns)
            in (BMS_EQUAL, BMS_SUBSET1))
        expensive check for expression changes + populate index column data
    else
        normal_update
else if (bms_overlap(updated_columns, summarizing))
    /* same as above for hotblocking, but now summarizing */
    if (bms_subset_compare(updated_columns, sum_expression_columns)
            in (BMS_EQUAL, BMS_SUBSET1))
        expensive check for summarized expression changes + populate
            summarized index column data
    else
        summarizing_update
else
    hot_update
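The decision tree above can be made concrete with plain 64-bit masks. This is a minimal, hypothetical sketch: `AttrMask`, `choose_update_kind` and `expression_only_columns` are invented names, the masks stand in for Bitmapset (where `bms_is_subset()` gives the same answer as the `bms_subset_compare()` test), and the expensive re-evaluation is reduced to two precomputed flags. What happens after the expensive check finds no change is not spelled out above; this sketch assumes it falls through to the summarizing checks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit n set = attribute n; a stand-in for PostgreSQL's Bitmapset. */
typedef uint64_t AttrMask;

typedef enum UpdateKind
{
    UPDATE_HOT,                 /* no index needs a new entry */
    UPDATE_SUMMARIZING,         /* only summarizing indexes need new entries */
    UPDATE_NORMAL               /* every index needs a new entry */
} UpdateKind;

/* Same test as bms_subset_compare(a, b) in (BMS_EQUAL, BMS_SUBSET1). */
static bool
mask_is_subset(AttrMask a, AttrMask b)
{
    return (a & ~b) == 0;
}

/* Columns that appear in index expressions but never as plain key columns. */
static AttrMask
expression_only_columns(AttrMask expr_cols, AttrMask plain_cols)
{
    return expr_cols & ~plain_cols;
}

/*
 * hot_exprs_changed / sum_exprs_changed stand in for the expensive
 * re-evaluation; a real implementation would compute them lazily, only
 * when the cheap mask tests lead into that branch.
 */
static UpdateKind
choose_update_kind(AttrMask updated,
                   AttrMask hotblocking, AttrMask hot_expression_columns,
                   bool hot_exprs_changed,
                   AttrMask summarizing, AttrMask sum_expression_columns,
                   bool sum_exprs_changed)
{
    if (updated & hotblocking)
    {
        /* some updated column is not expression-only, or a value changed */
        if (!mask_is_subset(updated, hot_expression_columns) || hot_exprs_changed)
            return UPDATE_NORMAL;
        /* expression values unchanged: keep checking summarizing indexes */
    }
    if (updated & summarizing)
    {
        if (!mask_is_subset(updated, sum_expression_columns) || sum_exprs_changed)
            return UPDATE_SUMMARIZING;
    }
    return UPDATE_HOT;
}
```

When no expression-only column is in the updated set, only a few AND/NOT operations run, which is the "mostly that simple in simple cases" property argued for earlier.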

Note that it is relatively expensive to check whether any one index
needs to be updated. It's generally cheaper to do all those checks at
once, where possible; using one or 2 more bitmaps would be sufficient.

Also note that this approach doesn't update specific summarizing
indexes, just all of them or none. I think that "update only
summarizing indexes that were updated" should be a separate patch from
"check if indexed expressions' values changed", potentially in the
patchset, but not as part of the main bulk.

> Fourth, I’d like to know which version the community prefers (v3 or v4).
> I think v4 moves the code in a direction that is cleaner overall, but you
> may disagree. I realize that the way I use the modified_indexes bitmapset
> is a tad overloaded (NULL means all indexes should be updated, otherwise
> only update the indexes in the set which may be all/some/none of the
> indexes) and that may violate the principle of least surprise but I feel
> that it is better than the TU_UpdateIndexes enum in the code today.

I would be hesitant to let table AMs decide which indexes to update at
that precision. Note that this API would allow the AM to update only
(say) the PK index and no other indexes, which is not allowed to
happen if index consistency is required (which it is).
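If an AM were handed that precision anyway, the executor could at least reject results that break index consistency. A minimal sketch, assuming a bitmask per index and that the NULL ("all indexes") case is handled before the call; `modified_indexes_consistent` is a hypothetical helper, not an existing PostgreSQL function, and it checks only a necessary condition: the non-summarizing indexes must be updated all together or not at all.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical executor-side check of an index set returned by a table AM.
 * Bit i set means index i gets a new entry. Only the non-summarizing
 * indexes are checked (all or none); summarizing indexes are not judged.
 */
static bool
modified_indexes_consistent(uint64_t modified, uint64_t summarizing,
                            int nindexes)
{
    uint64_t    all = (nindexes >= 64) ? UINT64_MAX
        : (UINT64_C(1) << nindexes) - 1;
    uint64_t    plain = all & ~summarizing;
    uint64_t    plain_modified = modified & plain;

    return plain_modified == 0 || plain_modified == plain;
}
```

With a PK index 0, a second btree index 1 and a BRIN index 2, a result that updates only the PK index (mask 0x1) is rejected, while "none" (0x0) and "all plain indexes" (0x3 or 0x7) pass.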

----->8-----

Do you have any documentation on the approaches used, and the specific
differences between v3 and v4? I don't see much of that in your
initial mail, and the patches themselves also don't show much of that
in their details. I'd like at least some documentation of the new
behaviour in src/backend/access/heap/README.HOT at some point before
this gets marked as RFC in the commitfest app, though preferably sooner
rather than later.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)