public inbox for [email protected]
help / color / mirror / Atom feedRe: Expanding HOT updates for expression and partial indexes
24+ messages / 4 participants
[nested] [flat]
* Re: Expanding HOT updates for expression and partial indexes
@ 2026-02-16 19:36 Jeff Davis <[email protected]>
2026-02-17 21:15 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
0 siblings, 1 reply; 24+ messages in thread
From: Jeff Davis @ 2026-02-16 19:36 UTC (permalink / raw)
To: Greg Burd <[email protected]>; pgsql-hackers
On Sun, 2026-02-15 at 15:39 -0500, Greg Burd wrote:
> (2) Types don't influence this decision today. Their equality
> operators are not used, attributes are memcmp(). This is a
> requirement for index-only scans. I think it's insufficient for
> types like JSONB which have internal structure that can be extracted
> to form index keys, but that's for a later commit.
That's an interesting path but would require more infrastructure, and
to justify going down that path we should look for other opportunities
to use that type infra beyond just HOT. Brainstorming: maybe something
in the planner can make use of intelligence around expressions that are
effectively setter/accessor methods on complex types? (Obviously work
for later.)
> I can see how this might be confusing, you're asking a good
> question. Why not just add the mix_attrs as an argument to the table
> AM update call and be done?
>
> 1. Bitmapsets that are NULL mean empty, so how would
> simple_heap_update() signal to heap_update() that it needs to
> determine the modified indexed attributes? We'd have to add a bool
> along with the mix_attrs Bitmapset to indicate: "we've not calculated
> the set yet, you need to do that."
Maybe an extra bool is not ideal, but it's better than moving code onto
the wrong side of an API boundary.
> 2. After fetching the exclusive buffer lock there is the test
> `!ItemIdIsNormal(lp)` to cover the case where a simple_heap_update()
> the otid origin is the syscache, there is no pin or snapshot, and so
> there might be LP_* states other than LP_NORMAL due to concurrent
> pruning. This only happens when updating catalog tuples, so this
> logic need not be present at all in the heapam_tuple_update(). Yes,
> the if() branch will be fast (frequently predicted by the CPU) but
> this feels like logic specific to the update of catalog tuples.
If convenient I'm fine with moving that branch out, but I think it
needs to be done with the buffer locked (right?), so heap_update()
looks like the right place for that test for now.
> 3. HeapDetermineColumnsInfo() actually does more than find the
> modified indexed attributes, it also performs half of the check for
> the requirement to WAL log the replica identity attributes. The
> replacement function in the executor doesn't do this work, so that is
> coded into heapam_tuple_update() but not simple_heap_update(). The
> second half is in ExtractReplicaIdentity() that happens later in the
> heap_update() function after determining if HOT is a possibility or
> not.
IIUC you are saying that the decision is too heap-specific to expose to
the executor. I think that's true today (as ExtractReplicaIdentity() is
in heapam.c), but perhaps that's not fundamental: TOAST is not heap-
specific, replica IDs are not heap-specific, and if you are WAL-logging
a replica identity key it seems like you need to know whether it's
external or not regardless of the AM.
I'm not asking for a change here, just trying to understand the API
boundaries.
> I have moved these changes back into heap_update(), add the mix_attrs
> and mix_attrs_valid to see how things look, that's the attached
> patch.
Thank you -- that's easier to understand.
Why does simple_heap_update() need to do the HeapDetermineColumnsInfo()
inside heap_update()? It seems like you're trying to avoid doing the
same work the executor is doing to determine the modified_attrs bitmap,
but either (a) the work is cheap; or (b) the work to make the bitmap is
expensive.
If (a), then just construct the correct bitmap in simple_heap_update()
and simplify the code. If (b), then optimizing the simple_heap_update()
case isn't good enough, we need to find ways of avoiding the work in
the most common cases in the executor, as well.
Regards,
Jeff Davis
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: Expanding HOT updates for expression and partial indexes
2026-02-16 19:36 Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
@ 2026-02-17 21:15 ` Greg Burd <[email protected]>
2026-02-19 20:32 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-19 20:43 ` Re: Expanding HOT updates for expression and partial indexes Andres Freund <[email protected]>
0 siblings, 2 replies; 24+ messages in thread
From: Greg Burd @ 2026-02-17 21:15 UTC (permalink / raw)
To: Jeff Davis <[email protected]>; +Cc: pgsql-hackers
On Mon, Feb 16, 2026, at 2:36 PM, Jeff Davis wrote:
> On Sun, 2026-02-15 at 15:39 -0500, Greg Burd wrote:
>
>> (2) Types don't influence this decision today. Their equality
>> operators are not used, attributes are memcmp(). This is a
>> requirement for index-only scans. I think it's insufficient for
>> types like JSONB which have internal structure that can be extracted
>> to form index keys, but that's for a later commit.
>
> That's an interesting path but would require more infrastructure, and
> to justify going down that path we should look for other opportunities
> to use that type infra beyond just HOT. Brainstorming: maybe something
> in the planner can make use of intelligence around expressions that are
> effectively setter/accessor methods on complex types? (Obviously work
> for later.)
Agreed, but this is my best idea at the moment to re-introduce $subjet into this patch set. I'm open to other ideas, but fundamentally to allow $subjet requires that somewhere we discover that while portions of the JSONB attribute changed during the update the resulting index key attributes extracted from the JSONB did not.
In all patches v1-v29 that was discovered by evaluating the index expressions on the old and new tuples and comparing the index key datum that resulted for equality. When equal, while the attribute changed the indexes need not opening the door for a HOT update. The overhead of that wasn't horrible, but it does change some expectations about how often expressions are evaluated and that might be confusing.
>> I can see how this might be confusing, you're asking a good
>> question. Why not just add the mix_attrs as an argument to the table
>> AM update call and be done?
>>
>> 1. Bitmapsets that are NULL mean empty, so how would
>> simple_heap_update() signal to heap_update() that it needs to
>> determine the modified indexed attributes? We'd have to add a bool
>> along with the mix_attrs Bitmapset to indicate: "we've not calculated
>> the set yet, you need to do that."
>
> Maybe an extra bool is not ideal, but it's better than moving code onto
> the wrong side of an API boundary.
I'm not sure this is an "API boundary", but I'm open to debate on that. The attached patch does add a bool to heap_update() in addition to the Bitmapset of modified/indexed attributes (modified_attrs) and IMO is fine although I don't like that it further complicates (muddies?) an already huge and highly complicated function, heap_update().
>> 2. After fetching the exclusive buffer lock there is the test
>> `!ItemIdIsNormal(lp)` to cover the case where a simple_heap_update()
>> the otid origin is the syscache, there is no pin or snapshot, and so
>> there might be LP_* states other than LP_NORMAL due to concurrent
>> pruning. This only happens when updating catalog tuples, so this
>> logic need not be present at all in the heapam_tuple_update(). Yes,
>> the if() branch will be fast (frequently predicted by the CPU) but
>> this feels like logic specific to the update of catalog tuples.
>
> If convenient I'm fine with moving that branch out, but I think it
> needs to be done with the buffer locked (right?), so heap_update()
> looks like the right place for that test for now.
>
>> 3. HeapDetermineColumnsInfo() actually does more than find the
>> modified indexed attributes, it also performs half of the check for
>> the requirement to WAL log the replica identity attributes. The
>> replacement function in the executor doesn't do this work, so that is
>> coded into heapam_tuple_update() but not simple_heap_update(). The
>> second half is in ExtractReplicaIdentity() that happens later in the
>> heap_update() function after determining if HOT is a possibility or
>> not.
>
> IIUC you are saying that the decision is too heap-specific to expose to
> the executor. I think that's true today (as ExtractReplicaIdentity() is
> in heapam.c), but perhaps that's not fundamental: TOAST is not heap-
> specific, replica IDs are not heap-specific, and if you are WAL-logging
> a replica identity key it seems like you need to know whether it's
> external or not regardless of the AM.
>
> I'm not asking for a change here, just trying to understand the API
> boundaries.
I'm on the fence on this one, maybe all table AMs will need this same logic but at the moment it feels very heap-specific to me. For now it lives in heap_update() and HeapDetermineColumnsInfo(). I think I called this out only to say that if you look at the new vs old method for computing modified_attrs you'll see that's missing and done later in heap_update().
>> I have moved these changes back into heap_update(), add the mix_attrs
>> and mix_attrs_valid to see how things look, that's the attached
>> patch.
>
> Thank you -- that's easier to understand.
Excellent.
I've muddied the review again a bit by abstracting out a function ExecCompareSlots() which can loosely be thought of as a replacement for heap_attr_equal(). Doing that let's me vastly simplify the replication worker changes and reuse it after ExecBRUpdateTriggers().
> Why does simple_heap_update() need to do the HeapDetermineColumnsInfo()
> inside heap_update()? It seems like you're trying to avoid doing the
> same work the executor is doing to determine the modified_attrs bitmap,
> but either (a) the work is cheap; or (b) the work to make the bitmap is
> expensive.
simple_heap_update() is exclusively called during catalog tuple updates and does not involve the executor at all, these are direct calls into heap to store catalog tuples. These updates don't preserve the set of updated attributes (see cf-6221 [0]) and so for them to potentially use the HOT path in heap_update() we need to identify which attributes changed. This requires comparing the new heap tuple vs the one read from the buffer page. This is the same logic as before the patch. If I could come up with a simple method to retain the set of modified attributes during catalog tuple updates then we could excise HeapDetermineColumnsInfo() entirely.
> If (a), then just construct the correct bitmap in simple_heap_update()
> and simplify the code. If (b), then optimizing the simple_heap_update()
> case isn't good enough, we need to find ways of avoiding the work in
> the most common cases in the executor, as well.
Construct the correct bitmap how? That function is called with the otid and the updated HeapTuple, not enough to build the bitmap of modified indexed attributes.
I said in an earlier email that the results for the modified/indexed bitmap and the replica identity are identical. I added some test code (messy, but attached for your amusement should you want) that looks for differences between the two. It turns out that for replica identity the results are identical, for modified/indexed attributes there are a few differences.
In brin.sql at 409:
UPDATE brintest SET int8col = int8col * int4col;
This updates the indexed value of the int8col (attribute 4) which is correctly detected in the new code, but isn't in HeapDetermineColumnsInfo(). The "interesting_cols" are identical. I'm still investigating.
I'm also working on a good benchmark that hopefully can show that under heavy concurrent read/update mixed load the reduced exclusive lock time will allow more work to be performed. Some contrived benchmarks have shown 15% improvement, but I need to zero in on this before publishing results.
> Regards,
> Jeff Davis
Attached is a new version of the patch. The file heap-check.c contains the code I had been using to test for equal results across methods. That code is... well, it's just for me but I include it to show that I'm doing due diligence to ensure this stuff is either the same, or different for a good reason. Thanks for your continued interest in this work.
best.
-greg
[0] https://commitfest.postgresql.org/patch/6221/
Attachments:
[text/x-patch] v20260217-0001-Idenfity-modified-indexed-attributes-in-th.patch (31.6K, 2-v20260217-0001-Idenfity-modified-indexed-attributes-in-th.patch)
download | inline diff:
From b7cf5928cf400cc66ac8f6ecddd37362f46da386 Mon Sep 17 00:00:00 2001
From: Greg Burd <[email protected]>
Date: Sun, 2 Nov 2025 11:36:20 -0500
Subject: [PATCH v20260217] Idenfity modified indexed attributes in the
executor on UPDATE
Refactor executor update logic to determine which indexed columns have
actually changed during an UPDATE operation rather than leaving this up
to HeapDetermineColumnsInfo() in heap_update().
ExecCheckIndexedAttrsForChanges() replaces HeapDeterminesColumnsInfo()
and is called before table_tuple_update() crucially without the need
for an exclusive buffer lock on the page that holds the tuple being
updated. This reduces the time the lock is held later within
heapam_tuple_update() and heap_update().
ExecCheckIndexedAttrsForChanges() in turn uses ExecCompareSlots() to
identify which attributes have changed and then intersects that with the
set of indexed attributes to identify the modified indexed set.
Besides identifying the set of modified indexed attributes
HeapDetermineColumnsInfo() was also responsible for part of the logic
involed in the decision to include the replica identity key or not.
This now happens in heap_update() when modified_attrs_valid is false.
Catalog tuple updates use simple_heap_update() and don't pass a
modified_attrs Bitmapset into heap_update() indicated by the
modified_attrs_valid bool set to false.
Updates stemming from logical replication also use the new
ExecCheckIndexedAttrsForChanges() in ExecSimpleRelationUpdate().
Before row triggers on UPDATE may use heap_modify_tuple() to update
attributes not identified by ExecGetAllUpdatedCols() as is the case in
tsvector_update_trigger(). ExecBRUpdateTriggers() now identifies this
case and adds such attributes to ri_extraUpdatedCols. See tsearch.sql
tests with this trigger for an example of this.
---
src/backend/access/heap/heapam.c | 80 +++++++++++++++++++++---
src/backend/access/heap/heapam_handler.c | 7 +--
src/backend/access/table/tableam.c | 5 +-
src/backend/commands/trigger.c | 15 ++++-
src/backend/executor/execReplication.c | 12 +++-
src/backend/executor/execTuples.c | 74 ++++++++++++++++++++++
src/backend/executor/nodeModifyTable.c | 72 ++++++++++++++++++---
src/backend/replication/logical/worker.c | 15 +++--
src/backend/utils/cache/relcache.c | 15 +++++
src/include/access/heapam.h | 1 +
src/include/access/tableam.h | 8 ++-
src/include/catalog/index.h | 1 +
src/include/executor/executor.h | 10 +++
src/include/nodes/execnodes.h | 6 ++
src/include/utils/rel.h | 1 +
src/include/utils/relcache.h | 1 +
16 files changed, 287 insertions(+), 36 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 98d53caeea8..ab8b6ddb8de 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -3311,6 +3311,7 @@ TM_Result
heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
CommandId cid, Snapshot crosscheck, bool wait,
TM_FailureData *tmfd, LockTupleMode *lockmode,
+ const Bitmapset *modified_attrs, bool modified_attrs_valid,
TU_UpdateIndexes *update_indexes)
{
TM_Result result;
@@ -3320,7 +3321,6 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
Bitmapset *key_attrs;
Bitmapset *id_attrs;
Bitmapset *interesting_attrs;
- Bitmapset *modified_attrs;
ItemId lp;
HeapTupleData oldtup;
HeapTuple heaptup;
@@ -3345,7 +3345,7 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
bool all_visible_cleared_new = false;
bool checked_lockers;
bool locker_remains;
- bool id_has_external = false;
+ bool rep_id_key_required = false;
TransactionId xmax_new_tuple,
xmax_old_tuple;
uint16 infomask_old_tuple,
@@ -3487,9 +3487,69 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
* new tuple so we must include it as part of the old_key_tuple. See
* ExtractReplicaIdentity.
*/
- modified_attrs = HeapDetermineColumnsInfo(relation, interesting_attrs,
- id_attrs, &oldtup,
- newtup, &id_has_external);
+ if (!modified_attrs_valid)
+ {
+ bool id_has_external = false;
+
+ modified_attrs = HeapDetermineColumnsInfo(relation, interesting_attrs,
+ id_attrs, &oldtup,
+ newtup, &id_has_external);
+ rep_id_key_required = id_has_external ||
+ bms_overlap(modified_attrs, id_attrs);
+ }
+ else
+ {
+ /*
+ * ExtractReplicatIdentity() needs to know if a modified attrbute is
+ * used as a replica indentity or if any of the unmodified indexed
+ * attributes in the old tuple are stored externally and used as a
+ * replica identity.
+ */
+ rep_id_key_required = bms_overlap(modified_attrs, id_attrs);
+ if (!rep_id_key_required)
+ {
+ Bitmapset *attrs;
+ TupleDesc tupdesc = RelationGetDescr(relation);
+ int attidx = -1;
+
+ /* Check all unmodified indexed replica identity key attributes */
+ attrs = bms_difference(interesting_attrs, modified_attrs);
+ attrs = bms_int_members(attrs, id_attrs);
+
+ while ((attidx = bms_next_member(attrs, attidx)) >= 0)
+ {
+ /*
+ * attidx is zero-based, attrnum is the normal attribute
+ * number
+ */
+ AttrNumber attrnum = attidx + FirstLowInvalidHeapAttributeNumber;
+ Datum value;
+ bool isnull;
+
+ /*
+ * System attributes are not added into interesting_attrs in
+ * relcache.
+ */
+ Assert(attrnum > 0);
+
+ value = heap_getattr(&oldtup, attrnum, tupdesc, &isnull);
+
+ /* No need to check attributes that can't be stored externally */
+ if (isnull ||
+ TupleDescCompactAttr(tupdesc, attrnum - 1)->attlen != -1)
+ continue;
+
+ /* Check if the old tuple's attribute is stored externally */
+ if (VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value)))
+ {
+ rep_id_key_required = true;
+ break;
+ }
+ }
+
+ bms_free(attrs);
+ }
+ }
/*
* If we're not updating any "key" column, we can grab a weaker lock type.
@@ -3763,7 +3823,7 @@ l2:
bms_free(sum_attrs);
bms_free(key_attrs);
bms_free(id_attrs);
- bms_free(modified_attrs);
+ /* modified attrs is passed in and free'd by the caller, or NULL */
bms_free(interesting_attrs);
return result;
}
@@ -4111,8 +4171,7 @@ l2:
* columns are modified or it has external data.
*/
old_key_tuple = ExtractReplicaIdentity(relation, &oldtup,
- bms_overlap(modified_attrs, id_attrs) ||
- id_has_external,
+ rep_id_key_required,
&old_key_copied);
/* NO EREPORT(ERROR) from here till changes are logged */
@@ -4278,7 +4337,7 @@ l2:
bms_free(sum_attrs);
bms_free(key_attrs);
bms_free(id_attrs);
- bms_free(modified_attrs);
+ /* modified attrs is passed in and free'd by the caller, or NULL */
bms_free(interesting_attrs);
return TM_Ok;
@@ -4562,7 +4621,8 @@ simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tup
result = heap_update(relation, otid, tup,
GetCurrentCommandId(true), InvalidSnapshot,
true /* wait for commit */ ,
- &tmfd, &lockmode, update_indexes);
+ &tmfd, &lockmode,
+ NULL, false, update_indexes);
switch (result)
{
case TM_SelfModified:
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index cbef73e5d4b..2d74fa90c7f 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -312,12 +312,11 @@ heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart);
}
-
static TM_Result
heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
CommandId cid, Snapshot snapshot, Snapshot crosscheck,
- bool wait, TM_FailureData *tmfd,
- LockTupleMode *lockmode, TU_UpdateIndexes *update_indexes)
+ bool wait, TM_FailureData *tmfd, LockTupleMode *lockmode,
+ const Bitmapset *modified_attrs, TU_UpdateIndexes *update_indexes)
{
bool shouldFree = true;
HeapTuple tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);
@@ -328,7 +327,7 @@ heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
tuple->t_tableOid = slot->tts_tableOid;
result = heap_update(relation, otid, tuple, cid, crosscheck, wait,
- tmfd, lockmode, update_indexes);
+ tmfd, lockmode, modified_attrs, true, update_indexes);
ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
/*
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index dfda1af412e..42acd5b17a9 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -359,6 +359,7 @@ void
simple_table_tuple_update(Relation rel, ItemPointer otid,
TupleTableSlot *slot,
Snapshot snapshot,
+ const Bitmapset *mix_attrs,
TU_UpdateIndexes *update_indexes)
{
TM_Result result;
@@ -369,7 +370,9 @@ simple_table_tuple_update(Relation rel, ItemPointer otid,
GetCurrentCommandId(true),
snapshot, InvalidSnapshot,
true /* wait for commit */ ,
- &tmfd, &lockmode, update_indexes);
+ &tmfd, &lockmode,
+ mix_attrs,
+ update_indexes);
switch (result)
{
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 8df915f63fb..78c063939b0 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -2978,6 +2978,7 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
bool is_merge_update)
{
TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
+ TupleDesc tupdesc = RelationGetDescr(relinfo->ri_RelationDesc);
TupleTableSlot *oldslot = ExecGetTriggerOldSlot(estate, relinfo);
HeapTuple newtuple = NULL;
HeapTuple trigtuple;
@@ -2985,7 +2986,8 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
bool should_free_new = false;
TriggerData LocTriggerData = {0};
int i;
- Bitmapset *updatedCols;
+ Bitmapset *updatedCols = NULL;
+ Bitmapset *modifiedCols;
LockTupleMode lockmode;
/* Determine lock mode to use */
@@ -3127,6 +3129,17 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
if (should_free_trig)
heap_freetuple(trigtuple);
+ /*
+ * Before UPDATE triggers may have updated attributes not known to
+ * ExecGetAllUpdatedColumns() using heap_modify_tuple() or
+ * heap_modifiy_tuple_by_cols(). Find and record those now.
+ */
+ modifiedCols = ExecCompareSlots(relinfo, tupdesc, updatedCols,
+ NULL, oldslot, newslot);
+ relinfo->ri_extraUpdatedCols =
+ bms_add_members(relinfo->ri_extraUpdatedCols, modifiedCols);
+ bms_free(modifiedCols);
+
return true;
}
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 743b1ee2b28..2122fc33554 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -33,6 +33,7 @@
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
+#include "utils/relcache.h"
#include "utils/snapmgr.h"
#include "utils/syscache.h"
#include "utils/typcache.h"
@@ -899,6 +900,7 @@ ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
bool skip_tuple = false;
Relation rel = resultRelInfo->ri_RelationDesc;
ItemPointer tid = &(searchslot->tts_tid);
+ Bitmapset *mix_attrs;
/*
* We support only non-system tables, with
@@ -937,8 +939,16 @@ ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
if (rel->rd_rel->relispartition)
ExecPartitionCheck(resultRelInfo, slot, estate, true);
+ mix_attrs = ExecCheckIndexedAttrsForChanges(resultRelInfo,
+ estate, searchslot, slot);
+
+ /*
+ * We're not going to call ExecCheckIndexedAttrsForChanges here
+ * because we've already identified the changes earlier on thanks to
+ * slot_modify_data.
+ */
simple_table_tuple_update(rel, tid, slot, estate->es_snapshot,
- &update_indexes);
+ mix_attrs, &update_indexes);
conflictindexes = resultRelInfo->ri_onConflictArbiterIndexes;
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index b768eae9e53..9ad08b24c2f 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -66,6 +66,7 @@
#include "nodes/nodeFuncs.h"
#include "storage/bufmgr.h"
#include "utils/builtins.h"
+#include "utils/datum.h"
#include "utils/expandeddatum.h"
#include "utils/lsyscache.h"
#include "utils/typcache.h"
@@ -1929,6 +1930,79 @@ ExecFetchSlotHeapTupleDatum(TupleTableSlot *slot)
return ret;
}
+/*
+ * ExecCompareSlots
+ *
+ * Compare old and new TupleTableSlots to detect which attributes have changed.
+ *
+ * This function serves two purposes:
+ * 1) After trigger execution: detect trigger-modified columns
+ * (pass excluding=explicitly SET columns, including=NULL)
+ * 2) Index maintenance: detect changes in indexed columns
+ * (pass including=indexed AND explicitly SET columns, excluding=NULL)
+ *
+ * Parameters:
+ * resultRelInfo - relation information
+ * excluding - bitmapset of attributes to skip
+ * including - bitmapset of attributes to check
+ * tupdesc - RelationGetDescr(relation)
+ * old_tts - old tuple slot
+ * new_tts - new tuple slot
+ *
+ * If including is NULL, check all attributes EXCEPT those in excluding
+ * If excluding is NULL, check ONLY attributes in including
+ * If both are NULL, check all attributes
+ *
+ * Returns a Bitmapset of attribute indices (using FirstLowInvalidHeapAttributeNumber
+ * convention) that differ between the two slots.
+ */
+Bitmapset *
+ExecCompareSlots(ResultRelInfo *resultRelInfo,
+ TupleDesc tupdesc,
+ const Bitmapset *excluding,
+ const Bitmapset *including,
+ TupleTableSlot *old_tts,
+ TupleTableSlot *new_tts)
+{
+ Bitmapset *modified_attrs = NULL;
+
+ for (int i = 0; i < tupdesc->natts; i++)
+ {
+ AttrNumber attnum = i + 1;
+ AttrNumber attidx = attnum - FirstLowInvalidHeapAttributeNumber;
+ Datum old_value,
+ new_value;
+ bool old_null,
+ new_null;
+ CompactAttribute *att;
+
+ /* Determine whether to check this attribute */
+ if ((!bms_is_empty(including) && !bms_is_member(attidx, including)) ||
+ bms_is_member(attidx, excluding))
+ continue;
+
+ att = TupleDescCompactAttr(tupdesc, attnum - 1);
+ old_value = slot_getattr(old_tts, attnum, &old_null);
+ new_value = slot_getattr(new_tts, attnum, &new_null);
+
+ /* A change to/from NULL, so not equal */
+ if (old_null != new_null)
+ {
+ modified_attrs = bms_add_member(modified_attrs, attidx);
+ continue;
+ }
+
+ /* Both NULL, no change/unmodified */
+ if (old_null)
+ continue;
+
+ if (!datum_image_eq(old_value, new_value, att->attbyval, att->attlen))
+ modified_attrs = bms_add_member(modified_attrs, attidx);
+ }
+
+ return modified_attrs;
+}
+
/* ----------------------------------------------------------------
* convenience initialization routines
* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 6802fc13e95..afcc3da2835 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -17,6 +17,7 @@
* ExecModifyTable - retrieve the next tuple from the node
* ExecEndModifyTable - shut down the ModifyTable node
* ExecReScanModifyTable - rescan the ModifyTable node
+ * ExecCheckIndexedAttrsForChanges - find set of updated indexed columns
*
* NOTES
* The ModifyTable node receives input from its outerPlan, which is
@@ -54,6 +55,7 @@
#include "access/htup_details.h"
#include "access/tableam.h"
+#include "access/tupdesc.h"
#include "access/xact.h"
#include "commands/trigger.h"
#include "executor/execPartition.h"
@@ -188,6 +190,44 @@ static TupleTableSlot *ExecMergeNotMatched(ModifyTableContext *context,
ResultRelInfo *resultRelInfo,
bool canSetTag);
+/*
+ * ExecCheckIndexedAttrsForChanges
+ *
+ * Determine which indexes need updating by finding the set of modified indexed
+ * attributes.
+ *
+ * The goal is for the executor to know, ahead of calling into the table AM to
+ * process the update and before calling into the index AM for inserting new
+ * index tuples, which attributes in the new TupleTableSlot, if any, truely
+ * necessitate a new index tuple.
+ *
+ * Returns a Bitmapset of attributes that intersects with indexes which require
+ * a new index tuple.
+ */
+Bitmapset *
+ExecCheckIndexedAttrsForChanges(ResultRelInfo *resultRelInfo,
+ EState *estate,
+ TupleTableSlot *old_tts,
+ TupleTableSlot *new_tts)
+{
+ Relation relation = resultRelInfo->ri_RelationDesc;
+ TupleDesc tupdesc = RelationGetDescr(relation);
+ Bitmapset *upd_attrs = NULL; /* attributes set in UPDATE statement */
+ Bitmapset *mix_attrs;
+
+ /* If no indexes, we're done */
+ if (resultRelInfo->ri_NumIndices == 0)
+ return NULL;
+
+ /* Fetch the set of attributes explicity SET in the UPDATE statement */
+ upd_attrs = ExecGetAllUpdatedCols(resultRelInfo, estate);
+
+ /* Find out if any changed */
+ mix_attrs = ExecCompareSlots(resultRelInfo, tupdesc,
+ NULL, upd_attrs, old_tts, new_tts);
+
+ return mix_attrs;
+}
/*
* Verify that the tuples to be produced by INSERT match the
@@ -2197,14 +2237,17 @@ ExecUpdatePrepareSlot(ResultRelInfo *resultRelInfo,
*/
static TM_Result
ExecUpdateAct(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
- ItemPointer tupleid, HeapTuple oldtuple, TupleTableSlot *slot,
- bool canSetTag, UpdateContext *updateCxt)
+ ItemPointer tupleid, HeapTuple oldtuple, TupleTableSlot *oldSlot,
+ TupleTableSlot *slot, bool canSetTag, UpdateContext *updateCxt)
{
EState *estate = context->estate;
Relation resultRelationDesc = resultRelInfo->ri_RelationDesc;
bool partition_constraint_failed;
TM_Result result;
+ /* The set of modified indexed attributes that trigger new index entries */
+ Bitmapset *mix_attrs = NULL;
+
updateCxt->crossPartUpdate = false;
/*
@@ -2321,7 +2364,16 @@ lreplace:
ExecConstraints(resultRelInfo, slot, estate);
/*
- * replace the heap tuple
+ * Next up we need to find out the set of indexed attributes that have
+ * changed in value and should trigger a new index tuple. We could start
+ * with the set of updated columns via ExecGetUpdatedCols(), but if we do
+ * we will overlook attributes directly modified by heap_modify_tuple()
+ * which are not known to ExecGetUpdatedCols().
+ */
+ mix_attrs = ExecCheckIndexedAttrsForChanges(resultRelInfo, estate, oldSlot, slot);
+
+ /*
+ * Call into the table AM to update the heap tuple.
*
* Note: if es_crosscheck_snapshot isn't InvalidSnapshot, we check that
* the row to be updated is visible to that snapshot, and throw a
@@ -2335,6 +2387,7 @@ lreplace:
estate->es_crosscheck_snapshot,
true /* wait for commit */ ,
&context->tmfd, &updateCxt->lockmode,
+ mix_attrs,
&updateCxt->updateIndexes);
return result;
@@ -2553,8 +2606,9 @@ ExecUpdate(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
*/
redo_act:
lockedtid = *tupleid;
- result = ExecUpdateAct(context, resultRelInfo, tupleid, oldtuple, slot,
- canSetTag, &updateCxt);
+
+ result = ExecUpdateAct(context, resultRelInfo, tupleid, oldtuple, oldSlot,
+ slot, canSetTag, &updateCxt);
/*
* If ExecUpdateAct reports that a cross-partition update was done,
@@ -3404,8 +3458,8 @@ lmerge_matched:
Assert(oldtuple == NULL);
result = ExecUpdateAct(context, resultRelInfo, tupleid,
- NULL, newslot, canSetTag,
- &updateCxt);
+ NULL, resultRelInfo->ri_oldTupleSlot,
+ newslot, canSetTag, &updateCxt);
/*
* As in ExecUpdate(), if ExecUpdateAct() reports that a
@@ -3430,6 +3484,7 @@ lmerge_matched:
tupleid, NULL, newslot);
mtstate->mt_merge_updated += 1;
}
+
break;
case CMD_DELETE:
@@ -4537,7 +4592,7 @@ ExecModifyTable(PlanState *pstate)
* For UPDATE/DELETE/MERGE, fetch the row identity info for the tuple
* to be updated/deleted/merged. For a heap relation, that's a TID;
* otherwise we may have a wholerow junk attr that carries the old
- * tuple in toto. Keep this in step with the part of
+ * tuple in total. Keep this in step with the part of
* ExecInitModifyTable that sets up ri_RowIdAttNo.
*/
if (operation == CMD_UPDATE || operation == CMD_DELETE ||
@@ -4717,6 +4772,7 @@ ExecModifyTable(PlanState *pstate)
/* Now apply the update. */
slot = ExecUpdate(&context, resultRelInfo, tupleid, oldtuple,
oldSlot, slot, node->canSetTag);
+
if (tuplock)
UnlockTuple(resultRelInfo->ri_RelationDesc, tupleid,
InplaceUpdateTupleLock);
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 32725c48623..2d6d9d76f87 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -285,12 +285,14 @@
#include "storage/procarray.h"
#include "tcop/tcopprot.h"
#include "utils/acl.h"
+#include "utils/datum.h"
#include "utils/guc.h"
#include "utils/inval.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/pg_lsn.h"
#include "utils/rel.h"
+#include "utils/relcache.h"
#include "utils/rls.h"
#include "utils/snapmgr.h"
#include "utils/syscache.h"
@@ -2917,7 +2919,6 @@ apply_handle_update_internal(ApplyExecutionData *edata,
TupleTableSlot *localslot = NULL;
ConflictTupleInfo conflicttuple = {0};
bool found;
- MemoryContext oldctx;
EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1, NIL);
ExecOpenIndices(relinfo, false);
@@ -2956,16 +2957,16 @@ apply_handle_update_internal(ApplyExecutionData *edata,
}
/* Process and store remote tuple in the slot */
- oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
slot_modify_data(remoteslot, localslot, relmapentry, newtup);
- MemoryContextSwitchTo(oldctx);
EvalPlanQualSetSlot(&epqstate, remoteslot);
InitConflictIndexes(relinfo);
- /* Do the actual update. */
+ /* First check privileges */
TargetPrivilegesCheck(relinfo->ri_RelationDesc, ACL_UPDATE);
+
+ /* Then do the actual update. */
ExecSimpleRelationUpdate(relinfo, estate, &epqstate, localslot,
remoteslot);
}
@@ -3522,10 +3523,7 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
* Apply the update to the local tuple, putting the result in
* remoteslot_part.
*/
- oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
- slot_modify_data(remoteslot_part, localslot, part_entry,
- newtup);
- MemoryContextSwitchTo(oldctx);
+ slot_modify_data(remoteslot_part, localslot, part_entry, newtup);
EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1, NIL);
@@ -3549,6 +3547,7 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
EvalPlanQualSetSlot(&epqstate, remoteslot_part);
TargetPrivilegesCheck(partrelinfo->ri_RelationDesc,
ACL_UPDATE);
+
ExecSimpleRelationUpdate(partrelinfo, estate, &epqstate,
localslot, remoteslot_part);
}
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 6b634c9fff1..547cf1d054d 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -2477,6 +2477,7 @@ RelationDestroyRelation(Relation relation, bool remember_tupdesc)
bms_free(relation->rd_idattr);
bms_free(relation->rd_hotblockingattr);
bms_free(relation->rd_summarizedattr);
+ bms_free(relation->rd_indexedattr);
if (relation->rd_pubdesc)
pfree(relation->rd_pubdesc);
if (relation->rd_options)
@@ -5278,6 +5279,7 @@ RelationGetIndexPredicate(Relation relation)
* index (empty if FULL)
* INDEX_ATTR_BITMAP_HOT_BLOCKING Columns that block updates from being HOT
* INDEX_ATTR_BITMAP_SUMMARIZED Columns included in summarizing indexes
+ * INDEX_ATTR_BITMAP_INDEXED Columns referenced by indexes
*
* Attribute numbers are offset by FirstLowInvalidHeapAttributeNumber so that
* we can include system attributes (e.g., OID) in the bitmap representation.
@@ -5301,6 +5303,7 @@ RelationGetIndexAttrBitmap(Relation relation, IndexAttrBitmapKind attrKind)
Bitmapset *pkindexattrs; /* columns in the primary index */
Bitmapset *idindexattrs; /* columns in the replica identity */
Bitmapset *hotblockingattrs; /* columns with HOT blocking indexes */
+ Bitmapset *indexedattrs; /* columns referenced by indexes */
Bitmapset *summarizedattrs; /* columns with summarizing indexes */
List *indexoidlist;
List *newindexoidlist;
@@ -5324,6 +5327,8 @@ RelationGetIndexAttrBitmap(Relation relation, IndexAttrBitmapKind attrKind)
return bms_copy(relation->rd_hotblockingattr);
case INDEX_ATTR_BITMAP_SUMMARIZED:
return bms_copy(relation->rd_summarizedattr);
+ case INDEX_ATTR_BITMAP_INDEXED:
+ return bms_copy(relation->rd_indexedattr);
default:
elog(ERROR, "unknown attrKind %u", attrKind);
}
@@ -5368,6 +5373,7 @@ restart:
idindexattrs = NULL;
hotblockingattrs = NULL;
summarizedattrs = NULL;
+ indexedattrs = NULL;
foreach(l, indexoidlist)
{
Oid indexOid = lfirst_oid(l);
@@ -5500,10 +5506,14 @@ restart:
bms_free(idindexattrs);
bms_free(hotblockingattrs);
bms_free(summarizedattrs);
+ /* indexedattrs not yet initialized */
goto restart;
}
+ /* Set indexed attributes to track all referenced attributes */
+ indexedattrs = bms_union(hotblockingattrs, summarizedattrs);
+
/* Don't leak the old values of these bitmaps, if any */
relation->rd_attrsvalid = false;
bms_free(relation->rd_keyattr);
@@ -5516,6 +5526,8 @@ restart:
relation->rd_hotblockingattr = NULL;
bms_free(relation->rd_summarizedattr);
relation->rd_summarizedattr = NULL;
+ bms_free(relation->rd_indexedattr);
+ relation->rd_indexedattr = NULL;
/*
* Now save copies of the bitmaps in the relcache entry. We intentionally
@@ -5530,6 +5542,7 @@ restart:
relation->rd_idattr = bms_copy(idindexattrs);
relation->rd_hotblockingattr = bms_copy(hotblockingattrs);
relation->rd_summarizedattr = bms_copy(summarizedattrs);
+ relation->rd_indexedattr = bms_copy(indexedattrs);
relation->rd_attrsvalid = true;
MemoryContextSwitchTo(oldcxt);
@@ -5546,6 +5559,8 @@ restart:
return hotblockingattrs;
case INDEX_ATTR_BITMAP_SUMMARIZED:
return summarizedattrs;
+ case INDEX_ATTR_BITMAP_INDEXED:
+ return indexedattrs;
default:
elog(ERROR, "unknown attrKind %u", attrKind);
return NULL;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 3c0961ab36b..a56f3d1f378 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -368,6 +368,7 @@ extern TM_Result heap_update(Relation relation, const ItemPointerData *otid,
HeapTuple newtup,
CommandId cid, Snapshot crosscheck, bool wait,
TM_FailureData *tmfd, LockTupleMode *lockmode,
+ const Bitmapset *mix_attrs, bool mix_attrs_valid,
TU_UpdateIndexes *update_indexes);
extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
CommandId cid, LockTupleMode mode, LockWaitPolicy wait_policy,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 251379016b0..3b080aa3711 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -549,6 +549,7 @@ typedef struct TableAmRoutine
bool wait,
TM_FailureData *tmfd,
LockTupleMode *lockmode,
+ const Bitmapset *updated_cols,
TU_UpdateIndexes *update_indexes);
/* see table_tuple_lock() for reference about parameters */
@@ -1524,12 +1525,12 @@ static inline TM_Result
table_tuple_update(Relation rel, ItemPointer otid, TupleTableSlot *slot,
CommandId cid, Snapshot snapshot, Snapshot crosscheck,
bool wait, TM_FailureData *tmfd, LockTupleMode *lockmode,
- TU_UpdateIndexes *update_indexes)
+ const Bitmapset *mix_cols, TU_UpdateIndexes *update_indexes)
{
return rel->rd_tableam->tuple_update(rel, otid, slot,
cid, snapshot, crosscheck,
- wait, tmfd,
- lockmode, update_indexes);
+ wait, tmfd, lockmode,
+ mix_cols, update_indexes);
}
/*
@@ -2010,6 +2011,7 @@ extern void simple_table_tuple_delete(Relation rel, ItemPointer tid,
Snapshot snapshot);
extern void simple_table_tuple_update(Relation rel, ItemPointer otid,
TupleTableSlot *slot, Snapshot snapshot,
+ const Bitmapset *mix_attrs,
TU_UpdateIndexes *update_indexes);
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index b259c4141ed..14a39beab6e 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -132,6 +132,7 @@ extern bool CompareIndexInfo(const IndexInfo *info1, const IndexInfo *info2,
const AttrMap *attmap);
extern void BuildSpeculativeIndexInfo(Relation index, IndexInfo *ii);
+extern void BuildUpdateIndexInfo(ResultRelInfo *resultRelInfo);
extern void FormIndexDatum(IndexInfo *indexInfo,
TupleTableSlot *slot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 55a7d930d26..6e263278a4e 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -606,6 +606,12 @@ extern TupleDesc ExecCleanTypeFromTL(List *targetList);
extern TupleDesc ExecTypeFromExprList(List *exprList);
extern void ExecTypeSetColNames(TupleDesc typeInfo, List *namesList);
extern void UpdateChangedParamSet(PlanState *node, Bitmapset *newchg);
+extern Bitmapset *ExecCompareSlots(ResultRelInfo *resultRelInfo,
+ TupleDesc tupdesc,
+ const Bitmapset *excluding,
+ const Bitmapset *including,
+ TupleTableSlot *old_tts,
+ TupleTableSlot *new_tts);
typedef struct TupOutputState
{
@@ -800,5 +806,9 @@ extern ResultRelInfo *ExecLookupResultRelByOid(ModifyTableState *node,
Oid resultoid,
bool missing_ok,
bool update_cache);
+extern Bitmapset *ExecCheckIndexedAttrsForChanges(ResultRelInfo *relinfo,
+ EState *estate,
+ TupleTableSlot *old_tts,
+ TupleTableSlot *new_tts);
#endif /* EXECUTOR_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 63c067d5aae..13284dbd70b 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -502,6 +502,12 @@ typedef struct ResultRelInfo
/* true if the above has been computed */
bool ri_extraUpdatedCols_valid;
+ /*
+ * For UPDATE a Bitmapset of the attributes that are both indexed and have
+ * changed in value.
+ */
+ Bitmapset *ri_ChangedIndexedCols;
+
/* Projection to generate new tuple in an INSERT/UPDATE */
ProjectionInfo *ri_projectNew;
/* Slot to hold that tuple */
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 236830f6b93..df5426fd7fb 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -164,6 +164,7 @@ typedef struct RelationData
Bitmapset *rd_idattr; /* included in replica identity index */
Bitmapset *rd_hotblockingattr; /* cols blocking HOT update */
Bitmapset *rd_summarizedattr; /* cols indexed by summarizing indexes */
+ Bitmapset *rd_indexedattr; /* all cols referenced by indexes */
PublicationDesc *rd_pubdesc; /* publication descriptor, or NULL */
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 2700224939a..5834ab7b903 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -71,6 +71,7 @@ typedef enum IndexAttrBitmapKind
INDEX_ATTR_BITMAP_IDENTITY_KEY,
INDEX_ATTR_BITMAP_HOT_BLOCKING,
INDEX_ATTR_BITMAP_SUMMARIZED,
+ INDEX_ATTR_BITMAP_INDEXED,
} IndexAttrBitmapKind;
extern Bitmapset *RelationGetIndexAttrBitmap(Relation relation,
--
2.51.2
[text/x-csrc] check-heapam.c (5.1K, 3-check-heapam.c)
download | inline:
#if 0
if (modified_attrs_valid)
{
bool id_key = false;
bool hdci_id_key_req;
Bitmapset *hdci_attrs = HeapDetermineColumnsInfo(relation, hot_attrs, id_attrs,
&oldtup, newtup, &id_key);
/* hdci_id_key_req = bms_overlap(modified_attrs, hdci_attrs) || id_key; */
/* Assert(bms_equal(modified_attrs, hdci_attrs)); */
/* Assert(rep_id_key_required == hdci_id_key_req); */
Assert(id_key == id_has_external);
}
#endif
#if 0
if (modified_attrs_valid)
{
bool id_key = false;
bool hdci_id_key_req;
Bitmapset *hdci_attrs = HeapDetermineColumnsInfo(relation, hot_attrs, id_attrs,
&oldtup, newtup, &id_key);
hdci_id_key_req = bms_overlap(modified_attrs, hdci_attrs) || id_key;
/* Compare bitmapsets and log differences */
if (!bms_equal(modified_attrs, hdci_attrs))
{
Bitmapset *only_in_modified = bms_difference(modified_attrs, hdci_attrs);
Bitmapset *only_in_hdci = bms_difference(hdci_attrs, modified_attrs);
int attidx = -1;
TupleDesc tupdesc = RelationGetDescr(relation);
elog(WARNING, "Bitmapset mismatch in HeapDetermineColumnsInfo for relation %s",
RelationGetRelationName(relation));
elog(WARNING, " ExecCheckIndexedAttrsForChanges: %s", bmsToString(modified_attrs));
elog(WARNING, " HeapDetermineColumnsInfo: %s", bmsToString(hdci_attrs));
/* Log and compare attributes only in modified_attrs */
attidx = -1;
if (bms_num_members(only_in_modified) > 0)
{
elog(WARNING, " Attributes reported changed by ExecCheckIndexedAttrsForChanges but not HeapDetermineColumnsInfo:");
while ((attidx = bms_next_member(only_in_modified, attidx)) >= 0)
{
AttrNumber attnum = attidx + FirstLowInvalidHeapAttributeNumber;
if (attnum > 0 && attnum <= tupdesc->natts)
{
Form_pg_attribute att = TupleDescAttr(tupdesc, attnum - 1);
Datum old_val,
new_val;
bool old_isnull,
new_isnull;
old_val = heap_getattr(&oldtup, attnum, tupdesc, &old_isnull);
new_val = heap_getattr(newtup, attnum, tupdesc, &new_isnull);
if (old_isnull != new_isnull)
{
elog(WARNING, " %s (attnum %d): NULL status differs (old=%s, new=%s) - CORRECT DETECTION",
NameStr(att->attname), attnum,
old_isnull ? "NULL" : "NOT NULL",
new_isnull ? "NULL" : "NOT NULL");
}
else if (!old_isnull && !new_isnull)
{
if (!datum_image_eq(old_val, new_val, att->attbyval, att->attlen))
{
elog(WARNING, " %s (attnum %d): binary values differ - CORRECT DETECTION",
NameStr(att->attname), attnum);
}
else
{
elog(WARNING, " %s (attnum %d): binary values are IDENTICAL - FALSE POSITIVE (HeapDetermineColumnsInfo is correct)",
NameStr(att->attname), attnum);
}
}
else
{
elog(WARNING, " %s (attnum %d): both NULL - FALSE POSITIVE (HeapDetermineColumnsInfo is correct)",
NameStr(att->attname), attnum);
}
}
}
}
/* Log and compare attributes only in hdci_attrs */
attidx = -1;
if (bms_num_members(only_in_hdci) > 0)
{
elog(WARNING, " Attributes reported changed by HeapDetermineColumnsInfo but not ExecCheckIndexedAttrsForChanges:");
while ((attidx = bms_next_member(only_in_hdci, attidx)) >= 0)
{
AttrNumber attnum = attidx + FirstLowInvalidHeapAttributeNumber;
if (attnum > 0 && attnum <= tupdesc->natts)
{
Form_pg_attribute att = TupleDescAttr(tupdesc, attnum - 1);
Datum old_val,
new_val;
bool old_isnull,
new_isnull;
old_val = heap_getattr(&oldtup, attnum, tupdesc, &old_isnull);
new_val = heap_getattr(newtup, attnum, tupdesc, &new_isnull);
if (old_isnull != new_isnull)
{
elog(WARNING, " %s (attnum %d): NULL status differs (old=%s, new=%s) - CORRECT DETECTION",
NameStr(att->attname), attnum,
old_isnull ? "NULL" : "NOT NULL",
new_isnull ? "NULL" : "NOT NULL");
}
else if (!old_isnull && !new_isnull)
{
if (!datum_image_eq(old_val, new_val, att->attbyval, att->attlen))
{
elog(WARNING, " %s (attnum %d): binary values differ - CORRECT DETECTION",
NameStr(att->attname), attnum);
}
else
{
elog(WARNING, " %s (attnum %d): binary values are IDENTICAL - FALSE NEGATIVE (ExecCheckIndexedAttrsForChanges is correct)",
NameStr(att->attname), attnum);
}
}
else
{
elog(WARNING, " %s (attnum %d): both NULL - FALSE NEGATIVE (ExecCheckIndexedAttrsForChanges is correct)",
NameStr(att->attname), attnum);
}
}
}
}
bms_free(only_in_modified);
bms_free(only_in_hdci);
}
/* Compare replica identity logic */
if (rep_id_key_required != hdci_id_key_req)
{
elog(WARNING, "Replica identity logic mismatch for relation %s",
RelationGetRelationName(relation));
elog(WARNING, " ExecCheckIndexedAttrsForChanges (rep_id_key_required): %s",
rep_id_key_required ? "TRUE" : "FALSE");
elog(WARNING, " HeapDetermineColumnsInfo (hdci_id_key_req): %s",
hdci_id_key_req ? "TRUE" : "FALSE");
}
}
#endif
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: Expanding HOT updates for expression and partial indexes
2026-02-16 19:36 Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-02-17 21:15 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
@ 2026-02-19 20:32 ` Greg Burd <[email protected]>
1 sibling, 0 replies; 24+ messages in thread
From: Greg Burd @ 2026-02-19 20:32 UTC (permalink / raw)
To: Jeff Davis <[email protected]>; +Cc: pgsql-hackers
Hello,
This is an updated version of the last patch with a few fixes and a layer on top of it that tries to cleanup heap_update().
v20260219:
0001 - Not much changed in this patch, some clean up and fixed a few mistakes. This patch passes all tests without any need to modify them. I've added ExecCompareSlotAttrs() helper function.
0002 - As before in v29 I've split off the top half of heap_update() and moved that into heapam_tuple_update() and simple_heap_update(). I've created helper functions for different steps in these early stages: HeapUpdateHotAllowable(), HeapUpdateRequiresReplicaId(), HeapUpdateDetermineLockmode(). This allows for a cleaner set of bitmaps and logic (as I read it) on the heapam_tuple_update() path. I reuse these helper functions in simple_heap_update() when possible, even trimming up HeapDeterminColumnsInfo() a bit so as to reuse HeapUpdateRequiresReplicaId().
I've tested with code that validates in heapam_tuple_update() that the modified attr bitmaps are identical:
{
Bitmapset *hot_attrs = RelationGetIndexAttrBitmap(relation,
INDEX_ATTR_BITMAP_INDEXED);
Bitmapset *id_attrs = RelationGetIndexAttrBitmap(relation,
INDEX_ATTR_BITMAP_IDENTITY_KEY);
Bitmapset *hdci_attrs = HeapDetermineColumnsInfo(relation, hot_attrs,
&oldtup, tuple);
Assert(bms_equal(mix_attrs, hdci_attrs));
bms_free(hot_attrs);
bms_free(id_attrs);
bms_free(hdci_attrs);
}
Despite that passing two tests became non-deterministic without an "ORDER BY" on a select. You'll see those in generated_virtual.sql and updatable_views.sql in the second patch. I don't know yet why that happened, but the results are otherwise identical.
I will continue to performance test. Most things I've tried differ by less than 0.5% before/after this patch. Some operations where multiple rows are matched in an UPDATE and there are concurrent reads are faster (10-20%), I need to dig into this. I'm expanding the tests I'm running to try to find any cases where holding the buffer lock for less time could possibly result in higher TPS. The goals for $subject were faster TPS, but more importantly to lower index bloat and help shorten vacuum times.
Next up I'll work on re-introducing some of the other work from $subject and other changes in v29 and earlier patch sets.
* avoid the need for index_unchanged_by_update()
* re-add the new index AM function, "amcomparedatums()" or similar
* add a flag for types indicating that they have "sub-attributes" (JSONB, XML, ARRAY)
* store in pg_attribute during CREATE INDEX when types with sub-attributes are in expressions the relation and some representation of what the sub-attribute is
* update JSONB functions that mutate content to use the pg_attribute information and record if there were changes to indexed "sub-attributes" or not
* use that recorded information later in the executor to identify if the indexed sub-attribute changed or not opening the door for $subject without evaluating the before/after expressions
* re-examine partial indexes as well
* consider how one might layer a PHOT/WARM-thingie on this... (in a different thread in the future, like next year)
best.
-greg
Attachments:
[text/x-patch] v20260219-0001-Idenfity-modified-indexed-attributes-in-th.patch (30.5K, 2-v20260219-0001-Idenfity-modified-indexed-attributes-in-th.patch)
download | inline diff:
From c996ab0e27724ff175a24da26603ec3c81a57d40 Mon Sep 17 00:00:00 2001
From: Greg Burd <[email protected]>
Date: Sun, 2 Nov 2025 11:36:20 -0500
Subject: [PATCH v20260219 1/2] Idenfity modified indexed attributes in the
executor on UPDATE
Refactor executor update logic to determine which indexed columns have
actually changed during an UPDATE operation rather than leaving this up
to HeapDetermineColumnsInfo() in heap_update().
ExecCheckIndexedAttrsForChanges() replaces HeapDeterminesColumnsInfo()
and is called before table_tuple_update() crucially without the need
for an exclusive buffer lock on the page that holds the tuple being
updated. This reduces the time the lock is held later within
heapam_tuple_update() and heap_update().
ExecCheckIndexedAttrsForChanges() in turn uses ExecCompareSlotAttrs() to
identify which attributes have changed and then intersects that with the
set of indexed attributes to identify the modified indexed set.
Besides identifying the set of modified indexed attributes
HeapDetermineColumnsInfo() was also responsible for part of the logic
involed in the decision to include the replica identity key or not.
This now happens in heap_update() when modified_attrs_valid is false.
Catalog tuple updates use simple_heap_update() and don't pass a
modified_attrs Bitmapset into heap_update() indicated by the
modified_attrs_valid bool set to false.
Updates stemming from logical replication also use the new
ExecCheckIndexedAttrsForChanges() in ExecSimpleRelationUpdate().
Before row triggers on UPDATE may use heap_modify_tuple() to update
attributes not identified by ExecGetAllUpdatedCols() as is the case in
tsvector_update_trigger(). ExecBRUpdateTriggers() now identifies
changes to indexed columns not found by ExecGetAllUpdateCols()
and adds their attributes to ri_extraUpdatedCols. See
tsearch.sql tests for an example of this.
---
src/backend/access/heap/heapam.c | 80 +++++++++++++++++---
src/backend/access/heap/heapam_handler.c | 7 +-
src/backend/access/table/tableam.c | 5 +-
src/backend/commands/trigger.c | 20 ++++-
src/backend/executor/execReplication.c | 12 ++-
src/backend/executor/execTuples.c | 93 ++++++++++++++++++++++++
src/backend/executor/nodeModifyTable.c | 78 ++++++++++++++++++--
src/backend/replication/logical/worker.c | 10 +--
src/backend/utils/cache/relcache.c | 15 ++++
src/include/access/heapam.h | 1 +
src/include/access/tableam.h | 8 +-
src/include/executor/executor.h | 8 ++
src/include/utils/rel.h | 1 +
src/include/utils/relcache.h | 1 +
14 files changed, 303 insertions(+), 36 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 98d53caeea8..ab8b6ddb8de 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -3311,6 +3311,7 @@ TM_Result
heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
CommandId cid, Snapshot crosscheck, bool wait,
TM_FailureData *tmfd, LockTupleMode *lockmode,
+ const Bitmapset *modified_attrs, bool modified_attrs_valid,
TU_UpdateIndexes *update_indexes)
{
TM_Result result;
@@ -3320,7 +3321,6 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
Bitmapset *key_attrs;
Bitmapset *id_attrs;
Bitmapset *interesting_attrs;
- Bitmapset *modified_attrs;
ItemId lp;
HeapTupleData oldtup;
HeapTuple heaptup;
@@ -3345,7 +3345,7 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
bool all_visible_cleared_new = false;
bool checked_lockers;
bool locker_remains;
- bool id_has_external = false;
+ bool rep_id_key_required = false;
TransactionId xmax_new_tuple,
xmax_old_tuple;
uint16 infomask_old_tuple,
@@ -3487,9 +3487,69 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
* new tuple so we must include it as part of the old_key_tuple. See
* ExtractReplicaIdentity.
*/
- modified_attrs = HeapDetermineColumnsInfo(relation, interesting_attrs,
- id_attrs, &oldtup,
- newtup, &id_has_external);
+ if (!modified_attrs_valid)
+ {
+ bool id_has_external = false;
+
+ modified_attrs = HeapDetermineColumnsInfo(relation, interesting_attrs,
+ id_attrs, &oldtup,
+ newtup, &id_has_external);
+ rep_id_key_required = id_has_external ||
+ bms_overlap(modified_attrs, id_attrs);
+ }
+ else
+ {
+ /*
+ * ExtractReplicatIdentity() needs to know if a modified attrbute is
+ * used as a replica indentity or if any of the unmodified indexed
+ * attributes in the old tuple are stored externally and used as a
+ * replica identity.
+ */
+ rep_id_key_required = bms_overlap(modified_attrs, id_attrs);
+ if (!rep_id_key_required)
+ {
+ Bitmapset *attrs;
+ TupleDesc tupdesc = RelationGetDescr(relation);
+ int attidx = -1;
+
+ /* Check all unmodified indexed replica identity key attributes */
+ attrs = bms_difference(interesting_attrs, modified_attrs);
+ attrs = bms_int_members(attrs, id_attrs);
+
+ while ((attidx = bms_next_member(attrs, attidx)) >= 0)
+ {
+ /*
+ * attidx is zero-based, attrnum is the normal attribute
+ * number
+ */
+ AttrNumber attrnum = attidx + FirstLowInvalidHeapAttributeNumber;
+ Datum value;
+ bool isnull;
+
+ /*
+ * System attributes are not added into interesting_attrs in
+ * relcache.
+ */
+ Assert(attrnum > 0);
+
+ value = heap_getattr(&oldtup, attrnum, tupdesc, &isnull);
+
+ /* No need to check attributes that can't be stored externally */
+ if (isnull ||
+ TupleDescCompactAttr(tupdesc, attrnum - 1)->attlen != -1)
+ continue;
+
+ /* Check if the old tuple's attribute is stored externally */
+ if (VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value)))
+ {
+ rep_id_key_required = true;
+ break;
+ }
+ }
+
+ bms_free(attrs);
+ }
+ }
/*
* If we're not updating any "key" column, we can grab a weaker lock type.
@@ -3763,7 +3823,7 @@ l2:
bms_free(sum_attrs);
bms_free(key_attrs);
bms_free(id_attrs);
- bms_free(modified_attrs);
+ /* modified attrs is passed in and free'd by the caller, or NULL */
bms_free(interesting_attrs);
return result;
}
@@ -4111,8 +4171,7 @@ l2:
* columns are modified or it has external data.
*/
old_key_tuple = ExtractReplicaIdentity(relation, &oldtup,
- bms_overlap(modified_attrs, id_attrs) ||
- id_has_external,
+ rep_id_key_required,
&old_key_copied);
/* NO EREPORT(ERROR) from here till changes are logged */
@@ -4278,7 +4337,7 @@ l2:
bms_free(sum_attrs);
bms_free(key_attrs);
bms_free(id_attrs);
- bms_free(modified_attrs);
+ /* modified attrs is passed in and free'd by the caller, or NULL */
bms_free(interesting_attrs);
return TM_Ok;
@@ -4562,7 +4621,8 @@ simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tup
result = heap_update(relation, otid, tup,
GetCurrentCommandId(true), InvalidSnapshot,
true /* wait for commit */ ,
- &tmfd, &lockmode, update_indexes);
+ &tmfd, &lockmode,
+ NULL, false, update_indexes);
switch (result)
{
case TM_SelfModified:
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index cbef73e5d4b..2d74fa90c7f 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -312,12 +312,11 @@ heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart);
}
-
static TM_Result
heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
CommandId cid, Snapshot snapshot, Snapshot crosscheck,
- bool wait, TM_FailureData *tmfd,
- LockTupleMode *lockmode, TU_UpdateIndexes *update_indexes)
+ bool wait, TM_FailureData *tmfd, LockTupleMode *lockmode,
+ const Bitmapset *modified_attrs, TU_UpdateIndexes *update_indexes)
{
bool shouldFree = true;
HeapTuple tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);
@@ -328,7 +327,7 @@ heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
tuple->t_tableOid = slot->tts_tableOid;
result = heap_update(relation, otid, tuple, cid, crosscheck, wait,
- tmfd, lockmode, update_indexes);
+ tmfd, lockmode, modified_attrs, true, update_indexes);
ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
/*
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index dfda1af412e..42acd5b17a9 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -359,6 +359,7 @@ void
simple_table_tuple_update(Relation rel, ItemPointer otid,
TupleTableSlot *slot,
Snapshot snapshot,
+ const Bitmapset *mix_attrs,
TU_UpdateIndexes *update_indexes)
{
TM_Result result;
@@ -369,7 +370,9 @@ simple_table_tuple_update(Relation rel, ItemPointer otid,
GetCurrentCommandId(true),
snapshot, InvalidSnapshot,
true /* wait for commit */ ,
- &tmfd, &lockmode, update_indexes);
+ &tmfd, &lockmode,
+ mix_attrs,
+ update_indexes);
switch (result)
{
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 8df915f63fb..62879ad3b4e 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -2978,6 +2978,7 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
bool is_merge_update)
{
TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
+ TupleDesc tupdesc = RelationGetDescr(relinfo->ri_RelationDesc);
TupleTableSlot *oldslot = ExecGetTriggerOldSlot(estate, relinfo);
HeapTuple newtuple = NULL;
HeapTuple trigtuple;
@@ -2985,7 +2986,9 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
bool should_free_new = false;
TriggerData LocTriggerData = {0};
int i;
- Bitmapset *updatedCols;
+ Bitmapset *updatedCols = NULL;
+ Bitmapset *remainingCols = NULL;
+ Bitmapset *modifiedCols;
LockTupleMode lockmode;
/* Determine lock mode to use */
@@ -3127,6 +3130,21 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
if (should_free_trig)
heap_freetuple(trigtuple);
+ /*
+ * Before UPDATE triggers may have updated attributes not known to
+ * ExecGetAllUpdatedColumns() using heap_modify_tuple() or
+ * heap_modifiy_tuple_by_cols(). Find and record those now.
+ */
+ remainingCols = bms_add_range(NULL, 1 - FirstLowInvalidHeapAttributeNumber,
+ tupdesc->natts - FirstLowInvalidHeapAttributeNumber);
+ remainingCols = bms_del_members(remainingCols, updatedCols);
+ modifiedCols = ExecCompareSlotAttrs(tupdesc, remainingCols, oldslot, newslot);
+ relinfo->ri_extraUpdatedCols =
+ bms_add_members(relinfo->ri_extraUpdatedCols, modifiedCols);
+
+ bms_free(remainingCols);
+ bms_free(modifiedCols);
+
return true;
}
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 2497ee7edc5..63a53d88a82 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -33,6 +33,7 @@
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
+#include "utils/relcache.h"
#include "utils/snapmgr.h"
#include "utils/syscache.h"
#include "utils/typcache.h"
@@ -906,6 +907,7 @@ ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
bool skip_tuple = false;
Relation rel = resultRelInfo->ri_RelationDesc;
ItemPointer tid = &(searchslot->tts_tid);
+ Bitmapset *mix_attrs;
/*
* We support only non-system tables, with
@@ -944,8 +946,16 @@ ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
if (rel->rd_rel->relispartition)
ExecPartitionCheck(resultRelInfo, slot, estate, true);
+ mix_attrs = ExecCheckIndexedAttrsForChanges(resultRelInfo,
+ estate, searchslot, slot);
+
+ /*
+ * We're not going to call ExecCheckIndexedAttrsForChanges here
+ * because we've already identified the changes earlier on thanks to
+ * slot_modify_data.
+ */
simple_table_tuple_update(rel, tid, slot, estate->es_snapshot,
- &update_indexes);
+ mix_attrs, &update_indexes);
conflictindexes = resultRelInfo->ri_onConflictArbiterIndexes;
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index b768eae9e53..e95dde2df2e 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -66,6 +66,7 @@
#include "nodes/nodeFuncs.h"
#include "storage/bufmgr.h"
#include "utils/builtins.h"
+#include "utils/datum.h"
#include "utils/expandeddatum.h"
#include "utils/lsyscache.h"
#include "utils/typcache.h"
@@ -1929,6 +1930,98 @@ ExecFetchSlotHeapTupleDatum(TupleTableSlot *slot)
return ret;
}
+/*
+ * ExecCompareSlots
+ *
+ * Compare old and new TupleTableSlots to detect which attributes have changed.
+ *
+ * This function serves two purposes:
+ * 1) After trigger execution: detect trigger-modified columns
+ * (pass excluding=explicitly SET columns, including=NULL)
+ * 2) Index maintenance: detect changes in indexed columns
+ * (pass including=indexed AND explicitly SET columns, excluding=NULL)
+ *
+ * Parameters:
+ * resultRelInfo - relation information
+ * excluding - bitmapset of attributes to skip
+ * including - bitmapset of attributes to check
+ * tupdesc - RelationGetDescr(relation)
+ * old_tts - old tuple slot
+ * new_tts - new tuple slot
+ *
+ * If including is NULL, check all attributes EXCEPT those in excluding
+ * If excluding is NULL, check ONLY attributes in including
+ * If both are NULL, check all attributes
+ *
+ * Returns a Bitmapset of attribute indices (using FirstLowInvalidHeapAttributeNumber
+ * convention) that differ between the two slots.
+ */
+Bitmapset *
+ExecCompareSlotAttrs(TupleDesc tupdesc,
+ const Bitmapset *attrs,
+ TupleTableSlot *old_tts,
+ TupleTableSlot *new_tts)
+{
+ int attidx = -1;
+ Bitmapset *modified = NULL;
+
+ while ((attidx = bms_next_member(attrs, attidx)) >= 0)
+ {
+ /* attidx is zero-based, attrnum is the normal attribute number */
+ AttrNumber attrnum = attidx + FirstLowInvalidHeapAttributeNumber;
+ Datum old_value,
+ new_value;
+ bool old_null,
+ new_null;
+ CompactAttribute *att;
+
+ /*
+ * If it's a whole-tuple reference, say "not equal". It's not really
+ * worth supporting this case, since it could only succeed after a
+ * no-op update, which is hardly a case worth optimizing for.
+ */
+ if (attrnum == 0)
+ {
+ modified = bms_add_member(modified, attidx);
+ continue;
+ }
+
+ /*
+ * Likewise, automatically say "not equal" for any system attribute
+ * other than tableOID; we cannot expect these to be consistent in a
+ * HOT chain, or even to be set correctly yet in the new tuple.
+ */
+ if (attrnum < 0)
+ {
+ if (attrnum != TableOidAttributeNumber)
+ {
+ modified = bms_add_member(modified, attidx);
+ continue;
+ }
+ }
+
+ att = TupleDescCompactAttr(tupdesc, attrnum - 1);
+ old_value = slot_getattr(old_tts, attrnum, &old_null);
+ new_value = slot_getattr(new_tts, attrnum, &new_null);
+
+ /* A change to/from NULL, so not equal */
+ if (old_null != new_null)
+ {
+ modified = bms_add_member(modified, attidx);
+ continue;
+ }
+
+ /* Both NULL, no change/unmodified */
+ if (old_null)
+ continue;
+
+ if (!datum_image_eq(old_value, new_value, att->attbyval, att->attlen))
+ modified = bms_add_member(modified, attidx);
+ }
+
+ return modified;
+}
+
/* ----------------------------------------------------------------
* convenience initialization routines
* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 793c76d4f82..18796baed28 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -17,6 +17,7 @@
* ExecModifyTable - retrieve the next tuple from the node
* ExecEndModifyTable - shut down the ModifyTable node
* ExecReScanModifyTable - rescan the ModifyTable node
+ * ExecCheckIndexedAttrsForChanges - find set of updated indexed columns
*
* NOTES
* The ModifyTable node receives input from its outerPlan, which is
@@ -54,6 +55,7 @@
#include "access/htup_details.h"
#include "access/tableam.h"
+#include "access/tupdesc.h"
#include "access/xact.h"
#include "commands/trigger.h"
#include "executor/execPartition.h"
@@ -188,6 +190,50 @@ static TupleTableSlot *ExecMergeNotMatched(ModifyTableContext *context,
ResultRelInfo *resultRelInfo,
bool canSetTag);
+/*
+ * ExecCheckIndexedAttrsForChanges
+ *
+ * Determine which indexes need updating by finding the set of modified indexed
+ * attributes.
+ *
+ * The goal is for the executor to know, ahead of calling into the table AM to
+ * process the update and before calling into the index AM for inserting new
+ * index tuples, which attributes in the new TupleTableSlot, if any, truely
+ * necessitate a new index tuple.
+ *
+ * Returns a Bitmapset of attributes that intersects with indexes which require
+ * a new index tuple.
+ */
+Bitmapset *
+ExecCheckIndexedAttrsForChanges(ResultRelInfo *resultRelInfo,
+ EState *estate,
+ TupleTableSlot *old_tts,
+ TupleTableSlot *new_tts)
+{
+ Relation relation = resultRelInfo->ri_RelationDesc;
+ TupleDesc tupdesc = RelationGetDescr(relation);
+ Bitmapset *attrs,
+ *mix_attrs;
+
+ /* If no indexes, we're done */
+ if (resultRelInfo->ri_NumIndices == 0)
+ return NULL;
+
+ /*
+ * Fetch the set of attributes explicity SET in the UPDATE statement or
+ * set by a before row trigger (even if not mentioned in the SQL) and get
+ * the subset that are also indexed.
+ */
+ attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_INDEXED);
+ attrs = bms_int_members(attrs, ExecGetAllUpdatedCols(resultRelInfo, estate));
+
+ /* Find out which, if any, modified indexed attributes changed */
+ mix_attrs = ExecCompareSlotAttrs(tupdesc, attrs, old_tts, new_tts);
+
+ bms_free(attrs);
+
+ return mix_attrs;
+}
/*
* Verify that the tuples to be produced by INSERT match the
@@ -2195,14 +2241,17 @@ ExecUpdatePrepareSlot(ResultRelInfo *resultRelInfo,
*/
static TM_Result
ExecUpdateAct(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
- ItemPointer tupleid, HeapTuple oldtuple, TupleTableSlot *slot,
- bool canSetTag, UpdateContext *updateCxt)
+ ItemPointer tupleid, HeapTuple oldtuple, TupleTableSlot *oldSlot,
+ TupleTableSlot *slot, bool canSetTag, UpdateContext *updateCxt)
{
EState *estate = context->estate;
Relation resultRelationDesc = resultRelInfo->ri_RelationDesc;
bool partition_constraint_failed;
TM_Result result;
+ /* The set of modified indexed attributes that trigger new index entries */
+ Bitmapset *mix_attrs = NULL;
+
updateCxt->crossPartUpdate = false;
/*
@@ -2319,7 +2368,16 @@ lreplace:
ExecConstraints(resultRelInfo, slot, estate);
/*
- * replace the heap tuple
+ * Next up we need to find out the set of indexed attributes that have
+ * changed in value and should trigger a new index tuple. We could start
+ * with the set of updated columns via ExecGetUpdatedCols(), but if we do
+ * we will overlook attributes directly modified by heap_modify_tuple()
+ * which are not known to ExecGetUpdatedCols().
+ */
+ mix_attrs = ExecCheckIndexedAttrsForChanges(resultRelInfo, estate, oldSlot, slot);
+
+ /*
+ * Call into the table AM to update the heap tuple.
*
* Note: if es_crosscheck_snapshot isn't InvalidSnapshot, we check that
* the row to be updated is visible to that snapshot, and throw a
@@ -2333,6 +2391,7 @@ lreplace:
estate->es_crosscheck_snapshot,
true /* wait for commit */ ,
&context->tmfd, &updateCxt->lockmode,
+ mix_attrs,
&updateCxt->updateIndexes);
return result;
@@ -2555,8 +2614,9 @@ ExecUpdate(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
*/
redo_act:
lockedtid = *tupleid;
- result = ExecUpdateAct(context, resultRelInfo, tupleid, oldtuple, slot,
- canSetTag, &updateCxt);
+
+ result = ExecUpdateAct(context, resultRelInfo, tupleid, oldtuple, oldSlot,
+ slot, canSetTag, &updateCxt);
/*
* If ExecUpdateAct reports that a cross-partition update was done,
@@ -3406,8 +3466,8 @@ lmerge_matched:
Assert(oldtuple == NULL);
result = ExecUpdateAct(context, resultRelInfo, tupleid,
- NULL, newslot, canSetTag,
- &updateCxt);
+ NULL, resultRelInfo->ri_oldTupleSlot,
+ newslot, canSetTag, &updateCxt);
/*
* As in ExecUpdate(), if ExecUpdateAct() reports that a
@@ -3432,6 +3492,7 @@ lmerge_matched:
tupleid, NULL, newslot);
mtstate->mt_merge_updated += 1;
}
+
break;
case CMD_DELETE:
@@ -4539,7 +4600,7 @@ ExecModifyTable(PlanState *pstate)
* For UPDATE/DELETE/MERGE, fetch the row identity info for the tuple
* to be updated/deleted/merged. For a heap relation, that's a TID;
* otherwise we may have a wholerow junk attr that carries the old
- * tuple in toto. Keep this in step with the part of
+ * tuple in total. Keep this in step with the part of
* ExecInitModifyTable that sets up ri_RowIdAttNo.
*/
if (operation == CMD_UPDATE || operation == CMD_DELETE ||
@@ -4719,6 +4780,7 @@ ExecModifyTable(PlanState *pstate)
/* Now apply the update. */
slot = ExecUpdate(&context, resultRelInfo, tupleid, oldtuple,
oldSlot, slot, node->canSetTag);
+
if (tuplock)
UnlockTuple(resultRelInfo->ri_RelationDesc, tupleid,
InplaceUpdateTupleLock);
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 32725c48623..0b044154195 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -2917,7 +2917,6 @@ apply_handle_update_internal(ApplyExecutionData *edata,
TupleTableSlot *localslot = NULL;
ConflictTupleInfo conflicttuple = {0};
bool found;
- MemoryContext oldctx;
EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1, NIL);
ExecOpenIndices(relinfo, false);
@@ -2956,15 +2955,13 @@ apply_handle_update_internal(ApplyExecutionData *edata,
}
/* Process and store remote tuple in the slot */
- oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
slot_modify_data(remoteslot, localslot, relmapentry, newtup);
- MemoryContextSwitchTo(oldctx);
EvalPlanQualSetSlot(&epqstate, remoteslot);
InitConflictIndexes(relinfo);
- /* Do the actual update. */
+ /* First check privileges */
TargetPrivilegesCheck(relinfo->ri_RelationDesc, ACL_UPDATE);
ExecSimpleRelationUpdate(relinfo, estate, &epqstate, localslot,
remoteslot);
@@ -3522,10 +3519,7 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
* Apply the update to the local tuple, putting the result in
* remoteslot_part.
*/
- oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
- slot_modify_data(remoteslot_part, localslot, part_entry,
- newtup);
- MemoryContextSwitchTo(oldctx);
+ slot_modify_data(remoteslot_part, localslot, part_entry, newtup);
EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1, NIL);
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 6b634c9fff1..547cf1d054d 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -2477,6 +2477,7 @@ RelationDestroyRelation(Relation relation, bool remember_tupdesc)
bms_free(relation->rd_idattr);
bms_free(relation->rd_hotblockingattr);
bms_free(relation->rd_summarizedattr);
+ bms_free(relation->rd_indexedattr);
if (relation->rd_pubdesc)
pfree(relation->rd_pubdesc);
if (relation->rd_options)
@@ -5278,6 +5279,7 @@ RelationGetIndexPredicate(Relation relation)
* index (empty if FULL)
* INDEX_ATTR_BITMAP_HOT_BLOCKING Columns that block updates from being HOT
* INDEX_ATTR_BITMAP_SUMMARIZED Columns included in summarizing indexes
+ * INDEX_ATTR_BITMAP_INDEXED Columns referenced by indexes
*
* Attribute numbers are offset by FirstLowInvalidHeapAttributeNumber so that
* we can include system attributes (e.g., OID) in the bitmap representation.
@@ -5301,6 +5303,7 @@ RelationGetIndexAttrBitmap(Relation relation, IndexAttrBitmapKind attrKind)
Bitmapset *pkindexattrs; /* columns in the primary index */
Bitmapset *idindexattrs; /* columns in the replica identity */
Bitmapset *hotblockingattrs; /* columns with HOT blocking indexes */
+ Bitmapset *indexedattrs; /* columns referenced by indexes */
Bitmapset *summarizedattrs; /* columns with summarizing indexes */
List *indexoidlist;
List *newindexoidlist;
@@ -5324,6 +5327,8 @@ RelationGetIndexAttrBitmap(Relation relation, IndexAttrBitmapKind attrKind)
return bms_copy(relation->rd_hotblockingattr);
case INDEX_ATTR_BITMAP_SUMMARIZED:
return bms_copy(relation->rd_summarizedattr);
+ case INDEX_ATTR_BITMAP_INDEXED:
+ return bms_copy(relation->rd_indexedattr);
default:
elog(ERROR, "unknown attrKind %u", attrKind);
}
@@ -5368,6 +5373,7 @@ restart:
idindexattrs = NULL;
hotblockingattrs = NULL;
summarizedattrs = NULL;
+ indexedattrs = NULL;
foreach(l, indexoidlist)
{
Oid indexOid = lfirst_oid(l);
@@ -5500,10 +5506,14 @@ restart:
bms_free(idindexattrs);
bms_free(hotblockingattrs);
bms_free(summarizedattrs);
+ /* indexedattrs not yet initialized */
goto restart;
}
+ /* Set indexed attributes to track all referenced attributes */
+ indexedattrs = bms_union(hotblockingattrs, summarizedattrs);
+
/* Don't leak the old values of these bitmaps, if any */
relation->rd_attrsvalid = false;
bms_free(relation->rd_keyattr);
@@ -5516,6 +5526,8 @@ restart:
relation->rd_hotblockingattr = NULL;
bms_free(relation->rd_summarizedattr);
relation->rd_summarizedattr = NULL;
+ bms_free(relation->rd_indexedattr);
+ relation->rd_indexedattr = NULL;
/*
* Now save copies of the bitmaps in the relcache entry. We intentionally
@@ -5530,6 +5542,7 @@ restart:
relation->rd_idattr = bms_copy(idindexattrs);
relation->rd_hotblockingattr = bms_copy(hotblockingattrs);
relation->rd_summarizedattr = bms_copy(summarizedattrs);
+ relation->rd_indexedattr = bms_copy(indexedattrs);
relation->rd_attrsvalid = true;
MemoryContextSwitchTo(oldcxt);
@@ -5546,6 +5559,8 @@ restart:
return hotblockingattrs;
case INDEX_ATTR_BITMAP_SUMMARIZED:
return summarizedattrs;
+ case INDEX_ATTR_BITMAP_INDEXED:
+ return indexedattrs;
default:
elog(ERROR, "unknown attrKind %u", attrKind);
return NULL;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 3c0961ab36b..a56f3d1f378 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -368,6 +368,7 @@ extern TM_Result heap_update(Relation relation, const ItemPointerData *otid,
HeapTuple newtup,
CommandId cid, Snapshot crosscheck, bool wait,
TM_FailureData *tmfd, LockTupleMode *lockmode,
+ const Bitmapset *mix_attrs, bool mix_attrs_valid,
TU_UpdateIndexes *update_indexes);
extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
CommandId cid, LockTupleMode mode, LockWaitPolicy wait_policy,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 251379016b0..3b080aa3711 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -549,6 +549,7 @@ typedef struct TableAmRoutine
bool wait,
TM_FailureData *tmfd,
LockTupleMode *lockmode,
+ const Bitmapset *updated_cols,
TU_UpdateIndexes *update_indexes);
/* see table_tuple_lock() for reference about parameters */
@@ -1524,12 +1525,12 @@ static inline TM_Result
table_tuple_update(Relation rel, ItemPointer otid, TupleTableSlot *slot,
CommandId cid, Snapshot snapshot, Snapshot crosscheck,
bool wait, TM_FailureData *tmfd, LockTupleMode *lockmode,
- TU_UpdateIndexes *update_indexes)
+ const Bitmapset *mix_cols, TU_UpdateIndexes *update_indexes)
{
return rel->rd_tableam->tuple_update(rel, otid, slot,
cid, snapshot, crosscheck,
- wait, tmfd,
- lockmode, update_indexes);
+ wait, tmfd, lockmode,
+ mix_cols, update_indexes);
}
/*
@@ -2010,6 +2011,7 @@ extern void simple_table_tuple_delete(Relation rel, ItemPointer tid,
Snapshot snapshot);
extern void simple_table_tuple_update(Relation rel, ItemPointer otid,
TupleTableSlot *slot, Snapshot snapshot,
+ const Bitmapset *mix_attrs,
TU_UpdateIndexes *update_indexes);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index d46ba59895d..7b0019fe15b 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -606,6 +606,10 @@ extern TupleDesc ExecCleanTypeFromTL(List *targetList);
extern TupleDesc ExecTypeFromExprList(List *exprList);
extern void ExecTypeSetColNames(TupleDesc typeInfo, List *namesList);
extern void UpdateChangedParamSet(PlanState *node, Bitmapset *newchg);
+extern Bitmapset *ExecCompareSlotAttrs(TupleDesc tupdesc,
+ const Bitmapset *attrs,
+ TupleTableSlot *old_tts,
+ TupleTableSlot *new_tts);
typedef struct TupOutputState
{
@@ -803,5 +807,9 @@ extern ResultRelInfo *ExecLookupResultRelByOid(ModifyTableState *node,
Oid resultoid,
bool missing_ok,
bool update_cache);
+extern Bitmapset *ExecCheckIndexedAttrsForChanges(ResultRelInfo *relinfo,
+ EState *estate,
+ TupleTableSlot *old_tts,
+ TupleTableSlot *new_tts);
#endif /* EXECUTOR_H */
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 236830f6b93..df5426fd7fb 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -164,6 +164,7 @@ typedef struct RelationData
Bitmapset *rd_idattr; /* included in replica identity index */
Bitmapset *rd_hotblockingattr; /* cols blocking HOT update */
Bitmapset *rd_summarizedattr; /* cols indexed by summarizing indexes */
+ Bitmapset *rd_indexedattr; /* all cols referenced by indexes */
PublicationDesc *rd_pubdesc; /* publication descriptor, or NULL */
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 2700224939a..5834ab7b903 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -71,6 +71,7 @@ typedef enum IndexAttrBitmapKind
INDEX_ATTR_BITMAP_IDENTITY_KEY,
INDEX_ATTR_BITMAP_HOT_BLOCKING,
INDEX_ATTR_BITMAP_SUMMARIZED,
+ INDEX_ATTR_BITMAP_INDEXED,
} IndexAttrBitmapKind;
extern Bitmapset *RelationGetIndexAttrBitmap(Relation relation,
--
2.51.2
[text/x-patch] v20260219-0002-Refactor-heap_update-and-move-attribute-de.patch (63.0K, 3-v20260219-0002-Refactor-heap_update-and-move-attribute-de.patch)
download | inline diff:
From 9aba4466008a79556e31570b3a6bd94451007daa Mon Sep 17 00:00:00 2001
From: Greg Burd <[email protected]>
Date: Thu, 19 Feb 2026 14:08:17 -0500
Subject: [PATCH v20260219 2/2] Refactor heap_update() and move attribute
determination into callers
This refactoring relocates column modification determination from
heap_update() to its callers (simple_heap_update and
heapam_tuple_update), moving the logic upstream to executor/handler
level.
- Remove modified_attrs and modified_attrs_valid parameters from
heap_update()
- Extract buffer management (pin, lock, page fetch) into
simple_heap_update() and heapam_tuple_update() before calling
heap_update()
- Create helper functions: HeapUpdateHotAllowable(),
HeapUpdateRequiresReplicaId(), HeapUpdateDetermineLockmode()
- Pass pre-calculated attributes (hot_allowed, rep_id_key_required, and
lockmode) to heap_update() instead of deriving them within function
- Add ORDER BY clauses to generated_virtual.sql and updatable_views.sql
to ensure deterministic result ordering (XXX: under review...)
---
src/backend/access/heap/heapam.c | 775 +++++++++---------
src/backend/access/heap/heapam_handler.c | 105 ++-
src/backend/executor/execTuples.c | 55 +-
src/backend/executor/nodeModifyTable.c | 7 +-
src/backend/utils/cache/relcache.c | 35 +-
src/include/access/heapam.h | 25 +-
src/include/utils/rel.h | 1 -
src/include/utils/relcache.h | 1 -
.../regress/expected/generated_virtual.out | 4 +-
src/test/regress/expected/updatable_views.out | 4 +-
src/test/regress/sql/generated_virtual.sql | 4 +-
src/test/regress/sql/updatable_views.sql | 2 +-
12 files changed, 539 insertions(+), 479 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ab8b6ddb8de..93fd714ce58 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -29,6 +29,8 @@
*
*-------------------------------------------------------------------------
*/
+#include "access/sysattr.h"
+#include "nodes/lockoptions.h"
#include "postgres.h"
#include "access/heapam.h"
@@ -37,6 +39,7 @@
#include "access/multixact.h"
#include "access/subtrans.h"
#include "access/syncscan.h"
+#include "access/tableam.h"
#include "access/valid.h"
#include "access/visibilitymap.h"
#include "access/xloginsert.h"
@@ -51,6 +54,7 @@
#include "utils/datum.h"
#include "utils/injection_point.h"
#include "utils/inval.h"
+#include "utils/relcache.h"
#include "utils/spccache.h"
#include "utils/syscache.h"
@@ -62,16 +66,8 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
HeapTuple newtup, HeapTuple old_key_tuple,
bool all_visible_cleared, bool new_all_visible_cleared);
#ifdef USE_ASSERT_CHECKING
-static void check_lock_if_inplace_updateable_rel(Relation relation,
- const ItemPointerData *otid,
- HeapTuple newtup);
static void check_inplace_rel_lock(HeapTuple oldtup);
#endif
-static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
- Bitmapset *interesting_cols,
- Bitmapset *external_cols,
- HeapTuple oldtup, HeapTuple newtup,
- bool *has_external);
static bool heap_acquire_tuplock(Relation relation, const ItemPointerData *tid,
LockTupleMode mode, LockWaitPolicy wait_policy,
bool *have_tuple_lock);
@@ -3300,7 +3296,10 @@ simple_heap_delete(Relation relation, const ItemPointerData *tid)
* heap_update - replace a tuple
*
* See table_tuple_update() for an explanation of the parameters, except that
- * this routine directly takes a tuple rather than a slot.
+ * this routine directly takes a heap tuple rather than a slot.
+ *
+ * It's required that the caller has acquired the pin and lock on the buffer.
+ * That lock and pin will be managed here, not in the caller.
*
* In the failure cases, the routine fills *tmfd with the tuple's t_ctid,
* t_xmax (resolving a possible MultiXact, if necessary), and t_cmax (the last
@@ -3308,30 +3307,19 @@ simple_heap_delete(Relation relation, const ItemPointerData *tid)
* generated by another transaction).
*/
TM_Result
-heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
+heap_update(Relation relation, HeapTupleData *oldtup, HeapTuple newtup,
CommandId cid, Snapshot crosscheck, bool wait,
- TM_FailureData *tmfd, LockTupleMode *lockmode,
- const Bitmapset *modified_attrs, bool modified_attrs_valid,
- TU_UpdateIndexes *update_indexes)
+ TM_FailureData *tmfd, const LockTupleMode *lockmode,
+ Buffer buffer, Page page, BlockNumber block, ItemId lp,
+ bool hot_allowed, Buffer vmbuffer, bool rep_id_key_required)
{
TM_Result result;
TransactionId xid = GetCurrentTransactionId();
- Bitmapset *hot_attrs;
- Bitmapset *sum_attrs;
- Bitmapset *key_attrs;
- Bitmapset *id_attrs;
- Bitmapset *interesting_attrs;
- ItemId lp;
- HeapTupleData oldtup;
HeapTuple heaptup;
HeapTuple old_key_tuple = NULL;
bool old_key_copied = false;
- Page page;
- BlockNumber block;
MultiXactStatus mxact_status;
- Buffer buffer,
- newbuf,
- vmbuffer = InvalidBuffer,
+ Buffer newbuf,
vmbuffer_new = InvalidBuffer;
bool need_toast;
Size newtupsize,
@@ -3339,13 +3327,11 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
bool have_tuple_lock = false;
bool iscombo;
bool use_hot_update = false;
- bool summarized_update = false;
bool key_intact;
bool all_visible_cleared = false;
bool all_visible_cleared_new = false;
bool checked_lockers;
bool locker_remains;
- bool rep_id_key_required = false;
TransactionId xmax_new_tuple,
xmax_old_tuple;
uint16 infomask_old_tuple,
@@ -3353,204 +3339,13 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
infomask_new_tuple,
infomask2_new_tuple;
- Assert(ItemPointerIsValid(otid));
-
- /* Cheap, simplistic check that the tuple matches the rel's rowtype. */
- Assert(HeapTupleHeaderGetNatts(newtup->t_data) <=
- RelationGetNumberOfAttributes(relation));
-
+ Assert(BufferIsLockedByMe(buffer));
+ Assert(ItemIdIsNormal(lp));
AssertHasSnapshotForToast(relation);
- /*
- * Forbid this during a parallel operation, lest it allocate a combo CID.
- * Other workers might need that combo CID for visibility checks, and we
- * have no provision for broadcasting it to them.
- */
- if (IsInParallelMode())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot update tuples during a parallel operation")));
-
-#ifdef USE_ASSERT_CHECKING
- check_lock_if_inplace_updateable_rel(relation, otid, newtup);
-#endif
-
- /*
- * Fetch the list of attributes to be checked for various operations.
- *
- * For HOT considerations, this is wasted effort if we fail to update or
- * have to put the new tuple on a different page. But we must compute the
- * list before obtaining buffer lock --- in the worst case, if we are
- * doing an update on one of the relevant system catalogs, we could
- * deadlock if we try to fetch the list later. In any case, the relcache
- * caches the data so this is usually pretty cheap.
- *
- * We also need columns used by the replica identity and columns that are
- * considered the "key" of rows in the table.
- *
- * Note that we get copies of each bitmap, so we need not worry about
- * relcache flush happening midway through.
- */
- hot_attrs = RelationGetIndexAttrBitmap(relation,
- INDEX_ATTR_BITMAP_HOT_BLOCKING);
- sum_attrs = RelationGetIndexAttrBitmap(relation,
- INDEX_ATTR_BITMAP_SUMMARIZED);
- key_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_KEY);
- id_attrs = RelationGetIndexAttrBitmap(relation,
- INDEX_ATTR_BITMAP_IDENTITY_KEY);
- interesting_attrs = NULL;
- interesting_attrs = bms_add_members(interesting_attrs, hot_attrs);
- interesting_attrs = bms_add_members(interesting_attrs, sum_attrs);
- interesting_attrs = bms_add_members(interesting_attrs, key_attrs);
- interesting_attrs = bms_add_members(interesting_attrs, id_attrs);
-
- block = ItemPointerGetBlockNumber(otid);
- INJECTION_POINT("heap_update-before-pin", NULL);
- buffer = ReadBuffer(relation, block);
- page = BufferGetPage(buffer);
-
- /*
- * Before locking the buffer, pin the visibility map page if it appears to
- * be necessary. Since we haven't got the lock yet, someone else might be
- * in the middle of changing this, so we'll need to recheck after we have
- * the lock.
- */
- if (PageIsAllVisible(page))
- visibilitymap_pin(relation, block, &vmbuffer);
-
- LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
-
- lp = PageGetItemId(page, ItemPointerGetOffsetNumber(otid));
-
- /*
- * Usually, a buffer pin and/or snapshot blocks pruning of otid, ensuring
- * we see LP_NORMAL here. When the otid origin is a syscache, we may have
- * neither a pin nor a snapshot. Hence, we may see other LP_ states, each
- * of which indicates concurrent pruning.
- *
- * Failing with TM_Updated would be most accurate. However, unlike other
- * TM_Updated scenarios, we don't know the successor ctid in LP_UNUSED and
- * LP_DEAD cases. While the distinction between TM_Updated and TM_Deleted
- * does matter to SQL statements UPDATE and MERGE, those SQL statements
- * hold a snapshot that ensures LP_NORMAL. Hence, the choice between
- * TM_Updated and TM_Deleted affects only the wording of error messages.
- * Settle on TM_Deleted, for two reasons. First, it avoids complicating
- * the specification of when tmfd->ctid is valid. Second, it creates
- * error log evidence that we took this branch.
- *
- * Since it's possible to see LP_UNUSED at otid, it's also possible to see
- * LP_NORMAL for a tuple that replaced LP_UNUSED. If it's a tuple for an
- * unrelated row, we'll fail with "duplicate key value violates unique".
- * XXX if otid is the live, newer version of the newtup row, we'll discard
- * changes originating in versions of this catalog row after the version
- * the caller got from syscache. See syscache-update-pruned.spec.
- */
- if (!ItemIdIsNormal(lp))
- {
- Assert(RelationSupportsSysCache(RelationGetRelid(relation)));
-
- UnlockReleaseBuffer(buffer);
- Assert(!have_tuple_lock);
- if (vmbuffer != InvalidBuffer)
- ReleaseBuffer(vmbuffer);
- tmfd->ctid = *otid;
- tmfd->xmax = InvalidTransactionId;
- tmfd->cmax = InvalidCommandId;
- *update_indexes = TU_None;
-
- bms_free(hot_attrs);
- bms_free(sum_attrs);
- bms_free(key_attrs);
- bms_free(id_attrs);
- /* modified_attrs not yet initialized */
- bms_free(interesting_attrs);
- return TM_Deleted;
- }
-
- /*
- * Fill in enough data in oldtup for HeapDetermineColumnsInfo to work
- * properly.
- */
- oldtup.t_tableOid = RelationGetRelid(relation);
- oldtup.t_data = (HeapTupleHeader) PageGetItem(page, lp);
- oldtup.t_len = ItemIdGetLength(lp);
- oldtup.t_self = *otid;
-
- /* the new tuple is ready, except for this: */
+ /* The new tuple is ready, except for this */
newtup->t_tableOid = RelationGetRelid(relation);
- /*
- * Determine columns modified by the update. Additionally, identify
- * whether any of the unmodified replica identity key attributes in the
- * old tuple is externally stored or not. This is required because for
- * such attributes the flattened value won't be WAL logged as part of the
- * new tuple so we must include it as part of the old_key_tuple. See
- * ExtractReplicaIdentity.
- */
- if (!modified_attrs_valid)
- {
- bool id_has_external = false;
-
- modified_attrs = HeapDetermineColumnsInfo(relation, interesting_attrs,
- id_attrs, &oldtup,
- newtup, &id_has_external);
- rep_id_key_required = id_has_external ||
- bms_overlap(modified_attrs, id_attrs);
- }
- else
- {
- /*
- * ExtractReplicatIdentity() needs to know if a modified attrbute is
- * used as a replica indentity or if any of the unmodified indexed
- * attributes in the old tuple are stored externally and used as a
- * replica identity.
- */
- rep_id_key_required = bms_overlap(modified_attrs, id_attrs);
- if (!rep_id_key_required)
- {
- Bitmapset *attrs;
- TupleDesc tupdesc = RelationGetDescr(relation);
- int attidx = -1;
-
- /* Check all unmodified indexed replica identity key attributes */
- attrs = bms_difference(interesting_attrs, modified_attrs);
- attrs = bms_int_members(attrs, id_attrs);
-
- while ((attidx = bms_next_member(attrs, attidx)) >= 0)
- {
- /*
- * attidx is zero-based, attrnum is the normal attribute
- * number
- */
- AttrNumber attrnum = attidx + FirstLowInvalidHeapAttributeNumber;
- Datum value;
- bool isnull;
-
- /*
- * System attributes are not added into interesting_attrs in
- * relcache.
- */
- Assert(attrnum > 0);
-
- value = heap_getattr(&oldtup, attrnum, tupdesc, &isnull);
-
- /* No need to check attributes that can't be stored externally */
- if (isnull ||
- TupleDescCompactAttr(tupdesc, attrnum - 1)->attlen != -1)
- continue;
-
- /* Check if the old tuple's attribute is stored externally */
- if (VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value)))
- {
- rep_id_key_required = true;
- break;
- }
- }
-
- bms_free(attrs);
- }
- }
-
/*
* If we're not updating any "key" column, we can grab a weaker lock type.
* This allows for more concurrency when we are running simultaneously
@@ -3562,9 +3357,8 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
* is updates that don't manipulate key columns, not those that
* serendipitously arrive at the same key values.
*/
- if (!bms_overlap(modified_attrs, key_attrs))
+ if (*lockmode == LockTupleNoKeyExclusive)
{
- *lockmode = LockTupleNoKeyExclusive;
mxact_status = MultiXactStatusNoKeyUpdate;
key_intact = true;
@@ -3581,22 +3375,15 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
}
else
{
- *lockmode = LockTupleExclusive;
+ Assert(*lockmode == LockTupleExclusive);
mxact_status = MultiXactStatusUpdate;
key_intact = false;
}
- /*
- * Note: beyond this point, use oldtup not otid to refer to old tuple.
- * otid may very well point at newtup->t_self, which we will overwrite
- * with the new tuple's location, so there's great risk of confusion if we
- * use otid anymore.
- */
-
l2:
checked_lockers = false;
locker_remains = false;
- result = HeapTupleSatisfiesUpdate(&oldtup, cid, buffer);
+ result = HeapTupleSatisfiesUpdate(oldtup, cid, buffer);
/* see below about the "no wait" case */
Assert(result != TM_BeingModified || wait);
@@ -3628,8 +3415,8 @@ l2:
*/
/* must copy state data before unlocking buffer */
- xwait = HeapTupleHeaderGetRawXmax(oldtup.t_data);
- infomask = oldtup.t_data->t_infomask;
+ xwait = HeapTupleHeaderGetRawXmax(oldtup->t_data);
+ infomask = oldtup->t_data->t_infomask;
/*
* Now we have to do something about the existing locker. If it's a
@@ -3669,13 +3456,12 @@ l2:
* requesting a lock and already have one; avoids deadlock).
*/
if (!current_is_member)
- heap_acquire_tuplock(relation, &(oldtup.t_self), *lockmode,
+ heap_acquire_tuplock(relation, &oldtup->t_self, *lockmode,
LockWaitBlock, &have_tuple_lock);
/* wait for multixact */
MultiXactIdWait((MultiXactId) xwait, mxact_status, infomask,
- relation, &oldtup.t_self, XLTW_Update,
- &remain);
+ relation, &oldtup->t_self, XLTW_Update, &remain);
checked_lockers = true;
locker_remains = remain != 0;
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
@@ -3685,9 +3471,9 @@ l2:
* could update this tuple before we get to this point. Check
* for xmax change, and start over if so.
*/
- if (xmax_infomask_changed(oldtup.t_data->t_infomask,
+ if (xmax_infomask_changed(oldtup->t_data->t_infomask,
infomask) ||
- !TransactionIdEquals(HeapTupleHeaderGetRawXmax(oldtup.t_data),
+ !TransactionIdEquals(HeapTupleHeaderGetRawXmax(oldtup->t_data),
xwait))
goto l2;
}
@@ -3712,8 +3498,8 @@ l2:
* before this one, which are important to keep in case this
* subxact aborts.
*/
- if (!HEAP_XMAX_IS_LOCKED_ONLY(oldtup.t_data->t_infomask))
- update_xact = HeapTupleGetUpdateXid(oldtup.t_data);
+ if (!HEAP_XMAX_IS_LOCKED_ONLY(oldtup->t_data->t_infomask))
+ update_xact = HeapTupleGetUpdateXid(oldtup->t_data);
else
update_xact = InvalidTransactionId;
@@ -3754,9 +3540,9 @@ l2:
* lock.
*/
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
- heap_acquire_tuplock(relation, &(oldtup.t_self), *lockmode,
+ heap_acquire_tuplock(relation, &oldtup->t_self, *lockmode,
LockWaitBlock, &have_tuple_lock);
- XactLockTableWait(xwait, relation, &oldtup.t_self,
+ XactLockTableWait(xwait, relation, &oldtup->t_self,
XLTW_Update);
checked_lockers = true;
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
@@ -3766,20 +3552,20 @@ l2:
* other xact could update this tuple before we get to this point.
* Check for xmax change, and start over if so.
*/
- if (xmax_infomask_changed(oldtup.t_data->t_infomask, infomask) ||
+ if (xmax_infomask_changed(oldtup->t_data->t_infomask, infomask) ||
!TransactionIdEquals(xwait,
- HeapTupleHeaderGetRawXmax(oldtup.t_data)))
+ HeapTupleHeaderGetRawXmax(oldtup->t_data)))
goto l2;
/* Otherwise check if it committed or aborted */
- UpdateXmaxHintBits(oldtup.t_data, buffer, xwait);
- if (oldtup.t_data->t_infomask & HEAP_XMAX_INVALID)
+ UpdateXmaxHintBits(oldtup->t_data, buffer, xwait);
+ if (oldtup->t_data->t_infomask & HEAP_XMAX_INVALID)
can_continue = true;
}
if (can_continue)
result = TM_Ok;
- else if (!ItemPointerEquals(&oldtup.t_self, &oldtup.t_data->t_ctid))
+ else if (!ItemPointerEquals(&oldtup->t_self, &oldtup->t_data->t_ctid))
result = TM_Updated;
else
result = TM_Deleted;
@@ -3792,39 +3578,32 @@ l2:
result == TM_Updated ||
result == TM_Deleted ||
result == TM_BeingModified);
- Assert(!(oldtup.t_data->t_infomask & HEAP_XMAX_INVALID));
+ Assert(!(oldtup->t_data->t_infomask & HEAP_XMAX_INVALID));
Assert(result != TM_Updated ||
- !ItemPointerEquals(&oldtup.t_self, &oldtup.t_data->t_ctid));
+ !ItemPointerEquals(&oldtup->t_self, &oldtup->t_data->t_ctid));
}
if (crosscheck != InvalidSnapshot && result == TM_Ok)
{
/* Perform additional check for transaction-snapshot mode RI updates */
- if (!HeapTupleSatisfiesVisibility(&oldtup, crosscheck, buffer))
+ if (!HeapTupleSatisfiesVisibility(oldtup, crosscheck, buffer))
result = TM_Updated;
}
if (result != TM_Ok)
{
- tmfd->ctid = oldtup.t_data->t_ctid;
- tmfd->xmax = HeapTupleHeaderGetUpdateXid(oldtup.t_data);
+ tmfd->ctid = oldtup->t_data->t_ctid;
+ tmfd->xmax = HeapTupleHeaderGetUpdateXid(oldtup->t_data);
if (result == TM_SelfModified)
- tmfd->cmax = HeapTupleHeaderGetCmax(oldtup.t_data);
+ tmfd->cmax = HeapTupleHeaderGetCmax(oldtup->t_data);
else
tmfd->cmax = InvalidCommandId;
UnlockReleaseBuffer(buffer);
if (have_tuple_lock)
- UnlockTupleTuplock(relation, &(oldtup.t_self), *lockmode);
+ UnlockTupleTuplock(relation, &oldtup->t_self, *lockmode);
if (vmbuffer != InvalidBuffer)
ReleaseBuffer(vmbuffer);
- *update_indexes = TU_None;
- bms_free(hot_attrs);
- bms_free(sum_attrs);
- bms_free(key_attrs);
- bms_free(id_attrs);
- /* modified attrs is passed in and free'd by the caller, or NULL */
- bms_free(interesting_attrs);
return result;
}
@@ -3851,9 +3630,9 @@ l2:
* If the tuple we're updating is locked, we need to preserve the locking
* info in the old tuple's Xmax. Prepare a new Xmax value for this.
*/
- compute_new_xmax_infomask(HeapTupleHeaderGetRawXmax(oldtup.t_data),
- oldtup.t_data->t_infomask,
- oldtup.t_data->t_infomask2,
+ compute_new_xmax_infomask(HeapTupleHeaderGetRawXmax(oldtup->t_data),
+ oldtup->t_data->t_infomask,
+ oldtup->t_data->t_infomask2,
xid, *lockmode, true,
&xmax_old_tuple, &infomask_old_tuple,
&infomask2_old_tuple);
@@ -3865,12 +3644,12 @@ l2:
* tuple. (In rare cases that might also be InvalidTransactionId and yet
* not have the HEAP_XMAX_INVALID bit set; that's fine.)
*/
- if ((oldtup.t_data->t_infomask & HEAP_XMAX_INVALID) ||
- HEAP_LOCKED_UPGRADED(oldtup.t_data->t_infomask) ||
+ if ((oldtup->t_data->t_infomask & HEAP_XMAX_INVALID) ||
+ HEAP_LOCKED_UPGRADED(oldtup->t_data->t_infomask) ||
(checked_lockers && !locker_remains))
xmax_new_tuple = InvalidTransactionId;
else
- xmax_new_tuple = HeapTupleHeaderGetRawXmax(oldtup.t_data);
+ xmax_new_tuple = HeapTupleHeaderGetRawXmax(oldtup->t_data);
if (!TransactionIdIsValid(xmax_new_tuple))
{
@@ -3885,7 +3664,7 @@ l2:
* Note that since we're doing an update, the only possibility is that
* the lockers had FOR KEY SHARE lock.
*/
- if (oldtup.t_data->t_infomask & HEAP_XMAX_IS_MULTI)
+ if (oldtup->t_data->t_infomask & HEAP_XMAX_IS_MULTI)
{
GetMultiXactIdHintBits(xmax_new_tuple, &infomask_new_tuple,
&infomask2_new_tuple);
@@ -3913,7 +3692,7 @@ l2:
* Replace cid with a combo CID if necessary. Note that we already put
* the plain cid into the new tuple.
*/
- HeapTupleHeaderAdjustCmax(oldtup.t_data, &cid, &iscombo);
+ HeapTupleHeaderAdjustCmax(oldtup->t_data, &cid, &iscombo);
/*
* If the toaster needs to be activated, OR if the new tuple will not fit
@@ -3930,12 +3709,12 @@ l2:
relation->rd_rel->relkind != RELKIND_MATVIEW)
{
/* toast table entries should never be recursively toasted */
- Assert(!HeapTupleHasExternal(&oldtup));
+ Assert(!HeapTupleHasExternal(oldtup));
Assert(!HeapTupleHasExternal(newtup));
need_toast = false;
}
else
- need_toast = (HeapTupleHasExternal(&oldtup) ||
+ need_toast = (HeapTupleHasExternal(oldtup) ||
HeapTupleHasExternal(newtup) ||
newtup->t_len > TOAST_TUPLE_THRESHOLD);
@@ -3968,9 +3747,9 @@ l2:
* updating, because the potentially created multixact would otherwise
* be wrong.
*/
- compute_new_xmax_infomask(HeapTupleHeaderGetRawXmax(oldtup.t_data),
- oldtup.t_data->t_infomask,
- oldtup.t_data->t_infomask2,
+ compute_new_xmax_infomask(HeapTupleHeaderGetRawXmax(oldtup->t_data),
+ oldtup->t_data->t_infomask,
+ oldtup->t_data->t_infomask2,
xid, *lockmode, false,
&xmax_lock_old_tuple, &infomask_lock_old_tuple,
&infomask2_lock_old_tuple);
@@ -3980,18 +3759,18 @@ l2:
START_CRIT_SECTION();
/* Clear obsolete visibility flags ... */
- oldtup.t_data->t_infomask &= ~(HEAP_XMAX_BITS | HEAP_MOVED);
- oldtup.t_data->t_infomask2 &= ~HEAP_KEYS_UPDATED;
- HeapTupleClearHotUpdated(&oldtup);
+ oldtup->t_data->t_infomask &= ~(HEAP_XMAX_BITS | HEAP_MOVED);
+ oldtup->t_data->t_infomask2 &= ~HEAP_KEYS_UPDATED;
+ HeapTupleClearHotUpdated(oldtup);
/* ... and store info about transaction updating this tuple */
Assert(TransactionIdIsValid(xmax_lock_old_tuple));
- HeapTupleHeaderSetXmax(oldtup.t_data, xmax_lock_old_tuple);
- oldtup.t_data->t_infomask |= infomask_lock_old_tuple;
- oldtup.t_data->t_infomask2 |= infomask2_lock_old_tuple;
- HeapTupleHeaderSetCmax(oldtup.t_data, cid, iscombo);
+ HeapTupleHeaderSetXmax(oldtup->t_data, xmax_lock_old_tuple);
+ oldtup->t_data->t_infomask |= infomask_lock_old_tuple;
+ oldtup->t_data->t_infomask2 |= infomask2_lock_old_tuple;
+ HeapTupleHeaderSetCmax(oldtup->t_data, cid, iscombo);
/* temporarily make it look not-updated, but locked */
- oldtup.t_data->t_ctid = oldtup.t_self;
+ oldtup->t_data->t_ctid = oldtup->t_self;
/*
* Clear all-frozen bit on visibility map if needed. We could
@@ -4014,10 +3793,10 @@ l2:
XLogBeginInsert();
XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
- xlrec.offnum = ItemPointerGetOffsetNumber(&oldtup.t_self);
+ xlrec.offnum = ItemPointerGetOffsetNumber(&oldtup->t_self);
xlrec.xmax = xmax_lock_old_tuple;
- xlrec.infobits_set = compute_infobits(oldtup.t_data->t_infomask,
- oldtup.t_data->t_infomask2);
+ xlrec.infobits_set = compute_infobits(oldtup->t_data->t_infomask,
+ oldtup->t_data->t_infomask2);
xlrec.flags =
cleared_all_frozen ? XLH_LOCK_ALL_FROZEN_CLEARED : 0;
XLogRegisterData(&xlrec, SizeOfHeapLock);
@@ -4039,7 +3818,7 @@ l2:
if (need_toast)
{
/* Note we always use WAL and FSM during updates */
- heaptup = heap_toast_insert_or_update(relation, newtup, &oldtup, 0);
+ heaptup = heap_toast_insert_or_update(relation, newtup, oldtup, 0);
newtupsize = MAXALIGN(heaptup->t_len);
}
else
@@ -4126,42 +3905,21 @@ l2:
* will include checking the relation level, there is no benefit to a
* separate check for the new tuple.
*/
- CheckForSerializableConflictIn(relation, &oldtup.t_self,
+ CheckForSerializableConflictIn(relation, &oldtup->t_self,
BufferGetBlockNumber(buffer));
/*
* At this point newbuf and buffer are both pinned and locked, and newbuf
- * has enough space for the new tuple. If they are the same buffer, only
- * one pin is held.
+ * has enough space for the new tuple so we can use the HOT update path if
+ * the caller determined that it is allowable.
+ *
+ * NOTE: If newbuf == buffer then only one pin is held.
*/
-
- if (newbuf == buffer)
- {
- /*
- * Since the new tuple is going into the same page, we might be able
- * to do a HOT update. Check if any of the index columns have been
- * changed.
- */
- if (!bms_overlap(modified_attrs, hot_attrs))
- {
- use_hot_update = true;
-
- /*
- * If none of the columns that are used in hot-blocking indexes
- * were updated, we can apply HOT, but we do still need to check
- * if we need to update the summarizing indexes, and update those
- * indexes if the columns were updated, or we may fail to detect
- * e.g. value bound changes in BRIN minmax indexes.
- */
- if (bms_overlap(modified_attrs, sum_attrs))
- summarized_update = true;
- }
- }
+ if ((newbuf == buffer) && hot_allowed)
+ use_hot_update = true;
else
- {
/* Set a hint that the old page could use prune/defrag */
PageSetFull(page);
- }
/*
* Compute replica identity tuple before entering the critical section so
@@ -4170,8 +3928,7 @@ l2:
* logged. Pass old key required as true only if the replica identity key
* columns are modified or it has external data.
*/
- old_key_tuple = ExtractReplicaIdentity(relation, &oldtup,
- rep_id_key_required,
+ old_key_tuple = ExtractReplicaIdentity(relation, oldtup, rep_id_key_required,
&old_key_copied);
/* NO EREPORT(ERROR) from here till changes are logged */
@@ -4194,7 +3951,7 @@ l2:
if (use_hot_update)
{
/* Mark the old tuple as HOT-updated */
- HeapTupleSetHotUpdated(&oldtup);
+ HeapTupleSetHotUpdated(oldtup);
/* And mark the new tuple as heap-only */
HeapTupleSetHeapOnly(heaptup);
/* Mark the caller's copy too, in case different from heaptup */
@@ -4203,7 +3960,7 @@ l2:
else
{
/* Make sure tuples are correctly marked as not-HOT */
- HeapTupleClearHotUpdated(&oldtup);
+ HeapTupleClearHotUpdated(oldtup);
HeapTupleClearHeapOnly(heaptup);
HeapTupleClearHeapOnly(newtup);
}
@@ -4212,17 +3969,17 @@ l2:
/* Clear obsolete visibility flags, possibly set by ourselves above... */
- oldtup.t_data->t_infomask &= ~(HEAP_XMAX_BITS | HEAP_MOVED);
- oldtup.t_data->t_infomask2 &= ~HEAP_KEYS_UPDATED;
+ oldtup->t_data->t_infomask &= ~(HEAP_XMAX_BITS | HEAP_MOVED);
+ oldtup->t_data->t_infomask2 &= ~HEAP_KEYS_UPDATED;
/* ... and store info about transaction updating this tuple */
Assert(TransactionIdIsValid(xmax_old_tuple));
- HeapTupleHeaderSetXmax(oldtup.t_data, xmax_old_tuple);
- oldtup.t_data->t_infomask |= infomask_old_tuple;
- oldtup.t_data->t_infomask2 |= infomask2_old_tuple;
- HeapTupleHeaderSetCmax(oldtup.t_data, cid, iscombo);
+ HeapTupleHeaderSetXmax(oldtup->t_data, xmax_old_tuple);
+ oldtup->t_data->t_infomask |= infomask_old_tuple;
+ oldtup->t_data->t_infomask2 |= infomask2_old_tuple;
+ HeapTupleHeaderSetCmax(oldtup->t_data, cid, iscombo);
/* record address of new tuple in t_ctid of old one */
- oldtup.t_data->t_ctid = heaptup->t_self;
+ oldtup->t_data->t_ctid = heaptup->t_self;
/* clear PD_ALL_VISIBLE flags, reset all visibilitymap bits */
if (PageIsAllVisible(BufferGetPage(buffer)))
@@ -4255,12 +4012,12 @@ l2:
*/
if (RelationIsAccessibleInLogicalDecoding(relation))
{
- log_heap_new_cid(relation, &oldtup);
+ log_heap_new_cid(relation, oldtup);
log_heap_new_cid(relation, heaptup);
}
recptr = log_heap_update(relation, buffer,
- newbuf, &oldtup, heaptup,
+ newbuf, oldtup, heaptup,
old_key_tuple,
all_visible_cleared,
all_visible_cleared_new);
@@ -4285,7 +4042,7 @@ l2:
* both tuple versions in one call to inval.c so we can avoid redundant
* sinval messages.)
*/
- CacheInvalidateHeapTuple(relation, &oldtup, heaptup);
+ CacheInvalidateHeapTuple(relation, oldtup, heaptup);
/* Now we can release the buffer(s) */
if (newbuf != buffer)
@@ -4300,7 +4057,7 @@ l2:
* Release the lmgr tuple lock, if we had it.
*/
if (have_tuple_lock)
- UnlockTupleTuplock(relation, &(oldtup.t_self), *lockmode);
+ UnlockTupleTuplock(relation, &oldtup->t_self, *lockmode);
pgstat_count_heap_update(relation, use_hot_update, newbuf != buffer);
@@ -4314,32 +4071,9 @@ l2:
heap_freetuple(heaptup);
}
- /*
- * If it is a HOT update, the update may still need to update summarized
- * indexes, lest we fail to update those summaries and get incorrect
- * results (for example, minmax bounds of the block may change with this
- * update).
- */
- if (use_hot_update)
- {
- if (summarized_update)
- *update_indexes = TU_Summarizing;
- else
- *update_indexes = TU_None;
- }
- else
- *update_indexes = TU_All;
-
if (old_key_tuple != NULL && old_key_copied)
heap_freetuple(old_key_tuple);
- bms_free(hot_attrs);
- bms_free(sum_attrs);
- bms_free(key_attrs);
- bms_free(id_attrs);
- /* modified attrs is passed in and free'd by the caller, or NULL */
- bms_free(interesting_attrs);
-
return TM_Ok;
}
@@ -4348,7 +4082,7 @@ l2:
* Confirm adequate lock held during heap_update(), per rules from
* README.tuplock section "Locking to write inplace-updated tables".
*/
-static void
+void
check_lock_if_inplace_updateable_rel(Relation relation,
const ItemPointerData *otid,
HeapTuple newtup)
@@ -4510,6 +4244,162 @@ heap_attr_equals(TupleDesc tupdesc, int attrnum, Datum value1, Datum value2,
}
}
+/*
+ * HOT updates are possible when either: a) there are no modified indexed
+ * attributes, or b) the modified attributes are all on summarizing indexes.
+ * Later, in heap_update(), we can choose to perform a HOT update if there is
+ * space on the page for the new tuple and the following code has determined
+ * that HOT is allowed.
+ */
+bool
+HeapUpdateHotAllowable(Relation relation, const Bitmapset *mix_attrs, bool *summarized_only)
+{
+ bool hot_allowed;
+
+ /*
+ * Let's be optimistic and start off by assuming the best case, no indexes
+ * need updating and HOT is allowable.
+ */
+ hot_allowed = true;
+ *summarized_only = false;
+
+ /*
+ * Check for case (a); when there are no modified index attributes HOT is
+ * allowed.
+ */
+ if (bms_is_empty(mix_attrs))
+ hot_allowed = true;
+ else
+ {
+ Bitmapset *sum_attrs = RelationGetIndexAttrBitmap(relation,
+ INDEX_ATTR_BITMAP_SUMMARIZED);
+
+ /*
+ * At least one index attribute was modified, but is this case (b)
+ * where all the modified index attributes are only used by
+ * summarizing indexes?
+ */
+ if (bms_is_subset(mix_attrs, sum_attrs))
+ {
+ hot_allowed = true;
+ *summarized_only = true;
+ }
+ else
+ {
+ /*
+ * Now we know that one or more indexed attribute were updated and
+ * that there was at least one of those attributes were referenced
+ * by a non-summarizing index. HOT is not allowed.
+ */
+ hot_allowed = false;
+ }
+
+ bms_free(sum_attrs);
+ }
+
+ return hot_allowed;
+}
+
+bool
+HeapUpdateRequiresReplicaId(Relation relation, const Bitmapset *mix_attrs,
+ HeapTupleData *tuple)
+{
+ bool rep_id_key_required;
+ Bitmapset *rid_attrs,
+ *idx_attrs;
+
+ rid_attrs = RelationGetIndexAttrBitmap(relation,
+ INDEX_ATTR_BITMAP_IDENTITY_KEY);
+
+ if (bms_is_empty(rid_attrs))
+ {
+ bms_free(rid_attrs);
+ return false;
+ }
+
+ idx_attrs = RelationGetIndexAttrBitmap(relation,
+ INDEX_ATTR_BITMAP_INDEXED);
+
+ /*
+ * ExtractReplicatIdentity() needs to know if a modified indexed attrbute
+ * is used as a replica indentity or if any of the unmodified indexed
+ * attributes in the old tuple are stored externally and used as a replica
+ * identity.
+ */
+ rep_id_key_required = bms_overlap(mix_attrs, rid_attrs);
+ if (!rep_id_key_required)
+ {
+ TupleDesc tupdesc = RelationGetDescr(relation);
+ int attidx;
+
+ /* Check only unmodified indexed replica identity key attributes */
+ idx_attrs = bms_del_members(idx_attrs, mix_attrs);
+ rid_attrs = bms_int_members(rid_attrs, idx_attrs);
+
+ /*
+ * Start traversing the bitmap after any system or whole row
+ * attributes, they don't influence replica identity.
+ */
+ attidx = -FirstLowInvalidHeapAttributeNumber;
+
+ while ((attidx = bms_next_member(rid_attrs, attidx)) >= 0)
+ {
+ /*
+ * attidx is zero-based, attrnum is the normal attribute number
+ */
+ AttrNumber attrnum = attidx + FirstLowInvalidHeapAttributeNumber;
+ Datum value;
+ bool isnull;
+
+ /*
+ * System attributes are not added into interesting_attrs in
+ * relcache.
+ */
+ Assert(attrnum > 0);
+
+ value = heap_getattr(tuple, attrnum, tupdesc, &isnull);
+
+ /* No need to check attributes that can't be stored externally */
+ if (isnull ||
+ TupleDescCompactAttr(tupdesc, attrnum - 1)->attlen != -1)
+ continue;
+
+ /* Check if the old tuple's attribute is stored externally */
+ if (VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value)))
+ {
+ rep_id_key_required = true;
+ break;
+ }
+ }
+ }
+
+ bms_free(rid_attrs);
+ bms_free(idx_attrs);
+
+ return rep_id_key_required;
+}
+
+/*
+ * If we're not updating any "key" attributes, we can grab a weaker lock type.
+ * This allows for more concurrency when we are running simultaneously with
+ * foreign key checks.
+ */
+LockTupleMode
+HeapUpdateDetermineLockmode(Relation relation, const Bitmapset *mix_attrs)
+{
+ LockTupleMode lockmode = LockTupleExclusive;
+
+ Bitmapset *key_attrs = RelationGetIndexAttrBitmap(relation,
+ INDEX_ATTR_BITMAP_KEY);
+
+ if (!bms_overlap(mix_attrs, key_attrs))
+ lockmode = LockTupleNoKeyExclusive;
+
+ bms_free(key_attrs);
+
+ return lockmode;
+}
+
/*
* Check which columns are being updated.
*
@@ -4520,12 +4410,10 @@ heap_attr_equals(TupleDesc tupdesc, int attrnum, Datum value1, Datum value2,
* listed as interesting) of the old tuple is a member of external_cols and is
* stored externally.
*/
-static Bitmapset *
+Bitmapset *
HeapDetermineColumnsInfo(Relation relation,
Bitmapset *interesting_cols,
- Bitmapset *external_cols,
- HeapTuple oldtup, HeapTuple newtup,
- bool *has_external)
+ HeapTuple oldtup, HeapTuple newtup)
{
int attidx;
Bitmapset *modified = NULL;
@@ -4567,10 +4455,11 @@ HeapDetermineColumnsInfo(Relation relation,
}
/*
- * Extract the corresponding values. XXX this is pretty inefficient
- * if there are many indexed columns. Should we do a single
- * heap_deform_tuple call on each tuple, instead? But that doesn't
- * work for system columns ...
+ * Extract the corresponding values.
+ *
+ * XXX this is pretty inefficient if there are many indexed columns.
+ * Should we do a single heap_deform_tuple call on each tuple,
+ * instead? But that doesn't work for system columns ...
*/
value1 = heap_getattr(oldtup, attrnum, tupdesc, &isnull1);
value2 = heap_getattr(newtup, attrnum, tupdesc, &isnull2);
@@ -4581,48 +4470,146 @@ HeapDetermineColumnsInfo(Relation relation,
modified = bms_add_member(modified, attidx);
continue;
}
-
- /*
- * No need to check attributes that can't be stored externally. Note
- * that system attributes can't be stored externally.
- */
- if (attrnum < 0 || isnull1 ||
- TupleDescCompactAttr(tupdesc, attrnum - 1)->attlen != -1)
- continue;
-
- /*
- * Check if the old tuple's attribute is stored externally and is a
- * member of external_cols.
- */
- if (VARATT_IS_EXTERNAL((varlena *) DatumGetPointer(value1)) &&
- bms_is_member(attidx, external_cols))
- *has_external = true;
}
return modified;
}
/*
- * simple_heap_update - replace a tuple
- *
- * This routine may be used to update a tuple when concurrent updates of
- * the target tuple are not expected (for example, because we have a lock
- * on the relation associated with the tuple). Any failure is reported
- * via ereport().
+ * This routine may be used to update a tuple when concurrent updates of the
+ * target tuple are not expected (for example, because we have a lock on the
+ * relation associated with the tuple). Any failure is reported via ereport().
*/
void
-simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tup,
+simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tuple,
TU_UpdateIndexes *update_indexes)
{
TM_Result result;
TM_FailureData tmfd;
LockTupleMode lockmode;
+ Buffer buffer;
+ Buffer vmbuffer = InvalidBuffer;
+ Page page;
+ BlockNumber block;
+ Bitmapset *sum_attrs,
+ *mix_attrs,
+ *idx_attrs;
+ ItemId lp;
+ HeapTupleData oldtup;
+ bool hot_allowed;
+ bool summarized_only;
+ bool rep_id_key_required = false;
- result = heap_update(relation, otid, tup,
- GetCurrentCommandId(true), InvalidSnapshot,
- true /* wait for commit */ ,
- &tmfd, &lockmode,
- NULL, false, update_indexes);
+ Assert(ItemPointerIsValid(otid));
+
+ /* Cheap, simplistic check that the tuple matches the rel's rowtype. */
+ Assert(HeapTupleHeaderGetNatts(tuple->t_data) <=
+ RelationGetNumberOfAttributes(relation));
+
+ /*
+ * Forbid this during a parallel operation, lest it allocate a combo CID.
+ * Other workers might need that combo CID for visibility checks, and we
+ * have no provision for broadcasting it to them.
+ */
+ if (IsInParallelMode())
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
+ errmsg("cannot update tuples during a parallel operation")));
+
+#ifdef USE_ASSERT_CHECKING
+ check_lock_if_inplace_updateable_rel(relation, otid, tuple);
+#endif
+
+ /*
+ * We must fetch these bitmaps of attributes from relcache to be checked
+ * for various operations below before obtaining a buffer lock because if
+ * we are doing an update on one of the relevant system catalogs we could
+ * deadlock if we try to fetch them later on. Relcache will return copies
+ * of each bitmap, so we need not worry about relcache flush happening
+ * midway through this operation.
+ */
+ idx_attrs = RelationGetIndexAttrBitmap(relation,
+ INDEX_ATTR_BITMAP_INDEXED);
+ sum_attrs = RelationGetIndexAttrBitmap(relation,
+ INDEX_ATTR_BITMAP_SUMMARIZED);
+
+ block = ItemPointerGetBlockNumber(otid);
+ INJECTION_POINT("heap_update-before-pin", NULL);
+ buffer = ReadBuffer(relation, block);
+ page = BufferGetPage(buffer);
+
+ /*
+ * Before locking the buffer, pin the visibility map page if it appears to
+ * be necessary. Since we haven't got the lock yet, someone else might be
+ * in the middle of changing this, so we'll need to recheck after we have
+ * the lock.
+ */
+ if (PageIsAllVisible(page))
+ visibilitymap_pin(relation, block, &vmbuffer);
+
+ LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+
+ lp = PageGetItemId(page, ItemPointerGetOffsetNumber(otid));
+
+ /*
+ * Usually, a buffer pin and/or snapshot blocks pruning of otid, ensuring
+ * we see LP_NORMAL here. When the otid origin is a syscache, we may have
+ * neither a pin nor a snapshot. Hence, we may see other LP_ states, each
+ * of which indicates concurrent pruning.
+ *
+ * Failing with TM_Updated would be most accurate. However, unlike other
+ * TM_Updated scenarios, we don't know the successor ctid in LP_UNUSED and
+ * LP_DEAD cases. While the distinction between TM_Updated and TM_Deleted
+ * does matter to SQL statements UPDATE and MERGE, those SQL statements
+ * hold a snapshot that ensures LP_NORMAL. Hence, the choice between
+ * TM_Updated and TM_Deleted affects only the wording of error messages.
+ * Settle on TM_Deleted, for two reasons. First, it avoids complicating
+ * the specification of when tmfd->ctid is valid. Second, it creates
+ * error log evidence that we took this branch.
+ *
+ * Since it's possible to see LP_UNUSED at otid, it's also possible to see
+ * LP_NORMAL for a tuple that replaced LP_UNUSED. If it's a tuple for an
+ * unrelated row, we'll fail with "duplicate key value violates unique".
+ * XXX if otid is the live, newer version of the newtup row, we'll discard
+ * changes originating in versions of this catalog row after the version
+ * the caller got from syscache. See syscache-update-pruned.spec.
+ */
+ if (!ItemIdIsNormal(lp))
+ {
+ Assert(RelationSupportsSysCache(RelationGetRelid(relation)));
+
+ UnlockReleaseBuffer(buffer);
+ if (vmbuffer != InvalidBuffer)
+ ReleaseBuffer(vmbuffer);
+ *update_indexes = TU_None;
+
+ bms_free(idx_attrs);
+ bms_free(sum_attrs);
+ /* mix_attrs not yet initialized */
+
+ elog(ERROR, "tuple concurrently deleted");
+ }
+
+ /*
+ * Partially construct the oldtup for HeapDetermineColumnsInfo to work and
+ * then pass that on to heap_update.
+ */
+ oldtup.t_tableOid = RelationGetRelid(relation);
+ oldtup.t_data = (HeapTupleHeader) PageGetItem(page, lp);
+ oldtup.t_len = ItemIdGetLength(lp);
+ oldtup.t_self = *otid;
+
+ mix_attrs = HeapDetermineColumnsInfo(relation, idx_attrs, &oldtup, tuple);
+ lockmode = HeapUpdateDetermineLockmode(relation, mix_attrs);
+ rep_id_key_required = HeapUpdateRequiresReplicaId(relation, mix_attrs, &oldtup);
+ hot_allowed = HeapUpdateHotAllowable(relation, mix_attrs, &summarized_only);
+
+ result = heap_update(relation, &oldtup, tuple, GetCurrentCommandId(true),
+ InvalidSnapshot, true /* wait for commit */ ,
+ &tmfd, &lockmode, buffer, page, block, lp, hot_allowed,
+ vmbuffer, rep_id_key_required);
+
+ *update_indexes = TU_None;
switch (result)
{
case TM_SelfModified:
@@ -4632,6 +4619,10 @@ simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tup
case TM_Ok:
/* done successfully */
+ if (!HeapTupleIsHeapOnly(tuple))
+ *update_indexes = TU_All;
+ else if (summarized_only)
+ *update_indexes = TU_Summarizing;
break;
case TM_Updated:
@@ -4646,8 +4637,10 @@ simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tup
elog(ERROR, "unrecognized heap_update status: %u", result);
break;
}
-}
+ bms_free(idx_attrs);
+ bms_free(sum_attrs);
+}
/*
* Return the MultiXactStatus corresponding to the given tuple lock mode.
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 2d74fa90c7f..54d117ea151 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -44,6 +44,7 @@
#include "storage/procarray.h"
#include "storage/smgr.h"
#include "utils/builtins.h"
+#include "utils/injection_point.h"
#include "utils/rel.h"
static void reform_and_rewrite_tuple(HeapTuple tuple,
@@ -316,18 +317,97 @@ static TM_Result
heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
CommandId cid, Snapshot snapshot, Snapshot crosscheck,
bool wait, TM_FailureData *tmfd, LockTupleMode *lockmode,
- const Bitmapset *modified_attrs, TU_UpdateIndexes *update_indexes)
+ const Bitmapset *mix_attrs, TU_UpdateIndexes *update_indexes)
{
- bool shouldFree = true;
- HeapTuple tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);
+ bool shouldFree = false;
+ HeapTuple tuple;
+ bool rep_id_key_required = false;
+ bool hot_allowed;
+ bool summarized_only;
+ HeapTupleData oldtup;
+ Buffer buffer;
+ Buffer vmbuffer = InvalidBuffer;
+ Page page;
+ BlockNumber block;
+ ItemId lp;
TM_Result result;
+ Assert(ItemPointerIsValid(otid));
+
+ tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);
+
+ /* Cheap, simplistic check that the tuple matches the rel's rowtype. */
+ Assert(HeapTupleHeaderGetNatts(tuple->t_data) <=
+ RelationGetNumberOfAttributes(relation));
+
+ /*
+ * Forbid this during a parallel operation, lest it allocate a combo CID.
+ * Other workers might need that combo CID for visibility checks, and we
+ * have no provision for broadcasting it to them.
+ */
+ if (IsInParallelMode())
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
+ errmsg("cannot update tuples during a parallel operation")));
+
+#ifdef USE_ASSERT_CHECKING
+ check_lock_if_inplace_updateable_rel(relation, otid, tuple);
+#endif
+
+ hot_allowed = HeapUpdateHotAllowable(relation, mix_attrs, &summarized_only);
+ *lockmode = HeapUpdateDetermineLockmode(relation, mix_attrs);
+
+ block = ItemPointerGetBlockNumber(otid);
+ INJECTION_POINT("heap_update-before-pin", NULL);
+ buffer = ReadBuffer(relation, block);
+ page = BufferGetPage(buffer);
+
+ /*
+ * Before locking the buffer, pin the visibility map page if it appears to
+ * be necessary. Since we haven't got the lock yet, someone else might be
+ * in the middle of changing this, so we'll need to recheck after we have
+ * the lock.
+ */
+ if (PageIsAllVisible(page))
+ visibilitymap_pin(relation, block, &vmbuffer);
+
+ LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+
+ lp = PageGetItemId(page, ItemPointerGetOffsetNumber(otid));
+
+ Assert(ItemIdIsNormal(lp));
+
+ oldtup.t_tableOid = RelationGetRelid(relation);
+ oldtup.t_data = (HeapTupleHeader) PageGetItem(page, lp);
+ oldtup.t_len = ItemIdGetLength(lp);
+ oldtup.t_self = *otid;
+
+ rep_id_key_required = HeapUpdateRequiresReplicaId(relation, mix_attrs, &oldtup);
+
+#if 1
+ {
+ Bitmapset *hot_attrs = RelationGetIndexAttrBitmap(relation,
+ INDEX_ATTR_BITMAP_INDEXED);
+ Bitmapset *id_attrs = RelationGetIndexAttrBitmap(relation,
+ INDEX_ATTR_BITMAP_IDENTITY_KEY);
+ Bitmapset *hdci_attrs = HeapDetermineColumnsInfo(relation, hot_attrs,
+ &oldtup, tuple);
+
+ Assert(bms_equal(mix_attrs, hdci_attrs));
+ bms_free(hot_attrs);
+ bms_free(id_attrs);
+ bms_free(hdci_attrs);
+ }
+#endif
+
/* Update the tuple with table oid */
slot->tts_tableOid = RelationGetRelid(relation);
tuple->t_tableOid = slot->tts_tableOid;
- result = heap_update(relation, otid, tuple, cid, crosscheck, wait,
- tmfd, lockmode, modified_attrs, true, update_indexes);
+ result = heap_update(relation, &oldtup, tuple, cid, crosscheck, wait, tmfd,
+ lockmode, buffer, page, block, lp, hot_allowed,
+ vmbuffer, rep_id_key_required);
+
ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
/*
@@ -335,21 +415,16 @@ heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
*
* Note: heap_update returns the tid (location) of the new tuple in the
* t_self field.
- *
- * If the update is not HOT, we must update all indexes. If the update is
- * HOT, it could be that we updated summarized columns, so we either
- * update only summarized indexes, or none at all.
*/
+ *update_indexes = TU_None;
if (result != TM_Ok)
- {
- Assert(*update_indexes == TU_None);
*update_indexes = TU_None;
- }
else if (!HeapTupleIsHeapOnly(tuple))
- Assert(*update_indexes == TU_All);
+ *update_indexes = TU_All;
+ else if (summarized_only)
+ *update_indexes = TU_Summarizing;
else
- Assert((*update_indexes == TU_Summarizing) ||
- (*update_indexes == TU_None));
+ *update_indexes = TU_None;
if (shouldFree)
pfree(tuple);
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index e95dde2df2e..2d7e2c46587 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -1933,46 +1933,31 @@ ExecFetchSlotHeapTupleDatum(TupleTableSlot *slot)
/*
* ExecCompareSlots
*
- * Compare old and new TupleTableSlots to detect which attributes have changed.
- *
- * This function serves two purposes:
- * 1) After trigger execution: detect trigger-modified columns
- * (pass excluding=explicitly SET columns, including=NULL)
- * 2) Index maintenance: detect changes in indexed columns
- * (pass including=indexed AND explicitly SET columns, excluding=NULL)
- *
- * Parameters:
- * resultRelInfo - relation information
- * excluding - bitmapset of attributes to skip
- * including - bitmapset of attributes to check
- * tupdesc - RelationGetDescr(relation)
- * old_tts - old tuple slot
- * new_tts - new tuple slot
- *
- * If including is NULL, check all attributes EXCEPT those in excluding
- * If excluding is NULL, check ONLY attributes in including
- * If both are NULL, check all attributes
- *
- * Returns a Bitmapset of attribute indices (using FirstLowInvalidHeapAttributeNumber
- * convention) that differ between the two slots.
+ * Compare the subset of attributes in attrs bewtween TupleTableSlots to detect
+ * which attributes have changed.
+ *
+ * Returns a Bitmapset of attribute indices (using
+ * FirstLowInvalidHeapAttributeNumber convention) that differ between the two
+ * slots.
*/
Bitmapset *
-ExecCompareSlotAttrs(TupleDesc tupdesc,
- const Bitmapset *attrs,
- TupleTableSlot *old_tts,
- TupleTableSlot *new_tts)
+ExecCompareSlotAttrs(TupleDesc tupdesc, const Bitmapset *attrs,
+ TupleTableSlot *s1, TupleTableSlot *s2)
{
int attidx = -1;
Bitmapset *modified = NULL;
+ /* XXX what if slots don't share the same tupleDescriptor... */
+ /* Assert(s1->tts_tupleDescriptor == s2->tts_tupleDescriptor); */
+
while ((attidx = bms_next_member(attrs, attidx)) >= 0)
{
/* attidx is zero-based, attrnum is the normal attribute number */
AttrNumber attrnum = attidx + FirstLowInvalidHeapAttributeNumber;
- Datum old_value,
- new_value;
- bool old_null,
- new_null;
+ Datum value1,
+ value2;
+ bool null1,
+ null2;
CompactAttribute *att;
/*
@@ -2001,21 +1986,21 @@ ExecCompareSlotAttrs(TupleDesc tupdesc,
}
att = TupleDescCompactAttr(tupdesc, attrnum - 1);
- old_value = slot_getattr(old_tts, attrnum, &old_null);
- new_value = slot_getattr(new_tts, attrnum, &new_null);
+ value1 = slot_getattr(s1, attrnum, &null1);
+ value2 = slot_getattr(s2, attrnum, &null2);
/* A change to/from NULL, so not equal */
- if (old_null != new_null)
+ if (null1 != null2)
{
modified = bms_add_member(modified, attidx);
continue;
}
/* Both NULL, no change/unmodified */
- if (old_null)
+ if (null2)
continue;
- if (!datum_image_eq(old_value, new_value, att->attbyval, att->attlen))
+ if (!datum_image_eq(value1, value2, att->attbyval, att->attlen))
modified = bms_add_member(modified, attidx);
}
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 18796baed28..082dee94422 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -213,7 +213,7 @@ ExecCheckIndexedAttrsForChanges(ResultRelInfo *resultRelInfo,
Relation relation = resultRelInfo->ri_RelationDesc;
TupleDesc tupdesc = RelationGetDescr(relation);
Bitmapset *attrs,
- *mix_attrs;
+ *mix_attrs = NULL;
/* If no indexes, we're done */
if (resultRelInfo->ri_NumIndices == 0)
@@ -227,8 +227,9 @@ ExecCheckIndexedAttrsForChanges(ResultRelInfo *resultRelInfo,
attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_INDEXED);
attrs = bms_int_members(attrs, ExecGetAllUpdatedCols(resultRelInfo, estate));
- /* Find out which, if any, modified indexed attributes changed */
- mix_attrs = ExecCompareSlotAttrs(tupdesc, attrs, old_tts, new_tts);
+ /* Find out which, if any, modified indexed attributes changed value */
+ if (!bms_is_empty(attrs))
+ mix_attrs = ExecCompareSlotAttrs(tupdesc, attrs, old_tts, new_tts);
bms_free(attrs);
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 547cf1d054d..f30505d8ae3 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -2475,7 +2475,6 @@ RelationDestroyRelation(Relation relation, bool remember_tupdesc)
bms_free(relation->rd_keyattr);
bms_free(relation->rd_pkattr);
bms_free(relation->rd_idattr);
- bms_free(relation->rd_hotblockingattr);
bms_free(relation->rd_summarizedattr);
bms_free(relation->rd_indexedattr);
if (relation->rd_pubdesc)
@@ -5277,8 +5276,7 @@ RelationGetIndexPredicate(Relation relation)
* (beware: even if PK is deferrable!)
* INDEX_ATTR_BITMAP_IDENTITY_KEY Columns in the table's replica identity
* index (empty if FULL)
- * INDEX_ATTR_BITMAP_HOT_BLOCKING Columns that block updates from being HOT
- * INDEX_ATTR_BITMAP_SUMMARIZED Columns included in summarizing indexes
+ * INDEX_ATTR_BITMAP_SUMMARIZED Columns only included in summarizing indexes
* INDEX_ATTR_BITMAP_INDEXED Columns referenced by indexes
*
* Attribute numbers are offset by FirstLowInvalidHeapAttributeNumber so that
@@ -5302,9 +5300,8 @@ RelationGetIndexAttrBitmap(Relation relation, IndexAttrBitmapKind attrKind)
Bitmapset *uindexattrs; /* columns in unique indexes */
Bitmapset *pkindexattrs; /* columns in the primary index */
Bitmapset *idindexattrs; /* columns in the replica identity */
- Bitmapset *hotblockingattrs; /* columns with HOT blocking indexes */
+ Bitmapset *summarizedattrs; /* columns only in summarizing indexes */
Bitmapset *indexedattrs; /* columns referenced by indexes */
- Bitmapset *summarizedattrs; /* columns with summarizing indexes */
List *indexoidlist;
List *newindexoidlist;
Oid relpkindex;
@@ -5323,8 +5320,6 @@ RelationGetIndexAttrBitmap(Relation relation, IndexAttrBitmapKind attrKind)
return bms_copy(relation->rd_pkattr);
case INDEX_ATTR_BITMAP_IDENTITY_KEY:
return bms_copy(relation->rd_idattr);
- case INDEX_ATTR_BITMAP_HOT_BLOCKING:
- return bms_copy(relation->rd_hotblockingattr);
case INDEX_ATTR_BITMAP_SUMMARIZED:
return bms_copy(relation->rd_summarizedattr);
case INDEX_ATTR_BITMAP_INDEXED:
@@ -5371,7 +5366,6 @@ restart:
uindexattrs = NULL;
pkindexattrs = NULL;
idindexattrs = NULL;
- hotblockingattrs = NULL;
summarizedattrs = NULL;
indexedattrs = NULL;
foreach(l, indexoidlist)
@@ -5432,7 +5426,7 @@ restart:
if (indexDesc->rd_indam->amsummarizing)
attrs = &summarizedattrs;
else
- attrs = &hotblockingattrs;
+ attrs = &indexedattrs;
/* Collect simple attribute references */
for (i = 0; i < indexDesc->rd_index->indnatts; i++)
@@ -5441,9 +5435,9 @@ restart:
/*
* Since we have covering indexes with non-key columns, we must
- * handle them accurately here. non-key columns must be added into
- * hotblockingattrs or summarizedattrs, since they are in index,
- * and update shouldn't miss them.
+ * handle them accurately here. Non-key columns must be added into
+ * indexedattrs or summarizedattrs, since they are in index, and
+ * update shouldn't miss them.
*
* Summarizing indexes do not block HOT, but do need to be updated
* when the column value changes, thus require a separate
@@ -5504,15 +5498,19 @@ restart:
bms_free(uindexattrs);
bms_free(pkindexattrs);
bms_free(idindexattrs);
- bms_free(hotblockingattrs);
bms_free(summarizedattrs);
- /* indexedattrs not yet initialized */
+ bms_free(indexedattrs);
goto restart;
}
- /* Set indexed attributes to track all referenced attributes */
- indexedattrs = bms_union(hotblockingattrs, summarizedattrs);
+ /*
+ * Record what attributes are only referenced by summarizing indexes. Then
+ * add that into the other indexed attributes to track all referenced
+ * attributes.
+ */
+ summarizedattrs = bms_del_members(summarizedattrs, indexedattrs);
+ indexedattrs = bms_add_members(indexedattrs, summarizedattrs);
/* Don't leak the old values of these bitmaps, if any */
relation->rd_attrsvalid = false;
@@ -5522,8 +5520,6 @@ restart:
relation->rd_pkattr = NULL;
bms_free(relation->rd_idattr);
relation->rd_idattr = NULL;
- bms_free(relation->rd_hotblockingattr);
- relation->rd_hotblockingattr = NULL;
bms_free(relation->rd_summarizedattr);
relation->rd_summarizedattr = NULL;
bms_free(relation->rd_indexedattr);
@@ -5540,7 +5536,6 @@ restart:
relation->rd_keyattr = bms_copy(uindexattrs);
relation->rd_pkattr = bms_copy(pkindexattrs);
relation->rd_idattr = bms_copy(idindexattrs);
- relation->rd_hotblockingattr = bms_copy(hotblockingattrs);
relation->rd_summarizedattr = bms_copy(summarizedattrs);
relation->rd_indexedattr = bms_copy(indexedattrs);
relation->rd_attrsvalid = true;
@@ -5555,8 +5550,6 @@ restart:
return pkindexattrs;
case INDEX_ATTR_BITMAP_IDENTITY_KEY:
return idindexattrs;
- case INDEX_ATTR_BITMAP_HOT_BLOCKING:
- return hotblockingattrs;
case INDEX_ATTR_BITMAP_SUMMARIZED:
return summarizedattrs;
case INDEX_ATTR_BITMAP_INDEXED:
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a56f3d1f378..f1858d14b42 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -364,12 +364,11 @@ extern TM_Result heap_delete(Relation relation, const ItemPointerData *tid,
TM_FailureData *tmfd, bool changingPart);
extern void heap_finish_speculative(Relation relation, const ItemPointerData *tid);
extern void heap_abort_speculative(Relation relation, const ItemPointerData *tid);
-extern TM_Result heap_update(Relation relation, const ItemPointerData *otid,
- HeapTuple newtup,
+extern TM_Result heap_update(Relation relation, HeapTuple oldtup, HeapTuple newtup,
CommandId cid, Snapshot crosscheck, bool wait,
- TM_FailureData *tmfd, LockTupleMode *lockmode,
- const Bitmapset *mix_attrs, bool mix_attrs_valid,
- TU_UpdateIndexes *update_indexes);
+ TM_FailureData *tmfd, const LockTupleMode *lockmode,
+ Buffer buffer, Page page, BlockNumber block, ItemId lp,
+ bool hot_allowed, Buffer vmbuffer, bool rep_id_key_required);
extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
CommandId cid, LockTupleMode mode, LockWaitPolicy wait_policy,
bool follow_updates,
@@ -431,6 +430,22 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
OffsetNumber *dead, int ndead,
OffsetNumber *unused, int nunused);
+/* in heap/heapam.c */
+extern bool HeapUpdateHotAllowable(Relation relation, const Bitmapset *mix_attrs,
+ bool *summarized_only);
+extern bool HeapUpdateRequiresReplicaId(Relation relation, const Bitmapset *mix_attrs,
+ HeapTupleData *tuple);
+extern Bitmapset *HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ HeapTuple oldtup, HeapTuple newtup);
+extern LockTupleMode HeapUpdateDetermineLockmode(Relation relation,
+ const Bitmapset *mix_attrs);
+#ifdef USE_ASSERT_CHECKING
+extern void check_lock_if_inplace_updateable_rel(Relation relation,
+ const ItemPointerData *otid,
+ HeapTuple newtup);
+#endif
+
/* in heap/vacuumlazy.c */
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index df5426fd7fb..10e5e9044ee 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -162,7 +162,6 @@ typedef struct RelationData
Bitmapset *rd_keyattr; /* cols that can be ref'd by foreign keys */
Bitmapset *rd_pkattr; /* cols included in primary key */
Bitmapset *rd_idattr; /* included in replica identity index */
- Bitmapset *rd_hotblockingattr; /* cols blocking HOT update */
Bitmapset *rd_summarizedattr; /* cols indexed by summarizing indexes */
Bitmapset *rd_indexedattr; /* all cols referenced by indexes */
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 5834ab7b903..57b46ee54e5 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -69,7 +69,6 @@ typedef enum IndexAttrBitmapKind
INDEX_ATTR_BITMAP_KEY,
INDEX_ATTR_BITMAP_PRIMARY_KEY,
INDEX_ATTR_BITMAP_IDENTITY_KEY,
- INDEX_ATTR_BITMAP_HOT_BLOCKING,
INDEX_ATTR_BITMAP_SUMMARIZED,
INDEX_ATTR_BITMAP_INDEXED,
} IndexAttrBitmapKind;
diff --git a/src/test/regress/expected/generated_virtual.out b/src/test/regress/expected/generated_virtual.out
index 249e68be654..6aea0346ee2 100644
--- a/src/test/regress/expected/generated_virtual.out
+++ b/src/test/regress/expected/generated_virtual.out
@@ -260,7 +260,7 @@ MERGE INTO gtestm t USING gtestm AS s ON 2 * t.a = s.b WHEN MATCHED THEN DELETE
DROP TABLE gtestm;
-- views
CREATE VIEW gtest1v AS SELECT * FROM gtest1;
-SELECT * FROM gtest1v;
+SELECT * FROM gtest1v ORDER BY a;
a | b
---+---
3 | 6
@@ -287,7 +287,7 @@ DETAIL: Column "b" is a generated column.
INSERT INTO gtest1v VALUES (8, DEFAULT), (9, DEFAULT); -- error
ERROR: cannot insert a non-DEFAULT value into column "b"
DETAIL: Column "b" is a generated column.
-SELECT * FROM gtest1v;
+SELECT * FROM gtest1v ORDER BY a;
a | b
---+----
3 | 6
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 9cea538b8e8..4877a1ddce9 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -372,15 +372,15 @@ INSERT INTO rw_view16 (a, b) VALUES (3, 'Row 3'); -- should be OK
UPDATE rw_view16 SET a=3, aa=-3 WHERE a=3; -- should fail
ERROR: multiple assignments to same column "a"
UPDATE rw_view16 SET aa=-3 WHERE a=3; -- should be OK
-SELECT * FROM base_tbl;
+SELECT * FROM base_tbl ORDER BY a;
a | b
----+--------
+ -3 | Row 3
-2 | Row -2
-1 | Row -1
0 | Row 0
1 | Row 1
2 | Row 2
- -3 | Row 3
(6 rows)
DELETE FROM rw_view16 WHERE a=-3; -- should be OK
diff --git a/src/test/regress/sql/generated_virtual.sql b/src/test/regress/sql/generated_virtual.sql
index 81152b39a79..1142bb93525 100644
--- a/src/test/regress/sql/generated_virtual.sql
+++ b/src/test/regress/sql/generated_virtual.sql
@@ -115,7 +115,7 @@ DROP TABLE gtestm;
-- views
CREATE VIEW gtest1v AS SELECT * FROM gtest1;
-SELECT * FROM gtest1v;
+SELECT * FROM gtest1v ORDER BY a;
INSERT INTO gtest1v VALUES (4, 8); -- error
INSERT INTO gtest1v VALUES (5, DEFAULT); -- ok
INSERT INTO gtest1v VALUES (6, 66), (7, 77); -- error
@@ -127,7 +127,7 @@ ALTER VIEW gtest1v ALTER COLUMN b SET DEFAULT 100;
INSERT INTO gtest1v VALUES (8, DEFAULT); -- error
INSERT INTO gtest1v VALUES (8, DEFAULT), (9, DEFAULT); -- error
-SELECT * FROM gtest1v;
+SELECT * FROM gtest1v ORDER BY a;
DELETE FROM gtest1v WHERE a >= 5;
DROP VIEW gtest1v;
diff --git a/src/test/regress/sql/updatable_views.sql b/src/test/regress/sql/updatable_views.sql
index 1635adde2d4..160e7799715 100644
--- a/src/test/regress/sql/updatable_views.sql
+++ b/src/test/regress/sql/updatable_views.sql
@@ -125,7 +125,7 @@ INSERT INTO rw_view16 VALUES (3, 'Row 3', 3); -- should fail
INSERT INTO rw_view16 (a, b) VALUES (3, 'Row 3'); -- should be OK
UPDATE rw_view16 SET a=3, aa=-3 WHERE a=3; -- should fail
UPDATE rw_view16 SET aa=-3 WHERE a=3; -- should be OK
-SELECT * FROM base_tbl;
+SELECT * FROM base_tbl ORDER BY a;
DELETE FROM rw_view16 WHERE a=-3; -- should be OK
-- Read-only views
INSERT INTO ro_view17 VALUES (3, 'ROW 3');
--
2.51.2
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: Expanding HOT updates for expression and partial indexes
2026-02-16 19:36 Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-02-17 21:15 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
@ 2026-02-19 20:43 ` Andres Freund <[email protected]>
2026-02-19 22:31 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
1 sibling, 1 reply; 24+ messages in thread
From: Andres Freund @ 2026-02-19 20:43 UTC (permalink / raw)
To: Greg Burd <[email protected]>; +Cc: Jeff Davis <[email protected]>; pgsql-hackers
Hi,
On 2026-02-17 16:15:02 -0500, Greg Burd wrote:
> > Why does simple_heap_update() need to do the HeapDetermineColumnsInfo()
> > inside heap_update()? It seems like you're trying to avoid doing the
> > same work the executor is doing to determine the modified_attrs bitmap,
> > but either (a) the work is cheap; or (b) the work to make the bitmap is
> > expensive.
>
> simple_heap_update() is exclusively called during catalog tuple updates and
> does not involve the executor at all, these are direct calls into heap to
> store catalog tuples.
Just FYI, there are probably a fair number of extensions using
simple_heap_update(). That number used to be a lot higher, but I don't think
there should be a hard assumption about it just being used for catalog updates
in the code.
Greetings,
Andres Freund
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: Expanding HOT updates for expression and partial indexes
2026-02-16 19:36 Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-02-17 21:15 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-19 20:43 ` Re: Expanding HOT updates for expression and partial indexes Andres Freund <[email protected]>
@ 2026-02-19 22:31 ` Greg Burd <[email protected]>
2026-02-23 19:23 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
0 siblings, 1 reply; 24+ messages in thread
From: Greg Burd @ 2026-02-19 22:31 UTC (permalink / raw)
To: Andres Freund <[email protected]>; +Cc: Jeff Davis <[email protected]>; pgsql-hackers
On Thu, Feb 19, 2026, at 3:43 PM, Andres Freund wrote:
> Hi,
>
> On 2026-02-17 16:15:02 -0500, Greg Burd wrote:
>> > Why does simple_heap_update() need to do the HeapDetermineColumnsInfo()
>> > inside heap_update()? It seems like you're trying to avoid doing the
>> > same work the executor is doing to determine the modified_attrs bitmap,
>> > but either (a) the work is cheap; or (b) the work to make the bitmap is
>> > expensive.
>>
>> simple_heap_update() is exclusively called during catalog tuple updates and
>> does not involve the executor at all, these are direct calls into heap to
>> store catalog tuples.
Hey Andres,
> Just FYI, there are probably a fair number of extensions using
> simple_heap_update(). That number used to be a lot higher, but I don't think
> there should be a hard assumption about it just being used for catalog updates
> in the code.
Makes sense. These patches don't change simple_heap_update() in any functional way and so extensions should be fine.
> Greetings,
>
> Andres Freund
best.
-greg
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: Expanding HOT updates for expression and partial indexes
2026-02-16 19:36 Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-02-17 21:15 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-19 20:43 ` Re: Expanding HOT updates for expression and partial indexes Andres Freund <[email protected]>
2026-02-19 22:31 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
@ 2026-02-23 19:23 ` Greg Burd <[email protected]>
2026-02-25 21:03 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
0 siblings, 1 reply; 24+ messages in thread
From: Greg Burd @ 2026-02-23 19:23 UTC (permalink / raw)
To: pgsql-hackers; +Cc: Jeff Davis <[email protected]>
Hello.
Attached is a new patch set that fixes a few issues identified in the last set.
0001 - creates a new way to identify the set of attributes both modified by the update and referenced by one or more indexes on the target relation being updated. This patch keeps the HeapDetermineColumnsInfo() path within heap_update() for calls from simple_heap_update() when modified_attrs_valid is set to false. I'm not a huge fan of this, but it does serve as a way to illustrate a minimal set of changes easing review a bit.
0002 - splits out the top portion of heap_update() into both heapam_tuple_update() and simple_heap_update(), adds a few helper functions and tries to reduce repeated code. The goal here was to remove some of the mess related to the various bitmaps used to make decisions during the update.
Performance tests so far haven't shown a regression of note for this set of changes.
I'm still working on:
a) cleaning this up a bit more
b) create ExecCheckUpdateRequiresReplicaId() in executor
c) look for a way to cleanly pass/maintain per-table AM state during update
d) root cause for difference in tests
e) look into UPDATE WHERE > 1 row performance
best.
-greg
Attachments:
[text/x-patch] v20260226-0001-Idenfity-modified-indexed-attributes-in-th.patch (29.7K, 2-v20260226-0001-Idenfity-modified-indexed-attributes-in-th.patch)
download | inline diff:
From 9ade28743ef557b4727d39388194cc6c56952503 Mon Sep 17 00:00:00 2001
From: Greg Burd <[email protected]>
Date: Sun, 2 Nov 2025 11:36:20 -0500
Subject: [PATCH v20260226 1/2] Idenfity modified indexed attributes in the
executor on UPDATE
Refactor executor update logic to determine which indexed columns have
actually changed during an UPDATE operation rather than leaving this up
to HeapDetermineColumnsInfo() in heap_update().
ExecCheckIndexedAttrsForChanges() replaces HeapDeterminesColumnsInfo()
and is called before table_tuple_update() crucially without the need
for an exclusive buffer lock on the page that holds the tuple being
updated. This reduces the time the lock is held later within
heapam_tuple_update() and heap_update().
ExecCheckIndexedAttrsForChanges() in turn uses ExecCompareSlotAttrs() to
identify which attributes have changed and then intersects that with the
set of indexed attributes to identify the modified indexed set.
Besides identifying the set of modified indexed attributes
HeapDetermineColumnsInfo() was also responsible for part of the logic
involed in the decision to include the replica identity key or not.
This now happens in heap_update() when modified_attrs_valid is false.
Catalog tuple updates use simple_heap_update() and don't pass a
modified_attrs Bitmapset into heap_update() indicated by the
modified_attrs_valid bool set to false.
Updates stemming from logical replication also use the new
ExecCheckIndexedAttrsForChanges() in ExecSimpleRelationUpdate().
Before row triggers on UPDATE may use heap_modify_tuple() to update
attributes not identified by ExecGetAllUpdatedCols() as is the case in
tsvector_update_trigger(). ExecBRUpdateTriggers() now identifies
changes to indexed columns not found by ExecGetAllUpdateCols()
and adds their attributes to ri_extraUpdatedCols. See
tsearch.sql tests for an example of this.
---
src/backend/access/heap/heapam.c | 80 +++++++++++++++++++++---
src/backend/access/heap/heapam_handler.c | 7 +--
src/backend/access/table/tableam.c | 5 +-
src/backend/commands/trigger.c | 20 +++++-
src/backend/executor/execReplication.c | 7 ++-
src/backend/executor/execTuples.c | 78 +++++++++++++++++++++++
src/backend/executor/nodeModifyTable.c | 78 ++++++++++++++++++++---
src/backend/replication/logical/worker.c | 10 +--
src/backend/utils/cache/relcache.c | 15 +++++
src/include/access/heapam.h | 1 +
src/include/access/tableam.h | 8 ++-
src/include/executor/executor.h | 8 +++
src/include/utils/rel.h | 1 +
src/include/utils/relcache.h | 1 +
14 files changed, 283 insertions(+), 36 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 98d53caeea8..ab8b6ddb8de 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -3311,6 +3311,7 @@ TM_Result
heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
CommandId cid, Snapshot crosscheck, bool wait,
TM_FailureData *tmfd, LockTupleMode *lockmode,
+ const Bitmapset *modified_attrs, bool modified_attrs_valid,
TU_UpdateIndexes *update_indexes)
{
TM_Result result;
@@ -3320,7 +3321,6 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
Bitmapset *key_attrs;
Bitmapset *id_attrs;
Bitmapset *interesting_attrs;
- Bitmapset *modified_attrs;
ItemId lp;
HeapTupleData oldtup;
HeapTuple heaptup;
@@ -3345,7 +3345,7 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
bool all_visible_cleared_new = false;
bool checked_lockers;
bool locker_remains;
- bool id_has_external = false;
+ bool rep_id_key_required = false;
TransactionId xmax_new_tuple,
xmax_old_tuple;
uint16 infomask_old_tuple,
@@ -3487,9 +3487,69 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
* new tuple so we must include it as part of the old_key_tuple. See
* ExtractReplicaIdentity.
*/
- modified_attrs = HeapDetermineColumnsInfo(relation, interesting_attrs,
- id_attrs, &oldtup,
- newtup, &id_has_external);
+ if (!modified_attrs_valid)
+ {
+ bool id_has_external = false;
+
+ modified_attrs = HeapDetermineColumnsInfo(relation, interesting_attrs,
+ id_attrs, &oldtup,
+ newtup, &id_has_external);
+ rep_id_key_required = id_has_external ||
+ bms_overlap(modified_attrs, id_attrs);
+ }
+ else
+ {
+ /*
+ * ExtractReplicatIdentity() needs to know if a modified attrbute is
+ * used as a replica indentity or if any of the unmodified indexed
+ * attributes in the old tuple are stored externally and used as a
+ * replica identity.
+ */
+ rep_id_key_required = bms_overlap(modified_attrs, id_attrs);
+ if (!rep_id_key_required)
+ {
+ Bitmapset *attrs;
+ TupleDesc tupdesc = RelationGetDescr(relation);
+ int attidx = -1;
+
+ /* Check all unmodified indexed replica identity key attributes */
+ attrs = bms_difference(interesting_attrs, modified_attrs);
+ attrs = bms_int_members(attrs, id_attrs);
+
+ while ((attidx = bms_next_member(attrs, attidx)) >= 0)
+ {
+ /*
+ * attidx is zero-based, attrnum is the normal attribute
+ * number
+ */
+ AttrNumber attrnum = attidx + FirstLowInvalidHeapAttributeNumber;
+ Datum value;
+ bool isnull;
+
+ /*
+ * System attributes are not added into interesting_attrs in
+ * relcache.
+ */
+ Assert(attrnum > 0);
+
+ value = heap_getattr(&oldtup, attrnum, tupdesc, &isnull);
+
+ /* No need to check attributes that can't be stored externally */
+ if (isnull ||
+ TupleDescCompactAttr(tupdesc, attrnum - 1)->attlen != -1)
+ continue;
+
+ /* Check if the old tuple's attribute is stored externally */
+ if (VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value)))
+ {
+ rep_id_key_required = true;
+ break;
+ }
+ }
+
+ bms_free(attrs);
+ }
+ }
/*
* If we're not updating any "key" column, we can grab a weaker lock type.
@@ -3763,7 +3823,7 @@ l2:
bms_free(sum_attrs);
bms_free(key_attrs);
bms_free(id_attrs);
- bms_free(modified_attrs);
+ /* modified attrs is passed in and free'd by the caller, or NULL */
bms_free(interesting_attrs);
return result;
}
@@ -4111,8 +4171,7 @@ l2:
* columns are modified or it has external data.
*/
old_key_tuple = ExtractReplicaIdentity(relation, &oldtup,
- bms_overlap(modified_attrs, id_attrs) ||
- id_has_external,
+ rep_id_key_required,
&old_key_copied);
/* NO EREPORT(ERROR) from here till changes are logged */
@@ -4278,7 +4337,7 @@ l2:
bms_free(sum_attrs);
bms_free(key_attrs);
bms_free(id_attrs);
- bms_free(modified_attrs);
+ /* modified attrs is passed in and free'd by the caller, or NULL */
bms_free(interesting_attrs);
return TM_Ok;
@@ -4562,7 +4621,8 @@ simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tup
result = heap_update(relation, otid, tup,
GetCurrentCommandId(true), InvalidSnapshot,
true /* wait for commit */ ,
- &tmfd, &lockmode, update_indexes);
+ &tmfd, &lockmode,
+ NULL, false, update_indexes);
switch (result)
{
case TM_SelfModified:
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index b83e2013d50..2690593fe4c 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -312,12 +312,11 @@ heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart);
}
-
static TM_Result
heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
CommandId cid, Snapshot snapshot, Snapshot crosscheck,
- bool wait, TM_FailureData *tmfd,
- LockTupleMode *lockmode, TU_UpdateIndexes *update_indexes)
+ bool wait, TM_FailureData *tmfd, LockTupleMode *lockmode,
+ const Bitmapset *modified_attrs, TU_UpdateIndexes *update_indexes)
{
bool shouldFree = true;
HeapTuple tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);
@@ -328,7 +327,7 @@ heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
tuple->t_tableOid = slot->tts_tableOid;
result = heap_update(relation, otid, tuple, cid, crosscheck, wait,
- tmfd, lockmode, update_indexes);
+ tmfd, lockmode, modified_attrs, true, update_indexes);
ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
/*
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index dfda1af412e..42acd5b17a9 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -359,6 +359,7 @@ void
simple_table_tuple_update(Relation rel, ItemPointer otid,
TupleTableSlot *slot,
Snapshot snapshot,
+ const Bitmapset *mix_attrs,
TU_UpdateIndexes *update_indexes)
{
TM_Result result;
@@ -369,7 +370,9 @@ simple_table_tuple_update(Relation rel, ItemPointer otid,
GetCurrentCommandId(true),
snapshot, InvalidSnapshot,
true /* wait for commit */ ,
- &tmfd, &lockmode, update_indexes);
+ &tmfd, &lockmode,
+ mix_attrs,
+ update_indexes);
switch (result)
{
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 98d402c0a3b..64efa55dfe3 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -2978,6 +2978,7 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
bool is_merge_update)
{
TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
+ TupleDesc tupdesc = RelationGetDescr(relinfo->ri_RelationDesc);
TupleTableSlot *oldslot = ExecGetTriggerOldSlot(estate, relinfo);
HeapTuple newtuple = NULL;
HeapTuple trigtuple;
@@ -2985,7 +2986,9 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
bool should_free_new = false;
TriggerData LocTriggerData = {0};
int i;
- Bitmapset *updatedCols;
+ Bitmapset *updatedCols = NULL;
+ Bitmapset *remainingCols = NULL;
+ Bitmapset *modifiedCols;
LockTupleMode lockmode;
/* Determine lock mode to use */
@@ -3127,6 +3130,21 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
if (should_free_trig)
heap_freetuple(trigtuple);
+ /*
+ * Before UPDATE triggers may have updated attributes not known to
+ * ExecGetAllUpdatedColumns() using heap_modify_tuple() or
+ * heap_modifiy_tuple_by_cols(). Find and record those now.
+ */
+ remainingCols = bms_add_range(NULL, 1 - FirstLowInvalidHeapAttributeNumber,
+ tupdesc->natts - FirstLowInvalidHeapAttributeNumber);
+ remainingCols = bms_del_members(remainingCols, updatedCols);
+ modifiedCols = ExecCompareSlotAttrs(tupdesc, remainingCols, oldslot, newslot);
+ relinfo->ri_extraUpdatedCols =
+ bms_add_members(relinfo->ri_extraUpdatedCols, modifiedCols);
+
+ bms_free(remainingCols);
+ bms_free(modifiedCols);
+
return true;
}
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 2497ee7edc5..910f3db37cf 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -33,6 +33,7 @@
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
+#include "utils/relcache.h"
#include "utils/snapmgr.h"
#include "utils/syscache.h"
#include "utils/typcache.h"
@@ -906,6 +907,7 @@ ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
bool skip_tuple = false;
Relation rel = resultRelInfo->ri_RelationDesc;
ItemPointer tid = &(searchslot->tts_tid);
+ Bitmapset *mix_attrs;
/*
* We support only non-system tables, with
@@ -944,8 +946,11 @@ ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
if (rel->rd_rel->relispartition)
ExecPartitionCheck(resultRelInfo, slot, estate, true);
+ mix_attrs = ExecCheckIndexedAttrsForChanges(resultRelInfo,
+ estate, searchslot, slot);
+
simple_table_tuple_update(rel, tid, slot, estate->es_snapshot,
- &update_indexes);
+ mix_attrs, &update_indexes);
conflictindexes = resultRelInfo->ri_onConflictArbiterIndexes;
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index b768eae9e53..1064ebe845b 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -66,6 +66,7 @@
#include "nodes/nodeFuncs.h"
#include "storage/bufmgr.h"
#include "utils/builtins.h"
+#include "utils/datum.h"
#include "utils/expandeddatum.h"
#include "utils/lsyscache.h"
#include "utils/typcache.h"
@@ -1929,6 +1930,83 @@ ExecFetchSlotHeapTupleDatum(TupleTableSlot *slot)
return ret;
}
+/*
+ * ExecCompareSlotAttrs
+ *
+ * Compare the subset of attributes in attrs bewtween TupleTableSlots to detect
+ * which attributes have changed.
+ *
+ * Returns a Bitmapset of attribute indices (using
+ * FirstLowInvalidHeapAttributeNumber convention) that differ between the two
+ * slots.
+ */
+Bitmapset *
+ExecCompareSlotAttrs(TupleDesc tupdesc, const Bitmapset *attrs,
+ TupleTableSlot *s1, TupleTableSlot *s2)
+{
+ int attidx = -1;
+ Bitmapset *modified = NULL;
+
+ /* XXX what if slots don't share the same tupleDescriptor... */
+ /* Assert(s1->tts_tupleDescriptor == s2->tts_tupleDescriptor); */
+
+ while ((attidx = bms_next_member(attrs, attidx)) >= 0)
+ {
+ /* attidx is zero-based, attrnum is the normal attribute number */
+ AttrNumber attrnum = attidx + FirstLowInvalidHeapAttributeNumber;
+ Datum value1,
+ value2;
+ bool null1,
+ null2;
+ CompactAttribute *att;
+
+ /*
+ * If it's a whole-tuple reference, say "not equal". It's not really
+ * worth supporting this case, since it could only succeed after a
+ * no-op update, which is hardly a case worth optimizing for.
+ */
+ if (attrnum == 0)
+ {
+ modified = bms_add_member(modified, attidx);
+ continue;
+ }
+
+ /*
+ * Likewise, automatically say "not equal" for any system attribute
+ * other than tableOID; we cannot expect these to be consistent in a
+ * HOT chain, or even to be set correctly yet in the new tuple.
+ */
+ if (attrnum < 0)
+ {
+ if (attrnum != TableOidAttributeNumber)
+ {
+ modified = bms_add_member(modified, attidx);
+ continue;
+ }
+ }
+
+ att = TupleDescCompactAttr(tupdesc, attrnum - 1);
+ value1 = slot_getattr(s1, attrnum, &null1);
+ value2 = slot_getattr(s2, attrnum, &null2);
+
+ /* A change to/from NULL, so not equal */
+ if (null1 != null2)
+ {
+ modified = bms_add_member(modified, attidx);
+ continue;
+ }
+
+ /* Both NULL, no change/unmodified */
+ if (null2)
+ continue;
+
+ if (!datum_image_eq(value1, value2, att->attbyval, att->attlen))
+ modified = bms_add_member(modified, attidx);
+ }
+
+ return modified;
+}
+
/* ----------------------------------------------------------------
* convenience initialization routines
* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 793c76d4f82..18796baed28 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -17,6 +17,7 @@
* ExecModifyTable - retrieve the next tuple from the node
* ExecEndModifyTable - shut down the ModifyTable node
* ExecReScanModifyTable - rescan the ModifyTable node
+ * ExecCheckIndexedAttrsForChanges - find set of updated indexed columns
*
* NOTES
* The ModifyTable node receives input from its outerPlan, which is
@@ -54,6 +55,7 @@
#include "access/htup_details.h"
#include "access/tableam.h"
+#include "access/tupdesc.h"
#include "access/xact.h"
#include "commands/trigger.h"
#include "executor/execPartition.h"
@@ -188,6 +190,50 @@ static TupleTableSlot *ExecMergeNotMatched(ModifyTableContext *context,
ResultRelInfo *resultRelInfo,
bool canSetTag);
+/*
+ * ExecCheckIndexedAttrsForChanges
+ *
+ * Determine which indexes need updating by finding the set of modified indexed
+ * attributes.
+ *
+ * The goal is for the executor to know, ahead of calling into the table AM to
+ * process the update and before calling into the index AM for inserting new
+ * index tuples, which attributes in the new TupleTableSlot, if any, truely
+ * necessitate a new index tuple.
+ *
+ * Returns a Bitmapset of attributes that intersects with indexes which require
+ * a new index tuple.
+ */
+Bitmapset *
+ExecCheckIndexedAttrsForChanges(ResultRelInfo *resultRelInfo,
+ EState *estate,
+ TupleTableSlot *old_tts,
+ TupleTableSlot *new_tts)
+{
+ Relation relation = resultRelInfo->ri_RelationDesc;
+ TupleDesc tupdesc = RelationGetDescr(relation);
+ Bitmapset *attrs,
+ *mix_attrs;
+
+ /* If no indexes, we're done */
+ if (resultRelInfo->ri_NumIndices == 0)
+ return NULL;
+
+ /*
+ * Fetch the set of attributes explicity SET in the UPDATE statement or
+ * set by a before row trigger (even if not mentioned in the SQL) and get
+ * the subset that are also indexed.
+ */
+ attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_INDEXED);
+ attrs = bms_int_members(attrs, ExecGetAllUpdatedCols(resultRelInfo, estate));
+
+ /* Find out which, if any, modified indexed attributes changed */
+ mix_attrs = ExecCompareSlotAttrs(tupdesc, attrs, old_tts, new_tts);
+
+ bms_free(attrs);
+
+ return mix_attrs;
+}
/*
* Verify that the tuples to be produced by INSERT match the
@@ -2195,14 +2241,17 @@ ExecUpdatePrepareSlot(ResultRelInfo *resultRelInfo,
*/
static TM_Result
ExecUpdateAct(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
- ItemPointer tupleid, HeapTuple oldtuple, TupleTableSlot *slot,
- bool canSetTag, UpdateContext *updateCxt)
+ ItemPointer tupleid, HeapTuple oldtuple, TupleTableSlot *oldSlot,
+ TupleTableSlot *slot, bool canSetTag, UpdateContext *updateCxt)
{
EState *estate = context->estate;
Relation resultRelationDesc = resultRelInfo->ri_RelationDesc;
bool partition_constraint_failed;
TM_Result result;
+ /* The set of modified indexed attributes that trigger new index entries */
+ Bitmapset *mix_attrs = NULL;
+
updateCxt->crossPartUpdate = false;
/*
@@ -2319,7 +2368,16 @@ lreplace:
ExecConstraints(resultRelInfo, slot, estate);
/*
- * replace the heap tuple
+ * Next up we need to find out the set of indexed attributes that have
+ * changed in value and should trigger a new index tuple. We could start
+ * with the set of updated columns via ExecGetUpdatedCols(), but if we do
+ * we will overlook attributes directly modified by heap_modify_tuple()
+ * which are not known to ExecGetUpdatedCols().
+ */
+ mix_attrs = ExecCheckIndexedAttrsForChanges(resultRelInfo, estate, oldSlot, slot);
+
+ /*
+ * Call into the table AM to update the heap tuple.
*
* Note: if es_crosscheck_snapshot isn't InvalidSnapshot, we check that
* the row to be updated is visible to that snapshot, and throw a
@@ -2333,6 +2391,7 @@ lreplace:
estate->es_crosscheck_snapshot,
true /* wait for commit */ ,
&context->tmfd, &updateCxt->lockmode,
+ mix_attrs,
&updateCxt->updateIndexes);
return result;
@@ -2555,8 +2614,9 @@ ExecUpdate(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
*/
redo_act:
lockedtid = *tupleid;
- result = ExecUpdateAct(context, resultRelInfo, tupleid, oldtuple, slot,
- canSetTag, &updateCxt);
+
+ result = ExecUpdateAct(context, resultRelInfo, tupleid, oldtuple, oldSlot,
+ slot, canSetTag, &updateCxt);
/*
* If ExecUpdateAct reports that a cross-partition update was done,
@@ -3406,8 +3466,8 @@ lmerge_matched:
Assert(oldtuple == NULL);
result = ExecUpdateAct(context, resultRelInfo, tupleid,
- NULL, newslot, canSetTag,
- &updateCxt);
+ NULL, resultRelInfo->ri_oldTupleSlot,
+ newslot, canSetTag, &updateCxt);
/*
* As in ExecUpdate(), if ExecUpdateAct() reports that a
@@ -3432,6 +3492,7 @@ lmerge_matched:
tupleid, NULL, newslot);
mtstate->mt_merge_updated += 1;
}
+
break;
case CMD_DELETE:
@@ -4539,7 +4600,7 @@ ExecModifyTable(PlanState *pstate)
* For UPDATE/DELETE/MERGE, fetch the row identity info for the tuple
* to be updated/deleted/merged. For a heap relation, that's a TID;
* otherwise we may have a wholerow junk attr that carries the old
- * tuple in toto. Keep this in step with the part of
+ * tuple in total. Keep this in step with the part of
* ExecInitModifyTable that sets up ri_RowIdAttNo.
*/
if (operation == CMD_UPDATE || operation == CMD_DELETE ||
@@ -4719,6 +4780,7 @@ ExecModifyTable(PlanState *pstate)
/* Now apply the update. */
slot = ExecUpdate(&context, resultRelInfo, tupleid, oldtuple,
oldSlot, slot, node->canSetTag);
+
if (tuplock)
UnlockTuple(resultRelInfo->ri_RelationDesc, tupleid,
InplaceUpdateTupleLock);
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index bae8c011390..8a9b78e2e28 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -2919,7 +2919,6 @@ apply_handle_update_internal(ApplyExecutionData *edata,
TupleTableSlot *localslot = NULL;
ConflictTupleInfo conflicttuple = {0};
bool found;
- MemoryContext oldctx;
EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1, NIL);
ExecOpenIndices(relinfo, false);
@@ -2958,15 +2957,13 @@ apply_handle_update_internal(ApplyExecutionData *edata,
}
/* Process and store remote tuple in the slot */
- oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
slot_modify_data(remoteslot, localslot, relmapentry, newtup);
- MemoryContextSwitchTo(oldctx);
EvalPlanQualSetSlot(&epqstate, remoteslot);
InitConflictIndexes(relinfo);
- /* Do the actual update. */
+ /* First check privileges */
TargetPrivilegesCheck(relinfo->ri_RelationDesc, ACL_UPDATE);
ExecSimpleRelationUpdate(relinfo, estate, &epqstate, localslot,
remoteslot);
@@ -3524,10 +3521,7 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
* Apply the update to the local tuple, putting the result in
* remoteslot_part.
*/
- oldctx = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
- slot_modify_data(remoteslot_part, localslot, part_entry,
- newtup);
- MemoryContextSwitchTo(oldctx);
+ slot_modify_data(remoteslot_part, localslot, part_entry, newtup);
EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1, NIL);
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 6b634c9fff1..547cf1d054d 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -2477,6 +2477,7 @@ RelationDestroyRelation(Relation relation, bool remember_tupdesc)
bms_free(relation->rd_idattr);
bms_free(relation->rd_hotblockingattr);
bms_free(relation->rd_summarizedattr);
+ bms_free(relation->rd_indexedattr);
if (relation->rd_pubdesc)
pfree(relation->rd_pubdesc);
if (relation->rd_options)
@@ -5278,6 +5279,7 @@ RelationGetIndexPredicate(Relation relation)
* index (empty if FULL)
* INDEX_ATTR_BITMAP_HOT_BLOCKING Columns that block updates from being HOT
* INDEX_ATTR_BITMAP_SUMMARIZED Columns included in summarizing indexes
+ * INDEX_ATTR_BITMAP_INDEXED Columns referenced by indexes
*
* Attribute numbers are offset by FirstLowInvalidHeapAttributeNumber so that
* we can include system attributes (e.g., OID) in the bitmap representation.
@@ -5301,6 +5303,7 @@ RelationGetIndexAttrBitmap(Relation relation, IndexAttrBitmapKind attrKind)
Bitmapset *pkindexattrs; /* columns in the primary index */
Bitmapset *idindexattrs; /* columns in the replica identity */
Bitmapset *hotblockingattrs; /* columns with HOT blocking indexes */
+ Bitmapset *indexedattrs; /* columns referenced by indexes */
Bitmapset *summarizedattrs; /* columns with summarizing indexes */
List *indexoidlist;
List *newindexoidlist;
@@ -5324,6 +5327,8 @@ RelationGetIndexAttrBitmap(Relation relation, IndexAttrBitmapKind attrKind)
return bms_copy(relation->rd_hotblockingattr);
case INDEX_ATTR_BITMAP_SUMMARIZED:
return bms_copy(relation->rd_summarizedattr);
+ case INDEX_ATTR_BITMAP_INDEXED:
+ return bms_copy(relation->rd_indexedattr);
default:
elog(ERROR, "unknown attrKind %u", attrKind);
}
@@ -5368,6 +5373,7 @@ restart:
idindexattrs = NULL;
hotblockingattrs = NULL;
summarizedattrs = NULL;
+ indexedattrs = NULL;
foreach(l, indexoidlist)
{
Oid indexOid = lfirst_oid(l);
@@ -5500,10 +5506,14 @@ restart:
bms_free(idindexattrs);
bms_free(hotblockingattrs);
bms_free(summarizedattrs);
+ /* indexedattrs not yet initialized */
goto restart;
}
+ /* Set indexed attributes to track all referenced attributes */
+ indexedattrs = bms_union(hotblockingattrs, summarizedattrs);
+
/* Don't leak the old values of these bitmaps, if any */
relation->rd_attrsvalid = false;
bms_free(relation->rd_keyattr);
@@ -5516,6 +5526,8 @@ restart:
relation->rd_hotblockingattr = NULL;
bms_free(relation->rd_summarizedattr);
relation->rd_summarizedattr = NULL;
+ bms_free(relation->rd_indexedattr);
+ relation->rd_indexedattr = NULL;
/*
* Now save copies of the bitmaps in the relcache entry. We intentionally
@@ -5530,6 +5542,7 @@ restart:
relation->rd_idattr = bms_copy(idindexattrs);
relation->rd_hotblockingattr = bms_copy(hotblockingattrs);
relation->rd_summarizedattr = bms_copy(summarizedattrs);
+ relation->rd_indexedattr = bms_copy(indexedattrs);
relation->rd_attrsvalid = true;
MemoryContextSwitchTo(oldcxt);
@@ -5546,6 +5559,8 @@ restart:
return hotblockingattrs;
case INDEX_ATTR_BITMAP_SUMMARIZED:
return summarizedattrs;
+ case INDEX_ATTR_BITMAP_INDEXED:
+ return indexedattrs;
default:
elog(ERROR, "unknown attrKind %u", attrKind);
return NULL;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 3c0961ab36b..a56f3d1f378 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -368,6 +368,7 @@ extern TM_Result heap_update(Relation relation, const ItemPointerData *otid,
HeapTuple newtup,
CommandId cid, Snapshot crosscheck, bool wait,
TM_FailureData *tmfd, LockTupleMode *lockmode,
+ const Bitmapset *mix_attrs, bool mix_attrs_valid,
TU_UpdateIndexes *update_indexes);
extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
CommandId cid, LockTupleMode mode, LockWaitPolicy wait_policy,
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 119593b7b46..bc360a9b529 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -549,6 +549,7 @@ typedef struct TableAmRoutine
bool wait,
TM_FailureData *tmfd,
LockTupleMode *lockmode,
+ const Bitmapset *updated_cols,
TU_UpdateIndexes *update_indexes);
/* see table_tuple_lock() for reference about parameters */
@@ -1524,12 +1525,12 @@ static inline TM_Result
table_tuple_update(Relation rel, ItemPointer otid, TupleTableSlot *slot,
CommandId cid, Snapshot snapshot, Snapshot crosscheck,
bool wait, TM_FailureData *tmfd, LockTupleMode *lockmode,
- TU_UpdateIndexes *update_indexes)
+ const Bitmapset *mix_cols, TU_UpdateIndexes *update_indexes)
{
return rel->rd_tableam->tuple_update(rel, otid, slot,
cid, snapshot, crosscheck,
- wait, tmfd,
- lockmode, update_indexes);
+ wait, tmfd, lockmode,
+ mix_cols, update_indexes);
}
/*
@@ -2010,6 +2011,7 @@ extern void simple_table_tuple_delete(Relation rel, ItemPointer tid,
Snapshot snapshot);
extern void simple_table_tuple_update(Relation rel, ItemPointer otid,
TupleTableSlot *slot, Snapshot snapshot,
+ const Bitmapset *mix_attrs,
TU_UpdateIndexes *update_indexes);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index d46ba59895d..7b0019fe15b 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -606,6 +606,10 @@ extern TupleDesc ExecCleanTypeFromTL(List *targetList);
extern TupleDesc ExecTypeFromExprList(List *exprList);
extern void ExecTypeSetColNames(TupleDesc typeInfo, List *namesList);
extern void UpdateChangedParamSet(PlanState *node, Bitmapset *newchg);
+extern Bitmapset *ExecCompareSlotAttrs(TupleDesc tupdesc,
+ const Bitmapset *attrs,
+ TupleTableSlot *old_tts,
+ TupleTableSlot *new_tts);
typedef struct TupOutputState
{
@@ -803,5 +807,9 @@ extern ResultRelInfo *ExecLookupResultRelByOid(ModifyTableState *node,
Oid resultoid,
bool missing_ok,
bool update_cache);
+extern Bitmapset *ExecCheckIndexedAttrsForChanges(ResultRelInfo *relinfo,
+ EState *estate,
+ TupleTableSlot *old_tts,
+ TupleTableSlot *new_tts);
#endif /* EXECUTOR_H */
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 236830f6b93..df5426fd7fb 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -164,6 +164,7 @@ typedef struct RelationData
Bitmapset *rd_idattr; /* included in replica identity index */
Bitmapset *rd_hotblockingattr; /* cols blocking HOT update */
Bitmapset *rd_summarizedattr; /* cols indexed by summarizing indexes */
+ Bitmapset *rd_indexedattr; /* all cols referenced by indexes */
PublicationDesc *rd_pubdesc; /* publication descriptor, or NULL */
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 2700224939a..5834ab7b903 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -71,6 +71,7 @@ typedef enum IndexAttrBitmapKind
INDEX_ATTR_BITMAP_IDENTITY_KEY,
INDEX_ATTR_BITMAP_HOT_BLOCKING,
INDEX_ATTR_BITMAP_SUMMARIZED,
+ INDEX_ATTR_BITMAP_INDEXED,
} IndexAttrBitmapKind;
extern Bitmapset *RelationGetIndexAttrBitmap(Relation relation,
--
2.51.2
[text/x-patch] v20260226-0002-Refactor-heap_update-and-move-attribute-de.patch (61.6K, 3-v20260226-0002-Refactor-heap_update-and-move-attribute-de.patch)
download | inline diff:
From f1ee582b999c0765159017717aa06d48008c7713 Mon Sep 17 00:00:00 2001
From: Greg Burd <[email protected]>
Date: Thu, 19 Feb 2026 14:08:17 -0500
Subject: [PATCH v20260226 2/2] Refactor heap_update() and move attribute
determination into callers
This refactoring relocates column modification determination from
heap_update() to its callers (simple_heap_update and
heapam_tuple_update), moving the logic upstream to executor/handler
level.
- Remove modified_attrs and modified_attrs_valid parameters from
heap_update()
- Extract buffer management (pin, lock, page fetch) into
simple_heap_update() and heapam_tuple_update() before calling
heap_update()
- Create helper functions: HeapUpdateHotAllowable(),
HeapUpdateRequiresReplicaId(), HeapUpdateDetermineLockmode()
- Pass pre-calculated attributes (hot_allowed, rep_id_key_required, and
lockmode) to heap_update() instead of deriving them within function
- Add ORDER BY clauses to generated_virtual.sql and updatable_views.sql
to ensure deterministic result ordering
Three tests were adjusted to avoid instability due to tuple ordering
during heap page scans. This avoids nondeterministic results.
---
src/backend/access/heap/heapam.c | 768 +++++++++---------
src/backend/access/heap/heapam_handler.c | 97 ++-
src/backend/executor/nodeModifyTable.c | 7 +-
src/backend/utils/cache/relcache.c | 35 +-
src/include/access/heapam.h | 25 +-
src/include/utils/rel.h | 1 -
src/include/utils/relcache.h | 1 -
.../regress/expected/generated_virtual.out | 4 +-
src/test/regress/expected/triggers.out | 16 +-
src/test/regress/expected/updatable_views.out | 4 +-
src/test/regress/sql/generated_virtual.sql | 4 +-
src/test/regress/sql/triggers.sql | 4 +-
src/test/regress/sql/updatable_views.sql | 2 +-
13 files changed, 517 insertions(+), 451 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ab8b6ddb8de..e6023fed48c 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -37,12 +37,15 @@
#include "access/multixact.h"
#include "access/subtrans.h"
#include "access/syncscan.h"
+#include "access/sysattr.h"
+#include "access/tableam.h"
#include "access/valid.h"
#include "access/visibilitymap.h"
#include "access/xloginsert.h"
#include "catalog/pg_database.h"
#include "catalog/pg_database_d.h"
#include "commands/vacuum.h"
+#include "nodes/lockoptions.h"
#include "pgstat.h"
#include "port/pg_bitutils.h"
#include "storage/lmgr.h"
@@ -51,6 +54,7 @@
#include "utils/datum.h"
#include "utils/injection_point.h"
#include "utils/inval.h"
+#include "utils/relcache.h"
#include "utils/spccache.h"
#include "utils/syscache.h"
@@ -62,16 +66,8 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
HeapTuple newtup, HeapTuple old_key_tuple,
bool all_visible_cleared, bool new_all_visible_cleared);
#ifdef USE_ASSERT_CHECKING
-static void check_lock_if_inplace_updateable_rel(Relation relation,
- const ItemPointerData *otid,
- HeapTuple newtup);
static void check_inplace_rel_lock(HeapTuple oldtup);
#endif
-static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
- Bitmapset *interesting_cols,
- Bitmapset *external_cols,
- HeapTuple oldtup, HeapTuple newtup,
- bool *has_external);
static bool heap_acquire_tuplock(Relation relation, const ItemPointerData *tid,
LockTupleMode mode, LockWaitPolicy wait_policy,
bool *have_tuple_lock);
@@ -3300,7 +3296,10 @@ simple_heap_delete(Relation relation, const ItemPointerData *tid)
* heap_update - replace a tuple
*
* See table_tuple_update() for an explanation of the parameters, except that
- * this routine directly takes a tuple rather than a slot.
+ * this routine directly takes a heap tuple rather than a slot.
+ *
+ * It's required that the caller has acquired the pin and lock on the buffer.
+ * That lock and pin will be managed here, not in the caller.
*
* In the failure cases, the routine fills *tmfd with the tuple's t_ctid,
* t_xmax (resolving a possible MultiXact, if necessary), and t_cmax (the last
@@ -3308,30 +3307,19 @@ simple_heap_delete(Relation relation, const ItemPointerData *tid)
* generated by another transaction).
*/
TM_Result
-heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
+heap_update(Relation relation, HeapTupleData *oldtup, HeapTuple newtup,
CommandId cid, Snapshot crosscheck, bool wait,
- TM_FailureData *tmfd, LockTupleMode *lockmode,
- const Bitmapset *modified_attrs, bool modified_attrs_valid,
- TU_UpdateIndexes *update_indexes)
+ TM_FailureData *tmfd, const LockTupleMode *lockmode,
+ Buffer buffer, Page page, BlockNumber block, ItemId lp,
+ bool hot_allowed, Buffer vmbuffer, bool rep_id_key_required)
{
TM_Result result;
TransactionId xid = GetCurrentTransactionId();
- Bitmapset *hot_attrs;
- Bitmapset *sum_attrs;
- Bitmapset *key_attrs;
- Bitmapset *id_attrs;
- Bitmapset *interesting_attrs;
- ItemId lp;
- HeapTupleData oldtup;
HeapTuple heaptup;
HeapTuple old_key_tuple = NULL;
bool old_key_copied = false;
- Page page;
- BlockNumber block;
MultiXactStatus mxact_status;
- Buffer buffer,
- newbuf,
- vmbuffer = InvalidBuffer,
+ Buffer newbuf,
vmbuffer_new = InvalidBuffer;
bool need_toast;
Size newtupsize,
@@ -3339,13 +3327,11 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
bool have_tuple_lock = false;
bool iscombo;
bool use_hot_update = false;
- bool summarized_update = false;
bool key_intact;
bool all_visible_cleared = false;
bool all_visible_cleared_new = false;
bool checked_lockers;
bool locker_remains;
- bool rep_id_key_required = false;
TransactionId xmax_new_tuple,
xmax_old_tuple;
uint16 infomask_old_tuple,
@@ -3353,204 +3339,13 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
infomask_new_tuple,
infomask2_new_tuple;
- Assert(ItemPointerIsValid(otid));
-
- /* Cheap, simplistic check that the tuple matches the rel's rowtype. */
- Assert(HeapTupleHeaderGetNatts(newtup->t_data) <=
- RelationGetNumberOfAttributes(relation));
-
+ Assert(BufferIsLockedByMe(buffer));
+ Assert(ItemIdIsNormal(lp));
AssertHasSnapshotForToast(relation);
- /*
- * Forbid this during a parallel operation, lest it allocate a combo CID.
- * Other workers might need that combo CID for visibility checks, and we
- * have no provision for broadcasting it to them.
- */
- if (IsInParallelMode())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot update tuples during a parallel operation")));
-
-#ifdef USE_ASSERT_CHECKING
- check_lock_if_inplace_updateable_rel(relation, otid, newtup);
-#endif
-
- /*
- * Fetch the list of attributes to be checked for various operations.
- *
- * For HOT considerations, this is wasted effort if we fail to update or
- * have to put the new tuple on a different page. But we must compute the
- * list before obtaining buffer lock --- in the worst case, if we are
- * doing an update on one of the relevant system catalogs, we could
- * deadlock if we try to fetch the list later. In any case, the relcache
- * caches the data so this is usually pretty cheap.
- *
- * We also need columns used by the replica identity and columns that are
- * considered the "key" of rows in the table.
- *
- * Note that we get copies of each bitmap, so we need not worry about
- * relcache flush happening midway through.
- */
- hot_attrs = RelationGetIndexAttrBitmap(relation,
- INDEX_ATTR_BITMAP_HOT_BLOCKING);
- sum_attrs = RelationGetIndexAttrBitmap(relation,
- INDEX_ATTR_BITMAP_SUMMARIZED);
- key_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_KEY);
- id_attrs = RelationGetIndexAttrBitmap(relation,
- INDEX_ATTR_BITMAP_IDENTITY_KEY);
- interesting_attrs = NULL;
- interesting_attrs = bms_add_members(interesting_attrs, hot_attrs);
- interesting_attrs = bms_add_members(interesting_attrs, sum_attrs);
- interesting_attrs = bms_add_members(interesting_attrs, key_attrs);
- interesting_attrs = bms_add_members(interesting_attrs, id_attrs);
-
- block = ItemPointerGetBlockNumber(otid);
- INJECTION_POINT("heap_update-before-pin", NULL);
- buffer = ReadBuffer(relation, block);
- page = BufferGetPage(buffer);
-
- /*
- * Before locking the buffer, pin the visibility map page if it appears to
- * be necessary. Since we haven't got the lock yet, someone else might be
- * in the middle of changing this, so we'll need to recheck after we have
- * the lock.
- */
- if (PageIsAllVisible(page))
- visibilitymap_pin(relation, block, &vmbuffer);
-
- LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
-
- lp = PageGetItemId(page, ItemPointerGetOffsetNumber(otid));
-
- /*
- * Usually, a buffer pin and/or snapshot blocks pruning of otid, ensuring
- * we see LP_NORMAL here. When the otid origin is a syscache, we may have
- * neither a pin nor a snapshot. Hence, we may see other LP_ states, each
- * of which indicates concurrent pruning.
- *
- * Failing with TM_Updated would be most accurate. However, unlike other
- * TM_Updated scenarios, we don't know the successor ctid in LP_UNUSED and
- * LP_DEAD cases. While the distinction between TM_Updated and TM_Deleted
- * does matter to SQL statements UPDATE and MERGE, those SQL statements
- * hold a snapshot that ensures LP_NORMAL. Hence, the choice between
- * TM_Updated and TM_Deleted affects only the wording of error messages.
- * Settle on TM_Deleted, for two reasons. First, it avoids complicating
- * the specification of when tmfd->ctid is valid. Second, it creates
- * error log evidence that we took this branch.
- *
- * Since it's possible to see LP_UNUSED at otid, it's also possible to see
- * LP_NORMAL for a tuple that replaced LP_UNUSED. If it's a tuple for an
- * unrelated row, we'll fail with "duplicate key value violates unique".
- * XXX if otid is the live, newer version of the newtup row, we'll discard
- * changes originating in versions of this catalog row after the version
- * the caller got from syscache. See syscache-update-pruned.spec.
- */
- if (!ItemIdIsNormal(lp))
- {
- Assert(RelationSupportsSysCache(RelationGetRelid(relation)));
-
- UnlockReleaseBuffer(buffer);
- Assert(!have_tuple_lock);
- if (vmbuffer != InvalidBuffer)
- ReleaseBuffer(vmbuffer);
- tmfd->ctid = *otid;
- tmfd->xmax = InvalidTransactionId;
- tmfd->cmax = InvalidCommandId;
- *update_indexes = TU_None;
-
- bms_free(hot_attrs);
- bms_free(sum_attrs);
- bms_free(key_attrs);
- bms_free(id_attrs);
- /* modified_attrs not yet initialized */
- bms_free(interesting_attrs);
- return TM_Deleted;
- }
-
- /*
- * Fill in enough data in oldtup for HeapDetermineColumnsInfo to work
- * properly.
- */
- oldtup.t_tableOid = RelationGetRelid(relation);
- oldtup.t_data = (HeapTupleHeader) PageGetItem(page, lp);
- oldtup.t_len = ItemIdGetLength(lp);
- oldtup.t_self = *otid;
-
- /* the new tuple is ready, except for this: */
+ /* The new tuple is ready, except for this */
newtup->t_tableOid = RelationGetRelid(relation);
- /*
- * Determine columns modified by the update. Additionally, identify
- * whether any of the unmodified replica identity key attributes in the
- * old tuple is externally stored or not. This is required because for
- * such attributes the flattened value won't be WAL logged as part of the
- * new tuple so we must include it as part of the old_key_tuple. See
- * ExtractReplicaIdentity.
- */
- if (!modified_attrs_valid)
- {
- bool id_has_external = false;
-
- modified_attrs = HeapDetermineColumnsInfo(relation, interesting_attrs,
- id_attrs, &oldtup,
- newtup, &id_has_external);
- rep_id_key_required = id_has_external ||
- bms_overlap(modified_attrs, id_attrs);
- }
- else
- {
- /*
- * ExtractReplicatIdentity() needs to know if a modified attrbute is
- * used as a replica indentity or if any of the unmodified indexed
- * attributes in the old tuple are stored externally and used as a
- * replica identity.
- */
- rep_id_key_required = bms_overlap(modified_attrs, id_attrs);
- if (!rep_id_key_required)
- {
- Bitmapset *attrs;
- TupleDesc tupdesc = RelationGetDescr(relation);
- int attidx = -1;
-
- /* Check all unmodified indexed replica identity key attributes */
- attrs = bms_difference(interesting_attrs, modified_attrs);
- attrs = bms_int_members(attrs, id_attrs);
-
- while ((attidx = bms_next_member(attrs, attidx)) >= 0)
- {
- /*
- * attidx is zero-based, attrnum is the normal attribute
- * number
- */
- AttrNumber attrnum = attidx + FirstLowInvalidHeapAttributeNumber;
- Datum value;
- bool isnull;
-
- /*
- * System attributes are not added into interesting_attrs in
- * relcache.
- */
- Assert(attrnum > 0);
-
- value = heap_getattr(&oldtup, attrnum, tupdesc, &isnull);
-
- /* No need to check attributes that can't be stored externally */
- if (isnull ||
- TupleDescCompactAttr(tupdesc, attrnum - 1)->attlen != -1)
- continue;
-
- /* Check if the old tuple's attribute is stored externally */
- if (VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value)))
- {
- rep_id_key_required = true;
- break;
- }
- }
-
- bms_free(attrs);
- }
- }
-
/*
* If we're not updating any "key" column, we can grab a weaker lock type.
* This allows for more concurrency when we are running simultaneously
@@ -3562,9 +3357,8 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
* is updates that don't manipulate key columns, not those that
* serendipitously arrive at the same key values.
*/
- if (!bms_overlap(modified_attrs, key_attrs))
+ if (*lockmode == LockTupleNoKeyExclusive)
{
- *lockmode = LockTupleNoKeyExclusive;
mxact_status = MultiXactStatusNoKeyUpdate;
key_intact = true;
@@ -3581,22 +3375,15 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
}
else
{
- *lockmode = LockTupleExclusive;
+ Assert(*lockmode == LockTupleExclusive);
mxact_status = MultiXactStatusUpdate;
key_intact = false;
}
- /*
- * Note: beyond this point, use oldtup not otid to refer to old tuple.
- * otid may very well point at newtup->t_self, which we will overwrite
- * with the new tuple's location, so there's great risk of confusion if we
- * use otid anymore.
- */
-
l2:
checked_lockers = false;
locker_remains = false;
- result = HeapTupleSatisfiesUpdate(&oldtup, cid, buffer);
+ result = HeapTupleSatisfiesUpdate(oldtup, cid, buffer);
/* see below about the "no wait" case */
Assert(result != TM_BeingModified || wait);
@@ -3628,8 +3415,8 @@ l2:
*/
/* must copy state data before unlocking buffer */
- xwait = HeapTupleHeaderGetRawXmax(oldtup.t_data);
- infomask = oldtup.t_data->t_infomask;
+ xwait = HeapTupleHeaderGetRawXmax(oldtup->t_data);
+ infomask = oldtup->t_data->t_infomask;
/*
* Now we have to do something about the existing locker. If it's a
@@ -3669,13 +3456,12 @@ l2:
* requesting a lock and already have one; avoids deadlock).
*/
if (!current_is_member)
- heap_acquire_tuplock(relation, &(oldtup.t_self), *lockmode,
+ heap_acquire_tuplock(relation, &oldtup->t_self, *lockmode,
LockWaitBlock, &have_tuple_lock);
/* wait for multixact */
MultiXactIdWait((MultiXactId) xwait, mxact_status, infomask,
- relation, &oldtup.t_self, XLTW_Update,
- &remain);
+ relation, &oldtup->t_self, XLTW_Update, &remain);
checked_lockers = true;
locker_remains = remain != 0;
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
@@ -3685,9 +3471,9 @@ l2:
* could update this tuple before we get to this point. Check
* for xmax change, and start over if so.
*/
- if (xmax_infomask_changed(oldtup.t_data->t_infomask,
+ if (xmax_infomask_changed(oldtup->t_data->t_infomask,
infomask) ||
- !TransactionIdEquals(HeapTupleHeaderGetRawXmax(oldtup.t_data),
+ !TransactionIdEquals(HeapTupleHeaderGetRawXmax(oldtup->t_data),
xwait))
goto l2;
}
@@ -3712,8 +3498,8 @@ l2:
* before this one, which are important to keep in case this
* subxact aborts.
*/
- if (!HEAP_XMAX_IS_LOCKED_ONLY(oldtup.t_data->t_infomask))
- update_xact = HeapTupleGetUpdateXid(oldtup.t_data);
+ if (!HEAP_XMAX_IS_LOCKED_ONLY(oldtup->t_data->t_infomask))
+ update_xact = HeapTupleGetUpdateXid(oldtup->t_data);
else
update_xact = InvalidTransactionId;
@@ -3754,9 +3540,9 @@ l2:
* lock.
*/
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
- heap_acquire_tuplock(relation, &(oldtup.t_self), *lockmode,
+ heap_acquire_tuplock(relation, &oldtup->t_self, *lockmode,
LockWaitBlock, &have_tuple_lock);
- XactLockTableWait(xwait, relation, &oldtup.t_self,
+ XactLockTableWait(xwait, relation, &oldtup->t_self,
XLTW_Update);
checked_lockers = true;
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
@@ -3766,20 +3552,20 @@ l2:
* other xact could update this tuple before we get to this point.
* Check for xmax change, and start over if so.
*/
- if (xmax_infomask_changed(oldtup.t_data->t_infomask, infomask) ||
+ if (xmax_infomask_changed(oldtup->t_data->t_infomask, infomask) ||
!TransactionIdEquals(xwait,
- HeapTupleHeaderGetRawXmax(oldtup.t_data)))
+ HeapTupleHeaderGetRawXmax(oldtup->t_data)))
goto l2;
/* Otherwise check if it committed or aborted */
- UpdateXmaxHintBits(oldtup.t_data, buffer, xwait);
- if (oldtup.t_data->t_infomask & HEAP_XMAX_INVALID)
+ UpdateXmaxHintBits(oldtup->t_data, buffer, xwait);
+ if (oldtup->t_data->t_infomask & HEAP_XMAX_INVALID)
can_continue = true;
}
if (can_continue)
result = TM_Ok;
- else if (!ItemPointerEquals(&oldtup.t_self, &oldtup.t_data->t_ctid))
+ else if (!ItemPointerEquals(&oldtup->t_self, &oldtup->t_data->t_ctid))
result = TM_Updated;
else
result = TM_Deleted;
@@ -3792,39 +3578,32 @@ l2:
result == TM_Updated ||
result == TM_Deleted ||
result == TM_BeingModified);
- Assert(!(oldtup.t_data->t_infomask & HEAP_XMAX_INVALID));
+ Assert(!(oldtup->t_data->t_infomask & HEAP_XMAX_INVALID));
Assert(result != TM_Updated ||
- !ItemPointerEquals(&oldtup.t_self, &oldtup.t_data->t_ctid));
+ !ItemPointerEquals(&oldtup->t_self, &oldtup->t_data->t_ctid));
}
if (crosscheck != InvalidSnapshot && result == TM_Ok)
{
/* Perform additional check for transaction-snapshot mode RI updates */
- if (!HeapTupleSatisfiesVisibility(&oldtup, crosscheck, buffer))
+ if (!HeapTupleSatisfiesVisibility(oldtup, crosscheck, buffer))
result = TM_Updated;
}
if (result != TM_Ok)
{
- tmfd->ctid = oldtup.t_data->t_ctid;
- tmfd->xmax = HeapTupleHeaderGetUpdateXid(oldtup.t_data);
+ tmfd->ctid = oldtup->t_data->t_ctid;
+ tmfd->xmax = HeapTupleHeaderGetUpdateXid(oldtup->t_data);
if (result == TM_SelfModified)
- tmfd->cmax = HeapTupleHeaderGetCmax(oldtup.t_data);
+ tmfd->cmax = HeapTupleHeaderGetCmax(oldtup->t_data);
else
tmfd->cmax = InvalidCommandId;
UnlockReleaseBuffer(buffer);
if (have_tuple_lock)
- UnlockTupleTuplock(relation, &(oldtup.t_self), *lockmode);
+ UnlockTupleTuplock(relation, &oldtup->t_self, *lockmode);
if (vmbuffer != InvalidBuffer)
ReleaseBuffer(vmbuffer);
- *update_indexes = TU_None;
- bms_free(hot_attrs);
- bms_free(sum_attrs);
- bms_free(key_attrs);
- bms_free(id_attrs);
- /* modified attrs is passed in and free'd by the caller, or NULL */
- bms_free(interesting_attrs);
return result;
}
@@ -3851,9 +3630,9 @@ l2:
* If the tuple we're updating is locked, we need to preserve the locking
* info in the old tuple's Xmax. Prepare a new Xmax value for this.
*/
- compute_new_xmax_infomask(HeapTupleHeaderGetRawXmax(oldtup.t_data),
- oldtup.t_data->t_infomask,
- oldtup.t_data->t_infomask2,
+ compute_new_xmax_infomask(HeapTupleHeaderGetRawXmax(oldtup->t_data),
+ oldtup->t_data->t_infomask,
+ oldtup->t_data->t_infomask2,
xid, *lockmode, true,
&xmax_old_tuple, &infomask_old_tuple,
&infomask2_old_tuple);
@@ -3865,12 +3644,12 @@ l2:
* tuple. (In rare cases that might also be InvalidTransactionId and yet
* not have the HEAP_XMAX_INVALID bit set; that's fine.)
*/
- if ((oldtup.t_data->t_infomask & HEAP_XMAX_INVALID) ||
- HEAP_LOCKED_UPGRADED(oldtup.t_data->t_infomask) ||
+ if ((oldtup->t_data->t_infomask & HEAP_XMAX_INVALID) ||
+ HEAP_LOCKED_UPGRADED(oldtup->t_data->t_infomask) ||
(checked_lockers && !locker_remains))
xmax_new_tuple = InvalidTransactionId;
else
- xmax_new_tuple = HeapTupleHeaderGetRawXmax(oldtup.t_data);
+ xmax_new_tuple = HeapTupleHeaderGetRawXmax(oldtup->t_data);
if (!TransactionIdIsValid(xmax_new_tuple))
{
@@ -3885,7 +3664,7 @@ l2:
* Note that since we're doing an update, the only possibility is that
* the lockers had FOR KEY SHARE lock.
*/
- if (oldtup.t_data->t_infomask & HEAP_XMAX_IS_MULTI)
+ if (oldtup->t_data->t_infomask & HEAP_XMAX_IS_MULTI)
{
GetMultiXactIdHintBits(xmax_new_tuple, &infomask_new_tuple,
&infomask2_new_tuple);
@@ -3913,7 +3692,7 @@ l2:
* Replace cid with a combo CID if necessary. Note that we already put
* the plain cid into the new tuple.
*/
- HeapTupleHeaderAdjustCmax(oldtup.t_data, &cid, &iscombo);
+ HeapTupleHeaderAdjustCmax(oldtup->t_data, &cid, &iscombo);
/*
* If the toaster needs to be activated, OR if the new tuple will not fit
@@ -3930,12 +3709,12 @@ l2:
relation->rd_rel->relkind != RELKIND_MATVIEW)
{
/* toast table entries should never be recursively toasted */
- Assert(!HeapTupleHasExternal(&oldtup));
+ Assert(!HeapTupleHasExternal(oldtup));
Assert(!HeapTupleHasExternal(newtup));
need_toast = false;
}
else
- need_toast = (HeapTupleHasExternal(&oldtup) ||
+ need_toast = (HeapTupleHasExternal(oldtup) ||
HeapTupleHasExternal(newtup) ||
newtup->t_len > TOAST_TUPLE_THRESHOLD);
@@ -3968,9 +3747,9 @@ l2:
* updating, because the potentially created multixact would otherwise
* be wrong.
*/
- compute_new_xmax_infomask(HeapTupleHeaderGetRawXmax(oldtup.t_data),
- oldtup.t_data->t_infomask,
- oldtup.t_data->t_infomask2,
+ compute_new_xmax_infomask(HeapTupleHeaderGetRawXmax(oldtup->t_data),
+ oldtup->t_data->t_infomask,
+ oldtup->t_data->t_infomask2,
xid, *lockmode, false,
&xmax_lock_old_tuple, &infomask_lock_old_tuple,
&infomask2_lock_old_tuple);
@@ -3980,18 +3759,18 @@ l2:
START_CRIT_SECTION();
/* Clear obsolete visibility flags ... */
- oldtup.t_data->t_infomask &= ~(HEAP_XMAX_BITS | HEAP_MOVED);
- oldtup.t_data->t_infomask2 &= ~HEAP_KEYS_UPDATED;
- HeapTupleClearHotUpdated(&oldtup);
+ oldtup->t_data->t_infomask &= ~(HEAP_XMAX_BITS | HEAP_MOVED);
+ oldtup->t_data->t_infomask2 &= ~HEAP_KEYS_UPDATED;
+ HeapTupleClearHotUpdated(oldtup);
/* ... and store info about transaction updating this tuple */
Assert(TransactionIdIsValid(xmax_lock_old_tuple));
- HeapTupleHeaderSetXmax(oldtup.t_data, xmax_lock_old_tuple);
- oldtup.t_data->t_infomask |= infomask_lock_old_tuple;
- oldtup.t_data->t_infomask2 |= infomask2_lock_old_tuple;
- HeapTupleHeaderSetCmax(oldtup.t_data, cid, iscombo);
+ HeapTupleHeaderSetXmax(oldtup->t_data, xmax_lock_old_tuple);
+ oldtup->t_data->t_infomask |= infomask_lock_old_tuple;
+ oldtup->t_data->t_infomask2 |= infomask2_lock_old_tuple;
+ HeapTupleHeaderSetCmax(oldtup->t_data, cid, iscombo);
/* temporarily make it look not-updated, but locked */
- oldtup.t_data->t_ctid = oldtup.t_self;
+ oldtup->t_data->t_ctid = oldtup->t_self;
/*
* Clear all-frozen bit on visibility map if needed. We could
@@ -4014,10 +3793,10 @@ l2:
XLogBeginInsert();
XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
- xlrec.offnum = ItemPointerGetOffsetNumber(&oldtup.t_self);
+ xlrec.offnum = ItemPointerGetOffsetNumber(&oldtup->t_self);
xlrec.xmax = xmax_lock_old_tuple;
- xlrec.infobits_set = compute_infobits(oldtup.t_data->t_infomask,
- oldtup.t_data->t_infomask2);
+ xlrec.infobits_set = compute_infobits(oldtup->t_data->t_infomask,
+ oldtup->t_data->t_infomask2);
xlrec.flags =
cleared_all_frozen ? XLH_LOCK_ALL_FROZEN_CLEARED : 0;
XLogRegisterData(&xlrec, SizeOfHeapLock);
@@ -4039,7 +3818,7 @@ l2:
if (need_toast)
{
/* Note we always use WAL and FSM during updates */
- heaptup = heap_toast_insert_or_update(relation, newtup, &oldtup, 0);
+ heaptup = heap_toast_insert_or_update(relation, newtup, oldtup, 0);
newtupsize = MAXALIGN(heaptup->t_len);
}
else
@@ -4126,42 +3905,21 @@ l2:
* will include checking the relation level, there is no benefit to a
* separate check for the new tuple.
*/
- CheckForSerializableConflictIn(relation, &oldtup.t_self,
+ CheckForSerializableConflictIn(relation, &oldtup->t_self,
BufferGetBlockNumber(buffer));
/*
* At this point newbuf and buffer are both pinned and locked, and newbuf
- * has enough space for the new tuple. If they are the same buffer, only
- * one pin is held.
+ * has enough space for the new tuple so we can use the HOT update path if
+ * the caller determined that it is allowable.
+ *
+ * NOTE: If newbuf == buffer then only one pin is held.
*/
-
- if (newbuf == buffer)
- {
- /*
- * Since the new tuple is going into the same page, we might be able
- * to do a HOT update. Check if any of the index columns have been
- * changed.
- */
- if (!bms_overlap(modified_attrs, hot_attrs))
- {
- use_hot_update = true;
-
- /*
- * If none of the columns that are used in hot-blocking indexes
- * were updated, we can apply HOT, but we do still need to check
- * if we need to update the summarizing indexes, and update those
- * indexes if the columns were updated, or we may fail to detect
- * e.g. value bound changes in BRIN minmax indexes.
- */
- if (bms_overlap(modified_attrs, sum_attrs))
- summarized_update = true;
- }
- }
+ if ((newbuf == buffer) && hot_allowed)
+ use_hot_update = true;
else
- {
/* Set a hint that the old page could use prune/defrag */
PageSetFull(page);
- }
/*
* Compute replica identity tuple before entering the critical section so
@@ -4170,8 +3928,7 @@ l2:
* logged. Pass old key required as true only if the replica identity key
* columns are modified or it has external data.
*/
- old_key_tuple = ExtractReplicaIdentity(relation, &oldtup,
- rep_id_key_required,
+ old_key_tuple = ExtractReplicaIdentity(relation, oldtup, rep_id_key_required,
&old_key_copied);
/* NO EREPORT(ERROR) from here till changes are logged */
@@ -4194,7 +3951,7 @@ l2:
if (use_hot_update)
{
/* Mark the old tuple as HOT-updated */
- HeapTupleSetHotUpdated(&oldtup);
+ HeapTupleSetHotUpdated(oldtup);
/* And mark the new tuple as heap-only */
HeapTupleSetHeapOnly(heaptup);
/* Mark the caller's copy too, in case different from heaptup */
@@ -4203,7 +3960,7 @@ l2:
else
{
/* Make sure tuples are correctly marked as not-HOT */
- HeapTupleClearHotUpdated(&oldtup);
+ HeapTupleClearHotUpdated(oldtup);
HeapTupleClearHeapOnly(heaptup);
HeapTupleClearHeapOnly(newtup);
}
@@ -4212,17 +3969,17 @@ l2:
/* Clear obsolete visibility flags, possibly set by ourselves above... */
- oldtup.t_data->t_infomask &= ~(HEAP_XMAX_BITS | HEAP_MOVED);
- oldtup.t_data->t_infomask2 &= ~HEAP_KEYS_UPDATED;
+ oldtup->t_data->t_infomask &= ~(HEAP_XMAX_BITS | HEAP_MOVED);
+ oldtup->t_data->t_infomask2 &= ~HEAP_KEYS_UPDATED;
/* ... and store info about transaction updating this tuple */
Assert(TransactionIdIsValid(xmax_old_tuple));
- HeapTupleHeaderSetXmax(oldtup.t_data, xmax_old_tuple);
- oldtup.t_data->t_infomask |= infomask_old_tuple;
- oldtup.t_data->t_infomask2 |= infomask2_old_tuple;
- HeapTupleHeaderSetCmax(oldtup.t_data, cid, iscombo);
+ HeapTupleHeaderSetXmax(oldtup->t_data, xmax_old_tuple);
+ oldtup->t_data->t_infomask |= infomask_old_tuple;
+ oldtup->t_data->t_infomask2 |= infomask2_old_tuple;
+ HeapTupleHeaderSetCmax(oldtup->t_data, cid, iscombo);
/* record address of new tuple in t_ctid of old one */
- oldtup.t_data->t_ctid = heaptup->t_self;
+ oldtup->t_data->t_ctid = heaptup->t_self;
/* clear PD_ALL_VISIBLE flags, reset all visibilitymap bits */
if (PageIsAllVisible(BufferGetPage(buffer)))
@@ -4255,12 +4012,12 @@ l2:
*/
if (RelationIsAccessibleInLogicalDecoding(relation))
{
- log_heap_new_cid(relation, &oldtup);
+ log_heap_new_cid(relation, oldtup);
log_heap_new_cid(relation, heaptup);
}
recptr = log_heap_update(relation, buffer,
- newbuf, &oldtup, heaptup,
+ newbuf, oldtup, heaptup,
old_key_tuple,
all_visible_cleared,
all_visible_cleared_new);
@@ -4285,7 +4042,7 @@ l2:
* both tuple versions in one call to inval.c so we can avoid redundant
* sinval messages.)
*/
- CacheInvalidateHeapTuple(relation, &oldtup, heaptup);
+ CacheInvalidateHeapTuple(relation, oldtup, heaptup);
/* Now we can release the buffer(s) */
if (newbuf != buffer)
@@ -4300,7 +4057,7 @@ l2:
* Release the lmgr tuple lock, if we had it.
*/
if (have_tuple_lock)
- UnlockTupleTuplock(relation, &(oldtup.t_self), *lockmode);
+ UnlockTupleTuplock(relation, &oldtup->t_self, *lockmode);
pgstat_count_heap_update(relation, use_hot_update, newbuf != buffer);
@@ -4314,32 +4071,9 @@ l2:
heap_freetuple(heaptup);
}
- /*
- * If it is a HOT update, the update may still need to update summarized
- * indexes, lest we fail to update those summaries and get incorrect
- * results (for example, minmax bounds of the block may change with this
- * update).
- */
- if (use_hot_update)
- {
- if (summarized_update)
- *update_indexes = TU_Summarizing;
- else
- *update_indexes = TU_None;
- }
- else
- *update_indexes = TU_All;
-
if (old_key_tuple != NULL && old_key_copied)
heap_freetuple(old_key_tuple);
- bms_free(hot_attrs);
- bms_free(sum_attrs);
- bms_free(key_attrs);
- bms_free(id_attrs);
- /* modified attrs is passed in and free'd by the caller, or NULL */
- bms_free(interesting_attrs);
-
return TM_Ok;
}
@@ -4348,7 +4082,7 @@ l2:
* Confirm adequate lock held during heap_update(), per rules from
* README.tuplock section "Locking to write inplace-updated tables".
*/
-static void
+void
check_lock_if_inplace_updateable_rel(Relation relation,
const ItemPointerData *otid,
HeapTuple newtup)
@@ -4510,6 +4244,155 @@ heap_attr_equals(TupleDesc tupdesc, int attrnum, Datum value1, Datum value2,
}
}
+/*
+ * HOT updates are possible when either: a) there are no modified indexed
+ * attributes, or b) the modified attributes are all on summarizing indexes.
+ * Later, in heap_update(), we can choose to perform a HOT update if there is
+ * space on the page for the new tuple and the following code has determined
+ * that HOT is allowed.
+ */
+bool
+HeapUpdateHotAllowable(Relation relation, const Bitmapset *mix_attrs, bool *summarized_only)
+{
+ bool hot_allowed;
+
+ /*
+ * Let's be optimistic and start off by assuming the best case, no indexes
+ * need updating and HOT is allowable.
+ */
+ hot_allowed = true;
+ *summarized_only = false;
+
+ /*
+ * Check for case (a); when there are no modified index attributes HOT is
+ * allowed.
+ */
+ if (bms_is_empty(mix_attrs))
+ hot_allowed = true;
+ else
+ {
+ Bitmapset *sum_attrs = RelationGetIndexAttrBitmap(relation,
+ INDEX_ATTR_BITMAP_SUMMARIZED);
+
+ /*
+ * At least one index attribute was modified, but is this case (b)
+ * where all the modified index attributes are only used by
+ * summarizing indexes?
+ */
+ if (bms_is_subset(mix_attrs, sum_attrs))
+ {
+ hot_allowed = true;
+ *summarized_only = true;
+ }
+ else
+ {
+ /*
+ * Now we know that one or more indexed attribute were updated and
+ * that there was at least one of those attributes were referenced
+ * by a non-summarizing index. HOT is not allowed.
+ */
+ hot_allowed = false;
+ }
+
+ bms_free(sum_attrs);
+ }
+
+ return hot_allowed;
+}
+
+bool
+HeapUpdateRequiresReplicaId(Relation relation, const Bitmapset *mix_attrs,
+ HeapTupleData *tuple)
+{
+ bool rep_id_key_required;
+ Bitmapset *rid_attrs,
+ *idx_attrs;
+
+ rid_attrs = RelationGetIndexAttrBitmap(relation,
+ INDEX_ATTR_BITMAP_IDENTITY_KEY);
+
+ if (bms_is_empty(rid_attrs))
+ {
+ bms_free(rid_attrs);
+ return false;
+ }
+
+ idx_attrs = RelationGetIndexAttrBitmap(relation,
+ INDEX_ATTR_BITMAP_INDEXED);
+
+ /*
+ * ExtractReplicatIdentity() needs to know if a modified indexed attrbute
+ * is used as a replica indentity or if any of the unmodified indexed
+ * attributes in the old tuple are stored externally and used as a replica
+ * identity.
+ */
+ rep_id_key_required = bms_overlap(mix_attrs, rid_attrs);
+ if (!rep_id_key_required)
+ {
+ TupleDesc tupdesc = RelationGetDescr(relation);
+ int attidx;
+
+ /* Check only unmodified indexed replica identity key attributes */
+ idx_attrs = bms_del_members(idx_attrs, mix_attrs);
+ idx_attrs = bms_int_members(idx_attrs, rid_attrs);
+
+ attidx = -1;
+ while ((attidx = bms_next_member(idx_attrs, attidx)) >= 0)
+ {
+ /* attidx is zero-based, attrnum is the normal attribute number */
+ AttrNumber attrnum = attidx + FirstLowInvalidHeapAttributeNumber;
+ Datum value;
+ bool isnull;
+
+ /*
+ * System attributes are not added into interesting_attrs in
+ * relcache.
+ */
+ Assert(attrnum > 0);
+
+ value = heap_getattr(tuple, attrnum, tupdesc, &isnull);
+
+ /* No need to check attributes that can't be stored externally */
+ if (isnull ||
+ TupleDescCompactAttr(tupdesc, attrnum - 1)->attlen != -1)
+ continue;
+
+ /* Check if the old tuple's attribute is stored externally */
+ if (VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value)))
+ {
+ rep_id_key_required = true;
+ break;
+ }
+ }
+ }
+
+ bms_free(rid_attrs);
+ bms_free(idx_attrs);
+
+ return rep_id_key_required;
+}
+
+/*
+ * If we're not updating any "key" attributes, we can grab a weaker lock type.
+ * This allows for more concurrency when we are running simultaneously with
+ * foreign key checks.
+ */
+LockTupleMode
+HeapUpdateDetermineLockmode(Relation relation, const Bitmapset *mix_attrs)
+{
+ LockTupleMode lockmode = LockTupleExclusive;
+
+ Bitmapset *key_attrs = RelationGetIndexAttrBitmap(relation,
+ INDEX_ATTR_BITMAP_KEY);
+
+ if (!bms_overlap(mix_attrs, key_attrs))
+ lockmode = LockTupleNoKeyExclusive;
+
+ bms_free(key_attrs);
+
+ return lockmode;
+}
+
/*
* Check which columns are being updated.
*
@@ -4520,12 +4403,10 @@ heap_attr_equals(TupleDesc tupdesc, int attrnum, Datum value1, Datum value2,
* listed as interesting) of the old tuple is a member of external_cols and is
* stored externally.
*/
-static Bitmapset *
+Bitmapset *
HeapDetermineColumnsInfo(Relation relation,
Bitmapset *interesting_cols,
- Bitmapset *external_cols,
- HeapTuple oldtup, HeapTuple newtup,
- bool *has_external)
+ HeapTuple oldtup, HeapTuple newtup)
{
int attidx;
Bitmapset *modified = NULL;
@@ -4567,10 +4448,11 @@ HeapDetermineColumnsInfo(Relation relation,
}
/*
- * Extract the corresponding values. XXX this is pretty inefficient
- * if there are many indexed columns. Should we do a single
- * heap_deform_tuple call on each tuple, instead? But that doesn't
- * work for system columns ...
+ * Extract the corresponding values.
+ *
+ * XXX this is pretty inefficient if there are many indexed columns.
+ * Should we do a single heap_deform_tuple call on each tuple,
+ * instead? But that doesn't work for system columns ...
*/
value1 = heap_getattr(oldtup, attrnum, tupdesc, &isnull1);
value2 = heap_getattr(newtup, attrnum, tupdesc, &isnull2);
@@ -4581,48 +4463,146 @@ HeapDetermineColumnsInfo(Relation relation,
modified = bms_add_member(modified, attidx);
continue;
}
-
- /*
- * No need to check attributes that can't be stored externally. Note
- * that system attributes can't be stored externally.
- */
- if (attrnum < 0 || isnull1 ||
- TupleDescCompactAttr(tupdesc, attrnum - 1)->attlen != -1)
- continue;
-
- /*
- * Check if the old tuple's attribute is stored externally and is a
- * member of external_cols.
- */
- if (VARATT_IS_EXTERNAL((varlena *) DatumGetPointer(value1)) &&
- bms_is_member(attidx, external_cols))
- *has_external = true;
}
return modified;
}
/*
- * simple_heap_update - replace a tuple
- *
- * This routine may be used to update a tuple when concurrent updates of
- * the target tuple are not expected (for example, because we have a lock
- * on the relation associated with the tuple). Any failure is reported
- * via ereport().
+ * This routine may be used to update a tuple when concurrent updates of the
+ * target tuple are not expected (for example, because we have a lock on the
+ * relation associated with the tuple). Any failure is reported via ereport().
*/
void
-simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tup,
+simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tuple,
TU_UpdateIndexes *update_indexes)
{
TM_Result result;
TM_FailureData tmfd;
LockTupleMode lockmode;
+ Buffer buffer;
+ Buffer vmbuffer = InvalidBuffer;
+ Page page;
+ BlockNumber block;
+ Bitmapset *sum_attrs,
+ *mix_attrs,
+ *idx_attrs;
+ ItemId lp;
+ HeapTupleData oldtup;
+ bool hot_allowed;
+ bool summarized_only;
+ bool rep_id_key_required = false;
- result = heap_update(relation, otid, tup,
- GetCurrentCommandId(true), InvalidSnapshot,
- true /* wait for commit */ ,
- &tmfd, &lockmode,
- NULL, false, update_indexes);
+ Assert(ItemPointerIsValid(otid));
+
+ /* Cheap, simplistic check that the tuple matches the rel's rowtype. */
+ Assert(HeapTupleHeaderGetNatts(tuple->t_data) <=
+ RelationGetNumberOfAttributes(relation));
+
+ /*
+ * Forbid this during a parallel operation, lest it allocate a combo CID.
+ * Other workers might need that combo CID for visibility checks, and we
+ * have no provision for broadcasting it to them.
+ */
+ if (IsInParallelMode())
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
+ errmsg("cannot update tuples during a parallel operation")));
+
+#ifdef USE_ASSERT_CHECKING
+ check_lock_if_inplace_updateable_rel(relation, otid, tuple);
+#endif
+
+ /*
+ * We must fetch these bitmaps of attributes from relcache to be checked
+ * for various operations below before obtaining a buffer lock because if
+ * we are doing an update on one of the relevant system catalogs we could
+ * deadlock if we try to fetch them later on. Relcache will return copies
+ * of each bitmap, so we need not worry about relcache flush happening
+ * midway through this operation.
+ */
+ idx_attrs = RelationGetIndexAttrBitmap(relation,
+ INDEX_ATTR_BITMAP_INDEXED);
+ sum_attrs = RelationGetIndexAttrBitmap(relation,
+ INDEX_ATTR_BITMAP_SUMMARIZED);
+
+ block = ItemPointerGetBlockNumber(otid);
+ INJECTION_POINT("heap_update-before-pin", NULL);
+ buffer = ReadBuffer(relation, block);
+ page = BufferGetPage(buffer);
+
+ /*
+ * Before locking the buffer, pin the visibility map page if it appears to
+ * be necessary. Since we haven't got the lock yet, someone else might be
+ * in the middle of changing this, so we'll need to recheck after we have
+ * the lock.
+ */
+ if (PageIsAllVisible(page))
+ visibilitymap_pin(relation, block, &vmbuffer);
+
+ LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+
+ lp = PageGetItemId(page, ItemPointerGetOffsetNumber(otid));
+
+ /*
+ * Usually, a buffer pin and/or snapshot blocks pruning of otid, ensuring
+ * we see LP_NORMAL here. When the otid origin is a syscache, we may have
+ * neither a pin nor a snapshot. Hence, we may see other LP_ states, each
+ * of which indicates concurrent pruning.
+ *
+ * Failing with TM_Updated would be most accurate. However, unlike other
+ * TM_Updated scenarios, we don't know the successor ctid in LP_UNUSED and
+ * LP_DEAD cases. While the distinction between TM_Updated and TM_Deleted
+ * does matter to SQL statements UPDATE and MERGE, those SQL statements
+ * hold a snapshot that ensures LP_NORMAL. Hence, the choice between
+ * TM_Updated and TM_Deleted affects only the wording of error messages.
+ * Settle on TM_Deleted, for two reasons. First, it avoids complicating
+ * the specification of when tmfd->ctid is valid. Second, it creates
+ * error log evidence that we took this branch.
+ *
+ * Since it's possible to see LP_UNUSED at otid, it's also possible to see
+ * LP_NORMAL for a tuple that replaced LP_UNUSED. If it's a tuple for an
+ * unrelated row, we'll fail with "duplicate key value violates unique".
+ * XXX if otid is the live, newer version of the newtup row, we'll discard
+ * changes originating in versions of this catalog row after the version
+ * the caller got from syscache. See syscache-update-pruned.spec.
+ */
+ if (!ItemIdIsNormal(lp))
+ {
+ Assert(RelationSupportsSysCache(RelationGetRelid(relation)));
+
+ UnlockReleaseBuffer(buffer);
+ if (vmbuffer != InvalidBuffer)
+ ReleaseBuffer(vmbuffer);
+ *update_indexes = TU_None;
+
+ bms_free(idx_attrs);
+ bms_free(sum_attrs);
+ /* mix_attrs not yet initialized */
+
+ elog(ERROR, "tuple concurrently deleted");
+ }
+
+ /*
+ * Partially construct the oldtup for HeapDetermineColumnsInfo to work and
+ * then pass that on to heap_update.
+ */
+ oldtup.t_tableOid = RelationGetRelid(relation);
+ oldtup.t_data = (HeapTupleHeader) PageGetItem(page, lp);
+ oldtup.t_len = ItemIdGetLength(lp);
+ oldtup.t_self = *otid;
+
+ mix_attrs = HeapDetermineColumnsInfo(relation, idx_attrs, &oldtup, tuple);
+ lockmode = HeapUpdateDetermineLockmode(relation, mix_attrs);
+ rep_id_key_required = HeapUpdateRequiresReplicaId(relation, mix_attrs, &oldtup);
+ hot_allowed = HeapUpdateHotAllowable(relation, mix_attrs, &summarized_only);
+
+ result = heap_update(relation, &oldtup, tuple, GetCurrentCommandId(true),
+ InvalidSnapshot, true /* wait for commit */ ,
+ &tmfd, &lockmode, buffer, page, block, lp, hot_allowed,
+ vmbuffer, rep_id_key_required);
+
+ *update_indexes = TU_None;
switch (result)
{
case TM_SelfModified:
@@ -4632,6 +4612,10 @@ simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tup
case TM_Ok:
/* done successfully */
+ if (!HeapTupleIsHeapOnly(tuple))
+ *update_indexes = TU_All;
+ else if (summarized_only)
+ *update_indexes = TU_Summarizing;
break;
case TM_Updated:
@@ -4646,8 +4630,10 @@ simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tup
elog(ERROR, "unrecognized heap_update status: %u", result);
break;
}
-}
+ bms_free(idx_attrs);
+ bms_free(sum_attrs);
+}
/*
* Return the MultiXactStatus corresponding to the given tuple lock mode.
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 2690593fe4c..2eb06b6d593 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -44,6 +44,7 @@
#include "storage/procarray.h"
#include "storage/smgr.h"
#include "utils/builtins.h"
+#include "utils/injection_point.h"
#include "utils/rel.h"
static void reform_and_rewrite_tuple(HeapTuple tuple,
@@ -316,18 +317,81 @@ static TM_Result
heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
CommandId cid, Snapshot snapshot, Snapshot crosscheck,
bool wait, TM_FailureData *tmfd, LockTupleMode *lockmode,
- const Bitmapset *modified_attrs, TU_UpdateIndexes *update_indexes)
+ const Bitmapset *mix_attrs, TU_UpdateIndexes *update_indexes)
{
- bool shouldFree = true;
- HeapTuple tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);
+ bool shouldFree = false;
+ HeapTuple tuple;
+ bool rep_id_key_required = false;
+ bool hot_allowed;
+ bool summarized_only;
+ HeapTupleData oldtup;
+ Buffer buffer;
+ Buffer vmbuffer = InvalidBuffer;
+ Page page;
+ BlockNumber block;
+ ItemId lp;
TM_Result result;
+ Assert(ItemPointerIsValid(otid));
+
+ tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);
+
+ /* Cheap, simplistic check that the tuple matches the rel's rowtype. */
+ Assert(HeapTupleHeaderGetNatts(tuple->t_data) <=
+ RelationGetNumberOfAttributes(relation));
+
+ /*
+ * Forbid this during a parallel operation, lest it allocate a combo CID.
+ * Other workers might need that combo CID for visibility checks, and we
+ * have no provision for broadcasting it to them.
+ */
+ if (IsInParallelMode())
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
+ errmsg("cannot update tuples during a parallel operation")));
+
+#ifdef USE_ASSERT_CHECKING
+ check_lock_if_inplace_updateable_rel(relation, otid, tuple);
+#endif
+
+ hot_allowed = HeapUpdateHotAllowable(relation, mix_attrs, &summarized_only);
+ *lockmode = HeapUpdateDetermineLockmode(relation, mix_attrs);
+
+ block = ItemPointerGetBlockNumber(otid);
+ INJECTION_POINT("heap_update-before-pin", NULL);
+ buffer = ReadBuffer(relation, block);
+ page = BufferGetPage(buffer);
+
+ /*
+ * Before locking the buffer, pin the visibility map page if it appears to
+ * be necessary. Since we haven't got the lock yet, someone else might be
+ * in the middle of changing this, so we'll need to recheck after we have
+ * the lock.
+ */
+ if (PageIsAllVisible(page))
+ visibilitymap_pin(relation, block, &vmbuffer);
+
+ LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+
+ lp = PageGetItemId(page, ItemPointerGetOffsetNumber(otid));
+
+ Assert(ItemIdIsNormal(lp));
+
+ oldtup.t_tableOid = RelationGetRelid(relation);
+ oldtup.t_data = (HeapTupleHeader) PageGetItem(page, lp);
+ oldtup.t_len = ItemIdGetLength(lp);
+ oldtup.t_self = *otid;
+
+ rep_id_key_required = HeapUpdateRequiresReplicaId(relation, mix_attrs, &oldtup);
+
/* Update the tuple with table oid */
slot->tts_tableOid = RelationGetRelid(relation);
tuple->t_tableOid = slot->tts_tableOid;
- result = heap_update(relation, otid, tuple, cid, crosscheck, wait,
- tmfd, lockmode, modified_attrs, true, update_indexes);
+ result = heap_update(relation, &oldtup, tuple, cid, crosscheck, wait, tmfd,
+ lockmode, buffer, page, block, lp, hot_allowed,
+ vmbuffer, rep_id_key_required);
+
ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
/*
@@ -335,21 +399,20 @@ heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
*
* Note: heap_update returns the tid (location) of the new tuple in the
* t_self field.
- *
- * If the update is not HOT, we must update all indexes. If the update is
- * HOT, it could be that we updated summarized columns, so we either
- * update only summarized indexes, or none at all.
*/
- if (result != TM_Ok)
+ *update_indexes = TU_None;
+ if (result == TM_Ok)
{
- Assert(*update_indexes == TU_None);
- *update_indexes = TU_None;
+ if (HeapTupleIsHeapOnly(tuple))
+ {
+ if (summarized_only)
+ *update_indexes = TU_Summarizing;
+ }
+ else
+ {
+ *update_indexes = TU_All;
+ }
}
- else if (!HeapTupleIsHeapOnly(tuple))
- Assert(*update_indexes == TU_All);
- else
- Assert((*update_indexes == TU_Summarizing) ||
- (*update_indexes == TU_None));
if (shouldFree)
pfree(tuple);
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 18796baed28..082dee94422 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -213,7 +213,7 @@ ExecCheckIndexedAttrsForChanges(ResultRelInfo *resultRelInfo,
Relation relation = resultRelInfo->ri_RelationDesc;
TupleDesc tupdesc = RelationGetDescr(relation);
Bitmapset *attrs,
- *mix_attrs;
+ *mix_attrs = NULL;
/* If no indexes, we're done */
if (resultRelInfo->ri_NumIndices == 0)
@@ -227,8 +227,9 @@ ExecCheckIndexedAttrsForChanges(ResultRelInfo *resultRelInfo,
attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_INDEXED);
attrs = bms_int_members(attrs, ExecGetAllUpdatedCols(resultRelInfo, estate));
- /* Find out which, if any, modified indexed attributes changed */
- mix_attrs = ExecCompareSlotAttrs(tupdesc, attrs, old_tts, new_tts);
+ /* Find out which, if any, modified indexed attributes changed value */
+ if (!bms_is_empty(attrs))
+ mix_attrs = ExecCompareSlotAttrs(tupdesc, attrs, old_tts, new_tts);
bms_free(attrs);
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 547cf1d054d..f30505d8ae3 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -2475,7 +2475,6 @@ RelationDestroyRelation(Relation relation, bool remember_tupdesc)
bms_free(relation->rd_keyattr);
bms_free(relation->rd_pkattr);
bms_free(relation->rd_idattr);
- bms_free(relation->rd_hotblockingattr);
bms_free(relation->rd_summarizedattr);
bms_free(relation->rd_indexedattr);
if (relation->rd_pubdesc)
@@ -5277,8 +5276,7 @@ RelationGetIndexPredicate(Relation relation)
* (beware: even if PK is deferrable!)
* INDEX_ATTR_BITMAP_IDENTITY_KEY Columns in the table's replica identity
* index (empty if FULL)
- * INDEX_ATTR_BITMAP_HOT_BLOCKING Columns that block updates from being HOT
- * INDEX_ATTR_BITMAP_SUMMARIZED Columns included in summarizing indexes
+ * INDEX_ATTR_BITMAP_SUMMARIZED Columns only included in summarizing indexes
* INDEX_ATTR_BITMAP_INDEXED Columns referenced by indexes
*
* Attribute numbers are offset by FirstLowInvalidHeapAttributeNumber so that
@@ -5302,9 +5300,8 @@ RelationGetIndexAttrBitmap(Relation relation, IndexAttrBitmapKind attrKind)
Bitmapset *uindexattrs; /* columns in unique indexes */
Bitmapset *pkindexattrs; /* columns in the primary index */
Bitmapset *idindexattrs; /* columns in the replica identity */
- Bitmapset *hotblockingattrs; /* columns with HOT blocking indexes */
+ Bitmapset *summarizedattrs; /* columns only in summarizing indexes */
Bitmapset *indexedattrs; /* columns referenced by indexes */
- Bitmapset *summarizedattrs; /* columns with summarizing indexes */
List *indexoidlist;
List *newindexoidlist;
Oid relpkindex;
@@ -5323,8 +5320,6 @@ RelationGetIndexAttrBitmap(Relation relation, IndexAttrBitmapKind attrKind)
return bms_copy(relation->rd_pkattr);
case INDEX_ATTR_BITMAP_IDENTITY_KEY:
return bms_copy(relation->rd_idattr);
- case INDEX_ATTR_BITMAP_HOT_BLOCKING:
- return bms_copy(relation->rd_hotblockingattr);
case INDEX_ATTR_BITMAP_SUMMARIZED:
return bms_copy(relation->rd_summarizedattr);
case INDEX_ATTR_BITMAP_INDEXED:
@@ -5371,7 +5366,6 @@ restart:
uindexattrs = NULL;
pkindexattrs = NULL;
idindexattrs = NULL;
- hotblockingattrs = NULL;
summarizedattrs = NULL;
indexedattrs = NULL;
foreach(l, indexoidlist)
@@ -5432,7 +5426,7 @@ restart:
if (indexDesc->rd_indam->amsummarizing)
attrs = &summarizedattrs;
else
- attrs = &hotblockingattrs;
+ attrs = &indexedattrs;
/* Collect simple attribute references */
for (i = 0; i < indexDesc->rd_index->indnatts; i++)
@@ -5441,9 +5435,9 @@ restart:
/*
* Since we have covering indexes with non-key columns, we must
- * handle them accurately here. non-key columns must be added into
- * hotblockingattrs or summarizedattrs, since they are in index,
- * and update shouldn't miss them.
+ * handle them accurately here. Non-key columns must be added into
+ * indexedattrs or summarizedattrs, since they are in index, and
+ * update shouldn't miss them.
*
* Summarizing indexes do not block HOT, but do need to be updated
* when the column value changes, thus require a separate
@@ -5504,15 +5498,19 @@ restart:
bms_free(uindexattrs);
bms_free(pkindexattrs);
bms_free(idindexattrs);
- bms_free(hotblockingattrs);
bms_free(summarizedattrs);
- /* indexedattrs not yet initialized */
+ bms_free(indexedattrs);
goto restart;
}
- /* Set indexed attributes to track all referenced attributes */
- indexedattrs = bms_union(hotblockingattrs, summarizedattrs);
+ /*
+ * Record what attributes are only referenced by summarizing indexes. Then
+ * add that into the other indexed attributes to track all referenced
+ * attributes.
+ */
+ summarizedattrs = bms_del_members(summarizedattrs, indexedattrs);
+ indexedattrs = bms_add_members(indexedattrs, summarizedattrs);
/* Don't leak the old values of these bitmaps, if any */
relation->rd_attrsvalid = false;
@@ -5522,8 +5520,6 @@ restart:
relation->rd_pkattr = NULL;
bms_free(relation->rd_idattr);
relation->rd_idattr = NULL;
- bms_free(relation->rd_hotblockingattr);
- relation->rd_hotblockingattr = NULL;
bms_free(relation->rd_summarizedattr);
relation->rd_summarizedattr = NULL;
bms_free(relation->rd_indexedattr);
@@ -5540,7 +5536,6 @@ restart:
relation->rd_keyattr = bms_copy(uindexattrs);
relation->rd_pkattr = bms_copy(pkindexattrs);
relation->rd_idattr = bms_copy(idindexattrs);
- relation->rd_hotblockingattr = bms_copy(hotblockingattrs);
relation->rd_summarizedattr = bms_copy(summarizedattrs);
relation->rd_indexedattr = bms_copy(indexedattrs);
relation->rd_attrsvalid = true;
@@ -5555,8 +5550,6 @@ restart:
return pkindexattrs;
case INDEX_ATTR_BITMAP_IDENTITY_KEY:
return idindexattrs;
- case INDEX_ATTR_BITMAP_HOT_BLOCKING:
- return hotblockingattrs;
case INDEX_ATTR_BITMAP_SUMMARIZED:
return summarizedattrs;
case INDEX_ATTR_BITMAP_INDEXED:
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index a56f3d1f378..f1858d14b42 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -364,12 +364,11 @@ extern TM_Result heap_delete(Relation relation, const ItemPointerData *tid,
TM_FailureData *tmfd, bool changingPart);
extern void heap_finish_speculative(Relation relation, const ItemPointerData *tid);
extern void heap_abort_speculative(Relation relation, const ItemPointerData *tid);
-extern TM_Result heap_update(Relation relation, const ItemPointerData *otid,
- HeapTuple newtup,
+extern TM_Result heap_update(Relation relation, HeapTuple oldtup, HeapTuple newtup,
CommandId cid, Snapshot crosscheck, bool wait,
- TM_FailureData *tmfd, LockTupleMode *lockmode,
- const Bitmapset *mix_attrs, bool mix_attrs_valid,
- TU_UpdateIndexes *update_indexes);
+ TM_FailureData *tmfd, const LockTupleMode *lockmode,
+ Buffer buffer, Page page, BlockNumber block, ItemId lp,
+ bool hot_allowed, Buffer vmbuffer, bool rep_id_key_required);
extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
CommandId cid, LockTupleMode mode, LockWaitPolicy wait_policy,
bool follow_updates,
@@ -431,6 +430,22 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
OffsetNumber *dead, int ndead,
OffsetNumber *unused, int nunused);
+/* in heap/heapam.c */
+extern bool HeapUpdateHotAllowable(Relation relation, const Bitmapset *mix_attrs,
+ bool *summarized_only);
+extern bool HeapUpdateRequiresReplicaId(Relation relation, const Bitmapset *mix_attrs,
+ HeapTupleData *tuple);
+extern Bitmapset *HeapDetermineColumnsInfo(Relation relation,
+ Bitmapset *interesting_cols,
+ HeapTuple oldtup, HeapTuple newtup);
+extern LockTupleMode HeapUpdateDetermineLockmode(Relation relation,
+ const Bitmapset *mix_attrs);
+#ifdef USE_ASSERT_CHECKING
+extern void check_lock_if_inplace_updateable_rel(Relation relation,
+ const ItemPointerData *otid,
+ HeapTuple newtup);
+#endif
+
/* in heap/vacuumlazy.c */
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index df5426fd7fb..10e5e9044ee 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -162,7 +162,6 @@ typedef struct RelationData
Bitmapset *rd_keyattr; /* cols that can be ref'd by foreign keys */
Bitmapset *rd_pkattr; /* cols included in primary key */
Bitmapset *rd_idattr; /* included in replica identity index */
- Bitmapset *rd_hotblockingattr; /* cols blocking HOT update */
Bitmapset *rd_summarizedattr; /* cols indexed by summarizing indexes */
Bitmapset *rd_indexedattr; /* all cols referenced by indexes */
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 5834ab7b903..57b46ee54e5 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -69,7 +69,6 @@ typedef enum IndexAttrBitmapKind
INDEX_ATTR_BITMAP_KEY,
INDEX_ATTR_BITMAP_PRIMARY_KEY,
INDEX_ATTR_BITMAP_IDENTITY_KEY,
- INDEX_ATTR_BITMAP_HOT_BLOCKING,
INDEX_ATTR_BITMAP_SUMMARIZED,
INDEX_ATTR_BITMAP_INDEXED,
} IndexAttrBitmapKind;
diff --git a/src/test/regress/expected/generated_virtual.out b/src/test/regress/expected/generated_virtual.out
index 249e68be654..6aea0346ee2 100644
--- a/src/test/regress/expected/generated_virtual.out
+++ b/src/test/regress/expected/generated_virtual.out
@@ -260,7 +260,7 @@ MERGE INTO gtestm t USING gtestm AS s ON 2 * t.a = s.b WHEN MATCHED THEN DELETE
DROP TABLE gtestm;
-- views
CREATE VIEW gtest1v AS SELECT * FROM gtest1;
-SELECT * FROM gtest1v;
+SELECT * FROM gtest1v ORDER BY a;
a | b
---+---
3 | 6
@@ -287,7 +287,7 @@ DETAIL: Column "b" is a generated column.
INSERT INTO gtest1v VALUES (8, DEFAULT), (9, DEFAULT); -- error
ERROR: cannot insert a non-DEFAULT value into column "b"
DETAIL: Column "b" is a generated column.
-SELECT * FROM gtest1v;
+SELECT * FROM gtest1v ORDER BY a;
a | b
---+----
3 | 6
diff --git a/src/test/regress/expected/triggers.out b/src/test/regress/expected/triggers.out
index 98dee63b50a..ef98fd0cccf 100644
--- a/src/test/regress/expected/triggers.out
+++ b/src/test/regress/expected/triggers.out
@@ -959,16 +959,24 @@ NOTICE: main_view BEFORE UPDATE STATEMENT (before_view_upd_stmt)
NOTICE: main_view AFTER UPDATE STATEMENT (after_view_upd_stmt)
UPDATE 0
-- Delete from view using trigger
-DELETE FROM main_view WHERE a IN (20,21);
+DELETE FROM main_view WHERE a = 20 AND b = 31;
NOTICE: main_view BEFORE DELETE STATEMENT (before_view_del_stmt)
NOTICE: main_view INSTEAD OF DELETE ROW (instead_of_del)
-NOTICE: OLD: (21,10)
-NOTICE: main_view INSTEAD OF DELETE ROW (instead_of_del)
NOTICE: OLD: (20,31)
+NOTICE: main_view AFTER DELETE STATEMENT (after_view_del_stmt)
+DELETE 1
+DELETE FROM main_view WHERE a = 21 AND b = 10;
+NOTICE: main_view BEFORE DELETE STATEMENT (before_view_del_stmt)
+NOTICE: main_view INSTEAD OF DELETE ROW (instead_of_del)
+NOTICE: OLD: (21,10)
+NOTICE: main_view AFTER DELETE STATEMENT (after_view_del_stmt)
+DELETE 1
+DELETE FROM main_view WHERE a = 21 AND b = 32;
+NOTICE: main_view BEFORE DELETE STATEMENT (before_view_del_stmt)
NOTICE: main_view INSTEAD OF DELETE ROW (instead_of_del)
NOTICE: OLD: (21,32)
NOTICE: main_view AFTER DELETE STATEMENT (after_view_del_stmt)
-DELETE 3
+DELETE 1
DELETE FROM main_view WHERE a = 31 RETURNING a, b;
NOTICE: main_view BEFORE DELETE STATEMENT (before_view_del_stmt)
NOTICE: main_view INSTEAD OF DELETE ROW (instead_of_del)
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 9cea538b8e8..4877a1ddce9 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -372,15 +372,15 @@ INSERT INTO rw_view16 (a, b) VALUES (3, 'Row 3'); -- should be OK
UPDATE rw_view16 SET a=3, aa=-3 WHERE a=3; -- should fail
ERROR: multiple assignments to same column "a"
UPDATE rw_view16 SET aa=-3 WHERE a=3; -- should be OK
-SELECT * FROM base_tbl;
+SELECT * FROM base_tbl ORDER BY a;
a | b
----+--------
+ -3 | Row 3
-2 | Row -2
-1 | Row -1
0 | Row 0
1 | Row 1
2 | Row 2
- -3 | Row 3
(6 rows)
DELETE FROM rw_view16 WHERE a=-3; -- should be OK
diff --git a/src/test/regress/sql/generated_virtual.sql b/src/test/regress/sql/generated_virtual.sql
index 81152b39a79..1142bb93525 100644
--- a/src/test/regress/sql/generated_virtual.sql
+++ b/src/test/regress/sql/generated_virtual.sql
@@ -115,7 +115,7 @@ DROP TABLE gtestm;
-- views
CREATE VIEW gtest1v AS SELECT * FROM gtest1;
-SELECT * FROM gtest1v;
+SELECT * FROM gtest1v ORDER BY a;
INSERT INTO gtest1v VALUES (4, 8); -- error
INSERT INTO gtest1v VALUES (5, DEFAULT); -- ok
INSERT INTO gtest1v VALUES (6, 66), (7, 77); -- error
@@ -127,7 +127,7 @@ ALTER VIEW gtest1v ALTER COLUMN b SET DEFAULT 100;
INSERT INTO gtest1v VALUES (8, DEFAULT); -- error
INSERT INTO gtest1v VALUES (8, DEFAULT), (9, DEFAULT); -- error
-SELECT * FROM gtest1v;
+SELECT * FROM gtest1v ORDER BY a;
DELETE FROM gtest1v WHERE a >= 5;
DROP VIEW gtest1v;
diff --git a/src/test/regress/sql/triggers.sql b/src/test/regress/sql/triggers.sql
index ea39817ee3d..6ceb61608ae 100644
--- a/src/test/regress/sql/triggers.sql
+++ b/src/test/regress/sql/triggers.sql
@@ -660,7 +660,9 @@ UPDATE main_view SET b = 32 WHERE a = 21 AND b = 31 RETURNING a, b;
UPDATE main_view SET b = 0 WHERE false;
-- Delete from view using trigger
-DELETE FROM main_view WHERE a IN (20,21);
+DELETE FROM main_view WHERE a = 20 AND b = 31;
+DELETE FROM main_view WHERE a = 21 AND b = 10;
+DELETE FROM main_view WHERE a = 21 AND b = 32;
DELETE FROM main_view WHERE a = 31 RETURNING a, b;
\set QUIET true
diff --git a/src/test/regress/sql/updatable_views.sql b/src/test/regress/sql/updatable_views.sql
index 1635adde2d4..160e7799715 100644
--- a/src/test/regress/sql/updatable_views.sql
+++ b/src/test/regress/sql/updatable_views.sql
@@ -125,7 +125,7 @@ INSERT INTO rw_view16 VALUES (3, 'Row 3', 3); -- should fail
INSERT INTO rw_view16 (a, b) VALUES (3, 'Row 3'); -- should be OK
UPDATE rw_view16 SET a=3, aa=-3 WHERE a=3; -- should fail
UPDATE rw_view16 SET aa=-3 WHERE a=3; -- should be OK
-SELECT * FROM base_tbl;
+SELECT * FROM base_tbl ORDER BY a;
DELETE FROM rw_view16 WHERE a=-3; -- should be OK
-- Read-only views
INSERT INTO ro_view17 VALUES (3, 'ROW 3');
--
2.51.2
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: Expanding HOT updates for expression and partial indexes
2026-02-16 19:36 Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-02-17 21:15 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-19 20:43 ` Re: Expanding HOT updates for expression and partial indexes Andres Freund <[email protected]>
2026-02-19 22:31 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-23 19:23 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
@ 2026-02-25 21:03 ` Jeff Davis <[email protected]>
2026-02-26 22:08 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
0 siblings, 1 reply; 24+ messages in thread
From: Jeff Davis @ 2026-02-25 21:03 UTC (permalink / raw)
To: Greg Burd <[email protected]>; pgsql-hackers
On Mon, 2026-02-23 at 14:23 -0500, Greg Burd wrote:
> Hello.
>
> Attached is a new patch set that fixes a few issues identified in the
> last set.
>
> 0001 - creates a new way to identify the set of attributes both
> modified by the update and referenced by one or more indexes on the
> target relation being updated. This patch keeps the
> HeapDetermineColumnsInfo() path within heap_update() for calls from
> simple_heap_update() when modified_attrs_valid is set to false. I'm
> not a huge fan of this, but it does serve as a way to illustrate a
> minimal set of changes easing review a bit.
>
> 0002 - splits out the top portion of heap_update() into both
> heapam_tuple_update() and simple_heap_update(), adds a few helper
> functions and tries to reduce repeated code. The goal here was to
> remove some of the mess related to the various bitmaps used to make
> decisions during the update.
IIUC, a minimal version of this patch set might be:
* add 'mix_attrs' bitmap to API for table_tuple_update
* have executor calculate the bitmap, using the old slot to see if
expression results have changed
* have simple_heap_update calculate the bitmap using heap_fetch to get
the old tuple (would be a redundant pin, but not sure if that's a
problem or not)
And leave the rest mostly unchanged.
Did I miss something? If not, it would be nice to see such a minimal
patch and/or understand why we don't follow that approach.
Regards,
Jeff Davis
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: Expanding HOT updates for expression and partial indexes
2026-02-16 19:36 Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-02-17 21:15 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-19 20:43 ` Re: Expanding HOT updates for expression and partial indexes Andres Freund <[email protected]>
2026-02-19 22:31 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-23 19:23 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-25 21:03 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
@ 2026-02-26 22:08 ` Greg Burd <[email protected]>
2026-02-26 23:01 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
0 siblings, 1 reply; 24+ messages in thread
From: Greg Burd @ 2026-02-26 22:08 UTC (permalink / raw)
To: Jeff Davis <[email protected]>; +Cc: pgsql-hackers
On Wed, Feb 25, 2026, at 4:03 PM, Jeff Davis wrote:
> On Mon, 2026-02-23 at 14:23 -0500, Greg Burd wrote:
>> Hello.
>>
>> Attached is a new patch set that fixes a few issues identified in the
>> last set.
>>
>> 0001 - creates a new way to identify the set of attributes both
>> modified by the update and referenced by one or more indexes on the
>> target relation being updated. This patch keeps the
>> HeapDetermineColumnsInfo() path within heap_update() for calls from
>> simple_heap_update() when modified_attrs_valid is set to false. I'm
>> not a huge fan of this, but it does serve as a way to illustrate a
>> minimal set of changes easing review a bit.
>>
>> 0002 - splits out the top portion of heap_update() into both
>> heapam_tuple_update() and simple_heap_update(), adds a few helper
>> functions and tries to reduce repeated code. The goal here was to
>> remove some of the mess related to the various bitmaps used to make
>> decisions during the update.
>
> IIUC, a minimal version of this patch set might be:
>
> * add 'mix_attrs' bitmap to API for table_tuple_update
> * have executor calculate the bitmap, using the old slot to see if
> expression results have changed
> * have simple_heap_update calculate the bitmap using heap_fetch to get
> the old tuple (would be a redundant pin, but not sure if that's a
> problem or not)
>
> And leave the rest mostly unchanged.
>
> Did I miss something? If not, it would be nice to see such a minimal
> patch and/or understand why we don't follow that approach.
Hey Jeff, thanks for sticking with me on this journey. :)
I think your approach makes sense, here's a summary of what's attached (v30) and at the bottom of this email are some early performance measurements.
* in the executor
* identify the mix_attrs
* one new argument to table_tuple_update( ..., mix_attrs, ...)
* heapam_tuple_update( ..., mix_attrs, ...)
* calculates hot_allowed using mix_attrs
* calculates lockmode using key_attrs and mix_attrs
* two new arguments to heap_update(..., mix_attrs, hot_allowed, ...)
* on return determines what to do with TU_UpdateIndexes
* heap_update( ..., mix_attrs, hot_allowed, ... )
* takes buffer lock
* calculates rep_id_key_req, passes that to ExtractReplicaId()
* if newbuf==buffer && hot_allowed -> HOT
* releases buffer lock
* simple_heap_update( ... no changes to API ... )
* now needs to compare old/new tuples *BEFORE* calling heap_update()
* uses heap_fetch() to turn otid -> oldtuple
* calls HeapUpdateModIdxAttrs()
* calculates lockmode
* calculates if hot is allowed
* calls into heap_update(..., mix_attrs, hot_allowed, ...)
* on return determines what to do with TU_UpdateIndexes
* renamed HeapDetermineColumnsInfo() to HeapUpdateModIdxAttrs()
* removed logic related to rep_id_key_req, that is in heap_update()
> Regards,
> Jeff Davis
There are a pair of functions now for finding "mix_attrs" that replace the singular HeapDetermineColumnsInfo() function:
ExecUpdateModIdxAttrs()
HeapUpdateModIdxAttrs()
These do essentially the same thing, only with different available information and where the latter is called within the context of a buffer lock.
In ExecUpdateModIdxAttrs() we compare two TupleTableSlots, the existing and the plan slot, using a new helper function ExecCompareSlotAttrs(). This gives us the "mix_attrs" (modified indexed attributes) bitmap. In this function we have the ResultRelInfo and EState so it is possible to use the ExecGetAllUpdatedCols() function to potentially reduce the set of attributes we need to check for changes. The function only reviews indexed attributes that also exist in that set, which led to an interesting discovery... see below.
In HeapUpdateModIdxAttrs() we start with an old TID and a HeapTuple, so first we need to fetch that old HeapTuple so we can compare old/new datum and find any modified indexed attributes.
A new function HeapUpdateHotAllowable() is used in heapam_tuple_update() and simple_heap_update() encapsulating that logic in one place including the "only summarized" test. Heap will use the HOT path if that function returns true and the tuple fits on the same buffer page. No logic changes, just moved the decision making around a bit.
A new function HeapUpdateDetermineLockMode() is used to choose exclusive or shared lock mode ahead of calling into heap_update(). Again, same logic as before.
It turns out that ExecGetAllUpdatedCols() doesn't get all updated columns as the name advertises. It finds all the attributes (columns) that were mentioned in the UPDATE statement or any triggers that will fire during the update, but that overlooks any attributes changed within before-row triggers that invoke functions which call heap_modify_tuple(). This happens when tsvector_update_trigger() is called in tsearch.sql, the code modifies an indexed attribute not mentioned in the UPDATE or triggers. I've fixed this oversight and to me this makes sense, but tell me if you disagree.
generated_virtual.sql and updatable_views.sql had tests where the scan order of the tuples on the pages seems to now be non-deterministic. I've updated those tests to ensure stability. AFAICT my changes in this patch should not change any HOT decision or any replica identity key WAL logging decision, but somehow they uncovered this instability. Or there is a bug, but I've not spotted that as yet. Feel free to point out the obvious if you do. :)
Just to be clear, this patch doesn't include any of $subject. In tests I've not measured any performance regressions, and that's not surprising as the sum total computational effort is nearly identical before/after the patch. Yes, the patch moves some computation outside the buffer lock on the heap page and that might open the door to more concurrency or slightly different behavior when updates are highly contentious. There may be more occasions where TU_Updated is returned, or some speed improvements when updating more than one row at a time.
My hope is to get this into a shape where we're comfortable with these changes and it can be committed even though none of $subject is achieved because it does lay some ground work for those future HOT expanding and WARM/PHOT enabling ideas I've been working on.
Things on my TODO list, short term:
* Re-introduce the index AM's new function to allow indexes to play a role in when they require new index entries
* I'm not a fan of TU_UpdateIndexes, it's *very* heap-specific, I'd like to eliminate this
Longer term, so as to return to working on $subject:
* Allow types to indicate that they maintain "sub-attributes" that might be used to form index key datum
* Identify in the executor for each attribute SET if a) it has sub-attributes, and if so b) does the new value for the attribute change any sub-attribute that is used to form index keys
* With the previous two ideas I think we can safely re-introduce HOT for expressions without re-evaluating the expressions and comparing index datum (read: without the overhead I've measured in the past)
* PHOT or WARM or <other nifty name here>, teach heap how to only update changed indexes (rather than the all or nothing approach we have today)
I look forward to community feedback.
best.
-greg
----------------- PERFORMANCE TEST RESULTS
DISCLAIMER: "claude" and I worked on the perf-cf5556-v30.sh script, as I'm sure is apparent. I think people call it "vibe coding" when you try to contain the enthusiasm of your friendly LLM and direct it toward some goal. IME it's like trying to control a room full of dangerously knowledgeable and overly eager to please kindergarten-aged parrots. I admit to needing more time to review the script, the test cases, and the results to fully explore these changes and validate that they actually measure something meaningful. If you find something silly or a glaring mistake, go easy on me (and "claude") but do let me (us?) know.
$ ./perf-cf5556-v30.sh
Checking for running PostgreSQL instances...
✓ No other PostgreSQL instances running
╔════════════════════════════════════════════════════════════════════╗
║ CF-5556 PERFORMANCE TEST SUITE
╚════════════════════════════════════════════════════════════════════╝
Configuration:
Test duration : 60s
Clients / Jobs : 8 / 4
Results directory : /tmp/cf5556-perf-results/20260226_150623
Setup extensions : NO
Test extensions : NO
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
BUILDING AND TESTING
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Baseline: d0833fdae7e (origin/master)
Patches: 1 patch(es) to test cumulatively
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
VERSION: baseline (d0833fdae7e)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Building PostgreSQL...
✓ PostgreSQL built
Starting server...
✓ Server started
shared_preload_libraries: pg_stat_statements
Setting up test databases...
Creating driver_license table (100k rows, 5 BTREE indexes)...
✓ driver_license ready (100000 rows)
Creating t_jsonb table (10k rows, 3 BTREE expression indexes)...
✓ t_jsonb ready (10k rows)
Creating t_gin table (10k rows, GIN index — control)...
✓ t_gin ready (10k rows, GIN — control)
Running isolated tests (60s each)...
license_write_single TPS: 69816.475176 Lat: 0.115ms
jsonb_write_single TPS: 56571.238721 Lat: 0.141ms
jsonb_write_batch TPS: 3606.918699 Lat: 2.218ms
gin_write_single TPS: 64040.989986 Lat: 0.125ms
pgbench_tpcb-like TPS: 20133.238199 Lat: 0.397ms
pgbench_simple-update TPS: 19219.239741 Lat: 0.416ms
Running concurrent read/write tests...
Running concurrent test: 2 writers + 6 readers...
jsonb_2w_6r Write: 14776.018733 TPS Read: 80627.112253 TPS
Write: 0.135 ms Read: 0.074 ms
Running concurrent test: 4 writers + 4 readers...
jsonb_4w_4r Write: 29399.436764 TPS Read: 52688.734439 TPS
Write: 0.136 ms Read: 0.076 ms
Running concurrent test: 6 writers + 2 readers...
jsonb_6w_2r Write: 43151.037340 TPS Read: 25295.378828 TPS
Write: 0.139 ms Read: 0.079 ms
Running concurrent test: 2 writers + 6 readers...
license_2w_6r Write: 18891.968944 TPS Read: 74113.652071 TPS
Write: 0.106 ms Read: 0.081 ms
Running concurrent test: 4 writers + 4 readers...
license_4w_4r Write: 37305.015382 TPS Read: 48489.296785 TPS
Write: 0.107 ms Read: 0.082 ms
Running concurrent test: 6 writers + 2 readers...
license_6w_2r Write: 54403.687584 TPS Read: 23519.461792 TPS
Write: 0.110 ms Read: 0.085 ms
Stopping server...
✓ Server stopped
fatal: a branch named 'cf-5556-test-all-patches' already exists
Applying all 1 patches cumulatively...
Applying v20260226b-0001-Idenfity-modified-indexed-attributes-in-t.patch...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
VERSION: patched (526c2a8733d)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Building PostgreSQL...
✓ PostgreSQL built
Starting server...
✓ Server started
shared_preload_libraries: pg_stat_statements
Setting up test databases...
Creating driver_license table (100k rows, 5 BTREE indexes)...
✓ driver_license ready (100000 rows)
Creating t_jsonb table (10k rows, 3 BTREE expression indexes)...
✓ t_jsonb ready (10k rows)
Creating t_gin table (10k rows, GIN index — control)...
✓ t_gin ready (10k rows, GIN — control)
Running isolated tests (60s each)...
license_write_single TPS: 70093.895595 Lat: 0.114ms
jsonb_write_single TPS: 56751.907107 Lat: 0.141ms
jsonb_write_batch TPS: 4141.086856 Lat: 1.932ms
gin_write_single TPS: 63845.491951 Lat: 0.125ms
pgbench_tpcb-like TPS: 19911.229480 Lat: 0.402ms
pgbench_simple-update TPS: 19840.566625 Lat: 0.403ms
Running concurrent read/write tests...
Running concurrent test: 2 writers + 6 readers...
jsonb_2w_6r Write: 14821.571968 TPS Read: 81057.390470 TPS
Write: 0.135 ms Read: 0.074 ms
Running concurrent test: 4 writers + 4 readers...
jsonb_4w_4r Write: 29428.063408 TPS Read: 52533.626129 TPS
Write: 0.136 ms Read: 0.076 ms
Running concurrent test: 6 writers + 2 readers...
jsonb_6w_2r Write: 43204.958598 TPS Read: 25301.939523 TPS
Write: 0.139 ms Read: 0.079 ms
Running concurrent test: 2 writers + 6 readers...
license_2w_6r Write: 18958.924353 TPS Read: 74548.095482 TPS
Write: 0.105 ms Read: 0.080 ms
Running concurrent test: 4 writers + 4 readers...
license_4w_4r Write: 37185.369146 TPS Read: 48580.299936 TPS
Write: 0.108 ms Read: 0.082 ms
Running concurrent test: 6 writers + 2 readers...
license_6w_2r Write: 54461.228141 TPS Read: 23692.388873 TPS
Write: 0.110 ms Read: 0.084 ms
Stopping server...
✓ Server stopped
╔════════════════════════════════════════════════════════════════════╗
║ RESULTS SUMMARY
╚════════════════════════════════════════════════════════════════════╝
═══════════════════════════════════════════════════════════════════════════════════
ISOLATED WORKLOAD COMPARISON (Patched vs Baseline)
═══════════════════════════════════════════════════════════════════════════════════
Table Workload Baseline TPS Patched TPS Δ%
───────────────────────────────────────────────────────────────────────────────────
gin write_single 64041.0 63845.5 -0.3%
jsonb write_batch 3606.9 4141.1 +14.8%
jsonb write_single 56571.2 56751.9 +0.3%
license write_single 69816.5 70093.9 +0.4%
pgbench simple-update 19219.2 19840.6 +3.2%
pgbench tpcb-like 20133.2 19911.2 -1.1%
───────────────────────────────────────────────────────────────────────────────────
═══════════════════════════════════════════════════════════════════════════════════
CONCURRENT WORKLOAD ANALYSIS (Write Pressure Impact on Reads)
═══════════════════════════════════════════════════════════════════════════════════
Table Write:Read Base Write Patch Write Base Read Patch Read
───────────────────────────────────────────────────────────────────────────────────
jsonb 2w_6r 14776.0 14821.6 80627.1 81057.4
license 2w_6r 18892.0 18958.9 74113.7 74548.1
jsonb 4w_4r 29399.4 29428.1 52688.7 52533.6
license 4w_4r 37305.0 37185.4 48489.3 48580.3
jsonb 6w_2r 43151.0 43205.0 25295.4 25301.9
license 6w_2r 54403.7 54461.2 23519.5 23692.4
───────────────────────────────────────────────────────────────────────────────────
Output files:
/tmp/cf5556-perf-results/20260226_150623/results.txt (raw results)
/tmp/cf5556-perf-results/20260226_150623/*_server.log (server startup/error logs)
/tmp/cf5556-perf-results/20260226_150623/*_setup.log (database setup logs)
/tmp/cf5556-perf-results/20260226_150623/*_build.log (build logs)
/tmp/cf5556-perf-results/20260226_150623/*_*.txt (pgbench output)
/tmp/cf5556-perf-results/20260226_150623/*_*.sql (test queries)
Cleaning up...
✓ Cleanup complete
Attachments:
[text/x-patch] v30-0001-Idenfity-modified-indexed-attributes-in-t.patch (57.1K, 2-v30-0001-Idenfity-modified-indexed-attributes-in-t.patch)
download | inline diff:
From 44ed75bd2b22f07401ca5270911f793426926f41 Mon Sep 17 00:00:00 2001
From: Greg Burd <[email protected]>
Date: Sun, 2 Nov 2025 11:36:20 -0500
Subject: [PATCH v30] Idenfity modified indexed attributes in the
executor on UPDATE
Refactor executor update logic to determine which indexed columns have
actually changed during an UPDATE operation rather than leaving this up
to HeapDetermineColumnsInfo() in heap_update(). Finding this set of
attributes is not heap-specific, but more general to all table AMs and
having this information in the executor could inform other decisions
about when index inserts are required and when they are not regardless
of the table AM's MVCC implementation strategy.
The heap-only tuple decision (HOT) in heap functions as it always has,
but the determination of the "modified indexed attributes" (mix_attrs,
was known as modified_attrs) now happens outside the buffer lock and can
inform other decisions unrelated to heap.
ExecUpdateModIdxAttrs() replaces HeapDeterminesColumnsInfo() and is
called before table_tuple_update() crucially without the need for an
exclusive buffer lock on the page that holds the tuple being updated.
This reduces the time the lock is held later within
heapam_tuple_update() and heap_update().
ExecUpdateModIdxAttrs() in turn uses ExecCompareSlotAttrs() to identify
which attributes have changed and then intersects that with the set of
indexed attributes to identify the modified indexed set, the mix_attrs.
Besides identifying the set of modified indexed attributes
HeapDetermineColumnsInfo() was also responsible for part of the logic
involved in the decision to include the replica identity key or not.
This moved into heap_update() and out of HeapDetermineColumnsInfo()
which has been renamed to HeapUpdateModIdxAttrs() as it is still
required within simple_heap_update() to be able to identify mix_attrs
given only an old TID and a new HeapTuple.
Updates stemming from logical replication also use the new
ExecUpdateModIdxAttrs() in ExecSimpleRelationUpdate().
This patch also introduces a few helper functions: HeapUpdateHotAllowable(),
HeapUpdateDetermineLockmode(). These are used in both heap_update() and
simple_heap_update().
The heap_update() function is called now with lockmode pre-determined
and a boolean indicating if the update can permit HOT or not. If during
heap_update() the new tuple will fit on the same page and that boolean
is true, the update is HOT. None of the logic related to when HOT is
allowed has changed.
Triggers are free to use heap_modify_tuple() and update attributes not
found in the UPDATE statement or triggers that fire due to an UPDATE.
When that happens the executor has no knowledge of those changes. This
forced HeapDetermineColumnsInfo() to scan all indexed attributes on a
relation rather than only the intersection of indexed and those
identified by ExecGetAllUpdatedCols(). This occurs in at least one test
that uses the tsvector_update_trigger() function (tsearch.sql).
ExecBRUpdateTriggers() has been changed to identify changes to indexed
columns not found by ExecGetAllUpdateCols() and add those attributes to
ri_extraUpdatedCols.
Three tests were adjusted to avoid instability due to tuple ordering
during heap page scans. This avoids non-deterministic results.
---
src/backend/access/heap/heapam.c | 481 +++++++++++-------
src/backend/access/heap/heapam_handler.c | 32 +-
src/backend/access/table/tableam.c | 5 +-
src/backend/commands/trigger.c | 20 +-
src/backend/executor/execReplication.c | 7 +-
src/backend/executor/execTuples.c | 78 +++
src/backend/executor/nodeModifyTable.c | 93 +++-
src/backend/utils/cache/relcache.c | 44 +-
src/include/access/heapam.h | 13 +-
src/include/access/tableam.h | 8 +-
src/include/executor/executor.h | 9 +
src/include/utils/rel.h | 2 +-
src/include/utils/relcache.h | 2 +-
.../regress/expected/generated_virtual.out | 2 +-
src/test/regress/expected/updatable_views.out | 4 +-
src/test/regress/sql/generated_virtual.sql | 2 +-
src/test/regress/sql/updatable_views.sql | 2 +-
17 files changed, 576 insertions(+), 228 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 98d53caeea8..8acfee942e8 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -37,20 +37,26 @@
#include "access/multixact.h"
#include "access/subtrans.h"
#include "access/syncscan.h"
+#include "access/sysattr.h"
+#include "access/tableam.h"
#include "access/valid.h"
#include "access/visibilitymap.h"
#include "access/xloginsert.h"
#include "catalog/pg_database.h"
#include "catalog/pg_database_d.h"
#include "commands/vacuum.h"
+#include "executor/tuptable.h"
+#include "nodes/lockoptions.h"
#include "pgstat.h"
#include "port/pg_bitutils.h"
+#include "storage/buf.h"
#include "storage/lmgr.h"
#include "storage/predicate.h"
#include "storage/procarray.h"
#include "utils/datum.h"
#include "utils/injection_point.h"
#include "utils/inval.h"
+#include "utils/relcache.h"
#include "utils/spccache.h"
#include "utils/syscache.h"
@@ -67,11 +73,8 @@ static void check_lock_if_inplace_updateable_rel(Relation relation,
HeapTuple newtup);
static void check_inplace_rel_lock(HeapTuple oldtup);
#endif
-static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
- Bitmapset *interesting_cols,
- Bitmapset *external_cols,
- HeapTuple oldtup, HeapTuple newtup,
- bool *has_external);
+static Bitmapset *HeapUpdateModIdxAttrs(Relation relation,
+ HeapTuple oldtup, HeapTuple newtup);
static bool heap_acquire_tuplock(Relation relation, const ItemPointerData *tid,
LockTupleMode mode, LockWaitPolicy wait_policy,
bool *have_tuple_lock);
@@ -3300,7 +3303,7 @@ simple_heap_delete(Relation relation, const ItemPointerData *tid)
* heap_update - replace a tuple
*
* See table_tuple_update() for an explanation of the parameters, except that
- * this routine directly takes a tuple rather than a slot.
+ * this routine directly takes a heap tuple rather than a slot.
*
* In the failure cases, the routine fills *tmfd with the tuple's t_ctid,
* t_xmax (resolving a possible MultiXact, if necessary), and t_cmax (the last
@@ -3310,17 +3313,13 @@ simple_heap_delete(Relation relation, const ItemPointerData *tid)
TM_Result
heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
CommandId cid, Snapshot crosscheck, bool wait,
- TM_FailureData *tmfd, LockTupleMode *lockmode,
- TU_UpdateIndexes *update_indexes)
+ TM_FailureData *tmfd, const LockTupleMode lockmode,
+ const Bitmapset *mix_attrs, const bool hot_allowed)
{
TM_Result result;
TransactionId xid = GetCurrentTransactionId();
- Bitmapset *hot_attrs;
- Bitmapset *sum_attrs;
- Bitmapset *key_attrs;
- Bitmapset *id_attrs;
- Bitmapset *interesting_attrs;
- Bitmapset *modified_attrs;
+ Bitmapset *idx_attrs,
+ *rid_attrs;
ItemId lp;
HeapTupleData oldtup;
HeapTuple heaptup;
@@ -3339,13 +3338,12 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
bool have_tuple_lock = false;
bool iscombo;
bool use_hot_update = false;
- bool summarized_update = false;
bool key_intact;
bool all_visible_cleared = false;
bool all_visible_cleared_new = false;
bool checked_lockers;
bool locker_remains;
- bool id_has_external = false;
+ bool rep_id_key_required = false;
TransactionId xmax_new_tuple,
xmax_old_tuple;
uint16 infomask_old_tuple,
@@ -3376,33 +3374,14 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
#endif
/*
- * Fetch the list of attributes to be checked for various operations.
- *
- * For HOT considerations, this is wasted effort if we fail to update or
- * have to put the new tuple on a different page. But we must compute the
- * list before obtaining buffer lock --- in the worst case, if we are
- * doing an update on one of the relevant system catalogs, we could
- * deadlock if we try to fetch the list later. In any case, the relcache
- * caches the data so this is usually pretty cheap.
- *
- * We also need columns used by the replica identity and columns that are
- * considered the "key" of rows in the table.
+ * Fetch the attributes used across all indexes on this relation as well
+ * as the replica identity and columns.
*
- * Note that we get copies of each bitmap, so we need not worry about
- * relcache flush happening midway through.
- */
- hot_attrs = RelationGetIndexAttrBitmap(relation,
- INDEX_ATTR_BITMAP_HOT_BLOCKING);
- sum_attrs = RelationGetIndexAttrBitmap(relation,
- INDEX_ATTR_BITMAP_SUMMARIZED);
- key_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_KEY);
- id_attrs = RelationGetIndexAttrBitmap(relation,
- INDEX_ATTR_BITMAP_IDENTITY_KEY);
- interesting_attrs = NULL;
- interesting_attrs = bms_add_members(interesting_attrs, hot_attrs);
- interesting_attrs = bms_add_members(interesting_attrs, sum_attrs);
- interesting_attrs = bms_add_members(interesting_attrs, key_attrs);
- interesting_attrs = bms_add_members(interesting_attrs, id_attrs);
+ * NOTE: relcache returns copies of each bitmap, so we need not worry
+ * about relcache flush happening midway through.
+ */
+ idx_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_INDEXED);
+ rid_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_IDENTITY_KEY);
block = ItemPointerGetBlockNumber(otid);
INJECTION_POINT("heap_update-before-pin", NULL);
@@ -3456,20 +3435,17 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
tmfd->ctid = *otid;
tmfd->xmax = InvalidTransactionId;
tmfd->cmax = InvalidCommandId;
- *update_indexes = TU_None;
- bms_free(hot_attrs);
- bms_free(sum_attrs);
- bms_free(key_attrs);
- bms_free(id_attrs);
- /* modified_attrs not yet initialized */
- bms_free(interesting_attrs);
+ bms_free(rid_attrs);
+ bms_free(idx_attrs);
+ /* mix_attrs is owned by the caller, don't free it */
+
return TM_Deleted;
}
/*
- * Fill in enough data in oldtup for HeapDetermineColumnsInfo to work
- * properly.
+ * Fill in enough data in oldtup to determine replica identity attribute
+ * requirements.
*/
oldtup.t_tableOid = RelationGetRelid(relation);
oldtup.t_data = (HeapTupleHeader) PageGetItem(page, lp);
@@ -3480,16 +3456,59 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
newtup->t_tableOid = RelationGetRelid(relation);
/*
- * Determine columns modified by the update. Additionally, identify
- * whether any of the unmodified replica identity key attributes in the
- * old tuple is externally stored or not. This is required because for
- * such attributes the flattened value won't be WAL logged as part of the
- * new tuple so we must include it as part of the old_key_tuple. See
- * ExtractReplicaIdentity.
+ * ExtractReplicatIdentity() needs to know if a modified indexed attrbute
+ * is used as a replica indentity or if any of the replica identity
+ * attributes are referenced in an index, unmodified, and are stored
+ * externally in the old tuple being replaced. In those cases it may be
+ * necessary to WAL log them to so they are available to replicas.
*/
- modified_attrs = HeapDetermineColumnsInfo(relation, interesting_attrs,
- id_attrs, &oldtup,
- newtup, &id_has_external);
+ rep_id_key_required = bms_overlap(mix_attrs, rid_attrs);
+ if (!rep_id_key_required)
+ {
+ Bitmapset *attrs;
+ TupleDesc tupdesc = RelationGetDescr(relation);
+ int attidx = -1;
+
+ /*
+ * Reduce the set under review to only the unmodified indexed replica
+ * identity key attributes. idx_attrs is copied (by bms_difference())
+ * not modified here.
+ */
+ attrs = bms_difference(idx_attrs, mix_attrs);
+ attrs = bms_int_members(attrs, rid_attrs);
+
+ while ((attidx = bms_next_member(attrs, attidx)) >= 0)
+ {
+ /*
+ * attidx is zero-based, attrnum is the normal attribute number
+ */
+ AttrNumber attrnum = attidx + FirstLowInvalidHeapAttributeNumber;
+ Datum value;
+ bool isnull;
+
+ /*
+ * System attributes are not added into INDEX_ATTR_BITMAP_INDEXED
+ * bitmap by relcache.
+ */
+ Assert(attrnum > 0);
+
+ value = heap_getattr(&oldtup, attrnum, tupdesc, &isnull);
+
+ /* No need to check attributes that can't be stored externally */
+ if (isnull ||
+ TupleDescCompactAttr(tupdesc, attrnum - 1)->attlen != -1)
+ continue;
+
+ /* Check if the old tuple's attribute is stored externally */
+ if (VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value)))
+ {
+ rep_id_key_required = true;
+ break;
+ }
+ }
+
+ bms_free(attrs);
+ }
/*
* If we're not updating any "key" column, we can grab a weaker lock type.
@@ -3502,9 +3521,8 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
* is updates that don't manipulate key columns, not those that
* serendipitously arrive at the same key values.
*/
- if (!bms_overlap(modified_attrs, key_attrs))
+ if (lockmode == LockTupleNoKeyExclusive)
{
- *lockmode = LockTupleNoKeyExclusive;
mxact_status = MultiXactStatusNoKeyUpdate;
key_intact = true;
@@ -3521,7 +3539,7 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
}
else
{
- *lockmode = LockTupleExclusive;
+ Assert(lockmode == LockTupleExclusive);
mxact_status = MultiXactStatusUpdate;
key_intact = false;
}
@@ -3532,7 +3550,6 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
* with the new tuple's location, so there's great risk of confusion if we
* use otid anymore.
*/
-
l2:
checked_lockers = false;
locker_remains = false;
@@ -3600,7 +3617,7 @@ l2:
bool current_is_member = false;
if (DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
- *lockmode, ¤t_is_member))
+ lockmode, ¤t_is_member))
{
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
@@ -3609,7 +3626,7 @@ l2:
* requesting a lock and already have one; avoids deadlock).
*/
if (!current_is_member)
- heap_acquire_tuplock(relation, &(oldtup.t_self), *lockmode,
+ heap_acquire_tuplock(relation, &(oldtup.t_self), lockmode,
LockWaitBlock, &have_tuple_lock);
/* wait for multixact */
@@ -3694,7 +3711,7 @@ l2:
* lock.
*/
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
- heap_acquire_tuplock(relation, &(oldtup.t_self), *lockmode,
+ heap_acquire_tuplock(relation, &(oldtup.t_self), lockmode,
LockWaitBlock, &have_tuple_lock);
XactLockTableWait(xwait, relation, &oldtup.t_self,
XLTW_Update);
@@ -3754,17 +3771,14 @@ l2:
tmfd->cmax = InvalidCommandId;
UnlockReleaseBuffer(buffer);
if (have_tuple_lock)
- UnlockTupleTuplock(relation, &(oldtup.t_self), *lockmode);
+ UnlockTupleTuplock(relation, &(oldtup.t_self), lockmode);
if (vmbuffer != InvalidBuffer)
ReleaseBuffer(vmbuffer);
- *update_indexes = TU_None;
- bms_free(hot_attrs);
- bms_free(sum_attrs);
- bms_free(key_attrs);
- bms_free(id_attrs);
- bms_free(modified_attrs);
- bms_free(interesting_attrs);
+ bms_free(rid_attrs);
+ bms_free(idx_attrs);
+ /* mix_attrs is owned by the caller, don't free it */
+
return result;
}
@@ -3794,7 +3808,7 @@ l2:
compute_new_xmax_infomask(HeapTupleHeaderGetRawXmax(oldtup.t_data),
oldtup.t_data->t_infomask,
oldtup.t_data->t_infomask2,
- xid, *lockmode, true,
+ xid, lockmode, true,
&xmax_old_tuple, &infomask_old_tuple,
&infomask2_old_tuple);
@@ -3911,7 +3925,7 @@ l2:
compute_new_xmax_infomask(HeapTupleHeaderGetRawXmax(oldtup.t_data),
oldtup.t_data->t_infomask,
oldtup.t_data->t_infomask2,
- xid, *lockmode, false,
+ xid, lockmode, false,
&xmax_lock_old_tuple, &infomask_lock_old_tuple,
&infomask2_lock_old_tuple);
@@ -4071,37 +4085,16 @@ l2:
/*
* At this point newbuf and buffer are both pinned and locked, and newbuf
- * has enough space for the new tuple. If they are the same buffer, only
- * one pin is held.
+ * has enough space for the new tuple so we can use the HOT update path if
+ * the caller determined that it is allowable.
+ *
+ * NOTE: If newbuf == buffer then only one pin is held.
*/
-
- if (newbuf == buffer)
- {
- /*
- * Since the new tuple is going into the same page, we might be able
- * to do a HOT update. Check if any of the index columns have been
- * changed.
- */
- if (!bms_overlap(modified_attrs, hot_attrs))
- {
- use_hot_update = true;
-
- /*
- * If none of the columns that are used in hot-blocking indexes
- * were updated, we can apply HOT, but we do still need to check
- * if we need to update the summarizing indexes, and update those
- * indexes if the columns were updated, or we may fail to detect
- * e.g. value bound changes in BRIN minmax indexes.
- */
- if (bms_overlap(modified_attrs, sum_attrs))
- summarized_update = true;
- }
- }
+ if ((newbuf == buffer) && hot_allowed)
+ use_hot_update = true;
else
- {
/* Set a hint that the old page could use prune/defrag */
PageSetFull(page);
- }
/*
* Compute replica identity tuple before entering the critical section so
@@ -4111,8 +4104,7 @@ l2:
* columns are modified or it has external data.
*/
old_key_tuple = ExtractReplicaIdentity(relation, &oldtup,
- bms_overlap(modified_attrs, id_attrs) ||
- id_has_external,
+ rep_id_key_required,
&old_key_copied);
/* NO EREPORT(ERROR) from here till changes are logged */
@@ -4241,7 +4233,7 @@ l2:
* Release the lmgr tuple lock, if we had it.
*/
if (have_tuple_lock)
- UnlockTupleTuplock(relation, &(oldtup.t_self), *lockmode);
+ UnlockTupleTuplock(relation, &(oldtup.t_self), lockmode);
pgstat_count_heap_update(relation, use_hot_update, newbuf != buffer);
@@ -4255,31 +4247,12 @@ l2:
heap_freetuple(heaptup);
}
- /*
- * If it is a HOT update, the update may still need to update summarized
- * indexes, lest we fail to update those summaries and get incorrect
- * results (for example, minmax bounds of the block may change with this
- * update).
- */
- if (use_hot_update)
- {
- if (summarized_update)
- *update_indexes = TU_Summarizing;
- else
- *update_indexes = TU_None;
- }
- else
- *update_indexes = TU_All;
-
if (old_key_tuple != NULL && old_key_copied)
heap_freetuple(old_key_tuple);
- bms_free(hot_attrs);
- bms_free(sum_attrs);
- bms_free(key_attrs);
- bms_free(id_attrs);
- bms_free(modified_attrs);
- bms_free(interesting_attrs);
+ bms_free(rid_attrs);
+ bms_free(idx_attrs);
+ /* mix_attrs is owned by the caller, don't free it */
return TM_Ok;
}
@@ -4452,28 +4425,113 @@ heap_attr_equals(TupleDesc tupdesc, int attrnum, Datum value1, Datum value2,
}
/*
- * Check which columns are being updated.
- *
- * Given an updated tuple, determine (and return into the output bitmapset),
- * from those listed as interesting, the set of columns that changed.
- *
- * has_external indicates if any of the unmodified attributes (from those
- * listed as interesting) of the old tuple is a member of external_cols and is
- * stored externally.
+ * HOT updates are possible when either: a) there are no modified indexed
+ * attributes, or b) the modified attributes are all on summarizing indexes.
+ * Later, in heap_update(), we can choose to perform a HOT update if there is
+ * space on the page for the new tuple and the following code has determined
+ * that HOT is allowed.
+ */
+bool
+HeapUpdateHotAllowable(Relation relation, const Bitmapset *mix_attrs,
+ bool *summarized_only)
+{
+ bool hot_allowed;
+
+ /*
+ * Let's be optimistic and start off by assuming the best case, no indexes
+ * need updating and HOT is allowable.
+ */
+ hot_allowed = true;
+ *summarized_only = false;
+
+ /*
+ * Check for case (a); when there are no modified index attributes HOT is
+ * allowed.
+ */
+ if (bms_is_empty(mix_attrs))
+ hot_allowed = true;
+ else
+ {
+ Bitmapset *sum_attrs = RelationGetIndexAttrBitmap(relation,
+ INDEX_ATTR_BITMAP_SUMMARIZED);
+
+ /*
+ * At least one index attribute was modified, but is this case (b)
+ * where all the modified index attributes are only used by
+ * summarizing indexes? If that's the case we need to update those
+ * indexes, but this can be a HOT update.
+ */
+ if (bms_is_subset(mix_attrs, sum_attrs))
+ {
+ hot_allowed = true;
+ *summarized_only = true;
+ }
+ else
+ {
+ /*
+ * Now we know that one or more indexed attribute were updated and
+ * that there was at least one of those attributes were referenced
+ * by a non-summarizing index. HOT is not allowed.
+ */
+ hot_allowed = false;
+ }
+
+ bms_free(sum_attrs);
+ }
+
+ return hot_allowed;
+}
+
+/*
+ * If we're not updating any "key" attributes, we can grab a weaker lock type.
+ * This allows for more concurrency when we are running simultaneously with
+ * foreign key checks.
+ */
+LockTupleMode
+HeapUpdateDetermineLockmode(Relation relation, const Bitmapset *mix_attrs)
+{
+ LockTupleMode lockmode = LockTupleExclusive;
+
+ Bitmapset *key_attrs = RelationGetIndexAttrBitmap(relation,
+ INDEX_ATTR_BITMAP_KEY);
+
+ if (!bms_overlap(mix_attrs, key_attrs))
+ lockmode = LockTupleNoKeyExclusive;
+
+ bms_free(key_attrs);
+
+ return lockmode;
+}
+
+/*
+ * Return a Bitmapset that contains the set of modified (changed) indexed
+ * attributes between oldtup and newtup.
*/
static Bitmapset *
-HeapDetermineColumnsInfo(Relation relation,
- Bitmapset *interesting_cols,
- Bitmapset *external_cols,
- HeapTuple oldtup, HeapTuple newtup,
- bool *has_external)
+HeapUpdateModIdxAttrs(Relation relation, HeapTuple oldtup, HeapTuple newtup)
{
int attidx;
- Bitmapset *modified = NULL;
+ Bitmapset *attrs,
+ *mix_attrs = NULL;
TupleDesc tupdesc = RelationGetDescr(relation);
+ /* Get the set of all attributes across all indexes for this relation */
+ attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_INDEXED);
+
+ /* No indexed attributes, we're done */
+ if (bms_is_empty(attrs))
+ return NULL;
+
+ /*
+ * This heap update function is used outside the executor and so unlike
+ * heapam_tuple_update() where there is ResultRelInfo and EState to
+ * provide the concise set of attributes that might have been modified
+ * (via ExecGetAllUpdatedCols()) we simply check all indexed attributes to
+ * find the subset that changed value. That's the "modified indexed
+ * attributes" or "mix_attrs".
+ */
attidx = -1;
- while ((attidx = bms_next_member(interesting_cols, attidx)) >= 0)
+ while ((attidx = bms_next_member(attrs, attidx)) >= 0)
{
/* attidx is zero-based, attrnum is the normal attribute number */
AttrNumber attrnum = attidx + FirstLowInvalidHeapAttributeNumber;
@@ -4489,7 +4547,7 @@ HeapDetermineColumnsInfo(Relation relation,
*/
if (attrnum == 0)
{
- modified = bms_add_member(modified, attidx);
+ mix_attrs = bms_add_member(mix_attrs, attidx);
continue;
}
@@ -4502,7 +4560,7 @@ HeapDetermineColumnsInfo(Relation relation,
{
if (attrnum != TableOidAttributeNumber)
{
- modified = bms_add_member(modified, attidx);
+ mix_attrs = bms_add_member(mix_attrs, attidx);
continue;
}
}
@@ -4518,29 +4576,12 @@ HeapDetermineColumnsInfo(Relation relation,
if (!heap_attr_equals(tupdesc, attrnum, value1,
value2, isnull1, isnull2))
- {
- modified = bms_add_member(modified, attidx);
- continue;
- }
-
- /*
- * No need to check attributes that can't be stored externally. Note
- * that system attributes can't be stored externally.
- */
- if (attrnum < 0 || isnull1 ||
- TupleDescCompactAttr(tupdesc, attrnum - 1)->attlen != -1)
- continue;
-
- /*
- * Check if the old tuple's attribute is stored externally and is a
- * member of external_cols.
- */
- if (VARATT_IS_EXTERNAL((varlena *) DatumGetPointer(value1)) &&
- bms_is_member(attidx, external_cols))
- *has_external = true;
+ mix_attrs = bms_add_member(mix_attrs, attidx);
}
- return modified;
+ bms_free(attrs);
+
+ return mix_attrs;
}
/*
@@ -4552,17 +4593,108 @@ HeapDetermineColumnsInfo(Relation relation,
* via ereport().
*/
void
-simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tup,
+simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tuple,
TU_UpdateIndexes *update_indexes)
{
TM_Result result;
TM_FailureData tmfd;
LockTupleMode lockmode;
+ TupleTableSlot *slot;
+ BufferHeapTupleTableSlot *bslot;
+ HeapTuple oldtup;
+ bool shouldFree = true;
+ Bitmapset *idx_attrs,
+ *mix_attrs;
+ bool hot_allowed,
+ summarized_only;
+ Buffer buffer;
- result = heap_update(relation, otid, tup,
- GetCurrentCommandId(true), InvalidSnapshot,
- true /* wait for commit */ ,
- &tmfd, &lockmode, update_indexes);
+ Assert(ItemPointerIsValid(otid));
+
+ /*
+ * Fetch this bitmap of interesting attributes from relcache before
+ * obtaining a buffer lock because if we are doing an update on one of the
+ * relevant system catalogs we could deadlock if we try to fetch them
+ * later on. Relcache will return copies of each bitmap, so we need not
+ * worry about relcache flush happening midway through this operation.
+ */
+ idx_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_INDEXED);
+
+ INJECTION_POINT("heap_update-before-pin", NULL);
+
+ /*
+ * To update a heap tuple we need to find the set of modified indexed
+ * attributes ("mix_attrs") so as to see if a HOT update is allowable or
+ * not. When updating heap tuples via execution of UPDATE statements this
+ * set is constructed before calling into the table AM's tuple_update()
+ * function by the function ExecUpdateModIdxAttrs() which compares the
+ * old/new TupleTableSlots. However, here we have the old TID and the new
+ * tuple, not two TupleTableSlots, but we still need to construct a simlar
+ * bitmap so as to be able to know if HOT updates are allowed or not. To
+ * do that we first have to fetch the old tuple itself. Because
+ * heapam_fetch_row_version() is static, we have to replicate that code
+ * here. This is a bit repetitive because heap_update() will again find
+ * and form the old HeapTuple from the old TID and in most cases the
+ * callers (ignoring extensions, always catalog tuple updates) already had
+ * the set of changed attributes (e.g. the "replaces" array), but for now
+ * this minor repetition of work is necessary.
+ */
+
+ slot = MakeTupleTableSlot(RelationGetDescr(relation), &TTSOpsBufferHeapTuple);
+ bslot = (BufferHeapTupleTableSlot *) slot;
+
+ /*
+ * Set the TID in the slot and then fetch the old tuple so we can examine
+ * it
+ */
+ bslot->base.tupdata.t_self = *otid;
+ if (!heap_fetch(relation, SnapshotAny, &bslot->base.tupdata, &buffer, false))
+ {
+ /*
+ * heap_update() checks for !ItemIdIsNormal(lp) and will return false
+ * in those cases.
+ */
+ Assert(RelationSupportsSysCache(RelationGetRelid(relation)));
+
+ *update_indexes = TU_None;
+
+ /* mix_attrs not yet initialized */
+ bms_free(idx_attrs);
+ ExecDropSingleTupleTableSlot(slot);
+
+ elog(ERROR, "tuple concurrently deleted");
+
+ return;
+ }
+
+ Assert(buffer != InvalidBuffer);
+
+ /* Store in slot, transferring existing pin */
+ ExecStorePinnedBufferHeapTuple(&bslot->base.tupdata, slot, buffer);
+ oldtup = ExecFetchSlotHeapTuple(slot, false, &shouldFree);
+
+ mix_attrs = HeapUpdateModIdxAttrs(relation, oldtup, tuple);
+ lockmode = HeapUpdateDetermineLockmode(relation, mix_attrs);
+ hot_allowed = HeapUpdateHotAllowable(relation, mix_attrs, &summarized_only);
+
+ result = heap_update(relation, otid, tuple, GetCurrentCommandId(true),
+ InvalidSnapshot, true /* wait for commit */ ,
+ &tmfd, lockmode, mix_attrs, hot_allowed);
+
+ if (shouldFree)
+ heap_freetuple(oldtup);
+
+ ExecDropSingleTupleTableSlot(slot);
+ bms_free(idx_attrs);
+
+ /*
+ * Decide whether new index entries are needed for the tuple
+ *
+ * If the update is not HOT, we must update all indexes. If the update is
+ * HOT, it could be that we updated summarized columns, so we either
+ * update only summarized indexes, or none at all.
+ */
+ *update_indexes = TU_None;
switch (result)
{
case TM_SelfModified:
@@ -4572,6 +4704,10 @@ simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tup
case TM_Ok:
/* done successfully */
+ if (!HeapTupleIsHeapOnly(tuple))
+ *update_indexes = TU_All;
+ else if (summarized_only)
+ *update_indexes = TU_Summarizing;
break;
case TM_Updated:
@@ -4588,7 +4724,6 @@ simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tup
}
}
-
/*
* Return the MultiXactStatus corresponding to the given tuple lock mode.
*/
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index b83e2013d50..aff68f80fac 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -27,7 +27,6 @@
#include "access/syncscan.h"
#include "access/tableam.h"
#include "access/tsmapi.h"
-#include "access/visibilitymap.h"
#include "access/xact.h"
#include "catalog/catalog.h"
#include "catalog/index.h"
@@ -44,6 +43,7 @@
#include "storage/procarray.h"
#include "storage/smgr.h"
#include "utils/builtins.h"
+#include "utils/injection_point.h"
#include "utils/rel.h"
static void reform_and_rewrite_tuple(HeapTuple tuple,
@@ -316,19 +316,26 @@ heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
static TM_Result
heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
CommandId cid, Snapshot snapshot, Snapshot crosscheck,
- bool wait, TM_FailureData *tmfd,
- LockTupleMode *lockmode, TU_UpdateIndexes *update_indexes)
+ bool wait, TM_FailureData *tmfd, LockTupleMode *lockmode,
+ const Bitmapset *mix_attrs, TU_UpdateIndexes *update_indexes)
{
bool shouldFree = true;
HeapTuple tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);
+ bool hot_allowed;
+ bool summarized_only;
TM_Result result;
+ Assert(ItemPointerIsValid(otid));
+
+ hot_allowed = HeapUpdateHotAllowable(relation, mix_attrs, &summarized_only);
+ *lockmode = HeapUpdateDetermineLockmode(relation, mix_attrs);
+
/* Update the tuple with table oid */
slot->tts_tableOid = RelationGetRelid(relation);
tuple->t_tableOid = slot->tts_tableOid;
result = heap_update(relation, otid, tuple, cid, crosscheck, wait,
- tmfd, lockmode, update_indexes);
+ tmfd, *lockmode, mix_attrs, hot_allowed);
ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
/*
@@ -341,16 +348,17 @@ heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
* HOT, it could be that we updated summarized columns, so we either
* update only summarized indexes, or none at all.
*/
- if (result != TM_Ok)
+ *update_indexes = TU_None;
+ if (result == TM_Ok)
{
- Assert(*update_indexes == TU_None);
- *update_indexes = TU_None;
+ if (HeapTupleIsHeapOnly(tuple))
+ {
+ if (summarized_only)
+ *update_indexes = TU_Summarizing;
+ }
+ else
+ *update_indexes = TU_All;
}
- else if (!HeapTupleIsHeapOnly(tuple))
- Assert(*update_indexes == TU_All);
- else
- Assert((*update_indexes == TU_Summarizing) ||
- (*update_indexes == TU_None));
if (shouldFree)
pfree(tuple);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index dfda1af412e..42acd5b17a9 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -359,6 +359,7 @@ void
simple_table_tuple_update(Relation rel, ItemPointer otid,
TupleTableSlot *slot,
Snapshot snapshot,
+ const Bitmapset *mix_attrs,
TU_UpdateIndexes *update_indexes)
{
TM_Result result;
@@ -369,7 +370,9 @@ simple_table_tuple_update(Relation rel, ItemPointer otid,
GetCurrentCommandId(true),
snapshot, InvalidSnapshot,
true /* wait for commit */ ,
- &tmfd, &lockmode, update_indexes);
+ &tmfd, &lockmode,
+ mix_attrs,
+ update_indexes);
switch (result)
{
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 98d402c0a3b..64efa55dfe3 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -2978,6 +2978,7 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
bool is_merge_update)
{
TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
+ TupleDesc tupdesc = RelationGetDescr(relinfo->ri_RelationDesc);
TupleTableSlot *oldslot = ExecGetTriggerOldSlot(estate, relinfo);
HeapTuple newtuple = NULL;
HeapTuple trigtuple;
@@ -2985,7 +2986,9 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
bool should_free_new = false;
TriggerData LocTriggerData = {0};
int i;
- Bitmapset *updatedCols;
+ Bitmapset *updatedCols = NULL;
+ Bitmapset *remainingCols = NULL;
+ Bitmapset *modifiedCols;
LockTupleMode lockmode;
/* Determine lock mode to use */
@@ -3127,6 +3130,21 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
if (should_free_trig)
heap_freetuple(trigtuple);
+ /*
+ * Before UPDATE triggers may have updated attributes not known to
+ * ExecGetAllUpdatedColumns() using heap_modify_tuple() or
+ * heap_modifiy_tuple_by_cols(). Find and record those now.
+ */
+ remainingCols = bms_add_range(NULL, 1 - FirstLowInvalidHeapAttributeNumber,
+ tupdesc->natts - FirstLowInvalidHeapAttributeNumber);
+ remainingCols = bms_del_members(remainingCols, updatedCols);
+ modifiedCols = ExecCompareSlotAttrs(tupdesc, remainingCols, oldslot, newslot);
+ relinfo->ri_extraUpdatedCols =
+ bms_add_members(relinfo->ri_extraUpdatedCols, modifiedCols);
+
+ bms_free(remainingCols);
+ bms_free(modifiedCols);
+
return true;
}
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 2497ee7edc5..c2e77740e76 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -33,6 +33,7 @@
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
+#include "utils/relcache.h"
#include "utils/snapmgr.h"
#include "utils/syscache.h"
#include "utils/typcache.h"
@@ -906,6 +907,7 @@ ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
bool skip_tuple = false;
Relation rel = resultRelInfo->ri_RelationDesc;
ItemPointer tid = &(searchslot->tts_tid);
+ Bitmapset *mix_attrs;
/*
* We support only non-system tables, with
@@ -944,8 +946,11 @@ ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
if (rel->rd_rel->relispartition)
ExecPartitionCheck(resultRelInfo, slot, estate, true);
+ mix_attrs = ExecUpdateModIdxAttrs(resultRelInfo,
+ estate, searchslot, slot);
+
simple_table_tuple_update(rel, tid, slot, estate->es_snapshot,
- &update_indexes);
+ mix_attrs, &update_indexes);
conflictindexes = resultRelInfo->ri_onConflictArbiterIndexes;
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index b768eae9e53..1064ebe845b 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -66,6 +66,7 @@
#include "nodes/nodeFuncs.h"
#include "storage/bufmgr.h"
#include "utils/builtins.h"
+#include "utils/datum.h"
#include "utils/expandeddatum.h"
#include "utils/lsyscache.h"
#include "utils/typcache.h"
@@ -1929,6 +1930,83 @@ ExecFetchSlotHeapTupleDatum(TupleTableSlot *slot)
return ret;
}
+/*
+ * ExecCompareSlotAttrs
+ *
+ * Compare the subset of attributes in attrs bewtween TupleTableSlots to detect
+ * which attributes have changed.
+ *
+ * Returns a Bitmapset of attribute indices (using
+ * FirstLowInvalidHeapAttributeNumber convention) that differ between the two
+ * slots.
+ */
+Bitmapset *
+ExecCompareSlotAttrs(TupleDesc tupdesc, const Bitmapset *attrs,
+ TupleTableSlot *s1, TupleTableSlot *s2)
+{
+ int attidx = -1;
+ Bitmapset *modified = NULL;
+
+ /* XXX what if slots don't share the same tupleDescriptor... */
+ /* Assert(s1->tts_tupleDescriptor == s2->tts_tupleDescriptor); */
+
+ while ((attidx = bms_next_member(attrs, attidx)) >= 0)
+ {
+ /* attidx is zero-based, attrnum is the normal attribute number */
+ AttrNumber attrnum = attidx + FirstLowInvalidHeapAttributeNumber;
+ Datum value1,
+ value2;
+ bool null1,
+ null2;
+ CompactAttribute *att;
+
+ /*
+ * If it's a whole-tuple reference, say "not equal". It's not really
+ * worth supporting this case, since it could only succeed after a
+ * no-op update, which is hardly a case worth optimizing for.
+ */
+ if (attrnum == 0)
+ {
+ modified = bms_add_member(modified, attidx);
+ continue;
+ }
+
+ /*
+ * Likewise, automatically say "not equal" for any system attribute
+ * other than tableOID; we cannot expect these to be consistent in a
+ * HOT chain, or even to be set correctly yet in the new tuple.
+ */
+ if (attrnum < 0)
+ {
+ if (attrnum != TableOidAttributeNumber)
+ {
+ modified = bms_add_member(modified, attidx);
+ continue;
+ }
+ }
+
+ att = TupleDescCompactAttr(tupdesc, attrnum - 1);
+ value1 = slot_getattr(s1, attrnum, &null1);
+ value2 = slot_getattr(s2, attrnum, &null2);
+
+ /* A change to/from NULL, so not equal */
+ if (null1 != null2)
+ {
+ modified = bms_add_member(modified, attidx);
+ continue;
+ }
+
+ /* Both NULL, no change/unmodified */
+ if (null2)
+ continue;
+
+ if (!datum_image_eq(value1, value2, att->attbyval, att->attlen))
+ modified = bms_add_member(modified, attidx);
+ }
+
+ return modified;
+}
+
/* ----------------------------------------------------------------
* convenience initialization routines
* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 793c76d4f82..4927fc88e61 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -17,6 +17,7 @@
* ExecModifyTable - retrieve the next tuple from the node
* ExecEndModifyTable - shut down the ModifyTable node
* ExecReScanModifyTable - rescan the ModifyTable node
+ * ExecUpdateModIdxAttrs - find set of updated indexed columns
*
* NOTES
* The ModifyTable node receives input from its outerPlan, which is
@@ -54,6 +55,7 @@
#include "access/htup_details.h"
#include "access/tableam.h"
+#include "access/tupdesc.h"
#include "access/xact.h"
#include "commands/trigger.h"
#include "executor/execPartition.h"
@@ -188,6 +190,68 @@ static TupleTableSlot *ExecMergeNotMatched(ModifyTableContext *context,
ResultRelInfo *resultRelInfo,
bool canSetTag);
+/*
+ * ExecUpdateModIdxAttrs
+ *
+ * Find the set of attributes referenced by this relation and used in this
+ * UPDATE that now differ in value. This is done by reviewing slot datum that
+ * are in the UPDATE statment and are known to be referenced by at least one
+ * index in some way. This set is called the "modified indexed attributes" or
+ * "mix_attrs". An overlap of a single index's attributes and this "mix" set
+ * signals that the attributes in the new_tts used to form the index datum have
+ * changed.
+ *
+ * Return a Bitmapset that contains the set of modified (changed) indexed
+ * attributes between oldtup and newtup.
+ *
+ * NOTE: There is a simlar function called HeapUpdateModIDxAttrs() that operates
+ * on the old TID and new HeapTuple rather than the old/new TupleTableSlots as
+ * this function does. These two functions should mirror one another until
+ * someday when catalog tuple updates track their changes avoiding the need to
+ * re-discover them in simple_heap_update().
+ */
+Bitmapset *
+ExecUpdateModIdxAttrs(ResultRelInfo *resultRelInfo,
+ EState *estate,
+ TupleTableSlot *old_tts,
+ TupleTableSlot *new_tts)
+{
+ Relation relation = resultRelInfo->ri_RelationDesc;
+ TupleDesc tupdesc = RelationGetDescr(relation);
+ Bitmapset *attrs,
+ *mix_attrs = NULL;
+
+ /* If no indexes, we're done */
+ if (resultRelInfo->ri_NumIndices == 0)
+ return NULL;
+
+ /*
+ * Get the set of all attributes across all indexes for this relation from
+ * the relcache, it returns us a copy of the bitmap so we can modify it.
+ */
+ attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_INDEXED);
+
+ /*
+ * Fetch the set of attributes explicity SET in the UPDATE statement or
+ * set by a before row trigger (even if not mentioned in the SQL) from the
+ * executor state and then find the intersection with the indexed
+ * attributes. Attributes that are SET might not change value, so we have
+ * to examine them for changes.
+ */
+ attrs = bms_int_members(attrs, ExecGetAllUpdatedCols(resultRelInfo, estate));
+
+ /*
+ * When there are indexed attributes mentioned in the UPDATE then we need
+ * to find the subset that changed value. That's the "modified indexed
+ * attributes" or "mix_attrs".
+ */
+ if (!bms_is_empty(attrs))
+ mix_attrs = ExecCompareSlotAttrs(tupdesc, attrs, old_tts, new_tts);
+
+ bms_free(attrs);
+
+ return mix_attrs;
+}
/*
* Verify that the tuples to be produced by INSERT match the
@@ -2195,14 +2259,17 @@ ExecUpdatePrepareSlot(ResultRelInfo *resultRelInfo,
*/
static TM_Result
ExecUpdateAct(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
- ItemPointer tupleid, HeapTuple oldtuple, TupleTableSlot *slot,
- bool canSetTag, UpdateContext *updateCxt)
+ ItemPointer tupleid, HeapTuple oldtuple, TupleTableSlot *oldSlot,
+ TupleTableSlot *slot, bool canSetTag, UpdateContext *updateCxt)
{
EState *estate = context->estate;
Relation resultRelationDesc = resultRelInfo->ri_RelationDesc;
bool partition_constraint_failed;
TM_Result result;
+ /* The set of modified indexed attributes that trigger new index entries */
+ Bitmapset *mix_attrs = NULL;
+
updateCxt->crossPartUpdate = false;
/*
@@ -2319,7 +2386,16 @@ lreplace:
ExecConstraints(resultRelInfo, slot, estate);
/*
- * replace the heap tuple
+ * Next up we need to find out the set of indexed attributes that have
+ * changed in value and should trigger a new index tuple. We could start
+ * with the set of updated columns via ExecGetUpdatedCols(), but if we do
+ * we will overlook attributes directly modified by heap_modify_tuple()
+ * which are not known to ExecGetUpdatedCols().
+ */
+ mix_attrs = ExecUpdateModIdxAttrs(resultRelInfo, estate, oldSlot, slot);
+
+ /*
+ * Call into the table AM to update the heap tuple.
*
* Note: if es_crosscheck_snapshot isn't InvalidSnapshot, we check that
* the row to be updated is visible to that snapshot, and throw a
@@ -2333,6 +2409,7 @@ lreplace:
estate->es_crosscheck_snapshot,
true /* wait for commit */ ,
&context->tmfd, &updateCxt->lockmode,
+ mix_attrs,
&updateCxt->updateIndexes);
return result;
@@ -2555,8 +2632,8 @@ ExecUpdate(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
*/
redo_act:
lockedtid = *tupleid;
- result = ExecUpdateAct(context, resultRelInfo, tupleid, oldtuple, slot,
- canSetTag, &updateCxt);
+ result = ExecUpdateAct(context, resultRelInfo, tupleid, oldtuple, oldSlot,
+ slot, canSetTag, &updateCxt);
/*
* If ExecUpdateAct reports that a cross-partition update was done,
@@ -3406,8 +3483,8 @@ lmerge_matched:
Assert(oldtuple == NULL);
result = ExecUpdateAct(context, resultRelInfo, tupleid,
- NULL, newslot, canSetTag,
- &updateCxt);
+ NULL, resultRelInfo->ri_oldTupleSlot,
+ newslot, canSetTag, &updateCxt);
/*
* As in ExecUpdate(), if ExecUpdateAct() reports that a
@@ -4539,7 +4616,7 @@ ExecModifyTable(PlanState *pstate)
* For UPDATE/DELETE/MERGE, fetch the row identity info for the tuple
* to be updated/deleted/merged. For a heap relation, that's a TID;
* otherwise we may have a wholerow junk attr that carries the old
- * tuple in toto. Keep this in step with the part of
+ * tuple in total. Keep this in step with the part of
* ExecInitModifyTable that sets up ri_RowIdAttNo.
*/
if (operation == CMD_UPDATE || operation == CMD_DELETE ||
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 6b634c9fff1..f30505d8ae3 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -2475,8 +2475,8 @@ RelationDestroyRelation(Relation relation, bool remember_tupdesc)
bms_free(relation->rd_keyattr);
bms_free(relation->rd_pkattr);
bms_free(relation->rd_idattr);
- bms_free(relation->rd_hotblockingattr);
bms_free(relation->rd_summarizedattr);
+ bms_free(relation->rd_indexedattr);
if (relation->rd_pubdesc)
pfree(relation->rd_pubdesc);
if (relation->rd_options)
@@ -5276,8 +5276,8 @@ RelationGetIndexPredicate(Relation relation)
* (beware: even if PK is deferrable!)
* INDEX_ATTR_BITMAP_IDENTITY_KEY Columns in the table's replica identity
* index (empty if FULL)
- * INDEX_ATTR_BITMAP_HOT_BLOCKING Columns that block updates from being HOT
- * INDEX_ATTR_BITMAP_SUMMARIZED Columns included in summarizing indexes
+ * INDEX_ATTR_BITMAP_SUMMARIZED Columns only included in summarizing indexes
+ * INDEX_ATTR_BITMAP_INDEXED Columns referenced by indexes
*
* Attribute numbers are offset by FirstLowInvalidHeapAttributeNumber so that
* we can include system attributes (e.g., OID) in the bitmap representation.
@@ -5300,8 +5300,8 @@ RelationGetIndexAttrBitmap(Relation relation, IndexAttrBitmapKind attrKind)
Bitmapset *uindexattrs; /* columns in unique indexes */
Bitmapset *pkindexattrs; /* columns in the primary index */
Bitmapset *idindexattrs; /* columns in the replica identity */
- Bitmapset *hotblockingattrs; /* columns with HOT blocking indexes */
- Bitmapset *summarizedattrs; /* columns with summarizing indexes */
+ Bitmapset *summarizedattrs; /* columns only in summarizing indexes */
+ Bitmapset *indexedattrs; /* columns referenced by indexes */
List *indexoidlist;
List *newindexoidlist;
Oid relpkindex;
@@ -5320,10 +5320,10 @@ RelationGetIndexAttrBitmap(Relation relation, IndexAttrBitmapKind attrKind)
return bms_copy(relation->rd_pkattr);
case INDEX_ATTR_BITMAP_IDENTITY_KEY:
return bms_copy(relation->rd_idattr);
- case INDEX_ATTR_BITMAP_HOT_BLOCKING:
- return bms_copy(relation->rd_hotblockingattr);
case INDEX_ATTR_BITMAP_SUMMARIZED:
return bms_copy(relation->rd_summarizedattr);
+ case INDEX_ATTR_BITMAP_INDEXED:
+ return bms_copy(relation->rd_indexedattr);
default:
elog(ERROR, "unknown attrKind %u", attrKind);
}
@@ -5366,8 +5366,8 @@ restart:
uindexattrs = NULL;
pkindexattrs = NULL;
idindexattrs = NULL;
- hotblockingattrs = NULL;
summarizedattrs = NULL;
+ indexedattrs = NULL;
foreach(l, indexoidlist)
{
Oid indexOid = lfirst_oid(l);
@@ -5426,7 +5426,7 @@ restart:
if (indexDesc->rd_indam->amsummarizing)
attrs = &summarizedattrs;
else
- attrs = &hotblockingattrs;
+ attrs = &indexedattrs;
/* Collect simple attribute references */
for (i = 0; i < indexDesc->rd_index->indnatts; i++)
@@ -5435,9 +5435,9 @@ restart:
/*
* Since we have covering indexes with non-key columns, we must
- * handle them accurately here. non-key columns must be added into
- * hotblockingattrs or summarizedattrs, since they are in index,
- * and update shouldn't miss them.
+ * handle them accurately here. Non-key columns must be added into
+ * indexedattrs or summarizedattrs, since they are in index, and
+ * update shouldn't miss them.
*
* Summarizing indexes do not block HOT, but do need to be updated
* when the column value changes, thus require a separate
@@ -5498,12 +5498,20 @@ restart:
bms_free(uindexattrs);
bms_free(pkindexattrs);
bms_free(idindexattrs);
- bms_free(hotblockingattrs);
bms_free(summarizedattrs);
+ bms_free(indexedattrs);
goto restart;
}
+ /*
+ * Record what attributes are only referenced by summarizing indexes. Then
+ * add that into the other indexed attributes to track all referenced
+ * attributes.
+ */
+ summarizedattrs = bms_del_members(summarizedattrs, indexedattrs);
+ indexedattrs = bms_add_members(indexedattrs, summarizedattrs);
+
/* Don't leak the old values of these bitmaps, if any */
relation->rd_attrsvalid = false;
bms_free(relation->rd_keyattr);
@@ -5512,10 +5520,10 @@ restart:
relation->rd_pkattr = NULL;
bms_free(relation->rd_idattr);
relation->rd_idattr = NULL;
- bms_free(relation->rd_hotblockingattr);
- relation->rd_hotblockingattr = NULL;
bms_free(relation->rd_summarizedattr);
relation->rd_summarizedattr = NULL;
+ bms_free(relation->rd_indexedattr);
+ relation->rd_indexedattr = NULL;
/*
* Now save copies of the bitmaps in the relcache entry. We intentionally
@@ -5528,8 +5536,8 @@ restart:
relation->rd_keyattr = bms_copy(uindexattrs);
relation->rd_pkattr = bms_copy(pkindexattrs);
relation->rd_idattr = bms_copy(idindexattrs);
- relation->rd_hotblockingattr = bms_copy(hotblockingattrs);
relation->rd_summarizedattr = bms_copy(summarizedattrs);
+ relation->rd_indexedattr = bms_copy(indexedattrs);
relation->rd_attrsvalid = true;
MemoryContextSwitchTo(oldcxt);
@@ -5542,10 +5550,10 @@ restart:
return pkindexattrs;
case INDEX_ATTR_BITMAP_IDENTITY_KEY:
return idindexattrs;
- case INDEX_ATTR_BITMAP_HOT_BLOCKING:
- return hotblockingattrs;
case INDEX_ATTR_BITMAP_SUMMARIZED:
return summarizedattrs;
+ case INDEX_ATTR_BITMAP_INDEXED:
+ return indexedattrs;
default:
elog(ERROR, "unknown attrKind %u", attrKind);
return NULL;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 3c0961ab36b..7abc8e24f21 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -365,10 +365,9 @@ extern TM_Result heap_delete(Relation relation, const ItemPointerData *tid,
extern void heap_finish_speculative(Relation relation, const ItemPointerData *tid);
extern void heap_abort_speculative(Relation relation, const ItemPointerData *tid);
extern TM_Result heap_update(Relation relation, const ItemPointerData *otid,
- HeapTuple newtup,
- CommandId cid, Snapshot crosscheck, bool wait,
- TM_FailureData *tmfd, LockTupleMode *lockmode,
- TU_UpdateIndexes *update_indexes);
+ HeapTuple newtup, CommandId cid, Snapshot crosscheck, bool wait,
+ TM_FailureData *tmfd, const LockTupleMode lockmode,
+ const Bitmapset *mix_attrs, const bool hot_allowed);
extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
CommandId cid, LockTupleMode mode, LockWaitPolicy wait_policy,
bool follow_updates,
@@ -430,6 +429,12 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
OffsetNumber *dead, int ndead,
OffsetNumber *unused, int nunused);
+/* in heap/heapam.c */
+extern bool HeapUpdateHotAllowable(Relation relation, const Bitmapset *mix_attrs,
+ bool *summarized_only);
+extern LockTupleMode HeapUpdateDetermineLockmode(Relation relation,
+ const Bitmapset *mix_attrs);
+
/* in heap/vacuumlazy.c */
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 119593b7b46..d4b12da9cac 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -549,6 +549,7 @@ typedef struct TableAmRoutine
bool wait,
TM_FailureData *tmfd,
LockTupleMode *lockmode,
+ const Bitmapset *mix_attrs,
TU_UpdateIndexes *update_indexes);
/* see table_tuple_lock() for reference about parameters */
@@ -1524,12 +1525,12 @@ static inline TM_Result
table_tuple_update(Relation rel, ItemPointer otid, TupleTableSlot *slot,
CommandId cid, Snapshot snapshot, Snapshot crosscheck,
bool wait, TM_FailureData *tmfd, LockTupleMode *lockmode,
- TU_UpdateIndexes *update_indexes)
+ const Bitmapset *mix_attrs, TU_UpdateIndexes *update_indexes)
{
return rel->rd_tableam->tuple_update(rel, otid, slot,
cid, snapshot, crosscheck,
- wait, tmfd,
- lockmode, update_indexes);
+ wait, tmfd, lockmode,
+ mix_attrs, update_indexes);
}
/*
@@ -2010,6 +2011,7 @@ extern void simple_table_tuple_delete(Relation rel, ItemPointer tid,
Snapshot snapshot);
extern void simple_table_tuple_update(Relation rel, ItemPointer otid,
TupleTableSlot *slot, Snapshot snapshot,
+ const Bitmapset *mix_attrs,
TU_UpdateIndexes *update_indexes);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index d46ba59895d..266d5309103 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -17,6 +17,7 @@
#include "datatype/timestamp.h"
#include "executor/execdesc.h"
#include "fmgr.h"
+#include "nodes/execnodes.h"
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
@@ -606,6 +607,10 @@ extern TupleDesc ExecCleanTypeFromTL(List *targetList);
extern TupleDesc ExecTypeFromExprList(List *exprList);
extern void ExecTypeSetColNames(TupleDesc typeInfo, List *namesList);
extern void UpdateChangedParamSet(PlanState *node, Bitmapset *newchg);
+extern Bitmapset *ExecCompareSlotAttrs(TupleDesc tupdesc,
+ const Bitmapset *attrs,
+ TupleTableSlot *old_tts,
+ TupleTableSlot *new_tts);
typedef struct TupOutputState
{
@@ -803,5 +808,9 @@ extern ResultRelInfo *ExecLookupResultRelByOid(ModifyTableState *node,
Oid resultoid,
bool missing_ok,
bool update_cache);
+extern Bitmapset *ExecUpdateModIdxAttrs(ResultRelInfo *relinfo,
+ EState *estate,
+ TupleTableSlot *old_tts,
+ TupleTableSlot *new_tts);
#endif /* EXECUTOR_H */
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 236830f6b93..10e5e9044ee 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -162,8 +162,8 @@ typedef struct RelationData
Bitmapset *rd_keyattr; /* cols that can be ref'd by foreign keys */
Bitmapset *rd_pkattr; /* cols included in primary key */
Bitmapset *rd_idattr; /* included in replica identity index */
- Bitmapset *rd_hotblockingattr; /* cols blocking HOT update */
Bitmapset *rd_summarizedattr; /* cols indexed by summarizing indexes */
+ Bitmapset *rd_indexedattr; /* all cols referenced by indexes */
PublicationDesc *rd_pubdesc; /* publication descriptor, or NULL */
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 2700224939a..57b46ee54e5 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -69,8 +69,8 @@ typedef enum IndexAttrBitmapKind
INDEX_ATTR_BITMAP_KEY,
INDEX_ATTR_BITMAP_PRIMARY_KEY,
INDEX_ATTR_BITMAP_IDENTITY_KEY,
- INDEX_ATTR_BITMAP_HOT_BLOCKING,
INDEX_ATTR_BITMAP_SUMMARIZED,
+ INDEX_ATTR_BITMAP_INDEXED,
} IndexAttrBitmapKind;
extern Bitmapset *RelationGetIndexAttrBitmap(Relation relation,
diff --git a/src/test/regress/expected/generated_virtual.out b/src/test/regress/expected/generated_virtual.out
index 6dab60c937b..7ebb7890d96 100644
--- a/src/test/regress/expected/generated_virtual.out
+++ b/src/test/regress/expected/generated_virtual.out
@@ -287,7 +287,7 @@ DETAIL: Column "b" is a generated column.
INSERT INTO gtest1v VALUES (8, DEFAULT), (9, DEFAULT); -- error
ERROR: cannot insert a non-DEFAULT value into column "b"
DETAIL: Column "b" is a generated column.
-SELECT * FROM gtest1v;
+SELECT * FROM gtest1v ORDER BY a;
a | b
---+----
3 | 6
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 9cea538b8e8..4877a1ddce9 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -372,15 +372,15 @@ INSERT INTO rw_view16 (a, b) VALUES (3, 'Row 3'); -- should be OK
UPDATE rw_view16 SET a=3, aa=-3 WHERE a=3; -- should fail
ERROR: multiple assignments to same column "a"
UPDATE rw_view16 SET aa=-3 WHERE a=3; -- should be OK
-SELECT * FROM base_tbl;
+SELECT * FROM base_tbl ORDER BY a;
a | b
----+--------
+ -3 | Row 3
-2 | Row -2
-1 | Row -1
0 | Row 0
1 | Row 1
2 | Row 2
- -3 | Row 3
(6 rows)
DELETE FROM rw_view16 WHERE a=-3; -- should be OK
diff --git a/src/test/regress/sql/generated_virtual.sql b/src/test/regress/sql/generated_virtual.sql
index e750866d2d8..877152d6d69 100644
--- a/src/test/regress/sql/generated_virtual.sql
+++ b/src/test/regress/sql/generated_virtual.sql
@@ -127,7 +127,7 @@ ALTER VIEW gtest1v ALTER COLUMN b SET DEFAULT 100;
INSERT INTO gtest1v VALUES (8, DEFAULT); -- error
INSERT INTO gtest1v VALUES (8, DEFAULT), (9, DEFAULT); -- error
-SELECT * FROM gtest1v;
+SELECT * FROM gtest1v ORDER BY a;
DELETE FROM gtest1v WHERE a >= 5;
DROP VIEW gtest1v;
diff --git a/src/test/regress/sql/updatable_views.sql b/src/test/regress/sql/updatable_views.sql
index 1635adde2d4..160e7799715 100644
--- a/src/test/regress/sql/updatable_views.sql
+++ b/src/test/regress/sql/updatable_views.sql
@@ -125,7 +125,7 @@ INSERT INTO rw_view16 VALUES (3, 'Row 3', 3); -- should fail
INSERT INTO rw_view16 (a, b) VALUES (3, 'Row 3'); -- should be OK
UPDATE rw_view16 SET a=3, aa=-3 WHERE a=3; -- should fail
UPDATE rw_view16 SET aa=-3 WHERE a=3; -- should be OK
-SELECT * FROM base_tbl;
+SELECT * FROM base_tbl ORDER BY a;
DELETE FROM rw_view16 WHERE a=-3; -- should be OK
-- Read-only views
INSERT INTO ro_view17 VALUES (3, 'ROW 3');
--
2.51.2
[application/x-shellscript] perf-cf5556-v30.sh (30.0K, 3-perf-cf5556-v30.sh)
download
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: Expanding HOT updates for expression and partial indexes
2026-02-16 19:36 Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-02-17 21:15 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-19 20:43 ` Re: Expanding HOT updates for expression and partial indexes Andres Freund <[email protected]>
2026-02-19 22:31 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-23 19:23 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-25 21:03 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-02-26 22:08 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
@ 2026-02-26 23:01 ` Greg Burd <[email protected]>
2026-03-02 19:08 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
0 siblings, 1 reply; 24+ messages in thread
From: Greg Burd @ 2026-02-26 23:01 UTC (permalink / raw)
To: Jeff Davis <[email protected]>; +Cc: pgsql-hackers
Okay, here's hoping that the CI bot likes v31. :)
-greg
Attachments:
[text/x-patch] v31-0001-Idenfity-modified-indexed-attributes-in-the-exec.patch (57.2K, 2-v31-0001-Idenfity-modified-indexed-attributes-in-the-exec.patch)
download | inline diff:
From 8a486eeb6fa3422aef8c69da1c636452e3dd9c76 Mon Sep 17 00:00:00 2001
From: Greg Burd <[email protected]>
Date: Sun, 2 Nov 2025 11:36:20 -0500
Subject: [PATCH v31] Idenfity modified indexed attributes in the executor on
UPDATE
Refactor executor update logic to determine which indexed columns have
actually changed during an UPDATE operation rather than leaving this up
to HeapDetermineColumnsInfo() in heap_update(). Finding this set of
attributes is not heap-specific, but more general to all table AMs and
having this information in the executor could inform other decisions
about when index inserts are required and when they are not regardless
of the table AM's MVCC implementation strategy.
The heap-only tuple decision (HOT) in heap functions as it always has,
but the determination of the "modified indexed attributes" (mix_attrs,
was known as modified_attrs) now happens outside the buffer lock and can
inform other decisions unrelated to heap.
ExecUpdateModIdxAttrs() replaces HeapDeterminesColumnsInfo() and is
called before table_tuple_update() crucially without the need for an
exclusive buffer lock on the page that holds the tuple being updated.
This reduces the time the lock is held later within
heapam_tuple_update() and heap_update().
ExecUpdateModIdxAttrs() in turn uses ExecCompareSlotAttrs() to identify
which attributes have changed and then intersects that with the set of
indexed attributes to identify the modified indexed set, the mix_attrs.
Besides identifying the set of modified indexed attributes
HeapDetermineColumnsInfo() was also responsible for part of the logic
involed in the decision to include the replica identity key or not.
This moved into heap_update() and out of HeapDetermineColumnsInfo()
which has been renamed to HeapUpdateModIdxAttrs() as it is still
required within simple_heap_update() to be able to identify mix_attrs
given only an old TID and a new HeapTuple.
Updates stemming from logical replication also use the new
ExecUpdateModIdxAttrs() in ExecSimpleRelationUpdate().
This patch also introduces a few helper functions: HeapUpdateHotAllowable(),
HeapUpdateDetermineLockmode(). These are used in both heap_update() and
simple_heap_update().
The heap_update() function is called now with lockmode pre-determined
and a booleaning indicating if the update allows HOT updates or not.
If during heap_update() the new tuple will fit on the same page and that
boolean is true, the update is HOT. None of the logic related to when
HOT is allowed has changed.
Triggers are free to use heap_modify_tuple() and update attributes not
found in the UPDATE statement or triggers that fire due to an UPDATE.
When that happens the executor has no knowledge of those changes. This
forced HeapDetermineColumnsInfo() to scan all indexed attributes on a
relation rather than only the intersection of indexed and those
identified by ExecGetAllUpdatedCols(). This occurs in at least one test
that uses the tsvector_update_trigger() function (tsearch.sql).
ExecBRUpdateTriggers() has been changed to identify changes to indexed
columns not found by ExecGetAllUpdateCols() and add those attributes to
ri_extraUpdatedCols.
Three tests were adjusted to avoid instability due to tuple ordering
during heap page scans. This avoids nondeterministic results.
---
src/backend/access/heap/heapam.c | 481 +++++++++++-------
src/backend/access/heap/heapam_handler.c | 32 +-
src/backend/access/table/tableam.c | 5 +-
src/backend/commands/trigger.c | 20 +-
src/backend/executor/execReplication.c | 7 +-
src/backend/executor/execTuples.c | 78 +++
src/backend/executor/nodeModifyTable.c | 93 +++-
src/backend/utils/cache/relcache.c | 44 +-
src/include/access/heapam.h | 13 +-
src/include/access/tableam.h | 8 +-
src/include/executor/executor.h | 9 +
src/include/utils/rel.h | 2 +-
src/include/utils/relcache.h | 2 +-
.../regress/expected/generated_virtual.out | 2 +-
src/test/regress/expected/updatable_views.out | 4 +-
src/test/regress/sql/generated_virtual.sql | 2 +-
src/test/regress/sql/updatable_views.sql | 2 +-
17 files changed, 576 insertions(+), 228 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 98d53caeea8..6b36f62a6f2 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -37,20 +37,26 @@
#include "access/multixact.h"
#include "access/subtrans.h"
#include "access/syncscan.h"
+#include "access/sysattr.h"
+#include "access/tableam.h"
#include "access/valid.h"
#include "access/visibilitymap.h"
#include "access/xloginsert.h"
#include "catalog/pg_database.h"
#include "catalog/pg_database_d.h"
#include "commands/vacuum.h"
+#include "executor/tuptable.h"
+#include "nodes/lockoptions.h"
#include "pgstat.h"
#include "port/pg_bitutils.h"
+#include "storage/buf.h"
#include "storage/lmgr.h"
#include "storage/predicate.h"
#include "storage/procarray.h"
#include "utils/datum.h"
#include "utils/injection_point.h"
#include "utils/inval.h"
+#include "utils/relcache.h"
#include "utils/spccache.h"
#include "utils/syscache.h"
@@ -67,11 +73,8 @@ static void check_lock_if_inplace_updateable_rel(Relation relation,
HeapTuple newtup);
static void check_inplace_rel_lock(HeapTuple oldtup);
#endif
-static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
- Bitmapset *interesting_cols,
- Bitmapset *external_cols,
- HeapTuple oldtup, HeapTuple newtup,
- bool *has_external);
+static Bitmapset *HeapUpdateModIdxAttrs(Relation relation,
+ HeapTuple oldtup, HeapTuple newtup);
static bool heap_acquire_tuplock(Relation relation, const ItemPointerData *tid,
LockTupleMode mode, LockWaitPolicy wait_policy,
bool *have_tuple_lock);
@@ -3300,7 +3303,7 @@ simple_heap_delete(Relation relation, const ItemPointerData *tid)
* heap_update - replace a tuple
*
* See table_tuple_update() for an explanation of the parameters, except that
- * this routine directly takes a tuple rather than a slot.
+ * this routine directly takes a heap tuple rather than a slot.
*
* In the failure cases, the routine fills *tmfd with the tuple's t_ctid,
* t_xmax (resolving a possible MultiXact, if necessary), and t_cmax (the last
@@ -3310,17 +3313,13 @@ simple_heap_delete(Relation relation, const ItemPointerData *tid)
TM_Result
heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
CommandId cid, Snapshot crosscheck, bool wait,
- TM_FailureData *tmfd, LockTupleMode *lockmode,
- TU_UpdateIndexes *update_indexes)
+ TM_FailureData *tmfd, const LockTupleMode lockmode,
+ const Bitmapset *mix_attrs, const bool hot_allowed)
{
TM_Result result;
TransactionId xid = GetCurrentTransactionId();
- Bitmapset *hot_attrs;
- Bitmapset *sum_attrs;
- Bitmapset *key_attrs;
- Bitmapset *id_attrs;
- Bitmapset *interesting_attrs;
- Bitmapset *modified_attrs;
+ Bitmapset *idx_attrs,
+ *rid_attrs;
ItemId lp;
HeapTupleData oldtup;
HeapTuple heaptup;
@@ -3339,13 +3338,12 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
bool have_tuple_lock = false;
bool iscombo;
bool use_hot_update = false;
- bool summarized_update = false;
bool key_intact;
bool all_visible_cleared = false;
bool all_visible_cleared_new = false;
bool checked_lockers;
bool locker_remains;
- bool id_has_external = false;
+ bool rep_id_key_required = false;
TransactionId xmax_new_tuple,
xmax_old_tuple;
uint16 infomask_old_tuple,
@@ -3376,33 +3374,14 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
#endif
/*
- * Fetch the list of attributes to be checked for various operations.
- *
- * For HOT considerations, this is wasted effort if we fail to update or
- * have to put the new tuple on a different page. But we must compute the
- * list before obtaining buffer lock --- in the worst case, if we are
- * doing an update on one of the relevant system catalogs, we could
- * deadlock if we try to fetch the list later. In any case, the relcache
- * caches the data so this is usually pretty cheap.
- *
- * We also need columns used by the replica identity and columns that are
- * considered the "key" of rows in the table.
+ * Fetch the attributes used across all indexes on this relation as well
+ * as the replica identity and columns.
*
- * Note that we get copies of each bitmap, so we need not worry about
- * relcache flush happening midway through.
- */
- hot_attrs = RelationGetIndexAttrBitmap(relation,
- INDEX_ATTR_BITMAP_HOT_BLOCKING);
- sum_attrs = RelationGetIndexAttrBitmap(relation,
- INDEX_ATTR_BITMAP_SUMMARIZED);
- key_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_KEY);
- id_attrs = RelationGetIndexAttrBitmap(relation,
- INDEX_ATTR_BITMAP_IDENTITY_KEY);
- interesting_attrs = NULL;
- interesting_attrs = bms_add_members(interesting_attrs, hot_attrs);
- interesting_attrs = bms_add_members(interesting_attrs, sum_attrs);
- interesting_attrs = bms_add_members(interesting_attrs, key_attrs);
- interesting_attrs = bms_add_members(interesting_attrs, id_attrs);
+ * NOTE: relcache returns copies of each bitmap, so we need not worry
+ * about relcache flush happening midway through.
+ */
+ idx_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_INDEXED);
+ rid_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_IDENTITY_KEY);
block = ItemPointerGetBlockNumber(otid);
INJECTION_POINT("heap_update-before-pin", NULL);
@@ -3456,20 +3435,17 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
tmfd->ctid = *otid;
tmfd->xmax = InvalidTransactionId;
tmfd->cmax = InvalidCommandId;
- *update_indexes = TU_None;
- bms_free(hot_attrs);
- bms_free(sum_attrs);
- bms_free(key_attrs);
- bms_free(id_attrs);
- /* modified_attrs not yet initialized */
- bms_free(interesting_attrs);
+ bms_free(rid_attrs);
+ bms_free(idx_attrs);
+ /* mix_attrs is owned by the caller, don't free it */
+
return TM_Deleted;
}
/*
- * Fill in enough data in oldtup for HeapDetermineColumnsInfo to work
- * properly.
+ * Fill in enough data in oldtup to determine replica identity attribute
+ * requirements.
*/
oldtup.t_tableOid = RelationGetRelid(relation);
oldtup.t_data = (HeapTupleHeader) PageGetItem(page, lp);
@@ -3480,16 +3456,59 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
newtup->t_tableOid = RelationGetRelid(relation);
/*
- * Determine columns modified by the update. Additionally, identify
- * whether any of the unmodified replica identity key attributes in the
- * old tuple is externally stored or not. This is required because for
- * such attributes the flattened value won't be WAL logged as part of the
- * new tuple so we must include it as part of the old_key_tuple. See
- * ExtractReplicaIdentity.
+ * ExtractReplicaIdentity() needs to know if a modified indexed attrbute
+ * is used as a replica indentity or if any of the replica identity
+ * attributes are referenced in an index, unmodified, and are stored
+ * externally in the old tuple being replaced. In those cases it may be
+ * necessary to WAL log them to so they are available to replicas.
*/
- modified_attrs = HeapDetermineColumnsInfo(relation, interesting_attrs,
- id_attrs, &oldtup,
- newtup, &id_has_external);
+ rep_id_key_required = bms_overlap(mix_attrs, rid_attrs);
+ if (!rep_id_key_required)
+ {
+ Bitmapset *attrs;
+ TupleDesc tupdesc = RelationGetDescr(relation);
+ int attidx = -1;
+
+ /*
+ * Reduce the set under review to only the unmodified indexed replica
+ * identity key attributes. idx_attrs is copied (by bms_difference())
+ * not modified here.
+ */
+ attrs = bms_difference(idx_attrs, mix_attrs);
+ attrs = bms_int_members(attrs, rid_attrs);
+
+ while ((attidx = bms_next_member(attrs, attidx)) >= 0)
+ {
+ /*
+ * attidx is zero-based, attrnum is the normal attribute number
+ */
+ AttrNumber attrnum = attidx + FirstLowInvalidHeapAttributeNumber;
+ Datum value;
+ bool isnull;
+
+ /*
+ * System attributes are not added into INDEX_ATTR_BITMAP_INDEXED
+ * bitmap by relcache.
+ */
+ Assert(attrnum > 0);
+
+ value = heap_getattr(&oldtup, attrnum, tupdesc, &isnull);
+
+ /* No need to check attributes that can't be stored externally */
+ if (isnull ||
+ TupleDescCompactAttr(tupdesc, attrnum - 1)->attlen != -1)
+ continue;
+
+ /* Check if the old tuple's attribute is stored externally */
+ if (VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value)))
+ {
+ rep_id_key_required = true;
+ break;
+ }
+ }
+
+ bms_free(attrs);
+ }
/*
* If we're not updating any "key" column, we can grab a weaker lock type.
@@ -3502,9 +3521,8 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
* is updates that don't manipulate key columns, not those that
* serendipitously arrive at the same key values.
*/
- if (!bms_overlap(modified_attrs, key_attrs))
+ if (lockmode == LockTupleNoKeyExclusive)
{
- *lockmode = LockTupleNoKeyExclusive;
mxact_status = MultiXactStatusNoKeyUpdate;
key_intact = true;
@@ -3521,7 +3539,7 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
}
else
{
- *lockmode = LockTupleExclusive;
+ Assert(lockmode == LockTupleExclusive);
mxact_status = MultiXactStatusUpdate;
key_intact = false;
}
@@ -3532,7 +3550,6 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
* with the new tuple's location, so there's great risk of confusion if we
* use otid anymore.
*/
-
l2:
checked_lockers = false;
locker_remains = false;
@@ -3600,7 +3617,7 @@ l2:
bool current_is_member = false;
if (DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
- *lockmode, ¤t_is_member))
+ lockmode, ¤t_is_member))
{
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
@@ -3609,7 +3626,7 @@ l2:
* requesting a lock and already have one; avoids deadlock).
*/
if (!current_is_member)
- heap_acquire_tuplock(relation, &(oldtup.t_self), *lockmode,
+ heap_acquire_tuplock(relation, &(oldtup.t_self), lockmode,
LockWaitBlock, &have_tuple_lock);
/* wait for multixact */
@@ -3694,7 +3711,7 @@ l2:
* lock.
*/
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
- heap_acquire_tuplock(relation, &(oldtup.t_self), *lockmode,
+ heap_acquire_tuplock(relation, &(oldtup.t_self), lockmode,
LockWaitBlock, &have_tuple_lock);
XactLockTableWait(xwait, relation, &oldtup.t_self,
XLTW_Update);
@@ -3754,17 +3771,14 @@ l2:
tmfd->cmax = InvalidCommandId;
UnlockReleaseBuffer(buffer);
if (have_tuple_lock)
- UnlockTupleTuplock(relation, &(oldtup.t_self), *lockmode);
+ UnlockTupleTuplock(relation, &(oldtup.t_self), lockmode);
if (vmbuffer != InvalidBuffer)
ReleaseBuffer(vmbuffer);
- *update_indexes = TU_None;
- bms_free(hot_attrs);
- bms_free(sum_attrs);
- bms_free(key_attrs);
- bms_free(id_attrs);
- bms_free(modified_attrs);
- bms_free(interesting_attrs);
+ bms_free(rid_attrs);
+ bms_free(idx_attrs);
+ /* mix_attrs is owned by the caller, don't free it */
+
return result;
}
@@ -3794,7 +3808,7 @@ l2:
compute_new_xmax_infomask(HeapTupleHeaderGetRawXmax(oldtup.t_data),
oldtup.t_data->t_infomask,
oldtup.t_data->t_infomask2,
- xid, *lockmode, true,
+ xid, lockmode, true,
&xmax_old_tuple, &infomask_old_tuple,
&infomask2_old_tuple);
@@ -3911,7 +3925,7 @@ l2:
compute_new_xmax_infomask(HeapTupleHeaderGetRawXmax(oldtup.t_data),
oldtup.t_data->t_infomask,
oldtup.t_data->t_infomask2,
- xid, *lockmode, false,
+ xid, lockmode, false,
&xmax_lock_old_tuple, &infomask_lock_old_tuple,
&infomask2_lock_old_tuple);
@@ -4071,37 +4085,16 @@ l2:
/*
* At this point newbuf and buffer are both pinned and locked, and newbuf
- * has enough space for the new tuple. If they are the same buffer, only
- * one pin is held.
+ * has enough space for the new tuple so we can use the HOT update path if
+ * the caller determined that it is allowable.
+ *
+ * NOTE: If newbuf == buffer then only one pin is held.
*/
-
- if (newbuf == buffer)
- {
- /*
- * Since the new tuple is going into the same page, we might be able
- * to do a HOT update. Check if any of the index columns have been
- * changed.
- */
- if (!bms_overlap(modified_attrs, hot_attrs))
- {
- use_hot_update = true;
-
- /*
- * If none of the columns that are used in hot-blocking indexes
- * were updated, we can apply HOT, but we do still need to check
- * if we need to update the summarizing indexes, and update those
- * indexes if the columns were updated, or we may fail to detect
- * e.g. value bound changes in BRIN minmax indexes.
- */
- if (bms_overlap(modified_attrs, sum_attrs))
- summarized_update = true;
- }
- }
+ if ((newbuf == buffer) && hot_allowed)
+ use_hot_update = true;
else
- {
/* Set a hint that the old page could use prune/defrag */
PageSetFull(page);
- }
/*
* Compute replica identity tuple before entering the critical section so
@@ -4111,8 +4104,7 @@ l2:
* columns are modified or it has external data.
*/
old_key_tuple = ExtractReplicaIdentity(relation, &oldtup,
- bms_overlap(modified_attrs, id_attrs) ||
- id_has_external,
+ rep_id_key_required,
&old_key_copied);
/* NO EREPORT(ERROR) from here till changes are logged */
@@ -4241,7 +4233,7 @@ l2:
* Release the lmgr tuple lock, if we had it.
*/
if (have_tuple_lock)
- UnlockTupleTuplock(relation, &(oldtup.t_self), *lockmode);
+ UnlockTupleTuplock(relation, &(oldtup.t_self), lockmode);
pgstat_count_heap_update(relation, use_hot_update, newbuf != buffer);
@@ -4255,31 +4247,12 @@ l2:
heap_freetuple(heaptup);
}
- /*
- * If it is a HOT update, the update may still need to update summarized
- * indexes, lest we fail to update those summaries and get incorrect
- * results (for example, minmax bounds of the block may change with this
- * update).
- */
- if (use_hot_update)
- {
- if (summarized_update)
- *update_indexes = TU_Summarizing;
- else
- *update_indexes = TU_None;
- }
- else
- *update_indexes = TU_All;
-
if (old_key_tuple != NULL && old_key_copied)
heap_freetuple(old_key_tuple);
- bms_free(hot_attrs);
- bms_free(sum_attrs);
- bms_free(key_attrs);
- bms_free(id_attrs);
- bms_free(modified_attrs);
- bms_free(interesting_attrs);
+ bms_free(rid_attrs);
+ bms_free(idx_attrs);
+ /* mix_attrs is owned by the caller, don't free it */
return TM_Ok;
}
@@ -4452,28 +4425,113 @@ heap_attr_equals(TupleDesc tupdesc, int attrnum, Datum value1, Datum value2,
}
/*
- * Check which columns are being updated.
- *
- * Given an updated tuple, determine (and return into the output bitmapset),
- * from those listed as interesting, the set of columns that changed.
- *
- * has_external indicates if any of the unmodified attributes (from those
- * listed as interesting) of the old tuple is a member of external_cols and is
- * stored externally.
+ * HOT updates are possible when either: a) there are no modified indexed
+ * attributes, or b) the modified attributes are all on summarizing indexes.
+ * Later, in heap_update(), we can choose to perform a HOT update if there is
+ * space on the page for the new tuple and the following code has determined
+ * that HOT is allowed.
+ */
+bool
+HeapUpdateHotAllowable(Relation relation, const Bitmapset *mix_attrs,
+ bool *summarized_only)
+{
+ bool hot_allowed;
+
+ /*
+ * Let's be optimistic and start off by assuming the best case, no indexes
+ * need updating and HOT is allowable.
+ */
+ hot_allowed = true;
+ *summarized_only = false;
+
+ /*
+ * Check for case (a); when there are no modified index attributes HOT is
+ * allowed.
+ */
+ if (bms_is_empty(mix_attrs))
+ hot_allowed = true;
+ else
+ {
+ Bitmapset *sum_attrs = RelationGetIndexAttrBitmap(relation,
+ INDEX_ATTR_BITMAP_SUMMARIZED);
+
+ /*
+ * At least one index attribute was modified, but is this case (b)
+ * where all the modified index attributes are only used by
+ * summarizing indexes? If that's the case we need to update those
+ * indexes, but this can be a HOT update.
+ */
+ if (bms_is_subset(mix_attrs, sum_attrs))
+ {
+ hot_allowed = true;
+ *summarized_only = true;
+ }
+ else
+ {
+ /*
+ * Now we know that one or more indexed attribute were updated and
+ * that there was at least one of those attributes were referenced
+ * by a non-summarizing index. HOT is not allowed.
+ */
+ hot_allowed = false;
+ }
+
+ bms_free(sum_attrs);
+ }
+
+ return hot_allowed;
+}
+
+/*
+ * If we're not updating any "key" attributes, we can grab a weaker lock type.
+ * This allows for more concurrency when we are running simultaneously with
+ * foreign key checks.
+ */
+LockTupleMode
+HeapUpdateDetermineLockmode(Relation relation, const Bitmapset *mix_attrs)
+{
+ LockTupleMode lockmode = LockTupleExclusive;
+
+ Bitmapset *key_attrs = RelationGetIndexAttrBitmap(relation,
+ INDEX_ATTR_BITMAP_KEY);
+
+ if (!bms_overlap(mix_attrs, key_attrs))
+ lockmode = LockTupleNoKeyExclusive;
+
+ bms_free(key_attrs);
+
+ return lockmode;
+}
+
+/*
+ * Return a Bitmapset that contains the set of modified (changed) indexed
+ * attributes between oldtup and newtup.
*/
static Bitmapset *
-HeapDetermineColumnsInfo(Relation relation,
- Bitmapset *interesting_cols,
- Bitmapset *external_cols,
- HeapTuple oldtup, HeapTuple newtup,
- bool *has_external)
+HeapUpdateModIdxAttrs(Relation relation, HeapTuple oldtup, HeapTuple newtup)
{
int attidx;
- Bitmapset *modified = NULL;
+ Bitmapset *attrs,
+ *mix_attrs = NULL;
TupleDesc tupdesc = RelationGetDescr(relation);
+ /* Get the set of all attributes across all indexes for this relation */
+ attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_INDEXED);
+
+ /* No indexed attributes, we're done */
+ if (bms_is_empty(attrs))
+ return NULL;
+
+ /*
+ * This heap update function is used outside the executor and so unlike
+ * heapam_tuple_update() where there is ResultRelInfo and EState to
+ * provide the concise set of attributes that might have been modified
+ * (via ExecGetAllUpdatedCols()) we simply check all indexed attributes to
+ * find the subset that changed value. That's the "modified indexed
+ * attributes" or "mix_attrs".
+ */
attidx = -1;
- while ((attidx = bms_next_member(interesting_cols, attidx)) >= 0)
+ while ((attidx = bms_next_member(attrs, attidx)) >= 0)
{
/* attidx is zero-based, attrnum is the normal attribute number */
AttrNumber attrnum = attidx + FirstLowInvalidHeapAttributeNumber;
@@ -4489,7 +4547,7 @@ HeapDetermineColumnsInfo(Relation relation,
*/
if (attrnum == 0)
{
- modified = bms_add_member(modified, attidx);
+ mix_attrs = bms_add_member(mix_attrs, attidx);
continue;
}
@@ -4502,7 +4560,7 @@ HeapDetermineColumnsInfo(Relation relation,
{
if (attrnum != TableOidAttributeNumber)
{
- modified = bms_add_member(modified, attidx);
+ mix_attrs = bms_add_member(mix_attrs, attidx);
continue;
}
}
@@ -4518,29 +4576,12 @@ HeapDetermineColumnsInfo(Relation relation,
if (!heap_attr_equals(tupdesc, attrnum, value1,
value2, isnull1, isnull2))
- {
- modified = bms_add_member(modified, attidx);
- continue;
- }
-
- /*
- * No need to check attributes that can't be stored externally. Note
- * that system attributes can't be stored externally.
- */
- if (attrnum < 0 || isnull1 ||
- TupleDescCompactAttr(tupdesc, attrnum - 1)->attlen != -1)
- continue;
-
- /*
- * Check if the old tuple's attribute is stored externally and is a
- * member of external_cols.
- */
- if (VARATT_IS_EXTERNAL((varlena *) DatumGetPointer(value1)) &&
- bms_is_member(attidx, external_cols))
- *has_external = true;
+ mix_attrs = bms_add_member(mix_attrs, attidx);
}
- return modified;
+ bms_free(attrs);
+
+ return mix_attrs;
}
/*
@@ -4552,17 +4593,108 @@ HeapDetermineColumnsInfo(Relation relation,
* via ereport().
*/
void
-simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tup,
+simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tuple,
TU_UpdateIndexes *update_indexes)
{
TM_Result result;
TM_FailureData tmfd;
LockTupleMode lockmode;
+ TupleTableSlot *slot;
+ BufferHeapTupleTableSlot *bslot;
+ HeapTuple oldtup;
+ bool shouldFree = true;
+ Bitmapset *idx_attrs,
+ *mix_attrs;
+ bool hot_allowed,
+ summarized_only;
+ Buffer buffer;
- result = heap_update(relation, otid, tup,
- GetCurrentCommandId(true), InvalidSnapshot,
- true /* wait for commit */ ,
- &tmfd, &lockmode, update_indexes);
+ Assert(ItemPointerIsValid(otid));
+
+ /*
+ * Fetch this bitmap of interesting attributes from relcache before
+ * obtaining a buffer lock because if we are doing an update on one of the
+ * relevant system catalogs we could deadlock if we try to fetch them
+ * later on. Relcache will return copies of each bitmap, so we need not
+ * worry about relcache flush happening midway through this operation.
+ */
+ idx_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_INDEXED);
+
+ INJECTION_POINT("heap_update-before-pin", NULL);
+
+ /*
+ * To update a heap tuple we need to find the set of modified indexed
+ * attributes ("mix_attrs") so as to see if a HOT update is allowable or
+ * not. When updating heap tuples via execution of UPDATE statements this
+ * set is constructed before calling into the table AM's tuple_update()
+ * function by the function ExecUpdateModIdxAttrs() which compares the
+ * old/new TupleTableSlots. However, here we have the old TID and the new
+ * tuple, not two TupleTableSlots, but we still need to construct a simlar
+ * bitmap so as to be able to know if HOT updates are allowed or not. To
+ * do that we first have to fetch the old tuple itself. Because
+ * heapam_fetch_row_version() is static, we have to replicate that code
+ * here. This is a bit repetitive because heap_update() will again find
+ * and form the old HeapTuple from the old TID and in most cases the
+ * callers (ignoring extensions, always catalog tuple updates) already had
+ * the set of changed attributes (e.g. the "replaces" array), but for now
+ * this minor repetition of work is necessary.
+ */
+
+ slot = MakeTupleTableSlot(RelationGetDescr(relation), &TTSOpsBufferHeapTuple);
+ bslot = (BufferHeapTupleTableSlot *) slot;
+
+ /*
+ * Set the TID in the slot and then fetch the old tuple so we can examine
+ * it
+ */
+ bslot->base.tupdata.t_self = *otid;
+ if (!heap_fetch(relation, SnapshotAny, &bslot->base.tupdata, &buffer, false))
+ {
+ /*
+ * heap_update() checks for !ItemIdIsNormal(lp) and will return false
+ * in those cases.
+ */
+ Assert(RelationSupportsSysCache(RelationGetRelid(relation)));
+
+ *update_indexes = TU_None;
+
+ /* mix_attrs not yet initialized */
+ bms_free(idx_attrs);
+ ExecDropSingleTupleTableSlot(slot);
+
+ elog(ERROR, "tuple concurrently deleted");
+
+ return;
+ }
+
+ Assert(buffer != InvalidBuffer);
+
+ /* Store in slot, transferring existing pin */
+ ExecStorePinnedBufferHeapTuple(&bslot->base.tupdata, slot, buffer);
+ oldtup = ExecFetchSlotHeapTuple(slot, false, &shouldFree);
+
+ mix_attrs = HeapUpdateModIdxAttrs(relation, oldtup, tuple);
+ lockmode = HeapUpdateDetermineLockmode(relation, mix_attrs);
+ hot_allowed = HeapUpdateHotAllowable(relation, mix_attrs, &summarized_only);
+
+ result = heap_update(relation, otid, tuple, GetCurrentCommandId(true),
+ InvalidSnapshot, true /* wait for commit */ ,
+ &tmfd, lockmode, mix_attrs, hot_allowed);
+
+ if (shouldFree)
+ heap_freetuple(oldtup);
+
+ ExecDropSingleTupleTableSlot(slot);
+ bms_free(idx_attrs);
+
+ /*
+ * Decide whether new index entries are needed for the tuple
+ *
+ * If the update is not HOT, we must update all indexes. If the update is
+ * HOT, it could be that we updated summarized columns, so we either
+ * update only summarized indexes, or none at all.
+ */
+ *update_indexes = TU_None;
switch (result)
{
case TM_SelfModified:
@@ -4572,6 +4704,10 @@ simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tup
case TM_Ok:
/* done successfully */
+ if (!HeapTupleIsHeapOnly(tuple))
+ *update_indexes = TU_All;
+ else if (summarized_only)
+ *update_indexes = TU_Summarizing;
break;
case TM_Updated:
@@ -4588,7 +4724,6 @@ simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tup
}
}
-
/*
* Return the MultiXactStatus corresponding to the given tuple lock mode.
*/
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 3ff36f59bf8..4600af61793 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -27,7 +27,6 @@
#include "access/syncscan.h"
#include "access/tableam.h"
#include "access/tsmapi.h"
-#include "access/visibilitymap.h"
#include "access/xact.h"
#include "catalog/catalog.h"
#include "catalog/index.h"
@@ -44,6 +43,7 @@
#include "storage/procarray.h"
#include "storage/smgr.h"
#include "utils/builtins.h"
+#include "utils/injection_point.h"
#include "utils/rel.h"
static void reform_and_rewrite_tuple(HeapTuple tuple,
@@ -316,19 +316,26 @@ heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
static TM_Result
heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
CommandId cid, Snapshot snapshot, Snapshot crosscheck,
- bool wait, TM_FailureData *tmfd,
- LockTupleMode *lockmode, TU_UpdateIndexes *update_indexes)
+ bool wait, TM_FailureData *tmfd, LockTupleMode *lockmode,
+ const Bitmapset *mix_attrs, TU_UpdateIndexes *update_indexes)
{
bool shouldFree = true;
HeapTuple tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);
+ bool hot_allowed;
+ bool summarized_only;
TM_Result result;
+ Assert(ItemPointerIsValid(otid));
+
+ hot_allowed = HeapUpdateHotAllowable(relation, mix_attrs, &summarized_only);
+ *lockmode = HeapUpdateDetermineLockmode(relation, mix_attrs);
+
/* Update the tuple with table oid */
slot->tts_tableOid = RelationGetRelid(relation);
tuple->t_tableOid = slot->tts_tableOid;
result = heap_update(relation, otid, tuple, cid, crosscheck, wait,
- tmfd, lockmode, update_indexes);
+ tmfd, *lockmode, mix_attrs, hot_allowed);
ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
/*
@@ -341,16 +348,17 @@ heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
* HOT, it could be that we updated summarized columns, so we either
* update only summarized indexes, or none at all.
*/
- if (result != TM_Ok)
+ *update_indexes = TU_None;
+ if (result == TM_Ok)
{
- Assert(*update_indexes == TU_None);
- *update_indexes = TU_None;
+ if (HeapTupleIsHeapOnly(tuple))
+ {
+ if (summarized_only)
+ *update_indexes = TU_Summarizing;
+ }
+ else
+ *update_indexes = TU_All;
}
- else if (!HeapTupleIsHeapOnly(tuple))
- Assert(*update_indexes == TU_All);
- else
- Assert((*update_indexes == TU_Summarizing) ||
- (*update_indexes == TU_None));
if (shouldFree)
pfree(tuple);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index dfda1af412e..42acd5b17a9 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -359,6 +359,7 @@ void
simple_table_tuple_update(Relation rel, ItemPointer otid,
TupleTableSlot *slot,
Snapshot snapshot,
+ const Bitmapset *mix_attrs,
TU_UpdateIndexes *update_indexes)
{
TM_Result result;
@@ -369,7 +370,9 @@ simple_table_tuple_update(Relation rel, ItemPointer otid,
GetCurrentCommandId(true),
snapshot, InvalidSnapshot,
true /* wait for commit */ ,
- &tmfd, &lockmode, update_indexes);
+ &tmfd, &lockmode,
+ mix_attrs,
+ update_indexes);
switch (result)
{
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 98d402c0a3b..64efa55dfe3 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -2978,6 +2978,7 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
bool is_merge_update)
{
TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
+ TupleDesc tupdesc = RelationGetDescr(relinfo->ri_RelationDesc);
TupleTableSlot *oldslot = ExecGetTriggerOldSlot(estate, relinfo);
HeapTuple newtuple = NULL;
HeapTuple trigtuple;
@@ -2985,7 +2986,9 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
bool should_free_new = false;
TriggerData LocTriggerData = {0};
int i;
- Bitmapset *updatedCols;
+ Bitmapset *updatedCols = NULL;
+ Bitmapset *remainingCols = NULL;
+ Bitmapset *modifiedCols;
LockTupleMode lockmode;
/* Determine lock mode to use */
@@ -3127,6 +3130,21 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
if (should_free_trig)
heap_freetuple(trigtuple);
+ /*
+ * Before UPDATE triggers may have updated attributes not known to
+ * ExecGetAllUpdatedColumns() using heap_modify_tuple() or
+ * heap_modifiy_tuple_by_cols(). Find and record those now.
+ */
+ remainingCols = bms_add_range(NULL, 1 - FirstLowInvalidHeapAttributeNumber,
+ tupdesc->natts - FirstLowInvalidHeapAttributeNumber);
+ remainingCols = bms_del_members(remainingCols, updatedCols);
+ modifiedCols = ExecCompareSlotAttrs(tupdesc, remainingCols, oldslot, newslot);
+ relinfo->ri_extraUpdatedCols =
+ bms_add_members(relinfo->ri_extraUpdatedCols, modifiedCols);
+
+ bms_free(remainingCols);
+ bms_free(modifiedCols);
+
return true;
}
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 2497ee7edc5..c2e77740e76 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -33,6 +33,7 @@
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
+#include "utils/relcache.h"
#include "utils/snapmgr.h"
#include "utils/syscache.h"
#include "utils/typcache.h"
@@ -906,6 +907,7 @@ ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
bool skip_tuple = false;
Relation rel = resultRelInfo->ri_RelationDesc;
ItemPointer tid = &(searchslot->tts_tid);
+ Bitmapset *mix_attrs;
/*
* We support only non-system tables, with
@@ -944,8 +946,11 @@ ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
if (rel->rd_rel->relispartition)
ExecPartitionCheck(resultRelInfo, slot, estate, true);
+ mix_attrs = ExecUpdateModIdxAttrs(resultRelInfo,
+ estate, searchslot, slot);
+
simple_table_tuple_update(rel, tid, slot, estate->es_snapshot,
- &update_indexes);
+ mix_attrs, &update_indexes);
conflictindexes = resultRelInfo->ri_onConflictArbiterIndexes;
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index b768eae9e53..1064ebe845b 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -66,6 +66,7 @@
#include "nodes/nodeFuncs.h"
#include "storage/bufmgr.h"
#include "utils/builtins.h"
+#include "utils/datum.h"
#include "utils/expandeddatum.h"
#include "utils/lsyscache.h"
#include "utils/typcache.h"
@@ -1929,6 +1930,83 @@ ExecFetchSlotHeapTupleDatum(TupleTableSlot *slot)
return ret;
}
+/*
+ * ExecCompareSlotAttrs
+ *
+ * Compare the subset of attributes in attrs bewtween TupleTableSlots to detect
+ * which attributes have changed.
+ *
+ * Returns a Bitmapset of attribute indices (using
+ * FirstLowInvalidHeapAttributeNumber convention) that differ between the two
+ * slots.
+ */
+Bitmapset *
+ExecCompareSlotAttrs(TupleDesc tupdesc, const Bitmapset *attrs,
+ TupleTableSlot *s1, TupleTableSlot *s2)
+{
+ int attidx = -1;
+ Bitmapset *modified = NULL;
+
+ /* XXX what if slots don't share the same tupleDescriptor... */
+ /* Assert(s1->tts_tupleDescriptor == s2->tts_tupleDescriptor); */
+
+ while ((attidx = bms_next_member(attrs, attidx)) >= 0)
+ {
+ /* attidx is zero-based, attrnum is the normal attribute number */
+ AttrNumber attrnum = attidx + FirstLowInvalidHeapAttributeNumber;
+ Datum value1,
+ value2;
+ bool null1,
+ null2;
+ CompactAttribute *att;
+
+ /*
+ * If it's a whole-tuple reference, say "not equal". It's not really
+ * worth supporting this case, since it could only succeed after a
+ * no-op update, which is hardly a case worth optimizing for.
+ */
+ if (attrnum == 0)
+ {
+ modified = bms_add_member(modified, attidx);
+ continue;
+ }
+
+ /*
+ * Likewise, automatically say "not equal" for any system attribute
+ * other than tableOID; we cannot expect these to be consistent in a
+ * HOT chain, or even to be set correctly yet in the new tuple.
+ */
+ if (attrnum < 0)
+ {
+ if (attrnum != TableOidAttributeNumber)
+ {
+ modified = bms_add_member(modified, attidx);
+ continue;
+ }
+ }
+
+ att = TupleDescCompactAttr(tupdesc, attrnum - 1);
+ value1 = slot_getattr(s1, attrnum, &null1);
+ value2 = slot_getattr(s2, attrnum, &null2);
+
+ /* A change to/from NULL, so not equal */
+ if (null1 != null2)
+ {
+ modified = bms_add_member(modified, attidx);
+ continue;
+ }
+
+ /* Both NULL, no change/unmodified */
+ if (null2)
+ continue;
+
+ if (!datum_image_eq(value1, value2, att->attbyval, att->attlen))
+ modified = bms_add_member(modified, attidx);
+ }
+
+ return modified;
+}
+
/* ----------------------------------------------------------------
* convenience initialization routines
* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 793c76d4f82..4927fc88e61 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -17,6 +17,7 @@
* ExecModifyTable - retrieve the next tuple from the node
* ExecEndModifyTable - shut down the ModifyTable node
* ExecReScanModifyTable - rescan the ModifyTable node
+ * ExecUpdateModIdxAttrs - find set of updated indexed columns
*
* NOTES
* The ModifyTable node receives input from its outerPlan, which is
@@ -54,6 +55,7 @@
#include "access/htup_details.h"
#include "access/tableam.h"
+#include "access/tupdesc.h"
#include "access/xact.h"
#include "commands/trigger.h"
#include "executor/execPartition.h"
@@ -188,6 +190,68 @@ static TupleTableSlot *ExecMergeNotMatched(ModifyTableContext *context,
ResultRelInfo *resultRelInfo,
bool canSetTag);
+/*
+ * ExecUpdateModIdxAttrs
+ *
+ * Find the set of attributes referenced by this relation and used in this
+ * UPDATE that now differ in value. This is done by reviewing slot datum that
+ * are in the UPDATE statment and are known to be referenced by at least one
+ * index in some way. This set is called the "modified indexed attributes" or
+ * "mix_attrs". An overlap of a single index's attributes and this "mix" set
+ * signals that the attributes in the new_tts used to form the index datum have
+ * changed.
+ *
+ * Return a Bitmapset that contains the set of modified (changed) indexed
+ * attributes between oldtup and newtup.
+ *
+ * NOTE: There is a simlar function called HeapUpdateModIDxAttrs() that operates
+ * on the old TID and new HeapTuple rather than the old/new TupleTableSlots as
+ * this function does. These two functions should mirror one another until
+ * someday when catalog tuple updates track their changes avoiding the need to
+ * re-discover them in simple_heap_update().
+ */
+Bitmapset *
+ExecUpdateModIdxAttrs(ResultRelInfo *resultRelInfo,
+ EState *estate,
+ TupleTableSlot *old_tts,
+ TupleTableSlot *new_tts)
+{
+ Relation relation = resultRelInfo->ri_RelationDesc;
+ TupleDesc tupdesc = RelationGetDescr(relation);
+ Bitmapset *attrs,
+ *mix_attrs = NULL;
+
+ /* If no indexes, we're done */
+ if (resultRelInfo->ri_NumIndices == 0)
+ return NULL;
+
+ /*
+ * Get the set of all attributes across all indexes for this relation from
+ * the relcache, it returns us a copy of the bitmap so we can modify it.
+ */
+ attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_INDEXED);
+
+ /*
+ * Fetch the set of attributes explicity SET in the UPDATE statement or
+ * set by a before row trigger (even if not mentioned in the SQL) from the
+ * executor state and then find the intersection with the indexed
+ * attributes. Attributes that are SET might not change value, so we have
+ * to examine them for changes.
+ */
+ attrs = bms_int_members(attrs, ExecGetAllUpdatedCols(resultRelInfo, estate));
+
+ /*
+ * When there are indexed attributes mentioned in the UPDATE then we need
+ * to find the subset that changed value. That's the "modified indexed
+ * attributes" or "mix_attrs".
+ */
+ if (!bms_is_empty(attrs))
+ mix_attrs = ExecCompareSlotAttrs(tupdesc, attrs, old_tts, new_tts);
+
+ bms_free(attrs);
+
+ return mix_attrs;
+}
/*
* Verify that the tuples to be produced by INSERT match the
@@ -2195,14 +2259,17 @@ ExecUpdatePrepareSlot(ResultRelInfo *resultRelInfo,
*/
static TM_Result
ExecUpdateAct(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
- ItemPointer tupleid, HeapTuple oldtuple, TupleTableSlot *slot,
- bool canSetTag, UpdateContext *updateCxt)
+ ItemPointer tupleid, HeapTuple oldtuple, TupleTableSlot *oldSlot,
+ TupleTableSlot *slot, bool canSetTag, UpdateContext *updateCxt)
{
EState *estate = context->estate;
Relation resultRelationDesc = resultRelInfo->ri_RelationDesc;
bool partition_constraint_failed;
TM_Result result;
+ /* The set of modified indexed attributes that trigger new index entries */
+ Bitmapset *mix_attrs = NULL;
+
updateCxt->crossPartUpdate = false;
/*
@@ -2319,7 +2386,16 @@ lreplace:
ExecConstraints(resultRelInfo, slot, estate);
/*
- * replace the heap tuple
+ * Next up we need to find out the set of indexed attributes that have
+ * changed in value and should trigger a new index tuple. We could start
+ * with the set of updated columns via ExecGetUpdatedCols(), but if we do
+ * we will overlook attributes directly modified by heap_modify_tuple()
+ * which are not known to ExecGetUpdatedCols().
+ */
+ mix_attrs = ExecUpdateModIdxAttrs(resultRelInfo, estate, oldSlot, slot);
+
+ /*
+ * Call into the table AM to update the heap tuple.
*
* Note: if es_crosscheck_snapshot isn't InvalidSnapshot, we check that
* the row to be updated is visible to that snapshot, and throw a
@@ -2333,6 +2409,7 @@ lreplace:
estate->es_crosscheck_snapshot,
true /* wait for commit */ ,
&context->tmfd, &updateCxt->lockmode,
+ mix_attrs,
&updateCxt->updateIndexes);
return result;
@@ -2555,8 +2632,8 @@ ExecUpdate(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
*/
redo_act:
lockedtid = *tupleid;
- result = ExecUpdateAct(context, resultRelInfo, tupleid, oldtuple, slot,
- canSetTag, &updateCxt);
+ result = ExecUpdateAct(context, resultRelInfo, tupleid, oldtuple, oldSlot,
+ slot, canSetTag, &updateCxt);
/*
* If ExecUpdateAct reports that a cross-partition update was done,
@@ -3406,8 +3483,8 @@ lmerge_matched:
Assert(oldtuple == NULL);
result = ExecUpdateAct(context, resultRelInfo, tupleid,
- NULL, newslot, canSetTag,
- &updateCxt);
+ NULL, resultRelInfo->ri_oldTupleSlot,
+ newslot, canSetTag, &updateCxt);
/*
* As in ExecUpdate(), if ExecUpdateAct() reports that a
@@ -4539,7 +4616,7 @@ ExecModifyTable(PlanState *pstate)
* For UPDATE/DELETE/MERGE, fetch the row identity info for the tuple
* to be updated/deleted/merged. For a heap relation, that's a TID;
* otherwise we may have a wholerow junk attr that carries the old
- * tuple in toto. Keep this in step with the part of
+ * tuple in total. Keep this in step with the part of
* ExecInitModifyTable that sets up ri_RowIdAttNo.
*/
if (operation == CMD_UPDATE || operation == CMD_DELETE ||
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 6b634c9fff1..f30505d8ae3 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -2475,8 +2475,8 @@ RelationDestroyRelation(Relation relation, bool remember_tupdesc)
bms_free(relation->rd_keyattr);
bms_free(relation->rd_pkattr);
bms_free(relation->rd_idattr);
- bms_free(relation->rd_hotblockingattr);
bms_free(relation->rd_summarizedattr);
+ bms_free(relation->rd_indexedattr);
if (relation->rd_pubdesc)
pfree(relation->rd_pubdesc);
if (relation->rd_options)
@@ -5276,8 +5276,8 @@ RelationGetIndexPredicate(Relation relation)
* (beware: even if PK is deferrable!)
* INDEX_ATTR_BITMAP_IDENTITY_KEY Columns in the table's replica identity
* index (empty if FULL)
- * INDEX_ATTR_BITMAP_HOT_BLOCKING Columns that block updates from being HOT
- * INDEX_ATTR_BITMAP_SUMMARIZED Columns included in summarizing indexes
+ * INDEX_ATTR_BITMAP_SUMMARIZED Columns only included in summarizing indexes
+ * INDEX_ATTR_BITMAP_INDEXED Columns referenced by indexes
*
* Attribute numbers are offset by FirstLowInvalidHeapAttributeNumber so that
* we can include system attributes (e.g., OID) in the bitmap representation.
@@ -5300,8 +5300,8 @@ RelationGetIndexAttrBitmap(Relation relation, IndexAttrBitmapKind attrKind)
Bitmapset *uindexattrs; /* columns in unique indexes */
Bitmapset *pkindexattrs; /* columns in the primary index */
Bitmapset *idindexattrs; /* columns in the replica identity */
- Bitmapset *hotblockingattrs; /* columns with HOT blocking indexes */
- Bitmapset *summarizedattrs; /* columns with summarizing indexes */
+ Bitmapset *summarizedattrs; /* columns only in summarizing indexes */
+ Bitmapset *indexedattrs; /* columns referenced by indexes */
List *indexoidlist;
List *newindexoidlist;
Oid relpkindex;
@@ -5320,10 +5320,10 @@ RelationGetIndexAttrBitmap(Relation relation, IndexAttrBitmapKind attrKind)
return bms_copy(relation->rd_pkattr);
case INDEX_ATTR_BITMAP_IDENTITY_KEY:
return bms_copy(relation->rd_idattr);
- case INDEX_ATTR_BITMAP_HOT_BLOCKING:
- return bms_copy(relation->rd_hotblockingattr);
case INDEX_ATTR_BITMAP_SUMMARIZED:
return bms_copy(relation->rd_summarizedattr);
+ case INDEX_ATTR_BITMAP_INDEXED:
+ return bms_copy(relation->rd_indexedattr);
default:
elog(ERROR, "unknown attrKind %u", attrKind);
}
@@ -5366,8 +5366,8 @@ restart:
uindexattrs = NULL;
pkindexattrs = NULL;
idindexattrs = NULL;
- hotblockingattrs = NULL;
summarizedattrs = NULL;
+ indexedattrs = NULL;
foreach(l, indexoidlist)
{
Oid indexOid = lfirst_oid(l);
@@ -5426,7 +5426,7 @@ restart:
if (indexDesc->rd_indam->amsummarizing)
attrs = &summarizedattrs;
else
- attrs = &hotblockingattrs;
+ attrs = &indexedattrs;
/* Collect simple attribute references */
for (i = 0; i < indexDesc->rd_index->indnatts; i++)
@@ -5435,9 +5435,9 @@ restart:
/*
* Since we have covering indexes with non-key columns, we must
- * handle them accurately here. non-key columns must be added into
- * hotblockingattrs or summarizedattrs, since they are in index,
- * and update shouldn't miss them.
+ * handle them accurately here. Non-key columns must be added into
+ * indexedattrs or summarizedattrs, since they are in index, and
+ * update shouldn't miss them.
*
* Summarizing indexes do not block HOT, but do need to be updated
* when the column value changes, thus require a separate
@@ -5498,12 +5498,20 @@ restart:
bms_free(uindexattrs);
bms_free(pkindexattrs);
bms_free(idindexattrs);
- bms_free(hotblockingattrs);
bms_free(summarizedattrs);
+ bms_free(indexedattrs);
goto restart;
}
+ /*
+ * Record what attributes are only referenced by summarizing indexes. Then
+ * add that into the other indexed attributes to track all referenced
+ * attributes.
+ */
+ summarizedattrs = bms_del_members(summarizedattrs, indexedattrs);
+ indexedattrs = bms_add_members(indexedattrs, summarizedattrs);
+
/* Don't leak the old values of these bitmaps, if any */
relation->rd_attrsvalid = false;
bms_free(relation->rd_keyattr);
@@ -5512,10 +5520,10 @@ restart:
relation->rd_pkattr = NULL;
bms_free(relation->rd_idattr);
relation->rd_idattr = NULL;
- bms_free(relation->rd_hotblockingattr);
- relation->rd_hotblockingattr = NULL;
bms_free(relation->rd_summarizedattr);
relation->rd_summarizedattr = NULL;
+ bms_free(relation->rd_indexedattr);
+ relation->rd_indexedattr = NULL;
/*
* Now save copies of the bitmaps in the relcache entry. We intentionally
@@ -5528,8 +5536,8 @@ restart:
relation->rd_keyattr = bms_copy(uindexattrs);
relation->rd_pkattr = bms_copy(pkindexattrs);
relation->rd_idattr = bms_copy(idindexattrs);
- relation->rd_hotblockingattr = bms_copy(hotblockingattrs);
relation->rd_summarizedattr = bms_copy(summarizedattrs);
+ relation->rd_indexedattr = bms_copy(indexedattrs);
relation->rd_attrsvalid = true;
MemoryContextSwitchTo(oldcxt);
@@ -5542,10 +5550,10 @@ restart:
return pkindexattrs;
case INDEX_ATTR_BITMAP_IDENTITY_KEY:
return idindexattrs;
- case INDEX_ATTR_BITMAP_HOT_BLOCKING:
- return hotblockingattrs;
case INDEX_ATTR_BITMAP_SUMMARIZED:
return summarizedattrs;
+ case INDEX_ATTR_BITMAP_INDEXED:
+ return indexedattrs;
default:
elog(ERROR, "unknown attrKind %u", attrKind);
return NULL;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 3c0961ab36b..7abc8e24f21 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -365,10 +365,9 @@ extern TM_Result heap_delete(Relation relation, const ItemPointerData *tid,
extern void heap_finish_speculative(Relation relation, const ItemPointerData *tid);
extern void heap_abort_speculative(Relation relation, const ItemPointerData *tid);
extern TM_Result heap_update(Relation relation, const ItemPointerData *otid,
- HeapTuple newtup,
- CommandId cid, Snapshot crosscheck, bool wait,
- TM_FailureData *tmfd, LockTupleMode *lockmode,
- TU_UpdateIndexes *update_indexes);
+ HeapTuple newtup, CommandId cid, Snapshot crosscheck, bool wait,
+ TM_FailureData *tmfd, const LockTupleMode lockmode,
+ const Bitmapset *mix_attrs, const bool hot_allowed);
extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
CommandId cid, LockTupleMode mode, LockWaitPolicy wait_policy,
bool follow_updates,
@@ -430,6 +429,12 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
OffsetNumber *dead, int ndead,
OffsetNumber *unused, int nunused);
+/* in heap/heapam.c */
+extern bool HeapUpdateHotAllowable(Relation relation, const Bitmapset *mix_attrs,
+ bool *summarized_only);
+extern LockTupleMode HeapUpdateDetermineLockmode(Relation relation,
+ const Bitmapset *mix_attrs);
+
/* in heap/vacuumlazy.c */
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 06084752245..19c58a76854 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -549,6 +549,7 @@ typedef struct TableAmRoutine
bool wait,
TM_FailureData *tmfd,
LockTupleMode *lockmode,
+ const Bitmapset *mix_attrs,
TU_UpdateIndexes *update_indexes);
/* see table_tuple_lock() for reference about parameters */
@@ -1523,12 +1524,12 @@ static inline TM_Result
table_tuple_update(Relation rel, ItemPointer otid, TupleTableSlot *slot,
CommandId cid, Snapshot snapshot, Snapshot crosscheck,
bool wait, TM_FailureData *tmfd, LockTupleMode *lockmode,
- TU_UpdateIndexes *update_indexes)
+ const Bitmapset *mix_attrs, TU_UpdateIndexes *update_indexes)
{
return rel->rd_tableam->tuple_update(rel, otid, slot,
cid, snapshot, crosscheck,
- wait, tmfd,
- lockmode, update_indexes);
+ wait, tmfd, lockmode,
+ mix_attrs, update_indexes);
}
/*
@@ -2009,6 +2010,7 @@ extern void simple_table_tuple_delete(Relation rel, ItemPointer tid,
Snapshot snapshot);
extern void simple_table_tuple_update(Relation rel, ItemPointer otid,
TupleTableSlot *slot, Snapshot snapshot,
+ const Bitmapset *mix_attrs,
TU_UpdateIndexes *update_indexes);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index d46ba59895d..266d5309103 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -17,6 +17,7 @@
#include "datatype/timestamp.h"
#include "executor/execdesc.h"
#include "fmgr.h"
+#include "nodes/execnodes.h"
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
@@ -606,6 +607,10 @@ extern TupleDesc ExecCleanTypeFromTL(List *targetList);
extern TupleDesc ExecTypeFromExprList(List *exprList);
extern void ExecTypeSetColNames(TupleDesc typeInfo, List *namesList);
extern void UpdateChangedParamSet(PlanState *node, Bitmapset *newchg);
+extern Bitmapset *ExecCompareSlotAttrs(TupleDesc tupdesc,
+ const Bitmapset *attrs,
+ TupleTableSlot *old_tts,
+ TupleTableSlot *new_tts);
typedef struct TupOutputState
{
@@ -803,5 +808,9 @@ extern ResultRelInfo *ExecLookupResultRelByOid(ModifyTableState *node,
Oid resultoid,
bool missing_ok,
bool update_cache);
+extern Bitmapset *ExecUpdateModIdxAttrs(ResultRelInfo *relinfo,
+ EState *estate,
+ TupleTableSlot *old_tts,
+ TupleTableSlot *new_tts);
#endif /* EXECUTOR_H */
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 236830f6b93..10e5e9044ee 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -162,8 +162,8 @@ typedef struct RelationData
Bitmapset *rd_keyattr; /* cols that can be ref'd by foreign keys */
Bitmapset *rd_pkattr; /* cols included in primary key */
Bitmapset *rd_idattr; /* included in replica identity index */
- Bitmapset *rd_hotblockingattr; /* cols blocking HOT update */
Bitmapset *rd_summarizedattr; /* cols indexed by summarizing indexes */
+ Bitmapset *rd_indexedattr; /* all cols referenced by indexes */
PublicationDesc *rd_pubdesc; /* publication descriptor, or NULL */
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 2700224939a..57b46ee54e5 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -69,8 +69,8 @@ typedef enum IndexAttrBitmapKind
INDEX_ATTR_BITMAP_KEY,
INDEX_ATTR_BITMAP_PRIMARY_KEY,
INDEX_ATTR_BITMAP_IDENTITY_KEY,
- INDEX_ATTR_BITMAP_HOT_BLOCKING,
INDEX_ATTR_BITMAP_SUMMARIZED,
+ INDEX_ATTR_BITMAP_INDEXED,
} IndexAttrBitmapKind;
extern Bitmapset *RelationGetIndexAttrBitmap(Relation relation,
diff --git a/src/test/regress/expected/generated_virtual.out b/src/test/regress/expected/generated_virtual.out
index 6dab60c937b..7ebb7890d96 100644
--- a/src/test/regress/expected/generated_virtual.out
+++ b/src/test/regress/expected/generated_virtual.out
@@ -287,7 +287,7 @@ DETAIL: Column "b" is a generated column.
INSERT INTO gtest1v VALUES (8, DEFAULT), (9, DEFAULT); -- error
ERROR: cannot insert a non-DEFAULT value into column "b"
DETAIL: Column "b" is a generated column.
-SELECT * FROM gtest1v;
+SELECT * FROM gtest1v ORDER BY a;
a | b
---+----
3 | 6
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 9cea538b8e8..4877a1ddce9 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -372,15 +372,15 @@ INSERT INTO rw_view16 (a, b) VALUES (3, 'Row 3'); -- should be OK
UPDATE rw_view16 SET a=3, aa=-3 WHERE a=3; -- should fail
ERROR: multiple assignments to same column "a"
UPDATE rw_view16 SET aa=-3 WHERE a=3; -- should be OK
-SELECT * FROM base_tbl;
+SELECT * FROM base_tbl ORDER BY a;
a | b
----+--------
+ -3 | Row 3
-2 | Row -2
-1 | Row -1
0 | Row 0
1 | Row 1
2 | Row 2
- -3 | Row 3
(6 rows)
DELETE FROM rw_view16 WHERE a=-3; -- should be OK
diff --git a/src/test/regress/sql/generated_virtual.sql b/src/test/regress/sql/generated_virtual.sql
index e750866d2d8..877152d6d69 100644
--- a/src/test/regress/sql/generated_virtual.sql
+++ b/src/test/regress/sql/generated_virtual.sql
@@ -127,7 +127,7 @@ ALTER VIEW gtest1v ALTER COLUMN b SET DEFAULT 100;
INSERT INTO gtest1v VALUES (8, DEFAULT); -- error
INSERT INTO gtest1v VALUES (8, DEFAULT), (9, DEFAULT); -- error
-SELECT * FROM gtest1v;
+SELECT * FROM gtest1v ORDER BY a;
DELETE FROM gtest1v WHERE a >= 5;
DROP VIEW gtest1v;
diff --git a/src/test/regress/sql/updatable_views.sql b/src/test/regress/sql/updatable_views.sql
index 1635adde2d4..160e7799715 100644
--- a/src/test/regress/sql/updatable_views.sql
+++ b/src/test/regress/sql/updatable_views.sql
@@ -125,7 +125,7 @@ INSERT INTO rw_view16 VALUES (3, 'Row 3', 3); -- should fail
INSERT INTO rw_view16 (a, b) VALUES (3, 'Row 3'); -- should be OK
UPDATE rw_view16 SET a=3, aa=-3 WHERE a=3; -- should fail
UPDATE rw_view16 SET aa=-3 WHERE a=3; -- should be OK
-SELECT * FROM base_tbl;
+SELECT * FROM base_tbl ORDER BY a;
DELETE FROM rw_view16 WHERE a=-3; -- should be OK
-- Read-only views
INSERT INTO ro_view17 VALUES (3, 'ROW 3');
--
2.51.2
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: Expanding HOT updates for expression and partial indexes
2026-02-16 19:36 Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-02-17 21:15 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-19 20:43 ` Re: Expanding HOT updates for expression and partial indexes Andres Freund <[email protected]>
2026-02-19 22:31 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-23 19:23 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-25 21:03 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-02-26 22:08 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-26 23:01 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
@ 2026-03-02 19:08 ` Greg Burd <[email protected]>
2026-03-11 15:51 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
0 siblings, 1 reply; 24+ messages in thread
From: Greg Burd @ 2026-03-02 19:08 UTC (permalink / raw)
To: Jeff Davis <[email protected]>; +Cc: pgsql-hackers
Hello Jeff, hackers,
In v33 I've updated a test in triggers.sql to address differences across platforms identified by the cf-bot and rebased the work.
I thought it might be prudent to add tests that validate all the corner cases of HOT that I could come up with, maybe too many (you tell me). In addition, because code that impacts HOT is also involved in what is WAL logged for the purposes of logical replication, I've added tests that try to explore the corners of that too. The goal of these first few patches is to NOT change the behavior of these things, but to only move the logic into the executor and out of heap then it makes sense to validate that explicitly.
At some point when I get back to $subjet I'll want to document how things changed. The best way to do that is by changing tests along with code. So, that is "0001" in this v33 patch set.
I've also run longer performance tests which show minimal performance differences between master and patched.
Workload 60s 300s 600s
jsonb_write_batch +14.8% -7.0% +0.1%
jsonb_write_single +0.3% +0.2% -0.0%
license_write_single +0.4% +0.2% -0.1%
gin_write_single -0.3% -0.2% -0.4%
pgbench_simple-update +3.2% +6.2% +0.9%
pgbench_tpcb-like -1.1% +0.8% -2.0%
Changing tests isn't something I take lightly, I dug into this quite a bit. I ran an analysis of ALL regression tests comparing master vs patched after instrumenting the code (see below) so I could record HOT and replica identity decisions and record where the tuple landed on the page.
Patched code produced:
simple_heap_update: 17,028 calls (72.5% - catalog updates, direct heap ops)
heapam_tuple_update: 6,462 calls (27.5% - executor path via table AM)
Total entry points: 23,490
This matched master's log line output for the same tests.
Replica identity decisions were identical, 342 unique patterns with 0 differences.
HOT eligibility was also identical, 398 unique patterns matched, again 0 differences.
The physical placement of tuples on pages was 99.991% identical, only 2 of 23,473 updates had different buffer placement.
Across test runs there were a few differences noted for pg_sequence, target, and wslot. Both master and patched agreed on hot_allowed=1 (logic identical), but in some cases use_hot_update differed (buffer placement, newbuf =?= buffer). To me this reads as non-deterministic behavior, not a bug introduced in this patch.
At this point I'd say that v33 patch is functionally correct and performance neutral. This set of changes isn't exactly exciting on the surface, but I feel that it opens the door to other changes that will be more interesting/valuable down the line.
Thank you for your time and interest.
best.
-greg
COMPARISON TESTING NOTES:
---------------------------------------------------------------------------------------
src/backend/access/heap/heapam.c
3514: elog(LOG, "PATCHED heap_update (replica check): rel=%s otid=(%u,%u) rep_id_key_required=%d",
3515- RelationGetRelationName(relation),
3516- ItemPointerGetBlockNumber(otid),
3517- ItemPointerGetOffsetNumber(otid),
3518- rep_id_key_required);
3519-
--
4106: elog(LOG, "PATCHED heap_update (final HOT): rel=%s otid=(%u,%u) hot_allowed=%d newbuf==buffer=%d use_hot_update=%d",
4107- RelationGetRelationName(relation),
4108- ItemPointerGetBlockNumber(otid),
4109- ItemPointerGetOffsetNumber(otid),
4110- hot_allowed, (newbuf == buffer), use_hot_update);
4111-
--
4693: elog(LOG, "PATCHED simple_heap_update: rel=%s otid=(%u,%u) hot_allowed=%d summarized_only=%d lockmode=%d",
4694- RelationGetRelationName(relation),
4695- ItemPointerGetBlockNumber(otid),
4696- ItemPointerGetOffsetNumber(otid),
4697- hot_allowed, summarized_only, lockmode);
4698-
src/backend/access/heap/heapam_handler.c
333: elog(LOG, "PATCHED heapam_tuple_update: rel=%s otid=(%u,%u) hot_allowed=%d summarized_only=%d lockmode=%d",
334- RelationGetRelationName(relation),
335- ItemPointerGetBlockNumber(otid),
336- ItemPointerGetOffsetNumber(otid),
337- hot_allowed, summarized_only, *lockmode);
Attachments:
[text/x-patch] v33-0001-Add-comprehensive-tests-for-HOT-updates-and-repl.patch (102.7K, 2-v33-0001-Add-comprehensive-tests-for-HOT-updates-and-repl.patch)
download | inline diff:
From 95cda56b5e9d53232c9b5f95abe423e878b1fe78 Mon Sep 17 00:00:00 2001
From: Greg Burd <[email protected]>
Date: Fri, 27 Feb 2026 12:19:06 -0500
Subject: [PATCH v33 1/2] Add comprehensive tests for HOT updates and replica
identity
Adds regression and isolation tests covering:
- HOT update decisions across various index types (B-tree, BRIN,
partial, expression, multi-column, unique constraints)
- Replica identity key extraction for logical replication
(DEFAULT, FULL, USING INDEX, NOTHING modes)
- Concurrent HOT update scenarios (locking, blocking, index scans,
HOT chains, FOR UPDATE/KEY SHARE interactions)
Regression tests:
- hot_updates.sql: 10 scenarios testing HOT eligibility
- replica_identity_logging.sql: 11 scenarios verifying replica
identity keys logged to WAL via test_decoding
Isolation tests:
- hot_updates_concurrent.spec: concurrent updates on same/different rows
- hot_updates_index_scan.spec: interactions with index scans and row locks
- hot_updates_chain.spec: HOT chain building and snapshot isolation
---
.../isolation/expected/hot_updates_chain.out | 144 ++++
.../expected/hot_updates_concurrent.out | 143 ++++
.../expected/hot_updates_index_scan.out | 126 +++
src/test/isolation/isolation_schedule | 3 +
.../isolation/specs/hot_updates_chain.spec | 110 +++
.../specs/hot_updates_concurrent.spec | 107 +++
.../specs/hot_updates_index_scan.spec | 91 +++
src/test/regress/expected/hot_updates.out | 725 ++++++++++++++++++
.../expected/replica_identity_logging.out | 396 ++++++++++
src/test/regress/parallel_schedule | 7 +
src/test/regress/sql/hot_updates.sql | 553 +++++++++++++
.../regress/sql/replica_identity_logging.sql | 349 +++++++++
12 files changed, 2754 insertions(+)
create mode 100644 src/test/isolation/expected/hot_updates_chain.out
create mode 100644 src/test/isolation/expected/hot_updates_concurrent.out
create mode 100644 src/test/isolation/expected/hot_updates_index_scan.out
create mode 100644 src/test/isolation/specs/hot_updates_chain.spec
create mode 100644 src/test/isolation/specs/hot_updates_concurrent.spec
create mode 100644 src/test/isolation/specs/hot_updates_index_scan.spec
create mode 100644 src/test/regress/expected/hot_updates.out
create mode 100644 src/test/regress/expected/replica_identity_logging.out
create mode 100644 src/test/regress/sql/hot_updates.sql
create mode 100644 src/test/regress/sql/replica_identity_logging.sql
diff --git a/src/test/isolation/expected/hot_updates_chain.out b/src/test/isolation/expected/hot_updates_chain.out
new file mode 100644
index 00000000000..503252009ea
--- /dev/null
+++ b/src/test/isolation/expected/hot_updates_chain.out
@@ -0,0 +1,144 @@
+Parsed test spec with 5 sessions
+
+starting permutation: s1_begin s1_hot_update1 s1_hot_update2 s1_hot_update3 s1_commit s1_select s1_verify_hot
+step s1_begin: BEGIN;
+step s1_hot_update1: UPDATE hot_test SET non_indexed_col = 'update1' WHERE id = 1;
+step s1_hot_update2: UPDATE hot_test SET non_indexed_col = 'update2' WHERE id = 1;
+step s1_hot_update3: UPDATE hot_test SET non_indexed_col = 'update3' WHERE id = 1;
+step s1_commit: COMMIT;
+step s1_select: SELECT * FROM hot_test WHERE id = 1;
+id|indexed_col|non_indexed_col
+--+-----------+---------------
+ 1| 100|update3
+(1 row)
+
+step s1_verify_hot:
+ -- Check for HOT chain: LP_REDIRECT or tuple with t_ctid pointing to same page
+ SELECT COUNT(*) > 0 AS has_hot_chain
+ FROM heap_page_items(get_raw_page('hot_test', 0))
+ WHERE lp_flags = 2 -- LP_REDIRECT indicates HOT chain
+ OR (t_ctid IS NOT NULL
+ AND (t_ctid::text::point)[0]::int = 0 -- same page
+ AND t_ctid != ('(0,' || lp || ')')::tid); -- different offset
+
+has_hot_chain
+-------------
+t
+(1 row)
+
+
+starting permutation: s2_begin s2_select_before s1_begin s1_hot_update1 s1_hot_update2 s1_commit s2_select_after s2_commit
+step s2_begin: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s2_select_before: SELECT non_indexed_col FROM hot_test WHERE id = 1;
+non_indexed_col
+---------------
+initial
+(1 row)
+
+step s1_begin: BEGIN;
+step s1_hot_update1: UPDATE hot_test SET non_indexed_col = 'update1' WHERE id = 1;
+step s1_hot_update2: UPDATE hot_test SET non_indexed_col = 'update2' WHERE id = 1;
+step s1_commit: COMMIT;
+step s2_select_after: SELECT non_indexed_col FROM hot_test WHERE id = 1;
+non_indexed_col
+---------------
+initial
+(1 row)
+
+step s2_commit: COMMIT;
+
+starting permutation: s1_begin s1_hot_update1 s1_hot_update2 s1_commit s3_begin s3_non_hot_update s3_commit s1_select
+step s1_begin: BEGIN;
+step s1_hot_update1: UPDATE hot_test SET non_indexed_col = 'update1' WHERE id = 1;
+step s1_hot_update2: UPDATE hot_test SET non_indexed_col = 'update2' WHERE id = 1;
+step s1_commit: COMMIT;
+step s3_begin: BEGIN;
+step s3_non_hot_update: UPDATE hot_test SET indexed_col = 150 WHERE id = 1;
+step s3_commit: COMMIT;
+step s1_select: SELECT * FROM hot_test WHERE id = 1;
+id|indexed_col|non_indexed_col
+--+-----------+---------------
+ 1| 150|update2
+(1 row)
+
+
+starting permutation: s1_begin s1_hot_update1 s1_commit s3_begin s3_non_hot_update s3_commit s4_begin s4_hot_after_non_hot s4_commit s4_select s4_verify_hot
+step s1_begin: BEGIN;
+step s1_hot_update1: UPDATE hot_test SET non_indexed_col = 'update1' WHERE id = 1;
+step s1_commit: COMMIT;
+step s3_begin: BEGIN;
+step s3_non_hot_update: UPDATE hot_test SET indexed_col = 150 WHERE id = 1;
+step s3_commit: COMMIT;
+step s4_begin: BEGIN;
+step s4_hot_after_non_hot: UPDATE hot_test SET non_indexed_col = 'after_non_hot' WHERE id = 1;
+step s4_commit: COMMIT;
+step s4_select: SELECT * FROM hot_test WHERE id = 1;
+id|indexed_col|non_indexed_col
+--+-----------+---------------
+ 1| 150|after_non_hot
+(1 row)
+
+step s4_verify_hot:
+ -- Check for new HOT chain after non-HOT update broke the previous chain
+ SELECT COUNT(*) > 0 AS has_hot_chain
+ FROM heap_page_items(get_raw_page('hot_test', 0))
+ WHERE lp_flags = 2
+ OR (t_ctid IS NOT NULL
+ AND (t_ctid::text::point)[0]::int = 0
+ AND t_ctid != ('(0,' || lp || ')')::tid);
+
+has_hot_chain
+-------------
+t
+(1 row)
+
+
+starting permutation: s1_begin s1_hot_update1 s1_hot_update2 s5_begin s5_hot_update_row2_1 s5_hot_update_row2_2 s1_commit s5_commit s1_select s5_select s1_verify_hot s5_verify_hot
+step s1_begin: BEGIN;
+step s1_hot_update1: UPDATE hot_test SET non_indexed_col = 'update1' WHERE id = 1;
+step s1_hot_update2: UPDATE hot_test SET non_indexed_col = 'update2' WHERE id = 1;
+step s5_begin: BEGIN;
+step s5_hot_update_row2_1: UPDATE hot_test SET non_indexed_col = 'row2_update1' WHERE id = 2;
+step s5_hot_update_row2_2: UPDATE hot_test SET non_indexed_col = 'row2_update2' WHERE id = 2;
+step s1_commit: COMMIT;
+step s5_commit: COMMIT;
+step s1_select: SELECT * FROM hot_test WHERE id = 1;
+id|indexed_col|non_indexed_col
+--+-----------+---------------
+ 1| 100|update2
+(1 row)
+
+step s5_select: SELECT * FROM hot_test WHERE id = 2;
+id|indexed_col|non_indexed_col
+--+-----------+---------------
+ 2| 200|row2_update2
+(1 row)
+
+step s1_verify_hot:
+ -- Check for HOT chain: LP_REDIRECT or tuple with t_ctid pointing to same page
+ SELECT COUNT(*) > 0 AS has_hot_chain
+ FROM heap_page_items(get_raw_page('hot_test', 0))
+ WHERE lp_flags = 2 -- LP_REDIRECT indicates HOT chain
+ OR (t_ctid IS NOT NULL
+ AND (t_ctid::text::point)[0]::int = 0 -- same page
+ AND t_ctid != ('(0,' || lp || ')')::tid); -- different offset
+
+has_hot_chain
+-------------
+t
+(1 row)
+
+step s5_verify_hot:
+ -- Check for HOT chain on page 0
+ SELECT COUNT(*) > 0 AS has_hot_chain
+ FROM heap_page_items(get_raw_page('hot_test', 0))
+ WHERE lp_flags = 2
+ OR (t_ctid IS NOT NULL
+ AND (t_ctid::text::point)[0]::int = 0
+ AND t_ctid != ('(0,' || lp || ')')::tid);
+
+has_hot_chain
+-------------
+t
+(1 row)
+
diff --git a/src/test/isolation/expected/hot_updates_concurrent.out b/src/test/isolation/expected/hot_updates_concurrent.out
new file mode 100644
index 00000000000..b1a8b0cb7b2
--- /dev/null
+++ b/src/test/isolation/expected/hot_updates_concurrent.out
@@ -0,0 +1,143 @@
+Parsed test spec with 4 sessions
+
+starting permutation: s1_begin s1_hot_update s2_begin s2_hot_update s1_commit s2_commit s1_select s2_select s2_verify_hot
+step s1_begin: BEGIN;
+step s1_hot_update: UPDATE hot_test SET non_indexed_col = 'updated_s1' WHERE id = 1;
+step s2_begin: BEGIN;
+step s2_hot_update: UPDATE hot_test SET non_indexed_col = 'updated_s2' WHERE id = 1; <waiting ...>
+step s1_commit: COMMIT;
+step s2_hot_update: <... completed>
+step s2_commit: COMMIT;
+step s1_select: SELECT * FROM hot_test WHERE id = 1;
+id|indexed_col|non_indexed_col
+--+-----------+---------------
+ 1| 100|updated_s2
+(1 row)
+
+step s2_select: SELECT * FROM hot_test WHERE id = 1;
+id|indexed_col|non_indexed_col
+--+-----------+---------------
+ 1| 100|updated_s2
+(1 row)
+
+step s2_verify_hot:
+ -- Check for HOT chain: look for LP_REDIRECT (lp_flags=2) or tuple with t_ctid pointing to same page
+ SELECT COUNT(*) > 0 AS has_hot_chain
+ FROM heap_page_items(get_raw_page('hot_test', 0))
+ WHERE lp_flags = 2 -- LP_REDIRECT indicates HOT chain
+ OR (t_ctid IS NOT NULL
+ AND (t_ctid::text::point)[0]::int = 0 -- same page
+ AND t_ctid != ('(0,' || lp || ')')::tid); -- different offset
+
+has_hot_chain
+-------------
+t
+(1 row)
+
+
+starting permutation: s1_begin s1_hot_update s3_begin s3_non_hot_update s1_commit s3_commit s3_select s3_verify_index
+step s1_begin: BEGIN;
+step s1_hot_update: UPDATE hot_test SET non_indexed_col = 'updated_s1' WHERE id = 1;
+step s3_begin: BEGIN;
+step s3_non_hot_update: UPDATE hot_test SET indexed_col = 150 WHERE id = 1; <waiting ...>
+step s1_commit: COMMIT;
+step s3_non_hot_update: <... completed>
+step s3_commit: COMMIT;
+step s3_select: SELECT * FROM hot_test WHERE id = 1;
+id|indexed_col|non_indexed_col
+--+-----------+---------------
+ 1| 150|updated_s1
+(1 row)
+
+step s3_verify_index:
+ -- Verify index was updated (proves non-HOT)
+ SELECT COUNT(*) = 1 AS index_updated FROM hot_test WHERE indexed_col = 150;
+ SELECT COUNT(*) = 0 AS old_value_gone FROM hot_test WHERE indexed_col = 100;
+
+index_updated
+-------------
+t
+(1 row)
+
+old_value_gone
+--------------
+t
+(1 row)
+
+
+starting permutation: s3_begin s3_non_hot_update s1_begin s1_hot_update s3_commit s1_commit s1_select s1_verify_hot
+step s3_begin: BEGIN;
+step s3_non_hot_update: UPDATE hot_test SET indexed_col = 150 WHERE id = 1;
+step s1_begin: BEGIN;
+step s1_hot_update: UPDATE hot_test SET non_indexed_col = 'updated_s1' WHERE id = 1; <waiting ...>
+step s3_commit: COMMIT;
+step s1_hot_update: <... completed>
+step s1_commit: COMMIT;
+step s1_select: SELECT * FROM hot_test WHERE id = 1;
+id|indexed_col|non_indexed_col
+--+-----------+---------------
+ 1| 150|updated_s1
+(1 row)
+
+step s1_verify_hot:
+ -- Check for HOT chain: look for LP_REDIRECT (lp_flags=2) or tuple with t_ctid pointing to same page
+ SELECT COUNT(*) > 0 AS has_hot_chain
+ FROM heap_page_items(get_raw_page('hot_test', 0))
+ WHERE lp_flags = 2 -- LP_REDIRECT indicates HOT chain
+ OR (t_ctid IS NOT NULL
+ AND (t_ctid::text::point)[0]::int = 0 -- same page
+ AND t_ctid != ('(0,' || lp || ')')::tid); -- different offset
+
+has_hot_chain
+-------------
+t
+(1 row)
+
+
+starting permutation: s1_begin s1_hot_update s4_begin s4_hot_update_row2 s1_commit s4_commit s1_select s4_select s1_verify_hot s4_verify_hot
+step s1_begin: BEGIN;
+step s1_hot_update: UPDATE hot_test SET non_indexed_col = 'updated_s1' WHERE id = 1;
+step s4_begin: BEGIN;
+step s4_hot_update_row2: UPDATE hot_test SET non_indexed_col = 'updated_s4' WHERE id = 2;
+step s1_commit: COMMIT;
+step s4_commit: COMMIT;
+step s1_select: SELECT * FROM hot_test WHERE id = 1;
+id|indexed_col|non_indexed_col
+--+-----------+---------------
+ 1| 100|updated_s1
+(1 row)
+
+step s4_select: SELECT * FROM hot_test WHERE id = 2;
+id|indexed_col|non_indexed_col
+--+-----------+---------------
+ 2| 200|updated_s4
+(1 row)
+
+step s1_verify_hot:
+ -- Check for HOT chain: look for LP_REDIRECT (lp_flags=2) or tuple with t_ctid pointing to same page
+ SELECT COUNT(*) > 0 AS has_hot_chain
+ FROM heap_page_items(get_raw_page('hot_test', 0))
+ WHERE lp_flags = 2 -- LP_REDIRECT indicates HOT chain
+ OR (t_ctid IS NOT NULL
+ AND (t_ctid::text::point)[0]::int = 0 -- same page
+ AND t_ctid != ('(0,' || lp || ')')::tid); -- different offset
+
+has_hot_chain
+-------------
+t
+(1 row)
+
+step s4_verify_hot:
+ -- Check for HOT chain on page 0
+ SELECT COUNT(*) > 0 AS has_hot_chain
+ FROM heap_page_items(get_raw_page('hot_test', 0))
+ WHERE lp_flags = 2
+ OR (t_ctid IS NOT NULL
+ AND (t_ctid::text::point)[0]::int = 0
+ AND t_ctid != ('(0,' || lp || ')')::tid);
+
+has_hot_chain
+-------------
+t
+(1 row)
+
diff --git a/src/test/isolation/expected/hot_updates_index_scan.out b/src/test/isolation/expected/hot_updates_index_scan.out
new file mode 100644
index 00000000000..d72322b2146
--- /dev/null
+++ b/src/test/isolation/expected/hot_updates_index_scan.out
@@ -0,0 +1,126 @@
+Parsed test spec with 4 sessions
+
+starting permutation: s1_begin s1_hot_update s2_begin s2_index_scan s1_commit s2_commit
+step s1_begin: BEGIN;
+step s1_hot_update: UPDATE hot_test SET non_indexed_col = 'hot_updated' WHERE id = 50;
+step s2_begin: BEGIN;
+step s2_index_scan: SELECT * FROM hot_test WHERE indexed_col = 500;
+id|indexed_col|non_indexed_col
+--+-----------+---------------
+50| 500|initial50
+(1 row)
+
+step s1_commit: COMMIT;
+step s2_commit: COMMIT;
+
+starting permutation: s1_begin s1_non_hot_update s1_commit s2_begin s2_index_scan_new s2_commit s2_verify_index
+step s1_begin: BEGIN;
+step s1_non_hot_update: UPDATE hot_test SET indexed_col = 555 WHERE id = 50;
+step s1_commit: COMMIT;
+step s2_begin: BEGIN;
+step s2_index_scan_new: SELECT * FROM hot_test WHERE indexed_col = 555;
+id|indexed_col|non_indexed_col
+--+-----------+---------------
+50| 555|initial50
+(1 row)
+
+step s2_commit: COMMIT;
+step s2_verify_index:
+ -- After non-HOT update, verify index reflects the change
+ SELECT COUNT(*) = 1 AS found_new_value FROM hot_test WHERE indexed_col = 555;
+ SELECT COUNT(*) = 0 AS old_value_gone FROM hot_test WHERE indexed_col = 500;
+
+found_new_value
+---------------
+t
+(1 row)
+
+old_value_gone
+--------------
+t
+(1 row)
+
+
+starting permutation: s3_begin s3_select_for_update s1_begin s1_hot_update s3_commit s1_commit s1_verify_hot
+step s3_begin: BEGIN;
+step s3_select_for_update: SELECT * FROM hot_test WHERE id = 50 FOR UPDATE;
+id|indexed_col|non_indexed_col
+--+-----------+---------------
+50| 500|initial50
+(1 row)
+
+step s1_begin: BEGIN;
+step s1_hot_update: UPDATE hot_test SET non_indexed_col = 'hot_updated' WHERE id = 50; <waiting ...>
+step s3_commit: COMMIT;
+step s1_hot_update: <... completed>
+step s1_commit: COMMIT;
+step s1_verify_hot:
+ -- Verify HOT chain exists for row with id=50
+ SELECT EXISTS (
+ SELECT 1 FROM heap_page_items(get_raw_page('hot_test', 0))
+ WHERE lp_flags = 2
+ OR (t_ctid IS NOT NULL
+ AND (t_ctid::text::point)[0]::int = 0
+ AND t_ctid != ('(0,' || lp || ')')::tid)
+ ) AS has_hot_chain;
+
+has_hot_chain
+-------------
+t
+(1 row)
+
+
+starting permutation: s1_begin s1_hot_update s3_begin s3_select_for_update s1_commit s3_commit
+step s1_begin: BEGIN;
+step s1_hot_update: UPDATE hot_test SET non_indexed_col = 'hot_updated' WHERE id = 50;
+step s3_begin: BEGIN;
+step s3_select_for_update: SELECT * FROM hot_test WHERE id = 50 FOR UPDATE; <waiting ...>
+step s1_commit: COMMIT;
+step s3_select_for_update: <... completed>
+id|indexed_col|non_indexed_col
+--+-----------+---------------
+50| 500|hot_updated
+(1 row)
+
+step s3_commit: COMMIT;
+
+starting permutation: s4_begin s4_select_for_key_share s1_begin s1_hot_update s4_commit s1_commit s1_verify_hot
+step s4_begin: BEGIN;
+step s4_select_for_key_share: SELECT * FROM hot_test WHERE id = 50 FOR KEY SHARE;
+id|indexed_col|non_indexed_col
+--+-----------+---------------
+50| 500|initial50
+(1 row)
+
+step s1_begin: BEGIN;
+step s1_hot_update: UPDATE hot_test SET non_indexed_col = 'hot_updated' WHERE id = 50;
+step s4_commit: COMMIT;
+step s1_commit: COMMIT;
+step s1_verify_hot:
+ -- Verify HOT chain exists for row with id=50
+ SELECT EXISTS (
+ SELECT 1 FROM heap_page_items(get_raw_page('hot_test', 0))
+ WHERE lp_flags = 2
+ OR (t_ctid IS NOT NULL
+ AND (t_ctid::text::point)[0]::int = 0
+ AND t_ctid != ('(0,' || lp || ')')::tid)
+ ) AS has_hot_chain;
+
+has_hot_chain
+-------------
+t
+(1 row)
+
+
+starting permutation: s4_begin s4_select_for_key_share s1_begin s1_non_hot_update s4_commit s1_commit
+step s4_begin: BEGIN;
+step s4_select_for_key_share: SELECT * FROM hot_test WHERE id = 50 FOR KEY SHARE;
+id|indexed_col|non_indexed_col
+--+-----------+---------------
+50| 500|initial50
+(1 row)
+
+step s1_begin: BEGIN;
+step s1_non_hot_update: UPDATE hot_test SET indexed_col = 555 WHERE id = 50;
+step s4_commit: COMMIT;
+step s1_commit: COMMIT;
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 4e466580cd4..46525b0a62a 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -19,6 +19,9 @@ test: multiple-row-versions
test: index-only-scan
test: index-only-bitmapscan
test: predicate-lock-hot-tuple
+test: hot_updates_concurrent
+test: hot_updates_index_scan
+test: hot_updates_chain
test: update-conflict-out
test: deadlock-simple
test: deadlock-hard
diff --git a/src/test/isolation/specs/hot_updates_chain.spec b/src/test/isolation/specs/hot_updates_chain.spec
new file mode 100644
index 00000000000..85cd2176133
--- /dev/null
+++ b/src/test/isolation/specs/hot_updates_chain.spec
@@ -0,0 +1,110 @@
+# Test HOT update chains and their interaction with VACUUM and page pruning
+#
+# This test verifies that HOT update chains are correctly maintained when
+# multiple HOT updates occur on the same row, and that VACUUM correctly
+# handles HOT chains.
+
+setup
+{
+ CREATE EXTENSION IF NOT EXISTS pageinspect;
+
+ CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ indexed_col int,
+ non_indexed_col text
+ );
+
+ CREATE INDEX hot_test_indexed_idx ON hot_test(indexed_col);
+
+ INSERT INTO hot_test VALUES (1, 100, 'initial');
+ INSERT INTO hot_test VALUES (2, 200, 'initial');
+}
+
+teardown
+{
+ DROP TABLE hot_test;
+ DROP EXTENSION pageinspect;
+}
+
+# Session 1: Create HOT chain with multiple updates
+session s1
+step s1_begin { BEGIN; }
+step s1_hot_update1 { UPDATE hot_test SET non_indexed_col = 'update1' WHERE id = 1; }
+step s1_hot_update2 { UPDATE hot_test SET non_indexed_col = 'update2' WHERE id = 1; }
+step s1_hot_update3 { UPDATE hot_test SET non_indexed_col = 'update3' WHERE id = 1; }
+step s1_commit { COMMIT; }
+step s1_select { SELECT * FROM hot_test WHERE id = 1; }
+step s1_verify_hot {
+ -- Check for HOT chain: LP_REDIRECT or tuple with t_ctid pointing to same page
+ SELECT COUNT(*) > 0 AS has_hot_chain
+ FROM heap_page_items(get_raw_page('hot_test', 0))
+ WHERE lp_flags = 2 -- LP_REDIRECT indicates HOT chain
+ OR (t_ctid IS NOT NULL
+ AND (t_ctid::text::point)[0]::int = 0 -- same page
+ AND t_ctid != ('(0,' || lp || ')')::tid); -- different offset
+}
+
+# Session 2: Read while HOT chain is being built
+session s2
+step s2_begin { BEGIN ISOLATION LEVEL REPEATABLE READ; }
+step s2_select_before { SELECT non_indexed_col FROM hot_test WHERE id = 1; }
+step s2_select_after { SELECT non_indexed_col FROM hot_test WHERE id = 1; }
+step s2_commit { COMMIT; }
+
+# Session 3: Break HOT chain with non-HOT update
+session s3
+step s3_begin { BEGIN; }
+step s3_non_hot_update { UPDATE hot_test SET indexed_col = 150 WHERE id = 1; }
+step s3_commit { COMMIT; }
+
+# Session 4: Try to build HOT chain after non-HOT update
+session s4
+step s4_begin { BEGIN; }
+step s4_hot_after_non_hot { UPDATE hot_test SET non_indexed_col = 'after_non_hot' WHERE id = 1; }
+step s4_commit { COMMIT; }
+step s4_select { SELECT * FROM hot_test WHERE id = 1; }
+step s4_verify_hot {
+ -- Check for new HOT chain after non-HOT update broke the previous chain
+ SELECT COUNT(*) > 0 AS has_hot_chain
+ FROM heap_page_items(get_raw_page('hot_test', 0))
+ WHERE lp_flags = 2
+ OR (t_ctid IS NOT NULL
+ AND (t_ctid::text::point)[0]::int = 0
+ AND t_ctid != ('(0,' || lp || ')')::tid);
+}
+
+# Session 5: Multiple sessions building separate HOT chains on different rows
+session s5
+step s5_begin { BEGIN; }
+step s5_hot_update_row2_1 { UPDATE hot_test SET non_indexed_col = 'row2_update1' WHERE id = 2; }
+step s5_hot_update_row2_2 { UPDATE hot_test SET non_indexed_col = 'row2_update2' WHERE id = 2; }
+step s5_commit { COMMIT; }
+step s5_select { SELECT * FROM hot_test WHERE id = 2; }
+step s5_verify_hot {
+ -- Check for HOT chain on page 0
+ SELECT COUNT(*) > 0 AS has_hot_chain
+ FROM heap_page_items(get_raw_page('hot_test', 0))
+ WHERE lp_flags = 2
+ OR (t_ctid IS NOT NULL
+ AND (t_ctid::text::point)[0]::int = 0
+ AND t_ctid != ('(0,' || lp || ')')::tid);
+}
+
+# Build HOT chain within single transaction
+# All updates should form a HOT chain
+permutation s1_begin s1_hot_update1 s1_hot_update2 s1_hot_update3 s1_commit s1_select s1_verify_hot
+
+# REPEATABLE READ should see consistent snapshot across HOT chain updates
+# Session 2 starts before updates, should see 'initial' throughout
+permutation s2_begin s2_select_before s1_begin s1_hot_update1 s1_hot_update2 s1_commit s2_select_after s2_commit
+
+# HOT chain followed by non-HOT update
+# Non-HOT update breaks the HOT chain
+permutation s1_begin s1_hot_update1 s1_hot_update2 s1_commit s3_begin s3_non_hot_update s3_commit s1_select
+
+# HOT update after non-HOT update can start new HOT chain
+# After breaking chain with indexed column update, new HOT updates can start fresh chain
+permutation s1_begin s1_hot_update1 s1_commit s3_begin s3_non_hot_update s3_commit s4_begin s4_hot_after_non_hot s4_commit s4_select s4_verify_hot
+
+# Multiple sessions building separate HOT chains on different rows
+permutation s1_begin s1_hot_update1 s1_hot_update2 s5_begin s5_hot_update_row2_1 s5_hot_update_row2_2 s1_commit s5_commit s1_select s5_select s1_verify_hot s5_verify_hot
diff --git a/src/test/isolation/specs/hot_updates_concurrent.spec b/src/test/isolation/specs/hot_updates_concurrent.spec
new file mode 100644
index 00000000000..eac78d62ac5
--- /dev/null
+++ b/src/test/isolation/specs/hot_updates_concurrent.spec
@@ -0,0 +1,107 @@
+# Test concurrent HOT updates and validate HOT chains
+#
+# This test verifies that HOT updates work correctly when multiple sessions
+# are updating the same table concurrently, and validates that HOT chains
+# are actually created using heap_page_items().
+
+setup
+{
+ CREATE EXTENSION IF NOT EXISTS pageinspect;
+
+ CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ indexed_col int,
+ non_indexed_col text
+ );
+
+ CREATE INDEX hot_test_indexed_idx ON hot_test(indexed_col);
+
+ INSERT INTO hot_test VALUES (1, 100, 'initial1');
+ INSERT INTO hot_test VALUES (2, 200, 'initial2');
+ INSERT INTO hot_test VALUES (3, 300, 'initial3');
+}
+
+teardown
+{
+ DROP TABLE hot_test;
+ DROP EXTENSION pageinspect;
+}
+
+# Session 1: HOT update (modify non-indexed column)
+session s1
+step s1_begin { BEGIN; }
+step s1_hot_update { UPDATE hot_test SET non_indexed_col = 'updated_s1' WHERE id = 1; }
+step s1_commit { COMMIT; }
+step s1_select { SELECT * FROM hot_test WHERE id = 1; }
+step s1_verify_hot {
+ -- Check for HOT chain: look for LP_REDIRECT (lp_flags=2) or tuple with t_ctid pointing to same page
+ SELECT COUNT(*) > 0 AS has_hot_chain
+ FROM heap_page_items(get_raw_page('hot_test', 0))
+ WHERE lp_flags = 2 -- LP_REDIRECT indicates HOT chain
+ OR (t_ctid IS NOT NULL
+ AND (t_ctid::text::point)[0]::int = 0 -- same page
+ AND t_ctid != ('(0,' || lp || ')')::tid); -- different offset
+}
+
+# Session 2: HOT update (modify non-indexed column on same row)
+session s2
+step s2_begin { BEGIN; }
+step s2_hot_update { UPDATE hot_test SET non_indexed_col = 'updated_s2' WHERE id = 1; }
+step s2_commit { COMMIT; }
+step s2_select { SELECT * FROM hot_test WHERE id = 1; }
+step s2_verify_hot {
+ -- Check for HOT chain: look for LP_REDIRECT (lp_flags=2) or tuple with t_ctid pointing to same page
+ SELECT COUNT(*) > 0 AS has_hot_chain
+ FROM heap_page_items(get_raw_page('hot_test', 0))
+ WHERE lp_flags = 2 -- LP_REDIRECT indicates HOT chain
+ OR (t_ctid IS NOT NULL
+ AND (t_ctid::text::point)[0]::int = 0 -- same page
+ AND t_ctid != ('(0,' || lp || ')')::tid); -- different offset
+}
+
+# Session 3: Non-HOT update (modify indexed column)
+session s3
+step s3_begin { BEGIN; }
+step s3_non_hot_update { UPDATE hot_test SET indexed_col = 150 WHERE id = 1; }
+step s3_commit { COMMIT; }
+step s3_select { SELECT * FROM hot_test WHERE id = 1; }
+step s3_verify_index {
+ -- Verify index was updated (proves non-HOT)
+ SELECT COUNT(*) = 1 AS index_updated FROM hot_test WHERE indexed_col = 150;
+ SELECT COUNT(*) = 0 AS old_value_gone FROM hot_test WHERE indexed_col = 100;
+}
+
+# Session 4: Concurrent HOT updates on different rows
+session s4
+step s4_begin { BEGIN; }
+step s4_hot_update_row2 { UPDATE hot_test SET non_indexed_col = 'updated_s4' WHERE id = 2; }
+step s4_commit { COMMIT; }
+step s4_select { SELECT * FROM hot_test WHERE id = 2; }
+step s4_verify_hot {
+ -- Check for HOT chain on page 0
+ SELECT COUNT(*) > 0 AS has_hot_chain
+ FROM heap_page_items(get_raw_page('hot_test', 0))
+ WHERE lp_flags = 2
+ OR (t_ctid IS NOT NULL
+ AND (t_ctid::text::point)[0]::int = 0
+ AND t_ctid != ('(0,' || lp || ')')::tid);
+}
+
+# Two sessions both doing HOT updates on same row
+# Second session should block until first commits
+# Both should create HOT chains
+permutation s1_begin s1_hot_update s2_begin s2_hot_update s1_commit s2_commit s1_select s2_select s2_verify_hot
+
+# HOT update followed by non-HOT update
+# Non-HOT update should wait for HOT update to commit
+# First update is HOT, second is non-HOT (index updated)
+permutation s1_begin s1_hot_update s3_begin s3_non_hot_update s1_commit s3_commit s3_select s3_verify_index
+
+# Non-HOT update followed by HOT update
+# HOT update should wait for non-HOT update to commit
+# First update is non-HOT (index), second is HOT
+permutation s3_begin s3_non_hot_update s1_begin s1_hot_update s3_commit s1_commit s1_select s1_verify_hot
+
+# Concurrent HOT updates on different rows (should not block)
+# Both sessions should be able to create HOT chains independently
+permutation s1_begin s1_hot_update s4_begin s4_hot_update_row2 s1_commit s4_commit s1_select s4_select s1_verify_hot s4_verify_hot
diff --git a/src/test/isolation/specs/hot_updates_index_scan.spec b/src/test/isolation/specs/hot_updates_index_scan.spec
new file mode 100644
index 00000000000..39db07cc80f
--- /dev/null
+++ b/src/test/isolation/specs/hot_updates_index_scan.spec
@@ -0,0 +1,91 @@
+# Test HOT updates interaction with index scans and SELECT FOR UPDATE
+#
+# This test verifies that HOT updates are correctly handled when concurrent
+# sessions are performing index scans, using SELECT FOR UPDATE, and validates
+# HOT chains using heap_page_items().
+
+setup
+{
+ CREATE EXTENSION IF NOT EXISTS pageinspect;
+
+ CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ indexed_col int,
+ non_indexed_col text
+ );
+
+ CREATE INDEX hot_test_indexed_idx ON hot_test(indexed_col);
+
+ INSERT INTO hot_test SELECT i, i * 10, 'initial' || i FROM generate_series(1, 100) i;
+}
+
+teardown
+{
+ DROP TABLE hot_test;
+ DROP EXTENSION pageinspect;
+}
+
+# Session 1: Perform HOT update
+session s1
+step s1_begin { BEGIN; }
+step s1_hot_update { UPDATE hot_test SET non_indexed_col = 'hot_updated' WHERE id = 50; }
+step s1_non_hot_update { UPDATE hot_test SET indexed_col = 555 WHERE id = 50; }
+step s1_commit { COMMIT; }
+step s1_verify_hot {
+ -- Verify HOT chain exists for row with id=50
+ SELECT EXISTS (
+ SELECT 1 FROM heap_page_items(get_raw_page('hot_test', 0))
+ WHERE lp_flags = 2
+ OR (t_ctid IS NOT NULL
+ AND (t_ctid::text::point)[0]::int = 0
+ AND t_ctid != ('(0,' || lp || ')')::tid)
+ ) AS has_hot_chain;
+}
+
+# Session 2: Index scan while HOT update in progress
+session s2
+step s2_begin { BEGIN; }
+step s2_index_scan { SELECT * FROM hot_test WHERE indexed_col = 500; }
+step s2_index_scan_new { SELECT * FROM hot_test WHERE indexed_col = 555; }
+step s2_commit { COMMIT; }
+step s2_verify_index {
+ -- After non-HOT update, verify index reflects the change
+ SELECT COUNT(*) = 1 AS found_new_value FROM hot_test WHERE indexed_col = 555;
+ SELECT COUNT(*) = 0 AS old_value_gone FROM hot_test WHERE indexed_col = 500;
+}
+
+# Session 3: SELECT FOR UPDATE
+session s3
+step s3_begin { BEGIN; }
+step s3_select_for_update { SELECT * FROM hot_test WHERE id = 50 FOR UPDATE; }
+step s3_commit { COMMIT; }
+
+# Session 4: SELECT FOR KEY SHARE (should not block HOT update of non-key column)
+session s4
+step s4_begin { BEGIN; }
+step s4_select_for_key_share { SELECT * FROM hot_test WHERE id = 50 FOR KEY SHARE; }
+step s4_commit { COMMIT; }
+
+# Index scan should see consistent snapshot during HOT update
+# Index scan starts before HOT update commits
+permutation s1_begin s1_hot_update s2_begin s2_index_scan s1_commit s2_commit
+
+# Index scan after non-HOT update should see new index entry
+# Index scan starts after non-HOT update commits
+permutation s1_begin s1_non_hot_update s1_commit s2_begin s2_index_scan_new s2_commit s2_verify_index
+
+# SELECT FOR UPDATE blocks HOT update
+# FOR UPDATE should block the UPDATE until SELECT commits
+permutation s3_begin s3_select_for_update s1_begin s1_hot_update s3_commit s1_commit s1_verify_hot
+
+# HOT update blocks SELECT FOR UPDATE
+# SELECT FOR UPDATE should wait for HOT update to commit
+permutation s1_begin s1_hot_update s3_begin s3_select_for_update s1_commit s3_commit
+
+# SELECT FOR KEY SHARE should not block HOT update (non-key column)
+# HOT update of non-indexed column should not conflict with FOR KEY SHARE
+permutation s4_begin s4_select_for_key_share s1_begin s1_hot_update s4_commit s1_commit s1_verify_hot
+
+# Non-HOT update (key column) should block after FOR KEY SHARE
+# Non-HOT update of indexed column should wait for FOR KEY SHARE
+permutation s4_begin s4_select_for_key_share s1_begin s1_non_hot_update s4_commit s1_commit
diff --git a/src/test/regress/expected/hot_updates.out b/src/test/regress/expected/hot_updates.out
new file mode 100644
index 00000000000..04fb86755db
--- /dev/null
+++ b/src/test/regress/expected/hot_updates.out
@@ -0,0 +1,725 @@
+--
+-- HOT_UPDATES
+-- Test Heap-Only Tuple (HOT) update decisions
+--
+-- This test systematically verifies that HOT updates are used when appropriate
+-- and avoided when necessary (e.g., when indexed columns are modified).
+--
+-- We use multiple validation methods:
+-- 1. Index verification (index still works = proves no index update for HOT)
+-- 2. Statistics functions (pg_stat_get_tuples_hot_updated)
+-- 3. pageinspect extension for HOT chain examination
+--
+-- Load required extensions
+CREATE EXTENSION IF NOT EXISTS pageinspect;
+-- Clean up from prior runs
+DROP TABLE IF EXISTS hot_test CASCADE;
+NOTICE: table "hot_test" does not exist, skipping
+-- Function to get HOT update count
+CREATE OR REPLACE FUNCTION get_hot_count(rel_name text)
+RETURNS TABLE (
+ updates BIGINT,
+ hot BIGINT
+) AS $$
+DECLARE
+ rel_oid oid;
+BEGIN
+ rel_oid := rel_name::regclass::oid;
+ PERFORM pg_stat_force_next_flush();
+
+ updates := COALESCE(pg_stat_get_tuples_updated(rel_oid), 0) +
+ COALESCE(pg_stat_get_xact_tuples_updated(rel_oid), 0);
+ hot := COALESCE(pg_stat_get_tuples_hot_updated(rel_oid), 0) +
+ COALESCE(pg_stat_get_xact_tuples_hot_updated(rel_oid), 0);
+
+ RETURN NEXT;
+END;
+$$ LANGUAGE plpgsql;
+-- Check if a tuple is part of a HOT chain (has a predecessor on same page)
+CREATE OR REPLACE FUNCTION has_hot_chain(rel_name text, target_ctid tid)
+RETURNS boolean AS $$
+DECLARE
+ block_num int;
+ page_item record;
+BEGIN
+ block_num := (target_ctid::text::point)[0]::int;
+
+ -- Look for a different tuple on the same page that points to our target tuple
+ FOR page_item IN
+ SELECT lp, lp_flags, t_ctid
+ FROM heap_page_items(get_raw_page(rel_name, block_num))
+ WHERE lp_flags = 1
+ AND t_ctid IS NOT NULL
+ AND t_ctid = target_ctid
+ AND ('(' || block_num::text || ',' || lp::text || ')')::tid != target_ctid
+ LOOP
+ RETURN true;
+ END LOOP;
+
+ RETURN false;
+END;
+$$ LANGUAGE plpgsql;
+-- Print the HOT chain starting from a given tuple
+CREATE OR REPLACE FUNCTION print_hot_chain(rel_name text, start_ctid tid)
+RETURNS TABLE(chain_position int, ctid tid, lp_flags text, t_ctid tid, chain_end boolean) AS
+$$
+#variable_conflict use_column
+DECLARE
+ block_num int;
+ line_ptr int;
+ current_ctid tid := start_ctid;
+ next_ctid tid;
+ position int := 0;
+ max_iterations int := 100;
+ page_item record;
+ found_predecessor boolean := false;
+ flags_name text;
+BEGIN
+ block_num := (start_ctid::text::point)[0]::int;
+
+ -- Find the predecessor (old tuple pointing to our start_ctid)
+ FOR page_item IN
+ SELECT lp, lp_flags, t_ctid
+ FROM heap_page_items(get_raw_page(rel_name, block_num))
+ WHERE lp_flags = 1
+ AND t_ctid = start_ctid
+ LOOP
+ current_ctid := ('(' || block_num::text || ',' || page_item.lp::text || ')')::tid;
+ found_predecessor := true;
+ EXIT;
+ END LOOP;
+
+ -- If no predecessor found, start with the given ctid
+ IF NOT found_predecessor THEN
+ current_ctid := start_ctid;
+ END IF;
+
+ -- Follow the chain forward
+ WHILE position < max_iterations LOOP
+ line_ptr := (current_ctid::text::point)[1]::int;
+
+ FOR page_item IN
+ SELECT lp, lp_flags, t_ctid
+ FROM heap_page_items(get_raw_page(rel_name, block_num))
+ WHERE lp = line_ptr
+ LOOP
+ -- Map lp_flags to names
+ flags_name := CASE page_item.lp_flags
+ WHEN 0 THEN 'unused (0)'
+ WHEN 1 THEN 'normal (1)'
+ WHEN 2 THEN 'redirect (2)'
+ WHEN 3 THEN 'dead (3)'
+ ELSE 'unknown (' || page_item.lp_flags::text || ')'
+ END;
+
+ RETURN QUERY SELECT
+ position,
+ current_ctid,
+ flags_name,
+ page_item.t_ctid,
+ (page_item.t_ctid IS NULL OR page_item.t_ctid = current_ctid)::boolean
+ ;
+
+ IF page_item.t_ctid IS NULL OR page_item.t_ctid = current_ctid THEN
+ RETURN;
+ END IF;
+
+ next_ctid := page_item.t_ctid;
+
+ IF (next_ctid::text::point)[0]::int != block_num THEN
+ RETURN;
+ END IF;
+
+ current_ctid := next_ctid;
+ position := position + 1;
+ END LOOP;
+
+ IF position = 0 THEN
+ RETURN;
+ END IF;
+ END LOOP;
+END;
+$$ LANGUAGE plpgsql;
+-- Trigger page pruning via table scan
+CREATE OR REPLACE FUNCTION heap_prune_page(rel_name text, target_ctid tid)
+RETURNS void AS $$
+DECLARE
+ block_num int;
+BEGIN
+ -- Extract block number from ctid
+ block_num := (target_ctid::text::point)[0]::int;
+
+ -- Scan only the specific page to trigger pruning on that page
+ EXECUTE 'SELECT COUNT(*) FROM ' || quote_ident(rel_name) ||
+ ' WHERE ctid >= (' || block_num || ',0) AND ctid < (' || (block_num + 1) || ',0)';
+END;
+$$ LANGUAGE plpgsql;
+-- Basic HOT update (update non-indexed column)
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ indexed_col int,
+ non_indexed_col text
+) USING heap;
+CREATE INDEX hot_test_indexed_idx ON hot_test(indexed_col);
+INSERT INTO hot_test VALUES (1, 100, 'initial');
+INSERT INTO hot_test VALUES (2, 200, 'initial');
+INSERT INTO hot_test VALUES (3, 300, 'initial');
+-- Get baseline and initial ctid
+WITH initial_state AS (
+ SELECT ctid FROM hot_test WHERE id = 1
+)
+SELECT
+ 'Initial State' AS phase,
+ initial_state.ctid,
+ (get_hot_count('hot_test')).updates,
+ (get_hot_count('hot_test')).hot
+FROM initial_state;
+ phase | ctid | updates | hot
+---------------+-------+---------+-----
+ Initial State | (0,1) | 0 | 0
+(1 row)
+
+-- Should be HOT updates (only non-indexed column modified)
+UPDATE hot_test SET non_indexed_col = 'updated1' WHERE id = 1;
+UPDATE hot_test SET non_indexed_col = 'updated2' WHERE id = 2;
+UPDATE hot_test SET non_indexed_col = 'updated3' WHERE id = 3;
+-- Verify HOT updates occurred
+SELECT
+ 'After Updates' AS phase,
+ (get_hot_count('hot_test')).updates,
+ (get_hot_count('hot_test')).hot;
+ phase | updates | hot
+---------------+---------+-----
+ After Updates | 3 | 3
+(1 row)
+
+-- Dump the HOT chain before pruning
+WITH current_tuple AS (
+ SELECT ctid FROM hot_test WHERE id = 1
+)
+SELECT
+ 'Before VACUUM' AS phase,
+ has_hot_chain('hot_test', current_tuple.ctid) AS has_chain,
+ chain_position,
+ print_hot_chain.ctid,
+ lp_flags,
+ t_ctid
+FROM current_tuple,
+LATERAL print_hot_chain('hot_test', current_tuple.ctid);
+ phase | has_chain | chain_position | ctid | lp_flags | t_ctid
+---------------+-----------+----------------+-------+------------+--------
+ Before VACUUM | t | 0 | (0,1) | normal (1) | (0,4)
+ Before VACUUM | t | 1 | (0,4) | normal (1) | (0,4)
+(2 rows)
+
+SET SESSION enable_seqscan = OFF;
+SET SESSION enable_bitmapscan = OFF;
+-- Verify indexes still work
+EXPLAIN (COSTS OFF) SELECT id, indexed_col FROM hot_test WHERE indexed_col = 100;
+ QUERY PLAN
+---------------------------------------------------
+ Index Scan using hot_test_indexed_idx on hot_test
+ Index Cond: (indexed_col = 100)
+(2 rows)
+
+SELECT id, indexed_col FROM hot_test WHERE indexed_col = 100;
+ id | indexed_col
+----+-------------
+ 1 | 100
+(1 row)
+
+EXPLAIN (COSTS OFF) SELECT id, indexed_col FROM hot_test WHERE indexed_col = 200;
+ QUERY PLAN
+---------------------------------------------------
+ Index Scan using hot_test_indexed_idx on hot_test
+ Index Cond: (indexed_col = 200)
+(2 rows)
+
+SELECT id, indexed_col FROM hot_test WHERE indexed_col = 200;
+ id | indexed_col
+----+-------------
+ 2 | 200
+(1 row)
+
+-- Vacuum the relation, expect the HOT chain to collapse
+VACUUM hot_test;
+-- Show that there is no chain after vacuum
+WITH current_tuple AS (
+ SELECT ctid FROM hot_test WHERE id = 1
+)
+SELECT
+ 'After VACUUM' AS phase,
+ has_hot_chain('hot_test', current_tuple.ctid) AS has_chain,
+ chain_position,
+ print_hot_chain.ctid,
+ lp_flags,
+ t_ctid
+FROM current_tuple,
+LATERAL print_hot_chain('hot_test', current_tuple.ctid);
+ phase | has_chain | chain_position | ctid | lp_flags | t_ctid
+--------------+-----------+----------------+-------+------------+--------
+ After VACUUM | f | 0 | (0,4) | normal (1) | (0,4)
+(1 row)
+
+-- Non-HOT update (update indexed column)
+UPDATE hot_test SET indexed_col = 150 WHERE id = 1;
+SELECT get_hot_count('hot_test');
+ get_hot_count
+---------------
+ (4,3)
+(1 row)
+
+-- Verify index was updated (new value findable)
+EXPLAIN (COSTS OFF) SELECT id, indexed_col FROM hot_test WHERE indexed_col = 150;
+ QUERY PLAN
+---------------------------------------------------
+ Index Scan using hot_test_indexed_idx on hot_test
+ Index Cond: (indexed_col = 150)
+(2 rows)
+
+SELECT id, indexed_col FROM hot_test WHERE indexed_col = 150;
+ id | indexed_col
+----+-------------
+ 1 | 150
+(1 row)
+
+-- Verify old value no longer in index
+EXPLAIN (COSTS OFF) SELECT id FROM hot_test WHERE indexed_col = 100;
+ QUERY PLAN
+---------------------------------------------------
+ Index Scan using hot_test_indexed_idx on hot_test
+ Index Cond: (indexed_col = 100)
+(2 rows)
+
+SELECT id FROM hot_test WHERE indexed_col = 100;
+ id
+----
+(0 rows)
+
+SET SESSION enable_seqscan = ON;
+SET SESSION enable_bitmapscan = ON;
+-- All-or-none property: updating one indexed column requires ALL index updates
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ col_a int,
+ col_b int,
+ col_c int,
+ non_indexed text
+) USING heap;
+CREATE INDEX hot_test_a_idx ON hot_test(col_a);
+CREATE INDEX hot_test_b_idx ON hot_test(col_b);
+CREATE INDEX hot_test_c_idx ON hot_test(col_c);
+INSERT INTO hot_test VALUES (1, 10, 20, 30, 'initial');
+-- Update only col_a - should NOT be HOT because an indexed column changed
+-- This means ALL indexes must be updated (all-or-none property)
+UPDATE hot_test SET col_a = 15 WHERE id = 1;
+SELECT get_hot_count('hot_test');
+ get_hot_count
+---------------
+ (1,0)
+(1 row)
+
+-- Verify all three indexes still work correctly
+SELECT id, col_a FROM hot_test WHERE col_a = 15; -- updated index
+ id | col_a
+----+-------
+ 1 | 15
+(1 row)
+
+SELECT id, col_b FROM hot_test WHERE col_b = 20; -- unchanged index
+ id | col_b
+----+-------
+ 1 | 20
+(1 row)
+
+SELECT id, col_c FROM hot_test WHERE col_c = 30; -- unchanged index
+ id | col_c
+----+-------
+ 1 | 30
+(1 row)
+
+-- Now update only non-indexed column - should be HOT
+UPDATE hot_test SET non_indexed = 'updated';
+SELECT get_hot_count('hot_test');
+ get_hot_count
+---------------
+ (2,1)
+(1 row)
+
+-- Verify all indexes still work
+SELECT id FROM hot_test WHERE col_a = 15 AND col_b = 20 AND col_c = 30;
+ id
+----
+ 1
+(1 row)
+
+-- Partial index: both old and new outside predicate (conservative = non-HOT)
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ status text,
+ data text
+) USING heap;
+-- Partial index only covers status = 'active'
+CREATE INDEX hot_test_active_idx ON hot_test(status) WHERE status = 'active';
+INSERT INTO hot_test VALUES (1, 'active', 'data1');
+INSERT INTO hot_test VALUES (2, 'inactive', 'data2');
+INSERT INTO hot_test VALUES (3, 'deleted', 'data3');
+-- Update non-indexed column on 'active' row (in predicate, status unchanged)
+-- Should be HOT
+UPDATE hot_test SET data = 'updated1' WHERE id = 1;
+SELECT get_hot_count('hot_test');
+ get_hot_count
+---------------
+ (1,1)
+(1 row)
+
+-- Update non-indexed column on 'inactive' row (outside predicate)
+-- Should be HOT
+UPDATE hot_test SET data = 'updated2' WHERE id = 2;
+SELECT get_hot_count('hot_test');
+ get_hot_count
+---------------
+ (2,2)
+(1 row)
+
+-- Update status from 'inactive' to 'deleted' (both outside predicate)
+-- PostgreSQL is conservative: heap insert happens before predicate check
+-- So this is NON-HOT even though both values are outside predicate
+UPDATE hot_test SET status = 'deleted' WHERE id = 2;
+SELECT get_hot_count('hot_test');
+ get_hot_count
+---------------
+ (3,2)
+(1 row)
+
+-- Verify index still works for 'active' rows
+SELECT id, status FROM hot_test WHERE status = 'active';
+ id | status
+----+--------
+ 1 | active
+(1 row)
+
+-- Expression index with JSONB
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ metadata jsonb
+) USING heap;
+-- Index on JSONB expression
+CREATE INDEX hot_test_user_id_idx ON hot_test((metadata->>'user_id'));
+CREATE INDEX hot_test_status_idx ON hot_test((metadata->>'status'));
+INSERT INTO hot_test VALUES (1, '{"user_id": "123", "status": "active"}'::jsonb);
+-- Update JSONB field used in expression index to the same value,
+-- this will be HOT because the entire JSONB field is observed to
+-- be unchanged.
+UPDATE hot_test SET metadata = jsonb_set(metadata, '{user_id}', '"123"')
+WHERE id = 1;
+SELECT get_hot_count('hot_test');
+ get_hot_count
+---------------
+ (1,1)
+(1 row)
+
+-- Update JSONB field that is no used in any index to some new value, this
+-- will prevent a HOT update despite not changing what is used when forming
+-- the index key, this is counter intuitive and causes index bloat as well
+-- as slows down updates on JSONB data as any change will trigger all
+-- indexes to be updated.
+UPDATE hot_test SET metadata = jsonb_set(metadata, '{food}', '"apple"')
+WHERE id = 1;
+SELECT get_hot_count('hot_test');
+ get_hot_count
+---------------
+ (2,1)
+(1 row)
+
+-- Use a few different methods for mutating JSONB data, but don't modify
+-- indexed portions of the document. None of these will be HOT.
+UPDATE hot_test SET metadata = jsonb_set(
+ jsonb_set(metadata, '{food}', '"pear"'),
+ '{timestamp}',
+ to_jsonb(now())
+)
+WHERE id = 1;
+SELECT get_hot_count('hot_test');
+ get_hot_count
+---------------
+ (3,1)
+(1 row)
+
+UPDATE hot_test
+SET metadata = metadata || '{"user_id": "123", "timestamp": "2024-01-01"}'::jsonb
+WHERE id = 1;
+SELECT get_hot_count('hot_test');
+ get_hot_count
+---------------
+ (4,1)
+(1 row)
+
+UPDATE hot_test SET metadata =
+ jsonb_set(
+ jsonb_set(metadata, '{user_id}', '"123"'),
+ '{fruit}',
+ '"plumb"'
+ );
+SELECT get_hot_count('hot_test');
+ get_hot_count
+---------------
+ (5,1)
+(1 row)
+
+UPDATE hot_test SET metadata = metadata || jsonb_build_object(
+ 'user_id', '123',
+ 'timestamp', now(),
+ 'fruit', 'honeydew'
+);
+SELECT get_hot_count('hot_test');
+ get_hot_count
+---------------
+ (6,1)
+(1 row)
+
+-- Only BRIN (summarizing) indexes on non-PK columns
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ ts timestamp,
+ value int,
+ brin_col int
+) USING heap;
+CREATE INDEX hot_test_ts_brin ON hot_test USING brin(ts);
+CREATE INDEX hot_test_brin_col_brin ON hot_test USING brin(brin_col);
+INSERT INTO hot_test VALUES (1, '2024-01-01', 100, 1000);
+-- Update both BRIN columns - should still be HOT (only summarizing indexes)
+UPDATE hot_test SET ts = '2024-01-02', brin_col = 2000 WHERE id = 1;
+SELECT get_hot_count('hot_test');
+ get_hot_count
+---------------
+ (1,1)
+(1 row)
+
+-- Verify BRIN indexes work
+SELECT id FROM hot_test WHERE ts >= '2024-01-02';
+ id
+----
+ 1
+(1 row)
+
+SELECT id FROM hot_test WHERE brin_col >= 2000;
+ id
+----
+ 1
+(1 row)
+
+-- Update non-indexed column - should also be HOT
+UPDATE hot_test SET value = 200 WHERE id = 1;
+SELECT get_hot_count('hot_test');
+ get_hot_count
+---------------
+ (2,2)
+(1 row)
+
+-- TOAST and HOT: TOASTed columns can participate in HOT
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ indexed_col int,
+ large_text text,
+ small_text text
+) USING heap;
+CREATE INDEX hot_test_idx ON hot_test(indexed_col);
+-- Insert row with TOASTed column (> 2KB)
+INSERT INTO hot_test VALUES (1, 100, repeat('x', 3000), 'small');
+-- Update non-indexed, non-TOASTed column - should be HOT
+UPDATE hot_test SET small_text = 'updated';
+SELECT get_hot_count('hot_test');
+ get_hot_count
+---------------
+ (1,1)
+(1 row)
+
+-- Update TOASTed column - should be HOT if indexed column unchanged
+UPDATE hot_test SET large_text = repeat('y', 3000);
+SELECT get_hot_count('hot_test');
+ get_hot_count
+---------------
+ (2,2)
+(1 row)
+
+-- Verify index still works
+SELECT id FROM hot_test WHERE indexed_col = 100;
+ id
+----
+ 1
+(1 row)
+
+-- Update indexed column - should NOT be HOT
+UPDATE hot_test SET indexed_col = 200;
+SELECT get_hot_count('hot_test');
+ get_hot_count
+---------------
+ (3,2)
+(1 row)
+
+-- Verify index was updated
+SELECT id FROM hot_test WHERE indexed_col = 200;
+ id
+----
+ 1
+(1 row)
+
+-- Unique constraint (unique index) behaves like regular index
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ unique_col int UNIQUE,
+ data text
+) USING heap;
+INSERT INTO hot_test VALUES (1, 100, 'data1');
+INSERT INTO hot_test VALUES (2, 200, 'data2');
+-- Update data (non-indexed) - should be HOT
+UPDATE hot_test SET data = 'updated';
+SELECT get_hot_count('hot_test');
+ get_hot_count
+---------------
+ (2,2)
+(1 row)
+
+-- Verify unique constraint still enforced
+SELECT id, unique_col, data FROM hot_test ORDER BY id;
+ id | unique_col | data
+----+------------+---------
+ 1 | 100 | updated
+ 2 | 200 | updated
+(2 rows)
+
+-- This should fail (unique violation)
+UPDATE hot_test SET unique_col = 100 WHERE id = 2;
+ERROR: duplicate key value violates unique constraint "hot_test_unique_col_key"
+DETAIL: Key (unique_col)=(100) already exists.
+-- Multi-column index: any column change = non-HOT
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ col_a int,
+ col_b int,
+ col_c int,
+ data text
+) USING heap;
+CREATE INDEX hot_test_ab_idx ON hot_test(col_a, col_b);
+INSERT INTO hot_test VALUES (1, 10, 20, 30, 'data');
+-- Update col_a (part of multi-column index) - should NOT be HOT
+UPDATE hot_test SET col_a = 15;
+SELECT get_hot_count('hot_test');
+ get_hot_count
+---------------
+ (1,0)
+(1 row)
+
+-- Reset
+UPDATE hot_test SET col_a = 10;
+-- Update col_b (part of multi-column index) - should NOT be HOT
+UPDATE hot_test SET col_b = 25;
+SELECT get_hot_count('hot_test');
+ get_hot_count
+---------------
+ (3,0)
+(1 row)
+
+-- Reset
+UPDATE hot_test SET col_b = 20;
+SELECT get_hot_count('hot_test');
+ get_hot_count
+---------------
+ (4,0)
+(1 row)
+
+-- Update col_c (not indexed) - should be HOT
+UPDATE hot_test SET col_c = 35;
+-- Update data (not indexed) - should be HOT
+UPDATE hot_test SET data = 'updated';
+SELECT get_hot_count('hot_test');
+ get_hot_count
+---------------
+ (6,2)
+(1 row)
+
+-- Verify multi-column index works
+SELECT id FROM hot_test WHERE col_a = 10 AND col_b = 20;
+ id
+----
+ 1
+(1 row)
+
+-- Partitioned tables: HOT works within partitions
+DROP TABLE IF EXISTS hot_test_partitioned CASCADE;
+NOTICE: table "hot_test_partitioned" does not exist, skipping
+CREATE TABLE hot_test_partitioned (
+ id int,
+ partition_key int,
+ indexed_col int,
+ data text,
+ PRIMARY KEY (id, partition_key)
+) PARTITION BY RANGE (partition_key) USING heap;
+CREATE TABLE hot_test_part1 PARTITION OF hot_test_partitioned
+ FOR VALUES FROM (1) TO (100) USING heap;
+CREATE TABLE hot_test_part2 PARTITION OF hot_test_partitioned
+ FOR VALUES FROM (100) TO (200) USING heap;
+CREATE INDEX hot_test_part_idx ON hot_test_partitioned(indexed_col);
+INSERT INTO hot_test_partitioned VALUES (1, 50, 100, 'initial1');
+INSERT INTO hot_test_partitioned VALUES (2, 150, 200, 'initial2');
+-- Update in partition 1 (non-indexed column) - should be HOT
+UPDATE hot_test_partitioned SET data = 'updated1' WHERE id = 1;
+-- Update in partition 2 (non-indexed column) - should be HOT
+UPDATE hot_test_partitioned SET data = 'updated2' WHERE id = 2;
+SELECT get_hot_count('hot_test_part1');
+ get_hot_count
+---------------
+ (1,1)
+(1 row)
+
+SELECT get_hot_count('hot_test_part2');
+ get_hot_count
+---------------
+ (1,1)
+(1 row)
+
+-- Verify indexes work on partitions
+SELECT id FROM hot_test_partitioned WHERE indexed_col = 100;
+ id
+----
+ 1
+(1 row)
+
+SELECT id FROM hot_test_partitioned WHERE indexed_col = 200;
+ id
+----
+ 2
+(1 row)
+
+-- Update indexed column in partition - should NOT be HOT
+UPDATE hot_test_partitioned SET indexed_col = 150 WHERE id = 1;
+SELECT get_hot_count('hot_test_part1');
+ get_hot_count
+---------------
+ (2,1)
+(1 row)
+
+-- Verify index was updated
+SELECT id FROM hot_test_partitioned WHERE indexed_col = 150;
+ id
+----
+ 1
+(1 row)
+
+-- Cleanup
+DROP TABLE IF EXISTS hot_test;
+DROP TABLE IF EXISTS hot_test_partitioned CASCADE;
+DROP FUNCTION IF EXISTS has_hot_chain(text, tid);
+DROP FUNCTION IF EXISTS print_hot_chain(text, tid);
+DROP FUNCTION IF EXISTS heap_prune(text);
+NOTICE: function heap_prune(text) does not exist, skipping
+DROP FUNCTION IF EXISTS get_hot_count(text);
+DROP EXTENSION pageinspect;
diff --git a/src/test/regress/expected/replica_identity_logging.out b/src/test/regress/expected/replica_identity_logging.out
new file mode 100644
index 00000000000..2096510b924
--- /dev/null
+++ b/src/test/regress/expected/replica_identity_logging.out
@@ -0,0 +1,396 @@
+--
+-- REPLICA_IDENTITY_LOGGING
+-- Test that replica identity keys are correctly extracted and logged for logical replication
+--
+-- This test verifies that the correct old key columns are included in WAL records
+-- for logical replication, based on the table's replica identity setting.
+--
+-- Clean up from prior runs
+DROP TABLE IF EXISTS repid_test CASCADE;
+NOTICE: table "repid_test" does not exist, skipping
+-- Drop replication slot if it exists from prior run
+SELECT pg_drop_replication_slot('repid_test_slot') FROM pg_replication_slots WHERE slot_name = 'repid_test_slot';
+ pg_drop_replication_slot
+--------------------------
+(0 rows)
+
+-- Enable logical decoding to verify what gets logged
+SELECT 'init' FROM pg_create_logical_replication_slot('repid_test_slot', 'test_decoding');
+ ?column?
+----------
+ init
+(1 row)
+
+-- REPLICA IDENTITY DEFAULT (primary key columns only)
+CREATE TABLE repid_test (
+ id int PRIMARY KEY,
+ indexed_col int,
+ data text
+);
+CREATE INDEX repid_test_idx ON repid_test(indexed_col);
+INSERT INTO repid_test VALUES (1, 100, 'initial');
+INSERT INTO repid_test VALUES (2, 200, 'initial');
+-- Advance slot to skip inserts (filter out transaction boundaries for stability)
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data NOT LIKE 'BEGIN %' AND data NOT LIKE 'COMMIT %';
+ data
+----------------------------------------------------------------------------------------------
+ table public.repid_test: INSERT: id[integer]:1 indexed_col[integer]:100 data[text]:'initial'
+ table public.repid_test: INSERT: id[integer]:2 indexed_col[integer]:200 data[text]:'initial'
+(2 rows)
+
+-- Update non-key column - should log only id in old key
+UPDATE repid_test SET data = 'updated' WHERE id = 1;
+-- Update indexed non-key column - should still log only id in old key
+UPDATE repid_test SET indexed_col = 150 WHERE id = 2;
+-- Check logical decoding output - should see old key with only id
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data LIKE '%UPDATE%';
+ data
+----------------------------------------------------------------------------------------------
+ table public.repid_test: UPDATE: id[integer]:1 indexed_col[integer]:100 data[text]:'updated'
+ table public.repid_test: UPDATE: id[integer]:2 indexed_col[integer]:150 data[text]:'initial'
+(2 rows)
+
+-- REPLICA IDENTITY FULL (all columns in old key)
+DROP TABLE repid_test;
+CREATE TABLE repid_test (
+ id int PRIMARY KEY,
+ indexed_col int,
+ data text
+);
+ALTER TABLE repid_test REPLICA IDENTITY FULL;
+CREATE INDEX repid_test_idx ON repid_test(indexed_col);
+INSERT INTO repid_test VALUES (1, 100, 'initial');
+-- Advance slot past insert (filter out transaction boundaries for stability)
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data NOT LIKE 'BEGIN %' AND data NOT LIKE 'COMMIT %';
+ data
+----------------------------------------------------------------------------------------------
+ table public.repid_test: INSERT: id[integer]:1 indexed_col[integer]:100 data[text]:'initial'
+(1 row)
+
+-- Update any column - should log ALL columns in old key
+UPDATE repid_test SET data = 'updated' WHERE id = 1;
+-- Check logical decoding output - should see old key with all columns
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data LIKE '%UPDATE%';
+ data
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ table public.repid_test: UPDATE: old-key: id[integer]:1 indexed_col[integer]:100 data[text]:'initial' new-tuple: id[integer]:1 indexed_col[integer]:100 data[text]:'updated'
+(1 row)
+
+-- REPLICA IDENTITY USING INDEX (index columns only)
+DROP TABLE repid_test;
+CREATE TABLE repid_test (
+ id int,
+ unique_col int UNIQUE NOT NULL,
+ data text
+);
+ALTER TABLE repid_test REPLICA IDENTITY USING INDEX repid_test_unique_col_key;
+INSERT INTO repid_test VALUES (1, 100, 'initial');
+INSERT INTO repid_test VALUES (2, 200, 'initial');
+-- Advance slot past inserts (filter out transaction boundaries for stability)
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data NOT LIKE 'BEGIN %' AND data NOT LIKE 'COMMIT %';
+ data
+---------------------------------------------------------------------------------------------
+ table public.repid_test: INSERT: id[integer]:1 unique_col[integer]:100 data[text]:'initial'
+ table public.repid_test: INSERT: id[integer]:2 unique_col[integer]:200 data[text]:'initial'
+(2 rows)
+
+-- Update non-indexed column - should log only unique_col in old key
+UPDATE repid_test SET data = 'updated' WHERE unique_col = 100;
+-- Update id (not in replica identity index) - should still log only unique_col
+UPDATE repid_test SET id = 10 WHERE unique_col = 200;
+-- Check logical decoding output - should see old key with only unique_col
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data LIKE '%UPDATE%';
+ data
+----------------------------------------------------------------------------------------------
+ table public.repid_test: UPDATE: id[integer]:1 unique_col[integer]:100 data[text]:'updated'
+ table public.repid_test: UPDATE: id[integer]:10 unique_col[integer]:200 data[text]:'initial'
+(2 rows)
+
+-- REPLICA IDENTITY NOTHING (no old key logged)
+DROP TABLE repid_test;
+CREATE TABLE repid_test (
+ id int PRIMARY KEY,
+ data text
+);
+ALTER TABLE repid_test REPLICA IDENTITY NOTHING;
+INSERT INTO repid_test VALUES (1, 'initial');
+-- Advance slot past insert (filter out transaction boundaries for stability)
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data NOT LIKE 'BEGIN %' AND data NOT LIKE 'COMMIT %';
+ data
+---------------------------------------------------------------------
+ table public.repid_test: INSERT: id[integer]:1 data[text]:'initial'
+(1 row)
+
+-- Update - should log no old key
+UPDATE repid_test SET data = 'updated' WHERE id = 1;
+-- Check logical decoding output - should see update with no old key
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data LIKE '%UPDATE%';
+ data
+---------------------------------------------------------------------
+ table public.repid_test: UPDATE: id[integer]:1 data[text]:'updated'
+(1 row)
+
+-- Multi-column index replica identity
+DROP TABLE repid_test;
+CREATE TABLE repid_test (
+ id int,
+ col_a int NOT NULL,
+ col_b int NOT NULL,
+ col_c int,
+ data text,
+ UNIQUE (col_a, col_b)
+);
+ALTER TABLE repid_test REPLICA IDENTITY USING INDEX repid_test_col_a_col_b_key;
+INSERT INTO repid_test VALUES (1, 10, 20, 30, 'initial');
+-- Advance slot past insert (filter out transaction boundaries for stability)
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data NOT LIKE 'BEGIN %' AND data NOT LIKE 'COMMIT %';
+ data
+---------------------------------------------------------------------------------------------------------------------------
+ table public.repid_test: INSERT: id[integer]:1 col_a[integer]:10 col_b[integer]:20 col_c[integer]:30 data[text]:'initial'
+(1 row)
+
+-- Update non-indexed columns - should log col_a and col_b in old key
+UPDATE repid_test SET data = 'updated', col_c = 35 WHERE col_a = 10 AND col_b = 20;
+-- Check logical decoding output - should see old key with col_a and col_b
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data LIKE '%UPDATE%';
+ data
+---------------------------------------------------------------------------------------------------------------------------
+ table public.repid_test: UPDATE: id[integer]:1 col_a[integer]:10 col_b[integer]:20 col_c[integer]:35 data[text]:'updated'
+(1 row)
+
+-- TOAST/external columns in replica identity
+DROP TABLE repid_test;
+CREATE TABLE repid_test (
+ id int PRIMARY KEY,
+ large_text text,
+ data text
+);
+-- REPLICA IDENTITY FULL includes toasted columns
+ALTER TABLE repid_test REPLICA IDENTITY FULL;
+-- Insert a large value (large enough to show the concept without excessive output)
+INSERT INTO repid_test VALUES (1, repeat('x', 100), 'initial');
+-- Advance slot past insert (filter out transaction boundaries for stability)
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data NOT LIKE 'BEGIN %' AND data NOT LIKE 'COMMIT %';
+ data
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ table public.repid_test: INSERT: id[integer]:1 large_text[text]:'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' data[text]:'initial'
+(1 row)
+
+-- Update small column - should still log large_text column in old key
+UPDATE repid_test SET data = 'updated' WHERE id = 1;
+-- Check logical decoding output - verify both old and new values are logged
+-- Just check that UPDATE happened and includes both large_text and data columns
+SELECT COUNT(*) as update_count FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data LIKE '%UPDATE%' AND data LIKE '%large_text%' AND data LIKE '%old-key%';
+ update_count
+--------------
+ 1
+(1 row)
+
+-- Test TOAST columns with REPLICA IDENTITY USING INDEX
+DROP TABLE repid_test;
+CREATE TABLE repid_test (
+ id int PRIMARY KEY,
+ indexed_large_text text NOT NULL,
+ data text
+);
+-- Create unique index on the large text column
+CREATE UNIQUE INDEX repid_test_large_idx ON repid_test(indexed_large_text);
+-- Set replica identity to use the index (not FULL)
+ALTER TABLE repid_test REPLICA IDENTITY USING INDEX repid_test_large_idx;
+-- Insert a large value (large enough to be TOASTed)
+INSERT INTO repid_test VALUES (1, repeat('x', 100000), 'initial');
+-- Advance slot past inserts
+SELECT COUNT(*) FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data NOT LIKE 'BEGIN %' AND data NOT LIKE 'COMMIT %';
+ count
+-------
+ 1
+(1 row)
+
+-- Update non-indexed column - should still log indexed_large_text in old key
+-- despite being unmodified because it is TOASTed and in the replica key
+UPDATE repid_test SET data = 'updated' WHERE id = 1;
+-- Verify TOASTed indexed column part of the relica identity is logged in old key
+SELECT COUNT(*) AS toasted_index_logged FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data LIKE '%UPDATE%' AND data LIKE '%indexed_large_text%';
+ toasted_index_logged
+----------------------
+ 1
+(1 row)
+
+-- Dropped columns and replica identity
+DROP TABLE repid_test;
+CREATE TABLE repid_test (
+ id int PRIMARY KEY,
+ dropped_col int,
+ kept_col int,
+ data text
+);
+ALTER TABLE repid_test REPLICA IDENTITY FULL;
+INSERT INTO repid_test VALUES (1, 999, 100, 'initial');
+-- Drop a column
+ALTER TABLE repid_test DROP COLUMN dropped_col;
+-- Advance slot past insert and DDL (filter out transaction boundaries for stability)
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data NOT LIKE 'BEGIN %' AND data NOT LIKE 'COMMIT %';
+ data
+--------------------------------------------------------------------------------------------------------------------
+ table public.repid_test: INSERT: id[integer]:1 dropped_col[integer]:999 kept_col[integer]:100 data[text]:'initial'
+(1 row)
+
+-- Update - old key should handle dropped column
+UPDATE repid_test SET data = 'updated' WHERE id = 1;
+-- Check logical decoding output
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data LIKE '%UPDATE%';
+ data
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ table public.repid_test: UPDATE: old-key: id[integer]:1 kept_col[integer]:100 data[text]:'initial' new-tuple: id[integer]:1 kept_col[integer]:100 data[text]:'updated'
+(1 row)
+
+-- DEFAULT replica identity with composite primary key
+DROP TABLE repid_test;
+CREATE TABLE repid_test (
+ id_a int,
+ id_b int,
+ indexed_col int,
+ data text,
+ PRIMARY KEY (id_a, id_b)
+);
+CREATE INDEX repid_test_idx ON repid_test(indexed_col);
+INSERT INTO repid_test VALUES (1, 10, 100, 'initial');
+-- Advance slot past insert (filter out transaction boundaries for stability)
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data NOT LIKE 'BEGIN %' AND data NOT LIKE 'COMMIT %';
+ data
+-----------------------------------------------------------------------------------------------------------------
+ table public.repid_test: INSERT: id_a[integer]:1 id_b[integer]:10 indexed_col[integer]:100 data[text]:'initial'
+(1 row)
+
+-- Update non-key columns - should log both id_a and id_b in old key
+UPDATE repid_test SET data = 'updated', indexed_col = 150 WHERE id_a = 1 AND id_b = 10;
+-- Check logical decoding output - should see old key with both primary key columns
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data LIKE '%UPDATE%';
+ data
+-----------------------------------------------------------------------------------------------------------------
+ table public.repid_test: UPDATE: id_a[integer]:1 id_b[integer]:10 indexed_col[integer]:150 data[text]:'updated'
+(1 row)
+
+-- Expression index and replica identity
+DROP TABLE repid_test;
+CREATE TABLE repid_test (
+ id int PRIMARY KEY,
+ email text NOT NULL,
+ data text
+);
+-- Create unique expression index
+CREATE UNIQUE INDEX repid_test_lower_email_idx ON repid_test(lower(email));
+-- Cannot use expression index for replica identity (should fail)
+-- PostgreSQL requires the index to be on simple column references
+-- This should produce an error
+DO $$
+BEGIN
+ ALTER TABLE repid_test REPLICA IDENTITY USING INDEX repid_test_lower_email_idx;
+ RAISE EXCEPTION 'Should have failed - expression indexes cannot be used for replica identity';
+EXCEPTION
+ WHEN feature_not_supported THEN
+ RAISE NOTICE 'Correctly rejected expression index for replica identity';
+END$$;
+NOTICE: Correctly rejected expression index for replica identity
+-- Use FULL instead
+ALTER TABLE repid_test REPLICA IDENTITY FULL;
+INSERT INTO repid_test VALUES (1, '[email protected]', 'initial');
+-- Advance slot past insert (filter out transaction boundaries for stability)
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data NOT LIKE 'BEGIN %' AND data NOT LIKE 'COMMIT %';
+ data
+----------------------------------------------------------------------------------------------------
+ table public.repid_test: INSERT: id[integer]:1 email[text]:'[email protected]' data[text]:'initial'
+(1 row)
+
+-- Update - should log all columns in old key
+UPDATE repid_test SET data = 'updated' WHERE id = 1;
+-- Check logical decoding output
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data LIKE '%UPDATE%';
+ data
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ table public.repid_test: UPDATE: old-key: id[integer]:1 email[text]:'[email protected]' data[text]:'initial' new-tuple: id[integer]:1 email[text]:'[email protected]' data[text]:'updated'
+(1 row)
+
+-- NULL values in replica identity columns
+DROP TABLE repid_test;
+CREATE TABLE repid_test (
+ id int PRIMARY KEY,
+ nullable_col int,
+ data text
+);
+ALTER TABLE repid_test REPLICA IDENTITY FULL;
+INSERT INTO repid_test VALUES (1, NULL, 'initial');
+-- Advance slot past insert (filter out transaction boundaries for stability)
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data NOT LIKE 'BEGIN %' AND data NOT LIKE 'COMMIT %';
+ data
+------------------------------------------------------------------------------------------------
+ table public.repid_test: INSERT: id[integer]:1 nullable_col[integer]:null data[text]:'initial'
+(1 row)
+
+-- Update - old key should include NULL value
+UPDATE repid_test SET data = 'updated' WHERE id = 1;
+-- Check logical decoding output - should see old key with NULL
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data LIKE '%UPDATE%';
+ data
+-------------------------------------------------------------------------------------------------------------------------------------------------------
+ table public.repid_test: UPDATE: old-key: id[integer]:1 data[text]:'initial' new-tuple: id[integer]:1 nullable_col[integer]:null data[text]:'updated'
+(1 row)
+
+-- Generated columns and replica identity
+DROP TABLE repid_test;
+CREATE TABLE repid_test (
+ id int PRIMARY KEY,
+ base_col int,
+ generated_col int GENERATED ALWAYS AS (base_col * 2) STORED,
+ data text
+);
+ALTER TABLE repid_test REPLICA IDENTITY FULL;
+INSERT INTO repid_test (id, base_col, data) VALUES (1, 50, 'initial');
+-- Advance slot past insert (filter out transaction boundaries for stability)
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data NOT LIKE 'BEGIN %' AND data NOT LIKE 'COMMIT %';
+ data
+---------------------------------------------------------------------------------------------------------------------
+ table public.repid_test: INSERT: id[integer]:1 base_col[integer]:50 generated_col[integer]:100 data[text]:'initial'
+(1 row)
+
+-- Update base_col - generated_col will change automatically
+UPDATE repid_test SET base_col = 60 WHERE id = 1;
+-- Check logical decoding output - should include old generated_col value
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data LIKE '%UPDATE%';
+ data
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ table public.repid_test: UPDATE: old-key: id[integer]:1 base_col[integer]:50 generated_col[integer]:100 data[text]:'initial' new-tuple: id[integer]:1 base_col[integer]:60 generated_col[integer]:120 data[text]:'initial'
+(1 row)
+
+-- Cleanup
+SELECT pg_drop_replication_slot('repid_test_slot');
+ pg_drop_replication_slot
+--------------------------
+
+(1 row)
+
+DROP TABLE repid_test;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 549e9b2d7be..01ed43eba18 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -137,6 +137,13 @@ test: event_trigger_login
# this test also uses event triggers, so likewise run it by itself
test: fast_default
+# ----------
+# HOT updates and replica identity logging tests
+# Run these sequentially to avoid logical replication slot interference
+# ----------
+test: hot_updates
+test: replica_identity_logging
+
# run tablespace test at the end because it drops the tablespace created during
# setup that other tests may use.
test: tablespace
diff --git a/src/test/regress/sql/hot_updates.sql b/src/test/regress/sql/hot_updates.sql
new file mode 100644
index 00000000000..7030f4fc6db
--- /dev/null
+++ b/src/test/regress/sql/hot_updates.sql
@@ -0,0 +1,553 @@
+--
+-- HOT_UPDATES
+-- Test Heap-Only Tuple (HOT) update decisions
+--
+-- This test systematically verifies that HOT updates are used when appropriate
+-- and avoided when necessary (e.g., when indexed columns are modified).
+--
+-- We use multiple validation methods:
+-- 1. Index verification (index still works = proves no index update for HOT)
+-- 2. Statistics functions (pg_stat_get_tuples_hot_updated)
+-- 3. pageinspect extension for HOT chain examination
+--
+
+-- Load required extensions
+CREATE EXTENSION IF NOT EXISTS pageinspect;
+
+-- Clean up from prior runs
+DROP TABLE IF EXISTS hot_test CASCADE;
+
+-- Function to get HOT update count
+CREATE OR REPLACE FUNCTION get_hot_count(rel_name text)
+RETURNS TABLE (
+ updates BIGINT,
+ hot BIGINT
+) AS $$
+DECLARE
+ rel_oid oid;
+BEGIN
+ rel_oid := rel_name::regclass::oid;
+ PERFORM pg_stat_force_next_flush();
+
+ updates := COALESCE(pg_stat_get_tuples_updated(rel_oid), 0) +
+ COALESCE(pg_stat_get_xact_tuples_updated(rel_oid), 0);
+ hot := COALESCE(pg_stat_get_tuples_hot_updated(rel_oid), 0) +
+ COALESCE(pg_stat_get_xact_tuples_hot_updated(rel_oid), 0);
+
+ RETURN NEXT;
+END;
+$$ LANGUAGE plpgsql;
+
+-- Check if a tuple is part of a HOT chain (has a predecessor on same page)
+CREATE OR REPLACE FUNCTION has_hot_chain(rel_name text, target_ctid tid)
+RETURNS boolean AS $$
+DECLARE
+ block_num int;
+ page_item record;
+BEGIN
+ block_num := (target_ctid::text::point)[0]::int;
+
+ -- Look for a different tuple on the same page that points to our target tuple
+ FOR page_item IN
+ SELECT lp, lp_flags, t_ctid
+ FROM heap_page_items(get_raw_page(rel_name, block_num))
+ WHERE lp_flags = 1
+ AND t_ctid IS NOT NULL
+ AND t_ctid = target_ctid
+ AND ('(' || block_num::text || ',' || lp::text || ')')::tid != target_ctid
+ LOOP
+ RETURN true;
+ END LOOP;
+
+ RETURN false;
+END;
+$$ LANGUAGE plpgsql;
+
+-- Print the HOT chain starting from a given tuple
+CREATE OR REPLACE FUNCTION print_hot_chain(rel_name text, start_ctid tid)
+RETURNS TABLE(chain_position int, ctid tid, lp_flags text, t_ctid tid, chain_end boolean) AS
+$$
+#variable_conflict use_column
+DECLARE
+ block_num int;
+ line_ptr int;
+ current_ctid tid := start_ctid;
+ next_ctid tid;
+ position int := 0;
+ max_iterations int := 100;
+ page_item record;
+ found_predecessor boolean := false;
+ flags_name text;
+BEGIN
+ block_num := (start_ctid::text::point)[0]::int;
+
+ -- Find the predecessor (old tuple pointing to our start_ctid)
+ FOR page_item IN
+ SELECT lp, lp_flags, t_ctid
+ FROM heap_page_items(get_raw_page(rel_name, block_num))
+ WHERE lp_flags = 1
+ AND t_ctid = start_ctid
+ LOOP
+ current_ctid := ('(' || block_num::text || ',' || page_item.lp::text || ')')::tid;
+ found_predecessor := true;
+ EXIT;
+ END LOOP;
+
+ -- If no predecessor found, start with the given ctid
+ IF NOT found_predecessor THEN
+ current_ctid := start_ctid;
+ END IF;
+
+ -- Follow the chain forward
+ WHILE position < max_iterations LOOP
+ line_ptr := (current_ctid::text::point)[1]::int;
+
+ FOR page_item IN
+ SELECT lp, lp_flags, t_ctid
+ FROM heap_page_items(get_raw_page(rel_name, block_num))
+ WHERE lp = line_ptr
+ LOOP
+ -- Map lp_flags to names
+ flags_name := CASE page_item.lp_flags
+ WHEN 0 THEN 'unused (0)'
+ WHEN 1 THEN 'normal (1)'
+ WHEN 2 THEN 'redirect (2)'
+ WHEN 3 THEN 'dead (3)'
+ ELSE 'unknown (' || page_item.lp_flags::text || ')'
+ END;
+
+ RETURN QUERY SELECT
+ position,
+ current_ctid,
+ flags_name,
+ page_item.t_ctid,
+ (page_item.t_ctid IS NULL OR page_item.t_ctid = current_ctid)::boolean
+ ;
+
+ IF page_item.t_ctid IS NULL OR page_item.t_ctid = current_ctid THEN
+ RETURN;
+ END IF;
+
+ next_ctid := page_item.t_ctid;
+
+ IF (next_ctid::text::point)[0]::int != block_num THEN
+ RETURN;
+ END IF;
+
+ current_ctid := next_ctid;
+ position := position + 1;
+ END LOOP;
+
+ IF position = 0 THEN
+ RETURN;
+ END IF;
+ END LOOP;
+END;
+$$ LANGUAGE plpgsql;
+
+-- Trigger page pruning via table scan
+CREATE OR REPLACE FUNCTION heap_prune_page(rel_name text, target_ctid tid)
+RETURNS void AS $$
+DECLARE
+ block_num int;
+BEGIN
+ -- Extract block number from ctid
+ block_num := (target_ctid::text::point)[0]::int;
+
+ -- Scan only the specific page to trigger pruning on that page
+ EXECUTE 'SELECT COUNT(*) FROM ' || quote_ident(rel_name) ||
+ ' WHERE ctid >= (' || block_num || ',0) AND ctid < (' || (block_num + 1) || ',0)';
+END;
+$$ LANGUAGE plpgsql;
+
+-- Basic HOT update (update non-indexed column)
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ indexed_col int,
+ non_indexed_col text
+) USING heap;
+
+CREATE INDEX hot_test_indexed_idx ON hot_test(indexed_col);
+
+INSERT INTO hot_test VALUES (1, 100, 'initial');
+INSERT INTO hot_test VALUES (2, 200, 'initial');
+INSERT INTO hot_test VALUES (3, 300, 'initial');
+
+-- Get baseline and initial ctid
+WITH initial_state AS (
+ SELECT ctid FROM hot_test WHERE id = 1
+)
+SELECT
+ 'Initial State' AS phase,
+ initial_state.ctid,
+ (get_hot_count('hot_test')).updates,
+ (get_hot_count('hot_test')).hot
+FROM initial_state;
+
+-- Should be HOT updates (only non-indexed column modified)
+UPDATE hot_test SET non_indexed_col = 'updated1' WHERE id = 1;
+UPDATE hot_test SET non_indexed_col = 'updated2' WHERE id = 2;
+UPDATE hot_test SET non_indexed_col = 'updated3' WHERE id = 3;
+
+-- Verify HOT updates occurred
+SELECT
+ 'After Updates' AS phase,
+ (get_hot_count('hot_test')).updates,
+ (get_hot_count('hot_test')).hot;
+
+-- Dump the HOT chain before pruning
+WITH current_tuple AS (
+ SELECT ctid FROM hot_test WHERE id = 1
+)
+SELECT
+ 'Before VACUUM' AS phase,
+ has_hot_chain('hot_test', current_tuple.ctid) AS has_chain,
+ chain_position,
+ print_hot_chain.ctid,
+ lp_flags,
+ t_ctid
+FROM current_tuple,
+LATERAL print_hot_chain('hot_test', current_tuple.ctid);
+
+SET SESSION enable_seqscan = OFF;
+SET SESSION enable_bitmapscan = OFF;
+
+-- Verify indexes still work
+EXPLAIN (COSTS OFF) SELECT id, indexed_col FROM hot_test WHERE indexed_col = 100;
+SELECT id, indexed_col FROM hot_test WHERE indexed_col = 100;
+
+EXPLAIN (COSTS OFF) SELECT id, indexed_col FROM hot_test WHERE indexed_col = 200;
+SELECT id, indexed_col FROM hot_test WHERE indexed_col = 200;
+
+-- Vacuum the relation, expect the HOT chain to collapse
+VACUUM hot_test;
+
+-- Show that there is no chain after vacuum
+WITH current_tuple AS (
+ SELECT ctid FROM hot_test WHERE id = 1
+)
+SELECT
+ 'After VACUUM' AS phase,
+ has_hot_chain('hot_test', current_tuple.ctid) AS has_chain,
+ chain_position,
+ print_hot_chain.ctid,
+ lp_flags,
+ t_ctid
+FROM current_tuple,
+LATERAL print_hot_chain('hot_test', current_tuple.ctid);
+
+-- Non-HOT update (update indexed column)
+UPDATE hot_test SET indexed_col = 150 WHERE id = 1;
+SELECT get_hot_count('hot_test');
+
+-- Verify index was updated (new value findable)
+EXPLAIN (COSTS OFF) SELECT id, indexed_col FROM hot_test WHERE indexed_col = 150;
+SELECT id, indexed_col FROM hot_test WHERE indexed_col = 150;
+
+-- Verify old value no longer in index
+EXPLAIN (COSTS OFF) SELECT id FROM hot_test WHERE indexed_col = 100;
+SELECT id FROM hot_test WHERE indexed_col = 100;
+
+SET SESSION enable_seqscan = ON;
+SET SESSION enable_bitmapscan = ON;
+
+-- All-or-none property: updating one indexed column requires ALL index updates
+DROP TABLE hot_test;
+
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ col_a int,
+ col_b int,
+ col_c int,
+ non_indexed text
+) USING heap;
+
+CREATE INDEX hot_test_a_idx ON hot_test(col_a);
+CREATE INDEX hot_test_b_idx ON hot_test(col_b);
+CREATE INDEX hot_test_c_idx ON hot_test(col_c);
+
+INSERT INTO hot_test VALUES (1, 10, 20, 30, 'initial');
+
+-- Update only col_a - should NOT be HOT because an indexed column changed
+-- This means ALL indexes must be updated (all-or-none property)
+UPDATE hot_test SET col_a = 15 WHERE id = 1;
+SELECT get_hot_count('hot_test');
+
+-- Verify all three indexes still work correctly
+SELECT id, col_a FROM hot_test WHERE col_a = 15; -- updated index
+SELECT id, col_b FROM hot_test WHERE col_b = 20; -- unchanged index
+SELECT id, col_c FROM hot_test WHERE col_c = 30; -- unchanged index
+
+-- Now update only non-indexed column - should be HOT
+UPDATE hot_test SET non_indexed = 'updated';
+SELECT get_hot_count('hot_test');
+
+-- Verify all indexes still work
+SELECT id FROM hot_test WHERE col_a = 15 AND col_b = 20 AND col_c = 30;
+
+-- Partial index: both old and new outside predicate (conservative = non-HOT)
+DROP TABLE hot_test;
+
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ status text,
+ data text
+) USING heap;
+
+-- Partial index only covers status = 'active'
+CREATE INDEX hot_test_active_idx ON hot_test(status) WHERE status = 'active';
+
+INSERT INTO hot_test VALUES (1, 'active', 'data1');
+INSERT INTO hot_test VALUES (2, 'inactive', 'data2');
+INSERT INTO hot_test VALUES (3, 'deleted', 'data3');
+
+-- Update non-indexed column on 'active' row (in predicate, status unchanged)
+-- Should be HOT
+UPDATE hot_test SET data = 'updated1' WHERE id = 1;
+SELECT get_hot_count('hot_test');
+
+-- Update non-indexed column on 'inactive' row (outside predicate)
+-- Should be HOT
+UPDATE hot_test SET data = 'updated2' WHERE id = 2;
+SELECT get_hot_count('hot_test');
+
+-- Update status from 'inactive' to 'deleted' (both outside predicate)
+-- PostgreSQL is conservative: heap insert happens before predicate check
+-- So this is NON-HOT even though both values are outside predicate
+UPDATE hot_test SET status = 'deleted' WHERE id = 2;
+SELECT get_hot_count('hot_test');
+
+-- Verify index still works for 'active' rows
+SELECT id, status FROM hot_test WHERE status = 'active';
+
+-- Expression index with JSONB
+DROP TABLE hot_test;
+
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ metadata jsonb
+) USING heap;
+
+-- Index on JSONB expression
+CREATE INDEX hot_test_user_id_idx ON hot_test((metadata->>'user_id'));
+CREATE INDEX hot_test_status_idx ON hot_test((metadata->>'status'));
+
+INSERT INTO hot_test VALUES (1, '{"user_id": "123", "status": "active"}'::jsonb);
+
+-- Update JSONB field used in expression index to the same value,
+-- this will be HOT because the entire JSONB field is observed to
+-- be unchanged.
+UPDATE hot_test SET metadata = jsonb_set(metadata, '{user_id}', '"123"')
+WHERE id = 1;
+SELECT get_hot_count('hot_test');
+
+-- Update JSONB field that is no used in any index to some new value, this
+-- will prevent a HOT update despite not changing what is used when forming
+-- the index key, this is counter intuitive and causes index bloat as well
+-- as slows down updates on JSONB data as any change will trigger all
+-- indexes to be updated.
+UPDATE hot_test SET metadata = jsonb_set(metadata, '{food}', '"apple"')
+WHERE id = 1;
+SELECT get_hot_count('hot_test');
+
+-- Use a few different methods for mutating JSONB data, but don't modify
+-- indexed portions of the document. None of these will be HOT.
+UPDATE hot_test SET metadata = jsonb_set(
+ jsonb_set(metadata, '{food}', '"pear"'),
+ '{timestamp}',
+ to_jsonb(now())
+)
+WHERE id = 1;
+SELECT get_hot_count('hot_test');
+
+UPDATE hot_test
+SET metadata = metadata || '{"user_id": "123", "timestamp": "2024-01-01"}'::jsonb
+WHERE id = 1;
+SELECT get_hot_count('hot_test');
+
+UPDATE hot_test SET metadata =
+ jsonb_set(
+ jsonb_set(metadata, '{user_id}', '"123"'),
+ '{fruit}',
+ '"plumb"'
+ );
+SELECT get_hot_count('hot_test');
+
+UPDATE hot_test SET metadata = metadata || jsonb_build_object(
+ 'user_id', '123',
+ 'timestamp', now(),
+ 'fruit', 'honeydew'
+);
+SELECT get_hot_count('hot_test');
+
+-- Only BRIN (summarizing) indexes on non-PK columns
+DROP TABLE hot_test;
+
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ ts timestamp,
+ value int,
+ brin_col int
+) USING heap;
+
+CREATE INDEX hot_test_ts_brin ON hot_test USING brin(ts);
+CREATE INDEX hot_test_brin_col_brin ON hot_test USING brin(brin_col);
+
+INSERT INTO hot_test VALUES (1, '2024-01-01', 100, 1000);
+
+-- Update both BRIN columns - should still be HOT (only summarizing indexes)
+UPDATE hot_test SET ts = '2024-01-02', brin_col = 2000 WHERE id = 1;
+SELECT get_hot_count('hot_test');
+
+-- Verify BRIN indexes work
+SELECT id FROM hot_test WHERE ts >= '2024-01-02';
+SELECT id FROM hot_test WHERE brin_col >= 2000;
+
+-- Update non-indexed column - should also be HOT
+UPDATE hot_test SET value = 200 WHERE id = 1;
+SELECT get_hot_count('hot_test');
+
+-- TOAST and HOT: TOASTed columns can participate in HOT
+DROP TABLE hot_test;
+
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ indexed_col int,
+ large_text text,
+ small_text text
+) USING heap;
+
+CREATE INDEX hot_test_idx ON hot_test(indexed_col);
+
+-- Insert row with TOASTed column (> 2KB)
+INSERT INTO hot_test VALUES (1, 100, repeat('x', 3000), 'small');
+
+-- Update non-indexed, non-TOASTed column - should be HOT
+UPDATE hot_test SET small_text = 'updated';
+SELECT get_hot_count('hot_test');
+
+-- Update TOASTed column - should be HOT if indexed column unchanged
+UPDATE hot_test SET large_text = repeat('y', 3000);
+SELECT get_hot_count('hot_test');
+
+-- Verify index still works
+SELECT id FROM hot_test WHERE indexed_col = 100;
+
+-- Update indexed column - should NOT be HOT
+UPDATE hot_test SET indexed_col = 200;
+SELECT get_hot_count('hot_test');
+
+-- Verify index was updated
+SELECT id FROM hot_test WHERE indexed_col = 200;
+
+-- Unique constraint (unique index) behaves like regular index
+DROP TABLE hot_test;
+
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ unique_col int UNIQUE,
+ data text
+) USING heap;
+
+INSERT INTO hot_test VALUES (1, 100, 'data1');
+INSERT INTO hot_test VALUES (2, 200, 'data2');
+
+-- Update data (non-indexed) - should be HOT
+UPDATE hot_test SET data = 'updated';
+SELECT get_hot_count('hot_test');
+
+-- Verify unique constraint still enforced
+SELECT id, unique_col, data FROM hot_test ORDER BY id;
+
+-- This should fail (unique violation)
+UPDATE hot_test SET unique_col = 100 WHERE id = 2;
+
+-- Multi-column index: any column change = non-HOT
+DROP TABLE hot_test;
+
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ col_a int,
+ col_b int,
+ col_c int,
+ data text
+) USING heap;
+
+CREATE INDEX hot_test_ab_idx ON hot_test(col_a, col_b);
+
+INSERT INTO hot_test VALUES (1, 10, 20, 30, 'data');
+
+-- Update col_a (part of multi-column index) - should NOT be HOT
+UPDATE hot_test SET col_a = 15;
+SELECT get_hot_count('hot_test');
+
+-- Reset
+UPDATE hot_test SET col_a = 10;
+
+-- Update col_b (part of multi-column index) - should NOT be HOT
+UPDATE hot_test SET col_b = 25;
+SELECT get_hot_count('hot_test');
+
+-- Reset
+UPDATE hot_test SET col_b = 20;
+SELECT get_hot_count('hot_test');
+
+-- Update col_c (not indexed) - should be HOT
+UPDATE hot_test SET col_c = 35;
+
+-- Update data (not indexed) - should be HOT
+UPDATE hot_test SET data = 'updated';
+SELECT get_hot_count('hot_test');
+
+-- Verify multi-column index works
+SELECT id FROM hot_test WHERE col_a = 10 AND col_b = 20;
+
+-- Partitioned tables: HOT works within partitions
+DROP TABLE IF EXISTS hot_test_partitioned CASCADE;
+
+CREATE TABLE hot_test_partitioned (
+ id int,
+ partition_key int,
+ indexed_col int,
+ data text,
+ PRIMARY KEY (id, partition_key)
+) PARTITION BY RANGE (partition_key) USING heap;
+
+CREATE TABLE hot_test_part1 PARTITION OF hot_test_partitioned
+ FOR VALUES FROM (1) TO (100) USING heap;
+CREATE TABLE hot_test_part2 PARTITION OF hot_test_partitioned
+ FOR VALUES FROM (100) TO (200) USING heap;
+
+CREATE INDEX hot_test_part_idx ON hot_test_partitioned(indexed_col);
+
+INSERT INTO hot_test_partitioned VALUES (1, 50, 100, 'initial1');
+INSERT INTO hot_test_partitioned VALUES (2, 150, 200, 'initial2');
+
+-- Update in partition 1 (non-indexed column) - should be HOT
+UPDATE hot_test_partitioned SET data = 'updated1' WHERE id = 1;
+
+-- Update in partition 2 (non-indexed column) - should be HOT
+UPDATE hot_test_partitioned SET data = 'updated2' WHERE id = 2;
+
+SELECT get_hot_count('hot_test_part1');
+SELECT get_hot_count('hot_test_part2');
+
+-- Verify indexes work on partitions
+SELECT id FROM hot_test_partitioned WHERE indexed_col = 100;
+SELECT id FROM hot_test_partitioned WHERE indexed_col = 200;
+
+-- Update indexed column in partition - should NOT be HOT
+UPDATE hot_test_partitioned SET indexed_col = 150 WHERE id = 1;
+SELECT get_hot_count('hot_test_part1');
+
+-- Verify index was updated
+SELECT id FROM hot_test_partitioned WHERE indexed_col = 150;
+
+-- Cleanup
+DROP TABLE IF EXISTS hot_test;
+DROP TABLE IF EXISTS hot_test_partitioned CASCADE;
+DROP FUNCTION IF EXISTS has_hot_chain(text, tid);
+DROP FUNCTION IF EXISTS print_hot_chain(text, tid);
+DROP FUNCTION IF EXISTS heap_prune(text);
+DROP FUNCTION IF EXISTS get_hot_count(text);
+DROP EXTENSION pageinspect;
diff --git a/src/test/regress/sql/replica_identity_logging.sql b/src/test/regress/sql/replica_identity_logging.sql
new file mode 100644
index 00000000000..4c45e76e15d
--- /dev/null
+++ b/src/test/regress/sql/replica_identity_logging.sql
@@ -0,0 +1,349 @@
+--
+-- REPLICA_IDENTITY_LOGGING
+-- Test that replica identity keys are correctly extracted and logged for logical replication
+--
+-- This test verifies that the correct old key columns are included in WAL records
+-- for logical replication, based on the table's replica identity setting.
+--
+
+-- Clean up from prior runs
+DROP TABLE IF EXISTS repid_test CASCADE;
+
+-- Drop replication slot if it exists from prior run
+SELECT pg_drop_replication_slot('repid_test_slot') FROM pg_replication_slots WHERE slot_name = 'repid_test_slot';
+
+-- Enable logical decoding to verify what gets logged
+SELECT 'init' FROM pg_create_logical_replication_slot('repid_test_slot', 'test_decoding');
+
+-- REPLICA IDENTITY DEFAULT (primary key columns only)
+CREATE TABLE repid_test (
+ id int PRIMARY KEY,
+ indexed_col int,
+ data text
+);
+
+CREATE INDEX repid_test_idx ON repid_test(indexed_col);
+
+INSERT INTO repid_test VALUES (1, 100, 'initial');
+INSERT INTO repid_test VALUES (2, 200, 'initial');
+
+-- Advance slot to skip inserts (filter out transaction boundaries for stability)
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data NOT LIKE 'BEGIN %' AND data NOT LIKE 'COMMIT %';
+
+-- Update non-key column - should log only id in old key
+UPDATE repid_test SET data = 'updated' WHERE id = 1;
+
+-- Update indexed non-key column - should still log only id in old key
+UPDATE repid_test SET indexed_col = 150 WHERE id = 2;
+
+-- Check logical decoding output - should see old key with only id
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data LIKE '%UPDATE%';
+
+-- REPLICA IDENTITY FULL (all columns in old key)
+DROP TABLE repid_test;
+
+CREATE TABLE repid_test (
+ id int PRIMARY KEY,
+ indexed_col int,
+ data text
+);
+
+ALTER TABLE repid_test REPLICA IDENTITY FULL;
+
+CREATE INDEX repid_test_idx ON repid_test(indexed_col);
+
+INSERT INTO repid_test VALUES (1, 100, 'initial');
+
+-- Advance slot past insert (filter out transaction boundaries for stability)
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data NOT LIKE 'BEGIN %' AND data NOT LIKE 'COMMIT %';
+
+-- Update any column - should log ALL columns in old key
+UPDATE repid_test SET data = 'updated' WHERE id = 1;
+
+-- Check logical decoding output - should see old key with all columns
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data LIKE '%UPDATE%';
+
+-- REPLICA IDENTITY USING INDEX (index columns only)
+DROP TABLE repid_test;
+
+CREATE TABLE repid_test (
+ id int,
+ unique_col int UNIQUE NOT NULL,
+ data text
+);
+
+ALTER TABLE repid_test REPLICA IDENTITY USING INDEX repid_test_unique_col_key;
+
+INSERT INTO repid_test VALUES (1, 100, 'initial');
+INSERT INTO repid_test VALUES (2, 200, 'initial');
+
+-- Advance slot past inserts (filter out transaction boundaries for stability)
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data NOT LIKE 'BEGIN %' AND data NOT LIKE 'COMMIT %';
+
+-- Update non-indexed column - should log only unique_col in old key
+UPDATE repid_test SET data = 'updated' WHERE unique_col = 100;
+
+-- Update id (not in replica identity index) - should still log only unique_col
+UPDATE repid_test SET id = 10 WHERE unique_col = 200;
+
+-- Check logical decoding output - should see old key with only unique_col
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data LIKE '%UPDATE%';
+
+-- REPLICA IDENTITY NOTHING (no old key logged)
+DROP TABLE repid_test;
+
+CREATE TABLE repid_test (
+ id int PRIMARY KEY,
+ data text
+);
+
+ALTER TABLE repid_test REPLICA IDENTITY NOTHING;
+
+INSERT INTO repid_test VALUES (1, 'initial');
+
+-- Advance slot past insert (filter out transaction boundaries for stability)
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data NOT LIKE 'BEGIN %' AND data NOT LIKE 'COMMIT %';
+
+-- Update - should log no old key
+UPDATE repid_test SET data = 'updated' WHERE id = 1;
+
+-- Check logical decoding output - should see update with no old key
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data LIKE '%UPDATE%';
+
+-- Multi-column index replica identity
+DROP TABLE repid_test;
+
+CREATE TABLE repid_test (
+ id int,
+ col_a int NOT NULL,
+ col_b int NOT NULL,
+ col_c int,
+ data text,
+ UNIQUE (col_a, col_b)
+);
+
+ALTER TABLE repid_test REPLICA IDENTITY USING INDEX repid_test_col_a_col_b_key;
+
+INSERT INTO repid_test VALUES (1, 10, 20, 30, 'initial');
+
+-- Advance slot past insert (filter out transaction boundaries for stability)
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data NOT LIKE 'BEGIN %' AND data NOT LIKE 'COMMIT %';
+
+-- Update non-indexed columns - should log col_a and col_b in old key
+UPDATE repid_test SET data = 'updated', col_c = 35 WHERE col_a = 10 AND col_b = 20;
+
+-- Check logical decoding output - should see old key with col_a and col_b
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data LIKE '%UPDATE%';
+
+-- TOAST/external columns in replica identity
+DROP TABLE repid_test;
+
+CREATE TABLE repid_test (
+ id int PRIMARY KEY,
+ large_text text,
+ data text
+);
+
+-- REPLICA IDENTITY FULL includes toasted columns
+ALTER TABLE repid_test REPLICA IDENTITY FULL;
+
+-- Insert a large value (large enough to show the concept without excessive output)
+INSERT INTO repid_test VALUES (1, repeat('x', 100), 'initial');
+
+-- Advance slot past insert (filter out transaction boundaries for stability)
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data NOT LIKE 'BEGIN %' AND data NOT LIKE 'COMMIT %';
+
+-- Update small column - should still log large_text column in old key
+UPDATE repid_test SET data = 'updated' WHERE id = 1;
+
+-- Check logical decoding output - verify both old and new values are logged
+-- Just check that UPDATE happened and includes both large_text and data columns
+SELECT COUNT(*) as update_count FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data LIKE '%UPDATE%' AND data LIKE '%large_text%' AND data LIKE '%old-key%';
+
+-- Test TOAST columns with REPLICA IDENTITY USING INDEX
+DROP TABLE repid_test;
+
+CREATE TABLE repid_test (
+ id int PRIMARY KEY,
+ indexed_large_text text NOT NULL,
+ data text
+);
+
+-- Create unique index on the large text column
+CREATE UNIQUE INDEX repid_test_large_idx ON repid_test(indexed_large_text);
+
+-- Set replica identity to use the index (not FULL)
+ALTER TABLE repid_test REPLICA IDENTITY USING INDEX repid_test_large_idx;
+
+-- Insert a large value (large enough to be TOASTed)
+INSERT INTO repid_test VALUES (1, repeat('x', 100000), 'initial');
+
+-- Advance slot past inserts
+SELECT COUNT(*) FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data NOT LIKE 'BEGIN %' AND data NOT LIKE 'COMMIT %';
+
+-- Update non-indexed column - should still log indexed_large_text in old key
+-- despite being unmodified because it is TOASTed and in the replica key
+UPDATE repid_test SET data = 'updated' WHERE id = 1;
+
+-- Verify TOASTed indexed column part of the relica identity is logged in old key
+SELECT COUNT(*) AS toasted_index_logged FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data LIKE '%UPDATE%' AND data LIKE '%indexed_large_text%';
+-- Dropped columns and replica identity
+DROP TABLE repid_test;
+
+CREATE TABLE repid_test (
+ id int PRIMARY KEY,
+ dropped_col int,
+ kept_col int,
+ data text
+);
+
+ALTER TABLE repid_test REPLICA IDENTITY FULL;
+
+INSERT INTO repid_test VALUES (1, 999, 100, 'initial');
+
+-- Drop a column
+ALTER TABLE repid_test DROP COLUMN dropped_col;
+
+-- Advance slot past insert and DDL (filter out transaction boundaries for stability)
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data NOT LIKE 'BEGIN %' AND data NOT LIKE 'COMMIT %';
+
+-- Update - old key should handle dropped column
+UPDATE repid_test SET data = 'updated' WHERE id = 1;
+
+-- Check logical decoding output
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data LIKE '%UPDATE%';
+
+-- DEFAULT replica identity with composite primary key
+DROP TABLE repid_test;
+
+CREATE TABLE repid_test (
+ id_a int,
+ id_b int,
+ indexed_col int,
+ data text,
+ PRIMARY KEY (id_a, id_b)
+);
+
+CREATE INDEX repid_test_idx ON repid_test(indexed_col);
+
+INSERT INTO repid_test VALUES (1, 10, 100, 'initial');
+
+-- Advance slot past insert (filter out transaction boundaries for stability)
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data NOT LIKE 'BEGIN %' AND data NOT LIKE 'COMMIT %';
+
+-- Update non-key columns - should log both id_a and id_b in old key
+UPDATE repid_test SET data = 'updated', indexed_col = 150 WHERE id_a = 1 AND id_b = 10;
+
+-- Check logical decoding output - should see old key with both primary key columns
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data LIKE '%UPDATE%';
+
+-- Expression index and replica identity
+DROP TABLE repid_test;
+
+CREATE TABLE repid_test (
+ id int PRIMARY KEY,
+ email text NOT NULL,
+ data text
+);
+
+-- Create unique expression index
+CREATE UNIQUE INDEX repid_test_lower_email_idx ON repid_test(lower(email));
+
+-- Cannot use expression index for replica identity (should fail)
+-- PostgreSQL requires the index to be on simple column references
+-- This should produce an error
+DO $$
+BEGIN
+ ALTER TABLE repid_test REPLICA IDENTITY USING INDEX repid_test_lower_email_idx;
+ RAISE EXCEPTION 'Should have failed - expression indexes cannot be used for replica identity';
+EXCEPTION
+ WHEN feature_not_supported THEN
+ RAISE NOTICE 'Correctly rejected expression index for replica identity';
+END$$;
+
+-- Use FULL instead
+ALTER TABLE repid_test REPLICA IDENTITY FULL;
+
+INSERT INTO repid_test VALUES (1, '[email protected]', 'initial');
+
+-- Advance slot past insert (filter out transaction boundaries for stability)
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data NOT LIKE 'BEGIN %' AND data NOT LIKE 'COMMIT %';
+
+-- Update - should log all columns in old key
+UPDATE repid_test SET data = 'updated' WHERE id = 1;
+
+-- Check logical decoding output
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data LIKE '%UPDATE%';
+
+-- NULL values in replica identity columns
+DROP TABLE repid_test;
+
+CREATE TABLE repid_test (
+ id int PRIMARY KEY,
+ nullable_col int,
+ data text
+);
+
+ALTER TABLE repid_test REPLICA IDENTITY FULL;
+
+INSERT INTO repid_test VALUES (1, NULL, 'initial');
+
+-- Advance slot past insert (filter out transaction boundaries for stability)
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data NOT LIKE 'BEGIN %' AND data NOT LIKE 'COMMIT %';
+
+-- Update - old key should include NULL value
+UPDATE repid_test SET data = 'updated' WHERE id = 1;
+
+-- Check logical decoding output - should see old key with NULL
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data LIKE '%UPDATE%';
+
+-- Generated columns and replica identity
+DROP TABLE repid_test;
+
+CREATE TABLE repid_test (
+ id int PRIMARY KEY,
+ base_col int,
+ generated_col int GENERATED ALWAYS AS (base_col * 2) STORED,
+ data text
+);
+
+ALTER TABLE repid_test REPLICA IDENTITY FULL;
+
+INSERT INTO repid_test (id, base_col, data) VALUES (1, 50, 'initial');
+
+-- Advance slot past insert (filter out transaction boundaries for stability)
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data NOT LIKE 'BEGIN %' AND data NOT LIKE 'COMMIT %';
+
+-- Update base_col - generated_col will change automatically
+UPDATE repid_test SET base_col = 60 WHERE id = 1;
+
+-- Check logical decoding output - should include old generated_col value
+SELECT data FROM pg_logical_slot_get_changes('repid_test_slot', NULL, NULL)
+WHERE data LIKE '%UPDATE%';
+
+-- Cleanup
+SELECT pg_drop_replication_slot('repid_test_slot');
+DROP TABLE repid_test;
--
2.51.2
[text/x-patch] v33-0002-Idenfity-modified-indexed-attributes-in-the-exec.patch (59.5K, 3-v33-0002-Idenfity-modified-indexed-attributes-in-the-exec.patch)
download | inline diff:
From 0ef7260343415f98201108778d0793af5ebcf1b3 Mon Sep 17 00:00:00 2001
From: Greg Burd <[email protected]>
Date: Sun, 2 Nov 2025 11:36:20 -0500
Subject: [PATCH v33 2/2] Idenfity modified indexed attributes in the executor
on UPDATE
Refactor executor update logic to determine which indexed columns have
actually changed during an UPDATE operation rather than leaving this up
to HeapDetermineColumnsInfo() in heap_update(). Finding this set of
attributes is not heap-specific, but more general to all table AMs and
having this information in the executor could inform other decisions
about when index inserts are required and when they are not regardless
of the table AM's MVCC implementation strategy.
The heap-only tuple decision (HOT) in heap functions as it always has,
but the determination of the "modified indexed attributes" (mix_attrs,
was known as modified_attrs) now happens outside the buffer lock and can
inform other decisions unrelated to heap.
ExecUpdateModIdxAttrs() replaces HeapDeterminesColumnsInfo() and is
called before table_tuple_update() crucially without the need for an
exclusive buffer lock on the page that holds the tuple being updated.
This reduces the time the lock is held later within
heapam_tuple_update() and heap_update().
ExecUpdateModIdxAttrs() in turn uses ExecCompareSlotAttrs() to identify
which attributes have changed and then intersects that with the set of
indexed attributes to identify the modified indexed set, the mix_attrs.
Besides identifying the set of modified indexed attributes
HeapDetermineColumnsInfo() was also responsible for part of the logic
involed in the decision to include the replica identity key or not.
This moved into heap_update() and out of HeapDetermineColumnsInfo()
which has been renamed to HeapUpdateModIdxAttrs() as it is still
required within simple_heap_update() to be able to identify mix_attrs
given only an old TID and a new HeapTuple.
Updates stemming from logical replication also use the new
ExecUpdateModIdxAttrs() in ExecSimpleRelationUpdate().
This patch also introduces a few helper functions: HeapUpdateHotAllowable(),
HeapUpdateDetermineLockmode(). These are used in both heap_update() and
simple_heap_update().
The heap_update() function is called now with lockmode pre-determined
and a booleaning indicating if the update allows HOT updates or not.
If during heap_update() the new tuple will fit on the same page and that
boolean is true, the update is HOT. None of the logic related to when
HOT is allowed has changed.
Triggers are free to use heap_modify_tuple() and update attributes not
found in the UPDATE statement or triggers that fire due to an UPDATE.
When that happens the executor has no knowledge of those changes. This
forced HeapDetermineColumnsInfo() to scan all indexed attributes on a
relation rather than only the intersection of indexed and those
identified by ExecGetAllUpdatedCols(). This occurs in at least one test
that uses the tsvector_update_trigger() function (tsearch.sql).
ExecBRUpdateTriggers() has been changed to identify changes to indexed
columns not found by ExecGetAllUpdateCols() and add those attributes to
ri_extraUpdatedCols.
Three tests were adjusted to avoid instability due to tuple ordering
during heap page scans. This avoids nondeterministic results.
---
src/backend/access/heap/heapam.c | 481 +++++++++++-------
src/backend/access/heap/heapam_handler.c | 32 +-
src/backend/access/table/tableam.c | 5 +-
src/backend/commands/trigger.c | 20 +-
src/backend/executor/execReplication.c | 7 +-
src/backend/executor/execTuples.c | 78 +++
src/backend/executor/nodeModifyTable.c | 93 +++-
src/backend/utils/cache/relcache.c | 44 +-
src/include/access/heapam.h | 13 +-
src/include/access/tableam.h | 8 +-
src/include/executor/executor.h | 9 +
src/include/utils/rel.h | 2 +-
src/include/utils/relcache.h | 2 +-
.../regress/expected/generated_virtual.out | 2 +-
src/test/regress/expected/triggers.out | 16 +-
src/test/regress/expected/updatable_views.out | 4 +-
src/test/regress/sql/generated_virtual.sql | 2 +-
src/test/regress/sql/triggers.sql | 4 +-
src/test/regress/sql/updatable_views.sql | 2 +-
19 files changed, 591 insertions(+), 233 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a231563f0df..18961d714a3 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -37,14 +37,19 @@
#include "access/multixact.h"
#include "access/subtrans.h"
#include "access/syncscan.h"
+#include "access/sysattr.h"
+#include "access/tableam.h"
#include "access/valid.h"
#include "access/visibilitymap.h"
#include "access/xloginsert.h"
#include "catalog/pg_database.h"
#include "catalog/pg_database_d.h"
#include "commands/vacuum.h"
+#include "executor/tuptable.h"
+#include "nodes/lockoptions.h"
#include "pgstat.h"
#include "port/pg_bitutils.h"
+#include "storage/buf.h"
#include "storage/lmgr.h"
#include "storage/predicate.h"
#include "storage/proc.h"
@@ -52,6 +57,7 @@
#include "utils/datum.h"
#include "utils/injection_point.h"
#include "utils/inval.h"
+#include "utils/relcache.h"
#include "utils/spccache.h"
#include "utils/syscache.h"
@@ -68,11 +74,8 @@ static void check_lock_if_inplace_updateable_rel(Relation relation,
HeapTuple newtup);
static void check_inplace_rel_lock(HeapTuple oldtup);
#endif
-static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
- Bitmapset *interesting_cols,
- Bitmapset *external_cols,
- HeapTuple oldtup, HeapTuple newtup,
- bool *has_external);
+static Bitmapset *HeapUpdateModIdxAttrs(Relation relation,
+ HeapTuple oldtup, HeapTuple newtup);
static bool heap_acquire_tuplock(Relation relation, const ItemPointerData *tid,
LockTupleMode mode, LockWaitPolicy wait_policy,
bool *have_tuple_lock);
@@ -3302,7 +3305,7 @@ simple_heap_delete(Relation relation, const ItemPointerData *tid)
* heap_update - replace a tuple
*
* See table_tuple_update() for an explanation of the parameters, except that
- * this routine directly takes a tuple rather than a slot.
+ * this routine directly takes a heap tuple rather than a slot.
*
* In the failure cases, the routine fills *tmfd with the tuple's t_ctid,
* t_xmax (resolving a possible MultiXact, if necessary), and t_cmax (the last
@@ -3312,17 +3315,13 @@ simple_heap_delete(Relation relation, const ItemPointerData *tid)
TM_Result
heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
CommandId cid, Snapshot crosscheck, bool wait,
- TM_FailureData *tmfd, LockTupleMode *lockmode,
- TU_UpdateIndexes *update_indexes)
+ TM_FailureData *tmfd, const LockTupleMode lockmode,
+ const Bitmapset *mix_attrs, const bool hot_allowed)
{
TM_Result result;
TransactionId xid = GetCurrentTransactionId();
- Bitmapset *hot_attrs;
- Bitmapset *sum_attrs;
- Bitmapset *key_attrs;
- Bitmapset *id_attrs;
- Bitmapset *interesting_attrs;
- Bitmapset *modified_attrs;
+ Bitmapset *idx_attrs,
+ *rid_attrs;
ItemId lp;
HeapTupleData oldtup;
HeapTuple heaptup;
@@ -3341,13 +3340,12 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
bool have_tuple_lock = false;
bool iscombo;
bool use_hot_update = false;
- bool summarized_update = false;
bool key_intact;
bool all_visible_cleared = false;
bool all_visible_cleared_new = false;
bool checked_lockers;
bool locker_remains;
- bool id_has_external = false;
+ bool rep_id_key_required = false;
TransactionId xmax_new_tuple,
xmax_old_tuple;
uint16 infomask_old_tuple,
@@ -3378,33 +3376,14 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
#endif
/*
- * Fetch the list of attributes to be checked for various operations.
- *
- * For HOT considerations, this is wasted effort if we fail to update or
- * have to put the new tuple on a different page. But we must compute the
- * list before obtaining buffer lock --- in the worst case, if we are
- * doing an update on one of the relevant system catalogs, we could
- * deadlock if we try to fetch the list later. In any case, the relcache
- * caches the data so this is usually pretty cheap.
- *
- * We also need columns used by the replica identity and columns that are
- * considered the "key" of rows in the table.
+ * Fetch the attributes used across all indexes on this relation as well
+ * as the replica identity and columns.
*
- * Note that we get copies of each bitmap, so we need not worry about
- * relcache flush happening midway through.
- */
- hot_attrs = RelationGetIndexAttrBitmap(relation,
- INDEX_ATTR_BITMAP_HOT_BLOCKING);
- sum_attrs = RelationGetIndexAttrBitmap(relation,
- INDEX_ATTR_BITMAP_SUMMARIZED);
- key_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_KEY);
- id_attrs = RelationGetIndexAttrBitmap(relation,
- INDEX_ATTR_BITMAP_IDENTITY_KEY);
- interesting_attrs = NULL;
- interesting_attrs = bms_add_members(interesting_attrs, hot_attrs);
- interesting_attrs = bms_add_members(interesting_attrs, sum_attrs);
- interesting_attrs = bms_add_members(interesting_attrs, key_attrs);
- interesting_attrs = bms_add_members(interesting_attrs, id_attrs);
+ * NOTE: relcache returns copies of each bitmap, so we need not worry
+ * about relcache flush happening midway through.
+ */
+ idx_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_INDEXED);
+ rid_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_IDENTITY_KEY);
block = ItemPointerGetBlockNumber(otid);
INJECTION_POINT("heap_update-before-pin", NULL);
@@ -3458,20 +3437,17 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
tmfd->ctid = *otid;
tmfd->xmax = InvalidTransactionId;
tmfd->cmax = InvalidCommandId;
- *update_indexes = TU_None;
- bms_free(hot_attrs);
- bms_free(sum_attrs);
- bms_free(key_attrs);
- bms_free(id_attrs);
- /* modified_attrs not yet initialized */
- bms_free(interesting_attrs);
+ bms_free(rid_attrs);
+ bms_free(idx_attrs);
+ /* mix_attrs is owned by the caller, don't free it */
+
return TM_Deleted;
}
/*
- * Fill in enough data in oldtup for HeapDetermineColumnsInfo to work
- * properly.
+ * Fill in enough data in oldtup to determine replica identity attribute
+ * requirements.
*/
oldtup.t_tableOid = RelationGetRelid(relation);
oldtup.t_data = (HeapTupleHeader) PageGetItem(page, lp);
@@ -3482,16 +3458,59 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
newtup->t_tableOid = RelationGetRelid(relation);
/*
- * Determine columns modified by the update. Additionally, identify
- * whether any of the unmodified replica identity key attributes in the
- * old tuple is externally stored or not. This is required because for
- * such attributes the flattened value won't be WAL logged as part of the
- * new tuple so we must include it as part of the old_key_tuple. See
- * ExtractReplicaIdentity.
+ * ExtractReplicaIdentity() needs to know if a modified indexed attrbute
+ * is used as a replica indentity or if any of the replica identity
+ * attributes are referenced in an index, unmodified, and are stored
+ * externally in the old tuple being replaced. In those cases it may be
+ * necessary to WAL log them to so they are available to replicas.
*/
- modified_attrs = HeapDetermineColumnsInfo(relation, interesting_attrs,
- id_attrs, &oldtup,
- newtup, &id_has_external);
+ rep_id_key_required = bms_overlap(mix_attrs, rid_attrs);
+ if (!rep_id_key_required)
+ {
+ Bitmapset *attrs;
+ TupleDesc tupdesc = RelationGetDescr(relation);
+ int attidx = -1;
+
+ /*
+ * Reduce the set under review to only the unmodified indexed replica
+ * identity key attributes. idx_attrs is copied (by bms_difference())
+ * not modified here.
+ */
+ attrs = bms_difference(idx_attrs, mix_attrs);
+ attrs = bms_int_members(attrs, rid_attrs);
+
+ while ((attidx = bms_next_member(attrs, attidx)) >= 0)
+ {
+ /*
+ * attidx is zero-based, attrnum is the normal attribute number
+ */
+ AttrNumber attrnum = attidx + FirstLowInvalidHeapAttributeNumber;
+ Datum value;
+ bool isnull;
+
+ /*
+ * System attributes are not added into INDEX_ATTR_BITMAP_INDEXED
+ * bitmap by relcache.
+ */
+ Assert(attrnum > 0);
+
+ value = heap_getattr(&oldtup, attrnum, tupdesc, &isnull);
+
+ /* No need to check attributes that can't be stored externally */
+ if (isnull ||
+ TupleDescCompactAttr(tupdesc, attrnum - 1)->attlen != -1)
+ continue;
+
+ /* Check if the old tuple's attribute is stored externally */
+ if (VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value)))
+ {
+ rep_id_key_required = true;
+ break;
+ }
+ }
+
+ bms_free(attrs);
+ }
/*
* If we're not updating any "key" column, we can grab a weaker lock type.
@@ -3504,9 +3523,8 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
* is updates that don't manipulate key columns, not those that
* serendipitously arrive at the same key values.
*/
- if (!bms_overlap(modified_attrs, key_attrs))
+ if (lockmode == LockTupleNoKeyExclusive)
{
- *lockmode = LockTupleNoKeyExclusive;
mxact_status = MultiXactStatusNoKeyUpdate;
key_intact = true;
@@ -3523,7 +3541,7 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
}
else
{
- *lockmode = LockTupleExclusive;
+ Assert(lockmode == LockTupleExclusive);
mxact_status = MultiXactStatusUpdate;
key_intact = false;
}
@@ -3534,7 +3552,6 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
* with the new tuple's location, so there's great risk of confusion if we
* use otid anymore.
*/
-
l2:
checked_lockers = false;
locker_remains = false;
@@ -3602,7 +3619,7 @@ l2:
bool current_is_member = false;
if (DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
- *lockmode, ¤t_is_member))
+ lockmode, ¤t_is_member))
{
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
@@ -3611,7 +3628,7 @@ l2:
* requesting a lock and already have one; avoids deadlock).
*/
if (!current_is_member)
- heap_acquire_tuplock(relation, &(oldtup.t_self), *lockmode,
+ heap_acquire_tuplock(relation, &(oldtup.t_self), lockmode,
LockWaitBlock, &have_tuple_lock);
/* wait for multixact */
@@ -3696,7 +3713,7 @@ l2:
* lock.
*/
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
- heap_acquire_tuplock(relation, &(oldtup.t_self), *lockmode,
+ heap_acquire_tuplock(relation, &(oldtup.t_self), lockmode,
LockWaitBlock, &have_tuple_lock);
XactLockTableWait(xwait, relation, &oldtup.t_self,
XLTW_Update);
@@ -3756,17 +3773,14 @@ l2:
tmfd->cmax = InvalidCommandId;
UnlockReleaseBuffer(buffer);
if (have_tuple_lock)
- UnlockTupleTuplock(relation, &(oldtup.t_self), *lockmode);
+ UnlockTupleTuplock(relation, &(oldtup.t_self), lockmode);
if (vmbuffer != InvalidBuffer)
ReleaseBuffer(vmbuffer);
- *update_indexes = TU_None;
- bms_free(hot_attrs);
- bms_free(sum_attrs);
- bms_free(key_attrs);
- bms_free(id_attrs);
- bms_free(modified_attrs);
- bms_free(interesting_attrs);
+ bms_free(rid_attrs);
+ bms_free(idx_attrs);
+ /* mix_attrs is owned by the caller, don't free it */
+
return result;
}
@@ -3796,7 +3810,7 @@ l2:
compute_new_xmax_infomask(HeapTupleHeaderGetRawXmax(oldtup.t_data),
oldtup.t_data->t_infomask,
oldtup.t_data->t_infomask2,
- xid, *lockmode, true,
+ xid, lockmode, true,
&xmax_old_tuple, &infomask_old_tuple,
&infomask2_old_tuple);
@@ -3913,7 +3927,7 @@ l2:
compute_new_xmax_infomask(HeapTupleHeaderGetRawXmax(oldtup.t_data),
oldtup.t_data->t_infomask,
oldtup.t_data->t_infomask2,
- xid, *lockmode, false,
+ xid, lockmode, false,
&xmax_lock_old_tuple, &infomask_lock_old_tuple,
&infomask2_lock_old_tuple);
@@ -4073,37 +4087,16 @@ l2:
/*
* At this point newbuf and buffer are both pinned and locked, and newbuf
- * has enough space for the new tuple. If they are the same buffer, only
- * one pin is held.
+ * has enough space for the new tuple so we can use the HOT update path if
+ * the caller determined that it is allowable.
+ *
+ * NOTE: If newbuf == buffer then only one pin is held.
*/
-
- if (newbuf == buffer)
- {
- /*
- * Since the new tuple is going into the same page, we might be able
- * to do a HOT update. Check if any of the index columns have been
- * changed.
- */
- if (!bms_overlap(modified_attrs, hot_attrs))
- {
- use_hot_update = true;
-
- /*
- * If none of the columns that are used in hot-blocking indexes
- * were updated, we can apply HOT, but we do still need to check
- * if we need to update the summarizing indexes, and update those
- * indexes if the columns were updated, or we may fail to detect
- * e.g. value bound changes in BRIN minmax indexes.
- */
- if (bms_overlap(modified_attrs, sum_attrs))
- summarized_update = true;
- }
- }
+ if ((newbuf == buffer) && hot_allowed)
+ use_hot_update = true;
else
- {
/* Set a hint that the old page could use prune/defrag */
PageSetFull(page);
- }
/*
* Compute replica identity tuple before entering the critical section so
@@ -4113,8 +4106,7 @@ l2:
* columns are modified or it has external data.
*/
old_key_tuple = ExtractReplicaIdentity(relation, &oldtup,
- bms_overlap(modified_attrs, id_attrs) ||
- id_has_external,
+ rep_id_key_required,
&old_key_copied);
/* NO EREPORT(ERROR) from here till changes are logged */
@@ -4243,7 +4235,7 @@ l2:
* Release the lmgr tuple lock, if we had it.
*/
if (have_tuple_lock)
- UnlockTupleTuplock(relation, &(oldtup.t_self), *lockmode);
+ UnlockTupleTuplock(relation, &(oldtup.t_self), lockmode);
pgstat_count_heap_update(relation, use_hot_update, newbuf != buffer);
@@ -4257,31 +4249,12 @@ l2:
heap_freetuple(heaptup);
}
- /*
- * If it is a HOT update, the update may still need to update summarized
- * indexes, lest we fail to update those summaries and get incorrect
- * results (for example, minmax bounds of the block may change with this
- * update).
- */
- if (use_hot_update)
- {
- if (summarized_update)
- *update_indexes = TU_Summarizing;
- else
- *update_indexes = TU_None;
- }
- else
- *update_indexes = TU_All;
-
if (old_key_tuple != NULL && old_key_copied)
heap_freetuple(old_key_tuple);
- bms_free(hot_attrs);
- bms_free(sum_attrs);
- bms_free(key_attrs);
- bms_free(id_attrs);
- bms_free(modified_attrs);
- bms_free(interesting_attrs);
+ bms_free(rid_attrs);
+ bms_free(idx_attrs);
+ /* mix_attrs is owned by the caller, don't free it */
return TM_Ok;
}
@@ -4454,28 +4427,113 @@ heap_attr_equals(TupleDesc tupdesc, int attrnum, Datum value1, Datum value2,
}
/*
- * Check which columns are being updated.
- *
- * Given an updated tuple, determine (and return into the output bitmapset),
- * from those listed as interesting, the set of columns that changed.
- *
- * has_external indicates if any of the unmodified attributes (from those
- * listed as interesting) of the old tuple is a member of external_cols and is
- * stored externally.
+ * HOT updates are possible when either: a) there are no modified indexed
+ * attributes, or b) the modified attributes are all on summarizing indexes.
+ * Later, in heap_update(), we can choose to perform a HOT update if there is
+ * space on the page for the new tuple and the following code has determined
+ * that HOT is allowed.
+ */
+bool
+HeapUpdateHotAllowable(Relation relation, const Bitmapset *mix_attrs,
+ bool *summarized_only)
+{
+ bool hot_allowed;
+
+ /*
+ * Let's be optimistic and start off by assuming the best case, no indexes
+ * need updating and HOT is allowable.
+ */
+ hot_allowed = true;
+ *summarized_only = false;
+
+ /*
+ * Check for case (a); when there are no modified index attributes HOT is
+ * allowed.
+ */
+ if (bms_is_empty(mix_attrs))
+ hot_allowed = true;
+ else
+ {
+ Bitmapset *sum_attrs = RelationGetIndexAttrBitmap(relation,
+ INDEX_ATTR_BITMAP_SUMMARIZED);
+
+ /*
+ * At least one index attribute was modified, but is this case (b)
+ * where all the modified index attributes are only used by
+ * summarizing indexes? If that's the case we need to update those
+ * indexes, but this can be a HOT update.
+ */
+ if (bms_is_subset(mix_attrs, sum_attrs))
+ {
+ hot_allowed = true;
+ *summarized_only = true;
+ }
+ else
+ {
+ /*
+ * Now we know that one or more indexed attribute were updated and
+ * that there was at least one of those attributes were referenced
+ * by a non-summarizing index. HOT is not allowed.
+ */
+ hot_allowed = false;
+ }
+
+ bms_free(sum_attrs);
+ }
+
+ return hot_allowed;
+}
+
+/*
+ * If we're not updating any "key" attributes, we can grab a weaker lock type.
+ * This allows for more concurrency when we are running simultaneously with
+ * foreign key checks.
+ */
+LockTupleMode
+HeapUpdateDetermineLockmode(Relation relation, const Bitmapset *mix_attrs)
+{
+ LockTupleMode lockmode = LockTupleExclusive;
+
+ Bitmapset *key_attrs = RelationGetIndexAttrBitmap(relation,
+ INDEX_ATTR_BITMAP_KEY);
+
+ if (!bms_overlap(mix_attrs, key_attrs))
+ lockmode = LockTupleNoKeyExclusive;
+
+ bms_free(key_attrs);
+
+ return lockmode;
+}
+
+/*
+ * Return a Bitmapset that contains the set of modified (changed) indexed
+ * attributes between oldtup and newtup.
*/
static Bitmapset *
-HeapDetermineColumnsInfo(Relation relation,
- Bitmapset *interesting_cols,
- Bitmapset *external_cols,
- HeapTuple oldtup, HeapTuple newtup,
- bool *has_external)
+HeapUpdateModIdxAttrs(Relation relation, HeapTuple oldtup, HeapTuple newtup)
{
int attidx;
- Bitmapset *modified = NULL;
+ Bitmapset *attrs,
+ *mix_attrs = NULL;
TupleDesc tupdesc = RelationGetDescr(relation);
+ /* Get the set of all attributes across all indexes for this relation */
+ attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_INDEXED);
+
+ /* No indexed attributes, we're done */
+ if (bms_is_empty(attrs))
+ return NULL;
+
+ /*
+ * This heap update function is used outside the executor and so unlike
+ * heapam_tuple_update() where there is ResultRelInfo and EState to
+ * provide the concise set of attributes that might have been modified
+ * (via ExecGetAllUpdatedCols()) we simply check all indexed attributes to
+ * find the subset that changed value. That's the "modified indexed
+ * attributes" or "mix_attrs".
+ */
attidx = -1;
- while ((attidx = bms_next_member(interesting_cols, attidx)) >= 0)
+ while ((attidx = bms_next_member(attrs, attidx)) >= 0)
{
/* attidx is zero-based, attrnum is the normal attribute number */
AttrNumber attrnum = attidx + FirstLowInvalidHeapAttributeNumber;
@@ -4491,7 +4549,7 @@ HeapDetermineColumnsInfo(Relation relation,
*/
if (attrnum == 0)
{
- modified = bms_add_member(modified, attidx);
+ mix_attrs = bms_add_member(mix_attrs, attidx);
continue;
}
@@ -4504,7 +4562,7 @@ HeapDetermineColumnsInfo(Relation relation,
{
if (attrnum != TableOidAttributeNumber)
{
- modified = bms_add_member(modified, attidx);
+ mix_attrs = bms_add_member(mix_attrs, attidx);
continue;
}
}
@@ -4520,29 +4578,12 @@ HeapDetermineColumnsInfo(Relation relation,
if (!heap_attr_equals(tupdesc, attrnum, value1,
value2, isnull1, isnull2))
- {
- modified = bms_add_member(modified, attidx);
- continue;
- }
-
- /*
- * No need to check attributes that can't be stored externally. Note
- * that system attributes can't be stored externally.
- */
- if (attrnum < 0 || isnull1 ||
- TupleDescCompactAttr(tupdesc, attrnum - 1)->attlen != -1)
- continue;
-
- /*
- * Check if the old tuple's attribute is stored externally and is a
- * member of external_cols.
- */
- if (VARATT_IS_EXTERNAL((varlena *) DatumGetPointer(value1)) &&
- bms_is_member(attidx, external_cols))
- *has_external = true;
+ mix_attrs = bms_add_member(mix_attrs, attidx);
}
- return modified;
+ bms_free(attrs);
+
+ return mix_attrs;
}
/*
@@ -4554,17 +4595,108 @@ HeapDetermineColumnsInfo(Relation relation,
* via ereport().
*/
void
-simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tup,
+simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tuple,
TU_UpdateIndexes *update_indexes)
{
TM_Result result;
TM_FailureData tmfd;
LockTupleMode lockmode;
+ TupleTableSlot *slot;
+ BufferHeapTupleTableSlot *bslot;
+ HeapTuple oldtup;
+ bool shouldFree = true;
+ Bitmapset *idx_attrs,
+ *mix_attrs;
+ bool hot_allowed,
+ summarized_only;
+ Buffer buffer;
- result = heap_update(relation, otid, tup,
- GetCurrentCommandId(true), InvalidSnapshot,
- true /* wait for commit */ ,
- &tmfd, &lockmode, update_indexes);
+ Assert(ItemPointerIsValid(otid));
+
+ /*
+ * Fetch this bitmap of interesting attributes from relcache before
+ * obtaining a buffer lock because if we are doing an update on one of the
+ * relevant system catalogs we could deadlock if we try to fetch them
+ * later on. Relcache will return copies of each bitmap, so we need not
+ * worry about relcache flush happening midway through this operation.
+ */
+ idx_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_INDEXED);
+
+ INJECTION_POINT("heap_update-before-pin", NULL);
+
+ /*
+ * To update a heap tuple we need to find the set of modified indexed
+ * attributes ("mix_attrs") so as to see if a HOT update is allowable or
+ * not. When updating heap tuples via execution of UPDATE statements this
+ * set is constructed before calling into the table AM's tuple_update()
+ * function by the function ExecUpdateModIdxAttrs() which compares the
+ * old/new TupleTableSlots. However, here we have the old TID and the new
+ * tuple, not two TupleTableSlots, but we still need to construct a simlar
+ * bitmap so as to be able to know if HOT updates are allowed or not. To
+ * do that we first have to fetch the old tuple itself. Because
+ * heapam_fetch_row_version() is static, we have to replicate that code
+ * here. This is a bit repetitive because heap_update() will again find
+ * and form the old HeapTuple from the old TID and in most cases the
+ * callers (ignoring extensions, always catalog tuple updates) already had
+ * the set of changed attributes (e.g. the "replaces" array), but for now
+ * this minor repetition of work is necessary.
+ */
+
+ slot = MakeTupleTableSlot(RelationGetDescr(relation), &TTSOpsBufferHeapTuple);
+ bslot = (BufferHeapTupleTableSlot *) slot;
+
+ /*
+ * Set the TID in the slot and then fetch the old tuple so we can examine
+ * it
+ */
+ bslot->base.tupdata.t_self = *otid;
+ if (!heap_fetch(relation, SnapshotAny, &bslot->base.tupdata, &buffer, false))
+ {
+ /*
+ * heap_update() checks for !ItemIdIsNormal(lp) and will return false
+ * in those cases.
+ */
+ Assert(RelationSupportsSysCache(RelationGetRelid(relation)));
+
+ *update_indexes = TU_None;
+
+ /* mix_attrs not yet initialized */
+ bms_free(idx_attrs);
+ ExecDropSingleTupleTableSlot(slot);
+
+ elog(ERROR, "tuple concurrently deleted");
+
+ return;
+ }
+
+ Assert(buffer != InvalidBuffer);
+
+ /* Store in slot, transferring existing pin */
+ ExecStorePinnedBufferHeapTuple(&bslot->base.tupdata, slot, buffer);
+ oldtup = ExecFetchSlotHeapTuple(slot, false, &shouldFree);
+
+ mix_attrs = HeapUpdateModIdxAttrs(relation, oldtup, tuple);
+ lockmode = HeapUpdateDetermineLockmode(relation, mix_attrs);
+ hot_allowed = HeapUpdateHotAllowable(relation, mix_attrs, &summarized_only);
+
+ result = heap_update(relation, otid, tuple, GetCurrentCommandId(true),
+ InvalidSnapshot, true /* wait for commit */ ,
+ &tmfd, lockmode, mix_attrs, hot_allowed);
+
+ if (shouldFree)
+ heap_freetuple(oldtup);
+
+ ExecDropSingleTupleTableSlot(slot);
+ bms_free(idx_attrs);
+
+ /*
+ * Decide whether new index entries are needed for the tuple
+ *
+ * If the update is not HOT, we must update all indexes. If the update is
+ * HOT, it could be that we updated summarized columns, so we either
+ * update only summarized indexes, or none at all.
+ */
+ *update_indexes = TU_None;
switch (result)
{
case TM_SelfModified:
@@ -4574,6 +4706,10 @@ simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tup
case TM_Ok:
/* done successfully */
+ if (!HeapTupleIsHeapOnly(tuple))
+ *update_indexes = TU_All;
+ else if (summarized_only)
+ *update_indexes = TU_Summarizing;
break;
case TM_Updated:
@@ -4590,7 +4726,6 @@ simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tup
}
}
-
/*
* Return the MultiXactStatus corresponding to the given tuple lock mode.
*/
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 3ff36f59bf8..4600af61793 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -27,7 +27,6 @@
#include "access/syncscan.h"
#include "access/tableam.h"
#include "access/tsmapi.h"
-#include "access/visibilitymap.h"
#include "access/xact.h"
#include "catalog/catalog.h"
#include "catalog/index.h"
@@ -44,6 +43,7 @@
#include "storage/procarray.h"
#include "storage/smgr.h"
#include "utils/builtins.h"
+#include "utils/injection_point.h"
#include "utils/rel.h"
static void reform_and_rewrite_tuple(HeapTuple tuple,
@@ -316,19 +316,26 @@ heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
static TM_Result
heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
CommandId cid, Snapshot snapshot, Snapshot crosscheck,
- bool wait, TM_FailureData *tmfd,
- LockTupleMode *lockmode, TU_UpdateIndexes *update_indexes)
+ bool wait, TM_FailureData *tmfd, LockTupleMode *lockmode,
+ const Bitmapset *mix_attrs, TU_UpdateIndexes *update_indexes)
{
bool shouldFree = true;
HeapTuple tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);
+ bool hot_allowed;
+ bool summarized_only;
TM_Result result;
+ Assert(ItemPointerIsValid(otid));
+
+ hot_allowed = HeapUpdateHotAllowable(relation, mix_attrs, &summarized_only);
+ *lockmode = HeapUpdateDetermineLockmode(relation, mix_attrs);
+
/* Update the tuple with table oid */
slot->tts_tableOid = RelationGetRelid(relation);
tuple->t_tableOid = slot->tts_tableOid;
result = heap_update(relation, otid, tuple, cid, crosscheck, wait,
- tmfd, lockmode, update_indexes);
+ tmfd, *lockmode, mix_attrs, hot_allowed);
ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
/*
@@ -341,16 +348,17 @@ heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
* HOT, it could be that we updated summarized columns, so we either
* update only summarized indexes, or none at all.
*/
- if (result != TM_Ok)
+ *update_indexes = TU_None;
+ if (result == TM_Ok)
{
- Assert(*update_indexes == TU_None);
- *update_indexes = TU_None;
+ if (HeapTupleIsHeapOnly(tuple))
+ {
+ if (summarized_only)
+ *update_indexes = TU_Summarizing;
+ }
+ else
+ *update_indexes = TU_All;
}
- else if (!HeapTupleIsHeapOnly(tuple))
- Assert(*update_indexes == TU_All);
- else
- Assert((*update_indexes == TU_Summarizing) ||
- (*update_indexes == TU_None));
if (shouldFree)
pfree(tuple);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index dfda1af412e..42acd5b17a9 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -359,6 +359,7 @@ void
simple_table_tuple_update(Relation rel, ItemPointer otid,
TupleTableSlot *slot,
Snapshot snapshot,
+ const Bitmapset *mix_attrs,
TU_UpdateIndexes *update_indexes)
{
TM_Result result;
@@ -369,7 +370,9 @@ simple_table_tuple_update(Relation rel, ItemPointer otid,
GetCurrentCommandId(true),
snapshot, InvalidSnapshot,
true /* wait for commit */ ,
- &tmfd, &lockmode, update_indexes);
+ &tmfd, &lockmode,
+ mix_attrs,
+ update_indexes);
switch (result)
{
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 98d402c0a3b..64efa55dfe3 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -2978,6 +2978,7 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
bool is_merge_update)
{
TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
+ TupleDesc tupdesc = RelationGetDescr(relinfo->ri_RelationDesc);
TupleTableSlot *oldslot = ExecGetTriggerOldSlot(estate, relinfo);
HeapTuple newtuple = NULL;
HeapTuple trigtuple;
@@ -2985,7 +2986,9 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
bool should_free_new = false;
TriggerData LocTriggerData = {0};
int i;
- Bitmapset *updatedCols;
+ Bitmapset *updatedCols = NULL;
+ Bitmapset *remainingCols = NULL;
+ Bitmapset *modifiedCols;
LockTupleMode lockmode;
/* Determine lock mode to use */
@@ -3127,6 +3130,21 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
if (should_free_trig)
heap_freetuple(trigtuple);
+ /*
+ * Before UPDATE triggers may have updated attributes not known to
+ * ExecGetAllUpdatedColumns() using heap_modify_tuple() or
+ * heap_modifiy_tuple_by_cols(). Find and record those now.
+ */
+ remainingCols = bms_add_range(NULL, 1 - FirstLowInvalidHeapAttributeNumber,
+ tupdesc->natts - FirstLowInvalidHeapAttributeNumber);
+ remainingCols = bms_del_members(remainingCols, updatedCols);
+ modifiedCols = ExecCompareSlotAttrs(tupdesc, remainingCols, oldslot, newslot);
+ relinfo->ri_extraUpdatedCols =
+ bms_add_members(relinfo->ri_extraUpdatedCols, modifiedCols);
+
+ bms_free(remainingCols);
+ bms_free(modifiedCols);
+
return true;
}
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 2497ee7edc5..c2e77740e76 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -33,6 +33,7 @@
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
+#include "utils/relcache.h"
#include "utils/snapmgr.h"
#include "utils/syscache.h"
#include "utils/typcache.h"
@@ -906,6 +907,7 @@ ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
bool skip_tuple = false;
Relation rel = resultRelInfo->ri_RelationDesc;
ItemPointer tid = &(searchslot->tts_tid);
+ Bitmapset *mix_attrs;
/*
* We support only non-system tables, with
@@ -944,8 +946,11 @@ ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
if (rel->rd_rel->relispartition)
ExecPartitionCheck(resultRelInfo, slot, estate, true);
+ mix_attrs = ExecUpdateModIdxAttrs(resultRelInfo,
+ estate, searchslot, slot);
+
simple_table_tuple_update(rel, tid, slot, estate->es_snapshot,
- &update_indexes);
+ mix_attrs, &update_indexes);
conflictindexes = resultRelInfo->ri_onConflictArbiterIndexes;
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index b768eae9e53..1064ebe845b 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -66,6 +66,7 @@
#include "nodes/nodeFuncs.h"
#include "storage/bufmgr.h"
#include "utils/builtins.h"
+#include "utils/datum.h"
#include "utils/expandeddatum.h"
#include "utils/lsyscache.h"
#include "utils/typcache.h"
@@ -1929,6 +1930,83 @@ ExecFetchSlotHeapTupleDatum(TupleTableSlot *slot)
return ret;
}
+/*
+ * ExecCompareSlotAttrs
+ *
+ * Compare the subset of attributes in attrs bewtween TupleTableSlots to detect
+ * which attributes have changed.
+ *
+ * Returns a Bitmapset of attribute indices (using
+ * FirstLowInvalidHeapAttributeNumber convention) that differ between the two
+ * slots.
+ */
+Bitmapset *
+ExecCompareSlotAttrs(TupleDesc tupdesc, const Bitmapset *attrs,
+ TupleTableSlot *s1, TupleTableSlot *s2)
+{
+ int attidx = -1;
+ Bitmapset *modified = NULL;
+
+ /* XXX what if slots don't share the same tupleDescriptor... */
+ /* Assert(s1->tts_tupleDescriptor == s2->tts_tupleDescriptor); */
+
+ while ((attidx = bms_next_member(attrs, attidx)) >= 0)
+ {
+ /* attidx is zero-based, attrnum is the normal attribute number */
+ AttrNumber attrnum = attidx + FirstLowInvalidHeapAttributeNumber;
+ Datum value1,
+ value2;
+ bool null1,
+ null2;
+ CompactAttribute *att;
+
+ /*
+ * If it's a whole-tuple reference, say "not equal". It's not really
+ * worth supporting this case, since it could only succeed after a
+ * no-op update, which is hardly a case worth optimizing for.
+ */
+ if (attrnum == 0)
+ {
+ modified = bms_add_member(modified, attidx);
+ continue;
+ }
+
+ /*
+ * Likewise, automatically say "not equal" for any system attribute
+ * other than tableOID; we cannot expect these to be consistent in a
+ * HOT chain, or even to be set correctly yet in the new tuple.
+ */
+ if (attrnum < 0)
+ {
+ if (attrnum != TableOidAttributeNumber)
+ {
+ modified = bms_add_member(modified, attidx);
+ continue;
+ }
+ }
+
+ att = TupleDescCompactAttr(tupdesc, attrnum - 1);
+ value1 = slot_getattr(s1, attrnum, &null1);
+ value2 = slot_getattr(s2, attrnum, &null2);
+
+ /* A change to/from NULL, so not equal */
+ if (null1 != null2)
+ {
+ modified = bms_add_member(modified, attidx);
+ continue;
+ }
+
+ /* Both NULL, no change/unmodified */
+ if (null2)
+ continue;
+
+ if (!datum_image_eq(value1, value2, att->attbyval, att->attlen))
+ modified = bms_add_member(modified, attidx);
+ }
+
+ return modified;
+}
+
/* ----------------------------------------------------------------
* convenience initialization routines
* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 793c76d4f82..4927fc88e61 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -17,6 +17,7 @@
* ExecModifyTable - retrieve the next tuple from the node
* ExecEndModifyTable - shut down the ModifyTable node
* ExecReScanModifyTable - rescan the ModifyTable node
+ * ExecUpdateModIdxAttrs - find set of updated indexed columns
*
* NOTES
* The ModifyTable node receives input from its outerPlan, which is
@@ -54,6 +55,7 @@
#include "access/htup_details.h"
#include "access/tableam.h"
+#include "access/tupdesc.h"
#include "access/xact.h"
#include "commands/trigger.h"
#include "executor/execPartition.h"
@@ -188,6 +190,68 @@ static TupleTableSlot *ExecMergeNotMatched(ModifyTableContext *context,
ResultRelInfo *resultRelInfo,
bool canSetTag);
+/*
+ * ExecUpdateModIdxAttrs
+ *
+ * Find the set of attributes referenced by this relation and used in this
+ * UPDATE that now differ in value. This is done by reviewing slot datum that
+ * are in the UPDATE statment and are known to be referenced by at least one
+ * index in some way. This set is called the "modified indexed attributes" or
+ * "mix_attrs". An overlap of a single index's attributes and this "mix" set
+ * signals that the attributes in the new_tts used to form the index datum have
+ * changed.
+ *
+ * Return a Bitmapset that contains the set of modified (changed) indexed
+ * attributes between oldtup and newtup.
+ *
+ * NOTE: There is a simlar function called HeapUpdateModIDxAttrs() that operates
+ * on the old TID and new HeapTuple rather than the old/new TupleTableSlots as
+ * this function does. These two functions should mirror one another until
+ * someday when catalog tuple updates track their changes avoiding the need to
+ * re-discover them in simple_heap_update().
+ */
+Bitmapset *
+ExecUpdateModIdxAttrs(ResultRelInfo *resultRelInfo,
+ EState *estate,
+ TupleTableSlot *old_tts,
+ TupleTableSlot *new_tts)
+{
+ Relation relation = resultRelInfo->ri_RelationDesc;
+ TupleDesc tupdesc = RelationGetDescr(relation);
+ Bitmapset *attrs,
+ *mix_attrs = NULL;
+
+ /* If no indexes, we're done */
+ if (resultRelInfo->ri_NumIndices == 0)
+ return NULL;
+
+ /*
+ * Get the set of all attributes across all indexes for this relation from
+ * the relcache, it returns us a copy of the bitmap so we can modify it.
+ */
+ attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_INDEXED);
+
+ /*
+ * Fetch the set of attributes explicity SET in the UPDATE statement or
+ * set by a before row trigger (even if not mentioned in the SQL) from the
+ * executor state and then find the intersection with the indexed
+ * attributes. Attributes that are SET might not change value, so we have
+ * to examine them for changes.
+ */
+ attrs = bms_int_members(attrs, ExecGetAllUpdatedCols(resultRelInfo, estate));
+
+ /*
+ * When there are indexed attributes mentioned in the UPDATE then we need
+ * to find the subset that changed value. That's the "modified indexed
+ * attributes" or "mix_attrs".
+ */
+ if (!bms_is_empty(attrs))
+ mix_attrs = ExecCompareSlotAttrs(tupdesc, attrs, old_tts, new_tts);
+
+ bms_free(attrs);
+
+ return mix_attrs;
+}
/*
* Verify that the tuples to be produced by INSERT match the
@@ -2195,14 +2259,17 @@ ExecUpdatePrepareSlot(ResultRelInfo *resultRelInfo,
*/
static TM_Result
ExecUpdateAct(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
- ItemPointer tupleid, HeapTuple oldtuple, TupleTableSlot *slot,
- bool canSetTag, UpdateContext *updateCxt)
+ ItemPointer tupleid, HeapTuple oldtuple, TupleTableSlot *oldSlot,
+ TupleTableSlot *slot, bool canSetTag, UpdateContext *updateCxt)
{
EState *estate = context->estate;
Relation resultRelationDesc = resultRelInfo->ri_RelationDesc;
bool partition_constraint_failed;
TM_Result result;
+ /* The set of modified indexed attributes that trigger new index entries */
+ Bitmapset *mix_attrs = NULL;
+
updateCxt->crossPartUpdate = false;
/*
@@ -2319,7 +2386,16 @@ lreplace:
ExecConstraints(resultRelInfo, slot, estate);
/*
- * replace the heap tuple
+ * Next up we need to find out the set of indexed attributes that have
+ * changed in value and should trigger a new index tuple. We could start
+ * with the set of updated columns via ExecGetUpdatedCols(), but if we do
+ * we will overlook attributes directly modified by heap_modify_tuple()
+ * which are not known to ExecGetUpdatedCols().
+ */
+ mix_attrs = ExecUpdateModIdxAttrs(resultRelInfo, estate, oldSlot, slot);
+
+ /*
+ * Call into the table AM to update the heap tuple.
*
* Note: if es_crosscheck_snapshot isn't InvalidSnapshot, we check that
* the row to be updated is visible to that snapshot, and throw a
@@ -2333,6 +2409,7 @@ lreplace:
estate->es_crosscheck_snapshot,
true /* wait for commit */ ,
&context->tmfd, &updateCxt->lockmode,
+ mix_attrs,
&updateCxt->updateIndexes);
return result;
@@ -2555,8 +2632,8 @@ ExecUpdate(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
*/
redo_act:
lockedtid = *tupleid;
- result = ExecUpdateAct(context, resultRelInfo, tupleid, oldtuple, slot,
- canSetTag, &updateCxt);
+ result = ExecUpdateAct(context, resultRelInfo, tupleid, oldtuple, oldSlot,
+ slot, canSetTag, &updateCxt);
/*
* If ExecUpdateAct reports that a cross-partition update was done,
@@ -3406,8 +3483,8 @@ lmerge_matched:
Assert(oldtuple == NULL);
result = ExecUpdateAct(context, resultRelInfo, tupleid,
- NULL, newslot, canSetTag,
- &updateCxt);
+ NULL, resultRelInfo->ri_oldTupleSlot,
+ newslot, canSetTag, &updateCxt);
/*
* As in ExecUpdate(), if ExecUpdateAct() reports that a
@@ -4539,7 +4616,7 @@ ExecModifyTable(PlanState *pstate)
* For UPDATE/DELETE/MERGE, fetch the row identity info for the tuple
* to be updated/deleted/merged. For a heap relation, that's a TID;
* otherwise we may have a wholerow junk attr that carries the old
- * tuple in toto. Keep this in step with the part of
+ * tuple in total. Keep this in step with the part of
* ExecInitModifyTable that sets up ri_RowIdAttNo.
*/
if (operation == CMD_UPDATE || operation == CMD_DELETE ||
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 6b634c9fff1..f30505d8ae3 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -2475,8 +2475,8 @@ RelationDestroyRelation(Relation relation, bool remember_tupdesc)
bms_free(relation->rd_keyattr);
bms_free(relation->rd_pkattr);
bms_free(relation->rd_idattr);
- bms_free(relation->rd_hotblockingattr);
bms_free(relation->rd_summarizedattr);
+ bms_free(relation->rd_indexedattr);
if (relation->rd_pubdesc)
pfree(relation->rd_pubdesc);
if (relation->rd_options)
@@ -5276,8 +5276,8 @@ RelationGetIndexPredicate(Relation relation)
* (beware: even if PK is deferrable!)
* INDEX_ATTR_BITMAP_IDENTITY_KEY Columns in the table's replica identity
* index (empty if FULL)
- * INDEX_ATTR_BITMAP_HOT_BLOCKING Columns that block updates from being HOT
- * INDEX_ATTR_BITMAP_SUMMARIZED Columns included in summarizing indexes
+ * INDEX_ATTR_BITMAP_SUMMARIZED Columns only included in summarizing indexes
+ * INDEX_ATTR_BITMAP_INDEXED Columns referenced by indexes
*
* Attribute numbers are offset by FirstLowInvalidHeapAttributeNumber so that
* we can include system attributes (e.g., OID) in the bitmap representation.
@@ -5300,8 +5300,8 @@ RelationGetIndexAttrBitmap(Relation relation, IndexAttrBitmapKind attrKind)
Bitmapset *uindexattrs; /* columns in unique indexes */
Bitmapset *pkindexattrs; /* columns in the primary index */
Bitmapset *idindexattrs; /* columns in the replica identity */
- Bitmapset *hotblockingattrs; /* columns with HOT blocking indexes */
- Bitmapset *summarizedattrs; /* columns with summarizing indexes */
+ Bitmapset *summarizedattrs; /* columns only in summarizing indexes */
+ Bitmapset *indexedattrs; /* columns referenced by indexes */
List *indexoidlist;
List *newindexoidlist;
Oid relpkindex;
@@ -5320,10 +5320,10 @@ RelationGetIndexAttrBitmap(Relation relation, IndexAttrBitmapKind attrKind)
return bms_copy(relation->rd_pkattr);
case INDEX_ATTR_BITMAP_IDENTITY_KEY:
return bms_copy(relation->rd_idattr);
- case INDEX_ATTR_BITMAP_HOT_BLOCKING:
- return bms_copy(relation->rd_hotblockingattr);
case INDEX_ATTR_BITMAP_SUMMARIZED:
return bms_copy(relation->rd_summarizedattr);
+ case INDEX_ATTR_BITMAP_INDEXED:
+ return bms_copy(relation->rd_indexedattr);
default:
elog(ERROR, "unknown attrKind %u", attrKind);
}
@@ -5366,8 +5366,8 @@ restart:
uindexattrs = NULL;
pkindexattrs = NULL;
idindexattrs = NULL;
- hotblockingattrs = NULL;
summarizedattrs = NULL;
+ indexedattrs = NULL;
foreach(l, indexoidlist)
{
Oid indexOid = lfirst_oid(l);
@@ -5426,7 +5426,7 @@ restart:
if (indexDesc->rd_indam->amsummarizing)
attrs = &summarizedattrs;
else
- attrs = &hotblockingattrs;
+ attrs = &indexedattrs;
/* Collect simple attribute references */
for (i = 0; i < indexDesc->rd_index->indnatts; i++)
@@ -5435,9 +5435,9 @@ restart:
/*
* Since we have covering indexes with non-key columns, we must
- * handle them accurately here. non-key columns must be added into
- * hotblockingattrs or summarizedattrs, since they are in index,
- * and update shouldn't miss them.
+ * handle them accurately here. Non-key columns must be added into
+ * indexedattrs or summarizedattrs, since they are in index, and
+ * update shouldn't miss them.
*
* Summarizing indexes do not block HOT, but do need to be updated
* when the column value changes, thus require a separate
@@ -5498,12 +5498,20 @@ restart:
bms_free(uindexattrs);
bms_free(pkindexattrs);
bms_free(idindexattrs);
- bms_free(hotblockingattrs);
bms_free(summarizedattrs);
+ bms_free(indexedattrs);
goto restart;
}
+ /*
+ * Record what attributes are only referenced by summarizing indexes. Then
+ * add that into the other indexed attributes to track all referenced
+ * attributes.
+ */
+ summarizedattrs = bms_del_members(summarizedattrs, indexedattrs);
+ indexedattrs = bms_add_members(indexedattrs, summarizedattrs);
+
/* Don't leak the old values of these bitmaps, if any */
relation->rd_attrsvalid = false;
bms_free(relation->rd_keyattr);
@@ -5512,10 +5520,10 @@ restart:
relation->rd_pkattr = NULL;
bms_free(relation->rd_idattr);
relation->rd_idattr = NULL;
- bms_free(relation->rd_hotblockingattr);
- relation->rd_hotblockingattr = NULL;
bms_free(relation->rd_summarizedattr);
relation->rd_summarizedattr = NULL;
+ bms_free(relation->rd_indexedattr);
+ relation->rd_indexedattr = NULL;
/*
* Now save copies of the bitmaps in the relcache entry. We intentionally
@@ -5528,8 +5536,8 @@ restart:
relation->rd_keyattr = bms_copy(uindexattrs);
relation->rd_pkattr = bms_copy(pkindexattrs);
relation->rd_idattr = bms_copy(idindexattrs);
- relation->rd_hotblockingattr = bms_copy(hotblockingattrs);
relation->rd_summarizedattr = bms_copy(summarizedattrs);
+ relation->rd_indexedattr = bms_copy(indexedattrs);
relation->rd_attrsvalid = true;
MemoryContextSwitchTo(oldcxt);
@@ -5542,10 +5550,10 @@ restart:
return pkindexattrs;
case INDEX_ATTR_BITMAP_IDENTITY_KEY:
return idindexattrs;
- case INDEX_ATTR_BITMAP_HOT_BLOCKING:
- return hotblockingattrs;
case INDEX_ATTR_BITMAP_SUMMARIZED:
return summarizedattrs;
+ case INDEX_ATTR_BITMAP_INDEXED:
+ return indexedattrs;
default:
elog(ERROR, "unknown attrKind %u", attrKind);
return NULL;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 3c0961ab36b..7abc8e24f21 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -365,10 +365,9 @@ extern TM_Result heap_delete(Relation relation, const ItemPointerData *tid,
extern void heap_finish_speculative(Relation relation, const ItemPointerData *tid);
extern void heap_abort_speculative(Relation relation, const ItemPointerData *tid);
extern TM_Result heap_update(Relation relation, const ItemPointerData *otid,
- HeapTuple newtup,
- CommandId cid, Snapshot crosscheck, bool wait,
- TM_FailureData *tmfd, LockTupleMode *lockmode,
- TU_UpdateIndexes *update_indexes);
+ HeapTuple newtup, CommandId cid, Snapshot crosscheck, bool wait,
+ TM_FailureData *tmfd, const LockTupleMode lockmode,
+ const Bitmapset *mix_attrs, const bool hot_allowed);
extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
CommandId cid, LockTupleMode mode, LockWaitPolicy wait_policy,
bool follow_updates,
@@ -430,6 +429,12 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
OffsetNumber *dead, int ndead,
OffsetNumber *unused, int nunused);
+/* in heap/heapam.c */
+extern bool HeapUpdateHotAllowable(Relation relation, const Bitmapset *mix_attrs,
+ bool *summarized_only);
+extern LockTupleMode HeapUpdateDetermineLockmode(Relation relation,
+ const Bitmapset *mix_attrs);
+
/* in heap/vacuumlazy.c */
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 06084752245..19c58a76854 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -549,6 +549,7 @@ typedef struct TableAmRoutine
bool wait,
TM_FailureData *tmfd,
LockTupleMode *lockmode,
+ const Bitmapset *mix_attrs,
TU_UpdateIndexes *update_indexes);
/* see table_tuple_lock() for reference about parameters */
@@ -1523,12 +1524,12 @@ static inline TM_Result
table_tuple_update(Relation rel, ItemPointer otid, TupleTableSlot *slot,
CommandId cid, Snapshot snapshot, Snapshot crosscheck,
bool wait, TM_FailureData *tmfd, LockTupleMode *lockmode,
- TU_UpdateIndexes *update_indexes)
+ const Bitmapset *mix_attrs, TU_UpdateIndexes *update_indexes)
{
return rel->rd_tableam->tuple_update(rel, otid, slot,
cid, snapshot, crosscheck,
- wait, tmfd,
- lockmode, update_indexes);
+ wait, tmfd, lockmode,
+ mix_attrs, update_indexes);
}
/*
@@ -2009,6 +2010,7 @@ extern void simple_table_tuple_delete(Relation rel, ItemPointer tid,
Snapshot snapshot);
extern void simple_table_tuple_update(Relation rel, ItemPointer otid,
TupleTableSlot *slot, Snapshot snapshot,
+ const Bitmapset *mix_attrs,
TU_UpdateIndexes *update_indexes);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index d46ba59895d..266d5309103 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -17,6 +17,7 @@
#include "datatype/timestamp.h"
#include "executor/execdesc.h"
#include "fmgr.h"
+#include "nodes/execnodes.h"
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
@@ -606,6 +607,10 @@ extern TupleDesc ExecCleanTypeFromTL(List *targetList);
extern TupleDesc ExecTypeFromExprList(List *exprList);
extern void ExecTypeSetColNames(TupleDesc typeInfo, List *namesList);
extern void UpdateChangedParamSet(PlanState *node, Bitmapset *newchg);
+extern Bitmapset *ExecCompareSlotAttrs(TupleDesc tupdesc,
+ const Bitmapset *attrs,
+ TupleTableSlot *old_tts,
+ TupleTableSlot *new_tts);
typedef struct TupOutputState
{
@@ -803,5 +808,9 @@ extern ResultRelInfo *ExecLookupResultRelByOid(ModifyTableState *node,
Oid resultoid,
bool missing_ok,
bool update_cache);
+extern Bitmapset *ExecUpdateModIdxAttrs(ResultRelInfo *relinfo,
+ EState *estate,
+ TupleTableSlot *old_tts,
+ TupleTableSlot *new_tts);
#endif /* EXECUTOR_H */
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 236830f6b93..10e5e9044ee 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -162,8 +162,8 @@ typedef struct RelationData
Bitmapset *rd_keyattr; /* cols that can be ref'd by foreign keys */
Bitmapset *rd_pkattr; /* cols included in primary key */
Bitmapset *rd_idattr; /* included in replica identity index */
- Bitmapset *rd_hotblockingattr; /* cols blocking HOT update */
Bitmapset *rd_summarizedattr; /* cols indexed by summarizing indexes */
+ Bitmapset *rd_indexedattr; /* all cols referenced by indexes */
PublicationDesc *rd_pubdesc; /* publication descriptor, or NULL */
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 2700224939a..57b46ee54e5 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -69,8 +69,8 @@ typedef enum IndexAttrBitmapKind
INDEX_ATTR_BITMAP_KEY,
INDEX_ATTR_BITMAP_PRIMARY_KEY,
INDEX_ATTR_BITMAP_IDENTITY_KEY,
- INDEX_ATTR_BITMAP_HOT_BLOCKING,
INDEX_ATTR_BITMAP_SUMMARIZED,
+ INDEX_ATTR_BITMAP_INDEXED,
} IndexAttrBitmapKind;
extern Bitmapset *RelationGetIndexAttrBitmap(Relation relation,
diff --git a/src/test/regress/expected/generated_virtual.out b/src/test/regress/expected/generated_virtual.out
index 6dab60c937b..7ebb7890d96 100644
--- a/src/test/regress/expected/generated_virtual.out
+++ b/src/test/regress/expected/generated_virtual.out
@@ -287,7 +287,7 @@ DETAIL: Column "b" is a generated column.
INSERT INTO gtest1v VALUES (8, DEFAULT), (9, DEFAULT); -- error
ERROR: cannot insert a non-DEFAULT value into column "b"
DETAIL: Column "b" is a generated column.
-SELECT * FROM gtest1v;
+SELECT * FROM gtest1v ORDER BY a;
a | b
---+----
3 | 6
diff --git a/src/test/regress/expected/triggers.out b/src/test/regress/expected/triggers.out
index 98dee63b50a..ef98fd0cccf 100644
--- a/src/test/regress/expected/triggers.out
+++ b/src/test/regress/expected/triggers.out
@@ -959,16 +959,24 @@ NOTICE: main_view BEFORE UPDATE STATEMENT (before_view_upd_stmt)
NOTICE: main_view AFTER UPDATE STATEMENT (after_view_upd_stmt)
UPDATE 0
-- Delete from view using trigger
-DELETE FROM main_view WHERE a IN (20,21);
+DELETE FROM main_view WHERE a = 20 AND b = 31;
NOTICE: main_view BEFORE DELETE STATEMENT (before_view_del_stmt)
NOTICE: main_view INSTEAD OF DELETE ROW (instead_of_del)
-NOTICE: OLD: (21,10)
-NOTICE: main_view INSTEAD OF DELETE ROW (instead_of_del)
NOTICE: OLD: (20,31)
+NOTICE: main_view AFTER DELETE STATEMENT (after_view_del_stmt)
+DELETE 1
+DELETE FROM main_view WHERE a = 21 AND b = 10;
+NOTICE: main_view BEFORE DELETE STATEMENT (before_view_del_stmt)
+NOTICE: main_view INSTEAD OF DELETE ROW (instead_of_del)
+NOTICE: OLD: (21,10)
+NOTICE: main_view AFTER DELETE STATEMENT (after_view_del_stmt)
+DELETE 1
+DELETE FROM main_view WHERE a = 21 AND b = 32;
+NOTICE: main_view BEFORE DELETE STATEMENT (before_view_del_stmt)
NOTICE: main_view INSTEAD OF DELETE ROW (instead_of_del)
NOTICE: OLD: (21,32)
NOTICE: main_view AFTER DELETE STATEMENT (after_view_del_stmt)
-DELETE 3
+DELETE 1
DELETE FROM main_view WHERE a = 31 RETURNING a, b;
NOTICE: main_view BEFORE DELETE STATEMENT (before_view_del_stmt)
NOTICE: main_view INSTEAD OF DELETE ROW (instead_of_del)
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 9cea538b8e8..4877a1ddce9 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -372,15 +372,15 @@ INSERT INTO rw_view16 (a, b) VALUES (3, 'Row 3'); -- should be OK
UPDATE rw_view16 SET a=3, aa=-3 WHERE a=3; -- should fail
ERROR: multiple assignments to same column "a"
UPDATE rw_view16 SET aa=-3 WHERE a=3; -- should be OK
-SELECT * FROM base_tbl;
+SELECT * FROM base_tbl ORDER BY a;
a | b
----+--------
+ -3 | Row 3
-2 | Row -2
-1 | Row -1
0 | Row 0
1 | Row 1
2 | Row 2
- -3 | Row 3
(6 rows)
DELETE FROM rw_view16 WHERE a=-3; -- should be OK
diff --git a/src/test/regress/sql/generated_virtual.sql b/src/test/regress/sql/generated_virtual.sql
index e750866d2d8..877152d6d69 100644
--- a/src/test/regress/sql/generated_virtual.sql
+++ b/src/test/regress/sql/generated_virtual.sql
@@ -127,7 +127,7 @@ ALTER VIEW gtest1v ALTER COLUMN b SET DEFAULT 100;
INSERT INTO gtest1v VALUES (8, DEFAULT); -- error
INSERT INTO gtest1v VALUES (8, DEFAULT), (9, DEFAULT); -- error
-SELECT * FROM gtest1v;
+SELECT * FROM gtest1v ORDER BY a;
DELETE FROM gtest1v WHERE a >= 5;
DROP VIEW gtest1v;
diff --git a/src/test/regress/sql/triggers.sql b/src/test/regress/sql/triggers.sql
index ea39817ee3d..6ceb61608ae 100644
--- a/src/test/regress/sql/triggers.sql
+++ b/src/test/regress/sql/triggers.sql
@@ -660,7 +660,9 @@ UPDATE main_view SET b = 32 WHERE a = 21 AND b = 31 RETURNING a, b;
UPDATE main_view SET b = 0 WHERE false;
-- Delete from view using trigger
-DELETE FROM main_view WHERE a IN (20,21);
+DELETE FROM main_view WHERE a = 20 AND b = 31;
+DELETE FROM main_view WHERE a = 21 AND b = 10;
+DELETE FROM main_view WHERE a = 21 AND b = 32;
DELETE FROM main_view WHERE a = 31 RETURNING a, b;
\set QUIET true
diff --git a/src/test/regress/sql/updatable_views.sql b/src/test/regress/sql/updatable_views.sql
index 1635adde2d4..160e7799715 100644
--- a/src/test/regress/sql/updatable_views.sql
+++ b/src/test/regress/sql/updatable_views.sql
@@ -125,7 +125,7 @@ INSERT INTO rw_view16 VALUES (3, 'Row 3', 3); -- should fail
INSERT INTO rw_view16 (a, b) VALUES (3, 'Row 3'); -- should be OK
UPDATE rw_view16 SET a=3, aa=-3 WHERE a=3; -- should fail
UPDATE rw_view16 SET aa=-3 WHERE a=3; -- should be OK
-SELECT * FROM base_tbl;
+SELECT * FROM base_tbl ORDER BY a;
DELETE FROM rw_view16 WHERE a=-3; -- should be OK
-- Read-only views
INSERT INTO ro_view17 VALUES (3, 'ROW 3');
--
2.51.2
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: Expanding HOT updates for expression and partial indexes
2026-02-16 19:36 Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-02-17 21:15 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-19 20:43 ` Re: Expanding HOT updates for expression and partial indexes Andres Freund <[email protected]>
2026-02-19 22:31 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-23 19:23 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-25 21:03 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-02-26 22:08 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-26 23:01 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-02 19:08 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
@ 2026-03-11 15:51 ` Greg Burd <[email protected]>
2026-03-12 20:33 ` Re: Expanding HOT updates for expression and partial indexes Nathan Bossart <[email protected]>
0 siblings, 1 reply; 24+ messages in thread
From: Greg Burd @ 2026-03-11 15:51 UTC (permalink / raw)
To: pgsql-hackers; +Cc: Jeff Davis <[email protected]>; Nathan Bossart <[email protected]>
Hello again,
Attached is v35 (master@f4a4ce52c0d) where I've separated out changes into three patches. Still nothing related to $subject directly, but foundational for that work (coming soon). I'd like to get these into v19 if at all possible and then target the rest of $subject for v20 so that it has more time to soak.
0001 - This patch adds tests to validate and capture the expected behavior of Heap-only tuple (HOT) updates. This also serves as a foundation that will aide in documenting what exactly changed in the commits implementing $subject at some later date. This patch isn't required, but it does a good job of demonstrating that a) the changes in 0002 don't impact HOT decisions (as intended) and b) that future patches which change HOT behavior have a very obvious record of what changed because they update these test results (not tests) to illustrate that. That said, if the next two patches are merged without this one I'd be just as happy as if all 3 made it into v19.
0002 - This patch plugs a hole (bug?) in ExecGetAllUpdatedCols() which is triggered by an existing test in tsearch.sql and the tsvector_update_trigger(). That trigger uses heap_modify_tuple() to change an indexed attribute that is not discovered by ExecGetAllUpdatedCols(), which seems odd to me at best and at worst wrong (or even a potential security issue). This patch finds and adds columns that are updated into the Bitmapset returned by ExecGetAllUpdatedCols(). The patch includes a helper function ExecCompareSlotAttrs() that will be used in follow-on patches as well.
0003 - This patch moves the logic for HeapDetermineColumnsInfo() into the executor while preserving the functionality of simple_heap_update(). A few helper functions are created to better illustrate HOT and lock mode decision making and are reused when possible. The portion of HeapDetermineColumnsInfo() related to replica identity key WAL logging is now in-line in heap_update().
These commits maintain 100% identical logic for HOT, lockmode, and replica identity decisions (or there's a flaw and that should be fixed so let me know) They simply juggle the logic into places where I think they fit better and provide for future work in this area.
I appreciate your time and effort considering these changes.
best.
-greg
Attachments:
[text/x-patch] v35-0001-Add-tests-to-cover-a-variety-of-heap-HOT-update-.patch (89.5K, 2-v35-0001-Add-tests-to-cover-a-variety-of-heap-HOT-update-.patch)
download | inline diff:
From c09fd7d1825965db3698fad8b8b32b625155e45a Mon Sep 17 00:00:00 2001
From: Greg Burd <[email protected]>
Date: Tue, 10 Mar 2026 09:28:15 -0400
Subject: [PATCH v35 1/3] Add tests to cover a variety of heap HOT update
behaviors
This commit introduces test infrastructure for verifying Heap-Only Tuple
(HOT) update functionality in PostgreSQL. It provides a baseline for
demonstrating and validating HOT update behavior.
Regression tests:
- Basic HOT vs non-HOT update decisions
- All-or-none property for multiple indexes
- Partial indexes and predicate handling
- BRIN (summarizing) indexes allowing HOT updates
- TOAST column handling with HOT
- Unique constraints behavior
- Multi-column indexes
- Partitioned table HOT updates
Isolation tests:
- HOT chain formation and maintenance
- Concurrent HOT update scenarios
- Index scan behavior with HOT chains
---
.../isolation/expected/hot_updates_chain.out | 144 +++
.../expected/hot_updates_concurrent.out | 143 +++
.../expected/hot_updates_index_scan.out | 132 +++
src/test/isolation/isolation_schedule | 3 +
.../isolation/specs/hot_updates_chain.spec | 110 ++
.../specs/hot_updates_concurrent.spec | 107 ++
.../specs/hot_updates_index_scan.spec | 94 ++
src/test/regress/expected/hot_updates.out | 950 ++++++++++++++++++
src/test/regress/parallel_schedule | 5 +
src/test/regress/sql/hot_updates.sql | 692 +++++++++++++
10 files changed, 2380 insertions(+)
create mode 100644 src/test/isolation/expected/hot_updates_chain.out
create mode 100644 src/test/isolation/expected/hot_updates_concurrent.out
create mode 100644 src/test/isolation/expected/hot_updates_index_scan.out
create mode 100644 src/test/isolation/specs/hot_updates_chain.spec
create mode 100644 src/test/isolation/specs/hot_updates_concurrent.spec
create mode 100644 src/test/isolation/specs/hot_updates_index_scan.spec
create mode 100644 src/test/regress/expected/hot_updates.out
create mode 100644 src/test/regress/sql/hot_updates.sql
diff --git a/src/test/isolation/expected/hot_updates_chain.out b/src/test/isolation/expected/hot_updates_chain.out
new file mode 100644
index 00000000000..503252009ea
--- /dev/null
+++ b/src/test/isolation/expected/hot_updates_chain.out
@@ -0,0 +1,144 @@
+Parsed test spec with 5 sessions
+
+starting permutation: s1_begin s1_hot_update1 s1_hot_update2 s1_hot_update3 s1_commit s1_select s1_verify_hot
+step s1_begin: BEGIN;
+step s1_hot_update1: UPDATE hot_test SET non_indexed_col = 'update1' WHERE id = 1;
+step s1_hot_update2: UPDATE hot_test SET non_indexed_col = 'update2' WHERE id = 1;
+step s1_hot_update3: UPDATE hot_test SET non_indexed_col = 'update3' WHERE id = 1;
+step s1_commit: COMMIT;
+step s1_select: SELECT * FROM hot_test WHERE id = 1;
+id|indexed_col|non_indexed_col
+--+-----------+---------------
+ 1| 100|update3
+(1 row)
+
+step s1_verify_hot:
+ -- Check for HOT chain: LP_REDIRECT or tuple with t_ctid pointing to same page
+ SELECT COUNT(*) > 0 AS has_hot_chain
+ FROM heap_page_items(get_raw_page('hot_test', 0))
+ WHERE lp_flags = 2 -- LP_REDIRECT indicates HOT chain
+ OR (t_ctid IS NOT NULL
+ AND (t_ctid::text::point)[0]::int = 0 -- same page
+ AND t_ctid != ('(0,' || lp || ')')::tid); -- different offset
+
+has_hot_chain
+-------------
+t
+(1 row)
+
+
+starting permutation: s2_begin s2_select_before s1_begin s1_hot_update1 s1_hot_update2 s1_commit s2_select_after s2_commit
+step s2_begin: BEGIN ISOLATION LEVEL REPEATABLE READ;
+step s2_select_before: SELECT non_indexed_col FROM hot_test WHERE id = 1;
+non_indexed_col
+---------------
+initial
+(1 row)
+
+step s1_begin: BEGIN;
+step s1_hot_update1: UPDATE hot_test SET non_indexed_col = 'update1' WHERE id = 1;
+step s1_hot_update2: UPDATE hot_test SET non_indexed_col = 'update2' WHERE id = 1;
+step s1_commit: COMMIT;
+step s2_select_after: SELECT non_indexed_col FROM hot_test WHERE id = 1;
+non_indexed_col
+---------------
+initial
+(1 row)
+
+step s2_commit: COMMIT;
+
+starting permutation: s1_begin s1_hot_update1 s1_hot_update2 s1_commit s3_begin s3_non_hot_update s3_commit s1_select
+step s1_begin: BEGIN;
+step s1_hot_update1: UPDATE hot_test SET non_indexed_col = 'update1' WHERE id = 1;
+step s1_hot_update2: UPDATE hot_test SET non_indexed_col = 'update2' WHERE id = 1;
+step s1_commit: COMMIT;
+step s3_begin: BEGIN;
+step s3_non_hot_update: UPDATE hot_test SET indexed_col = 150 WHERE id = 1;
+step s3_commit: COMMIT;
+step s1_select: SELECT * FROM hot_test WHERE id = 1;
+id|indexed_col|non_indexed_col
+--+-----------+---------------
+ 1| 150|update2
+(1 row)
+
+
+starting permutation: s1_begin s1_hot_update1 s1_commit s3_begin s3_non_hot_update s3_commit s4_begin s4_hot_after_non_hot s4_commit s4_select s4_verify_hot
+step s1_begin: BEGIN;
+step s1_hot_update1: UPDATE hot_test SET non_indexed_col = 'update1' WHERE id = 1;
+step s1_commit: COMMIT;
+step s3_begin: BEGIN;
+step s3_non_hot_update: UPDATE hot_test SET indexed_col = 150 WHERE id = 1;
+step s3_commit: COMMIT;
+step s4_begin: BEGIN;
+step s4_hot_after_non_hot: UPDATE hot_test SET non_indexed_col = 'after_non_hot' WHERE id = 1;
+step s4_commit: COMMIT;
+step s4_select: SELECT * FROM hot_test WHERE id = 1;
+id|indexed_col|non_indexed_col
+--+-----------+---------------
+ 1| 150|after_non_hot
+(1 row)
+
+step s4_verify_hot:
+ -- Check for new HOT chain after non-HOT update broke the previous chain
+ SELECT COUNT(*) > 0 AS has_hot_chain
+ FROM heap_page_items(get_raw_page('hot_test', 0))
+ WHERE lp_flags = 2
+ OR (t_ctid IS NOT NULL
+ AND (t_ctid::text::point)[0]::int = 0
+ AND t_ctid != ('(0,' || lp || ')')::tid);
+
+has_hot_chain
+-------------
+t
+(1 row)
+
+
+starting permutation: s1_begin s1_hot_update1 s1_hot_update2 s5_begin s5_hot_update_row2_1 s5_hot_update_row2_2 s1_commit s5_commit s1_select s5_select s1_verify_hot s5_verify_hot
+step s1_begin: BEGIN;
+step s1_hot_update1: UPDATE hot_test SET non_indexed_col = 'update1' WHERE id = 1;
+step s1_hot_update2: UPDATE hot_test SET non_indexed_col = 'update2' WHERE id = 1;
+step s5_begin: BEGIN;
+step s5_hot_update_row2_1: UPDATE hot_test SET non_indexed_col = 'row2_update1' WHERE id = 2;
+step s5_hot_update_row2_2: UPDATE hot_test SET non_indexed_col = 'row2_update2' WHERE id = 2;
+step s1_commit: COMMIT;
+step s5_commit: COMMIT;
+step s1_select: SELECT * FROM hot_test WHERE id = 1;
+id|indexed_col|non_indexed_col
+--+-----------+---------------
+ 1| 100|update2
+(1 row)
+
+step s5_select: SELECT * FROM hot_test WHERE id = 2;
+id|indexed_col|non_indexed_col
+--+-----------+---------------
+ 2| 200|row2_update2
+(1 row)
+
+step s1_verify_hot:
+ -- Check for HOT chain: LP_REDIRECT or tuple with t_ctid pointing to same page
+ SELECT COUNT(*) > 0 AS has_hot_chain
+ FROM heap_page_items(get_raw_page('hot_test', 0))
+ WHERE lp_flags = 2 -- LP_REDIRECT indicates HOT chain
+ OR (t_ctid IS NOT NULL
+ AND (t_ctid::text::point)[0]::int = 0 -- same page
+ AND t_ctid != ('(0,' || lp || ')')::tid); -- different offset
+
+has_hot_chain
+-------------
+t
+(1 row)
+
+step s5_verify_hot:
+ -- Check for HOT chain on page 0
+ SELECT COUNT(*) > 0 AS has_hot_chain
+ FROM heap_page_items(get_raw_page('hot_test', 0))
+ WHERE lp_flags = 2
+ OR (t_ctid IS NOT NULL
+ AND (t_ctid::text::point)[0]::int = 0
+ AND t_ctid != ('(0,' || lp || ')')::tid);
+
+has_hot_chain
+-------------
+t
+(1 row)
+
diff --git a/src/test/isolation/expected/hot_updates_concurrent.out b/src/test/isolation/expected/hot_updates_concurrent.out
new file mode 100644
index 00000000000..b1a8b0cb7b2
--- /dev/null
+++ b/src/test/isolation/expected/hot_updates_concurrent.out
@@ -0,0 +1,143 @@
+Parsed test spec with 4 sessions
+
+starting permutation: s1_begin s1_hot_update s2_begin s2_hot_update s1_commit s2_commit s1_select s2_select s2_verify_hot
+step s1_begin: BEGIN;
+step s1_hot_update: UPDATE hot_test SET non_indexed_col = 'updated_s1' WHERE id = 1;
+step s2_begin: BEGIN;
+step s2_hot_update: UPDATE hot_test SET non_indexed_col = 'updated_s2' WHERE id = 1; <waiting ...>
+step s1_commit: COMMIT;
+step s2_hot_update: <... completed>
+step s2_commit: COMMIT;
+step s1_select: SELECT * FROM hot_test WHERE id = 1;
+id|indexed_col|non_indexed_col
+--+-----------+---------------
+ 1| 100|updated_s2
+(1 row)
+
+step s2_select: SELECT * FROM hot_test WHERE id = 1;
+id|indexed_col|non_indexed_col
+--+-----------+---------------
+ 1| 100|updated_s2
+(1 row)
+
+step s2_verify_hot:
+ -- Check for HOT chain: look for LP_REDIRECT (lp_flags=2) or tuple with t_ctid pointing to same page
+ SELECT COUNT(*) > 0 AS has_hot_chain
+ FROM heap_page_items(get_raw_page('hot_test', 0))
+ WHERE lp_flags = 2 -- LP_REDIRECT indicates HOT chain
+ OR (t_ctid IS NOT NULL
+ AND (t_ctid::text::point)[0]::int = 0 -- same page
+ AND t_ctid != ('(0,' || lp || ')')::tid); -- different offset
+
+has_hot_chain
+-------------
+t
+(1 row)
+
+
+starting permutation: s1_begin s1_hot_update s3_begin s3_non_hot_update s1_commit s3_commit s3_select s3_verify_index
+step s1_begin: BEGIN;
+step s1_hot_update: UPDATE hot_test SET non_indexed_col = 'updated_s1' WHERE id = 1;
+step s3_begin: BEGIN;
+step s3_non_hot_update: UPDATE hot_test SET indexed_col = 150 WHERE id = 1; <waiting ...>
+step s1_commit: COMMIT;
+step s3_non_hot_update: <... completed>
+step s3_commit: COMMIT;
+step s3_select: SELECT * FROM hot_test WHERE id = 1;
+id|indexed_col|non_indexed_col
+--+-----------+---------------
+ 1| 150|updated_s1
+(1 row)
+
+step s3_verify_index:
+ -- Verify index was updated (proves non-HOT)
+ SELECT COUNT(*) = 1 AS index_updated FROM hot_test WHERE indexed_col = 150;
+ SELECT COUNT(*) = 0 AS old_value_gone FROM hot_test WHERE indexed_col = 100;
+
+index_updated
+-------------
+t
+(1 row)
+
+old_value_gone
+--------------
+t
+(1 row)
+
+
+starting permutation: s3_begin s3_non_hot_update s1_begin s1_hot_update s3_commit s1_commit s1_select s1_verify_hot
+step s3_begin: BEGIN;
+step s3_non_hot_update: UPDATE hot_test SET indexed_col = 150 WHERE id = 1;
+step s1_begin: BEGIN;
+step s1_hot_update: UPDATE hot_test SET non_indexed_col = 'updated_s1' WHERE id = 1; <waiting ...>
+step s3_commit: COMMIT;
+step s1_hot_update: <... completed>
+step s1_commit: COMMIT;
+step s1_select: SELECT * FROM hot_test WHERE id = 1;
+id|indexed_col|non_indexed_col
+--+-----------+---------------
+ 1| 150|updated_s1
+(1 row)
+
+step s1_verify_hot:
+ -- Check for HOT chain: look for LP_REDIRECT (lp_flags=2) or tuple with t_ctid pointing to same page
+ SELECT COUNT(*) > 0 AS has_hot_chain
+ FROM heap_page_items(get_raw_page('hot_test', 0))
+ WHERE lp_flags = 2 -- LP_REDIRECT indicates HOT chain
+ OR (t_ctid IS NOT NULL
+ AND (t_ctid::text::point)[0]::int = 0 -- same page
+ AND t_ctid != ('(0,' || lp || ')')::tid); -- different offset
+
+has_hot_chain
+-------------
+t
+(1 row)
+
+
+starting permutation: s1_begin s1_hot_update s4_begin s4_hot_update_row2 s1_commit s4_commit s1_select s4_select s1_verify_hot s4_verify_hot
+step s1_begin: BEGIN;
+step s1_hot_update: UPDATE hot_test SET non_indexed_col = 'updated_s1' WHERE id = 1;
+step s4_begin: BEGIN;
+step s4_hot_update_row2: UPDATE hot_test SET non_indexed_col = 'updated_s4' WHERE id = 2;
+step s1_commit: COMMIT;
+step s4_commit: COMMIT;
+step s1_select: SELECT * FROM hot_test WHERE id = 1;
+id|indexed_col|non_indexed_col
+--+-----------+---------------
+ 1| 100|updated_s1
+(1 row)
+
+step s4_select: SELECT * FROM hot_test WHERE id = 2;
+id|indexed_col|non_indexed_col
+--+-----------+---------------
+ 2| 200|updated_s4
+(1 row)
+
+step s1_verify_hot:
+ -- Check for HOT chain: look for LP_REDIRECT (lp_flags=2) or tuple with t_ctid pointing to same page
+ SELECT COUNT(*) > 0 AS has_hot_chain
+ FROM heap_page_items(get_raw_page('hot_test', 0))
+ WHERE lp_flags = 2 -- LP_REDIRECT indicates HOT chain
+ OR (t_ctid IS NOT NULL
+ AND (t_ctid::text::point)[0]::int = 0 -- same page
+ AND t_ctid != ('(0,' || lp || ')')::tid); -- different offset
+
+has_hot_chain
+-------------
+t
+(1 row)
+
+step s4_verify_hot:
+ -- Check for HOT chain on page 0
+ SELECT COUNT(*) > 0 AS has_hot_chain
+ FROM heap_page_items(get_raw_page('hot_test', 0))
+ WHERE lp_flags = 2
+ OR (t_ctid IS NOT NULL
+ AND (t_ctid::text::point)[0]::int = 0
+ AND t_ctid != ('(0,' || lp || ')')::tid);
+
+has_hot_chain
+-------------
+t
+(1 row)
+
diff --git a/src/test/isolation/expected/hot_updates_index_scan.out b/src/test/isolation/expected/hot_updates_index_scan.out
new file mode 100644
index 00000000000..7d8e9ff8857
--- /dev/null
+++ b/src/test/isolation/expected/hot_updates_index_scan.out
@@ -0,0 +1,132 @@
+Parsed test spec with 4 sessions
+
+starting permutation: s1_begin s1_hot_update s2_begin s2_index_scan s1_commit s2_commit
+step s1_begin: BEGIN;
+step s1_hot_update: UPDATE hot_test SET non_indexed_col = 'hot_updated' WHERE id = 50;
+step s2_begin: BEGIN;
+step s2_index_scan: SELECT * FROM hot_test WHERE indexed_col = 500;
+id|indexed_col|non_indexed_col
+--+-----------+---------------
+50| 500|initial50
+(1 row)
+
+step s1_commit: COMMIT;
+step s2_commit: COMMIT;
+
+starting permutation: s1_begin s1_non_hot_update s1_commit s2_begin s2_index_scan_new s2_commit s2_verify_index
+step s1_begin: BEGIN;
+step s1_non_hot_update: UPDATE hot_test SET indexed_col = 555 WHERE id = 50;
+step s1_commit: COMMIT;
+step s2_begin: BEGIN;
+step s2_index_scan_new: SELECT * FROM hot_test WHERE indexed_col = 555;
+id|indexed_col|non_indexed_col
+--+-----------+---------------
+50| 555|initial50
+(1 row)
+
+step s2_commit: COMMIT;
+step s2_verify_index:
+ -- After non-HOT update, verify index reflects the change
+ SELECT COUNT(*) = 1 AS found_new_value FROM hot_test WHERE indexed_col = 555;
+ SELECT COUNT(*) = 0 AS old_value_gone FROM hot_test WHERE indexed_col = 500;
+
+found_new_value
+---------------
+t
+(1 row)
+
+old_value_gone
+--------------
+t
+(1 row)
+
+
+starting permutation: s3_begin s3_select_for_update s1_begin s1_hot_update s3_commit s1_commit s1_verify_hot
+step s3_begin: BEGIN;
+step s3_select_for_update: SELECT * FROM hot_test WHERE id = 50 FOR UPDATE;
+id|indexed_col|non_indexed_col
+--+-----------+---------------
+50| 500|initial50
+(1 row)
+
+step s1_begin: BEGIN;
+step s1_hot_update: UPDATE hot_test SET non_indexed_col = 'hot_updated' WHERE id = 50; <waiting ...>
+step s3_commit: COMMIT;
+step s1_hot_update: <... completed>
+step s1_commit: COMMIT;
+step s1_verify_hot:
+ -- Verify HOT chain exists for row with id=50
+ -- Use actual ctid to find the correct page
+ SELECT EXISTS (
+ SELECT 1 FROM heap_page_items(
+ get_raw_page('hot_test', (SELECT (ctid::text::point)[0]::int FROM hot_test WHERE id = 50))
+ )
+ WHERE lp_flags = 2
+ OR (t_ctid IS NOT NULL
+ AND t_ctid != ('(' || (SELECT (ctid::text::point)[0]::int FROM hot_test WHERE id = 50) || ',' || lp || ')')::tid
+ AND (t_ctid::text::point)[0]::int = (SELECT (ctid::text::point)[0]::int FROM hot_test WHERE id = 50))
+ ) AS has_hot_chain;
+
+has_hot_chain
+-------------
+t
+(1 row)
+
+
+starting permutation: s1_begin s1_hot_update s3_begin s3_select_for_update s1_commit s3_commit
+step s1_begin: BEGIN;
+step s1_hot_update: UPDATE hot_test SET non_indexed_col = 'hot_updated' WHERE id = 50;
+step s3_begin: BEGIN;
+step s3_select_for_update: SELECT * FROM hot_test WHERE id = 50 FOR UPDATE; <waiting ...>
+step s1_commit: COMMIT;
+step s3_select_for_update: <... completed>
+id|indexed_col|non_indexed_col
+--+-----------+---------------
+50| 500|hot_updated
+(1 row)
+
+step s3_commit: COMMIT;
+
+starting permutation: s4_begin s4_select_for_key_share s1_begin s1_hot_update s4_commit s1_commit s1_verify_hot
+step s4_begin: BEGIN;
+step s4_select_for_key_share: SELECT * FROM hot_test WHERE id = 50 FOR KEY SHARE;
+id|indexed_col|non_indexed_col
+--+-----------+---------------
+50| 500|initial50
+(1 row)
+
+step s1_begin: BEGIN;
+step s1_hot_update: UPDATE hot_test SET non_indexed_col = 'hot_updated' WHERE id = 50;
+step s4_commit: COMMIT;
+step s1_commit: COMMIT;
+step s1_verify_hot:
+ -- Verify HOT chain exists for row with id=50
+ -- Use actual ctid to find the correct page
+ SELECT EXISTS (
+ SELECT 1 FROM heap_page_items(
+ get_raw_page('hot_test', (SELECT (ctid::text::point)[0]::int FROM hot_test WHERE id = 50))
+ )
+ WHERE lp_flags = 2
+ OR (t_ctid IS NOT NULL
+ AND t_ctid != ('(' || (SELECT (ctid::text::point)[0]::int FROM hot_test WHERE id = 50) || ',' || lp || ')')::tid
+ AND (t_ctid::text::point)[0]::int = (SELECT (ctid::text::point)[0]::int FROM hot_test WHERE id = 50))
+ ) AS has_hot_chain;
+
+has_hot_chain
+-------------
+t
+(1 row)
+
+
+starting permutation: s4_begin s4_select_for_key_share s1_begin s1_non_hot_update s4_commit s1_commit
+step s4_begin: BEGIN;
+step s4_select_for_key_share: SELECT * FROM hot_test WHERE id = 50 FOR KEY SHARE;
+id|indexed_col|non_indexed_col
+--+-----------+---------------
+50| 500|initial50
+(1 row)
+
+step s1_begin: BEGIN;
+step s1_non_hot_update: UPDATE hot_test SET indexed_col = 555 WHERE id = 50;
+step s4_commit: COMMIT;
+step s1_commit: COMMIT;
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 4e466580cd4..46525b0a62a 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -19,6 +19,9 @@ test: multiple-row-versions
test: index-only-scan
test: index-only-bitmapscan
test: predicate-lock-hot-tuple
+test: hot_updates_concurrent
+test: hot_updates_index_scan
+test: hot_updates_chain
test: update-conflict-out
test: deadlock-simple
test: deadlock-hard
diff --git a/src/test/isolation/specs/hot_updates_chain.spec b/src/test/isolation/specs/hot_updates_chain.spec
new file mode 100644
index 00000000000..85cd2176133
--- /dev/null
+++ b/src/test/isolation/specs/hot_updates_chain.spec
@@ -0,0 +1,110 @@
+# Test HOT update chains and their interaction with VACUUM and page pruning
+#
+# This test verifies that HOT update chains are correctly maintained when
+# multiple HOT updates occur on the same row, and that VACUUM correctly
+# handles HOT chains.
+
+setup
+{
+ CREATE EXTENSION IF NOT EXISTS pageinspect;
+
+ CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ indexed_col int,
+ non_indexed_col text
+ );
+
+ CREATE INDEX hot_test_indexed_idx ON hot_test(indexed_col);
+
+ INSERT INTO hot_test VALUES (1, 100, 'initial');
+ INSERT INTO hot_test VALUES (2, 200, 'initial');
+}
+
+teardown
+{
+ DROP TABLE hot_test;
+ DROP EXTENSION pageinspect;
+}
+
+# Session 1: Create HOT chain with multiple updates
+session s1
+step s1_begin { BEGIN; }
+step s1_hot_update1 { UPDATE hot_test SET non_indexed_col = 'update1' WHERE id = 1; }
+step s1_hot_update2 { UPDATE hot_test SET non_indexed_col = 'update2' WHERE id = 1; }
+step s1_hot_update3 { UPDATE hot_test SET non_indexed_col = 'update3' WHERE id = 1; }
+step s1_commit { COMMIT; }
+step s1_select { SELECT * FROM hot_test WHERE id = 1; }
+step s1_verify_hot {
+ -- Check for HOT chain: LP_REDIRECT or tuple with t_ctid pointing to same page
+ SELECT COUNT(*) > 0 AS has_hot_chain
+ FROM heap_page_items(get_raw_page('hot_test', 0))
+ WHERE lp_flags = 2 -- LP_REDIRECT indicates HOT chain
+ OR (t_ctid IS NOT NULL
+ AND (t_ctid::text::point)[0]::int = 0 -- same page
+ AND t_ctid != ('(0,' || lp || ')')::tid); -- different offset
+}
+
+# Session 2: Read while HOT chain is being built
+session s2
+step s2_begin { BEGIN ISOLATION LEVEL REPEATABLE READ; }
+step s2_select_before { SELECT non_indexed_col FROM hot_test WHERE id = 1; }
+step s2_select_after { SELECT non_indexed_col FROM hot_test WHERE id = 1; }
+step s2_commit { COMMIT; }
+
+# Session 3: Break HOT chain with non-HOT update
+session s3
+step s3_begin { BEGIN; }
+step s3_non_hot_update { UPDATE hot_test SET indexed_col = 150 WHERE id = 1; }
+step s3_commit { COMMIT; }
+
+# Session 4: Try to build HOT chain after non-HOT update
+session s4
+step s4_begin { BEGIN; }
+step s4_hot_after_non_hot { UPDATE hot_test SET non_indexed_col = 'after_non_hot' WHERE id = 1; }
+step s4_commit { COMMIT; }
+step s4_select { SELECT * FROM hot_test WHERE id = 1; }
+step s4_verify_hot {
+ -- Check for new HOT chain after non-HOT update broke the previous chain
+ SELECT COUNT(*) > 0 AS has_hot_chain
+ FROM heap_page_items(get_raw_page('hot_test', 0))
+ WHERE lp_flags = 2
+ OR (t_ctid IS NOT NULL
+ AND (t_ctid::text::point)[0]::int = 0
+ AND t_ctid != ('(0,' || lp || ')')::tid);
+}
+
+# Session 5: Multiple sessions building separate HOT chains on different rows
+session s5
+step s5_begin { BEGIN; }
+step s5_hot_update_row2_1 { UPDATE hot_test SET non_indexed_col = 'row2_update1' WHERE id = 2; }
+step s5_hot_update_row2_2 { UPDATE hot_test SET non_indexed_col = 'row2_update2' WHERE id = 2; }
+step s5_commit { COMMIT; }
+step s5_select { SELECT * FROM hot_test WHERE id = 2; }
+step s5_verify_hot {
+ -- Check for HOT chain on page 0
+ SELECT COUNT(*) > 0 AS has_hot_chain
+ FROM heap_page_items(get_raw_page('hot_test', 0))
+ WHERE lp_flags = 2
+ OR (t_ctid IS NOT NULL
+ AND (t_ctid::text::point)[0]::int = 0
+ AND t_ctid != ('(0,' || lp || ')')::tid);
+}
+
+# Build HOT chain within single transaction
+# All updates should form a HOT chain
+permutation s1_begin s1_hot_update1 s1_hot_update2 s1_hot_update3 s1_commit s1_select s1_verify_hot
+
+# REPEATABLE READ should see consistent snapshot across HOT chain updates
+# Session 2 starts before updates, should see 'initial' throughout
+permutation s2_begin s2_select_before s1_begin s1_hot_update1 s1_hot_update2 s1_commit s2_select_after s2_commit
+
+# HOT chain followed by non-HOT update
+# Non-HOT update breaks the HOT chain
+permutation s1_begin s1_hot_update1 s1_hot_update2 s1_commit s3_begin s3_non_hot_update s3_commit s1_select
+
+# HOT update after non-HOT update can start new HOT chain
+# After breaking chain with indexed column update, new HOT updates can start fresh chain
+permutation s1_begin s1_hot_update1 s1_commit s3_begin s3_non_hot_update s3_commit s4_begin s4_hot_after_non_hot s4_commit s4_select s4_verify_hot
+
+# Multiple sessions building separate HOT chains on different rows
+permutation s1_begin s1_hot_update1 s1_hot_update2 s5_begin s5_hot_update_row2_1 s5_hot_update_row2_2 s1_commit s5_commit s1_select s5_select s1_verify_hot s5_verify_hot
diff --git a/src/test/isolation/specs/hot_updates_concurrent.spec b/src/test/isolation/specs/hot_updates_concurrent.spec
new file mode 100644
index 00000000000..eac78d62ac5
--- /dev/null
+++ b/src/test/isolation/specs/hot_updates_concurrent.spec
@@ -0,0 +1,107 @@
+# Test concurrent HOT updates and validate HOT chains
+#
+# This test verifies that HOT updates work correctly when multiple sessions
+# are updating the same table concurrently, and validates that HOT chains
+# are actually created using heap_page_items().
+
+setup
+{
+ CREATE EXTENSION IF NOT EXISTS pageinspect;
+
+ CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ indexed_col int,
+ non_indexed_col text
+ );
+
+ CREATE INDEX hot_test_indexed_idx ON hot_test(indexed_col);
+
+ INSERT INTO hot_test VALUES (1, 100, 'initial1');
+ INSERT INTO hot_test VALUES (2, 200, 'initial2');
+ INSERT INTO hot_test VALUES (3, 300, 'initial3');
+}
+
+teardown
+{
+ DROP TABLE hot_test;
+ DROP EXTENSION pageinspect;
+}
+
+# Session 1: HOT update (modify non-indexed column)
+session s1
+step s1_begin { BEGIN; }
+step s1_hot_update { UPDATE hot_test SET non_indexed_col = 'updated_s1' WHERE id = 1; }
+step s1_commit { COMMIT; }
+step s1_select { SELECT * FROM hot_test WHERE id = 1; }
+step s1_verify_hot {
+ -- Check for HOT chain: look for LP_REDIRECT (lp_flags=2) or tuple with t_ctid pointing to same page
+ SELECT COUNT(*) > 0 AS has_hot_chain
+ FROM heap_page_items(get_raw_page('hot_test', 0))
+ WHERE lp_flags = 2 -- LP_REDIRECT indicates HOT chain
+ OR (t_ctid IS NOT NULL
+ AND (t_ctid::text::point)[0]::int = 0 -- same page
+ AND t_ctid != ('(0,' || lp || ')')::tid); -- different offset
+}
+
+# Session 2: HOT update (modify non-indexed column on same row)
+session s2
+step s2_begin { BEGIN; }
+step s2_hot_update { UPDATE hot_test SET non_indexed_col = 'updated_s2' WHERE id = 1; }
+step s2_commit { COMMIT; }
+step s2_select { SELECT * FROM hot_test WHERE id = 1; }
+step s2_verify_hot {
+ -- Check for HOT chain: look for LP_REDIRECT (lp_flags=2) or tuple with t_ctid pointing to same page
+ SELECT COUNT(*) > 0 AS has_hot_chain
+ FROM heap_page_items(get_raw_page('hot_test', 0))
+ WHERE lp_flags = 2 -- LP_REDIRECT indicates HOT chain
+ OR (t_ctid IS NOT NULL
+ AND (t_ctid::text::point)[0]::int = 0 -- same page
+ AND t_ctid != ('(0,' || lp || ')')::tid); -- different offset
+}
+
+# Session 3: Non-HOT update (modify indexed column)
+session s3
+step s3_begin { BEGIN; }
+step s3_non_hot_update { UPDATE hot_test SET indexed_col = 150 WHERE id = 1; }
+step s3_commit { COMMIT; }
+step s3_select { SELECT * FROM hot_test WHERE id = 1; }
+step s3_verify_index {
+ -- Verify index was updated (proves non-HOT)
+ SELECT COUNT(*) = 1 AS index_updated FROM hot_test WHERE indexed_col = 150;
+ SELECT COUNT(*) = 0 AS old_value_gone FROM hot_test WHERE indexed_col = 100;
+}
+
+# Session 4: Concurrent HOT updates on different rows
+session s4
+step s4_begin { BEGIN; }
+step s4_hot_update_row2 { UPDATE hot_test SET non_indexed_col = 'updated_s4' WHERE id = 2; }
+step s4_commit { COMMIT; }
+step s4_select { SELECT * FROM hot_test WHERE id = 2; }
+step s4_verify_hot {
+ -- Check for HOT chain on page 0
+ SELECT COUNT(*) > 0 AS has_hot_chain
+ FROM heap_page_items(get_raw_page('hot_test', 0))
+ WHERE lp_flags = 2
+ OR (t_ctid IS NOT NULL
+ AND (t_ctid::text::point)[0]::int = 0
+ AND t_ctid != ('(0,' || lp || ')')::tid);
+}
+
+# Two sessions both doing HOT updates on same row
+# Second session should block until first commits
+# Both should create HOT chains
+permutation s1_begin s1_hot_update s2_begin s2_hot_update s1_commit s2_commit s1_select s2_select s2_verify_hot
+
+# HOT update followed by non-HOT update
+# Non-HOT update should wait for HOT update to commit
+# First update is HOT, second is non-HOT (index updated)
+permutation s1_begin s1_hot_update s3_begin s3_non_hot_update s1_commit s3_commit s3_select s3_verify_index
+
+# Non-HOT update followed by HOT update
+# HOT update should wait for non-HOT update to commit
+# First update is non-HOT (index), second is HOT
+permutation s3_begin s3_non_hot_update s1_begin s1_hot_update s3_commit s1_commit s1_select s1_verify_hot
+
+# Concurrent HOT updates on different rows (should not block)
+# Both sessions should be able to create HOT chains independently
+permutation s1_begin s1_hot_update s4_begin s4_hot_update_row2 s1_commit s4_commit s1_select s4_select s1_verify_hot s4_verify_hot
diff --git a/src/test/isolation/specs/hot_updates_index_scan.spec b/src/test/isolation/specs/hot_updates_index_scan.spec
new file mode 100644
index 00000000000..70c3dae5166
--- /dev/null
+++ b/src/test/isolation/specs/hot_updates_index_scan.spec
@@ -0,0 +1,94 @@
+# Test HOT updates interaction with index scans and SELECT FOR UPDATE
+#
+# This test verifies that HOT updates are correctly handled when concurrent
+# sessions are performing index scans, using SELECT FOR UPDATE, and validates
+# HOT chains using heap_page_items().
+
+setup
+{
+ CREATE EXTENSION IF NOT EXISTS pageinspect;
+
+ CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ indexed_col int,
+ non_indexed_col text
+ );
+
+ CREATE INDEX hot_test_indexed_idx ON hot_test(indexed_col);
+
+ INSERT INTO hot_test SELECT i, i * 10, 'initial' || i FROM generate_series(1, 100) i;
+}
+
+teardown
+{
+ DROP TABLE hot_test;
+ DROP EXTENSION pageinspect;
+}
+
+# Session 1: Perform HOT update
+session s1
+step s1_begin { BEGIN; }
+step s1_hot_update { UPDATE hot_test SET non_indexed_col = 'hot_updated' WHERE id = 50; }
+step s1_non_hot_update { UPDATE hot_test SET indexed_col = 555 WHERE id = 50; }
+step s1_commit { COMMIT; }
+step s1_verify_hot {
+ -- Verify HOT chain exists for row with id=50
+ -- Use actual ctid to find the correct page
+ SELECT EXISTS (
+ SELECT 1 FROM heap_page_items(
+ get_raw_page('hot_test', (SELECT (ctid::text::point)[0]::int FROM hot_test WHERE id = 50))
+ )
+ WHERE lp_flags = 2
+ OR (t_ctid IS NOT NULL
+ AND t_ctid != ('(' || (SELECT (ctid::text::point)[0]::int FROM hot_test WHERE id = 50) || ',' || lp || ')')::tid
+ AND (t_ctid::text::point)[0]::int = (SELECT (ctid::text::point)[0]::int FROM hot_test WHERE id = 50))
+ ) AS has_hot_chain;
+}
+
+# Session 2: Index scan while HOT update in progress
+session s2
+step s2_begin { BEGIN; }
+step s2_index_scan { SELECT * FROM hot_test WHERE indexed_col = 500; }
+step s2_index_scan_new { SELECT * FROM hot_test WHERE indexed_col = 555; }
+step s2_commit { COMMIT; }
+step s2_verify_index {
+ -- After non-HOT update, verify index reflects the change
+ SELECT COUNT(*) = 1 AS found_new_value FROM hot_test WHERE indexed_col = 555;
+ SELECT COUNT(*) = 0 AS old_value_gone FROM hot_test WHERE indexed_col = 500;
+}
+
+# Session 3: SELECT FOR UPDATE
+session s3
+step s3_begin { BEGIN; }
+step s3_select_for_update { SELECT * FROM hot_test WHERE id = 50 FOR UPDATE; }
+step s3_commit { COMMIT; }
+
+# Session 4: SELECT FOR KEY SHARE (should not block HOT update of non-key column)
+session s4
+step s4_begin { BEGIN; }
+step s4_select_for_key_share { SELECT * FROM hot_test WHERE id = 50 FOR KEY SHARE; }
+step s4_commit { COMMIT; }
+
+# Index scan should see consistent snapshot during HOT update
+# Index scan starts before HOT update commits
+permutation s1_begin s1_hot_update s2_begin s2_index_scan s1_commit s2_commit
+
+# Index scan after non-HOT update should see new index entry
+# Index scan starts after non-HOT update commits
+permutation s1_begin s1_non_hot_update s1_commit s2_begin s2_index_scan_new s2_commit s2_verify_index
+
+# SELECT FOR UPDATE blocks HOT update
+# FOR UPDATE should block the UPDATE until SELECT commits
+permutation s3_begin s3_select_for_update s1_begin s1_hot_update s3_commit s1_commit s1_verify_hot
+
+# HOT update blocks SELECT FOR UPDATE
+# SELECT FOR UPDATE should wait for HOT update to commit
+permutation s1_begin s1_hot_update s3_begin s3_select_for_update s1_commit s3_commit
+
+# SELECT FOR KEY SHARE should not block HOT update (non-key column)
+# HOT update of non-indexed column should not conflict with FOR KEY SHARE
+permutation s4_begin s4_select_for_key_share s1_begin s1_hot_update s4_commit s1_commit s1_verify_hot
+
+# Non-HOT update (key column) should block after FOR KEY SHARE
+# Non-HOT update of indexed column should wait for FOR KEY SHARE
+permutation s4_begin s4_select_for_key_share s1_begin s1_non_hot_update s4_commit s1_commit
diff --git a/src/test/regress/expected/hot_updates.out b/src/test/regress/expected/hot_updates.out
new file mode 100644
index 00000000000..e99a51966ce
--- /dev/null
+++ b/src/test/regress/expected/hot_updates.out
@@ -0,0 +1,950 @@
+-- Load required extensions
+CREATE EXTENSION IF NOT EXISTS pageinspect;
+-- Function to get HOT update count
+CREATE OR REPLACE FUNCTION get_hot_count(rel_name text)
+RETURNS TABLE (
+ updates BIGINT,
+ hot BIGINT
+) AS $$
+DECLARE
+ rel_oid oid;
+BEGIN
+ rel_oid := rel_name::regclass::oid;
+
+ -- Read both committed and transaction-local stats
+ -- In autocommit mode (default for regression tests), this works correctly
+ -- Note: In explicit transactions (BEGIN/COMMIT), committed stats already
+ -- include flushed updates, so this would double-count. For explicit
+ -- transaction testing, call pg_stat_force_next_flush() before this function.
+ updates := COALESCE(pg_stat_get_tuples_updated(rel_oid), 0) +
+ COALESCE(pg_stat_get_xact_tuples_updated(rel_oid), 0);
+ hot := COALESCE(pg_stat_get_tuples_hot_updated(rel_oid), 0) +
+ COALESCE(pg_stat_get_xact_tuples_hot_updated(rel_oid), 0);
+
+ RETURN NEXT;
+END;
+$$ LANGUAGE plpgsql;
+-- Check if a tuple is part of a HOT chain (has a predecessor on same page)
+CREATE OR REPLACE FUNCTION has_hot_chain(rel_name text, target_ctid tid)
+RETURNS boolean AS $$
+DECLARE
+ block_num int;
+ page_item record;
+BEGIN
+ block_num := (target_ctid::text::point)[0]::int;
+
+ -- Look for a different tuple on the same page that points to our target tuple
+ FOR page_item IN
+ SELECT lp, lp_flags, t_ctid
+ FROM heap_page_items(get_raw_page(rel_name, block_num))
+ WHERE lp_flags = 1
+ AND t_ctid IS NOT NULL
+ AND t_ctid = target_ctid
+ AND ('(' || block_num::text || ',' || lp::text || ')')::tid != target_ctid
+ LOOP
+ RETURN true;
+ END LOOP;
+
+ RETURN false;
+END;
+$$ LANGUAGE plpgsql;
+-- Print the HOT chain starting from a given tuple
+CREATE OR REPLACE FUNCTION print_hot_chain(rel_name text, start_ctid tid)
+RETURNS TABLE(chain_position int, ctid tid, lp_flags text, t_ctid tid, chain_end boolean) AS
+$$
+#variable_conflict use_column
+DECLARE
+ block_num int;
+ line_ptr int;
+ current_ctid tid := start_ctid;
+ next_ctid tid;
+ position int := 0;
+ max_iterations int := 100;
+ page_item record;
+ found_predecessor boolean := false;
+ flags_name text;
+BEGIN
+ block_num := (start_ctid::text::point)[0]::int;
+
+ -- Find the predecessor (old tuple pointing to our start_ctid)
+ FOR page_item IN
+ SELECT lp, lp_flags, t_ctid
+ FROM heap_page_items(get_raw_page(rel_name, block_num))
+ WHERE lp_flags = 1
+ AND t_ctid = start_ctid
+ LOOP
+ current_ctid := ('(' || block_num::text || ',' || page_item.lp::text || ')')::tid;
+ found_predecessor := true;
+ EXIT;
+ END LOOP;
+
+ -- If no predecessor found, start with the given ctid
+ IF NOT found_predecessor THEN
+ current_ctid := start_ctid;
+ END IF;
+
+ -- Follow the chain forward
+ WHILE position < max_iterations LOOP
+ line_ptr := (current_ctid::text::point)[1]::int;
+
+ FOR page_item IN
+ SELECT lp, lp_flags, t_ctid
+ FROM heap_page_items(get_raw_page(rel_name, block_num))
+ WHERE lp = line_ptr
+ LOOP
+ -- Map lp_flags to names
+ flags_name := CASE page_item.lp_flags
+ WHEN 0 THEN 'unused (0)'
+ WHEN 1 THEN 'normal (1)'
+ WHEN 2 THEN 'redirect (2)'
+ WHEN 3 THEN 'dead (3)'
+ ELSE 'unknown (' || page_item.lp_flags::text || ')'
+ END;
+
+ RETURN QUERY SELECT
+ position,
+ current_ctid,
+ flags_name,
+ page_item.t_ctid,
+ (page_item.t_ctid IS NULL OR page_item.t_ctid = current_ctid)::boolean
+ ;
+
+ IF page_item.t_ctid IS NULL OR page_item.t_ctid = current_ctid THEN
+ RETURN;
+ END IF;
+
+ next_ctid := page_item.t_ctid;
+
+ IF (next_ctid::text::point)[0]::int != block_num THEN
+ RETURN;
+ END IF;
+
+ current_ctid := next_ctid;
+ position := position + 1;
+ END LOOP;
+
+ IF position = 0 THEN
+ RETURN;
+ END IF;
+ END LOOP;
+END;
+$$ LANGUAGE plpgsql;
+-- Basic HOT update functionality
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ indexed_col int,
+ non_indexed_col text
+) USING heap WITH (fillfactor = 50);
+CREATE INDEX hot_test_indexed_idx ON hot_test(indexed_col);
+INSERT INTO hot_test VALUES (1, 100, 'initial');
+INSERT INTO hot_test VALUES (2, 200, 'initial');
+INSERT INTO hot_test VALUES (3, 300, 'initial');
+-- Get baseline
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 0 | 0
+(1 row)
+
+-- Should be HOT updates (only non-indexed column modified)
+UPDATE hot_test SET non_indexed_col = 'updated1' WHERE id = 1;
+UPDATE hot_test SET non_indexed_col = 'updated2' WHERE id = 2;
+UPDATE hot_test SET non_indexed_col = 'updated3' WHERE id = 3;
+-- Verify HOT updates occurred
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 3 | 3
+(1 row)
+
+-- Dump the HOT chain for tuple with id == 1
+WITH current_tuple AS (
+ SELECT ctid FROM hot_test WHERE id = 1
+)
+SELECT
+ has_hot_chain('hot_test', current_tuple.ctid) AS has_chain,
+ chain_position,
+ print_hot_chain.ctid,
+ lp_flags,
+ t_ctid
+FROM current_tuple,
+LATERAL print_hot_chain('hot_test', current_tuple.ctid);
+ has_chain | chain_position | ctid | lp_flags | t_ctid
+-----------+----------------+-------+------------+--------
+ t | 0 | (0,1) | normal (1) | (0,4)
+ t | 1 | (0,4) | normal (1) | (0,4)
+(2 rows)
+
+-- Trigger optimistic heap page pruning
+SELECT ctid, * FROM hot_test;
+ ctid | id | indexed_col | non_indexed_col
+-------+----+-------------+-----------------
+ (0,4) | 1 | 100 | updated1
+ (0,5) | 2 | 200 | updated2
+ (0,6) | 3 | 300 | updated3
+(3 rows)
+
+-- Dump the HOT chain after prune
+WITH current_tuple AS (
+ SELECT ctid FROM hot_test WHERE id = 1
+)
+SELECT
+ has_hot_chain('hot_test', current_tuple.ctid) AS has_chain,
+ chain_position,
+ print_hot_chain.ctid,
+ lp_flags,
+ t_ctid
+FROM current_tuple,
+LATERAL print_hot_chain('hot_test', current_tuple.ctid);
+ has_chain | chain_position | ctid | lp_flags | t_ctid
+-----------+----------------+-------+------------+--------
+ t | 0 | (0,1) | normal (1) | (0,4)
+ t | 1 | (0,4) | normal (1) | (0,4)
+(2 rows)
+
+SET SESSION enable_seqscan = OFF;
+SET SESSION enable_bitmapscan = OFF;
+-- Verify indexes still work
+EXPLAIN (COSTS OFF) SELECT id, indexed_col FROM hot_test WHERE indexed_col = 100;
+ QUERY PLAN
+---------------------------------------------------
+ Index Scan using hot_test_indexed_idx on hot_test
+ Index Cond: (indexed_col = 100)
+(2 rows)
+
+SELECT id, indexed_col FROM hot_test WHERE indexed_col = 100;
+ id | indexed_col
+----+-------------
+ 1 | 100
+(1 row)
+
+-- Vacuum the relation, expect the HOT chain to collapse
+VACUUM hot_test;
+-- Show that there is no chain after vacuum
+WITH current_tuple AS (
+ SELECT ctid FROM hot_test WHERE id = 1
+)
+SELECT
+ has_hot_chain('hot_test', current_tuple.ctid) AS has_chain,
+ chain_position,
+ print_hot_chain.ctid,
+ lp_flags,
+ t_ctid
+FROM current_tuple,
+LATERAL print_hot_chain('hot_test', current_tuple.ctid);
+ has_chain | chain_position | ctid | lp_flags | t_ctid
+-----------+----------------+-------+------------+--------
+ f | 0 | (0,4) | normal (1) | (0,4)
+(1 row)
+
+-- Non-HOT update (update indexed column)
+UPDATE hot_test SET indexed_col = 150 WHERE id = 1;
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 4 | 3
+(1 row)
+
+-- Verify index was updated (new value findable)
+EXPLAIN (COSTS OFF) SELECT id, indexed_col FROM hot_test WHERE indexed_col = 150;
+ QUERY PLAN
+---------------------------------------------------
+ Index Scan using hot_test_indexed_idx on hot_test
+ Index Cond: (indexed_col = 150)
+(2 rows)
+
+SELECT id, indexed_col FROM hot_test WHERE indexed_col = 150;
+ id | indexed_col
+----+-------------
+ 1 | 150
+(1 row)
+
+-- Verify old value no longer in index
+EXPLAIN (COSTS OFF) SELECT id FROM hot_test WHERE indexed_col = 100;
+ QUERY PLAN
+---------------------------------------------------
+ Index Scan using hot_test_indexed_idx on hot_test
+ Index Cond: (indexed_col = 100)
+(2 rows)
+
+SELECT id FROM hot_test WHERE indexed_col = 100;
+ id
+----
+(0 rows)
+
+SET SESSION enable_seqscan = ON;
+SET SESSION enable_bitmapscan = ON;
+-- All-or-none property: updating one indexed column requires ALL index updates
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ col_a int,
+ col_b int,
+ col_c int,
+ non_indexed text
+) USING heap WITH (fillfactor = 50);
+CREATE INDEX hot_test_a_idx ON hot_test(col_a);
+CREATE INDEX hot_test_b_idx ON hot_test(col_b);
+CREATE INDEX hot_test_c_idx ON hot_test(col_c);
+INSERT INTO hot_test VALUES (1, 10, 20, 30, 'initial');
+-- Update only col_a - should NOT be HOT because an indexed column changed
+-- This means ALL indexes must be updated (all-or-none property)
+UPDATE hot_test SET col_a = 15 WHERE id = 1;
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 1 | 0
+(1 row)
+
+-- Verify all three indexes still work correctly
+SELECT id, col_a FROM hot_test WHERE col_a = 15; -- updated index
+ id | col_a
+----+-------
+ 1 | 15
+(1 row)
+
+SELECT id, col_b FROM hot_test WHERE col_b = 20; -- unchanged index
+ id | col_b
+----+-------
+ 1 | 20
+(1 row)
+
+SELECT id, col_c FROM hot_test WHERE col_c = 30; -- unchanged index
+ id | col_c
+----+-------
+ 1 | 30
+(1 row)
+
+-- Now update only non-indexed column - should be HOT
+UPDATE hot_test SET non_indexed = 'updated';
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 2 | 1
+(1 row)
+
+-- Verify all indexes still work
+SELECT id FROM hot_test WHERE col_a = 15 AND col_b = 20 AND col_c = 30;
+ id
+----
+ 1
+(1 row)
+
+-- Partial index: both old and new outside predicate (conservative = non-HOT)
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ status text,
+ data text
+) WITH (fillfactor = 50);
+-- Partial index only covers status = 'active'
+CREATE INDEX hot_test_active_idx ON hot_test(status) WHERE status = 'active';
+INSERT INTO hot_test VALUES (1, 'active', 'data1');
+INSERT INTO hot_test VALUES (2, 'inactive', 'data2');
+INSERT INTO hot_test VALUES (3, 'deleted', 'data3');
+-- Update non-indexed column on 'active' row (in predicate, status unchanged)
+-- Should be HOT
+UPDATE hot_test SET data = 'updated1' WHERE id = 1;
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 1 | 1
+(1 row)
+
+-- Update non-indexed column on 'inactive' row (outside predicate)
+-- Should be HOT
+UPDATE hot_test SET data = 'updated2' WHERE id = 2;
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 2 | 2
+(1 row)
+
+-- Update status from 'inactive' to 'deleted' (both outside predicate)
+-- PostgreSQL is conservative: heap insert happens before predicate check
+-- So this is NON-HOT even though both values are outside predicate
+UPDATE hot_test SET status = 'deleted' WHERE id = 2;
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 3 | 2
+(1 row)
+
+-- Verify index still works for 'active' rows
+SELECT id, status FROM hot_test WHERE status = 'active';
+ id | status
+----+--------
+ 1 | active
+(1 row)
+
+-- Only BRIN (summarizing) indexes on non-PK columns
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ ts timestamp,
+ value int,
+ brin_col int
+) WITH (fillfactor = 50);
+CREATE INDEX hot_test_ts_brin ON hot_test USING brin(ts);
+CREATE INDEX hot_test_brin_col_brin ON hot_test USING brin(brin_col);
+INSERT INTO hot_test VALUES (1, '2024-01-01', 100, 1000);
+-- Update both BRIN columns - should still be HOT (only summarizing indexes)
+UPDATE hot_test SET ts = '2024-01-02', brin_col = 2000 WHERE id = 1;
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 1 | 1
+(1 row)
+
+-- Verify BRIN indexes work
+SELECT id FROM hot_test WHERE ts >= '2024-01-02';
+ id
+----
+ 1
+(1 row)
+
+SELECT id FROM hot_test WHERE brin_col >= 2000;
+ id
+----
+ 1
+(1 row)
+
+-- Update non-indexed column - should also be HOT
+UPDATE hot_test SET value = 200 WHERE id = 1;
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 2 | 2
+(1 row)
+
+-- Unique constraint (unique index) behaves like regular index
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ unique_col int UNIQUE,
+ data text
+) WITH (fillfactor = 50);
+INSERT INTO hot_test VALUES (1, 100, 'data1');
+INSERT INTO hot_test VALUES (2, 200, 'data2');
+-- Update data (non-indexed) - should be HOT
+UPDATE hot_test SET data = 'updated';
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 2 | 2
+(1 row)
+
+-- Verify unique constraint still enforced
+SELECT id, unique_col, data FROM hot_test ORDER BY id;
+ id | unique_col | data
+----+------------+---------
+ 1 | 100 | updated
+ 2 | 200 | updated
+(2 rows)
+
+-- This should fail (unique violation)
+UPDATE hot_test SET unique_col = 100 WHERE id = 2;
+ERROR: duplicate key value violates unique constraint "hot_test_unique_col_key"
+DETAIL: Key (unique_col)=(100) already exists.
+-- Multi-column index: any column change = non-HOT
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ col_a int,
+ col_b int,
+ col_c int,
+ col_d int
+) WITH (fillfactor = 50);
+CREATE INDEX hot_test_ab_idx ON hot_test(col_a, col_b);
+CREATE INDEX hot_test_ab_inc_c_idx ON hot_test(col_a, col_b) INCLUDE(col_c);
+INSERT INTO hot_test VALUES (1, 10, 20, 30, 40);
+-- Update col_a (part of multi-column index) - should NOT be HOT
+UPDATE hot_test SET col_a = 15;
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 1 | 0
+(1 row)
+
+-- Update col_b (part of multi-column index) - should NOT be HOT
+UPDATE hot_test SET col_b = 25;
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 2 | 0
+(1 row)
+
+-- Update col_c (not indexed, but included) - should NOT be HOT
+UPDATE hot_test SET col_c = 35;
+-- Verify multi-column index-only scan for included columns works
+EXPLAIN (COSTS OFF) SELECT col_c FROM hot_test WHERE col_a = 15 AND col_b = 25;
+ QUERY PLAN
+---------------------------------------------------------
+ Index Only Scan using hot_test_ab_inc_c_idx on hot_test
+ Index Cond: ((col_a = 15) AND (col_b = 25))
+(2 rows)
+
+SELECT col_c FROM hot_test WHERE col_a = 15 AND col_b = 25;
+ col_c
+-------
+ 35
+(1 row)
+
+-- ============================================================================
+-- Expression indexes with JSONB
+-- ============================================================================
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ data jsonb
+) USING heap WITH(fillfactor = 50);
+-- Indexes on specific JSONB paths
+CREATE INDEX hot_test_status_idx ON hot_test((data->'status'));
+CREATE INDEX hot_test_user_id_idx ON hot_test((data->'user'->'id'));
+INSERT INTO hot_test VALUES (
+ 1,
+ '{"status": "active", "user": {"id": 123, "name": "Alice"}, "count": 0}'::jsonb
+);
+-- Baseline
+SELECT 'Baseline' AS test, * FROM get_hot_count('hot_test');
+ test | updates | hot
+----------+---------+-----
+ Baseline | 0 | 0
+(1 row)
+
+-- Update non-indexed path {count} - should NOT be HOT
+UPDATE hot_test SET data = jsonb_set(data, '{count}', '1') WHERE id = 1;
+SELECT 'After updating count (non-indexed)' AS test, * FROM get_hot_count('hot_test');
+ test | updates | hot
+------------------------------------+---------+-----
+ After updating count (non-indexed) | 1 | 0
+(1 row)
+
+-- Update different non-indexed path {user,name} - should NOT be HOT
+UPDATE hot_test SET data = jsonb_set(data, '{user,name}', '"Bob"') WHERE id = 1;
+SELECT 'After updating user.name (non-indexed)' AS test, * FROM get_hot_count('hot_test');
+ test | updates | hot
+----------------------------------------+---------+-----
+ After updating user.name (non-indexed) | 2 | 0
+(1 row)
+
+-- Update indexed path {status} - should NOT be HOT
+UPDATE hot_test SET data = jsonb_set(data, '{status}', '"inactive"') WHERE id = 1;
+SELECT 'After updating status (indexed)' AS test, * FROM get_hot_count('hot_test');
+ test | updates | hot
+---------------------------------+---------+-----
+ After updating status (indexed) | 3 | 0
+(1 row)
+
+-- Update indexed path {user,id} - should NOT be HOT
+UPDATE hot_test SET data = jsonb_set(data, '{user,id}', '456') WHERE id = 1;
+SELECT 'After updating user.id (indexed)' AS test, * FROM get_hot_count('hot_test');
+ test | updates | hot
+----------------------------------+---------+-----
+ After updating user.id (indexed) | 4 | 0
+(1 row)
+
+-- Verify indexes still work correctly
+SELECT id FROM hot_test WHERE data->'status' = '"inactive"'::jsonb;
+ id
+----
+ 1
+(1 row)
+
+SELECT id FROM hot_test WHERE data->'user'->'id' = '456'::jsonb;
+ id
+----
+ 1
+(1 row)
+
+-- ============================================================================
+-- Nested paths and path intersection
+-- ============================================================================
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ data jsonb
+) USING heap WITH(fillfactor = 50);
+CREATE INDEX hot_test_deep_idx ON hot_test((data->'a'->'b'->'c'));
+INSERT INTO hot_test VALUES (
+ 1,
+ '{"a": {"b": {"c": "indexed", "d": "not-indexed"}}, "x": "other"}'::jsonb
+);
+SELECT 'Baseline' AS test, * FROM get_hot_count('hot_test');
+ test | updates | hot
+----------+---------+-----
+ Baseline | 0 | 0
+(1 row)
+
+-- Update sibling of indexed path {a,b,d} - should NOT be HOT
+UPDATE hot_test SET data = jsonb_set(data, '{a,b,d}', '"updated"') WHERE id = 1;
+SELECT 'After updating a.b.d (sibling, non-indexed)' AS test, * FROM get_hot_count('hot_test');
+ test | updates | hot
+---------------------------------------------+---------+-----
+ After updating a.b.d (sibling, non-indexed) | 1 | 0
+(1 row)
+
+-- Update unrelated path {x} - should NOT be HOT
+UPDATE hot_test SET data = jsonb_set(data, '{x}', '"modified"') WHERE id = 1;
+SELECT 'After updating x (unrelated path)' AS test, * FROM get_hot_count('hot_test');
+ test | updates | hot
+-----------------------------------+---------+-----
+ After updating x (unrelated path) | 2 | 0
+(1 row)
+
+-- Update parent of indexed path {a,b} - should NOT be HOT (affects child)
+UPDATE hot_test SET data = jsonb_set(data, '{a,b}', '{"c": "new", "d": "data"}') WHERE id = 1;
+SELECT 'After updating a.b (parent of indexed)' AS test, * FROM get_hot_count('hot_test');
+ test | updates | hot
+----------------------------------------+---------+-----
+ After updating a.b (parent of indexed) | 3 | 0
+(1 row)
+
+-- ============================================================================
+-- Multiple JSONB mutation functions
+-- ============================================================================
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ data jsonb
+) USING heap WITH(fillfactor = 50);
+CREATE INDEX hot_test_keep_idx ON hot_test((data->'keep'));
+INSERT INTO hot_test VALUES (
+ 1,
+ '{"keep": "important", "remove": "unimportant", "extra": "data"}'::jsonb
+);
+SELECT 'Baseline' AS test, * FROM get_hot_count('hot_test');
+ test | updates | hot
+----------+---------+-----
+ Baseline | 0 | 0
+(1 row)
+
+-- jsonb_delete on non-indexed key - should NOT be HOT
+UPDATE hot_test SET data = data - 'remove' WHERE id = 1;
+SELECT 'After deleting non-indexed key' AS test, * FROM get_hot_count('hot_test');
+ test | updates | hot
+--------------------------------+---------+-----
+ After deleting non-indexed key | 1 | 0
+(1 row)
+
+-- jsonb_set on non-indexed key - should NOT be HOT
+UPDATE hot_test SET data = jsonb_set(data, '{extra}', '"modified"') WHERE id = 1;
+SELECT 'After modifying non-indexed key' AS test, * FROM get_hot_count('hot_test');
+ test | updates | hot
+---------------------------------+---------+-----
+ After modifying non-indexed key | 2 | 0
+(1 row)
+
+-- jsonb_delete on indexed key - should NOT be HOT
+UPDATE hot_test SET data = data - 'keep' WHERE id = 1;
+SELECT 'After deleting indexed key' AS test, * FROM get_hot_count('hot_test');
+ test | updates | hot
+----------------------------+---------+-----
+ After deleting indexed key | 3 | 0
+(1 row)
+
+-- ============================================================================
+-- Array operations
+-- ============================================================================
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ data jsonb
+) USING heap WITH(fillfactor = 50);
+-- Index on array element
+CREATE INDEX hot_test_tags_idx ON hot_test((data->'tags'->0));
+INSERT INTO hot_test VALUES (
+ 1,
+ '{"tags": ["indexed", "second", "third"], "other": "data"}'::jsonb
+);
+SELECT 'Baseline' AS test, * FROM get_hot_count('hot_test');
+ test | updates | hot
+----------+---------+-----
+ Baseline | 0 | 0
+(1 row)
+
+-- Update non-indexed array element - should NOT be HOT
+UPDATE hot_test SET data = jsonb_set(data, '{tags,1}', '"modified"') WHERE id = 1;
+SELECT 'After updating tags[1]' AS test, * FROM get_hot_count('hot_test');
+ test | updates | hot
+------------------------+---------+-----
+ After updating tags[1] | 1 | 0
+(1 row)
+
+-- Update indexed array element - should NOT be HOT
+UPDATE hot_test SET data = jsonb_set(data, '{tags,0}', '"changed"') WHERE id = 1;
+SELECT 'After updating tags[0] (indexed)' AS test, * FROM get_hot_count('hot_test');
+ test | updates | hot
+----------------------------------+---------+-----
+ After updating tags[0] (indexed) | 2 | 0
+(1 row)
+
+-- ============================================================================
+-- Whole column index
+-- ============================================================================
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ data jsonb
+) USING heap WITH(fillfactor = 50);
+-- Index on entire JSONB column, and a path extraction
+CREATE INDEX hot_test_whole_idx ON hot_test(data);
+CREATE INDEX hot_test_tags_idx ON hot_test((data->'a'));
+INSERT INTO hot_test VALUES (1, '{"a": 1, "b": 1}'::jsonb);
+SELECT 'Baseline' AS test, * FROM get_hot_count('hot_test');
+ test | updates | hot
+----------+---------+-----
+ Baseline | 0 | 0
+(1 row)
+
+-- Any modification to data - should NOT be HOT (whole column indexed)
+UPDATE hot_test SET data = jsonb_set(data, '{b}', '2') WHERE id = 1;
+SELECT 'After modifying any field (whole column indexed)' AS test, * FROM get_hot_count('hot_test');
+ test | updates | hot
+--------------------------------------------------+---------+-----
+ After modifying any field (whole column indexed) | 1 | 0
+(1 row)
+
+-- ============================================================================
+-- Performance at scale
+-- ============================================================================
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ data jsonb
+) USING heap WITH(fillfactor=50);
+CREATE INDEX hot_test_status_idx ON hot_test((data->'status'));
+CREATE INDEX hot_test_priority_idx ON hot_test((data->'priority'));
+-- Insert 10000 rows
+INSERT INTO hot_test
+SELECT i, jsonb_build_object(
+ 'status', 'active',
+ 'priority', 1,
+ 'count', 0,
+ 'data', 'value_' || i
+)
+FROM generate_series(1, 10000) i;
+SELECT 'Baseline (10000 rows)' AS test, * FROM get_hot_count('hot_test');
+ test | updates | hot
+-----------------------+---------+-----
+ Baseline (10000 rows) | 0 | 0
+(1 row)
+
+-- Update non-indexed fields on all rows - should NOT be HOT
+UPDATE hot_test SET data = jsonb_set(data, '{count}', to_jsonb((data->>'count')::int + 1));
+SELECT 'After updating 10000 rows (non-indexed)' AS test, * FROM get_hot_count('hot_test');
+ test | updates | hot
+-----------------------------------------+---------+-----
+ After updating 10000 rows (non-indexed) | 10000 | 0
+(1 row)
+
+-- Verify correctness
+SELECT COUNT(*) AS rows_with_count_1 FROM hot_test WHERE (data->>'count')::int = 1;
+ rows_with_count_1
+-------------------
+ 10000
+(1 row)
+
+-- Update indexed field on subset - should NOT be HOT for those rows
+UPDATE hot_test SET data = jsonb_set(data, '{status}', '"inactive"')
+WHERE id <= 10;
+SELECT 'After updating 10 rows (indexed)' AS test, * FROM get_hot_count('hot_test');
+ test | updates | hot
+----------------------------------+---------+-----
+ After updating 10 rows (indexed) | 10010 | 0
+(1 row)
+
+-- Verify indexes work
+SELECT COUNT(*) FROM hot_test WHERE data->>'status' = 'inactive';
+ count
+-------
+ 10
+(1 row)
+
+SELECT COUNT(*) FROM hot_test WHERE data->>'status' = 'active';
+ count
+-------
+ 9990
+(1 row)
+
+-- Only BRIN (summarizing) indexes on non-PK columns
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ ts timestamp,
+ value int,
+ brin_col int
+) USING heap WITH(fillfactor = 50);
+CREATE INDEX hot_test_ts_brin ON hot_test USING brin(ts);
+CREATE INDEX hot_test_brin_col_brin ON hot_test USING brin(brin_col);
+INSERT INTO hot_test VALUES (1, '2024-01-01', 100, 1000);
+-- Update both BRIN columns - should still be HOT (only summarizing indexes)
+UPDATE hot_test SET ts = '2024-01-02', brin_col = 2000 WHERE id = 1;
+SELECT 'After updating ts, brin_col (summarizing-only)' AS test, * FROM get_hot_count('hot_test');
+ test | updates | hot
+------------------------------------------------+---------+-----
+ After updating ts, brin_col (summarizing-only) | 1 | 1
+(1 row)
+
+-- Verify BRIN indexes work
+SELECT id FROM hot_test WHERE ts >= '2024-01-02';
+ id
+----
+ 1
+(1 row)
+
+SELECT id FROM hot_test WHERE brin_col >= 2000;
+ id
+----
+ 1
+(1 row)
+
+-- TOASTed columns can participate in HOT
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ large_text text
+) USING heap WITH(fillfactor = 50);
+CREATE INDEX hot_test_idx ON hot_test(large_text);
+-- Insert row with TOASTed column (> 2KB)
+INSERT INTO hot_test VALUES (1, repeat('x', 3000));
+-- Update TOASTed column - should NOT be HOT
+UPDATE hot_test SET large_text = repeat('y', 3000);
+SELECT 'After updating large_text (TOASTed)' AS test, * FROM get_hot_count('hot_test');
+ test | updates | hot
+-------------------------------------+---------+-----
+ After updating large_text (TOASTed) | 1 | 0
+(1 row)
+
+-- Partitioned tables: HOT works within partitions
+CREATE TABLE hot_test_partitioned (
+ id int,
+ partition_key int,
+ indexed_col int,
+ data text,
+ PRIMARY KEY (id, partition_key)
+) PARTITION BY RANGE (partition_key);
+CREATE TABLE hot_test_part1 PARTITION OF hot_test_partitioned
+ FOR VALUES FROM (1) TO (100);
+CREATE TABLE hot_test_part2 PARTITION OF hot_test_partitioned
+ FOR VALUES FROM (100) TO (200);
+CREATE INDEX hot_test_partitioned_idx ON hot_test_partitioned(indexed_col);
+CREATE INDEX hot_test_part2_data ON hot_test_part2(data);
+INSERT INTO hot_test_partitioned VALUES (1, 50, 100, 'initial1');
+INSERT INTO hot_test_partitioned VALUES (2, 150, 200, 'initial2');
+-- Update in partition 1 (non-indexed column) - should be HOT
+UPDATE hot_test_partitioned SET data = 'UPDATED' WHERE id = 1;
+SELECT 'After updating partition 1 data' AS test, * FROM get_hot_count('hot_test_part1');
+ test | updates | hot
+---------------------------------+---------+-----
+ After updating partition 1 data | 1 | 1
+(1 row)
+
+-- Update in partition 2 (indexed column) - should NOT be HOT
+UPDATE hot_test_partitioned SET data = 'UPDATED' WHERE id = 2;
+SELECT 'After updating large_text (TOASTed)' AS test, * FROM get_hot_count('hot_test_part2');
+ test | updates | hot
+-------------------------------------+---------+-----
+ After updating large_text (TOASTed) | 1 | 0
+(1 row)
+
+-- Verify indexes work on partitions
+SELECT id FROM hot_test_partitioned WHERE indexed_col = 100;
+ id
+----
+ 1
+(1 row)
+
+SELECT id FROM hot_test_partitioned WHERE indexed_col = 200;
+ id
+----
+ 2
+(1 row)
+
+-- Update indexed column in partition - should NOT be HOT
+-- Partition 1 previously had 1 update and 1 HOT update, this should
+-- change that to 2 updates and 1 HOT update.
+UPDATE hot_test_partitioned SET indexed_col = 150 WHERE id = 1;
+SELECT 'After updating indexed_col' AS test, * FROM get_hot_count('hot_test_part1');
+ test | updates | hot
+----------------------------+---------+-----
+ After updating indexed_col | 2 | 1
+(1 row)
+
+-- ============================================================================
+-- Partial indexes with complex predicates on JSONB
+-- ============================================================================
+-- Test partial indexes with WHERE clauses on JSONB expressions.
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ data jsonb
+) USING heap WITH(fillfactor = 50);
+-- Partial index: only index status when priority > 5
+CREATE INDEX hot_test_partial_idx ON hot_test((data->'status'))
+ WHERE (data->>'priority')::int > 5;
+INSERT INTO hot_test VALUES (
+ 1,
+ '{"status": "active", "priority": 10, "count": 0}'::jsonb
+);
+INSERT INTO hot_test VALUES (
+ 2,
+ '{"status": "active", "priority": 3, "count": 0}'::jsonb
+);
+SELECT 'Partial Index Test: Baseline' AS test, * FROM get_hot_count('hot_test');
+ test | updates | hot
+------------------------------+---------+-----
+ Partial Index Test: Baseline | 0 | 0
+(1 row)
+
+-- Update non-indexed path on row inside predicate (priority=10 > 5)
+-- Should NOT be HOT despite {count} is not indexed
+UPDATE hot_test SET data = jsonb_set(data, '{count}', '1') WHERE id = 1;
+SELECT 'Partial Index Test: count update, inside predicate' AS test, * FROM get_hot_count('hot_test');
+ test | updates | hot
+----------------------------------------------------+---------+-----
+ Partial Index Test: count update, inside predicate | 1 | 0
+(1 row)
+
+-- Update non-indexed path on row outside predicate (priority=3 <= 5)
+-- Should NOT be HOT dispite {count} is not indexed
+UPDATE hot_test SET data = jsonb_set(data, '{count}', '1') WHERE id = 2;
+SELECT 'Partial Index Test: count update, outside predicate' AS test, * FROM get_hot_count('hot_test');
+ test | updates | hot
+-----------------------------------------------------+---------+-----
+ Partial Index Test: count update, outside predicate | 2 | 0
+(1 row)
+
+-- Update indexed path on row inside predicate (priority=10 > 5)
+-- Should NOT be HOT indexed portion is updated
+UPDATE hot_test SET data = jsonb_set(data, '{status}', '"inactive"') WHERE id = 1;
+SELECT 'Partial Index Test: status update, inside predicate' AS test, * FROM get_hot_count('hot_test');
+ test | updates | hot
+-----------------------------------------------------+---------+-----
+ Partial Index Test: status update, inside predicate | 3 | 0
+(1 row)
+
+-- Update indexed path on row outside predicate (priority=3 <= 5)
+-- PostgreSQL makes a conservative choice and treats it as non-HOT because the
+-- indexed column changed, even though the before/after rows are outside the predicate
+UPDATE hot_test SET data = jsonb_set(data, '{status}', '"inactive"') WHERE id = 2;
+SELECT 'Partial Index Test: status update, outside predicate' AS test, * FROM get_hot_count('hot_test');
+ test | updates | hot
+------------------------------------------------------+---------+-----
+ Partial Index Test: status update, outside predicate | 4 | 0
+(1 row)
+
+-- Verify index works
+SELECT id FROM hot_test WHERE data->'status' = '"inactive"'::jsonb AND (data->>'priority')::int > 5;
+ id
+----
+ 1
+(1 row)
+
+-- ============================================================================
+DROP TABLE IF EXISTS hot_test;
+DROP TABLE IF EXISTS hot_test_partitioned CASCADE;
+DROP FUNCTION IF EXISTS has_hot_chain(text, tid);
+DROP FUNCTION IF EXISTS print_hot_chain(text, tid);
+DROP FUNCTION IF EXISTS get_hot_count(text);
+DROP EXTENSION pageinspect;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 549e9b2d7be..e06247ef7ea 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -137,6 +137,11 @@ test: event_trigger_login
# this test also uses event triggers, so likewise run it by itself
test: fast_default
+# ----------
+# HOT updates tests
+# ----------
+test: hot_updates
+
# run tablespace test at the end because it drops the tablespace created during
# setup that other tests may use.
test: tablespace
diff --git a/src/test/regress/sql/hot_updates.sql b/src/test/regress/sql/hot_updates.sql
new file mode 100644
index 00000000000..34da4552d4f
--- /dev/null
+++ b/src/test/regress/sql/hot_updates.sql
@@ -0,0 +1,692 @@
+-- Load required extensions
+CREATE EXTENSION IF NOT EXISTS pageinspect;
+
+-- Function to get HOT update count
+CREATE OR REPLACE FUNCTION get_hot_count(rel_name text)
+RETURNS TABLE (
+ updates BIGINT,
+ hot BIGINT
+) AS $$
+DECLARE
+ rel_oid oid;
+BEGIN
+ rel_oid := rel_name::regclass::oid;
+
+ -- Read both committed and transaction-local stats
+ -- In autocommit mode (default for regression tests), this works correctly
+ -- Note: In explicit transactions (BEGIN/COMMIT), committed stats already
+ -- include flushed updates, so this would double-count. For explicit
+ -- transaction testing, call pg_stat_force_next_flush() before this function.
+ updates := COALESCE(pg_stat_get_tuples_updated(rel_oid), 0) +
+ COALESCE(pg_stat_get_xact_tuples_updated(rel_oid), 0);
+ hot := COALESCE(pg_stat_get_tuples_hot_updated(rel_oid), 0) +
+ COALESCE(pg_stat_get_xact_tuples_hot_updated(rel_oid), 0);
+
+ RETURN NEXT;
+END;
+$$ LANGUAGE plpgsql;
+
+-- Check if a tuple is part of a HOT chain (has a predecessor on same page)
+CREATE OR REPLACE FUNCTION has_hot_chain(rel_name text, target_ctid tid)
+RETURNS boolean AS $$
+DECLARE
+ block_num int;
+ page_item record;
+BEGIN
+ block_num := (target_ctid::text::point)[0]::int;
+
+ -- Look for a different tuple on the same page that points to our target tuple
+ FOR page_item IN
+ SELECT lp, lp_flags, t_ctid
+ FROM heap_page_items(get_raw_page(rel_name, block_num))
+ WHERE lp_flags = 1
+ AND t_ctid IS NOT NULL
+ AND t_ctid = target_ctid
+ AND ('(' || block_num::text || ',' || lp::text || ')')::tid != target_ctid
+ LOOP
+ RETURN true;
+ END LOOP;
+
+ RETURN false;
+END;
+$$ LANGUAGE plpgsql;
+
+-- Print the HOT chain starting from a given tuple
+CREATE OR REPLACE FUNCTION print_hot_chain(rel_name text, start_ctid tid)
+RETURNS TABLE(chain_position int, ctid tid, lp_flags text, t_ctid tid, chain_end boolean) AS
+$$
+#variable_conflict use_column
+DECLARE
+ block_num int;
+ line_ptr int;
+ current_ctid tid := start_ctid;
+ next_ctid tid;
+ position int := 0;
+ max_iterations int := 100;
+ page_item record;
+ found_predecessor boolean := false;
+ flags_name text;
+BEGIN
+ block_num := (start_ctid::text::point)[0]::int;
+
+ -- Find the predecessor (old tuple pointing to our start_ctid)
+ FOR page_item IN
+ SELECT lp, lp_flags, t_ctid
+ FROM heap_page_items(get_raw_page(rel_name, block_num))
+ WHERE lp_flags = 1
+ AND t_ctid = start_ctid
+ LOOP
+ current_ctid := ('(' || block_num::text || ',' || page_item.lp::text || ')')::tid;
+ found_predecessor := true;
+ EXIT;
+ END LOOP;
+
+ -- If no predecessor found, start with the given ctid
+ IF NOT found_predecessor THEN
+ current_ctid := start_ctid;
+ END IF;
+
+ -- Follow the chain forward
+ WHILE position < max_iterations LOOP
+ line_ptr := (current_ctid::text::point)[1]::int;
+
+ FOR page_item IN
+ SELECT lp, lp_flags, t_ctid
+ FROM heap_page_items(get_raw_page(rel_name, block_num))
+ WHERE lp = line_ptr
+ LOOP
+ -- Map lp_flags to names
+ flags_name := CASE page_item.lp_flags
+ WHEN 0 THEN 'unused (0)'
+ WHEN 1 THEN 'normal (1)'
+ WHEN 2 THEN 'redirect (2)'
+ WHEN 3 THEN 'dead (3)'
+ ELSE 'unknown (' || page_item.lp_flags::text || ')'
+ END;
+
+ RETURN QUERY SELECT
+ position,
+ current_ctid,
+ flags_name,
+ page_item.t_ctid,
+ (page_item.t_ctid IS NULL OR page_item.t_ctid = current_ctid)::boolean
+ ;
+
+ IF page_item.t_ctid IS NULL OR page_item.t_ctid = current_ctid THEN
+ RETURN;
+ END IF;
+
+ next_ctid := page_item.t_ctid;
+
+ IF (next_ctid::text::point)[0]::int != block_num THEN
+ RETURN;
+ END IF;
+
+ current_ctid := next_ctid;
+ position := position + 1;
+ END LOOP;
+
+ IF position = 0 THEN
+ RETURN;
+ END IF;
+ END LOOP;
+END;
+$$ LANGUAGE plpgsql;
+
+-- Basic HOT update functionality
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ indexed_col int,
+ non_indexed_col text
+) USING heap WITH (fillfactor = 50);
+
+CREATE INDEX hot_test_indexed_idx ON hot_test(indexed_col);
+
+INSERT INTO hot_test VALUES (1, 100, 'initial');
+INSERT INTO hot_test VALUES (2, 200, 'initial');
+INSERT INTO hot_test VALUES (3, 300, 'initial');
+
+-- Get baseline
+SELECT * FROM get_hot_count('hot_test');
+
+-- Should be HOT updates (only non-indexed column modified)
+UPDATE hot_test SET non_indexed_col = 'updated1' WHERE id = 1;
+UPDATE hot_test SET non_indexed_col = 'updated2' WHERE id = 2;
+UPDATE hot_test SET non_indexed_col = 'updated3' WHERE id = 3;
+
+-- Verify HOT updates occurred
+SELECT * FROM get_hot_count('hot_test');
+
+-- Dump the HOT chain for tuple with id == 1
+WITH current_tuple AS (
+ SELECT ctid FROM hot_test WHERE id = 1
+)
+SELECT
+ has_hot_chain('hot_test', current_tuple.ctid) AS has_chain,
+ chain_position,
+ print_hot_chain.ctid,
+ lp_flags,
+ t_ctid
+FROM current_tuple,
+LATERAL print_hot_chain('hot_test', current_tuple.ctid);
+
+-- Trigger optimistic heap page pruning
+SELECT ctid, * FROM hot_test;
+
+-- Dump the HOT chain after prune
+WITH current_tuple AS (
+ SELECT ctid FROM hot_test WHERE id = 1
+)
+SELECT
+ has_hot_chain('hot_test', current_tuple.ctid) AS has_chain,
+ chain_position,
+ print_hot_chain.ctid,
+ lp_flags,
+ t_ctid
+FROM current_tuple,
+LATERAL print_hot_chain('hot_test', current_tuple.ctid);
+
+SET SESSION enable_seqscan = OFF;
+SET SESSION enable_bitmapscan = OFF;
+
+-- Verify indexes still work
+EXPLAIN (COSTS OFF) SELECT id, indexed_col FROM hot_test WHERE indexed_col = 100;
+SELECT id, indexed_col FROM hot_test WHERE indexed_col = 100;
+
+-- Vacuum the relation, expect the HOT chain to collapse
+VACUUM hot_test;
+
+-- Show that there is no chain after vacuum
+WITH current_tuple AS (
+ SELECT ctid FROM hot_test WHERE id = 1
+)
+SELECT
+ has_hot_chain('hot_test', current_tuple.ctid) AS has_chain,
+ chain_position,
+ print_hot_chain.ctid,
+ lp_flags,
+ t_ctid
+FROM current_tuple,
+LATERAL print_hot_chain('hot_test', current_tuple.ctid);
+
+-- Non-HOT update (update indexed column)
+UPDATE hot_test SET indexed_col = 150 WHERE id = 1;
+SELECT * FROM get_hot_count('hot_test');
+
+-- Verify index was updated (new value findable)
+EXPLAIN (COSTS OFF) SELECT id, indexed_col FROM hot_test WHERE indexed_col = 150;
+SELECT id, indexed_col FROM hot_test WHERE indexed_col = 150;
+
+-- Verify old value no longer in index
+EXPLAIN (COSTS OFF) SELECT id FROM hot_test WHERE indexed_col = 100;
+SELECT id FROM hot_test WHERE indexed_col = 100;
+
+SET SESSION enable_seqscan = ON;
+SET SESSION enable_bitmapscan = ON;
+
+-- All-or-none property: updating one indexed column requires ALL index updates
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ col_a int,
+ col_b int,
+ col_c int,
+ non_indexed text
+) USING heap WITH (fillfactor = 50);
+
+CREATE INDEX hot_test_a_idx ON hot_test(col_a);
+CREATE INDEX hot_test_b_idx ON hot_test(col_b);
+CREATE INDEX hot_test_c_idx ON hot_test(col_c);
+
+INSERT INTO hot_test VALUES (1, 10, 20, 30, 'initial');
+
+-- Update only col_a - should NOT be HOT because an indexed column changed
+-- This means ALL indexes must be updated (all-or-none property)
+UPDATE hot_test SET col_a = 15 WHERE id = 1;
+SELECT * FROM get_hot_count('hot_test');
+
+-- Verify all three indexes still work correctly
+SELECT id, col_a FROM hot_test WHERE col_a = 15; -- updated index
+SELECT id, col_b FROM hot_test WHERE col_b = 20; -- unchanged index
+SELECT id, col_c FROM hot_test WHERE col_c = 30; -- unchanged index
+
+-- Now update only non-indexed column - should be HOT
+UPDATE hot_test SET non_indexed = 'updated';
+SELECT * FROM get_hot_count('hot_test');
+
+-- Verify all indexes still work
+SELECT id FROM hot_test WHERE col_a = 15 AND col_b = 20 AND col_c = 30;
+
+-- Partial index: both old and new outside predicate (conservative = non-HOT)
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ status text,
+ data text
+) WITH (fillfactor = 50);
+
+-- Partial index only covers status = 'active'
+CREATE INDEX hot_test_active_idx ON hot_test(status) WHERE status = 'active';
+
+INSERT INTO hot_test VALUES (1, 'active', 'data1');
+INSERT INTO hot_test VALUES (2, 'inactive', 'data2');
+INSERT INTO hot_test VALUES (3, 'deleted', 'data3');
+
+-- Update non-indexed column on 'active' row (in predicate, status unchanged)
+-- Should be HOT
+UPDATE hot_test SET data = 'updated1' WHERE id = 1;
+SELECT * FROM get_hot_count('hot_test');
+
+-- Update non-indexed column on 'inactive' row (outside predicate)
+-- Should be HOT
+UPDATE hot_test SET data = 'updated2' WHERE id = 2;
+SELECT * FROM get_hot_count('hot_test');
+
+-- Update status from 'inactive' to 'deleted' (both outside predicate)
+-- PostgreSQL is conservative: heap insert happens before predicate check
+-- So this is NON-HOT even though both values are outside predicate
+UPDATE hot_test SET status = 'deleted' WHERE id = 2;
+SELECT * FROM get_hot_count('hot_test');
+
+-- Verify index still works for 'active' rows
+SELECT id, status FROM hot_test WHERE status = 'active';
+
+-- Only BRIN (summarizing) indexes on non-PK columns
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ ts timestamp,
+ value int,
+ brin_col int
+) WITH (fillfactor = 50);
+
+CREATE INDEX hot_test_ts_brin ON hot_test USING brin(ts);
+CREATE INDEX hot_test_brin_col_brin ON hot_test USING brin(brin_col);
+
+INSERT INTO hot_test VALUES (1, '2024-01-01', 100, 1000);
+
+-- Update both BRIN columns - should still be HOT (only summarizing indexes)
+UPDATE hot_test SET ts = '2024-01-02', brin_col = 2000 WHERE id = 1;
+SELECT * FROM get_hot_count('hot_test');
+
+-- Verify BRIN indexes work
+SELECT id FROM hot_test WHERE ts >= '2024-01-02';
+SELECT id FROM hot_test WHERE brin_col >= 2000;
+
+-- Update non-indexed column - should also be HOT
+UPDATE hot_test SET value = 200 WHERE id = 1;
+SELECT * FROM get_hot_count('hot_test');
+
+-- Unique constraint (unique index) behaves like regular index
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ unique_col int UNIQUE,
+ data text
+) WITH (fillfactor = 50);
+
+INSERT INTO hot_test VALUES (1, 100, 'data1');
+INSERT INTO hot_test VALUES (2, 200, 'data2');
+
+-- Update data (non-indexed) - should be HOT
+UPDATE hot_test SET data = 'updated';
+SELECT * FROM get_hot_count('hot_test');
+
+-- Verify unique constraint still enforced
+SELECT id, unique_col, data FROM hot_test ORDER BY id;
+
+-- This should fail (unique violation)
+UPDATE hot_test SET unique_col = 100 WHERE id = 2;
+
+-- Multi-column index: any column change = non-HOT
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ col_a int,
+ col_b int,
+ col_c int,
+ col_d int
+) WITH (fillfactor = 50);
+
+CREATE INDEX hot_test_ab_idx ON hot_test(col_a, col_b);
+CREATE INDEX hot_test_ab_inc_c_idx ON hot_test(col_a, col_b) INCLUDE(col_c);
+
+INSERT INTO hot_test VALUES (1, 10, 20, 30, 40);
+
+-- Update col_a (part of multi-column index) - should NOT be HOT
+UPDATE hot_test SET col_a = 15;
+SELECT * FROM get_hot_count('hot_test');
+
+-- Update col_b (part of multi-column index) - should NOT be HOT
+UPDATE hot_test SET col_b = 25;
+SELECT * FROM get_hot_count('hot_test');
+
+-- Update col_c (not indexed, but included) - should NOT be HOT
+UPDATE hot_test SET col_c = 35;
+
+-- Verify multi-column index-only scan for included columns works
+EXPLAIN (COSTS OFF) SELECT col_c FROM hot_test WHERE col_a = 15 AND col_b = 25;
+SELECT col_c FROM hot_test WHERE col_a = 15 AND col_b = 25;
+
+-- ============================================================================
+-- Expression indexes with JSONB
+-- ============================================================================
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ data jsonb
+) USING heap WITH(fillfactor = 50);
+
+-- Indexes on specific JSONB paths
+CREATE INDEX hot_test_status_idx ON hot_test((data->'status'));
+CREATE INDEX hot_test_user_id_idx ON hot_test((data->'user'->'id'));
+
+INSERT INTO hot_test VALUES (
+ 1,
+ '{"status": "active", "user": {"id": 123, "name": "Alice"}, "count": 0}'::jsonb
+);
+
+-- Baseline
+SELECT 'Baseline' AS test, * FROM get_hot_count('hot_test');
+
+-- Update non-indexed path {count} - should NOT be HOT
+UPDATE hot_test SET data = jsonb_set(data, '{count}', '1') WHERE id = 1;
+SELECT 'After updating count (non-indexed)' AS test, * FROM get_hot_count('hot_test');
+
+-- Update different non-indexed path {user,name} - should NOT be HOT
+UPDATE hot_test SET data = jsonb_set(data, '{user,name}', '"Bob"') WHERE id = 1;
+SELECT 'After updating user.name (non-indexed)' AS test, * FROM get_hot_count('hot_test');
+
+-- Update indexed path {status} - should NOT be HOT
+UPDATE hot_test SET data = jsonb_set(data, '{status}', '"inactive"') WHERE id = 1;
+SELECT 'After updating status (indexed)' AS test, * FROM get_hot_count('hot_test');
+
+-- Update indexed path {user,id} - should NOT be HOT
+UPDATE hot_test SET data = jsonb_set(data, '{user,id}', '456') WHERE id = 1;
+SELECT 'After updating user.id (indexed)' AS test, * FROM get_hot_count('hot_test');
+
+-- Verify indexes still work correctly
+SELECT id FROM hot_test WHERE data->'status' = '"inactive"'::jsonb;
+SELECT id FROM hot_test WHERE data->'user'->'id' = '456'::jsonb;
+
+-- ============================================================================
+-- Nested paths and path intersection
+-- ============================================================================
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ data jsonb
+) USING heap WITH(fillfactor = 50);
+
+CREATE INDEX hot_test_deep_idx ON hot_test((data->'a'->'b'->'c'));
+
+INSERT INTO hot_test VALUES (
+ 1,
+ '{"a": {"b": {"c": "indexed", "d": "not-indexed"}}, "x": "other"}'::jsonb
+);
+
+SELECT 'Baseline' AS test, * FROM get_hot_count('hot_test');
+
+-- Update sibling of indexed path {a,b,d} - should NOT be HOT
+UPDATE hot_test SET data = jsonb_set(data, '{a,b,d}', '"updated"') WHERE id = 1;
+SELECT 'After updating a.b.d (sibling, non-indexed)' AS test, * FROM get_hot_count('hot_test');
+
+-- Update unrelated path {x} - should NOT be HOT
+UPDATE hot_test SET data = jsonb_set(data, '{x}', '"modified"') WHERE id = 1;
+SELECT 'After updating x (unrelated path)' AS test, * FROM get_hot_count('hot_test');
+
+-- Update parent of indexed path {a,b} - should NOT be HOT (affects child)
+UPDATE hot_test SET data = jsonb_set(data, '{a,b}', '{"c": "new", "d": "data"}') WHERE id = 1;
+SELECT 'After updating a.b (parent of indexed)' AS test, * FROM get_hot_count('hot_test');
+
+-- ============================================================================
+-- Multiple JSONB mutation functions
+-- ============================================================================
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ data jsonb
+) USING heap WITH(fillfactor = 50);
+
+CREATE INDEX hot_test_keep_idx ON hot_test((data->'keep'));
+
+INSERT INTO hot_test VALUES (
+ 1,
+ '{"keep": "important", "remove": "unimportant", "extra": "data"}'::jsonb
+);
+
+SELECT 'Baseline' AS test, * FROM get_hot_count('hot_test');
+
+-- jsonb_delete on non-indexed key - should NOT be HOT
+UPDATE hot_test SET data = data - 'remove' WHERE id = 1;
+SELECT 'After deleting non-indexed key' AS test, * FROM get_hot_count('hot_test');
+
+-- jsonb_set on non-indexed key - should NOT be HOT
+UPDATE hot_test SET data = jsonb_set(data, '{extra}', '"modified"') WHERE id = 1;
+SELECT 'After modifying non-indexed key' AS test, * FROM get_hot_count('hot_test');
+
+-- jsonb_delete on indexed key - should NOT be HOT
+UPDATE hot_test SET data = data - 'keep' WHERE id = 1;
+SELECT 'After deleting indexed key' AS test, * FROM get_hot_count('hot_test');
+
+-- ============================================================================
+-- Array operations
+-- ============================================================================
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ data jsonb
+) USING heap WITH(fillfactor = 50);
+
+-- Index on array element
+CREATE INDEX hot_test_tags_idx ON hot_test((data->'tags'->0));
+
+INSERT INTO hot_test VALUES (
+ 1,
+ '{"tags": ["indexed", "second", "third"], "other": "data"}'::jsonb
+);
+
+SELECT 'Baseline' AS test, * FROM get_hot_count('hot_test');
+
+-- Update non-indexed array element - should NOT be HOT
+UPDATE hot_test SET data = jsonb_set(data, '{tags,1}', '"modified"') WHERE id = 1;
+SELECT 'After updating tags[1]' AS test, * FROM get_hot_count('hot_test');
+
+-- Update indexed array element - should NOT be HOT
+UPDATE hot_test SET data = jsonb_set(data, '{tags,0}', '"changed"') WHERE id = 1;
+SELECT 'After updating tags[0] (indexed)' AS test, * FROM get_hot_count('hot_test');
+
+-- ============================================================================
+-- Whole column index
+-- ============================================================================
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ data jsonb
+) USING heap WITH(fillfactor = 50);
+
+-- Index on entire JSONB column, and a path extraction
+CREATE INDEX hot_test_whole_idx ON hot_test(data);
+CREATE INDEX hot_test_tags_idx ON hot_test((data->'a'));
+
+INSERT INTO hot_test VALUES (1, '{"a": 1, "b": 1}'::jsonb);
+
+SELECT 'Baseline' AS test, * FROM get_hot_count('hot_test');
+
+-- Any modification to data - should NOT be HOT (whole column indexed)
+UPDATE hot_test SET data = jsonb_set(data, '{b}', '2') WHERE id = 1;
+SELECT 'After modifying any field (whole column indexed)' AS test, * FROM get_hot_count('hot_test');
+
+-- ============================================================================
+-- Performance at scale
+-- ============================================================================
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ data jsonb
+) USING heap WITH(fillfactor=50);
+
+CREATE INDEX hot_test_status_idx ON hot_test((data->'status'));
+CREATE INDEX hot_test_priority_idx ON hot_test((data->'priority'));
+
+-- Insert 10000 rows
+INSERT INTO hot_test
+SELECT i, jsonb_build_object(
+ 'status', 'active',
+ 'priority', 1,
+ 'count', 0,
+ 'data', 'value_' || i
+)
+FROM generate_series(1, 10000) i;
+
+SELECT 'Baseline (10000 rows)' AS test, * FROM get_hot_count('hot_test');
+
+-- Update non-indexed fields on all rows - should NOT be HOT
+UPDATE hot_test SET data = jsonb_set(data, '{count}', to_jsonb((data->>'count')::int + 1));
+
+SELECT 'After updating 10000 rows (non-indexed)' AS test, * FROM get_hot_count('hot_test');
+
+-- Verify correctness
+SELECT COUNT(*) AS rows_with_count_1 FROM hot_test WHERE (data->>'count')::int = 1;
+
+-- Update indexed field on subset - should NOT be HOT for those rows
+UPDATE hot_test SET data = jsonb_set(data, '{status}', '"inactive"')
+WHERE id <= 10;
+
+SELECT 'After updating 10 rows (indexed)' AS test, * FROM get_hot_count('hot_test');
+
+-- Verify indexes work
+SELECT COUNT(*) FROM hot_test WHERE data->>'status' = 'inactive';
+SELECT COUNT(*) FROM hot_test WHERE data->>'status' = 'active';
+
+-- Only BRIN (summarizing) indexes on non-PK columns
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ ts timestamp,
+ value int,
+ brin_col int
+) USING heap WITH(fillfactor = 50);
+
+CREATE INDEX hot_test_ts_brin ON hot_test USING brin(ts);
+CREATE INDEX hot_test_brin_col_brin ON hot_test USING brin(brin_col);
+
+INSERT INTO hot_test VALUES (1, '2024-01-01', 100, 1000);
+
+-- Update both BRIN columns - should still be HOT (only summarizing indexes)
+UPDATE hot_test SET ts = '2024-01-02', brin_col = 2000 WHERE id = 1;
+SELECT 'After updating ts, brin_col (summarizing-only)' AS test, * FROM get_hot_count('hot_test');
+
+-- Verify BRIN indexes work
+SELECT id FROM hot_test WHERE ts >= '2024-01-02';
+SELECT id FROM hot_test WHERE brin_col >= 2000;
+
+-- TOASTed columns can participate in HOT
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ large_text text
+) USING heap WITH(fillfactor = 50);
+
+CREATE INDEX hot_test_idx ON hot_test(large_text);
+
+-- Insert row with TOASTed column (> 2KB)
+INSERT INTO hot_test VALUES (1, repeat('x', 3000));
+
+-- Update TOASTed column - should NOT be HOT
+UPDATE hot_test SET large_text = repeat('y', 3000);
+SELECT 'After updating large_text (TOASTed)' AS test, * FROM get_hot_count('hot_test');
+
+-- Partitioned tables: HOT works within partitions
+CREATE TABLE hot_test_partitioned (
+ id int,
+ partition_key int,
+ indexed_col int,
+ data text,
+ PRIMARY KEY (id, partition_key)
+) PARTITION BY RANGE (partition_key);
+
+CREATE TABLE hot_test_part1 PARTITION OF hot_test_partitioned
+ FOR VALUES FROM (1) TO (100);
+CREATE TABLE hot_test_part2 PARTITION OF hot_test_partitioned
+ FOR VALUES FROM (100) TO (200);
+
+CREATE INDEX hot_test_partitioned_idx ON hot_test_partitioned(indexed_col);
+CREATE INDEX hot_test_part2_data ON hot_test_part2(data);
+
+INSERT INTO hot_test_partitioned VALUES (1, 50, 100, 'initial1');
+INSERT INTO hot_test_partitioned VALUES (2, 150, 200, 'initial2');
+
+-- Update in partition 1 (non-indexed column) - should be HOT
+UPDATE hot_test_partitioned SET data = 'UPDATED' WHERE id = 1;
+SELECT 'After updating partition 1 data' AS test, * FROM get_hot_count('hot_test_part1');
+
+-- Update in partition 2 (indexed column) - should NOT be HOT
+UPDATE hot_test_partitioned SET data = 'UPDATED' WHERE id = 2;
+SELECT 'After updating large_text (TOASTed)' AS test, * FROM get_hot_count('hot_test_part2');
+
+-- Verify indexes work on partitions
+SELECT id FROM hot_test_partitioned WHERE indexed_col = 100;
+SELECT id FROM hot_test_partitioned WHERE indexed_col = 200;
+
+-- Update indexed column in partition - should NOT be HOT
+-- Partition 1 previously had 1 update and 1 HOT update, this should
+-- change that to 2 updates and 1 HOT update.
+UPDATE hot_test_partitioned SET indexed_col = 150 WHERE id = 1;
+SELECT 'After updating indexed_col' AS test, * FROM get_hot_count('hot_test_part1');
+
+-- ============================================================================
+-- Partial indexes with complex predicates on JSONB
+-- ============================================================================
+-- Test partial indexes with WHERE clauses on JSONB expressions.
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ data jsonb
+) USING heap WITH(fillfactor = 50);
+
+-- Partial index: only index status when priority > 5
+CREATE INDEX hot_test_partial_idx ON hot_test((data->'status'))
+ WHERE (data->>'priority')::int > 5;
+
+INSERT INTO hot_test VALUES (
+ 1,
+ '{"status": "active", "priority": 10, "count": 0}'::jsonb
+);
+INSERT INTO hot_test VALUES (
+ 2,
+ '{"status": "active", "priority": 3, "count": 0}'::jsonb
+);
+
+SELECT 'Partial Index Test: Baseline' AS test, * FROM get_hot_count('hot_test');
+
+-- Update non-indexed path on row inside predicate (priority=10 > 5)
+-- Should NOT be HOT despite {count} is not indexed
+UPDATE hot_test SET data = jsonb_set(data, '{count}', '1') WHERE id = 1;
+SELECT 'Partial Index Test: count update, inside predicate' AS test, * FROM get_hot_count('hot_test');
+
+-- Update non-indexed path on row outside predicate (priority=3 <= 5)
+-- Should NOT be HOT dispite {count} is not indexed
+UPDATE hot_test SET data = jsonb_set(data, '{count}', '1') WHERE id = 2;
+SELECT 'Partial Index Test: count update, outside predicate' AS test, * FROM get_hot_count('hot_test');
+
+-- Update indexed path on row inside predicate (priority=10 > 5)
+-- Should NOT be HOT indexed portion is updated
+UPDATE hot_test SET data = jsonb_set(data, '{status}', '"inactive"') WHERE id = 1;
+SELECT 'Partial Index Test: status update, inside predicate' AS test, * FROM get_hot_count('hot_test');
+
+-- Update indexed path on row outside predicate (priority=3 <= 5)
+-- PostgreSQL makes a conservative choice and treats it as non-HOT because the
+-- indexed column changed, even though the before/after rows are outside the predicate
+UPDATE hot_test SET data = jsonb_set(data, '{status}', '"inactive"') WHERE id = 2;
+SELECT 'Partial Index Test: status update, outside predicate' AS test, * FROM get_hot_count('hot_test');
+
+-- Verify index works
+SELECT id FROM hot_test WHERE data->'status' = '"inactive"'::jsonb AND (data->>'priority')::int > 5;
+-- ============================================================================
+DROP TABLE IF EXISTS hot_test;
+DROP TABLE IF EXISTS hot_test_partitioned CASCADE;
+DROP FUNCTION IF EXISTS has_hot_chain(text, tid);
+DROP FUNCTION IF EXISTS print_hot_chain(text, tid);
+DROP FUNCTION IF EXISTS get_hot_count(text);
+DROP EXTENSION pageinspect;
--
2.51.2
[text/x-patch] v35-0002-Identify-and-track-columns-modified-by-heap_modi.patch (7.0K, 3-v35-0002-Identify-and-track-columns-modified-by-heap_modi.patch)
download | inline diff:
From ff260840eadfd1cc41528fc503435c04be421083 Mon Sep 17 00:00:00 2001
From: Greg Burd <[email protected]>
Date: Tue, 10 Mar 2026 08:17:31 -0400
Subject: [PATCH v35 2/3] Identify and track columns modified by
heap_modifiy_tuple() on update
ExecGetAllUpdatedCols() misses attributes modified using
heap_modify_tuple() that are not explictly SET in the UPDATE or by
triggers. This happens in one test (tsearch.sql) when the
tsvector_update_trigger() is invoked and modifies an indexed attribute
that isn't referenced in any SQL.
The net is that the functions like HeapDetermineColumnsInfo() have to
scan all indexed attributes for changes rather than being able to first
reduce the indexed set by intersecting it with the set of attributes
known to be potentially updated.
While this isn't so bad, it is an oversight should someone in the future
build some security related feature using that incomplete result. It
also might save a fraction of overhead calculating modified index
attributes in heap_update().
This commit adds to ExecBRUpdateTriggers() code that identify changes to
indexed columns not found by ExecGetAllUpdatedCols() and adds those
attributes to ri_extraUpdatedCols.
This commit introduces ExecCompareSlotAttrs() as a utility function to
identify those attributes that have changed. It compares a subset of
attributes between two TupleTableSlots and returns a Bitmapset of
attributes that differ.
It would be nice to integrate this into HeapDetermineColumnsInfo(),
however it would be a layering violation given that it is within
heap_update().
---
src/backend/commands/trigger.c | 20 +++++++-
src/backend/executor/execTuples.c | 78 +++++++++++++++++++++++++++++++
src/include/executor/executor.h | 5 ++
3 files changed, 102 insertions(+), 1 deletion(-)
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 98d402c0a3b..bbe077a9ca9 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -2978,6 +2978,7 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
bool is_merge_update)
{
TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
+ TupleDesc tupdesc = RelationGetDescr(relinfo->ri_RelationDesc);
TupleTableSlot *oldslot = ExecGetTriggerOldSlot(estate, relinfo);
HeapTuple newtuple = NULL;
HeapTuple trigtuple;
@@ -2985,7 +2986,9 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
bool should_free_new = false;
TriggerData LocTriggerData = {0};
int i;
- Bitmapset *updatedCols;
+ Bitmapset *updatedCols = NULL;
+ Bitmapset *remainingCols = NULL;
+ Bitmapset *modifiedCols;
LockTupleMode lockmode;
/* Determine lock mode to use */
@@ -3127,6 +3130,21 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
if (should_free_trig)
heap_freetuple(trigtuple);
+ /*
+ * Before UPDATE triggers may have updated attributes not known to
+ * ExecGetAllUpdatedColumns() using heap_modify_tuple() or
+ * heap_modifiy_tuple_by_cols(). Find and record those now.
+ */
+ remainingCols = bms_add_range(NULL, 1 - FirstLowInvalidHeapAttributeNumber,
+ tupdesc->natts - FirstLowInvalidHeapAttributeNumber);
+ remainingCols = bms_del_members(remainingCols, updatedCols);
+ modifiedCols = ExecCompareSlotAttrs(tupdesc, remainingCols, oldslot, newslot);
+ relinfo->ri_extraUpdatedCols =
+ bms_add_members(relinfo->ri_extraUpdatedCols, modifiedCols);
+
+ bms_free(remainingCols);
+ bms_free(modifiedCols);
+
return true;
}
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index b768eae9e53..1064ebe845b 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -66,6 +66,7 @@
#include "nodes/nodeFuncs.h"
#include "storage/bufmgr.h"
#include "utils/builtins.h"
+#include "utils/datum.h"
#include "utils/expandeddatum.h"
#include "utils/lsyscache.h"
#include "utils/typcache.h"
@@ -1929,6 +1930,83 @@ ExecFetchSlotHeapTupleDatum(TupleTableSlot *slot)
return ret;
}
+/*
+ * ExecCompareSlotAttrs
+ *
+ * Compare the subset of attributes in attrs bewtween TupleTableSlots to detect
+ * which attributes have changed.
+ *
+ * Returns a Bitmapset of attribute indices (using
+ * FirstLowInvalidHeapAttributeNumber convention) that differ between the two
+ * slots.
+ */
+Bitmapset *
+ExecCompareSlotAttrs(TupleDesc tupdesc, const Bitmapset *attrs,
+ TupleTableSlot *s1, TupleTableSlot *s2)
+{
+ int attidx = -1;
+ Bitmapset *modified = NULL;
+
+ /* XXX what if slots don't share the same tupleDescriptor... */
+ /* Assert(s1->tts_tupleDescriptor == s2->tts_tupleDescriptor); */
+
+ while ((attidx = bms_next_member(attrs, attidx)) >= 0)
+ {
+ /* attidx is zero-based, attrnum is the normal attribute number */
+ AttrNumber attrnum = attidx + FirstLowInvalidHeapAttributeNumber;
+ Datum value1,
+ value2;
+ bool null1,
+ null2;
+ CompactAttribute *att;
+
+ /*
+ * If it's a whole-tuple reference, say "not equal". It's not really
+ * worth supporting this case, since it could only succeed after a
+ * no-op update, which is hardly a case worth optimizing for.
+ */
+ if (attrnum == 0)
+ {
+ modified = bms_add_member(modified, attidx);
+ continue;
+ }
+
+ /*
+ * Likewise, automatically say "not equal" for any system attribute
+ * other than tableOID; we cannot expect these to be consistent in a
+ * HOT chain, or even to be set correctly yet in the new tuple.
+ */
+ if (attrnum < 0)
+ {
+ if (attrnum != TableOidAttributeNumber)
+ {
+ modified = bms_add_member(modified, attidx);
+ continue;
+ }
+ }
+
+ att = TupleDescCompactAttr(tupdesc, attrnum - 1);
+ value1 = slot_getattr(s1, attrnum, &null1);
+ value2 = slot_getattr(s2, attrnum, &null2);
+
+ /* A change to/from NULL, so not equal */
+ if (null1 != null2)
+ {
+ modified = bms_add_member(modified, attidx);
+ continue;
+ }
+
+ /* Both NULL, no change/unmodified */
+ if (null2)
+ continue;
+
+ if (!datum_image_eq(value1, value2, att->attbyval, att->attlen))
+ modified = bms_add_member(modified, attidx);
+ }
+
+ return modified;
+}
+
/* ----------------------------------------------------------------
* convenience initialization routines
* ----------------------------------------------------------------
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index d46ba59895d..5dcfaa2027f 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -17,6 +17,7 @@
#include "datatype/timestamp.h"
#include "executor/execdesc.h"
#include "fmgr.h"
+#include "nodes/execnodes.h"
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
@@ -606,6 +607,10 @@ extern TupleDesc ExecCleanTypeFromTL(List *targetList);
extern TupleDesc ExecTypeFromExprList(List *exprList);
extern void ExecTypeSetColNames(TupleDesc typeInfo, List *namesList);
extern void UpdateChangedParamSet(PlanState *node, Bitmapset *newchg);
+extern Bitmapset *ExecCompareSlotAttrs(TupleDesc tupdesc,
+ const Bitmapset *attrs,
+ TupleTableSlot *old_tts,
+ TupleTableSlot *new_tts);
typedef struct TupOutputState
{
--
2.51.2
[text/x-patch] v35-0003-Identify-modified-indexed-attributes-in-the-exec.patch (54.4K, 4-v35-0003-Identify-modified-indexed-attributes-in-the-exec.patch)
download | inline diff:
From 0208756d9666cb3b30b5b85a443a4df65463cb38 Mon Sep 17 00:00:00 2001
From: Greg Burd <[email protected]>
Date: Tue, 10 Mar 2026 08:18:23 -0400
Subject: [PATCH v35 3/3] Identify modified indexed attributes in the executor
on UPDATE
Refactor executor update logic to determine which indexed columns have
actually changed during an UPDATE operation rather than leaving this up
to HeapDetermineColumnsInfo() in heap_update(). Finding this set of
attributes is not heap-specific, but more general to all table AMs and
having this information in the executor could inform other decisions
about when index inserts are required and when they are not regardless
of the table AM's MVCC implementation strategy.
The heap-only tuple decision (HOT) in heap functions as it always has,
but the determination of the "modified indexed attributes"
(modified_idx_attrs, formerly known as modified_attrs).
ExecUpdateModifiedIdxAttrs() replaces HeapDetermineColumnsInfo() and is
called before table_tuple_update() crucially without the need for an
exclusive buffer lock on the page that holds the tuple being updated.
This reduces the time the buffer lock is held later within
heapam_tuple_update() and heap_update().
ExecUpdateModifiedIdxAttrs() uses the previously-introduced
ExecCompareSlotAttrs() function to identify which attributes have
changed and then intersects that with the set of indexed attributes to
identify the modified indexed set, the modified_idx_attrs.
Besides identifying the set of modified indexed attributes
HeapDetermineColumnsInfo() was also responsible for part of the logic
involved in the decision about what to WAL log for the replica identity
key. This logic moved into heap_update() and out of the replacement
named HeapUpdateModifiedIdxAttrs(). Doing this allows for
simple_heap_update() and heapam_tuple_update() to share the same logic
as they both call into heap_update().
Updates stemming from logical replication also use the new
ExecUpdateModifiedIdxAttrs() in ExecSimpleRelationUpdate().
This patch introduces a few helper functions to reduce code duplication
and increase readability: HeapUpdateHotAllowable(),
HeapUpdateDetermineLockmode(). These are used in both heap_update() and
simple_heap_update().
The heap_update() function is called now with lockmode pre-determined
and a boolean indicating if the update allows HOT updates or not, both
const. If during heap_update() the new tuple will fit on the same page
and that boolean is true, the update is HOT. This means that although
the functions and timing of the code involed in HOT decisions have
changed, none of the logic related to when HOT is allowed has changed.
Development of this feature exposed nondeterministic behavior in three
existing tests which have been adjusted to avoid inconsistent test
results due to tuple ordering during heap page scans.
---
src/backend/access/heap/heapam.c | 478 +++++++++++-------
src/backend/access/heap/heapam_handler.c | 32 +-
src/backend/access/table/tableam.c | 5 +-
src/backend/executor/execReplication.c | 9 +-
src/backend/executor/nodeModifyTable.c | 93 +++-
src/backend/utils/cache/relcache.c | 44 +-
src/include/access/heapam.h | 13 +-
src/include/access/tableam.h | 8 +-
src/include/executor/executor.h | 4 +
src/include/utils/rel.h | 2 +-
src/include/utils/relcache.h | 2 +-
.../regress/expected/generated_virtual.out | 2 +-
src/test/regress/expected/triggers.out | 16 +-
src/test/regress/expected/updatable_views.out | 4 +-
src/test/regress/sql/generated_virtual.sql | 2 +-
src/test/regress/sql/triggers.sql | 4 +-
src/test/regress/sql/updatable_views.sql | 2 +-
17 files changed, 492 insertions(+), 228 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1ecc8330851..997dc9642d8 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -37,14 +37,20 @@
#include "access/multixact.h"
#include "access/subtrans.h"
#include "access/syncscan.h"
+#include "access/sysattr.h"
+#include "access/tableam.h"
#include "access/valid.h"
#include "access/visibilitymap.h"
#include "access/xloginsert.h"
#include "catalog/pg_database.h"
#include "catalog/pg_database_d.h"
#include "commands/vacuum.h"
+#include "executor/tuptable.h"
+#include "optimizer/cost.h"
+#include "nodes/lockoptions.h"
#include "pgstat.h"
#include "port/pg_bitutils.h"
+#include "storage/buf.h"
#include "storage/lmgr.h"
#include "storage/predicate.h"
#include "storage/proc.h"
@@ -52,6 +58,7 @@
#include "utils/datum.h"
#include "utils/injection_point.h"
#include "utils/inval.h"
+#include "utils/relcache.h"
#include "utils/spccache.h"
#include "utils/syscache.h"
@@ -68,11 +75,8 @@ static void check_lock_if_inplace_updateable_rel(Relation relation,
HeapTuple newtup);
static void check_inplace_rel_lock(HeapTuple oldtup);
#endif
-static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
- Bitmapset *interesting_cols,
- Bitmapset *external_cols,
- HeapTuple oldtup, HeapTuple newtup,
- bool *has_external);
+static Bitmapset *HeapUpdateModifiedIdxAttrs(Relation relation,
+ HeapTuple oldtup, HeapTuple newtup);
static bool heap_acquire_tuplock(Relation relation, const ItemPointerData *tid,
LockTupleMode mode, LockWaitPolicy wait_policy,
bool *have_tuple_lock);
@@ -3302,7 +3306,7 @@ simple_heap_delete(Relation relation, const ItemPointerData *tid)
* heap_update - replace a tuple
*
* See table_tuple_update() for an explanation of the parameters, except that
- * this routine directly takes a tuple rather than a slot.
+ * this routine directly takes a heap tuple rather than a slot.
*
* In the failure cases, the routine fills *tmfd with the tuple's t_ctid,
* t_xmax (resolving a possible MultiXact, if necessary), and t_cmax (the last
@@ -3312,17 +3316,13 @@ simple_heap_delete(Relation relation, const ItemPointerData *tid)
TM_Result
heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
CommandId cid, Snapshot crosscheck, bool wait,
- TM_FailureData *tmfd, LockTupleMode *lockmode,
- TU_UpdateIndexes *update_indexes)
+ TM_FailureData *tmfd, const LockTupleMode lockmode,
+ const Bitmapset *modified_idx_attrs, const bool hot_allowed)
{
TM_Result result;
TransactionId xid = GetCurrentTransactionId();
- Bitmapset *hot_attrs;
- Bitmapset *sum_attrs;
- Bitmapset *key_attrs;
- Bitmapset *id_attrs;
- Bitmapset *interesting_attrs;
- Bitmapset *modified_attrs;
+ Bitmapset *idx_attrs,
+ *rid_attrs;
ItemId lp;
HeapTupleData oldtup;
HeapTuple heaptup;
@@ -3341,13 +3341,12 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
bool have_tuple_lock = false;
bool iscombo;
bool use_hot_update = false;
- bool summarized_update = false;
bool key_intact;
bool all_visible_cleared = false;
bool all_visible_cleared_new = false;
bool checked_lockers;
bool locker_remains;
- bool id_has_external = false;
+ bool rep_id_key_required = false;
TransactionId xmax_new_tuple,
xmax_old_tuple;
uint16 infomask_old_tuple,
@@ -3378,33 +3377,14 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
#endif
/*
- * Fetch the list of attributes to be checked for various operations.
- *
- * For HOT considerations, this is wasted effort if we fail to update or
- * have to put the new tuple on a different page. But we must compute the
- * list before obtaining buffer lock --- in the worst case, if we are
- * doing an update on one of the relevant system catalogs, we could
- * deadlock if we try to fetch the list later. In any case, the relcache
- * caches the data so this is usually pretty cheap.
- *
- * We also need columns used by the replica identity and columns that are
- * considered the "key" of rows in the table.
+ * Fetch the attributes used across all indexes on this relation as well
+ * as the replica identity and columns.
*
- * Note that we get copies of each bitmap, so we need not worry about
- * relcache flush happening midway through.
- */
- hot_attrs = RelationGetIndexAttrBitmap(relation,
- INDEX_ATTR_BITMAP_HOT_BLOCKING);
- sum_attrs = RelationGetIndexAttrBitmap(relation,
- INDEX_ATTR_BITMAP_SUMMARIZED);
- key_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_KEY);
- id_attrs = RelationGetIndexAttrBitmap(relation,
- INDEX_ATTR_BITMAP_IDENTITY_KEY);
- interesting_attrs = NULL;
- interesting_attrs = bms_add_members(interesting_attrs, hot_attrs);
- interesting_attrs = bms_add_members(interesting_attrs, sum_attrs);
- interesting_attrs = bms_add_members(interesting_attrs, key_attrs);
- interesting_attrs = bms_add_members(interesting_attrs, id_attrs);
+ * NOTE: relcache returns copies of each bitmap, so we need not worry
+ * about relcache flush happening midway through.
+ */
+ idx_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_INDEXED);
+ rid_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_IDENTITY_KEY);
block = ItemPointerGetBlockNumber(otid);
INJECTION_POINT("heap_update-before-pin", NULL);
@@ -3458,20 +3438,17 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
tmfd->ctid = *otid;
tmfd->xmax = InvalidTransactionId;
tmfd->cmax = InvalidCommandId;
- *update_indexes = TU_None;
- bms_free(hot_attrs);
- bms_free(sum_attrs);
- bms_free(key_attrs);
- bms_free(id_attrs);
- /* modified_attrs not yet initialized */
- bms_free(interesting_attrs);
+ bms_free(rid_attrs);
+ bms_free(idx_attrs);
+ /* modified_idx_attrs is owned by the caller, don't free it */
+
return TM_Deleted;
}
/*
- * Fill in enough data in oldtup for HeapDetermineColumnsInfo to work
- * properly.
+ * Fill in enough data in oldtup to determine replica identity attribute
+ * requirements.
*/
oldtup.t_tableOid = RelationGetRelid(relation);
oldtup.t_data = (HeapTupleHeader) PageGetItem(page, lp);
@@ -3482,16 +3459,59 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
newtup->t_tableOid = RelationGetRelid(relation);
/*
- * Determine columns modified by the update. Additionally, identify
- * whether any of the unmodified replica identity key attributes in the
- * old tuple is externally stored or not. This is required because for
- * such attributes the flattened value won't be WAL logged as part of the
- * new tuple so we must include it as part of the old_key_tuple. See
- * ExtractReplicaIdentity.
+ * ExtractReplicaIdentity() needs to know if a modified indexed attrbute
+ * is used as a replica indentity or if any of the replica identity
+ * attributes are referenced in an index, unmodified, and are stored
+ * externally in the old tuple being replaced. In those cases it may be
+ * necessary to WAL log them to so they are available to replicas.
*/
- modified_attrs = HeapDetermineColumnsInfo(relation, interesting_attrs,
- id_attrs, &oldtup,
- newtup, &id_has_external);
+ rep_id_key_required = bms_overlap(modified_idx_attrs, rid_attrs);
+ if (!rep_id_key_required)
+ {
+ Bitmapset *attrs;
+ TupleDesc tupdesc = RelationGetDescr(relation);
+ int attidx = -1;
+
+ /*
+ * Reduce the set under review to only the unmodified indexed replica
+ * identity key attributes. idx_attrs is copied (by bms_difference())
+ * not modified here.
+ */
+ attrs = bms_difference(idx_attrs, modified_idx_attrs);
+ attrs = bms_int_members(attrs, rid_attrs);
+
+ while ((attidx = bms_next_member(attrs, attidx)) >= 0)
+ {
+ /*
+ * attidx is zero-based, attrnum is the normal attribute number
+ */
+ AttrNumber attrnum = attidx + FirstLowInvalidHeapAttributeNumber;
+ Datum value;
+ bool isnull;
+
+ /*
+ * System attributes are not added into INDEX_ATTR_BITMAP_INDEXED
+ * bitmap by relcache.
+ */
+ Assert(attrnum > 0);
+
+ value = heap_getattr(&oldtup, attrnum, tupdesc, &isnull);
+
+ /* No need to check attributes that can't be stored externally */
+ if (isnull ||
+ TupleDescCompactAttr(tupdesc, attrnum - 1)->attlen != -1)
+ continue;
+
+ /* Check if the old tuple's attribute is stored externally */
+ if (VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value)))
+ {
+ rep_id_key_required = true;
+ break;
+ }
+ }
+
+ bms_free(attrs);
+ }
/*
* If we're not updating any "key" column, we can grab a weaker lock type.
@@ -3504,9 +3524,8 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
* is updates that don't manipulate key columns, not those that
* serendipitously arrive at the same key values.
*/
- if (!bms_overlap(modified_attrs, key_attrs))
+ if (lockmode == LockTupleNoKeyExclusive)
{
- *lockmode = LockTupleNoKeyExclusive;
mxact_status = MultiXactStatusNoKeyUpdate;
key_intact = true;
@@ -3523,7 +3542,7 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
}
else
{
- *lockmode = LockTupleExclusive;
+ Assert(lockmode == LockTupleExclusive);
mxact_status = MultiXactStatusUpdate;
key_intact = false;
}
@@ -3534,7 +3553,6 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
* with the new tuple's location, so there's great risk of confusion if we
* use otid anymore.
*/
-
l2:
checked_lockers = false;
locker_remains = false;
@@ -3602,7 +3620,7 @@ l2:
bool current_is_member = false;
if (DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
- *lockmode, ¤t_is_member))
+ lockmode, ¤t_is_member))
{
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
@@ -3611,7 +3629,7 @@ l2:
* requesting a lock and already have one; avoids deadlock).
*/
if (!current_is_member)
- heap_acquire_tuplock(relation, &(oldtup.t_self), *lockmode,
+ heap_acquire_tuplock(relation, &(oldtup.t_self), lockmode,
LockWaitBlock, &have_tuple_lock);
/* wait for multixact */
@@ -3696,7 +3714,7 @@ l2:
* lock.
*/
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
- heap_acquire_tuplock(relation, &(oldtup.t_self), *lockmode,
+ heap_acquire_tuplock(relation, &(oldtup.t_self), lockmode,
LockWaitBlock, &have_tuple_lock);
XactLockTableWait(xwait, relation, &oldtup.t_self,
XLTW_Update);
@@ -3756,17 +3774,14 @@ l2:
tmfd->cmax = InvalidCommandId;
UnlockReleaseBuffer(buffer);
if (have_tuple_lock)
- UnlockTupleTuplock(relation, &(oldtup.t_self), *lockmode);
+ UnlockTupleTuplock(relation, &(oldtup.t_self), lockmode);
if (vmbuffer != InvalidBuffer)
ReleaseBuffer(vmbuffer);
- *update_indexes = TU_None;
- bms_free(hot_attrs);
- bms_free(sum_attrs);
- bms_free(key_attrs);
- bms_free(id_attrs);
- bms_free(modified_attrs);
- bms_free(interesting_attrs);
+ bms_free(rid_attrs);
+ bms_free(idx_attrs);
+ /* modified_idx_attrs is owned by the caller, don't free it */
+
return result;
}
@@ -3796,7 +3811,7 @@ l2:
compute_new_xmax_infomask(HeapTupleHeaderGetRawXmax(oldtup.t_data),
oldtup.t_data->t_infomask,
oldtup.t_data->t_infomask2,
- xid, *lockmode, true,
+ xid, lockmode, true,
&xmax_old_tuple, &infomask_old_tuple,
&infomask2_old_tuple);
@@ -3913,7 +3928,7 @@ l2:
compute_new_xmax_infomask(HeapTupleHeaderGetRawXmax(oldtup.t_data),
oldtup.t_data->t_infomask,
oldtup.t_data->t_infomask2,
- xid, *lockmode, false,
+ xid, lockmode, false,
&xmax_lock_old_tuple, &infomask_lock_old_tuple,
&infomask2_lock_old_tuple);
@@ -4073,37 +4088,19 @@ l2:
/*
* At this point newbuf and buffer are both pinned and locked, and newbuf
- * has enough space for the new tuple. If they are the same buffer, only
- * one pin is held.
+ * has enough space for the new tuple so we can use the HOT update path if
+ * the caller determined that it is allowable.
+ *
+ * NOTE: If newbuf == buffer then only one pin is held.
*/
-
if (newbuf == buffer)
{
- /*
- * Since the new tuple is going into the same page, we might be able
- * to do a HOT update. Check if any of the index columns have been
- * changed.
- */
- if (!bms_overlap(modified_attrs, hot_attrs))
- {
+ if (hot_allowed)
use_hot_update = true;
-
- /*
- * If none of the columns that are used in hot-blocking indexes
- * were updated, we can apply HOT, but we do still need to check
- * if we need to update the summarizing indexes, and update those
- * indexes if the columns were updated, or we may fail to detect
- * e.g. value bound changes in BRIN minmax indexes.
- */
- if (bms_overlap(modified_attrs, sum_attrs))
- summarized_update = true;
- }
}
else
- {
/* Set a hint that the old page could use prune/defrag */
PageSetFull(page);
- }
/*
* Compute replica identity tuple before entering the critical section so
@@ -4113,8 +4110,7 @@ l2:
* columns are modified or it has external data.
*/
old_key_tuple = ExtractReplicaIdentity(relation, &oldtup,
- bms_overlap(modified_attrs, id_attrs) ||
- id_has_external,
+ rep_id_key_required,
&old_key_copied);
/* NO EREPORT(ERROR) from here till changes are logged */
@@ -4243,7 +4239,7 @@ l2:
* Release the lmgr tuple lock, if we had it.
*/
if (have_tuple_lock)
- UnlockTupleTuplock(relation, &(oldtup.t_self), *lockmode);
+ UnlockTupleTuplock(relation, &(oldtup.t_self), lockmode);
pgstat_count_heap_update(relation, use_hot_update, newbuf != buffer);
@@ -4257,31 +4253,12 @@ l2:
heap_freetuple(heaptup);
}
- /*
- * If it is a HOT update, the update may still need to update summarized
- * indexes, lest we fail to update those summaries and get incorrect
- * results (for example, minmax bounds of the block may change with this
- * update).
- */
- if (use_hot_update)
- {
- if (summarized_update)
- *update_indexes = TU_Summarizing;
- else
- *update_indexes = TU_None;
- }
- else
- *update_indexes = TU_All;
-
if (old_key_tuple != NULL && old_key_copied)
heap_freetuple(old_key_tuple);
- bms_free(hot_attrs);
- bms_free(sum_attrs);
- bms_free(key_attrs);
- bms_free(id_attrs);
- bms_free(modified_attrs);
- bms_free(interesting_attrs);
+ bms_free(rid_attrs);
+ bms_free(idx_attrs);
+ /* modified_idx_attrs is owned by the caller, don't free it */
return TM_Ok;
}
@@ -4454,28 +4431,113 @@ heap_attr_equals(TupleDesc tupdesc, int attrnum, Datum value1, Datum value2,
}
/*
- * Check which columns are being updated.
- *
- * Given an updated tuple, determine (and return into the output bitmapset),
- * from those listed as interesting, the set of columns that changed.
- *
- * has_external indicates if any of the unmodified attributes (from those
- * listed as interesting) of the old tuple is a member of external_cols and is
- * stored externally.
+ * HOT updates are possible when either: a) there are no modified indexed
+ * attributes, or b) the modified attributes are all on summarizing indexes.
+ * Later, in heap_update(), we can choose to perform a HOT update if there is
+ * space on the page for the new tuple and the following code has determined
+ * that HOT is allowed.
+ */
+bool
+HeapUpdateHotAllowable(Relation relation, const Bitmapset *modified_idx_attrs,
+ bool *summarized_only)
+{
+ bool hot_allowed;
+
+ /*
+ * Let's be optimistic and start off by assuming the best case, no indexes
+ * need updating and HOT is allowable.
+ */
+ hot_allowed = true;
+ *summarized_only = false;
+
+ /*
+ * Check for case (a); when there are no modified index attributes HOT is
+ * allowed.
+ */
+ if (bms_is_empty(modified_idx_attrs))
+ hot_allowed = true;
+ else
+ {
+ Bitmapset *sum_attrs = RelationGetIndexAttrBitmap(relation,
+ INDEX_ATTR_BITMAP_SUMMARIZED);
+
+ /*
+ * At least one index attribute was modified, but is this case (b)
+ * where all the modified index attributes are only used by
+ * summarizing indexes? If that's the case we need to update those
+ * indexes, but this can be a HOT update.
+ */
+ if (bms_is_subset(modified_idx_attrs, sum_attrs))
+ {
+ hot_allowed = true;
+ *summarized_only = true;
+ }
+ else
+ {
+ /*
+ * Now we know that one or more indexed attribute were updated and
+ * that there was at least one of those attributes were referenced
+ * by a non-summarizing index. HOT is not allowed.
+ */
+ hot_allowed = false;
+ }
+
+ bms_free(sum_attrs);
+ }
+
+ return hot_allowed;
+}
+
+/*
+ * If we're not updating any "key" attributes, we can grab a weaker lock type.
+ * This allows for more concurrency when we are running simultaneously with
+ * foreign key checks.
+ */
+LockTupleMode
+HeapUpdateDetermineLockmode(Relation relation, const Bitmapset *modified_idx_attrs)
+{
+ LockTupleMode lockmode = LockTupleExclusive;
+
+ Bitmapset *key_attrs = RelationGetIndexAttrBitmap(relation,
+ INDEX_ATTR_BITMAP_KEY);
+
+ if (!bms_overlap(modified_idx_attrs, key_attrs))
+ lockmode = LockTupleNoKeyExclusive;
+
+ bms_free(key_attrs);
+
+ return lockmode;
+}
+
+/*
+ * Return a Bitmapset that contains the set of modified (changed) indexed
+ * attributes between oldtup and newtup.
*/
static Bitmapset *
-HeapDetermineColumnsInfo(Relation relation,
- Bitmapset *interesting_cols,
- Bitmapset *external_cols,
- HeapTuple oldtup, HeapTuple newtup,
- bool *has_external)
+HeapUpdateModifiedIdxAttrs(Relation relation, HeapTuple oldtup, HeapTuple newtup)
{
int attidx;
- Bitmapset *modified = NULL;
+ Bitmapset *attrs,
+ *modified_idx_attrs = NULL;
TupleDesc tupdesc = RelationGetDescr(relation);
+ /* Get the set of all attributes across all indexes for this relation */
+ attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_INDEXED);
+
+ /* No indexed attributes, we're done */
+ if (bms_is_empty(attrs))
+ return NULL;
+
+ /*
+ * This heap update function is used outside the executor and so unlike
+ * heapam_tuple_update() where there is ResultRelInfo and EState to
+ * provide the concise set of attributes that might have been modified
+ * (via ExecGetAllUpdatedCols()) we simply check all indexed attributes to
+ * find the subset that changed value. That's the "modified indexed
+ * attributes" or "modified_idx_attrs".
+ */
attidx = -1;
- while ((attidx = bms_next_member(interesting_cols, attidx)) >= 0)
+ while ((attidx = bms_next_member(attrs, attidx)) >= 0)
{
/* attidx is zero-based, attrnum is the normal attribute number */
AttrNumber attrnum = attidx + FirstLowInvalidHeapAttributeNumber;
@@ -4491,7 +4553,7 @@ HeapDetermineColumnsInfo(Relation relation,
*/
if (attrnum == 0)
{
- modified = bms_add_member(modified, attidx);
+ modified_idx_attrs = bms_add_member(modified_idx_attrs, attidx);
continue;
}
@@ -4504,7 +4566,7 @@ HeapDetermineColumnsInfo(Relation relation,
{
if (attrnum != TableOidAttributeNumber)
{
- modified = bms_add_member(modified, attidx);
+ modified_idx_attrs = bms_add_member(modified_idx_attrs, attidx);
continue;
}
}
@@ -4520,29 +4582,12 @@ HeapDetermineColumnsInfo(Relation relation,
if (!heap_attr_equals(tupdesc, attrnum, value1,
value2, isnull1, isnull2))
- {
- modified = bms_add_member(modified, attidx);
- continue;
- }
-
- /*
- * No need to check attributes that can't be stored externally. Note
- * that system attributes can't be stored externally.
- */
- if (attrnum < 0 || isnull1 ||
- TupleDescCompactAttr(tupdesc, attrnum - 1)->attlen != -1)
- continue;
-
- /*
- * Check if the old tuple's attribute is stored externally and is a
- * member of external_cols.
- */
- if (VARATT_IS_EXTERNAL((varlena *) DatumGetPointer(value1)) &&
- bms_is_member(attidx, external_cols))
- *has_external = true;
+ modified_idx_attrs = bms_add_member(modified_idx_attrs, attidx);
}
- return modified;
+ bms_free(attrs);
+
+ return modified_idx_attrs;
}
/*
@@ -4554,17 +4599,109 @@ HeapDetermineColumnsInfo(Relation relation,
* via ereport().
*/
void
-simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tup,
+simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tuple,
TU_UpdateIndexes *update_indexes)
{
TM_Result result;
TM_FailureData tmfd;
LockTupleMode lockmode;
+ TupleTableSlot *slot;
+ BufferHeapTupleTableSlot *bslot;
+ HeapTuple oldtup;
+ bool shouldFree = true;
+ Bitmapset *idx_attrs,
+ *modified_idx_attrs;
+ bool hot_allowed,
+ summarized_only;
+ Buffer buffer;
- result = heap_update(relation, otid, tup,
- GetCurrentCommandId(true), InvalidSnapshot,
- true /* wait for commit */ ,
- &tmfd, &lockmode, update_indexes);
+ Assert(ItemPointerIsValid(otid));
+
+ /*
+ * Fetch this bitmap of interesting attributes from relcache before
+ * obtaining a buffer lock because if we are doing an update on one of the
+ * relevant system catalogs we could deadlock if we try to fetch them
+ * later on. Relcache will return copies of each bitmap, so we need not
+ * worry about relcache flush happening midway through this operation.
+ */
+ idx_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_INDEXED);
+
+ INJECTION_POINT("heap_update-before-pin", NULL);
+
+ /*
+ * To update a heap tuple we need to find the set of modified indexed
+ * attributes ("modified_idx_attrs") so as to see if a HOT update is
+ * allowable or not. When updating heap tuples via execution of UPDATE
+ * statements this set is constructed before calling into the table AM's
+ * tuple_update() function by the function ExecUpdateModifiedIdxAttrs()
+ * which compares the old/new TupleTableSlots. However, here we have the
+ * old TID and the new tuple, not two TupleTableSlots, but we still need
+ * to construct a similar bitmap so as to be able to know if HOT updates
+ * are allowed or not. To do that we first have to fetch the old tuple
+ * itself. Because heapam_fetch_row_version() is static, we have to
+ * replicate that code here. This is a bit repetitive because
+ * heap_update() will again find and form the old HeapTuple from the old
+ * TID and in most cases the callers (ignoring extensions, always catalog
+ * tuple updates) already had the set of changed attributes (e.g. the
+ * "replaces" array), but for now this minor repetition of work is
+ * necessary.
+ */
+
+ slot = MakeTupleTableSlot(RelationGetDescr(relation), &TTSOpsBufferHeapTuple);
+ bslot = (BufferHeapTupleTableSlot *) slot;
+
+ /*
+ * Set the TID in the slot and then fetch the old tuple so we can examine
+ * it
+ */
+ bslot->base.tupdata.t_self = *otid;
+ if (!heap_fetch(relation, SnapshotAny, &bslot->base.tupdata, &buffer, false))
+ {
+ /*
+ * heap_update() checks for !ItemIdIsNormal(lp) and will return false
+ * in those cases.
+ */
+ Assert(RelationSupportsSysCache(RelationGetRelid(relation)));
+
+ *update_indexes = TU_None;
+
+ /* modified_idx_attrs not yet initialized */
+ bms_free(idx_attrs);
+ ExecDropSingleTupleTableSlot(slot);
+
+ elog(ERROR, "tuple concurrently deleted");
+
+ return;
+ }
+
+ Assert(buffer != InvalidBuffer);
+
+ /* Store in slot, transferring existing pin */
+ ExecStorePinnedBufferHeapTuple(&bslot->base.tupdata, slot, buffer);
+ oldtup = ExecFetchSlotHeapTuple(slot, false, &shouldFree);
+
+ modified_idx_attrs = HeapUpdateModifiedIdxAttrs(relation, oldtup, tuple);
+ lockmode = HeapUpdateDetermineLockmode(relation, modified_idx_attrs);
+ hot_allowed = HeapUpdateHotAllowable(relation, modified_idx_attrs, &summarized_only);
+
+ result = heap_update(relation, otid, tuple, GetCurrentCommandId(true),
+ InvalidSnapshot, true /* wait for commit */ ,
+ &tmfd, lockmode, modified_idx_attrs, hot_allowed);
+
+ if (shouldFree)
+ heap_freetuple(oldtup);
+
+ ExecDropSingleTupleTableSlot(slot);
+ bms_free(idx_attrs);
+
+ /*
+ * Decide whether new index entries are needed for the tuple
+ *
+ * If the update is not HOT, we must update all indexes. If the update is
+ * HOT, it could be that we updated summarized columns, so we either
+ * update only summarized indexes, or none at all.
+ */
+ *update_indexes = TU_None;
switch (result)
{
case TM_SelfModified:
@@ -4574,6 +4711,10 @@ simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tup
case TM_Ok:
/* done successfully */
+ if (!HeapTupleIsHeapOnly(tuple))
+ *update_indexes = TU_All;
+ else if (summarized_only)
+ *update_indexes = TU_Summarizing;
break;
case TM_Updated:
@@ -4590,7 +4731,6 @@ simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tup
}
}
-
/*
* Return the MultiXactStatus corresponding to the given tuple lock mode.
*/
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 3ff36f59bf8..bbdb732c001 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -27,7 +27,6 @@
#include "access/syncscan.h"
#include "access/tableam.h"
#include "access/tsmapi.h"
-#include "access/visibilitymap.h"
#include "access/xact.h"
#include "catalog/catalog.h"
#include "catalog/index.h"
@@ -44,6 +43,7 @@
#include "storage/procarray.h"
#include "storage/smgr.h"
#include "utils/builtins.h"
+#include "utils/injection_point.h"
#include "utils/rel.h"
static void reform_and_rewrite_tuple(HeapTuple tuple,
@@ -316,19 +316,26 @@ heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
static TM_Result
heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
CommandId cid, Snapshot snapshot, Snapshot crosscheck,
- bool wait, TM_FailureData *tmfd,
- LockTupleMode *lockmode, TU_UpdateIndexes *update_indexes)
+ bool wait, TM_FailureData *tmfd, LockTupleMode *lockmode,
+ const Bitmapset *modified_idx_attrs, TU_UpdateIndexes *update_indexes)
{
bool shouldFree = true;
HeapTuple tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);
+ bool hot_allowed;
+ bool summarized_only;
TM_Result result;
+ Assert(ItemPointerIsValid(otid));
+
+ hot_allowed = HeapUpdateHotAllowable(relation, modified_idx_attrs, &summarized_only);
+ *lockmode = HeapUpdateDetermineLockmode(relation, modified_idx_attrs);
+
/* Update the tuple with table oid */
slot->tts_tableOid = RelationGetRelid(relation);
tuple->t_tableOid = slot->tts_tableOid;
result = heap_update(relation, otid, tuple, cid, crosscheck, wait,
- tmfd, lockmode, update_indexes);
+ tmfd, *lockmode, modified_idx_attrs, hot_allowed);
ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
/*
@@ -341,16 +348,17 @@ heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
* HOT, it could be that we updated summarized columns, so we either
* update only summarized indexes, or none at all.
*/
- if (result != TM_Ok)
+ *update_indexes = TU_None;
+ if (result == TM_Ok)
{
- Assert(*update_indexes == TU_None);
- *update_indexes = TU_None;
+ if (HeapTupleIsHeapOnly(tuple))
+ {
+ if (summarized_only)
+ *update_indexes = TU_Summarizing;
+ }
+ else
+ *update_indexes = TU_All;
}
- else if (!HeapTupleIsHeapOnly(tuple))
- Assert(*update_indexes == TU_All);
- else
- Assert((*update_indexes == TU_Summarizing) ||
- (*update_indexes == TU_None));
if (shouldFree)
pfree(tuple);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index dfda1af412e..9ba72d51dfa 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -359,6 +359,7 @@ void
simple_table_tuple_update(Relation rel, ItemPointer otid,
TupleTableSlot *slot,
Snapshot snapshot,
+ const Bitmapset *modified_idx_attrs,
TU_UpdateIndexes *update_indexes)
{
TM_Result result;
@@ -369,7 +370,9 @@ simple_table_tuple_update(Relation rel, ItemPointer otid,
GetCurrentCommandId(true),
snapshot, InvalidSnapshot,
true /* wait for commit */ ,
- &tmfd, &lockmode, update_indexes);
+ &tmfd, &lockmode,
+ modified_idx_attrs,
+ update_indexes);
switch (result)
{
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 2497ee7edc5..74a7379186b 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -33,6 +33,7 @@
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
+#include "utils/relcache.h"
#include "utils/snapmgr.h"
#include "utils/syscache.h"
#include "utils/typcache.h"
@@ -906,6 +907,7 @@ ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
bool skip_tuple = false;
Relation rel = resultRelInfo->ri_RelationDesc;
ItemPointer tid = &(searchslot->tts_tid);
+ Bitmapset *modified_idx_attrs;
/*
* We support only non-system tables, with
@@ -944,8 +946,13 @@ ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
if (rel->rd_rel->relispartition)
ExecPartitionCheck(resultRelInfo, slot, estate, true);
+ modified_idx_attrs = ExecUpdateModifiedIdxAttrs(resultRelInfo,
+ estate, searchslot, slot);
+
simple_table_tuple_update(rel, tid, slot, estate->es_snapshot,
- &update_indexes);
+ modified_idx_attrs, &update_indexes);
+ bms_free(modified_idx_attrs);
+
conflictindexes = resultRelInfo->ri_onConflictArbiterIndexes;
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 327c27abff9..cca834a7359 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -17,6 +17,7 @@
* ExecModifyTable - retrieve the next tuple from the node
* ExecEndModifyTable - shut down the ModifyTable node
* ExecReScanModifyTable - rescan the ModifyTable node
+ * ExecUpdateModifiedIdxAttrs - find set of updated indexed columns
*
* NOTES
* The ModifyTable node receives input from its outerPlan, which is
@@ -54,6 +55,7 @@
#include "access/htup_details.h"
#include "access/tableam.h"
+#include "access/tupdesc.h"
#include "access/xact.h"
#include "commands/trigger.h"
#include "executor/execPartition.h"
@@ -188,6 +190,68 @@ static TupleTableSlot *ExecMergeNotMatched(ModifyTableContext *context,
ResultRelInfo *resultRelInfo,
bool canSetTag);
+/*
+ * ExecUpdateModifiedIdxAttrs
+ *
+ * Find the set of attributes referenced by this relation and used in this
+ * UPDATE that now differ in value. This is done by reviewing slot datum that
+ * are in the UPDATE statment and are known to be referenced by at least one
+ * index in some way. This set is called the "modified indexed attributes" or
+ * "modified_idx_attrs". An overlap of a single index's attributes and this "mix" set
+ * signals that the attributes in the new_tts used to form the index datum have
+ * changed.
+ *
+ * Return a Bitmapset that contains the set of modified (changed) indexed
+ * attributes between oldtup and newtup.
+ *
+ * NOTE: There is a similar function called HeapUpdateModifiedIdxAttrs() that operates
+ * on the old TID and new HeapTuple rather than the old/new TupleTableSlots as
+ * this function does. These two functions should mirror one another until
+ * someday when catalog tuple updates track their changes avoiding the need to
+ * re-discover them in simple_heap_update().
+ */
+Bitmapset *
+ExecUpdateModifiedIdxAttrs(ResultRelInfo *resultRelInfo,
+ EState *estate,
+ TupleTableSlot *old_tts,
+ TupleTableSlot *new_tts)
+{
+ Relation relation = resultRelInfo->ri_RelationDesc;
+ TupleDesc tupdesc = RelationGetDescr(relation);
+ Bitmapset *attrs,
+ *modified_idx_attrs = NULL;
+
+ /* If no indexes, we're done */
+ if (resultRelInfo->ri_NumIndices == 0)
+ return NULL;
+
+ /*
+ * Get the set of all attributes across all indexes for this relation from
+ * the relcache, it returns us a copy of the bitmap so we can modify it.
+ */
+ attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_INDEXED);
+
+ /*
+ * Fetch the set of attributes explicity SET in the UPDATE statement or
+ * set by a before row trigger (even if not mentioned in the SQL) from the
+ * executor state and then find the intersection with the indexed
+ * attributes. Attributes that are SET might not change value, so we have
+ * to examine them for changes.
+ */
+ attrs = bms_int_members(attrs, ExecGetAllUpdatedCols(resultRelInfo, estate));
+
+ /*
+ * When there are indexed attributes mentioned in the UPDATE then we need
+ * to find the subset that changed value. That's the "modified indexed
+ * attributes" or "modified_idx_attrs".
+ */
+ if (!bms_is_empty(attrs))
+ modified_idx_attrs = ExecCompareSlotAttrs(tupdesc, attrs, old_tts, new_tts);
+
+ bms_free(attrs);
+
+ return modified_idx_attrs;
+}
/*
* Verify that the tuples to be produced by INSERT match the
@@ -2195,14 +2259,17 @@ ExecUpdatePrepareSlot(ResultRelInfo *resultRelInfo,
*/
static TM_Result
ExecUpdateAct(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
- ItemPointer tupleid, HeapTuple oldtuple, TupleTableSlot *slot,
- bool canSetTag, UpdateContext *updateCxt)
+ ItemPointer tupleid, HeapTuple oldtuple, TupleTableSlot *oldSlot,
+ TupleTableSlot *slot, bool canSetTag, UpdateContext *updateCxt)
{
EState *estate = context->estate;
Relation resultRelationDesc = resultRelInfo->ri_RelationDesc;
bool partition_constraint_failed;
TM_Result result;
+ /* The set of modified indexed attributes that trigger new index entries */
+ Bitmapset *modified_idx_attrs = NULL;
+
updateCxt->crossPartUpdate = false;
/*
@@ -2319,7 +2386,16 @@ lreplace:
ExecConstraints(resultRelInfo, slot, estate);
/*
- * replace the heap tuple
+ * Next up we need to find out the set of indexed attributes that have
+ * changed in value and should trigger a new index tuple. We could start
+ * with the set of updated columns via ExecGetUpdatedCols(), but if we do
+ * we will overlook attributes directly modified by heap_modify_tuple()
+ * which are not known to ExecGetUpdatedCols().
+ */
+ modified_idx_attrs = ExecUpdateModifiedIdxAttrs(resultRelInfo, estate, oldSlot, slot);
+
+ /*
+ * Call into the table AM to update the heap tuple.
*
* Note: if es_crosscheck_snapshot isn't InvalidSnapshot, we check that
* the row to be updated is visible to that snapshot, and throw a
@@ -2333,6 +2409,7 @@ lreplace:
estate->es_crosscheck_snapshot,
true /* wait for commit */ ,
&context->tmfd, &updateCxt->lockmode,
+ modified_idx_attrs,
&updateCxt->updateIndexes);
return result;
@@ -2555,8 +2632,8 @@ ExecUpdate(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
*/
redo_act:
lockedtid = *tupleid;
- result = ExecUpdateAct(context, resultRelInfo, tupleid, oldtuple, slot,
- canSetTag, &updateCxt);
+ result = ExecUpdateAct(context, resultRelInfo, tupleid, oldtuple, oldSlot,
+ slot, canSetTag, &updateCxt);
/*
* If ExecUpdateAct reports that a cross-partition update was done,
@@ -3406,8 +3483,8 @@ lmerge_matched:
Assert(oldtuple == NULL);
result = ExecUpdateAct(context, resultRelInfo, tupleid,
- NULL, newslot, canSetTag,
- &updateCxt);
+ NULL, resultRelInfo->ri_oldTupleSlot,
+ newslot, canSetTag, &updateCxt);
/*
* As in ExecUpdate(), if ExecUpdateAct() reports that a
@@ -4544,7 +4621,7 @@ ExecModifyTable(PlanState *pstate)
* For UPDATE/DELETE/MERGE, fetch the row identity info for the tuple
* to be updated/deleted/merged. For a heap relation, that's a TID;
* otherwise we may have a wholerow junk attr that carries the old
- * tuple in toto. Keep this in step with the part of
+ * tuple in total. Keep this in step with the part of
* ExecInitModifyTable that sets up ri_RowIdAttNo.
*/
if (operation == CMD_UPDATE || operation == CMD_DELETE ||
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index a1c88c6b1b6..4303108565f 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -2475,8 +2475,8 @@ RelationDestroyRelation(Relation relation, bool remember_tupdesc)
bms_free(relation->rd_keyattr);
bms_free(relation->rd_pkattr);
bms_free(relation->rd_idattr);
- bms_free(relation->rd_hotblockingattr);
bms_free(relation->rd_summarizedattr);
+ bms_free(relation->rd_indexedattr);
if (relation->rd_pubdesc)
pfree(relation->rd_pubdesc);
if (relation->rd_options)
@@ -5276,8 +5276,8 @@ RelationGetIndexPredicate(Relation relation)
* (beware: even if PK is deferrable!)
* INDEX_ATTR_BITMAP_IDENTITY_KEY Columns in the table's replica identity
* index (empty if FULL)
- * INDEX_ATTR_BITMAP_HOT_BLOCKING Columns that block updates from being HOT
- * INDEX_ATTR_BITMAP_SUMMARIZED Columns included in summarizing indexes
+ * INDEX_ATTR_BITMAP_SUMMARIZED Columns only included in summarizing indexes
+ * INDEX_ATTR_BITMAP_INDEXED Columns referenced by indexes
*
* Attribute numbers are offset by FirstLowInvalidHeapAttributeNumber so that
* we can include system attributes (e.g., OID) in the bitmap representation.
@@ -5300,8 +5300,8 @@ RelationGetIndexAttrBitmap(Relation relation, IndexAttrBitmapKind attrKind)
Bitmapset *uindexattrs; /* columns in unique indexes */
Bitmapset *pkindexattrs; /* columns in the primary index */
Bitmapset *idindexattrs; /* columns in the replica identity */
- Bitmapset *hotblockingattrs; /* columns with HOT blocking indexes */
- Bitmapset *summarizedattrs; /* columns with summarizing indexes */
+ Bitmapset *summarizedattrs; /* columns only in summarizing indexes */
+ Bitmapset *indexedattrs; /* columns referenced by indexes */
List *indexoidlist;
List *newindexoidlist;
Oid relpkindex;
@@ -5320,10 +5320,10 @@ RelationGetIndexAttrBitmap(Relation relation, IndexAttrBitmapKind attrKind)
return bms_copy(relation->rd_pkattr);
case INDEX_ATTR_BITMAP_IDENTITY_KEY:
return bms_copy(relation->rd_idattr);
- case INDEX_ATTR_BITMAP_HOT_BLOCKING:
- return bms_copy(relation->rd_hotblockingattr);
case INDEX_ATTR_BITMAP_SUMMARIZED:
return bms_copy(relation->rd_summarizedattr);
+ case INDEX_ATTR_BITMAP_INDEXED:
+ return bms_copy(relation->rd_indexedattr);
default:
elog(ERROR, "unknown attrKind %u", attrKind);
}
@@ -5366,8 +5366,8 @@ restart:
uindexattrs = NULL;
pkindexattrs = NULL;
idindexattrs = NULL;
- hotblockingattrs = NULL;
summarizedattrs = NULL;
+ indexedattrs = NULL;
foreach(l, indexoidlist)
{
Oid indexOid = lfirst_oid(l);
@@ -5426,7 +5426,7 @@ restart:
if (indexDesc->rd_indam->amsummarizing)
attrs = &summarizedattrs;
else
- attrs = &hotblockingattrs;
+ attrs = &indexedattrs;
/* Collect simple attribute references */
for (i = 0; i < indexDesc->rd_index->indnatts; i++)
@@ -5435,9 +5435,9 @@ restart:
/*
* Since we have covering indexes with non-key columns, we must
- * handle them accurately here. non-key columns must be added into
- * hotblockingattrs or summarizedattrs, since they are in index,
- * and update shouldn't miss them.
+ * handle them accurately here. Non-key columns must be added into
+ * indexedattrs or summarizedattrs, since they are in index, and
+ * update shouldn't miss them.
*
* Summarizing indexes do not block HOT, but do need to be updated
* when the column value changes, thus require a separate
@@ -5498,12 +5498,20 @@ restart:
bms_free(uindexattrs);
bms_free(pkindexattrs);
bms_free(idindexattrs);
- bms_free(hotblockingattrs);
bms_free(summarizedattrs);
+ bms_free(indexedattrs);
goto restart;
}
+ /*
+ * Record what attributes are only referenced by summarizing indexes. Then
+ * add that into the other indexed attributes to track all referenced
+ * attributes.
+ */
+ summarizedattrs = bms_del_members(summarizedattrs, indexedattrs);
+ indexedattrs = bms_add_members(indexedattrs, summarizedattrs);
+
/* Don't leak the old values of these bitmaps, if any */
relation->rd_attrsvalid = false;
bms_free(relation->rd_keyattr);
@@ -5512,10 +5520,10 @@ restart:
relation->rd_pkattr = NULL;
bms_free(relation->rd_idattr);
relation->rd_idattr = NULL;
- bms_free(relation->rd_hotblockingattr);
- relation->rd_hotblockingattr = NULL;
bms_free(relation->rd_summarizedattr);
relation->rd_summarizedattr = NULL;
+ bms_free(relation->rd_indexedattr);
+ relation->rd_indexedattr = NULL;
/*
* Now save copies of the bitmaps in the relcache entry. We intentionally
@@ -5528,8 +5536,8 @@ restart:
relation->rd_keyattr = bms_copy(uindexattrs);
relation->rd_pkattr = bms_copy(pkindexattrs);
relation->rd_idattr = bms_copy(idindexattrs);
- relation->rd_hotblockingattr = bms_copy(hotblockingattrs);
relation->rd_summarizedattr = bms_copy(summarizedattrs);
+ relation->rd_indexedattr = bms_copy(indexedattrs);
relation->rd_attrsvalid = true;
MemoryContextSwitchTo(oldcxt);
@@ -5542,10 +5550,10 @@ restart:
return pkindexattrs;
case INDEX_ATTR_BITMAP_IDENTITY_KEY:
return idindexattrs;
- case INDEX_ATTR_BITMAP_HOT_BLOCKING:
- return hotblockingattrs;
case INDEX_ATTR_BITMAP_SUMMARIZED:
return summarizedattrs;
+ case INDEX_ATTR_BITMAP_INDEXED:
+ return indexedattrs;
default:
elog(ERROR, "unknown attrKind %u", attrKind);
return NULL;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 24a27cc043a..909b4fad7c2 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -366,10 +366,9 @@ extern TM_Result heap_delete(Relation relation, const ItemPointerData *tid,
extern void heap_finish_speculative(Relation relation, const ItemPointerData *tid);
extern void heap_abort_speculative(Relation relation, const ItemPointerData *tid);
extern TM_Result heap_update(Relation relation, const ItemPointerData *otid,
- HeapTuple newtup,
- CommandId cid, Snapshot crosscheck, bool wait,
- TM_FailureData *tmfd, LockTupleMode *lockmode,
- TU_UpdateIndexes *update_indexes);
+ HeapTuple newtup, CommandId cid, Snapshot crosscheck, bool wait,
+ TM_FailureData *tmfd, const LockTupleMode lockmode,
+ const Bitmapset *modified_idx_attrs, const bool hot_allowed);
extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
CommandId cid, LockTupleMode mode, LockWaitPolicy wait_policy,
bool follow_updates,
@@ -431,6 +430,12 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
OffsetNumber *dead, int ndead,
OffsetNumber *unused, int nunused);
+/* in heap/heapam.c */
+extern bool HeapUpdateHotAllowable(Relation relation, const Bitmapset *modified_idx_attrs,
+ bool *summarized_only);
+extern LockTupleMode HeapUpdateDetermineLockmode(Relation relation,
+ const Bitmapset *modified_idx_attrs);
+
/* in heap/vacuumlazy.c */
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 06084752245..8ec20dcfc11 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -549,6 +549,7 @@ typedef struct TableAmRoutine
bool wait,
TM_FailureData *tmfd,
LockTupleMode *lockmode,
+ const Bitmapset *modified_idx_attrs,
TU_UpdateIndexes *update_indexes);
/* see table_tuple_lock() for reference about parameters */
@@ -1523,12 +1524,12 @@ static inline TM_Result
table_tuple_update(Relation rel, ItemPointer otid, TupleTableSlot *slot,
CommandId cid, Snapshot snapshot, Snapshot crosscheck,
bool wait, TM_FailureData *tmfd, LockTupleMode *lockmode,
- TU_UpdateIndexes *update_indexes)
+ const Bitmapset *modified_idx_attrs, TU_UpdateIndexes *update_indexes)
{
return rel->rd_tableam->tuple_update(rel, otid, slot,
cid, snapshot, crosscheck,
- wait, tmfd,
- lockmode, update_indexes);
+ wait, tmfd, lockmode,
+ modified_idx_attrs, update_indexes);
}
/*
@@ -2009,6 +2010,7 @@ extern void simple_table_tuple_delete(Relation rel, ItemPointer tid,
Snapshot snapshot);
extern void simple_table_tuple_update(Relation rel, ItemPointer otid,
TupleTableSlot *slot, Snapshot snapshot,
+ const Bitmapset *modified_idx_attrs,
TU_UpdateIndexes *update_indexes);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 5dcfaa2027f..24ec43c35a9 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -808,5 +808,9 @@ extern ResultRelInfo *ExecLookupResultRelByOid(ModifyTableState *node,
Oid resultoid,
bool missing_ok,
bool update_cache);
+extern Bitmapset *ExecUpdateModifiedIdxAttrs(ResultRelInfo *relinfo,
+ EState *estate,
+ TupleTableSlot *old_tts,
+ TupleTableSlot *new_tts);
#endif /* EXECUTOR_H */
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 236830f6b93..10e5e9044ee 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -162,8 +162,8 @@ typedef struct RelationData
Bitmapset *rd_keyattr; /* cols that can be ref'd by foreign keys */
Bitmapset *rd_pkattr; /* cols included in primary key */
Bitmapset *rd_idattr; /* included in replica identity index */
- Bitmapset *rd_hotblockingattr; /* cols blocking HOT update */
Bitmapset *rd_summarizedattr; /* cols indexed by summarizing indexes */
+ Bitmapset *rd_indexedattr; /* all cols referenced by indexes */
PublicationDesc *rd_pubdesc; /* publication descriptor, or NULL */
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 2700224939a..57b46ee54e5 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -69,8 +69,8 @@ typedef enum IndexAttrBitmapKind
INDEX_ATTR_BITMAP_KEY,
INDEX_ATTR_BITMAP_PRIMARY_KEY,
INDEX_ATTR_BITMAP_IDENTITY_KEY,
- INDEX_ATTR_BITMAP_HOT_BLOCKING,
INDEX_ATTR_BITMAP_SUMMARIZED,
+ INDEX_ATTR_BITMAP_INDEXED,
} IndexAttrBitmapKind;
extern Bitmapset *RelationGetIndexAttrBitmap(Relation relation,
diff --git a/src/test/regress/expected/generated_virtual.out b/src/test/regress/expected/generated_virtual.out
index 6dab60c937b..7ebb7890d96 100644
--- a/src/test/regress/expected/generated_virtual.out
+++ b/src/test/regress/expected/generated_virtual.out
@@ -287,7 +287,7 @@ DETAIL: Column "b" is a generated column.
INSERT INTO gtest1v VALUES (8, DEFAULT), (9, DEFAULT); -- error
ERROR: cannot insert a non-DEFAULT value into column "b"
DETAIL: Column "b" is a generated column.
-SELECT * FROM gtest1v;
+SELECT * FROM gtest1v ORDER BY a;
a | b
---+----
3 | 6
diff --git a/src/test/regress/expected/triggers.out b/src/test/regress/expected/triggers.out
index 98dee63b50a..ef98fd0cccf 100644
--- a/src/test/regress/expected/triggers.out
+++ b/src/test/regress/expected/triggers.out
@@ -959,16 +959,24 @@ NOTICE: main_view BEFORE UPDATE STATEMENT (before_view_upd_stmt)
NOTICE: main_view AFTER UPDATE STATEMENT (after_view_upd_stmt)
UPDATE 0
-- Delete from view using trigger
-DELETE FROM main_view WHERE a IN (20,21);
+DELETE FROM main_view WHERE a = 20 AND b = 31;
NOTICE: main_view BEFORE DELETE STATEMENT (before_view_del_stmt)
NOTICE: main_view INSTEAD OF DELETE ROW (instead_of_del)
-NOTICE: OLD: (21,10)
-NOTICE: main_view INSTEAD OF DELETE ROW (instead_of_del)
NOTICE: OLD: (20,31)
+NOTICE: main_view AFTER DELETE STATEMENT (after_view_del_stmt)
+DELETE 1
+DELETE FROM main_view WHERE a = 21 AND b = 10;
+NOTICE: main_view BEFORE DELETE STATEMENT (before_view_del_stmt)
+NOTICE: main_view INSTEAD OF DELETE ROW (instead_of_del)
+NOTICE: OLD: (21,10)
+NOTICE: main_view AFTER DELETE STATEMENT (after_view_del_stmt)
+DELETE 1
+DELETE FROM main_view WHERE a = 21 AND b = 32;
+NOTICE: main_view BEFORE DELETE STATEMENT (before_view_del_stmt)
NOTICE: main_view INSTEAD OF DELETE ROW (instead_of_del)
NOTICE: OLD: (21,32)
NOTICE: main_view AFTER DELETE STATEMENT (after_view_del_stmt)
-DELETE 3
+DELETE 1
DELETE FROM main_view WHERE a = 31 RETURNING a, b;
NOTICE: main_view BEFORE DELETE STATEMENT (before_view_del_stmt)
NOTICE: main_view INSTEAD OF DELETE ROW (instead_of_del)
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 9cea538b8e8..4877a1ddce9 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -372,15 +372,15 @@ INSERT INTO rw_view16 (a, b) VALUES (3, 'Row 3'); -- should be OK
UPDATE rw_view16 SET a=3, aa=-3 WHERE a=3; -- should fail
ERROR: multiple assignments to same column "a"
UPDATE rw_view16 SET aa=-3 WHERE a=3; -- should be OK
-SELECT * FROM base_tbl;
+SELECT * FROM base_tbl ORDER BY a;
a | b
----+--------
+ -3 | Row 3
-2 | Row -2
-1 | Row -1
0 | Row 0
1 | Row 1
2 | Row 2
- -3 | Row 3
(6 rows)
DELETE FROM rw_view16 WHERE a=-3; -- should be OK
diff --git a/src/test/regress/sql/generated_virtual.sql b/src/test/regress/sql/generated_virtual.sql
index e750866d2d8..877152d6d69 100644
--- a/src/test/regress/sql/generated_virtual.sql
+++ b/src/test/regress/sql/generated_virtual.sql
@@ -127,7 +127,7 @@ ALTER VIEW gtest1v ALTER COLUMN b SET DEFAULT 100;
INSERT INTO gtest1v VALUES (8, DEFAULT); -- error
INSERT INTO gtest1v VALUES (8, DEFAULT), (9, DEFAULT); -- error
-SELECT * FROM gtest1v;
+SELECT * FROM gtest1v ORDER BY a;
DELETE FROM gtest1v WHERE a >= 5;
DROP VIEW gtest1v;
diff --git a/src/test/regress/sql/triggers.sql b/src/test/regress/sql/triggers.sql
index ea39817ee3d..6ceb61608ae 100644
--- a/src/test/regress/sql/triggers.sql
+++ b/src/test/regress/sql/triggers.sql
@@ -660,7 +660,9 @@ UPDATE main_view SET b = 32 WHERE a = 21 AND b = 31 RETURNING a, b;
UPDATE main_view SET b = 0 WHERE false;
-- Delete from view using trigger
-DELETE FROM main_view WHERE a IN (20,21);
+DELETE FROM main_view WHERE a = 20 AND b = 31;
+DELETE FROM main_view WHERE a = 21 AND b = 10;
+DELETE FROM main_view WHERE a = 21 AND b = 32;
DELETE FROM main_view WHERE a = 31 RETURNING a, b;
\set QUIET true
diff --git a/src/test/regress/sql/updatable_views.sql b/src/test/regress/sql/updatable_views.sql
index 1635adde2d4..160e7799715 100644
--- a/src/test/regress/sql/updatable_views.sql
+++ b/src/test/regress/sql/updatable_views.sql
@@ -125,7 +125,7 @@ INSERT INTO rw_view16 VALUES (3, 'Row 3', 3); -- should fail
INSERT INTO rw_view16 (a, b) VALUES (3, 'Row 3'); -- should be OK
UPDATE rw_view16 SET a=3, aa=-3 WHERE a=3; -- should fail
UPDATE rw_view16 SET aa=-3 WHERE a=3; -- should be OK
-SELECT * FROM base_tbl;
+SELECT * FROM base_tbl ORDER BY a;
DELETE FROM rw_view16 WHERE a=-3; -- should be OK
-- Read-only views
INSERT INTO ro_view17 VALUES (3, 'ROW 3');
--
2.51.2
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: Expanding HOT updates for expression and partial indexes
2026-02-16 19:36 Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-02-17 21:15 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-19 20:43 ` Re: Expanding HOT updates for expression and partial indexes Andres Freund <[email protected]>
2026-02-19 22:31 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-23 19:23 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-25 21:03 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-02-26 22:08 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-26 23:01 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-02 19:08 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-11 15:51 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
@ 2026-03-12 20:33 ` Nathan Bossart <[email protected]>
2026-03-12 21:31 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
0 siblings, 1 reply; 24+ messages in thread
From: Nathan Bossart @ 2026-03-12 20:33 UTC (permalink / raw)
To: Greg Burd <[email protected]>; +Cc: pgsql-hackers; Jeff Davis <[email protected]>
On Wed, Mar 11, 2026 at 11:51:03AM -0400, Greg Burd wrote:
> 0002 - This patch plugs a hole (bug?) in ExecGetAllUpdatedCols() which is
> triggered by an existing test in tsearch.sql and the
> tsvector_update_trigger(). That trigger uses heap_modify_tuple() to
> change an indexed attribute that is not discovered by
> ExecGetAllUpdatedCols(), which seems odd to me at best and at worst wrong
> (or even a potential security issue). This patch finds and adds columns
> that are updated into the Bitmapset returned by ExecGetAllUpdatedCols().
> The patch includes a helper function ExecCompareSlotAttrs() that will be
> used in follow-on patches as well.
I just looked at this one for now.
> The net is that the functions like HeapDetermineColumnsInfo() have to
> scan all indexed attributes for changes rather than being able to first
> reduce the indexed set by intersecting it with the set of attributes
> known to be potentially updated.
I noticed the patch doesn't update HeapDetermineColumnsInfo() accordingly.
Is that intended?
> This commit introduces ExecCompareSlotAttrs() as a utility function to
> identify those attributes that have changed. It compares a subset of
> attributes between two TupleTableSlots and returns a Bitmapset of
> attributes that differ.
Hm. Most of this new function looks duplicated from
HeapDetermineColumnsInfo(), so IIUC this commit effectively adds another
scan through all the attributes. Does this produce noticeably more
overhead?
> It would be nice to integrate this into HeapDetermineColumnsInfo(),
> however it would be a layering violation given that it is within
> heap_update().
It'd be good to understand whether the current behavior is intentional or
just a happy accident. I found commit 2fd8685e7f, which looks like it was
intended as a prerequisite for the WARM feature (which I don't think was
ever committed). And it seems to have scanned through all indexed columns
when HOT was first introduced in commit 282d2a03dd.
I'm also curious whether anything else could modify columns that won't be
discovered by ExecGetAllUpdatedCols(). Having HeapDetermineColumnsInfo()
scan everything seems like a defense against such things, which is perhaps
why you've left it unchanged in the patch. I haven't looked into 0003 yet.
Is 0002 a prerequisite for that or a separate fix?
--
nathan
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: Expanding HOT updates for expression and partial indexes
2026-02-16 19:36 Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-02-17 21:15 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-19 20:43 ` Re: Expanding HOT updates for expression and partial indexes Andres Freund <[email protected]>
2026-02-19 22:31 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-23 19:23 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-25 21:03 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-02-26 22:08 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-26 23:01 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-02 19:08 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-11 15:51 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-12 20:33 ` Re: Expanding HOT updates for expression and partial indexes Nathan Bossart <[email protected]>
@ 2026-03-12 21:31 ` Greg Burd <[email protected]>
2026-03-15 21:11 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
0 siblings, 1 reply; 24+ messages in thread
From: Greg Burd @ 2026-03-12 21:31 UTC (permalink / raw)
To: Nathan Bossart <[email protected]>; +Cc: pgsql-hackers; Jeff Davis <[email protected]>
On Thu, Mar 12, 2026, at 4:33 PM, Nathan Bossart wrote:
> On Wed, Mar 11, 2026 at 11:51:03AM -0400, Greg Burd wrote:
>> 0002 - This patch plugs a hole (bug?) in ExecGetAllUpdatedCols() which is
>> triggered by an existing test in tsearch.sql and the
>> tsvector_update_trigger(). That trigger uses heap_modify_tuple() to
>> change an indexed attribute that is not discovered by
>> ExecGetAllUpdatedCols(), which seems odd to me at best and at worst wrong
>> (or even a potential security issue). This patch finds and adds columns
>> that are updated into the Bitmapset returned by ExecGetAllUpdatedCols().
>> The patch includes a helper function ExecCompareSlotAttrs() that will be
>> used in follow-on patches as well.
>
> I just looked at this one for now.
Hey Nathan!
Thanks for taking the time to review 0002.
>> The net is that the functions like HeapDetermineColumnsInfo() have to
>> scan all indexed attributes for changes rather than being able to first
>> reduce the indexed set by intersecting it with the set of attributes
>> known to be potentially updated.
>
> I noticed the patch doesn't update HeapDetermineColumnsInfo() accordingly.
> Is that intended?
Yes, that is intended. The 0002 patch is bug fix that I'd hidden along with what is now 0003, I pulled it out for clarity and to discuss independent of the other changes.
>> This commit introduces ExecCompareSlotAttrs() as a utility function to
>> identify those attributes that have changed. It compares a subset of
>> attributes between two TupleTableSlots and returns a Bitmapset of
>> attributes that differ.
>
> Hm. Most of this new function looks duplicated from
> HeapDetermineColumnsInfo(), so IIUC this commit effectively adds another
> scan through all the attributes. Does this produce noticeably more
> overhead?
Yes, it appears similar to that for a reason but it differs in one key way. It compares TupleTableSlots, not HeapTuples.
The commit doesn't add another scan, the new code only scans the attributes that ExecGetAllUpdatedCols() didn't pick up earlier and have cached for us at this point. The intersection between that set and what is indexed is almost always the NULL set because most UPDATEs don't invoke functions via triggers that modify indexed columns using heap_modify_tuple() directly. But, notably there is the case in tsearch.sql that does.
This introduces almost no net new overhead and when it does in fact do some work it's doing no more than what was done before in HeapDetermineColumnsInfo().
>> It would be nice to integrate this into HeapDetermineColumnsInfo(),
>> however it would be a layering violation given that it is within
>> heap_update().
>
> It'd be good to understand whether the current behavior is intentional or
> just a happy accident. I found commit 2fd8685e7f, which looks like it was
> intended as a prerequisite for the WARM feature (which I don't think was
> ever committed). And it seems to have scanned through all indexed columns
> when HOT was first introduced in commit 282d2a03dd.
Hard to tell if it was accidental or intentional, more digging required, but I'd bet that others poking in this area noticed the test failure and didn't connect the dots fully and just assumed best practice was to scan all indexed columns, even ones that could not have been updated at all.
Honestly, if we wrote this section from scratch again today I'm better it'd be closer to where my patch takes us than not.
> I'm also curious whether anything else could modify columns that won't be
> discovered by ExecGetAllUpdatedCols(). Having HeapDetermineColumnsInfo()
> scan everything seems like a defense against such things, which is perhaps
> why you've left it unchanged in the patch. I haven't looked into 0003 yet.
> Is 0002 a prerequisite for that or a separate fix?
Other than the heap_modify_tuple() calls I don't know of something that allows for direct changes but that doesn't matter, 0002 will scan and pick up those attributes even if we introduce a new modification path in the future (as intended).
HeapDetermineColumnsInfo() can't call ExecGetAllUpdatedCols() because that function needs resultRelInfo/EState both not available inside heap (table AM) calls. Also, the new helper compares TTS, not HeapTuples, which is what we have in heapam_tuple_update(), so not an option
0002 is a both a bug fix (IMO) and a pre-req for 0003 because in the next patch we use the new ExecCompareSlotAttrs() function from within the executor ahead of calling into ExecUpdate().
> --
> nathan
Thanks for your time and comments, let me know if you have more. :)
best.
-greg
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: Expanding HOT updates for expression and partial indexes
2026-02-16 19:36 Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-02-17 21:15 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-19 20:43 ` Re: Expanding HOT updates for expression and partial indexes Andres Freund <[email protected]>
2026-02-19 22:31 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-23 19:23 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-25 21:03 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-02-26 22:08 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-26 23:01 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-02 19:08 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-11 15:51 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-12 20:33 ` Re: Expanding HOT updates for expression and partial indexes Nathan Bossart <[email protected]>
2026-03-12 21:31 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
@ 2026-03-15 21:11 ` Jeff Davis <[email protected]>
2026-03-16 16:23 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
0 siblings, 1 reply; 24+ messages in thread
From: Jeff Davis @ 2026-03-15 21:11 UTC (permalink / raw)
To: Greg Burd <[email protected]>; Nathan Bossart <[email protected]>; +Cc: pgsql-hackers
On Thu, 2026-03-12 at 17:31 -0400, Greg Burd wrote:
> Other than the heap_modify_tuple() calls I don't know of something
> that allows for direct changes but that doesn't matter, 0002 will
> scan and pick up those attributes even if we introduce a new
> modification path in the future (as intended).
Why do extra work in ExecBRUpdateTriggers() to eliminate the false
negative case if we don't rely on it anyway? If we do need to rely on
it in subsequent patches, then we need to be sure, right?
I guess I'm confused about whether 0002 is introducing a new guarantee
or if it's just a convenient place to eliminate one source of false
negatives.
Regards,
Jeff Davis
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: Expanding HOT updates for expression and partial indexes
2026-02-16 19:36 Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-02-17 21:15 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-19 20:43 ` Re: Expanding HOT updates for expression and partial indexes Andres Freund <[email protected]>
2026-02-19 22:31 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-23 19:23 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-25 21:03 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-02-26 22:08 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-26 23:01 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-02 19:08 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-11 15:51 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-12 20:33 ` Re: Expanding HOT updates for expression and partial indexes Nathan Bossart <[email protected]>
2026-03-12 21:31 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-15 21:11 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
@ 2026-03-16 16:23 ` Greg Burd <[email protected]>
2026-03-16 17:29 ` Re: Expanding HOT updates for expression and partial indexes Nathan Bossart <[email protected]>
2026-03-16 17:55 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
0 siblings, 2 replies; 24+ messages in thread
From: Greg Burd @ 2026-03-16 16:23 UTC (permalink / raw)
To: Jeff Davis <[email protected]>; Nathan Bossart <[email protected]>; +Cc: pgsql-hackers
On Sun, Mar 15, 2026, at 5:11 PM, Jeff Davis wrote:
> On Thu, 2026-03-12 at 17:31 -0400, Greg Burd wrote:
>> Other than the heap_modify_tuple() calls I don't know of something
>> that allows for direct changes but that doesn't matter, 0002 will
>> scan and pick up those attributes even if we introduce a new
>> modification path in the future (as intended).
Hello Jeff, thanks for taking a look! :)
> Why do extra work in ExecBRUpdateTriggers() to eliminate the false
> negative case if we don't rely on it anyway? If we do need to rely on
> it in subsequent patches, then we need to be sure, right?
Later commits do currently rely on it, ExecUpdateModifiedIdxAttrs() uses it in the next commit (0003) to avoid reviewing indexed attributes that could not have possibly changed. Imagine a table with a lot of indexes where updates only modify one or two at a time. Why are we testing indexed attributes for changes in HeapDeterminColumnsInfo() that couldn't have changed? The answer is that a) HeapDeterminColumnsInfo() lives in heap, not the executor (see patch 0003) so it has no ability to call ExecGetAllUpdatedCols(), and b) the set returned by ExecGetAllUpdatedCols() is sometimes incomplete.
I see (a) as something I fix in patch 0003 and (b) as an oversight (or bug). I'll also argue that the overhead of checking for additional attributes in ExecBRUpdateTriggers() vs the overhead of checking all indexed attributes in HeapDeterminColumnsInfo() is net zero once patch 0003 is applied.
The argument to keep 0002 is both performance as much as correctness. After 0002 and 0003 ExecUpdateModifiedIdxAttrs() replaces HeapDeterminColumnsInfo() and doesn't have to scan all indexed attributes anymore. Relations with lots of indexed attributes but update patterns that only focus on subsets of those attributes will benefit as there will be fewer memcmp() calls when comparing datums.
What do we "need to be sure" of? That ExecGetAllUpdatedCols() not really contains all attributes that its name implies? I think it now does that after 0002, do you disagree?
> I guess I'm confused about whether 0002 is introducing a new guarantee
> or if it's just a convenient place to eliminate one source of false
> negatives.
I think it is a new guarantee that was implied before now but not required until 0003. I think this change reduces overhead and helps to avoid some future security feature that depends on ExecGetAllUpdatedCols() to provide that guarantee.
Does that make sense?
> Regards,
> Jeff Davis
best.
-greg
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: Expanding HOT updates for expression and partial indexes
2026-02-16 19:36 Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-02-17 21:15 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-19 20:43 ` Re: Expanding HOT updates for expression and partial indexes Andres Freund <[email protected]>
2026-02-19 22:31 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-23 19:23 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-25 21:03 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-02-26 22:08 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-26 23:01 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-02 19:08 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-11 15:51 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-12 20:33 ` Re: Expanding HOT updates for expression and partial indexes Nathan Bossart <[email protected]>
2026-03-12 21:31 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-15 21:11 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-03-16 16:23 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
@ 2026-03-16 17:29 ` Nathan Bossart <[email protected]>
1 sibling, 0 replies; 24+ messages in thread
From: Nathan Bossart @ 2026-03-16 17:29 UTC (permalink / raw)
To: Greg Burd <[email protected]>; +Cc: Jeff Davis <[email protected]>; pgsql-hackers
On Mon, Mar 16, 2026 at 12:23:04PM -0400, Greg Burd wrote:
> On Sun, Mar 15, 2026, at 5:11 PM, Jeff Davis wrote:
>> Why do extra work in ExecBRUpdateTriggers() to eliminate the false
>> negative case if we don't rely on it anyway? If we do need to rely on
>> it in subsequent patches, then we need to be sure, right?
>
> [...]
>
> What do we "need to be sure" of? That ExecGetAllUpdatedCols() not really
> contains all attributes that its name implies? I think it now does that
> after 0002, do you disagree?
I'm admittedly still digging into the details, but the main question on my
mind is whether there are other cases lurking that our in-tree tests aren't
catching or that only exist in extensions. Will there be some sort of
check or assertion to catch those?
--
nathan
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: Expanding HOT updates for expression and partial indexes
2026-02-16 19:36 Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-02-17 21:15 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-19 20:43 ` Re: Expanding HOT updates for expression and partial indexes Andres Freund <[email protected]>
2026-02-19 22:31 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-23 19:23 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-25 21:03 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-02-26 22:08 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-26 23:01 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-02 19:08 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-11 15:51 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-12 20:33 ` Re: Expanding HOT updates for expression and partial indexes Nathan Bossart <[email protected]>
2026-03-12 21:31 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-15 21:11 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-03-16 16:23 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
@ 2026-03-16 17:55 ` Jeff Davis <[email protected]>
2026-03-17 15:22 ` Re: Expanding HOT updates for expression and partial indexes Nathan Bossart <[email protected]>
2026-03-17 16:38 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
1 sibling, 2 replies; 24+ messages in thread
From: Jeff Davis @ 2026-03-16 17:55 UTC (permalink / raw)
To: Greg Burd <[email protected]>; Nathan Bossart <[email protected]>; +Cc: pgsql-hackers
On Mon, 2026-03-16 at 12:23 -0400, Greg Burd wrote:
> Hello Jeff, thanks for taking a look! :)
Hi, thank you for working on this problem!
> > Why do extra work in ExecBRUpdateTriggers() to eliminate the false
> > negative case if we don't rely on it anyway? If we do need to rely
> > on
> > it in subsequent patches, then we need to be sure, right?
>
> Later commits do currently rely on it, ExecUpdateModifiedIdxAttrs()
> uses it in the next commit (0003) to avoid reviewing indexed
> attributes that could not have possibly changed.
OK. The first half of the commit message for 0002 is slightly confusing
because it's referring to pre-existing behavior, behavior changed by
the commit, and also future work. It might help to clarify the tenses
like:
- Previously, ExecGetAllUpdatedCols() had gaps ..., but not a real bug
because ...
- This commit closes those gaps by updating ri_extraUpdatedCols in
ExecBRUpdateTriggers(), making ExecGetAllUpdatedCols() reliable.
- We know there are no other gaps because ...
- Useful to fix because later work will rely on it for [very brief
reason]
> Imagine a table with a lot of indexes where updates only modify one
> or two at a time. Why are we testing indexed attributes for changes
> in HeapDeterminColumnsInfo() that couldn't have changed? The answer
> is that a) HeapDeterminColumnsInfo() lives in heap, not the executor
> (see patch 0003) so it has no ability to call
> ExecGetAllUpdatedCols(), and b) the set returned by
> ExecGetAllUpdatedCols() is sometimes incomplete.
That's helpful, thank you.
> What do we "need to be sure" of? That ExecGetAllUpdatedCols() not
> really contains all attributes that its name implies? I think it now
> does that after 0002, do you disagree?
I don't disagree, but I think we need some kind statement that we
believe that it's true, and a brief explanation why. (I don't have much
of an opinion about whether it's in this thread, the commit message, or
the code.)
>
> I think it is a new guarantee that was implied before now but not
> required until 0003. I think this change reduces overhead and helps
> to avoid some future security feature that depends on
> ExecGetAllUpdatedCols() to provide that guarantee.
>
> Does that make sense?
A subtlety here is that perhaps ExecGetAllUpdatedCols() already *was*
correct, and it just meant something different than we thought: the
*targeted* columns of an update, instead of the actually-updated
values.
If so we should think about whether that distinction should be
preserved. For instance, column filtering for triggers should be based
on the targeted columns (rather than actually-updated values) because,
semantically, it should still fire even for a no-op update. Perhaps
similar for choosing the lock mode?
>
Regards,
Jeff Davis
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: Expanding HOT updates for expression and partial indexes
2026-02-16 19:36 Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-02-17 21:15 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-19 20:43 ` Re: Expanding HOT updates for expression and partial indexes Andres Freund <[email protected]>
2026-02-19 22:31 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-23 19:23 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-25 21:03 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-02-26 22:08 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-26 23:01 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-02 19:08 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-11 15:51 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-12 20:33 ` Re: Expanding HOT updates for expression and partial indexes Nathan Bossart <[email protected]>
2026-03-12 21:31 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-15 21:11 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-03-16 16:23 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-16 17:55 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
@ 2026-03-17 15:22 ` Nathan Bossart <[email protected]>
1 sibling, 0 replies; 24+ messages in thread
From: Nathan Bossart @ 2026-03-17 15:22 UTC (permalink / raw)
To: Greg Burd <[email protected]>; +Cc: Jeff Davis <[email protected]>; pgsql-hackers
Catching up here. I see that you dropped 0002. Can you explain why that's
no longer needed?
On Mon, Mar 16, 2026 at 04:51:31PM -0400, Greg Burd wrote:
> Refactor executor update logic to determine which indexed columns have
> actually changed during an UPDATE operation rather than leaving this up
> to HeapDetermineColumnsInfo() in heap_update(). Finding this set of
> attributes is not heap-specific, but more general to all table AMs and
> having this information in the executor could inform other decisions
> about when index inserts are required and when they are not regardless
> of the table AM's MVCC implementation strategy.
Nice, this is a crisp motivation statement.
> Development of this feature exposed nondeterministic behavior in three
> existing tests which have been adjusted to avoid inconsistent test
> results due to tuple ordering during heap page scans.
Logistically speaking, these could be nice to get out of the way early as a
prerequisite patch so we can focus on the substance of this patch.
The rest of my comments are from a relatively quick skim. Deeper review to
follow...
> + /*
> + * Reduce the set under review to only the unmodified indexed replica
> + * identity key attributes. idx_attrs is copied (by bms_difference())
> + * not modified here.
> + */
> + attrs = bms_difference(idx_attrs, modified_idx_attrs);
> + attrs = bms_int_members(attrs, rid_attrs);
> +
> + while ((attidx = bms_next_member(attrs, attidx)) >= 0)
Could it be worth moving this loop (and some surrounding code) to a helper
function?
> - * Note: beyond this point, use oldtup not otid to refer to old tuple.
> + * NOTE: beyond this point, use oldtup not otid to refer to old tuple.
nitpick: Please remove unnecessary changes.
> @@ -5269,10 +5269,10 @@ RelationGetIndexPredicate(Relation relation)
> * in expressions (i.e., usable for FKs)
> * INDEX_ATTR_BITMAP_PRIMARY_KEY Columns in the table's primary key
> * (beware: even if PK is deferrable!)
> + * INDEX_ATTR_BITMAP_SUMMARIZED Columns only included in summarizing indexes
> * INDEX_ATTR_BITMAP_IDENTITY_KEY Columns in the table's replica identity
> * index (empty if FULL)
> - * INDEX_ATTR_BITMAP_HOT_BLOCKING Columns that block updates from being HOT
> - * INDEX_ATTR_BITMAP_SUMMARIZED Columns included in summarizing indexes
> + * INDEX_ATTR_BITMAP_INDEXED Columns referenced by indexes
Is the meaning of INDEX_ATTR_BITMAP_SUMMARIZED changing in this patch? I
see you moved it and dropped the "only".
> - Bitmapset *summarizedattrs; /* columns with summarizing indexes */
> + Bitmapset *indexedattrs; /* columns referenced by indexes */
> + Bitmapset *summarizedattrs; /* columns only in summarizing indexes */
But you added an "only" here...
--
nathan
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: Expanding HOT updates for expression and partial indexes
2026-02-16 19:36 Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-02-17 21:15 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-19 20:43 ` Re: Expanding HOT updates for expression and partial indexes Andres Freund <[email protected]>
2026-02-19 22:31 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-23 19:23 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-25 21:03 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-02-26 22:08 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-26 23:01 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-02 19:08 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-11 15:51 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-12 20:33 ` Re: Expanding HOT updates for expression and partial indexes Nathan Bossart <[email protected]>
2026-03-12 21:31 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-15 21:11 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-03-16 16:23 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-16 17:55 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
@ 2026-03-17 16:38 ` Jeff Davis <[email protected]>
2026-03-17 18:04 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
1 sibling, 1 reply; 24+ messages in thread
From: Jeff Davis @ 2026-03-17 16:38 UTC (permalink / raw)
To: Greg Burd <[email protected]>; Nathan Bossart <[email protected]>; +Cc: pgsql-hackers
On Mon, 2026-03-16 at 16:51 -0400, Greg Burd wrote:
> > Also, the "actually changed values" is only valid for a single
> > tuple,
> > and it would be good to clarify that and make sure there's not a
> > lot of
> > room for confusion there.
>
> Yes, that's true... too much confusion and not enough juice for the
> squeeze. I'm dropping that.
That is an interesting case you found in that the columns targeted by
an update are not a superset of the columns with actually changed
values. But I'm not sure exactly what to make of that fact, and if it's
not important for your other changes then I agree that we should drop
it.
However, it might be good to comment somewhere that your changes (which
are based on values in specific tuples) cannot rely on
ExecGetAllUpdatedCols(), to avoid confusion in the future.
Regards,
Jeff Davis
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: Expanding HOT updates for expression and partial indexes
2026-02-16 19:36 Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-02-17 21:15 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-19 20:43 ` Re: Expanding HOT updates for expression and partial indexes Andres Freund <[email protected]>
2026-02-19 22:31 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-23 19:23 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-25 21:03 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-02-26 22:08 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-26 23:01 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-02 19:08 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-11 15:51 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-12 20:33 ` Re: Expanding HOT updates for expression and partial indexes Nathan Bossart <[email protected]>
2026-03-12 21:31 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-15 21:11 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-03-16 16:23 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-16 17:55 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-03-17 16:38 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
@ 2026-03-17 18:04 ` Greg Burd <[email protected]>
2026-03-23 18:39 ` Re: Expanding HOT updates for expression and partial indexes Nathan Bossart <[email protected]>
0 siblings, 1 reply; 24+ messages in thread
From: Greg Burd @ 2026-03-17 18:04 UTC (permalink / raw)
To: Jeff Davis <[email protected]>; Nathan Bossart <[email protected]>; +Cc: pgsql-hackers
On Tue, Mar 17, 2026, at 12:38 PM, Jeff Davis wrote:
> On Mon, 2026-03-16 at 16:51 -0400, Greg Burd wrote:
>> > Also, the "actually changed values" is only valid for a single
>> > tuple,
>> > and it would be good to clarify that and make sure there's not a
>> > lot of
>> > room for confusion there.
>>
>> Yes, that's true... too much confusion and not enough juice for the
>> squeeze. I'm dropping that.
>
> That is an interesting case you found in that the columns targeted by
> an update are not a superset of the columns with actually changed
> values. But I'm not sure exactly what to make of that fact, and if it's
> not important for your other changes then I agree that we should drop
> it.
>
> However, it might be good to comment somewhere that your changes (which
> are based on values in specific tuples) cannot rely on
> ExecGetAllUpdatedCols(), to avoid confusion in the future.
Fair point, I'll do that.
> Regards,
> Jeff Davis
v37 attached with changes you and Nathan asked for so far. More please! :)
thanks Jeff and Nathan!
best.
-greg
Attachments:
[text/x-patch] v37-0001-Add-tests-to-cover-a-variety-of-heap-HOT-update-.patch (45.3K, 2-v37-0001-Add-tests-to-cover-a-variety-of-heap-HOT-update-.patch)
download | inline diff:
From 6553faa775465e0d525450b6b3cf84f95a02e033 Mon Sep 17 00:00:00 2001
From: Greg Burd <[email protected]>
Date: Tue, 10 Mar 2026 09:28:15 -0400
Subject: [PATCH v37 1/4] Add tests to cover a variety of heap HOT update
behaviors
This commit introduces test infrastructure for verifying Heap-Only Tuple
(HOT) update functionality in PostgreSQL. It provides a baseline for
demonstrating and validating HOT update behavior.
Regression tests:
- Basic HOT vs non-HOT update decisions
- All-or-none property for multiple indexes
- Partial indexes and predicate handling
- BRIN (summarizing) indexes allowing HOT updates
- TOAST column handling with HOT
- Unique constraints behavior
- Multi-column indexes
- Partitioned table HOT updates
Isolation tests:
- HOT chain formation and maintenance
- Concurrent HOT update scenarios
- Index scan behavior with HOT chains
---
src/test/regress/expected/hot_updates.out | 745 ++++++++++++++++++++++
src/test/regress/parallel_schedule | 5 +
src/test/regress/sql/hot_updates.sql | 605 ++++++++++++++++++
3 files changed, 1355 insertions(+)
create mode 100644 src/test/regress/expected/hot_updates.out
create mode 100644 src/test/regress/sql/hot_updates.sql
diff --git a/src/test/regress/expected/hot_updates.out b/src/test/regress/expected/hot_updates.out
new file mode 100644
index 00000000000..273fe3310da
--- /dev/null
+++ b/src/test/regress/expected/hot_updates.out
@@ -0,0 +1,745 @@
+--
+-- HOT_UPDATES
+-- Test Heap-Only Tuple (HOT) update decisions
+--
+-- This test systematically verifies that HOT updates are used when appropriate
+-- and avoided when necessary (e.g., when indexed columns are modified).
+--
+-- We use multiple validation methods:
+-- 1. Statistics functions (pg_stat_get_tuples_hot_updated)
+-- 2. pageinspect extension for HOT chain examination
+-- 3. EXPLAIN to verify index usage after updates
+--
+-- Load required extensions
+CREATE EXTENSION IF NOT EXISTS pageinspect;
+-- Function to get HOT update count
+CREATE OR REPLACE FUNCTION get_hot_count(rel_name text)
+RETURNS TABLE (
+ updates BIGINT,
+ hot BIGINT
+) AS $$
+DECLARE
+ rel_oid oid;
+BEGIN
+ rel_oid := rel_name::regclass::oid;
+
+ -- Read both committed and transaction-local stats
+ -- In autocommit mode (default for regression tests), this works correctly
+ -- Note: In explicit transactions (BEGIN/COMMIT), committed stats already
+ -- include flushed updates, so this would double-count. For explicit
+ -- transaction testing, call pg_stat_force_next_flush() before this function.
+ updates := COALESCE(pg_stat_get_tuples_updated(rel_oid), 0) +
+ COALESCE(pg_stat_get_xact_tuples_updated(rel_oid), 0);
+ hot := COALESCE(pg_stat_get_tuples_hot_updated(rel_oid), 0) +
+ COALESCE(pg_stat_get_xact_tuples_hot_updated(rel_oid), 0);
+
+ RETURN NEXT;
+END;
+$$ LANGUAGE plpgsql;
+-- Check if a tuple is part of a HOT chain (has a predecessor on same page)
+CREATE OR REPLACE FUNCTION has_hot_chain(rel_name text, target_ctid tid)
+RETURNS boolean AS $$
+DECLARE
+ block_num int;
+ page_item record;
+BEGIN
+ block_num := (target_ctid::text::point)[0]::int;
+
+ -- Look for a different tuple on the same page that points to our target tuple
+ FOR page_item IN
+ SELECT lp, lp_flags, t_ctid
+ FROM heap_page_items(get_raw_page(rel_name, block_num))
+ WHERE lp_flags = 1
+ AND t_ctid IS NOT NULL
+ AND t_ctid = target_ctid
+ AND ('(' || block_num::text || ',' || lp::text || ')')::tid != target_ctid
+ LOOP
+ RETURN true;
+ END LOOP;
+
+ RETURN false;
+END;
+$$ LANGUAGE plpgsql;
+-- Print the HOT chain starting from a given tuple
+CREATE OR REPLACE FUNCTION print_hot_chain(rel_name text, start_ctid tid)
+RETURNS TABLE(chain_position int, ctid tid, lp_flags text, t_ctid tid, chain_end boolean) AS
+$$
+#variable_conflict use_column
+DECLARE
+ block_num int;
+ line_ptr int;
+ current_ctid tid := start_ctid;
+ next_ctid tid;
+ position int := 0;
+ max_iterations int := 100;
+ page_item record;
+ found_predecessor boolean := false;
+ flags_name text;
+BEGIN
+ block_num := (start_ctid::text::point)[0]::int;
+
+ -- Find the predecessor (old tuple pointing to our start_ctid)
+ FOR page_item IN
+ SELECT lp, lp_flags, t_ctid
+ FROM heap_page_items(get_raw_page(rel_name, block_num))
+ WHERE lp_flags = 1
+ AND t_ctid = start_ctid
+ LOOP
+ current_ctid := ('(' || block_num::text || ',' || page_item.lp::text || ')')::tid;
+ found_predecessor := true;
+ EXIT;
+ END LOOP;
+
+ -- If no predecessor found, start with the given ctid
+ IF NOT found_predecessor THEN
+ current_ctid := start_ctid;
+ END IF;
+
+ -- Follow the chain forward
+ WHILE position < max_iterations LOOP
+ line_ptr := (current_ctid::text::point)[1]::int;
+
+ FOR page_item IN
+ SELECT lp, lp_flags, t_ctid
+ FROM heap_page_items(get_raw_page(rel_name, block_num))
+ WHERE lp = line_ptr
+ LOOP
+ -- Map lp_flags to names
+ flags_name := CASE page_item.lp_flags
+ WHEN 0 THEN 'unused (0)'
+ WHEN 1 THEN 'normal (1)'
+ WHEN 2 THEN 'redirect (2)'
+ WHEN 3 THEN 'dead (3)'
+ ELSE 'unknown (' || page_item.lp_flags::text || ')'
+ END;
+
+ RETURN QUERY SELECT
+ position,
+ current_ctid,
+ flags_name,
+ page_item.t_ctid,
+ (page_item.t_ctid IS NULL OR page_item.t_ctid = current_ctid)::boolean
+ ;
+
+ IF page_item.t_ctid IS NULL OR page_item.t_ctid = current_ctid THEN
+ RETURN;
+ END IF;
+
+ next_ctid := page_item.t_ctid;
+
+ IF (next_ctid::text::point)[0]::int != block_num THEN
+ RETURN;
+ END IF;
+
+ current_ctid := next_ctid;
+ position := position + 1;
+ END LOOP;
+
+ IF position = 0 THEN
+ RETURN;
+ END IF;
+ END LOOP;
+END;
+$$ LANGUAGE plpgsql;
+-- Basic HOT update (update non-indexed column)
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ indexed_col int,
+ non_indexed_col text
+) WITH (fillfactor = 50);
+CREATE INDEX hot_test_indexed_idx ON hot_test(indexed_col);
+INSERT INTO hot_test VALUES (1, 100, 'initial');
+INSERT INTO hot_test VALUES (2, 200, 'initial');
+INSERT INTO hot_test VALUES (3, 300, 'initial');
+-- Get baseline
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 0 | 0
+(1 row)
+
+-- Should be HOT updates (only non-indexed column modified)
+UPDATE hot_test SET non_indexed_col = 'updated1' WHERE id = 1;
+UPDATE hot_test SET non_indexed_col = 'updated2' WHERE id = 2;
+UPDATE hot_test SET non_indexed_col = 'updated3' WHERE id = 3;
+-- Verify HOT updates occurred
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 3 | 3
+(1 row)
+
+-- Dump the HOT chain before VACUUMing
+WITH current_tuple AS (
+ SELECT ctid FROM hot_test WHERE id = 1
+)
+SELECT
+ has_hot_chain('hot_test', current_tuple.ctid) AS has_chain,
+ chain_position,
+ print_hot_chain.ctid,
+ lp_flags,
+ t_ctid
+FROM current_tuple,
+LATERAL print_hot_chain('hot_test', current_tuple.ctid);
+ has_chain | chain_position | ctid | lp_flags | t_ctid
+-----------+----------------+-------+------------+--------
+ t | 0 | (0,1) | normal (1) | (0,4)
+ t | 1 | (0,4) | normal (1) | (0,4)
+(2 rows)
+
+-- Vacuum the relation, expect the HOT chain to collapse
+VACUUM hot_test;
+-- Show that there is no chain after vacuum
+WITH current_tuple AS (
+ SELECT ctid FROM hot_test WHERE id = 1
+)
+SELECT
+ has_hot_chain('hot_test', current_tuple.ctid) AS has_chain,
+ chain_position,
+ print_hot_chain.ctid,
+ lp_flags,
+ t_ctid
+FROM current_tuple,
+LATERAL print_hot_chain('hot_test', current_tuple.ctid);
+ has_chain | chain_position | ctid | lp_flags | t_ctid
+-----------+----------------+-------+------------+--------
+ f | 0 | (0,4) | normal (1) | (0,4)
+(1 row)
+
+-- Non-HOT update (update indexed column)
+UPDATE hot_test SET indexed_col = 150 WHERE id = 1;
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 4 | 3
+(1 row)
+
+-- Verify index was updated (new value findable)
+SET enable_seqscan = off;
+EXPLAIN (COSTS OFF) SELECT id, indexed_col FROM hot_test WHERE indexed_col = 150;
+ QUERY PLAN
+---------------------------------------------------
+ Index Scan using hot_test_indexed_idx on hot_test
+ Index Cond: (indexed_col = 150)
+(2 rows)
+
+SELECT id, indexed_col FROM hot_test WHERE indexed_col = 150;
+ id | indexed_col
+----+-------------
+ 1 | 150
+(1 row)
+
+-- Verify old value no longer in index
+EXPLAIN (COSTS OFF) SELECT id FROM hot_test WHERE indexed_col = 100;
+ QUERY PLAN
+---------------------------------------------------
+ Index Scan using hot_test_indexed_idx on hot_test
+ Index Cond: (indexed_col = 100)
+(2 rows)
+
+SELECT id FROM hot_test WHERE indexed_col = 100;
+ id
+----
+(0 rows)
+
+RESET enable_seqscan;
+-- All-or-none property: updating one indexed column requires ALL index updates
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ col_a int,
+ col_b int,
+ col_c int,
+ non_indexed text
+) WITH (fillfactor = 50);
+CREATE INDEX hot_test_a_idx ON hot_test(col_a);
+CREATE INDEX hot_test_b_idx ON hot_test(col_b);
+CREATE INDEX hot_test_c_idx ON hot_test(col_c);
+INSERT INTO hot_test VALUES (1, 10, 20, 30, 'initial');
+-- Update only col_a - should NOT be HOT because an indexed column changed
+-- This means ALL indexes must be updated (all-or-none property)
+UPDATE hot_test SET col_a = 15 WHERE id = 1;
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 1 | 0
+(1 row)
+
+-- Now update only non-indexed column - should be HOT
+UPDATE hot_test SET non_indexed = 'updated';
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 2 | 1
+(1 row)
+
+-- Partial index: both old and new outside predicate (conservative = non-HOT)
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ status text,
+ data text
+) WITH (fillfactor = 50);
+-- Partial index only covers status = 'active'
+CREATE INDEX hot_test_active_idx ON hot_test(status) WHERE status = 'active';
+INSERT INTO hot_test VALUES (1, 'active', 'data1');
+INSERT INTO hot_test VALUES (2, 'inactive', 'data2');
+INSERT INTO hot_test VALUES (3, 'deleted', 'data3');
+-- Update non-indexed column on 'active' row (in predicate, status unchanged)
+-- Should be HOT
+UPDATE hot_test SET data = 'updated1' WHERE id = 1;
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 1 | 1
+(1 row)
+
+-- Update non-indexed column on 'inactive' row (outside predicate)
+-- Should be HOT
+UPDATE hot_test SET data = 'updated2' WHERE id = 2;
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 2 | 2
+(1 row)
+
+-- Update status from 'inactive' to 'deleted' (both outside predicate)
+-- PostgreSQL is conservative: heap insert happens before predicate check
+-- So this is NON-HOT even though both values are outside predicate
+UPDATE hot_test SET status = 'deleted' WHERE id = 2;
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 3 | 2
+(1 row)
+
+-- Verify index still works for 'active' rows
+SELECT id, status FROM hot_test WHERE status = 'active';
+ id | status
+----+--------
+ 1 | active
+(1 row)
+
+-- Only BRIN (summarizing) indexes on non-PK columns
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ ts timestamp,
+ value int,
+ brin_col int
+) WITH (fillfactor = 50);
+CREATE INDEX hot_test_ts_brin ON hot_test USING brin(ts);
+CREATE INDEX hot_test_brin_col_brin ON hot_test USING brin(brin_col);
+INSERT INTO hot_test VALUES (1, '2024-01-01', 100, 1000);
+-- Update both BRIN columns - should still be HOT (only summarizing indexes)
+UPDATE hot_test SET ts = '2024-01-02', brin_col = 2000 WHERE id = 1;
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 1 | 1
+(1 row)
+
+-- Update non-indexed column - should also be HOT
+UPDATE hot_test SET value = 200 WHERE id = 1;
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 2 | 2
+(1 row)
+
+-- TOAST and HOT: TOASTed columns can participate in HOT
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ indexed_col int,
+ large_text text,
+ small_text text
+) WITH (fillfactor = 50);
+CREATE INDEX hot_test_idx ON hot_test(indexed_col);
+-- Insert row with TOASTed column (> 2KB)
+INSERT INTO hot_test VALUES (1, 100, repeat('x', 3000), 'small');
+-- Update non-indexed, non-TOASTed column - should be HOT
+UPDATE hot_test SET small_text = 'updated';
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 1 | 1
+(1 row)
+
+-- Update TOASTed column - should be HOT if indexed column unchanged
+UPDATE hot_test SET large_text = repeat('y', 3000);
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 2 | 2
+(1 row)
+
+-- Update indexed column - should NOT be HOT
+UPDATE hot_test SET indexed_col = 200;
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 3 | 2
+(1 row)
+
+-- Unique constraint (unique index) behaves like regular index
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ unique_col int UNIQUE,
+ data text
+) WITH (fillfactor = 50);
+INSERT INTO hot_test VALUES (1, 100, 'data1');
+INSERT INTO hot_test VALUES (2, 200, 'data2');
+-- Update data (non-indexed) - should be HOT
+UPDATE hot_test SET data = 'updated';
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 2 | 2
+(1 row)
+
+-- Verify unique constraint still enforced
+SELECT id, unique_col, data FROM hot_test ORDER BY id;
+ id | unique_col | data
+----+------------+---------
+ 1 | 100 | updated
+ 2 | 200 | updated
+(2 rows)
+
+-- This should fail (unique violation)
+UPDATE hot_test SET unique_col = 100 WHERE id = 2;
+ERROR: duplicate key value violates unique constraint "hot_test_unique_col_key"
+DETAIL: Key (unique_col)=(100) already exists.
+-- Multi-column index: any column change = non-HOT
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ col_a int,
+ col_b int,
+ col_c int,
+ data text
+) WITH (fillfactor = 50);
+CREATE INDEX hot_test_ab_idx ON hot_test(col_a, col_b);
+INSERT INTO hot_test VALUES (1, 10, 20, 30, 'data');
+-- Update col_a (part of multi-column index) - should NOT be HOT
+UPDATE hot_test SET col_a = 15;
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 1 | 0
+(1 row)
+
+-- Reset
+UPDATE hot_test SET col_a = 10;
+-- Update col_b (part of multi-column index) - should NOT be HOT
+UPDATE hot_test SET col_b = 25;
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 3 | 0
+(1 row)
+
+-- Reset
+UPDATE hot_test SET col_b = 20;
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 4 | 0
+(1 row)
+
+-- Update col_c (not indexed) - should be HOT
+UPDATE hot_test SET col_c = 35;
+-- Update data (not indexed) - should be HOT
+UPDATE hot_test SET data = 'updated';
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 6 | 2
+(1 row)
+
+-- Partitioned tables: HOT works within partitions
+DROP TABLE IF EXISTS hot_test_partitioned CASCADE;
+NOTICE: table "hot_test_partitioned" does not exist, skipping
+CREATE TABLE hot_test_partitioned (
+ id int,
+ partition_key int,
+ indexed_col int,
+ data text,
+ PRIMARY KEY (id, partition_key)
+) PARTITION BY RANGE (partition_key);
+CREATE TABLE hot_test_part1 PARTITION OF hot_test_partitioned
+ FOR VALUES FROM (1) TO (100) WITH (fillfactor = 50);
+CREATE TABLE hot_test_part2 PARTITION OF hot_test_partitioned
+ FOR VALUES FROM (100) TO (200) WITH (fillfactor = 50);
+CREATE INDEX hot_test_part_idx ON hot_test_partitioned(indexed_col);
+INSERT INTO hot_test_partitioned VALUES (1, 50, 100, 'initial1');
+INSERT INTO hot_test_partitioned VALUES (2, 150, 200, 'initial2');
+-- Update in partition 1 (non-indexed column) - should be HOT
+UPDATE hot_test_partitioned SET data = 'updated1' WHERE id = 1;
+-- Update in partition 2 (non-indexed column) - should be HOT
+UPDATE hot_test_partitioned SET data = 'updated2' WHERE id = 2;
+SELECT * FROM get_hot_count('hot_test_part1');
+ updates | hot
+---------+-----
+ 1 | 1
+(1 row)
+
+SELECT * FROM get_hot_count('hot_test_part2');
+ updates | hot
+---------+-----
+ 1 | 1
+(1 row)
+
+-- Verify indexes work on partitions
+SELECT id FROM hot_test_partitioned WHERE indexed_col = 100;
+ id
+----
+ 1
+(1 row)
+
+SELECT id FROM hot_test_partitioned WHERE indexed_col = 200;
+ id
+----
+ 2
+(1 row)
+
+-- Update indexed column in partition - should NOT be HOT
+UPDATE hot_test_partitioned SET indexed_col = 150 WHERE id = 1;
+SELECT * FROM get_hot_count('hot_test_part1');
+ updates | hot
+---------+-----
+ 2 | 1
+(1 row)
+
+-- Verify index was updated
+SELECT id FROM hot_test_partitioned WHERE indexed_col = 150;
+ id
+----
+ 1
+(1 row)
+
+-- ============================================================================
+-- Trigger modifications: heap_modify_tuple() and HOT
+-- ============================================================================
+-- Test that we correctly detect when triggers modify indexed columns via
+-- heap_modify_tuple(), even when those columns aren't in the UPDATE's SET clause
+CREATE TABLE hot_trigger_test (
+ id int PRIMARY KEY,
+ triggered_col int,
+ data text
+) WITH (fillfactor = 50);
+CREATE INDEX hot_trigger_idx ON hot_trigger_test(triggered_col);
+-- Create a trigger that modifies an indexed column
+CREATE OR REPLACE FUNCTION modify_triggered_col()
+RETURNS TRIGGER AS $$
+BEGIN
+ NEW.triggered_col = NEW.triggered_col + 1;
+ RETURN NEW;
+END;
+$$ LANGUAGE plpgsql;
+CREATE TRIGGER before_update_modify
+ BEFORE UPDATE ON hot_trigger_test
+ FOR EACH ROW
+ EXECUTE FUNCTION modify_triggered_col();
+INSERT INTO hot_trigger_test VALUES (1, 100, 'initial');
+SELECT * FROM get_hot_count('hot_trigger_test');
+ updates | hot
+---------+-----
+ 0 | 0
+(1 row)
+
+-- Update only data column, but trigger modifies indexed column
+-- Should NOT be HOT because trigger modified an indexed column
+UPDATE hot_trigger_test SET data = 'updated' WHERE id = 1;
+-- Verify it was NOT a HOT update (indexed column was modified by trigger)
+SELECT * FROM get_hot_count('hot_trigger_test');
+ updates | hot
+---------+-----
+ 1 | 0
+(1 row)
+
+-- Verify the triggered column was actually modified
+SELECT triggered_col FROM hot_trigger_test WHERE id = 1;
+ triggered_col
+---------------
+ 101
+(1 row)
+
+DROP TABLE hot_trigger_test CASCADE;
+DROP FUNCTION modify_triggered_col();
+-- ============================================================================
+-- JSONB expression indexes and sub-attribute tracking
+-- ============================================================================
+-- Test that updates to non-indexed JSONB paths can be HOT updates
+CREATE TABLE hot_jsonb_test (
+ id int PRIMARY KEY,
+ data jsonb
+) WITH (fillfactor = 50);
+-- Create expression index on a specific JSON path
+CREATE INDEX hot_jsonb_name_idx ON hot_jsonb_test ((data->>'name'));
+INSERT INTO hot_jsonb_test VALUES
+ (1, '{"name":"Alice","age":30,"city":"NYC"}'),
+ (2, '{"name":"Bob","age":25,"city":"LA"}');
+SELECT * FROM get_hot_count('hot_jsonb_test');
+ updates | hot
+---------+-----
+ 0 | 0
+(1 row)
+
+-- Update non-indexed JSON path (age) - should be HOT after instrumentation
+UPDATE hot_jsonb_test SET data = jsonb_set(data, '{age}', '31') WHERE id = 1;
+SELECT * FROM get_hot_count('hot_jsonb_test');
+ updates | hot
+---------+-----
+ 1 | 0
+(1 row)
+
+-- Update indexed JSON path (name) - should NOT be HOT
+UPDATE hot_jsonb_test SET data = jsonb_set(data, '{name}', '"Alice2"') WHERE id = 1;
+SELECT * FROM get_hot_count('hot_jsonb_test');
+ updates | hot
+---------+-----
+ 2 | 0
+(1 row)
+
+-- Verify index works
+SELECT id FROM hot_jsonb_test WHERE data->>'name' = 'Alice2';
+ id
+----
+ 1
+(1 row)
+
+-- Test jsonb_delete on non-indexed path - should be HOT after instrumentation
+UPDATE hot_jsonb_test SET data = data - 'city' WHERE id = 2;
+SELECT * FROM get_hot_count('hot_jsonb_test');
+ updates | hot
+---------+-----
+ 3 | 0
+(1 row)
+
+-- Test jsonb_insert on non-indexed path - should be HOT after instrumentation
+UPDATE hot_jsonb_test SET data = jsonb_insert(data, '{country}', '"USA"') WHERE id = 2;
+SELECT * FROM get_hot_count('hot_jsonb_test');
+ updates | hot
+---------+-----
+ 4 | 0
+(1 row)
+
+DROP TABLE hot_jsonb_test;
+-- ============================================================================
+-- XML expression indexes and sub-attribute tracking
+-- ============================================================================
+-- Test that updates to non-indexed XML paths can be HOT updates
+CREATE TABLE hot_xml_test (
+ id int PRIMARY KEY,
+ doc xml
+) WITH (fillfactor = 50);
+-- Create expression index on a specific XPath
+CREATE INDEX hot_xml_name_idx ON hot_xml_test ((xpath('/person/name/text()', doc)));
+INSERT INTO hot_xml_test VALUES
+ (1, '<person><name>Alice</name><age>30</age></person>'),
+ (2, '<person><name>Bob</name><age>25</age></person>');
+ERROR: could not identify a comparison function for type xml
+SELECT * FROM get_hot_count('hot_xml_test');
+ updates | hot
+---------+-----
+ 0 | 0
+(1 row)
+
+-- Update non-indexed XPath (age) - behavior depends on XML comparison fallback
+-- Full XML value replacement means non-indexed path updates still require index comparison
+UPDATE hot_xml_test SET doc = '<person><name>Alice</name><age>31</age></person>' WHERE id = 1;
+SELECT * FROM get_hot_count('hot_xml_test');
+ updates | hot
+---------+-----
+ 0 | 0
+(1 row)
+
+-- Update indexed XPath (name) - should NOT be HOT
+UPDATE hot_xml_test SET doc = '<person><name>Alice2</name><age>31</age></person>' WHERE id = 1;
+SELECT * FROM get_hot_count('hot_xml_test');
+ updates | hot
+---------+-----
+ 0 | 0
+(1 row)
+
+-- Verify index works
+SELECT id FROM hot_xml_test WHERE xpath('/person/name/text()', doc) = ARRAY['Alice2'::text];
+ERROR: operator does not exist: xml[] = text[]
+LINE 1: ..._xml_test WHERE xpath('/person/name/text()', doc) = ARRAY['A...
+ ^
+DETAIL: No operator of that name accepts the given argument types.
+HINT: You might need to add explicit type casts.
+DROP TABLE hot_xml_test;
+-- ============================================================================
+-- GIN indexes and amcomparedatums for JSONB
+-- ============================================================================
+-- Test that GIN indexes can use amcomparedatums to enable HOT when extracted keys match
+CREATE TABLE hot_gin_test (
+ id int PRIMARY KEY,
+ tags text[],
+ properties jsonb
+) WITH (fillfactor = 50);
+-- GIN index on text array
+CREATE INDEX hot_gin_tags_idx ON hot_gin_test USING gin (tags);
+-- GIN index on JSONB (jsonb_ops - keys and values)
+CREATE INDEX hot_gin_props_idx ON hot_gin_test USING gin (properties);
+INSERT INTO hot_gin_test VALUES
+ (1, ARRAY['tag1', 'tag2'], '{"key1":"val1","key2":"val2"}'),
+ (2, ARRAY['tag3', 'tag4'], '{"key3":"val3","key4":"val4"}');
+SELECT * FROM get_hot_count('hot_gin_test');
+ updates | hot
+---------+-----
+ 0 | 0
+(1 row)
+
+-- Update that changes tag order but not content - after amcomparedatums should be HOT
+-- (GIN extracts same keys, just different order)
+UPDATE hot_gin_test SET tags = ARRAY['tag2', 'tag1'] WHERE id = 1;
+SELECT * FROM get_hot_count('hot_gin_test');
+ updates | hot
+---------+-----
+ 1 | 0
+(1 row)
+
+-- Update JSONB value (not key) - after amcomparedatums may be HOT or non-HOT
+-- depending on GIN operator class (jsonb_ops indexes both keys and values)
+UPDATE hot_gin_test SET properties = '{"key1":"val1_new","key2":"val2"}' WHERE id = 1;
+SELECT * FROM get_hot_count('hot_gin_test');
+ updates | hot
+---------+-----
+ 2 | 0
+(1 row)
+
+-- Add new tag - should NOT be HOT (different extracted keys)
+UPDATE hot_gin_test SET tags = ARRAY['tag2', 'tag1', 'tag5'] WHERE id = 1;
+SELECT * FROM get_hot_count('hot_gin_test');
+ updates | hot
+---------+-----
+ 3 | 0
+(1 row)
+
+-- Verify GIN indexes work
+SELECT id FROM hot_gin_test WHERE tags @> ARRAY['tag5'];
+ id
+----
+ 1
+(1 row)
+
+SELECT id FROM hot_gin_test WHERE properties @> '{"key1":"val1_new"}';
+ id
+----
+ 1
+(1 row)
+
+DROP TABLE hot_gin_test;
+-- ============================================================================
+-- Cleanup
+-- ============================================================================
+DROP TABLE IF EXISTS hot_test;
+DROP TABLE IF EXISTS hot_test_partitioned CASCADE;
+DROP FUNCTION IF EXISTS has_hot_chain(text, tid);
+DROP FUNCTION IF EXISTS print_hot_chain(text, tid);
+DROP FUNCTION IF EXISTS get_hot_count(text);
+DROP EXTENSION pageinspect;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index e779ada70cb..05e63a5d76f 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -137,6 +137,11 @@ test: event_trigger_login
# this test also uses event triggers, so likewise run it by itself
test: fast_default
+# ----------
+# HOT updates tests
+# ----------
+test: hot_updates
+
# run tablespace test at the end because it drops the tablespace created during
# setup that other tests may use.
test: tablespace
diff --git a/src/test/regress/sql/hot_updates.sql b/src/test/regress/sql/hot_updates.sql
new file mode 100644
index 00000000000..a8894006177
--- /dev/null
+++ b/src/test/regress/sql/hot_updates.sql
@@ -0,0 +1,605 @@
+--
+-- HOT_UPDATES
+-- Test Heap-Only Tuple (HOT) update decisions
+--
+-- This test systematically verifies that HOT updates are used when appropriate
+-- and avoided when necessary (e.g., when indexed columns are modified).
+--
+-- We use multiple validation methods:
+-- 1. Statistics functions (pg_stat_get_tuples_hot_updated)
+-- 2. pageinspect extension for HOT chain examination
+-- 3. EXPLAIN to verify index usage after updates
+--
+
+-- Load required extensions
+CREATE EXTENSION IF NOT EXISTS pageinspect;
+
+-- Function to get HOT update count
+CREATE OR REPLACE FUNCTION get_hot_count(rel_name text)
+RETURNS TABLE (
+ updates BIGINT,
+ hot BIGINT
+) AS $$
+DECLARE
+ rel_oid oid;
+BEGIN
+ rel_oid := rel_name::regclass::oid;
+
+ -- Read both committed and transaction-local stats
+ -- In autocommit mode (default for regression tests), this works correctly
+ -- Note: In explicit transactions (BEGIN/COMMIT), committed stats already
+ -- include flushed updates, so this would double-count. For explicit
+ -- transaction testing, call pg_stat_force_next_flush() before this function.
+ updates := COALESCE(pg_stat_get_tuples_updated(rel_oid), 0) +
+ COALESCE(pg_stat_get_xact_tuples_updated(rel_oid), 0);
+ hot := COALESCE(pg_stat_get_tuples_hot_updated(rel_oid), 0) +
+ COALESCE(pg_stat_get_xact_tuples_hot_updated(rel_oid), 0);
+
+ RETURN NEXT;
+END;
+$$ LANGUAGE plpgsql;
+
+-- Check if a tuple is part of a HOT chain (has a predecessor on same page)
+CREATE OR REPLACE FUNCTION has_hot_chain(rel_name text, target_ctid tid)
+RETURNS boolean AS $$
+DECLARE
+ block_num int;
+ page_item record;
+BEGIN
+ block_num := (target_ctid::text::point)[0]::int;
+
+ -- Look for a different tuple on the same page that points to our target tuple
+ FOR page_item IN
+ SELECT lp, lp_flags, t_ctid
+ FROM heap_page_items(get_raw_page(rel_name, block_num))
+ WHERE lp_flags = 1
+ AND t_ctid IS NOT NULL
+ AND t_ctid = target_ctid
+ AND ('(' || block_num::text || ',' || lp::text || ')')::tid != target_ctid
+ LOOP
+ RETURN true;
+ END LOOP;
+
+ RETURN false;
+END;
+$$ LANGUAGE plpgsql;
+
+-- Print the HOT chain starting from a given tuple
+CREATE OR REPLACE FUNCTION print_hot_chain(rel_name text, start_ctid tid)
+RETURNS TABLE(chain_position int, ctid tid, lp_flags text, t_ctid tid, chain_end boolean) AS
+$$
+#variable_conflict use_column
+DECLARE
+ block_num int;
+ line_ptr int;
+ current_ctid tid := start_ctid;
+ next_ctid tid;
+ position int := 0;
+ max_iterations int := 100;
+ page_item record;
+ found_predecessor boolean := false;
+ flags_name text;
+BEGIN
+ block_num := (start_ctid::text::point)[0]::int;
+
+ -- Find the predecessor (old tuple pointing to our start_ctid)
+ FOR page_item IN
+ SELECT lp, lp_flags, t_ctid
+ FROM heap_page_items(get_raw_page(rel_name, block_num))
+ WHERE lp_flags = 1
+ AND t_ctid = start_ctid
+ LOOP
+ current_ctid := ('(' || block_num::text || ',' || page_item.lp::text || ')')::tid;
+ found_predecessor := true;
+ EXIT;
+ END LOOP;
+
+ -- If no predecessor found, start with the given ctid
+ IF NOT found_predecessor THEN
+ current_ctid := start_ctid;
+ END IF;
+
+ -- Follow the chain forward
+ WHILE position < max_iterations LOOP
+ line_ptr := (current_ctid::text::point)[1]::int;
+
+ FOR page_item IN
+ SELECT lp, lp_flags, t_ctid
+ FROM heap_page_items(get_raw_page(rel_name, block_num))
+ WHERE lp = line_ptr
+ LOOP
+ -- Map lp_flags to names
+ flags_name := CASE page_item.lp_flags
+ WHEN 0 THEN 'unused (0)'
+ WHEN 1 THEN 'normal (1)'
+ WHEN 2 THEN 'redirect (2)'
+ WHEN 3 THEN 'dead (3)'
+ ELSE 'unknown (' || page_item.lp_flags::text || ')'
+ END;
+
+ RETURN QUERY SELECT
+ position,
+ current_ctid,
+ flags_name,
+ page_item.t_ctid,
+ (page_item.t_ctid IS NULL OR page_item.t_ctid = current_ctid)::boolean
+ ;
+
+ IF page_item.t_ctid IS NULL OR page_item.t_ctid = current_ctid THEN
+ RETURN;
+ END IF;
+
+ next_ctid := page_item.t_ctid;
+
+ IF (next_ctid::text::point)[0]::int != block_num THEN
+ RETURN;
+ END IF;
+
+ current_ctid := next_ctid;
+ position := position + 1;
+ END LOOP;
+
+ IF position = 0 THEN
+ RETURN;
+ END IF;
+ END LOOP;
+END;
+$$ LANGUAGE plpgsql;
+
+-- Basic HOT update (update non-indexed column)
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ indexed_col int,
+ non_indexed_col text
+) WITH (fillfactor = 50);
+
+CREATE INDEX hot_test_indexed_idx ON hot_test(indexed_col);
+
+INSERT INTO hot_test VALUES (1, 100, 'initial');
+INSERT INTO hot_test VALUES (2, 200, 'initial');
+INSERT INTO hot_test VALUES (3, 300, 'initial');
+
+-- Get baseline
+SELECT * FROM get_hot_count('hot_test');
+
+-- Should be HOT updates (only non-indexed column modified)
+UPDATE hot_test SET non_indexed_col = 'updated1' WHERE id = 1;
+UPDATE hot_test SET non_indexed_col = 'updated2' WHERE id = 2;
+UPDATE hot_test SET non_indexed_col = 'updated3' WHERE id = 3;
+
+-- Verify HOT updates occurred
+SELECT * FROM get_hot_count('hot_test');
+
+-- Dump the HOT chain before VACUUMing
+WITH current_tuple AS (
+ SELECT ctid FROM hot_test WHERE id = 1
+)
+SELECT
+ has_hot_chain('hot_test', current_tuple.ctid) AS has_chain,
+ chain_position,
+ print_hot_chain.ctid,
+ lp_flags,
+ t_ctid
+FROM current_tuple,
+LATERAL print_hot_chain('hot_test', current_tuple.ctid);
+
+-- Vacuum the relation, expect the HOT chain to collapse
+VACUUM hot_test;
+
+-- Show that there is no chain after vacuum
+WITH current_tuple AS (
+ SELECT ctid FROM hot_test WHERE id = 1
+)
+SELECT
+ has_hot_chain('hot_test', current_tuple.ctid) AS has_chain,
+ chain_position,
+ print_hot_chain.ctid,
+ lp_flags,
+ t_ctid
+FROM current_tuple,
+LATERAL print_hot_chain('hot_test', current_tuple.ctid);
+
+-- Non-HOT update (update indexed column)
+UPDATE hot_test SET indexed_col = 150 WHERE id = 1;
+SELECT * FROM get_hot_count('hot_test');
+
+-- Verify index was updated (new value findable)
+SET enable_seqscan = off;
+EXPLAIN (COSTS OFF) SELECT id, indexed_col FROM hot_test WHERE indexed_col = 150;
+SELECT id, indexed_col FROM hot_test WHERE indexed_col = 150;
+
+-- Verify old value no longer in index
+EXPLAIN (COSTS OFF) SELECT id FROM hot_test WHERE indexed_col = 100;
+SELECT id FROM hot_test WHERE indexed_col = 100;
+RESET enable_seqscan;
+
+-- All-or-none property: updating one indexed column requires ALL index updates
+DROP TABLE hot_test;
+
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ col_a int,
+ col_b int,
+ col_c int,
+ non_indexed text
+) WITH (fillfactor = 50);
+
+CREATE INDEX hot_test_a_idx ON hot_test(col_a);
+CREATE INDEX hot_test_b_idx ON hot_test(col_b);
+CREATE INDEX hot_test_c_idx ON hot_test(col_c);
+
+INSERT INTO hot_test VALUES (1, 10, 20, 30, 'initial');
+
+-- Update only col_a - should NOT be HOT because an indexed column changed
+-- This means ALL indexes must be updated (all-or-none property)
+UPDATE hot_test SET col_a = 15 WHERE id = 1;
+SELECT * FROM get_hot_count('hot_test');
+
+-- Now update only non-indexed column - should be HOT
+UPDATE hot_test SET non_indexed = 'updated';
+SELECT * FROM get_hot_count('hot_test');
+
+-- Partial index: both old and new outside predicate (conservative = non-HOT)
+DROP TABLE hot_test;
+
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ status text,
+ data text
+) WITH (fillfactor = 50);
+
+-- Partial index only covers status = 'active'
+CREATE INDEX hot_test_active_idx ON hot_test(status) WHERE status = 'active';
+
+INSERT INTO hot_test VALUES (1, 'active', 'data1');
+INSERT INTO hot_test VALUES (2, 'inactive', 'data2');
+INSERT INTO hot_test VALUES (3, 'deleted', 'data3');
+
+-- Update non-indexed column on 'active' row (in predicate, status unchanged)
+-- Should be HOT
+UPDATE hot_test SET data = 'updated1' WHERE id = 1;
+SELECT * FROM get_hot_count('hot_test');
+
+-- Update non-indexed column on 'inactive' row (outside predicate)
+-- Should be HOT
+UPDATE hot_test SET data = 'updated2' WHERE id = 2;
+SELECT * FROM get_hot_count('hot_test');
+
+-- Update status from 'inactive' to 'deleted' (both outside predicate)
+-- PostgreSQL is conservative: heap insert happens before predicate check
+-- So this is NON-HOT even though both values are outside predicate
+UPDATE hot_test SET status = 'deleted' WHERE id = 2;
+SELECT * FROM get_hot_count('hot_test');
+
+-- Verify index still works for 'active' rows
+SELECT id, status FROM hot_test WHERE status = 'active';
+
+-- Only BRIN (summarizing) indexes on non-PK columns
+DROP TABLE hot_test;
+
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ ts timestamp,
+ value int,
+ brin_col int
+) WITH (fillfactor = 50);
+
+CREATE INDEX hot_test_ts_brin ON hot_test USING brin(ts);
+CREATE INDEX hot_test_brin_col_brin ON hot_test USING brin(brin_col);
+
+INSERT INTO hot_test VALUES (1, '2024-01-01', 100, 1000);
+
+-- Update both BRIN columns - should still be HOT (only summarizing indexes)
+UPDATE hot_test SET ts = '2024-01-02', brin_col = 2000 WHERE id = 1;
+SELECT * FROM get_hot_count('hot_test');
+
+-- Update non-indexed column - should also be HOT
+UPDATE hot_test SET value = 200 WHERE id = 1;
+SELECT * FROM get_hot_count('hot_test');
+
+-- TOAST and HOT: TOASTed columns can participate in HOT
+DROP TABLE hot_test;
+
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ indexed_col int,
+ large_text text,
+ small_text text
+) WITH (fillfactor = 50);
+
+CREATE INDEX hot_test_idx ON hot_test(indexed_col);
+
+-- Insert row with TOASTed column (> 2KB)
+INSERT INTO hot_test VALUES (1, 100, repeat('x', 3000), 'small');
+
+-- Update non-indexed, non-TOASTed column - should be HOT
+UPDATE hot_test SET small_text = 'updated';
+SELECT * FROM get_hot_count('hot_test');
+
+-- Update TOASTed column - should be HOT if indexed column unchanged
+UPDATE hot_test SET large_text = repeat('y', 3000);
+SELECT * FROM get_hot_count('hot_test');
+
+-- Update indexed column - should NOT be HOT
+UPDATE hot_test SET indexed_col = 200;
+SELECT * FROM get_hot_count('hot_test');
+
+-- Unique constraint (unique index) behaves like regular index
+DROP TABLE hot_test;
+
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ unique_col int UNIQUE,
+ data text
+) WITH (fillfactor = 50);
+
+INSERT INTO hot_test VALUES (1, 100, 'data1');
+INSERT INTO hot_test VALUES (2, 200, 'data2');
+
+-- Update data (non-indexed) - should be HOT
+UPDATE hot_test SET data = 'updated';
+SELECT * FROM get_hot_count('hot_test');
+
+-- Verify unique constraint still enforced
+SELECT id, unique_col, data FROM hot_test ORDER BY id;
+
+-- This should fail (unique violation)
+UPDATE hot_test SET unique_col = 100 WHERE id = 2;
+
+-- Multi-column index: any column change = non-HOT
+DROP TABLE hot_test;
+
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ col_a int,
+ col_b int,
+ col_c int,
+ data text
+) WITH (fillfactor = 50);
+
+CREATE INDEX hot_test_ab_idx ON hot_test(col_a, col_b);
+
+INSERT INTO hot_test VALUES (1, 10, 20, 30, 'data');
+
+-- Update col_a (part of multi-column index) - should NOT be HOT
+UPDATE hot_test SET col_a = 15;
+SELECT * FROM get_hot_count('hot_test');
+
+-- Reset
+UPDATE hot_test SET col_a = 10;
+
+-- Update col_b (part of multi-column index) - should NOT be HOT
+UPDATE hot_test SET col_b = 25;
+SELECT * FROM get_hot_count('hot_test');
+
+-- Reset
+UPDATE hot_test SET col_b = 20;
+SELECT * FROM get_hot_count('hot_test');
+
+-- Update col_c (not indexed) - should be HOT
+UPDATE hot_test SET col_c = 35;
+
+-- Update data (not indexed) - should be HOT
+UPDATE hot_test SET data = 'updated';
+SELECT * FROM get_hot_count('hot_test');
+
+-- Partitioned tables: HOT works within partitions
+DROP TABLE IF EXISTS hot_test_partitioned CASCADE;
+
+CREATE TABLE hot_test_partitioned (
+ id int,
+ partition_key int,
+ indexed_col int,
+ data text,
+ PRIMARY KEY (id, partition_key)
+) PARTITION BY RANGE (partition_key);
+
+CREATE TABLE hot_test_part1 PARTITION OF hot_test_partitioned
+ FOR VALUES FROM (1) TO (100) WITH (fillfactor = 50);
+CREATE TABLE hot_test_part2 PARTITION OF hot_test_partitioned
+ FOR VALUES FROM (100) TO (200) WITH (fillfactor = 50);
+
+CREATE INDEX hot_test_part_idx ON hot_test_partitioned(indexed_col);
+
+INSERT INTO hot_test_partitioned VALUES (1, 50, 100, 'initial1');
+INSERT INTO hot_test_partitioned VALUES (2, 150, 200, 'initial2');
+
+-- Update in partition 1 (non-indexed column) - should be HOT
+UPDATE hot_test_partitioned SET data = 'updated1' WHERE id = 1;
+
+-- Update in partition 2 (non-indexed column) - should be HOT
+UPDATE hot_test_partitioned SET data = 'updated2' WHERE id = 2;
+
+SELECT * FROM get_hot_count('hot_test_part1');
+SELECT * FROM get_hot_count('hot_test_part2');
+
+-- Verify indexes work on partitions
+SELECT id FROM hot_test_partitioned WHERE indexed_col = 100;
+SELECT id FROM hot_test_partitioned WHERE indexed_col = 200;
+
+-- Update indexed column in partition - should NOT be HOT
+UPDATE hot_test_partitioned SET indexed_col = 150 WHERE id = 1;
+SELECT * FROM get_hot_count('hot_test_part1');
+
+-- Verify index was updated
+SELECT id FROM hot_test_partitioned WHERE indexed_col = 150;
+
+-- ============================================================================
+-- Trigger modifications: heap_modify_tuple() and HOT
+-- ============================================================================
+-- Test that we correctly detect when triggers modify indexed columns via
+-- heap_modify_tuple(), even when those columns aren't in the UPDATE's SET clause
+
+CREATE TABLE hot_trigger_test (
+ id int PRIMARY KEY,
+ triggered_col int,
+ data text
+) WITH (fillfactor = 50);
+
+CREATE INDEX hot_trigger_idx ON hot_trigger_test(triggered_col);
+
+-- Create a trigger that modifies an indexed column
+CREATE OR REPLACE FUNCTION modify_triggered_col()
+RETURNS TRIGGER AS $$
+BEGIN
+ NEW.triggered_col = NEW.triggered_col + 1;
+ RETURN NEW;
+END;
+$$ LANGUAGE plpgsql;
+
+CREATE TRIGGER before_update_modify
+ BEFORE UPDATE ON hot_trigger_test
+ FOR EACH ROW
+ EXECUTE FUNCTION modify_triggered_col();
+
+INSERT INTO hot_trigger_test VALUES (1, 100, 'initial');
+
+SELECT * FROM get_hot_count('hot_trigger_test');
+
+-- Update only data column, but trigger modifies indexed column
+-- Should NOT be HOT because trigger modified an indexed column
+UPDATE hot_trigger_test SET data = 'updated' WHERE id = 1;
+
+-- Verify it was NOT a HOT update (indexed column was modified by trigger)
+SELECT * FROM get_hot_count('hot_trigger_test');
+
+-- Verify the triggered column was actually modified
+SELECT triggered_col FROM hot_trigger_test WHERE id = 1;
+
+DROP TABLE hot_trigger_test CASCADE;
+DROP FUNCTION modify_triggered_col();
+
+-- ============================================================================
+-- JSONB expression indexes and sub-attribute tracking
+-- ============================================================================
+-- Test that updates to non-indexed JSONB paths can be HOT updates
+
+CREATE TABLE hot_jsonb_test (
+ id int PRIMARY KEY,
+ data jsonb
+) WITH (fillfactor = 50);
+
+-- Create expression index on a specific JSON path
+CREATE INDEX hot_jsonb_name_idx ON hot_jsonb_test ((data->>'name'));
+
+INSERT INTO hot_jsonb_test VALUES
+ (1, '{"name":"Alice","age":30,"city":"NYC"}'),
+ (2, '{"name":"Bob","age":25,"city":"LA"}');
+
+SELECT * FROM get_hot_count('hot_jsonb_test');
+
+-- Update non-indexed JSON path (age) - should be HOT after instrumentation
+UPDATE hot_jsonb_test SET data = jsonb_set(data, '{age}', '31') WHERE id = 1;
+
+SELECT * FROM get_hot_count('hot_jsonb_test');
+
+-- Update indexed JSON path (name) - should NOT be HOT
+UPDATE hot_jsonb_test SET data = jsonb_set(data, '{name}', '"Alice2"') WHERE id = 1;
+
+SELECT * FROM get_hot_count('hot_jsonb_test');
+
+-- Verify index works
+SELECT id FROM hot_jsonb_test WHERE data->>'name' = 'Alice2';
+
+-- Test jsonb_delete on non-indexed path - should be HOT after instrumentation
+UPDATE hot_jsonb_test SET data = data - 'city' WHERE id = 2;
+
+SELECT * FROM get_hot_count('hot_jsonb_test');
+
+-- Test jsonb_insert on non-indexed path - should be HOT after instrumentation
+UPDATE hot_jsonb_test SET data = jsonb_insert(data, '{country}', '"USA"') WHERE id = 2;
+
+SELECT * FROM get_hot_count('hot_jsonb_test');
+
+DROP TABLE hot_jsonb_test;
+
+-- ============================================================================
+-- XML expression indexes and sub-attribute tracking
+-- ============================================================================
+-- Test that updates to non-indexed XML paths can be HOT updates
+
+CREATE TABLE hot_xml_test (
+ id int PRIMARY KEY,
+ doc xml
+) WITH (fillfactor = 50);
+
+-- Create expression index on a specific XPath
+CREATE INDEX hot_xml_name_idx ON hot_xml_test ((xpath('/person/name/text()', doc)));
+
+INSERT INTO hot_xml_test VALUES
+ (1, '<person><name>Alice</name><age>30</age></person>'),
+ (2, '<person><name>Bob</name><age>25</age></person>');
+
+SELECT * FROM get_hot_count('hot_xml_test');
+
+-- Update non-indexed XPath (age) - behavior depends on XML comparison fallback
+-- Full XML value replacement means non-indexed path updates still require index comparison
+UPDATE hot_xml_test SET doc = '<person><name>Alice</name><age>31</age></person>' WHERE id = 1;
+
+SELECT * FROM get_hot_count('hot_xml_test');
+
+-- Update indexed XPath (name) - should NOT be HOT
+UPDATE hot_xml_test SET doc = '<person><name>Alice2</name><age>31</age></person>' WHERE id = 1;
+
+SELECT * FROM get_hot_count('hot_xml_test');
+
+-- Verify index works
+SELECT id FROM hot_xml_test WHERE xpath('/person/name/text()', doc) = ARRAY['Alice2'::text];
+
+DROP TABLE hot_xml_test;
+
+-- ============================================================================
+-- GIN indexes and amcomparedatums for JSONB
+-- ============================================================================
+-- Test that GIN indexes can use amcomparedatums to enable HOT when extracted keys match
+
+CREATE TABLE hot_gin_test (
+ id int PRIMARY KEY,
+ tags text[],
+ properties jsonb
+) WITH (fillfactor = 50);
+
+-- GIN index on text array
+CREATE INDEX hot_gin_tags_idx ON hot_gin_test USING gin (tags);
+
+-- GIN index on JSONB (jsonb_ops - keys and values)
+CREATE INDEX hot_gin_props_idx ON hot_gin_test USING gin (properties);
+
+INSERT INTO hot_gin_test VALUES
+ (1, ARRAY['tag1', 'tag2'], '{"key1":"val1","key2":"val2"}'),
+ (2, ARRAY['tag3', 'tag4'], '{"key3":"val3","key4":"val4"}');
+
+SELECT * FROM get_hot_count('hot_gin_test');
+
+-- Update that changes tag order but not content - after amcomparedatums should be HOT
+-- (GIN extracts same keys, just different order)
+UPDATE hot_gin_test SET tags = ARRAY['tag2', 'tag1'] WHERE id = 1;
+
+SELECT * FROM get_hot_count('hot_gin_test');
+
+-- Update JSONB value (not key) - after amcomparedatums may be HOT or non-HOT
+-- depending on GIN operator class (jsonb_ops indexes both keys and values)
+UPDATE hot_gin_test SET properties = '{"key1":"val1_new","key2":"val2"}' WHERE id = 1;
+
+SELECT * FROM get_hot_count('hot_gin_test');
+
+-- Add new tag - should NOT be HOT (different extracted keys)
+UPDATE hot_gin_test SET tags = ARRAY['tag2', 'tag1', 'tag5'] WHERE id = 1;
+
+SELECT * FROM get_hot_count('hot_gin_test');
+
+-- Verify GIN indexes work
+SELECT id FROM hot_gin_test WHERE tags @> ARRAY['tag5'];
+SELECT id FROM hot_gin_test WHERE properties @> '{"key1":"val1_new"}';
+
+DROP TABLE hot_gin_test;
+
+-- ============================================================================
+-- Cleanup
+-- ============================================================================
+DROP TABLE IF EXISTS hot_test;
+DROP TABLE IF EXISTS hot_test_partitioned CASCADE;
+DROP FUNCTION IF EXISTS has_hot_chain(text, tid);
+DROP FUNCTION IF EXISTS print_hot_chain(text, tid);
+DROP FUNCTION IF EXISTS get_hot_count(text);
+DROP EXTENSION pageinspect;
--
2.51.2
[text/x-patch] v37-0002-Identify-modified-indexed-attributes-in-the-exec.patch (61.4K, 3-v37-0002-Identify-modified-indexed-attributes-in-the-exec.patch)
download | inline diff:
From 06ea9702713f4852c18b1e726ad35e3ff80a56c7 Mon Sep 17 00:00:00 2001
From: Greg Burd <[email protected]>
Date: Tue, 10 Mar 2026 08:17:31 -0400
Subject: [PATCH v37 2/4] Identify modified indexed attributes in the executor
on UPDATE
Refactor executor update logic to determine which indexed columns have
actually changed during an UPDATE operation rather than leaving this up
to HeapDetermineColumnsInfo() in heap_update(). Finding this set of
attributes is not heap-specific, but more general to all table AMs and
having this information in the executor could inform other decisions
about when index inserts are required and when they are not regardless
of the table AM's MVCC implementation strategy.
The heap-only tuple decision (HOT) in heap functions as it always has,
but the determination of the "modified indexed attributes"
(modified_idx_attrs, formerly known as modified_attrs).
ExecUpdateModifiedIdxAttrs() replaces HeapDetermineColumnsInfo() and is
called before table_tuple_update() crucially without the need for an
exclusive buffer lock on the page that holds the tuple being updated.
This reduces the time the buffer lock is held later within
heapam_tuple_update() and heap_update().
Besides identifying the set of modified indexed attributes
HeapDetermineColumnsInfo() was also partially responsible for the
decision about what to WAL log for the replica identity key. This logic
moved into heap_update() and out of the replacement named
HeapUpdateModifiedIdxAttrs(). Doing this allows for
simple_heap_update() and heapam_tuple_update() to share the same logic
as they both call into heap_update().
Updates stemming from logical replication also use the new
ExecUpdateModifiedIdxAttrs() in ExecSimpleRelationUpdate().
ExecUpdateModifiedIdxAttrs() uses ExecCompareSlotAttrs() to identify
which attributes have changed and then intersects that with the set of
indexed attributes to identify the modified indexed set, the
modified_idx_attrs.
This patch introduces a few helper functions to reduce code duplication
and increase readability: HeapUpdateHotAllowable(),
HeapUpdateDetermineLockmode(). These are used in both heap_update() and
simple_heap_update().
The heap_update() function is called now with lockmode pre-determined
and a boolean indicating if the update allows HOT updates or not, both
const. If during heap_update() the new tuple will fit on the same page
and that boolean is true, the update is HOT. This means that although
the functions and timing of the code involed in HOT decisions have
changed, none of the logic related to when HOT is allowed has changed.
Development of this feature exposed nondeterministic behavior in three
existing tests which have been adjusted to avoid inconsistent test
results due to tuple ordering during heap page scans.
---
src/backend/access/heap/heapam.c | 480 ++++++++++++------
src/backend/access/heap/heapam_handler.c | 31 +-
src/backend/access/table/tableam.c | 5 +-
src/backend/executor/execReplication.c | 9 +-
src/backend/executor/execTuples.c | 70 +++
src/backend/executor/nodeModifyTable.c | 88 +++-
src/backend/utils/cache/relcache.c | 44 +-
src/include/access/heapam.h | 13 +-
src/include/access/tableam.h | 8 +-
src/include/executor/executor.h | 9 +
src/include/utils/rel.h | 2 +-
src/include/utils/relcache.h | 2 +-
.../expected/syscache-update-pruned.out | 12 +-
.../specs/syscache-update-pruned.spec | 6 +-
.../regress/expected/generated_virtual.out | 2 +-
src/test/regress/expected/triggers.out | 16 +-
src/test/regress/expected/tsearch.out | 3 +-
src/test/regress/expected/updatable_views.out | 4 +-
src/test/regress/sql/generated_virtual.sql | 2 +-
src/test/regress/sql/triggers.sql | 4 +-
src/test/regress/sql/tsearch.sql | 3 +-
src/test/regress/sql/updatable_views.sql | 2 +-
22 files changed, 583 insertions(+), 232 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e5bd062de77..307855fdd67 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -37,21 +37,26 @@
#include "access/multixact.h"
#include "access/subtrans.h"
#include "access/syncscan.h"
+#include "access/sysattr.h"
+#include "access/tableam.h"
#include "access/valid.h"
#include "access/visibilitymap.h"
#include "access/xloginsert.h"
#include "catalog/pg_database.h"
#include "catalog/pg_database_d.h"
#include "commands/vacuum.h"
+#include "executor/tuptable.h"
+#include "nodes/lockoptions.h"
#include "pgstat.h"
#include "port/pg_bitutils.h"
+#include "storage/buf.h"
#include "storage/lmgr.h"
#include "storage/predicate.h"
-#include "storage/proc.h"
#include "storage/procarray.h"
#include "utils/datum.h"
#include "utils/injection_point.h"
#include "utils/inval.h"
+#include "utils/relcache.h"
#include "utils/spccache.h"
#include "utils/syscache.h"
@@ -68,11 +73,8 @@ static void check_lock_if_inplace_updateable_rel(Relation relation,
HeapTuple newtup);
static void check_inplace_rel_lock(HeapTuple oldtup);
#endif
-static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
- Bitmapset *interesting_cols,
- Bitmapset *external_cols,
- HeapTuple oldtup, HeapTuple newtup,
- bool *has_external);
+static Bitmapset *HeapUpdateModifiedIdxAttrs(Relation relation,
+ HeapTuple oldtup, HeapTuple newtup);
static bool heap_acquire_tuplock(Relation relation, const ItemPointerData *tid,
LockTupleMode mode, LockWaitPolicy wait_policy,
bool *have_tuple_lock);
@@ -3312,7 +3314,7 @@ simple_heap_delete(Relation relation, const ItemPointerData *tid)
* heap_update - replace a tuple
*
* See table_tuple_update() for an explanation of the parameters, except that
- * this routine directly takes a tuple rather than a slot.
+ * this routine directly takes a heap tuple rather than a slot.
*
* In the failure cases, the routine fills *tmfd with the tuple's t_ctid,
* t_xmax (resolving a possible MultiXact, if necessary), and t_cmax (the last
@@ -3322,17 +3324,13 @@ simple_heap_delete(Relation relation, const ItemPointerData *tid)
TM_Result
heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
CommandId cid, Snapshot crosscheck, bool wait,
- TM_FailureData *tmfd, LockTupleMode *lockmode,
- TU_UpdateIndexes *update_indexes)
+ TM_FailureData *tmfd, const LockTupleMode lockmode,
+ const Bitmapset *modified_idx_attrs, const bool hot_allowed)
{
TM_Result result;
TransactionId xid = GetCurrentTransactionId();
- Bitmapset *hot_attrs;
- Bitmapset *sum_attrs;
- Bitmapset *key_attrs;
- Bitmapset *id_attrs;
- Bitmapset *interesting_attrs;
- Bitmapset *modified_attrs;
+ Bitmapset *idx_attrs,
+ *rid_attrs;
ItemId lp;
HeapTupleData oldtup;
HeapTuple heaptup;
@@ -3352,13 +3350,12 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
bool have_tuple_lock = false;
bool iscombo;
bool use_hot_update = false;
- bool summarized_update = false;
bool key_intact;
bool all_visible_cleared = false;
bool all_visible_cleared_new = false;
bool checked_lockers;
bool locker_remains;
- bool id_has_external = false;
+ bool rep_id_key_required = false;
TransactionId xmax_new_tuple,
xmax_old_tuple;
uint16 infomask_old_tuple,
@@ -3389,36 +3386,21 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
#endif
/*
- * Fetch the list of attributes to be checked for various operations.
- *
- * For HOT considerations, this is wasted effort if we fail to update or
- * have to put the new tuple on a different page. But we must compute the
- * list before obtaining buffer lock --- in the worst case, if we are
- * doing an update on one of the relevant system catalogs, we could
- * deadlock if we try to fetch the list later. In any case, the relcache
- * caches the data so this is usually pretty cheap.
- *
- * We also need columns used by the replica identity and columns that are
- * considered the "key" of rows in the table.
+ * Fetch the attributes used across all indexes on this relation as well as
+ * the replica identity and columns.
*
- * Note that we get copies of each bitmap, so we need not worry about
- * relcache flush happening midway through.
- */
- hot_attrs = RelationGetIndexAttrBitmap(relation,
- INDEX_ATTR_BITMAP_HOT_BLOCKING);
- sum_attrs = RelationGetIndexAttrBitmap(relation,
- INDEX_ATTR_BITMAP_SUMMARIZED);
- key_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_KEY);
- id_attrs = RelationGetIndexAttrBitmap(relation,
- INDEX_ATTR_BITMAP_IDENTITY_KEY);
- interesting_attrs = NULL;
- interesting_attrs = bms_add_members(interesting_attrs, hot_attrs);
- interesting_attrs = bms_add_members(interesting_attrs, sum_attrs);
- interesting_attrs = bms_add_members(interesting_attrs, key_attrs);
- interesting_attrs = bms_add_members(interesting_attrs, id_attrs);
+ * NOTE: We must compute the list before obtaining buffer lock. In the
+ * worst case, if we are doing an update on one of the relevant system
+ * catalogs, we could deadlock if we try to fetch the list later. Keep in
+ * mind that relcache returns copies of each bitmap, so we need not worry
+ * about relcache flush happening midway through, but we do need to free
+ * them.
+ */
+ idx_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_INDEXED);
+ rid_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_IDENTITY_KEY);
block = ItemPointerGetBlockNumber(otid);
- INJECTION_POINT("heap_update-before-pin", NULL);
+ INJECTION_POINT("simple_heap_update-before-pin", NULL);
buffer = ReadBuffer(relation, block);
page = BufferGetPage(buffer);
@@ -3469,20 +3451,17 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
tmfd->ctid = *otid;
tmfd->xmax = InvalidTransactionId;
tmfd->cmax = InvalidCommandId;
- *update_indexes = TU_None;
- bms_free(hot_attrs);
- bms_free(sum_attrs);
- bms_free(key_attrs);
- bms_free(id_attrs);
- /* modified_attrs not yet initialized */
- bms_free(interesting_attrs);
+ bms_free(rid_attrs);
+ bms_free(idx_attrs);
+ /* modified_idx_attrs is owned by the caller, don't free it */
+
return TM_Deleted;
}
/*
- * Fill in enough data in oldtup for HeapDetermineColumnsInfo to work
- * properly.
+ * Fill in enough data in oldtup to determine replica identity attribute
+ * requirements.
*/
oldtup.t_tableOid = RelationGetRelid(relation);
oldtup.t_data = (HeapTupleHeader) PageGetItem(page, lp);
@@ -3493,16 +3472,59 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
newtup->t_tableOid = RelationGetRelid(relation);
/*
- * Determine columns modified by the update. Additionally, identify
- * whether any of the unmodified replica identity key attributes in the
- * old tuple is externally stored or not. This is required because for
- * such attributes the flattened value won't be WAL logged as part of the
- * new tuple so we must include it as part of the old_key_tuple. See
- * ExtractReplicaIdentity.
+ * ExtractReplicaIdentity() needs to know if a modified indexed attrbute
+ * is used as a replica indentity or if any of the replica identity
+ * attributes are referenced in an index, unmodified, and are stored
+ * externally in the old tuple being replaced. In those cases it may be
+ * necessary to WAL log them to so they are available to replicas.
*/
- modified_attrs = HeapDetermineColumnsInfo(relation, interesting_attrs,
- id_attrs, &oldtup,
- newtup, &id_has_external);
+ rep_id_key_required = bms_overlap(modified_idx_attrs, rid_attrs);
+ if (!rep_id_key_required)
+ {
+ Bitmapset *attrs;
+ TupleDesc tupdesc = RelationGetDescr(relation);
+ int attidx = -1;
+
+ /*
+ * Reduce the set under review to only the unmodified indexed replica
+ * identity key attributes. idx_attrs is copied (by bms_difference())
+ * not modified here.
+ */
+ attrs = bms_difference(idx_attrs, modified_idx_attrs);
+ attrs = bms_int_members(attrs, rid_attrs);
+
+ while ((attidx = bms_next_member(attrs, attidx)) >= 0)
+ {
+ /*
+ * attidx is zero-based, attrnum is the normal attribute number
+ */
+ AttrNumber attrnum = attidx + FirstLowInvalidHeapAttributeNumber;
+ Datum value;
+ bool isnull;
+
+ /*
+ * System attributes are not added into INDEX_ATTR_BITMAP_INDEXED
+ * bitmap by relcache.
+ */
+ Assert(attrnum > 0);
+
+ value = heap_getattr(&oldtup, attrnum, tupdesc, &isnull);
+
+ /* No need to check attributes that can't be stored externally */
+ if (isnull ||
+ TupleDescCompactAttr(tupdesc, attrnum - 1)->attlen != -1)
+ continue;
+
+ /* Check if the old tuple's attribute is stored externally */
+ if (VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value)))
+ {
+ rep_id_key_required = true;
+ break;
+ }
+ }
+
+ bms_free(attrs);
+ }
/*
* If we're not updating any "key" column, we can grab a weaker lock type.
@@ -3515,9 +3537,8 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
* is updates that don't manipulate key columns, not those that
* serendipitously arrive at the same key values.
*/
- if (!bms_overlap(modified_attrs, key_attrs))
+ if (lockmode == LockTupleNoKeyExclusive)
{
- *lockmode = LockTupleNoKeyExclusive;
mxact_status = MultiXactStatusNoKeyUpdate;
key_intact = true;
@@ -3534,7 +3555,7 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
}
else
{
- *lockmode = LockTupleExclusive;
+ Assert(lockmode == LockTupleExclusive);
mxact_status = MultiXactStatusUpdate;
key_intact = false;
}
@@ -3613,7 +3634,7 @@ l2:
bool current_is_member = false;
if (DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
- *lockmode, ¤t_is_member))
+ lockmode, ¤t_is_member))
{
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
@@ -3622,7 +3643,7 @@ l2:
* requesting a lock and already have one; avoids deadlock).
*/
if (!current_is_member)
- heap_acquire_tuplock(relation, &(oldtup.t_self), *lockmode,
+ heap_acquire_tuplock(relation, &(oldtup.t_self), lockmode,
LockWaitBlock, &have_tuple_lock);
/* wait for multixact */
@@ -3707,7 +3728,7 @@ l2:
* lock.
*/
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
- heap_acquire_tuplock(relation, &(oldtup.t_self), *lockmode,
+ heap_acquire_tuplock(relation, &(oldtup.t_self), lockmode,
LockWaitBlock, &have_tuple_lock);
XactLockTableWait(xwait, relation, &oldtup.t_self,
XLTW_Update);
@@ -3767,17 +3788,14 @@ l2:
tmfd->cmax = InvalidCommandId;
UnlockReleaseBuffer(buffer);
if (have_tuple_lock)
- UnlockTupleTuplock(relation, &(oldtup.t_self), *lockmode);
+ UnlockTupleTuplock(relation, &(oldtup.t_self), lockmode);
if (vmbuffer != InvalidBuffer)
ReleaseBuffer(vmbuffer);
- *update_indexes = TU_None;
- bms_free(hot_attrs);
- bms_free(sum_attrs);
- bms_free(key_attrs);
- bms_free(id_attrs);
- bms_free(modified_attrs);
- bms_free(interesting_attrs);
+ bms_free(rid_attrs);
+ bms_free(idx_attrs);
+ /* modified_idx_attrs is owned by the caller, don't free it */
+
return result;
}
@@ -3807,7 +3825,7 @@ l2:
compute_new_xmax_infomask(HeapTupleHeaderGetRawXmax(oldtup.t_data),
oldtup.t_data->t_infomask,
oldtup.t_data->t_infomask2,
- xid, *lockmode, true,
+ xid, lockmode, true,
&xmax_old_tuple, &infomask_old_tuple,
&infomask2_old_tuple);
@@ -3924,7 +3942,7 @@ l2:
compute_new_xmax_infomask(HeapTupleHeaderGetRawXmax(oldtup.t_data),
oldtup.t_data->t_infomask,
oldtup.t_data->t_infomask2,
- xid, *lockmode, false,
+ xid, lockmode, false,
&xmax_lock_old_tuple, &infomask_lock_old_tuple,
&infomask2_lock_old_tuple);
@@ -4086,10 +4104,11 @@ l2:
/*
* At this point newbuf and buffer are both pinned and locked, and newbuf
- * has enough space for the new tuple. If they are the same buffer, only
- * one pin is held.
+ * has enough space for the new tuple so we can use the HOT update path if
+ * the caller determined that it is allowable.
+ *
+ * NOTE: If newbuf == buffer then only one pin is held.
*/
-
if (newbuf == buffer)
{
/*
@@ -4097,20 +4116,8 @@ l2:
* to do a HOT update. Check if any of the index columns have been
* changed.
*/
- if (!bms_overlap(modified_attrs, hot_attrs))
- {
+ if (hot_allowed)
use_hot_update = true;
-
- /*
- * If none of the columns that are used in hot-blocking indexes
- * were updated, we can apply HOT, but we do still need to check
- * if we need to update the summarizing indexes, and update those
- * indexes if the columns were updated, or we may fail to detect
- * e.g. value bound changes in BRIN minmax indexes.
- */
- if (bms_overlap(modified_attrs, sum_attrs))
- summarized_update = true;
- }
}
else
{
@@ -4126,8 +4133,7 @@ l2:
* columns are modified or it has external data.
*/
old_key_tuple = ExtractReplicaIdentity(relation, &oldtup,
- bms_overlap(modified_attrs, id_attrs) ||
- id_has_external,
+ rep_id_key_required,
&old_key_copied);
/* NO EREPORT(ERROR) from here till changes are logged */
@@ -4256,7 +4262,7 @@ l2:
* Release the lmgr tuple lock, if we had it.
*/
if (have_tuple_lock)
- UnlockTupleTuplock(relation, &(oldtup.t_self), *lockmode);
+ UnlockTupleTuplock(relation, &(oldtup.t_self), lockmode);
pgstat_count_heap_update(relation, use_hot_update, newbuf != buffer);
@@ -4270,31 +4276,12 @@ l2:
heap_freetuple(heaptup);
}
- /*
- * If it is a HOT update, the update may still need to update summarized
- * indexes, lest we fail to update those summaries and get incorrect
- * results (for example, minmax bounds of the block may change with this
- * update).
- */
- if (use_hot_update)
- {
- if (summarized_update)
- *update_indexes = TU_Summarizing;
- else
- *update_indexes = TU_None;
- }
- else
- *update_indexes = TU_All;
-
if (old_key_tuple != NULL && old_key_copied)
heap_freetuple(old_key_tuple);
- bms_free(hot_attrs);
- bms_free(sum_attrs);
- bms_free(key_attrs);
- bms_free(id_attrs);
- bms_free(modified_attrs);
- bms_free(interesting_attrs);
+ bms_free(rid_attrs);
+ bms_free(idx_attrs);
+ /* modified_idx_attrs is owned by the caller, don't free it */
return TM_Ok;
}
@@ -4467,28 +4454,115 @@ heap_attr_equals(TupleDesc tupdesc, int attrnum, Datum value1, Datum value2,
}
/*
- * Check which columns are being updated.
- *
- * Given an updated tuple, determine (and return into the output bitmapset),
- * from those listed as interesting, the set of columns that changed.
- *
- * has_external indicates if any of the unmodified attributes (from those
- * listed as interesting) of the old tuple is a member of external_cols and is
- * stored externally.
+ * HOT updates are possible when either: a) there are no modified indexed
+ * attributes, or b) the modified attributes are all on summarizing indexes.
+ * Later, in heap_update(), we can choose to perform a HOT update if there is
+ * space on the page for the new tuple and the following code has determined
+ * that HOT is allowed.
+ */
+bool
+HeapUpdateHotAllowable(Relation relation, const Bitmapset *modified_idx_attrs,
+ bool *summarized_only)
+{
+ bool hot_allowed;
+
+ /*
+ * Let's be optimistic and start off by assuming the best case, no indexes
+ * need updating and HOT is allowable.
+ */
+ hot_allowed = true;
+ *summarized_only = false;
+
+ /*
+ * Check for case (a); when there are no modified index attributes HOT is
+ * allowed.
+ */
+ if (bms_is_empty(modified_idx_attrs))
+ hot_allowed = true;
+ else
+ {
+ Bitmapset *sum_attrs = RelationGetIndexAttrBitmap(relation,
+ INDEX_ATTR_BITMAP_SUMMARIZED);
+
+ /*
+ * At least one index attribute was modified, but is this case (b)
+ * where all the modified index attributes are only used by summarizing
+ * indexes? If it is, then we need to update those indexes, but this
+ * update can still be considered heap-only (HOT) and avoid updating
+ * any non-summarizing indexes on the relation.
+ */
+ if (bms_is_subset(modified_idx_attrs, sum_attrs))
+ {
+ hot_allowed = true;
+ *summarized_only = true;
+ }
+ else
+ {
+ /*
+ * Now we know a) one or more indexed attributes were modified
+ * (changed value, not just referenced within the UPDATE) and that
+ * b) at least one of those attributes is used by a non-summarizing
+ * index. HOT is not allowed.
+ */
+ hot_allowed = false;
+ }
+
+ bms_free(sum_attrs);
+ }
+
+ return hot_allowed;
+}
+
+/*
+ * If we're not updating any attributes used when forming the index keys we can
+ * grab a weaker lock type. This allows for more concurrency when we are
+ * running simultaneously with foreign key checks.
+ */
+LockTupleMode
+HeapUpdateDetermineLockmode(Relation relation, const Bitmapset *modified_idx_attrs)
+{
+ LockTupleMode lockmode = LockTupleExclusive;
+
+ Bitmapset *key_attrs = RelationGetIndexAttrBitmap(relation,
+ INDEX_ATTR_BITMAP_KEY);
+
+ if (!bms_overlap(modified_idx_attrs, key_attrs))
+ lockmode = LockTupleNoKeyExclusive;
+
+ bms_free(key_attrs);
+
+ return lockmode;
+}
+
+/*
+ * Return a Bitmapset that contains the set of modified (changed) indexed
+ * attributes between oldtup and newtup.
*/
static Bitmapset *
-HeapDetermineColumnsInfo(Relation relation,
- Bitmapset *interesting_cols,
- Bitmapset *external_cols,
- HeapTuple oldtup, HeapTuple newtup,
- bool *has_external)
+HeapUpdateModifiedIdxAttrs(Relation relation, HeapTuple oldtup, HeapTuple newtup)
{
int attidx;
- Bitmapset *modified = NULL;
+ Bitmapset *attrs,
+ *modified_idx_attrs = NULL;
TupleDesc tupdesc = RelationGetDescr(relation);
+ /* Get the set of all attributes across all indexes for this relation */
+ attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_INDEXED);
+
+ /* No indexed attributes, we're done */
+ if (bms_is_empty(attrs))
+ return NULL;
+
+ /*
+ * This heap update function is used outside the executor and so unlike
+ * heapam_tuple_update() where there is ResultRelInfo and EState to
+ * provide the concise set of attributes that might have been modified
+ * (via ExecGetAllUpdatedCols()) we simply check all indexed attributes to
+ * find the subset that changed value. That's the "modified indexed
+ * attributes" or "modified_idx_attrs".
+ */
attidx = -1;
- while ((attidx = bms_next_member(interesting_cols, attidx)) >= 0)
+ while ((attidx = bms_next_member(attrs, attidx)) >= 0)
{
/* attidx is zero-based, attrnum is the normal attribute number */
AttrNumber attrnum = attidx + FirstLowInvalidHeapAttributeNumber;
@@ -4504,7 +4578,7 @@ HeapDetermineColumnsInfo(Relation relation,
*/
if (attrnum == 0)
{
- modified = bms_add_member(modified, attidx);
+ modified_idx_attrs = bms_add_member(modified_idx_attrs, attidx);
continue;
}
@@ -4517,7 +4591,7 @@ HeapDetermineColumnsInfo(Relation relation,
{
if (attrnum != TableOidAttributeNumber)
{
- modified = bms_add_member(modified, attidx);
+ modified_idx_attrs = bms_add_member(modified_idx_attrs, attidx);
continue;
}
}
@@ -4533,29 +4607,12 @@ HeapDetermineColumnsInfo(Relation relation,
if (!heap_attr_equals(tupdesc, attrnum, value1,
value2, isnull1, isnull2))
- {
- modified = bms_add_member(modified, attidx);
- continue;
- }
-
- /*
- * No need to check attributes that can't be stored externally. Note
- * that system attributes can't be stored externally.
- */
- if (attrnum < 0 || isnull1 ||
- TupleDescCompactAttr(tupdesc, attrnum - 1)->attlen != -1)
- continue;
-
- /*
- * Check if the old tuple's attribute is stored externally and is a
- * member of external_cols.
- */
- if (VARATT_IS_EXTERNAL((varlena *) DatumGetPointer(value1)) &&
- bms_is_member(attidx, external_cols))
- *has_external = true;
+ modified_idx_attrs = bms_add_member(modified_idx_attrs, attidx);
}
- return modified;
+ bms_free(attrs);
+
+ return modified_idx_attrs;
}
/*
@@ -4567,17 +4624,112 @@ HeapDetermineColumnsInfo(Relation relation,
* via ereport().
*/
void
-simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tup,
+simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tuple,
TU_UpdateIndexes *update_indexes)
{
TM_Result result;
TM_FailureData tmfd;
LockTupleMode lockmode;
+ TupleTableSlot *slot;
+ BufferHeapTupleTableSlot *bslot;
+ HeapTuple oldtup;
+ bool shouldFree = true;
+ Bitmapset *idx_attrs,
+ *modified_idx_attrs;
+ bool hot_allowed,
+ summarized_only;
+ Buffer buffer;
- result = heap_update(relation, otid, tup,
- GetCurrentCommandId(true), InvalidSnapshot,
- true /* wait for commit */ ,
- &tmfd, &lockmode, update_indexes);
+ Assert(ItemPointerIsValid(otid));
+
+ /*
+ * Fetch this bitmap of interesting attributes from relcache before
+ * obtaining a buffer lock because if we are doing an update on one of the
+ * relevant system catalogs we could deadlock if we try to fetch them
+ * later on. Relcache will return copies of each bitmap, so we need not
+ * worry about relcache flush happening midway through this operation.
+ */
+ idx_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_INDEXED);
+
+ INJECTION_POINT("simple_heap_update-before-pin", NULL);
+
+ /*
+ * To update a heap tuple we need to find the set of modified indexed
+ * attributes ("modified_idx_attrs") and use that to determine if a HOT
+ * update is allowable or not. When updating heap tuples via execution of
+ * UPDATE statements this set is constructed before calling into the table
+ * AM's update function by ExecUpdateModifiedIdxAttrs() which compares the
+ * old/new TupleTableSlots.
+ *
+ * Here things are a bit different, we have the old TID and the new tuple,
+ * not two TupleTableSlots, but we still need to construct a similar bitmap
+ * so as to be able to know if HOT updates are allowed or not.
+ *
+ * To do that we first have to fetch the old tuple itself, but because
+ * heapam_fetch_row_version() is static, we replicate in part that code
+ * here.
+ *
+ * This is a bit repetitive because heap_update() will again find and form
+ * the old HeapTuple from the old TID and in most cases the callers
+ * (ignoring extensions, are always catalog tuple updates) already had the
+ * set of changed attributes (the "replaces" array), but for now this minor
+ * repetition of work is necessary.
+ */
+ slot = MakeTupleTableSlot(RelationGetDescr(relation), &TTSOpsBufferHeapTuple, 0);
+ bslot = (BufferHeapTupleTableSlot *) slot;
+
+ /*
+ * Set the TID in the slot and then fetch the old tuple so we can examine
+ * it
+ */
+ bslot->base.tupdata.t_self = *otid;
+ if (!heap_fetch(relation, SnapshotAny, &bslot->base.tupdata, &buffer, false))
+ {
+ /*
+ * heap_update() checks for !ItemIdIsNormal(lp) and will return false
+ * in those cases.
+ */
+ Assert(RelationSupportsSysCache(RelationGetRelid(relation)));
+
+ *update_indexes = TU_None;
+
+ /* modified_idx_attrs not yet initialized */
+ bms_free(idx_attrs);
+ ExecDropSingleTupleTableSlot(slot);
+
+ elog(ERROR, "tuple concurrently deleted");
+
+ return;
+ }
+
+ Assert(buffer != InvalidBuffer);
+
+ /* Store in slot, transferring existing pin */
+ ExecStorePinnedBufferHeapTuple(&bslot->base.tupdata, slot, buffer);
+ oldtup = ExecFetchSlotHeapTuple(slot, false, &shouldFree);
+
+ modified_idx_attrs = HeapUpdateModifiedIdxAttrs(relation, oldtup, tuple);
+ lockmode = HeapUpdateDetermineLockmode(relation, modified_idx_attrs);
+ hot_allowed = HeapUpdateHotAllowable(relation, modified_idx_attrs, &summarized_only);
+
+ result = heap_update(relation, otid, tuple, GetCurrentCommandId(true),
+ InvalidSnapshot, true /* wait for commit */ ,
+ &tmfd, lockmode, modified_idx_attrs, hot_allowed);
+
+ if (shouldFree)
+ heap_freetuple(oldtup);
+
+ ExecDropSingleTupleTableSlot(slot);
+ bms_free(idx_attrs);
+
+ /*
+ * Decide whether new index entries are needed for the tuple
+ *
+ * If the update is not HOT, we must update all indexes. If the update is
+ * HOT, it could be that we updated summarized columns, so we either
+ * update only summarized indexes, or none at all.
+ */
+ *update_indexes = TU_None;
switch (result)
{
case TM_SelfModified:
@@ -4587,6 +4739,10 @@ simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tup
case TM_Ok:
/* done successfully */
+ if (!HeapTupleIsHeapOnly(tuple))
+ *update_indexes = TU_All;
+ else if (summarized_only)
+ *update_indexes = TU_Summarizing;
break;
case TM_Updated:
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 253a735b6c1..3726c867c65 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -27,7 +27,6 @@
#include "access/syncscan.h"
#include "access/tableam.h"
#include "access/tsmapi.h"
-#include "access/visibilitymap.h"
#include "access/xact.h"
#include "catalog/catalog.h"
#include "catalog/index.h"
@@ -325,19 +324,26 @@ heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
static TM_Result
heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
CommandId cid, Snapshot snapshot, Snapshot crosscheck,
- bool wait, TM_FailureData *tmfd,
- LockTupleMode *lockmode, TU_UpdateIndexes *update_indexes)
+ bool wait, TM_FailureData *tmfd, LockTupleMode *lockmode,
+ const Bitmapset *modified_idx_attrs, TU_UpdateIndexes *update_indexes)
{
bool shouldFree = true;
HeapTuple tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);
+ bool hot_allowed;
+ bool summarized_only;
TM_Result result;
+ Assert(ItemPointerIsValid(otid));
+
+ hot_allowed = HeapUpdateHotAllowable(relation, modified_idx_attrs, &summarized_only);
+ *lockmode = HeapUpdateDetermineLockmode(relation, modified_idx_attrs);
+
/* Update the tuple with table oid */
slot->tts_tableOid = RelationGetRelid(relation);
tuple->t_tableOid = slot->tts_tableOid;
result = heap_update(relation, otid, tuple, cid, crosscheck, wait,
- tmfd, lockmode, update_indexes);
+ tmfd, *lockmode, modified_idx_attrs, hot_allowed);
ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
/*
@@ -350,16 +356,17 @@ heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
* HOT, it could be that we updated summarized columns, so we either
* update only summarized indexes, or none at all.
*/
- if (result != TM_Ok)
+ *update_indexes = TU_None;
+ if (result == TM_Ok)
{
- Assert(*update_indexes == TU_None);
- *update_indexes = TU_None;
+ if (HeapTupleIsHeapOnly(tuple))
+ {
+ if (summarized_only)
+ *update_indexes = TU_Summarizing;
+ }
+ else
+ *update_indexes = TU_All;
}
- else if (!HeapTupleIsHeapOnly(tuple))
- Assert(*update_indexes == TU_All);
- else
- Assert((*update_indexes == TU_Summarizing) ||
- (*update_indexes == TU_None));
if (shouldFree)
pfree(tuple);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index dfda1af412e..9ba72d51dfa 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -359,6 +359,7 @@ void
simple_table_tuple_update(Relation rel, ItemPointer otid,
TupleTableSlot *slot,
Snapshot snapshot,
+ const Bitmapset *modified_idx_attrs,
TU_UpdateIndexes *update_indexes)
{
TM_Result result;
@@ -369,7 +370,9 @@ simple_table_tuple_update(Relation rel, ItemPointer otid,
GetCurrentCommandId(true),
snapshot, InvalidSnapshot,
true /* wait for commit */ ,
- &tmfd, &lockmode, update_indexes);
+ &tmfd, &lockmode,
+ modified_idx_attrs,
+ update_indexes);
switch (result)
{
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 2497ee7edc5..74a7379186b 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -33,6 +33,7 @@
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
+#include "utils/relcache.h"
#include "utils/snapmgr.h"
#include "utils/syscache.h"
#include "utils/typcache.h"
@@ -906,6 +907,7 @@ ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
bool skip_tuple = false;
Relation rel = resultRelInfo->ri_RelationDesc;
ItemPointer tid = &(searchslot->tts_tid);
+ Bitmapset *modified_idx_attrs;
/*
* We support only non-system tables, with
@@ -944,8 +946,13 @@ ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
if (rel->rd_rel->relispartition)
ExecPartitionCheck(resultRelInfo, slot, estate, true);
+ modified_idx_attrs = ExecUpdateModifiedIdxAttrs(resultRelInfo,
+ estate, searchslot, slot);
+
simple_table_tuple_update(rel, tid, slot, estate->es_snapshot,
- &update_indexes);
+ modified_idx_attrs, &update_indexes);
+ bms_free(modified_idx_attrs);
+
conflictindexes = resultRelInfo->ri_onConflictArbiterIndexes;
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index b717b03b3d2..e8c5639b61e 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -66,6 +66,7 @@
#include "nodes/nodeFuncs.h"
#include "storage/bufmgr.h"
#include "utils/builtins.h"
+#include "utils/datum.h"
#include "utils/expandeddatum.h"
#include "utils/lsyscache.h"
#include "utils/typcache.h"
@@ -1999,6 +2000,75 @@ ExecFetchSlotHeapTupleDatum(TupleTableSlot *slot)
return ret;
}
+/*
+ * ExecCompareSlotAttrs
+ *
+ * Compare the subset of attributes in attrs bewtween TupleTableSlots to detect
+ * which attributes have changed.
+ *
+ * Returns a reused when possible Bitmapset of attribute indices (using
+ * FirstLowInvalidHeapAttributeNumber convention) that differ between the two
+ * slots.
+ */
+Bitmapset *
+ExecCompareSlotAttrs(Bitmapset *attrs, TupleDesc tupdesc,
+ TupleTableSlot *s1, TupleTableSlot *s2)
+{
+ int attidx = -1;
+
+ while ((attidx = bms_next_member(attrs, attidx)) >= 0)
+ {
+ /* attidx is zero-based, attrnum is the normal attribute number */
+ AttrNumber attrnum = attidx + FirstLowInvalidHeapAttributeNumber;
+ Datum value1,
+ value2;
+ bool null1,
+ null2;
+ CompactAttribute *att;
+
+ /*
+ * If it's a whole-tuple reference, say "not equal". It's not really
+ * worth supporting this case, since it could only succeed after a
+ * no-op update, which is hardly a case worth optimizing for.
+ */
+ if (attrnum == 0)
+ continue;
+
+ /*
+ * Likewise, automatically say "not equal" for any system attribute
+ * other than tableOID; we cannot expect these to be consistent in a
+ * HOT chain, or even to be set correctly yet in the new tuple.
+ */
+ if (attrnum < 0)
+ {
+ if (attrnum == TableOidAttributeNumber)
+ attrs = bms_del_member(attrs, attidx);
+ else
+ continue;
+ }
+
+ att = TupleDescCompactAttr(tupdesc, attrnum - 1);
+ value1 = slot_getattr(s1, attrnum, &null1);
+ value2 = slot_getattr(s2, attrnum, &null2);
+
+ /* A change to/from NULL, so not equal */
+ if (null1 != null2)
+ continue;
+
+ /* Both NULL, no change/unmodified */
+ if (null2)
+ {
+ attrs = bms_del_member(attrs, attidx);
+ continue;
+ }
+
+ if (datum_image_eq(value1, value2, att->attbyval, att->attlen))
+ attrs = bms_del_member(attrs, attidx);
+ }
+
+ return attrs;
+}
+
/* ----------------------------------------------------------------
* convenience initialization routines
* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 4cd5e262e0f..ea5058c7a37 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -17,6 +17,7 @@
* ExecModifyTable - retrieve the next tuple from the node
* ExecEndModifyTable - shut down the ModifyTable node
* ExecReScanModifyTable - rescan the ModifyTable node
+ * ExecUpdateModifiedIdxAttrs - find set of updated indexed columns
*
* NOTES
* The ModifyTable node receives input from its outerPlan, which is
@@ -55,6 +56,7 @@
#include "access/htup_details.h"
#include "access/tableam.h"
#include "access/tupconvert.h"
+#include "access/tupdesc.h"
#include "access/xact.h"
#include "commands/trigger.h"
#include "executor/execPartition.h"
@@ -190,6 +192,63 @@ static TupleTableSlot *ExecMergeNotMatched(ModifyTableContext *context,
ResultRelInfo *resultRelInfo,
bool canSetTag);
+/*
+ * ExecUpdateModifiedIdxAttrs
+ *
+ * Find the set of attributes referenced by this relation and used in this
+ * UPDATE that now differ in value. This is done by reviewing slot datum that
+ * are in the UPDATE statment and are known to be referenced by at least one
+ * index in some way. This set is called the "modified indexed attributes" or
+ * "modified_idx_attrs". An overlap of a single index's attributes and this
+ * modified_idx_attrs set signals that the attributes in the new_tts used to
+ * form the index datum have changed.
+ *
+ * Return a Bitmapset that contains the set of modified (changed) indexed
+ * attributes between oldtup and newtup.
+ *
+ * Note: There is a similar function called HeapUpdateModifiedIdxAttrs() that operates
+ * on the old TID and new HeapTuple rather than the old/new TupleTableSlots as
+ * this function does. These two functions should mirror one another until
+ * someday when catalog tuple updates track their changes avoiding the need to
+ * re-discover them in simple_heap_update().
+ */
+Bitmapset *
+ExecUpdateModifiedIdxAttrs(ResultRelInfo *resultRelInfo,
+ EState *estate,
+ TupleTableSlot *old_tts,
+ TupleTableSlot *new_tts)
+{
+ Relation relation = resultRelInfo->ri_RelationDesc;
+ TupleDesc tupdesc = RelationGetDescr(relation);
+ Bitmapset *attrs;
+
+ /* If no indexes, we're done */
+ if (resultRelInfo->ri_NumIndices == 0)
+ return NULL;
+
+ /*
+ * Get the set of all attributes across all indexes for this relation from
+ * the relcache, it returns us a copy of the bitmap so we can modify it.
+ *
+ * Note: We intentionally scan all indexed columns when looking for changes
+ * rather than reduce that set by intersecting it with
+ * ExecGetAllUpdatedCols(). Desipte the name it provides the set of
+ * targeted attributes in the SQL used for the UPDATE and any triggers, but
+ * that doesn't include any attributes updated using heap_modifiy_tuple().
+ * There is one test in tsearch.sql that does just that, modifies an
+ * indexed attribute that isn't specified in the SQL and so isn't present
+ * in that bitmapset.
+ */
+ attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_INDEXED);
+
+ /*
+ * When there are indexed attributes mentioned in the UPDATE then we need
+ * to find the subset that changed value. That's the "modified_idx_attrs".
+ */
+ attrs = ExecCompareSlotAttrs(attrs, tupdesc, old_tts, new_tts);
+
+ return attrs;
+}
/*
* Verify that the tuples to be produced by INSERT match the
@@ -2197,14 +2256,17 @@ ExecUpdatePrepareSlot(ResultRelInfo *resultRelInfo,
*/
static TM_Result
ExecUpdateAct(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
- ItemPointer tupleid, HeapTuple oldtuple, TupleTableSlot *slot,
- bool canSetTag, UpdateContext *updateCxt)
+ ItemPointer tupleid, HeapTuple oldtuple, TupleTableSlot *oldSlot,
+ TupleTableSlot *slot, bool canSetTag, UpdateContext *updateCxt)
{
EState *estate = context->estate;
Relation resultRelationDesc = resultRelInfo->ri_RelationDesc;
bool partition_constraint_failed;
TM_Result result;
+ /* The set of modified indexed attributes that trigger new index entries */
+ Bitmapset *modified_idx_attrs = NULL;
+
updateCxt->crossPartUpdate = false;
/*
@@ -2321,7 +2383,16 @@ lreplace:
ExecConstraints(resultRelInfo, slot, estate);
/*
- * replace the heap tuple
+ * Next up we need to find out the set of indexed attributes that have
+ * changed in value and should trigger a new index tuple. We could start
+ * with the set of updated columns via ExecGetUpdatedCols(), but if we do
+ * we will overlook attributes directly modified by heap_modify_tuple()
+ * which are not known to ExecGetUpdatedCols().
+ */
+ modified_idx_attrs = ExecUpdateModifiedIdxAttrs(resultRelInfo, estate, oldSlot, slot);
+
+ /*
+ * Call into the table AM to update the heap tuple.
*
* Note: if es_crosscheck_snapshot isn't InvalidSnapshot, we check that
* the row to be updated is visible to that snapshot, and throw a
@@ -2335,6 +2406,7 @@ lreplace:
estate->es_crosscheck_snapshot,
true /* wait for commit */ ,
&context->tmfd, &updateCxt->lockmode,
+ modified_idx_attrs,
&updateCxt->updateIndexes);
return result;
@@ -2557,8 +2629,8 @@ ExecUpdate(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
*/
redo_act:
lockedtid = *tupleid;
- result = ExecUpdateAct(context, resultRelInfo, tupleid, oldtuple, slot,
- canSetTag, &updateCxt);
+ result = ExecUpdateAct(context, resultRelInfo, tupleid, oldtuple, oldSlot,
+ slot, canSetTag, &updateCxt);
/*
* If ExecUpdateAct reports that a cross-partition update was done,
@@ -3408,8 +3480,8 @@ lmerge_matched:
Assert(oldtuple == NULL);
result = ExecUpdateAct(context, resultRelInfo, tupleid,
- NULL, newslot, canSetTag,
- &updateCxt);
+ NULL, resultRelInfo->ri_oldTupleSlot,
+ newslot, canSetTag, &updateCxt);
/*
* As in ExecUpdate(), if ExecUpdateAct() reports that a
@@ -4546,7 +4618,7 @@ ExecModifyTable(PlanState *pstate)
* For UPDATE/DELETE/MERGE, fetch the row identity info for the tuple
* to be updated/deleted/merged. For a heap relation, that's a TID;
* otherwise we may have a wholerow junk attr that carries the old
- * tuple in toto. Keep this in step with the part of
+ * tuple in total. Keep this in step with the part of
* ExecInitModifyTable that sets up ri_RowIdAttNo.
*/
if (operation == CMD_UPDATE || operation == CMD_DELETE ||
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 3a4f19e8d58..f2b7fb8f444 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -2469,7 +2469,7 @@ RelationDestroyRelation(Relation relation, bool remember_tupdesc)
bms_free(relation->rd_keyattr);
bms_free(relation->rd_pkattr);
bms_free(relation->rd_idattr);
- bms_free(relation->rd_hotblockingattr);
+ bms_free(relation->rd_indexedattr);
bms_free(relation->rd_summarizedattr);
if (relation->rd_pubdesc)
pfree(relation->rd_pubdesc);
@@ -5271,8 +5271,8 @@ RelationGetIndexPredicate(Relation relation)
* (beware: even if PK is deferrable!)
* INDEX_ATTR_BITMAP_IDENTITY_KEY Columns in the table's replica identity
* index (empty if FULL)
- * INDEX_ATTR_BITMAP_HOT_BLOCKING Columns that block updates from being HOT
- * INDEX_ATTR_BITMAP_SUMMARIZED Columns included in summarizing indexes
+ * INDEX_ATTR_BITMAP_INDEXED Columns referenced by indexes
+ * INDEX_ATTR_BITMAP_SUMMARIZED Columns only included in summarizing indexes
*
* Attribute numbers are offset by FirstLowInvalidHeapAttributeNumber so that
* we can include system attributes (e.g., OID) in the bitmap representation.
@@ -5295,8 +5295,8 @@ RelationGetIndexAttrBitmap(Relation relation, IndexAttrBitmapKind attrKind)
Bitmapset *uindexattrs; /* columns in unique indexes */
Bitmapset *pkindexattrs; /* columns in the primary index */
Bitmapset *idindexattrs; /* columns in the replica identity */
- Bitmapset *hotblockingattrs; /* columns with HOT blocking indexes */
- Bitmapset *summarizedattrs; /* columns with summarizing indexes */
+ Bitmapset *indexedattrs; /* columns referenced by indexes */
+ Bitmapset *summarizedattrs; /* columns only in summarizing indexes */
List *indexoidlist;
List *newindexoidlist;
Oid relpkindex;
@@ -5315,8 +5315,8 @@ RelationGetIndexAttrBitmap(Relation relation, IndexAttrBitmapKind attrKind)
return bms_copy(relation->rd_pkattr);
case INDEX_ATTR_BITMAP_IDENTITY_KEY:
return bms_copy(relation->rd_idattr);
- case INDEX_ATTR_BITMAP_HOT_BLOCKING:
- return bms_copy(relation->rd_hotblockingattr);
+ case INDEX_ATTR_BITMAP_INDEXED:
+ return bms_copy(relation->rd_indexedattr);
case INDEX_ATTR_BITMAP_SUMMARIZED:
return bms_copy(relation->rd_summarizedattr);
default:
@@ -5361,7 +5361,7 @@ restart:
uindexattrs = NULL;
pkindexattrs = NULL;
idindexattrs = NULL;
- hotblockingattrs = NULL;
+ indexedattrs = NULL;
summarizedattrs = NULL;
foreach(l, indexoidlist)
{
@@ -5421,7 +5421,7 @@ restart:
if (indexDesc->rd_indam->amsummarizing)
attrs = &summarizedattrs;
else
- attrs = &hotblockingattrs;
+ attrs = &indexedattrs;
/* Collect simple attribute references */
for (i = 0; i < indexDesc->rd_index->indnatts; i++)
@@ -5430,9 +5430,9 @@ restart:
/*
* Since we have covering indexes with non-key columns, we must
- * handle them accurately here. non-key columns must be added into
- * hotblockingattrs or summarizedattrs, since they are in index,
- * and update shouldn't miss them.
+ * handle them accurately here. Non-key columns must be added into
+ * indexedattrs or summarizedattrs, since they are in index, and
+ * update shouldn't miss them.
*
* Summarizing indexes do not block HOT, but do need to be updated
* when the column value changes, thus require a separate
@@ -5493,12 +5493,20 @@ restart:
bms_free(uindexattrs);
bms_free(pkindexattrs);
bms_free(idindexattrs);
- bms_free(hotblockingattrs);
+ bms_free(indexedattrs);
bms_free(summarizedattrs);
goto restart;
}
+ /*
+ * Record what attributes are only referenced by summarizing indexes. Then
+ * add that into the other indexed attributes to track all referenced
+ * attributes.
+ */
+ summarizedattrs = bms_del_members(summarizedattrs, indexedattrs);
+ indexedattrs = bms_add_members(indexedattrs, summarizedattrs);
+
/* Don't leak the old values of these bitmaps, if any */
relation->rd_attrsvalid = false;
bms_free(relation->rd_keyattr);
@@ -5507,8 +5515,8 @@ restart:
relation->rd_pkattr = NULL;
bms_free(relation->rd_idattr);
relation->rd_idattr = NULL;
- bms_free(relation->rd_hotblockingattr);
- relation->rd_hotblockingattr = NULL;
+ bms_free(relation->rd_indexedattr);
+ relation->rd_indexedattr = NULL;
bms_free(relation->rd_summarizedattr);
relation->rd_summarizedattr = NULL;
@@ -5523,7 +5531,7 @@ restart:
relation->rd_keyattr = bms_copy(uindexattrs);
relation->rd_pkattr = bms_copy(pkindexattrs);
relation->rd_idattr = bms_copy(idindexattrs);
- relation->rd_hotblockingattr = bms_copy(hotblockingattrs);
+ relation->rd_indexedattr = bms_copy(indexedattrs);
relation->rd_summarizedattr = bms_copy(summarizedattrs);
relation->rd_attrsvalid = true;
MemoryContextSwitchTo(oldcxt);
@@ -5537,8 +5545,8 @@ restart:
return pkindexattrs;
case INDEX_ATTR_BITMAP_IDENTITY_KEY:
return idindexattrs;
- case INDEX_ATTR_BITMAP_HOT_BLOCKING:
- return hotblockingattrs;
+ case INDEX_ATTR_BITMAP_INDEXED:
+ return indexedattrs;
case INDEX_ATTR_BITMAP_SUMMARIZED:
return summarizedattrs;
default:
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 2fdc50b865b..088097a9188 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -390,10 +390,9 @@ extern TM_Result heap_delete(Relation relation, const ItemPointerData *tid,
extern void heap_finish_speculative(Relation relation, const ItemPointerData *tid);
extern void heap_abort_speculative(Relation relation, const ItemPointerData *tid);
extern TM_Result heap_update(Relation relation, const ItemPointerData *otid,
- HeapTuple newtup,
- CommandId cid, Snapshot crosscheck, bool wait,
- TM_FailureData *tmfd, LockTupleMode *lockmode,
- TU_UpdateIndexes *update_indexes);
+ HeapTuple newtup, CommandId cid, Snapshot crosscheck, bool wait,
+ TM_FailureData *tmfd, const LockTupleMode lockmode,
+ const Bitmapset *modified_idx_attrs, const bool hot_allowed);
extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
CommandId cid, LockTupleMode mode, LockWaitPolicy wait_policy,
bool follow_updates,
@@ -456,6 +455,12 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
OffsetNumber *dead, int ndead,
OffsetNumber *unused, int nunused);
+/* in heap/heapam.c */
+extern bool HeapUpdateHotAllowable(Relation relation, const Bitmapset *modified_idx_attrs,
+ bool *summarized_only);
+extern LockTupleMode HeapUpdateDetermineLockmode(Relation relation,
+ const Bitmapset *modified_idx_attrs);
+
/* in heap/vacuumlazy.c */
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 06084752245..8ec20dcfc11 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -549,6 +549,7 @@ typedef struct TableAmRoutine
bool wait,
TM_FailureData *tmfd,
LockTupleMode *lockmode,
+ const Bitmapset *modified_idx_attrs,
TU_UpdateIndexes *update_indexes);
/* see table_tuple_lock() for reference about parameters */
@@ -1523,12 +1524,12 @@ static inline TM_Result
table_tuple_update(Relation rel, ItemPointer otid, TupleTableSlot *slot,
CommandId cid, Snapshot snapshot, Snapshot crosscheck,
bool wait, TM_FailureData *tmfd, LockTupleMode *lockmode,
- TU_UpdateIndexes *update_indexes)
+ const Bitmapset *modified_idx_attrs, TU_UpdateIndexes *update_indexes)
{
return rel->rd_tableam->tuple_update(rel, otid, slot,
cid, snapshot, crosscheck,
- wait, tmfd,
- lockmode, update_indexes);
+ wait, tmfd, lockmode,
+ modified_idx_attrs, update_indexes);
}
/*
@@ -2009,6 +2010,7 @@ extern void simple_table_tuple_delete(Relation rel, ItemPointer tid,
Snapshot snapshot);
extern void simple_table_tuple_update(Relation rel, ItemPointer otid,
TupleTableSlot *slot, Snapshot snapshot,
+ const Bitmapset *modified_idx_attrs,
TU_UpdateIndexes *update_indexes);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 064df01811e..713ed35d8cf 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -18,6 +18,7 @@
#include "datatype/timestamp.h"
#include "executor/execdesc.h"
#include "fmgr.h"
+#include "nodes/execnodes.h"
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
@@ -610,6 +611,10 @@ extern TupleDesc ExecCleanTypeFromTL(List *targetList);
extern TupleDesc ExecTypeFromExprList(List *exprList);
extern void ExecTypeSetColNames(TupleDesc typeInfo, List *namesList);
extern void UpdateChangedParamSet(PlanState *node, Bitmapset *newchg);
+extern Bitmapset *ExecCompareSlotAttrs(Bitmapset *attrs,
+ TupleDesc tupdesc,
+ TupleTableSlot *old_tts,
+ TupleTableSlot *new_tts);
typedef struct TupOutputState
{
@@ -807,5 +812,9 @@ extern ResultRelInfo *ExecLookupResultRelByOid(ModifyTableState *node,
Oid resultoid,
bool missing_ok,
bool update_cache);
+extern Bitmapset *ExecUpdateModifiedIdxAttrs(ResultRelInfo *relinfo,
+ EState *estate,
+ TupleTableSlot *old_tts,
+ TupleTableSlot *new_tts);
#endif /* EXECUTOR_H */
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 236830f6b93..11460e134f0 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -162,7 +162,7 @@ typedef struct RelationData
Bitmapset *rd_keyattr; /* cols that can be ref'd by foreign keys */
Bitmapset *rd_pkattr; /* cols included in primary key */
Bitmapset *rd_idattr; /* included in replica identity index */
- Bitmapset *rd_hotblockingattr; /* cols blocking HOT update */
+ Bitmapset *rd_indexedattr; /* all cols referenced by indexes */
Bitmapset *rd_summarizedattr; /* cols indexed by summarizing indexes */
PublicationDesc *rd_pubdesc; /* publication descriptor, or NULL */
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 2700224939a..d4db82496b4 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -69,7 +69,7 @@ typedef enum IndexAttrBitmapKind
INDEX_ATTR_BITMAP_KEY,
INDEX_ATTR_BITMAP_PRIMARY_KEY,
INDEX_ATTR_BITMAP_IDENTITY_KEY,
- INDEX_ATTR_BITMAP_HOT_BLOCKING,
+ INDEX_ATTR_BITMAP_INDEXED,
INDEX_ATTR_BITMAP_SUMMARIZED,
} IndexAttrBitmapKind;
diff --git a/src/test/modules/injection_points/expected/syscache-update-pruned.out b/src/test/modules/injection_points/expected/syscache-update-pruned.out
index a6a4e8db996..07ef67a1eb4 100644
--- a/src/test/modules/injection_points/expected/syscache-update-pruned.out
+++ b/src/test/modules/injection_points/expected/syscache-update-pruned.out
@@ -16,8 +16,8 @@ step wakeinval4:
step at2: <... completed>
step wakeinval4: <... completed>
step wakegrant4:
- SELECT FROM injection_points_detach('heap_update-before-pin');
- SELECT FROM injection_points_wakeup('heap_update-before-pin');
+ SELECT FROM injection_points_detach('simple_heap_update-before-pin');
+ SELECT FROM injection_points_wakeup('simple_heap_update-before-pin');
<waiting ...>
step grant1: <... completed>
ERROR: tuple concurrently deleted
@@ -42,8 +42,8 @@ step mkrels4:
SELECT FROM vactest.mkrels('intruder', 1, 100); -- repopulate LP_UNUSED
step wakegrant4:
- SELECT FROM injection_points_detach('heap_update-before-pin');
- SELECT FROM injection_points_wakeup('heap_update-before-pin');
+ SELECT FROM injection_points_detach('simple_heap_update-before-pin');
+ SELECT FROM injection_points_wakeup('simple_heap_update-before-pin');
<waiting ...>
step grant1: <... completed>
ERROR: duplicate key value violates unique constraint "pg_class_oid_index"
@@ -71,8 +71,8 @@ step at2: <... completed>
step wakeinval4: <... completed>
step at4: ALTER TABLE vactest.child50 INHERIT vactest.orig50;
step wakegrant4:
- SELECT FROM injection_points_detach('heap_update-before-pin');
- SELECT FROM injection_points_wakeup('heap_update-before-pin');
+ SELECT FROM injection_points_detach('simple_heap_update-before-pin');
+ SELECT FROM injection_points_wakeup('simple_heap_update-before-pin');
<waiting ...>
step grant1: <... completed>
step wakegrant4: <... completed>
diff --git a/src/test/modules/injection_points/specs/syscache-update-pruned.spec b/src/test/modules/injection_points/specs/syscache-update-pruned.spec
index e3a4295bd12..fef9ac895a1 100644
--- a/src/test/modules/injection_points/specs/syscache-update-pruned.spec
+++ b/src/test/modules/injection_points/specs/syscache-update-pruned.spec
@@ -103,7 +103,7 @@ session s1
setup {
SET debug_discard_caches = 0;
SELECT FROM injection_points_set_local();
- SELECT FROM injection_points_attach('heap_update-before-pin', 'wait');
+ SELECT FROM injection_points_attach('simple_heap_update-before-pin', 'wait');
}
step cachefill1 { SELECT FROM vactest.reloid_catcache_set('vactest.orig50'); }
step grant1 { GRANT SELECT ON vactest.orig50 TO PUBLIC; }
@@ -140,8 +140,8 @@ step mkrels4 {
SELECT FROM vactest.mkrels('intruder', 1, 100); -- repopulate LP_UNUSED
}
step wakegrant4 {
- SELECT FROM injection_points_detach('heap_update-before-pin');
- SELECT FROM injection_points_wakeup('heap_update-before-pin');
+ SELECT FROM injection_points_detach('simple_heap_update-before-pin');
+ SELECT FROM injection_points_wakeup('simple_heap_update-before-pin');
}
step at4 { ALTER TABLE vactest.child50 INHERIT vactest.orig50; }
step wakeinval4 {
diff --git a/src/test/regress/expected/generated_virtual.out b/src/test/regress/expected/generated_virtual.out
index 6dab60c937b..7ebb7890d96 100644
--- a/src/test/regress/expected/generated_virtual.out
+++ b/src/test/regress/expected/generated_virtual.out
@@ -287,7 +287,7 @@ DETAIL: Column "b" is a generated column.
INSERT INTO gtest1v VALUES (8, DEFAULT), (9, DEFAULT); -- error
ERROR: cannot insert a non-DEFAULT value into column "b"
DETAIL: Column "b" is a generated column.
-SELECT * FROM gtest1v;
+SELECT * FROM gtest1v ORDER BY a;
a | b
---+----
3 | 6
diff --git a/src/test/regress/expected/triggers.out b/src/test/regress/expected/triggers.out
index 98dee63b50a..ef98fd0cccf 100644
--- a/src/test/regress/expected/triggers.out
+++ b/src/test/regress/expected/triggers.out
@@ -959,16 +959,24 @@ NOTICE: main_view BEFORE UPDATE STATEMENT (before_view_upd_stmt)
NOTICE: main_view AFTER UPDATE STATEMENT (after_view_upd_stmt)
UPDATE 0
-- Delete from view using trigger
-DELETE FROM main_view WHERE a IN (20,21);
+DELETE FROM main_view WHERE a = 20 AND b = 31;
NOTICE: main_view BEFORE DELETE STATEMENT (before_view_del_stmt)
NOTICE: main_view INSTEAD OF DELETE ROW (instead_of_del)
-NOTICE: OLD: (21,10)
-NOTICE: main_view INSTEAD OF DELETE ROW (instead_of_del)
NOTICE: OLD: (20,31)
+NOTICE: main_view AFTER DELETE STATEMENT (after_view_del_stmt)
+DELETE 1
+DELETE FROM main_view WHERE a = 21 AND b = 10;
+NOTICE: main_view BEFORE DELETE STATEMENT (before_view_del_stmt)
+NOTICE: main_view INSTEAD OF DELETE ROW (instead_of_del)
+NOTICE: OLD: (21,10)
+NOTICE: main_view AFTER DELETE STATEMENT (after_view_del_stmt)
+DELETE 1
+DELETE FROM main_view WHERE a = 21 AND b = 32;
+NOTICE: main_view BEFORE DELETE STATEMENT (before_view_del_stmt)
NOTICE: main_view INSTEAD OF DELETE ROW (instead_of_del)
NOTICE: OLD: (21,32)
NOTICE: main_view AFTER DELETE STATEMENT (after_view_del_stmt)
-DELETE 3
+DELETE 1
DELETE FROM main_view WHERE a = 31 RETURNING a, b;
NOTICE: main_view BEFORE DELETE STATEMENT (before_view_del_stmt)
NOTICE: main_view INSTEAD OF DELETE ROW (instead_of_del)
diff --git a/src/test/regress/expected/tsearch.out b/src/test/regress/expected/tsearch.out
index 9287c440709..c604ec35fa5 100644
--- a/src/test/regress/expected/tsearch.out
+++ b/src/test/regress/expected/tsearch.out
@@ -2483,7 +2483,8 @@ SELECT to_tsquery('SKIES & My | booKs');
'sky' | 'book'
(1 row)
---trigger
+-- tsvector_update_trigger() uses heap_modify_tuple() to set column 'a'
+-- without going through the executor's SET-clause tracking.
CREATE TRIGGER tsvectorupdate
BEFORE UPDATE OR INSERT ON test_tsvector
FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger(a, 'pg_catalog.english', t);
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 9cea538b8e8..4877a1ddce9 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -372,15 +372,15 @@ INSERT INTO rw_view16 (a, b) VALUES (3, 'Row 3'); -- should be OK
UPDATE rw_view16 SET a=3, aa=-3 WHERE a=3; -- should fail
ERROR: multiple assignments to same column "a"
UPDATE rw_view16 SET aa=-3 WHERE a=3; -- should be OK
-SELECT * FROM base_tbl;
+SELECT * FROM base_tbl ORDER BY a;
a | b
----+--------
+ -3 | Row 3
-2 | Row -2
-1 | Row -1
0 | Row 0
1 | Row 1
2 | Row 2
- -3 | Row 3
(6 rows)
DELETE FROM rw_view16 WHERE a=-3; -- should be OK
diff --git a/src/test/regress/sql/generated_virtual.sql b/src/test/regress/sql/generated_virtual.sql
index e750866d2d8..877152d6d69 100644
--- a/src/test/regress/sql/generated_virtual.sql
+++ b/src/test/regress/sql/generated_virtual.sql
@@ -127,7 +127,7 @@ ALTER VIEW gtest1v ALTER COLUMN b SET DEFAULT 100;
INSERT INTO gtest1v VALUES (8, DEFAULT); -- error
INSERT INTO gtest1v VALUES (8, DEFAULT), (9, DEFAULT); -- error
-SELECT * FROM gtest1v;
+SELECT * FROM gtest1v ORDER BY a;
DELETE FROM gtest1v WHERE a >= 5;
DROP VIEW gtest1v;
diff --git a/src/test/regress/sql/triggers.sql b/src/test/regress/sql/triggers.sql
index ea39817ee3d..6ceb61608ae 100644
--- a/src/test/regress/sql/triggers.sql
+++ b/src/test/regress/sql/triggers.sql
@@ -660,7 +660,9 @@ UPDATE main_view SET b = 32 WHERE a = 21 AND b = 31 RETURNING a, b;
UPDATE main_view SET b = 0 WHERE false;
-- Delete from view using trigger
-DELETE FROM main_view WHERE a IN (20,21);
+DELETE FROM main_view WHERE a = 20 AND b = 31;
+DELETE FROM main_view WHERE a = 21 AND b = 10;
+DELETE FROM main_view WHERE a = 21 AND b = 32;
DELETE FROM main_view WHERE a = 31 RETURNING a, b;
\set QUIET true
diff --git a/src/test/regress/sql/tsearch.sql b/src/test/regress/sql/tsearch.sql
index dc74aa0c889..77ac5fd3c5a 100644
--- a/src/test/regress/sql/tsearch.sql
+++ b/src/test/regress/sql/tsearch.sql
@@ -752,7 +752,8 @@ SELECT to_tsvector('SKIES My booKs');
SELECT plainto_tsquery('SKIES My booKs');
SELECT to_tsquery('SKIES & My | booKs');
---trigger
+-- tsvector_update_trigger() uses heap_modify_tuple() to set column 'a'
+-- without going through the executor's SET-clause tracking.
CREATE TRIGGER tsvectorupdate
BEFORE UPDATE OR INSERT ON test_tsvector
FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger(a, 'pg_catalog.english', t);
diff --git a/src/test/regress/sql/updatable_views.sql b/src/test/regress/sql/updatable_views.sql
index 1635adde2d4..160e7799715 100644
--- a/src/test/regress/sql/updatable_views.sql
+++ b/src/test/regress/sql/updatable_views.sql
@@ -125,7 +125,7 @@ INSERT INTO rw_view16 VALUES (3, 'Row 3', 3); -- should fail
INSERT INTO rw_view16 (a, b) VALUES (3, 'Row 3'); -- should be OK
UPDATE rw_view16 SET a=3, aa=-3 WHERE a=3; -- should fail
UPDATE rw_view16 SET aa=-3 WHERE a=3; -- should be OK
-SELECT * FROM base_tbl;
+SELECT * FROM base_tbl ORDER BY a;
DELETE FROM rw_view16 WHERE a=-3; -- should be OK
-- Read-only views
INSERT INTO ro_view17 VALUES (3, 'ROW 3');
--
2.51.2
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: Expanding HOT updates for expression and partial indexes
2026-02-16 19:36 Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-02-17 21:15 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-19 20:43 ` Re: Expanding HOT updates for expression and partial indexes Andres Freund <[email protected]>
2026-02-19 22:31 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-23 19:23 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-25 21:03 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-02-26 22:08 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-26 23:01 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-02 19:08 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-11 15:51 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-12 20:33 ` Re: Expanding HOT updates for expression and partial indexes Nathan Bossart <[email protected]>
2026-03-12 21:31 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-15 21:11 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-03-16 16:23 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-16 17:55 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-03-17 16:38 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-03-17 18:04 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
@ 2026-03-23 18:39 ` Nathan Bossart <[email protected]>
2026-03-24 18:02 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
0 siblings, 1 reply; 24+ messages in thread
From: Nathan Bossart @ 2026-03-23 18:39 UTC (permalink / raw)
To: Greg Burd <[email protected]>; +Cc: Jeff Davis <[email protected]>; pgsql-hackers
Thanks for the new patch. As a general note, please be sure to run
pgindent on patches. My review is still rather surface-level, sorry.
On Tue, Mar 17, 2026 at 02:04:11PM -0400, Greg Burd wrote:
> - id_attrs = RelationGetIndexAttrBitmap(relation,
> - INDEX_ATTR_BITMAP_IDENTITY_KEY);
> [...]
> + rid_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_IDENTITY_KEY);
I'm nitpicking, but it took me a while to parse the
replica-identity-related code in heap_update() until I discovered that this
variable was renamed. I think we ought to leave the name alone.
> /*
> * At this point newbuf and buffer are both pinned and locked, and newbuf
> - * has enough space for the new tuple. If they are the same buffer, only
> - * one pin is held.
> + * has enough space for the new tuple so we can use the HOT update path if
> + * the caller determined that it is allowable.
> + *
> + * NOTE: If newbuf == buffer then only one pin is held.
> */
> -
> if (newbuf == buffer)
Sorry, more nitpicks. In addition to the unnecessary removal of the blank
line, I'm not sure the changes to this comment are needed.
> - /*
> - * If it is a HOT update, the update may still need to update summarized
> - * indexes, lest we fail to update those summaries and get incorrect
> - * results (for example, minmax bounds of the block may change with this
> - * update).
> - */
> - if (use_hot_update)
> - {
> - if (summarized_update)
> - *update_indexes = TU_Summarizing;
> - else
> - *update_indexes = TU_None;
> - }
> - else
> - *update_indexes = TU_All;
So, the "HOT but still need to update summarized indexes" code has been
moved from heap_update() to HeapUpdateHotAllowable(), which is called by
heap_update()'s callers (i.e., simple_heap_update() and
heapam_tuple_update()). That looks correct to me at a glance.
> -simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tup,
> +simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tuple,
nitpick: This variable name change looks unnecessary.
> @@ -944,8 +946,13 @@ ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
> if (rel->rd_rel->relispartition)
> ExecPartitionCheck(resultRelInfo, slot, estate, true);
>
> + modified_idx_attrs = ExecUpdateModifiedIdxAttrs(resultRelInfo,
> + estate, searchslot, slot);
> +
> simple_table_tuple_update(rel, tid, slot, estate->es_snapshot,
> - &update_indexes);
> + modified_idx_attrs, &update_indexes);
> + bms_free(modified_idx_attrs);
I don't know how constructive of a comment this is, but this change in
particular seems quite out of place. It feels odd to me that we expect
callers of simple_table_tuple_update() to determine the
modified-index-attributes. I guess I'm confused why this work doesn't
belong one level down, i.e., in the tuple_update function.
> - * INDEX_ATTR_BITMAP_SUMMARIZED Columns included in summarizing indexes
> + * INDEX_ATTR_BITMAP_INDEXED Columns referenced by indexes
> + * INDEX_ATTR_BITMAP_SUMMARIZED Columns only included in summarizing indexes
> - Bitmapset *summarizedattrs; /* columns with summarizing indexes */
> + Bitmapset *indexedattrs; /* columns referenced by indexes */
> + Bitmapset *summarizedattrs; /* columns only in summarizing indexes */
As before, the comment changes for the summarized-attr-related stuff seem
unnecessary.
> if (indexDesc->rd_indam->amsummarizing)
> attrs = &summarizedattrs;
> else
> - attrs = &hotblockingattrs;
> + attrs = &indexedattrs;
> + /*
> + * Record what attributes are only referenced by summarizing indexes. Then
> + * add that into the other indexed attributes to track all referenced
> + * attributes.
> + */
> + summarizedattrs = bms_del_members(summarizedattrs, indexedattrs);
> + indexedattrs = bms_add_members(indexedattrs, summarizedattrs);
The difference between hotblockingattrs and indexedattrs seems quite
subtle. Am I understanding correctly that indexedattrs is essentially just
hotblockingattrs + summarizedattrs? And that this is all meant for
INDEX_ATTR_BITMAP_INDEXED?
- INJECTION_POINT("heap_update-before-pin", NULL);
+ INJECTION_POINT("simple_heap_update-before-pin", NULL);
Why was this changed in heap_update()?
--
nathan
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: Expanding HOT updates for expression and partial indexes
2026-02-16 19:36 Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-02-17 21:15 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-19 20:43 ` Re: Expanding HOT updates for expression and partial indexes Andres Freund <[email protected]>
2026-02-19 22:31 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-23 19:23 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-25 21:03 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-02-26 22:08 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-26 23:01 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-02 19:08 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-11 15:51 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-12 20:33 ` Re: Expanding HOT updates for expression and partial indexes Nathan Bossart <[email protected]>
2026-03-12 21:31 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-15 21:11 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-03-16 16:23 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-16 17:55 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-03-17 16:38 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-03-17 18:04 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-23 18:39 ` Re: Expanding HOT updates for expression and partial indexes Nathan Bossart <[email protected]>
@ 2026-03-24 18:02 ` Greg Burd <[email protected]>
2026-03-24 19:44 ` Re: Expanding HOT updates for expression and partial indexes Nathan Bossart <[email protected]>
0 siblings, 1 reply; 24+ messages in thread
From: Greg Burd @ 2026-03-24 18:02 UTC (permalink / raw)
To: Nathan Bossart <[email protected]>; +Cc: Jeff Davis <[email protected]>; pgsql-hackers
On Mon, Mar 23, 2026, at 2:39 PM, Nathan Bossart wrote:
> Thanks for the new patch. As a general note, please be sure to run
> pgindent on patches. My review is still rather surface-level, sorry.
Hello Nathan,
Thanks for continuing to review my work. I appreciate your time. I do run pgindent on all patches, maybe something slipped it. Apologies if that's the case. :)
> On Tue, Mar 17, 2026 at 02:04:11PM -0400, Greg Burd wrote:
>> - id_attrs = RelationGetIndexAttrBitmap(relation,
>> - INDEX_ATTR_BITMAP_IDENTITY_KEY);
>> [...]
>> + rid_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_IDENTITY_KEY);
>
> I'm nitpicking, but it took me a while to parse the
> replica-identity-related code in heap_update() until I discovered that this
> variable was renamed. I think we ought to leave the name alone.
Okay, reverted to "id_attrs".
>> /*
>> * At this point newbuf and buffer are both pinned and locked, and newbuf
>> - * has enough space for the new tuple. If they are the same buffer, only
>> - * one pin is held.
>> + * has enough space for the new tuple so we can use the HOT update path if
>> + * the caller determined that it is allowable.
>> + *
>> + * NOTE: If newbuf == buffer then only one pin is held.
>> */
>> -
>> if (newbuf == buffer)
>
> Sorry, more nitpicks. In addition to the unnecessary removal of the blank
> line, I'm not sure the changes to this comment are needed.
Okay, reverted to earlier comment and blank line re-inserted. :)
>> - /*
>> - * If it is a HOT update, the update may still need to update summarized
>> - * indexes, lest we fail to update those summaries and get incorrect
>> - * results (for example, minmax bounds of the block may change with this
>> - * update).
>> - */
>> - if (use_hot_update)
>> - {
>> - if (summarized_update)
>> - *update_indexes = TU_Summarizing;
>> - else
>> - *update_indexes = TU_None;
>> - }
>> - else
>> - *update_indexes = TU_All;
>
> So, the "HOT but still need to update summarized indexes" code has been
> moved from heap_update() to HeapUpdateHotAllowable(), which is called by
> heap_update()'s callers (i.e., simple_heap_update() and
> heapam_tuple_update()). That looks correct to me at a glance.
Yes, that's indeed what that is.
>> -simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tup,
>> +simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tuple,
>
> nitpick: This variable name change looks unnecessary.
Okay, reverted to "tup".
>> @@ -944,8 +946,13 @@ ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
>> if (rel->rd_rel->relispartition)
>> ExecPartitionCheck(resultRelInfo, slot, estate, true);
>>
>> + modified_idx_attrs = ExecUpdateModifiedIdxAttrs(resultRelInfo,
>> + estate, searchslot, slot);
>> +
>> simple_table_tuple_update(rel, tid, slot, estate->es_snapshot,
>> - &update_indexes);
>> + modified_idx_attrs, &update_indexes);
>> + bms_free(modified_idx_attrs);
>
> I don't know how constructive of a comment this is, but this change in
> particular seems quite out of place. It feels odd to me that we expect
> callers of simple_table_tuple_update() to determine the
> modified-index-attributes. I guess I'm confused why this work doesn't
> belong one level down, i.e., in the tuple_update function.
Problem is that simple_table_tuple_update() has the old TID and the new slot, but not the old slot (searchslot), so I'd have to change the signature of that function either way. Passing modified_idx_attrs is the new pattern, so I am just reusing that here.
I could replicate what's in simple_heap_update() and call HeapUpdateModifiedIdxAttrs() after re-constructing the HeapTuple, but that feels very ugly/unnecessary to me given that the caller has that information already in slot form.
I've left this as is, but I'm happy to continue discussing options.
>> - * INDEX_ATTR_BITMAP_SUMMARIZED Columns included in summarizing indexes
>> + * INDEX_ATTR_BITMAP_INDEXED Columns referenced by indexes
>> + * INDEX_ATTR_BITMAP_SUMMARIZED Columns only included in summarizing indexes
>
>> - Bitmapset *summarizedattrs; /* columns with summarizing indexes */
>> + Bitmapset *indexedattrs; /* columns referenced by indexes */
>> + Bitmapset *summarizedattrs; /* columns only in summarizing indexes */
>
> As before, the comment changes for the summarized-attr-related stuff seem
> unnecessary.
I disagree, the "only" is required to highlight the logic change here. Before this patch summarized attrs could overlap with indexed attrs, now it should not. This makes the logic a bit easier later in HeapUpdateHotAllowable().
>> if (indexDesc->rd_indam->amsummarizing)
>> attrs = &summarizedattrs;
>> else
>> - attrs = &hotblockingattrs;
>> + attrs = &indexedattrs;
>
>> + /*
>> + * Record what attributes are only referenced by summarizing indexes. Then
>> + * add that into the other indexed attributes to track all referenced
>> + * attributes.
>> + */
>> + summarizedattrs = bms_del_members(summarizedattrs, indexedattrs);
>> + indexedattrs = bms_add_members(indexedattrs, summarizedattrs);
>
> The difference between hotblockingattrs and indexedattrs seems quite
> subtle.
I feel it was *much* more subtle before and mis-named ("hot blocking"). But, let's review. On master today in heapam.c heap_update() near the start and before the buffer lock there is the following:
hot_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_HOT_BLOCKING);
sum_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_SUMMARIZED);
key_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_KEY);
id_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_IDENTITY_KEY);
It turns out that hot_attrs includes all INDEX_ATTR_BITMAP_IDENTITY_KEY and INDEX_ATTR_BITMAP_IDENTITY_KEY except for those found when scanning a summarizing index. So, what comes next is a bit wasteful.
interesting_attrs = NULL;
interesting_attrs = bms_add_members(interesting_attrs, hot_attrs);
interesting_attrs = bms_add_members(interesting_attrs, sum_attrs);
interesting_attrs = bms_add_members(interesting_attrs, key_attrs); <- unnecessary
interesting_attrs = bms_add_members(interesting_attrs, id_attrs); <- unnecessary
And that in the end, what's passed to HeapDetermineColumnsInfo() is all indexed attributes, including summarized, and those found in expressions. That function then reduces the set from "interesting" to "modified (and indexed is implied)". It does this by testing before/after datum for equality (memcmp() via datumIsEqual()) and that becomes our "modified_attrs" set used for lockmode and HOT eligibility tests.
When testing later on (on master) for the HOT or NOT decision the code:
if (newbuf == buffer)
{
// first test to see if any modified/indexed attributes are used
// by non-summarizing indexes
if (!bms_overlap(modified_attrs, hot_attrs))
{
// and if not, we're going HOT
use_hot_update = true;
// at this point if there is any overlap it means that the only
// attributes that might be referenced by an index and modified
// are summarizing, there can't be any non-summarizing attributes
// in the modified_attrs set otherwise our first test would have
// failed, so this tests for the "only summarizing" case
if (bms_overlap(modified_attrs, sum_attrs))
only_summarized = true;
}
}
My thinking was, why re-create this every update? Why not have the cached representation of these bitmaps have what's needed?
Now, I've changed the logic in this patch. First in the executor nodeModifyTable.c ExecUpdateModifiedIdxAttrs() identify which indexed attributes were modified (changed value):
// get all attributes indexed on a relation, including summarized
// note how we no longer construct "interesting_attrs" from a number
// of bitmaps, the map we want is the map we cached and the name matches
// the content, *all* indexed attributes (not indexed, but not summarized)
attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_INDEXED);
// compare the old/new reducing the set to only those that changed
// as determined by datum_is_equal() to produce the modified/indexed
// attribute set
attrs = ExecCompareSlotAttrs(attrs, tupdesc, old_tts, new_tts);
Then in heapam_handler.c heapam_tuple_update():
// call our helper function
hot_allowed = HeapUpdateHotAllowable(relation, modified_idx_attrs, &summarized_only);
HeapUpdateHotAllowable()
{
// if no indexed attributes were modified, we're done
if (bms_is_empty(modified_idx_attrs))
return true;
else
{
// now we need the *only* summarized attributes
Bitmapset *sum_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_SUMMARIZED);
// if the modified set is a sumset of the summarized,
// we're only updating summarized
if (bms_is_subset(modified_idx_attrs, sum_attrs))
{
hot_allowed = true;
*summarized_only = true;
}
else
// at least one attribute is modified, referenced by an index
// that isn't summarizing, HOT isn't allowed
hot_allowed = false;
bms_free(sum_attrs);
}
}
So, we go from 3 calls to RelationGetIndexAttrBitmap() to 1, or at most 2 when there's a summarizing index (which is frequently the case).
This feels more logical, cleaner, and has less overhead but supports the same HOT logic.
> Am I understanding correctly that indexedattrs is essentially just
> hotblockingattrs + summarizedattrs? And that this is all meant for
> INDEX_ATTR_BITMAP_INDEXED?
>
> - INJECTION_POINT("heap_update-before-pin", NULL);
> + INJECTION_POINT("simple_heap_update-before-pin", NULL);
>
> Why was this changed in heap_update()?
Oops, that's a mistake. Fixed it.
> --
> nathan
Thanks for your review, v38 attached.
best.
-greg
Attachments:
[text/x-patch] v38-0001-Add-tests-to-cover-a-variety-of-heap-HOT-update-.patch (45.3K, 2-v38-0001-Add-tests-to-cover-a-variety-of-heap-HOT-update-.patch)
download | inline diff:
From eb74d10bdb90c35be7c02a7585195af951518ad7 Mon Sep 17 00:00:00 2001
From: Greg Burd <[email protected]>
Date: Tue, 10 Mar 2026 09:28:15 -0400
Subject: [PATCH v38 1/2] Add tests to cover a variety of heap HOT update
behaviors
This commit introduces test infrastructure for verifying Heap-Only Tuple
(HOT) update functionality in PostgreSQL. It provides a baseline for
demonstrating and validating HOT update behavior.
Regression tests:
- Basic HOT vs non-HOT update decisions
- All-or-none property for multiple indexes
- Partial indexes and predicate handling
- BRIN (summarizing) indexes allowing HOT updates
- TOAST column handling with HOT
- Unique constraints behavior
- Multi-column indexes
- Partitioned table HOT updates
Isolation tests:
- HOT chain formation and maintenance
- Concurrent HOT update scenarios
- Index scan behavior with HOT chains
---
src/test/regress/expected/hot_updates.out | 745 ++++++++++++++++++++++
src/test/regress/parallel_schedule | 5 +
src/test/regress/sql/hot_updates.sql | 605 ++++++++++++++++++
3 files changed, 1355 insertions(+)
create mode 100644 src/test/regress/expected/hot_updates.out
create mode 100644 src/test/regress/sql/hot_updates.sql
diff --git a/src/test/regress/expected/hot_updates.out b/src/test/regress/expected/hot_updates.out
new file mode 100644
index 00000000000..273fe3310da
--- /dev/null
+++ b/src/test/regress/expected/hot_updates.out
@@ -0,0 +1,745 @@
+--
+-- HOT_UPDATES
+-- Test Heap-Only Tuple (HOT) update decisions
+--
+-- This test systematically verifies that HOT updates are used when appropriate
+-- and avoided when necessary (e.g., when indexed columns are modified).
+--
+-- We use multiple validation methods:
+-- 1. Statistics functions (pg_stat_get_tuples_hot_updated)
+-- 2. pageinspect extension for HOT chain examination
+-- 3. EXPLAIN to verify index usage after updates
+--
+-- Load required extensions
+CREATE EXTENSION IF NOT EXISTS pageinspect;
+-- Function to get HOT update count
+CREATE OR REPLACE FUNCTION get_hot_count(rel_name text)
+RETURNS TABLE (
+ updates BIGINT,
+ hot BIGINT
+) AS $$
+DECLARE
+ rel_oid oid;
+BEGIN
+ rel_oid := rel_name::regclass::oid;
+
+ -- Read both committed and transaction-local stats
+ -- In autocommit mode (default for regression tests), this works correctly
+ -- Note: In explicit transactions (BEGIN/COMMIT), committed stats already
+ -- include flushed updates, so this would double-count. For explicit
+ -- transaction testing, call pg_stat_force_next_flush() before this function.
+ updates := COALESCE(pg_stat_get_tuples_updated(rel_oid), 0) +
+ COALESCE(pg_stat_get_xact_tuples_updated(rel_oid), 0);
+ hot := COALESCE(pg_stat_get_tuples_hot_updated(rel_oid), 0) +
+ COALESCE(pg_stat_get_xact_tuples_hot_updated(rel_oid), 0);
+
+ RETURN NEXT;
+END;
+$$ LANGUAGE plpgsql;
+-- Check if a tuple is part of a HOT chain (has a predecessor on same page)
+CREATE OR REPLACE FUNCTION has_hot_chain(rel_name text, target_ctid tid)
+RETURNS boolean AS $$
+DECLARE
+ block_num int;
+ page_item record;
+BEGIN
+ block_num := (target_ctid::text::point)[0]::int;
+
+ -- Look for a different tuple on the same page that points to our target tuple
+ FOR page_item IN
+ SELECT lp, lp_flags, t_ctid
+ FROM heap_page_items(get_raw_page(rel_name, block_num))
+ WHERE lp_flags = 1
+ AND t_ctid IS NOT NULL
+ AND t_ctid = target_ctid
+ AND ('(' || block_num::text || ',' || lp::text || ')')::tid != target_ctid
+ LOOP
+ RETURN true;
+ END LOOP;
+
+ RETURN false;
+END;
+$$ LANGUAGE plpgsql;
+-- Print the HOT chain starting from a given tuple
+CREATE OR REPLACE FUNCTION print_hot_chain(rel_name text, start_ctid tid)
+RETURNS TABLE(chain_position int, ctid tid, lp_flags text, t_ctid tid, chain_end boolean) AS
+$$
+#variable_conflict use_column
+DECLARE
+ block_num int;
+ line_ptr int;
+ current_ctid tid := start_ctid;
+ next_ctid tid;
+ position int := 0;
+ max_iterations int := 100;
+ page_item record;
+ found_predecessor boolean := false;
+ flags_name text;
+BEGIN
+ block_num := (start_ctid::text::point)[0]::int;
+
+ -- Find the predecessor (old tuple pointing to our start_ctid)
+ FOR page_item IN
+ SELECT lp, lp_flags, t_ctid
+ FROM heap_page_items(get_raw_page(rel_name, block_num))
+ WHERE lp_flags = 1
+ AND t_ctid = start_ctid
+ LOOP
+ current_ctid := ('(' || block_num::text || ',' || page_item.lp::text || ')')::tid;
+ found_predecessor := true;
+ EXIT;
+ END LOOP;
+
+ -- If no predecessor found, start with the given ctid
+ IF NOT found_predecessor THEN
+ current_ctid := start_ctid;
+ END IF;
+
+ -- Follow the chain forward
+ WHILE position < max_iterations LOOP
+ line_ptr := (current_ctid::text::point)[1]::int;
+
+ FOR page_item IN
+ SELECT lp, lp_flags, t_ctid
+ FROM heap_page_items(get_raw_page(rel_name, block_num))
+ WHERE lp = line_ptr
+ LOOP
+ -- Map lp_flags to names
+ flags_name := CASE page_item.lp_flags
+ WHEN 0 THEN 'unused (0)'
+ WHEN 1 THEN 'normal (1)'
+ WHEN 2 THEN 'redirect (2)'
+ WHEN 3 THEN 'dead (3)'
+ ELSE 'unknown (' || page_item.lp_flags::text || ')'
+ END;
+
+ RETURN QUERY SELECT
+ position,
+ current_ctid,
+ flags_name,
+ page_item.t_ctid,
+ (page_item.t_ctid IS NULL OR page_item.t_ctid = current_ctid)::boolean
+ ;
+
+ IF page_item.t_ctid IS NULL OR page_item.t_ctid = current_ctid THEN
+ RETURN;
+ END IF;
+
+ next_ctid := page_item.t_ctid;
+
+ IF (next_ctid::text::point)[0]::int != block_num THEN
+ RETURN;
+ END IF;
+
+ current_ctid := next_ctid;
+ position := position + 1;
+ END LOOP;
+
+ IF position = 0 THEN
+ RETURN;
+ END IF;
+ END LOOP;
+END;
+$$ LANGUAGE plpgsql;
+-- Basic HOT update (update non-indexed column)
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ indexed_col int,
+ non_indexed_col text
+) WITH (fillfactor = 50);
+CREATE INDEX hot_test_indexed_idx ON hot_test(indexed_col);
+INSERT INTO hot_test VALUES (1, 100, 'initial');
+INSERT INTO hot_test VALUES (2, 200, 'initial');
+INSERT INTO hot_test VALUES (3, 300, 'initial');
+-- Get baseline
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 0 | 0
+(1 row)
+
+-- Should be HOT updates (only non-indexed column modified)
+UPDATE hot_test SET non_indexed_col = 'updated1' WHERE id = 1;
+UPDATE hot_test SET non_indexed_col = 'updated2' WHERE id = 2;
+UPDATE hot_test SET non_indexed_col = 'updated3' WHERE id = 3;
+-- Verify HOT updates occurred
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 3 | 3
+(1 row)
+
+-- Dump the HOT chain before VACUUMing
+WITH current_tuple AS (
+ SELECT ctid FROM hot_test WHERE id = 1
+)
+SELECT
+ has_hot_chain('hot_test', current_tuple.ctid) AS has_chain,
+ chain_position,
+ print_hot_chain.ctid,
+ lp_flags,
+ t_ctid
+FROM current_tuple,
+LATERAL print_hot_chain('hot_test', current_tuple.ctid);
+ has_chain | chain_position | ctid | lp_flags | t_ctid
+-----------+----------------+-------+------------+--------
+ t | 0 | (0,1) | normal (1) | (0,4)
+ t | 1 | (0,4) | normal (1) | (0,4)
+(2 rows)
+
+-- Vacuum the relation, expect the HOT chain to collapse
+VACUUM hot_test;
+-- Show that there is no chain after vacuum
+WITH current_tuple AS (
+ SELECT ctid FROM hot_test WHERE id = 1
+)
+SELECT
+ has_hot_chain('hot_test', current_tuple.ctid) AS has_chain,
+ chain_position,
+ print_hot_chain.ctid,
+ lp_flags,
+ t_ctid
+FROM current_tuple,
+LATERAL print_hot_chain('hot_test', current_tuple.ctid);
+ has_chain | chain_position | ctid | lp_flags | t_ctid
+-----------+----------------+-------+------------+--------
+ f | 0 | (0,4) | normal (1) | (0,4)
+(1 row)
+
+-- Non-HOT update (update indexed column)
+UPDATE hot_test SET indexed_col = 150 WHERE id = 1;
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 4 | 3
+(1 row)
+
+-- Verify index was updated (new value findable)
+SET enable_seqscan = off;
+EXPLAIN (COSTS OFF) SELECT id, indexed_col FROM hot_test WHERE indexed_col = 150;
+ QUERY PLAN
+---------------------------------------------------
+ Index Scan using hot_test_indexed_idx on hot_test
+ Index Cond: (indexed_col = 150)
+(2 rows)
+
+SELECT id, indexed_col FROM hot_test WHERE indexed_col = 150;
+ id | indexed_col
+----+-------------
+ 1 | 150
+(1 row)
+
+-- Verify old value no longer in index
+EXPLAIN (COSTS OFF) SELECT id FROM hot_test WHERE indexed_col = 100;
+ QUERY PLAN
+---------------------------------------------------
+ Index Scan using hot_test_indexed_idx on hot_test
+ Index Cond: (indexed_col = 100)
+(2 rows)
+
+SELECT id FROM hot_test WHERE indexed_col = 100;
+ id
+----
+(0 rows)
+
+RESET enable_seqscan;
+-- All-or-none property: updating one indexed column requires ALL index updates
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ col_a int,
+ col_b int,
+ col_c int,
+ non_indexed text
+) WITH (fillfactor = 50);
+CREATE INDEX hot_test_a_idx ON hot_test(col_a);
+CREATE INDEX hot_test_b_idx ON hot_test(col_b);
+CREATE INDEX hot_test_c_idx ON hot_test(col_c);
+INSERT INTO hot_test VALUES (1, 10, 20, 30, 'initial');
+-- Update only col_a - should NOT be HOT because an indexed column changed
+-- This means ALL indexes must be updated (all-or-none property)
+UPDATE hot_test SET col_a = 15 WHERE id = 1;
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 1 | 0
+(1 row)
+
+-- Now update only non-indexed column - should be HOT
+UPDATE hot_test SET non_indexed = 'updated';
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 2 | 1
+(1 row)
+
+-- Partial index: both old and new outside predicate (conservative = non-HOT)
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ status text,
+ data text
+) WITH (fillfactor = 50);
+-- Partial index only covers status = 'active'
+CREATE INDEX hot_test_active_idx ON hot_test(status) WHERE status = 'active';
+INSERT INTO hot_test VALUES (1, 'active', 'data1');
+INSERT INTO hot_test VALUES (2, 'inactive', 'data2');
+INSERT INTO hot_test VALUES (3, 'deleted', 'data3');
+-- Update non-indexed column on 'active' row (in predicate, status unchanged)
+-- Should be HOT
+UPDATE hot_test SET data = 'updated1' WHERE id = 1;
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 1 | 1
+(1 row)
+
+-- Update non-indexed column on 'inactive' row (outside predicate)
+-- Should be HOT
+UPDATE hot_test SET data = 'updated2' WHERE id = 2;
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 2 | 2
+(1 row)
+
+-- Update status from 'inactive' to 'deleted' (both outside predicate)
+-- PostgreSQL is conservative: heap insert happens before predicate check
+-- So this is NON-HOT even though both values are outside predicate
+UPDATE hot_test SET status = 'deleted' WHERE id = 2;
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 3 | 2
+(1 row)
+
+-- Verify index still works for 'active' rows
+SELECT id, status FROM hot_test WHERE status = 'active';
+ id | status
+----+--------
+ 1 | active
+(1 row)
+
+-- Only BRIN (summarizing) indexes on non-PK columns
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ ts timestamp,
+ value int,
+ brin_col int
+) WITH (fillfactor = 50);
+CREATE INDEX hot_test_ts_brin ON hot_test USING brin(ts);
+CREATE INDEX hot_test_brin_col_brin ON hot_test USING brin(brin_col);
+INSERT INTO hot_test VALUES (1, '2024-01-01', 100, 1000);
+-- Update both BRIN columns - should still be HOT (only summarizing indexes)
+UPDATE hot_test SET ts = '2024-01-02', brin_col = 2000 WHERE id = 1;
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 1 | 1
+(1 row)
+
+-- Update non-indexed column - should also be HOT
+UPDATE hot_test SET value = 200 WHERE id = 1;
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 2 | 2
+(1 row)
+
+-- TOAST and HOT: TOASTed columns can participate in HOT
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ indexed_col int,
+ large_text text,
+ small_text text
+) WITH (fillfactor = 50);
+CREATE INDEX hot_test_idx ON hot_test(indexed_col);
+-- Insert row with TOASTed column (> 2KB)
+INSERT INTO hot_test VALUES (1, 100, repeat('x', 3000), 'small');
+-- Update non-indexed, non-TOASTed column - should be HOT
+UPDATE hot_test SET small_text = 'updated';
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 1 | 1
+(1 row)
+
+-- Update TOASTed column - should be HOT if indexed column unchanged
+UPDATE hot_test SET large_text = repeat('y', 3000);
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 2 | 2
+(1 row)
+
+-- Update indexed column - should NOT be HOT
+UPDATE hot_test SET indexed_col = 200;
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 3 | 2
+(1 row)
+
+-- Unique constraint (unique index) behaves like regular index
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ unique_col int UNIQUE,
+ data text
+) WITH (fillfactor = 50);
+INSERT INTO hot_test VALUES (1, 100, 'data1');
+INSERT INTO hot_test VALUES (2, 200, 'data2');
+-- Update data (non-indexed) - should be HOT
+UPDATE hot_test SET data = 'updated';
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 2 | 2
+(1 row)
+
+-- Verify unique constraint still enforced
+SELECT id, unique_col, data FROM hot_test ORDER BY id;
+ id | unique_col | data
+----+------------+---------
+ 1 | 100 | updated
+ 2 | 200 | updated
+(2 rows)
+
+-- This should fail (unique violation)
+UPDATE hot_test SET unique_col = 100 WHERE id = 2;
+ERROR: duplicate key value violates unique constraint "hot_test_unique_col_key"
+DETAIL: Key (unique_col)=(100) already exists.
+-- Multi-column index: any column change = non-HOT
+DROP TABLE hot_test;
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ col_a int,
+ col_b int,
+ col_c int,
+ data text
+) WITH (fillfactor = 50);
+CREATE INDEX hot_test_ab_idx ON hot_test(col_a, col_b);
+INSERT INTO hot_test VALUES (1, 10, 20, 30, 'data');
+-- Update col_a (part of multi-column index) - should NOT be HOT
+UPDATE hot_test SET col_a = 15;
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 1 | 0
+(1 row)
+
+-- Reset
+UPDATE hot_test SET col_a = 10;
+-- Update col_b (part of multi-column index) - should NOT be HOT
+UPDATE hot_test SET col_b = 25;
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 3 | 0
+(1 row)
+
+-- Reset
+UPDATE hot_test SET col_b = 20;
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 4 | 0
+(1 row)
+
+-- Update col_c (not indexed) - should be HOT
+UPDATE hot_test SET col_c = 35;
+-- Update data (not indexed) - should be HOT
+UPDATE hot_test SET data = 'updated';
+SELECT * FROM get_hot_count('hot_test');
+ updates | hot
+---------+-----
+ 6 | 2
+(1 row)
+
+-- Partitioned tables: HOT works within partitions
+DROP TABLE IF EXISTS hot_test_partitioned CASCADE;
+NOTICE: table "hot_test_partitioned" does not exist, skipping
+CREATE TABLE hot_test_partitioned (
+ id int,
+ partition_key int,
+ indexed_col int,
+ data text,
+ PRIMARY KEY (id, partition_key)
+) PARTITION BY RANGE (partition_key);
+CREATE TABLE hot_test_part1 PARTITION OF hot_test_partitioned
+ FOR VALUES FROM (1) TO (100) WITH (fillfactor = 50);
+CREATE TABLE hot_test_part2 PARTITION OF hot_test_partitioned
+ FOR VALUES FROM (100) TO (200) WITH (fillfactor = 50);
+CREATE INDEX hot_test_part_idx ON hot_test_partitioned(indexed_col);
+INSERT INTO hot_test_partitioned VALUES (1, 50, 100, 'initial1');
+INSERT INTO hot_test_partitioned VALUES (2, 150, 200, 'initial2');
+-- Update in partition 1 (non-indexed column) - should be HOT
+UPDATE hot_test_partitioned SET data = 'updated1' WHERE id = 1;
+-- Update in partition 2 (non-indexed column) - should be HOT
+UPDATE hot_test_partitioned SET data = 'updated2' WHERE id = 2;
+SELECT * FROM get_hot_count('hot_test_part1');
+ updates | hot
+---------+-----
+ 1 | 1
+(1 row)
+
+SELECT * FROM get_hot_count('hot_test_part2');
+ updates | hot
+---------+-----
+ 1 | 1
+(1 row)
+
+-- Verify indexes work on partitions
+SELECT id FROM hot_test_partitioned WHERE indexed_col = 100;
+ id
+----
+ 1
+(1 row)
+
+SELECT id FROM hot_test_partitioned WHERE indexed_col = 200;
+ id
+----
+ 2
+(1 row)
+
+-- Update indexed column in partition - should NOT be HOT
+UPDATE hot_test_partitioned SET indexed_col = 150 WHERE id = 1;
+SELECT * FROM get_hot_count('hot_test_part1');
+ updates | hot
+---------+-----
+ 2 | 1
+(1 row)
+
+-- Verify index was updated
+SELECT id FROM hot_test_partitioned WHERE indexed_col = 150;
+ id
+----
+ 1
+(1 row)
+
+-- ============================================================================
+-- Trigger modifications: heap_modify_tuple() and HOT
+-- ============================================================================
+-- Test that we correctly detect when triggers modify indexed columns via
+-- heap_modify_tuple(), even when those columns aren't in the UPDATE's SET clause
+CREATE TABLE hot_trigger_test (
+ id int PRIMARY KEY,
+ triggered_col int,
+ data text
+) WITH (fillfactor = 50);
+CREATE INDEX hot_trigger_idx ON hot_trigger_test(triggered_col);
+-- Create a trigger that modifies an indexed column
+CREATE OR REPLACE FUNCTION modify_triggered_col()
+RETURNS TRIGGER AS $$
+BEGIN
+ NEW.triggered_col = NEW.triggered_col + 1;
+ RETURN NEW;
+END;
+$$ LANGUAGE plpgsql;
+CREATE TRIGGER before_update_modify
+ BEFORE UPDATE ON hot_trigger_test
+ FOR EACH ROW
+ EXECUTE FUNCTION modify_triggered_col();
+INSERT INTO hot_trigger_test VALUES (1, 100, 'initial');
+SELECT * FROM get_hot_count('hot_trigger_test');
+ updates | hot
+---------+-----
+ 0 | 0
+(1 row)
+
+-- Update only data column, but trigger modifies indexed column
+-- Should NOT be HOT because trigger modified an indexed column
+UPDATE hot_trigger_test SET data = 'updated' WHERE id = 1;
+-- Verify it was NOT a HOT update (indexed column was modified by trigger)
+SELECT * FROM get_hot_count('hot_trigger_test');
+ updates | hot
+---------+-----
+ 1 | 0
+(1 row)
+
+-- Verify the triggered column was actually modified
+SELECT triggered_col FROM hot_trigger_test WHERE id = 1;
+ triggered_col
+---------------
+ 101
+(1 row)
+
+DROP TABLE hot_trigger_test CASCADE;
+DROP FUNCTION modify_triggered_col();
+-- ============================================================================
+-- JSONB expression indexes and sub-attribute tracking
+-- ============================================================================
+-- Test that updates to non-indexed JSONB paths can be HOT updates
+CREATE TABLE hot_jsonb_test (
+ id int PRIMARY KEY,
+ data jsonb
+) WITH (fillfactor = 50);
+-- Create expression index on a specific JSON path
+CREATE INDEX hot_jsonb_name_idx ON hot_jsonb_test ((data->>'name'));
+INSERT INTO hot_jsonb_test VALUES
+ (1, '{"name":"Alice","age":30,"city":"NYC"}'),
+ (2, '{"name":"Bob","age":25,"city":"LA"}');
+SELECT * FROM get_hot_count('hot_jsonb_test');
+ updates | hot
+---------+-----
+ 0 | 0
+(1 row)
+
+-- Update non-indexed JSON path (age) - should be HOT after instrumentation
+UPDATE hot_jsonb_test SET data = jsonb_set(data, '{age}', '31') WHERE id = 1;
+SELECT * FROM get_hot_count('hot_jsonb_test');
+ updates | hot
+---------+-----
+ 1 | 0
+(1 row)
+
+-- Update indexed JSON path (name) - should NOT be HOT
+UPDATE hot_jsonb_test SET data = jsonb_set(data, '{name}', '"Alice2"') WHERE id = 1;
+SELECT * FROM get_hot_count('hot_jsonb_test');
+ updates | hot
+---------+-----
+ 2 | 0
+(1 row)
+
+-- Verify index works
+SELECT id FROM hot_jsonb_test WHERE data->>'name' = 'Alice2';
+ id
+----
+ 1
+(1 row)
+
+-- Test jsonb_delete on non-indexed path - should be HOT after instrumentation
+UPDATE hot_jsonb_test SET data = data - 'city' WHERE id = 2;
+SELECT * FROM get_hot_count('hot_jsonb_test');
+ updates | hot
+---------+-----
+ 3 | 0
+(1 row)
+
+-- Test jsonb_insert on non-indexed path - should be HOT after instrumentation
+UPDATE hot_jsonb_test SET data = jsonb_insert(data, '{country}', '"USA"') WHERE id = 2;
+SELECT * FROM get_hot_count('hot_jsonb_test');
+ updates | hot
+---------+-----
+ 4 | 0
+(1 row)
+
+DROP TABLE hot_jsonb_test;
+-- ============================================================================
+-- XML expression indexes and sub-attribute tracking
+-- ============================================================================
+-- Test that updates to non-indexed XML paths can be HOT updates
+CREATE TABLE hot_xml_test (
+ id int PRIMARY KEY,
+ doc xml
+) WITH (fillfactor = 50);
+-- Create expression index on a specific XPath
+CREATE INDEX hot_xml_name_idx ON hot_xml_test ((xpath('/person/name/text()', doc)));
+INSERT INTO hot_xml_test VALUES
+ (1, '<person><name>Alice</name><age>30</age></person>'),
+ (2, '<person><name>Bob</name><age>25</age></person>');
+ERROR: could not identify a comparison function for type xml
+SELECT * FROM get_hot_count('hot_xml_test');
+ updates | hot
+---------+-----
+ 0 | 0
+(1 row)
+
+-- Update non-indexed XPath (age) - behavior depends on XML comparison fallback
+-- Full XML value replacement means non-indexed path updates still require index comparison
+UPDATE hot_xml_test SET doc = '<person><name>Alice</name><age>31</age></person>' WHERE id = 1;
+SELECT * FROM get_hot_count('hot_xml_test');
+ updates | hot
+---------+-----
+ 0 | 0
+(1 row)
+
+-- Update indexed XPath (name) - should NOT be HOT
+UPDATE hot_xml_test SET doc = '<person><name>Alice2</name><age>31</age></person>' WHERE id = 1;
+SELECT * FROM get_hot_count('hot_xml_test');
+ updates | hot
+---------+-----
+ 0 | 0
+(1 row)
+
+-- Verify index works
+SELECT id FROM hot_xml_test WHERE xpath('/person/name/text()', doc) = ARRAY['Alice2'::text];
+ERROR: operator does not exist: xml[] = text[]
+LINE 1: ..._xml_test WHERE xpath('/person/name/text()', doc) = ARRAY['A...
+ ^
+DETAIL: No operator of that name accepts the given argument types.
+HINT: You might need to add explicit type casts.
+DROP TABLE hot_xml_test;
+-- ============================================================================
+-- GIN indexes and amcomparedatums for JSONB
+-- ============================================================================
+-- Test that GIN indexes can use amcomparedatums to enable HOT when extracted keys match
+CREATE TABLE hot_gin_test (
+ id int PRIMARY KEY,
+ tags text[],
+ properties jsonb
+) WITH (fillfactor = 50);
+-- GIN index on text array
+CREATE INDEX hot_gin_tags_idx ON hot_gin_test USING gin (tags);
+-- GIN index on JSONB (jsonb_ops - keys and values)
+CREATE INDEX hot_gin_props_idx ON hot_gin_test USING gin (properties);
+INSERT INTO hot_gin_test VALUES
+ (1, ARRAY['tag1', 'tag2'], '{"key1":"val1","key2":"val2"}'),
+ (2, ARRAY['tag3', 'tag4'], '{"key3":"val3","key4":"val4"}');
+SELECT * FROM get_hot_count('hot_gin_test');
+ updates | hot
+---------+-----
+ 0 | 0
+(1 row)
+
+-- Update that changes tag order but not content - after amcomparedatums should be HOT
+-- (GIN extracts same keys, just different order)
+UPDATE hot_gin_test SET tags = ARRAY['tag2', 'tag1'] WHERE id = 1;
+SELECT * FROM get_hot_count('hot_gin_test');
+ updates | hot
+---------+-----
+ 1 | 0
+(1 row)
+
+-- Update JSONB value (not key) - after amcomparedatums may be HOT or non-HOT
+-- depending on GIN operator class (jsonb_ops indexes both keys and values)
+UPDATE hot_gin_test SET properties = '{"key1":"val1_new","key2":"val2"}' WHERE id = 1;
+SELECT * FROM get_hot_count('hot_gin_test');
+ updates | hot
+---------+-----
+ 2 | 0
+(1 row)
+
+-- Add new tag - should NOT be HOT (different extracted keys)
+UPDATE hot_gin_test SET tags = ARRAY['tag2', 'tag1', 'tag5'] WHERE id = 1;
+SELECT * FROM get_hot_count('hot_gin_test');
+ updates | hot
+---------+-----
+ 3 | 0
+(1 row)
+
+-- Verify GIN indexes work
+SELECT id FROM hot_gin_test WHERE tags @> ARRAY['tag5'];
+ id
+----
+ 1
+(1 row)
+
+SELECT id FROM hot_gin_test WHERE properties @> '{"key1":"val1_new"}';
+ id
+----
+ 1
+(1 row)
+
+DROP TABLE hot_gin_test;
+-- ============================================================================
+-- Cleanup
+-- ============================================================================
+DROP TABLE IF EXISTS hot_test;
+DROP TABLE IF EXISTS hot_test_partitioned CASCADE;
+DROP FUNCTION IF EXISTS has_hot_chain(text, tid);
+DROP FUNCTION IF EXISTS print_hot_chain(text, tid);
+DROP FUNCTION IF EXISTS get_hot_count(text);
+DROP EXTENSION pageinspect;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 734da057c34..675eb175059 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -137,6 +137,11 @@ test: event_trigger_login
# this test also uses event triggers, so likewise run it by itself
test: fast_default
+# ----------
+# HOT updates tests
+# ----------
+test: hot_updates
+
# run tablespace test at the end because it drops the tablespace created during
# setup that other tests may use.
test: tablespace
diff --git a/src/test/regress/sql/hot_updates.sql b/src/test/regress/sql/hot_updates.sql
new file mode 100644
index 00000000000..a8894006177
--- /dev/null
+++ b/src/test/regress/sql/hot_updates.sql
@@ -0,0 +1,605 @@
+--
+-- HOT_UPDATES
+-- Test Heap-Only Tuple (HOT) update decisions
+--
+-- This test systematically verifies that HOT updates are used when appropriate
+-- and avoided when necessary (e.g., when indexed columns are modified).
+--
+-- We use multiple validation methods:
+-- 1. Statistics functions (pg_stat_get_tuples_hot_updated)
+-- 2. pageinspect extension for HOT chain examination
+-- 3. EXPLAIN to verify index usage after updates
+--
+
+-- Load required extensions
+CREATE EXTENSION IF NOT EXISTS pageinspect;
+
+-- Function to get HOT update count
+CREATE OR REPLACE FUNCTION get_hot_count(rel_name text)
+RETURNS TABLE (
+ updates BIGINT,
+ hot BIGINT
+) AS $$
+DECLARE
+ rel_oid oid;
+BEGIN
+ rel_oid := rel_name::regclass::oid;
+
+ -- Read both committed and transaction-local stats
+ -- In autocommit mode (default for regression tests), this works correctly
+ -- Note: In explicit transactions (BEGIN/COMMIT), committed stats already
+ -- include flushed updates, so this would double-count. For explicit
+ -- transaction testing, call pg_stat_force_next_flush() before this function.
+ updates := COALESCE(pg_stat_get_tuples_updated(rel_oid), 0) +
+ COALESCE(pg_stat_get_xact_tuples_updated(rel_oid), 0);
+ hot := COALESCE(pg_stat_get_tuples_hot_updated(rel_oid), 0) +
+ COALESCE(pg_stat_get_xact_tuples_hot_updated(rel_oid), 0);
+
+ RETURN NEXT;
+END;
+$$ LANGUAGE plpgsql;
+
+-- Check if a tuple is part of a HOT chain (has a predecessor on same page)
+CREATE OR REPLACE FUNCTION has_hot_chain(rel_name text, target_ctid tid)
+RETURNS boolean AS $$
+DECLARE
+ block_num int;
+ page_item record;
+BEGIN
+ block_num := (target_ctid::text::point)[0]::int;
+
+ -- Look for a different tuple on the same page that points to our target tuple
+ FOR page_item IN
+ SELECT lp, lp_flags, t_ctid
+ FROM heap_page_items(get_raw_page(rel_name, block_num))
+ WHERE lp_flags = 1
+ AND t_ctid IS NOT NULL
+ AND t_ctid = target_ctid
+ AND ('(' || block_num::text || ',' || lp::text || ')')::tid != target_ctid
+ LOOP
+ RETURN true;
+ END LOOP;
+
+ RETURN false;
+END;
+$$ LANGUAGE plpgsql;
+
+-- Print the HOT chain starting from a given tuple
+CREATE OR REPLACE FUNCTION print_hot_chain(rel_name text, start_ctid tid)
+RETURNS TABLE(chain_position int, ctid tid, lp_flags text, t_ctid tid, chain_end boolean) AS
+$$
+#variable_conflict use_column
+DECLARE
+ block_num int;
+ line_ptr int;
+ current_ctid tid := start_ctid;
+ next_ctid tid;
+ position int := 0;
+ max_iterations int := 100;
+ page_item record;
+ found_predecessor boolean := false;
+ flags_name text;
+BEGIN
+ block_num := (start_ctid::text::point)[0]::int;
+
+ -- Find the predecessor (old tuple pointing to our start_ctid)
+ FOR page_item IN
+ SELECT lp, lp_flags, t_ctid
+ FROM heap_page_items(get_raw_page(rel_name, block_num))
+ WHERE lp_flags = 1
+ AND t_ctid = start_ctid
+ LOOP
+ current_ctid := ('(' || block_num::text || ',' || page_item.lp::text || ')')::tid;
+ found_predecessor := true;
+ EXIT;
+ END LOOP;
+
+ -- If no predecessor found, start with the given ctid
+ IF NOT found_predecessor THEN
+ current_ctid := start_ctid;
+ END IF;
+
+ -- Follow the chain forward
+ WHILE position < max_iterations LOOP
+ line_ptr := (current_ctid::text::point)[1]::int;
+
+ FOR page_item IN
+ SELECT lp, lp_flags, t_ctid
+ FROM heap_page_items(get_raw_page(rel_name, block_num))
+ WHERE lp = line_ptr
+ LOOP
+ -- Map lp_flags to names
+ flags_name := CASE page_item.lp_flags
+ WHEN 0 THEN 'unused (0)'
+ WHEN 1 THEN 'normal (1)'
+ WHEN 2 THEN 'redirect (2)'
+ WHEN 3 THEN 'dead (3)'
+ ELSE 'unknown (' || page_item.lp_flags::text || ')'
+ END;
+
+ RETURN QUERY SELECT
+ position,
+ current_ctid,
+ flags_name,
+ page_item.t_ctid,
+ (page_item.t_ctid IS NULL OR page_item.t_ctid = current_ctid)::boolean
+ ;
+
+ IF page_item.t_ctid IS NULL OR page_item.t_ctid = current_ctid THEN
+ RETURN;
+ END IF;
+
+ next_ctid := page_item.t_ctid;
+
+ IF (next_ctid::text::point)[0]::int != block_num THEN
+ RETURN;
+ END IF;
+
+ current_ctid := next_ctid;
+ position := position + 1;
+ END LOOP;
+
+ IF position = 0 THEN
+ RETURN;
+ END IF;
+ END LOOP;
+END;
+$$ LANGUAGE plpgsql;
+
+-- Basic HOT update (update non-indexed column)
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ indexed_col int,
+ non_indexed_col text
+) WITH (fillfactor = 50);
+
+CREATE INDEX hot_test_indexed_idx ON hot_test(indexed_col);
+
+INSERT INTO hot_test VALUES (1, 100, 'initial');
+INSERT INTO hot_test VALUES (2, 200, 'initial');
+INSERT INTO hot_test VALUES (3, 300, 'initial');
+
+-- Get baseline
+SELECT * FROM get_hot_count('hot_test');
+
+-- Should be HOT updates (only non-indexed column modified)
+UPDATE hot_test SET non_indexed_col = 'updated1' WHERE id = 1;
+UPDATE hot_test SET non_indexed_col = 'updated2' WHERE id = 2;
+UPDATE hot_test SET non_indexed_col = 'updated3' WHERE id = 3;
+
+-- Verify HOT updates occurred
+SELECT * FROM get_hot_count('hot_test');
+
+-- Dump the HOT chain before VACUUMing
+WITH current_tuple AS (
+ SELECT ctid FROM hot_test WHERE id = 1
+)
+SELECT
+ has_hot_chain('hot_test', current_tuple.ctid) AS has_chain,
+ chain_position,
+ print_hot_chain.ctid,
+ lp_flags,
+ t_ctid
+FROM current_tuple,
+LATERAL print_hot_chain('hot_test', current_tuple.ctid);
+
+-- Vacuum the relation, expect the HOT chain to collapse
+VACUUM hot_test;
+
+-- Show that there is no chain after vacuum
+WITH current_tuple AS (
+ SELECT ctid FROM hot_test WHERE id = 1
+)
+SELECT
+ has_hot_chain('hot_test', current_tuple.ctid) AS has_chain,
+ chain_position,
+ print_hot_chain.ctid,
+ lp_flags,
+ t_ctid
+FROM current_tuple,
+LATERAL print_hot_chain('hot_test', current_tuple.ctid);
+
+-- Non-HOT update (update indexed column)
+UPDATE hot_test SET indexed_col = 150 WHERE id = 1;
+SELECT * FROM get_hot_count('hot_test');
+
+-- Verify index was updated (new value findable)
+SET enable_seqscan = off;
+EXPLAIN (COSTS OFF) SELECT id, indexed_col FROM hot_test WHERE indexed_col = 150;
+SELECT id, indexed_col FROM hot_test WHERE indexed_col = 150;
+
+-- Verify old value no longer in index
+EXPLAIN (COSTS OFF) SELECT id FROM hot_test WHERE indexed_col = 100;
+SELECT id FROM hot_test WHERE indexed_col = 100;
+RESET enable_seqscan;
+
+-- All-or-none property: updating one indexed column requires ALL index updates
+DROP TABLE hot_test;
+
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ col_a int,
+ col_b int,
+ col_c int,
+ non_indexed text
+) WITH (fillfactor = 50);
+
+CREATE INDEX hot_test_a_idx ON hot_test(col_a);
+CREATE INDEX hot_test_b_idx ON hot_test(col_b);
+CREATE INDEX hot_test_c_idx ON hot_test(col_c);
+
+INSERT INTO hot_test VALUES (1, 10, 20, 30, 'initial');
+
+-- Update only col_a - should NOT be HOT because an indexed column changed
+-- This means ALL indexes must be updated (all-or-none property)
+UPDATE hot_test SET col_a = 15 WHERE id = 1;
+SELECT * FROM get_hot_count('hot_test');
+
+-- Now update only non-indexed column - should be HOT
+UPDATE hot_test SET non_indexed = 'updated';
+SELECT * FROM get_hot_count('hot_test');
+
+-- Partial index: both old and new outside predicate (conservative = non-HOT)
+DROP TABLE hot_test;
+
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ status text,
+ data text
+) WITH (fillfactor = 50);
+
+-- Partial index only covers status = 'active'
+CREATE INDEX hot_test_active_idx ON hot_test(status) WHERE status = 'active';
+
+INSERT INTO hot_test VALUES (1, 'active', 'data1');
+INSERT INTO hot_test VALUES (2, 'inactive', 'data2');
+INSERT INTO hot_test VALUES (3, 'deleted', 'data3');
+
+-- Update non-indexed column on 'active' row (in predicate, status unchanged)
+-- Should be HOT
+UPDATE hot_test SET data = 'updated1' WHERE id = 1;
+SELECT * FROM get_hot_count('hot_test');
+
+-- Update non-indexed column on 'inactive' row (outside predicate)
+-- Should be HOT
+UPDATE hot_test SET data = 'updated2' WHERE id = 2;
+SELECT * FROM get_hot_count('hot_test');
+
+-- Update status from 'inactive' to 'deleted' (both outside predicate)
+-- PostgreSQL is conservative: heap insert happens before predicate check
+-- So this is NON-HOT even though both values are outside predicate
+UPDATE hot_test SET status = 'deleted' WHERE id = 2;
+SELECT * FROM get_hot_count('hot_test');
+
+-- Verify index still works for 'active' rows
+SELECT id, status FROM hot_test WHERE status = 'active';
+
+-- Only BRIN (summarizing) indexes on non-PK columns
+DROP TABLE hot_test;
+
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ ts timestamp,
+ value int,
+ brin_col int
+) WITH (fillfactor = 50);
+
+CREATE INDEX hot_test_ts_brin ON hot_test USING brin(ts);
+CREATE INDEX hot_test_brin_col_brin ON hot_test USING brin(brin_col);
+
+INSERT INTO hot_test VALUES (1, '2024-01-01', 100, 1000);
+
+-- Update both BRIN columns - should still be HOT (only summarizing indexes)
+UPDATE hot_test SET ts = '2024-01-02', brin_col = 2000 WHERE id = 1;
+SELECT * FROM get_hot_count('hot_test');
+
+-- Update non-indexed column - should also be HOT
+UPDATE hot_test SET value = 200 WHERE id = 1;
+SELECT * FROM get_hot_count('hot_test');
+
+-- TOAST and HOT: TOASTed columns can participate in HOT
+DROP TABLE hot_test;
+
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ indexed_col int,
+ large_text text,
+ small_text text
+) WITH (fillfactor = 50);
+
+CREATE INDEX hot_test_idx ON hot_test(indexed_col);
+
+-- Insert row with TOASTed column (> 2KB)
+INSERT INTO hot_test VALUES (1, 100, repeat('x', 3000), 'small');
+
+-- Update non-indexed, non-TOASTed column - should be HOT
+UPDATE hot_test SET small_text = 'updated';
+SELECT * FROM get_hot_count('hot_test');
+
+-- Update TOASTed column - should be HOT if indexed column unchanged
+UPDATE hot_test SET large_text = repeat('y', 3000);
+SELECT * FROM get_hot_count('hot_test');
+
+-- Update indexed column - should NOT be HOT
+UPDATE hot_test SET indexed_col = 200;
+SELECT * FROM get_hot_count('hot_test');
+
+-- Unique constraint (unique index) behaves like regular index
+DROP TABLE hot_test;
+
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ unique_col int UNIQUE,
+ data text
+) WITH (fillfactor = 50);
+
+INSERT INTO hot_test VALUES (1, 100, 'data1');
+INSERT INTO hot_test VALUES (2, 200, 'data2');
+
+-- Update data (non-indexed) - should be HOT
+UPDATE hot_test SET data = 'updated';
+SELECT * FROM get_hot_count('hot_test');
+
+-- Verify unique constraint still enforced
+SELECT id, unique_col, data FROM hot_test ORDER BY id;
+
+-- This should fail (unique violation)
+UPDATE hot_test SET unique_col = 100 WHERE id = 2;
+
+-- Multi-column index: any column change = non-HOT
+DROP TABLE hot_test;
+
+CREATE TABLE hot_test (
+ id int PRIMARY KEY,
+ col_a int,
+ col_b int,
+ col_c int,
+ data text
+) WITH (fillfactor = 50);
+
+CREATE INDEX hot_test_ab_idx ON hot_test(col_a, col_b);
+
+INSERT INTO hot_test VALUES (1, 10, 20, 30, 'data');
+
+-- Update col_a (part of multi-column index) - should NOT be HOT
+UPDATE hot_test SET col_a = 15;
+SELECT * FROM get_hot_count('hot_test');
+
+-- Reset
+UPDATE hot_test SET col_a = 10;
+
+-- Update col_b (part of multi-column index) - should NOT be HOT
+UPDATE hot_test SET col_b = 25;
+SELECT * FROM get_hot_count('hot_test');
+
+-- Reset
+UPDATE hot_test SET col_b = 20;
+SELECT * FROM get_hot_count('hot_test');
+
+-- Update col_c (not indexed) - should be HOT
+UPDATE hot_test SET col_c = 35;
+
+-- Update data (not indexed) - should be HOT
+UPDATE hot_test SET data = 'updated';
+SELECT * FROM get_hot_count('hot_test');
+
+-- Partitioned tables: HOT works within partitions
+DROP TABLE IF EXISTS hot_test_partitioned CASCADE;
+
+CREATE TABLE hot_test_partitioned (
+ id int,
+ partition_key int,
+ indexed_col int,
+ data text,
+ PRIMARY KEY (id, partition_key)
+) PARTITION BY RANGE (partition_key);
+
+CREATE TABLE hot_test_part1 PARTITION OF hot_test_partitioned
+ FOR VALUES FROM (1) TO (100) WITH (fillfactor = 50);
+CREATE TABLE hot_test_part2 PARTITION OF hot_test_partitioned
+ FOR VALUES FROM (100) TO (200) WITH (fillfactor = 50);
+
+CREATE INDEX hot_test_part_idx ON hot_test_partitioned(indexed_col);
+
+INSERT INTO hot_test_partitioned VALUES (1, 50, 100, 'initial1');
+INSERT INTO hot_test_partitioned VALUES (2, 150, 200, 'initial2');
+
+-- Update in partition 1 (non-indexed column) - should be HOT
+UPDATE hot_test_partitioned SET data = 'updated1' WHERE id = 1;
+
+-- Update in partition 2 (non-indexed column) - should be HOT
+UPDATE hot_test_partitioned SET data = 'updated2' WHERE id = 2;
+
+SELECT * FROM get_hot_count('hot_test_part1');
+SELECT * FROM get_hot_count('hot_test_part2');
+
+-- Verify indexes work on partitions
+SELECT id FROM hot_test_partitioned WHERE indexed_col = 100;
+SELECT id FROM hot_test_partitioned WHERE indexed_col = 200;
+
+-- Update indexed column in partition - should NOT be HOT
+UPDATE hot_test_partitioned SET indexed_col = 150 WHERE id = 1;
+SELECT * FROM get_hot_count('hot_test_part1');
+
+-- Verify index was updated
+SELECT id FROM hot_test_partitioned WHERE indexed_col = 150;
+
+-- ============================================================================
+-- Trigger modifications: heap_modify_tuple() and HOT
+-- ============================================================================
+-- Test that we correctly detect when triggers modify indexed columns via
+-- heap_modify_tuple(), even when those columns aren't in the UPDATE's SET clause
+
+CREATE TABLE hot_trigger_test (
+ id int PRIMARY KEY,
+ triggered_col int,
+ data text
+) WITH (fillfactor = 50);
+
+CREATE INDEX hot_trigger_idx ON hot_trigger_test(triggered_col);
+
+-- Create a trigger that modifies an indexed column
+CREATE OR REPLACE FUNCTION modify_triggered_col()
+RETURNS TRIGGER AS $$
+BEGIN
+ NEW.triggered_col = NEW.triggered_col + 1;
+ RETURN NEW;
+END;
+$$ LANGUAGE plpgsql;
+
+CREATE TRIGGER before_update_modify
+ BEFORE UPDATE ON hot_trigger_test
+ FOR EACH ROW
+ EXECUTE FUNCTION modify_triggered_col();
+
+INSERT INTO hot_trigger_test VALUES (1, 100, 'initial');
+
+SELECT * FROM get_hot_count('hot_trigger_test');
+
+-- Update only data column, but trigger modifies indexed column
+-- Should NOT be HOT because trigger modified an indexed column
+UPDATE hot_trigger_test SET data = 'updated' WHERE id = 1;
+
+-- Verify it was NOT a HOT update (indexed column was modified by trigger)
+SELECT * FROM get_hot_count('hot_trigger_test');
+
+-- Verify the triggered column was actually modified
+SELECT triggered_col FROM hot_trigger_test WHERE id = 1;
+
+DROP TABLE hot_trigger_test CASCADE;
+DROP FUNCTION modify_triggered_col();
+
+-- ============================================================================
+-- JSONB expression indexes and sub-attribute tracking
+-- ============================================================================
+-- Test that updates to non-indexed JSONB paths can be HOT updates
+
+CREATE TABLE hot_jsonb_test (
+ id int PRIMARY KEY,
+ data jsonb
+) WITH (fillfactor = 50);
+
+-- Create expression index on a specific JSON path
+CREATE INDEX hot_jsonb_name_idx ON hot_jsonb_test ((data->>'name'));
+
+INSERT INTO hot_jsonb_test VALUES
+ (1, '{"name":"Alice","age":30,"city":"NYC"}'),
+ (2, '{"name":"Bob","age":25,"city":"LA"}');
+
+SELECT * FROM get_hot_count('hot_jsonb_test');
+
+-- Update non-indexed JSON path (age) - should be HOT after instrumentation
+UPDATE hot_jsonb_test SET data = jsonb_set(data, '{age}', '31') WHERE id = 1;
+
+SELECT * FROM get_hot_count('hot_jsonb_test');
+
+-- Update indexed JSON path (name) - should NOT be HOT
+UPDATE hot_jsonb_test SET data = jsonb_set(data, '{name}', '"Alice2"') WHERE id = 1;
+
+SELECT * FROM get_hot_count('hot_jsonb_test');
+
+-- Verify index works
+SELECT id FROM hot_jsonb_test WHERE data->>'name' = 'Alice2';
+
+-- Test jsonb_delete on non-indexed path - should be HOT after instrumentation
+UPDATE hot_jsonb_test SET data = data - 'city' WHERE id = 2;
+
+SELECT * FROM get_hot_count('hot_jsonb_test');
+
+-- Test jsonb_insert on non-indexed path - should be HOT after instrumentation
+UPDATE hot_jsonb_test SET data = jsonb_insert(data, '{country}', '"USA"') WHERE id = 2;
+
+SELECT * FROM get_hot_count('hot_jsonb_test');
+
+DROP TABLE hot_jsonb_test;
+
+-- ============================================================================
+-- XML expression indexes and sub-attribute tracking
+-- ============================================================================
+-- Test that updates to non-indexed XML paths can be HOT updates
+
+CREATE TABLE hot_xml_test (
+ id int PRIMARY KEY,
+ doc xml
+) WITH (fillfactor = 50);
+
+-- Create expression index on a specific XPath
+CREATE INDEX hot_xml_name_idx ON hot_xml_test ((xpath('/person/name/text()', doc)));
+
+INSERT INTO hot_xml_test VALUES
+ (1, '<person><name>Alice</name><age>30</age></person>'),
+ (2, '<person><name>Bob</name><age>25</age></person>');
+
+SELECT * FROM get_hot_count('hot_xml_test');
+
+-- Update non-indexed XPath (age) - behavior depends on XML comparison fallback
+-- Full XML value replacement means non-indexed path updates still require index comparison
+UPDATE hot_xml_test SET doc = '<person><name>Alice</name><age>31</age></person>' WHERE id = 1;
+
+SELECT * FROM get_hot_count('hot_xml_test');
+
+-- Update indexed XPath (name) - should NOT be HOT
+UPDATE hot_xml_test SET doc = '<person><name>Alice2</name><age>31</age></person>' WHERE id = 1;
+
+SELECT * FROM get_hot_count('hot_xml_test');
+
+-- Verify index works
+SELECT id FROM hot_xml_test WHERE xpath('/person/name/text()', doc) = ARRAY['Alice2'::text];
+
+DROP TABLE hot_xml_test;
+
+-- ============================================================================
+-- GIN indexes and amcomparedatums for JSONB
+-- ============================================================================
+-- Test that GIN indexes can use amcomparedatums to enable HOT when extracted keys match
+
+CREATE TABLE hot_gin_test (
+ id int PRIMARY KEY,
+ tags text[],
+ properties jsonb
+) WITH (fillfactor = 50);
+
+-- GIN index on text array
+CREATE INDEX hot_gin_tags_idx ON hot_gin_test USING gin (tags);
+
+-- GIN index on JSONB (jsonb_ops - keys and values)
+CREATE INDEX hot_gin_props_idx ON hot_gin_test USING gin (properties);
+
+INSERT INTO hot_gin_test VALUES
+ (1, ARRAY['tag1', 'tag2'], '{"key1":"val1","key2":"val2"}'),
+ (2, ARRAY['tag3', 'tag4'], '{"key3":"val3","key4":"val4"}');
+
+SELECT * FROM get_hot_count('hot_gin_test');
+
+-- Update that changes tag order but not content - after amcomparedatums should be HOT
+-- (GIN extracts same keys, just different order)
+UPDATE hot_gin_test SET tags = ARRAY['tag2', 'tag1'] WHERE id = 1;
+
+SELECT * FROM get_hot_count('hot_gin_test');
+
+-- Update JSONB value (not key) - after amcomparedatums may be HOT or non-HOT
+-- depending on GIN operator class (jsonb_ops indexes both keys and values)
+UPDATE hot_gin_test SET properties = '{"key1":"val1_new","key2":"val2"}' WHERE id = 1;
+
+SELECT * FROM get_hot_count('hot_gin_test');
+
+-- Add new tag - should NOT be HOT (different extracted keys)
+UPDATE hot_gin_test SET tags = ARRAY['tag2', 'tag1', 'tag5'] WHERE id = 1;
+
+SELECT * FROM get_hot_count('hot_gin_test');
+
+-- Verify GIN indexes work
+SELECT id FROM hot_gin_test WHERE tags @> ARRAY['tag5'];
+SELECT id FROM hot_gin_test WHERE properties @> '{"key1":"val1_new"}';
+
+DROP TABLE hot_gin_test;
+
+-- ============================================================================
+-- Cleanup
+-- ============================================================================
+DROP TABLE IF EXISTS hot_test;
+DROP TABLE IF EXISTS hot_test_partitioned CASCADE;
+DROP FUNCTION IF EXISTS has_hot_chain(text, tid);
+DROP FUNCTION IF EXISTS print_hot_chain(text, tid);
+DROP FUNCTION IF EXISTS get_hot_count(text);
+DROP EXTENSION pageinspect;
--
2.51.2
[text/x-patch] v38-0002-Identify-modified-indexed-attributes-in-the-exec.patch (60.5K, 3-v38-0002-Identify-modified-indexed-attributes-in-the-exec.patch)
download | inline diff:
From 5e6e0414192294882ccfbd1c731f12cebf2d507f Mon Sep 17 00:00:00 2001
From: Greg Burd <[email protected]>
Date: Tue, 10 Mar 2026 08:17:31 -0400
Subject: [PATCH v38 2/2] Identify modified indexed attributes in the executor
on UPDATE
Refactor executor update logic to determine which indexed columns have
actually changed during an UPDATE operation rather than leaving this up
to HeapDetermineColumnsInfo() in heap_update(). Finding this set of
attributes is not heap-specific, but more general to all table AMs and
having this information in the executor could inform other decisions
about when index inserts are required and when they are not regardless
of the table AM's MVCC implementation strategy.
The heap-only tuple decision (HOT) in heap functions as it always has,
but the determination of the "modified indexed attributes"
(modified_idx_attrs, formerly known as modified_attrs).
ExecUpdateModifiedIdxAttrs() replaces HeapDetermineColumnsInfo() and is
called before table_tuple_update() crucially without the need for an
exclusive buffer lock on the page that holds the tuple being updated.
This reduces the time the buffer lock is held later within
heapam_tuple_update() and heap_update().
Besides identifying the set of modified indexed attributes
HeapDetermineColumnsInfo() was also partially responsible for the
decision about what to WAL log for the replica identity key. This logic
moved into heap_update() and out of the replacement named
HeapUpdateModifiedIdxAttrs(). Doing this allows for
simple_heap_update() and heapam_tuple_update() to share the same logic
as they both call into heap_update().
Updates stemming from logical replication also use the new
ExecUpdateModifiedIdxAttrs() in ExecSimpleRelationUpdate().
ExecUpdateModifiedIdxAttrs() uses ExecCompareSlotAttrs() to identify
which attributes have changed and then intersects that with the set of
indexed attributes to identify the modified indexed set, the
modified_idx_attrs.
This patch introduces a few helper functions to reduce code duplication
and increase readability: HeapUpdateHotAllowable(),
HeapUpdateDetermineLockmode(). These are used in both heap_update() and
simple_heap_update().
The heap_update() function is called now with lockmode pre-determined
and a boolean indicating if the update allows HOT updates or not, both
const. If during heap_update() the new tuple will fit on the same page
and that boolean is true, the update is HOT. This means that although
the functions and timing of the code involed in HOT decisions have
changed, none of the logic related to when HOT is allowed has changed.
Development of this feature exposed nondeterministic behavior in three
existing tests which have been adjusted to avoid inconsistent test
results due to tuple ordering during heap page scans.
---
src/backend/access/heap/heapam.c | 463 ++++++++++++------
src/backend/access/heap/heapam_handler.c | 31 +-
src/backend/access/table/tableam.c | 5 +-
src/backend/executor/execReplication.c | 9 +-
src/backend/executor/execTuples.c | 70 +++
src/backend/executor/nodeModifyTable.c | 88 +++-
src/backend/utils/cache/relcache.c | 44 +-
src/include/access/heapam.h | 13 +-
src/include/access/tableam.h | 8 +-
src/include/executor/executor.h | 8 +
src/include/utils/rel.h | 2 +-
src/include/utils/relcache.h | 2 +-
.../expected/syscache-update-pruned.out | 12 +-
.../specs/syscache-update-pruned.spec | 6 +-
.../regress/expected/generated_virtual.out | 2 +-
src/test/regress/expected/triggers.out | 16 +-
src/test/regress/expected/tsearch.out | 3 +-
src/test/regress/expected/updatable_views.out | 4 +-
src/test/regress/sql/generated_virtual.sql | 2 +-
src/test/regress/sql/triggers.sql | 4 +-
src/test/regress/sql/tsearch.sql | 3 +-
src/test/regress/sql/updatable_views.sql | 2 +-
22 files changed, 573 insertions(+), 224 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e5bd062de77..dcb8a34b7bf 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -37,21 +37,26 @@
#include "access/multixact.h"
#include "access/subtrans.h"
#include "access/syncscan.h"
+#include "access/sysattr.h"
+#include "access/tableam.h"
#include "access/valid.h"
#include "access/visibilitymap.h"
#include "access/xloginsert.h"
#include "catalog/pg_database.h"
#include "catalog/pg_database_d.h"
#include "commands/vacuum.h"
+#include "executor/tuptable.h"
+#include "nodes/lockoptions.h"
#include "pgstat.h"
#include "port/pg_bitutils.h"
+#include "storage/buf.h"
#include "storage/lmgr.h"
#include "storage/predicate.h"
-#include "storage/proc.h"
#include "storage/procarray.h"
#include "utils/datum.h"
#include "utils/injection_point.h"
#include "utils/inval.h"
+#include "utils/relcache.h"
#include "utils/spccache.h"
#include "utils/syscache.h"
@@ -68,11 +73,8 @@ static void check_lock_if_inplace_updateable_rel(Relation relation,
HeapTuple newtup);
static void check_inplace_rel_lock(HeapTuple oldtup);
#endif
-static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
- Bitmapset *interesting_cols,
- Bitmapset *external_cols,
- HeapTuple oldtup, HeapTuple newtup,
- bool *has_external);
+static Bitmapset *HeapUpdateModifiedIdxAttrs(Relation relation,
+ HeapTuple oldtup, HeapTuple newtup);
static bool heap_acquire_tuplock(Relation relation, const ItemPointerData *tid,
LockTupleMode mode, LockWaitPolicy wait_policy,
bool *have_tuple_lock);
@@ -3312,7 +3314,7 @@ simple_heap_delete(Relation relation, const ItemPointerData *tid)
* heap_update - replace a tuple
*
* See table_tuple_update() for an explanation of the parameters, except that
- * this routine directly takes a tuple rather than a slot.
+ * this routine directly takes a heap tuple rather than a slot.
*
* In the failure cases, the routine fills *tmfd with the tuple's t_ctid,
* t_xmax (resolving a possible MultiXact, if necessary), and t_cmax (the last
@@ -3322,17 +3324,13 @@ simple_heap_delete(Relation relation, const ItemPointerData *tid)
TM_Result
heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
CommandId cid, Snapshot crosscheck, bool wait,
- TM_FailureData *tmfd, LockTupleMode *lockmode,
- TU_UpdateIndexes *update_indexes)
+ TM_FailureData *tmfd, const LockTupleMode lockmode,
+ const Bitmapset *modified_idx_attrs, const bool hot_allowed)
{
TM_Result result;
TransactionId xid = GetCurrentTransactionId();
- Bitmapset *hot_attrs;
- Bitmapset *sum_attrs;
- Bitmapset *key_attrs;
- Bitmapset *id_attrs;
- Bitmapset *interesting_attrs;
- Bitmapset *modified_attrs;
+ Bitmapset *idx_attrs,
+ *id_attrs;
ItemId lp;
HeapTupleData oldtup;
HeapTuple heaptup;
@@ -3352,13 +3350,12 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
bool have_tuple_lock = false;
bool iscombo;
bool use_hot_update = false;
- bool summarized_update = false;
bool key_intact;
bool all_visible_cleared = false;
bool all_visible_cleared_new = false;
bool checked_lockers;
bool locker_remains;
- bool id_has_external = false;
+ bool rep_id_key_required = false;
TransactionId xmax_new_tuple,
xmax_old_tuple;
uint16 infomask_old_tuple,
@@ -3389,33 +3386,18 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
#endif
/*
- * Fetch the list of attributes to be checked for various operations.
- *
- * For HOT considerations, this is wasted effort if we fail to update or
- * have to put the new tuple on a different page. But we must compute the
- * list before obtaining buffer lock --- in the worst case, if we are
- * doing an update on one of the relevant system catalogs, we could
- * deadlock if we try to fetch the list later. In any case, the relcache
- * caches the data so this is usually pretty cheap.
- *
- * We also need columns used by the replica identity and columns that are
- * considered the "key" of rows in the table.
+ * Fetch the attributes used across all indexes on this relation as well
+ * as the replica identity and columns.
*
- * Note that we get copies of each bitmap, so we need not worry about
- * relcache flush happening midway through.
- */
- hot_attrs = RelationGetIndexAttrBitmap(relation,
- INDEX_ATTR_BITMAP_HOT_BLOCKING);
- sum_attrs = RelationGetIndexAttrBitmap(relation,
- INDEX_ATTR_BITMAP_SUMMARIZED);
- key_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_KEY);
- id_attrs = RelationGetIndexAttrBitmap(relation,
- INDEX_ATTR_BITMAP_IDENTITY_KEY);
- interesting_attrs = NULL;
- interesting_attrs = bms_add_members(interesting_attrs, hot_attrs);
- interesting_attrs = bms_add_members(interesting_attrs, sum_attrs);
- interesting_attrs = bms_add_members(interesting_attrs, key_attrs);
- interesting_attrs = bms_add_members(interesting_attrs, id_attrs);
+ * Note: We must compute the list before obtaining buffer lock. In the
+ * worst case, if we are doing an update on one of the relevant system
+ * catalogs, we could deadlock if we try to fetch the list later. Keep in
+ * mind that relcache returns copies of each bitmap, so we need not worry
+ * about relcache flush happening midway through, but we do need to free
+ * them.
+ */
+ idx_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_INDEXED);
+ id_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_IDENTITY_KEY);
block = ItemPointerGetBlockNumber(otid);
INJECTION_POINT("heap_update-before-pin", NULL);
@@ -3469,20 +3451,17 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
tmfd->ctid = *otid;
tmfd->xmax = InvalidTransactionId;
tmfd->cmax = InvalidCommandId;
- *update_indexes = TU_None;
- bms_free(hot_attrs);
- bms_free(sum_attrs);
- bms_free(key_attrs);
bms_free(id_attrs);
- /* modified_attrs not yet initialized */
- bms_free(interesting_attrs);
+ bms_free(idx_attrs);
+ /* modified_idx_attrs is owned by the caller, don't free it */
+
return TM_Deleted;
}
/*
- * Fill in enough data in oldtup for HeapDetermineColumnsInfo to work
- * properly.
+ * Fill in enough data in oldtup to determine replica identity attribute
+ * requirements.
*/
oldtup.t_tableOid = RelationGetRelid(relation);
oldtup.t_data = (HeapTupleHeader) PageGetItem(page, lp);
@@ -3493,16 +3472,59 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
newtup->t_tableOid = RelationGetRelid(relation);
/*
- * Determine columns modified by the update. Additionally, identify
- * whether any of the unmodified replica identity key attributes in the
- * old tuple is externally stored or not. This is required because for
- * such attributes the flattened value won't be WAL logged as part of the
- * new tuple so we must include it as part of the old_key_tuple. See
- * ExtractReplicaIdentity.
+ * ExtractReplicaIdentity() needs to know if a modified indexed attrbute
+ * is used as a replica indentity or if any of the replica identity
+ * attributes are referenced in an index, unmodified, and are stored
+ * externally in the old tuple being replaced. In those cases it may be
+ * necessary to WAL log them to so they are available to replicas.
*/
- modified_attrs = HeapDetermineColumnsInfo(relation, interesting_attrs,
- id_attrs, &oldtup,
- newtup, &id_has_external);
+ rep_id_key_required = bms_overlap(modified_idx_attrs, id_attrs);
+ if (!rep_id_key_required)
+ {
+ Bitmapset *attrs;
+ TupleDesc tupdesc = RelationGetDescr(relation);
+ int attidx = -1;
+
+ /*
+ * Reduce the set under review to only the unmodified indexed replica
+ * identity key attributes. idx_attrs is copied (by bms_difference())
+ * not modified here.
+ */
+ attrs = bms_difference(idx_attrs, modified_idx_attrs);
+ attrs = bms_int_members(attrs, id_attrs);
+
+ while ((attidx = bms_next_member(attrs, attidx)) >= 0)
+ {
+ /*
+ * attidx is zero-based, attrnum is the normal attribute number
+ */
+ AttrNumber attrnum = attidx + FirstLowInvalidHeapAttributeNumber;
+ Datum value;
+ bool isnull;
+
+ /*
+ * System attributes are not added into INDEX_ATTR_BITMAP_INDEXED
+ * bitmap by relcache.
+ */
+ Assert(attrnum > 0);
+
+ value = heap_getattr(&oldtup, attrnum, tupdesc, &isnull);
+
+ /* No need to check attributes that can't be stored externally */
+ if (isnull ||
+ TupleDescCompactAttr(tupdesc, attrnum - 1)->attlen != -1)
+ continue;
+
+ /* Check if the old tuple's attribute is stored externally */
+ if (VARATT_IS_EXTERNAL((struct varlena *) DatumGetPointer(value)))
+ {
+ rep_id_key_required = true;
+ break;
+ }
+ }
+
+ bms_free(attrs);
+ }
/*
* If we're not updating any "key" column, we can grab a weaker lock type.
@@ -3515,9 +3537,8 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
* is updates that don't manipulate key columns, not those that
* serendipitously arrive at the same key values.
*/
- if (!bms_overlap(modified_attrs, key_attrs))
+ if (lockmode == LockTupleNoKeyExclusive)
{
- *lockmode = LockTupleNoKeyExclusive;
mxact_status = MultiXactStatusNoKeyUpdate;
key_intact = true;
@@ -3534,7 +3555,7 @@ heap_update(Relation relation, const ItemPointerData *otid, HeapTuple newtup,
}
else
{
- *lockmode = LockTupleExclusive;
+ Assert(lockmode == LockTupleExclusive);
mxact_status = MultiXactStatusUpdate;
key_intact = false;
}
@@ -3613,7 +3634,7 @@ l2:
bool current_is_member = false;
if (DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
- *lockmode, ¤t_is_member))
+ lockmode, ¤t_is_member))
{
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
@@ -3622,7 +3643,7 @@ l2:
* requesting a lock and already have one; avoids deadlock).
*/
if (!current_is_member)
- heap_acquire_tuplock(relation, &(oldtup.t_self), *lockmode,
+ heap_acquire_tuplock(relation, &(oldtup.t_self), lockmode,
LockWaitBlock, &have_tuple_lock);
/* wait for multixact */
@@ -3707,7 +3728,7 @@ l2:
* lock.
*/
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
- heap_acquire_tuplock(relation, &(oldtup.t_self), *lockmode,
+ heap_acquire_tuplock(relation, &(oldtup.t_self), lockmode,
LockWaitBlock, &have_tuple_lock);
XactLockTableWait(xwait, relation, &oldtup.t_self,
XLTW_Update);
@@ -3767,17 +3788,14 @@ l2:
tmfd->cmax = InvalidCommandId;
UnlockReleaseBuffer(buffer);
if (have_tuple_lock)
- UnlockTupleTuplock(relation, &(oldtup.t_self), *lockmode);
+ UnlockTupleTuplock(relation, &(oldtup.t_self), lockmode);
if (vmbuffer != InvalidBuffer)
ReleaseBuffer(vmbuffer);
- *update_indexes = TU_None;
- bms_free(hot_attrs);
- bms_free(sum_attrs);
- bms_free(key_attrs);
bms_free(id_attrs);
- bms_free(modified_attrs);
- bms_free(interesting_attrs);
+ bms_free(idx_attrs);
+ /* modified_idx_attrs is owned by the caller, don't free it */
+
return result;
}
@@ -3807,7 +3825,7 @@ l2:
compute_new_xmax_infomask(HeapTupleHeaderGetRawXmax(oldtup.t_data),
oldtup.t_data->t_infomask,
oldtup.t_data->t_infomask2,
- xid, *lockmode, true,
+ xid, lockmode, true,
&xmax_old_tuple, &infomask_old_tuple,
&infomask2_old_tuple);
@@ -3924,7 +3942,7 @@ l2:
compute_new_xmax_infomask(HeapTupleHeaderGetRawXmax(oldtup.t_data),
oldtup.t_data->t_infomask,
oldtup.t_data->t_infomask2,
- xid, *lockmode, false,
+ xid, lockmode, false,
&xmax_lock_old_tuple, &infomask_lock_old_tuple,
&infomask2_lock_old_tuple);
@@ -4097,20 +4115,8 @@ l2:
* to do a HOT update. Check if any of the index columns have been
* changed.
*/
- if (!bms_overlap(modified_attrs, hot_attrs))
- {
+ if (hot_allowed)
use_hot_update = true;
-
- /*
- * If none of the columns that are used in hot-blocking indexes
- * were updated, we can apply HOT, but we do still need to check
- * if we need to update the summarizing indexes, and update those
- * indexes if the columns were updated, or we may fail to detect
- * e.g. value bound changes in BRIN minmax indexes.
- */
- if (bms_overlap(modified_attrs, sum_attrs))
- summarized_update = true;
- }
}
else
{
@@ -4126,8 +4132,7 @@ l2:
* columns are modified or it has external data.
*/
old_key_tuple = ExtractReplicaIdentity(relation, &oldtup,
- bms_overlap(modified_attrs, id_attrs) ||
- id_has_external,
+ rep_id_key_required,
&old_key_copied);
/* NO EREPORT(ERROR) from here till changes are logged */
@@ -4256,7 +4261,7 @@ l2:
* Release the lmgr tuple lock, if we had it.
*/
if (have_tuple_lock)
- UnlockTupleTuplock(relation, &(oldtup.t_self), *lockmode);
+ UnlockTupleTuplock(relation, &(oldtup.t_self), lockmode);
pgstat_count_heap_update(relation, use_hot_update, newbuf != buffer);
@@ -4270,31 +4275,12 @@ l2:
heap_freetuple(heaptup);
}
- /*
- * If it is a HOT update, the update may still need to update summarized
- * indexes, lest we fail to update those summaries and get incorrect
- * results (for example, minmax bounds of the block may change with this
- * update).
- */
- if (use_hot_update)
- {
- if (summarized_update)
- *update_indexes = TU_Summarizing;
- else
- *update_indexes = TU_None;
- }
- else
- *update_indexes = TU_All;
-
if (old_key_tuple != NULL && old_key_copied)
heap_freetuple(old_key_tuple);
- bms_free(hot_attrs);
- bms_free(sum_attrs);
- bms_free(key_attrs);
bms_free(id_attrs);
- bms_free(modified_attrs);
- bms_free(interesting_attrs);
+ bms_free(idx_attrs);
+ /* modified_idx_attrs is owned by the caller, don't free it */
return TM_Ok;
}
@@ -4467,28 +4453,115 @@ heap_attr_equals(TupleDesc tupdesc, int attrnum, Datum value1, Datum value2,
}
/*
- * Check which columns are being updated.
- *
- * Given an updated tuple, determine (and return into the output bitmapset),
- * from those listed as interesting, the set of columns that changed.
- *
- * has_external indicates if any of the unmodified attributes (from those
- * listed as interesting) of the old tuple is a member of external_cols and is
- * stored externally.
+ * HOT updates are possible when either: a) there are no modified indexed
+ * attributes, or b) the modified attributes are all on summarizing indexes.
+ * Later, in heap_update(), we can choose to perform a HOT update if there is
+ * space on the page for the new tuple and the following code has determined
+ * that HOT is allowed.
+ */
+bool
+HeapUpdateHotAllowable(Relation relation, const Bitmapset *modified_idx_attrs,
+ bool *summarized_only)
+{
+ bool hot_allowed;
+
+ /*
+ * Let's be optimistic and start off by assuming the best case, no indexes
+ * need updating and HOT is allowable.
+ */
+ hot_allowed = true;
+ *summarized_only = false;
+
+ /*
+ * Check for case (a); when there are no modified index attributes HOT is
+ * allowed.
+ */
+ if (bms_is_empty(modified_idx_attrs))
+ hot_allowed = true;
+ else
+ {
+ Bitmapset *sum_attrs = RelationGetIndexAttrBitmap(relation,
+ INDEX_ATTR_BITMAP_SUMMARIZED);
+
+ /*
+ * At least one index attribute was modified, but is this case (b)
+ * where all the modified index attributes are only used by
+ * summarizing indexes? If it is, then we need to update those
+ * indexes, but this update can still be considered heap-only (HOT)
+ * and avoid updating any non-summarizing indexes on the relation.
+ */
+ if (bms_is_subset(modified_idx_attrs, sum_attrs))
+ {
+ hot_allowed = true;
+ *summarized_only = true;
+ }
+ else
+ {
+ /*
+ * Now we know a) one or more indexed attributes were modified
+ * (changed value, not just referenced within the UPDATE) and that
+ * b) at least one of those attributes is used by a
+ * non-summarizing index. HOT is not allowed.
+ */
+ hot_allowed = false;
+ }
+
+ bms_free(sum_attrs);
+ }
+
+ return hot_allowed;
+}
+
+/*
+ * If we're not updating any attributes used when forming the index keys we can
+ * grab a weaker lock type. This allows for more concurrency when we are
+ * running simultaneously with foreign key checks.
+ */
+LockTupleMode
+HeapUpdateDetermineLockmode(Relation relation, const Bitmapset *modified_idx_attrs)
+{
+ LockTupleMode lockmode = LockTupleExclusive;
+
+ Bitmapset *key_attrs = RelationGetIndexAttrBitmap(relation,
+ INDEX_ATTR_BITMAP_KEY);
+
+ if (!bms_overlap(modified_idx_attrs, key_attrs))
+ lockmode = LockTupleNoKeyExclusive;
+
+ bms_free(key_attrs);
+
+ return lockmode;
+}
+
+/*
+ * Return a Bitmapset that contains the set of modified (changed) indexed
+ * attributes between oldtup and newtup.
*/
static Bitmapset *
-HeapDetermineColumnsInfo(Relation relation,
- Bitmapset *interesting_cols,
- Bitmapset *external_cols,
- HeapTuple oldtup, HeapTuple newtup,
- bool *has_external)
+HeapUpdateModifiedIdxAttrs(Relation relation, HeapTuple oldtup, HeapTuple newtup)
{
int attidx;
- Bitmapset *modified = NULL;
+ Bitmapset *attrs,
+ *modified_idx_attrs = NULL;
TupleDesc tupdesc = RelationGetDescr(relation);
+ /* Get the set of all attributes across all indexes for this relation */
+ attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_INDEXED);
+
+ /* No indexed attributes, we're done */
+ if (bms_is_empty(attrs))
+ return NULL;
+
+ /*
+ * This heap update function is used outside the executor and so unlike
+ * heapam_tuple_update() where there is ResultRelInfo and EState to
+ * provide the concise set of attributes that might have been modified
+ * (via ExecGetAllUpdatedCols()) we simply check all indexed attributes to
+ * find the subset that changed value. That's the "modified indexed
+ * attributes" or "modified_idx_attrs".
+ */
attidx = -1;
- while ((attidx = bms_next_member(interesting_cols, attidx)) >= 0)
+ while ((attidx = bms_next_member(attrs, attidx)) >= 0)
{
/* attidx is zero-based, attrnum is the normal attribute number */
AttrNumber attrnum = attidx + FirstLowInvalidHeapAttributeNumber;
@@ -4504,7 +4577,7 @@ HeapDetermineColumnsInfo(Relation relation,
*/
if (attrnum == 0)
{
- modified = bms_add_member(modified, attidx);
+ modified_idx_attrs = bms_add_member(modified_idx_attrs, attidx);
continue;
}
@@ -4517,7 +4590,7 @@ HeapDetermineColumnsInfo(Relation relation,
{
if (attrnum != TableOidAttributeNumber)
{
- modified = bms_add_member(modified, attidx);
+ modified_idx_attrs = bms_add_member(modified_idx_attrs, attidx);
continue;
}
}
@@ -4533,29 +4606,12 @@ HeapDetermineColumnsInfo(Relation relation,
if (!heap_attr_equals(tupdesc, attrnum, value1,
value2, isnull1, isnull2))
- {
- modified = bms_add_member(modified, attidx);
- continue;
- }
-
- /*
- * No need to check attributes that can't be stored externally. Note
- * that system attributes can't be stored externally.
- */
- if (attrnum < 0 || isnull1 ||
- TupleDescCompactAttr(tupdesc, attrnum - 1)->attlen != -1)
- continue;
-
- /*
- * Check if the old tuple's attribute is stored externally and is a
- * member of external_cols.
- */
- if (VARATT_IS_EXTERNAL((varlena *) DatumGetPointer(value1)) &&
- bms_is_member(attidx, external_cols))
- *has_external = true;
+ modified_idx_attrs = bms_add_member(modified_idx_attrs, attidx);
}
- return modified;
+ bms_free(attrs);
+
+ return modified_idx_attrs;
}
/*
@@ -4573,11 +4629,106 @@ simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tup
TM_Result result;
TM_FailureData tmfd;
LockTupleMode lockmode;
+ TupleTableSlot *slot;
+ BufferHeapTupleTableSlot *bslot;
+ HeapTuple oldtup;
+ bool shouldFree = true;
+ Bitmapset *idx_attrs,
+ *modified_idx_attrs;
+ bool hot_allowed,
+ summarized_only;
+ Buffer buffer;
- result = heap_update(relation, otid, tup,
- GetCurrentCommandId(true), InvalidSnapshot,
- true /* wait for commit */ ,
- &tmfd, &lockmode, update_indexes);
+ Assert(ItemPointerIsValid(otid));
+
+ /*
+ * Fetch this bitmap of interesting attributes from relcache before
+ * obtaining a buffer lock because if we are doing an update on one of the
+ * relevant system catalogs we could deadlock if we try to fetch them
+ * later on. Relcache will return copies of each bitmap, so we need not
+ * worry about relcache flush happening midway through this operation.
+ */
+ idx_attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_INDEXED);
+
+ INJECTION_POINT("simple_heap_update-before-pin", NULL);
+
+ /*
+ * To update a heap tuple we need to find the set of modified indexed
+ * attributes ("modified_idx_attrs") and use that to determine if a HOT
+ * update is allowable or not. When updating heap tuples via execution of
+ * UPDATE statements this set is constructed before calling into the table
+ * AM's update function by ExecUpdateModifiedIdxAttrs() which compares the
+ * old/new TupleTableSlots.
+ *
+ * Here things are a bit different, we have the old TID and the new tuple,
+ * not two TupleTableSlots, but we still need to construct a similar
+ * bitmap so as to be able to know if HOT updates are allowed or not.
+ *
+ * To do that we first have to fetch the old tuple itself, but because
+ * heapam_fetch_row_version() is static, we replicate in part that code
+ * here.
+ *
+ * This is a bit repetitive because heap_update() will again find and form
+ * the old HeapTuple from the old TID and in most cases the callers
+ * (ignoring extensions, are always catalog tuple updates) already had the
+ * set of changed attributes (the "replaces" array), but for now this
+ * minor repetition of work is necessary.
+ */
+ slot = MakeTupleTableSlot(RelationGetDescr(relation), &TTSOpsBufferHeapTuple, 0);
+ bslot = (BufferHeapTupleTableSlot *) slot;
+
+ /*
+ * Set the TID in the slot and then fetch the old tuple so we can examine
+ * it
+ */
+ bslot->base.tupdata.t_self = *otid;
+ if (!heap_fetch(relation, SnapshotAny, &bslot->base.tupdata, &buffer, false))
+ {
+ /*
+ * heap_update() checks for !ItemIdIsNormal(lp) and will return false
+ * in those cases.
+ */
+ Assert(RelationSupportsSysCache(RelationGetRelid(relation)));
+
+ *update_indexes = TU_None;
+
+ /* modified_idx_attrs not yet initialized */
+ bms_free(idx_attrs);
+ ExecDropSingleTupleTableSlot(slot);
+
+ elog(ERROR, "tuple concurrently deleted");
+
+ return;
+ }
+
+ Assert(buffer != InvalidBuffer);
+
+ /* Store in slot, transferring existing pin */
+ ExecStorePinnedBufferHeapTuple(&bslot->base.tupdata, slot, buffer);
+ oldtup = ExecFetchSlotHeapTuple(slot, false, &shouldFree);
+
+ modified_idx_attrs = HeapUpdateModifiedIdxAttrs(relation, oldtup, tup);
+ lockmode = HeapUpdateDetermineLockmode(relation, modified_idx_attrs);
+ hot_allowed = HeapUpdateHotAllowable(relation, modified_idx_attrs, &summarized_only);
+
+ result = heap_update(relation, otid, tup, GetCurrentCommandId(true),
+ InvalidSnapshot, true /* wait for commit */ ,
+ &tmfd, lockmode, modified_idx_attrs, hot_allowed);
+
+ if (shouldFree)
+ heap_freetuple(oldtup);
+
+ ExecDropSingleTupleTableSlot(slot);
+ bms_free(idx_attrs);
+
+ /*
+ * Decide whether new index entries are needed for the tuple
+ *
+ * If the update is not HOT, we must update all indexes. If the update is
+ * HOT, it could be that we updated summarized columns, so we either
+ * update only summarized indexes, or none at all.
+ */
+ *update_indexes = TU_None;
switch (result)
{
case TM_SelfModified:
@@ -4587,6 +4738,10 @@ simple_heap_update(Relation relation, const ItemPointerData *otid, HeapTuple tup
case TM_Ok:
/* done successfully */
+ if (!HeapTupleIsHeapOnly(tup))
+ *update_indexes = TU_All;
+ else if (summarized_only)
+ *update_indexes = TU_Summarizing;
break;
case TM_Updated:
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index 253a735b6c1..3726c867c65 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -27,7 +27,6 @@
#include "access/syncscan.h"
#include "access/tableam.h"
#include "access/tsmapi.h"
-#include "access/visibilitymap.h"
#include "access/xact.h"
#include "catalog/catalog.h"
#include "catalog/index.h"
@@ -325,19 +324,26 @@ heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
static TM_Result
heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
CommandId cid, Snapshot snapshot, Snapshot crosscheck,
- bool wait, TM_FailureData *tmfd,
- LockTupleMode *lockmode, TU_UpdateIndexes *update_indexes)
+ bool wait, TM_FailureData *tmfd, LockTupleMode *lockmode,
+ const Bitmapset *modified_idx_attrs, TU_UpdateIndexes *update_indexes)
{
bool shouldFree = true;
HeapTuple tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);
+ bool hot_allowed;
+ bool summarized_only;
TM_Result result;
+ Assert(ItemPointerIsValid(otid));
+
+ hot_allowed = HeapUpdateHotAllowable(relation, modified_idx_attrs, &summarized_only);
+ *lockmode = HeapUpdateDetermineLockmode(relation, modified_idx_attrs);
+
/* Update the tuple with table oid */
slot->tts_tableOid = RelationGetRelid(relation);
tuple->t_tableOid = slot->tts_tableOid;
result = heap_update(relation, otid, tuple, cid, crosscheck, wait,
- tmfd, lockmode, update_indexes);
+ tmfd, *lockmode, modified_idx_attrs, hot_allowed);
ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
/*
@@ -350,16 +356,17 @@ heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot,
* HOT, it could be that we updated summarized columns, so we either
* update only summarized indexes, or none at all.
*/
- if (result != TM_Ok)
+ *update_indexes = TU_None;
+ if (result == TM_Ok)
{
- Assert(*update_indexes == TU_None);
- *update_indexes = TU_None;
+ if (HeapTupleIsHeapOnly(tuple))
+ {
+ if (summarized_only)
+ *update_indexes = TU_Summarizing;
+ }
+ else
+ *update_indexes = TU_All;
}
- else if (!HeapTupleIsHeapOnly(tuple))
- Assert(*update_indexes == TU_All);
- else
- Assert((*update_indexes == TU_Summarizing) ||
- (*update_indexes == TU_None));
if (shouldFree)
pfree(tuple);
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index dfda1af412e..9ba72d51dfa 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -359,6 +359,7 @@ void
simple_table_tuple_update(Relation rel, ItemPointer otid,
TupleTableSlot *slot,
Snapshot snapshot,
+ const Bitmapset *modified_idx_attrs,
TU_UpdateIndexes *update_indexes)
{
TM_Result result;
@@ -369,7 +370,9 @@ simple_table_tuple_update(Relation rel, ItemPointer otid,
GetCurrentCommandId(true),
snapshot, InvalidSnapshot,
true /* wait for commit */ ,
- &tmfd, &lockmode, update_indexes);
+ &tmfd, &lockmode,
+ modified_idx_attrs,
+ update_indexes);
switch (result)
{
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 2497ee7edc5..8a269dd2f6c 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -33,6 +33,7 @@
#include "utils/builtins.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
+#include "utils/relcache.h"
#include "utils/snapmgr.h"
#include "utils/syscache.h"
#include "utils/typcache.h"
@@ -906,6 +907,7 @@ ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
bool skip_tuple = false;
Relation rel = resultRelInfo->ri_RelationDesc;
ItemPointer tid = &(searchslot->tts_tid);
+ Bitmapset *modified_idx_attrs;
/*
* We support only non-system tables, with
@@ -944,8 +946,13 @@ ExecSimpleRelationUpdate(ResultRelInfo *resultRelInfo,
if (rel->rd_rel->relispartition)
ExecPartitionCheck(resultRelInfo, slot, estate, true);
+ modified_idx_attrs = ExecUpdateModifiedIdxAttrs(resultRelInfo,
+ searchslot, slot);
+
simple_table_tuple_update(rel, tid, slot, estate->es_snapshot,
- &update_indexes);
+ modified_idx_attrs, &update_indexes);
+ bms_free(modified_idx_attrs);
+
conflictindexes = resultRelInfo->ri_onConflictArbiterIndexes;
diff --git a/src/backend/executor/execTuples.c b/src/backend/executor/execTuples.c
index 9d900147a55..19054109a77 100644
--- a/src/backend/executor/execTuples.c
+++ b/src/backend/executor/execTuples.c
@@ -66,6 +66,7 @@
#include "nodes/nodeFuncs.h"
#include "storage/bufmgr.h"
#include "utils/builtins.h"
+#include "utils/datum.h"
#include "utils/expandeddatum.h"
#include "utils/lsyscache.h"
#include "utils/typcache.h"
@@ -2005,6 +2006,75 @@ ExecFetchSlotHeapTupleDatum(TupleTableSlot *slot)
return ret;
}
+/*
+ * ExecCompareSlotAttrs
+ *
+ * Compare the subset of attributes in attrs bewtween TupleTableSlots to detect
+ * which attributes have changed.
+ *
+ * Returns a reused when possible Bitmapset of attribute indices (using
+ * FirstLowInvalidHeapAttributeNumber convention) that differ between the two
+ * slots.
+ */
+Bitmapset *
+ExecCompareSlotAttrs(Bitmapset *attrs, TupleDesc tupdesc,
+ TupleTableSlot *s1, TupleTableSlot *s2)
+{
+ int attidx = -1;
+
+ while ((attidx = bms_next_member(attrs, attidx)) >= 0)
+ {
+ /* attidx is zero-based, attrnum is the normal attribute number */
+ AttrNumber attrnum = attidx + FirstLowInvalidHeapAttributeNumber;
+ Datum value1,
+ value2;
+ bool null1,
+ null2;
+ CompactAttribute *att;
+
+ /*
+ * If it's a whole-tuple reference, say "not equal". It's not really
+ * worth supporting this case, since it could only succeed after a
+ * no-op update, which is hardly a case worth optimizing for.
+ */
+ if (attrnum == 0)
+ continue;
+
+ /*
+ * Likewise, automatically say "not equal" for any system attribute
+ * other than tableOID; we cannot expect these to be consistent in a
+ * HOT chain, or even to be set correctly yet in the new tuple.
+ */
+ if (attrnum < 0)
+ {
+ if (attrnum == TableOidAttributeNumber)
+ attrs = bms_del_member(attrs, attidx);
+ else
+ continue;
+ }
+
+ att = TupleDescCompactAttr(tupdesc, attrnum - 1);
+ value1 = slot_getattr(s1, attrnum, &null1);
+ value2 = slot_getattr(s2, attrnum, &null2);
+
+ /* A change to/from NULL, so not equal */
+ if (null1 != null2)
+ continue;
+
+ /* Both NULL, no change/unmodified */
+ if (null2)
+ {
+ attrs = bms_del_member(attrs, attidx);
+ continue;
+ }
+
+ if (datum_image_eq(value1, value2, att->attbyval, att->attlen))
+ attrs = bms_del_member(attrs, attidx);
+ }
+
+ return attrs;
+}
+
/* ----------------------------------------------------------------
* convenience initialization routines
* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 4cd5e262e0f..4c0c5a03026 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -17,6 +17,7 @@
* ExecModifyTable - retrieve the next tuple from the node
* ExecEndModifyTable - shut down the ModifyTable node
* ExecReScanModifyTable - rescan the ModifyTable node
+ * ExecUpdateModifiedIdxAttrs - find set of updated indexed columns
*
* NOTES
* The ModifyTable node receives input from its outerPlan, which is
@@ -55,6 +56,7 @@
#include "access/htup_details.h"
#include "access/tableam.h"
#include "access/tupconvert.h"
+#include "access/tupdesc.h"
#include "access/xact.h"
#include "commands/trigger.h"
#include "executor/execPartition.h"
@@ -190,6 +192,63 @@ static TupleTableSlot *ExecMergeNotMatched(ModifyTableContext *context,
ResultRelInfo *resultRelInfo,
bool canSetTag);
+/*
+ * ExecUpdateModifiedIdxAttrs
+ *
+ * Find the set of attributes referenced by this relation and used in this
+ * UPDATE that now differ in value. This is done by reviewing slot datum that
+ * are in the UPDATE statment and are known to be referenced by at least one
+ * index in some way. This set is called the "modified indexed attributes" or
+ * "modified_idx_attrs". An overlap of a single index's attributes and this
+ * modified_idx_attrs set signals that the attributes in the new_tts used to
+ * form the index datum have changed.
+ *
+ * Return a Bitmapset that contains the set of modified (changed) indexed
+ * attributes between oldtup and newtup.
+ *
+ * Note: There is a similar function called HeapUpdateModifiedIdxAttrs() that operates
+ * on the old TID and new HeapTuple rather than the old/new TupleTableSlots as
+ * this function does. These two functions should mirror one another until
+ * someday when catalog tuple updates track their changes avoiding the need to
+ * re-discover them in simple_heap_update().
+ */
+Bitmapset *
+ExecUpdateModifiedIdxAttrs(ResultRelInfo *resultRelInfo,
+ TupleTableSlot *old_tts,
+ TupleTableSlot *new_tts)
+{
+ Relation relation = resultRelInfo->ri_RelationDesc;
+ TupleDesc tupdesc = RelationGetDescr(relation);
+ Bitmapset *attrs;
+
+ /* If no indexes, we're done */
+ if (resultRelInfo->ri_NumIndices == 0)
+ return NULL;
+
+ /*
+ * Get the set of all attributes across all indexes for this relation from
+ * the relcache, it returns us a copy of the bitmap so we can modify it.
+ *
+ * Note: We intentionally scan all indexed columns when looking for
+ * changes rather than reduce that set by intersecting it with
+ * ExecGetAllUpdatedCols(). Desipte the name it provides the set of
+ * targeted attributes in the SQL used for the UPDATE and any triggers,
+ * but that doesn't include any attributes updated using
+ * heap_modifiy_tuple(). There is one test in tsearch.sql that does just
+ * that, modifies an indexed attribute that isn't specified in the SQL and
+ * so isn't present in that bitmapset.
+ */
+ attrs = RelationGetIndexAttrBitmap(relation, INDEX_ATTR_BITMAP_INDEXED);
+
+ /*
+ * When there are indexed attributes mentioned in the UPDATE then we need
+ * to find the subset that changed value. That's the
+ * "modified_idx_attrs".
+ */
+ attrs = ExecCompareSlotAttrs(attrs, tupdesc, old_tts, new_tts);
+
+ return attrs;
+}
/*
* Verify that the tuples to be produced by INSERT match the
@@ -2197,14 +2256,17 @@ ExecUpdatePrepareSlot(ResultRelInfo *resultRelInfo,
*/
static TM_Result
ExecUpdateAct(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
- ItemPointer tupleid, HeapTuple oldtuple, TupleTableSlot *slot,
- bool canSetTag, UpdateContext *updateCxt)
+ ItemPointer tupleid, HeapTuple oldtuple, TupleTableSlot *oldSlot,
+ TupleTableSlot *slot, bool canSetTag, UpdateContext *updateCxt)
{
EState *estate = context->estate;
Relation resultRelationDesc = resultRelInfo->ri_RelationDesc;
bool partition_constraint_failed;
TM_Result result;
+ /* The set of modified indexed attributes that trigger new index entries */
+ Bitmapset *modified_idx_attrs = NULL;
+
updateCxt->crossPartUpdate = false;
/*
@@ -2321,7 +2383,16 @@ lreplace:
ExecConstraints(resultRelInfo, slot, estate);
/*
- * replace the heap tuple
+ * Next up we need to find out the set of indexed attributes that have
+ * changed in value and should trigger a new index tuple. We could start
+ * with the set of updated columns via ExecGetUpdatedCols(), but if we do
+ * we will overlook attributes directly modified by heap_modify_tuple()
+ * which are not known to ExecGetUpdatedCols().
+ */
+ modified_idx_attrs = ExecUpdateModifiedIdxAttrs(resultRelInfo, oldSlot, slot);
+
+ /*
+ * Call into the table AM to update the heap tuple.
*
* Note: if es_crosscheck_snapshot isn't InvalidSnapshot, we check that
* the row to be updated is visible to that snapshot, and throw a
@@ -2335,6 +2406,7 @@ lreplace:
estate->es_crosscheck_snapshot,
true /* wait for commit */ ,
&context->tmfd, &updateCxt->lockmode,
+ modified_idx_attrs,
&updateCxt->updateIndexes);
return result;
@@ -2557,8 +2629,8 @@ ExecUpdate(ModifyTableContext *context, ResultRelInfo *resultRelInfo,
*/
redo_act:
lockedtid = *tupleid;
- result = ExecUpdateAct(context, resultRelInfo, tupleid, oldtuple, slot,
- canSetTag, &updateCxt);
+ result = ExecUpdateAct(context, resultRelInfo, tupleid, oldtuple, oldSlot,
+ slot, canSetTag, &updateCxt);
/*
* If ExecUpdateAct reports that a cross-partition update was done,
@@ -3408,8 +3480,8 @@ lmerge_matched:
Assert(oldtuple == NULL);
result = ExecUpdateAct(context, resultRelInfo, tupleid,
- NULL, newslot, canSetTag,
- &updateCxt);
+ NULL, resultRelInfo->ri_oldTupleSlot,
+ newslot, canSetTag, &updateCxt);
/*
* As in ExecUpdate(), if ExecUpdateAct() reports that a
@@ -4546,7 +4618,7 @@ ExecModifyTable(PlanState *pstate)
* For UPDATE/DELETE/MERGE, fetch the row identity info for the tuple
* to be updated/deleted/merged. For a heap relation, that's a TID;
* otherwise we may have a wholerow junk attr that carries the old
- * tuple in toto. Keep this in step with the part of
+ * tuple in total. Keep this in step with the part of
* ExecInitModifyTable that sets up ri_RowIdAttNo.
*/
if (operation == CMD_UPDATE || operation == CMD_DELETE ||
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 3a4f19e8d58..f2b7fb8f444 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -2469,7 +2469,7 @@ RelationDestroyRelation(Relation relation, bool remember_tupdesc)
bms_free(relation->rd_keyattr);
bms_free(relation->rd_pkattr);
bms_free(relation->rd_idattr);
- bms_free(relation->rd_hotblockingattr);
+ bms_free(relation->rd_indexedattr);
bms_free(relation->rd_summarizedattr);
if (relation->rd_pubdesc)
pfree(relation->rd_pubdesc);
@@ -5271,8 +5271,8 @@ RelationGetIndexPredicate(Relation relation)
* (beware: even if PK is deferrable!)
* INDEX_ATTR_BITMAP_IDENTITY_KEY Columns in the table's replica identity
* index (empty if FULL)
- * INDEX_ATTR_BITMAP_HOT_BLOCKING Columns that block updates from being HOT
- * INDEX_ATTR_BITMAP_SUMMARIZED Columns included in summarizing indexes
+ * INDEX_ATTR_BITMAP_INDEXED Columns referenced by indexes
+ * INDEX_ATTR_BITMAP_SUMMARIZED Columns only included in summarizing indexes
*
* Attribute numbers are offset by FirstLowInvalidHeapAttributeNumber so that
* we can include system attributes (e.g., OID) in the bitmap representation.
@@ -5295,8 +5295,8 @@ RelationGetIndexAttrBitmap(Relation relation, IndexAttrBitmapKind attrKind)
Bitmapset *uindexattrs; /* columns in unique indexes */
Bitmapset *pkindexattrs; /* columns in the primary index */
Bitmapset *idindexattrs; /* columns in the replica identity */
- Bitmapset *hotblockingattrs; /* columns with HOT blocking indexes */
- Bitmapset *summarizedattrs; /* columns with summarizing indexes */
+ Bitmapset *indexedattrs; /* columns referenced by indexes */
+ Bitmapset *summarizedattrs; /* columns only in summarizing indexes */
List *indexoidlist;
List *newindexoidlist;
Oid relpkindex;
@@ -5315,8 +5315,8 @@ RelationGetIndexAttrBitmap(Relation relation, IndexAttrBitmapKind attrKind)
return bms_copy(relation->rd_pkattr);
case INDEX_ATTR_BITMAP_IDENTITY_KEY:
return bms_copy(relation->rd_idattr);
- case INDEX_ATTR_BITMAP_HOT_BLOCKING:
- return bms_copy(relation->rd_hotblockingattr);
+ case INDEX_ATTR_BITMAP_INDEXED:
+ return bms_copy(relation->rd_indexedattr);
case INDEX_ATTR_BITMAP_SUMMARIZED:
return bms_copy(relation->rd_summarizedattr);
default:
@@ -5361,7 +5361,7 @@ restart:
uindexattrs = NULL;
pkindexattrs = NULL;
idindexattrs = NULL;
- hotblockingattrs = NULL;
+ indexedattrs = NULL;
summarizedattrs = NULL;
foreach(l, indexoidlist)
{
@@ -5421,7 +5421,7 @@ restart:
if (indexDesc->rd_indam->amsummarizing)
attrs = &summarizedattrs;
else
- attrs = &hotblockingattrs;
+ attrs = &indexedattrs;
/* Collect simple attribute references */
for (i = 0; i < indexDesc->rd_index->indnatts; i++)
@@ -5430,9 +5430,9 @@ restart:
/*
* Since we have covering indexes with non-key columns, we must
- * handle them accurately here. non-key columns must be added into
- * hotblockingattrs or summarizedattrs, since they are in index,
- * and update shouldn't miss them.
+ * handle them accurately here. Non-key columns must be added into
+ * indexedattrs or summarizedattrs, since they are in index, and
+ * update shouldn't miss them.
*
* Summarizing indexes do not block HOT, but do need to be updated
* when the column value changes, thus require a separate
@@ -5493,12 +5493,20 @@ restart:
bms_free(uindexattrs);
bms_free(pkindexattrs);
bms_free(idindexattrs);
- bms_free(hotblockingattrs);
+ bms_free(indexedattrs);
bms_free(summarizedattrs);
goto restart;
}
+ /*
+ * Record what attributes are only referenced by summarizing indexes. Then
+ * add that into the other indexed attributes to track all referenced
+ * attributes.
+ */
+ summarizedattrs = bms_del_members(summarizedattrs, indexedattrs);
+ indexedattrs = bms_add_members(indexedattrs, summarizedattrs);
+
/* Don't leak the old values of these bitmaps, if any */
relation->rd_attrsvalid = false;
bms_free(relation->rd_keyattr);
@@ -5507,8 +5515,8 @@ restart:
relation->rd_pkattr = NULL;
bms_free(relation->rd_idattr);
relation->rd_idattr = NULL;
- bms_free(relation->rd_hotblockingattr);
- relation->rd_hotblockingattr = NULL;
+ bms_free(relation->rd_indexedattr);
+ relation->rd_indexedattr = NULL;
bms_free(relation->rd_summarizedattr);
relation->rd_summarizedattr = NULL;
@@ -5523,7 +5531,7 @@ restart:
relation->rd_keyattr = bms_copy(uindexattrs);
relation->rd_pkattr = bms_copy(pkindexattrs);
relation->rd_idattr = bms_copy(idindexattrs);
- relation->rd_hotblockingattr = bms_copy(hotblockingattrs);
+ relation->rd_indexedattr = bms_copy(indexedattrs);
relation->rd_summarizedattr = bms_copy(summarizedattrs);
relation->rd_attrsvalid = true;
MemoryContextSwitchTo(oldcxt);
@@ -5537,8 +5545,8 @@ restart:
return pkindexattrs;
case INDEX_ATTR_BITMAP_IDENTITY_KEY:
return idindexattrs;
- case INDEX_ATTR_BITMAP_HOT_BLOCKING:
- return hotblockingattrs;
+ case INDEX_ATTR_BITMAP_INDEXED:
+ return indexedattrs;
case INDEX_ATTR_BITMAP_SUMMARIZED:
return summarizedattrs;
default:
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 305ecc31a9e..31a688cf05b 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -403,10 +403,9 @@ extern TM_Result heap_delete(Relation relation, const ItemPointerData *tid,
extern void heap_finish_speculative(Relation relation, const ItemPointerData *tid);
extern void heap_abort_speculative(Relation relation, const ItemPointerData *tid);
extern TM_Result heap_update(Relation relation, const ItemPointerData *otid,
- HeapTuple newtup,
- CommandId cid, Snapshot crosscheck, bool wait,
- TM_FailureData *tmfd, LockTupleMode *lockmode,
- TU_UpdateIndexes *update_indexes);
+ HeapTuple newtup, CommandId cid, Snapshot crosscheck, bool wait,
+ TM_FailureData *tmfd, const LockTupleMode lockmode,
+ const Bitmapset *modified_idx_attrs, const bool hot_allowed);
extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
CommandId cid, LockTupleMode mode, LockWaitPolicy wait_policy,
bool follow_updates,
@@ -469,6 +468,12 @@ extern void log_heap_prune_and_freeze(Relation relation, Buffer buffer,
OffsetNumber *dead, int ndead,
OffsetNumber *unused, int nunused);
+/* in heap/heapam.c */
+extern bool HeapUpdateHotAllowable(Relation relation, const Bitmapset *modified_idx_attrs,
+ bool *summarized_only);
+extern LockTupleMode HeapUpdateDetermineLockmode(Relation relation,
+ const Bitmapset *modified_idx_attrs);
+
/* in heap/vacuumlazy.c */
extern void heap_vacuum_rel(Relation rel,
const VacuumParams params, BufferAccessStrategy bstrategy);
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 06084752245..8ec20dcfc11 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -549,6 +549,7 @@ typedef struct TableAmRoutine
bool wait,
TM_FailureData *tmfd,
LockTupleMode *lockmode,
+ const Bitmapset *modified_idx_attrs,
TU_UpdateIndexes *update_indexes);
/* see table_tuple_lock() for reference about parameters */
@@ -1523,12 +1524,12 @@ static inline TM_Result
table_tuple_update(Relation rel, ItemPointer otid, TupleTableSlot *slot,
CommandId cid, Snapshot snapshot, Snapshot crosscheck,
bool wait, TM_FailureData *tmfd, LockTupleMode *lockmode,
- TU_UpdateIndexes *update_indexes)
+ const Bitmapset *modified_idx_attrs, TU_UpdateIndexes *update_indexes)
{
return rel->rd_tableam->tuple_update(rel, otid, slot,
cid, snapshot, crosscheck,
- wait, tmfd,
- lockmode, update_indexes);
+ wait, tmfd, lockmode,
+ modified_idx_attrs, update_indexes);
}
/*
@@ -2009,6 +2010,7 @@ extern void simple_table_tuple_delete(Relation rel, ItemPointer tid,
Snapshot snapshot);
extern void simple_table_tuple_update(Relation rel, ItemPointer otid,
TupleTableSlot *slot, Snapshot snapshot,
+ const Bitmapset *modified_idx_attrs,
TU_UpdateIndexes *update_indexes);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 07f4b1f7490..d294789c441 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -18,6 +18,7 @@
#include "datatype/timestamp.h"
#include "executor/execdesc.h"
#include "fmgr.h"
+#include "nodes/execnodes.h"
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
@@ -610,6 +611,10 @@ extern TupleDesc ExecCleanTypeFromTL(List *targetList);
extern TupleDesc ExecTypeFromExprList(List *exprList);
extern void ExecTypeSetColNames(TupleDesc typeInfo, List *namesList);
extern void UpdateChangedParamSet(PlanState *node, Bitmapset *newchg);
+extern Bitmapset *ExecCompareSlotAttrs(Bitmapset *attrs,
+ TupleDesc tupdesc,
+ TupleTableSlot *old_tts,
+ TupleTableSlot *new_tts);
typedef struct TupOutputState
{
@@ -807,5 +812,8 @@ extern ResultRelInfo *ExecLookupResultRelByOid(ModifyTableState *node,
Oid resultoid,
bool missing_ok,
bool update_cache);
+extern Bitmapset *ExecUpdateModifiedIdxAttrs(ResultRelInfo *relinfo,
+ TupleTableSlot *old_tts,
+ TupleTableSlot *new_tts);
#endif /* EXECUTOR_H */
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 236830f6b93..11460e134f0 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -162,7 +162,7 @@ typedef struct RelationData
Bitmapset *rd_keyattr; /* cols that can be ref'd by foreign keys */
Bitmapset *rd_pkattr; /* cols included in primary key */
Bitmapset *rd_idattr; /* included in replica identity index */
- Bitmapset *rd_hotblockingattr; /* cols blocking HOT update */
+ Bitmapset *rd_indexedattr; /* all cols referenced by indexes */
Bitmapset *rd_summarizedattr; /* cols indexed by summarizing indexes */
PublicationDesc *rd_pubdesc; /* publication descriptor, or NULL */
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 2700224939a..d4db82496b4 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -69,7 +69,7 @@ typedef enum IndexAttrBitmapKind
INDEX_ATTR_BITMAP_KEY,
INDEX_ATTR_BITMAP_PRIMARY_KEY,
INDEX_ATTR_BITMAP_IDENTITY_KEY,
- INDEX_ATTR_BITMAP_HOT_BLOCKING,
+ INDEX_ATTR_BITMAP_INDEXED,
INDEX_ATTR_BITMAP_SUMMARIZED,
} IndexAttrBitmapKind;
diff --git a/src/test/modules/injection_points/expected/syscache-update-pruned.out b/src/test/modules/injection_points/expected/syscache-update-pruned.out
index a6a4e8db996..07ef67a1eb4 100644
--- a/src/test/modules/injection_points/expected/syscache-update-pruned.out
+++ b/src/test/modules/injection_points/expected/syscache-update-pruned.out
@@ -16,8 +16,8 @@ step wakeinval4:
step at2: <... completed>
step wakeinval4: <... completed>
step wakegrant4:
- SELECT FROM injection_points_detach('heap_update-before-pin');
- SELECT FROM injection_points_wakeup('heap_update-before-pin');
+ SELECT FROM injection_points_detach('simple_heap_update-before-pin');
+ SELECT FROM injection_points_wakeup('simple_heap_update-before-pin');
<waiting ...>
step grant1: <... completed>
ERROR: tuple concurrently deleted
@@ -42,8 +42,8 @@ step mkrels4:
SELECT FROM vactest.mkrels('intruder', 1, 100); -- repopulate LP_UNUSED
step wakegrant4:
- SELECT FROM injection_points_detach('heap_update-before-pin');
- SELECT FROM injection_points_wakeup('heap_update-before-pin');
+ SELECT FROM injection_points_detach('simple_heap_update-before-pin');
+ SELECT FROM injection_points_wakeup('simple_heap_update-before-pin');
<waiting ...>
step grant1: <... completed>
ERROR: duplicate key value violates unique constraint "pg_class_oid_index"
@@ -71,8 +71,8 @@ step at2: <... completed>
step wakeinval4: <... completed>
step at4: ALTER TABLE vactest.child50 INHERIT vactest.orig50;
step wakegrant4:
- SELECT FROM injection_points_detach('heap_update-before-pin');
- SELECT FROM injection_points_wakeup('heap_update-before-pin');
+ SELECT FROM injection_points_detach('simple_heap_update-before-pin');
+ SELECT FROM injection_points_wakeup('simple_heap_update-before-pin');
<waiting ...>
step grant1: <... completed>
step wakegrant4: <... completed>
diff --git a/src/test/modules/injection_points/specs/syscache-update-pruned.spec b/src/test/modules/injection_points/specs/syscache-update-pruned.spec
index e3a4295bd12..fef9ac895a1 100644
--- a/src/test/modules/injection_points/specs/syscache-update-pruned.spec
+++ b/src/test/modules/injection_points/specs/syscache-update-pruned.spec
@@ -103,7 +103,7 @@ session s1
setup {
SET debug_discard_caches = 0;
SELECT FROM injection_points_set_local();
- SELECT FROM injection_points_attach('heap_update-before-pin', 'wait');
+ SELECT FROM injection_points_attach('simple_heap_update-before-pin', 'wait');
}
step cachefill1 { SELECT FROM vactest.reloid_catcache_set('vactest.orig50'); }
step grant1 { GRANT SELECT ON vactest.orig50 TO PUBLIC; }
@@ -140,8 +140,8 @@ step mkrels4 {
SELECT FROM vactest.mkrels('intruder', 1, 100); -- repopulate LP_UNUSED
}
step wakegrant4 {
- SELECT FROM injection_points_detach('heap_update-before-pin');
- SELECT FROM injection_points_wakeup('heap_update-before-pin');
+ SELECT FROM injection_points_detach('simple_heap_update-before-pin');
+ SELECT FROM injection_points_wakeup('simple_heap_update-before-pin');
}
step at4 { ALTER TABLE vactest.child50 INHERIT vactest.orig50; }
step wakeinval4 {
diff --git a/src/test/regress/expected/generated_virtual.out b/src/test/regress/expected/generated_virtual.out
index 6dab60c937b..7ebb7890d96 100644
--- a/src/test/regress/expected/generated_virtual.out
+++ b/src/test/regress/expected/generated_virtual.out
@@ -287,7 +287,7 @@ DETAIL: Column "b" is a generated column.
INSERT INTO gtest1v VALUES (8, DEFAULT), (9, DEFAULT); -- error
ERROR: cannot insert a non-DEFAULT value into column "b"
DETAIL: Column "b" is a generated column.
-SELECT * FROM gtest1v;
+SELECT * FROM gtest1v ORDER BY a;
a | b
---+----
3 | 6
diff --git a/src/test/regress/expected/triggers.out b/src/test/regress/expected/triggers.out
index 98dee63b50a..ef98fd0cccf 100644
--- a/src/test/regress/expected/triggers.out
+++ b/src/test/regress/expected/triggers.out
@@ -959,16 +959,24 @@ NOTICE: main_view BEFORE UPDATE STATEMENT (before_view_upd_stmt)
NOTICE: main_view AFTER UPDATE STATEMENT (after_view_upd_stmt)
UPDATE 0
-- Delete from view using trigger
-DELETE FROM main_view WHERE a IN (20,21);
+DELETE FROM main_view WHERE a = 20 AND b = 31;
NOTICE: main_view BEFORE DELETE STATEMENT (before_view_del_stmt)
NOTICE: main_view INSTEAD OF DELETE ROW (instead_of_del)
-NOTICE: OLD: (21,10)
-NOTICE: main_view INSTEAD OF DELETE ROW (instead_of_del)
NOTICE: OLD: (20,31)
+NOTICE: main_view AFTER DELETE STATEMENT (after_view_del_stmt)
+DELETE 1
+DELETE FROM main_view WHERE a = 21 AND b = 10;
+NOTICE: main_view BEFORE DELETE STATEMENT (before_view_del_stmt)
+NOTICE: main_view INSTEAD OF DELETE ROW (instead_of_del)
+NOTICE: OLD: (21,10)
+NOTICE: main_view AFTER DELETE STATEMENT (after_view_del_stmt)
+DELETE 1
+DELETE FROM main_view WHERE a = 21 AND b = 32;
+NOTICE: main_view BEFORE DELETE STATEMENT (before_view_del_stmt)
NOTICE: main_view INSTEAD OF DELETE ROW (instead_of_del)
NOTICE: OLD: (21,32)
NOTICE: main_view AFTER DELETE STATEMENT (after_view_del_stmt)
-DELETE 3
+DELETE 1
DELETE FROM main_view WHERE a = 31 RETURNING a, b;
NOTICE: main_view BEFORE DELETE STATEMENT (before_view_del_stmt)
NOTICE: main_view INSTEAD OF DELETE ROW (instead_of_del)
diff --git a/src/test/regress/expected/tsearch.out b/src/test/regress/expected/tsearch.out
index 9287c440709..c604ec35fa5 100644
--- a/src/test/regress/expected/tsearch.out
+++ b/src/test/regress/expected/tsearch.out
@@ -2483,7 +2483,8 @@ SELECT to_tsquery('SKIES & My | booKs');
'sky' | 'book'
(1 row)
---trigger
+-- tsvector_update_trigger() uses heap_modify_tuple() to set column 'a'
+-- without going through the executor's SET-clause tracking.
CREATE TRIGGER tsvectorupdate
BEFORE UPDATE OR INSERT ON test_tsvector
FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger(a, 'pg_catalog.english', t);
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/updatable_views.out
index 9cea538b8e8..4877a1ddce9 100644
--- a/src/test/regress/expected/updatable_views.out
+++ b/src/test/regress/expected/updatable_views.out
@@ -372,15 +372,15 @@ INSERT INTO rw_view16 (a, b) VALUES (3, 'Row 3'); -- should be OK
UPDATE rw_view16 SET a=3, aa=-3 WHERE a=3; -- should fail
ERROR: multiple assignments to same column "a"
UPDATE rw_view16 SET aa=-3 WHERE a=3; -- should be OK
-SELECT * FROM base_tbl;
+SELECT * FROM base_tbl ORDER BY a;
a | b
----+--------
+ -3 | Row 3
-2 | Row -2
-1 | Row -1
0 | Row 0
1 | Row 1
2 | Row 2
- -3 | Row 3
(6 rows)
DELETE FROM rw_view16 WHERE a=-3; -- should be OK
diff --git a/src/test/regress/sql/generated_virtual.sql b/src/test/regress/sql/generated_virtual.sql
index e750866d2d8..877152d6d69 100644
--- a/src/test/regress/sql/generated_virtual.sql
+++ b/src/test/regress/sql/generated_virtual.sql
@@ -127,7 +127,7 @@ ALTER VIEW gtest1v ALTER COLUMN b SET DEFAULT 100;
INSERT INTO gtest1v VALUES (8, DEFAULT); -- error
INSERT INTO gtest1v VALUES (8, DEFAULT), (9, DEFAULT); -- error
-SELECT * FROM gtest1v;
+SELECT * FROM gtest1v ORDER BY a;
DELETE FROM gtest1v WHERE a >= 5;
DROP VIEW gtest1v;
diff --git a/src/test/regress/sql/triggers.sql b/src/test/regress/sql/triggers.sql
index ea39817ee3d..6ceb61608ae 100644
--- a/src/test/regress/sql/triggers.sql
+++ b/src/test/regress/sql/triggers.sql
@@ -660,7 +660,9 @@ UPDATE main_view SET b = 32 WHERE a = 21 AND b = 31 RETURNING a, b;
UPDATE main_view SET b = 0 WHERE false;
-- Delete from view using trigger
-DELETE FROM main_view WHERE a IN (20,21);
+DELETE FROM main_view WHERE a = 20 AND b = 31;
+DELETE FROM main_view WHERE a = 21 AND b = 10;
+DELETE FROM main_view WHERE a = 21 AND b = 32;
DELETE FROM main_view WHERE a = 31 RETURNING a, b;
\set QUIET true
diff --git a/src/test/regress/sql/tsearch.sql b/src/test/regress/sql/tsearch.sql
index dc74aa0c889..77ac5fd3c5a 100644
--- a/src/test/regress/sql/tsearch.sql
+++ b/src/test/regress/sql/tsearch.sql
@@ -752,7 +752,8 @@ SELECT to_tsvector('SKIES My booKs');
SELECT plainto_tsquery('SKIES My booKs');
SELECT to_tsquery('SKIES & My | booKs');
---trigger
+-- tsvector_update_trigger() uses heap_modify_tuple() to set column 'a'
+-- without going through the executor's SET-clause tracking.
CREATE TRIGGER tsvectorupdate
BEFORE UPDATE OR INSERT ON test_tsvector
FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger(a, 'pg_catalog.english', t);
diff --git a/src/test/regress/sql/updatable_views.sql b/src/test/regress/sql/updatable_views.sql
index 1635adde2d4..160e7799715 100644
--- a/src/test/regress/sql/updatable_views.sql
+++ b/src/test/regress/sql/updatable_views.sql
@@ -125,7 +125,7 @@ INSERT INTO rw_view16 VALUES (3, 'Row 3', 3); -- should fail
INSERT INTO rw_view16 (a, b) VALUES (3, 'Row 3'); -- should be OK
UPDATE rw_view16 SET a=3, aa=-3 WHERE a=3; -- should fail
UPDATE rw_view16 SET aa=-3 WHERE a=3; -- should be OK
-SELECT * FROM base_tbl;
+SELECT * FROM base_tbl ORDER BY a;
DELETE FROM rw_view16 WHERE a=-3; -- should be OK
-- Read-only views
INSERT INTO ro_view17 VALUES (3, 'ROW 3');
--
2.51.2
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: Expanding HOT updates for expression and partial indexes
2026-02-16 19:36 Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-02-17 21:15 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-19 20:43 ` Re: Expanding HOT updates for expression and partial indexes Andres Freund <[email protected]>
2026-02-19 22:31 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-23 19:23 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-25 21:03 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-02-26 22:08 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-26 23:01 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-02 19:08 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-11 15:51 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-12 20:33 ` Re: Expanding HOT updates for expression and partial indexes Nathan Bossart <[email protected]>
2026-03-12 21:31 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-15 21:11 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-03-16 16:23 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-16 17:55 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-03-17 16:38 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-03-17 18:04 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-23 18:39 ` Re: Expanding HOT updates for expression and partial indexes Nathan Bossart <[email protected]>
2026-03-24 18:02 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
@ 2026-03-24 19:44 ` Nathan Bossart <[email protected]>
2026-03-25 17:16 ` Re: Expanding HOT updates for expression and partial indexes Nathan Bossart <[email protected]>
0 siblings, 1 reply; 24+ messages in thread
From: Nathan Bossart @ 2026-03-24 19:44 UTC (permalink / raw)
To: Greg Burd <[email protected]>; +Cc: Jeff Davis <[email protected]>; pgsql-hackers
On Tue, Mar 24, 2026 at 02:02:07PM -0400, Greg Burd wrote:
> On Mon, Mar 23, 2026, at 2:39 PM, Nathan Bossart wrote:
>> On Tue, Mar 17, 2026 at 02:04:11PM -0400, Greg Burd wrote:
>>> - * INDEX_ATTR_BITMAP_SUMMARIZED Columns included in summarizing indexes
>>> + * INDEX_ATTR_BITMAP_INDEXED Columns referenced by indexes
>>> + * INDEX_ATTR_BITMAP_SUMMARIZED Columns only included in summarizing indexes
>>
>>> - Bitmapset *summarizedattrs; /* columns with summarizing indexes */
>>> + Bitmapset *indexedattrs; /* columns referenced by indexes */
>>> + Bitmapset *summarizedattrs; /* columns only in summarizing indexes */
>>
>> As before, the comment changes for the summarized-attr-related stuff seem
>> unnecessary.
>
> I disagree, the "only" is required to highlight the logic change here.
> Before this patch summarized attrs could overlap with indexed attrs, now
> it should not. This makes the logic a bit easier later in
> HeapUpdateHotAllowable().
My bad, you are right.
> So, we go from 3 calls to RelationGetIndexAttrBitmap() to 1, or at most 2
> when there's a summarizing index (which is frequently the case).
>
> This feels more logical, cleaner, and has less overhead but supports the
> same HOT logic.
Nice.
--
nathan
^ permalink raw reply [nested|flat] 24+ messages in thread
* Re: Expanding HOT updates for expression and partial indexes
2026-02-16 19:36 Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-02-17 21:15 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-19 20:43 ` Re: Expanding HOT updates for expression and partial indexes Andres Freund <[email protected]>
2026-02-19 22:31 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-23 19:23 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-25 21:03 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-02-26 22:08 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-02-26 23:01 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-02 19:08 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-11 15:51 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-12 20:33 ` Re: Expanding HOT updates for expression and partial indexes Nathan Bossart <[email protected]>
2026-03-12 21:31 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-15 21:11 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-03-16 16:23 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-16 17:55 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-03-17 16:38 ` Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-03-17 18:04 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-23 18:39 ` Re: Expanding HOT updates for expression and partial indexes Nathan Bossart <[email protected]>
2026-03-24 18:02 ` Re: Expanding HOT updates for expression and partial indexes Greg Burd <[email protected]>
2026-03-24 19:44 ` Re: Expanding HOT updates for expression and partial indexes Nathan Bossart <[email protected]>
@ 2026-03-25 17:16 ` Nathan Bossart <[email protected]>
0 siblings, 0 replies; 24+ messages in thread
From: Nathan Bossart @ 2026-03-25 17:16 UTC (permalink / raw)
To: Greg Burd <[email protected]>; +Cc: Jeff Davis <[email protected]>; pgsql-hackers
I just spoke to Greg off-list and wanted to share my current thoughts on
the list as well. In short, while we feel that the patch is in decent
shape and seems to be performance neutral (or maybe even positive in some
cases), it obviously doesn't accomplish $subject, and only a couple of
folks have looked at it in depth. Furthermore, if this patch was committed
and someone did find a problem, it'd be hard to justify anything except a
revert. So, it's probably better to keep working on the full patch set and
try to get $subject committed much earlier in the development cycle.
If someone thinks that we should seriously consider committing this for
v19, please let us know.
--
nathan
^ permalink raw reply [nested|flat] 24+ messages in thread
end of thread, other threads:[~2026-03-25 17:16 UTC | newest]
Thread overview: 24+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-02-16 19:36 Re: Expanding HOT updates for expression and partial indexes Jeff Davis <[email protected]>
2026-02-17 21:15 ` Greg Burd <[email protected]>
2026-02-19 20:32 ` Greg Burd <[email protected]>
2026-02-19 20:43 ` Andres Freund <[email protected]>
2026-02-19 22:31 ` Greg Burd <[email protected]>
2026-02-23 19:23 ` Greg Burd <[email protected]>
2026-02-25 21:03 ` Jeff Davis <[email protected]>
2026-02-26 22:08 ` Greg Burd <[email protected]>
2026-02-26 23:01 ` Greg Burd <[email protected]>
2026-03-02 19:08 ` Greg Burd <[email protected]>
2026-03-11 15:51 ` Greg Burd <[email protected]>
2026-03-12 20:33 ` Nathan Bossart <[email protected]>
2026-03-12 21:31 ` Greg Burd <[email protected]>
2026-03-15 21:11 ` Jeff Davis <[email protected]>
2026-03-16 16:23 ` Greg Burd <[email protected]>
2026-03-16 17:29 ` Nathan Bossart <[email protected]>
2026-03-16 17:55 ` Jeff Davis <[email protected]>
2026-03-17 15:22 ` Nathan Bossart <[email protected]>
2026-03-17 16:38 ` Jeff Davis <[email protected]>
2026-03-17 18:04 ` Greg Burd <[email protected]>
2026-03-23 18:39 ` Nathan Bossart <[email protected]>
2026-03-24 18:02 ` Greg Burd <[email protected]>
2026-03-24 19:44 ` Nathan Bossart <[email protected]>
2026-03-25 17:16 ` Nathan Bossart <[email protected]>
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox