Date: Fri, 22 Jul 2022 14:08:24 -0400
From: Bruce Momjian <bruce@momjian.us>
To: "David G. Johnston" <david.g.johnston@gmail.com>
Cc: "Jonathan S. Katz" <jkatz@postgresql.org>,
	Pg Docs <pgsql-docs@lists.postgresql.org>
Subject: Re: documentation on HOT
Message-ID: <YtrnmPQR4wYR17YE@momjian.us>
References: <c59ffbd5-96ac-a5a5-a401-14f627ca1405@postgresql.org>
 <CAKFQuwZHCbKYpEKEoQe=4Vf7JxTwVLbZdiLt-W+QgJsBN7wkzw@mail.gmail.com>
 <eead3a61-6c19-f5a8-ddfc-e895cb04657f@postgresql.org>
 <YtoFKu1D/KUo0ROb@momjian.us>
 <YtqdbfLYsULvY7HB@momjian.us>
 <d331a335-129f-66d9-e44c-4073119c9238@postgresql.org>
 <CAKFQuwZ_-k6ny-tbV-AZT142vBc2eyK6LtMXxQPSCjXFTM95PQ@mail.gmail.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="Xb+7Jcw9oO4WVjcm"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: 
 <CAKFQuwZ_-k6ny-tbV-AZT142vBc2eyK6LtMXxQPSCjXFTM95PQ@mail.gmail.com>
Archived-At: 
 <https://www.postgresql.org/message-id/YtrnmPQR4wYR17YE%40momjian.us>
Precedence: bulk


--Xb+7Jcw9oO4WVjcm
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

On Fri, Jul 22, 2022 at 09:25:43AM -0700, David G. Johnston wrote:
> On Fri, Jul 22, 2022 at 8:09 AM Jonathan S. Katz <jkatz@postgresql.org> wrote:
> I think we need to expose the information regarding columns used in predicates
> here.
> 
> "(Here, "indexed column" means any column referenced
> at all in an index definition, including for example columns that are
> tested in a partial-index predicate but are not stored in the index.)"

Okay, I clarified this in the attached patch.
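
To make the "indexed column" definition concrete, here is a toy Python model of the two HOT conditions listed in the patch. It is only an illustration, not PostgreSQL's code. IndexDef, hot_eligible, the example columns, and the byte counts are all hypothetical. The model treats a column as modified only when its value actually changes, and it counts a column that appears only in a partial-index predicate as indexed.

```python
from dataclasses import dataclass, field


@dataclass
class IndexDef:
    """Hypothetical summary of one index: the columns its definition references."""
    key_columns: set[str] = field(default_factory=set)
    expression_columns: set[str] = field(default_factory=set)
    predicate_columns: set[str] = field(default_factory=set)

    def referenced_columns(self) -> set[str]:
        # An "indexed column" is any column referenced anywhere in the
        # definition, including one tested only in a partial-index predicate.
        return self.key_columns | self.expression_columns | self.predicate_columns


def hot_eligible(old_row: dict, new_row: dict, indexes: list[IndexDef],
                 page_free_bytes: int, new_tuple_bytes: int) -> bool:
    """Toy check of the two HOT conditions; not PostgreSQL's actual code."""
    indexed: set[str] = set()
    for idx in indexes:
        indexed |= idx.referenced_columns()
    # Condition 1: no indexed column changes value.
    if any(old_row[col] != new_row[col] for col in indexed):
        return False
    # Condition 2: the new row version fits on the same heap page.
    return new_tuple_bytes <= page_free_bytes


# Hypothetical table: an index on (email) and a partial index on (id) WHERE status = 'active'.
indexes = [
    IndexDef(key_columns={"email"}),
    IndexDef(key_columns={"id"}, predicate_columns={"status"}),
]
old = {"id": 1, "email": "a@example.com", "status": "active", "note": "x"}
print(hot_eligible(old, {**old, "note": "y"}, indexes, 200, 60))       # True
print(hot_eligible(old, {**old, "status": "done"}, indexes, 200, 60))  # False: predicate column changed
print(hot_eligible(old, {**old, "note": "y"}, indexes, 40, 60))        # False: no room on the page
```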

> I get it is an implementation detail but explaining the name seems like a good
> thing to do as well:
> 
> "Without HOT, every version of a row in an update chain has its own index
> entries, even if all indexed columns are the same.  With HOT, a new tuple
> placed on the same page and with all indexed columns the same as its
> parent row version does not get new index entries.  This means there is
> only one index entry for the entire update chain on the heap page.
> An index-entry-less tuple is marked with the HEAP_ONLY_TUPLE flag."

I don't see how the chain is useful for people trying to understand how
to benefit from this feature.

> Where the last sentence becomes: "Those index-entry-less tuples (yeah, still
> dislike triple-hyphenation...) are thus named "Heap-Only Tuples".
> 
> (I've actually incorporated this as I think it should be down below, as a
> lead-in to the listing of conditions for when the optimization can be used.)
> 
> Then maybe "can be removed during select" should be reworded as:
> 
> "No longer visible heap-only tuples can be removed during normal
> operation, including <command>SELECT</command>s, instead of requiring
> periodic vacuum operations."

I added a no-longer-visible qualifier to the patch.
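
As a rough illustration of what "no longer visible" means for pruning, here is a simplified Python model of pruning one HOT chain on a single heap page. TupleVersion, Redirect, DEAD, prune_hot_chain, and the transaction IDs are hypothetical. The model assumes every xmax is committed and ignores locking and most of what the real pruning code does. It does keep one real property: the root line pointer that the index references is not removed. Instead it becomes a redirect to the first surviving version.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class TupleVersion:
    xmin: int                  # hypothetical inserting transaction ID
    xmax: Optional[int]        # hypothetical replacing transaction ID, None if current
    next_slot: Optional[int]   # next version of the row on the same page


@dataclass(frozen=True)
class Redirect:
    target: int                # slot of the first surviving version


DEAD = "dead"                  # left in the root slot when the whole chain is gone
Slot = Union[TupleVersion, Redirect, str]


def is_dead(v: TupleVersion, oldest_xmin: int) -> bool:
    # Simplification: treat xmax as committed; a version is removable once
    # it was replaced before the oldest transaction still running.
    return v.xmax is not None and v.xmax < oldest_xmin


def prune_hot_chain(page: dict[int, Slot], root_slot: int,
                    oldest_xmin: int) -> dict[int, Slot]:
    """Return a pruned copy of `page`; assumes root_slot still holds a version."""
    chain = []
    slot = root_slot
    while slot is not None:
        chain.append(slot)
        slot = page[slot].next_slot

    survivors = [s for s in chain if not is_dead(page[s], oldest_xmin)]
    first_live = survivors[0] if survivors else None

    pruned = dict(page)
    for s in chain:
        if s == first_live:
            break              # this version and later ones are still visible
        if s != root_slot:
            del pruned[s]      # dead heap-only version: no index entry to clean up
    if first_live is None:
        pruned[root_slot] = DEAD
    elif first_live != root_slot:
        # Indexes point at the root slot, so it stays as a redirect.
        pruned[root_slot] = Redirect(first_live)
    return pruned


page = {
    1: TupleVersion(xmin=100, xmax=101, next_slot=2),     # root: has the index entry
    2: TupleVersion(xmin=101, xmax=102, next_slot=3),     # heap-only
    3: TupleVersion(xmin=102, xmax=None, next_slot=None), # heap-only, current
}
# Slot 1 becomes Redirect(target=3), slot 2 is removed, slot 3 is kept.
print(prune_hot_chain(page, root_slot=1, oldest_xmin=150))
```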

> The original heap entry the index points to cannot be removed. "Old versions of
> heap-only tuples" vs. "No longer visible heap-only tuples" is probably a style
> choice.  There are basically three different "versions" in context here though
> so avoiding "old versions" has some appeal to me.
> 
> I'm not a fan of:
> 
> "Fortunately, there is an automatic system..."
> 
> I'd like to give credit to the fact we engineered a solution to the downsides,
> so change the lead-in paragraph to the conditions listing to be:

Yeah, good point.  We didn't stumble upon this feature.  I have adjusted
that wording.

> "To mitigate these downsides PostgreSQL implements an optimization whereby
> sometimes only the heap tuple is created, not the index entry, when performing
> an update.  In a case of giving things obvious and meaningful names, this is
> the Heap-Only Tuple (HOT) Optimization.  This optimization is possible when:"

Sorry, I don't like the above because it isn't precise, and the "In a case
of giving things obvious and meaningful names" wording seems odd.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson


--Xb+7Jcw9oO4WVjcm
Content-Type: text/x-diff; charset=us-ascii
Content-Disposition: attachment; filename="hot.diff"

diff --git a/doc/src/sgml/acronyms.sgml b/doc/src/sgml/acronyms.sgml
index 9ed148ab84..2df6559acc 100644
--- a/doc/src/sgml/acronyms.sgml
+++ b/doc/src/sgml/acronyms.sgml
@@ -299,9 +299,7 @@
     <term><acronym>HOT</acronym></term>
     <listitem>
      <para>
-      <ulink
-      url="https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/heap/README.HOT;hb=HEAD">Heap-Only
-      Tuples</ulink>
+      <link linkend="storage-hot">Heap-Only Tuples</link>
      </para>
     </listitem>
    </varlistentry>
diff --git a/doc/src/sgml/btree.sgml b/doc/src/sgml/btree.sgml
index a9200ee52e..6f608a14bf 100644
--- a/doc/src/sgml/btree.sgml
+++ b/doc/src/sgml/btree.sgml
@@ -639,7 +639,8 @@ options(<replaceable>relopts</replaceable> <type>local_relopts *</type>) returns
    accumulate and adversely affect query latency and throughput.  This
    typically occurs with <command>UPDATE</command>-heavy workloads
    where most individual updates cannot apply the
-   <acronym>HOT</acronym> optimization.  Changing the value of only
+   <link linkend="storage-hot"><acronym>HOT</acronym> optimization</link>.
+   Changing the value of only
    one column covered by one index during an <command>UPDATE</command>
    <emphasis>always</emphasis> necessitates a new set of index tuples
    &mdash; one for <emphasis>each and every</emphasis> index on the
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index a186e35f00..248dbc0e26 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -4381,7 +4381,7 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
       <para>
        If true, queries must not use the index until the <structfield>xmin</structfield>
        of this <structname>pg_index</structname> row is below their <symbol>TransactionXmin</symbol>
-       event horizon, because the table may contain broken HOT chains with
+       event horizon, because the table may contain broken <link linkend="storage-hot">HOT chains</link> with
        incompatible rows that they can see
       </para></entry>
      </row>
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e2d728e0c4..e5a84ed76d 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4482,7 +4482,8 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
       <listitem>
        <para>
         Specifies the number of transactions by which <command>VACUUM</command> and
-        <acronym>HOT</acronym> updates will defer cleanup of dead row versions. The
+        <link linkend="storage-hot"><acronym>HOT</acronym> updates</link>
+        will defer cleanup of dead row versions. The
         default is zero transactions, meaning that dead row versions can be
         removed as soon as possible, that is, as soon as they are no longer
         visible to any open transaction.  You may wish to set this to a
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index cf359fa9ff..4f83970c85 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -45,7 +45,8 @@
    extant versions of the same logical row; to an index, each tuple is
    an independent object that needs its own index entry.  Thus, an
    update of a row always creates all-new index entries for the row, even if
-   the key values did not change.  (HOT tuples are an exception to this
+   the key values did not change.  (<link linkend="storage-hot">HOT
+   tuples</link> are an exception to this
    statement; but indexes do not deal with those, either.)  Index entries for
    dead tuples are reclaimed (by vacuuming) when the dead tuples themselves
    are reclaimed.
diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index 023157d888..42e1e86c8a 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -749,7 +749,7 @@ CREATE INDEX people_names ON people ((first_name || ' ' || last_name));
   <para>
    Index expressions are relatively expensive to maintain, because the
    derived expression(s) must be computed for each row insertion
-   and non-HOT update.  However, the index expressions are
+   and <link linkend="storage-hot">non-HOT update</link>.  However, the index expressions are
    <emphasis>not</emphasis> recomputed during an indexed search, since they are
    already stored in the index.  In both examples above, the system
    sees the query as just <literal>WHERE indexedcolumn = 'constant'</literal>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 7dbbab6f5c..6408d28c5d 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4426,7 +4426,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
        <structfield>n_tup_upd</structfield> <type>bigint</type>
       </para>
       <para>
-       Number of rows updated (includes HOT updated rows)
+       Number of rows updated (includes <link linkend="storage-hot">HOT updated rows</link>)
       </para></entry>
      </row>
 
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index f4b9f66589..0c4c3ed7c2 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -1075,4 +1075,68 @@ data. Empty in ordinary tables.</entry>
  </sect2>
 </sect1>
 
+<sect1 id="storage-hot">
+
+ <title>Heap-Only Tuples (<acronym>HOT</acronym>)</title>
+
+ <para>
+  To allow for high concurrency, <productname>PostgreSQL</productname>
+  uses <link linkend="mvcc-intro">multiversion concurrency
+  control</link> (<acronym>MVCC</acronym>) to store rows.  However,
+  <acronym>MVCC</acronym> has some downsides for update queries.
+  Specifically, each update adds a new version of the row to the table.
+  This can also require new index entries for each updated row, and
+  removal of old versions of rows can be expensive.
+ </para>
+
+ <para>
+  To help reduce the overhead of updates,
+  <productname>PostgreSQL</productname> has an optimization called
+  heap-only tuples (<acronym>HOT</acronym>).  This optimization is
+  possible when:
+
+  <itemizedlist>
+   <listitem>
+    <para>
+     The update does not modify any columns referenced by the table's
+     indexes, including columns used in index expressions and partial-index predicates.
+    </para>
+   </listitem>
+   <listitem>
+    <para>
+     There is sufficient free space on the page containing the old row
+     for the updated row.
+    </para>
+   </listitem>
+  </itemizedlist>
+
+  In such cases, heap-only tuples provide two optimizations:
+
+  <itemizedlist>
+   <listitem>
+    <para>
+     New index entries are not needed to represent updated rows.
+    </para>
+   </listitem>
+   <listitem>
+    <para>
+     Old no-longer-visible versions of the updated rows can be removed
+     during normal operation, including <command>SELECT</command>s,
+     instead of requiring periodic vacuum operations.
+    </para>
+   </listitem>
+  </itemizedlist>
+ </para>
+
+ <para>
+  In summary, heap-only tuple updates can only happen if indexed columns
+  are not updated.  You can increase the chance of sufficient page space
+  for <acronym>HOT</acronym> updates by setting a table <link
+  linkend="sql-createtable"><literal>fillfactor</literal></link> below the default of 100.
+  Even without such a setting, <acronym>HOT</acronym> updates still happen,
+  because new rows naturally migrate to new pages and to existing pages
+  with sufficient free space for new row versions.
+ </para>
+</sect1>
+
 </chapter>

--Xb+7Jcw9oO4WVjcm--