Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1oEvTd-00025e-IV for pgsql-docs@arkaria.postgresql.org; Fri, 22 Jul 2022 16:26:09 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.92) (envelope-from ) id 1oEvTb-0001fv-Ed for pgsql-docs@arkaria.postgresql.org; Fri, 22 Jul 2022 16:26:07 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1oEvTb-0001fl-2Z for pgsql-docs@lists.postgresql.org; Fri, 22 Jul 2022 16:26:07 +0000 Received: from mail-ej1-x632.google.com ([2a00:1450:4864:20::632]) by makus.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.92) (envelope-from ) id 1oEvTV-0005HZ-UC for pgsql-docs@lists.postgresql.org; Fri, 22 Jul 2022 16:26:06 +0000 Received: by mail-ej1-x632.google.com with SMTP id oy13so9458875ejb.1 for ; Fri, 22 Jul 2022 09:26:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=WRn1P1I4N3GszFLdebR0QlfdPTPyQvuFUzYO0UXEAz4=; b=d/+sHNeNvZWW0VZ6L/AWf69zdxymRKXWlxQiVwx1N8P8UG4FP90qEtTqMxN3M4w193 6o/1H3wtR2VSWJ7WeHIbEFyelFXRKleaZUDAkLM5MruVH7AcAlJa2NKRWhFmEBo7s+b0 SxKGiy9gUtaoAR9oYMcrw1reUphEuorVkOSOfJgWQ01NS229oEPiRMm9Biq7oYL0MJoY om0ykARQ1sxDVJrW8NbVAIeinKL3bopUf6iQIqui34yyQPfXXtVzAGJmFkn3qH0tgJlW l+vW3S3/Dk57XBSRUzuHBTJf19U4hcShmp+vxaINV0vs5GYhiKVqGHWBZvTt284YQ/yZ uY7w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=WRn1P1I4N3GszFLdebR0QlfdPTPyQvuFUzYO0UXEAz4=; b=7VBnXZ9lqgIFbg1MJWcq4Uo9b+I3pG8rhHpabh4VytKUkL1GYJQ+ORqz/hSOj/RCeD yxAmPRt4RKFd5cRfgUxl7cFywTfkkn1Av3P+D49pQC3PTq62CUIDPQuG47g7lV1YTtI2 Oddo0jWQy380zlJHtCtWk8H49eiwW7WZPcwGdFvtt/w0fEUKwbs0gtThpsHmSfSRWAxJ FBWB3ryoi9hXTnTL/CmbrfkQ4uI6bMIl1jJMPtBztrgbCH1yXvXBV/6oDV1YDGNe6T+W j/aOOwOYSK8aJ6LA2BljCfz4/bqzcbqzZlnH6aOsNq479b3kMemcpALC8KHep0lGRZNO S6CQ== X-Gm-Message-State: AJIora8KlL+b8T9fKMQK8haaYHOpNIUf7SJx3b37KDCczZSDP3zcWjBI VA9us/isRKoo1zfgiXfjy/dq9AcWUJMuXnTprUGRzQpUCTQ= X-Google-Smtp-Source: AGRyM1tCWUjWB2bBJ3QxNJAPsN31Pp8v9BOTEYTlBEBC8V1ZT6f83kkrf9gwr03DpMk/EHUbQy9X5MgMdoyJ+fp83Hw= X-Received: by 2002:a17:907:3e21:b0:72f:87c:efeb with SMTP id hp33-20020a1709073e2100b0072f087cefebmr533953ejc.528.1658507160288; Fri, 22 Jul 2022 09:26:00 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: "David G. Johnston" Date: Fri, 22 Jul 2022 09:25:43 -0700 Message-ID: Subject: Re: documentation on HOT To: "Jonathan S. Katz" Cc: Bruce Momjian , Pg Docs Content-Type: multipart/alternative; boundary="00000000000025828c05e4674ab1" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000025828c05e4674ab1 Content-Type: text/plain; charset="UTF-8" On Fri, Jul 22, 2022 at 8:09 AM Jonathan S. Katz wrote: > On 7/22/22 8:51 AM, Bruce Momjian wrote: > > On Thu, Jul 21, 2022 at 10:02:18PM -0400, Bruce Momjian wrote: > >> On Sun, Feb 6, 2022 at 09:29:56PM -0500, Jonathan Katz wrote: > >>> I agree with Bruce's point that we should have a new section (or > >>> subsection). As I mentioned in my previous post, given HOT involves > >>> indexing, I would suggest putting it there. > >>> > >>> I think that something that follows the general outline of Laurenz's > post > >>> would satisfy the user requirements. It explains at a high level what > HOT > >>> is, it's advantages, and how it works. > >> > >> Attached is a patch that adds a new HOT section to the storage chapter. > > > > I wasn't happy with the last paragraph so I added some more details. > > URL contents updated too: > > > > https://momjian.us/tmp/pgsql/storage-hot.html > > Thanks! This is great. Probably the most concise and clear explanation I > have seen for HOT, which is exactly what we need for the docs :) > > A few suggestions: > > s/Fortunately, there is/To help reduce overhead, PostgreSQL has/ > > s/In summary, h/H/ > > > You can increase the change of HOT updates by using non-default table > fillfactor settings. > > I think we should expand on this and explain how adjusting "fillfactor" > will affect this. I think that may change the final sentence too. > > I think we need to expose the information regarding columns used in predicates here. "(Here, "indexed column" means any column referenced at all in an index definition, including for example columns that are tested in a partial-index predicate but are not stored in the index.)" I get it is an implementation detail but explaining the name seems like a good thing to do as well: "Without HOT, every version of a row in an update chain has its own index entries, even if all indexed columns are the same. With HOT, a new tuple placed on the same page and with all indexed columns the same as its parent row version does not get new index entries. This means there is only one index entry for the entire update chain on the heap page. An index-entry-less tuple is marked with the HEAP_ONLY_TUPLE flag." Where the last sentence becomes: "Those index-entry-less tuples (yeah, still dislike triple-hypenation...) are thus named "Heap-Only Tuples". (I've actually incorporated this as I think it should be down below, as a lead-in to the listing of conditions for when the optimization can be used.) Then maybe "can be removed during select" should be reworded as: "No longer visible heap-only tuples can be removed during normal operation, including SELECTs, instead of requiring periodic vacuum operations." The original heap entry the index points to cannot be removed. "Old versions of heap-only tuples" vs. "No longer visible heap-only tuples" is probably a style choice. There are basically three different "versions" in context here though so avoiding "old versions" has some appeal to me. I'm not a fan of: "Fortunately, there is an automatic system..." I'd like to give credit to the fact we engineered a solution to the downsides, so change the lead-in paragraph to the conditions listing to be: "To mitigate these downsides PostgreSQL implements an optimization whereby sometimes only the heap tuple is created, not the index entry, when performing an update. In a case of giving things obvious and meaningful names, this is the Heap-Only Tuple (HOT) Optimization. This optimization is possible when:" David J. --00000000000025828c05e4674ab1 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
On Fri, Jul 22, 2022 at 8:09 AM Jonathan S. Katz <jkatz@postgresql.org> wrote:
On 7/22/22 8:51 AM, Bruce Momjian wrote:
> On Thu, Jul 21, 2022 at 10:02:18PM -0400, Bruce Momjian wrote:
>> On Sun, Feb=C2=A0 6, 2022 at 09:29:56PM -0500, Jonathan Katz wrote= :
>>> I agree with Bruce's point that we should have a new secti= on (or
>>> subsection). As I mentioned in my previous post, given HOT inv= olves
>>> indexing, I would suggest putting it there.
>>>
>>> I think that something that follows the general outline of Lau= renz's post
>>> would satisfy the user requirements. It explains at a high lev= el what HOT
>>> is, it's advantages, and how it works.
>>
>> Attached is a patch that adds a new HOT section to the storage cha= pter.
>
> I wasn't happy with the last paragraph so I added some more detail= s.
> URL contents updated too:
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0https://momjian.us/tmp/p= gsql/storage-hot.html

Thanks! This is great. Probably the most concise and clear explanation I have seen for HOT, which is exactly what we need for the docs :)

A few suggestions:

s/Fortunately, there is/To help reduce overhead, PostgreSQL has/

s/In summary, h/H/

=C2=A0> You can increase the change of HOT updates by using non-default = table
fillfactor settings.

I think we should expand on this and explain how adjusting "fillfactor= "
will affect this. I think that may change the final sentence too.


I think we need to expose the information rega= rding columns used in predicates here.

"(Here, &q= uot;indexed column" means any column referenced
at all in an index = definition, including for example columns that are
tested in a partial-i= ndex predicate but are not stored in the index.)"

I get it is an implementation detail but explaining the name seems li= ke a good thing to do as well:

"Without HOT, ever= y version of a row in an update chain has its own index
entries, even if= all indexed columns are the same.=C2=A0 With HOT, a new tuple
placed on= the same page and with all indexed columns the same as its
parent row v= ersion does not get new index entries.=C2=A0 This means there is
only on= e index entry for the entire update chain on the heap page.
An index-ent= ry-less tuple is marked with the HEAP_ONLY_TUPLE flag."
Where the last sentence becomes: "Those index-entry-less tu= ples (yeah, still dislike triple-hypenation...) are thus named "Heap-O= nly Tuples".

(I've actually incorporated this= as I think it should be down below, as a lead-in to the listing of conditi= ons for when the optimization can be used.)

Then maybe= "can be removed during select" should be reworded as:
"No longer visible heap-only tuples can be removed during n= ormal
operation, including <command>SELECT</command>s, inste= ad of requiring
periodic vacuum operations."

=
The original heap entry the index points to cannot be removed. "Old= versions of heap-only tuples" vs. "No longer visible heap-only t= uples" is probably a style choice.=C2=A0 There are basically three dif= ferent "versions" in context here though so avoiding "old ve= rsions" has some appeal to me.

I'm not a fan = of:

"Fortunately, there is an automatic system...= "

I'd like to give credit to the fact we = engineered a solution to the downsides, so change the lead-in paragraph to = the conditions listing to be:

"To mitigate these = downsides PostgreSQL implements an optimization whereby sometimes only the = heap tuple is created, not the index entry, when performing an update.=C2= =A0 In a case of giving things obvious and meaningful names, this is the He= ap-Only Tuple (HOT) Optimization.=C2=A0 This optimization is possible when:= "

David J.

--00000000000025828c05e4674ab1--