public inbox for [email protected]help / color / mirror / Atom feed
Add a berief general comment on BTScanInsertData's nextkey and backward 4+ messages / 2 participants [nested] [flat]
* Add a berief general comment on BTScanInsertData's nextkey and backward @ 2025-11-18 07:28 Yugo Nagata <[email protected]> 2025-11-19 01:15 ` Re: Add a berief general comment on BTScanInsertData's nextkey and backward Peter Geoghegan <[email protected]> 0 siblings, 1 reply; 4+ messages in thread From: Yugo Nagata @ 2025-11-18 07:28 UTC (permalink / raw) To: pgsql-hackers Hi, While reading the nbtree codes, I noticed that the comments on BTScanInsertData no longer describes the meaning of the nextkey and backward fields. The comment curently only says: * See comments in _bt_first for an explanation of the nextkey and backward * fields. Detailed comments used to exit here, but they were removed by c9c0589fda0e, I guess, because the semantic changed when the optimazation for backward scans was introduced. However, having a brief, general description here is still useful for readers. I've attached a patch that adds the following comment: + * nextkey determines how the scankey's boundary is interpreted, and backward + * indicates a backward scan. See comments in _bt_first for a more detailed + * explanation of these fields. What do think? Regards, Yugo Nagata -- Yugo Nagata <[email protected]> Attachments: [text/x-diff] add_brief_general_comment_on_BTScanInsertData_nextkey.patch (813B, 2-add_brief_general_comment_on_BTScanInsertData_nextkey.patch) download | inline diff: diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h index 16be5c7a9c1..581e200a3cf 100644 --- a/src/include/access/nbtree.h +++ b/src/include/access/nbtree.h @@ -774,8 +774,9 @@ typedef BTStackData *BTStack; * bit, but may not when inserting into an INCLUDE index (tuple header value * is affected by the NULL-ness of both key and non-key attributes). * - * See comments in _bt_first for an explanation of the nextkey and backward - * fields. + * nextkey determines how the scankey's boundary is interpreted, and backward + * indicates a backward scan. See comments in _bt_first for a more detailed + * explanation of these fields. * * scantid is the heap TID that is used as a final tiebreaker attribute. It * is set to NULL when index scan doesn't need to find a position for a ^ permalink raw reply [nested|flat] 4+ messages in thread
* Re: Add a berief general comment on BTScanInsertData's nextkey and backward 2025-11-18 07:28 Add a berief general comment on BTScanInsertData's nextkey and backward Yugo Nagata <[email protected]> @ 2025-11-19 01:15 ` Peter Geoghegan <[email protected]> 2025-12-31 10:21 ` Re: Add a berief general comment on BTScanInsertData's nextkey and backward Yugo Nagata <[email protected]> 0 siblings, 1 reply; 4+ messages in thread From: Peter Geoghegan @ 2025-11-19 01:15 UTC (permalink / raw) To: Yugo Nagata <[email protected]>; +Cc: pgsql-hackers On Tue, Nov 18, 2025 at 2:28 AM Yugo Nagata <[email protected]> wrote: > I've attached a patch that adds the following comment: > > + * nextkey determines how the scankey's boundary is interpreted, and backward > + * indicates a backward scan. See comments in _bt_first for a more detailed > + * explanation of these fields. > > What do think? Seems reasonable to me. I wonder if we also do something about these existing _bt_binsrch comments: * On an internal (non-leaf) page, _bt_binsrch() returns the OffsetNumber * of the last key < given scankey, or last key <= given scankey if nextkey * is true. (Since _bt_compare treats the first data key of such a page as * minus infinity, there will be at least one key < scankey, so the result * always points at one of the keys on the page.) Here we describe what happens on an internal page. This is correct, but *seems* to contradict the higher level comments at the end of _bt_first. There is actually no contradiction between the two -- not when you understand that _bt_first describes the whole end-to-end process of calling _bt_search and then calling _bt_binsrch on the leaf page (not of calling _bt_binsrch once, against an internal page). One could think of this _bt_binsrch internal page behavior as compensating for the fact that internal pages have pivot tuples that consist of a separator key (except for the first key, which is just has a -inf key/no key), plus a downlink that points to the *next* page after that separator key one level down (except for the internal page high key, which has no downlink at all). Might be useful to say something like that instead. This is hard to explain without an example. Right now, an internal page might have pivot tuples that look like this: Separator: -inf, Downlink: 1 Separator: 'a', Downlink: 2 Separator: 'c', Downlink: 3 Separator: 'f', Downlink: (none, this is the high key) But _bt_binsrch "pretends" that our internal page actually contains: Downlink: 1 Separator: 'a' Downlink: 2 Separator: 'c' Downlink: 3 Separator: 'f' So if our = scan key is (say) 'c', we should descend using the downlink that points to block 2 (not the one that points to block 3, as might be expected from looking at the real physical representation consisting of pivot tuples). That is what the scan needs to do to get to the page one level down whose high key is also 'c' -- that's where the scan needs to look. (If the next level down is the leaf level, then the leaf page we descend to might also contain a *non*-pivot tuple with the key value 'c', "right before" the high key with 'c', which the scan will need to return in _bt_readpage. Lehman & Yao allow the key before a leaf page's high key to be equal to the high key, but otherwise forbid all duplicate keys.) I find it very hard to know what explanation will be the least confusing to other people, at least in this area. Since you're interested in this area, I thought I'd ask what you think. (Here I'm pretending that the keys in the tree are unique without needing a heap TID, per classic L&Y, which is unrealistic but simplifies the example.) -- Peter Geoghegan ^ permalink raw reply [nested|flat] 4+ messages in thread
* Re: Add a berief general comment on BTScanInsertData's nextkey and backward 2025-11-18 07:28 Add a berief general comment on BTScanInsertData's nextkey and backward Yugo Nagata <[email protected]> 2025-11-19 01:15 ` Re: Add a berief general comment on BTScanInsertData's nextkey and backward Peter Geoghegan <[email protected]> @ 2025-12-31 10:21 ` Yugo Nagata <[email protected]> 0 siblings, 0 replies; 4+ messages in thread From: Yugo Nagata @ 2025-12-31 10:21 UTC (permalink / raw) To: Peter Geoghegan <[email protected]>; +Cc: pgsql-hackers On Tue, 18 Nov 2025 20:15:40 -0500 Peter Geoghegan <[email protected]> wrote: > On Tue, Nov 18, 2025 at 2:28 AM Yugo Nagata <[email protected]> wrote: > > I've attached a patch that adds the following comment: > > > > + * nextkey determines how the scankey's boundary is interpreted, and backward > > + * indicates a backward scan. See comments in _bt_first for a more detailed > > + * explanation of these fields. > > > > What do think? > > Seems reasonable to me. Thank you for your review. > > I wonder if we also do something about these existing _bt_binsrch comments: > > * On an internal (non-leaf) page, _bt_binsrch() returns the OffsetNumber > * of the last key < given scankey, or last key <= given scankey if nextkey > * is true. (Since _bt_compare treats the first data key of such a page as > * minus infinity, there will be at least one key < scankey, so the result > * always points at one of the keys on the page.) > > Here we describe what happens on an internal page. This is correct, > but *seems* to contradict the higher level comments at the end of > _bt_first. There is actually no contradiction between the two -- not > when you understand that _bt_first describes the whole end-to-end > process of calling _bt_search and then calling _bt_binsrch on the leaf > page (not of calling _bt_binsrch once, against an internal page). > > One could think of this _bt_binsrch internal page behavior as > compensating for the fact that internal pages have pivot tuples that > consist of a separator key (except for the first key, which is just > has a -inf key/no key), plus a downlink that points to the *next* page > after that separator key one level down (except for the internal page > high key, which has no downlink at all). Might be useful to say > something like that instead. > > This is hard to explain without an example. Right now, an internal > page might have pivot tuples that look like this: > > Separator: -inf, Downlink: 1 > Separator: 'a', Downlink: 2 > Separator: 'c', Downlink: 3 > Separator: 'f', Downlink: (none, this is the high key) > > But _bt_binsrch "pretends" that our internal page actually contains: > > Downlink: 1 > Separator: 'a' > Downlink: 2 > Separator: 'c' > Downlink: 3 > Separator: 'f' > > So if our = scan key is (say) 'c', we should descend using the > downlink that points to block 2 (not the one that points to block 3, > as might be expected from looking at the real physical representation > consisting of pivot tuples). That is what the scan needs to do to get > to the page one level down whose high key is also 'c' -- that's where > the scan needs to look. (If the next level down is the leaf level, > then the leaf page we descend to might also contain a *non*-pivot > tuple with the key value 'c', "right before" the high key with 'c', > which the scan will need to return in _bt_readpage. Lehman & Yao allow > the key before a leaf page's high key to be equal to the high key, but > otherwise forbid all duplicate keys.) > > I find it very hard to know what explanation will be the least > confusing to other people, at least in this area. Since you're > interested in this area, I thought I'd ask what you think. I do not see any contradiction between the comment on _bt_binsrch and the comments at the end of _bt_first. The former explicitly states that it refers to internal (non-leaf) pages, and I understand the latter to describe loading data from a leaf page. That said, we could make this clearer by explicitly mentioning the leaf page in the first sentence, for example: * Now load data from the first leaf page of the scan (usually the *page currently in so->currPos.buf). Regards, Yugo Nagata -- Yugo Nagata <[email protected]> ^ permalink raw reply [nested|flat] 4+ messages in thread
* [PATCH v2 2/2] Add a berief general comment on BTScanInsertData's nextkey and backward @ 2026-03-24 09:26 Yugo Nagata <[email protected]> 0 siblings, 0 replies; 4+ messages in thread From: Yugo Nagata @ 2026-03-24 09:26 UTC (permalink / raw) --- src/include/access/nbtree.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h index da7503c57b6..00997a67bf4 100644 --- a/src/include/access/nbtree.h +++ b/src/include/access/nbtree.h @@ -774,8 +774,9 @@ typedef BTStackData *BTStack; * bit, but may not when inserting into an INCLUDE index (tuple header value * is affected by the NULL-ness of both key and non-key attributes). * - * See comments in _bt_first for an explanation of the nextkey and backward - * fields. + * nextkey determines how the scankey's boundary is interpreted, and backward + * indicates a backward scan. See comments in _bt_first for a more detailed + * explanation of these fields. * * scantid is the heap TID that is used as a final tiebreaker attribute. It * is set to NULL when index scan doesn't need to find a position for a -- 2.43.0 --Multipart=_Tue__24_Mar_2026_19_25_37_+0900_.W43dn0fDT9aZE0n Content-Type: text/x-diff; name="v2-0001-Clarify-_bt_binsrch-comments-for-internal-vs-leaf.patch" Content-Disposition: attachment; filename="v2-0001-Clarify-_bt_binsrch-comments-for-internal-vs-leaf.patch" Content-Transfer-Encoding: 7bit ^ permalink raw reply [nested|flat] 4+ messages in thread
end of thread, other threads:[~2026-03-24 09:26 UTC | newest] Thread overview: 4+ messages (download: mbox mbox.gz follow: Atom feed) -- links below jump to the message on this page -- 2025-11-18 07:28 Add a berief general comment on BTScanInsertData's nextkey and backward Yugo Nagata <[email protected]> 2025-11-19 01:15 ` Peter Geoghegan <[email protected]> 2025-12-31 10:21 ` Yugo Nagata <[email protected]> 2026-03-24 09:26 [PATCH v2 2/2] Add a berief general comment on BTScanInsertData's nextkey and backward Yugo Nagata <[email protected]>
This inbox is served by agora; see mirroring instructions for how to clone and mirror all data and code used for this inbox