public inbox for [email protected]  
help / color / mirror / Atom feed
pgsql: Improve nbtree skip scan primitive scan scheduling.
7+ messages / 2 participants
[nested] [flat]

* pgsql: Improve nbtree skip scan primitive scan scheduling.
@ 2025-04-04 17:58  Peter Geoghegan <[email protected]>
  0 siblings, 1 reply; 7+ messages in thread

From: Peter Geoghegan @ 2025-04-04 17:58 UTC (permalink / raw)
  To: [email protected]

Improve nbtree skip scan primitive scan scheduling.

Don't allow nbtree scans with skip arrays to end any primitive scan on
its first leaf page without giving some consideration to how many times
the scan's arrays advanced while changing at least one skip array
(though continue not caring about the number of array advancements that
only affected SAOP arrays, even during skip scans with SAOP arrays).
Now when a scan performs more than 3 such array advancements in the
course of reading a single leaf page, it is taken as a signal that the
next page is unlikely to be skippable.  We'll therefore continue the
ongoing primitive index scan, at least until we can perform a recheck
against the next page's finaltup.

Testing has shown that this new heuristic occasionally makes all the
difference with skip scans that were expected to rely on the "passed
first page" heuristic added by commit 9a2e2a28.  Without it, there is a
remaining risk that certain kinds of skip scans will never quite manage
to clear the initial hurdle of performing a primitive scan that lasts
beyond its first leaf page (or that such a skip scan will only clear
that initial hurdle when it has already wasted noticeably-many cycles
due to inefficient primitive scan scheduling).

Follow-up to commits 92fe23d9 and 9a2e2a28.

Author: Peter Geoghegan <[email protected]>
Reviewed-By: Matthias van de Meent <[email protected]>
Discussion: https://postgr.es/m/CAH2-Wz=RVdG3zWytFWBsyW7fWH7zveFvTHed5JKEsuTT0RCO_A@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/21a152b37f36c9563d1b0b058fe1436baf578b4c

Modified Files
--------------
src/backend/access/nbtree/nbtsearch.c | 16 +++++++
src/backend/access/nbtree/nbtutils.c  | 90 +++++++++++++++++++++++------------
src/include/access/nbtree.h           |  3 +-
3 files changed, 78 insertions(+), 31 deletions(-)



^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: pgsql: Improve nbtree skip scan primitive scan scheduling.
@ 2025-04-27 02:38  Mark Dilger <[email protected]>
  parent: Peter Geoghegan <[email protected]>
  0 siblings, 1 reply; 7+ messages in thread

From: Mark Dilger @ 2025-04-27 02:38 UTC (permalink / raw)
  To: Peter Geoghegan <[email protected]>; Matthias van de Meent <[email protected]>; +Cc: [email protected]

Peter, Matthias, thanks kindly for the good work on skipscans!

While admiring the recent performance improvements, I found a test case
which fails after commit 21a152b37f36c9563d1b0b058fe1436baf578b4c.  Please
find a reproducible test case, attached.

Thanks!



On Fri, Apr 4, 2025 at 10:58 AM Peter Geoghegan <[email protected]> wrote:

> Improve nbtree skip scan primitive scan scheduling.
>
> Don't allow nbtree scans with skip arrays to end any primitive scan on
> its first leaf page without giving some consideration to how many times
> the scan's arrays advanced while changing at least one skip array
> (though continue not caring about the number of array advancements that
> only affected SAOP arrays, even during skip scans with SAOP arrays).
> Now when a scan performs more than 3 such array advancements in the
> course of reading a single leaf page, it is taken as a signal that the
> next page is unlikely to be skippable.  We'll therefore continue the
> ongoing primitive index scan, at least until we can perform a recheck
> against the next page's finaltup.
>
> Testing has shown that this new heuristic occasionally makes all the
> difference with skip scans that were expected to rely on the "passed
> first page" heuristic added by commit 9a2e2a28.  Without it, there is a
> remaining risk that certain kinds of skip scans will never quite manage
> to clear the initial hurdle of performing a primitive scan that lasts
> beyond its first leaf page (or that such a skip scan will only clear
> that initial hurdle when it has already wasted noticeably-many cycles
> due to inefficient primitive scan scheduling).
>
> Follow-up to commits 92fe23d9 and 9a2e2a28.
>
> Author: Peter Geoghegan <[email protected]>
> Reviewed-By: Matthias van de Meent <[email protected]>
> Discussion:
> https://postgr.es/m/CAH2-Wz=RVdG3zWytFWBsyW7fWH7zveFvTHed5JKEsuTT0RCO_A@mail.gmail.com
>
> Branch
> ------
> master
>
> Details
> -------
>
> https://git.postgresql.org/pg/commitdiff/21a152b37f36c9563d1b0b058fe1436baf578b4c
>
> Modified Files
> --------------
> src/backend/access/nbtree/nbtsearch.c | 16 +++++++
> src/backend/access/nbtree/nbtutils.c  | 90
> +++++++++++++++++++++++------------
> src/include/access/nbtree.h           |  3 +-
> 3 files changed, 78 insertions(+), 31 deletions(-)
>
>

-- 
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Attachments:

  [application/octet-stream] v1-0001-Add-test-showing-a-problem-with-btree-index-scans.patch (6.5K, 3-v1-0001-Add-test-showing-a-problem-with-btree-index-scans.patch)
  download | inline diff:
From 8e4015e01c416448795f29003f5a667ef45437d7 Mon Sep 17 00:00:00 2001
From: Mark Dilger <[email protected]>
Date: Sat, 26 Apr 2025 19:21:27 -0700
Subject: [PATCH v1] Add test showing a problem with btree index scans

While working with a more complex set of changes, I discovered a
problem which might have been with btree, though it could have been
with some of my own code changes.  At that time, while playing with
a much more complex regression test (not included here owing to the
interplay of my code changes which are not relevant), I caught a
stack trace:

TRAP: failed Assert("sktrig_required"), File: "nbtutils.c", Line: 1861, PID: 74377
0   postgres                            0x0000000102c485d0 ExceptionalCondition + 108
1   postgres                            0x00000001027e6ae8 _bt_advance_array_keys + 3828
2   postgres                            0x00000001027e5280 _bt_check_compare + 372
3   postgres                            0x00000001027e4d6c _bt_checkkeys + 200
4   postgres                            0x00000001027e08c0 _bt_readpage + 764
5   postgres                            0x00000001027df914 _bt_readnextpage + 1260
6   postgres                            0x00000001027dff70 _bt_next + 76
7   postgres                            0x00000001027dbc48 btgetbitmap + 136
8   postgres                            0x00000001027cd420 index_getbitmap + 64
9   postgres                            0x000000010294d480 MultiExecBitmapIndexScan + 204
10  postgres                            0x000000010294d0dc BitmapHeapNext + 312
11  postgres                            0x000000010293cf48 ExecScan + 228
12  postgres                            0x0000000102949a48 fetch_input_tuple + 116
13  postgres                            0x0000000102947f60 ExecAgg + 1372
14  postgres                            0x00000001029330f0 standard_ExecutorRun + 312
15  postgres                            0x0000000102b13df4 PortalRunSelect + 236
16  postgres                            0x0000000102b13a10 PortalRun + 492
17  postgres                            0x0000000102b12954 exec_simple_query + 1292
18  postgres                            0x0000000102b0fb58 PostgresMain + 1384
19  postgres                            0x0000000102b0b7fc BackendInitialize + 0
20  postgres                            0x0000000102a65d8c postmaster_child_launch + 372
21  postgres                            0x0000000102a69fdc ServerLoop + 4948
22  postgres                            0x0000000102a68300 InitProcessGlobals + 0
23  postgres                            0x0000000102987f0c help + 0
24  dyld                                0x000000018eb17154 start + 2476
2025-04-26 16:11:35.168 PDT postmaster[74356] LOG:  client backend (PID 74377) was terminated by signal 6: Abort trap: 6

After removing my own changes and code, I did not happen upon any
similar stack traces, but managed to isolate a simplified regression
test that consistently reproduces the bug.  A patch of that test is
included here.
---
 src/test/regress/expected/_bt_skipscan.out | 53 ++++++++++++++++++++++
 src/test/regress/parallel_schedule         |  2 +
 src/test/regress/sql/_bt_skipscan.sql      | 40 ++++++++++++++++
 3 files changed, 95 insertions(+)
 create mode 100644 src/test/regress/expected/_bt_skipscan.out
 create mode 100644 src/test/regress/sql/_bt_skipscan.sql

diff --git a/src/test/regress/expected/_bt_skipscan.out b/src/test/regress/expected/_bt_skipscan.out
new file mode 100644
index 00000000000..0a70a554164
--- /dev/null
+++ b/src/test/regress/expected/_bt_skipscan.out
@@ -0,0 +1,53 @@
+CREATE TABLE test (
+	aid			INTEGER,
+	bid			INTEGER,
+	ary			FLOAT4[]
+);
+SELECT setseed(3.0/1024.0);
+ setseed 
+---------
+ 
+(1 row)
+
+INSERT INTO test (aid, bid, ary)
+	(SELECT aid, NULL, ary FROM 
+		(SELECT aid, array_agg(random()) AS ary
+			FROM generate_series(1,1000) AS aid, generate_series(1,10)
+			GROUP BY aid
+		) AS ss
+	);
+INSERT INTO test (aid, bid, ary)
+	(SELECT aid, drift, array_agg(random()) AS ary
+		FROM test, generate_series(1,1000) AS drift, generate_series(1,10)
+		GROUP BY aid, drift
+	);
+CREATE INDEX test_idx ON test USING btree (aid, bid, ary);
+ANALYZE test;
+-- scan the table, ignoring the index
+SET enable_seqscan = on;
+SET enable_indexscan = off;
+SET enable_bitmapscan = off;
+SET enable_indexonlyscan = off;
+SELECT COUNT(*)
+	FROM test
+	WHERE aid = ANY(ARRAY[1,10,100,1000])
+	  AND ary < '{0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0,9,1.0}'::float4[];
+ count 
+-------
+   424
+(1 row)
+
+-- scan the index, expecting the same results as above
+SET enable_seqscan = off;
+SET enable_indexscan = on;
+SET enable_bitmapscan = on;
+SET enable_indexonlyscan = on;
+SELECT COUNT(*)
+	FROM test
+	WHERE aid = ANY(ARRAY[1,10,100,1000])
+	  AND ary < '{0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0,9,1.0}'::float4[];
+ count 
+-------
+   424
+(1 row)
+
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index a424be2a6bf..043f472a604 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -11,6 +11,8 @@
 # required setup steps
 test: test_setup
 
+test: _bt_skipscan
+
 # ----------
 # The first group of parallel tests
 # ----------
diff --git a/src/test/regress/sql/_bt_skipscan.sql b/src/test/regress/sql/_bt_skipscan.sql
new file mode 100644
index 00000000000..8641f10c947
--- /dev/null
+++ b/src/test/regress/sql/_bt_skipscan.sql
@@ -0,0 +1,40 @@
+CREATE TABLE test (
+	aid			INTEGER,
+	bid			INTEGER,
+	ary			FLOAT4[]
+);
+SELECT setseed(3.0/1024.0);
+INSERT INTO test (aid, bid, ary)
+	(SELECT aid, NULL, ary FROM 
+		(SELECT aid, array_agg(random()) AS ary
+			FROM generate_series(1,1000) AS aid, generate_series(1,10)
+			GROUP BY aid
+		) AS ss
+	);
+INSERT INTO test (aid, bid, ary)
+	(SELECT aid, drift, array_agg(random()) AS ary
+		FROM test, generate_series(1,1000) AS drift, generate_series(1,10)
+		GROUP BY aid, drift
+	);
+CREATE INDEX test_idx ON test USING btree (aid, bid, ary);
+ANALYZE test;
+
+-- scan the table, ignoring the index
+SET enable_seqscan = on;
+SET enable_indexscan = off;
+SET enable_bitmapscan = off;
+SET enable_indexonlyscan = off;
+SELECT COUNT(*)
+	FROM test
+	WHERE aid = ANY(ARRAY[1,10,100,1000])
+	  AND ary < '{0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0,9,1.0}'::float4[];
+
+-- scan the index, expecting the same results as above
+SET enable_seqscan = off;
+SET enable_indexscan = on;
+SET enable_bitmapscan = on;
+SET enable_indexonlyscan = on;
+SELECT COUNT(*)
+	FROM test
+	WHERE aid = ANY(ARRAY[1,10,100,1000])
+	  AND ary < '{0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0,9,1.0}'::float4[];
-- 
2.39.5 (Apple Git-154)



^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: pgsql: Improve nbtree skip scan primitive scan scheduling.
@ 2025-04-27 03:53  Peter Geoghegan <[email protected]>
  parent: Mark Dilger <[email protected]>
  0 siblings, 1 reply; 7+ messages in thread

From: Peter Geoghegan @ 2025-04-27 03:53 UTC (permalink / raw)
  To: Mark Dilger <[email protected]>; +Cc: Matthias van de Meent <[email protected]>; [email protected]

On Sat, Apr 26, 2025 at 10:39 PM Mark Dilger
<[email protected]> wrote:
> Peter, Matthias, thanks kindly for the good work on skipscans!

Thanks!

> I found a test case which fails after commit 21a152b37f36c9563d1b0b058fe1436baf578b4c.  Please find a reproducible test case, attached.

The bug isn't actually in commit
21a152b37f36c9563d1b0b058fe1436baf578b4c -- it's just an accident that
the mechanism added by that commit happens to make your test case
fail. The underlying issue was introduced in commit 8a510275, "Further
optimize nbtree search scan key comparisons".

This looks to have been a silly oversight in our handling of NULL
tuple datums within _bt_check_compare. Attached provisional fix makes
your test case pass.

-- 
Peter Geoghegan


Attachments:

  [application/octet-stream] v1-0001-Provisional-fix.patch (849B, 2-v1-0001-Provisional-fix.patch)
  download | inline diff:
From 28c28301fd19e1dd388f297971a4379422b9ae5b Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <[email protected]>
Date: Sat, 26 Apr 2025 23:47:34 -0400
Subject: [PATCH v1] Provisional fix

---
 src/backend/access/nbtree/nbtutils.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/src/backend/access/nbtree/nbtutils.c b/src/backend/access/nbtree/nbtutils.c
index 93bdbed04..ffad1bf48 100644
--- a/src/backend/access/nbtree/nbtutils.c
+++ b/src/backend/access/nbtree/nbtutils.c
@@ -2960,6 +2960,10 @@ _bt_check_compare(IndexScanDesc scan, ScanDirection dir,
 					*continuescan = false;
 			}
 
+			if (unlikely(forcenonrequired && key->sk_flags & SK_BT_SKIP))
+				return _bt_advance_array_keys(scan, NULL, tuple, tupnatts,
+											  tupdesc, *ikey, false);
+
 			/*
 			 * In any case, this indextuple doesn't match the qual.
 			 */
-- 
2.49.0



^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: pgsql: Improve nbtree skip scan primitive scan scheduling.
@ 2025-04-27 17:06  Mark Dilger <[email protected]>
  parent: Peter Geoghegan <[email protected]>
  0 siblings, 1 reply; 7+ messages in thread

From: Mark Dilger @ 2025-04-27 17:06 UTC (permalink / raw)
  To: Peter Geoghegan <[email protected]>; +Cc: Matthias van de Meent <[email protected]>; [email protected]

On Sat, Apr 26, 2025 at 8:53 PM Peter Geoghegan <[email protected]> wrote:

> On Sat, Apr 26, 2025 at 10:39 PM Mark Dilger
> <[email protected]> wrote:
> > Peter, Matthias, thanks kindly for the good work on skipscans!
>
> Thanks!
>
> > I found a test case which fails after commit
> 21a152b37f36c9563d1b0b058fe1436baf578b4c.  Please find a reproducible test
> case, attached.
>
> The bug isn't actually in commit
> 21a152b37f36c9563d1b0b058fe1436baf578b4c -- it's just an accident that
> the mechanism added by that commit happens to make your test case
> fail. The underlying issue was introduced in commit 8a510275, "Further
> optimize nbtree search scan key comparisons".
>
> This looks to have been a silly oversight in our handling of NULL
> tuple datums within _bt_check_compare. Attached provisional fix makes
> your test case pass.
>
> --
> Peter Geoghegan
>

I can confirm that your patch fixes the problem, having spent some four
hours trying to find other test cases which still fail, finding none.

Thank you for the quick reply, and again for the work on btree.

-- 
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: pgsql: Improve nbtree skip scan primitive scan scheduling.
@ 2025-04-28 16:12  Peter Geoghegan <[email protected]>
  parent: Mark Dilger <[email protected]>
  0 siblings, 1 reply; 7+ messages in thread

From: Peter Geoghegan @ 2025-04-28 16:12 UTC (permalink / raw)
  To: Mark Dilger <[email protected]>; +Cc: Matthias van de Meent <[email protected]>; [email protected]

On Sun, Apr 27, 2025 at 1:06 PM Mark Dilger
<[email protected]> wrote:
> I can confirm that your patch fixes the problem, having spent some four hours trying to find other test cases which still fail, finding none.

Great.

I pushed essentially the same patch to HEAD just now.

Thanks for the report!

--
Peter Geoghegan





^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: pgsql: Improve nbtree skip scan primitive scan scheduling.
@ 2025-04-30 20:10  Mark Dilger <[email protected]>
  parent: Peter Geoghegan <[email protected]>
  0 siblings, 1 reply; 7+ messages in thread

From: Mark Dilger @ 2025-04-30 20:10 UTC (permalink / raw)
  To: Peter Geoghegan <[email protected]>; +Cc: Matthias van de Meent <[email protected]>; [email protected]

On Mon, Apr 28, 2025 at 9:12 AM Peter Geoghegan <[email protected]> wrote:

> On Sun, Apr 27, 2025 at 1:06 PM Mark Dilger
> <[email protected]> wrote:
> > I can confirm that your patch fixes the problem, having spent some four
> hours trying to find other test cases which still fail, finding none.
>
> Great.
>
> I pushed essentially the same patch to HEAD just now.
>

A similar assertion can still be reached on HEAD using the attached (rather
contrived) regression test as of f60420cff66a9089a9b431f9c07f4a29aae4990a:

TRAP: failed Assert("sktrig_required"), File: "nbtutils.c", Line: 1861,
PID: 93648
0   postgres                            0x0000000100d1b0e0
ExceptionalCondition + 108
1   postgres                            0x00000001008b7658
_bt_advance_array_keys + 3828
2   postgres                            0x00000001008b5dc0
_bt_check_compare + 372
3   postgres                            0x00000001008b58ac _bt_checkkeys +
200
4   postgres                            0x00000001008b1400 _bt_readpage +
764
5   postgres                            0x00000001008b0454 _bt_readnextpage
+ 1260
6   postgres                            0x00000001008b0ab0 _bt_next + 76
7   postgres                            0x00000001008ac644 btgettuple + 136
8   postgres                            0x000000010089dcdc
index_getnext_tid + 68
9   postgres                            0x0000000100a297a0 IndexOnlyNext +
228
10  postgres                            0x0000000100a0db2c ExecScan + 228
11  postgres                            0x0000000100a39088 ExecSort + 536
12  postgres                            0x0000000100a03c68
standard_ExecutorRun + 312
13  postgres                            0x0000000100be5fb8 PortalRunSelect
+ 236
14  postgres                            0x0000000100be5bd4 PortalRun + 492
15  postgres                            0x0000000100be4b18
exec_simple_query + 1292
16  postgres                            0x0000000100be1d1c PostgresMain +
1388
17  postgres                            0x0000000100bdd9a8
BackendInitialize + 0
18  postgres                            0x0000000100b36dd8
postmaster_child_launch + 372
19  postgres                            0x0000000100b3b03c ServerLoop + 4948
20  postgres                            0x0000000100b39360
InitProcessGlobals + 0
21  postgres                            0x0000000100a58c00 help + 0
22  dyld                                0x000000018eb17154 start + 2476



—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Attachments:

  [application/octet-stream] _skipscan.sql (7.7K, 3-_skipscan.sql)
  download

^ permalink  raw  reply  [nested|flat] 7+ messages in thread

* Re: pgsql: Improve nbtree skip scan primitive scan scheduling.
@ 2025-04-30 20:25  Peter Geoghegan <[email protected]>
  parent: Mark Dilger <[email protected]>
  0 siblings, 0 replies; 7+ messages in thread

From: Peter Geoghegan @ 2025-04-30 20:25 UTC (permalink / raw)
  To: Mark Dilger <[email protected]>; +Cc: Matthias van de Meent <[email protected]>; [email protected]

On Wed, Apr 30, 2025 at 4:11 PM Mark Dilger
<[email protected]> wrote:
> A similar assertion can still be reached on HEAD using the attached (rather contrived) regression test as of f60420cff66a9089a9b431f9c07f4a29aae4990a:

I independently found the same bug just a few hours ago, using an
enhanced version of my python fuzzing script (posted to the main skip
scan thread) that now performs backwards scans. I'm not at all
surprised to see that your repro also uses ORDER BY ... DESC.

Would you post to that other thread going forward, rather than posting
to this -committers thread?

I'm going to talk about this on the main skip scan -hackers thread,
when I have a proper fix. I already have a fix for the issue that you
reported, but there seems to be an independent remaining issue with
backwards scans that I haven't quite got to the bottom of just yet.

Thanks
-- 
Peter Geoghegan






^ permalink  raw  reply  [nested|flat] 7+ messages in thread


end of thread, other threads:[~2025-04-30 20:25 UTC | newest]

Thread overview: 7+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-04-04 17:58 pgsql: Improve nbtree skip scan primitive scan scheduling. Peter Geoghegan <[email protected]>
2025-04-27 02:38 ` Mark Dilger <[email protected]>
2025-04-27 03:53   ` Peter Geoghegan <[email protected]>
2025-04-27 17:06     ` Mark Dilger <[email protected]>
2025-04-28 16:12       ` Peter Geoghegan <[email protected]>
2025-04-30 20:10         ` Mark Dilger <[email protected]>
2025-04-30 20:25           ` Peter Geoghegan <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox