public inbox for [email protected]  
help / color / mirror / Atom feed
From: Mark Dilger <[email protected]>
To: Peter Geoghegan <[email protected]>
To: Matthias van de Meent <[email protected]>
Cc: [email protected]
Subject: Re: pgsql: Improve nbtree skip scan primitive scan scheduling.
Date: Sat, 26 Apr 2025 19:38:51 -0700
Message-ID: <CAHgHdKtLFWZcjr87hMH0hYDHgcifu4Tj7iHz-xh8qsJREt5cqA@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>

Peter, Matthias, thanks kindly for the good work on skipscans!

While admiring the recent performance improvements, I found a test case
which fails after commit 21a152b37f36c9563d1b0b058fe1436baf578b4c.  Please
find a reproducible test case, attached.

Thanks!



On Fri, Apr 4, 2025 at 10:58 AM Peter Geoghegan <[email protected]> wrote:

> Improve nbtree skip scan primitive scan scheduling.
>
> Don't allow nbtree scans with skip arrays to end any primitive scan on
> its first leaf page without giving some consideration to how many times
> the scan's arrays advanced while changing at least one skip array
> (though continue not caring about the number of array advancements that
> only affected SAOP arrays, even during skip scans with SAOP arrays).
> Now when a scan performs more than 3 such array advancements in the
> course of reading a single leaf page, it is taken as a signal that the
> next page is unlikely to be skippable.  We'll therefore continue the
> ongoing primitive index scan, at least until we can perform a recheck
> against the next page's finaltup.
>
> Testing has shown that this new heuristic occasionally makes all the
> difference with skip scans that were expected to rely on the "passed
> first page" heuristic added by commit 9a2e2a28.  Without it, there is a
> remaining risk that certain kinds of skip scans will never quite manage
> to clear the initial hurdle of performing a primitive scan that lasts
> beyond its first leaf page (or that such a skip scan will only clear
> that initial hurdle when it has already wasted noticeably-many cycles
> due to inefficient primitive scan scheduling).
>
> Follow-up to commits 92fe23d9 and 9a2e2a28.
>
> Author: Peter Geoghegan <[email protected]>
> Reviewed-By: Matthias van de Meent <[email protected]>
> Discussion:
> https://postgr.es/m/CAH2-Wz=RVdG3zWytFWBsyW7fWH7zveFvTHed5JKEsuTT0RCO_A@mail.gmail.com
>
> Branch
> ------
> master
>
> Details
> -------
>
> https://git.postgresql.org/pg/commitdiff/21a152b37f36c9563d1b0b058fe1436baf578b4c
>
> Modified Files
> --------------
> src/backend/access/nbtree/nbtsearch.c | 16 +++++++
> src/backend/access/nbtree/nbtutils.c  | 90
> +++++++++++++++++++++++------------
> src/include/access/nbtree.h           |  3 +-
> 3 files changed, 78 insertions(+), 31 deletions(-)
>
>

-- 
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Attachments:

  [application/octet-stream] v1-0001-Add-test-showing-a-problem-with-btree-index-scans.patch (6.5K, 3-v1-0001-Add-test-showing-a-problem-with-btree-index-scans.patch)
  download | inline diff:
From 8e4015e01c416448795f29003f5a667ef45437d7 Mon Sep 17 00:00:00 2001
From: Mark Dilger <[email protected]>
Date: Sat, 26 Apr 2025 19:21:27 -0700
Subject: [PATCH v1] Add test showing a problem with btree index scans

While working with a more complex set of changes, I discovered a
problem which might have been with btree, though it could have been
with some of my own code changes.  At that time, while playing with
a much more complex regression test (not included here owing to the
interplay of my code changes which are not relevant), I caught a
stack trace:

TRAP: failed Assert("sktrig_required"), File: "nbtutils.c", Line: 1861, PID: 74377
0   postgres                            0x0000000102c485d0 ExceptionalCondition + 108
1   postgres                            0x00000001027e6ae8 _bt_advance_array_keys + 3828
2   postgres                            0x00000001027e5280 _bt_check_compare + 372
3   postgres                            0x00000001027e4d6c _bt_checkkeys + 200
4   postgres                            0x00000001027e08c0 _bt_readpage + 764
5   postgres                            0x00000001027df914 _bt_readnextpage + 1260
6   postgres                            0x00000001027dff70 _bt_next + 76
7   postgres                            0x00000001027dbc48 btgetbitmap + 136
8   postgres                            0x00000001027cd420 index_getbitmap + 64
9   postgres                            0x000000010294d480 MultiExecBitmapIndexScan + 204
10  postgres                            0x000000010294d0dc BitmapHeapNext + 312
11  postgres                            0x000000010293cf48 ExecScan + 228
12  postgres                            0x0000000102949a48 fetch_input_tuple + 116
13  postgres                            0x0000000102947f60 ExecAgg + 1372
14  postgres                            0x00000001029330f0 standard_ExecutorRun + 312
15  postgres                            0x0000000102b13df4 PortalRunSelect + 236
16  postgres                            0x0000000102b13a10 PortalRun + 492
17  postgres                            0x0000000102b12954 exec_simple_query + 1292
18  postgres                            0x0000000102b0fb58 PostgresMain + 1384
19  postgres                            0x0000000102b0b7fc BackendInitialize + 0
20  postgres                            0x0000000102a65d8c postmaster_child_launch + 372
21  postgres                            0x0000000102a69fdc ServerLoop + 4948
22  postgres                            0x0000000102a68300 InitProcessGlobals + 0
23  postgres                            0x0000000102987f0c help + 0
24  dyld                                0x000000018eb17154 start + 2476
2025-04-26 16:11:35.168 PDT postmaster[74356] LOG:  client backend (PID 74377) was terminated by signal 6: Abort trap: 6

After removing my own changes and code, I did not happen upon any
similar stack traces, but managed to isolate a simplified regression
test that consistently reproduces the bug.  A patch of that test is
included here.
---
 src/test/regress/expected/_bt_skipscan.out | 53 ++++++++++++++++++++++
 src/test/regress/parallel_schedule         |  2 +
 src/test/regress/sql/_bt_skipscan.sql      | 40 ++++++++++++++++
 3 files changed, 95 insertions(+)
 create mode 100644 src/test/regress/expected/_bt_skipscan.out
 create mode 100644 src/test/regress/sql/_bt_skipscan.sql

diff --git a/src/test/regress/expected/_bt_skipscan.out b/src/test/regress/expected/_bt_skipscan.out
new file mode 100644
index 00000000000..0a70a554164
--- /dev/null
+++ b/src/test/regress/expected/_bt_skipscan.out
@@ -0,0 +1,53 @@
+CREATE TABLE test (
+	aid			INTEGER,
+	bid			INTEGER,
+	ary			FLOAT4[]
+);
+SELECT setseed(3.0/1024.0);
+ setseed 
+---------
+ 
+(1 row)
+
+INSERT INTO test (aid, bid, ary)
+	(SELECT aid, NULL, ary FROM 
+		(SELECT aid, array_agg(random()) AS ary
+			FROM generate_series(1,1000) AS aid, generate_series(1,10)
+			GROUP BY aid
+		) AS ss
+	);
+INSERT INTO test (aid, bid, ary)
+	(SELECT aid, drift, array_agg(random()) AS ary
+		FROM test, generate_series(1,1000) AS drift, generate_series(1,10)
+		GROUP BY aid, drift
+	);
+CREATE INDEX test_idx ON test USING btree (aid, bid, ary);
+ANALYZE test;
+-- scan the table, ignoring the index
+SET enable_seqscan = on;
+SET enable_indexscan = off;
+SET enable_bitmapscan = off;
+SET enable_indexonlyscan = off;
+SELECT COUNT(*)
+	FROM test
+	WHERE aid = ANY(ARRAY[1,10,100,1000])
+	  AND ary < '{0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0,9,1.0}'::float4[];
+ count 
+-------
+   424
+(1 row)
+
+-- scan the index, expecting the same results as above
+SET enable_seqscan = off;
+SET enable_indexscan = on;
+SET enable_bitmapscan = on;
+SET enable_indexonlyscan = on;
+SELECT COUNT(*)
+	FROM test
+	WHERE aid = ANY(ARRAY[1,10,100,1000])
+	  AND ary < '{0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0,9,1.0}'::float4[];
+ count 
+-------
+   424
+(1 row)
+
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index a424be2a6bf..043f472a604 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -11,6 +11,8 @@
 # required setup steps
 test: test_setup
 
+test: _bt_skipscan
+
 # ----------
 # The first group of parallel tests
 # ----------
diff --git a/src/test/regress/sql/_bt_skipscan.sql b/src/test/regress/sql/_bt_skipscan.sql
new file mode 100644
index 00000000000..8641f10c947
--- /dev/null
+++ b/src/test/regress/sql/_bt_skipscan.sql
@@ -0,0 +1,40 @@
+CREATE TABLE test (
+	aid			INTEGER,
+	bid			INTEGER,
+	ary			FLOAT4[]
+);
+SELECT setseed(3.0/1024.0);
+INSERT INTO test (aid, bid, ary)
+	(SELECT aid, NULL, ary FROM 
+		(SELECT aid, array_agg(random()) AS ary
+			FROM generate_series(1,1000) AS aid, generate_series(1,10)
+			GROUP BY aid
+		) AS ss
+	);
+INSERT INTO test (aid, bid, ary)
+	(SELECT aid, drift, array_agg(random()) AS ary
+		FROM test, generate_series(1,1000) AS drift, generate_series(1,10)
+		GROUP BY aid, drift
+	);
+CREATE INDEX test_idx ON test USING btree (aid, bid, ary);
+ANALYZE test;
+
+-- scan the table, ignoring the index
+SET enable_seqscan = on;
+SET enable_indexscan = off;
+SET enable_bitmapscan = off;
+SET enable_indexonlyscan = off;
+SELECT COUNT(*)
+	FROM test
+	WHERE aid = ANY(ARRAY[1,10,100,1000])
+	  AND ary < '{0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0,9,1.0}'::float4[];
+
+-- scan the index, expecting the same results as above
+SET enable_seqscan = off;
+SET enable_indexscan = on;
+SET enable_bitmapscan = on;
+SET enable_indexonlyscan = on;
+SELECT COUNT(*)
+	FROM test
+	WHERE aid = ANY(ARRAY[1,10,100,1000])
+	  AND ary < '{0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0,9,1.0}'::float4[];
-- 
2.39.5 (Apple Git-154)



view thread (7+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected]
  Subject: Re: pgsql: Improve nbtree skip scan primitive scan scheduling.
  In-Reply-To: <CAHgHdKtLFWZcjr87hMH0hYDHgcifu4Tj7iHz-xh8qsJREt5cqA@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox