public inbox for [email protected]  
help / color / mirror / Atom feed
pgsql: Fix some performance issues in GIN query startup.
6+ messages / 1 participants
[nested] [flat]

* pgsql: Fix some performance issues in GIN query startup.
@ 2025-03-06 16:54 Tom Lane <[email protected]>
  0 siblings, 0 replies; 6+ messages in thread

From: Tom Lane @ 2025-03-06 16:54 UTC (permalink / raw)
  To: [email protected]

Fix some performance issues in GIN query startup.

If a GIN index search had a lot of search keys (for example,
"jsonbcol ?| array[]" with tens of thousands of array elements),
both ginFillScanKey() and startScanKey() took O(N^2) time.
Worse, those loops were uncancelable for lack of CHECK_FOR_INTERRUPTS.

The problem in ginFillScanKey() is the brute-force search key
de-duplication done in ginFillScanEntry().  The most expedient
solution seems to be to just stop trying to de-duplicate once
there are "too many" search keys.  We could imagine working harder,
say by using a sort-and-unique algorithm instead of brute force
compare-all-the-keys.  But it seems unlikely to be worth the trouble.
There is no correctness issue here, since the code already allowed
duplicate keys if any extra_data is present.

The problem in startScanKey() is the loop that attempts to identify
the first non-required search key.  In the submitted test case, that
vainly tests all the key positions, and each iteration takes O(N)
time.  One part of that is that it's reinitializing the entryRes[]
array from scratch each time, which is entirely unnecessary given
that the triConsistentFn isn't supposed to scribble on its input.
We can easily adjust the array contents incrementally instead.
The other part of it is that the triConsistentFn may itself take
O(N) time (and does in this test case).  This is all extremely
brute force: in simple cases with AND or OR semantics, we could
know without any looping whatever that all or none of the keys
are required.  But GIN opclasses don't have any API for exposing
that knowledge, so at least in the short run there is little to
be done about that.  Put in a CHECK_FOR_INTERRUPTS so that at
least the loop is cancelable.

These two changes together resolve the primary complaint that
the test query doesn't respond promptly to cancel interrupts.
Also, while they don't completely eliminate the O(N^2) behavior,
they do provide quite a nice speedup for mid-sized examples.

Bug: #18831
Reported-by: Niek <[email protected]>
Author: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 13

Branch
------
REL_14_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/e2a6934a8855bc4c8421412bc4d9495de0753ab5

Modified Files
--------------
src/backend/access/gin/ginget.c  | 10 ++++++----
src/backend/access/gin/ginscan.c |  7 ++++++-
2 files changed, 12 insertions(+), 5 deletions(-)



^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* pgsql: Fix some performance issues in GIN query startup.
@ 2025-03-06 16:54 Tom Lane <[email protected]>
  0 siblings, 0 replies; 6+ messages in thread

From: Tom Lane @ 2025-03-06 16:54 UTC (permalink / raw)
  To: [email protected]

Fix some performance issues in GIN query startup.

If a GIN index search had a lot of search keys (for example,
"jsonbcol ?| array[]" with tens of thousands of array elements),
both ginFillScanKey() and startScanKey() took O(N^2) time.
Worse, those loops were uncancelable for lack of CHECK_FOR_INTERRUPTS.

The problem in ginFillScanKey() is the brute-force search key
de-duplication done in ginFillScanEntry().  The most expedient
solution seems to be to just stop trying to de-duplicate once
there are "too many" search keys.  We could imagine working harder,
say by using a sort-and-unique algorithm instead of brute force
compare-all-the-keys.  But it seems unlikely to be worth the trouble.
There is no correctness issue here, since the code already allowed
duplicate keys if any extra_data is present.

The problem in startScanKey() is the loop that attempts to identify
the first non-required search key.  In the submitted test case, that
vainly tests all the key positions, and each iteration takes O(N)
time.  One part of that is that it's reinitializing the entryRes[]
array from scratch each time, which is entirely unnecessary given
that the triConsistentFn isn't supposed to scribble on its input.
We can easily adjust the array contents incrementally instead.
The other part of it is that the triConsistentFn may itself take
O(N) time (and does in this test case).  This is all extremely
brute force: in simple cases with AND or OR semantics, we could
know without any looping whatever that all or none of the keys
are required.  But GIN opclasses don't have any API for exposing
that knowledge, so at least in the short run there is little to
be done about that.  Put in a CHECK_FOR_INTERRUPTS so that at
least the loop is cancelable.

These two changes together resolve the primary complaint that
the test query doesn't respond promptly to cancel interrupts.
Also, while they don't completely eliminate the O(N^2) behavior,
they do provide quite a nice speedup for mid-sized examples.

Bug: #18831
Reported-by: Niek <[email protected]>
Author: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 13

Branch
------
REL_15_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/2d313375c092d5bbe67539bfa3c1630808339f89

Modified Files
--------------
src/backend/access/gin/ginget.c  | 10 ++++++----
src/backend/access/gin/ginscan.c |  7 ++++++-
2 files changed, 12 insertions(+), 5 deletions(-)



^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* pgsql: Fix some performance issues in GIN query startup.
@ 2025-03-06 16:54 Tom Lane <[email protected]>
  0 siblings, 0 replies; 6+ messages in thread

From: Tom Lane @ 2025-03-06 16:54 UTC (permalink / raw)
  To: [email protected]

Fix some performance issues in GIN query startup.

If a GIN index search had a lot of search keys (for example,
"jsonbcol ?| array[]" with tens of thousands of array elements),
both ginFillScanKey() and startScanKey() took O(N^2) time.
Worse, those loops were uncancelable for lack of CHECK_FOR_INTERRUPTS.

The problem in ginFillScanKey() is the brute-force search key
de-duplication done in ginFillScanEntry().  The most expedient
solution seems to be to just stop trying to de-duplicate once
there are "too many" search keys.  We could imagine working harder,
say by using a sort-and-unique algorithm instead of brute force
compare-all-the-keys.  But it seems unlikely to be worth the trouble.
There is no correctness issue here, since the code already allowed
duplicate keys if any extra_data is present.

The problem in startScanKey() is the loop that attempts to identify
the first non-required search key.  In the submitted test case, that
vainly tests all the key positions, and each iteration takes O(N)
time.  One part of that is that it's reinitializing the entryRes[]
array from scratch each time, which is entirely unnecessary given
that the triConsistentFn isn't supposed to scribble on its input.
We can easily adjust the array contents incrementally instead.
The other part of it is that the triConsistentFn may itself take
O(N) time (and does in this test case).  This is all extremely
brute force: in simple cases with AND or OR semantics, we could
know without any looping whatever that all or none of the keys
are required.  But GIN opclasses don't have any API for exposing
that knowledge, so at least in the short run there is little to
be done about that.  Put in a CHECK_FOR_INTERRUPTS so that at
least the loop is cancelable.

These two changes together resolve the primary complaint that
the test query doesn't respond promptly to cancel interrupts.
Also, while they don't completely eliminate the O(N^2) behavior,
they do provide quite a nice speedup for mid-sized examples.

Bug: #18831
Reported-by: Niek <[email protected]>
Author: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 13

Branch
------
REL_17_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/9094eb25b7d956307677f9ff26c8a3fc900ca1c0

Modified Files
--------------
src/backend/access/gin/ginget.c  | 10 ++++++----
src/backend/access/gin/ginscan.c |  7 ++++++-
2 files changed, 12 insertions(+), 5 deletions(-)



^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* pgsql: Fix some performance issues in GIN query startup.
@ 2025-03-06 16:54 Tom Lane <[email protected]>
  0 siblings, 0 replies; 6+ messages in thread

From: Tom Lane @ 2025-03-06 16:54 UTC (permalink / raw)
  To: [email protected]

Fix some performance issues in GIN query startup.

If a GIN index search had a lot of search keys (for example,
"jsonbcol ?| array[]" with tens of thousands of array elements),
both ginFillScanKey() and startScanKey() took O(N^2) time.
Worse, those loops were uncancelable for lack of CHECK_FOR_INTERRUPTS.

The problem in ginFillScanKey() is the brute-force search key
de-duplication done in ginFillScanEntry().  The most expedient
solution seems to be to just stop trying to de-duplicate once
there are "too many" search keys.  We could imagine working harder,
say by using a sort-and-unique algorithm instead of brute force
compare-all-the-keys.  But it seems unlikely to be worth the trouble.
There is no correctness issue here, since the code already allowed
duplicate keys if any extra_data is present.

The problem in startScanKey() is the loop that attempts to identify
the first non-required search key.  In the submitted test case, that
vainly tests all the key positions, and each iteration takes O(N)
time.  One part of that is that it's reinitializing the entryRes[]
array from scratch each time, which is entirely unnecessary given
that the triConsistentFn isn't supposed to scribble on its input.
We can easily adjust the array contents incrementally instead.
The other part of it is that the triConsistentFn may itself take
O(N) time (and does in this test case).  This is all extremely
brute force: in simple cases with AND or OR semantics, we could
know without any looping whatever that all or none of the keys
are required.  But GIN opclasses don't have any API for exposing
that knowledge, so at least in the short run there is little to
be done about that.  Put in a CHECK_FOR_INTERRUPTS so that at
least the loop is cancelable.

These two changes together resolve the primary complaint that
the test query doesn't respond promptly to cancel interrupts.
Also, while they don't completely eliminate the O(N^2) behavior,
they do provide quite a nice speedup for mid-sized examples.

Bug: #18831
Reported-by: Niek <[email protected]>
Author: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 13

Branch
------
REL_13_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/308d0d443770a0dfe49e7e253792dc6b3a149fb0

Modified Files
--------------
src/backend/access/gin/ginget.c  | 10 ++++++----
src/backend/access/gin/ginscan.c |  7 ++++++-
2 files changed, 12 insertions(+), 5 deletions(-)



^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* pgsql: Fix some performance issues in GIN query startup.
@ 2025-03-06 16:54 Tom Lane <[email protected]>
  0 siblings, 0 replies; 6+ messages in thread

From: Tom Lane @ 2025-03-06 16:54 UTC (permalink / raw)
  To: [email protected]

Fix some performance issues in GIN query startup.

If a GIN index search had a lot of search keys (for example,
"jsonbcol ?| array[]" with tens of thousands of array elements),
both ginFillScanKey() and startScanKey() took O(N^2) time.
Worse, those loops were uncancelable for lack of CHECK_FOR_INTERRUPTS.

The problem in ginFillScanKey() is the brute-force search key
de-duplication done in ginFillScanEntry().  The most expedient
solution seems to be to just stop trying to de-duplicate once
there are "too many" search keys.  We could imagine working harder,
say by using a sort-and-unique algorithm instead of brute force
compare-all-the-keys.  But it seems unlikely to be worth the trouble.
There is no correctness issue here, since the code already allowed
duplicate keys if any extra_data is present.

The problem in startScanKey() is the loop that attempts to identify
the first non-required search key.  In the submitted test case, that
vainly tests all the key positions, and each iteration takes O(N)
time.  One part of that is that it's reinitializing the entryRes[]
array from scratch each time, which is entirely unnecessary given
that the triConsistentFn isn't supposed to scribble on its input.
We can easily adjust the array contents incrementally instead.
The other part of it is that the triConsistentFn may itself take
O(N) time (and does in this test case).  This is all extremely
brute force: in simple cases with AND or OR semantics, we could
know without any looping whatever that all or none of the keys
are required.  But GIN opclasses don't have any API for exposing
that knowledge, so at least in the short run there is little to
be done about that.  Put in a CHECK_FOR_INTERRUPTS so that at
least the loop is cancelable.

These two changes together resolve the primary complaint that
the test query doesn't respond promptly to cancel interrupts.
Also, while they don't completely eliminate the O(N^2) behavior,
they do provide quite a nice speedup for mid-sized examples.

Bug: #18831
Reported-by: Niek <[email protected]>
Author: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 13

Branch
------
REL_16_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/d52221cf0de411613ab7e0fe8b44e30cef64f1aa

Modified Files
--------------
src/backend/access/gin/ginget.c  | 10 ++++++----
src/backend/access/gin/ginscan.c |  7 ++++++-
2 files changed, 12 insertions(+), 5 deletions(-)



^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* pgsql: Fix some performance issues in GIN query startup.
@ 2025-03-06 16:54 Tom Lane <[email protected]>
  0 siblings, 0 replies; 6+ messages in thread

From: Tom Lane @ 2025-03-06 16:54 UTC (permalink / raw)
  To: [email protected]

Fix some performance issues in GIN query startup.

If a GIN index search had a lot of search keys (for example,
"jsonbcol ?| array[]" with tens of thousands of array elements),
both ginFillScanKey() and startScanKey() took O(N^2) time.
Worse, those loops were uncancelable for lack of CHECK_FOR_INTERRUPTS.

The problem in ginFillScanKey() is the brute-force search key
de-duplication done in ginFillScanEntry().  The most expedient
solution seems to be to just stop trying to de-duplicate once
there are "too many" search keys.  We could imagine working harder,
say by using a sort-and-unique algorithm instead of brute force
compare-all-the-keys.  But it seems unlikely to be worth the trouble.
There is no correctness issue here, since the code already allowed
duplicate keys if any extra_data is present.

The problem in startScanKey() is the loop that attempts to identify
the first non-required search key.  In the submitted test case, that
vainly tests all the key positions, and each iteration takes O(N)
time.  One part of that is that it's reinitializing the entryRes[]
array from scratch each time, which is entirely unnecessary given
that the triConsistentFn isn't supposed to scribble on its input.
We can easily adjust the array contents incrementally instead.
The other part of it is that the triConsistentFn may itself take
O(N) time (and does in this test case).  This is all extremely
brute force: in simple cases with AND or OR semantics, we could
know without any looping whatever that all or none of the keys
are required.  But GIN opclasses don't have any API for exposing
that knowledge, so at least in the short run there is little to
be done about that.  Put in a CHECK_FOR_INTERRUPTS so that at
least the loop is cancelable.

These two changes together resolve the primary complaint that
the test query doesn't respond promptly to cancel interrupts.
Also, while they don't completely eliminate the O(N^2) behavior,
they do provide quite a nice speedup for mid-sized examples.

Bug: #18831
Reported-by: Niek <[email protected]>
Author: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 13

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/0f21db36d663fcf0789290902c84cc460ef0df8b

Modified Files
--------------
src/backend/access/gin/ginget.c  | 10 ++++++----
src/backend/access/gin/ginscan.c |  7 ++++++-
2 files changed, 12 insertions(+), 5 deletions(-)



^ permalink  raw  reply  [nested|flat] 6+ messages in thread


end of thread, other threads:[~2025-03-06 16:54 UTC | newest]

Thread overview: 6+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-03-06 16:54 pgsql: Fix some performance issues in GIN query startup. Tom Lane <[email protected]>
2025-03-06 16:54 pgsql: Fix some performance issues in GIN query startup. Tom Lane <[email protected]>
2025-03-06 16:54 pgsql: Fix some performance issues in GIN query startup. Tom Lane <[email protected]>
2025-03-06 16:54 pgsql: Fix some performance issues in GIN query startup. Tom Lane <[email protected]>
2025-03-06 16:54 pgsql: Fix some performance issues in GIN query startup. Tom Lane <[email protected]>
2025-03-06 16:54 pgsql: Fix some performance issues in GIN query startup. Tom Lane <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox