pgjdbc/pgjdbc GitHub issues and pull requests (mirror)
help / color / mirror / Atom feed[pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
19+ messages / 5 participants
[nested] [flat]
* [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
@ 2024-05-19 18:07 "vlsi (@vlsi)" <[email protected]>
0 siblings, 0 replies; 19+ messages in thread
From: vlsi (@vlsi) @ 2024-05-19 18:07 UTC (permalink / raw)
To: pgjdbc/pgjdbc <[email protected]>
It looks like BitSet is faster than HashSet, so let's use it.
Unfortunately, BitSet would consume memory in case the Oid value is high, and Oids can be up to 32bit, so we limit the max OID stored in a BitSet with 8192*8, so we effectively limit the extra per-connection memory by 16KiB (8KiB for send and 8KiB for receive)
The current maximum id is 2950 (UUID), so the added overhead is about 800 bytes per connection.
Here are benchmark results with Apple M1 Max, Java 17.0.10 ARM64.
The diff is about 1ns per operation (I guess it is negligible), however, the fix makes "useBinaryForSend" disappear from async profiler results, so it might be a net win. It would be great to make an end-to-end measurement somehow (e.g. CPU% for the app)
23 is present in the set, 73 and 123456 are both absent.
```
<pre>
Benchmark (oid) Mode Cnt Score Error Units
IntSetBenchmark.bitSet_contains 23 avgt 15 0,984 ± 0,009 ns/op
IntSetBenchmark.bitSet_contains 73 avgt 15 0,979 ± 0,005 ns/op
IntSetBenchmark.bitSet_contains 123456 avgt 15 0,688 ± 0,015 ns/op
IntSetBenchmark.hashSet_contains 23 avgt 15 2,026 ± 0,013 ns/op
IntSetBenchmark.hashSet_contains 73 avgt 15 1,985 ± 0,004 ns/op
IntSetBenchmark.hashSet_contains 123456 avgt 15 1,968 ± 0,032 ns/op
IntSetBenchmark.intSet_contains 23 avgt 15 1,101 ± 0,009 ns/op
IntSetBenchmark.intSet_contains 73 avgt 15 1,117 ± 0,005 ns/op
IntSetBenchmark.intSet_contains 123456 avgt 15 0,693 ± 0,010 ns/op
IntSetBenchmark.intOpenSet_contains 23 avgt 15 1,015 ± 0,011 ns/op
IntSetBenchmark.intOpenSet_contains 73 avgt 15 5,720 ± 0,596 ns/op
IntSetBenchmark.intOpenSet_contains 123456 avgt 15 8,430 ± 0,007 ns/op
IntSetBenchmark.roaring_contains 23 avgt 15 5,012 ± 0,044 ns/op
IntSetBenchmark.roaring_contains 73 avgt 15 5,561 ± 0,077 ns/op
IntSetBenchmark.roaring_contains 123456 avgt 15 1,163 ± 0,012 ns/op
```
/cc @plokhotnyuk
^ permalink raw reply [nested|flat] 19+ messages in thread
* Re: [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
@ 2024-05-19 20:39 "davecramer (@davecramer)" <[email protected]>
17 siblings, 0 replies; 19+ messages in thread
From: davecramer (@davecramer) @ 2024-05-19 20:39 UTC (permalink / raw)
To: pgjdbc/pgjdbc <[email protected]>
I don't think limiting OID's to 32 bit is a good idea. There are more and more databases with very large schema's which would have considerably more OID's than 65k
^ permalink raw reply [nested|flat] 19+ messages in thread
* Re: [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
@ 2024-05-19 20:51 "vlsi (@vlsi)" <[email protected]>
17 siblings, 0 replies; 19+ messages in thread
From: vlsi (@vlsi) @ 2024-05-19 20:51 UTC (permalink / raw)
To: pgjdbc/pgjdbc <[email protected]>
This PR does not limit ids, and 32bit is the protocol limit (~ 4,294,967,295 oids)
^ permalink raw reply [nested|flat] 19+ messages in thread
* Re: [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
@ 2024-05-19 21:18 "davecramer (@davecramer)" <[email protected]>
17 siblings, 0 replies; 19+ messages in thread
From: davecramer (@davecramer) @ 2024-05-19 21:18 UTC (permalink / raw)
To: pgjdbc/pgjdbc <[email protected]>
> so we limit the max OID stored in a BitSet with 8192*8,
what does this mean ?
^ permalink raw reply [nested|flat] 19+ messages in thread
* Re: [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
@ 2024-05-20 00:33 "bokken (@bokken)" <[email protected]>
17 siblings, 0 replies; 19+ messages in thread
From: bokken (@bokken) @ 2024-05-20 00:33 UTC (permalink / raw)
To: pgjdbc/pgjdbc <[email protected]>
(on pgjdbc/src/main/java/org/postgresql/util/internal/IntSet.java)
How often is asSet() actually called?
How often do expect to provide binary support for oids beyond 64k?
^ permalink raw reply [nested|flat] 19+ messages in thread
* Re: [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
@ 2024-05-20 00:35 "bokken (@bokken)" <[email protected]>
17 siblings, 0 replies; 19+ messages in thread
From: bokken (@bokken) @ 2024-05-20 00:35 UTC (permalink / raw)
To: pgjdbc/pgjdbc <[email protected]>
(on pgjdbc/src/main/java/org/postgresql/util/internal/IntSet.java)
Instead of this method, should IntSet implement Collection<Integer>?
^ permalink raw reply [nested|flat] 19+ messages in thread
* Re: [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
@ 2024-05-20 05:42 "vlsi (@vlsi)" <[email protected]>
17 siblings, 0 replies; 19+ messages in thread
From: vlsi (@vlsi) @ 2024-05-20 05:42 UTC (permalink / raw)
To: pgjdbc/pgjdbc <[email protected]>
(on pgjdbc/src/main/java/org/postgresql/util/internal/IntSet.java)
Implementing colection would complicate implementation and testing. At the same time, Collection requires boxing which we do not want
^ permalink raw reply [nested|flat] 19+ messages in thread
* Re: [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
@ 2024-05-20 06:08 "plokhotnyuk (@plokhotnyuk)" <[email protected]>
17 siblings, 0 replies; 19+ messages in thread
From: plokhotnyuk (@plokhotnyuk) @ 2024-05-20 06:08 UTC (permalink / raw)
To: pgjdbc/pgjdbc <[email protected]>
Other options would be using or reimplantation of IntHashMap (from agrona or fastutil libraries) or RoaringBitmap (https://github.com/RoaringBitmap/RoaringBitmap).
^ permalink raw reply [nested|flat] 19+ messages in thread
* Re: [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
@ 2024-05-20 07:13 "vlsi (@vlsi)" <[email protected]>
17 siblings, 0 replies; 19+ messages in thread
From: vlsi (@vlsi) @ 2024-05-20 07:13 UTC (permalink / raw)
To: pgjdbc/pgjdbc <[email protected]>
Both `RoaringBitmap` and `IntOpenHashSet` (`it.unimi.dsi:fastutil`) are somewhat slower for our use case.
^ permalink raw reply [nested|flat] 19+ messages in thread
* Re: [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
@ 2024-05-20 07:17 "vlsi (@vlsi)" <[email protected]>
17 siblings, 0 replies; 19+ messages in thread
From: vlsi (@vlsi) @ 2024-05-20 07:17 UTC (permalink / raw)
To: pgjdbc/pgjdbc <[email protected]>
Well, in theory, `asSet()` is used only in deprecated method `org.postgresql.core.QueryExecutor#getBinaryReceiveOids` that is never used within the driver's code.
I guess we can make it less efficient, so we could store small values in `BitSet` and large values in `HashSet` at a cost of slower `asSet()`
^ permalink raw reply [nested|flat] 19+ messages in thread
* Re: [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
@ 2024-05-20 07:48 "vlsi (@vlsi)" <[email protected]>
17 siblings, 0 replies; 19+ messages in thread
From: vlsi (@vlsi) @ 2024-05-20 07:48 UTC (permalink / raw)
To: pgjdbc/pgjdbc <[email protected]>
> so we limit the max OID stored in a BitSet with 8192*8,
It means we store small values in `BitSet` and large values (if any) in `HashSet`.
^ permalink raw reply [nested|flat] 19+ messages in thread
* Re: [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
@ 2024-05-20 07:54 "vlsi (@vlsi)" <[email protected]>
17 siblings, 0 replies; 19+ messages in thread
From: vlsi (@vlsi) @ 2024-05-20 07:54 UTC (permalink / raw)
To: pgjdbc/pgjdbc <[email protected]>
(on pgjdbc/src/main/java/org/postgresql/util/internal/IntSet.java)
>How often is asSet() actually called?
It is used in a deprecated method only, so it is there for backward compatibility. I am leaning towards we could make this method "arbitrary slow".
> How often do expect to provide binary support for oids beyond 64k?
We have a fixed set of types we currently enable for binary. The maximal one is `UUID` which is `2950`.
Users can add extra types with `binaryTransferEnable` connection property. I doubt they do so.
^ permalink raw reply [nested|flat] 19+ messages in thread
* Re: [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
@ 2024-06-03 11:06 "plokhotnyuk (@plokhotnyuk)" <[email protected]>
17 siblings, 0 replies; 19+ messages in thread
From: plokhotnyuk (@plokhotnyuk) @ 2024-06-03 11:06 UTC (permalink / raw)
To: pgjdbc/pgjdbc <[email protected]>
@bokken Could you please review this PR to be merged and released in the next version?
^ permalink raw reply [nested|flat] 19+ messages in thread
* Re: [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
@ 2024-06-18 20:39 "svendiedrichsen (@svendiedrichsen)" <[email protected]>
17 siblings, 0 replies; 19+ messages in thread
From: svendiedrichsen (@svendiedrichsen) @ 2024-06-18 20:39 UTC (permalink / raw)
To: pgjdbc/pgjdbc <[email protected]>
@vlsi Is there anyone else to review this? Would be great to have it in the next release.
^ permalink raw reply [nested|flat] 19+ messages in thread
* Re: [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
@ 2024-06-19 01:50 "bokken (@bokken)" <[email protected]>
17 siblings, 0 replies; 19+ messages in thread
From: bokken (@bokken) @ 2024-06-19 01:50 UTC (permalink / raw)
To: pgjdbc/pgjdbc <[email protected]>
(on pgjdbc/src/main/java/org/postgresql/util/internal/IntSet.java)
Final?
^ permalink raw reply [nested|flat] 19+ messages in thread
* Re: [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
@ 2024-06-19 01:51 "bokken (@bokken)" <[email protected]>
17 siblings, 0 replies; 19+ messages in thread
From: bokken (@bokken) @ 2024-06-19 01:51 UTC (permalink / raw)
To: pgjdbc/pgjdbc <[email protected]>
(on pgjdbc/src/main/java/org/postgresql/util/internal/IntSet.java:51)
Perhaps a comment that null values are not supported.
^ permalink raw reply [nested|flat] 19+ messages in thread
* Re: [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
@ 2024-06-23 17:40 "vlsi (@vlsi)" <[email protected]>
17 siblings, 0 replies; 19+ messages in thread
From: vlsi (@vlsi) @ 2024-06-23 17:40 UTC (permalink / raw)
To: pgjdbc/pgjdbc <[email protected]>
(on pgjdbc/src/main/java/org/postgresql/util/internal/IntSet.java)
added `final`
^ permalink raw reply [nested|flat] 19+ messages in thread
* Re: [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
@ 2024-06-23 17:41 "vlsi (@vlsi)" <[email protected]>
17 siblings, 0 replies; 19+ messages in thread
From: vlsi (@vlsi) @ 2024-06-23 17:41 UTC (permalink / raw)
To: pgjdbc/pgjdbc <[email protected]>
(on pgjdbc/src/main/java/org/postgresql/util/internal/IntSet.java:51)
The parameter type is not nullable (`Collection<? extends Integer>` vs `Collection<? extends @Nullable Integer>`)
^ permalink raw reply [nested|flat] 19+ messages in thread
* Re: [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
@ 2024-12-08 10:39 "vlsi (@vlsi)" <[email protected]>
17 siblings, 0 replies; 19+ messages in thread
From: vlsi (@vlsi) @ 2024-12-08 10:39 UTC (permalink / raw)
To: pgjdbc/pgjdbc <[email protected]>
This has been merged in 5e3190d5f3ad667a42210bee881b04404332cf30
^ permalink raw reply [nested|flat] 19+ messages in thread
end of thread, other threads:[~2024-12-08 10:39 UTC | newest]
Thread overview: 19+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2024-05-19 18:07 [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text "vlsi (@vlsi)" <[email protected]>
2024-05-19 20:39 ` "davecramer (@davecramer)" <[email protected]>
2024-05-19 20:51 ` "vlsi (@vlsi)" <[email protected]>
2024-05-19 21:18 ` "davecramer (@davecramer)" <[email protected]>
2024-05-20 00:33 ` "bokken (@bokken)" <[email protected]>
2024-05-20 00:35 ` "bokken (@bokken)" <[email protected]>
2024-05-20 05:42 ` "vlsi (@vlsi)" <[email protected]>
2024-05-20 06:08 ` "plokhotnyuk (@plokhotnyuk)" <[email protected]>
2024-05-20 07:13 ` "vlsi (@vlsi)" <[email protected]>
2024-05-20 07:17 ` "vlsi (@vlsi)" <[email protected]>
2024-05-20 07:48 ` "vlsi (@vlsi)" <[email protected]>
2024-05-20 07:54 ` "vlsi (@vlsi)" <[email protected]>
2024-06-03 11:06 ` "plokhotnyuk (@plokhotnyuk)" <[email protected]>
2024-06-18 20:39 ` "svendiedrichsen (@svendiedrichsen)" <[email protected]>
2024-06-19 01:50 ` "bokken (@bokken)" <[email protected]>
2024-06-19 01:51 ` "bokken (@bokken)" <[email protected]>
2024-06-23 17:40 ` "vlsi (@vlsi)" <[email protected]>
2024-06-23 17:41 ` "vlsi (@vlsi)" <[email protected]>
2024-12-08 10:39 ` "vlsi (@vlsi)" <[email protected]>
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox