pgjdbc/pgjdbc GitHub issues and pull requests (mirror)  
help / color / mirror / Atom feed
[pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
19+ messages / 5 participants
[nested] [flat]

* [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
@ 2024-05-19 18:07 "vlsi (@vlsi)" <[email protected]>
  0 siblings, 0 replies; 19+ messages in thread

From: vlsi (@vlsi) @ 2024-05-19 18:07 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

It looks like BitSet is faster than HashSet, so let's use it.

Unfortunately, BitSet would consume memory in case the Oid value is high, and Oids can be up to 32bit, so we limit the max OID stored in a BitSet with 8192*8, so we effectively limit the extra per-connection memory by 16KiB (8KiB for send and 8KiB for receive)

The current maximum id is 2950 (UUID), so the added overhead is about 800 bytes per connection.

Here are benchmark results with Apple M1 Max, Java 17.0.10 ARM64.

The diff is about 1ns per operation (I guess it is negligible), however, the fix makes "useBinaryForSend" disappear from async profiler results, so it might be a net win. It would be great to make an end-to-end measurement somehow (e.g. CPU% for the app)

23 is present in the set, 73 and 123456 are both absent.

```
<pre>
Benchmark                             (oid)  Mode  Cnt  Score   Error  Units
IntSetBenchmark.bitSet_contains          23  avgt   15  0,984 ± 0,009  ns/op
IntSetBenchmark.bitSet_contains          73  avgt   15  0,979 ± 0,005  ns/op
IntSetBenchmark.bitSet_contains      123456  avgt   15  0,688 ± 0,015  ns/op

IntSetBenchmark.hashSet_contains         23  avgt   15  2,026 ± 0,013  ns/op
IntSetBenchmark.hashSet_contains         73  avgt   15  1,985 ± 0,004  ns/op
IntSetBenchmark.hashSet_contains     123456  avgt   15  1,968 ± 0,032  ns/op

IntSetBenchmark.intSet_contains          23  avgt   15  1,101 ± 0,009  ns/op
IntSetBenchmark.intSet_contains          73  avgt   15  1,117 ± 0,005  ns/op
IntSetBenchmark.intSet_contains      123456  avgt   15  0,693 ± 0,010  ns/op

IntSetBenchmark.intOpenSet_contains      23  avgt   15  1,015 ± 0,011  ns/op
IntSetBenchmark.intOpenSet_contains      73  avgt   15  5,720 ± 0,596  ns/op
IntSetBenchmark.intOpenSet_contains  123456  avgt   15  8,430 ± 0,007  ns/op

IntSetBenchmark.roaring_contains         23  avgt   15  5,012 ± 0,044  ns/op
IntSetBenchmark.roaring_contains         73  avgt   15  5,561 ± 0,077  ns/op
IntSetBenchmark.roaring_contains     123456  avgt   15  1,163 ± 0,012  ns/op
```

/cc @plokhotnyuk

^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
@ 2024-05-19 20:39 ` "davecramer (@davecramer)" <[email protected]>
  17 siblings, 0 replies; 19+ messages in thread

From: davecramer (@davecramer) @ 2024-05-19 20:39 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

I don't think limiting OID's to 32 bit is a good idea. There are more and more databases with very large schema's which would have considerably more OID's than 65k

^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
@ 2024-05-19 20:51 ` "vlsi (@vlsi)" <[email protected]>
  17 siblings, 0 replies; 19+ messages in thread

From: vlsi (@vlsi) @ 2024-05-19 20:51 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

This PR does not limit ids, and 32bit is the protocol limit (~ 4,294,967,295 oids)

^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
@ 2024-05-19 21:18 ` "davecramer (@davecramer)" <[email protected]>
  17 siblings, 0 replies; 19+ messages in thread

From: davecramer (@davecramer) @ 2024-05-19 21:18 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

> so we limit the max OID stored in a BitSet with 8192*8,

what does this mean ?

^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
@ 2024-05-20 00:33 ` "bokken (@bokken)" <[email protected]>
  17 siblings, 0 replies; 19+ messages in thread

From: bokken (@bokken) @ 2024-05-20 00:33 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

(on pgjdbc/src/main/java/org/postgresql/util/internal/IntSet.java)

How often is asSet() actually called?
How often do expect to provide binary support for oids beyond 64k?

^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
@ 2024-05-20 00:35 ` "bokken (@bokken)" <[email protected]>
  17 siblings, 0 replies; 19+ messages in thread

From: bokken (@bokken) @ 2024-05-20 00:35 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

(on pgjdbc/src/main/java/org/postgresql/util/internal/IntSet.java)

Instead of this method, should IntSet implement Collection<Integer>?

^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
@ 2024-05-20 05:42 ` "vlsi (@vlsi)" <[email protected]>
  17 siblings, 0 replies; 19+ messages in thread

From: vlsi (@vlsi) @ 2024-05-20 05:42 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

(on pgjdbc/src/main/java/org/postgresql/util/internal/IntSet.java)

Implementing colection would complicate implementation and testing. At the same time, Collection requires boxing which we do not want

^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
@ 2024-05-20 06:08 ` "plokhotnyuk (@plokhotnyuk)" <[email protected]>
  17 siblings, 0 replies; 19+ messages in thread

From: plokhotnyuk (@plokhotnyuk) @ 2024-05-20 06:08 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

Other options would be using or reimplantation of IntHashMap (from agrona or fastutil libraries) or RoaringBitmap (https://github.com/RoaringBitmap/RoaringBitmap).

^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
@ 2024-05-20 07:13 ` "vlsi (@vlsi)" <[email protected]>
  17 siblings, 0 replies; 19+ messages in thread

From: vlsi (@vlsi) @ 2024-05-20 07:13 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

Both `RoaringBitmap` and `IntOpenHashSet` (`it.unimi.dsi:fastutil`) are somewhat slower for our use case.

^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
@ 2024-05-20 07:17 ` "vlsi (@vlsi)" <[email protected]>
  17 siblings, 0 replies; 19+ messages in thread

From: vlsi (@vlsi) @ 2024-05-20 07:17 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

Well, in theory, `asSet()` is used only in deprecated method `org.postgresql.core.QueryExecutor#getBinaryReceiveOids` that is never used within the driver's code.

I guess we can make it less efficient, so we could store small values in `BitSet` and large values in `HashSet` at a cost of slower `asSet()`

^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
@ 2024-05-20 07:48 ` "vlsi (@vlsi)" <[email protected]>
  17 siblings, 0 replies; 19+ messages in thread

From: vlsi (@vlsi) @ 2024-05-20 07:48 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

> so we limit the max OID stored in a BitSet with 8192*8,

It means we store small values in `BitSet` and large values (if any) in `HashSet`.

^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
@ 2024-05-20 07:54 ` "vlsi (@vlsi)" <[email protected]>
  17 siblings, 0 replies; 19+ messages in thread

From: vlsi (@vlsi) @ 2024-05-20 07:54 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

(on pgjdbc/src/main/java/org/postgresql/util/internal/IntSet.java)

>How often is asSet() actually called?

It is used in a deprecated method only, so it is there for backward compatibility. I am leaning towards we could make this method "arbitrary slow".

> How often do expect to provide binary support for oids beyond 64k?

We have a fixed set of types we currently enable for binary. The maximal one is `UUID` which is `2950`.
Users can add extra types with `binaryTransferEnable` connection property. I doubt they do so.


^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
@ 2024-06-03 11:06 ` "plokhotnyuk (@plokhotnyuk)" <[email protected]>
  17 siblings, 0 replies; 19+ messages in thread

From: plokhotnyuk (@plokhotnyuk) @ 2024-06-03 11:06 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

@bokken Could you please review this PR to be merged and released in the next version?

^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
@ 2024-06-18 20:39 ` "svendiedrichsen (@svendiedrichsen)" <[email protected]>
  17 siblings, 0 replies; 19+ messages in thread

From: svendiedrichsen (@svendiedrichsen) @ 2024-06-18 20:39 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

@vlsi Is there anyone else to review this? Would be great to have it in the next release.

^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
@ 2024-06-19 01:50 ` "bokken (@bokken)" <[email protected]>
  17 siblings, 0 replies; 19+ messages in thread

From: bokken (@bokken) @ 2024-06-19 01:50 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

(on pgjdbc/src/main/java/org/postgresql/util/internal/IntSet.java)

Final?

^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
@ 2024-06-19 01:51 ` "bokken (@bokken)" <[email protected]>
  17 siblings, 0 replies; 19+ messages in thread

From: bokken (@bokken) @ 2024-06-19 01:51 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

(on pgjdbc/src/main/java/org/postgresql/util/internal/IntSet.java:51)

Perhaps a comment that null values are not supported.

^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
@ 2024-06-23 17:40 ` "vlsi (@vlsi)" <[email protected]>
  17 siblings, 0 replies; 19+ messages in thread

From: vlsi (@vlsi) @ 2024-06-23 17:40 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

(on pgjdbc/src/main/java/org/postgresql/util/internal/IntSet.java)

added `final`

^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
@ 2024-06-23 17:41 ` "vlsi (@vlsi)" <[email protected]>
  17 siblings, 0 replies; 19+ messages in thread

From: vlsi (@vlsi) @ 2024-06-23 17:41 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

(on pgjdbc/src/main/java/org/postgresql/util/internal/IntSet.java:51)

The parameter type is not nullable (`Collection<? extends Integer>` vs `Collection<? extends @Nullable Integer>`)

^ permalink  raw  reply  [nested|flat] 19+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text
@ 2024-12-08 10:39 ` "vlsi (@vlsi)" <[email protected]>
  17 siblings, 0 replies; 19+ messages in thread

From: vlsi (@vlsi) @ 2024-12-08 10:39 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

This has been merged in 5e3190d5f3ad667a42210bee881b04404332cf30

^ permalink  raw  reply  [nested|flat] 19+ messages in thread


end of thread, other threads:[~2024-12-08 10:39 UTC | newest]

Thread overview: 19+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2024-05-19 18:07 [pgjdbc/pgjdbc] PR #3249: perf: optimize Set<Integer> which are used for checking if oid should be transferred with binary or text "vlsi (@vlsi)" <[email protected]>
2024-05-19 20:39 ` "davecramer (@davecramer)" <[email protected]>
2024-05-19 20:51 ` "vlsi (@vlsi)" <[email protected]>
2024-05-19 21:18 ` "davecramer (@davecramer)" <[email protected]>
2024-05-20 00:33 ` "bokken (@bokken)" <[email protected]>
2024-05-20 00:35 ` "bokken (@bokken)" <[email protected]>
2024-05-20 05:42 ` "vlsi (@vlsi)" <[email protected]>
2024-05-20 06:08 ` "plokhotnyuk (@plokhotnyuk)" <[email protected]>
2024-05-20 07:13 ` "vlsi (@vlsi)" <[email protected]>
2024-05-20 07:17 ` "vlsi (@vlsi)" <[email protected]>
2024-05-20 07:48 ` "vlsi (@vlsi)" <[email protected]>
2024-05-20 07:54 ` "vlsi (@vlsi)" <[email protected]>
2024-06-03 11:06 ` "plokhotnyuk (@plokhotnyuk)" <[email protected]>
2024-06-18 20:39 ` "svendiedrichsen (@svendiedrichsen)" <[email protected]>
2024-06-19 01:50 ` "bokken (@bokken)" <[email protected]>
2024-06-19 01:51 ` "bokken (@bokken)" <[email protected]>
2024-06-23 17:40 ` "vlsi (@vlsi)" <[email protected]>
2024-06-23 17:41 ` "vlsi (@vlsi)" <[email protected]>
2024-12-08 10:39 ` "vlsi (@vlsi)" <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox