[pgjdbc/pgjdbc] PR #3533: refactor: Use key_column_usage instead of indnkeyatts in PgDatabaseMetaData for faster speed and improved compatibility

pgjdbc/pgjdbc GitHub issues and pull requests (mirror)  
help / color / mirror / Atom feed

[pgjdbc/pgjdbc] PR #3533: refactor: Use key_column_usage instead of indnkeyatts in PgDatabaseMetaData for faster speed and improved compatibility
11+ messages / 2 participants
[nested] [flat]

* [pgjdbc/pgjdbc] PR #3533: refactor: Use key_column_usage instead of indnkeyatts in PgDatabaseMetaData for faster speed and improved compatibility
@ 2025-02-20 11:45 "hlcianfagna (@hlcianfagna)" <[email protected]>
  0 siblings, 0 replies; 11+ messages in thread

From: hlcianfagna (@hlcianfagna) @ 2025-02-20 11:45 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

https://github.com/pgjdbc/pgjdbc/pull/3434 introduced a change to avoid mistakenly treating included columns as part of the PK, however the way this is done relies on a particular metadata field which some "PostgreSQL-compatible" database systems (e.g. CrateDB) do not populate in the same way as PostgreSQL does, this refactor addresses this by getting the same information via the standard `information_schema.key_column_usage` view, this has the added benefit of shaving a few milliseconds on the `getPrimaryKeys` call.

### All Submissions:

* [YES] Have you followed the guidelines in our [Contributing](https://github.com/pgjdbc/pgjdbc/blob/master/CONTRIBUTING.md) document?
* [YES] Have you checked to ensure there aren't other open [Pull Requests](../../pulls) for the same update/change?

### Changes to Existing Features:

* [NO] Does this break existing behaviour? If so please explain.
* [YES] Have you added an explanation of what your changes do and why you'd like us to include them?
* [NA] Have you written new tests for your core changes, as applicable?
* [YES] Have you successfully run tests with your changes locally?

^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3533: refactor: Use key_column_usage instead of indnkeyatts in PgDatabaseMetaData for faster speed and improved compatibility
@ 2025-02-20 12:01 ` "davecramer (@davecramer)" <[email protected]>
  9 siblings, 0 replies; 11+ messages in thread

From: davecramer (@davecramer) @ 2025-02-20 12:01 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

Please make sure this does not introduce a performance regression

^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3533: refactor: Use key_column_usage instead of indnkeyatts in PgDatabaseMetaData for faster speed and improved compatibility
@ 2025-02-28 10:09 ` "hlcianfagna (@hlcianfagna)" <[email protected]>
  9 siblings, 0 replies; 11+ messages in thread

From: hlcianfagna (@hlcianfagna) @ 2025-02-28 10:09 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

> Please make sure this does not introduce a performance regression

Good point @davecramer , is there any particular benchmark that you would use to validate this?

^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3533: refactor: Use key_column_usage instead of indnkeyatts in PgDatabaseMetaData for faster speed and improved compatibility
@ 2025-02-28 12:17 ` "davecramer (@davecramer)" <[email protected]>
  9 siblings, 0 replies; 11+ messages in thread

From: davecramer (@davecramer) @ 2025-02-28 12:17 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

No real benchmark, just run explain analyze on the two queries to make sure you don't introduce a regression

^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3533: refactor: Use key_column_usage instead of indnkeyatts in PgDatabaseMetaData for faster speed and improved compatibility
@ 2025-03-07 17:50 ` "hlcianfagna (@hlcianfagna)" <[email protected]>
  9 siblings, 0 replies; 11+ messages in thread

From: hlcianfagna (@hlcianfagna) @ 2025-03-07 17:50 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

> No real benchmark, just run explain analyze on the two queries to make sure you don't introduce a regression

This new version I just committed is actually faster that what we have in master:

<details>

<summary>Before this PR</summary>

```
QUERY PLAN                                                                                                                                                               |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Sort  (cost=4446.51..4446.54 rows=13 width=324) (actual time=4.990..5.010 rows=162 loops=1)                                                                              |
  Sort Key: result.table_name, result.pk_name, result.key_seq                                                                                                            |
  Sort Method: quicksort  Memory: 79kB                                                                                                                                   |
  ->  Subquery Scan on result  (cost=53.16..4446.27 rows=13 width=324) (actual time=0.770..4.839 rows=162 loops=1)                                                       |
        Filter: ((result.key_seq <= result.key_count) AND (result.a_attnum = (result.keys).x))                                                                           |
        Rows Removed by Filter: 1794                                                                                                                                     |
        ->  Result  (cost=53.16..4331.07 rows=7680 width=360) (actual time=0.767..4.645 rows=1956 loops=1)                                                               |
              ->  ProjectSet  (cost=53.16..395.07 rows=7680 width=292) (actual time=0.762..4.224 rows=1956 loops=1)                                                      |
                    ->  Hash Join  (cost=53.16..160.83 rows=768 width=287) (actual time=0.570..1.256 rows=1316 loops=1)                                                  |
                          Hash Cond: (a.attrelid = ct.oid)                                                                                                               |
                          ->  Seq Scan on pg_attribute a  (cost=0.00..88.26 rows=3126 width=70) (actual time=0.007..0.256 rows=3126 loops=1)                             |
                          ->  Hash  (cost=51.89..51.89 rows=102 width=229) (actual time=0.520..0.533 rows=102 loops=1)                                                   |
                                Buckets: 1024  Batches: 1  Memory Usage: 35kB                                                                                            |
                                ->  Hash Join  (cost=28.53..51.89 rows=102 width=229) (actual time=0.322..0.479 rows=102 loops=1)                                        |
                                      Hash Cond: (ct.relnamespace = n.oid)                                                                                               |
                                      ->  Hash Join  (cost=27.44..50.24 rows=102 width=169) (actual time=0.301..0.427 rows=102 loops=1)                                  |
                                            Hash Cond: (ct.oid = i.indrelid)                                                                                             |
                                            ->  Seq Scan on pg_class ct  (cost=0.00..18.15 rows=415 width=72) (actual time=0.003..0.043 rows=415 loops=1)                |
                                            ->  Hash  (cost=26.17..26.17 rows=102 width=97) (actual time=0.261..0.272 rows=102 loops=1)                                  |
                                                  Buckets: 1024  Batches: 1  Memory Usage: 21kB                                                                          |
                                                  ->  Hash Join  (cost=6.92..26.17 rows=102 width=97) (actual time=0.088..0.228 rows=102 loops=1)                        |
                                                        Hash Cond: (ci.oid = i.indexrelid)                                                                               |
                                                        ->  Seq Scan on pg_class ci  (cost=0.00..18.15 rows=415 width=68) (actual time=0.002..0.054 rows=415 loops=1)    |
                                                        ->  Hash  (cost=5.64..5.64 rows=102 width=37) (actual time=0.077..0.078 rows=102 loops=1)                        |
                                                              Buckets: 1024  Batches: 1  Memory Usage: 15kB                                                              |
                                                              ->  Seq Scan on pg_index i  (cost=0.00..5.64 rows=102 width=37) (actual time=0.004..0.040 rows=102 loops=1)|
                                                                    Filter: indisprimary                                                                                 |
                                                                    Rows Removed by Filter: 62                                                                           |
                                      ->  Hash  (cost=1.04..1.04 rows=4 width=68) (actual time=0.009..0.009 rows=5 loops=1)                                              |
                                            Buckets: 1024  Batches: 1  Memory Usage: 9kB                                                                                 |
                                            ->  Seq Scan on pg_namespace n  (cost=0.00..1.04 rows=4 width=68) (actual time=0.005..0.006 rows=5 loops=1)                  |
Planning Time: 0.772 ms                                                                                                                                                  |
Execution Time: 5.162 ms                                                                                                                                                 |
```

</details>

<details>

<summary>After this PR</summary>

```
QUERY PLAN                                                                                                                                                                                             |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Sort  (cost=93.68..93.68 rows=1 width=324) (actual time=1.222..1.228 rows=82 loops=1)                                                                                                                  |
  Sort Key: ((ss.relname)::information_schema.sql_identifier), ((ss.conname)::information_schema.sql_identifier), (((ss.x).n)::information_schema.cardinal_number)                                     |
  Sort Method: quicksort  Memory: 52kB                                                                                                                                                                 |
  ->  Result  (cost=31.73..93.67 rows=1 width=324) (actual time=0.789..1.186 rows=82 loops=1)                                                                                                          |
        One-Time Filter: (current_database() = current_database())                                                                                                                                     |
        ->  Nested Loop  (cost=31.73..93.66 rows=1 width=288) (actual time=0.786..1.160 rows=82 loops=1)                                                                                               |
              ->  Nested Loop  (cost=31.44..57.45 rows=7 width=232) (actual time=0.763..0.979 rows=82 loops=1)                                                                                         |
                    ->  Hash Join  (cost=31.29..51.28 rows=28 width=236) (actual time=0.759..0.839 rows=181 loops=1)                                                                                   |
                          Hash Cond: (pg_class_key.relname = ss.conname)                                                                                                                               |
                          ->  Seq Scan on pg_class pg_class_key  (cost=0.00..18.15 rows=415 width=68) (actual time=0.006..0.029 rows=415 loops=1)                                                      |
                          ->  Hash  (cost=30.94..30.94 rows=28 width=232) (actual time=0.721..0.724 rows=181 loops=1)                                                                                  |
                                Buckets: 1024  Batches: 1  Memory Usage: 55kB                                                                                                                          |
                                ->  Subquery Scan on ss  (cost=21.27..30.94 rows=28 width=232) (actual time=0.300..0.648 rows=181 loops=1)                                                             |
                                      ->  ProjectSet  (cost=21.27..30.66 rows=28 width=341) (actual time=0.299..0.630 rows=181 loops=1)                                                                |
                                            ->  Nested Loop  (cost=21.27..26.95 rows=14 width=224) (actual time=0.160..0.253 rows=110 loops=1)                                                         |
                                                  ->  Hash Join  (cost=21.13..26.05 rows=14 width=228) (actual time=0.134..0.191 rows=110 loops=1)                                                     |
                                                        Hash Cond: (r.relnamespace = nr.oid)                                                                                                           |
                                                        ->  Hash Join  (cost=20.04..24.87 rows=18 width=168) (actual time=0.106..0.146 rows=110 loops=1)                                               |
                                                              Hash Cond: (c.conrelid = r.oid)                                                                                                          |
                                                              ->  Seq Scan on pg_constraint c  (cost=0.00..4.54 rows=110 width=96) (actual time=0.004..0.018 rows=110 loops=1)                         |
                                                                    Filter: (contype = ANY ('{p,u,f}'::"char"[]))                                                                                      |
                                                                    Rows Removed by Filter: 2                                                                                                          |
                                                              ->  Hash  (cost=19.19..19.19 rows=68 width=76) (actual time=0.098..0.098 rows=68 loops=1)                                                |
                                                                    Buckets: 1024  Batches: 1  Memory Usage: 16kB                                                                                      |
                                                                    ->  Seq Scan on pg_class r  (cost=0.00..19.19 rows=68 width=76) (actual time=0.014..0.072 rows=68 loops=1)                         |
                                                                          Filter: (relkind = ANY ('{r,p}'::"char"[]))                                                                                  |
                                                                          Rows Removed by Filter: 347                                                                                                  |
                                                        ->  Hash  (cost=1.05..1.05 rows=3 width=68) (actual time=0.018..0.018 rows=5 loops=1)                                                          |
                                                              Buckets: 1024  Batches: 1  Memory Usage: 9kB                                                                                             |
                                                              ->  Seq Scan on pg_namespace nr  (cost=0.00..1.05 rows=3 width=68) (actual time=0.014..0.016 rows=5 loops=1)                             |
                                                                    Filter: (NOT pg_is_other_temp_schema(oid))                                                                                         |
                                                  ->  Memoize  (cost=0.14..0.23 rows=1 width=4) (actual time=0.000..0.000 rows=1 loops=110)                                                            |
                                                        Cache Key: c.connamespace                                                                                                                      |
                                                        Cache Mode: logical                                                                                                                            |
                                                        Hits: 109  Misses: 1  Evictions: 0  Overflows: 0  Memory Usage: 1kB                                                                            |
                                                        ->  Index Only Scan using pg_namespace_oid_index on pg_namespace nc  (cost=0.13..0.22 rows=1 width=4) (actual time=0.022..0.022 rows=1 loops=1)|
                                                              Index Cond: (oid = c.connamespace)                                                                                                       |
                                                              Heap Fetches: 1                                                                                                                          |
                    ->  Index Scan using pg_index_indexrelid_index on pg_index i  (cost=0.14..0.22 rows=1 width=4) (actual time=0.001..0.001 rows=0 loops=181)                                         |
                          Index Cond: (indexrelid = pg_class_key.oid)                                                                                                                                  |
                          Filter: indisprimary                                                                                                                                                         |
                          Rows Removed by Filter: 1                                                                                                                                                    |
              ->  Index Scan using pg_attribute_relid_attnum_index on pg_attribute a  (cost=0.28..5.16 rows=1 width=70) (actual time=0.002..0.002 rows=1 loops=82)                                     |
                    Index Cond: ((attrelid = ss.roid) AND (attnum = (ss.x).x))                                                                                                                         |
                    Filter: ((NOT attisdropped) AND (pg_has_role(ss.relowner, 'USAGE'::text) OR has_column_privilege(ss.roid, attnum, 'SELECT, INSERT, UPDATE, REFERENCES'::text)))                    |
Planning Time: 0.890 ms                                                                                                                                                                                |
Execution Time: 1.326 ms                                                                                                                                                                               |
```

</details>

The only thing is that `information_schema.key_column_usage` does not seem to include data for the `pg_toast` schema, would that be acceptable?

^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3533: refactor: Use key_column_usage instead of indnkeyatts in PgDatabaseMetaData for faster speed and improved compatibility
@ 2025-03-08 11:38 ` "davecramer (@davecramer)" <[email protected]>
  9 siblings, 0 replies; 11+ messages in thread

From: davecramer (@davecramer) @ 2025-03-08 11:38 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

seems there is a compile error

^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3533: refactor: Use key_column_usage instead of indnkeyatts in PgDatabaseMetaData for faster speed and improved compatibility
@ 2025-03-10 10:19 ` "hlcianfagna (@hlcianfagna)" <[email protected]>
  9 siblings, 0 replies; 11+ messages in thread

From: hlcianfagna (@hlcianfagna) @ 2025-03-10 10:19 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

> seems there is a compile error

Apologies there was a missing semicolon

^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3533: refactor: Use key_column_usage instead of indnkeyatts in PgDatabaseMetaData for faster speed and improved compatibility
@ 2025-03-10 13:44 ` "davecramer (@davecramer)" <[email protected]>
  9 siblings, 0 replies; 11+ messages in thread

From: davecramer (@davecramer) @ 2025-03-10 13:44 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

(on pgjdbc/src/main/java/org/postgresql/jdbc/PgDatabaseMetaData.java:2347)

I think you need to rebase. This code where we are selecting `current_database() as TABLE_CAT` is in HEAD

^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3533: refactor: Use key_column_usage instead of indnkeyatts in PgDatabaseMetaData for faster speed and improved compatibility
@ 2025-03-10 14:37 ` "hlcianfagna (@hlcianfagna)" <[email protected]>
  9 siblings, 0 replies; 11+ messages in thread

From: hlcianfagna (@hlcianfagna) @ 2025-03-10 14:37 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

(on pgjdbc/src/main/java/org/postgresql/jdbc/PgDatabaseMetaData.java:2347)

This has not changed since I forked this branch, but in the version I am proposing I have `table_catalog=current_database()` in the `WHERE` clause so I can select `table_catalog` as `TABLE_CAT`

^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3533: refactor: Use key_column_usage instead of indnkeyatts in PgDatabaseMetaData for faster speed and improved compatibility
@ 2025-03-10 14:41 ` "hlcianfagna (@hlcianfagna)" <[email protected]>
  9 siblings, 0 replies; 11+ messages in thread

From: hlcianfagna (@hlcianfagna) @ 2025-03-10 14:41 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

(on pgjdbc/src/main/java/org/postgresql/jdbc/PgDatabaseMetaData.java:2347)

One more thing to note about this line, in the old version there was an outer query with quoted upper case identifiers and this inner query with unquoted identifiers. In the new version there are no subqueries, so I am returning quoted upper-case identifiers as was the case in the original outer query.

^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: [pgjdbc/pgjdbc] PR #3533: refactor: Use key_column_usage instead of indnkeyatts in PgDatabaseMetaData for faster speed and improved compatibility
@ 2025-03-12 09:18 ` "hlcianfagna (@hlcianfagna)" <[email protected]>
  9 siblings, 0 replies; 11+ messages in thread

From: hlcianfagna (@hlcianfagna) @ 2025-03-12 09:18 UTC (permalink / raw)
  To: pgjdbc/pgjdbc <[email protected]>

As this is now performance-related, maybe @vlsi also wants to take a look?

^ permalink  raw  reply  [nested|flat] 11+ messages in thread

end of thread, other threads:[~2025-03-12 09:18 UTC | newest]

Thread overview: 11+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-02-20 11:45 [pgjdbc/pgjdbc] PR #3533: refactor: Use key_column_usage instead of indnkeyatts in PgDatabaseMetaData for faster speed and improved compatibility "hlcianfagna (@hlcianfagna)" <[email protected]>
2025-02-20 12:01 ` "davecramer (@davecramer)" <[email protected]>
2025-02-28 10:09 ` "hlcianfagna (@hlcianfagna)" <[email protected]>
2025-02-28 12:17 ` "davecramer (@davecramer)" <[email protected]>
2025-03-07 17:50 ` "hlcianfagna (@hlcianfagna)" <[email protected]>
2025-03-08 11:38 ` "davecramer (@davecramer)" <[email protected]>
2025-03-10 10:19 ` "hlcianfagna (@hlcianfagna)" <[email protected]>
2025-03-10 13:44 ` "davecramer (@davecramer)" <[email protected]>
2025-03-10 14:37 ` "hlcianfagna (@hlcianfagna)" <[email protected]>
2025-03-10 14:41 ` "hlcianfagna (@hlcianfagna)" <[email protected]>
2025-03-12 09:18 ` "hlcianfagna (@hlcianfagna)" <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox