[PATCH] Reduce pg_class scans in GRANT/REVOKE ON ALL TABLES IN SCHEMA

public inbox for [email protected]  
help / color / mirror / Atom feed

[PATCH] Reduce pg_class scans in GRANT/REVOKE ON ALL TABLES IN SCHEMA
3+ messages / 2 participants
[nested] [flat]

* [PATCH] Reduce pg_class scans in GRANT/REVOKE ON ALL TABLES IN SCHEMA
@ 2026-04-12 07:22 CharSyam <[email protected]>
  2026-04-13 00:43 ` Re: [PATCH] Reduce pg_class scans in GRANT/REVOKE ON ALL TABLES IN SCHEMA Michael Paquier <[email protected]>
  0 siblings, 1 reply; 3+ messages in thread

From: CharSyam @ 2026-04-12 07:22 UTC (permalink / raw)
  To: [email protected]

Hi hackers,

  While reading aclchk.c I noticed that objectsInSchemaToOids(), used
  by GRANT/REVOKE ... ON ALL TABLES IN SCHEMA, calls
  getRelationsInNamespace() five times for the OBJECT_TABLE case --
  once per relkind (RELATION, VIEW, MATVIEW, FOREIGN_TABLE,
  PARTITIONED_TABLE):

      case OBJECT_TABLE:
          objs = getRelationsInNamespace(namespaceId, RELKIND_RELATION);
          objects = list_concat(objects, objs);
          objs = getRelationsInNamespace(namespaceId, RELKIND_VIEW);
          ...
          objs = getRelationsInNamespace(namespaceId,
RELKIND_PARTITIONED_TABLE);
          objects = list_concat(objects, objs);
          break;

  pg_class does have an index on (relname, relnamespace), but there
  is no index matching (relnamespace, relkind), so each of those
  per-relkind calls falls back to a full heap scan via
  table_beginscan_catalog().  The work is just repeated five times.

  The attached patch introduces a small helper
  getRelationsInNamespaceMulti() that performs a single heap scan
  filtered by relnamespace and distributes matching tuples into
  per-relkind buckets supplied by the caller.  Relkind filtering is
  done in C after each tuple is read, which is trivially cheap.  The
  OBJECT_TABLE case uses the helper; OBJECT_SEQUENCE and
  OBJECT_PROPGRAPH are left on the original getRelationsInNamespace()
  helper because they only need a single relkind and benefit from the
  second ScanKey.

  Correctness / order preservation
  --------------------------------
   * Result order is identical.  The underlying pg_class heap (and
     thus its physical scan order) is the same regardless of how we
     filter, so each bucket ends up holding the same OIDs in the same
     relative order as a separate per-relkind heap scan would have
     produced.  Concatenating the buckets in the original relkind
     order reproduces the previous list tuple-for-tuple.

     I verified this empirically.  On a schema with interleaved
     relkinds (tables, views, matviews, partitioned tables) I ran two
     equivalent SQL formulations while forcing seq scans on pg_class:

       OLD-path model: UNION ALL of five
         "SELECT oid FROM pg_class
            WHERE relnamespace = X AND relkind = Y
            ORDER BY ctid"
         queries, one per relkind, in the same group order the code
         uses.

       NEW-path model: a single
         "SELECT oid FROM pg_class
            WHERE relnamespace = X
            ORDER BY ctid"
         bucketed by relkind and concatenated in the same group
         order.

     The two formulations produced identical OID sequences, element
     by element.  A positional FULL JOIN between them returned zero
     rows.

   * MVCC semantics are, if anything, a bit stricter.  The old code
     took five separate catalog scans, so in principle concurrent DDL
     could commit between scan N and scan N+1 and be visible to one
     but not another.  With a single scan everything is collected
     under one catalog snapshot.

   * Locking is unchanged in kind: AccessShareLock on pg_class is
     still taken, just once instead of five times.

  Benchmark
  ---------
  This is a targeted micro-optimization, not a dramatic speedup.
  With 10,000 tables in a single schema (pg_class ~10,452 rows),
  running GRANT/REVOKE SELECT ON ALL TABLES IN SCHEMA in a loop
  (6 iterations, first dropped as warmup), I measured a consistent
  ~15% reduction in end-to-end time:

                       baseline    patched     delta
      GRANT  (avg)     88.2 ms     75.9 ms    -14%
      REVOKE (avg)    134.9 ms    115.7 ms    -14%

  Per-iteration numbers (ms):

      baseline GRANT : 92, 87, 87, 85, 89
      patched  GRANT : 77, 72, 72, 76, 79, 79
      baseline REVOKE: 145, 144, 132, 128, 130, 128
      patched  REVOKE: 114, 117, 112, 120, 112, 119

  The absolute savings are small because most of the time in these
  commands is spent updating per-relation ACL tuples, not scanning
  pg_class.  For schemas with only a handful of relations the effect
  is not measurable.  The change is aimed at multi-tenant /
  partition-heavy installations that regularly issue
  "... ON ALL TABLES IN SCHEMA ..." statements over large catalogs.

  Testing
  -------
  Both `make check` and `make check-world` pass cleanly with the
  patch applied on top of current master (all suites green, no new
  failures).  TAP tests were not exercised (tree configured without
  --enable-tap-tests); I can rerun with TAP enabled if that is useful.

  The patch is attached (against master).  Feedback and review
  welcome -- in particular I'd like to know if anyone sees a
  correctness concern I missed, or prefers a different shape for the
  helper (e.g. returning a single flat list rather than per-relkind
  buckets).

  Thanks,
  charsyam


Attachments:

  [application/octet-stream] 0001-Reduce-pg_class-scans-in-GRANT-REVOKE-ON-ALL-TABLES-.patch (6.8K, 3-0001-Reduce-pg_class-scans-in-GRANT-REVOKE-ON-ALL-TABLES-.patch)
  download | inline diff:
From 32e86de0e36a9b9544426ab91c6af57d029d078c Mon Sep 17 00:00:00 2001
From: charsyam <[email protected]>
Date: Sun, 12 Apr 2026 16:20:35 +0900
Subject: [PATCH] Reduce pg_class scans in GRANT/REVOKE ON ALL TABLES IN SCHEMA

When processing GRANT/REVOKE ... ON ALL TABLES IN SCHEMA,
objectsInSchemaToOids() called getRelationsInNamespace() five times,
once per relkind (RELATION, VIEW, MATVIEW, FOREIGN_TABLE,
PARTITIONED_TABLE).  pg_class does have an index on
(relname, relnamespace), but there is no index matching
(relnamespace, relkind), so each of those per-relkind calls falls
back to a full heap scan via table_beginscan_catalog() -- i.e. the
catalog is scanned five times in total.

Introduce getRelationsInNamespaceMulti(), which performs a single
heap scan filtered by relnamespace and distributes matching tuples
into per-relkind buckets supplied by the caller.  Relkind filtering
is done in C after each tuple is read, which is trivially cheap.
The OBJECT_TABLE case uses the helper; OBJECT_SEQUENCE and
OBJECT_PROPGRAPH keep calling getRelationsInNamespace() unchanged
because they only need a single relkind and benefit from the second
ScanKey.

Behavior is preserved:

* Result order is identical.  The underlying pg_class heap (and
  therefore its physical scan order) is the same regardless of how
  we filter, so each bucket ends up holding exactly the OIDs that
  the corresponding per-relkind heap scan would have produced, in
  the same order.  Concatenating the buckets in the original
  relkind order reproduces the previous list tuple-for-tuple.
  This was verified empirically by comparing, on a schema with
  mixed relkinds, the OID sequence produced by the old UNION-ALL
  pattern against the new single-scan + bucketed pattern; the
  sequences are identical element by element.

* MVCC semantics are, if anything, a bit stricter: all relkinds
  are now collected under a single catalog snapshot rather than
  five.

* Locking is unchanged in kind -- AccessShareLock on pg_class is
  still taken, just once instead of five times.

A simple benchmark (10,000 tables in one schema, pg_class ~10,452
rows) shows a consistent ~15% reduction in end-to-end time of
GRANT/REVOKE SELECT ON ALL TABLES IN SCHEMA:

  GRANT : 88.2 ms -> 75.9 ms
  REVOKE: 134.9 ms -> 115.7 ms

The absolute savings are small because the bulk of the time in
these commands is spent updating per-relation ACL tuples, not
scanning pg_class.  For schemas with only a handful of relations
the effect is not measurable.  The change is therefore a targeted
improvement for deployments with very large catalogs
(multi-tenant / partition-heavy systems) that frequently run ALL
TABLES IN SCHEMA grants.
---
 src/backend/catalog/aclchk.c | 90 ++++++++++++++++++++++++++++++++----
 1 file changed, 80 insertions(+), 10 deletions(-)

diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index 67424fe3b0c..a68f0a989d4 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -125,6 +125,11 @@ static List *objectNamesToOids(ObjectType objtype, List *objnames,
 							   bool is_grant);
 static List *objectsInSchemaToOids(ObjectType objtype, List *nspnames);
 static List *getRelationsInNamespace(Oid namespaceId, char relkind);
+/* Single-scan helper over pg_class for multiple relkinds in one namespace */
+static void getRelationsInNamespaceMulti(Oid namespaceId,
+										 const char *relkinds,
+										 int nkinds,
+										 List **buckets);
 static void expand_col_privileges(List *colnames, Oid table_oid,
 								  AclMode this_privileges,
 								  AclMode *col_privileges,
@@ -797,16 +802,25 @@ objectsInSchemaToOids(ObjectType objtype, List *nspnames)
 		switch (objtype)
 		{
 			case OBJECT_TABLE:
-				objs = getRelationsInNamespace(namespaceId, RELKIND_RELATION);
-				objects = list_concat(objects, objs);
-				objs = getRelationsInNamespace(namespaceId, RELKIND_VIEW);
-				objects = list_concat(objects, objs);
-				objs = getRelationsInNamespace(namespaceId, RELKIND_MATVIEW);
-				objects = list_concat(objects, objs);
-				objs = getRelationsInNamespace(namespaceId, RELKIND_FOREIGN_TABLE);
-				objects = list_concat(objects, objs);
-				objs = getRelationsInNamespace(namespaceId, RELKIND_PARTITIONED_TABLE);
-				objects = list_concat(objects, objs);
+				{
+					const char kinds[] = {
+						RELKIND_RELATION,
+						RELKIND_VIEW,
+						RELKIND_MATVIEW,
+						RELKIND_FOREIGN_TABLE,
+						RELKIND_PARTITIONED_TABLE
+					};
+					List *buckets[lengthof(kinds)];
+					int i;
+
+					for (i = 0; i < (int) lengthof(kinds); i++)
+						buckets[i] = NIL;
+
+					getRelationsInNamespaceMulti(namespaceId, kinds, lengthof(kinds), buckets);
+
+					for (i = 0; i < (int) lengthof(kinds); i++)
+						objects = list_concat(objects, buckets[i]);
+				}
 				break;
 			case OBJECT_SEQUENCE:
 				objs = getRelationsInNamespace(namespaceId, RELKIND_SEQUENCE);
@@ -907,6 +921,62 @@ getRelationsInNamespace(Oid namespaceId, char relkind)
 	return relations;
 }
 
+/*
+ * getRelationsInNamespaceMulti
+ *
+ * Perform a single heap scan over pg_class for the given namespace, and
+ * distribute matching tuples into per-relkind buckets provided by the
+ * caller.  There is no pg_class index matching (relnamespace, relkind),
+ * so the previous per-relkind variant also resorted to a full heap scan;
+ * this helper simply collapses N such scans into one.
+ *
+ * Order preservation: within each bucket, entries appear in the order
+ * they were encountered during the heap scan.  Because the underlying
+ * heap (and thus its physical scan order) is the same regardless of
+ * how we filter, each bucket ends up holding the same OIDs in the same
+ * relative order as a separate per-relkind heap scan would have
+ * produced.  Concatenating the buckets in the caller's requested
+ * relkind order therefore reproduces the list that the previous code
+ * built from N separate getRelationsInNamespace() calls, tuple for
+ * tuple.
+ */
+static void
+getRelationsInNamespaceMulti(Oid namespaceId, const char *relkinds, int nkinds, List **buckets)
+{
+	ScanKeyData key;
+	Relation	rel;
+	TableScanDesc scan;
+	HeapTuple	tuple;
+	int			i;
+
+	/* Open pg_class once and scan by namespace; filter relkind in-code */
+	ScanKeyInit(&key,
+				Anum_pg_class_relnamespace,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(namespaceId));
+
+	rel = table_open(RelationRelationId, AccessShareLock);
+	scan = table_beginscan_catalog(rel, 1, &key);
+
+	while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
+	{
+		Form_pg_class cform = (Form_pg_class) GETSTRUCT(tuple);
+		char		rk = cform->relkind;
+
+		for (i = 0; i < nkinds; i++)
+		{
+			if (rk == relkinds[i])
+			{
+				buckets[i] = lappend_oid(buckets[i], cform->oid);
+				break;
+			}
+		}
+	}
+
+	table_endscan(scan);
+	table_close(rel, AccessShareLock);
+}
+
 
 /*
  * ALTER DEFAULT PRIVILEGES statement
-- 
2.50.1 (Apple Git-155)



^ permalink  raw  reply  [nested|flat] 3+ messages in thread

* Re: [PATCH] Reduce pg_class scans in GRANT/REVOKE ON ALL TABLES IN SCHEMA
  2026-04-12 07:22 [PATCH] Reduce pg_class scans in GRANT/REVOKE ON ALL TABLES IN SCHEMA CharSyam <[email protected]>
@ 2026-04-13 00:43 ` Michael Paquier <[email protected]>
  2026-04-13 09:52   ` Re: [PATCH] Reduce pg_class scans in GRANT/REVOKE ON ALL TABLES IN SCHEMA CharSyam <[email protected]>
  0 siblings, 1 reply; 3+ messages in thread

From: Michael Paquier @ 2026-04-13 00:43 UTC (permalink / raw)
  To: CharSyam <[email protected]>; +Cc: [email protected]

On Sun, Apr 12, 2026 at 04:22:24PM +0900, CharSyam wrote:
>   Benchmark
>   ---------
>   This is a targeted micro-optimization, not a dramatic speedup.
>   With 10,000 tables in a single schema (pg_class ~10,452 rows),
>   running GRANT/REVOKE SELECT ON ALL TABLES IN SCHEMA in a loop
>   (6 iterations, first dropped as warmup), I measured a consistent
>   ~15% reduction in end-to-end time:
> 
>                        baseline    patched     delta
>       GRANT  (avg)     88.2 ms     75.9 ms    -14%
>       REVOKE (avg)    134.9 ms    115.7 ms    -14%

I am pretty sure that there are users with millions of relations in a
single schema that could benefit from that.  At least that would not
be surprising with partitioning these days, and foreign tables.  What
kind of numbers do you get if you bump up the number of digits for
these tests.  Let's say a comparison based on a few million relations
at least?

The change you are proposing looks simple enough, quickly skimming
through the patch.  There may be more optimizations doable here, I
have not looked at that, still I tend to like such micro-optimization
proposals as they provide a silent benefit.
--
Michael


Attachments:

  [application/pgp-signature] signature.asc (833B, 2-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 3+ messages in thread

* Re: [PATCH] Reduce pg_class scans in GRANT/REVOKE ON ALL TABLES IN SCHEMA
  2026-04-12 07:22 [PATCH] Reduce pg_class scans in GRANT/REVOKE ON ALL TABLES IN SCHEMA CharSyam <[email protected]>
  2026-04-13 00:43 ` Re: [PATCH] Reduce pg_class scans in GRANT/REVOKE ON ALL TABLES IN SCHEMA Michael Paquier <[email protected]>
@ 2026-04-13 09:52   ` CharSyam <[email protected]>
  0 siblings, 0 replies; 3+ messages in thread

From: CharSyam @ 2026-04-13 09:52 UTC (permalink / raw)
  To: Michael Paquier <[email protected]>; +Cc: [email protected]

  I ran a benchmark on the patch *Reduce pg_class scans in GRANT/REVOKE
  ON ALL TABLES IN SCHEMA* (collapses 5 per-relkind pg_class heap scans
  into 1 scan distributed into per-relkind buckets). Summary below.(It took
much time to tests)


  ## Assumptions

  - Two builds of PostgreSQL 19devel from the same source tree (one
    patched, one at master tip), identical compile flags, separate
    --prefix.
  - Separate data directories, run sequentially on an otherwise idle
    host.
  - GUCs: shared_buffers=2GB, max_locks_per_transaction=100000,
    fsync=off, synchronous_commit=off, full_page_writes=off,
    autovacuum=off.
  - bench_s schema contains N empty tables (CREATE TABLE t_i()).
  - Measured operations:
      GRANT  SELECT ON ALL TABLES IN SCHEMA bench_s TO   bench_role
      REVOKE ALL PRIVILEGES ON ALL TABLES IN SCHEMA bench_s FROM bench_role
  - Best of 3 runs reported (seconds).
  - Two scenarios:
      A. Clean catalog   — VACUUM FULL pg_class; VACUUM FULL pg_attribute
                           immediately before measurement. pg_class
                           densely packed.
      B. Bloated catalog — pre-bloat phase: GRANT+REVOKE on all N tables
                           repeated C cycles, no VACUUM. Both patched
                           and master operate on catalogs with identical
                           relpages and n_dead_tup.


  ## Results — Scenario A (Clean catalog)

  macOS (Apple Silicon), best of 3, seconds:

    ntables | GRANT patched | GRANT master | REVOKE patched | REVOKE master
    --------+---------------+--------------+----------------+--------------
     20,000 |     0.116     |    0.115     |     0.231      |    0.226
     40,000 |     0.250     |    0.250     |     0.460      |    0.460
    100,000 |     0.730     |    0.678     |     1.193      |    1.193

  Honestly, there is no measurable performance difference in the clean
  state. Patched and master are statistically indistinguishable within
  run-to-run noise. This matches the design of the patch: when pg_class
  is densely packed, repeating a small seq scan five times is cheap, so
  collapsing it into one has nothing meaningful to save. The patch adds
  no overhead either — worst case is a tie.


  ## Results — Scenario B (Bloated catalog)

  ### Linux x86_64, C=20, best of 3, seconds

    ntables   | dead_tup  | GRANT patched | GRANT master |   Δ     | REVOKE
patched | REVOKE master |   Δ

----------+-----------+---------------+--------------+---------+----------------+---------------+---------
       10,000 |         0 |    0.0924     |    0.0935    |  −1.2 % |
0.1668     |    0.1696     |  −1.6 %
       20,000 |   109,825 |    0.2027     |    0.2069    |  −2.0 % |
0.3381     |    0.3533     |  −4.3 %
       50,000 |   329,468 |    0.5555     |    0.5895    |  −5.8 % |
0.8901     |    0.9371     |  −5.0 %
      100,000 |   879,311 |    1.1732     |    1.1968    |  −2.0 % |
1.8808     |    1.9555     |  −3.8 %
      200,000 | 1,978,925 |    2.2188     |    2.3470    |  −5.5 % |
3.7290     |    3.9064     |  −4.5 %
      500,000 | 4,178,604 |    6.0260     |    6.6663    |  −9.6 % |
9.8162     |   10.2169     |  −3.9 %
    1,000,000 | 9,678,399 |   12.9241     |   14.7657    | −12.5 % |
 24.8893     |   28.7566     | −13.4 %

  ### macOS (Apple Silicon), C=20 (C=10 at 1M), best of 3, seconds

    ntables   | dead_tup  | GRANT patched | GRANT master |   Δ    | REVOKE
patched | REVOKE master |   Δ

----------+-----------+---------------+--------------+--------+----------------+---------------+--------
       20,000 |   299,960 |    0.168      |    0.163     |  +3 %  |
0.260      |    0.278      |  −6 %
       40,000 |   519,601 |    0.307      |    0.307     |   0 %  |
0.552      |    0.564      |  −2 %
      100,000 |   959,268 |    0.784      |    0.934     | −16 %  |
1.405      |    1.419      |  ~0 %
      200,000 | 2,058,886 |    1.787      |    1.878     |  −5 %  |
2.745      |    2.849      |  −4 %
      500,000 | 4,258,565 |    4.727      |    5.197     |  −9 %  |
7.126      |    7.908      | −10 %
    1,000,000 | 9,758,364 |   10.977      |   11.126     |  −1 %  |
 19.473      |   20.759      |  −6 %

  Negative Δ = patched faster. Under catalog bloat the patch produces a
  consistent, reproducible improvement on both operating systems.


  Happy to share the bench scripts and raw logs on request.

  Thanks,
  charsyam

2026년 4월 13일 (월) 오전 9:43, Michael Paquier <[email protected]>님이 작성:

> On Sun, Apr 12, 2026 at 04:22:24PM +0900, CharSyam wrote:
> >   Benchmark
> >   ---------
> >   This is a targeted micro-optimization, not a dramatic speedup.
> >   With 10,000 tables in a single schema (pg_class ~10,452 rows),
> >   running GRANT/REVOKE SELECT ON ALL TABLES IN SCHEMA in a loop
> >   (6 iterations, first dropped as warmup), I measured a consistent
> >   ~15% reduction in end-to-end time:
> >
> >                        baseline    patched     delta
> >       GRANT  (avg)     88.2 ms     75.9 ms    -14%
> >       REVOKE (avg)    134.9 ms    115.7 ms    -14%
>
> I am pretty sure that there are users with millions of relations in a
> single schema that could benefit from that.  At least that would not
> be surprising with partitioning these days, and foreign tables.  What
> kind of numbers do you get if you bump up the number of digits for
> these tests.  Let's say a comparison based on a few million relations
> at least?
>
> The change you are proposing looks simple enough, quickly skimming
> through the patch.  There may be more optimizations doable here, I
> have not looked at that, still I tend to like such micro-optimization
> proposals as they provide a silent benefit.
> --
> Michael
>


^ permalink  raw  reply  [nested|flat] 3+ messages in thread

end of thread, other threads:[~2026-04-13 09:52 UTC | newest]

Thread overview: 3+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-04-12 07:22 [PATCH] Reduce pg_class scans in GRANT/REVOKE ON ALL TABLES IN SCHEMA CharSyam <[email protected]>
2026-04-13 00:43 ` Michael Paquier <[email protected]>
2026-04-13 09:52   ` CharSyam <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox