MIME-Version: 1.0
In-Reply-To: <ce8eaea2-3008-8cc1-fa39-129e9e82eaa2@pritambaral.com>
References: 
 <CABZYQRKnp=FxZ7tQeyytDjUOnHP9J90irxRBEAc+-XGbKdgf2A@mail.gmail.com>
 <ce8eaea2-3008-8cc1-fa39-129e9e82eaa2@pritambaral.com>
From: Ulf Lohbrügge <ulf.lohbruegge@gmail.com>
Date: Wed, 28 Jun 2017 11:38:14 +0200
Message-ID: 
 <CABZYQRK6i+9jedfoQ+7aia+0c_ox0d=qV9_2p+PWv99Tegt95Q@mail.gmail.com>
Subject: Re: Performance of information_schema with many schemata
 and tables
To: pgsql-performance@postgresql.org
Content-Type: multipart/alternative; boundary="001a11376f00769a3a055301f1cc"
Precedence: bulk
Sender: pgsql-performance-owner@postgresql.org

--001a11376f00769a3a055301f1cc
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Nope, I didn't try that yet. But I don't have the impression that
reindexing the system catalogs behind information_schema will help. The view
information_schema.tables is built on pg_class, which has the following
indexes:

    "pg_class_oid_index" UNIQUE, btree (oid)
    "pg_class_relname_nsp_index" UNIQUE, btree (relname, relnamespace)
    "pg_class_tblspc_relfilenode_index" btree (reltablespace, relfilenode)

The costly sequential scan in question on pg_class happens with the following
WHERE clause:

WHERE (c.relkind = ANY (ARRAY['r'::"char", 'v'::"char", 'f'::"char"])) AND
NOT pg_is_other_temp_schema(nc.oid) AND (pg_has_role(c.relowner,
'USAGE'::text) OR has_table_privilege(c.oid, 'SELECT, INSERT, UPDATE,
DELETE, TRUNCATE, REFERENCES, TRIGGER'::text) OR
has_any_column_privilege(c.oid,
'SELECT, INSERT, UPDATE, REFERENCES'::text));

Besides oid (pg_class_oid_index), none of the referenced columns is indexed.
I tried adding an index on relowner, but it didn't help: the column is only
used as an argument to the function pg_has_role, so the query is still forced
to do a sequential scan.
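
A minimal sketch of the same existence check written against pg_catalog
directly. It assumes that the cast of relname to
information_schema.sql_identifier (visible in the plan's filter) is what keeps
the planner away from pg_class_relname_nsp_index; that is an assumption, not
something measured on this cluster. 'foo' and 'bar' are the placeholder names
from the original query, and the privilege checks done by the view are left
out here:

```sql
-- Look up one table by schema and name without going through
-- information_schema.tables. Comparing relname and nspname without a cast
-- lets the planner consider pg_class_relname_nsp_index (relname, relnamespace).
SELECT c.oid, n.nspname, c.relname
FROM pg_catalog.pg_class c
JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'foo'               -- placeholder schema name
  AND c.relname = 'bar'               -- placeholder table name
  AND c.relkind IN ('r', 'v', 'f');   -- same relkind filter as the view
```

Running it under EXPLAIN ANALYZE would show whether an index scan is chosen
instead of the sequential scan on pg_class.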

Regards,
Ulf

2017-06-28 3:31 GMT+02:00 Pritam Baral <pritam@pritambaral.com>:

> On Wednesday 28 June 2017 05:27 AM, Ulf Lohbrügge wrote:
> > Hi all,
> >
> > we use schemata to separate our customers in a multi-tenant setup
> > (9.5.7, Debian stable). Each tenant is managed in his own schema with
> > all the tables that only he can access. All tables in all schemata are
> > the same in terms of their DDL: Every tenant uses e.g. his own table
> > 'address'. We currently manage around 1200 schemata (i.e. tenants) on
> > one cluster. Every schema consists currently of ~200 tables - so we end
> > up with ~240000 tables plus constraints, indexes, sequences et al.
> >
> > Our current approach is quite nice in terms of data privacy because
> > every tenant is isolated from all other tenants. A tenant uses his own
> > user that gives him only access to the corresponding schema. Performance
> > is great for us - we didn't expect Postgres to scale so well!
> >
> > But performance is pretty bad when we query things in the
> > information_schema:
> >
> > SELECT
> >   *
> > FROM information_schema.tables
> > WHERE table_schema = 'foo'
> > AND table_name = 'bar';
> >
> > Above query results in a large sequence scan with a filter that removes
> > 1305161 rows:
> >
> >                                   QUERY PLAN
> > ------------------------------------------------------------------------
> >  Nested Loop Left Join  (cost=0.70..101170.18 rows=3 width=265) (actual time=383.505..383.505 rows=0 loops=1)
> >    ->  Nested Loop  (cost=0.00..101144.65 rows=3 width=141) (actual time=383.504..383.504 rows=0 loops=1)
> >          Join Filter: (nc.oid = c.relnamespace)
> >          ->  Seq Scan on pg_class c  (cost=0.00..101023.01 rows=867 width=77) (actual time=383.502..383.502 rows=0 loops=1)
> >                Filter: ((relkind = ANY ('{r,v,f}'::"char"[])) AND (((relname)::information_schema.sql_identifier)::text = 'bar'::text) AND (pg_has_role(relowner, 'USAGE'::text) OR has_table_privilege(oid, 'SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER'::text) OR has_any_column_privilege(oid, 'SELECT, INSERT, UPDATE, REFERENCES'::text)))
> >                Rows Removed by Filter: 1305161
> >          ->  Materialize  (cost=0.00..56.62 rows=5 width=68) (never executed)
> >                ->  Seq Scan on pg_namespace nc  (cost=0.00..56.60 rows=5 width=68) (never executed)
> >                      Filter: ((NOT pg_is_other_temp_schema(oid)) AND (((nspname)::information_schema.sql_identifier)::text = 'foo'::text))
> >    ->  Nested Loop  (cost=0.70..8.43 rows=1 width=132) (never executed)
> >          ->  Index Scan using pg_type_oid_index on pg_type t  (cost=0.42..8.12 rows=1 width=72) (never executed)
> >                Index Cond: (c.reloftype = oid)
> >          ->  Index Scan using pg_namespace_oid_index on pg_namespace nt  (cost=0.28..0.30 rows=1 width=68) (never executed)
> >                Index Cond: (oid = t.typnamespace)
> >  Planning time: 0.624 ms
> >  Execution time: 383.784 ms
> > (16 rows)
> >
> > We noticed the degraded performance first when using the psql cli.
> > Pressing tab after beginning a WHERE clause results in a query against
> > the information_schema which is pretty slow and ends in "lag" when
> > trying to enter queries.
> >
> > We also use Flyway (https://flywaydb.org/) to handle our database
> > migrations. Unfortunately Flyway is querying the information_schema to
> > check if specific tables exist (I guess this is one of the reasons
> > information_schema exists) and therefore vastly slows down the migration
> > of our tenants. Our last migration run on all tenants (schemata) almost
> > took 2h because the above query is executed multiple times per tenant.
> > The migration run consisted of multiple sql files to be executed and
> > triggered more than 10 queries on information_schema per tenant.
> >
> > I don't think that Flyway is to blame because querying the
> > information_schema should be a fast operation (and was fast for us when
> > we had less schemata). I tried to speedup querying pg_class by adding
> > indexes (after enabling allow_system_table_mods) but didn't succeed. The
> > function call 'pg_has_role' is probably not easy to optimize.
> >
> > Postgres is really doing a great job to handle those many schemata and
> > tables but doesn't scale well when querying information_schema. I
> > actually don't want to change my current multi-tenant setup (one schema
> > per tenant) as it is working great but the slow information_schema is
> > killing our deployments.
> >
> > Are there any other options besides switching from
> > one-schema-per-tenant-approach? Any help is greatly appreciated!
>
> Have you tried a `REINDEX SYSTEM <dbname>`?
>
> >
> > Regards,
> > Ulf
>
> --
> #!/usr/bin/env regards
> Chhatoi Pritam Baral
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

--001a11376f00769a3a055301f1cc--