MIME-Version: 1.0
From: Rick Otten <rottenwindfish@gmail.com>
Date: Mon, 18 Sep 2017 07:25:14 -0400
Message-ID: 
 <CAMAYy4Jd12CyPFeumOd8g2ar_DmgfH4cvuE5rxocsdi=KDpf0A@mail.gmail.com>
Subject: max partitions behind a view?
To: "pgsql-performa." <pgsql-performance@postgresql.org>
Content-Type: multipart/alternative; boundary="001a114589ac1bdb43055974ff19"
Precedence: bulk
Sender: pgsql-performance-owner@postgresql.org

--001a114589ac1bdb43055974ff19
Content-Type: text/plain; charset="UTF-8"

I use materialized views to cache results from a foreign data wrapper to a
high-latency, fairly large (cloud) Hadoop instance.  To speed up refreshes,
I split the FDW and materialized views up into partitions.

Note:  I can't use pg_partman or native partitioning because those don't
really work with this architecture; they are designed for "real" tables.
I can't really use Citus either, because it isn't FDW/matview aware at this
time.

I then combine the various materialized views with a regular view made up
of a series of 'union all' branches, one per materialized view.

I have a set of functions which automatically create the new partitions and
then replace the top-level view to add them in on the fly.  At this time I
have about 60 partitions.
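
To make the shape of that top-level view concrete, here is a minimal sketch
of how its definition could be regenerated from a list of partition names.
It is written in Python purely for illustration; the names `events_all` and
`events_pNNN` are hypothetical, and the real functions may look quite
different.

```python
def build_union_view(view_name, partitions):
    """Return a CREATE OR REPLACE VIEW statement that UNION ALLs the partitions.

    Names are interpolated as-is, so they are assumed to be trusted,
    already-valid SQL identifiers (hypothetical names in this sketch).
    """
    if not partitions:
        raise ValueError("need at least one partition")
    branches = [f"SELECT * FROM {name}" for name in partitions]
    body = "\nUNION ALL\n".join(branches)
    return f"CREATE OR REPLACE VIEW {view_name} AS\n{body};"


# Hypothetical materialized view names, one per partition.
example_partitions = [f"events_p{i:03d}" for i in range(1, 4)]
print(build_union_view("events_all", example_partitions))
```

Adding a partition then amounts to appending one more name to the list and
re-running the generated statement.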

With that approach I can refresh individual chunks of data, or I can
refresh several chunks in parallel.  Generally this has been working pretty
well.  One side effect is that, because these are not real partitions, the
planner has to check each partition whenever I run a query to see if it has
the data I need.  With appropriate indexes this is OK: checking the
partitions that don't have the data is very quick.  It does make for some
long EXPLAIN outputs, though.
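
As a rough sketch of the "several chunks in parallel" part: one REFRESH
statement per materialized view, run a few at a time.  The `execute`
callable is an assumption standing in for whatever actually runs SQL
(ideally one database connection per call), and the partition names are
hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def refresh_statement(matview):
    """Build the REFRESH statement for one materialized view."""
    return f"REFRESH MATERIALIZED VIEW {matview};"


def refresh_in_parallel(matviews, execute, max_workers=4):
    """Run one REFRESH per materialized view, up to max_workers at a time.

    `execute` is a caller-supplied function that runs a single SQL string.
    """
    statements = [refresh_statement(m) for m in matviews]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for every task and re-raises the first failure, if any.
        list(pool.map(execute, statements))


if __name__ == "__main__":
    # Stand-in for a real database call: just print each statement.
    refresh_in_parallel(["events_p001", "events_p002"], execute=print)
```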

The challenge is that because of an exponential rate of data growth, I
might have to significantly increase the number of partitions I'm working
with - to several hundred at a minimum and potentially more than 1000...

This leads me to my question: how many 'union all' statements can I have in
one view?  Should I create a hierarchy of views to gradually roll the data
up instead of putting them all in one top-level view?
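
To be clear about what a hierarchy would mean here, a minimal sketch
(Python again, hypothetical names, and an arbitrary `fan_out` parameter that
is only an assumption): partitions are grouped into intermediate views, and
the top-level view unions those instead of every partition directly.
Whether this layout actually helps is exactly what I am asking.

```python
def union_view_sql(view_name, sources):
    """Return a CREATE OR REPLACE VIEW statement that UNION ALLs the sources."""
    if not sources:
        raise ValueError("need at least one source relation")
    body = "\nUNION ALL\n".join(f"SELECT * FROM {name}" for name in sources)
    return f"CREATE OR REPLACE VIEW {view_name} AS\n{body};"


def build_view_hierarchy(top_view, partitions, fan_out):
    """Return SQL for intermediate group views followed by the top-level view.

    Each intermediate view covers at most `fan_out` partitions; the
    top-level view then unions the intermediate views.
    """
    if fan_out < 1:
        raise ValueError("fan_out must be at least 1")
    statements = []
    group_views = []
    for start in range(0, len(partitions), fan_out):
        group = partitions[start:start + fan_out]
        group_view = f"{top_view}_g{len(group_views) + 1:03d}"
        statements.append(union_view_sql(group_view, group))
        group_views.append(group_view)
    statements.append(union_view_sql(top_view, group_views))
    return statements


# Hypothetical example: 5 partitions rolled up in groups of 2.
for statement in build_view_hierarchy(
    "events_all", [f"events_p{i:03d}" for i in range(1, 6)], fan_out=2
):
    print(statement)
```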

--001a114589ac1bdb43055974ff19--