Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dtuCm-000449-Oy for pgsql-performance@arkaria.postgresql.org; Mon, 18 Sep 2017 11:27:12 +0000 Received: from localhost ([127.0.0.1] helo=postgresql.org) by malur.postgresql.org with smtp (Exim 4.84_2) (envelope-from ) id 1dtuCm-0006y9-AK for pgsql-performance@arkaria.postgresql.org; Mon, 18 Sep 2017 11:27:12 +0000 Received: from makus.postgresql.org ([2001:4800:1501:1::229]) by malur.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1dtuB1-0003nV-4z for pgsql-performance@postgresql.org; Mon, 18 Sep 2017 11:25:23 +0000 Received: from mail-qk0-x231.google.com ([2607:f8b0:400d:c09::231]) by makus.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.84_2) (envelope-from ) id 1dtuAu-0007v3-BS for pgsql-performance@postgresql.org; Mon, 18 Sep 2017 11:25:21 +0000 Received: by mail-qk0-x231.google.com with SMTP id r141so164668qke.2 for ; Mon, 18 Sep 2017 04:25:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:from:date:message-id:subject:to; bh=hUVg5EoF5KL4FTowBDKkKzTuELlCt2+GYpAZoXs4K9c=; b=gjn4fD3T3TAJyI76DM0HYQh7IIXtbww1+kRywNHfyCEyD2h7AeC1dGmD0/ITUnV2zu FMk/HKRO601zV8g+gw3QmK4/La4OHt+h8k4xvIspJcxwndicNsfQ3lVOvdG4XyfFfkqH iuSh4M4IctO8PCLike+gCOtpf0HYNHxdV5Bln9F/OtlIIL6O45vd62s7dFfOb0ZVj1Gn +D+FcEVfSsiaZBShJ/nXgIu6sx5Yu2JF2F9MpLLZHQgqSBMRNZvRsZs5VUgkjSHym7Jv OpLARdFXQPgYGE96qSVQXBFR4aYQyDyFIAV0UxlRZcuKGrvq2t0JnifgfBPi1+H95UYm 4M9Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:from:date:message-id:subject:to; bh=hUVg5EoF5KL4FTowBDKkKzTuELlCt2+GYpAZoXs4K9c=; b=KOIdKrPKsPTru6PKAS9Ma7Z934SvSbG/4FqX41gtI0S+aa7AFKnBge241ngviCPJPG MdjCTIXMS2BjEDZyXRynjL0VMlB8On7KbUSxm+caAyn2ZkI7xZ1M1hvvNho4dUo7CivW U1fKaJpgXQzyp0djSlS+aTEwV3ODox95jbYfgL40jg2uXiO4e1qYwFX4j3d6n7ERWBa7 ocsqEPPi0dRd4Wb3SCBb4ZSROszTgbsN4oM78NeyuUjeFni+7bwwRnL1SamqKNJAxtO8 YLPzMTR/D8U2rLDA0fEJcbUDsy1BQEo4oVIE8aaAXFFoQ9vaIMxf+GWuc1krB6Vyq6dP c0Kw== X-Gm-Message-State: AHPjjUg2qeQcDB8BX3zVNZ8nl09Ypiot7xX7wubZtaqhlBFtPOMCv7fD yZ6ZSiHjM0YjbBdsOJ5SUFVazNifLRComkK0J4M= X-Google-Smtp-Source: AOwi7QDNMAgtzycjqBPCJyJxSNknOk3eh/Nm8CXmstmckHsavgF0mgxkDjzU+Ib3gobG/V8QozUPDEtri2a1Gm75co4= X-Received: by 10.55.16.75 with SMTP id a72mr19622498qkh.146.1505733914645; Mon, 18 Sep 2017 04:25:14 -0700 (PDT) MIME-Version: 1.0 Received: by 10.12.177.13 with HTTP; Mon, 18 Sep 2017 04:25:14 -0700 (PDT) From: Rick Otten Date: Mon, 18 Sep 2017 07:25:14 -0400 Message-ID: Subject: max partitions behind a view? To: "pgsql-performa." Content-Type: multipart/alternative; boundary="001a114589ac1bdb43055974ff19" List-Archive: List-Help: List-ID: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: X-Mailing-List: pgsql-performance Precedence: bulk Sender: pgsql-performance-owner@postgresql.org --001a114589ac1bdb43055974ff19 Content-Type: text/plain; charset="UTF-8" I use materialized views to cache results from a foreign data wrapper to a high latency, fairly large (cloud) Hadoop instance. In order to boost refresh times I split the FDW and materialized views up into partitions. Note: I can't use pg_partman or native partitioning because those don't really work with this architecture - they are designed for "real" tables. I can't really use citus because it isn't FDW/matview aware at this time either. I then join the various materialized views together with a regular view made up of a bunch of 'union all' statements. I have a set of functions which automatically create the new partitions and then replace the top level view to add them in on the fly. At this time I probably have about 60 partitions. With that approach I can refresh individual chunks of data, or I can refresh several chunks in parallel. Generally this has been working pretty well. One side effect is that because this is not a real partition, the planner does have to check each partition whenever I run a query to see if it has the data I need. With appropriate indexes, this is ok, checking the partitions that don't have the data is very quick. It does make for some long explain outputs though. The challenge is that because of an exponential rate of data growth, I might have to significantly increase the number of partitions I'm working with - to several hundred at a minimum and potentially more than 1000... This leads me to the question how many 'union all' statements can I have in one view? Should I create a hierarchy of views to gradually roll the data up instead of putting them all in one top-level view? --001a114589ac1bdb43055974ff19 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
I use materialized views to cache results from a foreign d= ata wrapper to a high latency, fairly large (cloud) Hadoop instance.=C2=A0 = In order to boost refresh times I split the FDW and materialized views up i= nto partitions. =C2=A0

Note: =C2=A0I can't use pg_pa= rtman or native partitioning because those don't really work with this = architecture - they are designed for "real" tables.=C2=A0 I can&#= 39;t really use citus because it isn't FDW/matview aware at this time e= ither.

I then join the various materialized views = together with a regular view made up of a bunch of 'union all' stat= ements.

I have a set of functions which automatica= lly create the new partitions and then replace the top level view to add th= em in on the fly.=C2=A0 At this time I probably have about 60 partitions.

With that approach I can refresh individual chunks = of data, or I can refresh several chunks in parallel.=C2=A0 Generally this = has been working pretty well.=C2=A0 One side effect is that because this is= not a real partition, the planner does have to check each partition whenev= er I run a query to see if it has the data I need.=C2=A0 With appropriate i= ndexes, this is ok, checking the partitions that don't have the data is= very quick.=C2=A0 It does make for some long explain outputs though.
=

The challenge is that because of an exponential rate of= data growth, I might have to significantly increase the number of partitio= ns I'm working with - to several hundred at a minimum and potentially m= ore than 1000...

This leads me to the question how= many 'union all' statements can I have in one view? =C2=A0 Should = I create a hierarchy of views to gradually roll the data up instead of putt= ing them all in one top-level view?

--001a114589ac1bdb43055974ff19--