Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1qQsMX-001G9d-R0 for pgsql-admin@arkaria.postgresql.org; Tue, 01 Aug 2023 16:36:46 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1qQsMV-008JRG-9T for pgsql-admin@arkaria.postgresql.org; Tue, 01 Aug 2023 16:36:44 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1qQsMU-008JR8-TJ for pgsql-admin@lists.postgresql.org; Tue, 01 Aug 2023 16:36:44 +0000 Received: from mail-oi1-x229.google.com ([2607:f8b0:4864:20::229]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1qQsMT-000ETh-Oj for pgsql-admin@lists.postgresql.org; Tue, 01 Aug 2023 16:36:43 +0000 Received: by mail-oi1-x229.google.com with SMTP id 5614622812f47-3a6f3ef3155so3280830b6e.1 for ; Tue, 01 Aug 2023 09:36:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1690907800; x=1691512600; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=eTwt40EUEtgmhVWOJoDXVquCaQlEZeG4Foa4rPuF9Js=; b=ro2WYhEK5BWk6Mja/GgeRHU2t+HDJqIoV+83NngFBN+X+9vGR0t3IlubM9yNP6Oijx bU7Xxmihd+avHacHeQuI/FEnen+bnWFLxlenxT6xh4Gd4ethaymFDxw3IZZVhqRxx/wF ImY+ZJf74Yd/CiFV/cmryfOr4cjVFXzKBSW72pXo98cifjG6SNV4Q/lyaTdVgF+0An+e TnluPHprqsuGcKt3F2XfsJUDaW1hYV1Yl8AGhR7YM/26pHExfMRGs7liB5ft6+JGTVAQ aj9l/uyGECB2lejSWDSPB7WSENbIACljH8mRd1E7zVD1zqkIJ3fAkTtBtWJAERQ3vZJU pW/A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1690907800; x=1691512600; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=eTwt40EUEtgmhVWOJoDXVquCaQlEZeG4Foa4rPuF9Js=; b=LMwRdWCFkiE+97VgfYqcQJ2ebgHDacmOLR1xtCcDnXWTvOebmJqjRzsvcyzdv/C8/e FEt94xGCtOk2/tDOV86XXdIaa7C0l55mn0BnB1BFRVN1aHiG2HciABjRiWbv4i5q656m QoQCwkIdCB/ZyK86misUm8QRmu/yk5rBrk1n0aieVSbhZ/eNOtwOxLnMdc4NhKdi51zH F4oE1GsJbIOSBCE4uUFCALRUPzfPHutrK+SiWnQI3IMMfM+kVMrHTkgVsY+kdvUrzzsB rpcuA2qpD7nyMJynaXg6qb6OHCuHmYny86AJnerf2arpebDVw7tWLk20AOyDcJRUHytG NtbQ== X-Gm-Message-State: ABy/qLazafZcrYo4gPFR0Td21y+ejjVhdp8NFcHPqYm3xLJEC71KIM9A Y1iAMXwrNJ6JUWaTaVSQxyO8R1ccP6X/FFn2vcr2xEAL X-Google-Smtp-Source: APBJJlHEZlRKUxaw0BHl+YzvzPAgwGJL9wca1PtfcYAf0AQgeIl+wV2oeCIh4KEQmskUP8iM2d8BXFUQL36sPBrHcCg= X-Received: by 2002:a05:6808:19a0:b0:3a7:5314:e55e with SMTP id bj32-20020a05680819a000b003a75314e55emr1627806oib.24.1690907799702; Tue, 01 Aug 2023 09:36:39 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: richard coleman Date: Tue, 1 Aug 2023 12:36:28 -0400 Message-ID: Subject: Re: analyze foreign tables To: Laurenz Albe Cc: Pgsql-admin Content-Type: multipart/alternative; boundary="000000000000bfd21e0601df264b" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000bfd21e0601df264b Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Laurenz, Thanks for taking the time to respond. Right now I'm stuck with cronning a script to manually run analyze on every foreign table in every database, which in our case is most of them. Would it be possible to transfer table statistics between the same version of PostgreSQL, ex: source is pg15, target is pg15? Otherwise, anything that can be done to speed this up would be very helpful= . Thanks again, rik. On Tue, Aug 1, 2023 at 12:16=E2=80=AFPM Laurenz Albe wrote: > On Tue, 2023-08-01 at 09:47 -0400, richard coleman wrote: > > In PostgreSQL foreign tables are not automatically analyzed and analyze > must be > > specifically called on each table. In the case of sharing tables betwe= en > > PostgreSQL clusters, there is the use_remote_estimate foreign server > option. > > > > In some of our multi terabyte databases, manually running analyze on al= l > of > > the foreign tables can take more than a day. This is per database > containing > > the foreign schema. Since we have certain large schema that we have > centrally > > located and share to all of our other database clusters, this really > adds up. > > > > use_remote_estimate isn't really a solution as it adds way too much > overhead > > and processing time to every query run. > > > > Since these tables are being continuously analyzed in the database that > hosts > > the data, is there some way that they statistics could be easily passed > through > > the foreign server mechanism to the remote database that's calling the > query? > > > > Unless I'm missing something we can either: > > 1. manually run analyze on each foreign table in each database that > points to > > the host table > > 2. set use_remote_estimate =3D true which will cause PostgreSQL to > re-obtain > > statistics on a per query basis. > > > > What I am hoping for is either: > > 1. pass through the results of analyze from the source database to the > one > > where the foreign query is being run > > 2. add the ability to automatically run analyze on foreign tables just > as they > > are currently run on local tables. > > > > Of the two, #1 would seem to be the easiest and least wasteful of > resources. > > Unfortunately, both your wishes don't look feasible: > > - Transferring table statistics would mean that PostgreSQL understands > statistics > from other server versions. This is complicated, and we have decided > not to > do this for pg_upgrade, so I don't think we'll try to do it here. > > - Autoanalyzing foreign tables would mean that we have some idea how much > data > has changed on the remote server. How should we do that? > > What I can imagine is that instead of reading the complete remote table > during > ANALYZE, PostgreSQL applies TABLESAMPLE to fetch only part. That could b= e > a > workable enhancement. > > Yours, > Laurenz Albe > --000000000000bfd21e0601df264b Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Laurenz,=C2=A0

Thanks for taking the ti= me to respond.

Right now I'm stuck with cronni= ng a script to manually run analyze on every foreign table in every databas= e, which in our case is most of them.

Would it be = possible to transfer table statistics between the same version of PostgreSQ= L, ex: source is pg15, target is pg15?

Otherwise, = anything that can be done to speed this up would be very helpful.

Thanks again,=C2=A0
rik.


On T= ue, Aug 1, 2023 at 12:16=E2=80=AFPM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Tue, 2023-08-01 at 09:47 -= 0400, richard coleman wrote:
> In PostgreSQL foreign tables are not automatically analyzed and analyz= e must be
> specifically called on each table.=C2=A0 In the case of sharing tables= between
> PostgreSQL clusters, there is the=C2=A0use_remote_estimate foreign ser= ver option.
>
> In some of our multi terabyte databases, manually running analyze on a= ll of
> the foreign tables can take more than a day.=C2=A0 This is per databas= e containing
> the foreign schema.=C2=A0 Since we have certain large schema that we h= ave centrally
> located and share to all of our other database clusters, this really a= dds up.
>
> use_remote_estimate isn't really a solution as it adds way too muc= h overhead
> and processing time to every query run.
>
> Since these tables are being continuously analyzed in the database tha= t hosts
> the data, is there some way that they statistics could be easily passe= d through
> the foreign server mechanism to the remote database that's calling= the query?
>
> Unless I'm missing something we can either:
> 1. manually run analyze on each foreign table in each database that po= ints to
>=C2=A0 =C2=A0 the host table
> 2. set use_remote_estimate =3D true which will cause PostgreSQL to re-= obtain
>=C2=A0 =C2=A0 statistics on a per query basis.
>
> What I am hoping for is either:
> 1. pass through the results of analyze from the source database to the= one
>=C2=A0 =C2=A0 where the foreign query is being run
> 2. add the ability to automatically run analyze on foreign tables just= as they
>=C2=A0 =C2=A0 are currently run on local tables.
>
> Of the two, #1 would seem to be the easiest and least wasteful of reso= urces.

Unfortunately, both your wishes don't look feasible:

- Transferring table statistics would mean that PostgreSQL understands stat= istics
=C2=A0 from other server versions.=C2=A0 This is complicated, and we have d= ecided not to
=C2=A0 do this for pg_upgrade, so I don't think we'll try to do it = here.

- Autoanalyzing foreign tables would mean that we have some idea how much d= ata
=C2=A0 has changed on the remote server.=C2=A0 How should we do that?

What I can imagine is that instead of reading the complete remote table dur= ing
ANALYZE, PostgreSQL applies TABLESAMPLE to fetch only part.=C2=A0 That coul= d be a
workable enhancement.

Yours,
Laurenz Albe
--000000000000bfd21e0601df264b--