Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1qRb09-002Iyp-6y for pgsql-admin@arkaria.postgresql.org; Thu, 03 Aug 2023 16:16:37 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1qRb06-00EjVr-3r for pgsql-admin@arkaria.postgresql.org; Thu, 03 Aug 2023 16:16:34 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1qRb05-00EjVj-Np for pgsql-admin@lists.postgresql.org; Thu, 03 Aug 2023 16:16:34 +0000 Received: from mail-oi1-x22f.google.com ([2607:f8b0:4864:20::22f]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1qRazz-000MtD-Gl for pgsql-admin@lists.postgresql.org; Thu, 03 Aug 2023 16:16:33 +0000 Received: by mail-oi1-x22f.google.com with SMTP id 5614622812f47-3a734b8a27fso719245b6e.1 for ; Thu, 03 Aug 2023 09:16:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1691079385; x=1691684185; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=yFXQdrBZ9xL3uWqkXKFheGS7X3sbVyv5NhwZ7zdrTy8=; b=a8Pugg+T39wdxmZNa697l6YmfELKoAFNetKtWZqdCvZrzBA/5JIfNJSBfVvuKBMOV/ Ix8CyyooiEJwWXEg1kNMFy7ydn+avZXMJEAIQjmQ4c/CsQdAMPMr9uKWyVpT14K5mMNV 4zJmvbi4m/smt141cZ9/LJexkd3ZK7a6YkvHH6IuBpOg8BIA2Nfmyte0KqCDC7p46yNw ljZqSI0Gszai0WRxRGQgVkGCNEMAOq617Pr5v8ddXpHw59f3jCmyMC6V+a66jfyfXJ7g DnlApKn0cZ7J1UBRuUVTrYF4mtABBzCvlIpA27s439EQo9Eq55VRX8+Rc7d/TG0vyMwq +tGA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691079385; x=1691684185; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=yFXQdrBZ9xL3uWqkXKFheGS7X3sbVyv5NhwZ7zdrTy8=; b=UppX6RUSO/q6KGKIQhFKbTwUpMkIqRZyRsxSYB+6/jkhXkLYA9AJdTGp9lrLcb+jgT ZzAA8y7GqAjIAmZ0CfW+cRZwukfZfRX8jp0ogWkRoyewWDc/sGz0PzL5zSooRKfQogAv lz4gMavfuxTMWyUIZqnLuSOjew4Wcfv1A8og4wSeMfvG3bzkttELlmK04Y8wQ+5xiXJr /GthCHV7ET0CAJYas97+VjYxVdQoKnT1LktYb3Vo0/iYUKXUgKXBh6g4eDGxlkP63KzO nBJcksgkjCal0B0Gt62Jw7lYcnzl61wEnNRNQwHKtOnPxYdQi49JzZ37/sdxGJJtz4Ib v7OA== X-Gm-Message-State: ABy/qLY+YstBbLbnrYzU4+bTupxuopU1jUJcNRGKFeNRV3uBcJrda2kQ Ula+FQSPo3d1YLeZsfmFcN9WA5QzBE0YRPrur9E= X-Google-Smtp-Source: APBJJlFRPx5wtFxgh47W5qd3KMm87QcKjA8Ay/kFg+B+dgLQJw3hmAWvAI/USBdTRzMN8QcT6DpWme78JfLsbxjsBV4= X-Received: by 2002:a05:6808:20a0:b0:3a7:1b28:4bc9 with SMTP id s32-20020a05680820a000b003a71b284bc9mr17535401oiw.54.1691079385463; Thu, 03 Aug 2023 09:16:25 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: richard coleman Date: Thu, 3 Aug 2023 12:16:14 -0400 Message-ID: Subject: Re: analyze foreign tables To: Jeff Janes Cc: Pgsql-admin Content-Type: multipart/alternative; boundary="0000000000000ec06d0602071a8d" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000000ec06d0602071a8d Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Jeff, In my experience the overhead is directly related to the size and the complexity of the tables in the query. A simple query only referencing a small < 5M table with only a primary key, it isn't noticeable. A typical complicated query referencing numerous tables ranging in sizes from <5M to > 2.5T each with a primary key and 0 - 30 indices (per table), it can add 30 seconds to many minutes to each run of the query. When some of these queries are re-run constantly it becomes untenable. In one case, manually running analyze on the foreign tables and then running the query it returns in about 30 sec. Setting use_remote_estimate =3D true made each run return in about 5 minutes. Unfortunately, manually running analyze on each foreign table in the schema (500+ tables, ranging from < 1M to > 3T) takes more than a day to complete. On the server hosting the tables, the auto analyze is running constantly, as expected. All of the clusters are sitting at PostgreSQL 15. Hence my desire to find a more performant, less resource intensive way to pass the continuously updated statistics of these tables to the other PostgreSQL clusters holding the foreign table pointers to them. I know it's anecdotal, but I hope it helps anyway. rik. On Thu, Aug 3, 2023 at 10:19=E2=80=AFAM Jeff Janes w= rote: > On Tue, Aug 1, 2023 at 9:47=E2=80=AFAM richard coleman < > rcoleman.ascentgl@gmail.com> wrote: > >> >> use_remote_estimate isn't really a solution as it adds way too much >> overhead and processing time to every query run. >> > > Maybe this is the thing which should be addressed. Can you quantify what > you see here? How much overhead is being added for each query? Is this > principally processing time, or network latency? > > > Since these tables are being continuously analyzed in the database that >> hosts the data, is there some way that they statistics could be easily >> passed through the foreign server mechanism to the remote database that'= s >> calling the query? >> > > Since FDW can cross version boundaries, it is hard to see how this would > work. Maybe something could be done for the special case of where the > versions match. I think collations/encoding would be a problem, though. > > >> What I am hoping for is either: >> >> 2. add the ability to automatically run analyze on foreign tables just a= s >> they are currently run on local tables. >> > > That wouldn't work because communication is always initiated on the wrong > side. But it should be fairly easy to script something outside of the > database which would connect to both, and poll the "foreign" > pg_stat_all_tables.last_autovacuum and initiate a local ANALYZE for each > table which was recently autoanalyzed on the foreign side. > > Cheers, > > Jeff > --0000000000000ec06d0602071a8d Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Jeff,
=C2=A0
In my experience the overhead is dire= ctly related to the size and the complexity of the tables in the query.

A simple query only referencing a small < 5M= table with only a primary key, it isn't noticeable.

A typical complicated query referencing numerous tables ranging in s= izes from <5M to > 2.5T each with a primary key and 0 - 30 indices (p= er table), it can add 30 seconds to many minutes to each run of the query.= =C2=A0 When some of these queries are re-run constantly it becomes untenabl= e.=C2=A0 In one case, manually running analyze on the foreign tables and th= en running the query it returns in about 30 sec. Setting use_remote_estimat= e =3D true made each run return in about 5 minutes.

Unfortunately, manually running analyze on each foreign table in the sche= ma (500+ tables, ranging from < 1M to > 3T) takes more than a day to = complete.=C2=A0 On the server hosting the tables, the auto analyze is runni= ng constantly, as expected.=C2=A0 All of the clusters are sitting at Postgr= eSQL 15.

Hence my desire to find a more performant= , less resource intensive way to pass the continuously updated statistics o= f these tables to the other PostgreSQL clusters holding the foreign table p= ointers to them.

I know it's anecdotal, but I = hope it helps anyway.

rik.


--0000000000000ec06d0602071a8d--