Re: analyze foreign tables - richard coleman


From: richard coleman <[email protected]>
To: Jeff Janes <[email protected]>
Cc: Pgsql-admin <[email protected]>
Subject: Re: analyze foreign tables
Date: Thu, 3 Aug 2023 12:16:14 -0400
Message-ID: <CAGA3vBtr=LjXtugfL5352PiRe0vrUm+jPpZTN=u8NT48BLf3yA@mail.gmail.com> (raw)
In-Reply-To: <CAMkU=1x5m60yaz5sriL12LF=53C8PTZbGiOGtb0Z159nTDKSrg@mail.gmail.com>
References: <CAGA3vBt726ha1P91DrSOF8oc57X27tLO8o9Wi9oDN5L3qqRi7Q@mail.gmail.com>
	<CAMkU=1x5m60yaz5sriL12LF=53C8PTZbGiOGtb0Z159nTDKSrg@mail.gmail.com>

Jeff,

In my experience the overhead is directly related to the size and the
complexity of the tables in the query.

For a simple query that references only a small (< 5M) table with just a
primary key, it isn't noticeable.

For a typical complicated query referencing numerous tables ranging in size
from < 5M to > 2.5T, each with a primary key and 0 - 30 indices (per table),
it can add 30 seconds to many minutes to each run of the query.  When some
of these queries are re-run constantly, it becomes untenable.  In one case,
after manually running analyze on the foreign tables, the query returned in
about 30 sec.  Setting use_remote_estimate = true instead made each run
return in about 5 minutes.

Unfortunately, manually running analyze on each foreign table in the schema
(500+ tables, ranging from < 1M to > 3T) takes more than a day to
complete.  On the server hosting the tables, the auto analyze is running
constantly, as expected.  All of the clusters are sitting at PostgreSQL 15.

Hence my desire to find a more performant, less resource intensive way to
pass the continuously updated statistics of these tables to the other
PostgreSQL clusters holding the foreign table pointers to them.

I know it's anecdotal, but I hope it helps anyway.

rik.

On Thu, Aug 3, 2023 at 10:19 AM Jeff Janes <[email protected]> wrote:

> On Tue, Aug 1, 2023 at 9:47 AM richard coleman <
> [email protected]> wrote:
>
>>
>> use_remote_estimate isn't really a solution as it adds way too much
>> overhead and processing time to every query run.
>>
>
> Maybe this is the thing which should be addressed.  Can you quantify what
> you see here?  How much overhead is being added for each query?  Is this
> principally processing time, or network latency?
>
>
>> Since these tables are being continuously analyzed in the database that
>> hosts the data, is there some way that the statistics could be easily
>> passed through the foreign server mechanism to the remote database that's
>> calling the query?
>>
>
> Since FDW can cross version boundaries, it is hard to see how this would
> work.  Maybe something could be done for the special case of where the
> versions match. I think collations/encoding would be a problem, though.
>
>
>> What I am hoping for is either:
>>
>> 2. add the ability to automatically run analyze on foreign tables just as
>> they are currently run on local tables.
>>
>
> That wouldn't work because communication is always initiated on the wrong
> side.  But it should be fairly easy to script something outside of the
> database which would connect to both, and poll the "foreign"
> pg_stat_all_tables.last_autoanalyze and initiate a local ANALYZE for each
> table which was recently autoanalyzed on the foreign side.
>
> Cheers,
>
> Jeff
>
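
A minimal sketch of the kind of external polling script Jeff describes
above, in Python.  It is an illustration only, not something posted in this
thread.  Assumptions: both connections are ordinary DB-API connections
(psycopg2, for example); the names table_map, last_synced,
tables_needing_analyze and sync_once are hypothetical; and it reads
last_autoanalyze, the pg_stat_all_tables column that records autoanalyze
runs.

```python
from datetime import datetime

# Run against the server that hosts the real tables.
REMOTE_STATS_SQL = """
    SELECT schemaname, relname, last_autoanalyze
    FROM pg_stat_all_tables
    WHERE last_autoanalyze IS NOT NULL
"""


def quote_ident(name):
    """Double-quote an identifier so it can be spliced into ANALYZE safely."""
    return '"' + name.replace('"', '""') + '"'


def tables_needing_analyze(remote_rows, last_synced, table_map):
    """Pick foreign tables whose remote side was autoanalyzed since the last pass.

    remote_rows: iterable of (schemaname, relname, last_autoanalyze)
    last_synced: {(remote_schema, remote_rel): timestamp already handled}
    table_map:   {(remote_schema, remote_rel): (local_schema, local_rel)}
                 (hypothetical mapping maintained by the operator)
    """
    due = []
    for schema, rel, analyzed_at in remote_rows:
        key = (schema, rel)
        if key not in table_map:
            continue  # no local foreign table points at this relation
        seen = last_synced.get(key)
        if seen is None or analyzed_at > seen:
            due.append((key, table_map[key], analyzed_at))
    return due


def sync_once(remote_conn, local_conn, last_synced, table_map):
    """One polling pass: read remote stats, ANALYZE the stale foreign tables."""
    cur = remote_conn.cursor()
    cur.execute(REMOTE_STATS_SQL)
    rows = cur.fetchall()
    cur.close()
    remote_conn.commit()  # end the read-only transaction

    for key, (l_schema, l_rel), analyzed_at in tables_needing_analyze(
        rows, last_synced, table_map
    ):
        cur = local_conn.cursor()
        cur.execute("ANALYZE " + quote_ident(l_schema) + "." + quote_ident(l_rel))
        cur.close()
        local_conn.commit()
        last_synced[key] = analyzed_at  # remember what has been handled
    return last_synced


if __name__ == "__main__":
    # Dry run of the selection logic with made-up rows, no database needed.
    rows = [("public", "orders", datetime(2023, 8, 3, 12, 0))]
    mapping = {("public", "orders"): ("remote_a", "orders")}
    print(tables_needing_analyze(rows, {}, mapping))
```

Run from cron or a loop, a script like this only re-analyzes foreign tables
whose source was autoanalyzed since the previous pass, so it avoids the
day-long ANALYZE of every foreign table.  Each local ANALYZE still samples
rows over the foreign server link, so the cost for the very large tables
remains.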
