MIME-Version: 1.0
References: 
 <CACG=ezZOrNsuLoETLD1gAswZMuH2nGGq7Ogcc0QOE5hhWaw=cw@mail.gmail.com>
 <CAD21AoCdx5ZNS_cO7bYz1Zfb+Kw1kuJV2wtewrz7T1pPpjcWGw@mail.gmail.com>
 <CAA5RZ0vN_RjrHR+HXTkfHydRDZ-yGrpapWQ3-oGj1W34AoftmQ@mail.gmail.com>
 <CAJDiXgigcF3CMY86oREdQvxUDaUDFihkK9f78rdEyLTLeB0hdA@mail.gmail.com>
 <CAA5RZ0s4eXW1V+fqu-WDBkFh+h43dYke81Tht1V0sFRJ5vjX2Q@mail.gmail.com>
In-Reply-To: 
 <CAA5RZ0s4eXW1V+fqu-WDBkFh+h43dYke81Tht1V0sFRJ5vjX2Q@mail.gmail.com>
From: Daniil Davydov <3danissimo@gmail.com>
Date: Sat, 3 May 2025 14:32:20 +0700
Message-ID: 
 <CAJDiXgh1RLozmtCZW=Q14zmSQCGUCtda5=KRbeZq=BBn0saJAA@mail.gmail.com>
Subject: Re: POC: Parallel processing of indexes in autovacuum
To: Sami Imseih <samimseih@gmail.com>
Cc: Masahiko Sawada <sawada.mshk@gmail.com>, Maxim Orlov <orlovmg@gmail.com>,
	Postgres hackers <pgsql-hackers@lists.postgresql.org>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: 
 <https://www.postgresql.org/message-id/CAJDiXgh1RLozmtCZW%3DQ14zmSQCGUCtda5%3DKRbeZq%3DBBn0saJAA%40mail.gmail.com>
Precedence: bulk

On Sat, May 3, 2025 at 3:17 AM Sami Imseih <samimseih@gmail.com> wrote:
>
> I think in most cases, the user will want to determine the priority of
> a table getting parallel vacuum cycles rather than having the autovacuum
> determine the priority. I also see users wanting to stagger
> vacuums of large tables with many indexes through some time period,
> and give the tables the full amount of parallel workers they can afford
> at these specific periods of time. A/V currently does not really allow
> for this type of scheduling, and if we give some kind of GUC to
> prioritize tables, I think users will constantly have to be modifying
> this priority.

If users want to set the priority themselves, we will need to introduce
some parameter anyway (a GUC or a table option) that gives us a hint
about how to schedule a/v work.
Do you think we should design a more comprehensive behavior for such a
parameter, so that users don't have to change it often? I would be glad
to hear your thoughts.

> > If I understood correctly, then we are talking about the fact that
> > TIDStore can store so many tuples that in fact a second pass is never
> > needed.
> > But the number of passes does not affect the presented optimization in
> > any way. We must think about a large number of indexes that must be
> > processed. Even within a single pass we can have a 40% increase in
> > speed.
>
> I am not discounting that a single table vacuum with many indexes will
> maybe perform better with parallel index scan, I am merely saying that
> the TIDStore optimization now makes index vacuums better and perhaps
> there is less of an incentive to use parallel.

I still maintain that this does not affect parallel index vacuum,
because its advantage does not come from repeated passes. We get the
same speed increase whether we have this optimization or not.
It is even possible that the opposite is true and the situation is
better with the new TIDStore, but I can't say for sure.

> > > Now, If I am going to allocate extra workers to run vacuum in parallel, why
> > > not just provide more autovacuum workers instead so I can get more tables
> > > vacuumed within a span of time?
> >
> > For now, only one process can clean up indexes, so I don't see how
> > increasing the number of a/v workers will help in the situation that I
> > mentioned above.
> > Also, we don't consume additional resources during autovacuum in this
> > patch - total number of a/v workers always <= autovacuum_max_workers.
>
> Increasing a/v workers will not help speed up a specific table, what I
> am suggesting is that instead of speeding up one table, let's just allow
> other tables to not be starved of a/v cycles due to lack of a/v workers.

OK, I got it. But what if vacuuming a single table takes (for example)
60% of the total time? That is still a possible situation, and
vacuuming all the other tables quickly will not help us.
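
To illustrate the point with made-up numbers, here is a minimal sketch
(not taken from the patch). It assumes each table is vacuumed by exactly
one a/v worker and that tables are handed out largest-first. The costs,
the worker counts and the `wall_clock` helper are all hypothetical. The
1.4 factor borrows the "40% increase in speed" mentioned earlier in the
thread and applies it to the whole table, which is a simplifying
assumption.

```python
# Hypothetical numbers for illustration only; they are not measurements.

def wall_clock(table_costs, workers):
    """Greedy largest-first assignment of whole tables to a/v workers.

    Each table is processed by exactly one worker, so the result can
    never drop below the cost of the most expensive table.
    """
    loads = [0.0] * workers
    for cost in sorted(table_costs, reverse=True):
        # Hand the next-largest table to the least-loaded worker.
        i = loads.index(min(loads))
        loads[i] += cost
    return max(loads)

# One table accounts for 60% of the total vacuum time,
# eight small tables share the remaining 40%.
costs = [60.0] + [5.0] * 8

# Same workload, but assuming the big table finishes 1.4x faster
# thanks to parallel index processing (simplifying assumption).
faster = [60.0 / 1.4] + [5.0] * 8

for n in (1, 3, 10):
    print(n, wall_clock(costs, n), round(wall_clock(faster, n), 2))
```

Under these assumptions, adding workers beyond a few stops helping,
because the total time is bounded by the single large table. Only making
that table itself faster moves the bound.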

--
Best regards,
Daniil Davydov