From: Daniil Davydov <3danissimo@gmail.com>
Date: Tue, 6 May 2025 11:54:49 +0700
Subject: Re: POC: Parallel processing of indexes in autovacuum
To: Masahiko Sawada
Cc: Maxim Orlov, Postgres hackers

On Tue, May 6, 2025 at 6:57 AM Masahiko Sawada wrote:
>
> What I roughly imagined is that we
> don't need to change the entire
> autovacuum scheduling, but would like autovacuum workers to decides
> whether or not to use parallel vacuum during its vacuum operation
> based on GUC parameters (having a global effect) or storage parameters
> (having an effect on the particular table). The criteria of triggering
> parallel vacuum in autovacuum might need to be somewhat pessimistic so
> that we don't unnecessarily use parallel vacuum on many tables.
>

+1, I see it the same way. I will go into more detail on this topic in
my reply to Sami's message [1], so as not to repeat myself.

> > Here are my thoughts on this. A/v worker has a very simple role - it
> > is born after the launcher's request and must do exactly one 'task' -
> > vacuum table or participate in parallel index vacuum.
> > We also have a dedicated 'launcher' role, meaning the whole design
> > implies that only the launcher is able to launch processes.
> >
> > If we allow a/v worker to use bgworkers, then :
> > 1) A/v worker will go far beyond his responsibility.
> > 2) Its functionality will overlap with the functionality of the launcher.
>
> While I agree that the launcher process is responsible for launching
> autovacuum worker processes but I'm not sure it should be for
> launching everything related autovacuums. It's quite possible that we
> have parallel heap vacuum and processing the particular index with
> parallel workers in the future. The code could get more complex if we
> have the autovacuum launcher process launch such parallel workers too.
> I believe it's more straightforward to divide the responsibility like
> in a way that the autovacuum launcher is responsible for launching
> autovacuum workers and autovacuum workers are responsible for
> vacuuming tables no matter how to do that.

It sounds very tempting. At the very beginning I did exactly that (to
make sure that nothing would break in a parallel autovacuum). Only
later was it decided to abandon the use of bgworkers.
For now, both approaches look reasonable to me. What do you think: will
others agree that we can give a/v workers more responsibility?

> > 3) Resource consumption can jump dramatically, which is unexpected for
> > the user.
>
> What extra resources could be used if we use background workers
> instead of autovacuum workers?

I meant that more processes would take part in autovacuum than
autovacuum_max_workers indicates. And if an a/v worker uses additional
bgworkers, other operations cannot get those resources.

> > Autovacuum will also be dependent on other resources
> > (bgworkers pool). The current design does not imply this.
>
> I see your point but I think it doesn't necessarily need to reflect it
> at the infrastructure layer. For example, we can internally allocate
> extra background worker slots for parallel vacuum workers based on
> max_parallel_index_autovac_workers in addition to
> max_worker_processes. Anyway we might need something to check or
> validate max_worker_processes value to make sure that every autovacuum
> worker can use the specified number of parallel workers for parallel
> vacuum.

I don't think we can guarantee every supporting worker for each
parallel index vacuum request. But I get your point: always keep
several bgworkers in reserve that only a/v workers can use when needed,
and make the size of this additional pool (depending on
max_worker_processes) user-configurable.

> > I wanted to create a patch that would fit into the existing mechanism
> > without drastic innovations. But if you think that the above is not so
> > important, then we can reuse VACUUM PARALLEL code and it would
> > simplify the final implementation)
>
> I'd suggest using the existing infrastructure if we can achieve the
> goal with it. If we find out there are some technical difficulties to
> implement it without new infrastructure, we can revisit this approach.

OK, in the near future I'll implement it and send a new patch to this
thread.
I'd be glad if you would take a look at it)

[1] https://www.postgresql.org/message-id/CAA5RZ0vfBg%3Dc_0Sa1Tpxv8tueeBk8C5qTf9TrxKBbXUqPc99Ag%40mail.gmail.com

--
Best regards,
Daniil Davydov
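
To make the reserved-pool idea from the discussion above a bit more concrete, here is a rough toy model of the slot accounting. It is only a sketch and not PostgreSQL code: the class and method names are made up for illustration, max_parallel_index_autovac_workers is just the name proposed in this thread, and the assumption that autovacuum draws only from its reserved slots is mine, not something the thread has settled.

```python
from dataclasses import dataclass


@dataclass
class WorkerSlots:
    """Toy model of background-worker slot accounting (hypothetical)."""

    max_worker_processes: int                # existing GUC: general bgworker pool
    max_parallel_index_autovac_workers: int  # name from this thread: reserved pool size
    general_in_use: int = 0
    reserved_in_use: int = 0

    def acquire_general(self, wanted: int) -> int:
        """Grant slots to ordinary users (parallel query, extensions, ...)."""
        free = self.max_worker_processes - self.general_in_use
        granted = max(0, min(wanted, free))
        self.general_in_use += granted
        return granted

    def acquire_for_autovacuum(self, wanted: int) -> int:
        """Grant slots to an a/v worker, drawing only on the reserved pool.

        May return fewer slots than requested (possibly zero), in which
        case the a/v worker would process the remaining indexes itself.
        """
        free = self.max_parallel_index_autovac_workers - self.reserved_in_use
        granted = max(0, min(wanted, free))
        self.reserved_in_use += granted
        return granted

    def release_for_autovacuum(self, count: int) -> None:
        """Return reserved slots once a parallel index vacuum finishes."""
        self.reserved_in_use = max(0, self.reserved_in_use - count)


def worst_case_autovacuum_processes(autovacuum_max_workers: int,
                                    reserved_slots: int) -> int:
    """Upper bound on processes doing autovacuum work in this model."""
    return autovacuum_max_workers + reserved_slots
```

In this model an exhausted general pool does not starve autovacuum, and autovacuum cannot take slots from other operations, so the extra resource consumption is bounded by the reserved pool size that the user configured.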