From: Daniil Davydov <3danissimo@gmail.com>
Date: Sat, 3 May 2025 14:32:20 +0700
Subject: Re: POC: Parallel processing of indexes in autovacuum
To: Sami Imseih
Cc: Masahiko Sawada, Maxim Orlov, Postgres hackers

On Sat, May 3, 2025 at 3:17 AM Sami Imseih wrote:
>
> I think in most cases, the user will
> want to determine the priority of
> a table getting parallel vacuum cycles rather than having the autovacuum
> determine the priority. I also see users wanting to stagger
> vacuums of large tables with many indexes through some time period,
> and give the tables the full amount of parallel workers they can afford
> at these specific periods of time. A/V currently does not really allow
> for this type of scheduling, and if we give some kind of GUC to
> prioritize tables, I think users will constantly have to be modifying
> this priority.

If the user wants to set the priority themselves, we still need to
introduce some parameter (a GUC or a table option) that gives us a hint
about how to schedule a/v work. Do you think we should design more
comprehensive behavior for such a parameter (so that the user doesn't
have to change it often)? I would be glad to hear your thoughts.

> > If I understood correctly, then we are talking about the fact that
> > TIDStore can store so many tuples that in fact a second pass is never
> > needed.
> > But the number of passes does not affect the presented optimization in
> > any way. We must think about a large number of indexes that must be
> > processed. Even within a single pass we can have a 40% increase in
> > speed.
>
> I am not discounting that a single table vacuum with many indexes will
> maybe perform better with parallel index scan, I am merely saying that
> the TIDStore optimization now makes index vacuums better and perhaps
> there is less of an incentive to use parallel.

I still maintain that this does not affect parallel index vacuum,
because its advantage does not come from repeated passes. We get the
same speedup whether we have this optimization or not. It is even
possible that the opposite is true - the situation will be better with
the new TIDStore, but I can't say for sure.
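
To illustrate the point about passes, here is a toy model in Python. It
is only a sketch under my own assumptions, not code from the patch: each
index has a fixed cost per pass, indexes are handed to workers greedily
(largest first), and heap scanning, worker launch overhead and I/O
contention are ignored. The names parallel_makespan and
index_phase_speedup are hypothetical.

```python
import heapq


def parallel_makespan(index_costs, workers):
    """Assign indexes to workers greedily, largest first.

    Returns the load of the busiest worker, i.e. the wall-clock time of
    one index-vacuum pass in this toy model.
    """
    loads = [0.0] * workers
    heapq.heapify(loads)
    for cost in sorted(index_costs, reverse=True):
        lightest = heapq.heappop(loads)  # least loaded worker so far
        heapq.heappush(loads, lightest + cost)
    return max(loads)


def index_phase_speedup(index_costs, workers, passes):
    """Ratio of serial to parallel time for the index phase only."""
    serial = passes * sum(index_costs)
    parallel = passes * parallel_makespan(index_costs, workers)
    return serial / parallel


# Hypothetical workload: eight indexes of equal cost, four workers.
costs = [10.0] * 8
one_pass = index_phase_speedup(costs, workers=4, passes=1)
five_passes = index_phase_speedup(costs, workers=4, passes=5)
print(one_pass, five_passes)
```

In this model the number of passes cancels out of the ratio, so the
speedup of the index phase is the same for one pass and for five. Real
numbers will of course depend on everything the model leaves out.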

> > > Now, If I am going to allocate extra workers to run vacuum in parallel, why
> > > not just provide more autovacuum workers instead so I can get more tables
> > > vacuumed within a span of time?
> >
> > For now, only one process can clean up indexes, so I don't see how
> > increasing the number of a/v workers will help in the situation that I
> > mentioned above.
> > Also, we don't consume additional resources during autovacuum in this
> > patch - total number of a/v workers always <= autovacuum_max_workers.
>
> Increasing a/v workers will not help speed up a specific table, what I
> am suggesting is that instead of speeding up one table, let's just allow
> other tables to not be starved of a/v cycles due to lack of a/v workers.

OK, I got it. But what if vacuuming a single table takes (for example)
60% of the total time? This is still a possible situation, and vacuuming
all the other tables quickly will not help us.

--
Best regards,
Daniil Davydov