Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tqcJS-007mhO-Hn for pgsql-hackers@arkaria.postgresql.org; Fri, 07 Mar 2025 18:20:46 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tqcJR-00EBBf-1B for pgsql-hackers@arkaria.postgresql.org; Fri, 07 Mar 2025 18:20:45 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tqcJQ-00EBBW-9y for pgsql-hackers@lists.postgresql.org; Fri, 07 Mar 2025 18:20:44 +0000 Received: from fhigh-a4-smtp.messagingengine.com ([103.168.172.155]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1tqcJM-001Xs0-08 for pgsql-hackers@lists.postgresql.org; Fri, 07 Mar 2025 18:20:43 +0000 Received: from phl-compute-05.internal (phl-compute-05.phl.internal [10.202.2.45]) by mailfhigh.phl.internal (Postfix) with ESMTP id E78BE11401DA; Fri, 7 Mar 2025 13:20:38 -0500 (EST) Received: from phl-mailfrontend-01 ([10.202.2.162]) by phl-compute-05.internal (MEProxy); Fri, 07 Mar 2025 13:20:38 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:content-type:date:date:feedback-id:feedback-id :from:from:in-reply-to:in-reply-to:message-id:mime-version :reply-to:subject:subject:to:to:x-me-proxy:x-me-sender :x-me-sender:x-sasl-enc; s=fm1; t=1741371638; x=1741458038; bh=p 4h/TozMJg/F1+iRESKeWzDJcM2Mpgg+WzN8s/g9uxA=; b=mEVpnncvdTc/gOvrQ T0TbbK9OwIo86Uug9OeS8Um4h3xKlnSEkx3qjY7+huz78jPR5RhYW2ASCrbxh++L rW6MfoQlBZ6wR3an9zXFZLZroKeGARmXtrwxlq95iJP6pM7Fb2QcPyqIq0DY4OKB CaVQTF1+Vcen1qyrHs+su5gkXJHbabsloI8eZSLYTGm6HGkZRDGrgSY+4MT+umFk 1EJm39ESn3qdNCQ6qaARgtnSZYvV5tim8HeaW/cacaH3uGR0xoRch3kcEavyBnwt EJoFCBo6YT7svFK/SwD3bA/eBcPyFjDgac1aRplA8X5u6qRL84BL8C7KoDa26Pa6 nJWPA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefvddrtddtgdduuddufeeiucetufdoteggodetrf dotffvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdggtfgfnhhsuhgsshgtrhhisggv pdfurfetoffkrfgpnffqhgenuceurghilhhouhhtmecufedttdenucesvcftvggtihhpih gvnhhtshculddquddttddmnecujfgurhepfffhvfevuffkgggtugfgjgesthekredttddt jeenucfhrhhomheplmhlvhgrrhhoucfjvghrrhgvrhgruceorghlvhhhvghrrhgvsegrlh hvhhdrnhhoqdhiphdrohhrgheqnecuggftrfgrthhtvghrnhepuedtgfffvdevhffggefh veejvdehteetleeglefhudduudduveeiheetgeduiefgnecuffhomhgrihhnpehpohhsth hgrhdrvghspdgvnhhtvghrphhrihhsvggusgdrtghomhenucevlhhushhtvghrufhiiigv pedtnecurfgrrhgrmhepmhgrihhlfhhrohhmpegrlhhvhhgvrhhrvgesrghlvhhhrdhnoh dqihhprdhorhhgpdhnsggprhgtphhtthhopeegpdhmohguvgepshhmthhpohhuthdprhgt phhtthhopehorhhlohhvmhhgsehgmhgrihhlrdgtohhmpdhrtghpthhtohepphhgshhqlh dqhhgrtghkvghrsheslhhishhtshdrphhoshhtghhrvghsqhhlrdhorhhgpdhrtghpthht ohepshdruggvrhgvvhihrghnkhhosehpohhsthhgrhgvshhprhhordhruhdprhgtphhtth hopegrkhhorhhothhkohhvsehpohhsthhgrhgvshhqlhdrohhrgh X-ME-Proxy: Feedback-ID: ia2694551:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Fri, 7 Mar 2025 13:20:37 -0500 (EST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=alvh.no-ip.org; s=schmee; t=1741371634; bh=xFqXKW0ikGH+8Nv2rXeiyYchvAXZ0vYkUct921poeHw=; h=Date:From:To:Cc:Subject:In-Reply-To:From; b=TefkrT5PGJTK3mkUMwUjmKrjsrShwhXYX7tvMCC2PWLGzyvt6aSL1038JkAAXwB78 7v0NdK8mxnoTG3TyGRUz5+XcbRpQRn5FJE+071JnSIcaRHPTlRxQ2NYyBC9KXA+rid ylvvZvTvmF03nHZujBOmeNYQ92xK+fjsTfSEp9FJLUkurZdr9mSumkm7ArhDo6pUnU NVQSRNwFwrnUnIBMRSVs/3vq5dfDb+8BsZJa8XqkY0BOurif2lwGucTISpsZu6v0uf mZ+ziP2nsGFSN5YG4frzH+jUqVwNwbsWmSp8j/vRRai7n3HXonv36GGUBzoTcJmFry yZYCdy7JzFkCA== Received: by schmee.alvh.no-ip.org (Postfix, from userid 1000) id A48AE8F; Fri, 7 Mar 2025 19:20:34 +0100 (CET) Date: Fri, 7 Mar 2025 19:20:34 +0100 From: =?utf-8?Q?=C3=81lvaro?= Herrera To: Alexander Korotkov Cc: pgsql-hackers@lists.postgresql.org, Maxim Orlov , Svetlana Derevyanko Subject: Re: pgsql: reindexdb: Add the index-level REINDEX with multiple jobs Message-ID: <202503071820.j25zn3lo4hvn@alvherre.pgsql> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On 2024-Mar-25, Alexander Korotkov wrote: > reindexdb: Add the index-level REINDEX with multiple jobs > > Straight-forward index-level REINDEX is not supported with multiple jobs as > we cannot control the concurrent processing of multiple indexes depending on > the same relation. Instead, we dedicate the whole table to certain reindex > job. Thus, if indexes in the lists belong to different tables, that gives us > a fair level of parallelism. I tested this, because of a refactoring suggestion [1] and I find that it's rather completely broken. I ran this to setup a bunch of tables that I'd want reindexed in parallel: create table foo (a int); insert into foo select * from generate_series(1, (10^7)::numeric); create index foo1 on foo (a); create index foo2 on foo (a); create table bar (a int); insert into bar select * from generate_series(1, (2 * (10^7))::numeric); create index bar1 on bar (a); create index bar2 on bar (a); create table baz (a int); insert into baz select * from generate_series(1, (5 * (10^6))::numeric); create index baz1 on baz (a); create index baz2 on baz (a); create index baz3 on baz (a); create index baz4 on baz (a); I then run this: reindexdb -j4 --echo -i foo1 -i foo2 -i bar1 -i bar2 -i baz1 -i baz2 -i baz3 -i baz4 | grep REINDEX Looking at active processes with psql's \watch during the run, I learn that what happens is that we process the indexes on baz first, without any other process in parallel, until we get to the last one of baz table, and we start processing one from baz and one from foo in parallel. But when the one in baz is done, we only continue with one process until the list reaches indexes of 'bar', and we process two in parallel, and then we ran out of indexes in foo so we complete without any more paralellism. This is a waste and surely not what was intended: surely what we want is that given that we have more parallel jobs available than there are tables, we would start processing the first index of each table at roughly the same time, namely right at program start. That would keep three processes occupied until we ran out of indexes on one table, so we'd keep two processes occupied, and so on. But this is not what happens. Am I misunderstanding something? [1] https://postgr.es/m/CAEudQApP=u5-9PR_fs1DpZToQNrtTFSP+_fjrOgfi73UkrBXKQ@mail.gmail.com -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/