Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vUVSh-00Dx3z-1E for pgsql-hackers@arkaria.postgresql.org; Sat, 13 Dec 2025 19:39:28 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vUVSg-00BgwP-0w for pgsql-hackers@arkaria.postgresql.org; Sat, 13 Dec 2025 19:39:27 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vUVSf-00BgwG-2z for pgsql-hackers@lists.postgresql.org; Sat, 13 Dec 2025 19:39:26 +0000 Received: from mail-wm1-x32e.google.com ([2a00:1450:4864:20::32e]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1vUVSe-000ZoK-05 for pgsql-hackers@lists.postgresql.org; Sat, 13 Dec 2025 19:39:26 +0000 Received: by mail-wm1-x32e.google.com with SMTP id 5b1f17b1804b1-4775e891b5eso10019065e9.2 for ; Sat, 13 Dec 2025 11:39:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cybertec.at; s=google; t=1765654763; x=1766259563; darn=lists.postgresql.org; h=message-id:date:content-transfer-encoding:mime-version:comments :references:in-reply-to:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=w5sa16E3yGTQvod7xQm1kp1HOwkT0tcdAUTWpw7tcDA=; b=jj4OgSy+Jl6vVcbt4O7F+jv10H+pGjsW3QJttq65r9+FeYtifPMgFdGUo+5mB2CYaT pHUHdowZxF3p/rEiZYoDtYAb8YZ8Scy5qDRmRWXkJ209PYvl60OnbIs7/5qeF+sysxSy y2KaOSJ8lHTugmoLOVcQ6Dfn0eKkHL8n0+TiCbSDdgsquRaNzl2b2ylnTUyMRQWn6w4H SSak+Owwz2lZVdKoVnJiHW0e3bYENt7Aam7XiFaSNU1/J7tcQY3rGe2w43iHGDCYeego 6QT9ApJ6+S9400l2N/U49kdQ6xlf5i+nmTs0SpF22utR5OjwCf4GOZCSKmf7EpAt6KTi y3eg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1765654763; x=1766259563; h=message-id:date:content-transfer-encoding:mime-version:comments :references:in-reply-to:subject:cc:to:from:x-gm-gg :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=w5sa16E3yGTQvod7xQm1kp1HOwkT0tcdAUTWpw7tcDA=; b=lpb8pssOiSn+h+fpMOE21qTv/tSvwLHa5FdxDo0dbcy8kkfVCZZniy0vDJDunJKEdx 7kRCwdwQY4hPaiV0AHrUUGwMFtYApDxYIoVgqQx7mtqiA3F8gUybMkHOhas44Sqnon44 VGTRz28Kd/BgkHXtHK2azXq5vkVOJT6DS3FzVLnXrvlvsp1ounnsJtCYossRcEtd/YXh vIYGsjXVQ3FoIjCktzm2JUI5X6a5iUleJI+8fdMq8ztQz7rgUEO3iJWYIgBRlxi61dIm 87HXRkHpTkXZGVHdRTDux7S5S2byNBotkq6T4/uK2WaBJI0lt2XDsGhwAaxgLrUevmNa 6zDA== X-Forwarded-Encrypted: i=1; AJvYcCUKdzBSV8ODXejUTmWGXUeELnwjYyQACfB5ZUlckPzPpwYI7v/s5KkPrI90y9kNukSgFFnmpZV8UBCffRi4@lists.postgresql.org X-Gm-Message-State: AOJu0Yyz7N2Qqws4RHB0CrKTSTU0uq3saAsU5/cOktBWF8GbrZO7Z2FI hPGEpiwGFJDwoHGi5vEAMP0ep2+0lW5rvB7nnkLe99cKHzV3kdX10bwseHjzPuLZzVg= X-Gm-Gg: AY/fxX6a6BaSYbQIbg+z/KQBVzUHJjYd6liQr0yF0sBcjDMwp4k23t/a75ecBe8ppVZ xBKIRi/PZnz8g1Q9s9x1RWw1RiJVZFbeodZMzayN6VbvBuuQe60FrndMM9640eERLnf37+rMLJq CKrPLOAILp45/kgcndetdMM1HdOGp75uZg1hAow/lE5g415lRK4F4j804Qct1W7he/dqOlr8bK5 jRgZ63k1WtZ6e+aVNBiH4unhdS+/mBiFMyCFukxmjlHwOMje71VjiojKlF9FQ8izK3dEX2sbELc ed3UWhqbSosmpeZQjWAdB8EFsz6TLQozvEd0Mjn8RTwruzhjBGiTgZ04UqPoQ8Ciagv+sSe0boQ vym18mbqyecp6J2DpBWB9IgKOCxVClZffvAxVGMRiXY8B8MAs/SQqs0kW80ZjTvmSZgx4Wgluac aYg3zgLBMmXZWglUTHmQnUt1sX X-Google-Smtp-Source: AGHT+IFpOslSd9nNi7y0STVi+PhlH4yzvMREIged8RpyUrkI9DVPYo7yKjOXFnLGhmsw+epeuK6z6g== X-Received: by 2002:a05:600c:4f90:b0:477:9dc1:b706 with SMTP id 5b1f17b1804b1-47a8f9055bfmr56982075e9.19.1765654762275; Sat, 13 Dec 2025 11:39:22 -0800 (PST) Received: from localhost (109-81-168-246.rct.o2.cz. [109.81.168.246]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-42fa8a741c9sm20584526f8f.19.2025.12.13.11.39.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 13 Dec 2025 11:39:22 -0800 (PST) From: Antonin Houska To: Mihail Nikalayeu cc: Alvaro Herrera , Pg Hackers , Robert Treat Subject: Re: Adding REPACK [concurrently] In-reply-to: References: <202510301734.pj4uds3mqxx4@alvherre.pgsql> <116433.1764870207@localhost> <171530.1765306357@localhost> Comments: In-reply-to Mihail Nikalayeu message dated "Thu, 11 Dec 2025 21:38:00 +0100." X-Mailer: MH-E 8.6+git; nmh 1.8; GNU Emacs 28.3 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Date: Sat, 13 Dec 2025 20:39:21 +0100 Message-ID: <212153.1765654761@localhost> List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Mihail Nikalayeu wrote: > On Tue, Dec 9, 2025 at 7:52=E2=80=AFPM Antonin Houska wr= ote: > > Worker makes more sense to me - the initial implementation is in 0005. >=20 > Comments for 0005, so far: Thanks! > --- > > export_initial_snapshot >=20 > Hm, should we use ExportSnapshot instead? And ImportSnapshort to import i= t. There is at least one thing that I don't want: ImportSnapshot calls SetTransactionSnapshot() at the end. I chose the way leader process uses to serialize and pass snapshot to background workers. > --- > > get_initial_snapshot >=20 > Should we check if a worker is still alive while waiting? Also is > "process_concurrent_changes". ConditionVariableSleep() should handle that - see the WL_EXIT_ON_PM_DEATH f= lag in ConditionVariableTimedSleep(). > And AFAIU RegisterDynamicBackgroundWorker does not guarantee new > workers to be started (in case of some fork-related issues). Yes, user will get ERROR in such a case. This is different from parallel workers in query processing: if parallel worker cannot be started, the lead= er (AFAICS) still executes the query. I'm not sure though if we should impleme= nt REPACK (CONCURRENTLY) in such a way that it works even w/o the worker. The code would be more complex and the behaviour quite different (I mean the possibly huge amount of unprocessed WAL that you pointed out earlier.) > --- > > Assert(res =3D SHM_MQ_DETACHED); >=20 > =3D=3D Thanks! > --- > > /* Wait a bit before we retry reading WAL. */ > > (void) WaitLatch(MyLatch, > > WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, > > 1000L, > > WAIT_EVENT_REPACK_WORKER_MAIN); >=20 > Looks like we need ResetLatch(MyLatch); here. You seem to be right. > --- > > * - decoding_ctx - logical decoding context, to capture concurrent data >=20 > Need to be removed together with parameters. Do you mean in 0005? (It'd help if you pasted the hunk headers.) This should be fixed in v28 [1] > --- > > hpm_context =3D AllocSetContextCreate(TopMemoryContext, > > "ProcessParallelMessages", > > ALLOCSET_DEFAULT_SIZES); >=20 > "ProcessRepacklMessages" ok, the copy and pasting is a problem that needs to be addressed (mentioned= in the last paragraph of the commit message of 0005). > --- > > if (XLogRecPtrIsInvalid(lsn_upto)) > > { > > SpinLockAcquire(&shared->mutex); > > lsn_upto =3D shared->lsn_upto; > > /* 'done' should be set at the same time as 'lsn_upto' */ > > done =3D shared->done; > > SpinLockRelease(&shared->mutex); > > > > /* Check if the work happens to be complete. */ > > continue; > > } >=20 > May be moved to the start of the loop to avoid duplication. I found more problems in this part when working on v28, maybe check that. > --- > > SpinLockAcquire(&shared->mutex); > > valid =3D shared->sfs_valid; > > SpinLockRelease(&shared->mutex); >=20 > Better to remember last_exported here to avoid any races/misses. What races/misses exactly? > --- > > shared->lsn_upto =3D InvalidXLogRecPtr; >=20 > I think it is better to clear it once it is read (after removing duplicat= ion). Maybe, I'll think about it. > --- > > bool done; >=20 > bool exit_after_lsn_upto? Not sure. > --- > > bool sfs_valid; >=20 > Do we really need it? I think it is better to leave only last_exported > and in process_concurrent_changes wait add argument > (last_processed_file) and wait for last_exported to become higher. I'll consider that (The variable is replaced in the 0006 part of v28, but t= he idea should still be applicable.) > --- > What if we reverse roles of leader-worker? >=20 > Leader gets a snapshot, transfers it to workers (multiple probably for > parallel scan) using already ready mechanics - workers are processing > the scan of the table in parallel. Leader decodes the WAL. Insertion into a table by multiple workers is a special thing, but maybe it= 'd be doable in this case, but ... > Also, workers may be assigned with a list of indexes they need to build. >=20 > Feels like it reuses more from current infrastructure and also needs > less different synchronization logic. But I'm not sure about the > indexes phase - maybe it is not so easy to do. ... my feelings were the opposite, i.e. I thought require higher amount of code rearrangement. Moreover, the part 0006 of v28 (snapshot switching) wou= ld be trickier. It processes one range of blocks after another, and parallelism would make it more difficult. > --- > Also, should we add some kind of back pressure between building > indexes/new heap and num of WAL we have? > But probably it is out of scope of the patch. Do you mean that the decoding worker should be less active if the amount of WAL doesn't grow too fast? > --- > To build N indexes we need to scan table N times. What is about > building multiple indexes during a single heap scan? That sounds like a separate feature, and similarly difficult as enhancing CREATE INDEX so it can create multiple indexes at a time. > -- > Just a gentle reminder about the XMIN_COMMITTED flag and WAL storm > after the switch. ok, I have it in my notes, moved it more to the top :-) [1] https://www.postgresql.org/message-id/210036.1765651719%40localhost --=20 Antonin Houska Web: https://www.cybertec-postgresql.com