Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1viqCT-00Gr4w-22 for pgsql-hackers@arkaria.postgresql.org; Thu, 22 Jan 2026 08:37:59 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1viqCR-00C1mB-1s for pgsql-hackers@arkaria.postgresql.org; Thu, 22 Jan 2026 08:37:55 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1viqCQ-00C1lx-2x for pgsql-hackers@lists.postgresql.org; Thu, 22 Jan 2026 08:37:55 +0000 Received: from mail-wr1-x430.google.com ([2a00:1450:4864:20::430]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1viqCL-001s90-1C for pgsql-hackers@lists.postgresql.org; Thu, 22 Jan 2026 08:37:54 +0000 Received: by mail-wr1-x430.google.com with SMTP id ffacd0b85a97d-435a11957f6so524303f8f.0 for ; Thu, 22 Jan 2026 00:37:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cybertec.at; s=google; t=1769071063; x=1769675863; darn=lists.postgresql.org; h=message-id:date:mime-version:comments:references:in-reply-to :subject:cc:to:from:from:to:cc:subject:date:message-id:reply-to; bh=H7VvwTH71DdRSt6rYeW2PaI1a31qJfLmORQuj23ltCI=; b=dPMjsVB8MGX1nnyu7Y8wdcgCeReWlRhkhLj7e9RLTuYOvW9qcklu9ewDWMxUEtVvhj iHBN5RIJtiojPNivMCwkHtHO64h8BHMX4TUg/lmFqSpiCRp7YsyPIc+hn91IHG5OlTvr B6tJ5Cni6uzJptzRbAJz8UkqRgCs/rGFiQxSSPr/NCQZNakYVZVNjI+sd3/sFKiHneVG 8Dq6OchIlctte10I22G8mZTFuAeFtGcCmdpgR4+qveYuLltcsgyjaQ1eTpSPuQdgVnxF pJ3JnSvP50UJ8hsNQN2MJqdMsNHG/67niL4k1slzVoWJwUXemAMCLFRgamSDSZ0N8W35 ddNQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1769071063; x=1769675863; h=message-id:date:mime-version:comments:references:in-reply-to :subject:cc:to:from:x-gm-gg:x-gm-message-state:from:to:cc:subject :date:message-id:reply-to; bh=H7VvwTH71DdRSt6rYeW2PaI1a31qJfLmORQuj23ltCI=; b=HLI3cjtv0hUAv2Yzyb2kyS7lUr9dGYmYrWIYWTQEMBTPIBP1jw1y2BH44YuLLAICJX tpIVa3IdwkZvecjSq8/FuwY8xd1WDoAPRtH11OsOuGkdTm8tPEW+aF6CWJwLiMwHPJnc 2dpR5sjZ9Qa9royxeExN1lICIeEjshOc/n/c2HG79rvKSi4Oc8jHY9QuSh91Z1Bcm4TE rJfoO7R2JTkclls+CSPLvNHzmPuGjM054OBsgOac9DlEtGUQWCYZUByo7re5TtRlI4bX D+Y/UZuA9Luu5JQX8VeqZOjmUNEZaWQWmqPGtc8IdsaAik0ABL49ADxvj/hTr/k3PqM8 Vz5g== X-Forwarded-Encrypted: i=1; AJvYcCXXTGMh9ORKLomK/8++zKsUYRFhLfX3yORjzSVLvDilOvy4DSVMvHG/RNESjLgM1orx1gbSsiRGFs60Isyd@lists.postgresql.org X-Gm-Message-State: AOJu0YwKp8jM2WtaEW8iGNTWnPH5wMSYt669c3gSKYLc5fdjd5R17D+R C36CcPtrhk1UpCEQH9zlb+HfT+aHrmk8XCermbClhF2duXq6sKoZv2ErLNoD6mUNk6E= X-Gm-Gg: AZuq6aI6GpHj4K3iDAd3jqI1rhusEfB39IMurXP6snvJWrI5m/P83FD4ArC3Ngcce6I HHzcv/jrLzkzp0qdbthKNqZIveyWomI8wypY2kh36KhYAlGzcnYU5zqefDfYBtUBPKzSUgL0ryL 5q1XFaiu/nStUA9SicrXkxEhCOhMo6r6bMirAu1bcsSUT15SSssBzwMzL11CLEUyQ4kLiky2cMU SPQ9zSCJ5BcaUWa4PDVf1lGYUoIA9nW8M4twV/J18WZ/GcF0IhfGU8wdSR/HKLk7GYDAZYNRD8I pjciQZrcZ9zRSRDMHKF+v8MmXisrTFgXOLQoyBfundzwmU/Aa5uuEGZi2EI/XGYIU9yFyVnmce+ JwzfQZKduOngU06M1yae0vunbtkG744gjrnm7sXwagqDTqoj1v9eZUGE7qLSfS1SpMVFHzv3B/s zDJ9khmQG3u1FDB8zUDBQ6KxPQ X-Received: by 2002:a05:6000:290d:b0:435:ad52:31d4 with SMTP id ffacd0b85a97d-435ad5234ccmr1172160f8f.26.1769071062720; Thu, 22 Jan 2026 00:37:42 -0800 (PST) Received: from localhost (109-81-168-246.rct.o2.cz. [109.81.168.246]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-43596291e0asm14374935f8f.15.2026.01.22.00.37.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 22 Jan 2026 00:37:41 -0800 (PST) From: Antonin Houska To: Mihail Nikalayeu cc: Alvaro Herrera , Pg Hackers , Robert Treat Subject: Re: Adding REPACK [concurrently] In-reply-to: References: <202512151349.vlq3mpfniyk3@alvherre.pgsql> <11247.1767609087@localhost> <11558.1767609632@localhost> <141054.1767891540@localhost> <137668.1768235610@localhost> Comments: In-reply-to Mihail Nikalayeu message dated "Mon, 19 Jan 2026 00:52:00 +0300." X-Mailer: MH-E 8.6+git; nmh 1.8; GNU Emacs 28.3 MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Date: Thu, 22 Jan 2026 09:37:40 +0100 Message-ID: <74802.1769071060@localhost> List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --=-=-= Content-Type: text/plain Mihail Nikalayeu wrote: > Some comments for 0006: > > > SnapBuildSnapshotForRepack(SnapBuild *builder) > Does it also "replays" previously processed WAL to the position that snapshot is ready to use? > I am afraid we may see some non-yet processed parts of WAL leading to duplicate insertion. The changes present in WAL decoded prior the snapshot creation are not replayed - these changes are visible to the snapshot. (This is not really specific to the 0006 part.) > > first_block > What is the reason for that variable? It seems to be always the first block > of relation. Although scan usually starts at the first block, it does not have to, especially due to synchronized sequential scans. > Also, what if we have a huge amount of empty space at the start. In that case the first block will be the block of the first "filled" page. But > insert may (and will) fill empty pages before first_block - out of the range. Good catch! I think I used this "lazy initialization" because I couldn't find the start block in TableScanDesc, and missed the problem that you describe here. > So, I think we should delete it and always use 0 instead. No, we need to set first_block to heapScan->rs_startblock before the scan starts. > > tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL); > With SnapshotAny we are going to check the same tuple multiple times. Better to let scan logic handle it (and change snapshots used by scan > code). The current API does not seem to support changing snapshot of an in-progress scan and I don't want to change that. Plus note that the current implementation of CLUSTER also uses SnapshotAny and then checks the visibility separately. Finally, SnapshotAny is not really an expensive visibility check, if it can be considered a visibility check at all. > > if (blkno >= range_end) > I don't think it is legal to switch a snapshot while holding the tuple. Nothing is protecting it from being pruned\reused. Snapshot protects the table as whole from pruning live (or recently dead) tuples, but when you have fetched a tuple, the containing buffer remains pinned. The buffer pin itself makes prunning of the page impossible. And especially with REPACK (CONCURRENTLY), page pruning is also restricted by the replication slot's xmin. This is increased by calling LogicalIncreaseXminForSlot() from the decoding worker, each time it has created a new snapshot. > > PopActiveSnapshot(); > > InvalidateCatalogSnapshot(); > I think it is a good idea to add here assert for MyProc->xmin and MyProc->xid to be invalid. To ensure we really allow the horizon to advance. I've added it only for xmin. xid is valid because REPACK is executed in a transaction. That reminds me that PROC_IN_VACUUM should be present in MyProc->statusFlags. Fixed. > > /* Set to request a snapshot. */ > > bool snapshot_requested; > We know the end or region in advance, so it should be possible to filter before writing changes to file. Yes, filtering before writing makes sense, I'll consider that. > So, it is some kind of "this is the end of region, create new file and store everything before + create new snapshot for me". The last se of changes does not have to be followed by a snapshot - that's the purpose of snapshot_requested. > > PopActiveSnapshot(); > Sometimes without InvalidateCatalogSnapshot(). [ It'd be a bit easier to find the code if you included hunk headers. ] heapam_relation_copy_for_cluster() does not access catalogs after the first invalidation. The following comment is related: /* * XXX It might be worth Assert(CatalogSnapshot == NULL) here, * however that symbol is not external. */ > > PushActiveSnapshot(GetTransactionSnapshot()); > GetLatestSnapshot() feels better here. What will then happen to code that uses GetActiveSnapshot() ? > > * The individual builds might still be a problem, but that's a > > * separate issue. > Opening the index may create a catalog snapshot, so it needs to be invalidated after. It'll be invalidated in the next iteration. The point of this invalidation is to use one snapshot per index. > > * TODO Can we somehow use the fact that the new heap is not yet > > * visible to other transaction, and thus cannot be vacuumed? > Snapshot resetting [0] may work here (without CIC, just as part of the scan + some code to ensure catalog snapshot is managed correctly). > Also, to correctly build a unique index - some tech from [0] is required (building a unique index with multiple snapshots is a little bit tricky). > Or we may implement some super lightweight way - just SnapshotAny without any visibility checks (just assume everything is ok since it > copied from another relation with the same index set). ok, I'll check your patch. > > This approach introduces one limitation though: if the USING INDEX clause is > > specified, an explicit sort is always used. Index scan wouldn't work because > > it does not return the tuples sorted by CTID. > > Technically we may just use keys (if they are comparable) as a way to specify regions. Instead of number of pages to switch snapshot - > number of tuples or time. > But because we don't know the region end in advance - we have to keep all the changes in file and filter only while applying. Good idea. Unfortunately it questions your proposal to filter the changes before writing, as suggested above. > > range_end = repack_blocks_per_snapshot; > Should be repack_blocks_per_snapshot + ctx->first_block ? Indeed, I also failed to avoid assuming first_block==0 :-) > > * XXX It might be worth Assert(CatalogSnapshot == NULL) > > * here, however that symbol is not external. > As said above - better assert for MyProc->xmin/xid + add InvalidateCatalogSnapshot. I proposed the Assert above, but still thinking about it. > > Is REPACKED (CONCURRENTLY) is being run by this backend? > REPACK and double "is" Other comments accepted. Thanks. -- Antonin Houska Web: https://www.cybertec-postgresql.com --=-=-= Content-Type: text/x-diff; charset=utf-8 Content-Disposition: attachment; filename=v31-0001-Add-REPACK-command.patch Content-Transfer-Encoding: quoted-printable