Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vhahS-001sIz-2F for pgsql-hackers@arkaria.postgresql.org; Sun, 18 Jan 2026 21:52:47 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vhahP-00Av2X-2K for pgsql-hackers@arkaria.postgresql.org; Sun, 18 Jan 2026 21:52:44 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vhahP-00Av2O-11 for pgsql-hackers@lists.postgresql.org; Sun, 18 Jan 2026 21:52:43 +0000 Received: from mail-lj1-x22d.google.com ([2a00:1450:4864:20::22d]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1vhahM-001EHl-2q for pgsql-hackers@lists.postgresql.org; Sun, 18 Jan 2026 21:52:43 +0000 Received: by mail-lj1-x22d.google.com with SMTP id 38308e7fff4ca-382fa0dc9f4so40321621fa.3 for ; Sun, 18 Jan 2026 13:52:40 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1768773155; cv=none; d=google.com; s=arc-20240605; b=dqoRNkAjnIhU2lD4X7EoppvWVC5tpeWjlpxAPMnrQTpLQHFrXBgQVQLR/H1wIjMKE0 yqm9A1uDxOqBOftz7GfAQAFeCkv8tclBz9dsr5TCaLxMhELTWBtugcQjKgdakyRW+7hy hAmjgdruC+r9WKeccLCp1lY6B13EXno/sbMnLgw+DDGpxS2MN750Oy8gFIfVQAMGBXWX jgEFDQKuvfk+fnl3bUR5spBXNErgZyPHmDBDSGgGdIawH9ZaALxZxMu5BRU0HJpY9jgD 2JbsZ/pPxH19K93lLWcwWNHK1qgALBM4BUGgX35tsf1vUCDfpPdoXD1cNaTtOaD8JAVN TCeQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:dkim-signature; bh=1OiIN1B8SRGTzphqOQ8Nmcf2kXSqZ4T5IUtCUOI2/is=; fh=FEpUEBAFG7bkBSBpPNOLl9B46FaTpYS1E8Sfmm0pLb8=; b=a7Gus3TCGyFo+WyIqVgFdq10toEjJ7XfmeN5tXmjlNyexbeiwB63SmuNkpH1d2ztnD UndNQ0rWAK32ZM8rl6q6QKxJ4oN9P2PISPbOuAFrp9JLEpZDvjfqdfmUMPnMFtr9jyaS ZnoXECOl9xoPvCN6GLOdsY2D2MGvAtobVW2UX14P5Bgfx77/out7ciZAAVE4KyBFNJ03 kRZhzV91PSezbfEQp1KOH03lzBrrpJKyjOp2D+brB1IMD3nyy+rizclg0z4OGv80t3s7 Pczuy5wo83qhzLr2qIVu7MwObfr0Tm8IeTYL/+LYjIe8yisfOvjlKec5CDXrk3tDZNGh WEgw==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1768773155; x=1769377955; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=1OiIN1B8SRGTzphqOQ8Nmcf2kXSqZ4T5IUtCUOI2/is=; b=Ifu43DkNO/cUcgbqtCagjmAnOTpdJjRer40tdJcbvEq7wHforAmvw5XyJ9hesH04sk HlORNEyu4AQFIbPcueFeVeuBl/5yKQfF9prK025sgSy5hcx5wgv/GilDPB1xudESh3xQ V4p7wg8KFI6kU41AGo/1qD4wFN4RuJpH+J0uTy4fdZWE5cVVRXOVqycG3unUsN53eJeD H2Hcpdbi6c7B/rc2ChSEnlRvFme3iIYJgYMRGQqOdt5fVyHRHd5X8m6DCwtkGEhWXans iraLLfVHvC566X/Nk/9MAeBYNgfKreyiK3f40mAsuYs8eEoEF2+hKe4KeFpE7RTx6W03 qTdQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1768773155; x=1769377955; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=1OiIN1B8SRGTzphqOQ8Nmcf2kXSqZ4T5IUtCUOI2/is=; b=cu9t9mxR/PgSUtFLwkk1NohD6z+7kU0OX3lW4pwAEr9UdjRw5Im6FKa+Fij44U6lOn 2f46LBZ8mtFOriWjFTubT0YIkdz1lJqT/rx91BdChHG+TA/w7vQrUTSljr4uDChmmMSO OGSYvmrmEIX3eJz5VPkVhjwtawyhpO4gjU5M5NGBF4x6EfkwTCcbAZ7y/uS091ow3oUh ZK7E5VQNlsAGwKuh4EHSGws+LqK23Z7mSwtmOio8BVRH0YrqHsEKfANLSXMZx6lzKeA7 URkqq3kJiGuP2UgCqF+lwEhMTd4epiCNEIZao6pAhQLRWdHLCFp5p/RZS+xRxiupfHCM jKaw== X-Forwarded-Encrypted: i=1; AJvYcCU4z/Sogef36vVPbBlhhpBtX5yOityLCDLnBflmJr1cz7SUUIIOgqJfHxXW6Fq/LBhSZWzY42pLbfctoGFU@lists.postgresql.org X-Gm-Message-State: AOJu0YyTb5t65Rn/OH2UEnIZW4OHVrV6jaKP2eCY/bFqrlixpR2hhGL8 Bek6d2PQ1mddmlDh9ddDEE018MFyMHcam4UXbdiyApHP9Dp/NVwqBL2A2hq3Hzjagqg3jjDuFJ7 Xp7jcC+0QyfrDr0r0dNw4BRQMbGQdnsWhStfpg/9Utit3 X-Gm-Gg: AY/fxX6qKXkdMxNPxN/v0wk51Pr+vXtdZ4Kdyd73oXitmXcG4ua4EMOXAwYcOOxXR62 RYhOe2Y/bo510PcG+ervAmCjoedGeknIjRB+C8xCFTDV0vhP+/+gk8Rs5YR0j4n02KFsBUNhWti ESBSXe25IlziZvjXSAkxSAzOrpJ5DqrObI+nexiMUQfWRASL1N899b1887YTUC+7j6kT9Jbc00D XJ8X56+urwcoJsSJaOee7UEsMgK8zfVMYYUkcaYwDKUlKlO/mjv8iWyl24pYbuaihLf9FFcdtMf 4d2zAhDVqbTX5z0Pug12x1DlS4qL2EW6xxaMHvOILBxNw2H/Qk9dFKlkaEoLiTUcpA== X-Received: by 2002:a05:651c:19a1:b0:383:10dd:b2ba with SMTP id 38308e7fff4ca-383841ab39dmr36039871fa.11.1768773154202; Sun, 18 Jan 2026 13:52:34 -0800 (PST) MIME-Version: 1.0 References: <202512151349.vlq3mpfniyk3@alvherre.pgsql> <11247.1767609087@localhost> <11558.1767609632@localhost> <141054.1767891540@localhost> <137668.1768235610@localhost> In-Reply-To: From: Mihail Nikalayeu Date: Mon, 19 Jan 2026 00:52:00 +0300 X-Gm-Features: AZwV_Qi5t7MyK2Hx6Fm5YHa2wT96nZQWDc2Dl9v9gszzaQypLPtvNsKKkGYbrkM Message-ID: Subject: Re: Adding REPACK [concurrently] To: Antonin Houska Cc: Alvaro Herrera , Pg Hackers , Robert Treat Content-Type: multipart/alternative; boundary="0000000000008b66de0648b09782" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000008b66de0648b09782 Content-Type: text/plain; charset="UTF-8" Hello, Antonin! Some comments for 0006: > SnapBuildSnapshotForRepack(SnapBuild *builder) Does it also "replays" previously processed WAL to the position that snapshot is ready to use? I am afraid we may see some non-yet processed parts of WAL leading to duplicate insertion. > first_block What is the reason for that variable? It seems to be always the first block of relation. Also, what if we have a huge amount of empty space at the start. In that case the first block will be the block of the first "filled" page. But insert may (and will) fill empty pages before first_block - out of the range. So, I think we should delete it and always use 0 instead. > DecodeMultiInsert ItemPointerSet-related logic seems missing. > tableScan = table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL); With SnapshotAny we are going to check the same tuple multiple times. Better to let scan logic handle it (and change snapshots used by scan code). See [0] for a way to reset snapshots during the scan. > if (blkno >= range_end) I don't think it is legal to switch a snapshot while holding the tuple. Nothing is protecting it from being pruned\reused. Snapshots need to be switched "between" pages. You may check how it is done at [0]. > PopActiveSnapshot(); > InvalidateCatalogSnapshot(); I think it is a good idea to add here assert for MyProc->xmin and MyProc->xid to be invalid. To ensure we really allow the horizon to advance. > /* Set to request a snapshot. */ > bool snapshot_requested; We know the end or region in advance, so it should be possible to filter before writing changes to file. So, it is some kind of "this is the end of region, create new file and store everything before + create new snapshot for me". > PopActiveSnapshot(); Sometimes without InvalidateCatalogSnapshot(). > PushActiveSnapshot(GetTransactionSnapshot()); GetLatestSnapshot() feels better here. > * The individual builds might still be a problem, but that's a > * separate issue. Opening the index may create a catalog snapshot, so it needs to be invalidated after. > * TODO Can we somehow use the fact that the new heap is not yet > * visible to other transaction, and thus cannot be vacuumed? Snapshot resetting [0] may work here (without CIC, just as part of the scan + some code to ensure catalog snapshot is managed correctly). Also, to correctly build a unique index - some tech from [0] is required (building a unique index with multiple snapshots is a little bit tricky). Or we may implement some super lightweight way - just SnapshotAny without any visibility checks (just assume everything is ok since it copied from another relation with the same index set). > This approach introduces one limitation though: if the USING INDEX clause is > specified, an explicit sort is always used. Index scan wouldn't work because > it does not return the tuples sorted by CTID. Technically we may just use keys (if they are comparable) as a way to specify regions. Instead of number of pages to switch snapshot - number of tuples or time. But because we don't know the region end in advance - we have to keep all the changes in file and filter only while applying. > Assert(XLogRecPtrIsInvalid(shared->lsn_upto)); > /* Initially we're expected to provide a snapshot and only that. */ > Assert(shared->snapshot_requested && > XLogRecPtrIsInvalid(shared->lsn_upto)); XLogRecPtrIsInvalid(shared->lsn_upto) assertion is duplicated here. > range_end = repack_blocks_per_snapshot; Should be repack_blocks_per_snapshot + ctx->first_block ? (but better to remove the first_block at all). > * XXX It might be worth Assert(CatalogSnapshot == NULL) > * here, however that symbol is not external. As said above - better assert for MyProc->xmin/xid + add InvalidateCatalogSnapshot. > extern Snapshot > extern void Some "externs" used in C files (not headers). > ctx->block_ranges = NIL; may be used with list_free? > * Remember here to which pages should applied to changes recorded in given > * file. "should apply" > of pages, so that VACUUM does not get block for too long "blocked" > there are no restriction on block "there is" or "restrictions" > repack_add_block_range extra new-line after function. > Is REPACKED (CONCURRENTLY) is being run by this backend? REPACK and double "is" [0]: https://commitfest.postgresql.org/patch/6401/ --0000000000008b66de0648b09782 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hello, Antonin!

Some comments for 0006:=

> SnapBuildSnapshotForRepack(SnapBuild *builde= r)
Does it also "replays" previously processed WAL to t= he position that snapshot is ready to use?
I am afraid we may see= some non-yet processed parts of WAL leading to duplicate insertion.

>=C2=A0first_block
What is the reason = for that variable? It seems to be always the first block of relation.=C2=A0=
Also, what if we have a huge amount of empty space at the start.= In that case the first block will be the block of the first "filled&q= uot; page. But insert may (and will) fill empty pages before first_block - = out of the range.
So, I think we should delete it and always use = 0 instead.

>=C2=A0DecodeMultiInsert
<= div>ItemPointerSet-related logic seems missing.

<= div>> tableScan =3D table_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) N= ULL);
With SnapshotAny we are going to check the same tuple multiple t= imes. Better to let scan logic handle it (and change snapshots used by scan= code).
See [0] for a way to reset snapshots during the scan.
<= div>
>=C2=A0if (blkno >=3D range_end)
I don&#= 39;t think it is legal to switch a snapshot while holding the tuple. Nothin= g is protecting it from being pruned\reused.
Snapshots need to be= switched "between" pages. You may check how it is done at [0].

> PopActiveSnapshot();
> InvalidateCatalog= Snapshot();
I think it is a good idea to add here=C2=A0assert for= MyProc->xmin and MyProc->xid to be invalid. To ensure we really allo= w the horizon to advance.

> /* Set to request a= snapshot. */
> bool snapshot_requested;
We know the end= or region in advance, so it should be possible to filter before writing ch= anges to file.
So, it is some kind of "this is the end of re= gion, create new file and store everything before=C2=A0+ create new snapsho= t for me".

>=C2=A0PopActiveSnapshot();
=
Sometimes=C2=A0without InvalidateCatalogSnapshot().

> PushActiveSnapshot(GetTransactionSnapshot());
GetLatestSna= pshot() feels better here.

> * The individual b= uilds might still be a problem, but that's a
> * separate issue.<= /div>
Opening the index may create a catalog snapshot, so it needs to b= e invalidated after.

> * TODO Can we somehow us= e the fact that the new heap is not yet
> * visible to other t= ransaction, and thus cannot be vacuumed?
Snapshot resetting [0] m= ay work here (without CIC, just as part of the scan + some code to ensure c= atalog snapshot is managed correctly).
Also, to correctly build a= unique index - some tech from [0] is required (building a unique index wit= h multiple snapshots is a little bit tricky).
Or we may implement= some super lightweight way - just SnapshotAny without any visibility check= s (just assume everything is ok since it copied from another relation with = the same index set).

> This approach introduces= one limitation though: if the USING INDEX clause is
> specified, an = explicit sort is always used. Index scan wouldn't work because
> = it does not return the tuples sorted by CTID.

Tech= nically we may just use keys (if they are comparable) as a way to specify r= egions. Instead of number of pages to switch snapshot - number of tuples or= time.
But because we don't know the region end in advance - = we have to keep=C2=A0all the changes in file and filter only while applying= .

> Assert(XLogRecPtrIsInvalid(shared->lsn_upt= o));
>=C2=A0 /* Initially we're expected to provide a snapshot an= d only that. */
>=C2=A0 Assert(shared->snapshot_requested &&am= p;
>=C2=A0=C2=A0=C2=A0 XLogRecPtrIsInvalid(shared->lsn_upto));

XLogRecPtrIsInvalid(shared->lsn_upto) assertion=C2= =A0is duplicated here.

>=C2=A0range_end =3D= repack_blocks_per_snapshot;
Should be repack_blocks_per_snapshot= =C2=A0+=C2=A0ctx->first_block ? (but better to remove the first_block at= all).

> * XXX It might be worth Assert(Catalog= Snapshot =3D=3D NULL)
> * here, however that symbol is not external.<= br>
As said above - better assert for MyProc->xmin/xid=C2=A0+ = add InvalidateCatalogSnapshot.

>=C2=A0extern Sn= apshot
> extern void

Some "exter= ns" used in C files (not headers).

> ctx-&= gt;block_ranges =3D NIL;
may be used with list_free?

=
>=C2=A0 * Remember here to which pages should applied to chan= ges recorded in given
>=C2=A0 * file.
"should apply"<= /div>

>=C2=A0=C2=A0of pages, so that VACUUM does not = get block for too long
"blocked"

>=C2=A0there are no restriction on block
"there is&quo= t; or "restrictions"

>=C2=A0repack_ad= d_block_range
extra new-line after function.

=
>=C2=A0Is REPACKED (CONCURRENTLY) is being run by this backend?
REPACK and double "is"


--0000000000008b66de0648b09782--