Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1ur4Bl-0004hW-K8 for pgsql-hackers@arkaria.postgresql.org; Wed, 27 Aug 2025 00:38:59 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1ur4Bk-00BYHG-QI for pgsql-hackers@arkaria.postgresql.org; Wed, 27 Aug 2025 00:38:57 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1ur4Bk-00BYH6-Dn for pgsql-hackers@lists.postgresql.org; Wed, 27 Aug 2025 00:38:57 +0000 Received: from mail-vk1-xa35.google.com ([2607:f8b0:4864:20::a35]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1ur4Bi-001vhB-0Y for pgsql-hackers@lists.postgresql.org; Wed, 27 Aug 2025 00:38:55 +0000 Received: by mail-vk1-xa35.google.com with SMTP id 71dfb90a1353d-54003fb8be9so745153e0c.1 for ; Tue, 26 Aug 2025 17:38:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1756255133; x=1756859933; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=/0+EpCS6k0L+3hXcz52Y3lGpakTHL1RXnPAmT3q/v5Y=; b=RyBszpaGz+dh521MRu1xu4e1MImLwbuC+iACyL00aKRlaBKSm7gF0XHK5wALrSzJfH 5WzEJBvcbTCMNGxXtHg6wjuy3YO7jU7Z0WJx+iW3IUMkop0oOk8Rucye0RRl6KVe9TZy cPykAs7VG8B+kP5oCPBoMx6gLLFq1/Q2385J1r8oxzn6YOAgre0AUT6oTvYhbrQ+vub/ JAQ4GsZZgdfag3ppiax3OHJS4uNmVB+y/OkudqdY0vGIYzdR5Sf8opUal7HIkP797Wtq d++Ys+xWiZL5IPGNN+GXxkBc5rCo+We1+ZcfbMfU21AqpdAOXFfNzyCnfhDuk84riAem 85cQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1756255133; x=1756859933; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=/0+EpCS6k0L+3hXcz52Y3lGpakTHL1RXnPAmT3q/v5Y=; b=Z6GvtFV+WJKOkOyE6Xy3PdC/xvADVtQTdgNd1M/SNs18jT+BzQ3wp6vDoWhdb1LOZ4 zIoVZplrm3ET2L/hUMzQ0a9T5b7Wlh+C2TUr2+w1zLKBqQoaaSfrp6i9kLj/+0w/Mjlc O/GMMT3Q3QmnZ7+V3Vz7Y5h+C/AvQZO4TPauSdtSUTgl5TaiUjYyNGkgkpeXN+Jm4sR/ mXMji7E83jZTc1UF6ljosQagB3zX94+J/EcTJugLSdlFhyl4qCJ98xop7GiUJRaIvbF3 1X47nX5BXAC3b9VxIwP7ZD33+Ga4Ol2nAx8khecoxpGUMB/ErJP9ZFV6tNt8czG1wAOE 0HYQ== X-Forwarded-Encrypted: i=1; AJvYcCWj1geu7KcNbJPF/cPAojPUr42JkWTVJ3/rXwhX0p1vH49U4IREP8/L1T+bARKvDRnUi6rKodGN31Ub8Vif@lists.postgresql.org X-Gm-Message-State: AOJu0YxBuaI+N/2C0dk+InIq/tPw8f25Dzn7ovBB+bQuBkDoRtYwtgns 5byoXjyymfpNgvfMnyQb4KSeyoArhq0pfOielcNesj9r6SJ9OPxPUSAz/sd6t7+XpxUsrx0s/GZ AmrYIugDyoCJfyM9/sHRCOyxcAKkna0g= X-Gm-Gg: ASbGnctrta5kDcXKaXuv6GmsiPOhCHc8DWGJPSCZBsGUnqJTrg9vWSeLqurjwKDxjFv Kg11H9Ja92NSOqc1iwIN/eJQ+Swtoo6PpogbNtRauOvi8F3Vqooa2Kpl7Mds4YAOM2paQlU99W+ YNbCEY7MZ0UEVJOV+JYS9bWoBqQ2/Cvsb68pwW98SC/gLPHCUosRgjuWpzMsWHJYu/4Q2YlqZQ9 9LZ8X4= X-Google-Smtp-Source: AGHT+IGsbxq2989pSmNBZ/lNhoJsSBakEE+e2avCP3xNpZWgrgkBFNh/lkZnY6fuG7d3WVE3LR0+OzWnLWFDwpfPZnw= X-Received: by 2002:a05:6122:1313:b0:539:5d6e:71fc with SMTP id 71dfb90a1353d-53c8a2edfb9mr4955380e0c.5.1756255133584; Tue, 26 Aug 2025 17:38:53 -0700 (PDT) MIME-Version: 1.0 References: <202508091333.qvgvo7ikuezm@alvherre.pgsql> <40729.1755799624@localhost> <9536.1756127358@localhost> <21931.1756136535@localhost> <24483.1756142534@localhost> <4790.1756197960@localhost> <29527.1756215093@localhost> In-Reply-To: <29527.1756215093@localhost> From: Mihail Nikalayeu Date: Wed, 27 Aug 2025 02:38:00 +0200 X-Gm-Features: Ac12FXwqYeifxgeJG_fLmZTF9iYJo8mCWl1I7J61S55cc-31L3Q26G3YwFKI06Y Message-ID: Subject: Re: Adding REPACK [concurrently] To: Antonin Houska Cc: Alvaro Herrera , Fujii Masao , Robert Treat , Pg Hackers Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Hello, Antonin! Antonin Houska : > > Where exactly should HeapTupleSatisfiesDirty() conclude that the tuple is > visible? TransactionIdIsCurrentTransactionId() will not do w/o the > modifications that you proposed earlier [1] and TransactionIdIsInProgress= () is > not suitable as I explained in [2]. HeapTupleSatisfiesDirty is designed to respect even not-yet-committed transactions and provides additional related information. else if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmin(tuple))) { /* * Return the speculative token to caller. Caller can worry about * xmax, since it requires a conclusively locked row version, and * a concurrent update to this tuple is a conflict of its * purposes. */ if (HeapTupleHeaderIsSpeculative(tuple)) { snapshot->speculativeToken =3D HeapTupleHeaderGetSpeculativeToken(tuple); Assert(snapshot->speculativeToken !=3D 0); } snapshot->xmin =3D HeapTupleHeaderGetRawXmin(tuple); /* XXX shouldn't we fall through to look at xmax? */ return true; /* in insertion by other */ } So, it returns true when TransactionIdIsInProgress is true. However, that alone is not sufficient to trust the result in the common cas= e. You may check check_exclusion_or_unique_constraint or RelationFindReplTupleByIndex for that pattern: if xmin is set in the snapshot (a special hack in SnapshotDirty to provide additional information from the check), we wait for the ongoing transaction (or one that is actually committed but not yet properly reflected in the proc array), and then retry the entire tuple search. So, the race condition you explained in [2] will be resolved by a retry, and the changes to TransactionIdIsInProgress described in [1] are not necessary. > I understand your idea that there are no transaction aborts in the new ta= ble, > which makes things simpler. I cannot judge if it's worth inventing a new = kind > of snapshot. Anyway, I think you'd then also need to hack > HeapTupleSatisfiesUpdate(). Isn't that too invasive? It seems that HeapTupleSatisfiesUpdate is also fine as it currently exists (we'll see the committed version after retry).. The solution appears to be non-invasive: * uses the existing snapshot type * follows the existing usage pattern * leaves TransactionIdIsInProgress and HeapTupleSatisfiesUpdate unchanged The main change is that xmin/xmax values are forced from the arguments - but that seems unavoidable in any case. I'll try to make some kind of prototype this weekend + cover race condition you mentioned in specs. Maybe some corner cases will appear. By the way, there's one more optimization we could apply in both MVCC-safe and non-MVCC-safe cases: setting the HEAP_XMIN_COMMITTED / HEAP_XMAX_COMMITTED bit in the new table: * in the MVCC-safe approach, the transaction is already committed. * in the non-MVCC-safe case, it isn=E2=80=99t committed yet - but no one wi= ll examine that bit before it commits (though this approach does feel more fragile). This could help avoid potential storms of full-page writes caused by SetHintBit after the table switch. Best regards, Mikhail.