References: <202604071230.b5axxf3qna3m@alvherre.pgsql> <227677.1775576304@localhost> <85813.1777901089@localhost> <27869.1777985266@localhost> <70574.1778512672@localhost> <82942.1778527801@localhost>
In-Reply-To: <82942.1778527801@localhost>
From: Amit Kapila
Date: Tue, 12 May 2026 13:27:29 +0530
Subject: Re: Adding REPACK [concurrently]
To: Antonin Houska
Cc: Mihail Nikalayeu, Andres Freund, Alvaro Herrera, Srinath Reddy Sadipiralla, Matthias van de Meent, Pg Hackers, Robert Treat

On Tue, May 12, 2026 at 1:00 AM Antonin Houska wrote:
>
> Antonin Houska wrote:
>
> > Amit Kapila wrote:
> >
> > > On Tue, May 5, 2026 at 6:17 PM Antonin Houska wrote:
> > > >
> > > > Antonin Houska wrote:
> > > >
> > > > I think the problem is that with database-specific snapshot,
> > > > SnapBuildProcessRunningXacts() returns early, w/o adjusting builder->xmin
> > > >
> > > >         /*
> > > >          * Database specific transaction info may exist to reach CONSISTENT state
> > > >          * faster, however the code below makes no use
> > > >          * of it. Moreover, such
> > > >          * record might cause problems because the following normal (cluster-wide)
> > > >          * record can have lower value of oldestRunningXid. In that case, let's
> > > >          * wait with the cleanup for the next regular cluster-wide record.
> > > >          */
> > > >         if (OidIsValid(running->dbid))
> > > >                 return;
> > > >
> > > > and thus some transactions whose XID is below running->oldestRunningXid may
> > > > continue to be incorrectly considered running.
> > > >
> > > > I originally thought that this should not happen because such transactions
> > > > will be added to the builder's array of committed transactions by
> > > > SnapBuildCommitTxn() anyway. However, I failed to notice that COMMIT record of
> > > > a transaction listed in the xl_running_xacts WAL record is not guaranteed to
> > > > follow the xl_running_xacts record in WAL. In other words, even if
> > > > xl_running_xacts is created before a COMMIT record of the contained
> > > > transaction, it may end up at higher LSN in WAL. So the cleanup I relied on
> > > > might not take place.
> > > >
> > >
> > > BTW, is it possible to write a test by using injection_points or via
> > > manual steps (by using debugger, etc) so that we can more clearly
> > > understand this problem and proposed fix?
> >
> > So far I could observe the situation in WAL, but have no idea how it can
> > happen.
> > For example, transaction 49242 gets committed here
> >
> > rmgr: Transaction len (rec/tot): 46/ 46, tx: 49242, lsn: 0/18BC28C8, prev
> > 0/18BC2890, desc: COMMIT 2026-05-11 16:38:16.603265 CEST
> >
> > and then it appears in the 'xids' list of RUNNING_XACTS:
> >
> > rmgr: Standby len (rec/tot): 106/ 106, tx: 0, lsn:
> > 0/18BC3140, prev 0/18BC3100, desc: RUNNING_XACTS nextXid 49255
> > latestCompletedXid 49241 oldestRunningXid 49242; 13 xacts: 49248 49249 49246
> > 49243 49252 49251 49244 49245 49242 49250 49253 49254 49247; dbid:5
> >
> >
> > I thought the situation is quite common (and therefore nothing of
> > SnapBuildProcessRunningXacts() should be skipped), but when trying to
> > reproduce the problem, I noticed that LogStandbySnapshot() shouldn't allow
> > that ordering issue when logical decoding is enabled:
> >
> >         /*
> >          * GetRunningTransactionData() acquired ProcArrayLock, we must release it.
> >          * For Hot Standby this can be done before inserting the WAL record
> >          * because ProcArrayApplyRecoveryInfo() rechecks the commit status using
> >          * the clog. For logical decoding, though, the lock can't be released
> >          * early because the clog might be "in the future" from the POV of the
> >          * historic snapshot. This would allow for situations where we're waiting
> >          * for the end of a transaction listed in the xl_running_xacts record
> >          * which, according to the WAL, has committed before the xl_running_xacts
> >          * record. Fortunately this routine isn't executed frequently, and it's
> >          * only a shared lock.
> >          */
> >         if (!logical_decoding_enabled)
> >                 LWLockRelease(ProcArrayLock);
> >
> > So I don't have the answer right now.
>
> I think now that "waiting for the end of a transaction listed in the
> xl_running_xacts record" in the comment is about transaction removal from
> procarray. However, the COMMIT record can still be ahead of xl_running_xacts
> because RecordTransactionCommit() is called before
> ProcArrayEndTransaction().
>

I see your point. Because of this, once the builder's xmin regresses based
on a cluster-wide running_xacts record, a transaction can appear to be
running when it should have appeared as committed. However, it is still
not clear how that could lead to one update in the transaction being
decoded successfully and another one being skipped. One theory is that an
invalidation somehow happens before the second update, and when decoding
tries to reload the catalog to decode that update, the relation is not
visible because xmin has regressed, so the update is skipped. I can't see
how that can happen in the code, but something like that is happening.

Assuming the problematic case is something like what I described, even
then the fix of skipping cluster-wide running_xacts records and instead
logging db-specific running_xacts records to help update the builder's
xmin sounds inelegant and probably inefficient. For example, I think such
a dependency means we can never enable db-specific snapshots on standby.

-- 
With Regards,
Amit Kapila.
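
To make the ordering discussed above easier to follow (the COMMIT record is written by RecordTransactionCommit() before ProcArrayEndTransaction() removes the XID from the procarray), here is a minimal toy model in C. It is only a sketch under stated assumptions: every type and function in it (ToyCluster, toy_record_commit(), toy_log_running_xacts() and so on) is hypothetical, it models no locking, and it does not use or reproduce PostgreSQL code. It only shows that a running-xacts record logged between the two commit steps lists an XID whose COMMIT already precedes it in the toy WAL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical types and functions, not PostgreSQL APIs. */

typedef enum { REC_COMMIT, REC_RUNNING_XACTS } RecType;

typedef struct {
    RecType  type;
    unsigned xid;         /* REC_COMMIT: the committed XID */
    unsigned running[8];  /* REC_RUNNING_XACTS: XIDs copied from the toy procarray */
    size_t   nrunning;
} ToyRecord;

typedef struct {
    ToyRecord recs[16];   /* array index stands in for the WAL position */
    size_t    nrecs;
    unsigned  procarray[8];
    size_t    nprocs;
} ToyCluster;

/* A transaction starts: its XID becomes visible in the toy procarray. */
static void toy_begin(ToyCluster *c, unsigned xid)
{
    assert(c->nprocs < 8);
    c->procarray[c->nprocs++] = xid;
}

/* Commit step 1: append the COMMIT record; the XID stays in the procarray. */
static void toy_record_commit(ToyCluster *c, unsigned xid)
{
    assert(c->nrecs < 16);
    ToyRecord *r = &c->recs[c->nrecs++];
    r->type = REC_COMMIT;
    r->xid = xid;
    r->nrunning = 0;
}

/* Commit step 2: remove the XID from the procarray. */
static void toy_proc_end(ToyCluster *c, unsigned xid)
{
    for (size_t i = 0; i < c->nprocs; i++) {
        if (c->procarray[i] == xid) {
            c->procarray[i] = c->procarray[--c->nprocs];
            return;
        }
    }
}

/* Snapshot the procarray and append it as a running-xacts record. */
static void toy_log_running_xacts(ToyCluster *c)
{
    assert(c->nrecs < 16);
    ToyRecord *r = &c->recs[c->nrecs++];
    r->type = REC_RUNNING_XACTS;
    r->xid = 0;
    r->nrunning = c->nprocs;
    for (size_t i = 0; i < c->nprocs; i++)
        r->running[i] = c->procarray[i];
}

/* True if a running-xacts record lists xid after xid's COMMIT in the toy WAL. */
static bool toy_listed_running_after_commit(const ToyCluster *c, unsigned xid)
{
    bool committed = false;

    for (size_t i = 0; i < c->nrecs; i++) {
        const ToyRecord *r = &c->recs[i];

        if (r->type == REC_COMMIT && r->xid == xid) {
            committed = true;
        } else if (r->type == REC_RUNNING_XACTS && committed) {
            for (size_t j = 0; j < r->nrunning; j++)
                if (r->running[j] == xid)
                    return true;
        }
    }
    return false;
}

/* Replays the interleaving: snapshot taken between the two commit steps. */
static bool toy_demo_interleaving(void)
{
    ToyCluster c = {0};

    toy_begin(&c, 49242);
    toy_begin(&c, 49243);
    toy_record_commit(&c, 49242);   /* COMMIT reaches the toy WAL first */
    toy_log_running_xacts(&c);      /* snapshot still sees 49242 */
    toy_proc_end(&c, 49242);        /* XID leaves the procarray too late */
    return toy_listed_running_after_commit(&c, 49242);
}
```

Calling toy_demo_interleaving() reports whether XID 49242 is listed as running after its COMMIT. Swapping the toy_log_running_xacts() and toy_proc_end() calls gives the ordering in which the snapshot no longer lists the committed XID.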