From: Amit Kapila
Date: Fri, 8 May 2026 17:55:12 +0530
Subject: Re: Adding REPACK [concurrently]
To: Antonin Houska
Cc: Alvaro Herrera, Mihail Nikalayeu, Andres Freund, Srinath Reddy Sadipiralla, Matthias van de Meent, Pg Hackers, Robert Treat

On Wed, May 6, 2026 at 1:55 PM Antonin Houska wrote:
>
> Alvaro Herrera wrote:
>
> > On 2026-May-05, Antonin Houska wrote:
> >
> > > However, I failed to notice that COMMIT record of
> > > a transaction listed in the xl_running_xacts WAL record is not guaranteed to
> > > follow the xl_running_xacts record in WAL. In other words, even if
> > > xl_running_xacts is created before a COMMIT record of the contained
> > > transaction, it may end up at higher LSN in WAL. So the cleanup I relied on
> > > might not take place.
> >
> > That's pretty bad news.
> >
> > > I've got no good idea how to fix that.
>
> One idea occurred to me yet, effectively it's just a cleanup. Part of it was
> already proposed [1].
>

Some issues/inefficiencies regarding this fix and the base code
related to db-specific snapshots built during decoding:

* After the fix, whenever a db-specific decoder sees a cluster-wide
xl_running_xacts record, it unconditionally calls
LogStandbySnapshot(MyDatabaseId) and returns. This happens for every
cluster-wide record the decoder encounters (including after snapbuild
has reached the CONSISTENT state), for the entire duration of the
decoding session. LogStandbySnapshot acquires ProcArrayLock +
XidGenLock, calls GetRunningTransactionData, and writes WAL. With N
active db-specific decoding sessions, each cluster-wide record now
triggers N additional WAL writes. Additionally, LogStandbySnapshot
also logs AccessExclusiveLocks before the running_xacts record.
Physical standbys skip db-specific XLOG_RUNNING_XACTS records via
standby_redo(), but they do process the preceding XLOG_STANDBY_LOCK
records. The same locks may already have been logged in the most
recent cluster-wide snapshot, so physical standbys could end up
processing these lock records twice. That may not be harmful, because
I think we avoid re-acquiring the lock, but it is still new overhead
in the system.

* When a cluster-wide running_xacts record arrives,
SnapBuildProcessRunningXacts calls LogStandbySnapshot and returns
early. ReorderBufferAbortOld is called, but with the cluster-wide
oldestRunningXid, which could lag far behind the db-specific value
(due to a long-running transaction in another database). When a
db-specific record arrives, SnapBuildProcessRunningXacts processes it
and advances builder->xmin with the db-specific (more current)
oldestRunningXid, but ReorderBufferAbortOld is NOT called for
db-specific records. This means the reorder buffer is cleaned up using
a conservative, potentially very old, cluster-wide oldestRunningXid,
even though builder->xmin has already advanced much further. The
reorder buffer holds stale entries longer than necessary, increasing
memory pressure.

* I also see a design-level problem with plugins that have
need_shared_catalogs=false and use failover slots. IIUC, the
db-specific optimization was designed around a live decoding session
on the primary, which can emit and immediately read its own
db-specific records in the WAL stream to reach a consistent state. The
LogicalSlotAdvanceAndCheckSnapState path used by slotsync has a
bounded WAL window and cannot exploit that feedback loop, making the
two mechanisms fundamentally incompatible. I know the slot created by
pgrepack doesn't enable the failover option, but we have not added any
guards, or even thought about how db-specific snapbuilds interact with
other parts of the system that rely on cluster-wide running_xacts
records, so there could be more problems that we don't see with the
current set of tests.

-- 
With Regards,
Amit Kapila.
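
To make the first two points above concrete, here is a toy model in C. It is not PostgreSQL code: the Toy* types and toy_* functions are made up for illustration, database id 0 is an assumed marker for "cluster-wide", and xids are compared as plain integers, so wraparound is ignored. It only shows how a cluster-wide oldest running xid can lag behind the db-specific one, and how the number of extra snapshots grows with the number of sessions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; these are NOT the PostgreSQL definitions. */
typedef uint32_t ToyXid;
typedef uint32_t ToyDbId;

typedef struct ToyRunningXact
{
    ToyXid  xid;    /* transaction id */
    ToyDbId dbid;   /* database the transaction runs in */
} ToyRunningXact;

/*
 * Oldest running xid among the transactions of database "dbid", or among
 * all transactions when dbid is 0 (assumed here to mean cluster-wide).
 * Returns next_xid when no matching transaction is running.  Plain integer
 * comparison is used, so xid wraparound is deliberately ignored.
 */
static ToyXid
toy_oldest_running_xid(const ToyRunningXact *xacts, size_t n,
                       ToyDbId dbid, ToyXid next_xid)
{
    ToyXid oldest = next_xid;

    assert(xacts != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        /* Skip other databases unless a cluster-wide answer is wanted. */
        if (dbid != 0 && xacts[i].dbid != dbid)
            continue;
        if (xacts[i].xid < oldest)
            oldest = xacts[i].xid;
    }
    return oldest;
}

/*
 * Number of extra db-specific snapshots written if every db-specific
 * decoding session answers every cluster-wide record with one snapshot.
 */
static uint64_t
toy_extra_snapshots(uint64_t sessions, uint64_t cluster_wide_records)
{
    return sessions * cluster_wide_records;
}
```

With one long-running transaction (xid 100) in database 1 and two recent ones (xids 5000 and 5200) in database 2, the cluster-wide call returns 100 while the call for database 2 returns 5000. In this toy model, cleanup driven by the cluster-wide value therefore cannot release anything in the range between 100 and 5000, even though the db-specific horizon has already moved past it. Likewise, 4 sessions seeing 250 cluster-wide records give 1000 extra snapshots.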