From: Maxim Orlov
Date: Fri, 14 Nov 2025 18:40:50 +0300
Subject: Re: POC: make mxidoff 64 bits
To: Heikki Linnakangas
Cc: wenhui qiu, Alexander Korotkov, Postgres hackers
In-Reply-To: <54aa8f65-f0e4-4464-b543-e0399c1cab1e@iki.fi>
References: <4535f3aa-3220-4760-b1f5-2bc91f248e03@iki.fi>
 <2bc58592-9d74-4af0-bdd1-1a88e8683f7c@iki.fi>
 <36531c0e-292c-409d-bbc7-a252cf6e910a@iki.fi>
 <54aa8f65-f0e4-4464-b543-e0399c1cab1e@iki.fi>

On Wed, 12 Nov 2025 at 16:00, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

> I added an inlined fast path to SlruReadSwitchPage and SlruWriteSwitchPage to
> eliminate the function call overhead of those in the common case that no
> page switch is needed. With that, the 100 million mxid test case I used
> went from 1.2 s to 0.9 s. We could optimize this further but I think
> this is good enough.

I agree with you.
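
For readers following the thread, the fast-path idea can be sketched roughly as below. This is only an illustration under stated assumptions: the struct, the function names and the page size are hypothetical and are not taken from the patch. The inlined wrapper does a single comparison in the common case and only calls the out-of-line function when the page actually changes.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 8192

/* Hypothetical reader state; not the patch's actual struct. */
typedef struct SketchSlruReader
{
    int64_t cur_pageno;     /* page currently held in buf, or -1 if none */
    int     slow_calls;     /* how often the slow path ran (demo only) */
    char    buf[SKETCH_PAGE_SIZE];
} SketchSlruReader;

/* Slow path: a real version would read the requested page from disk. */
static void
sketch_switch_page_slow(SketchSlruReader *r, int64_t pageno)
{
    r->slow_calls++;
    r->cur_pageno = pageno;
}

/* Fast path: inlined, so the common case is one comparison and no call. */
static inline char *
sketch_read_switch_page(SketchSlruReader *r, int64_t pageno)
{
    assert(pageno >= 0);
    if (r->cur_pageno != pageno)
        sketch_switch_page_slow(r, pageno);
    return r->buf;
}

/* Demo: visit n consecutive entries with per_page entries on each page. */
static int
sketch_count_slow_calls(int64_t n, int64_t per_page)
{
    SketchSlruReader r = { .cur_pageno = -1, .slow_calls = 0 };

    for (int64_t i = 0; i < n; i++)
        (void) sketch_read_switch_page(&r, i / per_page);
    return r.slow_calls;
}
```

With sequential access the slow path runs once per page rather than once per entry, which is the kind of saving the quoted timing refers to.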

> - I added an SlruFileName() helper function to slru_io.c, and support
> for reading SLRUs with long_segment_names==true. It's not needed
> currently, but it seemed like a weird omission. AllocSlruRead() actually
> left 'long_segment_names' uninitialized which is error-prone. We
> could've just documented it, but it seems just as easy to support it.

Yeah, I didn't particularly like that place either. But then I decided it was
overkill to do it for the sake of symmetry and would raise questions.
It turned out much better this way.

> I kept all the new test cases for now. We need to decide which ones are
> worth keeping, and polish and speed up the ones we decide to keep.

I can think of two cases here.
A) Upgrade from a "new cluster":
    * create a cluster with mxoff just below the 32-bit overflow
    * consume around 2k multixacts (1k before the 32-bit overflow
      and 1k after)
    * run pg_upgrade
    * check that the upgraded cluster is working
    * check the data invariant
B) Same as A), but for an "old cluster" with the oldinstall env.
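
For case A, the starting offset has to sit just below 2^32 so that roughly half of the multixacts land on each side of the boundary. A rough sketch of that arithmetic follows. It assumes every multixact has the same number of members, which is a simplification made only for this illustration, and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: pick a starting member offset such that n_before
 * multixacts of members_each members fit below the 32-bit boundary.
 * Assumes every multixact has the same number of members.
 */
static uint64_t
sketch_start_offset(uint64_t n_before, uint64_t members_each)
{
    const uint64_t boundary = UINT64_C(1) << 32;    /* first offset past 32 bits */
    uint64_t       consumed = n_before * members_each;

    assert(consumed <= boundary);
    return boundary - consumed;
}

/* Does multixact number i (0-based, counted from start) begin past the boundary? */
static int
sketch_is_past_overflow(uint64_t start, uint64_t i, uint64_t members_each)
{
    return start + i * members_each >= (UINT64_C(1) << 32);
}
```

With 1000 multixacts of 2 members each, the start offset is 2^32 - 2000: multixact 999 still begins below the boundary and multixact 1000 begins exactly at it.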


On Thu, 13 Nov 2025 at 19:04, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

> Here's a new patch version that addresses the above issue. I resurrected
> MultiXactMemberFreezeThreshold(), using the same logic as before, just
> using pretty arbitrary thresholds of 1 and 2 billion offsets instead of
> the safe/danger thresholds derived from MaxMultiOffset. That gives
> roughly the same behavior wrt. calculating effective freeze age as before.

Yes, I think it's okay for now. This reflects the existing logic well.
I wonder what an alternative solution might be. Could we make
"vacuum freeze" also truncate pg_multixact segments?
In any case, this can be discussed later.
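
To make the threshold logic concrete, here is a minimal sketch of the interpolation, using the 1 and 2 billion figures from the quoted text. It is modeled on how the existing function is understood to work, not copied from the patch: the function name is a hypothetical stand-in, and the inputs are passed as arguments instead of being read from shared state.

```c
#include <assert.h>
#include <stdint.h>

/* Thresholds from the quoted text: 1 and 2 billion member offsets. */
#define SKETCH_MEMBER_SAFE_THRESHOLD   UINT64_C(1000000000)
#define SKETCH_MEMBER_DANGER_THRESHOLD UINT64_C(2000000000)

/* Hypothetical stand-in for MultiXactMemberFreezeThreshold(). */
static uint32_t
sketch_member_freeze_threshold(uint64_t members, uint32_t multixacts,
                               uint32_t freeze_max_age)
{
    double   fraction;
    uint32_t victims;
    uint32_t result;

    /* Low member-space usage: no special action is required. */
    if (members <= SKETCH_MEMBER_SAFE_THRESHOLD)
        return freeze_max_age;

    /* How far past the safe threshold, relative to the danger threshold? */
    fraction = (double) (members - SKETCH_MEMBER_SAFE_THRESHOLD) /
        (double) (SKETCH_MEMBER_DANGER_THRESHOLD - SKETCH_MEMBER_SAFE_THRESHOLD);

    /* At or past the danger threshold: the lowest possible age is zero. */
    if (fraction >= 1.0)
        return 0;

    /* Try to get rid of that fraction of the existing multixacts. */
    victims = (uint32_t) (multixacts * fraction);
    result = multixacts - victims;

    /* Never be less aggressive than the configured maximum age. */
    return result < freeze_max_age ? result : freeze_max_age;
}
```

Halfway between the two thresholds (1.5 billion members) the effective freeze age is half of the current multixact count, and at 2 billion or more it drops to zero.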

--
Best regards,
Maxim Orlov.