Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tBz2i-0081sv-JM for pgsql-hackers@arkaria.postgresql.org; Fri, 15 Nov 2024 16:19:32 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tBz2g-00AjkM-06 for pgsql-hackers@arkaria.postgresql.org; Fri, 15 Nov 2024 16:19:30 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tBz2f-00AjkD-IA for pgsql-hackers@lists.postgresql.org; Fri, 15 Nov 2024 16:19:30 +0000 Received: from mail-ed1-x529.google.com ([2a00:1450:4864:20::529]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1tBz2Y-00231T-U8 for pgsql-hackers@lists.postgresql.org; Fri, 15 Nov 2024 16:19:28 +0000 Received: by mail-ed1-x529.google.com with SMTP id 4fb4d7f45d1cf-5cec9ad04aeso332921a12.1 for ; Fri, 15 Nov 2024 08:19:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1731687561; x=1732292361; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=eEI1YWWSqz/fzP4X8ksNfiS/AIaA+zKnIZlNc6pEogI=; b=ZuaecM7E5ms3UcYk1KLeWVgOC1klGewja2drvLC/qmpCn8vL6iwVlabQRZrvKTbYNr mfgDtM8T1kH2JqfMA7j0khr7SsA4ZxraUNumYF3g1qk9tSk2nRvzhEf5M4CeB/rgIdHo ahgkBrpOEdaa4n826bnC93ZEpbRw1H/4igwHYHlzbtCBE8PmLaL6KgppRjGVbwkKeDqS ajrH0GLXP3+V//peiboo+NAp16SObYcJhdDxdAQDtMKa9pr3O/9g5dH3HnA1ECooH/73 kMIlzUJ0w8/eOyLpvKgfj9mnwJltA0w9SSynHA+0qw4hhhnEamSG9pCIzKn6LegJVIJZ kZqQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1731687561; x=1732292361; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=eEI1YWWSqz/fzP4X8ksNfiS/AIaA+zKnIZlNc6pEogI=; b=hAm6DPTypw27w0j25yAQhgw9OiXARx/FRK8JT5PyIwg0LJHThAFOpqSVJbO/4pEAAm /CXURtKpXp1tARq5XLEYvUfbKPIiKBoyNPCe2wwO6IaMTv43M6hJyWGUB4ymZyzvqhie z3BKAn+q/hzpEytVo2jSVlhD657xzd2X4Q5L/MMkjlzPHGNMPvV0hJ6kGjTDGszmkntr 5bqqDLtNaQpni8B/U1+/6Dl31K+pD8i+/NfARmVNkdnmtNsBnjG5V22scaXo1P4vItuC 4kKdp+ziXFT8catfXcv0ajY7qun7ik32/6RZdmKzGS5nWQFjp+yuLFC92N3gFYVLwHTL dztA== X-Forwarded-Encrypted: i=1; AJvYcCW72Jt7xVFIMDrcnTHnAkyKjA5UGv1fD5q2WJ2xigaOW2ZrmqWBs5qCY0XMaRvdqcIUlFe+XaZsbLuMxxrw@lists.postgresql.org X-Gm-Message-State: AOJu0Yxb8lZPCMMyaQnESbNMxe3SU/yIR1w7SkDQIUzWjQMqDYeqi6+z 5sxCvr3D7weaeKU91KyX58r/vFsJZBSaqh0qqbKO+Q30yx3wxOQxSqjxSIB12FWxS15cg8Bawo+ 38sO6Ko4Gpf5IWCBvziMq+W3MDLo= X-Google-Smtp-Source: AGHT+IEvnLKPEsWkeyWiDY1hwqp3uYYRRAS3NJH/PD39DG7qNJzsR1Nas36VjpOLHunT/j9BrhlkcmuZmvLJxyFRkH0= X-Received: by 2002:a05:6402:2794:b0:5ce:ccff:ac34 with SMTP id 4fb4d7f45d1cf-5cf8fce9bffmr956434a12.6.1731687561058; Fri, 15 Nov 2024 08:19:21 -0800 (PST) MIME-Version: 1.0 References: <5ecc35f5-1111-47fc-8a02-36d89490a50d@iki.fi> <24b3deb6-a732-4256-847a-560f4bf39d59@iki.fi> In-Reply-To: From: Maxim Orlov Date: Fri, 15 Nov 2024 19:19:07 +0300 Message-ID: Subject: Re: POC: make mxidoff 64 bits To: Heikki Linnakangas Cc: wenhui qiu , Alexander Korotkov , Postgres hackers Content-Type: multipart/alternative; boundary="000000000000f063850626f5edc1" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000f063850626f5edc1 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Fri, 15 Nov 2024 at 14:06, Heikki Linnakangas wrote: > Hmm, so if I understand correctly, this is related to how we determine > the length of the members array, by looking at the next multixid's > offset. This is explained in GetMultiXactIdMembers: > Correct. > If we accept that, we don't need to worry about case 3 anymore. But if > we upgrade wrapped-around members files by just renaming them, there > could still be a members array where we had skipped offset 0, and > reading that after the upgrade might get confused. We could continue to > ignore a 0 XID in the members array like the comment says; I think that > would be enough. But yeah, maybe it's better to bite the bullet in > pg_upgrade and squeeze those out. > Correct. I couldn't explain this better. I'm more for the squeeze those out. Overwise, we're ending up in adding another hack in multixact, but one of the benefits from switching to 64-bits, it should make XID's logic more straight forward. After all, mxact juggling in pg_upgrade is one time inconvenience. > > Does your upgrade test suite include case 3, where the next multixact's > offset is 1? > Not exactly. simple Latest checkpoint's NextMultiXactId: 119441 Latest checkpoint's NextMultiOffset: 5927049 offset-wrap Latest checkpoint's NextMultiXactId: 119441 Latest checkpoint's NextMultiOffset: 5591183 multi-wrap Latest checkpoint's NextMultiXactId: 82006 Latest checkpoint's NextMultiOffset: 7408811 offset-multi-wrap Latest checkpoint's NextMultiXactId: 52146 Latest checkpoint's NextMultiOffset: 5591183 You want test case where NextMultiOffset will be 1? > Can we remove MaybeExtendOffsetSlru() now? There are a bunch of other > comments and checks that talk about binary-upgraded values too that we > can hopefully clean up now. > Yes, technically we can. But this is kinda unrelated to the offsets and will make the patch set significantly complicated, thus more complicated to review and less likely to be committed. Again, I'm not opposing the idea, I'm not sure if it is worth to do it right now. > > If we are to parse the member segments in detail in upgrade anyway, I'd > be tempted to make some further changes / optimizations: > > - You could leave out all locking XID members in upgrade, because > they're not relevant after upgrade any more (all the XIDs will be > committed or aborted and have released the locks; we require prepared > transactions to be completed before upgrading too). It'd be enough to > include actual UPDATE/DELETE XIDs. > > - The way we determine the length of the members array by looking at the > next multixid's offset is a bit complicated. We could have one extra > flag per XID in the members to indicate "this is the last member of this > multixid". That could either to replace the current mechanism of looking > at the next offset, or be just an additional cross-check. > > - Do we still like the "group" representation, with 4 bytes of flags > followed by 4 XIDs? I wonder if it'd be better to just store 5 bytes per > XID unaligned. > Not really. But I would leave it for next iteration - switching multi to 64 bit. I already have some drafts for this. In any case, we'll must do adjustments in pg_upgrade again. My goal is to move towards 64 XIDs, but with the small steps, and I plan changes in "group" representation in combination with switching multi to 64 bit. This seems a bit more appropriate in my view. As for your optimization suggestions, I like them. I don=E2=80=99t against = them, but I=E2=80=99m afraid to disrupt the clarity of thought, especially since = the algorithm is not the simplest. --=20 Best regards, Maxim Orlov. --000000000000f063850626f5edc1 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable

On Fri, 15 Nov 2024 at 14:06, Heikki Linn= akangas <hlinnaka@iki.fi> wrot= e:
Hmm, so if I understand correctly, this is related to how we determine
the length of the members array, by looking at the next multixid's
offset. This is explained in GetMultiXactIdMembers:
Co= rrect.
=C2=A0
If we accept that, we don't need to worry about case 3 anymore. But if =
we upgrade wrapped-around members files by just renaming them, there
could still be a members array where we had skipped offset 0, and
reading that after the upgrade might get confused. We could continue to ignore a 0 XID in the members array like the comment says; I think that would be enough. But yeah, maybe it's better to bite the bullet in
pg_upgrade and squeeze those out.
Correct. I couldn= 9;t explain this better. I'm more for the squeeze those out. Overwise, = we're ending up in adding another hack in multixact, but one of the ben= efits from switching to 64-bits, it should make XID's logic more straig= ht forward. After all, mxact juggling in pg_upgrade is one time inconvenien= ce.
=C2=A0

Does your upgrade test suite include case 3, where the next multixact's=
offset is 1?
Not exactly.

simple
Latest chec= kpoint's NextMultiXactId: =C2=A0119441
Latest checkpoint's NextM= ultiOffset: =C2=A05927049

offset-wrap
Latest checkpoint's Nex= tMultiXactId: =C2=A0119441
Latest checkpoint's NextMultiOffset: =C2= =A05591183

multi-wrap
Latest checkpoint's NextMultiXactId: = =C2=A082006
Latest checkpoint's NextMultiOffset: =C2=A07408811
offset-multi-wrap
Latest checkpoint's NextMultiXactId: =C2=A052146=
Latest checkpoint's NextMultiOffset: =C2=A05591183

You want = test case where NextMultiOffset will be 1?
=C2=A0
<= blockquote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-l= eft:1px solid rgb(204,204,204);padding-left:1ex"> Can we remove MaybeExtendOffsetSlru() now? There are a bunch of other
comments and checks that talk about binary-upgraded values too that we
can hopefully clean up now.
Yes, technically we can. B= ut this is kinda unrelated to the offsets and will make the patch set signi= ficantly complicated, thus more complicated to review and less likely to be= committed. Again, I'm not opposing the idea, I'm not sure if it is= worth to do it right now.
=C2=A0

If we are to parse the member segments in detail in upgrade anyway, I'd=
be tempted to make some further changes / optimizations:

-=C2=A0 You could leave out all locking XID members in upgrade, because they're not relevant after upgrade any more (all the XIDs will be
committed or aborted and have released the locks; we require prepared
transactions to be completed before upgrading too). It'd be enough to <= br> include actual UPDATE/DELETE XIDs.

- The way we determine the length of the members array by looking at the next multixid's offset is a bit complicated. We could have one extra flag per XID in the members to indicate "this is the last member of th= is
multixid". That could either to replace the current mechanism of looki= ng
at the next offset, or be just an additional cross-check.

- Do we still like the "group" representation, with 4 bytes of fl= ags
followed by 4 XIDs? I wonder if it'd be better to just store 5 bytes pe= r
XID unaligned.
Not really. But I would l= eave it for next iteration - switching multi to 64 bit. I already have some= drafts for this. In any case, we'll must do adjustments in pg_upgrade = again. My goal is to move towards 64 XIDs, but with the small steps, and I = plan changes in "group" representation in combination with switch= ing multi to 64 bit. This seems a bit more appropriate in my view.

As for your opti= mization suggestions, I like them. I don=E2=80=99t against them, but I=E2= =80=99m afraid to disrupt the clarity of thought, especially since the algo= rithm is not the simplest.

--
Best regards,
Maxim Orlov.
--000000000000f063850626f5edc1--