Message-ID: <0794857c-aaaf-4cd3-bd99-84c6155bb2f6@iki.fi>
Date: Thu, 30 Oct 2025 11:10:43 +0200
Subject: Re: POC: make mxidoff 64 bits
From: Heikki Linnakangas
To: Maxim Orlov
Cc: wenhui qiu, Alexander Korotkov, Ashutosh Bapat, Postgres hackers

On 30/10/2025 08:13, Maxim Orlov wrote:
> On Tue, 28 Oct 2025 at 17:17, Heikki Linnakangas wrote:
>
>     On 27/10/2025 17:54, Maxim Orlov wrote:
>
>     If backend C looks up multixid 101 in between steps 3 and 4, it would
>     read the offset incorrectly, because 'base' isn't set yet.
>
> Hmm, maybe I miss something? We set page base on first write of any
> offset on the page, not only the first one. In other words, there
> should never be a case when we read an offset without a previously
> defined page base. Correct me if I'm wrong:
> 1. Backend A assigned mxact=100, offset=1000.
> 2. Backend B assigned mxact=101, offset=1010.
> 3. Backend B calls RecordNewMultiXact()/MXOffsetWrite() and
>     set page base=1010, offset plus 0^0x80000000 bit while
>     holding lock on the page.
> 4. Backend C looks up for the mxact=101 by calling MXOffsetRead()
>     and should get exactly what he's looking for:
>     base (1010) + offset (0) minus 0x80000000 bit.
> 5. Backend A calls RecordNewMultiXact() and sets his offset using
>     existing base from step 3.

Oh I see, the 'base' is not necessarily the base offset of the first
multixact on the page, it's the base offset of the first multixid that
is written to the page. And the (short) offsets can be negative. That's
a frighteningly clever encoding scheme. One upshot of that is that WAL
redo might construct the page with a different 'base'. I guess that
works, but it scares me. Could we come up with a more deterministic
scheme?

- Heikki
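
For illustration, here is a minimal Python sketch of the encoding as described in the steps above. It is a toy model, not the patch's code: the class OffsetPage, its methods, the 31-bit signed delta and the dict of slots are all assumptions made up for this example. Only the ideas come from the thread: the page base is fixed by whichever offset is written first, the 0x80000000 bit marks a slot as set, and the short offsets can be negative. The redo ordering shown is likewise an assumption, chosen to show how replay could end up with a different 'base'.

```python
SET_BIT = 0x80000000   # marks a slot as written, as in the quoted steps
DELTA_MASK = 0x7FFFFFFF  # assumed: remaining 31 bits hold a signed delta


class OffsetPage:
    """Hypothetical model of one offsets page: one wide base plus short deltas."""

    def __init__(self):
        self.base = None   # set by the first write that touches the page
        self.slots = {}    # multixid -> encoded 32-bit slot value

    def write(self, mxid, offset):
        if self.base is None:
            # Whoever writes first decides the base, not the lowest multixid.
            self.base = offset
        delta = offset - self.base
        if not -(1 << 30) <= delta < (1 << 30):
            raise ValueError("delta does not fit in 31 signed bits")
        self.slots[mxid] = (delta & DELTA_MASK) | SET_BIT

    def read(self, mxid):
        raw = self.slots.get(mxid, 0)
        if not raw & SET_BIT:
            return None  # slot not written yet
        delta = raw & DELTA_MASK
        if delta >= (1 << 30):
            delta -= (1 << 31)  # sign-extend the 31-bit delta
        return self.base + delta


# Live order from the quoted steps: backend B (mxid 101) writes before A.
live = OffsetPage()
live.write(101, 1010)
live.write(100, 1000)   # stored as a negative delta relative to base 1010

# Assumed replay order: A's record is applied first, so the base differs.
redo = OffsetPage()
redo.write(100, 1000)
redo.write(101, 1010)
```

Both pages decode to the same offsets, but their raw contents differ (base 1010 with a delta of -10 versus base 1000 with a delta of +10), which is the non-determinism in question.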