Subject: Re: More speedups for tuple deformation
From: Chao Li
To: David Rowley
Cc: PostgreSQL Developers
Date: Mon, 19 Jan 2026 13:47:47 +0800
Message-Id: <9A17C43D-7A28-4885-8974-555A40C9523E@gmail.com>
> On Jan 19, 2026, at 06:13, David Rowley wrote:
>
> On Fri, 2 Jan 2026 at 18:58, David Rowley wrote:
>> Please find attached an updated set of patches. A rebase was needed,
>> plus 0003 had a problem with an Assert not handling the bitmap being a
>> NULL pointer.
>
> Another rebase and updates to some newly created missing calls to
> TupleDescFinalize().
>
> I've also attached another round of benchmarks after dipping into some
> Azure machines to cover my lack of any Intel benchmark results. I
> think these are somewhat noisy as I opted for low core-count instances
> which will have L3 shared with workloads running for other people.
> This is most evident in Xeon_E5-2673 with gcc where the patched run
> was nearly twice as fast as unpatched for test 2 on 20 extra columns.
> If you look at the raw results from that, you can see the times are
> quite unstable between the 3 runs of each test, which makes me believe
> that the machine was busy with other work when that test ran on
> master. The AMD3990x and M2 machines are all sitting next to me and
> were otherwise idle, so they should be much more stable.
>
> Quite a few machines have a small regression for the 0 extra column
> tests. There is a small amount of extra work being done in the
> deforming function to check if the attnum < the first attribute
> without an attcacheoff. This mostly only affects the tests that don't
> do any deforming with a cached attcacheoff, e.g due to NULLs or
> varlena types. The only way I've thought about to possibly reduce that
> is to invent a new TupleTableSlotOps and pick the one that applies
> when creating the TupleTableSlot. This doesn't appeal to me very much
> as it requires modifying many callsites.
> But I do wonder if we should
> try to come up with something here as technically we could use this to
> eliminate alignment padding out of some MinimalTuples in some cases
> where these were not directly derived from pre-formed HeapTuples. That
> could allow a more compact tuple representation for sorting and
> hashing, allowing us to do more with less memory in some cases.
>
> The benchmark results also indicated that there wasn't much advantage
> to the 0002+0003 patches, so I've removed those from the set. That
> reduces some complexity around the benchmarks. I did still keep the
> OPTIMIZE_BYVAL loop as separate results. It's not quite clear what's
> best there as machines seem to vary on which they prefer.
>
> Benchmark results attached in the bz2 file both in spreadsheet form
> and the raw results pg_dumped.
>
> David
>

Hi David,

I reviewed the patch and traced through some basic workflows. I haven’t
yet run a load test to compare performance with and without the patch; I
will do that if I get some bandwidth later. Here are some review
comments:

1 - tupmacs.h
```
+ /* Create a mask with all bits beyond natts's bit set to off */
+ mask = 0xFF & ((((uint8) 1) << (natts & 7)) - 1);
+ byte = (~bits[lastByte]) & mask;
```

When I first read this code, I had the impression that bits[lastByte]
might read past the end of the bitmap when natts % 8 == 0. After tracing
the code, I realized that this function is only called when a row has
NULL values, so when execution reaches this point natts % 8 != 0;
otherwise the function would already have returned from within the for
loop.

To spare future readers the same confusion, can we add a brief comment
explaining why no overrun can happen here?
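To make the case I was worried about concrete, here is a minimal
standalone sketch. It is not the patch's code: the function name
first_null_attr_sketch(), the helper lowest_set_bit(), and the assumption
that lastByte is natts >> 3 are all mine, and I use an explicit guard
where the patch relies on the caller only passing tuples that contain a
NULL. As in the heap tuple null bitmap, a set bit means "not NULL".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: position of the lowest set bit in a non-zero byte. */
static int
lowest_set_bit(uint8_t byte)
{
	int			pos = 0;

	assert(byte != 0);
	while ((byte & 1) == 0)
	{
		byte >>= 1;
		pos++;
	}
	return pos;
}

/*
 * Hypothetical stand-in for the patched function: return the index of the
 * first NULL attribute, or natts if no attribute is NULL.
 */
static int
first_null_attr_sketch(const uint8_t *bits, int natts)
{
	int			lastByte = natts >> 3;	/* assumed: count of full bytes */
	uint8_t		mask;
	uint8_t		byte;

	assert(natts >= 0);

	/* Whole bytes: any zero bit marks a NULL attribute. */
	for (int i = 0; i < lastByte; i++)
	{
		byte = (uint8_t) ~bits[i];
		if (byte != 0)
			return i * 8 + lowest_set_bit(byte);
	}

	/*
	 * When natts is a multiple of 8 there is no partial byte, and
	 * bits[lastByte] would lie one past the end of the bitmap.  The mask
	 * would be 0 anyway, so skip the read entirely.
	 */
	if ((natts & 7) == 0)
		return natts;

	/* Partial byte: ignore the bits at and beyond natts. */
	mask = (uint8_t) ((1u << (natts & 7)) - 1);
	byte = (uint8_t) (~bits[lastByte] & mask);

	return byte != 0 ? lastByte * 8 + lowest_set_bit(byte) : natts;
}
```

With natts = 8 and a one-byte bitmap, the sketch never touches bits[1].
If the patch keeps the unguarded read, a comment stating the "caller
guarantees at least one NULL" precondition would serve the same purpose.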

2 - After this patch, nocachegetattr() and nocache_index_getattr() rely
strictly on tupleDesc->firstNonCachedOffAttr:
```
    if (tupleDesc->firstNonCachedOffAttr >= 0)
    {
        startAttr = Min(tupleDesc->firstNonCachedOffAttr - 1, firstnullattr);
        off = TupleDescCompactAttr(tupleDesc, startAttr)->attcacheoff;
    }
    else
    {
        startAttr = 0;
        off = 0;
    }
```

And tupleDesc->firstNonCachedOffAttr is only set by TupleDescFinalize().
So suppose some code fails to call TupleDescFinalize(). Looking at how a
TupleDesc is created, for example in CreateTemplateTupleDesc():
```
    desc = (TupleDesc) palloc(offsetof(struct TupleDescData, compact_attrs) +
                              natts * sizeof(CompactAttribute) +
                              natts * sizeof(FormData_pg_attribute));

    /*
     * Initialize other fields of the tupdesc.
     */
    desc->natts = natts;
    desc->constr = NULL;
    desc->tdtypeid = RECORDOID;
    desc->tdtypmod = -1;
    desc->tdrefcount = -1;      /* assume not reference-counted */

    return desc;
```

It uses palloc, not palloc0, so desc->firstNonCachedOffAttr initially
holds an arbitrary value. If the TupleDescFinalize() call is missed, that
is a bug.

From this perspective, I think we can set firstNonCachedOffAttr to -2 in
CreateTemplateTupleDesc() as well as in the other functions that create a
TupleDesc. Then in nocachegetattr() and nocache_index_getattr(), we can
Assert(desc->firstNonCachedOffAttr > -2).

3
```
+ firstNonCachedOffAttr = i + 1;
```

In TupleDescFinalize(), given firstNonCachedOffAttr = i + 1,
firstNonCachedOffAttr will never be 0.

But nocachegetattr() checks firstNonCachedOffAttr >= 0:
```
    if (tupleDesc->firstNonCachedOffAttr >= 0)
    {
        startAttr = Min(tupleDesc->firstNonCachedOffAttr - 1, firstnullattr);
        off = TupleDescCompactAttr(tupleDesc, startAttr)->attcacheoff;
    }
```

This is somewhat inconsistent and may confuse readers of the code.

From the meaning of the variable name “firstNonCachedOffAttr”, when
there is no cached attribute it feels more natural for
firstNonCachedOffAttr to be 0 rather than -1. From this perspective,
TupleDescFinalize() can initialize desc->firstNonCachedOffAttr to 0. And
for my comment 2, we can then use -1 instead of -2, so that -1 indicates
TupleDescFinalize() has not been called, 0 means no cached attribute, and
>0 means some cached attributes.

4
```
+ * possibily could cache the first attlen == -2 attr. Worthwhile?
```

Typo: possibily -> possibly

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
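
P.S. To make the convention proposed in comments 2 and 3 concrete, here
is a minimal standalone sketch. Everything in it is hypothetical and
simplified: SketchTupleDesc, TUPDESC_NOT_FINALIZED and the sketch_*
functions are names I made up, plain assert() stands in for Assert(), and
"cacheable" is reduced to "every attribute so far has a positive attlen",
which ignores what the real TupleDescFinalize() has to consider.

```c
#include <assert.h>

/* Hypothetical sentinel: TupleDescFinalize() has not been called yet. */
#define TUPDESC_NOT_FINALIZED	(-1)

/* Hypothetical, heavily reduced stand-in for TupleDescData. */
typedef struct SketchTupleDesc
{
	int			natts;
	int			firstNonCachedOffAttr;	/* -1 unset, 0 none cached, >0 some */
} SketchTupleDesc;

/* Creation: memory is not zeroed, so store the sentinel explicitly. */
static void
sketch_create(SketchTupleDesc *desc, int natts)
{
	desc->natts = natts;
	desc->firstNonCachedOffAttr = TUPDESC_NOT_FINALIZED;
}

/* Finalize: start from 0 and advance past each leading cacheable attribute. */
static void
sketch_finalize(SketchTupleDesc *desc, const int *attlen)
{
	desc->firstNonCachedOffAttr = 0;
	for (int i = 0; i < desc->natts; i++)
	{
		if (attlen[i] <= 0)
			break;				/* varlena or cstring ends the cached run */
		desc->firstNonCachedOffAttr = i + 1;
	}
}

/* Consumer: the starting attribute, as in nocachegetattr()'s first step. */
static int
sketch_start_attr(const SketchTupleDesc *desc, int firstnullattr)
{
	/* Catches any creation path that forgot to finalize. */
	assert(desc->firstNonCachedOffAttr > TUPDESC_NOT_FINALIZED);

	if (desc->firstNonCachedOffAttr > 0)
	{
		int			lastCached = desc->firstNonCachedOffAttr - 1;

		return lastCached < firstnullattr ? lastCached : firstnullattr;
	}
	return 0;
}
```

With this layout the check in the consumer becomes "> 0" instead of
">= 0", and a descriptor that was created but never finalized trips the
assertion in assert-enabled builds instead of silently using an arbitrary
value.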