Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vi3Qu-00DkCk-0x for pgsql-hackers@arkaria.postgresql.org; Tue, 20 Jan 2026 04:33:36 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vi3Qt-00Ft1j-1I for pgsql-hackers@arkaria.postgresql.org; Tue, 20 Jan 2026 04:33:35 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vi3Qt-00Ft1a-0H for pgsql-hackers@lists.postgresql.org; Tue, 20 Jan 2026 04:33:35 +0000 Received: from mail-dl1-x1236.google.com ([2607:f8b0:4864:20::1236]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1vi3Qr-001S7a-0V for pgsql-hackers@lists.postgresql.org; Tue, 20 Jan 2026 04:33:35 +0000 Received: by mail-dl1-x1236.google.com with SMTP id a92af1059eb24-1232d9f25e9so9667095c88.0 for ; Mon, 19 Jan 2026 20:33:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1768883610; x=1769488410; darn=lists.postgresql.org; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=ksqFMXzy4MZJ0ZAu2NGoc2uHloqpes4o5EhUG9i0jzA=; b=eUJePQZv4gz9UnrqiyVnZwIOzQFT4CCjrGY9mie+M0/+mP8zzoP5B3ZTbQsyrIELIz HwH8gixyAvjr/dQVgkkdERyT/7bfQYUXDsaDH7kNcRDCc7so2Y1wl6mSLBeclJwat2Tg /n8VqaVG9EGfi2MZJB/pwHceOhOXs06PxZlAon7Sv4eFNw9jAZK0U3yuJZzE6VNaNRlj H0ngF8kTvNEAbJglLOQMWD7udlXN8qEdKQUBWKp8VA9MsbpNvKg8CDQWozx0FBX4F99G tJYVkMSVAXrMBygiXsS2tnmA+IEze7ZgWBdgLnbWnaRb2jwBenarSvHujI1wHnaC5r++ fHRw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1768883610; x=1769488410; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:x-gm-gg:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=ksqFMXzy4MZJ0ZAu2NGoc2uHloqpes4o5EhUG9i0jzA=; b=MygduXdcjWqBaGFD9yntTh5gBenN669HOEud57gUHI62SAyOeUyfhdDjBoCiQsM4g6 FvnJ2LSMQtByr0TmkJthXN/iW7wjWe0qdPJpDTulvWW2JeVoQzG+PGMRzSOLnHH8+mSY 3s2My5bZYDnQ7dYs7fzQWXbbOWmtwwEonyjbz2HEv8q+M6oO4VAADJtsykEBpKqq2k6n Xzx5LWgAN02bsF8K7RfaG+Jq4wudKUT3YWtLMwgFth+zjjnGIhg0fuFVgHIBm76ADEnO cozlm0hlPP6nCps71bULtz3AkFzfmcxcG2M8xnbMf3AwV2UohYVPNTdGyoiRDsRHq6Fp kfZA== X-Gm-Message-State: AOJu0Yx/xDXnS821c7egZqJdz5/2b0PCCuGDLjrHlJxBZja7IOwBRz4j +Sldvrh532wQB80k+WksVYsVwpurmDTAqHDy9b4Kj0fpCV5YABJOb1/1 X-Gm-Gg: AY/fxX6R3doLTEM/+Zp9B3zX8866swuvkzs70V6IAkpSM31UhP5f+29J9NeCwjBGuLn zOE8X5YR/gb8idjXVR4F2XlcEZvbVJ6FEFlW81buLUxnx2RRam7wrsxmOHEc2aEMnqlVDzj6fxb egeDdyzsu5BRjpotOwAoB112s+fXhc0jD6VtTA8SXtiFsT8mW8l6gX4DbBBqEhnxlWWfsjrEENo IaGw0GznnQ30XuW/G5+nfesqB90DKakNHHIo5N9Iwr4G/H9zEDFskTrvZ3hAhyu9Pulvxab07sc ph7FdVjI3jdfOHFVnHA4fNC4OVT7+E9CYlq41PP3IE+HSKFXKw0PlnUUKJtaYdCS+f3fi/BVNOq IZDuNp+3EwdpDFHFpuzn3fEYXgG247P+e2O/E3h/dJSt6oVkQxSE9gPnG1yimH/FWb0RRJoR6V9 Mt+Un87fv+wrtp+MyiNfE4YsBLtgJP X-Received: by 2002:a05:7022:6b86:b0:11a:fe6f:806a with SMTP id a92af1059eb24-1244a769e0emr9272997c88.31.1768883610379; Mon, 19 Jan 2026 20:33:30 -0800 (PST) Received: from smtpclient.apple ([64.32.14.230]) by smtp.gmail.com with ESMTPSA id a92af1059eb24-1244ac57fd0sm18164005c88.3.2026.01.19.20.33.28 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 19 Jan 2026 20:33:29 -0800 (PST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3826.700.81.1.4\)) Subject: Re: More speedups for tuple deformation From: Chao Li In-Reply-To: Date: Tue, 20 Jan 2026 12:32:55 +0800 Cc: PostgreSQL Developers Content-Transfer-Encoding: quoted-printable Message-Id: <0663AA4F-74FB-48A5-B77B-3C150445FF2B@gmail.com> References: <9A17C43D-7A28-4885-8974-555A40C9523E@gmail.com> To: David Rowley X-Mailer: Apple Mail (2.3826.700.81.1.4) List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk > On Jan 20, 2026, at 08:11, David Rowley wrote: >=20 > On Mon, 19 Jan 2026 at 18:48, Chao Li wrote: >> I reviewed the patch and traced some basic workflows. But I haven=E2=80= =99t done a load test to compare performance differences with and = without this patch, I will do that if I get some bandwidth later. Here = comes some review comments: >>=20 >> 1 - tupmacs.h >> ``` >> + /* Create a mask with all bits beyond natts's bit set to off = */ >> + mask =3D 0xFF & ((((uint8) 1) << (natts & 7)) - 1); >> + byte =3D (~bits[lastByte]) & mask; >> ``` >>=20 >> When I read the code, I got an impression bits[lastByte] might = overflow when natts % 8 =3D=3D 0, so I traced the code, then I realized = that, this function is only called when a row has null values, so that, = when reaching here, natts % 8 !=3D 0, otherwise it should return earlier = within the for loop. >=20 > It certainly is possible to get to that part of the code when natts is > a multiple of 8 and the tuple contains NULLs after that (we may not be > deforming the entire tuple). The code you quoted that's setting "mask" > in that case will produce a zero mask, resulting in not finding any > NULLs. I don't quite see any risk of overflowing any of the types > here. If natts is 16 then effectively the code does 0xFF & ((1 << 0) > - 1); so no overflow. Just left shift by 0 bits and bitwise AND with > zero, resulting in the mask becoming zero. >=20 > How about if I write the comment as follows? >=20 > /* > * Create a mask with all bits beyond natts's bit set to off. The code > * below will generate a zero mask when natts & 7 =3D=3D 0. When that = happens > * all bytes that need to be checked were done so in the loop above. = The > * code below will create an empty mask and end up returning natts. = This > * has been done to avoid having to write a special case to check if = we've > * covered all bytes already. > */ >=20 I=E2=80=99m sorry I didn=E2=80=99t express myself clearly, maybe I = should have used =E2=80=9COOB=E2=80=9D rather than =E2=80=9Coverflow". = My real concern is about out-of-boundary read of bits[lastByte] when = natts&7=3D=3D0. Say, natts is 16, then bits is 2 bytes long; lastByte =3D 16>>3 =3D 2, = so bits[2] is a OOB read. If first_null_attr() is only called when hasnulls=3D=3Dtrue, then it = will never hit the OOB point, because it will return early from the = =E2=80=9Cfor=E2=80=9D loop. In the current patch, which is true, so the = OOB should never happen. However, I don=E2=80=99t see any comment mentions something like = =E2=80=9Cfirst_null_attr() should only be called when hasnulls is true. = If in future one calls first_null_attr() in a situation where hasnulls = =3D=3D false, then the OOB will be triggered. The comment you added explains that even if OOB happens, no matter what = value is hold by bits[lastByte], because mask is 0, the final result is = still correct, which is true, but OOB is still a concern. If the bits = array happens to end exactly at the edge of a memory page, the OOB read = bits[lastByte] may trigger a segment fault; and valgrind may detect the = OOB and complain about it. So, my original comment was that, we should at least add something to = the header comment to mention =E2=80=9Cfirst_null_attr() should only be = called when hasnulls is true. If we can add an Assert to ensure hasnulls = is true, that would be even better. But if we want first_null_attr() to be safe no matter hasnulls is true = or false, I think we should avoid the OOB. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/