Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wFqpa-005czA-3D for pgsql-bugs@arkaria.postgresql.org; Thu, 23 Apr 2026 09:58:47 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1wFqpa-0019MO-0y for pgsql-bugs@arkaria.postgresql.org; Thu, 23 Apr 2026 09:58:46 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wFqpZ-0019MF-2s for pgsql-bugs@lists.postgresql.org; Thu, 23 Apr 2026 09:58:46 +0000 Received: from forwardcorp1a.mail.yandex.net ([178.154.239.72]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.98.2) (envelope-from ) id 1wFqpX-00000002d8D-20Uc for pgsql-bugs@lists.postgresql.org; Thu, 23 Apr 2026 09:58:45 +0000 Received: from mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net (mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net [IPv6:2a02:6b8:c2d:3530:0:640:eca4:0]) by forwardcorp1a.mail.yandex.net (Yandex) with ESMTPS id DA0ECC0159; Thu, 23 Apr 2026 12:58:41 +0300 (MSK) Received: from smtpclient.apple (unknown [2a02:6bf:8080:48::1:35]) by mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net (smtpcorp) with ESMTPSA id ewQEo20M5Cg0-sPn5nHpR; Thu, 23 Apr 2026 12:58:41 +0300 X-Yandex-Fwd: 1 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex-team.ru; s=default; t=1776938321; bh=b6XsdpwXht90ix2H6QnVe8FtGJihxhtGkmDTAO5PKkY=; h=Message-Id:To:Date:References:Cc:In-Reply-To:From:Subject; b=0oJ2o+rW+OZzUf4jsWGOVZ1BGUPRyp+qIkvulUHh2hkT/B8u1kOhJLl0TnbJ9AVGQ CrcmUkJsj8iq9Rlt3/tSInybUfsXTv+QPtatNmC9eIVhOGlbHGKCEQaZbOEabsOyNi y9N/MfocGGIHd+VXZXUWwvhGmch1vMjRn/d02Y6g= Authentication-Results: mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net; dkim=pass header.i=@yandex-team.ru Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3864.500.181\)) Subject: Re: Potential buffer overrun in spell.c's CheckAffix() From: Andrey Borodin In-Reply-To: <959933.1776865480@sss.pgh.pa.us> Date: Thu, 23 Apr 2026 14:58:30 +0500 Cc: PostgreSQL mailing lists Content-Transfer-Encoding: quoted-printable Message-Id: References: <641711.1776792744@sss.pgh.pa.us> <959933.1776865480@sss.pgh.pa.us> To: Tom Lane X-Mailer: Apple Mail (2.3864.500.181) List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk > On 22 Apr 2026, at 18:44, Tom Lane wrote: >=20 > Andrey Borodin writes: >>> On 21 Apr 2026, at 22:32, Tom Lane wrote: >>> + /* protect against buffer overrun */ >>> + if (len < Affix->replen || len >=3D 2 * MAXNORMLEN || >>> + len - Affix->replen + findlen >=3D 2 * MAXNORMLEN) >>> + return NULL; >>> + >>> strcpy(newword, word); >>> strcpy(newword + len - Affix->replen, Affix->find); >=20 >> Is there a reason for an asymmetric check "len >=3D 2 * MAXNORMLEN = ||=E2=80=9D? >=20 > Yes. Because of that initial "strcpy(newword, word);", we do actually > need "word" to fit in the output buffer, even if the final output > string is shorter. The other path does not require that. >=20 > I suppose we could replace the strcpy with >=20 > memcpy(newword, word, len - Affix->replen); >=20 > and then we would not need the "len >=3D 2 * MAXNORMLEN" test > and both paths could share the same check. There's something > to be said for that, though it would be changing the logic to > a greater extent than just "add some safety checks". The argument about not changing behavior in back branches is very = convincing. But IMO v2 of the patch is better. Also I think changes are correct. >=20 >>> I chose to do this by silently truncating the input before it can >>> overrun the buffer, using logic comparable to the existing logic in >>> get_nextfield(). Certainly there's at least as good an argument for >>> raising an error, but for now let's follow the existing precedent. >=20 >> Is there a reason not to emit WARNING? The data is obviously = suspicious=E2=80=A6 >> Perhaps, there=E2=80=99s a reason, so maybe just document it then. >=20 > I could agree with changing all of these cases (including the existing > get_nextfield checks) to throw ERROR; but I don't think that's > something to do in a back-patched bug fix. Makes sense. >=20 > Another thing I don't love, but wouldn't change in back branches, > is the use of BUFSIZ for the string lengths. That's far more than > necessary (why not just MAXNORMLEN?), causing these functions to eat > much more stack space than they need. Also it seems like an > unnecessary platform dependency. Maybe BUFSIZ is 8K everywhere, > but I'm not sure of that. On my machine (MacOS) BUFSIZ is 1Kb. Yes, 40Kb in NIImportOOAffixes() is a lot. But is it important in grand = scheme of things? Minimum max_stack_depth is 100Kb, ought to be enough=E2=80=A6 Perhaps, =E2=80=9Creplen" and =E2=80=9Cfind" should not exceed = MAXNORMLEN. My limited understanding of affixes is not enough to confidently tell = that MAXNORMLEN is the limit. e.g. I see this: * newword: output buffer (MUST be of length 2 * MAXNORMLEN) So general rule =E2=80=9CMAXNORMLEN is an upper bound everywhere=E2=80=9D = is not uphold. I=E2=80=99m still under impression of understanding why NUM_MAX_ITEM_SIZ = =3D=3D 8 in the nearby thread. Now I know a lot more about Roman numbers. Digging deeper here might be a similar rabbit hole :) Best regards, Andrey Borodin.=