Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wFXsq-005Hhh-1p for pgsql-bugs@arkaria.postgresql.org; Wed, 22 Apr 2026 13:44:52 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1wFXsn-00DrZE-2U for pgsql-bugs@arkaria.postgresql.org; Wed, 22 Apr 2026 13:44:49 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wFXsn-00DrYy-1f for pgsql-bugs@lists.postgresql.org; Wed, 22 Apr 2026 13:44:49 +0000 Received: from sss.pgh.pa.us ([68.162.161.243]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.98.2) (envelope-from ) id 1wFXsl-00000002G8T-0SYB for pgsql-bugs@lists.postgresql.org; Wed, 22 Apr 2026 13:44:48 +0000 Received: from sss1.sss.pgh.pa.us (localhost [127.0.0.1]) by sss.pgh.pa.us (8.15.2/8.15.2) with ESMTP id 63MDieGp959934; Wed, 22 Apr 2026 09:44:41 -0400 From: Tom Lane To: Andrey Borodin cc: PostgreSQL mailing lists Subject: Re: Potential buffer overrun in spell.c's CheckAffix() In-reply-to: References: <641711.1776792744@sss.pgh.pa.us> Comments: In-reply-to Andrey Borodin message dated "Wed, 22 Apr 2026 16:57:26 +0500" MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-ID: <959932.1776865480.1@sss.pgh.pa.us> Content-Transfer-Encoding: 8bit Date: Wed, 22 Apr 2026 09:44:40 -0400 Message-ID: <959933.1776865480@sss.pgh.pa.us> List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Andrey Borodin writes: >> On 21 Apr 2026, at 22:32, Tom Lane wrote: >> + /* protect against buffer overrun */ >> + if (len < Affix->replen || len >= 2 * MAXNORMLEN || >> + len - Affix->replen + findlen >= 2 * MAXNORMLEN) >> + return NULL; >> + >> strcpy(newword, word); >> strcpy(newword + len - Affix->replen, Affix->find); > Is there a reason for an asymmetric check "len >= 2 * MAXNORMLEN ||”? Yes. Because of that initial "strcpy(newword, word);", we do actually need "word" to fit in the output buffer, even if the final output string is shorter. The other path does not require that. I suppose we could replace the strcpy with memcpy(newword, word, len - Affix->replen); and then we would not need the "len >= 2 * MAXNORMLEN" test and both paths could share the same check. There's something to be said for that, though it would be changing the logic to a greater extent than just "add some safety checks". >> I chose to do this by silently truncating the input before it can >> overrun the buffer, using logic comparable to the existing logic in >> get_nextfield(). Certainly there's at least as good an argument for >> raising an error, but for now let's follow the existing precedent. > Is there a reason not to emit WARNING? The data is obviously suspicious… > Perhaps, there’s a reason, so maybe just document it then. I could agree with changing all of these cases (including the existing get_nextfield checks) to throw ERROR; but I don't think that's something to do in a back-patched bug fix. Another thing I don't love, but wouldn't change in back branches, is the use of BUFSIZ for the string lengths. That's far more than necessary (why not just MAXNORMLEN?), causing these functions to eat much more stack space than they need. Also it seems like an unnecessary platform dependency. Maybe BUFSIZ is 8K everywhere, but I'm not sure of that. regards, tom lane