Potential buffer overrun in spell.c's CheckAffix()

public inbox for [email protected]  
help / color / mirror / Atom feed

Potential buffer overrun in spell.c's CheckAffix()
5+ messages / 2 participants
[nested] [flat]

* Potential buffer overrun in spell.c's CheckAffix()
@ 2026-04-21 17:32  Tom Lane <[email protected]>
  0 siblings, 2 replies; 5+ messages in thread

From: Tom Lane @ 2026-04-21 17:32 UTC (permalink / raw)
  To: [email protected]

CheckAffix is used by our ispell text search dictionaries to attach a
prefix or suffix to a given base word.  The input word is known to be
no longer than MAXNORMLEN (256), and an output buffer of size
MAXNORMLEN * 2 is provided.  But there's not any a-priori limit on the
length of a prefix or suffix string, so in principle a buffer overflow
could occur.

In practice these limits seem like more than plenty for any real-world
word, so I think it's sufficient to just reject the prefix or suffix
if an overflow would occur, as attached.

This bug was reported to pgsql-security by Xint Code as a potential
security issue.  However we decided it doesn't seem worth the CVE
treatment, because exploiting it would require getting a malicious
ispell dictionary installed in a PG server.  Putting the .dict file
into the installation's file tree would require superuser privileges,
and so would creating a text dictionary SQL object that references it.
Maybe an attacker could persuade a gullible DBA to do that, but there
are plenty of other attack pathways available if you're that
persuasive.

Despite that, it seems worth fixing as a run-of-the-mill bug.
Any objections to the attached?

			regards, tom lane

Attachments:

  [text/x-diff] v1-0001-Prevent-buffer-overrun-in-spell.c-s-CheckAffix.patch (3.3K, 2-v1-0001-Prevent-buffer-overrun-in-spell.c-s-CheckAffix.patch)
  download | inline diff:
From 977c82d2501465789ccda052f0b718183e89d816 Mon Sep 17 00:00:00 2001
From: Tom Lane <[email protected]>
Date: Sun, 12 Apr 2026 20:42:15 -0400
Subject: [PATCH v1] Prevent buffer overrun in spell.c's CheckAffix().

This function writes into a caller-supplied buffer of length
2 * MAXNORMLEN, which should be plenty in real-world cases.
However a malicious affix file could supply an affix long
enough to overrun that.  Defend by just rejecting the match
if it would overrun the buffer.  I also inserted checks of
the input word length against Affix->replen, just to be sure
we won't index off the buffer, though it would be caller error
for that not to be true.

The lack of documentation in this code makes my head hurt, so
I also reverse-engineered a basic header comment for CheckAffix.

Reported-by: Xint Code
Author: Tom Lane <[email protected]>
Backpatch-through: 14
---
 src/backend/tsearch/spell.c | 34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c
index a1bfd2a9f9b..79a2a459406 100644
--- a/src/backend/tsearch/spell.c
+++ b/src/backend/tsearch/spell.c
@@ -2065,9 +2065,29 @@ FindAffixes(AffixNode *node, const char *word, int wrdlen, int *level, int type)
 	return NULL;
 }

+/*
+ * Checks to see if affix applies to word, transforms word if so.
+ *
+ * word: input word
+ * len: length of input word
+ * Affix: affix to consider
+ * flagflags: context flags showing whether we are handling a compound word
+ * newword: output buffer (MUST be of length 2 * MAXNORMLEN)
+ * baselen: input/output argument
+ *
+ * If baselen isn't NULL, then *baselen is used to return the length of
+ * the non-changed part of the word when applying a suffix, and is used
+ * to detect whether the input contained only a prefix and suffix when
+ * later applying a prefix.
+ *
+ * Returns newword on success, or NULL if the affix can't be applied.
+ * On success, the modified word is stored into newword.
+ */
 static char *
 CheckAffix(const char *word, size_t len, AFFIX *Affix, int flagflags, char *newword, int *baselen)
 {
+	size_t		findlen;
+
 	/*
 	 * Check compound allow flags
 	 */
@@ -2103,8 +2123,15 @@ CheckAffix(const char *word, size_t len, AFFIX *Affix, int flagflags, char *neww
 	/*
 	 * make replace pattern of affix
 	 */
+	Assert(len == strlen(word));
+	findlen = strlen(Affix->find);
 	if (Affix->type == FF_SUFFIX)
 	{
+		/* protect against buffer overrun */
+		if (len < Affix->replen || len >= 2 * MAXNORMLEN ||
+			len - Affix->replen + findlen >= 2 * MAXNORMLEN)
+			return NULL;
+
 		strcpy(newword, word);
 		strcpy(newword + len - Affix->replen, Affix->find);
 		if (baselen)			/* store length of non-changed part of word */
@@ -2112,11 +2139,16 @@ CheckAffix(const char *word, size_t len, AFFIX *Affix, int flagflags, char *neww
 	}
 	else
 	{
+		/* protect against buffer overrun */
+		if (len < Affix->replen ||
+			findlen + len - Affix->replen >= 2 * MAXNORMLEN)
+			return NULL;
+
 		/*
 		 * if prefix is an all non-changed part's length then all word
 		 * contains only prefix and suffix, so out
 		 */
-		if (baselen && *baselen + strlen(Affix->find) <= Affix->replen)
+		if (baselen && *baselen + findlen <= Affix->replen)
 			return NULL;
 		strcpy(newword, Affix->find);
 		strcat(newword, word + Affix->replen);
-- 
2.43.7

^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Potential buffer overrun in spell.c's CheckAffix()
@ 2026-04-21 22:35  Tom Lane <[email protected]>
  parent: Tom Lane <[email protected]>
  1 sibling, 0 replies; 5+ messages in thread

From: Tom Lane @ 2026-04-21 22:35 UTC (permalink / raw)
  To: [email protected]

Further to that ... I found another item in the pgsql-security
archives concerning a buffer overrun in ispell affix-file parsing,
which we had likewise deemed not a security vulnerability because
text search configuration files are assumed trustworthy.
But if we're going to tighten up CheckAffix() then it's pretty
silly not to fix these issues too.

			regards, tom lane



Attachments:

  [text/x-diff] v1-0001-Prevent-some-buffer-overruns-in-spell.c-s-parsing.patch (4.7K, 2-v1-0001-Prevent-some-buffer-overruns-in-spell.c-s-parsing.patch)
  download | inline diff:
From 740e9b9887dc47d8f12745ade91839ffe27e40d2 Mon Sep 17 00:00:00 2001
From: Tom Lane <[email protected]>
Date: Tue, 21 Apr 2026 18:07:23 -0400
Subject: [PATCH v1] Prevent some buffer overruns in spell.c's parsing of affix
 files.

parse_affentry() and addCompoundAffixFlagValue() each collect fields
from an affix file into working buffers of size BUFSIZ.  They failed
to defend against overlength fields, so that a malicious affix file
could cause a stack smash.  BUFSIZ (typically 8K) is certainly way
longer than any reasonable affix field, but let's fix this while
we're closing holes in this area.

I chose to do this by silently truncating the input before it can
overrun the buffer, using logic comparable to the existing logic in
get_nextfield().  Certainly there's at least as good an argument for
raising an error, but for now let's follow the existing precedent.

Reported-by: Igor Stepansky <[email protected]>
Author: Tom Lane <[email protected]>
Backpatch-through: 14
---
 src/backend/tsearch/spell.c | 34 ++++++++++++++++++++++++----------
 1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c
index a1bfd2a9f9b..dced5c444e0 100644
--- a/src/backend/tsearch/spell.c
+++ b/src/backend/tsearch/spell.c
@@ -909,14 +909,20 @@ parse_ooaffentry(char *str, char *type, char *flag, char *find,
  *
  * An .affix file entry has the following format:
  * <mask>  >  [-<find>,]<replace>
+ *
+ * Output buffers mask, find, repl must be of length BUFSIZ;
+ * we truncate the input to fit.
  */
 static bool
-parse_affentry(char *str, char *mask, char *find, char *repl)
+parse_affentry(const char *str, char *mask, char *find, char *repl)
 {
 	int			state = PAE_WAIT_MASK;
 	char	   *pmask = mask,
 			   *pfind = find,
 			   *prepl = repl;
+	char	   *emask = mask + BUFSIZ;
+	char	   *efind = find + BUFSIZ;
+	char	   *erepl = repl + BUFSIZ;
 
 	*mask = *find = *repl = '\0';
 
@@ -930,7 +936,8 @@ parse_affentry(char *str, char *mask, char *find, char *repl)
 				return false;
 			else if (!isspace((unsigned char) *str))
 			{
-				pmask += ts_copychar_with_len(pmask, str, clen);
+				if (pmask < emask - clen)
+					pmask += ts_copychar_with_len(pmask, str, clen);
 				state = PAE_INMASK;
 			}
 		}
@@ -943,7 +950,8 @@ parse_affentry(char *str, char *mask, char *find, char *repl)
 			}
 			else if (!isspace((unsigned char) *str))
 			{
-				pmask += ts_copychar_with_len(pmask, str, clen);
+				if (pmask < emask - clen)
+					pmask += ts_copychar_with_len(pmask, str, clen);
 			}
 		}
 		else if (state == PAE_WAIT_FIND)
@@ -954,7 +962,8 @@ parse_affentry(char *str, char *mask, char *find, char *repl)
 			}
 			else if (t_isalpha_cstr(str) || t_iseq(str, '\'') /* english 's */ )
 			{
-				prepl += ts_copychar_with_len(prepl, str, clen);
+				if (prepl < erepl - clen)
+					prepl += ts_copychar_with_len(prepl, str, clen);
 				state = PAE_INREPL;
 			}
 			else if (!isspace((unsigned char) *str))
@@ -971,7 +980,8 @@ parse_affentry(char *str, char *mask, char *find, char *repl)
 			}
 			else if (t_isalpha_cstr(str))
 			{
-				pfind += ts_copychar_with_len(pfind, str, clen);
+				if (pfind < efind - clen)
+					pfind += ts_copychar_with_len(pfind, str, clen);
 			}
 			else if (!isspace((unsigned char) *str))
 				ereport(ERROR,
@@ -986,7 +996,8 @@ parse_affentry(char *str, char *mask, char *find, char *repl)
 			}
 			else if (t_isalpha_cstr(str))
 			{
-				prepl += ts_copychar_with_len(prepl, str, clen);
+				if (prepl < erepl - clen)
+					prepl += ts_copychar_with_len(prepl, str, clen);
 				state = PAE_INREPL;
 			}
 			else if (!isspace((unsigned char) *str))
@@ -1003,7 +1014,8 @@ parse_affentry(char *str, char *mask, char *find, char *repl)
 			}
 			else if (t_isalpha_cstr(str))
 			{
-				prepl += ts_copychar_with_len(prepl, str, clen);
+				if (prepl < erepl - clen)
+					prepl += ts_copychar_with_len(prepl, str, clen);
 			}
 			else if (!isspace((unsigned char) *str))
 				ereport(ERROR,
@@ -1061,7 +1073,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry,
  * val: affix parameter.
  */
 static void
-addCompoundAffixFlagValue(IspellDict *Conf, char *s, uint32 val)
+addCompoundAffixFlagValue(IspellDict *Conf, const char *s, uint32 val)
 {
 	CompoundAffixFlag *newValue;
 	char		sbuf[BUFSIZ];
@@ -1079,9 +1091,11 @@ addCompoundAffixFlagValue(IspellDict *Conf, char *s, uint32 val)
 	sflag = sbuf;
 	while (*s && !isspace((unsigned char) *s) && *s != '\n')
 	{
-		int			clen = ts_copychar_cstr(sflag, s);
+		int			clen = pg_mblen_cstr(s);
 
-		sflag += clen;
+		/* Truncate the input to fit in BUFSIZ */
+		if (sflag < sbuf + BUFSIZ - clen)
+			sflag += ts_copychar_with_len(sflag, s, clen);
 		s += clen;
 	}
 	*sflag = '\0';
-- 
2.43.7



^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Potential buffer overrun in spell.c's CheckAffix()
@ 2026-04-22 11:57  Andrey Borodin <[email protected]>
  parent: Tom Lane <[email protected]>
  1 sibling, 1 reply; 5+ messages in thread

From: Andrey Borodin @ 2026-04-22 11:57 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: PostgreSQL mailing lists <[email protected]>



> On 21 Apr 2026, at 22:32, Tom Lane <[email protected]> wrote:
> 
> if (Affix->type == FF_SUFFIX)
> {
> + /* protect against buffer overrun */
> + if (len < Affix->replen || len >= 2 * MAXNORMLEN ||
> + len - Affix->replen + findlen >= 2 * MAXNORMLEN)
> + return NULL;
> +
> strcpy(newword, word);
> strcpy(newword + len - Affix->replen, Affix->find);
> if (baselen) /* store length of non-changed part of word */
> @@ -2112,11 +2139,16 @@ CheckAffix(const char *word, size_t len, AFFIX *Affix, int flagflags, char *neww
> }
> else
> {
> + /* protect against buffer overrun */
> + if (len < Affix->replen ||
> + findlen + len - Affix->replen >= 2 * MAXNORMLEN)
> + return NULL;

Is there a reason for an asymmetric check "len >= 2 * MAXNORMLEN ||”?
Both cases seem symmetrical and we could move it out of “if".


> On 22 Apr 2026, at 03:35, Tom Lane <[email protected]> wrote:
> 
> I chose to do this by silently truncating the input before it can
> overrun the buffer, using logic comparable to the existing logic in
> get_nextfield().  Certainly there's at least as good an argument for
> raising an error, but for now let's follow the existing precedent.

Is there a reason not to emit WARNING? The data is obviously suspicious…
Perhaps, there’s a reason, so maybe just document it then.

Both patches look good to me, AFAICT.


Best regards, Andrey Borodin.





^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Potential buffer overrun in spell.c's CheckAffix()
@ 2026-04-22 13:44  Tom Lane <[email protected]>
  parent: Andrey Borodin <[email protected]>
  0 siblings, 1 reply; 5+ messages in thread

From: Tom Lane @ 2026-04-22 13:44 UTC (permalink / raw)
  To: Andrey Borodin <[email protected]>; +Cc: PostgreSQL mailing lists <[email protected]>

Andrey Borodin <[email protected]> writes:
>> On 21 Apr 2026, at 22:32, Tom Lane <[email protected]> wrote:
>> + /* protect against buffer overrun */
>> + if (len < Affix->replen || len >= 2 * MAXNORMLEN ||
>> + len - Affix->replen + findlen >= 2 * MAXNORMLEN)
>> + return NULL;
>> +
>> strcpy(newword, word);
>> strcpy(newword + len - Affix->replen, Affix->find);

> Is there a reason for an asymmetric check "len >= 2 * MAXNORMLEN ||”?

Yes.  Because of that initial "strcpy(newword, word);", we do actually
need "word" to fit in the output buffer, even if the final output
string is shorter.  The other path does not require that.

I suppose we could replace the strcpy with

	memcpy(newword, word, len - Affix->replen);

and then we would not need the "len >= 2 * MAXNORMLEN" test
and both paths could share the same check.  There's something
to be said for that, though it would be changing the logic to
a greater extent than just "add some safety checks".

>> I chose to do this by silently truncating the input before it can
>> overrun the buffer, using logic comparable to the existing logic in
>> get_nextfield().  Certainly there's at least as good an argument for
>> raising an error, but for now let's follow the existing precedent.

> Is there a reason not to emit WARNING? The data is obviously suspicious…
> Perhaps, there’s a reason, so maybe just document it then.

I could agree with changing all of these cases (including the existing
get_nextfield checks) to throw ERROR; but I don't think that's
something to do in a back-patched bug fix.

Another thing I don't love, but wouldn't change in back branches,
is the use of BUFSIZ for the string lengths.  That's far more than
necessary (why not just MAXNORMLEN?), causing these functions to eat
much more stack space than they need.  Also it seems like an
unnecessary platform dependency.  Maybe BUFSIZ is 8K everywhere,
but I'm not sure of that.

			regards, tom lane

^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Potential buffer overrun in spell.c's CheckAffix()
@ 2026-04-22 14:50  Tom Lane <[email protected]>
  parent: Tom Lane <[email protected]>
  0 siblings, 0 replies; 5+ messages in thread

From: Tom Lane @ 2026-04-22 14:50 UTC (permalink / raw)
  To: Andrey Borodin <[email protected]>; +Cc: PostgreSQL mailing lists <[email protected]>

I wrote:
> I suppose we could replace the strcpy with
> 	memcpy(newword, word, len - Affix->replen);
> and then we would not need the "len >= 2 * MAXNORMLEN" test
> and both paths could share the same check.  There's something
> to be said for that, though it would be changing the logic to
> a greater extent than just "add some safety checks".

Concretely, about like this, where I also tried to make the actual
byte-copying steps a bit more uniform.

			regards, tom lane



Attachments:

  [text/x-diff] v2-0001-Prevent-buffer-overrun-in-spell.c-s-CheckAffix.patch (4.0K, 2-v2-0001-Prevent-buffer-overrun-in-spell.c-s-CheckAffix.patch)
  download | inline diff:
From 844bb90d49f78c44c6ed395d245ff8a500b16395 Mon Sep 17 00:00:00 2001
From: Tom Lane <[email protected]>
Date: Wed, 22 Apr 2026 10:47:56 -0400
Subject: [PATCH v2] Prevent buffer overrun in spell.c's CheckAffix().

This function writes into a caller-supplied buffer of length
2 * MAXNORMLEN, which should be plenty in real-world cases.
However a malicious affix file could supply an affix long
enough to overrun that.  Defend by just rejecting the match
if it would overrun the buffer.  I also inserted a check of
the input word length against Affix->replen, just to be sure
we won't index off the buffer, though it would be caller error
for that not to be true.

Also make the actual copying steps a bit more readable, and remove
an unnecessary requirement for the whole input word to fit into the
output buffer (even though it always will with the current caller).

The lack of documentation in this code makes my head hurt, so
I also reverse-engineered a basic header comment for CheckAffix.

Reported-by: Xint Code
Author: Tom Lane <[email protected]>
Reviewed-by: Andrey Borodin <[email protected]>
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 14
---
 src/backend/tsearch/spell.c | 47 ++++++++++++++++++++++++++++++++-----
 1 file changed, 41 insertions(+), 6 deletions(-)

diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c
index a1bfd2a9f9b..abaa18c21c4 100644
--- a/src/backend/tsearch/spell.c
+++ b/src/backend/tsearch/spell.c
@@ -2065,9 +2065,32 @@ FindAffixes(AffixNode *node, const char *word, int wrdlen, int *level, int type)
 	return NULL;
 }
 
+/*
+ * Checks to see if affix applies to word, transforms word if so.
+ * The transformation consists of replacing Affix->replen leading or
+ * trailing bytes with the Affix->find string.
+ *
+ * word: input word
+ * len: length of input word
+ * Affix: affix to consider
+ * flagflags: context flags showing whether we are handling a compound word
+ * newword: output buffer (MUST be of length 2 * MAXNORMLEN)
+ * baselen: input/output argument
+ *
+ * If baselen isn't NULL, then *baselen is used to return the length of
+ * the non-changed part of the word when applying a suffix, and is used
+ * to detect whether the input contained only a prefix and suffix when
+ * later applying a prefix.
+ *
+ * Returns newword on success, or NULL if the affix can't be applied.
+ * On success, the modified word is stored into newword.
+ */
 static char *
 CheckAffix(const char *word, size_t len, AFFIX *Affix, int flagflags, char *newword, int *baselen)
 {
+	size_t		keeplen,
+				findlen;
+
 	/*
 	 * Check compound allow flags
 	 */
@@ -2100,15 +2123,27 @@ CheckAffix(const char *word, size_t len, AFFIX *Affix, int flagflags, char *neww
 				return NULL;
 	}
 
+	/*
+	 * Protect against output buffer overrun (len < Affix->replen would be
+	 * caller error, but check anyway)
+	 */
+	Assert(len == strlen(word));
+	if (len < Affix->replen)
+		return NULL;
+	keeplen = len - Affix->replen;	/* how much of word we will keep */
+	findlen = strlen(Affix->find);
+	if (keeplen + findlen >= 2 * MAXNORMLEN)
+		return NULL;
+
 	/*
 	 * make replace pattern of affix
 	 */
 	if (Affix->type == FF_SUFFIX)
 	{
-		strcpy(newword, word);
-		strcpy(newword + len - Affix->replen, Affix->find);
+		memcpy(newword, word, keeplen);
+		strcpy(newword + keeplen, Affix->find);
 		if (baselen)			/* store length of non-changed part of word */
-			*baselen = len - Affix->replen;
+			*baselen = keeplen;
 	}
 	else
 	{
@@ -2116,10 +2151,10 @@ CheckAffix(const char *word, size_t len, AFFIX *Affix, int flagflags, char *neww
 		 * if prefix is an all non-changed part's length then all word
 		 * contains only prefix and suffix, so out
 		 */
-		if (baselen && *baselen + strlen(Affix->find) <= Affix->replen)
+		if (baselen && *baselen + findlen <= Affix->replen)
 			return NULL;
-		strcpy(newword, Affix->find);
-		strcat(newword, word + Affix->replen);
+		memcpy(newword, Affix->find, findlen);
+		strcpy(newword + findlen, word + Affix->replen);
 	}
 
 	/*
-- 
2.43.7



^ permalink  raw  reply  [nested|flat] 5+ messages in thread

end of thread, other threads:[~2026-04-22 14:50 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-04-21 17:32 Potential buffer overrun in spell.c's CheckAffix() Tom Lane <[email protected]>
2026-04-21 22:35 ` Tom Lane <[email protected]>
2026-04-22 11:57 ` Andrey Borodin <[email protected]>
2026-04-22 13:44   ` Tom Lane <[email protected]>
2026-04-22 14:50     ` Tom Lane <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox