From: Tristan Partin
To: Jeff Davis
Cc: pgsql-hackers
Subject: Re: dict_synonym.c: fix truncation of multibyte sequence
Date: Fri, 05 Jun 2026 20:46:00 +0000
In-Reply-To: <8cf296c265a367e08bf221781c4ba6c3f3726fda.camel@j-davis.com>

On Fri Jun 5, 2026 at 5:37 PM UTC, Jeff Davis wrote:
> On Fri, 2026-06-05 at 15:57 +0000, Tristan Partin wrote:
>> > In any case, the input comes from a trusted
>> > source (dictionary configuration), so it's not very serious.
>>
>> The fix looks and sounds good. Do we have any way to test this, so it
>> doesn't regress in the future?
>
> -- Ⱥ is 2 bytes, 'ⱥ' is 3 bytes
> $ echo "foo barȺ" > /path/to/postgres/share/tsearch_data/mbtest.syn
>
> CREATE TEXT SEARCH DICTIONARY mb_syn (
>   TEMPLATE = synonym,
>   SYNONYMS = mbtest);
>
> SELECT ts_lexize('mb_syn', 'foo');
>
> =# SELECT ts_lexize('mb_syn', 'foo'); -- before patch
>  ts_lexize
> -----------
>  {bar}
> (1 row)
>
> =# SELECT ts_lexize('mb_syn', 'foo'); -- after patch
>  ts_lexize
> -----------
>  {barⱥ}
> (1 row)
>
> It requires a specially-crafted synonym file, and I'm not sure it's
> worth much effort to add a test for this specific path. If we see
> similar bugs, it's more likely to be somewhere else that makes the same
> faulty assumption.
>
> If you do think we should add tests, we should probably add a set of
> dictionary-related files (.syn, .dict, .ths, etc.) that contain a
> variety of adversarial Unicode cases.
>
> I'd be inclined to just commit this fix for now. It needs backpatching,
> and I don't think we want to backpatch a large set of tests with it.

I would say proceed as you see fit. I am generally of the opinion that
additional testing is better, but I don't want to push for something if
others don't see the same value. I'd be happy to provide a patch for the
test in a subsequent discussion, if that is a good middle ground?

-- 
Tristan Partin
PostgreSQL Contributors Team
AWS (https://aws.amazon.com)