Message-ID: <8cf296c265a367e08bf221781c4ba6c3f3726fda.camel@j-davis.com>
Subject: Re: dict_synonym.c: fix truncation of multibyte sequence
From: Jeff Davis
To: Tristan Partin
Cc: pgsql-hackers
Date: Fri, 05 Jun 2026 10:37:03 -0700

On Fri, 2026-06-05 at 15:57 +0000, Tristan Partin wrote:
> > In any case, the input comes from a trusted
> > source (dictionary configuration), so it's not very serious.
>
> The fix looks and sounds good. Do we have any way to test this, so it
> doesn't regress in the future?

-- 'Ⱥ' is 2 bytes in UTF-8; its lowercase form 'ⱥ' is 3 bytes
$ echo "foo barȺ" > /path/to/postgres/share/tsearch_data/mbtest.syn

CREATE TEXT SEARCH DICTIONARY mb_syn (
    TEMPLATE = synonym,
    SYNONYMS = mbtest);

SELECT ts_lexize('mb_syn', 'foo');

=# SELECT ts_lexize('mb_syn', 'foo'); -- before patch
 ts_lexize
-----------
 {bar}
(1 row)

=# SELECT ts_lexize('mb_syn', 'foo'); -- after patch
 ts_lexize
-----------
 {barⱥ}
(1 row)

It requires a specially crafted synonym file, and I'm not sure it's worth much effort to add a test for this specific path. If we see similar bugs, they're more likely to be in other code that makes the same faulty assumption: that lowercasing a string never changes its byte length.

If you do think we should add tests, we should probably add a set of dictionary-related files (.syn, .dict, .ths, etc.) that contain a variety of adversarial Unicode cases.

I'd be inclined to just commit this fix for now. It needs backpatching, and I don't think we want to backpatch a large set of tests with it.

Regards,
	Jeff Davis