From: Ewan Young
Date: Thu, 4 Jun 2026 10:32:45 +0800
Subject: Re: Use ereport() instead of elog() for invalid weights in setweight()
To: Tom Lane
Cc: PostgreSQL Hackers, Michael Paquier
In-Reply-To: <3096050.1780510673@sss.pgh.pa.us>

On Thu, Jun 4, 2026 at 2:17 AM Tom Lane wrote:
>
> Ewan Young writes:
> > I noticed that setweight() reports an internal error (SQLSTATE XX000)
> > when the weight argument is not one of A/a, B/b, C/c, D/d, even though
> > the weight comes directly from user input. The two-argument variant
> > also prints the weight as a raw ASCII code, which is a bit unfriendly:
>
> > =# SELECT setweight('cat:1'::tsvector, 'p');
> > ERROR: unrecognized weight: 112
>
> I agree that these ought to be ereport()s. However, I suspect that
> the reason for printing bogus weights numerically was to avoid the
> risk of generating encoding-incorrect strings if the given char
> value has its high bit set. The existing code in tsvector_filter
> is failing to consider that hazard.

Ah, I hadn't considered that. You're right: in a multibyte encoding the
bogus byte could well be a fragment of a multibyte character, so printing
it with %c would inject an invalidly-encoded byte into the error message.
The style used in v2 (matching charout()) keeps the message pure ASCII,
which seems clearly safer.

>
> I experimented with making the error messages print non-ASCII
> characters differently, and soon decided that that added enough
> complexity that we shouldn't have three copies of it. So the
> attached proposed v2 also factors the code out into a new
> function parse_weight (maybe a different name would be better?).

Factoring it out looks like a clear improvement. parse_weight reads fine
to me; I don't think it's worth bikeshedding.

I tested v2 on top of current master:

- applies cleanly, builds without new warnings
- core regression suite passes
- manually exercised the error paths, and it works:

=# \set VERBOSITY verbose
=# SELECT setweight('cat:1'::tsvector, 'p');
ERROR: 22023: unrecognized weight: "p"
LOCATION: parse_weight, tsvector_op.c:236
=# SELECT setweight('cat:1'::tsvector, '\200');
ERROR: 22023: unrecognized weight: "\200"
LOCATION: parse_weight, tsvector_op.c:240

>
> I'm unconvinced that we really need a regression test case for
> this ...

Agreed, no objection to dropping it. The behavior worth checking is the
message formatting, which is easy enough to verify by hand.

>
> regards, tom lane
>

So v2 looks good to me. Thanks for improving the patch!

Best regards,
Ewan Young
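
For anyone following the encoding point in this message, here is a minimal standalone C sketch of the charout()-style escaping discussed above: a byte with its high bit set is rendered as a backslash plus three octal digits, so the text placed in the error message stays pure ASCII. The names format_weight_byte and parse_weight_sketch are hypothetical and are not taken from the v2 patch. The numeric weight values and the -1 return for an invalid weight are assumptions made for illustration; the real function would raise the error with ereport() rather than return a sentinel.

```c
/*
 * Hypothetical helper (not from the v2 patch): render one byte so the
 * result is always pure ASCII.  A byte with the high bit set becomes a
 * backslash followed by three octal digits, in the style of charout().
 * "out" must have room for at least 5 bytes.
 */
void
format_weight_byte(char ch, char *out)
{
	unsigned char uch = (unsigned char) ch;

	if (uch & 0x80)
	{
		out[0] = '\\';
		out[1] = (char) ('0' + (uch >> 6));
		out[2] = (char) ('0' + ((uch >> 3) & 07));
		out[3] = (char) ('0' + (uch & 07));
		out[4] = '\0';
	}
	else
	{
		/* Plain ASCII byte: copy it through unchanged. */
		out[0] = ch;
		out[1] = '\0';
	}
}

/*
 * Hypothetical stand-in for a weight parser.  Accepts A/a, B/b, C/c, D/d.
 * The return values 3..0 and the -1 sentinel are assumptions for this
 * sketch; a backend function would report the failure via ereport().
 */
int
parse_weight_sketch(char ch)
{
	switch (ch)
	{
		case 'A':
		case 'a':
			return 3;
		case 'B':
		case 'b':
			return 2;
		case 'C':
		case 'c':
			return 1;
		case 'D':
		case 'd':
			return 0;
		default:
			return -1;		/* caller formats the byte and reports it */
	}
}
```

With this sketch, the byte 'p' is formatted as the one-character string p, while the byte 0x80 is formatted as the four characters \200, which matches the shape of the two error messages shown in the test session above.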