From: Tom Lane
To: Ewan Young
cc: PostgreSQL Hackers, Michael Paquier
Subject: Re: Use ereport() instead of elog() for invalid weights in setweight()
Date: Wed, 03 Jun 2026 14:17:53 -0400
Message-ID: <3096050.1780510673@sss.pgh.pa.us>

Ewan Young writes:
> I noticed that setweight() reports an internal error (SQLSTATE XX000)
> when the weight argument is not one of
> A/a, B/b, C/c, D/d, even though
> the weight comes directly from user input. The two-argument variant
> also prints the weight as a raw ASCII code, which is a bit unfriendly:
> =# SELECT setweight('cat:1'::tsvector, 'p');
> ERROR: unrecognized weight: 112

I agree that these ought to be ereport()s. However, I suspect that the
reason for printing bogus weights numerically was to avoid the risk of
generating encoding-incorrect strings if the given char value has its
high bit set. The existing code in tsvector_filter is failing to
consider that hazard.

I experimented with making the error messages print non-ASCII
characters differently, and soon decided that that added enough
complexity that we shouldn't have three copies of it. So the attached
proposed v2 also factors the code out into a new function parse_weight
(maybe a different name would be better?).

I'm unconvinced that we really need a regression test case for this ...

			regards, tom lane

Content-Description: v2-0001-Use-ereport-not-elog-for-invalid-weights-in-setweight.patch

diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c
index d8dece42b9b..53a9541e89f 100644
--- a/src/backend/utils/adt/tsvector_op.c
+++ b/src/backend/utils/adt/tsvector_op.c
@@ -207,17 +207,10 @@ tsvector_length(PG_FUNCTION_ARGS)
 	PG_RETURN_INT32(ret);
 }
 
-Datum
-tsvector_setweight(PG_FUNCTION_ARGS)
+static int
+parse_weight(char cw)
 {
-	TSVector	in = PG_GETARG_TSVECTOR(0);
-	char		cw = PG_GETARG_CHAR(1);
-	TSVector	out;
-	int			i,
-				j;
-	WordEntry  *entry;
-	WordEntryPos *p;
-	int			w = 0;
+	int			w;
 
 	switch (cw)
 	{
@@ -238,9 +231,32 @@ tsvector_setweight(PG_FUNCTION_ARGS)
 			w = 0;
 			break;
 		default:
-			/* internal error */
-			elog(ERROR, "unrecognized weight: %d", cw);
+			/* Avoid printing non-ASCII bytes, else we have encoding issues */
+			if (cw >= ' ' && cw < 0x7f)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("unrecognized weight: \"%c\"", cw)));
+			else				/* use \ooo format, like charout() */
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("unrecognized weight: \"\\%03o\"",
+								(unsigned char) cw)));
 	}
+	return w;
+}
+
+
+Datum
+tsvector_setweight(PG_FUNCTION_ARGS)
+{
+	TSVector	in = PG_GETARG_TSVECTOR(0);
+	char		cw = PG_GETARG_CHAR(1);
+	TSVector	out;
+	int			i,
+				j;
+	WordEntry  *entry;
+	WordEntryPos *p;
+	int			w = parse_weight(cw);
 
 	out = (TSVector) palloc(VARSIZE(in));
 	memcpy(out, in, VARSIZE(in));
@@ -285,28 +301,7 @@ tsvector_setweight_by_filter(PG_FUNCTION_ARGS)
 	Datum	   *dlexemes;
 	bool	   *nulls;
 
-	switch (char_weight)
-	{
-		case 'A':
-		case 'a':
-			weight = 3;
-			break;
-		case 'B':
-		case 'b':
-			weight = 2;
-			break;
-		case 'C':
-		case 'c':
-			weight = 1;
-			break;
-		case 'D':
-		case 'd':
-			weight = 0;
-			break;
-		default:
-			/* internal error */
-			elog(ERROR, "unrecognized weight: %c", char_weight);
-	}
+	weight = parse_weight(char_weight);
 
 	tsout = (TSVector) palloc(VARSIZE(tsin));
 	memcpy(tsout, tsin, VARSIZE(tsin));
@@ -846,29 +841,7 @@ tsvector_filter(PG_FUNCTION_ARGS)
 					 errmsg("weight array may not contain nulls")));
 
 		char_weight = DatumGetChar(dweights[i]);
-		switch (char_weight)
-		{
-			case 'A':
-			case 'a':
-				mask = mask | 8;
-				break;
-			case 'B':
-			case 'b':
-				mask = mask | 4;
-				break;
-			case 'C':
-			case 'c':
-				mask = mask | 2;
-				break;
-			case 'D':
-			case 'd':
-				mask = mask | 1;
-				break;
-			default:
-				ereport(ERROR,
-						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-						 errmsg("unrecognized weight: \"%c\"", char_weight)));
-		}
+		mask |= 1 << parse_weight(char_weight);
 	}
 
 	tsout = (TSVector) palloc0(VARSIZE(tsin));