public inbox for [email protected]
help / color / mirror / Atom feedUse ereport() instead of elog() for invalid weights in setweight()
4+ messages / 2 participants
[nested] [flat]
* Use ereport() instead of elog() for invalid weights in setweight()
@ 2026-06-03 15:39 Ewan Young <[email protected]>
0 siblings, 1 reply; 4+ messages in thread
From: Ewan Young @ 2026-06-03 15:39 UTC (permalink / raw)
To: PostgreSQL Hackers <[email protected]>; +Cc: Michael Paquier <[email protected]>
Hi hackers,
I noticed that setweight() reports an internal error (SQLSTATE XX000)
when the weight argument is not one of A/a, B/b, C/c, D/d, even though
the weight comes directly from user input. The two-argument variant
also prints the weight as a raw ASCII code, which is a bit unfriendly:
=# SELECT setweight('cat:1'::tsvector, 'p');
ERROR: unrecognized weight: 112
ts_filter() in the same file (tsvector_op.c) already handles the
equivalent case with ereport() and ERRCODE_INVALID_PARAMETER_VALUE,
so the attached patch simply makes tsvector_setweight() and
tsvector_setweight_by_filter() do the same, and adds regression tests
covering the three error paths (none of which were covered before).
This seems to be in the same spirit as the earlier cleanup of
user-reachable internal error codes [1]; these two sites appear to
have been missed there.
The patch is against the master and passes make check. Please let me know
if I've missed anything -- I'd be happy to revise.
[1] https://postgr.es/m/[email protected]
Best regards,
Ewan Young
Attachments:
[application/octet-stream] v1-0001-Use-ereport-not-elog-for-invalid-weights-in-setweight.patch (4.1K, 2-v1-0001-Use-ereport-not-elog-for-invalid-weights-in-setweight.patch)
download | inline diff:
From 71f205c364404d2db244b2c6de4cc3b047fe08c8 Mon Sep 17 00:00:00 2001
From: Ewan Young <[email protected]>
Date: Thu, 4 Jun 2026 07:25:03 +0800
Subject: [PATCH] Use ereport(), not elog(), for invalid weights in
setweight().
tsvector_setweight() and tsvector_setweight_by_filter() raised an
internal error (elog, SQLSTATE XX000) when the weight argument was
not one of A/a, B/b, C/c, D/d, even though the weight comes directly
from user input. Worse, the two-argument variant printed the weight
as a raw ASCII code:
=# SELECT setweight('cat:1'::tsvector, 'p');
ERROR: unrecognized weight: 112
Convert both to ereport() with ERRCODE_INVALID_PARAMETER_VALUE,
matching what tsvector_filter() in the same file already does, and
add regression tests covering all three error paths.
---
src/backend/utils/adt/tsvector_op.c | 10 ++++++----
src/test/regress/expected/tstypes.out | 7 +++++++
src/test/regress/sql/tstypes.sql | 4 ++++
3 files changed, 17 insertions(+), 4 deletions(-)
diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c
index d8dece42b9b..9a83658b601 100644
--- a/src/backend/utils/adt/tsvector_op.c
+++ b/src/backend/utils/adt/tsvector_op.c
@@ -238,8 +238,9 @@ tsvector_setweight(PG_FUNCTION_ARGS)
w = 0;
break;
default:
- /* internal error */
- elog(ERROR, "unrecognized weight: %d", cw);
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("unrecognized weight: \"%c\"", cw)));
}
out = (TSVector) palloc(VARSIZE(in));
@@ -304,8 +305,9 @@ tsvector_setweight_by_filter(PG_FUNCTION_ARGS)
weight = 0;
break;
default:
- /* internal error */
- elog(ERROR, "unrecognized weight: %c", char_weight);
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("unrecognized weight: \"%c\"", char_weight)));
}
tsout = (TSVector) palloc(VARSIZE(tsin));
diff --git a/src/test/regress/expected/tstypes.out b/src/test/regress/expected/tstypes.out
index 4cfc3b9dc04..a9c6465021f 100644
--- a/src/test/regress/expected/tstypes.out
+++ b/src/test/regress/expected/tstypes.out
@@ -1428,6 +1428,11 @@ SELECT setweight('a asd w:5,6,12B,13A zxc'::tsvector, 'c', ARRAY['a', 'zxc', '',
'a' 'asd' 'w':5,6,12B,13A 'zxc'
(1 row)
+-- invalid weights are disallowed
+SELECT setweight('w:5A'::tsvector, 'x');
+ERROR: unrecognized weight: "x"
+SELECT setweight('w:5A'::tsvector, 'x', '{w}');
+ERROR: unrecognized weight: "x"
SELECT ts_filter('base:7A empir:17 evil:15 first:11 galact:16 hidden:6A rebel:1A spaceship:2A strike:3A victori:12 won:9'::tsvector, '{a}');
ts_filter
-------------------------------------------------------------
@@ -1442,3 +1447,5 @@ SELECT ts_filter('base hidden rebel spaceship strike'::tsvector, '{a}');
SELECT ts_filter('base hidden rebel spaceship strike'::tsvector, '{a,b,NULL}');
ERROR: weight array may not contain nulls
+SELECT ts_filter('base hidden rebel spaceship strike'::tsvector, '{x}');
+ERROR: unrecognized weight: "x"
diff --git a/src/test/regress/sql/tstypes.sql b/src/test/regress/sql/tstypes.sql
index dfb7a1466e0..4261abed670 100644
--- a/src/test/regress/sql/tstypes.sql
+++ b/src/test/regress/sql/tstypes.sql
@@ -275,7 +275,11 @@ SELECT setweight('a:1,3A asd:1C w:5,6,12B,13A zxc:81,222A,567'::tsvector, 'c', '
SELECT setweight('a:1,3A asd:1C w:5,6,12B,13A zxc:81,222A,567'::tsvector, 'c', '{a}');
SELECT setweight('a:1,3A asd:1C w:5,6,12B,13A zxc:81,222A,567'::tsvector, 'c', '{a,zxc}');
SELECT setweight('a asd w:5,6,12B,13A zxc'::tsvector, 'c', ARRAY['a', 'zxc', '', NULL]);
+-- invalid weights are disallowed
+SELECT setweight('w:5A'::tsvector, 'x');
+SELECT setweight('w:5A'::tsvector, 'x', '{w}');
SELECT ts_filter('base:7A empir:17 evil:15 first:11 galact:16 hidden:6A rebel:1A spaceship:2A strike:3A victori:12 won:9'::tsvector, '{a}');
SELECT ts_filter('base hidden rebel spaceship strike'::tsvector, '{a}');
SELECT ts_filter('base hidden rebel spaceship strike'::tsvector, '{a,b,NULL}');
+SELECT ts_filter('base hidden rebel spaceship strike'::tsvector, '{x}');
--
2.47.3
^ permalink raw reply [nested|flat] 4+ messages in thread
* Re: Use ereport() instead of elog() for invalid weights in setweight()
@ 2026-06-03 18:17 Tom Lane <[email protected]>
parent: Ewan Young <[email protected]>
0 siblings, 1 reply; 4+ messages in thread
From: Tom Lane @ 2026-06-03 18:17 UTC (permalink / raw)
To: Ewan Young <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>; Michael Paquier <[email protected]>
Ewan Young <[email protected]> writes:
> I noticed that setweight() reports an internal error (SQLSTATE XX000)
> when the weight argument is not one of A/a, B/b, C/c, D/d, even though
> the weight comes directly from user input. The two-argument variant
> also prints the weight as a raw ASCII code, which is a bit unfriendly:
> =# SELECT setweight('cat:1'::tsvector, 'p');
> ERROR: unrecognized weight: 112
I agree that these ought to be ereport()s. However, I suspect that
the reason for printing bogus weights numerically was to avoid the
risk of generating encoding-incorrect strings if the given char
value has its high bit set. The existing code in tsvector_filter
is failing to consider that hazard.
I experimented with making the error messages print non-ASCII
characters differently, and soon decided that that added enough
complexity that we shouldn't have three copies of it. So the
attached proposed v2 also factors the code out into a new
function parse_weight (maybe a different name would be better?).
I'm unconvinced that we really need a regression test case for
this ...
regards, tom lane
Attachments:
[text/x-diff] v2-0001-Use-ereport-not-elog-for-invalid-weights-in-setweight.patch (2.7K, 2-v2-0001-Use-ereport-not-elog-for-invalid-weights-in-setweight.patch)
download | inline diff:
diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c
index d8dece42b9b..53a9541e89f 100644
--- a/src/backend/utils/adt/tsvector_op.c
+++ b/src/backend/utils/adt/tsvector_op.c
@@ -207,17 +207,10 @@ tsvector_length(PG_FUNCTION_ARGS)
PG_RETURN_INT32(ret);
}
-Datum
-tsvector_setweight(PG_FUNCTION_ARGS)
+static int
+parse_weight(char cw)
{
- TSVector in = PG_GETARG_TSVECTOR(0);
- char cw = PG_GETARG_CHAR(1);
- TSVector out;
- int i,
- j;
- WordEntry *entry;
- WordEntryPos *p;
- int w = 0;
+ int w;
switch (cw)
{
@@ -238,9 +231,32 @@ tsvector_setweight(PG_FUNCTION_ARGS)
w = 0;
break;
default:
- /* internal error */
- elog(ERROR, "unrecognized weight: %d", cw);
+ /* Avoid printing non-ASCII bytes, else we have encoding issues */
+ if (cw >= ' ' && cw < 0x7f)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("unrecognized weight: \"%c\"", cw)));
+ else /* use \ooo format, like charout() */
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("unrecognized weight: \"\\%03o\"",
+ (unsigned char) cw)));
}
+ return w;
+}
+
+
+Datum
+tsvector_setweight(PG_FUNCTION_ARGS)
+{
+ TSVector in = PG_GETARG_TSVECTOR(0);
+ char cw = PG_GETARG_CHAR(1);
+ TSVector out;
+ int i,
+ j;
+ WordEntry *entry;
+ WordEntryPos *p;
+ int w = parse_weight(cw);
out = (TSVector) palloc(VARSIZE(in));
memcpy(out, in, VARSIZE(in));
@@ -285,28 +301,7 @@ tsvector_setweight_by_filter(PG_FUNCTION_ARGS)
Datum *dlexemes;
bool *nulls;
- switch (char_weight)
- {
- case 'A':
- case 'a':
- weight = 3;
- break;
- case 'B':
- case 'b':
- weight = 2;
- break;
- case 'C':
- case 'c':
- weight = 1;
- break;
- case 'D':
- case 'd':
- weight = 0;
- break;
- default:
- /* internal error */
- elog(ERROR, "unrecognized weight: %c", char_weight);
- }
+ weight = parse_weight(char_weight);
tsout = (TSVector) palloc(VARSIZE(tsin));
memcpy(tsout, tsin, VARSIZE(tsin));
@@ -846,29 +841,7 @@ tsvector_filter(PG_FUNCTION_ARGS)
errmsg("weight array may not contain nulls")));
char_weight = DatumGetChar(dweights[i]);
- switch (char_weight)
- {
- case 'A':
- case 'a':
- mask = mask | 8;
- break;
- case 'B':
- case 'b':
- mask = mask | 4;
- break;
- case 'C':
- case 'c':
- mask = mask | 2;
- break;
- case 'D':
- case 'd':
- mask = mask | 1;
- break;
- default:
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("unrecognized weight: \"%c\"", char_weight)));
- }
+ mask |= 1 << parse_weight(char_weight);
}
tsout = (TSVector) palloc0(VARSIZE(tsin));
^ permalink raw reply [nested|flat] 4+ messages in thread
* Re: Use ereport() instead of elog() for invalid weights in setweight()
@ 2026-06-04 02:32 Ewan Young <[email protected]>
parent: Tom Lane <[email protected]>
0 siblings, 1 reply; 4+ messages in thread
From: Ewan Young @ 2026-06-04 02:32 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>; Michael Paquier <[email protected]>
On Thu, Jun 4, 2026 at 2:17 AM Tom Lane <[email protected]> wrote:
>
> Ewan Young <[email protected]> writes:
> > I noticed that setweight() reports an internal error (SQLSTATE XX000)
> > when the weight argument is not one of A/a, B/b, C/c, D/d, even though
> > the weight comes directly from user input. The two-argument variant
> > also prints the weight as a raw ASCII code, which is a bit unfriendly:
>
> > =# SELECT setweight('cat:1'::tsvector, 'p');
> > ERROR: unrecognized weight: 112
>
> I agree that these ought to be ereport()s. However, I suspect that
> the reason for printing bogus weights numerically was to avoid the
> risk of generating encoding-incorrect strings if the given char
> value has its high bit set. The existing code in tsvector_filter
> is failing to consider that hazard.
Ah, I hadn't considered that. You're right: in a multibyte encoding
the bogus byte could well be a fragment of a multibyte character, so
printing it with %c would inject an invalidly-encoded byte into the
error message. The style used in v2 (matching charout()) keeps
the message pure ASCII, which seems clearly safer.
>
> I experimented with making the error messages print non-ASCII
> characters differently, and soon decided that that added enough
> complexity that we shouldn't have three copies of it. So the
> attached proposed v2 also factors the code out into a new
> function parse_weight (maybe a different name would be better?).
Factoring it out looks like a clear improvement. parse_weight reads
fine to me; I don't think it's worth bikeshedding.
I tested v2 on top of current master:
- applies cleanly, builds without new warnings
- core regression suite passes
- manually exercised the error paths, and it works:
=# \set VERBOSITY verbose
=# SELECT setweight('cat:1'::tsvector, 'p');
ERROR: 22023: unrecognized weight: "p"
LOCATION: parse_weight, tsvector_op.c:236
=# SELECT setweight('cat:1'::tsvector, '\200');
ERROR: 22023: unrecognized weight: "\200"
LOCATION: parse_weight, tsvector_op.c:240
>
> I'm unconvinced that we really need a regression test case for
> this ...
Agreed, no objection to dropping it. The behavior worth checking is
the message formatting, which is easy enough to verify by hand.
>
> regards, tom lane
>
So v2 looks good to me. Thanks for improving the patch!
Best regards,
Ewan Young
^ permalink raw reply [nested|flat] 4+ messages in thread
* Re: Use ereport() instead of elog() for invalid weights in setweight()
@ 2026-06-04 16:25 Tom Lane <[email protected]>
parent: Ewan Young <[email protected]>
0 siblings, 0 replies; 4+ messages in thread
From: Tom Lane @ 2026-06-04 16:25 UTC (permalink / raw)
To: Ewan Young <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>; Michael Paquier <[email protected]>
Ewan Young <[email protected]> writes:
> So v2 looks good to me. Thanks for improving the patch!
Sounds good, pushed.
regards, tom lane
^ permalink raw reply [nested|flat] 4+ messages in thread
end of thread, other threads:[~2026-06-04 16:25 UTC | newest]
Thread overview: 4+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-06-03 15:39 Use ereport() instead of elog() for invalid weights in setweight() Ewan Young <[email protected]>
2026-06-03 18:17 ` Tom Lane <[email protected]>
2026-06-04 02:32 ` Ewan Young <[email protected]>
2026-06-04 16:25 ` Tom Lane <[email protected]>
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox