public inbox for [email protected]help / color / mirror / Atom feed
Overcoming Initcap Function limitations? 7+ messages / 4 participants [nested] [flat]
* Overcoming Initcap Function limitations? @ 2023-12-02 19:03 Bo Guo <[email protected]> 2023-12-04 18:08 ` Re: Overcoming Initcap Function limitations? Greg Sabino Mullane <[email protected]> 0 siblings, 1 reply; 7+ messages in thread From: Bo Guo @ 2023-12-02 19:03 UTC (permalink / raw) To: [email protected] Hi there, The Initcap function does not have the option to support the title casing where articles/prepositions are left in lowercase. Neither does it handle cultural names such as "MacDonald." Please kindly advise any options we may have. Cheers, Bo ^ permalink raw reply [nested|flat] 7+ messages in thread
* Re: Overcoming Initcap Function limitations? 2023-12-02 19:03 Overcoming Initcap Function limitations? Bo Guo <[email protected]> @ 2023-12-04 18:08 ` Greg Sabino Mullane <[email protected]> 2023-12-04 18:39 ` Re: Overcoming Initcap Function limitations? Steve Midgley <[email protected]> 0 siblings, 1 reply; 7+ messages in thread From: Greg Sabino Mullane @ 2023-12-04 18:08 UTC (permalink / raw) To: Bo Guo <[email protected]>; +Cc: [email protected] It's not clear exactly what you are trying to achieve, but you can use Postgres' built-in text searching system to exclude stopwords. For example: CREATE FUNCTION initcap_realword(myword TEXT) returns TEXT language SQL AS $$ SELECT CASE WHEN length(to_tsvector(myword)) < 1 THEN myword ELSE initcap(myword) END; $$; You could extend that to multi-word strings with a little effort. However, knowing that macdonald should be MacDonald requires a lot more intelligence than is provided by any Postgres built-in system or extension that I know of. What you are looking at is the field of science known as Natural Language Processing, which can get very complex very quickly. But for a Postgres answer, you might combine plpython3u with spacy ( https://spacy.io/usage/spacy-101). Cheers, Greg ^ permalink raw reply [nested|flat] 7+ messages in thread
* Re: Overcoming Initcap Function limitations? 2023-12-02 19:03 Overcoming Initcap Function limitations? Bo Guo <[email protected]> 2023-12-04 18:08 ` Re: Overcoming Initcap Function limitations? Greg Sabino Mullane <[email protected]> @ 2023-12-04 18:39 ` Steve Midgley <[email protected]> 2023-12-05 01:39 ` Re: Overcoming Initcap Function limitations? Bo Guo <[email protected]> 0 siblings, 1 reply; 7+ messages in thread From: Steve Midgley @ 2023-12-04 18:39 UTC (permalink / raw) To: Greg Sabino Mullane <[email protected]>; +Cc: Bo Guo <[email protected]>; [email protected] On Mon, Dec 4, 2023 at 10:09 AM Greg Sabino Mullane <[email protected]> wrote: > It's not clear exactly what you are trying to achieve, but you can use > Postgres' built-in text searching system to exclude stopwords. For example: > > CREATE FUNCTION initcap_realword(myword TEXT) > returns TEXT language SQL AS > $$ > SELECT CASE WHEN length(to_tsvector(myword)) < 1 > THEN myword ELSE initcap(myword) END; > $$; > > You could extend that to multi-word strings with a little effort. However, > knowing that macdonald should be MacDonald requires a lot more intelligence > than is provided by any Postgres built-in system or extension that I know > of. What you are looking at is the field of science known as Natural > Language Processing, which can get very complex very quickly. But for a > Postgres answer, you might combine plpython3u with spacy ( > https://spacy.io/usage/spacy-101). > > Cheers, > Greg > > I've been having some pretty good experiences with "hard" text transformations such as correct capitalization of names like MacDonald using GPT 3.5 Turbo API which is pretty cheap for the volume of data I've been working with.. Seems like Spacy might do similar things, and if it can be run locally, might be much cheaper than a rental API.. Steve ^ permalink raw reply [nested|flat] 7+ messages in thread
* Re: Overcoming Initcap Function limitations? 2023-12-02 19:03 Overcoming Initcap Function limitations? Bo Guo <[email protected]> 2023-12-04 18:08 ` Re: Overcoming Initcap Function limitations? Greg Sabino Mullane <[email protected]> 2023-12-04 18:39 ` Re: Overcoming Initcap Function limitations? Steve Midgley <[email protected]> @ 2023-12-05 01:39 ` Bo Guo <[email protected]> 2023-12-05 04:12 ` Re: Overcoming Initcap Function limitations? Steve Midgley <[email protected]> 0 siblings, 1 reply; 7+ messages in thread From: Bo Guo @ 2023-12-05 01:39 UTC (permalink / raw) To: Steve Midgley <[email protected]>; +Cc: Greg Sabino Mullane <[email protected]>; [email protected] Thank you, Steve and Greg! Your suggestions open up new potentials for me to explore. At this moment, I lean towards normalizing the database column values in upper case, thereby out-sourcing the case-changing responsibility to the front end. I would love to hear from your thoughts on this pattern. Cheers, Bo On Mon, Dec 4, 2023 at 11:39 AM Steve Midgley <[email protected]> wrote: > > > On Mon, Dec 4, 2023 at 10:09 AM Greg Sabino Mullane <[email protected]> > wrote: > >> It's not clear exactly what you are trying to achieve, but you can use >> Postgres' built-in text searching system to exclude stopwords. For example: >> >> CREATE FUNCTION initcap_realword(myword TEXT) >> returns TEXT language SQL AS >> $$ >> SELECT CASE WHEN length(to_tsvector(myword)) < 1 >> THEN myword ELSE initcap(myword) END; >> $$; >> >> You could extend that to multi-word strings with a little effort. >> However, knowing that macdonald should be MacDonald requires a lot more >> intelligence than is provided by any Postgres built-in system or extension >> that I know of. What you are looking at is the field of science known as >> Natural Language Processing, which can get very complex very quickly. But >> for a Postgres answer, you might combine plpython3u with spacy ( >> https://spacy.io/usage/spacy-101). >> >> Cheers, >> Greg >> >> I've been having some pretty good experiences with "hard" text > transformations such as correct capitalization of names like MacDonald > using GPT 3.5 Turbo API which is pretty cheap for the volume of data I've > been working with.. Seems like Spacy might do similar things, and if it can > be run locally, might be much cheaper than a rental API.. > > Steve > ^ permalink raw reply [nested|flat] 7+ messages in thread
* Re: Overcoming Initcap Function limitations? 2023-12-02 19:03 Overcoming Initcap Function limitations? Bo Guo <[email protected]> 2023-12-04 18:08 ` Re: Overcoming Initcap Function limitations? Greg Sabino Mullane <[email protected]> 2023-12-04 18:39 ` Re: Overcoming Initcap Function limitations? Steve Midgley <[email protected]> 2023-12-05 01:39 ` Re: Overcoming Initcap Function limitations? Bo Guo <[email protected]> @ 2023-12-05 04:12 ` Steve Midgley <[email protected]> 2023-12-05 04:23 ` Re: Overcoming Initcap Function limitations? Tom Lane <[email protected]> 0 siblings, 1 reply; 7+ messages in thread From: Steve Midgley @ 2023-12-05 04:12 UTC (permalink / raw) To: Bo Guo <[email protected]>; +Cc: Greg Sabino Mullane <[email protected]>; pgsql-sql <[email protected]> On Mon, Dec 4, 2023, 5:39 PM Bo Guo <[email protected]> wrote: > Thank you, Steve and Greg! > > Your suggestions open up new potentials for me to explore. At this > moment, I lean towards normalizing the database column values in upper > case, thereby out-sourcing the case-changing responsibility to the front > end. I would love to hear from your thoughts on this pattern. > > Cheers, > > Bo > > On Mon, Dec 4, 2023 at 11:39 AM Steve Midgley <[email protected]> wrote: > >> >> >> On Mon, Dec 4, 2023 at 10:09 AM Greg Sabino Mullane <[email protected]> >> wrote: >> >>> It's not clear exactly what you are trying to achieve, but you can use >>> Postgres' built-in text searching system to exclude stopwords. For example: >>> >>> CREATE FUNCTION initcap_realword(myword TEXT) >>> returns TEXT language SQL AS >>> $$ >>> SELECT CASE WHEN length(to_tsvector(myword)) < 1 >>> THEN myword ELSE initcap(myword) END; >>> $$; >>> >>> You could extend that to multi-word strings with a little effort. >>> However, knowing that macdonald should be MacDonald requires a lot more >>> intelligence than is provided by any Postgres built-in system or extension >>> that I know of. What you are looking at is the field of science known as >>> Natural Language Processing, which can get very complex very quickly. But >>> for a Postgres answer, you might combine plpython3u with spacy ( >>> https://spacy.io/usage/spacy-101). >>> >>> Cheers, >>> Greg >>> >>> I've been having some pretty good experiences with "hard" text >> transformations such as correct capitalization of names like MacDonald >> using GPT 3.5 Turbo API which is pretty cheap for the volume of data I've >> been working with.. Seems like Spacy might do similar things, and if it can >> be run locally, might be much cheaper than a rental API.. >> >> Steve >> > It really depends on the use case. If your users are happy with all uppercase, that seems like a great solution: fast, cheap, and reliable! Note that this list asks posters to add their reply comments at the bottom of each message, rather than the type which is more typical.. Welcome! Steve > ^ permalink raw reply [nested|flat] 7+ messages in thread
* Re: Overcoming Initcap Function limitations? 2023-12-02 19:03 Overcoming Initcap Function limitations? Bo Guo <[email protected]> 2023-12-04 18:08 ` Re: Overcoming Initcap Function limitations? Greg Sabino Mullane <[email protected]> 2023-12-04 18:39 ` Re: Overcoming Initcap Function limitations? Steve Midgley <[email protected]> 2023-12-05 01:39 ` Re: Overcoming Initcap Function limitations? Bo Guo <[email protected]> 2023-12-05 04:12 ` Re: Overcoming Initcap Function limitations? Steve Midgley <[email protected]> @ 2023-12-05 04:23 ` Tom Lane <[email protected]> 2023-12-05 20:54 ` Re: Overcoming Initcap Function limitations? Greg Sabino Mullane <[email protected]> 0 siblings, 1 reply; 7+ messages in thread From: Tom Lane @ 2023-12-05 04:23 UTC (permalink / raw) To: Steve Midgley <[email protected]>; +Cc: Bo Guo <[email protected]>; Greg Sabino Mullane <[email protected]>; pgsql-sql <[email protected]> Steve Midgley <[email protected]> writes: > On Mon, Dec 4, 2023, 5:39 PM Bo Guo <[email protected]> wrote: >> Your suggestions open up new potentials for me to explore. At this >> moment, I lean towards normalizing the database column values in upper >> case, thereby out-sourcing the case-changing responsibility to the front >> end. I would love to hear from your thoughts on this pattern. > It really depends on the use case. If your users are happy with all > uppercase, that seems like a great solution: fast, cheap, and reliable! FWIW, as an old database geek I'd lean more towards "store the original input data". If you are expecting the client-side display code to transform the casing anyway, I don't see any advantage in smashing to upper case beforehand. And keeping the original casing could have value down the road, if only for forensic purposes. regards, tom lane ^ permalink raw reply [nested|flat] 7+ messages in thread
* Re: Overcoming Initcap Function limitations? 2023-12-02 19:03 Overcoming Initcap Function limitations? Bo Guo <[email protected]> 2023-12-04 18:08 ` Re: Overcoming Initcap Function limitations? Greg Sabino Mullane <[email protected]> 2023-12-04 18:39 ` Re: Overcoming Initcap Function limitations? Steve Midgley <[email protected]> 2023-12-05 01:39 ` Re: Overcoming Initcap Function limitations? Bo Guo <[email protected]> 2023-12-05 04:12 ` Re: Overcoming Initcap Function limitations? Steve Midgley <[email protected]> 2023-12-05 04:23 ` Re: Overcoming Initcap Function limitations? Tom Lane <[email protected]> @ 2023-12-05 20:54 ` Greg Sabino Mullane <[email protected]> 0 siblings, 0 replies; 7+ messages in thread From: Greg Sabino Mullane @ 2023-12-05 20:54 UTC (permalink / raw) To: Tom Lane <[email protected]>; +Cc: Steve Midgley <[email protected]>; Bo Guo <[email protected]>; pgsql-sql <[email protected]> I agree with Tom. Store the original: let the frontend uppercase themselves if they want it that way. If they decide to force the transforms back down to you, you won't have spoiled your input. If they are querying on that column, a functional index on lower(col) would be nice. To answer your original question, yes, let the front end deal with the problem, leaving you time to worry about more traditional database-related problems. :) Cheers, Greg ^ permalink raw reply [nested|flat] 7+ messages in thread
end of thread, other threads:[~2023-12-05 20:54 UTC | newest] Thread overview: 7+ messages (download: mbox mbox.gz follow: Atom feed) -- links below jump to the message on this page -- 2023-12-02 19:03 Overcoming Initcap Function limitations? Bo Guo <[email protected]> 2023-12-04 18:08 ` Greg Sabino Mullane <[email protected]> 2023-12-04 18:39 ` Steve Midgley <[email protected]> 2023-12-05 01:39 ` Bo Guo <[email protected]> 2023-12-05 04:12 ` Steve Midgley <[email protected]> 2023-12-05 04:23 ` Tom Lane <[email protected]> 2023-12-05 20:54 ` Greg Sabino Mullane <[email protected]>
This inbox is served by agora; see mirroring instructions for how to clone and mirror all data and code used for this inbox