From: Steve Midgley <[email protected]>
To: Greg Sabino Mullane <[email protected]>
Cc: Bo Guo <[email protected]>
Cc: [email protected]
Subject: Re: Overcoming Initcap Function limitations?
Date: Mon, 4 Dec 2023 10:39:12 -0800
Message-ID: <CAJexoSK3grH+4khLYMDH=dBwbZxx=F-5GUM1rzHh5fzT_GGuwg@mail.gmail.com>
In-Reply-To: <CAKAnmmKhepxPzLWiU2JMNWsNiwLmQAyKCgV-f2dMot1hKVa-hA@mail.gmail.com>
References: <CADHFRch1=ND1TpiOidJELBdJJSugn-H1d3vyjg9gh9Nw1dZCgg@mail.gmail.com>
	<CAKAnmmKhepxPzLWiU2JMNWsNiwLmQAyKCgV-f2dMot1hKVa-hA@mail.gmail.com>

On Mon, Dec 4, 2023 at 10:09 AM Greg Sabino Mullane <[email protected]>
wrote:

> It's not clear exactly what you are trying to achieve, but you can use
> Postgres' built-in text searching system to exclude stopwords. For example:
>
> CREATE FUNCTION initcap_realword(myword TEXT)
>   returns TEXT language SQL AS
> $$
> SELECT CASE WHEN length(to_tsvector(myword)) < 1
>   THEN myword ELSE initcap(myword) END;
> $$;
>
> You could extend that to multi-word strings with a little effort. However,
> knowing that macdonald should be MacDonald requires a lot more intelligence
> than is provided by any Postgres built-in system or extension that I know
> of. What you are looking at is the field of science known as Natural
> Language Processing, which can get very complex very quickly. But for a
> Postgres answer, you might combine plpython3u with spacy (
> https://spacy.io/usage/spacy-101).
>
> Cheers,
> Greg
>
I've been having some pretty good experiences with "hard" text
transformations, such as correct capitalization of names like MacDonald,
using the GPT 3.5 Turbo API, which is pretty cheap for the volume of data
I've been working with. It seems like spaCy might do similar things, and if
it can be run locally, it might be much cheaper than a rental API.
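
For comparison, here is a minimal local sketch in plain Python of the
multi-word idea Greg mentions. It is not spaCy and not the GPT API: the
stopword set and the table of special-case names are hypothetical,
hand-maintained stand-ins, and the function name smart_initcap is made up
for illustration. Unlike Greg's SQL function, it lowercases stopwords rather
than leaving them untouched.

```python
# Hypothetical, hand-maintained tables (assumptions for illustration only;
# they are not taken from Postgres, spaCy, or any other library).
STOPWORDS = {"a", "an", "and", "of", "the", "in", "on", "for", "to"}
SPECIAL_CASES = {
    "macdonald": "MacDonald",
    "mcdonald": "McDonald",
    "o'brien": "O'Brien",
}


def smart_initcap(text: str) -> str:
    """Capitalize each whitespace-separated word.

    Stopwords stay lowercase unless they start the string, and names
    listed in SPECIAL_CASES use their stored spelling.
    """
    out = []
    for i, word in enumerate(text.split()):
        key = word.lower()
        if key in SPECIAL_CASES:
            # Known name: use the curated capitalization.
            out.append(SPECIAL_CASES[key])
        elif key in STOPWORDS and i > 0:
            # Stopword that is not the first word: keep it lowercase.
            out.append(key)
        else:
            # Default: uppercase the first character, lowercase the rest.
            out.append(key[:1].upper() + key[1:])
    return " ".join(out)


if __name__ == "__main__":
    print(smart_initcap("ronald macdonald of the isles"))
```

A lookup table like this only covers names someone has already listed, which
is the limitation that makes an NLP library or an API attractive for the
long tail.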

Steve

