Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rADN9-005SQ6-Ni for pgsql-sql@arkaria.postgresql.org; Mon, 04 Dec 2023 18:08:47 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1rADN7-000gsV-Se for pgsql-sql@arkaria.postgresql.org; Mon, 04 Dec 2023 18:08:45 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rADN7-000gsN-JB for pgsql-sql@lists.postgresql.org; Mon, 04 Dec 2023 18:08:45 +0000 Received: from mail-lj1-x232.google.com ([2a00:1450:4864:20::232]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1rADN4-008n00-Tn for pgsql-sql@lists.postgresql.org; Mon, 04 Dec 2023 18:08:44 +0000 Received: by mail-lj1-x232.google.com with SMTP id 38308e7fff4ca-2c9f8faf57bso23069891fa.3 for ; Mon, 04 Dec 2023 10:08:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1701713320; x=1702318120; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=8JtRpBOsIh0Gr/2wiJACnCsFQMuz4/EaRXLQOPvF5MM=; b=kaWZXW2ooZor7VoCgWbw44iQi9aTcIuh4tfBEQmK7+pih8SGBzhdDRgBimCR/Ak05a CpD1ylbBgcdUFHGYl8DGKzb5Pu1wBIaErpgZti9wt9CH0xvVRRazpwTuyM8QIiZX23zR L6fyepQ6l+85qZwzxOVGdnLyHIlkPvo3bMJA3JJVwdYt9Uww0Ail9NADXakdJHYzaiKL /s/kZmnQTRBdInuRJe0atqbTmoirNbGnUyLSJWZEHOSaevfInsMWPVF6a/BDu8Tu2ubV WxhZEU1TINt8WbxM10zosnELYvl0IBS0e5VChVHbaCERLqk1K17wgmNYLAkVpRId2mkH Dv3w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701713320; x=1702318120; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=8JtRpBOsIh0Gr/2wiJACnCsFQMuz4/EaRXLQOPvF5MM=; b=BuOoKAT0hPp5497+D6LdfsGl1kjZxJQeQn55hVekWQFrRZDVkFTOo7KrXemVpgYV9G lEIbhm+ZCWW5cDetyayxaxD7Mac2xH4KY7gFMwgGx6K7zIctaIIITdBE0ZSZQ7is0KCi MAKfDesaJVZv97c58XgtwGh7Ley1yH9iY/FoZeUUbM4f0jiQ2/RCspZXmt14i8zu5+0S poivg7mDf/Anz8ClWzqhvS07DJdebUID9/AKeBCHmlG2gSy3lFQrs1OWbHlwuUjIraqy chLmKIM8I8gbchhU/HMtssNy364/Wld1TPaEb+NUG4aueBu9lnA0uEj+2H3LjSlj/9uH iVFg== X-Gm-Message-State: AOJu0YxEqAXn37rDAb/vXI5Ws+mvk9RzzvIj+EIdg1O9dz/dVqkeqUEW KjOXwOnVNmbcvIb1SrclCDxr5Euu2l0xh7sU8g0= X-Google-Smtp-Source: AGHT+IGq8Xwji5HXSQlEVVbiwUxJg/6QoHQhmrWbT99cWl+kWnHBDENWD6jry92BZKiJMFs+fy2w5qSOO+yHjJYvLgI= X-Received: by 2002:a05:6512:31c4:b0:50b:d764:9682 with SMTP id j4-20020a05651231c400b0050bd7649682mr4004247lfe.102.1701713319831; Mon, 04 Dec 2023 10:08:39 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Greg Sabino Mullane Date: Mon, 4 Dec 2023 13:08:03 -0500 Message-ID: Subject: Re: Overcoming Initcap Function limitations? To: Bo Guo Cc: pgsql-sql@lists.postgresql.org Content-Type: multipart/alternative; boundary="000000000000f02fe5060bb30142" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000f02fe5060bb30142 Content-Type: text/plain; charset="UTF-8" It's not clear exactly what you are trying to achieve, but you can use Postgres' built-in text searching system to exclude stopwords. For example: CREATE FUNCTION initcap_realword(myword TEXT) returns TEXT language SQL AS $$ SELECT CASE WHEN length(to_tsvector(myword)) < 1 THEN myword ELSE initcap(myword) END; $$; You could extend that to multi-word strings with a little effort. However, knowing that macdonald should be MacDonald requires a lot more intelligence than is provided by any Postgres built-in system or extension that I know of. What you are looking at is the field of science known as Natural Language Processing, which can get very complex very quickly. But for a Postgres answer, you might combine plpython3u with spacy ( https://spacy.io/usage/spacy-101). Cheers, Greg --000000000000f02fe5060bb30142 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
It's not clear exactly what you are trying to achieve,= but you can use Postgres' built-in text searching system to exclude st= opwords. For example:

CREATE FUNCTION initcap_realword(m= yword TEXT)
=C2=A0 returns TEXT language SQL AS=C2=A0
$= $
SELECT CASE WHEN length(to_tsvector(myword)) < 1
= =C2=A0 THEN myword=C2=A0ELSE initcap(myword) END;
$$;

You could extend that to multi-word strings with a little= effort. However, knowing that macdonald should be MacDonald requires a lot= more intelligence than is provided by any Postgres built-in system or exte= nsion that I know of. What you are looking at is the field of science known= as Natural Language Processing, which can get very complex very quickly. B= ut for a Postgres answer, you might combine plpython3u with spacy (https://spacy.io/usage/spacy-101)= .

Cheers,
Greg

--000000000000f02fe5060bb30142--