Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vOz5a-00G9YO-1u for pgsql-general@arkaria.postgresql.org; Fri, 28 Nov 2025 14:04:46 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vOz5Z-00Bkep-0c for pgsql-general@arkaria.postgresql.org; Fri, 28 Nov 2025 14:04:45 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vOva3-00AkMj-39 for pgsql-general@lists.postgresql.org; Fri, 28 Nov 2025 10:20:00 +0000 Received: from mail-ed1-x534.google.com ([2a00:1450:4864:20::534]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1vOva1-001uES-2J for pgsql-general@lists.postgresql.org; Fri, 28 Nov 2025 10:19:59 +0000 Received: by mail-ed1-x534.google.com with SMTP id 4fb4d7f45d1cf-640aaa89697so2566680a12.3 for ; Fri, 28 Nov 2025 02:19:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=20230601; t=1764325197; x=1764929997; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=MYpSDAbyLqe2iI4rB+NQbEXWnAXypQP9QbJuKcNV3YQ=; b=Xz+84MZWI3Aj+fsNNVSFZw8rXvSPpEFRR/hru3e+DL7XRScCFpLCrkEnuN+koVbjfF nz3jo2dzZn061yKh4XN/+tdL1o1tz9CBqkQoMfiy945DPAnHfCwJdK38cRz57+CZhQv0 ds0dNiNPJrjN4/b81EkhJVncoTeZdOhMbRnkRcFr5QTYXPg0k/nM5tVNbIj/5hFGNSWv XfKVWorfy5C8aCiylSpCemrRvQAlLEoeibScFCnM8nYn0bXS9cXJGlFpIUuxG56iMkf/ oNFvFF3bDVpD5VHAcSefaQNeC5gzgBQYEIBgTUOxIaw0quk0LV93thXTKTW925CmuUF0 GkoA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1764325197; x=1764929997; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=MYpSDAbyLqe2iI4rB+NQbEXWnAXypQP9QbJuKcNV3YQ=; b=RjgNmMaMtcSJLnmW4ljwu4qwOfwDxxf7LWMXN5Ggwzi68l22p8iAXyLCKPmDAMK0p4 OqaReYhNV2Ez0vhsN7zbbQ6skp8pY0znw2y+XsgP769EHsVmCVDFhLxl3MC0rE4nvxGv 8Lkw5s4dYUnSCqkLbk1lUUWM1ZxA2tg2jHohAZwaS6dcE5vl5oIyecrz8+HkfHOhYFd/ ukopflEkY+QU4vp14iWqJ9PXgY418m50/j2jBbSsTrckoXi1ZSZadJjZPWdM9kSU2AGC geYBZ5QaHs5PbRADENABcVqlDmHlDdL1viL2OXbjq25zcJug1VJZCsaMaEBqS/UlPtgq ZGpw== X-Gm-Message-State: AOJu0YwMo7QymSYNOxvuSRKf77/CDGBGYIwX24/8NXo9R3RrNpgCllmi gbyIjnwStbyif9cpYamNYNufu4tcTkNVegK69hwoIrumd/AXSoDjA8tYF32uPAC+if9sGrmHd1F 4KpyVEJ9R/+/2YXhcNEEZ2L9aYHRqhtQ= X-Gm-Gg: ASbGncvC6npcchXDdpRXwdctbi/LGcGJ+OuYhKvtlx+bmNa/qowpS1WU5poYHruKLe+ W2OO5l0DffwMBGoGkAnnEQediSvcNtRGVJO5QpXSYmPPGlN5biBDNijQNe7+DKwhEioZPLOHGz4 kFlJ2qtXG+9za9RFVkt590D6Kk0GQEz8IMNZ1qEEzami966w1/YEkHI3LzkkPr2A9/jmCktGhG7 1loJwUkjlkHUiZmQxKj0juKOo2GoAUtmnm3DdXE4YAFW0jvdfuYCVFyIPz0s501NbK8Kw0J6QYX UTgYty4= X-Google-Smtp-Source: AGHT+IF/X6K+UC2YAYZdZD3LX3vKR2wsUOj/QJFBiOwy+OMXga5NusDdt1H0b5gTHA1+Ie2hK/wDC5qDk2C0OHGRnZY= X-Received: by 2002:a05:6402:1d50:b0:640:a03a:af98 with SMTP id 4fb4d7f45d1cf-64554685738mr24136928a12.18.1764325196476; Fri, 28 Nov 2025 02:19:56 -0800 (PST) MIME-Version: 1.0 References: <20251126154844.e6e3d5c7f476cc966fc201f4@wanadoo.fr> <29ffee7e-40a8-f911-4419-24ba8558f5db@appl-ecosys.com> <20251126164356.9a3b7fc25d5118e28175f9a4@wanadoo.fr> <989b6fd-2979-8ac6-13b-3916d2e47de8@appl-ecosys.com> In-Reply-To: From: Jacqui Caren Date: Fri, 28 Nov 2025 10:19:43 +0000 X-Gm-Features: AWmQ_bl4XZ6UsgARFcnR0uWREK-rIANGTznhQRrJM4MYpIHO5PSqGqln4xfUhRU Message-ID: Subject: Re: Selecting all variations of job title in a list To: Bryan Sayer Cc: pgsql-general@lists.postgresql.org Content-Type: multipart/alternative; boundary="0000000000009ab9090644a4f89e" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000009ab9090644a4f89e Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Many years ago I used a weak precedence engine to categorize this form of job title script to a job code. The regexp did not work because we had asst to gen mgr The wpe tokenized the words then relaxed token patterns with other token codes with cumulative context The final token reduction resulted in a final code or if in my example we had a modifier to role code (asst to xxxx role) The entire engine was created in Oracle but would be easy to implement in pgsql. Back then neural nets were only just appearing in finance and llms were non existent. Old 1960's tech saved the day On Wed, Nov 26, 2025, 17:02 Bryan Sayer wrote: > I am not very skilled at Postgresql specifically, but when I was doing SQ= L > in another environment I would just do > > select distinct (or unique) jobtitle > > usually getting a count of how many times each title occurred. Then I > would create a mapping to standardize the the job titles. > *Bryan Sayer* > Retired Demographer/Statistician > *In a world in which you can be anything, be kind* > On 11/26/2025 11:10 AM, Rich Shepard wrote: > > On Wed, 26 Nov 2025, David G. Johnston wrote: > > I was using this tool a while back when I was doing heavy regex work. > > https://www.regexbuddy.com/ > > Keep in mind the native flavor of regex in PostgreSQL is TCL, not Perl. > > But I=E2=80=99d still say regexp is not the best solution here - unless y= ou > encapsulate the logic in a function. I suspect you=E2=80=99ll want to us= e this > logic in more than just a single query and with a literal regexp you have > to rely on manual synchronization. Note, you could combine the lookup > table with regexes. Though beware of ensure you don=E2=80=99t produce du= plicate > matches if you go that route. > > > David, > > Thanks, > > Rich > > > --0000000000009ab9090644a4f89e Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Many years ago I used a weak precedence engine to categor= ize this form of job title script to a job code.

The regexp did not work because we had asst to gen mgr
The wpe tokenized the words then relaxed token patter= ns with other token codes with cumulative context
Th= e final token reduction resulted in a final code or if in my example we had= a modifier to role code (asst to xxxx role)

The entire engine was created in Oracle but would be e= asy to implement in pgsql. Back then neural nets were only just appearing i= n finance and llms were non existent.

Old 1960's tech saved the day

On Wed, Nov 26, 2025, 17:02 Bryan Sayer <blslists@gmail.com> wrote:
=20 =20 =20

I am not very skilled at Postgresql specifically, but when I was doing SQL in another environment I would just do

select distinct (or unique) jobtitle

usually getting a count of how many times each title occurred. Then I would create a mapping to standardize the the job titles.

Bryan Sayer
Retired Demographer/Statistician
In a world in which you can be anything, be kind
On 11/26/2025 11:10 AM, Rich Shepard wrote:
On Wed, 26 Nov 2025, David G. Johnston wrote:

I was using this tool a while back when I was doing heavy regex work.

https://www.regexbuddy.com/

Keep in mind the native flavor of regex in PostgreSQL is TCL, not Perl.

But I=E2=80=99d still say regexp is not the best solution here - un= less you
encapsulate the logic in a function.=C2=A0 I suspect you=E2=80=99ll= want to use this
logic in more than just a single query and with a literal regexp you have
to rely on manual synchronization.=C2=A0 Note, you could combine th= e lookup
table with regexes.=C2=A0 Though beware of ensure you don=E2=80=99t= produce duplicate
matches if you go that route.

David,

Thanks,

Rich


--0000000000009ab9090644a4f89e--