MIME-Version: 1.0
References: <CAJyyjtA_MwGY+_TgmixjJ8-pkTAUSnUwS3j7cJ2UMSbDjFmbjw@mail.gmail.com>
 <CAJyyjtA2=kVgcnZtYYinLqAyOQEmjX=rALM0LNPrh633M-FsvQ@mail.gmail.com>
In-Reply-To: <CAJyyjtA2=kVgcnZtYYinLqAyOQEmjX=rALM0LNPrh633M-FsvQ@mail.gmail.com>
From: lakshmi <lakshmigcdac@gmail.com>
Date: Mon, 13 Apr 2026 12:27:08 +0530
Message-ID: <CAEvyyTiqBtpvDHOZePdMPQNoXx6RQvuggAYCD1CO9HKiBtBBQg@mail.gmail.com>
Subject: Re: Extension - multilingual_fuzzy_match : Multilingual phonetic
 matching extension for PostgreSQL
To: Blessy Thomas <blessy456bthomas@gmail.com>
Cc: pgsql-general@postgresql.org
Content-Type: multipart/alternative; boundary="000000000000cafb4b064f51ef94"
Archived-At: <https://www.postgresql.org/message-id/CAEvyyTiqBtpvDHOZePdMPQNoXx6RQvuggAYCD1CO9HKiBtBBQg%40mail.gmail.com>
Precedence: bulk

--000000000000cafb4b064f51ef94
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Hello all,

I hope this mail finds you well.

As my friend has moved on to another offer, I will be taking over her work
on the multilingual_fuzzy_match extension. My name is Lakshmi, and I will
be the point of contact for it from now on. Please feel free to reach out
to me with any queries, discussions, or updates.

Looking forward to working with you all.

Thank you.

Regards,
Lakshmi

On Mon, Apr 13, 2026 at 11:22=E2=80=AFAM Blessy Thomas <blessy456bthomas@gm=
ail.com>
wrote:

>
>
> ---------- Forwarded message ---------
> From: Blessy Thomas <blessy456bthomas@gmail.com>
> Date: Mon, 2 Mar 2026 at 12:55
> Subject: Extension - multilingual_fuzzy_match : Multilingual phonetic
> matching extension for PostgreSQL
>
>
> Hello PostgreSQL Community,
>
> I would like to introduce a PostgreSQL extension called
> multilingual_fuzzy_match. This extension enables multilingual name
> normalization, transliteration, and fuzzy phonetic matching directly insi=
de
> PostgreSQL at query time.
>
> 1. What Problem It Solves
> In multilingual datasets (especially Indian language datasets), the same
> name may appear in:
> - Different scripts
> - Different transliterations
> - Slight spelling variations
> - Multiple languages
>
> For example:
> =E0=A4=B0=E0=A4=BE=E0=A4=AE =E2=89=88 Raam =E2=89=88 =D8=B1=D9=8E=D8=A7=
=D9=85 =E2=89=88 =E0=AE=B0=E0=AE=BE=E0=AE=AE=E0=AF=8D
> Traditional equality or LIKE queries fail in such cases. Even trigram
> matching doesn=E2=80=99t fully address cross-script phonetic similarity.
>
> 2. What This Extension Does
>
> - Detects the script of the input text
> - Performs transliteration and normalization
> - Generates a phonetic key
> - Uses Levenshtein distance (via python-Levenshtein)
> - Returns similarity-scored results
> All of this happens inside PostgreSQL using PL/Python (plpython3u).
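
For readers new to the thread, here is a rough standalone sketch of the
steps listed above. It is not the extension's code. The extension relies
on indic-transliteration and python-Levenshtein; this sketch uses only
the Python standard library and skips transliteration entirely. The
names detect_script, normalize and levenshtein are assumptions made for
illustration, and the "drop a final a" rule is only a guess based on the
rAhula / rahul rows in the example output.

```python
# Illustrative stand-ins only; not the extension's actual code.

# Unicode block ranges for the scripts named in the announcement.
SCRIPT_RANGES = [
    ("Arabic", 0x0600, 0x06FF),      # used for Urdu
    ("Devanagari", 0x0900, 0x097F),
    ("Bengali", 0x0980, 0x09FF),
    ("Gurmukhi", 0x0A00, 0x0A7F),    # used for Punjabi
    ("Gujarati", 0x0A80, 0x0AFF),
    ("Odia", 0x0B00, 0x0B7F),
    ("Tamil", 0x0B80, 0x0BFF),
    ("Telugu", 0x0C00, 0x0C7F),
    ("Kannada", 0x0C80, 0x0CFF),
    ("Malayalam", 0x0D00, 0x0D7F),
]


def detect_script(text):
    """Return the script of the first recognised letter in text."""
    for ch in text:
        code = ord(ch)
        for name, low, high in SCRIPT_RANGES:
            if low <= code <= high:
                return name
        if ch.isascii() and ch.isalpha():
            return "Latin"
    return "Unknown"


def normalize(romanized):
    """Lowercase, collapse repeated letters, drop a final 'a'.

    The final-'a' rule is an assumption, not the extension's rule.
    """
    out = []
    for ch in romanized.lower():
        if ch.isalpha() and (not out or out[-1] != ch):
            out.append(ch)
    key = "".join(out)
    if len(key) > 2 and key.endswith("a"):
        key = key[:-1]
    return key


def levenshtein(a, b):
    """Edit distance with unit-cost insert, delete and substitute."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


print(detect_script("\u0930\u093e\u092e"))                   # Devanagari
print(levenshtein(normalize("rAhula"), normalize("Rahul")))  # 0
```

With these stand-ins, the romanized forms rAhula, rAhul and Rahul all
reduce to the key rahul, which matches the distance of 0 shown for those
rows in the example output.
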
>
> 3. Key Features
> - No schema changes required
> - Query-level matching
> - Supports 11 major Indian scripts, including:
> Devanagari, Tamil, Telugu, Bengali, Urdu, Malayalam, Kannada, Odia,
> Gujarati, Punjabi
> - Works on existing tables
>
> 4. Requirements
> - PostgreSQL 17 (compiled with Python support)
> - Python 3.12+
> - plpython3u
> - Python packages:
>    pip install indic-transliteration python-Levenshtein
>
> 5. Example Usage
>
> -------------------------------------------------------------------------=
----------------------------------------------------
> postgres=3D#
> SELECT * FROM fuzzy_match('names_native_dist', 'name', 'Rahul')
> WHERE distance <=3D 1;
>  id | name  | translit | normalized | fuzzy | distance
> ----+-------+----------+------------+-------+----------
>   1 | =E0=A4=B0=E0=A4=BE=E0=A4=B9=E0=A5=81=E0=A4=B2  | rAhula   | rahul  =
    | rahul |        0
>   2 | =E0=A6=B0=E0=A6=BE=E0=A6=B9=E0=A7=81=E0=A6=B2  | rAhula   | rahul  =
    | rahul |        0
>   4 | =E0=B2=B0=E0=B2=BE=E0=B2=B9=E0=B3=81=E0=B2=B2=E0=B3=8D | rAhul    |=
 rahul      | rahul |        0
>   5 | Rahul | Rahul    | rahul      | rahul |        0
> (4 rows)
>
> -------------------------------------------------------------------------=
-------------------------------------------------------
>
> 6. Feedback Requested
>
> I would really appreciate feedback from the community on:
> - Extension design approach
> - Performance considerations
> - Suitability for PGXN submission
> I would love suggestions, improvements, and any guidance on making this
> production-ready. I=E2=80=99m sharing this not just as a project, but as =
a starting
> point for discussion about multilingual data handling inside PostgreSQL.
>
> Looking forward to your thoughts and critiques.
> Thank you!
>
> Regards
> Blessy Thomas
>

--000000000000cafb4b064f51ef94--