From: lakshmi
Date: Mon, 13 Apr 2026 12:27:08 +0530
Subject: Re: Extension - multilingual_fuzzy_match : Multilingual phonetic matching extension for PostgreSQL
To: Blessy Thomas
Cc: pgsql-general@postgresql.org

Hello all,

I hope this mail finds you well.

I would like to inform you that, as my friend has moved on to another offer, I will be taking over her work on the multilingual_fuzzy_match extension. My name is Lakshmi, and I will be handling this work from now on. Please feel free to reach out to me with any queries, discussions, or updates.

Looking forward to working with you all.

Thank you.

Regards,
Lakshmi

On Mon, Apr 13, 2026 at 11:22 AM Blessy Thomas <blessy456bthomas@gmail.com> wrote:

---------- Forwarded message ---------
From: Blessy Thomas <blessy456bthomas@gmail.com>
Date: Mon, 2 Mar 2026 at 12:55
Subject: Extension - multilingual_fuzzy_match : Multilingual phonetic matching extension for PostgreSQL


Hello PostgreSQL Community,
I would like to introduce a PostgreSQL extension called multilingual_fuzzy_match. This extension enables multilingual name normalization, transliteration, and fuzzy phonetic matching directly inside PostgreSQL at query time.

1. What Problem It Solves
In multilingual datasets (especially Indian-language datasets), the same name may appear in:
- Different scripts
- Different transliterations
- Slight spelling variations
- Multiple languages

For example:
राम ≈ Raam ≈ رَام ≈ ராம்
Traditional equality or LIKE queries fail in such cases. Even trigram matching doesn’t fully address cross-script phonetic similarity.
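
To make the cross-script point concrete, here is a minimal Python sketch (not part of the extension) that compares two spellings of the same name. The trigram score is a simplified character-trigram Jaccard overlap, assumed here purely for illustration; it is not pg_trgm's exact algorithm, and the function names are hypothetical.

```python
def char_trigrams(text):
    """Return the set of 3-character windows of `text` (simplified, no padding)."""
    lowered = text.lower()
    return {lowered[i:i + 3] for i in range(len(lowered) - 2)}


def trigram_similarity(a, b):
    """Jaccard overlap of the two trigram sets; 0.0 when nothing is shared."""
    ta, tb = char_trigrams(a), char_trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


devanagari = "राम"
latin = "Raam"

# Equality fails: the two strings share no code points at all.
print(devanagari == latin)                    # False
print(set(devanagari) & set(latin.lower()))   # set()

# Trigram overlap is zero as well, because trigrams are built from code points.
print(trigram_similarity(devanagari, latin))  # 0.0

# Within a single script, a spelling variation still leaves some overlap.
print(trigram_similarity("Rahul", "Rahool") > 0)  # True
```

Any character-level comparison needs the two names brought into a common alphabet first, which is the gap transliteration is meant to close.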

2. What This Extension Does

- Detects the script of the input text
- Performs transliteration and normalization
- Generates a phonetic key
- Uses Levenshtein distance (via python-Levenshtein)
- Returns similarity-scored results
All of this happens inside PostgreSQL using PL/Python (plpython3u).
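
Two of the steps listed above can be sketched in plain Python. This is an illustration under assumptions, not the extension's code: `detect_script` and `levenshtein` are hypothetical helper names, script detection here relies on the first word of each character's Unicode name, and the edit distance is written out by hand where the extension calls python-Levenshtein. Transliteration (done with indic-transliteration in the extension) is left out.

```python
import unicodedata
from collections import Counter


def detect_script(text):
    """Guess the dominant script from Unicode character names (heuristic)."""
    counts = Counter()
    for ch in text:
        if ch.isspace():
            continue
        name = unicodedata.name(ch, "")
        if name:
            # e.g. "DEVANAGARI LETTER RA" -> "DEVANAGARI"
            counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else "UNKNOWN"


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


print(detect_script("राम"))            # DEVANAGARI
print(detect_script("Raam"))           # LATIN
print(levenshtein("rahul", "rahul"))   # 0
print(levenshtein("rahul", "rahil"))   # 1
```

In this sketch the distance would be computed on the normalized keys rather than on the raw text, so that names from different scripts are compared in one alphabet.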

3. Key Features
- No schema changes required
- Query-level matching
- Supports 11 major Indian scripts:
Devanagari, Tamil, Telugu, Bengali, Urdu, Malayalam, Kannada, Odia, Gujarati, Punjabi
- Works on existing tables

4. Requirements
- PostgreSQL 17 (compiled with Python support)
- Python 3.12+
- plpython3u
- Python packages:
   pip install indic-transliteration python-Levenshtein

5. Example Usage
----------------------------------------------------------------------
postgres=#
SELECT * FROM fuzzy_match('names_native_dist', 'name', 'Rahul')
WHERE distance <= 1;
 id | name  | translit | normalized | fuzzy | distance
----+-------+----------+------------+-------+----------
  1 | राहुल  | rAhula   | rahul      | rahul |        0
  2 | রাহুল  | rAhula   | rahul      | rahul |        0
  4 | ರಾಹುಲ್ | rAhul    | rahul      | rahul |        0
  5 | Rahul | Rahul    | rahul      | rahul |        0
(4 rows)
----------------------------------------------------------------------
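
The translit and normalized columns above suggest a normalization step that folds case and drops a word-final short vowel. The rule below is only a guess that happens to reproduce the four rows shown; this email does not describe the extension's actual rules, and `normalize_translit` is a hypothetical name.

```python
def normalize_translit(translit):
    """Toy normalizer (an assumption, not the extension's rule): drop a
    word-final short 'a', the inherent vowel that transliteration leaves on
    the last consonant, then lowercase. A long vowel written as capital 'A'
    is kept because the check runs before case folding."""
    if len(translit) > 1 and translit.endswith("a"):
        translit = translit[:-1]
    return translit.lower()


# Values copied from the translit column of the example output, keyed by id.
rows = {1: "rAhula", 2: "rAhula", 4: "rAhul", 5: "Rahul"}
keys = {row_id: normalize_translit(value) for row_id, value in rows.items()}
print(keys)  # {1: 'rahul', 2: 'rahul', 4: 'rahul', 5: 'rahul'}

# All four spellings collapse to the key of the query string, which is
# consistent with every row reporting distance 0.
query_key = normalize_translit("Rahul")
print(all(key == query_key for key in keys.values()))  # True
```

A rule this blunt would also truncate Latin-script names that genuinely end in "a", so a real implementation would presumably need to take the detected script into account.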

6. Feedback Requested

I would really appreciate feedback from the community on:
- Extension design approach
- Performance considerations
- Suitability for PGXN submission
I would love suggestions, improvements, and any guidance on making this production-ready. I’m sharing this not just as a project, but as a starting point for discussion about multilingual data handling inside PostgreSQL.

Looking forward to your thoughts and critiques.
Thank you!

Regards
Blessy Thomas