From: Adrien Nayrat <[email protected]>
To: Darkhan <[email protected]>
To: [email protected]
Subject: Re: pg_kazsearch: Full-text search extension for Kazakh language
Date: Wed, 8 Apr 2026 16:42:21 +0200
Message-ID: <[email protected]> (raw)
In-Reply-To: <CAOW9cEpjUV0fG6u6m86vt8RJOBLymys=k33DWzgEP+0SnXhZGA@mail.gmail.com>
References: <CAOW9cEpjUV0fG6u6m86vt8RJOBLymys=k33DWzgEP+0SnXhZGA@mail.gmail.com>

On 4/5/26 3:32 PM, Darkhan wrote:
> Hi all,
> 
> I built pg_kazsearch, a PostgreSQL extension that adds full-text search
> support for Kazakh. Currently there's no Kazakh dictionary, stemmer, or
> stop word list available in PostgreSQL, so anyone searching Kazakh text is
> stuck with trigram matching or application-level workarounds.
> 
> Kazakh is agglutinative — a single word can carry 5-6 suffixes, which makes
> standard search approaches miss most relevant results. pg_kazsearch
> provides a custom Kazakh stemmer (core written in Rust), a stop word list,
> and a text search dictionary that plugs into the standard PostgreSQL FTS
> infrastructure — GIN indexes, ts_rank, phrase search all work out of the
> box.
> 
> I tested it on a dataset of 3,000 real Kazakh news articles. On the same
> query, pg_kazsearch returns 61 relevant articles vs 1 with trigram search,
> with a 23% improvement in recall overall.
> 
> You can install it with a single command via deb package or Docker image,
> no compilation needed.
> 
> Repo: https://github.com/darkhanakh/pg-kazsearch
> 
> I'd appreciate any feedback, especially from anyone working on text search
> internals or with experience supporting non-Latin or agglutinative
> languages in PostgreSQL.
> 
> Thanks, Darkhan
> 

Hello,

Thanks for your work.
I don't know anything about Kazakh.

But have you tried adding it to the Snowball stemmer [1]?
Since Postgres uses Snowball, that would give Kazakh a better
chance of being supported in future versions.


[1] https://github.com/snowballstem/snowball

-- 
Adrien NAYRAT
https://pro.anayrat.info





