From: Darkhan <[email protected]>
To: [email protected]
Subject: pg_kazsearch: Full-text search extension for Kazakh language
Date: Sun, 5 Apr 2026 18:32:37 +0500
Message-ID: <CAOW9cEpjUV0fG6u6m86vt8RJOBLymys=k33DWzgEP+0SnXhZGA@mail.gmail.com>
Hi all,
I built pg_kazsearch, a PostgreSQL extension that adds full-text search
support for Kazakh. Currently there's no Kazakh dictionary, stemmer, or
stop word list available in PostgreSQL, so anyone searching Kazakh text is
stuck with trigram matching or application-level workarounds.
Kazakh is agglutinative: a single word can carry 5-6 suffixes, so search
that matches surface forms without stemming misses most relevant results.
pg_kazsearch provides a custom Kazakh stemmer (core written in Rust), a
stop word list, and a text search dictionary that plugs into the standard
PostgreSQL FTS infrastructure, so GIN indexes, ts_rank, and phrase search
all work out of the box.
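To make the agglutination problem concrete, here is a toy suffix-stripping
sketch in Rust. It is not the stemmer that pg_kazsearch ships: the function
name `toy_stem`, the tiny suffix list, and the minimum stem length are all
assumptions made up for illustration, and a real stemmer has to handle vowel
harmony, consonant alternation, and many more suffixes.

```rust
/// A small, hand-picked suffix list. Illustrative only, far from complete.
const SUFFIXES: &[&str] = &[
    "лар", "лер", "дар", "дер", "тар", "тер", // plural
    "ымыз", "іміз",                           // 1st person plural possessive
    "да", "де", "та", "те",                   // locative case
];

/// Minimum number of characters a stem must keep (hypothetical threshold).
const MIN_STEM_CHARS: usize = 3;

/// Repeatedly strip the longest matching suffix until none applies.
fn toy_stem(word: &str) -> String {
    let mut stem = word.to_lowercase();
    loop {
        // Longest suffix that matches and still leaves a long enough stem.
        let best = SUFFIXES
            .iter()
            .filter(|s| stem.ends_with(**s))
            .filter(|s| stem.chars().count() - s.chars().count() >= MIN_STEM_CHARS)
            .max_by_key(|s| s.chars().count());
        match best {
            Some(s) => {
                // Safe byte offset: the stem ends with `s`, so this is a char boundary.
                let new_len = stem.len() - s.len();
                stem.truncate(new_len);
            }
            None => return stem,
        }
    }
}

fn main() {
    // "school", "schools", "in our schools"
    for w in ["мектеп", "мектептер", "мектептерімізде"] {
        println!("{} -> {}", w, toy_stem(w));
    }
}
```

All three forms reduce to the same stem, which is what lets a stemmed index
match a query for one form against documents containing another.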
I tested it on a dataset of 3,000 real Kazakh news articles. For the same
query, pg_kazsearch returns 61 relevant articles where trigram search
returns 1, and recall improves by 23% overall.
You can install it with a single command via a deb package or a Docker
image; no compilation is needed.
Repo: https://github.com/darkhanakh/pg-kazsearch
I'd appreciate any feedback, especially from anyone working on text search
internals or with experience supporting non-Latin or agglutinative
languages in PostgreSQL.
Thanks, Darkhan