From: Darkhan
Date: Sun, 5 Apr 2026 18:32:37 +0500
To: pgsql-general@lists.postgresql.org
Subject: pg_kazsearch: Full-text search extension for Kazakh language

Hi all,

I built pg_kazsearch, a PostgreSQL extension that adds full-text search support for Kazakh. Currently there's no Kazakh dictionary, stemmer, or stop word list available in PostgreSQL, so anyone searching Kazakh text is stuck with trigram matching or application-level workarounds.
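The following sketch shows one way to see why trigram matching struggles with heavily suffixed Kazakh words. It roughly approximates pg_trgm's `similarity()` in Python. Each word is lowercased and padded with two leading spaces and one trailing space. The padded word is cut into three-character trigrams, and the score is the number of shared trigrams divided by the number of distinct trigrams overall. The helpers `trigrams` and `similarity` are hypothetical and skip some pg_trgm internals. The forms of қала ("city") were chosen for illustration. The 0.3 cutoff is pg_trgm's default `pg_trgm.similarity_threshold`, which the `%` operator uses.

```python
import re

# Default value of pg_trgm.similarity_threshold (used by the % operator).
DEFAULT_THRESHOLD = 0.3


def trigrams(text: str) -> set[str]:
    """Roughly approximate pg_trgm trigram extraction (illustrative only)."""
    result: set[str] = set()
    # pg_trgm treats runs of alphanumeric characters as words;
    # [^\W_] means "word characters except underscore".
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "  # two leading spaces, one trailing space
        result.update(padded[i : i + 3] for i in range(len(padded) - 2))
    return result


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (Jaccard index)."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


if __name__ == "__main__":
    query = "қала"  # "city"
    # "city", "of the cities", "from our cities"
    for form in ["қала", "қалалардың", "қалаларымыздан"]:
        score = similarity(query, form)
        # Roughly what the % operator checks against the threshold.
        verdict = "above threshold" if score >= DEFAULT_THRESHOLD else "below threshold"
        print(f"{form:<16} {score:.3f}  {verdict}")
```

In this sketch, the score falls from 1.0 to about 0.36 for қалалардың and to about 0.27 for қалаларымыздан. The last value is below the default threshold, even though all three words are forms of the same noun. Each added suffix introduces new trigrams that the bare query word does not share.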

Kazakh is agglutinative: a single word can carry five or six suffixes, which makes standard search approaches miss most relevant results. pg_kazsearch provides a custom Kazakh stemmer (with its core written in Rust), a stop word list, and a text search dictionary that plugs into the standard PostgreSQL FTS infrastructure, so GIN indexes, ts_rank, and phrase search all work out of the box.
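To make the idea of stemming concrete, here is a deliberately naive sketch. It is not pg_kazsearch's Rust stemmer, whose rules the post does not describe. The sketch repeatedly strips the longest known suffix from the end of a word until none applies, always keeping at least three characters of stem. It also drops a few stop words, which is roughly the normalization a text search dictionary performs before words become tsvector lexemes. The suffix list, the stop word list, and the helpers `stem`, `lexemes`, and `matches` are all hypothetical. `matches` only loosely mirrors `to_tsvector(...) @@ plainto_tsquery(...)`, which ANDs the query lexemes together.

```python
# Illustrative only: NOT pg_kazsearch's algorithm or word lists.
# A tiny hypothetical sample of Kazakh suffixes.
SUFFIXES = [
    "лар", "лер", "дар", "дер", "тар", "тер",   # plural
    "ымыз", "іміз", "ым", "ім", "сы", "сі",     # possessive (1pl, 1sg, 3rd after a vowel)
    "ның", "нің", "дың", "дің", "тың", "тің",   # genitive
    "ға", "ге", "қа", "ке",                     # dative
    "да", "де", "та", "те",                     # locative
    "дан", "ден", "тан", "тен", "нан", "нен",   # ablative
]
STOP_WORDS = {"және", "мен", "бен", "пен", "бұл", "ол", "үшін"}  # hypothetical sample
MIN_STEM = 3  # never cut a word below three characters


def stem(word: str) -> str:
    """Repeatedly strip the longest matching suffix until none applies."""
    word = word.lower()
    while True:
        candidates = [
            s for s in SUFFIXES
            if word.endswith(s) and len(word) - len(s) >= MIN_STEM
        ]
        if not candidates:
            return word
        word = word[: -len(max(candidates, key=len))]


def lexemes(text: str) -> set[str]:
    """Lowercase, split on whitespace, drop stop words, stem the rest."""
    tokens = (t.strip(".,;:!?\"'()") for t in text.lower().split())
    return {stem(t) for t in tokens if t and t not in STOP_WORDS}


def matches(query: str, document: str) -> bool:
    """True if every query lexeme occurs in the document (an AND query).

    A query made only of stop words matches nothing.
    """
    wanted = lexemes(query)
    return bool(wanted) and wanted <= lexemes(document)


if __name__ == "__main__":
    for w in ["қала", "қаладан", "қалалардың", "қалаларымыздан"]:
        print(w, "->", stem(w))
    print(matches("қала", "Біз қалаларда тұрамыз"))  # "We live in cities"
```

Each of the four forms reduces to қала, so the one-word query matches the sentence. A stripper this simple will still overstem and understem. It has no vowel-harmony checks, it misses consonant alternations such as п/б (кітап becomes кітабы), and it would cut "suffixes" off words that only happen to end in those letters. A production stemmer needs considerably more rules than this.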

I tested it on a dataset of 3,000 real Kazakh news articles. For the same query, pg_kazsearch returns 61 relevant articles versus 1 with trigram search, and recall improves by 23% overall.
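Recall is the fraction of all relevant documents that a search actually returns. A "23% improvement" in recall can mean two different things. It can be a relative gain, where recall grew by 23% of its previous value. It can also be an absolute gain of 23 percentage points. The two readings differ a lot when baseline recall is low. The sketch below computes recall and both kinds of gain on made-up toy document IDs. It does not reproduce the pg_kazsearch benchmark, and `recall` and `gains` are hypothetical helper names.

```python
def recall(retrieved: set[int], relevant: set[int]) -> float:
    """Fraction of the relevant documents that appear in the results."""
    if not relevant:
        raise ValueError("recall is undefined when nothing is relevant")
    return len(retrieved & relevant) / len(relevant)


def gains(baseline: float, improved: float) -> tuple[float, float]:
    """Return (absolute gain, relative gain) between two recall values."""
    absolute = improved - baseline  # multiply by 100 for percentage points
    # Relative gain is undefined when baseline recall is zero.
    relative = absolute / baseline if baseline else float("nan")
    return absolute, relative


if __name__ == "__main__":
    # Toy data with hypothetical document IDs, NOT the real evaluation.
    relevant = set(range(1, 11))            # 10 relevant documents
    baseline_hits = {1, 2, 3, 4, 5, 42}     # finds 5 of them
    improved_hits = {1, 2, 3, 4, 5, 6, 99}  # finds 6 of them

    r_base = recall(baseline_hits, relevant)
    r_new = recall(improved_hits, relevant)
    absolute, relative = gains(r_base, r_new)
    print(f"recall {r_base:.2f} -> {r_new:.2f}")
    print(f"absolute gain: {absolute * 100:.1f} points, relative gain: {relative:.0%}")
```

On this toy data, the same change is a 10-point absolute gain but a 20% relative gain.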

You can install it with a single command from a deb package or a Docker image; no compilation is needed.

Repo: https://github.com/darkhanakh/pg-kazsearch

I'd appreciate any feedback, especially from anyone working on text search internals or with experience supporting non-Latin or agglutinative languages in PostgreSQL.

Thanks, Darkhan
