Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wQUyg-001b9X-0N for pgsql-announce@arkaria.postgresql.org; Fri, 22 May 2026 18:52:10 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1wQUyd-00ERVN-2C for pgsql-announce@arkaria.postgresql.org; Fri, 22 May 2026 18:52:08 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wQUyc-00ERV5-20 for pgsql-announce@lists.postgresql.org; Fri, 22 May 2026 18:52:07 +0000 Received: from mahout.postgresql.org ([2001:4800:3e1:1::227]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.98.2) (envelope-from ) id 1wQUyb-00000000JKP-1nMg for pgsql-announce@lists.postgresql.org; Fri, 22 May 2026 18:52:06 +0000 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=postgresql.org; s=20171124; h=Message-ID:Date:Reply-To:From:To:Subject: MIME-Version:Content-Type:Sender:Cc:Content-Transfer-Encoding:Content-ID: Content-Description:In-Reply-To:References; bh=LmhqyYuN1A+tF/jmiE7EDIwRqwudaAdJL6d3RKRh918=; b=pfK6OSF6ftVM4GsyjZWz1jPXUJ R4R4jjOsHeCfTitrqaJ1VkhDxiKaDTCTATexFjSX83OELRhU9MSQibrzLGuxUAl6nE9bMrJpu6RTw q4L0MiCMhP/2POCrRiEoTNX3RiIAnREgJfzq6WEvy+PbG/oLtfQClj3m7sa//jeAMdBRH2JbYyqUa un7cEzi+gcWaHxZS+LuEUlAx3XAM8Mj/6mMw9rGJ8VZZs2QmDShCSh0XPJUYetEz0Q7DgrWGmGrEW 1UxcO8qT+7Guj9lT5huVoa17rixVYzBPdzFRkKTMtudSUPYXERIwtSXwe9KKWpQIozqcR6RR76qKl MTCTj3GA==; Received: from wrigleys.postgresql.org ([217.196.149.60]) by mahout.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wQUya-000udD-2q for pgsql-announce@lists.postgresql.org; Fri, 22 May 2026 18:52:05 +0000 Received: from localhost ([127.0.0.1] helo=wrigleys.postgresql.org) by wrigleys.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1wQUyZ-005AzQ-2F for pgsql-announce@lists.postgresql.org; Fri, 22 May 2026 18:52:03 +0000 Content-Type: multipart/alternative; boundary="===============2621389286423571655==" MIME-Version: 1.0 Subject: pg_tre 1.1.1 released -- an approximate-REGEX index AM for PostgreSQL 18+ To: PostgreSQL Announce From: Greg Burd via PostgreSQL Announce Reply-To: greg@burd.me Date: Fri, 22 May 2026 18:51:48 +0000 Message-ID: <177947590891.802.11629769603682771181@wrigleys.postgresql.org> X-Auto-Response-Suppress: All Auto-Submitted: auto-generated X-pglister-tags: community,related X-pglister-tagsig: ffa0ae267ddfc03c7e7e99d74a948b982d7899c7fd34d4138e1d2830ab101c42 List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --===============2621389286423571655== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable I am pleased to announce the first public release of [pg_tre] (https://code= berg.org/gregburd/pg_tre), a native PostgreSQL 18+ index access method for = approximate-regex matching. pg_tre indexes text columns through a three-tier filter funnel (BRIN-style = range bloom -> sparsemap trigram postings -> per-tuple bloom) backed by Vil= le Laurikari's TRE library for the heap recheck. The result is genuine Leve= nshtein-distance regex matching ("find text within k edits of this pattern"= ) driven through a real IndexAmRoutine, with WAL coverage, VACUUM awareness= , and REINDEX CONCURRENTLY support. Highlights ---------- * Custom IndexAmRoutine registered as USING tre, with custom rmgr (id 140) and full crash-recovery / streaming-replication coverage validated by TAP tests. * Edit-distance regex with per-subexpression budgets, e.g. body %~~ tre_pattern('(error){~1}.*(42[0-9]){~0}', 1) runs as a single indexed bitmap heap scan. * UTF-8 codepoint trigrams: CJK, accented characters, and emoji are indexed correctly; ASCII pays zero overhead. * DoS protection via configurable caps on NFA states, compile time, and per-match runtime. * Three-tier funnel cuts heap I/O dramatically: queries that return a handful of rows out of millions typically finish in sub-millisecond time. * Backward-compatible tre_amatch* UDFs from 0.1.0 are preserved. How pg_tre fits alongside what you already have ----------------------------------------------- PostgreSQL already ships strong text-search primitives. pg_tre is meant to = complement them, not replace them. * pg_trgm (GIN/GiST): exact regex, LIKE, and trigram-set similarity. Battle-tested. pg_trgm % is Jaccard similarity over trigram sets, which is not the same as edit distance: two strings with overlapping trigrams can score "similar" even when their Levenshtein distance is huge. Use pg_trgm when you need exact-substring or LIKE acceleration. * tsvector / tsquery (built-in FTS): word-level linguistic search with stemming, stopwords, ranking, and language configuration. Use FTS for natural-language prose. pg_tre is language-agnostic, which is a feature for identifiers, SKUs, error codes, and log lines, and a non-feature for "running" matching "run". * pgvector / pgvectorscale: semantic similarity over float embeddings. Orthogonal to pg_tre -- meaning vs. lexical structure. The two compose naturally as a hybrid filter (lexical pre-filter with pg_tre, semantic rank with pgvector). * pg_tre: approximate regex with explicit edit budgets and full regex semantics (character classes, alternation, anchors, {m,n} repetition) composable with the {~k} edit operator. No other in-tree or out-of-tree PostgreSQL extension answers "is this text within N edits of this regex?" through an index. A few things only pg_tre does well ---------------------------------- * Typo-tolerant log and trace search: body %~~ tre_pattern('(timeout){~1}.*(connection){~1}', 1) finds "timeoutt" and "conection" in the same query. * Catalog and SKU lookup with edit tolerance: sku %~~ tre_pattern('AB-9?[0-9]{4}', 1) catches dropped digits or transposed characters without expensive post-filtering. * Per-phrase edit budgets in a single index query: body %~~ tre_pattern('(postgres){~2}.*(system){~0}', 0) expresses "postgres-ish" with strict "system" -- one round trip, no application-level reranking. * Hybrid retrieval for agent / RAG pipelines: pg_tre is the fuzzy-lexical leg that pg_trgm, FTS, and pgvector cannot cover on their own. An LLM-generated identifier with a typo, an OCR error in a scanned document, a near-duplicate error code -- pg_tre catches them all without sacrificing regex expressivity. Installation ------------ -- Requires shared_preload_libraries =3D 'pg_tre' CREATE EXTENSION pg_tre; CREATE INDEX docs_body_tre ON docs USING tre (body); SELECT id FROM docs WHERE body %~~ tre_pattern('database', 1); A note on stability and feedback -------------------------------- pg_tre is new software. The on-disk format is stable from 1.0.0 forward (1.= 1.x is byte-compatible with 1.0.0), the SQL surface is fixed, the WAL recor= ds are versioned, and the test suite covers the storage, query, recovery, a= nd replication paths -- but the project is young and almost certainly has b= ugs that the existing tests do not yet provoke. It is not a beta and not a = research toy; it is real software released early, and it should be used wit= h appropriate caution. Bug reports, feature requests, and pull requests are very welcome. If you c= an attach a reproduction case (a minimal schema and a query that misbehaves= ), that is the most useful form, but anything is better than nothing. Links ----- * Repository: [https://codeberg.org/gregburd/pg_tre](https://codeberg.org= /gregburd/pg_tre)=20 * Issues: [https://codeberg.org/gregburd/pg_tre/issues](https://codeb= erg.org/gregburd/pg_tre/issues) * TRE library: [https://github.com/laurikari/tre](https://github.com/lauri= kari/tre) --===============2621389286423571655== Content-Type: text/html; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable pg_tre 1.1.1 released -- an approximate-REGEX index AM for Postg= reSQL 18+
 

pg_tre 1.1.1 released -- an approximate-REGEX index AM for PostgreSQL= 18+

I am pleased to announce the first public r= elease of [pg_tre] (https://codeberg.org/gregburd/pg_tre), a native Postgre= SQL 18+ index access method for approximate-regex matching.

pg_tre indexes text columns through a three= -tier filter funnel (BRIN-style range bloom -> sparsemap trigram posting= s -> per-tuple bloom) backed by Ville Laurikari's TRE library for the he= ap recheck. The result is genuine Levenshtein-distance regex matching ("fin= d text within k edits of this pattern") driven through a real IndexAmRoutin= e, with WAL coverage, VACUUM awareness, and REINDEX CONCURRENTLY support.

Highlights

  • Custom IndexAmRoutine registered as USING t= re, with custom rmgr (id 140) and full crash-recovery / streaming-replication coverage validated by TAP tests.

  • Edit-distance regex with per-subexpression = budgets, e.g. body %~~ tre_pattern('(error){~1}.*(42[0-9]){~0}', 1) runs as a single indexed bitmap heap scan.

  • UTF-8 codepoint trigrams: CJK, accented cha= racters, and emoji are indexed correctly; ASCII pays zero overhead.

  • DoS protection via configurable caps on NFA= states, compile time, and per-match runtime.

  • Three-tier funnel cuts heap I/O dramaticall= y: queries that return a handful of rows out of millions typically finish in sub-millisecond time.

  • Backward-compatible tre_amatch* UDFs from 0= .1.0 are preserved.

How pg_tre fits alongside what = you already have

PostgreSQL already ships strong text-search= primitives. pg_tre is meant to complement them, not replace them.

  • pg_trgm (GIN/GiST): exact regex, LIKE, and= trigram-set similarity. Battle-tested. pg_trgm % is Jaccard similarity over trigram sets, which is not the same as edit distance: two strings with overlapping trigrams can score "similar" even when their Levenshtein distance is huge. Use pg_trgm when you need exact-substring or LIKE acceleration.

  • tsvector / tsquery (built-in FTS): word-le= vel linguistic search with stemming, stopwords, ranking, and language configuration. Use FTS for natural-language prose. pg_tre is language-agnostic, which is a feature for identifiers, SKUs, error codes, and log lines, and a non-feature for "running" matching "run".

  • pgvector / pgvectorscale: semantic similar= ity over float embeddings. Orthogonal to pg_tre -- meaning vs. lexical structure. The two compose naturally as a hybrid filter (lexical pre-filter with pg_tre, semantic rank with pgvector).

  • pg_tre: approximate regex with explicit ed= it budgets and full regex semantics (character classes, alternation, anchors, {m,n} repetition) composable with the {~k} edit operator. No other in-tree or out-of-tree PostgreSQL extension answers "is this text within N edits of this regex?" through an index.

A few things only pg_tre does w= ell

  • Typo-tolerant log and trace search: body %~~ tre_pattern('(timeout){~1}.*(connection){~1}', 1) finds "timeoutt" and "conection" in the same query.

  • Catalog and SKU lookup with edit tolerance: sku %~~ tre_pattern('AB-9?[0-9]{4}', 1) catches dropped digits or transposed characters without expensive post-filtering.

  • Per-phrase edit budgets in a single index q= uery: body %~~ tre_pattern('(postgres){~2}.*(system){~0}', 0) expresses "postgres-ish" with strict "system" -- one round trip, no application-level reranking.

  • Hybrid retrieval for agent / RAG pipelines:= pg_tre is the fuzzy-lexical leg that pg_trgm, FTS, and pgvector cannot cover on their own. An LLM-generated identifier with a typo, an OCR error in a scanned document, a near-duplicate error code -- pg_tre catches them all without sacrificing regex expressivity.

Installation

-- Requires shared_preload_libraries =3D 'pg_tre'
CREATE EXTENSION pg_tre;

CREATE INDEX docs_body_tre ON docs USING tre (body);

SELECT id FROM docs
 WHERE body %~~ tre_pattern('database', 1);

A note on stability and feedbac= k

pg_tre is new software. The on-disk format = is stable from 1.0.0 forward (1.1.x is byte-compatible with 1.0.0), the SQL= surface is fixed, the WAL records are versioned, and the test suite covers= the storage, query, recovery, and replication paths -- but the project is = young and almost certainly has bugs that the existing tests do not yet prov= oke. It is not a beta and not a research toy; it is real software released = early, and it should be used with appropriate caution.

Bug reports, feature requests, and pull req= uests are very welcome. If you can attach a reproduction case (a minimal sc= hema and a query that misbehaves), that is the most useful form, but anythi= ng is better than nothing.

Links

  • Repository: <= a href=3D"https://codeberg.org/gregburd/pg_tre" style=3D"color: #3498db; te= xt-decoration: underline">https://codeberg.org/gregburd/pg_tre
  • Issues: <= a href=3D"https://codeberg.org/gregburd/pg_tre/issues" style=3D"color: #349= 8db; text-decoration: underline">https://codeberg.org/gregburd/pg_tre/issue= s
  • TRE library: <= a href=3D"https://github.com/laurikari/tre" style=3D"color: #3498db; text-d= ecoration: underline">https://github.com/laurikari/tre
This email was sent to you from Greg Burd. It was delivered on their behalf= by the PostgreSQL project. Any questions about the content of the message shou= ld be sent to Greg Burd.

You were sent this email as a subscriber of the pgsql-announce mai= linglist, for for one of the content tags Community or Related Open Source. To unsubscribe from further emails, or change which emails you want to receive, please click th= e personal unsubscribe link that you can find in the headers of this email, or visit https://lists.postgresql.org/unsubscribe/.
 
--===============2621389286423571655==--