Content-Type: multipart/alternative;
 boundary="===============2621389286423571655=="
MIME-Version: 1.0
Subject: pg_tre 1.1.1 released -- an approximate-REGEX index AM for PostgreSQL
 18+
To: PostgreSQL Announce <pgsql-announce@lists.postgresql.org>
From: Greg Burd via PostgreSQL Announce <announce-noreply@postgresql.org>
Reply-To: greg@burd.me
Date: Fri, 22 May 2026 18:51:48 +0000
Message-ID: <177947590891.802.11629769603682771181@wrigleys.postgresql.org>
Auto-Submitted: auto-generated
Archived-At: 
 <https://www.postgresql.org/message-id/177947590891.802.11629769603682771181%40wrigleys.postgresql.org>
Precedence: bulk

--===============2621389286423571655==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

I am pleased to announce pg_tre 1.1.1, the first public release of pg_tre
(https://codeberg.org/gregburd/pg_tre), a native PostgreSQL 18+ index access
method for approximate-regex matching.

pg_tre indexes text columns through a three-tier filter funnel (BRIN-style
range Bloom filter -> sparsemap trigram postings -> per-tuple Bloom filter),
with Ville Laurikari's TRE library performing the final recheck against the
heap. The result is genuine Levenshtein-distance regex matching ("find text
within k edits of this pattern") implemented as a real IndexAmRoutine, with
WAL coverage, VACUUM awareness, and REINDEX CONCURRENTLY support.
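
The funnel can be pictured with a toy Python model. This is an illustrative sketch under stated assumptions, not pg_tre's implementation: a tiny hypothetical Bloom class and Python sets stand in for the on-disk Bloom filters and sparsemap postings, "block ranges" are fixed groups of rows, the pattern is a literal string rather than a regex, and the final recheck uses Sellers' edit-distance algorithm instead of TRE. Each tier applies the q-gram counting rule (a match within k edits must still contain at least n - 3k of the pattern's n trigram positions), so a tier can discard rows but never loses a true match. Whether pg_tre uses this exact threshold is an assumption of the sketch.

```python
import hashlib

Q = 3  # gram length (trigrams)


def grams(text):
    """Positional trigrams over codepoints; duplicates are kept on purpose."""
    return [text[i:i + Q] for i in range(len(text) - Q + 1)]


class Bloom:
    """Tiny Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, nbits=1024, nhashes=3):
        self.nbits, self.nhashes, self.bits = nbits, nhashes, 0

    def _slots(self, item):
        for seed in range(self.nhashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.nbits

    def add(self, item):
        for slot in self._slots(item):
            self.bits |= 1 << slot

    def __contains__(self, item):
        return all((self.bits >> slot) & 1 for slot in self._slots(item))


def best_substring_distance(pattern, text):
    """Sellers' DP: fewest edits turning `pattern` into some substring of `text`."""
    prev = list(range(len(pattern) + 1))
    best = prev[-1]
    for ch in text:
        cur = [0]  # a match may start at any text position for free
        for i, pc in enumerate(pattern, 1):
            cur.append(min(prev[i] + 1,                # extra character in text
                           cur[i - 1] + 1,             # pattern character missing
                           prev[i - 1] + (pc != ch)))  # match or substitution
        best = min(best, cur[-1])
        prev = cur
    return best


class ToyFunnelIndex:
    """Hypothetical three-tier funnel over an in-memory list of strings."""

    def __init__(self, rows, range_size=2):
        self.rows = rows
        self.ranges = []      # tier 1: (row ids, Bloom of every trigram in the range)
        self.postings = {}    # tier 2: trigram -> set of row ids
        self.row_blooms = []  # tier 3: one small Bloom per row, indexed by row id
        for start in range(0, len(rows), range_size):
            ids = range(start, min(start + range_size, len(rows)))
            range_bloom = Bloom()
            for rid in ids:
                row_bloom = Bloom(nbits=256)
                for g in grams(rows[rid]):
                    range_bloom.add(g)
                    row_bloom.add(g)
                    self.postings.setdefault(g, set()).add(rid)
                self.row_blooms.append(row_bloom)
            self.ranges.append((ids, range_bloom))

    def search(self, pattern, k):
        """Return (ids of rows within k edits of literal `pattern`, rows rechecked)."""
        pgrams = grams(pattern)
        need = len(pgrams) - Q * k  # q-gram lemma: one edit breaks at most Q grams
        if need <= 0:
            candidates = list(range(len(self.rows)))  # filters cannot prune safely
        else:
            candidates = []
            for ids, range_bloom in self.ranges:
                if sum(g in range_bloom for g in pgrams) < need:        # tier 1
                    continue
                hits = set().union(*(self.postings.get(g, set()) for g in pgrams))
                for rid in sorted(hits.intersection(ids)):              # tier 2
                    if sum(g in self.row_blooms[rid] for g in pgrams) >= need:  # tier 3
                        candidates.append(rid)
        matches = [rid for rid in candidates
                   if best_substring_distance(pattern, self.rows[rid]) <= k]
        return matches, len(candidates)


rows = ["connection timeout after 30s", "all good", "disk full on /var",
        "conection timeoutt", "collection of items", "retrying connection"]
index = ToyFunnelIndex(rows)
matches, rechecked = index.search("connection", 1)
print(matches, f"({rechecked} of {len(rows)} rows reached the recheck)")
```

Because Bloom filters only ever err toward "maybe present", every tier errs toward keeping a row, and the exact recheck at the end is what makes the answer precise. When the edit budget is large relative to the pattern (n - 3k <= 0), no tier can prune safely and the sketch falls back to rechecking every row.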

Highlights
----------

  * Custom IndexAmRoutine registered as USING tre, with its own
    WAL resource manager (rmgr id 140) and full crash-recovery and
    streaming-replication coverage validated by TAP tests.

  * Edit-distance regex with per-subexpression budgets, e.g.
    body %~~ tre_pattern('(error){~1}.*(42[0-9]){~0}', 1)
    runs as a single indexed bitmap heap scan.

  * UTF-8 codepoint trigrams: trigrams are built from codepoints
    rather than bytes, so CJK, accented characters, and emoji are
    indexed correctly, while ASCII text pays zero overhead.

  * DoS protection via configurable caps on NFA states, compile
    time, and per-match runtime.

  * The three-tier funnel sharply cuts heap I/O, since only rows
    that pass all three filters are fetched and rechecked: queries
    that return a handful of rows out of millions typically finish
    in under a millisecond.

  * The tre_amatch* UDFs from 0.1.0 are preserved for backward
    compatibility.
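
The codepoint-trigram highlight can be illustrated with a short Python sketch (hypothetical helper names, not pg_tre code). It compares trigrams cut from the raw UTF-8 bytes with trigrams cut from codepoints: byte trigrams can slice a multi-byte character in half, whereas for pure ASCII both methods yield exactly the same grams, which is consistent with ASCII paying no extra cost.

```python
def byte_trigrams(text):
    """Trigrams over the UTF-8 byte sequence (may split multi-byte characters)."""
    data = text.encode("utf-8")
    return [data[i:i + 3] for i in range(len(data) - 2)]


def codepoint_trigrams(text):
    """Trigrams over Unicode codepoints (Python str indexes by codepoint)."""
    return [text[i:i + 3] for i in range(len(text) - 2)]


def splits_a_character(gram):
    """True if a byte trigram is not valid UTF-8 on its own."""
    try:
        gram.decode("utf-8")
        return False
    except UnicodeDecodeError:
        return True


for word in ["postgres", "café", "東京都庁", "ok👍!"]:
    cp, by = codepoint_trigrams(word), byte_trigrams(word)
    split = sum(splits_a_character(g) for g in by)
    print(f"{word}: {len(cp)} codepoint trigrams, {len(by)} byte trigrams, "
          f"{split} of them split a character")
```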

How pg_tre fits alongside what you already have
-----------------------------------------------

PostgreSQL already ships strong text-search primitives. pg_tre is meant to
complement them, not replace them.

  * pg_trgm (GIN/GiST):  exact regex, LIKE, and trigram-set
    similarity. Battle-tested. pg_trgm's % operator tests Jaccard
    similarity over trigram sets against a threshold, which is not
    the same as edit distance: two strings with overlapping
    trigrams can score as "similar" even when their Levenshtein
    distance is huge. Use pg_trgm when you need exact-substring or
    LIKE acceleration.

  * tsvector / tsquery (built-in FTS):  word-level linguistic
    search with stemming, stopwords, ranking, and language
    configuration. Use FTS for natural-language prose. pg_tre
    is language-agnostic, which is a feature for identifiers,
    SKUs, error codes, and log lines, and a limitation when you
    want "running" to match "run".

  * pgvector / pgvectorscale:  semantic similarity over float
    embeddings. Orthogonal to pg_tre -- meaning vs. lexical
    structure. The two compose naturally as a hybrid filter
    (lexical pre-filter with pg_tre, semantic rank with
    pgvector).

  * pg_tre:  approximate regex with explicit edit budgets and
    full regex semantics (character classes, alternation,
    anchors, {m,n} repetition) composable with the {~k} edit
    operator. No other in-tree or out-of-tree PostgreSQL
    extension answers "is this text within N edits of this
    regex?" through an index.
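
To see why trigram-set similarity and edit distance are different tools, the sketch below computes both for two pairs. It uses a simplified model of pg_trgm's trigram extraction (lowercase, ASCII alphanumeric words, each word padded with two leading spaces and one trailing space); it is an approximation for illustration, not pg_trgm's code. Reordered words produce identical trigram sets (similarity 1.0) even though turning one string into the other takes many edits, while "kitten" and "sitting" are only 3 edits apart yet share a single trigram.

```python
import re


def trgm_set(text):
    """Simplified pg_trgm-style trigram set: lowercase, split into words,
    pad each word with two leading spaces and one trailing space."""
    out = set()
    for word in re.findall(r"[0-9a-z]+", text.lower()):
        padded = "  " + word + " "
        out.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return out


def trgm_similarity(a, b):
    """Shared trigrams divided by all distinct trigrams (the Jaccard index)."""
    sa, sb = trgm_set(a), trgm_set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def levenshtein(a, b):
    """Two-row DP: insertions, deletions, and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


pairs = [("connection timeout", "timeout connection"),  # same words, new order
         ("kitten", "sitting")]                          # three edits apart
for a, b in pairs:
    print(f"{a!r} vs {b!r}: trigram similarity {trgm_similarity(a, b):.2f}, "
          f"edit distance {levenshtein(a, b)}")
```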

A few things only pg_tre does well
----------------------------------

  * Typo-tolerant log and trace search:
    body %~~ tre_pattern('(timeout){~1}.*(connection){~1}', 1)
    finds "timeoutt" and "conection" in the same query.

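  * Catalog and SKU lookup with edit tolerance:
    sku %~~ tre_pattern('AB-9?[0-9]{4}', 1)
    catches a dropped digit or a mistyped character without
    expensive post-filtering. (Swapped digits still satisfy
    [0-9]{4}; swapping two other adjacent characters counts as
    two edits under Levenshtein distance.)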

  * Per-phrase edit budgets in a single index query:
    body %~~ tre_pattern('(postgres){~2}.*(system){~0}', 0)
    expresses "postgres-ish" with strict "system" -- one round
    trip, no application-level reranking.

  * Hybrid retrieval for agent / RAG pipelines: pg_tre is the
    fuzzy-lexical leg that pg_trgm, FTS, and pgvector cannot
    cover on their own. An LLM-generated identifier with a
    typo, an OCR error in a scanned document, a near-duplicate
    error code -- pg_tre catches them all without sacrificing
    regex expressivity.
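
The per-phrase budgets above can be approximated in a few lines of Python. This is a hypothetical stand-in rather than TRE or pg_tre: it handles only literal phrases that must occur in order, mirroring the shape of '(timeout){~1}.*(connection){~1}', and it does not model the second argument of tre_pattern. Each phrase is located with Sellers' dynamic-programming algorithm, stopping at the earliest position where it fits within its own edit budget.

```python
def earliest_match_end(phrase, text, budget, start=0):
    """Smallest end index e such that some text[s:e] with s >= start is within
    `budget` edits of `phrase`, or None if there is none (Sellers' DP)."""
    col = list(range(len(phrase) + 1))  # empty text segment: only deletions
    if col[-1] <= budget:
        return start  # phrase is short enough to match the empty string
    for e in range(start, len(text)):
        ch = text[e]
        new = [0]  # a match may begin at any position at no cost
        for i, pc in enumerate(phrase, 1):
            new.append(min(col[i] + 1,                # extra character in text
                           new[i - 1] + 1,            # phrase character missing
                           col[i - 1] + (pc != ch)))  # match or substitution
        col = new
        if col[-1] <= budget:
            return e + 1
    return None


def phrases_in_order(text, *phrases_with_budgets):
    """True if every (phrase, budget) pair matches approximately, left to right."""
    pos = 0
    for phrase, budget in phrases_with_budgets:
        end = earliest_match_end(phrase, text, budget, pos)
        if end is None:
            return False
        pos = end  # the next phrase must start after this one ends
    return True


lines = ["ERROR timeoutt while opening conection to db-1",
         "ERROR connection refused",
         "WARN timeout waiting for lock",
         "INFO connection established after timeout"]
for line in lines:
    print(phrases_in_order(line, ("timeout", 1), ("connection", 1)), line)
```

Taking the earliest end for each phrase is safe: it leaves the longest possible tail of the line for the phrases that follow, so no valid in-order match is missed.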

Installation
------------

    -- Requires 'pg_tre' in shared_preload_libraries (postgresql.conf);
    -- the setting takes effect only after a server restart.
    CREATE EXTENSION pg_tre;

    CREATE INDEX docs_body_tre ON docs USING tre (body);

    SELECT id FROM docs
     WHERE body %~~ tre_pattern('database', 1);

A note on stability and feedback
--------------------------------

pg_tre is new software. The on-disk format is stable from 1.0.0 forward
(1.1.x is byte-compatible with 1.0.0), the SQL surface is fixed, the WAL
records are versioned, and the test suite covers the storage, query,
recovery, and replication paths -- but the project is young and almost
certainly has bugs that the existing tests do not yet provoke. It is not a
beta and not a research toy; it is real software released early, and it
should be used with appropriate caution.

Bug reports, feature requests, and pull requests are very welcome. If you can
attach a reproduction case (a minimal schema and a query that misbehaves),
that is the most useful form, but anything is better than nothing.

Links
-----

  * Repository:   https://codeberg.org/gregburd/pg_tre
  * Issues:       https://codeberg.org/gregburd/pg_tre/issues
  * TRE library:  https://github.com/laurikari/tre
--===============2621389286423571655==--