Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wQUwl-001b8C-06 for pgsql-announce@arkaria.postgresql.org; Fri, 22 May 2026 18:50:11 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1wQUwi-00EGnP-3B for pgsql-announce@arkaria.postgresql.org; Fri, 22 May 2026 18:50:09 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wQUwi-00EGn8-0i for pgsql-announce@lists.postgresql.org; Fri, 22 May 2026 18:50:09 +0000 Received: from mahout.postgresql.org ([2001:4800:3e1:1::227]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.98.2) (envelope-from ) id 1wQUwg-00000000vrv-143Z for pgsql-announce@lists.postgresql.org; Fri, 22 May 2026 18:50:08 +0000 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=postgresql.org; s=20171124; h=Message-ID:Date:Reply-To:From:To:Subject: MIME-Version:Content-Type:Sender:Cc:Content-Transfer-Encoding:Content-ID: Content-Description:In-Reply-To:References; bh=lG2ivyfdbz8gmvJlDQqrk8AaYqP/g145jma3zEh9FB0=; b=lt330Du+LAhWtKqmbz52v7plL/ bkziSXR4cH7ioG4ia+8McY7b8H62c+zW3jybS0Sk0eqJWr0RGx1O2vwiEHvEYmFIc/WmGW7yE8LYZ iZ+Ll3ZD9F9zHFhE78QVVc7exM7pCoHCagmndWWi/OeoihjFlHXI0kSCLgo0CL8an+uXagV390QTQ oliKsnngrsjbtZ4j/xHSmPMs7X0cM6L8fNhEZKjKVS5V9YNC+H7Bodl9wU80wHHZI1P43Kc6HWS1d U98QHw7lp2HEuMyxVxmfQW0Uemi5fnGo4N62+CAgr+Jid9ueZB632f/6mgJMI/oIm41fPQD05+D96 KrkHRKIQ==; Received: from wrigleys.postgresql.org ([2a02:16a8:dc51::60]) by mahout.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wQUwe-000uas-2E for pgsql-announce@lists.postgresql.org; Fri, 22 May 2026 18:50:04 +0000 Received: from localhost ([127.0.0.1] helo=wrigleys.postgresql.org) by wrigleys.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1wQUwd-005Apu-26 for pgsql-announce@lists.postgresql.org; Fri, 22 May 2026 18:50:03 +0000 Content-Type: multipart/alternative; boundary="===============7060529155987209325==" MIME-Version: 1.0 Subject: pg_infer 1.0.0 released -- transformer model knowledge as SQL relations To: PostgreSQL Announce From: Greg Burd via PostgreSQL Announce Reply-To: greg@burd.me Date: Fri, 22 May 2026 18:49:09 +0000 Message-ID: <177947574985.794.11485875377229486447@wrigleys.postgresql.org> X-Auto-Response-Suppress: All Auto-Submitted: auto-generated X-pglister-tags: related X-pglister-tagsig: daa2b85ce507f1986cc342dfd796409443e51dd0316cff5b238225079ee6b688 List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --===============7060529155987209325== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable I am pleased to announce the first public release of [pg_infer](https://cod= eberg.org/gregburd/pg_infer), a PostgreSQL 18+ extension that exposes the internals of small transformer language models -= - gate activations, feature labels, learned associations, embeddings -- as = SQL-queryable relations and a custom index access method. pg_infer is not "natural language to SQL." It is not "SQL to natural langua= ge." There is no chat interface, no agent loop, no prompt template generati= ng queries. pg_infer brings model inference into the query plan as an opera= tor the planner can cost, schedule, parallelize, and combine with ordinary = predicates and joins. The model becomes a first-class data source -- a set = of relations the planner can scan, filter, and join -- not an external serv= ice the database talks to. Quick example ------------- -- Register a vindex (extracted model knowledge): `SELECT infer_create_model('qwen05b', '/data/qwen-0.5b.vindex');` -- What does the model know about France? ``` SELECT * FROM describe('France'); -- relation | target | confidence | layer -- -----------+-----------+------------+------- -- capital | Paris | 42.7 | 18 -- language | French | 38.1 | 17 -- continent | Europe | 35.4 | 16 -- ... ``` -- `ORDER BY` model-knowledge similarity: ``` CREATE INDEX ON documents USING infer (title) WITH (model =3D 'qwen05b'); SELECT * FROM documents ORDER BY title <~> 'artificial intelligence' LIMIT 5; ``` The `<~>` operator is index-backed, supports EXPLAIN (ANALYZE, BUFFERS), and composes with WHERE, JOIN, aggregation, and partitioning the way any other operator does. What pg_infer does that other extensions do not ----------------------------------------------- * pgvector / pgvectorscale stores user-supplied embedding vectors and answers nearest-neighbour distance queries. pg_infer goes a layer deeper: it stores the model itself (gate vectors, feature activations, label metadata) in WAL-logged 8KB pages, and answers questions like "does the model treat A and B as related?" -- not "are these two embeddings close?" * pg_search / RAG-style integrations turn user queries into embedding lookups against external vector stores. pg_infer exposes the model's internal structure to SQL: walk(prompt) returns per-layer feature activations; describe(entity) returns the relations the model has learned about an entity; implies(a, b) tests directional support. * pg_infer's index AM ships in two modes: - "model" indexes WAL-log the entire vindex inside PostgreSQL pages, so backup, replication, and point-in-time recovery cover the model the same way they cover your tables. - "column" indexes attach a model to a text column and make ORDER BY <~> on that column index-driven. * The mmap-backed local backend lets multiple PG backends share decoded model pages through the OS page cache; the remote backend (larql-server / larql-router over HTTP/2 or a Unix socket) shares one copy of the model across a host and supports layer-sharded routing. In-flight remote calls respond to pg_cancel_backend(...) within roughly 100ms. CPU inference, BitNet, and idle-cluster compute ----------------------------------------------- Database servers almost never have GPUs. They have a lot of fast cores, a l= ot of RAM, and -- on most production deployments -- standby replicas, read-= only physical replicas, logical subscribers, and DR hosts that spend most o= f the day at single-digit CPU utilization while the primary takes the write= traffic. pg_infer and the underlying larql crates target this hardware profile direc= tly: * The default execution paths run efficiently on CPU, with BLAS-backed linear algebra (OpenBLAS) and f16 gate vectors that decode to f32 lazily. * pg_infer / larql support models in the Microsoft BitNet b1.58 family ("two-bit / 1.58-bit" ternary-weight transformers, https://arxiv.org/abs/2402.17764), which were specifically designed to run on commodity CPUs at competitive quality and dramatically lower memory and power cost than f16 baselines. Combined with f16 gate activations, this brings useful inference inside a PostgreSQL backend without any specialized accelerator. * The cluster model is the point. A typical PostgreSQL HA / DR / read-scale deployment has one busy primary and one or more largely idle physical replicas, plus, increasingly, a fleet of logical subscribers. Those replicas already pay for themselves in availability, but their CPUs are idle the vast majority of the time. With pg_infer's remote backend, larql-server runs on the replica hosts and serves model operators back to the primary's query plans -- the model is materialized once per host, the activation cache is shared, and the work happens on capacity you have already paid for. No GPU, no separate inference cluster, no extra network egress. A few queries that are uniquely pg_infer ---------------------------------------- * Model-aware document ranking that does not depend on keyword overlap or pre-computed embeddings: ``` SELECT id, title FROM papers ORDER BY title <~> 'neural architecture search' LIMIT 10; ``` This finds "AutoML for Deep Networks" because the model learned that relationship -- pg_trgm cannot, FTS cannot, and pgvector can only do so if you computed and stored embeddings ahead of time with a model whose semantics happen to agree with your query. * Joining model knowledge with relational data: ``` SELECT c.id, c.name, p.title, p.title <~> c.research_interest AS dist FROM candidates c JOIN papers p ON p.title <~> c.research_interest < 0.2 WHERE c.country =3D 'DE'; ``` Standard SQL semantics, standard PostgreSQL planner, plus a model-driven join condition. * Probing what a model "knows" without running it: ``` SELECT relation, target, confidence FROM describe('PostgreSQL') WHERE confidence > 30; ``` * Auditing model behaviour over time. Because the vindex is stored in WAL-logged pages, point-in-time recovery on a pg_infer-using cluster gives you the model state at any historical moment alongside the data state. "What was the model saying about this entity at 03:14 UTC last Tuesday?" is a literal PITR + describe(...) question. Acknowledgements ---------------- pg_infer would not exist without the LARQL project by Chris Hayuk (https://= github.com/chrishayuk). LARQL pioneered the idea of making transformer mode= l internals queryable -- extracting gate vectors, feature activations, and = learned associations into a format ("vindex") that can be explored interact= ively and expressed as a query language. The vindex format, the gate KNN al= gorithm, and the feature-labeling pipeline all originate from LARQL; pg_inf= er adapts them into a PostgreSQL access method, a WAL-logged storage layer,= and a planner-visible operator. If the larql ideas resonate, please look at the original work and at Chris'= s video walkthroughs explaining the vindex format, the gate-KNN algorithm, = and the LARQL query language: * Chris Hayuk on YouTube: https://www.youtube.com/@chrishayuk * Original LARQL repo: https://github.com/chrishayuk/larql * larql-rs (Rust port): https://github.com/chrishayuk/chuk-larql-rs Thank you, Chris, for the foundational work and for being open with the des= ign. A note on stability and feedback -------------------------------- pg_infer is new software. The SQL surface, the index AM, the remote backend= protocol, and the test suite are stable enough to release at 1.0.0, and th= e vindex on-disk format is stable forward; but the project is young, and th= e combination of PostgreSQL + pgrx + transformer internals is unusual enoug= h that there are certainly bugs that the existing tests do not yet provoke.= It is not a beta and not a research toy; it is real software released earl= y, and it should be used with appropriate caution. Bug reports, feature requests, and pull requests are very welcome -- especi= ally reproductions, vindex compatibility issues, planner-cost regressions, = and integration suggestions for other PostgreSQL extensions. Links ----- * Repository: [https://codeberg.org/gregburd/pg_infer](https://codeberg.= org/gregburd/pg_infer)=20 * Issues: [https://codeberg.org/gregburd/pg_infer/issues](https://co= deberg.org/gregburd/pg_infer/issues) * LARQL: [https://github.com/chrishayuk/larql](https://github.com/c= hrishayuk/larql)=20 * larql-rs: [https://github.com/chrishayuk/chuk-larql-rs](https://gith= ub.com/chrishayuk/chuk-larql-rs) * BitNet b1.58: [https://arxiv.org/abs/2402.17764](https://arxiv.org/abs/2= 402.17764) --===============7060529155987209325== Content-Type: text/html; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable pg_infer 1.0.0 released -- transformer model knowledge as SQL re= lations
 

pg_infer 1.0.0 released -- transformer model knowledge as SQL relatio= ns

I am pleased to announce the first public r= elease of pg_infer, a PostgreSQL 18+ extension that exposes the internals of small transformer language models -= - gate activations, feature labels, learned associations, embeddings -- as = SQL-queryable relations and a custom index access method.

pg_infer is not "natural language to SQL." = It is not "SQL to natural language." There is no chat interface, no agent l= oop, no prompt template generating queries. pg_infer brings model inference= into the query plan as an operator the planner can cost, schedule, paralle= lize, and combine with ordinary predicates and joins. The model becomes a f= irst-class data source -- a set of relations the planner can scan, filter, = and join -- not an external service the database talks to.

Quick example

-- Register a vindex (extracted model knowledge):
`SELECT infer_create_model('qwen05b', '/data/qwen-0.5b.vindex');`

-- What does the model know about France?

SELECT * FROM describe('France'); -- relation | target | confidence | layer -- -----------+-----------+------------+------- -- capital | Paris | 42.7 | 18 -- language | French | 38.1 | 17 -- continent | Europe | 35.4 | 16 -- ...

-- `ORDER BY` model-knowledge similarity:

``` CREATE INDEX ON documents USING infer (title) WITH (model =3D 'qwen05b');

SELECT * FROM documents
 ORDER BY title <~> 'artificial intelligence'
 LIMIT 5;

```

The <~> operator is inde= x-backed, supports EXPLAIN (ANALYZE, BUFFERS), and composes with WHERE, JOIN, aggregation, and partitioning the way any other operator does.

What pg_infer does that other e= xtensions do not

  • pgvector / pgvectorscale stores user-suppli= ed embedding vectors and answers nearest-neighbour distance queries. pg_infer goes a layer deeper: it stores the model itself (gate vectors, feature activations, label metadata) in WAL-logged 8KB pages, and answers questions like "does the model treat A and B as related?" -- not "are these two embeddings close?"

  • pg_search / RAG-style integrations turn use= r queries into embedding lookups against external vector stores. pg_infer exposes the model's internal structure to SQL: walk(prompt) returns per-layer feature activations; describe(entity) returns the relations the model has learned about an entity; implies(a, b) tests directional support.

  • pg_infer's index AM ships in two modes:

    • "model" indexes= WAL-log the entire vindex inside PostgreSQL pages, so backup, replication, and point-in-time recovery cover the model the same way they cover your tables.
    • "column" indexe= s attach a model to a text column and make ORDER BY <~> on that column index-driven.
  • The mmap-backed local backend lets multiple= PG backends share decoded model pages through the OS page cache; the remote backend (larql-server / larql-router over HTTP/2 or a Unix socket) shares one copy of the model across a host and supports layer-sharded routing. In-flight remote calls respond to pg_cancel_backend(...) within roughly 100ms.

CPU inference, BitNet, and idle= -cluster compute

Database servers almost never have GPUs. Th= ey have a lot of fast cores, a lot of RAM, and -- on most production deploy= ments -- standby replicas, read-only physical replicas, logical subscribers= , and DR hosts that spend most of the day at single-digit CPU utilization w= hile the primary takes the write traffic.

pg_infer and the underlying larql crates ta= rget this hardware profile directly:

  • The default execution paths run efficiently= on CPU, with BLAS-backed linear algebra (OpenBLAS) and f16 gate vectors that decode to f32 lazily.

  • pg_infer / larql support models in the Micr= osoft BitNet b1.58 family ("two-bit / 1.58-bit" ternary-weight transformers, https://arxiv.org/abs/2402.17764), which were specifically designed to run on commodity CPUs at competitive quality and dramatically lower memory and power cost than f16 baselines. Combined with f16 gate activations, this brings useful inference inside a PostgreSQL backend without any specialized accelerator.

  • The cluster model is the point. A typical P= ostgreSQL HA / DR / read-scale deployment has one busy primary and one or more largely idle physical replicas, plus, increasingly, a fleet of logical subscribers. Those replicas already pay for themselves in availability, but their CPUs are idle the vast majority of the time. With pg_infer's remote backend, larql-server runs on the replica hosts and serves model operators back to the primary's query plans -- the model is materialized once per host, the activation cache is shared, and the work happens on capacity you have already paid for. No GPU, no separate inference cluster, no extra network egress.

A few queries that are uniquely= pg_infer

  • Model-aware doc= ument ranking that does not depend on keyword overlap or pre-computed embeddings:

SELECT id, title FROM papers ORDER BY title <~> 'neural architecture search' LIMIT 10;

This finds "AutoML for Deep Networks" because the model
learned that relationship -- pg_trgm cannot, FTS cannot,
and pgvector can only do so if you computed and stored
embeddings ahead of time with a model whose semantics
happen to agree with your query.
  • Joining model k= nowledge with relational data:

SELECT c.id, c.name, p.title, p.title <~> c.research_interest AS dist FROM candidates c JOIN papers p ON p.title <~> c.research_interest < 0.2 WHERE c.country =3D 'DE';

Standard SQL semantics, standard PostgreSQL planner, plus
a model-driven join condition.
  • Probing what a = model "knows" without running it:

SELECT relation, target, confidence FROM describe('PostgreSQL') WHERE confidence > 30;

  • Auditing model = behaviour over time. Because the vindex is stored in WAL-logged pages, point-in-time recovery on a pg_infer-using cluster gives you the model state at any historical moment alongside the data state. "What was the model saying about this entity at 03:14 UTC last Tuesday?" is a literal PITR + describe(...) question.

Acknowledgements

pg_infer would not exist without the LARQL = project by Chris Hayuk (https://github.com/chrishayuk). LARQL pioneered the= idea of making transformer model internals queryable -- extracting gate ve= ctors, feature activations, and learned associations into a format ("vindex= ") that can be explored interactively and expressed as a query language. Th= e vindex format, the gate KNN algorithm, and the feature-labeling pipeline = all originate from LARQL; pg_infer adapts them into a PostgreSQL access met= hod, a WAL-logged storage layer, and a planner-visible operator.

If the larql ideas resonate, please look at= the original work and at Chris's video walkthroughs explaining the vindex = format, the gate-KNN algorithm, and the LARQL query language:

  • Chris Hayuk on = YouTube: https://www.youtube.com/@chrishayuk
  • Original LARQL = repo: https://github.com/chrishayuk/larql
  • larql-rs (Rust = port): https://github.com/chrishayuk/chuk-larql-rs

Thank you, Chris, for the foundational work= and for being open with the design.

A note on stability and feedbac= k

pg_infer is new software. The SQL surface, = the index AM, the remote backend protocol, and the test suite are stable en= ough to release at 1.0.0, and the vindex on-disk format is stable forward; = but the project is young, and the combination of PostgreSQL + pgrx + transf= ormer internals is unusual enough that there are certainly bugs that the ex= isting tests do not yet provoke. It is not a beta and not a research toy; i= t is real software released early, and it should be used with appropriate c= aution.

Bug reports, feature requests, and pull req= uests are very welcome -- especially reproductions, vindex compatibility is= sues, planner-cost regressions, and integration suggestions for other Postg= reSQL extensions.

Links

This email was sent to you from Greg Burd. It was delivered on their behalf= by the PostgreSQL project. Any questions about the content of the message shou= ld be sent to Greg Burd.

You were sent this email as a subscriber of the pgsql-announce mai= linglist, for the content tag Related Open Source. To unsubscribe from further emails, or change which emails you want to receive, please click th= e personal unsubscribe link that you can find in the headers of this email, or visit https://lists.postgresql.org/unsubscribe/.
 
--===============7060529155987209325==--