DDX for PostgreSQL — About

WHAT IS THIS?

The Database Development neXus: for PostgreSQL (DDX for PostgreSQL, served at pg.ddx.io) is a free, read-only archive and intelligence service for the PostgreSQL community. All data served is already public. We add full-text search, semantic search, code intelligence with call graphs (get_callees works; get_callers under repair), and structured MCP access for AI agents.

No personal data is collected. No accounts exist. All protocols are read-only. The software powering this service (Agora) is a derivative of public-inbox.org, ported to Go and significantly extended.

WHY THIS EXISTS

PostgreSQL's mailing lists, git repositories, documentation, wikis, and commitfest patches contain decades of distributed knowledge. We currently index 240,000+ messages spanning 1991-present across 54 community archives (pgsql-hackers alone holds ~188k), with historical backfill ongoing. This knowledge is scattered across different systems and formats, making it expensive and time-consuming for AI systems (and humans) to:

DDX for PostgreSQL solves this by:

FOR AI/LLM DEVELOPERS

The Problem

Fetching raw PostgreSQL data inefficiently wastes LLM tokens. Example:

DDX for PostgreSQL provides pre-indexed, pre-embedded data with structured queries, so you get exactly what you need without parsing overhead.

How Your Agent Uses This

You'll never need to do this yourself! This is just a peek under the covers. Your AI agent will use the methods exported by the MCP service to invoke this service and get information about Postgres faster than you'd imagine. Just give it a try!

When you connect your agent to DDX for PostgreSQL via MCP, it gains access to 100+ structured methods for searching mailing lists, analyzing code, exploring git history, and discovering commitfest patches. The agent calls these directly—no manual HTTP requests needed.

Behind The Scenes (For Curiosity)

If you want to see how it works, here are some example queries against the live HTTP/JSON surface. The agent never makes you type these — it calls equivalent MCP tools directly.

1. Search a mailing list and get JSON results:

curl -s 'https://pg.ddx.io/m/pgsql-announce/?q=release&format=json' | jq .

Returns matching messages with subject, from, date, message-id, and a thread cursor. Each indexed inbox has its own search endpoint at /m/<inbox>/?q=<term>&format=json (the canonical path; /<inbox>/?... still 301-redirects there). Drop &format=json to get the HTML view a browser would render.

2. Fetch a single message as JSON:

curl -s 'https://pg.ddx.io/m/pgsql-announce/?q=release&format=json&limit=1' | jq '.results[0]'

Returns one full message-id-keyed record so you can follow up with thread, raw, or atom retrieval.

3. Talk to the MCP server directly (JSON-RPC over HTTP):

curl -s -X POST -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":0,"method":"initialize",
       "params":{"protocolVersion":"2024-11-05",
                 "capabilities":{},
                 "clientInfo":{"name":"curl","version":"1"}}}' \
  https://pg.ddx.io/mcp | jq .

Initializes a session and returns the server's protocol version, capabilities, and tool surface. Real agent clients (Claude Desktop, Cursor, custom integrations) do this transparently. Subsequent tool calls (tools/list, tools/call, resources/read) require the session ID returned here, which is why most users let an MCP client library handle the protocol.

4. Clone an inbox via Git smart-HTTP:

git clone https://pg.ddx.io/m/pgsql-announce.git

Each inbox is also a public-inbox v2 git repository — mirror the whole archive locally and run your own indexer on top.

Why This Matters

INFRASTRUCTURE

Open source. Self-hosted on commodity infrastructure across two regions in Europe with active replication. All ingestion, indexing, and vector embedding is performed on-premise — no third-party service sees the content of any query, nor any private data about who asked it. The full source for the application and its deployment is at codeberg.org/ddx.

DATA SOURCES

The critical value is the connective tissue between these sources: linking a mailing list discussion to the commit it produced, to the code symbols it changed, to the documentation it updated, and to the build results that validated it.

LEGACY: THE BERKELEY POSTGRES ARCHIVE

The /legacy section is a static mirror of the original Berkeley POSTGRES archive at dsf.berkeley.edu/postgres.html — Stonebraker's research project from 1986–1995 that became modern PostgreSQL. We mirror it here for permanence; bytes are preserved in our R2 backup as well as served from DDX for PostgreSQL.

The mailing-list discussions, source releases (POSTGRES v3 through v4.2 plus postgres95-{0.01..1.02}), patches, and papers are also indexed and queryable through MCP and search_docs. See /legacy.

EXTENSIONS & ACKNOWLEDGMENTS

pg.ddx.io is a thin Go service (agora) over a much larger pile of work done by other people. The PostgreSQL extensions listed below do most of the actual heavy lifting; agora just routes traffic and stitches results.

ExtensionVersionRole
PostgreSQL18.3Core database — every byte of the archive lives in PostgreSQL.
pg_turbovec1.3.0TurboQuant-quantised vector index access method (pgrx). Backs all five embedding columns (ag_messages, ag_embeddings, ag_ci_embeddings, ag_document_embeddings, mentat.recommendations) with 4-bit-quantised HNSW. Drives semantic search over messages and code symbols. ~4× smaller indexes than f32 HNSW at the same recall.
pg_textsearch1.2.0BM25 full-text relevance ranking via Tantivy. Replica-safe (works on streaming standbys), unlike its predecessor. Drives search_messages, search_symbols, body search.
pg_trgm1.6Trigram similarity + GIN index used for ILIKE on subject/from-address fallback paths where BM25 isn't a byte-substring equivalent.
pg_deltax0.1.0Columnar / delta-compression storage for time-series data; staged for future ag_messages partitioning. No production tables yet.
pg_tre1.3.0TRE-pattern regex search with edit-distance (drives ?q=…&regex=1&k=N). Built on Ville Laurikari's TRE library.
pg_mentat1.3.0EDN/datalog triple store; lets us cross-join mailing lists, git, docs, and wiki in one query language.
pg_infer1.0.0SQL bridge to a local larql inference server, currently serving Microsoft BitNet b1.58 2B 4T (GGUF source dequantized to f16 at vindex-build time, 30 transformer layers, hidden size 2560, vocab 128k). Exposes model weights as SQL relations (infer_show_layers, infer_explain_walk, infer, infer_detect_server, …).
pg_cron1.6In-database job scheduler — mailing-list ingest, git pulls, backups all run from here.
pgcrypto1.4Hashing for content fingerprints (deduping messages, naming git objects).
pg_stat_statements1.12Query telemetry; how we find slow queries before users do.
PgQue 0.2.0Snapshot/batch job queue (pure plpgsql, no C extension). pg_cron triggers job_management.enqueue_job() which writes to pgque.event_*; agora-job-dispatcher.service consumes batches and fires the corresponding systemd *.service. Replaces the prior in-DB job_management.job_queue table-as-queue. Apache-2.0.

PgQue is a pure plpgsql schema add-on, not a binary PostgreSQL extension — it doesn't appear in \dx. Listed here because it carries weight equivalent to one.

Also indispensable, even though they're not loaded as PostgreSQL extensions:

Bug reports, patches, and version corrections welcome via contact.

OWNERSHIP & SUPPORT

This service is operated by Greg Burd. I pay all hosting fees personally. No ads, no tracking, no data collection. This is a gift to the PostgreSQL community.

A Tip Jar is coming soon to help defray infrastructure costs. If you find this service valuable, please consider supporting it.

Questions? Get in touch.