About — DDX for PostgreSQL

WHY THIS EXISTS

PostgreSQL's mailing lists, git repositories, documentation, wikis, and commitfest patches contain decades of distributed knowledge. We currently index over 1.4 million messages spanning 1985-present across 56 community archives (pgsql-hackers alone holds over 1.3 million), with historical backfill ongoing. This knowledge is scattered across different systems and formats, making it expensive and time-consuming for AI systems (and humans) to:

Search across all sources simultaneously
Correlate information (which discussion led to this commit?)
Fetch and parse different formats (mbox, git, HTML, markdown)
Handle repeated requests that overload upstream infrastructure

DDX for PostgreSQL solves this by:

Indexing all PostgreSQL community sources in one place
Providing full-text search, fuzzy matching, and semantic search (semantic search is experimental and currently slow — prefer keyword search for time-sensitive queries)
Offering multiple access protocols (HTTP, NNTP, IMAP, POP3, MCP, Git)
Reducing load on postgresql.org infrastructure

FOR AI/LLM DEVELOPERS

The Problem

Fetching raw PostgreSQL data inefficiently wastes LLM tokens. Example:

Before: Fetch 100,000 pgsql-hackers messages and parse them manually → 50 MB+ of raw text to tokenize
After: Query structured, indexed results → about 2 MB of text to tokenize (25x less)

DDX for PostgreSQL provides pre-indexed, pre-embedded data with structured queries, so you get exactly what you need without parsing overhead.
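
The arithmetic behind the before/after comparison can be sketched as follows. The 50 MB and 2 MB figures come from the example above; the ratio of four bytes per token is an assumed rule of thumb for English text, not a number from this page, and real tokenizers vary.

```python
# Rough payload-to-token estimate for the before/after comparison.
# BYTES_PER_TOKEN is an assumed rule of thumb, not a measured value.
BYTES_PER_TOKEN = 4
MB = 1_000_000


def estimated_tokens(payload_bytes: int, bytes_per_token: int = BYTES_PER_TOKEN) -> int:
    """Approximate the number of tokens in a payload of the given size."""
    return payload_bytes // bytes_per_token


def savings_factor(before_bytes: int, after_bytes: int) -> float:
    """How many times smaller the structured payload is than the raw one."""
    return before_bytes / after_bytes


raw_bytes = 50 * MB         # "before" figure from the text
structured_bytes = 2 * MB   # "after" figure from the text

print(estimated_tokens(raw_bytes))                   # 12500000
print(estimated_tokens(structured_bytes))            # 500000
print(savings_factor(raw_bytes, structured_bytes))   # 25.0
```

Under that assumption the raw fetch would not fit in any current context window, while the structured result is at least within reach of chunked processing.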

How Your Agent Uses This

You never need to do any of this by hand; what follows is just a peek under the covers. Your AI agent uses the methods exported by the MCP service to query DDX for PostgreSQL and pull back information about Postgres on your behalf. Just give it a try!

When you connect your agent to DDX for PostgreSQL via MCP, it gains access to 100+ structured methods for searching mailing lists, analyzing code, exploring git history, and discovering commitfest patches. The agent calls these directly—no manual HTTP requests needed.

Behind The Scenes (For Curiosity)

If you want to see how it works, here are some example queries against the live HTTP/JSON surface. The agent never makes you type these — it calls equivalent MCP tools directly.

1. Search a mailing list and get JSON results:

curl -s 'https://pg.ddx.io/m/pgsql-announce/?q=release&format=json' | jq .

Returns matching messages with subject, from, date, message-id, and a thread cursor. Each indexed inbox has its own search endpoint at /m/<inbox>/?q=<term>&format=json (the canonical path; /<inbox>/?... still 301-redirects there). Drop &format=json to get the HTML view a browser would render.
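
For scripted use, the same URL pattern can be assembled programmatically. This is a minimal sketch that only builds the URL described above; `search_url` is a hypothetical helper name, and the assumption that spaces in a query may be form-encoded as `+` has not been checked against the live service.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://pg.ddx.io"  # host taken from the curl example


def search_url(inbox: str, term: str, as_json: bool = True) -> str:
    """Build /m/<inbox>/?q=<term>, adding format=json unless as_json is False."""
    params = {"q": term}
    if as_json:
        params["format"] = "json"
    return f"{BASE_URL}/m/{quote(inbox, safe='')}/?{urlencode(params)}"


print(search_url("pgsql-announce", "release"))
print(search_url("pgsql-hackers", "logical replication", as_json=False))
```

The first call reproduces the URL from the curl example; the second omits `format=json` and so points at the HTML view.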

2. Fetch a single message as JSON (here, the top search hit):

curl -s 'https://pg.ddx.io/m/pgsql-announce/?q=release&format=json&limit=1' | jq '.results[0]'

Returns one full message-id-keyed record so you can follow up with thread, raw, or atom retrieval.
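
The jq filter `.results[0]` fails quietly when a search has no hits. A defensive equivalent might look like the sketch below. The `results` key is taken from the jq filter above; the field names and values inside the sample record are illustrative assumptions, not a documented schema.

```python
import json
from typing import Any, Optional


def first_result(payload: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the first entry of payload["results"], or None if there is none."""
    results = payload.get("results") or []
    return results[0] if results else None


# Hypothetical response body: only the "results" key is grounded in the
# jq example; everything inside the record is made up for illustration.
sample = json.loads("""
{"results": [{"subject": "Example subject",
              "from": "someone@example.org",
              "message-id": "<example@example.org>"}]}
""")

hit = first_result(sample)
print(hit["message-id"] if hit else "no matches")
print(first_result({"results": []}))
```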

3. Talk to the MCP server directly (JSON-RPC over HTTP):

curl -s -X POST -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":0,"method":"initialize",
       "params":{"protocolVersion":"2024-11-05",
                 "capabilities":{},
                 "clientInfo":{"name":"curl","version":"1"}}}' \
  https://pg.ddx.io/mcp | jq .

Initializes a session and returns the server's protocol version, capabilities, and tool surface. Real agent clients (Claude Desktop, Cursor, custom integrations) do this transparently. Subsequent tool calls (tools/list, tools/call, resources/read) require the session ID returned here, which is why most users let an MCP client library handle the protocol.
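
To make the message shapes concrete, here is a sketch that builds the same `initialize` request plus a follow-up `tools/list` request. It constructs payloads only and sends nothing. The helper names are hypothetical, and carrying the session ID in an `Mcp-Session-Id` header is an assumption borrowed from the MCP Streamable HTTP transport; this page does not say how pg.ddx.io expects the ID to be passed.

```python
import json
from typing import Any, Optional


def jsonrpc_request(method: str, params: Optional[dict[str, Any]] = None,
                    request_id: int = 0) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 request object."""
    message: dict[str, Any] = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return message


def initialize_request() -> dict[str, Any]:
    """Same payload as the curl example."""
    return jsonrpc_request("initialize", {
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "clientInfo": {"name": "curl", "version": "1"},
    })


def session_headers(session_id: str) -> dict[str, str]:
    """Headers for follow-up calls.

    ASSUMPTION: the session ID travels in an Mcp-Session-Id header.
    """
    return {"Content-Type": "application/json", "Mcp-Session-Id": session_id}


print(json.dumps(initialize_request()))
print(json.dumps(jsonrpc_request("tools/list", request_id=1)))
```

A real client would POST the first payload, read the session ID from the response, and attach it to every later request, which is exactly the bookkeeping an MCP client library does for you.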

4. Clone an inbox via Git smart-HTTP:

git clone https://pg.ddx.io/m/pgsql-announce.git

Each inbox is also a public-inbox v2 git repository — mirror the whole archive locally and run your own indexer on top.
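
As a starting point for your own indexer, the sketch below parses one raw message into a few headers using only the Python standard library. In the public-inbox v2 layout each commit conventionally stores its message as a blob named `m`, so the bytes would typically come from `git show <commit>:m`; whether this mirror follows that layout exactly is an assumption. The sample message is illustrative and not taken from the archive.

```python
from email import message_from_bytes, policy


def summarize_message(raw: bytes) -> dict[str, str]:
    """Parse one RFC 5322 message and return a few of its headers."""
    msg = message_from_bytes(raw, policy=policy.default)
    return {
        "message_id": str(msg["Message-ID"] or ""),
        "subject": str(msg["Subject"] or ""),
        "from": str(msg["From"] or ""),
    }


# Illustrative message, not taken from the archive.
sample = (
    b"From: Someone <someone@example.org>\r\n"
    b"Subject: Example announcement\r\n"
    b"Message-ID: <example@example.org>\r\n"
    b"\r\n"
    b"Body text.\r\n"
)

print(summarize_message(sample))
```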

Why This Matters

Context Windows Are Expensive: Structured queries cost 25-100x less in tokens than raw data
Semantic Search Works Better: Ask "How does PostgreSQL handle NUMA?" and find related code, discussions, and documentation automatically (experimental, currently slow — use keyword search for time-sensitive queries)
Fresh Data: Updated hourly (mailing lists), continuously (git), and nightly (docs)
No Overhead: Data is pre-parsed, pre-embedded, ready to feed to your LLM

INFRASTRUCTURE

Open source. Self-hosted on commodity infrastructure across two regions in Europe with streaming replication and a hot-standby replica. All ingestion, indexing, and vector embedding are performed on-premises; no third-party service sees the content of any query or any private data about who asked it. The full source for the application and its deployment is at codeberg.org/ddx.

LIVE ACTIVITY (IRC)

The site posts deploy alerts, status changes, and workflow events to a public, read-only IRC channel. If you'd like to lurk on what's going on behind the scenes, connect with any IRC client.

Server	`irc.ddx.io` (TLS only on tcp/6697)
Channel	`#ddx`
Voiced (writes)	`pgesq-bot` (alert poster) and `gregburd` (operator). Everyone else can join and read; the channel is `+m`.
Topic	DDX for PostgreSQL — site alerts, status, workflow.

Example invocations:

irssi:

/network add -nick lurker ddx
/server add -auto -ssl -ssl_verify -network ddx irc.ddx.io 6697
/channel add -auto #ddx ddx
/connect ddx

weechat:

/server add ddx irc.ddx.io/6697 -tls
/set irc.server.ddx.autojoin "#ddx"
/connect ddx

Browser (no install, lurker mode):

https://kiwiirc.com/nextclient/?server=irc.ddx.io:+6697&channel=%23ddx&nick=lurker_?

Anyone can join and listen. Only allowlisted SASL-authenticated nicks can send messages.
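
If you would rather script a lurker than run a full client, the following is a minimal sketch of a read-only IRC connection over TLS. The server, port, and channel are the ones listed above; the function names are hypothetical, and the sketch assumes the server accepts a plain NICK/USER registration without SASL for read-only users, which matches the description above but has not been verified.

```python
import socket
import ssl
from typing import Optional

HOST, PORT, CHANNEL = "irc.ddx.io", 6697, "#ddx"  # values from the table above


def registration_lines(nick: str) -> list[bytes]:
    """IRC commands that register a connection."""
    return [f"NICK {nick}\r\n".encode(), f"USER {nick} 0 * :{nick}\r\n".encode()]


def reply_for(line: str, channel: str = CHANNEL) -> Optional[bytes]:
    """Decide what, if anything, to send back for one server line."""
    if line.startswith("PING"):
        # Answer keepalives so the server does not drop the connection.
        return ("PONG" + line[4:] + "\r\n").encode()
    parts = line.split()
    if len(parts) > 1 and parts[1] == "001":
        # 001 is the welcome numeric: registration is done, so join now.
        return f"JOIN {channel}\r\n".encode()
    return None


def lurk(nick: str = "lurker") -> None:
    """Connect over TLS and print traffic until the server disconnects."""
    context = ssl.create_default_context()  # verifies the server certificate
    with socket.create_connection((HOST, PORT)) as raw:
        with context.wrap_socket(raw, server_hostname=HOST) as conn:
            for line in registration_lines(nick):
                conn.sendall(line)
            for data in conn.makefile("rb"):
                text = data.decode("utf-8", "replace").rstrip()
                print(text)
                reply = reply_for(text)
                if reply is not None:
                    conn.sendall(reply)


# lurk() is deliberately not called here; invoke it yourself to connect.
print(reply_for("PING :example"))
```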

DATA SOURCES

Mailing lists: 56 PostgreSQL community archives, updated hourly. Full NNTP/IMAP/POP3/Git/HTTP access.
GitHub inboxes: pgjdbc and psqlodbc issue & pull-request feeds — these projects use GitHub rather than mailing lists as their primary record.
Git repositories: PostgreSQL upstream, forks (Supabase, Postgres Pro, Greenplum, AgensGraph, Postgres-XL), connection poolers (pgbouncer, pgpool2, pgdog), and client libraries (JDBC, ODBC, psycopg, tokio-postgres, etc.)
Documentation: postgresql.org/docs (all versions), full-text indexed
Wiki: wiki.postgresql.org (1,800+ public pages), searchable
Commitfest: Patch submission lifecycle tracking, in development — data is indexed and queryable via MCP; the /commitfest web UI is not yet live
Buildfarm: 247 animals + run history, indexed and queryable via MCP and /mcp/api/buildfarm/*
Discord: PostgreSQL community Discord — indexed for cross-source identity linkage only. Participants are connected to their git/email/docs activity via their public handle; per our agreement with the Discord community, message content is never exposed through any API.

The critical value is the connective tissue between these sources: linking a mailing list discussion to the commit it produced, to the code symbols it changed, to the documentation it updated, and to the build results that validated it.
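
One concrete example of such a link: PostgreSQL commit messages conventionally carry `Discussion:` trailers that point at the mailing-list thread by message-id. The sketch below extracts those IDs from a commit message. It illustrates the idea only and is not a description of how DDX for PostgreSQL builds its links; the function name and the sample commit message are hypothetical.

```python
import re
from urllib.parse import unquote

# Matches trailers such as "Discussion: https://postgr.es/m/<message-id>".
DISCUSSION_RE = re.compile(
    r"^Discussion:\s*https?://(?:www\.)?"
    r"(?:postgr\.es/m|postgresql\.org/message-id)/(\S+)\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def discussion_message_ids(commit_message: str) -> list[str]:
    """Return the message-ids referenced by Discussion: trailers, in order."""
    return [unquote(match) for match in DISCUSSION_RE.findall(commit_message)]


# Illustrative commit message, not a real commit.
sample_commit = (
    "Fix example bug\n"
    "\n"
    "Longer description of the change.\n"
    "\n"
    "Discussion: https://postgr.es/m/first-id@example.org\n"
    "Discussion: https://www.postgresql.org/message-id/second%2Bid@example.org\n"
)

print(discussion_message_ids(sample_commit))
```

Each extracted ID can then be used as a key into the mailing-list index, giving the commit-to-discussion edge of the graph.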

LEGACY: THE BERKELEY POSTGRES ARCHIVE

The /legacy section is a static mirror of the original Berkeley POSTGRES archive at dsf.berkeley.edu/postgres.html — Stonebraker's research project from 1986–1995 that became modern PostgreSQL. We mirror it here for permanence; bytes are preserved in our R2 backup as well as served from DDX for PostgreSQL.

The mailing-list discussions, source releases (POSTGRES v3 through v4.2 plus postgres95-{0.01..1.02}), patches, and papers are also indexed and queryable through MCP and search_docs. See /legacy.

EXTENSIONS & ACKNOWLEDGMENTS

pg.ddx.io is a thin Go service (agora) over a much larger pile of work done by other people. The PostgreSQL extensions listed below do most of the actual heavy lifting; agora just routes traffic and stitches results.

Extension	Version	Role
PostgreSQL	18.3	Core database — every byte of the archive lives in PostgreSQL.
pg_turbovec	1.17.1	TurboQuant-quantised vector index access method (pgrx). Backs all five embedding columns (`ag_messages`, `ag_embeddings`, `ag_ci_embeddings`, `ag_document_embeddings`, `mentat.recommendations`) with 4-bit-quantised HNSW. Drives semantic search over messages and code symbols. ~4× smaller indexes than f32 HNSW at the same recall.
pg_textsearch	1.2.0	BM25 full-text relevance ranking via Tantivy. Replica-safe (works on streaming standbys), unlike its predecessor. Drives `search_messages`, `search_symbols`, body search.
pg_trgm	1.6	Trigram similarity + GIN index used for `ILIKE` on subject/from-address fallback paths where BM25 isn't a byte-substring equivalent.
pg_deltax	0.1.0	Columnar / delta-compression storage for time-series data; staged for future `ag_messages` partitioning. No production tables yet.
pg_tre	1.8.2	TRE-pattern regex search with edit-distance (drives `?q=…&regex=1&k=N`). Built on Ville Laurikari's TRE library.
pg_mentat	1.5.1	EDN/datalog triple store; lets us cross-join mailing lists, git, docs, and wiki in one query language.
pg_infer	1.0.0	SQL bridge to a local larql inference server, serving Microsoft BitNet b1.58 2B 4T as a native-ternary (i2_s, keep-quant) vindex — 30 transformer layers, hidden size 2560, vocab 128k. Exposes model weights as SQL relations (`infer_show_layers`, `infer_explain_walk`, `infer`, `infer_detect_server`, …).
pg_cron	1.6	In-database job scheduler — mailing-list ingest, git pulls, backups all run from here.
pgcrypto	1.4	Hashing for content fingerprints (deduping messages, naming git objects).
pg_stat_statements	1.12	Query telemetry; how we find slow queries before users do.
PgQue ^†	0.2.0	Snapshot/batch job queue (pure plpgsql, no C extension). pg_cron triggers `job_management.enqueue_job()` which writes to `pgque.event_`; `agora-job-dispatcher.service` consumes batches and fires the corresponding systemd `.service`. Replaces the prior in-DB `job_management.job_queue` table-as-queue. Apache-2.0.

^† PgQue is a pure plpgsql schema add-on, not a binary PostgreSQL extension — it doesn't appear in \dx. Listed here because it carries weight equivalent to one.
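
To give a feel for what the pg_trgm row in the table means by trigram similarity, here is a small pure-Python approximation. It follows the documented pg_trgm convention of lowercasing, splitting into alphanumeric words, and padding each word with two leading spaces and one trailing space. It is a simplification (ASCII-only word splitting) for illustration, not the extension's actual implementation.

```python
import re


def trigrams(text: str) -> set[str]:
    """Trigram set in the style of pg_trgm (simplified to ASCII words)."""
    grams: set[str] = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams across both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(sorted(trigrams("cat")))                      # ['  c', ' ca', 'at ', 'cat']
print(round(similarity("word", "two words"), 4))    # 0.3636
```

Because the score depends on shared three-character windows rather than whole tokens, it tolerates typos and partial matches in subjects and sender addresses, which is why it suits the fallback paths described in the table.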

Also indispensable, even though they're not loaded as PostgreSQL extensions:

public-inbox v2 — the on-disk archive format. agora is a Go reimplementation of public-inbox semantics; we owe Eric Wong and the public-inbox contributors the entire data model.
The Timescale team for pg_textsearch.
Cloudflare Pages / Workers / Load Balancer — the edge proxy in front of pg.ddx.io.
ollama — local embedding inference; no third-party model provider sees query text.
Linux / NixOS / colmena — the deployment substrate. Two-region active replication is just colmena pushing the same flake to both hosts.
DB-IP Lite IP-to-Country database — the IP-to-country mapping that powers the /status world map. Used under CC-BY-4.0; only aggregated counts are stored, never individual IP addresses.

Bug reports, patches, and version corrections welcome via contact.


WHAT IS THIS?