Community — DDX for PostgreSQL

OVERVIEW

Every PostgreSQL contribution we can extract from public sources, attributed to a person, indexed by release. … contributors so far across 18 authoritative git repos (postgres core plus JDBC/ODBC/psycopg/pgpool/pgdog and extensions), the mailing-list archives, and the GitHub issue / pull-request feeds for pgjdbc and psqlodbc (those two projects don't use mailing lists; their issue trackers are the primary record). The PG core toggle below restricts to the canonical postgres repo only.

Disclaimer. This is one estimate among many. It is not meant to shame, promote, rank, or compare anyone. Contribution is not measurable; what we count here are observable interactions in public archives, weighted equally regardless of size, impact, or difficulty. A 12-line bug fix counts the same as a 6,000-line architectural rewrite. A drive-by typo correction counts the same as a multi-year rebase campaign. The numbers below should be read as a rough activity signal across the ecosystem, nothing more. People appear here because their work shows up in public archives; people who do not appear here are not less valuable contributors.

The employer column is best-effort: it uses a manual override where one is set, falling back to the email-domain mapping for everyone else, so many personal-email contributors land in Other. See the org rollup page for the per-employer view.

Sources, methodology, the data-flow diagram, and identity coalescing details are further down the page, after the contributor list.

CONTRIBUTORS

Scope

Time

Branch

people…

total contributions…

median per person…

average…

stddev…

TOP 60 CONTRIBUTORS

ALL CONTRIBUTORS

Every person who's authored a commit, reviewed a patch, reported an issue, or contributed in any other tracked role. Click any name to jump to their per-person page.

SOURCES

The contributor list above pulls from six public data sources, deduplicated by stable identifiers and cross-linked into a single per-person view.

Git commits and trailers. The commit Author: and Committer: headers, plus body trailers Reviewed-by:, Tested-by:, Reported-by:, Co-authored-by:, Suggested-by:, Signed-off-by:. Pulled from git.postgresql.org's official mirror for postgres core, and from the upstream of each ecosystem repo.
Mailing lists. The email_from / email_to counts come from our mailing-list mirror, covering ~385k archived messages.
GitHub issues and PRs. For projects that don't use mailing lists, threads from gh-pgjdbc and gh-psqlodbc, ingested via the GitHub REST API and threaded with synthetic Message-IDs.
Buildfarm. Animal-owner attribution from the buildfarm; shown on each per-person page.
Tier and committer status. The tier badge (Core Team, Major, Significant) is scraped from the PostgreSQL community contributors page; the Committer badge is scraped from the active-committers page. The two are independent: many committers are not on the Core Team.
Bot exclusion. Auto-merge bots, GitHub no-reply addresses for known CI/automation, and 4,700+ pattern-matched bot rows are filtered out via community.is_bot() before any count is taken.
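
As an illustration of the trailer source described above, here is a minimal Python sketch of pulling trailer roles out of a commit message. It is not the site's ingestion code: the function name `extract_trailers`, the regular expression, and the lowercasing of addresses are assumptions made for the example. Only the trailer names and the matching role names come from this page.

```python
import re

# Trailer names listed above, mapped to the role names used on this page.
TRAILER_ROLES = {
    "reviewed-by": "reviewed_by",
    "tested-by": "tested_by",
    "reported-by": "reported_by",
    "co-authored-by": "coauthored_by",
    "suggested-by": "suggested_by",
    "signed-off-by": "signed_off",
}

# Matches lines such as "Reviewed-by: Some Name <someone@example.org>".
TRAILER_RE = re.compile(
    r"^\s*([A-Za-z-]+):\s*(.*?)\s*<([^<>\s]+@[^<>\s]+)>\s*$"
)


def extract_trailers(commit_sha, message):
    """Return a set of (commit_sha, role, email) tuples found in message.

    Using a set means a trailer repeated inside one commit is kept once.
    Lowercasing the address is an assumption of this sketch.
    """
    found = set()
    for line in message.splitlines():
        match = TRAILER_RE.match(line)
        if match is None:
            continue
        role = TRAILER_ROLES.get(match.group(1).lower())
        if role is not None:
            found.add((commit_sha, role, match.group(3).lower()))
    return found
```

Lines that do not carry a recognised trailer name and an address in angle brackets are ignored, so free-text lines in the message body do not produce rows.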

DATA FLOW

The pipeline from upstream sources to the per-person card on this page:

Each box is a system component; arrows mark data dependence (left side computed first). The aggregate boxes are refreshed with REFRESH MATERIALIZED VIEW CONCURRENTLY every six hours on a postgres cron schedule (HH:17); the static rendering job fires eight minutes later (HH:25) to pre-build per-person card pages.

HOW THE NUMBERS ARE COMPUTED

For each person $p$, we count their interactions across thirteen role categories: $r \in \{\text{authored}, \text{committed}, \text{reviewed\_by}, \text{tested\_by}, \text{reported\_by}, \text{coauthored\_by}, \text{suggested\_by}, \text{signed\_off}, \text{discussion}, \text{backpatch}, \text{wiki\_edit}, \text{email\_from}, \text{email\_to}\}$. Let $b$ be a time bucket (a release like PG18, a year like 2026, or all), $s \in \{\text{core}, \text{ecosystem}\}$ the scope, and $m \in \{\text{true}, \text{false}\}$ a flag for restricting to master-branch commits only.

The per-(person, bucket, scope, master-only) total is then:

$$\mathrm{Total}(p, b, s, m) \;=\; \sum_{r} C(p, r, b, s, m)$$

where $C(p, r, b, s, m)$ is the number of distinct interactions of role $r$ for person $p$ that match the given bucket / scope / master filter. Each event (commit, review trailer, mailing-list message, GitHub comment) is counted at most once per role, deduplicated by its stable identifier (commit_sha for git, mid for mail and GitHub).
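
The counting rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Event` record and its field names are hypothetical, each event is assumed to carry the list of buckets it belongs to and a single scope tag, and the scope filter is treated as an exact match. The production numbers come from the materialized view, not from code like this.

```python
from collections import namedtuple

# Hypothetical event record; field names are illustrative only.
Event = namedtuple("Event", "person role stable_id buckets scope on_master")


def total(events, person, bucket="all", scope=None, master_only=False):
    """Total(p, b, s, m): sum over roles of distinct interactions.

    Deduplication key is (role, stable_id), so one commit can count once
    as 'authored' and once as 'committed', but never twice in one role.
    """
    seen = set()
    for e in events:
        if e.person != person:
            continue
        if bucket != "all" and bucket not in e.buckets:
            continue
        if scope is not None and e.scope != scope:
            continue
        if master_only and not e.on_master:
            continue
        seen.add((e.role, e.stable_id))
    return len(seen)
```

Collecting `(role, stable_id)` pairs in a set is what makes a re-ingested commit or message harmless: the second copy lands on the same key and does not change the total.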

Identity coalescing groups multiple aliases under one canonical person:

$$\mathrm{hash}(p) \;=\; \mathrm{sha256}\bigl(\mathrm{canonical\_email}(p)\bigr)[1\!\!:\!\!8]$$

Each person has a single canonical email; every git Author:, mailing-list From:, and GitHub handle that's been observed for them maps to that canonical email via community.person_aliases. The 8-character SHA-256 prefix is the URL slug at /community/<hash>/; it's stable across re-syncs and avoids leaking email plaintext.
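
A minimal sketch of the slug formula, using Python's standard `hashlib`. The function name `person_slug` is hypothetical, and two details are assumptions not stated on this page: that the digest is taken in hexadecimal form and that the email is hashed as UTF-8 bytes without any normalisation.

```python
import hashlib


def person_slug(canonical_email: str) -> str:
    """First 8 characters of the SHA-256 of the canonical email.

    Assumes a hexadecimal digest and UTF-8 encoding with no case or
    whitespace normalisation; the real pipeline may differ on both.
    """
    digest = hashlib.sha256(canonical_email.encode("utf-8")).hexdigest()
    return digest[:8]
```

Because the input is always the canonical email rather than whichever alias was seen, the slug does not change when new aliases are added to a person's record.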

The org rollup at /community/orgs/ is the same total grouped by employer:

$$\mathrm{Org\_total}(o, b, s, m) \;=\; \sum_{p\,:\,\mathrm{Org}(p) = o} \mathrm{Total}(p, b, s, m)$$

where $\mathrm{Org}(p)$ is the operator-curated employer override for $p$ if set, otherwise the organisation inferred from $p$'s email domain via community.org_for_person().
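
The override-then-domain rule and the per-employer sum can be sketched as follows. The Python function `org_for_person` here is a stand-in written for this example, not the SQL function community.org_for_person(); the dictionary shapes and the `org_totals` helper are likewise assumptions. The "Other" fallback follows the employer-column note near the top of the page.

```python
from collections import defaultdict


def org_for_person(email, overrides, domain_map):
    """Manual override wins; otherwise map the email domain; else 'Other'."""
    if email in overrides:
        return overrides[email]
    domain = email.rsplit("@", 1)[-1].lower()
    return domain_map.get(domain, "Other")


def org_totals(person_totals, overrides, domain_map):
    """Group already-computed per-person totals by employer."""
    rollup = defaultdict(int)
    for email, count in person_totals.items():
        rollup[org_for_person(email, overrides, domain_map)] += count
    return dict(rollup)
```

Since the rollup only regroups per-person totals, any bucket, scope, or master-only filter applied to the person totals carries over to the organisation totals unchanged.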

IDENTITY COALESCING

One person can show up in our data under many email addresses: a personal account, a current employer, a past employer, a GitHub noreply+<handle>@github.com bouncer, an @oldco.example address that became @newco.example five years ago. Without coalescing, every alias is its own row, totals are scattered, and the per-person card under-reports.

The fix is a manual mapping in data/community-identities.yaml (also served at /community/identities.yaml for direct download / audit). Each entry pairs one canonical email with a list of aliases:

- canonical: [email protected]
  display_name: Alex Rivera
  aliases:
    - [email protected]
    - [email protected]
- canonical: [email protected]
  display_name: Sam Okafor
  aliases:
    - [email protected]
    - [email protected]

The list is operator-curated, intentionally conservative, and auditable. Anyone can propose additions via the community-correction template. On each per-person page (/community/<hash>/), the "Aliases" section lists every email + display-name pair the mapping has folded into that one canonical record — so disagreement is visible, not hidden.

The 8-character slug at /community/<hash>/ is the first 8 characters of the SHA-256 of the canonical email. It is stable across re-syncs, doesn't leak the email plaintext, and gives the same person the same URL no matter which alias we encountered first.

What we do not do: heuristic clustering on display name, email-prefix similarity, GitHub-handle guessing, or any automatic multi-source merging without a human in the loop. The cost of a false-positive merge (two distinct people collapsed into one) is much higher than the cost of a false-negative split (one person appearing as two rows until somebody files an issue), so we deliberately err on the side of leaving identities split.
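
To make the manual-mapping approach concrete, here is a small Python sketch of how entries shaped like the ones in the mapping file could be turned into an alias lookup. The email addresses are placeholders invented for this example (they are not the values from the file), and the helper names `build_alias_index` and `resolve` are hypothetical. The sketch uses exact matches only, in line with the no-heuristics policy described above.

```python
# Entries mirror the canonical / display_name / aliases shape of the mapping.
# All addresses are illustrative placeholders.
IDENTITIES = [
    {
        "canonical": "alex@example.org",
        "display_name": "Alex Rivera",
        "aliases": ["alex@oldco.example", "alex@newco.example"],
    },
    {
        "canonical": "sam@example.org",
        "display_name": "Sam Okafor",
        "aliases": ["sam@oldco.example"],
    },
]


def build_alias_index(entries):
    """Map every known address to its canonical email (exact match only)."""
    index = {}
    for entry in entries:
        canonical = entry["canonical"]
        for address in [canonical, *entry.get("aliases", [])]:
            previous = index.setdefault(address, canonical)
            if previous != canonical:
                # Refuse to guess when two people claim the same address.
                raise ValueError(f"{address} is claimed by two entries")
    return index


def resolve(email, index):
    """Unknown addresses stay as their own identity (a deliberate split)."""
    return index.get(email, email)
```

An address that is not in the mapping resolves to itself, which is the false-negative split the policy prefers over a wrong merge; a conflicting claim raises an error so that a human has to decide.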

COLUMN MEANINGS

Counts come from community.mv_person_contribution_summary, a materialized view rebuilt every six hours. Per the algorithm above, each role r corresponds to one of the columns in the per-person card:

authored — git author of the commit
committed — git committer when distinct from author (rare in postgres-core's strict-rebase workflow; common in extension repos)
committer-not-author — committed minus authored, surfacing "integration effort" (someone committing someone else's patch, etc.)
reviewed-by / tested-by / reported-by / suggested-by — git trailer matches
discussion — references in commit messages to mailing-list URLs or forwarded patches
email-from — count of mailing-list and GitHub messages the person sent
email-to — count of messages addressed to them in To:/Cc:

The master-only toggle restricts the underlying contribution rows to those whose first-seen branch is master/main/HEAD (excluding backported commits on stable branches). It's only meaningful for the PG core scope; ecosystem repos use varying branch conventions.
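
The master-only restriction amounts to a filter on the branch where a contribution was first seen. The sketch below assumes each contribution row is a dictionary with a hypothetical `first_seen_branch` field; the branch names in the set come from the paragraph above, and the function name is made up for the example.

```python
# Branch names treated as "master" per the description above.
MASTER_BRANCHES = {"master", "main", "HEAD"}


def restrict_to_master(rows):
    """Keep rows first seen on master/main/HEAD; drop stable-branch rows."""
    return [row for row in rows if row["first_seen_branch"] in MASTER_BRANCHES]


rows = [
    {"commit_sha": "sha1", "first_seen_branch": "master"},
    {"commit_sha": "sha2", "first_seen_branch": "REL_17_STABLE"},
    {"commit_sha": "sha3", "first_seen_branch": "main"},
]
kept = restrict_to_master(rows)
```

In this example the row first seen on the stable branch is dropped, so `kept` holds the two rows first seen on master and main.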

KNOWN GAPS

Identity coalescing — most aliases collapse cleanly. Operator-curated YAML merges land in data/community-identities.yaml; corrections welcome via the contact form.
Employer attribution — most contributors use personal email domains, so the orgname falls back to Other.
Body-Author parsing — when a maintainer applies someone else's patch with the original author recorded only in the commit message body (not the git-author header), we credit the committer not the author. Joe Conway's tooling parses both. Closing this gap is Phase 3.

Raw data and methodology in /community/data.sql.

FIX YOUR ENTRY

If we got your name, emails, employer, committer status, or tier wrong — or if you're appearing as two separate /community/<hash>/ rows that should be merged — file an issue against ddx/site on Codeberg using the community-correction template. The template asks for everything we'd want to know:

Display name — with diacritics + capitalisation, exactly as you'd like it shown.
Canonical email — the single email that drives your /community/<hash>/ slug.
Other email addresses — every alias we should coalesce: past employer addresses, personal addresses, git-author addresses, GitHub no-reply addresses. We use these to merge duplicate rows.
Committer status — if you're on the active-committers page but the badge isn't showing (or vice versa).
Tier status — your listing on postgresql.org/community/contributors/: Core Team, Major, Significant, Past, or none.
Employer — the manual override that wins over the email-domain inference.
PostgreSQL team membership — project-internal teams (e.g. PGCT = PostgreSQL Community Team at AWS).
Merge two profiles into one — paste the URLs and tell us which is canonical.

Edits land in data/community-identities.yaml in the infra repo and propagate on the next nightly identity sync (usually within 24 hours).

COMMUNITY RESOURCES

Other hosted tools and resources in the broader PostgreSQL community ecosystem.

commitfest.postgresql.org	Official commitfest patch tracker — submit, review, and track patches through the release cycle.
cfbot.cputube.org	Commitfest CI bot — automated patch application and build testing for open commitfest entries.
buildfarm.postgresql.org	PostgreSQL build farm — continuous build and test across dozens of OS/arch/compiler combinations. Animal ownership shown on per-person pages here.
coverage.postgresql.org	PostgreSQL code coverage reports — line and branch coverage for the postgres source tree.
planet.postgresql.org	Planet PostgreSQL — community blog aggregator.
wiki.postgresql.org	PostgreSQL wiki — community documentation, HowTos, TODO lists, and developer notes.
postgresql.org/list	Official mailing list archives — pgsql-hackers, pgsql-general, pgsql-bugs, and 40+ others. Also mirrored on this site.
