Metrics Catalog — plain-language explainer

Metrics Catalog is the toolbox's dictionary of HR metrics: the canonical definition, formula, and unit for every workforce number the rest of the toolbox computes against.

A People Analytics Toolbox component. Built to the portfolio Explainer Standard v1.0. Every claim below is grounded in the spoke's own code and contracts (src/spokes/metrics-catalog/, contract 1.1.0 in contracts/types.ts); anything not yet built is marked (TBD).

1. What is it?

Metrics Catalog is the canonical registry of HR-metric definitions — give it a metric id and it returns what that metric means: a stable id, a human name, a one-line definition, the formula, the unit, and (where curated) the recommended statistics for measuring it reliably.

The division of labor is the whole point: the catalog owns the dictionary; calculus owns the calculator. This spoke does not compute "total headcount" for your company — it is the authoritative statement of what "total headcount" is, so that every tool that does compute it means the same thing by the same id.

Visual — Tier A (live API capture). A real lookup against the catalog, grounded in the PAT-40 seed migration (drizzle/0026_pat40_metrics_catalog.sql):

GET /api/spokes/metrics-catalog/metrics/hr-metric.workforce-composition.total-headcount
→ {
    "id": "hr-metric.workforce-composition.total-headcount",
    "slug": "total-headcount",
    "name": "Total Headcount",
    "categoryId": "workforce-composition",
    "description": "Total number of active employees across the organization",
    "formula": "COUNT(employees WHERE status='active')",
    "unit": "number",
    "denominatorType": null,
    "recommendedCiMethod": null,
    "minSampleSizeForReliability": null,
    "methodologyNotes": null,
    "citations": null
  }

(Real seed row — MetricDefinition shape from contracts/types.ts, values from the donor-bootstrapped seed. The nullable enrichment fields are honest: a base metric carries no curated statistics yet.)
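The response shape above can be mirrored as a TypeScript type. This is a minimal sketch inferred only from the fields visible in the sample response. The authoritative definition is the Zod contract in contracts/types.ts and may be stricter (for example, enum-typed unit and recommendedCiMethod). The helper name hasCuratedStatistics is hypothetical.

```typescript
// Sketch of the MetricDefinition shape, inferred from the sample response only.
// The real contract (contracts/types.ts) is the source of truth.
interface MetricDefinition {
  id: string;
  slug: string;
  name: string;
  categoryId: string;
  description: string;
  formula: string;
  unit: string;
  denominatorType: string | null;
  recommendedCiMethod: string | null;
  minSampleSizeForReliability: number | null;
  methodologyNotes: string | null;
  citations: string[] | null;
}

// Hypothetical helper: true when a row carries curated measurement guidance.
function hasCuratedStatistics(m: MetricDefinition): boolean {
  return m.recommendedCiMethod !== null && m.minSampleSizeForReliability !== null;
}

// The total-headcount row from the capture above, typed against the sketch.
const totalHeadcount: MetricDefinition = {
  id: "hr-metric.workforce-composition.total-headcount",
  slug: "total-headcount",
  name: "Total Headcount",
  categoryId: "workforce-composition",
  description: "Total number of active employees across the organization",
  formula: "COUNT(employees WHERE status='active')",
  unit: "number",
  denominatorType: null,
  recommendedCiMethod: null,
  minSampleSizeForReliability: null,
  methodologyNotes: null,
  citations: null,
};

const headcountIsCurated = hasCuratedStatistics(totalHeadcount);
```

For the base row, headcountIsCurated is false, which matches the note above: the nullable enrichment fields are empty until a metric is curated.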

2. What problem does it solve — and why is it different?

The pain it removes: in most organizations "turnover rate" or "time to fill" means a slightly different thing in every report, because the formula lives in the spreadsheet of whoever built that report. There is no authority that says which denominator is correct, so two dashboards disagree and nobody can say which is right.

The difference, stated as a shift:

FROM a metric that is whatever the last analyst's formula said it was, redefined silently in every BI tool.
TO a metric with a stable id, a published formula, and a single canonical definition that every computing tool validates against.

How it differs from the obvious substitutes:

vs. a glossary in a wiki — a wiki definition is prose nobody enforces. The catalog id is a contract handle: calculus soft-validates every MetricEnvelope.metricKey against this catalog at request time (PAT-40 Phase 3), so an unknown or misspelled metric key surfaces a warning instead of quietly producing a number.
vs. a generic BI semantic layer — those tie definitions to one BI vendor's model. The catalog is vendor-neutral, code-versioned, and consumed by HTTP + MCP, so the same definition serves a dashboard, an API consumer, and an AI agent identically.
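The soft-validation behavior described above (warn on an unknown key, never reject) can be sketched as follows. This is an illustration of the idea, not calculus's actual implementation: the function name, the result shape, and the in-memory set of ids are all assumptions.

```typescript
// Hypothetical result shape: soft validation annotates, it never rejects.
interface SoftValidationResult {
  metricKey: string;
  known: boolean;
  warnings: string[];
}

// Check a metricKey against the set of canonical catalog ids.
// An unknown key produces a warning; the caller still gets a result.
function softValidateMetricKey(
  metricKey: string,
  catalogIds: ReadonlySet<string>,
): SoftValidationResult {
  if (catalogIds.has(metricKey)) {
    return { metricKey, known: true, warnings: [] };
  }
  return {
    metricKey,
    known: false,
    warnings: [`metricKey "${metricKey}" is not a canonical catalog id`],
  };
}

// The two ids used as examples in this document.
const catalogIds = new Set<string>([
  "hr-metric.workforce-composition.total-headcount",
  "hr-metric.regulatory-compliance.pay-floor-failure-rate",
]);

const canonical = softValidateMetricKey(
  "hr-metric.workforce-composition.total-headcount",
  catalogIds,
);
// A misspelled key ("headcout") surfaces a warning instead of failing.
const misspelled = softValidateMetricKey(
  "hr-metric.workforce-composition.total-headcout",
  catalogIds,
);
```

Returning a warning list rather than throwing is what keeps the check "soft": the computation can proceed while the unknown key stays visible to the analyst.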

Visual — Tier B (FROM→TO typographic block). The shift above is the visual; a rendered comparison block is a follow-up (FU-A).

3. How does it work?

The practitioner's questions, answered concretely:

"What can I ask it?" — Four read endpoints (all public GETs, no service key, per the PAT-11 rule): list metrics (with optional category filter + pagination), look up one metric by its stable id, case-insensitive substring search across name / description / slug, and list the categories.
"Where do the definitions come from?" — The catalog was bootstrapped from the donor people-analyst/metric-engine-calculus — a hand-authored HR Metrics Catalog of 102 metrics across 6 categories (the donor's UI-only icon/color fields were stripped). Subsequent metrics were added by migration: a regulatory-compliance category (PAT-82), a tenure trio (PAT-D1-B), and segment-distribution definitions mirroring segmentation-studio's canonical fields (PAT-156, PAT-164).
"What does a metric carry?" — Every metric has the core fields (id, slug, name, category, description, formula, unit). A subset also carries Phase-2 enrichment: the recommended confidence-interval method (wilson / normal / t / exact), the minimum sample size for reliability, long-form methodology notes, and bibliographic citations.
Output: a MetricDefinition (or a list/category result), validated against the Zod contract on the way out — text columns for unit and recommendedCiMethod are re-parsed against the contract enums so a bad seed fails loudly at the read, not silently at the consumer.
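The "fail loudly at the read" behavior can be illustrated with a small re-parse step. The spoke itself does this through its Zod contract. The sketch below uses plain TypeScript to stay dependency-free, and the function name parseCiMethod is hypothetical. Only the four CI method values named above are taken from the source.

```typescript
// The four recommended-CI methods named in this document.
const CI_METHODS = ["wilson", "normal", "t", "exact"] as const;
type CiMethod = (typeof CI_METHODS)[number];

// Re-parse a free-text column value against the enum.
// null is allowed (base metrics carry no curated method);
// anything else outside the enum throws at read time.
function parseCiMethod(raw: string | null): CiMethod | null {
  if (raw === null) {
    return null;
  }
  const match = CI_METHODS.find((method) => method === raw);
  if (match === undefined) {
    throw new Error(`invalid recommendedCiMethod in catalog row: "${raw}"`);
  }
  return match;
}

const curatedMethod = parseCiMethod("wilson"); // valid enum member
const baseMethod = parseCiMethod(null); // base metric, nothing curated
```

A misspelled seed value such as "wislon" would throw here, so the bad row is caught when the catalog serves it rather than when a consumer tries to use it.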

Differentiation beat: the catalog does not just name a metric — for curated rows it tells you how to measure it honestly. The regulatory-compliance metrics ship with their recommended CI method, a small-N threshold, and the statistical reasoning behind both, so a consumer doesn't have to re-derive the right interval.

Visual — Tier A (live capture of a fully enriched row). The PAT-82 compliance metric, with Phase-2 fields populated (values verbatim from drizzle/0033_pat82_compliance_metrics.sql):

GET /api/spokes/metrics-catalog/metrics/hr-metric.regulatory-compliance.pay-floor-failure-rate
→ {
    "name": "Pay-Floor Failure Rate",
    "unit": "percent",
    "formula": "COUNT(compliance_status='fail') / COUNT(evaluated_workers)",
    "denominatorType": "count",
    "recommendedCiMethod": "wilson",
    "minSampleSizeForReliability": 30,
    "methodologyNotes": "Wilson confidence interval is appropriate because the metric
      is a proportion with a well-defined denominator (evaluated workers, not total
      workforce). For sub-30 denominators ... data-anonymizer.min-n-check enforces this. ...",
    "citations": [
      "Wilson, E.B. (1927). Probable inference, the law of succession ... JASA 22(158): 209-212.",
      "U.S. Department of Labor — FLSA enforcement statistics methodology",
      "wage-compliance spoke contract types (PAT-79): ComplianceStatusEnumSchema ..."
    ]
  }

(Real seed row, truncated for length where marked with "...". This is the difference a curated metric makes: the definition tells you which interval to use and why.)
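To make the recommended method concrete, here is a minimal sketch of a Wilson score interval for a proportion such as the failure rate above, using the standard formula from the cited Wilson (1927) paper. The function name, the reliable flag, and the input counts (5 failures among 50 evaluated workers) are illustrative assumptions, not catalog data, and this is not the calculus implementation.

```typescript
interface WilsonResult {
  lower: number;
  upper: number;
  // True when the denominator meets the metric's minSampleSizeForReliability.
  reliable: boolean;
}

// Wilson score interval for failures / evaluated.
// z defaults to the two-sided 95% normal quantile.
function wilsonInterval(
  failures: number,
  evaluated: number,
  minSampleSize: number,
  z: number = 1.959964,
): WilsonResult {
  if (evaluated <= 0 || failures < 0 || failures > evaluated) {
    throw new RangeError("need 0 <= failures <= evaluated and evaluated > 0");
  }
  const p = failures / evaluated;
  const z2 = z * z;
  const denom = 1 + z2 / evaluated;
  const center = (p + z2 / (2 * evaluated)) / denom;
  const halfWidth =
    (z * Math.sqrt((p * (1 - p)) / evaluated + z2 / (4 * evaluated * evaluated))) / denom;
  return {
    lower: Math.max(0, center - halfWidth),
    upper: Math.min(1, center + halfWidth),
    reliable: evaluated >= minSampleSize,
  };
}

// Illustrative inputs only: 5 failures among 50 evaluated workers,
// with the row's minSampleSizeForReliability of 30.
const failureRateInterval = wilsonInterval(5, 50, 30);
```

With these inputs the observed rate is 10% and the interval runs from roughly 4.3% to 21.4%. Unlike a plain normal approximation, the interval is not symmetric around the observed rate, which is part of why Wilson suits proportions from small denominators.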

4. What does it enable?

Concrete uses a practitioner would recognize:

Resolve metric disputes — when two reports disagree on "turnover rate," look up the canonical formula and denominator; the catalog is the tiebreaker.
Validate a computed metric — calculus checks each MetricEnvelope.metricKey against the catalog; an analyst sees a warning when they've used a key that isn't canonical, before the number ships.
Pick the right statistics automatically — once a metric carries recommendedCiMethod and minSampleSizeForReliability, calculus.stats-enrich can default to the correct interval when the consumer doesn't specify one (the opt-in lands once Phase-2 curation is broad).
Browse by domain — list the 7 categories (workforce composition, compensation & benefits, talent acquisition, performance & development, engagement & retention, workforce planning, regulatory compliance) and page through the metrics in each.
Search for a metric — substring search across name, description, and slug finds the canonical id for a half-remembered metric.
Power dashboard KPI rows — the wage-compliance surface KPI row (PAT-84) and the metric × segment × period grids that calculus + anycomp compose are addressed by catalog ids, so the labels and definitions stay consistent across surfaces.
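The search use above (case-insensitive substring match across name, description, and slug) can be sketched over an in-memory list. The function name and the empty-query behavior are assumptions. The pay-floor row's slug is inferred from its id, and its description is left empty because this document does not show it.

```typescript
interface SearchableMetric {
  id: string;
  slug: string;
  name: string;
  description: string;
}

// Case-insensitive substring search across name, description, and slug.
// Assumption: a blank query returns no results rather than everything.
function searchMetrics(
  metrics: readonly SearchableMetric[],
  query: string,
): SearchableMetric[] {
  const needle = query.trim().toLowerCase();
  if (needle === "") {
    return [];
  }
  return metrics.filter((m) =>
    [m.name, m.description, m.slug].some((field) =>
      field.toLowerCase().includes(needle),
    ),
  );
}

const sampleRows: SearchableMetric[] = [
  {
    id: "hr-metric.workforce-composition.total-headcount",
    slug: "total-headcount",
    name: "Total Headcount",
    description: "Total number of active employees across the organization",
  },
  {
    id: "hr-metric.regulatory-compliance.pay-floor-failure-rate",
    slug: "pay-floor-failure-rate", // inferred from the id
    name: "Pay-Floor Failure Rate",
    description: "", // not shown in this document
  },
];

const byName = searchMetrics(sampleRows, "HEADCOUNT");
const bySlug = searchMetrics(sampleRows, "pay-floor");
```

Here byName resolves the half-remembered "HEADCOUNT" to the total-headcount row, and bySlug finds the compliance metric from a fragment of its slug.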

Visual — Tier A (live capture). GET /api/spokes/metrics-catalog/categories returns the canonical category list (ids + names + sort order), seeded verbatim from drizzle/0026_pat40_metrics_catalog.sql.
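Browsing by domain combines a category filter with pagination. The sketch below shows one way that could work over an in-memory list. The parameter names (categoryId, limit, offset), the default page size, and the result shape are hypothetical and are not taken from the spoke's contract.

```typescript
interface CatalogRow {
  id: string;
  categoryId: string;
}

interface Page<T> {
  items: T[];
  total: number; // number of rows matching the filter, before paging
}

// Filter by category (optional), then return one page of results.
function listMetrics<T extends CatalogRow>(
  rows: readonly T[],
  opts: { categoryId?: string; limit?: number; offset?: number } = {},
): Page<T> {
  const { categoryId, limit = 50, offset = 0 } = opts;
  const filtered =
    categoryId === undefined
      ? [...rows]
      : rows.filter((row) => row.categoryId === categoryId);
  return {
    items: filtered.slice(offset, offset + limit),
    total: filtered.length,
  };
}

// The two rows used as examples in this document.
const catalogRows: CatalogRow[] = [
  {
    id: "hr-metric.workforce-composition.total-headcount",
    categoryId: "workforce-composition",
  },
  {
    id: "hr-metric.regulatory-compliance.pay-floor-failure-rate",
    categoryId: "regulatory-compliance",
  },
];

const compliancePage = listMetrics(catalogRows, { categoryId: "regulatory-compliance" });
const secondPage = listMetrics(catalogRows, { limit: 1, offset: 1 });
```

Returning total alongside items lets a consumer render page controls without a second request. Whether the real endpoint reports a total is not stated in this document.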

5. How it fits in the toolbox

Data flow:

Consumes — its own seed migrations (the donor catalog + later additions). It is upstream of computation, so it does not consume other spokes' runtime data. It mirrors segmentation-studio's canonical_fields for the segment-distribution metrics (PAT-156 / PAT-164), keeping segment definitions and segment metrics in lockstep.
Emits — MetricDefinition, MetricCategory, and list/search result contracts. Consumers vendor src/spokes/metrics-catalog/contracts/types.ts.
Feeds — calculus (live consumer: soft-validates metricKey against the catalog, PAT-40 Phase 3); wage-compliance surfaces and the anycomp / calculus metric grids that label their numbers by catalog id; Performix (planned consumer per the registry).
Principia link — optional nullable principia_construct_id / principia_measure_id columns (PAT-Principia-prep) let a metric declare that it is the operational form of a known scientific construct or one specific Measure of it; resolution flows through the principia-connector spoke when that wiring lands (PAT-114). These are soft-validated — unknown ids never reject — and the source of truth for the referenced ids lives in people-analyst/principia.

Visual — Tier B (typographic data-flow). seed migrations → Metrics Catalog (definitions) → { calculus metricKey validation · wage-compliance / anycomp metric grids · Performix (planned) }, with segmentation-studio.canonical_fields mirrored in for segment metrics and optional Principia construct/measure ids attached.

6. Commercialization / packaging

Metrics Catalog is infrastructure, not a standalone product — it is the shared dictionary the computing spokes and buyer-facing surfaces depend on, the way a units-of-measure standard underpins instruments without being sold on its own.

Data-license posture: the metric definitions and formulas are the toolbox's own authored content (the donor catalog was a hand-authored definition set, not a licensed vendor survey). Curated methodology notes cite public sources (e.g. the Wilson 1927 paper, U.S. DOL FLSA methodology). No vendor-survey licensing constraints attach to the catalog itself.
Anything about pricing tiers or packaged offerings is (TBD) — not earned yet, so not stated.

Visual — (TBD — product-tier placement diagram showing the catalog as the shared definition layer beneath the computing spokes).

7. The vision

One canonical definition for every workforce metric — so that every tool, dashboard, and AI agent in the portfolio means the same thing by the same id, and the right way to measure each one travels with the definition.

The direction is Phase-2 enrichment across the whole catalog (PAT-40-FU-A): populate denominatorType, recommendedCiMethod, minSampleSizeForReliability, methodologyNotes, and citations for the base metrics the way the regulatory-compliance rows already are, so that picking the statistically correct interval becomes automatic everywhere calculus computes. The Principia link (PAT-114) is the second axis: connect operational metrics to the peer-reviewed scientific constructs they measure, so a metric can carry not just a formula but a citation to the science behind it. Per-tenant catalog overrides and admin write tooling are noted as future work — v1 is global and read-only, written via migration.

Visual — (TBD — an enrichment-coverage map: which categories carry curated Phase-2 statistics vs. base definitions only).

8. Current status

Grounded in the real code state (contract 1.1.0 in contracts/types.ts; CHANGELOG shows 1.2.0 reached by later additive migrations; status: "live" in src/lib/contracts/registry.ts):

Shipped: 7 categories and ~118 metric definitions (102 donor-seeded at PAT-40; +5 regulatory-compliance at PAT-82; +3 tenure at PAT-D1-B; +6 segment-distribution at PAT-156; +2 demographic-similarity at PAT-164 — counts per the CHANGELOG). Four public GET endpoints (/metrics, /metrics/[id], /metrics/search, /categories) plus /health. Five MCP tools registered (metrics-catalog.{list,lookup,search,list-categories,health}). The regulatory-compliance metrics ship with Phase-2 enrichment fully populated; calculus consumes the catalog live for metricKey soft-validation. Optional Principia construct/measure id columns added (PAT-Principia-prep).
In flight / planned: Phase-2 enrichment curation for the base (non-compliance) metrics (PAT-40-FU-A); live Principia resolution wiring (PAT-114); the planned Performix consumer; per-tenant overrides + admin write tooling.

Visual — Tier A (live capture). GET /api/spokes/metrics-catalog/health reports the spoke's live status, contract version, and schema reachability at request time.

The worked examples used above are real: the total-headcount row (base definition, no curated statistics) and the pay-floor-failure-rate row (fully Phase-2-enriched, with its Wilson-interval reasoning and citations) are both verbatim from the spoke's own seed migrations (drizzle/0026_pat40_metrics_catalog.sql, drizzle/0033_pat82_compliance_metrics.sql), rendered through the MetricDefinition contract shape. The contrast between them is the load-bearing point: a catalog metric carries its definition and, when curated, the honest way to measure it. No figure here is invented.