Calculus — plain-language explainer

Calculus is the numerical brain of the toolbox: it turns a raw number into a number that knows how sure it is, where it sits, and which way it's moving.

A People Analytics Toolbox component. Built to the portfolio Explainer Standard v1.0. Every claim below is grounded in the spoke's own code and contracts (src/spokes/calculus/, contract 1.13.0); anything not yet built is marked (TBD).

1. What is it?

Calculus is a set of callable statistical primitives — confidence intervals, percentile ranks, trend classification, anomaly flags, distribution fitting, regression diagnostics, gap decomposition, Monte Carlo, and OLS surrogates — exposed as plain HTTP and MCP endpoints rather than as a programming library.

It also owns the toolbox's canonical unit of currency: the MetricEnvelope — { metricKey, segmentId, period, value, sampleSize, provenance, enrichment }. That shape is how the rest of the portfolio passes a measured number around together with its provenance (where it came from, what method produced it, when) and its enrichment (the statistics that say how trustworthy and how notable it is).
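For orientation, the envelope can be pictured as a TypeScript type. This is a sketch, not the vendored contract. The field names come from the text above, but the field types, the optional enrichment keys, and the example's segmentId, period, sampleSize, and provenance strings are assumptions. The authoritative definition is src/spokes/calculus/contracts/types.ts.

```typescript
// Hypothetical shape of the MetricEnvelope. Field types are assumptions,
// not copied from contracts/types.ts.
interface Provenance {
  source: string;     // where the raw value came from
  method: string;     // what computation produced it
  computedAt: string; // when (ISO-8601 timestamp assumed)
}

interface Enrichment {
  ci?: { lower: number; upper: number };
  zScore?: number;
  percentile?: number;
  changeRate?: number; // fraction, e.g. 0.033 means +3.3%
  effectSize?: number;
}

interface MetricEnvelope {
  metricKey: string;
  segmentId: string;
  period: string;
  value: number;
  sampleSize: number;
  provenance: Provenance;
  enrichment: Enrichment;
}

// Illustrative envelope mirroring the engagement example in this explainer.
// segmentId, period, sampleSize and the provenance strings are made up.
const exampleEnvelope: MetricEnvelope = {
  metricKey: "engagement_score",
  segmentId: "team-042",
  period: "2025-Q2",
  value: 78.5,
  sampleSize: 40,
  provenance: { source: "survey", method: "stats/enrich", computedAt: "2025-07-01T00:00:00Z" },
  enrichment: { percentile: 60, zScore: 0.3, changeRate: 0.033 },
};
```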

The job, in one line: it is the layer that turns "engagement is 78" into "engagement is 78, in the 60th percentile of comparable teams, up 3.3% from last quarter."

Visual — Tier B (typographic shift).

"engagement is 78"
   →  value: 78.5
      enrichment: { percentile: 60, zScore: 0.30, changeRate: +0.033 }
      provenance: { source, method, computedAt }

2. What problem does it solve — and why is it different?

The pain it removes: bare numbers in people analytics travel without their statistics. A dashboard shows "78" with no interval, no peer context, no direction — and a reader can't tell a meaningful move from sampling noise, or a thin-sample artifact from a real signal.

The difference, stated as a shift:

FROM a point number on a slide, with the uncertainty, the peer context, and the provenance all stripped off.
TO an envelope that carries its confidence interval, its z-score and percentile against a comparison set, its change rate versus the prior period, and a record of how it was computed — so the same object that displays the number also justifies it.

How it differs from the obvious substitutes (grounded in the spoke's own positioning notes, src/spokes/calculus/README.md):

vs. a stats package (R, scipy, statsmodels) — those are libraries you have to host and call from code. Calculus offers the same primitives as a network service an AI agent or another spoke can invoke without standing up a runtime.
vs. a BI semantic layer (Looker, Cube) — semantic layers serve pre-modeled metrics; they don't expose the math underneath. Calculus gives you the primitives to construct a new metric with proper statistical guarantees, not just read a pre-baked one.
vs. a notebook (Jupyter, Hex, Mode) — notebooks produce one-off, author-time analyses. Calculus produces durable, addressable envelopes that carry provenance, so every number has a source, a method, and a moment.

Visual — Tier B (FROM→TO block). The shift above is the visual; a rendered comparison panel is a follow-up.

3. How does it work?

Inputs → method → outputs, concretely, organized around the practitioner's questions.

"How sure am I about this number?" — POST /stats/enrich. You hand it a MetricEnvelope plus, optionally, a comparison distribution and a previous value. It returns the enrichment block:

a confidence interval — Wilson-score when you pass a proportion denominator, a t-interval for small samples (n < 30) and a normal interval otherwise (core/stats.ts, core/stats-enrich.ts);
a z-score and percentile rank of the value against the comparison distribution;
a change rate versus the previous value.
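The interval-selection rule in the list above can be sketched in TypeScript. This is an illustrative reconstruction, not core/stats.ts: the 95% level, the input shape (CiInput), and the function names are assumptions, and the t critical values are hard-coded standard table values for df = 1 to 29 rather than computed.

```typescript
// Two-sided 95% critical values. Z975 is the standard normal 0.975 quantile;
// T975[df - 1] is Student's t 0.975 quantile for df = 1..29.
const Z975 = 1.959964;
const T975: readonly number[] = [
  12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228,
  2.201, 2.179, 2.16, 2.145, 2.131, 2.12, 2.11, 2.101, 2.093, 2.086,
  2.08, 2.074, 2.069, 2.064, 2.06, 2.056, 2.052, 2.048, 2.045,
];

type CiMethod = "wilson" | "t" | "normal";
interface Interval { lower: number; upper: number; method: CiMethod }

// Hypothetical input shape: a proportion (with denominator) or a sample mean.
type CiInput =
  | { kind: "proportion"; successes: number; denominator: number }
  | { kind: "mean"; mean: number; sd: number; n: number };

// Wilson score interval for the proportion successes / n.
function wilsonInterval(successes: number, n: number): Interval {
  const p = successes / n;
  const z2 = Z975 * Z975;
  const denom = 1 + z2 / n;
  const center = (p + z2 / (2 * n)) / denom;
  const half = (Z975 / denom) * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n));
  return { lower: center - half, upper: center + half, method: "wilson" };
}

// Interval on a mean: t multiplier below n = 30, normal multiplier from n = 30 up.
function meanInterval(mean: number, sd: number, n: number): Interval {
  if (n < 2) throw new Error("need n >= 2 for a mean interval");
  const se = sd / Math.sqrt(n);
  const useT = n < 30;
  const crit = useT ? T975[n - 2] : Z975; // df = n - 1 lives at index n - 2
  return { lower: mean - crit * se, upper: mean + crit * se, method: useT ? "t" : "normal" };
}

// Dispatch: a proportion denominator selects Wilson; otherwise sample size decides.
function confidenceInterval(input: CiInput): Interval {
  return input.kind === "proportion"
    ? wilsonInterval(input.successes, input.denominator)
    : meanInterval(input.mean, input.sd, input.n);
}
```

With the worked-example peer summary (mean 76.30, sd 7.28, n = 30), meanInterval takes the normal branch and lands within a rounding step of 73.69 to 78.91; drop n to 10 and the same mean and sd switch to the wider t multiplier (2.262 instead of about 1.96).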

"Which way is it moving?" — POST /stats/trend. Given a periods array, it fits an OLS slope and classifies direction as rising / stable / falling using a proportional change threshold (default 2%) — proportional, not absolute, so the same threshold behaves sensibly across metrics of different magnitudes.
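One way to read "proportional, not absolute" is to divide the OLS slope by the series' mean level before comparing it with the threshold. The sketch below assumes exactly that rule and evenly spaced periods; the spoke's actual normalization may differ, and olsSlope / classifyTrend are hypothetical names.

```typescript
type Trend = "rising" | "stable" | "falling";

// OLS slope of ys against the period index 0..n-1 (assumes evenly spaced periods).
function olsSlope(ys: number[]): number {
  const n = ys.length;
  const xMean = (n - 1) / 2;
  const yMean = ys.reduce((s, v) => s + v, 0) / n;
  let num = 0;
  let den = 0;
  for (let i = 0; i < n; i++) {
    num += (i - xMean) * (ys[i] - yMean);
    den += (i - xMean) ** 2;
  }
  return num / den;
}

// Assumed proportional rule: slope per period divided by the series' mean level.
function classifyTrend(ys: number[], threshold = 0.02): Trend {
  if (ys.length < 2) return "stable";
  const level = Math.abs(ys.reduce((s, v) => s + v, 0) / ys.length);
  if (level === 0) return "stable"; // no scale to be proportional to
  const relSlope = olsSlope(ys) / level;
  if (relSlope > threshold) return "rising";
  if (relSlope < -threshold) return "falling";
  return "stable";
}
```

[1000, 1000.5, 1001, 1001.5] and [5, 5.5, 6, 6.5] share the same absolute slope of 0.5 per period, yet the first classifies as stable and the second as rising, which is the scale behavior the paragraph describes.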

"Where are the surprises and the gaps?" — POST /stats/anomaly (z-score, IQR, change-point flags), POST /stats/impute (forward-fill / linear-interpolate / flag-missing over irregular periods).
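Two of the anomaly-flag families named above can be sketched as follows. The quantile convention (linear interpolation, often called type 7), the 1.5 × IQR Tukey fences, and the z threshold are common defaults assumed here, not values read from the spoke; change-point detection and imputation are not covered.

```typescript
// Linear-interpolation (type-7) sample quantile of an ascending-sorted array.
function quantile(sorted: number[], p: number): number {
  const h = (sorted.length - 1) * p;
  const lo = Math.floor(h);
  const hi = Math.min(lo + 1, sorted.length - 1);
  return sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo]);
}

// Tukey fences: flag points outside [Q1 - k*IQR, Q3 + k*IQR].
function iqrFlags(values: number[], k = 1.5): boolean[] {
  const sorted = [...values].sort((a, b) => a - b);
  const q1 = quantile(sorted, 0.25);
  const q3 = quantile(sorted, 0.75);
  const iqr = q3 - q1;
  return values.map((v) => v < q1 - k * iqr || v > q3 + k * iqr);
}

// Flag points whose |z| against the series' own mean and sample sd exceeds the threshold.
function zFlags(values: number[], threshold = 3): boolean[] {
  const n = values.length;
  const mean = values.reduce((s, v) => s + v, 0) / n;
  const sd = Math.sqrt(values.reduce((s, v) => s + (v - mean) ** 2, 0) / (n - 1));
  if (sd === 0) return values.map(() => false); // constant series: nothing stands out
  return values.map((v) => Math.abs((v - mean) / sd) > threshold);
}
```

In a short series the largest attainable |z| is (n − 1)/√n (about 2.47 for n = 8), so a threshold of 3 can never fire there; the IQR rule has no such ceiling. On [10, 11, 12, 11, 10, 12, 11, 40], both iqrFlags and zFlags with threshold 2 flag only the final 40.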

"Score and rank a whole grid at once." — POST /factory/build. The bulk path: you supply a valuesProvider array over a metric × segment × period grid; it returns every enriched envelope plus a ranked list (strategies: impact, significance, change, recency, sample-size), each carrying a rank and a human-readable rankReason. Stateless by default; ?persist=true writes envelopes to the metric_envelopes table.
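The ranking step can be pictured as score, sort, and annotate. This sketch implements only two of the five named strategies (change and sample-size) with made-up scoring rules and rankReason wording; GridEnvelope, rankEnvelopes, and the reason format are hypothetical, and the real factory/build semantics are not reproduced.

```typescript
type RankStrategy = "change" | "sample-size";

// Minimal hypothetical slice of an enriched envelope, enough to rank on.
interface GridEnvelope {
  metricKey: string;
  segmentId: string;
  period: string;
  sampleSize: number;
  enrichment: { changeRate?: number };
}

interface RankedEnvelope extends GridEnvelope {
  rank: number;
  rankReason: string;
}

// Higher score ranks first; rank is 1-based; every entry gets a readable reason.
function rankEnvelopes(envs: GridEnvelope[], strategy: RankStrategy): RankedEnvelope[] {
  const score = (e: GridEnvelope): number =>
    strategy === "change" ? Math.abs(e.enrichment.changeRate ?? 0) : e.sampleSize;
  const reason = (e: GridEnvelope): string =>
    strategy === "change"
      ? `${e.metricKey} @ ${e.segmentId}: ${((e.enrichment.changeRate ?? 0) * 100).toFixed(1)}% vs prior period`
      : `${e.metricKey} @ ${e.segmentId}: n = ${e.sampleSize}`;
  return [...envs]
    .sort((a, b) => score(b) - score(a))
    .map((e, i) => ({ ...e, rank: i + 1, rankReason: reason(e) }));
}
```

Ranking on the absolute change lets a sharp drop outrank a modest rise, which is usually what a reader scanning a grid wants surfaced first.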

"Decompose, diagnose, simulate." — POST /oaxaca (Oaxaca–Blinder mean-gap decomposition, pooled / two-fold / three-fold), POST /diagnostics (OLS leverage, Cook's distance, DFFITS, residuals, plus optional residual-space DBSCAN), POST /monte-carlo/run and POST /regression-surrogate/{fit|forward|inverse} (the CompEngine path — Monte Carlo accumulation plus OLS/ridge surrogate models with analytic forward and inverse calculators), POST /distribution-fit and POST /importance-reconcile (the two PA Instruments).
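A stripped-down version of the surrogate idea shows why forward and inverse can both be analytic: once a line is fitted, forward evaluates it and inverse solves it for the input. This sketch assumes a single predictor and plain OLS (no ridge penalty, no Monte Carlo accumulation); fitSurrogate, surrogateForward, and surrogateInverse are hypothetical names, and the 95% band is a simplification that ignores coefficient uncertainty.

```typescript
// Single-predictor OLS surrogate: cost ≈ intercept + slope * input.
interface LinearSurrogate {
  intercept: number;
  slope: number;
  residualSd: number; // sqrt(SSE / (n - 2))
}

function fitSurrogate(xs: number[], ys: number[]): LinearSurrogate {
  const n = xs.length;
  if (n < 3 || ys.length !== n) throw new Error("need >= 3 paired points");
  const xMean = xs.reduce((s, v) => s + v, 0) / n;
  const yMean = ys.reduce((s, v) => s + v, 0) / n;
  let sxx = 0;
  let sxy = 0;
  for (let i = 0; i < n; i++) {
    sxx += (xs[i] - xMean) ** 2;
    sxy += (xs[i] - xMean) * (ys[i] - yMean);
  }
  const slope = sxy / sxx;
  const intercept = yMean - slope * xMean;
  let sse = 0;
  for (let i = 0; i < n; i++) {
    const e = ys[i] - (intercept + slope * xs[i]);
    sse += e * e;
  }
  return { intercept, slope, residualSd: Math.sqrt(sse / (n - 2)) };
}

// Forward: input -> cost, with a rough 95% band of +/- 1.96 residual sd
// (ignores coefficient uncertainty, so it is narrower than a true prediction interval).
function surrogateForward(m: LinearSurrogate, x: number) {
  const cost = m.intercept + m.slope * x;
  const half = 1.96 * m.residualSd;
  return { cost, lower: cost - half, upper: cost + half };
}

// Inverse: target cost -> required input, by solving the fitted line for x.
function surrogateInverse(m: LinearSurrogate, targetCost: number): number {
  if (m.slope === 0) throw new Error("flat surrogate cannot be inverted");
  return (targetCost - m.intercept) / m.slope;
}
```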

The science is named explicitly in the code, not hand-waved: the two-fold Oaxaca variant uses β_B (Blinder 1973, the modern pay-fairness-software convention), the pooled variant uses β_pooled (Neumark 1988), with Jann 2008 §3 cited for the comparison (core/oaxaca-blinder.ts, contract JSDoc). Confidence-interval method is chosen by sample size, not assumed.
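The two-fold split with β_B as the reference can be written in a few lines. This is a sketch of the textbook identity, not core/oaxaca-blinder.ts: it takes already-fitted per-group OLS coefficients and covariate means as inputs, and twoFoldBlinder and its parameter names are hypothetical. Because OLS with an intercept passes through the group means, explained + unexplained equals the raw mean gap x̄_A·β_A − x̄_B·β_B.

```typescript
const dot = (a: number[], b: number[]): number => a.reduce((s, ai, i) => s + ai * b[i], 0);

interface TwoFold { gap: number; explained: number; unexplained: number }

// xMeanA / xMeanB: covariate means per group, with a leading 1 for the intercept.
// betaA / betaB: per-group OLS coefficients in the same order.
// Two-fold split with group B's coefficients as the reference (beta* = beta_B):
//   explained   = (xMeanA - xMeanB) . betaB
//   unexplained = xMeanA . (betaA - betaB)
function twoFoldBlinder(xMeanA: number[], xMeanB: number[], betaA: number[], betaB: number[]): TwoFold {
  const explained = dot(xMeanA.map((x, i) => x - xMeanB[i]), betaB);
  const unexplained = dot(xMeanA, betaA.map((b, i) => b - betaB[i]));
  return { gap: explained + unexplained, explained, unexplained };
}
```

For example, with x̄_A = [1, 3], x̄_B = [1, 2], β_A = [1, 2], and β_B = [0.5, 1.5], the mean outcomes are 7 and 3.5; the 3.5 gap splits into 1.5 explained by the covariate difference and 2 unexplained.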

Differentiation beat: the practitioner's real question is rarely "what's the number" — it's "is this move real, and can I defend it?" Calculus answers that in the response itself: the interval widens when the sample is thin (t over z under n=30), the percentile situates the value against its peers, and the provenance records the method — so the number arrives already defensible.

Visual — Tier B (request → response step flow). MetricEnvelope (+ distribution, + previousValue) → /stats/enrich → enriched envelope { ci, zScore, percentile, changeRate }.

4. What does it enable?

Concrete uses a practitioner would recognize:

Put error bars on a metric — attach a confidence interval to an engagement or attrition number so a reader can tell signal from sampling noise, with the interval method chosen by sample size.
Situate a team against its peers — a z-score and percentile that say whether a segment's number is ordinary or notable within a comparison set.
Classify a trend honestly — call rising / stable / falling on a proportional threshold so small absolute wiggles on a big metric don't read as movement.
Score and rank a metric × segment grid in one call — the factory path returns ranked envelopes with a rankReason, the load-bearing input for an Insight Player or a portfolio dashboard.
Decompose a pay or outcome gap — Oaxaca–Blinder splits a mean difference between two cohorts into explained vs. unexplained parts, the backbone of a pay-fairness read.
Flag anomalies and influential points — z-score / IQR / change-point anomaly flags on a series, and regression diagnostics (leverage, Cook's distance) that surface the rows distorting a fit.
Simulate and invert a cost model — Monte Carlo accumulation plus an OLS surrogate you can run forward (inputs → cost with a 95% interval) or invert (target cost → required input).

Visual — Tier B (capability list). The list above is the visual; a rendered capability map is a follow-up.
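For a single predictor, leverage and Cook's distance have closed forms, which makes the influential-point diagnostics in the list above easy to sketch. This is an illustration under that one-predictor assumption, not the /diagnostics implementation (which handles general OLS and adds DFFITS and residual DBSCAN); simpleRegressionDiagnostics is a hypothetical name.

```typescript
interface Diagnostics { leverage: number[]; residual: number[]; cooksD: number[] }

// Simple regression y = a + b*x, so p = 2 fitted parameters.
function simpleRegressionDiagnostics(xs: number[], ys: number[]): Diagnostics {
  const n = xs.length;
  const p = 2;
  if (n <= p || ys.length !== n) throw new Error("need more paired points than parameters");
  const xMean = xs.reduce((s, v) => s + v, 0) / n;
  const yMean = ys.reduce((s, v) => s + v, 0) / n;
  let sxx = 0;
  let sxy = 0;
  for (let i = 0; i < n; i++) {
    sxx += (xs[i] - xMean) ** 2;
    sxy += (xs[i] - xMean) * (ys[i] - yMean);
  }
  const b = sxy / sxx;
  const a = yMean - b * xMean;
  const residual = xs.map((x, i) => ys[i] - (a + b * x));
  // Hat-matrix diagonal for simple regression: 1/n + (x - xMean)^2 / Sxx.
  const leverage = xs.map((x) => 1 / n + (x - xMean) ** 2 / sxx);
  // Residual variance with n - p degrees of freedom.
  const s2 = residual.reduce((s, e) => s + e * e, 0) / (n - p);
  // Cook's D_i = e_i^2 / (p * s^2) * h_i / (1 - h_i)^2.
  const cooksD = residual.map((e, i) => ((e * e) / (p * s2)) * (leverage[i] / (1 - leverage[i]) ** 2));
  return { leverage, residual, cooksD };
}
```

On xs = [1, 2, 3, 4, 10] and ys = [2, 4, 6, 8, 5], the point at x = 10 has leverage 0.92 and a Cook's distance of about 17, far above the rest: it sits far from the other x values and drags the fitted slope from 2 (the first four points alone) down to 0.2. The leverages always sum to p, here 2.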

5. How it fits in the toolbox

Calculus sits at the bottom of the dependency DAG, alongside data-anonymizer — a primitive other spokes build on rather than one that builds on others.

Data flow:

Consumes — nothing exotic: it takes already-resolved values. Callers assemble the metric × segment × period grid elsewhere (cohorts from segmentation-studio, signals from other spokes) and hand Calculus the raw values to enrich. It does not own HRIS ingestion or cohort resolution.
Owns + emits — the canonical MetricEnvelope contract and the enriched-envelope outputs. Consumers vendor src/spokes/calculus/contracts/types.ts.
Feeds — any surface that ranks or displays metrics. The README names the Insight Player and portfolio dashboards as the expected consumers of the factory/build ranked output, reading off enrichment.percentile, enrichment.effectSize, and enrichment.changeRate to order what a user sees.
Split-of-ownership (intentional) — Cronbach α stays in reincarnation (PAT-2 owns it); the Wilson-score interval is duplicated inline in preference-modeler today, with Calculus designated the canonical home so a future preference-modeler refactor can import it via the contract layer. The empirical-importance regression that feeds the importance-reconcile empirical side lives in factor-models; cross-rater alignment lives in performance-validity — Calculus does the reconciliation/combine only.

Visual — Tier B (typographic data-flow). segmentation-studio cohorts + spoke signals → (raw values) → Calculus → enriched MetricEnvelopes → { Insight Player · dashboards · pay-fairness reads }.

6. Commercialization / packaging

Calculus is a service component, not a standalone product — it is the statistical layer the buyer-facing surfaces and PA Products compose, and two of its endpoints (distribution-fit, importance-reconcile) are catalogued PA Instruments (composable primitives; catalog docs/primitives/00-CATALOG.md) that meals like the AnyComp decision layer consume.

Access posture (grounded in the toolbox auth rules): GET endpoints are public; the POST write/compute routes require a service key (TOOLBOX_SERVICE_KEY over HTTP, per-consumer keys over MCP). It is consumed by other spokes and apps, not sold on its own.
Data-license posture: Calculus computes on values its callers supply — it carries no licensed third-party dataset of its own, so it imposes no additional data-license constraint beyond whatever the input data already carries.
Anything about pricing tiers or packaged offerings is (TBD) — not earned yet, so not stated.
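Because only the POST compute routes need the service key, a consumer's call is mostly about assembling an authenticated request. The helper below is hypothetical throughout: the /api/spokes/calculus/stats/enrich path is inferred from the health route in section 8, and the x-service-key header and the envelope / comparison / previousValue body fields are placeholders, not the spoke's documented contract.

```typescript
interface EnrichRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical request builder for POST stats/enrich.
// Path prefix, header name and body field names are all assumptions.
function buildEnrichRequest(
  baseUrl: string,
  serviceKey: string,
  envelope: unknown,
  comparison: number[],
  previousValue?: number,
): EnrichRequest {
  return {
    url: `${baseUrl.replace(/\/$/, "")}/api/spokes/calculus/stats/enrich`,
    method: "POST",
    headers: { "content-type": "application/json", "x-service-key": serviceKey },
    body: JSON.stringify({ envelope, comparison, previousValue }),
  };
}
```

The returned object can be passed to fetch(r.url, { method: r.method, headers: r.headers, body: r.body }); keeping request assembly separate from the network call makes it testable without a live server.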

Visual — (TBD — product-tier placement diagram showing Calculus under the PA Products that compose it).

7. The vision

The numerical brain that every metric in the portfolio passes through — so that no number is ever shown without the statistics that say how much to trust it.

The direction is convergence rather than expansion: the canonical algorithms (the Wilson interval today duplicated in preference-modeler; the CI methods) settle here so the rest of the portfolio imports one trusted copy, and the MetricEnvelope becomes the shared shape every spoke emits and the Insight Player ranks against. The README's own follow-ups point the way: the empirical-importance regression that completes importance-reconcile (PAT-PMI5) and broader CompEngine wiring behind the Monte Carlo path (PAT-147-C-FU-A). Small-sample honesty has already shipped: t-distribution intervals replace normal intervals below n = 30.

8. Current status

status: "live" in src/lib/contracts/registry.ts, contract 1.13.0 (src/spokes/calculus/contracts/types.ts + CHANGELOG).

Shipped: the MetricEnvelope contract; pure-function stats core (core/stats.ts) with Wilson / normal / t-interval CIs, z-score, percentile, OLS slope, proportional-threshold trend classification; live routes stats/enrich, stats/trend, stats/impute, stats/anomaly, factory/build (with ?persist=true), governance/check, oaxaca, diagnostics, monte-carlo/run, regression-surrogate/{fit|forward|inverse} + GET [modelId], distribution-fit, importance-reconcile, metric-keys/unknown, and health. Oaxaca–Blinder (pooled/two-fold/three-fold), regression diagnostics, residual DBSCAN, Monte Carlo + OLS/ridge surrogates with forward/inverse calculators. MCP tools registered mirroring the routes. Demo seeds: drizzle/0009_pat6_seed.sql (the engagement_score envelope) and the CompEngine synthetic seed script.
In flight / planned: the empirical-importance regression feeding importance-reconcile (PAT-PMI5, lives in factor-models); CompEngine generator wiring behind monte-carlo/run (PAT-147-C-FU-A); larger Monte Carlo runs backed by blob storage (PAT-147-C-FU-B).

Visual — Tier A (live capture). GET /api/spokes/calculus/health reports the real shipped status, contract version, and schema reachability at request time.

Worked example — `stats/enrich` end to end

A practitioner has an engagement score for one team and wants to know whether it's notable and which way it's moving. The focal value (78.5) and the comparison context match the shipped seed envelope (drizzle/0009_pat6_seed.sql: engagement_score, value 78.5). The 30-team comparison distribution is an illustrative peer set, summarized below by its mean and standard deviation; every statistic was computed by re-running the spoke's own pure functions from core/stats.ts (mean, stdev, zScore, percentileRank, normalInterval), and no figure is hand-fabricated.

Input (illustrative): focal value = 78.5, previousValue = 76.0, and a 30-team comparison distribution with mean = 76.30, stdev = 7.28.

Computed output (real output of the stats functions):

z-score of 78.5 vs. peers = 0.30 — about a third of a standard deviation above the peer mean; ordinary, not exceptional.
percentile rank = 60 — this team sits at the 60th percentile of the comparison set.
change rate vs. last quarter = +3.29% ((78.5 − 76.0) / 76.0), an upward move larger than the 2% default threshold that stats/trend uses to call direction.
95% confidence interval on the peer mean (normalInterval, n = 30) = 73.69 to 78.91, the uncertainty band on the comparison mean against which the focal value is judged; 78.5 falls inside it, near the upper edge.
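The three summary-statistic figures in this list can be re-derived with a few lines of TypeScript. These functions mirror the names listed for core/stats.ts but have assumed signatures and are not copied from it; the percentile-rank convention (ties count half) is also an assumption. Because the 30 peer values themselves are not listed, percentileRank is exercised on toy data only and the 60th-percentile figure is not reproduced.

```typescript
const zScore = (value: number, mean: number, sd: number): number => (value - mean) / sd;
const changeRate = (current: number, previous: number): number => (current - previous) / previous;

// One common percentile-rank convention: share of peers strictly below, ties count half.
function percentileRank(value: number, peers: number[]): number {
  const below = peers.filter((p) => p < value).length;
  const ties = peers.filter((p) => p === value).length;
  return (100 * (below + 0.5 * ties)) / peers.length;
}

// Normal 95% interval on a mean from summary statistics.
function normalInterval(mean: number, sd: number, n: number, z = 1.96): [number, number] {
  const half = (z * sd) / Math.sqrt(n);
  return [mean - half, mean + half];
}

// Worked-example inputs: focal 78.5, previous 76.0, peer mean 76.30, sd 7.28, n 30.
const focalZ = zScore(78.5, 76.3, 7.28);
const focalChange = changeRate(78.5, 76.0);
const [peerLo, peerHi] = normalInterval(76.3, 7.28, 30);

console.log(focalZ.toFixed(2));                    // 0.30
console.log((focalChange * 100).toFixed(2));       // 3.29
console.log(peerLo.toFixed(2), peerHi.toFixed(2)); // 73.69 78.91
```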

What the practitioner does with it: rather than report a bare "78.5, up from 76," they report "78.5, up 3.3% quarter-over-quarter, 60th percentile among comparable teams, about a third of a standard deviation above the peer mean." The move clears the 2% threshold and the standing is middling (60th percentile, z ≈ 0.3), a conclusion the bare number could not support.

Every statistic above is a real return value of the spoke's core/stats.ts functions on the stated inputs; the peer distribution is labelled illustrative, the focal value matches the in-repo seed, and nothing is invented.