Factor Models — plain-language explainer

Factor Models turns a pile of HR metrics into a small, named set of latent drivers — and refuses to put any version of that model into production until it has passed a holdout test.

A People Analytics Toolbox component. Built to the portfolio Explainer Standard v1.0. Every claim below is grounded in the spoke's own code and contracts (src/spokes/factor-models/, contract 0.3.0); anything not yet built is marked (TBD).

1. What is it?

Factor Models owns versioned latent-factor models: small models that say "these many raw indicators roll up into these few underlying drivers, with these weights." Its job is the full lifecycle of such a model as a managed asset: build → validate → publish → score.

The default backbone is the Capability / Alignment / Motivation three-factor framing (you can add tenant-specific factors), aimed at outcome axes the model is meant to predict — Attraction / Activation / Attrition. A model is a first-class, hot-swappable object: consumers pin stable meaning to a (modelId, factor, percentile) triple while the trained weights underneath evolve under new, immutable versionIds.

Visual — Tier B (lifecycle step flow). The five steps a model moves through:

create (model header) → register version (weights + CI) → validate (holdout gate) → publish (advance currentVersionId) → score (subjects against the published version)

Only a version with a passing validation report can be published; currentVersionId advances on that event and nothing else.

2. What problem does it solve — and why is it different?

The pain it removes: in most shops a "model" of engagement or attrition lives in a spreadsheet or a notebook, with weights nobody can trace, no version history, and no gate between "I fit something" and "it's live." When the weights change, every downstream consumer silently shifts under them.

The difference, stated as a shift:

FROM a single mutable spreadsheet model whose coefficients change without a record, validated (if at all) on the same data it was fit on.
TO an append-only chain of immutable versions, each fingerprinted by a versionHash over its weight payload and a trainingDatasetHash over the cohort it was trained on. Promotion to live requires a separately supplied holdout, and the published pointer moves only on a pass.

How it differs from the obvious substitutes:

vs. fitting in a notebook — the spoke makes the validation gate non-optional: publish returns a 409 unless a passing validation_reports row exists for that version. You cannot promote an unvalidated model.
vs. generic BI — BI tools show you metrics; they do not hold a theory (which indicators load on which latent factor), version that theory, or stop you from shipping a model that failed on holdout data.
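The publish guard described above can be sketched in a few lines. This is a minimal illustration, not the spoke's route handler: the ValidationReport, ModelHeader, and PublishResult shapes and the publishVersion function are hypothetical names, and the real check reads validation_reports rows from Postgres rather than an in-memory array.

```typescript
// Hypothetical shapes for illustration; the real contract types live in
// src/spokes/factor-models/contracts/types.ts and may differ.
interface ValidationReport {
  versionId: string;
  passed: boolean;
}

interface ModelHeader {
  modelId: string;
  currentVersionId: string | null;
}

interface PublishResult {
  status: 200 | 409;
  model: ModelHeader;
}

// Advance currentVersionId only when a passing report exists for versionId.
function publishVersion(
  model: ModelHeader,
  versionId: string,
  reports: ValidationReport[],
): PublishResult {
  const hasPassingReport = reports.some(
    (r) => r.versionId === versionId && r.passed,
  );
  if (!hasPassingReport) {
    // Refuse with 409: the published pointer stays where it was.
    return { status: 409, model };
  }
  // Return a new header rather than mutating the input.
  return { status: 200, model: { ...model, currentVersionId: versionId } };
}
```

Returning the unchanged header on refusal makes the "pointer moves only on a pass" rule easy to assert in a test.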

Visual — Tier B (FROM→TO shift block). The block above is the visual; a rendered comparison component is a follow-up (FU-A).

3. How does it work?

A practitioner meets it as a sequence of questions, each answered by a real endpoint.

"What am I modeling?" — POST /models registers a model header: a name, the latentFactors (CAM by default, extendable), and the outcome it targets. No weights yet.
"What are the weights?" — POST /models/[id]/versions appends an immutable version. Each weight edge is latent factor ← indicator metric, carrying a point weight and a confidence interval ci: [low, high]. Indicator IDs are calculus-aligned metric keys; the version is stamped with trainedAt, a versionHash, and a trainingDatasetHash.
"Does it actually predict?" — POST /models/[id]/versions/[versionId]/validate takes holdout pairs { y, yHat } (minimum 3) plus optional per-segment fairness audit rows, and computes a HoldoutFitSummary: Pearson correlation, mean absolute error mae, and OLS r2 (see core/validation.ts). The gate (validationGatePassed) requires finite metrics, correlation within [-1, 1], non-negative R², and every segment fitDelta within a default ceiling of 0.5.
"Make it live." — POST /models/[id]/publish advances currentVersionId to the validated version — and refuses (409) if no passing report exists.
"Score a person." — POST /scores/[modelId]/[subjectId] runs the published weights against a subject's indicators. The score for a factor is a linear combination Σ weight × indicator (missing indicators default to 0), mapped to a 0–100 percentile by a deterministic monotone (tanh(raw)+1)/2 × 100 (core/scoring.ts). Repeat scores are memoized per (versionId × subject × factor), so the same inputs return the identical number.

Data sources and science. Inputs are indicator metrics from calculus, construct-level latents from reincarnation where the psychometric backbone applies, and preference-derived signals from preference-modeler. The theory donor is the organizational-behavior (OB) and turnover literature, which supplies the CAM backbone and the Attraction / Activation / Attrition outcomes as framing only, with no code lift. Training cohorts and fairness-dimension keys come from segmentation-studio; audit rows are consumer-supplied, and the spoke does not read another spoke at score time.
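The holdout gate from the validate step can be sketched as below. Treat it as an approximation under stated assumptions rather than the contents of core/validation.ts: summarizeHoldout and the type shapes are hypothetical, R² is assumed to be 1 − SS_res / SS_tot (which is what makes a non-negativity check meaningful), and the fairness ceiling is assumed to apply to the absolute value of fitDelta. Only the name validationGatePassed, the three metrics, the minimum of 3 pairs, and the 0.5 default come from the text above.

```typescript
interface HoldoutPair { y: number; yHat: number; }
interface SegmentAudit { segment: string; fitDelta: number; }
interface HoldoutFitSummary { correlation: number; mae: number; r2: number; }

const mean = (xs: number[]): number =>
  xs.reduce((a, b) => a + b, 0) / xs.length;

// Hypothetical helper: Pearson correlation, MAE, and R² over holdout pairs.
function summarizeHoldout(pairs: HoldoutPair[]): HoldoutFitSummary {
  if (pairs.length < 3) throw new Error("need at least 3 holdout pairs");
  const my = mean(pairs.map((p) => p.y));
  const mh = mean(pairs.map((p) => p.yHat));
  let sxy = 0, syy = 0, shh = 0, ssRes = 0, absErr = 0;
  for (const { y, yHat } of pairs) {
    sxy += (y - my) * (yHat - mh);
    syy += (y - my) ** 2;
    shh += (yHat - mh) ** 2;
    ssRes += (y - yHat) ** 2;
    absErr += Math.abs(y - yHat);
  }
  return {
    correlation: sxy / Math.sqrt(syy * shh), // NaN when either side is constant
    mae: absErr / pairs.length,
    r2: 1 - ssRes / syy, // assumption: 1 - SS_res / SS_tot
  };
}

// Gate: finite metrics, correlation in [-1, 1], R² >= 0, segments within ceiling.
function validationGatePassed(
  fit: HoldoutFitSummary,
  audits: SegmentAudit[],
  ceiling = 0.5,
): boolean {
  const finite = [fit.correlation, fit.mae, fit.r2].every((v) => Number.isFinite(v));
  return (
    finite &&
    fit.correlation >= -1 &&
    fit.correlation <= 1 &&
    fit.r2 >= 0 &&
    audits.every((a) => Math.abs(a.fitDelta) <= ceiling) // assumption: absolute delta
  );
}
```

A constant y or yHat column yields a NaN correlation, so the finiteness check alone rejects degenerate holdouts in this sketch.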

Differentiation beat: the percentile a consumer reads is stable in meaning even as the science improves underneath it. A recruiter sees "Capability, 76th percentile"; the weights producing that can be re-fit and re-published without breaking the contract, because consumers bind to (modelId, factor, percentile), not to a coefficient table.

Visual — Tier A (real computed score). Using the demo seed's published Attraction CAM model (seeds/2026-05-22-factor-models-demo.sql), the Capability factor has two weights — demo.ind.tenure at 0.12 and demo.ind.velocity at 0.35. Scoring a subject with { tenure: 2, velocity: 1 }:

raw = 0.12 × 2 + 0.35 × 1 = 0.59
percentile = ((tanh(0.59) + 1) / 2) × 100 = 76.4947803764

(Real output of linearFactorScore + rawScoreToPercentile against the in-repo demo seed weights — no invented numbers.)
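The arithmetic above can be reproduced with a short sketch. The function names linearFactorScore and rawScoreToPercentile come from the text, but the signatures, the FactorWeight shape, and the use of full indicator IDs as lookup keys are assumptions made for illustration; the actual code in core/scoring.ts may be organized differently.

```typescript
// Assumed shape: one weight edge from an indicator to a latent factor.
interface FactorWeight {
  indicatorId: string;
  weight: number;
}

// Linear combination of weight × indicator; missing indicators count as 0.
function linearFactorScore(
  weights: FactorWeight[],
  indicators: Record<string, number | undefined>,
): number {
  return weights.reduce(
    (sum, w) => sum + w.weight * (indicators[w.indicatorId] ?? 0),
    0,
  );
}

// Deterministic monotone map from a raw score to the 0-100 range.
function rawScoreToPercentile(raw: number): number {
  return ((Math.tanh(raw) + 1) / 2) * 100;
}

// Capability weights as quoted from the demo seed in the text above.
const capabilityWeights: FactorWeight[] = [
  { indicatorId: "demo.ind.tenure", weight: 0.12 },
  { indicatorId: "demo.ind.velocity", weight: 0.35 },
];

const capabilityRaw = linearFactorScore(capabilityWeights, {
  "demo.ind.tenure": 2,
  "demo.ind.velocity": 1,
});
const capabilityPercentile = rawScoreToPercentile(capabilityRaw);
console.log(capabilityRaw.toFixed(2), capabilityPercentile.toFixed(2));
```

Because tanh is strictly increasing, a higher raw score always maps to a higher percentile, and a raw score of 0 lands at exactly 50.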

4. What does it enable?

Concrete uses a practitioner would recognize:

Roll many metrics into a few defensible drivers — collapse a wall of HR indicators into Capability / Alignment / Motivation scores leadership can actually reason about.
Govern model promotion — keep a validated, published model live for downstream consumers while a new candidate version is built and tested in parallel, with no silent swap.
Hot-swap the science without breaking consumers — re-fit weights as more data arrives; consumers keep reading the same (factor, percentile) shape.
Audit a score after the fact — every persisted score names the exact versionId, and every version carries a trainingDatasetHash, so "why was this person's score what it was last quarter?" is answerable.
Run a fairness check before going live — supply per-segment fit deltas at validation time; the gate blocks publish when a segment's fit drifts beyond the ceiling.
Generate an analytics plan — the Analytics-Plan Generator (below) turns executive priorities into a ranked list of which models matter and which measurements are worth running.

Visual — Tier A (real Analytics-Plan output). POST /api/spokes/factor-models/analytics-plan ranks candidate models by priority-relevance and ranks measures by net value-of-information (expectedNetVoi = evpi × reduction − cost). From the spoke's own test fixture:

{
  "rankedModels": [{ "modelId": "aligned", "name": "Attrition CAM", "relevanceScore": 0.7 }],
  "measurementPlan": [
    { "measureId": "hi",  "expectedNetVoi": 16, "priority": "high" },
    { "measureId": "mid", "expectedNetVoi": 8,  "priority": "high" },
    { "measureId": "lo",  "expectedNetVoi": 1,  "priority": "medium" }
  ]
}

(Values are the asserted outputs of generateAnalyticsPlan in tests/analytics-plan.test.ts: priority retention normalized to 0.7; measure hi = 20×0.9−2 = 16.)
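The net value-of-information formula can be sketched as a ranking function. Only the formula expectedNetVoi = evpi × reduction − cost and the inputs for measure hi (20, 0.9, 2) come from the text; the MeasureInput shape, the rankMeasures name, and the inputs used for mid and lo in the usage note are hypothetical, and the priority labels are left out because their thresholds are not stated here.

```typescript
// Assumed input shape for one candidate measure.
interface MeasureInput {
  measureId: string;
  evpi: number;      // expected value of perfect information
  reduction: number; // share of uncertainty the measure is expected to remove
  cost: number;      // cost of running the measure, in the same units as evpi
}

interface PlannedMeasure {
  measureId: string;
  expectedNetVoi: number;
}

// Compute net VOI per measure and rank from most to least valuable.
function rankMeasures(measures: MeasureInput[]): PlannedMeasure[] {
  return measures
    .map((m) => ({
      measureId: m.measureId,
      expectedNetVoi: m.evpi * m.reduction - m.cost,
    }))
    .sort((a, b) => b.expectedNetVoi - a.expectedNetVoi);
}
```

For example, passing hi as { evpi: 20, reduction: 0.9, cost: 2 } alongside hypothetical inputs such as mid { evpi: 10, reduction: 0.9, cost: 1 } and lo { evpi: 4, reduction: 0.5, cost: 1 } yields the order hi, mid, lo with net values of about 16, 8, and 1.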

5. How it fits in the toolbox

Data flow:

Consumes — indicator metric series from calculus; latent-construct measurement from reincarnation where psychometric linkage matters; preference-derived signals from preference-modeler; training-cohort and fairness-dimension keys from segmentation-studio. It imports only those spokes' contracts/types where IDs must align — never their internals; fairness/fit deltas are computed outside this spoke and handed in.
Emits — the model lifecycle contract: FactorModel, immutable FactorModelVersion + FactorWeight edges, FactorScore, and ModelValidationReport. Consumers (Performix — in progress; vela — planned) vendor a copy of src/spokes/factor-models/contracts/types.ts.
Composes into a Product — the Analytics-Plan Generator (PAT-PMI1) is a stateless composite that consumes value-of-information inputs computed upstream in forecasting (EVPI / EVSI proxies) and turns priorities into a measurement plan. It is one of the toolbox's "meals" assembled from instrument primitives.
Future corroboration — program-evaluation for causal backing before high-stakes promotion (planned, PAT-D3).

Visual — Tier B (typographic data-flow). calculus indicators + reincarnation latents + preference-modeler signals → Factor Models (build·validate·publish·score) → { FactorScore for consumers · Analytics-Plan (with forecasting VOI) }, with segmentation-studio supplying cohorts + fairness keys.

6. Commercialization / packaging

Factor Models is a service component, not a standalone product — it is the modeling-and-governance layer a People Analytics offering composes, sitting behind consumer surfaces rather than being sold on its own.

Data-license posture: the spoke holds no licensed third-party content. It operates on tenant-supplied indicators and consumer-supplied holdout/fairness data; the model weights and versions are the tenant's own. There is no vendor-survey constraint at this layer.
Anything about pricing tiers or packaged offerings is (TBD) — not earned yet, so not stated.

Visual — (TBD — product-tier placement diagram).

7. The vision

A factory for trustworthy people-analytics models, where every model carries its theory, its version history, and its passing grade, and the science underneath can keep improving without ever breaking the consumers reading it.

The near-term direction the code already points at: cohort-relative percentiles via calculus batch paths (today's percentile is a deterministic monotone mapping of the raw score, not yet a cohort rank — flagged in core/scoring.ts); richer holdout diagnostics (the contract already reserves auc and brier for discriminative models); and causal corroboration from program-evaluation before high-stakes promotion. The Analytics-Plan Generator extends the spoke from scoring into deciding what to measure next.

8. Current status

Grounded in the real code state (contract 0.3.0, src/spokes/factor-models/, status live in src/lib/contracts/registry.ts):

Shipped: Postgres schema factor_models (models, model_versions, factor_weights, factor_scores, validation_reports); the full build→validate→publish→score lifecycle with the deterministic scoring core and the holdout validation gate; the publish-only-on-pass guard (409 otherwise); memoized deterministic re-scores; the stateless Analytics-Plan Generator (PAT-PMI1). Live routes: GET /health, POST /models, GET /models/[id], POST /models/[id]/versions, POST /models/[id]/versions/[versionId]/validate, POST /models/[id]/publish, POST /scores/[modelId]/[subjectId], POST /analytics-plan. MCP tools registered: factor-models.models.create, factor-models.scores.compute, factor-models.validate, factor-models.analytics-plan. Demo tenant seed shipped.
In flight / planned: cohort-relative percentiles via calculus batch paths; discriminative holdout diagnostics (auc / brier) wired through validation; causal corroboration via program-evaluation (PAT-D3); Performix consumption (in progress), vela (planned).

Visual — Tier A (live capture). GET /api/spokes/factor-models/health reports { spoke, status, contractVersion: "0.3.0", schemaReachable, latencyMs, checkedAt } at request time.

The worked example above is real: the demo seed's published Attraction CAM model has Capability weights demo.ind.tenure = 0.12 and demo.ind.velocity = 0.35; scoring a subject { tenure: 2, velocity: 1 } through the spoke's own linearFactorScore + rawScoreToPercentile yields raw 0.59 → percentile 76.4947803764. The Analytics-Plan figures are the asserted outputs of generateAnalyticsPlan in the spoke's test suite. No figure here is invented.