Manager Effectiveness — plain-language explainer

Manager Effectiveness turns the scattered evidence about a manager — survey ratings, team outcomes, calibration behavior, retention, forecast quality — into one defensible 0–100 score, with every contributing piece visible underneath it.

A People Analytics Toolbox component. Built to the portfolio Explainer Standard v1.0. Every claim below is grounded in the spoke's own code and contracts (src/spokes/manager-effectiveness/, contract 1.2.0); anything not yet built is marked (TBD).


1. What is it?

Manager Effectiveness produces the Manager Effectiveness Index (MEI) — a hierarchical composite that scores a leader on a 0–100 ladder by blending nine evidence domains, each of which expands to a stable list of named measures. It also ships a second, lighter composite — the Leadership Index — that scores a leader purely on epistemic quality: how well-calibrated their forecasts are and how aligned their self-view is with how others see them (up, down, and lateral).

The job it does: take the evidence about a manager that today lives in five different systems and reduce it to a single number a calibration committee can argue about — while keeping the full ladder underneath, so the number is never a black box.

Visual — Tier B (the nine MEI pillars, with default weights). The composite is a weighted blend over these nine domains (default shares from core/compute.ts, mirroring the FiveTran MQI mix with a residual split across ancillary pillars):

  • Upward feedback — 0.40
  • Decision discipline (calibration) — 0.20
  • Operational health — 0.20
  • People development — 0.10
  • Team output — 0.04
  • Talent stewardship — 0.02
  • Equity & fairness — 0.02
  • Self-awareness & consistency — 0.02
  • Span / load fit — 0.02

(Source: DEFAULT_DOMAIN_WEIGHTS in src/spokes/manager-effectiveness/core/compute.ts. Weights are per-tenant hot-swappable and validated to sum to 1 ± 0.02; as listed, the nine defaults sum to 1.02, the upper edge of that tolerance.)
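The sum-to-1 check described above can be sketched as follows. This is a minimal illustration, not the spoke's code: the domain keys, the function names, the non-negativity rule, and the floating-point epsilon are all assumptions; only the nine share values and the ± 0.02 tolerance come from this explainer. The listed defaults add up to 1.02, which sits on the edge of the tolerance, so the sketch allows a tiny epsilon for floating-point rounding.

```typescript
// Hypothetical sketch: keys and function names are illustrative only.
// The share values are the defaults listed in this section.
type DomainWeights = Record<string, number>;

const defaultDomainWeights: DomainWeights = {
  upwardFeedback: 0.4,
  decisionDiscipline: 0.2,
  operationalHealth: 0.2,
  peopleDevelopment: 0.1,
  teamOutput: 0.04,
  talentStewardship: 0.02,
  equityFairness: 0.02,
  selfAwareness: 0.02,
  spanLoadFit: 0.02,
};

const SUM_TOLERANCE = 0.02; // from the explainer: sum must be 1 ± 0.02
const FLOAT_EPSILON = 1e-9; // assumption: slack for floating-point rounding

function sumWeights(weights: DomainWeights): number {
  let total = 0;
  for (const key of Object.keys(weights)) {
    total += weights[key];
  }
  return total;
}

function validateWeights(weights: DomainWeights): { ok: boolean; sum: number } {
  const keys = Object.keys(weights);
  const sum = sumWeights(weights);
  // Assumption: negative shares are rejected; the explainer does not say.
  const allNonNegative = keys.every((k) => weights[k] >= 0);
  const withinTolerance = Math.abs(sum - 1) <= SUM_TOLERANCE + FLOAT_EPSILON;
  return { ok: allNonNegative && withinTolerance, sum };
}
```

A tenant profile whose shares sum to, say, 0.9 would fail this check, while the defaults pass at the boundary.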

2. What problem does it solve — and why is it different?

The pain it removes: "is this a good manager?" is a question almost every organization answers with one survey question and a gut call, because the real evidence is scattered — engagement-survey ratings in one tool, team goal attainment in another, calibration behavior in the talent system, retention in the HRIS, the manager's own past-cycle forecasts nowhere at all.

The difference, stated as a shift:

  • FROM a single upward-feedback average standing in for "manager quality," with no way to see what's underneath or to retune it.
  • TO a nine-domain composite where every domain expands to named measures, the weights are explicit and per-tenant editable, and — when validity evidence exists — the weights can be recalibrated to what actually predicts retention, promotion, and next-cycle performance.

How it differs from the obvious substitutes:

  • vs. a single engagement-survey score — upward feedback is one of nine domains here (the heaviest, but only 0.40), so a manager who scores well on the survey but bleeds top talent or never differentiates in calibration doesn't get to hide.
  • vs. a hand-built spreadsheet composite — the spreadsheet's weights are arbitrary and frozen. This spoke's weights start from a defensible prior (the FiveTran-aligned mix) and can be convex-blended toward empirically derived shares as predictive evidence accumulates. The blend is auditable: every recalibration appends an audit row.
  • vs. generic BI — BI shows you the nine numbers side by side; it does not give you one defensible composite, the band, the narration, or the validity bridge that retunes the blend.

Visual — Tier B (FROM→TO shift). The shift above is the visual; a rendered comparison block is a follow-up (FU-A).

3. How does it work?

The practitioner's question is "where does this score come from, and can I defend each piece?" — so the method is built to be traceable end to end.

Inputs. A CompositeComputeRequest: a leaderId, tenantId, cycleId, an optional map of measureValues keyed by stable measure IDs (pm_survey.overall_mgr_rating, ops.op_metric_attainment, talent.regret_exit_rate, calibration.harshness_index, …), and an optional flightRiskPenalty.

Method — hierarchical weighted blend.

  1. Each of the nine domains expands to its enumerated measureIds (the canonical ladder in core/compute.ts).
  2. Each measure is standardized around a baseline and scored to 0–100; a domain's score is the mean of its contributing measures.
  3. The composite is the weight-by-domain sum of those domain scores, minus the flight-risk penalty, clamped to 0–100.
  4. A band is assigned (high ≥ 71, mid ≥ 46, else low) and a narration string is composed via the rating-codec narrateRSquared narrator.
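Steps 2 through 4 can be sketched as below. This is a simplified illustration under stated assumptions: measure scores are taken as already standardized to 0–100 (the baseline standardization itself is not reproduced), a domain without measures is treated as an error, and all type and function names are hypothetical. The band cut-offs (71 and 46), the mean-per-domain rule, the penalty subtraction, and the 0–100 clamp follow the steps above.

```typescript
// Hypothetical sketch of the hierarchical blend; not the spoke's compute.ts.
type Band = "high" | "mid" | "low";

interface DomainInput {
  domainId: string;
  weight: number;
  measureScores: number[]; // assumed already standardized to 0–100
}

interface CompositeSketch {
  compositeScore: number;
  band: Band;
  domainScores: { domainId: string; weight: number; score: number }[];
}

const clamp = (x: number, lo: number, hi: number): number =>
  Math.min(hi, Math.max(lo, x));

const mean = (xs: number[]): number =>
  xs.reduce((acc, x) => acc + x, 0) / xs.length;

function bandFor(score: number): Band {
  if (score >= 71) return "high";
  if (score >= 46) return "mid";
  return "low";
}

function computeCompositeSketch(
  domains: DomainInput[],
  flightRiskPenalty: number = 0
): CompositeSketch {
  const domainScores = domains.map((d) => {
    // Assumption: an empty domain is an error rather than a silent zero.
    if (d.measureScores.length === 0) {
      throw new Error("no measures supplied for domain " + d.domainId);
    }
    return { domainId: d.domainId, weight: d.weight, score: mean(d.measureScores) };
  });
  // Weight-by-domain sum, minus the penalty, clamped to 0–100.
  const blended = domainScores.reduce((acc, d) => acc + d.weight * d.score, 0);
  const compositeScore = clamp(blended - flightRiskPenalty, 0, 100);
  return { compositeScore, band: bandFor(compositeScore), domainScores };
}
```

Keeping the per-domain scores in the return value mirrors the point of the method: the headline number stays traceable to its parts.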

Outputs. A MeiCompositeScore: the compositeScore (0–100), the band, the full domainScores[] array (each with its weight and its contributingMeasures), the flightRiskPenalty, a computedAt timestamp, and a plain-language narration.
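For orientation, the request and response shapes described above might look roughly like this in TypeScript. The field names listed in the text (leaderId, measureValues, compositeScore, domainScores, contributingMeasures, and so on) come from this explainer; every field type, plus the domainId and score field names and the weakestDomain helper, is an assumption. The authoritative definitions are in src/spokes/manager-effectiveness/contracts/types.ts.

```typescript
// Hypothetical shapes: names follow the explainer, types are assumed.
type Band = "high" | "mid" | "low";

interface CompositeComputeRequest {
  leaderId: string;
  tenantId: string;
  cycleId: string;
  measureValues?: Record<string, number>; // keyed by stable measure IDs
  flightRiskPenalty?: number;
}

interface DomainScore {
  domainId: string; // hypothetical field name
  score: number; // hypothetical field name, 0–100
  weight: number;
  contributingMeasures: string[]; // assumed to hold measure IDs
}

interface MeiCompositeScore {
  compositeScore: number; // 0–100
  band: Band;
  domainScores: DomainScore[];
  flightRiskPenalty: number;
  computedAt: string; // assumed ISO-8601 timestamp
  narration: string;
}

// Illustrative drill-down: find the lowest-scoring domain under the headline.
function weakestDomain(result: MeiCompositeScore): DomainScore | undefined {
  let weakest: DomainScore | undefined;
  for (const d of result.domainScores) {
    if (weakest === undefined || d.score < weakest.score) {
      weakest = d;
    }
  }
  return weakest;
}
```

A helper like weakestDomain is the kind of read a calibration committee would take from domainScores[] instead of arguing about the composite alone.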

The Leadership Index path (additive, stateless). A second composite scores epistemic quality only: Predictive Acuity (forecast calibration rate, from the forecasting interval-scoring primitive) plus three-way alignment (up / down / lateral, from the performance-validity directional-alignment primitive). The caller computes those upstream primitives and feeds their outputs here — no cross-spoke imports.

The validity bridge — the differentiation beat. The honest version of "manager quality" isn't a fixed formula; it's whatever predicts the outcomes you care about. POST /weights/recalibrate reads pooled per-domain predictive r² snapshots (persisted upstream in performance-validity), converts each domain's explanatory mass into a pillar share, convex-blends those empirical shares against the FiveTran-informed prior (alphaEmpirical defaults to 0.7 empirical / 0.3 prior), persists the merged weights, and appends an analyst-facing audit row. So the composite can learn which domains actually matter for this tenant — and show its work.
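The convex blend at the heart of the bridge can be sketched as below. Assumptions are flagged in the comments: the r²-to-share conversion is shown as simple normalization by the total (the text only says explanatory mass is converted into a pillar share), domains without evidence are treated as r² = 0, and a tenant with no positive evidence falls back to the prior. Function names are hypothetical, and persistence and the audit row are omitted. The 0.7 default for alphaEmpirical comes from the text above.

```typescript
// Hypothetical sketch of the empirical/prior blend; not the spoke's code.
type Shares = Record<string, number>;

// Assumption: shares are proportional to non-negative r² values.
function toEmpiricalShares(rSquared: Shares, domains: string[]): Shares | undefined {
  let total = 0;
  for (const d of domains) {
    total += Math.max(0, rSquared[d] ?? 0);
  }
  if (total <= 0) return undefined; // no usable evidence
  const shares: Shares = {};
  for (const d of domains) {
    shares[d] = Math.max(0, rSquared[d] ?? 0) / total;
  }
  return shares;
}

function recalibrateSketch(
  prior: Shares,
  rSquared: Shares,
  alphaEmpirical: number = 0.7
): Shares {
  if (alphaEmpirical < 0 || alphaEmpirical > 1) {
    throw new Error("alphaEmpirical must lie in [0, 1]");
  }
  const domains = Object.keys(prior);
  const empirical = toEmpiricalShares(rSquared, domains);
  const merged: Shares = {};
  for (const d of domains) {
    // Assumption: with no evidence at all, keep the prior unchanged.
    merged[d] =
      empirical === undefined
        ? prior[d]
        : alphaEmpirical * empirical[d] + (1 - alphaEmpirical) * prior[d];
  }
  return merged;
}
```

Because both inputs to the blend are share vectors, the merged weights sum to the same total as a convex combination of the two, so a prior that sums to 1 yields merged weights that sum to 1.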

Science backing. The default weight mix is informed by the FiveTran Manager Quality Index pillar structure (prior, not gospel — it is explicitly the thing the validity bridge blends away from as evidence arrives). Standardization, banding, and the convex empirical/prior blend are the spoke's own deterministic methods, documented in core/.

Visual — Tier B (inputs → method → outputs, step flow). measureValues (5 systems) → standardize per measure → mean per domain → weighted blend over 9 domains − flight-risk penalty → MeiCompositeScore { score · band · full ladder · narration }.

4. What does it enable?

Concrete uses a practitioner would recognize:

  • Score managers for an ELT scorecard — one defensible 0–100 number per leader per cycle, with the band and the full domain ladder underneath for the calibration conversation.
  • Retune what "good manager" means for this org — UPSERT the nine pillar weights per tenant (validated to sum to 1 ± 0.02) when leadership decides, say, that people-development should count more.
  • Let the weights learn from outcomes — recalibrate the blend toward the domains that empirically predict retention, promotion, or next-cycle performance, and keep an audit trail of every reweighting.
  • Score epistemic leadership separately — run the Leadership Index when the question is "is this leader well-calibrated and self-aware?" rather than "are they operationally effective?"
  • Classify a team's staffing shape — GET /archetype maps net growth + exit rate + headcount to a labeled archetype (healthy-growth, revolving-door, hyper-growth-high-churn, …) so a "high MEI" leader running a revolving door gets flagged.
  • Hydrate individual measure ladders — because each domain expands to stable measureIds, a downstream surface can drill into a single ladder without pinning to the headline composite.

Visual — (TBD — a rendered MEI domain-contribution bar showing the nine weighted slices for one leader).

5. How it fits in the toolbox

Data flow:

  • Consumes (as primitive outputs, via contracts only) — forecasting's interval-scoring output (calibration rate → Predictive Acuity) and performance-validity's directional-alignment output (up/down/lateral → 3-way alignment) for the Leadership Index. The validity bridge consumes pooled per-domain predictive r² snapshots that operators UPSERT into performance-validity (POST /mei-predictive-evidence). All cross-spoke contact is HTTP + vendored contracts — never another spoke's core/.
  • Consumes (as raw inputs) — measure values that originate in upstream HR systems: engagement/upward-feedback surveys, operational/goal systems, the talent system (calibration, retention, mobility), and HRIS headcount flows. These arrive as the measureValues map; the spoke does not own those source systems.
  • Persists — manager_effectiveness.tenant_domain_weight_profiles (hot-swappable per-tenant weights) and manager_effectiveness.mei_weight_recalibration_audit (append-only recalibration log).
  • Emits — the MeiCompositeScore, LeadershipIndexScore, TenantWeightsResponse, RecalibrationResponse, and ArchetypeClassifyResponse contracts. Consumers (Performix, vela — both currently planned in the registry) vendor src/spokes/manager-effectiveness/contracts/types.ts.
  • Composes into — the Leadership Index PA Product (manager-effectiveness 1.2.0), one of the toolbox's "meals" assembled from PA Instruments via HTTP + contracts.

Visual — Tier B (typographic data-flow).

  forecasting (interval-scoring) + performance-validity (directional-alignment) ─▶ Leadership Index
  HR survey · ops · talent · HRIS measures ─▶ MEI composite ─▶ { ELT scorecard · Performix · vela }
  performance-validity (mei_predictive_evidence) ─▶ /weights/recalibrate ─▶ retuned tenant weights

6. Commercialization / packaging

Manager Effectiveness is a service component, not a standalone product — it is the scoring engine behind a leadership-analytics offering, surfaced through buyer-facing scorecards (the registry lists Performix and vela as planned consumers) rather than sold on its own.

  • Data-license posture: the default weight mix is informed by the publicly described FiveTran Manager Quality Index pillar structure, used as a starting prior — the composite math, standardization, and validity-blend are the toolbox's own. The actual scoring inputs are the customer's own HR data; no third-party survey license is embedded in the spoke.
  • Tier placement / pricing: (TBD) — not earned yet, so not stated. The product packaging that would sell a Leadership Index is downstream of consumer adoption (both consumers are still planned).

Visual — (TBD — product-tier placement diagram once a consuming surface ships).

7. The vision

One honest answer to "is this a good manager?" — composed from all the evidence, weighted by what actually predicts the outcomes you care about, and always able to show its work.

The direction is to move the composite from a defensible prior toward a learned model: as more validity evidence accumulates per tenant, the recalibration bridge shifts weight onto the domains that genuinely predict retention, promotion, and next-cycle performance — and every shift is auditable. The deeper per-domain measure connectors (today the composite runs deterministically on supplied or baseline-derived measure values) land incrementally atop the stable contract spine, and the Leadership Index extends as more epistemic-quality primitives come online.

8. Current status

Grounded in the real code state (contract 1.2.0, status: "live" in src/lib/contracts/registry.ts, src/spokes/manager-effectiveness/):

  • Shipped: the nine-domain MEI contract spine + deterministic composite engine with rating-codec narration (1.0.0); Postgres tenant weight UPSERT surface; the empirical recalibration bridge reading pooled mei_predictive_evidence and convex-blending vs FiveTran defaults (1.1.0); the additive, stateless Leadership Index — Predictive Acuity + 3-way alignment (1.2.0). Live routes: POST /composite, GET/POST /weights, POST /weights/recalibrate, GET /archetype, POST /leadership-index, GET /health. MCP tools registered (src/spokes/manager-effectiveness/mcp/register.ts); wired into the health aggregate (manager_effectiveness schema) and the contract registry. Leadership Index core is unit-tested (tests/leadership-index.test.ts).
  • In flight / planned: deeper per-domain measure connectors that replace baseline-derived measure values with real upstream feeds; Performix + vela consumer wiring (both planned in the registry); product-tier packaging (TBD).

Visual — Tier A (live capture). GET /api/spokes/manager-effectiveness/health reports the real shipped status + schema reachability at request time, and GET /api/spokes/manager-effectiveness/weights?tenantId=<id> returns the merged pillar weights (Postgres profile when present, else the defaults in §1).


Worked example (load-bearing)

A real end-to-end run of the Leadership Index core (computeLeadershipIndex, src/spokes/manager-effectiveness/core/leadership-index.ts). A leader's upstream primitives have already been computed elsewhere: their forecasts landed inside the interval 82% of the time (forecasting interval-scoring → calibrationRate: 0.82), and performance-validity directional-alignment scored them up: 88, down: 74, lateral: 81.

Request (the wire shape is the LeadershipIndexRequest contract):

POST /api/spokes/manager-effectiveness/leadership-index
{
  "subjectId": "leader_42",
  "predictiveAcuity": { "calibrationRate": 0.82 },
  "alignment": { "up": 88, "down": 74, "lateral": 81 }
}

The core computes: Predictive Acuity = 0.82 × 100 = 82; alignment composite = mean(88, 74, 81) = 81.00; index = 82×0.4 + 88×0.2 + 74×0.2 + 81×0.2 = 32.8 + 17.6 + 14.8 + 16.2 = 81.40 (default weights; with all four components present they already sum to 1, so the renormalization leaves them unchanged).

Response (LeadershipIndexScore):

{
  "subjectId": "leader_42",
  "predictiveAcuity": 82,
  "alignment": { "up": 88, "down": 74, "lateral": 81, "composite": 81 },
  "leadershipIndex": 81.4,
  "components": [
    { "label": "Predictive Acuity", "value": 82, "weight": 0.4 },
    { "label": "Upward Alignment", "value": 88, "weight": 0.2 },
    { "label": "Downward Alignment", "value": 74, "weight": 0.2 },
    { "label": "Lateral Alignment", "value": 81, "weight": 0.2 }
  ]
}

(Computed directly from the spoke's pure computeLeadershipIndex core; the arithmetic matches the contract's renormalization rule and the spoke's own unit tests. The single down-alignment soft spot — 74 vs. 88 upward — is the actionable read a practitioner takes from the components array. No figure here is invented.)
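The arithmetic in this worked example can be reproduced with a short sketch. It is not the spoke's computeLeadershipIndex: the function and type names are hypothetical, the two-decimal rounding is an assumption, and the renormalization is modeled as dividing each default weight by the total weight of the components that are present. The default weights (0.4, 0.2, 0.2, 0.2) and the component labels are taken from the response above.

```typescript
// Hypothetical re-derivation of the worked example; names are illustrative.
interface LeadershipInputs {
  calibrationRate?: number; // 0–1, from forecasting interval-scoring
  up?: number; // 0–100, directional alignment
  down?: number;
  lateral?: number;
}

interface Component {
  label: string;
  value: number;
  weight: number;
}

// Assumption: results are rounded to two decimals.
const round2 = (x: number): number => Math.round(x * 100) / 100;

function leadershipIndexSketch(input: LeadershipInputs): {
  leadershipIndex: number;
  alignmentComposite: number | undefined;
  components: Component[];
} {
  const present: Component[] = [];
  if (input.calibrationRate !== undefined) {
    present.push({ label: "Predictive Acuity", value: round2(input.calibrationRate * 100), weight: 0.4 });
  }
  if (input.up !== undefined) present.push({ label: "Upward Alignment", value: input.up, weight: 0.2 });
  if (input.down !== undefined) present.push({ label: "Downward Alignment", value: input.down, weight: 0.2 });
  if (input.lateral !== undefined) present.push({ label: "Lateral Alignment", value: input.lateral, weight: 0.2 });
  if (present.length === 0) throw new Error("at least one component is required");

  // Renormalize default weights over the components actually supplied.
  const totalWeight = present.reduce((acc, c) => acc + c.weight, 0);
  const weighted = present.reduce((acc, c) => acc + (c.value * c.weight) / totalWeight, 0);

  // Alignment composite: plain mean of the alignment directions supplied.
  const alignments = present.filter((c) => c.label !== "Predictive Acuity");
  const alignmentComposite =
    alignments.length === 0
      ? undefined
      : round2(alignments.reduce((acc, c) => acc + c.value, 0) / alignments.length);

  return {
    leadershipIndex: round2(weighted),
    alignmentComposite,
    components: present.map((c) => ({ ...c, weight: round2(c.weight / totalWeight) })),
  };
}
```

Called with calibrationRate 0.82 and alignment 88 / 74 / 81, the sketch yields a leadershipIndex of 81.4 and an alignment composite of 81, matching the response above. Dropping the lateral score renormalizes the remaining weights over 0.8 and gives 65.2 / 0.8 = 81.5.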