Leadership Index — plain-language explainer

Leadership Index scores a leader on the one thing a 360 survey can't see: whether they actually understand their world — how well they predict it, and how closely their self-view matches how the people above, below, and beside them experience them.

A People Analytics Toolbox Product: a "meal" assembled from composable measurement primitives ("ingredients") over HTTP and vendored contracts, never by reaching into another spoke's internals. It is delivered through the manager-effectiveness spoke (contract 1.2.0) and composes the forecasting and performance-validity spokes. Every claim below is grounded in the real code (src/spokes/manager-effectiveness/core/leadership-index.ts, src/spokes/performance-validity/core/directional-alignment.ts, src/spokes/forecasting/) and the contracts those spokes emit; anything not yet built is marked (TBD). The spoke that hosts this Product has its own explainer; see Manager Effectiveness for the nine-domain MEI internals this sits alongside.

The lead

Leadership Index is a small, stateless composite that answers a deliberately narrow question: is this an epistemically sound leader? Not "is the team happy" and not "did the numbers hit"; those live elsewhere. It scores two things you can defend on the merits: Predictive Acuity (how often realized outcomes landed inside the leader's forecast intervals) and three-way alignment (how closely the leader's view of themselves matches how their managers, their reports, and their peers see them: up, down, and lateral).

It is one of the toolbox's three PA Products, alongside the AnyComp decision layer and the Analytics-Plan Generator. Where those compose decision and modeling primitives, Leadership Index composes measurement primitives into a leadership readout.

Visual — Tier A (live API capture). The whole Product is one POST. A real call to the running spoke, computed by the pure computeLeadershipIndex core:

POST /api/spokes/manager-effectiveness/leadership-index
{
  "subjectId": "leader_42",
  "predictiveAcuity": { "calibrationRate": 0.80 },
  "alignment": { "up": 99.13, "down": 85.53, "lateral": 96.51 }
}
→ {
    "subjectId": "leader_42",
    "predictiveAcuity": 80,
    "alignment": { "up": 99.13, "down": 85.53, "lateral": 96.51, "composite": 93.72 },
    "leadershipIndex": 88.23,
    "components": [
      { "label": "Predictive Acuity",  "value": 80,    "weight": 0.4 },
      { "label": "Upward Alignment",   "value": 99.13, "weight": 0.2 },
      { "label": "Downward Alignment", "value": 85.53, "weight": 0.2 },
      { "label": "Lateral Alignment",  "value": 96.51, "weight": 0.2 }
    ]
  }

(Real output of computeLeadershipIndex; the upstream alignment values are themselves real output of computeDirectionalAlignment — see the worked example for the full chain and where each number comes from.)

Manifesto — why it exists

Almost every organization measures leaders the same way: a 360 survey, an engagement pulse, maybe a calibration rating. Each of those instruments asks a variation of "do people like working for this person?" That matters, but it is a popularity signal: gameable, lagging, and silent on the thing that actually separates good leaders from lucky ones.

The worldview behind Leadership Index is that good leadership is, at root, epistemic. A leader who consistently calls what's going to happen — and who has an accurate read on how they land with the people around them — makes better bets, surfaces fewer surprises, and corrects faster. A leader who is confidently wrong about both is a risk no engagement score will catch until it's too late.

The pain it removes, stated as a shift:

FROM "is this a good manager?" answered with one upward-feedback average — a single number that conflates likeability with competence, can't see whether the leader's confidence is earned, and goes quiet on whether their self-image is calibrated.
TO two orthogonal, defensible signals: "were they right?" (Predictive Acuity, scored against realized outcomes) and "do they see themselves the way others see them, in every direction?" (three-way alignment), blended into one 0–100 index with every component visible underneath.

The differentiation beat, against the obvious substitutes:

vs. a 360 / engagement survey — those measure sentiment about the leader. Leadership Index measures the leader's accuracy: their forecasts are scored against what actually happened (via the Winkler interval score), and their self-rating is scored against three independent cohorts. Sentiment is an input to alignment, not the whole answer.
vs. doing it by hand — the directional read ("my reports rate me lower than my peers do") is the kind of thing a careful analyst can eyeball from a spreadsheet, but the calibration leg requires keeping a leader's past interval forecasts and scoring them when outcomes land — bookkeeping nobody does manually. Leadership Index makes the loop a primitive.
vs. generic BI — BI can chart the survey scores side by side. It cannot turn "forecasts landed inside the interval 80% of the time" and "down-alignment is 14 points below up-alignment" into one defensible composite with a renormalized weighting and a per-component breakdown.

It is also explicitly not the whole story. Leadership Index scores epistemic quality only. Operational effectiveness, team output, retention, and calibration discipline fall under the nine domains of the Manager Effectiveness Index, which lives in the same spoke. Leadership Index is the additive, lighter-weight composite you reach for when the question is specifically "is this leader well-calibrated and self-aware?"

Visual — Tier B (FROM→TO shift). The shift above is the visual; a rendered comparison block is a follow-up (TBD — FU-A rendered FROM→TO block).

Walkthrough — how it actually works

The developer's path through the Product is three calls. Steps 1 and 2 take different inputs and do not depend on each other; Leadership Index itself is the third call. It is pure and stateless, and it composes the outputs of the first two primitives. The caller runs the upstream primitives and feeds their results in; there are no cross-spoke imports anywhere in this chain.

Step 1 — score the leader's forecasts (Predictive Acuity). Call the forecasting PA Instrument POST /api/spokes/forecasting/interval-scoring with the leader's past prediction intervals and the actuals that have since landed. Each estimate is { lower, upper, level, actual }. For each estimate the primitive returns whether the actual was contained and the Winkler intervalScore, which under the standard convention is lower (better) for tight intervals that contain the actual and grows when the actual falls outside. It also returns a summary calibrationRate: the fraction of intervals that contained their actual. That calibrationRate ∈ [0,1] is the only field Leadership Index needs from this step.
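The scoring in Step 1 can be sketched as follows. This is a minimal illustration of the standard Winkler interval score, not the forecasting spoke's code: the names IntervalEstimate, ScoredEstimate, scoreInterval, and the calibrationRate function are hypothetical, the forecast data is made up, and the sketch assumes level is a nominal coverage strictly between 0 and 1 (for example 0.8 for an 80% interval).

```typescript
// Hypothetical shapes for illustration only. The real contract is
// IntervalScoreResult in src/spokes/forecasting/contracts/types.ts and may differ.
interface IntervalEstimate {
  lower: number;
  upper: number;
  level: number; // nominal coverage, assumed to be in (0, 1), e.g. 0.8
  actual: number;
}

interface ScoredEstimate {
  contained: boolean;
  intervalScore: number; // Winkler score: lower is better
}

// Standard Winkler score: interval width, plus (2 / alpha) times the distance
// by which the actual falls outside the interval, where alpha = 1 - level.
function scoreInterval(e: IntervalEstimate): ScoredEstimate {
  const alpha = 1 - e.level;
  const width = e.upper - e.lower;
  const contained = e.actual >= e.lower && e.actual <= e.upper;
  let miss = 0;
  if (e.actual < e.lower) miss = e.lower - e.actual;
  else if (e.actual > e.upper) miss = e.actual - e.upper;
  return { contained, intervalScore: width + (2 / alpha) * miss };
}

// Fraction of intervals that contained their actual, in [0, 1].
function calibrationRate(estimates: IntervalEstimate[]): number {
  if (estimates.length === 0) {
    // Assumption: an empty forecast set is treated as an error, not as 0.
    throw new Error("at least one scored estimate is required");
  }
  const hits = estimates.filter((e) => scoreInterval(e).contained).length;
  return hits / estimates.length;
}

// Illustrative data: five 80% intervals, one of which misses.
const pastForecasts: IntervalEstimate[] = [
  { lower: 90, upper: 110, level: 0.8, actual: 100 },
  { lower: 40, upper: 60, level: 0.8, actual: 58 },
  { lower: 10, upper: 20, level: 0.8, actual: 12 },
  { lower: 200, upper: 260, level: 0.8, actual: 231 },
  { lower: 5, upper: 9, level: 0.8, actual: 11 }, // miss: actual is above upper
];
const rate = calibrationRate(pastForecasts);
```

With this illustrative set, four of the five intervals contain their actual, so rate is 0.8. The missed interval scores roughly 24: its width of 4 plus a penalty of about 10 × 2 for landing 2 above the upper bound.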

Step 2 — score self-vs-others alignment in three directions. Call the performance-validity primitive POST /api/spokes/performance-validity/directional-alignment with the focal leader's self-ratings and three cohorts: up (managers / execs), down (direct reports), and lateral (peers). The primitive is org-graph-agnostic: the caller supplies cohort membership, and the primitive does not infer the hierarchy. For each non-empty direction it builds per-item value arrays across {focal + cohort} and runs the same inverted coefficient-of-variation item logic as the base alignment primitive (cv = sd/mean, alignmentScore = (1 − min(cv/1.5, 1)) × 100), so an item with cv = 0 scores 100 and any item with cv ≥ 1.5 scores 0. It then rolls each direction up to a single alignmentScore (0–100) by averaging over items. Tighter agreement → higher score. The result is byDirection[] with one entry per direction that had a cohort.
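The item math in Step 2 can be sketched in a few lines. This is an illustration of the stated formula only, not computeDirectionalAlignment itself. Three things are assumptions the source does not confirm: the use of the population standard deviation (the real core may use the sample form), strictly positive ratings, and the hypothetical names ItemRatings, itemAlignmentScore, and directionAlignmentScore.

```typescript
// Hypothetical input shape: one entry per rated item for a single direction.
interface ItemRatings {
  self: number; // the focal leader's self-rating on this item
  cohort: number[]; // ratings the direction's cohort gave on the same item
}

// alignmentScore = (1 - min(cv / 1.5, 1)) * 100, with cv = sd / mean.
// Assumption: population standard deviation and a strictly positive mean.
function itemAlignmentScore(values: number[]): number {
  const n = values.length;
  const mean = values.reduce((sum, v) => sum + v, 0) / n;
  const variance = values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / n;
  const cv = Math.sqrt(variance) / mean;
  return (1 - Math.min(cv / 1.5, 1)) * 100;
}

// Roll one direction up to a single 0-100 score by averaging over items.
function directionAlignmentScore(items: ItemRatings[]): number {
  if (items.length === 0) throw new Error("at least one item is required");
  const scores = items.map((item) =>
    itemAlignmentScore([item.self, ...item.cohort])
  );
  return scores.reduce((sum, s) => sum + s, 0) / scores.length;
}

// Perfect agreement between the leader and two raters on one item.
const perfectAgreement = directionAlignmentScore([{ self: 4, cohort: [4, 4] }]);
```

When every rater matches the self-rating, cv is 0 and the score is 100. Because the inputs behind the worked example's 99.13 / 85.53 / 96.51 are not listed in this document, the sketch does not attempt to reproduce those figures.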

Step 3 — compose the index. Call POST /api/spokes/manager-effectiveness/leadership-index with subjectId, predictiveAcuity: { calibrationRate }, and alignment: { up, down, lateral } (each direction optional). The core then:

- Converts calibration to a 0–100 Predictive Acuity score (calibrationRate × 100, clamped to 0–100).
- Treats Predictive Acuity and each present alignment direction as its own component.
- Applies default component weights (Predictive Acuity 0.4; up, down, and lateral 0.2 each; LEADERSHIP_INDEX_DEFAULT_WEIGHTS) and renormalizes them over the components actually present. An absent direction therefore does not silently drag the index down; its weight is redistributed proportionally across the components that remain. Weights are overridable per call.
- Returns the LeadershipIndexScore: the leadershipIndex (0–100), the predictiveAcuity score, the alignment block (each direction plus a composite, the mean of the present directions), and a components[] array spelling out every label, value, and renormalized weight.
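The composition in Step 3 can be sketched as below. This is not the source of computeLeadershipIndex; it is a minimal reimplementation from the description above. The type names, the composeLeadershipIndex function, and the rounding to two decimals are assumptions (the rounding is inferred from the two-decimal figures in the captured response).

```typescript
type Direction = "up" | "down" | "lateral";

interface IndexRequest {
  subjectId: string;
  predictiveAcuity: { calibrationRate: number };
  alignment: Partial<Record<Direction, number>>;
}

interface IndexComponent {
  label: string;
  value: number;
  weight: number;
}

interface IndexWeights {
  predictiveAcuity: number;
  up: number;
  down: number;
  lateral: number;
}

// Default weights as stated in the text (LEADERSHIP_INDEX_DEFAULT_WEIGHTS).
const DEFAULT_WEIGHTS: IndexWeights = {
  predictiveAcuity: 0.4,
  up: 0.2,
  down: 0.2,
  lateral: 0.2,
};

const DIRECTIONS: Array<{ key: Direction; label: string }> = [
  { key: "up", label: "Upward Alignment" },
  { key: "down", label: "Downward Alignment" },
  { key: "lateral", label: "Lateral Alignment" },
];

const clamp100 = (x: number): number => Math.min(100, Math.max(0, x));
// Assumption: two-decimal rounding, inferred from the captured output.
const round2 = (x: number): number => Math.round(x * 100) / 100;

function composeLeadershipIndex(
  req: IndexRequest,
  weights: IndexWeights = DEFAULT_WEIGHTS
) {
  const predictiveAcuity = clamp100(req.predictiveAcuity.calibrationRate * 100);
  const present: IndexComponent[] = [
    { label: "Predictive Acuity", value: predictiveAcuity, weight: weights.predictiveAcuity },
  ];
  const directionValues: number[] = [];
  for (const { key, label } of DIRECTIONS) {
    const value = req.alignment[key];
    if (value === undefined) continue; // absent direction: skipped, not zeroed
    present.push({ label, value, weight: weights[key] });
    directionValues.push(value);
  }
  // Renormalize so the weights of the present components sum to 1.
  const totalWeight = present.reduce((sum, c) => sum + c.weight, 0);
  const components = present.map((c) => ({ ...c, weight: c.weight / totalWeight }));
  const leadershipIndex = components.reduce((sum, c) => sum + c.value * c.weight, 0);
  const composite =
    directionValues.length > 0
      ? round2(directionValues.reduce((sum, v) => sum + v, 0) / directionValues.length)
      : undefined;
  return {
    subjectId: req.subjectId,
    predictiveAcuity: round2(predictiveAcuity),
    alignment: { ...req.alignment, composite },
    leadershipIndex: round2(leadershipIndex),
    components,
  };
}

// The request from the capture above, then the same leader with no "up" cohort.
const fullScore = composeLeadershipIndex({
  subjectId: "leader_42",
  predictiveAcuity: { calibrationRate: 0.8 },
  alignment: { up: 99.13, down: 85.53, lateral: 96.51 },
});
const withoutUp = composeLeadershipIndex({
  subjectId: "leader_42",
  predictiveAcuity: { calibrationRate: 0.8 },
  alignment: { down: 85.53, lateral: 96.51 },
});
```

For the full request this yields a leadershipIndex of 88.23 and a composite of 93.72, matching the capture. With the up direction absent, the remaining weights renormalize to 0.5 / 0.25 / 0.25 and the index works out to 40 + 21.3825 + 24.1275 = 85.51. Passing a second argument overrides the weights without changing the math.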

Inputs → outputs, at a glance: past interval forecasts + a self-rating + three rater cohorts → { leadershipIndex · predictiveAcuity · per-direction alignment + composite · the weighted component ladder }.

The differentiation that matters here: the output is never a black box. The components[] array is the receipt — a reviewer can see that, say, the headline is being pulled down by downward alignment specifically, not by calibration, and act on exactly that.

Visual — Tier B (the three-call chain, step flow). forecasting.interval-scoring (past forecasts + actuals → calibrationRate) → performance-validity.directional-alignment (self + up/down/lateral cohorts → 3 alignmentScores) → manager-effectiveness.leadership-index (compose → renormalized weighted index + component ladder).
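A caller could drive the chain roughly as follows. This is a sketch under several assumptions that the document does not confirm: the `estimates` wrapper key on the interval-scoring request, the nesting of calibrationRate under `summary`, the `direction` field on each byDirection entry, and the helper names Post, ChainInput, and runLeadershipIndexChain. The HTTP transport is injected as a function so the example stays independent of any particular client; in practice it would wrap an HTTP call and attach the service key.

```typescript
// Injected transport: posts a JSON body to a path and resolves to parsed JSON.
type Post = (path: string, body: unknown) => Promise<unknown>;
type Direction = "up" | "down" | "lateral";

// Assumed response fragments; only the fields this chain reads are listed.
interface IntervalScoringResponse {
  summary: { calibrationRate: number }; // nesting under `summary` is assumed
}
interface DirectionalAlignmentResponse {
  // The `direction` key name is assumed; alignmentScore is named in the text.
  byDirection: Array<{ direction: Direction; alignmentScore: number }>;
}

interface ChainInput {
  subjectId: string;
  estimates: Array<{ lower: number; upper: number; level: number; actual: number }>;
  directionalRequest: unknown; // request body shape is not specified here
}

async function runLeadershipIndexChain(
  post: Post,
  input: ChainInput
): Promise<unknown> {
  // Call 1: past forecasts + actuals -> calibrationRate.
  const scoring = (await post("/api/spokes/forecasting/interval-scoring", {
    estimates: input.estimates, // wrapper key is an assumption
  })) as IntervalScoringResponse;

  // Call 2: self-ratings + cohorts -> one alignmentScore per present direction.
  const aligned = (await post(
    "/api/spokes/performance-validity/directional-alignment",
    input.directionalRequest
  )) as DirectionalAlignmentResponse;

  // Directions with no cohort are simply left out of the alignment block.
  const alignment: Partial<Record<Direction, number>> = {};
  for (const entry of aligned.byDirection) {
    alignment[entry.direction] = entry.alignmentScore;
  }

  // Call 3: compose the index from the two upstream results.
  return post("/api/spokes/manager-effectiveness/leadership-index", {
    subjectId: input.subjectId,
    predictiveAcuity: { calibrationRate: scoring.summary.calibrationRate },
    alignment,
  });
}
```

Keeping the orchestration in the caller mirrors the boundary described above: each spoke is reached only through its route, and the composite step receives plain numbers rather than upstream objects.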

What it enables

Concrete uses a practitioner would recognize:

A leadership scorecard that isn't a popularity contest — give the ELT one 0–100 number per leader that rewards being right and being self-aware, with the component ladder underneath for the conversation.
Catch the confidently-wrong leader — a leader can sail through engagement surveys while their forecasts routinely miss; low Predictive Acuity surfaces that even when sentiment is high.
See the blind side — the per-direction breakdown exposes the classic pattern where a leader rates well up (their manager loves them) but badly down (their reports experience them very differently). A single composite hides this; the components[] array names it.
Coach on the specific gap — because the output decomposes, a leadership-development partner can target the actual deficit (calibration drills vs. a downward-alignment listening problem) instead of a vague "work on leadership."
Run it on whatever cohorts you define — the alignment primitive is org-graph-agnostic, so you can score alignment against a project team, a matrixed dotted-line group, or a formal reporting chain — whatever cohort membership the caller supplies.
Tune the emphasis per use — override the component weights for a given program (e.g. weight downward alignment higher for a first-line-manager cohort) without changing the math.
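One way a consumer might read the component ladder for these uses is sketched below. The helpers shortfalls and upDownGap are hypothetical and not part of any contract; they only rearrange numbers already present in the response captured earlier. The "points lost" measure (weight × distance from 100) is an illustrative choice, not something the Product emits.

```typescript
interface LadderComponent {
  label: string;
  value: number;
  weight: number; // renormalized weight, as returned in components[]
}

// Points each component costs the headline relative to a perfect 100.
// With weights that sum to 1, the shortfalls sum to 100 - leadershipIndex.
function shortfalls(
  components: LadderComponent[]
): Array<{ label: string; pointsLost: number }> {
  return components
    .map((c) => ({ label: c.label, pointsLost: c.weight * (100 - c.value) }))
    .sort((a, b) => b.pointsLost - a.pointsLost);
}

// Gap between upward and downward alignment; undefined if either is absent.
function upDownGap(alignment: { up?: number; down?: number }): number | undefined {
  if (alignment.up === undefined || alignment.down === undefined) return undefined;
  return alignment.up - alignment.down;
}

// components[] and alignment from the captured leader_42 response.
const ladder = shortfalls([
  { label: "Predictive Acuity", value: 80, weight: 0.4 },
  { label: "Upward Alignment", value: 99.13, weight: 0.2 },
  { label: "Downward Alignment", value: 85.53, weight: 0.2 },
  { label: "Lateral Alignment", value: 96.51, weight: 0.2 },
]);
const gap = upDownGap({ up: 99.13, down: 85.53 });
```

For leader_42 the largest shortfall is Predictive Acuity (0.4 × 20 = 8 points), followed by Downward Alignment (0.2 × 14.47 ≈ 2.9 points), and the up/down gap is about 13.6 points. Both reads come straight from fields the response already exposes.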

Visual — (TBD — a rendered per-component bar showing Predictive Acuity + the three alignment directions for one leader, with the headline index annotated).

How it fits in the toolbox

Leadership Index is a Product, so it owns no schema of its own beyond the host spoke — it is composition, not storage.

Hosted by — the manager-effectiveness spoke (manager_effectiveness schema, contract 1.2.0). The Leadership Index endpoint is additive and stateless; it does not touch the nine-domain MeiCompositeScore, the tenant weight profiles, or the recalibration audit. Those are the spoke's separate MEI surface — see the Manager Effectiveness explainer.
Composes (PA Instruments, via HTTP + vendored contracts only) —
- forecasting.interval-scoring (a PA Instrument in the registry) for the calibration leg → Predictive Acuity. Contract: src/spokes/forecasting/contracts/types.ts (IntervalScoreResult).
- performance-validity directional-alignment for the three-way alignment leg. Core + route are live (POST /api/spokes/performance-validity/directional-alignment); contract: src/spokes/performance-validity/contracts/types.ts (DirectionalAlignmentResult). The base performance-validity.alignment Instrument it reuses is registered in src/lib/contracts/registry.ts.
Org-graph relationship — the canonical primitives catalog (docs/primitives/00-CATALOG.md) lists org-graph as the natural definer of up/down/lateral cohorts. Today the directional-alignment primitive is org-graph-agnostic — the caller supplies cohort membership; wiring org-graph to derive those cohorts is (TBD / not yet built).
Emits — the LeadershipIndexScore contract. Consumers vendor src/spokes/manager-effectiveness/contracts/types.ts.
Consumers — the registry lists performix and vela as consumers of the host spoke, both status: "planned" — so no consumer surface meets this Product in production yet.
The boundary is real — every leg of this chain is HTTP + contracts. computeLeadershipIndex imports only its own contract types; it never reaches into forecasting/core or performance-validity/core.

Visual — Tier B (typographic data-flow).
forecasting (interval-scoring) ─▶ calibrationRate
performance-validity (directional-alignment) ─▶ up / down / lateral alignmentScores
{ those outputs } ─▶ manager-effectiveness.leadership-index ─▶ LeadershipIndexScore ─▶ { ELT scorecard · performix (planned) · vela (planned) }

One load-bearing worked example

A full three-primitive run for one leader (leader_42), with each number traced to the core that produced it. The directional-alignment and leadership-index values below are real output of the spokes' pure cores (computeDirectionalAlignment, computeLeadershipIndex); the input cohorts and forecast set are a clearly-labeled illustrative scenario.

Step 1 — Predictive Acuity (forecasting.interval-scoring). Over the leader's last five forecasts, four of the five actuals landed inside the stated interval (one missed). The summary calibrationRate is therefore 4 / 5 = 0.80. Leadership Index reads this as Predictive Acuity 0.80 × 100 = 80.

Step 2 — three-way alignment (performance-validity.directional-alignment). The leader self-rates on three items; three cohorts rate the same items:

Up (two execs) rate the leader very close to their self-view → computeDirectionalAlignment returns up: 99.13.
Down (three direct reports) rate the leader markedly lower and with more spread → down: 85.53.
Lateral (two peers) land in between → lateral: 96.51.

These are the actual outputs of the inverted-CV item math (cv = sd/mean, alignmentScore = (1 − min(cv/1.5, 1)) × 100, averaged over items), computed directly from the illustrative cohort ratings.

Step 3 — compose (manager-effectiveness.leadership-index). Feeding those four numbers in:

POST /api/spokes/manager-effectiveness/leadership-index
{
  "subjectId": "leader_42",
  "predictiveAcuity": { "calibrationRate": 0.80 },
  "alignment": { "up": 99.13, "down": 85.53, "lateral": 96.51 }
}
→ {
    "subjectId": "leader_42",
    "predictiveAcuity": 80,
    "alignment": { "up": 99.13, "down": 85.53, "lateral": 96.51, "composite": 93.72 },
    "leadershipIndex": 88.23,
    "components": [
      { "label": "Predictive Acuity",  "value": 80,    "weight": 0.4 },
      { "label": "Upward Alignment",   "value": 99.13, "weight": 0.2 },
      { "label": "Downward Alignment", "value": 85.53, "weight": 0.2 },
      { "label": "Lateral Alignment",  "value": 96.51, "weight": 0.2 }
    ]
  }

The arithmetic the core runs: index = 80×0.4 + 99.13×0.2 + 85.53×0.2 + 96.51×0.2 = 32 + 19.826 + 17.106 + 19.302 = 88.234, reported as 88.23 (weights 0.4 / 0.2 / 0.2 / 0.2 already sum to 1, so renormalization is the identity here); alignment composite = mean(99.13, 85.53, 96.51) = 281.17 / 3 ≈ 93.72.

The actionable read. A headline of 88 looks like a strong leader, and on upward and lateral alignment they are. But the component ladder is the point: downward alignment (85.53) sits ~14 points below upward alignment (99.13), and the missed forecast pulls Predictive Acuity (80) below all three alignment legs. The two things a development partner would actually do (tighten the leader's forecasting discipline, and investigate why direct reports experience this leader differently than execs do) fall straight out of the decomposed output. That is the difference between a Leadership Index and a 360 average.

(The leadership-index and directional-alignment figures are real core output; the forecast set and cohort ratings are illustrative inputs, labeled as such. No figure is invented as real.)

Commercialization / packaging

Leadership Index is a Product in the toolbox's "ingredients → meals" framing: a composed capability, not a separately-priced SKU today. It is surfaced through a leadership-analytics offering (the kind of scorecard a consumer like performix or vela would render), not sold on its own.

Data-license posture: clean. The composite math, the Winkler interval scoring, and the inverted-CV alignment are the toolbox's own deterministic methods. The scoring inputs are the customer's own data — their leaders' past forecasts and their own rater cohorts. No third-party survey instrument or licensed dataset is embedded in this Product.
Tier placement / pricing: (TBD) — not earned yet, so not stated. Both registry consumers are still planned; product packaging is downstream of a consuming surface shipping.

Visual — (TBD — product-tier placement diagram, once a consuming surface ships).

Current status

Grounded in the real code state (host spoke contract 1.2.0, status: "live" in src/lib/contracts/registry.ts):

Shipped:
- The Leadership Index composite core (computeLeadershipIndex) — pure, stateless, renormalizing weighted blend with the full component ladder — and its route POST /api/spokes/manager-effectiveness/leadership-index (service-key gated, logged, contract-validated on the way in and out). Unit-tested (src/spokes/manager-effectiveness/tests/).
- The LeadershipIndexRequest / LeadershipIndexScore / LeadershipIndexComponent contracts (manager-effectiveness/contracts/types.ts, 1.2.0).
- Both upstream legs as live primitives: forecasting.interval-scoring (registered PA Instrument) and performance-validity directional-alignment (live core + route at POST /api/spokes/performance-validity/directional-alignment).
In flight / planned:
- The directional-alignment route is live but not yet enrolled in the contract registry's performance-validity contract list (only the base performance-validity.alignment Instrument is) — registry enrolment is a small open gap, (TBD).
- org-graph-derived up/down/lateral cohorts (today cohorts are caller-supplied) — (TBD / not yet built).
- A consuming surface — both performix and vela are status: "planned" in the registry, so no buyer meets this Product in production yet.
- A dedicated MCP tool for the Leadership Index composite step (the host spoke registers MCP tools for the MEI surface; a Leadership-Index-specific tool is (TBD)).
- Product-tier packaging — (TBD).

Visual — Tier A (live capture). GET /api/spokes/manager-effectiveness/health reports the host spoke's real shipped status + schema reachability at request time; GET /api/registry lists the host spoke's contract version (1.2.0) and live contract IDs.

The vision

A leadership score that rewards being right and being honest with yourself — measured against reality and against every cohort that experiences you — and that always shows its work.

The direction is to close the elicitation loop end to end: leaders' forward forecasts captured as calibrated intervals, scored automatically as outcomes land, so Predictive Acuity becomes a continuously-refreshing signal rather than a one-time computation; cohorts derived from org-graph so up/down/lateral resolve themselves; and the composite surfaced through a consuming scorecard where the component ladder — not just the headline — is what a leader and their coach actually work from. Every step is additive to the contract spine already shipped, and every step keeps the rule that makes this Product trustworthy: the number is never a black box.