Preference Modeler — plain-language explainer

Preference Modeler turns "what do people actually care about" into a defensible number — collected through surveys built to resist guessing, aggregated only when enough people answered to keep anyone anonymous.

A People Analytics Toolbox component. Built to the portfolio Explainer Standard v1.0. Every claim below is grounded in the spoke's own code and contracts (src/spokes/preference-modeler/, contract 1.6.0); anything not yet built is marked (TBD).

1. What is it?

Preference Modeler is two things in one spoke: a survey engine that runs preference-elicitation instruments (not just rating scales but MaxDiff, conjoint, penny-allocation, paired comparison, and present-vs-future markers), and an aggregation engine that turns the raw responses into ranked preference weights with confidence intervals — gated so a result is never surfaced for a group too small to stay anonymous.

The job it does: answer "given a set of options, which ones do people actually value, by how much, and how sure are we?" — without letting respondents game the answer by saying everything is important, and without exposing any individual when the cohort is thin.

Visual — Tier A (real computed output). A present-vs-future run (the aggregatePresentFuture core, five raters across two dimensions):

POST /api/spokes/preference-modeler/present-future
→ byDimension: [
    { "dimensionId": "decision_speed", "label": "Decision speed",
      "avgPresent": 37.6, "avgFuture": 68.4, "changeIntent": 30.8,
      "stdDevPresent": 5, "stdDevFuture": 5.3, "raterCount": 5, "changeFlagged": true },
    { "dimensionId": "risk_appetite", "label": "Risk appetite",
      "avgPresent": 55, "avgFuture": 58.8, "changeIntent": 3.8,
      "stdDevPresent": 3.7, "stdDevFuture": 2.3, "raterCount": 5, "changeFlagged": false }
  ]

(Real output of the spoke's aggregatePresentFuture core on a five-rater illustrative scenario — see the worked example at the foot of this page.)
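
The aggregation behind that output can be sketched in a few lines of TypeScript. This is a minimal sketch, not the spoke's aggregatePresentFuture: the DualMarker input shape, the helper names, the one-decimal rounding and the population form of the standard deviation are all assumptions. Only the output field names and the |changeIntent| > 10 flag rule are taken from this page, and the rater values are hypothetical rather than the scenario behind the numbers above.

```typescript
// Hypothetical input: one present/future marker pair (0-100) per rater per dimension.
interface DualMarker {
  dimensionId: string;
  present: number;
  future: number;
}

interface DimensionSummary {
  dimensionId: string;
  avgPresent: number;
  avgFuture: number;
  changeIntent: number;
  stdDevPresent: number;
  stdDevFuture: number;
  raterCount: number;
  changeFlagged: boolean;
}

const roundTo1 = (x: number): number => Math.round(x * 10) / 10;
const meanOf = (xs: number[]): number => xs.reduce((a, b) => a + b, 0) / xs.length;

// Assumption: population standard deviation; the real core may use the sample form.
function populationStdDev(xs: number[]): number {
  const m = meanOf(xs);
  return Math.sqrt(meanOf(xs.map((x) => (x - m) * (x - m))));
}

function summarizePresentFuture(markers: DualMarker[], flagThreshold = 10): DimensionSummary[] {
  // Group markers by dimension, remembering first-seen order.
  const groups: Record<string, DualMarker[]> = {};
  const order: string[] = [];
  for (const m of markers) {
    let bucket = groups[m.dimensionId];
    if (!bucket) {
      bucket = [];
      groups[m.dimensionId] = bucket;
      order.push(m.dimensionId);
    }
    bucket.push(m);
  }
  return order.map((dimensionId) => {
    const rows = groups[dimensionId] ?? [];
    const present = rows.map((r) => r.present);
    const future = rows.map((r) => r.future);
    const changeIntent = meanOf(future) - meanOf(present);
    return {
      dimensionId,
      avgPresent: roundTo1(meanOf(present)),
      avgFuture: roundTo1(meanOf(future)),
      changeIntent: roundTo1(changeIntent),
      stdDevPresent: roundTo1(populationStdDev(present)),
      stdDevFuture: roundTo1(populationStdDev(future)),
      raterCount: rows.length,
      // Flag on the unrounded difference so rounding cannot flip the result.
      changeFlagged: Math.abs(changeIntent) > flagThreshold,
    };
  });
}

// Hypothetical rater markers (five raters, two dimensions).
const demoSummary = summarizePresentFuture([
  { dimensionId: "decision_speed", present: 30, future: 60 },
  { dimensionId: "decision_speed", present: 40, future: 70 },
  { dimensionId: "decision_speed", present: 35, future: 65 },
  { dimensionId: "decision_speed", present: 45, future: 75 },
  { dimensionId: "decision_speed", present: 40, future: 70 },
  { dimensionId: "risk_appetite", present: 50, future: 55 },
  { dimensionId: "risk_appetite", present: 55, future: 60 },
  { dimensionId: "risk_appetite", present: 60, future: 60 },
  { dimensionId: "risk_appetite", present: 55, future: 58 },
  { dimensionId: "risk_appetite", present: 55, future: 57 },
]);
```

With these hypothetical markers, decision_speed averages 38 today against 68 desired (a changeIntent of 30, flagged), while risk_appetite moves from 55 to 58 and stays unflagged.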

2. What problem does it solve — and why is it different?

The pain it removes: rating scales lie about priorities. Ask people to rate ten benefits one through five and almost everything comes back a four or a five — everyone says everything matters. You learn nothing about trade-offs, which is the only thing a decision-maker actually needs.

The difference, stated as a shift:

FROM a stack of 1–5 ratings where every option scores "important," with no way to see what people would give up.
TO forced-choice instruments (best-worst, paired, fixed-budget allocation, choice tasks) that make trade-offs unavoidable, scored into a single ranked scale with uncertainty attached.

How it differs from the obvious substitutes:

vs. a generic survey tool (Qualtrics, Google Forms, etc.) — those collect the responses; they rarely design a balanced MaxDiff task set or fit a choice model. Preference Modeler owns the deterministic task-balancing and the scoring math (best-worst counting, penny totals, paired win-rates, and a multinomial-logit fit for conjoint), so the elicitation and the analysis are one contract, not a spreadsheet you stitch together afterward.
vs. doing it by hand — by hand, the anonymity gate is a manual judgment call that gets forgotten under deadline. Here it is enforced in code: a result for a cohort below the survey's minimumResponseThreshold returns { status: "blocked" }, never a number.
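
To make "enforced in code" concrete, here is a minimal TypeScript sketch of such a gate. It is an illustration under stated assumptions, not the contents of the spoke's gate: the Gated type, the AnonymityInfo fields and the function names are hypothetical, and only the blocked status and the minimumResponseThreshold name come from this page.

```typescript
// Hypothetical shapes, modeled on the ok | blocked discrimination described above.
interface AnonymityInfo {
  responseCount: number;
  minimumResponseThreshold: number;
}

type Gated<T> =
  | { status: "ok"; weights: T; anonymity: AnonymityInfo }
  | { status: "blocked"; anonymity: AnonymityInfo };

function gateByResponseCount<T>(
  responseCount: number,
  minimumResponseThreshold: number,
  computeWeights: () => T,
): Gated<T> {
  const anonymity: AnonymityInfo = { responseCount, minimumResponseThreshold };
  if (responseCount < minimumResponseThreshold) {
    // The scorer is never invoked, so no number exists to leak.
    return { status: "blocked", anonymity };
  }
  return { status: "ok", weights: computeWeights(), anonymity };
}

// Callers must narrow on status before they can touch weights.
function describeGated(result: Gated<Record<string, number>>): string {
  return result.status === "ok"
    ? "weights for " + Object.keys(result.weights).length + " options"
    : "suppressed: cohort below threshold";
}
```

Passing the scorer as a callback is a deliberate choice in this sketch: the blocked branch returns before any aggregation runs, and the discriminated union means a consumer cannot read weights without first checking the status.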

Visual — Tier B (FROM→TO typographic block). ten options, all rated "important" → forced trade-offs → one ranked weight per option + a confidence band. A rendered comparison block is a follow-up (FU-A).

3. How does it work?

The questions a practitioner asks, with the concrete machinery behind each:

"How do I stop everyone saying everything matters?" — Choose a forced-choice instrument. The contract supports maxdiff, penny_allocation, paired_comparison, conjoint, plus dual_slider present/future markers and the standard rating/Likert/NPS types (QuestionTypeEnum, 16 types). For MaxDiff the spoke generates a balanced task design: generateMaxDiffTasks runs a seeded greedy BIBD-style balancing that penalizes already-frequent items and already-seen pairs, so every item appears a fair number of times and the design is byte-for-byte reproducible for a given (itemIds, kItems, seed).
"How do raw clicks become a ranking?" — Each instrument has its own scorer. MaxDiff: raw score = (best − worst) / appearances, which lies in [−1, 1], rescaled to 0–100 as (raw + 1) / 2 × 100. Penny allocation: each option's share of the total points. Paired comparison: win share. Conjoint: a multinomial-logit (MNL) fit (Newton steps on the choice sets) yielding per-level utilities normalized within each attribute. Survey-bound aggregations also attach a Wilson-score confidence interval so a small sample reads as a wide band, not false precision.
"How do I avoid outing individuals?" — Every aggregation call runs the anonymity gate (core/anonymity-threshold.ts) against the survey's minimumResponseThreshold (default 5). Below threshold, the call returns no weights, only { status: "blocked", anonymity: {...} }. With ?bySegment=true, each cohort must clear the same bar independently; under-threshold cohorts are omitted entirely rather than reported.
"Where do the inputs come from?" — Uploaded/collected survey responses are the data source. There is no external feed here: the science is in the elicitation design and the scoring, not in a third-party dataset. Items can carry externalMetadata (rid / sid / studyId) so a question can be tied back to a reincarnation item or an external index.

Differentiation beat: the practitioner's real question isn't "what's the average rating" — it's "what would this group trade away, and can I trust the order?" Forced-choice elicitation produces the order; the Wilson band and the anonymity gate tell you when to trust it and when to keep quiet.
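
The balanced, reproducible task design described for MaxDiff can be illustrated with a small TypeScript sketch. It is a simplified stand-in, not generateMaxDiffTasks itself: the function names, the taskCount parameter, the linear congruential generator and the strict ordering (fewest appearances first, then fewest repeated pairs, then a seeded draw) are assumptions chosen to keep the example short. The spoke's actual penalty weights are not documented here.

```typescript
// Small seeded PRNG (a classic linear congruential generator): same seed, same draws.
function makeSeededRandom(seed: number): () => number {
  let state = Math.abs(Math.floor(seed)) % 4294967296;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296; // in [0, 1)
  };
}

// Hypothetical helper; taskCount is an added parameter for this sketch.
// Assumes itemIds are unique and contain no "|" character.
function sketchMaxDiffTasks(
  itemIds: string[],
  kItems: number,
  taskCount: number,
  seed: number,
): string[][] {
  if (kItems < 2 || kItems > itemIds.length) {
    throw new Error("kItems must be between 2 and itemIds.length");
  }
  const random = makeSeededRandom(seed);
  const shown: Record<string, number> = {};
  const pairSeen: Record<string, number> = {};
  const pairKey = (a: string, b: string): string => (a < b ? a + "|" + b : b + "|" + a);

  const tasks: string[][] = [];
  for (let t = 0; t < taskCount; t++) {
    const task: string[] = [];
    while (task.length < kItems) {
      let pick = "";
      let bestCount = Infinity;
      let bestPairs = Infinity;
      let bestDraw = Infinity;
      for (const id of itemIds) {
        if (task.indexOf(id) !== -1) continue; // no repeats within a task
        const count = shown[id] ?? 0;
        let pairs = 0;
        for (const other of task) pairs += pairSeen[pairKey(id, other)] ?? 0;
        const draw = random();
        // Prefer fewest appearances, then fewest repeated pairs, then the seeded draw.
        const better =
          count !== bestCount ? count < bestCount
          : pairs !== bestPairs ? pairs < bestPairs
          : draw < bestDraw;
        if (better) {
          pick = id;
          bestCount = count;
          bestPairs = pairs;
          bestDraw = draw;
        }
      }
      task.push(pick);
    }
    // Update appearance and co-appearance counts once the task is complete.
    task.forEach((a, i) => {
      shown[a] = (shown[a] ?? 0) + 1;
      task.slice(i + 1).forEach((b) => {
        const key = pairKey(a, b);
        pairSeen[key] = (pairSeen[key] ?? 0) + 1;
      });
    });
    tasks.push(task);
  }
  return tasks;
}
```

Because every random draw comes from the seeded generator and candidates are visited in a fixed order, two calls with the same arguments return identical task lists. Ranking on appearance count first keeps item frequencies within one of each other after every task.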

Visual — Tier A (real computed output, MaxDiff scoring path). From the scoreMaxDiff core — three benefit items over four tasks, illustrative best/worst counts; score = ((best − worst)/appearances + 1) / 2 × 100:

POST /api/spokes/preference-modeler/maxdiff/score
→ scores: [
    { "itemId": "remote",   "score": 87.5, "bestCount": 3, "worstCount": 0, "appearanceCount": 4 },
    { "itemId": "bonus",    "score": 50.0, "bestCount": 1, "worstCount": 1, "appearanceCount": 4 },
    { "itemId": "training", "score": 12.5, "bestCount": 0, "worstCount": 3, "appearanceCount": 4 }
  ]

(Values follow the core's exact formula; counts are an illustrative single-respondent scenario.)
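
The formula above can be reproduced with a short TypeScript sketch. This is not the scoreMaxDiff core: the BestWorstChoice input shape and the function name are assumptions, and the four choices below are one hypothetical arrangement of tasks that yields the best/worst counts shown in the output.

```typescript
// Hypothetical input: one answered task, listing what was shown and what was picked.
interface BestWorstChoice {
  shown: string[];
  best: string;
  worst: string;
}

interface ItemScore {
  itemId: string;
  score: number;
  bestCount: number;
  worstCount: number;
  appearanceCount: number;
}

function scoreBestWorst(choices: BestWorstChoice[]): ItemScore[] {
  const rows: Record<string, ItemScore> = {};
  const order: string[] = [];
  const rowFor = (itemId: string): ItemScore => {
    let row = rows[itemId];
    if (!row) {
      row = { itemId, score: 0, bestCount: 0, worstCount: 0, appearanceCount: 0 };
      rows[itemId] = row;
      order.push(itemId);
    }
    return row;
  };
  for (const choice of choices) {
    for (const itemId of choice.shown) rowFor(itemId).appearanceCount += 1;
    rowFor(choice.best).bestCount += 1;
    rowFor(choice.worst).worstCount += 1;
  }
  return order
    .map((itemId) => rowFor(itemId))
    .map((row) => {
      // Raw score lies in [-1, 1]; shift and scale it onto 0-100.
      const raw =
        row.appearanceCount > 0 ? (row.bestCount - row.worstCount) / row.appearanceCount : 0;
      return { ...row, score: ((raw + 1) / 2) * 100 };
    })
    .sort((a, b) => b.score - a.score);
}

const benefitItems = ["remote", "bonus", "training"];
const demoScores = scoreBestWorst([
  { shown: benefitItems, best: "remote", worst: "training" },
  { shown: benefitItems, best: "remote", worst: "training" },
  { shown: benefitItems, best: "bonus", worst: "training" },
  { shown: benefitItems, best: "remote", worst: "bonus" },
]);
// demoScores: remote 87.5, bonus 50, training 12.5
```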

4. What does it enable?

Concrete uses a practitioner would recognize:

Rank benefits / total-rewards trade-offs — run a MaxDiff over a benefits menu and learn that "remote work" beats "annual bonus" beats "training budget" as a ranked scale, not ten ties.
Price attribute bundles with conjoint — estimate per-level utilities (e.g. how much a title bump is worth versus a base increase) via the MNL fit, normalized within attribute.
Capture change intent — present-vs-future dual markers surface where a team thinks the org is versus where they want it, flagging dimensions with a material desired shift (changeFlagged when |changeIntent| > 10).
Segment-safe insight — read preference weights ?bySegment=true to compare cohorts, with every under-threshold cohort suppressed automatically.
Allocate a fixed budget — penny-allocation forces respondents to spend a finite pot, exposing real priorities the way a 1–5 scale never will.
Feed a downstream decision model — emit ranked, weighted preferences into a compensation or analytics-planning loop (see §5).
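
Two of the simpler scorers behind these uses, penny allocation and paired comparison, fit in a few lines of TypeScript. This is a minimal sketch with hypothetical names and input shapes, not the spoke's code. It assumes "share of the total points" is pooled across all respondents, and that "win share" means wins divided by the comparisons an option took part in.

```typescript
// Each respondent's allocation maps option id -> points spent from the fixed pot.
function pennyShares(allocations: Record<string, number>[]): Record<string, number> {
  const totals: Record<string, number> = {};
  let grandTotal = 0;
  for (const allocation of allocations) {
    for (const optionId of Object.keys(allocation)) {
      const points = allocation[optionId] ?? 0;
      totals[optionId] = (totals[optionId] ?? 0) + points;
      grandTotal += points;
    }
  }
  const shares: Record<string, number> = {};
  for (const optionId of Object.keys(totals)) {
    shares[optionId] = grandTotal > 0 ? (totals[optionId] ?? 0) / grandTotal : 0;
  }
  return shares;
}

// Hypothetical shape: one paired comparison and its outcome.
interface PairedOutcome {
  winner: string;
  loser: string;
}

// Assumption: win share = wins / comparisons the option appeared in.
function pairedWinShares(outcomes: PairedOutcome[]): Record<string, number> {
  const wins: Record<string, number> = {};
  const played: Record<string, number> = {};
  for (const { winner, loser } of outcomes) {
    wins[winner] = (wins[winner] ?? 0) + 1;
    played[winner] = (played[winner] ?? 0) + 1;
    played[loser] = (played[loser] ?? 0) + 1;
  }
  const shares: Record<string, number> = {};
  for (const optionId of Object.keys(played)) {
    shares[optionId] = (wins[optionId] ?? 0) / (played[optionId] ?? 1);
  }
  return shares;
}

// Two hypothetical respondents, each spending a pot of 100 points.
const demoPenny = pennyShares([
  { remote: 60, bonus: 30, training: 10 },
  { remote: 40, bonus: 40, training: 20 },
]);
// demoPenny: remote 0.5, bonus 0.35, training 0.15
```

Because the pot is fixed, raising one option's share necessarily lowers another's, which is the trade-off a 1–5 scale never forces.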

Visual — (TBD — a rendered ranked-weights bar with Wilson confidence bands for one survey question).
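
The Wilson band itself can be sketched numerically. The TypeScript below implements the standard Wilson score interval for a binomial proportion. The function name and the default z = 1.96 (roughly 95% confidence) are assumptions, as is which proportion the spoke feeds into it; a paired-comparison win rate would be one natural candidate.

```typescript
interface Interval {
  lower: number;
  upper: number;
}

// Wilson score interval for `successes` out of `n` trials.
function wilsonInterval(successes: number, n: number, z = 1.96): Interval {
  if (n <= 0) return { lower: 0, upper: 1 }; // no data: the band is the whole range
  const p = successes / n;
  const z2 = z * z;
  const denom = 1 + z2 / n;
  const center = (p + z2 / (2 * n)) / denom;
  const half = (z / denom) * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n));
  return { lower: Math.max(0, center - half), upper: Math.min(1, center + half) };
}

// The same 80% share, observed on 5 responses and on 100 responses.
const thinBand = wilsonInterval(4, 5);
const thickBand = wilsonInterval(80, 100);
```

Here thinBand is several times wider than thickBand even though both describe an 80% share, which is how a small sample reads as a wide band rather than as false precision. Unlike the simpler normal approximation, the Wilson interval also stays inside 0–1 when the observed share is at or near either end.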

5. How it fits in the toolbox

Data flow:

Consumes — collected survey responses (its own preference_modeler schema) and, by type-only reference, segmentation-studio segment node IDs: the bySegment breakdown keys align to SegmentMembership.nodeId, so cohorts resolved by segmentation-studio are the cohorts preferences can be sliced by.
Cross-spoke wiring — questions carry optional externalMetadata (rid / sid / studyId), the hook by which a reincarnation item or external index attaches to a preference question. Types only; no internal imports.
Emits — PreferenceWeightsResponse (ranked weights + Wilson CI + anonymity block, discriminated ok | blocked), plus the standalone instrument outputs MaxDiffTasks / MaxDiffScores / PresentFutureResult. Consumers vendor src/spokes/preference-modeler/contracts/types.ts.
PA Instruments it exposes — three of the toolbox's composable elicitation primitives live here: MaxDiff generate, MaxDiff score, and present↔future markers (each tagged PA Instrument in the registry; stateless, no DB round-trip). These compose into Products: the AnyComp decision layer is the priority-elicitation consumer that asks "which comp objectives matter most."
Feeds — Performix is the first registered consumer (PFX-4, in-progress). vela is a future consumer.
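
How segment keys and the anonymity bar interact can be sketched as a filter over cohorts. This TypeScript is illustrative only: the CohortResponse shape and the function name are hypothetical, and counting distinct respondents (rather than raw response rows) is an assumption about how cohort size is measured. The nodeId key and the default threshold of 5 are taken from this page.

```typescript
// Hypothetical row: one response tagged with the segment node its respondent belongs to.
interface CohortResponse {
  respondentId: string;
  nodeId: string;
}

// Returns only the nodeIds whose distinct-respondent count clears the threshold.
// Under-threshold cohorts are dropped from the result rather than reported as small.
function reportableCohorts(responses: CohortResponse[], minimumResponseThreshold = 5): string[] {
  const respondentsByNode: Record<string, Record<string, true>> = {};
  for (const response of responses) {
    const seen: Record<string, true> = respondentsByNode[response.nodeId] ?? {};
    seen[response.respondentId] = true;
    respondentsByNode[response.nodeId] = seen;
  }
  return Object.keys(respondentsByNode).filter((nodeId) => {
    const distinct = Object.keys(respondentsByNode[nodeId] ?? {}).length;
    return distinct >= minimumResponseThreshold;
  });
}
```

A consumer slicing by segment would then compute weights only for the returned nodeIds, so a thin cohort never appears in the output at all, not even as a labeled gap.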

Visual — Tier B (typographic data-flow). survey responses → Preference Modeler { elicit · score · anonymity-gate } → ranked weights + CI, with segmentation-studio cohorts as the slicing dimension and AnyComp / Performix as the downstream consumers.

6. Commercialization / packaging

Preference Modeler is a service component, not a standalone product — it is the elicitation-and-aggregation engine that buyer-facing decision surfaces compose (the AnyComp decision layer's priority step is the clearest example).

Data-license posture: the inputs are the customer's own survey responses — there is no licensed third-party dataset embedded, so the licensing constraints that apply to comp-survey or government data elsewhere in the toolbox do not apply here. Anonymity protection is a product feature (the enforced min-N gate), not a license term.
Anything about pricing tiers or packaged offerings is (TBD) — not earned yet, so not stated.

Visual — (TBD — product-tier placement diagram showing Preference Modeler under the AnyComp decision surface).

7. The vision

Make trade-offs the unit of measurement — every "which matters more" question in the portfolio answered with a forced-choice instrument, scored into one trustworthy ranked scale, and never surfaced when the cohort is too small to stay anonymous.

The near-term direction is to deepen the conjoint leg (the README flags conjoint task generation — balanced design for adaptive presentation — as a remaining follow-up; the conjoint_tasks table exists but the generator isn't wired, whereas the MNL aggregation already is) and to broaden the stateless PA-Instrument set so more Products can elicit priorities directly. Tighter coupling into the universal feedback loop — where preference signals feed back into what content/options each person sees next — is a portfolio-level aim, (TBD) here.

Visual — (TBD — instrument-coverage map: which elicitation types are wired end-to-end vs. follow-up).

8. Current status

Grounded in the real code state (contract 1.6.0, src/spokes/preference-modeler/):

Shipped: survey create/get/respond with per-question shape + range validation; anonymity-gated preference aggregation ({ ok | blocked }) with optional ?bySegment=true cohort breakdowns; Wilson-score confidence intervals; aggregation for maxdiff, penny_allocation, paired_comparison, and conjoint (MNL fit, since 1.2.0); deterministic MaxDiff/Conjoint task materialization per respondent (1.3.0); tenant scoping on writes and reads (1.4.0); and three stateless PA Instruments — MaxDiff generate + score (1.5.0) and present↔future markers (1.6.0). Live routes: POST /surveys, GET /surveys/[id], POST /surveys/[id]/responses, POST .../respondents/[id]/tasks, GET .../preferences, POST /maxdiff/generate, POST /maxdiff/score, POST /present-future, GET /health. MCP tools registered. A demo seed (drizzle/0003_pat3_seed.sql + 0010) returns aggregated weights immediately on a fresh deploy.
In flight / planned: conjoint task generation (BIBD-style balanced design for adaptive presentation); the table is present but the generator is not yet wired (README follow-up). The spoke README's header still cites CONTRACT_VERSION = "1.1.0", which is stale: the contract file and registry are at 1.6.0 (a doc-drift fix, not a code gap).

Visual — Tier A (live capture). GET /api/spokes/preference-modeler/health reports { spoke, status, contractVersion: "1.6.0", schemaReachable, latencyMs, checkedAt } at request time.

Worked example (real core output): the present-vs-future scenario in §1 is the actual output of aggregatePresentFuture for five raters across two dimensions. On Decision speed the team places itself at an average of 37.6/100 today and wants 68.4, a changeIntent of +30.8 that is well past the ±10 material-change threshold, so changeFlagged: true; the cross-rater stdDev of about 5 on both the present and the future marker says the raters broadly agree on each. On Risk appetite, present 55.0 → future 58.8 is a changeIntent of +3.8, below threshold (changeFlagged: false): the team is roughly where it wants to be. A leader reads this as "we are aligned that decision speed must change materially, and we don't need to touch risk appetite", which is a prioritized, dispersion-aware answer no 1–5 rating could give. Every number here is computed by the spoke's own core; the rater markers are an illustrative scenario, clearly labeled as such.