Preference Modeler — plain-language explainer
Preference Modeler turns "what do people actually care about" into a defensible number — collected through surveys built to resist guessing, aggregated only when enough people answered to keep anyone anonymous.
A People Analytics Toolbox component. Built to the portfolio Explainer Standard v1.0. Every claim below is grounded in the spoke's own code and contracts (src/spokes/preference-modeler/, contract 1.6.0); anything not yet built is marked (TBD).
1. What is it?
Preference Modeler is two things in one spoke: a survey engine that runs preference-elicitation instruments (not just rating scales but MaxDiff, conjoint, penny-allocation, paired comparison, and present-vs-future markers), and an aggregation engine that turns the raw responses into ranked preference weights with confidence intervals — gated so a result is never surfaced for a group too small to stay anonymous.
The job it does: answer "given a set of options, which ones do people actually value, by how much, and how sure are we?" — without letting respondents game the answer by saying everything is important, and without exposing any individual when the cohort is thin.
Visual — Tier A (real computed output). A present-vs-future run (the aggregatePresentFuture core, five raters across two dimensions):
POST /api/spokes/preference-modeler/present-future
→ byDimension: [
{ "dimensionId": "decision_speed", "label": "Decision speed",
"avgPresent": 37.6, "avgFuture": 68.4, "changeIntent": 30.8,
"stdDevPresent": 5, "stdDevFuture": 5.3, "raterCount": 5, "changeFlagged": true },
{ "dimensionId": "risk_appetite", "label": "Risk appetite",
"avgPresent": 55, "avgFuture": 58.8, "changeIntent": 3.8,
"stdDevPresent": 3.7, "stdDevFuture": 2.3, "raterCount": 5, "changeFlagged": false }
]
(Real output of the spoke's aggregatePresentFuture core on a five-rater illustrative scenario — see the worked example at the foot of this page.)
2. What problem does it solve — and why is it different?
The pain it removes: rating scales lie about priorities. Ask people to rate ten benefits one through five and almost everything comes back a four or a five — everyone says everything matters. You learn nothing about trade-offs, which is the only thing a decision-maker actually needs.
The difference, stated as a shift:
- FROM a stack of 1–5 ratings where every option scores "important," with no way to see what people would give up.
- TO forced-choice instruments (best-worst, paired, fixed-budget allocation, choice tasks) that make trade-offs unavoidable, scored into a single ranked scale with uncertainty attached.
How it differs from the obvious substitutes:
- vs. a generic survey tool (Qualtrics, Google Forms, etc.) — those collect the responses; they rarely design a balanced MaxDiff task set or fit a choice model. Preference Modeler owns the deterministic task-balancing and the scoring math (best-worst counting, penny totals, paired win-rates, and a multinomial-logit fit for conjoint), so the elicitation and the analysis are one contract, not a spreadsheet you stitch together afterward.
- vs. doing it by hand — by hand, the anonymity gate is a manual judgment call that gets forgotten under deadline. Here it is enforced in code: a result for a cohort below the survey's minimumResponseThreshold returns { status: "blocked" }, never a number.
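A minimal sketch of the gate's contract, assuming only what the explainer states: a response count is compared against minimumResponseThreshold (default 5), and a blocked result carries no weights. The names gateByAnonymity, Gated and AnonymityInfo, and the fields inside the anonymity block, are hypothetical. The real logic lives in core/anonymity-threshold.ts and may differ.

```typescript
// Hypothetical shape of the anonymity block; the spoke's real field names may differ.
interface AnonymityInfo {
  responseCount: number;
  minimumResponseThreshold: number;
}

// Discriminated union: callers must check `status` before any weights exist.
type Gated<T> =
  | { status: "ok"; data: T; anonymity: AnonymityInfo }
  | { status: "blocked"; anonymity: AnonymityInfo };

// Hypothetical helper. The aggregation is passed in as a callback so that,
// below threshold, it never runs and no number can leak out.
function gateByAnonymity<T>(
  responseCount: number,
  minimumResponseThreshold: number | undefined,
  aggregate: () => T,
): Gated<T> {
  const threshold = minimumResponseThreshold ?? 5; // default of 5, per the explainer
  const anonymity: AnonymityInfo = { responseCount, minimumResponseThreshold: threshold };
  if (responseCount < threshold) {
    return { status: "blocked", anonymity };
  }
  return { status: "ok", data: aggregate(), anonymity };
}
```

Passing the aggregation as a callback is a design choice of this sketch. A blocked cohort never reaches the scoring code, so there is no weight that could be logged or returned by accident.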
Visual — Tier B (FROM→TO typographic block). ten options, all rated "important" → forced trade-offs → one ranked weight per option + a confidence band. A rendered comparison block is a follow-up (FU-A).
3. How does it work?
The questions a practitioner asks, with the concrete machinery behind each:
- "How do I stop everyone saying everything matters?" — Choose a forced-choice instrument. The contract supports maxdiff, penny_allocation, paired_comparison, conjoint, plus dual_slider present/future markers and the standard rating/Likert/NPS types (QuestionTypeEnum, 16 types). For MaxDiff the spoke generates a balanced task design: generateMaxDiffTasks runs a seeded greedy BIBD-style balancing that penalizes already-frequent items and already-seen pairs, so every item appears a fair number of times and the design is byte-for-byte reproducible for a given (itemIds, kItems, seed).
- "How do raw clicks become a ranking?" — Each instrument has its own scorer. MaxDiff: score = (best − worst) / appearances, rescaled to 0–100. Penny allocation: each option's share of the total points. Paired comparison: win share. Conjoint: a multinomial-logit (MNL) fit (Newton steps on the choice sets) yielding per-level utilities normalized within each attribute. Survey-bound aggregations also attach a Wilson-score confidence interval so a small sample reads as a wide band, not false precision.
- "How do I avoid outing individuals?" — Every aggregation call runs the anonymity gate (core/anonymity-threshold.ts) against the survey's minimumResponseThreshold (default 5). Below threshold, no weights — { status: "blocked", anonymity: {...} }. With ?bySegment=true, each cohort clears the same bar independently; under-threshold cohorts are omitted entirely rather than reported.
- "Where do the inputs come from?" — Uploaded/collected survey responses are the data source. There is no external feed here: the science is in the elicitation design and the scoring, not in a third-party dataset. Items can carry externalMetadata (rid/sid/studyId) so a question can be tied back to a reincarnation item or an external index.
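The balancing idea behind generateMaxDiffTasks can be sketched as follows. This is not the spoke's implementation. The name sketchMaxDiffTasks, the explicit numTasks parameter, the mulberry32 PRNG, and the lexicographic tie-breaking (item frequency first, then pair repeats, then seeded jitter) are all assumptions, chosen to show why such a design is both balanced and reproducible for a given seed.

```typescript
type Rng = () => number;

// mulberry32: a tiny, well-known seeded PRNG, used here so a seed fixes the design.
function mulberry32(seed: number): Rng {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Order-independent key for an unordered pair of item IDs.
const pairKey = (a: string, b: string): string => (a < b ? `${a}|${b}` : `${b}|${a}`);

// Lexicographic "less than" on equal-length penalty vectors.
const lexLess = (x: number[], y: number[]): boolean => {
  for (let i = 0; i < x.length; i++) {
    if (x[i] !== y[i]) return x[i] < y[i];
  }
  return false;
};

// Hypothetical greedy balancer (assumes itemIds are distinct).
function sketchMaxDiffTasks(itemIds: string[], kItems: number, numTasks: number, seed: number): string[][] {
  if (kItems < 2 || kItems > itemIds.length) {
    throw new Error("kItems must be between 2 and the number of items");
  }
  const rand = mulberry32(seed);
  const appearances = new Map<string, number>();
  itemIds.forEach((id) => appearances.set(id, 0));
  const pairCounts = new Map<string, number>();
  const tasks: string[][] = [];

  for (let t = 0; t < numTasks; t++) {
    const task: string[] = [];
    while (task.length < kItems) {
      let chosen = "";
      let bestKey = [Infinity, Infinity, Infinity];
      for (const id of itemIds) {
        if (task.indexOf(id) >= 0) continue;
        // Penalty 1: how often this item has already been shown.
        const freq = appearances.get(id) ?? 0;
        // Penalty 2: how often it has already appeared with items already in this task.
        const pairs = task.reduce((sum, other) => sum + (pairCounts.get(pairKey(id, other)) ?? 0), 0);
        // Seeded jitter breaks any remaining ties reproducibly.
        const key = [freq, pairs, rand()];
        if (lexLess(key, bestKey)) {
          chosen = id;
          bestKey = key;
        }
      }
      task.push(chosen);
    }
    // Commit the finished task to the running item and pair counts.
    for (const id of task) appearances.set(id, (appearances.get(id) ?? 0) + 1);
    for (let i = 0; i < task.length; i++) {
      for (let j = i + 1; j < task.length; j++) {
        const key = pairKey(task[i], task[j]);
        pairCounts.set(key, (pairCounts.get(key) ?? 0) + 1);
      }
    }
    tasks.push(task);
  }
  return tasks;
}
```

Because item frequency is compared before anything else, each task takes the least-shown items, so appearance counts never drift more than one apart. Pair repeats only break ties among equally frequent items. A weighted penalty, as the explainer describes, behaves the same way whenever the frequency weight dominates.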
Differentiation beat: the practitioner's real question isn't "what's the average rating" — it's "what would this group trade away, and can I trust the order?" Forced-choice elicitation produces the order; the Wilson band and the anonymity gate tell you when to trust it and when to keep quiet.
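For reference, the standard Wilson score interval for a proportion (successes out of n trials) is sketched below. Two things are not stated in this explainer: which proportion the spoke feeds into it (for example a win share or a best-pick rate) and which z value it uses. Here z = 1.96 (about 95%) and the function name wilsonInterval are assumptions.

```typescript
// Wilson score interval for a binomial proportion; z = 1.96 is roughly a 95% interval.
function wilsonInterval(successes: number, n: number, z = 1.96): { lower: number; upper: number } {
  if (n <= 0) throw new Error("n must be positive");
  const p = successes / n;
  const z2 = z * z;
  const denom = 1 + z2 / n;
  // Centre is pulled toward 0.5 when n is small.
  const center = (p + z2 / (2 * n)) / denom;
  // Half-width shrinks roughly like 1 / sqrt(n).
  const half = (z / denom) * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n));
  // Clamp away floating-point overshoot outside [0, 1].
  return { lower: Math.max(0, center - half), upper: Math.min(1, center + half) };
}
```

With 3 successes out of 5, the interval runs from roughly 0.23 to 0.88. At 60 out of 100, the same 60% estimate narrows to about 0.50 to 0.69. That widening at small n is the "wide band, not false precision" behaviour described above.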
Visual — Tier A (real computed output, MaxDiff scoring path). From the scoreMaxDiff core — three benefit items over four tasks, illustrative best/worst counts; score = ((best − worst)/appearances + 1) / 2 × 100:
POST /api/spokes/preference-modeler/maxdiff/score
→ scores: [
{ "itemId": "remote", "score": 87.5, "bestCount": 3, "worstCount": 0, "appearanceCount": 4 },
{ "itemId": "bonus", "score": 50.0, "bestCount": 1, "worstCount": 1, "appearanceCount": 4 },
{ "itemId": "training", "score": 12.5, "bestCount": 0, "worstCount": 3, "appearanceCount": 4 }
]
(Values follow the core's exact formula; counts are an illustrative single-respondent scenario.)
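The counting rule fits in a few lines. scoreMaxDiffSketch and the MaxDiffChoice input shape are hypothetical stand-ins for the spoke's scoreMaxDiff core; only the formula ((best − worst)/appearances + 1) / 2 × 100 comes from this page.

```typescript
// Hypothetical record of one completed MaxDiff task: items shown plus the best/worst picks.
interface MaxDiffChoice {
  shown: string[];
  best: string;
  worst: string;
}

interface MaxDiffScore {
  itemId: string;
  score: number;
  bestCount: number;
  worstCount: number;
  appearanceCount: number;
}

function scoreMaxDiffSketch(tasks: MaxDiffChoice[]): MaxDiffScore[] {
  const tally = new Map<string, { best: number; worst: number; seen: number }>();
  const entry = (id: string) => {
    let t = tally.get(id);
    if (!t) {
      t = { best: 0, worst: 0, seen: 0 };
      tally.set(id, t);
    }
    return t;
  };

  for (const task of tasks) {
    if (task.shown.indexOf(task.best) < 0 || task.shown.indexOf(task.worst) < 0 || task.best === task.worst) {
      throw new Error("best and worst must be two different items from the task");
    }
    for (const id of task.shown) entry(id).seen += 1;
    entry(task.best).best += 1;
    entry(task.worst).worst += 1;
  }

  const scores: MaxDiffScore[] = [];
  tally.forEach((t, itemId) => {
    scores.push({
      itemId,
      // (best - worst) / appearances lies in [-1, 1]; shift and scale it onto 0-100.
      score: (((t.best - t.worst) / t.seen + 1) / 2) * 100,
      bestCount: t.best,
      worstCount: t.worst,
      appearanceCount: t.seen,
    });
  });
  // Highest score first, i.e. the ranked scale.
  return scores.sort((x, y) => y.score - x.score);
}
```

Feed it four tasks showing all three items, with remote picked best three times and bonus once, and training picked worst three times and bonus once. It returns 87.5, 50 and 12.5, matching the illustrative counts above.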
4. What does it enable?
Concrete uses a practitioner would recognize:
- Rank benefits / total-rewards trade-offs — run a MaxDiff over a benefits menu and learn that "remote work" beats "annual bonus" beats "training budget" as a ranked scale, not ten ties.
- Price attribute bundles with conjoint — estimate per-level utilities (e.g. how much a title bump is worth versus a base increase) via the MNL fit, normalized within attribute.
- Capture change intent — present-vs-future dual markers surface where a team thinks the org is versus where they want it to be, flagging dimensions with a material desired shift (changeFlagged when |changeIntent| > 10).
- Segment-safe insight — read preference weights with ?bySegment=true to compare cohorts, with every under-threshold cohort suppressed automatically.
- Allocate a fixed budget — penny-allocation forces respondents to spend a finite pot, exposing real priorities the way a 1–5 scale never will.
- Feed a downstream decision model — emit ranked, weighted preferences into a compensation or analytics-planning loop (see §5).
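The penny-allocation and paired-comparison scorers both reduce to proportions. This is a minimal sketch that assumes penny responses arrive as an option-to-points map and paired responses as (a, b, winner) records. The function names and input shapes are hypothetical, and shares are returned as fractions rather than on a 0–100 scale.

```typescript
// Hypothetical penny response: option ID -> points allocated from a fixed pot.
type PennyResponse = Record<string, number>;

// Each option's share of all points allocated across respondents (0-1).
function pennyShares(responses: PennyResponse[]): Map<string, number> {
  const totals = new Map<string, number>();
  let grandTotal = 0;
  for (const response of responses) {
    for (const optionId of Object.keys(response)) {
      const points = response[optionId];
      if (points < 0) throw new Error(`negative allocation for ${optionId}`);
      totals.set(optionId, (totals.get(optionId) ?? 0) + points);
      grandTotal += points;
    }
  }
  const shares = new Map<string, number>();
  totals.forEach((total, optionId) => shares.set(optionId, grandTotal === 0 ? 0 : total / grandTotal));
  return shares;
}

// Hypothetical paired-comparison record: which of two options the respondent chose.
interface PairedChoice {
  a: string;
  b: string;
  winner: string;
}

// Win share: wins divided by the number of comparisons an option appeared in.
function pairedWinShares(choices: PairedChoice[]): Map<string, number> {
  const wins = new Map<string, number>();
  const appearances = new Map<string, number>();
  for (const c of choices) {
    if (c.winner !== c.a && c.winner !== c.b) throw new Error("winner must be one of the pair");
    appearances.set(c.a, (appearances.get(c.a) ?? 0) + 1);
    appearances.set(c.b, (appearances.get(c.b) ?? 0) + 1);
    wins.set(c.winner, (wins.get(c.winner) ?? 0) + 1);
  }
  const shares = new Map<string, number>();
  appearances.forEach((n, optionId) => shares.set(optionId, (wins.get(optionId) ?? 0) / n));
  return shares;
}
```

Pooling raw points before dividing, rather than averaging per-respondent shares, is one reasonable choice. The two give the same result whenever every respondent spends the same fixed pot.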
Visual — (TBD — a rendered ranked-weights bar with Wilson confidence bands for one survey question).
5. How it fits in the toolbox
Data flow:
- Consumes — collected survey responses (its own preference_modeler schema) and, by type-only reference, segmentation-studio segment node IDs: the bySegment breakdown keys align to SegmentMembership.nodeId, so cohorts resolved by segmentation-studio are the cohorts preferences can be sliced by.
- Cross-spoke wiring — questions carry optional externalMetadata (rid/sid/studyId), the hook by which a reincarnation item or external index attaches to a preference question. Types only; no internal imports.
- Emits — PreferenceWeightsResponse (ranked weights + Wilson CI + anonymity block, discriminated ok | blocked), plus the standalone instrument outputs MaxDiffTasks / MaxDiffScores / PresentFutureResult. Consumers vendor src/spokes/preference-modeler/contracts/types.ts.
- PA Instruments it exposes — three of the toolbox's composable elicitation primitives live here: MaxDiff generate, MaxDiff score, and present↔future markers (each tagged "PA Instrument —" in the registry, stateless, no DB round-trip). These compose into Products — the AnyComp decision layer is the priority-elicitation consumer that asks "which comp objectives matter most."
- Feeds — Performix is the first registered consumer (PFX-4, in-progress). vela is a future consumer.
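A sketch of the per-cohort rule, assuming each response already carries the segmentation-studio node ID of the cohort it belongs to. breakdownBySegment and the segmentNodeId field are hypothetical. The point it illustrates is that each cohort is gated on its own count, and a thin cohort is dropped from the result rather than returned as a blocked entry.

```typescript
// Hypothetical: each response carries its cohort's node ID, which per the
// explainer aligns with segmentation-studio's SegmentMembership.nodeId.
interface SegmentTagged {
  segmentNodeId: string;
}

function breakdownBySegment<R extends SegmentTagged, T>(
  responses: R[],
  minimumResponseThreshold: number,
  aggregate: (cohort: R[]) => T,
): Record<string, T> {
  // Bucket responses by cohort.
  const cohorts = new Map<string, R[]>();
  for (const r of responses) {
    const cohort = cohorts.get(r.segmentNodeId) ?? [];
    cohort.push(r);
    cohorts.set(r.segmentNodeId, cohort);
  }

  const out: Record<string, T> = {};
  cohorts.forEach((cohort, nodeId) => {
    // Each cohort clears the bar on its own; an under-threshold cohort is omitted, not reported.
    if (cohort.length >= minimumResponseThreshold) {
      out[nodeId] = aggregate(cohort);
    }
  });
  return out;
}
```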
Visual — Tier B (typographic data-flow). survey responses → Preference Modeler { elicit · score · anonymity-gate } → ranked weights + CI, with segmentation-studio cohorts as the slicing dimension and AnyComp / Performix as the downstream consumers.
6. Commercialization / packaging
Preference Modeler is a service component, not a standalone product — it is the elicitation-and-aggregation engine that buyer-facing decision surfaces compose (the AnyComp decision layer's priority step is the clearest example).
- Data-license posture: the inputs are the customer's own survey responses — there is no licensed third-party dataset embedded, so the licensing constraints that apply to comp-survey or government data elsewhere in the toolbox do not apply here. Anonymity protection is a product feature (the enforced min-N gate), not a license term.
- Anything about pricing tiers or packaged offerings is (TBD) — not earned yet, so not stated.
Visual — (TBD — product-tier placement diagram showing Preference Modeler under the AnyComp decision surface).
7. The vision
Make trade-offs the unit of measurement — every "which matters more" question in the portfolio answered with a forced-choice instrument, scored into one trustworthy ranked scale, and never surfaced when the cohort is too small to stay anonymous.
The near-term direction is to deepen the conjoint leg and to broaden the stateless PA-Instrument set so more Products can elicit priorities directly. On conjoint, the README flags task generation (a balanced design for adaptive presentation) as a remaining follow-up: the conjoint_tasks table exists but the generator isn't wired, whereas the MNL aggregation already is. Tighter coupling into the universal feedback loop, where preference signals feed back into what content/options each person sees next, is a portfolio-level aim; its shape in this spoke is (TBD).
Visual — (TBD — instrument-coverage map: which elicitation types are wired end-to-end vs. follow-up).
8. Current status
Grounded in the real code state (contract 1.6.0, src/spokes/preference-modeler/):
- Shipped: survey create/get/respond with per-question shape + range validation; anonymity-gated preference aggregation ({ ok | blocked }) with optional ?bySegment=true cohort breakdowns; Wilson-score confidence intervals; aggregation for maxdiff, penny_allocation, paired_comparison, and conjoint (MNL fit, since 1.2.0); deterministic MaxDiff/Conjoint task materialization per respondent (1.3.0); tenant scoping on writes and reads (1.4.0); and three stateless PA Instruments — MaxDiff generate + score (1.5.0) and present↔future markers (1.6.0). Live routes: POST /surveys, GET /surveys/[id], POST /surveys/[id]/responses, POST .../respondents/[id]/tasks, GET .../preferences, POST /maxdiff/generate, POST /maxdiff/score, POST /present-future, GET /health. MCP tools registered. A demo seed (drizzle/0003_pat3_seed.sql + 0010) returns aggregated weights immediately on a fresh deploy.
- In flight / planned: conjoint task generation (balanced BIBD design for adaptive presentation) — table present, generator not yet wired (README follow-up). The spoke README's header still cites CONTRACT_VERSION = "1.1.0", which is stale — the contract file and registry are at 1.6.0 (a doc-drift fix, not a code gap).
Visual — Tier A (live capture). GET /api/spokes/preference-modeler/health reports { spoke, status, contractVersion: "1.6.0", schemaReachable, latencyMs, checkedAt } at request time.
Worked example (real core output): the present-vs-future scenario in §1 is the actual output of aggregatePresentFuture for five raters across two dimensions. On Decision speed, the team places itself at an average of 37.6/100 today and wants 68.4: a changeIntent of +30.8, well past the ±10 material-change threshold, so changeFlagged: true. A cross-rater stdDev of about 5 on both the present and the future marker says the raters broadly agree on each. On Risk appetite, present 55.0 → future 58.8 gives a changeIntent of +3.8, below threshold (changeFlagged: false): the team is roughly where it wants to be. A leader reads this as "we are aligned that decision speed must change materially, and we don't need to touch risk appetite", a prioritized, dispersion-aware answer no 1–5 rating could give. Every number here is computed by the spoke's own core; the rater markers are an illustrative scenario, clearly labeled as such.
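The present-vs-future arithmetic can be sketched as below, assuming one marker per rater per dimension on a 0–100 scale. aggregatePresentFutureSketch and the Marker shape are hypothetical. Population standard deviation and rounding to one decimal are assumptions about how the real aggregatePresentFuture core reports dispersion, and dimension labels are omitted for brevity.

```typescript
// Hypothetical input: one rater's present and future marker on one dimension (0-100).
interface Marker {
  raterId: string;
  dimensionId: string;
  present: number;
  future: number;
}

interface DimensionResult {
  dimensionId: string;
  avgPresent: number;
  avgFuture: number;
  changeIntent: number;
  stdDevPresent: number;
  stdDevFuture: number;
  raterCount: number;
  changeFlagged: boolean;
}

const mean = (xs: number[]): number => xs.reduce((sum, x) => sum + x, 0) / xs.length;

// Population standard deviation (assumption: the real core may use the sample form).
const stdDev = (xs: number[]): number => {
  const m = mean(xs);
  return Math.sqrt(mean(xs.map((x) => (x - m) ** 2)));
};

const round1 = (x: number): number => Math.round(x * 10) / 10;

function aggregatePresentFutureSketch(markers: Marker[], materialChange = 10): DimensionResult[] {
  // Group markers by dimension, keeping first-seen order.
  const byDimension = new Map<string, Marker[]>();
  for (const m of markers) {
    const group = byDimension.get(m.dimensionId) ?? [];
    group.push(m);
    byDimension.set(m.dimensionId, group);
  }

  const results: DimensionResult[] = [];
  byDimension.forEach((group, dimensionId) => {
    const present = group.map((m) => m.present);
    const future = group.map((m) => m.future);
    const changeIntent = mean(future) - mean(present);
    results.push({
      dimensionId,
      avgPresent: round1(mean(present)),
      avgFuture: round1(mean(future)),
      changeIntent: round1(changeIntent),
      stdDevPresent: round1(stdDev(present)),
      stdDevFuture: round1(stdDev(future)),
      raterCount: group.length,
      // Flag a material desired shift in either direction (|changeIntent| > 10 by default).
      changeFlagged: Math.abs(changeIntent) > materialChange,
    });
  });
  return results;
}
```

The flag is computed from the unrounded averages, and rounding is applied only to the reported values, so display precision cannot move a dimension across the threshold.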