Survey Orchestrator — plain-language explainer

Survey Orchestrator decides who gets asked what survey when — and routes each program to the right measurement engine — without ever owning the questions or the answers.

A People Analytics Toolbox component. Built to the portfolio Explainer Standard v1.0. This is an honest status read: Survey Orchestrator is a scout / scaffold slice (PAT-D2), not a live spoke. Only a feasibility memo and a contract sketch exist today (src/spokes/survey-orchestrator/, contract 0.1.0). There is no database schema, no API route, no MCP module, and no entry in src/lib/contracts/registry.ts. Everything that is not yet built is marked (TBD / not yet built) — most of this document is, by design.

1. What is it?

Survey Orchestrator is the program-of-programs scheduler for survey workloads: it maps operator intent — the Big Annual engagement census, quarterly pulses, random-sample micro-surveys, lifecycle touchpoints (post-offer, onboarding, exit), and voluntary longitudinal panels — onto calendar windows, event triggers, targeted audiences, per-respondent fatigue limits, and a measurement engine to actually run the instrument.

Critically, it is not a survey engine. The toolbox already has those: preference-modeler for conjoint / forced-choice / penny-allocation elicitation, and reincarnation for adaptive psychometric measurement. Survey Orchestrator sits above them. It selects an engine per program, materializes the audience, hands off to the engine, and stores only pointers to the engine-side session — never a second copy of the response payload.

Visual — Tier D. (TBD — a scope diagram showing the orchestrator as a conductor above the preference-modeler and reincarnation engines. No live surface exists to capture; this is a scaffold slice.)

2. What problem does it solve — and why is it different?

The pain it removes: in most organizations, survey cadence is run by hand, with a spreadsheet of who-got-asked-what-when, a calendar invite for the annual census, and no shared memory of how many times any one person has already been pinged this quarter. Two well-meaning teams launch overlapping pulses; a departing employee gets an exit survey and a quarterly engagement survey in the same week; longitudinal panels and one-off micro-surveys compete for the same inbox.

The difference, stated as a shift:

FROM survey scheduling as scattered, per-team calendars with no respondent-level throttle and no shared engine.
TO a single program catalog where cadence, triggers, audience, and a fatigue budget are first-class — and where each program declares which measurement engine runs it.

How it would differ from the obvious substitutes:

vs. a generic survey tool's scheduler (Qualtrics/SurveyMonkey workflows) — those schedule one tool's surveys. The orchestrator is engine-agnostic: a single program can route to adaptive psychometrics (reincarnation) or preference elicitation (preference-modeler), and the fatigue budget spans all of them, not one vendor's silo.
vs. doing it by hand — the per-respondent rolling-window fatigue cap and the priority-based contention resolver (a post-offer touchpoint preempting a low-priority quarterly pulse) are exactly the bookkeeping that hand-run calendars get wrong.

Visual — Tier B (FROM→TO typographic block). The shift above is the visual. A rendered comparison block is a follow-up once the slice is lifted.

3. How does it work?

Inputs → method → outputs, as sketched in the contract (contracts/types.ts, 0.1.0) — none of this is wired to a running endpoint yet.

Inputs: a SurveyProgram (cadence: annual | quarterly | event | longitudinal; engine: preference-modeler | reincarnation | hybrid; lifecycle status: draft → scheduled_ready → active → paused → retired), a Cadence (a cron-style expression + timezone + holiday policy + blackout ranges + optional dispatch jitter), one or more Triggers (source: ats | hris | manual, an opaque eventType, optional delayMs), AudienceRules (a segmentId plus opaque include/exclude rule strings), and a FatigueRule (a tenant default maxSurveysPerWindow over windowDays, with perProgramOverrides carrying a priority for contention).
Method (planned, per the SCOUT memo):
1. A cadence fires (calendar) or a trigger fires (an ATS/HRIS webhook, patterned on the existing wage-compliance Greenhouse webhook: verify signature → normalize → correlate to a Trigger).
2. Materialize the audience by calling segmentation-studio to resolve the segmentId and rules into a concrete set of principals — never by cross-importing segmentation's internals.
3. Apply the fatigue budget: count each principal's completed orchestrated touches inside the rolling window, and drop or defer anyone already at the cap. When programs contend for the same principal, priority decides which one asks, so lifecycle signals preempt routine pulses.
4. Dispatch to the engine — authenticate with the service key, POST to the chosen engine's documented surface, and store the returned session id as the Wave's engineInstanceId.
5. Track completion via pointers — wave_responses rows hold only { waveId, principalId, engineResponseId, completedAt }. The engine stays canonical for questions, invitations, reminders, scoring, and anonymity thresholds.
Outputs: a Wave (one dispatch occurrence: scheduledAt, an AudienceSnapshot with a stable snapshotHash, the engineInstanceId, and a lifecycle status from scheduled → materializing_audience → dispatch_pending → live → collecting → closed_complete, plus cancelled / failed), and pointer rows linking principals to engine responses.
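Step 3 is the part of the method most worth making concrete. The following is a minimal sketch, assuming the orchestrator can read a log of completed orchestrated touches; `TouchRecord`, `FatigueBudget`, and `admit` are hypothetical names, not the 0.1.0 contract, and letting a per-program override replace the tenant cap is an assumption drawn from the longitudinal-panel use case. Priority is left out because it only matters when programs contend. Whether a hybrid program costs one touch or two is still an open fork, so it is a parameter here rather than a decision.

```typescript
// Hypothetical sketch of the rolling-window fatigue gate (method step 3).
// Names and shapes are illustrative; they are not the 0.1.0 contract.

type Engine = "preference-modeler" | "reincarnation" | "hybrid";

interface TouchRecord {
  principalId: string;
  completedAt: Date; // only completed orchestrated touches are logged
}

interface FatigueBudget {
  windowDays: number;
  maxSurveysPerWindow: number;
  // Assumption: a per-program override may replace the tenant cap for that program.
  perProgramOverrides?: Record<string, { maxSurveysPerWindow?: number }>;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Count one principal's completed touches inside the window (now - windowDays, now].
function touchesInWindow(log: readonly TouchRecord[], principalId: string, now: Date, windowDays: number): number {
  const end = now.getTime();
  const start = end - windowDays * DAY_MS;
  return log.filter((t) => {
    const at = t.completedAt.getTime();
    return t.principalId === principalId && at > start && at <= end;
  }).length;
}

// Split an audience into principals admitted to this wave and principals over the cap.
// `hybridCountsAs` encodes the open fork: does one hybrid touch cost 1 or 2?
function admit(
  audience: readonly string[],
  log: readonly TouchRecord[],
  budget: FatigueBudget,
  programId: string,
  engine: Engine,
  now: Date,
  hybridCountsAs: 1 | 2 = 1,
): { admitted: string[]; deferred: string[] } {
  const cap = budget.perProgramOverrides?.[programId]?.maxSurveysPerWindow ?? budget.maxSurveysPerWindow;
  const cost = engine === "hybrid" ? hybridCountsAs : 1;
  const admitted: string[] = [];
  const deferred: string[] = [];
  for (const principalId of audience) {
    const used = touchesInWindow(log, principalId, now, budget.windowDays);
    if (used + cost <= cap) admitted.push(principalId);
    else deferred.push(principalId);
  }
  return { admitted, deferred };
}
```

The gate returns the over-cap principals instead of discarding them, so the caller can still choose between dropping and deferring, which the method leaves open.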

Data sources named honestly: the orchestrator itself stores no PII rows. It consumes uploaded HRIS/ATS event data (via triggers) and cohorts resolved by segmentation-studio (audiences), and it emits routing metadata downstream of collection to data-anonymizer, which applies the privacy floor. It does not consume Principia priors, BLS, O*NET, or NAICS; those belong to other spokes. Orchestration is plumbing, not measurement.

Differentiation beat: the practitioner's real question isn't "what's in the survey" — it's "are we over-asking this person, and is the right tool running each program?" The fatigue budget and the per-program engine field answer exactly that, across engines, in one place.

Visual — Tier B (step flow). Cadence/Trigger fires → segmentation-studio resolves audience → fatigue budget filters principals → engine POST (preference-modeler | reincarnation) → Wave records engineInstanceId + pointer rows.
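The step flow can be sketched end to end with each collaborator injected as an interface, which keeps segmentation-studio and the engines behind their own surfaces. This is a sketch under stated assumptions: `AudienceResolver`, `FatigueGate`, `EngineClient`, `ProgramRef`, and `runWave` are hypothetical, the service-key POST is abstracted behind `startSession`, and the later live → collecting → closed_complete moves (driven by engine completion) are not shown. Only the Wave status names come from the contract sketch.

```typescript
// Hypothetical end-to-end sketch of one cadence or trigger firing.
// Every interface here is illustrative; none is the 0.1.0 contract.

type WaveStatus =
  | "scheduled"
  | "materializing_audience"
  | "dispatch_pending"
  | "live"
  | "collecting"
  | "closed_complete"
  | "cancelled"
  | "failed";

interface AudienceResolver {
  // Stands in for a contract-level call to segmentation-studio.
  resolve(segmentId: string, include: string[], exclude: string[]): Promise<string[]>;
}

interface FatigueGate {
  // Returns only the principals still under their rolling-window cap.
  filter(programId: string, principals: string[]): Promise<string[]>;
}

interface EngineClient {
  // Stands in for the authenticated POST to the chosen engine; resolves to its session id.
  startSession(programId: string, principals: string[]): Promise<string>;
}

interface WaveRecord {
  waveId: string;
  programId: string;
  status: WaveStatus;
  engineInstanceId: string | null;
  audienceSize: number;
}

interface ProgramRef {
  programId: string;
  segmentId: string;
  include: string[];
  exclude: string[];
}

async function runWave(
  waveId: string,
  program: ProgramRef,
  deps: { audience: AudienceResolver; fatigue: FatigueGate; engine: EngineClient },
): Promise<WaveRecord> {
  const wave: WaveRecord = { waveId, programId: program.programId, status: "scheduled", engineInstanceId: null, audienceSize: 0 };
  try {
    wave.status = "materializing_audience";
    const resolved = await deps.audience.resolve(program.segmentId, program.include, program.exclude);
    const eligible = await deps.fatigue.filter(program.programId, resolved);
    wave.audienceSize = eligible.length;
    wave.status = "dispatch_pending";
    // Keep only the engine's opaque session id; no questions or responses are stored.
    wave.engineInstanceId = await deps.engine.startSession(program.programId, eligible);
    wave.status = "live";
  } catch {
    wave.status = "failed"; // any collaborator failure fails the wave
  }
  return wave;
}
```

Injecting the collaborators is what makes the "contract import, not internal import" rule enforceable: the orchestrator only ever sees these narrow surfaces.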

4. What does it enable?

Concrete uses a practitioner would recognize — all planned, none shippable today:

Run a Big Annual engagement census on a fixed calendar window with a holiday-aware schedule and blackout ranges, without colliding with in-flight pulses.
Fire lifecycle touchpoints off ATS/HRIS events — a post-offer survey on candidate.hired, an exit survey on a termination event — with an optional SLA delay before dispatch.
Protect respondents from over-surveying with a tenant-wide rolling-window cap, while letting voluntary longitudinal panels opt into a heavier load via per-program overrides.
Resolve contention deterministically — when a high-priority lifecycle survey and a routine quarterly pulse both want the same person in the same window, priority decides who asks.
Route a single program to the right engine — adaptive psychometrics for a measurement program, preference elicitation for a tradeoff program, or hybrid for both in one touch.
Audit which engine session belongs to which wave and principal via the pointer rows, without the orchestrator ever holding response payloads.
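Deterministic contention can be as simple as a total ordering over the competing requests for one principal. The sketch below assumes that a larger priority number wins, that ties fall back to the earlier request and then to programId, and that the principal's remaining fatigue budget is already known; `SurveyRequest`, `compareRequests`, and `resolveContention` are hypothetical names.

```typescript
// Hypothetical sketch: deterministic contention resolution for one principal.
// Assumptions: a larger priority wins; ties fall back to the earlier request, then programId.

interface SurveyRequest {
  programId: string;
  priority: number;
  requestedAt: number; // epoch milliseconds
}

function compareRequests(a: SurveyRequest, b: SurveyRequest): number {
  if (a.priority !== b.priority) return b.priority - a.priority; // higher priority first
  if (a.requestedAt !== b.requestedAt) return a.requestedAt - b.requestedAt; // earlier first
  return a.programId < b.programId ? -1 : a.programId > b.programId ? 1 : 0; // final, total tie-break
}

// Grant as many requests as the principal's remaining budget allows; defer the rest.
function resolveContention(
  requests: readonly SurveyRequest[],
  remainingBudget: number,
): { granted: string[]; deferred: string[] } {
  const ordered = [...requests].sort(compareRequests);
  const slots = Math.max(0, remainingBudget);
  return {
    granted: ordered.slice(0, slots).map((r) => r.programId),
    deferred: ordered.slice(slots).map((r) => r.programId),
  };
}
```

The total order matters more than the particular tie-break: two runs over the same requests always grant the same programs, which is what "deterministically" has to mean for an audit.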

Visual — Tier D. (TBD — a program catalog screenshot. No operator surface is built.)

5. How it fits in the toolbox

Data flow, as designed in the SCOUT memo — the dependency directions are real even though the wiring is not:

Would consume — segmentation-studio (audience resolves; contract import is acceptable, internal-module imports are not), preference-modeler and reincarnation (dispatch targets, returning the session ids stored as engineInstanceId), and uploaded ATS/HRIS event streams (triggers). It would borrow the wage-compliance Greenhouse-webhook ergonomics (verify → normalize → enqueue) as a precedent, not a code import.
Would feed — data-anonymizer, downstream of collection, as the privacy floor on any rollup; and operator survey-program dashboards.
The contract it would emit — SurveyProgram, Cadence, Trigger, AudienceRule, FatigueRule, Wave, and SurveyOrchestratorResponsePointer, today sketched in src/spokes/survey-orchestrator/contracts/types.ts (the one type it already imports from a sibling is SegmentMembership["nodeId"] from segmentation-studio). Consumers would vendor this contract once it hardens past 0.1.0.
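The borrowed webhook ergonomics (verify → normalize → enqueue) can be sketched as one intake function. This is a sketch under stated assumptions: the raw payload shape (`action`, `payload.person_id`, `occurred_at`) is invented for illustration and is not Greenhouse's or any HRIS vendor's schema, signature verification is injected because the real check is vendor-specific, and `intake`, `normalize`, `RawEvent`, and `QueuedDispatch` are hypothetical. Correlation matches on source plus the opaque eventType, and a Trigger's optional delayMs becomes a not-before time.

```typescript
// Hypothetical sketch of trigger intake: verify -> normalize -> correlate -> enqueue.
// The raw payload shape and the verifier are assumptions, not a vendor API.

type TriggerSource = "ats" | "hris" | "manual";

interface Trigger {
  triggerId: string;
  programId: string;
  source: TriggerSource;
  eventType: string; // opaque, e.g. "candidate.hired"
  delayMs?: number;
}

interface NormalizedEvent {
  source: TriggerSource;
  eventType: string;
  principalId: string;
  occurredAt: number; // epoch milliseconds
}

interface QueuedDispatch {
  triggerId: string;
  programId: string;
  principalId: string;
  notBefore: number; // occurredAt + delayMs
}

// Illustrative raw shape only.
interface RawEvent {
  action?: unknown;
  payload?: { person_id?: unknown };
  occurred_at?: unknown;
}

function normalize(source: TriggerSource, body: unknown): NormalizedEvent | null {
  if (typeof body !== "object" || body === null) return null;
  const raw = body as RawEvent;
  const action = raw.action;
  const personId = raw.payload?.person_id;
  const occurred = raw.occurred_at;
  if (typeof action !== "string" || typeof personId !== "string" || typeof occurred !== "string") return null;
  const occurredAt = Date.parse(occurred);
  if (Number.isNaN(occurredAt)) return null;
  return { source, eventType: action, principalId: personId, occurredAt };
}

function intake(
  source: TriggerSource,
  rawBody: string,
  signature: string,
  verify: (rawBody: string, signature: string) => boolean, // vendor-specific check, injected
  triggers: readonly Trigger[],
): QueuedDispatch[] {
  // Verify against the raw body before parsing anything.
  if (!verify(rawBody, signature)) throw new Error("signature verification failed");
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawBody);
  } catch {
    return []; // malformed body: nothing to correlate
  }
  const normalized = normalize(source, parsed);
  if (normalized === null) return [];
  // Correlate on source + opaque eventType; each match becomes a delayed dispatch.
  return triggers
    .filter((t) => t.source === normalized.source && t.eventType === normalized.eventType)
    .map((t) => ({
      triggerId: t.triggerId,
      programId: t.programId,
      principalId: normalized.principalId,
      notBefore: normalized.occurredAt + (t.delayMs ?? 0),
    }));
}
```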

Visual — Tier B (typographic data-flow). ATS/HRIS triggers + segmentation-studio cohorts → Survey Orchestrator (cadence · fatigue · engine routing) → { preference-modeler | reincarnation } → pointer rows → data-anonymizer floor.

6. Commercialization / packaging

Survey Orchestrator is positioned as an internal scheduling and routing layer beneath survey-driven offerings, not a standalone product — it is the conductor, not the instrument the buyer hears.

Data-license posture: it stores no response payloads and no PII rows by design (pointers only), which keeps the privacy and licensing surface with the engines and with data-anonymizer, where it belongs.
Everything about pricing, tiers, or packaged offerings is (TBD) — and especially so here: the slice is pre-lift, so any commercialization claim would be unearned.
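"Pointers only" is easiest to keep when the projection is mechanical rather than a convention. A minimal sketch, assuming a hypothetical engine completion callback that may carry the answer payload: only the four wave_responses fields are copied, and recording is idempotent on (waveId, engineResponseId) so a retried callback does not add a second row. `EngineCompletion`, `toPointer`, and `PointerStore` are illustrative names, and the idempotency key is an assumption.

```typescript
// Hypothetical sketch: project engine completion callbacks onto pointer rows.
// The callback shape is assumed; the row fields follow the wave_responses description.

interface ResponsePointer {
  readonly waveId: string;
  readonly principalId: string;
  readonly engineResponseId: string;
  readonly completedAt: string; // ISO-8601
}

interface EngineCompletion {
  waveId: string;
  principalId: string;
  engineResponseId: string;
  completedAt: string;
  answers?: unknown; // engine-side payload that must never be persisted here
  [extra: string]: unknown;
}

// Copy exactly the four pointer fields; answers, scores, and any other metadata are dropped.
function toPointer(c: EngineCompletion): ResponsePointer {
  return {
    waveId: c.waveId,
    principalId: c.principalId,
    engineResponseId: c.engineResponseId,
    completedAt: c.completedAt,
  };
}

class PointerStore {
  private readonly rows = new Map<string, ResponsePointer>();

  // Idempotent on (waveId, engineResponseId): a retried callback returns the existing row.
  record(c: EngineCompletion): ResponsePointer {
    const pointer = toPointer(c);
    const key = `${pointer.waveId}|${pointer.engineResponseId}`;
    const existing = this.rows.get(key);
    if (existing !== undefined) return existing;
    this.rows.set(key, pointer);
    return pointer;
  }

  size(): number {
    return this.rows.size;
  }
}
```

Copying fields by name, rather than deleting known payload keys, means a new engine-side field cannot leak into the pointer table by accident.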

Visual — Tier D. (TBD — product-tier placement diagram. Not earned at scout stage.)

7. The vision

A single conductor for every survey a workforce ever receives — one place where cadence, triggers, audiences, fatigue, and engine choice are coordinated, so the right person is asked the right thing at the right time, exactly as often as is fair to them.

The intended path is the standard spoke-onboarding lift (PAT-D2-B): land the survey_orchestrator Postgres schema (survey_programs, program_cadences, triggers, audience_rules, fatigue_rules, waves, wave_responses, heartbeat), the webhook and dispatch routes under /api/spokes/survey-orchestrator/*, an MCP module, and registry + health-aggregate registration — then harden the contract past 0.1.0. The SCOUT memo flags the real open forks to resolve at lift: audience-snapshot freezing vs. late joiners, program cancellation semantics (cancelled-status + best-effort engine revoke), reminder ownership (engine-local), synchronous vs. queued trigger evaluation, and how a hybrid program counts against the fatigue budget (one touch or two).
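Until the cancellation fork is settled, an explicit transition table keeps the Wave lifecycle from drifting. In the sketch below only the status names come from the contract sketch; the edges are assumptions (the happy path advances one step at a time, any non-terminal status may move to cancelled or failed, terminal statuses never move). `cancelWave` models one reading of "cancelled-status + best-effort engine revoke": the local status always changes, while the revoke is attempted and its failure is reported rather than thrown. `revoke` stands in for a hypothetical engine call.

```typescript
// Hypothetical sketch of the Wave lifecycle as an explicit transition table.
// Status names come from the contract sketch; the allowed edges are assumptions.

type WaveStatus =
  | "scheduled"
  | "materializing_audience"
  | "dispatch_pending"
  | "live"
  | "collecting"
  | "closed_complete"
  | "cancelled"
  | "failed";

const TERMINAL: ReadonlySet<WaveStatus> = new Set<WaveStatus>(["closed_complete", "cancelled", "failed"]);

// The happy path, one step at a time.
const NEXT: Partial<Record<WaveStatus, WaveStatus>> = {
  scheduled: "materializing_audience",
  materializing_audience: "dispatch_pending",
  dispatch_pending: "live",
  live: "collecting",
  collecting: "closed_complete",
};

function canTransition(from: WaveStatus, to: WaveStatus): boolean {
  if (TERMINAL.has(from)) return false; // terminal statuses never move
  if (to === "cancelled" || to === "failed") return true; // assumed: any non-terminal status can abort
  return NEXT[from] === to;
}

interface Wave {
  waveId: string;
  status: WaveStatus;
  engineInstanceId: string | null;
}

// Local status is authoritative; the engine-side revoke is best-effort.
async function cancelWave(
  wave: Wave,
  revoke: (engineInstanceId: string) => Promise<void>, // hypothetical engine call
): Promise<{ wave: Wave; revoked: boolean }> {
  if (!canTransition(wave.status, "cancelled")) return { wave, revoked: false };
  let revoked = false;
  const id = wave.engineInstanceId;
  if (id !== null) {
    try {
      await revoke(id);
      revoked = true;
    } catch {
      revoked = false; // engine unreachable: the wave is still cancelled locally
    }
  }
  return { wave: { ...wave, status: "cancelled" }, revoked };
}
```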

Visual — Tier D. (TBD — the lift roadmap from scout to live spoke.)

8. Current status

Grounded in the real code state (src/spokes/survey-orchestrator/, contract 0.1.0):

Shipped: nothing operational. A feasibility memo (SCOUT.md), a README naming role + planned schema, and a contract sketch (contracts/types.ts) with Zod schemas for programs, cadences, triggers, audience rules, fatigue rules, waves, and pointer rows. The contract already imports SegmentMembership from segmentation-studio, fixing the audience-resolve direction.
Not built: no survey_orchestrator schema or migration, no API routes, no MCP module, no registry entry, no health route, no health-aggregate row. It is not in the canonical live-spoke set (status: "live" in src/lib/contracts/registry.ts). Per AGENTS.md it must pass the full spoke-onboarding checklist (check:spokes --scaffold) before being counted as live.
In flight / planned: the PAT-D2-B full lift, per the SCOUT memo and §7 above.

Visual — Tier B (in-repo reference). The truth of this section is the folder itself: src/spokes/survey-orchestrator/ contains exactly README.md, SCOUT.md, and contracts/types.ts — no db/, api/, core/, mcp/, or tests/.

Worked example (clearly labeled — illustrative, grounded in the real contract shapes)

There is no running endpoint and no seed data, so this cannot be a live capture. The following is an illustrative scenario built strictly from the contract types in contracts/types.ts (0.1.0); the field names and enum values are real, the data values are illustrative placeholders, not measured numbers.

A quarterly engagement pulse, routed to the adaptive engine, with a fatigue budget:

SurveyProgram — { programId: "prog_q_pulse", name: "Quarterly Engagement Pulse", cadence: "quarterly", engine: "reincarnation", enabled: true, status: "active" }.
Cadence — { cadenceId: "cad_q1", programId: "prog_q_pulse", cronExpr: "0 9 1 1,4,7,10 *", timezone: "America/New_York", holidayPolicy: "observed_us_federal", dispatchJitterMinutes: 30 } (fires at 9:00 New York time on the 1st of January, April, July, and October, federal-holiday aware, with 30 minutes of dispatch jitter so sends to a large audience don't all land at the top of the hour).
FatigueRule — { fatigueRuleId: "fat_default", tenantScope: "tenant_default", windowDays: 30, maxSurveysPerWindow: 2 } (no one gets more than two orchestrated touches in any 30-day window).
What the orchestrator would do: cadence fires → call segmentation-studio to resolve the audience → drop anyone already at two completed touches in the last 30 days → POST the survivors to reincarnation → record a Wave { waveId: "wave_q1_2026", status: "live", engineInstanceId: "rid_session_…", audienceSnapshot: { snapshotHash: "…" } } → as people finish, write SurveyOrchestratorResponsePointer rows { waveId, principalId, engineResponseId, completedAt } — pointers only, never the answers.
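The example Cadence can be checked by hand, and it shows why the holiday policy is not decoration: January 1 is itself a US federal holiday, so the Q1 firing lands on one every year. The sketch below expands a restricted cron subset (`*`, integers, and comma lists; no ranges, steps, or names) into firing dates for one year, then rolls any date found in a supplied holiday set forward a day at a time. The roll-forward rule and the caller-supplied holiday set are assumptions about what a policy like observed_us_federal might do, and `fireDates` and `applyHolidayPolicy` are hypothetical names. The hour and minute fields, timezone conversion, and jitter are deliberately left out; dates are treated as wall-clock dates in the program's timezone.

```typescript
// Hypothetical sketch: expand a small cron subset into firing dates, then roll holidays forward.
// Supported tokens: "*", integers, comma lists. Hour and minute fields are not expanded here.

function parseField(field: string): Set<number> | null {
  if (field === "*") return null; // null means "any value"
  const values = field.split(",").map((tok) => {
    const n = Number(tok);
    if (!Number.isInteger(n)) throw new Error(`unsupported cron token: ${tok}`);
    return n;
  });
  return new Set(values);
}

const isoDate = (d: Date): string => d.toISOString().slice(0, 10);

// Wall-clock dates (YYYY-MM-DD) in `year` on which the expression fires.
function fireDates(cronExpr: string, year: number): string[] {
  const parts = cronExpr.trim().split(/\s+/);
  if (parts.length !== 5) throw new Error("expected 5 cron fields");
  const dom = parseField(parts[2]);
  const month = parseField(parts[3]);
  const dow = parseField(parts[4]);
  const out: string[] = [];
  for (let d = new Date(Date.UTC(year, 0, 1)); d.getUTCFullYear() === year; d = new Date(d.getTime() + 86400000)) {
    if (month !== null && !month.has(d.getUTCMonth() + 1)) continue;
    const domOk = dom === null || dom.has(d.getUTCDate());
    const dowOk = dow === null || dow.has(d.getUTCDay()) || (dow.has(7) && d.getUTCDay() === 0);
    // Classic cron rule: when both day fields are restricted, either one may match.
    const dayOk = dom !== null && dow !== null ? domOk || dowOk : domOk && dowOk;
    if (dayOk) out.push(isoDate(d));
  }
  return out;
}

// Assumed policy: a firing date on a listed holiday moves to the next non-holiday date.
function applyHolidayPolicy(dates: readonly string[], holidays: ReadonlySet<string>): string[] {
  return dates.map((iso) => {
    let d = new Date(`${iso}T00:00:00Z`);
    while (holidays.has(isoDate(d))) d = new Date(d.getTime() + 86400000);
    return isoDate(d);
  });
}
```

For 2026 the example expression yields 2026-01-01, 2026-04-01, 2026-07-01, and 2026-10-01; with 2026-01-01 in the holiday set, the first firing moves to 2026-01-02.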

The point of the example is the contract shape, not the values: a program declares its engine and cadence, the orchestrator filters by fatigue and dispatches, and the wave holds an opaque engine-session pointer rather than a copy of the response. No figure here is presented as a real measurement.
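The snapshotHash in the example is opaque, but the open fork about freezing snapshots versus late joiners depends on it being stable: the same membership should hash the same regardless of the order in which segmentation-studio returns it. A minimal sketch, assuming the hash covers the sorted, de-duplicated principal ids; 32-bit FNV-1a is used only to keep the example dependency-free, and a real implementation would more plausibly use a cryptographic hash such as SHA-256. `fnv1a32`, `snapshotHash`, and `lateJoiners` are hypothetical names.

```typescript
// Hypothetical sketch: an order-independent hash of a frozen audience.
// FNV-1a (32-bit) keeps this dependency-free; it is not collision-resistant.

function fnv1a32(s: string): string {
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i); // UTF-16 code unit; equals the byte value for ASCII ids
    h = Math.imul(h, 0x01000193); // FNV prime, 32-bit multiply
  }
  return (h >>> 0).toString(16).padStart(8, "0");
}

// Sort and de-duplicate so the hash depends only on membership, not on resolver order.
function snapshotHash(principalIds: readonly string[]): string {
  const canonical = Array.from(new Set(principalIds)).sort();
  return fnv1a32(canonical.join("\n")); // assumes ids never contain a newline
}

// Principals present now that were absent from the frozen snapshot.
function lateJoiners(frozen: readonly string[], current: readonly string[]): string[] {
  const seen = new Set(frozen);
  return Array.from(new Set(current)).filter((p) => !seen.has(p)).sort();
}
```

A changed hash at dispatch time says the audience moved after freezing; `lateJoiners` then names who moved, which is the input either resolution of that fork would need.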