Spoke

segmentation-studio

stitching, done right — canonical fields, identity resolution, versioned packs.

Character

A PA professional stitching together data from multiple HRIS sources — or one HRIS that renames columns every release. You spend more time on data prep than on analysis.

Problem

External. Every HRIS uses different column names. Workday calls it Employee Status; ADP calls it Active/Inactive; SuccessFactors uses Employment Status. Multi-source joins are heroic Excel exercises. Segment definitions drift across teams — "engineering and west" means different things to different people.

Internal. You're the one who knows the columns. That knowledge isn't in any system; it's in your head, and it walks out the door if you leave.

Philosophical. Canonical-field normalization is a platform problem, not an analyst problem. It should be solved once, versioned, and consumed by every downstream tool.

Guide

Segmentation-studio translates every HR vendor's column soup into stable canonical keys, resolves identities across uploads, persists Workday/Bamboo sync when you ask it to, evaluates declarative workbook rules (Config_Segmentation parity), and versions schemas so cohort logic is immutable like code. Demographic-ratio segments plus six PAT-156 canonical enrichment fields now ship beside the flagship 35-field catalog. MCP + HTTP parity keeps agents and analysts on identical contracts.

segmentation-studio — HRIS canonical-field normalization plus multi-membership segmentation. Cross-source joins carry column lineage plus conflict disclosures; hierarchical canonical segments expose stable ids; segmentation packs remain vendorable artefacts for downstream systems.
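To make multi-membership concrete, here is a minimal TypeScript sketch of evaluating composable criteria over roster rows. The `any` / `field` / `op` / `value` shape mirrors the request body in the curl example on this page. The `all` combinator, the `region` field, the `RosterRow` type, and the function names are assumptions for illustration, not the service's resolver.

```typescript
// Hypothetical local evaluator; the real resolver runs server-side.
type RosterRow = Record<string, string>;

type Predicate =
  | { field: string; op: "eq"; value: string }
  | { any: Predicate[] }
  | { all: Predicate[] }; // `all` is an assumed combinator, not confirmed by the source

function matches(row: RosterRow, p: Predicate): boolean {
  if ("any" in p) return p.any.some((child) => matches(row, child));
  if ("all" in p) return p.all.every((child) => matches(row, child));
  return row[p.field] === p.value; // leaf: only the "eq" operator is sketched
}

// Multi-membership: one row may satisfy several segment definitions at once.
function membershipsFor(row: RosterRow, segments: Record<string, Predicate>): string[] {
  return Object.entries(segments)
    .filter(([, predicate]) => matches(row, predicate))
    .map(([segmentId]) => segmentId);
}

const segments: Record<string, Predicate> = {
  engineering: { field: "function", op: "eq", value: "engineering" },
  west: { field: "region", op: "eq", value: "west" },
  "engineering-and-west": {
    all: [
      { field: "function", op: "eq", value: "engineering" },
      { field: "region", op: "eq", value: "west" },
    ],
  },
};

// This row belongs to all three segments defined above.
const memberships = membershipsFor({ function: "engineering", region: "west" }, segments);
```

Writing "engineering and west" down as one named predicate is the point: every team evaluates the same definition instead of re-deriving it.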

Abstract

Background. HRIS schemas diverge by vendor release; segmentation rot is inevitable unless canonical keys, membership algebra, and auditable versioning live in substrate services analysts do not rewrite every quarter.

Methodology. A scored priority catalogue maps headers to enumerated canonical fields. Multi-membership segmentation evaluates composable predicates over materialised nodes. Supplemental connectors persist tenant datasets; PAT-156 extends the global field registry with cohort-comparison constructs; declarative rules mirror spreadsheet engines for repeatability; SegmentationSchemaVersion rows provide branch/diff/snapshot semantics.
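As a rough illustration of a scored priority catalogue, the TypeScript sketch below maps the three vendor headers from the Problem section to one canonical key and lets the highest-priority column win when a file contains more than one candidate. The `employment_status` key, the priority numbers, and all function names are assumptions; the real catalogue and its scoring are not shown in this page.

```typescript
// Hypothetical catalogue entries. Aliases come from the vendor examples in the
// Problem section; the canonical key and priorities are illustrative only.
interface CatalogueEntry {
  canonicalField: string;
  alias: string; // stored in normalised form
  priority: number; // higher wins when several columns map to the same field
}

const catalogue: CatalogueEntry[] = [
  { canonicalField: "employment_status", alias: "employment status", priority: 100 },
  { canonicalField: "employment_status", alias: "employee status", priority: 90 },
  { canonicalField: "employment_status", alias: "active/inactive", priority: 70 },
];

// Normalise case, whitespace, and underscores before comparing headers.
function normaliseHeader(header: string): string {
  return header.trim().toLowerCase().replace(/[_\s]+/g, " ");
}

// Returns canonical field -> winning source header for one uploaded file.
function pickColumns(headers: string[]): Record<string, string> {
  const best: Record<string, { header: string; priority: number }> = {};
  for (const header of headers) {
    const entry = catalogue.find((e) => e.alias === normaliseHeader(header));
    if (!entry) continue; // unmapped headers are left out of the result
    const current = best[entry.canonicalField];
    if (!current || entry.priority > current.priority) {
      best[entry.canonicalField] = { header, priority: entry.priority };
    }
  }
  const out: Record<string, string> = {};
  for (const [field, winner] of Object.entries(best)) out[field] = winner.header;
  return out;
}

const picked = pickColumns(["Active/Inactive", "Employee_Status", "Shoe Size"]);
```

Here `picked` keeps `Employee_Status` for `employment_status` because its assumed priority outranks `Active/Inactive`, and `Shoe Size` is dropped as unmapped.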

Scope. Resolves segments only—does not price compensation (anycomp) or prove causality (program-evaluation). Calculated psychological segments progressively gain resolver parity per PAT-63 follow-ons.

Contribution. Stateless HTTP + MCP share Zod contracts; semver-like schema tooling gives analytics engineers the governance language already normal for APIs.
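The snapshot and diff semantics can be pictured with a small TypeScript sketch. The shapes below are hypothetical stand-ins: the actual SegmentationSchemaVersion rows and the schemas/* API payloads are not reproduced on this page, and rules are reduced to plain strings for brevity.

```typescript
// Hypothetical shapes; the real SegmentationSchemaVersion rows are not shown here.
interface SchemaSnapshot {
  version: string;
  segments: Readonly<Record<string, string>>; // segment id -> serialised rule
}

function snapshot(version: string, segments: Record<string, string>): SchemaSnapshot {
  // Copy, then freeze, so later edits to the working set cannot alter the snapshot.
  return Object.freeze({ version, segments: Object.freeze({ ...segments }) });
}

interface SchemaDiff {
  added: string[];
  removed: string[];
  changed: string[];
}

function diff(from: SchemaSnapshot, to: SchemaSnapshot): SchemaDiff {
  const added = Object.keys(to.segments).filter((id) => !(id in from.segments));
  const removed = Object.keys(from.segments).filter((id) => !(id in to.segments));
  const changed = Object.keys(to.segments).filter(
    (id) => id in from.segments && from.segments[id] !== to.segments[id],
  );
  return { added, removed, changed };
}

// Edit a working set between two snapshots, then compare them.
const working: Record<string, string> = { west: "region eq west" };
const v1 = snapshot("1.0.0", working);
working["west"] = "region in (west, pacific)";
working["engineering"] = "function eq engineering";
const v2 = snapshot("1.1.0", working);
const delta = diff(v1, v2);
```

In this sketch `delta` reports `engineering` as added and `west` as changed, while `v1` still holds the original `west` rule.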

Evidence / Provenance. Donor lift plus PAT-catalogued migrations (e.g., drizzle/0087_pat156_*, demographic-ratio seeds) recorded in CHANGELOG and README.

Plan

  1. Ingest + sync HRIS rows

    Use segmentation-studio.hris-ingest or PAT-65 Workday / Bamboo sync routes with tenant context + service keys; responses include run ids for audit trails.

  2. Resolve identities + joins

    identity-resolve clusters sources; data-join/run merges supplements with OVERWRITE / IGNORE / FILL_HOLES policies and overlap reports.

  3. Author declarative + classic segments

    Manage PAT-160 declarative rules via declarative-segmentation/* APIs and continue to call segments-define / cohorts-resolve for multi-membership cohorts.

  4. Pin immutable definitions

    Branch or snapshot schemas/* versions and/or publish packs so finance, TA, and comp quote the same segmentation moment.
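The three join policies named in step 2 can be sketched as follows. The page names the policies but does not define them, so the semantics here are one plausible reading, labelled as an assumption: OVERWRITE lets non-empty supplement values replace base values, IGNORE keeps the base value for any field the base already has, and FILL_HOLES only fills empty base fields. The types and the `joinRow` helper are hypothetical.

```typescript
// Assumed policy semantics; the service's actual data-join behaviour may differ.
type JoinPolicy = "OVERWRITE" | "IGNORE" | "FILL_HOLES";
type SourceRow = Record<string, string | null>;

interface JoinResult {
  merged: SourceRow;
  overlaps: string[]; // fields where both sides hold different non-empty values
}

const isHole = (v: string | null | undefined): boolean =>
  v === null || v === undefined || v === "";

function joinRow(base: SourceRow, supplement: SourceRow, policy: JoinPolicy): JoinResult {
  const merged: SourceRow = { ...base };
  const overlaps: string[] = [];
  for (const [field, incoming] of Object.entries(supplement)) {
    const existing = base[field];
    if (!isHole(existing) && !isHole(incoming) && existing !== incoming) {
      overlaps.push(field); // report the conflict regardless of policy
    }
    if (!(field in base)) {
      merged[field] = incoming; // new column: always carried over
    } else if (policy === "OVERWRITE" && !isHole(incoming)) {
      merged[field] = incoming;
    } else if (policy === "FILL_HOLES" && isHole(existing)) {
      merged[field] = incoming;
    }
    // IGNORE: keep the base value for every field the base already has.
  }
  return { merged, overlaps };
}

const base: SourceRow = { employee_id: "42", function: "engineering", region: null };
const supplement: SourceRow = { employee_id: "42", function: "eng", region: "west", cost_center: "CC-9" };

const overwritten = joinRow(base, supplement, "OVERWRITE");
const ignored = joinRow(base, supplement, "IGNORE");
const filled = joinRow(base, supplement, "FILL_HOLES");
```

In all three runs the overlap report lists `function`, since the two sources disagree there; only the merged value differs by policy.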

Call to Action

Direct. Try the API. Ingest a sample HRIS file free.

Transitional. Read the canonical-field methodology (Phase 3). Hit the catalog inline.

Spoke I/O (visual language v1)

Every toolbox spoke shares the same abstract choreography: typed inputs on the left, distilled verbs in the center, typed outputs on the right, and (when relevant) cross-spoke HTTP composition along the bottom rail. Source package: @people-analytics-toolbox/spoke-illustrations.

[Diagram: Segmentation studio. Inputs: HRIS roster rows (CanonicalRow[]), Segment definitions (SegmentRulePack). Main actions: Normalize fields, Resolve memberships. Outputs: Multi-membership graph (MembershipEdge[]). Composes with: calculus, data-anonymizer.]

Try it now

Copy this curl command and paste it into any terminal. The endpoint is POST and needs a service key: set TOOLBOX_SERVICE_KEY in your shell first.

segmentation-studio.cohorts-resolve · POST · service key required

Resolve a multi-membership cohort from criteria. Returns memberIds + segmentNodeIds for downstream joins.

curl -sS -X POST "https://people-analytics-toolbox.vercel.app/api/spokes/segmentation-studio/cohorts/resolve" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOOLBOX_SERVICE_KEY" \
  -d '{
  "tenantId": "demo",
  "criteria": {
    "any": [
      { "field": "function", "op": "eq", "value": "engineering" }
    ]
  }
}'
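For callers who prefer TypeScript over curl, the same request can be prepared like this. The URL, headers, and body shape are copied from the curl command above; the helper and type names are hypothetical, and the response is not modelled here (validate it with the vendored ResolveCohortResponseSchema before use).

```typescript
// URL and body shape come from the curl example; names below are illustrative.
const RESOLVE_URL =
  "https://people-analytics-toolbox.vercel.app/api/spokes/segmentation-studio/cohorts/resolve";

interface EqCriterion {
  field: string;
  op: "eq";
  value: string;
}

interface ResolveCohortBody {
  tenantId: string;
  criteria: { any: EqCriterion[] };
}

interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildResolveRequest(serviceKey: string, body: ResolveCohortBody): PreparedRequest {
  // Fail early rather than sending an unauthenticated request.
  if (!serviceKey) throw new Error("TOOLBOX_SERVICE_KEY is not set");
  return {
    url: RESOLVE_URL,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${serviceKey}`,
    },
    body: JSON.stringify(body),
  };
}

const prepared = buildResolveRequest("example-key", {
  tenantId: "demo",
  criteria: { any: [{ field: "function", op: "eq", value: "engineering" }] },
});

// To send it from a runtime with a global fetch (Node 18+ or a browser):
//   const res = await fetch(prepared.url, prepared);
```

Keeping request construction separate from the network call makes the payload easy to unit-test without touching the live endpoint.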

Vendor the contract

The Zod contract is the source of truth. Vendor a copy into your consumer app — you keep it; we don't break it underneath you. Re-vendor when the version bumps.

// In your consumer app:
import { z } from "zod";

// Vendor a copy of these contracts from the toolbox repo at:
//   src/spokes/segmentation-studio/contracts/types.ts

import {
  CanonicalFieldsListResponseSchema,
  HrisIngestRequestSchema,
  HrisIngestResponseSchema,
  ResolveCohortRequestSchema,
  ResolveCohortResponseSchema,
  PublishPackResponseSchema,
  CONTRACT_VERSION,
} from "./vendored/segmentation-studio/types";

// Then call the toolbox over HTTP or MCP.
// See docs/EXTERNAL-CONSUMERS.md for onboarding.

Source path: src/spokes/segmentation-studio/contracts/types.ts
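One way to act on "re-vendor when the version bumps" is a small guard that compares your vendored CONTRACT_VERSION with the upstream value. This sketch assumes the version is a MAJOR.MINOR.PATCH string, which this page does not confirm, and it leaves open how you learn the upstream version. The function names are hypothetical.

```typescript
// Assumes CONTRACT_VERSION looks like "MAJOR.MINOR.PATCH"; the source does not
// state its format, so treat this as a hypothetical guard.
type Drift = "same" | "compatible" | "breaking";

function parseVersion(version: string): [number, number, number] {
  const parts = version.split(".").map((part) => Number.parseInt(part, 10));
  if (parts.length !== 3 || parts.some((n) => Number.isNaN(n))) {
    throw new Error(`Not a MAJOR.MINOR.PATCH version: ${version}`);
  }
  return parts as [number, number, number];
}

function contractDrift(vendored: string, upstream: string): Drift {
  const [vMajor, vMinor, vPatch] = parseVersion(vendored);
  const [uMajor, uMinor, uPatch] = parseVersion(upstream);
  if (vMajor !== uMajor) return "breaking"; // re-vendor before calling
  if (vMinor !== uMinor || vPatch !== uPatch) return "compatible"; // re-vendor when convenient
  return "same";
}

// Example: a vendored 1.2.0 contract checked against an upstream 2.0.0.
const drift = contractDrift("1.2.0", "2.0.0");
```

Running a check like this in CI turns the re-vendor step into a visible failure instead of a silent mismatch.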

Failure

Demographic-ratio narratives ship without numerator discipline; spreadsheets return because Postgres-backed Config_Segmentation parity was never trusted. Schema drift between People Analytics and finance persists when versions live in Slack screenshots.

Success

Composable cohort logic with mathematically reproducible versioning—engineering, TA, finance, and comp share the same labelled slice instead of arguing over parallel dashboards.