Worker Resolution — plain-language explainer
Worker Resolution takes the scattered, inconsistently-labeled HR spreadsheets that describe the same people and stitches them into one keyed record per person — without letting a stray tab overwrite payroll truth.
A People Analytics Toolbox component. Built to the portfolio Explainer Standard v1.0. Every claim below is grounded in the spoke's own code and contracts (src/spokes/worker-resolution/, contract 1.0.0, status: "live" in src/lib/contracts/registry.ts); anything not yet built is marked (TBD).
1. What is it?
Worker Resolution is a stateless identity-resolution service: hand it several HR source files — a core HRIS export, a terminations list, a survey extract, a payroll tab — and it returns one master identity per real human, the merged column values for that person, and a full account of how it got there.
The job it does is the one every people-analytics project quietly burns days on first: deciding that E900 in the HRIS, ada@performix.fixture in the survey tool, and "Ada L." in the comp planning sheet are the same employee — and merging their columns without clobbering the fields you can least afford to get wrong.
Visual — Tier A (live API capture). A real request/response pair from the spoke's own integration fixture (src/spokes/worker-resolution/integration/performix/fixtures/), the exact shape POST /api/spokes/worker-resolution/resolve returns:
POST /api/spokes/worker-resolution/resolve
{
  "runId": "performix-sample-run",
  "matchLadder": ["employee-id", "work-email", "worker-name"],
  "admissionPolicy": "strict-anchor",
  "sources": [
    { "sourceId": "anchor", "role": "anchor-active",
      "rows": [{ "Employee ID": "E900", "Email": "ada@performix.fixture", "Department": "Finance" }] },
    { "sourceId": "supplemental-payroll", "role": "supplemental",
      "rows": [{ "Email": "ada@performix.fixture", "Territory": "EMEA" }] }
  ]
}
→ {
  "masterIdentities": [
    { "Employee ID": "E900", "Email": "ada@performix.fixture",
      "Department": "Finance", "Territory": "EMEA" }
  ],
  "runStats": { "sourcesProcessed": 2, "rowsIn": 2,
    "masterIdentitiesOut": 1, "failuresCount": 0,
    "methodCounts": { "employee-id": 1, "work-email": 1, "worker-name": 0 } }
}
(Real fixture I/O. Two files, two rows, one human — joined on work email when no shared ID existed.)
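For callers working in TypeScript, the request above could be assembled roughly as follows. This is a minimal sketch: the interface shapes are inferred from the fixture rather than copied from contracts/types.ts, buildSampleRequest and postResolve are hypothetical helper names, and the base URL and any service-key header are deployment-specific assumptions passed in by the caller.

```typescript
// Shapes inferred from the fixture above; the real contract lives in
// src/spokes/worker-resolution/contracts/types.ts and may differ.
type SourceRole = "anchor-active" | "anchor-terminated" | "supplemental";
type MatchMethod = "employee-id" | "work-email" | "worker-name";
type AdmissionPolicy = "strict-anchor" | "snowball-admit" | "review-queue";

interface SourceFile {
  sourceId: string;
  role: SourceRole;
  rows: Record<string, string>[];
}

interface ResolveRequest {
  runId: string;
  matchLadder?: MatchMethod[];
  admissionPolicy?: AdmissionPolicy;
  sources: SourceFile[];
}

// Hypothetical helper: rebuilds the sample request shown above.
function buildSampleRequest(): ResolveRequest {
  return {
    runId: "performix-sample-run",
    matchLadder: ["employee-id", "work-email", "worker-name"],
    admissionPolicy: "strict-anchor",
    sources: [
      {
        sourceId: "anchor",
        role: "anchor-active",
        rows: [{ "Employee ID": "E900", Email: "ada@performix.fixture", Department: "Finance" }],
      },
      {
        sourceId: "supplemental-payroll",
        role: "supplemental",
        rows: [{ Email: "ada@performix.fixture", Territory: "EMEA" }],
      },
    ],
  };
}

// Hypothetical caller. The route is service-key gated, but the header name is
// not stated in this document, so extra headers are supplied by the caller.
async function postResolve(
  baseUrl: string,
  body: ResolveRequest,
  extraHeaders: Record<string, string> = {},
): Promise<unknown> {
  const res = await fetch(`${baseUrl}/api/spokes/worker-resolution/resolve`, {
    method: "POST",
    headers: { "Content-Type": "application/json", ...extraHeaders },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`resolve failed with HTTP ${res.status}`);
  return res.json();
}
```

Keeping the request builder separate from the network call makes the payload easy to snapshot-test against the fixture without a running service.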
2. What problem does it solve — and why is it different?
The pain it removes: core HR dumps, surveys, exit lists, ops extracts, and comp-planning sheets all describe the same humans but disagree on column headers, key on different identifiers, and contradict each other — and the glue logic that reconciles them is usually a brittle spreadsheet macro that drifts silently and occasionally lets a stale tab overwrite a calibrated rating.
The difference, stated as a shift:
- FROM a hand-maintained join macro where the matching logic, the header aliases, and the "don't overwrite pay" rules are all tangled together and invisible — and where a wrong merge surfaces months later as a misdiagnosed analytics problem.
- TO a single deterministic call that returns the merged record plus the lineage of every column, every alias it learned, and a roster of every row it could not place and why — so the join is auditable instead of trusted-on-faith.
How it differs from the obvious substitutes:
- vs. doing it by hand in a spreadsheet: Worker Resolution normalizes the headers (Worker ID / Person Number / EE ID all collapse to Employee ID), matches on a configurable identifier ladder, protects the fields you designate, and hands back a failures roster instead of dropping unmatched rows on the floor.
- vs. a generic ETL / BI join: a generic join treats every source as equal and the last write wins. Worker Resolution ranks sources by role (anchors are trusted; supplementals fill gaps), and it refuses to let a supplemental file overwrite a vault-locked HR fact unless you explicitly allow it.
Visual — Tier B (FROM→TO typographic block). four disagreeing spreadsheets → Worker Resolution → { one keyed record per person · column lineage · aliases learned · failures roster }. A rendered comparison block is a follow-up (FU-A).
3. How does it work?
Inputs → method → outputs, concretely — the questions a practitioner actually asks:
- Input: a ResolveRequest, which carries a runId plus one or more SourceFiles, each carrying its rows, a role (anchor-active, anchor-terminated, or supplemental), and optional keyHints. Optional knobs: a matchLadder, an aliasMap, fieldConfigs, and an admissionPolicy.
- Method: a two-pass self-healing join (resolveSources in core/resolve.ts):
  - Header normalization. Every row's columns pass through the alias map (DEFAULT_WORKER_ALIAS_MAP, mirrored from the donor runMasterETL Apps Script plus common vendor synonyms) so Worker ID, Person Number, and Empl ID all resolve to one canonical Employee ID. Identifier and lifecycle columns are then picked per source by a lightweight scored header match (fuzzy-identifiers.ts), respecting any keyHints you supply.
  - Pass 1: learning index. Anchor sources bootstrap trusted identities (each gets a mk:<uuid> master key). Supplemental sources then teach the index new identifier edges (for example, a payroll tab that knows an email→ID bridge the anchor never saw), iterating to a fixpoint (capped at 48 passes) so a chain of partial overlaps still converges. This is the snowball indexing step; it propagates bridges but never mints new identities unless you opt in.
  - Pass 2: deterministic merge. Each row is matched down the ladder (employee-id → work-email → worker-name, lowercased and trimmed) and merged onto its master, governed by an HRIS vault: anchor-written columns are locked, and a supplemental write to a locked column is blocked (and logged as a vault-blocked failure) unless that column is on the vault-exception allowlist or you set its fieldConfig action to OVERWRITE. The default action is FILL_HOLES: supplementals fill blanks, never overwrite. A rehire guard skips a stale termination overlay when the master's hire date is newer than the incoming termination date.
- Output: a ResolveResponse containing masterIdentities (one merged record per person), lineage (per column: where it was first seen and every source it appeared in), aliasesLearned, a failures roster (each with a reason: unidentifiable, unmatched-but-identified, vault-blocked, or policy-skipped), and runStats.
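The header-normalization and ladder-matching steps described above can be sketched as follows. This is an illustration under stated assumptions, not the code in core/resolve.ts: the alias table is a tiny subset built from the synonyms named in this document, the canonical column names Email and Name are assumed, and normalizeHeaders and matchDownLadder are hypothetical function names.

```typescript
type Row = Record<string, string>;
type MatchMethod = "employee-id" | "work-email" | "worker-name";

// Illustrative subset only; the real DEFAULT_WORKER_ALIAS_MAP is larger.
const ALIASES = new Map<string, string>([
  ["worker id", "Employee ID"],
  ["person number", "Employee ID"],
  ["empl id", "Employee ID"],
  ["ee id", "Employee ID"],
]);

// Rewrite each header to its canonical name when an alias is known.
function normalizeHeaders(row: Row): Row {
  const out: Row = {};
  for (const [header, value] of Object.entries(row)) {
    const canonical = ALIASES.get(header.trim().toLowerCase()) ?? header.trim();
    out[canonical] = value;
  }
  return out;
}

// Assumed canonical column for each ladder rung.
const LADDER_COLUMNS: Record<MatchMethod, string> = {
  "employee-id": "Employee ID",
  "work-email": "Email",
  "worker-name": "Name",
};

// Walk the ladder in order; the first rung whose lowercased, trimmed value
// is present in the index wins. Index keys look like "work-email:ada@...".
function matchDownLadder(
  row: Row,
  ladder: MatchMethod[],
  index: Map<string, string>,
): { masterKey: string; method: MatchMethod } | null {
  for (const method of ladder) {
    const raw = row[LADDER_COLUMNS[method]];
    if (!raw) continue;
    const masterKey = index.get(`${method}:${raw.trim().toLowerCase()}`);
    if (masterKey !== undefined) return { masterKey, method };
  }
  return null;
}

// Usage: the payroll row has no Employee ID, so it falls through to email.
const demoIndex = new Map<string, string>([
  ["employee-id:e900", "mk:demo-1"],
  ["work-email:ada@performix.fixture", "mk:demo-1"],
]);
const demoLadder: MatchMethod[] = ["employee-id", "work-email", "worker-name"];
const payrollHit = matchDownLadder(
  normalizeHeaders({ Email: " Ada@Performix.fixture ", Territory: "EMEA" }),
  demoLadder,
  demoIndex,
);
```

Returning the matching method alongside the master key is what would let a caller accumulate per-method counts like the methodCounts block in runStats.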
Science / provenance backing: the matching logic, alias map, vault-exception allowlist (TTC, compa-ratio, calibrated/manager ratings, promotion flag), and rehire guard are mirrored from a production Google Apps Script ETL pipeline (runMasterETL / ETL_Pipeline.gs) that ran real FiveTran-era HR workbooks — this is a hardened lift of working glue logic, not a from-scratch heuristic. (The donor script is not yet vendored in-repo for literal diff; noted in the CHANGELOG.)
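To make the vault rule and rehire guard concrete, here is a minimal sketch. It is not the donor logic: the allowlist header strings are placeholders for the categories named above, mergeSupplemental and shouldSkipTermination are hypothetical names, and treating an identical incoming value as a no-op (rather than a vault-blocked failure) is an assumption chosen to be consistent with the fixture, where Email appears in both sources with zero failures.

```typescript
type Row = Record<string, string>;
type FieldAction = "FILL_HOLES" | "OVERWRITE" | "IGNORE";

// Placeholder header names standing in for the documented allowlist
// (TTC, compa-ratio, calibrated/manager ratings, promotion flag).
const VAULT_EXCEPTIONS: ReadonlySet<string> = new Set(["TTC", "Compa-Ratio"]);

interface MergeOutcome {
  master: Row;
  blocked: string[]; // columns that would be logged as vault-blocked
}

function mergeSupplemental(
  master: Row,
  locked: ReadonlySet<string>,
  incoming: Row,
  fieldConfigs: Partial<Record<string, FieldAction>> = {},
): MergeOutcome {
  const out: Row = { ...master };
  const blocked: string[] = [];
  for (const [col, value] of Object.entries(incoming)) {
    const action = fieldConfigs[col] ?? "FILL_HOLES";
    if (action === "IGNORE" || value === "") continue;
    const current = out[col];
    if (current === undefined || current === "") {
      out[col] = value; // filling a hole is always allowed
      continue;
    }
    if (current === value) continue; // assumption: same value is a no-op
    const mayOverwrite = action === "OVERWRITE" || VAULT_EXCEPTIONS.has(col);
    if (mayOverwrite) out[col] = value;
    else if (locked.has(col)) blocked.push(col);
    // Unlocked column under FILL_HOLES: keep the existing value silently.
  }
  return { master: out, blocked };
}

// Rehire guard: skip the termination overlay when the master's hire date
// is newer than the incoming termination date. Dates are ISO strings here.
function shouldSkipTermination(masterHireDate: string | undefined, terminationDate: string): boolean {
  if (!masterHireDate) return false;
  const hire = Date.parse(masterHireDate);
  const term = Date.parse(terminationDate);
  if (Number.isNaN(hire) || Number.isNaN(term)) return false;
  return hire > term;
}

// Usage: payroll may fill Territory but may not change the locked Department.
const vaultDemo = mergeSupplemental(
  { "Employee ID": "E900", Department: "Finance" },
  new Set(["Employee ID", "Department"]),
  { Department: "Ops", Territory: "EMEA" },
);
```

Returning the blocked columns instead of throwing keeps the merge total: every row produces a result, and the caller decides how to record the failure.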
Differentiation beat: the practitioner's real question isn't "did it join" — it's "can I trust this merged record, and prove how it was built?" The lineage, aliasesLearned, and reason-coded failures answer that directly: every cell traces to a source, and every row that didn't make it in says why.
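A consumer might interrogate those audit fields along these lines. The reason codes are the four listed in this document; everything else is an assumption for illustration, including the Failure and ColumnLineage field names (sourceId, firstSeenIn, sources) and the helper names tallyFailures and columnsFrom. Check contracts/types.ts for the real shapes.

```typescript
type FailureReason =
  | "unidentifiable"
  | "unmatched-but-identified"
  | "vault-blocked"
  | "policy-skipped";

// Hypothetical shapes, not the vendored contract.
interface Failure {
  sourceId: string;
  reason: FailureReason;
}

interface ColumnLineage {
  firstSeenIn: string; // source where the column was first seen
  sources: string[];   // every source the column appeared in
}

// Count failures per reason so a reviewer can see where rows were lost.
function tallyFailures(failures: Failure[]): Record<FailureReason, number> {
  const counts: Record<FailureReason, number> = {
    "unidentifiable": 0,
    "unmatched-but-identified": 0,
    "vault-blocked": 0,
    "policy-skipped": 0,
  };
  for (const failure of failures) counts[failure.reason] += 1;
  return counts;
}

// List the columns a given source contributed to, sorted for stable output.
function columnsFrom(lineage: Record<string, ColumnLineage>, sourceId: string): string[] {
  return Object.entries(lineage)
    .filter(([, entry]) => entry.sources.includes(sourceId))
    .map(([column]) => column)
    .sort();
}

// Usage, mirroring the fixture: Email appears in both sources,
// Territory only in payroll.
const demoLineage: Record<string, ColumnLineage> = {
  Email: { firstSeenIn: "anchor", sources: ["anchor", "supplemental-payroll"] },
  Department: { firstSeenIn: "anchor", sources: ["anchor"] },
  Territory: { firstSeenIn: "supplemental-payroll", sources: ["supplemental-payroll"] },
};
const payrollColumns = columnsFrom(demoLineage, "supplemental-payroll");
```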
Visual — Tier B (admission-policy ladder). strict-anchor (the safe default — only anchor-known people get a record; unmatched supplemental rows go to the failures roster as unmatched-but-identified) → snowball-admit (lets standalone supplemental cohorts mint their own identities) → review-queue (currently clamped to strict-anchor behavior — see status). A rendered diagram is a follow-up (FU-A).
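The policy ladder above could be expressed as a small decision function. This is a sketch of the described behavior only: clampPolicy stands in for what this document attributes to clampAdmission, admitUnmatchedRow is a hypothetical name, and mapping rows with no usable identifier to the unidentifiable reason is an inference from the reason codes rather than something stated in the source.

```typescript
type AdmissionPolicy = "strict-anchor" | "snowball-admit" | "review-queue";
type EffectivePolicy = "strict-anchor" | "snowball-admit";

// review-queue currently behaves as strict-anchor, per the status notes.
function clampPolicy(policy: AdmissionPolicy = "strict-anchor"): EffectivePolicy {
  return policy === "snowball-admit" ? "snowball-admit" : "strict-anchor";
}

type Admission =
  | { kind: "minted"; masterKey: string }
  | { kind: "failure"; reason: "unidentifiable" | "unmatched-but-identified" };

// Decide what happens to a supplemental row that matched no known master.
// mintKey is injected so the caller controls how mk:<uuid> keys are made.
function admitUnmatchedRow(
  policy: AdmissionPolicy,
  hasIdentifier: boolean,
  mintKey: () => string,
): Admission {
  if (!hasIdentifier) return { kind: "failure", reason: "unidentifiable" };
  if (clampPolicy(policy) === "snowball-admit") {
    return { kind: "minted", masterKey: mintKey() };
  }
  return { kind: "failure", reason: "unmatched-but-identified" };
}
```

For example, admitUnmatchedRow("review-queue", true, mint) yields an unmatched-but-identified failure today, exactly as strict-anchor would, while the same row under snowball-admit gets a freshly minted key.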
4. What does it enable?
Concrete uses a practitioner would recognize:
- Build the analysis-ready employee table. Join core HRIS + exits + survey extracts into one keyed Golden Record before any segmentation, pay-fairness, or engagement work begins.
- Protect pay and rating columns. Let planning tabs overwrite only the explicit allowlist (TTC, compa-ratios, calibrated and manager ratings, promotion flag) while every other HR fact stays vault-locked against accidental supplemental writes.
- Reconcile rehires correctly. Automatically skip a stale termination row when a newer hire date proves the person was reactivated — no manual exit-list scrubbing.
- Audit a contested merge. When a downstream number looks wrong, the lineage shows which source supplied each column and the failures roster shows exactly which rows were dropped and why.
- Self-heal across partial overlaps. Resolve a person who shares only an email with one file and only a name with another; the snowball index bridges the chain rather than producing two half-records.
- Feed the org graph. Emit clean, keyed identities that org-graph materializes into time-bounded reporting/cost edges.
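The self-healing case in the list above (one file shares only an email, another only a name) relies on iterating the index to a fixpoint. The sketch below shows that idea in isolation. It assumes identifiers have already been normalized into keys such as "work-email:ada@performix.fixture", it only propagates bridges to existing masters, and it ignores conflicts where one row touches two different masters; snowball is a hypothetical name, and the default cap of 48 passes is the figure quoted earlier in this document.

```typescript
// index: normalized identifier key -> master key (seeded from anchor rows).
// rows: each supplemental row reduced to its normalized identifier keys.
// Returns the number of passes run, including the final no-change pass.
function snowball(index: Map<string, string>, rows: string[][], maxPasses = 48): number {
  let passes = 0;
  let changed = true;
  while (changed && passes < maxPasses) {
    changed = false;
    passes += 1;
    for (const keys of rows) {
      // Find any identifier on this row that the index already knows.
      const known = keys.find((key) => index.has(key));
      if (known === undefined) continue; // nothing to bridge from yet
      const master = index.get(known);
      if (master === undefined) continue;
      // Teach the index every other identifier this row carries.
      for (const key of keys) {
        if (!index.has(key)) {
          index.set(key, master);
          changed = true;
        }
      }
    }
  }
  return passes;
}

// Usage: the name-only row cannot be placed until the payroll row has
// bridged email -> ID, so it resolves on the second pass.
const bridgeIndex = new Map<string, string>([["employee-id:e900", "mk:demo-1"]]);
const bridgeRows = [
  ["worker-name:ada l.", "work-email:ada@performix.fixture"],
  ["work-email:ada@performix.fixture", "employee-id:e900"],
];
const passesRun = snowball(bridgeIndex, bridgeRows);
```

Row order matters only for how many passes are needed, not for the final index, which is why iterating to a fixpoint makes a chain of partial overlaps converge.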
Visual — (TBD — a rendered lineage-trace view for a single master identity, columns → source provenance).
5. How it fits in the toolbox
Data flow:
- Consumes: uploaded HRIS / survey / payroll / exit / planning files (rows + role + optional key hints). No external reference datasets (no BLS, O*NET, NAICS, or Principia priors); the resolution logic is intrinsic and the only "prior" is the alias map and vault-exception allowlist lifted from the donor ETL.
- Emits: a ResolveResponse contract (master identities + lineage + aliases learned + failures + run stats). Consumers vendor src/spokes/worker-resolution/contracts/types.ts (contract 1.0.0).
- Feeds: org-graph, which (per the registry) materializes org nodes and typed temporal edges "for (tenantId, snapshotId) from worker-resolution rows." Clean keyed identities are the upstream precondition for any spoke that needs one row per person: segmentation, manager-effectiveness rollups, pay-fairness analysis.
- Stateless: the /resolve route runs the pure resolveSources core and returns; it persists nothing beyond the worker_resolution schema's heartbeat. Each call is self-contained and reproducible from its inputs.
Visual — Tier B (typographic data-flow). uploaded HR source files → Worker Resolution (/resolve) → { keyed master identities · lineage · failures } → org-graph + downstream per-person analytics.
6. Commercialization / packaging
Worker Resolution is a service component, not a standalone product — it is the data-hygiene primitive that the toolbox's people-analytics offerings stand on. It sits upstream of the buyer-facing surfaces (segmentation, pay-fairness, leadership index) rather than being sold on its own.
- Data-license posture: no external licensed reference data is involved; the spoke operates only on the tenant's own uploaded HR files. The matching logic and alias map are toolbox IP lifted from a donor ETL; nothing here is vendor-survey-encumbered.
- Privacy note: Worker Resolution does not itself anonymize. PII detection, tokenization, and the min-N gate live in data-anonymizer. Worker Resolution resolves identities; the privacy layer is a separate, composable step.
- Pricing tiers and packaged offerings are (TBD): not earned yet, so not stated.
Visual — (TBD — product-tier placement diagram showing Worker Resolution as the upstream hygiene layer beneath the buyer-facing analytics surfaces).
7. The vision
One reliable keyed record per person from any pile of HR files — every merge auditable to its source, every dropped row explained, and the join logic configurable rather than buried in a macro.
The near-term direction is to close the parity loop with the donor (ETL_Pipeline.gs vendored for a literal diff so the alias map and vault rules are provably faithful), and to make the review-queue admission policy a real third behavior rather than a strict-anchor alias. Persisted runs, a reviewer UI for the failures roster, and tenant-scoped alias-map learning are natural extensions but are (TBD / not yet built).
8. Current status
Grounded in the real code state (contract 1.0.0, status: "live", src/spokes/worker-resolution/, CHANGELOG 1.0.0 — 2026-05-24):
- Shipped: the full two-pass self-healing join (resolveSources), comprising header normalization via the donor-mirrored alias map, scored identifier/lifecycle header picking, snowball learning index to a fixpoint, deterministic ladder merge, HRIS vault with exception allowlist + FILL_HOLES / OVERWRITE / IGNORE field actions, rehire-aware termination guard, column lineage, aliases-learned log, and a reason-coded failures roster. Live routes: POST /api/spokes/worker-resolution/resolve (service-key gated) and GET /api/spokes/worker-resolution/health. MCP tools worker-resolution.resolve and worker-resolution.health registered. Stateless: the worker_resolution schema carries a heartbeat only.
- Caveats / in flight: the review-queue admission policy is currently clamped to strict-anchor behavior (clampAdmission in core/resolve.ts); the donor Apps Script is not yet vendored for a literal alias-map diff (CHANGELOG note); registered consumers (performix, vela) are status: "planned", so no production consumer is wired yet.
- Planned / (TBD): persisted runs, a failures-roster reviewer surface, and tenant-scoped alias learning.
Visual — Tier A (live capture). GET /api/spokes/worker-resolution/health reports the spoke's real status, contract version, and schema reachability at request time.
The worked example used above is real fixture I/O from src/spokes/worker-resolution/integration/performix/fixtures/ (sample-request.json → sample-response.json). Two source files, an active-anchor HRIS row (Employee ID: E900, Email, Department: Finance) and a supplemental payroll row carrying only Email + Territory: EMEA, resolve to a single master identity. No shared ID existed between the files, so the merge matched on work email (methodCounts.work-email: 1), the supplemental Territory filled a hole on the anchor record (no vault block), and lineage records that Email appeared in both sources while Territory came only from payroll. No figure here is invented.