Worker Resolution — plain-language explainer

Worker Resolution takes the scattered, inconsistently-labeled HR spreadsheets that describe the same people and stitches them into one keyed record per person — without letting a stray tab overwrite payroll truth.

A People Analytics Toolbox component. Built to the portfolio Explainer Standard v1.0. Every claim below is grounded in the spoke's own code and contracts (src/spokes/worker-resolution/, contract 1.0.0, status: "live" in src/lib/contracts/registry.ts); anything not yet built is marked (TBD).


1. What is it?

Worker Resolution is a stateless identity-resolution service: hand it several HR source files — a core HRIS export, a terminations list, a survey extract, a payroll tab — and it returns one master identity per real human, the merged column values for that person, and a full account of how it got there.

The job it does is the one every people-analytics project quietly burns days on first: deciding that E900 in the HRIS, ada@performix.fixture in the survey tool, and "Ada L." in the comp planning sheet are the same employee — and merging their columns without clobbering the fields you can least afford to get wrong.

Visual — Tier A (live API capture). A real request/response pair from the spoke's own integration fixture (src/spokes/worker-resolution/integration/performix/fixtures/), the exact shape POST /api/spokes/worker-resolution/resolve returns:

POST /api/spokes/worker-resolution/resolve
{
  "runId": "performix-sample-run",
  "matchLadder": ["employee-id", "work-email", "worker-name"],
  "admissionPolicy": "strict-anchor",
  "sources": [
    { "sourceId": "anchor", "role": "anchor-active",
      "rows": [{ "Employee ID": "E900", "Email": "ada@performix.fixture", "Department": "Finance" }] },
    { "sourceId": "supplemental-payroll", "role": "supplemental",
      "rows": [{ "Email": "ada@performix.fixture", "Territory": "EMEA" }] }
  ]
}
→ {
    "masterIdentities": [
      { "Employee ID": "E900", "Email": "ada@performix.fixture",
        "Department": "Finance", "Territory": "EMEA" }
    ],
    "runStats": { "sourcesProcessed": 2, "rowsIn": 2,
                  "masterIdentitiesOut": 1, "failuresCount": 0,
                  "methodCounts": { "employee-id": 1, "work-email": 1, "worker-name": 0 } }
  }

(Real fixture I/O. Two files, two rows, one human — joined on work email when no shared ID existed.)
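
For readers who want to try the call, here is a minimal TypeScript sketch of how the request above might be assembled. The route and body fields come from the fixture; the base URL, the "x-service-key" header name, and the buildResolveCall helper are assumptions made for illustration, since this document only says the route is service-key gated.

```typescript
// Sketch only: field names mirror the fixture request shown above.
type SourceRole = "anchor-active" | "anchor-terminated" | "supplemental";

interface SourceFile {
  sourceId: string;
  role: SourceRole;
  rows: Record<string, string>[];
}

interface ResolveRequest {
  runId: string;
  matchLadder?: string[];
  admissionPolicy?: string;
  sources: SourceFile[];
}

const sampleRequest: ResolveRequest = {
  runId: "performix-sample-run",
  matchLadder: ["employee-id", "work-email", "worker-name"],
  admissionPolicy: "strict-anchor",
  sources: [
    {
      sourceId: "anchor",
      role: "anchor-active",
      rows: [{ "Employee ID": "E900", Email: "ada@performix.fixture", Department: "Finance" }],
    },
    {
      sourceId: "supplemental-payroll",
      role: "supplemental",
      rows: [{ Email: "ada@performix.fixture", Territory: "EMEA" }],
    },
  ],
};

// Hypothetical helper: builds the URL and fetch options without sending anything.
// The "x-service-key" header name is an assumption, not taken from the spoke.
function buildResolveCall(baseUrl: string, serviceKey: string, body: ResolveRequest) {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/api/spokes/worker-resolution/resolve`,
    init: {
      method: "POST",
      headers: { "content-type": "application/json", "x-service-key": serviceKey },
      body: JSON.stringify(body),
    },
  };
}
```

The result can be passed to fetch as fetch(call.url, call.init). Keeping the builder pure makes the request easy to inspect or snapshot before anything is sent.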

2. What problem does it solve — and why is it different?

The pain it removes: core HR dumps, surveys, exit lists, ops extracts, and comp-planning sheets all describe the same humans but disagree on column headers, key on different identifiers, and contradict each other — and the glue logic that reconciles them is usually a brittle spreadsheet macro that drifts silently and occasionally lets a stale tab overwrite a calibrated rating.

The difference, stated as a shift:

  • FROM a hand-maintained join macro where the matching logic, the header aliases, and the "don't overwrite pay" rules are all tangled together and invisible — and where a wrong merge surfaces months later as a misdiagnosed analytics problem.
  • TO a single deterministic call that returns the merged record plus the lineage of every column, every alias it learned, and a roster of every row it could not place and why — so the join is auditable instead of trusted-on-faith.

How it differs from the obvious substitutes:

  • vs. doing it by hand in a spreadsheet — Worker Resolution normalizes the headers (Worker ID / Person Number / EE ID all collapse to Employee ID), matches on a configurable identifier ladder, protects the fields you designate, and hands back a failures roster instead of dropping unmatched rows on the floor.
  • vs. a generic ETL / BI join — a generic join treats every source as equal and the last write wins. Worker Resolution ranks sources by role (anchors are trusted; supplementals fill gaps), and it refuses to let a supplemental file overwrite a vault-locked HR fact unless you explicitly allow it.

Visual — Tier B (FROM→TO typographic block). four disagreeing spreadsheets → Worker Resolution → { one keyed record per person · column lineage · aliases learned · failures roster }. A rendered comparison block is a follow-up (FU-A).

3. How does it work?

Inputs → method → outputs, concretely — the questions a practitioner actually asks:

  • Input: a ResolveRequest — a runId plus one or more SourceFiles, each carrying its rows, a role (anchor-active, anchor-terminated, or supplemental), and optional keyHints. Optional knobs: a matchLadder, an aliasMap, fieldConfigs, and an admissionPolicy.

  • Method — a two-pass self-healing join (resolveSources in core/resolve.ts):

    1. Header normalization. Every row's columns pass through the alias map (DEFAULT_WORKER_ALIAS_MAP, mirrored from the donor runMasterETL Apps Script plus common vendor synonyms) so Worker ID, Person Number, and Empl ID all resolve to one canonical Employee ID. Identifier and lifecycle columns are then picked per source by a lightweight scored header match (fuzzy-identifiers.ts), respecting any keyHints you supply.
    2. Pass 1 — learning index. Anchor sources bootstrap trusted identities (each gets a mk:<uuid> master key). Supplemental sources then teach the index new identifier edges — a payroll tab that knows an email→ID bridge the anchor never saw — iterating to a fixpoint (capped at 48 passes) so a chain of partial overlaps still converges. This is the snowball indexing step; it propagates bridges but never mints new identities unless you opt in.
    3. Pass 2 — deterministic merge. Each row is matched down the ladder (employee-id → work-email → worker-name, lowercased and trimmed) and merged onto its master, governed by an HRIS vault: anchor-written columns are locked, and a supplemental write to a locked column is blocked (and logged as a vault-blocked failure) unless that column is on the vault-exception allowlist or you set its fieldConfig action to OVERWRITE. The default action is FILL_HOLES — supplementals fill blanks, never overwrite. A rehire guard skips a stale termination overlay when the master's hire date is newer than the incoming termination date.
  • Output: a ResolveResponse — masterIdentities (one merged record per person), lineage (per column: where it was first seen and every source it appeared in), aliasesLearned, a failures roster (each with a reason: unidentifiable, unmatched-but-identified, vault-blocked, or policy-skipped), and runStats.
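
The header-normalization and ladder-matching steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in core/resolve.ts: the ALIASES table is a small hypothetical subset rather than DEFAULT_WORKER_ALIAS_MAP, the "Worker Name" column name is a guess, and both function names are invented here.

```typescript
type Row = Record<string, string>;
type LadderStep = "employee-id" | "work-email" | "worker-name";

// Hypothetical subset of an alias map; keys are lowercased, trimmed headers.
const ALIASES: Record<string, string> = {
  "employee id": "Employee ID",
  "worker id": "Employee ID",
  "person number": "Employee ID",
  "empl id": "Employee ID",
  "ee id": "Employee ID",
  "email": "Email",
  "work email": "Email",
};

// Assumed canonical column for each ladder step.
const STEP_COLUMN: Record<LadderStep, string> = {
  "employee-id": "Employee ID",
  "work-email": "Email",
  "worker-name": "Worker Name",
};

// Rename every header to its canonical form; unknown headers pass through trimmed.
function normalizeHeaders(row: Row): Row {
  const out: Row = {};
  for (const [header, value] of Object.entries(row)) {
    const canonical = ALIASES[header.trim().toLowerCase()] ?? header.trim();
    // First non-blank value wins when two headers collapse to one canonical name.
    if (out[canonical] === undefined || out[canonical] === "") out[canonical] = value;
  }
  return out;
}

// Produce the match keys a row offers, in ladder order, lowercased and trimmed.
function matchKeys(row: Row, ladder: LadderStep[]): { step: LadderStep; key: string }[] {
  const keys: { step: LadderStep; key: string }[] = [];
  for (const step of ladder) {
    const key = (row[STEP_COLUMN[step]] ?? "").trim().toLowerCase();
    if (key !== "") keys.push({ step, key });
  }
  return keys;
}
```

A row is then tried against the index with the first key in the returned list, falling through to the next key only when the earlier one finds no master.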

Science / provenance backing: the matching logic, alias map, vault-exception allowlist (TTC, compa-ratio, calibrated/manager ratings, promotion flag), and rehire guard are mirrored from a production Google Apps Script ETL pipeline (runMasterETL / ETL_Pipeline.gs) that ran real Fivetran-era HR workbooks — this is a hardened lift of working glue logic, not a from-scratch heuristic. (The donor script is not yet vendored in-repo for a literal diff; noted in the CHANGELOG.)
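
To make the vault and rehire-guard rules concrete, here is a hedged sketch of how a single supplemental write might be decided. The function names, the MasterRecord shape, the allowlist column spellings, and the handling of IGNORE and of blank locked cells are all assumptions; only the rule names (FILL_HOLES, OVERWRITE, IGNORE, vault-blocked) come from this document.

```typescript
type FieldAction = "FILL_HOLES" | "OVERWRITE" | "IGNORE";
type WriteOutcome = "written" | "kept-existing" | "ignored" | "vault-blocked";

interface MasterRecord {
  values: Record<string, string>;
  locked: Set<string>; // columns written by an anchor source
}

// Hypothetical column spellings standing in for the allowlist entries.
const VAULT_EXCEPTIONS: ReadonlySet<string> = new Set([
  "TTC", "Compa-Ratio", "Calibrated Rating", "Manager Rating", "Promotion Flag",
]);

function supplementalWrite(
  master: MasterRecord,
  column: string,
  value: string,
  action: FieldAction = "FILL_HOLES",
): WriteOutcome {
  if (action === "IGNORE") return "ignored"; // assumption: IGNORE drops the column
  const current = master.values[column];
  const isHole = current === undefined || current.trim() === "";
  if (isHole) {
    master.values[column] = value; // assumption: a blank cell is a hole even if locked
    return "written";
  }
  const mayOverwrite = action === "OVERWRITE" || VAULT_EXCEPTIONS.has(column);
  if (master.locked.has(column) && !mayOverwrite) return "vault-blocked";
  if (!mayOverwrite) return "kept-existing";
  master.values[column] = value;
  return "written";
}

// Rehire guard: a termination is stale when the master's hire date is newer.
function isStaleTermination(masterHireDate?: string, terminationDate?: string): boolean {
  if (!masterHireDate || !terminationDate) return false;
  const hire = Date.parse(masterHireDate);
  const term = Date.parse(terminationDate);
  if (Number.isNaN(hire) || Number.isNaN(term)) return false;
  return hire > term;
}
```

In this sketch a "vault-blocked" outcome is what would be recorded on the failures roster, while "kept-existing" is the quiet FILL_HOLES case where an unlocked column already had a value.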

Differentiation beat: the practitioner's real question isn't "did it join" — it's "can I trust this merged record, and prove how it was built?" The lineage, aliasesLearned, and reason-coded failures answer that directly: every cell traces to a source, and every row that didn't make it in says why.
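
As an illustration of how a consumer might read those audit outputs, the sketch below tallies failures by reason and traces one column back to its sources. The four reason codes are the ones listed above; the FailureEntry and ColumnLineage field names are assumptions, since the exact shapes live in contracts/types.ts and are not reproduced in this document.

```typescript
type FailureReason =
  | "unidentifiable"
  | "unmatched-but-identified"
  | "vault-blocked"
  | "policy-skipped";

// Assumed shapes for illustration only.
interface FailureEntry {
  sourceId: string;
  rowIndex: number;
  reason: FailureReason;
}

interface ColumnLineage {
  firstSeenIn: string;
  sources: string[];
}

// Count how many roster entries carry each reason code.
function countByReason(failures: FailureEntry[]): Record<FailureReason, number> {
  const counts: Record<FailureReason, number> = {
    "unidentifiable": 0,
    "unmatched-but-identified": 0,
    "vault-blocked": 0,
    "policy-skipped": 0,
  };
  for (const failure of failures) counts[failure.reason] += 1;
  return counts;
}

// Describe where one column of a master identity came from.
function traceColumn(lineage: Record<string, ColumnLineage>, column: string): string {
  const entry = lineage[column];
  if (entry === undefined) return `${column}: no lineage recorded`;
  return `${column}: first seen in ${entry.firstSeenIn}; appears in ${entry.sources.join(", ")}`;
}
```

Applied to the worked example, a trace of Territory would name only the payroll source, while Email would list both files.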

Visual — Tier B (admission-policy ladder). strict-anchor (the safe default — only anchor-known people get a record; unmatched supplemental rows go to the failures roster as unmatched-but-identified) → snowball-admit (lets standalone supplemental cohorts mint their own identities) → review-queue (currently clamped to strict-anchor behavior — see status). A rendered diagram is a follow-up (FU-A).
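
The ladder above can be sketched as a small decision function. This is a guess at the behavior described here, not the real clampAdmission in core/resolve.ts: both function names are hypothetical, and routing rows that carry no usable identifier to "unidentifiable" under every policy is an assumption.

```typescript
type AdmissionPolicy = "strict-anchor" | "snowball-admit" | "review-queue";
type EffectivePolicy = "strict-anchor" | "snowball-admit";
type Placement = "mint-identity" | "unmatched-but-identified" | "unidentifiable";

// review-queue (and an omitted policy) fall back to the safe default.
function clampAdmissionSketch(policy?: AdmissionPolicy): EffectivePolicy {
  return policy === "snowball-admit" ? "snowball-admit" : "strict-anchor";
}

// Decide what happens to a supplemental row that matched no existing master.
function placeUnmatchedRow(policy: AdmissionPolicy | undefined, hasIdentifier: boolean): Placement {
  if (!hasIdentifier) return "unidentifiable"; // assumption: nothing to key a record on
  return clampAdmissionSketch(policy) === "snowball-admit"
    ? "mint-identity"
    : "unmatched-but-identified";
}
```

Under this sketch, switching a run from strict-anchor to snowball-admit moves identified-but-unmatched rows off the failures roster and into new master identities, and leaves everything else unchanged.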

4. What does it enable?

Concrete uses a practitioner would recognize:

  • Build the analysis-ready employee table. Join core HRIS + exits + survey extracts into one keyed Golden Record before any segmentation, pay-fairness, or engagement work begins.
  • Protect pay and rating columns. Let planning tabs overwrite only the explicit allowlist (TTC, compa-ratios, calibrated and manager ratings, promotion flag) while every other HR fact stays vault-locked against accidental supplemental writes.
  • Reconcile rehires correctly. Automatically skip a stale termination row when a newer hire date proves the person was reactivated — no manual exit-list scrubbing.
  • Audit a contested merge. When a downstream number looks wrong, the lineage shows which source supplied each column and the failures roster shows exactly which rows were dropped and why.
  • Self-heal across partial overlaps. Resolve a person who shares only an email with one file and only a name with another — the snowball index bridges the chain rather than producing two half-records.
  • Feed the org graph. Emit clean, keyed identities that org-graph materializes into time-bounded reporting/cost edges.
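
The self-healing case in the list above (a person linked by email in one file and by name in another) depends on the learning index reaching a fixpoint. The sketch below shows one way such a loop could work, assuming rows are already reduced to normalized identifier keys. The learnIndex name, the key prefixes, and the data shapes are hypothetical; the 48-pass cap is the figure quoted in section 3.

```typescript
interface AnchorIdentity {
  masterKey: string; // e.g. "mk:<uuid>" in the real service
  keys: string[];    // normalized identifier keys for this person
}

// Build a key -> masterKey index, letting supplemental rows teach new bridges.
function learnIndex(
  anchors: AnchorIdentity[],
  supplementalRows: string[][],
  maxPasses = 48,
): Map<string, string> {
  const index = new Map<string, string>();
  for (const anchor of anchors) {
    for (const key of anchor.keys) {
      if (!index.has(key)) index.set(key, anchor.masterKey);
    }
  }
  for (let pass = 0; pass < maxPasses; pass++) {
    let learned = false;
    for (const keys of supplementalRows) {
      const known = keys.find((key) => index.has(key));
      if (known === undefined) continue; // no bridge yet; never mint here
      const masterKey = index.get(known) as string;
      for (const key of keys) {
        if (!index.has(key)) {
          index.set(key, masterKey);
          learned = true;
        }
      }
    }
    if (!learned) break; // fixpoint: a full pass taught nothing new
  }
  return index;
}
```

Note that the order of supplemental rows does not change the final index, only the number of passes needed: a row whose keys are all unknown on the first pass is picked up on a later pass once another row has bridged one of them.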

Visual — (TBD — a rendered lineage-trace view for a single master identity, columns → source provenance).

5. How it fits in the toolbox

Data flow:

  • Consumes — uploaded HRIS / survey / payroll / exit / planning files (rows + role + optional key hints). No external reference datasets (no BLS, O*NET, NAICS, or Principia priors); the resolution logic is intrinsic and the only "prior" is the alias map and vault-exception allowlist lifted from the donor ETL.
  • Emits — a ResolveResponse contract (master identities + lineage + aliases learned + failures + run stats). Consumers vendor src/spokes/worker-resolution/contracts/types.ts (contract 1.0.0).
  • Feeds — org-graph, which (per the registry) materializes org nodes and typed temporal edges "for (tenantId, snapshotId) from worker-resolution rows." Clean keyed identities are the upstream precondition for any spoke that needs one row per person: segmentation, manager-effectiveness rollups, pay-fairness analysis.
  • Stateless — the /resolve route runs the pure resolveSources core and returns; it persists nothing beyond the worker_resolution schema's heartbeat. Each call is self-contained and reproducible from its inputs.

Visual — Tier B (typographic data-flow). uploaded HR source files → Worker Resolution (/resolve) → { keyed master identities · lineage · failures } → org-graph + downstream per-person analytics.

6. Commercialization / packaging

Worker Resolution is a service component, not a standalone product — it is the data-hygiene primitive that the toolbox's people-analytics offerings stand on. It sits upstream of the buyer-facing surfaces (segmentation, pay-fairness, leadership index) rather than being sold on its own.

  • Data-license posture: no external licensed reference data is involved — the spoke operates only on the tenant's own uploaded HR files. The matching logic and alias map are toolbox IP lifted from a donor ETL; nothing here is vendor-survey-encumbered.
  • Privacy note: Worker Resolution does not itself anonymize — PII detection, tokenization, and the min-N gate live in data-anonymizer. Worker Resolution resolves identities; the privacy layer is a separate, composable step.
  • Pricing tiers and packaged offerings are (TBD) — not earned yet, so not stated.

Visual — (TBD — product-tier placement diagram showing Worker Resolution as the upstream hygiene layer beneath the buyer-facing analytics surfaces).

7. The vision

One reliable keyed record per person from any pile of HR files — every merge auditable to its source, every dropped row explained, and the join logic configurable rather than buried in a macro.

The near-term direction is to close the parity loop with the donor (ETL_Pipeline.gs vendored for a literal diff so the alias map and vault rules are provably faithful), and to make the review-queue admission policy a real third behavior rather than a strict-anchor alias. Persisted runs, a reviewer UI for the failures roster, and tenant-scoped alias-map learning are natural extensions but are (TBD / not yet built).

8. Current status

Grounded in the real code state (contract 1.0.0, status: "live", src/spokes/worker-resolution/, CHANGELOG 1.0.0 — 2026-05-24):

  • Shipped: the full two-pass self-healing join (resolveSources) — header normalization via the donor-mirrored alias map, scored identifier/lifecycle header picking, snowball learning index to a fixpoint, deterministic ladder merge, HRIS vault with exception allowlist + FILL_HOLES/OVERWRITE/IGNORE field actions, rehire-aware termination guard, column lineage, aliases-learned log, and a reason-coded failures roster. Live routes: POST /api/spokes/worker-resolution/resolve (service-key gated) and GET /api/spokes/worker-resolution/health. MCP tools worker-resolution.resolve and worker-resolution.health registered. Stateless — worker_resolution schema carries a heartbeat only.
  • Caveats / in flight: the review-queue admission policy is currently clamped to strict-anchor behavior (clampAdmission in core/resolve.ts); the donor Apps Script is not yet vendored for a literal alias-map diff (CHANGELOG note); registered consumers (performix, vela) are status: "planned", so no production consumer is wired yet.
  • Planned / (TBD): persisted runs, a failures-roster reviewer surface, and tenant-scoped alias learning.

Visual — Tier A (live capture). GET /api/spokes/worker-resolution/health reports the spoke's real status, contract version, and schema reachability at request time.


Worked example used above is real fixture I/O from src/spokes/worker-resolution/integration/performix/fixtures/ (sample-request.json → sample-response.json): two source files — an active-anchor HRIS row (Employee ID: E900, Email, Department: Finance) and a supplemental payroll row carrying only Email + Territory: EMEA — resolve to a single master identity. No shared ID existed between the files, so the merge matched on work email (methodCounts.work-email: 1), the supplemental Territory filled a hole on the anchor record (no vault block), and lineage records that Email appeared in both sources while Territory came only from payroll. No figure here is invented.