Job Family Agent — plain-language explainer

Job Family Agent is the toolbox's canonical answer to "what is this job?" — it turns a messy job title or description into a standard occupation code, a job family, and (through the JobFrame canon) a governed Family × Focus × Level profile.

A People Analytics Toolbox component, consumed from meta-factory. The job-family capability was originally built in meta-factory-prod and folded into the toolbox as a live spoke on 2026-05-15 (PAT-71); the toolbox is now its canonical home. Built to the portfolio Explainer Standard v1.0. Every claim below is grounded in the spoke's own code and contracts (src/spokes/job-family-agent/, contract 1.2.0); anything not yet built is marked (TBD).


1. What is it?

Job Family Agent is a job-taxonomy service: give it a job title or a chunk of a job description, and it tells you which standard occupation it maps to, which job family and function it belongs to, and — for the deeper layer — which governed Family × Focus × Level profile best describes it.

It carries two things behind one HTTP + MCP surface. First, a canonical occupation registry: 1,016 SOC codes from O*NET 28.3 / SOC 2018, 23 job families, and 26 job functions, plus a classifier that resolves free text against them. Second, the JobFrame canon — a governed work-intelligence model where the central object is a profile keyed FAMILY.FOCUS.LEVEL (for example SWE.GEN.P6), with title aliases, components, and a mapping loop that resolves observed titles to those profiles.

Visual — Tier A (live core capture). A real call to the spoke's classifyJobText core (heuristic classifier), grounded in the bundled occupation registry:

POST /api/spokes/job-family-agent/classify
{ "text": "Senior Backend Software Engineer building distributed systems in Go",
  "context": { "industry": "Software" } }
→ {
    "jobFamily":   { "id": "jf.computer-mathematical", "confidence": 1 },
    "jobFunction": { "id": "jfn.engineering",          "confidence": 1 },
    "socMatches":  [ { "socCode": "15-1299.02", "confidence": 1 }, ... ]
  }

(Real output of classifyJobText against data/occupation_registry.json, O*NET 28.3. The full eight-SOC ranking — and what it reveals — is the worked example below.)

2. What problem does it solve — and why is it different?

The pain it removes: every HR system spells the same job a different way, and none of them agree with the labor-market data you want to join to. "Sr. SWE II," "Staff Engineer," and "Backend Developer" describe the same occupation, yet the OEWS wage files, your survey vendor, and your HRIS each file them under a different label. Without a canonical spine, every cross-source join is a hand-built crosswalk that rots.

The difference, stated as a shift:

  • FROM free-text titles that don't reconcile — one per system, none mappable to market data, every join rebuilt by hand.
  • TO a single canonical address for each job — a SOC code, a family/function, and a FAMILY.FOCUS.LEVEL profile key — that every other spoke and every external data source can join on.

How it differs from the obvious substitutes:

  • vs. doing it by hand in spreadsheets — manual title-to-SOC mapping across thousands of HRIS rows is the single most tedious task in job architecture; Job Family Agent resolves the canonical occupation and family in single-digit milliseconds and keeps the mapping auditable.
  • vs. a generic BI tool or a one-off LLM prompt — those give you an answer with no governed taxonomy behind it and no stable id to join on. Job Family Agent's outputs are addresses into a canon (jf.computer-mathematical, SWE.GEN.P6) that the rest of the toolbox already speaks, and it is honest that today's classifier is a heuristic, not ML (see §3).

Visual — Tier B (FROM→TO typographic block). The shift above is the visual; a rendered comparison block is a follow-up.

3. How does it work?

Inputs → method → outputs, concretely, across the two layers:

  • Input: free text (a title and/or a job-description snippet, with optional industry/level context) for the classifier; or an observed title for the JobFrame resolve-title loop.
  • Method (taxonomy classifier): classifyJobText is a deliberately transparent token-overlap heuristic, not an ML or LLM classifier. It (1) detects any literal SOC code in the text via regex (an exact match scores 1.0 and overrides the rest); (2) scores the input's token set against all 1,016 SOC title+description blobs; and (3) scores it against the 23 family and 26 function blobs (name + definition + alternative strings). It returns the top-8 SOC candidates plus the best family and function.
  • Method (JobFrame canon): an observed title is normalized and matched against the canon's title-alias index to ranked Family × Focus × Level profile candidates, each with a confidence band and a recommendedAction (auto_accept | needs_review | ask_clarification | insufficient_evidence). Profiles can also be assembled top-down (construct) or exported as JSON/Markdown (export).
  • Output: for classify, a ClassifyResponse (socMatches[] + jobFamily + jobFunction); for the canon, a ResolveTitleResponse, a CanonicalProfile, or a JobExport.
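The three classifier steps can be sketched in a few dozen lines. This is a minimal illustration, not the spoke's code: classifySketch, rankByOverlap, the Candidate shape, the tokenizer, and the scoring formula (share of input tokens found in a candidate's blob, normalised so the best candidate scores 1.0) are all assumptions. The regex assumes O*NET-SOC codes of the form 15-1252.00.

```typescript
// Hypothetical sketch of a transparent token-overlap classifier in the spirit
// of classifyJobText. Tokenizer, scoring formula and names are assumptions.

interface Candidate {
  id: string;   // SOC code, or a jf.* / jfn.* id
  blob: string; // SOC title + description, or name + definition + alternatives
}

interface Scored {
  id: string;
  confidence: number;
}

// An O*NET-SOC code such as "15-1252.00" appearing literally in the input.
const SOC_CODE = /\b\d{2}-\d{4}\.\d{2}\b/;

function tokenize(text: string): Set<string> {
  // Lowercase, split on anything that is not a letter or digit, drop 1-char noise.
  return new Set(text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 1));
}

// Score = share of input tokens present in the candidate's blob, then
// normalised so the best-scoring candidate reports 1.0.
function rankByOverlap(text: string, candidates: Candidate[], topN: number): Scored[] {
  const input = tokenize(text);
  if (input.size === 0) return [];
  const raw: Scored[] = [];
  for (const c of candidates) {
    const blob = tokenize(c.blob);
    let hits = 0;
    input.forEach((t) => {
      if (blob.has(t)) hits += 1;
    });
    // Candidates sharing no tokens with the input are dropped entirely.
    if (hits > 0) raw.push({ id: c.id, confidence: hits / input.size });
  }
  if (raw.length === 0) return [];
  const best = Math.max(...raw.map((r) => r.confidence));
  return raw
    .map((r) => ({ id: r.id, confidence: r.confidence / best }))
    .sort((a, b) => b.confidence - a.confidence)
    .slice(0, topN);
}

function classifySketch(
  text: string,
  soc: Candidate[],
  families: Candidate[],
  functions: Candidate[],
): { socMatches: Scored[]; jobFamily: Scored | null; jobFunction: Scored | null } {
  // Step 1: a literal SOC code that exists in the registry is an exact match
  // and overrides the token scoring.
  const literal = SOC_CODE.exec(text);
  const code = literal === null ? null : literal[0];
  const socMatches =
    code !== null && soc.some((c) => c.id === code)
      ? [{ id: code, confidence: 1 }]
      : // Step 2: otherwise rank every SOC blob and keep the top 8.
        rankByOverlap(text, soc, 8);
  // Step 3: best family and best function by the same overlap score.
  const jobFamily = rankByOverlap(text, families, 1)[0] ?? null;
  const jobFunction = rankByOverlap(text, functions, 1)[0] ?? null;
  return { socMatches, jobFamily, jobFunction };
}
```

Because every score is a plain token count, any surprising rank can be explained by listing the shared tokens. That auditability is what the heuristic buys, at the cost of the SOC-ranking noise visible in the worked example.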

Real data sources and the science backing. The occupation registry is O*NET 28.3 / SOC 2018 (1,016 codes, updated_at: 2025-11-30), licensed O*NET CC BY 4.0 + SOC public domain. The 23 families and 26 functions are the toolbox's canonical segmentation library, each with a verbatim definition and alternative-title strings. The JobFrame canon was rebuilt data-first from a deep-research corpus plus segmentation datasets (1,184 family×level profiles, verbatim content), with the Universal Level Framework (S1–S5, P1–P8, M1–M6, E1–E6) and source-level mappings drawn from Radford / Mercer / WTW ladders — not invented (docs/jobframe/REBUILD-PLAN.md).
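The profile key format and the Universal Level Framework ranges are concrete enough to validate mechanically. This is a hypothetical sketch. The function parseProfileKey and its rules are inferred from the example key SWE.GEN.P6 and the S1–S5, P1–P8, M1–M6, E1–E6 ranges, not taken from contracts/canonical-profile.ts. The rules are: FAMILY and FOCUS are uppercase alphanumeric segments, followed by a track letter S, P, M or E, followed by a rank within that track's range.

```typescript
// Hypothetical validator for FAMILY.FOCUS.LEVEL profile keys (e.g. "SWE.GEN.P6").
// Track maxima follow the Universal Level Framework ranges: S1-S5, P1-P8, M1-M6, E1-E6.
const LEVEL_TRACK_MAX: Record<string, number> = { S: 5, P: 8, M: 6, E: 6 };

interface ProfileKey {
  family: string; // e.g. "SWE"
  focus: string;  // e.g. "GEN"
  track: string;  // ULF track letter: S, P, M or E
  rank: number;   // 1-based rank within the track
}

// Assumption: family and focus segments are uppercase alphanumeric codes.
const KEY_PATTERN = /^([A-Z0-9]+)\.([A-Z0-9]+)\.([SPME])(\d+)$/;

function parseProfileKey(key: string): ProfileKey | null {
  const m = KEY_PATTERN.exec(key.trim());
  if (m === null) return null;
  const [, family, focus, track, rankText] = m;
  const rank = Number(rankText);
  // Reject ranks outside the track's ULF range (e.g. P9 or S0).
  if (rank < 1 || rank > LEVEL_TRACK_MAX[track]) return null;
  return { family, focus, track, rank };
}
```

For example, parseProfileKey("SWE.GEN.P6") yields family SWE, focus GEN, track P, rank 6, while "SWE.GEN.P9" is rejected because the P track stops at P8.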

Differentiation beat. The practitioner's real question is not "give me one answer" but "can I trust and defend this mapping?" Job Family Agent answers honestly on two fronts: the classifier is labelled a heuristic (not dressed up as AI), and the JobFrame loop never silently auto-accepts an ambiguous title — its recommendedAction routes thin evidence to human review rather than guessing.
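One way to read "never silently auto-accepts" is as a routing rule over the ranked candidates. The sketch below uses the four recommendedAction values from the contract. The names recommendAction and ProfileCandidate, along with every threshold, are hypothetical. They are chosen only to show that a near-tie can never reach auto_accept.

```typescript
// Hypothetical routing rule for resolve-title candidates. The four action
// names come from the contract; every threshold below is an assumption.
type RecommendedAction =
  | "auto_accept"
  | "needs_review"
  | "ask_clarification"
  | "insufficient_evidence";

interface ProfileCandidate {
  profileKey: string; // FAMILY.FOCUS.LEVEL, e.g. "SWE.GEN.P6"
  confidence: number; // 0..1
}

const MIN_EVIDENCE = 0.3;  // below this, nothing is worth proposing
const AUTO_ACCEPT = 0.9;   // top candidate must be at least this confident...
const CLEAR_MARGIN = 0.15; // ...and at least this far ahead of the runner-up
const NEAR_TIE = 0.05;     // closer than this, ask which profile is meant

function recommendAction(candidates: ProfileCandidate[]): RecommendedAction {
  const ranked = [...candidates].sort((a, b) => b.confidence - a.confidence);
  if (ranked.length === 0 || ranked[0].confidence < MIN_EVIDENCE) {
    return "insufficient_evidence";
  }
  const top = ranked[0].confidence;
  const runnerUp = ranked.length > 1 ? ranked[1].confidence : 0;
  const margin = top - runnerUp;
  // Only a confident, clearly separated winner skips human review.
  if (top >= AUTO_ACCEPT && margin >= CLEAR_MARGIN) return "auto_accept";
  if (margin < NEAR_TIE) return "ask_clarification";
  return "needs_review";
}
```

Consider illustrative candidates at 0.95 and 0.93. They return ask_clarification even though the top score clears the auto-accept bar, because the runner-up is too close.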

Visual — Tier A (live core capture, the full ranking). See the worked example below — it shows the heuristic's real, imperfect SOC ranking and why the family/function answer is the trustworthy part today.

4. What does it enable?

Concrete uses a practitioner would recognize:

  • Normalize an HRIS title column — turn thousands of free-text titles into canonical SOC codes + families so downstream analysis isn't fighting spelling variants.
  • Join people data to labor-market data — a SOC code is the spine wage-benchmark and wage-compliance already use, so a resolved title flows straight into market pay and the legal floor.
  • Build a governed job architecture — assemble draft profiles top-down from Family × Focus × Level (construct) instead of writing every job description from a blank page.
  • Bulk-map an HRIS dataset with review — run a mapping pass, get ranked candidates per row, and accept-or-correct each into a tenant overlay (the HRIS mapping routes).
  • Resolve one messy title on demand: resolve-title takes "Sr. SWE II" and returns ranked profile candidates with a recommended action, over HTTP or as an MCP tool.
  • Segment by what-work-someone-does — feed the canonical family/function ids into segmentation-studio so cohorts are built on a real job taxonomy, not org-chart proxies.
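The first use, normalizing an HRIS title column, mostly comes down to classifying each distinct title once and fanning the result back out to every row. Here is a sketch under stated assumptions. The names normalizeTitleColumn and titleKey are hypothetical, as is the synchronous classify callback. The callback stands in for whichever client calls classify: HTTP, MCP, or the in-process core. The result shape mirrors only the fields in the sample ClassifyResponse.

```typescript
// Hypothetical batch helper: normalise an HRIS title column by classifying
// each distinct title once. `classify` stands in for any client.

interface Match {
  id: string;
  confidence: number;
}

interface ClassifyResult {
  socMatches: { socCode: string; confidence: number }[];
  jobFamily: Match;
  jobFunction: Match;
}

interface NormalizedRow {
  rawTitle: string;
  familyId: string;
  functionId: string;
  socCandidates: string[]; // candidates for review, not a final answer
}

// Collapse case and whitespace noise that should not cost a second call.
function titleKey(title: string): string {
  return title.trim().toLowerCase().replace(/\s+/g, " ");
}

function normalizeTitleColumn(
  titles: string[],
  classify: (text: string) => ClassifyResult,
): { rows: NormalizedRow[]; classifierCalls: number } {
  const cache = new Map<string, ClassifyResult>();
  const rows = titles.map((rawTitle) => {
    const key = titleKey(rawTitle);
    let result = cache.get(key);
    if (result === undefined) {
      result = classify(rawTitle);
      cache.set(key, result);
    }
    return {
      rawTitle,
      familyId: result.jobFamily.id,
      functionId: result.jobFunction.id,
      socCandidates: result.socMatches.map((m) => m.socCode),
    };
  });
  // One classifier call was made per distinct cache key.
  return { rows, classifierCalls: cache.size };
}
```

Deduplicating before calling also helps a large column stay inside the public endpoint's 100 req/min budget. An HTTP client would make classify asynchronous, but the caching logic is the same.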

Visual — Tier B (typographic capability flow). messy title / JD → { classify · resolve-title } → { SOC code · jf.* family · jfn.* function · FAMILY.FOCUS.LEVEL profile } → joins into wage / segmentation / comp.

5. How it fits in the toolbox

Data flow and contracts:

  • Consumes — bundled O*NET 28.3 / SOC 2018 occupation registry, the canonical job-family / job-function segmentation library, and the JobFrame deep-research corpus (all in-repo under data/). No live external API on the hot path; the data is bundled JSON loaded at module init, so classify runs without I/O.
  • Emits — the HTTP + MCP contract at src/spokes/job-family-agent/contracts/types.ts (taxonomy + classify) and contracts/canonical-profile.ts (the JobFrame canon shapes). Consumers (Performix, vela, future apps) vendor a copy; they never import at runtime.
  • Feeds: wage-benchmark and wage-compliance (SOC is the join key for market pay and the legal floor); segmentation-studio (its canonical job-family / job-function dimensions reference these jf.* / jfn.* ids directly, no translation layer); anycomp (comp models keyed on SOC look up the occupation record + alternative titles); and reincarnation (a future cache use case for scoping items per family).
  • Consumers — Performix (PFX-5, in progress: post-PAT-71 it calls this toolbox for job-family classification, not meta-factory) and vela (planned).
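Consumers vendor the contract rather than importing it, so a consumer-side copy typically pairs the type with a runtime check at the boundary. Below is a hypothetical consumer copy. It covers only the fields visible in the sample response (jobFamily, jobFunction, socMatches); the real contracts/types.ts at 1.2.0 may define more. The jf. and jfn. prefix checks follow the id conventions used throughout this explainer, and parseClassifyResponse is an illustrative name.

```typescript
// Hypothetical consumer-side copy of the classify response shape, limited to
// the fields visible in the sample output. Extra fields are ignored.

interface IdWithConfidence {
  id: string;
  confidence: number;
}

interface SocMatch {
  socCode: string;
  confidence: number;
}

interface ClassifyResponse {
  jobFamily: IdWithConfidence;
  jobFunction: IdWithConfidence;
  socMatches: SocMatch[];
}

const SOC_PATTERN = /^\d{2}-\d{4}\.\d{2}$/; // e.g. "15-1252.00"

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null;
}

function isConfidence(v: unknown): boolean {
  return typeof v === "number" && v >= 0 && v <= 1;
}

function isIdWithConfidence(v: unknown, prefix: string): v is IdWithConfidence {
  if (!isRecord(v)) return false;
  const id = v["id"];
  return typeof id === "string" && id.startsWith(prefix) && isConfidence(v["confidence"]);
}

function isSocMatch(v: unknown): v is SocMatch {
  if (!isRecord(v)) return false;
  const code = v["socCode"];
  return typeof code === "string" && SOC_PATTERN.test(code) && isConfidence(v["confidence"]);
}

// Validate untrusted JSON against the vendored shape before using it.
function parseClassifyResponse(json: string): ClassifyResponse {
  const data: unknown = JSON.parse(json);
  if (!isRecord(data)) throw new Error("response is not an object");
  const matches = data["socMatches"];
  const ok =
    isIdWithConfidence(data["jobFamily"], "jf.") &&
    isIdWithConfidence(data["jobFunction"], "jfn.") &&
    Array.isArray(matches) &&
    matches.every(isSocMatch);
  if (!ok) throw new Error("response does not match the vendored ClassifyResponse shape");
  return data as unknown as ClassifyResponse;
}
```

Validating at the boundary means that if the contract version moves and the consumer's copy falls behind, the result is a loud parse error rather than a silent bad join.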

Visual — Tier B (typographic data-flow). O*NET/SOC registry + family/function library + JobFrame corpus → Job Family Agent → { wage-benchmark · wage-compliance · segmentation-studio · anycomp }.

6. Commercialization / packaging

Job Family Agent is a service component, not a standalone product — it is the canonical job spine that compensation, planning, and segmentation offerings compose on top of. It sits behind buyer-facing surfaces (job architecture, pay tooling, the JobFrame canon UI) rather than being sold on its own.

  • Data-license posture: the occupation registry derives from O*NET (CC BY 4.0) and SOC (public domain) — usable with attribution, which is what lets the SOC layer be exposed openly. The toolbox/meta-factory editorial enrichments (family/function definitions, alternative strings) are first-party. The JobFrame canon separates the toolbox's own structure and definitions from any vendor-derived content; vendor-survey level ladders (Radford / Mercer / WTW) inform the mappings and carry their own licensing constraints handled in the canon's provenance.
  • Anything about pricing tiers or packaged offerings is (TBD) — not earned yet, so not stated.

Visual — (TBD — product-tier placement diagram).

7. The vision

One canonical address for every job — resolved from any messy title, joinable to any labor-market or people dataset, and governed as a living Family × Focus × Level canon rather than a frozen spreadsheet.

The arc runs from the shipped taxonomy + heuristic classifier toward the full JobFrame work-intelligence vision: the deep-research corpus fully seeded into the canon, the classifier upgraded from token-overlap heuristic to LLM-backed matching (PAT-67), the HRIS mapping loop closing review→overlay at scale, and a public "Google-Finance-for-jobs" surface on top. The governing discipline is data-first: build the canon from real corpora and source ladders, generate only at the edges. Full vision, build plan, and roadmap: docs/jobframe/.

Visual — (TBD — the JobFrame canon → mapping-loop → public-surface roadmap diagram).

8. Current status

Grounded in the real code state (registry status: "live", contract 1.2.0, src/spokes/job-family-agent/):

  • Shipped: the canonical occupation registry (1,016 SOC codes, O*NET 28.3), 23 families, 26 functions; the token-overlap heuristic classifier (classify, public + IP-rate-limited at 100 req/min); the full GET taxonomy surface (soc/list, soc/[code], families/list, families/[id], functions/list, functions/[id]); the JobFrame canon contracts + schema; resolve-title, construct, and export; the HRIS import / bulk-mapping / mapping-decision and single-JD analyze routes; JobFrame UI surfaces (/jobframe/canon, /mapping, /jd, /construct). MCP transport mirrors the HTTP surface (9 tools registered). The canon was rebuilt data-first and the level semantics corrected (P1 Entry … P8 Fellow) in the 1.2.0 bump.
  • In flight / planned: PAT-67 LLM-backed classifier (the heuristic is explicitly v1); full deep-research corpus seed promoted from draft to validated (294 net-new families · 415 focuses · 1,185 profiles built but operator-applied, all status: draft); the job_family_agent Postgres tables are reserved as a future cache surface (heartbeat-only seed today; data lives as bundled JSON); the public JobFrame site and governance console.
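The public classify route is described as IP-rate-limited at 100 req/min, but the limiting algorithm is not stated. A fixed-window counter is one simple way to enforce that budget. FixedWindowLimiter and its injected clock (nowMs) are hypothetical and should not be read as the spoke's implementation.

```typescript
// Hypothetical fixed-window limiter: by default 100 requests per IP per
// 60-second window, with the window starting at each IP's first request.

interface WindowState {
  windowStart: number; // ms timestamp when this IP's current window opened
  count: number;       // requests seen in the current window
}

class FixedWindowLimiter {
  private readonly windows = new Map<string, WindowState>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit = 100, windowMs = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed, false if the IP is over budget.
  allow(ip: string, nowMs: number): boolean {
    const state = this.windows.get(ip);
    if (state === undefined || nowMs - state.windowStart >= this.windowMs) {
      // First request from this IP, or its previous window has expired.
      this.windows.set(ip, { windowStart: nowMs, count: 1 });
      return true;
    }
    if (state.count >= this.limit) return false;
    state.count += 1;
    return true;
  }
}
```

A fixed window lets a client burst up to roughly twice the limit across a window boundary, and this in-memory map grows with the number of distinct IPs. A production limiter would typically use a sliding window or token bucket and evict idle entries.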

Visual — Tier A (live capture). GET /api/spokes/job-family-agent/health and /families/list report the real shipped taxonomy at request time.


Worked example (load-bearing, real output)

Input — a realistic HRIS title that no SOC code spells the same way:

classifyJobText({
  text: "Senior Backend Software Engineer building distributed systems in Go",
  context: { industry: "Software" }
})

Real output of the spoke's classifyJobText core against data/occupation_registry.json (O*NET 28.3), titles resolved from the same registry:

  • Job family: jf.computer-mathematical (confidence 1.0) — correct.
  • Job function: jfn.engineering (confidence 1.0) — correct.
  • Top-8 SOC matches (in ranked order):
    1. 15-1299.02 Geographic Information Systems Technologists and Technicians — 1.0
    2. 47-4011.01 Energy Auditors — 0.978
    3. 11-3021.00 Computer and Information Systems Managers — 0.938
    4. 15-1253.00 Software Quality Assurance Analysts and Testers — 0.804
    5. 15-2021.00 Mathematicians — 0.677
    6. 15-1252.00 Software Developers — 0.653
    7. 37-2019.00 Building Cleaning Workers, All Other — 0.65
    8. 31-2011.00 Occupational Therapy Assistants — 0.642

What a practitioner does with it — and what it honestly shows. The family and function are dead-on, and that is the part you join on for wage and segmentation work. The SOC ranking, by contrast, is noisy: the genuinely correct code, 15-1252.00 Software Developers, lands sixth, behind a GIS technologist and an energy auditor that share surface tokens ("systems," "building," "distributed"). This is exactly the documented limitation — the classifier is a token-overlap heuristic, not ML — and the explainer shows the real, imperfect ranking rather than a cherry-picked clean one. The practitioner takeaway: today, trust the family/function layer for joins, treat the SOC ranking as candidates for review, and route ambiguous titles through the JobFrame resolve-title loop (which returns a recommendedAction) rather than auto-accepting position #1. The PAT-67 LLM upgrade is the queued fix for the SOC-ranking noise. No figure here is invented — every confidence and title is real output of the shipped core.
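The practitioner takeaway can be checked directly against the eight matches above. The small sketch below copies the ranking from the worked example. The helpers rankOf and topMargin, and the 0.05 ambiguity threshold, are hypothetical choices for illustration, not part of the spoke.

```typescript
interface SocMatch {
  socCode: string;
  confidence: number;
}

// The eight SOC matches from the worked example, in ranked order.
const ranking: SocMatch[] = [
  { socCode: "15-1299.02", confidence: 1.0 },
  { socCode: "47-4011.01", confidence: 0.978 },
  { socCode: "11-3021.00", confidence: 0.938 },
  { socCode: "15-1253.00", confidence: 0.804 },
  { socCode: "15-2021.00", confidence: 0.677 },
  { socCode: "15-1252.00", confidence: 0.653 },
  { socCode: "37-2019.00", confidence: 0.65 },
  { socCode: "31-2011.00", confidence: 0.642 },
];

const AMBIGUITY_MARGIN = 0.05; // hypothetical review threshold

// 1-based position of a code in the ranking, or -1 if it is absent.
function rankOf(code: string): number {
  const i = ranking.findIndex((m) => m.socCode === code);
  return i === -1 ? -1 : i + 1;
}

// Gap between the first and second candidates (assumes descending order).
function topMargin(matches: SocMatch[]): number {
  if (matches.length === 0) return 0;
  if (matches.length === 1) return matches[0].confidence;
  return matches[0].confidence - matches[1].confidence;
}

const ambiguous = topMargin(ranking) < AMBIGUITY_MARGIN;
console.log(
  `15-1252.00 ranks #${rankOf("15-1252.00")}; top margin ${topMargin(ranking).toFixed(3)}; ambiguous: ${ambiguous}`,
);
// prints: 15-1252.00 ranks #6; top margin 0.022; ambiguous: true
```

A 0.022 gap between a GIS technologist and an energy auditor is exactly the kind of near-tie that should route to review rather than auto-accept.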