Job Family Agent — plain-language explainer
Job Family Agent is the toolbox's canonical answer to "what is this job?" — it turns a messy job title or description into a standard occupation code, a job family, and (through the JobFrame canon) a governed Family × Focus × Level profile.
A People Analytics Toolbox component, consumed from meta-factory. The job-family capability was originally built in meta-factory-prod and folded into the toolbox as a live spoke on 2026-05-15 (PAT-71); the toolbox is now its canonical home. Built to the portfolio Explainer Standard v1.0. Every claim below is grounded in the spoke's own code and contracts (src/spokes/job-family-agent/, contract 1.2.0); anything not yet built is marked (TBD).
1. What is it?
Job Family Agent is a job-taxonomy service: give it a job title or a chunk of a job description, and it tells you which standard occupation it maps to, which job family and function it belongs to, and — for the deeper layer — which governed Family × Focus × Level profile best describes it.
It carries two things behind one HTTP + MCP surface. First, a canonical occupation registry: 1,016 SOC codes from O*NET 28.3 / SOC 2018, 23 job families, and 26 job functions, plus a classifier that resolves free text against them. Second, the JobFrame canon — a governed work-intelligence model where the central object is a profile keyed FAMILY.FOCUS.LEVEL (for example SWE.GEN.P6), with title aliases, components, and a mapping loop that resolves observed titles to those profiles.
Visual — Tier A (live core capture). A real call to the spoke's classifyJobText core (heuristic classifier), grounded in the bundled occupation registry:
POST /api/spokes/job-family-agent/classify
{ "text": "Senior Backend Software Engineer building distributed systems in Go",
"context": { "industry": "Software" } }
→ {
"jobFamily": { "id": "jf.computer-mathematical", "confidence": 1 },
"jobFunction": { "id": "jfn.engineering", "confidence": 1 },
"socMatches": [ { "socCode": "15-1299.02", "confidence": 1 }, ... ]
}
(Real output of classifyJobText against data/occupation_registry.json, O*NET 28.3. The full eight-SOC ranking — and what it reveals — is the worked example below.)
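A consumer outside the toolbox could issue the classify call above from TypeScript along these lines. This is a minimal sketch. The route and the field names (text, context.industry, jobFamily, jobFunction, socMatches, socCode, confidence) come from the capture. The base URL, the "level" context key, the helper names, and the local type names are assumptions, not the spoke's contract in contracts/types.ts.

```typescript
// Hypothetical local mirror of the shape in the capture above; the authoritative
// types live in src/spokes/job-family-agent/contracts/types.ts.
interface ScoredId {
  id: string;
  confidence: number;
}

interface SocMatch {
  socCode: string;
  confidence: number;
}

interface ClassifyResponse {
  jobFamily: ScoredId;
  jobFunction: ScoredId;
  socMatches: SocMatch[];
}

interface ClassifyRequestSpec {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Builds the POST for the classify route; baseUrl and the "level" key are assumptions.
function buildClassifyRequest(
  baseUrl: string,
  text: string,
  context?: { industry?: string; level?: string },
): ClassifyRequestSpec {
  return {
    method: "POST",
    url: `${baseUrl}/api/spokes/job-family-agent/classify`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(context ? { text, context } : { text }),
  };
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null;
}

function isScoredId(v: unknown): v is ScoredId {
  return isRecord(v) && typeof v.id === "string" && typeof v.confidence === "number";
}

function isSocMatch(v: unknown): v is SocMatch {
  return isRecord(v) && typeof v.socCode === "string" && typeof v.confidence === "number";
}

// Narrows an untrusted JSON body to ClassifyResponse, throwing on any mismatch.
function parseClassifyResponse(json: unknown): ClassifyResponse {
  if (!isRecord(json)) throw new Error("response body is not an object");
  const { jobFamily, jobFunction, socMatches } = json;
  if (!isScoredId(jobFamily) || !isScoredId(jobFunction)) {
    throw new Error("missing or malformed jobFamily / jobFunction");
  }
  if (!Array.isArray(socMatches) || !socMatches.every(isSocMatch)) {
    throw new Error("missing or malformed socMatches");
  }
  return { jobFamily, jobFunction, socMatches };
}
```

The request builder is pure and contains no fetch, so a consumer can hand the spec to whatever HTTP client it already uses. The parser rejects a body that drifts from the expected shape, so undefined values cannot leak into a downstream join.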
2. What problem does it solve — and why is it different?
The pain it removes: every HR system spells the same job a different way, and none of them agree with the labor-market data you want to join to. "Sr. SWE II," "Staff Engineer," and "Backend Developer" describe the same occupation; the OEWS wage files, your survey vendor, and your HRIS each file them under a different label. Without a canonical spine, every cross-source join is a hand-built crosswalk that rots.
The difference, stated as a shift:
- FROM free-text titles that don't reconcile — one per system, none mappable to market data, every join rebuilt by hand.
- TO a single canonical address for each job — a SOC code, a family/function, and a FAMILY.FOCUS.LEVEL profile key — that every other spoke and every external data source can join on.
How it differs from the obvious substitutes:
- vs. doing it by hand in spreadsheets — manual title-to-SOC mapping across thousands of HRIS rows is the single most tedious task in job architecture; Job Family Agent resolves the canonical occupation and family in single-digit milliseconds and keeps the mapping auditable.
- vs. a generic BI tool or a one-off LLM prompt — those give you an answer with no governed taxonomy behind it and no stable id to join on. Job Family Agent's outputs are addresses into a canon (jf.computer-mathematical, SWE.GEN.P6) that the rest of the toolbox already speaks, and it is honest that today's classifier is a heuristic, not ML (see §3).
Visual — Tier B (FROM→TO typographic block). The shift above is the visual; a rendered comparison block is a follow-up.
3. How does it work?
Inputs → method → outputs, concretely, across the two layers:
- Input: free text (a title and/or a job-description snippet, with optional industry / level context) for the classifier; or an observed title for the JobFrame resolve-title loop.
- Method (taxonomy classifier): classifyJobText is a deliberately transparent token-overlap heuristic, not an ML or LLM classifier. It (1) detects any literal SOC code in the text via regex (an exact match scores 1.0 and overrides the rest); (2) scores the input's token set against all 1,016 SOC title+description blobs; and (3) scores it against the 23 family and 26 function blobs (name + definition + alternative strings). It returns the top-8 SOC candidates plus the best family and function.
- Method (JobFrame canon): an observed title is normalized and matched against the canon's title-alias index to produce ranked Family × Focus × Level profile candidates, each with a confidence band and a recommendedAction (auto_accept | needs_review | ask_clarification | insufficient_evidence). Profiles can also be assembled top-down (construct) or exported as JSON/Markdown (export).
- Output: for classify, a ClassifyResponse (socMatches[] + jobFamily + jobFunction); for the canon, a ResolveTitleResponse, a CanonicalProfile, or a JobExport.
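The three classifier steps can be sketched in TypeScript as follows. The source fixes the steps (regex SOC detection that wins at 1.0, token-set scoring against title+description blobs, the same scoring against family and function blobs, top-8 output) but not the tokenizer or the scoring formula. Both are therefore assumptions here: tokens are lowercase alphanumeric runs longer than two characters, and a blob's score is the share of input tokens it contains. A literal code is pinned to the top of the list, which is one reading of "overrides the rest". The Occupation shape, the rankSoc / bestLabel names, and padding a bare SOC 2018 code with ".00" are hypothetical.

```typescript
// Illustrative registry entry; the real one comes from data/occupation_registry.json.
interface Occupation {
  socCode: string;
  title: string;
  description: string;
}

interface SocMatch {
  socCode: string;
  confidence: number;
}

interface LabelMatch {
  id: string;
  confidence: number;
}

// SOC 2018 detailed codes (15-1252) and O*NET-SOC codes (15-1252.00).
const SOC_CODE_RE = /\b\d{2}-\d{4}(?:\.\d{2})?\b/g;

// Assumed tokenizer: lowercase alphanumeric runs longer than two characters.
function tokenize(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 2));
}

// Assumed score: fraction of the input's tokens that also occur in the blob.
function overlap(query: Set<string>, blob: Set<string>): number {
  if (query.size === 0) return 0;
  return Array.from(query).filter((t) => blob.has(t)).length / query.size;
}

// Steps 1 and 2: literal SOC codes pinned first at 1.0, then token-overlap ranking.
function rankSoc(text: string, registry: Occupation[], topK = 8): SocMatch[] {
  const known = new Set(registry.map((o) => o.socCode));
  const found: string[] = text.match(SOC_CODE_RE) ?? [];
  const literal = Array.from(new Set(
    found
      .map((c) => (c.includes(".") ? c : `${c}.00`)) // hypothetical padding rule
      .filter((c) => known.has(c)),
  ));
  const pinned: SocMatch[] = literal.map((socCode) => ({ socCode, confidence: 1 }));

  const query = tokenize(text);
  const scored: SocMatch[] = registry
    .filter((o) => !literal.includes(o.socCode))
    .map((o) => ({
      socCode: o.socCode,
      confidence: overlap(query, tokenize(`${o.title} ${o.description}`)),
    }))
    .sort((a, b) => b.confidence - a.confidence);

  return [...pinned, ...scored].slice(0, topK);
}

// Step 3: the same scoring picks the single best family or function blob.
function bestLabel(text: string, labels: { id: string; blob: string }[]): LabelMatch | undefined {
  const query = tokenize(text);
  let best: LabelMatch | undefined;
  for (const label of labels) {
    const confidence = overlap(query, tokenize(label.blob));
    if (best === undefined || confidence > best.confidence) best = { id: label.id, confidence };
  }
  return best;
}
```

Because this sketch only counts exact shared tokens, "developer" does not match "Developers", while any occupation whose description happens to mention "systems" or "building" gains ground. That is the kind of surface-token noise a pure overlap score is prone to.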
Real data sources and the science backing. The occupation registry is O*NET 28.3 / SOC 2018 (1,016 codes, updated_at: 2025-11-30), licensed under CC BY 4.0 for the O*NET content, with the SOC structure in the public domain. The 23 families and 26 functions are the toolbox's canonical segmentation library, each with a verbatim definition and alternative-title strings. The JobFrame canon was rebuilt data-first from a deep-research corpus plus segmentation datasets (1,184 family×level profiles, verbatim content), with the Universal Level Framework (S1–S5, P1–P8, M1–M6, E1–E6) and source-level mappings drawn from Radford / Mercer / WTW ladders, not invented (docs/jobframe/REBUILD-PLAN.md).
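A profile key such as SWE.GEN.P6 can be checked against the Universal Level Framework ranges above with a small parser. This sketch assumes FAMILY and FOCUS are uppercase alphanumeric segments and that the level is a track letter (S, P, M, E) followed by one digit. The ProfileKey type and the parseProfileKey name are hypothetical. The real canon may also enforce a stricter vocabulary, such as accepting only registered families and focuses.

```typescript
type Track = "S" | "P" | "M" | "E";

// Highest level per track, from the ranges S1–S5, P1–P8, M1–M6, E1–E6.
const LEVEL_MAX: Record<Track, number> = { S: 5, P: 8, M: 6, E: 6 };

interface ProfileKey {
  family: string; // e.g. "SWE"
  focus: string; // e.g. "GEN"
  track: Track; // e.g. "P"
  rank: number; // e.g. 6
}

// Assumed syntax: uppercase alphanumeric FAMILY and FOCUS, then track letter + one digit.
const PROFILE_KEY_RE = /^([A-Z0-9]+)\.([A-Z0-9]+)\.([SPME])([1-9])$/;

function parseProfileKey(key: string): ProfileKey | null {
  const m = PROFILE_KEY_RE.exec(key);
  if (m === null) return null;
  const [, family, focus, trackLetter, rankDigit] = m;
  const track = trackLetter as Track; // the regex only admits S, P, M, or E
  const rank = Number(rankDigit);
  if (rank > LEVEL_MAX[track]) return null; // e.g. P9 or S6 lies outside the framework
  return { family, focus, track, rank };
}
```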
Differentiation beat. The practitioner's real question is not "give me one answer" but "can I trust and defend this mapping?" Job Family Agent answers honestly on two fronts: the classifier is labelled a heuristic (not dressed up as AI), and the JobFrame loop never silently auto-accepts an ambiguous title — its recommendedAction routes thin evidence to human review rather than guessing.
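The routing rule can be illustrated with a small function that turns ranked profile candidates into one of the four recommendedAction values. The action names come from the contract described in §3. The thresholds, the decision order, and the ProfileCandidate shape are hypothetical, because the canon's actual confidence bands are not given here.

```typescript
type RecommendedAction =
  | "auto_accept"
  | "needs_review"
  | "ask_clarification"
  | "insufficient_evidence";

interface ProfileCandidate {
  profileKey: string; // e.g. "SWE.GEN.P6"
  confidence: number; // 0..1
}

// Hypothetical thresholds standing in for the canon's confidence bands.
const EVIDENCE_MIN = 0.3;
const AMBIGUITY_MARGIN = 0.1;
const AUTO_ACCEPT_MIN = 0.9;

function recommendAction(candidates: ProfileCandidate[]): RecommendedAction {
  const ranked = [...candidates].sort((a, b) => b.confidence - a.confidence);
  const [top, runnerUp] = ranked;
  // No candidate, or only a weak one: do not guess.
  if (!top || top.confidence < EVIDENCE_MIN) return "insufficient_evidence";
  // Two near-tied candidates: ask rather than pick one silently.
  if (runnerUp && top.confidence - runnerUp.confidence < AMBIGUITY_MARGIN) {
    return "ask_clarification";
  }
  if (top.confidence >= AUTO_ACCEPT_MIN) return "auto_accept";
  return "needs_review";
}
```

The ordering carries the design: the near-tie check runs before the auto-accept check, so a high-confidence title with a close runner-up is never accepted silently.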
Visual — Tier A (live core capture, the full ranking). See the worked example below — it shows the heuristic's real, imperfect SOC ranking and why the family/function answer is the trustworthy part today.
4. What does it enable?
Concrete uses a practitioner would recognize:
- Normalize an HRIS title column — turn thousands of free-text titles into canonical SOC codes + families so downstream analysis isn't fighting spelling variants.
- Join people data to labor-market data — a SOC code is the spine wage-benchmark and wage-compliance already use, so a resolved title flows straight into market pay and the legal floor.
- Build a governed job architecture — assemble draft profiles top-down from Family × Focus × Level (construct) instead of writing every job description from a blank page.
- Bulk-map an HRIS dataset with review — run a mapping pass, get ranked candidates per row, and accept or correct each into a tenant overlay (the HRIS mapping routes).
- Resolve one messy title on demand — resolve-title takes "Sr. SWE II" and returns ranked profile candidates with a recommended action, over HTTP or as an MCP tool.
- Segment by the work people actually do — feed the canonical family/function ids into segmentation-studio so cohorts are built on a real job taxonomy, not org-chart proxies.
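Once each HRIS row carries a SOC code, the labor-market join in the list above reduces to a keyed lookup. A minimal sketch, assuming hypothetical ResolvedRow and WageRow shapes (wage-benchmark's real interface is not described here). Rows with no SOC yet, or with no wage row for their SOC, are set aside for review rather than dropped.

```typescript
// Hypothetical shapes: an HRIS row after classification, and a wage row keyed on SOC.
interface ResolvedRow {
  employeeId: string;
  rawTitle: string;
  socCode: string | null; // null while the mapping is still pending review
}

interface WageRow {
  socCode: string;
  medianAnnualWage: number;
}

interface JoinedRow extends ResolvedRow {
  medianAnnualWage: number;
}

// Inner-joins resolved rows to wages on SOC; everything that fails to join is
// returned in `unmatched` so it can be routed back through review.
function joinOnSoc(
  rows: ResolvedRow[],
  wages: WageRow[],
): { joined: JoinedRow[]; unmatched: ResolvedRow[] } {
  const wageBySoc = new Map(wages.map((w): [string, number] => [w.socCode, w.medianAnnualWage]));
  const joined: JoinedRow[] = [];
  const unmatched: ResolvedRow[] = [];
  for (const row of rows) {
    const wage = row.socCode === null ? undefined : wageBySoc.get(row.socCode);
    if (wage === undefined) unmatched.push(row);
    else joined.push({ ...row, medianAnnualWage: wage });
  }
  return { joined, unmatched };
}
```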
Visual — Tier B (typographic capability flow). messy title / JD → { classify · resolve-title } → { SOC code · jf.* family · jfn.* function · FAMILY.FOCUS.LEVEL profile } → joins into wage / segmentation / comp.
5. How it fits in the toolbox
Data flow and contracts:
- Consumes — the bundled O*NET 28.3 / SOC 2018 occupation registry, the canonical job-family / job-function segmentation library, and the JobFrame deep-research corpus (all in-repo under data/). No live external API on the hot path; the data is bundled JSON loaded at module init, so classify runs without I/O.
- Emits — the HTTP + MCP contract at src/spokes/job-family-agent/contracts/types.ts (taxonomy + classify) and contracts/canonical-profile.ts (the JobFrame canon shapes). Consumers (Performix, vela, future apps) vendor a copy; they never import at runtime.
- Feeds — wage-benchmark and wage-compliance (SOC is the join key for market pay and the legal floor); segmentation-studio (its canonical job-family / job-function dimensions reference these jf.* / jfn.* ids directly, no translation layer); anycomp (comp models keyed on SOC look up the occupation record + alternative titles); and reincarnation (a future cache use case for scoping items per family).
- Consumers — Performix (PFX-5, in progress: post-PAT-71 it calls this toolbox for job-family classification, not meta-factory) and vela (planned).
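Loading bundled JSON at module init turns lookups such as soc/[code] into in-memory map reads. The sketch below assumes a hypothetical OccupationRecord shape and uses a small inline array as a stand-in for data/occupation_registry.json. A real module would more likely use a static JSON import, which is not shown. The getSoc and listByMajorGroup names are illustrative only.

```typescript
// Hypothetical record shape; the real one lives in contracts/types.ts.
interface OccupationRecord {
  socCode: string;
  title: string;
}

// Stand-in for the bundled registry JSON, evaluated once when the module loads.
const BUNDLED_REGISTRY: readonly OccupationRecord[] = [
  { socCode: "15-1252.00", title: "Software Developers" },
  { socCode: "15-2021.00", title: "Mathematicians" },
  { socCode: "47-4011.01", title: "Energy Auditors" },
];

// Built once at module init, so request handlers never touch disk or network.
const SOC_BY_CODE: ReadonlyMap<string, OccupationRecord> = new Map(
  BUNDLED_REGISTRY.map((o): [string, OccupationRecord] => [o.socCode, o]),
);

// soc/[code]-style lookup; a bare SOC 2018 code is assumed to mean its ".00" detail.
function getSoc(code: string): OccupationRecord | undefined {
  const normalized = /^\d{2}-\d{4}$/.test(code) ? `${code}.00` : code;
  return SOC_BY_CODE.get(normalized);
}

// soc/list-style filter by SOC major group (the first two digits, e.g. "15").
function listByMajorGroup(group: string): OccupationRecord[] {
  return BUNDLED_REGISTRY.filter((o) => o.socCode.startsWith(`${group}-`));
}
```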
Visual — Tier B (typographic data-flow). O*NET/SOC registry + family/function library + JobFrame corpus → Job Family Agent → { wage-benchmark · wage-compliance · segmentation-studio · anycomp }.
6. Commercialization / packaging
Job Family Agent is a service component, not a standalone product — it is the canonical job spine that compensation, planning, and segmentation offerings compose on top of. It sits behind buyer-facing surfaces (job architecture, pay tooling, the JobFrame canon UI) rather than being sold on its own.
- Data-license posture: the occupation registry derives from O*NET (CC BY 4.0) and SOC (public domain) — usable with attribution, which is what lets the SOC layer be exposed openly. The toolbox/meta-factory editorial enrichments (family/function definitions, alternative strings) are first-party. The JobFrame canon separates the toolbox's own structure and definitions from any vendor-derived content; vendor-survey level ladders (Radford / Mercer / WTW) inform the mappings and carry their own licensing constraints handled in the canon's provenance.
- Anything about pricing tiers or packaged offerings is (TBD) — not earned yet, so not stated.
Visual — (TBD — product-tier placement diagram).
7. The vision
One canonical address for every job — resolved from any messy title, joinable to any labor-market or people dataset, and governed as a living Family × Focus × Level canon rather than a frozen spreadsheet.
The arc runs from the shipped taxonomy + heuristic classifier toward the full JobFrame work-intelligence vision: the deep-research corpus fully seeded into the canon, the classifier upgraded from token-overlap heuristic to LLM-backed matching (PAT-67), the HRIS mapping loop closing review→overlay at scale, and a public "Google-Finance-for-jobs" surface on top. The governing discipline is data-first: build the canon from real corpora and source ladders, generate only at the edges. Full vision, build plan, and roadmap: docs/jobframe/.
Visual — (TBD — the JobFrame canon → mapping-loop → public-surface roadmap diagram).
8. Current status
Grounded in the real code state (registry status: "live", contract 1.2.0, src/spokes/job-family-agent/):
- Shipped: the canonical occupation registry (1,016 SOC codes, O*NET 28.3), 23 families, 26 functions; the token-overlap heuristic classifier (classify, public + IP-rate-limited at 100 req/min); the full GET taxonomy surface (soc/list, soc/[code], families/list, families/[id], functions/list, functions/[id]); the JobFrame canon contracts + schema; resolve-title, construct, and export; the HRIS import / bulk-mapping / mapping-decision and single-JD analyze routes; JobFrame UI surfaces (/jobframe/canon, /mapping, /jd, /construct). MCP transport mirrors the HTTP surface (9 tools registered). The canon was rebuilt data-first and the level semantics corrected (P1 Entry … P8 Fellow) in the 1.2.0 bump.
- In flight / planned: PAT-67 LLM-backed classifier (the heuristic is explicitly v1); full deep-research corpus seed promoted from draft to validated (294 net-new families · 415 focuses · 1,185 profiles built but operator-applied, all status: draft); the job_family_agent Postgres tables are reserved as a future cache surface (heartbeat-only seed today; data lives as bundled JSON); the public JobFrame site and governance console.
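The documented limit on classify (100 requests per minute per IP) can be modeled with a fixed-window counter. This is a hypothetical in-memory sketch, not the spoke's limiter. The source does not say whether the real limiter uses fixed or sliding windows or where it stores counts, and this version never evicts old entries, so it is for illustration only.

```typescript
// Hypothetical fixed-window limiter keyed by client IP; no eviction, illustration only.
class FixedWindowLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit = 100, windowMs = 60_000) {
    this.limit = limit; // requests allowed per window (documented: 100 per minute)
    this.windowMs = windowMs; // window length in milliseconds
  }

  // Returns true if a request from `ip` at time `now` (ms) is allowed.
  allow(ip: string, now: number = Date.now()): boolean {
    const w = this.windows.get(ip);
    if (!w || now - w.start >= this.windowMs) {
      // First request from this IP, or the previous window has expired.
      this.windows.set(ip, { start: now, count: 1 });
      return true;
    }
    if (w.count >= this.limit) return false;
    w.count++;
    return true;
  }
}
```

A fixed window is the simplest choice but lets a client burst up to twice the limit across a window boundary; a sliding window or token bucket smooths that at the cost of more state.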
Visual — Tier A (live capture). GET /api/spokes/job-family-agent/health and /families/list report the real shipped taxonomy at request time.
Worked example (load-bearing, real output)
Input — a realistic HRIS title that no SOC code spells the same way:
classifyJobText({
text: "Senior Backend Software Engineer building distributed systems in Go",
context: { industry: "Software" }
})
Real output of the spoke's classifyJobText core against data/occupation_registry.json (O*NET 28.3), titles resolved from the same registry:
- Job family: jf.computer-mathematical (confidence 1.0) — correct.
- Job function: jfn.engineering (confidence 1.0) — correct.
- Top-8 SOC matches (in ranked order):
  1. 15-1299.02 Geographic Information Systems Technologists and Technicians — 1.0
  2. 47-4011.01 Energy Auditors — 0.978
  3. 11-3021.00 Computer and Information Systems Managers — 0.938
  4. 15-1253.00 Software Quality Assurance Analysts and Testers — 0.804
  5. 15-2021.00 Mathematicians — 0.677
  6. 15-1252.00 Software Developers — 0.653
  7. 37-2019.00 Building Cleaning Workers, All Other — 0.65
  8. 31-2011.00 Occupational Therapy Assistants — 0.642
What a practitioner does with it — and what it honestly shows. The family and function are dead-on, and that is the part you join on for wage and segmentation work. The SOC ranking, by contrast, is noisy: the genuinely correct code, 15-1252.00 Software Developers, lands sixth, behind five codes led by a GIS technologist and an energy auditor that share surface tokens with the input ("systems," "building," "distributed"). This is exactly the documented limitation (the classifier is a token-overlap heuristic, not ML), and the explainer shows the real, imperfect ranking rather than a cherry-picked clean one. The practitioner takeaway: today, trust the family/function layer for joins, treat the SOC ranking as candidates for review, and route ambiguous titles through the JobFrame resolve-title loop (which returns a recommendedAction) rather than auto-accepting position #1. The PAT-67 LLM upgrade is the queued fix for the SOC-ranking noise. No figure here is invented: every confidence and title is real output of the shipped core.
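The takeaway above can be made mechanical by checking how far the #1 SOC candidate clears the #2. The ranking below is the real top-8 from this example. The rankOf and needsReview helpers and the 0.1 margin are hypothetical illustrations, not part of the spoke.

```typescript
interface SocMatch {
  socCode: string;
  confidence: number;
}

// The real top-8 ranking from the worked example, in ranked order.
const ranking: SocMatch[] = [
  { socCode: "15-1299.02", confidence: 1.0 },
  { socCode: "47-4011.01", confidence: 0.978 },
  { socCode: "11-3021.00", confidence: 0.938 },
  { socCode: "15-1253.00", confidence: 0.804 },
  { socCode: "15-2021.00", confidence: 0.677 },
  { socCode: "15-1252.00", confidence: 0.653 },
  { socCode: "37-2019.00", confidence: 0.65 },
  { socCode: "31-2011.00", confidence: 0.642 },
];

// 1-based rank of a code in the list, or -1 if absent.
function rankOf(matches: SocMatch[], socCode: string): number {
  const i = matches.findIndex((m) => m.socCode === socCode);
  return i === -1 ? -1 : i + 1;
}

// Hypothetical review gate over a list sorted by descending confidence:
// flag it when #1 does not beat #2 by at least `minMargin`.
function needsReview(matches: SocMatch[], minMargin = 0.1): boolean {
  const [first, second] = matches;
  if (!first) return true; // nothing to accept
  if (!second) return false; // a lone candidate has no competitor
  return first.confidence - second.confidence < minMargin;
}
```

On this ranking, rankOf(ranking, "15-1252.00") returns 6 and the top two scores are 0.022 apart, so the gate sends the row to review instead of auto-accepting position #1.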