AI done right — the proof, not the pitch

Three case studies in using AI responsibly for people analytics.

Plenty of vendors say “AI-powered.” This page shows the discipline that actually differentiates the work: staged and curated data, methods you can read, your company’s data kept out of the model, summarization treated as secondary, and a stated range of uncertainty on every number. Each study below is grounded in a live toolbox capability — shown, not claimed.

3 case studies · each grounded in a live spoke · 5 honesty rails demonstrated per study

See the case studies · Read the AI posture

The bar

Five disciplines. Every study clears all five.

Responsible AI in people analytics is not a single virtue you can pick. Drop any one of these and the posture collapses into the chatbot-on-your-HRIS failure mode. Each case study below is annotated against all five — with evidence, not assertions.

01
Staged, curated data — not a raw dump
The AI works on cleaned, canonical inputs inside an orchestrated pipeline — never on whatever mess the upstream system produced.
02
Not a black box
The method and the assumptions are visible. Every output traces to a rule, a source, or a calculation you can read.
03
Tenant data never exposed
AI is pointed at public and toolbox-owned content. Your roster, pay, and survey answers stay inside a deterministic database boundary.
04
Summarization is secondary
The LLM-written 'explain this' panel is the easy part. The real computation underneath is what makes the summary trustworthy.
05
Error bars, never bare point estimates
Every number ships with its uncertainty and its data caveats — confidence bands, low-sample flags, conflicts surfaced, not hidden.

The case studies

How the discipline holds up — in production.

Boring enough that a CISO believes them; concrete enough that a leader can picture deploying them. Each one shows what gets produced, where the AI actually sits in the pipeline, and the per-rail evidence.

Case study 01 · the canonical example · wage-compliance · Live spoke

Reading wage law with AI — and checking its work before it counts

AI extracts wage rules from public ordinances; a human confirms every extraction; your workforce is then evaluated by a deterministic, citation-backed query.

What the pipeline produces

Jurisdictions tracked: US federal · 50 states · DC · local ordinances
AI's input: Public ordinances & gov sites only
Before a rule counts: Human review + full citation
Your data in a model prompt: Never

The hard part is not the model call. It’s the staging, the cross-checking, and the confidence-scored review that stand between an extraction and a published rule.

The orchestrated pipeline

01. Source discovery (DOL FLSA · state-labor sites · NCSL · UC Berkeley Labor Center · ordinance PDFs)
02. Acquisition + snapshot of the source document
03. AI extraction (Claude API, citation required, confidence-scored)
04. Normalization into the canonical wage-rule schema
05. Cross-source validation (multiple sources must agree)
06. Conflict detection (versioned with a rule-change event)
07. Confidence scoring (source × extraction × recency × cross-source)
08. Human review queue (a person confirms or corrects)
09. Canonical publication (validation_status: validated)
10. Deterministic evaluation of your workforce against the published rules
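Step 07 can be sketched in a few lines of TypeScript. This is a minimal illustration, not the toolbox's implementation: the factor names, the assumption that each factor lies in [0, 1], and the HOLD_BELOW cutoff are all hypothetical. The multiplicative form is only what the "×" in the step description suggests.

```typescript
// Hypothetical factor names; each factor is assumed to lie in [0, 1].
interface ConfidenceFactors {
  source: number;      // trust in the publishing source
  extraction: number;  // the extractor's confidence in the parsed value
  recency: number;     // freshness of the snapshot
  crossSource: number; // agreement across independent sources
}

type ReviewRoute = "held_for_review" | "queued_for_human_confirmation";

// Assumed cutoff for illustration only; the page does not state one.
const HOLD_BELOW = 0.5;

// Multiply the four factors, so one weak factor drags the whole score down.
function compositeConfidence(f: ConfidenceFactors): number {
  const factors = [f.source, f.extraction, f.recency, f.crossSource];
  for (const value of factors) {
    if (!(value >= 0 && value <= 1)) {
      throw new RangeError("each confidence factor must be in [0, 1]");
    }
  }
  return factors.reduce((product, value) => product * value, 1);
}

// Low scores are held; everything else still goes to a person before publication.
function routeExtraction(f: ConfidenceFactors): ReviewRoute {
  return compositeConfidence(f) < HOLD_BELOW
    ? "held_for_review"
    : "queued_for_human_confirmation";
}
```

Neither route publishes anything. In this sketch the score only decides which queue an extraction waits in, which matches the claim that a human confirms every rule.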

Where the AI actually sits

AI sits at one node: reading a public ordinance and proposing a structured rule with the citation attached. Everything before it (acquisition) and after it (validation, conflict detection, confidence scoring, human review, and the actual compliance check on your data) is deterministic engineering.

The five rails, evidenced

Staged, curated data — not a raw dump

The extractor never sees a raw document dump. Each source is acquired, snapshotted, and fed in one ordinance at a time, and its output is normalized into a single canonical schema before anything downstream touches it.

Not a black box

Every published rule carries its source citation and a versioned change history. When the dashboard says an employee is below the floor, it shows the trace (which jurisdiction, which rule, which citation), not just a verdict.

Tenant data never exposed

The AI extractor operates on public ordinances and government sites only. The compliance check itself is a database lookup: given an employee’s normalized location and wage, SQL resolves the rule and TypeScript computes the gap. Employee rows never enter a model prompt.
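A rough sketch of that lookup-then-gap step, under stated assumptions: an in-memory array stands in for the SQL lookup, every type and field name is hypothetical, and the "highest applicable validated floor governs" rule is a simplification chosen for illustration. Amounts are integer cents to avoid floating-point drift.

```typescript
// All names and shapes here are hypothetical placeholders, not the real schema.
interface PublishedWageRule {
  jurisdiction: string;      // normalized jurisdiction key
  hourlyFloorCents: number;  // wage floor in integer cents
  citation: string;          // citation published with the rule
  validationStatus: "validated" | "pending";
}

interface EmployeeWageRow {
  employeeId: string;
  jurisdictions: string[];   // every jurisdiction covering the work location
  hourlyWageCents: number;
}

interface ComplianceFinding {
  employeeId: string;
  rule: PublishedWageRule | null; // null when no validated rule applies
  gapCents: number;               // 0 when at or above the floor
  belowFloor: boolean;
}

function evaluateEmployee(
  row: EmployeeWageRow,
  rules: PublishedWageRule[]
): ComplianceFinding {
  // Stand-in for the SQL lookup: validated rules for this employee's jurisdictions.
  const applicable = rules.filter(
    (r) =>
      r.validationStatus === "validated" &&
      row.jurisdictions.indexOf(r.jurisdiction) !== -1
  );
  // Assumption: the highest applicable floor governs.
  const governing = applicable.reduce<PublishedWageRule | null>(
    (best, r) =>
      best === null || r.hourlyFloorCents > best.hourlyFloorCents ? r : best,
    null
  );
  const gapCents =
    governing === null
      ? 0
      : Math.max(0, governing.hourlyFloorCents - row.hourlyWageCents);
  return {
    employeeId: row.employeeId,
    rule: governing,
    gapCents,
    belowFloor: gapCents > 0,
  };
}
```

The finding carries the whole governing rule, citation included, so a dashboard can show which rule produced the verdict. No model call appears anywhere in the path.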

Summarization is secondary

The “explain this finding” panel renders the jurisdiction trace and citation into plain language. Useful — and the easy part. What makes it trustworthy is the staged extraction, validation, and confidence scoring behind it.

Error bars, never bare point estimates

Rules carry a confidence score, not a false air of certainty. Low-confidence extractions are held in review rather than published; conflicts between sources are surfaced as explicit rule-change events. The uncertainty is on the table.
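One way to picture "conflicts surfaced, not hidden" is a cross-check that returns a conflict as a first-class result instead of silently picking a winner. The sketch below is illustrative only: the reading shape, the result kinds, and the default minimum of two sources are assumptions (the page says only that multiple sources must agree).

```typescript
// Hypothetical shape: one reading per independent source for the same rule.
interface SourceReading {
  sourceId: string;
  hourlyFloorCents: number;
}

type CrossCheckResult =
  | { kind: "agreed"; hourlyFloorCents: number; sourceIds: string[] }
  | { kind: "conflict"; readings: SourceReading[] }
  | { kind: "insufficient_sources"; found: number };

// Assumed default of two sources; the page says only "multiple".
function crossCheck(
  readings: SourceReading[],
  minSources: number = 2
): CrossCheckResult {
  const [first, ...rest] = readings;
  if (first === undefined || readings.length < minSources) {
    return { kind: "insufficient_sources", found: readings.length };
  }
  const allAgree = rest.every(
    (r) => r.hourlyFloorCents === first.hourlyFloorCents
  );
  if (!allAgree) {
    // Surface every reading so a reviewer sees exactly who disagrees.
    return { kind: "conflict", readings: readings.slice() };
  }
  return {
    kind: "agreed",
    hourlyFloorCents: first.hourlyFloorCents,
    sourceIds: readings.map((r) => r.sourceId),
  };
}
```

Because the result is a tagged union, downstream code has to handle the conflict and insufficient-source cases explicitly before it can read a value.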

What it does not do

This does not decide whether to remediate, or how. It tells you exactly which employees are below exactly which rule, with the citation attached — the validity ceiling is the human reviewer’s judgment, and the remediation decision is yours.

You’re not asking an AI whether your workforce is compliant. You’re asking the toolbox to keep the rule graph current; a deterministic, citation-backed evaluation tells you who is noncompliant under which rule. The AI keeps the graph fresh. You decide what to do.

See the wage-compliance solution

Case study 02 · the one people expect to be a black box · job-family-agent · Live spoke

Mapping messy job titles — without letting a model guess on your roster

AI helps author the canonical job taxonomy once; mapping your titles into it is a deterministic, evidence-cited lookup — not a per-row model guess.

How a title gets classified

Runtime match on your titles: Deterministic alias lookup — no LLM
Confidence band on every match: high · medium · low
Unmatched titles: Returned as 'insufficient evidence', not guessed
Canon scale: 1,000+ SOC codes · families · functions

The counter-intuitive part: the responsible version of “AI job classification” uses no model at all at the moment it touches your roster. The match is a cited, auditable lookup with an explicit confidence band.

The orchestrated pipeline

01. AI-assisted authoring of the canonical taxonomy (families, functions, levels, aliases), done once, against public SOC/O*NET + toolbox content
02. Human governance of the canon (definitions reviewed, aliases approved)
03. Your titles normalized (casing, punctuation, known variants)
04. Deterministic alias match into canonical profiles, with no LLM at runtime
05. Confidence band assigned per match (high ≥ 0.85 · medium ≥ 0.6 · low)
06. Low-confidence / unmatched titles flagged for review, never auto-assigned
07. Reviewer confirms or corrects; corrections feed back as new tenant aliases
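Steps 03 to 06 can be sketched as a short, model-free function chain. The band thresholds come from step 05. Everything else is an assumption made for illustration: the variant table, the alias-table shape, the idea that each approved alias carries a score, and the ASCII-only normalization.

```typescript
type ConfidenceBand = "high" | "medium" | "low";

// Thresholds as stated in the pipeline: high >= 0.85, medium >= 0.6, else low.
function bandFor(score: number): ConfidenceBand {
  if (score >= 0.85) return "high";
  if (score >= 0.6) return "medium";
  return "low";
}

// Hypothetical variant table; a real one would live in the governed canon.
const KNOWN_VARIANTS = new Map<string, string>([
  ["sr", "senior"],
  ["jr", "junior"],
  ["mgr", "manager"],
]);

// Casing, punctuation, known variants (ASCII-only simplification).
function normalizeTitle(raw: string): string {
  return raw
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .split(/\s+/)
    .filter((token) => token.length > 0)
    .map((token) => KNOWN_VARIANTS.get(token) ?? token)
    .join(" ");
}

// Assumption: each approved alias carries a score set during governance.
interface AliasEntry {
  profileId: string;
  score: number;
}

type TitleMatch =
  | { kind: "matched"; profileId: string; aliasFired: string; band: ConfidenceBand }
  | { kind: "insufficient_evidence"; normalizedTitle: string };

function classifyTitle(raw: string, aliases: Map<string, AliasEntry>): TitleMatch {
  const normalized = normalizeTitle(raw);
  const entry = aliases.get(normalized);
  if (entry === undefined) {
    // Decline to guess: an unknown title is reported, not assigned.
    return { kind: "insufficient_evidence", normalizedTitle: normalized };
  }
  return {
    kind: "matched",
    profileId: entry.profileId,
    aliasFired: normalized,
    band: bandFor(entry.score),
  };
}
```

A matched result names the alias that fired, so the basis for the match is always readable. An unknown title comes back as "insufficient_evidence" with its normalized form, ready for a reviewer to add as a new alias.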

Where the AI actually sits

AI sits upstream of your data entirely: it assists in authoring the taxonomy and aliases against public references. The step that runs on your roster is deterministic token-overlap and alias matching with explicit confidence, not a model guess on each employee.
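The page does not say how token overlap is measured. As one plausible reading, the sketch below uses Jaccard similarity (shared tokens divided by total distinct tokens) over already-normalized titles. The choice of Jaccard, the function names, and the first-wins tie-break are all assumptions.

```typescript
// Jaccard overlap is an assumed measure; the page says only "token-overlap".
function tokenSet(normalizedTitle: string): Set<string> {
  return new Set(
    normalizedTitle.split(/\s+/).filter((token) => token.length > 0)
  );
}

// Returns a value in [0, 1]; identical token sets score 1, disjoint sets score 0.
function tokenOverlap(a: string, b: string): number {
  const left = tokenSet(a);
  const right = tokenSet(b);
  if (left.size === 0 || right.size === 0) return 0;
  let shared = 0;
  left.forEach((token) => {
    if (right.has(token)) shared += 1;
  });
  const unionSize = left.size + right.size - shared;
  return shared / unionSize;
}

interface OverlapCandidate {
  alias: string;
  score: number;
}

// Deterministic: ties keep the earliest alias in the list.
function bestOverlap(title: string, aliases: string[]): OverlapCandidate | null {
  let best: OverlapCandidate | null = null;
  for (const alias of aliases) {
    const score = tokenOverlap(title, alias);
    if (best === null || score > best.score) {
      best = { alias, score };
    }
  }
  return best;
}
```

The same inputs always produce the same score, and the score can be explained by listing the shared tokens. That is the property that makes the match auditable.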

The five rails, evidenced

Staged, curated data — not a raw dump

Your titles are normalized before matching, and the thing they match against is a curated, human-governed canon — not a free-text prompt. Single job descriptions are analyzed with keyword heuristics, not an opaque model call.

Not a black box

Every match returns the candidate, the basis for the match (which alias fired), and a confidence band. An unmatched title returns “insufficient evidence” — the system declines to guess rather than inventing a plausible-looking answer.

Tenant data never exposed

Because the runtime path is a deterministic lookup, your roster never goes to a model. The AI’s work happened earlier, on public taxonomies and toolbox-owned content, long before your titles arrived.

Summarization is secondary

Plain-language explanations of why a title mapped where it did are a readable surface over the deterministic evidence — the match decision is made by the lookup, not by the words describing it.

Error bars, never bare point estimates

The confidence band is the error bar. Medium and low matches route to a human instead of being published as fact, and the share of titles that couldn’t be matched is reported rather than papered over.

What it does not do

Runtime classification is alias-and-overlap matching, not semantic understanding — an embedding fallback for genuinely novel titles is a later wave. Today, a title with no known alias is honestly returned as unmatched, which is the safe failure mode for a system that touches headcount.

Everyone assumes “AI title classification” means a model judging your employees one row at a time. Done responsibly, it’s the opposite: AI builds the map once, in the open; your roster is matched by a cited lookup that knows when to say “I don’t know.”

See the job-family-agent spoke

Case study 03 · the decision the buyer most wants automated · anycomp · Live spoke

Compensation scenarios that show their math — and never hand you one answer

A deterministic optimizer and simulator turn comp strategy into several costed, tradeoff-explicit scenarios — so a human chooses, with the arithmetic carried for them.

What the decision layer returns

Output shape: Several scenarios — never one option
Optimizer & simulator: Deterministic — no model in the math
Each scenario: Costed, with tradeoffs made explicit
Example budget delta: +$1.4M vs. −0.6% attrition (illustrative)

The numbers in that last row are illustrative, not a client result. The point is the shape: the engine hands a human a comparison of costed scenarios with the tradeoffs surfaced — it does not pick the “right” one for you.

The orchestrated pipeline

01. Strategy + priorities captured (what the comp program is trying to do)
02. Priorities mapped to objective weights over costed measures
03. Deterministic optimizer searches the parameter space under constraints
04. Simulator projects each candidate scenario's cost and effect
05. Several scenarios generated side-by-side with per-field deltas
06. Value-of-information ranking on which unknowns most change the call
07. Human selects a scenario; the explain panel narrates the tradeoffs
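Step 02 and the "several scenarios" output can be sketched as follows. This is a simplified illustration, not the anycomp engine: the measure names, the assumption that measure values are pre-scaled to [0, 1] with higher meaning better, and the plain weighted-sum objective are all hypothetical.

```typescript
// Hypothetical: measure values are pre-scaled to [0, 1], higher is better.
interface CandidateScenario {
  label: string;
  measures: Map<string, number>;
}

interface ScoredScenario {
  label: string;
  score: number;
}

// Turn raw priority ratings into objective weights that sum to 1.
function toObjectiveWeights(priorities: Map<string, number>): Map<string, number> {
  let total = 0;
  priorities.forEach((rating) => {
    if (!(rating >= 0)) throw new RangeError("priority ratings must be >= 0");
    total += rating;
  });
  if (total <= 0) throw new RangeError("at least one priority must be positive");
  const weights = new Map<string, number>();
  priorities.forEach((rating, measure) => {
    weights.set(measure, rating / total);
  });
  return weights;
}

// Score every scenario and return all of them, ordered best first.
function scoreScenarios(
  scenarios: CandidateScenario[],
  weights: Map<string, number>
): ScoredScenario[] {
  return scenarios
    .map((scenario) => {
      let score = 0;
      weights.forEach((weight, measure) => {
        // A scenario with no value for a measure contributes 0 for it.
        score += weight * (scenario.measures.get(measure) ?? 0);
      });
      return { label: scenario.label, score };
    })
    .sort((x, y) => y.score - x.score);
}
```

The function deliberately returns the full ordered list, never a single winner. The ordering is an aid to comparison, and the choice stays with the person reading it.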

Where the AI actually sits

The optimizer, simulator, and scenario comparison are deterministic math — seeded and reproducible. An LLM appears only in the explain layer: narrating, in plain language, the tradeoffs the engine already computed. The model does not choose the compensation outcome.
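"Seeded and reproducible" has a concrete meaning: the same seed and inputs must give the same projection every time. The sketch below shows the idea with mulberry32, a small public-domain seeded generator used here purely as a stand-in. The attrition model (each employee leaves independently with one fixed probability) is a toy assumption, not the simulator described above.

```typescript
// mulberry32: a small seeded PRNG, used here as a stand-in generator.
// Returns a function producing floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed | 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Toy model (assumption): each employee leaves independently with the
// same probability. Returns the simulated number of leavers.
function simulateLeavers(
  seed: number,
  headcount: number,
  attritionProbability: number
): number {
  const next = mulberry32(seed);
  let leavers = 0;
  for (let i = 0; i < headcount; i += 1) {
    if (next() < attritionProbability) leavers += 1;
  }
  return leavers;
}
```

Because no call reaches Math.random or a clock, rerunning a scenario with its recorded seed reproduces the projection exactly. That is what lets a reviewer audit a number after the fact.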

The five rails, evidenced

Staged, curated data — not a raw dump

Scenarios are generated from a structured strategy and a costed measures catalog, not a free-text wish. The optimizer operates inside explicit constraints, so the inputs are curated before any computation runs.

Not a black box

Each scenario is a side-by-side of cycle parameters with a per-field delta and a diverges flag — you can read exactly which lever moved and what it cost. The reasoning is arithmetic, not an opaque verdict.
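A per-field comparison of this kind is plain arithmetic. In the sketch below, the field names are hypothetical, and "diverges" is assumed to mean "differs from the baseline by more than a tolerance"; the page does not define the flag.

```typescript
// Assumption: "diverges" means the field differs from baseline beyond a tolerance.
interface FieldDelta {
  field: string;
  baseline: number;
  candidate: number;
  delta: number;     // candidate minus baseline
  diverges: boolean;
}

function compareCycleParameters(
  baseline: Map<string, number>,
  candidate: Map<string, number>,
  tolerance: number = 0
): FieldDelta[] {
  if (baseline.size !== candidate.size) {
    throw new Error("scenarios must define the same set of fields");
  }
  const deltas: FieldDelta[] = [];
  // Map iteration follows insertion order, so output order is stable.
  baseline.forEach((baseValue, field) => {
    const candidateValue = candidate.get(field);
    if (candidateValue === undefined) {
      throw new Error("candidate is missing field: " + field);
    }
    const delta = candidateValue - baseValue;
    deltas.push({
      field,
      baseline: baseValue,
      candidate: candidateValue,
      delta,
      diverges: Math.abs(delta) > tolerance,
    });
  });
  return deltas;
}
```

Each row shows the lever, both values, and the difference, so a reader can check the comparison by hand.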

Tenant data never exposed

The optimization runs as deterministic computation over your structured comp parameters inside the database boundary. The LLM in the explain panel describes the result; your individual compensation rows are not its prompt context.

Summarization is secondary

The plain-English “here’s why scenario B costs more but retains your senior engineers” narration is the readable surface. The decision content — the costed scenarios — comes from the deterministic engine, not the summary.

Error bars, never bare point estimates

The engine returns several scenarios precisely so no single point estimate masquerades as the answer, and the ranking is explicit about which unknowns the recommendation is most sensitive to.

What it does not do

It does not decide your comp strategy or sign off on a budget. It compresses the arithmetic of comparing options — the choice between scenarios, and the judgment about constraints, stays with the compensation owner.

A black-box comp tool hands you one number and asks you to trust it. This hands you several costed scenarios with the tradeoffs on the table, and lets a human choose — because “analytics is decisions, not a verdict widget.”

See the anycomp spoke

More studies land here as new toolbox capabilities clear all five rails. The bar is high on purpose: a capability that uses AI well on four counts but fails one is a footnote, not a case study. Numbers marked illustrative show the shape of an output, never a real client result — and we never invent client names.

What this page is not

This is not a per-solution sell page.

The per-solution pages (/wage-compliance and the individual spoke pages) sell a specific capability to its specific buyer. This page answers a different question, for a different reader: the leader doing category-level diligence who wants to know whether the AI here can be trusted, with which data, on what terms, and with what guardrails. The studies above are the evidence; each links back to the capability it describes. For the abstract version of the argument, see the AI-in-HR posture page.

Want to pressure-test these guardrails against your environment?

The most common conversation we have with CHROs and CISOs is not about features — it’s about which AI postures their privacy framework and audit obligations can actually support. Bring the case study that most resembles your problem; we’ll walk through the data boundaries and the method.

Book a diligence call · See the rigor catalog