Glass Ox — plain-language explainer

Glass Ox turns a messy, inherited dataset into a canonical model through a chain of steps you can see through and that test themselves — so an AI agent can't silently default 84% of jobs to one level and have nobody notice.

A People Analytics Toolbox component. Built to the portfolio Explainer Standard v1.0. Every claim below is grounded in the component's own code — the engine at src/lib/glass-ox/ (primitive 1.0.0) and the durable-history spoke at src/spokes/glass-ox/ (wire contract 0.1.0, registry status coming-soon). Anything not yet built is marked (TBD).


1. What is it?

Glass Ox is the toolbox's tested, transparent data-coding engine: it runs each step of a data workflow — load, parse, join, map, code — as a unit that records what it did (rows in / rows out, join shape, drops with reasons, per-field coverage and concentration) and runs fail-loud assertions against that record, so a step that quietly went wrong stops the line instead of poisoning everything downstream.

It comes in two layers, both real:

  • The engine (src/lib/glass-ox/, a cross-cutting primitive — not a spoke) — runStep, the assertion catalog, buildRunReport, and the toDataLensStage render adapter. This is the authoritative piece that actually executes and judges a workflow.
  • The spoke (src/spokes/glass-ox/) — the Postgres schema, the versioned wire contract, and the read APIs that turn one-off run reports into a queryable history (GET /runs, GET /runs/[runId], GET /plans).

The name decodes the thesis (per docs/glass-ox/CONCEPT.md): an OX is the workhorse that does the heavy, unglamorous coding labor — join discipline, crosswalk fitting, quarantine bookkeeping; "Glass" is that you can see straight through it. Black Box → Glass Ox.

Visual — Tier B (in-repo reference). The engine is six fail-loud checks in src/lib/glass-ox/assertions.ts, each returning an AssertionOutcome:

  • conservation: rowsOut + quarantined + dropped == rowsIn (catches silent row loss).
  • cardinality — observed join shape matches the declared one (catches a 1:1 that fans out).
  • coverage — a field or join meets a minimum fill rate (catches the 4.9% join).
  • concentration — no field silently collapses onto one value past a ceiling (the 84%-"P3" detector).
  • profile-before-trust — inherited data is profiled before it is trusted.
  • provenance — every value is sourced or explicitly flagged, never defaulted or invented.
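
As a rough illustration of what two of these checks might look like, the sketch below writes conservation and concentration as pure functions. The outcome fields used here (rule, severity, observed, expected) are assumptions made for the example, not the actual AssertionOutcome shape in src/lib/glass-ox/assertions.ts, and the ">=" comparison against the ceiling is likewise assumed.

```typescript
// Hypothetical outcome shape for illustration; the real AssertionOutcome may differ.
type Severity = "ok" | "warn" | "halt";

interface SketchOutcome {
  rule: string;
  severity: Severity;
  observed: number;
  expected: number;
}

interface RowCounts {
  rowsIn: number;
  rowsOut: number;
  quarantined: number;
  dropped: number;
}

// conservation: every input row must end up as out, quarantined, or dropped.
function checkConservation(c: RowCounts): SketchOutcome {
  const accounted = c.rowsOut + c.quarantined + c.dropped;
  return {
    rule: "conservation",
    severity: accounted === c.rowsIn ? "ok" : "halt",
    observed: accounted,
    expected: c.rowsIn,
  };
}

// concentration: the modal value's share of all rows must stay under the ceiling.
// Assumption: the check fires when share >= ceiling.
function checkConcentration(values: string[], ceiling: number): SketchOutcome {
  const counts = new Map<string, number>();
  let modalCount = 0;
  for (const v of values) {
    const next = (counts.get(v) ?? 0) + 1;
    counts.set(v, next);
    if (next > modalCount) modalCount = next;
  }
  const share = values.length === 0 ? 0 : modalCount / values.length;
  return {
    rule: "concentration",
    severity: share >= ceiling ? "halt" : "ok",
    observed: share,
    expected: ceiling,
  };
}
```

Under these assumptions, the worked example's counts (983,412 in; 977,546 out; 5,866 quarantined) balance in checkConservation, while a level column that is 84% "P3" trips checkConcentration at a 0.5 ceiling.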

2. What problem does it solve — and why is it different?

The pain it removes: black-box AI and code fail silently, and silent failure is the default mode, not an edge case — a step produces a clean-looking table that is quietly wrong, gets inherited and trusted, and poisons the analysis for weeks before anyone notices.

The grounding incident is real (documented in docs/glass-ox/CONCEPT.md and reproduced in code): an AI agent "reclassified" a CompAnalyst extract to keep moving forward and defaulted 84% of jobs to level "P3". The correct level was in the source data the whole time. The agent didn't lie — it did what agents do: it invented a value to proceed rather than halt on a step it couldn't confirm.

The difference, stated as a shift:

  • FROM a workflow whose only output is the final table, where a collapse, a fan-out, or a silent drop is invisible until it has already done damage.
  • TO a workflow where every step carries its own manifest and every step is tested by default — so the bad step halts loudly with the observed-vs-expected numbers attached.

How it differs from the two existing answers, each of which solves half the problem:

  • vs. a black-box AI agent — fast and AI-composed, but no visibility and no testing: the 84%-"P3" collapse sails through.
  • vs. Alteryx / classic ETL — strong visibility (browse-everywhere), but testing is manual and easy to skip, and every step is human-dragged rather than AI-authored.

Glass Ox is the only posture that is both AI-authored and tested-by-default. That tested-trust layer is precisely what AI agents lack.

Visual — Tier B (FROM→TO typographic block). The shift above is the visual; a rendered comparison block is a follow-up.

3. How does it work?

Inputs → method → outputs, concretely.

  • Input: a workflow declared as a chain of steps. Each step is runStep({ name, action, inputs, transform, declare }) — the transform does the real work (filter, join, map), and declare names what must hold: which fields to profile, minimum coverage, maximum concentration, expected join cardinality, and the provenance source.
  • Method — manifest then judge. For each step the engine: (1) runs the transform; (2) profiles the output via the shared src/lib/data-profiler — the same profiler every spoke uses — to derive each field's coverage / concentration / distinct / modal; (3) builds the StepManifest (rows in/out, joins, drops-by-reason); (4) runs the declared assertions; (5) rolls the worst severity into the step's status (ok | warn | halt). Crucially, runStep never throws on a failed assertion — the failure is recorded in the manifest and the caller decides whether to stop. buildRunReport then chains the step manifests into a DAG-ordered RunReport, using src/lib/plan-runner's topological sort (a cycle throws PlanStepCycleError).
  • Output: a RunReport{ runId, name, steps[], overallStatus, reportedAt } — where each step manifest carries its assertions, field profiles, drops-by-reason, and join shape. The spoke persists this across four tables (runs + run_steps + run_assertions + run_quarantines) with counts denormalized on the run row, and serves it back through the read APIs.
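
The manifest-then-judge loop can be sketched in a few dozen lines. This is a simplified, synchronous stand-in and not the engine's code: only the argument names (name, action, inputs, transform, declare) come from the description above, while the types, the helper names (fillRate, modalShare, runStepSketch), and the choice to treat every failed check as halt are assumptions. The real engine delegates profiling to the shared data-profiler rather than computing it inline.

```typescript
// Simplified, synchronous sketch. Only the argument names (name, action, inputs,
// transform, declare) come from the text; every type below is an assumption.
type Row = Record<string, string | null>;
type Status = "ok" | "warn" | "halt";

interface SketchDeclare {
  minCoverage?: Record<string, number>;      // field -> minimum fill rate
  maxConcentration?: Record<string, number>; // field -> modal-share ceiling
}

interface SketchAssertion {
  rule: "coverage" | "concentration";
  field: string;
  status: Status;
  observed: number;
  expected: number;
}

interface SketchManifest {
  name: string;
  action: string;
  rowsIn: number;
  rowsOut: number;
  assertions: SketchAssertion[];
  status: Status;
}

interface SketchStep {
  name: string;
  action: string;
  inputs: Row[];
  transform: (rows: Row[]) => Row[];
  declare: SketchDeclare;
}

// Share of rows whose field is present and non-empty.
function fillRate(rows: Row[], field: string): number {
  if (rows.length === 0) return 0;
  const filled = rows.filter((r) => (r[field] ?? "") !== "").length;
  return filled / rows.length;
}

// Share of all rows taken by the most common non-null value.
function modalShare(rows: Row[], field: string): number {
  const counts = new Map<string, number>();
  let max = 0;
  for (const r of rows) {
    const v = r[field];
    if (v === null || v === undefined) continue;
    const next = (counts.get(v) ?? 0) + 1;
    counts.set(v, next);
    if (next > max) max = next;
  }
  return rows.length === 0 ? 0 : max / rows.length;
}

// Manifest then judge. A failed assertion is recorded, never thrown;
// this sketch treats every failed check as "halt" (an assumption).
function runStepSketch(step: SketchStep): SketchManifest {
  const out = step.transform(step.inputs);
  const assertions: SketchAssertion[] = [];
  const minCoverage = step.declare.minCoverage ?? {};
  for (const field of Object.keys(minCoverage)) {
    const expected = minCoverage[field] ?? 0;
    const observed = fillRate(out, field);
    assertions.push({ rule: "coverage", field, status: observed >= expected ? "ok" : "halt", observed, expected });
  }
  const maxConcentration = step.declare.maxConcentration ?? {};
  for (const field of Object.keys(maxConcentration)) {
    const expected = maxConcentration[field] ?? 1;
    const observed = modalShare(out, field);
    assertions.push({ rule: "concentration", field, status: observed >= expected ? "halt" : "ok", observed, expected });
  }
  // Roll the worst severity into the step status.
  const rank: Record<Status, number> = { ok: 0, warn: 1, halt: 2 };
  let status: Status = "ok";
  for (const a of assertions) {
    if (rank[a.status] > rank[status]) status = a.status;
  }
  return { name: step.name, action: step.action, rowsIn: step.inputs.length, rowsOut: out.length, assertions, status };
}
```

Returning the verdict in the manifest instead of throwing keeps the policy decision with the caller: a pipeline can stop on the first halt, or run every step and report all failures at once.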

Differentiation beat: the practitioner's real question isn't "did the workflow finish" — it's "can I trust the table it produced?" Glass Ox answers that per step: an ok step is a profiled, asserted, provenanced unit; a halt step tells you the rule that fired, what it observed, and what it expected — in the manifest, not in a log nobody reads.

Built ON the existing organs (composed, not reinvented): the profiler (data-profiler), the DAG order + run store (plan-runner), the provenance invariant (src/lib/provenance), and the render target (src/lib/data-lens, whose contract already names "a Glass Ox step" as a canonical emitter). Glass Ox owns no profiler and no DAG sort of its own.
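
For readers unfamiliar with the DAG ordering that plan-runner supplies, here is a minimal sketch of a topological sort that rejects cycles, using Kahn's algorithm. It is not plan-runner's implementation: the step shape (id, dependsOn), the function name topoOrder, and the SketchCycleError class (a local stand-in for PlanStepCycleError) are all assumptions for illustration.

```typescript
// Hypothetical step shape; plan-runner's real types are not shown in this document.
interface PlanStepSketch {
  id: string;
  dependsOn: string[];
}

// Local stand-in for PlanStepCycleError.
class SketchCycleError extends Error {
  constructor(remaining: string[]) {
    super(`cycle detected among steps: ${remaining.join(", ")}`);
    this.name = "SketchCycleError";
  }
}

// Kahn's algorithm: repeatedly emit steps whose dependencies are all satisfied.
// Assumes every id in dependsOn refers to a step in the list.
function topoOrder(steps: PlanStepSketch[]): string[] {
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const s of steps) {
    indegree.set(s.id, s.dependsOn.length);
    for (const dep of s.dependsOn) {
      const list = dependents.get(dep) ?? [];
      list.push(s.id);
      dependents.set(dep, list);
    }
  }
  const ready = steps.filter((s) => s.dependsOn.length === 0).map((s) => s.id);
  const order: string[] = [];
  while (ready.length > 0) {
    const id = ready.shift();
    if (id === undefined) break;
    order.push(id);
    for (const next of dependents.get(id) ?? []) {
      const left = (indegree.get(next) ?? 0) - 1;
      indegree.set(next, left);
      if (left === 0) ready.push(next);
    }
  }
  // Any step never emitted is part of (or downstream of) a cycle.
  if (order.length !== steps.length) {
    const stuck = steps.filter((s) => order.indexOf(s.id) === -1).map((s) => s.id);
    throw new SketchCycleError(stuck);
  }
  return order;
}
```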

Visual — Tier B (step flow). One step's lifecycle: transform → profile (coverage/concentration/distinct/modal) → manifest (rowsIn/Out · joins · drops-by-reason) → assertions → status (ok | warn | halt).
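
The profile stage in that flow reduces each field to four numbers. The sketch below shows one plausible way to derive them. The profile shape and the function name profileField are assumptions, as is the choice to divide concentration by all rows instead of only the filled ones; the shared data-profiler may define these differently.

```typescript
// Hypothetical profile shape; the shared data-profiler's actual output may differ.
interface SketchFieldProfile {
  coverage: number;      // share of rows with a non-empty value
  concentration: number; // share of all rows taken by the modal value (assumed denominator)
  distinct: number;      // count of distinct non-empty values
  modal: string | null;  // most frequent non-empty value, or null if there is none
}

function profileField(values: Array<string | null | undefined>): SketchFieldProfile {
  const counts = new Map<string, number>();
  let filled = 0;
  let modal: string | null = null;
  let modalCount = 0;
  for (const v of values) {
    // Null, undefined, and empty strings all count as missing.
    if (v === null || v === undefined || v === "") continue;
    filled++;
    const next = (counts.get(v) ?? 0) + 1;
    counts.set(v, next);
    if (next > modalCount) {
      modalCount = next;
      modal = v;
    }
  }
  const total = values.length;
  return {
    coverage: total === 0 ? 0 : filled / total,
    concentration: total === 0 ? 0 : modalCount / total,
    distinct: counts.size,
    modal,
  };
}
```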

4. What does it enable?

Concrete uses a practitioner would recognize:

  • Catch the silent default before it ships — declare maxConcentration on a level field, and a step that collapses 84% of rows onto "P3" halts instead of inheriting downstream.
  • Catch the silent join collapse — declare the expected cardinality, and a join that claims 1:1 but fans out (or matches a tiny slice) trips the cardinality and coverage checks.
  • Quarantine with a reason, never drop silently — rows with no definition match are counted and reason-coded (no_definition_match), so conservation still balances and the bias is visible.
  • Code an inherited extract into canonical form, provably — the reference plan codes a raw CompAnalyst extract into JobFrame canon with every step glass-clear.
  • Render a real run on the canvas: toDataLensStage maps each manifest into a DataLensStage so the Data Lens / canvas surface shows the actual run report, including the red HALT node.
  • Query run history: GET /runs lists persisted run summaries newest-first, filterable by plan, status, or tenant; GET /runs/[runId] rehydrates the full step DAG.
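
To make the join and quarantine bullets concrete, here is a small sketch of a definitions join that quarantines unmatched rows with a reason code and reports the join shape it actually observed. The row fields (jobCode, pay, family) and the function name joinDefinitions are hypothetical; only the reason code no_definition_match comes from the component.

```typescript
// Field names are illustrative; only the reason code comes from the text.
interface PayRow {
  jobCode: string;
  pay: number;
}
interface Definition {
  jobCode: string;
  family: string;
}
interface CodedRow extends PayRow {
  family: string;
}
interface QuarantinedRow {
  row: PayRow;
  reason: "no_definition_match";
}
interface JoinResult {
  coded: CodedRow[];
  quarantined: QuarantinedRow[];
  // The left (pay) side is treated as "many" in this sketch.
  observedCardinality: "many:1" | "many:many";
}

function joinDefinitions(pay: PayRow[], defs: Definition[]): JoinResult {
  // Index definitions by job code, keeping duplicates so fan-out stays visible.
  const byCode = new Map<string, Definition[]>();
  for (const d of defs) {
    const list = byCode.get(d.jobCode) ?? [];
    list.push(d);
    byCode.set(d.jobCode, list);
  }
  const coded: CodedRow[] = [];
  const quarantined: QuarantinedRow[] = [];
  let fansOut = false;
  for (const row of pay) {
    const matches = byCode.get(row.jobCode) ?? [];
    if (matches.length === 0) {
      // Unmatched rows are kept and reason-coded, never silently dropped.
      quarantined.push({ row, reason: "no_definition_match" });
      continue;
    }
    if (matches.length > 1) fansOut = true;
    for (const m of matches) coded.push({ ...row, family: m.family });
  }
  return { coded, quarantined, observedCardinality: fansOut ? "many:many" : "many:1" };
}
```

When the observed shape is many:1, coded.length + quarantined.length equals the input row count, so conservation balances. When a duplicate definition makes the join fan out, the observed shape no longer matches a declared many:1 and the output row count exceeds the input.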

5. How it fits in the toolbox

Data flow and dependencies:

  • Consumes (engine): the toolbox's existing cross-cutting primitives — data-profiler (the profile step), plan-runner (DAG order + run-store posture), provenance (the sourced-or-flagged invariant), and data-lens (the render target). No spoke core/ is touched; the boundary is the primitive layer.
  • Emits: the RunReport shape. The engine's canonical TypeScript interfaces live in src/lib/glass-ox/contract.ts (1.0.0); the spoke mirrors them as a wire-format Zod contract in src/spokes/glass-ox/contracts/types.ts (0.1.0) so HTTP routes can validate request/response bodies without dragging the primitive's runtime into the contract surface. Field names are intentionally identical so consumers can substitute either type.
  • Feeds (intended): the data/ingestion layer of the spokes that code data — segmentation-studio ingestion, job-family-agent, the wage-* spokes, the canonical-segmentation libraries. They own what the canonical model is; Glass Ox owns how you get there provably. Registry-declared consumers calculus and performix are both marked planned.
  • No cross-spoke foreign keys: a run report may describe work that touched other spokes' data, but lineage stays as opaque strings inside the manifest JSON — per the toolbox's no-cross-spoke-internals rule.
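
The idea of validating a wire payload against the same field names as the engine's RunReport can be illustrated without the Zod schema, which is not reproduced here. The hand-written type guard below stands in for it. The top-level field names come from the RunReport shape described earlier; the field types (for example, reportedAt as a string) and the names RunReportSketch and isRunReportSketch are assumptions.

```typescript
type OverallStatus = "ok" | "warn" | "halt";

// Top-level fields as named in the RunReport description; types are assumed.
interface RunReportSketch {
  runId: string;
  name: string;
  steps: unknown[];        // step manifests, left opaque in this sketch
  overallStatus: OverallStatus;
  reportedAt: string;      // assumed to be a timestamp string
}

// Hand-rolled stand-in for a schema validator: narrows unknown JSON to the sketch type.
function isRunReportSketch(value: unknown): value is RunReportSketch {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const status = v["overallStatus"];
  return (
    typeof v["runId"] === "string" &&
    typeof v["name"] === "string" &&
    Array.isArray(v["steps"]) &&
    (status === "ok" || status === "warn" || status === "halt") &&
    typeof v["reportedAt"] === "string"
  );
}
```

Because TypeScript is structurally typed, a value that passes a guard like this can be handed to any code written against an interface with the same field names, which is the practical effect of keeping the engine and wire shapes identical.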

Visual — Tier B (typographic data-flow). declared steps → Glass Ox engine (profile · manifest · assert) → RunReport → { glass_ox durable history · Data Lens canvas · the coding layer of segmentation / job-family / wage-* }.

6. Commercialization / packaging

Glass Ox is, today, a toolbox-internal capability — the tested-data-coding discipline underneath the spokes that ingest and code data, not yet a standalone sold product.

  • The concept dossier (docs/glass-ox/CONCEPT.md) records a working brand ("Glass Ox," an "AI-native Alteryx") and a placement path from primitive → spoke → surface → a potential public product, with the thesis that "coding the data is the job" and "transparency + verification is the moat." That is positioning, not a shipped offering.
  • Pricing, tiers, and packaging are (TBD) — not earned yet, so not stated.
  • Data-license posture: Glass Ox processes whatever data a workflow feeds it (uploaded HRIS/survey extracts, public reference data); it adds no new data dependency of its own and asserts provenance rather than introducing it. License constraints follow the source datasets, not the engine.

Visual — (TBD — product-tier placement diagram once a packaged offering exists).

7. The vision

An AI-native data-coding workhorse you can see straight through — where the AI composes and explains the workflow, the assertions keep it honest, and a human confirms each glass-clear step, so the 80% of real analytical value that lives in creating the trustworthy table is finally both fast and provable.

The adoption path in the engine README is concrete and ordered: harness (shipped) → re-run the comp-market pipeline through it → adopt toolbox-wide on the data/ingestion layer of every spoke → the glass-box canvas surface → AI authoring from the tested step library. The durable asset is the step library plus the assertion catalog. The spoke's own build plan (docs/glass-ox/BUILD-PLAN.md) walks the slices that promote it: Slice 2 read APIs + persistence (shipped, now coming-soon), Slice 3 MCP + a POST plan-run, Slice 4 the flip to live after the surface swap.

Visual — (TBD — the slice roadmap as a primitive→spoke→surface→product progression).

8. Current status

Grounded in the real code state (engine 1.0.0, spoke wire contract 0.1.0, registry status coming-soon):

  • Shipped — engine: runStep, the six-check assertion catalog, buildRunReport (DAG-ordered via plan-runner), toDataLensStage, and a memoized real-scale worked example (examples/companalyst-coding.ts). The already-live surface at src/app/(surfaces)/glass-ox/ reads the primitive directly.
  • Shipped — spoke (Slice 2): the glass_ox schema (heartbeat + plans + runs + run_steps + run_assertions + run_quarantines), persistence + read helpers (persistGlassOxRun, listRuns, getRunById, listPlans), and four public read routes — GET /health, GET /runs, GET /runs/[runId], GET /plans — with the companalyst-coding reference plan seeded so the plans list is never empty on a fresh deploy. Wired into the health aggregate (src/lib/health/aggregate.ts) and drizzle.config.ts.
  • In flight / planned: Slice 3 — MCP module (glass-ox.{health,runs.list,runs.get,plans.list,plans.run}) and a POST synchronous plan-run that persists via persistGlassOxRun; Slice 4 — the flip to live after the surface swap. No MCP tools are registered yet (register-tools.ts has no glass-ox entry), and there is no POST route — consistent with coming-soon.
  • Doc drift note: the spoke README still reads status: reserved (Slice 1). The registry — the source of truth — has it at coming-soon with the four read contracts wired and routes present; this explainer follows the registry and the code.

Visual — Tier A (live capture, once the spoke is deployed). GET /api/spokes/glass-ox/plans returns the seeded reference plan:

GET /api/spokes/glass-ox/plans
→ {
    "plans": [
      {
        "id": "glass-ox.plan.companalyst-coding",
        "slug": "companalyst-coding",
        "name": "CompAnalyst coding",
        "description": "Reference Glass Ox plan — coding a raw CompAnalyst extract into JobFrame canon.",
        "donor": "src/lib/glass-ox/examples/companalyst-coding.ts",
        "createdAt": "1970-01-01T00:00:00.000Z"
      }
    ]
  }

(Shape and values from COMPANALYST_CODING_PLAN in src/spokes/glass-ox/db/storage.ts — the seed every GET /plans call ensures.)
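
The "ensure on every call" behavior can be pictured as an idempotent insert. The sketch below uses an in-memory map in place of the Postgres plans table, so the class and method names (SketchPlanStore, ensure, list) are hypothetical and the real storage helper will look different. The seed values are copied from the response shown above, with the description field omitted for brevity.

```typescript
interface PlanRecord {
  id: string;
  slug: string;
  name: string;
  donor: string;
  createdAt: string;
}

// Values copied from the sample response; description omitted for brevity.
const REFERENCE_PLAN: PlanRecord = {
  id: "glass-ox.plan.companalyst-coding",
  slug: "companalyst-coding",
  name: "CompAnalyst coding",
  donor: "src/lib/glass-ox/examples/companalyst-coding.ts",
  createdAt: "1970-01-01T00:00:00.000Z",
};

// In-memory stand-in for the plans table.
class SketchPlanStore {
  private readonly plans = new Map<string, PlanRecord>();

  // Insert only if absent, so repeated calls never duplicate the seed.
  ensure(plan: PlanRecord): void {
    if (!this.plans.has(plan.id)) this.plans.set(plan.id, plan);
  }

  // Every list call ensures the seed first, so the result is never empty.
  list(): PlanRecord[] {
    this.ensure(REFERENCE_PLAN);
    const out: PlanRecord[] = [];
    this.plans.forEach((p) => out.push(p));
    return out;
  }
}
```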


Worked example — the CompAnalyst coding run, judged by the harness

This is the load-bearing example, and it is real: every row count, field profile, and assertion outcome below is computed by the harness in src/lib/glass-ox/examples/companalyst-coding.ts from the rows each transform emits — nothing is hand-set. The dataset is generated deterministically at the real run's scale (983,412 rows in, from classify_companalyst.py on pay_companalyst_national.csv).

Five steps run through runStep, then chain into one RunReport (runId: "rpt_ca_companalyst_coding"):

  1. load — load CompAnalyst national pay → raw pay table. 983,412 rows out. Source step (conservation is vacuous), job_code profiled at ≥99.9% coverage. Status: ok.
  2. parse-native-level — parse the native level code out of the ||| segment → ca_level_code. 100% of kept rows carry a code; maxConcentration on ca_level_code is 0.5 and the modal native code lands ~12%, well under the ceiling. Status: ok.
  3. map-universal-level — map native level → universal P/M/E/S via the documented crosswalk → universal_level. Modal universal level "P3" lands ~12.7%, honestly spread. Status: ok.
  4. code-family-function-focus — join the CompAnalyst definitions union by Job Code → family · function · focus. 5,866 rows have no definition row and are quarantined with reason no_definition_match (out + quarantined = in; 977,546 coded out). Declared many:1; the observed join matches. Status: ok.
  5. reclassify-level-blackbox (the contrast step) — a black-box reclassifier that ignores the parse/map/code work and collapses level to 84% "P3," with no provenance declared. Its declared maxConcentration: { level: 0.5 } fires: 0.84 ≥ 0.50. Status: halt.

Because one step halts, the RunReport.overallStatus is halt. What a practitioner does with it: the red HALT node on the canvas is the harness's genuine verdict — the 84%-"P3" silent default that poisoned the pay model for weeks is now caught at the step that produced it, with the observed concentration (0.84) and the expected ceiling (0.50) attached. The four honest steps stand; the dishonest one is stopped.
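
The roll-up from step verdicts to the run verdict, and the step 4 arithmetic, can be checked in a few lines. The step names, statuses, and row counts below are taken from the list above; the severity ranking (ok, then warn, then halt) and the function name overallStatus are assumptions consistent with the description, not the engine's code.

```typescript
type Status = "ok" | "warn" | "halt";

interface StepVerdict {
  name: string;
  status: Status;
}

// Step names and statuses as listed in the worked example.
const steps: StepVerdict[] = [
  { name: "load", status: "ok" },
  { name: "parse-native-level", status: "ok" },
  { name: "map-universal-level", status: "ok" },
  { name: "code-family-function-focus", status: "ok" },
  { name: "reclassify-level-blackbox", status: "halt" },
];

// Worst severity wins (assumed ranking: ok < warn < halt).
function overallStatus(verdicts: StepVerdict[]): Status {
  const rank: Record<Status, number> = { ok: 0, warn: 1, halt: 2 };
  return verdicts.reduce<Status>(
    (worst, v) => (rank[v.status] > rank[worst] ? v.status : worst),
    "ok",
  );
}

// Step 4 conservation: coded out + quarantined must equal rows in.
const step4Balances = 977546 + 5866 === 983412;

// The steps a practitioner would look at first.
const halted = steps.filter((s) => s.status === "halt").map((s) => s.name);
```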

No figure in this explainer is invented. The run-scale numbers (983,412 → 977,546; 5,866 quarantined; 84% P3) and the assertion outcomes are produced by the Glass Ox harness over its deterministic in-repo dataset, and the cited grounding incident is recorded in docs/glass-ox/CONCEPT.md.