Spoke
data-anonymizer
Privacy as a contract — 24-rule PII catalog, deterministic tokenization, min-N gate.
Character
Problem
External. PII discipline in HR tools is an afterthought. The settings live in admin sub-menus. Min-N anonymity rules vary by team. Tokenization is inconsistent. The privacy review process is asynchronous and slow.
Internal. You're seen as the bottleneck — the analyst who blocks share-outs over privacy concerns. You'd rather be the analyst who unblocks share-outs because the privacy primitive is in the toolbox.
Philosophical. Privacy should be a contract, not a setting. Min-N gates should be enforced at the API boundary, not flagged in a quarterly audit. Tokenization should be deterministic and cache-backed so the same person produces the same token across calls.
Guide
Abstract
Background. People analytics ships to partially public contexts—deck PDFs, LLM explorers, Slack screenshots—while privacy controls hide in admin menus unreachable to analysts automating disclosures.
Methodology. The method has three parts: a catalogued rule engine that emits span-level PII detections; deterministic keyed tokenisation backed by durable identity maps, so joins stay reproducible; and aggregation gates expressed as explicit min-n-check predicates. Tenant overlays extend or attenuate catalogue rows without rewriting core validators.
Scope. Stateless helper—does not own workforce truth tables or replace enterprise DLP egress; complements segmentation-studio identifiers and calculus exports.
Contribution. Behaviour is enforced through explicit contracts so CI/CD consumers lint the payloads agents operationalise—no invisible UI knobs. MCP + HTTP parity keeps humans and autonomous workflows aligned.
Evidence / Provenance. Toolbox-native catalogue evolution recorded in PAT privacy memos and CHANGELOG iterations alongside consulting redaction discipline.
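To make the "span-level PII detections" in the Methodology concrete, here is a minimal sketch of a rule engine that returns spans with category, risk, and original offsets, plus a redactor that consumes them. The rule shape, the two demo rules, and the `[CATEGORY]` placeholder format are assumptions for illustration only; they are not taken from the 24-rule catalogue or from the spoke's actual contract.

```typescript
// Hypothetical rule shape; the real catalogue's fields are not shown in the source.
interface PiiRule {
  id: string;
  category: string;
  risk: "low" | "medium" | "high";
  pattern: string; // regular-expression source, compiled with the global flag below
}

// One detection, with offsets into the ORIGINAL text (end is exclusive).
interface Span {
  ruleId: string;
  category: string;
  risk: PiiRule["risk"];
  start: number;
  end: number;
}

// Two illustrative rules, NOT rows from the real catalogue.
const demoRules: PiiRule[] = [
  { id: "email", category: "contact", risk: "high", pattern: "[\\w.+-]+@[\\w-]+(?:\\.[\\w-]+)+" },
  { id: "us-ssn", category: "government-id", risk: "high", pattern: "\\b\\d{3}-\\d{2}-\\d{4}\\b" },
];

// Run every rule over the text and collect spans, sorted by start offset.
function detect(text: string, rules: PiiRule[]): Span[] {
  const spans: Span[] = [];
  for (const rule of rules) {
    const re = new RegExp(rule.pattern, "g");
    let m: RegExpExecArray | null;
    while ((m = re.exec(text)) !== null) {
      spans.push({
        ruleId: rule.id,
        category: rule.category,
        risk: rule.risk,
        start: m.index,
        end: m.index + m[0].length,
      });
      if (m[0].length === 0) re.lastIndex++; // guard against zero-width matches looping forever
    }
  }
  return spans.sort((a, b) => a.start - b.start);
}

// Replace each detected span with a category placeholder; overlapping spans are skipped.
function redact(text: string, spans: Span[]): string {
  let out = "";
  let cursor = 0;
  for (const s of spans) {
    if (s.start < cursor) continue;
    out += text.slice(cursor, s.start) + "[" + s.category.toUpperCase() + "]";
    cursor = s.end;
  }
  return out + text.slice(cursor);
}
```

With these demo rules, `redact(text, detect(text, demoRules))` on `"Mail jane@example.com, SSN 123-45-6789."` yields `"Mail [CONTACT], SSN [GOVERNMENT-ID]."`, and the spans still carry the original offsets (5 to 21 and 27 to 38) for audit purposes.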
Plan
- 01 List active rules
Hit data-anonymizer.pii-rules: global plus tenant overrides. Discovery you can call from anywhere, no auth.
- 02 Redact text before sharing
Call data-anonymizer.redact: returns redacted text with span audit metadata (category, risk, original offsets).
- 03 Tokenize identifiers deterministically
data-anonymizer.tokenize: HMAC-keyed, per-tenant cache. Same input ID always returns the same token across calls.
- 04 Gate small segments
Wrap any aggregation through data-anonymizer.min-n-check before publishing. The privacy gate is a contract: you can't accidentally skip it.
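Steps 03 and 04 can be sketched locally to show what "deterministic" and "gate" mean in practice. This is a minimal sketch, not the spoke's implementation: the in-memory `Map` stands in for the per-tenant cache, the `tok_` prefix and 16-character truncation are invented for readability, and the `SegmentCount` / `MinNResult` shapes are hypothetical rather than the real request and response schemas. It assumes a Node-compatible runtime for `node:crypto`.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical in-memory stand-in for the per-tenant token cache.
const tokenCache = new Map<string, Map<string, string>>();

// Deterministic keyed tokenization: same tenant secret + same ID => same token.
// Note: this sketch keys the cache by tenant and ID only, so a rotated secret
// would need the tenant's cache cleared.
function tokenize(tenantId: string, secret: string, id: string): string {
  let cache = tokenCache.get(tenantId);
  if (!cache) {
    cache = new Map<string, string>();
    tokenCache.set(tenantId, cache);
  }
  const cached = cache.get(id);
  if (cached !== undefined) return cached;

  const digest = createHmac("sha256", secret).update(id).digest("hex");
  const token = "tok_" + digest.slice(0, 16); // truncation is illustrative only
  cache.set(id, token);
  return token;
}

// Hypothetical shapes for the min-N gate.
interface SegmentCount {
  segment: string;
  n: number;
}

interface MinNResult {
  passed: boolean;
  minN: number;
  violations: string[]; // segments whose headcount is below minN
}

// The gate as an explicit predicate over segment sizes.
function minNCheck(segments: SegmentCount[], minN: number): MinNResult {
  const violations = segments.filter((s) => s.n < minN).map((s) => s.segment);
  return { passed: violations.length === 0, minN, violations };
}

// Publishing goes through the gate; small segments raise instead of leaking.
function publishIfSafe(segments: SegmentCount[], minN: number): SegmentCount[] {
  const result = minNCheck(segments, minN);
  if (!result.passed) {
    throw new Error("min-N gate failed for: " + result.violations.join(", "));
  }
  return segments;
}
```

Because the token is an HMAC of the ID under a tenant secret, two datasets tokenized for the same tenant can be joined on the token without exposing the ID, while a different tenant's secret produces unrelated tokens. Routing every publish through `publishIfSafe` is what turns the gate from a convention into something a caller cannot skip by accident.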
Call to Action
Direct. Try the API. Hit the pii-rules endpoint for free.
Transitional. Read the privacy methodology (Phase 3). See the rule catalog inline.
Spoke I/O (visual language v1)
Every toolbox spoke shares the same abstract choreography: typed inputs on the left, distilled verbs in the center, typed outputs on the right, and (when relevant) cross-spoke HTTP composition along the bottom rail. Source package: @people-analytics-toolbox/spoke-illustrations.
Try it now
Copy this curl. Paste in any terminal. Public read — no auth needed.
data-anonymizer.pii-rules
GET. List the active PII detection rules: header and content patterns, each with risk and category. Public read.
curl -sS "https://people-analytics-toolbox.vercel.app/api/spokes/data-anonymizer/pii-rules"
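The same public read can be made from application code. The sketch below uses the standard `fetch` API (global in modern browsers and Node 18+). Only the pii-rules URL appears in the source; the `spokeUrl` helper, the assumption that other endpoints follow the same path pattern, and the assumption that the response body is JSON are all illustrative, so the result is typed as `unknown` until it is validated against the vendored schema.

```typescript
// Base path taken from the curl example above.
const BASE_URL = "https://people-analytics-toolbox.vercel.app/api/spokes";

// Hypothetical helper: builds <base>/<spoke>/<endpoint>.
// Only the data-anonymizer/pii-rules combination is confirmed by the source.
function spokeUrl(spoke: string, endpoint: string): string {
  return BASE_URL + "/" + encodeURIComponent(spoke) + "/" + encodeURIComponent(endpoint);
}

// Public read, so no auth header is sent.
// Returns unknown on purpose: parse it with the vendored schema before use.
async function fetchPiiRules(): Promise<unknown> {
  const res = await fetch(spokeUrl("data-anonymizer", "pii-rules"), {
    headers: { Accept: "application/json" },
  });
  if (!res.ok) {
    throw new Error("pii-rules request failed: " + res.status + " " + res.statusText);
  }
  return res.json();
}
```

A consumer would typically pass the result of `fetchPiiRules()` straight into its vendored response schema rather than trusting the shape.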
Vendor the contract
The Zod contract is the source of truth. Vendor a copy into your consumer app — you keep it; we don't break it underneath you. Re-vendor when the version bumps.
// In your consumer app:
import { z } from "zod";
// Vendor a copy of these contracts from the toolbox repo at:
// src/spokes/data-anonymizer/contracts/types.ts
import {
PiiRulesResponseSchema,
RedactionRequestSchema,
RedactionResponseSchema,
TokenizationRequestSchema,
TokenizationResponseSchema,
MinNCheckRequestSchema,
MinNCheckResponseSchema,
CONTRACT_VERSION,
} from "./vendored/data-anonymizer/types";
// Then call the toolbox over HTTP or MCP.
// See docs/EXTERNAL-CONSUMERS.md for onboarding.
Source path: src/spokes/data-anonymizer/contracts/types.ts (GitHub)
Failure
Without governance, synthetic-looking tokens collide. Auditors discover drifting min-n practice because each team improvised, and your brand becomes synonymous with leakage.
Success
One explainable ledger of redactions and tokens; partner teams rerun the same primitives pre-deck automatically. Demonstrable compliance lineage when regulators ask—not theater.