Spoke

data-anonymizer

Privacy as a contract — 24-rule PII catalog, deterministic tokenization, min-N gate.

Character

A people analytics (PA) professional who has to share data: with vendors, with internal partners, in board decks, in published analyses. You know one mistake creates a compliance crisis and you'd rather not be the person whose name is on it.

Problem

External. PII discipline in HR tools is an afterthought. The settings live in admin sub-menus. Min-N anonymity rules vary by team. Tokenization is inconsistent. The privacy review process is asynchronous and slow.

Internal. You're seen as the bottleneck — the analyst who blocks share-outs over privacy concerns. You'd rather be the analyst who unblocks share-outs because the privacy primitive is in the toolbox.

Philosophical. Privacy should be a contract, not a setting. Min-N gates should be enforced at the API boundary, not flagged in a quarterly audit. Tokenization should be deterministic and cache-backed so the same person produces the same token across calls.

Guide

data-anonymizer is the toolbox's cross-cutting privacy primitive: the safety layer that strips identifying details out of text, replaces sensitive IDs with stable stand-ins, and refuses to surface results when a group is too small to be safely shared.

It provides a 24-rule PII detection catalog (with tenant overrides), deterministic HMAC-keyed tokenization with a persistent map cache, redaction-with-spans for arbitrary text, a Min-N privacy gate enforced as a contract, and a substitution-strategy registry (mask, pseudonymize, synthetic-realistic). Planned upgrades target cross-customer egress.
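To make "deterministic keyed tokenization with a map cache" concrete, here is a minimal sketch. It is not the spoke's implementation: the HMAC-SHA256 choice, the `tok_` token format, the in-memory `Map` standing in for the persistent cache, and the `tokenize` helper itself are all assumptions made for illustration.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical in-memory stand-in for the persistent, per-tenant map cache.
type TokenCache = Map<string, string>;

// Hypothetical helper: derive a stable surrogate token for one identifier.
function tokenize(
  id: string,
  tenantId: string,
  secret: string,
  cache: TokenCache,
): string {
  const cacheKey = `${tenantId}:${id}`;
  const cached = cache.get(cacheKey);
  if (cached !== undefined) return cached;

  // Keyed hash: the same key and the same input always give the same digest.
  const digest = createHmac("sha256", secret).update(cacheKey).digest("hex");
  const token = `tok_${digest.slice(0, 16)}`; // assumed format, truncated for readability
  cache.set(cacheKey, token);
  return token;
}

const cache: TokenCache = new Map();
const a = tokenize("emp-1001", "tenant-a", "demo-secret", cache);
const b = tokenize("emp-1001", "tenant-a", "demo-secret", cache);
const c = tokenize("emp-1001", "tenant-b", "demo-secret", cache);

console.log(a === b); // true: repeated calls agree
console.log(a === c); // false: the tenant is part of the hashed input
```

Because the token is derived from the key and the input alone, a cold cache reproduces the same token; in this sketch the cache only saves recomputation. Truncating the digest trades collision resistance for shorter tokens, so a real deployment would need to choose that length deliberately.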

Abstract

Background. People analytics ships to partially public contexts—deck PDFs, LLM explorers, Slack screenshots—while privacy controls hide in admin menus unreachable to analysts automating disclosures.

Methodology. A catalogued rule engine emits span-level PII detections. Deterministic keyed tokenisation uses durable identity maps so joins stay reproducible. Aggregation gates are expressed as explicit min-n-check predicates. Tenant overlays extend or attenuate catalogue rows without rewriting core validators.
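The span-level detection and tenant-overlay ideas can be sketched as follows. This is an illustrative outline under stated assumptions, not the actual catalogue: the two regex rules, the `PiiRule` and `TenantOverlay` shapes, the `[CATEGORY]` placeholder style, and the helper names are all hypothetical. `RedactionSpan` borrows its name from the spoke diagram, but its fields here are guesses based on the "category, risk, original offsets" description.

```typescript
type Risk = "low" | "medium" | "high";

interface PiiRule { id: string; category: string; risk: Risk; pattern: RegExp; }

interface RedactionSpan {
  ruleId: string; category: string; risk: Risk;
  start: number; // offset into the original text
  end: number;   // exclusive
}

// Hypothetical overlay shape: add tenant rules, switch off base rules by id.
interface TenantOverlay { add?: PiiRule[]; disable?: string[]; }

// Two illustrative rules only; the real 24-rule catalogue is not reproduced.
const baseRules: PiiRule[] = [
  { id: "email", category: "contact", risk: "high", pattern: /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/ },
  { id: "us-ssn", category: "government-id", risk: "high", pattern: /\b\d{3}-\d{2}-\d{4}\b/ },
];

function activeRules(base: PiiRule[], overlay: TenantOverlay = {}): PiiRule[] {
  const disabled = new Set(overlay.disable ?? []);
  return [...base, ...(overlay.add ?? [])].filter((r) => !disabled.has(r.id));
}

function findSpans(text: string, rules: PiiRule[]): RedactionSpan[] {
  const spans: RedactionSpan[] = [];
  for (const rule of rules) {
    // Rebuild with the global flag so exec() walks every match
    // (any other flags on the rule are dropped in this sketch).
    const re = new RegExp(rule.pattern.source, "g");
    let m: RegExpExecArray | null;
    while ((m = re.exec(text)) !== null) {
      if (m[0].length === 0) { re.lastIndex++; continue; } // guard against empty matches
      spans.push({ ruleId: rule.id, category: rule.category, risk: rule.risk,
                   start: m.index, end: m.index + m[0].length });
    }
  }
  return spans.sort((x, y) => x.start - y.start);
}

function redact(text: string, rules: PiiRule[]): { redacted: string; spans: RedactionSpan[] } {
  const kept: RedactionSpan[] = [];
  let redacted = "";
  let cursor = 0;
  for (const span of findSpans(text, rules)) {
    if (span.start < cursor) continue; // overlaps an earlier span; keep the first one
    redacted += text.slice(cursor, span.start) + `[${span.category.toUpperCase()}]`;
    cursor = span.end;
    kept.push(span);
  }
  return { redacted: redacted + text.slice(cursor), spans: kept };
}

const sample = "Contact jane.doe@example.com, SSN 123-45-6789.";
const full = redact(sample, activeRules(baseRules));
const attenuated = redact(sample, activeRules(baseRules, { disable: ["us-ssn"] }));

console.log(full.redacted);       // Contact [CONTACT], SSN [GOVERNMENT-ID].
console.log(attenuated.redacted); // Contact [CONTACT], SSN 123-45-6789.
```

Keeping offsets relative to the original text means an auditor can map each span back to the source without ever seeing the redacted value in the response.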

Scope. Stateless helper—does not own workforce truth tables or replace enterprise DLP egress; complements segmentation-studio identifiers and calculus exports.

Contribution. Behaviour is enforced through explicit contracts so CI/CD consumers lint the payloads agents operationalise—no invisible UI knobs. MCP + HTTP parity keeps humans and autonomous workflows aligned.

Evidence / Provenance. Toolbox-native catalogue evolution recorded in PAT privacy memos and CHANGELOG iterations alongside consulting redaction discipline.

Plan

  1. List active rules

    Hit data-anonymizer.pii-rules to see the global rules plus tenant overrides. Discovery you can call from anywhere, no auth.

  2. Redact text before sharing

    Call data-anonymizer.redact, which returns redacted text with span audit metadata (category, risk, original offsets).

  3. Tokenize identifiers deterministically

    Call data-anonymizer.tokenize: HMAC-keyed, with a per-tenant cache. The same input ID always returns the same token across calls.

  4. Gate small segments

    Pass any aggregation through data-anonymizer.min-n-check before publishing. The privacy gate is a contract, so you can't accidentally skip it.
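The last step, gating small segments, can be sketched as a publish function that only ever runs behind the check. This is a sketch under assumptions: the `SegmentRow` and `GateResult` shapes, the helper names, the threshold of 5, and the choice to refuse the whole result rather than suppress individual cells are all hypothetical. The actual request and response shapes live in the vendored `MinNCheckRequestSchema` and `MinNCheckResponseSchema`.

```typescript
interface SegmentRow { segment: string; headcount: number; metric: number; }

type GateResult =
  | { passed: true; rows: SegmentRow[] }
  | { passed: false; minN: number; tooSmall: string[] };

// Hypothetical predicate: every segment must have at least minN members.
function minNCheck(rows: SegmentRow[], minN: number): GateResult {
  const tooSmall = rows.filter((r) => r.headcount < minN).map((r) => r.segment);
  return tooSmall.length === 0
    ? { passed: true, rows }
    : { passed: false, minN, tooSmall };
}

// The only path to the sink goes through the gate, so callers cannot skip it.
function publishGated(
  rows: SegmentRow[],
  minN: number,
  sink: (rows: SegmentRow[]) => void,
): GateResult {
  const result = minNCheck(rows, minN);
  if (result.passed) sink(result.rows);
  return result;
}

const published: SegmentRow[][] = [];
const sink = (rows: SegmentRow[]): void => { published.push(rows); };

const blocked = publishGated(
  [
    { segment: "Engineering", headcount: 42, metric: 0.71 },
    { segment: "Legal", headcount: 3, metric: 0.5 },
  ],
  5, // illustrative threshold, not a documented default
  sink,
);
const allowed = publishGated(
  [{ segment: "Engineering", headcount: 42, metric: 0.71 }],
  5,
  sink,
);

console.log(blocked.passed, allowed.passed, published.length); // false true 1
```

Returning a structured refusal instead of silently dropping rows gives the caller something to log and explain when a share-out is blocked.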

Call to Action

Direct. Try the API. Hit the pii-rules endpoint for free.

Transitional. Read the privacy methodology (Phase 3). See the rule catalog inline.

Spoke I/O (visual language v1)

Every toolbox spoke shares the same abstract choreography: typed inputs on the left, distilled verbs in the center, typed outputs on the right, and (when relevant) cross-spoke HTTP composition along the bottom rail. Source package: @people-analytics-toolbox/spoke-illustrations.

Data anonymizer

INPUTS: Raw text payloads (string | span[]) · Tenant catalogs (PiiRuleSet)

MAIN ACTIONS: Detect PII · Tokenize & redact · Min-N gate

OUTPUTS: Redaction map (RedactionSpan[]) · Stable tokens (SurrogateKeyMap)

service-key gated writes · audit-friendly spans

COMPOSES WITH: segmentation-studio · preference-modeler

Try it now

Copy this curl. Paste in any terminal. Public read — no auth needed.

data-anonymizer.pii-rules

GET

List the active PII detection rules — header + content patterns, with risk + category. Public read.

curl -sS "https://people-analytics-toolbox.vercel.app/api/spokes/data-anonymizer/pii-rules"
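For consumers calling from code rather than a terminal, a rough TypeScript equivalent of the curl above might look like this. The `FetchLike` type and `fetchPiiRules` helper are hypothetical names introduced here. The response shape is not documented on this page, so the sketch returns `unknown` and leaves validation to the vendored `PiiRulesResponseSchema`.

```typescript
const PII_RULES_URL =
  "https://people-analytics-toolbox.vercel.app/api/spokes/data-anonymizer/pii-rules";

// Minimal slice of the fetch API this sketch relies on; injectable for testing.
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}>;

// Hypothetical helper: GET the public rule list and return the raw JSON body.
async function fetchPiiRules(fetchImpl: FetchLike): Promise<unknown> {
  const res = await fetchImpl(PII_RULES_URL);
  if (!res.ok) {
    throw new Error(`pii-rules request failed with HTTP ${res.status}`);
  }
  return res.json();
}

// Usage, assuming a runtime with a global fetch (for example Node 18+):
//   fetchPiiRules(fetch).then((body) => console.log(body));
```

Injecting the fetch implementation keeps the helper testable offline and lets a consumer add retries or headers without changing the call site.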

Vendor the contract

The Zod contract is the source of truth. Vendor a copy into your consumer app — you keep it; we don't break it underneath you. Re-vendor when the version bumps.

// In your consumer app:
import { z } from "zod";

// Vendor a copy of these contracts from the toolbox repo at:
//   src/spokes/data-anonymizer/contracts/types.ts

import {
  PiiRulesResponseSchema,
  RedactionRequestSchema,
  RedactionResponseSchema,
  TokenizationRequestSchema,
  TokenizationResponseSchema,
  MinNCheckRequestSchema,
  MinNCheckResponseSchema,
  CONTRACT_VERSION,
} from "./vendored/data-anonymizer/types";

// Then call the toolbox over HTTP or MCP.
// See docs/EXTERNAL-CONSUMERS.md for onboarding.

Source path: src/spokes/data-anonymizer/contracts/types.ts · GitHub

Failure

Without governance, synthetic-looking tokens collide. Auditors discover drifting min-N practice because each team improvised, and your brand becomes synonymous with leakage.

Success

One explainable ledger of redactions and tokens; partner teams rerun the same primitives pre-deck automatically. Demonstrable compliance lineage when regulators ask—not theater.