[01] / BANKING · LENDING · MARKETS
Financial Services
Where 'wrong' is a regulatory finding, not a UX bug.
We work with banks, lenders, asset managers, and fintech infrastructure providers. Every system we ship in this sector is designed assuming a regulator will read its outputs.
11 engagements
7 deployed
$4B in annual originations underwritten through one of our systems
Problems
Underwriting & credit decisioning
Adverse-action explanations, fair-lending parity, model-risk-management documentation. We build eval suites that double as MRM artifacts.
Research summarization
Buy-side analyst tooling. Source-cited summaries with hallucination thresholds in basis points, not percentages.
Compliance & KYC review
Document-review agents with privilege-leak guardrails and a full audit trail per case.
Trading-desk copilots
Latency-bound inference (sub-200ms), structured output validation, and a kill switch as a first-class citizen.
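As a rough illustration of how the three trading-desk guards above could compose, here is a minimal Python sketch. The field names in `REQUIRED_FIELDS`, the `KillSwitch` class, and `accept_suggestion` are all hypothetical, chosen for the example rather than taken from any deployed system; only the 200 ms budget comes from the text.

```python
import json
from dataclasses import dataclass

LATENCY_BUDGET_MS = 200.0  # the sub-200ms bound named in the text
# Hypothetical output schema, for illustration only.
REQUIRED_FIELDS = {"symbol": str, "side": str, "quantity": int}


@dataclass
class KillSwitch:
    engaged: bool = False  # flipped by a human or a monitoring process


def accept_suggestion(raw: str, elapsed_ms: float, kill: KillSwitch):
    """Return the parsed suggestion, or None if any guard rejects it."""
    if kill.engaged:  # checked first: nothing passes while it is engaged
        return None
    if elapsed_ms >= LATENCY_BUDGET_MS:  # a late answer is treated as no answer
        return None
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    for name, expected in REQUIRED_FIELDS.items():
        if type(data.get(name)) is not expected:  # exact type, so True is not an int
            return None
    if data["side"] not in ("buy", "sell") or data["quantity"] <= 0:
        return None
    return data
```

The kill switch is evaluated before anything else so that engaging it cannot be bypassed by a well-formed, on-time response.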
We will
- Engagements with model-risk-management or compliance involvement on day one
- Production access, including read replicas of customer data (under a DPA)
- Sponsors at VP+ level who can sign off on the eval bar
We will not
- Robo-advice without licensed humans in the loop
- Crypto trading systems with consumer-facing risk
- Anything we wouldn't show our own MRM lead
CFPB
OCC
FINRA
SEC
FCA (UK)
BaFin (DE)
[02] / CLINICAL · PAYER · LIFE SCIENCES
Healthcare
PHI in, evidence-cited outputs, every action logged.
Health systems, payers, and life-sciences companies. We are HIPAA-trained; we sign BAAs; we ship inside customer VPCs. We do not work on direct-to-consumer medical advice.
6 engagements
4 deployed
Clinical-decision-support agent reviewed by 4 IRB-equivalent panels before shipping
Problems
Clinical-decision support
Differential generation, guideline citation, contraindication checks. Always physician-in-the-loop; the system never finalizes a treatment recommendation.
Payer prior-authorization review
Document-heavy review with policy-citation requirements; rejection reasons must be appealable in writing.
Clinical trial protocol drafting
Eligibility criteria normalization, protocol-deviation detection, recruitment matching.
Medical record summarization
Encounter-by-encounter summaries with PHI-aware redaction and provenance.
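One way to picture the physician-in-the-loop rule from the clinical-decision-support item is as a state constraint: the system can only create drafts, and only a clinician action can mark one final. The sketch below assumes hypothetical names (`Recommendation`, `system_propose`, `clinician_finalize`) and is not a description of any shipped design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Recommendation:
    text: str
    guideline_citations: list
    status: str = "draft"            # the system can only ever produce drafts
    signed_by: Optional[str] = None  # set only by a clinician action


def system_propose(text: str, guideline_citations: list) -> Recommendation:
    """Create a draft; uncited drafts are refused outright."""
    if not guideline_citations:
        raise ValueError("a draft must cite at least one guideline")
    return Recommendation(text=text, guideline_citations=list(guideline_citations))


def clinician_finalize(draft: Recommendation, clinician_id: str) -> Recommendation:
    """Only this path can mark a recommendation final."""
    if not clinician_id:
        raise PermissionError("finalizing requires an identified clinician")
    draft.status = "final"
    draft.signed_by = clinician_id
    return draft
```

In a real deployment the clinician identity would come from an authenticated session rather than a string argument; that part is deliberately left out here.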
We will
- Clinician sponsor named on day one
- BAA signed before discovery
- Outputs that augment a credentialed human, never replace one
We will not
- Direct-to-patient symptom triage without clinicians
- Diagnostic claims that haven't gone through your regulatory pathway
- Wellness chatbots
HIPAA / HHS OCR
FDA (where applicable)
EMA
MHRA (UK)
state medical boards
[03] / LAW FIRMS · IN-HOUSE · LEGAL TECH
Legal
Privilege never leaks. Citations are real or it's a defect.
AmLaw firms, in-house legal teams, and legal-tech platforms. Privilege protection and citation faithfulness are pass-fail; there is no soft fail.
5 engagements
3 deployed
Privilege-leak rate driven from 2.4% to 0.0% across 12K production queries
Problems
Contract review & redlining
Clause-level diff against playbooks, deviation flagging, citation back to authoritative templates.
Privilege-aware research
Cross-matter retrieval with strict information-barrier enforcement at the index layer.
Discovery & document review
First-pass relevance and privilege classification with attorney sampling and audit.
Brief & memo drafting
Citation-checked drafts; every cited authority verified live, not from the model's memory.
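To make "information-barrier enforcement at the index layer" concrete, the sketch below filters documents by matter before any scoring happens, so walled-off text is never ranked or returned. The `Doc` type, the `search` function, and the term-count scoring are simplifying assumptions for illustration; a production index would use a real retrieval engine with access filters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    matter_id: str
    text: str


def search(index, query, allowed_matters):
    """Rank only documents from matters the requester may see."""
    # The barrier is applied before scoring, so walled-off text is never read.
    visible = [d for d in index if d.matter_id in allowed_matters]
    terms = query.lower().split()
    scored = [(sum(d.text.lower().count(t) for t in terms), d) for d in visible]
    hits = [(score, d) for score, d in scored if score > 0]
    hits.sort(key=lambda pair: (-pair[0], pair[1].doc_id))  # best score first
    return [d.doc_id for _, d in hits]
```

Filtering after ranking would give the same visible results in this toy case, but it would mean restricted documents had already been read and scored, which is the exposure an index-layer barrier is meant to avoid.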
We will
- Information-barrier requirements known before discovery
- Partner-level sponsor and a dedicated KM lead on the engagement
- Citation-faithfulness threshold set in writing (we recommend ≥99.5%)
We will not
- Anything that automates final legal advice
- Cross-matter contamination by design
- Citation tools that don't verify against live sources
State bar associations
ABA Model Rules
ICO (UK)
SRA (UK)
[04] / E-COMMERCE · BRAND · MARKETPLACE
Retail & Consumer
Margin-aware automation. Hallucinations cost money in this sector — literally.
Retailers, marketplaces, and direct-to-consumer brands. The work concentrates around customer support, merchandising, and operations — places where automation has direct P&L impact.
4 engagements
4 deployed
Identified $112K of unauthorized refunds in a single 30-day audit window
Problems
Customer-support automation
Tier-1 ticket resolution with explicit refund / discount / store-credit authority bounded in policy.
Merchandising & content
Product-attribute extraction, taxonomy normalization, copy generation with brand-voice eval.
Returns & fraud triage
Pattern detection on returns, account linkage, and abuse signals.
Pricing & promotion analysis
Decision support only; no re-pricing without human approval.
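The idea of refund authority "bounded in policy" from the customer-support item can be sketched as a pure function that either approves within written limits or escalates to a human. The two limits and the names `RefundPolicy` and `decide_refund` are hypothetical; actual policies would be set by finance and ops.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class RefundPolicy:
    max_per_ticket: Decimal        # hypothetical limit on a single refund
    max_per_customer_30d: Decimal  # hypothetical rolling cap per customer


def decide_refund(policy: RefundPolicy, requested: Decimal, refunded_last_30d: Decimal) -> str:
    """Return 'approve', 'escalate', or 'reject' for a refund request."""
    if requested <= 0:
        return "reject"  # malformed request, never silently approved
    if requested > policy.max_per_ticket:
        return "escalate"  # beyond the agent's authority: hand to a human
    if refunded_last_30d + requested > policy.max_per_customer_30d:
        return "escalate"
    return "approve"
```

For example, with limits of 50.00 per ticket and 120.00 per customer over 30 days, a 40.00 request from a customer already refunded 100.00 is escalated rather than approved. `Decimal` is used so that currency comparisons are not affected by floating-point rounding.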
We will
- Finance & ops both at the kickoff
- Policy authority for the agent written in advance
- Real production traffic for shadow evaluation
We will not
- Autonomous price changes
- Influencer-style content at brand scale
- Hot takes on consumer-facing trends
FTC
state consumer-protection AGs
GDPR (where applicable)
PCI-DSS
[05] / INDUSTRIAL · ENERGY · LOGISTICS
Manufacturing & Industrial
Telemetry-rich. Explainability-mandatory. Downtime costs more than the engagement.
Industrial operators, OEMs, energy producers, and logistics platforms. Most engagements work on telemetry, anomaly detection, and operations support — places where wrong calls have safety implications.
3 engagements
2 deployed
Logistics platform at 1.2M shipments/day stewarded for 9 months
Problems
Anomaly detection on telemetry
Pattern detection across PLC / SCADA streams with operator-readable explanations and shift-handoff continuity.
Operations & maintenance copilots
Field-tech assistance with inventory-aware suggestions and serviceable-parts validation.
Logistics dispatch & exception handling
Exception triage with cost-aware routing recommendations; no routing changes without dispatcher approval.
Safety-incident analysis
Post-incident summarization with full chain of evidence; an artifact for regulators and insurers, not a generator of conclusions.
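As a small example of what an "operator-readable explanation" for a telemetry anomaly might look like, the sketch below flags a reading by its z-score against recent history and states the reason in plain language. The z-score rule, the threshold of 3.0, and the tag name are assumptions made for illustration; they stand in for whatever detector an engagement actually uses.

```python
from statistics import mean, stdev


def explain_anomaly(tag: str, history: list, reading: float, z_threshold: float = 3.0):
    """Return a plain-language explanation if the reading is anomalous, else None.

    `history` needs at least two recent readings for the same tag.
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return None  # a flat history gives no scale; leave it to other checks
    z = (reading - mu) / sigma
    if abs(z) < z_threshold:
        return None
    direction = "above" if z > 0 else "below"
    return (
        f"{tag}: reading {reading:.1f} is {abs(z):.1f} standard deviations "
        f"{direction} the recent mean of {mu:.1f}"
    )
```

Returning a sentence rather than a bare score keeps the output usable at shift handoff, where the next operator needs the reason and not just the flag.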
We will
- Operations leadership on the engagement
- OT-network access scoped through your security team
- Deterministic safety overrides at the system boundary
We will not
- Autonomous control of physical systems
- Anything that bypasses your SOC / OT segmentation policy
- Greenfield 'AI factory' projects without existing telemetry
OSHA
MSHA
FERC / NERC (energy)
DOT / FMCSA (transport)
EU NIS2
[06] / B2B SAAS · INFRA · DEVELOPER TOOLS
Technology
Your engineers are sharp. We bring eval discipline they haven't built yet.
B2B SaaS companies, infrastructure providers, and developer-tool companies. These engagements skew toward platform and harness work; the customer's engineering team is strong but has not yet built the evaluation discipline production AI demands.
8 engagements
6 deployed
Mean time from idea to shadow deploy of 12 days across 4 internal teams
Problems
Customer-facing AI features
Search, summarization, in-product agents — eval-gated, cost-budgeted, observable.
Internal developer tooling
Code search, runbook copilots, on-call assistants. Highest ROI; usually shipped in <8 weeks.
Eval & observability platform
We build the harness for product engineers to use. Reusable across features once it exists.
Migration off vendor lock-in
Multi-model gateway, prompt portability, request signing. Decoupling the product from any one model provider.
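One plausible reading of "multi-model gateway" plus "request signing" is a thin routing layer with provider adapters behind a common call shape, and an HMAC over a canonical form of each request. The sketch below uses only the Python standard library; the function names and the choice of HMAC-SHA256 over sorted-key JSON are assumptions for the example.

```python
import hashlib
import hmac
import json


def sign_request(secret: bytes, body: dict) -> str:
    """HMAC-SHA256 over a canonical JSON form, so key order does not matter."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(secret, canonical, hashlib.sha256).hexdigest()


def verify_request(secret: bytes, body: dict, signature: str) -> bool:
    """Constant-time comparison against a freshly computed signature."""
    return hmac.compare_digest(sign_request(secret, body), signature)


def route(providers: dict, name: str, prompt: str) -> str:
    """Call whichever provider adapter is configured under `name`."""
    return providers[name](prompt)
```

Because every adapter in `providers` takes a prompt and returns text, swapping one model provider for another becomes a configuration change rather than a product change.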
We will
- Engineering leadership that wants the discipline as much as the deliverable
- Engagement explicitly intended to up-skill your team
- Real production traffic available for evaluation, not just synthetic data
We will not
- Pure prompt engineering with no eval surface
- 'AI strategy' without a target system
- Vendor-procurement RFPs
GDPR
CCPA
SOC 2
ISO 27001
industry-specific (where customer base demands it)