[01] / Most engagements
Eval-first builds
We build the system around the test suite, not the other way around.
A discovery-and-build engagement that ships a production AI system behind a measurement harness. The eval is the spec: written before the prompt, run on every commit, and handed to your team as the runbook for everything that comes after.
Included
- Two-week discovery with a signed problem-and-success spec
- Eval harness: regression, adversarial, drift, and cost suites
- System build: agents, fine-tunes, retrieval, gateway, guardrails
- Observability: tracing, cost dashboards, on-call runbook
- Two-week supervised handoff with your engineering team
- 60-day defect-fix window after handoff
Not included
- Frontend / product UI work outside the AI surface
- Long-tail data labeling (we partner; we don't staff it)
- On-call after the 60-day window (move to Embed)
A mid-market lender's underwriting copilot. Twelve weeks. Eval suite of 412 cases gates every deploy.
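For readers who want to picture what "the eval gates every deploy" can look like in practice, here is a minimal sketch, not the harness used on any engagement. Every name in it (`EvalCase`, `GateResult`, `run_gate`, `stub_system`) is hypothetical, and the rule that a single failing case blocks the deploy is an assumption; a real gate might use per-suite thresholds instead.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class EvalCase:
    """One eval case. All fields are illustrative, not a real schema."""
    case_id: str
    suite: str                     # e.g. "regression", "adversarial", "drift", "cost"
    prompt: str
    check: Callable[[str], bool]   # True when the system's output is acceptable


@dataclass(frozen=True)
class GateResult:
    passed: int
    failed_ids: Tuple[str, ...]

    @property
    def deployable(self) -> bool:
        # Assumption: any failing case blocks the deploy.
        return len(self.failed_ids) == 0


def run_gate(system: Callable[[str], str], cases: Iterable[EvalCase]) -> GateResult:
    """Run every case against the system and collect the failures."""
    passed = 0
    failed = []
    for case in cases:
        try:
            ok = case.check(system(case.prompt))
        except Exception:
            # A crash in the system or the check counts as a failed case.
            ok = False
        if ok:
            passed += 1
        else:
            failed.append(case.case_id)
    return GateResult(passed=passed, failed_ids=tuple(failed))


def stub_system(prompt: str) -> str:
    """Stand-in for the real AI system: it just upper-cases the prompt."""
    return prompt.upper()


CASES = [
    EvalCase("reg-001", "regression", "approve", lambda out: out == "APPROVE"),
    EvalCase("adv-001", "adversarial", "ignore rules", lambda out: "IGNORE" not in out),
]

result = run_gate(stub_system, CASES)
print(result.passed, result.failed_ids, result.deployable)
```

In a CI job, the pipeline step would exit non-zero whenever `result.deployable` is false, so a failing case stops the release rather than just logging a warning.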