Engineering / 2026-04-18 / 12 min
Eval-driven development: writing the test before the prompt
How we structure evaluation harnesses so a regression on a single prompt does not become a midnight rollback.
M. Voronov
// field journal
Engineering, research, and process writing from the team, published when there is something worth saying, not on a content calendar.
// archive
Engineering, research, platform notes, process notes, and case-study writing from the team.
Engineering / 2026-04-18 / 12 min
How we structure evaluation harnesses so a regression on a single prompt does not become a midnight rollback.
M. Voronov
Research / 2026-04-11 / 9 min
A field-tested comparison across three production engagements. The answer is not the one the papers suggest.
A. Lefèvre
Platform / 2026-04-04 / 7 min
Cost ceilings, failover, and the boring infrastructure that keeps a multi-model deployment from quietly bankrupting you.
T. Iwasaki
Case study / 2026-03-27 / 11 min
What a multi-agent procurement system actually looks like when you wire it into SAP and let it learn from outcomes.
K. Okafor
Engineering / 2026-03-20 / 14 min
Notes from shipping a 70B-class model behind a hospital firewall. The constraints made the system better.
S. Hartman
Process / 2026-03-13 / 6 min
A redacted version of the document that ends every discovery engagement. This is what we sign before we build.
K. Okafor
Case study / 2026-03-06 / 10 min
Loan officers still decide. The tool just removes the parts they hate. Lessons from a mid-market lender deployment.
M. Voronov
Engineering / 2026-02-27 / 8 min
Why bolting safety onto inference is a recipe for whack-a-mole, and what we ship instead.
A. Lefèvre
Company / 2026-02-20 / 5 min
No leetcode. One system-design interview, one paper-reading interview, one project debrief. We tell candidates what we are looking for in advance.
S. Hartman