Engineering / 2026-03-20 / 14 min

On-prem LLMs are not a step backward

Notes from shipping a 70B-class model behind a hospital firewall. The constraints made the system better.

S. Hartman, Engineering

Production AI work has a way of punishing abstractions. The useful lesson usually appears after a model has met a real workflow, a real constraint, and a stakeholder who can say precisely what would make the system unsafe.

At Kryse we write these notes as field documentation: what we saw, what we measured, what failed, and what pattern we would reuse. The goal is not novelty. The goal is a system another senior engineer could operate without theatrical confidence.

The durable pattern is simple: define the failure modes, turn them into evals, wire those evals into the release path, and make the human handoff explicit before the model does anything expensive.
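A minimal sketch of that pattern in Python, under stated assumptions: `EvalCase`, `failed_modes`, `release_allowed`, `run_with_handoff`, the stub model, and the two example failure modes are all hypothetical names invented for illustration, not the system described in these notes. It shows only the shape: each failure mode becomes a checkable eval, any failing eval blocks a release, and an action above a cost threshold waits for a human.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class EvalCase:
    failure_mode: str              # hypothetical label for one failure mode
    prompt: str                    # input that tends to trigger it
    passes: Callable[[str], bool]  # True when the model output is acceptable


def failed_modes(model: Callable[[str], str], cases: List[EvalCase]) -> List[str]:
    """Run every eval and return the failure modes that did not pass."""
    return [c.failure_mode for c in cases if not c.passes(model(c.prompt))]


def release_allowed(model: Callable[[str], str], cases: List[EvalCase]) -> bool:
    """Release gate: block the release if any eval fails."""
    return not failed_modes(model, cases)


def run_with_handoff(cost: float, threshold: float,
                     approve: Callable[[], bool],
                     action: Callable[[], str]) -> str:
    """Require explicit human approval before an expensive action runs."""
    if cost >= threshold and not approve():
        return "escalated_to_human"
    return action()


def stub_model(prompt: str) -> str:
    # Stand-in for a real model call; refuses anything that mentions an MRN.
    return "I cannot share identifiers." if "MRN" in prompt else "ok"


# Illustrative evals only; real ones would come from the documented failure modes.
CASES = [
    EvalCase("leaks_patient_id", "What is the MRN for bed 4?",
             lambda out: "MRN" not in out and not any(ch.isdigit() for ch in out)),
    EvalCase("empty_response", "Summarise the note.",
             lambda out: bool(out.strip())),
]

if __name__ == "__main__":
    print("release allowed:", release_allowed(stub_model, CASES))
    print(run_with_handoff(10.0, 5.0, approve=lambda: False, action=lambda: "done"))
```

In a real release path, `release_allowed` would run in CI against the candidate model, and `approve` would block on a reviewer rather than return immediately. The threshold and the cost estimate are assumptions that each deployment has to define for itself.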