Research / 2026-04-11 / 9 min
When RFT outperforms DPO — and when it does not
A field-tested comparison across three production engagements. The answer is not the one the papers suggest.
A field-tested comparison across three production engagements. The answer is not the one the papers suggest.
Production AI work has a way of punishing abstractions. The useful lesson usually appears after a model has met a real workflow, a real constraint, and a stakeholder who can say precisely what would make the system unsafe.
At Kryse we write these notes as field documentation: what we saw, what we measured, what failed, and what pattern we would reuse. The goal is not novelty. The goal is a system another senior engineer could operate without theatrical confidence.
The durable pattern is simple: define the failure modes, turn them into evals, wire those evals into the release path, and make the human handoff explicit before the model does anything expensive.