Research / 2026-04-11 / 9 min

When RFT outperforms DPO — and when it does not

A field-tested comparison across three production engagements. The answer is not the one the papers suggest.

A. Lefèvre Research

A field-tested comparison across three production engagements. The answer is not the one the papers suggest.

Production AI work has a way of punishing abstractions. The useful lesson usually appears after a model has met a real workflow, a real constraint, and a stakeholder who can say precisely what would make the system unsafe.

At Kryse we write these notes as field documentation: what we saw, what we measured, what failed, and what pattern we would reuse. The goal is not novelty. The goal is a system another senior engineer could operate without theatrical confidence.

The durable pattern is simple: define the failure modes, turn them into evals, wire those evals into the release path, and make the human handoff explicit before the model does anything expensive.