DPO fine-tuning now available - first impressions
Dr. Anna Kowalski, Mar 10, 2026
OpenAI recently added Direct Preference Optimization (DPO) to the fine-tuning API. I've been testing it for preference alignment and here are my first impressions.
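DPO learns from pairs of responses (one preferred, one not) rather than from a single target output. As a rough, hedged sketch of what it optimizes, here is the standard per-example DPO loss in plain Python. It assumes you already have the summed log-probability of each full response under the model being trained and under a frozen reference model. The function names and the `beta=0.1` default are illustrative choices, and this is not a description of OpenAI's internal implementation.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log(sigmoid(beta * margin)).

    Each argument is the summed log-probability of a full response given
    the prompt, under the trained policy or the frozen reference model.
    beta=0.1 is an illustrative default, not a recommended setting.
    """
    # How much more (in log space) the policy favors each response
    # than the reference model does.
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_shift - rejected_shift)
    # -log(sigmoid(m)) equals softplus(-m); it shrinks as the chosen
    # response gains ground on the rejected one.
    return softplus(-margin)


# Policy has shifted probability toward the chosen response relative to the reference.
loss = dpo_loss(-12.0, -15.0, -13.0, -14.0)
print(f"{loss:.4f}")
```

With these numbers the margin is positive, so the loss comes out below log 2, which is the value when the policy has not moved away from the reference at all. In DPO's derivation beta plays the role of the KL-penalty strength: larger values keep the policy closer to the reference model.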
Data format
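Instead of (input, output) pairs, you provide (input, chosen, rejected) triples, one JSON object per line of the JSONL training file. The chosen response goes in preferred_output and the rejected one in non_preferred_output (pretty-printed here for readability):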
```json
{
  "input": {
    "messages": [{"role": "user", "content": "Explain quantum computing"}]
  },
  "preferred_output": [{"role": "assistant", "content": "Clear, concise explanation..."}],
  "non_preferred_output": [{"role": "assistant", "content": "Overly verbose, inaccurate..."}]
}
```
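Because the training file is JSONL, it can help to generate the triples programmatically and sanity-check them before uploading. Below is a minimal, stdlib-only sketch. It assumes OpenAI's documented layout in which input wraps the prompt in a messages list. The helper names (make_dpo_record, validate_dpo_record, to_jsonl) are hypothetical, and the checks (non-empty prompt messages, exactly one assistant message per output, chosen different from rejected) are simplifying assumptions, not OpenAI's official validation rules.

```python
import json


def make_dpo_record(prompt: str, chosen: str, rejected: str) -> dict:
    """Build one DPO example. Layout assumed: input -> {"messages": [...]}."""
    return {
        "input": {"messages": [{"role": "user", "content": prompt}]},
        "preferred_output": [{"role": "assistant", "content": chosen}],
        "non_preferred_output": [{"role": "assistant", "content": rejected}],
    }


def validate_dpo_record(record: dict) -> list:
    """Return a list of problems found (empty list if the record looks OK).

    These checks are this sketch's own assumptions, not the API's rules.
    """
    problems = []
    inp = record.get("input")
    messages = inp.get("messages") if isinstance(inp, dict) else None
    if not messages:
        problems.append("input.messages is missing or empty")
    for key in ("preferred_output", "non_preferred_output"):
        out = record.get(key)
        ok = (isinstance(out, list) and len(out) == 1
              and isinstance(out[0], dict) and out[0].get("role") == "assistant")
        if not ok:
            problems.append(f"{key} must be a single assistant message")
    # A pair with identical responses carries no preference signal.
    if not problems and record["preferred_output"] == record["non_preferred_output"]:
        problems.append("chosen and rejected outputs are identical")
    return problems


def to_jsonl(records: list) -> str:
    """Serialize records as JSONL: one compact JSON object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


record = make_dpo_record("Explain quantum computing",
                         "Clear, concise explanation...",
                         "Overly verbose, inaccurate...")
print(validate_dpo_record(record))
print(to_jsonl([record]), end="")
```

Serializing with json.dumps (no indent) keeps each example on a single line, which is what a JSONL file needs even though the example above is pretty-printed.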
Early results show DPO is particularly effective for:
Less effective for: