DPO fine-tuning now available - first impressions

Dr. Anna Kowalski
Mar 10, 2026

OpenAI recently added Direct Preference Optimization (DPO) to its fine-tuning API. I've been testing it for preference alignment, and here are my first impressions.

Data format

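Instead of (input, output) pairs, you provide (input, preferred output, non-preferred output) triples. The `input` field wraps the prompt in a `messages` array. In the JSONL training file each example goes on a single line; it is pretty-printed here for readability:

```json
{
  "input": {"messages": [{"role": "user", "content": "Explain quantum computing"}]},
  "preferred_output": [{"role": "assistant", "content": "Clear, concise explanation..."}],
  "non_preferred_output": [{"role": "assistant", "content": "Overly verbose, inaccurate..."}]
}
```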
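If you're generating these triples from existing comparison data, a small script saves some hand-editing. The following is a minimal sketch in plain Python (standard library only) that builds records in the shape above and serialises them as JSONL, one object per line. It assumes `input` wraps the prompt in a `messages` list, as in OpenAI's fine-tuning docs. `to_dpo_record` and `to_jsonl` are hypothetical helper names, and the sketch only covers a single user turn with no system prompt or tools.

```python
import json


def to_dpo_record(prompt: str, preferred: str, non_preferred: str) -> dict:
    """Build one DPO example from a single-turn prompt and two candidate answers.

    Hypothetical helper: assumes `input` wraps the conversation in a
    "messages" list, matching OpenAI's documented DPO format.
    """
    return {
        "input": {"messages": [{"role": "user", "content": prompt}]},
        "preferred_output": [{"role": "assistant", "content": preferred}],
        "non_preferred_output": [{"role": "assistant", "content": non_preferred}],
    }


def to_jsonl(records) -> str:
    """Serialise records as JSONL: exactly one JSON object per line."""
    # json.dumps escapes newlines inside strings, so each record stays on one line.
    return "".join(json.dumps(rec, ensure_ascii=False) + "\n" for rec in records)
```

Write the string returned by `to_jsonl(...)` to a `.jsonl` file and upload it with the `fine-tune` purpose before creating the job.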

In my early tests, DPO has been particularly effective for:

  • Tone and style alignment
  • Reducing verbosity
  • Following specific formatting preferences

It has been less effective for:

  • Teaching new knowledge
  • Improving factual accuracy
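It helps to look at what DPO actually optimizes. Below is a minimal sketch of the per-example loss from the original DPO paper (Rafailov et al., 2023), in plain Python. It assumes you already have the summed token log-probabilities of each full response under the model being trained and under a frozen reference model. The function name and the default `beta` are illustrative, and this is the textbook objective, not a description of how OpenAI's service computes it. The loss depends only on how much more the model favours the preferred response than the reference did, which is one plausible reason it reshapes style and format more readily than it adds facts.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Per-example DPO loss: -log(sigmoid(beta * (chosen margin - rejected margin))).

    Each *_logp is the summed token log-probability of a full response given the
    prompt, under the policy being trained or the frozen reference model.
    beta=0.1 is only an illustrative default.
    """
    # How much more (in log space) the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) equals softplus(-z); this form avoids overflow in exp().
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
```

With identical policy and reference log-probabilities the loss is ln 2 (about 0.693). It drops below that as the policy shifts probability toward the preferred response relative to the reference, and rises above it in the opposite case.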