Fine-tuned model performance degradation after 3 epochs
Chris Nakamura · Jul 25, 2024
I'm fine-tuning GPT-4o-mini on a customer service dataset (5000 examples) and seeing performance peak at epochs 2-3, then degrade significantly.
Training metrics
- Epoch 1: Loss 1.23, eval accuracy 78%
- Epoch 2: Loss 0.87, eval accuracy 84%
- Epoch 3: Loss 0.62, eval accuracy 86%
- Epoch 4: Loss 0.41, eval accuracy 79% (degrading!)
- Epoch 5: Loss 0.28, eval accuracy 72%
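The curve above can be checked programmatically with a simple patience-based early-stopping rule. This is a generic sketch of the technique, not a feature of the fine-tuning API; the accuracy values are the ones reported in the metrics above, and the function name is my own:

```python
def best_epoch_with_patience(accuracies, patience=1):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Training would stop once `patience` consecutive epochs fail to
    improve on the best eval accuracy seen so far.
    """
    best_epoch, best_acc = 1, accuracies[0]
    since_improvement = 0
    for epoch, acc in enumerate(accuracies[1:], start=2):
        if acc > best_acc:
            best_epoch, best_acc = epoch, acc
            since_improvement = 0
        else:
            since_improvement += 1
            if since_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(accuracies)

# Eval accuracies from the training metrics above (epochs 1-5).
eval_acc = [0.78, 0.84, 0.86, 0.79, 0.72]
print(best_epoch_with_patience(eval_acc))  # -> (3, 4)
```

With patience=1 this flags epoch 3 as the best checkpoint and would have stopped training at epoch 4, before the drop to 72%.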
Classic overfitting curve. My questions:
1. Is there a way to set early stopping in the fine-tuning API?
2. Should I increase my dataset size, or is 5000 examples sufficient?
3. Any recommendations for the learning rate multiplier?
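On question 1: as far as I know the fine-tuning API has no built-in early-stopping flag, so the usual workaround is to cap `n_epochs` at the epoch where eval accuracy peaked. A hedged sketch of the job parameters, assuming the OpenAI Python SDK v1 shape of `fine_tuning.jobs.create`; the file IDs and model snapshot name are placeholders, not values from this thread:

```python
# Cap training at the observed peak instead of relying on early stopping.
hyperparameters = {
    "n_epochs": 3,                    # eval accuracy peaked at epoch 3 above
    "learning_rate_multiplier": 1.0,  # try lowering (e.g. 0.5) if it still overfits
}

# Hypothetical job submission (requires an API key; IDs are placeholders):
# from openai import OpenAI
# client = OpenAI()
# job = client.fine_tuning.jobs.create(
#     model="gpt-4o-mini-2024-07-18",
#     training_file="file-XXXX",
#     validation_file="file-YYYY",
#     hyperparameters=hyperparameters,
# )

print(hyperparameters["n_epochs"])  # -> 3
```

Passing a validation file also makes the per-epoch eval loss show up in the job events, so the peak epoch can be spotted without a separate eval run.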