Fine-tuned model performance degradation after 3 epochs
Chris Nakamura · Jul 25, 2024
I'm fine-tuning GPT-4o-mini on a customer service dataset (5000 examples) and seeing performance peak at epochs 2-3, then degrade significantly.
Training metrics
- Epoch 1: Loss 1.23, eval accuracy 78%
- Epoch 2: Loss 0.87, eval accuracy 84%
- Epoch 3: Loss 0.62, eval accuracy 86%
- Epoch 4: Loss 0.41, eval accuracy 79% (degrading!)
- Epoch 5: Loss 0.28, eval accuracy 72%
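The curve above can be checked programmatically with a simple patience-based early-stopping rule. This is a generic sketch of the technique, not a feature of the fine-tuning API; the accuracy values are the ones reported in the metrics above, and the function name is my own:

```python
def best_epoch_with_patience(accuracies, patience=1):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Training would stop once `patience` consecutive epochs fail to
    improve on the best eval accuracy seen so far.
    """
    best_epoch, best_acc = 1, accuracies[0]
    since_improvement = 0
    for epoch, acc in enumerate(accuracies[1:], start=2):
        if acc > best_acc:
            best_epoch, best_acc = epoch, acc
            since_improvement = 0
        else:
            since_improvement += 1
            if since_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(accuracies)

# Eval accuracies from the training metrics above (epochs 1-5).
eval_acc = [0.78, 0.84, 0.86, 0.79, 0.72]
print(best_epoch_with_patience(eval_acc))  # -> (3, 4)
```

With patience=1 this flags epoch 3 as the best checkpoint and would have stopped training at epoch 4, before the drop to 72%.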
Classic overfitting curve. My questions:
1. Is there a way to set early stopping in the fine-tuning API?
2. Should I increase my dataset size, or is 5000 examples sufficient?
3. Any recommendations for the learning rate multiplier?
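On question 1: as far as I know the fine-tuning API has no built-in early-stopping flag, so the usual workaround is to cap `n_epochs` at the epoch where eval accuracy peaked. A hedged sketch of the job parameters, assuming the OpenAI Python SDK v1 shape of `fine_tuning.jobs.create`; the file IDs and model snapshot name are placeholders, not values from this thread:

```python
# Cap training at the observed peak instead of relying on early stopping.
hyperparameters = {
    "n_epochs": 3,                    # eval accuracy peaked at epoch 3 above
    "learning_rate_multiplier": 1.0,  # try lowering (e.g. 0.5) if it still overfits
}

# Hypothetical job submission (requires an API key; IDs are placeholders):
# from openai import OpenAI
# client = OpenAI()
# job = client.fine_tuning.jobs.create(
#     model="gpt-4o-mini-2024-07-18",
#     training_file="file-XXXX",
#     validation_file="file-YYYY",
#     hyperparameters=hyperparameters,
# )

print(hyperparameters["n_epochs"])  # -> 3
```

Passing a validation file also makes the per-epoch eval loss show up in the job events, so the peak epoch can be spotted without a separate eval run.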