Training data format for function calling fine-tuning

Priya Sharma, Feb 28, 2025

I'm trying to fine-tune a model to be better at function calling for my specific use case but I'm struggling with the training data format.

The docs show:

{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather in NYC?"},
    {"role": "assistant", "tool_calls": [{
      "id": "call_1",
      "type": "function",
      "function": {"name": "get_weather", "arguments": "{\"location\": \"NYC\"}"}
    }]},
    {"role": "tool", "tool_call_id": "call_1", "content": "72F, sunny"},
    {"role": "assistant", "content": "It's 72°F and sunny in NYC."}
  ],
  "tools": [{"type": "function", "function": {...}}]
}

But when I upload this, I get a validation error. What am I missing?
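
One way to narrow down a validation error like this before uploading is a local structural check. The following is a minimal sketch, not the service's actual validator: it assumes the upload is a JSONL file (one JSON object per line, which the post does not state), and the helper names `check_example` and `check_jsonl_line` are hypothetical. It only checks things visible in the sample above: that each line parses as JSON, that `arguments` is a JSON-encoded string, that called function names appear in `tools`, and that every `tool` message answers an earlier `tool_calls` id.

```python
import json


def check_example(example: dict) -> list[str]:
    """Return a list of structural problems found in one training example."""
    problems = []
    messages = example.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]

    # Names of functions declared in the optional top-level "tools" list.
    declared = {
        tool["function"]["name"]
        for tool in (example.get("tools") or [])
        if isinstance(tool, dict)
        and isinstance(tool.get("function"), dict)
        and "name" in tool["function"]
    }

    seen_call_ids = set()
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i}: expected an object")
            continue
        role = msg.get("role")
        if role == "assistant":
            for call in msg.get("tool_calls") or []:
                fn = call.get("function") or {}
                args = fn.get("arguments")
                if not isinstance(args, str):
                    problems.append(
                        f"message {i}: 'arguments' should be a JSON-encoded string"
                    )
                else:
                    try:
                        json.loads(args)
                    except json.JSONDecodeError:
                        problems.append(f"message {i}: 'arguments' is not valid JSON")
                if declared and fn.get("name") not in declared:
                    problems.append(
                        f"message {i}: function {fn.get('name')!r} not listed in 'tools'"
                    )
                seen_call_ids.add(call.get("id"))
        elif role == "tool":
            if msg.get("tool_call_id") not in seen_call_ids:
                problems.append(
                    f"message {i}: tool_call_id does not match an earlier tool call"
                )
    return problems


def check_jsonl_line(line: str) -> list[str]:
    """Parse one line of a JSONL file and run the structural checks on it."""
    try:
        example = json.loads(line)
    except json.JSONDecodeError as err:
        return [f"line is not valid JSON: {err}"]
    if not isinstance(example, dict):
        return ["line must be a JSON object"]
    return check_example(example)
```

Running `check_jsonl_line` over each line of the file gives a list of problems per example. Note that a literal `{...}` placeholder, as in the `tools` entry of the sample, is not valid JSON, so a file that contained it verbatim would already fail at the parsing step. The same is true if one example is pretty-printed across several lines in a file that is read line by line.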

Aisha Mohammed, Mar 2 (Accepted Answer)

For my use case (product descriptions), text-embedding-3-small was actually sufficient. The quality difference between small and large is negligible for short texts (<200 words). Saved us 5x on embedding costs.
