Batch API: Processing 1M+ documents - lessons learned
Lucia Fernandez · Mar 18, 2026
I just finished processing 1.2 million legal documents through the Batch API and wanted to share some lessons learned:
What worked
Gotchas
My pipeline
import json
from openai import OpenAI

client = OpenAI()

# Create the JSONL input file
with open("batch_input.jsonl", "w") as f:
    for doc in documents:
        request = {
            "custom_id": doc["id"],
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-4o-mini",
                "messages": [{...}],
                "max_tokens": 1000,
            },
        }
        f.write(json.dumps(request) + "\n")
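One thing the snippet above glosses over: a single batch can't hold 1.2M requests. The Batch API caps each input file (50,000 requests per batch as of this writing; check the current limits), so the writing step has to be split across several JSONL files. A rough sketch, where `write_batch_files` is just a hypothetical helper name:

```python
import json

def write_batch_files(requests, prefix="batch_input", max_per_file=50_000):
    """Split request dicts across JSONL files so no file exceeds
    the per-batch request cap (50,000 at the time of writing --
    verify against current Batch API limits)."""
    paths = []
    for start in range(0, len(requests), max_per_file):
        path = f"{prefix}_{start // max_per_file:04d}.jsonl"
        with open(path, "w") as f:
            for req in requests[start:start + max_per_file]:
                f.write(json.dumps(req) + "\n")
        paths.append(path)
    return paths
```

Each resulting file then gets uploaded and submitted as its own batch job.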
Happy to answer questions about the process!
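One more detail worth writing down: lines in the batch output file are not guaranteed to come back in input order, which is exactly why `custom_id` exists. A minimal sketch of parsing a downloaded output file, keyed by `custom_id` (`parse_batch_output` is a hypothetical helper; the field names follow the documented output record shape, so double-check against the current docs):

```python
import json

def parse_batch_output(lines):
    """Map custom_id -> response content (or error record).

    Output lines are not returned in input order, so everything
    is keyed by the custom_id set on the original request.
    """
    results = {}
    for line in lines:
        rec = json.loads(line)
        if rec.get("error"):
            # Request-level failure: keep the error for retry/logging
            results[rec["custom_id"]] = {"error": rec["error"]}
        else:
            body = rec["response"]["body"]
            results[rec["custom_id"]] = body["choices"][0]["message"]["content"]
    return results
```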
1 Reply
Did you control for prompt complexity? Some of the GPT-4 Turbo advantage on refactoring could be due to the model having more tokens to 'think' rather than better reasoning.