Batch API: Processing 1M+ documents - lessons learned

Lucia Fernandez
Mar 18, 2026

I just finished processing 1.2 million legal documents through the Batch API and wanted to share some lessons learned:

What worked

  • Batch API's 50% cost discount is massive at scale
  • JSONL format is easy to generate and parse
  • 24-hour completion window was always met (usually 4-6 hours)

Gotchas

  • Max 50,000 requests per batch — I needed 24 batches
  • No streaming, so you need to poll for completion
  • Failed requests don't get retried automatically
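Working around the 50,000-request cap is just a matter of slicing the request list before writing each JSONL file. A minimal sketch (pure Python; the `chunked` helper name and `BATCH_LIMIT` constant are mine):

```python
# The Batch API caps each batch at 50,000 requests.
BATCH_LIMIT = 50_000

def chunked(requests, limit=BATCH_LIMIT):
    """Yield successive slices of at most `limit` requests."""
    for start in range(0, len(requests), limit):
        yield requests[start:start + limit]

# 1.2M documents divide evenly into 24 batches of 50k each.
batches = list(chunked(list(range(1_200_000))))
print(len(batches))  # -> 24
```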

My pipeline

    import json
    from openai import OpenAI

    client = OpenAI()

    # Create JSONL file
    with open("batch_input.jsonl", "w") as f:
        for doc in documents:
            request = {
                "custom_id": doc["id"],
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": "gpt-4o-mini",
                    "messages": [{...}],
                    "max_tokens": 1000,
                },
            }
            f.write(json.dumps(request) + "\n")
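For context, here is roughly how submission, polling, and failure collection fit together. This is a sketch using the standard Files and Batches endpoints; the function names (`submit_batch`, `wait_for_batch`, `split_results`) and the 60-second polling interval are my own choices, not anything the API mandates:

```python
import json
import time

def submit_batch(path):
    # Upload the JSONL file, then create a batch against chat completions.
    from openai import OpenAI  # imported here so the sketch stays self-contained
    client = OpenAI()
    batch_file = client.files.create(file=open(path, "rb"), purpose="batch")
    return client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h",
    )

def wait_for_batch(batch_id, interval=60):
    # No streaming, so poll until the batch reaches a terminal status.
    from openai import OpenAI
    client = OpenAI()
    while True:
        batch = client.batches.retrieve(batch_id)
        if batch.status in ("completed", "failed", "expired", "cancelled"):
            return batch
        time.sleep(interval)

def split_results(output_lines):
    # Failed requests are not retried automatically: collect their
    # custom_ids so a retry batch can be rebuilt from them.
    ok, failed = {}, []
    for line in output_lines:
        result = json.loads(line)
        if result.get("error") or result["response"]["status_code"] != 200:
            failed.append(result["custom_id"])
        else:
            ok[result["custom_id"]] = result["response"]["body"]
    return ok, failed
```

In practice I read the completed batch's output file line by line, feed the lines to `split_results`, and write the failed `custom_id`s into a fresh JSONL file for resubmission.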

Happy to answer questions about the process!

Dave Sharp

Did you control for prompt complexity? Some of the GPT-4 Turbo advantage on refactoring could be due to the model having more tokens to 'think' rather than better reasoning.
