Cost analysis: Assistants API vs. building your own RAG pipeline

Robert Chang
Jan 12, 2026

I've run both the Assistants API with file search and a custom RAG pipeline (LangChain + Pinecone + GPT-4o) for the same use case: customer support over 200 product docs.

Cost comparison (per 1000 queries)

| Component | Assistants API | Custom RAG |
|-----------|---------------|------------|
| LLM calls | $12.50 | $8.20 |
| File search/embeddings | $2.10 | $0.40 (Pinecone) |
| Storage | $0.10/GB/day | $0.08/GB/month |
| Dev time | 2 days | 3 weeks |

Per the table, the Assistants API works out to roughly 70% more per query ($14.60 vs. $8.60 per 1,000 queries), but it saved us weeks of development. At our volume (~5K queries/day), the custom pipeline breaks even in about 3 months.
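The break-even arithmetic can be sketched in a few lines. The ~$2,500 figure for the extra ~3 weeks of development is a hypothetical placeholder, not a number from the post:

```python
# Back-of-envelope break-even for custom RAG vs. Assistants API.
# Per-1K-query costs from the table (LLM calls + file search/embeddings).
assistants_per_query = (12.50 + 2.10) / 1000   # $0.0146/query
custom_per_query = (8.20 + 0.40) / 1000        # $0.0086/query

queries_per_day = 5000
daily_savings = (assistants_per_query - custom_per_query) * queries_per_day

# Hypothetical cost of the extra ~3 weeks of dev time (assumption, not from the post).
extra_dev_cost = 2500.0

break_even_days = extra_dev_cost / daily_savings
print(f"Savings: ${daily_savings:.2f}/day; break-even in {break_even_days:.0f} days")
```

With those assumptions the savings come to $30/day and break-even lands around 83 days, which lines up with the "about 3 months" estimate above.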

My recommendation: start with Assistants API, migrate to custom RAG once you hit scale.

3 Replies
Victor Huang

Great benchmarking! For semantic chunking, what library are you using? I've been experimenting with LlamaIndex's SemanticSplitter and the results are promising.

Raj Krishnan

I used a custom implementation based on sentence-transformers embeddings. The idea is to compute cosine similarity between adjacent sentences and split where similarity drops below a threshold.

LlamaIndex's implementation is similar but more polished. Definitely recommend it for production use.
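A minimal sketch of the adjacent-sentence approach described above, written over precomputed embeddings so any model (sentence-transformers, OpenAI, etc.) can be plugged in; the 0.6 threshold and the toy 2-d vectors are illustrative assumptions, not values from the thread:

```python
import numpy as np

def semantic_split(sentences, embeddings, threshold=0.6):
    """Group sentences into chunks, starting a new chunk whenever the
    cosine similarity between adjacent sentence embeddings drops
    below `threshold` (i.e., a likely topic shift)."""
    chunks = [[sentences[0]]]
    for i in range(1, len(sentences)):
        a, b = embeddings[i - 1], embeddings[i]
        sim = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
        if sim < threshold:
            chunks.append([sentences[i]])   # similarity dropped: new chunk
        else:
            chunks[-1].append(sentences[i])
    return [" ".join(c) for c in chunks]

# Toy example with hand-made 2-d "embeddings": a clear topic shift
# between the second and third sentences.
sents = ["Reset your password.", "Click the email link.",
         "Billing runs monthly.", "Invoices are emailed."]
embs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 1.0]])
print(semantic_split(sents, embs))
```

In practice you would tune the threshold per corpus (or use a percentile of observed adjacent similarities, which is roughly what LlamaIndex's splitter does).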

Tom Andersson

Have you tested with different embedding dimensions? I found that using 256 dims for chunking decisions (cheaper and faster) and 3072 for the final embeddings works well.
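The two-tier trick above depends on Matryoshka-style embeddings (e.g., OpenAI's text-embedding-3-large, whose API accepts a `dimensions` parameter), where truncating and renormalizing a full vector yields a usable lower-dimensional one. A sketch of that shortening step, with a random vector standing in for a real embedding:

```python
import numpy as np

def truncate_embedding(vec, dims):
    """Keep the first `dims` components and renormalize — the
    Matryoshka-style shortening behind the two-tier setup:
    cheap low-dim vectors for chunk-boundary decisions,
    full-dim vectors for the final index."""
    short = np.asarray(vec, dtype=float)[:dims]
    return short / np.linalg.norm(short)

rng = np.random.default_rng(0)
full = rng.normal(size=3072)           # stand-in for a full 3072-dim embedding
full /= np.linalg.norm(full)

cheap = truncate_embedding(full, 256)  # 256-dim vector for chunking decisions
print(cheap.shape, round(float(np.linalg.norm(cheap)), 6))
```

Caveat: truncation only preserves similarity rankings for models trained with Matryoshka representation learning; for other models you would need to generate the low-dimensional embeddings separately.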
