text-embedding-3-large vs text-embedding-3-small: when to use which?

Aisha Mohammed · Feb 1, 2026

I'm building a RAG system for a startup and trying to decide between text-embedding-3-small and text-embedding-3-large. The cost difference is 5x.

My use case: semantic search over ~50K product descriptions (average 200 words each).

Questions:

1. Is the quality difference noticeable for short text retrieval?
2. Can I use the dimensions parameter with the large model to reduce storage while keeping quality?
3. Has anyone benchmarked these on product search specifically?
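For reference, my understanding of question 2: per OpenAI's embeddings docs, the dimensions parameter natively shortens the vector, and the documented client-side equivalent is to truncate and re-normalize. A quick sketch of that operation (plain Python; the helper name is mine, not an API call):

```python
import math

def shorten_embedding(vec, dim):
    # Truncate an embedding to `dim` dimensions and re-normalize to unit
    # length -- the client-side equivalent of the API's `dimensions`
    # parameter, per OpenAI's embeddings documentation.
    v = vec[:dim]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

# e.g. cut a 3072-dim text-embedding-3-large vector down to 256 dims:
# shortened = shorten_embedding(full_vector, 256)
```

So storage cost and model choice are somewhat independent knobs: you can keep the large model and still shrink the index.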

2 Replies
Logan K. (Staff) · Accepted Answer · Aug 2

The discrepancy is expected! The API token count includes:

1. Message formatting tokens (each message has overhead: <|im_start|>role\ncontent<|im_end|>)
2. System message overhead
3. Special tokens for function/tool definitions, if present

For accurate counting, use the num_tokens_from_messages function from the OpenAI cookbook, which accounts for message formatting.
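If you only need a rough estimate without pulling in the cookbook, the overhead arithmetic can be sketched like this (the per-message and reply-primer constants are approximations that vary by model, and the function name is mine):

```python
def estimate_chat_overhead(num_messages, tokens_per_message=4, reply_primer=3):
    # Rough formatting overhead for a chat completion request.
    # Each message is wrapped as <|im_start|>role\ncontent<|im_end|>,
    # costing roughly 4 tokens, and the assistant reply is primed with
    # ~3 more. These constants are approximations; use the cookbook's
    # num_tokens_from_messages for exact, model-specific counts.
    return num_messages * tokens_per_message + reply_primer
```

Under these assumptions, a 10-turn conversation (20 messages) carries about 83 tokens of pure formatting overhead on top of the message contents.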

Max Petersen

Found the cookbook function. Token counts now match within 1-2 tokens. The per-message overhead is ~4 tokens each, which adds up quickly in multi-turn conversations.
