text-embedding-3-large vs text-embedding-3-small: when to use which?
I'm building a RAG system for a startup and trying to decide between text-embedding-3-small and text-embedding-3-large. The cost difference is 5x.
My use case: semantic search over ~50K product descriptions (averaging ~200 words each).
Questions:
1. Is the quality difference noticeable for short text retrieval?
2. Can I use the dimensions parameter with the large model to reduce storage while keeping quality?
3. Has anyone benchmarked these on product search specifically?
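On question 2: OpenAI has stated that the text-embedding-3 models were trained so that embeddings can be shortened via the `dimensions` parameter (truncate the vector, then re-normalize) with modest quality loss. A minimal sketch of what that truncation does client-side, assuming unit-normalized input vectors as the API returns them (the toy vector below is made up for illustration):

```python
import math

def truncate_embedding(embedding, dims):
    """Keep the first `dims` components of an embedding and
    re-normalize to unit length -- the same shortening the API's
    `dimensions` parameter performs server-side."""
    truncated = embedding[:dims]
    norm = math.sqrt(sum(x * x for x in truncated))
    return [x / norm for x in truncated]

# Toy 6-dim "embedding" shortened to 3 dims.
vec = [0.4, 0.3, 0.2, 0.1, 0.05, 0.01]
short = truncate_embedding(vec, 3)
# The result is unit length again, so cosine similarity still works.
norm = math.sqrt(sum(x * x for x in short))
```

This matters for your storage question: you could embed once with the large model at full 3072 dims, store a truncated copy for search, and keep the option of re-ranking with the full vectors later.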
The discrepancy is expected! The API token count includes:
1. Message formatting tokens (each message is wrapped with overhead: `<|im_start|>role\ncontent<|im_end|>`)
2. System message overhead
3. Special tokens for function/tool definitions if present
For accurate counting, use the `num_tokens_from_messages` function from the OpenAI cookbook, which accounts for message formatting.
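The structure of that function is roughly the following sketch. The per-message and reply-priming constants of 3 match the cookbook's values for the gpt-3.5/gpt-4 family (exact overhead varies by model); `count_tokens` is an injected stand-in here so the example runs without tiktoken, but in practice you would pass `lambda s: len(encoding.encode(s))` using tiktoken's encoding for your model:

```python
def num_tokens_from_messages(messages, count_tokens,
                             tokens_per_message=3, tokens_per_reply=3):
    """Approximate the token count of a chat-completion request.

    count_tokens: any function mapping a string to its token count.
    tokens_per_message covers the <|im_start|>role ... <|im_end|> framing;
    tokens_per_reply covers the <|im_start|>assistant reply priming.
    """
    total = 0
    for msg in messages:
        total += tokens_per_message
        for value in msg.values():  # role and content both cost tokens
            total += count_tokens(value)
    total += tokens_per_reply
    return total

# Stand-in tokenizer for demonstration: one token per whitespace word.
fake_tokenizer = lambda s: len(s.split())
msgs = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello there"},
]
estimate = num_tokens_from_messages(msgs, fake_tokenizer)
```

With a real tiktoken encoding in place of `fake_tokenizer`, this is why raw `len(encoding.encode(content))` undercounts: the framing tokens are invisible in the content strings themselves.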
Found the cookbook function. Token counts now match within 1-2 tokens. The per-message overhead is ~4 tokens each, which adds up quickly in multi-turn conversations.