File search with vector stores: performance optimization tips
Tom Andersson, Jan 20, 2025
I've been working with the Assistants API's file search tool and wanted to share some performance tips after uploading ~500 documents.
Tips
1. Chunk your uploads: Upload files in batches of 20-30. The vector store indexing seems to slow down with large batches.
2. Use specific file names: The file name is used as metadata and affects search relevance.
3. PDF > DOCX: PDFs consistently parse better than DOCX files.
4. Monitor indexing status:
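from openai import OpenAI
client = OpenAI()  # reads OPENAI_API_KEY from the environment
vector_store = client.beta.vector_stores.retrieve(vs_id)  # vs_id: ID of your vector store
print(f"Status: {vector_store.file_counts}")
# Output from my store:
# Status: FileCounts(cancelled=0, completed=487, failed=13, in_progress=0, total=500)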
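5. Set chunking strategy: For technical docs, smaller chunks with overlap worked better for me than the defaults. As far as I can tell, the default static strategy is already 800 tokens with a 400-token overlap, so "smaller" means setting max_chunk_size_tokens below 800.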
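Here is a rough sketch of how tips 1, 4 and 5 could fit together with the Python SDK. It assumes the `upload_and_poll` helper on `client.beta.vector_stores.file_batches` accepts a `chunking_strategy` argument in your SDK version, so check that before relying on it. The names `batches`, `upload_in_batches` and `CHUNKING` are mine, and the chunk sizes are placeholders rather than tuned values.

```python
from contextlib import ExitStack
from typing import Iterator, Sequence

# Static chunking parameters. The numbers are illustrative placeholders;
# the overlap is kept at or below half of the maximum chunk size.
CHUNKING = {
    "type": "static",
    "static": {"max_chunk_size_tokens": 600, "chunk_overlap_tokens": 200},
}


def batches(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def upload_in_batches(client, vector_store_id, paths, batch_size=25):
    """Upload files batch by batch; return (completed, failed) totals.

    `client` is expected to be an openai.OpenAI instance.
    """
    completed = failed = 0
    for batch in batches(paths, batch_size):
        # ExitStack closes every file handle once the batch is done.
        with ExitStack() as stack:
            files = [stack.enter_context(open(p, "rb")) for p in batch]
            result = client.beta.vector_stores.file_batches.upload_and_poll(
                vector_store_id=vector_store_id,
                files=files,
                chunking_strategy=CHUNKING,
            )
        # Per-batch counts make it easier to spot which batch had failures.
        completed += result.file_counts.completed
        failed += result.file_counts.failed
    return completed, failed
```

To try it, create the client with `OpenAI()` from the `openai` package and call `upload_in_batches(client, vs_id, paths)` with your own list of file paths. Summing the per-batch counts gives the same kind of totals as the `file_counts` check in tip 4.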
What strategies have worked for you?
1 Reply
For file search optimization, I'd also recommend: