File search with vector stores: performance optimization tips

Tom Andersson
Jan 20, 2025

I've been working with the Assistants API's file search tool and wanted to share some performance tips after uploading ~500 documents.

Tips

1. Batch your uploads: Upload files in batches of 20-30. Vector store indexing seems to slow down with larger batches.

2. Use specific file names: The file name is used as metadata and affects search relevance.

3. PDF > DOCX: PDFs consistently parse better than DOCX files.

4. Monitor indexing status:

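from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
vector_store = client.beta.vector_stores.retrieve(vs_id)  # vs_id: your vector store ID
print(f"file_counts: {vector_store.file_counts}")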

file_counts: FileCounts(cancelled=0, completed=487, failed=13, in_progress=0, total=500)

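5. Set chunking strategy: For technical docs, smaller chunks (800 tokens or fewer) with overlap work well. Note that 800 tokens with a 400-token overlap is already the documented default, so you only need to set the strategy explicitly if you want smaller chunks or a different overlap.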
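To make tips 1 and 5 concrete, here is a minimal sketch of uploading in batches with an explicit static chunking strategy. The helper names (`batched`, `upload_in_batches`), the batch size of 25, and the 600/300 chunk values are my own illustrative assumptions, not tuned recommendations. It also assumes an SDK version where `file_batches.upload_and_poll` accepts `chunking_strategy`, so check your installed version.

```python
from typing import Iterator, Sequence

BATCH_SIZE = 25  # assumption: a value inside the 20-30 range from tip 1

# Assumption: illustrative values only. The overlap must not exceed
# half of max_chunk_size_tokens.
CHUNKING = {
    "type": "static",
    "static": {"max_chunk_size_tokens": 600, "chunk_overlap_tokens": 300},
}


def batched(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items`, each with at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def upload_in_batches(client, vector_store_id: str, paths: Sequence[str],
                      size: int = BATCH_SIZE) -> int:
    """Upload `paths` one batch at a time and return the number of failed files."""
    failed = 0
    for group in batched(paths, size):
        streams = [open(p, "rb") for p in group]
        try:
            # Blocks until this batch has finished indexing.
            batch = client.beta.vector_stores.file_batches.upload_and_poll(
                vector_store_id=vector_store_id,
                files=streams,
                chunking_strategy=CHUNKING,
            )
        finally:
            for stream in streams:
                stream.close()
        failed += batch.file_counts.failed
    return failed
```

You would call it as `upload_in_batches(OpenAI(), vs_id, paths)`. Because each batch is polled to completion before the next one starts, a nonzero return value tells you right away that some files need another look.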

What strategies have worked for you?

1 Reply
Raj Krishnan

For file search optimization, I'd also recommend:

  • Adding metadata to your files (title, category, date) — the search uses this for ranking
  • Keeping chunks under 800 tokens for technical docs
  • Running periodic re-indexing if your documents change frequently
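One way to read the re-indexing point is "only re-upload what actually changed". Here is a rough sketch of that idea, assuming you keep a local manifest of content hashes between runs. The function names and the manifest shape (path mapped to SHA-256 digest) are hypothetical, and removing the old vector store file and re-uploading the new one is left to the caller.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, List


def file_digest(path: str) -> str:
    """Return the SHA-256 hex digest of the file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def find_changed(paths: Iterable[str], manifest: Dict[str, str]) -> List[str]:
    """Return paths that are new or whose content differs from the manifest."""
    return [str(p) for p in paths if manifest.get(str(p)) != file_digest(p)]


def update_manifest(paths: Iterable[str], manifest: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the manifest with fresh digests for `paths`."""
    updated = dict(manifest)
    for p in paths:
        updated[str(p)] = file_digest(p)
    return updated
```

On a schedule, you would run `find_changed` over your document folder, replace just those files in the vector store, then save the result of `update_manifest` for the next run.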