OpenAI Developer Community

File search with vector stores: performance optimization tips

Tom Andersson Jan 20, 2025

I've been working with the Assistants API's file search tool and wanted to share some performance tips after uploading ~500 documents.

Tips

1. Chunk your uploads: Upload files in batches of 20-30. The vector store indexing seems to slow down with large batches.

2. Use specific file names: The file name is used as metadata and affects search relevance.

3. PDF > DOCX: PDFs consistently parse better than DOCX files.

4. Monitor indexing status:

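from openai import OpenAI
client = OpenAI()  # reads OPENAI_API_KEY from the environment
vector_store = client.beta.vector_stores.retrieve(vs_id)  # vs_id: your vector store ID
print(f"Status: {vector_store.file_counts}")
# Output from my run:
# Status: FileCounts(cancelled=0, completed=487, failed=13, in_progress=0, total=500)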

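5. Set chunking strategy: For technical docs, smaller chunks with overlap work better than the defaults. Note that the default static strategy is already 800-token chunks with a 400-token overlap, so "smaller" means setting max_chunk_size_tokens below 800.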
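
To make tips 1 and 5 concrete, here is a rough sketch of batching uploads and passing an explicit chunking strategy. It is not the original poster's code. The helper names (batched, static_chunking, upload_in_batches) are made up for illustration, client is assumed to be an openai.OpenAI instance, and the 400/200 token values in the usage note are placeholders to tune rather than recommendations. It also assumes an SDK version whose file_batches.upload_and_poll accepts chunking_strategy.

```python
from typing import Iterator, Optional, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[list]:
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def static_chunking(max_tokens: int, overlap_tokens: int) -> dict:
    """Build a static chunking_strategy payload, checking the documented limits."""
    if not 100 <= max_tokens <= 4096:
        raise ValueError("max_chunk_size_tokens must be between 100 and 4096")
    if overlap_tokens < 0 or overlap_tokens * 2 > max_tokens:
        raise ValueError("chunk_overlap_tokens must be at most half the chunk size")
    return {
        "type": "static",
        "static": {
            "max_chunk_size_tokens": max_tokens,
            "chunk_overlap_tokens": overlap_tokens,
        },
    }


def upload_in_batches(client, vector_store_id: str, paths: Sequence[str],
                      batch_size: int = 25, chunking: Optional[dict] = None) -> list:
    """Upload `paths` in batches and return the final status of each batch.

    Assumption: `client` is an openai.OpenAI instance and the installed SDK
    accepts `chunking_strategy` on file_batches.upload_and_poll.
    """
    statuses = []
    for batch in batched(paths, batch_size):
        streams = [open(path, "rb") for path in batch]
        try:
            extra = {"chunking_strategy": chunking} if chunking else {}
            result = client.beta.vector_stores.file_batches.upload_and_poll(
                vector_store_id=vector_store_id, files=streams, **extra
            )
            statuses.append(result.status)
        finally:
            # Close every file handle even if the upload raises.
            for stream in streams:
                stream.close()
    return statuses
```

Usage would look something like upload_in_batches(client, vs_id, pdf_paths, batch_size=25, chunking=static_chunking(400, 200)). Each batch is polled to completion before the next one starts, which keeps the number of files being indexed at once small.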

What strategies have worked for you?

1 Reply

Raj Krishnan Jan 22

For file search optimization, I'd also recommend:

Adding metadata to your files (title, category, date) — the search uses this for ranking

Keeping chunks under 800 tokens for technical docs

Running periodic re-indexing if your documents change frequently
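
One way the periodic re-indexing idea could be sketched: keep a local manifest of content hashes and only re-upload files whose hash has changed. The manifest layout and the function names here are hypothetical, and client is again assumed to be an openai.OpenAI instance. Treat this as a starting point under those assumptions, not a tested workflow.

```python
import hashlib
from pathlib import Path


def sha256_of(path: str) -> str:
    """Return the SHA-256 hex digest of the file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def changed_files(paths, manifest: dict) -> list:
    """Return the paths that are new or whose contents differ from the manifest.

    `manifest` is a hypothetical local record of the form
    {path: {"sha256": ..., "file_id": ...}}.
    """
    return [p for p in paths if manifest.get(p, {}).get("sha256") != sha256_of(p)]


def reindex(client, vector_store_id: str, paths, manifest: dict) -> dict:
    """Replace changed files in the vector store and update the manifest."""
    for path in changed_files(paths, manifest):
        old = manifest.get(path)
        if old:
            # Detaches the stale copy from the vector store; the underlying
            # File object is not deleted by this call.
            client.beta.vector_stores.files.delete(
                old["file_id"], vector_store_id=vector_store_id
            )
        with open(path, "rb") as handle:
            vs_file = client.beta.vector_stores.files.upload_and_poll(
                vector_store_id=vector_store_id, file=handle
            )
        manifest[path] = {"sha256": sha256_of(path), "file_id": vs_file.id}
    return manifest
```

Persisting the manifest between runs (for example as a JSON file) is left out here. Unchanged documents are skipped, so a scheduled run only pays the indexing cost for files that were actually edited.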