Understanding token counting for prompt caching

Max Petersen · Feb 14, 2026

I'm trying to understand how prompt caching affects my token usage and billing. My system prompt is ~4000 tokens and I'm making thousands of calls per day.

Questions:

1. Does the cached portion count toward my TPM rate limit?
2. Is the 50% discount applied automatically, or do I need to opt in?
3. How long does a cached prompt stay in the cache?
4. If I change even one token in the system prompt, does the entire cache invalidate?

I've read the docs, but some of these details aren't clear. I'd appreciate any clarification from the community or OpenAI staff.
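For context, here's roughly how I'm structuring the calls. This is just an illustrative sketch (the `build_messages` helper is my own naming, not an SDK function); my understanding is that caching keys on an exact prefix match, so I keep the long system prompt byte-identical across requests and put everything that varies last:

```python
# Illustrative sketch: keep the static system prompt byte-identical across
# calls so the cached prefix can be reused; put variable content last.
SYSTEM_PROMPT = "You are a support assistant. ..."  # ~4000 tokens in practice

def build_messages(user_input: str) -> list[dict]:
    """Build a messages payload with the stable prefix first."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},  # cacheable prefix
        {"role": "user", "content": user_input},       # varies per call
    ]

# Two different calls share an identical first message, so the second
# should be eligible for a cache hit on the system-prompt portion.
a = build_messages("How do I reset my password?")
b = build_messages("What is my billing cycle?")
assert a[0] == b[0]  # identical prefix across requests
```

If question 4 above is answered "yes, any change invalidates the prefix," then even whitespace edits to `SYSTEM_PROMPT` would break cache reuse, which is why I'm keeping it frozen.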

1 Reply
Emma Rodriguez

For table extraction, I've had much better results with this prompt structure:

```
Extract the table data. For each cell, provide the exact text as it appears.
Output as a JSON array of objects where keys are column headers.
If a cell is empty, use null. If text is unclear, use [unclear].
```

The key insight is telling the model to handle edge cases explicitly.
