
Our AI app gets a lot of repeated questions, can we cache the answers?

Yes, and it is usually the biggest cost saving in a live AI app. Two layers help: prompt caching on the vendor side (Anthropic and OpenAI cache repeated input tokens) and an application cache on your side (the same question reuses a stored answer). Together they can cut costs by a factor of 2 to 10.
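On the vendor side, enabling prompt caching is mostly a matter of marking the large, reused part of the prompt as cacheable. A minimal sketch of the request body, assuming Anthropic's Messages API (the model name and system prompt here are illustrative, not from this article):

```python
# A long, shared system prompt that is identical across requests. Marking it
# with cache_control lets the vendor cache it; only the short user question
# varies, so follow-up requests pay much less for the repeated prefix.
LONG_SYSTEM_PROMPT = "You are the support assistant for ExampleCo. ..."

def build_request(question: str) -> dict:
    return {
        "model": "claude-sonnet-4-5",   # illustrative model name
        "max_tokens": 512,
        "system": [
            {
                "type": "text",
                "text": LONG_SYSTEM_PROMPT,
                "cache_control": {"type": "ephemeral"},  # vendor caches this block
            }
        ],
        "messages": [{"role": "user", "content": question}],
    }
```

The OpenAI side needs no request change at all: their prompt caching applies automatically to repeated prompt prefixes above a minimum length.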

Try this first

  1. Detect cacheable questions: compute a hash over the (normalised prompt, model, parameters) tuple. A chatbot that gets 'what are your opening hours' ten times a day is a perfect cache hit.
  2. Store answers in Redis, Vercel KV or a small Postgres table with a TTL. For factual questions such as opening hours or policy, a TTL of hours or days is fine; for personal data use no caching or a very short TTL.
  3. Enable prompt caching at the API level where available (Anthropic prompt caching, OpenAI prompt caching). The same system prompt or context block becomes cheaper on follow-up requests.
  4. Monitor the hit rate. A good hit rate for FAQ-style questions often sits above 30 percent. Below 5 percent, your key strategy is probably too strict; fix the normalisation.
  5. Never cache blindly for multi-user inputs containing PII or customer context. Include a user-id or context-id in the key so answers never leak between users.

When to bring us in

Want us to add the cache layer to your AI app and measure the saving? We can do it in a day.

None of the above fits?

Describe your situation below. We pass your input plus the steps you already saw to our AI and return tailored next-step advice. If it's too risky to DIY, we'll say so.

Who are you?

For the AI question we need your email and company, so we can follow up if the AI gets stuck, and to prevent abuse.

Limited to 2 questions per hour and 5 per day, kept lean so the AI stays useful. If you need more, contacting us directly works better for both of us.

Or skip the DIY entirely

Our Managed IT clients do not look these things up. One point of contact, a fixed monthly price, issues resolved within working hours.