
Our AI app gets a lot of repeated questions, can we cache the answers?

Yes, and it is usually the biggest cost saving in a live AI app. Two layers help: prompt caching on the vendor side (Anthropic and OpenAI cache repeated input tokens) and an application cache on your side (the same question reuses a stored answer). Together they can cut costs by a factor of 2 to 10.
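On the vendor side, enabling prompt caching is mostly a matter of marking the large, reused part of the prompt as cacheable. A minimal sketch of the request body, assuming Anthropic's Messages API (the model name and system prompt here are illustrative, not from this article):

```python
# A long, shared system prompt that is identical across requests. Marking it
# with cache_control lets the vendor cache it; only the short user question
# varies, so follow-up requests pay much less for the repeated prefix.
LONG_SYSTEM_PROMPT = "You are the support assistant for ExampleCo. ..."

def build_request(question: str) -> dict:
    return {
        "model": "claude-sonnet-4-5",   # illustrative model name
        "max_tokens": 512,
        "system": [
            {
                "type": "text",
                "text": LONG_SYSTEM_PROMPT,
                "cache_control": {"type": "ephemeral"},  # vendor caches this block
            }
        ],
        "messages": [{"role": "user", "content": question}],
    }
```

The OpenAI side needs no request change at all: their prompt caching applies automatically to repeated prompt prefixes above a minimum length.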

Try this first

  1. Detect cacheable questions: compute a hash over the (normalised prompt, model, parameters) tuple. A chatbot that gets 'what are your opening hours' ten times a day is a perfect cache hit.
  2. Store answers in Redis, Vercel KV or a small Postgres table with a TTL. For factual questions such as opening hours or policy, a TTL of hours or days is fine; for personal data use no caching or a very short TTL.
  3. Enable prompt caching at the API level where available (Anthropic prompt caching, OpenAI prompt caching). The same system prompt or context block becomes cheaper on follow-up requests.
  4. Monitor the hit rate. A good hit rate for FAQ-style questions often sits above 30 percent. Below 5 percent, your key strategy is probably too strict; fix the normalisation.
  5. Never cache blindly for multi-user inputs containing PII or customer context. Include a user-id or context-id in the key so answers never leak between users.

When to bring us in

Want us to add the cache layer to your AI app and measure the saving? We can do it in a day.

None of the above fits?

Describe your situation below. We pass your input plus the steps you already saw to our AI and return tailored next-step advice. If it's too risky to DIY, we'll say so.

Who are you?

For the AI question we need your email and company, so we can follow up if the AI gets stuck, and to prevent abuse.

Limited to 2 questions per hour and 5 per day, kept lean so the AI stays useful. If you need more, contacting us directly works better for both of us.

Or skip the DIY entirely

Our Managed IT clients do not look these things up. One point of contact, a fixed monthly price, issues resolved within working hours.