
My AI bill is creeping up, how do I keep it under control?

LLM steps in workflows look cheap until volume grows. Cost is roughly linear in input plus output tokens, and some flows call the model multiple times per record. Cost control starts with measuring.
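As a back-of-the-envelope check, here is a minimal cost sketch. The per-million-token prices and model names are illustrative assumptions, not your provider's actual rates:

    # Rough per-call cost: scales linearly with input + output tokens.
    # Prices below are placeholder figures; substitute your provider's current rates.
    PRICE_PER_MTOK = {  # USD per million tokens (assumed)
        "small-model": {"input": 0.15, "output": 0.60},
        "large-model": {"input": 3.00, "output": 15.00},
    }

    def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
        p = PRICE_PER_MTOK[model]
        return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

    # 10,000 records/day at ~1,500 input + 300 output tokens each adds up:
    daily = 10_000 * estimate_cost("large-model", 1_500, 300)
    print(f"~${daily:.2f}/day")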

Try this first

  1. Log input tokens, output tokens, and model per LLM step. Dashboard daily totals and per-flow cost (see the logging sketch after this list).
  2. Set a hard spend limit or budget alert at the provider (OpenAI, Anthropic) at the day or month level. An 'oops, forgot to cap it' won't be refunded.
  3. Pick the right model per task: classification often runs fine on a small/fast model (Haiku, GPT-4o-mini), while reasoning needs a larger model. Mix them.
  4. Cache identical queries: if 5 records produce the same prompt, store the answer keyed on a prompt hash and skip the repeat calls (see the caching sketch after this list).
  5. Trim the prompt: long system prompts and irrelevant context inflate every call. Remove what doesn't contribute to the answer.
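
For step 1, a minimal logging sketch. It assumes your client reports token counts in a usage object (field names vary by provider); the llm_usage.jsonl path and the flow/step labels are hypothetical:

    # Append one JSON line per LLM call; aggregate later per day, flow, and model.
    import json, time

    LOG_PATH = "llm_usage.jsonl"  # hypothetical path

    def log_llm_call(flow: str, step: str, model: str,
                     input_tokens: int, output_tokens: int) -> None:
        record = {
            "ts": time.time(),
            "flow": flow,    # which workflow the call belongs to
            "step": step,    # which step inside that workflow
            "model": model,
            "input_tokens": input_tokens,
            "output_tokens": output_tokens,
        }
        with open(LOG_PATH, "a") as f:
            f.write(json.dumps(record) + "\n")

    # After each call, pass the counts your client reports, e.g.:
    # log_llm_call("invoice-intake", "classify", "small-model",
    #              usage.input_tokens, usage.output_tokens)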
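
For step 4, a caching sketch keyed on a prompt hash. The in-memory dict and the call_llm wrapper are assumptions; a shared store such as Redis or a database table works the same way across processes:

    # Identical prompts hit the cache instead of the API.
    import hashlib

    _cache: dict[str, str] = {}

    def prompt_key(model: str, prompt: str) -> str:
        # Include the model (and any parameters that change the answer) in the key.
        return hashlib.sha256(f"{model}\n{prompt}".encode()).hexdigest()

    def cached_completion(model: str, prompt: str, call_llm) -> str:
        key = prompt_key(model, prompt)
        if key in _cache:
            return _cache[key]  # repeat prompt: no API call, no tokens
        answer = call_llm(model, prompt)  # your existing client wrapper
        _cache[key] = answer
        return answer

Caching only pays off when prompts repeat exactly, so normalise whitespace and keep volatile fields (timestamps, IDs) out of the prompt where you can.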

When to bring us in

If your AI bill spans multiple flows and you can't tell which flow consumes what, we can set up the attribution layer for you.

None of the above fits?

Describe your situation below. We pass your description, along with the steps you've already seen, to our AI and return tailored next-step advice. If it's too risky to DIY, we'll say so.

Who are you?

For the AI question we need your email and company name, so we can follow up if the AI gets stuck and to prevent abuse.

Limited to 2 questions per hour and 5 per day; we keep it lean so the AI stays useful. For anything more, contacting us directly works better for both of us.

Or skip the DIY entirely

Our Managed IT clients don't have to look these things up. One point of contact, a fixed monthly price, and issues resolved within working hours.