My document is bigger than the model context window, now what?

Context windows are larger than ever, but still finite. Stuffing a 200-page report into a single prompt sometimes works, but the answers come back slow and unreliable. Three patterns solve this: chunking plus retrieval, hierarchical summarisation, or a long-context model for specific tasks.

Try this first

  1. First decide the task: are you searching for a fact (RAG fits) or synthesising over the whole document (summarisation fits)? The right pattern depends on the goal.
  2. For search: chunk and embed the document as in RAG, and pull in only the relevant pieces. You would never search a 200-page report by pasting in everything.
  3. For synthesis: use hierarchical summarisation. Generate a summary per chapter, then a summary of those summaries. The chain costs more tokens but gives coherent results.
  4. For specific tasks (legal lookup, contract comparison), long-context models (Claude Sonnet, Gemini Pro with a large window) are an option. Mind the cost: a single 500K-token prompt can be pricier than a whole day of normal chat.
  5. Measure 'lost in the middle': models often recall facts from the middle of a long context poorly. Test by placing known facts at different positions in the document and checking what the model recalls.
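Step 2's chunk-and-retrieve pattern can be sketched in a few lines. This is a minimal illustration, not a production pipeline: the `chunk`, `bow`, and `retrieve` names are our own, and the bag-of-words similarity stands in for a real embedding model.

```python
import math
import re
from collections import Counter

def chunk(text, size=500, overlap=50):
    """Split text into overlapping word-based chunks."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def bow(text):
    """Toy bag-of-words vector; a real pipeline would call an embedding model here."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(chunks, query, k=2):
    """Return the k chunks most similar to the query."""
    q = bow(query)
    return sorted(chunks, key=lambda c: cosine(bow(c), q), reverse=True)[:k]
```

Only the top-k chunks go into the prompt, so the model sees a few hundred tokens of relevant context instead of the whole report. The overlap between chunks keeps facts that straddle a chunk boundary retrievable.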
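Step 3's summary-of-summaries chain has a simple shape, sketched below. The `summarize` stub here just truncates text; in a real setup it would be a model call, and `fan_in` (how many summaries get merged per round) is a parameter we introduce for illustration.

```python
def summarize(text, max_words=30):
    """Placeholder summariser that truncates to the first words.
    In practice, replace the body with an LLM call."""
    return " ".join(text.split()[:max_words])

def hierarchical_summary(chapters, fan_in=5):
    """Summarise each chapter, then repeatedly summarise groups of
    summaries until a single summary of summaries remains."""
    level = [summarize(c) for c in chapters]
    while len(level) > 1:
        level = [summarize(" ".join(level[i:i + fan_in]))
                 for i in range(0, len(level), fan_in)]
    return level[0]
```

Each round shrinks the input by roughly a factor of `fan_in`, so even a very long document converges to one summary in a handful of rounds; the extra token cost mentioned above is the sum of all the intermediate summaries.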
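The 'lost in the middle' test from step 5 can be automated as a small harness. Everything here is an assumed sketch: `needle_prompts` builds one long context per position with a known fact planted in it, and `recall_by_position` scores whatever `ask(context, question)` callable you plug in (your actual model call).

```python
def needle_prompts(needle, filler_sentence, total_sentences=100,
                   positions=(0.0, 0.5, 1.0)):
    """Build one long context per position, with the known fact (needle)
    placed at the start, middle, or end of the filler text."""
    prompts = {}
    for p in positions:
        idx = int(p * (total_sentences - 1))
        sentences = [filler_sentence] * total_sentences
        sentences[idx] = needle
        prompts[p] = " ".join(sentences)
    return prompts

def recall_by_position(prompts, question, answer, ask):
    """ask(context, question) -> model answer; score 1 per position
    if the expected answer appears in the response."""
    return {p: int(answer.lower() in ask(ctx, question).lower())
            for p, ctx in prompts.items()}
```

If the scores drop at position 0.5 while 0.0 and 1.0 stay high, you are seeing the lost-in-the-middle effect, and facts you care about should be moved toward the start or end of the prompt, or retrieved via chunking instead.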

When to bring us in

Want us to decide whether RAG, hierarchical summarisation, or a long-context model is right for your case? We can map it out.

None of the above fits?

Describe your situation below. We pass your input plus the steps you already saw to our AI and return tailored next-step advice. If it's too risky to DIY, we'll say so.

Who are you?

For the AI question we need your email address and company name, so we can follow up if the AI gets stuck and to prevent abuse.

Questions are limited to 2 per hour and 5 per day; we keep it lean so the AI stays useful. If you need more, contacting us directly works better for both you and us.

Or skip the DIY entirely

Our Managed IT clients do not look these things up: one point of contact, a fixed monthly price, and issues resolved within working hours.