My document is bigger than the model context window, now what?

Context windows are larger than ever, but still finite. Stuffing a 200-page report into a single prompt sometimes works, but the answers come back slow and unreliable. Three patterns solve this: chunking plus retrieval, hierarchical summarisation, or a long-context model for specific tasks.

Try this first

  1. First decide the task: are you searching for a fact (RAG fits) or synthesising over the whole document (summarisation fits)? The right pattern depends on the goal.
  2. For search: chunk and embed the document as in RAG, and pull in only the relevant pieces. You would never search a 200-page report by pasting in everything.
  3. For synthesis: use hierarchical summarisation. Generate a summary per chapter, then a summary of those summaries. The chain costs more tokens but gives coherent results.
  4. For specific tasks (legal lookup, contract comparison), long-context models (Claude Sonnet, Gemini Pro with a large window) are an option. Mind the cost: a single 500K-token prompt can be pricier than a whole day of normal chat.
  5. Measure 'lost in the middle': models often recall facts from the middle of a long context poorly. Test by placing known facts at different positions in the document and checking what the model recalls.
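Step 2's chunk-and-retrieve pattern can be sketched in a few lines. This is a minimal illustration, not a production pipeline: the `chunk`, `bow`, and `retrieve` names are our own, and the bag-of-words similarity stands in for a real embedding model.

```python
import math
import re
from collections import Counter

def chunk(text, size=500, overlap=50):
    """Split text into overlapping word-based chunks."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def bow(text):
    """Toy bag-of-words vector; a real pipeline would call an embedding model here."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(chunks, query, k=2):
    """Return the k chunks most similar to the query."""
    q = bow(query)
    return sorted(chunks, key=lambda c: cosine(bow(c), q), reverse=True)[:k]
```

Only the top-k chunks go into the prompt, so the model sees a few hundred tokens of relevant context instead of the whole report. The overlap between chunks keeps facts that straddle a chunk boundary retrievable.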
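Step 3's summary-of-summaries chain has a simple shape, sketched below. The `summarize` stub here just truncates text; in a real setup it would be a model call, and `fan_in` (how many summaries get merged per round) is a parameter we introduce for illustration.

```python
def summarize(text, max_words=30):
    """Placeholder summariser that truncates to the first words.
    In practice, replace the body with an LLM call."""
    return " ".join(text.split()[:max_words])

def hierarchical_summary(chapters, fan_in=5):
    """Summarise each chapter, then repeatedly summarise groups of
    summaries until a single summary of summaries remains."""
    level = [summarize(c) for c in chapters]
    while len(level) > 1:
        level = [summarize(" ".join(level[i:i + fan_in]))
                 for i in range(0, len(level), fan_in)]
    return level[0]
```

Each round shrinks the input by roughly a factor of `fan_in`, so even a very long document converges to one summary in a handful of rounds; the extra token cost mentioned above is the sum of all the intermediate summaries.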
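The 'lost in the middle' test from step 5 can be automated as a small harness. Everything here is an assumed sketch: `needle_prompts` builds one long context per position with a known fact planted in it, and `recall_by_position` scores whatever `ask(context, question)` callable you plug in (your actual model call).

```python
def needle_prompts(needle, filler_sentence, total_sentences=100,
                   positions=(0.0, 0.5, 1.0)):
    """Build one long context per position, with the known fact (needle)
    placed at the start, middle, or end of the filler text."""
    prompts = {}
    for p in positions:
        idx = int(p * (total_sentences - 1))
        sentences = [filler_sentence] * total_sentences
        sentences[idx] = needle
        prompts[p] = " ".join(sentences)
    return prompts

def recall_by_position(prompts, question, answer, ask):
    """ask(context, question) -> model answer; score 1 per position
    if the expected answer appears in the response."""
    return {p: int(answer.lower() in ask(ctx, question).lower())
            for p, ctx in prompts.items()}
```

If the scores drop at position 0.5 while 0.0 and 1.0 stay high, you are seeing the lost-in-the-middle effect, and facts you care about should be moved toward the start or end of the prompt, or retrieved via chunking instead.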

When to bring us in

Want us to decide whether RAG, hierarchical summarisation, or a long-context model is right for your case? We can map it out.

None of the above fits?

Describe your situation below. We pass your input plus the steps you already saw to our AI and return tailored next-step advice. If it's too risky to DIY, we'll say so.

Who are you?

For the AI question we need your email address and company name, so we can follow up if the AI gets stuck and to prevent abuse.

Questions are limited to 2 per hour and 5 per day; we keep it lean so the AI stays useful. If you need more, contacting us directly works better for both you and us.

Or skip the DIY entirely

Our Managed IT clients do not look these things up: one point of contact, a fixed monthly price, and issues resolved within working hours.