How do I split my documents so AI gives good answers?

Too-large chunks add context noise and waste tokens. Too-small chunks lose coherence, so the answer gets stitched together from fragments. Rule of thumb: chunks of around 500 to 1000 tokens with 10 to 20 percent overlap, split along the document's own structure (paragraphs, headings).
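As a quick sanity check before building anything, you can estimate how many chunks a document will produce. A minimal sketch, assuming the common rough heuristic of about 4 characters per token for English text (a real tokenizer will differ):

```python
import math

def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def chunk_count(total_tokens: int, chunk_size: int = 800, overlap: int = 150) -> int:
    """Chunks needed when each chunk after the first re-includes `overlap` tokens."""
    if total_tokens <= chunk_size:
        return 1
    step = chunk_size - overlap  # net new tokens contributed per chunk
    return 1 + math.ceil((total_tokens - chunk_size) / step)
```

Note how overlap inflates the chunk count: at 800 tokens per chunk with 150 of overlap, each extra chunk only advances 650 tokens through the document.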

Try this first

  1. Start with structural splitting: by heading or paragraph, not by fixed character count. If you have flat text without headings, split on paragraph boundaries.
  2. Aim for around 800 tokens per chunk with 100 to 200 tokens of overlap. Overlap prevents facts from landing exactly on the split line and splintering.
  3. Add a metadata block per chunk: source file, heading path, date, author. That gives the query layer enough for filters and source citations.
  4. Test with questions that fall on chunk boundaries. Lists and tables break most often. Optionally add a second pass that keeps each table as a single chunk.
  5. Re-chunk when answers feel disjointed or off-target. Chunking is not a one-shot decision: it is the hyperparameter you will tweak most often.
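Steps 1 to 3 can be sketched in a few lines. This is a minimal illustration, not a production chunker: the helper names are hypothetical, and word count stands in for a real tokenizer. It splits on paragraph boundaries, packs paragraphs up to a word budget, carries a tail of words forward as overlap, and attaches a small metadata block per chunk:

```python
import re

def split_paragraphs(text: str) -> list[str]:
    """Step 1: structural split on blank lines (paragraph boundaries)."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

def make_chunks(text: str, source: str,
                max_words: int = 800, overlap_words: int = 150) -> list[dict]:
    """Steps 2-3: pack paragraphs into ~max_words chunks with overlap and metadata."""
    chunks: list[list[str]] = []
    current: list[str] = []  # words accumulated for the current chunk
    for para in split_paragraphs(text):
        words = para.split()
        if current and len(current) + len(words) > max_words:
            chunks.append(current)
            current = current[-overlap_words:]  # overlap: carry the tail forward
        current.extend(words)
    if current:
        chunks.append(current)
    return [
        {
            "text": " ".join(c),
            # Extend with heading path, date, author when the ingest pipeline has them.
            "metadata": {"source": source, "chunk_index": i, "n_words": len(c)},
        }
        for i, c in enumerate(chunks)
    ]
```

Because the packer only breaks between paragraphs, a fact is never cut mid-sentence; the overlap then covers facts that straddle a paragraph boundary.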

When to bring us in

If you are working with PDFs that have complex layouts, tables, or forms, a dedicated ingest tool (Unstructured, LlamaParse) helps. We can help you pick the right one.

None of the above fits?

Describe your situation below. We pass your input plus the steps you already saw to our AI and return tailored next-step advice. If it's too risky to DIY, we'll say so.

Who are you?

For the AI question we ask for your email and company: so we can follow up if the AI gets stuck, and to prevent abuse.

Limited to 2 questions per hour and 5 per day; we keep it lean so the AI stays useful. For anything more, contacting us directly works better for both of us.

Or skip the DIY entirely

Our Managed IT clients do not have to look these things up: one point of contact, a fixed monthly price, and issues resolved within working hours.