How do I split my documents so AI gives good answers?
Chunks that are too large add noise to the context and waste tokens. Chunks that are too small lose coherence, so the answer gets stitched together from fragments. Rule of thumb: chunks of roughly 500 to 1000 tokens with 10 to 20 percent overlap, split along the document's own structure (paragraphs, headings).
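To make the size-and-overlap arithmetic concrete, here is a minimal sketch of fixed-size windowing with overlap. It assumes whitespace-separated words as a stand-in for tokens (a real pipeline would count with the embedding model's own tokenizer), and `window_chunks` is a hypothetical name. It only illustrates the numbers: an 800-token window with 120 tokens (15 percent) of overlap means each new window starts 680 tokens after the previous one.

```python
def window_chunks(tokens: list[str], size: int = 800, overlap: int = 120) -> list[list[str]]:
    """Split a token list into windows of `size` tokens, where each window
    repeats the last `overlap` tokens of the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap  # distance between the starts of consecutive windows
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


tokens = [f"w{i}" for i in range(2000)]  # a dummy 2,000-token document
chunks = window_chunks(tokens)
print([len(c) for c in chunks])
```

For a 2,000-token document this prints `[800, 800, 640]`: three chunks, each sharing 120 tokens with its predecessor. Fixed windows ignore headings and paragraphs, which is why the steps below prefer structural splitting.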
Try this first
1. Start with structural splitting: split by heading or paragraph, not by a fixed character count. If you have flat text without headings, split on paragraph boundaries.
2. Aim for around 800 tokens per chunk with 100 to 200 tokens of overlap. With overlap, a fact that falls right on a split line still appears whole in at least one chunk instead of being cut in two.
3. Add a metadata block to each chunk: source file, heading path, date, author. That gives the retrieval layer what it needs to filter results and cite sources.
4. Test with questions whose answers fall on chunk boundaries. Lists and tables break most often; optionally add a second pass that keeps each table in a single chunk.
5. Re-chunk when answers feel disjointed or off-target. Chunking is not a one-off job: chunk size and overlap are the settings you will tweak most often.
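Steps 1 to 4 can be sketched together as follows. This is a minimal illustration, not a production splitter, and it rests on several assumptions: the input is Markdown with `#` headings and blank-line-separated paragraphs, tokens are approximated by whitespace-separated words, and `Chunk`, `split_markdown` and `fact_survives` are hypothetical names. Overlap here is simplified to carrying one whole paragraph into the next chunk, rather than an exact 100 to 200 tokens.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


def count_tokens(text: str) -> int:
    # Rough stand-in: whitespace-separated words. Swap in your model's tokenizer.
    return len(text.split())


def split_markdown(doc: str, source: str, max_tokens: int = 800,
                   extra_meta: Optional[dict] = None) -> list[Chunk]:
    """Split Markdown by heading, then pack whole paragraphs into chunks of
    roughly max_tokens. Each later chunk in a section starts with the last
    paragraph of the previous chunk (paragraph-level overlap)."""
    chunks: list[Chunk] = []
    heading_path: list[str] = []  # e.g. ["Guide", "Setup"]
    paragraphs: list[str] = []    # paragraphs under the current heading

    def emit(paras: list[str]) -> None:
        # Step 3: attach source, heading path and any extra fields (date, author).
        meta = {"source": source, "heading_path": " > ".join(heading_path)}
        meta.update(extra_meta or {})
        chunks.append(Chunk("\n\n".join(paras), meta))

    def flush() -> None:
        # Step 2: pack paragraphs under the budget, carrying one over as overlap.
        current: list[str] = []
        for para in paragraphs:
            if current and count_tokens("\n\n".join(current + [para])) > max_tokens:
                emit(current)
                current = [current[-1]]
            current.append(para)
        if current:
            emit(current)
        paragraphs.clear()

    # Step 1: walk blank-line-separated blocks, treating "# ..." lines as headings.
    for block in doc.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        first, _, rest = block.partition("\n")
        m = re.match(r"(#{1,6})\s+(.*)", first)
        if m:
            flush()  # a new heading closes the previous section
            level = len(m.group(1))
            heading_path[:] = heading_path[:level - 1] + [m.group(2).strip()]
            if rest.strip():
                paragraphs.append(rest.strip())
        else:
            paragraphs.append(block)
    flush()
    return chunks


def fact_survives(chunks: list[Chunk], phrase: str) -> bool:
    # Step 4: a boundary test passes if the phrase sits intact in some chunk.
    return any(phrase in c.text for c in chunks)
```

A paragraph longer than `max_tokens` (a large table, for example) is kept whole rather than cut, which matches the "keep tables as one chunk" idea but can produce oversized chunks that a second pass would need to handle. Date and author go in through `extra_meta`, e.g. `split_markdown(text, "guide.md", extra_meta={"author": "..."})`.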
When to bring us in
If you are working with PDFs that have complex layouts, tables or forms, a dedicated ingestion tool (such as Unstructured or LlamaParse) helps. We can help you pick the right one.
See also
- Can I paste a customer file or email into ChatGPT? Depends on the account and settings. Free ChatGPT and a Team tenant behave very differently from what most people assume.
- I want a one-page AI policy for my team. A real one-pager beats a thick document nobody reads. Four headers and concrete examples.
- How do I tell if an AI answer is made up? Models sound confident even when they are wrong. A few habits catch most mistakes.
Or skip the DIY entirely
Our Managed IT clients don't have to look these things up. One point of contact, a fixed monthly price, and issues resolved within working hours.