How do I split my documents so AI gives good answers?

Too-large chunks add context noise and waste tokens. Too-small chunks lose coherence, so the answer gets stitched together from fragments. Rule of thumb: chunks of around 500 to 1000 tokens with 10 to 20 percent overlap, split along the document's own structure (paragraphs, headings).
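As a quick sanity check before building anything, you can estimate how many chunks a document will produce. A minimal sketch, assuming the common rough heuristic of about 4 characters per token for English text (a real tokenizer will differ):

```python
import math

def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def chunk_count(total_tokens: int, chunk_size: int = 800, overlap: int = 150) -> int:
    """Chunks needed when each chunk after the first re-includes `overlap` tokens."""
    if total_tokens <= chunk_size:
        return 1
    step = chunk_size - overlap  # net new tokens contributed per chunk
    return 1 + math.ceil((total_tokens - chunk_size) / step)
```

Note how overlap inflates the chunk count: at 800 tokens per chunk with 150 of overlap, each extra chunk only advances 650 tokens through the document.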

Try this first

  1. Start with structural splitting: by heading or paragraph, not by fixed character count. If you have flat text without headings, split on paragraph boundaries.
  2. Aim for around 800 tokens per chunk with 100 to 200 tokens of overlap. Overlap prevents facts from landing exactly on the split line and splintering.
  3. Add a metadata block per chunk: source file, heading path, date, author. That gives the query layer enough for filters and source citations.
  4. Test with questions that fall on chunk boundaries. Lists and tables break most often. Optionally add a second pass that keeps each table as a single chunk.
  5. Re-chunk when answers feel disjointed or off-target. Chunking is not a one-shot decision: it is the hyperparameter you will tweak most often.
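Steps 1 to 3 can be sketched in a few lines. This is a minimal illustration, not a production chunker: the helper names are hypothetical, and word count stands in for a real tokenizer. It splits on paragraph boundaries, packs paragraphs up to a word budget, carries a tail of words forward as overlap, and attaches a small metadata block per chunk:

```python
import re

def split_paragraphs(text: str) -> list[str]:
    """Step 1: structural split on blank lines (paragraph boundaries)."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

def make_chunks(text: str, source: str,
                max_words: int = 800, overlap_words: int = 150) -> list[dict]:
    """Steps 2-3: pack paragraphs into ~max_words chunks with overlap and metadata."""
    chunks: list[list[str]] = []
    current: list[str] = []  # words accumulated for the current chunk
    for para in split_paragraphs(text):
        words = para.split()
        if current and len(current) + len(words) > max_words:
            chunks.append(current)
            current = current[-overlap_words:]  # overlap: carry the tail forward
        current.extend(words)
    if current:
        chunks.append(current)
    return [
        {
            "text": " ".join(c),
            # Extend with heading path, date, author when the ingest pipeline has them.
            "metadata": {"source": source, "chunk_index": i, "n_words": len(c)},
        }
        for i, c in enumerate(chunks)
    ]
```

Because the packer only breaks between paragraphs, a fact is never cut mid-sentence; the overlap then covers facts that straddle a paragraph boundary.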

When to bring us in

If you are working with PDFs that have complex layouts, tables, or forms, a dedicated ingest tool (Unstructured, LlamaParse) helps. We can help you pick the right one.

None of the above fits?

Describe your situation below. We pass your input plus the steps you already saw to our AI and return tailored next-step advice. If it's too risky to DIY, we'll say so.

Who are you?

For the AI question we ask for your email and company: so we can follow up if the AI gets stuck, and to prevent abuse.

Limited to 2 questions per hour and 5 per day; we keep it lean so the AI stays useful. For anything more, contacting us directly works better for both of us.

Or skip the DIY entirely

Our Managed IT clients do not have to look these things up: one point of contact, a fixed monthly price, and issues resolved within working hours.