Chunking Is Product Design (And Most RAG Systems Prove It)
Words you need
- Chunk – A small piece of a document. You cut a long text into chunks so the system can find and use the right piece when someone asks a question.
- Token – Roughly a word or part of a word for the AI. Models read and count text in tokens, not always whole words.
- RAG – Retrieval-Augmented Generation. In plain English: the system fetches the right chunks first, then the AI answers using those chunks instead of guessing from memory.
- Metadata – Extra info you save with each chunk (e.g. title, section name, when it was created). You use it to filter and show where an answer came from.
People treat chunking like "cut the text into same-size pieces."
It's not.
I've seen teams spend weeks on fancy retrieval and reranking, then discover the real bottleneck was how they cut the documents. Chunking comes first. Get it wrong and the rest barely matters.
What you'll learn
- What chunking really is: cutting a long document into smaller pieces so each piece still makes sense on its own. (Not just "500 words each.")
- Why one way of cutting does not work for everything. Articles, how-to guides, code, and short posts need different rules.
- What to save for each piece (title, section, version) so you can fix things later when you change the rules.
- How to check if your chunking works: use real questions and see if the pieces you get back are the right ones.
Chunking is how your product "remembers" things. It decides what "one piece of context" means. It decides what the AI can show as a source. So yeah, chunking is a product decision. If you want the full picture of how RAG fits together (retrieval, reranking, filters), start with the series map and then RAG at inference time.
What chunking is (in plain English)
Chunking means you take one long document and cut it into smaller pieces. Each piece should still make sense by itself. When a user asks a question, the system looks for the best pieces and feeds them to the AI. The AI then answers using those pieces.
If you only cut by "how many words" and ignore where sentences and sections start and end, you get half-ideas. The system might return "step 6" without "step 1". Or half a definition. So chunking is not "text splitting". It's a product decision: what counts as "one piece of context" for your content.
The lie: "just do 500 tokens with overlap"
A lot of tutorials say: cut every 500 tokens and overlap a bit. That exists because it's easy to code, not because it's right.
What goes wrong:
- You cut the definition away from the example. The AI gets one without the other.
- You cut the code away from the explanation. The AI sees code it can't explain.
- You return half a thought. The AI guesses the rest. Often wrong.
Adding overlap (so pieces share some text) does not fix this. It just hides it.
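For the curious, this is roughly what that recipe looks like in code. A minimal sketch in plain Python: I split on whitespace as a crude stand-in for a real tokenizer, and the function name is mine. The point is everything it ignores: sentences, headings, steps.

```python
def naive_chunks(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Cut every `size` tokens, repeating `overlap` tokens at each seam.

    Blind to sentences, headings, and steps. The overlap only duplicates
    text around the cut; it does not keep a definition with its example.
    """
    tokens = text.split()  # crude stand-in for a real tokenizer
    step = size - overlap
    return [" ".join(tokens[i:i + size]) for i in range(0, len(tokens), step)]
```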
Chunking goals (what you want each piece to be)
Pick one main goal per type of content:
- Articles (blog posts, essays): Keep one section or one idea per piece. Don't cut in the middle of a section.
- Docs and runbooks (how-to guides): A runbook is a step-by-step guide for fixing or running something. Keep one step or one procedure per piece. Don't give "step 6" without "step 1".
- Code: Keep one function or one block together. Don't cut the function name away from the function body.
- Short posts (tweets, updates): Often the whole post is one piece. Don't cut it into smaller bits; you lose the point.
If you use the same rule for everything, your answers will feel random and generic.
Key takeaway: One main goal per content type. Articles: one section per piece. How-to guides: one step per piece. Code: one function per piece.
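If it helps, you can write that product decision down as a small table in code: one rule per content type, in one place. A sketch (the type names and wording are mine, not a standard):

```python
# One chunking rule per content type, written down in one place.
CHUNKING_RULES = {
    "article": "split at headings; split long sections by whole paragraphs",
    "howto":   "one step, warning, or prerequisite per piece",
    "code":    "one function or class per piece; keep its comment with it",
    "post":    "keep the whole post as one piece; filter with metadata",
}

def rule_for(content_type: str) -> str:
    """Look up the agreed rule; fail loudly for types nobody decided on yet."""
    if content_type not in CHUNKING_RULES:
        raise ValueError(f"No chunking rule agreed for: {content_type}")
    return CHUNKING_RULES[content_type]
```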
What to save for each piece
For every piece (chunk), save at least:
- The text of the piece.
- A stable ID (a name that never changes for this piece).
- Where it came from: document ID, and "title path" (e.g. "Article name > Section name > Subsection").
- What type it is (article, how-to, code, post).
- Which set of chunking rules made it (a version number or rule name).
Why? Later you will change the rules. If you can't tell "this piece was made with the old rules" vs "this piece was made with the new rules", you can't compare. You can't debug.
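Here's a minimal sketch of such a record as a Python dataclass. The field names are my own, not a standard; the point is that every piece carries its origin and the rule version that made it.

```python
import hashlib
from dataclasses import dataclass, field

def stable_chunk_id(doc_id: str, title_path: str, index: int) -> str:
    """An ID that stays the same as long as the piece keeps its place."""
    raw = f"{doc_id}|{title_path}|{index}"
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()[:16]

@dataclass
class Chunk:
    chunk_id: str        # stable ID that never changes for this piece
    doc_id: str          # which document it came from
    title_path: str      # e.g. "Article name > Section name > Subsection"
    content_type: str    # "article", "howto", "code", or "post"
    chunking_rules: str  # which rule set / version produced this piece
    text: str            # the piece itself
    labels: dict = field(default_factory=dict)  # e.g. {"kind": "warning"}
```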
The "title path" trick
If your content has structure (title, sections, subsections), save that for each piece. For example: "Article title > Section name > Subsection name".
Then when you search, you can:
- Filter by section (e.g. only "Safety" section).
- Show the user where the answer came from (e.g. "From the section 'How to chunk articles'").
- Group pieces by topic.
You don't need to change the AI model for this. Just save the path. Quality goes up because the user sees where the answer came from.
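A minimal sketch of how to collect that path while walking a markdown document. It assumes "#" headings and nothing else; real documents will need a real parser.

```python
def with_title_paths(markdown: str) -> list[tuple[str, str]]:
    """Pair each non-heading line with its heading trail.

    Returns (title_path, line) pairs, e.g.
    ("Article title > Section name > Subsection name", "Some paragraph").
    """
    trail: list[str] = []
    out: list[tuple[str, str]] = []
    for line in markdown.splitlines():
        if line.startswith("#"):
            level = len(line) - len(line.lstrip("#"))
            trail = trail[: level - 1] + [line.lstrip("# ").strip()]
        elif line.strip():
            out.append((" > ".join(trail), line.strip()))
    return out

# Filtering by section is then a plain string check, no model change needed:
# safety_lines = [t for path, t in with_title_paths(doc) if "Safety" in path]
```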
Articles: cut by headings, then by size
For articles and blog posts, do this:
- Cut at headings first. One section = one piece when the section is short enough.
- If a section is very long, cut by paragraphs. Still keep whole paragraphs together.
The goal: the AI gets a full section or a full idea, not a random slice in the middle of a sentence.
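A minimal sketch of that two-level rule, assuming markdown "#" headings, blank lines between paragraphs, and a size limit measured in characters:

```python
def chunk_article(markdown: str, max_chars: int = 2000) -> list[str]:
    """Cut at headings first; split long sections by whole paragraphs."""
    # 1) Cut into sections at headings.
    sections: list[list[str]] = [[]]
    for line in markdown.splitlines():
        if line.startswith("#") and sections[-1]:
            sections.append([])
        sections[-1].append(line)

    # 2) Keep short sections whole; cut long ones by whole paragraphs.
    chunks: list[str] = []
    for section in sections:
        text = "\n".join(section).strip()
        if not text:
            continue
        if len(text) <= max_chars:
            chunks.append(text)
            continue
        piece = ""
        for para in text.split("\n\n"):
            if piece and len(piece) + len(para) > max_chars:
                chunks.append(piece.strip())
                piece = ""
            piece += para + "\n\n"
        if piece.strip():
            chunks.append(piece.strip())
    return chunks
```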
How-to guides and runbooks: cut by steps
Runbooks and how-to guides are step-by-step. If you cut by "500 words", you split:
- "Why we do this" away from "How to do it".
- "What can go wrong" away from the step.
Then the AI might give step 6 without step 1. Useless.
Cut by steps instead. One step (or one warning, or one "before you start" block) per piece. Save labels like "step", "warning", "prerequisite" so you can filter later.
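A minimal sketch, assuming the runbook marks its parts with text like "Step 3:", "Warning", or "Before you start". The patterns are placeholders; match whatever your runbooks actually use.

```python
import re

def chunk_runbook(text: str) -> list[dict]:
    """One step (or warning, or prerequisite) per piece, with a label saved."""
    labels = [
        (re.compile(r"^step\s*\d+", re.I), "step"),
        (re.compile(r"^warning", re.I), "warning"),
        (re.compile(r"^before you start", re.I), "prerequisite"),
    ]
    pieces: list[dict] = []
    for block in text.split("\n\n"):  # blocks separated by blank lines
        block = block.strip()
        if not block:
            continue
        label = "other"
        for pattern, name in labels:
            if pattern.match(block):
                label = name
                break
        pieces.append({"label": label, "text": block})
    return pieces
```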
Code: cut at function boundaries
Code has structure. There are functions, classes, and blocks. If you cut by "500 words" you might cut:
- The function name away from the function body.
- The "what this does" comment away from the code.
If your tools allow it, cut at function or class boundaries. One function per piece. Keep the comment with the code it describes. If you can't do that, at least cut at blank lines or at { and }. Still better than cutting in the middle of a line.
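For Python code, the standard `ast` module already knows where functions and classes start and end, so a sketch can lean on it. Other languages need their own parser (e.g. tree-sitter); this only handles Python.

```python
import ast

def chunk_python_file(source: str) -> list[str]:
    """One top-level function or class per piece, comment and decorators included."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks: list[str] = []
    last_end = 0
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Start at the first decorator (if any), then pull in the
            # comment lines sitting directly above.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list]) - 1
            while start > last_end and lines[start - 1].lstrip().startswith("#"):
                start -= 1
            chunks.append("\n".join(lines[start:node.end_lineno]))
            last_end = node.end_lineno
    return chunks
```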
Short posts: keep them whole, use tags
A tweet or a short update is already one piece. Don't cut it into smaller bits. You'll lose the point.
What helps instead: save extra info (metadata: platform, date, hashtags, who could see it). Then you can search and filter by those. The post itself stays one piece.
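A tiny sketch with made-up posts, just to show the shape: the whole post is the piece, and the metadata does the filtering.

```python
posts = [
    {"text": "Shipped the new search filters today.",
     "platform": "mastodon", "date": "2024-05-02", "hashtags": ["release"]},
    {"text": "Chunking tip: never split a runbook step in half.",
     "platform": "x", "date": "2024-05-10", "hashtags": ["rag", "chunking"]},
]

# Filter by metadata instead of cutting the post into smaller bits.
release_posts = [p for p in posts if "release" in p["hashtags"]]
recent_posts = [p for p in posts if p["date"] >= "2024-05-01"]
```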
How to test if your chunking works
Don't trust a single number or score. Test with real questions.
Do this: write down 20 questions your users would ask. Run your system and get the top 10 pieces it returns for each question. Read those pieces. Ask: "Would I answer this question using only these pieces?"
If the answer is no, fix the way you cut the documents. Don't jump to "train the model more" or "add a fancy reranker". Fix the chunking first.
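A minimal harness for that spot check. `search` stands in for whatever your retrieval call is; here it's assumed to take a question and return chunks as dicts with a "text" field. The code is trivial on purpose: the real work is the reading.

```python
def spot_check(questions: list[str], search, k: int = 10) -> None:
    """Print the top-k pieces per question so a human can read and judge them."""
    for q in questions:
        print("=" * 60)
        print("QUESTION:", q)
        for rank, chunk in enumerate(search(q)[:k], start=1):
            print(f"\n--- piece {rank} | {chunk.get('title_path', '?')} ---")
            print(chunk["text"][:500])
        print("\nWould you answer this question using only these pieces?")
```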
What to do next
Try this: Pick one type of content (e.g. your blog or your internal docs). Try two ways of cutting: (1) by headings or steps, (2) by fixed size (e.g. every 500 words). For the same 10 real questions, see which set of pieces gives you better answers. Use the winner as your baseline. Then add a "version" or "rule name" to every piece so when you change the rules later, you can compare.
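One way to set that comparison up, sketched with the helpers from earlier in this post (your own chunkers and storage will differ): chunk the same documents twice and tag every piece with the rule name that made it.

```python
def build_index(docs: dict[str, str], chunker, rule_name: str) -> list[dict]:
    """Chunk every document with one rule set and tag each piece with it."""
    pieces = []
    for doc_id, text in docs.items():
        for i, piece in enumerate(chunker(text)):
            pieces.append({
                "doc_id": doc_id,
                "chunk_index": i,
                "chunking_rules": rule_name,  # so you can compare rule sets later
                "text": piece,
            })
    return pieces

# Same documents, two rule sets, two labelled sets of pieces to test
# against the same 10 questions:
# by_headings = build_index(docs, chunk_article, "headings-v1")
# fixed_size = build_index(docs, naive_chunks, "fixed-500-v1")
```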
Which kind of content do you have the most of: articles, how-to guides, code, or short posts?
