Chunking is the most-skipped optimization in production RAG. Teams copy the fixed-size setting common in LangChain quickstarts (1000 chars, 200 overlap), get acceptable demo results, and never revisit it. Production retrieval precision then flatlines at 0.6-0.7 when 0.85+ is achievable with proper chunking. After tuning chunking on 15+ production RAG systems at NKKTech, here are the four strategies that matter, their tradeoffs, and the chunk size and overlap parameters that actually move retrieval precision.
Why Chunking Determines Retrieval Quality
Retrieval precision (the fraction of top-k chunks that are actually relevant) caps everything downstream in RAG. With k = 10 and precision 0.6, the LLM has 4 irrelevant chunks to ignore; it usually does, but sometimes it incorporates one and you get a wrong answer. At precision 0.85, only 1-2 of the 10 chunks are irrelevant, the LLM almost always ignores them, and the answer is reliably grounded. The single biggest lever on precision (after embedding model choice) is chunking. Bad chunking, meaning chunks that split a logical unit in the middle or merge unrelated topics, dilutes each chunk's embedding across several meanings, which makes retrieval less precise. Good chunking respects the document's logical structure, keeps semantic units intact, and includes enough overlap that a query matching content near a boundary still surfaces the right chunk.
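To make the metric concrete, here is a minimal sketch of precision@k. The function name, the chunk IDs, and the idea of a hand-labeled set of relevant chunks per query are illustrative assumptions, not part of any particular eval framework.

```python
def precision_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the top-k retrieved chunks that are labeled relevant.

    retrieved_ids: chunk IDs in ranked order (best first).
    relevant_ids:  set of chunk IDs a human labeled relevant for the query.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = retrieved_ids[:k]
    hits = sum(1 for chunk_id in top_k if chunk_id in relevant_ids)
    # Divide by k (not by len(top_k)) so short result lists are penalized.
    return hits / k


# Hypothetical query: 10 chunks retrieved, 6 of them labeled relevant.
retrieved = ["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8", "c9", "c10"]
relevant = {"c1", "c2", "c4", "c5", "c7", "c9"}
print(precision_at_k(retrieved, relevant, k=10))
```

Averaging this number over a fixed set of labeled queries before and after a chunking change is one simple way to check whether the change helped.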
Strategy 1: Fixed-Size Chunking (Naive Baseline)
Split the document into N-character chunks, with K-character overlap between adjacent chunks. This is the setting most quickstarts use, including the common LangChain examples (1000 chars, 200 overlap). Works for: prototyping when you don't yet know your document structure, and dense narrative text without strong logical boundaries (long-form articles, novels). Breaks for: anything with structured content, such as code, tables, lists, headers, and FAQs. Fixed-size will happily split a Q&A pair across two chunks, leaving the question in chunk A and the answer in chunk B. Retrieval matches the question and returns chunk A, so the LLM never sees the answer. We see this failure mode on every fixed-size deployment we audit. Don't use fixed-size in production unless the content is genuinely unstructured prose.
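The Q&A failure is easy to reproduce. Below is a minimal sketch of fixed-size chunking in plain Python; the function name and the tiny chunk size are chosen for illustration only, and the FAQ text is made up.

```python
def fixed_size_chunks(text, chunk_size, overlap):
    """Split text into chunk_size-character windows that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    stride = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), stride):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Hypothetical FAQ entry; a small chunk size makes the split visible.
faq = ("Q: How do I reset my password?\n"
       "A: Open Settings, choose Security, then click Reset.\n")
for i, chunk in enumerate(fixed_size_chunks(faq, chunk_size=40, overlap=8)):
    print(i, repr(chunk))
```

The first chunk holds the whole question but only the first few characters of the answer, which is exactly the situation where retrieval returns a chunk the LLM cannot answer from.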
Strategy 2: Semantic Chunking
Split the document into chunks where each chunk is semantically coherent: embed successive sentences and place a boundary wherever the cosine similarity between neighbors drops below a threshold. Tools: LangChain's SemanticChunker (shipped in the langchain_experimental package), custom implementations. Works for: long-form articles, blog posts, and transcripts where you want each chunk to be 'a topic'. Breaks for: structured content (the semantic similarity heuristic doesn't know what a markdown table is), short documents with only one topic, and content where the semantic boundaries are too fine-grained (every paragraph becomes a chunk). Cost: semantic chunking requires embedding every sentence at chunking time, which is ~5-10x more expensive than fixed-size for one-time indexing. Not a problem for static knowledge bases; problematic for high-churn content.
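The boundary-detection idea can be sketched in a few lines. This is not how SemanticChunker is implemented; it is a simplified illustration that swaps in a toy bag-of-words vector (`toy_embed`, a stand-in we made up) so the example runs without an embedding model. In a real pipeline you would replace it with calls to your embedding model and tune the threshold on your own corpus.

```python
import math
import re
from collections import Counter


def toy_embed(sentence):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def semantic_chunks(text, threshold=0.2, embed=toy_embed):
    """Start a new chunk wherever adjacent sentences are less similar than threshold."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []
    vectors = [embed(s) for s in sentences]
    groups = [[sentences[0]]]
    for prev_vec, cur_vec, sentence in zip(vectors, vectors[1:], sentences[1:]):
        if cosine(prev_vec, cur_vec) < threshold:
            groups.append([sentence])    # similarity dropped: topic boundary
        else:
            groups[-1].append(sentence)  # same topic: extend the current chunk
    return [" ".join(group) for group in groups]


text = ("The invoice API returns invoice totals in cents. "
        "Each invoice total includes tax. "
        "Our office dog is named Biscuit. "
        "Biscuit the dog sleeps under the desk.")
for chunk in semantic_chunks(text):
    print(repr(chunk))
```

Note that every sentence is embedded once at indexing time, which is where the extra cost relative to fixed-size chunking comes from.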
Strategy 3: Recursive Character Splitting
Split on a hierarchy of separators: try double-newline first, then single-newline, then sentence-end punctuation, then word boundaries, and fall back to character position only if nothing else works. LangChain's RecursiveCharacterTextSplitter is the standard implementation; note that its default separator list is paragraph break, line break, space, then individual characters, so sentence-end punctuation has to be added to the separators explicitly. Works well for: most prose and lightly structured content (markdown with headers, plain text with paragraphs, basic HTML). Better than fixed-size because it respects natural boundaries (paragraphs, sentences). Breaks for: heavily structured content (code, tables, JSON) where you need format-aware chunking, and multi-language documents where sentence-end detection is unreliable (Chinese, Japanese). This is our default for most B2B knowledge bases because it balances quality and implementation simplicity.
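The separator hierarchy can be illustrated with a small pure-Python sketch. It is a simplified approximation of the technique, not LangChain's implementation: it has no overlap, it drops the separator at chunk boundaries, and the separator list is an assumption you would adapt to your content.

```python
def recursive_split(text, max_chars, separators=("\n\n", "\n", ". ", " ")):
    """Split on the coarsest separator present, recursing to finer ones as needed."""
    if len(text) <= max_chars:
        return [text] if text.strip() else []
    for i, sep in enumerate(separators):
        if sep in text:
            pieces = text.split(sep)
            finer = separators[i + 1:]
            break
    else:
        # No separator applies: fall back to hard character cuts.
        return [text[start:start + max_chars]
                for start in range(0, len(text), max_chars)]
    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= max_chars:
            current = candidate          # keep merging small pieces together
            continue
        if current:
            chunks.append(current)       # flush what has been merged so far
            current = ""
        if len(piece) <= max_chars:
            current = piece
        else:
            # Piece is still too long: retry with the finer separators only.
            chunks.extend(recursive_split(piece, max_chars, finer))
    if current:
        chunks.append(current)
    return chunks


doc = ("Intro paragraph about billing.\n\n"
       "Refunds are processed in 5 days. Contact support for exceptions.\n\n"
       "Short note.")
for chunk in recursive_split(doc, max_chars=40):
    print(repr(chunk))
```

In this example the paragraph break is tried first, and only the one paragraph that exceeds the limit is split again at the sentence level.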
Strategy 4: Hybrid (What We Actually Ship)
Production RAG systems we ship usually combine multiple strategies based on content type: a format-aware splitter for code and tables (treat each function or row as its own chunk), a recursive character splitter for prose, and a header-aware splitter for markdown (keep each section together and prepend the section title to each chunk for context).
On top of that we add parent-child chunking: store small chunks for retrieval but return the larger parent chunk to the LLM, so retrieval is precise and the LLM still has surrounding context. LlamaIndex and LangChain both support this as 'small-to-big' retrieval.
The exact mix depends on your corpus: a documentation site might be 70% header-aware markdown, 20% code-block-aware, and 10% recursive prose. We always audit a client's corpus before deciding the chunking strategy; one size fits no one.
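As a rough illustration of two of these pieces together, the sketch below splits markdown by header, prepends the section title to each small chunk, and keeps a pointer from every small chunk back to its parent section. The function names and the dictionary layout are assumptions made for this example; it does not use the LlamaIndex or LangChain APIs, and it does not handle `#` lines inside fenced code blocks.

```python
import re


def split_markdown_sections(markdown):
    """Return (title, body) pairs, one per markdown header section."""
    sections, title, lines = [], "", []

    def flush():
        if title or any(line.strip() for line in lines):
            sections.append((title, "\n".join(lines).strip()))

    for line in markdown.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)", line)
        if match:
            flush()  # close the previous section before starting a new one
            title, lines = match.group(1).strip(), []
        else:
            lines.append(line)
    flush()
    return sections


def build_parent_child_index(markdown):
    """Small paragraph chunks for retrieval, each linked to its full section."""
    parents, children = {}, []
    for parent_id, (title, body) in enumerate(split_markdown_sections(markdown)):
        parents[parent_id] = f"{title}\n{body}".strip()
        for paragraph in re.split(r"\n\s*\n", body):
            paragraph = paragraph.strip()
            if paragraph:
                # Prepend the section title so the small chunk carries context.
                text = f"{title}: {paragraph}" if title else paragraph
                children.append({"parent_id": parent_id, "text": text})
    return parents, children


doc = ("# Billing\nInvoices are issued monthly.\n\nRefunds take 5 days.\n\n"
       "# Security\nRotate API keys every 90 days.\n")
parents, children = build_parent_child_index(doc)
hit = children[1]                  # pretend retrieval matched this small chunk
print(hit["text"])
print(parents[hit["parent_id"]])   # what would be passed to the LLM instead
```

In a real system the `children` texts are what you embed and search, while `parents` lives in a plain document store keyed by ID.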
Picking Chunk Size + Overlap
Chunk size: 200-400 tokens is the sweet spot for most B2B knowledge bases. Smaller chunks (100-200 tokens) give better retrieval precision but worse LLM context (the LLM sees fragments). Larger chunks (500-1000 tokens) give better LLM context but worse retrieval precision (a chunk covers multiple topics, so its embedding is less semantically pure). 200-400 balances both.
Overlap: 10-20% of chunk size, typically 30-80 tokens. Higher overlap reduces 'boundary loss' (when the query matches content that is split exactly at a chunk boundary) but increases storage and indexing cost. We default to 15% overlap.
For the broader RAG architectural picture (vector database choice, eval methodology, embedding model selection), see our RAG Implementation Playbook for 2026. Chunking is one of seven decisions covered there.
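The storage side of the overlap tradeoff is simple arithmetic. The sketch below is a back-of-the-envelope estimate under the assumption of fixed-size windows measured in tokens; the function name and the 10,000-token document are hypothetical.

```python
import math


def chunking_budget(doc_tokens, chunk_tokens, overlap_ratio):
    """Estimate overlap size, chunk count, and total stored tokens for one document."""
    overlap = round(chunk_tokens * overlap_ratio)
    stride = chunk_tokens - overlap  # new tokens contributed by each extra chunk
    if stride <= 0:
        raise ValueError("overlap_ratio must leave a positive stride")
    if doc_tokens <= chunk_tokens:
        n_chunks = 1
    else:
        n_chunks = math.ceil((doc_tokens - overlap) / stride)
    # Every chunk after the first re-stores `overlap` tokens from its neighbor.
    stored_tokens = doc_tokens + (n_chunks - 1) * overlap
    return overlap, n_chunks, stored_tokens


overlap, n_chunks, stored = chunking_budget(10_000, chunk_tokens=300, overlap_ratio=0.15)
print(overlap, n_chunks, stored)
print(f"storage overhead: {stored / 10_000 - 1:.1%}")
```

For a 10,000-token document at 300-token chunks and 15% overlap, this gives 45 tokens of overlap, 40 chunks, and 11,755 stored tokens, or about 17.6% more than the raw document.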

10+ years building AI systems for Toyota, Sony, and Rakuten in Japan. Founded NKKTech in 2018 with a senior-only engineering model.
Want to build this with NKKTech?
Auditing a RAG system with mediocre precision? Book a free 30-minute review with an NKKTech RAG engineer. We'll look at your chunking and retrieval pipeline and suggest the highest-ROI changes.