RAG Chunking Strategies: Fixed, Semantic, Recursive, Hybrid (2026)

Tony Nguyen

CEO & Founder, NKKTech Global

Chunking is the most-skipped optimization in production RAG. Teams copy the settings from LangChain's quickstart (1000 chars, 200 overlap), get acceptable demo results, and never revisit the choice. Then production retrieval precision flatlines at 0.6-0.7 when 0.85+ is achievable with proper chunking. We've tuned chunking on 15+ production RAG systems at NKKTech. Below are the four strategies that matter, their tradeoffs, and the chunk size and overlap parameters that actually move retrieval precision.

Why Chunking Determines Retrieval Quality

Retrieval precision (the fraction of top-k chunks that are actually relevant) caps everything downstream in RAG. With k = 10 and a precision of 0.6, the LLM has 4 irrelevant chunks to ignore. It usually does, but sometimes it incorporates them and you get a wrong answer. At a precision of 0.85, the LLM has only 1-2 irrelevant chunks, almost always ignores them, and the answer is reliably grounded. After embedding model choice, the single biggest lever on precision is chunking. Bad chunking (chunks that split a logical unit in the middle, or merge unrelated topics) makes each embedding less semantically pure, which makes retrieval less precise. Good chunking respects the document's logical structure, keeps semantic units intact, and includes enough overlap that a query matching content near a chunk boundary still surfaces the right chunk.
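To make the precision arithmetic concrete, here is a minimal Python sketch of precision@k as defined above. It assumes you have a labeled evaluation set that marks which chunk IDs are relevant to each query; the function name and the chunk IDs are illustrative, not taken from any particular eval library.

```python
def precision_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of the top-k retrieved chunk IDs that are labeled relevant."""
    top_k = retrieved_ids[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for chunk_id in top_k if chunk_id in relevant_ids)
    # Divide by the number of chunks actually returned (at most k).
    return hits / len(top_k)


# Illustrative example: 10 retrieved chunks, 6 of which are labeled relevant.
retrieved = [f"chunk-{i}" for i in range(10)]
relevant = {f"chunk-{i}" for i in range(6)}
p = precision_at_k(retrieved, relevant, k=10)
irrelevant = sum(1 for c in retrieved[:10] if c not in relevant)  # chunks the LLM must ignore
print(f"precision@10 = {p:.2f}, irrelevant chunks = {irrelevant}")
```

Tracking this number per query on a fixed eval set is what lets you compare chunking strategies against each other rather than by eye.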

Strategy 1: Fixed-Size Chunking (Naive Baseline)

Split the document into N-character chunks, with K characters of overlap between adjacent chunks. This is the setup used in LangChain's quickstart (1000 chars, 200 overlap) and most other tutorials. Works for: prototyping when you don't yet know your document structure, and dense narrative text without strong logical boundaries (long-form articles, novels). Breaks for: anything with structured content, such as code, tables, lists, headers, and FAQs. Fixed-size chunking will happily split a Q&A pair across two chunks, leaving the question in chunk A and the answer in chunk B. Retrieval matches the question and returns chunk A, but the answer is in chunk B, so the LLM has nothing to answer from. We see this failure mode on every fixed-size deployment we audit. Don't use fixed-size chunking in production unless the content is genuinely unstructured prose.
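A minimal sketch of fixed-size chunking in plain Python, measured in characters as in the 1000/200 setup. The function name and the one-line FAQ string are illustrative assumptions, and this is not LangChain's implementation. The example reproduces the Q&A failure mode at toy scale.

```python
def fixed_size_chunks(text: str, size: int, overlap: int) -> list[str]:
    """Split text into `size`-character windows; neighbors share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    stride = size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), stride):
        chunks.append(text[start:start + size])
        if start + size >= len(text):  # this window already reaches the end
            break
    return chunks


# A single FAQ entry: the split lands between the question and its answer.
faq = "Q: How do I reset my password? A: Open Settings and choose Reset."
for i, chunk in enumerate(fixed_size_chunks(faq, size=32, overlap=4)):
    print(i, repr(chunk))
```

With 32-character chunks, chunk 0 ends right after the question, and "Open Settings" only appears in chunk 1. The 4-character overlap is far too small to carry the answer back into the question's chunk.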

Strategy 2: Semantic Chunking

Split the document into chunks where each chunk is semantically coherent. Boundaries are found by embedding successive sentences and starting a new chunk wherever the cosine similarity between neighbors drops below a threshold. Tools: LangChain's SemanticChunker, or custom implementations. Works for: long-form articles, blog posts, and transcripts where you want each chunk to be 'a topic'. Breaks for: structured content (the similarity heuristic doesn't know what a markdown table is), short documents with only one topic, and content where the semantic boundaries are too fine-grained (every paragraph becomes a chunk). Cost: semantic chunking embeds every sentence at chunking time, which makes indexing roughly 5-10x more expensive than fixed-size chunking. That's not a problem for static knowledge bases, but it is for high-churn content.
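The boundary-detection idea can be sketched in a few lines of Python. To keep it self-contained, `toy_embed` is a hypothetical stand-in that uses bag-of-words counts instead of a real embedding model, and the 0.3 threshold is an arbitrary illustrative value. In practice you would call your embedding model and tune the cutoff on your own corpus; library implementations such as SemanticChunker also offer statistical breakpoint rules (for example percentile-based) rather than only a fixed threshold.

```python
import math
import re
from collections import Counter
from typing import Callable


def toy_embed(sentence: str) -> Counter:
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_chunks(sentences: list[str], threshold: float,
                    embed: Callable[[str], Counter] = toy_embed) -> list[str]:
    """Start a new chunk wherever adjacent-sentence similarity drops below threshold."""
    if not sentences:
        return []
    vectors = [embed(s) for s in sentences]  # one embedding call per sentence
    groups = [[sentences[0]]]
    for prev_vec, cur_vec, sentence in zip(vectors, vectors[1:], sentences[1:]):
        if cosine(prev_vec, cur_vec) < threshold:
            groups.append([sentence])  # similarity dropped: treat as a topic boundary
        else:
            groups[-1].append(sentence)
    return [" ".join(g) for g in groups]


sentences = [
    "The refund policy covers returns.",
    "A refund policy exception covers damaged returns.",
    "Our office is closed on public holidays.",
]
for chunk in semantic_chunks(sentences, threshold=0.3):
    print(chunk)
```

The two refund sentences share most of their words and stay together, while the office-hours sentence shares none and starts a new chunk. The one-embedding-per-sentence loop is also where the extra indexing cost comes from.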

Strategy 3: Recursive Character Splitting

Split on a hierarchy of separators: try double newlines first, then single newlines, then sentence-end punctuation, then word boundaries, and only fall back to raw character positions if nothing else works. LangChain's RecursiveCharacterTextSplitter is the standard implementation, though its default separators are only paragraph, line, and word breaks (["\n\n", "\n", " ", ""]), so sentence-level splitting means passing your own separator list. Works well for: most prose and lightly structured content (markdown with headers, plain text with paragraphs, basic HTML). It beats fixed-size chunking because it respects natural boundaries such as paragraphs and sentences. Breaks for: heavily structured content (code, tables, JSON) that needs format-aware chunking, and multi-language documents where sentence-end detection is unreliable (Chinese and Japanese, for example, end sentences with 。 rather than ". "). This is our default for most B2B knowledge bases: it strikes a good balance between quality and implementation simplicity.
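Here is a simplified sketch of recursive splitting in plain Python, using the paragraph, line, sentence, word, character hierarchy described above. It is not LangChain's implementation: it omits overlap, measures length in characters, and drops the separator wherever a chunk boundary falls. The `recursive_split` name and the separator tuple are illustrative assumptions.

```python
DEFAULT_SEPARATORS = ("\n\n", "\n", ". ", " ", "")  # paragraph, line, sentence, word, char


def recursive_split(text: str, chunk_size: int,
                    separators: tuple[str, ...] = DEFAULT_SEPARATORS) -> list[str]:
    """Split on the coarsest separator that works, recursing into oversized pieces."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut at character positions.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    if sep not in text:
        return recursive_split(text, chunk_size, finer)  # try the next separator
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate  # merge small pieces up to chunk_size
            continue
        if current:
            chunks.append(current)  # the separator at this boundary is dropped
        current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Still too big on its own: split it with the finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


doc = "Para one is here.\n\nPara two is a bit longer.\n\nThree."
print(recursive_split(doc, chunk_size=30))
```

With chunk_size=30, each paragraph stays whole as its own chunk because no two fit together. A single unbroken 25-character token with chunk_size=10 falls all the way through to hard cuts, which is the "only if nothing else works" case.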

Strategy 4: Hybrid (What We Actually Ship)

Production RAG systems we ship usually combine multiple strategies based on content type: a format-aware splitter for code and tables (each function or row becomes its own chunk), the recursive character splitter for prose, and a header-aware splitter for markdown (keep each section together and prepend the section title to each chunk for context). On top of that we use parent-child chunking: store small chunks for retrieval but return the larger parent chunk to the LLM, so retrieval stays precise while the LLM still gets surrounding context. LlamaIndex and LangChain both support this pattern, usually called 'small-to-big' retrieval. The exact mix depends on your corpus; a documentation site might be 70% header-aware markdown, 20% code-block-aware, and 10% recursive prose. We always audit a client's corpus before deciding on a chunking strategy. One size fits no one.
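A minimal sketch of two of these ideas working together: a header-aware markdown splitter that treats each section as a parent and prepends the section title to each paragraph-sized child, plus small-to-big lookup that searches the children but returns the parent. This is hypothetical illustration code, not the LlamaIndex or LangChain API. Word overlap stands in for vector similarity, and the header regex ignores edge cases such as `#` lines inside fenced code.

```python
import re
from dataclasses import dataclass
from typing import Optional

HEADER = re.compile(r"#{1,6}\s+(.*)")  # markdown ATX headers: '# Title' to '###### Title'


@dataclass
class Child:
    text: str       # small chunk that is embedded and searched
    parent_id: int  # index of the section (parent) it came from


def split_sections(md: str) -> list[tuple[str, str]]:
    """Return (title, body) for each header-delimited section."""
    sections, title, body = [], "", []
    for line in md.splitlines():
        match = HEADER.match(line)
        if match:
            if title or "".join(body).strip():
                sections.append((title, "\n".join(body).strip()))
            title, body = match.group(1).strip(), []
        else:
            body.append(line)
    if title or "".join(body).strip():
        sections.append((title, "\n".join(body).strip()))
    return sections


def build_index(md: str) -> tuple[list[str], list[Child]]:
    parents, children = [], []
    for title, body in split_sections(md):
        parent_id = len(parents)
        parents.append(f"{title}\n{body}".strip())
        for para in body.split("\n\n"):
            if para.strip():
                # Prepend the section title so each child carries its context.
                text = f"{title}: {para.strip()}" if title else para.strip()
                children.append(Child(text, parent_id))
    return parents, children


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve_parent(query: str, parents: list[str], children: list[Child]) -> Optional[str]:
    """Score the small children, but hand the LLM the whole parent section."""
    if not children:
        return None
    best = max(children, key=lambda c: len(words(query) & words(c.text)))  # toy scorer
    return parents[best.parent_id]


doc = "# Billing\nInvoices are sent monthly.\n\nRefunds take five days.\n\n# Security\nWe rotate keys quarterly."
parents, children = build_index(doc)
print(retrieve_parent("how long do refunds take", parents, children))
```

The query matches only the small "Refunds take five days." child, yet the LLM receives the whole Billing section, including the invoicing sentence around it.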

Picking Chunk Size + Overlap

Chunk size: 200-400 tokens is the sweet spot for most B2B knowledge bases. Smaller chunks (100-200 tokens) give better retrieval precision but worse LLM context, because the LLM sees fragments. Larger chunks (500-1000 tokens) give better LLM context but worse retrieval precision, because each chunk covers multiple topics and its embedding is less semantically pure. 200-400 balances both. Overlap: 10-20% of chunk size, typically 30-80 tokens. Higher overlap reduces 'boundary loss' (when the query matches content that is split exactly at a chunk boundary) but increases storage cost. We default to 15% overlap. For the broader RAG architectural picture (vector database choice, eval methodology, embedding model selection), see our RAG Implementation Playbook for 2026. Chunking is one of seven decisions covered there.
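The storage side of the overlap tradeoff is simple arithmetic. The sketch below is an illustrative helper, not a library function. It converts an overlap percentage into tokens and estimates how many chunks a document produces and how much text you end up storing, assuming a sliding window whose last chunk is truncated at the end of the document.

```python
import math
from typing import NamedTuple


class ChunkPlan(NamedTuple):
    overlap_tokens: int
    stride_tokens: int     # how far each new chunk advances
    n_chunks: int
    storage_factor: float  # stored tokens / original tokens


def plan_chunks(chunk_tokens: int, overlap_frac: float, doc_tokens: int) -> ChunkPlan:
    """Sliding-window estimate; the last chunk is truncated at the end of the document."""
    if chunk_tokens <= 0 or doc_tokens <= 0:
        raise ValueError("chunk_tokens and doc_tokens must be positive")
    overlap = round(chunk_tokens * overlap_frac)
    stride = chunk_tokens - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than the chunk")
    n_chunks = max(1, math.ceil((doc_tokens - overlap) / stride))
    # Each of the n_chunks - 1 boundaries stores `overlap` tokens twice.
    stored = doc_tokens + (n_chunks - 1) * overlap
    return ChunkPlan(overlap, stride, n_chunks, stored / doc_tokens)


for size in (200, 300, 400):
    print(size, plan_chunks(size, 0.15, doc_tokens=10_000))
```

For long documents the storage factor approaches chunk_tokens / stride_tokens, which is about 1.18x at 15% overlap and 1.25x at 20%.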

Tony Nguyen

CEO & Founder, NKKTech Global

10+ years building AI systems for Toyota, Sony, and Rakuten in Japan. Founded NKKTech in 2018 with a senior-only engineering model.
