Chunking strategies for RAG determine where your source documents are split, how those units are embedded, and what your retrieval system actually surfaces at query time. The choice matters — but most teams discover that chunk size is not the binding constraint. Production RAG failures usually trace back to stale, ungoverned, or semantically thin source data rather than suboptimal boundary placement.
This guide profiles all 9 major chunking strategies, from the simplest fixed-size baseline to metadata-enriched retrieval units with ownership, freshness, lineage, and policy context attached. Each strategy includes an honest account of when to use it, what it costs, and where it breaks down.
What you’ll find here:
- An At A Glance comparison table for all 9 strategies
- Per-strategy profiles: how it works, when to use it, pros, cons, and enterprise considerations
- A decision framework for choosing the right strategy for your use case
- The upstream factor that bounds every strategy: governed context
Chunking strategies at a glance

| Strategy | Best for | Complexity | Enterprise fit |
|---|---|---|---|
| Fixed-size | Prototypes, speed-first indexing | Low | Limited — mid-sentence splits degrade precision |
| Recursive character splitting | General-purpose baseline | Low | Good starting point for most content |
| Document-aware | Structured docs (PDFs, code, markdown) | Medium | High — respects natural document semantics |
| Semantic chunking | Mixed-topic or narrative documents | Medium-High | Good where indexing latency is acceptable |
| Hierarchical parent-child | Production RAG needing precision + context | Medium | High — the de facto production pattern |
| Agentic chunking | High-value corpora where quality trumps cost | High | Selective — use for critical, high-signal documents |
| Late chunking | Long documents with cross-section dependencies | Medium-High | Good for research papers, contracts, manuals |
| Contextual retrieval | Any pipeline needing lower retrieval failure rate | Medium | High — low risk, meaningful accuracy gain |
| Metadata-enriched | Enterprise RAG with governance requirements | High | Highest — the only approach that survives at scale |
1. Fixed-size chunking
How it works
Fixed-size chunking splits source text at a fixed character or token count (typically 256 to 1024 tokens), with optional overlap (commonly 10–20% of chunk size) to reduce mid-concept cuts. It requires no understanding of document structure or sentence boundaries.
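The windowing described above can be sketched in a few lines. This is a minimal illustration, not a framework API: the function name `fixed_size_chunks` is hypothetical, and whitespace-separated words stand in for real tokenizer output.

```python
def fixed_size_chunks(tokens, chunk_size=512, overlap=64):
    """Split a token list into windows of chunk_size that overlap by `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


# Assumption: whitespace "tokens" stand in for a real tokenizer.
words = "the quick brown fox jumps over the lazy dog again".split()
chunks = fixed_size_chunks(words, chunk_size=4, overlap=1)
for chunk in chunks:
    print(" ".join(chunk))
```

Each window starts `chunk_size - overlap` tokens after the previous one, so the last token of one chunk reappears as the first token of the next when the overlap is 1.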
When to use it
Use fixed-size chunking for rapid prototyping, to establish a retrieval baseline, or when indexing speed matters more than retrieval precision. It works reasonably well on homogeneous, well-structured text where sentences are short and topics are consistent.
Pros:
- Trivial to implement — available in every RAG framework out of the box
- Deterministic and fast — no embedding compute required at chunking time
- Predictable index size and cost
Cons:
- Splits sentences mid-thought, separating subject from predicate, or context from conclusion
- No awareness of document structure — a table header and its rows may end up in different chunks
- Retrieval precision degrades on heterogeneous documents with varying topic density
Enterprise considerations: Fixed-size chunking is a starting point, not a production strategy. Teams at BBVA describe spending 3–4 months per use case iterating on chunking — the lesson is not that fixed-size is wrong to start with, but that it will require iteration. Set up an evaluation framework before committing to fixed-size in production.
2. Recursive character text splitting
How it works
Recursive character text splitting works through a priority hierarchy of separators. It tries to split on paragraph breaks first, then sentence breaks, then word boundaries, then individual characters, stopping once chunks fall within the target size. In other words, it falls back to a finer separator only when splitting on the coarser one still leaves chunks that are too large.
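The separator hierarchy can be illustrated with a small standalone function. This is a simplified sketch rather than any framework's implementation: `recursive_split` and the `SEPARATORS` list are hypothetical names, sizes are measured in characters, and separators are dropped where a chunk boundary falls.

```python
SEPARATORS = ["\n\n", ". ", " ", ""]  # paragraph, sentence, word, character


def recursive_split(text, max_chars, separators=SEPARATORS):
    """Split on the coarsest separator that works, recursing to finer ones as needed."""
    if len(text) <= max_chars:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut at the character level.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    sep, finer = separators[0], separators[1:]
    pieces = text.split(sep)
    if len(pieces) == 1:
        return recursive_split(text, max_chars, finer)
    chunks, current = [], ""
    for piece in pieces:
        if not piece:
            continue
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= max_chars:
            current = candidate  # keep merging small pieces
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= max_chars:
            current = piece
        else:
            # This piece is still too large: retry with finer separators.
            chunks.extend(recursive_split(piece, max_chars, finer))
    if current:
        chunks.append(current)
    return chunks


text = "Alpha beta gamma. Delta epsilon.\n\nZeta eta theta iota kappa lambda."
chunks = recursive_split(text, max_chars=20)
print(chunks)
```

In this example the first paragraph splits cleanly at its sentence break, while the second paragraph has no usable sentence break and is split at word boundaries instead.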
When to use it
Recursive splitting is the usual starting point in frameworks such as LangChain and LlamaIndex, which ship recursive or sentence-aware splitters, because it handles general-purpose text better than fixed-size while adding minimal implementation overhead. Use it as the default for prose documents when you haven’t yet profiled your retrieval failures.
Pros:
- Better semantic coherence than fixed-size — prefers natural sentence boundaries
- Zero extra compute at chunking time
- Available as a drop-in replacement for fixed-size in most frameworks
Cons:
- Still unaware of document-level structure (headings, tables, code blocks)
- The separator hierarchy is configurable but not self-tuning — you need to set it per content type
- Long paragraphs may still produce oversized chunks that get force-split at the word level
Enterprise considerations: Recursive splitting is a reasonable default for unstructured prose. For structured content — product documentation, data dictionaries, compliance manuals — document-aware chunking will outperform it without significantly higher complexity.
3. Document-aware / structure-aware chunking
How it works
Document-aware chunking parses the document’s structural elements before splitting: headers, sections, paragraphs, tables, code blocks, and list items. It uses these structural signals as natural chunk boundaries. A table is kept as a unit; a code block is never split mid-function; an H2 section boundary becomes a reliable split point.
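For Markdown, the idea can be sketched with the standard library alone. This is an illustrative simplification, not a full parser: `split_markdown_by_heading` is a hypothetical helper that splits on heading lines and treats fenced code blocks as indivisible, so a `#` comment inside code is not mistaken for a heading.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")
FENCE = "`" * 3  # built indirectly so this example can itself sit inside a fenced block


def split_markdown_by_heading(markdown):
    """Return one section per heading, never splitting inside a fenced code block."""
    sections, title, lines, in_fence = [], None, [], False

    def flush():
        if title is not None or any(line.strip() for line in lines):
            sections.append({"heading": title, "text": "\n".join(lines).strip()})

    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()
            title, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    flush()
    return sections


doc = "\n".join([
    "# Install",
    "Run the installer.",
    FENCE + "bash",
    "# this comment is not a heading",
    "make install",
    FENCE,
    "## Verify",
    "Check the version.",
])
sections = split_markdown_by_heading(doc)
for section in sections:
    print(section["heading"], "->", len(section["text"]), "chars")
```

Real pipelines typically rely on a dedicated parser for PDFs, HTML, or notebooks; the point here is only that boundaries come from structure rather than from a character count.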
When to use it
Use document-aware chunking for structured content types: Markdown documentation, PDFs with detectable structure, HTML pages, Jupyter notebooks, technical manuals, and data dictionaries. It has the highest retrieval effectiveness-to-cost ratio for these formats.
Pros:
- Preserves semantic units that humans naturally recognize as coherent (a table, a section, a code example)
- Retrieval precision is significantly better than fixed-size or recursive splitting on structured documents
- No LLM compute required — parsing is done at the document structure level
Cons:
- Requires document parsers (e.g., Unstructured.io, pdfminer, pandoc) — adds a preprocessing dependency
- Quality depends heavily on source document quality; poorly formatted PDFs produce poor structure detection
- Doesn’t help with unstructured narrative text where structure doesn’t reflect topic shifts
Enterprise considerations: For enterprise data catalogs and knowledge bases, document-aware chunking is the right default. Your data assets are already structured — data dictionaries, lineage graphs, governance policies. Respecting that structure during chunking means your retrieval system surfaces complete, coherent units rather than fragments.
4. Semantic chunking
How it works
Semantic chunking uses embedding similarity to find topic boundaries. It embeds groups of sentences, computes the cosine similarity between adjacent groups, and draws a chunk boundary wherever the similarity drops below a threshold. The result: chunks that end and begin at genuine topic shifts rather than at arbitrary character counts.
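The boundary rule can be shown end to end with a toy embedding. This is a sketch under loud assumptions: `embed` is a bag-of-words counter standing in for a real embedding model, each "group" is a single sentence, and the 0.2 threshold is arbitrary.

```python
import math
import re
from collections import Counter


def embed(sentence):
    """Toy bag-of-words 'embedding'; a real pipeline would call an embedding model."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_chunks(sentences, threshold=0.2):
    """Start a new chunk wherever adjacent-sentence similarity drops below threshold."""
    if not sentences:
        return []
    chunks, current = [], [sentences[0]]
    for prev, nxt in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(nxt)) < threshold:
            chunks.append(current)
            current = []
        current.append(nxt)
    chunks.append(current)
    return chunks


sentences = [
    "The invoice total includes tax.",
    "Tax is added to the invoice total at checkout.",
    "Our office dog enjoys long walks.",
    "The dog walks every morning.",
]
chunks = semantic_chunks(sentences)
for chunk in chunks:
    print(chunk)
```

With this rule, raising the similarity threshold creates more boundaries and therefore smaller chunks; lowering it merges more text into each chunk.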
When to use it
Semantic chunking performs best on long narrative documents where topic shifts occur at irregular intervals: research papers, lengthy blog posts, customer call transcripts, and meeting notes. It outperforms character-based methods when document topics vary significantly within a single file.
Pros:
- Chunk boundaries align with semantic meaning, not arbitrary character limits
- Fewer instances of a single concept being split across multiple chunks
- Retrieval precision is higher than fixed-size or recursive methods on heterogeneous documents
Cons:
- Requires embedding every sentence (or sentence window) during indexing — 3–10x slower than fixed-size
- Threshold sensitivity: too high a similarity threshold produces fragmented micro-chunks; too low produces huge chunks
- More expensive to re-index when source documents change
Enterprise considerations: Semantic chunking is valuable but compute-intensive. For large document corpora — millions of pages — the indexing cost may be prohibitive unless you apply it selectively to high-value documents. Hybrid RAG pipelines often combine document-aware chunking for structured content with semantic chunking for narrative content.
Chunking strategies by relative implementation cost (x-axis) and semantic quality of chunk boundaries (y-axis). Metadata-enriched chunking is orthogonal — it adds governance context on top of any strategy.
5. Hierarchical / parent-child chunking
How it works
Hierarchical chunking maintains two levels of chunks for the same document. Small “child” chunks (typically 128–256 tokens) are embedded and indexed for retrieval; they’re precise enough to match specific queries. When a child chunk is retrieved, the retrieval system returns its larger “parent” chunk (512–1024 tokens) to the LLM instead. The parent carries the surrounding context that makes the child’s meaning fully interpretable.
The sentence window variant works similarly: retrieve on small sentence-level chunks but return a window of surrounding sentences (e.g., 3 sentences before and after) during generation.
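The two-level lookup can be sketched as follows. The names `build_parent_child_index` and `retrieve_parent` are hypothetical, the sizes are tiny for readability, and word overlap stands in for vector similarity search over the child chunks.

```python
def build_parent_child_index(tokens, parent_size=8, child_size=4):
    """Cut tokens into parents, then cut each parent into children that point back to it."""
    parents, children = [], []
    for p_start in range(0, len(tokens), parent_size):
        parent = tokens[p_start:p_start + parent_size]
        parent_id = len(parents)
        parents.append(" ".join(parent))
        for c_start in range(0, len(parent), child_size):
            child = parent[c_start:c_start + child_size]
            children.append({"text": " ".join(child), "parent_id": parent_id})
    return parents, children


def retrieve_parent(query, parents, children):
    """Match on the small child chunks, but return the larger parent chunk."""
    query_words = set(query.lower().split())
    # Assumption: word overlap is a stand-in for embedding similarity.
    best = max(children, key=lambda c: len(query_words & set(c["text"].lower().split())))
    return parents[best["parent_id"]]


text = ("Refunds are issued within five business days always. "
        "Shipping is free for orders above fifty dollars.")
parents, children = build_parent_child_index(text.split())
context = retrieve_parent("five business days", parents, children)
print(context)
```

The query matches only the second child chunk, yet the text handed to the LLM is the whole first parent, including the word "Refunds" that the child alone would have lost.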
When to use it
Hierarchical chunking is the most widely adopted production pattern in 2025–2026 because it resolves the fundamental precision-context trade-off: small chunks for finding, larger chunks for understanding. Use it for any RAG pipeline where both retrieval precision and generation quality matter.
Pros:
- High retrieval precision from small child chunks
- Rich generation context from parent chunks — avoids the “isolated fragment” hallucination pattern
- The sentence-window variant requires no hierarchical index — simpler to implement
- Consistently ranks at the top of retrieval evaluation benchmarks (ARAGOG, 2024)
Cons:
- Requires maintaining two index layers (child chunks for retrieval, parent chunks or document store for generation)
- More complex to update when source documents change — both levels must be refreshed
- Parent chunk size must be tuned to the content type; wrong sizing defeats the purpose
Enterprise considerations: One of the most common failure modes SE teams observe is that “it always gave the response from the first chunk and ignored the second and third chunks — that’s where hallucinations happen.” Hierarchical chunking directly addresses this by surfacing richer context at generation time. For production deployments, this is the pattern to default to.
6. Agentic chunking
How it works
Agentic chunking uses an LLM to read each document and propose chunk boundaries based on semantic logic. The LLM identifies proposition-level units of meaning (coherent claims, procedures, or concepts that stand alone) and draws boundaries accordingly. Unlike any heuristic approach, it applies genuine semantic understanding to the task.
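The surrounding plumbing can be sketched without committing to a particular model. Everything here is an assumption for illustration: `propose_boundaries` is a hypothetical callable that wraps an LLM call and returns the indices of sentences that begin a new chunk, and `fake_llm` is a stub so the example runs offline. Because model output is not guaranteed to be well-formed, the sketch validates the indices before using them.

```python
def agentic_chunks(sentences, propose_boundaries):
    """Group sentences using chunk-start indices proposed by an LLM-backed callable."""
    proposed = propose_boundaries(sentences)
    # Drop duplicates and out-of-range indices before trusting the model's answer.
    starts = sorted({i for i in proposed if 0 < i < len(sentences)})
    chunks, begin = [], 0
    for start in starts:
        chunks.append(sentences[begin:start])
        begin = start
    if sentences[begin:]:
        chunks.append(sentences[begin:])
    return chunks


def fake_llm(sentences):
    """Stub standing in for a real model call; it always says a new unit starts at index 2."""
    return [2]


sentences = [
    "To reset a password, open Settings.",
    "Then choose Security and select Reset.",
    "Invoices are emailed on the first of each month.",
    "Contact billing to change the invoice address.",
]
chunks = agentic_chunks(sentences, fake_llm)
for chunk in chunks:
    print(chunk)
```

In a real pipeline the stub would be replaced by a prompt that shows the model the numbered sentences and asks for boundary indices; the cost profile described in this section comes from that call.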
When to use it
Agentic chunking is appropriate for high-value, high-stakes corpora where retrieval quality is critical and indexing cost is manageable: legal contracts, regulatory filings, clinical guidelines, product specification documents. It is not practical for bulk indexing of millions of pages.
Pros:
- Highest chunk quality of any method — boundaries reflect actual semantic logic
- Particularly effective on documents where topic density is uneven or where critical information appears in isolated sentences
- Produces fewer hallucinations downstream because chunks are semantically complete units
Cons:
- 10–50x more expensive than fixed-size chunking at indexing time — every document requires LLM inference
- Slow: indexing a large document corpus takes hours, not minutes
- Model-dependent: the quality of chunks depends on the quality of the LLM used for boundary detection
Enterprise considerations: Agentic chunking is a selective tool. Apply it to a curated, certified subset of your knowledge base — the documents whose accuracy most directly impacts business decisions. Combine it with a metadata lakehouse approach that tracks which documents have been agentically chunked, when, and by which model version.
7. Late chunking
How it works
Late chunking reverses the standard pipeline order. Standard chunking splits first, then embeds each chunk independently. Late chunking first runs the entire document (or a long passage) through a long-context embedding model, then pools the resulting token embeddings chunk by chunk. Because every token embedding was computed with full document context, each chunk embedding carries signals from the rest of the document that independent short-chunk embeddings cannot capture.
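The pooling step can be sketched in isolation. This is a minimal illustration assuming mean pooling over chunk spans: `late_chunk_embeddings` is a hypothetical helper, and the two-dimensional `token_embeddings` are made-up numbers standing in for the per-token output of a single long-context forward pass.

```python
def late_chunk_embeddings(token_embeddings, spans):
    """Mean-pool contextual token embeddings over each (start, end) chunk span."""
    pooled = []
    for start, end in spans:
        window = token_embeddings[start:end]
        if not window:
            raise ValueError(f"empty span: ({start}, {end})")
        dim = len(window[0])
        pooled.append([sum(vec[d] for vec in window) / len(window) for d in range(dim)])
    return pooled


# Assumption: one vector per token, all produced by one pass over the whole document.
token_embeddings = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
# Chunk boundaries are chosen after embedding, as token index ranges.
chunk_vectors = late_chunk_embeddings(token_embeddings, [(0, 2), (2, 4)])
print(chunk_vectors)
```

The chunking logic itself can be any of the boundary strategies in this guide; what changes is that boundaries are applied to token embeddings that have already seen the full document.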
When to use it
Late chunking is most valuable for long documents with strong cross-section dependencies: research papers, legal contracts, technical manuals, and narrative documents where early sections define terminology used in later sections. It’s less beneficial for documents where sections are genuinely independent.
Pros:
- Each chunk’s embedding captures long-range semantic dependencies from the full document
- Better performance on questions that require synthesizing information from non-adjacent sections
- No additional LLM inference required at chunking time — only embedding model inference
Cons:
- Requires a long-context embedding model (e.g., jina-embeddings-v3, nomic-embed-text-v1.5) that can process full documents
- Long-context embedding models are slower and more memory-intensive than standard embedding models
- The benefit diminishes for short documents or documents with independent sections
Enterprise considerations: Late chunking is an emerging technique that pairs well with advanced RAG architectures in organizations with complex, long-form knowledge assets. Evaluate it alongside hierarchical chunking — in practice, the two address different failure modes.
8. Contextual retrieval
How it works
Contextual retrieval, developed by Anthropic in 2024, prepends a short context summary to each chunk before embedding. An LLM reads the full document and the target chunk and generates a 1–2 sentence summary explaining where the chunk fits within the document (e.g., “This chunk is from the ‘Risk Factors’ section of Q3 2024 earnings, discussing supply chain exposure in Southeast Asia”). That summary is prepended to the chunk text before embedding and BM25 indexing.
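The result: each chunk’s embedding carries document-level positional context that the raw chunk text lacks. Anthropic reported that this reduces retrieval failure rates by up to 67% when contextual embeddings are combined with contextual BM25 and reranking (35% with contextual embeddings alone, 49% with contextual BM25 added).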
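The prepending step is simple enough to sketch directly. The names are hypothetical: `generate_context` stands for an LLM call that receives the full document and one chunk and returns a short situating summary, and `fake_generate_context` is a stub that only echoes the document title so the example runs offline.

```python
def contextualize_chunks(document, chunks, generate_context):
    """Prepend a short situating summary to each chunk before embedding and BM25 indexing."""
    contextualized = []
    for chunk in chunks:
        context = generate_context(document, chunk).strip()
        contextualized.append(f"{context}\n\n{chunk}" if context else chunk)
    return contextualized


def fake_generate_context(document, chunk):
    """Stub standing in for a real model call; it uses only the first line of the document."""
    title = document.splitlines()[0]
    return f"From '{title}'."


document = "Q3 2024 Earnings\nRisk Factors\nSupply chain exposure rose."
chunks = ["Supply chain exposure rose."]
indexed_texts = contextualize_chunks(document, chunks, fake_generate_context)
print(indexed_texts[0])
```

The contextualized string is what gets embedded and indexed; many pipelines still keep the original chunk text alongside it so the generator can be shown either version.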
When to use it
Contextual retrieval is one of the highest ROI chunking improvements available. It works on top of any base chunking strategy: you can apply it to fixed-size, recursive, or document-aware chunks. Use it whenever your RAG system is retrieving the wrong chunks despite having the right documents in the index.
Pros:
- Reduces retrieval failures by up to 67% (Anthropic, 2024)
- Works as a layer on top of any base chunking strategy — no need to redesign the pipeline
- Particularly effective when combined with hybrid BM25 + vector retrieval
- Anthropic offers prompt caching to reduce context prepending cost by ~80%
Cons:
- Requires LLM inference for every chunk during indexing — cost scales with corpus size
- Context summary quality depends on the LLM used; weaker models produce less useful summaries
- Re-indexing when documents update requires regenerating context summaries
Enterprise considerations: At scale, contextual retrieval costs can be significant. A 1,000-document corpus at average 50 chunks per document = 50,000 LLM calls for context generation. Anthropic’s prompt caching reduces this substantially. For teams building on the enterprise context layer, contextual retrieval complements metadata-enriched chunking by adding document-positional context alongside governance metadata.
9. Metadata-enriched chunking
How it works
Metadata-enriched chunking attaches structured metadata to each chunk before indexing. The metadata describes not just what the chunk says but where it came from, who owns it, how fresh it is, and whether it is trusted for AI use. A metadata-enriched retrieval unit carries fields like:
- Owner and steward: who is accountable for this data asset
- Lineage: which upstream sources feed this document
- Freshness score: when it was last verified as current
- Data quality score: pass/fail on the organization’s quality rules
- Classification: PII sensitivity, regulatory scope (HIPAA, SOX, GDPR)
- Policy context: access restrictions, approved use cases
At query time, the retriever filters on metadata before performing vector search. A query from a healthcare application only retrieves chunks with a HIPAA classification and a freshness score above threshold. A query about financial data only retrieves chunks from certified, non-deprecated sources.
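The filter-then-search order can be sketched with a small in-memory example. The field names (`classification`, `certified`, `last_verified`), the `governed_search` function, and the two-dimensional embeddings are all illustrative assumptions, not a specific catalog or vector database schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Chunk:
    text: str
    embedding: list
    classification: str   # e.g. "HIPAA", "PUBLIC"
    certified: bool       # False for deprecated or unreviewed sources
    last_verified: date   # freshness signal


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def governed_search(query_embedding, chunks, classification, verified_after, top_k=3):
    """Filter on governance metadata first, then rank the survivors by similarity."""
    eligible = [
        c for c in chunks
        if c.classification == classification
        and c.certified
        and c.last_verified >= verified_after
    ]
    ranked = sorted(eligible, key=lambda c: dot(query_embedding, c.embedding), reverse=True)
    return ranked[:top_k]


chunks = [
    Chunk("Current PHI retention policy", [1.0, 0.0], "HIPAA", True, date(2025, 6, 1)),
    Chunk("Deprecated PHI retention policy", [1.0, 0.1], "HIPAA", False, date(2023, 1, 1)),
    Chunk("Marketing FAQ", [0.9, 0.0], "PUBLIC", True, date(2025, 6, 1)),
]
results = governed_search([1.0, 0.0], chunks, "HIPAA", date(2025, 1, 1))
print([c.text for c in results])
```

Note that the deprecated chunk is at least as similar to the query as the current one; it is excluded only because the metadata filter runs before similarity ranking.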
An IEEE study found metadata-enriched retrieval achieves 82.5% precision versus 73.3% for content-only retrieval: a 9.2-percentage-point improvement (about 12.5% in relative terms) from context alone. Atlan’s research shows that governed metadata improves AI agent SQL accuracy by 38% and overall AI accuracy by up to 5x.
When to use it
Metadata-enriched chunking is not an alternative to the other strategies; it’s an additional layer that improves any of them. Apply it in every enterprise RAG deployment where data governance matters, which is to say: every enterprise RAG deployment.
Pros:
- 82.5% retrieval precision vs. 73.3% content-only (IEEE study)
- Enables governed filtering: only certified, compliant, fresh data enters the RAG context
- Directly addresses the root cause of hallucinations from stale or low-quality source data
- Supports audit trails for AI-generated outputs — every retrieved chunk is traceable to its source
- Up to 35% accuracy improvement with context-graph-grounded retrieval (Atlan Context Graph)
Cons:
- Requires an upstream metadata infrastructure — data catalogs, governance policies, quality rules
- Metadata must be maintained actively; stale metadata is worse than no metadata (false confidence)
- Initial setup requires collaboration between ML teams and data governance teams
Enterprise considerations: This is the strategy that separates RAG deployments that survive production from those that don’t. AstraZeneca engineers describe the failure mode directly: “They’re not enriching that metadata with all the semantics and context — that’s when hallucinations happen.” West Health’s architects intuitively understand the answer: “I’m assuming that’s how you are helping curate the RAG chunk data store in a way that makes the chunks more contextual.”
Atlan’s approach is to convert flat chunks into governed retrieval units with ownership, freshness, lineage, and policy context attached — delivered to AI agents at inference time via the Atlan MCP server. This is the difference between RAG architecture that works in a demo and one that works in production.
Decision framework: which chunking strategy for which use case

| Use case | Recommended strategy | Why |
|---|---|---|
| Rapid prototype / proof of concept | Recursive character splitting | Fast, zero extra cost, good enough to validate the use case |
| Structured documentation (markdown, PDFs, code) | Document-aware chunking | Respects natural semantic units; high precision at low cost |
| Unstructured narrative content | Semantic or hierarchical chunking | Topic-shift awareness improves retrieval on heterogeneous text |
| Production RAG needing precision + context | Hierarchical parent-child | The de facto standard; resolves the precision-context trade-off |
| High-value, low-volume corpora | Agentic chunking | Quality justifies the cost when accuracy is business-critical |
| Long documents with cross-section dependencies | Late chunking | Long-range context in embeddings improves synthesis questions |
| Any pipeline with high retrieval failure rate | Contextual retrieval (add as a layer) | Up to 67% fewer retrieval failures; works on top of any base strategy |
| Enterprise production RAG | Metadata-enriched chunking (add to all above) | Governance, freshness, lineage, and policy filtering for every chunk |
| Complex multi-hop queries across large corpora | Context Graph (structured subgraphs) | Atlan’s approach; up to 35% accuracy improvement over flat chunks |
A note on sequencing: these strategies are not mutually exclusive. A mature enterprise RAG pipeline might use document-aware chunking as its base, hierarchical indexing for precision-context balance, contextual retrieval to add document-level context, and metadata-enriched filtering to govern what enters the retrieval scope — all at once. The question is which problems you’re solving in which order.
For teams early in their RAG journey, start with recursive splitting, measure where retrieval fails using a proper evaluation framework, and add complexity only where failure modes justify it. For teams running enterprise RAG platforms at scale, metadata-enriched chunking with governed filtering is not optional — it is the only approach that sustains accuracy as data changes and expands.
Real stories from real customers: Context-enriched retrieval in production
"We're excited to build the future of AI governance with Atlan. All of the work that we did to get to a shared language at Workday can be leveraged by AI via Atlan's MCP server…as part of Atlan's AI Labs, we're co-building the semantic layer that AI needs with new constructs, like context products."
— Joe DosSantos, VP Enterprise Data & Analytics, Workday
"Atlan is much more than a catalog of catalogs. It's more of a context operating system…Atlan enabled us to easily activate metadata for everything from discovery in the marketplace to AI governance to data quality to an MCP server delivering context to AI models."
— Sridher Arumugham, Chief Data & Analytics Officer, DigiKey
Why chunking alone won’t save your RAG pipeline
Chunking strategy matters. But it is not the variable that separates RAG systems that work in production from those that don’t. The real variable is context quality.
Consider what chunking cannot fix: a chunk drawn from a data asset that was deprecated six months ago will retrieve confidently and generate plausibly — and will be wrong. A chunk from a document whose owner left the company will carry no signal about whether it reflects current policy. A chunk that has passed through three data transformations with no lineage tracking cannot be traced when its output is questioned in a compliance audit.
The insight from teams operating RAG at scale is consistent: “Your AI is only as smart as the context you give it.” Optimizing chunk size is a local improvement. Governing the knowledge layer that feeds your retrieval index is a systemic one.
Atlan’s approach addresses this directly. Rather than treating chunks as isolated text units, the Context Graph assembles structured subgraphs with ownership, freshness, lineage, and policy context attached. The Context Engineering Studio lets teams test and improve retrieval behavior against real queries before deploying. The MCP server delivers governed retrieval context to AI agents at inference time — not raw chunks, but retrieval units that carry the governance metadata the agent needs to use them correctly.
The teams that move from 3–4 months per RAG use case to reliable, repeatable deployment share a common pattern: they stopped tuning chunk size and started governing the knowledge layer. Data quality upstream is not a nice-to-have for RAG — it is the upstream constraint on everything downstream.
For organizations ready to move from isolated chunking experiments to production-grade retrieval, the path forward combines the best chunking strategies from this guide with the governance infrastructure that makes those chunks trustworthy at scale.
FAQs
- What is chunking in RAG?
Chunking is the process of splitting source documents into smaller units — chunks — before embedding and indexing them for retrieval. Each chunk becomes a discrete retrieval unit. The chunking strategy determines how boundaries are drawn: by character count, sentence boundaries, document structure, semantic similarity, or LLM-proposed logic. Poor chunking causes retrieval failures, hallucinations, and lost context in generation.
- What is the best chunk size for RAG?
There is no universal best chunk size. Smaller chunks (128–256 tokens) improve retrieval precision but lose surrounding context. Larger chunks (512–1024 tokens) preserve more context but reduce precision. Hierarchical parent-child chunking addresses this trade-off by retrieving small precise chunks while surfacing their larger parent context during generation. Most practitioners start at 512 tokens with 10–20% overlap and iterate from there based on evaluation results.
- What is semantic chunking?
Semantic chunking uses embedding similarity to detect topic shifts in the text. Rather than splitting at a fixed character count or sentence boundary, it groups sentences whose embeddings are similar and starts a new chunk when the embedding distance between adjacent groups jumps above a threshold (equivalently, when their similarity drops below one). This keeps each chunk topically coherent but requires embedding computations during indexing, making it slower and more expensive than character-based methods.
- What is contextual retrieval?
Contextual retrieval is a technique developed by Anthropic in 2024 that prepends a short LLM-generated context summary to each chunk before embedding. The summary explains where the chunk sits within its source document. This reduces retrieval failure rates by up to 67% compared to standard chunking because the embedding carries document-level context rather than just the chunk’s local text.
- What is agentic chunking?
Agentic chunking uses an LLM to propose chunk boundaries based on semantic logic. The LLM reads the document and decides where one coherent unit of meaning ends and another begins. This produces the highest-quality chunks of any method because it applies genuine semantic understanding — not a heuristic. The trade-off is cost: every document requires LLM inference during indexing, making it 10–50x more expensive than fixed-size chunking.
- How does metadata-enriched chunking improve RAG?
Metadata-enriched chunking attaches ownership, lineage, data quality scores, classification, and policy context to each chunk before it is embedded and indexed. The retriever can then filter on metadata at query time — only returning chunks from certified, non-deprecated, compliant sources. An IEEE study found metadata-enriched retrieval achieves 82.5% precision versus 73.3% for content-only retrieval. Atlan research shows governed metadata improves AI agent SQL accuracy by 38%.
- What is late chunking?
Late chunking reverses the standard order of operations. Instead of chunking first and then embedding each chunk independently, late chunking embeds the entire document (or a long passage) first using a long-context model, then splits the resulting token embeddings into chunks. Because the embeddings were computed in full-document context, each chunk’s embedding carries long-range semantic signals that independent short-chunk embeddings miss.
Sources
- RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval, arXiv 2024
- Late Chunking: Contextual Chunk Embeddings Using Long-Context Embedding Models, jina.ai 2024
- Metadata-Enhanced RAG for Enterprise Knowledge Retrieval, IEEE 2024
- Evaluating the Ideal Chunk Size for a RAG System Using LlamaIndex, Medium 2023
- Agentic Chunking for Document Intelligence, LangChain Blog 2024
- What Is RAG? How Retrieval-Augmented Generation Works in 2026
