RAG isn't a system, it's a 12-stage failure surface

Where retrieval pipelines actually break in production, and the eval harness that catches it before users do.

Daniel Kim
Editor at Skill Trek
MAY 1, 2026

Every RAG demo works. The chunking is clean, the embeddings are fresh, the LLM says coherent things. Then you put it in front of real users, and retrieval collapses on 30% of queries within the first week.

Where it actually breaks

The failure surface isn't the retrieval step. It's the twelve decisions that happen before and after it: chunk size, overlap, embedding model, index type, similarity threshold, reranking, context window packing, prompt format, output parsing, citation extraction, groundedness verification, and the feedback loop back to retrieval.

rag_index.py
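from pathlib import Path
from ragatouille import RAGPretrainedModel

# Assumption: the corpus is a folder of .txt files; swap in your own loader.
docs = [p.read_text() for p in Path("kb").glob("*.txt")]

RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")
# max_document_length caps each indexed chunk: one of the twelve decisions.
RAG.index(collection=docs, index_name="my-kb", max_document_length=180)
results = RAG.search(query="what is contextual retrieval?", k=5)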
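
The snippet above bakes chunk length and result count into literals, which makes them hard to vary or test in isolation. One possible way to surface those decisions is a single frozen config object with a stable fingerprint. This is a minimal sketch, not part of RAGatouille: PipelineConfig, its field names, and every default are hypothetical and chosen only for illustration.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical config object; every field name and default is illustrative."""

    chunk_size: int = 180                  # mirrors max_document_length above
    chunk_overlap: int = 0
    embedding_model: str = "colbert-ir/colbertv2.0"
    index_type: str = "late-interaction"   # placeholder label, not a library flag
    similarity_threshold: float = 0.0
    reranker: Optional[str] = None         # None means no reranking stage
    top_k: int = 5
    max_context_tokens: int = 2000
    prompt_template: str = (
        "Answer using only the context.\n\n{context}\n\nQ: {question}"
    )

    def fingerprint(self) -> str:
        """Short stable hash so eval results can be tied to an exact config."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


base = PipelineConfig()
candidate = replace(base, chunk_size=256)  # change exactly one decision
```

Because the dataclass is frozen, a candidate can only differ from the baseline through an explicit replace call, so an eval run that stores the fingerprint alongside its scores records exactly which decision moved.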

The fix isn't a better vector store. It's treating each of those twelve stages as a testable unit and building an eval harness that catches regressions before they reach users.
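
What "testable unit" could look like in practice: each stage gets a check that returns a score between 0 and 1, and the harness compares those scores against a stored baseline. The sketch below is one assumed shape for that idea, not the harness from any particular library. The function names (hit_rate_at_k, find_regressions, run_harness), the 0.02 tolerance, and the toy document ids are all hypothetical, and the numbers are made-up inputs rather than measurements.

```python
from typing import Callable, Dict, List, Set, Tuple

# A stage check returns a score in [0, 1]; higher is better.
StageCheck = Callable[[], float]


def hit_rate_at_k(retrieved: List[List[str]], relevant: List[Set[str]], k: int) -> float:
    """Fraction of queries with at least one relevant doc id in the top k."""
    if not retrieved:
        return 0.0
    hits = sum(1 for ids, gold in zip(retrieved, relevant) if gold & set(ids[:k]))
    return hits / len(retrieved)


def find_regressions(
    scores: Dict[str, float], baseline: Dict[str, float], tolerance: float = 0.02
) -> Dict[str, Tuple[float, float]]:
    """Map each regressed stage to (baseline score, current score)."""
    return {
        stage: (baseline[stage], score)
        for stage, score in scores.items()
        if stage in baseline and score < baseline[stage] - tolerance
    }


def run_harness(
    checks: Dict[str, StageCheck], baseline: Dict[str, float], tolerance: float = 0.02
) -> Tuple[Dict[str, float], Dict[str, Tuple[float, float]]]:
    """Run every stage check, then report stages that fell below baseline."""
    scores = {stage: check() for stage, check in checks.items()}
    return scores, find_regressions(scores, baseline, tolerance)


# Toy data for illustration only: ranked doc ids per query and labeled relevant ids.
retrieved = [["d1", "d7"], ["d4", "d2"], ["d9", "d3"]]
relevant = [{"d1"}, {"d2"}, {"d5"}]

checks = {
    "retrieval": lambda: hit_rate_at_k(retrieved, relevant, k=2),
    "citation_extraction": lambda: 1.0,  # stub standing in for a real check
}
baseline = {"retrieval": 0.9, "citation_extraction": 1.0}

scores, regressions = run_harness(checks, baseline)
for stage, (expected, actual) in regressions.items():
    print(f"REGRESSION {stage}: baseline {expected:.2f}, now {actual:.2f}")
```

In a CI job, a non-empty regressions dict would fail the build. Keeping one check per stage means a drop points at a specific decision rather than at "the RAG system" as a whole.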

Daniel Kim

Applied ML engineer. Writes about LLMs, RAG, and production AI systems.
