Breaking Down RAG Systems
Hey there! Everyone's diving into the LLM wave these days! From company leaders wanting to boost efficiency, to teachers using it for research, to everyday folks chasing financial freedom—I'm one of them, transitioning from traditional development to LLMs. After all, traditional software development is in decline, while LLM development is on the rise!
So, I want to share the insights and experiences from my learning process, hoping to help you all out. Today's first post: let's talk about breaking down RAG systems.
What Do Typical RAG System Products Look Like?
Right now, typical products like Cursor's RAG system work like this: after you upload files, it automatically vectorizes them and stores them in a vector database; when you ask questions, Cursor directly retrieves relevant information from the database and returns it.
Cursor Codebase Index Interface

Another product, Tencent's ima, works similarly: you can organize WeChat Official Account articles into a knowledge base, and when you ask questions, ima automatically retrieves from it and answers.
Tencent ima

How to Break Down RAG Systems? Core Modules Revealed
Simply put, RAG is like a building block process, assembling these five pieces:
- Vectorization module: converts text to vectors
- Document loading and splitting module: processes original files
- Vector database: stores vectorized content
- Vector retrieval module: similarity-based search
- LLM module: completes Q&A
The overall workflow is clear from the diagram below:
RAG System Example Diagram

This diagram compares pure LLM output vs. RAG output (the difference is obvious!), visually demonstrating the capability upgrade.
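The five building blocks above can be wired together in a few dozen lines. Here's a minimal runnable sketch—note that `embed` here is a toy bag-of-words counter standing in for a real embedding model, and the function names (`embed`, `retrieve`, `build_prompt`) are my own, not from any actual product:

```python
import math
import re
from collections import Counter

# "Vectorization module": toy bag-of-words term counts.
# (A real system would call an embedding model here.)
def embed(text):
    return Counter(re.findall(r"[a-z]+", text.lower()))

# Cosine similarity between two sparse count vectors.
def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# "Document loading and splitting": assume files are already
# split into small text chunks.
chunks = [
    "RAG retrieves documents from a vector database",
    "The LLM answers using the retrieved context",
    "Bananas are yellow fruits",
]

# "Vector database": a plain in-memory list of (vector, text) pairs.
db = [(embed(c), c) for c in chunks]

# "Vector retrieval module": top-k chunks by cosine similarity.
def retrieve(query, k=2):
    qv = embed(query)
    ranked = sorted(db, key=lambda pair: cosine(qv, pair[0]), reverse=True)
    return [text for _, text in ranked[:k]]

# "LLM module": stubbed as prompt assembly -- in practice this
# prompt would be sent to a model API for the final answer.
def build_prompt(query):
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"

print(retrieve("what does RAG retrieve from the database?")[0])
# -> RAG retrieves documents from a vector database
```

Swap `embed` for a real embedding model and `build_prompt` for an actual LLM call, and you have the full loop from the diagram.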
Friends Often Ask: What's the Difference Between RAG and "Web Search"?
At this point, many people wonder: what's the difference between this RAG and the "Web Search" feature in DeepSeek?
DeepSeek Web Search Feature

Great question! They're completely different:
- RAG is based on a private vector database, with retrieval relying on vector similarity.
- Web search? The LLM calls search engines to query the web in real-time.
The key is that search engine algorithms are far more sophisticated, with "reranking" as a killer technique—surfacing the most useful results, not just the most similar ones. Simple RAG relies only on vector similarity; without a reranking stage, answer quality can fall far short of web search. This is RAG's challenge!
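To make the "retrieve then rerank" idea concrete, here's a toy two-stage sketch. Stage 1 uses cheap token-overlap similarity (standing in for vector search); stage 2 rescores the top candidates with a stricter signal. Real systems use a trained cross-encoder model for stage 2—the exact-phrase check below is just my stand-in for illustration:

```python
def similarity_score(query, doc):
    # Stage 1: crude set-overlap similarity between token sets.
    q, d = set(query.lower().split()), set(doc.lower().split())
    if not q or not d:
        return 0.0
    return len(q & d) / (len(q) ** 0.5 * len(d) ** 0.5)

def rerank_score(query, doc):
    # Stage 2 stand-in: reward documents containing the exact
    # query phrase -- a proxy for the deeper relevance a trained
    # reranker would measure.
    return 1.0 if query.lower() in doc.lower() else 0.0

def search(query, docs, k=3):
    # Retrieve top-k by cheap similarity, then reorder those k by
    # the reranking score so the most useful result comes first.
    top = sorted(docs, key=lambda d: similarity_score(query, d), reverse=True)[:k]
    return sorted(top, key=lambda d: rerank_score(query, d), reverse=True)

docs = [
    "vector similarity search in a database",
    "how to search a vector database efficiently",
    "cooking pasta with tomato sauce",
]
print(search("vector database", docs, k=2)[0])
# -> how to search a vector database efficiently
```

Notice that pure similarity would rank the first document highest, but reranking promotes the second—the one that actually contains the query phrase. That reordering is exactly the quality gap between bare-bones RAG and a mature search engine.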
So, RAG's actual niche is narrower than it looks. Many products boast "exclusive vertical knowledge bases," but think about it: search engines (like Google) are so mature that as long as your knowledge base is public, Google can already give you the most vertical, reliable extracted results. Where RAG truly matters is private knowledge bases—provided they're unique and original enough, like highly original code or distinctive internal materials. Otherwise, web search might actually serve you better.
This highlights Cursor's brilliance: its codebase index feature targets exactly that—private projects. Codebases are both private and complex (a project iterated on for many years is far harder to navigate than a typical open-source project), making it a benchmark for RAG implementation!
Quick Summary
The core of RAG systems is to buff (enhance) the LLM's capabilities, with challenges in corpus collection and retrieval quality. When building one, we must clarify our purpose: don't do RAG for RAG's sake, but use it to solve problems. Today we covered the conceptual framework; next time, let's use the tinyRAG project to build a complete system in practice!