James Booth · Jun 10, 2026 · 4 min read
RAG is not training: a plain-English guide for owners
There is one sentence that reliably identifies a vendor who should not be building your AI system: "We'll train ChatGPT on your data."
That sentence is wrong twice. Nobody outside the model labs is training ChatGPT on anything. And the thing the vendor is almost certainly proposing, connecting a model to your documents so it can answer from them, is not training at all. It is retrieval, a fundamentally different mechanism with different costs, different risks, and different answers to the questions an owner actually cares about: where does my data live, and can I get it back out?
If you are evaluating AI proposals, the distinction between retrieval and training is the single highest-leverage piece of technical literacy you can acquire. Here it is in plain English.
Two different problems
A language model can be inadequate for your business in two distinct ways. It can fail to know things: your contracts, your pricing history, your standard operating procedures. Or it can fail to behave the way you need: wrong tone, inconsistent formatting, too slow or too expensive for a high-volume task.
Knowing is a retrieval problem. Behaving is, occasionally, a fine-tuning problem. Vendors who conflate the two either do not understand the difference or hope you don't.
How retrieval actually works
Retrieval-augmented generation (RAG) works like an open-book exam. Your documents are split into passages, each passage is converted into a searchable mathematical representation called an embedding, and the results are stored in a database. Often that database is just an extension to Postgres called pgvector, running inside infrastructure you already operate, which is exactly how our own knowledge vault works. When someone asks a question, the system finds the most relevant passages and hands them to the model, which answers from what it was just shown and cites where the answer came from.
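To make the open-book exam concrete, here is a minimal sketch of that loop in Python. Everything in it is an illustrative assumption: the word-count `embed` function stands in for a real embedding model, the in-memory `store` list stands in for a database such as pgvector, and the sample documents are invented. It stops at building the prompt and does not call a model.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def split_into_passages(doc_id, text):
    """Split on blank lines and tag each passage with a citable source."""
    parts = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [
        {"source": f"{doc_id}#{i}", "text": p, "vector": embed(p)}
        for i, p in enumerate(parts)
    ]

def retrieve(store, question, k=2):
    """Return the k passages most similar to the question."""
    q = embed(question)
    ranked = sorted(store, key=lambda row: cosine(q, row["vector"]), reverse=True)
    return ranked[:k]

def build_prompt(question, passages):
    """Hand the retrieved passages to the model together with the question."""
    context = "\n".join(f"[{p['source']}] {p['text']}" for p in passages)
    return (
        "Answer only from the passages below and cite the source tag.\n"
        f"{context}\n\nQuestion: {question}"
    )

# Hypothetical sample documents, invented for this sketch.
documents = {
    "pricing.md": "Standard support costs 500 dollars per month.\n\n"
                  "Premium support costs 1200 dollars per month.",
    "sop.md": "Refund requests must be approved by a manager within five days.",
}

store = []  # stand-in for the passage table in a real database
for doc_id, text in documents.items():
    store.extend(split_into_passages(doc_id, text))

question = "Who approves refund requests?"
top = retrieve(store, question, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

The model only ever sees the text inside `prompt`. The source tag travels with each passage, which is what makes citation possible.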
Note what never happens: the model itself is never modified. Your documents are not absorbed into anyone's weights. Delete a document from the database and the system genuinely no longer knows it. Add one and the knowledge is live in minutes. Ask where the answer came from and the system can point at the passage.
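A small sketch shows why deletion is this clean. It uses Python's built-in SQLite as a stand-in for Postgres and a plain text match as a stand-in for vector search. The table name, file names and passages are all invented for illustration.

```python
import sqlite3

# In-memory SQLite database, standing in for the passage table in Postgres.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE passages (doc_id TEXT, passage TEXT)")
conn.executemany(
    "INSERT INTO passages VALUES (?, ?)",
    [
        ("client_a_contract.pdf", "Client A pays net 30 on all invoices."),
        ("handbook.pdf", "Expense reports are due on the first of the month."),
    ],
)

def search(term):
    """Simplified retrieval: return passages containing the search term."""
    rows = conn.execute(
        "SELECT doc_id, passage FROM passages WHERE passage LIKE ?",
        (f"%{term}%",),
    )
    return rows.fetchall()

before = search("net 30")  # the contract passage is found

# Deleting the document's rows is the whole "forgetting" procedure.
conn.execute("DELETE FROM passages WHERE doc_id = ?", ("client_a_contract.pdf",))
after = search("net 30")   # nothing left to retrieve

# Adding knowledge is just another insert; it is searchable immediately.
conn.execute(
    "INSERT INTO passages VALUES (?, ?)",
    ("pricing_2026.pdf", "The onboarding fee is waived for annual plans."),
)
added = search("onboarding fee")
```

Because the model's weights never held the contract, removing the rows removes the knowledge. No retraining step is involved.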
Production retrieval needs more than the demo version. Pure vector search misses exact terms such as the SKUs, person names and legal citations your team searches for daily, so real systems combine it with classic keyword search. A reranker then makes a second, more careful pass over the top results, which is usually the cheapest large quality win available. And how the documents are split in the first place is where most of the quality is actually won or lost. These are the unglamorous details that separate a system your team trusts from a chatbot they tried twice.
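One common way to combine the two searches is reciprocal rank fusion, which merges ranked lists by position rather than by raw score. The sketch below assumes that technique. It is not necessarily what any particular system uses, and the document names and both hit lists are invented. The reranking pass would run afterwards on the fused list and is not shown.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists; k=60 is a conventional damping constant."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Hypothetical results for the query "return window for SKU 4471".
# Vector search ranks by meaning and puts the exact SKU match last.
vector_hits = ["returns_policy", "warranty_faq", "sku_4471_spec"]
# Keyword search matches the literal SKU and ranks that document first.
keyword_hits = ["sku_4471_spec", "returns_policy"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

A document that appears in both lists outranks one that appears in only one of them. That is how the exact SKU match climbs above a passage that was merely similar in meaning.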
What fine-tuning is actually for
Fine-tuning modifies a model's behavior by showing it thousands of examples of inputs and ideal outputs. It is a legitimate tool with a narrow purpose: it is a cost, latency and consistency optimization, not a knowledge store. The honest use cases are teaching a consistent tone of voice, making structured outputs reliable, or specializing a small cheap model to replace a large expensive one on a single repetitive task.
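It helps to see what fine-tuning data looks like. The sketch below writes two invented examples as JSON Lines, a format commonly used for such datasets. The field names `input` and `ideal_output` are assumptions for illustration, since each provider defines its own schema.

```python
import json

# Hypothetical training pairs: each one demonstrates a behavior (a fixed
# summary format), not a fact the model is supposed to look up later.
training_examples = [
    {
        "input": "Summarize: client called about a late delivery.",
        "ideal_output": "Issue: late delivery. Action: check carrier status. Owner: support.",
    },
    {
        "input": "Summarize: client asked to change billing contact.",
        "ideal_output": "Issue: billing contact change. Action: update account record. Owner: finance.",
    },
]

# One JSON object per line; a real dataset would contain thousands of lines.
jsonl = "\n".join(json.dumps(example) for example in training_examples)
print(jsonl)
```

Every pair teaches the same output shape. Nothing in the file is a document the model could later cite or delete.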
What fine-tuning cannot honestly promise is to make a model know your documents. It generalizes patterns; it does not reliably memorize your price list. It cannot cite a source. It cannot forget a document you need deleted, which matters the moment privacy law or a departing client requires deletion. And it goes stale the day after training while your business keeps moving.
The decision guide fits in one sentence: if you need the system to know your information, that is retrieval; if you need it to behave differently, try better prompting first and fine-tune only when prompting demonstrably fails; if you need both, retrieval comes first.
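For readers who think in code, the same guide can be written as a few lines of Python. The function name, parameters and step labels are invented for this sketch.

```python
def recommend(needs_knowledge, needs_behavior_change, prompting_failed=False):
    """Return the recommended steps, in the order they should be tried."""
    steps = []
    if needs_knowledge:
        # Knowing your information is a retrieval problem, and it comes first.
        steps.append("retrieval")
    if needs_behavior_change:
        # Behavior changes start with prompting.
        steps.append("better prompting")
        if prompting_failed:
            # Fine-tune only once prompting has demonstrably failed.
            steps.append("fine-tuning")
    return steps

print(recommend(needs_knowledge=True, needs_behavior_change=False))
print(recommend(needs_knowledge=True, needs_behavior_change=True, prompting_failed=True))
```

Fine-tuning can only ever appear last in the list, and only after prompting has been tried.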
How we run it ourselves
We use this architecture on ourselves before recommending it to anyone. Our agency knowledge vault is a retrieval system on Supabase Postgres with pgvector: company procedures, engineering learnings and project history, searchable by our agents, with a memory layer that persists what they discover across sessions. It runs in its own isolated database instance, fully separated from every client's data, the same isolation rule every client build gets. We have never fine-tuned a model on it, because nothing about the problem called for it.
The three questions that settle it
You do not need to be technical to screen vendors on this. Ask three questions. Where do my documents physically live? What happens when I delete one? Can the system show me the source for an answer it gave?
A retrieval system answers all three cleanly: in a database you can see; the knowledge is gone; and yes, with citations. "We trained the AI on your data" answers none of them, and a vendor who cannot answer them has told you everything you need to know about the rest of the build.
The insights are free. If you want to know whether your knowledge problem is a retrieval problem, the audit is free too, and the answer is yours to keep either way.