Retrieval-Augmented Generation (RAG) and Why It Matters in LLMs
Retrieval-Augmented Generation (RAG) is the hot topic in LLMs. If you're building chatbots, RAG is the key to accurate, grounded conversations. Here's everything you need to know about it.
You go to ChatGPT and ask “What is the largest company by revenue?” You wait a bit and you get “The largest company by revenue is Apple.” But is it right? Isn’t Microsoft number 1 now with its heavy investment in AI? Could it be Nvidia with its rocketing stock? You ask another question, a scientific one this time: “Who is the father of Quantum Physics?” And it answers “Albert Einstein is often considered the father of quantum physics.” Is it, though? What about Max Planck? Or Niels Bohr, Werner Heisenberg, Erwin Schrödinger, and Paul Dirac?
Now, this conversation is hypothetical, and you may not get this exact string of questions and answers from ChatGPT. But if you have worked with it, you already know what I'm getting at: LLMs cannot be blindly trusted. If they're not connected to the internet, their answers can be outdated. Worse, you cannot tell when they are giving you the right answer and when they are making it up…
LLMs Don’t Know What They’re Talking About
For the most part, this is true. LLMs generate a series of words based on the corpus of data they were trained on, plus some fine-tuning in later stages. They don't draw on an inherent database of knowledge. If you ask them “Which planet has the most moons?” and they answer “Jupiter”, it's not because they went and looked it up in an astronomical source or Googled it; they answer that way merely because, in the giant sweep of the internet used to train them, Jupiter is named as the planet with the most moons!
Now let's say that tomorrow astronomers point their telescopes at the sky and discover 35 new moons orbiting Saturn (in fact, Saturn is the right answer, with 146 moons discovered to date). What happens then?
LLMs suffer from a number of shortcomings:
They hallucinate! They will give you polished, confident answers that, in reality, they have no grounds for. If you don't have prior knowledge of a subject, you can easily be fooled by their beautiful wording and high-spirited tone.
Their information is outdated! Unless we're talking about something like Grok, which is connected to X user data, most LLMs, like GPT-3.5, are cut off from the internet. They answer based on training data that is months or even years old. If you ask them about a recent discovery or today's news, they simply have no clue about it.
What if LLMs Had Access to a Knowledge-base?
This is exactly what Retrieval-Augmented Generation (RAG) is about. I was first introduced to this term by this tweet by Santiago (give it a look).
RAG pairs your chatbot with some sort of database: when you ask the bot a question, it uses that database to give you a precise answer (look at the header image again). RAG generally works in five steps (a toy sketch of the whole loop follows the list):
Retrieval: RAG starts by searching through a database or a collection of documents to find information relevant to the input question or prompt. It uses techniques like keyword matching, semantic similarity, or other methods to find the most relevant documents.
Selection: Once it retrieves a set of documents, RAG selects the most relevant ones based on certain criteria (top-k documents). This could involve scoring each document based on its relevance to the input and choosing the top-ranked ones.
Integration: After selecting the relevant documents, RAG integrates the information from these documents into the model's context. It might preprocess the text, extract key information, or represent the documents in a way the model can consume.
Generation: With the integrated information, RAG generates an answer or response to the input question or prompt. It uses techniques from language generation models like GPT to produce human-like text based on the integrated information.
Refinement: Finally, RAG might refine its generated response based on feedback or additional context. It could adjust the answer to make it more accurate, coherent, or relevant.
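To make these five steps concrete, here is a minimal, self-contained sketch in Python. Everything in it is hypothetical: the toy `documents` list, the crude keyword-overlap scoring (a stand-in for real semantic search), and the `llm_generate` placeholder where a real system would call an actual LLM.

```python
# Toy end-to-end RAG loop. The document store, the scoring function, and
# llm_generate() are all stand-ins; a real system would use a vector
# database, an embedding model, and an LLM API.

documents = [
    "Saturn has 146 confirmed moons, the most of any planet.",
    "Jupiter has 95 confirmed moons.",
    "Max Planck originated quantum theory in 1900.",
]

def score(query: str, doc: str) -> int:
    # Step 1 (Retrieval): crude keyword overlap as a stand-in for
    # semantic similarity search.
    return len(set(query.lower().split()) & set(doc.lower().split()))

def llm_generate(prompt: str) -> str:
    # Step 4 (Generation): placeholder for a real LLM call.
    return f"[An LLM would answer here, grounded in:]\n{prompt}"

def rag_answer(question: str, top_k: int = 2) -> str:
    ranked = sorted(documents, key=lambda d: score(question, d), reverse=True)
    context = ranked[:top_k]          # Step 2 (Selection): keep the top-k documents
    prompt = (                        # Step 3 (Integration): fold the context into the prompt
        "Answer using only this context:\n"
        + "\n".join(context)
        + f"\n\nQuestion: {question}"
    )
    answer = llm_generate(prompt)
    return answer                     # Step 5 (Refinement) could re-rank or re-ask here

print(rag_answer("Which planet has the most moons?"))
```

Note how the five steps collapse into a few lines: the hard engineering in practice hides inside the retrieval and selection steps, which this sketch fakes with simple word overlap.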
You can guess from the steps above that a variety of fields go hand in hand to set up RAG:
knowledge bases and databases to store the needed information (with every nuance of setting them up),
search algorithms to find the right information and documents,
document embedding, and so on.
In fact, just the retrieval step is full of details and considerations. This blog post goes into detail on various methods of retrieval.
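As one example of a retrieval method, here is a hedged sketch of semantic similarity search with dense embeddings. It assumes the sentence-transformers library and its all-MiniLM-L6-v2 model, but any embedding model would do; the toy documents are the same hypothetical ones as before.

```python
# Semantic retrieval sketch: embed the documents once, embed the query,
# then rank documents by cosine similarity.
# Assumes: pip install sentence-transformers
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "Saturn has 146 confirmed moons, the most of any planet.",
    "Jupiter has 95 confirmed moons.",
    "Max Planck originated quantum theory in 1900.",
]
doc_vecs = model.encode(documents)  # one dense vector per document

def top_k(query: str, k: int = 2) -> list[str]:
    q = model.encode([query])[0]
    # Cosine similarity between the query vector and every document vector.
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    best = np.argsort(sims)[::-1][:k]
    return [documents[i] for i in best]

print(top_k("Which planet has the most moons?"))
```

Unlike keyword matching, this ranks documents by meaning, so a query phrased with entirely different words can still surface the right passage.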
What Are the Cons?
Like anything else, RAG doesn't come without shortcomings. Say you ask, “Tell me the recipe for a Persian omelet.” RAG turns your question into an embedding; then, using (for example) a semantic similarity search, it scans the whole vector database and returns the top-k chunks most relevant to your query. Next, the LLM uses those chunks plus your query to form a useful response. At each of those steps, a bit of information gets summarized away or dropped, and the losses add up.
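A small illustration of where such loss creeps in: naive fixed-size chunking can split a fact across two chunks, so no single retrieved chunk contains the whole answer. Overlapping chunks are a common mitigation, not a cure; the window and overlap sizes below are arbitrary, and the recipe text is made up for the example.

```python
# Fixed-size chunking with optional overlap. With overlap=0, a sentence
# that spans a chunk boundary is cut in two, and the fact it carries may
# never be retrieved intact; overlap reduces (but does not remove) that risk.
def chunk(text: str, size: int = 60, overlap: int = 15) -> list[str]:
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

recipe = ("A Persian omelet needs tomatoes, eggs, turmeric, and salt. "
          "Fry the tomatoes first, then add the beaten eggs.")
for c in chunk(recipe):
    print(repr(c))
```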
For a more detailed list of RAG's shortcomings, take a look at this post: Disadvantages of RAG
Why Is RAG Important?
In the original blog post by the Meta AI team, you can already see an answer to this:
RAG’s true strength lies in its flexibility. Changing what a pretrained language model knows entails retraining the entire model with new documents. With RAG, we control what it knows simply by swapping out the documents it uses for knowledge retrieval.
In short, with RAG there is less need to retrain those M(B)ILLIONS of parameters to keep the knowledge up to date: instead of retraining and fine-tuning the LLM, you update the knowledge base. This is especially important given the huge costs and carbon footprint of retraining these models.
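Continuing the toy sketch from earlier (the hypothetical `documents` list and `rag_answer` function): updating what the system “knows” is just a data update, not a training run. No parameters are touched.

```python
# Reusing the rag_answer() sketch from above: swapping knowledge in and
# out of the store is enough to change the system's answers.
documents.append(
    "Astronomers recently confirmed new moons of Saturn, bringing the total to 146."
)
print(rag_answer("How many moons does Saturn have?"))  # now sees the new document
```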
LLMs are now the Taylor Swift of AI. Popular AF. But to make the most of them, we must understand what they're good at and where they fail. LLMs are most powerful when paired with other tools that enhance their abilities, and one of them is certainly RAG. It helps you build a chatbot that is an expert in a certain domain: one that knows a company's products to help its customers, or the detailed science of a field to help the experts. It's a field that is being researched and developed further, and with every breakthrough, practical AI becomes more and more, well, practical.
🤖 If you like this article, make sure to subscribe to my newsletter to get informed about future posts!