If you’re building any meaningful product or feature with LLMs (large language models), you’ll probably use the technique called RAG (retrieval-augmented generation). It lets you integrate external data that wasn’t in the LLM’s training data into the text generation process, which can greatly reduce hallucination and improve the relevance of the responses.

The idea of RAG seems simple enough: find and retrieve the most relevant text chunks and plug them into the original prompt, so that the LLM has access to those pieces of reference text and can use them to generate a response. But building a high-quality RAG pipeline that produces exactly the results you want in a production-ready product can be quite difficult.
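
To make that concrete, here’s a minimal sketch of the prompt-augmentation step. The prompt wording and variable names are just illustrative assumptions, not a fixed template:

```python
# Hypothetical sketch: splice the retrieved chunks into the prompt.
retrieved_chunks = [
    "RAG stands for retrieval-augmented generation.",
    "RAG grounds LLM responses in external reference text.",
]  # whatever your retrieval step returned

question = "What does RAG stand for?"
prompt = (
    "Answer the question using only the reference text below.\n\n"
    "Reference text:\n"
    + "\n---\n".join(retrieved_chunks)
    + f"\n\nQuestion: {question}"
)
# `prompt` is then sent to the LLM as usual.
```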

In this article, let’s explore the techniques to improve RAG results for your LLM apps, from the most basic to the more advanced. I’ll also share some hands-on tips and cases from my own experience building RAG-powered products.

Basic RAG

Let’s start with basic, naive RAG for those who are just starting out. It’s a simple three-step process: indexing, retrieval, and generation.
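
As a rough sketch, the whole naive pipeline fits in a few dozen lines. This version assumes the sentence-transformers library for embeddings and leaves the actual LLM call as a stub (`call_llm` is a hypothetical placeholder for whichever API you use):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# 1) Indexing: embed every chunk once, up front, and keep the vectors.
chunks = [
    "RAG combines retrieval with text generation.",
    "Embeddings map text chunks to dense vectors.",
    "Chunking splits long documents into retrievable pieces.",
]
index = embedder.encode(chunks, normalize_embeddings=True)  # (n_chunks, dim)

def retrieve(query: str, k: int = 2) -> list[str]:
    # 2) Retrieval: embed the query, then take the k most similar chunks.
    # With normalized vectors, cosine similarity is just a dot product.
    query_vec = embedder.encode([query], normalize_embeddings=True)[0]
    top_k = np.argsort(index @ query_vec)[::-1][:k]
    return [chunks[i] for i in top_k]

def call_llm(prompt: str) -> str:
    # Hypothetical placeholder for your LLM API of choice.
    raise NotImplementedError

def answer(query: str) -> str:
    # 3) Generation: plug the retrieved chunks into the prompt, as above.
    context = "\n---\n".join(retrieve(query))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return call_llm(prompt)
```

In production you’d typically swap the in-memory index for a vector database and batch the embedding calls, but the three steps stay the same.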

Illustration created by Author. Icons from https://flaticon.com/