What is Retrieval Augmented Generation (RAG)?
RAG (Retrieval-Augmented Generation) is an approach that combines retrieval techniques with generative models and is mainly used in natural language processing tasks such as text generation, dialogue systems, and machine translation. A RAG model retrieves relevant information from an external knowledge base and combines it with the input text, allowing the generative model to produce more accurate and better-grounded output. This also improves interpretability, because the model can explicitly indicate which external knowledge the generated text is based on. RAG is particularly useful for tasks that require a lot of background knowledge, such as question answering systems or conversational agents in specialized domains.
What this example achieves
When using a large language model, you will find that it is strong on general knowledge, but if you ask about something related to your private data, it simply doesn't know. For example, consider the following piece of private text:
X founded a company called "X's World" in 2000, which is headquartered in Wuhan, Hubei Province and has 300 employees. X's favorite programming language is C#, and X's favorite book is The Ordinary World.
This is just a simple example, so the text is kept very short; in practice it could be replaced with your own private documents, and the large language model would answer questions based on them. If you now ask the large language model "What is the name of the company that X founded?" or "What is X's favorite programming language?", questions that can only be answered from the private document, it has no idea. With RAG, however, the large language model can answer such questions based on the private document.
The implementation idea is to convert the text into vectors with an embedding model, store the vectors in a database, and then, given the vector representation of an input query, retrieve the most relevant documents or snippets from the knowledge base. The retrieved snippets are embedded in the prompt, and the large language model answers based on them.
Getting Started
Install the required NuGet packages:
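The exact package list is not reproduced here; for the setup below you would typically need Microsoft.SemanticKernel, plus the (currently prerelease) Microsoft.SemanticKernel.Plugins.Memory and Microsoft.SemanticKernel.Connectors.Sqlite packages, which provide the memory abstractions and the SQLite vector store used later on.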
Start by initializing a Kernel. Here I use the open source Qwen/Qwen2-7B-Instruct model provided by the SiliconFlow platform.
private readonly Kernel _kernel;

public SemanticKernelService()
{
    var handler = new OpenAIHttpClientHandler();
    var builder = Kernel.CreateBuilder()
        .AddOpenAIChatCompletion(
            modelId: "Qwen/Qwen2-7B-Instruct",
            apiKey: "api key",
            httpClient: new HttpClient(handler));
    var kernel = builder.Build();
    _kernel = kernel;
}
Since the SiliconFlow platform exposes an OpenAI-compatible API, all that is needed is to pass in an HttpClient that forwards requests to the SiliconFlow endpoint. The OpenAIHttpClientHandler class is shown below:
public class OpenAIHttpClientHandler : HttpClientHandler
{
    protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        UriBuilder uriBuilder;
        switch (request.RequestUri?.LocalPath)
        {
            case "/v1/chat/completions":
                uriBuilder = new UriBuilder(request.RequestUri)
                {
                    // Change this to the URL you want to forward to
                    Scheme = "https",
                    Host = "", // host of the OpenAI-compatible API you are targeting
                    Path = "v1/chat/completions",
                };
                request.RequestUri = uriBuilder.Uri;
                break;
            case "/v1/embeddings":
                uriBuilder = new UriBuilder(request.RequestUri)
                {
                    // Change this to the URL you want to forward to
                    Scheme = "https",
                    Host = "", // host of the OpenAI-compatible API you are targeting
                    Path = "v1/embeddings",
                };
                request.RequestUri = uriBuilder.Uri;
                break;
        }

        HttpResponseMessage response = await base.SendAsync(request, cancellationToken);
        return response;
    }
}
Next, the text needs to be converted into vectors, so an ISemanticTextMemory has to be constructed first:
public async Task<ISemanticTextMemory> GetTextMemory2()
{
    var memoryBuilder = new MemoryBuilder();
    memoryBuilder.WithOpenAITextEmbeddingGeneration("text-embedding-ada-002", "api key");
    // SQLite database file (use your own path)
    IMemoryStore memoryStore = await SqliteMemoryStore.ConnectAsync("memstore.db");
    memoryBuilder.WithMemoryStore(memoryStore);
    var textMemory = memoryBuilder.Build();
    return textMemory;
}
An embedding model is required first. Here I used OpenAI's text-embedding-ada-002 model. I also tried the embedding model provided by the SiliconFlow platform: generating vectors worked fine, but searching threw an error that I have not yet resolved.
Use SQLite to store the generated vectors.
var lines = TextChunker.SplitPlainTextLines(input, 100);
var paragraphs = TextChunker.SplitPlainTextParagraphs(lines, 1000);
foreach (var para in paragraphs)
{
    await textMemory.SaveInformationAsync(index, id: Guid.NewGuid().ToString(), text: para, cancellationToken: default);
}
The text is split into chunks; since this example's text is very short, it ends up as a single paragraph.
View the database:
The vector data has been stored in the database.
Now search for the most relevant snippet based on the question:
Take the question "What is X's favorite programming language?" as an example.
The question is converted into a vector, and cosine similarity is used to search for the most relevant segments:
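The original search snippet is not reproduced here; a minimal sketch of this step, assuming the ISemanticTextMemory built above, the same collection name index, and System.Text for StringBuilder, could look like this:

var question = "What is X's favorite programming language?";
// SearchAsync embeds the query and ranks the stored records by cosine similarity.
var searchResults = textMemory.SearchAsync(index, question, limit: 3, minRelevanceScore: 0.6);
var context = new StringBuilder();
await foreach (MemoryQueryResult result in searchResults)
{
    // Collect the text of the most relevant chunks as context for the prompt.
    context.AppendLine(result.Metadata.Text);
}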
The retrieved text and the question are then embedded in the prompt, and the large language model answers based on them:
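The prompt-building code is also not shown in full; a rough sketch, using a hypothetical prompt wording and the _kernel field initialized earlier, might be:

// Embed the retrieved context and the user's question into the prompt,
// then let the model answer based only on that context.
var prompt = $"Answer the question based only on the following context. If the context does not contain the answer, say you don't know.\n\nContext:\n{context}\n\nQuestion: {question}";
var answer = await _kernel.InvokePromptAsync(prompt);
Console.WriteLine(answer.ToString());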
The large language model's response:
The above implements a simple RAG application based on SemanticKernel.
Where to explore next
Although my computer struggles to run a large language model locally, there are still many scenarios that require running one locally. The next step is to explore how to use a local large language model together with a local embedding model in SemanticKernel. If the local large language model doesn't perform well enough, I'll switch to a domestic platform; I've already tried a local embedding model and it works fine.
For local runs I plan to use Ollama, and there are official plans to release an Ollama Connector:
From what I found online, some developers have already managed to use Ollama's chat and embedding models in SemanticKernel. You can wait for official support, or try it yourself based on what they have shared:
Local Memory: C# Semantic Kernel, Ollama and SQLite to manage Chat Memories locally | by John Kane | Medium
Using local LLM with Ollama and Semantic Kernel - Learnings in IT
Use Custom and Local AI Models with the Semantic Kernel SDK for .NET | Microsoft Learn
References
1. /microsoft/semantic-kernel/blob/main/dotnet/notebooks/
2. /microsoft/semantic-kernel/blob/main/dotnet/notebooks/
3. /microsoft/semantic-kernel/blob/main/dotnet/samples/Concepts/Memory/MemoryStore_CustomReadOnly.cs
4. /microsoft/semantic-kernel/blob/main/dotnet/samples/Concepts/Memory/SemanticTextMemory_Building.cs
5. /microsoft/semantic-kernel/blob/main/dotnet/samples/Concepts/Memory/