Introduction to Semantic Kernel
Anyone who has played with large language models (LLMs) knows OpenAI, and Microsoft Azure also provides OpenAI services through Azure OpenAI: once you apply for an API key, you can use these AI services. You can chat with the AI directly through a web page, or call the AI's APIs to integrate its capabilities into your own applications. However, these services are hosted online and are billed by token, so you not only depend on an internet connection but also pay for usage. As a result, local large language model runtimes such as Ollama have emerged; as long as your computer is powerful enough and the application scenario allows it, using a local large language model is a good choice.
With so many AI services to choose from, what do you do if your application needs to interface with different AI services easily? This is the basic function of Semantic Kernel, a framework for developing applications based on large language models that makes it easier to integrate them into your applications. Semantic Kernel can be used to build AI agents and integrate the latest AI models into a C#, Python, or Java codebase. So while it plays a very important role in the .NET AI ecosystem, it is actually a cross-platform application development kit that supports multiple programming languages.
Semantic Kernel mainly contains these core concepts:
- Connectors: interact with external AI services and data sources, such as seamlessly integrating OpenAI and Ollama into an application
- Plugins: encapsulate functions that the application can use, such as enhancing prompts to provide more contextual information to the large language model
- Planner: orchestrates execution plans and strategies based on user requests
- Memory: abstracts and simplifies context management for AI applications, such as storing text embeddings
A more detailed introduction to Semantic Kernel can be found in the official Microsoft documentation.
Walkthrough: Using Microsoft Azure OpenAI Service with Semantic Kernel
Without further ado, let's get straight to the practical exercise. The purpose of this walkthrough is to implement a simple Q&A system using the gpt-4o large language model deployed on Azure.
Note: Microsoft terminated Azure OpenAI services for individual users on October 21, 2024; enterprise users retain access. Reference: /roll/2024-10-18/
Deploying a Large Language Model in Azure
Log in to the Azure Portal, create a new Azure AI service, and then click Go to Azure OpenAI Studio to enter OpenAI Studio:
Once inside, in the left sidebar, under the Shared resources section, select the Deployments tab. On the Model deployments page, click the Deploy model button and choose Deploy base model from the drop-down menu:
In the Select a model dialog box, select gpt-4o and then click the Confirm button:
In the Deploy model gpt-4o dialog box that pops up, give the model a name and click the Deploy button. If you want to adjust settings such as the model version and safety options, you can also click the Customize button to expand those options.
Once deployed successfully, you can see the version of the deployed model and its status in the list on the Model deployments page:
Click on the name of the newly deployed model to go to the model details page. In the Endpoint section, copy the Target URI and Key for later use. For the target URI, you only need the hostname part, like this:
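To make "only the hostname part" concrete, here is a small sketch that trims a Target URI down to its authority with the standard Uri class. The URI below is a made-up example, not a real deployment; substitute your own Target URI:

```csharp
using System;

// A hypothetical Target URI copied from the model details page
var targetUri = new Uri("https://my-resource.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-08-01-preview");

// Keep only the scheme + hostname part, which is what the Semantic Kernel connector expects
var endpoint = targetUri.GetLeftPart(UriPartial.Authority);

Console.WriteLine(endpoint); // https://my-resource.openai.azure.com
```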
Implementing a Q&A Application in C# with Semantic Kernel
First create a console application and add a reference to the Microsoft.SemanticKernel NuGet package:
$ dotnet new console --name ChatApp
$ dotnet add package Microsoft.SemanticKernel
Then edit the Program.cs file and add the following code:
using System.Text;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

var apikey = Environment.GetEnvironmentVariable("azureopenaiapikey")!;

// Initialize the Semantic Kernel
var kernel = Kernel.CreateBuilder()
    .AddAzureOpenAIChatCompletion(
        "gpt-4o",                                   // your deployment name
        "https://my-resource.openai.azure.com",     // your endpoint
        apikey)
    .Build();

// Create a chat completion service as well as a chat history object to hold the history
// of the conversation, providing conversation context for the large language model.
var chatCompletionService = kernel.GetRequiredService<IChatCompletionService>();
var chat = new ChatHistory("You are an AI assistant that helps people find information and answer questions");

StringBuilder chatResponse = new();
while (true)
{
    Console.Write("Please enter a question>> ");

    // Add the user-entered question to the conversation
    chat.AddUserMessage(Console.ReadLine()!);

    chatResponse.Clear();

    // Get the response from the large language model and output the result as it streams in
    await foreach (var message in
        chatCompletionService.GetStreamingChatMessageContentsAsync(chat))
    {
        // Output the chunk of the result that was just received
        Console.Write(message);

        // Append the chunk to a temporary variable
        chatResponse.Append(message);
    }
    Console.WriteLine();

    // Add the current answer to the conversation history before moving on to the next
    // Q&A, providing Q&A context for the large language model
    chat.AddAssistantMessage(chatResponse.ToString());

    Console.WriteLine();
}
In the code above, you need to configure your own API key and endpoint URI. For security, I save the API key in an environment variable, which the program then reads. So that the large language model can understand what has already been discussed between us during a conversation, the code uses a StringBuilder to temporarily hold the answer to the current exchange, and then adds that result back to the conversation through Semantic Kernel's AddAssistantMessage method. That way, in the next exchange, the large language model knows what topic we are talking about.
For example, in the following exchange, when my second question was "How many migrations were there?", the AI knew I was talking about the great migrations in human history and listed the answers I was looking for:
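To see why the follow-up question works, it helps to picture the conversation history as a role-tagged list of messages that is re-sent to the model in full on every turn. The sketch below models this with plain tuples (the assistant's reply text is made up for illustration):

```csharp
using System;
using System.Collections.Generic;

// The conversation history: a growing, role-tagged list of messages.
// On every turn the entire list is sent to the model, which is how it
// keeps track of what "migrations" refers to.
var history = new List<(string Role, string Content)>
{
    ("system", "You are an AI assistant that helps people find information and answer questions")
};

history.Add(("user", "What were the great migrations in human history?"));
history.Add(("assistant", "Some notable migrations include ..."));   // answer added back via AddAssistantMessage
history.Add(("user", "How many migrations were there?"));            // ambiguous alone, clear with the history

Console.WriteLine($"{history.Count} messages are sent along with the follow-up question");
```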
At this point, a simple gpt-4o based Q&A application is complete, and its workflow is roughly as follows:
Can AI answer all the questions?
Since the gpt-4o large language model used here was released in May 2024, and large language models are trained on pre-existing data, it should not know about anything that happened after that date. When it encounters such a question, the AI can only answer that it doesn't know, or give a rather outlandish answer:
You might think: what if I download that knowledge or those news articles and, building on the code above, add the information to the conversation history first, so that the large language model understands the context? Wouldn't that improve the accuracy of its answers? That is the right idea. You can add the text of the news to the conversation history before running the Q&A:
chat.AddUserMessage("Here's some additional information: " + await File.ReadAllTextAsync("news.md")); // "news.md" is a placeholder path for your saved article
Doing so, however, results in the following exception message:
This problem relates to the context window of the large language model. In this example, our model can process at most 128,000 tokens at a time (a token is a large language model's unit of data processing; it can be a phrase, a word, or a character), but we sent 147,845 tokens and got an error. Obviously, we should reduce the amount of data we send, but then there is no way to deliver the complete news article to the large language model. This is where Retrieval Augmented Generation (RAG) comes in.
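Exact token counts depend on the model's tokenizer, but a common rule of thumb for English text is roughly four characters per token, which is enough to check whether a document will blow past the context window before sending it. A minimal sketch, using this heuristic only (not the model's real tokenizer):

```csharp
using System;

// Rough heuristic: ~4 characters per token for English text.
// This is only an estimate, not the model's actual tokenizer.
static int EstimateTokens(string text) => (int)Math.Ceiling(text.Length / 4.0);

const int ContextWindow = 128_000;

// Stand-in for a long news article (600,000 characters)
var article = new string('a', 600_000);

int estimated = EstimateTokens(article);
Console.WriteLine(estimated > ContextWindow
    ? $"~{estimated} tokens: too large for the {ContextWindow}-token context window"
    : $"~{estimated} tokens: fits in the context window");
```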
Semantic Kernel's Retrieval Augmented Generation (RAG) Practice
In fact, it is not necessary to send the whole news article to the large language model. Change the approach: extract only the content in the article that is relevant to the question and send that, which greatly reduces the number of tokens sent to the large language model. This adds a few extra steps:
- Pre-process the documents, converting the text into vectors and saving them (text embedding)
- When a new question is asked, use the question's semantics to find relevant information among the saved embeddings
- Send that information to the large language model and get an answer from it
- Return the result to the caller
The process is roughly as follows:
The gray dashed box covers the Retrieval Augmented Generation (RAG) related processes. I won't describe each label one by one; if you understand the four broad steps described above, the overall flow in this figure is easy to follow. Next, we use Semantic Kernel directly to enhance the model's responses through RAG.
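The four steps above can be sketched in miniature. In the toy below, a naive word-overlap score stands in for real embedding similarity (real systems compare embedding vectors, as the Semantic Kernel code later in this article does); the point is only to show the shape of the retrieval step: score every stored paragraph against the question and keep the top matches. All strings here are made up for illustration:

```csharp
using System;
using System.Linq;

// Step 1 stand-in: a small "database" of pre-processed paragraphs
var paragraphs = new[]
{
    "Semantic Kernel supports many vector databases.",
    "The weather in Paris is mild in spring.",
    "Vector databases store text embeddings for retrieval."
};

string question = "Which databases store embeddings?";

// Naive relevance score: how many question words appear in the paragraph.
// A real RAG pipeline would compare embedding vectors instead.
double Score(string text, string query) =>
    query.ToLowerInvariant()
         .Split(' ', StringSplitOptions.RemoveEmptyEntries)
         .Count(w => text.ToLowerInvariant().Contains(w.TrimEnd('?')));

// Step 2: retrieve the most relevant paragraphs for the question
var top = paragraphs
    .OrderByDescending(p => Score(p, question))
    .Take(2)
    .ToList();

// Steps 3-4: only these snippets (not the whole corpus) would be sent
// to the large language model along with the question
Console.WriteLine(string.Join("\n", top));
```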
First, in Azure OpenAI Studio, follow the steps above to deploy a text-embedding-3-small model, again noting the endpoint URI and API key. Then add a reference to the Microsoft.SemanticKernel.Plugins.Memory NuGet package to the project, as we intend to run our code with a memory-based text vector database first. Semantic Kernel supports a wide variety of vector databases, such as Sqlite, Azure AI Search, Chroma, Milvus, Pinecone, Qdrant, Weaviate, and more. When adding the reference, you need to use the
--prerelease
flag, because the package is currently in alpha.
Change the above code to the following form:
using System.Text;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.AzureOpenAI;
using Microsoft.SemanticKernel.Memory;
using Microsoft.SemanticKernel.Text;

#pragma warning disable SKEXP0010, SKEXP0001, SKEXP0050

const string CollectionName = "LatestNews";

var apikey = Environment.GetEnvironmentVariable("azureopenaiapikey")!;

// Initialize the Semantic Kernel
var kernel = Kernel.CreateBuilder()
    .AddAzureOpenAIChatCompletion(
        "gpt-4o",                                   // your deployment name
        "https://my-resource.openai.azure.com",     // your endpoint
        apikey)
    .Build();

// Create the text embedding generation service
var textEmbeddingGenerationService = new AzureOpenAITextEmbeddingGenerationService(
    "text-embedding-3-small",
    "https://my-resource.openai.azure.com",         // your endpoint
    apikey);

// Create an in-memory vector database to hold the text embeddings
var memory = new MemoryBuilder()
    .WithMemoryStore(new VolatileMemoryStore())
    .WithTextEmbeddingGeneration(textEmbeddingGenerationService)
    .Build();

// Read content from an external file in Markdown format and split it into paragraphs by semantics
var markdownContent = await File.ReadAllTextAsync(@"news.md"); // path to your saved article
var paragraphs =
    TextChunker.SplitMarkdownParagraphs(
        TextChunker.SplitMarkdownLines(markdownContent.Replace("\r\n", " "), 128),
        64);

// Embed each paragraph and save it to the vector database
for (var i = 0; i < paragraphs.Count; i++)
{
    await memory.SaveInformationAsync(CollectionName, paragraphs[i], $"paragraph{i}");
}

// Create a chat completion service and a chat history object to hold the history of the
// conversation, providing conversation context for the large language model.
var chatCompletionService = kernel.GetRequiredService<IChatCompletionService>();
var chat = new ChatHistory("You are an AI assistant that helps people find information and answer questions");

StringBuilder additionalInfo = new();
StringBuilder chatResponse = new();
while (true)
{
    Console.Write("Please enter a question>> ");
    var question = Console.ReadLine()!;

    additionalInfo.Clear();

    // Find the 3 records in the vector database most similar to the question
    await foreach (var hit in memory.SearchAsync(CollectionName, question, limit: 3))
    {
        additionalInfo.AppendLine(hit.Metadata.Text);
    }

    // If related records were found, add them to the conversation history, remembering
    // where they were inserted so they can be removed again later
    var contextLinesToRemove = -1;
    if (additionalInfo.Length != 0)
    {
        additionalInfo.Insert(0, "Here's some additional information: ");
        contextLinesToRemove = chat.Count;
        chat.AddUserMessage(additionalInfo.ToString());
    }

    // Add the question entered by the user to the conversation
    chat.AddUserMessage(question);

    chatResponse.Clear();

    // Get the response from the large language model and output the result as it streams in
    await foreach (var message in
        chatCompletionService.GetStreamingChatMessageContentsAsync(chat))
    {
        // Output the chunk of the result that was just received
        Console.Write(message);

        // Append the chunk to a temporary variable
        chatResponse.Append(message);
    }
    Console.WriteLine();

    // Add the current answer to the conversation history before moving on to the next
    // Q&A, providing Q&A context for the large language model
    chat.AddAssistantMessage(chatResponse.ToString());

    // Remove the content related to the current question from the conversation history
    if (contextLinesToRemove >= 0) chat.RemoveAt(contextLinesToRemove);

    Console.WriteLine();
}
Rerun the program and ask the same question, and you can see that the answer is now correct:
Now let's look at what is actually in the vector database. Add a reference to the Microsoft.SemanticKernel.Connectors.Sqlite NuGet package (also prerelease), and then change this line in the code above:
.WithMemoryStore(new VolatileMemoryStore())
Change to:
.WithMemoryStore(await SqliteMemoryStore.ConnectAsync("memory.sqlite")) // "memory.sqlite" is a database file name of your choice
Re-run the program. After it executes successfully, look in the bin\Debug\net8.0
directory and you will find the SQLite database file. Open it with a SQLite viewer tool (I used SQLiteStudio) and you can see the tables and data below:
The Metadata field holds the raw data for each paragraph, while the Embedding field holds the text vector, which is actually a series of floating-point values; the distance between these vectors represents how close the texts are in meaning.
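To make "distance between vectors" concrete, a common similarity measure over embedding vectors is cosine similarity. The sketch below uses tiny made-up three-dimensional vectors purely for illustration; real embeddings from text-embedding-3-small have 1536 dimensions:

```csharp
using System;

// Cosine similarity: dot product of two vectors divided by the product of their magnitudes.
// Values near 1 mean similar direction (similar meaning); near 0 means unrelated.
static double CosineSimilarity(float[] a, float[] b)
{
    double dot = 0, magA = 0, magB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        magA += a[i] * a[i];
        magB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(magA) * Math.Sqrt(magB));
}

// Made-up toy embeddings; real ones have 1536 dimensions
float[] cat = { 0.9f, 0.1f, 0.0f };
float[] kitten = { 0.8f, 0.2f, 0.1f };
float[] airplane = { 0.0f, 0.1f, 0.9f };

Console.WriteLine(CosineSimilarity(cat, kitten));   // close to 1: similar meaning
Console.WriteLine(CosineSimilarity(cat, airplane)); // close to 0: unrelated
```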
Using an Ollama-based Local Large Language Model
Semantic Kernel now supports Ollama local large language models, although this support is also still in preview. You can try it by adding the Microsoft.SemanticKernel.Connectors.Ollama NuGet package (again with the --prerelease flag). It is recommended to install the latest version of Ollama and then download two large language models, one of the chat completion type and the other of the text embedding type. I chose the
llama3.2:3b
and mxbai-embed-large
models:
In the code, you only need to replace Azure OpenAI with Ollama:
// Initialize the Semantic Kernel
var kernel = ()
.AddOllamaChatCompletion(
"llama3.2:3b",
new Uri("http://localhost:11434"))
.Build();
// Create a text vector generation service
var textEmbeddingGenerationService = new OllamaTextEmbeddingGenerationService(
"mxbai-embed-large:latest",
new Uri("http://localhost:11434"));
Summary
Through this article's introduction, you should now have a basic understanding of Semantic Kernel, RAG, and their application in C#. Although the article does not cover the underlying principles, it should already provide useful reference value at the application level. Even though some Semantic Kernel plugins are still in the preview stage, we can already see its powerful features: it lets us easily access a variety of popular vector databases, and just as easily switch between different AI large language model services. In integrating AI into applications, Semantic Kernel plays an important role.
References
Parts of this article draw on the official Microsoft document Demystifying Retrieval Augmented Generation with .NET, and some of the code is adapted from it. That article is more detailed, and interested readers are encouraged to read it.