Preamble
The last article introduced an easy RAG practice with SemanticKernel/C#. There I used an online API compatible with the OpenAI format, but in reality there are plenty of local, offline scenarios. Today I'd like to show you how to use Ollama's chat model and embedding model with SemanticKernel/C# for local offline scenarios.
Getting Started
The chat model used in this article is gemma2:2b and the embedding model is all-minilm:latest; pull both in Ollama first.
Ollama added compatibility with the OpenAI Chat Completions API on February 8, 2024; see /blog/openai-compatibility.
So using Ollama's chat model from SemanticKernel/C# is straightforward.
var kernel = Kernel.CreateBuilder()
    .AddOpenAIChatCompletion(modelId: "gemma2:2b", apiKey: null, endpoint: new Uri("http://localhost:11434"))
    .Build();
Just build the kernel this way.
Let's quickly try it out:
public async Task<string> Praise()
{
    var skPrompt = """
        You are an expert at praising people and replying with a one sentence compliment.
        Your reply should be one sentence, not too long and not too short.
        """;
    var result = await _kernel.InvokePromptAsync(skPrompt);
    var str = result.ToString();
    return str;
}
That's all it takes to use Ollama's chat model in SemanticKernel.
Now for the embedding model: since Ollama's embeddings API is not compatible with the OpenAI format, using it directly won't work.
Ollama's native format looks like this:
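According to the Ollama docs at the time, the native embeddings endpoint takes a singular prompt and returns a bare embedding array, roughly like this (using our all-minilm:latest model as the example):
curl http://localhost:11434/api/embeddings -d '{
  "model": "all-minilm:latest",
  "prompt": "Your text string goes here"
}'
with a response of the shape:
{
  "embedding": [ ... ]
}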
The OpenAI request format looks like this:
curl /v1/embeddings \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"input": "Your text string goes here",
"model": "text-embedding-3-small"
}'
OpenAI's return format looks like this:
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        -0.006929283495992422,
        -0.005336422007530928,
        ... (omitted for spacing)
        -4.547132266452536e-05,
        -0.024047505110502243
      ]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 5
  }
}
So simply forwarding the request is not an option.
Someone raised this earlier in an Ollama issue:
There also seem to be preparations to implement embeddings-API compatibility:
As of my testing, it is not yet compatible.
To use Ollama's embedding model, you need to implement some of SemanticKernel's interfaces yourself, but after searching I found that someone has already done this. GitHub address: /BLaZeKiLL/.
For usage, see /BLaZeKiLL//tree/main/dotnet/
The author implements ChatCompletion, EmbeddingGeneration and TextGenerationService. If you only need EmbeddingGeneration, you can read the code and add the relevant classes to your own project, to avoid taking on the whole package.
Here, for convenience, we'll just install the package:
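If you're installing from NuGet, it would look like this (package id taken from the repo name; double-check its README):
dotnet add package Codeblaze.SemanticKernel.Connectors.Ollama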
Build ISemanticTextMemory:
public async Task<ISemanticTextMemory> GetTextMemory3()
{
    var builder = new MemoryBuilder();
    var embeddingEndpoint = "http://localhost:11434";
    var cancellationTokenSource = new CancellationTokenSource();
    var cancellationToken = cancellationTokenSource.Token;
    builder.WithHttpClient(new HttpClient());
    // use Ollama's embedding model via the Codeblaze connector
    builder.WithOllamaTextEmbeddingGeneration("all-minilm:latest", embeddingEndpoint);
    // back the memory with a SQLite store (database file path elided here)
    IMemoryStore memoryStore = await SqliteMemoryStore.ConnectAsync("");
    builder.WithMemoryStore(memoryStore);
    var textMemory = builder.Build();
    return textMemory;
}
Now let's try out the results, making some improvements on what I shared yesterday: today we upload a txt file.
Below is a private document, with the private information replaced:
Dear students:
Hello! To help you spend your wonderful university years safely and smoothly, the school has introduced the "Internet+" college safety education service platform, where you can study safety micro-courses on your phone anytime, anywhere. College life is colorful; firmly grasp safety knowledge and comprehensively improve your safety skills and awareness. Please be sure to complete the course study and examination within the specified study period.
Please complete the study and examination independently in the following way:
1. Mobile learning platform entrance: Follow the WeChat public account "XX University" or scan the QR code below, tap [Academic Navigation] → [XX Microcourse] in the account's menu bar, enter your account (student number) and password (student number), then tap [Log in] to bind your information and enter the learning platform.
2. Web learning platform entrance: open a browser, log in, and once in the platform, study the safety knowledge.
3. Platform open period: April 1, 2024 - April 30, 2024. All courses must be completed before taking the exam. The exam has 50 questions worth 100 points in total, with 80 points to pass; there are three attempts, and the best score counts.
4. Q&A QQ group number: 123123123.
Learning platform login process
1. Mobile learning platform entrance:
Please scan the QR code below and follow the WeChat public account "XX University";
In the public account menu bar, tap [Academic Navigation] → [XX Microcourse], select your school's name, enter your account (student number) and password (student number), tap [Log in] to bind your information, and enter the learning platform;
If you encounter any problems, tap [Online Class Service] or [Frequently Asked Questions] for help (service hours: Monday to Sunday, 8:30-17:00).
2. Web-based learning platform entrance:
Open a browser, log in, and once in the platform, study the safety knowledge.
3. Safety micro-lesson learning and examination
1) Micro-lesson learning
On the homepage, under [Learning Tasks], click [2024 Spring Safety Education] to enter the course;
Expand the list of micro-lessons and click a micro-lesson to start learning;
Most micro-lessons advance by tapping; a few require swiping up or to the left;
When a micro-lesson is finished, a prompt "Congratulations, you have completed this micro-lesson" appears; click [OK] and then [Return to Course List] so the completion status is recorded;
2) Examination
After completing all the micro-lessons in the project, click [Exam Arrangement] → [Take Exam] to take the end-of-course exam.
Upload the document:
Cut into three segments:
Store the data:
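For reference, the chunk-and-store step behind these screenshots looks roughly like the following sketch (the collection name "doc", the filePath variable, and the id scheme are illustrative, not my exact code):
// Minimal sketch: split the uploaded txt into paragraphs and save each into memory.
var input = await File.ReadAllTextAsync(filePath);
var lines = TextChunker.SplitPlainTextLines(input, 20);
var paragraphs = TextChunker.SplitPlainTextParagraphs(lines, 100);
for (var i = 0; i < paragraphs.Count; i++)
{
    // "doc" is an illustrative collection (index) name
    await textMemory.SaveInformationAsync("doc", paragraphs[i], id: $"paragraph{i}");
}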
Ask a question like "What is the Q&A QQ group number?":
It took a while, maybe a couple dozen seconds, but it answered correctly:
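For completeness, the question-answering flow is roughly the following sketch (prompt wording and variable names are illustrative):
// Minimal sketch: retrieve the most relevant chunks and stuff them into the prompt.
var memoryResults = textMemory.SearchAsync("doc", question, limit: 3, minRelevanceScore: 0.3);
var context = new StringBuilder();
await foreach (var memoryResult in memoryResults)
{
    context.AppendLine(memoryResult.Metadata.Text);
}
var prompt = $"""
    Answer the question using only the context below.
    Context: {context}
    Question: {question}
    """;
var answer = await _kernel.InvokePromptAsync(prompt);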
Try another question:
The answer here is not great, and because my machine's specs are modest, running locally is also very slow. If you have the resources, try a different model; if you don't, and you don't strictly need to run offline, you can pick up a free API and combine it with the local embedding model.
Switching to the online Qwen/Qwen2-7B-Instruct API works pretty well:
Summary
The main point of this practice was how to use Ollama's chat model and embedding model with SemanticKernel for local offline scenarios. While practicing RAG, I found that two things affect the quality of the results the most.
The first is the choice of chunk size:
var lines = TextChunker.SplitPlainTextLines(input, 20);
var paragraphs = TextChunker.SplitPlainTextParagraphs(lines, 100);
The second is how many relevant chunks to fetch, together with the relevance setting:
var memoryResults = textMemory.SearchAsync(index, input, limit: 3, minRelevanceScore: 0.3);
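SearchAsync returns an IAsyncEnumerable<MemoryQueryResult>, so when tuning minRelevanceScore it helps to log the scores that actually come back, for example:
await foreach (var r in memoryResults)
{
    // r.Relevance is the similarity score, r.Metadata.Text is the stored chunk
    Console.WriteLine($"{r.Relevance:F2}: {r.Metadata.Text}");
}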
If the relevance threshold is too high, the data may not be found at all; if it's too low, irrelevant data is easily pulled in. It needs to be tuned in practice to a setting that meets your needs.
References
1. /@johnkane24/local-memory-c-semantic-kernel-ollama-and-sqlite-to-manage-chat-memories-locally-9b779fc56432
2. /BLaZeKiLL/