Preface
Over the past two years, AIGC has developed at an astonishing pace. It started with just ChatGPT, and now there are hundreds of models: from the early large-parameter models to the later small-parameter ones, from the single text-modality models of the beginning to today's multimodal models, and so on. Along with this progress has come not only a greater diversity of models but also more ways to use them. The barrier to using large models keeps getting lower, to the point that anyone can now run a model on their own computer. Today we are going to talk about one of the best large-model tools, Ollama, and demonstrate how to use Ollama from C#.
Ollama
Ollama is an open-source large language model (LLM) serving tool that lets users quickly experiment with, manage, and deploy large language models on a local PC. It supports many popular open-source models, such as Llama 3.1, Phi 3, Qwen 2, and GLM 4, and models can easily be downloaded, run, and managed through its command-line interface. Ollama emerged to lower the barrier to using large language models and to make them more widely accessible; in a nutshell, Ollama makes using models easier. It runs on either the CPU or the GPU: with plenty of compute, inference is fast; with too little, it is slow and prone to producing gibberish.
Installation
There are two common ways to install Ollama: download it from the official website, or download it from GitHub; in either case, pick the build for your operating system.
- Download directly from the homepage of the official website/
- GitHub Release download: /ollama/ollama/releases
I run Windows, so I downloaded the installer and just clicked Next all the way through. It installs to the C drive by default and the location cannot be changed; if that bothers you, you can relocate it with mklink, but after an automatic update it ends up back on the C drive anyway. The auto-update piece is nothing to worry about: as long as you are connected to the network, Ollama will push an update whenever a new version is available.
After installation you can adjust a couple of common environment variables (an example of setting them on Windows follows the list):
- The OLLAMA_MODELS environment variable sets where models are downloaded; by default this is on the C drive, and it can be changed to another location.
- OLLAMA_HOST sets the port the Ollama service listens on, 11434 by default.
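On Windows they can be set once with setx (the values here are just illustrative examples, not from the original setup):
setx OLLAMA_MODELS "D:\OllamaModels"
setx OLLAMA_HOST "0.0.0.0:11434"
Restart the terminal and the Ollama service afterwards so the new values are picked up.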
Once installation is complete, check it with the version command; if a version number is displayed, the installation succeeded.
ollama --version
The commonly used commands are few and simple:
- ollama list: list the locally downloaded models
- ollama ps: show the running models
- ollama pull <model id>: download a model locally; for example, to get qwen2 7b, run ollama pull qwen2:7b
- ollama run <model id>: run a model, downloading it first if it is not already local; for example, to run qwen2 7b, run ollama run qwen2:7b
It is also possible to import an existing local GGUF model into Ollama; the steps are easy:
- Write a file named Modelfile with the following content
FROM <model path>/qwen2-0_5b-instruct-q8_0.gguf
- Create the model with Ollama
ollama create qwen2:0.5b -f Modelfile
- Run the model you just created
ollama run qwen2:0.5b
It is important to note that running a 7B model requires at least 8 GB of RAM or VRAM, and a 13B model at least 16 GB. My computer's configuration is as follows:
Model: Xiaoxin Pro16 AI Yuanqi
CPU: AMD Ryzen 7 8845H
Memory: 32.0 GB
The AMD Ryzen 7 8845H has a built-in NPU and decent overall compute; running models of 13B and below is not much of a problem. Of course, models at this parameter scale are no all-rounders; they are relatively cheap to run and suited to inference tasks in specific scenarios. If you need an all-round model, a commercial one such as ChatGPT is recommended.
Running from the command line
Once a model has been downloaded, you can try running it from cmd; for example, here I start up the qwen2:7b model. This approach is the simplest: a plain text conversation with no styling, simple and crude.
Calling the HTTP API
In essence, the service Ollama provides is still an HTTP API, so we can call it directly over HTTP. For text generation there is the /api/generate endpoint:
curl http://localhost:11434/api/generate -d '{
"model": "qwen2:7b",
"prompt": "Can you please tell me what weather you know? Output it in json format",
"stream": false
}'
- model: the name of the model to use
- prompt: the prompt
- stream: set to false to disable streaming of the response

Because everything is returned in one shot, you have to wait a while; if you want streamed output, set it to true instead. After a wait, the endpoint returns the following message:
{
  "created_at": "2024-09-04T06:13:53.1082355Z",
  "response": "```json\n{\n  \"Common weather types\": [\n    { \"Type\": \"Sunny\", \"Description\": \"Cloudless sky or a few thin high clouds, plenty of sunshine during the day.\", \"Symbol\": \"☀️\" },\n    { \"Type\": \"Partly cloudy\", \"Symbol\": \"🌤️\" },\n    { \"Type\": \"Overcast\", \"Symbol\": \"☁️\" },\n    { \"Type\": \"Rain\", \"Sub-types\": [\"Drizzle 🌦️\", \"Moderate rain 🌧️\", \"Heavy rain\"] },\n    { \"Type\": \"Snow\", \"Sub-types\": [\"Moderate snow 🌨️\", \"Heavy snow ❄️💨\"] },\n    { \"Type\": \"Fog\", \"Symbol\": \"🌫️\" },\n    { \"Type\": \"Thunderstorm\", \"Symbol\": \"⚡🌧️\" }\n  ]\n}\n```",
  "done": true,
  "done_reason": "stop",
  "context": [
    151644,
    872,
    198,
    // ... omitted ...
    73594
  ],
  "total_duration": 70172634700,
  "load_duration": 22311300,
  "prompt_eval_count": 19,
  "prompt_eval_duration": 151255000,
  "eval_count": 495,
  "eval_duration": 69997676000
}
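Incidentally, since this is just HTTP, the same call can be made from C# with nothing but HttpClient; a minimal sketch, assuming the service is running locally on the default port:

using System.Net.Http.Json;
using System.Text.Json;

// Minimal sketch: call Ollama's /api/generate endpoint directly over HTTP.
using var http = new HttpClient { BaseAddress = new Uri("http://localhost:11434") };
var resp = await http.PostAsJsonAsync("/api/generate", new
{
    model = "qwen2:7b",
    prompt = "Why is the sky blue?",
    stream = false
});
resp.EnsureSuccessStatusCode();
using var doc = JsonDocument.Parse(await resp.Content.ReadAsStringAsync());
// With stream=false the whole answer arrives in the "response" field
Console.WriteLine(doc.RootElement.GetProperty("response").GetString());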
Another common operation people care about is the embedding model, which extracts features from text (or images, video, and other data) and turns them into vectors. For this you use the /api/embed endpoint; the request format is shown below. The embedding model used here is nomic-embed-text, which you can fetch yourself with ollama pull.
curl http://localhost:11434/api/embed -d '{
"model": "nomic-embed-text:latest",
"input": "I am Chinese, I love my country"
}'
The embed endpoint returns data in the following format:
{
  "model": "nomic-embed-text:latest",
  "embeddings": [
    [
      0.012869273,
      0.015905218,
      -0.13998738,
      // ... many values omitted ...
      -0.035138983,
      -0.03351391
    ]
  ],
  "total_duration": 619728100,
  "load_duration": 572422600,
  "prompt_eval_count": 12
}
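This endpoint is just as easy to hit from C# with a plain HttpClient; a minimal sketch under the same assumptions (local service on the default port, model already pulled):

using System.Net.Http.Json;
using System.Text.Json;

// Minimal sketch: call Ollama's /api/embed endpoint directly.
using var http = new HttpClient { BaseAddress = new Uri("http://localhost:11434") };
var resp = await http.PostAsJsonAsync("/api/embed", new
{
    model = "nomic-embed-text:latest",
    input = "I am Chinese, I love my country"
});
resp.EnsureSuccessStatusCode();
using var doc = JsonDocument.Parse(await resp.Content.ReadAsStringAsync());
// "embeddings" is an array of vectors, one per input string
var vector = doc.RootElement.GetProperty("embeddings")[0];
Console.WriteLine($"dimensions: {vector.GetArrayLength()}");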
Of course, Ollama provides many more endpoints, such as chat and model management; we will not go through them one by one here. If you need them, consult the API documentation: /ollama/ollama/blob/main/docs/
Visualization UI
Above we covered two ways to access the Ollama service: the command line and the HTTP API. Both are rather primitive and not as intuitive as working through a graphical interface. If you want to use Ollama's chat service through a UI, the official GitHub README recommends quite a few options; interested readers can check /ollama/ollama?tab=readme-ov-file#web--desktop for themselves. I went with the first one, Open WebUI. The easy way to run it is directly through Docker:
docker run -d -p 3000:8080 -e OLLAMA_BASE_URL=https://your-ollama-service-ip:11434 -v open-webui:/app/backend/data --name open-webui --restart always /open-webui/open-webui:main
Alternatively, you can build it from source and start it that way, following the commands below step by step:
- git clone /open-webui/
- cd open-webui/
- cp -RPp .env.example .env (copy the example env file and rename it to .env; on Windows use copy .env.example .env)
- npm install
- npm run build
- cd ./backend
- conda create --name open-webui-env python=3.11 (create a conda virtual environment named open-webui-env)
- conda activate open-webui-env (activate the virtual environment)
- pip install -r requirements.txt -U (install the Python dependencies)
- bash start.sh (use start_windows.bat on Windows)
As you can see, the program depends on Node.js and Python, and you also need Conda installed:
- 🐰 Node.js >= 20.10
- 🐍 Python >= 3.11
- Conda: I use 24.5.0
After a successful start, open http://localhost:8080/ in your browser; once you register a username and log in, the interface looks as follows. You can pick a model and chat with it directly, in a conversation style similar to ChatGPT.
C# Integration with Ollama
Above we covered Ollama's basic installation and use, and saw that calls to it go through its HTTP API. I could write my own wrapper against the API documentation, but there is no need: plenty of ready-made SDKs can be used directly.
Using the Ollama SDK
The C# SDK used here is called Ollama, and its GitHub address is /tryAGI/Ollama. The reason for choosing it is simple: it supports function call, which lets us try out the newer features early. Installing it is very easy:
dotnet add package Ollama --version 1.9.0
Simple conversation
Getting started with a simple conversation is not difficult; the code is all straightforward.
string modelName = "qwen2:7b";
using var ollama = new OllamaApiClient(baseUri: new Uri("http://127.0.0.1:11434/api"));
("Starting a conversation!!!");
string userInput = "";
do
{
("User:");
userInput = ()!;
var enumerable = (modelName, userInput);
("Agent:");
await foreach (var response in enumerable)
{
($"{}");
}
();
} while (!(userInput, "exit", ));
("Conclusion of the dialogue!!!");
The model name must be passed, and output is streamed by default; if you want the whole response returned at once, set stream to false (see the sketch below). The example uses the qwen2:7b model. Once it is running you can converse with it directly, as follows. Overall, among domestic models qwen2:7b performs quite well; at least it does not spout obvious nonsense.
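For completeness, a minimal sketch of the non-streaming variant (my assumption here: with stream: false the async enumerable yields one complete response instead of incremental chunks):

// Assumption: stream: false returns the whole answer as a single item.
var whole = ollama.Completions.GenerateCompletionAsync(modelName, "Introduce yourself", stream: false);
await foreach (var response in whole)
{
    Console.WriteLine(response.Response); // the full answer in one piece
}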
Multi-round conversation
If you need a multi-round conversation with distinct roles, it is used differently: use the provided Chat object, as follows.
string modelName = "glm4:9b";
using var ollama = new OllamaApiClient(baseUri: new Uri("http://127.0.0.1:11434/api"));
("Starting a conversation!!!");
string userInput = "";
List<Message> messages = [];
do
{
//Take only the five most recent messages
messages = (5).ToList();
("User:");
userInput = ()!;
//Add a user message
(new Message(, userInput));
var enumerable = (modelName, messages, stream: true);
("Agent:");
StringBuilder builder = new();
await foreach (var response in enumerable)
{
string content = ;
(content);
(content);
}
//Adding Machine Messages
(new Message(, ()));
();
} while (!(userInput, "exit", ));
("Conclusion of the dialogue!!!");
This time another domestic model, glm4:9b, is used. Multi-round and single-turn conversations use different objects:
- Single-turn conversations use the Completions object, while multi-round conversations use the Chat object.
- A multi-round conversation needs a List<Message> to store the previous conversation records; only then can the model pick up the context.
Running it, the execution looks like this: the first time I asked it whether it knows C#, and it said a bunch of things to indicate that it does. The second time I asked it to write a simple example; I did not say a C# example, but it could infer the intent from the preceding dialog, so it wrote me an example directly in C#.
Function call
Ollama supports function call, which of course requires the model to support it as well; it has no effect if the model itself does not. llama3.1 supports it rather well; the catch is that llama3.1's Chinese support is not great. So let's demonstrate briefly, here using the llama3.1:8b model. First you need to define the methods: when you talk to the model, the framework extracts the methods' metadata and sends it to the model, so the model can decide which one to call. Here I simply define an interface for addition, subtraction, multiplication, and division, and implement it.
// Define an interface that provides the metadata
[OllamaTools]
public interface IMathFunctions
{
[Description("Add two numbers")]
int Add(int a, int b);
[Description("Subtract two numbers")]
int Subtract(int a, int b);
[Description("Multiply two numbers")]
int Multiply(int a, int b);
[Description("Divide two numbers")]
int Divide(int a, int b);
}
// Implement the interface above to provide the concrete methods
public class MathService : IMathFunctions
{
public int Add(int a, int b) => a + b;
public int Subtract(int a, int b) => a - b;
public int Multiply(int a, int b) => a * b;
public int Divide(int a, int b) => a / b;
}
With the interface and implementation class above in place, we can use them through Ollama as follows.
string modelName = "llama3.1:8b";
using var ollama = new OllamaApiClient(baseUri: new Uri("http://127.0.0.1:11434/api"));
var chat = (
model: modelName,
systemMessage: "You are a helpful assistant.",
autoCallTools: true);
//do sth (for sb)OllamaRegister the class you just defined
var mathService = new MathService();
((), ());
while (true)
{
try
{
("User>");
var newMessage = ();
var msg = await (newMessage);
("Agent> " + );
}
finally
{
//Print all messages of this dialog
(());
}
}
Here autoCallTools needs to be set to true so that the methods are invoked automatically; the PrintMessages() method prints all messages in the current session. Normally a function call generates multiple requests, but when using it we do not notice, because the framework handles it all automatically. For example, my prompt is the math formula (12+8)*4/2=?, as shown below.
The conversation messages printed by the PrintMessages() method show that, although I only provided a single prompt, the Ollama SDK, thanks to its auto-call-tools support, had llama3.1:8b split the formula (12+8)*4/2 apart; the calculation steps are as follows:
- First split out the expression in the parentheses, 12+8, and call the Add method to get the result 20.
- The second step uses the result of the previous one and calls Multiply to compute 20*4, getting 80.
- Then the previous result is used to call Divide to compute 80/2, with result 40.
- Finally, the steps and results of the tool calls are sent as a conversation to the llama3.1 model, and the model produces the final output.
If we do not print the process log, the model simply outputs:
Assistant:
The correct calculation is:
(12+8) = 20
20*4 = 80
80/2 = 40
Therefore, the answer is: 40.
Embedding model
As we mentioned above, Ollama can serve not only conversation models but also embedding models. An embedding model, in simple terms, uses a model to extract features from text, images, speech, and the like, producing vector data. Through the Ollama SDK we can use Ollama's embedding capability; the code is shown below.
string modelName = "nomic-embed-text:latest";
HttpClient client = new HttpClient();
= new Uri("http://127.0.0.1:11434/api");
= (3000);
using var ollama = new OllamaApiClient(client);
var embeddingResp = await (modelName, "c#It's a good programming language.");
($"[{(",", !)}]");
What you get is the vector information shown below
Vector data can be used to compute similarity: using the cosine of the angle between two vectors, you can measure their distance in space, and the closer they are, the more similar they are. This is easier to understand if you know RGB color tables. For example, (255, 0, 0) is pure red, and (255, 10, 10) is also red, just not pure red. If you map (255, 0, 0) and (255, 10, 10) into a three-dimensional coordinate space, they sit close together, while both are far away from pure blue (0, 0, 255), roughly like one point near the X axis and another near the Z axis. Vector databases, which everyone has heard of by now, probably use a similar principle; this is also the basis of retrieval in the now-popular RAG (retrieval-augmented generation).
For example, I fed the following two sentences to the embedding model to get their vectors, then compared their similarity via the cosine of the angle.
var embeddingResp = await ollama.Embeddings.GenerateEmbeddingAsync(modelName, "C# is a good programming language.");
var embeddingResp2 = await ollama.Embeddings.GenerateEmbeddingAsync(modelName, "C# is a good language.");
Console.WriteLine("similarity:" + CosineSimilarity([.. embeddingResp.Embedding!], [.. embeddingResp2!.Embedding]));
// Compute the cosine similarity of two vectors
public static double CosineSimilarity(double[] vector1, double[] vector2)
{
    if (vector1.Length != vector2.Length)
        throw new ArgumentException("The vectors must have the same length");
    double dotProduct = 0.0;
    double magnitude1 = 0.0;
    double magnitude2 = 0.0;
    for (int i = 0; i < vector1.Length; i++)
    {
        dotProduct += vector1[i] * vector2[i];
        magnitude1 += vector1[i] * vector1[i];
        magnitude2 += vector2[i] * vector2[i];
    }
    magnitude1 = Math.Sqrt(magnitude1);
    magnitude2 = Math.Sqrt(magnitude2);
    if (magnitude1 == 0.0 || magnitude2 == 0.0)
        return 0.0; // Avoid dividing by zero
    return dotProduct / (magnitude1 * magnitude2);
}
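As a quick sanity check, plugging the RGB example from earlier into this helper (my own toy numbers, not from the original post) behaves as expected:

// Two shades of red are nearly parallel; red vs. blue is orthogonal.
Console.WriteLine(CosineSimilarity([255, 0, 0], [255, 10, 10])); // ~0.998
Console.WriteLine(CosineSimilarity([255, 0, 0], [0, 0, 255]));   // 0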
The similarity result obtained above is
Similarity: 0.9413230998586363
The similarity is high because the two sentences express more or less the same meaning. But suppose I compute the similarity of the following two sentences instead:
var embeddingResp = await ollama.Embeddings.GenerateEmbeddingAsync(modelName, "c# is a nice programming language");
var embeddingResp2 = await ollama.Embeddings.GenerateEmbeddingAsync(modelName, "I like to eat mango and strawberry");
Then by cosine similarity they only score about 0.59, because the two statements are barely related at all.
Similarity: 0.5948448463206064
Multimodal model
Early conversation models were fairly limited: simple text conversations only. With continuous upgrades, some models now support input and output in multiple formats rather than just text, such as images, video, and speech; these are called multimodal models. Let's use Ollama with the llava model to try this out; here I am using llava:13b. I found a random image online and saved it locally.
Using this image, I ask the model questions with the following code.
HttpClient client = new HttpClient();
client.BaseAddress = new Uri("http://127.0.0.1:11434/api");
client.Timeout = TimeSpan.FromSeconds(3000);
using var ollama = new OllamaApiClient(client);
string modelName = "llava:13b";
string prompt = "What is in this picture?";
var image = new Bitmap(""); // path to the local image
var enumerable = ollama.Completions.GenerateCompletionAsync(modelName, prompt, images: [BitmapToBase64(image)], stream: true);
await foreach (var response in enumerable)
{
    Console.Write($"{response.Response}");
}

// Convert the image to a base64 string
public static string BitmapToBase64(Bitmap bitmap)
{
    MemoryStream ms1 = new MemoryStream();
    bitmap.Save(ms1, ImageFormat.Jpeg);
    byte[] arr1 = new byte[ms1.Length];
    ms1.Position = 0;
    ms1.Read(arr1, 0, (int)ms1.Length);
    ms1.Close();
    return Convert.ToBase64String(arr1);
}
I used the prompt to have the model describe what is in the image, converted the image to base64, and sent both to the model; the model returned the following. It is indeed powerful: it describes the information accurately and the wording is quite good. If a person were asked to describe what is in the picture, I suspect most people would not do it as well. I have to say the models are getting more and more capable.
Using SemanticKernel
In addition to integrating the Ollama SDK, you can also use Semantic Kernel to integrate Ollama. We know that by default Semantic Kernel only works with the OpenAI and Azure OpenAI interface formats, and other models' interfaces are not necessarily compatible with the OpenAI format; sometimes you even need a service like one-api to adapt them. But don't worry: Ollama is compatible with the OpenAI interface format, so it can be used directly without any adaptation service; we only need to redirect the request address.
using HttpClient httpClient = new HttpClient(new RedirectingHandler());
httpClient.Timeout = TimeSpan.FromMinutes(120);
var kernelBuilder = Kernel.CreateBuilder()
    .AddOpenAIChatCompletion(
        modelId: "glm4:9b",
        apiKey: "ollama",
        httpClient: httpClient);
Kernel kernel = kernelBuilder.Build();
var chatCompletionService = kernel.GetRequiredService<IChatCompletionService>();
OpenAIPromptExecutionSettings openAIPromptExecutionSettings = new()
{
    ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions
};
var history = new ChatHistory();
string? userInput;
do
{
    Console.Write("User > ");
    userInput = Console.ReadLine();
    history.AddUserMessage(userInput!);
    var result = chatCompletionService.GetStreamingChatMessageContentsAsync(
        history,
        executionSettings: openAIPromptExecutionSettings,
        kernel: kernel);
    string fullMessage = "";
    Console.Write("Assistant > ");
    await foreach (var content in result)
    {
        Console.Write(content.Content);
        fullMessage += content.Content;
    }
    Console.WriteLine();
    history.AddAssistantMessage(fullMessage);
} while (userInput is not null);

public class RedirectingHandler : HttpClientHandler
{
    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var uriBuilder = new UriBuilder(request.RequestUri!) { Scheme = "http", Host = "localhost", Port = 11434 };
        // Chat (conversation) model
        if (request!.RequestUri!.AbsolutePath.Contains("v1/chat/completions"))
        {
            uriBuilder.Path = "/v1/chat/completions";
            request.RequestUri = uriBuilder.Uri;
        }
        // Embedding model
        if (request!.RequestUri!.AbsolutePath.Contains("v1/embeddings"))
        {
            uriBuilder.Path = "/v1/embeddings";
            request.RequestUri = uriBuilder.Uri;
        }
        return base.SendAsync(request, cancellationToken);
    }
}
Here we use the domestic model glm4:9b. Note that because we are using a local service, we need to adapt the service address; we do that by writing the RedirectingHandler class, using it to construct an HttpClient instance, and passing that to the Kernel. Careful readers may have noticed that the redirected Ollama service paths are the same as the OpenAI service paths, whereas the Ollama endpoints I called above were addresses like /api/chat and /api/embed. This is because Ollama, in order to be compatible with the OpenAI standard, provides a dedicated set of endpoints with the same paths and parameters as OpenAI's; this is something to be aware of. Of course, Ollama is not yet compatible with every feature of the OpenAI interface; those interested can see the documentation at /ollama/ollama/blob/main/docs/ for more details. An example of calling the OpenAI-compatible endpoint directly is shown below.
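You can verify the compatibility layer with a plain curl call against the OpenAI-style path (a quick sketch, assuming the default local port):

curl http://localhost:11434/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "glm4:9b",
    "messages": [{"role": "user", "content": "Hello!"}]
}'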
With the service above running, we can chat just as before; it works as follows. Similarly, you can use the embedding model's capability through SemanticKernel, as follows.
using HttpClient httpClient = new HttpClient(new RedirectingHandler());
httpClient.Timeout = TimeSpan.FromMinutes(120);
var kernelBuilder = Kernel.CreateBuilder()
    .AddOpenAITextEmbeddingGeneration(
        modelId: "nomic-embed-text:latest",
        apiKey: "ollama",
        httpClient: httpClient);
Kernel kernel = kernelBuilder.Build();
var embeddingService = kernel.GetRequiredService<ITextEmbeddingGenerationService>();
var embeddings = await embeddingService.GenerateEmbeddingsAsync(["I think C# is a good programming language."]);
Console.WriteLine($"[{string.Join(",", embeddings[0].ToArray())}]");
One thing to note here: the AddOpenAITextEmbeddingGeneration method is an experimental method that may be removed in future versions, so using it in VS produces an error alert by default. You can suppress the warning by setting NoWarn in the csproj's PropertyGroup.
<PropertyGroup>
<OutputType>Exe</OutputType>
<TargetFramework>net8.0</TargetFramework>
<NoWarn>SKEXP0010;SKEXP0001</NoWarn>
</PropertyGroup>
Summary
This article described how to combine C# with Ollama to deploy and call large language models locally, focusing on the concrete steps for integrating the capability into a C# application, with detailed installation instructions and code examples to help developers get started quickly.
- First we introduced Ollama's installation, basic settings, and commands.
- Then we described how to call large models through Ollama: via the command line, the HTTP API, and a visual interface.
- Next, using C# and the Ollama SDK, we demonstrated how to use conversation models, text embedding, and multimodal models, touching on similarity calculation along the way.
- Finally, we showed how to call the Ollama service through Semantic Kernel; since Ollama is compatible with the data format of the OpenAI interface, daily use is not a problem even though some parts remain incompatible.
Through this article, I hope that readers who have not yet looked into large models can get started, or at least roughly understand the relevant basics; after all, this has been a hot direction for the past two years and will remain one for the next few. Even if we cannot study it in depth, we should know it, understand its basic principles, and know how to use it. Why keep learning? Because these things can often genuinely make our work easier. Get exposed to it and understand it, so you really know what problems it can help you solve.