Preface
Over the past two years, AIGC has developed at an astonishing pace. It started with just ChatGPT, and now there are hundreds of models: from the early large-parameter models to the later small-parameter ones, from the single text-modality models of the beginning to today's multimodal models, and so on. Along with this progress has come not only a greater diversity of models but also more ways to use them. The barrier to using large models keeps getting lower, to the point that anyone can now run a model on their own computer. Today we are going to talk about one of the best large-model tools, Ollama, and demonstrate how to use Ollama from C#.
Ollama
Ollama is an open-source large language model (LLM) serving tool that lets users quickly experiment with, manage, and deploy large language models on a local PC. It supports many popular open-source models, such as Llama 3.1, Phi 3, Qwen 2, and GLM 4, and models can easily be downloaded, run, and managed through its command-line interface. Ollama emerged to lower the barrier to using large language models and to make them more widely accessible; in a nutshell, Ollama makes using models easier. It runs on either the CPU or the GPU: with plenty of compute, inference is fast; with too little, it is slow and prone to producing gibberish.
Installation
There are two common ways to install Ollama: download it from the official website, or download it from GitHub; in either case, pick the build for your operating system.
- Download directly from the homepage of the official website/
- GitHub Release download: /ollama/ollama/releases
I run Windows, so I downloaded the installer and just clicked Next all the way through. It installs to the C drive by default and the location cannot be changed; if that bothers you, you can relocate it with mklink, but after an automatic update it ends up back on the C drive anyway. The auto-update piece is nothing to worry about: as long as you are connected to the network, Ollama will push an update whenever a new version is available.
After installation you can adjust a couple of common environment variables (an example of setting them on Windows follows the list):
- The OLLAMA_MODELS environment variable sets where models are downloaded; by default this is on the C drive, and it can be changed to another location.
- OLLAMA_HOST sets the port the Ollama service listens on, 11434 by default.
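On Windows they can be set once with setx (the values here are just illustrative examples, not from the original setup):
setx OLLAMA_MODELS "D:\OllamaModels"
setx OLLAMA_HOST "0.0.0.0:11434"
Restart the terminal and the Ollama service afterwards so the new values are picked up.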
Once installation is complete, check it with the version command; if a version number is displayed, the installation succeeded.
ollama --version
The commonly used commands are few and simple:
- ollama list: list the locally downloaded models
- ollama ps: show the running models
- ollama pull <model id>: download a model locally; for example, to get qwen2 7b, run ollama pull qwen2:7b
- ollama run <model id>: run a model, downloading it first if it is not already local; for example, to run qwen2 7b, run ollama run qwen2:7b
It is also possible to import an existing local GGUF model into Ollama; the steps are easy:
- Write a file named Modelfile with the following content
FROM <model path>/qwen2-0_5b-instruct-q8_0.gguf
- Create the model with Ollama
ollama create qwen2:0.5b -f Modelfile
- Run the model you just created
ollama run qwen2:0.5b
It is important to note that running a 7B model requires at least 8 GB of RAM or VRAM, and a 13B model at least 16 GB. My computer's configuration is as follows:
Model: Xiaoxin Pro16 AI Yuanqi
CPU: AMD Ryzen 7 8845H
Memory: 32.0 GB
The AMD Ryzen 7 8845H has a built-in NPU and decent overall compute; running models of 13B and below is not much of a problem. Of course, models at this parameter scale are no all-rounders; they are relatively cheap to run and suited to inference tasks in specific scenarios. If you need an all-round model, a commercial one such as ChatGPT is recommended.
Running from the command line
Once a model has been downloaded, you can try running it from cmd; for example, here I start up the qwen2:7b model. This approach is the simplest: a plain text conversation with no styling, simple and crude.
Calling the HTTP API
In essence, the service Ollama provides is still an HTTP API, so we can call it directly over HTTP. For text generation there is the /api/generate endpoint:
curl http://localhost:11434/api/generate -d '{
"model": "qwen2:7b",
"prompt": "Can you please tell me what weather you know? Output it in json format",
"stream": false
}'
- model: the name of the model to use
- prompt: the prompt
- stream: set to false to disable streaming of the response

Because everything is returned in one shot, you have to wait a while; if you want streamed output, set it to true instead. After a wait, the endpoint returns the following message:
{
  "created_at": "2024-09-04T06:13:53.1082355Z",
  "response": "```json\n{\n  \"Common weather types\": [\n    { \"Type\": \"Sunny\", \"Description\": \"Cloudless sky or a few thin high clouds, plenty of sunshine during the day.\", \"Symbol\": \"☀️\" },\n    { \"Type\": \"Partly cloudy\", \"Symbol\": \"🌤️\" },\n    { \"Type\": \"Overcast\", \"Symbol\": \"☁️\" },\n    { \"Type\": \"Rain\", \"Sub-types\": [\"Drizzle 🌦️\", \"Moderate rain 🌧️\", \"Heavy rain\"] },\n    { \"Type\": \"Snow\", \"Sub-types\": [\"Moderate snow 🌨️\", \"Heavy snow ❄️💨\"] },\n    { \"Type\": \"Fog\", \"Symbol\": \"🌫️\" },\n    { \"Type\": \"Thunderstorm\", \"Symbol\": \"⚡🌧️\" }\n  ]\n}\n```",
  "done": true,
  "done_reason": "stop",
  "context": [
    151644,
    872,
    198,
    // ... omitted ...
    73594
  ],
  "total_duration": 70172634700,
  "load_duration": 22311300,
  "prompt_eval_count": 19,
  "prompt_eval_duration": 151255000,
  "eval_count": 495,
  "eval_duration": 69997676000
}
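Incidentally, since this is just HTTP, the same call can be made from C# with nothing but HttpClient; a minimal sketch, assuming the service is running locally on the default port:

using System.Net.Http.Json;
using System.Text.Json;

// Minimal sketch: call Ollama's /api/generate endpoint directly over HTTP.
using var http = new HttpClient { BaseAddress = new Uri("http://localhost:11434") };
var resp = await http.PostAsJsonAsync("/api/generate", new
{
    model = "qwen2:7b",
    prompt = "Why is the sky blue?",
    stream = false
});
resp.EnsureSuccessStatusCode();
using var doc = JsonDocument.Parse(await resp.Content.ReadAsStringAsync());
// With stream=false the whole answer arrives in the "response" field
Console.WriteLine(doc.RootElement.GetProperty("response").GetString());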
Another common operation people care about is the embedding model, which extracts features from text (or images, video, and other data) and turns them into vectors. For this you use the /api/embed endpoint; the request format is shown below. The embedding model used here is nomic-embed-text, which you can fetch yourself with ollama pull.
curl http://localhost:11434/api/embed -d '{
"model": "nomic-embed-text:latest",
"input": "I am Chinese, I love my country"
}'
The embed endpoint returns data in the following format:
{
  "model": "nomic-embed-text:latest",
  "embeddings": [
    [
      0.012869273,
      0.015905218,
      -0.13998738,
      // ... many values omitted ...
      -0.035138983,
      -0.03351391
    ]
  ],
  "total_duration": 619728100,
  "load_duration": 572422600,
  "prompt_eval_count": 12
}
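This endpoint is just as easy to hit from C# with a plain HttpClient; a minimal sketch under the same assumptions (local service on the default port, model already pulled):

using System.Net.Http.Json;
using System.Text.Json;

// Minimal sketch: call Ollama's /api/embed endpoint directly.
using var http = new HttpClient { BaseAddress = new Uri("http://localhost:11434") };
var resp = await http.PostAsJsonAsync("/api/embed", new
{
    model = "nomic-embed-text:latest",
    input = "I am Chinese, I love my country"
});
resp.EnsureSuccessStatusCode();
using var doc = JsonDocument.Parse(await resp.Content.ReadAsStringAsync());
// "embeddings" is an array of vectors, one per input string
var vector = doc.RootElement.GetProperty("embeddings")[0];
Console.WriteLine($"dimensions: {vector.GetArrayLength()}");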
Of course, Ollama provides many more endpoints, such as chat and model management; we will not go through them one by one here. If you need them, consult the API documentation: /ollama/ollama/blob/main/docs/
Visualization UI
Above we covered two ways to access the Ollama service: the command line and the HTTP API. Both are rather primitive and not as intuitive as working through a graphical interface. If you want to use Ollama's chat service through a UI, the official GitHub README recommends quite a few options; interested readers can check /ollama/ollama?tab=readme-ov-file#web--desktop for themselves. I went with the first one, Open WebUI. The easy way to run it is directly through Docker:
docker run -d -p 3000:8080 -e OLLAMA_BASE_URL=https://your-ollama-service-ip:11434 -v open-webui:/app/backend/data --name open-webui --restart always /open-webui/open-webui:main
Alternatively, you can build it from source and start it that way, following the commands below step by step:
- git clone /open-webui/
- cd open-webui/
- cp -RPp .env.example .env (copy the example env file and rename it to .env; on Windows use copy .env.example .env)
- npm install
- npm run build
- cd ./backend
- conda create --name open-webui-env python=3.11 (create a conda virtual environment named open-webui-env)
- conda activate open-webui-env (activate the virtual environment)
- pip install -r requirements.txt -U (install the Python dependencies)
- bash start.sh (use start_windows.bat on Windows)
As you can see, the program depends on Node.js and Python, and you also need Conda installed:
- 🐰 Node.js >= 20.10
- 🐍 Python >= 3.11
- Conda: I use 24.5.0
After a successful start, open http://localhost:8080/ in your browser; once you register a username and log in, the interface looks as follows. You can pick a model and chat with it directly, in a conversation style similar to ChatGPT.
C# Integration with Ollama
Above we covered Ollama's basic installation and use, and saw that calls to it go through its HTTP API. I could write my own wrapper against the API documentation, but there is no need: plenty of ready-made SDKs can be used directly.
Using the Ollama SDK
The C# SDK used here is called Ollama, and its GitHub address is /tryAGI/Ollama. The reason for choosing it is simple: it supports function call, which lets us try out the newer features early. Installing it is very easy:
dotnet add package Ollama --version 1.9.0
Simple conversation
Getting started with a simple conversation is not difficult; the code is all straightforward.
string modelName = "qwen2:7b";
using var ollama = new OllamaApiClient(baseUri: new Uri("http://127.0.0.1:11434/api"));
("Starting a conversation!!!");
string userInput = "";
do
{
("User:");
userInput = ()!;
var enumerable = (modelName, userInput);
("Agent:");
await foreach (var response in enumerable)
{
($"{}");
}
();
} while (!(userInput, "exit", ));
("Conclusion of the dialogue!!!");
The model name must be passed, and output is streamed by default; if you want the whole response returned at once, set stream to false (see the sketch below). The example uses the qwen2:7b model. Once it is running you can converse with it directly, as follows. Overall, among domestic models qwen2:7b performs quite well; at least it does not spout obvious nonsense.
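For completeness, a minimal sketch of the non-streaming variant (my assumption here: with stream: false the async enumerable yields one complete response instead of incremental chunks):

// Assumption: stream: false returns the whole answer as a single item.
var whole = ollama.Completions.GenerateCompletionAsync(modelName, "Introduce yourself", stream: false);
await foreach (var response in whole)
{
    Console.WriteLine(response.Response); // the full answer in one piece
}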
Multi-round conversation
If you need a multi-round conversation with distinct roles, it is used differently: use the provided Chat object, as follows.
string modelName = "glm4:9b";
using var ollama = new OllamaApiClient(baseUri: new Uri("http://127.0.0.1:11434/api"));
("Starting a conversation!!!");
string userInput = "";
List<Message> messages = [];
do
{
//Take only the five most recent messages
messages = (5).ToList();
("User:");
userInput = ()!;
//Add a user message
(new Message(, userInput));
var enumerable = (modelName, messages, stream: true);
("Agent:");
StringBuilder builder = new();
await foreach (var response in enumerable)
{
string content = ;
(content);
(content);
}
//Adding Machine Messages
(new Message(, ()));
();
} while (!(userInput, "exit", ));
("Conclusion of the dialogue!!!");
This time another domestic model, glm4:9b, is used. Multi-round and single-turn conversations use different objects:
- Single-turn conversations use the Completions object, while multi-round conversations use the Chat object.
- A multi-round conversation needs a List<Message> to store the previous conversation records; only then can the model pick up the context.
Running it, the execution looks like this: the first time I asked it whether it knows C#, and it said a bunch of things to indicate that it does. The second time I asked it to write a simple example; I did not say a C# example, but it could infer the intent from the preceding dialog, so it wrote me an example directly in C#.
Function call
Ollama supports function call, which of course requires the model to support it as well; it has no effect if the model itself does not. llama3.1 supports it rather well; the catch is that llama3.1's Chinese support is not great. So let's demonstrate briefly, here using the llama3.1:8b model. First you need to define the methods: when you talk to the model, the framework extracts the methods' metadata and sends it to the model, so the model can decide which one to call. Here I simply define an interface for addition, subtraction, multiplication, and division, and implement it.
// Define an interface that provides the metadata
[OllamaTools]
public interface IMathFunctions
{
[Description("Add two numbers")]
int Add(int a, int b);
[Description("Subtract two numbers")]
int Subtract(int a, int b);
[Description("Multiply two numbers")]
int Multiply(int a, int b);
[Description("Divide two numbers")]
int Divide(int a, int b);
}
// Implement the interface above to provide the concrete methods
public class MathService : IMathFunctions
{
public int Add(int a, int b) => a + b;
public int Subtract(int a, int b) => a - b;
public int Multiply(int a, int b) => a * b;
public int Divide(int a, int b) => a / b;
}
With the interface and implementation class above in place, we can use them through Ollama as follows.
string modelName = "llama3.1:8b";
using var ollama = new OllamaApiClient(baseUri: new Uri("http://127.0.0.1:11434/api"));
var chat = (
model: modelName,
systemMessage: "You are a helpful assistant.",
autoCallTools: true);
//do sth (for sb)OllamaRegister the class you just defined
var mathService = new MathService();
((), ());
while (true)
{
try
{
("User>");
var newMessage = ();
var msg = await (newMessage);
("Agent> " + );
}
finally
{
//Print all messages of this dialog
(());
}
}
Here autoCallTools needs to be set to true so that the methods are invoked automatically; the PrintMessages() method prints all messages in the current session. Normally a function call generates multiple requests, but when using it we do not notice, because the framework handles it all automatically. For example, my prompt is the math formula (12+8)*4/2=?, as shown below.
The conversation messages printed by the PrintMessages() method show that, although I only provided a single prompt, the Ollama SDK, thanks to its auto-call-tools support, had llama3.1:8b split the formula (12+8)*4/2 apart; the calculation steps are as follows:
- First split out the expression in the parentheses, 12+8, and call the Add method to get the result 20.
- The second step uses the result of the previous one and calls Multiply to compute 20*4, getting 80.
- Then the previous result is used to call Divide to compute 80/2, with result 40.
- Finally, the steps and results of the tool calls are sent as a conversation to the llama3.1 model, and the model produces the final output.
If we do not print the process log, the model simply outputs:
Assistant:
The correct calculation is:
(12+8) = 20
20*4 = 80
80/2 = 40
Therefore, the answer is: 40.
Embedding model
As we mentioned above, Ollama can serve not only conversation models but also embedding models. An embedding model, in simple terms, uses a model to extract features from text, images, speech, and the like, producing vector data. Through the Ollama SDK we can use Ollama's embedding capability; the code is shown below.
string modelName = "nomic-embed-text:latest";
HttpClient client = new HttpClient();
= new Uri("http://127.0.0.1:11434/api");
= (3000);
using var ollama = new OllamaApiClient(client);
var embeddingResp = await (modelName, "c#It's a good programming language.");
($"[{(",", !)}]");
What you get is the vector information shown below
Vector data can be used to compute similarity: using the cosine of the angle between two vectors, you can measure their distance in space, and the closer they are, the more similar they are. This is easier to understand if you know RGB color tables. For example, (255, 0, 0) is pure red, and (255, 10, 10) is also red, just not pure red. If you map (255, 0, 0) and (255, 10, 10) into a three-dimensional coordinate space, they sit close together, while both are far away from pure blue (0, 0, 255), roughly like one point near the X axis and another near the Z axis. Vector databases, which everyone has heard of by now, probably use a similar principle; this is also the basis of retrieval in the now-popular RAG (retrieval-augmented generation).
For example, I fed the following two sentences to the embedding model to get their vectors, then compared their similarity via the cosine of the angle.
var embeddingResp = await ollama.Embeddings.GenerateEmbeddingAsync(modelName, "C# is a good programming language.");
var embeddingResp2 = await ollama.Embeddings.GenerateEmbeddingAsync(modelName, "C# is a good language.");
Console.WriteLine("similarity:" + CosineSimilarity([.. embeddingResp.Embedding!], [.. embeddingResp2!.Embedding]));
// Compute the cosine similarity of two vectors
public static double CosineSimilarity(double[] vector1, double[] vector2)
{
    if (vector1.Length != vector2.Length)
        throw new ArgumentException("The vectors must have the same length");
    double dotProduct = 0.0;
    double magnitude1 = 0.0;
    double magnitude2 = 0.0;
    for (int i = 0; i < vector1.Length; i++)
    {
        dotProduct += vector1[i] * vector2[i];
        magnitude1 += vector1[i] * vector1[i];
        magnitude2 += vector2[i] * vector2[i];
    }
    magnitude1 = Math.Sqrt(magnitude1);
    magnitude2 = Math.Sqrt(magnitude2);
    if (magnitude1 == 0.0 || magnitude2 == 0.0)
        return 0.0; // Avoid dividing by zero
    return dotProduct / (magnitude1 * magnitude2);
}
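As a quick sanity check, plugging the RGB example from earlier into this helper (my own toy numbers, not from the original post) behaves as expected:

// Two shades of red are nearly parallel; red vs. blue is orthogonal.
Console.WriteLine(CosineSimilarity([255, 0, 0], [255, 10, 10])); // ~0.998
Console.WriteLine(CosineSimilarity([255, 0, 0], [0, 0, 255]));   // 0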
The similarity result obtained above is
Similarity: 0.9413230998586363
The similarity is high because the two sentences express more or less the same meaning. But suppose I compute the similarity of the following two sentences instead:
var embeddingResp = await ollama.Embeddings.GenerateEmbeddingAsync(modelName, "c# is a nice programming language");
var embeddingResp2 = await ollama.Embeddings.GenerateEmbeddingAsync(modelName, "I like to eat mango and strawberry");
Then by cosine similarity they only score about 0.59, because the two statements are barely related at all.
Similarity: 0.5948448463206064
Multimodal model
Early conversation models were fairly limited: simple text conversations only. With continuous upgrades, some models now support input and output in multiple formats rather than just text, such as images, video, and speech; these are called multimodal models. Let's use Ollama with the llava model to try this out; here I am using llava:13b. I found a random image online and saved it locally.
Using this image, I ask the model questions with the following code.
HttpClient client = new HttpClient();
client.BaseAddress = new Uri("http://127.0.0.1:11434/api");
client.Timeout = TimeSpan.FromSeconds(3000);
using var ollama = new OllamaApiClient(client);
string modelName = "llava:13b";
string prompt = "What is in this picture?";
var image = new Bitmap(""); // path to the local image
var enumerable = ollama.Completions.GenerateCompletionAsync(modelName, prompt, images: [BitmapToBase64(image)], stream: true);
await foreach (var response in enumerable)
{
    Console.Write($"{response.Response}");
}

// Convert the image to a base64 string
public static string BitmapToBase64(Bitmap bitmap)
{
    MemoryStream ms1 = new MemoryStream();
    bitmap.Save(ms1, ImageFormat.Jpeg);
    byte[] arr1 = new byte[ms1.Length];
    ms1.Position = 0;
    ms1.Read(arr1, 0, (int)ms1.Length);
    ms1.Close();
    return Convert.ToBase64String(arr1);
}
I used the prompt to have the model describe what is in the image, converted the image to base64, and sent both to the model; the model returned the following. It is indeed powerful: it describes the information accurately and the wording is quite good. If a person were asked to describe what is in the picture, I suspect most people would not do it as well. I have to say the models are getting more and more capable.
Using SemanticKernel
In addition to integrating the Ollama SDK, you can also use Semantic Kernel to integrate Ollama. We know that by default Semantic Kernel only works with the OpenAI and Azure OpenAI interface formats, and other models' interfaces are not necessarily compatible with the OpenAI format; sometimes you even need a service like one-api to adapt them. But don't worry: Ollama is compatible with the OpenAI interface format, so it can be used directly without any adaptation service; we only need to redirect the request address.
using HttpClient httpClient = new HttpClient(new RedirectingHandler());
httpClient.Timeout = TimeSpan.FromMinutes(120);
var kernelBuilder = Kernel.CreateBuilder()
    .AddOpenAIChatCompletion(
        modelId: "glm4:9b",
        apiKey: "ollama",
        httpClient: httpClient);
Kernel kernel = kernelBuilder.Build();
var chatCompletionService = kernel.GetRequiredService<IChatCompletionService>();
OpenAIPromptExecutionSettings openAIPromptExecutionSettings = new()
{
    ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions
};
var history = new ChatHistory();
string? userInput;
do
{
    Console.Write("User > ");
    userInput = Console.ReadLine();
    history.AddUserMessage(userInput!);
    var result = chatCompletionService.GetStreamingChatMessageContentsAsync(
        history,
        executionSettings: openAIPromptExecutionSettings,
        kernel: kernel);
    string fullMessage = "";
    Console.Write("Assistant > ");
    await foreach (var content in result)
    {
        Console.Write(content.Content);
        fullMessage += content.Content;
    }
    Console.WriteLine();
    history.AddAssistantMessage(fullMessage);
} while (userInput is not null);

public class RedirectingHandler : HttpClientHandler
{
    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var uriBuilder = new UriBuilder(request.RequestUri!) { Scheme = "http", Host = "localhost", Port = 11434 };
        // Chat (conversation) model
        if (request!.RequestUri!.AbsolutePath.Contains("v1/chat/completions"))
        {
            uriBuilder.Path = "/v1/chat/completions";
            request.RequestUri = uriBuilder.Uri;
        }
        // Embedding model
        if (request!.RequestUri!.AbsolutePath.Contains("v1/embeddings"))
        {
            uriBuilder.Path = "/v1/embeddings";
            request.RequestUri = uriBuilder.Uri;
        }
        return base.SendAsync(request, cancellationToken);
    }
}
Here we use the domestic model glm4:9b. Note that because we are using a local service, we need to adapt the service address; we do that by writing the RedirectingHandler class, using it to construct an HttpClient instance, and passing that to the Kernel. Careful readers may have noticed that the redirected Ollama service paths are the same as the OpenAI service paths, whereas the Ollama endpoints I called above were addresses like /api/chat and /api/embed. This is because Ollama, in order to be compatible with the OpenAI standard, provides a dedicated set of endpoints with the same paths and parameters as OpenAI's; this is something to be aware of. Of course, Ollama is not yet compatible with every feature of the OpenAI interface; those interested can see the documentation at /ollama/ollama/blob/main/docs/ for more details. An example of calling the OpenAI-compatible endpoint directly is shown below.
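You can verify the compatibility layer with a plain curl call against the OpenAI-style path (a quick sketch, assuming the default local port):

curl http://localhost:11434/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "glm4:9b",
    "messages": [{"role": "user", "content": "Hello!"}]
}'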
With the service above running, we can chat just as before; it works as follows. Similarly, you can use the embedding model's capability through SemanticKernel, as follows.
using HttpClient httpClient = new HttpClient(new RedirectingHandler());
httpClient.Timeout = TimeSpan.FromMinutes(120);
var kernelBuilder = Kernel.CreateBuilder()
    .AddOpenAITextEmbeddingGeneration(
        modelId: "nomic-embed-text:latest",
        apiKey: "ollama",
        httpClient: httpClient);
Kernel kernel = kernelBuilder.Build();
var embeddingService = kernel.GetRequiredService<ITextEmbeddingGenerationService>();
var embeddings = await embeddingService.GenerateEmbeddingsAsync(["I think C# is a good programming language."]);
Console.WriteLine($"[{string.Join(",", embeddings[0].ToArray())}]");
One thing to note here: the AddOpenAITextEmbeddingGeneration method is an experimental method that may be removed in future versions, so using it in VS produces an error alert by default. You can suppress the warning by setting NoWarn in the csproj's PropertyGroup.
<PropertyGroup>
<OutputType>Exe</OutputType>
<TargetFramework>net8.0</TargetFramework>
<NoWarn>SKEXP0010;SKEXP0001</NoWarn>
</PropertyGroup>
Summary
This article described how to combine C# with Ollama to deploy and call large language models locally, focusing on the concrete steps for integrating the capability into a C# application, with detailed installation instructions and code examples to help developers get started quickly.
- First we introduced Ollama's installation, basic settings, and commands.
- Then we described how to call large models through Ollama: via the command line, the HTTP API, and a visual interface.
- Next, using C# and the Ollama SDK, we demonstrated how to use conversation models, text embedding, and multimodal models, touching on similarity calculation along the way.
- Finally, we showed how to call the Ollama service through Semantic Kernel; since Ollama is compatible with the data format of the OpenAI interface, daily use is not a problem even though some parts remain incompatible.
Through this article, I hope that readers who have not yet looked into large models can get started, or at least roughly understand the relevant basics; after all, this has been a hot direction for the past two years and will remain one for the next few. Even if we cannot study it in depth, we should know it, understand its basic principles, and know how to use it. Why keep learning? Because these things can often genuinely make our work easier. Get exposed to it and understand it, so you really know what problems it can help you solve.