Preface
In this article, we learn how to implement the ITextEmbeddingGenerationService interface to access a local embedding model, following the approach of this project.
Project address: /BLaZeKiLL/
Implementation
At first glance, Semantic Kernel seems to support only OpenAI's various models, but it actually provides a powerful abstraction layer: by implementing its interfaces yourself, you can access models that are not compatible with the OpenAI format.
This project implements the ITextGenerationService, IChatCompletionService, and ITextEmbeddingGenerationService interfaces. Since Ollama's chat API now supports the OpenAI format, you can access chat models in Ollama without implementing ITextGenerationService or IChatCompletionService. However, Ollama's embedding API is not yet compatible with the OpenAI format, so implementing ITextEmbeddingGenerationService is how you access the embedding models in Ollama.
Check out the ITextEmbeddingGenerationService interface:
It represents a generator that produces text embeddings of floating-point type.
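In the Semantic Kernel source, ITextEmbeddingGenerationService is just a marker interface that pins the type parameters to string input and float embeddings (reproduced here for reference; the exact shape may vary slightly by version):

[Experimental("SKEXP0001")]
public interface ITextEmbeddingGenerationService : IEmbeddingGenerationService<string, float>
{
}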
Take another look at the IEmbeddingGenerationService<string, float> interface:
[Experimental("SKEXP0001")]
public interface IEmbeddingGenerationService<TValue, TEmbedding> : IAIService where TEmbedding : unmanaged
{
    Task<IList<ReadOnlyMemory<TEmbedding>>> GenerateEmbeddingsAsync(IList<TValue> data, Kernel? kernel = null, CancellationToken cancellationToken = default(CancellationToken));
}
Take another look at the IAIService interface:
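In the Semantic Kernel source, IAIService declares a single member (again reproduced for reference; it may differ slightly across versions):

public interface IAIService
{
    IReadOnlyDictionary<string, object?> Attributes { get; }
}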
Together, these show that we only need to implement the following method and property:

Task<IList<ReadOnlyMemory<TEmbedding>>> GenerateEmbeddingsAsync(IList<TValue> data, Kernel? kernel = null, CancellationToken cancellationToken = default(CancellationToken));

IReadOnlyDictionary<string, object?> Attributes { get; }
Let's look at how the project implements them.
Add the OllamaBase class:
public interface IOllamaBase
{
    Task PingOllamaAsync(CancellationToken cancellationToken = new());
}

public abstract class OllamaBase<T> : IOllamaBase where T : OllamaBase<T>
{
    public IReadOnlyDictionary<string, object?> Attributes => _attributes;

    private readonly Dictionary<string, object?> _attributes = new();

    protected readonly HttpClient Http;
    protected readonly ILogger<T> Logger;

    protected OllamaBase(string modelId, string baseUrl, HttpClient http, ILoggerFactory? loggerFactory)
    {
        _attributes.Add("model_id", modelId);
        _attributes.Add("base_url", baseUrl);

        Http = http;
        Logger = loggerFactory is not null ? loggerFactory.CreateLogger<T>() : NullLogger<T>.Instance;
    }

    /// <summary>
    /// Ping Ollama instance to check if the required llm model is available at the instance
    /// </summary>
    /// <param name="cancellationToken"></param>
    public async Task PingOllamaAsync(CancellationToken cancellationToken = new())
    {
        var data = new
        {
            name = Attributes["model_id"]
        };

        var response = await Http.PostAsJsonAsync($"{Attributes["base_url"]}/api/show", data, cancellationToken).ConfigureAwait(false);

        ValidateOllamaResponse(response);

        Logger.LogInformation("Connected to Ollama at {url} with model {model}", Attributes["base_url"], Attributes["model_id"]);
    }

    protected void ValidateOllamaResponse(HttpResponseMessage? response)
    {
        try
        {
            response.EnsureSuccessStatusCode();
        }
        catch (HttpRequestException)
        {
            Logger.LogError("Unable to connect to ollama at {url} with model {model}", Attributes["base_url"], Attributes["model_id"]);
        }
    }
}
Note that

public IReadOnlyDictionary<string, object?> Attributes => _attributes;

implements the Attributes property required by the IAIService interface.
Add the OllamaTextEmbeddingGeneration class:
#pragma warning disable SKEXP0001
public class OllamaTextEmbeddingGeneration(string modelId, string baseUrl, HttpClient http, ILoggerFactory? loggerFactory)
    : OllamaBase<OllamaTextEmbeddingGeneration>(modelId, baseUrl, http, loggerFactory),
        ITextEmbeddingGenerationService
{
    public async Task<IList<ReadOnlyMemory<float>>> GenerateEmbeddingsAsync(IList<string> data, Kernel? kernel = null,
        CancellationToken cancellationToken = new())
    {
        var result = new List<ReadOnlyMemory<float>>(data.Count);

        foreach (var text in data)
        {
            var request = new
            {
                model = Attributes["model_id"],
                prompt = text
            };

            var response = await Http.PostAsJsonAsync($"{Attributes["base_url"]}/api/embeddings", request, cancellationToken).ConfigureAwait(false);

            ValidateOllamaResponse(response);

            var json = JsonSerializer.Deserialize<JsonNode>(await response.Content.ReadAsStringAsync().ConfigureAwait(false));

            var embedding = new ReadOnlyMemory<float>(json!["embedding"]?.AsArray().GetValues<float>().ToArray());

            result.Add(embedding);
        }

        return result;
    }
}
Note the implementation of the GenerateEmbeddingsAsync method: for each input text, it sends a request to Ollama's embedding endpoint (/api/embeddings) and reads the embedding array from the response.
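For reference, the request and response bodies exchanged with Ollama's /api/embeddings endpoint look roughly like this (the values are illustrative, not real output):

// POST {base_url}/api/embeddings
{
  "model": "mxbai-embed-large:335m",
  "prompt": "text to embed"
}

// Response
{
  "embedding": [0.0123, -0.0456, 0.0789]
}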
To make this usable with MemoryBuilder, an extension method needs to be added:
#pragma warning disable SKEXP0001
public static class OllamaMemoryBuilderExtensions
{
    /// <summary>
    /// Adds Ollama as the text embedding generation backend for semantic memory
    /// </summary>
    /// <param name="builder">kernel builder</param>
    /// <param name="modelId">Ollama model ID to use</param>
    /// <param name="baseUrl">Ollama base url</param>
    /// <returns></returns>
    public static MemoryBuilder WithOllamaTextEmbeddingGeneration(
        this MemoryBuilder builder,
        string modelId,
        string baseUrl
    )
    {
        builder.WithTextEmbeddingGeneration((logger, http) => new OllamaTextEmbeddingGeneration(
            modelId,
            baseUrl,
            http,
            logger
        ));

        return builder;
    }
}
Usage
public async Task<ISemanticTextMemory> GetTextMemory3()
{
    var builder = new MemoryBuilder();
    var embeddingEndpoint = "http://localhost:11434";
    var cancellationTokenSource = new System.Threading.CancellationTokenSource();
    var cancellationToken = cancellationTokenSource.Token;

    builder.WithHttpClient(new HttpClient());
    builder.WithOllamaTextEmbeddingGeneration("mxbai-embed-large:335m", embeddingEndpoint);

    IMemoryStore memoryStore = await SqliteMemoryStore.ConnectAsync("");
    builder.WithMemoryStore(memoryStore);

    var textMemory = builder.Build();
    return textMemory;
}
builder.WithOllamaTextEmbeddingGeneration("mxbai-embed-large:335m", embeddingEndpoint);

Because the WithOllamaTextEmbeddingGeneration extension method is now implemented, it can be called like this, here using the embedding model mxbai-embed-large:335m.
I built a simple WPF interface to try it out.

First, find a news article to embed:

The text is vectorized and stored in the database:

Now test the RAG effect:

The answer is decent.
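Under the hood, storing and querying the memory boils down to calls like the following sketch. It assumes the textMemory instance from GetTextMemory3 above; the collection name "news" and the variable newsText are hypothetical placeholders. SaveInformationAsync embeds and stores the text, and SearchAsync retrieves the most relevant chunks to build the RAG prompt:

var textMemory = await GetTextMemory3();
var newsText = "..."; // the news article pasted into the WPF interface

// Embed the text via Ollama and persist it to the SQLite memory store.
await textMemory.SaveInformationAsync(collection: "news", text: newsText, id: "news-001");

// Retrieve the most relevant chunks for a question; these become the RAG context.
await foreach (var result in textMemory.SearchAsync("news", "What is this news about?", limit: 3))
{
    Console.WriteLine($"{result.Relevance:F2}: {result.Metadata.Text}");
}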
The chat model is Qwen/Qwen2-72B-Instruct from an online API, and the embedding model is mxbai-embed-large:335m from the local Ollama.