
LLM Exploration: Deploying Ollama and one-api services offline


Preface

I have previously deployed DeepSeek with Ollama on Linux servers.

This time I deployed it on a server with no internet access (more precisely, an even more restricted one). I ran into a few pitfalls, so I am recording them here.

ollama

On an offline server, Ollama naturally cannot be installed with the online install script.

Following the Ollama documentation,

first download the installation package on your local machine, choosing the build that matches the server's OS and CPU architecture (the URL below is the x86-64 Linux build):

curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ollama-linux-amd64.tgz

Then upload it to the server with scp or a similar tool:

scp ollama-linux-amd64.tgz <user>@<server address>:/tmp

After connecting to the server, extract the archive to install, following the Ollama documentation (see the first reference):

sudo tar -C /usr -xzf ollama-linux-amd64.tgz

At this point the ollama binary can be run:

ollama serve
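
To sanity-check the install, you can query the version from a second terminal while ollama serve is running (an optional check, not part of the official steps):

# in another terminal: confirm the binary is on PATH and runs
ollama -v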

Then register it as a systemd service, which is also Ollama's officially recommended practice and makes management easier.

sudo useradd -r -s /bin/false -U -m -d /usr/share/ollama ollama
sudo usermod -a -G ollama $(whoami)

Create a new file at /etc/systemd/system/ollama.service with the following content:

[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=$PATH"

[Install]
WantedBy=default.target

Then reload systemd and enable the service:

sudo systemctl daemon-reload
sudo systemctl enable ollama
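
To have it running right away without a reboot, start it and check its status (standard systemctl usage):

sudo systemctl start ollama      # start the service now
sudo systemctl status ollama     # should report active (running)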

That completes the Ollama installation.

Model deployment

An offline server cannot use ollama pull to fetch models.

You need to download them locally first: run ollama pull on your local machine,

then locate the model file and upload it to the server.

That is the general idea; the details follow.

Find the local model file

With no special configuration, Ollama's model files all live in ~/.ollama/models/blobs.

First, run this command to print the path for a given model; for example, to find the deepseek-r1:32b model:

ollama show deepseek-r1:32b --modelfile

Output of the command (excerpt):

FROM C:\Users\deali\.ollama\models\blobs\sha256-96c415656d377afbff962f6cdb2394ab092ccbcbaab4b82525bc4ca800fe8a49
TEMPLATE """{{- if .System }}{{ .System }}{{ end }}
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1}}
{{- if eq .Role "user" }}<|User|>{{ .Content }}
{{- else if eq .Role "assistant" }}<|Assistant|>{{ .Content }}{{- if not $last }}<|end▁of▁sentence|>{{- end }}
{{- end }}
{{- if and $last (ne .Role "assistant") }}<|Assistant|>{{- end }}
{{- end }}"""
PARAMETER stop <|begin▁of▁sentence|>
PARAMETER stop <|end▁of▁sentence|>
PARAMETER stop <|User|>
PARAMETER stop <|Assistant|>

Note this line:

FROM C:\Users\deali\.ollama\models\blobs\sha256-96c415656d377afbff962f6cdb2394ab092ccbcbaab4b82525bc4ca800fe8a49

It is the path of the model file that Ollama downloaded locally.

Upload this file to the server

Export Modelfile

The Modelfile format is similar to a Dockerfile's.

Export it with the following command:

ollama show deepseek-r1:32b --modelfile > Modelfile

This file must be uploaded to the server as well.
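
For example, with scp (a sketch: the user, server address, and target directory are placeholders; run it from the directory containing both files):

# upload the model blob and the Modelfile into the same server directory
scp sha256-96c415656d377afbff962f6cdb2394ab092ccbcbaab4b82525bc4ca800fe8a49 Modelfile <user>@<server address>:/tmp/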

Importing models on the server

Once the model file and the Modelfile are both uploaded, put them in the same directory.

First rename the blob to make the subsequent import easier:

mv sha256-96c415656d377afbff962f6cdb2394ab092ccbcbaab4b82525bc4ca800fe8a49 deepseek-r1_32b.gguf

Then edit the Modelfile and change its FROM line to point to the file you just renamed:

FROM ./deepseek-r1_32b.gguf

Then execute the following command to import

ollama create deepseek-r1:32b -f Modelfile

If all goes well the import succeeds; run ollama list to check that the model is now available.
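
An optional smoke test (the prompt string is just an example):

ollama list                          # the imported model should appear here
ollama run deepseek-r1:32b "hello"   # send a one-off test prompt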

one-api

One API is an open-source LLM (large language model) API management and distribution system. It aims to expose many large models through the standard OpenAI API format, ready to use out of the box. It supports a variety of mainstream models, including the OpenAI ChatGPT series, Anthropic Claude series, Google PaLM 2/Gemini series, Mistral series, ByteDance Doubao models, Baidu Wenxin Yiyan (ERNIE Bot) series, Alibaba Tongyi Qianwen (Qwen) series, iFlytek Spark cognitive models, Zhipu ChatGLM series, Tencent Hunyuan, and more.

Docker deployment

one-api is developed in Go with the Gin framework and is easy to deploy. I usually deploy it with Docker; I won't repeat the Docker basics here. My compose file:

services:
  db:
    image: mysql:8.1.0
    container_name: mysql
    restart: always
    environment:
      MYSQL_ROOT_PASSWORD: mysql-password
    volumes:
      - ./data:/var/lib/mysql
  one-api:
    image: justsong/one-api
    container_name: one-api
    restart: always
    ports:
      - "3000:3000"
    depends_on:
      - db
    environment:
      - SQL_DSN=root:mysql-password@tcp(db:3306)/one_api
      - TZ=Asia/Shanghai
      - TIKTOKEN_CACHE_DIR=/TIKTOKEN_CACHE_DIR
    volumes:
      - ./data:/data
      - ./TIKTOKEN_CACHE_DIR:/TIKTOKEN_CACHE_DIR

networks:
  default:
    name: one-api
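
Save this as docker-compose.yml and bring the stack up:

docker compose up -d   # start MySQL and one-api in the background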

Solve tiktoken issues

The problem I ran into is that one-api depends on the tiktoken library, and tiktoken needs to download its token encoder files from the internet.

The starting point for the fix is to read the error log, for example:

one-api  | [FATAL] 2025/02/17 - 10:47:21 | relay/adaptor/openai/:26 [InitTokenEncoders] failed to get gpt-3.5-turbo token encoder: Get "https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken": dial tcp 57.150.97.129:443: i/o timeout, if you are using in offline environment, please set TIKTOKEN_CACHE_DIR to use exsited files

Here you need to download https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken (the URL the error message is complaining about).

So first download the file locally and upload it to the server.

But that alone is not enough:

tiktoken looks up cached files by the SHA-1 hash of the download URL, not by file name.

Generate SHA-1

TIKTOKEN_URL=https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken
echo -n $TIKTOKEN_URL | sha1sum | head -c 40

You can also combine this into a single command:

echo -n "https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken" | sha1sum | head -c 40

In this command, echo -n prints the URL string (the -n flag suppresses the trailing newline), sha1sum computes its SHA-1 hash, and head -c 40 keeps the first 40 characters, i.e. the full 40-digit hex hash.

The execution result is

9b5ad71b2ce5302211f9c61530b329a4922fc6a4

Then rename the cl100k_base.tiktoken file to that output: 9b5ad71b2ce5302211f9c61530b329a4922fc6a4

The TIKTOKEN_CACHE_DIR environment variable was already set in the compose file in the previous section.

So put this 9b5ad71b2ce5302211f9c61530b329a4922fc6a4 file into the TIKTOKEN_CACHE_DIR directory.
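
Putting the two steps together on the host (this assumes the ./TIKTOKEN_CACHE_DIR bind mount from the compose file above):

mv cl100k_base.tiktoken 9b5ad71b2ce5302211f9c61530b329a4922fc6a4
mkdir -p ./TIKTOKEN_CACHE_DIR
mv 9b5ad71b2ce5302211f9c61530b329a4922fc6a4 ./TIKTOKEN_CACHE_DIR/
docker compose restart one-api   # restart so one-api picks up the cached file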

If you hit similar errors for other encoders later, repeat the steps above until no more errors appear.

So far I have only needed to download two encoders this way.

References

  • https://github.com/ollama/ollama/blob/main/docs/linux.md
  • https://stackoverflow.com/questions/76106366/how-to-use-tiktoken-in-offline-mode-computer
  • https://www.cnblogs.com/cjdty/p/18659438
  • https://zhuanlan.zhihu.com/p/20485169539