Ollama + JuiceFS: Pull once, run everywhere!

Today's blog is reprinted from our full-stack engineer Zhu Weiwei. When using Ollma for large model loading, she tried to use JuiceFS for model sharing. JuiceFS's data preheating and distributed caching features significantly improved loading efficiency and optimized performance bottlenecks.

01 Background

With the development of AI technology, big models have subtly influenced our lives. Commercial LLMs have always been separated from users by a gap because of fees, black boxes, data security and other reasons. More and more big models have chosen to open source, so that users can use their own big models more conveniently and reliably.

Ollma is a tool that simplifies the deployment and running of big models, on the one hand, by providing Docker-like usage, running a big model instance is as easy as starting a container, on the other hand, by providing OpenAI-compatible APIs, smoothing out the differences in usage between big models.

In order to avoid using "artificial retardation", we will choose a model with the largest possible size parameters, but it is well known that the larger the model parameters, although having a better performance. But also has a larger size, for example, the Llama 3.1 70B model size is 40GB.

In today's world, managing a large file that is strongly related to business functions is a headache. There are generally no more than two options, one is model productization and the other is shared storage.

Model productizationThe big model itself is built into the artifact deliverable, whether it's a Docker image or an OS snapshot, and it seeks to accomplish versioning and distribution of the big model through the capabilities of IaaS or PaaS;
shared storage: The idea of shared storage is simpler, just put the big model in a shared file system and pull it on demand.

Model productization is more like a hot startHowever, by reusing the ability to distribute platform layer artifacts, the large model is already local when the instance is ready, but its bottleneck lies in the distribution of large files, and the means of distributing large artifacts is still limited at this stage of software engineering development.

Shared storage is more like a cold startThe remote model file is visible at instance startup, but needs to be loaded remotely to run. Although shared storage is a very intuitive way to do this, it is very testing, and the shared storage itself may be the bottleneck of the entire loading phase.

But.It's a different story if a shared storage itself also supports data warming, distributed caching, and other means of hot-starting, and JuiceFS is one such program。

In this article, we will introduce the JuiceFS shared storage part of the project through a demo, based on the distributed file system capabilities provided by JuiceFS, making Ollama model files, once pulled, run everywhere.

02 One pull

This article demonstrates how to pull models using a Linux machine as an example.

Preparing the JuiceFS File System

By default, Ollama places model data in the/root/.ollama so here JuiceFS is mounted under the/root/.ollama Down:

$ juicefs mount weiwei /root/.ollama --subdir=ollama

The model data pulled by Ollama will then be placed on the JuiceFS file system.

Pull model installation

Ollama：

curl -fsSL / | sh

Pulling the model, here we use llama 3.1 8B as an example:

$ ollama pull llama3.1
pulling manifest
pulling 8eeb52dfb3bb... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 4.7 GB
pulling 11ce4ee3e170... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 1.7 KB
pulling 0ba8f0e314b4... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏  12 KB
pulling 56bb8bd477a5... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏   96 B
pulling 1a4c3c319823... 100% ▕█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏  485 B
verifying sha256 digest
writing manifest
removing any unused layers
success

Ollama allows users to create their own models with a Modelfile, written in the same way as a Dockerfile type. Here we setup a system prompt based on llama 3.1:

$ cat <<EOF > Modelfile
> FROM llama3.1

# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1

# set the system message
SYSTEM """
You are a literary writer named Rora. Please help me polish my writing.
"""
> EOF
$
$ ollama create writer -f ./Modelfile
transferring model data
using existing layer sha256:8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe
using existing layer sha256:11ce4ee3e170f6adebac9a991c22e22ab3f8530e154ee669954c4bc73061c258
using existing layer sha256:0ba8f0e314b4264dfd19df045cde9d4c394a52474bf92ed6a3de22a4ca31a177
creating new layer sha256:1dfe258ba02ecec9bf76292743b48c2ce90aefe288c9564c92d135332df6e514
creating new layer sha256:7fa4d1c192726882c2c46a2ffd5af3caddd99e96404e81b3cf2a41de36e25991
creating new layer sha256:ddb2d799341563f3da053b0da259d18d8b00b2f8c5951e7c5e192f9ead7d97ad
writing manifest
success

View the list of models:

$  ollama list
NAME            ID           SIZE   MODIFIED
writer:latest   346a60dbd7d4 4.7 GB 17 minutes ago
llama3.1:latest 91ab477bec9d 4.7 GB 4 hours ago

03 Running everywhere

The JuiceFS filesystem now contains the big model pulled with Ollama, so you can run it anywhere else with JuiceFS mounted. In this article, we'll demonstrate running big models with Ollama on Linux, Mac, and Kubernetes, respectively.

Linux

You can run it directly on a machine that has JuiceFS mounted:

$ ollama run writer
>>> The flower is beautiful
A lovely start, but let's see if we can't coax out a bit more poetry from your words. How about this:

"The flower unfolded its petals like a gentle whisper, its beauty an unassuming serenade that drew the eye and stirred the soul."

Or, perhaps a slightly more concise version:

"In the flower's delicate face, I find a beauty that soothes the senses and whispers secrets to the heart."

Your turn! What inspired you to write about the flower?

Mac

Mount JuiceFS:

weiwei@hdls-mbp ~ juicefs mount weiwei .llama --subdir=ollama
.OK, weiwei is ready at /Users/weiwei/.llama.

Click on the link to install:/download/

One thing to note here is that when you pulled the model just now, it was stored as root, so you'll need to switch to root to run ollama on a Mac.

If you use a manually created writer model, there is a problem, the layer of the newly created model is written with 600 permissions, and you can only run it on a Mac if you manually set it to 644. This is a bug in Ollama, and I've already sent a PR to Ollama.
（/ollama/ollama/pull/6386). However, no new version has been released as of now. The temporary solution is as follows:

hdls-mbp:~ root# cd /Users/weiwei/.ollama/models/blobs
hdls-mbp:blobs root# ls -alh . | grep rw-------
-rw-------   1 root  wheel    14B  8 15 23:04 sha256-804a1f079a1166190d674bcfb0fa42270ec57a4413346d20c5eb22b26762d132
-rw-------   1 root  wheel   559B  8 15 23:04 sha256-db7eed3b8121ac22a30870611ade28097c62918b8a4765d15e6170ec8608e507
hdls-mbp:blobs root#
hdls-mbp:blobs root#  chmod 644 sha256-804a1f079a1166190d674bcfb0fa42270ec57a4413346d20c5eb22b26762d132 sha256-db7eed3b8121ac22a30870611ade28097c62918b8a4765d15e6170ec8608e507
hdls-mbp:blobs root#
hdls-mbp:blobs root#
hdls-mbp:blobs root#
hdls-mbp:blobs root# ollama list
NAME            ID           SIZE   MODIFIED
writer:latest   346a60dbd7d4 4.7 GB About an hour ago
llama3.1:latest 91ab477bec9d 4.7 GB 4 hours ago

Run the writer model and let it touch up the text for us:

hdls-mbp:weiwei root# ollama run writer
>>> The tree is very tall
A great start, but let's see if we can make it even more vivid and engaging.

Here's a revised version:

"The tree stood sentinel, its towering presence stretching towards the sky like a verdant giant, its branches dancing
in the breeze with an elegance that seemed almost otherworldly."

Or, if you'd prefer something simpler yet still evocative, how about this:

"The tree loomed tall and green, its trunk sturdy as a stone pillar, its leaves a soft susurrus of sound in the gentle
wind."

Which one resonates with you? Or do you have any specific ideas or feelings you want to convey through your writing
that I can help shape into a compelling phrase?

Kubernetes

JuiceFS provides a CSI Driver that enables users to use PVs directly in Kubernetes, supporting both static and dynamic configurations. Since we are directly using files already in the filesystem, we use static configuration here.

Prepare PVC and PV:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: ollama-vol
  labels:
    juicefs-name: ollama-vol
spec:
  capacity:
    storage: 10Pi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: 
    volumeHandle: ollama-vol
    fsType: juicefs
    nodePublishSecretRef:
      name: ollama-vol
      namespace: kube-system
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-vol
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Filesystem
  resources:
    requests:
      storage: 10Gi
  selector:
    matchLabels:
      juicefs-name: ollama-vol

Deploy Ollama:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  labels:
    app: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
      - image: /hdls/ollama:0.3.5
        env:
        - name: OLLAMA_HOST
          value: "0.0.0.0"
        ports:
        - name: ollama
          containerPort: 11434
        args:
        - "serve"
        name: ollama
        volumeMounts:
        - mountPath: /root/.ollama
          name: shared-data
          subPath: ollama
      volumes:
      - name: shared-data
        persistentVolumeClaim:
          claimName: ollama-vol
---
apiVersion: v1
kind: Service
metadata:
  name: ollama-svc
spec:
  selector:
    app: ollama
  ports:
    - name: http
      protocol: TCP
      port: 11434
      targetPort: 11434

Since the Ollama deployment deploys an Ollama server, it can be accessed as an api:

$ curl http://192.168.203.37:11434/api/generate -d '{
  "model": "writer",
  "prompt": "The sky is blue",
  "stream": false
}'
{"model":"writer","created_at":"2024-08-15T14:35:43.593740142Z","response":"A starting point, at least! Let's see... How about we add some depth to this sentence? Here are a few suggestions:\n\n* Instead of simply stating that the sky is blue, why not describe how it makes you feel? For example: \"As I stepped outside, the cerulean sky seemed to stretch out before me like an endless canvas, its vibrant hue lifting my spirits and washing away the weight of the world.\"\n* Or, we could add some sensory details to bring the scene to life. Here's an example: \"The morning sun had just risen over the horizon, casting a warm glow across the blue sky that seemed to pulse with a gentle light – a softness that soothed my skin and lulled me into its tranquil rhythm.\"\n* If you're going for something more poetic, we could try to tap into the symbolic meaning of the sky's color. For example: \"The blue sky above was like an open door, inviting me to step through and confront the dreams I'd been too afraid to chase – a reminder that the possibilities are endless, as long as we have the courage to reach for them.\"\n\nWhich direction would you like to take this?","done":true,"done_reason":"stop","context":[128006,9125,128007,1432,2675,527,264,32465,7061,7086,432,6347,13,5321,1520,757,45129,856,4477,627,128009,128006,882,128007,271,791,13180,374,6437,128009,128006,78191,128007,271,32,6041,1486,11,520,3325,0,6914,596,1518,1131,2650,922,584,923,1063,8149,311,420,11914,30,5810,527,264,2478,18726,1473,9,12361,315,5042,28898,430,279,13180,374,6437,11,3249,539,7664,1268,433,3727,499,2733,30,1789,3187,25,330,2170,358,25319,4994,11,279,10362,1130,276,13180,9508,311,14841,704,1603,757,1093,459,26762,10247,11,1202,34076,40140,33510,856,31739,323,28786,3201,279,4785,315,279,1917,10246,9,2582,11,584,1436,923,1063,49069,3649,311,4546,279,6237,311,2324,13,5810,596,459,3187,25,330,791,6693,7160,1047,1120,41482,927,279,35174,11,25146,264,8369,37066,4028,279,6437,13180,430,9508,311,28334,449,264,22443,3177,1389,264,8579,2136,430,779,8942,291,856,6930,323,69163,839,757,1139,1202,68040,37390,10246,9,1442,499,2351,2133,369,2555,810,76534,11,584,1436,1456,311,15596,1139,279,36396,7438,315,279,13180,596,1933,13,1789,3187,25,330,791,6437,13180,3485,574,1093,459,1825,6134,11,42292,757,311,3094,1555,323,17302,279,19226,358,4265,1027,2288,16984,311,33586,1389,264,27626,430,279,24525,527,26762,11,439,1317,439,584,617,279,25775,311,5662,369,1124,2266,23956,5216,1053,499,1093,311,1935,420,30],"total_duration":13635238079,"load_duration":39933548,"prompt_eval_count":35,"prompt_eval_duration":55817000,"eval_count":240,"eval_duration":13538816000}

04 Summary

Ollama is a tool that simplifies running big models locally by pulling them locally and then using simple commands to run them locally.JuiceFS acts as the underlying storage for the Big Model Registry, and because of its distributed nature, users can pull a model once and then use it elsewhere, making it possible to This makes it possible to pull a model once and run it everywhere.

I hope this has been of some help to you, and if you have any other questions feel free to join theJuiceFS CommunityCommunicate with everyone.