
Seamless Integration, Instant Intelligence (Part 1): Dify LLM Application Platform, Zero-Code Integration into Third-Party Systems, 42K+ GitHub Stars (with Ollama Deployment)

1. Introduction

1.1 Feature overview

Dify is an open-source Large Language Model (LLM) application development platform. It combines the ideas of Backend as a Service (BaaS) and LLMOps to give developers a fast path from creative prototype to production, and it is designed to lower technical barriers so that people without a technical background can take part in defining and operating AI applications.

Dify ships with the full technology stack needed to build LLM applications: support for hundreds of models, an intuitive Prompt orchestration interface, a high-quality Retrieval-Augmented Generation (RAG) engine, and a solid Agent framework. This integrated stack greatly simplifies development while leaving developers plenty of flexibility and room for creativity. With its flexible workflow orchestration, friendly interface, and APIs, Dify helps developers avoid repetitive work so they can focus their time and energy on innovation and on the actual business requirements.

The name Dify comes from Define + Modify: define your AI application and keep improving it. It can also be read as "Do it for you."

  • List of core features:

    1. Workflow: build and test powerful AI workflows on a visual canvas, using all of the features below and more.

    2. Comprehensive model support: seamless integration with hundreds of proprietary and open-source LLMs from dozens of inference providers and self-hosted solutions, covering GPT, Mistral, Llama 3, and any model compatible with the OpenAI API.

    3. Prompt IDE: an intuitive interface for crafting prompts, comparing model performance, and adding extra capabilities (such as text-to-speech) to chat-based applications.

    4. RAG pipeline: extensive RAG functionality covering everything from document ingestion to retrieval, with out-of-the-box support for extracting text from PDF, PPT, and other common document formats.

    5. Agent: define an Agent based on LLM function calling or ReAct, and add pre-built or custom tools to it. Dify provides more than 50 built-in tools for AI Agents, such as Google Search, DALL-E, Stable Diffusion, and WolframAlpha.

    6. LLMOps: monitor and analyze application logs and performance over time, and continuously improve prompts, datasets, and models based on production data and annotations.

    7. Backend as a Service: all of Dify's features come with corresponding APIs, so you can easily integrate Dify into your own business logic (see the example after this list).
  • Function Comparison

  • Framework Schematic
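Because every feature is exposed over HTTP, integrating Dify into an existing system can be as small as a single API call. A minimal sketch of calling a chat application's Service API (the base URL, app API key, and user id below are placeholders for your own deployment):

# placeholders: replace the base URL and the app-... key with those of your own Dify app
curl -X POST 'http://localhost/v1/chat-messages' \
  --header 'Authorization: Bearer app-xxxxxxxxxxxxxxxx' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "inputs": {},
    "query": "What can you do?",
    "response_mode": "blocking",
    "user": "demo-user"
  }'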

1.2 Key technical characteristics

  • Local model inference runtime support: Xinference (recommended), OpenLLM, LocalAI, ChatGLM, Ollama, NVIDIA Triton Inference Server (TIS)

  • Agentic workflow features: supported node types

    • LLM
    • Knowledge retrieval
    • Question classification
    • Conditional branch (IF/ELSE)
    • Code execution
    • Template transform
    • HTTP request
    • Tool
  • RAG features:

    • Indexing methods
      • Keyword
      • Text vector (embedding)
      • LLM-assisted question-and-answer segmentation mode
    • Retrieval methods
      • Keyword
      • Text similarity matching
      • Hybrid search
      • Multi-path recall
    • Recall optimization
      • ReRank model
  • Vector database support: Qdrant, Weaviate, Zilliz/Milvus, Pgvector, Pgvector-rs, Chroma, OpenSearch, TiDB, Tencent Vector, Oracle
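For the Docker Compose deployment described below, the vector database is selected through environment variables. A minimal sketch, assuming your docker/.env uses the common VECTOR_STORE selector (the variable names here are assumptions; verify them against your own .env.example):

# in dify/docker/.env (variable names may differ between versions)
VECTOR_STORE=weaviate          # e.g. weaviate, qdrant, milvus, pgvector, ...
WEAVIATE_ENDPOINT=http://weaviate:8080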

1.3 Cloud services

Dify provides a cloud service for everyone, so you can use the full power of Dify without having to deploy it yourself. To use the Dify cloud service, you need a GitHub or Google account.

  • Log in to the Dify cloud service and create a Workspace or join an existing one.

  • Configure your model provider, or use a hosted model provider that Dify offers.

  • You can now create applications!

1.4 More LLM platform references:

  • RAG+AI Workflow+Agent: How to Choose an LLM Framework, a Comprehensive Comparison of MaxKB, Dify, FastGPT, RagFlow, Anything-LLM, and More!

  • Wisdom wins the future: domestic large model + Agent application case selection, as well as the mainstream Agent framework open source project recommendation

2. Community Edition deployment

2.1 Docker Compose Deployment (recommended)

  • The docker installation can be found in the following article:

    • An article to get you started with the Milvus vector database: Docker installation, Milvus installation and use, Attu visualization, and a complete vector similarity search walkthrough (Section 2.2 covers Docker Hub acceleration)
    • Recommended reading: Say goodbye to DockerHub image download problems: master an efficient download strategy and enjoy a seamless development experience!
    • Installing Docker
    • Installing Docker Compose
  • Cloning the Dify Code Repository

git clone https://github.com/langgenius/dify.git
  • Launch Dify
# Go to the docker directory of the Dify source code and execute the one-click start command:
cd dify/docker
cp .env.example .env
docker compose up -d

If your system has Docker Compose V2 installed instead of V1, use docker compose instead of docker-compose. You can check which version you have with docker compose version.

If image pulls fail, add a registry mirror source; see the recommended articles above for solutions, or the sketch below.
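As a quick sketch, on Linux a registry mirror is typically added in /etc/docker/daemon.json and Docker is then restarted (the mirror URL below is a placeholder; substitute one you trust):

# /etc/docker/daemon.json  (mirror URL is a placeholder)
{
  "registry-mirrors": ["https://your-mirror.example.com"]
}

# apply the change
sudo systemctl daemon-reload
sudo systemctl restart docker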

  • Deployment results are shown:

Finally, check to make sure that all containers are running properly:

docker compose ps

Includes 3 business services api / worker / web, and 6 base components weaviate / db / redis / nginx / ssrf_proxy / sandbox.

NAME                  IMAGE                              COMMAND                  SERVICE      CREATED          STATUS                    PORTS
docker-api-1          langgenius/dify-api:0.6.16         "/bin/bash /entrypoi…"   api          15 minutes ago   Up 15 minutes             5001/tcp
docker-db-1           postgres:15-alpine                 "…"   db           15 minutes ago   Up 15 minutes (healthy)   5432/tcp
docker-nginx-1        nginx:latest                       "sh -c 'cp /docker-e…"   nginx        15 minutes ago   Up 15 minutes             0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp
docker-redis-1        redis:6-alpine                     "…"   redis        15 minutes ago   Up 15 minutes (healthy)   6379/tcp
docker-sandbox-1      langgenius/dify-sandbox:0.2.1      "/main"                  sandbox      15 minutes ago   Up 15 minutes             
docker-ssrf_proxy-1   ubuntu/squid:latest                "sh -c 'cp /docker-e…"   ssrf_proxy   15 minutes ago   Up 15 minutes             3128/tcp
docker-weaviate-1     semitechnologies/weaviate:1.19.0   "/bin/weaviate --hos…"   weaviate     15 minutes ago   Up 15 minutes             
docker-web-1          langgenius/dify-web:0.6.16         "/bin/sh ./entrypoin…"   web          15 minutes ago   Up 15 minutes             3000/tcp
docker-worker-1       langgenius/dify-api:0.6.16         "/bin/bash /entrypoi…"   worker       15 minutes ago   Up 15 minutes             5001/tcp
  • Update Dify

Go to the docker directory of the dify source code and execute the following commands in order:

cd dify/docker
docker compose down
git pull origin main
docker compose pull
docker compose up -d

Synchronize environment variable configuration (important!)

  • If the .env.example file has been updated, be sure to synchronize the changes into your local .env file.

  • Check all configuration items in the .env file to make sure they match your actual runtime environment. You may need to copy newly added variables from .env.example into .env and update any values that have changed.
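A quick way to spot variables that were added or changed upstream is to diff the template against your local copy, for example from the dify/docker directory:

diff .env.example .env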

  • Visit Dify

Once the Docker deployment is complete, run sudo docker ps to list the running containers. Among them you will see an nginx container that exposes port 80 externally; this is the entry port for access. Next, test access locally.

In your browser, open http://localhost (or http://127.0.0.1:80) to use the locally deployed Dify.

For access from another machine on the LAN, use the host's IP, e.g. 10.80.2.195:80.

Fill in the initial setup form (admin account) and you will be taken to the console.

  • Customized Configuration

Edit the environment variable values in the .env file. Then, restart Dify:

docker compose down
docker compose up -d

The complete set of environment variables can be found in docker/.env.example.
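For example, to expose Dify on a port other than 80, change the nginx port variable and then restart as shown above. The variable name below comes from recent .env.example files and may differ in your version, so verify it in your own copy:

# in dify/docker/.env (verify the exact variable name in your .env.example)
EXPOSE_NGINX_PORT=8080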

2.2 Local Code Source Deployment

  • Prerequisites

Clone the Dify code:

git clone https://github.com/langgenius/dify.git

Before starting the business services, PostgreSQL / Redis / Weaviate must be available (if not already present locally); they can be started with the following commands:

cd docker
cp middleware.env.example middleware.env
docker compose -f docker-compose.middleware.yaml up -d
  • Server-side deployment

    • API Interface Services
    • Worker Asynchronous Queue Consumption Service
  • Install the base environment
    Python 3.10.x is required to start the server. Anaconda is recommended for installation; see these articles:

    • Linux and Windows: Installing Anaconda
    • Anaconda Installation Tutorial
    • Alternatively, use pyenv: run pyenv install 3.10, then switch to the 3.10 Python environment with pyenv global 3.10.
  1. Go to the api directory
cd api
  2. Copy the environment variable configuration file
cp .env.example .env
  3. Generate a random key and replace the value of SECRET_KEY in .env
openssl rand -base64 42
sed -i 's/SECRET_KEY=.*/SECRET_KEY=<your_value>/' .env
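The two steps can also be combined into one command. A minimal sketch assuming GNU sed; | is used as the sed delimiter because base64 output may contain /:

sed -i "s|^SECRET_KEY=.*|SECRET_KEY=$(openssl rand -base64 42)|" .env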
  4. Install dependency packages

The Dify API service uses Poetry to manage dependencies. You can run poetry shell to activate the environment.

poetry env use 3.10
poetry install
  5. Perform a database migration to bring the database structure up to the latest version.
poetry shell
flask db upgrade
  6. Start the API service
flask run --host 0.0.0.0 --port=5001 --debug

Correct output:

* Debug mode: on
INFO:werkzeug:WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:5001
INFO:werkzeug:Press CTRL+C to quit
INFO:werkzeug: * Restarting with stat
WARNING:werkzeug: * Debugger is active!
INFO:werkzeug: * Debugger PIN: 695-801-919

  7. Start the Worker service

Used to consume asynchronous queue tasks such as dataset file imports, updating dataset documents, and other asynchronous operations. Linux / MacOS startup:

celery -A app.celery worker -P gevent -c 1 -Q dataset,generation,mail,ops_trace --loglevel INFO

If you are starting on Windows, use this command instead:

celery -A app.celery worker -P solo --without-gossip --without-mingle -Q dataset,generation,mail,ops_trace --loglevel INFO
 -------------- celery@ v5.2.7 (dawn-chorus)
--- ***** ----- 
-- ******* ---- macOS-10.16-x86_64-i386-64bit 2023-07-31 12:58:08
- *** --- * --- 
- ** ---------- [config]
- ** ---------- .> app:         app:0x7fb568572a10
- ** ---------- .> transport:   redis://:**@localhost:6379/1
- ** ---------- .> results:     postgresql://postgres:**@localhost:5432/dify
- *** --- * --- .> concurrency: 1 (gevent)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** ----- 
 -------------- [queues]
                .> dataset          exchange=dataset(direct) key=dataset
                .> generation       exchange=generation(direct) key=generation
                .> mail             exchange=mail(direct) key=mail

[tasks]
  . tasks.add_document_to_index_task.add_document_to_index_task
  . tasks.clean_dataset_task.clean_dataset_task
  . tasks.clean_document_task.clean_document_task
  . tasks.clean_notion_document_task.clean_notion_document_task
  . tasks.create_segment_to_index_task.create_segment_to_index_task
  . tasks.deal_dataset_vector_index_task.deal_dataset_vector_index_task
  . tasks.document_indexing_sync_task.document_indexing_sync_task
  . tasks.document_indexing_task.document_indexing_task
  . tasks.document_indexing_update_task.document_indexing_update_task
  . tasks.enable_segment_to_index_task.enable_segment_to_index_task
  . tasks.generate_conversation_summary_task.generate_conversation_summary_task
  . tasks.mail_invite_member_task.send_invite_member_mail_task
  . tasks.remove_document_from_index_task.remove_document_from_index_task
  . tasks.remove_segment_from_index_task.remove_segment_from_index_task
  . tasks.update_segment_index_task.update_segment_index_task
  . tasks.update_segment_keyword_index_task.update_segment_keyword_index_task

[2024-07-31 13:58:08,831: INFO/MainProcess] Connected to redis://:**@localhost:6379/1
[2024-07-31 13:58:08,840: INFO/MainProcess] mingle: searching for neighbors
[2024-07-31 13:58:09,873: INFO/MainProcess] mingle: all alone
[2024-07-31 13:58:09,886: INFO/MainProcess] pidbox: Connected to redis://:**@localhost:6379/1.
[2024-07-31 13:58:09,890: INFO/MainProcess] celery@ ready.
  • Front-end page deployment

  • Install the base environment
    Starting the web front-end service requires Node.js v18.x (LTS) and NPM, or Yarn.

  • Install Node.js + NPM
    Go to https://nodejs.org/en/download, choose the installer for your operating system, and install it (the stable LTS build is recommended); it already comes with NPM.

    1. Go to the web directory and install the dependencies
    cd web
    npm install
    
    2. Configure environment variables. Create a file named .env.local in the current directory and copy into it the contents of .env.example, then modify the values of these environment variables as required.
    #For production release, change this to PRODUCTION
    NEXT_PUBLIC_DEPLOY_ENV=DEVELOPMENT
    #The deployment edition, SELF_HOSTED
    NEXT_PUBLIC_EDITION=SELF_HOSTED
    #The base URL of console application, refers to the Console base URL of WEB service if console domain is
    #different from api or web app domain.
    #example: /console/api
    NEXT_PUBLIC_API_PREFIX=http://localhost:5001/console/api
    #The URL for Web APP, refers to the Web App base URL of WEB service if web app domain is different from
    #console or api domain.
    #example: /api
    NEXT_PUBLIC_PUBLIC_API_PREFIX=http://localhost:5001/api
    
    #SENTRY
    NEXT_PUBLIC_SENTRY_DSN=
    NEXT_PUBLIC_SENTRY_ORG=
    NEXT_PUBLIC_SENTRY_PROJECT=
    
    3. Build the code and start the web service
    npm run build
    
    npm run start
    #or
    yarn start
    #or
    pnpm start
    
    
    • The terminal will output the following message:
    ready - started server on 0.0.0.0:3000, url: http://localhost:3000
    warn  - You have enabled experimental feature (appDir) in .
    warn  - Experimental features are not covered by semver, and may cause unexpected or broken application behavior. Use at your own risk.
    info  - Thank you for testing `appDir` please leave your feedback at /app-feedback
    
  • Visit Dify

In your browser, open http://127.0.0.1:3000 to use the locally deployed Dify.

2.3 Starting the front-end Docker container alone

When developing only the back end, you may want to start the back-end service from source without building and running the front-end code locally. In that case you can run the front-end service directly by pulling its Docker image and starting a container, as shown below:

  • Use the DockerHub image directly
docker run -it -p 3000:3000 -e CONSOLE_API_URL=http://127.0.0.1:5001 -e APP_API_URL=http://127.0.0.1:5001 langgenius/dify-web:latest
  • Building Docker images from source code

    1. Build the front-end image
    cd web && docker build . -t dify-web
    
    2. Start the front-end image
    docker run -it -p 3000:3000 -e CONSOLE_API_URL=http://127.0.0.1:5001 -e APP_API_URL=http://127.0.0.1:5001 dify-web
    
    3. When the console domain name and the Web APP domain name differ, you can set CONSOLE_URL and APP_URL separately.
      Local access: http://127.0.0.1:3000

3. Local model for Ollama deployment

Ollama is an open-source framework designed to make it easy to deploy and run Large Language Models (LLMs) on a local machine. The official Ollama website is https://ollama.com/

  • The following is an overview of its main features and functions:

    1. Simplified Deployment: Ollama aims to simplify the process of deploying large language models in Docker containers, making it easy for non-expert users to manage and run these complex models.
    2. Lightweight and Scalable: As a lightweight framework, Ollama maintains a small resource footprint while being scalable, allowing users to adjust configurations as needed to accommodate projects of different sizes and hardware conditions.
    3. API Support: Provides a clean API that makes it easy for developers to create, run and manage large language model instances, lowering the technical barrier to interacting with the model.
    4. Pre-built model library: contains a series of pre-trained large-scale language models, which can be used directly by users to apply to their own applications without the need to train from scratch or find their own model sources.

3.1 One-Click Installation

curl: (77) error setting certificate verify locations: CAfile: /data/usr/local/anaconda/ssl/cacert.pem: none
Reason: the CAfile path is wrong, i.e. the certificate file cannot be found at that path.

  • Solution:
  1. Find the location of your cacert.pem file (/path/to/cacert.pem). If you do not have the certificate, you can first download it from https://curl.se/ca/ and save it to a directory of your choice.
  2. Set the environment variable
export CURL_CA_BUNDLE=/path/to/cacert.pem
#Replace "/path/to/cacert.pem" with the actual path to your certificate file.
export CURL_CA_BUNDLE=/www/anaconda3/anaconda3/ssl/cacert.pem
  • Run the download
curl -fsSL https://ollama.com/install.sh | sh

3.2 Manual installation

ollama Chinese:/getting-started/linux/

  1. Download the ollama binary. Ollama is distributed as a self-contained binary; download it to a directory in your PATH:
sudo curl -L https://ollama.com/download/ollama-linux-amd64 -o /usr/bin/ollama

sudo chmod +x /usr/bin/ollama
  2. Add Ollama as a startup service (recommended): create a user for Ollama:
sudo useradd -r -s /bin/false -m -d /usr/share/ollama ollama

3. Create a service file at /etc/systemd/system/ollama.service:

# vim /etc/systemd/system/ollama.service

[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3

[Install]
WantedBy=default.target
  4. Then reload systemd and enable the service:
sudo systemctl daemon-reload
sudo systemctl enable ollama
  5. Launch Ollama
    Use systemd to start Ollama:
sudo systemctl start ollama
  6. Update and view logs
# To update, run the download again
sudo curl -L https://ollama.com/download/ollama-linux-amd64 -o /usr/bin/ollama
sudo chmod +x /usr/bin/ollama

# To view the journal of Ollama running as a startup service, run:
journalctl -u ollama
  7. Shut down the Ollama service
# Shut down the ollama service
service ollama stop

3.3 Linux Intranet Offline Installation of Ollama

  1. Check the server CPU architecture
##Command to check the CPU architecture on a Linux system; my server's CPU is x86_64
lscpu
  2. Download the Ollama installation package for your CPU architecture and save it to a directory

Download address: https://github.com/ollama/ollama/releases/

#For an x86_64 CPU, download ollama-linux-amd64
#For an aarch64/arm64 CPU, download ollama-linux-arm64

#Download it on a machine with internet access
wget https://ollama.com/download/ollama-linux-amd64

Copy the downloaded file to the offline server and install it as /usr/bin/ollama (i.e. the downloaded ollama-linux-amd64 renamed with mv); the remaining steps are the same as the online installation, as sketched below.
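A minimal sketch of that placement step (file names follow the steps above):

# on the offline server, after copying ollama-linux-amd64 onto it
sudo mv ollama-linux-amd64 /usr/bin/ollama
sudo chmod +x /usr/bin/ollama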

3.4 Modifying the storage path

Ollama models are stored by default:

  • macOS: ~/.ollama/models
  • Linux: /usr/share/ollama/.ollama/models
  • Windows: C:\Users\<username>\.ollama\models

If Ollama runs as a systemd service, the environment variables should be set via systemctl:

  1. Edit the systemd service by calling systemctl edit ollama.service. This will open an editor.

  2. For each environment variable, add an Environment= line under the [Service] section.

Here, two lines were added directly to /etc/systemd/system/ollama.service:

[Service]
Environment="OLLAMA_HOST=0.0.0.0:7861"
Environment="OLLAMA_MODELS=/www/algorithm/LLM_model/models"
  3. Save and exit.

  4. Reload systemd and restart Ollama:

systemctl daemon-reload
systemctl restart ollama

Reference link: https://github.com/ollama/ollama/blob/main/docs/

  5. Use systemd to start Ollama:
sudo systemctl start ollama
  6. Stop the service

After stopping, the model loaded by ollama no longer occupies GPU memory, but ollama is then offline: deploy and run operations fail with the error below and the service has to be started again first.

Error: could not connect to ollama app, is it running?

systemctl stop ollama
  • Start again after stopping (once started, you can use ollama to deploy and run models again)
systemctl start ollama

3.5 Launching LLM

  • Download model
ollama pull llama3.1
ollama pull qwen2

  • Running the big model
ollama run llama3.1
ollama run qwen2

  • To check whether the models are recognized, run ollama list; if successful, the downloaded models are listed:
ollama list
NAME            ID              SIZE    MODIFIED    
qwen2:latest    e0d4e1163c58    4.4 GB  3 hours ago
  • Use the ollama ps command to view the models currently loaded into memory:
NAME            ID              SIZE    PROCESSOR       UNTIL              
qwen2:latest    e0d4e1163c58    5.7 GB  100% GPU        3 minutes from now
  • Check GPU usage with nvidia-smi:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.10              Driver Version: 535.86.10    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id         | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100-SXM2-32GB           On  | 00000000:00:08.0 Off |                    0 |
| N/A   35C    P0              56W / 300W |   5404MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A   3062036      C   ...unners/cuda_v11/ollama_llama_server     5402MiB |
+---------------------------------------------------------------------------------------+
  • Once started, we can verify that it is available:
curl http://10.80.2.195:7861/api/chat -d '{
  "model": "llama3.1",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ]
}'
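If you prefer a single, non-streaming response for a quick check, Ollama's generate endpoint can be called with streaming disabled (same host and port as above):

curl http://10.80.2.195:7861/api/generate -d '{
  "model": "llama3.1",
  "prompt": "why is the sky blue?",
  "stream": false
}'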

3.6 More Other Configurations

Environment variables that can be set by Ollama

  • OLLAMA_HOST: This variable defines which network interfaces Ollama listens on. By setting OLLAMA_HOST=0.0.0.0, we can allow Ollama to listen to all available network interfaces, thus allowing external network access.

  • OLLAMA_MODELS: This variable specifies the storage path for model files. By setting, for example, OLLAMA_MODELS=F:\OllamaCache, models can be stored on another drive to avoid running out of space on the C drive.

  • OLLAMA_KEEP_ALIVE: This variable controls how long the model stays alive in memory. Setting OLLAMA_KEEP_ALIVE=24h keeps the model in memory for 24 hours, increasing access speed.

  • OLLAMA_PORT: This variable is often suggested for changing Ollama's default port (e.g. OLLAMA_PORT=8080 instead of the default 11434), but as noted below it does not take effect in practice; change the port through OLLAMA_HOST instead.

  • OLLAMA_NUM_PARALLEL: This variable determines the number of user requests that Ollama can handle simultaneously. Setting OLLAMA_NUM_PARALLEL=4 allows Ollama to handle four concurrent requests.

  • OLLAMA_MAX_LOADED_MODELS: This variable limits the number of models that Ollama can load at the same time. Setting OLLAMA_MAX_LOADED_MODELS=4 ensures that system resources are allocated appropriately.
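Putting several of these together, the [Service] section of the ollama.service file created earlier might look like the following sketch (the paths and values are examples, not recommendations):

# /etc/systemd/system/ollama.service (excerpt)
[Service]
Environment="OLLAMA_HOST=0.0.0.0:7861"
Environment="OLLAMA_MODELS=/www/algorithm/LLM_model/models"
Environment="OLLAMA_KEEP_ALIVE=24h"
Environment="OLLAMA_NUM_PARALLEL=4"
Environment="OLLAMA_MAX_LOADED_MODELS=2"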

Environment="OLLAMA_PORT=9380" didn't work.

  • Specify it this way:Environment="OLLAMA_HOST=0.0.0.0:7861"

  • Specify GPUs
    If the machine has multiple GPUs, how do you run Ollama on specific ones? On Linux, edit the service configuration file below, set the environment variable CUDA_VISIBLE_DEVICES to the GPUs that Ollama should use, and then restart the Ollama service. (GPU indices start from 0.)

vim /etc/systemd/system/ollama.service
[Service]
Environment="CUDA_VISIBLE_DEVICES=0,1"

3.7 Ollama Common Commands

  1. Restart ollama
systemctl daemon-reload
systemctl restart ollama
  2. Restart the ollama service manually (Ubuntu / Debian)

sudo apt update
sudo apt install lsof
# stop ollama, find the process holding port 11434, kill it, then serve again
lsof -i :11434
kill <PID>
ollama serve



  3. Confirm the service port status:
netstat -tulpn | grep 11434

  4. Configure the service

The HOST needs to be configured in order for the service to be accessible to the outside environment.

Open the configuration file:

vim /etc/systemd/system/ollama.service

Modify the variable Environment as appropriate:

server environment:

Environment="OLLAMA_HOST=0.0.0.0:11434"

virtual machine environment:

Environment="OLLAMA_HOST=Server Intranet IP Address:11434"

3.8 Uninstalling Ollama

If you decide that you no longer want to use Ollama, you can remove it completely from your system by following these steps:

(1) Stop and disable the service:

sudo systemctl stop ollama
sudo systemctl disable ollama

(2) Delete the service file and the Ollama binary:

sudo rm /etc/systemd/system/ollama.service
sudo rm $(which ollama)

(3) Clean up Ollama users and groups:

sudo rm -r /usr/share/ollama
sudo userdel ollama
sudo groupdel ollama

By following these steps, you will not only be able to successfully install and configure Ollama on the Linux platform, but also have the flexibility to update and uninstall it.

4. Configure LLM+Dify

  • Confirm the service port status:
netstat -tulnp | grep ollama
#netstat -tulpn | grep 11434

  • If you see the error: "Error: could not connect to ollama app, is it running?"

Reference link: https://stackoverflow.com/questions/78437376/run-ollama-run-llama3-in-colab-raise-err-error-could-not-connect-to-ollama

Make sure the /etc/systemd/system/ollama.service file contains:

[Service]
ExecStart=/usr/local/bin/ollama serve
Environment="OLLAMA_HOST=0.0.0.0:7861"
Environment="OLLAMA_KEEP_ALIVE=-1"
  • Runtime commands
export OLLAMA_HOST=0.0.0.0:7861
ollama list
ollama run llama3.1

#You can also make this permanent by adding the export line above to your shell profile
vim ~/.bashrc     # add: export OLLAMA_HOST=0.0.0.0:7861
source ~/.bashrc

Fill in Settings > Model Provider > Ollama:

  • Model Name: llama3.1

  • Base URL: http://<your-ollama-endpoint-domain>:11434

    • Enter an address at which the Ollama service is reachable.
    • If Dify is deployed with Docker, it is recommended to fill in a LAN IP address, e.g. http://10.80.2.195:11434, or the Docker host IP address, e.g. http://172.17.0.1:11434.
    • If Dify is deployed from local source, you can fill in http://localhost:11434.
  • Model type: Chat

  • Model context length: 4096

    • The maximum context length of the model; if you are not sure, use the default value 4096.
  • Maximum token limit: 4096

    • The maximum number of tokens the model may return; unless the model specifies otherwise, this can match the model context length.
  • Vision support: Yes

    • Check this when the model supports image understanding (multimodal), e.g. llava.
  • Click "Save"; once the model is validated, it can be used in your applications.

  • The Embedding model is accessed in a similar way to LLM, by changing the model type to Text Embedding.
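For example, a text-embedding model can be pulled with Ollama and then added in Dify with the model type set to Text Embedding (nomic-embed-text is just one commonly used model from the Ollama library):

ollama pull nomic-embed-text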

  • If you deploy Dify and Ollama with Docker, you may encounter the following error.
httpconnectionpool(host=127.0.0.1, port=11434): max retries exceeded with url:/cpi/chat (Caused by NewConnectionError('< object at 0x7f8562812c20>: fail to establish a new connection:[Errno 111] Connection refused'))

httpconnectionpool(host=localhost, port=11434): max retries exceeded with url:/cpi/chat (Caused by NewConnectionError('< object at 0x7f8562812c20>: fail to establish a new connection:[Errno 111] Connection refused'))

This error is due to the Docker container being unable to access the Ollama service. localhost usually refers to the container itself, not the host or another container. To resolve this issue, you need to expose the Ollama service to the network.
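A common workaround, sketched below, is to make Ollama listen on all interfaces and point Dify at the host's address rather than at localhost (the LAN IP is an example; on Docker Desktop, host.docker.internal can be used instead):

# on the host running Ollama: listen on all interfaces, then restart the service
# (in /etc/systemd/system/ollama.service, under [Service])
Environment="OLLAMA_HOST=0.0.0.0"

# in Dify's model provider settings, use the host address instead of localhost, e.g.
#   http://10.80.2.195:11434            (LAN IP)
#   http://host.docker.internal:11434   (Docker Desktop)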

4.1. Multi-model comparison

To compare multiple models, simply repeat the single-model configuration steps above to add another model.

  • Note that after adding a new model configuration you need to refresh the Dify web page; a simple browser refresh is enough, and the newly added model will be loaded.

  • After a call, you can see the model's resource consumption.

More recommendations

  • (Cpolar Intranet Penetration) Linux System Docker Builds Dify Platform and Realizes Remote Building of Generative AI Applications

  • More LLM platform references:

    • RAG+AI Workflow+Agent: How to Choose an LLM Framework, a Comprehensive Comparison of MaxKB, Dify, FastGPT, RagFlow, Anything-LLM, and More!

    • Wisdom wins the future: domestic large model + Agent application case selection, as well as the mainstream Agent framework open source project recommendation

  • Official website: https://dify.ai/zh

  • GitHub address: https://github.com/langgenius/dify/tree/main

  • ollama Chinese website:/

  • Installation tutorial for ollama:/getting-started/linux/

  • Ollama Linux Deployment and Applications LLama 3

For more quality content, follow the WeChat public account and CSDN: Ting, Artificial Intelligence; related resources and quality articles are provided there, free to read.