
LLM-01: Local LLM Deployment: Running ChatGLM2-6B-INT4 (6GB), an Easy-to-Start Guide to Environment Setup, Single-Machine Single-Card and Multi-Card (2070 Super 8GB x 2), Leveling Up One Monster at a Time!


Repost note

I've been posting articles on CSDN for a while and figured I'd share some of the better ones here as well. This entry was originally posted on CSDN on 2024-04-17 10:11:55.

A few words up front

Other GPU setups will work too, but you'll want at least 8GB of VRAM or you'll run out of memory very easily.
If you have multiple graphics cards, single-machine multi-card is a great option!

Background

I'm currently borrowing a server from the algorithms team. Let's check the current GPU situation:

nvidia-smi

PS: I have since upgraded CUDA and related components; see my other post for the details of the upgrade process.
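
Besides nvidia-smi, once your Python environment is ready (set up below), you can confirm from Python that PyTorch sees both cards. A minimal sketch, assuming torch is installed:

# Quick sanity check that PyTorch can see the GPUs
import torch

print(torch.cuda.is_available())   # expect True
print(torch.cuda.device_count())   # expect 2 on this machine
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))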

Project Address

Official address:

# The project needs to be cloned
https://github.com/THUDM/ChatGLM2-6B
# Model download (if you can't reach Hugging Face directly, you'll have to download it manually from Tsinghua Cloud)
https://cloud.tsinghua.edu.cn/d/674208019e314311ab5c/?p=%2Fchatglm2-6b-int4&mode=list
# Model download (if you can reach Hugging Face directly, the official download experience is smoother)
https://huggingface.co/THUDM/chatglm2-6b-int4

We need to clone the project and download the corresponding model. If you can reach Hugging Face directly, you can skip the manual model download; the model will be downloaded automatically when the project starts.
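
If you do need to download manually but prefer the command line, one option (not part of the original steps; assumes pip install huggingface_hub and direct access to huggingface.co) is the huggingface_hub downloader:

# Sketch: fetch the INT4 model into a local folder with huggingface_hub
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="THUDM/chatglm2-6b-int4",
    local_dir="/home/jp/wzk/chatglm2-6b-int4/chatglm2-6b-int4",  # your model folder
)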

Configuration Requirements

According to the official description, you can see the GPU memory requirements for each quantization level. Given my setup (2070 Super 8GB x 2), I chose to download the INT4 model.
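
For reference, the INT4 model can be smoke-tested on a single card following the usage shown in the official README (model_path can be the Hugging Face ID or your local model folder):

# Minimal INT4 smoke test, per the official README usage
from transformers import AutoModel, AutoTokenizer

model_path = "THUDM/chatglm2-6b-int4"  # or your local model folder
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(model_path, trust_remote_code=True).cuda()
model = model.eval()

response, history = model.chat(tokenizer, "Hello", history=[])
print(response)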

Installing Pyenv

Since different projects require different Python versions (and different dependency versions), you'll want an isolated environment per project.
You can use Conda, pyenv, or Docker for this; the option I chose is pyenv.

# pyenv official address
https://github.com/pyenv/pyenv

Once the installation is complete, remember to configure the environment variables:

echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
echo 'command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
echo 'eval "$(pyenv init -)"' >> ~/.bashrc

If you're using ZSH as I am:

echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.zshrc
echo '[[ -d $PYENV_ROOT/bin ]] && export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.zshrc
echo 'eval "$(pyenv init -)"' >> ~/.zshrc

Testing Pyenv

# List the Python versions pyenv knows about on your system
pyenv versions

Using Pyenv

# Set the Python version for this directory (install it first if needed: pyenv install 3.10)
pyenv local 3.10
# Create an isolated virtual environment
python -m venv env
# Activate the environment
source env/bin/activate

You should see output similar to mine (I tested this part on a MacBook).

Installation of dependencies

# cd to the project directory (with the venv from above still active)
cd /home/jp/wzk/chatglm2-6b-int4/ChatGLM2-6B
# Install the Python dependencies
pip install -r requirements.txt

Note: there are two separate directories here (this layout is from my server; work out where to put things on your own machine), as shown below:

  • Project folder /home/jp/wzk/chatglm2-6b-int4/ChatGLM2-6B
  • Model folder /home/jp/wzk/chatglm2-6b-int4/chatglm2-6b-int4
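
To avoid path mix-ups, here's a quick check that both directories exist (the paths are mine; substitute your own):

# Sanity-check the project and model directories
import os

project_dir = "/home/jp/wzk/chatglm2-6b-int4/ChatGLM2-6B"
model_dir = "/home/jp/wzk/chatglm2-6b-int4/chatglm2-6b-int4"
for path in (project_dir, model_dir):
    print(path, "->", "ok" if os.path.isdir(path) else "MISSING")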


Starting the project

In the project directory, we can start directly with the ready-made web_demo.py:

# Open it first and take a look
vim web_demo.py

Set model_path to the model folder you downloaded. (If you didn't download manually, you can leave it unchanged and the model will be downloaded automatically.)
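
For orientation, the model-loading lines near the top of web_demo.py look roughly like this (exact code may vary by version; the local path is an example):

# What you are editing in web_demo.py, approximately
from transformers import AutoModel, AutoTokenizer

model_path = "/home/jp/wzk/chatglm2-6b-int4/chatglm2-6b-int4"  # point this at your model folder
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(model_path, trust_remote_code=True).cuda()
model = model.eval()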

PS: You also need to go to the last line and change how the service is exposed:

# The last line, modified to listen on all interfaces
demo.queue().launch(server_name="0.0.0.0", server_port=7861, share=False, inbrowser=True)

Save and exit, then start the service:

python web_demo.py

Using the project

Once the above is done, wait a moment and the web UI will come up.

Then just visit the page using your server's IP and the port you configured.
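
If you're working on a headless server, you can also sanity-check reachability from Python (the IP below is a placeholder; use your server's):

# Check that the Gradio app answers over HTTP
import requests

resp = requests.get("http://192.168.1.100:7861", timeout=5)  # hypothetical server IP
print(resp.status_code)  # 200 means the web demo is up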

Multi-card startup

Since a single card can easily OOM, and I happen to have 2 x 2070 Super 8GB here, we can split the model across the two cards with a simple code change.
The official solution is to load the model via the accelerate library.

Modify the web_demo.py from before; the change is on the model-loading line:

# Import the multi-GPU loader shipped with the repo, then set the number of GPUs to 2
from utils import load_model_on_gpus
model = load_model_on_gpus(model_path, num_gpus=2)

Just restart the service, and you're running on multiple cards!
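
To confirm the split actually happened, you can watch nvidia-smi (both cards should show a few GB in use), or print per-GPU memory from inside the running Python process, e.g. temporarily added to web_demo.py after the model loads:

# Per-GPU memory check from inside the process that loaded the model
import torch

for i in range(torch.cuda.device_count()):
    gib = torch.cuda.memory_allocated(i) / 1024**3
    print(f"GPU {i}: {gib:.2f} GiB allocated")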