llama.cpp is the underlying implementation of Ollama, LM Studio, and many other popular projects, and it is also one of the inference engines supported by GPUStack. It provides the GGUF model file format, a binary format for storing models for inference that is designed for fast loading and running of models.
Quantized models are also supported, reducing model storage and computation requirements while maintaining high accuracy, which allows large models to be deployed efficiently, and with faster inference, on desktops, embedded devices, and other resource-constrained environments.
This post is a tutorial on how to create and quantize GGUF models and upload them to the HuggingFace and ModelScope model repositories.
Registering and Configuring HuggingFace and ModelScope
- Register for HuggingFace
Visit https://huggingface.co/join and sign up for a HuggingFace account (access to huggingface.co may require a proxy in some regions).
- Configure the HuggingFace SSH public key
Add the SSH public key of your local environment to HuggingFace. Check the SSH public key of your local environment (if you don't have one, generate it with ssh-keygen -t rsa -b 4096):
cat ~/.ssh/id_rsa.pub
Click your avatar in the upper right corner of HuggingFace, select Settings - SSH and GPG Keys, and add the public key above. It will be used for authentication when uploading the model later.
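Optionally, you can check that the key works before uploading later (a hedged example; hf.co is the SSH host HuggingFace uses for Git, and a successful check greets you by username):
ssh -T git@hf.co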
- Register for ModelScope
Visit https://www.modelscope.cn/register?back=%2Fhome and register a ModelScope account.
- Get the ModelScope Git access token
Visit https://www.modelscope.cn/my/myaccesstoken, copy the Git access token, and save it for authentication when uploading the model later.
Prepare the Environment
Create and activate a Conda environment (if Conda is not installed, refer to the Miniconda installation guide: https://docs.anaconda.com/miniconda/):
conda create -n llama-cpp python=3.12 -y
conda activate llama-cpp
which python
pip -V
Clone the llama.cpp code (the b4034 tag is used here) and compile the binaries needed for quantization:
cd ~
git clone -b b4034 https://github.com/ggerganov/llama.cpp.git
cd llama.cpp/
pip install -r requirements.txt
brew install cmake
make
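The build can take a while; if you want to speed it up, make accepts a parallel-jobs flag (adjust the number to your CPU core count):
make -j 8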
Once the compilation is complete, run the following command to confirm that the llama-quantize binary needed for quantization is available:
./llama-quantize --help
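You can also confirm that the conversion script used later is available; convert_hf_to_gguf.py ships in the root of the llama.cpp repository (a quick hedged check):
python3 convert_hf_to_gguf.py --help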
Download the original model
Download the original model that needs to be converted to GGUF format and quantized.
Download the model from HuggingFace using the huggingface-cli command provided by HuggingFace. First install the dependency:
pip install -U huggingface_hub
If you are downloading from mainland China, configure a mirror endpoint (hf-mirror.com is a commonly used mirror):
export HF_ENDPOINT=https://hf-mirror.com
Here we download the meta-llama/Llama-3.2-3B-Instruct model, which is a Gated model. You need to fill out an access request on HuggingFace and confirm that you have been granted access:
Click your avatar in the upper right corner of HuggingFace, select Access Tokens, create a token with Read permission, and save it:
Download the meta-llama/Llama-3.2-3B-Instruct model into a working directory (~/models is used as an example here; adjust it to your environment). --local-dir specifies the directory to save to, and --token specifies the Access Token created above:
mkdir ~/models
cd ~/models/
huggingface-cli download meta-llama/Llama-3.2-3B-Instruct --local-dir Llama-3.2-3B-Instruct --token hf_abcdefghijklmnopqrstuvwxyz
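Once the download finishes, it is worth listing the directory to make sure the weights and configuration files are all present before converting (a simple sanity check):
ls -lh Llama-3.2-3B-Instruct/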
Convert to GGUF Format and Quantize the Model
Create a script for converting to GGUF format and quantizing the model (the script name convert-and-quantize.sh is used as an example here):
cd ~/models/
vim convert-and-quantize.sh
Fill in the script contents below. Change the llama_cpp and b directory paths to the actual paths of your environment (they must be absolute paths), and change gpustack in the d variable to your HuggingFace username:
#!/usr/bin/env bash
llama_cpp="/Users/gpustack/llama.cpp"   # absolute path to the llama.cpp checkout
b="/Users/gpustack/models"              # absolute path to the working directory with the downloaded model
export PATH="$PATH:${llama_cpp}"
s="$1"
n="$(echo "${s}" | cut -d'/' -f2)"
d="gpustack/${n}-GGUF"
# prepare
mkdir -p ${b}/${d} 1>/dev/null 2>&1
pushd ${b}/${d} 1>/dev/null 2>&1
git init . 1>/dev/null 2>&1
if [[ ! -f .gitattributes ]]; then
cp -f ${b}/${s}/.gitattributes . 1>/dev/null 2>&1 || true
echo "*.gguf filter=lfs diff=lfs merge=lfs -text" >> .gitattributes
fi
if [[ ! -d assets ]]; then
cp -rf ${b}/${s}/assets . 1>/dev/null 2>&1 || true
fi
if [[ ! -d images ]]; then
cp -rf ${b}/${s}/images . 1>/dev/null 2>&1 || true
fi
if [[ ! -d imgs ]]; then
cp -rf ${b}/${s}/imgs . 1>/dev/null 2>&1 || true
fi
if [[ ! -f README.md ]]; then
cp -f ${b}/${s}/README.md . 1>/dev/null 2>&1 || true
fi
set -e
pushd ${llama_cpp} 1>/dev/null 2>&1
# convert
[[ -f venv/bin/activate ]] && source venv/bin/activate
echo "#### convert_hf_to_gguf.py ${b}/${s} --outfile ${b}/${d}/${n}-"
python3 convert_hf_to_gguf.py ${b}/${s} --outfile ${b}/${d}/${n}-
# quantize
qs=(
"Q8_0"
"Q6_K"
"Q5_K_M"
"Q5_0"
"Q4_K_M"
"Q4_0"
"Q3_K"
"Q2_K"
)
for q in "${qs[@]}"; do
echo "#### llama-quantize ${b}/${d}/${n}- ${b}/${d}/${n}-${q}.gguf ${q}"
llama-quantize ${b}/${d}/${n}- ${b}/${d}/${n}-${q}.gguf ${q}
ls -lth ${b}/${d}
sleep 3
done
popd 1>/dev/null 2>&1
set +e
Run the script to start the conversion. It converts the model to an FP16-precision GGUF model and then quantizes it with the Q8_0, Q6_K, Q5_K_M, Q5_0, Q4_K_M, Q4_0, Q3_K, and Q2_K methods:
bash convert-and-quantize.sh Llama-3.2-3B-Instruct
After the script finishes, confirm that the FP16-precision GGUF model and the quantized GGUF models were generated successfully.
The models are stored in the directory named after the user name:
ll gpustack/Llama-3.2-3B-Instruct-GGUF/
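Optionally, you can smoke-test one of the quantized files with llama-cli, which is built alongside llama-quantize in the llama.cpp directory (a hedged example; the paths assume the working directory and llama.cpp checkout used above):
~/llama.cpp/llama-cli -m gpustack/Llama-3.2-3B-Instruct-GGUF/Llama-3.2-3B-Instruct-Q4_K_M.gguf -p "Hello" -n 32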
Uploading models to HuggingFace
Click your avatar in the upper right corner of HuggingFace, select New Model, and create a model repository named after the original model in the format <original model name>-GGUF:
Update the README of the model:
cd ~/models/gpustack/Llama-3.2-3B-Instruct-GGUF
vim README.md
For maintainability, after the opening metadata, record the original model and the llama.cpp commit that was used. Note that these should be changed to match the actual original model and the actual llama.cpp commit you used:
# Llama-3.2-3B-Instruct-GGUF
**Model creator**: [meta-llama](https://huggingface.co/meta-llama)<br/>
**Original model**: [Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)<br/>
**GGUF quantization**: based on llama.cpp release [b8deef0e](https://github.com/ggerganov/llama.cpp/commit/b8deef0ec0af5febac1d2cfd9119ff330ed0b762)
---
To prepare for the upload, install Git LFS to manage large file uploads:
brew install git-lfs
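After installing, Git LFS typically also needs to be initialized once for your Git configuration (harmless to run if it has already been done):
git lfs install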
Add a remote repository:
git remote add origin git@hf.co:gpustack/Llama-3.2-3B-Instruct-GGUF
Add the files, confirm the files to be committed with git ls-files, and confirm with git lfs ls-files that all .gguf files are managed and uploaded by Git LFS:
git add .
git ls-files
git lfs ls-files
To upload files larger than 5 GB to HuggingFace, you need to enable large file uploads. Log in to HuggingFace from the command line and enter the Token created in the download section above:
huggingface-cli login
Enable large file uploads for the current directory:
huggingface-cli lfs-enable-largefiles .
Upload the model to the HuggingFace repository:
git commit -m "feat: first commit" --signoff
git push origin main -f
Once the upload is complete, confirm in HuggingFace that the model file was uploaded successfully.
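If you want to double-check from the command line as well, one hedged way is to pull a single quantized file back from the new repository with huggingface-cli (replace gpustack with your own username):
huggingface-cli download gpustack/Llama-3.2-3B-Instruct-GGUF --include "*Q4_K_M.gguf" --local-dir /tmp/gguf-check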
Uploading models to ModelScope
Click your avatar in the upper right corner of ModelScope, select Create Model, and create a model repository named after the original model in the format <original model name>-GGUF. Fill in the model description and other configuration such as the License, model type, AI framework, and whether the model is public:
Upload the local repository's README.md file and create the repository:
Add the remote repository. The ModelScope Git access token obtained at the beginning of this article is used for authentication when uploading the model:
git remote add modelscope https://oauth2:xxxxxxxxxxxxxxxxxxxx@www.modelscope.cn/gpustack/Llama-3.2-3B-Instruct-GGUF.git
Fetch the files that already exist in the remote repository:
git fetch modelscope master
Since ModelScope uses the master branch rather than the main branch, you need to switch to the master branch and use cherry-pick to bring the files from the main branch onto the master branch. First check and note the current Commit ID:
git log
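If you prefer to capture the Commit ID directly rather than copying it out of the log, a convenient alternative is to ask Git for the tip of main:
git rev-parse main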
Switch to the master branch and, using the Commit ID of the main branch, bring the files from the main branch onto the master branch:
git checkout FETCH_HEAD -b master
git cherry-pick -n 833fb20e5b07231e66c677180f95e27376eb25c6
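At this point git status shows which files are in conflict; in this tutorial's setup that is typically the .gitattributes copied from the original model:
git status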
Modify the conflicting file to resolve the conflict (you can merge *.gguf filter=lfs diff=lfs merge=lfs -text into the original model's .gitattributes; refer to the related logic in the script above):
vim .gitattributes
Add the files, confirm the files to be committed with git ls-files, and confirm with git lfs ls-files that all .gguf files are managed and uploaded by Git LFS:
git add .
git ls-files
git lfs ls-files
Upload the model to the ModelScope repository:
git commit -m "feat: first commit" --signoff
git push modelscope master -f
When the upload is complete, confirm in ModelScope that the model file was uploaded successfully.
Summary
The above is a tutorial on creating and quantizing GGUF models and uploading them to the HuggingFace and ModelScope model repositories.
GGUF is the model file format that llama.cpp requires to run models. Its flexibility and efficiency make it well suited to model inference in resource-constrained scenarios, and it is widely used. I hope the tutorial above helps you manage GGUF model files.
If you found it useful, feel free to like, share, and follow.