Technical background
This article describes how to convert a bin-format model file on Hugging Face into a safetensors-format model file and then download it locally.
bin to safetensors
First install safetensors:
$ python3 -m pip install safetensors --upgrade
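To confirm that the install worked, a quick check along the following lines may help. This is a minimal sketch that is not part of the original workflow; `installed_version` is a hypothetical helper name, and it relies only on the standard library's `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("safetensors")
    print("safetensors:", found if found else "not installed")
```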
Then clone the safetensors repository from GitHub:
$ git clone /huggingface/
Cloning into 'safetensors'...
remote: Enumerating objects: 4812, done.
remote: Counting objects: 100% (1486/1486), done.
remote: Compressing objects: 100% (406/406), done.
remote: Total 4812 (delta 1340), reused 1082 (delta 1079), pack-reused 3326 (from 2)
Receiving objects: 100% (4812/4812), 1.15 MiB | 1.22 MiB/s, done.
Resolving deltas: 100% (2457/2457), done.
Go to the subdirectory:
$ cd safetensors/bindings/python/
$ ll
total 84
drwxrwxr-x 6 dechin dechin 4096 February 21 16:37 ./
drwxrwxr-x 3 dechin dechin 4096 February 21 16:37 ../
drwxrwxr-x 2 dechin dechin 4096 February 21 16:37 benches/
-rw-rw-r-- 1 dechin dechin 476 February 21 16:37
-rw-rw-r-- 1 dechin dechin 1454 February 21 16:37 convert_all.py
-rw-rw-r-- 1 dechin dechin 14769 February 21 16:37
-rw-rw-r-- 1 dechin dechin 729 February 21 16:37
-rw-rw-r-- 1 dechin dechin 685 February 21 16:37 .gitignore
-rw-rw-r-- 1 dechin dechin 1103 February 21 16:37 Makefile
-rw-rw-r-- 1 dechin dechin 190 February 21 16:37
-rw-rw-r-- 1 dechin dechin 2419 February 21 16:37
drwxrwxr-x 3 dechin dechin 4096 February 21 16:37 py_src/
-rw-rw-r-- 1 dechin dechin 852 February 21 16:37
-rw-rw-r-- 1 dechin dechin 891 February 21 16:37
drwxrwxr-x 2 dechin dechin 4096 February 21 16:37 src/
-rw-rw-r-- 1 dechin dechin 5612 February 21 16:37
drwxrwxr-x 3 dechin dechin 4096 February 21 16:37 tests/
One of these files is the format conversion script. View its usage:
$ python3 --help
usage: [-h] [--revision REVISION] [--force] [-y] model_id
Simple utility tool to convert automatically some weights on the hub to `safetensors` format. It is PyTorch
exclusive for now. It works by downloading the weights (PT), converting them locally, and uploading them back as a
PR on the hub.
positional arguments:
model_id The name of the model on the hub to convert. E.g. `gpt2` or `facebook/wav2vec2-base-960h`
options:
-h, --help show this help message and exit
--revision REVISION The revision to convert
--force Create the PR even if it already exists of if the model was already converted.
-y Ignore safety prompt
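To make the command-line interface above concrete, here is a minimal sketch of an `argparse` parser that accepts the same arguments. It is an illustrative reconstruction from the help text only, not the script's actual source; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the same arguments as the help text above."""
    parser = argparse.ArgumentParser(
        description="Illustrative reconstruction of the conversion script's CLI."
    )
    # Positional: the Hub model ID, e.g. `gpt2`.
    parser.add_argument("model_id", type=str, help="Name of the model on the hub to convert")
    parser.add_argument("--revision", type=str, help="The revision to convert")
    parser.add_argument("--force", action="store_true", help="Create the PR even if it already exists")
    parser.add_argument("-y", action="store_true", help="Ignore safety prompt")
    return parser


if __name__ == "__main__":
    # Same arguments as the invocation used in this article.
    args = build_parser().parse_args(
        ["--force", "-y", "Salesforce/blip-image-captioning-base"]
    )
    print(args.model_id, args.revision, args.force, args.y)
```

Note that `--revision` is optional, so it stays `None` unless a specific revision is requested.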
This script takes a model ID on the Hub, downloads the weights, and converts them to safetensors format. If it is run directly without logging in, however, the download succeeds but the upload of the PR fails with a 401 error:
$ python3 --force -y Salesforce/blip-image-captioning-base
: 100%|█████████████████████████████████████████████████████████████| 4.56k/4.56k [00:00<00:00, 15.3MB/s]
pytorch_model.bin: 100%|█████████████████████████████████████████████████████████| 990M/990M [02:06<00:00, 7.82MB/s]
Traceback (most recent call last):
File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 406, in hf_raise_for_status
response.raise_for_status()
File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/requests/", line 1024, in raise_for_status
raise HTTPError(http_error_msg, response=self)
: 401 Client Error: Unauthorized for url: /api/models/Salesforce/blip-image-captioning-base/preupload/main?create_pr=1
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/datb/DeepSeek/safetensors/bindings/python/", line 369, in <module>
commit_info, errors = convert(api, model_id, revision=, force=)
File "/datb/DeepSeek/safetensors/bindings/python/", line 313, in convert
new_pr = api.create_commit(
File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 1524, in _inner
return fn(self, *args, **kwargs)
File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 3961, in create_commit
self.preupload_lfs_files(
File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 4184, in preupload_lfs_files
_fetch_upload_modes(
File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/huggingface_hub/_commit_api.py", line 542, in _fetch_upload_modes
hf_raise_for_status(resp)
File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 454, in hf_raise_for_status
raise _format(RepositoryNotFoundError, message, response) from e
huggingface_hub.: 401 Client Error. (Request ID: Root=1-67b83d70-5af65b805cde0ba55c72abd1;ce2be295-8da7-4230-a8db-f505919ddb85)
Repository Not Found for url: /api/models/Salesforce/blip-image-captioning-base/preupload/main?create_pr=1.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.
Note: Creating a commit assumes that the repo already exists on the Huggingface Hub. Please use `create_repo` if it's not the case.
This requires registering a Hugging Face account and then creating an access token, which is free.
Then add a login call with your token as the first two lines of the conversion script:
$ head -n 2
from huggingface_hub import login
login("your_token")
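Hard-coding a token in a script makes it easy to leak by accident. As a variation on the two lines above, the token could be read from an environment variable instead. This is a minimal sketch under stated assumptions: the variable name `HF_TOKEN` and the helpers `read_token` and `login_from_env` are choices made for this example, and only `huggingface_hub.login` comes from the original.

```python
import os


def read_token(env=os.environ) -> str:
    """Return the token stored in HF_TOKEN, or raise if it is missing or empty."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("Set HF_TOKEN to a Hugging Face access token first.")
    return token


def login_from_env() -> None:
    """Log in to the Hub with the token taken from the environment."""
    # Imported lazily so read_token() works without huggingface_hub installed.
    from huggingface_hub import login

    login(token=read_token())
```

With this in place, the first lines of the conversion script would call `login_from_env()` instead of `login("your_token")`, after running `export HF_TOKEN=...` in the shell.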
Execute the conversion script again:
$ python3 --force -y Salesforce/blip-image-captioning-base
: 100%|█████████████████████████████████████████████████████████████| 4.56k/4.56k [00:00<00:00, 16.2MB/s]
pytorch_model.bin: 100%|█████████████████████████████████████████████████████████| 990M/990M [02:04<00:00, 7.94MB/s]
Pr created at /Salesforce/blip-image-captioning-base/discussions/42
### Success 🔥
Yay! This model was successfully converted and a PR was open using your token, here:
[/Salesforce/blip-image-captioning-base/discussions/42](/Salesforce/blip-image-captioning-base/discussions/42)
The safetensors file was built successfully and submitted as a Pull Request. Even if this PR is never merged, we can still download the converted model file from our own PR.
Download the repository from HF
We can use git-lfs to download the model file from the PR on Hugging Face. First, we download all the small (non-LFS) files from the main branch:
$ GIT_LFS_SKIP_SMUDGE=1 git clone /Salesforce/blip-image-captioning-base
Cloning into 'blip-image-captioning-base'...
remote: Enumerating objects: 76, done.
remote: Counting objects: 100% (76/76), done.
remote: Compressing objects: 100% (38/38), done.
remote: Total 76 (delta 39), reused 72 (delta 37), pack-reused 0 (from 0)
Unpacking objects: 100% (76/76), 323.20 KiB | 1.05 MiB/s, done.
After downloading the lightweight files, enter the cloned directory:
$ cd blip-image-captioning-base/
$ ll
total 976
drwxrwxr-x 3 dechin dechin 4096 February 21 17:22 ./
drwxrwxr-x 3 dechin dechin 4096 February 21 17:22 ../
-rw-rw-r-- 1 dechin dechin 4563 February 21 17:22
drwxrwxr-x 9 dechin dechin 4096 February 21 17:22 .git/
-rw-rw-r-- 1 dechin dechin 1477 February 21 17:22 .gitattributes
-rw-rw-r-- 1 dechin dechin 287 February 21 17:22 preprocessor_config.json
-rw-rw-r-- 1 dechin dechin 134 February 21 17:22 pytorch_model.bin
-rw-rw-r-- 1 dechin dechin 6359 February 21 17:22
-rw-rw-r-- 1 dechin dechin 125 February 21 17:22 special_tokens_map.json
-rw-rw-r-- 1 dechin dechin 134 February 21 17:22 tf_model.h5
-rw-rw-r-- 1 dechin dechin 506 February 21 17:22 tokenizer_config.json
-rw-rw-r-- 1 dechin dechin 711396 February 21 17:22
-rw-rw-r-- 1 dechin dechin 231508 February 21 17:22
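The 134-byte `pytorch_model.bin` and `tf_model.h5` in the listing above are Git LFS pointer files, not real weights: with `GIT_LFS_SKIP_SMUDGE=1`, git checks out only a short text stub recording the object's hash and size. The following sketch shows one way to detect such stubs; `is_lfs_pointer` is a hypothetical helper, and the 1024-byte cutoff is an arbitrary threshold chosen for this example.

```python
from pathlib import Path

# Every Git LFS pointer file starts with this spec line.
LFS_SPEC_LINE = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path) -> bool:
    """Return True if `path` looks like a Git LFS pointer rather than real content."""
    p = Path(path)
    # Pointer files are tiny text files, so skip anything large without reading it.
    if not p.is_file() or p.stat().st_size > 1024:
        return False
    return p.read_bytes().startswith(LFS_SPEC_LINE)


if __name__ == "__main__":
    # Report every pointer file in the current directory.
    for entry in sorted(Path(".").iterdir()):
        if is_lfs_pointer(entry):
            print(f"{entry.name}: LFS pointer (content not downloaded)")
```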
You can see that the large model files have not been downloaded yet. Now pull the contents of our own PR branch in this directory:
$ git pull origin refs/pr/42
remote: Enumerating objects: 4, done.
remote: Counting objects: 100% (4/4), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 3 (delta 1), reused 0 (delta 0), pack-reused 0 (from 0)
Unpacking objects: 100% (3/3), 1.42 KiB | 1.42 MiB/s, done.
From /Salesforce/blip-image-captioning-base
* branch refs/pr/42 -> FETCH_HEAD
Updating 82a3776..0f2f8a0
Fast-forward
| 3+++
1 file changed, 3 insertions(+)
create mode 100644
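The same PR ref can also be addressed from Python rather than git. The sketch below assumes the `huggingface_hub` package is installed and uses its `hf_hub_download` function with a `revision` of the form `refs/pr/<number>`. The helpers `pr_revision` and `download_from_pr` are hypothetical names, and the file name to fetch must be supplied by the caller, since it depends on what the PR actually added.

```python
def pr_revision(pr_number: int) -> str:
    """Build the git ref that the Hub uses for a given PR number."""
    if pr_number <= 0:
        raise ValueError("PR numbers start at 1")
    return f"refs/pr/{pr_number}"


def download_from_pr(repo_id: str, filename: str, pr_number: int) -> str:
    """Download one file from a PR and return its local cache path."""
    # Imported lazily so pr_revision() works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=repo_id,
        filename=filename,
        revision=pr_revision(pr_number),
    )
```

Unlike the git approach, this fetches a single file into the local Hugging Face cache instead of producing a full working copy of the repository.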
The ref after origin identifies our PR (refs/pr/42 here); the PR number can be found on the corresponding discussion page on Hugging Face. Check the local directory again:
$ ll
total 967508
drwxrwxr-x 3 dechin dechin 4096 February 21 17:24 ./
drwxrwxr-x 3 dechin dechin 4096 February 21 17:22 ../
-rw-rw-r-- 1 dechin dechin 4563 February 21 17:22
drwxrwxr-x 9 dechin dechin 4096 February 21 17:24 .git/
-rw-rw-r-- 1 dechin dechin 1477 February 21 17:22 .gitattributes
-rw-rw-r-- 1 dechin dechin 989721336 February 21 17:24
-rw-rw-r-- 1 dechin dechin 287 February 21 17:22 preprocessor_config.json
-rw-rw-r-- 1 dechin dechin 134 February 21 17:22 pytorch_model.bin
-rw-rw-r-- 1 dechin dechin 6359 February 21 17:22
-rw-rw-r-- 1 dechin dechin 125 February 21 17:22 special_tokens_map.json
-rw-rw-r-- 1 dechin dechin 134 February 21 17:22 tf_model.h5
-rw-rw-r-- 1 dechin dechin 506 February 21 17:22 tokenizer_config.json
-rw-rw-r-- 1 dechin dechin 711396 February 21 17:22
-rw-rw-r-- 1 dechin dechin 231508 February 21 17:22
You can see that the safetensors model file has been downloaded successfully, which completes the process of converting the model format online and then downloading the result locally.
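As a sanity check on the downloaded file, its header can be inspected without loading any weights. A safetensors file begins with an 8-byte little-endian unsigned integer giving the header length, followed by a JSON header that maps tensor names to their dtype, shape, and data offsets. The sketch below reads only that header with the standard library; `read_safetensors_header` and `summarize` are hypothetical helper names.

```python
import json
import struct
import sys


def read_safetensors_header(path) -> dict:
    """Return the JSON header of a .safetensors file as a dict."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit header length.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len).decode("utf-8"))


def summarize(path) -> dict:
    """Map each tensor name to its (dtype, shape), skipping optional metadata."""
    header = read_safetensors_header(path)
    return {
        name: (info["dtype"], info["shape"])
        for name, info in header.items()
        if name != "__metadata__"  # optional string-to-string map, not a tensor
    }


if __name__ == "__main__":
    # Usage: python3 this_script.py <path-to-safetensors-file>
    for name, (dtype, shape) in summarize(sys.argv[1]).items():
        print(name, dtype, shape)
```

If this is run on a file that is still a Git LFS pointer, the header will fail to parse, which is itself a useful signal that the real content was not pulled.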
Summary
This article introduced a method for converting bin-format model files hosted on Hugging Face to safetensors format online and then downloading the converted files locally.
Copyright Statement
This article was first published at: /dechinphy/p/
Author ID: DechinPhy