Say goodbye to Hugging Face model download problems: master efficient download strategies and enjoy a seamless development experience!

Popularity:329 ℃/2024-08-08 10:15:29

Hugging Face open-source mirror in China:/

The mirror site summarizes several download methods, which are explained below.

Method 1: Web Download

On the model's home page, the Files and versions tab lists a download link for each file. You can click a link to download directly without logging in, or copy the link and download it with another download tool.

Method 2: huggingface-cli (🔺)

  • Documentation:/docs/huggingface_hub/guides/download#download-from-the-cli

huggingface-cli is Hugging Face's official command-line tool, and it comes with comprehensive download features.

  1. Install dependencies

After creating a virtual environment for your project, activate it and run:

pip install -U huggingface_hub

You can then run huggingface-cli download --h to view the parameters of the download function.

  1. Setting environment variables
  • Linux
export HF_ENDPOINT=

  • Windows Powershell
$env:HF_ENDPOINT = ""

  • Python
import os
os.environ['HF_ENDPOINT'] = ''

It is recommended to add the export line above to ~/.bashrc so that it applies to every new shell:

vim ~/.bashrc
export HF_ENDPOINT=
source ~/.bashrc
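
If you set the endpoint from Python instead of the shell, the order of operations matters: to my knowledge, huggingface_hub reads HF_ENDPOINT when it is first imported, so the variable has to be set before that import. Below is a minimal sketch of that idea. The helper name configure_endpoint and the placeholder URL are hypothetical and not part of any library; substitute the mirror address you actually use.

```python
import os
import sys


def configure_endpoint(url: str) -> bool:
    """Set HF_ENDPOINT for the current process.

    Returns True when huggingface_hub has not been imported yet, which
    means the variable is in place before the library can read it.
    """
    already_imported = "huggingface_hub" in sys.modules
    os.environ["HF_ENDPOINT"] = url
    return not already_imported


# Hypothetical placeholder address, used only for illustration.
MIRROR_URL = "https://mirror.example.invalid"
set_in_time = configure_endpoint(MIRROR_URL)

if not set_in_time:
    print("huggingface_hub was imported earlier; restart the process "
          "or set HF_ENDPOINT in the shell instead.")
```

Call the helper at the very top of your script, before any import of huggingface_hub, transformers, or datasets.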
  1. Download model

As an example, to download the Qwen/Qwen2-7B-Instruct model (model address:/Qwen/Qwen2-7B-Instruct), run one of the commands below.

You can add the --local-dir-use-symlinks False parameter to disable symlinking of files, so that the download path contains the actual files rather than links.

# Any of the following commands works
huggingface-cli download --resume-download --local-dir-use-symlinks False Qwen/Qwen2-7B-Instruct --local-dir /www/algorithm/agent/Qwen2-7B-Instruct

huggingface-cli download --resume-download Qwen/Qwen2-7B-Instruct --local-dir /www/algorithm/agent/Qwen2-7B-Instruct --local-dir-use-symlinks False --token hf_*****

#huggingface-cli download --resume-download gpt2 --local-dir gpt2

huggingface-cli download --resume-download --local-dir-use-symlinks False THUDM/glm-4-9b-chat --local-dir /www/algorithm/agent/glm-4-9b-chat

Downloads are quite fast: a 7B model takes about 7-8 minutes.

  1. Download Dataset
huggingface-cli download --repo-type dataset --resume-download wikitext --local-dir wikitext
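
When you download many models or datasets, it can help to generate these commands from a script. The sketch below only assembles the argument list from the flags already shown above (--resume-download, --local-dir, --local-dir-use-symlinks, --repo-type, --token); the function names build_download_command and run_download are hypothetical helpers of my own, not part of huggingface_hub.

```python
import shlex
import subprocess
from typing import List, Optional


def build_download_command(
    repo_id: str,
    local_dir: str,
    repo_type: str = "model",
    token: Optional[str] = None,
) -> List[str]:
    """Assemble a huggingface-cli download command as an argument list."""
    cmd = ["huggingface-cli", "download", "--resume-download"]
    if repo_type != "model":
        # Datasets need an explicit repo type, as in the wikitext example.
        cmd += ["--repo-type", repo_type]
    cmd += [repo_id, "--local-dir", local_dir,
            "--local-dir-use-symlinks", "False"]
    if token:
        # Only needed for gated repositories.
        cmd += ["--token", token]
    return cmd


def run_download(repo_id: str, local_dir: str, **kwargs) -> None:
    """Run the command; requires huggingface-cli on PATH and network access."""
    subprocess.run(build_download_command(repo_id, local_dir, **kwargs),
                   check=True)


cmd = build_download_command("wikitext", "wikitext", repo_type="dataset")
print(shlex.join(cmd))
```

Passing the command as a list avoids shell-quoting problems with paths that contain spaces.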

Method 3: Dedicated multi-threaded downloader hfd

Conventional tools such as browsers download with a single thread by default, which can be very slow because of domestic network operators' line quality, QoS, and other factors. Multi-threaded downloading is an effective way to increase download speed.

Two classic multi-threaded tools are recommended: IDM for Windows and aria2 for Linux. Once you have the file URL, you can download it with either tool. In my test, a single-threaded download ran at 700 KB/s and IDM with 8 threads at 6 MB/s; on gigabit broadband, IDM can reach 80 MB/s or more.

hfd is a Git- and aria2-based command-line script for Hugging Face downloads: (Gist link). hfd is much more robust than huggingface-cli, with fewer odd errors, and it offers finer-grained multi-threading control, including setting the number of threads.

  • Specific Steps:
    • Step1: git clone everything in the project repository except the LFS files, and automatically obtain the URLs of the LFS files;

    • Step2: Download the LFS files with aria2 using multiple threads.

hfd is a dedicated Hugging Face download tool developed by this site. It builds on the mature tools git and aria2, so downloads are stable and do not drop.
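
To make Step2 concrete, here is a rough sketch of how a per-file aria2c command could be assembled. It is not hfd's actual implementation. It assumes the usual Hugging Face file URL layout for model repos (endpoint/repo_id/resolve/revision/filename); the function names and the placeholder endpoint are hypothetical. The aria2c options used (-c to continue, -x for connections per server, -s for splits, -d and -o for the output location) are standard aria2c flags.

```python
from typing import List


def lfs_file_url(endpoint: str, repo_id: str, filename: str,
                 revision: str = "main") -> str:
    """Build the download URL for one file of a model repository."""
    return f"{endpoint.rstrip('/')}/{repo_id}/resolve/{revision}/{filename}"


def build_aria2c_command(url: str, out_dir: str, filename: str,
                         threads: int = 4) -> List[str]:
    """Assemble an aria2c command that downloads one file in parallel."""
    return [
        "aria2c",
        "-c",                 # continue a partially downloaded file
        "-x", str(threads),   # max connections per server
        "-s", str(threads),   # number of splits for the file
        "-d", out_dir,        # output directory
        "-o", filename,       # output file name
        url,
    ]


# Hypothetical placeholder endpoint, used only for illustration.
url = lfs_file_url("https://mirror.example.invalid/", "gpt2",
                   "model.safetensors")
aria_cmd = build_aria2c_command(url, "gpt2", "model.safetensors", threads=4)
print(" ".join(aria_cmd))
```

The small files (configs, tokenizer files) come with the git clone in Step1, so only the large LFS files need this treatment.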

  1. Download hfd
wget /hfd/
chmod a+x

find . -name #View file location

Install dependencies:

sudo apt-get install aria2


  • Install Git LFS
    The official Git LFS website has the download link for the latest version. On Ubuntu, you can install Git LFS with the following command:
apt-get install git-lfs
  1. Setting environment variables
  • Linux
export HF_ENDPOINT=

  • Windows Powershell
$env:HF_ENDPOINT = ""

  1. Download model

Model: THUDM/glm-4-9b-chat, at /THUDM/glm-4-9b-chat/tree/main. The example command below uses gpt2 as the model ID.

./ gpt2 --tool aria2c -x 4

  1. Download dataset
./ wikitext --dataset --tool aria2c -x 4

If aria2 is not installed, omit --tool and hfd uses wget, which is the default.

  • Full command format:
$ . / -h
Usage:
hfd <model_id> [--include include_pattern] [--exclude exclude_pattern] [--hf_username username] [--hf_token token] [--tool wget|aria2c] [-x threads] [--dataset]

Description:
Download a model or dataset from Hugging Face using the provided model ID.

Parameters:
model_id Hugging Face model ID in the format 'repo/model_name'.
--include (optional) Flag to specify a string pattern of files to include in the download.
--exclude (optional) Flag to specify a string pattern of files to exclude from the download.
exclude_pattern Pattern to match file names to exclude.
--hf_username (optional) Hugging Face username for authentication.
--hf_token (optional) Hugging Face token for authentication.
--tool (optional) The download tool to use. Can be wget (default) or aria2c.
-x (optional) Number of download threads for aria2c.
--dataset (optional) Flag indicating that a dataset is being downloaded.
Example:
hfd bigscience/bloom-560m --exclude safetensors
hfd meta-llama/Llama-2-7b --hf_username myuser --hf_token mytoken --tool aria2c -x 8
hfd lavita/medical-qa-shared-task-v1-toy --dataset
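
To illustrate what the --include and --exclude options do conceptually, here is a small sketch that filters a list of file names. It assumes shell-style glob patterns, which may differ from how hfd actually matches its patterns; select_files is a hypothetical helper, not part of hfd.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional


def select_files(filenames: Iterable[str],
                 include: Optional[str] = None,
                 exclude: Optional[str] = None) -> List[str]:
    """Keep names that match include (if given) and do not match exclude."""
    selected = []
    for name in filenames:
        if include is not None and not fnmatchcase(name, include):
            continue  # an include pattern was given and this name misses it
        if exclude is not None and fnmatchcase(name, exclude):
            continue  # explicitly excluded
        selected.append(name)
    return selected


repo_files = ["config.json", "model.safetensors", "pytorch_model.bin"]
without_safetensors = select_files(repo_files, exclude="*.safetensors")
only_json = select_files(repo_files, include="*.json")
print(without_safetensors)
print(only_json)
```

Excluding one weight format is a common way to avoid downloading the same weights twice when a repo ships both safetensors and bin files.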

Method 4: Use environment variables (non-intrusive)

This method is non-intrusive and covers most cases. The Hugging Face toolchain reads the HF_ENDPOINT environment variable to determine the URL used for downloading files, so you can switch endpoints by setting that variable when you launch your script:

HF_ENDPOINT= python your_script.py

Some datasets ship their own download scripts, though; for those you need to change the address inside the script by hand.
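
The same one-off override can be done from a Python launcher, which is handy on systems where the inline VAR=value syntax is not available. The sketch below passes HF_ENDPOINT only to the child process and leaves the parent environment untouched. The helper name run_with_endpoint and the placeholder URL are hypothetical; the child here just prints the variable instead of running a real script.

```python
import os
import subprocess
import sys
from typing import List


def run_with_endpoint(argv: List[str],
                      endpoint: str) -> subprocess.CompletedProcess:
    """Run a command with HF_ENDPOINT set only for that child process."""
    env = {**os.environ, "HF_ENDPOINT": endpoint}
    return subprocess.run(argv, env=env, capture_output=True,
                          text=True, check=True)


# Stand-in for "python your_script.py": the child echoes what it received.
child_code = "import os; print(os.environ['HF_ENDPOINT'])"
result = run_with_endpoint([sys.executable, "-c", child_code],
                           "https://mirror.example.invalid")
print(result.stdout.strip())
```

In real use, replace the argument list with [sys.executable, "your_script.py"].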

Frequently Asked Questions

Q1: Some repositories require login. How can I download them?

A: Some gated repos require you to log in and request access. For account security, this site does not support login. Go to the official Hugging Face website to log in and request access, get an access token there, then return to the mirror site and download from the command line.
How to download a gated repo with each tool:

huggingface-cli: add the --token parameter

huggingface-cli download --token hf_*** --resume-download meta-llama/Llama-2-7b-hf --local-dir Llama-2-7b-hf

hfd: add the --hf_username and --hf_token parameters

hfd meta-llama/Llama-2-7b --hf_username YOUR_HF_USERNAME --hf_token hf_***

Other tools, such as from_pretrained, wget, and curl, also need the authentication token to be set.
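
For plain HTTP tools, the token is normally sent as an "Authorization: Bearer" request header. The sketch below builds such a request with the Python standard library without sending it. The URL layout (endpoint/repo_id/resolve/revision/filename), the helper name authorized_request, and the placeholder endpoint and token are assumptions for illustration.

```python
import urllib.request


def authorized_request(endpoint: str, repo_id: str, filename: str,
                       token: str,
                       revision: str = "main") -> urllib.request.Request:
    """Build (but do not send) a request for one file of a gated repo."""
    url = f"{endpoint.rstrip('/')}/{repo_id}/resolve/{revision}/{filename}"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )


# Placeholder endpoint and token, used only for illustration.
req = authorized_request("https://mirror.example.invalid",
                         "meta-llama/Llama-2-7b-hf", "config.json",
                         token="hf_xxx")
print(req.full_url)
print(req.get_header("Authorization"))
```

The same header can be passed to wget with --header and to curl with -H.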

Q2: Why is git clone not recommended?

The official git clone repo_url download method is also available. It is simple, but it is the least recommended way to download, for two reasons:

  1. It does not support resuming: if the transfer breaks, it starts over.
  2. The clone downloads historical versions, which take up disk space. Even when there is no history, the .git folder stores a copy of the current version of the model plus metadata, so the model folder occupies more than twice the disk space; for models that do have historical versions, the download also takes more than twice as long. For users whose network is unstable or whose disk is small, this method is strongly discouraged.
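
If you already have a cloned model folder and want to check the second point yourself, a short script can compare the size of the .git folder with the size of the checked-out files. This is a minimal sketch; clone_footprint is a hypothetical helper, and the demo builds a tiny fake repository in a temporary directory rather than using a real clone.

```python
import os
import tempfile
from typing import Tuple


def dir_size(path: str) -> int:
    """Total size in bytes of regular files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def clone_footprint(repo_dir: str) -> Tuple[int, int]:
    """Return (bytes inside .git, bytes in the checked-out files)."""
    git_bytes = dir_size(os.path.join(repo_dir, ".git"))
    worktree_bytes = dir_size(repo_dir) - git_bytes
    return git_bytes, worktree_bytes


# Demo on a fake repository: one 1 KiB blob in .git and one 1 KiB weight file.
with tempfile.TemporaryDirectory() as repo:
    os.makedirs(os.path.join(repo, ".git", "lfs"))
    for rel in (os.path.join(".git", "lfs", "blob"), "model.bin"):
        with open(os.path.join(repo, rel), "wb") as fh:
            fh.write(b"\0" * 1024)
    demo = clone_footprint(repo)

print(demo)  # (1024, 1024)
```

On a real clone, a .git size close to or above the working-tree size confirms the doubling described above.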

Q3: Other recommended methods (🔺)

You can also download models from other platforms:

  • ollama

Web site:/library

  • ModelScope community

Web site:/models
