I. Mind map overview
II. Introduction to OpenCompass
OpenCompass is an open-source, efficient evaluation system for large models. It also integrates the CompassKit evaluation toolchain, the CompassHub benchmark community, and the CompassRank leaderboard.
Official website address: /home
III. OpenCompass Installation
3.1 Creating a virtual environment
conda create --name opencompass python=3.10 -y
conda activate opencompass
3.2 Installing OpenCompass via pip
# Supports most datasets and models
pip install -U opencompass

# Full installation (more datasets supported)
# pip install "opencompass[full]"

# Model inference backends. Since these backends often have conflicting dependencies,
# it is recommended to manage them in separate virtual environments.
# pip install "opencompass[lmdeploy]"
# pip install "opencompass[vllm]"

# API testing (e.g. OpenAI, Qwen)
# pip install "opencompass[api]"
3.3 Installing OpenCompass Based on Source Code
git clone https://github.com/open-compass/opencompass opencompass
cd opencompass
pip install -e .
# pip install -e ".[full]"
# pip install -e ".[vllm]"
3.4 Downloading system data sets (optional)
Since we use our own locally downloaded dataset, the built-in datasets are not strictly necessary. I still recommend downloading them so the original program runs robustly, since I have not verified what happens when they are absent.
# Download the dataset into data/
wget https://github.com/open-compass/opencompass/releases/download/0.2.2.rc1/OpenCompassData-core-20240207.zip
# After unzipping you get a folder named data; put it in the project root and remember this data directory (it is used later)
unzip OpenCompassData-core-20240207.zip
3.5 Automatic download of models and data using ModelScope (optional)
Because we also use a local model, there is no need for the program to download one by itself. If you run online testing instead, you can configure the following:
pip install modelscope
export DATASET_SOURCE=ModelScope
3.6 Online assessment (optional)
At this point, if you have network access to Hugging Face, you can simply run the evaluation online.
IV. OpenCompass online assessment (optional)
Online assessment downloads many models directly from Hugging Face and then evaluates them, which requires access to Hugging Face, so I do not demonstrate it here; instead I reproduce the official online testing workflow below. If you do not need online testing, you can skip this section.
4.1 First Assessment
OpenCompass supports setting up configurations via the command line interface (CLI) or Python scripts. For simple evaluation setups, we recommend using the CLI; for more complex evaluations, the scripting approach is recommended. You can find more scripting examples in the configs folder.
# Command Line Interface (CLI)
opencompass --models hf_internlm2_5_1_8b_chat --datasets demo_gsm8k_chat_gen

# Python Script
opencompass ./configs/eval_chat_demo.py
4.2 API Evaluation
OpenCompass is not designed to distinguish between open source models and API models. You can evaluate both types of models in the same way or even in the same setup.
export OPENAI_API_KEY="YOUR_OPEN_API_KEY"

# Command Line Interface (CLI)
opencompass --models gpt_4o_2024_05_13 --datasets demo_gsm8k_chat_gen

# Python Script
opencompass ./configs/eval_api_demo.py

# The o1_mini_2024_09_12 / o1_preview_2024_09_12 models are now supported, with max_completion_tokens=8192 by default.
4.3 Inference backends
If you want to use an inference backend other than HuggingFace for accelerated evaluation, such as LMDeploy or vLLM, you can do so with the following command.
opencompass --models hf_internlm2_5_1_8b_chat --datasets demo_gsm8k_chat_gen -a lmdeploy
V. Loading local test datasets
5.1 Download the LawBench dataset we want to use locally via git
git clone /ljn20001229/
Note: position 1 is the dataset zip downloaded via git; you need to unzip it into the data directory at the same level, marked as position 2.
Note: position 3 is the unzipped LawBench folder.
At this point our custom local dataset has been downloaded and put in place.
VI. Configuring the local Qwen model
6.1 Model download: download the model directly from ModelScope, for example as sketched below.
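If you prefer to script the download, the following is a minimal sketch using ModelScope's snapshot_download; the model id qwen/Qwen-1_8B-Chat and the cache directory are assumptions and should match the model you actually want to evaluate.

# Minimal download sketch; model id and cache_dir are assumptions, adjust as needed
from modelscope import snapshot_download

# Downloads the model snapshot and returns the local directory it was saved to
local_dir = snapshot_download('qwen/Qwen-1_8B-Chat', cache_dir='./models')
print(local_dir)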
6.2 Add the downloaded model to the project directory.
VII. Writing local assessment scripts
7.1 Create eval_local_qwen_1_8b_chat.py in the configs folder of the project root to serve as the evaluation entry script for our Qwen1.8B model, with the following code:
# eval_local_qwen_1_8b_chat.py
from mmengine.config import read_base

with read_base():
    # Import datasets
    from ..lawbench_zero_shot_gen_002588 import lawbench_datasets as zero
    from ..lawbench_one_shot_gen_002588 import lawbench_datasets as one
    # Import model
    from .local_qwen_1_8b_chat import models

datasets = [*zero, *one]
7.2 Modify the lawbench_zero_shot_gen_002588 file referenced by from ..lawbench_zero_shot_gen_002588 import lawbench_datasets as zero:
7.3 Similarly, modify the lawbench_one_shot_gen_002588 file referenced by from ..lawbench_one_shot_gen_002588 import lawbench_datasets as one (a sketch of this kind of edit follows).
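What exactly changes in these two files depends on your OpenCompass version; the goal is that each LawBench entry reads its data from the locally unzipped LawBench directory from section V rather than the default location. A minimal sketch of that kind of edit, assuming lawbench_datasets is the list of dataset dicts defined in the file and that each dict carries a path field (both assumptions), could look like this:

# Sketch only: appended at the end of lawbench_zero_shot_gen_002588.py
# (and analogously in the one-shot file). The local directory below is hypothetical;
# point it at wherever you unzipped the LawBench data.
LOCAL_LAWBENCH_ROOT = './data/lawbench/zero_shot'

for ds in lawbench_datasets:
    ds['path'] = LOCAL_LAWBENCH_ROOT  # make every task read from the local copy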
7.4 Create the qwen.local_qwen_1_8b_chat file that is imported via from .local_qwen_1_8b_chat import models:
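For reference, here is a minimal sketch of what such a local model config can look like, following OpenCompass's HuggingFaceCausalLM config style; the abbr, paths, and generation settings are assumptions and should be adjusted to your model directory and hardware.

# local_qwen_1_8b_chat.py -- minimal sketch of a local Hugging Face model config
# (paths and generation settings are assumptions; adjust to your setup)
from opencompass.models import HuggingFaceCausalLM

models = [
    dict(
        type=HuggingFaceCausalLM,
        abbr='qwen-1_8b-chat-local',
        path='./models/Qwen-1_8B-Chat',            # local model directory from step VI
        tokenizer_path='./models/Qwen-1_8B-Chat',
        tokenizer_kwargs=dict(trust_remote_code=True),
        model_kwargs=dict(device_map='auto', trust_remote_code=True),
        max_out_len=256,
        max_seq_len=2048,
        batch_size=8,
        run_cfg=dict(num_gpus=1),
    )
]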
VIII. Launching local assessment
The local evaluation is launched with Python via run.py, pointing it at the configs/eval_local_qwen_1_8b_chat.py file we created, with the following arguments:
python run.py configs/eval_local_qwen_1_8b_chat.py --debug
IX. Explanation of evaluation parameters
- --debug: debug mode; log messages are printed to the console
- --dry-run: dry run; the dataset is only loaded and the evaluation itself is not actually executed
- --accelerator vllm: use vLLM to accelerate inference for the locally deployed large model
- --reuse: whether to reuse previous (historical) results
- --work-dir: path where results are stored; defaults to outputs/default
- --max-num-workers: maximum number of workers for data parallelism
X. Assessment results
This completes the walkthrough of using OpenCompass to evaluate the local Qwen1.8B-Chat model on the local LawBench dataset. Thank you, dear readers, for sticking with such a long post, and please give it a like!