Training Named Entity Recognition Models with DeepKE (Official Demo)
Description:
- First published: 2024-10-10
- DeepKE Resources:
- Documentation:/DeepKE/
- Website:/
- cnschema: /
Set up a GitHub mirror if needed:
git config --system url."/".insteadOf /
To unset it later, run:
git config --system --unset url."/".insteadOf
Creating a conda environment
conda create -n deepke python=3.8
conda activate deepke
# Install torch
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url /whl/cu113
# Alternatively, install torch 1.11.0 from the Aliyun mirror:
# pip install /pytorch-wheels/cu113/torch-1.11.0+cu113-cp38-cp38-linux_x86_64.whl /pytorch-wheels/cu113/torchvision-0.12.0+cu113-cp38-cp38-linux_x86_64.whl /pytorch-wheels/cu113/torchaudio-0.11.0+cu113-cp38-cp38-linux_x86_64.whl -i /pypi/simple/
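After installation, a quick sanity check can confirm that the CUDA build of torch is usable. This helper is illustrative and not part of the official demo; it degrades gracefully if torch is missing:

```python
# Optional sanity check: verify that torch imports and report whether
# the installed build can see a CUDA device.
def torch_cuda_summary():
    """Return a one-line summary of the installed torch build, if any."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    return f"torch {torch.__version__}, CUDA available: {torch.cuda.is_available()}"

print(torch_cuda_summary())
```

For the cu113 wheels above, the summary should report a version ending in `+cu113` and `CUDA available: True` on a machine with a working NVIDIA driver.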
Install DeepKE:
git clone /zjunlp/
cd DeepKE
pip install pip==24.0
pip install -r -i /pypi/simple/
python setup.py install
python setup.py develop
pip install prettytable==2.4.0
pip install ipython==8.12.0
Download Dataset
# apt-get install wget
cd example/ner/standard
wget 120.27.214.45/Data/ner/standard/
tar -xzvf
The extracted data can now be seen in the data folder, which contains:
- Training set
- Validation set
- Test set
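These dataset files follow a CoNLL-style layout: one character and its BIO tag per line, with a blank line between sentences. DeepKE's own loader handles this internally; the parser below is only an illustrative sketch under that assumed format:

```python
def read_bio_file(lines):
    """Parse CoNLL-style lines ("<token> <tag>") into sentences of (token, tag) pairs."""
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line:                 # a blank line ends the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        token, tag = line.split()
        current.append((token, tag))
    if current:                      # flush the last sentence if the file has no trailing blank
        sentences.append(current)
    return sentences

sample = ["湖 B-LOC", "北 I-LOC", "好 O", "", "你 O"]
print(read_bio_file(sample))
```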
Configuring wandb
Register a wandb account (if you don't already have one) and create a new project with a name like deepke-ner-official-demo.
Open the /authorize page to obtain your API key.
Run:
wandb init
Enter the API key you just obtained and the name of the project you created.
Run training and prediction
Delete the checkpoints and logs folders (if any) saved from previous training:
rm -r checkpoints/
rm -r logs/
lstmcrf
Open example/ner/standard/run_lstmcrf.py and make sure the wandb and yaml libraries are imported:
import wandb
import yaml
Modify the wandb project name:
if config['use_wandb']:
    wandb.init(project="deepke-ner-official-demo")
In example/ner/standard/conf/, set use_wandb to True.
If you need to train with multiple GPUs, set use_multi_gpu to True in example/ner/standard/conf/.
Start training:
python run_lstmcrf.py
>> total: 109870 loss: 27.181508426008552
precision recall f1-score support
B-LOC 0.8920 0.8426 0.8666 1951
B-ORG 0.8170 0.7439 0.7787 984
B-PER 0.8783 0.8167 0.8464 884
I-LOC 0.8650 0.8264 0.8453 2581
I-ORG 0.8483 0.8365 0.8424 3945
I-PER 0.8860 0.8436 0.8643 1714
O 0.9861 0.9912 0.9886 97811
accuracy 0.9732 109870
macro avg 0.8818 0.8430 0.8618 109870
weighted avg 0.9727 0.9732 0.9729 109870
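As a quick check on the report above, the macro averages are simply unweighted means of the per-class scores. The helper below is not part of DeepKE; it just recomputes the "macro avg" row from the per-class numbers:

```python
# Per-class (precision, recall, f1) values copied from the report above.
scores = {
    "B-LOC": (0.8920, 0.8426, 0.8666),
    "B-ORG": (0.8170, 0.7439, 0.7787),
    "B-PER": (0.8783, 0.8167, 0.8464),
    "I-LOC": (0.8650, 0.8264, 0.8453),
    "I-ORG": (0.8483, 0.8365, 0.8424),
    "I-PER": (0.8860, 0.8436, 0.8643),
    "O":     (0.9861, 0.9912, 0.9886),
}

def macro_avg(scores, column):
    """Unweighted mean of one metric column (0=precision, 1=recall, 2=f1)."""
    return round(sum(v[column] for v in scores.values()) / len(scores), 4)

print(macro_avg(scores, 0), macro_avg(scores, 1), macro_avg(scores, 2))
# -> 0.8818 0.843 0.8618, matching the "macro avg" row
```

The weighted averages work the same way, except each class is weighted by its support count.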
The text used for prediction is set in example/ner/standard/conf/; change it to the following:
text: "'Water heaters and other trade-ins saved more than 2,000 yuan.' On Oct. 3, Jin Yu, a local resident, taps his phone to place an order, pay, and register at a shopping plaza in Xiangyang, Hubei province. Hubei is promoting large-scale equipment renewal and consumer-goods trade-ins. 'We are striving, by the end of this year, to scrap and renew 45,000 vehicles, replace and renew another 125,000, and trade in 1.7 million home-appliance units across the province,' said Long Xiaohong, director of the Hubei Provincial Department of Commerce."
Run prediction:
python
The NER result:
[('Lake', 'B-LOC'), ('North', 'I-LOC'), ('Province', 'I-LOC'), ('Xiang', 'B-LOC'), ('Yang', 'I-LOC'), ('Municipality', 'I-LOC'), ('Market', 'I-LOC'), ('Yuh', 'I-PER'), ('Lake', 'B-ORG'), ('North ', 'I-ORG'), ('Province', 'I-ORG'), ('Business', 'I-ORG'), ('Service', 'I-ORG'), ('Office', 'I-ORG'), ('Hall', 'I-ORG'), ('Dragon', 'B-PER'), ('Small', 'I-PER'), ('Red', 'I-PER')]
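The character-level BIO output above can be merged into entity spans. The post-processing sketch below is not part of DeepKE; it simply groups each B-* tag with the I-* tags that continue it (shown here on the original Chinese characters for "Hubei Province" and "Long Xiaohong"):

```python
def bio_to_entities(tagged):
    """Merge (token, BIO-tag) pairs into (entity_text, entity_type) spans."""
    entities, tokens, etype = [], [], None
    for token, tag in tagged:
        if tag.startswith("B-"):                 # a new entity starts here
            if tokens:
                entities.append(("".join(tokens), etype))
            tokens, etype = [token], tag[2:]
        elif tag.startswith("I-") and tokens and tag[2:] == etype:
            tokens.append(token)                 # continues the current entity
        else:                                    # "O", or an I- tag with no matching span
            if tokens:
                entities.append(("".join(tokens), etype))
            tokens, etype = [], None
    if tokens:                                   # flush a span that runs to the end
        entities.append(("".join(tokens), etype))
    return entities

print(bio_to_entities([("湖", "B-LOC"), ("北", "I-LOC"), ("省", "I-LOC"),
                       ("龙", "B-PER"), ("小", "I-PER"), ("红", "I-PER")]))
# -> [('湖北省', 'LOC'), ('龙小红', 'PER')]
```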
bert
In example/ner/standard/conf/, set hydra/model to bert.
The bert hyperparameters are set in example/ner/standard/conf/hydra/model/ and can be modified if necessary.
In example/ner/standard/conf/, set use_wandb to True.
Modify the wandb project name in example/ner/standard/run_bert.py:
if cfg.use_wandb:
    wandb.init(project="deepke-ner-official-demo")
If necessary, modify train_batch_size in example/ner/standard/conf/; for bert it is recommended to be no smaller than 64.
Start training:
export HF_ENDPOINT=
python run_bert.py
w2ner
w2ner is a newer SOTA model.
It is based on W2NER (AAAI'22) and supports entity recognition in multiple scenarios (for details, see the paper "Unified Named Entity Recognition as Word-Word Relation Classification").
Named Entity Recognition (NER) covers three main task types: flat, overlapping (a.k.a. nested), and discontinuous NER, which have mostly been studied separately. Interest in unified NER has recently been growing, and W2NER handles all three tasks with a single model.
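To give an intuition for the word-word relation idea, the paper encodes each entity as links between its words: Next-Neighboring-Word (NNW) links between successive words, plus one Tail-Head-Word link (THW, carrying the entity type) from the last word back to the first. The sketch below is a simplified illustration of that labeling scheme, not the model itself:

```python
def entity_to_relations(span, etype):
    """Encode one entity (a list of token indices) as W2NER-style word-pair relations:
    NNW links each word to the next; THW-<type> links the tail word back to the head."""
    relations = {}
    for a, b in zip(span, span[1:]):
        relations[(a, b)] = "NNW"
    relations[(span[-1], span[0])] = f"THW-{etype}"
    return relations

# "湖北省" as tokens 0..2 of a sentence, typed LOC:
print(entity_to_relations([0, 1, 2], "LOC"))
# -> {(0, 1): 'NNW', (1, 2): 'NNW', (2, 0): 'THW-LOC'}
```

Because discontinuous entities are just index lists with gaps, and nested entities are overlapping index lists, the same word-pair encoding covers all three NER task types.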
Since a single GPU is used here, set device to 0 in example/ner/standard/w2ner/conf/.
In example/ner/standard/w2ner/conf/, set data_dir and do_train as follows, so that the previously downloaded dataset is used:
data_dir: "../data"
do_train: True
Run training:
python