Detailed tutorial on labeling NER data using doccano
Description:
- First published: 2024-10-12
- References:
- /zjunlp/DeepKE/blob/main/README_TAG_CN.md
- /doccano/tutorial/
- /GongYangXianShen/article/details/137270106 (converted to BIO format)
Deploying doccano
The doccano repository (/doccano/doccano) has deployment instructions. For example, to deploy with Docker:
docker run --name doccano \
-d --restart always \
-e "ADMIN_USERNAME=admin" \
-e "ADMIN_EMAIL=admin@" \
-e "ADMIN_PASSWORD=password" \
-v doccano-db:/data \
-p 8001:8000 doccano/doccano
Create User
There is only one user by default, so we need to open the ADMIN page to add a new user.
Append /admin/ to the main URL to open the ADMIN page (note that the trailing slash is required), then click Add:
After entering the username and password, click SAVE to save:
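The ADMIN address used in the steps above is easy to mistype because of the required trailing slash. Here is a minimal sketch of building it, assuming the Docker deployment publishes doccano on port 8001 of localhost. The host name and the helper name `admin_url` are illustrative assumptions, not part of doccano:

```python
def admin_url(base_url: str) -> str:
    """Return the ADMIN page address for a doccano base URL, always ending in a slash."""
    # Strip any trailing slash first so the result never contains "//admin/".
    return base_url.rstrip("/") + "/admin/"


if __name__ == "__main__":
    # Assumed address: the docker command maps container port 8000 to host port 8001.
    print(admin_url("http://localhost:8001"))
```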
How to do NER labeling
Create a project
The default interface is in English; you can switch to Chinese if you are not used to English:
Then click Login and enter your username and password. After logging in:
Click Create, and it will jump to the following page:
Select Sequence Labeling, then enter the necessary information such as the name and configure other attributes as needed:
Click Create to jump to the following page:
Importing data sets
Click the Dataset button on the left side:
Move the mouse to the Actions button:
Click Import Dataset:
doccano supports several text formats; their differences are listed below:
- Textfile: the uploaded file is in txt format; when annotating, a whole txt document is displayed as one page of content;
- Textline: the uploaded file is in txt format; when annotating, each line of text in the txt document is displayed as one page of content;
- JSONL: short for JSON Lines, where each line is a valid JSON value;
- CoNLL: a file in CoNLL format, with a series of tab-delimited words on each line;
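To make the difference between Textline and JSONL concrete, the sketch below renders the same two sentences in both formats. The JSONL record layout (a `text` field plus an optional `label` list of `[start, end, label]` triples) is an assumption based on doccano's sequence-labeling import examples; check the sample shown on the import page of your version. The function names and sample sentences are illustrative only.

```python
import json


def to_textline(sentences):
    """One sentence per line; with Textline import each line becomes one page."""
    return "\n".join(sentences) + "\n"


def to_jsonl(records):
    """One JSON object per line (JSON Lines)."""
    # ensure_ascii=False keeps non-ASCII text (e.g. Chinese) readable in the file.
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


sentences = ["Alice works at Acme in Paris.", "Bob lives in Berlin."]

# Assumed layout: character offsets, end exclusive, followed by the label name.
records = [
    {"text": sentences[0], "label": [[0, 5, "PER"], [15, 19, "ORG"], [23, 28, "LOC"]]},
    {"text": sentences[1], "label": [[0, 3, "PER"], [13, 19, "LOC"]]},
]

if __name__ == "__main__":
    print(to_textline(sentences), end="")
    print(to_jsonl(records), end="")
```

Saving the first string as a .txt file and the second as a .jsonl file gives two files that can be tried on the Import Dataset page.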
Upload a TXT file:
After clicking Import:
Define Tags
Click Labels on the left side, then move the mouse to the Actions menu and click Create Label:
Create 3 common labels: PER, LOC, and ORG. In a real application, decide which labels are needed based on the requirements. Below is an example of creating the PER label:
After creation:
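When there are many labels, creating them one by one is tedious. The sketch below generates a label definition file for the three labels above, on the assumption that your doccano version offers a label import option and accepts a JSON list with `text`, `suffix_key`, `background_color`, and `text_color` fields (the layout of the label files doccano exports). The colours, shortcut keys, and the helper name `build_label_config` are arbitrary illustrative choices.

```python
import json

LABELS = ["PER", "LOC", "ORG"]
COLORS = ["#E53935", "#1E88E5", "#43A047"]  # arbitrary example colours


def build_label_config(names, colors):
    """Build one label definition per name (field names are an assumed format)."""
    return [
        {
            "text": name,
            "suffix_key": name[0].lower(),  # keyboard shortcut: p, l, o
            "background_color": color,
            "text_color": "#ffffff",
        }
        for name, color in zip(names, colors)
    ]


if __name__ == "__main__":
    print(json.dumps(build_label_config(LABELS, COLORS), indent=2))
```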
Add Member
Click the Members button on the left side, then click Add:
Select the user to add to the project and a role for that user; there are 3 roles (Project Administrator, Annotator, Reviewer). Save the selection:
The new member is listed after saving:
Assigning labeling tasks
First, tick the data to be assigned:
Then click Assign to member under the Actions menu:
Select the distribution scheme and then click the Assign button on the right:
The above distribution scheme assigns 15% of the tasks to the admin user and 85% to user1.
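The arithmetic behind a 15% / 85% split can be sketched as follows. This only illustrates how percentage weights translate into task counts; doccano's own rounding and assignment strategies may behave differently, and the function names are hypothetical.

```python
def split_counts(total, weights):
    """Return how many items each member gets for percentage weights summing to 100."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    counts = {name: total * pct // 100 for name, pct in weights.items()}
    # Integer division can leave a few items over; give them out one each,
    # starting with the member holding the largest weight.
    leftover = total - sum(counts.values())
    for name in sorted(weights, key=weights.get, reverse=True)[:leftover]:
        counts[name] += 1
    return counts


def assign_sequential(items, weights):
    """Give each member a consecutive slice of items sized by split_counts."""
    assignment, start = {}, 0
    for name, n in split_counts(len(items), weights).items():
        assignment[name] = items[start:start + n]
        start += n
    return assignment


if __name__ == "__main__":
    print(split_counts(100, {"admin": 15, "user1": 85}))
```

With 100 selected documents, this gives 15 to admin and 85 to user1.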
View distribution results:
Annotate
Click Dataset on the left side, then select a piece of data and click the Annotate button at the far right to start labeling.
For example, click the PER label on the right-hand side, then use the mouse to select each corresponding span in the text:
When the labeling is complete, click the X button in the upper left corner of the text to indicate that the labeling has been completed:
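Each selection made with the mouse amounts to a labeled character span. As a rough illustration, the sketch below locates known entity strings in a sentence and returns them as `[start, end, label]` triples with an exclusive end offset. That triple convention is an assumption about how spans are stored, and the helper `find_spans` and the sample sentence are hypothetical.

```python
def find_spans(text, mentions):
    """Return sorted [start, end, label] triples for every occurrence of each mention."""
    spans = []
    for mention, label in mentions.items():
        if not mention:
            continue  # an empty string would match everywhere
        start = text.find(mention)
        while start != -1:
            spans.append([start, start + len(mention), label])
            # Continue searching after the end of the current match.
            start = text.find(mention, start + len(mention))
    return sorted(spans)


if __name__ == "__main__":
    text = "Alice met Bob in Paris."
    print(find_spans(text, {"Alice": "PER", "Bob": "PER", "Paris": "LOC"}))
```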
Export data
Click the Dataset button on the left side, move the mouse to the Actions menu, and click Export Dataset:
Select the JSONL format, check Export only approved documents (that is, export only reviewed data), and click Export:
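The exported JSONL usually has to be converted before training, for example to BIO tags as in the third reference. Below is a minimal character-level sketch, assuming each exported line is a JSON object with a `text` field and a `label` list of `[start, end, label]` triples (end exclusive). The key name may differ between doccano versions, so it is exposed as the `label_key` parameter; both function names are illustrative.

```python
import json


def spans_to_bio(text, spans):
    """Character-level BIO tags for [start, end, label] spans (end exclusive)."""
    tags = ["O"] * len(text)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def jsonl_to_bio(jsonl_text, label_key="label"):
    """Yield (characters, tags) pairs, one per non-empty JSONL line."""
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        text = record["text"]
        yield list(text), spans_to_bio(text, record.get(label_key, []))


if __name__ == "__main__":
    sample = '{"id": 1, "text": "张三在北京", "label": [[0, 2, "PER"], [3, 5, "LOC"]]}'
    for chars, tags in jsonl_to_bio(sample):
        for ch, tag in zip(chars, tags):
            print(ch, tag)
```

Character-level tagging is a common choice for Chinese text; for space-delimited languages the spans would first need to be aligned to tokens.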