
Detailed tutorial on labeling NER data using doccano



Description:

  • First published: 2024-10-12
  • References:
    • /zjunlp/DeepKE/blob/main/README_TAG_CN.md
    • /doccano/tutorial/
    • /GongYangXianShen/article/details/137270106 (converted to BIO format)

Deploying doccano

Deployment instructions are available at /doccano/doccano. For example, to deploy with Docker:

docker run --name doccano \
  -d --restart always \
  -e "ADMIN_USERNAME=admin" \
  -e "ADMIN_EMAIL=admin@" \
  -e "ADMIN_PASSWORD=password" \
  -v doccano-db:/data \
  -p 8001:8000 doccano/doccano
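The container can take a little while before the web UI starts answering. The `-p 8001:8000` option publishes the container's port 8000 on host port 8001, so the UI should be reachable at http://localhost:8001/. Below is a minimal Python sketch, using only the standard library, that polls that address until it responds; the function name `wait_for_doccano` and its default timings are illustrative choices, not part of doccano.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_doccano(url: str = "http://localhost:8001/", timeout: float = 60.0,
                     interval: float = 1.0) -> bool:
    """Poll `url` until the server answers or `timeout` seconds have passed.

    Returns True as soon as any HTTP response (even an error status) arrives,
    and False if the deadline expires first.
    """
    # Bypass any configured HTTP proxy: the target is a locally published port.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with opener.open(url, timeout=interval):
                return True
        except urllib.error.HTTPError as err:
            # An HTTP error status still means the web server is up.
            err.close()
            return True
        except (OSError, http.client.HTTPException):
            # Connection refused/reset or timed out: not ready yet, retry.
            time.sleep(interval)
    return False
```

For example, `wait_for_doccano()` returns True once the server answers, or False after 60 seconds without a response.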

Create User

By default there is only one user, so we need to open the admin page to add new users.

Append /admin/ to the main URL to open the admin page (note that the trailing slash is required), then click Add:

Enter the username and password, then click SAVE:

How to do NER labeling

Create a project

The interface is in English by default; you can switch it to Chinese if you prefer:

Then click Login and enter your username and password. After logging in:

Click Create; you are taken to the following page:

Select Sequence Labeling, then enter the required information such as the project name, and configure other attributes as needed:

Click Create to go to the following page:

Importing data sets

Click Dataset in the left sidebar:

Hover over the Actions menu:

Click Import Dataset:

doccano supports several file formats, which differ as follows:

  • Textfile: the uploaded file is a .txt file, and during labeling the whole file is displayed as one page of content;
  • Textline: the uploaded file is a .txt file, and during labeling each line of the file is displayed as a separate page;
  • JSONL: short for JSON Lines, a format in which each line is a valid JSON value;
  • CoNLL: a CoNLL-formatted file, in which each line holds a series of tab-separated fields (such as a word and its tag).
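To prepare input files in these formats from your own sentences, a small Python sketch like the following may help. It assumes that `text` is the column name doccano reads the document text from in JSONL (check the column settings in the import dialog of your doccano version), and it writes CoNLL as one `token<TAB>tag` line per token with a blank line between sentences. The helper names are hypothetical.

```python
import json
from pathlib import Path


def write_textline(sentences: list[str], path: Path) -> None:
    """Textline format: one document per line, so embedded newlines are flattened."""
    with path.open("w", encoding="utf-8") as f:
        for sentence in sentences:
            f.write(sentence.replace("\n", " ").strip() + "\n")


def write_jsonl(sentences: list[str], path: Path) -> None:
    """JSONL format: one JSON object per line; "text" is the assumed text column."""
    with path.open("w", encoding="utf-8") as f:
        for sentence in sentences:
            # ensure_ascii=False keeps Chinese characters human-readable in the file.
            f.write(json.dumps({"text": sentence}, ensure_ascii=False) + "\n")


def write_conll(sentences: list[list[tuple[str, str]]], path: Path) -> None:
    """CoNLL format: a "token<TAB>tag" line per token, a blank line after each sentence."""
    with path.open("w", encoding="utf-8") as f:
        for sentence in sentences:
            for token, tag in sentence:
                f.write(f"{token}\t{tag}\n")
            f.write("\n")
```

For instance, `write_textline(["张三在北京工作。"], Path("data.txt"))` produces a file suitable for the Textline option.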

Upload a TXT file:

After clicking Import:

Define Tags

Click Labels in the left sidebar, hover over the Actions menu, and click Create Label:

Create three common labels: PER, LOC and ORG. In a real application, decide which labels you need based on your requirements. The following example creates the PER label:

After creation:
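When the exported annotations are later converted to BIO format (see the BIO conversion reference at the top), each entity type becomes two tags, B-X for the first character of an entity and I-X for the remaining characters, plus one shared O tag for everything else. With PER, LOC and ORG that gives 2 × 3 + 1 = 7 tags. A minimal sketch; the tag order and the `tag2id` mapping are arbitrary illustrative choices, and your model code may expect a different order.

```python
def bio_tag_set(entity_types: list[str]) -> list[str]:
    """Return the BIO tag inventory: "O" followed by a B-/I- pair per entity type."""
    tags = ["O"]
    for entity_type in entity_types:
        tags += [f"B-{entity_type}", f"I-{entity_type}"]
    return tags


tags = bio_tag_set(["PER", "LOC", "ORG"])
# Map each tag to an integer id, as most sequence-labeling models expect.
tag2id = {tag: i for i, tag in enumerate(tags)}
print(len(tags), tags)
# prints: 7 ['O', 'B-PER', 'I-PER', 'B-LOC', 'I-LOC', 'B-ORG', 'I-ORG']
```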

Add Member

Click Members in the left sidebar, then click Add:

Select the user to add to the project and assign a role; there are three roles (Project Administrator, Annotator and Reviewer). Then save:

After saving, the new member appears in the list:

Assigning labeling tasks

First, tick the documents to be assigned:

Then click Assign to member in the Actions menu:

Choose an allocation scheme, then click the Assign button on the right:

The scheme above assigns 15% of the tasks to the admin user and 85% to user1.
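The percentages translate into document counts: with 20 selected documents, for example, 15% is 3 documents and 85% is 17. When the counts do not divide evenly, some rule has to decide who gets the leftover documents. The sketch below uses the largest-remainder method so that the counts always add up to the total; this rule is our assumption for illustration, and doccano's own rounding may differ.

```python
def split_by_percentage(n_docs: int, shares: dict[str, int]) -> dict[str, int]:
    """Split n_docs among members by integer percentages that must sum to 100.

    Uses the largest-remainder method with exact integer arithmetic: each
    member first gets floor(n_docs * pct / 100), then the leftover documents go
    to the members with the largest remainders.
    """
    if sum(shares.values()) != 100:
        raise ValueError("percentages must sum to 100")
    counts: dict[str, int] = {}
    remainders: dict[str, int] = {}
    for member, pct in shares.items():
        counts[member], remainders[member] = divmod(n_docs * pct, 100)
    leftover = n_docs - sum(counts.values())
    for member in sorted(shares, key=lambda m: remainders[m], reverse=True)[:leftover]:
        counts[member] += 1
    return counts
```

For example, `split_by_percentage(20, {"admin": 15, "user1": 85})` returns `{"admin": 3, "user1": 17}`.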

View the assignment results:

Annotating

Click Dataset in the left sidebar, select a document, and click the Annotate button at the far right to start labeling.

For example, click the PER label on the right, then use the mouse to select each piece of text it applies to:

When you have finished, click the X button at the upper left of the text to mark the document as completed:
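Each selection is recorded as a character span: a start offset, an end offset and a label. As far as we know, the end offset in doccano's exports is exclusive, so `text[start:end]` gives the selected text; verify this against an export from your own instance. The hypothetical helper below computes such spans from (surface string, label) pairs, which is handy for spot-checking annotations or for preparing pre-annotated JSONL in a `{"text": ..., "label": [[start, end, label], ...]}` layout (again an assumption about the import format).

```python
import json


def spans_from_mentions(text: str, mentions: list[tuple[str, str]]) -> list[list]:
    """Convert (surface string, label) pairs into [start, end, label] spans.

    Offsets are 0-based character positions with an exclusive end, so
    text[start:end] == surface. Mentions must be listed in reading order; each
    one is matched at its first occurrence after the previous match.
    """
    spans = []
    cursor = 0
    for surface, label in mentions:
        start = text.find(surface, cursor)
        if start == -1:
            raise ValueError(f"{surface!r} not found after offset {cursor}")
        end = start + len(surface)
        spans.append([start, end, label])
        cursor = end
    return spans


text = "张三在北京的清华大学工作。"
spans = spans_from_mentions(text, [("张三", "PER"), ("北京", "LOC"), ("清华大学", "ORG")])
print(spans)  # [[0, 2, 'PER'], [3, 5, 'LOC'], [6, 10, 'ORG']]
print(json.dumps({"text": text, "label": spans}, ensure_ascii=False))
```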

Export data

Click Dataset in the left sidebar, hover over the Actions menu, and click Export Dataset:

Select the JSONL format, tick Export only approved documents (so that only reviewed data is exported), and click Export:
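To train a model such as those in DeepKE, the exported JSONL usually has to be converted to BIO format (see the references at the top). The following Python sketch is not the method from those references: it performs a character-level conversion, which suits Chinese text, and writes one `character<TAB>tag` line per character with a blank line between documents. It assumes each exported line carries `text` plus a `label` list of `[start, end, label]` entries with an exclusive end; an alternative `entities` list of objects with `start_offset`, `end_offset` and `label` keys is also handled, but check both layouts against your own export file.

```python
import json
from pathlib import Path


def spans_of(record: dict) -> list[tuple[int, int, str]]:
    """Return (start, end, label) spans from one exported record.

    Assumed layouts: "label": [[start, end, label], ...] or
    "entities": [{"start_offset": ..., "end_offset": ..., "label": ...}, ...].
    """
    if "entities" in record:
        return [(e["start_offset"], e["end_offset"], e["label"]) for e in record["entities"]]
    return [(start, end, label) for start, end, label in record.get("label", [])]


def to_bio(text: str, spans: list[tuple[int, int, str]]) -> list[tuple[str, str]]:
    """Character-level BIO tagging: one (character, tag) pair per character."""
    tags = ["O"] * len(text)
    for start, end, label in sorted(spans):
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"overlapping span {start}-{end} ({label})")
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return list(zip(text, tags))


def convert(jsonl_path: Path, out_path: Path) -> None:
    """Write "character<TAB>tag" lines, with a blank line after each document."""
    with jsonl_path.open(encoding="utf-8") as src, out_path.open("w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # tolerate blank lines in the export
            record = json.loads(line)
            for char, tag in to_bio(record["text"], spans_of(record)):
                # Whitespace cannot be written as a token on its own line, so skip it.
                # This assumes no annotated span starts with whitespace.
                if char.isspace():
                    continue
                dst.write(f"{char}\t{tag}\n")
            dst.write("\n")
```

For example, `convert(Path("export.jsonl"), Path("train.txt"))` writes the BIO file; both file names are placeholders.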