
Manual Assessment | Tips & Tricks


Tips & Tricks

This is the third article, "Tips & Tricks", in the manual evaluation series. The full series includes:

  • Basic concepts
  • Using human annotators
  • Tips & Tricks

It is recommended to read "Using human annotators" before this article. This article presents practical advice for building evaluation datasets with human annotators.

Task design

  • Keep it simple: Keep annotation tasks free of unnecessary complexity. Minimizing the annotators' cognitive load helps them stay focused, which improves annotation quality.

  • Limit the information provided: Avoid introducing unnecessary information into the annotation task. Providing only what is needed to complete the task ensures that annotators are not exposed to additional sources of bias.

  • Streamline the presentation: Differences in where and how content is displayed create extra work and cognitive load, which in turn hurts annotation quality. For example, display the text and the task on the same page to avoid unnecessary scrolling, and show multiple sequential tasks one after another when they are combined. Think carefully about how everything is presented in your annotation tool and look for room to simplify.

  • Test the setup: Once the task design and annotation guidelines are complete, test them yourself on a small number of samples before inviting the whole annotation team, and iterate as needed (a small pilot-sampling sketch follows this list).
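
To make the pilot step concrete, here is a minimal sketch in plain Python that draws a small sample from the annotation pool so the task design and guidelines can be dry-run before the full team is involved. The file and field names ("annotation_pool.jsonl", "id", "text") are illustrative assumptions, not part of the original guide.

```python
import json
import random

# Minimal pilot run: sample a handful of records from the full pool and
# annotate them yourself before inviting the whole team.
random.seed(0)  # reproducible pilot selection

with open("annotation_pool.jsonl") as f:  # hypothetical export of your raw data
    records = [json.loads(line) for line in f]

pilot = random.sample(records, k=min(20, len(records)))

with open("pilot_batch.jsonl", "w") as f:
    for record in pilot:
        # Keep only what the annotator needs to see (see "Limit the information provided").
        f.write(json.dumps({"id": record["id"], "text": record["text"]}) + "\n")

print(f"Wrote {len(pilot)} pilot examples for a guideline dry run.")
```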

Annotation process

  • Individual annotation: To prevent personal biases from spreading within the team and skewing the results, annotators should not help each other or look at each other's answers during the task. Alignment across the team should come from the annotation guidelines themselves, new annotators should be trained on separate datasets, and inter-annotator agreement metrics should be used to check that results are consistent across the annotation team (see the agreement sketch after this list).

  • Version consistency: If the annotation guidelines need significant updates (e.g., changed definitions or instructions, labels added or removed), decide whether to re-annotate the data already labeled, or at a minimum track which version of the guidelines each record was annotated with, for example by storing a metadata value such as guidelines-v1.
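
As a concrete example of an inter-annotator agreement check, the sketch below computes Cohen's kappa between two annotators with scikit-learn. The labels are toy data; with more than two annotators, metrics such as Fleiss' kappa or Krippendorff's alpha are the usual choices.

```python
from sklearn.metrics import cohen_kappa_score

# Labels assigned independently by two annotators to the same items
# (toy data; in practice these come from your annotation tool's export).
annotator_a = ["positive", "negative", "neutral", "positive", "negative", "positive"]
annotator_b = ["positive", "negative", "positive", "positive", "negative", "neutral"]

# Cohen's kappa corrects raw agreement for the agreement expected by chance.
kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")

# A low score is a signal to revisit the guidelines or retrain annotators;
# any guideline revision should then be version-tracked (e.g. guidelines-v1).
```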

Hybrid human-machine annotation

Human annotation brings clear advantages, but annotation teams are often constrained by time and resources. In those cases, models can be used for part of the work to improve annotation efficiency.

  • Model-assisted annotation: Use a model's predictions or generations as pre-labels so that the annotation team does not start from scratch. Note that this can introduce model bias, and if the model is not accurate enough it may increase the annotation effort rather than reduce it.

  • Human-supervised model evaluation: Combine model-based evaluation (see the "Model as a judge" page) with human supervision, so that the model's judgments are validated or discarded by a human. Be aware of the bias this introduces (see the "Advantages and disadvantages of manual evaluation" section).

  • Identifying edge cases: To make the task more efficient, a set of models can provide an initial judgment, and a human reviewer is brought in when the models disagree strongly or are tied (see the routing sketch after this list). Again, pay attention to the bias this introduces (refer to the "Advantages and disadvantages of manual evaluation" section).
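
A minimal sketch of the edge-case routing idea: a small "jury" of models votes on each item, clear majorities are accepted automatically, and ties (or narrow margins, depending on the threshold) are escalated to a human annotator. The vote data and threshold below are illustrative assumptions, not part of the original guide.

```python
from collections import Counter

def route_item(model_votes, min_margin=1):
    """Return an automatic label when the model jury clearly agrees,
    otherwise flag the item for human review.

    model_votes: labels proposed by several models for one item.
    min_margin: how many votes the top label must lead by to be accepted
                automatically (raising it routes more items to humans).
    """
    counts = Counter(model_votes).most_common()
    if len(counts) == 1:
        return counts[0][0], False  # unanimous, no human needed
    (top_label, top), (_, second) = counts[0], counts[1]
    if top - second >= min_margin:
        return top_label, False
    return None, True  # tie or near-tie: send to a human annotator

# Toy votes from a jury of three models on four items.
items = {
    "item-1": ["A", "A", "A"],
    "item-2": ["A", "B", "A"],
    "item-3": ["A", "B", "C"],
    "item-4": ["B", "B", "A"],
}

for item_id, votes in items.items():
    label, needs_human = route_item(votes)
    print(item_id, "-> human review" if needs_human else f"-> auto label {label}")
```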

End-to-End Tutorial

If you want to build a complete evaluation task of your own, refer to Argilla's practical evaluation tutorial, which details how to use Argilla and distilabel, together with synthetic data and human evaluation, to build domain-specific evaluation tasks. Once the task is built, you can run the evaluation with the lighteval library.


Link to original article: /huggingface/evaluation-guidebook/blob/main/contents/human-evaluation/

Author: clefourrier

Translator: SuSung-boy

Reviewer: adeenayakup