- Mitigation: Use a self-consistency setup: prompt the model multiple times and keep the majority result
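A minimal sketch of the self-consistency idea above, assuming a hypothetical `judge` callable that takes a prompt and returns a verdict string (it stands in for whatever model call you use, sampled with non-zero temperature):

```python
from collections import Counter
from typing import Callable


def self_consistent_verdict(
    judge: Callable[[str], str], prompt: str, n_samples: int = 5
) -> str:
    """Query the judge n_samples times and return the most frequent verdict."""
    verdicts = [judge(prompt) for _ in range(n_samples)]
    # most_common(1) returns [(verdict, count)]; on a tie, the verdict
    # that appeared first among the samples is returned.
    return Counter(verdicts).most_common(1)[0][0]
```

An odd `n_samples` avoids ties for binary verdicts; for numeric scores, a mean or median over the samples may be a more natural aggregate than a majority vote.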
- Mitigation: Use a jury of several judge models instead of a single judge
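One way to sketch a jury, assuming each judge is a hypothetical callable that maps a prompt to a numeric score (in practice each would wrap a different model), is to average the individual scores:

```python
from statistics import mean
from typing import Callable, Sequence


def jury_score(judges: Sequence[Callable[[str], float]], prompt: str) -> float:
    """Average the scores returned by several independent judges."""
    if not judges:
        raise ValueError("a jury needs at least one judge")
    # Each judge sees the same prompt; no judge sees another judge's score.
    return mean(judge(prompt) for judge in judges)
```

Averaging is only one possible aggregation; a majority vote or a median may suit categorical or outlier-prone scores better.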
- Mitigation:
  - Ask the model to write out its detailed reasoning first, then output the score
  - Add consistent scoring criteria to the prompt
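The two measures above can be combined in a single prompt template. The sketch below is illustrative only: the rubric wording, the 1 to 5 scale, and the `Score:` output convention are assumptions, not a prescribed format.

```python
import re
from typing import Optional

# Hypothetical template: fixed rubric first, reasoning requested before the score.
JUDGE_TEMPLATE = """Rate the answer on a 1-5 scale using this rubric:
1 = incorrect, 3 = partially correct, 5 = fully correct and complete.

Question: {question}
Answer: {answer}

First explain your reasoning step by step.
Then finish with a line of the form "Score: <integer from 1 to 5>"."""


def parse_score(judge_output: str) -> Optional[int]:
    """Return the last 'Score: N' found in the judge output, or None."""
    matches = re.findall(r"Score:\s*([1-5])\b", judge_output)
    # Take the last match so a score mentioned mid-reasoning is not picked up.
    return int(matches[-1]) if matches else None
```

Returning `None` on a missing score lets the caller decide whether to retry or discard the sample rather than silently recording a default.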
- Mitigation:
  - Randomly shuffle the positions of the answers
  - Compute the log-probabilities of all options and normalize them
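Both measures above can be sketched in a few lines. The helper names are hypothetical, and the per-option log-probabilities are assumed to come from whatever scoring interface your model exposes:

```python
import math
import random


def shuffle_options(options, rng):
    """Return the options in random order, plus the original index of each slot."""
    order = list(range(len(options)))
    rng.shuffle(order)
    # order[k] is the original index of the option now shown at position k.
    return [options[i] for i in order], order


def normalize_logprobs(logprobs):
    """Turn per-option log-probabilities into probabilities that sum to 1."""
    m = max(logprobs)  # subtract the max for numerical stability
    exps = [math.exp(lp - m) for lp in logprobs]
    total = sum(exps)
    return [e / total for e in exps]
```

Keeping `order` makes it possible to map the judge's choice back to the original option, so results from several shuffles can be compared or averaged per option.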
- Mitigation: Account for differences in answer length
- Across all of these evaluations, whether human evaluation makes a good baseline is still debated. For example, in specialized fields (such as medicine, law, or mathematics), if the annotators are not expert enough, the results may be as poor as using an LLM directly.
- Mitigation: Carefully follow the prompt format used in the judge model's training data (for example, the format used for instruction fine-tuning).