Learn how to use benchmarks and consensus scoring to analyze the quality of your labels.
When you export labels, the benchmark_score and consensus_score values appear in the performance_details section of the resulting JSON file.
For a benchmark data row labeled multiple times, all non-benchmark labels contain a benchmark_reference_label
field, which is the ID of the benchmark label they reference. The benchmark label itself doesn’t have a benchmark_reference_label
field or an associated benchmark score, as it serves as the standard for comparison, not as a label being compared.
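The distinction above can be sketched in code. This is a minimal illustration, not the exact export schema: the field names (performance_details, benchmark_reference_label, benchmark_score) come from this page, but the surrounding structure and IDs are assumed for the example.

```python
def split_benchmark_labels(labels):
    """Separate the benchmark label from the labels scored against it.

    A benchmark label has no benchmark_reference_label field; every
    other label for the same data row references the benchmark's ID.
    """
    benchmark = None
    scored = []
    for label in labels:
        if "benchmark_reference_label" in label.get("performance_details", {}):
            scored.append(label)
        else:
            benchmark = label
    return benchmark, scored

# Hypothetical excerpt of exported labels for one benchmark data row.
labels = [
    {"id": "lbl-A", "performance_details": {}},  # the benchmark itself
    {"id": "lbl-B", "performance_details": {
        "benchmark_reference_label": "lbl-A", "benchmark_score": 0.92}},
    {"id": "lbl-C", "performance_details": {
        "benchmark_reference_label": "lbl-A", "benchmark_score": 0.75}},
]

benchmark, scored = split_benchmark_labels(labels)
print(benchmark["id"])                      # the label with no score: lbl-A
print([l["id"] for l in scored])            # the labels being compared
```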
For example, if one labeler submits the annotation house and a second labeler annotates the same word in the text file as hous, the agreement score between these two annotations would be 0.80.

[Figure: answers from Labeler 1 and Labeler 2 compared side by side. Each of the dotted boxes represents a unique answer choice/answer schema.]