Generally speaking, labeling teams and machine learning teams care about surfacing poor-quality labels for two reasons:
- Your model is only as good as the data you train it on. Therefore, it is critical to use high-quality labels for training your model.
- Finding labeling mistakes allows you to give feedback to labelers so that they can improve their accuracy.
With Labelbox, you can easily find and fix labeling errors. The goal is to surface data rows where model predictions and ground truth labels disagree due to labeling mistakes. You can then rework these poor-quality labels to ensure high accuracy and a robust machine learning model.
Model predictions and model metrics are useful tools for finding incorrectly labeled data. Machine learning models have different performance characteristics than human labelers, so their mistakes often differ from the ones humans make. For example, unlike a human, a model never becomes tired.
To find label errors this way, first upload model predictions and model metrics for your labeled data. In other words, upload predictions and metrics to the model run that contains the labeled data used to train your model.
Strong disagreement and high confidence can mean label errors
A great way to surface label errors is to find predictions where the model disagrees strongly with ground truth labels, yet the model is very confident.
- Go to the Model tab and open the model and model run on which you want to find label errors.
- Filter the data rows to keep only disagreements between model predictions and ground truth labels. To do so, you can add a Metrics filter in order to keep only data rows with low metrics (e.g., at least one false positive).
- Surface data rows where the model is most confident. To do so, you can sort data rows in decreasing order of confidence. This assumes you have uploaded model confidence as a custom scalar metric to the model run. Predictions that have low metrics (e.g., false positives) and high model confidence tend to correspond with labeling mistakes.
- Then, you can manually inspect these surfaced data rows in detail. It is common for machine learning teams to manually inspect hundreds of data rows to capture as many label errors as possible. To do so, click a thumbnail to open the detailed view.
In the example with image assets, the first image surfaced by our filters does contain a labeling error:
- The model correctly predicted that there is a car in the middle of the parking lot, yet the car is not included in the ground truth annotations.
- In just a few clicks, we found a labeling mistake in the official DOTA dataset.
In the example with text assets, the first text surfaced by our filters does contain a labeling error:
- The model correctly predicted that "John" is a person, yet it is not included in the ground truth annotations.
- In just a few clicks, we found a labeling mistake in the official WikiNEuRal NER dataset.
Even though our models are not perfect (e.g., the image model fails to predict some cars at the bottom of the image), they are still helpful for finding labeling errors.
- Now that you have surfaced label mistakes, you can select the poorly labeled data rows and send them for re-labeling by clicking on [n] selected > Send to > Project as batch.
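The filter-and-sort logic from the steps above can also be sketched offline, for example on metrics exported from a model run. This is a minimal illustration, not Labelbox SDK code: the DataRowMetrics record, its field names, and the sample values are all hypothetical, and it assumes you already have a false-positive count and a model confidence score for each data row.

```python
from dataclasses import dataclass


@dataclass
class DataRowMetrics:
    # Hypothetical record; field names are illustrative, not Labelbox SDK names.
    data_row_id: str
    false_positives: int  # predictions with no matching ground truth annotation
    confidence: float     # model confidence, e.g. uploaded as a custom scalar metric


def rank_label_error_candidates(rows, min_false_positives=1):
    """Keep rows where the model disagrees with ground truth, most confident first."""
    disagreements = [r for r in rows if r.false_positives >= min_false_positives]
    return sorted(disagreements, key=lambda r: r.confidence, reverse=True)


# Made-up sample values for illustration only.
rows = [
    DataRowMetrics("row-a", false_positives=0, confidence=0.99),
    DataRowMetrics("row-b", false_positives=2, confidence=0.95),
    DataRowMetrics("row-c", false_positives=1, confidence=0.40),
    DataRowMetrics("row-d", false_positives=1, confidence=0.97),
]

candidates = rank_label_error_candidates(rows)
candidate_ids = [r.data_row_id for r in candidates]
print(candidate_ids)
```

Rows at the top of the resulting list combine a disagreement with high model confidence, so they are the ones worth inspecting first. Rows without any false positive are dropped regardless of confidence.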
The projector view is a powerful way to find labeling mistakes. In the projector view, you can:
- Click on any point to preview the corresponding data row.
- Select an area of the screen to preview all data rows in the region.
By coloring the projector view, for each class, you might notice suspicious points. For instance, a data row containing the basketball_court annotation in the middle of the ground_track_field cluster is likely to be a labeling mistake.
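The same intuition, a point of one class sitting inside another class's cluster, can be approximated in code with a nearest-neighbor check. The sketch below is only an illustration under stated assumptions: it is not how the projector view works internally, the 2D coordinates are made up, and find_suspicious_points is a hypothetical helper that flags a point when most of its k nearest neighbors carry a different label.

```python
import math
from collections import Counter


def find_suspicious_points(points, labels, k=3):
    """Return indices of points whose k nearest neighbors mostly have another label."""
    suspicious = []
    for i, (p, label) in enumerate(zip(points, labels)):
        # Rank every other point by Euclidean distance to point i, keep the k closest.
        neighbors = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )[:k]
        majority_label, _ = Counter(labels[j] for j in neighbors).most_common(1)[0]
        if majority_label != label:
            suspicious.append(i)
    return suspicious


# Made-up 2D projection: two well-separated clusters plus one mislabeled point.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),  # ground_track_field cluster
    (0.05, 0.05),                                    # labeled basketball_court
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),  # basketball_court cluster
]
labels = ["ground_track_field"] * 4 + ["basketball_court"] * 5

suspicious = find_suspicious_points(points, labels)
print(suspicious)
```

A flagged point is only a candidate for review. It may be a genuine labeling mistake, or simply a data row that looks unusual for its class, so inspect it before sending it for re-labeling.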