Find and fix labeling mistakes

How to find and address labeling mistakes.

Generally speaking, labeling teams and machine learning teams care about surfacing poor-quality labels for two reasons:

  • Your model is only as good as the data you train it on. Therefore, it is critical to use high-quality labels for training your model.
  • Finding labeling mistakes allows you to give feedback to labelers so that they can improve their accuracy.

With Labelbox, you can easily find and fix labeling errors. The goal is to surface data rows where model predictions and ground truth labels disagree due to labeling mistakes. You can then rework these poor-quality labels to ensure high accuracy and a robust machine learning model.

Use a trained model to find label errors

Model predictions and model metrics are useful tools for finding incorrectly labeled data. Machine learning models have different performance characteristics than human labelers. For example, a model — unlike a human — never becomes tired.

To do this, you first need to upload model predictions and upload model metrics for your labeled data. In other words, you should upload predictions and metrics to the model run that contains the labeled data used to train your model.
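If you prefer to script this step with the Labelbox Python SDK rather than use the UI, the sketch below shows the general shape of such an upload: one prediction plus the model's confidence attached as a custom scalar metric. The API key, model run ID, global key, class name, and coordinates are placeholders, and the exact object construction can vary between SDK versions, so treat this as a starting point rather than a drop-in script.

```python
# Minimal sketch (not a drop-in script): upload one prediction plus a custom
# "confidence" scalar metric to an existing model run via the Labelbox Python SDK.
# All IDs and values below are placeholders; construction details may differ
# slightly depending on your SDK version.
import labelbox as lb
import labelbox.types as lb_types

client = lb.Client(api_key="YOUR_API_KEY")
model_run = client.get_model_run("YOUR_MODEL_RUN_ID")

# One predicted bounding box, plus the model's confidence as a custom scalar metric.
prediction_label = lb_types.Label(
    data={"global_key": "parking-lot-001.jpg"},  # placeholder global key
    annotations=[
        lb_types.ObjectAnnotation(
            name="car",
            value=lb_types.Rectangle(
                start=lb_types.Point(x=120, y=80),
                end=lb_types.Point(x=180, y=140),
            ),
        ),
        lb_types.ScalarMetric(metric_name="confidence", value=0.97),
    ],
)

# Upload the predictions (and their metrics) to the model run.
upload_job = model_run.add_predictions(
    name="prediction-upload-1",
    predictions=[prediction_label],
)
upload_job.wait_until_done()
print(upload_job.errors)
```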

📘

Strong disagreement and high confidence can mean label errors

A great way to surface label errors is to find predictions where the model disagrees strongly with ground truth labels, yet the model is very confident.

Find and fix labeling mistakes

  1. Go to the Model tab and open the model and model run on which you want to find label errors.

  2. Filter the data rows to keep only disagreements between model predictions and ground truth labels. To do so, add a Metrics filter that keeps only data rows with low metrics (e.g., at least one false positive).

Surface mispredictions on images.

Surface mispredictions on text.

  3. Surface data rows where the model is most confident. To do so, sort data rows in decreasing order of confidence. This assumes you have uploaded model confidence as a custom scalar metric to the model run. Predictions with low metrics (e.g., false positives) and high model confidence tend to correspond to labeling mistakes; see the sketch after the figures below.

Filter mispredictions and sort on confidence to surface candidate labeling errors on images.

Filter mispredictions and sort on confidence to surface candidate labeling errors on text.
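
The filter-and-sort logic from steps 2 and 3 is simple enough to reproduce offline once you have exported the metrics. The sketch below assumes hypothetical dictionaries holding each data row's false positive count and its custom confidence metric; the field names are illustrative, not the actual export schema.

```python
# Rank candidate label errors: keep mispredictions, then sort by model confidence.
# The records below stand in for exported model run metrics; field names are
# illustrative placeholders.
data_rows = [
    {"id": "dr-001", "false_positives": 1, "confidence": 0.97},
    {"id": "dr-002", "false_positives": 0, "confidence": 0.99},
    {"id": "dr-003", "false_positives": 2, "confidence": 0.91},
    {"id": "dr-004", "false_positives": 1, "confidence": 0.42},
]

# Step 2: keep only disagreements (at least one false positive).
disagreements = [dr for dr in data_rows if dr["false_positives"] >= 1]

# Step 3: sort by confidence, highest first. Confident mispredictions are the
# most likely candidates for ground truth labeling mistakes.
candidates = sorted(disagreements, key=lambda dr: dr["confidence"], reverse=True)

for dr in candidates:
    print(dr["id"], dr["confidence"])
# dr-001 and dr-003 surface first; dr-004 (low confidence) is more likely a model
# error than a label error.
```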

  4. Then, you can manually inspect these surfaced data rows in detail. It is common for machine learning teams to manually inspect hundreds of data rows to capture as many label errors as possible. To do so, click a thumbnail to open the detailed view.

In the example with image assets, the first image surfaced by our filters indeed contains a labeling error:

  • The model correctly predicted there is a car in the middle of the parking lot, yet it is not included in the ground truth annotations.
  • In just a few clicks, we found a labeling mistake in the official DOTA dataset.

Use your model as a guide for finding labeling mistakes on images.

In the example with text assets, the first text surfaced by our filters indeed contains a labeling error:

  • The model correctly predicted that "John" is a person, yet it is not included in the ground truth annotations.
  • In just a few clicks, we found a labeling mistake in the official WikiNEuRal NER dataset.

Even though our models are not perfect (e.g., the image model fails to predict some cars at the bottom of the image), they are still helpful for finding labeling errors.

Use your model as a guide for finding labeling mistakes in text.

  5. Now that you have surfaced label mistakes, select the poorly labeled data rows and send them for re-labeling by clicking [n] selected > Send to > Project as batch. A scripted equivalent is sketched below the figure.

Send poorly labeled data rows to a project for re-labeling.
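
If you would rather script this step, the Labelbox Python SDK can create the batch directly. The sketch below assumes you have already collected the IDs of the poorly labeled data rows; the API key, project ID, batch name, and data row IDs are placeholders.

```python
# Sketch of the SDK equivalent of "Send to > Project as batch".
# All IDs are placeholders.
import labelbox as lb

client = lb.Client(api_key="YOUR_API_KEY")
project = client.get_project("YOUR_PROJECT_ID")

poorly_labeled_data_row_ids = ["dr-001", "dr-003"]  # surfaced in the previous steps

batch = project.create_batch(
    name="relabel-candidate-label-errors",
    data_rows=poorly_labeled_data_row_ids,
    priority=1,  # lower number = higher priority, so these are re-labeled first
)
print(batch.uid)
```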

Use embeddings to find label errors

The projector view is a powerful way to find labeling mistakes. In the projector view, you can:

  • Click on any point to preview the corresponding data row.
  • Select an area of the screen to preview all data rows in the region.

By coloring the projector view by class, you might notice suspicious points. For instance, a data row containing the basketball_court annotation in the middle of the ground_track_field cluster is likely to be a labeling mistake. A rough programmatic version of this check is sketched after the figure below.

This label seems out of distribution. Click on it to inspect and check if it's a labeling mistake.
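
You can also run a rough, programmatic version of this check outside the projector view. The sketch below assumes you have an embedding vector and an annotation class for each data row (the arrays here are randomly generated placeholders) and flags points whose nearest neighbors mostly belong to a different class.

```python
# Offline analogue of the projector-view inspection: flag annotations whose
# embedding neighborhood is dominated by a different class, e.g. a basketball_court
# point sitting inside the ground_track_field cluster. Embeddings and classes below
# are random placeholders standing in for your exported data.
from collections import Counter

import numpy as np
from sklearn.neighbors import NearestNeighbors

embeddings = np.random.rand(1000, 64)  # placeholder: your exported embeddings
classes = np.random.choice(["basketball_court", "ground_track_field"], size=1000)

k = 10
nn = NearestNeighbors(n_neighbors=k + 1).fit(embeddings)
_, indices = nn.kneighbors(embeddings)  # column 0 is the point itself

suspicious = []
for i, neighbor_ids in enumerate(indices):
    neighbor_classes = classes[neighbor_ids[1:]]  # skip the point itself
    majority_class, count = Counter(neighbor_classes).most_common(1)[0]
    # Flag points surrounded almost entirely by another class.
    if majority_class != classes[i] and count >= 0.8 * k:
        suspicious.append(i)

print(f"{len(suspicious)} candidate label errors to inspect manually")
```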