Find & fix labeling mistakes

Generally speaking, labeling teams and machine learning teams care about surfacing poor-quality labels for two reasons:

  • Your model is only as good as the data you train it on, so it is critical to train on high-quality labels.
  • Finding labeling mistakes helps give labelers the feedback they need to improve.

With Labelbox, you can easily find and fix labeling errors. The goal is to surface data rows where model predictions and ground truth labels disagree because of labeling mistakes. It is best practice to rework these poor-quality labels: doing so keeps your labeling team performing well and your machine learning model robust.

Use a trained model to find label errors

Model predictions and model metrics are useful tools for finding incorrectly labeled data. Machine learning models have different performance characteristics than human labelers. For example, a model—unlike a human—does not get tired.

To do this, you first have to upload model predictions and model metrics on your labeled data. In other words, upload predictions and metrics to the model run that contains the labeled data used to train your model.
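If you generate predictions with your own inference code, you can compute the confidence metric at the same time. Below is a minimal sketch, in plain Python, of gathering per-data-row predictions and a custom "confidence" scalar metric; the dictionary layout and the `run_model` helper are illustrative only, and the final upload goes through the Labelbox SDK's prediction and metric upload methods (check the SDK reference for the exact calls in your version).

```python
# Minimal sketch (not the exact Labelbox upload schema): collect, for each labeled
# data row, the model's predicted objects and a per-row "confidence" scalar metric.
# `run_model` is a hypothetical stand-in for your own inference code.

def build_predictions_and_metrics(data_rows, run_model):
    predictions = {}   # data_row_id -> list of predicted objects
    confidence = {}    # data_row_id -> value of the custom "confidence" metric
    for row in data_rows:
        objects = run_model(row["row_data"])  # e.g. [{"name": "car", "bbox": [...], "score": 0.93}, ...]
        predictions[row["id"]] = objects
        # Use the mean prediction score as the per-row confidence metric.
        confidence[row["id"]] = (
            sum(o["score"] for o in objects) / len(objects) if objects else 0.0
        )
    return predictions, confidence

# Convert `predictions` and `confidence` to the SDK's annotation and metric types
# (or NDJSON) and upload them to the model run that contains your labeled data.
```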

📘

Tip

A great way to surface label errors is to find predictions where the model disagrees strongly with ground truth labels, yet the model is very confident.
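Concretely, if you keep your own copy of the predictions, this tip boils down to ranking unmatched predictions by model score. A small illustrative heuristic (the field names and the matching step are assumptions, not a Labelbox API):

```python
def label_error_candidates(predictions, matched_ids, min_score=0.8):
    """Return high-confidence predictions that no ground-truth label matches.

    predictions: list of dicts with illustrative fields "id" and "score".
    matched_ids: set of prediction ids that were matched to a ground-truth
                 object (e.g. by IoU for boxes, by span overlap for NER).
    """
    candidates = [
        p for p in predictions
        if p["id"] not in matched_ids and p["score"] >= min_score
    ]
    # Most confident disagreements first: these are the likeliest label errors.
    return sorted(candidates, key=lambda p: p["score"], reverse=True)
```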

Find and fix labeling mistakes

  1. Go to the Model tab. Open the model and model run you want to find label errors on.

  2. Filter the data rows to keep only disagreements between model predictions and ground truth labels. To do so, add a filter on metrics to keep only data rows with low metric values (e.g. at least one false positive in the image).

Surface mispredictions on images

Surface mispredictions on text

  3. Surface the data rows where the model is most confident. To do so, sort the data rows in decreasing order of confidence. This assumes you have uploaded model confidence, as a custom scalar metric, to the model run. Predictions with low metric values (e.g. false positives) and high model confidence tend to correspond to labeling mistakes. (A programmatic version of steps 2 and 3 is sketched after the figures below.)

Filter mispredictions and sort on confidence, to surface candidate labeling errors on images

Filter mispredictions and sort on confidence, to surface candidate labeling errors on text
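If you prefer to run this triage programmatically rather than in the UI, the same filter-and-sort logic is easy to express on an export of the model run's per-data-row metrics. The sketch below assumes the metrics have already been pulled into a pandas DataFrame; the column names are illustrative.

```python
import pandas as pd

# Illustrative export of per-data-row metrics from the model run.
metrics_df = pd.DataFrame(
    {
        "data_row_id": ["dr-1", "dr-2", "dr-3", "dr-4"],
        "false_positives": [0, 2, 1, 3],
        "confidence": [0.55, 0.97, 0.61, 0.91],
    }
)

# Step 2: keep only data rows where predictions and ground truth disagree.
candidates = metrics_df[metrics_df["false_positives"] >= 1]

# Step 3: sort by decreasing model confidence so the most suspicious rows come first.
candidates = candidates.sort_values("confidence", ascending=False)

print(candidates)  # dr-2 and dr-4 are the first data rows to inspect
```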

  4. Manually inspect these surfaced data rows in detail. It is common for machine learning teams to inspect hundreds of data rows in order to capture as many label errors as possible. To do so, click on a thumbnail to open the detailed view.

In our image example, the first (surfaced) image we inspect indeed contains a labeling error:

  • The model correctly predicted that there is a car in the middle of the parking lot, yet it is not in the ground truth labels
  • In just a few clicks, we found a labeling mistake in the official DOTA dataset

In our text example, the first (surfaced) text we inspect indeed contains a labeling error:

  • The model correctly predicted that "John" is a person, yet it is not in the ground truth labels
  • In just a few clicks, we found a labeling mistake in the official WikiNEuRal NER dataset

Even though our models are not perfect (e.g. the image model fails to predict some cars at the bottom of the image), they are still helpful for finding labeling errors.

Use your model as a guide to find labeling mistakes on images

Use your model as a guide to find labeling mistakes in text

  5. Now that you have surfaced label mistakes, select the poorly labeled data rows and send them for re-labeling by clicking "[X] selected" > "Send to" > "Project as batch".
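The same hand-off can be scripted with the Labelbox Python SDK once you have the IDs of the mislabeled data rows and of the re-labeling project. This is a sketch only: method names and signatures such as `create_batch` may vary between SDK versions, so check the SDK reference.

```python
import labelbox as lb

client = lb.Client(api_key="YOUR_API_KEY")
project = client.get_project("YOUR_RELABELING_PROJECT_ID")

# IDs of the data rows flagged as mislabeled in the model run.
mislabeled_data_row_ids = ["dr-2", "dr-4"]

# Queue them for re-labeling as a batch (assumed call; check the SDK reference).
batch = project.create_batch(
    name="label-errors-from-model-run",
    data_rows=mislabeled_data_row_ids,
    priority=1,  # work on suspected label errors first
)
```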

Use embeddings to find label errors

The projector view is a powerful way to find labeling mistakes.

In the projector view, you can:

  • Click on any point to preview the corresponding data row
  • Select a region of the screen, to preview all data rows inside it

By coloring the projector view by class, you might notice suspicious points. For instance, a data row carrying a basketball_court annotation in the middle of the ground_track_field cluster is likely to be a labeling mistake.
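The same intuition can be checked programmatically: a data row whose nearest neighbors in embedding space mostly carry a different class deserves a second look. Here is a minimal sketch using scikit-learn, assuming you have one embedding vector and one class name per data row (the function and variable names are illustrative):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def suspicious_rows(embeddings, labels, k=10, agreement_threshold=0.3):
    """Flag rows whose embedding neighbors rarely share their label.

    embeddings: (n, d) array of per-data-row embedding vectors.
    labels:     length-n sequence of class names (e.g. "basketball_court").
    Returns the indices of rows where fewer than `agreement_threshold` of the
    k nearest neighbors carry the same class; these are likely labeling mistakes.
    """
    labels = np.asarray(labels)
    # k + 1 neighbors because each point is returned as its own nearest neighbor.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(embeddings)
    _, idx = nn.kneighbors(embeddings)
    agreement = (labels[idx[:, 1:]] == labels[:, None]).mean(axis=1)
    return np.where(agreement < agreement_threshold)[0]

# candidates = suspicious_rows(embeddings, labels)
# Inspect these data rows in the projector view or the detailed view.
```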

This label seems out of distribution. Click on it to inspect it and check if it's a labeling mistake.