Find model errors (Error Analysis)

Error analysis (also known as model error analysis) is the process you can use to analyze where model predictions disagree with ground truth labels. A disagreement between model predictions and ground truth labels can be due to a model error (poor model prediction) or a labeling mistake (incorrect ground truth). In this section, we detail three workflows for surfacing model errors (edge cases on which the model is struggling) using Labelbox.

Before you start

Before engaging in Error Analysis, you should:

  1. Go to the Models tab.

  2. Open the Model you want to perform Error Analysis on.

  3. Select the Model Run you want to perform Error Analysis on.

Workflow 1: Using filters in the Gallery view

Follow these steps to use filters and metrics in the Gallery view to surface model errors. You can adapt this workflow to your specific use case.

  1. Inside the Model Run, go to the Gallery view by clicking the Gallery icon on the right.

  2. Optionally, select the validation or test split. Some machine learning teams prefer doing Error Analysis on the validation or test splits only, rather than on all the Model Run data.

  3. Filter Data Rows to keep only disagreements between model predictions and ground truth labels. To do so, you can add a filter on metrics to keep only Data Rows with low metric values (Metrics: IOU between 0 and 0.5 in our example). In the screenshot below, you can see that 307 Data Rows match these filters.

  4. [ Option 1 ] Surface Data Rows where the disagreement is highest. To do so, you can sort Data Rows by increasing metric value (IOU in our example). Predictions with the lowest metric values are likely to be model errors.

We filter and sort Data Rows to keep the largest disagreements between model predictions and ground truth labels.

  5. [ Option 2 ] Surface Data Rows where the model is least confident. To do so, you can sort Data Rows by increasing confidence. This assumes you have uploaded model confidence as a Scalar metric to the Model Run (see the sketch at the end of this workflow). Predictions that have both low metrics (IOU) and low model confidence are likely to be edge cases on which the model is struggling.

  6. Then, manually inspect some of the surfaced Data Rows in detail. The goal is to find patterns of edge cases on which the model is struggling; it is common practice to manually inspect hundreds of Data Rows to find these patterns. To do so, click on a thumbnail to open the Detailed view. For the best Error Analysis experience, change the display setting to Color by feature, so you can easily visualize where predictions and labels disagree.

The Detailed view makes it easy to inspect a disagreement in detail. The goal is to find patterns of model failures.

In this example, you can see several occurrences of Data Rows where the model predicts a basketball_court instead of a ground_track_field.

From these examples, the pattern of model failures can be summarized as "the model seems to struggle to distinguish ground track fields and basketball courts, especially when they have green and brown colors".

  7. Double-check that you have found a pattern of model failure. To do so, filter to keep only Data Rows that contain a basketball_court or a ground_track_field annotation and that have low IOU. This surfaces many examples of the exact edge case discovered above: "the model struggles to distinguish ground track fields and basketball courts, especially when they have green and brown colors".

Many ground track fields and basketball courts are being mispredicted (low IOU). This is a pattern of model failure.

By browsing through examples in this pattern of model failure, you can see that many basketball courts have brown and green colors, just like ground track fields.

Labelbox helps you find patterns of model failures. In this case, the model struggles to distinguish ground track fields and basketball courts, especially when they have green and brown colors.

After you have surfaced edge cases on which the model is struggling and found a pattern of model failure, you can take action to improve model performance.
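
If you have not yet uploaded model confidence (needed for Option 2 above), the sketch below shows one way to attach a per-Data-Row confidence score as a Scalar metric using the Labelbox Python SDK. This is a minimal sketch, not a definitive recipe: the API key, Model Run ID, global keys, and confidence values are placeholders, and the exact SDK surface varies between versions, so verify the calls against the SDK reference.

```python
import uuid

import labelbox as lb
import labelbox.types as lb_types

client = lb.Client(api_key="<YOUR_API_KEY>")
model_run = client.get_model_run("<MODEL_RUN_ID>")  # placeholder ID

# Map each Data Row's global key to your model's confidence score.
# These values are placeholders; use your model's real outputs.
per_row_confidence = {"<GLOBAL_KEY_1>": 0.42, "<GLOBAL_KEY_2>": 0.87}

labels = [
    lb_types.Label(
        data={"global_key": global_key},
        annotations=[
            # A custom Scalar metric named "confidence", sortable in the app
            lb_types.ScalarMetric(metric_name="confidence", value=score)
        ],
    )
    for global_key, score in per_row_confidence.items()
]

# Attach the metrics to the Model Run alongside your predictions
upload = model_run.add_predictions(
    name=f"confidence-metrics-{uuid.uuid4()}",
    predictions=labels,
)
upload.wait_until_done()
print(upload.errors)  # expect an empty list on success
```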

Workflow 2: Using graphs in the Metrics view

The Metrics view is a powerful tool for doing Error Analysis.

Looking at the scalar metrics, we could have noticed that the model is struggling to detect basketball_court ground truths. Clicking on the histogram bar corresponding to basketball courts opens the Gallery view, with filtering and sorting already applied, to show the Data Rows in that basketball_court bar of the histogram.

This is an alternative to steps 1-4 of Workflow 1.

The Metrics view is a good way to identify classes on which the model is struggling.
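
If you prefer to script this check, the snippet below sketches how you might reproduce the per-class IOU comparison from exported metrics. The flat list of metric records shown here is illustrative only, not the exact Labelbox export schema; adapt the field names to whatever your export contains.

```python
from collections import defaultdict

# Placeholder records standing in for an export of per-annotation metrics
exported_metrics = [
    {"feature_name": "basketball_court", "metric_name": "iou", "value": 0.31},
    {"feature_name": "ground_track_field", "metric_name": "iou", "value": 0.44},
    {"feature_name": "basketball_court", "metric_name": "iou", "value": 0.55},
]

# Group IOU values by class
per_class = defaultdict(list)
for record in exported_metrics:
    if record["metric_name"] == "iou":
        per_class[record["feature_name"]].append(record["value"])

# Classes with the lowest mean IOU are the ones the model struggles with
for feature, values in sorted(
    per_class.items(), key=lambda kv: sum(kv[1]) / len(kv[1])
):
    mean_iou = sum(values) / len(values)
    print(f"{feature}: mean IOU = {mean_iou:.2f} over {len(values)} annotations")
```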

Workflow 3: Using the Projector view

The Projector view is a powerful way to do Error Analysis.

By coloring the Projector view by class, we notice that Data Rows containing the basketball_court annotation overlap with those containing the ground_track_field annotation. The two classes are not easy to separate in the embedding space, so the model is likely to struggle with the Data Rows at the intersection of the two clusters.

In the Projector view, you can select the Data Rows that are at the intersection of the basketball_court cluster and the ground_track_field cluster. The model is likely to struggle with these Data Rows. Once the Data Rows are selected, you can switch back to the Grid view, and inspect these Data Rows.

This is an alternative to steps 1-4 of Workflow 1.
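
To reproduce this analysis outside the Projector view, the sketch below scores Data Rows by how close they sit to the boundary between two embedding clusters. It assumes you already have a per-Data-Row embedding vector for each class (for example, computed by your own model); the names, dimensions, and random placeholder vectors are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Embeddings keyed by Data Row ID; random vectors stand in for real ones
basketball_court = {f"bc_{i}": rng.normal(0.0, 1.0, 64) for i in range(100)}
ground_track_field = {f"gtf_{i}": rng.normal(0.5, 1.0, 64) for i in range(100)}

bc_centroid = np.mean(list(basketball_court.values()), axis=0)
gtf_centroid = np.mean(list(ground_track_field.values()), axis=0)

def ambiguity(vec: np.ndarray) -> float:
    """Rows roughly equidistant from both centroids sit at the intersection
    of the two clusters, where the model is likely to struggle."""
    return abs(
        np.linalg.norm(vec - bc_centroid) - np.linalg.norm(vec - gtf_centroid)
    )

all_rows = {**basketball_court, **ground_track_field}
most_ambiguous = sorted(all_rows, key=lambda k: ambiguity(all_rows[k]))[:20]
print(most_ambiguous)  # candidate Data Rows to inspect in the Grid view
```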

