Once you have created a model run and uploaded your predictions, you can start analyzing your model’s performance. This section will guide you through the various tools and features available in Labelbox to help you understand where your model is succeeding and where it is failing.

Filtering and sorting

Labelbox provides powerful tools for exploring the data within your model run. You can filter and sort your data based on a wide range of attributes, including:
  • Annotations: Find data rows with or without certain annotations.
  • Predictions: Filter data based on your model’s predictions.
  • Metrics: Sort your data by performance metrics like IoU and confidence.
  • Metadata: Use your own custom metadata to filter your data.
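Conceptually, each filter is a predicate over a data row's attributes, and sorting is an ordering over a metric. As a plain-Python illustration (this is not the Labelbox SDK; the field names `iou`, `confidence`, and `metadata` are hypothetical stand-ins for real data row attributes), combining filters and a sort might look like:

```python
# Illustrative only: stands in for filters you would build in the
# Labelbox UI or SDK. All field names here are hypothetical.
rows = [
    {"id": "row-1", "iou": 0.91, "confidence": 0.97, "metadata": {"camera": "camera-02"}},
    {"id": "row-2", "iou": 0.42, "confidence": 0.55, "metadata": {"camera": "camera-01"}},
    {"id": "row-3", "iou": 0.78, "confidence": 0.48, "metadata": {"camera": "camera-02"}},
]

# Filter: data rows from camera-02 with a low-confidence prediction.
filtered = [
    r for r in rows
    if r["metadata"]["camera"] == "camera-02" and r["confidence"] < 0.6
]

# Sort the remaining rows by IoU, worst first, to surface likely errors.
filtered.sort(key=lambda r: r["iou"])
print([r["id"] for r in filtered])  # → ['row-3']
```

The same idea scales to any attribute combination: each additional filter narrows the cohort, and the sort key determines review order.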

Save data as a slice

A Slice is a saved query that acts as a dynamic, “smart” folder for your data. Rather than creating a static list of data rows, a Slice updates itself automatically as new data is added or metadata changes.

When to use this:
  • Tracking key segments: You want to constantly monitor your model’s performance on a critical subset of your data (e.g., “all images from camera-02” or “all data with a night-time metadata tag”).
  • Automating data curation: Create a Slice for “data with low confidence predictions” to automatically group your model’s most uncertain predictions for review.
  • Building validation sets: Create a Slice to represent a specific distribution of data that you want to use as a consistent validation set across multiple Model Runs.
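The “smart folder” behavior comes from the fact that a Slice stores the query rather than its results, so it is re-evaluated every time you open it. A minimal sketch of that idea (pure Python, not the Labelbox implementation; all names are illustrative):

```python
# A slice stores the predicate, not the results: each evaluation
# re-runs the saved filter set, so newly added matching rows appear
# automatically with no update step.
class Slice:
    def __init__(self, name, predicate):
        self.name = name
        self.predicate = predicate  # the saved filter set

    def evaluate(self, data_rows):
        return [r for r in data_rows if self.predicate(r)]

low_conf = Slice("Low Confidence Detections", lambda r: r["confidence"] < 0.5)

rows = [{"id": "a", "confidence": 0.9}, {"id": "b", "confidence": 0.3}]
print([r["id"] for r in low_conf.evaluate(rows)])  # → ['b']

# Data added later matches the saved query without touching the slice.
rows.append({"id": "c", "confidence": 0.2})
print([r["id"] for r in low_conf.evaluate(rows)])  # → ['b', 'c']
```

This is also why a Slice works as a consistent validation set: the definition stays fixed while its membership tracks the current state of your data.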

How to create a slice

  1. Apply your filters: In your Model Run, apply the set of filters that defines the cohort of data you want to track.
  2. Save the slice: Click the Save as slice button at the top of the filter bar.
  3. Name your slice: Give your Slice a descriptive name that reflects the query, such as “Low Confidence Detections” or “Validation Set: Highway Scenes”.
The Slice is now saved and available in both the Model and Catalog tabs. You can select it in any Model Run to instantly apply the saved filter set, and it will automatically include any new data that matches its criteria, providing a powerful way to automate your data management and analysis workflows.

After you create a slice, it appears in the left side panel of the model run view, and you can modify its attributes later by updating its filters. To create a slice programmatically, see our Model run slice developer guide.
See Limits for the limits that apply when creating slices.

Auto-generated slices

When you create a model run and associate ground truth labels, Labelbox automatically generates a set of default slices. These slices act as powerful, pre-built filters that help you immediately begin diagnosing your model’s performance. They provide the foundational building blocks for a comprehensive model error analysis workflow. Here is a detailed breakdown of each auto-generated slice and how to leverage it for model improvement.
  • True positive: This slice contains every data row where your model correctly predicted an object that matched a ground truth label, according to the IoU threshold you’ve set.
  • False positive: This slice contains every data row where your model made a prediction for which there was no corresponding ground truth label. In essence, your model is “hallucinating”, seeing things that aren’t there.
  • False negative: This slice contains every data row where a ground truth label exists, but your model completely failed to predict it. This is your “missed detections” bucket.
  • True negative: This slice contains every data row where there is no ground truth label and your model also made no prediction. This concept is most relevant for global or document-level classification tasks. For object detection, this slice represents the background where your model correctly remained silent.
  • Low precision: This slice contains data rows where your model’s precision for a specific class is low. Precision measures the accuracy of your model’s positive predictions (True Positives / (True Positives + False Positives)). In simple terms, a low precision score means the model is making a high number of False Positive predictions for that class.
  • Low recall: This slice contains data rows where your model’s recall for a specific class is low. Recall (also known as sensitivity) measures the model’s ability to find all of the actual positive examples (True Positives / (True Positives + False Negatives)). A low recall score means the model is missing a high number of objects for that class.
  • Low F1-score: This slice identifies data rows belonging to classes with a low F1-score. The F1-score is the harmonic mean of precision and recall, providing a single, balanced measure of a model’s performance. A low F1-score indicates a problem with precision, recall, or both.
  • Low confidence: This slice contains every prediction where your model’s confidence score was below a certain threshold (e.g., less than 50%). Note that these are not necessarily incorrect predictions; they are simply predictions where the model is expressing uncertainty.
  • Candidate mislabels: This powerful slice identifies data rows where your model made a high-confidence prediction that directly disagrees with the ground truth label. For example, the model predicts “car” with 98% confidence, but the ground truth label says “truck”.
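The categories above can be reproduced from first principles. The sketch below is illustrative, not Labelbox’s matching algorithm: it uses a simple greedy match at an assumed IoU threshold of 0.5 to count TP/FP/FN for a single class, then derives precision, recall, and F1 with the formulas given above.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def match(preds, truths, thresh=0.5):
    """Greedy one-to-one matching; returns (tp, fp, fn) counts."""
    unmatched = list(truths)
    tp = 0
    for p in preds:
        best = max(unmatched, key=lambda t: iou(p, t), default=None)
        if best is not None and iou(p, best) >= thresh:
            unmatched.remove(best)  # matched ground truth → true positive
            tp += 1
    fp = len(preds) - tp   # predictions with no ground truth match
    fn = len(unmatched)    # ground truth the model failed to predict
    return tp, fp, fn

preds  = [(0, 0, 10, 10), (50, 50, 60, 60)]   # model output
truths = [(1, 1, 10, 10), (80, 80, 90, 90)]   # ground truth labels

tp, fp, fn = match(preds, truths)
precision = tp / (tp + fp)                           # TP / (TP + FP)
recall = tp / (tp + fn)                              # TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean
print(tp, fp, fn, precision, recall, f1)  # → 1 1 1 0.5 0.5 0.5
```

Here the first prediction overlaps its ground truth box well above the threshold (a true positive), the second prediction has no matching label (a false positive), and the second ground truth box goes unpredicted (a false negative), so precision, recall, and F1 all come out to 0.5.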