## Filtering and sorting
Labelbox provides powerful tools for exploring the data within your model run. You can filter and sort your data based on a wide range of attributes, including:

- Annotations: Find data rows with or without certain annotations.
- Predictions: Filter data based on your model’s predictions.
- Metrics: Sort your data by performance metrics like IoU and confidence.
- Metadata: Use your own custom metadata to filter your data.
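As a quick reference for the IoU metric mentioned above, here is an illustrative computation for axis-aligned bounding boxes. This is a standalone sketch of the standard formula, not a Labelbox API call; the box format `(x1, y1, x2, y2)` is an assumption for the example.

```python
def iou(box_a, box_b):
    """Intersection over Union for two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# A prediction that perfectly overlaps its ground truth scores 1.0;
# a half-overlapping box scores well below a typical 0.5 threshold.
print(iou((0, 0, 10, 10), (0, 0, 10, 10)))  # 1.0
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.333...
```

Sorting by a metric like this surfaces the weakest matches first, which is usually where labeling or model errors cluster.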
## Save data as a slice
A Slice is a saved query that acts as a dynamic, “smart” folder for your data. Rather than creating a static list of data rows, a Slice continuously and automatically updates itself as new data is added or metadata changes.

When to use this:

- Tracking key segments: You want to constantly monitor your model’s performance on a critical subset of your data (e.g., “all images from `camera-02`” or “all data with a `night-time` metadata tag”).
- Automating data curation: Create a Slice for “data with low confidence predictions” to automatically group your model’s most uncertain predictions for review.
- Building validation sets: Create a Slice to represent a specific distribution of data that you want to use as a consistent validation set across multiple Model Runs.
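Conceptually, a Slice is a stored query that gets re-evaluated against the current data each time it is viewed, rather than a frozen list of data row IDs. A minimal Python sketch of that idea (illustrative only; the field names are hypothetical and not Labelbox’s data model):

```python
# Hypothetical data rows with a confidence metric attached.
data_rows = [
    {"id": "row-1", "confidence": 0.95},
    {"id": "row-2", "confidence": 0.40},
]

# A "slice" stored as a query (predicate), not as a static list of IDs.
def low_confidence(row):
    return row["confidence"] < 0.5

def evaluate(slice_predicate, rows):
    """Re-run the saved query against whatever data currently exists."""
    return [row["id"] for row in rows if slice_predicate(row)]

print(evaluate(low_confidence, data_rows))  # ['row-2']

# Because the query is re-run each time, newly added rows that match
# the filter appear in the slice automatically.
data_rows.append({"id": "row-3", "confidence": 0.10})
print(evaluate(low_confidence, data_rows))  # ['row-2', 'row-3']
```

This is why a Slice stays useful across Model Runs: the membership tracks the query, not a snapshot.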
### How to create a slice
- Apply your filters: In your Model Run, apply the set of filters that defines the cohort of data you want to track.
- Save the slice: Click the Save as slice button at the top of the filter bar.
- Name your slice: Give your Slice a descriptive name that reflects the query, such as “Low Confidence Detections” or “Validation Set: Highway Scenes”.
See Limits for the limits on creating slices.
## Auto-generated slices
When you create a model run and associate ground truth labels, Labelbox automatically generates a set of default slices. These slices act as powerful, pre-built filters that help you immediately begin diagnosing your model’s performance, and they provide the foundational building blocks for a comprehensive model error analysis workflow. Here is a detailed breakdown of each auto-generated slice and how to leverage it for model improvement.

| Auto-generated slice | Description |
|---|---|
| True positive | This slice contains every data row where your model correctly predicted an object that matched a ground truth label, according to the IoU threshold you’ve set. |
| False positive | This slice contains every data row where your model made a prediction for which there was no corresponding ground truth label. In essence, your model is “hallucinating” or seeing things that aren’t there. |
| False negative | This slice contains every data row where a ground truth label exists, but your model completely failed to predict it. This is your “missed detections” bucket. |
| True negative | This slice contains every data row where there is no ground truth label and your model also made no prediction. This concept is most relevant for global or document-level classification tasks. For object detection, this slice represents the background where your model correctly remained silent. |
| Low precision | This slice contains data rows where your model’s precision for a specific class is low. Precision measures the accuracy of your model’s positive predictions (True Positives / (True Positives + False Positives)). In simple terms, a low precision score means the model is making a high number of False Positive predictions for that class. |
| Low recall | This slice contains data rows where your model’s recall for a specific class is low. Recall (also known as sensitivity) measures the model’s ability to find all of the actual positive examples (True Positives / (True Positives + False Negatives)). A low recall score means the model is missing a high number of objects for that class. |
| Low F1-score | This slice identifies data rows belonging to classes with a low F1-Score. The F1-score is the harmonic mean of precision and recall, providing a single, balanced measure of a model’s performance. A low F1-score indicates a problem with either precision, recall, or both. |
| Low confidence | This slice contains every prediction where your model’s confidence score was below a certain threshold (e.g., less than 50%). It’s important to note that these are not necessarily incorrect predictions; they are simply predictions where the model is expressing uncertainty. |
| Candidate mislabels | This powerful slice identifies data rows where your model made a high-confidence prediction that directly disagrees with the ground truth label. For example, the model predicts “car” with 98% confidence, but the ground truth label says “truck”. |
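The metrics behind the low precision, low recall, and low F1-score slices, plus the candidate-mislabel check, follow directly from TP/FP/FN counts. A self-contained sketch of the formulas quoted in the table (the 0.9 confidence threshold is an illustrative choice, not a Labelbox default):

```python
def precision(tp, fp):
    """Accuracy of positive predictions: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Ability to find all actual positives: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0

# Example: 8 true positives, 2 false positives, 4 false negatives.
print(round(precision(8, 2), 3))  # 0.8   -> few hallucinations
print(round(recall(8, 4), 3))     # 0.667 -> some missed detections
print(round(f1(8, 2, 4), 3))      # 0.727 -> balanced summary

# Candidate mislabel: a high-confidence prediction that disagrees
# with the ground truth label.
def is_candidate_mislabel(pred_class, pred_conf, gt_class, threshold=0.9):
    return pred_conf >= threshold and pred_class != gt_class

print(is_candidate_mislabel("car", 0.98, "truck"))  # True
```

Reading the three scores together tells you which slice to open first: low precision points at the false positive slice, low recall at the false negative slice.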