Labelbox lets you slice and dice your data in Catalog. Filtering and sorting help you manage massive amounts of data, surface high-impact subsets, and remove unwanted data.
Here are the supported search and filter capabilities in Catalog.
| Filter | Description | Example |
|---|---|---|
| Annotation | Filter on annotations created on or uploaded to Labelbox | Show images where X was annotated |
| Predictions (coming soon) | Filter on predictions uploaded to model runs | Show images where a model detected X |
| Dataset | Filter the dataset that data rows belong to | Show all images uploaded to dataset X |
| Metadata | Metadata fields uploaded by the user | The datetime an image was captured |
| Project status | Status with respect to the project | Data rows not submitted to a particular project |
| Similarity | Filter by a function score | Use similarity to find data for labeling |
| Natural Language Search | Filter based on natural language | Use NL search to find all "photo of birds in grass fields" |
| Media attributes | Attributes of the data computed on upload. Each media type has different fields. | Media type: Image, Video, Text,... |
Think of a filter as a pyramid built from layers that are applied in sequence. The layers are combined with AND; within a single layer, you can combine values with OR. Each filter shows a count of the data rows or annotations that match it. Only attribute values with a non-zero count are offered for selection, and the count is shown as a hint.
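The layered logic can be sketched in plain Python. This is only an illustration of the AND-across-layers, OR-within-a-layer semantics described above, not the Labelbox SDK or Catalog's implementation; the names `matches`, `count_matches`, and `value_counts`, and the row fields `dataset`, `media_type`, and `width`, are hypothetical.

```python
from collections import Counter
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Condition = Callable[[Row], bool]
Layers = List[List[Condition]]  # outer list: AND, inner list: OR


def matches(row: Row, layers: Layers) -> bool:
    """Return True when every layer has at least one condition that holds."""
    return all(any(cond(row) for cond in layer) for layer in layers)


def count_matches(rows: List[Row], layers: Layers) -> int:
    """Count the rows that pass the whole filter."""
    return sum(1 for row in rows if matches(row, layers))


def value_counts(rows: List[Row], layers: Layers, key: str) -> Counter:
    """Count values of `key` among matching rows.

    Values that never occur are absent from the Counter, which mirrors
    offering only non-zero counts for selection.
    """
    return Counter(row[key] for row in rows if matches(row, layers))


# Hypothetical data rows, for illustration only.
rows = [
    {"dataset": "A", "media_type": "image", "width": 640},
    {"dataset": "B", "media_type": "image", "width": 120},
    {"dataset": "A", "media_type": "video", "width": 1920},
    {"dataset": "C", "media_type": "image", "width": 800},
]

# Layer 1: dataset is A OR B. Layer 2: media type is image.
layers = [
    [lambda r: r["dataset"] == "A", lambda r: r["dataset"] == "B"],
    [lambda r: r["media_type"] == "image"],
]

print(count_matches(rows, layers))                   # 2
print(value_counts(rows, layers[:1], "media_type"))  # counts after layer 1 only
```

Adding a layer can only keep or shrink the result set, while adding a condition inside a layer can only keep or grow it, which is why the order of layers does not change the final count.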
Here is a realistic example to help you understand filter construction.
An ML engineer is developing an AI model to identify vessels in synthetic aperture radar (SAR) satellite imagery. The engineer learns that the model performs poorly on images containing coastlines, so the engineer queries for images that are at least 200px wide AND belong to the dataset named "SAR dataset (chipped)".
Then, the engineer queries for images that are similar to the images used to create a function named "coastal images". The results are images with a coastline.
The engineer then tunes the function parameter so that the query returns images that are dissimilar to the images used to create the function named "coastal images". The results are images without any coastline.
Then, the engineer queries for images that are not labeled, that is, images that do not contain an annotation named "ship".
Finally, the engineer can sample 100 random images from the results and submit the batch to a labeling project.
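The workflow in this example can be approximated offline as a chain of AND-ed conditions followed by random sampling. The sketch below is plain Python under stated assumptions, not the Labelbox SDK: the function `select_batch`, the row fields `width`, `dataset`, `coastal_score`, and `annotations`, and the convention that the similarity function yields a score between 0 and 1 are all hypothetical.

```python
import random
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def select_batch(
    rows: List[Row],
    score_range: Tuple[float, float],
    sample_size: int = 100,
    seed: int = 0,
) -> List[Row]:
    """Apply the example's filters in sequence, then sample at random.

    `score_range` is the tunable parameter: a high range keeps images
    similar to the "coastal images" function, a low range keeps
    dissimilar ones.
    """
    low, high = score_range
    candidates = [
        r for r in rows
        if r["width"] >= 200                         # media attribute
        and r["dataset"] == "SAR dataset (chipped)"  # dataset
        and low <= r["coastal_score"] <= high        # similarity score (assumed 0..1)
        and "ship" not in r["annotations"]           # not yet labeled with "ship"
    ]
    rng = random.Random(seed)  # seeded so the sample is reproducible
    return rng.sample(candidates, min(sample_size, len(candidates)))


# Hypothetical data rows, for illustration only.
rows = [
    {
        "id": i,
        "width": 256,
        "dataset": "SAR dataset (chipped)",
        "coastal_score": i / 1000,
        "annotations": [],
    }
    for i in range(1000)
]

batch = select_batch(rows, score_range=(0.8, 1.0))
print(len(batch))  # 100
```

Passing a low range such as `(0.0, 0.2)` would instead select images without a coastline, matching the tuning step in the example.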