You can use the filters in Catalog to help you gather insights about your data, surface high-impact subsets of data rows, and remove unwanted data.
Below are the attributes you can use to filter your data rows in Catalog. For a summary of each filter's compatibility with every data type, please see Search and view compatibility.
| Filter | Description |
|---|---|
| Annotation | Find data rows with labels that contain or do not have certain counts of annotations |
| Batch | Find data rows that belong to a particular batch |
| Data row | Filter based on global key, data row ID, created at, and last activity at |
| Dataset | Find data rows that belong to a particular dataset |
| Find text | Find data rows that contain a particular keyword |
| Media attributes | Find data rows based on their media type (e.g., image, video, text) or other attributes computed upon upload (e.g., video duration, height, width) |
| Metadata | Find data rows that contain a certain metadata field and/or value |
| Natural language search | Filter based on natural language (e.g., photos of birds in grass fields) |
| Project | Find data rows that are associated with a specific labeling project |
| Similarity | Automatically surface data rows that are more or less similar to selected data rows |
Think of building a filter as constructing a pyramid of logical layers. Layers are combined with AND operations; within a layer, you can combine conditions with OR operations. Each filter shows a count of the data rows that match it. Only attribute values with a non-zero count of matching data rows are available for selection.
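The layered logic can be sketched in a few lines of plain Python. This is only an illustration of the AND-across-layers, OR-within-a-layer idea, not the Labelbox SDK or its internal implementation; the `matches`, `apply_filter`, and `value_counts` helpers and the sample data rows are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List

DataRow = Dict[str, str]
Predicate = Callable[[DataRow], bool]


def matches(row: DataRow, layers: List[List[Predicate]]) -> bool:
    """A row matches when every layer (AND) has at least one accepting predicate (OR)."""
    return all(any(pred(row) for pred in layer) for layer in layers)


def apply_filter(rows: List[DataRow], layers: List[List[Predicate]]) -> List[DataRow]:
    """Return the rows that satisfy all layers."""
    return [row for row in rows if matches(row, layers)]


def value_counts(rows: List[DataRow], field: str) -> Counter:
    """Count rows per value of a field; values with zero rows never appear."""
    return Counter(row[field] for row in rows)


# Hypothetical data rows, for illustration only.
rows = [
    {"id": "a", "dataset": "street", "media_type": "image", "batch": "b1"},
    {"id": "b", "dataset": "street", "media_type": "video", "batch": "b2"},
    {"id": "c", "dataset": "indoor", "media_type": "image", "batch": "b1"},
    {"id": "d", "dataset": "aerial", "media_type": "image", "batch": "b3"},
]

layers = [
    # Layer 1: dataset is "street" OR "indoor"
    [lambda r: r["dataset"] == "street", lambda r: r["dataset"] == "indoor"],
    # Layer 2 (ANDed with layer 1): media type is "image"
    [lambda r: r["media_type"] == "image"],
]

selected = apply_filter(rows, layers)
print([r["id"] for r in selected])          # ['a', 'c']
print(dict(value_counts(selected, "batch")))  # {'b1': 2}
```

In this sketch, batches `b2` and `b3` have no matching rows after filtering, so they do not appear in the counts, which mirrors how only non-zero options are offered for selection.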
Labelbox search and curation capabilities allow for unmatched flexibility, scale, performance, and automation.
Labelbox search is uniquely flexible. Users can combine all kinds of filters in any arbitrary way. You can do both structured searches and unstructured searches at the same time.
- Structured searches include filters on metadata, annotations, metrics, confidence, datasets, projects, and more.
- Unstructured searches include similarity search, natural language search, and text search.
Combine different kinds of search results for even more powerful data curation. Training datasets that are carefully visualized, curated, and debugged are the most successful for increasing model performance.
Labelbox search operates at a scale of 100+ million data points. Building an in-house data catalog that scales to hundreds of millions of data points, and that returns results in a single click, is difficult for even the most advanced machine learning teams.
Labelbox strives to deliver high-performing search capabilities. Search queries take less than 15 seconds to return results, even on hundreds of millions of data points.
Labelbox search results are visual. This makes it easy to refine and iterate on a search. Given the results from the previous search, you can easily add, remove or edit filters. With Labelbox, non-technical teams have the most powerful data exploration and data curation at their fingertips.
For a summary of data visualization capabilities for every data type, please see Search and view compatibility.
After populating filters in Catalog, you can save them as a slice of data. Once a filter is saved as a slice, you do not need to populate the same filters again. Slices are dynamic: any new data rows added to Catalog that match a slice's filters appear in that slice automatically.
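One way to picture a dynamic slice is as a saved filter definition rather than a saved list of data rows. The sketch below is a conceptual illustration in plain Python and does not use the Labelbox SDK; the `Slice` class, its `rows` method, and the sample catalog are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

DataRow = Dict[str, str]


@dataclass
class Slice:
    """A named, saved filter. It stores the predicate, not the matching rows."""
    name: str
    predicate: Callable[[DataRow], bool]

    def rows(self, catalog: List[DataRow]) -> List[DataRow]:
        # Re-evaluate the saved filter against the current catalog contents.
        return [row for row in catalog if self.predicate(row)]


# Hypothetical catalog contents, for illustration only.
catalog = [
    {"id": "a", "media_type": "image"},
    {"id": "b", "media_type": "video"},
]

images = Slice("images", lambda r: r["media_type"] == "image")
before = [r["id"] for r in images.rows(catalog)]

# A new data row arrives; the slice picks it up without being redefined.
catalog.append({"id": "c", "media_type": "image"})
after = [r["id"] for r in images.rows(catalog)]

print(before)  # ['a']
print(after)   # ['a', 'c']
```

Because the slice holds the filter rather than its results, it never needs to be rebuilt when new data arrives.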
Read through the following resources to learn how to take action on the filtered data:
- Refine the similarity search
- Send filtered data rows to a labeling project as a batch
- Add metadata to the filtered data rows
Users can easily share search results with each other by sharing slice URLs.