Filters

A guide for filtering and sorting data rows in the Catalog UI.

You can use the filters in Catalog to help you gather insights about your data, surface high-impact subsets of data rows, and remove unwanted data.

Supported attributes

Below are the attributes you can use to filter your data rows in Catalog. For a summary of each filter's compatibility with every data type, please see Search and view compatibility.

Attribute                 Description
Annotation                Find data rows with labels that contain or do not have certain counts of annotations
Batch                     Find data rows that belong to a particular batch
Data row                  Filter based on global key, data row ID, created at, and last activity at
Dataset                   Find data rows that belong to a particular dataset
Find text                 Find data rows that contain a particular keyword
Media attributes          Find data rows based on their media type (e.g., image, video, text) or other attributes computed upon upload (e.g., video duration, height, width)
Metadata                  Find data rows that contain a certain metadata field and/or value
Natural language search   Filter based on natural language (e.g., photos of birds in grass fields)
Project                   Find data rows that are associated with a specific labeling project
Similarity                Automatically surface data rows that are more or less similar to selected data rows

How filters work

Think of building a filter as constructing a pyramid of layers. Layers are combined with AND: a data row must satisfy every layer to match. Within a layer, you can combine conditions with OR. Each filter shows the count of data rows that match it. Only attribute values with a non-zero count are offered for selection.
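The layering logic can be illustrated with a small, self-contained Python sketch. This is not the Catalog implementation or the Labelbox SDK. The field names (`dataset`, `media_type`, `global_key`) and the helpers `matches`, `apply_filters`, and `selectable_values` are hypothetical and exist only to show AND across layers, OR within a layer, and why values with zero matches are not offered.

```python
from collections import Counter
from typing import Any, Callable, Dict, List

# Hypothetical in-memory stand-ins for data rows and filter conditions.
DataRow = Dict[str, Any]
Condition = Callable[[DataRow], bool]
Layer = List[Condition]


def matches(row: DataRow, layers: List[Layer]) -> bool:
    """AND across layers, OR across the conditions inside each layer."""
    return all(any(cond(row) for cond in layer) for layer in layers)


def apply_filters(rows: List[DataRow], layers: List[Layer]) -> List[DataRow]:
    """Return the rows that satisfy every layer."""
    return [row for row in rows if matches(row, layers)]


def selectable_values(rows: List[DataRow], field: str) -> Counter:
    """Count each value of `field`; values with no matching rows never appear."""
    return Counter(row[field] for row in rows)


rows = [
    {"global_key": "img-1", "dataset": "street", "media_type": "image"},
    {"global_key": "vid-1", "dataset": "street", "media_type": "video"},
    {"global_key": "img-2", "dataset": "indoor", "media_type": "image"},
    {"global_key": "txt-1", "dataset": "street", "media_type": "text"},
    {"global_key": "aud-1", "dataset": "indoor", "media_type": "audio"},
]

# Layer 1: dataset is "street".
dataset_layer: Layer = [lambda r: r["dataset"] == "street"]
# Layer 2: media type is image OR video.
media_layer: Layer = [
    lambda r: r["media_type"] == "image",
    lambda r: r["media_type"] == "video",
]

# After the first layer, only media types that still have matches are selectable.
media_counts = selectable_values(apply_filters(rows, [dataset_layer]), "media_type")

# Both layers together: (dataset == street) AND (image OR video).
result = apply_filters(rows, [dataset_layer, media_layer])
matched_keys = [r["global_key"] for r in result]
match_count = len(result)
```

In this toy data, "audio" drops out of `media_counts` once the dataset layer is applied, which mirrors why only non-zero counts are offered in the next layer.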

Labelbox search and curation capabilities allow for unmatched flexibility, scale, performance, and automation.

Flexibility

Labelbox search is uniquely flexible. You can combine filters of any kind, in any order, and run structured and unstructured searches at the same time.

Combine different kinds of search results for even more powerful data curation. Training datasets that are carefully visualized, curated, and debugged are the most successful for increasing model performance.

Scale

Labelbox search operates at a scale of more than 100 million data points. Building an in-house data catalog that scales to hundreds of millions of data points, and that still returns results in one click, is difficult for even the most advanced machine learning teams.

Performance

Labelbox strives to deliver high-performing search capabilities. Search queries take less than 15 seconds to return results, even on hundreds of millions of data points.

Visual search

Labelbox search results are visual. This makes it easy to refine and iterate on a search. Given the results from the previous search, you can easily add, remove or edit filters. With Labelbox, non-technical teams have the most powerful data exploration and data curation at their fingertips.

For a summary of data visualization capabilities for every data type, please see Search and view compatibility.

Automatic search

After populating filters in Catalog, you can save them as a slice of data, so you do not need to populate the same filters again. Slices are dynamic: any incoming data row in your Catalog that matches a slice's filters automatically shows up in that slice.
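The dynamic behavior of a slice can be pictured as a saved set of filters that is evaluated each time it is viewed, rather than a stored list of data rows. The following Python sketch rests on that assumption. The `Slice` class, its `evaluate` method, and the field names are hypothetical and are not part of any Labelbox API.

```python
from typing import Any, Callable, Dict, List

DataRow = Dict[str, Any]
Condition = Callable[[DataRow], bool]


class Slice:
    """Hypothetical saved filter: stores the conditions, not the matching rows."""

    def __init__(self, name: str, layers: List[List[Condition]]) -> None:
        self.name = name
        self.layers = layers

    def evaluate(self, catalog: List[DataRow]) -> List[DataRow]:
        # Re-run the saved filters against whatever the catalog holds now.
        return [
            row
            for row in catalog
            if all(any(cond(row) for cond in layer) for layer in self.layers)
        ]


catalog = [
    {"global_key": "img-1", "dataset": "street", "media_type": "image"},
    {"global_key": "vid-1", "dataset": "street", "media_type": "video"},
]

street_images = Slice(
    "street-images",
    [
        [lambda r: r["dataset"] == "street"],
        [lambda r: r["media_type"] == "image"],
    ],
)

before = [r["global_key"] for r in street_images.evaluate(catalog)]

# New data rows arrive; only the one matching the saved filters joins the slice.
catalog.append({"global_key": "img-3", "dataset": "street", "media_type": "image"})
catalog.append({"global_key": "img-4", "dataset": "indoor", "media_type": "image"})

after = [r["global_key"] for r in street_images.evaluate(catalog)]
```

Because the slice keeps only the filter definition, no manual refresh step is needed in this sketch: `after` picks up `img-3` and ignores `img-4`.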

Read through the following resources to learn how to take action on the filtered data:

Ascending or Descending

To gain a deeper understanding of your data, consider sorting the results in ascending or descending order by date of import, latest activity on a data row, or metadata. This gives a more dynamic view of your data and helps identify areas that may require additional resources.
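As a rough illustration of ordering by import date or latest activity, the sketch below sorts a few in-memory records in both directions. The field names `created_at` and `last_activity_at` follow the Data row attributes listed earlier, but the record layout and the `sort_rows` helper are hypothetical.

```python
from datetime import datetime
from typing import Any, Dict, List

DataRow = Dict[str, Any]

rows: List[DataRow] = [
    {"global_key": "a", "created_at": datetime(2024, 1, 5),
     "last_activity_at": datetime(2024, 5, 1)},
    {"global_key": "b", "created_at": datetime(2024, 2, 10),
     "last_activity_at": datetime(2024, 2, 11)},
    {"global_key": "c", "created_at": datetime(2023, 12, 1),
     "last_activity_at": datetime(2024, 4, 20)},
]


def sort_rows(rows: List[DataRow], field: str, descending: bool = False) -> List[DataRow]:
    """Return a new list ordered by `field`, ascending unless `descending` is set."""
    return sorted(rows, key=lambda r: r[field], reverse=descending)


# Ascending by import date: oldest data rows first.
oldest_first = [r["global_key"] for r in sort_rows(rows, "created_at")]

# Descending by latest activity: most recently touched data rows first.
recent_activity_first = [
    r["global_key"] for r in sort_rows(rows, "last_activity_at", descending=True)
]
```

Here the two orderings differ (`c, a, b` by import date versus `a, c, b` by activity), which is the kind of contrast that can point to data rows that were imported long ago but are still being worked on.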

Collaborative search

Users can easily share search results with each other by sharing slice URLs.

Programmatic search

Coming soon.