Catalog is a data curation tool for organizing, searching, visualizing, and exploring labeled and unlabeled data, including any associated metadata. Teams that develop and operate production AI systems need a data catalog to select data for downstream data-centric workflows such as data labeling, model training, model evaluation, error analysis, and active learning.

Anatomy of the Catalog

[Figure: anatomy of the Catalog]

Understanding data flow

Data can flow into and out of the Catalog in numerous ways. This diagram indicates how data can flow through the Catalog.

Data inflow

Catalog can ingest the following types of data, which you can then search, visualize, and explore in one place.

| Data type | Overview |
| --- | --- |
| Data rows & datasets | Data rows are imported into Catalog when a dataset is created or when data rows are appended to an existing dataset. |
| Custom metadata | Custom metadata fields are imported when a data row is created or updated. |
| Media attributes | Media attributes are a special class of metadata that Labelbox pre-computes automatically when a data row is created or updated. They include file type, dimensions, and pre-computed embeddings, and they are essential for an optimal experience with Labelbox. |
| Ground truth annotations | Ground truth annotations appear in Catalog after you create them in Labelbox or import them into Labelbox. |
| Model predictions | Model predictions are not yet supported in Catalog. |

Data outflow

Data can flow out of the Catalog in two ways.

| Type | Overview |
| --- | --- |
| Batch | Create a batch of data rows and send it to Annotate for labeling. See: Create a batch for labeling. |
| Export | Use the Python SDK to retrieve data row content (asset URL, media attributes, metadata). See: Export data rows from dataset; Export labels. |

Data collections

Data rows can be part of two kinds of collections in the Catalog.

| Type | Overview |
| --- | --- |
| Datasets | Every data row in the Catalog belongs to exactly one dataset. Data rows are imported when a dataset is created or when they are appended to an existing dataset. |
| Slices | A data row in the Catalog may be part of any number of slices. A data row is part of a slice if it matches the filters that define the slice. |

You can explore the Catalog by looking at All data, or by looking at the data inside a dataset or a slice.
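
The difference between the two collection types can be made concrete with a small model. The following is a minimal sketch in plain Python, not the Labelbox SDK: `DataRow`, `Slice`, `slices_for`, and the sample metadata are hypothetical names and values chosen for illustration. It assumes that a slice is defined by a list of filters and that a data row belongs to the slice when it matches all of them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DataRow:
    """Hypothetical stand-in for a Catalog data row (not an SDK class)."""
    global_key: str
    dataset: str  # a data row belongs to exactly one dataset
    metadata: Dict[str, str] = field(default_factory=dict)


# A filter is any predicate over a data row.
Filter = Callable[[DataRow], bool]


@dataclass
class Slice:
    """Hypothetical slice: a named, saved set of filters."""
    name: str
    filters: List[Filter]

    def contains(self, row: DataRow) -> bool:
        # Assumption: a row is in the slice only if every filter matches.
        return all(f(row) for f in self.filters)


rows = [
    DataRow("img-001", "street-scenes", {"split": "train", "weather": "rain"}),
    DataRow("img-002", "street-scenes", {"split": "val", "weather": "rain"}),
    DataRow("img-003", "warehouse", {"split": "train", "weather": "indoor"}),
]

slices = [
    Slice("rainy", [lambda r: r.metadata.get("weather") == "rain"]),
    Slice("train-split", [lambda r: r.metadata.get("split") == "train"]),
    Slice(
        "rainy-train",
        [
            lambda r: r.metadata.get("weather") == "rain",
            lambda r: r.metadata.get("split") == "train",
        ],
    ),
]


def slices_for(row: DataRow, all_slices: List[Slice]) -> List[str]:
    """Return the names of every slice the row currently matches."""
    return [s.name for s in all_slices if s.contains(row)]


# Each row has one dataset but may appear in zero, one, or many slices.
membership = {row.global_key: slices_for(row, slices) for row in rows}
```

In this model, dataset membership is a fixed property of the row, while slice membership is computed from the filters, so a row can enter or leave a slice when its metadata changes.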

Getting started with Catalog

  1. Choose a data configuration: IAM Integration or Signed URLs

  2. Import data

  3. Curate data

  4. Create a batch
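
Steps 2 through 4 can also be scripted. The sketch below assumes the `labelbox` Python SDK is installed and that the API key, project ID, dataset name, batch name, and asset URLs are supplied by you; `build_data_rows` and `import_and_batch` are hypothetical helper names, not part of the SDK. It relies on `Client.create_dataset`, `Dataset.create_data_rows`, and `Project.create_batch` as found in recent SDK releases. Signatures can differ between versions, so check the SDK reference for the version you run. Step 1 (data configuration) is done outside this script, and step 3 (curation) is reduced here to selecting every imported row, whereas in practice you would narrow the selection with Catalog filters or slices.

```python
from typing import Dict, List


def build_data_rows(urls: Dict[str, str]) -> List[dict]:
    """Turn {global_key: asset_url} into data row payloads for import."""
    return [{"row_data": url, "global_key": key} for key, url in urls.items()]


def import_and_batch(
    api_key: str,
    project_id: str,
    dataset_name: str,
    urls: Dict[str, str],
    batch_name: str,
) -> None:
    """Import data rows into a new dataset, then send them to a project."""
    # Imported lazily so the helper above works without the SDK installed.
    import labelbox as lb

    client = lb.Client(api_key=api_key)

    # Step 2: import data by creating a dataset and adding data rows to it.
    dataset = client.create_dataset(name=dataset_name)
    task = dataset.create_data_rows(build_data_rows(urls))
    task.wait_till_done()
    if task.errors:
        raise RuntimeError(f"Data row import failed: {task.errors}")

    # Step 3 (simplified assumption): select every imported row for labeling.
    selected_keys = sorted(urls)

    # Step 4: create a batch so the selected rows are queued in Annotate.
    project = client.get_project(project_id)
    project.create_batch(name=batch_name, global_keys=selected_keys, priority=5)
```

Calling `import_and_batch` with real credentials performs network requests and creates resources in your workspace, so try it against a test project first.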

