Metadata provides additional information about a given data row, bringing extra context to your data. You can also filter on metadata to make labeling, model diagnostics, and data selection workflows more efficient.

Metadata is broken into two categories: reserved fields and custom fields. Reserved fields are defined by Labelbox, while custom fields let you import your own metadata schema.

For more information on all available fields, see Metadata.

Create metadata schema

At this time, you can create metadata schema through the UI.

See Metadata schema to learn more about how to create metadata schema prior to uploading your metadata.

Import metadata

You can upload data rows together with metadata. For a bulk upload example, see Create a dataset.

Here is a more detailed walk-through.


Limits on uploading data rows with metadata

Currently, bulk uploads of data rows containing metadata are limited to 30,000 data rows at a time.
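If a payload exceeds this cap, one option is to split it into smaller batches before uploading. The helper below is a generic sketch; `chunk` is a hypothetical utility, not part of the Labelbox SDK:

```python
def chunk(items, size=30_000):
    """Yield successive batches of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Example: 70,005 payloads split into batches of 30,000 / 30,000 / 10,005
batches = list(chunk(list(range(70_005))))
print([len(b) for b in batches])  # [30000, 30000, 10005]
```

Each batch can then be passed to a separate `create_data_rows` call.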

Import metadata classes

import datetime
from uuid import uuid4

import labelbox
from labelbox.schema.data_row_metadata import (
    DataRowMetadata,
    DataRowMetadataField,
    DeleteDataRowMetadata,
)

client = labelbox.Client(api_key="LABELBOX_API_KEY")

Construct metadata fields

To construct a metadata field, you must provide the schema ID for the field and the value to upload. You can either construct a DataRowMetadataField object or specify the schema ID and value in a dictionary format.

# Fetch the metadata ontology. A Labelbox workspace has a single metadata ontology.
metadata_ontology = client.get_data_row_metadata_ontology()

# List all available fields
print(metadata_ontology.reserved_by_name.keys())
print(metadata_ontology.custom_by_name.keys())

# Construct a metadata field of string kind
tag_schema = metadata_ontology.reserved_by_name["tag"]
tag_metadata_field = DataRowMetadataField(
    schema_id=tag_schema.uid,  # specify the schema id
    value="tag_string",  # typed inputs
)

# Construct a metadata field of datetime kind
datetime_schema = metadata_ontology.reserved_by_name["captureDateTime"]
capture_datetime_field = DataRowMetadataField(
    schema_id=datetime_schema.uid,  # specify the schema id
    value=datetime.datetime.utcnow(),  # typed inputs
)

# Construct a metadata field of enum kind
train_schema = metadata_ontology.reserved_by_name["split"]["train"]
split_metadata_field = DataRowMetadataField(
    schema_id=train_schema.parent,  # specify the parent schema id for enums
    value=train_schema.uid,  # the option's schema id is the value
)

# Custom fields must be created in the UI prior to this
custom_field = metadata_ontology.custom_by_name["my-custom-field"]
custom_metadata_field = DataRowMetadataField(
    schema_id=custom_field.uid,  # specify the schema id
    value="custom_field_value",  # typed inputs
)
Upload data rows with metadata

Option 1: Specify metadata with a list of DataRowMetadataField objects. This is the recommended option because metadata fields are validated on construction.

dataset = client.create_dataset(name="Bulk import example")

data_row = {"row_data": "", "external_id": str(uuid4())}
data_row['metadata_fields'] = [tag_metadata_field, capture_datetime_field, split_metadata_field]

task = dataset.create_data_rows([data_row])

Option 2: Alternatively, you can specify the metadata fields in dictionary format without declaring DataRowMetadataField objects.

dataset = client.create_dataset(name="Bulk import example")

data_row = {"row_data": "", "external_id": str(uuid4())}
data_row['metadata_fields'] = [
    {"schema_id": tag_schema.uid, "value": "tag_string"},
    {"schema_id": datetime_schema.uid, "value": datetime.datetime.utcnow()},
    {"schema_id": train_schema.parent, "value": train_schema.uid},
]

task = dataset.create_data_rows([data_row])
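If your metadata lives as plain name/value pairs, a small helper can translate it into the dictionary payload shown above. This is an illustrative sketch, not an SDK function; `name_to_schema_id` is an assumed mapping you would build yourself (for example, from `reserved_by_name`):

```python
def build_metadata_fields(values, name_to_schema_id):
    """Convert {field_name: value} into [{"schema_id": ..., "value": ...}]."""
    return [
        {"schema_id": name_to_schema_id[name], "value": value}
        for name, value in values.items()
    ]

# Hypothetical schema id, for illustration only
mapping = {"tag": "cko8s9r5v0001h2dk9elqdidh"}
payload = build_metadata_fields({"tag": "tag_string"}, mapping)
print(payload)  # [{'schema_id': 'cko8s9r5v0001h2dk9elqdidh', 'value': 'tag_string'}]
```

The resulting list can be assigned to `data_row['metadata_fields']` directly.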

Export data rows with metadata

data_rows = dataset.export_data_rows(include_metadata=True)

Examine metadata of a data row

datarow = next(dataset.data_rows())
print(datarow.metadata_fields)

Update or add metadata of existing Data Rows

Labelbox supports metadata upsert for individual data rows or in bulk. Existing metadata is overwritten on a per-field basis.

tag_schema = metadata_ontology.reserved_by_name["tag"]

# Construct a string field with the updated value
field = DataRowMetadataField(
    schema_id=tag_schema.uid,  # specify the schema id
    value="updated",  # typed inputs
)

# Completed object ready for import
metadata_payload = DataRowMetadata(
    data_row_id="DATAROW_ID",  # DataRow Id, not ExternalId
    fields=[field],
)

# Provide a list of DataRowMetadata objects to upsert
metadata_ontology.bulk_upsert([metadata_payload])

Add metadata to data rows

You can bulk-add metadata to up to 100,000 data rows at once via the Catalog user interface. This method is valuable for tagging a set of data rows at the moment they are discovered or gathered, such as an edge case or data from a single cluster in an embedding space.


Interested in adding metadata using the Python SDK?

For information on adding metadata to data rows programmatically, see the Python SDK metadata guide.

Add via filters

To add metadata to a set of data rows using filters, follow these steps:

  1. Go to Catalog
  2. Select a batch of data rows
  3. From the n selected button, click Add metadata
  4. Select the metadata to apply to those data rows
  5. Click Confirm

Add via embedding projector

To add metadata to a set of data rows via the embedding projector, follow these steps:

  1. Go to Model
  2. Click the projector view icon
  3. Select a set of data points from the cluster
  4. Enter the metadata information
  5. Click Confirm

Filter & view metadata

Once you upload your metadata, you can easily filter and view it in Catalog. Note: this functionality is not available to labelers.

Click on any data row to open the detailed view. There you can find the metadata for that asset.

Export metadata

You can bulk export metadata by data row with the SDK. You can specify multiple data row IDs in the array.

# Export metadata from a list of data row ids
metadata = metadata_ontology.bulk_export([data_row.uid])
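After exporting, it is often convenient to index the results by data row ID. The sketch below works on plain dictionaries shaped like an export record; the exact shape of the SDK's exported objects may differ, so treat the field names here as illustrative:

```python
def index_by_data_row(records):
    """Build a {data_row_id: fields} lookup from exported metadata records."""
    return {rec["data_row_id"]: rec["fields"] for rec in records}

# Illustrative records shaped like a bulk export (ids are made up)
records = [
    {"data_row_id": "ckx1", "fields": [{"schema_id": "s1", "value": "tag_string"}]},
    {"data_row_id": "ckx2", "fields": []},
]
lookup = index_by_data_row(records)
print(lookup["ckx1"][0]["value"])  # tag_string
```

With the lookup in hand, you can fetch a data row's metadata in constant time instead of scanning the export.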

Delete metadata

You can delete metadata for a data row through the SDK. To delete metadata for a data row, you must specify the fields you want to delete by schema ID. The deletion will not fail if the schema ID is not present on the data row.

# Specify the schemas to delete
schemas = [tag_schema, ...]

# Create a delete object for a specific data row
deletes = DeleteDataRowMetadata(
    data_row_id="DATAROW_ID",
    fields=[s.uid for s in schemas],
)

metadata_ontology.bulk_delete([deletes])  # pass an array of deletes

What’s Next