Import metadata

Metadata provides additional information about a given Data Row. You can use metadata to bring additional context to your Data Rows. You can also use metadata to filter your Data Rows for more efficient labeling, model diagnostics, and data selection workflows.

Metadata is broken into two categories: reserved fields and custom fields. Reserved fields are defined by Labelbox, while custom fields let you import your own metadata schema.

For more information on all available fields, see Metadata.

Create metadata schema

At this time, you can create metadata schema through the UI.

See Metadata schema to learn more about how to create metadata schema prior to uploading your metadata.

Import metadata

You can upload data rows together with metadata. You can see a bulk upload example in Create a dataset.

Here is a more detailed walk-through.

📘

Limits on uploading data rows with metadata

Currently, there is a limit of 30,000 data rows per bulk upload when the data rows contain metadata. A batching sketch is shown after the upload examples below.

Import metadata classes

import labelbox
from labelbox.schema.data_row_metadata import (
    DataRowMetadataField,
    DataRowMetadata,
    DeleteDataRowMetadata,
)
import datetime
from uuid import uuid4

client = labelbox.Client(api_key="LABELBOX_API_KEY")

Construct metadata fields

To construct a metadata field, you must provide the Schema Id for the field and the value to upload. You can either construct a DataRowMetadataField object or specify the Schema Id and value in dictionary format.

# Fetch the metadata ontology. A Labelbox workspace has a single metadata ontology.
metadata_ontology = client.get_data_row_metadata_ontology()

# List all available fields
metadata_ontology.fields

# Construct a metadata field of string kind
tag_schema = metadata_ontology.reserved_by_name["tag"]
tag_metadata_field = DataRowMetadataField(
    schema_id=tag_schema.uid,  # specify the schema id
    value="tag_string", # typed inputs
)

# Construct a metadata field of datetime kind
datetime_schema = metadata_ontology.reserved_by_name["captureDateTime"]
capture_datetime_field = DataRowMetadataField(
    schema_id=datetime_schema.uid,  # specify the schema id
    value=datetime.datetime.utcnow(), # typed inputs
)

# Construct a metadata field of enum kind
train_schema = metadata_ontology.reserved_by_name["split"]["train"]
split_metadata_field = DataRowMetadataField(
    schema_id=train_schema.parent,  # specify the parent schema id for enum options
    value=train_schema.uid, # typed inputs
)

# Custom fields must be created in the UI prior to this
custom_field = metadata_ontology.custom_by_name["my-custom-field"]
custom_metadata_field = DataRowMetadataField(
    schema_id=custom_field.uid,  # specify the schema id
    value="custom_field_value", # typed inputs
)

Upload Data Rows with metadata

Option 1: Specify metadata with a list of DataRowMetadataField objects. This is the recommended option because it validates the metadata fields.

dataset = client.create_dataset(name="Bulk import example")

data_row = {"row_data": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/basic.jpg", "external_id": str(uuid4())}
data_row['metadata_fields'] = [tag_metadata_field, capture_datetime_field, split_metadata_field]

task = dataset.create_data_rows([data_row])
task.wait_till_done()

Option 2: Alternatively, you can specify the metadata fields in dictionary format without declaring DataRowMetadataField objects.

dataset = client.create_dataset(name="Bulk import example")

data_row = {"row_data": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/basic.jpg", "external_id": str(uuid4())}
data_row['metadata_fields'] = [
    {"schema_id": tag_schema.uid, "value": "tag_string"},
    {"schema_id": datetime_schema.uid, "value": datetime.datetime.utcnow()},
    {"schema_id": train_schema.parent, "value": train_schema.uid},
]

task = dataset.create_data_rows([data_row])
task.wait_till_done()
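If you need to import more Data Rows than the 30,000-per-upload limit mentioned above, you can split the payload into batches before calling create_data_rows. A minimal sketch, assuming data_rows_with_metadata is your full list of data row dictionaries built as shown above (the batch size constant and variable names are illustrative):

BATCH_SIZE = 30000  # current limit for bulk uploads that include metadata

data_rows_with_metadata = [data_row]  # assumed: your full list of data row dicts

for start in range(0, len(data_rows_with_metadata), BATCH_SIZE):
    batch = data_rows_with_metadata[start:start + BATCH_SIZE]
    task = dataset.create_data_rows(batch)
    task.wait_till_done()  # wait for each batch to finish before submitting the next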

Examine metadata of a Data Row

datarow = next(dataset.data_rows())
print(datarow)
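The printed Data Row includes its attached metadata. If you only want the metadata, a minimal sketch is below; it assumes the exported DataRow object exposes a metadata_fields attribute:

# Print only the metadata attached to this Data Row
# (metadata_fields on the DataRow object is assumed here)
for field in datarow.metadata_fields:
    print(field)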

Update or add metadata of existing Data Rows

Labelbox supports individual or bulk metadata upserts on Data Rows. Metadata is overwritten on a per-field basis.

tag_schema = metadata_ontology.reserved_by_name["tag"]

# Construct a string field with the updated value
field = DataRowMetadataField(
    schema_id=tag_schema.uid,  # specify the schema id
    value="updated", # typed inputs
)

# Completed object ready for import
metadata_payload = DataRowMetadata(
    data_row_id="DATAROW_ID",  # DataRow Id not ExternalId
    fields=[field]
)

# Provide a list of DataRowMetadata objects to upload
metadata_ontology.bulk_upsert([metadata_payload])
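To upsert metadata for several Data Rows in a single call, build one DataRowMetadata payload per Data Row and pass the whole list to bulk_upsert. A minimal sketch, where data_row_ids is an illustrative list of Data Row IDs:

data_row_ids = ["DATAROW_ID_1", "DATAROW_ID_2"]  # assumed: your Data Row IDs

payloads = [
    DataRowMetadata(data_row_id=data_row_id, fields=[field])
    for data_row_id in data_row_ids
]
metadata_ontology.bulk_upsert(payloads)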

Filter & view

Once you upload your metadata, you can filter and view it in Catalog. Note: this functionality is not available to labelers.

Filter Data Rows by metadata in Catalog

View metadata of a single Data Row

Export

You can bulk export metadata by Data Row with the SDK. You can specify multiple Data Row IDs in the list.

# Export metadata from a list of data row ids.
metadata = metadata_ontology.bulk_export([datarow.uid])
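Each exported item is a DataRowMetadata object carrying the Data Row ID and its list of fields, so you can iterate over the results. A minimal sketch:

# Print the exported metadata per Data Row
for md in metadata:
    print(md.data_row_id)
    for field in md.fields:
        print("  ", field.schema_id, field.value)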

Delete

You can delete metadata for a Data Row through the SDK. Specify the fields you want to delete by Schema Id. The deletion will not fail if a Schema Id is not present on the Data Row.

# Specify the schemas to delete
schemas = [tag_schema, ...]

# Create a delete object
deletes = DeleteDataRowMetadata(
    data_row_id="DATAROW_ID",  # DataRow Id, not External Id
    fields=[s.uid for s in schemas]
)

metadata_ontology.bulk_delete([deletes])  # pass a list of delete objects

Complete Python SDK tutorial

