Data Row Metadata

Getting started

Metadata is data that provides information about other data. Use metadata to add context to your Data Rows and to filter and group your data for more efficient labeling, model diagnostics, and data selection workflows.

Metadata Ontology

The metadata ontology defines the fields available within an organization. You can see the ontology by navigating to the Schema page. There are two types of fields: reserved and custom. Reserved fields are pre-defined by Labelbox to help you access the full range of Labelbox features. Custom fields are ones you have defined yourself. Each metadata field has a unique Schema Id that is used to upload data to Labelbox. A Data Row can have a maximum of 5 metadata fields at a time.

Attachments vs Metadata

Metadata should not be confused with attachments. Attachments provide additional context for labelers but are not searchable within Catalog.

Reserved fields

Fields pre-defined by Labelbox.

| Name | Type |
| --- | --- |
| tag | Free text field |
| embedding | An embedding field |
| split | Enum - [train, valid, test] |
| captureDateTime | ISO 8601 datetime field. All times must be in UTC |
| precomputedImageEmbedding | Embedding computed for uploaded image data |
| precomputedTextEmbedding | Embedding computed for uploaded text data |
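
Because captureDateTime expects ISO 8601 datetimes in UTC, it can help to normalize timestamps before building a field. The following is a minimal sketch using only the Python standard library; `to_utc_iso8601` is a hypothetical helper and not part of the Labelbox SDK, and the trailing `Z` format is one common ISO 8601 rendering rather than a format confirmed by this page.

```python
from datetime import datetime, timedelta, timezone


def to_utc_iso8601(dt: datetime) -> str:
    """Convert a timezone-aware datetime to an ISO 8601 string in UTC."""
    if dt.tzinfo is None:
        # A naive datetime carries no offset, so refuse to guess its timezone.
        raise ValueError("expected a timezone-aware datetime")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Example: 09:30 at UTC+02:00 corresponds to 07:30 UTC.
local = datetime(2021, 6, 1, 9, 30, tzinfo=timezone(timedelta(hours=2)))
capture_time = to_utc_iso8601(local)
```

Rejecting naive datetimes is a design choice in this sketch: it avoids silently treating local time as UTC.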

Custom fields

You can create custom metadata fields in the App by navigating to the Schema page. Each metadata field must have a unique name and a type. Once a field has been created, you cannot change its type. The number of fields an organization can create is limited by its account tier.

Data types

All metadata is strictly typed. Fields can be one of the following kinds.

| Name | Notes | Filtering |
| --- | --- | --- |
| Embedding | Used for similarity. A float vector of length 128 | Similarity |
| String | Free text field | Equals & prefix matching |
| Enum | Enum field with options | Equals |
| Option (Enum) | Option of an enum | |
| DateTime | An ISO 8601 datetime field. All times must be in UTC timezone | Equals, greater than, less than, between |
| Number | Integer or Float | Equals, greater than, less than, between |
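
Since metadata is strictly typed, checking values locally before an upload can catch mistakes early. The sketch below mirrors the kinds listed above using plain Python; `is_valid_value` and the lowercase kind names are hypothetical and not part of the Labelbox SDK, and the actual server-side validation may differ.

```python
from datetime import datetime
from numbers import Real

EMBEDDING_LENGTH = 128  # vector length stated for the Embedding kind


def is_valid_value(kind: str, value) -> bool:
    """Return True if value matches the given metadata kind (illustrative only)."""
    if kind == "string":
        return isinstance(value, str)
    if kind == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, Real) and not isinstance(value, bool)
    if kind == "datetime":
        return isinstance(value, datetime)
    if kind == "embedding":
        return (
            isinstance(value, (list, tuple))
            and len(value) == EMBEDDING_LENGTH
            and all(isinstance(x, float) for x in value)
        )
    raise ValueError(f"unknown kind: {kind}")


checks = [
    is_valid_value("string", "my-message"),
    is_valid_value("number", 0.4),
    is_valid_value("number", "0.4"),        # a string is not a number
    is_valid_value("embedding", [0.0] * 128),
    is_valid_value("embedding", [0.0] * 64),  # wrong vector length
]
```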

Uploading metadata

At this time you can only import and export metadata with the Python SDK.

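import labelbox
from labelbox.schema.data_row_metadata import (
    DataRowMetadata,
    DataRowMetadataField,
    DeleteDataRowMetadata,
)

# By default the client reads the API key from the LABELBOX_API_KEY environment variable
client = labelbox.Client()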

mdo = client.get_data_row_metadata_ontology()

# List all available fields
mdo.fields

# Access a field by name; names are unique
tag_schema = mdo.reserved_by_name["tag"]
# Enum options can be accessed like this
train_schema = mdo.reserved_by_name["split"]["train"]
# Custom fields
custom_field = mdo.custom_by_name["my-custom-field"]

Create

To upload metadata, you construct two kinds of objects. A DataRowMetadataField holds a single value: you provide the Schema Id of the field and the value to upload. A DataRowMetadata object ties a list of these fields to one Data Row. Uploads overwrite existing metadata on a per-field basis, and all metadata uploads go through a bulk endpoint.

# Get a data row (assumes `dataset` is a Dataset object, e.g. from client.get_dataset(...))
dr = next(dataset.export_data_rows())

train_schema = mdo.reserved_by_name["split"]["train"]

# Construct an enum field
field = DataRowMetadataField(
    schema_id=train_schema.parent,  # Schema Id of the parent enum field (split)
    value=train_schema.uid,  # Schema Id of the selected option (train)
)

# Completed object ready for upload
upload = DataRowMetadata(
    data_row_id=dr.uid,  # DataRow Id not ExternalId
    fields=[field]
)
# Provide a list of DataRowMetadata objects to upload
mdo.bulk_upsert([upload])

# Construct a string field
DataRowMetadataField(
    schema_id=tag_schema.uid,
    value="my-message",
)

from datetime import datetime

# Create a naive datetime holding the current UTC time
dt = datetime.utcnow()

# Construct a datetime field
DataRowMetadataField(
    schema_id=mdo.reserved_by_name["captureDateTime"].uid,
    value=dt,
)

# Construct a number field
confidence = 0.4
DataRowMetadataField(
    schema_id=mdo.custom_by_name["confidence"].uid,  # custom field
    value=confidence,
)
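
The per-field overwrite behavior can be pictured with a plain dictionary keyed by Schema Id. This is only a local model of the semantics described above, not the Labelbox implementation: `upsert_fields` and the schema ids are hypothetical placeholders, and raising an error when the 5-field limit is exceeded is an assumption, since this page does not say how Labelbox responds in that case.

```python
MAX_FIELDS = 5  # per-Data-Row limit stated in the Metadata Ontology section


def upsert_fields(existing: dict, incoming: dict) -> dict:
    """Return existing metadata with incoming values applied per schema id."""
    # Incoming values replace only the schema ids they name; others are kept.
    merged = {**existing, **incoming}
    if len(merged) > MAX_FIELDS:
        # Assumption: the real service's behavior here is not documented on this page.
        raise ValueError(f"a Data Row can hold at most {MAX_FIELDS} metadata fields")
    return merged


# Placeholder schema ids, not real Labelbox ids.
current = {"schema-tag": "my-message", "schema-split": "option-train"}
update = {"schema-split": "option-test", "schema-confidence": 0.4}
result = upsert_fields(current, update)
```

In this model the tag value is untouched, the split value is replaced, and the confidence field is added.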

Filter & view

Metadata can be used within Catalog and Projects to filter data. You can view the metadata on a Data Row in that Data Row's detail view.

Export

You can also bulk export metadata with the SDK by Data Row Id. The list you pass can contain multiple Data Row Ids.

# Export metadata for a list of data row ids (here, the data row fetched earlier)
metadata = mdo.bulk_export([dr.uid])

Delete

You can delete metadata for a Data Row through the SDK by specifying the Schema Ids of the fields you want to delete. The deletion does not fail if a Schema Id is not present on the Data Row.

# Specify the schemas to delete
schemas = [tag_schema, ...]

# Create a delete object; `md` is a DataRowMetadata object,
# for example one returned by mdo.bulk_export(...)
deletes = DeleteDataRowMetadata(
    data_row_id=md.data_row_id,
    fields=[s.uid for s in schemas]
)

mdo.bulk_delete([deletes])  # pass a list of delete objects
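
The statement that deleting an absent Schema Id does not fail can be modeled locally in the same dictionary style. This is a sketch of the described semantics only; `delete_fields` and the schema ids are hypothetical and do not call the Labelbox API.

```python
def delete_fields(existing: dict, schema_ids: list) -> dict:
    """Return a copy of existing metadata without the given schema ids."""
    remaining = dict(existing)
    for schema_id in schema_ids:
        # pop with a default ignores schema ids that are not present.
        remaining.pop(schema_id, None)
    return remaining


# Placeholder schema ids, not real Labelbox ids.
stored = {"schema-tag": "my-message", "schema-split": "option-train"}
after_delete = delete_fields(stored, ["schema-tag", "schema-not-present"])
```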

Complete Python SDK tutorial

| Tutorial | GitHub | Google Colab |
| --- | --- | --- |
| Data Row Metadata | View in GitHub | View in Google Colab |

FAQ

Can labelers see metadata?

No, that's what Attachments are for.

Can metadata be used to customize the queue?

Not directly. However, you can use Catalog to query a set of Data Rows that share specific metadata and add them to a project for labeling using Batch queue (beta).

