A developer guide for creating, importing, exporting, and modifying metadata fields via the Python SDK.
Notebook tutorial: data_row_metadata.ipynb
Metadata kinds
You can create custom metadata schema fields that will be accessible in your organization. The metadata schema name must be unique in your organization. This list of customer metadata schema kinds you can create are:
DataRowMetadataKind.string
DataRowMetadataKind.number
DataRowMetadataKind.datetime
DataRowMetadataKind.enum
Reserved fields
tag
split
captureDateTime
embedding
Construct metadata fields
To construct a metadata field you must provide the Schema ID for the field and the value that will be uploaded. You can do this in two ways:
- Option 1: Specify the metadata using the DataRowMetadataField object (comes with validation for metadata fields)
- Option 2: Specify the metadata fields in dictionary format without declaring the DataRowMetadataField objects
metadata_fields = []
## Construct a metadata field of string kind
tag_schema = metadata_ontology.get_by_name("tag")
tag_metadata_field = DataRowMetadataField(
schema_id=tag_schema.uid, # specify the schema id
value="tag_string", # typed inputs
)
metadata_fields.append(tag_metadata_field)
# Construct an metadata field of datetime
datetime_schema = metadata_ontology.get_by_name("captureDateTime")
capture_datetime_field = DataRowMetadataField(
name=datetime_schema.name, # specify the schema id
value=datetime.datetime.utcnow(), # typed inputs
)
metadata_fields.append(capture_datetime_field)
# # Construct a metadata field of Enums options. You can import multiple options.
test_schema = metadata_ontology.get_by_name("split")["test"]
test_schema_field = DataRowMetadataField(
schema_id=test_schema.parent, # specify the schema id
value=test_schema.uid, # typed inputs
)
metadata_fields.append(test_schema_field)
valid_schema = metadata_ontology.get_by_name("split")["valid"]
valid_schema_field = DataRowMetadataField(
schema_id=valid_schema.parent, # specify the schema id
value=valid_schema.uid, # typed inputs
)
metadata_fields.append(valid_schema_field)
metadata_fields = []
## Construct a metadata field of string kind
tag_schema = metadata_ontology.get_by_name("tag")
metadata_fields.append({"name": tag_schema.name, "value": "tag_value"})
# Construct an metadata field of datetime
datetime_schema = metadata_ontology.get_by_name("captureDateTime")
metadata_fields.append({"name": datetime_schema.name, "value": datetime.datetime.utcnow()})
## Construct a metadata field of Enums options. You can import multiple options.
train_schema = metadata_ontology.get_by_name("split")["valid"]
metadata_fields.append({"schema_id": train_schema.parent, "value": train_schema.uid})
test_schema = metadata_ontology.get_by_name("split")["test"]
metadata_fields.append({"schema_id": test_schema.parent, "value": test_schema.uid})
Create custom metadata schema
import labelbox
from labelbox.schema.data_row_metadata import DataRowMetadataKind
client = labelbox.Client(api_key="LABELBOX_API_KEY")
metadata_ontology = client.get_data_row_metadata_ontology()
# create a custom metadata schema (string, number, datetime, embedding)
metadata_schema = metadata_ontology.create_schema(name="metadata_name", kind=DataRowMetadataKind.string)
# You can create other metadata schema kinds, just change the kind to be one of:
# DataRowMetadataKind.number, DataRowMetadataKind.datetime, DataRowMetadataKind.embedding
# get the schema id
schema_id = metadata_schema.uid
Upload data rows with metadata
Limits
See this page to learn the limits for bulk uploading data rows with metadata.
data_row = {
"row_data": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/basic.jpg",
"global_key": "metadata_tutorial",
"metadata_fields": metadata_fields
}
dataset = client.create_dataset(name="Create data row with metadata")
task = dataset.create_data_rows([data_row])
task.wait_till_done()
Get metadata schema
# you can look up a schema by name.
metadata_schema = metadata_ontology.get_by_name("tag")
metadata_schema = metadata_ontology.get_by_name("enum_metadata_name")
# check the schema
print(metadata_schema)
schema_id = metadata_schema.uid
Get metadata fields (ontology)
## Fetch metadata schema ontology. A Labelbox workspace has a single metadata ontology.
metadata_ontology = client.get_data_row_metadata_ontology()
# List all available fields
metadata_ontology.fields
Get metadata fields
datarow = next(dataset.data_rows())
for metadata_field in datarow.metadata_fields:
print(metadata_field['name'], ":", metadata_field['value'])
Result:
tag : custom_tag
split : train
captureDateTime : 2023-04-04T15:24:37.229417Z
Bulk export data rows with metadata
export_task = dataset.export_v2(params={
"performance_details": True,
"label_details": True,
"metadata_fields": True
})
export_task.wait_till_done()
print(export_task.result_url)
Result:
{
"data_row": {
"id": "clflxqzty07fj077qa3dd4v27",
"global_key": "metadata_tutorial",
"row_data": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/basic.jpg"
},
"media_attributes": {
"height": 1285,
"width": 2258,
"mime_type": "image/jpeg"
},
"metadata_fields": [{
"schema_id": "cko8s9r5v0001h2dk9elqdidh",
"schema_name": "tag",
"value": "tag_string"
}, {
"schema_id": "cko8sbczn0002h2dkdaxb5kal",
"schema_name": "split",
"value": [{
"schema_id": "cko8sc2yr0004h2dk69aj5x63",
"schema_name": "valid"
}, {
"schema_id": "cko8scbz70005h2dkastwhgqt",
"schema_name": "test"
}]
}, {
"schema_id": "cko8sdzv70006h2dk8jg64zvb",
"schema_name": "captureDateTime",
"value": "2023-03-24T02:40:40.832576+00:00"
}]
}
Export metadata by data row ID
You can bulk export metadata by data row with the SDK. You can specify multiple data row IDs in the array.
# Export metadata from a list of data row ids.
metadata = mdo.bulk_export([data_row.uid])
Upsert metadata to existing data rows
Labelbox supports individual or bulk metadata upsert of data rows. Metadata will overwrite on a per-field basis.
tag_schema = metadata_ontology.get_by_name("tag")
# Construct a string field
field = DataRowMetadataField(
schema_id=tag_schema.uid, # specify the schema id
value="updated", # typed inputs
)
# Completed object ready for import
metadata_payload = DataRowMetadata(
global_key="<global key>", # optionally, set the argument to data_row_id to use a data row ID
fields=[field]
)
# Provide a list of DataRowMetadata objects to upload
metadata_ontology.bulk_upsert([metadata_payload])
Update metadata schema
You can update any custom metadata schema's name. However, the type cannot be modified. You also cannot modify the names of reserved fields.
# update a metadata schema's name
metadata_schema = metadata_ontology.update_schema(name="metadata_name", new_name="metadata_name_updated")
# Enum metadata schema is a bit different since it contains options.
# create an Enum metadata with options
enum_schema = metadata_ontology.create_schema(name="enum_metadata_name", kind=DataRowMetadataKind.enum,
options=["option 1", "option 2"])
# update an Enum metadata schema's name, similar to other metadata schema types
enum_schema = metadata_ontology.update_schema(name="enum_metadata_name", new_name="enum_metadata_name_updated")
# update an Enum metadata schema option's name, this only applies to Enum metadata schema.
enum_schema = metadata_ontology.update_enum_option(name="enum_metadata_name_option_updated", option="option 1",
new_option="option 3")
Delete metadata schema
You can delete a metadata schema by name.
status = metadata_ontology.delete_schema(name=metadata_schema.name)
# returns True if successfully deleted