Developer guide for creating, importing, exporting, and modifying metadata fields via the Python SDK.
Metadata types
All metadata needs to be one of the following types:
Type | Descriptions | Enum value |
---|---|---|
DateTime | An ISO 8601 datetime field. All times must be in UTC timezone | DataRowMetadataKind.datetime |
Number | Floating-point value (max: 64-bit float) | DataRowMetadataKind.number |
String | Free text field. Max 4,096 characters. | DataRowMetadataKind.string |
Enum | Enum field with options. Multiple options can be imported. | DataRowMetadataKind.enum |
Option (Enum) | Option of an enum. Max 64 options can be created per Enum type. 128 for enterprise customers and can be further increased upon request. | DataRowMetadataKind.enumoption |
Embedding | 128 float 32 vector used for similarity | DataRowMetadataKind.embedding |
Reserved fields
The following field names are reserved and cannot be used as custom metadata field names:
Name | Type | Description |
---|---|---|
tag | String | The tags of the data row |
split | Enum | The split of the dataset that the data row belongs, includingtrain , valid , and test |
captureDateTime | DateTime | The timestamp when the data is captured |
skipNFrames | Number | (Video data only) The number of frames to skip |
Construct metadata fields
To construct a metadata field, you must provide the Schema ID for the field and the value that will be uploaded. You can do this in two ways:
- Option 1: Specify the metadata using the DataRowMetadataField object (comes with validation for metadata fields)
- Option 2: Specify the metadata fields in dictionary format without declaring the DataRowMetadataField objects
The Metadata value
attribute should not be null
.
metadata_fields = []
## Construct a metadata field of string kind
tag_schema = metadata_ontology.get_by_name("tag")
tag_metadata_field = DataRowMetadataField(
schema_id=tag_schema.uid, # specify the schema id
value="tag_string", # typed inputs
)
metadata_fields.append(tag_metadata_field)
# Construct an metadata field of datetime
datetime_schema = metadata_ontology.get_by_name("captureDateTime")
capture_datetime_field = DataRowMetadataField(
name=datetime_schema.name, # specify the schema id
value=datetime.datetime.utcnow(), # typed inputs
)
metadata_fields.append(capture_datetime_field)
# # Construct a metadata field of Enums options. You can import multiple options.
test_schema = metadata_ontology.get_by_name("split")["test"]
test_schema_field = DataRowMetadataField(
schema_id=test_schema.parent, # specify the schema id
value=test_schema.uid, # typed inputs
)
metadata_fields.append(test_schema_field)
valid_schema = metadata_ontology.get_by_name("split")["valid"]
valid_schema_field = DataRowMetadataField(
schema_id=valid_schema.parent, # specify the schema id
value=valid_schema.uid, # typed inputs
)
metadata_fields.append(valid_schema_field)
metadata_fields = []
## Construct a metadata field of string kind
tag_schema = metadata_ontology.get_by_name("tag")
metadata_fields.append({"name": tag_schema.name, "value": "tag_value"})
# Construct an metadata field of datetime
datetime_schema = metadata_ontology.get_by_name("captureDateTime")
metadata_fields.append({"name": datetime_schema.name, "value": datetime.datetime.utcnow()})
## Construct a metadata field of Enums options. You can import multiple options.
train_schema = metadata_ontology.get_by_name("split")["valid"]
metadata_fields.append({"schema_id": train_schema.parent, "value": train_schema.uid})
test_schema = metadata_ontology.get_by_name("split")["test"]
metadata_fields.append({"schema_id": test_schema.parent, "value": test_schema.uid})
Create custom metadata schema
import labelbox
from labelbox.schema.data_row_metadata import DataRowMetadataKind
client = labelbox.Client(api_key="LABELBOX_API_KEY")
metadata_ontology = client.get_data_row_metadata_ontology()
# create a custom metadata schema (set the kind value to a supported data type)
metadata_schema = metadata_ontology.create_schema(name="metadata_name", kind=DataRowMetadataKind.string)
# get the schema id
schema_id = metadata_schema.uid
Upload data rows with metadata
Custom metadata field limits vary according to your subscription; for details, see Limits.
data_row = {
"row_data": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/basic.jpg",
"global_key": "metadata_tutorial",
"metadata_fields": metadata_fields
}
dataset = client.create_dataset(name="Create data row with metadata")
task = dataset.create_data_rows([data_row])
task.wait_till_done()
Get metadata schema
# you can look up a schema by name.
metadata_schema = metadata_ontology.get_by_name("tag")
metadata_schema = metadata_ontology.get_by_name("enum_metadata_name")
# check the schema
print(metadata_schema)
schema_id = metadata_schema.uid
Get metadata fields (ontology)
## Fetch metadata schema ontology. A Labelbox workspace has a single metadata ontology.
metadata_ontology = client.get_data_row_metadata_ontology()
# List all available fields
metadata_ontology.fields
Get metadata fields
datarow = next(dataset.data_rows())
for metadata_field in datarow.metadata_fields:
print(metadata_field['name'], ":", metadata_field['value'])
Result:
tag : custom_tag
split : train
captureDateTime : 2023-04-04T15:24:37.229417Z
Bulk export data rows with metadata
export_params= {
"performance_details": True,
"label_details": True,
"metadata_fields": True
}
export_task = dataset.export(params=export_params)
export_task.wait_till_done()
if export_task.has_errors():
export_task.get_buffered_stream(
stream_type=lb.StreamType.ERRORS).start(
stream_handler=lambda error: print(error))
if export_task.has_result():
stream = export_task.get_buffered_stream()
for data_row in stream:
print(data_row.json)
Result:
{
"data_row": {
"id": "clflxqzty07fj077qa3dd4v27",
"global_key": "metadata_tutorial",
"row_data": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/basic.jpg"
},
"media_attributes": {
"height": 1285,
"width": 2258,
"mime_type": "image/jpeg"
},
"metadata_fields": [{
"schema_id": "cko8s9r5v0001h2dk9elqdidh",
"schema_name": "tag",
"value": "tag_string"
}, {
"schema_id": "cko8sbczn0002h2dkdaxb5kal",
"schema_name": "split",
"value": [{
"schema_id": "cko8sc2yr0004h2dk69aj5x63",
"schema_name": "valid"
}, {
"schema_id": "cko8scbz70005h2dkastwhgqt",
"schema_name": "test"
}]
}, {
"schema_id": "cko8sdzv70006h2dk8jg64zvb",
"schema_name": "captureDateTime",
"value": "2023-03-24T02:40:40.832576+00:00"
}]
}
Export metadata by data row ID
You can bulk export metadata by data row with the SDK.
data_row_ids = ['<data_row_id>']
global_keys = ['<global_key>']
#The data row identifiers methods (lb.DataRowIds and lb.GlobalKeys) validate whether the provided ID is a global key or a data row ID.
#Additionally, they ensure that all IDs from the list provided are unique
datarow_identifiers = lb.DataRowIds(data_row_ids)
global_key_identifiers = lb.GlobalKeys(global_keys)
# Use one of the identifiers
mdo.bulk_export(data_row_ids=global_key_identifiers)
# mdo.bulk_export(data_row_ids=datarow_identifiers)
Delete metadata fields from a data row
global_key = '<global_key>'
schema_ids_to_delete =['<metadata_schema_id>']
data_row_id = '<data_row_id>'
deletions = [
lb.DeleteDataRowMetadata(data_row_id=lb.GlobalKey(global_key), fields=schema_ids_to_delete)
]
# Delete the specified metadata on the data row
mdo.bulk_delete(deletes=deletions)
Upsert metadata to existing data rows
Labelbox supports individual or bulk metadata upsert of data rows. Metadata overwrites occur on a per-field basis.
tag_schema = metadata_ontology.get_by_name("tag")
# Construct a string field
field = DataRowMetadataField(
schema_id=tag_schema.uid, # specify the schema id
value="updated", # typed inputs
)
# Completed object ready for import
metadata_payload = DataRowMetadata(
global_key="<global key>", # optionally, set the argument to data_row_id to use a data row ID
fields=[field]
)
# Provide a list of DataRowMetadata objects to upload
metadata_ontology.bulk_upsert([metadata_payload])
Update metadata schema
You can update any custom metadata schema's name. However, the type cannot be modified. You also cannot modify the names of reserved fields.
# update a metadata schema's name
metadata_schema = metadata_ontology.update_schema(name="metadata_name", new_name="metadata_name_updated")
# Enum metadata schema is a bit different since it contains options.
# create an Enum metadata with options
enum_schema = metadata_ontology.create_schema(name="enum_metadata_name", kind=DataRowMetadataKind.enum,
options=["option 1", "option 2"])
# update an Enum metadata schema's name, similar to other metadata schema types
enum_schema = metadata_ontology.update_schema(name="enum_metadata_name", new_name="enum_metadata_name_updated")
# update an Enum metadata schema option's name, this only applies to Enum metadata schema.
enum_schema = metadata_ontology.update_enum_option(name="enum_metadata_name_option_updated", option="option 1",
new_option="option 3")
Delete metadata schema
You can delete a metadata schema by name.
status = metadata_ontology.delete_schema(name=metadata_schema.name)
# returns True if successfully deleted