Metadata
Metadata provides additional information about a given data row. You can use metadata to bring additional context to your data rows, and you can filter on metadata for more efficient labeling, model diagnostics, and data selection workflows.
Metadata is broken into two categories: reserved fields and custom fields. Reserved fields are defined by Labelbox, while custom fields let you import your own metadata schema.
For more information on all available fields, see Metadata.
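For a quick look at which fields fall into each category, you can list them from the metadata ontology. The snippet below is a minimal sketch; it assumes the client and ontology setup shown in the import section further down.
# Sketch: list reserved vs. custom metadata field names from the workspace ontology.
metadata_ontology = client.get_data_row_metadata_ontology()
print("Reserved fields:", list(metadata_ontology.reserved_by_name.keys()))
print("Custom fields:", list(metadata_ontology.custom_by_name.keys()))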
Create metadata schema
At this time, you can create metadata schema through the UI.
See Metadata schema to learn how to create a metadata schema before uploading your metadata.
Import metadata
You can upload data rows together with metadata. See Create a dataset for a bulk upload example. Below is a more detailed walk-through.
Limits on uploading data rows with metadata
Currently, there is a 30,000 data row limit when bulk uploading data rows that contain metadata.
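If you have more data rows than that, one approach is to split the upload into chunks. The sketch below assumes a list of data row dictionaries (data_rows) and a dataset created as shown in the upload examples further down; the names are illustrative.
# Illustrative sketch: upload data rows with metadata in chunks to stay under the limit.
CHUNK_SIZE = 30000
for start in range(0, len(data_rows), CHUNK_SIZE):
    chunk = data_rows[start:start + CHUNK_SIZE]
    task = dataset.create_data_rows(chunk)
    task.wait_till_done()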
Import metadata classes
import labelbox
from labelbox.schema.data_row_metadata import DataRowMetadataField, DataRowMetadata, DeleteDataRowMetadata
import datetime
from uuid import uuid4
client = labelbox.Client(api_key="LABELBOX_API_KEY")
Construct metadata fields
To construct a metadata field, provide the schema ID for the field and the value to upload. You can either construct a DataRowMetadataField
object or specify the schema ID and value in dictionary format.
## Fetch metadata schema ontology. A Labelbox workspace has a single metadata ontology.
metadata_ontology = client.get_data_row_metadata_ontology()
# List all available fields
metadata_ontology.fields
## Construct a metadata field of string kind
tag_schema = metadata_ontology.reserved_by_name["tag"]
tag_metadata_field = DataRowMetadataField(
schema_id=tag_schema.uid, # specify the schema id
value="tag_string", # typed inputs
)
# Construct a metadata field of datetime kind
datetime_schema = metadata_ontology.reserved_by_name["captureDateTime"]
capture_datetime_field = DataRowMetadataField(
schema_id=datetime_schema.uid, # specify the schema id
value=datetime.datetime.utcnow(), # typed inputs
)
# Construct a metadata field of enum kind
train_schema = metadata_ontology.reserved_by_name["split"]["train"]
split_metadata_field = DataRowMetadataField(
schema_id=train_schema.parent, # specify the parent schema id
value=train_schema.uid, # value is the enum option's schema id
)
# Custom fields must be created in the UI prior to this
custom_field = metadata_ontology.custom_by_name["my-custom-field"]
custom_metadata_field = DataRowMetadataField(
schema_id=custom_field.uid, # specify the schema id
value="custom_field_value", # typed inputs
)
Upload data rows with metadata
Option 1: Specify metadata with a list of DataRowMetadataField objects. This is the recommended option since it comes with validation for metadata fields.
dataset = client.create_dataset(name="Bulk import example")
data_row = {"row_data": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/basic.jpg", "external_id": str(uuid4())}
data_row['metadata_fields'] = [tag_metadata_field, capture_datetime_field, split_metadata_field]
task = dataset.create_data_rows([data_row])
task.wait_till_done()
Option 2: Alternatively, you can specify the metadata fields in dictionary format without declaring DataRowMetadataField objects.
dataset = client.create_dataset(name="Bulk import example")
data_row = {"row_data": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/basic.jpg", "external_id": str(uuid4())}
data_row['metadata_fields'] = [
{"schema_id": tag_schema.uid, "value": "tag_string"},
{"schema_id": datetime_schema.uid, "value": datetime.datetime.utcnow()},
{"schema_id": train_schema.parent, "value": train_schema.uid},
]
task = dataset.create_data_rows([data_row])
task.wait_till_done()
Export data rows with metadata
data_rows = dataset.export_data_rows(include_metadata=True)
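As a quick usage sketch, you can iterate over the exported data rows and inspect their metadata. This assumes each exported DataRow exposes a metadata_fields attribute when include_metadata=True; the attribute name may vary by SDK version.
# Iterate over the exported data rows and print their metadata (sketch).
for data_row in dataset.export_data_rows(include_metadata=True):
    print(data_row.uid, data_row.metadata_fields)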
Examine metadata of a data row
datarow = next(dataset.data_rows())
print(datarow)
Update or add metadata for existing data rows
Labelbox supports individual or bulk metadata upsert for data rows. Metadata is overwritten on a per-field basis.
tag_schema = metadata_ontology.reserved_by_name["tag"]
# Construct a string field with the updated value
field = DataRowMetadataField(
schema_id=tag_schema.uid, # specify the schema id
value="updated", # typed inputs
)
# Completed object ready for import
metadata_payload = DataRowMetadata(
data_row_id="DATAROW_ID", # DataRow Id not ExternalId
fields=[field]
)
# Provide a list of DataRowMetadata objects to upload
metadata_ontology.bulk_upsert([metadata_payload])
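Since bulk_upsert accepts a list, you can update many data rows in one call by building one DataRowMetadata object per data row. A minimal sketch, assuming data_row_ids is a list of data row IDs you already have:
# Sketch: upsert the same field across several data rows in one call.
payloads = [
    DataRowMetadata(data_row_id=data_row_id, fields=[field])
    for data_row_id in data_row_ids # data_row_ids is illustrative
]
metadata_ontology.bulk_upsert(payloads)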
Add metadata to data rows
Metadata can be added in bulk to up to 100,000 data rows at once via the Catalog user interface. This is valuable for tagging a set of data rows at the moment they are discovered or gathered, such as an edge case or the data from a single cluster in an embedding space.
Interested in adding metadata using the Python SDK?
For information on adding metadata to data rows programmatically, see the Python SDK metadata guide.
Add via filters
To add metadata to a set of data rows using filters, follow these steps:
- Go to Catalog
- Select a batch of data rows
- From the n selected button, click Add metadata
- Select the metadata to apply to those data rows
- Click Confirm
Add via embedding projector
To add metadata to a set of data rows via the embedding projector, follow these steps:
- Go to Model
- Click the projector view icon
- Select a set of data points from the cluster
- Enter the metadata information
- Click Confirm
Filter & view metadata
Once you upload your metadata, you can easily filter and view metadata in Catalog. Note: this functionality is not available to labelers.
Click on any data row to open the detailed view. There you can find the metadata for that asset.
Export metadata
You can bulk export metadata by data row with the SDK. You can specify multiple data row IDs in the array.
# Export metadata for a list of data row IDs.
metadata = metadata_ontology.bulk_export([data_row.uid])
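To inspect the export, you can loop over the returned objects. This sketch assumes each entry mirrors the DataRowMetadata structure used for upserts above, with a data_row_id and a list of fields.
# Print the exported metadata per data row (sketch).
for data_row_metadata in metadata:
    print(data_row_metadata.data_row_id)
    for field in data_row_metadata.fields:
        print(" ", field.schema_id, field.value)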
Delete metadata
You can delete metadata for a data row through the SDK. To delete metadata for a data row, you must specify the fields you want to delete by schema ID. The deletion will not fail if the schema ID is not present on the data row.
# Specify the schemas to delete
schemas = [tag_schema, ...]
# Create a delete object
deletes = DeleteDataRowMetadata(
data_row_id="DATAROW_ID", # DataRow ID, not external ID
fields=[s.uid for s in schemas]
)
metadata_ontology.bulk_delete([deletes]) # pass a list of delete objects