Developer guide for creating and modifying data rows via the Python SDK.
Data rows are the assets being labeled; a data row cannot exist without belonging to a dataset. Data rows are added to a labeling task by uploading them to a dataset and then creating project batches.
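At a high level, that flow looks like the sketch below: build data row payloads, upload them to a dataset, then batch them to a project. The dataset name, batch name, and sample URL are placeholders, and the SDK calls are shown in comments only; `create_dataset`, `create_data_rows`, and `create_batch` are covered in their own documentation.

```python
from uuid import uuid4

# Build data row payloads (placeholder URL; one payload shown for brevity).
assets = [
    {
        "row_data": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/basic.jpg",
        "global_key": str(uuid4()),
    }
]

# dataset = client.create_dataset(name="my-dataset")
# dataset.create_data_rows(assets).wait_till_done()
# project.create_batch("my-batch", global_keys=[a["global_key"] for a in assets])

print(sorted(assets[0].keys()))  # -> ['global_key', 'row_data']
```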
Client
import labelbox as lb
client = lb.Client(api_key="<YOUR_API_KEY>")
Get a data row
data_row = client.get_data_row("<data_row_id>")
data_row = client.get_data_row_by_global_key("key1")
data_row_ids = client.get_data_row_ids_for_global_keys(["key1", "key2"])
Assign global keys
global_key_data_row_inputs = [
{"data_row_id": "<data_row_id>", "global_key": "key1"},
{"data_row_id": "<data_row_id>", "global_key": "key2"}
]
client.assign_global_keys_to_data_rows(global_key_data_row_inputs)
Clear global keys
client.clear_global_keys(["key1", "key2"])
Fundamentals
Create data rows
Data rows are created via methods from the Dataset class. For complete details and additional examples of approaches for creating data rows, please see Dataset.
We recommend using the create_data_rows() and upsert_data_rows() methods for large data row upload operations.
Special character handling
Please note that certain characters, such as #, <, >, and |, are not supported in URLs and should be avoided in your file names to prevent loading issues. Please refer to https://datatracker.ietf.org/doc/html/rfc2396#section-2.4.3 for the URI standard's list of excluded characters.
A good test for the handling of special characters is to test URLs in your browser address bar — if the URL doesn't load properly in your browser, it won't load in Labelbox.
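You can also screen file names programmatically before upload. The helper below is not part of the SDK; it is a minimal sketch that checks a name against the excluded and unwise characters listed in RFC 2396.

```python
# Characters excluded from URLs (or flagged as "unwise") by RFC 2396.
RFC2396_EXCLUDED = set('#<>|{}^[]`"\\ ')

def unsafe_characters(name: str) -> list:
    """Return the URL-unsafe characters found in a file name, sorted."""
    return sorted(set(name) & RFC2396_EXCLUDED)

print(unsafe_characters("scan#1<raw>.jpg"))  # -> ['#', '<', '>']
print(unsafe_characters("scan-1-raw.jpg"))   # -> []
```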
The only required argument when creating a data row is row_data. However, Labelbox strongly recommends supplying each data row with a global key upon creation.
# this example uses the uuid package to generate unique global keys
from uuid import uuid4
data = [
    {
        "row_data": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/basic.jpg",
        "global_key": str(uuid4())
    },
    {
        "row_data": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/basic.jpg",
        "global_key": str(uuid4())
    }
]
dataset.upsert_data_rows(data)
# or alternatively use
dataset.create_data_rows(data)
Export data rows
For a complete reference on how to export data rows, please visit the export overview documentation.
Export a single data row:
DATA_ROW_GLOBAL_KEY = "<global key>"
export_params = {
    "attachments": True,
    "metadata_fields": True,
    "data_row_details": True,
    "project_details": True,
    "label_details": True,
    "performance_details": True,
    "interpolated_frames": True,
    "embeddings": True
}
# Provide a list of data row global keys
export_task = lb.DataRow.export(client=client, global_keys=[DATA_ROW_GLOBAL_KEY], params=export_params)
export_task.wait_till_done()

if export_task.has_result():
    # Start stream
    stream = export_task.get_buffered_stream()
    # Iterate through data rows
    for data_row in stream:
        print(data_row.json)
Methods
Create and update an attachment
Please reference our attachment documentation for more information on supported attachment data types.
For details on creating attachments in the same step as creating data rows, see Dataset.
attachment = data_row.create_attachment(
    attachment_type="<attachment_type>",   # specify a supported attachment type
    attachment_value="<attachment_value>", # provide a value of the appropriate type
    attachment_name="<attachment_name>"    # name the attachment for reference
)
# Update an attachment
attachment.update(type="<attachment_type>", value="<new_value>")
Get the winning label ID
For more details on what a "winning" label is and how it is chosen, see Consensus.
data_row.get_winning_label_id(project_id="<project_id>")
Update data rows from a dataset
We recommend using the upsert_data_rows() method to update data rows.
When using this method to update data rows, you need to pass a key, which can reference either a global key or a data row ID. Include any fields you wish to update along with their new values.
# Update the global key associated with DATA_ROW_ID and include additional metadata
import uuid

dataset = client.get_dataset("<DATASET_ID>")
mdo = client.get_data_row_metadata_ontology()

data = {
    "key": lb.UniqueId(DATA_ROW_ID),
    "global_key": "NEW-ID-%s" % uuid.uuid1(),
    "metadata_fields": [
        # New metadata
        lb.DataRowMetadataField(
            schema_id=mdo.reserved_by_name["captureDateTime"].uid,
            value="2000-01-01 00:00:00"
        ),
        # Include original metadata, otherwise it will be removed
        lb.DataRowMetadataField(
            schema_id=mdo.reserved_by_name["tag"].uid,
            value="tag_string",
        ),
    ]
}

task = dataset.upsert_data_rows([data])
task.wait_till_done()
print("ERRORS:", task.errors)
print("RESULTS:", task.result)
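Because the upsert replaces a data row's metadata wholesale, fields you omit are dropped. The pure-Python sketch below illustrates merging the fields you are keeping with the ones you are changing before sending; the plain dicts and field names here are illustrative stand-ins, not SDK objects.

```python
# Existing metadata on the data row (illustrative values).
existing_fields = [
    {"name": "tag", "value": "tag_string"},
    {"name": "captureDateTime", "value": "1999-12-31 00:00:00"},
]
# The fields you actually want to change.
new_fields = [{"name": "captureDateTime", "value": "2000-01-01 00:00:00"}]

# Merge by field name so unchanged fields survive the upsert.
merged = {f["name"]: f for f in existing_fields}
merged.update({f["name"]: f for f in new_fields})
merged_fields = list(merged.values())
print(merged_fields)
```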
The update method is also available; however, for a large number of data row updates, we recommend using upsert_data_rows().
The update method only supports updates to global_key, row_data, and external_id values.
data_row.update(
    row_data="<new_row_data>",
    global_key="<new_unique_global_key>",
    # external IDs are soon to be deprecated, use global keys instead
    external_id="new_external_id"
)
Delete data rows
Deleting data rows cannot be undone
These methods delete data rows along with all labels made on each data row. This action cannot be reverted without the assistance of Labelbox support.
# delete one data row
data_row.delete()
# bulk delete data rows -- takes a list of Data Row objects
lb.DataRow.bulk_delete(data_rows=[<DataRow>, <DataRow>])
# for example, delete the data rows in a dataset, but not the Dataset object
dataset = client.get_dataset("<dataset_id>")
lb.DataRow.bulk_delete(data_rows=list(dataset.data_rows()))
Limit on bulk deleting data rows
The lb.DataRow.bulk_delete() method can delete a maximum of 4,000 data rows per call.
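To delete more than 4,000 data rows, split the list into chunks and make one call per chunk. `chunked` below is a plain helper, not an SDK function, and the `bulk_delete` call is shown only in a comment.

```python
def chunked(items, size=4000):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# for batch in chunked(list(dataset.data_rows())):
#     lb.DataRow.bulk_delete(data_rows=batch)

print([len(batch) for batch in chunked(list(range(9000)))])  # -> [4000, 4000, 1000]
```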
Attributes
Get the basics
# global key (str)
data_row.global_key
# external ID (str) -- soon to be deprecated, use global keys instead
data_row.external_id
# row data (str)
data_row.row_data
# media attributes (dict)
data_row.media_attributes
# updated at (datetime)
data_row.updated_at
# created at (datetime)
data_row.created_at
# created by (relationship to User object)
user = data_row.created_by()
# organization (relationship to Organization object)
organization = data_row.organization()
# dataset (relationship to Dataset object)
dataset = data_row.dataset()
Get the attachments
# relationship to many AssetAttachment objects
attachments = data_row.attachments()
# inspect one attachment
next(attachments)
# inspect all attachments
for attachment in attachments:
print(attachment)
# for ease of use, you can convert the paginated collection to a list
list(attachments)
Get the metadata
# get the metadata fields associated with the data row (list)
data_row.metadata_fields
# get the metadata fields as DataRowMetadataField objects (list)
data_row.metadata
Get the labels
# relationship to many Label objects
labels = data_row.labels()
# inspect one label made on the data row
next(labels)
# inspect all labels made on the data row
for label in labels:
print(label)
# for ease of use, you can convert the paginated collection to a list
list(labels)