A developer guide for creating and modifying data rows via the Python SDK.
Client
import labelbox as lb
client = lb.Client(api_key="<YOUR_API_KEY>")
Get a data row
data_row = client.get_data_row("<data_row_id>")
data_row_ids = client.get_data_row_ids_for_global_keys(["key1", "key2"])
Assign global keys
global_key_data_row_inputs = [
    {"data_row_id": "<data_row_id>", "global_key": "key1"},
    {"data_row_id": "<data_row_id>", "global_key": "key2"},
]
client.assign_global_keys_to_data_rows(global_key_data_row_inputs)
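When the pairs already live in a mapping, the input list can be built programmatically. The following is a minimal sketch, not part of the SDK: `build_global_key_inputs` is a hypothetical helper that converts a `{data_row_id: global_key}` dictionary into the list format shown above and rejects duplicate global keys before anything is sent. The placeholder IDs are illustrative only.

```python
def build_global_key_inputs(id_to_key):
    """Hypothetical helper: convert {data_row_id: global_key} into the
    list-of-dicts format passed to assign_global_keys_to_data_rows."""
    keys = list(id_to_key.values())
    # Global keys are meant to be unique, so fail early on repeats.
    duplicates = sorted({key for key in keys if keys.count(key) > 1})
    if duplicates:
        raise ValueError(f"Duplicate global keys: {duplicates}")
    return [
        {"data_row_id": data_row_id, "global_key": global_key}
        for data_row_id, global_key in id_to_key.items()
    ]


inputs = build_global_key_inputs(
    {"<data_row_id_1>": "key1", "<data_row_id_2>": "key2"}
)
# With a real client, the result would then be passed on:
# client.assign_global_keys_to_data_rows(inputs)
```

Checking for duplicates locally is a design choice for this sketch; it only catches repeats within one batch, not keys already assigned elsewhere.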
Clear global keys
client.clear_global_keys(["key1", "key2"])
Fundamentals
Create data rows
Data rows are created via methods of the Dataset class. For complete details and additional examples of approaches for creating data rows, see Dataset.
The only required argument when creating a data row is row_data. However, Labelbox strongly recommends supplying a global key for each data row at creation.
# this example uses the uuid package to generate unique global keys
from uuid import uuid4
dataset.create_data_rows(
    [
        {
            "row_data": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/basic.jpg",
            "global_key": str(uuid4())
        },
        {
            "row_data": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/basic.jpg",
            "global_key": str(uuid4())
        }
    ]
)
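For longer lists of assets, the payload can be generated rather than written by hand. This is a rough sketch under the assumption that every asset is addressed by a URL; `build_data_row_payloads` is a hypothetical helper, not an SDK function, and it only prepares the list that would be handed to `dataset.create_data_rows`.

```python
from uuid import uuid4


def build_data_row_payloads(urls):
    """Hypothetical helper: pair each asset URL with a freshly
    generated global key, in the format create_data_rows accepts."""
    return [{"row_data": url, "global_key": str(uuid4())} for url in urls]


payloads = build_data_row_payloads(
    [
        "https://storage.googleapis.com/labelbox-sample-datasets/Docs/basic.jpg",
        "https://storage.googleapis.com/labelbox-sample-datasets/Docs/basic.jpg",
    ]
)
# With a real Dataset object, the result would then be passed on:
# dataset.create_data_rows(payloads)
```

Because each key comes from `uuid4()`, the same URL can appear more than once without producing a global key collision.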
Export data rows
data_rows = dataset.export_data_rows()
# optionally, you can include metadata in the export
data_rows = dataset.export_data_rows(include_metadata=True)
Methods
Create an attachment
Type | Value | Description |
---|---|---|
IMAGE | URL of an image (PNG/JPG) (HTTPS or IAM delegated access path) | Labelers can see the attached image(s) while labeling the primary data row. |
VIDEO | URL of a video (MP4) (HTTPS or IAM delegated access path) | Labelers can see the attached video(s) while labeling the primary data row. |
RAW_TEXT | Text string or hyperlink | Labelers can see the attached text or hyperlink while labeling the primary data row. If passing a hyperlink, it will be clickable. If you want to display text from a URL endpoint, please use the TEXT_URL type. |
TEXT_URL | URL of a text file (HTTPS or IAM delegated access path) | Labelers can see the attached text from the linked text URL while labeling the primary data row. |
HTML | URL of an HTML file (HTTPS or IAM delegated access path) | Renders HTML in an iframe as an attachment. Labelers can see and interact with the attached HTML widget while labeling the primary data row. |
IMAGE_OVERLAY | URL of the image layer (PNG/JPG) (HTTPS or IAM delegated access path) | Adds a layer over the asset so labelers can view the image in different ways. Image overlays can only be attached to image assets. |
For details on creating attachments in the same step as creating data rows, see Dataset.
data_row.create_attachment(
    attachment_type="<attachment_type>",  # specify a type from the table above
    attachment_value="<attachment_value>",  # provide a value of the appropriate type
    attachment_name="<attachment_name>"  # name the attachment for reference
)
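A mistyped attachment type is easy to catch before the call is made. The sketch below is an illustration only: `attachment_kwargs` and `ATTACHMENT_TYPES` are hypothetical names, and the allowed values are simply copied from the table above rather than read from the SDK.

```python
# Attachment types copied from the table above (not read from the SDK).
ATTACHMENT_TYPES = (
    "IMAGE",
    "VIDEO",
    "RAW_TEXT",
    "TEXT_URL",
    "HTML",
    "IMAGE_OVERLAY",
)


def attachment_kwargs(attachment_type, attachment_value, attachment_name):
    """Hypothetical helper: validate the type, then return the keyword
    arguments for data_row.create_attachment."""
    if attachment_type not in ATTACHMENT_TYPES:
        raise ValueError(
            f"Unknown attachment type {attachment_type!r}; "
            f"expected one of {ATTACHMENT_TYPES}"
        )
    return {
        "attachment_type": attachment_type,
        "attachment_value": attachment_value,
        "attachment_name": attachment_name,
    }


kwargs = attachment_kwargs("RAW_TEXT", "Check the lighting first", "note")
# With a real DataRow object, the result would then be unpacked:
# data_row.create_attachment(**kwargs)
```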
Get the winning label ID
For more details on what a "winning" label is and how it is chosen, see Consensus.
data_row.get_winning_label_id(project_id="<project_id>")
Update a data row
You can update any of row_data, global_key, or external_id for an existing data row.
data_row.update(
    row_data="<new_row_data>",
    global_key="<new_unique_global_key>",
    # external IDs are soon to be deprecated, use global keys instead
    external_id="new_external_id"
)
Delete data rows
Deleting data rows cannot be undone
These methods delete data rows along with all labels made on each data row. This action cannot be reverted without the assistance of Labelbox support.
# delete one data row
data_row.delete()
# bulk delete data rows -- takes a list of DataRow objects
# (data_row_1 and data_row_2 stand in for DataRow objects you have already fetched)
lb.DataRow.bulk_delete(data_rows=[data_row_1, data_row_2])
# for example, delete the data rows in a dataset, but not the Dataset object
dataset = client.get_dataset("<dataset_id>")
lb.DataRow.bulk_delete(data_rows=list(dataset.data_rows()))
Limit on bulk deleting data rows
The lb.DataRow.bulk_delete() method can delete a maximum of 4,000 data rows per call.
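To stay under the 4,000-row limit, a larger collection can be split into batches and deleted one batch at a time. The following is a minimal sketch, not an SDK feature: `bulk_delete_in_batches` is a hypothetical helper, and the delete call is passed in as `delete_fn` so the batching logic can be exercised without touching real data.

```python
def bulk_delete_in_batches(data_rows, delete_fn, batch_size=4000):
    """Hypothetical helper: call delete_fn on consecutive slices of at
    most batch_size items and return the number of calls made."""
    data_rows = list(data_rows)  # materialize iterators and paginated results
    calls = 0
    for start in range(0, len(data_rows), batch_size):
        delete_fn(data_rows[start:start + batch_size])
        calls += 1
    return calls


# Dry run with a stand-in delete function that only records batch sizes.
sizes = []
bulk_delete_in_batches(range(9000), lambda batch: sizes.append(len(batch)))
print(sizes)  # [4000, 4000, 1000]

# With real objects, the delete function would wrap the SDK call:
# bulk_delete_in_batches(
#     dataset.data_rows(),
#     lambda batch: lb.DataRow.bulk_delete(data_rows=batch),
# )
```

Since deletion cannot be undone, running a dry run like the one above first is a cheap way to confirm the batch sizes.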
Attributes
Get the basics
# global key (str)
data_row.global_key
# external ID (str) -- soon to be deprecated, use global keys instead
data_row.external_id
# row data (str)
data_row.row_data
# media attributes (dict)
data_row.media_attributes
# updated at (datetime)
data_row.updated_at
# created at (datetime)
data_row.created_at
# created by (relationship to User object)
user = data_row.created_by()
# organization (relationship to Organization object)
organization = data_row.organization()
# dataset (relationship to Dataset object)
dataset = data_row.dataset()
Get the attachments
# relationship to many AssetAttachment objects
attachments = data_row.attachments()
# inspect one attachment
next(attachments)
# inspect all attachments
for attachment in attachments:
    print(attachment)
# for ease of use, you can convert the paginated collection to a list
list(attachments)
Get the metadata
# get the metadata fields associated with the data row (list)
data_row.metadata_fields
# get the metadata fields as DataRowMetadataField objects (list)
data_row.metadata
Get the labels
# relationship to many Label objects
labels = data_row.labels()
# inspect one label made on the data row
next(labels)
# inspect all labels made on the data row
for label in labels:
    print(label)
# for ease of use, you can convert the paginated collection to a list
list(labels)