Data row

Developer guide for creating and modifying data rows via the Python SDK.

Data rows are the assets that are being labeled; a data row cannot exist without belonging to a dataset. Data rows are added to a labeling task by uploading them to datasets and then creating project batches.

Client

import labelbox as lb
client = lb.Client(api_key="<YOUR_API_KEY>")

Get a data row

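# get a data row by its ID
data_row = client.get_data_row("<data_row_id>")

# get a data row by its global key
data_row = client.get_data_row_by_global_key("key1")

# look up the data row IDs for a list of global keys
data_row_ids = client.get_data_row_ids_for_global_keys(["key1", "key2"])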

Assign global keys

global_key_data_row_inputs = [
  {"data_row_id": "<data_row_id>", "global_key": "key1"},
  {"data_row_id": "<data_row_id>", "global_key": "key2"}
]

client.assign_global_keys_to_data_rows(global_key_data_row_inputs)
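When the mapping from data rows to keys already lives in a Python dict, the input list can be built programmatically. The sketch below uses a hypothetical helper, build_global_key_inputs (not part of the SDK), that produces the same dictionary shape as the example above. Because global keys must be unique, it also rejects a key requested for two data rows before any API call is made.

```python
from typing import Dict, List


def build_global_key_inputs(keys_by_data_row_id: Dict[str, str]) -> List[Dict[str, str]]:
    """Build the input list for client.assign_global_keys_to_data_rows().

    Hypothetical helper: raises ValueError if the same global key is
    requested for more than one data row.
    """
    seen: Dict[str, str] = {}  # global key -> first data row ID that asked for it
    inputs: List[Dict[str, str]] = []
    for data_row_id, global_key in keys_by_data_row_id.items():
        if global_key in seen:
            raise ValueError(
                f"global key {global_key!r} requested for both "
                f"{seen[global_key]!r} and {data_row_id!r}"
            )
        seen[global_key] = data_row_id
        inputs.append({"data_row_id": data_row_id, "global_key": global_key})
    return inputs
```

For example: client.assign_global_keys_to_data_rows(build_global_key_inputs({"<data_row_id_1>": "key1", "<data_row_id_2>": "key2"})).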

Clear global keys

client.clear_global_keys(["key1", "key2"])

Fundamentals

Create data rows

Data rows are created via methods from the Dataset class. For complete details and additional examples of approaches for creating data rows, please see Dataset.

We recommend using the create_data_rows() and upsert_data_rows() methods for large data row upload operations.


Special character handling

Certain characters, such as #, <, >, and ||, are not supported in URLs. Avoid them in your file names to prevent loading issues.

For the URI standard, refer to RFC 2396, section 2.4.3: https://datatracker.ietf.org/doc/html/rfc2396#section-2.4.3

A good test for the handling of special characters is to open the URL in your browser's address bar: if the URL doesn't load properly in your browser, it won't load in Labelbox.
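The browser test can be complemented by a local check before you build row_data URLs. The following is a minimal sketch, assuming you validate file names yourself prior to upload. The helper names are hypothetical, and the checked sequences are only those listed in this note, not the full set of characters excluded by RFC 2396.

```python
from typing import List

# Sequences this note says to avoid in file names (not exhaustive;
# RFC 2396 section 2.4.3 lists further excluded characters).
UNSUPPORTED_SEQUENCES = ("#", "<", ">", "||")


def find_unsupported_sequences(file_name: str) -> List[str]:
    """Return every unsupported sequence found in file_name, in list order."""
    return [seq for seq in UNSUPPORTED_SEQUENCES if seq in file_name]


def is_safe_file_name(file_name: str) -> bool:
    """True if file_name contains none of the unsupported sequences."""
    return not find_unsupported_sequences(file_name)
```

Renaming flagged files before upload is usually simpler than trying to percent-encode them afterward.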

The only required argument when creating a data row is the row_data. However, Labelbox strongly recommends supplying each data row with a global key upon creation.

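# this example uses the uuid package to generate unique global keys
from uuid import uuid4

dataset = client.get_dataset("<dataset_id>")

data = [
    {
        "row_data": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/basic.jpg",
        "global_key": str(uuid4())
    },
    {
        "row_data": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/basic.jpg",
        "global_key": str(uuid4())
    }
]

# the upload runs as a task; wait for it to finish
task = dataset.upsert_data_rows(data)
task.wait_till_done()

# or alternatively use create_data_rows() (not both, since global keys must be unique)
# task = dataset.create_data_rows(data)
# task.wait_till_done()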
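For more than a handful of assets, the payload list can be generated from a list of URLs. The sketch below uses a hypothetical helper, make_data_row_payloads. By default it uses random uuid4 keys, as in the example above. Optionally it derives uuid5 keys from each URL, so the same URL always maps to the same global key across script runs. In that mode, duplicate URLs would collide, so they are rejected.

```python
from typing import Dict, List
from uuid import NAMESPACE_URL, uuid4, uuid5


def make_data_row_payloads(urls: List[str], deterministic: bool = False) -> List[Dict[str, str]]:
    """Build one {"row_data", "global_key"} dict per URL.

    deterministic=False: random uuid4 keys.
    deterministic=True: uuid5 keys derived from the URL, so repeated
    calls yield identical keys; duplicate URLs raise ValueError.
    """
    if deterministic and len(set(urls)) != len(urls):
        raise ValueError("duplicate URLs would produce duplicate global keys")
    payloads = []
    for url in urls:
        key = uuid5(NAMESPACE_URL, url) if deterministic else uuid4()
        payloads.append({"row_data": url, "global_key": str(key)})
    return payloads
```

The result can be passed directly to dataset.upsert_data_rows() or dataset.create_data_rows().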

Export data rows

For a complete reference on exporting data rows, see the export overview documentation.

Export a single data row:

DATA_ROW_GLOBAL_KEY = "<global key>"

export_params = {
  "attachments": True,
  "metadata_fields": True,
  "data_row_details": True,
  "project_details": True,
  "label_details": True,
  "performance_details": True,
  "interpolated_frames": True,
  "embeddings": True
}

# Provide a list of data row global keys
export_task = lb.DataRow.export(client=client, global_keys=[DATA_ROW_GLOBAL_KEY], params=export_params)
export_task.wait_till_done()

if export_task.has_results():
  # Start stream
  stream = export_task.get_buffered_stream()

  # Iterate through data rows
  for data_row in stream:
    print(data_row.json)
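A common next step is to index the exported rows, for example by global key. The sketch below is a hypothetical helper that assumes each row's .json is a dict with a "data_row" object holding "id" and "global_key". Check that assumption against your own export output before relying on it.

```python
from typing import Any, Dict, Iterable


def index_by_global_key(rows: Iterable[Dict[str, Any]]) -> Dict[str, str]:
    """Map global key -> data row ID from exported JSON rows.

    Assumes rows shaped like {"data_row": {"id": ..., "global_key": ...}, ...};
    rows without a global key are skipped.
    """
    index: Dict[str, str] = {}
    for row in rows:
        data_row = row.get("data_row", {})
        global_key = data_row.get("global_key")
        if global_key:
            index[global_key] = data_row["id"]
    return index
```

With the export above, this would be called as index_by_global_key(row.json for row in export_task.get_buffered_stream()).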


Methods

Create and update an attachment

See the attachment documentation for the supported attachment types.

For details on creating attachments in the same step as creating data rows, see Dataset.

attachment = data_row.create_attachment(
  attachment_type="<attachment_type>",    # a type listed in the attachment documentation
  attachment_value="<attachment_value>",  # provide a value of the appropriate type
  attachment_name="<attachment_name>"     # name the attachment for reference
)

# Update an attachment
attachment.update(type="<attachment_type>", value="<new_value>")

Get the winning label ID

For more details on what a "winning" label is and how it is chosen, see Consensus.

data_row.get_winning_label_id(project_id="<project_id>")

Update data rows from a dataset

We recommend using the upsert_data_rows() method to update data rows.

When you use this method to update data rows, pass a key that references either the data row's global key or its data row ID.

Include any fields you wish to update along with their new values.

import uuid

# Update the global key associated with DATA_ROW_ID and add metadata
DATA_ROW_ID = "<DATA_ROW_ID>"
dataset = client.get_dataset("<DATASET_ID>")
mdo = client.get_data_row_metadata_ontology()

data = {
    # to reference the data row by global key, use lb.GlobalKey("<GLOBAL_KEY>")
    "key": lb.UniqueId(DATA_ROW_ID),
    "global_key": f"NEW-ID-{uuid.uuid1()}",
    "metadata_fields": [
        # New metadata
        lb.DataRowMetadataField(
            schema_id=mdo.reserved_by_name["captureDateTime"].uid,
            value="2000-01-01 00:00:00"
        ),
        # Include existing metadata; otherwise it will be removed
        lb.DataRowMetadataField(
            schema_id=mdo.reserved_by_name["tag"].uid,
            value="tag_string",
        ),
    ]
}

task = dataset.upsert_data_rows([data])
task.wait_till_done()
print("ERRORS:", task.errors)
print("RESULTS:", task.result)
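Because the metadata in an upsert replaces what the data row already has, it can help to merge the existing fields with the new ones before building the payload. The following is a minimal sketch with a hypothetical merge_metadata helper. It assumes at most one field per schema_id and only reads the schema_id attribute, so it should accept lb.DataRowMetadataField objects; a small stand-in dataclass keeps the example self-contained.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List


@dataclass
class MetadataField:
    """Stand-in for lb.DataRowMetadataField (only schema_id and value are used)."""
    schema_id: str
    value: Any


def merge_metadata(existing: Iterable[Any], updates: Iterable[Any]) -> List[Any]:
    """Keep existing fields; an update with the same schema_id replaces the old field."""
    merged: Dict[str, Any] = {field.schema_id: field for field in existing}
    for field in updates:
        merged[field.schema_id] = field  # replaces in place or appends a new field
    return list(merged.values())
```

In the update above, "metadata_fields" could then be set to merge_metadata(data_row.metadata, [new_field, ...]).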

The update() method is also available; however, for large numbers of data row updates, we recommend using upsert_data_rows().

The update() method only supports updating the global_key, row_data, and external_id values.

data_row.update(
  row_data="<new_row_data>",
  global_key="<new_unique_global_key>",
  # external IDs are soon to be deprecated, use global keys instead
  external_id="new_external_id"
)

Delete data rows


Deleting data rows cannot be undone

These methods delete data rows along with all labels made on each data row. This action cannot be reverted without the assistance of Labelbox support.

# delete one data row
data_row.delete()

# bulk delete data rows -- takes a list of DataRow objects
data_rows_to_delete = [client.get_data_row("<data_row_id_1>"), client.get_data_row("<data_row_id_2>")]
lb.DataRow.bulk_delete(data_rows=data_rows_to_delete)

# for example, delete the data rows in a dataset, but not the Dataset object
dataset = client.get_dataset("<dataset_id>")
lb.DataRow.bulk_delete(data_rows=list(dataset.data_rows()))


Limit on bulk deleting data rows

The lb.DataRow.bulk_delete() method can delete a maximum of 4,000 data rows per call.
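To stay under the 4,000-row limit when deleting a larger set, such as every data row in a dataset, the list can be split into batches. This is a sketch with a hypothetical delete_in_batches helper. It takes the delete call as a function so the batching logic does not depend on the SDK. Materialize the data rows into a list first, rather than deleting while paging through dataset.data_rows().

```python
from typing import Any, Callable, List, Sequence

MAX_ROWS_PER_BULK_DELETE = 4000  # per-call limit stated for lb.DataRow.bulk_delete()


def delete_in_batches(
    data_rows: Sequence[Any],
    delete_fn: Callable[[List[Any]], Any],
    batch_size: int = MAX_ROWS_PER_BULK_DELETE,
) -> int:
    """Call delete_fn on consecutive slices of at most batch_size rows.

    Returns the number of delete calls made.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    calls = 0
    for start in range(0, len(data_rows), batch_size):
        delete_fn(list(data_rows[start:start + batch_size]))
        calls += 1
    return calls
```

For example, pass list(dataset.data_rows()) as data_rows and lambda batch: lb.DataRow.bulk_delete(data_rows=batch) as delete_fn.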


Attributes

Get the basics

# global key (str)
data_row.global_key

# external ID (str) -- soon to be deprecated, use global keys instead
data_row.external_id

# row data (str)
data_row.row_data

# media attributes (dict)
data_row.media_attributes

# updated at (datetime)
data_row.updated_at

# created at (datetime)
data_row.created_at

# created by (relationship to User object)
user = data_row.created_by()

# organization (relationship to Organization object)
organization = data_row.organization()

# dataset (relationship to Dataset object)
dataset = data_row.dataset()

Get the attachments

# relationship to many AssetAttachment objects
attachments = data_row.attachments()

# inspect one attachment
next(attachments)

# inspect all attachments
for attachment in attachments:
  print(attachment)

# for ease of use, you can convert the paginated collection to a list
list(attachments)

Get the metadata

# get the metadata fields associated with the data row (list)
data_row.metadata_fields

# get the metadata fields as DataRowMetadataField objects (list)
data_row.metadata

Get the labels

# relationship to many Label objects
labels = data_row.labels()

# inspect one label made on the data row
next(labels)

# inspect all labels made on the data row 
for label in labels:
  print(label)

# for ease of use, you can convert the paginated collection to a list
list(labels)