Create a single data row
row_data can be a cloud storage URL (the dataset must be configured with the correct cloud storage IAM integration), a public URL to an asset, or a local file path.
import uuid
from labelbox import Client
from labelbox.schema.data_row_metadata import DataRowMetadataField

client = Client(api_key="<YOUR_API_KEY>")
dataset = client.create_dataset(name="testing-dataset")
data_row = dataset.create_data_row(row_data="https://picsum.photos/200/300")
# Global keys are optional but recommended.
# They are useful for maintaining references to a data row.
dataset.create_data_row(row_data="https://picsum.photos/200/300",
                        global_key=str(uuid.uuid4()))

# You can also upload metadata along with your data row
metadata_ontology = client.get_data_row_metadata_ontology()
dataset.create_data_row(
    row_data="https://picsum.photos/200/300",
    global_key=str(uuid.uuid4()),
    metadata_fields=[
        DataRowMetadataField(
            schema_id=metadata_ontology.reserved_by_name["tag"].uid,  # specify the schema id
            value="tag_string",  # typed input
        ),
    ],
)
# Add attachment
data_row.create_attachment(attachment_type="IMAGE_OVERLAY",
                           attachment_value="https://storage.googleapis.com/labelbox-sample-datasets/Docs/rgb.jpg",
                           attachment_name="RGB")
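As noted above, row_data can also be a local file path, in which case the SDK uploads the file to Labelbox storage for you. A minimal sketch (the file path below is a placeholder):

# Create a data row from a local file path (sketch; replace with a real path)
dataset.create_data_row(row_data="/path/to/local/image.jpg",
                        global_key=str(uuid.uuid4()))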
Create bulk data rows
The example script below imports a set of images along with:
- Global keys
- Metadata
- Attachments
- Image layers
Limit on uploading data rows in one SDK operation
To ensure performance, we recommend uploading no more than 150k data rows in a single call to the dataset.create_data_rows method. If you include metadata in the same call, the limit is 30k. If you have a larger dataset to upload, split your data rows into chunks and upload them in sequence (see the chunking sketch after the example below).
from labelbox import Client
from uuid import uuid4  # to generate unique global keys
import datetime

client = Client(api_key="<YOUR_API_KEY>")
metadata_ontology = client.get_data_row_metadata_ontology()
dataset = client.create_dataset(name="Bulk import example")
assets = [{"row_data": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/basic.jpg", "global_key": str(uuid4())},
{"row_data": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/basic.jpg", "global_key": str(uuid4())},
{"row_data": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/basic.jpg", "global_key": str(uuid4())},
{"row_data": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/basic.jpg", "global_key": str(uuid4())},
{"row_data": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/basic.jpg", "global_key": str(uuid4())}]
asset_metadata_fields = [{"schema_id": metadata_ontology.reserved_by_name["captureDateTime"].uid, "value": datetime.datetime.utcnow()},
{"schema_id": metadata_ontology.reserved_by_name["tag"].uid, "value": "tag_string"},
{"schema_id": metadata_ontology.reserved_by_name["split"]["train"].parent, "value": metadata_ontology.reserved_by_name["split"]["train"].uid}]
asset_attachments = [{"type": "IMAGE_OVERLAY", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/rgb.jpg", "name": "RGB" },
{"type": "IMAGE_OVERLAY", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/cir.jpg", "name": "CIR"},
{"type": "IMAGE_OVERLAY", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/weeds.jpg", "name": "Weeds"},
{"type": "TEXT", "value": "IOWA, Zone 2232, June 2022 [Text string]"},
{"type": "TEXT", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/text_attachment.txt"},
{"type": "IMAGE", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/disease_attachment.jpeg"},
{"type": "VIDEO", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/drone_video.mp4"},
{"type": "HTML", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/windy.html"}]
for item in assets:
    item["metadata_fields"] = asset_metadata_fields
    item["attachments"] = asset_attachments

task = dataset.create_data_rows(assets)
task.wait_till_done()
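If your upload exceeds the limits above, the chunking sketch below shows one way to split the work; the chunk size and the upload_in_chunks helper are illustrative, not part of the SDK:

# Hypothetical helper: upload a large list of assets in sequential chunks
CHUNK_SIZE = 150_000  # drop to 30_000 if the payload includes metadata

def upload_in_chunks(dataset, assets, chunk_size=CHUNK_SIZE):
    for start in range(0, len(assets), chunk_size):
        task = dataset.create_data_rows(assets[start:start + chunk_size])
        task.wait_till_done()  # wait for each chunk before starting the next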
Export data rows
# By default, the export does not include metadata
data_rows = dataset.export_data_rows()

# You can include metadata in the export
data_rows = dataset.export_data_rows(include_metadata=True)
In case of a timeout error
export_data_rows() has a timeout_seconds parameter that defaults to 30 seconds. If a timeout error occurs, increase the value. Example:
data_rows = dataset.export_data_rows(timeout_seconds=120)
Get data rows
You can query a data row by data row id or by global key.
# Get a data row by id
data_row = client.get_data_row(data_row.uid)

# You can use global keys (recommended) to look up data row ids
res = client.get_data_rows_for_global_keys([data_row.global_key])
data_row_ids = res['results']
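# (Sketch) You can then fetch the data row object from one of the returned ids
data_row = client.get_data_row(data_row_ids[0])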
# You can also use external ids to query data rows from your dataset,
# but this is much slower than using global keys and may match multiple data rows
data_row = dataset.data_row_for_external_id(data_row.external_id)
print(data_row)
Iterate through data rows in a dataset
data_rows_iterator = dataset.data_rows()
data_row = data_rows_iterator.get_one()
for data_row in dataset.data_rows():
    ...
Assign a data row's global key (must be unique)
# Assign a new, unique global key to an existing data row
new_global_key = str(uuid.uuid4())
res = client.assign_global_keys_to_data_rows(
[{
"data_row_id": data_row.uid,
"global_key": new_global_key
}]
)
print(res)
Update data rows
You can update a data row's row_data, global key, and external id.
data_row.update(row_data="new_row_data_url",
                global_key="new_global_key",
                external_id="new_external_id")
To update metadata, refer to the Metadata section.
Delete data rows
# Deleting a data row also removes it from its dataset
data_row.delete()
# Bulk delete a list of data rows (in this case, all the data rows we just uploaded)
from labelbox import DataRow
DataRow.bulk_delete(list(dataset.data_rows()))