Project

Developer guide for creating and modifying projects using the Python SDK.

Client

import labelbox as lb
from labelbox.schema.quality_modes import QualityMode
client = lb.Client(api_key="<YOUR_API_KEY>")

Create a project

When creating a project, specify a media_type using one of the following values:

  • lb.MediaType.Audio
  • lb.MediaType.Conversational
  • lb.MediaType.Dicom
  • lb.MediaType.Document
  • lb.MediaType.Geospatial_Tile
  • lb.MediaType.Html
  • lb.MediaType.Image
  • lb.MediaType.Simple_Tile
  • lb.MediaType.Text
  • lb.MediaType.Video

# Create a new project
project = client.create_project(
    name="<project_name>",
    description="<project_description>",    # optional
    media_type=lb.MediaType.Image           # specify the media type
)

Get a project

project = client.get_project("<project_id>")

# alternatively, you can get a project by name
project = client.get_projects(where=lb.Project.name == "<project_name>").get_one()

Methods

Create a batch

When creating a batch to send to a project, one of either global_keys or data_rows must be supplied as an argument. If using the data_rows argument, you can supply either a list of data row IDs or a list of DataRow class objects.

Optionally, you can supply a priority for the batch, ranging from 1 (highest) to 5 (lowest). This determines the order in which the included data rows appear in the labeling queue relative to other batches. If no value is provided, the batch is assigned the lowest priority.

For more details, see Batch.

project.create_batch(
  name="<unique_batch_name>",
  global_keys=["key1", "key2", "key3"],
  priority=5,
)

# if the project uses consensus, you can optionally supply a dictionary with consensus settings
# if provided, the batch will use consensus with the specified coverage and votes
project.create_batch(
  name="<unique_batch_name>",
  data_rows=["<data_row_id>", "<data_row_id>"],
  priority=1,
  consensus_settings={"number_of_labels": 3, "coverage_percentage": 0.1}
)
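The argument rules described above can be checked locally before calling project.create_batch(). The following is a minimal sketch, not part of the SDK: build_batch_kwargs is a hypothetical helper, and it assumes that exactly one of global_keys or data_rows is supplied and that the default priority is 5 (the lowest).

```python
from typing import Any, Dict, List, Optional


def build_batch_kwargs(
    name: str,
    global_keys: Optional[List[str]] = None,
    data_rows: Optional[List[Any]] = None,
    priority: int = 5,
) -> Dict[str, Any]:
    """Hypothetical helper: validate arguments for project.create_batch()."""
    # Assumption: exactly one of the two identifier arguments is supplied.
    if (global_keys is None) == (data_rows is None):
        raise ValueError("Supply exactly one of global_keys or data_rows")
    # Priority ranges from 1 (highest) to 5 (lowest).
    if not 1 <= priority <= 5:
        raise ValueError("priority must be between 1 and 5")
    kwargs: Dict[str, Any] = {"name": name, "priority": priority}
    if global_keys is not None:
        kwargs["global_keys"] = list(global_keys)
    else:
        kwargs["data_rows"] = list(data_rows)
    return kwargs


kwargs = build_batch_kwargs("<unique_batch_name>", global_keys=["key1", "key2"])
# With a real project object: project.create_batch(**kwargs)
```

Validating up front keeps malformed requests from reaching the API; the SDK may apply its own checks as well.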

Create multiple batches

The project.create_batches() method accepts up to 1 million data rows. Batches are chunked into groups of 100k data rows (if necessary), which is the maximum batch size.

This method takes either a list of data row IDs or DataRow objects in the data_rows argument, or a list of global keys in the global_keys argument. The two arguments cannot be combined in the same call. Batches are created with the specified name_prefix argument and a unique suffix to ensure unique batch names. The suffix is a 4-digit number starting at 0000.

For example, if the name prefix is demo-create-batches- and three batches are created, the names will be demo-create-batches-0000, demo-create-batches-0001, and demo-create-batches-0002. This method will throw an error if a batch with the same name already exists.
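The naming and chunking rules described above can be previewed locally. This is only an illustrative sketch that does not call the SDK: plan_batches is a hypothetical helper, and the small chunk size in the example is chosen just to keep the output short.

```python
from typing import List, Sequence, Tuple

MAX_BATCH_SIZE = 100_000  # maximum batch size stated above


def plan_batches(
    name_prefix: str, items: Sequence[str], chunk_size: int = MAX_BATCH_SIZE
) -> List[Tuple[str, Sequence[str]]]:
    """Hypothetical helper: preview batch names and their contents."""
    plan = []
    for index, start in enumerate(range(0, len(items), chunk_size)):
        # 4-digit, zero-padded suffix starting at 0000
        name = f"{name_prefix}{index:04d}"
        plan.append((name, items[start:start + chunk_size]))
    return plan


# Five keys with a chunk size of 2 produce three batches.
plan = plan_batches("demo-create-batches-", [f"key{i}" for i in range(5)], chunk_size=2)
names = [name for name, _ in plan]
```

Previewing the names this way can help spot collisions with existing batch names before the real call is made.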

task = project.create_batches(
  name_prefix="demo-create-batches-",
  global_keys=global_keys,
  priority=5
)

print("Errors: ", task.errors())
print("Result: ", task.result())

Create batches from a dataset

If you wish to create batches in a project using all the data rows of a dataset, instead of gathering global keys or IDs and iterating over subsets of data rows, you can use the project.create_batches_from_dataset() method.

This method takes in a dataset ID and creates a batch (or batches if there are more than 100k data rows) comprised of all data rows not already in the project. The same logic applies to the name_prefix argument and the naming of batches as described in the section immediately above.

dataset = client.get_dataset("<dataset_id>")

task = project.create_batches_from_dataset(
    name_prefix="demo-dataset-",
    dataset_id=dataset.uid,
    priority=5
)

print("Errors: ", task.errors())
print("Result: ", task.result())

Get the batches

# get the batches (objects of the Batch class)
batches = project.batches()

# inspect one batch
next(batches)

# inspect all batches
for batch in batches:
  print(batch)
    
# for ease of use, you can convert the paginated collection to a list
list(batches)

Get a batch

You can retrieve a batch of a particular project with client.get_batch().

project_id = "<project_id>"
batch_id = "<batch_id>"

# returns a Batch object
batch = client.get_batch(project_id, batch_id)

Connect an ontology

# the argument must be an object of the Ontology class
project.connect_ontology(ontology)

Get the members and their roles

The scope of a member is provided by the access_from attribute of the ProjectMember class.

It can have one of the following values:

  • ORGANIZATION: project membership is derived from the organization role
  • PROJECT_MEMBERSHIP: access is given specifically to the project
  • USER_GROUP: access is given via a group

# get the members (objects of the ProjectMember class with relationships to a User and Role)
members = project.members()

# inspect one member
member = next(members)
print(member.user(), member.role(), member.access_from)

# Display member info:
print(member.user().uid, member.user().email, member.role().name, member.access_from, sep="\t")

# inspect all members
for member in members:
  print(member.user(), member.role(), member.access_from)
  
# for ease of use, you can convert the paginated collection to a list
list(members)
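To see at a glance how access was granted, members can be grouped by their access_from value. The sketch below uses a stand-in MemberInfo class and made-up sample values so that it runs without an API key; MemberInfo and group_by_access are hypothetical names, not SDK objects.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class MemberInfo:
    """Stand-in for the fields read from a ProjectMember (hypothetical)."""
    email: str
    role: str
    access_from: str


def group_by_access(members: Iterable[MemberInfo]) -> Dict[str, List[str]]:
    """Map each access_from value to the emails of the matching members."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for member in members:
        grouped[member.access_from].append(member.email)
    return dict(grouped)


# Sample values for illustration only.
sample = [
    MemberInfo("a@example.com", "Admin", "ORGANIZATION"),
    MemberInfo("b@example.com", "Labeler", "PROJECT_MEMBERSHIP"),
    MemberInfo("c@example.com", "Reviewer", "ORGANIZATION"),
]
grouped = group_by_access(sample)
```

With real data, each MemberInfo could be filled from member.user().email, member.role().name, and member.access_from.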

Upload labeling instructions

Note that if the ontology connected to your project is connected to other projects, calling this method will attach the instructions to those projects as well.

# must be a PDF or HTML file
project.upsert_instructions("<local_file_path>")

Get the workflow tasks

# get the task queues (relationship to TaskQueue objects)
task_queues = project.task_queues()

# inspect all task queues
for task_queue in task_queues:
  print(task_queue)

Move data rows to a workflow task

Note that data rows need labels attached before they can be moved to a different workflow task. They cannot be moved from "Initial Labeling."

project.move_data_rows_to_task_queue(
  data_row_ids=lb.GlobalKeys(["<global_key>", "<global_key>"]), # Use "lb.UniqueIds" for "<data_row_ids>"
  task_queue_id="<task_queue_id>" # Use "None" to move data rows to the "Done" bucket 
)

Modify data row priority

Once a batch has been added to a project, you can set the priority of its data rows. To do so, define a list of label parameter overrides (LPOs), which are tuples that set the priority for individual data rows.

Each override is a tuple of two values: an object of the DataRow class or a DataRowIdentifier object (for example, lb.GlobalKey or lb.UniqueId), followed by the new priority.

The priority must be an integer between -2,147,483,648 and 2,147,483,647. The lowest value has the highest priority.

Override lists are limited to 1,000 items; larger lists trigger an error.

Once the override list is defined, pass it to project.set_labeling_parameter_overrides to change the priority of the corresponding data rows. Use project.labeling_parameter_overrides to get a list of data row priorities and project.update_data_row_labeling_priority to update existing data row priority.
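Because override lists are limited to 1,000 items, a longer list has to be split before it is submitted. The following is a minimal sketch under that assumption: chunk_overrides is a hypothetical helper, and plain strings stand in for the data row identifiers so the example runs without the SDK.

```python
from typing import Any, Iterator, List, Sequence, Tuple

MAX_OVERRIDES = 1_000  # list size limit stated above
INT32_MIN, INT32_MAX = -2_147_483_648, 2_147_483_647  # priority range stated above

Override = Tuple[Any, int]  # (data row identifier, priority)


def chunk_overrides(
    lpos: Sequence[Override], size: int = MAX_OVERRIDES
) -> Iterator[List[Override]]:
    """Hypothetical helper: validate priorities, then yield lists of at most `size` items."""
    for _, priority in lpos:
        if isinstance(priority, bool) or not isinstance(priority, int):
            raise TypeError("priority must be an integer")
        if not INT32_MIN <= priority <= INT32_MAX:
            raise ValueError("priority is out of range")
    for start in range(0, len(lpos), size):
        yield list(lpos[start:start + size])


# Placeholder identifiers; real code would use lb.GlobalKey(...) or lb.UniqueId(...).
lpos = [(f"<global_key_{i}>", i + 1) for i in range(2_500)]
chunks = list(chunk_overrides(lpos))
# With a real project:
# for chunk in chunks:
#     project.set_labeling_parameter_overrides(chunk)
```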

Set data row priority

# Extract the global keys
export_params = {
    "data_row_details": True
}

export_task = project.export(params=export_params)

# Wait until the export task is complete
export_task.wait_till_done()

# Check for any errors in the export task
if export_task.has_errors():
  export_task.get_buffered_stream(
    stream_type=lb.StreamType.ERRORS
  ).start(stream_handler=lambda error: print(error))

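# Collect the exported data rows as a list of dictionaries
export_json = [data_row.json for data_row in export_task.get_buffered_stream()]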

global_keys = [item["data_row"]["global_key"] for item in export_json]

# Add LPOs
lpos = []
priority = 1

# With global keys
for global_key in global_keys:
  lpos.append((lb.GlobalKey(global_key), priority))
  priority += 1

# With data row ids
# data_row_ids = ["clw7jlmav35yn0768xrpawwrc", "clw7jlmav35yo0768a5amfztu"]
# for dr_id in data_row_ids: 
#   lpos.append((lb.UniqueId(dr_id), priority))
#   priority+=1

# With data row objects
# data_rows = [data_row_1, data_row_2]
# for data_row in data_rows: 
#  lpos.append((data_row, priority)) 
#  priority+=1

# Set data row priorities
project.set_labeling_parameter_overrides(lpos)

# Check results
project_lpos = list(project.labeling_parameter_overrides())
for lpo in project_lpos:
  print(lpo)

Update data row priority

# Update LPOs

# With global keys
global_keys = ["global_key1", "global_key2"]
project.update_data_row_labeling_priority(data_rows=lb.GlobalKeys(global_keys), priority=1)

# With data row ids
# data_row_ids = ["clw7jlmav35yn0768xrpawwrc", "clw7jlmav35yo0768a5amfztu"]
# project.update_data_row_labeling_priority(data_rows=lb.UniqueIds(data_row_ids), priority=1)

# With data row objects
# data_rows = [data_row_1, data_row_2]
# project.update_data_row_labeling_priority(data_rows=data_rows, priority=1)


# Check results
project_lpos = list(project.labeling_parameter_overrides())

for lpo in project_lpos:
  print(lpo)

Add project tags

tags = project.update_project_resource_tags(["<project_tag_id>", "<project_tag_id>"])

Get project tags

tags = project.get_resource_tags()

The tags variable is a list in which each element is an object of type ResourceTag with the attributes uid, color (for example, "008856"), and text.

Manage the list of MAL imports

You can obtain the list of MAL import jobs with project.bulk_import_requests().

Using appropriate filters, you can also:

  • Get a specific MAL import
  • Delete a MAL import

This is useful if you want to remove the pre-labels created with MAL and use ground truth instead.

Note: Deleting an import job can't be undone.

# Retrieve the import jobs of a project
import_requests = project.bulk_import_requests()

# Retrieve a particular import request
job_id = "<job_id>"

import_request = [import_request for import_request 
 in project.bulk_import_requests()
 if import_request.uid == job_id][0]
 
# Delete an import and all associated pre-annotations
# WARNING: this can't be undone
# import_request.delete()
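Indexing a filtered list with [0] raises an IndexError when no job matches the given ID. A more forgiving lookup can be sketched as follows; find_by_uid is a hypothetical helper, and SimpleNamespace objects stand in for the import requests so the example runs without the SDK.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional


def find_by_uid(items: Iterable[Any], uid: str) -> Optional[Any]:
    """Return the first item whose uid matches, or None if there is none."""
    return next((item for item in items if item.uid == uid), None)


# Stand-in objects; with the SDK, pass project.bulk_import_requests() instead.
requests_sample = [SimpleNamespace(uid="job-1"), SimpleNamespace(uid="job-2")]
found = find_by_uid(requests_sample, "job-2")
missing = find_by_uid(requests_sample, "job-3")
```

Checking the result for None before calling delete() avoids acting on the wrong job, which matters because deletion cannot be undone.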


Get the project overview

With project.get_overview(details), you can obtain some of the data from the Project Overview tab.

Output

The boolean parameter details will change the output to display the distribution of data rows between the queues.

When details is set to false:

Attribute        Description                                      Name in the Overview tab
to_label         Number of data rows that are yet to be labeled   To Label
in_review        Number of data rows to be reviewed               In Review
in_rework        Number of data rows to be reworked               In Rework
skipped          Number of skipped data rows                      Skipped
done             Number of data rows marked as Done               Done
issues           Number of data rows with associated issues       Issues
labeled          Number of data rows with one or more labels      -
total_data_rows  Total number of data rows in the project         -

When details is set to true, the output will be the same as before, except for the following:

Attribute   Description
in_review   data: List of task queues in review with the associated number of data rows
            total: Number of data rows to be reviewed
in_rework   data: List of task queues in rework with the associated number of data rows
            total: Number of data rows to be reworked

Equivalences

The following are equal:

Attribute                 Sum of attributes
overview.labeled          overview.in_review + overview.in_rework + overview.done
overview.total_data_rows  overview.to_label + overview.in_review + overview.in_rework + overview.done
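The two equivalences can be expressed as a quick consistency check. This is a sketch only: check_overview is a hypothetical helper, and the overview object is a stand-in with made-up numbers rather than the result of project.get_overview().

```python
from types import SimpleNamespace


def check_overview(overview) -> bool:
    """Return True when both equivalences listed above hold."""
    labeled_ok = overview.labeled == (
        overview.in_review + overview.in_rework + overview.done
    )
    total_ok = overview.total_data_rows == (
        overview.to_label + overview.in_review + overview.in_rework + overview.done
    )
    return labeled_ok and total_ok


# Made-up numbers for illustration only.
sample_overview = SimpleNamespace(
    to_label=20, in_review=5, in_rework=3, done=72, labeled=80, total_data_rows=100
)
consistent = check_overview(sample_overview)
```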

Example project overview

# Example without details
overview = project.get_overview()

# Selection of some attributes
print(f"""
      To label:\t{overview.to_label / overview.total_data_rows:.2%}
      Labeled:\t{overview.labeled / overview.total_data_rows:.2%}
     """)
# To label:	18.37%
# Labeled:	81.63%


# Example with details
detailed_overview = project.get_overview(details=True)

# Task queues in review
print(f"""
      Number of data rows {detailed_overview.total_data_rows},
      In review:,
      \tQueues {detailed_overview.in_review["data"]},
      \tNumber of data rows  {detailed_overview.in_review["total"]},
      In rework:
      \tQueues {detailed_overview.in_rework["data"]},
      \tNumber of data rows  {detailed_overview.in_rework["total"]}
      """,
      sep="\n")
# Number of data rows 23220,
# In review:,
# Queues [{'Initial review task': 7}],
# Number of data rows  7,
# In rework:
#  Queues [{'Rework (all rejected)': 1830}],
#  Number of data rows  1830
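In the detailed output shown above, each data entry is a one-item dictionary that maps a queue name to its number of data rows. A small sketch for flattening that list into a single dictionary follows; queue_counts is a hypothetical helper, and the input literal copies the in_rework values from the example output.

```python
from typing import Any, Dict


def queue_counts(section: Dict[str, Any]) -> Dict[str, int]:
    """Merge the per-queue dictionaries under "data" into one mapping."""
    counts: Dict[str, int] = {}
    for entry in section["data"]:
        counts.update(entry)
    return counts


# Values copied from the example output above.
in_rework = {"data": [{"Rework (all rejected)": 1830}], "total": 1830}
counts = queue_counts(in_rework)
# In this example, the per-queue counts add up to the reported total.
matches_total = sum(counts.values()) == in_rework["total"]
```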

Export a project

For complete details, see Export overview.

# The return type of this method is an `ExportTask`, which is a wrapper of a `Task`.
# Most `Task` features are also present in `ExportTask`.
export_params= {
  "attachments": True,
  "metadata_fields": True,
  "data_row_details": True,
  "project_details": True,
  "label_details": True,
  "performance_details": True,
  "interpolated_frames": True
}

# Note: Filters follow AND logic, so typically using one filter is sufficient.
filters= {
  "last_activity_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
  "label_created_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
  "workflow_status": "InReview",
  "batch_ids": ["batch_id_1", "batch_id_2"],
  "data_row_ids": ["data_row_id_1", "data_row_id_2"],
  "global_keys": ["global_key_1", "global_key_2"]
}

export_task = project.export(params=export_params, filters=filters)
export_task.wait_till_done()

# Return a JSON output string from the export task results/errors one by one:
def json_stream_handler(output: lb.BufferedJsonConverterOutput):
  print(output.json)

if export_task.has_errors():
  export_task.get_buffered_stream(
    stream_type=lb.StreamType.ERRORS
  ).start(stream_handler=lambda error: print(error))

if export_task.has_result():
  export_json = export_task.get_buffered_stream(
    stream_type=lb.StreamType.RESULT
  ).start(stream_handler=json_stream_handler)

print("file size: ", export_task.get_total_file_size(stream_type=lb.StreamType.RESULT))
print("line count: ", export_task.get_total_lines(stream_type=lb.StreamType.RESULT))

# Alternatively, export with project.export_v2() and read the output from export_task.result.
# Set the export params to include or exclude certain fields.
export_params= {
  "attachments": True,
  "metadata_fields": True,
  "data_row_details": True,
  "project_details": True,
  "label_details": True,
  "performance_details": True
}

# You can set the range for last_activity_at and label_created_at. You can also set a list of data 
# row ids to export. 
# For context, last_activity_at captures the creation and modification of labels, metadata, status, comments and reviews.

# Note: Combinations of filters apply AND logic.
filters= {
  "last_activity_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
  "label_created_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
  "workflow_status": "InReview",
  "batch_ids": ["batch_id_1", "batch_id_2"],
  "data_row_ids": ["data_row_id_1", "data_row_id_2"],
  "global_keys": ["global_key_1", "global_key_2"]
}

export_task = project.export_v2(params=export_params, filters=filters)
export_task.wait_till_done()

if export_task.errors:
  print(export_task.errors)

export_json = export_task.result
print("results: ", export_json)
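The date filters above take a two-item list of timestamp strings. Building those strings from datetime objects avoids formatting mistakes. The sketch below assumes the "YYYY-MM-DD HH:MM:SS" format used in the filters above; date_range is a hypothetical helper and the dates are arbitrary examples.

```python
from datetime import datetime
from typing import List

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # format used in the filters above


def date_range(start: datetime, end: datetime) -> List[str]:
    """Format a [start, end] pair for last_activity_at or label_created_at."""
    if start > end:
        raise ValueError("start must not be after end")
    return [start.strftime(TIMESTAMP_FORMAT), end.strftime(TIMESTAMP_FORMAT)]


filters = {
    "last_activity_at": date_range(datetime(2024, 1, 1), datetime(2024, 2, 1)),
    "workflow_status": "InReview",
}
# With a real project: project.export(params=export_params, filters=filters)
```

Since filters are combined with AND logic, the sketch keeps only two of them; every added filter narrows the result further.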

Export issues and comments

import requests

url = project.export_issues()
issues = requests.get(url).json()

# optionally, you can export only the open or resolved issues
open_issues_url = project.export_issues(status="Open")
resolved_issues_url = project.export_issues(status="Resolved")

Duplicate a project

See the section Duplicate a project for the scope of this method.

cloned_project = project.clone()

Update a project

project.update(name="<new_project_name>")

Delete a project

Warning: Deleting a project cannot be undone

This method deletes the project along with all labels made in the project. This action cannot be reverted.

project.delete()

Attributes

Get the basics

# name (str)
project.name

# description (str)
project.description

# updated at (datetime)
project.updated_at

# created at (datetime)
project.created_at

# last activity time (datetime)
project.last_activity_time

# number of required labels per consensus data row (int) 
project.auto_audit_number_of_labels

# default percentage of consensus data rows per batch (float)
project.auto_audit_percentage

# created by (relationship to User object)
user = project.created_by()

# organization (relationship to Organization object)
organization = project.organization()

Get the ontology

# get the ontology connected to the project (relationship to Ontology object)
ontology = project.ontology()

Get the benchmarks

# get the benchmarks (relationship to Benchmark objects)
benchmarks = project.benchmarks()

# inspect one benchmark
next(benchmarks)

# inspect all benchmarks
for benchmark in benchmarks:
  print(benchmark)
  
# for ease of use, you can convert the paginated collection to a list
list(benchmarks)

Get the webhooks

# get the webhooks connected to the project (relationship to Webhook objects)
webhooks = project.webhooks()

# inspect one webhook
next(webhooks)

# inspect all webhooks
for webhook in webhooks:
  print(webhook)
  
# for ease of use, you can convert the paginated collection to a list
list(webhooks)

Get data row priority

Use project.labeling_parameter_overrides to get a list of labeling parameter overrides (LPOs), which define the priority for each data row in the override list. Use set_labeling_parameter_overrides and update_data_row_labeling_priority to modify data row priority.

# gets the LPOs created in the project (relationship to LabelingParameterOverride objects)
lpos = project.labeling_parameter_overrides()

# inspect one LPO
next(lpos)

# inspect all LPOs
for lpo in lpos:
  print(lpo)

# for ease of use, you can convert the paginated collection to a list
list(lpos)

# Get the data row id 
for lpo in lpos:
  print(lpo)
  print("Data row:", lpo.data_row().uid)

Get the number of labels

Use project.get_label_count() to return the sum of labels in the different task queues of a project.

# Return the number of labels
project.get_label_count()

Copy data rows and labels

To copy data rows and labels from a source project to a different project, use the client.send_to_annotate_from_catalog method of the Labelbox client.

Send to Annotate does not currently support consensus projects.

Parameters

When you send data rows with labels to the destination project, you can include or exclude certain parameters in a Python dictionary. At a minimum, source_project_id must be provided:

  • source_project_id
    • The ID of the project where the data rows with labels originate.
  • annotations_ontology_mapping
    • A dictionary that maps the source project's ontology feature schema IDs to the destination project's ontology feature schema IDs. If left empty, only the data rows are sent to the destination project, without labels.
    • {"<source_feature_schema_id>" : "<destination_feature_schema_id>"}
  • exclude_data_rows_in_project
    • Excludes data rows that are already in the project.
  • override_existing_annotations_rule
    • The strategy defines how to handle conflicts in classifications between the data rows that already exist in the project and incoming labels from the source project.
      • Defaults to ConflictResolutionStrategy.KeepExisting
      • Options include:
        • ConflictResolutionStrategy.KeepExisting
        • ConflictResolutionStrategy.OverrideWithPredictions
        • ConflictResolutionStrategy.OverrideWithAnnotations
  • batch_priority
    • The priority of the batch.

from labelbox.schema.conflict_resolution_strategy import ConflictResolutionStrategy

send_to_annotate_params = {
    "source_project_id": project.uid,
    "annotations_ontology_mapping": annotation_ontology_mapping, # to be defined
    "exclude_data_rows_in_project": False,
    "override_existing_annotations_rule": ConflictResolutionStrategy.OverrideWithPredictions,
    "batch_priority": 5,
}

# Get the ID of the workflow task (task queue) to send the data rows to. If sent to the initial labeling queue, labels will be pre-labels.
queue_id = [queue.uid for queue in destination_project.task_queues() if queue.queue_type == "MANUAL_REVIEW_QUEUE"][0]

task = client.send_to_annotate_from_catalog(
    destination_project_id=destination_project.uid,
    task_queue_id=queue_id, # ID of workflow task, set ID to None if you want to send data rows with labels to the Done queue.
    batch_name="Prediction Import Demo Batch",
    data_rows=lb.GlobalKeys(
        global_keys # Provide a list of global keys from source project
    ),
    params=send_to_annotate_params
)

task.wait_till_done()

print(f"Errors: {task.errors}")
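The parameter rules listed above can be captured in a small builder that enforces the required source_project_id. This is a sketch under stated assumptions, not SDK code: build_send_to_annotate_params is a hypothetical helper, the key names follow the dictionary in the example above, and the conflict rule is left out when not given on the assumption that the documented default then applies.

```python
from typing import Any, Dict, Optional


def build_send_to_annotate_params(
    source_project_id: str,
    annotations_ontology_mapping: Optional[Dict[str, str]] = None,
    exclude_data_rows_in_project: bool = False,
    override_existing_annotations_rule: Any = None,
    batch_priority: int = 5,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble the params dictionary described above."""
    if not source_project_id:
        raise ValueError("source_project_id is required")
    params: Dict[str, Any] = {
        "source_project_id": source_project_id,
        "annotations_ontology_mapping": dict(annotations_ontology_mapping or {}),
        "exclude_data_rows_in_project": exclude_data_rows_in_project,
        "batch_priority": batch_priority,
    }
    # Assumption: omitting the rule leaves the documented default in effect.
    if override_existing_annotations_rule is not None:
        params["override_existing_annotations_rule"] = override_existing_annotations_rule
    return params


params = build_send_to_annotate_params(
    "<source_project_id>",
    annotations_ontology_mapping={
        "<source_feature_schema_id>": "<destination_feature_schema_id>"
    },
)
```

In real use, a ConflictResolutionStrategy member would be passed as override_existing_annotations_rule and the result handed to client.send_to_annotate_from_catalog(params=params, ...).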