Project

A developer guide for creating and modifying projects via the Python SDK.

Client

import labelbox as lb
client = lb.Client(api_key="<YOUR_API_KEY>")

Create a project

Argument: media_type
Description: The argument must take one of the following values:
- lb.MediaType.Audio
- lb.MediaType.Conversational
- lb.MediaType.Dicom
- lb.MediaType.Document
- lb.MediaType.Geospatial_Tile
- lb.MediaType.Html
- lb.MediaType.Image
- lb.MediaType.Json
- lb.MediaType.Simple_Tile
- lb.MediaType.Text
- lb.MediaType.Video
# Create a new project
project = client.create_project(
    name="<project_name>",
    description="<project_description>",    # optional
    media_type=lb.MediaType.Image           # specify the media type
)

# create a project that uses benchmark quality control
project = client.create_project(
  name="benchmark-project",
  description="description",
  media_type=lb.MediaType.Image,
  # setting both arguments below to 1 makes the project use benchmark quality control
  # if neither argument is specified, the project defaults to benchmark mode
  auto_audit_percentage=1,
  auto_audit_number_of_labels=1
)
# create a project that uses consensus quality control
project = client.create_project(
  name="consensus-project",
  description="description",
  media_type=lb.MediaType.Image,
  # the two arguments below set the defaults used when a batch is created without consensus settings
  # in this example, 10% of the data rows in each batch must be labeled 3 times
  auto_audit_percentage=0.1,
  auto_audit_number_of_labels=3
)
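
To see what these two arguments imply for labeling volume, the sketch below estimates how many labels a batch would need. It assumes that data rows outside the consensus share are labeled once and that the share is rounded to the nearest whole data row; neither assumption is confirmed by this guide, and estimate_label_count is a hypothetical helper, not part of the SDK.

```python
def estimate_label_count(batch_size, coverage_percentage, number_of_labels):
    """Estimate the labels needed for one batch under consensus settings.

    Assumptions (not confirmed by the SDK): rows outside the consensus
    share are labeled once, and the share is rounded to the nearest row.
    """
    if not 0 <= coverage_percentage <= 1:
        raise ValueError("coverage_percentage must be between 0 and 1")
    if number_of_labels < 1:
        raise ValueError("number_of_labels must be at least 1")
    consensus_rows = round(batch_size * coverage_percentage)
    single_label_rows = batch_size - consensus_rows
    return consensus_rows * number_of_labels + single_label_rows


# 200 data rows, 10% of them labeled 3 times: 20 * 3 + 180 * 1
print(estimate_label_count(200, 0.1, 3))  # 240
```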

Get a project

project = client.get_project("<project_id>")

# alternatively, you can get a project by name
project = client.get_projects(where=lb.Project.name == "<project_name>").get_one()

Methods

Create a batch

When creating a batch to send to a project, one of either global_keys or data_rows must be supplied as an argument. If using the data_rows argument, you can supply either a list of data row IDs or a list of DataRow class objects.

Optionally, you can supply a labeling priority for the batch, ranging from 1 (highest) to 5 (lowest). The priority determines the order in which the batch's data rows appear in the labeling queue relative to other batches. If no value is provided, the batch is assigned the lowest priority.

For more details, see Batch.

project.create_batch(
  name="<unique_batch_name>",
  global_keys=["key1", "key2", "key3"],
  priority=5,
)

# if the project uses consensus, you can optionally supply a dictionary with consensus settings
# if provided, the batch will use consensus with the specified coverage and votes
project.create_batch(
  name="<unique_batch_name>",
  data_rows=["<data_row_id>", "<data_row_id>"],
  priority=1,
  consensus_settings={"number_of_labels": 3, "coverage_percentage": 0.1}
)
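
The argument rules described above for create_batch can be checked locally before making the call. The following is a minimal sketch: build_batch_kwargs is a hypothetical helper, not an SDK function. It reads "one of either" as "exactly one" and treats 5 as the default priority because the text says an unspecified priority is the lowest; both are assumptions.

```python
def build_batch_kwargs(name, global_keys=None, data_rows=None, priority=5,
                       consensus_settings=None):
    """Validate and assemble keyword arguments for project.create_batch."""
    # Assumption: exactly one of the two selectors may be supplied.
    if (global_keys is None) == (data_rows is None):
        raise ValueError("supply exactly one of global_keys or data_rows")
    if not isinstance(priority, int) or not 1 <= priority <= 5:
        raise ValueError("priority must be an integer from 1 (highest) to 5 (lowest)")
    kwargs = {"name": name, "priority": priority}
    if global_keys is not None:
        kwargs["global_keys"] = list(global_keys)
    else:
        kwargs["data_rows"] = list(data_rows)
    if consensus_settings is not None:
        kwargs["consensus_settings"] = dict(consensus_settings)
    return kwargs


kwargs = build_batch_kwargs("<unique_batch_name>", global_keys=["key1", "key2"])
print(kwargs)
# With a real project object: project.create_batch(**kwargs)
```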

Get the batches

# get the batches (objects of the Batch class)
batches = project.batches()

# inspect one batch
next(batches)

# inspect all batches
for batch in batches:
  print(batch)
    
# for ease of use, you can convert the paginated collection to a list
list(batches)

Connect an ontology

# the argument must be an object of the Ontology class
project.setup_editor(ontology)

Get the members and their roles

# get the members (objects of the ProjectMember class with relationships to a User and Role)  
members = project.members()

# inspect one member
member = next(members)
print(member.user(), member.role())

# inspect all members
for member in members:
  print(member.user(), member.role())
  
# for ease of use, you can convert the paginated collection to a list
list(members)

Upload labeling instructions

Note that if the ontology connected to your project is connected to other projects, calling this method will attach the instructions to those projects as well.

# must be a PDF or HTML file
project.upsert_instructions("<local_file_path>")

Get the workflow tasks

# get the task queues (relationship to TaskQueue objects)
task_queues = project.task_queues()

# inspect all task queues
for task_queue in task_queues:
  print(task_queue)

Move data rows to a workflow task

project.move_data_rows_to_task_queue(
  data_row_ids=["<data_row_id>", "<data_row_id>"],
  task_queue_id="<task_queue_id>"
)
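
The task_queue_id passed above has to come from one of the project's task queues. As a rough sketch, the hypothetical helper below looks up a queue by name. It assumes each TaskQueue object exposes name and uid attributes; the stand-in objects and queue names are illustrative only.

```python
from types import SimpleNamespace


def find_task_queue_id(task_queues, name):
    """Return the uid of the first task queue whose name matches, else None."""
    for task_queue in task_queues:
        if task_queue.name == name:
            return task_queue.uid
    return None


# Stand-in objects for illustration; with the SDK, pass project.task_queues()
example_queues = [
    SimpleNamespace(uid="<task_queue_id_1>", name="Initial labeling task"),
    SimpleNamespace(uid="<task_queue_id_2>", name="Initial review task"),
]
print(find_task_queue_id(example_queues, "Initial review task"))
```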

Further specify data row priority

After data rows have been added to a project in a batch, their priority can be adjusted further with labeling parameter overrides (LPOs). Here the priority can be any integer value. Currently, modifying priorities in this way is only supported via the Python SDK.

# get some data rows queued in the project
batch = project.batches().get_one()
data_rows = list(batch.export_data_rows())

# set the labeling parameter overrides (LPOs)
# each LPO must be a tuple containing (DataRow, priority<int>, number_of_labels<int>)
lpos = [(data_rows[0], 1, 1), (data_rows[1], 2, 1), (data_rows[2], 3, 1)]
project.set_labeling_parameter_overrides(lpos)

# check results
project_lpos = list(project.labeling_parameter_overrides())
for lpo in project_lpos:
  print(lpo)
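
When many data rows need overrides, the tuples can be generated rather than written by hand. The sketch below assigns ascending priorities in list order; build_lpos is a hypothetical helper, and the placeholder strings stand in for DataRow objects.

```python
def build_lpos(data_rows, number_of_labels=1, start_priority=1):
    """Build (data_row, priority, number_of_labels) tuples in list order."""
    return [
        (data_row, priority, number_of_labels)
        for priority, data_row in enumerate(data_rows, start=start_priority)
    ]


# Placeholders for illustration; with the SDK, pass DataRow objects instead
lpos = build_lpos(["<data_row_a>", "<data_row_b>", "<data_row_c>"])
print(lpos)
# With a real project object: project.set_labeling_parameter_overrides(lpos)
```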

Add project tags

tags = project.update_project_resource_tags(["<project_tag_id>", "<project_tag_id>"])

Export the labels

For complete details, see Export overview.

# Set the export params to include or exclude optional fields in the export.
export_params = {
  "attachments": True,
  "metadata_fields": True,
  "data_row_details": True,
  "project_details": True,
  "performance_details": True
}

# You can set the range for last_activity_at and label_created_at. 
# For context, last_activity_at captures the creation and modification of labels, metadata, status, comments, and reviews.
# Note: The filters are combined with AND logic, so using one filter is usually sufficient.
filters = {
  "last_activity_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"]
}

export_task = project.export_v2(params=export_params, filters=filters)
export_task.wait_till_done()

if export_task.errors:
  print(export_task.errors)

export_json = export_task.result
print("results: ", export_json)
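
The date-range filters above are plain strings, so it can be convenient to build them from datetime objects. This is a small sketch under the assumption that the "YYYY-MM-DD HH:MM:SS" format shown in the example is what the export expects; date_range_filter is a hypothetical helper and the dates are illustrative.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # matches the strings in the example above
SUPPORTED_FIELDS = ("last_activity_at", "label_created_at")


def date_range_filter(field, start, end):
    """Return an export filter dict for a [start, end] datetime range."""
    if field not in SUPPORTED_FIELDS:
        raise ValueError(f"field must be one of {SUPPORTED_FIELDS}")
    if start > end:
        raise ValueError("start must not be later than end")
    return {field: [start.strftime(TIMESTAMP_FORMAT), end.strftime(TIMESTAMP_FORMAT)]}


filters = date_range_filter(
    "last_activity_at", datetime(2023, 1, 1), datetime(2023, 12, 31, 23, 59, 59)
)
print(filters)
# {'last_activity_at': ['2023-01-01 00:00:00', '2023-12-31 23:59:59']}
```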

Export the queued data rows

Note that for consensus or benchmark data rows, if the data row has been labeled once, it will not be included in the export of queued data rows.

# export results of this method are cached for 30 minutes
data_rows = project.export_queued_data_rows()

# optionally, you can include metadata in the export
data_rows = project.export_queued_data_rows(include_metadata=True)

Export the issues and comments

import requests

url = project.export_issues()
issues = requests.get(url).json()

# optionally, you can export only the open or resolved issues
open_issues_url = project.export_issues(status="Open")
resolved_issues_url = project.export_issues(status="Resolved")

Update a project

project.update(name="<new_project_name>")

Delete a project

❗️ Deleting a project cannot be undone

This method deletes the project along with all labels made in the project. This action cannot be reverted without the assistance of Labelbox support.

project.delete()
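
Because deletion is irreversible, a script may want a confirmation step before calling delete(). One possible guard is sketched below: delete_if_confirmed is a hypothetical helper, not an SDK function, and the stand-in class exists only so the example runs without a real project.

```python
def delete_if_confirmed(project, confirmation):
    """Delete the project only if confirmation equals project.name exactly."""
    if confirmation != project.name:
        return False
    project.delete()
    return True


class StandInProject:
    """Minimal stand-in exposing the name attribute and delete() used above."""

    def __init__(self, name):
        self.name = name
        self.deleted = False

    def delete(self):
        self.deleted = True


demo = StandInProject("<project_name>")
print(delete_if_confirmed(demo, "wrong name"))      # False, nothing deleted
print(delete_if_confirmed(demo, "<project_name>"))  # True, delete() was called
```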

Attributes

Get the basics

# name (str)
project.name

# description (str)
project.description

# updated at (datetime)
project.updated_at

# created at (datetime)
project.created_at

# last activity time (datetime)
project.last_activity_time

# number of required labels per consensus data row (int)
project.auto_audit_number_of_labels

# default percentage of consensus data rows per batch (float)
project.auto_audit_percentage

# created by (relationship to User object)
user = project.created_by()

# organization (relationship to Organization object)
organization = project.organization()

Get the ontology

# get the ontology connected to the project (relationship to Ontology object)
ontology = project.ontology()

Get the benchmarks

# get the benchmarks (relationship to Benchmark objects)
benchmarks = project.benchmarks()

# inspect one benchmark
next(benchmarks)

# inspect all benchmarks
for benchmark in benchmarks:
  print(benchmark)
  
# for ease of use, you can convert the paginated collection to a list
list(benchmarks)

Get the webhooks

# get the webhooks connected to the project (relationship to Webhook objects)
webhooks = project.webhooks()

# inspect one webhook
next(webhooks)

# inspect all webhooks
for webhook in webhooks:
  print(webhook)
  
# for ease of use, you can convert the paginated collection to a list
list(webhooks)

Get the LPOs

# gets the LPOs created in the project (relationship to LabelingParameterOverride objects)
lpos = project.labeling_parameter_overrides()

# inspect one LPO
next(lpos)

# inspect all LPOs
for lpo in lpos:
  print(lpo)

# for ease of use, you can convert the paginated collection to a list
list(lpos)