A developer guide for creating and modifying projects via the Python SDK.
Client
import labelbox as lb
from labelbox.schema.quality_mode import QualityMode
client = lb.Client(api_key="<YOUR_API_KEY>")
Create a project
Argument | Description |
---|---|
media_type | One of the following values: lb.MediaType.Audio, lb.MediaType.Conversational, lb.MediaType.Dicom, lb.MediaType.Document, lb.MediaType.Geospatial_Tile, lb.MediaType.Html, lb.MediaType.Image, lb.MediaType.Json, lb.MediaType.Simple_Tile, lb.MediaType.Text, lb.MediaType.Video |
# Create a new project
project = client.create_project(
name="<project_name>",
description="<project_description>", # optional
media_type=lb.MediaType.Image # specify the media type
)
# create a project that uses benchmark quality control
project = client.create_project(
name="benchmark-project",
description="description",
media_type=lb.MediaType.Image,
quality_mode=QualityMode.Benchmark
)
# create a project that uses consensus quality control
project = client.create_project(
name="consensus-project",
description="description",
media_type=lb.MediaType.Image,
quality_mode=QualityMode.Consensus
)
As of SDK v3.52.0, auto_audit_number_of_labels and auto_audit_percentage can no longer be passed to create_project. Instead, use quality_mode to specify the desired project type.
Get a project
project = client.get_project("<project_id>")
# alternatively, you can get a project by name
project = client.get_projects(where=lb.Project.name == "<project_name>").get_one()
Methods
Create a batch
When creating a batch to send to a project, you must supply either global_keys or data_rows as an argument. If you use the data_rows argument, you can supply either a list of data row IDs or a list of DataRow class objects.
Optionally, you can supply a priority, ranging from 1 (highest) to 5 (lowest), at which the batch should be labeled. This determines the order in which the included data rows appear in the labeling queue relative to other batches. If no value is provided, the batch assumes the lowest priority.
For more details, see Batch.
project.create_batch(
name="<unique_batch_name>",
global_keys=["key1", "key2", "key3"],
priority=5,
)
# if the project uses consensus, you can optionally supply a dictionary with consensus settings
# if provided, the batch will use consensus with the specified coverage and votes
project.create_batch(
name="<unique_batch_name>",
data_rows=["<data_row_id>", "<data_row_id>"],
priority=1,
consensus_settings={"number_of_labels": 3, "coverage_percentage": 0.1}
)
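The argument rules above can be checked before calling the SDK. The following is a minimal sketch, not part of the SDK: build_batch_kwargs is a hypothetical helper, and it assumes that global_keys and data_rows are mutually exclusive, as stated for create_batches later in this guide.

```python
from typing import Optional, Sequence


def build_batch_kwargs(
    name: str,
    global_keys: Optional[Sequence[str]] = None,
    data_rows: Optional[Sequence[str]] = None,
    priority: int = 5,  # lowest priority, the default described above
) -> dict:
    """Validate batch arguments before passing them to project.create_batch()."""
    # Assumption: exactly one of global_keys or data_rows is supplied.
    if (global_keys is None) == (data_rows is None):
        raise ValueError("Supply exactly one of global_keys or data_rows")
    # Priority ranges from 1 (highest) to 5 (lowest).
    if not 1 <= priority <= 5:
        raise ValueError("priority must be between 1 and 5")
    kwargs = {"name": name, "priority": priority}
    if global_keys is not None:
        kwargs["global_keys"] = list(global_keys)
    else:
        kwargs["data_rows"] = list(data_rows)
    return kwargs


kwargs = build_batch_kwargs("<unique_batch_name>", global_keys=["key1", "key2"])
print(kwargs)
# With a real project object: project.create_batch(**kwargs)
```

Validating locally only catches obvious mistakes early; the SDK and the server remain the source of truth for what is accepted.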
Create multiple batches
The project.create_batches() method accepts up to 1 million data rows. If necessary, the data rows are chunked into batches of up to 100k data rows, which is the maximum batch size.
This method takes either a list of data row IDs or DataRow objects in the data_rows argument, or a list of global keys in the global_keys argument; the two arguments cannot be combined in the same call. Batches are created with the specified name_prefix argument and a unique suffix to ensure unique batch names. The suffix is a 4-digit number starting at 0000.
For example, if the name prefix is demo-create-batches- and three batches are created, the names will be demo-create-batches-0000, demo-create-batches-0001, and demo-create-batches-0002. This method throws an error if a batch with the same name already exists.
task = project.create_batches(
name_prefix="demo-create-batches-",
global_keys=global_keys,
priority=5
)
print("Errors: ", task.errors())
print("Result: ", task.result())
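To anticipate how many batches a call will produce and what they will be named, the chunking and naming rules described above can be reproduced locally. This is only an illustrative sketch: plan_batches is a hypothetical helper, and the actual chunking is performed by create_batches itself, which may distribute rows differently.

```python
from typing import List, Tuple

MAX_BATCH_SIZE = 100_000  # maximum batch size stated above


def plan_batches(total_rows: int, name_prefix: str) -> List[Tuple[str, int]]:
    """Return (batch_name, row_count) pairs for a hypothetical chunking plan."""
    plan = []
    for index, start in enumerate(range(0, total_rows, MAX_BATCH_SIZE)):
        size = min(MAX_BATCH_SIZE, total_rows - start)
        # The suffix is a 4-digit, zero-padded counter starting at 0000.
        plan.append((f"{name_prefix}{index:04d}", size))
    return plan


plan = plan_batches(250_000, "demo-create-batches-")
for name, size in plan:
    print(name, size)
```

Checking the planned names against existing batch names beforehand can help avoid the duplicate-name error mentioned above.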
Create batches from a dataset
To create batches in a project from all the data rows of a dataset, without gathering global keys or IDs and iterating over subsets of data rows, use the project.create_batches_from_dataset() method.
This method takes a dataset ID and creates a batch (or multiple batches if there are more than 100k data rows) containing all data rows not already in the project. The name_prefix argument and batch naming follow the same logic described in the section immediately above.
dataset = client.get_dataset("<dataset_id>")
task = project.create_batches_from_dataset(
name_prefix="demo-dataset-",
dataset_id=dataset.uid,
priority=5
)
print("Errors: ", task.errors())
print("Result: ", task.result())
Data rows are added to datasets asynchronously and may require processing time, so they may not be available immediately after being added to a dataset.
When adding data rows to a dataset, you can wait for the process to complete.
# upload data
task = dataset.create_data_rows(...)
# optionally, wait for data to be processed
task.wait_till_done()
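The upload and batching steps can be combined so that batches are created only after the data rows have finished processing. The sketch below is an assumption-laden illustration: upload_then_batch is a hypothetical helper, it assumes the upload task exposes an errors attribute that is empty or None on success, and it uses stand-in objects so it runs without an API key.

```python
from types import SimpleNamespace


def upload_then_batch(dataset, project, assets, name_prefix, priority=5):
    """Upload data rows, wait for processing, then batch the whole dataset."""
    upload_task = dataset.create_data_rows(assets)
    upload_task.wait_till_done()  # block until processing finishes
    if upload_task.errors:  # assumption: errors is empty or None on success
        raise RuntimeError(f"Data row upload failed: {upload_task.errors}")
    return project.create_batches_from_dataset(
        name_prefix=name_prefix,
        dataset_id=dataset.uid,
        priority=priority,
    )


# Stand-in objects that record the call order; with the SDK,
# pass a real Dataset and Project instead.
calls = []
fake_task = SimpleNamespace(wait_till_done=lambda: calls.append("wait"), errors=None)
fake_dataset = SimpleNamespace(
    uid="<dataset_id>",
    create_data_rows=lambda assets: calls.append("upload") or fake_task,
)
fake_project = SimpleNamespace(
    create_batches_from_dataset=lambda **kwargs: calls.append("batch") or kwargs,
)
result = upload_then_batch(
    fake_dataset, fake_project, [{"row_data": "<url>"}], "demo-dataset-"
)
print(calls)
```

Waiting before batching matters because create_batches_from_dataset only includes data rows that are available at the time of the call.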
Get the batches
# get the batches (objects of the Batch class)
batches = project.batches()
# inspect one batch
next(batches)
# inspect all batches
for batch in batches:
    print(batch)
# for ease of use, you can convert the paginated collection to a list
list(batches)
Connect an ontology
# the argument must be an object of the Ontology class
project.setup_editor(ontology)
Get the members and their roles
The scope of a member is provided by the access_from attribute of the ProjectMember class.
It can have one of the following values:
- ORGANIZATION: project membership is derived from the organization role
- PROJECT_MEMBERSHIP: access is given specifically to the project
- USER_GROUP: access is given via a group
# get the members (objects of the ProjectMember class with relationships to a User and Role)
members = project.members()
# inspect one member
member = next(members)
print(member.user(), member.role(), member.access_from)
# display member info
print(member.user().uid, member.user().email, member.role().name, member.access_from, sep="\t")
# inspect all members
for member in members:
    print(member.user(), member.role(), member.access_from)
# for ease of use, you can convert the paginated collection to a list
list(members)
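When auditing access, it can help to group members by their access_from scope. The following sketch uses only the attributes shown above (access_from, user(), and email); group_members_by_scope and fake_member are hypothetical names, and the stand-in members use plain strings for access_from, which is an assumption about how the SDK represents that value.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_members_by_scope(members):
    """Map each access_from value to the emails of members with that scope."""
    grouped = defaultdict(list)
    for member in members:
        grouped[member.access_from].append(member.user().email)
    return dict(grouped)


def fake_member(email, access_from):
    """Stand-in for a ProjectMember so the sketch runs without an API key."""
    return SimpleNamespace(
        user=lambda: SimpleNamespace(email=email),
        access_from=access_from,
    )


members = [
    fake_member("a@example.com", "ORGANIZATION"),
    fake_member("b@example.com", "USER_GROUP"),
    fake_member("c@example.com", "ORGANIZATION"),
]
grouped = group_members_by_scope(members)
print(grouped)
# With the SDK: group_members_by_scope(project.members())
```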
Upload labeling instructions
Note that if the ontology connected to your project is connected to other projects, calling this method will attach the instructions to those projects as well.
# must be a PDF or HTML file
project.upsert_instructions("<local_file_path>")
Get the workflow tasks
# get the task queues (relationship to TaskQueue objects)
task_queues = project.task_queues()
# inspect all task queues
for task_queue in task_queues:
    print(task_queue)
Move data rows to a workflow task
project.move_data_rows_to_task_queue(
data_row_ids=["<data_row_id>", "<data_row_id>"],
task_queue_id="<task_queue_id>"
)
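The task queue ID is usually looked up from the project's task queues first. The sketch below shows one way to do that by name. It assumes each task queue object exposes name and uid attributes; find_task_queue and the queue names used here are hypothetical, and the stand-in objects exist only so the example runs without an API key.

```python
from types import SimpleNamespace


def find_task_queue(task_queues, name):
    """Return the first task queue whose name matches, or None if absent."""
    return next((queue for queue in task_queues if queue.name == name), None)


# Stand-in task queues; with the SDK, pass project.task_queues() instead.
fake_queues = [
    SimpleNamespace(uid="q1", name="Initial labeling task"),
    SimpleNamespace(uid="q2", name="Initial review task"),
]
queue = find_task_queue(fake_queues, "Initial review task")
print(queue.uid if queue else "not found")
# With the SDK (assuming the queue was found):
# project.move_data_rows_to_task_queue(
#     data_row_ids=["<data_row_id>"], task_queue_id=queue.uid
# )
```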
Set data row priority
Once a batch has been added to a project, you can set the priority of its data rows. To do so, define a list of labeling parameter overrides (LPOs), which are tuples that set the priority for individual data rows.
Each override has three values: an object of the DataRow class, the new priority, and the number of labels affected. The priority and the number of labels must be integers.
Override lists are limited to 1,000 items; larger lists trigger an error.
Once the override list is defined, pass it to project.set_labeling_parameter_overrides to change the priority of the corresponding data rows. Use project.labeling_parameter_overrides to get a list of data row priorities.
Data row priority can be set only through the Python SDK, as shown here:
# get some data rows queued in the project
batch = project.batches().get_one()
data_rows = list(batch.export_data_rows())
# set the LPOs
# each LPO must be a tuple containing (DataRow, priority<int>, number_of_labels<int>)
lpos = [(data_rows[0], 1, 1), (data_rows[1], 2, 1), (data_rows[2], 3, 1)]
project.set_labeling_parameter_overrides(lpos)
# check results
project_lpos = list(project.labeling_parameter_overrides())
for lpo in project_lpos:
    print(lpo)
This example sets the priority of the first three items in the batch data rows and then displays the results.
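Because override lists are limited to 1,000 items, larger lists need to be split before they are submitted. The following is a minimal sketch of that splitting, with a type check on the two integer fields. chunk_overrides is a hypothetical helper, and the strings in the example are placeholders for DataRow objects.

```python
from typing import List, Sequence, Tuple

MAX_OVERRIDES = 1_000  # per-call limit stated above

Override = Tuple[object, int, int]  # (DataRow, priority, number_of_labels)


def chunk_overrides(
    overrides: Sequence[Override], limit: int = MAX_OVERRIDES
) -> List[List[Override]]:
    """Validate overrides and split them into lists of at most `limit` items."""
    for _data_row, priority, number_of_labels in overrides:
        if not isinstance(priority, int) or not isinstance(number_of_labels, int):
            raise TypeError("priority and number_of_labels must be integers")
    return [
        list(overrides[start:start + limit])
        for start in range(0, len(overrides), limit)
    ]


# Placeholder strings stand in for DataRow objects.
placeholder_lpos = [(f"<data_row_{i}>", 1, 1) for i in range(2_500)]
chunks = chunk_overrides(placeholder_lpos)
print([len(chunk) for chunk in chunks])
# With the SDK:
# for chunk in chunks:
#     project.set_labeling_parameter_overrides(chunk)
```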
Add project tags
tags = project.update_project_resource_tags(["<project_tag_id>", "<project_tag_id>"])
Export the labels
For complete details, see Export overview.
# Set the export params to include or exclude certain fields. Make sure each field is set as intended.
export_params = {
"attachments": True,
"metadata_fields": True,
"data_row_details": True,
"project_details": True,
"performance_details": True
}
# You can set the range for last_activity_at and label_created_at.
# For context, last_activity_at captures the creation and modification of labels, metadata, status, comments, and reviews.
# Note: filters are combined with AND logic, so using one filter is usually sufficient.
filters = {
"last_activity_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"]
}
export_task = project.export_v2(params=export_params, filters=filters)
export_task.wait_till_done()
if export_task.errors:
    print(export_task.errors)
export_json = export_task.result
print("results: ", export_json)
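Rather than hard-coding the timestamp strings, the date range filter can be built from datetime objects. This sketch assumes the "YYYY-MM-DD HH:MM:SS" format used in the example above is what the export expects; last_activity_filter is a hypothetical helper.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # matches the strings shown above


def last_activity_filter(start: datetime, end: datetime) -> dict:
    """Build a last_activity_at filter dict for the given time range."""
    if start > end:
        raise ValueError("start must not be later than end")
    return {
        "last_activity_at": [
            start.strftime(TIMESTAMP_FORMAT),
            end.strftime(TIMESTAMP_FORMAT),
        ]
    }


filters = last_activity_filter(
    datetime(2024, 1, 1), datetime(2024, 1, 31, 23, 59, 59)
)
print(filters)
# With the SDK: project.export_v2(params=export_params, filters=filters)
```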
Export the queued data rows
Note that for consensus or benchmark data rows, if the data row has been labeled once, it will not be included in the export of queued data rows.
# export results of this method are cached for 30 minutes
data_rows = project.export_queued_data_rows()
# optionally, you can include metadata in the export
data_rows = project.export_queued_data_rows(include_metadata=True)
Export the issues and comments
import requests
url = project.export_issues()
issues = requests.get(url).json()
# optionally, you can export only the open or resolved issues
open_issues_url = project.export_issues(status="Open")
resolved_issues_url = project.export_issues(status="Resolved")
Update a project
project.update(name="<new_project_name>")
Delete a project
Deleting a project cannot be undone
This method deletes the project along with all labels made in the project. This action cannot be reverted without the assistance of Labelbox support.
project.delete()
Attributes
Get the basics
# name (str)
project.name
# description (str)
project.description
# updated at (datetime)
project.updated_at
# created at (datetime)
project.created_at
# last activity time (datetime)
project.last_activity_time
# number of required labels per consensus data row (int)
project.auto_audit_number_of_labels
# default percentage of consensus data rows per batch (float)
project.auto_audit_percentage
# created by (relationship to User object)
user = project.created_by()
# organization (relationship to Organization object)
organization = project.organization()
As of SDK v3.52.0, the auto_audit_number_of_labels and auto_audit_percentage attributes can no longer be passed to the create_project method. To learn more, see Create a project.
Get the ontology
# get the ontology connected to the project (relationship to Ontology object)
ontology = project.ontology()
Get the benchmarks
# get the benchmarks (relationship to Benchmark objects)
benchmarks = project.benchmarks()
# inspect one benchmark
next(benchmarks)
# inspect all benchmarks
for benchmark in benchmarks:
    print(benchmark)
# for ease of use, you can convert the paginated collection to a list
list(benchmarks)
Get the webhooks
# get the webhooks connected to the project (relationship to Webhook objects)
webhooks = project.webhooks()
# inspect one webhook
next(webhooks)
# inspect all webhooks
for webhook in webhooks:
    print(webhook)
# for ease of use, you can convert the paginated collection to a list
list(webhooks)
Get data row priority
Use project.labeling_parameter_overrides to get a list of labeling parameter overrides (LPOs), which define the priority for each data row in the override list. Use set_labeling_parameter_overrides to set data row priority.
# gets the LPOs created in the project (relationship to LabelingParameterOverride objects)
lpos = project.labeling_parameter_overrides()
# inspect one LPO
next(lpos)
# inspect all LPOs
for lpo in lpos:
    print(lpo)
# for ease of use, you can convert the paginated collection to a list
list(lpos)