Labelbox Python SDK

How to use these docs

These docs will help you get started with the Python SDK. The Python SDK was created for customers who want to integrate with Labelbox programmatically through an extensive set of custom methods.

The Python SDK docs are organized into the following sections:

  • Introduction: Contains instructions for creating your API key, installation, authentication, and API rate limits.

  • Fundamental concepts: Explains pagination, object relationships, object fields, and field caching.

  • Tutorials: Contains full end-to-end exercises for the most common workflows with Labelbox. Includes tutorials on creating a sample project, importing annotations, attaching metadata, accessing video labels, and more.

  • API reference: A comprehensive list of all methods available in the Python SDK along with sample code blocks, descriptions for attributes, and explanations for error messages.

Installation & authentication

The Python SDK allows you to access all of the functionality of the Labelbox API without having to use GraphQL.

Requirements

  • Python version 3.6 or 3.7.

  • Make sure pip is installed.

  • Create your API key in the Account section.

  • Save your API key somewhere as it will be hidden afterward.

Install/upgrade

To install, run pip install labelbox in your command line.

To upgrade, consult the changelog and run pip install --upgrade labelbox. See the Python SDK repo on GitHub.

Authentication

There are 3 ways to set up authentication with Labelbox.

Option 1

Pass your API key as an environment variable in the command line.

Then, import and initialize the API Client.

user@machine:~$ export LABELBOX_API_KEY="<your_api_key>"
user@machine:~$ python3
from labelbox import Client
client = Client()
Option 2

Pass a custom endpoint. This is only applicable for on-premises use cases. If this applies to you, you may pass the API key and server endpoint explicitly when you initialize the Client object. Otherwise, refer to Option 1.

from labelbox import Client
client = Client("<your_api_key_here>", "https://app.your-domain.com/api/graphql")
Option 3

Run this Python script and pass your API key as a string.

from labelbox import Client

if __name__ == '__main__':
    API_KEY = "<your_api_key_here>"
    client = Client(API_KEY)

Rate limits

Labelbox uses rate limits as a safeguard against bursts of incoming requests to maximize stability for all of our customers.

If you hit the rate limit, you’ll get a 429 HTTP status code and a retry-after response header that will contain a value for the number of seconds to wait before making a follow-up request.

We recommend building a retry mechanism that uses exponential backoff to space out your requests once you reach the quota.
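As a rough illustration, here is a minimal retry sketch with exponential backoff. The call_with_backoff helper and its parameters are placeholders for your own request logic, and the exception check is an assumption; adapt it to however your SDK version surfaces the 429 status and retry-after header.

import time
import random

def call_with_backoff(fn, max_retries=5, base_delay=1.0):
    # Retry fn() with exponential backoff when a rate limit (429) is hit.
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as e:
            # Placeholder check: adjust to match how your SDK reports the 429 status.
            if "429" not in str(e) or attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) + random.random())

# Hypothetical usage, assuming `client` is an authenticated labelbox.Client:
# projects = call_with_backoff(lambda: list(client.get_projects()))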

If you are consistently hitting your rate limit, contact our support team.

Fundamental concepts

Fields (fetch/update)

Each data class has a set of fields. This document explains how to access or update one or more fields on an object as well as how field caching works.

Fetch

Object fields must be accessed as attributes (e.g. project.name). Unlike relationships, fields are cached.

In this example, client.get_projects() returns a PaginatedCollection object. You can iterate through the PaginatedCollection and access the fields you need on each object. To read multiple fields from an object, separate them with commas.

projects = client.get_projects()
for project in projects:
    print(project.name, project.uid)

Update

To update a field, use update() to pass the field and the new value. The following data types support the update method: Project, Dataset, DataRow, Label, Webhook, and Review.

Each data update using object.update() on the client-side immediately performs the same update on the server-side. If the client-side update does not raise an exception, you can assume that the update successfully passed on the server-side.

This example uses project.update() to change a project's name and description.

project = client.get_project("<project_id>")
project.update(name="Project Name", description="New description")

Field caching

When you fetch an object from the server, the client obtains all field values for that object. When you access that obtained field value, the cached value is returned. There is no round-trip fetch to the server to get the field value you have already fetched. Server-side updates that happen after the client-side fetch are not auto-propagated, meaning the values returned will still be the cached values.
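For example (a minimal sketch; the project ID is a placeholder), repeated attribute access reads the cached value, while fetching the object again returns the current server-side values:

project = client.get_project("<project_uid>")
print(project.name)  # value fetched with the object and cached client-side
print(project.name)  # served from the cache; no extra round-trip to the server

# To pick up server-side changes made after the original fetch,
# fetch the object again.
project = client.get_project("<project_uid>")
print(project.name)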

Pagination

Some calls to the API return a very large number of results. To prevent too many results from being returned at once, the Labelbox API limits the number of returned objects and the Python SDK automatically creates a PaginatedCollection instead.

The following points, illustrated by the code sample below, explain how a PaginatedCollection behaves.

  1. For both the top-level object fetch, client.get_projects(), and the relationship call, project.datasets(), a PaginatedCollection object is returned. This PaginatedCollection object takes care of the paginated fetching.

  2. Note that nothing is fetched immediately when the PaginatedCollection object is created.

  3. Round-trips to the server are made only as you iterate through a PaginatedCollection. In the code below, that happens when a list is initialized with a PaginatedCollection and when a PaginatedCollection is iterated over in a for-loop.

  4. You cannot get a count of objects in the relationship from a PaginatedCollection nor can you access objects within it like you would a list (using squared-bracket indexing). You can only iterate over it.

projects = client.get_projects()
type(projects) # Returns a PaginatedCollection

projects = list(projects)
type(projects) # List

project = projects[0]
datasets = project.datasets()
type(datasets) # Returns a PaginatedCollection
for dataset in datasets:
    dataset.name

Iterate over PaginatedCollection

Be careful about converting a PaginatedCollection into a list. This will cause all objects in that collection to be fetched from the server. In cases when you need only some objects (let's say the first 10 objects), it is much faster to iterate over the PaginatedCollection and simply stop once you're done.

This sample script demonstrates how to do this.

data_rows = dataset.data_rows()
first_ten = []
for data_row in data_rows:
    first_ten.append(data_row)
    if len(first_ten) >= 10:
        break

Relationships (fetch/add/update)

If you need to connect two data classes, you can do so via relationships. This document gives you some examples of how to get/add/update relationships between different data classes.

Fetch

Relationships are fetched every time you call them. They are not cached. This is made explicit by defining relationships as callable methods on objects (e.g. project.datasets()).

To get all datasets for a project, define project and call datasets() as a method on project to access the datasets related to that project.

If you will be working with the same related objects repeatedly within a short timeframe, keep references to them instead of re-fetching the relationship each time.

project = client.get_project("<project_uid>")
datasets = project.datasets()

Create

To create a relationship between two objects, call the connect method directly from the relationship.

Example 1: To connect dataset_1 to a project, define project then call connect as a method.

Example 2: To select the Editor as your labeling frontend for a project, select the project then call connect as a method.

# Example 1
project = client.get_project("<project_uid>")
project.datasets.connect(dataset_1)  # Connect dataset_1 to project

# Example 2
project = client.create_project(name="<project_name>")
all_frontends = list(client.get_labeling_frontends())
for frontend in all_frontends:
    if frontend.name == 'Editor':
        project_frontend = frontend
        break
project.labeling_frontend.connect(project_frontend)

Update

To update a relationship, call disconnect() and then connect().

Caution

Note: update() does not work for updating relationships.

project.datasets.disconnect(dataset_1)
project.datasets.connect(dataset_2)

Tutorials

Project setup script

Below is an end-to-end Python script that covers most of the common actions customers perform when configuring Labelbox programmatically with the Python SDK.

from labelbox import Client, Project, schema
import json
import re

def Copy_Ontology(project_name):
    project_id = ""
    try:
        project = next(iter(client.get_projects(where=Project.name == project_name)))
        project_id = project.uid
    except:
        print("Couldn't find project with name " + project_name)
        exit()

    # Use the uid to query the ontology for the project
    res_str = client.execute("""
            query get_ontology($proj_id: ID!) {
                project(where: {id: $proj_id}) {
                    ontology {
                        normalized
                    }
                }
            }
        """, {"proj_id": project_id})
    
    # Make small adjustments to format
    ontology = res_str['project']['ontology']['normalized']
    ontology = re.sub(r'(\'featureSchemaId\'|\'schemaNodeId\'): \'\w*\',', '', str(ontology))
    ontology = ontology.replace("False", "false") 
    
    # Python dict -> JSON format (or use json.dumps())
    ontology = ontology.replace("\'", "\"")

    return ontology

def CreateProject(ontology, project_name):
    project = client.create_project(name=project_name)

    frontends = client.get_labeling_frontends()
    #   ImageEditor : 1 STRONGLY RECOMMENDED    
    #   LegacyEditor: 0    
    project.labeling_frontend.connect(list(frontends)[1])
    project.setup(list(frontends)[1], ontology)
    return project

# View all ways to create data rows in API documentation
# https://labelbox.com/docs/python-api/data-rows
def Create_Dataset_From_Remote_Images(project, dataset_name, images):
    dataset = client.create_dataset(name=dataset_name, projects=project)
    mapped_images = list(map(
        lambda image: {
            schema.data_row.DataRow.row_data: image.url,
            schema.data_row.DataRow.external_id: image.external_id
        },
        images))
    dataset.create_data_rows(mapped_images)
    return dataset

# Metadata can be video, text or image
def Asset_Text_Metadata(dataset, external_id, text):
    dataset = client.get_dataset(dataset.uid)
    data_row = dataset.data_row_for_external_id(external_id)
    data_row.create_metadata("TEXT", text)  

def Get_Labels(project_id):
    # Query for Label Information
    # - Limit of Labels to Query for is 100, a skip function can also be used
    #    - labels(first:5, skip: 10)
    #    - Will grab Labels 10-15
    res_str = client.execute("""
    query APIGetPageOfLabels ($Project_ID : ID!){
         project(where:{id: $Project_ID}) {
             labels(first: 5){
                 id
                 label
                 createdBy{
                     id
                     email }
                 type {
                     id
                     name }
                 secondsToLabel
                 agreement
                 dataRow {
                     id
                     rowData
                 }
             }
    }
    }
    """,{"Project_ID": project_id})
    #   Print Label information
    print(res_str['project']['labels'])
    
    info = [(label['id'], label['createdBy']['email'], label['secondsToLabel'])
            for label in res_str['project']['labels']]
    print(info)

def Export_Project(project):
    url = project.export_labels()
    print(url)

def Get_Example_Ontology():
    return """
        {
            "tools": [
            {
                "required": false,
                "name": "Test Polygon ",
                "color": "#FF0000",
                "tool": "polygon",
                "classifications": [
                {
                    "name": "nested_classification",
                    "instructions": "Nested Classification",
                    "type": "radio",
                    "options": [
                    {
                        "value": "option_1",
                        "label": "Option 1"
                    },
                    {
                        "value": "option_2_",
                        "label": "Option 2 "
                    }
                    ],
                    "required": false
                }
                ]
            }
            ],
            "classifications": [
            {
                "name": "test_classification",
                "instructions": "Test Classification",
                "type": "radio",
                "options": [
                {
                    "value": "yes",
                    "label": "Yes"
                },
                {
                    "value": "no",
                    "label": "No"
                }
                ],
                "required": false
            }
            ]
        }
    """

class External_Image:
    def __init__(self, url, external_id = None):
        self.url = url
        self.external_id = external_id

if __name__ == '__main__':
    client = Client()
    
    ### CHANGE THESE VARIABLES
    test_project_name = "Test Project"
    test_dataset_name = "Astronomy Dataset"
    test_images = [
        External_Image("https://www.universetoday.com/wp-content/uploads/2009/09/bluemarble-e1452178366615.jpg", "earth_image"),
        External_Image("http://www.astronomytrek.com/wp-content/uploads/2017/09/Crab-Nebula.jpg", "crab_nebula")
    ]
    test_annotation_label = "earth_image"
    test_annotation_text = "A photo of Earth"


    # Choose either copying ontology from project or using example
    # Comment out whichever isn't used
    # ontology = Copy_Ontology("Explore the example project")
    ontology = Get_Example_Ontology()

    project = CreateProject(ontology, test_project_name)
    dataset = Create_Dataset_From_Remote_Images(project, test_dataset_name, test_images)
    Asset_Text_Metadata(dataset, test_annotation_label, test_annotation_text)
    
    input("Create label(s) and then press enter")
    Get_Labels(project.uid)
    Export_Project(project)

Model-assisted labeling Python script

This script showcases the basic functionality of the Model-assisted labeling workflow. For an overview of the MAL workflow, see Model-assisted labeling.

This script is broken into two parts. Note: you will need an API key to run this script.

  1. Create a project, dataset, ontology, and select a labeling frontend.

  2. Turn on MAL, get annotation schemas, and create annotations on the data row.

from labelbox import Client
from labelbox import Project
from labelbox import Dataset
import json
import os
from labelbox.schema.bulk_import_request import BulkImportRequest
from labelbox.schema.enums import BulkImportRequestState
import requests
import ndjson

API_KEY = "<API KEY>"
#IMPORT_NAME must be unique per project
IMPORT_NAME = "<IMPORT NAME>"

def get_project_ontology(project_id: str) -> dict:
    """
    Gets the ontology of the given project

    Args:
        project_id (str): The id of the project
    Returns:
        The ontology of the project in a dict format
    """
    res_str = client.execute("""
                    query get_ontology($proj_id: ID!) {
                        project(where: {id: $proj_id}) {
                            ontology {
                                normalized
                            }
                        }
                    }
                """, {"proj_id": project_id})
    return res_str

def turn_on_model_assisted_labeling(client: Client, project_id: str) -> None:
    """
    Turns model assisted labeling on for the given project

    Args:
        client (Client): The client that is connected via API key
        project_id (str): The id of the project
    Returns:
        None

    """
    client.execute("""
         mutation TurnPredictionsOn($proj_id: ID!){
             project(
                 where: {id: $proj_id}
             ){
                 showPredictionsToLabelers(show:true){
                     id
                     showingPredictionsToLabelers
                 }
             }
         }
     """, {"proj_id": project_id})

def get_schema_ids(ontology: dict) -> dict:
    """
    Gets the schema id's of each tool given an ontology
    
    Args:
        ontology (dict): The ontology that we are looking to parse the schema id's from
    Returns:
        A dict containing the tool name and the schema information
    """
    schemas = {}
    for tool in ontology['tools']:
        schema = {
            'schemaNodeId': tool['featureSchemaId'],
            'color': tool['color'],
            'tooltype':tool['tool']
                    }
        schemas[tool['name']] = schema
    return schemas

"""
PART ONE.

This will do the following:
    1. Create a new project named "New MAL Project"
    2. Create a new dataset named "New MAL Dataset" and attach it to the project
    3. Find an updated Editor frontend and attach it to the project
    4. Create a new ontology and attach it to the project
"""

client = Client(API_KEY)
new_project = client.create_project(name="New MAL Project")

new_dataset = client.create_dataset(name="New MAL Dataset", projects = new_project)
new_dataset.create_data_row(row_data="https://storage.googleapis.com/labelbox-sample-datasets/sample-mapillary/lb-segment-data_validation_images_--BJs76vloEaiH-wppzWNA.jpg")

all_frontends = list(client.get_labeling_frontends())
for frontend in all_frontends:
    if frontend.name == 'Editor':
        new_project_frontend = frontend
        break

new_project.labeling_frontend.connect(new_project_frontend)
new_project_ontology = "{\"tools\": [{  \"required\": false, \"name\": \"polygon tool\", \"tool\": \"polygon\", \"color\": \"navy\", \"classifications\": []}, {  \"required\": false, \"name\": \"segmentation tool\", \"tool\": \"superpixel\", \"color\": \"#1CE6FF\", \"classifications\": []}, {  \"required\": false, \"name\": \"point tool\", \"tool\": \"point\", \"color\": \"#FF4A46\", \"classifications\": []}, {  \"required\": false, \"name\": \"bbox tool\", \"tool\": \"rectangle\", \"color\": \"#008941\", \"classifications\": []}, {  \"required\": false, \"name\": \"polyline tool\", \"tool\": \"line\", \"color\": \"#006FA6\", \"classifications\": []}], \"classifications\": [{  \"required\": false, \"instructions\": \"Are there classification options?\", \"name\": \"classification options\", \"type\": \"radio\", \"options\": [{  \"label\": \"Yes\", \"value\": \"yes\"}, {  \"label\": \"Definitely\", \"value\": \"definitely\"}, {  \"label\": \"Third one?!\", \"value\": \"third one?!\"}]}]}"
new_project.setup(new_project_frontend, new_project_ontology)

new_project.datasets.connect(new_dataset)

print(f"The project id is: {new_project.uid}")
print(f"The dataset id is: {new_dataset.uid}")


"""
PART TWO.

This will do the following:
    1. Turn on model assisted labeling for the project
    2. Query for the existing ontology
    3. Get the schemas from the queried ontology
    4. Get the datarow that we want to annotate on
    5. Create a list of annotations for each tool
    6. Upload the annotations
    7. Provide errors, if any

Note: importing annotations is not immediate and can take a few minutes.
     If you would like to track while it is importing, include the following lines:

import logging
logging.basicConfig(level = logging.INFO)
"""
client = Client(API_KEY)

project_for_mal = client.get_project(new_project.uid)
dataset_for_mal = client.get_dataset(new_dataset.uid)
turn_on_model_assisted_labeling(client = client, project_id = project_for_mal.uid)

ontology = get_project_ontology(project_for_mal.uid)['project']['ontology']['normalized']

schemas = get_schema_ids(ontology)

datarow_id = list(dataset_for_mal.data_rows())[0].uid

annotations = [
    {
         "uuid": "d6fc18e4-13ed-11eb-8e85-acde48001122",
         "schemaId": schemas['polygon tool']['schemaNodeId'],
         "dataRow": {"id": datarow_id},
         "polygon": [
             {"x": 132.536, "y": 73.217},
             {"x": 177.494, "y": 69.363},
             {"x": 243.004, "y": 93.769},
             {"x": 198.046, "y": 208.09},
             {"x": 105.562, "y": 140.011}
         ]
    },
    {
         "uuid": "d6fc1a88-13ed-11eb-8e85-acde48001122",
         "schemaId": schemas['segmentation tool']['schemaNodeId'],
         "dataRow": {"id": datarow_id},
         "mask": {    
             "instanceURI": "https://storage.googleapis.com/labelbox-sample-datasets/sample-mapillary/lb-segment-data_validation_labels_--BJs76vloEaiH-wppzWNA_mask.png",                 "colorRGB": [255, 255, 255]
         }
    },
    {
         "uuid": "d6fc1ac4-13ed-11eb-8e85-acde48001122",
         "schemaId": schemas['point tool']['schemaNodeId'],
         "dataRow": {"id": datarow_id},
         "point": {"x": 176, "y": 128}
    },
    {
         "uuid": "d6fc1aec-13ed-11eb-8e85-acde48001122",
         "schemaId": schemas['bbox tool']['schemaNodeId'],
         "dataRow": {"id": datarow_id},
         "bbox": {
             "top": 48,
             "left": 58,
             "height": 213,
             "width": 215
         }
    },
    {
         "uuid": "d6fc1b0a-13ed-11eb-8e85-acde48001122",
         "schemaId": schemas['polyline tool']['schemaNodeId'],
         "dataRow": {"id": datarow_id},
         "line": [
             {"x": 163.364, "y": 21.837},
             {"x": 269.978, "y": 59.087},
             {"x": 205.753, "y": 146.433},
             {"x": 225.021, "y": 175.977},
             {"x": 149.235, "y": 240.202},
             {"x": 85.01, "y": 169.554},
             {"x": 123.545, "y": 104.045},
             {"x": 82.441, "y": 74.501},
             {"x": 120.976, "y": 21.837}
        ]
    }
]

project_for_mal.upload_annotations(annotations = annotations, name = IMPORT_NAME)
upload_job = BulkImportRequest.from_name(client, project_id = project_for_mal.uid, name = IMPORT_NAME)
upload_job.wait_until_done()

print(f"The annotation import is: {upload_job.state}")

if upload_job.error_file_url:
    res = requests.get(upload_job.error_file_url)
    errors = ndjson.loads(res.text)
    print("\nErrors:")
    for error in errors:
         print(
             "An annotation failed to import for "
             f"datarow: {error['dataRow']} due to: "
             f"{error['errors']}")

Bulk add Data Rows

Dataset.create_data_rows() accepts a list of items and is an asynchronous bulk operation. This is typically faster and avoids API limit issues. Unless you specify otherwise, your code will continue before the Data Rows are fully created on the server-side.

Paths to local files must be passed as a string. To pass an external URL, use DataRow.row_data.

  • external_id (String): User-generated file name or identifier.

  • row_data (String): Should only be used when passing a path to an external URL. Each external URL entry must be formatted as a dict.

You can (but don't have to) use task.wait_till_done() to wait for the bulk creation to complete before continuing with other tasks.

dataset = client.get_dataset("<dataset_uid>")
task = dataset.create_data_rows([
    {labelbox.schema.data_row.DataRow.row_data: "http://my_site.com/photos/img_01.jpg"},
    {labelbox.schema.data_row.DataRow.row_data: "http://my_site.com/photos/img_02.jpg"},
    "path/to/file3.jpg"
])
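As noted above, you can optionally block until the bulk creation finishes before moving on; a minimal follow-up to the snippet above:

task.wait_till_done()  # blocks until the Data Rows have been created server-side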

Add Data Rows via JSON

When you import data via JSON using the Python SDK, you should include the following information in your JSON file. At a minimum, the JSON file must include row_data.

  • external_id (String): User-generated file name or identifier.

  • row_data (String): Should only be used when passing a path to an external URL. Each external URL entry must be formatted as a dict.

Use the create_data_rows method to import Data Rows from your JSON file. This will return a task ID.

[
    {
        "row_data": "<IMG_URL>",
        "external_id": "12345"
    },
    {
        "row_data": "<IMG_URL_2>"
    },
    {
        "row_data": "<path/to/file.jpg>"
    }
]
import json

dataset = client.get_dataset("<dataset_uid>")
with open("file.json") as f:
    dataset.create_data_rows(json.load(f))

Import annotations

There are three ways to import annotations using the instance method, upload_annotations.

For images, the Model-assisted labeling workflow supports all annotation types, except Dropdown classification and classifications nested within classifications.

For videos, the Model-assisted workflow only supports classifications at the frame-level.

For text, the Model-assisted workflow supports Named entity recognition and text classification.

If you are importing more than 1,000 mask annotations at a time, consider submitting separate jobs, as they can take longer than other annotation types to import.
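If you do split a large import into separate jobs, one simple approach is to slice the annotation list and submit each slice under its own import name. This is a sketch only: the chunk size and naming scheme are placeholder choices, and annotations/project are assumed to be the list of annotation dicts and the Project object you are importing into.

chunk_size = 1000  # placeholder; choose a size that suits your data
for i in range(0, len(annotations), chunk_size):
    chunk = annotations[i:i + chunk_size]
    project.upload_annotations(
        name=f"mask_import_part_{i // chunk_size}",  # import names must be unique per project
        annotations=chunk)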

Caution

Wait until the import job is complete before opening the Editor to make sure all annotations are imported properly.

URL to NDJSON file

This sample script uses the upload_annotations method to pass a URL pointing to the NDJSON file containing the annotations (URL can be public or signed).

If you are uploading a public URL to an NDJSON file, check that the host of the public URL allows standard browsers to download by doing the following:

  1. Navigate to your URL using any browser. It should return the expected NDJSON.

  2. Run wget -O- --user-agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36  (KHTML, like Gecko) Chrome/81.0.4044.138  Safari/537.36' <url> | cat. It should return the expected NDJSON.

upload_job = project.upload_annotations(
    name="upload_annotation_job_1",
    annotations="https://storage.googleapis.com/public-bucket/predictions.ndjson")

Local NDJSON file

You can use the upload_annotations method to import a local NDJSON file. Labelbox will validate whether the file is a proper NDJSON file by checking that every line of the file is valid JSON.

from pathlib import Path

predictions_file = Path("/home/predictions/predictions.ndjson")

upload_job = project.upload_annotations(
    name="upload_annotation_job_1",
    annotations=predictions_file)

List of dictionaries

The upload_annotations method also accepts annotations as a list of dictionaries. Labelbox will automatically convert the dicts to an NDJSON file.

annotations = [ 
    { 
        "uuid": "9fd9a92e-2560-4e77-81d4-b2e955800092", 
        "schemaId": "ckappz7d700gn0zbocmqkwd9i", 
        "dataRow": { 
            "id": "ck1s02fqxm8fi0757f0e6qtdc" 
        }, 
        "bbox": { 
            "top": 48, 
            "left": 58, 
            "height": 865, 
            "width": 1512 
        } 
    }, 
    { 
        "uuid": "29b878f3-c2b4-4dbf-9f22-a795f0720125",
        "schemaId": "ckappz7d800gp0zboqdpmfcty",
        "dataRow": { 
            "id": "ck1s02fqxm8fi0757f0e6qtdc" 
        }, 
        "polygon": [ 
            {"x": 147.692, "y": 118.154}, 
            {"x": 142.769, "y": 404.923}, 
            {"x": 57.846, "y": 318.769}, 
            {"x": 28.308, "y": 169.846} 
        ] 
    }
]

upload_job = project.upload_annotations( 
    name="upload_annotation_job_1", 
    annotations=annotations)

Check bulk import status

You can use BulkImportRequest to check the state of the job. BulkImportRequestState refers to the whole import job and will be one of the following:

  • RUNNING: Indicates that the import job is not done yet.

  • FAILED: Indicates the import job failed. You'll get an errorFileUrl to an NDJSON containing the error message. The statusFileUrl will be null.

  • FINISHED: Indicates the import job is no longer running. You'll get a statusFileUrl to an NDJSON (expires after 24 hours) that contains a SUCCESS or FAILED status per annotation. You'll also get an errorFileUrl to an NDJSON which has the same format as the outputFileUrl except it contains ONLY error messages for each annotation that did not import successfully.

Additionally, the BulkImportRequest object exposes wait_until_done(), which blocks until the state changes to FINISHED or FAILED, periodically refreshing the object's state.

from labelbox import Client
from labelbox.schema.bulk_import_request import BulkImportRequest
from labelbox.schema.enums import BulkImportRequestState

client = Client(api_key="<LABELBOX_API_KEY>")

upload_job = BulkImportRequest.from_name(
    client,
    project_id="<project_id>",
    name="test_bulk_import_request")

upload_job.wait_until_done()
assert ( 
    upload_job.state == BulkImportRequestState.FINISHED or
    upload_job.state == BulkImportRequestState.FAILED
)

Check for errors

Once the import has finished, refer to the BulkImportRequest object for the status_file_url and error_file_url.

To reupload failed annotations, refer to the errors indicated in the error_file_url and do another bulk import containing only the failed annotations.

import ndjson
import requests

print(f'Here is the status file: {upload_job.status_file_url}')
print(f'Here is the error file: {upload_job.error_file_url}')

if upload_job.error_file_url: 
    res = requests.get(upload_job.error_file_url)
    errors = ndjson.loads(res.text)
    for error in errors: 
        print( 
            "An annotation failed to import for " 
            f"datarow: {error['dataRow']} due to: " 
            f"{error['errors']}")

Sample error response

Each line in the NDJSON response will contain the following information.

  • uuid: Specifies the annotation for the status row.

  • dataRow: JSON object containing the Labelbox data row ID for the annotation.

  • status: Indicates SUCCESS or FAILURE.

  • errors: An array of error messages included when status is FAILURE. Each error has a name, message and optional (can be null) additional_info.

{
    "uuid": "15f47a22-ba9e-4381-8525-5c3317529365",
    "dataRow": {
        "id": "ckkeedqqt4dqc0sek9abr1isw"
    },
    "status": "FAILURE",
    "errors": [
        {
            "name": "ValidationError",
            "message": "{'bbox': {'width': ['Missing data for required field.']}}",
            "additionalInfo": null
        }
    ]
}

Fetch multiple Datasets

Below are three examples of fetching multiple datasets using the get_datasets method.

Fetch all datasets

This sample code uses the get_datasets method to fetch all datasets and print the collection of datasets by unique ID.

for dataset in client.get_datasets():
    print(dataset.uid)

Use comparison operators

Use the get_datasets method with a where parameter and a comparison operator to filter. Any of the standard comparison operators (==, !=, >, >=, <, <=) will work. The get_datasets method will give you a PaginatedCollection. You can iterate over the PaginatedCollection to get your datasets.

from labelbox import Dataset
datasets_x = client.get_datasets(where=Dataset.name == "X")
for x in datasets_x:
    print(x)

Filter using combined comparisons

Use the get_datasets method to combine comparisons using logical expressions. Currently, the where clause supports the logical AND operator.

from labelbox import Dataset
datasets = client.get_datasets(where=(Dataset.name == "X") & (Dataset.description == "Y"))
for x in datasets:
    print(x)

Fetch multiple projects

Here are three ways you can use the get_projects method to fetch multiple projects.

Get all projects

Use the get_projects method to get a list of all projects. This sample code returns the name and uid for each project.

for project in client.get_projects():
    print(project.name, project.uid)

Use comparison operators

Use the get_projects method and filter using comparison operators. Any of the standard comparison operators (==, !=, >, >=, <, <=) will work. The get_projects method will give you a PaginatedCollection. You can iterate over the PaginatedCollection to get the projects. The sample code will raise an error if there is not at least one item in the PaginatedCollection.

from labelbox import Project
projects_x = client.get_projects(where=Project.name == "MyProject")
item = next(iter(projects_x))
print(item)

Use combined comparisons

Combine comparisons using logical expressions to filter projects. Currently, the where clause supports the logical AND operator.

from labelbox import Project
projects = client.get_projects(where=(Project.name == "X") & (Project.description == "Y"))
for x in projects:
    print(x)

Export labels

The sample below uses the export_labels method to print a URL to a JSON file containing the labels of a project.

The response will be a URL of the label data file.

project = client.get_project("<project_unique_id>")
url = project.export_labels()
print(url)
'https://storage.googleapis.com/labelbox-exports/cjnywra4rytzd079735j0hfnt/ck22dy2gmnbw08111o6y2ycs9/export-2019-10-29T22:59:08.592Z.json'
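
To work with the exported labels directly, a minimal sketch (using the requests library, which appears elsewhere in these docs) downloads the file and parses the JSON:

import requests

labels = requests.get(url).json()
print(len(labels))  # number of exported label records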

Bulk delete labels

Use the bulk_delete method for deleting multiple labels at a time. Users usually seek to do this when they realize there was an error in the way that the labels were initially created. This method allows you to remove the existing annotations.

Here are two ways for bulk deleting labels within a specified project.

Delete all labels

Use the bulk_delete method to delete all labels from a project.

from labelbox import Label

project = client.get_project("<project_unique_id>")
Label.bulk_delete(project.labels())

Filtered relationship expansion

Use a filtered relationship expansion to specify which labels to delete within a project.

project = client.get_project("<project_unique_id>")
Label.bulk_delete(project.labels(where=Label.name == "x"))

API reference

The full Labelbox Python API reference is available on Read the Docs.