Labelbox Python SDK

How to use these docs

These docs will help you get started with the Python SDK. The Python SDK is for customers who want to integrate programmatically with Labelbox using an extensive set of custom methods.

The Python SDK docs are organized into the following sections:

  • Installation & authentication: Contains instructions for creating your API key, installation, authentication, and API rate limits.

  • Fundamental concepts: Describes the Client, DbObjects, comparison operators, pagination, object relationships, object fields, and field caching.

  • Best practices: Some recommendations to ensure the best possible experience with the Python SDK.

  • Tutorials: Contains full end-to-end exercises for the most common workflows and tasks in Labelbox.

  • API reference: A comprehensive list of all methods available in the Python SDK along with sample code blocks, descriptions for attributes, and explanations for error messages.


Rate limits

Labelbox uses rate limits as a safeguard against bursts of incoming requests to maximize stability for all of our customers.

If you hit the rate limit, the SDK raises a labelbox.exceptions.ApiLimitError exception.

We recommend building a retry mechanism that uses exponential backoff to space out your requests once you reach the quota.

If you are consistently hitting your rate limit, contact our support team.
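The retry recommendation above could be sketched roughly as follows. This is a minimal illustration, not part of the SDK: call_with_backoff and all of its parameters are hypothetical names, and the exception type to retry on is passed in so that the helper itself does not depend on labelbox being installed.

```python
import random
import time


def call_with_backoff(fn, retry_on, max_attempts=5, base_delay=1.0,
                      max_delay=60.0, sleep=time.sleep):
    """Call fn(); on a retryable exception, wait and try again.

    The wait doubles after each failed attempt (base_delay, 2x, 4x, ...),
    is capped at max_delay, and gets up to 10% random jitter so that
    parallel workers do not all retry at the same moment.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay + random.uniform(0, delay * 0.1))


# Hypothetical usage with the SDK (needs labelbox installed and an API key):
#   from labelbox import Client
#   from labelbox.exceptions import ApiLimitError
#   client = Client()
#   projects = call_with_backoff(lambda: list(client.get_projects()),
#                                retry_on=ApiLimitError)
```

The default values for max_attempts, base_delay, and max_delay are arbitrary starting points; tune them to your workload.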

Installation & authentication

The Python SDK allows you to access all of the functionality of the Labelbox API without having to use GraphQL.

Requirements

  • Python version 3.6 or 3.7.

  • Make sure pip is installed.

  • Create your API key in the Account section.

  • Save your API key somewhere as it will be hidden afterward.

Install/upgrade

To install, run pip install labelbox in your command line.

To upgrade, consult the changelog and run pip install --upgrade labelbox. See the Python SDK repo on GitHub.

Authentication

There are 3 ways to set up authentication with Labelbox.

Option 1

Pass your API key as an environment variable in the command line.

Then, import and initialize the API Client.

user@machine:~$ export LABELBOX_API_KEY="<your_api_key>"
user@machine:~$ python3
>>> from labelbox import Client
>>> client = Client()

Option 2

Pass a custom endpoint. This is only applicable for on-premises use cases. If this applies to you, you may pass the API key and server endpoint explicitly when you initialize the Client object. Otherwise, refer to Option 1.

from labelbox import Client

client = Client("<your_api_key_here>", "https://app.your-domain.com/api/graphql")

Option 3

Pass your API key as a string directly in a Python script.

from labelbox import Client

if __name__ == '__main__':
    API_KEY = "<your_api_key_here>"
    client = Client(API_KEY)
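If you want a script to accept either an explicit key or the LABELBOX_API_KEY environment variable from Option 1, a small helper along these lines may be convenient. This is only a sketch: resolve_api_key is a hypothetical name and not part of the SDK.

```python
import os


def resolve_api_key(explicit_key=None, env=os.environ):
    """Return explicit_key if given, otherwise LABELBOX_API_KEY from env.

    Raises early with a readable message instead of failing later on the
    first API call.
    """
    key = explicit_key or env.get("LABELBOX_API_KEY")
    if not key:
        raise RuntimeError(
            "No API key found: set LABELBOX_API_KEY or pass a key explicitly."
        )
    return key


# Hypothetical usage (needs labelbox installed):
#   from labelbox import Client
#   client = Client(resolve_api_key())
```

Reading the key from the environment also keeps it out of source control, which hardcoding it in a script does not.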

Fundamental concepts

Client

The Client is the main entry point for working with the Labelbox SDK. The Client is used to fetch DbObjects and execute queries.

DbObject

The Labelbox SDK is primarily used for interacting with Labelbox's database. The core abstraction used to represent a database entity is a DbObject. It has special attributes called Fields and Relationships that facilitate database queries.

Most objects that users work with in the Labelbox SDK are DbObjects. You will see Project and Dataset, among others, throughout the documentation.

Field caching

When you fetch an object from the server, the client obtains all field values for that object. When you access that obtained field value, the cached value is returned. There is no round-trip fetch to the server to get the field value you have already fetched. Server-side updates that happen after the client-side fetch are not auto-propagated, meaning the values returned will still be the cached values.
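The caching behavior described above can be pictured with a toy model. The classes below are illustrative stand-ins written for this sketch (ToyServer and ToyObject are hypothetical and are not how the SDK is implemented); they only show why a value read after a server-side change is still the value from the original fetch.

```python
class ToyServer:
    """Stand-in for the remote database: maps object id -> field values."""

    def __init__(self):
        self.rows = {"p1": {"name": "Original name"}}

    def fetch(self, uid):
        return dict(self.rows[uid])  # a copy, like a network response


class ToyObject:
    """Snapshots all field values at fetch time and serves reads from them."""

    def __init__(self, server, uid):
        self._fields = server.fetch(uid)  # the only round-trip

    @property
    def name(self):
        return self._fields["name"]  # cached value, no server call


server = ToyServer()
project = ToyObject(server, "p1")

# Someone else renames the project on the server after our fetch.
server.rows["p1"]["name"] = "Renamed elsewhere"

stale = project.name                  # still the value from our fetch
fresh = ToyObject(server, "p1").name  # fetching again picks up the change
print(stale, "|", fresh)
```

With the real SDK, the equivalent of the second fetch is to request the object again, for example with client.get_project("<project_id>").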

Fields (fetch/update)

Each DbObject has a set of fields. This document explains how to access or update one or more fields on a DbObject as well as how field caching works.

Fetch

DbObject fields must be accessed as attributes (e.g. project.name). Unlike relationships, fields are cached.

In this example, client.get_projects() returns a PaginatedCollection object. Iterate through the PaginatedCollection and read the fields you need as attributes of each object. To read several fields from one object, access each as its own attribute; the example prints two of them, name and uid, separated by commas.

Example 1. Iterate through projects in PaginatedCollection
from labelbox import Client

client = Client()
projects = client.get_projects()
for project in projects:
    print(project.name, project.uid)

Update

To update a field, use update() to pass the field and the new value. The following data types support the update method: Project, Dataset, DataRow, Label, Webhook, and Review.

Each update made with object.update() on the client side is immediately applied on the server side. If the client-side update does not raise an exception, you can assume that the update succeeded on the server side.

This example uses project.update() to change a project's name and description.

Example 2. Update a project name and description
project = client.get_project("<project_id>")
project.update(name="Project Name", description="New description")

Use comparison operators

Paginated collections support comparisons for filtering a query.

Use the get_datasets method with a where parameter and a comparison operator to filter. Any of the standard comparison operators (==, !=, >, >=, <, <=) will work. The get_datasets method returns a PaginatedCollection, which you can iterate over to get the matching datasets.

Use combined comparisons

Combine comparisons using logical expressions to filter projects. Currently, the where clause supports the logical AND operator, written as & between parenthesized comparisons.

Example 3. Iterate over the PaginatedCollection
from labelbox import Dataset
datasets_x = client.get_datasets(where=Dataset.name == "X")
for x in datasets_x:
    print(x)
Example 4. Combined comparisons to filter
from labelbox import Project, Client

client = Client()
projects = client.get_projects(where=(Project.name == "X") & (Project.description == "Y"))
for x in projects:
    print(x) 
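Expressions such as Dataset.name == "X" produce a filter description rather than a boolean. The toy classes below sketch one way such an expression can be built with operator overloading. They are hypothetical stand-ins written for illustration and make no claim about how the SDK implements its fields.

```python
class Comparison:
    """A single `field <op> value` condition."""

    def __init__(self, field_name, op, value):
        self.field_name, self.op, self.value = field_name, op, value

    def __and__(self, other):
        return LogicalAnd(self, other)

    def __repr__(self):
        return f"({self.field_name} {self.op} {self.value!r})"


class LogicalAnd:
    """Two conditions that must both hold."""

    def __init__(self, left, right):
        self.left, self.right = left, right

    def __and__(self, other):
        return LogicalAnd(self, other)

    def __repr__(self):
        return f"({self.left!r} AND {self.right!r})"


class ToyField:
    """Comparing a ToyField with a value returns a Comparison, not a bool."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, value):
        return Comparison(self.name, "==", value)

    def __ne__(self, value):
        return Comparison(self.name, "!=", value)

    def __gt__(self, value):
        return Comparison(self.name, ">", value)

    def __ge__(self, value):
        return Comparison(self.name, ">=", value)

    def __lt__(self, value):
        return Comparison(self.name, "<", value)

    def __le__(self, value):
        return Comparison(self.name, "<=", value)


name = ToyField("name")
description = ToyField("description")

combined = (name == "X") & (description == "Y")
print(combined)
```

In Python, & binds more tightly than ==, which is why each comparison in Example 4 is wrapped in its own parentheses.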

Pagination

Some calls to the API return a very large number of results. To prevent too many results from being returned at once, the Labelbox API limits the number of returned objects and the Python SDK automatically creates a PaginatedCollection instead.

A PaginatedCollection fetches its contents lazily:

  1. Both the top-level object fetch, client.get_projects(), and the relationship call, project.datasets(), return a PaginatedCollection object. This object takes care of the paginated fetching.

  2. Nothing is fetched from the server when the PaginatedCollection object is created.

  3. Round-trips to the server are made only as you iterate through a PaginatedCollection. In the examples below, that happens when a list is initialized with a PaginatedCollection and when a PaginatedCollection is iterated over in a for-loop.

  4. You cannot get a count of the objects in a PaginatedCollection, nor can you access objects within it as you would in a list (using square-bracket indexing). You can only iterate over it.

Example 5. Returns a PaginatedCollection
projects = client.get_projects()
type(projects)
Example 6. Returns a list
projects = list(projects)
type(projects)
Example 7. Iterate through a PaginatedCollection
project = projects[0]
datasets = project.datasets()
type(datasets)
for dataset in datasets:
    dataset.name
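The lazy behavior described in this section can be illustrated with a small stand-in. LazyPages and fake_fetch below are hypothetical names written for this sketch and are not the SDK's PaginatedCollection; the point is only to show that pages are requested when iteration reaches them and not before.

```python
class LazyPages:
    """Iterates over items, requesting one page at a time and only on demand."""

    def __init__(self, fetch_page):
        # fetch_page(page_number) -> list of items; an empty list means "no more".
        self._fetch_page = fetch_page

    def __iter__(self):
        page_number = 0
        while True:
            page = self._fetch_page(page_number)  # one round-trip per page
            if not page:
                return
            yield from page
            page_number += 1


DATA = list(range(25))
PAGE_SIZE = 10
requests_made = []  # records every simulated round-trip


def fake_fetch(page_number):
    requests_made.append(page_number)
    start = page_number * PAGE_SIZE
    return DATA[start:start + PAGE_SIZE]


collection = LazyPages(fake_fetch)
requests_before_iteration = len(requests_made)  # nothing fetched yet

first_three = []
for item in collection:
    first_three.append(item)
    if len(first_three) == 3:
        break
requests_after_partial = len(requests_made)     # only the first page so far

everything = list(collection)                   # walks every page
```

Stopping after three items costs a single simulated request, while list() has to walk every page, including the final empty one that signals the end.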

Iterate over PaginatedCollection

Be careful about converting a PaginatedCollection into a list. This will cause all objects in that collection to be fetched from the server. In cases when you need only some objects (let's say the first 10 objects), it is much faster to iterate over the PaginatedCollection and simply stop once you're done.

Example 8 demonstrates how to do this.

If you expect a single element in your PaginatedCollection, you can also retrieve it by calling next() on the collection, as in Example 9.

Example 8. Iterate over a PaginatedCollection
data_rows = dataset.data_rows()
first_ten = []
for data_row in data_rows:
    first_ten.append(data_row)
    if len(first_ten) >= 10:
        break
Example 9. Call next
data_rows = dataset.data_rows()
data_row = next(data_rows)
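The loop-and-break pattern can also be written with itertools.islice from the standard library, which stops pulling items once the requested number has been produced. The helper name take is a hypothetical choice for this sketch; it works on any iterable.

```python
from itertools import islice


def take(collection, n):
    """Return up to n items, stopping iteration as soon as n are collected."""
    return list(islice(collection, n))


# Hypothetical usage with the SDK:
#   first_ten = take(dataset.data_rows(), 10)
```

If the collection holds fewer than n items, take simply returns what is there.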

Relationships (fetch/add/update)

If you need to connect two data classes, you can do so via relationships. This document gives you some examples of how to get/add/update relationships between different data classes.

Fetch

Relationships are fetched every time you call them. They are not cached. This is made explicit by defining relationships as callable methods on objects (e.g. project.datasets()).

To get all datasets for a project, define project and call datasets() as a method on project to access the datasets related to that project.

Because every call triggers a new fetch, keep references to the related objects you have already fetched and reuse them when you know the data will not change while you work with it.

Example 10. Get all datasets for a project
project = client.get_project("<project_uid>")
datasets = project.datasets()

Create

To create a relationship between two objects, call the connect method directly from the relationship.

Example 11. Connect a dataset to a project
project = client.get_project("<project_uid>")
project.datasets.connect(dataset_1)

Update

To update a relationship, call disconnect() on the old object and then connect() on the new one.

Caution

update() does not work for updating relationships.

Example 12. Connect a new dataset
project.datasets.disconnect(dataset_1)
project.datasets.connect(dataset_2)
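Because the update is two separate calls, a failure in connect() would leave the old object already disconnected. The sketch below wraps the two calls and tries to reconnect the old object if connecting the new one fails. swap_related is a hypothetical helper, and it assumes only that the relationship exposes connect() and disconnect() as shown in Example 12.

```python
def swap_related(relationship, old, new):
    """Replace `old` with `new` on a relationship: disconnect, then connect.

    If connecting `new` fails, make a best-effort attempt to restore `old`
    and re-raise the original error.
    """
    relationship.disconnect(old)
    try:
        relationship.connect(new)
    except Exception:
        relationship.connect(old)  # best-effort rollback
        raise


# Hypothetical usage with the SDK:
#   swap_related(project.datasets, dataset_1, dataset_2)
```

The rollback is not atomic either; it only narrows the window in which the project has no dataset attached.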

Best practices

In order to get the best experience from the SDK, we recommend the following:

1. Always use bulk operations

  • Unless you are testing or working on small datasets, use bulk operations for faster and more reliable performance.

  • create_data_rows instead of create_data_row

  • project.export_labels() instead of project.labels()

  • Label.bulk_delete(labels) instead of [label.delete() for label in labels]

2. Use next to get elements of a paginated collection instead of list

  • list(dataset.data_rows())[0] is going to query for every single data_row in your dataset.

  • Instead, use next(dataset.data_rows()). This requests much less information and should be faster.

  • If you don't want next to raise StopIteration on an empty result, you can pass a default value: next(dataset.data_rows(), "default_value").

3. Make sure to use the latest version of the SDK

  • Labelbox is a rapidly evolving company and we are constantly adding new features and optimizations.

Tutorials

Use these end-to-end tutorials on your own data.

To run the notebook tutorials locally, clone the tutorials' GitHub repository.

To run the notebook tutorials in Google Colab, click on the links below. When you run a notebook, you will be prompted to authenticate with Google Drive in order to load your API key. After authenticating, you will be asked for your API key if you have never added it before.

Basics

  • Fundamentals

  • Data Rows

  • Datasets

  • Labels

  • Ontologies

  • Projects

  • User management

Project configuration

  • Project setup

  • Queue management

  • Webhooks

Model-assisted labeling (MAL)

  • MAL basics

  • MAL for images

  • MAL for named entity recognition

  • MAL for tiled imagery

  • Debugging MAL

  • MAL with subclasses

Label exports

  • Image annotation export

  • Text annotation export

  • Video annotation export

API reference

See the full Python API reference for every method available in the Python SDK.