Client
The Client is the main entry point for working with the Labelbox SDK. The Client is used to fetch DbObjects and execute queries.
DbObject
The Labelbox SDK is primarily used for interacting with Labelbox's database. The core abstraction used to represent a database entity is a DbObject. It has special attributes called Fields and Relationships that facilitate database queries.
Most objects within the Labelbox SDK are DbObjects. You will see Project and Dataset, among others, throughout the documentation.
Client.execute() for inline GraphQL
You can use client.execute() to write inline GraphQL operations with the Python SDK. Below is an example of how to use client.execute() to run an inline GraphQL query that gets a project ontology.
res_str = client.execute("""
    query get_ontology($project_id: ID!) {
        project(where: {id: $project_id}) {
            ontology {
                normalized
            }
        }
    }
""", {"project_id": project.uid})
# Get the ontology from the existing project
ontology = res_str['project']['ontology']['normalized']
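The query above can be wrapped in a small reusable function. This is a minimal sketch, not part of the SDK: `fetch_normalized_ontology` and `StubClient` are hypothetical names, and the stub returns a made-up payload so the example runs without an API key. It assumes only the `execute(query, params)` call shown above.

```python
ONTOLOGY_QUERY = """
query get_ontology($project_id: ID!) {
    project(where: {id: $project_id}) {
        ontology {
            normalized
        }
    }
}
"""


def fetch_normalized_ontology(client, project_id):
    """Hypothetical helper: run the inline query and unwrap the result.

    `client` is any object exposing execute(query, params).
    """
    result = client.execute(ONTOLOGY_QUERY, {"project_id": project_id})
    return result["project"]["ontology"]["normalized"]


class StubClient:
    """Stand-in for a real Client (assumption, for offline illustration)."""

    def __init__(self):
        self.calls = []

    def execute(self, query, params):
        # Record the call and return a made-up response with the same nesting.
        self.calls.append((query, params))
        return {"project": {"ontology": {"normalized": {"tools": []}}}}


stub = StubClient()
ontology = fetch_normalized_ontology(stub, "<project_id>")
```

With a real client, you would pass the Client instance and `project.uid` in place of the stub and the placeholder ID.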
Field caching
When you fetch an object from the server, the client obtains all field values for that object and caches them. Accessing a field afterwards returns the cached value; there is no round-trip to the server for a value you have already fetched. Server-side updates that happen after the client-side fetch are not propagated automatically, so the values returned are still the cached ones.
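The caching behavior described above can be illustrated without the SDK. In this sketch a plain dict plays the server and `CachedObject` plays a DbObject; both are hypothetical stand-ins, and re-fetching the object is assumed here to be the way to see newer values.

```python
# Hypothetical stand-ins: a dict plays the server, CachedObject plays a DbObject.
server = {"proj-1": {"name": "Original name"}}


class CachedObject:
    def __init__(self, uid):
        self.uid = uid
        # One "round trip": copy every field value at fetch time.
        self._fields = dict(server[uid])

    @property
    def name(self):
        # Attribute access reads the cached copy, never the server.
        return self._fields["name"]


project = CachedObject("proj-1")

# The value changes on the server after our fetch.
server["proj-1"]["name"] = "Renamed on server"

stale_name = project.name                 # still the value cached at fetch time
fresh_name = CachedObject("proj-1").name  # a new fetch sees the change
```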
Fields (fetch/update)
Each DbObject has a set of fields. This document explains how to access or update one or more fields on a DbObject as well as how field caching works.
Fetch
DbObject fields must be accessed as attributes (e.g. project.name). Unlike relationships, fields are cached.
In this example, client.get_projects() returns a PaginatedCollection object. You can iterate through the PaginatedCollection object and read the fields you need from each project. To read multiple fields from an object, separate them with commas.
from labelbox import Client
client = Client()
projects = client.get_projects()
for project in projects:
    print(project.name, project.uid)
Update
To update a field, use update() to pass the field and the new value. The following data types support the update method: Project, Dataset, DataRow, Label, Webhook, and Review.
Each update made with object.update() on the client side immediately performs the same update on the server side. If the client-side update does not raise an exception, you can assume that the update succeeded on the server side.
This example uses project.update() to change a project's name and description.
project = client.get_project("<project_id>")
project.update(name="Project Name", description="New description")
Use comparison operators
Paginated collections support comparisons for filtering a query.
Use the get_datasets method with a where parameter and a comparison operator to filter. Any of the standard comparison operators (==, !=, >, >=, <, <=) will work. The get_datasets method returns a PaginatedCollection. You can iterate over the PaginatedCollection to get your datasets.
from labelbox import Dataset
datasets_x = client.get_datasets(where=Dataset.name == "X")
for x in datasets_x:
    print(x)
Use combined comparisons
Combine comparisons using logical expressions to filter projects. Currently, the where clause supports the logical AND operator.
from labelbox import Project, Client
client = Client()
projects = client.get_projects(where=(Project.name == "X") & (Project.description == "Y"))
for x in projects:
    print(x)
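To show why expressions such as `Project.name == "X"` can be passed as a where clause, here is a minimal sketch of operator overloading that turns comparisons into filter objects. It is an illustration of the general technique only, not the SDK's actual implementation; `Field`, `Comparison`, `LogicalAnd`, and `SketchProject` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comparison:
    """One field/operator/value triple (hypothetical)."""
    field: str
    op: str
    value: object

    def __and__(self, other):
        return LogicalAnd(self, other)


@dataclass(frozen=True)
class LogicalAnd:
    """Two filters that must both hold (hypothetical)."""
    left: object
    right: object

    def __and__(self, other):
        return LogicalAnd(self, other)


class Field:
    """Comparing a Field builds a Comparison instead of returning a bool."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, value):
        return Comparison(self.name, "==", value)

    def __ne__(self, value):
        return Comparison(self.name, "!=", value)

    def __gt__(self, value):
        return Comparison(self.name, ">", value)

    def __ge__(self, value):
        return Comparison(self.name, ">=", value)

    def __lt__(self, value):
        return Comparison(self.name, "<", value)

    def __le__(self, value):
        return Comparison(self.name, "<=", value)


class SketchProject:
    """Stand-in for a data class with two fields."""
    name = Field("name")
    description = Field("description")


where = (SketchProject.name == "X") & (SketchProject.description == "Y")
```

The parentheses around each comparison matter: in Python, `&` binds more tightly than `==`, so without them the expression would be parsed differently.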
Pagination
Some calls to the API return a very large number of results. To prevent too many results from being returned at once, the Labelbox API limits the number of returned objects and the Python SDK automatically creates a PaginatedCollection instead.
When a PaginatedCollection object is created, nothing is actually fetched from the server. You cannot get a count of objects in the relationship from a PaginatedCollection, nor can you access objects within it like you would a list (using square-bracket indexing).
- For both the top-level object fetch, client.get_projects(), and the relationship call, project.datasets(), a PaginatedCollection object is returned. This PaginatedCollection object takes care of the paginated fetching.
projects = client.get_projects()
type(projects)
projects = list(projects)
type(projects)
- Note that nothing is fetched immediately when the PaginatedCollection object is created.
- Round-trips to the server are made only as you iterate through a PaginatedCollection. In this code block, that happens when a list is initialized with a PaginatedCollection, and when a PaginatedCollection is iterated over in a for-loop.
project = projects[0]
datasets = project.datasets()
type(datasets)
for dataset in datasets:
    dataset.name
- You cannot get a count of objects in the relationship from a PaginatedCollection, nor can you access objects within it like you would a list (using square-bracket indexing). You can only iterate over it.
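The lazy behavior described in the points above can be sketched with a generator. This is a simplified illustration under stated assumptions, not the SDK's PaginatedCollection: `LazyPages`, `fake_fetch`, and the offset-based paging are all hypothetical.

```python
class LazyPages:
    """Hypothetical stand-in for a lazily paginated collection."""

    def __init__(self, fetch_page, page_size):
        # Creating the object stores the fetch function; nothing is fetched.
        self._fetch_page = fetch_page
        self._page_size = page_size

    def __iter__(self):
        offset = 0
        while True:
            page = self._fetch_page(offset, self._page_size)  # one round trip
            yield from page
            if len(page) < self._page_size:
                return  # a short page means there is nothing left
            offset += self._page_size


ALL_ROWS = [f"row-{i}" for i in range(5)]
round_trips = []  # offsets requested from the fake server


def fake_fetch(offset, limit):
    round_trips.append(offset)
    return ALL_ROWS[offset:offset + limit]


rows = LazyPages(fake_fetch, page_size=2)
trips_before_iteration = len(round_trips)  # no round trips yet

first = next(iter(rows))                   # fetches only the first page
trips_after_first = len(round_trips)

everything = list(rows)                    # walks every page from the start
```

Converting to a list forces every page to be requested, which is why the sketch records three more round trips for `list(rows)`.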
Iterate over PaginatedCollection
Be careful about converting a PaginatedCollection into a list. This will cause all objects in that collection to be fetched from the server.
In cases when you need only some objects, you can use two convenient methods: get_one() and get_many(n).
data_row = dataset.data_rows().get_one()  # returns the first result
# or
data_rows = dataset.data_rows().get_many(10)  # returns a list of the first 10 results
You can still iterate over a PaginatedCollection object as with a normal iterator object:
data_rows = dataset.data_rows()
for data_row in data_rows:
    # your business logic
    pass
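For any iterator, the same idea of taking only the first few items without consuming the rest can be sketched with the standard library's `itertools.islice`. This is a general Python illustration, not a replacement for get_one() or get_many(n); `take` and `counting_source` are hypothetical names.

```python
from itertools import islice


def take(iterable, n):
    """Return the first n items as a list without consuming the rest."""
    return list(islice(iterable, n))


def counting_source(total, log):
    """Hypothetical lazy source that records each item it produces."""
    for i in range(total):
        log.append(i)
        yield f"data-row-{i}"


produced = []
first_three = take(counting_source(1000, produced), 3)
```

Only three items are produced by the source here, even though it could yield a thousand.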
Relationships (fetch/add/update)
If you need to connect two data classes, you can do so via relationships. This document gives you some examples of how to get/add/update relationships between different data classes.
Fetch relationship
Relationships are fetched every time you call them. They are not cached. This is made explicit by defining relationships as callable methods on objects (e.g. project.datasets()).
To get all datasets for a project, define project and call datasets() as a method on project to access the datasets related to that project.
Because every call triggers a new fetch, keep references to related objects if your data is only modified during short, known timeframes.
project = client.get_project("<project_uid>")
datasets = project.datasets()
Create relationship
To create a relationship between two objects, call the connect method directly from the relationship. This code sample connects a dataset to a project.
project = client.get_project("<project_uid>")
project.datasets.connect(dataset_1)
Update relationship
To update a relationship, use disconnect() and then connect().
Note: update() does not work for updating relationships.
project.datasets.disconnect(dataset_1)
project.datasets.connect(dataset_2)
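The disconnect-then-connect pattern can be wrapped in a small function. This is a minimal sketch: `swap_related` and `StubRelationship` are hypothetical names, and the stub only mimics the connect() and disconnect() calls shown above so the example runs offline.

```python
def swap_related(relationship, old, new):
    """Hypothetical helper: replace `old` with `new` on a relationship."""
    relationship.disconnect(old)
    relationship.connect(new)


class StubRelationship:
    """Stand-in exposing connect()/disconnect() (assumption, for illustration)."""

    def __init__(self, members):
        self.members = list(members)

    def connect(self, obj):
        self.members.append(obj)

    def disconnect(self, obj):
        self.members.remove(obj)


datasets = StubRelationship(["dataset_1"])
swap_related(datasets, "dataset_1", "dataset_2")
```

With real objects, the call would take the form `swap_related(project.datasets, dataset_1, dataset_2)`.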