Fundamental concepts

An overview of the concepts used throughout the Labelbox Python SDK.

DbObject

The Labelbox SDK is primarily used to interact with Labelbox's database. The core abstraction used to represent a database entity is a DbObject. It has special attributes called Fields and Relationships that facilitate database queries.

Most objects within the Labelbox SDK are DbObjects. Project and Dataset are two examples that appear throughout the documentation.

Field caching

When you fetch an object from the server, the client obtains all field values for that object and caches them. Accessing a field afterward returns the cached value; no further round trip to the server is made. Server-side updates made after the fetch are not propagated automatically, so the client keeps returning the cached values until you fetch the object again.
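The caching behavior described above can be illustrated with a small standalone model. This is only a sketch: FakeServer and CachedObject are hypothetical names invented for the illustration and are not part of the Labelbox SDK, which talks to a real API rather than an in-memory dictionary.

```python
# Toy model of client-side field caching; not the Labelbox SDK's real classes.
class FakeServer:
    """Stands in for the remote database (hypothetical)."""

    def __init__(self):
        self.rows = {"p1": {"name": "Original name"}}
        self.requests = 0  # counts simulated round trips

    def fetch(self, uid):
        self.requests += 1
        return dict(self.rows[uid])  # copy, like a deserialized response


class CachedObject:
    """Holds the field values captured at fetch time."""

    def __init__(self, server, uid):
        self._fields = server.fetch(uid)  # the only round trip for this object

    @property
    def name(self):
        return self._fields["name"]  # served from the local cache


server = FakeServer()
project = CachedObject(server, "p1")

server.rows["p1"]["name"] = "Renamed on the server"  # server-side change

stale = project.name                      # still the value cached at fetch time
fresh = CachedObject(server, "p1").name   # a new fetch sees the change
```

In this model, reading project.name any number of times never increases server.requests; only constructing a new CachedObject does.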

Fields

Each DbObject has a set of fields. The sections below explain how to access or update one or more fields on a DbObject.

Fetch

DbObject fields must be accessed as attributes (e.g., project.name). Unlike relationships, fields are cached.

In this example, client.get_projects() returns a PaginatedCollection object. You can iterate through the PaginatedCollection and read the fields you want from each project. To read multiple fields from an object, access each one as its own attribute (here, project.name and project.uid).

from labelbox import Client

client = Client()
projects = client.get_projects()
for project in projects:
    print(project.name, project.uid)

Update

To update a field, call the DbObject.update() method and pass the field name and its new value as a keyword argument. Each update made with DbObject.update() on the client side immediately performs the same update on the server side. If the client-side call does not raise an exception, you can assume the update was applied successfully on the server side.

This example uses the project.update() method to change a project's name and description.

project = client.get_project("<project_id>")
project.update(name="Project Name", description="New description")

Use comparison operators

Paginated collections support comparisons for filtering a query.

For example, you can pass a comparison to the where parameter of the get_datasets method to filter the results. Any of the standard comparison operators (==, !=, >, >=, <, <=) will work. The get_datasets method returns a PaginatedCollection, which you can iterate over to get the matching datasets.

from labelbox import Client, Dataset

client = Client()
datasets_x = client.get_datasets(where=Dataset.name == "X")
for x in datasets_x:
    print(x)

Use combined comparisons

Combine comparisons using logical expressions to filter projects. Currently, the where clause supports the logical AND operator.

from labelbox import Project, Client

client = Client()
projects = client.get_projects(where=(Project.name == "X") & (Project.description == "Y"))
for x in projects:
    print(x)
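To see why expressions such as Dataset.name == "X" can be passed as a filter, it may help to look at how comparison operators can be overloaded to build a filter object instead of returning True or False. The sketch below is a standalone illustration, not the SDK's implementation: FieldRef, Comparison, and And are hypothetical names, and the filter is evaluated locally against dictionaries, whereas the SDK sends its filter to the server.

```python
import operator
from dataclasses import dataclass

# Maps operator symbols to the functions that evaluate them.
OPS = {
    "==": operator.eq, "!=": operator.ne,
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
}


@dataclass(frozen=True)
class Comparison:
    """A single 'field op value' condition (hypothetical class)."""
    field: str
    op: str
    value: object

    def __and__(self, other):
        return And(self, other)

    def matches(self, row):
        return OPS[self.op](row[self.field], self.value)


@dataclass(frozen=True)
class And:
    """Two conditions that must both hold (hypothetical class)."""
    left: object
    right: object

    def __and__(self, other):
        return And(self, other)

    def matches(self, row):
        return self.left.matches(row) and self.right.matches(row)


class FieldRef:
    """Comparing a FieldRef builds a Comparison instead of a bool."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, value): return Comparison(self.name, "==", value)
    def __ne__(self, value): return Comparison(self.name, "!=", value)
    def __gt__(self, value): return Comparison(self.name, ">", value)
    def __ge__(self, value): return Comparison(self.name, ">=", value)
    def __lt__(self, value): return Comparison(self.name, "<", value)
    def __le__(self, value): return Comparison(self.name, "<=", value)


name = FieldRef("name")
description = FieldRef("description")

# Parentheses are required: in Python, & binds more tightly than ==.
where = (name == "X") & (description == "Y")

rows = [
    {"name": "X", "description": "Y"},
    {"name": "X", "description": "Z"},
]
matching = [row for row in rows if where.matches(row)]
```

The same precedence rule applies to the SDK example above, which is why each comparison there is wrapped in its own parentheses before being joined with &.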

Pagination

Some API calls return a large number of results. To prevent too many results from being returned at once, the Labelbox API limits the number of returned objects, and the Python SDK automatically creates a PaginatedCollection instead.

  1. For both the top-level object fetch, client.get_projects(), and the relationship call, project.batches(), a PaginatedCollection object is returned. This PaginatedCollection object takes care of the paginated fetching.

projects = client.get_projects()
print(type(projects))
project = projects.get_one()  # first project; indexing with [0] is not supported
batches = project.batches()
print(type(batches))
for batch in batches:
    print(batch.name)

  2. Note that nothing is fetched immediately when the PaginatedCollection object is created.

  3. Round-trips to the server are made only as you consume a PaginatedCollection, for example when you initialize a list from it or iterate over it in a for-loop. In this code block, that happens when get_one() retrieves the first project and when the batches are iterated over in the for-loop.

  4. You cannot get a count of the objects in a PaginatedCollection, nor access objects within it like a Python list (using square-bracket indexing). You can only iterate over it.
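The lazy fetching described in the list above can be sketched with a plain Python generator. This is a simplified model under stated assumptions, not the SDK's PaginatedCollection: LazyPages and fetch_page are hypothetical names, and the "server" is an in-memory list that records every simulated round trip.

```python
import itertools


class LazyPages:
    """Fetches one page at a time, only while being iterated (hypothetical)."""

    def __init__(self, fetch_page, page_size):
        self._fetch_page = fetch_page
        self._page_size = page_size

    def __iter__(self):
        offset = 0
        while True:
            page = self._fetch_page(offset, self._page_size)  # one round trip
            yield from page
            if len(page) < self._page_size:  # a short page means no more data
                return
            offset += self._page_size


ITEMS = list(range(25))  # pretend these live on the server
calls = []               # offsets requested so far


def fetch_page(offset, limit):
    calls.append(offset)
    return ITEMS[offset:offset + limit]


collection = LazyPages(fetch_page, page_size=10)
calls_after_creation = list(calls)       # nothing has been requested yet

first_three = list(itertools.islice(collection, 3))
calls_after_partial = list(calls)        # only the first page was requested

everything = list(collection)            # converting to a list fetches every page
```

Taking three items costs a single page request in this model, while list(collection) walks through all pages, which is the reason to be cautious about converting a large collection into a list.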

Iterate over PaginatedCollection

Be careful about converting a PaginatedCollection into a list. This will cause all objects in that collection to be fetched from the server.

In cases when you need only some objects, you can use two convenient methods: get_one() and get_many(n).

data_row = dataset.data_rows().get_one()  # returns the first result
# or
data_rows = dataset.data_rows().get_many(10)  # returns a list of the first 10 results

You can still iterate over a PaginatedCollection object as with a normal iterator object:

data_rows = dataset.data_rows()
for data_row in data_rows:
    # your business logic
    pass