Catalog slice

How to programmatically retrieve data from a Catalog slice through the SDK

Get Catalog slices via SDK

You can programmatically retrieve a slice's data rows and all associated information via our Python SDK. From there, you can use Catalog to inspect the data rows you retrieved via the SDK visually.

Retrieving a slice programmatically is a convenient way to curate a new batch or a model run dataset directly from a saved slice.

End-to-end example: create and retrieve Catalog slices

Before you start

You must import these libraries to use the code examples in this section:

import labelbox as lb
import random
import uuid

Replace with your API key

API_KEY = ""
# To get your API key go to: Workspace settings -> API -> Create API Key
client = lb.Client(api_key=API_KEY)
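Hard-coding the key in a script makes it easy to leak. The following is a minimal sketch that reads the key from an environment variable instead. The helper `load_api_key` and the variable name `LABELBOX_API_KEY` are illustrative assumptions, not something this guide prescribes.

```python
import os


def load_api_key(var_name: str = "LABELBOX_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Both this helper and the default variable name are illustrative.
    """
    api_key = os.environ.get(var_name, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {var_name} environment variable to your API key")
    return api_key


# Usage (requires the labelbox package and a valid key):
# client = lb.Client(api_key=load_api_key())
```

Failing early with a clear message avoids a less obvious authentication error on the first API call.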

Create a Catalog slice

The SDK does not currently support creating slices, so for this demo we create a Catalog slice through the UI.

  1. Navigate to the Catalog section of the Labelbox Platform, and select All datasets or a particular dataset you would like to create a slice from.
  2. Open the Search your data dropdown menu, or use a similarity search, to create a filter.
Example of final view after executing steps 1 - 2

  3. Press Enter and select Save slice.
  4. Give the slice a name and select Save.
  5. Copy the Slice ID.
Example of final view after executing steps 3 - 5

  6. Paste the Slice ID into your code:
catalog_slice_id = "<CATALOG_SLICE_ID_FROM_UI>"

Get a Catalog slice

catalog_slice = client.get_catalog_slice(catalog_slice_id) 

Obtain Data Row IDs and Data Row objects from the Catalog slice

# Get the IDs of the data rows in the slice
slice_data_rows_ids = catalog_slice.get_data_row_ids()

# Fetch and print the data row object for each ID
for data_row_id in slice_data_rows_ids:
  print(client.get_data_row(data_row_id))

Obtain Data Row identifiers

Each data row identifier is an object that contains both the data row ID and the global key of a data row.

data_row_identifiers = catalog_slice.get_data_row_identifiers()

drids = list(data_row_identifiers)

# Print the data row ID and global key of each identifier.
# to_hash() combines the global key and data row ID into a dictionary.
for dr in drids:
  print(f"Data row: {dr.id}, Global Key: {dr.global_key}, dr_gk: {dr.to_hash()}")
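Once you have the identifiers, a lookup table from global key to data row ID can be handy. The sketch below uses stand-in objects so it runs without an API call: `FakeIdentifier` is hypothetical and only mimics the `id` and `global_key` attributes used above, and `index_by_global_key` is an illustrative helper, not part of the SDK.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class FakeIdentifier:
    """Hypothetical stand-in exposing the two attributes used above."""
    id: str
    global_key: Optional[str]


def index_by_global_key(identifiers: Iterable) -> Dict[str, str]:
    """Map each global key to its data row ID, skipping rows without a key."""
    return {dr.global_key: dr.id for dr in identifiers if dr.global_key}


sample = [
    FakeIdentifier("id-1", "gk-a"),
    FakeIdentifier("id-2", None),
    FakeIdentifier("id-3", "gk-c"),
]
lookup = index_by_global_key(sample)
print(lookup)
```

With real data you would pass the identifiers returned by `catalog_slice.get_data_row_identifiers()` in place of `sample`.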


Curate a batch from a Catalog slice via SDK

You can use the Python SDK to create a new batch from all the data rows in your slice or from a random sample of them. The example below assumes `project` is an existing project object (for example, one returned by `client.get_project`).

# Optional: sample 5 data rows from your slice
# (random.sample needs a sequence, so convert the IDs to a list first)
sampled_data_row_ids = random.sample(list(slice_data_rows_ids), 5)

batch = project.create_batch(
  "test batch", # name of the batch
  sampled_data_row_ids, # list of Data Rows
  1 # priority between 1-5
)
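Two details of the batch example are easy to trip over: `random.sample` raises `ValueError` when asked for more items than the population holds, and repeated runs reuse the name "test batch". The sketch below handles both, under stated assumptions. `sample_ids` and `unique_batch_name` are hypothetical helpers, and the IDs are made-up placeholders.

```python
import random
import uuid
from typing import Iterable, List, Optional


def sample_ids(data_row_ids: Iterable[str], k: int, seed: Optional[int] = None) -> List[str]:
    """Return up to k IDs chosen at random without replacement."""
    ids = list(data_row_ids)  # materialize: random.sample needs a sequence
    rng = random.Random(seed)  # private generator keeps sampling reproducible
    return rng.sample(ids, min(k, len(ids)))


def unique_batch_name(prefix: str = "slice-batch") -> str:
    """Append a random UUID so repeated runs do not reuse a batch name."""
    return f"{prefix}-{uuid.uuid4()}"


ids = [f"data-row-{i}" for i in range(3)]  # placeholder IDs
picked = sample_ids(ids, 5, seed=0)  # only 3 IDs exist, so all 3 come back
print(picked, unique_batch_name())
```

The results can then be passed to `project.create_batch` in place of the literal name and the sampled list.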

You can also add the data rows in your slice to a model run for inference. The example below assumes `model_run` is an existing model run object.

model_run.upsert_data_rows(list(slice_data_rows_ids))