Slices
Slices in Catalog
A slice is a subset of data rows that share a common characteristic. You construct a slice by narrowing down data rows with one or more filters and saving the resulting subset. Users often apply filters to surface high-impact data and then save the result as a slice.
Create a slice
Once you have narrowed down a subset of data rows using one or more filters, click Save slice to save that subset as a slice.
You will be prompted to give your slice a name (3 to 30 characters) and an optional description.
After you create a slice, it appears in your list of Slices in the left side panel of Catalog.
Slice limit
Every org can create up to 100 slices.
Explore a slice
To explore a slice, go to the left side panel of the Catalog, click on Slices, and click on the slice name you want to explore.
The filter associated with the slice is applied immediately, and all data rows matching the filter are displayed.
Once you are done exploring a slice, you can:
- Click on a different slice name to explore a different slice
- Click on a dataset name to explore a dataset
- Click on All data to explore your full Catalog
Update a slice
Over time, you might need to adjust the filter associated with a slice. To do so, navigate to your slice by clicking on Slices, then click on a slice name.
The filter associated with the slice will show up. You can modify, add or delete attributes of the filter. After you update the filter, save your changes by clicking on Update slice.
Labelbox will prompt you to choose between the following:
- Update the slice by associating it with the new filter OR
- Create a new slice associated with the new filter
Automate data curation using slices
Slices are dynamic
Slices are dynamic, meaning the data rows in a slice may change over time. This can happen in two ways:
- New data rows may appear in a slice: If you add new data rows to Catalog, they appear in every slice whose filter they match. A data row can appear in many slices.
- Existing data rows may disappear from a slice: If a data row is deleted from Catalog, or no longer matches the slice's filter, it no longer appears in the slice.
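To make this dynamic behavior concrete, here is a minimal, purely illustrative Python model in which a slice stores a filter rather than a fixed list of rows. `DataRow`, `SavedSlice`, and the `weather` metadata field are hypothetical names for this sketch; it is not how Labelbox implements slices internally.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DataRow:
    uid: str
    metadata: dict = field(default_factory=dict)


@dataclass
class SavedSlice:
    """Hypothetical model: a slice keeps a filter, not a frozen list of rows."""
    name: str
    predicate: Callable[[DataRow], bool]

    def members(self, catalog: Dict[str, DataRow]) -> List[str]:
        # Membership is recomputed against the current catalog on every call,
        # so newly added matching rows appear and deleted or changed rows drop out.
        return sorted(uid for uid, row in catalog.items() if self.predicate(row))


catalog = {
    "dr1": DataRow("dr1", {"weather": "night"}),
    "dr2": DataRow("dr2", {"weather": "day"}),
}
night = SavedSlice("night scenes", lambda r: r.metadata.get("weather") == "night")
print(night.members(catalog))  # ['dr1']

catalog["dr3"] = DataRow("dr3", {"weather": "night"})  # a new matching row arrives
print(night.members(catalog))  # ['dr1', 'dr3']

catalog["dr1"].metadata["weather"] = "day"  # dr1 no longer matches the filter
del catalog["dr3"]  # dr3 is deleted from the catalog
print(night.members(catalog))  # []
```

The design point is that the slice never stores row IDs, so there is nothing to keep in sync: whatever matches the filter at query time is the slice.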
You can use slices to enable automated data curation workflows. For example, here's a workflow for surfacing high-impact data in the Catalog:
- Set up a filter in the Catalog UI. Check that the filter is surfacing the data rows you wish to target.
- Save the filter by creating a slice.
- Use the SDK to upload all incoming data (e.g., a new dataset or new data coming from production) to Catalog.
After you create the slice, any incoming data rows that match the filter will automatically show up in the slice. You can open the slice in Catalog every day, week, or month to explore the incoming high-impact data that automatically surfaced and take action on it.
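If you check a slice on a schedule, it can help to see exactly which data rows surfaced, or dropped out, since the last check. The sketch below uses a hypothetical helper, `diff_slice_membership`, that compares two snapshots of data row IDs with set arithmetic. It assumes you persist the previous snapshot yourself; each snapshot could come from a slice's `get_data_row_ids()` method in the Python SDK.

```python
from typing import Iterable, Set, Tuple


def diff_slice_membership(
    previous_ids: Iterable[str], current_ids: Iterable[str]
) -> Tuple[Set[str], Set[str]]:
    """Return (newly_surfaced, dropped_out) data row IDs between two slice snapshots."""
    prev, curr = set(previous_ids), set(current_ids)
    return curr - prev, prev - curr


# Hypothetical snapshots taken one week apart, e.g. from
# list(catalog_slice.get_data_row_ids()) on each scheduled run.
last_week = ["dr1", "dr2", "dr3"]
this_week = ["dr2", "dr3", "dr4", "dr5"]

new_rows, dropped_rows = diff_slice_membership(last_week, this_week)
print(sorted(new_rows))      # ['dr4', 'dr5']
print(sorted(dropped_rows))  # ['dr1']
```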
Get Catalog slice via SDK
You can retrieve a slice's data rows and all associated information programmatically with our Python SDK. From there, you can use Catalog to visually inspect the data rows you retrieved.
Retrieving a slice through the SDK is a convenient way to curate a new batch or a model run dataset directly from a saved slice. To find a slice's ID:
- Go to Catalog
- Click Slices and select a slice
- Click on the settings icon
- Copy slice ID
```python
import labelbox as lb

client = lb.Client(api_key="<YOUR_API_KEY>")

catalog_slice_id = "<CATALOG_SLICE_ID_FROM_UI>"
catalog_slice = client.get_catalog_slice(catalog_slice_id)  # -> CatalogSlice
print(catalog_slice)  # prints the slice, including the filter it uses
# Example output:
# <CatalogSlice {'created_at': datetime.datetime(2022, 10, 24, 18, 47, 43, 666000, tzinfo=datetime.timezone.utc), 'description': None, 'filter': [{'ids': ['cl6wheen01ucx0y169n8v2m3g'], 'type': 'project', 'operator': 'is'}], 'name': 'test slice', 'uid': 'cl9n4smy906h60yy6cy8f37wb', 'updated_at': datetime.datetime(2022, 10, 24, 18, 47, 43, 666000, tzinfo=datetime.timezone.utc)}>

# Get the IDs of the data rows in the slice
slice_data_rows_ids = catalog_slice.get_data_row_ids()
for data_row_id in slice_data_rows_ids:
    print(client.get_data_row(data_row_id))
```
Curate a batch from a Catalog slice via SDK
Using our Python SDK, you can create a new batch from all the data rows in a slice or from a random sample of them. The Python example below samples five data rows from a slice and sends them to a project as a batch.
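```python
import random

project = client.get_project("<PROJECT_ID>")

# Optional: sample 5 data rows from your slice
# (random.sample needs a sequence, so convert the paginated IDs to a list first)
sampled_data_row_ids = random.sample(list(slice_data_rows_ids), 5)
batch = project.create_batch(
    "test batch",          # name of the batch
    sampled_data_row_ids,  # list of data row IDs
    1                      # priority from 1 (highest) to 5 (lowest)
)
```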
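`random.sample` raises a `ValueError` when asked for more items than the population holds, so sampling 5 data rows from a slice with fewer rows fails. Below is a minimal sketch of a hypothetical helper, `sample_slice_ids`, that caps the sample size at the slice size and optionally takes a seed so the same batch can be reproduced later.

```python
import random
from typing import Iterable, List, Optional


def sample_slice_ids(ids: Iterable[str], k: int, seed: Optional[int] = None) -> List[str]:
    """Sample up to k data row IDs; returns every ID if the slice has fewer than k."""
    pool = list(ids)  # random.sample needs a sequence, not a lazy/paginated collection
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    return rng.sample(pool, min(k, len(pool)))


print(sorted(sample_slice_ids(["dr1", "dr2", "dr3"], 5)))  # ['dr1', 'dr2', 'dr3']
ids = ["dr1", "dr2", "dr3", "dr4"]
print(sample_slice_ids(ids, 2, seed=42) == sample_slice_ids(ids, 2, seed=42))  # True
```

Its result could then be passed as the data rows argument to `project.create_batch` in place of `sampled_data_row_ids`.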
You can also add the data rows in a slice to a model run, for example to run inference on them. See the Python example below to learn how to do this.
```python
model_run = client.get_model_run("<MODEL_RUN_ID>")
model_run.upsert_data_rows(list(slice_data_rows_ids))
```