How to export data from Labelbox, with examples of each export type and details on optional parameters and filters.
Export specifications
Data type | Annotation export formats | Project export | Model run export |
---|---|---|---|
Image | See export formats | See sample | See sample |
Video | See export formats | See sample | See sample |
Text | See export formats | See sample | See sample |
Geospatial | See export formats | See sample | See sample |
Documents | See export formats | See sample | Not supported yet |
Audio | See export formats | See sample | Not supported yet |
Conversational text | See export formats | See sample | Not supported yet |
DICOM | See export formats | See sample | Not supported yet |
HTML | See export formats | See sample | Not supported yet |
There are three ways to export data from Labelbox: export from Catalog, export from a labeling project, and export from a model run.
Required & optional fields
Below is the complete list of required and optional fields supported for exports.
Field | Description | Project export | Model run export | Catalog export |
---|---|---|---|---|
data_row | Contains the basic information of the data row: id, row_data, global_key, and data_row_details (optional, see below) | Always | Always | Always |
data_row_details | Contains additional details of the data row: dataset_id, dataset_name, created_at, updated_at, last_activity_at, and created_by | Optional | Optional | Optional |
media_attributes | See Media attributes | Optional | Optional | Optional |
attachments | See Attachments | Optional | Optional | Optional |
metadata_fields | See Metadata | Optional | Optional | Optional |
projects | Contains the ID of the project in which the data row was labeled. | Always | n/a | Optional |
<project_id> | Contains the following sections, described below: labels and project_details | Always | n/a | Optional |
labels | Contains a list of labels attached to this data row: label_kind, version, id, and annotations | Always | Always | Optional |
label_details | Contains details of each specific label: created_at, updated_at, created_by, and reviews | Optional | n/a | Optional |
performance_details | Contains label-specific performance details: seconds_to_create, seconds_to_review, skipped, benchmark_reference_label, benchmark_score, consensus_score, consensus_label_count, and consensus_labels | Optional | n/a | Optional |
project_details | Contains project-specific information about this data row: ontology_id, task_id, task_name, batch_id, batch_name, workflow_status, priority, selected_label_id, consensus_expected_label_count, and workflow_history | Optional | n/a | Optional |
experiments | Contains the ID of the model experiment(s) in which the data row was stored. | n/a | Always | Optional |
<model_experiment_id> | Contains the following sections, described below: name and runs | n/a | Always | Optional |
name | Name of the model. | n/a | Always | Optional |
runs | Contains the ID of the model run(s) in which the data row was stored. | n/a | Always | Optional |
<model_run_id> | Contains the following sections, described below: name, annotation_group_id, labels, predictions, and split | n/a | Always | Optional |
name | Name of the model run. | n/a | Always | Optional |
run_data_row_id | Model run data row ID, similar to data_row_id but in a model run's context. | n/a | Always | Optional |
labels | Contains a list of the ground truth labels attached to this data row and included in this model run: label_kind, version, id, and annotations | n/a | Always | Always |
predictions | Contains a list of predictions attached to this data row and included in this model run: label_kind, version, id, and annotations | n/a | Optional | Optional |
split | Contains the split the data row belongs to (Training, Validation, or Test). | n/a | Optional | Optional |
Optional parameters and filters
Parameters
When you export data rows from a project, a model run, or Catalog, you can set parameters to include optional fields in the export. The table below lists the parameters available for each type of export.
Parameter | Project export | Model run export | Dataset export (Catalog) | Slice export (Catalog) |
---|---|---|---|---|
attachments | ✔ | ✔ | ✔ | ✔ |
metadata_fields | ✔ | ✔ | ✔ | ✔ |
data_row_details | ✔ | ✔ | ✔ | ✔ |
project_details | ✔ | - | ✔ | ✔ |
label_details | ✔ | - | ✔ | ✔ |
performance_details | ✔ | - | ✔ | ✔ |
interpolated_frames | ✔ | ✔ | ✔ | ✔ |
predictions | - | ✔ | - | - |
project_ids | - | - | ✔ | ✔ |
model_run_ids | - | - | ✔ | ✔ |
For explanations of each field and subfield, see Export v2 glossary. For a detailed explanation of the project_ids and model_run_ids parameters, see Export data rows from Catalog below.
To learn how to apply these parameters, see the sections below that are specific to each export type.
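The parameter table above can be encoded as a lookup so that a script fails fast before it submits an export. This is a minimal sketch: `SUPPORTED_PARAMS` and `build_export_params` are hypothetical helpers, not part of the Labelbox SDK, and the sets simply transcribe the table.

```python
from typing import Dict, Set

# Parameters available for every export type in the table
_COMMON = {"attachments", "metadata_fields", "data_row_details", "interpolated_frames"}
# Project-level detail parameters (not available for model run exports)
_PROJECT_DETAILS = {"project_details", "label_details", "performance_details"}
# Catalog-only parameters that pull in labels/predictions from other projects and model runs
_CATALOG_SCOPE = {"project_ids", "model_run_ids"}

SUPPORTED_PARAMS: Dict[str, Set[str]] = {
    "project": _COMMON | _PROJECT_DETAILS,
    "model_run": _COMMON | {"predictions"},
    "dataset": _COMMON | _PROJECT_DETAILS | _CATALOG_SCOPE,
    "slice": _COMMON | _PROJECT_DETAILS | _CATALOG_SCOPE,
}


def build_export_params(export_type: str, **params) -> dict:
    """Return the params as a dict, raising ValueError if any is unsupported."""
    if export_type not in SUPPORTED_PARAMS:
        raise ValueError(f"Unknown export type: {export_type!r}")
    unsupported = sorted(set(params) - SUPPORTED_PARAMS[export_type])
    if unsupported:
        raise ValueError(f"Not supported for {export_type} export: {unsupported}")
    return dict(params)


# Every key here appears in the project column of the table
export_params = build_export_params("project", attachments=True, label_details=True)
print(export_params)  # {'attachments': True, 'label_details': True}
```

Calling `build_export_params("model_run", label_details=True)` raises a `ValueError`, mirroring the "-" in the model run column.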
Filters
You can use filters to select a subset of data rows to export. The table below lists the filters supported for each type of export. You can apply multiple supported filters to the same export; combined filters apply AND logic, so a data row is exported only if it matches every filter.
Filter | Project export | Model run export | Dataset export (Catalog) | Slice export (Catalog) |
---|---|---|---|---|
last_activity_at | ✔ | - | ✔ | - |
label_created_at | ✔ | - | ✔ | - |
workflow_status | ✔ | - | - | - |
global_keys | ✔ | - | ✔ | - |
data_row_ids | ✔ | - | ✔ | - |
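To make the AND semantics concrete, here is a purely local model over made-up in-memory rows. It does not call Labelbox (the real filtering happens server-side), the row fields are hypothetical, and the inclusive date bounds are an assumption of this sketch.

```python
from datetime import date


def matches_all(row: dict, predicates: list) -> bool:
    """Return True only if every predicate holds for the row (AND logic)."""
    return all(predicate(row) for predicate in predicates)


# Hypothetical rows standing in for data rows in a project
rows = [
    {"id": "a", "workflow_status": "InReview", "last_activity": date(2024, 1, 5)},
    {"id": "b", "workflow_status": "Done", "last_activity": date(2024, 1, 5)},
    {"id": "c", "workflow_status": "InReview", "last_activity": date(2023, 6, 1)},
]

# Roughly analogous to combining workflow_status with a last_activity_at range
# (inclusive bounds are assumed here)
predicates = [
    lambda r: r["workflow_status"] == "InReview",
    lambda r: date(2024, 1, 1) <= r["last_activity"] <= date(2024, 12, 31),
]

selected = [r["id"] for r in rows if matches_all(r, predicates)]
print(selected)  # ['a']
```

Row "b" fails the status filter and row "c" fails the date filter, so adding filters can only narrow the result, never widen it.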
The last_activity_at and label_created_at filters take the structure [<start_date>, <end_date>], and each date can use one of the following formats:
- YYYY-MM-DD
- YYYY-MM-DD hh:mm:ss
- YYYY-MM-DDThh:mm:ss±hhmm (ISO 8601)
The ISO 8601 format allows you to specify the timezone, while the other two formats assume the timezone from the user's workspace settings.
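The two timestamp formats above map directly onto Python `datetime.strftime` patterns. The sketch below picks the ISO 8601 form for timezone-aware datetimes and the plain form for naive ones; `format_filter_date` and `date_range_filter` are hypothetical helpers, not Labelbox SDK functions.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def format_filter_date(dt: datetime) -> str:
    """Aware datetimes -> 'YYYY-MM-DDThh:mm:ss±hhmm'; naive -> 'YYYY-MM-DD hh:mm:ss'."""
    if dt.tzinfo is not None and dt.utcoffset() is not None:
        return dt.strftime("%Y-%m-%dT%H:%M:%S%z")
    # Naive datetimes carry no offset, so the workspace timezone applies
    return dt.strftime("%Y-%m-%d %H:%M:%S")


def date_range_filter(start: datetime, end: datetime) -> List[str]:
    """Build the [<start_date>, <end_date>] pair for last_activity_at or label_created_at."""
    # Comparing a naive with an aware datetime raises TypeError, which is intentional here
    if start > end:
        raise ValueError("start must not be after end")
    return [format_filter_date(start), format_filter_date(end)]


utc_minus_5 = timezone(timedelta(hours=-5))
filters = {
    "label_created_at": date_range_filter(
        datetime(2024, 1, 1, tzinfo=utc_minus_5),
        datetime(2024, 1, 31, 23, 59, 59, tzinfo=utc_minus_5),
    )
}
print(filters)
# {'label_created_at': ['2024-01-01T00:00:00-0500', '2024-01-31T23:59:59-0500']}
```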
Last activity at
The last_activity_at filter captures only the data rows where any of the following have been created or modified within the specified time frame:
- Labels associated with the data row
- Metadata
- Data row status within a project
- Issues or comments on labels associated with the data row
Label created at
The label_created_at filter captures only the data rows where labels have been submitted within the specified time frame.
Workflow status
The workflow_status filter allows you to export only the data rows in a specific status of a project's workflow. The filter accepts the following values:
- ToLabel
- InReview
- InRework
- Done
This filter accepts only one value. For example, filters = {"workflow_status": "InReview"}.
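Because the filter takes exactly one of four values, a small guard can catch typos or accidental lists before an export is submitted. This is a sketch with a hypothetical helper name; the allowed values are copied from the list above.

```python
WORKFLOW_STATUSES = {"ToLabel", "InReview", "InRework", "Done"}


def workflow_status_filter(status: str) -> dict:
    """Build the single-valued workflow_status filter, rejecting anything else."""
    if not isinstance(status, str):
        raise TypeError("workflow_status accepts exactly one status string, not a list")
    if status not in WORKFLOW_STATUSES:
        raise ValueError(
            f"Unknown workflow status {status!r}; expected one of {sorted(WORKFLOW_STATUSES)}"
        )
    return {"workflow_status": status}


filters = workflow_status_filter("InReview")
print(filters)  # {'workflow_status': 'InReview'}
```

Since only one status is accepted per export, one approach to covering several statuses is to run a separate export for each status and combine the results.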
Global keys
The global_keys filter allows you to export only the data rows with the specified global keys within a project or dataset. This filter accepts a list of up to 2,000 values. For example, filters = {"global_keys": ["global_key_1", "global_key_2"]}.
Data row IDs
The data_row_ids filter allows you to export only the data rows with the specified IDs within a project or dataset. This filter accepts a list of up to 2,000 values. For example, filters = {"data_row_ids": ["data_row_id_1", "data_row_id_2"]}.
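Because global_keys and data_row_ids each accept at most 2,000 values, a longer list has to be split across several exports. This is a minimal batching sketch; `chunked_filters` is a hypothetical helper, not an SDK function.

```python
from typing import Iterator, List

MAX_FILTER_VALUES = 2000  # limit stated above for global_keys and data_row_ids


def chunked_filters(key: str, values: List[str], size: int = MAX_FILTER_VALUES) -> Iterator[dict]:
    """Yield filter dicts, each holding at most `size` values for `key`."""
    if key not in ("global_keys", "data_row_ids"):
        raise ValueError("key must be 'global_keys' or 'data_row_ids'")
    for start in range(0, len(values), size):
        yield {key: values[start:start + size]}


keys = [f"global_key_{i}" for i in range(4500)]
batches = list(chunked_filters("global_keys", keys))
print([len(b["global_keys"]) for b in batches])  # [2000, 2000, 500]
```

With `project` and `export_params` defined as in the project export example below, you could run one `project.export_v2(params=export_params, filters=batch)` per batch and concatenate each task's `result`, assuming each result is a list of records.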
See the sections below that are specific to each export type for examples of how to apply these filters in context.
Export data rows from a project
When you export data rows from a project, you can narrow them down by label status, metadata, batch, annotations, and workflow history, and choose which attributes to include in or exclude from the export.
Note that export_v2 is not supported for projects that use a custom editor.
See the table at the top of this page to find the JSON export formats for each data type.
import labelbox as lb
# Connect to Labelbox and fetch the project to export from
client = lb.Client(api_key="<YOUR_API_KEY>")
project = client.get_project("<project_id>")
# Set the export params to include or exclude optional fields
export_params = {
    "attachments": True,
    "metadata_fields": True,
    "data_row_details": True,
    "project_details": True,
    "label_details": True,
    "performance_details": True
}
# Optional filters: date ranges for last_activity_at and label_created_at,
# a single workflow status, and lists of data row IDs or global keys.
# last_activity_at captures the creation and modification of labels, metadata, status, comments, and reviews.
# Note: Combinations of filters apply AND logic.
filters = {
    "last_activity_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
    "label_created_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
    "workflow_status": "InReview",
    "data_row_ids": ["data_row_id_1", "data_row_id_2"],
    "global_keys": ["global_key_1", "global_key_2"]
}
# Start the export and block until it finishes
export_task = project.export_v2(params=export_params, filters=filters)
export_task.wait_till_done()
if export_task.errors:
    print(export_task.errors)
export_json = export_task.result
print("results: ", export_json)
# Older alternative: label_generator() yields the project's labels as Label objects
labels = project.label_generator()
# You can also pass a date range to export only the labels from that period
labels = project.label_generator(start="2020-01-01", end="2020-01-02")
for label in labels:
    print(label.annotations)
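Once `export_json` is available, a common next step is to summarize it. The sketch below assumes the nested layout described in the fields table at the top of this page (each record has a data_row object and a projects object keyed by project ID, whose labels list holds the labels); the sample records are made up and `label_counts` is a hypothetical helper.

```python
from typing import Dict, List


def label_counts(export_rows: List[dict], project_id: str) -> Dict[str, int]:
    """Map each data row ID to the number of labels it has in `project_id`."""
    counts = {}
    for row in export_rows:
        # Missing projects or labels sections are treated as zero labels
        project = row.get("projects", {}).get(project_id, {})
        counts[row["data_row"]["id"]] = len(project.get("labels", []))
    return counts


# Made-up records shaped like the fields table describes
sample = [
    {"data_row": {"id": "dr_1"}, "projects": {"proj_a": {"labels": [{"id": "l1"}, {"id": "l2"}]}}},
    {"data_row": {"id": "dr_2"}, "projects": {"proj_a": {"labels": []}}},
]
print(label_counts(sample, "proj_a"))  # {'dr_1': 2, 'dr_2': 0}
```

Against a real export you would call `label_counts(export_json, "<project_id>")`.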
Export data rows from Catalog
You can export data rows and all their information from a Dataset or a Catalog Slice.
When exporting from Catalog, you can include information about a data row from all projects and model runs to which it belongs. Specifically, for the selected data rows, you can export the labels from multiple projects and/or the predictions from multiple model runs.
As shown below, the project_ids and model_run_ids parameters accept a list of IDs.
See the table at the top of this page to find the JSON export formats for each data type.
Export from a dataset
# Assumes an authenticated client, e.g. client = lb.Client(api_key="<YOUR_API_KEY>")
# Set the export params to include or exclude optional fields.
# project_ids and model_run_ids add labels and predictions from those projects and model runs.
export_params = {
    "attachments": True,
    "metadata_fields": True,
    "data_row_details": True,
    "project_details": True,
    "label_details": True,
    "performance_details": True,
    "project_ids": ["<project_id_1>", "<project_id_2>"],
    "model_run_ids": ["<model_run_id_1>", "<model_run_id_2>"]
}
# You can set a date range for last_activity_at.
# For context, last_activity_at captures the creation and modification of labels, metadata, status, comments, and reviews.
# Note: Filters combine with AND logic, so a data row must match every filter to be exported.
filters = {
    "last_activity_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"]
}
dataset = client.get_dataset("<dataset_id>")
export_task = dataset.export_v2(params=export_params, filters=filters)
export_task.wait_till_done()
if export_task.errors:
    print(export_task.errors)
export_json = export_task.result
print("results: ", export_json)
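To keep an export for later processing, one option is to write the records as newline-delimited JSON (one object per line). This sketch assumes `export_json` is a list of JSON-serializable dicts; `write_ndjson` and `read_ndjson` are hypothetical helpers, and the demo uses made-up records in a temporary directory.

```python
import json
import os
import tempfile
from typing import Iterable, List


def write_ndjson(rows: Iterable[dict], path: str) -> int:
    """Write one JSON object per line and return the number of rows written."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
            count += 1
    return count


def read_ndjson(path: str) -> List[dict]:
    """Read the file back, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


sample = [
    {"data_row": {"id": "dr_1", "global_key": "k1"}},
    {"data_row": {"id": "dr_2", "global_key": "k2"}},
]
path = os.path.join(tempfile.mkdtemp(), "dataset_export.ndjson")
n = write_ndjson(sample, path)
print(n)  # 2
```

With a real export, `write_ndjson(export_json, "dataset_export.ndjson")` would persist the dataset results.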
Export a list of selected data rows from a dataset
# Set the export params to include or exclude optional fields
export_params = {
    "attachments": True,
    "metadata_fields": True,
    "data_row_details": True,
    "project_details": True,
    "label_details": True,
    "performance_details": True,
    "project_ids": ["<project_id_1>", "<project_id_2>"],
    "model_run_ids": ["<model_run_id_1>", "<model_run_id_2>"]
}
# Only data rows matching these filters are exported. Filters combine with AND logic,
# so a data row must appear in both lists; in practice you usually supply only one of them.
filters = {
    "data_row_ids": ["data_row_id_1", "data_row_id_2"],
    "global_keys": ["global_key_1", "global_key_2"]
}
# Reuses the dataset fetched in the previous example
export_task = dataset.export_v2(params=export_params, filters=filters)
export_task.wait_till_done()
if export_task.errors:
    print(export_task.errors)
export_json = export_task.result
print("results: ", export_json)
Export from a slice
# Set the export params to include or exclude optional fields
export_params = {
    "attachments": True,
    "metadata_fields": True,
    "data_row_details": True,
    "project_details": True,
    "label_details": True,
    "performance_details": True,
    "project_ids": ["<project_id_1>", "<project_id_2>"],
    "model_run_ids": ["<model_run_id_1>", "<model_run_id_2>"]
}
catalog_slice = client.get_catalog_slice("<slice_id>")
export_task = catalog_slice.export_v2(params=export_params)
export_task.wait_till_done()
if export_task.errors:
    print(export_task.errors)
export_json = export_task.result
print("results: ", export_json)
Export data rows from a model run
Export V2 is the recommended format. See Export from model runs for more details.
See the table at the top of this page to find the JSON export formats for each data type.
# Assumes an authenticated client, e.g. client = lb.Client(api_key="<YOUR_API_KEY>")
model_run = client.get_model_run("<model_run_id>")
# Set the export params to include or exclude optional fields
export_params = {
    "attachments": True,
    "metadata_fields": True,
    "data_row_details": True
}
export_task = model_run.export_v2(params=export_params)
export_task.wait_till_done()
if export_task.errors:
    print(export_task.errors)
export_json = export_task.result
# Older alternative (not recommended): export_labels requires the SDK's experimental mode
client.enable_experimental = True
# If download=False, this returns the URL of the data file containing this model run's labels
download = False
model_run.export_labels(download=download)
# If download=True, this instead returns the contents in NDJSON format
download = True
model_run.export_labels(download=download)
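For Export v2 results from a model run, the fields table places each data row's split under experiments, then the experiment ID, then runs, then the model run ID. The sketch below groups data row IDs by split under that assumed layout; `rows_by_split` is a hypothetical helper, the sample records are made up, and the exact split strings in a real export are an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def rows_by_split(
    export_rows: List[dict], experiment_id: str, model_run_id: str
) -> Dict[Optional[str], List[str]]:
    """Group data row IDs by split in one model run; rows without a split go under None."""
    groups = defaultdict(list)
    for row in export_rows:
        run = (
            row.get("experiments", {})
            .get(experiment_id, {})
            .get("runs", {})
            .get(model_run_id, {})
        )
        groups[run.get("split")].append(row["data_row"]["id"])
    return dict(groups)


# Made-up records following the nesting described in the fields table
sample = [
    {"data_row": {"id": "dr_1"}, "experiments": {"exp_1": {"runs": {"run_1": {"split": "Training"}}}}},
    {"data_row": {"id": "dr_2"}, "experiments": {"exp_1": {"runs": {"run_1": {"split": "Validation"}}}}},
    {"data_row": {"id": "dr_3"}, "experiments": {"exp_1": {"runs": {"run_1": {"split": "Training"}}}}},
]
print(rows_by_split(sample, "exp_1", "run_1"))
# {'Training': ['dr_1', 'dr_3'], 'Validation': ['dr_2']}
```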