Export overview - Labelbox

To export data via the SDK, use export(), a scalable and efficient method that allows streaming of unlimited data rows while providing a unique task object for tracking progress.

export_v2() is deprecated

The previously available export_v2() method has been deprecated and will be removed in version 7.0 of the SDK. If you’re currently using export_v2(), we strongly encourage you to switch to export() for its enhanced streamable implementation.

To learn how to export data rows from a project via the app UI, visit Export labels from project.

Export specifications

Data type	Annotation export formats	Project export	Model run export
Image	See export formats	See sample	See sample
Video	See export formats	See sample	See sample
Text	See export formats	See sample	See sample
Geospatial	See export formats	See sample	See sample
Documents	See export formats	See sample	Not supported yet
Audio	See export formats	See sample	Not supported yet
Conversational text	See export formats	See sample	Not supported yet
HTML	See export formats	See sample	Not supported yet

There are three ways to export data from Labelbox: export from Catalog, export from a labeling project, and export from a model run.

Required & optional fields

Below is the complete list of required and optional fields supported for exports.

Field	Description	Project export	Model run export	Catalog export
`data_row`	Contains the basic information of the data row: - `id` - `row_data` - `global_key` - `data_row_details` (optional, see below)	Always	Always	Always
`data_row_details`	Contains additional details of the data row: - `dataset_id` - `dataset_name` - `created_at` - `updated_at` - `last_activity_at` - `created_by`	Optional	Optional	Optional
`media_attributes`	See Media attributes	Optional	Optional	Optional
`attachments`	See Attachments	Optional	Optional	Optional
`metadata_fields`	See Metadata	Optional	Optional	Optional
`embeddings`	Contains a list of dictionaries with precomputed and custom embeddings	Optional	Optional	Optional
`projects`	Contains the ID of the project in which the data row was labeled.	Always	n/a	Optional
`<project_id>`	Contains the following sections, which are expanded on below: - `labels` - `project_details`	Always	n/a	Optional
`labels`	Contains a list of labels attached to this data row: - `label_kind` - `version` - `id` - `annotations`	Always	Always	Optional
`label_details`	Contains details of each specific label: - `created_at` - `updated_at` - `created_by` - `reviews`	Optional	n/a	Optional
`performance_details`	Contains label-specific performance details: - `seconds_to_create` - `seconds_to_review` - `skipped` - `performance_details_v2`, which contains: - `seconds_to_create` - `seconds_to_review` - `seconds_to_rework` - `seconds_total`	Optional	n/a	Optional
`project_details`	Contains project-specific information about this data row: - `ontology_id` - `task_id` - `task_name` - `batch_id` - `batch_name` - `workflow_status` - `priority` - `selected_label_id` - `consensus_expected_label_count` - `workflow_history`	Optional	n/a	Optional
`project_tags`	See Project tags	Always	n/a	n/a
`experiments`	Contains the ID of the model experiment(s) in which the data row was stored.	n/a	Always	Optional
`<model_experiment_id>`	Contains the following sections, which are expanded on below: - `name` - `runs`	n/a	Always	Optional
`name`	Name of the model.	n/a	Always	Optional
`runs`	Contains the ID of the model run(s) in which the data row was stored.	n/a	Always	Optional
`<model_run_id>`	Contains the following sections, which are expanded on below: - `name` - `annotation_group_id` - `labels` - `predictions` - `split`	n/a	Always	Optional
`name`	Name of the model run.	n/a	Always	Optional
`run_data_row_id`	Model run data row ID, similar to `data_row_id` but in a model run’s context.	n/a	Always	Optional
`labels`	Contains a list of the ground truth labels attached to this data row and included in this model run: - `label_kind` - `version` - `id` - `annotations`	n/a	Always	Always
`predictions`	Contains a list of predictions attached to this data row and included in this model run: - `label_kind` - `version` - `id` - `annotations`	n/a	Optional	Optional
`split`	Contains the split the data row belongs to (either `Training`, `Validation`, or `Test`).	n/a	Optional	Optional
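
As a rough illustration of how the fields above nest, the sketch below walks one exported data row and collects the label IDs for each project. The sample dictionary is hand-written for illustration (its IDs and values are placeholders, not real export output), and `label_ids_by_project` is a hypothetical helper name, not part of the SDK.

```python
from typing import Any, Dict, List

# Hand-written sample shaped like the table above; all values are placeholders.
sample_row: Dict[str, Any] = {
    "data_row": {
        "id": "dr_1",
        "row_data": "https://example.com/a.jpg",
        "global_key": "gk_1",
    },
    "projects": {
        "project_1": {
            "labels": [
                {"label_kind": "Default", "version": "1.0.0", "id": "label_1", "annotations": {}},
                {"label_kind": "Default", "version": "1.0.0", "id": "label_2", "annotations": {}},
            ],
        },
    },
}


def label_ids_by_project(row: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each project ID in an exported row to the IDs of its labels."""
    # `projects` is optional for Catalog exports, so default to an empty dict.
    projects = row.get("projects", {})
    return {
        project_id: [label["id"] for label in project.get("labels", [])]
        for project_id, project in projects.items()
    }


print(label_ids_by_project(sample_row))
```

Using `.get()` with defaults keeps the helper usable on rows where the optional sections are absent.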

Optional parameters and filters

Parameters

When you export data rows from a project, a model run, or Catalog, you can set parameters to include optional fields in the exports. The table below lists the parameters available for each type of export.

Parameter	Project export	Model run export	Dataset export (Catalog)	Slice export (Catalog)
`attachments`	✔	✔	✔	✔
`metadata_fields`	✔	✔	✔	✔
`embeddings`	✔	✔	✔	✔
`data_row_details`	✔	✔	✔	✔
`project_details`	✔	-	✔	✔
`label_details`	✔	-	✔	✔
`performance_details`	✔	-	✔	✔
`interpolated_frames`	✔	✔	✔	✔
`predictions`	-	✔	-	-
`model_run_details`	-	✔	-	-
`model_run_ids`	-	-	✔	✔
`project_ids`	-	-	✔	✔
`all_projects`	-	-	✔	✔
`all_model_runs`	-	-	✔	✔

For explanations of each field and subfield, see Export glossary. For a detailed explanation of the project_ids, model_run_ids, all_projects, and all_model_runs parameters, see Export data rows from Catalog below. To learn how to apply these filters, see the below sections specific to each export type.
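
One way to catch a misplaced parameter before starting an export is to check the parameter dictionary against the table above. The sketch below transcribes that table into sets. `SUPPORTED_PARAMS` and `unsupported_params` are hypothetical names, and the check runs locally without calling the SDK.

```python
from typing import Dict, Set

# Parameters per export type, transcribed from the table above.
COMMON = {"attachments", "metadata_fields", "embeddings",
          "data_row_details", "interpolated_frames"}
LABEL_PARAMS = {"project_details", "label_details", "performance_details"}
CATALOG_ONLY = {"model_run_ids", "project_ids", "all_projects", "all_model_runs"}

SUPPORTED_PARAMS: Dict[str, Set[str]] = {
    "project": COMMON | LABEL_PARAMS,
    "model_run": COMMON | {"predictions", "model_run_details"},
    "dataset": COMMON | LABEL_PARAMS | CATALOG_ONLY,
    "slice": COMMON | LABEL_PARAMS | CATALOG_ONLY,
}


def unsupported_params(export_type: str, params: dict) -> Set[str]:
    """Return the keys in `params` that the table does not list for `export_type`."""
    return set(params) - SUPPORTED_PARAMS[export_type]


# `predictions` is listed for model run exports only, so it is flagged here.
print(unsupported_params("project", {"attachments": True, "predictions": True}))
```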

Filters

You can use filters to select a subset of data rows to export. The table below contains the filters supported for each export type. You can apply multiple supported filters to the same export. Combinations of filters apply AND operator logic.

Filter	Project export	Model run export	Dataset export (Catalog)	Slice export (Catalog)
`last_activity_at`	✔	-	✔	-
`label_created_at`	✔	-	✔	-
`workflow_status`	✔	-	-	-
`batch_ids`	✔	-	-	-
`global_keys`	✔	-	✔	-
`data_row_ids`	✔	-	✔	-

The last_activity_at and label_created_at filters take the structure of [<start_date>, <end_date>] and can have the following formats:

YYYY-MM-DD (an alias of YYYY-MM-DD 00:00:00)
YYYY-MM-DD hh:mm:ss
YYYY-MM-DDThh:mm:ss±hhmm (ISO 8601)
None

The ISO 8601 format allows you to specify the timezone, while the other two formats assume the timezone from the user’s workspace settings.
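
To make the timezone explicit, the date range can be built from timezone-aware `datetime` objects. The following is a minimal sketch that formats both ends as `YYYY-MM-DDThh:mm:ss±hhmm`. The helper names `to_filter_timestamp` and `date_range_filter` are hypothetical, and the dates are arbitrary examples.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def to_filter_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as YYYY-MM-DDThh:mm:ss±hhmm."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.strftime("%Y-%m-%dT%H:%M:%S%z")


def date_range_filter(start: datetime, end: datetime) -> List[str]:
    """Build the [<start_date>, <end_date>] list used by the date filters."""
    if start > end:
        raise ValueError("start must not be after end")
    return [to_filter_timestamp(start), to_filter_timestamp(end)]


utc_plus_2 = timezone(timedelta(hours=2))
filters = {
    "last_activity_at": date_range_filter(
        datetime(2024, 1, 1, tzinfo=utc_plus_2),
        datetime(2024, 2, 1, tzinfo=utc_plus_2),
    )
}
print(filters)
```

The same list shape applies to `label_created_at`.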

Last activity at

The last_activity_at filter captures only the data rows where the following changes have been made in the specified time frame:

A data row’s data (rowData), external ID (externalId), or global key (globalKey) is updated
Changes are made to annotations, attachments, embeddings, or metadata
Data rows are added to batches
Data row labeling tasks change
Labels, reviews, comments, or issues are added to a project containing the data row

Data rows in multiple projects update last_activity_at when such changes occur in any project containing the data rows.

Label created at

The label_created_at filter captures only the data rows where labels have been created in the specified time frame.

Workflow status

The workflow_status filter allows you to export only the data rows in a specific status of a project’s workflow. The filter accepts the following values:

ToLabel
InReview
InRework
Done

This filter only accepts one value. For example, filters = {"workflow_status": "InReview"}.
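
Because the filter takes a single string, a small guard can reject typos or lists before the export starts. This is only a sketch: `workflow_status_filter` is a hypothetical helper, and the allowed values are copied from the list above.

```python
from typing import Dict

# Values accepted by the workflow_status filter, as listed above.
WORKFLOW_STATUSES = ("ToLabel", "InReview", "InRework", "Done")


def workflow_status_filter(status: str) -> Dict[str, str]:
    """Return a filters dict for one workflow status, or raise on an unknown value."""
    if not isinstance(status, str) or status not in WORKFLOW_STATUSES:
        raise ValueError(f"unsupported workflow_status: {status!r}")
    return {"workflow_status": status}


print(workflow_status_filter("InReview"))
```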

Batch IDs

The batch_ids filter allows you to export only the data rows in a specific batch or batches. This filter accepts a list of batch IDs. For example, filters = {"batch_ids": ["batch_id_1", "batch_id_2"]}. To get the batches sent to a project and their associated information, you can use the project.batches() method. For more information, see Get the batches.

Global keys

The global_keys filter allows you to export only the data rows with the specified global keys within a project or dataset. This filter accepts a list containing up to 2,000 values. For example, filters = {"global_keys": ["global_key_1", "global_key_2"]}.

Data row IDs

The data_row_ids filter allows you to export only the data rows with the specified IDs within a project or dataset. This filter accepts a list containing up to 2,000 values. For example, filters = {"data_row_ids": ["data_row_id_1", "data_row_id_2"]}.
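
If you have more than 2,000 global keys or data row IDs, one possible approach is to split them into chunks and run one export per chunk. The sketch below shows only the chunking step. Running one export per chunk is an assumption about how you might work within the limit, and `chunked_filters` is a hypothetical helper name.

```python
from typing import Dict, Iterator, List, Sequence

MAX_VALUES_PER_FILTER = 2000  # limit stated above for global_keys and data_row_ids


def chunked_filters(
    filter_name: str,
    values: Sequence[str],
    size: int = MAX_VALUES_PER_FILTER,
) -> Iterator[Dict[str, List[str]]]:
    """Yield one filters dict per chunk of at most `size` values."""
    for start in range(0, len(values), size):
        yield {filter_name: list(values[start:start + size])}


keys = [f"global_key_{i}" for i in range(4500)]
batches = list(chunked_filters("global_keys", keys))
print([len(b["global_keys"]) for b in batches])  # [2000, 2000, 500]
```

Each yielded dictionary could then be passed as the `filters` argument of a separate export call.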

Streamable exports

Streamable exports (compatible with SDK versions 3.56 and above) allow you to get real-time data flow and updates using any of the following streamable export methods:

data_row.export()
dataset.export()
model_run.export()
project.export()
slice.export()

The return type of these methods is an object of the class ExportTask. This class serves as a wrapper around a Task object, so most of the features present in the Task class are also available in the ExportTask class. ExportTask supports the following methods and properties from Task:

uid
deleted
wait_till_done
completion_percentage
created_at
name
status
type
updated_at
get_task
organization
created_by

Creating an ExportTask instance

An instance of an ExportTask can be obtained via the export() method on the classes mentioned above, or by executing the following:

# `task_id` must be the ID of an export task
export_task = lb.ExportTask.get_task(client, task_id)
export_task.wait_till_done()

Checking for results and errors

To check if a task has a result/errors, the following methods can be executed:

if not export_task.has_result():
  print("no results")

if export_task.has_errors():
  print("there are errors")

# These methods raise an ExportTask.TaskNotReadyException if the task is in neither a COMPLETE nor a FAILED state.
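
A minimal sketch of guarding those calls, assuming the task exposes `status`, `has_result()`, and `has_errors()` as described above. `FakeExportTask` and `summarize` are hypothetical names, and the fake class exists only so the example runs without a Labelbox connection. The `IN_PROGRESS` status string is a placeholder.

```python
class FakeExportTask:
    """Stand-in for illustration only; it is not part of the Labelbox SDK."""

    def __init__(self, status: str, result: bool = False, errors: bool = False):
        self.status = status
        self._result = result
        self._errors = errors

    def has_result(self) -> bool:
        return self._result

    def has_errors(self) -> bool:
        return self._errors


def summarize(task) -> str:
    """Describe a task using only status, has_result(), and has_errors()."""
    # Check the state first so the not-ready exception is never triggered.
    if task.status not in ("COMPLETE", "FAILED"):
        return "not ready"
    parts = ["has results" if task.has_result() else "no results"]
    if task.has_errors():
        parts.append("has errors")
    return ", ".join(parts)


print(summarize(FakeExportTask("IN_PROGRESS")))
print(summarize(FakeExportTask("COMPLETE", result=True)))
print(summarize(FakeExportTask("FAILED", errors=True)))
```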

Streaming results

To stream the results of exported data rows:

# Start export from project
export_task = project.export()
export_task.wait_till_done()

# Stream the export using a callback function
def json_stream_handler(output: lb.BufferedJsonConverterOutput):
  print(output.json)

export_task.get_buffered_stream(stream_type=lb.StreamType.RESULT).start(stream_handler=json_stream_handler)

# Collect all exported data into a list
export_json = [data_row.json for data_row in export_task.get_buffered_stream()]

Simplified usage

For fine-grained control over the streaming process, you can use a for loop to iterate through the converted items in the stream. This allows you to implement custom streaming logic, process partial results, or apply additional filtering.

export_task = project.export()
export_task.wait_till_done()

for data_row in export_task.get_buffered_stream():
  print(data_row.json)

Start streaming at an offset or line

You can define a particular offset to initiate streaming. In the given example, the stream will start from offset 25,548.

export_task.get_buffered_stream().with_offset(25548).start(stream_handler=json_stream_handler)

Note:

Selecting an arbitrary offset may position the stream in the middle of a JSON string. This is acceptable, and the effect is visible in the output as soon as streaming starts.

Likewise, a specific line can be specified. In the following example, the stream will skip the first 348 lines and start with the 349th line, where a single JSON string is considered a line.

export_task.get_buffered_stream().with_line(348).start(stream_handler=json_stream_handler)

Note:

Offsets and lines are indexed starting from 0, so with_line(3) starts streaming from the 4th line.

The offset passed to with_offset() cannot exceed the total file size, and the line passed to with_line() cannot exceed the total number of lines; otherwise, a ValueError exception is raised.
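
The difference between starting at a line and starting at an offset can be seen on a small newline-delimited JSON string. The sketch below is a toy model in plain Python, not the SDK's implementation: `from_line` and `from_offset` are hypothetical helpers, and this model rejects values at or beyond the end, which may differ from the SDK's exact boundary behavior.

```python
import json

# Three records in newline-delimited JSON, one JSON string per line.
records = [{"id": "a"}, {"id": "b"}, {"id": "c"}]
ndjson = "\n".join(json.dumps(record) for record in records)


def from_line(text: str, line: int) -> str:
    """Return the text starting at a 0-indexed line."""
    lines = text.split("\n")
    if line >= len(lines):
        raise ValueError("line exceeds total number of lines")
    return "\n".join(lines[line:])


def from_offset(text: str, offset: int) -> str:
    """Return the text starting at a 0-indexed character offset."""
    if offset >= len(text):
        raise ValueError("offset exceeds total size")
    return text[offset:]


print(from_line(ndjson, 1))    # begins with the second record
print(from_offset(ndjson, 5))  # begins partway through the first record
```

Starting at a line always yields whole JSON strings, while an offset can land inside one, which is the situation the note above describes.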

Print output size

ExportTask has two methods to output the total size of the exported file and the total number of lines it contains:

total_size = export_task.get_total_file_size(lb.StreamType.RESULT)
total_lines = export_task.get_total_lines(lb.StreamType.ERRORS)

Save export results and log errors

You can store export results in a JSON file and log any errors for monitoring or further processing, like the following example:

import json
import logging

EXPORT_FILE_PATH = "export.json"  # placeholder output path

try:
    export_task = project.export()
    export_task.wait_till_done()

    # Retrieve and save the export data
    export_data = list(export_task.result)
    with open(EXPORT_FILE_PATH, "w") as file:
        json.dump(export_data, file, indent=4)
    print(f"Exported data saved to {EXPORT_FILE_PATH}")

except Exception as e:
    # Log the error for monitoring
    error_message = f"Error during export: {str(e)}"
    logging.error(error_message)
    print("An error occurred. Check the log file for details.")
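
For large exports, an alternative to building one list in memory is to write each item as its own line while iterating. This is a sketch under assumptions: `write_ndjson` is a hypothetical helper, it expects items with a `.json` attribute like those yielded by `get_buffered_stream()`, and the stand-in items below replace a real stream so the example runs offline.

```python
import json
import os
import tempfile
from types import SimpleNamespace
from typing import Any, Iterable


def write_ndjson(items: Iterable[Any], path: str) -> int:
    """Write each item's `.json` payload as one line; return the line count."""
    count = 0
    with open(path, "w", encoding="utf-8") as file:
        for item in items:
            payload = item.json
            # Accept either a parsed object or an already-serialized string.
            line = payload if isinstance(payload, str) else json.dumps(payload)
            file.write(line + "\n")
            count += 1
    return count


# Stand-in items for illustration; with the SDK you would pass
# export_task.get_buffered_stream() instead.
fake_stream = [SimpleNamespace(json={"data_row": {"id": f"dr_{i}"}}) for i in range(3)]
out_path = os.path.join(tempfile.mkdtemp(), "export.ndjson")
print(write_ndjson(fake_stream, out_path))  # 3
```

Writing one JSON string per line also matches the line-based layout that the offset and line options above operate on.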

Cancel export tasks

You can cancel an ongoing export task before it completes, like the following example:

# Cancel the task before it completes
success = client.cancel_task(export_task.uid)
assert success is True

# Verify the task was cancelled
cancelled_task = client.get_task_by_id(export_task.uid)
assert cancelled_task.status in ["CANCELING", "CANCELED"]

Export data rows from a project

When you export data rows from a project, you can narrow down your data rows by label status, metadata, batch, annotations, and workflow history. You can also choose to include or exclude certain attributes in your export. See the table at the top of this page to find the JSON export formats for each data type.

Export from a project

# The return type of this method is an `ExportTask`, which is a wrapper of a `Task`.
# Most of `Task` features are also present in `ExportTask`.
export_params = {
  "attachments": True,
  "metadata_fields": True,
  "data_row_details": True,
  "project_details": True,
  "label_details": True,
  "performance_details": True,
  "interpolated_frames": True
}

# Note: Filters follow AND logic, so typically using one filter is sufficient.

filters = {
  "last_activity_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
  "label_created_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
  "workflow_status": "InReview",
  "batch_ids": ["batch_id_1", "batch_id_2"],
  "data_row_ids": ["data_row_id_1", "data_row_id_2"],
  "global_keys": ["global_key_1", "global_key_2"]
}

export_task = project.export(params=export_params, filters=filters)
export_task.wait_till_done()

# Stream the export using a callback function
def json_stream_handler(output: lb.BufferedJsonConverterOutput):
  print(output.json)

export_task.get_buffered_stream(stream_type=lb.StreamType.RESULT).start(stream_handler=json_stream_handler)

# Collect all exported data into a list
export_json = [data_row.json for data_row in export_task.get_buffered_stream()]

print("file size: ", export_task.get_total_file_size(stream_type=lb.StreamType.RESULT))
print("line count: ", export_task.get_total_lines(stream_type=lb.StreamType.RESULT))

Export data rows from Catalog

You can export data rows and all their information from a Dataset or a Catalog Slice. When exporting from Catalog, you can include information about a data row from all projects and model runs to which it belongs. Specifically, you can export the labels from multiple projects and/or the predictions from multiple model runs for the selected data rows. You can use the all_projects and all_model_runs parameters to get information from all projects and model runs attached to your data row. As shown below, the project_ids and model_run_ids parameters accept a list of IDs. See the table at the top of this page to find the JSON export formats for each data type.

Export from a dataset

export_params= {
  "attachments": True,
  "metadata_fields": True,
  "data_row_details": True,
  "project_details": True,
  "label_details": True,
  "performance_details": True,
  "interpolated_frames": True,
  "all_projects": True,
  "all_model_runs": True,
  "project_ids": ["<project_id_1>", "<project_id_2>"],
  "model_run_ids": ["<model_run_id_1>", "<model_run_id_2>"]
}

# Note: Filters follow AND logic, so typically using one filter is sufficient.
filters= {
  "last_activity_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"]
}

dataset = client.get_dataset("<dataset_id>")
export_task = dataset.export(params=export_params, filters=filters)
export_task.wait_till_done()

# Stream the export using a callback function
def json_stream_handler(output: lb.BufferedJsonConverterOutput):
  print(output.json)

export_task.get_buffered_stream(stream_type=lb.StreamType.RESULT).start(stream_handler=json_stream_handler)

# Collect all exported data into a list
export_json = [data_row.json for data_row in export_task.get_buffered_stream()]

print("file size: ", export_task.get_total_file_size(stream_type=lb.StreamType.RESULT))
print("line count: ", export_task.get_total_lines(stream_type=lb.StreamType.RESULT))

Export a list of selected data rows from a dataset

export_params= {
  "attachments": True,
  "metadata_fields": True,
  "data_row_details": True,
  "project_details": True,
  "label_details": True,
  "performance_details": True,
  "interpolated_frames": True,
  "all_projects": True,
  "all_model_runs": True,
  "project_ids": ["<project_id_1>", "<project_id_2>"],
  "model_run_ids": ["<model_run_id_1>", "<model_run_id_2>"]
}

# Note: Filters follow AND logic, so typically using one filter is sufficient.

filters = {
  "data_row_ids": ["data_row_id_1", "data_row_id_2"],
  "global_keys": ["global_key_1", "global_key_2"]
}

export_task = dataset.export(params=export_params, filters=filters)
export_task.wait_till_done()

# Stream the export using a callback function
def json_stream_handler(output: lb.BufferedJsonConverterOutput):
  print(output.json)

export_task.get_buffered_stream(stream_type=lb.StreamType.RESULT).start(stream_handler=json_stream_handler)

# Collect all exported data into a list
export_json = [data_row.json for data_row in export_task.get_buffered_stream()]

print("file size: ", export_task.get_total_file_size(stream_type=lb.StreamType.RESULT))
print("line count: ", export_task.get_total_lines(stream_type=lb.StreamType.RESULT))

Export from a slice

# Set the export params to include/exclude certain fields.
export_params = {
  "attachments": True,
  "metadata_fields": True,
  "data_row_details": True,
  "project_details": True,
  "label_details": True,
  "performance_details": True,
  "interpolated_frames": True,
  "all_projects": True,
  "all_model_runs": True,
  "project_ids": ["<project_id_1>", "<project_id_2>"],
  "model_run_ids": ["<model_run_id_1>", "<model_run_id_2>"]
}

catalog_slice = client.get_catalog_slice("<slice_id>")
export_task = catalog_slice.export(params=export_params)
export_task.wait_till_done()

# Stream the export using a callback function
def json_stream_handler(output: lb.BufferedJsonConverterOutput):
  print(output.json)

export_task.get_buffered_stream(stream_type=lb.StreamType.RESULT).start(stream_handler=json_stream_handler)

# Collect all exported data into a list
export_json = [data_row.json for data_row in export_task.get_buffered_stream()]

print("file size: ", export_task.get_total_file_size(stream_type=lb.StreamType.RESULT))
print("line count: ", export_task.get_total_lines(stream_type=lb.StreamType.RESULT))

Export your entire Catalog

# Set the export params to include/exclude certain fields.
export_params = {
  "attachments": True,
  "embeddings": True,
  "metadata_fields": True,
  "data_row_details": True,
  "project_details": True,
  "label_details": True,
  "performance_details": True,
  "interpolated_frames": True,
  "all_projects": True,
  "all_model_runs": True,
  "project_ids": ["<project_id_1>", "<project_id_2>"],
  "model_run_ids": ["<model_run_id_1>", "<model_run_id_2>"]
}

catalog = client.get_catalog()
export_task = catalog.export(params=export_params)
export_task.wait_till_done()

# Stream the export using a callback function
def json_stream_handler(output: lb.BufferedJsonConverterOutput):
  print(output.json)

export_task.get_buffered_stream(stream_type=lb.StreamType.RESULT).start(stream_handler=json_stream_handler)

# Collect all exported data into a list
export_json = [data_row.json for data_row in export_task.get_buffered_stream()]

print("file size: ", export_task.get_total_file_size(stream_type=lb.StreamType.RESULT))
print("line count: ", export_task.get_total_lines(stream_type=lb.StreamType.RESULT))

Export data rows from a model run

See the table at the top of this page to find the JSON export formats for each data type.

# Set the export params to include/exclude certain fields.
export_params = {
  "attachments": True,
  "metadata_fields": True,
  "data_row_details": True,
  "interpolated_frames": True,
  "predictions": True
}

model_run = client.get_model_run("<model_run_id>")
export_task = model_run.export(params=export_params)
export_task.wait_till_done()

# Stream the export using a callback function
def json_stream_handler(output: lb.BufferedJsonConverterOutput):
  print(output.json)

export_task.get_buffered_stream(stream_type=lb.StreamType.RESULT).start(stream_handler=json_stream_handler)

# Collect all exported data into a list
export_json = [data_row.json for data_row in export_task.get_buffered_stream()]

print("file size: ", export_task.get_total_file_size(stream_type=lb.StreamType.RESULT))
print("line count: ", export_task.get_total_lines(stream_type=lb.StreamType.RESULT))
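
Once a model run export is collected, the nesting described in the fields table (`experiments`, then the experiment ID, then `runs`, then the run ID) can be walked as sketched below. The sample row is hand-written with placeholder values, and `run_summaries` is a hypothetical helper, not part of the SDK.

```python
from typing import Any, Dict, Iterator, Optional, Tuple

# Hand-written sample shaped like the fields table; all values are placeholders.
sample_row: Dict[str, Any] = {
    "data_row": {"id": "dr_1", "row_data": "https://example.com/a.jpg", "global_key": "gk_1"},
    "experiments": {
        "experiment_1": {
            "name": "my-model",
            "runs": {
                "run_1": {
                    "name": "run-a",
                    "split": "Training",
                    "labels": [{"id": "label_1", "annotations": {}}],
                    "predictions": [
                        {"id": "pred_1", "annotations": {}},
                        {"id": "pred_2", "annotations": {}},
                    ],
                },
            },
        },
    },
}


def run_summaries(
    row: Dict[str, Any],
) -> Iterator[Tuple[str, str, Optional[str], int, int]]:
    """Yield (experiment_id, run_id, split, label_count, prediction_count)."""
    for experiment_id, experiment in row.get("experiments", {}).items():
        for run_id, run in experiment.get("runs", {}).items():
            yield (
                experiment_id,
                run_id,
                run.get("split"),  # optional field, may be missing
                len(run.get("labels", [])),
                len(run.get("predictions", [])),  # present only if requested
            )


print(list(run_summaries(sample_row)))
```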
