Export overview

How to export data, with examples for each type of export along with details on optional parameters and filters.

Export specifications

| Data type | Annotation export formats | Project export | Model run export |
| --- | --- | --- | --- |
| Image | See export formats | See sample | See sample |
| Video | See export formats | See sample | See sample |
| Text | See export formats | See sample | See sample |
| Geospatial | See export formats | See sample | See sample |
| Documents | See export formats | See sample | Not supported yet |
| Audio | See export formats | See sample | Not supported yet |
| Conversational text | See export formats | See sample | Not supported yet |
| DICOM | See export formats | See sample | Not supported yet |
| HTML | See export formats | See sample | Not supported yet |

There are three ways to export data from Labelbox: export from Catalog, export from a labeling project, and export from a model run.

Required & optional fields

Below is the complete list of required and optional fields supported for exports.

| Field | Description | Project export | Model run export | Catalog export |
| --- | --- | --- | --- | --- |
| data_row | Contains the basic information of the data row: id, row_data, global_key, data_row_details (optional, see below) | Always | Always | Always |
| data_row_details | Contains additional details of the data row: dataset_id, dataset_name, created_at, updated_at, last_activity_at, created_by | Optional | Optional | Optional |
| media_attributes | See Media attributes | Optional | Optional | Optional |
| attachments | See Attachments | Optional | Optional | Optional |
| metadata_fields | See Metadata | Optional | Optional | Optional |
| projects | Contains the ID of the project in which the data row was labeled | Always | n/a | Optional |
| <project_id> | Contains the following sections, which are expanded on below: labels, project_details | Always | n/a | Optional |
| labels | Contains a list of labels attached to this data row: label_kind, version, id, annotations | Always | Always | Optional |
| label_details | Contains details of each specific label: created_at, updated_at, created_by, reviews | Optional | n/a | Optional |
| performance_details | Contains label-specific performance details: seconds_to_create, seconds_to_review, skipped, benchmark_reference_label, benchmark_score, consensus_score, consensus_label_count, consensus_labels | Optional | n/a | Optional |
| project_details | Contains project-specific information about this data row: ontology_id, task_id, task_name, batch_id, batch_name, workflow_status, priority, selected_label_id, consensus_expected_label_count, workflow_history | Optional | n/a | Optional |
| experiments | Contains the ID of the model experiment(s) in which the data row was stored | n/a | Always | Optional |
| <model_experiment_id> | Contains the following sections, which are expanded on below: name, runs | n/a | Always | Optional |
| name | Name of the model | n/a | Always | Optional |
| runs | Contains the ID of the model run(s) in which the data row was stored | n/a | Always | Optional |
| <model_run_id> | Contains the following sections, which are expanded on below: name, annotation_group_id, labels, predictions, split | n/a | Always | Optional |
| name | Name of the model run | n/a | Always | Optional |
| run_data_row_id | Model run data row ID, similar to data_row_id but in a model run's context | n/a | Always | Optional |
| labels | Contains a list of the ground truth labels attached to this data row and included in this model run: label_kind, version, id, annotations | n/a | Always | Always |
| predictions | Contains a list of predictions attached to this data row and included in this model run: label_kind, version, id, annotations | n/a | Optional | Optional |
| split | Contains the split the data row belongs to (either Training, Validation, or Test) | n/a | Optional | Optional |

Optional parameters and filters

Parameters

When you export data rows from a project, a model run, or Catalog, you can set parameters to include optional fields in the exports. The table below lists the parameters available for each type of export.

| Parameter | Project export | Model run export | Dataset export (Catalog) | Slice export (Catalog) |
| --- | --- | --- | --- | --- |
| attachments | ✓ | ✓ | ✓ | ✓ |
| metadata_fields | ✓ | ✓ | ✓ | ✓ |
| data_row_details | ✓ | ✓ | ✓ | ✓ |
| project_details | ✓ | – | ✓ | ✓ |
| label_details | ✓ | – | ✓ | ✓ |
| performance_details | ✓ | – | ✓ | ✓ |
| interpolated_frames | ✓ | ✓ | ✓ | ✓ |
| predictions | – | ✓ | – | – |
| project_ids | – | – | ✓ | ✓ |
| model_run_ids | – | – | ✓ | ✓ |
| model_run_details | – | ✓ | – | – |

For explanations of each field and subfield, see Export v2 glossary. For a detailed explanation of the project_ids and model_run_ids parameters, see Export data rows from Catalog below.

To learn how to apply these parameters, see the sections below specific to each export type.

Filters

You can use filters to select a subset of data rows to export. The table below contains the filters that are supported for each type of export. You can apply multiple supported filters to the same export. Combinations of filters apply AND operator logic.

| Filter | Project export | Model run export | Dataset export (Catalog) | Slice export (Catalog) |
| --- | --- | --- | --- | --- |
| last_activity_at | ✓ | – | ✓ | – |
| label_created_at | ✓ | – | ✓ | – |
| workflow_status | ✓ | – | – | – |
| batch_ids | ✓ | – | – | – |
| global_keys | ✓ | – | ✓ | – |
| data_row_ids | ✓ | – | ✓ | – |

The last_activity_at and label_created_at filters take the structure of [<start_date>, <end_date>] and can have the following formats:

  • YYYY-MM-DD (this is an alias of YYYY-MM-DD 00:00:00)
  • YYYY-MM-DD hh:mm:ss
  • YYYY-MM-DDThh:mm:ss±hhmm (ISO 8601)
  • None

The ISO 8601 format allows you to specify the timezone, while the other two formats assume the timezone from the user's workspace settings.
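
For illustration, here is a sketch of how these formats might appear in a filters dictionary (the dates and filter choices are placeholders):

# Illustrative values only; adjust the dates and filters to your export.
filters = {
  # date-only values are treated as YYYY-MM-DD 00:00:00
  "last_activity_at": ["2023-01-01", "2023-06-30"]
}

filters_iso = {
  # ISO 8601 values carry an explicit UTC offset
  "label_created_at": ["2023-01-01T00:00:00+0000", "2023-06-30T23:59:59+0000"]
}

filters_open_ended = {
  # None leaves one end of the range open
  "last_activity_at": [None, "2023-06-30 23:59:59"]
}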

Last activity at

The last_activity_at filter captures only the data rows where the following changes have been made in the specified time frame:

  • Changes update a data row's data (rowData), external ID (externalId), or global key (globalKey)
  • Changes are made to annotations, attachments, embeddings, or metadata
  • Data rows are added to batches
  • Data row labeling tasks change
  • Labels, reviews, comments, or issues are added to a project containing the data row

Data rows in multiple projects update last_activity_at when such changes occur in any project containing the data rows.

Label created at

The label_created_at filter captures only the data rows where labels have been submitted in the specified time frame.

Workflow status

The workflow_status filter allows you to export only the data rows in a specific status of a project's workflow. The filter accepts the following values:

  • ToLabel
  • InReview
  • InRework
  • Done

This filter only accepts one value. For example, filters = {"workflow_status": "InReview"}.

Batch IDs

The batch_ids filter allows you to export only the data rows in a specific batch or batches. This filter accepts a list of batch IDs. For example, filters = {"batch_ids": ["batch_id_1", "batch_id_2"]}.

To get the batches sent to a project and their associated information, you can use the project.batches() method. For more information, see Get the batches.
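
For example, a minimal sketch (assuming project is a Project object already fetched with the SDK) that collects batch IDs for use with this filter:

# Collect batch IDs from a project to pass to the batch_ids filter
batch_ids = [batch.uid for batch in project.batches()]
filters = {"batch_ids": batch_ids}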

Global keys

The global_keys filter allows you to export only the data rows with the specified global keys within a project or dataset. This filter accepts a list containing up to 2,000 values. For example, filters = {"global_keys": ["global_key_1", "global_key_2"]}.

Data row IDs

The data_row_ids filter allows you to export only the data rows with the specified IDs within a project or dataset. This filter accepts a list containing up to 2,000 values. For example, filters = {"data_row_ids": ["data_row_id_1", "data_row_id_2"]}.

See the below sections specific to each export type for contextual examples of how to apply these filters.

Export V1 status

Export V1 is currently deprecated in favor of export V2. You should update your SDK workflows accordingly.

Export V2 methods

📘

Export via UI instructions

To learn how to export data rows from a project via the app UI, visit Export labels from project.

Labelbox has two methods for exporting data using exports V2: export_v2() and export(). The latter uses the same parameters and filters as export_v2(), but supports streaming an unlimited number of data rows and returns a different task object.

Streamable exports

Compatible with SDK versions 3.56 and above.
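
The examples that follow assume the Labelbox Python SDK is imported as lb and a client has been initialized. A minimal setup sketch (the API key and project ID are placeholders):

import labelbox as lb

client = lb.Client(api_key="<your_api_key>")
project = client.get_project("<project_id>")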

Streamable exports provide real-time data flow and updates from the Labelbox export V2 API. The following streamable methods are available on every class that supports export V2:

  1. data_row.export()
  2. dataset.export()
  3. model_run.export()
  4. project.export()
  5. slice.export()

The return type of these methods is an object of the class ExportTask. This class serves as a wrapper around Task, which is the return type of export_v2(). Because of this relationship, most of the features present in the Task class are also available in the ExportTask class.

Methods and properties from Task that are supported on ExportTask:

  • uid
  • deleted
  • wait_till_done
  • completion_percentage
  • created_at
  • name
  • status
  • type
  • updated_at
  • get_task
  • organization
  • created_by
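
For example, a brief sketch of monitoring an export task through these Task-style properties:

export_task = project.export()
export_task.wait_till_done()

# Task-style properties carried over from `Task`
print(export_task.uid, export_task.status, export_task.completion_percentage)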

Creating an ExportTask instance

An instance of an ExportTask can be obtained via the export() method on the classes mentioned above, or by executing the following:

export_task = lb.ExportTask.get_task(client, task_id)
# where `task_id` is the ID of an export task
export_task.wait_till_done()

Checking for results and errors

To check if a task has a result/errors, the following methods can be executed:

if not export_task.has_result():  
  print("no results")

if export_task.has_errors():  
  print("there are errors")  
# These methods raise an ExportTask.TaskNotReadyException if the task is in neither a COMPLETE nor a FAILED state.

Streaming results and errors

To stream the results/errors from an ExportTask, use the following methods:


# In order to get the actual output from the stream, a callback function needs to be provided:
def json_stream_handler(output: lb.JsonConverterOutput):
  print(output.json_str)

if export_task.has_result():
  export_task.get_stream(
    converter=lb.JsonConverter(),
    stream_type=lb.StreamType.RESULT
  ).start(stream_handler=json_stream_handler)



# In order to get the actual output from the stream, a callback function needs to be provided:
def json_stream_handler(output: lb.JsonConverterOutput):
  print(output.json_str)
  
if export_task.has_errors():
  export_task.get_stream(
    converter=lb.JsonConverter(), stream_type=lb.StreamType.ERRORS
  ).start(stream_handler=json_stream_handler)

Write errors/results to a file

You can save the results/errors to a file by slightly modifying the callback function.

def file_stream_handler(output: lb.FileConverterOutput):
  print(
    f"offset: {output.current_offset}, progress: {output.bytes_written}/{output.total_size}, "
    f"path: {output.file_path.absolute().resolve()}"
  )

if export_task.has_result():
  export_task.get_stream(
    converter=lb.FileConverter(file_path=output_file_path)
  ).start(stream_handler=file_stream_handler)

Converters

By default, get_stream() uses a JsonConverter, unless a different converter is specified:

export_task.get_stream(converter=lb.FileConverter(file_path=output_file_path)).start()

JsonConverter and FileConverter are the only converters currently supported. FileConverter takes a file_path argument, which specifies the file where the exported results/errors are written.

class JsonConverterOutput:
  current_offset: int
  current_line: int
  json_str: str

class FileConverterOutput:
  file_path: Path
  total_size: int
  total_lines: int
  current_offset: int
  current_line: int
  bytes_written: int

Both JsonConverterOutput and FileConverterOutput have current_offset and current_line fields, which reflect the current offset and line number of the output being streamed. See Start streaming at an offset or line below.
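
For example, a stream handler can read these fields to report progress while streaming (a sketch):

def verbose_stream_handler(output: lb.JsonConverterOutput):
  # current_line and current_offset give the position within the streamed output
  print(f"line {output.current_line}, offset {output.current_offset}: {output.json_str[:80]}")

if export_task.has_result():
  export_task.get_stream(converter=lb.JsonConverter()).start(
    stream_handler=verbose_stream_handler
  )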

Advanced usage

For fine-grained control over the streaming process, you can omit start() and instead iterate through the converted items in the stream with a for loop. This is useful if you want to implement your own streaming logic, process partial results, or apply additional filtering:

stream = export_task.get_stream()
for output in stream:
  output: lb.JsonConverterOutput = output
  print(output.json_str)
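
For instance, each streamed item holds one data row as a JSON string, so you can parse and filter it as you iterate. A sketch (the filter condition is illustrative):

import json

stream = export_task.get_stream()
for output in stream:
  record = json.loads(output.json_str)
  # illustrative filter: keep only data rows that carry a global key
  if record["data_row"].get("global_key"):
    print(record["data_row"]["id"])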

Start streaming at an offset or line

You can define a particular offset to initiate streaming. In the given example, the stream will start from offset 25,548.

export_task.get_stream().with_offset(25548).start(stream_handler=json_stream_handler)

Note: an arbitrary offset may land in the middle of a JSON string; this is expected behavior, and its effect will be apparent in the output as soon as streaming starts.

Likewise, a specific line can be specified. In the following example, the stream will skip the first 348 lines and start with the 349th line, where a single JSON string is considered a line.

export_task.get_stream().with_line(348).start(stream_handler=json_stream_handler)

Note: offsets and lines are indexed starting from 0, thus with_line(3) will start streaming from the 4th line.

The offset passed to with_offset() cannot exceed the total file size, and the line passed to with_line() cannot exceed the total number of lines; otherwise, a ValueError exception is raised.
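
A defensive sketch that checks the total line count (using get_total_lines(), described in the next section) before seeking:

total_lines = export_task.get_total_lines(lb.StreamType.RESULT)
if total_lines > 348:
  export_task.get_stream().with_line(348).start(stream_handler=json_stream_handler)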

Print output size

ExportTask has two methods to output the total size of the exported file and the total number of lines it contains:

total_size = export_task.get_total_file_size(lb.StreamType.RESULT)  
total_lines = export_task.get_total_lines(lb.StreamType.ERRORS)

Export data rows from a project

When you export data rows from a project, you can narrow down the data rows by label status, metadata, batch, annotations, and workflow history. You can also choose to include or exclude certain attributes in your export.

See the table at the top of this page to find the JSON export formats for each data type.

Export from a project:

# Set the export params to include/exclude certain fields.
export_params= {
  "attachments": True,
  "metadata_fields": True,
  "data_row_details": True,
  "project_details": True,
  "label_details": True,
  "performance_details": True
}

# You can set the range for last_activity_at and label_created_at. You can also set a list of data 
# row ids to export. 
# For context, last_activity_at captures the creation and modification of labels, metadata, status, comments and reviews.

# Note: Combinations of filters apply AND logic.
filters= {
  "last_activity_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
  "label_created_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
  "workflow_status": "InReview",
  "batch_ids": ["batch_id_1", "batch_id_2"],
  "data_row_ids": ["data_row_id_1", "data_row_id_2"],
  "global_keys": ["global_key_1", "global_key_2"]
}

export_task = project.export_v2(params=export_params, filters=filters)
export_task.wait_till_done()

if export_task.errors:
  print(export_task.errors)

export_json = export_task.result
print("results: ", export_json)
# The return type of this method is an `ExportTask`, which is a wrapper of a `Task`.
# Most `Task` features are also present in `ExportTask`.
export_params= {
  "attachments": True,
  "metadata_fields": True,
  "data_row_details": True,
  "project_details": True,
  "label_details": True,
  "performance_details": True,
  "interpolated_frames": True
}

# Note: Filters follow AND logic, so typically using one filter is sufficient.
filters= {
  "last_activity_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
  "label_created_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
  "workflow_status": "InReview",
  "batch_ids": ["batch_id_1", "batch_id_2"],
  "data_row_ids": ["data_row_id_1", "data_row_id_2"],
  "global_keys": ["global_key_1", "global_key_2"]
}
# Set this variable to True to utilize this functionality 
client.enable_experimental = True

export_task = project.export(params=export_params, filters=filters)
export_task.wait_till_done()

# Return a JSON output string from the export task results/errors one by one:
def json_stream_handler(output: lb.JsonConverterOutput):
  print(output.json_str)

if export_task.has_errors():
  export_task.get_stream(
    converter=lb.JsonConverter(),
    stream_type=lb.StreamType.ERRORS
  ).start(stream_handler=lambda error: print(error))

if export_task.has_result():
  export_json = export_task.get_stream(
    converter=lb.JsonConverter(),
    stream_type=lb.StreamType.RESULT
  ).start(stream_handler=json_stream_handler)

print("file size: ", export_task.get_total_file_size(stream_type=lb.StreamType.RESULT))
print("line count: ", export_task.get_total_lines(stream_type=lb.StreamType.RESULT))

# Method 1
labels = project.label_generator()

# Alternatively, you can specify a date range to export the desired labels from a project
labels = project.label_generator(start="2020-01-01", end="2020-01-02")

for label in labels:
  print(label.annotations)
  

# Method 2 

labels = project.export_labels(download=True, timeout_seconds=600)  

Export data rows from Catalog

You can export data rows and all their information from a Dataset or a Catalog Slice.

When exporting from Catalog, you can include information about a data row from all projects and model runs to which it belongs. Specifically, you can export the labels from multiple projects and/or the predictions from multiple model runs for the selected data rows.

As shown below, the project_ids and model_run_ids parameters accept a list of IDs.

See the table at the top of this page to find the JSON export formats for each data type.

Export from a dataset

# Set the export params to include/exclude certain fields.
export_params= {
    "attachments": True,
    "metadata_fields": True,
    "data_row_details": True,
    "project_details": True,
  	"label_details": True,
    "performance_details": True,
    "project_ids": ["<project_id_1>", "<project_id_2>"],
    "model_run_ids": ["<model_run_id_1>", "<model_run_id_2>"]
}

# You can set the range for last_activity_at
# For context, last_activity_at captures the creation and modification of labels, metadata, status, comments and reviews.
# Note: Filters apply AND logic, so using one filter is usually sufficient.

filters= {
  "last_activity_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"]
}

dataset = client.get_dataset("<dataset_id>")
export_task = dataset.export_v2(params=export_params, filters=filters)
export_task.wait_till_done()
if export_task.errors:
  print(export_task.errors)
export_json = export_task.result
print("results: ", export_json)
export_params= {
  "attachments": True,
  "metadata_fields": True,
  "data_row_details": True,
  "project_details": True,
  "label_details": True,
  "performance_details": True,
  "interpolated_frames": True,
  "project_ids": ["<project_id_1>", "<project_id_2>"],
  "model_run_ids": ["<model_run_id_1>", "<model_run_id_2>"] 
}

# Note: Filters follow AND logic, so typically using one filter is sufficient.
filters= {
  "last_activity_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"]
}

client.enable_experimental = True

dataset = client.get_dataset("<dataset_id>")
export_task = dataset.export(params=export_params, filters=filters)
export_task.wait_till_done()

# Return a JSON output string from the export task results/errors one by one:

# Callback used for JSON Converter
def json_stream_handler(output: lb.JsonConverterOutput):
  print(output.json_str)


if export_task.has_errors():
  export_task.get_stream(
    converter=lb.JsonConverter(),
    stream_type=lb.StreamType.ERRORS
  ).start(stream_handler=lambda error: print(error))

if export_task.has_result():
  export_json = export_task.get_stream(
    converter=lb.JsonConverter(),
    stream_type=lb.StreamType.RESULT
  ).start(stream_handler=json_stream_handler)

print("file size: ", export_task.get_total_file_size(stream_type=lb.StreamType.RESULT))
print("line count: ", export_task.get_total_lines(stream_type=lb.StreamType.RESULT))

Export a list of selected data rows from a dataset

# Set the export params to include/exclude certain fields.
export_params= {
    "attachments": True,
    "metadata_fields": True,
    "data_row_details": True,
    "project_details": True,
  	"label_details": True,
    "performance_details": True,
    "project_ids": ["<project_id_1>", "<project_id_2>"],
    "model_run_ids": ["<model_run_id_1>", "<model_run_id_2>"]
}

# Filter by the selected data rows so that only those data rows are exported
filters= {
  "data_row_ids": ["data_row_id_1", "data_row_id_2"],
  "global_keys": ["global_key_1", "global_key_2"]
}

export_task = dataset.export_v2(params=export_params, filters=filters)
export_task.wait_till_done()
if export_task.errors:
  print(export_task.errors)
export_json = export_task.result
print("results: ", export_json)
export_params= {
  "attachments": True,
  "metadata_fields": True,
  "data_row_details": True,
  "project_details": True,
  "label_details": True,
  "performance_details": True,
  "interpolated_frames": True,
  "project_ids": ["<project_id_1>", "<project_id_2>"],
  "model_run_ids": ["<model_run_id_1>", "<model_run_id_2>"] 
}

# Note: Filters follow AND logic, so typically using one filter is sufficient.
filters= {
  "data_row_ids": ["data_row_id_1", "data_row_id_2"],
  "global_keys": ["global_key_1", "global_key_2"]
}

client.enable_experimental = True

export_task = dataset.export(params=export_params, filters=filters)
export_task.wait_till_done()

# Return a JSON output string from the export task results/errors one by one:

# Callback used for JSON Converter
def json_stream_handler(output: lb.JsonConverterOutput):
  print(output.json_str)


if export_task.has_errors():
  export_task.get_stream(
    converter=lb.JsonConverter(),
    stream_type=lb.StreamType.ERRORS
  ).start(stream_handler=lambda error: print(error))

if export_task.has_result():
  export_json = export_task.get_stream(
    converter=lb.JsonConverter(),
    stream_type=lb.StreamType.RESULT
  ).start(stream_handler=json_stream_handler)

print("file size: ", export_task.get_total_file_size(stream_type=lb.StreamType.RESULT))
print("line count: ", export_task.get_total_lines(stream_type=lb.StreamType.RESULT))

Export from a slice

# Set the export params to include/exclude certain fields.
export_params= {
    "attachments": True,
    "metadata_fields": True,
    "data_row_details": True,
    "project_details": True,
  	"label_details": True,
    "performance_details": True,
    "project_ids": ["<project_id_1>", "<project_id_2>"],
    "model_run_ids": ["<model_run_id_1>", "<model_run_id_2>"]
}

catalog_slice = client.get_catalog_slice("<slice_id>")
export_task = catalog_slice.export_v2(params=export_params)
export_task.wait_till_done()
if export_task.errors:
  print(export_task.errors)
export_json = export_task.result
print("results: ", export_json)
# Set the export params to include/exclude certain fields.
export_params = {
  "attachments": True,
  "metadata_fields": True,
  "data_row_details": True,
  "project_details": True,
  "label_details": True,
  "performance_details": True,
  "interpolated_frames": True,
  "project_ids": ["<project_id_1>", "<project_id_2>"],
  "model_run_ids": ["<model_run_id_1>", "<model_run_id_2>"]
}

client.enable_experimental = True

catalog_slice = client.get_catalog_slice("<slice_id>")
export_task = catalog_slice.export(params=export_params)
export_task.wait_till_done()

# Return a JSON output string from the export task results/errors one by one:

# Callback used for JSON Converter
def json_stream_handler(output: lb.JsonConverterOutput):
  print(output.json_str)

if export_task.has_errors():
  export_task.get_stream(
    converter=lb.JsonConverter(),
    stream_type=lb.StreamType.ERRORS
  ).start(stream_handler=lambda error: print(error))

if export_task.has_result():
  export_json = export_task.get_stream(
    converter=lb.JsonConverter(),
    stream_type=lb.StreamType.RESULT
  ).start(stream_handler=json_stream_handler)

print("file size: ", export_task.get_total_file_size(stream_type=lb.StreamType.RESULT))
print("line count: ", export_task.get_total_lines(stream_type=lb.StreamType.RESULT))

Export data rows from a model run

See the table at the top of this page to find the JSON export formats for each data type.

# Set the export params to include/exclude certain fields.
export_params= {
    "attachments": True,
    "metadata_fields": True,
    "data_row_details": True
}

model_run = client.get_model_run("<model_run_id>")
export_task = model_run.export_v2(params=export_params)
export_task.wait_till_done()
print(export_task.errors)
export_json = export_task.result
# Set the export params to include/exclude certain fields.
export_params = {
  "attachments": True,
  "metadata_fields": True,
  "data_row_details": True,
  "interpolated_frames": True,
  "predictions": True
}

client.enable_experimental = True

model_run = client.get_model_run("<model_run_id>")
export_task = model_run.export(params=export_params)
export_task.wait_till_done()

# Return a JSON output string from the export task results/errors one by one:

# Callback used for JSON Converter
def json_stream_handler(output: lb.JsonConverterOutput):
  print(output.json_str)

if export_task.has_errors():
  export_task.get_stream(
  converter=lb.JsonConverter(),
  stream_type=lb.StreamType.ERRORS
  ).start(stream_handler=lambda error: print(error))

if export_task.has_result():
  export_json = export_task.get_stream(
    converter=lb.JsonConverter(),
    stream_type=lb.StreamType.RESULT
  ).start(stream_handler=json_stream_handler)

print("file size: ", export_task.get_total_file_size(stream_type=lb.StreamType.RESULT))
print("line count: ", export_task.get_total_lines(stream_type=lb.StreamType.RESULT))
# Turn on the experimental mode of the SDK
client.enable_experimental = True

# If download=False, this returns the URLs of the data files associated with this ModelRun’s labels.
download = False
model_run.export_labels(download=download)
# If download=True, this instead returns the contents as NDJSON format.
download = True
model_run.export_labels(download=download)