export() is a scalable and efficient method that streams an unlimited number of data rows and provides a unique task object for tracking progress.
export_v2() is deprecated
The previously available export_v2() method has been deprecated and will be removed in version 7.0 of the SDK. If you're currently using export_v2(), we strongly encourage you to switch to export() for its enhanced streamable implementation.

Export specifications
| Data type | Annotation export formats | Project export | Model run export | 
|---|---|---|---|
| Image | See export formats | See sample | See sample | 
| Video | See export formats | See sample | See sample | 
| Text | See export formats | See sample | See sample | 
| Geospatial | See export formats | See sample | See sample | 
| Documents | See export formats | See sample | Not supported yet | 
| Audio | See export formats | See sample | Not supported yet | 
| Conversational text | See export formats | See sample | Not supported yet | 
| HTML | See export formats | See sample | Not supported yet | 
Required & optional fields
Below is the complete list of required and optional fields supported for exports.

| Field | Description | Project export | Model run export | Catalog export | 
|---|---|---|---|---|
data_row | Contains the basic information of the data row: - id - row_data - global_key - data_row_details (optional, see below) | Always | Always | Always | 
data_row_details | Contains additional details of the data row: - dataset_id - dataset_name - created_at - updated_at - last_activity_at - created_by | Optional | Optional | Optional | 
media_attributes | See Media attributes | Optional | Optional | Optional | 
attachments | See Attachments | Optional | Optional | Optional | 
metadata_fields | See Metadata | Optional | Optional | Optional | 
embeddings | Contains a list of dictionaries with precomputed and custom embeddings | Optional | Optional | Optional | 
projects | Contains the ID of the project in which the data row was labeled. | Always | n/a | Optional | 
<project_id> | Contains the following sections, which are expanded on below: - labels - project_details | Always | n/a | Optional | 
labels | Contains a list of labels attached to this data row: - label_kind - version - id - annotations | Always | Always | Optional | 
label_details | Contains details of each specific label: - created_at - updated_at - created_by - reviews | Optional | n/a | Optional | 
performance_details | Contains label-specific performance details: - seconds_to_create - seconds_to_review - skipped - performance_details_v2, which contains: - seconds_to_create - seconds_to_review - seconds_to_rework - seconds_total | Optional | n/a | Optional | 
project_details | Contains project-specific information about this data row: - ontology_id - task_id - task_name - batch_id - batch_name - workflow_status - priority - selected_label_id - consensus_expected_label_count - workflow_history | Optional | n/a | Optional | 
project_tags | See Project tags | Always | n/a | n/a | 
experiments | Contains the ID of the model experiment(s) in which the data row was stored. | n/a | Always | Optional | 
<model_experiment_id> | Contains the following sections, which are expanded on below: - name - runs | n/a | Always | Optional | 
name | Name of the model. | n/a | Always | Optional | 
runs | Contains the ID of the model run(s) in which the data row was stored. | n/a | Always | Optional | 
<model_run_id> | Contains the following sections, which are expanded on below: - name - annotation_group_id - labels - predictions - split | n/a | Always | Optional | 
name | Name of the model run. | n/a | Always | Optional | 
run_data_row_id | Model run data row ID, similar to data_row_id but in a model run’s context. | n/a | Always | Optional | 
labels | Contains a list of the ground truth labels attached to this data row and included in this model run: - label_kind - version - id - annotations | n/a | Always | Always | 
predictions | Contains a list of predictions attached to this data row and included in this model run: - label_kind - version - id - annotations | n/a | Optional | Optional | 
split | Contains the split the data row belongs to (either Training, Validation, or Test). | n/a | Optional | Optional | 
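The nesting in the table above is easier to follow in code. The sketch below walks one exported data row, assuming it has already been parsed into a Python dict (for example, from a streamed result). The helper names are hypothetical. The key paths follow the field table: projects, then <project_id>, then labels; and experiments, then <model_experiment_id>, then runs, then <model_run_id>, then predictions.

```python
from typing import Any, Dict, Iterator, Tuple

Row = Dict[str, Any]


def data_row_key(row: Row) -> str:
    """Return the row's global key if present, otherwise its data row ID."""
    data_row = row["data_row"]
    return data_row.get("global_key") or data_row["id"]


def iter_project_labels(row: Row) -> Iterator[Tuple[str, Row]]:
    """Yield (project_id, label) pairs from a project or Catalog export row."""
    # "projects" maps each <project_id> to {"labels": [...], "project_details": {...}}
    for project_id, project in row.get("projects", {}).items():
        for label in project.get("labels", []):
            yield project_id, label


def iter_model_run_predictions(row: Row) -> Iterator[Tuple[str, Row]]:
    """Yield (model_run_id, prediction) pairs from a model run or Catalog export row."""
    # "experiments" maps <model_experiment_id> -> {"name": ..., "runs": {...}},
    # and "runs" maps <model_run_id> -> {"labels", "predictions", "split", ...}
    for experiment in row.get("experiments", {}).values():
        for run_id, run in experiment.get("runs", {}).items():
            for prediction in run.get("predictions", []):
                yield run_id, prediction
```

Using .get() with empty defaults lets the same helpers handle rows where optional sections, such as predictions, were not requested.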
Optional parameters and filters
Parameters
When you export data rows from a project, a model run, or Catalog, you can set parameters to include optional fields in the export. The table below lists the parameters available for each type of export.

| Parameter | Project export | Model run export | Dataset export (Catalog) | Slice export (Catalog) | 
|---|---|---|---|---|
attachments | ✔ | ✔ | ✔ | ✔ | 
metadata_fields | ✔ | ✔ | ✔ | ✔ | 
embeddings | ✔ | ✔ | ✔ | ✔ | 
data_row_details | ✔ | ✔ | ✔ | ✔ | 
project_details | ✔ | - | ✔ | ✔ | 
label_details | ✔ | - | ✔ | ✔ | 
performance_details | ✔ | - | ✔ | ✔ | 
interpolated_frames | ✔ | ✔ | ✔ | ✔ | 
predictions | - | ✔ | - | - | 
model_run_details | - | ✔ | - | - | 
model_run_ids | - | - | ✔ | ✔ | 
project_ids | - | - | ✔ | ✔ | 
all_projects | - | - | ✔ | ✔ | 
all_model_runs | - | - | ✔ | ✔ | 
For details on the project_ids, model_run_ids, all_projects, and all_model_runs parameters, see Export data rows from Catalog below.
To learn how to apply these parameters, see the sections below specific to each export type.
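The parameters table can also be encoded as data and used to catch unsupported parameters before an export is started. The sketch below is a hypothetical helper, not part of the SDK. It only checks parameter names against the table, and the SDK may perform its own validation as well.

```python
from typing import Any, Dict, FrozenSet

_COMMON = frozenset({"attachments", "metadata_fields", "embeddings",
                     "data_row_details", "interpolated_frames"})
_LABELING = frozenset({"project_details", "label_details", "performance_details"})
_CATALOG = frozenset({"project_ids", "model_run_ids", "all_projects", "all_model_runs"})

# One entry per column of the parameters table.
ALLOWED_PARAMS: Dict[str, FrozenSet[str]] = {
    "project": _COMMON | _LABELING,
    "model_run": _COMMON | {"predictions", "model_run_details"},
    "dataset": _COMMON | _LABELING | _CATALOG,
    "slice": _COMMON | _LABELING | _CATALOG,
}


def build_params(export_type: str, **params: Any) -> Dict[str, Any]:
    """Return params unchanged after checking them against ALLOWED_PARAMS."""
    allowed = ALLOWED_PARAMS[export_type]  # KeyError for an unknown export type
    unsupported = sorted(set(params) - allowed)
    if unsupported:
        raise ValueError(
            f"{export_type} export does not support: {', '.join(unsupported)}"
        )
    return dict(params)
```

For example, build_params("project", attachments=True, label_details=True) returns a dict you could pass as params to project.export(), while build_params("project", predictions=True) raises, because predictions is only listed for model run exports.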
Filters
You can use filters to select a subset of data rows to export. The table below lists the filters supported for each export type. You can apply multiple supported filters to the same export. Combined filters use AND logic, so a data row must match every filter to be exported.
| Filter | Project export | Model run export | Dataset export (Catalog) | Slice export (Catalog) | 
|---|---|---|---|---|
last_activity_at | ✔ | - | ✔ | - | 
label_created_at | ✔ | - | ✔ | - | 
workflow_status | ✔ | - | - | - | 
batch_ids | ✔ | - | - | - | 
global_keys | ✔ | - | ✔ | - | 
data_row_ids | ✔ | - | ✔ | - | 
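The sketch below encodes the filters table in the same way. It is a hypothetical helper, not an SDK function. Because the filters are combined with AND logic, all of them go into a single dict. Model run and slice exports have no filters listed, so the helper rejects any filter passed for them.

```python
from typing import Any, Dict, FrozenSet

# One entry per column of the filters table.
ALLOWED_FILTERS: Dict[str, FrozenSet[str]] = {
    "project": frozenset({"last_activity_at", "label_created_at", "workflow_status",
                          "batch_ids", "global_keys", "data_row_ids"}),
    "dataset": frozenset({"last_activity_at", "label_created_at",
                          "global_keys", "data_row_ids"}),
    "model_run": frozenset(),  # no filters listed for model run exports
    "slice": frozenset(),      # no filters listed for slice exports
}


def build_filters(export_type: str, **filters: Any) -> Dict[str, Any]:
    """Check filters against ALLOWED_FILTERS and return them as one dict.

    Every key in the returned dict applies at once (AND logic), so a data
    row must satisfy all of them to be exported.
    """
    unsupported = sorted(set(filters) - ALLOWED_FILTERS[export_type])
    if unsupported:
        raise ValueError(
            f"{export_type} export does not support filters: {', '.join(unsupported)}"
        )
    return dict(filters)
```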
The last_activity_at and label_created_at filters take the form [<start_date>, <end_date>], where each date can use one of the following formats:
- YYYY-MM-DD (an alias of YYYY-MM-DD 00:00:00)
- YYYY-MM-DD hh:mm:ss
- YYYY-MM-DDThh:mm:ss±hhmm (ISO 8601)
- None
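The sketch below converts Python date and datetime objects into the formats listed above. The helper names are hypothetical. We assume that None leaves that end of the range open. The page lists None as an accepted value but does not spell out its meaning.

```python
from datetime import date, datetime, timezone
from typing import List, Optional, Union

Bound = Union[date, datetime, None]


def format_bound(value: Bound) -> Optional[str]:
    """Render one end of a date range in one of the documented formats."""
    if value is None:
        return None  # None is listed as an accepted value (assumed: open-ended)
    if isinstance(value, datetime):  # check datetime first: it subclasses date
        if value.tzinfo is not None:
            return value.strftime("%Y-%m-%dT%H:%M:%S%z")  # YYYY-MM-DDThh:mm:ss±hhmm
        return value.strftime("%Y-%m-%d %H:%M:%S")  # YYYY-MM-DD hh:mm:ss
    return value.strftime("%Y-%m-%d")  # YYYY-MM-DD, i.e. 00:00:00 on that day


def date_range(start: Bound, end: Bound) -> List[Optional[str]]:
    """Build the [<start_date>, <end_date>] pair used by both date filters."""
    return [format_bound(start), format_bound(end)]


# Example: activity since 1 January 2024 (UTC), with None as the end value.
example_filters = {
    "last_activity_at": date_range(datetime(2024, 1, 1, tzinfo=timezone.utc), None),
}
```

Timezone-aware datetimes produce the ISO 8601 form with an explicit offset. This avoids depending on how naive timestamps are interpreted on the server side.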
Last activity at
The last_activity_at filter captures only the data rows where any of the following changes have been made in the specified time frame:
- Changes update a data row's data (rowData), external ID (externalId), or global key (globalKey)
- Changes are made to annotations, attachments, embeddings, or metadata
- Data rows are added to batches
- Data row labeling tasks change
- Labels, reviews, comments, or issues are added to a project containing the data row
Such changes count toward last_activity_at when they occur in any project containing the data rows.
Label created at
The label_created_at filter captures only the data rows where labels have been created in the specified time frame.
Workflow status
The workflow_status filter allows you to export only the data rows in a specific status of a project's workflow. The filter accepts the following values:
- ToLabel
- InReview
- InRework
- Done

For example, filters = {"workflow_status": "InReview"}.
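The example above passes a single status string, so the sketch below builds one filter per status. If you need several statuses, you could run one export per status. The helper names are hypothetical. The check is case-sensitive because the values are listed with exact casing.

```python
from typing import Dict, Iterable, List

# The four values listed above.
WORKFLOW_STATUSES = ("ToLabel", "InReview", "InRework", "Done")


def workflow_status_filter(status: str) -> Dict[str, str]:
    """Build a workflow_status filter, rejecting values outside the documented set."""
    if status not in WORKFLOW_STATUSES:
        raise ValueError(
            f"Unknown workflow status {status!r}; expected one of {WORKFLOW_STATUSES}"
        )
    return {"workflow_status": status}


def filters_per_status(statuses: Iterable[str]) -> List[Dict[str, str]]:
    """One filter dict per status, e.g. for one export per status."""
    return [workflow_status_filter(s) for s in statuses]
```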
Batch IDs
The batch_ids filter allows you to export only the data rows in a specific batch or batches. This filter accepts a list of batch IDs. For example, filters = {"batch_ids": ["batch_id_1", "batch_id_2"]}.
To get the batches sent to a project and their associated information, you can use the project.batches() method. For more information, see Get the batches.
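As a sketch, the batch_ids filter can be built directly from the objects returned by project.batches(). We assume that each batch object exposes uid and name attributes. The helper and its name_contains option are hypothetical, and "round-2" below is a placeholder batch name.

```python
from typing import Any, Dict, Iterable, List, Optional


def batch_ids_filter(batches: Iterable[Any],
                     name_contains: Optional[str] = None) -> Dict[str, List[str]]:
    """Build a batch_ids filter from batch objects, optionally matching on name."""
    # Assumption: each batch has .uid (its ID) and .name attributes.
    ids = [b.uid for b in batches if name_contains is None or name_contains in b.name]
    if not ids:
        # An empty batch_ids list is almost certainly a mistake, so fail early.
        raise ValueError("No batches matched the given criteria.")
    return {"batch_ids": ids}


# Usage sketch (needs a real project object; not run here):
#   filters = batch_ids_filter(project.batches(), name_contains="round-2")
#   export_task = project.export(filters=filters)
```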
Global keys
The global_keys filter allows you to export only the data rows with the specified global keys within a project or dataset. This filter accepts a list containing up to 2,000 values. For example, filters = {"global_keys": ["global_key_1", "global_key_2"]}.
Data row IDs
The data_row_ids filter allows you to export only the data rows with the specified IDs within a project or dataset. This filter accepts a list containing up to 2,000 values. For example, filters = {"data_row_ids": ["data_row_id_1", "data_row_id_2"]}.
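Both filters accept at most 2,000 values. One way to work with a longer list is to split it into several filter dicts and run one export per chunk. The sketch below does this. The helper is hypothetical, and merging the results of several exports is left to the caller.

```python
from typing import Dict, Iterable, Iterator, List

MAX_FILTER_VALUES = 2_000  # documented limit for global_keys and data_row_ids


def chunked_filters(key: str, values: Iterable[str],
                    size: int = MAX_FILTER_VALUES) -> Iterator[Dict[str, List[str]]]:
    """Split IDs or global keys into filter dicts of at most `size` values."""
    if key not in ("global_keys", "data_row_ids"):
        raise ValueError(f"Chunking applies to global_keys or data_row_ids, not {key!r}")
    if not 0 < size <= MAX_FILTER_VALUES:
        raise ValueError(f"size must be between 1 and {MAX_FILTER_VALUES}")
    unique = list(dict.fromkeys(values))  # drop duplicates, keep first-seen order
    for start in range(0, len(unique), size):
        yield {key: unique[start:start + size]}


# Usage sketch (not run here): one export per chunk, other filters merged in.
#   for chunk in chunked_filters("global_keys", my_global_keys):
#       export_task = project.export(filters={**chunk, "workflow_status": "Done"})
```

Because this is a generator, argument errors surface on the first iteration rather than at the call site.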
Streamable exports
Streamable exports (compatible with SDK versions 3.56 and above) allow you to get real-time data flow and updates using any of the following streamable export methods:
- data_row.export()
- dataset.export()
- model_run.export()
- project.export()
- slice.export()
Each of these methods returns an ExportTask. This class serves as a wrapper around a Task object, so most of the features of the Task class are also available in the ExportTask class.
ExportTask supports the following methods and properties from Task:
- uid
- deleted
- wait_till_done
- completion_percentage
- created_at
- name
- status
- type
- updated_at
- get_task
- organization
- created_by
Creating an ExportTask instance
An instance of an ExportTask can be obtained via the export() method on the classes mentioned above, or by executing the following:
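A minimal sketch of what usually follows once you have an ExportTask: wait for it to finish, then check whether it produced results or errors before streaming. The helper is hypothetical and duck-typed, so it works on any object with these members. It assumes the SDK's ExportTask exposes wait_till_done(), uid, status, has_result(), and has_errors(), as we understand it. The commented usage, including ExportTask.get_task(client, task_id), needs the labelbox package and an API key, and the values shown are placeholders.

```python
from typing import Any, Dict


def wait_and_check(export_task: Any) -> Dict[str, Any]:
    """Wait for an ExportTask to finish and summarize its state."""
    export_task.wait_till_done()  # blocks until the export task completes
    return {
        "uid": export_task.uid,
        "status": export_task.status,
        "has_result": export_task.has_result(),
        "has_errors": export_task.has_errors(),
    }


# Usage sketch (needs labelbox and an API key; not run here):
#   import labelbox as lb
#   client = lb.Client(api_key="<YOUR_API_KEY>")
#   export_task = lb.ExportTask.get_task(client, "<export_task_id>")
#   summary = wait_and_check(export_task)
```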
Checking for results and errors
To check if a task has results or errors, the following methods can be executed:

Streaming results
To stream the results of exported data rows:

Simplified usage
For fine-grained control over the streaming process, you can use a for loop to iterate through the converted items in the stream. This allows you to implement custom streaming logic, process partial results, or apply additional filtering.

Start streaming at an offset or line
You can choose a particular offset at which to start streaming. For example, you can start the stream from offset 25,548.

Note:
Selecting an arbitrary offset might position the stream in the middle of a JSON string. This behavior is expected, and its effect becomes visible in the output as soon as streaming starts.

Note:
Offsets and lines are indexed starting from 0, so with_line(3) starts streaming from the 4th line. The offset passed to with_offset() cannot exceed the total file size, and the line passed to with_line() cannot exceed the total number of lines, as reported by the methods described in Print output size. Otherwise, a ValueError exception is raised.
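The sketch below combines the three streaming styles from these sections: handler-based streaming, the for-loop form, and starting at a line. The helpers are hypothetical and take the task and stream type as arguments. The SDK names they rely on are assumptions based on our understanding of the SDK and should be checked against your SDK version: get_buffered_stream(stream_type=...) with start(stream_handler=...) and outputs carrying a parsed .json, and the older get_stream(converter=..., stream_type=...) with with_line() and outputs carrying .json_str. Typical arguments would be lb.StreamType.RESULT and lb.JsonConverter().

```python
import json
from typing import Any, Callable, Dict, Iterator, List


def collect_rows(export_task: Any, result_type: Any) -> List[Dict[str, Any]]:
    """Handler-based streaming: the handler is called once per exported data row."""
    rows: List[Dict[str, Any]] = []
    export_task.get_buffered_stream(stream_type=result_type).start(
        stream_handler=lambda output: rows.append(output.json)
    )
    return rows


def iter_rows(export_task: Any, result_type: Any,
              keep: Callable[[Dict[str, Any]], bool]) -> Iterator[Dict[str, Any]]:
    """Loop-based streaming: filter rows as they arrive instead of afterwards."""
    for output in export_task.get_buffered_stream(stream_type=result_type):
        if keep(output.json):
            yield output.json


def rows_from_line(export_task: Any, converter: Any, result_type: Any,
                   line: int) -> List[Dict[str, Any]]:
    """Start at a 0-indexed line, so line=3 starts at the 4th line."""
    rows: List[Dict[str, Any]] = []
    # Assumption: each converter output holds one complete line of JSON, so it
    # parses cleanly. with_offset(n) may start mid-object, so parsing its first
    # chunk with json.loads could fail; keep raw strings in that case.
    export_task.get_stream(converter=converter, stream_type=result_type).with_line(
        line
    ).start(stream_handler=lambda output: rows.append(json.loads(output.json_str)))
    return rows
```

collect_rows keeps every row in memory. For large exports, iter_rows, or a handler that writes each row out immediately, keeps memory use flat.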
Print output size
ExportTask has two methods to output the total size of the exported file and the total number of lines it contains:
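As a sketch, these totals can be used to validate a start position before streaming, in line with the indexing note above. The helper is hypothetical. It assumes the two methods are get_total_file_size(stream_type=...) and get_total_lines(stream_type=...), as we understand the SDK. It also treats a None total as "no stream of this type". Using a strict upper bound is our conservative reading of "cannot exceed" for 0-indexed positions.

```python
from typing import Any, Optional, Tuple


def check_start_position(export_task: Any, result_type: Any,
                         offset: Optional[int] = None,
                         line: Optional[int] = None) -> Tuple[int, int]:
    """Validate an offset or line against the export's size before streaming."""
    total_size = export_task.get_total_file_size(stream_type=result_type)
    total_lines = export_task.get_total_lines(stream_type=result_type)
    if total_size is None or total_lines is None:
        raise ValueError("The export has no stream of this type.")
    # Positions are 0-indexed: the last valid offset is total_size - 1
    # and the last valid line is total_lines - 1.
    if offset is not None and not 0 <= offset < total_size:
        raise ValueError(f"offset {offset} is outside 0..{total_size - 1}")
    if line is not None and not 0 <= line < total_lines:
        raise ValueError(f"line {line} is outside 0..{total_lines - 1}")
    return total_size, total_lines
```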
Save export results and log errors
You can store export results in a JSON file and log any errors for monitoring or further processing, like the following example:

Cancel export tasks
You can cancel an ongoing export task before it completes, like the following example:

Export data rows from a project
When you export data rows from a project, you can narrow down your data rows by label status, metadata, batch, annotations, and workflow history. You can also choose which attributes to include or exclude in the export. See the table at the top of this page to find the JSON export formats for each data type.

Export from a project
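A minimal end-to-end sketch of a project export. It starts the export with params and filters, waits for it, logs any errors, and writes each result row to a file as it streams in. The helper is hypothetical. We write JSON Lines (one object per line) rather than a single JSON array, so rows never have to be held in memory together. The SDK calls (project.export(params=..., filters=...), get_buffered_stream(), and the stream types passed in) reflect our understanding of the SDK. All IDs and values in the usage comment are placeholders.

```python
import json
import logging
from typing import Any, Dict, Optional

logger = logging.getLogger("project_export")


def export_project_to_ndjson(project: Any, path: str, result_type: Any, error_type: Any,
                             params: Optional[Dict[str, Any]] = None,
                             filters: Optional[Dict[str, Any]] = None) -> int:
    """Export a project, write one JSON object per line to `path`, and log errors.

    Returns the number of data rows written.
    """
    export_task = project.export(params=params or {}, filters=filters or {})
    export_task.wait_till_done()

    if export_task.has_errors():
        export_task.get_buffered_stream(stream_type=error_type).start(
            stream_handler=lambda error: logger.error("Export error: %s", error)
        )

    written = 0
    with open(path, "w", encoding="utf-8") as out:
        if export_task.has_result():
            def write_row(output: Any) -> None:
                nonlocal written
                out.write(json.dumps(output.json) + "\n")  # one row per line
                written += 1

            export_task.get_buffered_stream(stream_type=result_type).start(
                stream_handler=write_row
            )
    return written


# Usage sketch (needs labelbox and an API key; not run here):
#   import labelbox as lb
#   client = lb.Client(api_key="<YOUR_API_KEY>")
#   project = client.get_project("<project_id>")
#   n = export_project_to_ndjson(
#       project, "export.ndjson", lb.StreamType.RESULT, lb.StreamType.ERRORS,
#       params={"label_details": True, "performance_details": True},
#       filters={"workflow_status": "Done"},
#   )
```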
Export data rows from Catalog
You can export data rows and all their information from a Dataset or a Catalog Slice. When exporting from Catalog, you can include information about a data row from all projects and model runs to which it belongs. Specifically, you can export the labels from multiple projects and/or the predictions from multiple model runs for the selected data rows. You can use the all_projects and all_model_runs parameters to get information from all projects and model runs attached to your data row.
The project_ids and model_run_ids parameters each accept a list of IDs.
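The sketch below builds params for a Catalog export from either explicit ID lists or the all_projects and all_model_runs switches. The helper is hypothetical. Refusing to mix an ID list with the matching all_* switch is our own conservative choice, not a documented SDK rule. The usage comment assumes client.get_dataset() and client.get_catalog_slice() for fetching the objects. The IDs shown are placeholders.

```python
from typing import Any, Dict, Optional, Sequence


def catalog_export_params(project_ids: Optional[Sequence[str]] = None,
                          model_run_ids: Optional[Sequence[str]] = None,
                          all_projects: bool = False,
                          all_model_runs: bool = False,
                          **flags: Any) -> Dict[str, Any]:
    """Build params for dataset.export() or slice.export()."""
    # Our own rule: don't mix an explicit ID list with the matching all_* switch.
    if all_projects and project_ids:
        raise ValueError("Pass either project_ids or all_projects=True, not both.")
    if all_model_runs and model_run_ids:
        raise ValueError("Pass either model_run_ids or all_model_runs=True, not both.")
    params: Dict[str, Any] = dict(flags)  # e.g. label_details=True
    if all_projects:
        params["all_projects"] = True
    elif project_ids:
        params["project_ids"] = list(project_ids)  # a list of project IDs
    if all_model_runs:
        params["all_model_runs"] = True
    elif model_run_ids:
        params["model_run_ids"] = list(model_run_ids)  # a list of model run IDs
    return params


# Usage sketch (needs labelbox and an API key; not run here):
#   dataset = client.get_dataset("<dataset_id>")
#   task = dataset.export(params=catalog_export_params(project_ids=["<project_id>"]))
#   catalog_slice = client.get_catalog_slice("<slice_id>")
#   task = catalog_slice.export(params=catalog_export_params(all_model_runs=True))
```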
See the table at the top of this page to find the JSON export formats for each data type.