How to export data, with examples for each type of export along with details on optional parameters and filters.
Data is exported with `export()`, a scalable and efficient method that streams an unlimited number of data rows and returns a unique task object for tracking progress.
The `export_v2()` method has been deprecated and will be removed in version 7.0 of the SDK. If you’re currently using `export_v2()`, we strongly encourage you to switch to `export()` for its enhanced streamable implementation.

Data type | Annotation export formats | Project export | Model run export |
---|---|---|---|
Image | See export formats | See sample | See sample |
Video | See export formats | See sample | See sample |
Text | See export formats | See sample | See sample |
Geospatial | See export formats | See sample | See sample |
Documents | See export formats | See sample | Not supported yet |
Audio | See export formats | See sample | Not supported yet |
Conversational text | See export formats | See sample | Not supported yet |
HTML | See export formats | See sample | Not supported yet |

Field | Description | Project export | Model run export | Catalog export |
---|---|---|---|---|
data_row | Contains the basic information of the data row: id, row_data, global_key, data_row_details (optional, see below) | Always | Always | Always |
data_row_details | Contains additional details of the data row: dataset_id, dataset_name, created_at, updated_at, last_activity_at, created_by | Optional | Optional | Optional |
media_attributes | See Media attributes | Optional | Optional | Optional |
attachments | See Attachments | Optional | Optional | Optional |
metadata_fields | See Metadata | Optional | Optional | Optional |
embeddings | Contains a list of dictionaries with precomputed and custom embeddings | Optional | Optional | Optional |
projects | Contains the ID of the project in which the data row was labeled. | Always | n/a | Optional |
<project_id> | Contains the following sections, which are expanded on below: labels, project_details | Always | n/a | Optional |
labels | Contains a list of labels attached to this data row: label_kind, version, id, annotations | Always | Always | Optional |
label_details | Contains details of each specific label: created_at, updated_at, created_by, reviews | Optional | n/a | Optional |
performance_details | Contains label-specific performance details: seconds_to_create, seconds_to_review, skipped, performance_details_v2 (which contains seconds_to_create, seconds_to_review, seconds_to_rework, seconds_total) | Optional | n/a | Optional |
project_details | Contains project-specific information about this data row: ontology_id, task_id, task_name, batch_id, batch_name, workflow_status, priority, selected_label_id, consensus_expected_label_count, workflow_history | Optional | n/a | Optional |
project_tags | See Project tags | Always | n/a | n/a |
experiments | Contains the ID of the model experiment(s) in which the data row was stored. | n/a | Always | Optional |
<model_experiment_id> | Contains the following sections, which are expanded on below: name, runs | n/a | Always | Optional |
name | Name of the model. | n/a | Always | Optional |
runs | Contains the ID of the model run(s) in which the data row was stored. | n/a | Always | Optional |
<model_run_id> | Contains the following sections, which are expanded on below: name, annotation_group_id, labels, predictions, split | n/a | Always | Optional |
name | Name of the model run. | n/a | Always | Optional |
run_data_row_id | Model run data row ID, similar to data_row_id but in a model run’s context. | n/a | Always | Optional |
labels | Contains a list of the ground truth labels attached to this data row and included in this model run: label_kind, version, id, annotations | n/a | Always | Always |
predictions | Contains a list of predictions attached to this data row and included in this model run: label_kind, version, id, annotations | n/a | Optional | Optional |
split | Contains the split the data row belongs to (either Training, Validation, or Test). | n/a | Optional | Optional |
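
The nesting described in the table above can be walked with plain dictionary access. The sketch below assumes each exported row has already been parsed into a Python dict that follows the field layout in the table; the helper names and the sample row (including its IDs and values) are made-up placeholders, not real export output.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_project_labels(row: Dict[str, Any]) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (project_id, label) pairs from one exported data row."""
    # "projects" is keyed by project ID; each entry holds a "labels" list.
    for project_id, project in row.get("projects", {}).items():
        for label in project.get("labels", []):
            yield project_id, label


def iter_run_predictions(row: Dict[str, Any]) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (model_run_id, prediction) pairs from one exported data row."""
    # experiments -> <model_experiment_id> -> runs -> <model_run_id> -> predictions
    for experiment in row.get("experiments", {}).values():
        for run_id, run in experiment.get("runs", {}).items():
            for prediction in run.get("predictions", []):
                yield run_id, prediction


# Hypothetical, heavily trimmed row used only to exercise the helpers.
sample_row = {
    "data_row": {"id": "dr_1", "row_data": "https://example.com/a.jpg", "global_key": "gk_1"},
    "projects": {
        "proj_1": {
            "labels": [
                {"label_kind": "Default", "version": "1.0.0", "id": "lbl_1", "annotations": {}}
            ],
        }
    },
}

project_labels = list(iter_project_labels(sample_row))
```

Both helpers use `.get()` with empty defaults, so a row from a project export (no `experiments` key) or a model run export (no `projects` key) simply yields nothing.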

Parameter | Project export | Model run export | Dataset export (Catalog) | Slice export (Catalog) |
---|---|---|---|---|
attachments | ✔ | ✔ | ✔ | ✔ |
metadata_fields | ✔ | ✔ | ✔ | ✔ |
embeddings | ✔ | ✔ | ✔ | ✔ |
data_row_details | ✔ | ✔ | ✔ | ✔ |
project_details | ✔ | - | ✔ | ✔ |
label_details | ✔ | - | ✔ | ✔ |
performance_details | ✔ | - | ✔ | ✔ |
interpolated_frames | ✔ | ✔ | ✔ | ✔ |
predictions | - | ✔ | - | - |
model_run_details | - | ✔ | - | - |
model_run_ids | - | - | ✔ | ✔ |
project_ids | - | - | ✔ | ✔ |
all_projects | - | - | ✔ | ✔ |
all_model_runs | - | - | ✔ | ✔ |

For details on the `project_ids`, `model_run_ids`, `all_projects`, and `all_model_runs` parameters, see Export data rows from Catalog below.
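
As a rough illustration of the parameter table, the optional parameters can be collected in a dictionary and checked against the export type before use. The helper below is a hypothetical client-side check, not part of the SDK; the export type labels (`"project"`, `"model_run"`, `"dataset"`, `"slice"`) are names chosen for this sketch.

```python
from typing import Any, Dict

# Parameter support copied from the table above.
COMMON = {"attachments", "metadata_fields", "embeddings", "data_row_details", "interpolated_frames"}
LABEL_DETAILS = {"project_details", "label_details", "performance_details"}
CATALOG_ONLY = {"model_run_ids", "project_ids", "all_projects", "all_model_runs"}

SUPPORTED_PARAMS = {
    "project": COMMON | LABEL_DETAILS,
    "model_run": COMMON | {"predictions", "model_run_details"},
    "dataset": COMMON | LABEL_DETAILS | CATALOG_ONLY,
    "slice": COMMON | LABEL_DETAILS | CATALOG_ONLY,
}


def check_params(export_type: str, params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of params, or raise if a key is unsupported for export_type."""
    unsupported = set(params) - SUPPORTED_PARAMS[export_type]
    if unsupported:
        raise ValueError(f"{sorted(unsupported)} not supported for {export_type} export")
    return dict(params)


project_params = check_params("project", {"attachments": True, "label_details": True})
```

The checked dictionary would then typically be passed as the `params` argument of the matching `export()` call.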
To learn how to apply these filters, see the sections below for each export type.

When multiple filters are applied, they are combined using `AND` operator logic.

Filter | Project export | Model run export | Dataset export (Catalog) | Slice export (Catalog) |
---|---|---|---|---|
last_activity_at | ✔ | - | ✔ | - |
label_created_at | ✔ | - | ✔ | - |
workflow_status | ✔ | - | - | - |
batch_ids | ✔ | - | - | - |
global_keys | ✔ | - | ✔ | - |
data_row_ids | ✔ | - | ✔ | - |

The `last_activity_at` and `label_created_at` filters take the structure of `[<start_date>, <end_date>]` and can have the following formats:

- `YYYY-MM-DD` (this is an alias of `YYYY-MM-DD 00:00:00`)
- `YYYY-MM-DD hh:mm:ss`
- `YYYY-MM-DDThh:mm:ss±hhmm` (ISO 8601)
- `None`

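To catch malformed dates before an export is started, the accepted formats can be checked locally. This is a minimal sketch using only the standard library; `parse_bound` and `date_range` are hypothetical helper names, and `None` is simply passed through unchanged because the text above lists it as an accepted value without describing its effect.

```python
from datetime import datetime
from typing import List, Optional

# strptime patterns for the three string formats listed above.
_FORMATS = ("%Y-%m-%d", "%Y-%m-%d %H:%M:%S", "%Y-%m-%dT%H:%M:%S%z")


def parse_bound(value: Optional[str]) -> Optional[datetime]:
    """Parse one range bound; None is accepted and returned as-is."""
    if value is None:
        return None
    for fmt in _FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"Unsupported date format: {value!r}")


def date_range(start: Optional[str], end: Optional[str]) -> List[Optional[str]]:
    """Validate both bounds and return the [start, end] list used by the filter."""
    parse_bound(start)
    parse_bound(end)
    return [start, end]


filters = {"last_activity_at": date_range("2023-01-01", "2023-06-30 23:59:59")}
```

The original strings are kept in the filter; parsing is used only for validation.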
The `last_activity_at` filter captures only the data rows where the following changes have been made in the specified time frame:

- Changes to the row data (`rowData`), external ID (`externalId`), or global key (`globalKey`)

Data rows update `last_activity_at` when such changes occur in any project containing the data rows.

The `label_created_at` filter captures only the data rows where labels have been created in the specified time frame.

The `workflow_status` filter allows you to export only the data rows in a specific status of a project’s workflow. The filter accepts the following values:

- `ToLabel`
- `InReview`
- `InRework`
- `Done`

For example, `filters = {"workflow_status": "InReview"}`.

The `batch_ids` filter allows you to export only the data rows in a specific batch or batches. This filter accepts a list of batch IDs. For example, `filters = {"batch_ids": ["batch_id_1", "batch_id_2"]}`.

To get the batches sent to a project and their associated information, you can use the `project.batches()` method. For more information, see Get the batches.

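The batch lookup and the filter can be combined in a small helper. This is a sketch under assumptions: it takes any object with a `batches()` method, and it assumes each returned batch exposes `uid` and `name` attributes; `batch_filter` itself is a hypothetical name, not an SDK function.

```python
from typing import Any, Dict, Iterable, List


def batch_filter(project: Any, names: Iterable[str]) -> Dict[str, List[str]]:
    """Build a batch_ids filter from the names of batches sent to a project."""
    wanted = set(names)
    # Assumption: every batch object has `uid` (its ID) and `name`.
    batch_ids = [batch.uid for batch in project.batches() if batch.name in wanted]
    return {"batch_ids": batch_ids}
```

With a real project object, the returned dictionary would be passed as the `filters` argument of `project.export()`.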
The `global_keys` filter allows you to export only the data rows with the specified global keys within a project or dataset. This filter accepts a list containing up to 2,000 values. For example, `filters = {"global_keys": ["global_key_1", "global_key_2"]}`.

The `data_row_ids` filter allows you to export only the data rows with the specified IDs within a project or dataset. This filter accepts a list containing up to 2,000 values. For example, `filters = {"data_row_ids": ["data_row_id_1", "data_row_id_2"]}`.

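When more than 2,000 global keys or data row IDs are needed, one possible approach is to split them into several filters and run one export per filter. The helper below is a hypothetical sketch of that splitting step only; running the exports and merging their results is left to the caller.

```python
from typing import Dict, Iterator, List, Sequence

MAX_FILTER_VALUES = 2000  # limit stated above for global_keys and data_row_ids


def chunked_filters(
    key: str, values: Sequence[str], size: int = MAX_FILTER_VALUES
) -> Iterator[Dict[str, List[str]]]:
    """Yield one filter dict per chunk of at most `size` values."""
    if key not in ("global_keys", "data_row_ids"):
        raise ValueError(f"Unexpected filter key: {key!r}")
    for start in range(0, len(values), size):
        yield {key: list(values[start:start + size])}


# Placeholder keys, generated only to show the chunk sizes.
keys = [f"global_key_{i}" for i in range(4500)]
chunks = list(chunked_filters("global_keys", keys))
```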
The following methods are available for exporting data:

- `data_row.export()`
- `dataset.export()`
- `model_run.export()`
- `project.export()`
- `slice.export()`

Each of these methods returns an `ExportTask`. This class serves as a wrapper around a `Task` object. Because of this relationship, most of the features present in the `Task` class are also available in the `ExportTask` class.

`ExportTask` supports the following methods and properties from `Task`:

An `ExportTask` can be obtained via the `export()` method on the classes mentioned above, or by executing the following:

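A minimal sketch of that lookup, and of reading a finished task, is given here under stated assumptions. It assumes a recent SDK version in which `ExportTask.get_task(client, task_id)` returns an existing task and `get_buffered_stream()` yields items with a `.json` attribute; the function names `fetch_export_task` and `collect_rows` are hypothetical and should be checked against the installed SDK.

```python
from typing import Any, Dict, List


def fetch_export_task(client: Any, task_id: str) -> Any:
    """Look up an existing export task by ID (assumed SDK call)."""
    import labelbox as lb  # imported lazily so this sketch loads without the SDK

    return lb.ExportTask.get_task(client, task_id)


def collect_rows(export_task: Any) -> List[Dict[str, Any]]:
    """Wait for an export task and return its rows as parsed JSON dicts."""
    export_task.wait_till_done()
    if export_task.has_errors():
        raise RuntimeError("The export task reported errors")
    if not export_task.has_result():
        return []
    # Assumption: the buffered stream is iterable and each item exposes `.json`.
    return [output.json for output in export_task.get_buffered_stream()]
```

Collecting every row into a list is convenient for small exports; for large ones, processing each item as it is streamed avoids holding the whole result in memory.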
Line positions are zero-based: `with_line(3)` will start streaming from the 4th line. The offset passed to `with_offset()` cannot exceed the total size, and the line passed to `with_line()` cannot exceed the total number of lines, as reported by the `ExportTask` methods described below; otherwise, a `ValueError` exception will be raised.

`ExportTask` has two methods to output the total size of the exported file and the total number of lines it contains:

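The sketch below shows one way these totals might be read and used to validate a starting line before streaming. It assumes the two methods are named `get_total_file_size` and `get_total_lines` and take a stream type (for example `lb.StreamType.RESULT`) as their argument; `stream_totals` and `check_line` are hypothetical helpers.

```python
from typing import Any, Dict, Optional


def stream_totals(export_task: Any, stream_type: Any) -> Dict[str, Optional[int]]:
    """Return the total size in bytes and the total line count of a stream."""
    return {
        "bytes": export_task.get_total_file_size(stream_type),
        "lines": export_task.get_total_lines(stream_type),
    }


def check_line(export_task: Any, stream_type: Any, line: int) -> int:
    """Raise ValueError if `line` is negative or exceeds the total line count."""
    total = export_task.get_total_lines(stream_type)
    if line < 0 or (total is not None and line > total):
        raise ValueError(f"line {line} is outside the stream (total lines: {total})")
    return line
```

Checking locally gives a clearer error message than waiting for the stream itself to reject the position.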
Use the `all_projects` and `all_model_runs` parameters to get information from all projects and model runs attached to your data row. The `project_ids` and `model_run_ids` parameters accept a list of IDs.

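For illustration, the Catalog-specific parameters can be assembled alongside the other optional flags. This is a sketch only: `catalog_params` is a hypothetical helper, the IDs are placeholders, and whether the "all" flags and the ID lists may be combined in one export is not stated here.

```python
from typing import Any, Dict, Optional, Sequence


def catalog_params(
    project_ids: Optional[Sequence[str]] = None,
    model_run_ids: Optional[Sequence[str]] = None,
    all_projects: bool = False,
    all_model_runs: bool = False,
    **flags: bool,
) -> Dict[str, Any]:
    """Build a params dict for a dataset or slice export."""
    params: Dict[str, Any] = dict(flags)  # e.g. attachments=True
    if all_projects:
        params["all_projects"] = True
    if all_model_runs:
        params["all_model_runs"] = True
    if project_ids:
        params["project_ids"] = list(project_ids)  # list of project IDs
    if model_run_ids:
        params["model_run_ids"] = list(model_run_ids)  # list of model run IDs
    return params


params = catalog_params(project_ids=["project_id_1"], all_model_runs=True, attachments=True)
```

The resulting dictionary would be passed as `params` to `dataset.export()` or `slice.export()`.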
See the table at the top of this page to find the JSON export formats for each data type.