Export data from Catalog (beta)
Export data rows from Catalog
Export v2 gives you flexibility and control over the information you retrieve from Catalog. You can select and export data rows of interest based on predefined or new parameters, and you can include information specific to multiple projects or model runs in an export directly from Catalog. We've also simplified and standardized the annotation formats.
We are collecting feedback during this beta period, so the final format of the export JSON is subject to minor changes until the end of June 2023.
Limit
Currently, you can export up to 10,000 data rows from Catalog at a time.
Export fields
Nested classifications and frame-based classifications
The nested classifications will maintain their nested structure in export v2; video and DICOM exports will include all frame-based annotations directly in the export file.
Each data row in the NDJSON file can have the following information:
Field | Description | Included |
---|---|---|
data_row | Contains the basic information of the data row: - id - row_data - global_key - details (optional, see below) | Always |
details (data row) | Contains additional details of the data row: - dataset_id - created_at - updated_at - created_by | Optional |
media_attributes | See Media attributes | Optional |
attachments | See Attachments | Optional |
metadata_fields | See Metadata | Optional |
projects | Contains the ID of the project in which the data row was labeled. | Always |
<project_id> | Contains the following sections, which are expanded on below: - labels - project_details | Always |
labels | Contains a list of labels attached to this data row: - label_kind - version - id - label_details (optional, see below) - performance_details (optional, see below) - annotations | Always |
label_details | Contains details of each specific label: - created_at - updated_at - created_by - reviews | Optional |
performance_details | Contains label-specific performance details: - seconds_to_create - seconds_to_review - skipped - benchmark_reference_label - benchmark_score - consensus_score - consensus_label_count - consensus_labels | Optional |
project_details | Contains project-specific information about this data row: - ontology_id - batch_id - priority - consensus_expected_label_count - workflow_history - task_id - task_name | Optional |
models | Contains the ID of the model in which the data row was stored. | Always |
<model_id> | Contains the following sections, which are expanded on below: - model_name - model_runs | Always |
model_name | Name of the model | Always |
model_runs | Contains the ID of the model run in which the data row was stored. | Always |
<model_run_id> | Contains the following sections, which are expanded on below: - model_run_name - annotation_group_id - labels - predictions | Always |
model_run_name | Name of the model run | Always |
annotation_group_id | Model Run Data Row id, similar to data_row_id but in a Model Run's context | Always |
labels | Contains a list of labels attached to this data row versioned in this model run: - label_kind - version - id - annotations | Always |
predictions | Contains a list of predictions attached to this data row: - label_kind - version - id - annotations | Optional |
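Because the export is NDJSON, each line of the file is one self-contained JSON object describing a single data row. The following is a minimal sketch, using only the Python standard library, of how such a file might be read and how you could check which optional top-level sections were included. The helper names and the sample records are hypothetical, and the section names are taken from the table above.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

# Top-level sections that the table above marks as optional.
OPTIONAL_SECTIONS = ("media_attributes", "attachments", "metadata_fields")


def iter_ndjson(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one parsed data row per non-empty NDJSON line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def present_optional_sections(record: Dict[str, Any]) -> List[str]:
    """Return the optional top-level sections that appear in this record."""
    return [name for name in OPTIONAL_SECTIONS if name in record]


# Hypothetical two-line export, used only for illustration.
sample = "\n".join([
    json.dumps({"data_row": {"id": "dr1"}, "projects": {}, "models": {}}),
    json.dumps({"data_row": {"id": "dr2"}, "attachments": [],
                "projects": {}, "models": {}}),
])

records = list(iter_ndjson(sample.splitlines()))
for record in records:
    print(record["data_row"]["id"], present_optional_sections(record))
```

An open file object can be passed in place of `sample.splitlines()`, since iterating over a text file also yields one line at a time.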
Export v2 examples
{
  "data_row": {
    "id": "<id>",
    "external_id": "<id>",
    "global_key": "<id>",
    "row_data": "<url>",
    "details": {
      "dataset_id": "<id>",
      "created_at": "<time>",
      "updated_at": "<time>",
      "created_by": "<email>"
    }
  },
  "media_attributes": {},
  "attachments": [],
  "metadata_fields": [],
  "projects": {
    "<project_id>": {
      "project_name": "",
      "labels": [
        {
          "label_kind": "",
          "version": "1.0.0",
          "id": "<id>",
          "label_details": {
            "created_at": "<time>",
            "updated_at": "<time>",
            "created_by": "<email>",
            "reviews": []
          },
          "performance_details": {
            "seconds_to_create": 0,
            "seconds_to_review": 0,
            "skipped": false
          },
          "annotations": {
            "objects": [],
            "classifications": [],
            "relationships": []
          }
        }
      ],
      "project_details": {
        "ontology_id": "<id>",
        "batch_id": "<id>",
        "priority": 5,
        "consensus_expected_label_count": 1,
        "workflow_history": []
      }
    }
  },
  "models": {
    "<model_id>": {
      "model_name": "",
      "model_runs": {
        "<model_run_id>": {
          "model_run_name": "",
          "annotation_group_id": "<id>",
          "labels": [
            {
              "label_kind": "",
              "version": "1.0.0",
              "id": "<id>",
              "annotations": {
                "objects": [],
                "classifications": [],
                "relationships": []
              }
            }
          ],
          "predictions": [
            {
              "label_kind": "",
              "version": "1.0.0",
              "id": "<id>",
              "annotations": {
                "objects": [],
                "classifications": [],
                "relationships": []
              }
            }
          ]
        }
      }
    }
  }
}
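Labels are nested under project IDs, so it can help to flatten a record into one row per label before analysis. The sketch below assumes a record shaped like the example above; the function name, the output column names, and the sample IDs are hypothetical. Optional sections such as performance_details are read defensively, since they may be absent from an export.

```python
from typing import Any, Dict, List


def flatten_project_labels(record: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one flat row per label found under the record's "projects" section."""
    rows = []
    data_row_id = record["data_row"]["id"]
    for project_id, project in record.get("projects", {}).items():
        for label in project.get("labels", []):
            annotations = label.get("annotations", {})
            # performance_details is optional, so fall back to an empty dict.
            performance = label.get("performance_details", {})
            rows.append({
                "data_row_id": data_row_id,
                "project_id": project_id,
                "label_id": label["id"],
                "object_count": len(annotations.get("objects", [])),
                "classification_count": len(annotations.get("classifications", [])),
                "seconds_to_create": performance.get("seconds_to_create"),
            })
    return rows


# Hypothetical record with made-up IDs, shaped like the example above.
record = {
    "data_row": {"id": "dr1"},
    "projects": {
        "proj1": {
            "labels": [
                {
                    "id": "label1",
                    "performance_details": {"seconds_to_create": 12},
                    "annotations": {
                        "objects": [{}, {}],
                        "classifications": [],
                        "relationships": [],
                    },
                }
            ]
        }
    },
}

rows = flatten_project_labels(record)
print(rows)
```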
Export data rows from Catalog
Apply and combine filters to query data rows based on similarity, natural language search, annotations, metadata, and more. Then, export the resulting data rows along with information associated with their appearance in any project or model run.
Catalog export v2 limit
You can export up to 10,000 data rows at once from the Catalog.
Note that excluding optional fields from your export will make the process faster and the export file smaller.
To export data rows and their information associated with projects and model runs, follow these general steps:
- Navigate to Catalog.
- Narrow down your query to 10,000 data rows or fewer.
- Open the dropdown under the number of data rows in the query and select Export data v2 (beta).
- Select the optional fields to include and begin the export.
Selecting export fields to include
After you determine the data rows to export, a prompt to select optional fields appears. Several optional fields provide additional details specific to data rows, labels, and projects. More details on these fields can be found in the Export v2 Glossary.
You can also export information on the selected data rows that is specific to multiple projects and model runs.
Export labels from projects
If this checkbox is selected, you will be prompted to select one or more projects from the dropdown. Only projects in which one or more of the selected data rows have been labeled will appear in the dropdown.
For the selected projects, all labels made in the project will be included in the NDJSON for each respective data row.
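A project appears in the dropdown when at least one selected data row has been labeled in it, so some exported data rows may carry no labels for a given project. The following sketch lists those data rows after the export has been parsed into Python dictionaries. The function name and sample records are hypothetical, and it treats both a missing project entry and an empty labels list as "no labels", which is an assumption about how such rows are represented.

```python
from typing import Any, Dict, Iterable, List


def unlabeled_in_project(records: Iterable[Dict[str, Any]], project_id: str) -> List[str]:
    """Return IDs of data rows that have no labels under the given project."""
    missing = []
    for record in records:
        project = record.get("projects", {}).get(project_id, {})
        # Covers both a missing project entry and an empty "labels" list.
        if not project.get("labels"):
            missing.append(record["data_row"]["id"])
    return missing


# Hypothetical parsed records with made-up IDs.
parsed = [
    {"data_row": {"id": "dr1"}, "projects": {"proj1": {"labels": [{"id": "l1"}]}}},
    {"data_row": {"id": "dr2"}, "projects": {"proj1": {"labels": []}}},
    {"data_row": {"id": "dr3"}, "projects": {}},
]

unlabeled = unlabeled_in_project(parsed, "proj1")
print(unlabeled)
```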
Export labels and predictions from model runs
If this checkbox is selected, you will be prompted to select one or more model runs from the dropdown. Only model runs that contain one or more of the selected data rows will appear in the dropdown.
For the selected model runs, all labels and predictions made in the model run will be included in the NDJSON for each respective data row.
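Model run information is nested two levels deep: model ID, then model run ID. As a rough sketch, the function below walks that structure for one parsed data row and counts the labels and predictions in each model run. The function name, output keys, and sample IDs are hypothetical; predictions are read with a default because the field is optional.

```python
from typing import Any, Dict, List


def model_run_summary(record: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Count labels and predictions per model run for one data row."""
    summary = []
    for model_id, model in record.get("models", {}).items():
        for run_id, run in model.get("model_runs", {}).items():
            summary.append({
                "model_id": model_id,
                "model_run_id": run_id,
                "label_count": len(run.get("labels", [])),
                # "predictions" is optional, so default to an empty list.
                "prediction_count": len(run.get("predictions", [])),
            })
    return summary


# Hypothetical record with made-up IDs.
record_with_runs = {
    "data_row": {"id": "dr1"},
    "models": {
        "model1": {
            "model_name": "demo",
            "model_runs": {
                "run1": {"labels": [{"id": "l1"}],
                         "predictions": [{"id": "p1"}, {"id": "p2"}]},
                "run2": {"labels": [{"id": "l2"}]},
            },
        }
    },
}

summary = model_run_summary(record_with_runs)
print(summary)
```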
Option 1: Export from multiple datasets
- Select All datasets in the top-left corner.
- Apply a filter or combination of filters.
- Click on All datasets (# data rows), then select Export data v2 (beta) in the dropdown menu and select any desired optional fields.
Option 2: Export from one dataset
- Select a dataset from the list of datasets on the left side menu.
- Apply a filter or combination of filters, if desired.
- Click on (# data rows), then select Export data v2 (beta) in the dropdown menu and select any desired optional fields.
Option 3: Export from a slice
- Select Slices in the toggle on the left side menu.
- Select an existing slice.
- Modify or complement the filters that comprise the slice, if desired.
- Click on (# data rows), then select Export data v2 (beta) in the dropdown menu and select any desired optional fields.
Option 4: Export specific data rows
- Hand-select data rows to export using the checkboxes in the top-left corner of the thumbnail of each data row.
- Click on # selected in the top-right corner, then select Export data v2 (beta) in the dropdown menu and select any desired optional fields.
Export from Catalog (Python SDK)
Support for export v2 from Catalog via the Python SDK is currently in development.