Export data from Catalog (beta)

Export data rows from Catalog

Export v2 gives you flexibility and control over the information you retrieve from Catalog. You can select and export the data rows of interest based on predefined or new parameters, and you can include information specific to multiple projects or model runs in a single export directly from Catalog. The annotation formats have also been simplified and standardized.

We are collecting feedback during this beta period, so the format of the export JSON is subject to minor changes until the end of June 2023.

📘

Limit

Currently, you can export up to 10,000 data rows from Catalog at a time.

Export fields

📘

Nested classifications and frame-based classifications

Nested classifications keep their nested structure in export v2. Video and DICOM exports include all frame-based annotations directly in the export file.

Each data row in the NDJSON file can have the following information:

| Field | Description | Included |
| --- | --- | --- |
| data_row | Contains the basic information of the data row: id, row_data, global_key, details (optional, see below) | Always |
| details (data row) | Contains additional details of the data row: dataset_id, created_at, updated_at, created_by | Optional |
| media_attributes | See Media attributes | Optional |
| attachments | See Attachments | Optional |
| metadata_fields | See Metadata | Optional |
| projects | Contains the ID of the project in which the data row was labeled. | Always |
| <project_id> | Contains the following sections, which are expanded on below: labels, project_details | Always |
| labels | Contains a list of labels attached to this data row: label_kind, version, id, label_details (optional, see below), performance_details (optional, see below), annotations | Always |
| label_details | Contains details of each specific label: created_at, updated_at, created_by, reviews | Optional |
| performance_details | Contains label-specific performance details: seconds_to_create, seconds_to_review, skipped, benchmark_reference_label, benchmark_score, consensus_score, consensus_label_count, consensus_labels | Optional |
| project_details | Contains project-specific information about this data row: ontology_id, batch_id, priority, consensus_expected_label_count, workflow_history, task_id, task_name | Optional |
| models | Contains the ID of the model in which the data row was stored. | Always |
| <model_id> | Contains the following sections, which are expanded on below: model_name, model_runs | Always |
| model_name | Name of the model | Always |
| model_runs | Contains the ID of the model run in which the data row was stored. | Always |
| <model_run_id> | Contains the following sections, which are expanded on below: model_run_name, annotation_group_id, labels, predictions | Always |
| model_run_name | Name of the model run | Always |
| annotation_group_id | Model run data row ID, similar to data_row_id but in the context of a model run | Always |
| labels | Contains a list of labels attached to this data row, versioned in this model run: label_kind, version, id, annotations | Always |
| predictions | Contains a list of predictions attached to this data row: label_kind, version, id, annotations | Optional |
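
Because several of the fields above are optional, it can help to check which of them a given row actually carries before processing it. The following is a minimal sketch, not an official utility: the helper name `included_optional_fields` is hypothetical, and it assumes that an optional field you did not select is either missing from the row or empty. The sample row uses made-up IDs and a placeholder metadata entry.

```python
from typing import List


def included_optional_fields(row: dict) -> List[str]:
    """List the optional sections that carry data in one exported row.

    Assumption: an unselected optional field is either absent or empty.
    """
    found: List[str] = []

    def note(name: str) -> None:
        # Record each section name once, in first-seen order.
        if name not in found:
            found.append(name)

    if row.get("data_row", {}).get("details"):
        note("data_row.details")
    for key in ("media_attributes", "attachments", "metadata_fields"):
        if row.get(key):
            note(key)
    for project in row.get("projects", {}).values():
        if project.get("project_details"):
            note("project_details")
        for label in project.get("labels", []):
            for key in ("label_details", "performance_details"):
                if label.get(key):
                    note(key)
    return found


# Made-up row: only metadata and performance details were exported.
row = {
    "data_row": {"id": "dr-1", "row_data": "https://example.com/1.jpg", "global_key": "gk-1"},
    "metadata_fields": ["<metadata entry>"],
    "projects": {
        "proj-1": {
            "labels": [
                {
                    "label_kind": "",
                    "version": "1.0.0",
                    "id": "lbl-1",
                    "performance_details": {"seconds_to_create": 12, "skipped": False},
                    "annotations": {"objects": [], "classifications": [], "relationships": []},
                }
            ]
        }
    },
}
print(included_optional_fields(row))
```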

Export v2 examples

{
  "data_row": {
    "id": "<id>",
    "external_id": "<id>",
    "global_key": "<id>",
    "row_data": "<url>",
    "details": {
      "dataset_id": "<id>",
      "created_at": "<time>",
      "updated_at": "<time>",
      "created_by": "<email>"
    }
  },
  "media_attributes": {},
  "attachments": [],
  "metadata_fields": [],
  "projects": {
    "<project_id>": {
      "project_name": "",
      "labels": [
        {
          "label_kind": "",
          "version": "1.0.0",
          "id": "<id>",
          "label_details": {
            "created_at": "<time>",
            "updated_at": "<time>",
            "created_by": "<email>",
            "reviews": []
          },
          "performance_details": {
            "seconds_to_create": 0,
            "seconds_to_review": 0,
            "skipped": false
          },
          "annotations": {
            "objects": [],
            "classifications": [],
            "relationships": []
          }
        }
      ],
      "project_details": {
        "ontology_id": "<id>",
        "batch_id": "<id>",
        "priority": 5,
        "consensus_expected_label_count": 1,
        "workflow_history": []
      }
    }
  },
  "models": {
    "<model_id>": {
      "model_name": "",
      "model_runs": {
        "<model_run_id>": {
          "model_run_name": "",
          "annotation_group_id": "<id>",
          "labels": [
            {
              "label_kind": "",
              "version": "1.0.0",
              "id": "<id>",
              "annotations": {
                "objects": [],
                "classifications": [],
                "relationships": []
              }
            }
          ],
          "predictions": [
            {
              "label_kind": "",
              "version": "1.0.0",
              "id": "<id>",
              "annotations": {
                "objects": [],
                "classifications": [],
                "relationships": []
              }
            }
          ]
        }
      }
    }
  }
}
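
Since the export is an NDJSON file, each line holds one JSON object shaped like the example above. As a rough sketch using only the Python standard library (the helper names `iter_export_rows` and `index_by_data_row_id` are hypothetical, and the two sample rows are made up), the file can be read line by line and indexed by data row ID:

```python
import io
import json
from typing import Dict, Iterator, TextIO


def iter_export_rows(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed data row per non-empty NDJSON line."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines, such as a trailing newline
            yield json.loads(line)


def index_by_data_row_id(stream: TextIO) -> Dict[str, dict]:
    """Map each data row ID to its full export record."""
    return {row["data_row"]["id"]: row for row in iter_export_rows(stream)}


# Two made-up rows standing in for an export file on disk.
sample = io.StringIO(
    '{"data_row": {"id": "dr-1", "global_key": "gk-1", "row_data": "https://example.com/1.jpg"}, "projects": {}, "models": {}}\n'
    '{"data_row": {"id": "dr-2", "global_key": "gk-2", "row_data": "https://example.com/2.jpg"}, "projects": {}, "models": {}}\n'
)
rows_by_id = index_by_data_row_id(sample)
print(sorted(rows_by_id))
```

For a real export, pass an open file handle (for example `open(path, encoding="utf-8")`) instead of the in-memory sample. Reading line by line avoids loading the whole file as a single JSON document.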

Export data rows from Catalog

Apply and combine filters to query data rows based on similarity, natural language search, annotations, metadata, and more. Then, export the resulting data rows along with information associated with their appearance in any project or model run.

📘

Catalog export v2 limit

You can export up to 10,000 data rows at once from the Catalog.

Note that excluding optional fields from your export will make the process faster and the export file smaller.

To export data rows and their information associated with projects and model runs, follow these general steps:

  1. Navigate to Catalog.
  2. Narrow down your query to 10,000 data rows or fewer.
  3. Open the dropdown under the number of data rows in the query and select Export data v2 (beta).
  4. Select the optional fields to include and begin the export.

Selecting export fields to include

After you determine which data rows to export, you are prompted to select optional fields. These fields provide additional details specific to data rows, labels, and projects. More details on these fields can be found in the Export v2 Glossary.

You can also export information about the selected data rows that is specific to multiple projects and model runs.

Export labels from projects

If this checkbox is selected, you are prompted to select one or more projects from the dropdown. Only projects in which at least one of the selected data rows has been labeled appear in the dropdown.

For each selected project, all labels made in that project are included in the NDJSON for each respective data row.
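
To illustrate how project labels sit inside each exported row, here is a small sketch that counts labels per project across parsed rows. It assumes the rows have already been loaded as Python dictionaries following the `projects` → `<project_id>` → `labels` layout described earlier. The function name `labels_per_project` and all IDs in the sample are made up for illustration.

```python
from collections import Counter
from typing import Iterable


def labels_per_project(rows: Iterable[dict]) -> Counter:
    """Count exported labels per project ID across all rows."""
    counts: Counter = Counter()
    for row in rows:
        # "projects" maps each project ID to that project's section of the row.
        for project_id, project in row.get("projects", {}).items():
            counts[project_id] += len(project.get("labels", []))
    return counts


# Made-up rows: dr-1 was labeled in two projects, dr-2 in one.
rows = [
    {
        "data_row": {"id": "dr-1"},
        "projects": {
            "proj-a": {"labels": [{"id": "l1"}, {"id": "l2"}]},
            "proj-b": {"labels": [{"id": "l3"}]},
        },
    },
    {
        "data_row": {"id": "dr-2"},
        "projects": {"proj-a": {"labels": [{"id": "l4"}]}},
    },
]
counts = labels_per_project(rows)
print(dict(counts))
```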

Export labels and predictions from model runs

If this checkbox is selected, you are prompted to select one or more model runs from the dropdown. Only model runs that contain at least one of the selected data rows appear in the dropdown.

For each selected model run, all labels and predictions made in that model run are included in the NDJSON for each respective data row.
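
Model run data is nested two levels deep (`models` → `<model_id>` → `model_runs` → `<model_run_id>`), so a short traversal helper can make it easier to compare labels with predictions. The sketch below assumes that layout and treats `predictions` as possibly absent, since it is an optional field. The name `iter_model_run_entries` and the sample IDs are hypothetical.

```python
from typing import Iterator, List, Tuple


def iter_model_run_entries(row: dict) -> Iterator[Tuple[str, str, List[dict], List[dict]]]:
    """Yield (model_id, model_run_id, labels, predictions) for one exported row."""
    for model_id, model in row.get("models", {}).items():
        for run_id, run in model.get("model_runs", {}).items():
            # "predictions" is optional, so fall back to an empty list.
            yield model_id, run_id, run.get("labels", []), run.get("predictions", [])


# Made-up row: one model with two runs, the second without predictions.
row_with_models = {
    "data_row": {"id": "dr-1"},
    "models": {
        "model-1": {
            "model_name": "",
            "model_runs": {
                "run-1": {
                    "model_run_name": "",
                    "annotation_group_id": "ag-1",
                    "labels": [{"id": "l1"}],
                    "predictions": [{"id": "p1"}, {"id": "p2"}],
                },
                "run-2": {
                    "model_run_name": "",
                    "annotation_group_id": "ag-2",
                    "labels": [{"id": "l2"}],
                },
            },
        }
    },
}
summary = [
    (model_id, run_id, len(labels), len(predictions))
    for model_id, run_id, labels, predictions in iter_model_run_entries(row_with_models)
]
print(summary)
```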

Option 1: Export from multiple datasets

  1. Select All datasets in the top-left corner.
  2. Apply a filter or combination of filters.
  3. Click on All datasets (# data rows), then select Export data v2 (beta) in the dropdown menu and select any desired optional fields.

Option 2: Export from one dataset

  1. Select a dataset from the list of datasets on the left side menu.
  2. Apply a filter or combination of filters, if desired.
  3. Click on (# data rows), then select Export data v2 (beta) in the dropdown menu and select any desired optional fields.

Option 3: Export from a slice

  1. Select Slices in the toggle on the left side menu.
  2. Select an existing slice.
  3. Modify or complement the filters that comprise the slice, if desired.
  4. Click on (# data rows), then select Export data v2 (beta) in the dropdown menu and select any desired optional fields.

Option 4: Export specific data rows

  1. Hand-select data rows to export using the checkboxes in the top-left corner of the thumbnail of each data row.
  2. Click on # selected in the top-right corner, then select Export data v2 (beta) in the dropdown menu and select any desired optional fields.

Export from Catalog (Python SDK)

Support for export v2 from Catalog via the Python SDK is currently in development.