# Import text annotations

> Developer guide for importing annotations on text data and sample import formats.

<CardGroup cols={2}>
  <Card title="Open In Colab" icon="infinity" iconType="solid" horizontal href="https://colab.research.google.com/github/Labelbox/labelbox-notebooks/blob/main/annotation_import/text.ipynb" />

  <Card title="GitHub" icon="github" iconType="solid" horizontal href="https://github.com/Labelbox/labelbox-notebooks/blob/main/annotation_import/text.ipynb" />
</CardGroup>

## Overview

To import annotations into Labelbox, you need to create an annotations payload. This guide provides a sample payload for every supported annotation type.

### Annotation payload types

Labelbox supports two formats for the annotations payload:

* Python annotation types (recommended)
  * Provides a seamless transition between third-party platforms, machine learning pipelines, and Labelbox.
  * Allows you to build annotations locally with local file paths, numpy arrays, or URLs
  * Easily convert Python Annotation Type format to NDJSON format to quickly import annotations to Labelbox
  * Supports one-level nested classification (radio, checklist, or free-form text) under a tool or classification annotation.
* JSON
  * Skips formatting annotation payload in the Labelbox Python annotation type
  * Supports any levels of nested classification (radio, checklist, or free-form text) under a tool or classification annotation.
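
As a rough illustration of the second format: NDJSON is newline-delimited JSON, meaning one JSON object per line. The sketch below uses only the Python standard library; `to_ndjson` is a hypothetical helper, not part of the Labelbox SDK, and the two dictionaries simply mirror the classification examples later in this guide. The import examples in this guide pass Python dictionaries directly, so serializing is shown only to make the format concrete.

```python
import json

# Hypothetical annotation dictionaries, shaped like the NDJSON examples in this guide.
annotations = [
    {"name": "radio_question", "answer": {"name": "first_radio_answer"}},
    {"name": "free_text", "answer": "sample text"},
]


def to_ndjson(rows):
    """Serialize a list of dictionaries as newline-delimited JSON (one object per line)."""
    return "\n".join(json.dumps(row) for row in rows)


ndjson_text = to_ndjson(annotations)
print(ndjson_text)

# Each line parses back into the dictionary it came from.
round_trip = [json.loads(line) for line in ndjson_text.splitlines()]
```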

### Label import types

Labelbox additionally supports two types of label imports:

* [Model-assisted labeling (MAL)](/docs/model-assisted-labeling)
  * This workflow allows you to import computer-generated predictions (or simply annotations created outside of Labelbox) as pre-labels on an asset.
* [Ground truth](/docs/import-ground-truth)
  * This workflow allows you to bulk import your ground truth annotations from an external or third-party labeling system into Labelbox *Annotate*. Using the label import API to import external data is a useful way to consolidate and migrate all annotations into Labelbox as a single source of truth.

## Supported annotations

The following annotations are supported for a text data row:

* Radio
* Checklist
* Free-form text
* Entity
* Relationships

## Classifications

### Radio (single choice)

<CodeGroup>
  ```python Python annotation theme={null}
  radio_annotation = lb_types.ClassificationAnnotation(
      name="radio_question",
      value=lb_types.Radio(answer =
          lb_types.ClassificationAnswer(name = "first_radio_answer")
      )
  )
  ```

  ```python NDJSON theme={null}
  radio_annotation_ndjson = {
    "name": "radio_question",
    "answer": {"name": "first_radio_answer"}
  }
  ```
</CodeGroup>

### Checklist (multiple choice)

<CodeGroup>
  ```python Python annotation theme={null}
  checklist_annotation = lb_types.ClassificationAnnotation(
      name="checklist_question",
      value=lb_types.Checklist(answer = [
          lb_types.ClassificationAnswer(name = "first_checklist_answer"),
          lb_types.ClassificationAnswer(name = "second_checklist_answer"),
          lb_types.ClassificationAnswer(name = "third_checklist_answer")
      ])
  )
  ```

  ```python NDJSON theme={null}
  checklist_annotation_ndjson = {
    "name": "checklist_question",
    "answer": [
      {"name": "first_checklist_answer"},
      {"name": "second_checklist_answer"},
      {"name": "third_checklist_answer"},
    ]
  }
  ```
</CodeGroup>

### Free-form text

<CodeGroup>
  ```python Python annotation theme={null}
  text_annotation = lb_types.ClassificationAnnotation(
      name = "free_text",
      value = lb_types.Text(answer="sample text")
  )
  ```

  ```python NDJSON theme={null}
  text_annotation_ndjson = {
    "name": "free_text",
    "answer": "sample text",
  }
  ```
</CodeGroup>

### Nested classifications

<CodeGroup>
  ```python Python annotation theme={null}
  nested_radio_annotation = lb_types.ClassificationAnnotation(
    name="nested_radio_question",
    value=lb_types.Radio(
      answer=lb_types.ClassificationAnswer(
        name="first_radio_answer",
        classifications=[
          lb_types.ClassificationAnnotation(
            name="sub_radio_question",
            value=lb_types.Radio(
              answer=lb_types.ClassificationAnswer(
                name="first_sub_radio_answer"
              )
            )
          )
        ]
      )
    )
  )

  nested_checklist_annotation = lb_types.ClassificationAnnotation(
    name="nested_checklist_question",
    value=lb_types.Checklist(
      answer=[lb_types.ClassificationAnswer(
        name="first_checklist_answer",
        classifications=[
          lb_types.ClassificationAnnotation(
            name="sub_checklist_question",
            value=lb_types.Checklist(
              answer=[lb_types.ClassificationAnswer(
              name="first_sub_checklist_answer"
            )]
          ))
        ]
      )]
    )
  )
  ```

  ```python NDJSON theme={null}
  nested_radio_annotation_ndjson= {
    "name": "nested_radio_question",
    "answer": {
        "name": "first_radio_answer",
        "classifications": [
          {
            "name": "sub_radio_question",
            "answer": {"name": "first_sub_radio_answer"}
          }
        ]
      }
  }

  nested_checklist_annotation_ndjson = {
    "name": "nested_checklist_question",
    "answer": [{
        "name": "first_checklist_answer",
        "classifications" : [
          {
            "name": "sub_checklist_question",
            "answer": {"name": "first_sub_checklist_answer"}
          }
        ]
    }]
  }
  ```
</CodeGroup>
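
Because the two payload formats support different nesting depths, it can help to measure how deeply an NDJSON annotation is nested before choosing a format. The following is a minimal sketch using plain dictionaries; `nesting_depth` is a hypothetical helper, not part of the Labelbox SDK, and it assumes nested classifications appear under a `classifications` key on either the annotation or its answers, as in the examples above.

```python
def nesting_depth(annotation):
    """Return how many levels of nested classifications sit under an annotation (0 = none)."""
    answers = annotation.get("answer")
    if isinstance(answers, dict):
        answers = [answers]
    elif not isinstance(answers, list):
        answers = []  # free-form text answers are plain strings with no children

    # Children can hang off the annotation itself (tools) or off each answer.
    children = list(annotation.get("classifications", []))
    for answer in answers:
        if isinstance(answer, dict):
            children.extend(answer.get("classifications", []))

    if not children:
        return 0
    return 1 + max(nesting_depth(child) for child in children)


flat = {"name": "radio_question", "answer": {"name": "first_radio_answer"}}
nested = {
    "name": "nested_radio_question",
    "answer": {
        "name": "first_radio_answer",
        "classifications": [
            {"name": "sub_radio_question", "answer": {"name": "first_sub_radio_answer"}}
        ],
    },
}

print(nesting_depth(flat), nesting_depth(nested))
```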

## Tools

### Entity

<CodeGroup>
  ```python Python annotation theme={null}
  named_entity = lb_types.TextEntity(start=10, end=20)
  named_entitity_annotation = lb_types.ObjectAnnotation(value=named_entity, name = "named_entity")
  ```

  ```python NDJSON theme={null}
  entities_ndjson = {
      "name": "named_entity",
      "location": {
          "start": 67,
          "end": 128
      }
  }
  ```
</CodeGroup>
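
Entity annotations are located by character offsets, so the `start` and `end` values usually need to be computed from the text itself. The sketch below shows one way to do that with the standard library. `entity_span` is a hypothetical helper, and treating `end` as the index of the last character (inclusive) is an assumption; verify the convention against an annotation created in the Labelbox editor before relying on it.

```python
def entity_span(text, phrase, inclusive_end=True):
    """Return (start, end) character offsets of the first occurrence of phrase in text."""
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} not found in text")
    end = start + len(phrase)
    if inclusive_end:
        end -= 1  # assumption: `end` points at the last character of the entity
    return start, end


sample_text = "Lorem ipsum dolor sit amet"
start, end = entity_span(sample_text, "ipsum")
print(start, end)
```

The resulting values can then be passed as `start` and `end` in either payload format.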

### Tool with nested classification

<CodeGroup>
  ```python Python annotation theme={null}
  tool_with_radio_subclass_annotation = lb_types.ObjectAnnotation(
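      name="feature_name",  # Placeholder: replace with the feature name from your ontology
      value=tool_annotation,  # Placeholder: replace with a tool annotation, such as lb_types.TextEntity(...)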
      classifications=[
          lb_types.ClassificationAnnotation(
              name="sub_radio_question",
              value=lb_types.Radio(answer=lb_types.ClassificationAnswer(
                  name="first_sub_radio_answer")))
      ])
  ```

  ```python Entity Example theme={null}
  entity_with_radio_subclass_annotation = lb_types.ObjectAnnotation(
      name="entity_with_radio_subclass",
      value=lb_types.TextEntity(start=10, end=20),
      classifications=[
          lb_types.ClassificationAnnotation(
              name="sub_radio_question",
              value=lb_types.Radio(answer=lb_types.ClassificationAnswer(
                  name="first_sub_radio_answer")))
      ])
  ```

  ```python NDJSON theme={null}
  entity_with_radio_subclass_ndjson = {
      "name": "entity_with_radio_subclass",
      "classifications": [{
          "name": "sub_radio_question",
          "answer": {
              "name": "first_sub_radio_answer"
          }
      }],
      "location": {
          "start": 10,
          "end": 20
      }
  }
  ```
</CodeGroup>

## Relationship

<CodeGroup>
  ```python Python annotation theme={null}
  relationship = lb_types.RelationshipAnnotation(
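      name="relationship",  # Placeholder: replace with the relationship feature name from your ontology
      value=lb_types.Relationship(
          source=source_annotation,  # Placeholder: replace with the source tool annotation
          target=target_annotation,  # Placeholder: replace with the target tool annotation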
          type=lb_types.Relationship.Type.UNIDIRECTIONAL,
      ))
  ```

  ```python Entity Example theme={null}
  ner_source = lb_types.ObjectAnnotation(
      name="named_entity",
      value=lb_types.TextEntity(
        start=133,
        end=140
      )
  )

  ner_target = lb_types.ObjectAnnotation(
      name="named_entity",
      value=lb_types.TextEntity(
        start=143,
        end=159
      )
  )

  ner_relationship = lb_types.RelationshipAnnotation(
      name="relationship",
      value=lb_types.Relationship(
          source=ner_source, # UUID is not required for annotation types
          target=ner_target,
          type=lb_types.Relationship.Type.UNIDIRECTIONAL,
      ))
  ```

  ```python NDJSON theme={null}
  uuid_source = str(uuid.uuid4())
  uuid_target = str(uuid.uuid4())

  entity_source_ndjson = {
    "name": "named_entity",
    "uuid": uuid_source,
    "location": {
            "start" : 133,
            "end": 140
        }
  }

  entity_target_ndjson = {
    "name": "named_entity",
    "uuid": uuid_target,
    "location": {
      "start": 143,
      "end": 159
    }
  }

  ner_relationship_annotation_ndjson = {
      "name": "relationship",
      "relationship": {
        "source": uuid_source, # UUID reference to entity source annotation
        "target": uuid_target, # UUID reference to entity target annotation
        "type": "unidirectional"
      }
  }
  ```
</CodeGroup>
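
In the NDJSON format, a relationship points at its source and target by UUID, so a mistyped or missing UUID leaves the relationship dangling. The following is a minimal sketch of a local check that can be run before importing. `find_dangling_relationships` is a hypothetical helper, not part of the Labelbox SDK, and the payload below only mirrors the shape of the NDJSON example above.

```python
import uuid


def find_dangling_relationships(annotations):
    """Return (relationship name, end) pairs whose UUID matches no annotation in the payload."""
    known_uuids = {a["uuid"] for a in annotations if "uuid" in a}
    dangling = []
    for annotation in annotations:
        relationship = annotation.get("relationship")
        if relationship is None:
            continue
        for end in ("source", "target"):
            if relationship[end] not in known_uuids:
                dangling.append((annotation["name"], end))
    return dangling


uuid_source = str(uuid.uuid4())
uuid_target = str(uuid.uuid4())
payload = [
    {"name": "named_entity", "uuid": uuid_source, "location": {"start": 133, "end": 140}},
    {"name": "named_entity", "uuid": uuid_target, "location": {"start": 143, "end": 159}},
    {
        "name": "relationship",
        "relationship": {"source": uuid_source, "target": uuid_target, "type": "unidirectional"},
    },
]

print(find_dangling_relationships(payload))
```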

## Example: Import pre-labels or ground truths

The steps to import annotations as pre-labels (model-assisted labeling) are similar to the steps to import annotations as ground truth labels. The two workflows differ slightly, and the differences are noted where they apply.

### Before you start

The following imports are needed to use the code examples in this section.

<CodeGroup>
  ```python Python theme={null}
  import labelbox as lb
  import labelbox.types as lb_types
  import uuid
  import json
  ```
</CodeGroup>

Replace the value of `API_KEY` with a valid [API key](/reference/create-api-key) to connect to the Labelbox client.

<CodeGroup>
  ```python Python theme={null}
  API_KEY = None
  client = lb.Client(API_KEY)
  ```
</CodeGroup>

### Step 1: Import data rows

Data rows must first be uploaded to **Catalog** to attach annotations.

This example shows how to create a data row in **Catalog** by attaching it to a [dataset](/reference/dataset).

<CodeGroup>
  ```python Python theme={null}
  # You can now include other fields like attachments, media type, and metadata in the data row creation step: /reference/import-text-data
  global_key = "lorem-ipsum.txt"
  text_asset = {
      "row_data": "https://storage.googleapis.com/labelbox-sample-datasets/nlp/lorem-ipsum.txt",
      "global_key": global_key,
      "media_type": "TEXT",
      "attachments": [{"type": "TEXT_URL", "value": "https://storage.googleapis.com/labelbox-sample-datasets/Docs/text_attachment.txt"}]
      }

  dataset = client.create_dataset(name="text_annotation_import_demo_dataset")
  task = dataset.create_data_rows([text_asset])
  task.wait_till_done()
  print("Errors:", task.errors)
  print("Failed data rows:", task.failed_data_rows)
  ```
</CodeGroup>
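
The example above uses a fixed global key. Assuming global keys must be unique within your workspace, rerunning the example with the same key may fail. One possible workaround, sketched below, is to append a random suffix. `make_global_key` is a hypothetical helper, not part of the Labelbox SDK.

```python
import uuid


def make_global_key(filename):
    """Return the file name with a random UUID suffix so repeated runs do not collide."""
    return f"{filename}-{uuid.uuid4()}"


global_key = make_global_key("lorem-ipsum.txt")
print(global_key)
```

If you adopt this pattern, keep the generated key in a variable, because later steps reference the data row by the same `global_key`.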

### Step 2: Set up ontology

Your project ontology should support the tools and classifications required by your annotations. To ensure accurate schema feature mapping, the value used as the `name` parameter should match the value of the `name` field in your annotation.

For example, when we created the entity annotation above, we gave it the name `named_entity`. When we set up our ontology, the name of the entity tool must also be `named_entity`. The same alignment must hold true for the other tools and classifications we create in our ontology.

This example shows how to create an ontology containing all supported [annotation types](#supported-annotations).

<CodeGroup>
  ```python Python theme={null}

  ontology_builder = lb.OntologyBuilder(
  classifications=[ # List of Classification objects
    lb.Classification(
      class_type=lb.Classification.Type.RADIO,
      name="radio_question",
      options=[lb.Option(value="first_radio_answer")]
    ),
    lb.Classification(
      class_type=lb.Classification.Type.RADIO,
      name="nested_radio_question",
      options=[
        lb.Option(value="first_radio_answer",
          options=[
              lb.Classification(
                class_type=lb.Classification.Type.RADIO,
                name="sub_radio_question",
                options=[
                  lb.Option(value="first_sub_radio_answer")
                ]
            ),
          ]
        ),
      ],
    ),
     lb.Classification(
      class_type=lb.Classification.Type.CHECKLIST,
      name="nested_checklist_question",
      options=[
          lb.Option("first_checklist_answer",
            options=[
              lb.Classification(
                  class_type=lb.Classification.Type.CHECKLIST,
                  name="sub_checklist_question",
                  options=[lb.Option("first_sub_checklist_answer")]
              )
          ]
        )
      ]
    ),
    lb.Classification(
      class_type=lb.Classification.Type.CHECKLIST,
      name="checklist_question",
      options=[
        lb.Option(value="first_checklist_answer"),
        lb.Option(value="second_checklist_answer"),
        lb.Option(value="third_checklist_answer")
      ]
    ),
     lb.Classification(
      class_type=lb.Classification.Type.TEXT,
      name="free_text"
    )
  ],
  tools=[ # List of Tool objects
         lb.Tool(
            tool=lb.Tool.Type.NER,
            name="named_entity"
          ),
         lb.Tool(
            tool=lb.Tool.Type.RELATIONSHIP,
            name="relationship"
          )
    ]
  )
  ontology = client.create_ontology("Ontology Text Annotations", ontology_builder.asdict())
  ```
</CodeGroup>
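
The name-matching rule in this step can be checked locally before you import anything. The following is a minimal sketch using plain Python sets. `missing_from_ontology` is a hypothetical helper, not part of the Labelbox SDK, and both name lists are typed by hand from the examples in this guide rather than read from the API.

```python
def missing_from_ontology(annotation_names, ontology_names):
    """Return annotation names that have no feature with the same name in the ontology."""
    return sorted(set(annotation_names) - set(ontology_names))


# Top-level feature names defined in the ontology example above.
ontology_names = [
    "radio_question",
    "nested_radio_question",
    "nested_checklist_question",
    "checklist_question",
    "free_text",
    "named_entity",
    "relationship",
]

# Names used by the annotations you plan to import; the last one is a deliberate typo.
annotation_names = ["radio_question", "free_text", "named_entity", "named_entitiy"]

print(missing_from_ontology(annotation_names, ontology_names))
```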

### Step 3: Set up a labeling project

<CodeGroup>
  ```python Python theme={null}
  # Project defaults to batch mode with benchmark quality settings if this argument is not provided
  # Queue mode will be deprecated once dataset mode is deprecated

  project = client.create_project(
      name="text_project_demo",
      queue_mode=lb.QueueMode.Batch,
      media_type=lb.MediaType.Text,
  )

  project.connect_ontology(ontology)
  ```
</CodeGroup>

### Step 4: Send data rows to project

<CodeGroup>
  ```python Python theme={null}
  # Create a batch to send data rows to your project
  batch = project.create_batch(
    "first-batch-text-demo", # Each batch in a project must have a unique name
    global_keys=[global_key], # A list of global keys, data rows, or data row ids
    priority=5 # Priority between 1 (highest) and 5 (lowest)
  )

  print("Batch: ", batch)
  ```
</CodeGroup>

### Step 5: Create annotation payloads

For help understanding annotation payloads, see [overview](#overview). To declare payloads, you can use Python annotation types (*preferred*) or NDJSON objects. For annotations that you want to import as ground truth labels, you can also specify [benchmarks](/docs/benchmark) using the `is_benchmark_reference` flag.

These examples demonstrate each format and how to compose annotations into labels attached to data rows.

<CodeGroup>
  ```python Python annotation payload theme={null}
  # Payload that includes relationship annotations.
  # Relationships are not supported for ground truth imports, so use this payload for MAL.
  labels = []
  labels.append(
      lb_types.Label(
          data={"global_key" : global_key },
          annotations = [
              named_entitity_annotation,
              radio_annotation,
              checklist_annotation,
              text_annotation,
              ner_source,
              ner_target,
              ner_relationship,
              nested_checklist_annotation,
              nested_radio_annotation
          ]
      )
  )

  # Payload without relationship annotations, suitable for ground truth imports.
  # Reassigning `labels` replaces the payload above, so keep only the one you need.
  labels = []
  labels.append(
      lb_types.Label(
          data={"global_key" : global_key },
          annotations = [
              named_entitity_annotation,
              radio_annotation,
              checklist_annotation,
              text_annotation,
              nested_checklist_annotation,
              nested_radio_annotation
          ],
          # Optional: set the label as a benchmark
          # Only supported for ground truth imports
          is_benchmark_reference = True
      )
  )
  ```

  ```python NDJSON Payload theme={null}
  label_ndjson = []
  for annotations in [entities_ndjson,
                     radio_annotation_ndjson,
                     checklist_annotation_ndjson,
                     text_annotation_ndjson,
                     nested_radio_annotation_ndjson,
                     nested_checklist_annotation_ndjson,
                     entity_source_ndjson,
                     entity_target_ndjson,
                     ner_relationship_annotation_ndjson,
                      ] :
    annotations.update({
        "dataRow": { "globalKey": global_key }
    })
    label_ndjson.append(annotations)
  ```
</CodeGroup>
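
The NDJSON loop above adds the `dataRow` key by mutating each annotation dictionary in place with `update`. If you want to reuse the same annotation dictionaries for several data rows, a non-mutating variant may be safer. The sketch below is one way to do that; `attach_data_row` is a hypothetical helper, not part of the Labelbox SDK, and the sample annotation and global keys are illustrative only.

```python
def attach_data_row(annotations, global_key):
    """Return new dictionaries with a dataRow reference, leaving the inputs untouched."""
    return [{**annotation, "dataRow": {"globalKey": global_key}} for annotation in annotations]


templates = [{"name": "free_text", "answer": "sample text"}]

payload_a = attach_data_row(templates, "lorem-ipsum.txt")
payload_b = attach_data_row(templates, "another-file.txt")

print(payload_a)
print(payload_b)
```

Note that the copy is shallow: nested values such as `answer` are still shared between the template and the copies.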

### Step 6: Import annotation payload

For prelabeled (model-assisted labeling) scenarios, pass your payload as the value of the `predictions` parameter. For ground truths, pass the payload to the `labels` parameter.

<Warning>
  ### Warning

  Relationship annotations are not supported for ground truth import jobs.
</Warning>

#### Option A: Upload as [prelabels (model assisted labeling)](/docs/model-assisted-labeling)

This option is helpful for speeding up the initial labeling process and reducing the manual labeling workload for high-volume datasets.

<CodeGroup>
  ```python MAL import theme={null}
  # Upload MAL label for this data row in project
  upload_job = lb.MALPredictionImport.create_from_objects(
      client = client,
      project_id = project.uid,
      name="mal_job"+str(uuid.uuid4()),
      predictions=labels
  )

  print(f"Errors: {upload_job.errors}")
  print(f"Status of uploads: {upload_job.statuses}")
  ```
</CodeGroup>

#### Option B: Upload to a labeling project as [ground truth](/docs/import-ground-truth)

This option is helpful for loading high-confidence labels from another platform or previous projects that just need review rather than manual labeling effort.

<CodeGroup>
  ```python Label import theme={null}
  # Upload label for this data row in project
  upload_job = lb.LabelImport.create_from_objects(
      client = client,
      project_id = project.uid,
      name="label_import_job"+str(uuid.uuid4()),
      labels=labels
  )

  print(f"Errors: {upload_job.errors}")
  print(f"Status of uploads: {upload_job.statuses}")
  ```
</CodeGroup>
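
For large imports, printing every status can be hard to read. The sketch below counts statuses instead. It assumes `upload_job.statuses` is a list of dictionaries that each carry a `status` key; that shape is an assumption, so inspect the real output before relying on it. `summarize_statuses` is a hypothetical helper, and the sample list is made up for illustration rather than taken from a real import job.

```python
from collections import Counter


def summarize_statuses(statuses):
    """Count how many entries report each status value."""
    return Counter(entry.get("status", "UNKNOWN") for entry in statuses)


# Made-up sample standing in for upload_job.statuses.
sample_statuses = [
    {"uuid": "a", "status": "SUCCESS"},
    {"uuid": "b", "status": "SUCCESS"},
    {"uuid": "c", "status": "FAILURE"},
]

summary = summarize_statuses(sample_statuses)
print(dict(summary))
```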
