Import conversational text annotations

How to import annotations on conversational text data and sample import formats.

Open this Colab for an interactive tutorial on importing annotations on conversational data.

Supported annotations

To import annotations in Labelbox, you need to create an annotations payload. In this section, we provide this payload for every supported annotation type.

Labelbox supports two formats for the annotations payload:

  • Python annotation types (recommended)
  • NDJSON

Both are described below.

Entity (Message-based)

ner_annotation = lb_types.ObjectAnnotation(
    name="ner",
    value=lb_types.ConversationEntity(
        start=0,
        end=8,
        message_id="4" 
    )
)
ner_annotation_ndjson = { 
    "name": "ner",
    "location": { 
      "start": 0, 
      "end": 8 
    },
    "messageId": "4" # must match the ID of the message that contains the entity
}
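
The start and end values above are character offsets within a single message. The following is a minimal sketch of deriving them from the message text instead of counting by hand. The helpers entity_span and build_ner_ndjson are hypothetical and not part of the Labelbox SDK. This page does not state whether end is inclusive or exclusive, so the sketch makes the caller choose explicitly; check that choice against your own data.

```python
def entity_span(message_text: str, entity_text: str, *, inclusive_end: bool) -> dict:
    """Return a {"start", "end"} location for the first occurrence of entity_text."""
    if not entity_text:
        raise ValueError("entity_text must not be empty")
    start = message_text.find(entity_text)
    if start == -1:
        raise ValueError(f"{entity_text!r} not found in message")
    end = start + len(entity_text)  # exclusive end offset
    if inclusive_end:
        end -= 1  # index of the last character of the entity
    return {"start": start, "end": end}


def build_ner_ndjson(name, message_id, message_text, entity_text, *, inclusive_end):
    """Assemble an NDJSON-style entity dict shaped like the example above."""
    return {
        "name": name,
        "location": entity_span(message_text, entity_text, inclusive_end=inclusive_end),
        "messageId": message_id,
    }


# Hypothetical message text, used only for illustration.
example = build_ner_ndjson(
    "ner", "4", "Labelbox is a data platform", "Labelbox", inclusive_end=False
)
```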

Classification: Free-form text (Message-based)

text_annotation = lb_types.ClassificationAnnotation(
    name="text_convo",
    value=lb_types.Text(answer="the answer to the text questions right here"),
    message_id="0" # Remove argument if importing annotation as a global classification (not message-based)
)
text_annotation_ndjson = {
    "name": "text_convo",
    "answer": "the answer to the text questions right here",
    "messageId": "0" # Remove argument if importing annotation as a global classification (not message-based)
}
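
As the comments above note, a message-based classification becomes a global one when the message ID is removed. A minimal sketch of that conversion for the NDJSON format follows. The helper to_global_classification is hypothetical, and the matching classification in your ontology would also need a global scope.

```python
import copy


def to_global_classification(annotation: dict) -> dict:
    """Return a copy of an NDJSON classification without its messageId."""
    result = copy.deepcopy(annotation)  # leave the caller's dict untouched
    result.pop("messageId", None)       # no-op if the annotation is already global
    return result


message_based = {
    "name": "text_convo",
    "answer": "the answer to the text questions right here",
    "messageId": "0",
}
global_version = to_global_classification(message_based)
```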

Classification: Checklist (Multi-choice, Message-based)

checklist_annotation = lb_types.ClassificationAnnotation(
  name="checklist_convo", # must match your ontology feature's name
  value=lb_types.Checklist(
      answer = [
        lb_types.ClassificationAnswer(
            name = "first_checklist_answer"
        ), 
        lb_types.ClassificationAnswer(
            name = "second_checklist_answer"
        )
      ]
    ),
  message_id="2" # Remove argument if importing annotation as a global classification (not message-based)
 )

checklist_annotation_ndjson = {
    "name": "checklist_convo",
    "answers": [
        {"name": "first_checklist_answer"},
        {"name": "second_checklist_answer"}
    ],
    "messageId": "2" # Remove argument if importing annotation as a global classification (not message-based)
}

Classification: Radio (Single-choice, Message-based)

radio_annotation = lb_types.ClassificationAnnotation(
    name="radio_convo", 
    value=lb_types.Radio(answer = lb_types.ClassificationAnswer(name = "first_radio_answer")),
    message_id="0" # Remove argument if importing annotation as a global classification (not message-based)
)
radio_annotation_ndjson = {
    "name": "radio_convo",
    "answer": {
        "name": "first_radio_answer"
    },
    "messageId": "0", # Remove argument if importing global classifications 
}

Relationship with Entity (Message-based)

Relationship annotations are only supported for MAL import jobs.

ner_source = lb_types.ObjectAnnotation(
    name="ner",
    value=lb_types.ConversationEntity(
        start=16,
        end=26,
        message_id="4"
    )
)
ner_target = lb_types.ObjectAnnotation(
    name="ner",
    value=lb_types.ConversationEntity(
        start=29, 
        end=34, 
        message_id="4"
    )
)

ner_relationship = lb_types.RelationshipAnnotation(
    name="relationship",
    value=lb_types.Relationship(
        source=ner_source,
        target=ner_target,
        type=lb_types.Relationship.Type.UNIDIRECTIONAL,
    ))
uuid_source = str(uuid.uuid4())
uuid_target = str(uuid.uuid4())

ner_source_ndjson = { 
        "uuid": uuid_source,             
        "name": "ner",
        "location": { 
            "start": 16, 
            "end": 26 
        },
        "messageId": "4"
    }

ner_target_ndjson = { 
        "uuid": uuid_target,
        "name": "ner",
        "location": { 
            "start": 29, 
            "end": 34
        },
        "messageId": "4"
    }

ner_relationship_annotation_ndjson = {
    "name": "relationship", 
    "relationship": {
      "source": uuid_source, # UUID reference to the source annotation
      "target": uuid_target, # UUID reference to the target annotation
      "type": "unidirectional" # same direction as Relationship.Type.UNIDIRECTIONAL in the Python example
    }
}
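
In the NDJSON format, a relationship refers to its source and target only by UUID, so a mistyped or missing UUID is easy to overlook. The following sketch checks a payload for such dangling references before upload. The helper unresolved_relationship_refs is hypothetical and is not part of the Labelbox SDK.

```python
import uuid


def unresolved_relationship_refs(payload):
    """Return (annotation name, role) pairs whose UUID matches no annotation in payload."""
    known = {a["uuid"] for a in payload if "uuid" in a}
    missing = []
    for annotation in payload:
        relationship = annotation.get("relationship")
        if relationship is None:
            continue  # not a relationship annotation
        for role in ("source", "target"):
            if relationship.get(role) not in known:
                missing.append((annotation["name"], role))
    return missing


source_id = str(uuid.uuid4())
target_id = str(uuid.uuid4())
sample_payload = [
    {"uuid": source_id, "name": "ner", "location": {"start": 16, "end": 26}, "messageId": "4"},
    {"uuid": target_id, "name": "ner", "location": {"start": 29, "end": 34}, "messageId": "4"},
    {"name": "relationship",
     "relationship": {"source": source_id, "target": target_id, "type": "unidirectional"}},
]
problems = unresolved_relationship_refs(sample_payload)  # empty when every reference resolves
```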

End-to-end example: Import pre-labels or ground truth

Whether you import annotations as pre-labels or as ground truth, the steps are nearly identical. The process differs only slightly in steps 5 and 6 (creating and importing the annotation payload), which are explained in detail below.

Before you start

You must import these libraries to use the code examples in this section.

import labelbox as lb
import labelbox.types as lb_types
from labelbox.schema.queue_mode import QueueMode
import uuid
import json
import numpy as np

Replace with your API key

API_KEY = ""
client = lb.Client(api_key=API_KEY)

Step 1: Import data rows

Before you can attach annotations to a data row, the data row must be uploaded to Catalog. The following example creates one data row in Catalog.

# Create one Labelbox dataset

global_key = "conversation-1.json"

asset = {
    "row_data": "https://storage.googleapis.com/labelbox-developer-testing-assets/conversational_text/1000-conversations/conversation-1.json",
    "global_key": global_key
}

dataset = client.create_dataset(name="conversational_annotation_import_demo_dataset")
task = dataset.create_data_rows([asset])
task.wait_till_done()
print("Errors:", task.errors)
print("Failed data rows: ", task.failed_data_rows)
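
When importing more than one conversation, the asset dictionaries can be generated rather than written by hand. The sketch below mirrors the example above by using each URL's file name as the global key; that naming rule is an assumption for illustration, not a Labelbox requirement. The helper build_assets is hypothetical, and it checks uniqueness only within the list it is given.

```python
import posixpath
from urllib.parse import urlparse


def build_assets(urls):
    """Build data row dicts, using each URL's file name as the global key."""
    assets = []
    seen = set()
    for url in urls:
        global_key = posixpath.basename(urlparse(url).path)
        if not global_key:
            raise ValueError(f"Cannot derive a global key from {url!r}")
        if global_key in seen:
            raise ValueError(f"Duplicate global key: {global_key!r}")
        seen.add(global_key)
        assets.append({"row_data": url, "global_key": global_key})
    return assets


assets = build_assets([
    "https://storage.googleapis.com/labelbox-developer-testing-assets/conversational_text/1000-conversations/conversation-1.json",
])
```

The resulting list has the same shape as the [asset] list passed to dataset.create_data_rows above.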

Step 2: Create an ontology

Your project's ontology must include every tool and classification used by your annotations. The name of each feature in the ontology (set through name for tools and instructions for classifications in the example below) must match the name field in your annotations so that each annotation maps to the correct feature schema.

Here is an example of creating an ontology programmatically for all the sample annotations above.

ontology_builder = lb.OntologyBuilder(
  tools=[ 
    lb.Tool(tool=lb.Tool.Type.NER,name="ner"),
    lb.Tool(tool=lb.Tool.Type.RELATIONSHIP,name="relationship")
    ], 
  classifications=[ 
    lb.Classification( 
      class_type=lb.Classification.Type.TEXT,
      scope=lb.Classification.Scope.INDEX,  # Remove this line or set scope to "GLOBAL" if importing global text annotations
      instructions="text_convo"), 
    lb.Classification( 
      class_type=lb.Classification.Type.CHECKLIST, 
      scope=lb.Classification.Scope.INDEX,  # Remove this line or set scope to "GLOBAL" if importing global checklist annotations                 
      instructions="checklist_convo", 
      options=[
        lb.Option(value="first_checklist_answer"),
        lb.Option(value="second_checklist_answer")            
      ]
    ), 
    lb.Classification( 
      class_type=lb.Classification.Type.RADIO, 
      instructions="radio_convo", 
      scope=lb.Classification.Scope.INDEX, # Remove this line or set scope to "GLOBAL" if importing global radio  annotations         
      options=[
        lb.Option(value="first_radio_answer"),
        lb.Option(value="second_radio_answer")
      ]
    )
  ]
)

ontology = client.create_ontology("Ontology Conversation Annotations", ontology_builder.asdict())
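
Because annotations are matched to ontology features by name, a quick local check can catch mismatches before an import job runs. The sketch below compares the name fields of an NDJSON payload against a set of feature names that you supply. The helper unknown_feature_names is hypothetical, and the name set is typed by hand here rather than read from the Labelbox API.

```python
def unknown_feature_names(payload, ontology_names):
    """Return the sorted annotation names that do not appear in ontology_names."""
    used = {annotation["name"] for annotation in payload}
    return sorted(used - set(ontology_names))


# Feature names taken from the ontology example above.
ontology_names = {"ner", "relationship", "text_convo", "checklist_convo", "radio_convo"}

sample_payload = [
    {"name": "ner", "location": {"start": 0, "end": 8}, "messageId": "4"},
    {"name": "radio_conv", "answer": {"name": "first_radio_answer"}, "messageId": "0"},  # deliberate typo
]
mismatches = unknown_feature_names(sample_payload, ontology_names)
```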

Step 3: Create a labeling project

Create a project and connect the ontology created above.

# Create Labelbox project
project = client.create_project(name="conversational_project", 
                                    media_type=lb.MediaType.Conversational)

# Set up your ontology
project.setup_editor(ontology) # Connect your ontology and editor to your project

Step 4: Send a batch of data rows to the project

# Create a batch of data rows to send to the project
batch = project.create_batch(
  "first-batch-convo-demo", # Each batch in a project must have a unique name
  global_keys=[global_key], # A list of global keys identifying the data rows to include
  priority=5 # Priority from 1 (highest) to 5 (lowest)
)

print("Batch: ", batch)

Step 5: Create the annotation payload

Create the annotations payload using the snippets of code shown above.

Labelbox supports two formats for the annotations payload: NDJSON and Python annotation types. Both approaches are described below with instructions to compose annotations into Labels attached to the data rows.

The resulting label_ndjson and label from each approach will include every annotation (created above) supported by the respective method.

label = []
label.append(
  lb_types.Label(
    data=lb_types.ConversationData(
      global_key=global_key
    ),
    annotations=[
      ner_annotation,
      text_annotation,
      checklist_annotation,
      radio_annotation,
      ner_source,
      ner_target,
      ner_relationship
    ]
  )
)
label_ndjson = []
for annotations in [
    ner_annotation_ndjson,
    text_annotation_ndjson,
    checklist_annotation_ndjson,
    radio_annotation_ndjson,
    ner_source_ndjson,
    ner_target_ndjson,
    ner_relationship_annotation_ndjson,
    ]:
  annotations.update({
      "dataRow": {
          "globalKey": global_key
      }
  })
  label_ndjson.append(annotations)
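
The loop above adds the dataRow reference by mutating each annotation dictionary in place. As a hedged alternative, the sketch below builds new dictionaries instead and serializes the result to NDJSON text (one JSON object per line) for inspection or saving. The helpers attach_data_row and to_ndjson are hypothetical; the import calls in Step 6 take the list of dictionaries directly, so the serialized text is optional.

```python
import json


def attach_data_row(annotations, global_key):
    """Return new annotation dicts that each reference the data row by global key."""
    return [{**a, "dataRow": {"globalKey": global_key}} for a in annotations]


def to_ndjson(rows):
    """Serialize a list of dicts as NDJSON: one JSON object per line."""
    return "\n".join(json.dumps(row) for row in rows)


sample_annotations = [
    {"name": "text_convo", "answer": "the answer to the text questions right here", "messageId": "0"},
    {"name": "radio_convo", "answer": {"name": "first_radio_answer"}, "messageId": "0"},
]
rows = attach_data_row(sample_annotations, "conversation-1.json")
ndjson_text = to_ndjson(rows)
```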

Step 6: Import the annotation payload

For both options, you can pass either the label_ndjson or the label payload as the value for the predictions or labels parameter.

The examples below pass label, the payload built with Python annotation types. To import the NDJSON payload instead, pass label_ndjson.

Option A: Upload to a labeling project as pre-labels (Model-assisted labeling)

# Upload our label using Model-Assisted Labeling
upload_job = lb.MALPredictionImport.create_from_objects(
    client = client, 
    project_id = project.uid, 
    name=f"mal_job-{str(uuid.uuid4())}", 
    predictions=label)

upload_job.wait_until_done()
print("Errors:", upload_job.errors)
print("Status of uploads: ", upload_job.statuses)

Option B: Upload to a labeling project as ground truth

Relationship annotations are not supported in label import jobs

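# Upload the label for this data row as ground truth.
# The label built in Step 5 includes ner_relationship; remove it first,
# because label import jobs do not support relationship annotations.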
upload_job = lb.LabelImport.create_from_objects(
    client = client, 
    project_id = project.uid, 
    name="label_import_job"+str(uuid.uuid4()),  
    labels=label)

upload_job.wait_until_done()
print("Errors:", upload_job.errors)
print("Status of uploads: ", upload_job.statuses)
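
Since relationship annotations are not supported in label import jobs, an NDJSON payload that contains them needs to be filtered before a ground-truth import. A minimal sketch follows, assuming relationship annotations are the entries that carry a "relationship" key, as in the NDJSON example earlier on this page. The helper without_relationships is hypothetical.

```python
def without_relationships(payload):
    """Return the payload without relationship annotations; entities are kept."""
    return [annotation for annotation in payload if "relationship" not in annotation]


sample_payload = [
    {"uuid": "source-id", "name": "ner", "location": {"start": 16, "end": 26}, "messageId": "4"},
    {"uuid": "target-id", "name": "ner", "location": {"start": 29, "end": 34}, "messageId": "4"},
    {"name": "relationship",
     "relationship": {"source": "source-id", "target": "target-id", "type": "unidirectional"}},
]
ground_truth_payload = without_relationships(sample_payload)
```

The filtered list keeps both entity annotations and can be passed as the labels value in place of the full payload.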